Tuning sklearn Models with Bayesian Optimization

Author: Internet

Bayesian optimization GitHub repository: https://github.com/fmfn/BayesianOptimization

Paper: http://papers.nips.cc/paper/4522-practical-bayesian%20-optimization-of-machine-learning-algorithms.pdf
Snoek, Jasper, Hugo Larochelle, and Ryan P. Adams. “Practical bayesian optimization of machine learning algorithms.” Advances in neural information processing systems 25 (2012).

The walkthrough below uses a random forest as the example:

1. Build the data source

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from bayes_opt import BayesianOptimization
import numpy as np
import pandas as pd

Then construct a binary classification task:

x, y = make_classification(n_samples=1000, n_features=5, n_classes=2)

2. Define the black-box objective function

def rf_cv(n_estimators, min_samples_split, max_features, max_depth):
    val = cross_val_score(
        # Random forest hyperparameters; the optimizer passes floats, so integer-valued ones are cast
        RandomForestClassifier(n_estimators=int(n_estimators),
                               min_samples_split=int(min_samples_split),
                               max_features=min(max_features, 0.999),  # float: fraction of features
                               max_depth=int(max_depth),
                               random_state=2),
        x, y, scoring='f1', cv=5  # cross_val_score accepts a single metric, not a list
    ).mean()
    return val

For other scoring metrics, see: https://scikit-learn.org/stable/modules/model_evaluation.html#the-scoring-parameter-defining-model-evaluation-rules
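
cross_val_score evaluates one metric per call. If several metrics should be tracked at once, one option is sklearn's cross_validate, which accepts a list of scorer names. The sketch below is an illustration and not part of the original walkthrough: rf_cv_multi is a hypothetical variant of the objective, the smaller dataset and its random_state are chosen here only to keep the example quick and repeatable, and averaging the two metrics is an arbitrary way to get the single number the optimizer needs.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Small synthetic dataset (assumption: sized down and seeded for this sketch only).
x, y = make_classification(n_samples=200, n_features=5, n_classes=2, random_state=0)


def rf_cv_multi(n_estimators, max_depth):
    """Hypothetical objective: score two metrics, return one scalar."""
    scores = cross_validate(
        RandomForestClassifier(n_estimators=int(n_estimators),
                               max_depth=int(max_depth),
                               random_state=2),
        x, y, scoring=['f1', 'accuracy'], cv=5,
    )
    # cross_validate returns a dict with one 'test_<metric>' array per metric.
    f1 = scores['test_f1'].mean()
    acc = scores['test_accuracy'].mean()
    # The optimizer maximizes a single value; an unweighted mean is an arbitrary choice.
    return (f1 + acc) / 2


print(rf_cv_multi(n_estimators=20, max_depth=5))
```

Any other way of collapsing the metrics into one number (a weighted sum, or just one of them) would work the same way from the optimizer's point of view.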

3. Define the search space

pbounds = {'n_estimators': (10, 250),  # search range from 10 to 250
           'min_samples_split': (2, 25),
           'max_features': (0.1, 0.999),
           'max_depth': (5, 15)}

The keys of this dictionary must match the parameter names of the objective function.
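
A quick way to catch a mismatched key before starting a long run is to compare the dictionary against the function signature. The sketch below is only an illustration: check_pbounds is a hypothetical helper written for this post, not something provided by bayes_opt, and the rf_cv here is a stand-in that shares only the signature of the real objective.

```python
import inspect


def check_pbounds(func, pbounds):
    """Hypothetical helper: raise ValueError unless pbounds matches func's parameters."""
    expected = set(inspect.signature(func).parameters)
    given = set(pbounds)
    if expected != given:
        raise ValueError(
            f"missing bounds: {sorted(expected - given)}, "
            f"unknown keys: {sorted(given - expected)}"
        )
    for name, (low, high) in pbounds.items():
        if low > high:
            raise ValueError(f"{name}: lower bound {low} exceeds upper bound {high}")


def rf_cv(n_estimators, min_samples_split, max_features, max_depth):
    """Stand-in with the same signature as the objective; the body is omitted."""
    return 0.0


pbounds = {'n_estimators': (10, 250),
           'min_samples_split': (2, 25),
           'max_features': (0.1, 0.999),
           'max_depth': (5, 15)}

check_pbounds(rf_cv, pbounds)  # returns silently when names and bounds are consistent
```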

4. Build the Bayesian optimizer

optimizer = BayesianOptimization(
    f=rf_cv,  # black-box objective function
    pbounds=pbounds,  # search space
    verbose=2,  # verbose=2 prints every step, verbose=1 prints only when a new maximum is found, verbose=0 prints nothing
    random_state=1,
)

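5. Run the search, then print all results and the best parameters

optimizer.maximize(  # run the search
    init_points=5,  # number of initial random exploration steps
    n_iter=25,  # number of Bayesian optimization iterations
)
print(optimizer.res)  # every evaluated point
print(optimizer.max)  # the best score and its parameters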
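
With init_points=5 and n_iter=25 the objective is evaluated 30 times in total. To inspect those evaluations it can help to flatten them into one row per step. The sketch below assumes optimizer.res is a list of dicts shaped like optimizer.max ({'target': ..., 'params': {...}}); the sample entries are rounded values from the first three rows of the run reported later in this post, and flatten_results is a hypothetical helper name.

```python
# Sample data in the assumed shape of optimizer.res (rounded values from the reported run).
res = [
    {'target': 0.9521, 'params': {'max_depth': 9.17, 'max_features': 0.7476,
                                  'min_samples_split': 2.003, 'n_estimators': 82.56}},
    {'target': 0.9475, 'params': {'max_depth': 6.468, 'max_features': 0.183,
                                  'min_samples_split': 6.284, 'n_estimators': 92.93}},
    {'target': 0.9502, 'params': {'max_depth': 8.968, 'max_features': 0.5844,
                                  'min_samples_split': 11.64, 'n_estimators': 174.5}},
]


def flatten_results(res):
    """Hypothetical helper: turn nested result dicts into flat rows, numbered from 1."""
    rows = []
    for i, entry in enumerate(res, start=1):
        row = {'iter': i, 'target': entry['target']}
        row.update(entry['params'])
        rows.append(row)
    return rows


rows = flatten_results(res)
ranked = sorted(rows, key=lambda r: r['target'], reverse=True)
best = ranked[0]
print(best['iter'], best['target'])
```

The flat rows can also be handed to pandas.DataFrame(rows) for sorting and plotting, which gives the pandas import above something to do.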

Full code

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from bayes_opt import BayesianOptimization
import numpy as np
import pandas as pd

# Generate a random classification dataset: 5 features, 2 classes
x, y = make_classification(n_samples=1000, n_features=5, n_classes=2)

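# Step 1: define the black-box objective function
# (this scoring dict is not used below; rf_cv scores with 'f1' only)
scoring = {'acc': 'accuracy',
           'f_1': 'f1', }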


def rf_cv(n_estimators, min_samples_split, max_features, max_depth):
    val = cross_val_score(
        RandomForestClassifier(n_estimators=int(n_estimators),
                               min_samples_split=int(min_samples_split),
                               max_features=min(max_features, 0.999),  # float
                               max_depth=int(max_depth),
                               random_state=2),
        x, y, scoring='f1', cv=5
    ).mean()
    return val


# Step 2: define the search space
pbounds = {'n_estimators': (10, 250),  # search range from 10 to 250
           'min_samples_split': (2, 25),
           'max_features': (0.1, 0.999),
           'max_depth': (5, 15)}

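# Step 3: build the Bayesian optimizer
optimizer = BayesianOptimization(
    f=rf_cv,  # black-box objective function
    pbounds=pbounds,  # search space
    verbose=2,  # verbose=2 prints every step, verbose=1 prints only when a new maximum is found, verbose=0 prints nothing
    random_state=1,
)
optimizer.maximize(  # run the search
    init_points=5,  # number of initial random exploration steps
    n_iter=25,  # number of Bayesian optimization iterations
)
print(optimizer.res)  # every evaluated point
print(optimizer.max)  # the best score and its parameters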

The output looks like this:

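| iter | target | max_depth | max_features | min_samples_split | n_estimators |
-------------------------------------------------------------------------------
| 1    | 0.9521 | 9.17      | 0.7476       | 2.003             | 82.56        |
| 2    | 0.9475 | 6.468     | 0.183        | 6.284             | 92.93        |
| 3    | 0.9502 | 8.968     | 0.5844       | 11.64             | 174.5        |
| 4    | 0.952  | 7.045     | 0.8894       | 2.63              | 170.9        |
| 5    | 0.9521 | 9.173     | 0.6023       | 5.229             | 57.54        |
| 6    | 0.9522 | 8.304     | 0.6073       | 5.086             | 57.32        |
| 7    | 0.9511 | 10.59     | 0.7231       | 2.47              | 74.19        |
| 8    | 0.9466 | 6.611     | 0.2431       | 3.667             | 49.53        |
| 9    | 0.9492 | 6.182     | 0.8803       | 4.411             | 62.05        |
| 10   | 0.9514 | 7.735     | 0.1164       | 4.576             | 79.58        |
| 11   | 0.9531 | 12.72     | 0.4108       | 4.389             | 81.27        |
| 12   | 0.9513 | 14.28     | 0.7338       | 3.12              | 84.51        |
| 13   | 0.9501 | 14.8      | 0.8398       | 6.767             | 77.78        |
| 14   | 0.9512 | 12.65     | 0.2956       | 2.376             | 79.54        |
| 15   | 0.9523 | 12.04     | 0.1053       | 6.513             | 82.47        |
| 16   | 0.9501 | 11.79     | 0.6655       | 2.211             | 68.6         |
| 17   | 0.9533 | 8.374     | 0.422        | 9.813             | 56.87        |
| 18   | 0.9523 | 11.81     | 0.8737       | 11.05             | 56.84        |
| 19   | 0.9523 | 8.27      | 0.6367       | 13.32             | 57.61        |
| 20   | 0.9514 | 8.126     | 0.4081       | 11.01             | 53.97        |
| 21   | 0.9495 | 9.323     | 0.1          | 10.2              | 60.14        |
| 22   | 0.9546 | 8.76      | 0.1512       | 7.381             | 55.76        |
| 23   | 0.9505 | 10.76     | 0.1433       | 7.155             | 55.15        |
| 24   | 0.9555 | 7.206     | 0.4456       | 6.973             | 55.74        |
| 25   | 0.9543 | 5.359     | 0.9809       | 7.835             | 55.49        |
| 26   | 0.9554 | 7.083     | 0.4153       | 8.075             | 55.05        |
| 27   | 0.9554 | 6.963     | 0.5163       | 8.687             | 56.26        |
| 28   | 0.9543 | 14.52     | 0.7094       | 16.4              | 56.91        |
| 29   | 0.9515 | 12.07     | 0.7272       | 19.06             | 56.5         |
| 30   | 0.9512 | 14.3      | 0.524        | 14.43             | 59.32        |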

The best parameters are:

{'target': 0.9554574460534715, 
'params':{
	'max_depth': 7.2061957920136965, 
	'max_features': 0.44564993926538743, 
	'min_samples_split': 6.972807143834928, 
	'n_estimators': 55.73671041246315
}}
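
The optimizer searches a continuous space, so the reported values are floats even for integer hyperparameters. To reuse them, apply the same casts that rf_cv applies. The sketch below is an illustration under that assumption; to_rf_kwargs is a hypothetical helper, and the final fit is left as a comment because it needs the x and y defined earlier.

```python
# The best result as printed above.
best = {'target': 0.9554574460534715,
        'params': {'max_depth': 7.2061957920136965,
                   'max_features': 0.44564993926538743,
                   'min_samples_split': 6.972807143834928,
                   'n_estimators': 55.73671041246315}}


def to_rf_kwargs(params):
    """Hypothetical helper: mirror the casts used inside rf_cv."""
    return {
        'n_estimators': int(params['n_estimators']),
        'min_samples_split': int(params['min_samples_split']),
        'max_features': min(params['max_features'], 0.999),
        'max_depth': int(params['max_depth']),
        'random_state': 2,
    }


rf_kwargs = to_rf_kwargs(best['params'])
print(rf_kwargs)
# {'n_estimators': 55, 'min_samples_split': 6, 'max_features': 0.44564993926538743,
#  'max_depth': 7, 'random_state': 2}

# With x and y from the earlier steps:
# model = RandomForestClassifier(**rf_kwargs).fit(x, y)
```

Note that int() truncates rather than rounds, so 6.97 becomes 6. Truncating here matches the model that was actually scored during the search.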

Source: https://blog.csdn.net/weixin_35757704/article/details/118416689