Why is AdaBoost with 1 estimator faster than a simple decision tree?

Author: Internet

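I want to compare AdaBoost and decision trees. As a proof of principle, I set the number of estimators in AdaBoost to 1, leaving the decision tree classifier as its default base estimator, and expected the same result as a simple decision tree.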

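I did get the same accuracy when predicting the test labels. However, the fitting time is much shorter for AdaBoost, while the testing time is a bit longer. AdaBoost seems to use the same default settings as DecisionTreeClassifier; otherwise the accuracy would not be exactly the same.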

Can anyone explain this?

from time import time
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

print("creating classifier")
clf = AdaBoostClassifier(n_estimators = 1)
clf2 = DecisionTreeClassifier()

print("starting to fit")

time0 = time()
clf.fit(features_train,labels_train) #fit adaboost
fitting_time = time() - time0
print("time for fitting adaboost was", fitting_time)

time0 = time()
clf2.fit(features_train,labels_train) #fit dtree
fitting_time = time() - time0
print("time for fitting dtree was", fitting_time)

time1 = time()
pred = clf.predict(features_test) #test adaboost
test_time = time() - time1
print("time for testing adaboost was", test_time)

time1 = time()
pred = clf2.predict(features_test) #test dtree; this overwrites the adaboost predictions held in pred
test_time = time() - time1
print("time for testing dtree was", test_time)

accuracy_ada = accuracy_score(pred, labels_test) #acc ada; pred holds the dtree predictions at this point
print("accuracy for adaboost is", accuracy_ada)

accuracy_dt = accuracy_score(pred, labels_test) #acc dtree
print("accuracy for dtree is", accuracy_dt)

Output

('time for fitting adaboost was', 3.8290421962738037)
('time for fitting dtree was', 85.19442415237427)
('time for testing adaboost was', 0.1834099292755127)
('time for testing dtree was', 0.056527137756347656)
('accuracy for adaboost is', 0.99089874857792948)
('accuracy for dtree is', 0.99089874857792948)

Solution:

I tried to repeat your experiment in IPython, but did not see much of a difference:

from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
import numpy as np
x = np.random.randn(3785, 16000)  # synthetic data: 3785 samples, 16000 features
y = (x[:, 0] > 0.).astype(float)  # label depends only on the sign of the first feature (np.float was removed in NumPy 1.24)
clf = AdaBoostClassifier(n_estimators = 1)
clf2 = DecisionTreeClassifier()
%timeit clf.fit(x,y)
1 loop, best of 3: 5.56 s per loop
%timeit clf2.fit(x,y)
1 loop, best of 3: 5.51 s per loop
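
One possible reason the timings match on this synthetic data: as documented in scikit-learn, the default base estimator of AdaBoostClassifier is a DecisionTreeClassifier with max_depth=1 (a stump), whereas a standalone DecisionTreeClassifier keeps splitting until its leaves are pure. When the label depends on a single threshold, as y does here, both should end up at depth 1. The sketch below compares the fitted depths on much smaller arrays. The array sizes, the seed, the two-threshold labels and the helper name fitted_depths are assumptions made for illustration, not part of the original experiment.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)        # arbitrary fixed seed (assumption)
X = rng.standard_normal((400, 20))    # far smaller than the arrays above

# Case 1: the label depends on one threshold, as in the experiment above.
y_simple = (X[:, 0] > 0).astype(int)
# Case 2 (hypothetical): the label needs two thresholds, so one split cannot separate it.
y_harder = ((X[:, 0] > 0) & (X[:, 1] > 0)).astype(int)


def fitted_depths(X, y):
    """Return (depth of AdaBoost's single tree, depth of the standalone tree)."""
    ada = AdaBoostClassifier(n_estimators=1).fit(X, y)
    tree = DecisionTreeClassifier().fit(X, y)
    return ada.estimators_[0].get_depth(), tree.get_depth()


simple_depths = fitted_depths(X, y_simple)
harder_depths = fitted_depths(X, y_harder)
print("single-threshold labels (adaboost, dtree):", simple_depths)
print("two-threshold labels    (adaboost, dtree):", harder_depths)
```

If the standalone tree is deeper than the stump on your own data, the two models are not the same, and different fitting times would be expected.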

Try using a profiler, or repeat the experiment first.
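
Both suggestions can be sketched outside IPython with the standard library. This is a minimal illustration, assuming small random data in place of the question's features_train and labels_train. The helper fit_times, the array sizes and the seed are hypothetical choices. It refits each classifier several times with time.perf_counter and then profiles a single AdaBoost fit with cProfile.

```python
import cProfile
import io
import pstats
from time import perf_counter

import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)                 # arbitrary fixed seed (assumption)
X = rng.standard_normal((500, 50))             # stand-in for features_train
y = (X[:, 0] + X[:, 1] > 0).astype(int)        # stand-in for labels_train


def fit_times(make_clf, X, y, repeats=5):
    """Fit a fresh classifier `repeats` times and return each wall-clock time."""
    times = []
    for _ in range(repeats):
        clf = make_clf()
        start = perf_counter()
        clf.fit(X, y)
        times.append(perf_counter() - start)
    return times


ada_times = fit_times(lambda: AdaBoostClassifier(n_estimators=1), X, y)
tree_times = fit_times(DecisionTreeClassifier, X, y)
print("adaboost fit, min of %d runs: %.4f s" % (len(ada_times), min(ada_times)))
print("dtree fit,    min of %d runs: %.4f s" % (len(tree_times), min(tree_times)))

# Profile one AdaBoost fit and keep the 10 most expensive calls by cumulative time.
profiler = cProfile.Profile()
profiler.enable()
AdaBoostClassifier(n_estimators=1).fit(X, y)
profiler.disable()
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(10)
profile_text = report.getvalue()
print(profile_text)
```

Taking the minimum over several runs reduces the influence of one-off effects such as other processes competing for the CPU during a single measurement.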

Tags: decision-tree, adaboost, scikit-learn, machine-learning, python
Source: https://codeday.me/bug/20191112/2024386.html