python – Scikit-learn's LabelBinarizer vs. OneHotEncoder
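What is the difference between the two? Both seem to create new columns, one for each unique category in the feature, and then assign 0s and 1s to each data point according to its category.
Answer: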
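A simple example that encodes an array with LabelEncoder, OneHotEncoder and LabelBinarizer is shown below.
As far as I can see, OneHotEncoder first needs the data in integer-encoded form before it can convert it to the one-hot encoding, whereas LabelBinarizer does not. (This applied to scikit-learn versions before 0.20; later versions of OneHotEncoder also accept string categories directly.)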
from numpy import array
from sklearn.preprocessing import LabelBinarizer
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
# define example
data = ['cold', 'cold', 'warm', 'cold', 'hot', 'hot', 'warm', 'cold',
        'warm', 'hot']
values = array(data)
print("Data: ", values)
# integer encode: categories are sorted, so cold=0, hot=1, warm=2
label_encoder = LabelEncoder()
integer_encoded = label_encoder.fit_transform(values)
print("Label Encoder:", integer_encoded)
# one-hot encode
# (scikit-learn >= 1.2 uses sparse_output=False; older versions use sparse=False)
onehot_encoder = OneHotEncoder(sparse_output=False)
# OneHotEncoder expects a 2-D array of shape (n_samples, n_features)
integer_encoded = integer_encoded.reshape(len(integer_encoded), 1)
onehot_encoded = onehot_encoder.fit_transform(integer_encoded)
print("OneHot Encoder:", onehot_encoded)
# binary encode: LabelBinarizer takes the 1-D string labels directly
lb = LabelBinarizer()
print("Label Binarizer:", lb.fit_transform(values))
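For comparison, here is a minimal sketch of the same encoding without the intermediate LabelEncoder step. It assumes scikit-learn 0.20 or later, where OneHotEncoder accepts string categories directly. The variable names (`onehot`, `binarized`, `same`) are mine, and `.toarray()` is used instead of a dense-output flag so the snippet does not depend on the name of that flag.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer, OneHotEncoder

data = np.array(['cold', 'cold', 'warm', 'cold', 'hot',
                 'hot', 'warm', 'cold', 'warm', 'hot'])

# OneHotEncoder works on features: it expects a 2-D array
# of shape (n_samples, n_features), hence the reshape.
onehot = OneHotEncoder().fit_transform(data.reshape(-1, 1)).toarray()

# LabelBinarizer works on labels: it takes the 1-D array as is.
binarized = LabelBinarizer().fit_transform(data)

print(onehot)
print(binarized)

# Both sort the categories (cold, hot, warm), so the 0/1 patterns match;
# only the dtype differs (float for OneHotEncoder, int for LabelBinarizer).
same = np.array_equal(onehot, binarized)
print("Same 0/1 pattern:", same)
```

For a single column with three or more categories, the two outputs therefore carry the same information.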
Another good link that explains OneHotEncoder is: Explain onehotencoder using python
There may be other valid differences between the two that experts can explain.
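Two further differences that I am fairly confident about can be sketched as follows: with only two classes LabelBinarizer returns a single 0/1 column while OneHotEncoder still returns one column per category, and OneHotEncoder can encode several feature columns at once while LabelBinarizer is meant for a single label vector. The toy arrays `labels` and `features` below are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer, OneHotEncoder

# Two classes only: LabelBinarizer collapses them into one column.
labels = np.array(['no', 'yes', 'yes', 'no'])
lb_out = LabelBinarizer().fit_transform(labels)
ohe_out = OneHotEncoder().fit_transform(labels.reshape(-1, 1)).toarray()
print("LabelBinarizer shape:", lb_out.shape)
print("OneHotEncoder shape:", ohe_out.shape)

# Several categorical feature columns: OneHotEncoder encodes each column
# separately and concatenates the results (3 + 2 = 5 output columns here).
features = np.array([['cold', 'red'],
                     ['hot', 'blue'],
                     ['warm', 'red']])
multi = OneHotEncoder().fit_transform(features).toarray()
print("Multi-column shape:", multi.shape)
```

In short, LabelBinarizer looks designed for the target y, and OneHotEncoder for the feature matrix X.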
Tags: categorical-data, python, scikit-learn, data-science, encoding. Source: https://codeday.me/bug/20191004/1852017.html