python – Scikit-learn's LabelBinarizer vs. OneHotEncoder

Author: Internet

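What is the difference between the two? Both seem to create new columns, one for each unique category in the feature, and then assign 0s and 1s to each data point according to its category.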

Answer:

A simple example that encodes an array with LabelEncoder, OneHotEncoder, and LabelBinarizer is shown below.

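From what I can see, OneHotEncoder first needs the data in integer-encoded form before it can convert it to the one-hot encoding, whereas LabelBinarizer does not need that step. (This applies to scikit-learn releases before 0.20; later releases of OneHotEncoder also accept string categories directly.)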

from numpy import array
from sklearn.preprocessing import LabelBinarizer, LabelEncoder, OneHotEncoder

# define example
data = ['cold', 'cold', 'warm', 'cold', 'hot', 'hot', 'warm', 'cold',
        'warm', 'hot']
values = array(data)
print("Data:", values)

# integer encode: classes are sorted, so cold=0, hot=1, warm=2
label_encoder = LabelEncoder()
integer_encoded = label_encoder.fit_transform(values)
print("Label Encoder:", integer_encoded)

# one-hot encode: OneHotEncoder expects a 2D array of shape (n_samples, n_features)
onehot_encoder = OneHotEncoder()
integer_encoded = integer_encoded.reshape(len(integer_encoded), 1)
# the result is sparse by default; toarray() makes it dense for printing
onehot_encoded = onehot_encoder.fit_transform(integer_encoded).toarray()
print("OneHot Encoder:", onehot_encoded)

# binary encode: LabelBinarizer works directly on the 1D string labels
lb = LabelBinarizer()
print("Label Binarizer:", lb.fit_transform(values))

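As a rough sketch of how close the two outputs are, the snippet below feeds the same string array to both encoders and compares the results. It assumes scikit-learn 0.20 or later, where OneHotEncoder accepts string categories without a separate LabelEncoder step; the variable names are only illustrative.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer, OneHotEncoder

values = np.array(['cold', 'cold', 'warm', 'cold', 'hot',
                   'hot', 'warm', 'cold', 'warm', 'hot'])

# OneHotEncoder is aimed at feature matrices, so it needs a 2D input
# of shape (n_samples, n_features); the default output is sparse.
onehot = OneHotEncoder().fit_transform(values.reshape(-1, 1)).toarray()

# LabelBinarizer is aimed at target labels, so it takes the 1D array as is.
binarized = LabelBinarizer().fit_transform(values)

print(onehot.shape, binarized.shape)
print(np.array_equal(onehot, binarized))
```

With three categories both encoders should produce the same 10 x 3 indicator matrix (columns in sorted order: cold, hot, warm), differing only in dtype and in the input shape they expect.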

Another good link that explains OneHotEncoder is: Explain onehotencoder using python

There may be other valid differences between the two that experts could explain.
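One difference that can be checked directly is the two-class case. The sketch below is only an illustration with a made-up label array, and it again assumes scikit-learn 0.20 or later so that OneHotEncoder can take strings: LabelBinarizer collapses two classes into a single 0/1 column, while OneHotEncoder (with default settings) still emits one column per category.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer, OneHotEncoder

# Hypothetical two-class example
labels = np.array(['cold', 'hot', 'hot', 'cold'])

# LabelBinarizer: two classes give a single column (cold=0, hot=1)
lb = LabelBinarizer()
lb_out = lb.fit_transform(labels)

# OneHotEncoder: still one column per category, so two columns here
ohe = OneHotEncoder()
ohe_out = ohe.fit_transform(labels.reshape(-1, 1)).toarray()

print(lb_out.shape, ohe_out.shape)

# Both encoders can map the encoded values back to the original labels
print(lb.inverse_transform(lb_out))
print(ohe.inverse_transform(ohe_out).ravel())
```

This fits the intended use of each class as I understand it: LabelBinarizer is for a single target column, OneHotEncoder for one or more feature columns.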

Tags: categorical-data, python, scikit-learn, data-science, encoding
Source: https://codeday.me/bug/20191004/1852017.html