python - Unstable results from scipy.cluster.kmeans
The following code gives a different result on each run when it clusters the data into 3 groups with the k-means method:
from numpy import array
from scipy.cluster.vq import kmeans,vq
data = array([1,1,1,1,1,1,3,3,3,3,3,3,7,7,7,7,7,7])
# iter=100 means: run k-means 100 times and keep the codebook with the lowest distortion
centroids = kmeans(data,3,100)
print(centroids)
Three of the possible results are:
(array([1, 3, 7]), 0.0)
(array([3, 7, 1]), 0.0)
(array([7, 3, 1]), 0.0)
In practice, only the order of the computed centroids differs. But doesn't that make the assignment of points to clusters unstable? Any ideas?
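Solution:
That's because if you pass an integer as the k_or_guess parameter, the k initial centroids are chosen at random from the set of input observations (this is known as the Forgy method).
From the docs: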
k_or_guess : int or ndarray
The number of centroids to generate. A
code is assigned to each centroid, which is also the row index of the
centroid in the code_book matrix generated. The initial k centroids
are chosen by randomly selecting observations from the observation
matrix. Alternatively, passing a k by N array specifies the initial k
centroids.
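The random selection described in the docs can be sketched in plain NumPy. This is only an illustration under my own assumptions, not SciPy's internal code: forgy_init is a hypothetical helper name, and the data is cast to float for convenience. It shows that the starting centroids are just k observations drawn at random, so their order (and even their values) can differ from run to run unless the random generator is seeded.

```python
import numpy as np

# Same values as in the question, cast to float (an assumption for this sketch).
data = np.array([1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 7, 7, 7, 7, 7, 7], dtype=float)

def forgy_init(obs, k, rng):
    """Hypothetical helper: pick k distinct observations as initial centroids."""
    idx = rng.choice(obs.shape[0], size=k, replace=False)  # k distinct row indices
    return obs[idx]

# Unseeded generator: the starting centroids change between runs.
print(forgy_init(data, 3, np.random.default_rng()))

# Seeded generators: the same seed reproduces the same starting centroids.
a = forgy_init(data, 3, np.random.default_rng(0))
b = forgy_init(data, 3, np.random.default_rng(0))
print(a, b)
```

Note that the indices are distinct but the values need not be, so a draw such as three observations from the same group is possible.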
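Try passing an initial guess instead:
import numpy as np
# the iter argument (100) is ignored when initial centroids are given as an array
kmeans(data,np.array([1,3,7]),100)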
# (array([1, 3, 7]), 0.0)
# (array([1, 3, 7]), 0.0)
# (array([1, 3, 7]), 0.0)
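If no sensible guess is available, another option is to sort the returned centroids and only then assign labels with vq, so the label numbers no longer depend on the random starting order. The following is a minimal sketch, not part of the original answer: sorted_codebook_labels is a hypothetical helper, and the data is reshaped to a float array with one column, which is the (observations, features) layout the docs describe.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

# Same values as in the question, as a float array of shape (n_obs, 1).
data = np.array([1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 7, 7, 7, 7, 7, 7],
                dtype=float).reshape(-1, 1)

def sorted_codebook_labels(obs, k, n_runs=100):
    """Hypothetical helper: run kmeans, sort centroids, then label the points."""
    # Best codebook out of n_runs random restarts (lowest distortion wins).
    codebook, _ = kmeans(obs, k, iter=n_runs)
    # Put the centroids in ascending order of their first (here: only) feature.
    codebook = codebook[np.argsort(codebook[:, 0])]
    # Assign each observation the index of its nearest sorted centroid.
    labels, _ = vq(obs, codebook)
    return codebook, labels

codebook, labels = sorted_codebook_labels(data, 3)
print(codebook.ravel())
print(labels)
```

With the centroids sorted, label 0 always refers to the smallest centroid, label 1 to the next, and so on. Sorting by the first feature is a simple choice for this one-dimensional data; multi-dimensional data would need some other agreed ordering.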
Tags: python, numpy, scipy, k-means Source: https://codeday.me/bug/20190825/1718346.html