python - Unstable results from scipy.cluster.kmeans

Author: Internet

The following code gives a different result on every run when it uses the k-means method to cluster the data into 3 groups:

from numpy import array
from scipy.cluster.vq import kmeans,vq

data = array([1,1,1,1,1,1,3,3,3,3,3,3,7,7,7,7,7,7])
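# The third argument (iter) is the number of times k-means is run;
# the codebook with the lowest distortion is returned.
centroids = kmeans(data, 3, 100)
print(centroids)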

Three of the possible results are:

(array([1, 3, 7]), 0.0)
(array([3, 7, 1]), 0.0)
(array([7, 3, 1]), 0.0)

In effect, only the order of the computed centroids differs. But doesn't that make the assignment of points to clusters unstable? Any ideas?

Solution:

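That is because, when you pass an integer as the k_or_guess parameter, the k initial centroids are chosen at random from the set of input observations (this is known as the Forgy method).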

From the docs:

k_or_guess : int or ndarray

The number of centroids to generate. A
code is assigned to each centroid, which is also the row index of the
centroid in the code_book matrix generated.

The initial k centroids
are chosen by randomly selecting observations from the observation
matrix. Alternatively, passing a k by N array specifies the initial k
centroids.
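
Since the starting centroids are drawn at random, one way to make the order repeatable is to fix the random seed. The sketch below assumes SciPy 1.7 or later, where kmeans accepts a seed keyword; the data is cast to float as a precaution, and the names first, second and same_order are only illustrative.

```python
import numpy as np
from scipy.cluster.vq import kmeans

# Same observations as in the question, cast to float as a precaution.
data = np.array([1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 7, 7, 7, 7, 7, 7],
                dtype=float)

# Assumption: SciPy >= 1.7, where kmeans takes a `seed` keyword.
# Using the same seed makes both calls draw the same initial centroids.
first, _ = kmeans(data, 3, iter=100, seed=0)
second, _ = kmeans(data, 3, iter=100, seed=0)

# True when both runs return the centroids in the same order.
same_order = np.array_equal(first, second)
print(first, second, same_order)
```

With a fixed seed the order is repeatable across runs, but it is still whatever order the random draw happens to produce, not necessarily ascending.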

Try passing it a guess instead:

import numpy as np
kmeans(data, np.array([1, 3, 7]), 100)

# (array([1, 3, 7]), 0.0)
# (array([1, 3, 7]), 0.0)
# (array([1, 3, 7]), 0.0)
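
As for whether cluster membership itself is unstable: in the three outputs from the question the grouping of points is the same, and only the integer label attached to each group changes with the centroid order. A minimal sketch of one way to get order-independent labels is to sort the codebook before assigning points with vq. The helper stable_labels is hypothetical (it is not part of SciPy) and assumes 1-D data as in the question.

```python
import numpy as np
from scipy.cluster.vq import vq

# Same observations as in the question, cast to float as a precaution.
data = np.array([1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 7, 7, 7, 7, 7, 7],
                dtype=float)


def stable_labels(obs, code_book):
    """Hypothetical helper: sort a 1-D codebook, then label each observation."""
    ordered = np.sort(code_book)      # ascending centroid order
    labels, _ = vq(obs, ordered)      # index of the nearest centroid per point
    return ordered, labels


# The three centroid orderings reported in the question.
orderings = ([1, 3, 7], [3, 7, 1], [7, 3, 1])
results = [stable_labels(data, np.array(cb, dtype=float)) for cb in orderings]

for ordered, labels in results:
    print(ordered, labels)
```

Every ordering yields the same sorted codebook and the same label for each point, so downstream code can rely on label 0 meaning the smallest centroid.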

Tags: python, numpy, scipy, k-means
Source: https://codeday.me/bug/20190825/1718346.html