
python - Permuting rows in the "slices" of a 3d array to match each other

2019-07-06 06:57:16 Author: Internet

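I have a series of 2d arrays in which the rows are points in some space. All the arrays contain many similar points, but in a different row order. I want to sort the rows so that their orders are as similar as possible. The points are too different for clustering with K-means or DBSCAN to work. The problem can also be posed like this: if I stack the arrays into a 3d array, how do I permute the rows to minimize the average standard deviation (SD) along the second axis? Is there a good sorting algorithm for this problem?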
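To make the objective concrete, here is a minimal sketch of the quantity being minimized, assuming the arrays are stacked into an N x M x K array and "second axis" means `axis=2`, as in the code further down. The helper name `mean_sd` and the sample points are made up for illustration.

```python
import numpy as np


def mean_sd(stack):
    """Mean standard deviation across the slices (axis 2) of an N x M x K array."""
    return np.mean(np.std(stack, axis=2))


# Two hypothetical 3 x 2 slices holding the same points in a different row order.
slice_a = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
slice_b = slice_a[[2, 0, 1]]

misaligned = np.stack([slice_a, slice_b], axis=2)
aligned = np.stack([slice_a, slice_a], axis=2)

print(mean_sd(misaligned))  # greater than zero: rows do not correspond
print(mean_sd(aligned))     # 0.0: rows correspond exactly
```

A good row permutation for each slice is one that pushes this value as low as possible.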

I have tried the following approaches.

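1. Create a reference 2d array and sort the rows in each array to minimize the mean Euclidean distance to the reference 2d array. I'm afraid this gives biased results.
2. Sort the rows of the arrays pairwise, then pairwise on the medians of those pairs, then pairwise on those, and so on. This doesn't really work, and I'm not sure why.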

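A third approach could be plain brute-force optimization, but I'm trying to avoid that because I have multiple sets of arrays to run the procedure on.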
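For reference, a brute-force match of one array against another might look like the sketch below. It is only an illustration of why this approach scales badly, not code from the question: the function name `brute_force_order` is hypothetical, and the cost used here (sum of row-wise Euclidean distances) is an assumption.

```python
import itertools

import numpy as np


def brute_force_order(A, B):
    """Try every permutation of A's rows and keep the one closest to B."""
    n = A.shape[0]
    best_order, best_cost = None, np.inf
    for order in itertools.permutations(range(n)):
        # Sum of Euclidean distances between matched rows.
        cost = np.linalg.norm(A[list(order)] - B, axis=1).sum()
        if cost < best_cost:
            best_order, best_cost = list(order), cost
    return best_order, best_cost


B = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
A = B[[2, 0, 1]]  # same points, shuffled rows

order, cost = brute_force_order(A, B)
print(order, cost)  # A[order] reproduces B
```

The loop visits all N! permutations, so already at N = 10 it evaluates 3,628,800 candidates for a single pair of arrays.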

Here is the code for my second approach (Python):

import numpy as np


def reorder_to(A, B):
    """Reorder rows in A to best match rows in B.

    Input
    -----
    A : N x M numpy.array
    B : N x M numpy.array

    Output
    ------
    perm_order : permutation order
    """

    if A.shape != B.shape:
        print("A and B must have the same shape")
        return None

    N = A.shape[0]

    # Create a distance matrix of distance between rows in A and B
    distance_matrix = np.ones((N, N))*np.inf
    for i, a in enumerate(A):
        for ii, b in enumerate(B):
            ba = (b-a)
            distance_matrix[i, ii] = np.sqrt(np.dot(ba, ba))

    # Choose permutation order by smallest distances first
    perm_order = [[] for _ in range(N)]
    for _ in range(N):
        ind = np.argmin(distance_matrix)
        i, ii = ind // N, ind % N  # integer division so i is a valid index
        perm_order[ii] = i
        distance_matrix[i, :] = np.inf
        distance_matrix[:, ii] = np.inf

    return perm_order


def permute_tensor_rows(A):
    """Permute 1d rows in 3d array along the 0th axis to minimize average SD along 2nd axis.

    Input
    -----
    A : numpy.3darray
        Each "slice" in the 2nd direction is an independent array whose rows can be permuted
        to decrease the average SD in the 2nd direction.

    Output
    ------
    A : numpy.3darray
        A with sorted rows in each "slice".
    """
    step = 2
    while step <= A.shape[2]:
        for k in range(0, A.shape[2], step):

            # If last, reorder to previous
            if k + step > A.shape[2]:
                A_kk = A[:, :, k:(k+step)]
                kk_order = reorder_to(np.median(A_kk, axis=2), np.median(A_k, axis=2))
                A[:, :, k:(k+step)] = A[kk_order, :, k:(k+step)]
                continue

            k_0, k_1 = k, k + step // 2
            kk_0, kk_1 = k + step // 2, k + step

            A_k = A[:, :, k_0:k_1]
            A_kk = A[:, :, kk_0:kk_1]

            order = reorder_to(np.median(A_k, axis=2), np.median(A_kk, axis=2))
            A[:, :, k_0:k_1] = A[order, :, k_0:k_1]

        print("Step:", step, "\t ... Average SD:", np.mean(np.std(A, axis=2)))
        step *= 2

    return A

Solution:

Sorry, I should have looked at your code sample; it is very informative.

This looks like an out-of-the-box solution for your problem:

http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.linear_sum_assignment.html#scipy.optimize.linear_sum_assignment

In my experience, it is only really feasible up to about 100 points.
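As a rough sketch of how this could replace the greedy matching in `reorder_to`, the snippet below builds the pairwise distance matrix with `scipy.spatial.distance.cdist` and hands it to `scipy.optimize.linear_sum_assignment`. The function name `match_rows` and the sample data are made up for illustration, and using Euclidean distance as the cost is an assumption carried over from the question.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def match_rows(A, B):
    """Return an order such that A[order] best matches B row by row."""
    # cost[i, j] is the Euclidean distance between B[i] and A[j].
    cost = cdist(B, A)
    # Optimal one-to-one assignment minimizing the total distance.
    row_ind, col_ind = linear_sum_assignment(cost)
    return col_ind, cost[row_ind, col_ind].sum()


B = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [9.0, 2.0]])
A = B[[3, 0, 2, 1]] + 0.01  # shuffled rows with a small offset

order, total = match_rows(A, B)
print(order)  # A[order] lines up with B
```

Unlike picking the smallest distances first, this minimizes the total distance over the whole assignment, so one early greedy choice cannot force poor matches for the remaining rows.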

Tags: python, arrays, sorting, cluster-analysis, numpy
Source: https://codeday.me/bug/20190706/1394917.html