首页 > 编程语言> > Python语言编程学习：sklearn.manifold的TSNE函数的简介、使用方法、代码实现之详细攻略

Python语言编程学习：sklearn.manifold的TSNE函数的简介、使用方法、代码实现之详细攻略

2020-12-18 23:29:47 作者：互联网

TSNE简介

t-分布随机邻居嵌入。t-SNE是一个可视化高维数据的工具。它将数据点之间的相似性转化为联合概率，并试图最小化低维嵌入和高维数据联合概率之间的Kullback-Leibler差异。t-SNE有一个非凸的代价函数，即通过不同的初始化，我们可以得到不同的结果。强烈建议使用另一种降维方法(如密集数据的PCA或稀疏数据的集群svd)来减少维数到一个合理的数量(如50)，如果特征的数量非常高。这将抑制一些噪声，加快样本间成对距离的计算。想要了解更多的技巧，请参阅劳伦斯·范·德·马顿的FAQ[2]。

TSNE使用方法

from sklearn.manifold import TSNE
visual_model = TSNE(metric='precomputed', perplexity=10)  # t分布随机邻接嵌入
visual = visual_model.fit_transform(dis)

TSNE代码实现

class TSNE Found at: sklearn.manifold._t_sne

class TSNE(BaseEstimator):
    """t-distributed Stochastic Neighbor Embedding.
    
    t-SNE [1] is a tool to visualize high-dimensional data. It converts
    similarities between data points to joint probabilities and tries
    to minimize the Kullback-Leibler divergence between the joint
    probabilities of the low-dimensional embedding and the
    high-dimensional data. t-SNE has a cost function that is not convex,
    i.e. with different initializations we can get different results.
    
    It is highly recommended to use another dimensionality reduction
    method (e.g. PCA for dense data or TruncatedSVD for sparse data)
    to reduce the number of dimensions to a reasonable amount (e.g. 
     50)
    if the number of features is very high. This will suppress some
    noise and speed up the computation of pairwise distances 
     between
    samples. For more tips see Laurens van der Maaten's FAQ [2].
    
    Read more in the :ref:`User Guide <t_sne>`.
    
    Parameters
    ----------
    n_components : int, optional (default: 2)
    Dimension of the embedded space.
    
    perplexity : float, optional (default: 30)
    The perplexity is related to the number of nearest neighbors that
    is used in other manifold learning algorithms. Larger datasets
    usually require a larger perplexity. Consider selecting a value
    between 5 and 50. Different values can result in significanlty
    different results.
    
    early_exaggeration : float, optional (default: 12.0)
    Controls how tight natural clusters in the original space are in
    the embedded space and how much space will be between them. 
     For
    larger values, the space between natural clusters will be larger
    in the embedded space. Again, the choice of this parameter is not
    very critical. If the cost function increases during initial
    optimization, the early exaggeration factor or the learning rate
    might be too high.
    
    learning_rate : float, optional (default: 200.0)
    The learning rate for t-SNE is usually in the range [10.0, 1000.0]. If
    the learning rate is too high, the data may look like a 'ball' with any
    point approximately equidistant from its nearest neighbours. If the
    learning rate is too low, most points may look compressed in a 
     dense
    cloud with few outliers. If the cost function gets stuck in a bad local
    minimum increasing the learning rate may help.
    
    n_iter : int, optional (default: 1000)
    Maximum number of iterations for the optimization. Should be at
    least 250.
    
    n_iter_without_progress : int, optional (default: 300)
    Maximum number of iterations without progress before we abort 
     the
    optimization, used after 250 initial iterations with early
    exaggeration. Note that progress is only checked every 50 
     iterations so
    this value is rounded to the next multiple of 50.
    
    .. versionadded:: 0.17
    parameter *n_iter_without_progress* to control stopping criteria.
    
    min_grad_norm : float, optional (default: 1e-7)
    If the gradient norm is below this threshold, the optimization will
    be stopped.
    
    metric : string or callable, optional
    The metric to use when calculating distance between instances in a
    feature array. If metric is a string, it must be one of the options
    allowed by scipy.spatial.distance.pdist for its metric parameter, or
    a metric listed in pairwise.PAIRWISE_DISTANCE_FUNCTIONS.
    If metric is "precomputed", X is assumed to be a distance matrix.
    Alternatively, if metric is a callable function, it is called on each
    pair of instances (rows) and the resulting value recorded. The 
     callable
    should take two arrays from X as input and return a value indicating
    the distance between them. The default is "euclidean" which is
    interpreted as squared euclidean distance.
    
    init : string or numpy array, optional (default: "random")
    Initialization of embedding. Possible options are 'random', 'pca',
    and a numpy array of shape (n_samples, n_components).
    PCA initialization cannot be used with precomputed distances and 
     is
    usually more globally stable than random initialization.
    
    verbose : int, optional (default: 0)
    Verbosity level.
    
    random_state : int, RandomState instance, default=None
    Determines the random number generator. Pass an int for 
     reproducible
    results across multiple function calls. Note that different
    initializations might result in different local minima of the cost
    function. See :term: `Glossary <random_state>`.
    
    method : string (default: 'barnes_hut')
    By default the gradient calculation algorithm uses Barnes-Hut
    approximation running in O(NlogN) time. method='exact'
    will run on the slower, but exact, algorithm in O(N^2) time. The
    exact algorithm should be used when nearest-neighbor errors 
     need
    to be better than 3%. However, the exact method cannot scale to
    millions of examples.
    
    .. versionadded:: 0.17
    Approximate optimization *method* via the Barnes-Hut.
    
    angle : float (default: 0.5)
    Only used if method='barnes_hut'
    This is the trade-off between speed and accuracy for Barnes-Hut T-
     SNE.
    'angle' is the angular size (referred to as theta in [3]) of a distant
    node as measured from a point. If this size is below 'angle' then it is
    used as a summary node of all points contained within it.
    This method is not very sensitive to changes in this parameter
    in the range of 0.2 - 0.8. Angle less than 0.2 has quickly increasing
    computation time and angle greater 0.8 has quickly increasing 
     error.
    
    n_jobs : int or None, optional (default=None)
    The number of parallel jobs to run for neighbors search. This 
     parameter
    has no impact when ``metric="precomputed"`` or
    (``metric="euclidean"`` and ``method="exact"``).
    ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
    ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
    for more details.
    
    .. versionadded:: 0.22
    
    Attributes
    ----------
    embedding_ : array-like, shape (n_samples, n_components)
    Stores the embedding vectors.
    
    kl_divergence_ : float
    Kullback-Leibler divergence after optimization.
    
    n_iter_ : int
    Number of iterations run.
    
    Examples
    --------
    
    >>> import numpy as np
    >>> from sklearn.manifold import TSNE
    >>> X = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
    >>> X_embedded = TSNE(n_components=2).fit_transform(X)
    >>> X_embedded.shape
    (4, 2)
    
    References
    ----------
    
    [1] van der Maaten, L.J.P.; Hinton, G.E. Visualizing High-
     Dimensional Data
    Using t-SNE. Journal of Machine Learning Research 9:2579-2605, 
     2008.
    
    [2] van der Maaten, L.J.P. t-Distributed Stochastic Neighbor 
     Embedding
    https://lvdmaaten.github.io/tsne/
    
    [3] L.J.P. van der Maaten. Accelerating t-SNE using Tree-Based 
     Algorithms.
    Journal of Machine Learning Research 15(Oct):3221-3245, 2014.
    https://lvdmaaten.github.io/publications/papers/JMLR_2014.pdf
    """
    # Control the number of exploration iterations with 
     early_exaggeration on
    _EXPLORATION_N_ITER = 250
    # Control the number of iterations between progress checks
    _N_ITER_CHECK = 50
    @_deprecate_positional_args
    def __init__(self, n_components=2, *, perplexity=30.0, 
        early_exaggeration=12.0, learning_rate=200.0, n_iter=1000, 
        n_iter_without_progress=300, min_grad_norm=1e-7, 
        metric="euclidean", init="random", verbose=0, 
        random_state=None, method='barnes_hut', angle=0.5, 
        n_jobs=None):
        self.n_components = n_components
        self.perplexity = perplexity
        self.early_exaggeration = early_exaggeration
        self.learning_rate = learning_rate
        self.n_iter = n_iter
        self.n_iter_without_progress = n_iter_without_progress
        self.min_grad_norm = min_grad_norm
        self.metric = metric
        self.init = init
        self.verbose = verbose
        self.random_state = random_state
        self.method = method
        self.angle = angle
        self.n_jobs = n_jobs
    
    def _fit(self, X, skip_num_points=0):
        """Private function to fit the model using X as training data."""
        if self.method not in ['barnes_hut', 'exact']:
            raise ValueError("'method' must be 'barnes_hut' or 'exact'")
        if self.angle < 0.0 or self.angle > 1.0:
            raise ValueError("'angle' must be between 0.0 - 1.0")
        if self.method == 'barnes_hut':
            X = self._validate_data(X, accept_sparse=['csr'], 
                ensure_min_samples=2, 
                dtype=[np.float32, np.float64])
        else:
            X = self._validate_data(X, accept_sparse=['csr', 'csc', 'coo'], 
                dtype=[np.float32, np.float64])
        if self.metric == "precomputed":
            if isinstance(self.init, str) and self.init == 'pca':
                raise ValueError("The parameter init=\"pca\" cannot be "
                    "used with metric=\"precomputed\".")
            if X.shape[0] != X.shape[1]:
                raise ValueError("X should be a square distance matrix")
            check_non_negative(X, "TSNE.fit(). With 
             metric='precomputed', X "
                "should contain positive distances.")
            if self.method == "exact" and issparse(X):
                raise TypeError('TSNE with method="exact" does not 
                 accept sparse '
                    'precomputed distance matrix. Use method="
                     barnes_hut" '
                    'or provide the dense distance matrix.')
        if self.method == 'barnes_hut' and self.n_components > 3:
            raise ValueError("'n_components' should be inferior to 4 for 
             the "
                "barnes_hut algorithm as it relies on "
                "quad-tree or oct-tree.")
        random_state = check_random_state(self.random_state)
        if self.early_exaggeration < 1.0:
            raise ValueError(
                "early_exaggeration must be at least 1, but is {}".format(self.
                 early_exaggeration))
        if self.n_iter < 250:
            raise ValueError("n_iter should be at least 250")
        n_samples = X.shape[0]
        neighbors_nn = None
        if self.method == "exact":
            # Retrieve the distance matrix, either using the precomputed 
             one or
            # computing it.
            if self.metric == "precomputed":
                distances = X
            else:
                if self.verbose:
                    print("[t-SNE] Computing pairwise distances...")
                if self.metric == "euclidean":
                    distances = pairwise_distances(X, metric=self.metric, 
                        squared=True)
                else:
                    distances = pairwise_distances(X, metric=self.metric, 
                        n_jobs=self.n_jobs)
                if np.any(distances < 0):
                    raise ValueError("All distances should be positive, the "
                        "metric given is not correct")
            # compute the joint probability distribution for the input 
             space
            P = _joint_probabilities(distances, self.perplexity, self.verbose)
            assert np.all(np.isfinite(P)), "All probabilities should be finite"
            assert np.all(P >= 0), "All probabilities should be non-
             negative"
            assert np.all(P <= 1), ("All probabilities should be less "
                "or then equal to one")
        else:
            n_neighbors = min(n_samples - 1, int(3. * self.perplexity + 1))
            if self.verbose:
                print("[t-SNE] Computing {} nearest neighbors...".format
                 (n_neighbors))
            # Find the nearest neighbors for every point
            knn = NearestNeighbors(algorithm='auto', n_jobs=self.
             n_jobs, 
                n_neighbors=n_neighbors, 
                metric=self.metric)
            t0 = time()
            knn.fit(X)
            duration = time() - t0
            if self.verbose:
                print("[t-SNE] Indexed {} samples in {:.3f}s...".format
                 (n_samples, duration))
            t0 = time()
            distances_nn = knn.kneighbors_graph(mode='distance')
            duration = time() - t0
            if self.verbose:
                print("[t-SNE] Computed neighbors for {} samples "
                    "in {:.3f}s...".
                    format(n_samples, duration)) # Free the memory used by 
                     the ball_tree
            del knn
            if self.metric == "euclidean":
            # knn return the euclidean distance but we need it squared
            # to be consistent with the 'exact' method. Note that the
            # the method was derived using the euclidean method as in 
             the
            # input space. Not sure of the implication of using a different
            # metric.
                distances_nn.data **= 2
            # compute the joint probability distribution for the input 
             space
            P = _joint_probabilities_nn(distances_nn, self.perplexity, self.
             verbose) # Compute the number of nearest neighbors to find.
            # LvdM uses 3 * perplexity as the number of neighbors.
            # In the event that we have very small # of points
            # set the neighbors to n - 1.
        if isinstance(self.init, np.ndarray):
            X_embedded = self.init
        elif self.init == 'pca':
            pca = PCA(n_components=self.n_components, 
             svd_solver='randomized', random_state=random_state)
            X_embedded = pca.fit_transform(X).astype(np.float32, 
             copy=False)
        elif self.init == 'random': # The embedding is initialized with iid 
         samples from Gaussians with
        # standard deviation 1e-4.
            X_embedded = 1e-4 * random_state.randn(n_samples, self.
             n_components).astype(np.float32)
        else:
            raise ValueError("'init' must be 'pca', 'random', or "
                "a numpy array")
        # Degrees of freedom of the Student's t-distribution. The 
         suggestion
        # degrees_of_freedom = n_components - 1 comes from
        # "Learning a Parametric Embedding by Preserving Local 
         Structure"
        # Laurens van der Maaten, 2009.
        degrees_of_freedom = max(self.n_components - 1, 1)
        return self._tsne(P, degrees_of_freedom, n_samples, 
            X_embedded=X_embedded, 
            neighbors=neighbors_nn, 
            skip_num_points=skip_num_points)
    
    def _tsne(self, P, degrees_of_freedom, n_samples, X_embedded, 
        neighbors=None, skip_num_points=0):
        """Runs t-SNE."""
        # t-SNE minimizes the Kullback-Leiber divergence of the 
         Gaussians P
        # and the Student's t-distributions Q. The optimization 
         algorithm that
        # we use is batch gradient descent with two stages:
        # * initial optimization with early exaggeration and momentum 
         at 0.5
        # * final optimization with momentum at 0.8
        params = X_embedded.ravel()
        opt_args = {
            "it":0, 
            "n_iter_check":self._N_ITER_CHECK, 
            "min_grad_norm":self.min_grad_norm, 
            "learning_rate":self.learning_rate, 
            "verbose":self.verbose, 
            "kwargs":dict(skip_num_points=skip_num_points), 
            "args":[P, degrees_of_freedom, n_samples, self.
             n_components], 
            "n_iter_without_progress":self._EXPLORATION_N_ITER, 
            "n_iter":self._EXPLORATION_N_ITER, 
            "momentum":0.5}
        if self.method == 'barnes_hut':
            obj_func = _kl_divergence_bh
            opt_args['kwargs']['angle'] = self.angle
            # Repeat verbose argument for _kl_divergence_bh
            opt_args['kwargs']['verbose'] = self.verbose # Get the number 
             of threads for gradient computation here to
            # avoid recomputing it at each iteration.
            opt_args['kwargs']['num_threads'] = 
             _openmp_effective_n_threads()
        else:
            obj_func = _kl_divergence
        # Learning schedule (part 1): do 250 iteration with lower 
         momentum but
        # higher learning rate controlled via the early exaggeration 
         parameter
        P *= self.early_exaggeration
        params, kl_divergence, it = _gradient_descent(obj_func, params, 
         **opt_args)
        if self.verbose:
            print("[t-SNE] KL divergence after %d iterations with early "
                "exaggeration: %f" % 
                (it + 1, kl_divergence)) # Learning schedule (part 2): disable 
                 early exaggeration and finish
        # optimization with a higher momentum at 0.8
        P /= self.early_exaggeration
        remaining = self.n_iter - self._EXPLORATION_N_ITER
        if it < self._EXPLORATION_N_ITER or remaining > 0:
            opt_args['n_iter'] = self.n_iter
            opt_args['it'] = it + 1
            opt_args['momentum'] = 0.8
            opt_args['n_iter_without_progress'] = self.
             n_iter_without_progress
            params, kl_divergence, it = _gradient_descent(obj_func, 
             params, **opt_args)
        # Save the final number of iterations
        self.n_iter_ = it
        if self.verbose:
            print("[t-SNE] KL divergence after %d iterations: %f" % (it + 1, 
             kl_divergence))
        X_embedded = params.reshape(n_samples, self.n_components)
        self.kl_divergence_ = kl_divergence
        return X_embedded
    
    def fit_transform(self, X, y=None):
        """Fit X into an embedded space and return that transformed
        output.

        Parameters
        ----------
        X : array, shape (n_samples, n_features) or (n_samples, 
         n_samples)
            If the metric is 'precomputed' X must be a square distance
            matrix. Otherwise it contains a sample per row. If the method
            is 'exact', X may be a sparse matrix of type 'csr', 'csc'
            or 'coo'. If the method is 'barnes_hut' and the metric is
            'precomputed', X may be a precomputed sparse graph.

        y : Ignored

        Returns
        -------
        X_new : array, shape (n_samples, n_components)
            Embedding of the training data in low-dimensional space.
        """
        embedding = self._fit(X)
        self.embedding_ = embedding
        return self.embedding_
    
    def fit(self, X, y=None):
        """Fit X into an embedded space.

        Parameters
        ----------
        X : array, shape (n_samples, n_features) or (n_samples, 
         n_samples)
            If the metric is 'precomputed' X must be a square distance
            matrix. Otherwise it contains a sample per row. If the method
            is 'exact', X may be a sparse matrix of type 'csr', 'csc'
            or 'coo'. If the method is 'barnes_hut' and the metric is
            'precomputed', X may be a precomputed sparse graph.

        y : Ignored
        """
        self.fit_transform(X)
        return self

标签：precomputed,manifold,Python,self,iter,method,TSNE,metric
来源： https://blog.csdn.net/qq_41185868/article/details/111397989