
Paper Notes on GALA: 《Symmetric Graph Convolutional Autoencoder for Unsupervised Graph Representation Learning》


Paper Information

  Title: 《Symmetric Graph Convolutional Autoencoder for Unsupervised Graph Representation Learning》
  Authors: Jiwoong Park, Minsik Lee, H. Chang, Kyuewang Lee, J. Choi
  Source: 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
  Paper: Download
  Code: Download
  Others: 44 Citations, 44 References


Abstract

  This paper proposes a fully symmetric autoencoder in which the decoder is designed based on Laplacian sharpening and the encoder based on Laplacian smoothing. In addition, a cost function based on the latent representation and the affinity matrix is designed.


1. Introduction

  In machine learning and data mining, graph data is used for the following kinds of studies:

  Reasons why convolution methods are restricted on graphs:

  In recent years, an emerging field called geometric deep learning generalizes deep neural network models to non-Euclidean domains such as meshes, manifolds, and graphs. Within this field, finding deep latent representations of the geometric structures of graphs with an autoencoder framework has been receiving increasing attention.

  The first attempt was VGAE, which consists of a Graph Convolutional Network (GCN) encoder and a matrix outer-product decoder, as shown in Figure 1(a).

  As a variant of VGAE, ARVGA adopts an adversarial approach on top of VGAE. However, VGAE and ARVGA reconstruct the affinity matrix $A$ rather than the node feature matrix $X$; as a result, the decoder part cannot be learnable, and graphical features cannot be used in the decoder.

  Later, MGAE was proposed, which uses stacked single-layer graph autoencoders with a linear activation function and a marginalization process, as shown in Figure 1(b). However, since MGAE reconstructs the feature matrix of nodes without hidden layers, it cannot manipulate the dimension of the latent representation and performs only a linear mapping.

    

  This paper proposes a new graph convolutional autoencoder framework with a symmetric architecture that uses both the graph structure and the node attributes in the encoding and decoding processes, as shown in Figure 1(c).

  The encoder of VGAE can be interpreted as a special form of Laplacian smoothing, which computes the new representation of each node as a weighted local average of itself and its neighbors.

  This paper expresses Laplacian sharpening in the form of a Chebyshev polynomial in order to construct a Laplacian-sharpening decoder.

  The contributions of this paper are as follows:


2. Preliminaries

2.1. Basic notations on graphs

2.2. Spectral convolution on graphs

  The content of Section 2.2 has already been explained clearly in the earlier post 《第三代GCN》 (on the third-generation GCN), so it is only summarized briefly below.

  A spectral convolution on a graph is the multiplication of an input signal $x \in \mathbb{R}^{n}$ with a spectral filter $g_{\theta}=\operatorname{diag}(\theta)$ parameterized by the vector of Fourier coefficients $\theta \in \mathbb{R}^{n}$ as follows:

  $g_{\theta} * x=U g_{\theta} U^{T} x\quad \quad \quad (1)$

  where $U$ is the matrix of eigenvectors of the symmetric normalized graph Laplacian $L=I_{n}-D^{-\frac{1}{2}} A D^{-\frac{1}{2}}=U \Lambda U^{T}$, and $\Lambda$ is the diagonal matrix of its eigenvalues.

  The problem with this general spectral convolution: this operation is inappropriate for large-scale graphs since it requires an eigendecomposition to obtain the eigenvalues and eigenvectors of $L$.

  The remedy: the spectral filter $g_{\theta}(\Lambda)$ was approximated by $K^{th}$-order Chebyshev polynomials in previous works. By doing so, the spectral convolution on the graph can be approximated as

    $g_{\theta} * x \approx U \sum\limits _{k=0}^{K} \theta_{k}^{\prime} T_{k}(\tilde{\Lambda}) U^{T} x=\sum\limits _{k=0}^{K} \theta_{k}^{\prime} T_{k}(\tilde{L}) x\quad \quad \quad (2)$

  where $\theta^{\prime} \in \mathbb{R}^{K+1}$ is a vector of Chebyshev coefficients, $T_{k}(\cdot)$ is the $k^{th}$ Chebyshev polynomial defined by the recurrence $T_{k}(x)=2 x T_{k-1}(x)-T_{k-2}(x)$ with $T_{0}(x)=1$ and $T_{1}(x)=x$, and $\tilde{\Lambda}=\frac{2}{\lambda_{\max }} \Lambda-I_{n}$, $\tilde{L}=\frac{2}{\lambda_{\max }} L-I_{n}$ are rescaled so that their eigenvalues lie in $[-1,1]$.
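  To make the approximation concrete, the filter in Eq. (2) can be applied without any eigendecomposition by using the three-term Chebyshev recurrence on $\tilde{L}$. A minimal NumPy sketch (mine, not from the paper), assuming $\lambda_{\max} \approx 2$ so that $\tilde{L} = L - I_{n}$:

```python
import numpy as np

def chebyshev_filter(A, x, theta):
    """Approximate g_theta * x with a truncated Chebyshev expansion, Eq. (2).

    A     : (n, n) symmetric affinity matrix (no self-loops)
    x     : (n,)   input signal on the nodes
    theta : (K+1,) Chebyshev coefficients theta'_k
    """
    n = A.shape[0]
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    # Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}
    L = np.eye(n) - (A * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]
    # Rescale assuming lambda_max ~= 2, i.e. L~ = L - I
    L_tilde = L - np.eye(n)

    Tx = [x, L_tilde @ x]                      # T_0(L~)x, T_1(L~)x
    out = theta[0] * Tx[0]
    if len(theta) > 1:
        out = out + theta[1] * Tx[1]
    for k in range(2, len(theta)):
        # Three-term recurrence: T_k = 2 L~ T_{k-1} - T_{k-2}
        Tx_k = 2.0 * (L_tilde @ Tx[-1]) - Tx[-2]
        out = out + theta[k] * Tx_k
        Tx.append(Tx_k)
    return out

# Toy usage on a 4-node cycle graph
A = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], dtype=float)
x = np.array([1.0, -2.0, 0.5, 3.0])
print(chebyshev_filter(A, x, theta=np.array([0.5, 0.3, 0.2])))
```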

  In the GCN, the Chebyshev approximation model was simplified by setting $K=1$, $\lambda_{\max } \approx 2$ and $\theta=\theta_{0}^{\prime}= -\theta_{1}^{\prime}$. This simplifies the spectral convolution to:

    $g_{\theta} * x \approx \theta\left(I_{n}+D^{-\frac{1}{2}} A D^{-\frac{1}{2}}\right) x  \quad \quad \quad (3)$

  However, repeated application of $I_{n}+D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$ can cause numerical instabilities in neural networks, since the spectral radius of $I_{n}+D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$ is $2$, while the Chebyshev polynomials form an orthonormal basis when the spectral radius is $1$. To circumvent this issue, the GCN uses the renormalization trick:

    $I_{n}+D^{-\frac{1}{2}} A D^{-\frac{1}{2}} \rightarrow \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}  \quad \quad \quad (4)$

  where $\tilde{A}=A+I_{n}$ and $\tilde{D}_{i i}=\sum_{j} \tilde{A}_{i j}$. Since adding self-loops to an affinity matrix does not affect the spectral radius of the corresponding graph Laplacian matrix, this renormalization trick provides a numerically stable form of $I_{n}+D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$ while maintaining the meaning of each element, as follows:

    $\left(I_{n}+D^{-\frac{1}{2}} A D^{-\frac{1}{2}}\right)_{i j}=\left\{\begin{array}{ll}1 & i=j \\A_{i j} / \sqrt{D_{i i} D_{j j}} & i \neq j\end{array}\right.\quad \quad \quad (5)$

    $\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}\right)_{i j}=\left\{\begin{array}{ll}1 /\left(D_{i i}+1\right) & i=j \\A_{i j} / \sqrt{\left(D_{i i}+1\right)\left(D_{j j}+1\right)} & i \neq j\end{array}\right.\quad \quad \quad (6)$

  Finally, the forward-path of the GCN can be expressed by

    $H^{(m+1)}=\xi\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(m)} \Theta^{(m)}\right)\quad \quad \quad (7)$

  where $H^{(m)}$ is the activation matrix in the $m^{th}$ layer, $H^{(0)}$ is the input nodes' feature matrix $X$, $\xi(\cdot)$ is a nonlinear activation function such as $\operatorname{ReLU}(\cdot)=\max (0, \cdot)$, and $\Theta^{(m)}$ is a trainable weight matrix. The GCN provides a computationally efficient convolutional process (under the assumption that $\tilde{A}$ is sparse) and achieves improved accuracy over state-of-the-art methods on the semi-supervised node classification task by using node features and the geometric structure of the graph simultaneously.
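  For concreteness, one GCN layer with the renormalization trick of Eq. (4) can be sketched in NumPy as follows. This is my own illustration rather than the authors' code; the toy graph, the weight initialization, and the ReLU choice are assumptions.

```python
import numpy as np

def renormalized_propagation(A):
    """D~^{-1/2} A~ D~^{-1/2} with A~ = A + I (renormalization trick, Eq. (4))."""
    n = A.shape[0]
    A_tilde = A + np.eye(n)
    d_tilde = A_tilde.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d_tilde)
    return (A_tilde * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]

def gcn_layer(P, H, Theta):
    """One GCN layer H^{(m+1)} = ReLU(P H^{(m)} Theta^{(m)}), Eq. (7)."""
    return np.maximum(0.0, P @ H @ Theta)

# Toy usage: a 4-node path graph with 3-dimensional node features.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.random.randn(4, 3)
Theta = 0.1 * np.random.randn(3, 2)          # illustrative weight matrix
H1 = gcn_layer(renormalized_propagation(A), X, Theta)
print(H1.shape)                               # (4, 2)
```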

2.3. Laplacian smoothing

   Li et al. show that GCN is a special form of Laplacian smoothing.

  Laplacian smoothing equation:

    $x_{i}^{(m+1)}=(1-\gamma) x_{i}^{(m)}+\gamma \sum\limits_{j} \frac{\tilde{A}_{i j}}{\tilde{D}_{i i}} x_{j}^{(m)}\quad \quad \quad (8)$

   where $\gamma$ $(0<\gamma \leq 1)$ controls the strength of the smoothing, $\tilde{A}=A+I_{n}$, and $\tilde{D}_{i i}=\sum_{j} \tilde{A}_{i j}$.

  Rewrite the above equation in a matrix form as follows:

    $\begin{aligned}X^{(m+1)} &=(1-\gamma) X^{(m)}+\gamma \tilde{D}^{-1} \tilde{A} X^{(m)} \\&=X^{(m)}-\gamma\left(I_{n}-\tilde{D}^{-1} \tilde{A}\right) X^{(m)} \\&=X^{(m)}-\gamma \tilde{L}_{r w} X^{(m)}\end{aligned}\quad \quad \quad (9)$

Tips:

  Random walk normalized Laplacian 

     $L^{r w}=D^{-1} L=I-D^{-1} A$

  Symmetric normalized Laplacian

    $L^{\text {sym }}=D^{-1 / 2} L D^{-1 / 2}=I-D^{-1 / 2} A D^{-1 / 2}$

   We set $\gamma=1$ and replace $\tilde{L}_{r w}$ with the symmetrically normalized $\tilde{L}$; then Eq. (9) becomes

    $X^{(m+1)}=\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} X^{(m)}$ 

  This equation is the same as the renormalized version of spectral convolution in Eq. (7).
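  This equivalence is easy to check numerically. The sketch below (my own, on a toy graph) evaluates the node-wise smoothing of Eq. (8), its matrix form Eq. (9), and the renormalized propagation of Eq. (7).

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.random.randn(4, 3)
gamma = 1.0

A_t = A + np.eye(4)                 # A~ = A + I
d_t = A_t.sum(axis=1)               # D~_ii

# Per-node Laplacian smoothing, Eq. (8)
X_nodewise = np.empty_like(X)
for i in range(4):
    neighbor_avg = (A_t[i, :, None] * X).sum(axis=0) / d_t[i]
    X_nodewise[i] = (1 - gamma) * X[i] + gamma * neighbor_avg

# Matrix form, Eq. (9): X - gamma * L~_rw X
L_rw = np.eye(4) - A_t / d_t[:, None]
X_matrix = X - gamma * (L_rw @ X)
print(np.allclose(X_nodewise, X_matrix))              # True

# With gamma = 1 and the symmetric normalization, Eq. (9) becomes the
# renormalized GCN propagation D~^{-1/2} A~ D~^{-1/2} X of Eq. (7).
d_is = 1.0 / np.sqrt(d_t)
P = (A_t * d_is[:, None]) * d_is[None, :]
print(np.allclose(X - (np.eye(4) - P) @ X, P @ X))    # True
```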


3. The proposed method

  Our model is GALA (Graph convolutional Autoencoder using Laplacian smoothing and sharpening).

  In GALA, there are $M$ layers in total, where $M$ is an even number: layers $1$ through $\frac{M}{2}$ form the encoder, and layers $\frac{M}{2}+1$ through $M$ form the decoder.

3.1. Laplacian sharpening

  The encoder performs Laplacian smoothing that makes the latent representation of each node similar to those of its neighboring nodes.

  Laplacian sharpening is a process that makes the reconstructed feature of each node farther away from the centroid of its neighbors.

  Laplacian sharpening equation:

     $x_{i}^{(m+1)}=(1+\gamma) x_{i}^{(m)}-\gamma \sum\limits _{j} \frac{A_{i j}}{D_{i i}} x_{j}^{(m)}\quad \quad \quad (10)$

   where $\gamma$ controls the strength of the sharpening, and $A$ and $D$ are the affinity and degree matrices of the original graph (no self-loops added).

   The matrix form of Eq. (10) :

    $\begin{aligned}X^{(m+1)} &=(1+\gamma) X^{(m)}-\gamma D^{-1} A X^{(m)} \\&=X^{(m)}+\gamma\left(I_{n}-D^{-1} A\right) X^{(m)} \\&=X^{(m)}+\gamma L_{r w} X^{(m)}\end{aligned}\quad \quad \quad (11)$

  We set $\gamma=1$ and replace $L_{r w}$ with the symmetrically normalized $L$; then Eq. (11) becomes

     $X^{(m+1)}=(2 I_{n}-D^{-\frac{1}{2}} A D^{-\frac{1}{2}})X^{(m)}$

  We express Laplacian sharpening in the form of a Chebyshev polynomial and simplify it with a first-order approximation, in the same way as Eq. (3).

  Then, a decoder layer can be expressed by

    $H^{(m+1)}=\xi\left(\left(2 I_{n}-D^{-\frac{1}{2}} A D^{-\frac{1}{2}}\right) H^{(m)} \Theta^{(m)}\right)\quad \quad \quad (12)$

  where $H^{(m)}$ is the activation matrix of the $m^{th}$ layer and $\Theta^{(m)}$ is a trainable weight matrix, as in Eq. (7).

  Since the spectral radius of $2 I_{n}-D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$ is $3$, repeated application of this operator can be numerically unstable.

       Spectral radius of a square matrix

  If $\boldsymbol{A}=\left(a_{i j}\right)$ is an $n \times n$ matrix over the complex field and $\lambda_{1}, \lambda_{2}, \ldots, \lambda_{n}$ are all of its eigenvalues, then

    $\rho(\boldsymbol{A})= \underset{1 \leq i \leq n}{\max}  |\lambda_{i}|$

  From the previous discussion, we know that the eigenvalues of $D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$ lie in the range $[-1, 1]$, so the eigenvalues of $2 I_{n}-D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$ lie in $[1, 3]$ and its spectral radius can reach $3$.

   So we need to find a  numerically stable form of Laplacian sharpening.

3.2. Numerically stable Laplacian sharpening

  The goal is to find a new representation of Laplacian sharpening whose spectral radius is $1$.

  A signed graph is denoted by  $\Gamma=(\mathcal{V}, \mathcal{E}, \hat{A})$  which is induced from the unsigned graph  $\mathcal{G}=(\mathcal{V}, \mathcal{E}, A)$ .

  Each element in $\hat{A}$ has the same absolute value as the corresponding element of $A$, but its sign is either flipped to minus or kept plus.

  The degree matrix $\hat{D}$ can be defined as $\hat{D}_{i i}=\sum\limits _{j}\left|\hat{A}_{i j}\right|$ .

  Thus, we can construct $\hat{A}=2 I_{n}-A$ with $\hat{D}_{i i}=\sum_{j}\left|\hat{A}_{i j}\right|=D_{i i}+2$, and the corresponding signed graph Laplacian $\hat{L}=I_{n}-\hat{D}^{-\frac{1}{2}} \hat{A} \hat{D}^{-\frac{1}{2}}$ (this choice is consistent with Eq. (14) below).

  The range of the eigenvalue of  $\hat{L}$  is  $[0,2]$ , thus the spectral radius of  $\hat{D}^{-\frac{1}{2}} \hat{A} \hat{D}^{-\frac{1}{2}}$  is $1$ for any choice of  $\hat{A}$ . 

   Using this result, instead of Eq. (12), the decoder layer can be written as:

    $H^{(m+1)}=\xi\left(\hat{D}^{-\frac{1}{2}} \hat{A} \hat{D}^{-\frac{1}{2}} H^{(m)} \Theta^{(m)}\right)\quad \quad \quad (13)$

  where $\hat{D}^{-\frac{1}{2}} \hat{A} \hat{D}^{-\frac{1}{2}}$ is computed elementwise as follows:

    $\left(\hat{D}^{-\frac{1}{2}} \hat{A} \hat{D}^{-\frac{1}{2}}\right)_{i j}=\left\{\begin{array}{ll}2 /\left(D_{i i}+2\right) & i=j \\-A_{i j} / \sqrt{\left(D_{i i}+2\right)\left(D_{j j}+2\right)} & i \neq j\end{array}\right.\quad \quad \quad (14)$

  which has the same meaning as the original operator, given elementwise by

    $\left(2 I_{n}-D^{-\frac{1}{2}} A D^{-\frac{1}{2}}\right)_{i j}=\left\{\begin{array}{ll}2 & i=j \\-A_{i j} / \sqrt{D_{i i} D_{j j}} & i \neq j\end{array}\right.\quad \quad \quad (15)$

  Purpose: obtain a decoder operator with spectral radius $1$, so that repeated application is numerically stable, while preserving the meaning of each element of the original sharpening operator; compare Eq. (14) with Eq. (15).
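  The contrast between Eq. (14) and Eq. (15) can be verified numerically. The sketch below (my own, on an arbitrary toy graph) builds both operators and compares their spectral radii:

```python
import numpy as np

A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
n = A.shape[0]
D = A.sum(axis=1)

# Naive Laplacian-sharpening operator, Eq. (15): 2I - D^{-1/2} A D^{-1/2}
d_is = 1.0 / np.sqrt(D)
naive = 2 * np.eye(n) - (A * d_is[:, None]) * d_is[None, :]

# Numerically stable version, Eq. (14), from the signed graph with
# A^ = 2I - A and D^_ii = D_ii + 2.
A_hat = 2 * np.eye(n) - A
D_hat = np.abs(A_hat).sum(axis=1)             # equals D + 2
d_hat_is = 1.0 / np.sqrt(D_hat)
stable = (A_hat * d_hat_is[:, None]) * d_hat_is[None, :]

def spectral_radius(M):
    return np.abs(np.linalg.eigvals(M)).max()

print(spectral_radius(naive))    # > 1 here (up to 3 in general)
print(spectral_radius(stable))   # <= 1
```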

   In summary, the numerically stable decoder layer of GALA is given as

    $H^{(m+1)}=\xi\left(\hat{D}^{-\frac{1}{2}} \hat{A} \hat{D}^{-\frac{1}{2}} H^{(m)} \Theta^{(m)}\right),\left(m=\frac{M}{2}, \ldots, M-1\right)\quad \quad \quad (16)$

  where $\hat{A}=2 I_{n}-A$ and $\hat{D}_{i i}=D_{i i}+2$, as defined above.

    The encoder layer of GALA is given as

    $H^{(m+1)}=\xi\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(m)} \Theta^{(m)}\right),\left(m=0, \ldots, \frac{M}{2}-1\right)\quad \quad \quad (17)$

  where $\tilde{A}=A+I_{n}$ and $\tilde{D}_{i i}=\sum_{j} \tilde{A}_{i j}$, as in Eq. (7).

   The encoder and decoder propagation functions, Eqs. (16) and (17), both have complexity $\mathcal{O}(m p c)$.
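  Putting Eqs. (16) and (17) together, a GALA-style forward pass can be sketched as follows. The layer widths, the ReLU activations, and the toy graph are illustrative assumptions, not the authors' exact configuration:

```python
import numpy as np

def sym_norm(B):
    """D_B^{-1/2} B D_B^{-1/2}, where D_B is the degree matrix of |B|."""
    d = np.abs(B).sum(axis=1)
    d_is = 1.0 / np.sqrt(d)
    return (B * d_is[:, None]) * d_is[None, :]

def gala_forward(A, X, weights):
    """Encoder layers use Eq. (17), decoder layers use Eq. (16).

    weights: list of M weight matrices; the first M/2 belong to the encoder.
    """
    n = A.shape[0]
    P_enc = sym_norm(A + np.eye(n))        # D~^{-1/2} A~ D~^{-1/2} (smoothing)
    P_dec = sym_norm(2 * np.eye(n) - A)    # D^^{-1/2} A^ D^^{-1/2} (sharpening)
    M = len(weights)
    H = X
    for m, W in enumerate(weights):
        P = P_enc if m < M // 2 else P_dec
        H = P @ H @ W
        if m < M - 1:
            H = np.maximum(0.0, H)         # ReLU on hidden layers (illustrative)
        if m == M // 2 - 1:
            Z = H                          # latent representation
    return Z, H                            # latent codes and reconstruction X_bar

# Toy usage: 6 nodes, 8-dim features, latent dimension 2, M = 4 layers.
rng = np.random.default_rng(0)
A = (rng.random((6, 6)) < 0.4).astype(float)
A = np.triu(A, 1); A = A + A.T
X = rng.standard_normal((6, 8))
weights = [0.1 * rng.standard_normal(s) for s in [(8, 4), (4, 2), (2, 4), (4, 8)]]
Z, X_bar = gala_forward(A, X, weights)
print(Z.shape, X_bar.shape)                # (6, 2) (6, 8)
```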

  In Table 1, we show via node clustering experiments why Laplacian smoothing is not appropriate for the decoder and why the numerically stable Laplacian sharpening is necessary.

    

  The basic cost function of GALA is given by

     $\min _{\bar{X}} \frac{1}{2}\|X-\bar{X}\|_{F}^{2}\quad \quad \quad (18)$

   where $\bar{X}$ is the reconstructed feature matrix produced by the decoder (the output of the $M^{th}$ layer).

3.3. Subspace clustering cost for image clustering

  For image clustering tasks, we add a subspace clustering term to the proposed cost:

    $\underset{\bar{X}, H, A_{H}}{min} \frac{1}{2}\|X-\bar{X}\|_{F}^{2}+\frac{\lambda}{2}\left\|H-H A_{H}\right\|_{F}^{2}+\frac{\mu}{2}\left\|A_{H}\right\|_{F}^{2}\quad \quad \quad (19)$

   where $H$ is the latent representation matrix, $A_{H}$ is the affinity matrix for subspace clustering built from $H$, and $\lambda$ and $\mu$ are weighting hyperparameters.

  The second term of Eq. (19) encodes the self-expressive model of subspace clustering, and the third term regularizes $A_{H}$.

  If we consider the minimization over $A_{H}$ only, the problem becomes:

     $\underset{A_{H}}{min}    \frac{\lambda}{2}\left\|H-H A_{H}\right\|_{F}^{2}+\frac{\mu}{2}\left\|A_{H}\right\|_{F}^{2}\quad \quad \quad (20)$

  We can easily obtain the analytic solution

    $A_{H}^{*}=(H^{T} H+\frac{\mu}{\lambda} I_{n})^{-1} H^{T} H $

  By using this analytic solution and singular value decomposition, we derive a computationally efficient subspace clustering cost function as follows :

    $\underset{\bar{X}, H}{min}  \frac{1}{2}\|X-\bar{X}\|_{F}^{2}+\frac{\mu \lambda}{2} \operatorname{tr}\left(\left(\mu I_{k}+\lambda H H^{T}\right)^{-1} H H^{T}\right)\quad \quad \quad (21)$

  The main benefit is a significant reduction of the computational burden: the $n \times n$ variable $A_{H}$ is eliminated, and the remaining trace term only involves a $k \times k$ matrix inverse.
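  The saving can be illustrated with a small sketch (mine, with an arbitrary random $H$ of size $k \times n$ and arbitrary $\lambda$, $\mu$): the direct evaluation through the analytic $A_{H}^{*}$ requires solving an $n \times n$ system, while the trace form of Eq. (21) only involves a $k \times k$ matrix, and both give the same value.

```python
import numpy as np

rng = np.random.default_rng(1)
k, n = 4, 50                       # latent dimension k << n nodes
H = rng.standard_normal((k, n))    # latent representation matrix (k x n)
lam, mu = 1.0, 0.5

# Direct route: analytic A_H* of Eq. (20), an n x n matrix.
G = H.T @ H
A_star = np.linalg.solve(G + (mu / lam) * np.eye(n), G)
direct = 0.5 * lam * np.linalg.norm(H - H @ A_star, 'fro')**2 \
       + 0.5 * mu * np.linalg.norm(A_star, 'fro')**2

# Reduced route: the trace term of Eq. (21), only a k x k solve.
HHt = H @ H.T
reduced = 0.5 * mu * lam * np.trace(np.linalg.solve(mu * np.eye(k) + lam * HHt, HHt))

print(np.isclose(direct, reduced))   # True: Eq. (21) equals Eq. (20) at A_H*
```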

3.4. Training

  For image clustering tasks, the subspace clustering cost is added to the reconstruction cost.

  After the training process is over, we construct a $k$-nearest-neighbor graph using the attained latent representations $H^{*}$. Then we perform spectral clustering on it to obtain the clustering result.

  In the case of image clustering, after all training processes are over, we construct the optimal affinity matrix  $A_{H}^{*}$  noted in the previous subsection by using the attained latent representation matrix  $H^{*}$  from GALA. Then we perform spectral clustering on the affinity matrix and get the optimal clustering with respect to our cost function.
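  A possible post-processing sketch for the network-dataset case is given below; the neighborhood size, the symmetrization of the $k$-NN graph, and the scikit-learn calls are my assumptions, not details from the paper.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import SpectralClustering

def cluster_latent(H_star, n_clusters, n_neighbors=10):
    """H_star: (n_nodes, latent_dim) latent representations from GALA."""
    # Symmetrized k-NN affinity graph over the latent codes.
    knn = kneighbors_graph(H_star, n_neighbors=n_neighbors, mode='connectivity')
    affinity = 0.5 * (knn + knn.T).toarray()
    sc = SpectralClustering(n_clusters=n_clusters, affinity='precomputed',
                            random_state=0)
    return sc.fit_predict(affinity)

# Toy usage with random "latent codes".
H_star = np.random.randn(100, 16)
labels = cluster_latent(H_star, n_clusters=3)
print(labels.shape)   # (100,)
```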


4. Experiments

4.1. Datasets

  Four network datasets are used. Every network dataset has a feature matrix $X$ and an affinity matrix $A$.

  Three image datasets are used (COIL20, YALE, and MNIST; see Section 4.6). Every image dataset has a feature matrix $X$ only.

  Tasks: node clustering, image clustering, and link prediction.

4.2. Experimental settings

  Three metrics:

  Results are averaged over 50 runs.

4.3. Comparing methods

  

4.4. Node clustering results

  The experimental results of node clustering are presented in Table 2.

    

  We conduct another node clustering experiment on a large network dataset (Pubmed), and the results are reported in Table 5.

    

4.5. Image clustering results

  The experimental results of image clustering are presented in Table 4.

   We report GALA's performance for both the reconstruction-cost-only case and the case with the subspace clustering cost added (GALA+SCC).

  

4.6. Ablation studies

   We validate the effectiveness of the proposed stable decoder and the subspace clustering cost by image clustering experiments on the three image datasets (COIL20, YALE and MNIST).

   There are four configurations as shown in Table 6.

    

4.7. Link prediction results

  We provide results for the link prediction task on the Citeseer dataset.

  For the link prediction task, we minimize the cost function below, which adds the link-prediction cost of GAE to the reconstruction cost:

     $\underset{\bar{X}, H}{min} \frac{1}{2}\|X-\bar{X}\|_{F}^{2}+\gamma \mathbb{E}_{H}[\log p(\hat{A} \mid H)]$

  where the second term is the link-prediction (affinity reconstruction) cost of GAE and $\gamma$ balances the two terms.
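  For reference, the GAE link-prediction term is typically the log-likelihood of the affinity matrix under a sigmoid inner-product decoder, $p(\hat{A}_{ij}=1 \mid H)=\sigma(h_{i}^{\top} h_{j})$. The sketch below follows that assumption; it is not taken from the GALA paper.

```python
import numpy as np

def link_prediction_cost(H, A_target, eps=1e-10):
    """Mean negative log-likelihood of the adjacency under a sigmoid
    inner-product decoder p(A_ij = 1 | H) = sigmoid(h_i . h_j), as in GAE
    (assumed formulation). H: (n_nodes, latent_dim), A_target: (n, n) 0/1."""
    logits = H @ H.T
    p = 1.0 / (1.0 + np.exp(-logits))
    nll = -(A_target * np.log(p + eps) + (1 - A_target) * np.log(1 - p + eps))
    return nll.mean()
```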

  The results are shown in Table 7.

    

4.8. Visualization

  We visualize the distribution of learned latent representations and the input features of each node in two-dimensional space using t-SNE [23] as shown in Figure 3.

    


5. Conclusions

 
