
MADE: Masked Autoencoder for Distribution Estimation


Germain M., Gregor K., Murray I. and Larochelle H. MADE: Masked Autoencoder for Distribution Estimation. ICML, 2015.

Consider a map
$$\hat{x} = f(x) \in \mathbb{R}^D, \quad x \in \mathbb{R}^D.$$
How should $f$ be structured so that
$$\hat{x} = [f_1, f_2(x_1), f_3(x_1, x_2), \ldots, f_D(x_1, x_2, \ldots, x_{D-1})]?$$
That is, each output $\hat{x}_d$ depends only on $x_{<d}$.
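This structure is exactly what distribution estimation needs: if the $d$-th output parameterizes the $d$-th conditional (for binary data, $\hat{x}_d = p(x_d = 1 \mid x_{<d})$), the network implements the autoregressive factorization

$$p(x) = \prod_{d=1}^{D} p(x_d \mid x_{<d}),$$

and training reduces to minimizing the negative log-likelihood (a cross-entropy over the $D$ conditionals).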

Main idea

Suppose layer $l$ computes
$$x^l = \sigma^l(W^l x^{l-1} + b^l).$$
The authors' idea is to assign the $k$-th unit of the first hidden layer a number $m^1(k) \in \{1, \ldots, D-1\}$ and build a mask matrix $M^1$:
$$M^1_{k,d} = \begin{cases} 1, & m^1(k) \ge d, \\ 0, & \text{else}. \end{cases}$$
The first layer then actually computes
$$x^1 = \sigma^1\big((W^1 \odot M^1)\, x + b^1\big).$$
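As a minimal NumPy sketch (with illustrative sizes $D = 4$ input dimensions and $K = 6$ hidden units; the variable names are my own), the first-layer mask can be built like this:

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 4, 6                      # input dimension, hidden width (illustrative)

# Assign each hidden unit k a connectivity number m^1(k) in {1, ..., D-1}
m1 = rng.integers(1, D, size=K)  # high bound is exclusive: values in {1, ..., D-1}

# Mask: hidden unit k may see input x_d only when m^1(k) >= d (d is 1-indexed)
d = np.arange(1, D + 1)
M1 = (m1[:, None] >= d[None, :]).astype(float)   # shape (K, D)
```

Note that because $m^1(k) \le D - 1$, the last input $x_D$ is masked out everywhere in the first layer, which is consistent with no output being allowed to depend on all of $x$.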

Going further, assign the $i$-th unit of hidden layer $l$ a number $m^l(i) \in \{\min_j m^{l-1}(j), \ldots, D-1\}$ (otherwise some rows of $M^l$ could be all zero):
$$M^l_{i,j} = \begin{cases} 1, & m^l(i) \ge m^{l-1}(j), \\ 0, & \text{else}, \end{cases}$$

$$x^l = \sigma^l\big((W^l \odot M^l)\, x^{l-1} + b^l\big).$$
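A sketch of the hidden-layer mask under the same NumPy conventions (sizes and names are illustrative); sampling $m^l(i)$ from $\{\min_j m^{l-1}(j), \ldots, D-1\}$ guarantees every row of $M^l$ has at least one nonzero entry:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4
m_prev = rng.integers(1, D, size=6)   # m^{l-1}: numbers of the previous layer's units

# Sample m^l(i) from {min_j m^{l-1}(j), ..., D-1} so no row of M^l is all zero
m_l = rng.integers(m_prev.min(), D, size=5)

# M^l_{i,j} = 1 iff m^l(i) >= m^{l-1}(j)
Ml = (m_l[:, None] >= m_prev[None, :]).astype(float)   # shape (5, 6)
```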

And for the output layer:
$$M^L_{d,k} = \begin{cases} 1, & d > m^{L-1}(k), \\ 0, & \text{else}. \end{cases}$$
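Putting the pieces together, here is a NumPy sketch of a one-hidden-layer masked network (names like `made`, `W1`, `WL` are my own, and the sizes are illustrative). A finite-difference Jacobian confirms the autoregressive property: $\partial \hat{x}_d / \partial x_e = 0$ whenever $e \ge d$.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 4, 8

# Connectivity numbers and masks for one hidden layer
m1 = rng.integers(1, D, size=K)                   # m^1(k) in {1, ..., D-1}
d = np.arange(1, D + 1)
M1 = (m1[:, None] >= d[None, :]).astype(float)    # (K, D) input mask
ML = (d[:, None] > m1[None, :]).astype(float)     # (D, K) output mask: d > m^1(k)

W1 = rng.standard_normal((K, D))
WL = rng.standard_normal((D, K))

def made(x):
    """One-hidden-layer masked forward pass (illustrative, no biases)."""
    h = np.tanh((W1 * M1) @ x)
    return (WL * ML) @ h

# Numerical Jacobian J[d, e] = d x̂_d / d x_e
x = rng.standard_normal(D)
eps = 1e-6
J = np.stack([(made(x + eps * np.eye(D)[e]) - made(x)) / eps
              for e in range(D)], axis=1)

# Autoregressive check: upper triangle (including the diagonal) vanishes,
# i.e. x̂_d never sees x_d or any later input
assert all(abs(J[i, j]) < 1e-4 for i in range(D) for j in range(D) if j >= i)
```

An output $d$ reaches input $e$ only through hidden units $k$ with $m^1(k) \le d - 1$ (output mask) and $m^1(k) \ge e$ (input mask), forcing $e \le d - 1$; the first row of $J$ is entirely zero, matching the fact that $\hat{x}_1$ is a constant.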

Personally, I feel this masking induces very noticeable sparsity in the connectivity, and the deeper the network, the more severe it becomes.

Code

Original code

Source: https://blog.csdn.net/MTandHJ/article/details/115271006