Wasserstein GAN
Paper: https://arxiv.org/pdf/1701.07875.pdf
Code: https://github.com/igul222/improved_wgan_training
References:
https://lilianweng.github.io/lil-log/2017/08/20/from-GAN-to-WGAN.html
https://vincentherrmann.github.io/blog/wasserstein/
(Reading notes)
1. Intro
To recover a target probability density, one normally uses maximum likelihood estimation, while the discrepancy between two distributions is usually measured with a divergence.
The distribution produced by the model and the real data distribution are unlikely to overlap: each lives on its own support rather than sharing one, and a target distribution of this form is problematic. As the paper puts it, "It is then unlikely that…have a non-negligible intersection."
For this reason, much of the literature adds a noise term to the model distribution so that it covers all the examples, but this degrades the generated images.
A GAN, by contrast, uses its generator to produce a high-dimensional distribution from a low-dimensional manifold, though current results are still not ideal.
The main focus is on how to measure the distance between distributions: "we direct our attention on the various ways to measure how close the model distribution and the real distribution are, or equivalently…"
The paper studies the EM distance: "we provide a comprehensive theoretical analysis of how the Earth Mover (EM) distance behaves in comparison to popular probability distances and divergences used in the context of learning distributions."
It then defines the WGAN: "we define a form of GAN called Wasserstein-GAN that minimizes a reasonable and efficient approximation of the EM distance, and we theoretically show that the corresponding optimization problem is sound."
2. Distances
There are various distances (divergences): $\mathbf{TV}$, $\mathbf{KL}$, $\mathbf{JS}$, etc. (see the $f$-GAN paper), while the Earth-Mover (EM) distance is defined as follows:
$$
\begin{aligned}
W(\mathbb{P}_{r},\mathbb{P}_{g})&=\inf_{\gamma \in \Pi(\mathbb{P}_{r},\mathbb{P}_{g})} \mathbb{E}_{(x,y) \sim \gamma} \left[\|x-y\|\right]\\
&=\inf_{\gamma \in \Pi(\mathbb{P}_{r},\mathbb{P}_{g})} \iint \gamma(x,y)\,\|x-y\|\,\mathrm{d}x\,\mathrm{d}y
\end{aligned}
\tag{1}
$$
Here $\Pi(\mathbb{P}_{r},\mathbb{P}_{g})$ denotes the set of all joint distributions with marginals $\mathbb{P}_{r}$ and $\mathbb{P}_{g}$, and $\gamma$ is one such joint distribution. Sampling pairs $(x,y)$ from $\gamma$, we measure their distance with a norm and take the expectation; the infimum of this expectation over all $\gamma \in \Pi$ is the Earth-Mover (EM) distance.
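To make the definition concrete, here is a minimal sketch (my own illustration, not the paper's code): in one dimension the optimal coupling $\gamma$ in Eq. (1) simply matches sorted samples, so the EM distance can be estimated by pairing order statistics.

```python
import numpy as np

# Minimal sketch (1-D case only): the optimal transport plan matches the
# i-th smallest sample of one set with the i-th smallest of the other,
# so pairing order statistics estimates the infimum in Eq. (1).
def w1_empirical(xs, ys):
    xs, ys = np.sort(xs), np.sort(ys)
    return np.mean(np.abs(xs - ys))

rng = np.random.default_rng(0)
p_r = rng.normal(0.0, 1.0, 10_000)   # stand-in for "real" samples
p_g = rng.normal(0.5, 1.0, 10_000)   # stand-in for "generated" samples
print(w1_empirical(p_r, p_g))        # ~0.5, the shift between the two means
```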
Concretely, this works like moving earth between piles: the goal is to match each pair of sampled points as closely as possible. As a toy example:
Suppose $Z \sim U[0,1]$ and the real distribution is $\mathbb{P}_0 \sim (0,Z) \in \mathbb{R}^2$, i.e. points lying on the $y$-axis between $0$ and $1$ in the plane. The goal is for the distribution $g_\theta \sim (\theta, Z)$ to fit $\mathbb{P}_0$.
$$
\begin{aligned}
\forall (x, y) \in P &: \ x = 0 \ \text{ and } \ y \sim U(0, 1) \\
\forall (x, y) \in Q &: \ x = \theta, \ 0 \leq \theta \leq 1 \ \text{ and } \ y \sim U(0, 1)
\end{aligned}
\tag{2}
$$
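A quick way to see the disjoint supports in Eq. (2) (an illustrative snippet of mine, not from the paper):

```python
import numpy as np

# Both distributions are vertical line segments in the plane:
# P_0 lives on x = 0, P_theta on x = theta (disjoint unless theta = 0).
rng = np.random.default_rng(0)

def sample_P0(n):
    return np.stack([np.zeros(n), rng.uniform(0, 1, n)], axis=1)

def sample_Ptheta(n, theta):
    return np.stack([np.full(n, theta), rng.uniform(0, 1, n)], axis=1)

print(sample_P0(2))           # points on the line x = 0
print(sample_Ptheta(2, 0.5))  # points on the line x = 0.5
```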
The distances between $\mathbb{P}_0$ and $\mathbb{P}_\theta$ then come out as follows. Each reaches its minimum only at $\theta = 0$, but apart from $W$, none of them decreases continuously toward that minimum, so they provide no usable gradient:
$$
\begin{aligned}
W(\mathbb{P}_{0},\mathbb{P}_{\theta})&=|\theta|\\
\mathbf{JS}(\mathbb{P}_{0},\mathbb{P}_{\theta})&= \begin{cases}
\log 2 & \text{if } \theta \neq 0\\
0 & \text{if } \theta = 0
\end{cases} \\
\mathbf{KL}(\mathbb{P}_{0}\,\|\,\mathbb{P}_{\theta})&=\mathbf{KL}(\mathbb{P}_{\theta}\,\|\,\mathbb{P}_{0})= \begin{cases}
+\infty & \text{if } \theta \neq 0\\
0 & \text{if } \theta = 0
\end{cases} \\
\mathbf{TV}(\mathbb{P}_{0},\mathbb{P}_{\theta})&= \begin{cases}
1 & \text{if } \theta \neq 0\\
0 & \text{if } \theta = 0
\end{cases}
\end{aligned}
\tag{3}
$$

where, for $\theta \neq 0$,

$$
D_{KL}(P \| Q) = \sum_{x=0,\, y \sim U(0,1)} 1 \cdot \log\frac{1}{0} = +\infty, \qquad
D_{JS}(P \| Q) = \frac{1}{2}\Big(\sum_{x=0,\, y \sim U(0,1)} 1 \cdot \log\frac{1}{1/2} + \sum_{x=\theta,\, y \sim U(0,1)} 1 \cdot \log\frac{1}{1/2}\Big) = \log 2.
$$
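As a sanity check on Eq. (3), the snippet below (my own illustration) discretizes the $x$-coordinate of the two lines onto a grid and evaluates the distances numerically as $\theta$ varies; only $W$ shrinks smoothly toward $0$.

```python
import numpy as np

xs = np.linspace(0.0, 1.0, 101)             # grid for the x-coordinate

def hist(theta):
    p = np.zeros_like(xs); p[0] = 1.0       # P_0: all mass at x = 0
    q = np.zeros_like(xs)
    q[np.argmin(np.abs(xs - theta))] = 1.0  # P_theta: all mass at x = theta
    return p, q

def tv(p, q):                               # total variation
    return 0.5 * np.abs(p - q).sum()

def js(p, q):                               # Jensen-Shannon divergence
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a[a > 0] * np.log(a[a > 0] / b[a > 0]))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def w1(p, q):                               # 1-D Wasserstein via CDF difference
    return np.sum(np.abs(np.cumsum(p) - np.cumsum(q))) * (xs[1] - xs[0])

for theta in [0.0, 0.3, 0.6]:
    p, q = hist(theta)
    print(f"theta={theta}: W={w1(p, q):.2f} JS={js(p, q):.2f} TV={tv(p, q):.2f}")
# W tracks |theta| while JS and TV sit at log 2 and 1 for every theta != 0.
```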
Why is the Wasserstein distance indeed weak? (to be studied and updated)
The paper also explains in what sense the Wasserstein distance is weaker than the $\mathbf{JS}$ distance; "weak" here is meant topologically (a weaker topology makes it easier for a sequence of distributions to converge), which is exactly why the authors still use the Wasserstein distance. The proof uses a few concepts from functional analysis. Let $\mathcal{X}$ be a set of points in $\mathbb{R}^2$, i.e. $\mathcal{X} \subset \mathbb{R}^2$; $C_b(\mathcal{X})$ is the space of functions mapping $\mathcal{X}$ to $\mathbb{R}$ (every element of $C_b(\mathcal{X})$ is a function, and $C_b(\mathcal{X})$ itself is a set):
$$
C_b(\mathcal{X}) = \{ f : \mathcal{X} \rightarrow \mathbb{R} \mid f \text{ is continuous and bounded} \}
\tag{4}
$$
Given $f \in C_b(\mathcal{X})$, one can picture it by analogy with matrices, as below; the supremum norm of $f$ is then the largest absolute value it attains on $\mathcal{X}$:
$$
\begin{aligned}
\text{assume: } f_{m \times n} \cdot \mathcal{X}_{n \times 1} &= \mathbb{R}_{m \times 1} \\
\therefore f_{m \times n} \cdot \mathcal{X}_{n \times d} &= \mathbb{R}_{m \times d} \\
\therefore \|f\|_{\infty} &= \max_{x \in \mathcal{X}} |f(x)|
\end{aligned}
\tag{5}
$$
Equipping the set $C_b(\mathcal{X})$ with this norm gives a normed vector space $(C_b(\mathcal{X}), \|\cdot\|_\infty)$, whose natural topology is the one induced by the $\|\cdot\|_\infty$ norm via the metric:
$$
E \times E \longrightarrow \mathbb{R}, \qquad (x, y) \mapsto \|x - y\|
\tag{6}
$$
3. WGAN
Using Kantorovich-Rubinstein duality, the earth-mover distance can be rewritten as follows (but why? to be studied and updated), where $K$ refers to the $K$-Lipschitz condition $|f(x_1) - f(x_2)| \leq K\,|x_1 - x_2|$, which constrains the function to vary smoothly, with slope never too large:
$$
W(\mathbb{P}_{r},\mathbb{P}_{\theta})= \frac{1}{K} \sup_{\|f\|_L \leq K} \mathbb{E}_{x \sim \mathbb{P}_{r}}[f(x)] - \mathbb{E}_{x \sim \mathbb{P}_{\theta}}[f(x)]
\tag{7}
$$
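One half of the duality is elementary and partly answers the "but why?": for any coupling $\gamma \in \Pi(\mathbb{P}_r, \mathbb{P}_\theta)$ and any $K$-Lipschitz $f$,

$$
\mathbb{E}_{x \sim \mathbb{P}_r}[f(x)] - \mathbb{E}_{y \sim \mathbb{P}_\theta}[f(y)]
= \mathbb{E}_{(x,y) \sim \gamma}\left[f(x) - f(y)\right]
\leq \mathbb{E}_{(x,y) \sim \gamma}\left[K \, \|x - y\|\right]
$$

Taking the supremum over $f$ and the infimum over $\gamma$ shows that the supremum in (7) is at most $K \cdot W(\mathbb{P}_r, \mathbb{P}_\theta)$, so after dividing by $K$ the right-hand side of (7) is at most $W$; the hard part of the Kantorovich-Rubinstein theorem is proving that equality actually holds.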
So with a parameterized family of $K$-Lipschitz functions $\{f_w\}_{w \in W}$, the discriminator (critic) has to learn a good $f$, and training drives the following loss to convergence:
$$
L(\mathbb{P}_{r},\mathbb{P}_{\theta})=W(\mathbb{P}_{r},\mathbb{P}_{\theta})= \max_{w \in W} \mathbb{E}_{x \sim \mathbb{P}_r}[f_w(x)] - \mathbb{E}_{z \sim p(z)}[f_w(g_\theta(z))]
\tag{8}
$$
As the algorithm in the paper describes, in order to use gradient descent the authors clip the weights to a fixed range after every update, preventing any single update from changing the function drastically and keeping $f_w$ $K$-Lipschitz for some $K$ determined by the clipping range.
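A minimal PyTorch sketch of this training step (my own toy setup; the tiny MLP architectures are illustrative assumptions, while RMSprop, lr = 5e-5, and clipping value c = 0.01 match the paper's Algorithm 1 defaults):

```python
import torch
import torch.nn as nn

# Toy generator and critic (f_w); architectures are illustrative assumptions.
latent_dim, data_dim, c = 8, 2, 0.01   # c: weight-clipping range (paper default)
G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 64), nn.ReLU(), nn.Linear(64, 1))
opt_g = torch.optim.RMSprop(G.parameters(), lr=5e-5)
opt_d = torch.optim.RMSprop(D.parameters(), lr=5e-5)

def critic_step(real):
    z = torch.randn(real.size(0), latent_dim)
    fake = G(z).detach()
    # Maximize E[f_w(x)] - E[f_w(g_theta(z))] from Eq. (8): minimize its negation.
    loss = -(D(real).mean() - D(fake).mean())
    opt_d.zero_grad(); loss.backward(); opt_d.step()
    for p in D.parameters():           # clip weights to keep f_w K-Lipschitz
        p.data.clamp_(-c, c)
    return -loss.item()                # running estimate of W (up to the factor K)

def generator_step(batch_size):
    z = torch.randn(batch_size, latent_dim)
    loss = -D(G(z)).mean()             # push generated samples to higher critic scores
    opt_g.zero_grad(); loss.backward(); opt_g.step()
    return loss.item()
```

In the paper's algorithm, `critic_step` runs several times per `generator_step` so that $f_w$ stays close to the supremum in Eq. (7) while the generator moves.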