
Statistical Inference (1): Hypothesis Test


1. Binary Bayesian hypothesis testing

1.0 Problem Setting

1.1 Binary Bayesian hypothesis testing

Theorem: The optimal Bayes’ decision takes the form
$$L(y) \triangleq \frac{p_{\mathsf{y|H}}(y|H_1)}{p_{\mathsf{y|H}}(y|H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{P_0}{P_1}\,\frac{C_{10}-C_{00}}{C_{01}-C_{11}} \triangleq \eta$$
Proof:
$$\varphi(f) = \mathbb{E}\big[C(\mathsf{H}, f(\mathsf{y}))\big] = \mathbb{E}\Big[\,\mathbb{E}\big[C(\mathsf{H}, f(\mathsf{y})) \mid \mathsf{y}\big]\Big]$$
so it suffices to minimize the inner conditional expected cost separately for each observation. Given $y^*$:

  • if $f(y^*)=H_0$: $\mathbb{E}[C \mid y^*] = C_{00}\,p_{\mathsf{H|y}}(H_0|y^*) + C_{01}\,p_{\mathsf{H|y}}(H_1|y^*)$
  • if $f(y^*)=H_1$: $\mathbb{E}[C \mid y^*] = C_{10}\,p_{\mathsf{H|y}}(H_0|y^*) + C_{11}\,p_{\mathsf{H|y}}(H_1|y^*)$

So
$$\frac{p_{\mathsf{H|y}}(H_1|y^*)}{p_{\mathsf{H|y}}(H_0|y^*)} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{C_{10}-C_{00}}{C_{01}-C_{11}}$$
Remark: note that in the proof the Bayes test is deterministic, so for a fixed $y$ the probability that $f(y)=H_1$ is either 0 or 1. Hence, when taking the expectation of the cost, treat $\mathsf{H}$ as the random variable while treating $f(y)$ as a fixed value, and split into the two cases above.
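As a concrete check of the theorem, here is a minimal numerical sketch (the unit-variance Gaussian likelihoods, priors, and costs are all assumed for illustration, not from the original notes): the LRT with threshold $\eta = \frac{P_0}{P_1}\frac{C_{10}-C_{00}}{C_{01}-C_{11}}$ makes the same decision as directly minimizing the posterior expected cost.

```python
import math

# Assumed setup: H0: y ~ N(0,1), H1: y ~ N(1,1),
# priors P0 = 0.6, P1 = 0.4, costs C00 = C11 = 0, C01 = C10 = 1.
P0, P1 = 0.6, 0.4
C00, C01, C10, C11 = 0.0, 1.0, 1.0, 0.0

def gauss_pdf(y, mu):
    # Unit-variance Gaussian density
    return math.exp(-0.5 * (y - mu) ** 2) / math.sqrt(2 * math.pi)

def lrt_decide(y):
    # Decide H1 iff L(y) >= eta, with eta from the theorem
    L = gauss_pdf(y, 1.0) / gauss_pdf(y, 0.0)
    eta = (P0 / P1) * (C10 - C00) / (C01 - C11)
    return 1 if L >= eta else 0

def bayes_decide(y):
    # Direct minimization of posterior expected cost, for cross-checking
    post0 = P0 * gauss_pdf(y, 0.0)  # proportional to p(H0 | y)
    post1 = P1 * gauss_pdf(y, 1.0)  # proportional to p(H1 | y)
    cost_h0 = C00 * post0 + C01 * post1  # expected cost of declaring H0
    cost_h1 = C10 * post0 + C11 * post1  # expected cost of declaring H1
    return 1 if cost_h1 <= cost_h0 else 0

# The two rules agree everywhere
assert all(lrt_decide(y) == bayes_decide(y)
           for y in [-2.0, -1.0, 0.0, 0.3, 0.9, 0.91, 2.0])
```

For these assumed numbers, $\log L(y) = y - 1/2$, so the decision boundary sits at $y = 1/2 + \ln(3/2) \approx 0.91$.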

Special cases

1.2 Likelihood Ratio Test

Generally, LRT
$$L(y) \triangleq \frac{p_{\mathsf{y|H}}(y|H_1)}{p_{\mathsf{y|H}}(y|H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} \eta$$

Sufficient statistics
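As an illustration of how the LRT collapses onto a sufficient statistic (an assumed scalar Gaussian example, not from the original notes): for $H_0: \mathsf{y}\sim\mathcal{N}(0,\sigma^2)$ versus $H_1: \mathsf{y}\sim\mathcal{N}(\mu,\sigma^2)$ with $\mu>0$,
$$\log L(y) = \frac{\mu y}{\sigma^2} - \frac{\mu^2}{2\sigma^2},$$
which is monotonically increasing in $y$, so $y$ itself is a sufficient statistic and the LRT reduces to the threshold test
$$y \underset{H_0}{\overset{H_1}{\gtrless}} \frac{\sigma^2 \log\eta}{\mu} + \frac{\mu}{2}.$$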

1.3 ROC

Properties (important!)

ROC
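A small sketch of how an ROC curve is traced out (the Gaussian pair $H_0:\mathcal{N}(0,1)$ vs $H_1:\mathcal{N}(1,1)$ is an assumption for illustration): sweep the LRT threshold and record the resulting $(P_F, P_D)$ pairs.

```python
import math

# Assumed example: H0: y ~ N(0,1), H1: y ~ N(1,1). The LRT is monotone
# in y, so it reduces to "declare H1 iff y >= gamma", giving
# P_F = Q(gamma) and P_D = Q(gamma - 1), where Q is the Gaussian tail.

def q_func(x):
    # Q(x) = P(N(0,1) > x), via the complementary error function
    return 0.5 * math.erfc(x / math.sqrt(2))

# Sweep the threshold gamma over [-3, 3] to trace the ROC curve
roc = [(q_func(g / 10), q_func(g / 10 - 1.0)) for g in range(-30, 31)]

# Two of the key properties: the ROC of an LRT lies above the chance
# line P_D = P_F ...
assert all(pd >= pf for pf, pd in roc)
# ... and is monotone: P_F and P_D both shrink as gamma grows
assert all(roc[i][0] >= roc[i + 1][0] for i in range(len(roc) - 1))
```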

2. Non-Bayesian hypothesis testing

Neyman-Pearson criterion

$$\max_{\hat{H}(\cdot)} P_D \quad \text{s.t.} \quad P_F \le \alpha$$

Theorem (Neyman–Pearson Lemma): the optimal test under the NP criterion is an LRT, where $\eta$ is determined by
$$P_F = P\big(L(y) \ge \eta \,\big|\, \mathsf{H}=H_0\big) = \alpha$$
Proof

Physical intuition: for the same $P_F$, the LRT achieves the largest $P_D$. Intuitively, the region where the LRT decides $H_1$ is exactly where $\frac{p(y|H_1)}{p(y|H_0)}$ is largest, so for a given $P_F$ the detection probability $P_D$ is maximized.

Remark: the NP-optimal test is an LRT because

  • for the same $P_F$, the LRT achieves the largest $P_D$;
  • as the LRT threshold $\eta$ varies, a larger $P_F$ gives a larger $P_D$, i.e., the ROC curve is monotonically non-decreasing.
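The NP construction above can be sketched numerically (the Gaussian pair here is an assumed example, not from the original): pick the threshold so that $P_F = \alpha$ exactly, then read off $P_D$.

```python
import math

# Assumed example: H0: y ~ N(0,1), H1: y ~ N(1,1). L(y) is monotone
# increasing in y, so P(L(y) >= eta | H0) = alpha is equivalent to
# choosing a threshold gamma on y itself with Q(gamma) = alpha.

def q_func(x):
    # Gaussian tail probability Q(x) = P(N(0,1) > x)
    return 0.5 * math.erfc(x / math.sqrt(2))

def q_inv(p):
    # Invert the (decreasing) Q function by bisection
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if q_func(mid) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

alpha = 0.05
gamma = q_inv(alpha)       # threshold achieving P_F = alpha exactly
p_d = q_func(gamma - 1.0)  # resulting detection probability under H1
```

With $\alpha = 0.05$ this gives $\gamma \approx 1.64$; no other test satisfying the $P_F$ constraint can exceed the resulting $P_D$.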

3. Randomized test

3.1 Decision rule

3.2 Proposition

  1. Bayesian case: cannot achieve a lower Bayes’ risk than the optimum LRT

    Proof: for a randomized rule that declares $H_1$ with probability $q(\mathbf{y})$, the risk for each $\mathbf{y}$ is linear in $q(\mathbf{y})$, so the minimum is attained at $q(\mathbf{y}) \in \{0, 1\}$, which degenerates to a deterministic decision:
    $$\varphi(\hat{H}) = \mathbb{E}\Big[q(\mathsf{y})\big(C_{10}\,p_{\mathsf{H|y}}(H_0|\mathsf{y}) + C_{11}\,p_{\mathsf{H|y}}(H_1|\mathsf{y})\big) + \big(1-q(\mathsf{y})\big)\big(C_{00}\,p_{\mathsf{H|y}}(H_0|\mathsf{y}) + C_{01}\,p_{\mathsf{H|y}}(H_1|\mathsf{y})\big)\Big]$$

  2. Neyman-Pearson case:

    1. continuous-valued: for a given $P_F$ constraint, a randomized test cannot achieve a larger $P_D$ than the optimal LRT
    2. discrete-valued: for a given $P_F$ constraint, a randomized test can achieve a larger $P_D$ than the optimal LRT. Furthermore, the optimal randomized test corresponds to simple time-sharing between the two nearby LRT operating points
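A sketch of the discrete-valued case (the Bernoulli observation model is assumed for illustration): deterministic LRTs can only realize a few $P_F$ values, but time-sharing between the two nearest LRT operating points meets the constraint $P_F = \alpha$ exactly.

```python
import random

# Assumed model: y in {0,1}, P(y=1|H0) = 0.2, P(y=1|H1) = 0.8,
# so L(0) = 0.25 and L(1) = 4. Deterministic LRTs only achieve
# P_F in {0, 0.2, 1}. To hit alpha = 0.1, time-share between the
# LRTs with operating points (0, 0) and (0.2, 0.8).
alpha = 0.1
w = alpha / 0.2  # when y = 1, declare H1 with probability w

def randomized_test(y, rng):
    # Declare H1 only on y = 1, and then only with probability w
    if y == 1:
        return 1 if rng.random() < w else 0
    return 0

# Achieved operating point: P_F = 0.2*w = 0.1, P_D = 0.8*w = 0.4 --
# better than any deterministic LRT satisfying P_F <= 0.1.
# Empirical check of P_F under H0:
rng = random.Random(0)
n = 200_000
hits = sum(randomized_test(1 if rng.random() < 0.2 else 0, rng)
           for _ in range(n))
pf_est = hits / n  # empirical false-alarm rate, close to 0.1
```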

3.3 Efficient frontier

Boundary of the region of achievable $(P_D, P_F)$ operating points

Facts

efficient frontier

4. Minimax hypothesis testing

Prior: unknown; cost function: known

4.1 Decision rule

Author: Bonennult

Source: https://blog.csdn.net/weixin_41024483/article/details/104165225