
Type I Error in Multiple Testing (Python Implementation)

Author: Internet

Contents

Type I Error

T-tests and Type I error

Confidence Interval and Type I error

Reducing Type I error


Type I Error

Definition: concluding that the treatment group and the control group differ significantly when in fact they do not. Also called a "false positive".
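The definition can be checked with a small simulation. The sketch below is not from the original post: it uses only the standard library, draws both groups from the same normal distribution (an "A/A" experiment, so every significant result is a false positive), and uses a large-sample z-test as a stand-in for the t-tests used later. The function names, sample sizes and seed are illustrative assumptions.

```python
import random
from statistics import NormalDist, mean, variance

def two_sample_z_pvalue(a, b):
    """Two-sided p-value for a difference in means (large-sample z-test)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_positive_rate(n_experiments=1000, n_per_group=100, alpha=0.05, seed=0):
    """Share of A/A experiments (no true difference) that are declared significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        treatment = [rng.gauss(0, 1) for _ in range(n_per_group)]  # same distribution
        if two_sample_z_pvalue(control, treatment) < alpha:
            hits += 1  # a Type I error: significant although nothing differs
    return hits / n_experiments

rate = false_positive_rate()
print(f"False positive rate in A/A experiments: {rate:.3f}")
```

The printed rate should land close to alpha, which is what "testing at the 5% level" means: about one in twenty true-null comparisons is flagged anyway.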

T-tests and Type I error


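import statsmodels.stats.api as sms
from termcolor import cprint

# df is the experiment DataFrame, assumed to be loaded already. Its 'Group'
# column takes the values 0, 1 and 2; the other columns are the variables tested.

def multi_ttests(x):
    """Run the three pairwise two-sample t-tests for variable x."""
    x0 = df[df['Group'] == 0][x]
    x1 = df[df['Group'] == 1][x]
    x2 = df[df['Group'] == 2][x]
    cm01 = sms.CompareMeans(sms.DescrStatsW(x0), sms.DescrStatsW(x1))
    cm02 = sms.CompareMeans(sms.DescrStatsW(x0), sms.DescrStatsW(x2))
    cm12 = sms.CompareMeans(sms.DescrStatsW(x1), sms.DescrStatsW(x2))
    cprint(x, 'red', 'on_yellow')  # highlight the variable name
    # Each call returns (t statistic, p-value, degrees of freedom)
    print(cm01.ttest_ind(alternative='two-sided', usevar='pooled'))
    print(cm02.ttest_ind(alternative='two-sided', usevar='pooled'))
    print(cm12.ttest_ind(alternative='two-sided', usevar='pooled'))

var = df.columns
# Test columns 1 through 14, i.e. the 14 variables
for col in var[1:15]:
    multi_ttests(col)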

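(Note: 'pooled' means the groups are assumed to have equal variance. Because we assume the treatment has no effect on these variables, the groups should follow the same distribution and therefore have the same variance. From the official description: "If pooled, then the standard deviation of the samples is assumed to be the same. If unequal, then the variance of Welch ttest will be used.")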
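To make the difference between the two `usevar` options concrete, here is a minimal standard-library sketch of the textbook formulas behind them: the pooled standard error, the Welch standard error, and the Welch-Satterthwaite degrees of freedom. It is not statsmodels' own code, and the helper names and sample values are made up for illustration.

```python
from statistics import variance

def pooled_se(a, b):
    """Standard error of mean(a) - mean(b) assuming equal variances ('pooled')."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (sp2 * (1 / na + 1 / nb)) ** 0.5

def welch_se(a, b):
    """Standard error of mean(a) - mean(b) without the equal-variance assumption."""
    return (variance(a) / len(a) + variance(b) / len(b)) ** 0.5

def welch_df(a, b):
    """Welch-Satterthwaite approximate degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    return (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))

# Made-up samples with different sizes and spreads
a = [4.1, 5.0, 4.7, 5.3, 4.9, 5.2]
b = [4.0, 6.1, 3.2, 6.8, 5.5, 2.9, 7.0, 4.4, 5.9, 3.6]

print("pooled SE:", pooled_se(a, b), "df:", len(a) + len(b) - 2)
print("Welch SE: ", welch_se(a, b), "df:", welch_df(a, b))
```

When the two groups have the same size the two standard errors coincide, so the choice mainly matters for unbalanced groups with clearly different spreads.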

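Two of the t-tests come out with p-value < 0.05. With 3 × 14 = 42 tests at alpha = 0.05, we would expect about 42 × 0.05 = 2.1 significant results even if no variable truly differs between groups. So both of these results could well be Type I errors.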
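The arithmetic can be written out as a short calculation. The variable names are illustrative. The last figure, the chance of at least one false positive, additionally assumes the 42 tests are independent, which the post does not claim and which is only roughly true here since the three pairwise comparisons for a variable share groups.

```python
n_pairs = 3        # group pairs: (0, 1), (0, 2), (1, 2)
n_variables = 14   # variables tested per pair
alpha = 0.05       # significance level of each individual test

n_tests = n_pairs * n_variables
expected_false_positives = n_tests * alpha

# Assumption: independent tests. Then P(at least one false positive)
# is one minus the probability that all tests stay non-significant.
fwer_if_independent = 1 - (1 - alpha) ** n_tests

print(f"tests run: {n_tests}")
print(f"expected false positives under the null: {expected_false_positives:.1f}")
print(f"P(at least one false positive), if independent: {fwer_if_independent:.2f}")
```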

 

Confidence Interval and Type I error

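import numpy as np
import statsmodels.stats.api as sms

lift = 1.1   # the test group's rate is 10% higher than the control's
ctr0 = 0.5   # baseline rate in the control group
# 1000 draws per group, each a Binomial(30, p) count, cast to float
ctrl = np.random.binomial(30, p=ctr0, size=1000) * 1.0
test = np.random.binomial(30, p=ctr0*lift, size=1000) * 1.0

cm = sms.CompareMeans(sms.DescrStatsW(test), sms.DescrStatsW(ctrl))

# 95% confidence interval for mean(test) - mean(ctrl): t-based, then z-based
print(cm.tconfint_diff(alpha=0.05, alternative='two-sided', usevar='unequal'))
print(cm.zconfint_diff(alpha=0.05, alternative='two-sided', usevar='unequal'))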

def multi_CI(x):
    """Print the three pairwise 95% z-intervals for the difference in means of x."""
    x0 = df[df['Group'] == 0][x]
    x1 = df[df['Group'] == 1][x]
    x2 = df[df['Group'] == 2][x]
    cm01 = sms.CompareMeans(sms.DescrStatsW(x0), sms.DescrStatsW(x1))
    cm02 = sms.CompareMeans(sms.DescrStatsW(x0), sms.DescrStatsW(x2))
    cm12 = sms.CompareMeans(sms.DescrStatsW(x1), sms.DescrStatsW(x2))
    cprint(x, 'red', 'on_yellow')  # highlight the variable name
    # Each call returns (lower bound, upper bound)
    print(cm01.zconfint_diff(alpha=0.05, alternative='two-sided', usevar='pooled'))
    print(cm02.zconfint_diff(alpha=0.05, alternative='two-sided', usevar='pooled'))
    print(cm12.zconfint_diff(alpha=0.05, alternative='two-sided', usevar='pooled'))

# Same 14 variables as in the t-test loop
for col in var[1:15]:
    multi_CI(col)
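The intervals printed above connect to Type I error in the following way: a two-sided 95% interval for the difference in means that excludes 0 corresponds to rejecting the null hypothesis at alpha = 0.05 with the matching test. If the groups do not truly differ, such an interval is a false positive. The sketch below is a hand-rolled z-interval using only the standard library, with unequal variances rather than the 'pooled' option used above. The function names and data are illustrative assumptions, not part of statsmodels.

```python
from statistics import NormalDist, mean, variance

def z_confint_diff(a, b, alpha=0.05):
    """Two-sided (1 - alpha) z-interval for mean(a) - mean(b), unequal variances."""
    diff = mean(a) - mean(b)
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return diff - z_crit * se, diff + z_crit * se

def excludes_zero(interval):
    """True if the interval lies entirely above or entirely below 0."""
    low, high = interval
    return low > 0 or high < 0

# Made-up samples whose means clearly differ
a = [10.0, 11.0, 9.0, 10.0, 12.0]
b = [0.0, 1.0, -1.0, 0.0, 2.0]

ci95 = z_confint_diff(a, b, alpha=0.05)
ci99 = z_confint_diff(a, b, alpha=0.01)
print("95% interval:", ci95, "excludes 0:", excludes_zero(ci95))
print("99% interval:", ci99, "excludes 0:", excludes_zero(ci99))
```

The 99% interval is always wider than the 95% one, so it excludes 0 less often. That is the interval counterpart of demanding a smaller p-value.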

 

Reducing Type I error

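To reduce Type I errors we can lower alpha, for example from 5% to 1%. Fewer t-tests will have p-value < 0.01 than p-value < 0.05, and possibly none will, so fewer results (or none) are flagged as significant. With 42 tests, the expected number of false positives drops from 2.1 to 0.42. The same applies to confidence intervals: use 99% intervals instead of 95% ones. Note that a lower alpha makes Type I errors less likely rather than impossible, and it also makes real differences harder to detect.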
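To see how a lower alpha changes the number of flagged tests, the sketch below draws 42 p-values uniformly from [0, 1]. That is how p-values behave when every null hypothesis is true and the test statistic is continuous. The p-values are simulated, not the ones from the experiment above, and the names and seed are illustrative. The third threshold, 0.05 / 42, is the Bonferroni correction. It is a standard way of choosing the lower alpha that the original post does not mention; statsmodels offers it through `statsmodels.stats.multitest.multipletests`.

```python
import random

def count_significant(p_values, alpha):
    """Number of tests that would be declared significant at level alpha."""
    return sum(p < alpha for p in p_values)

n_tests = 3 * 14  # three group pairs times fourteen variables

# Simulated p-values under the null: uniform on [0, 1]
rng = random.Random(42)
null_p_values = [rng.random() for _ in range(n_tests)]

for alpha in (0.05, 0.01, 0.05 / n_tests):
    flagged = count_significant(null_p_values, alpha)
    print(f"alpha = {alpha:.4f}: {flagged} of {n_tests} tests flagged, "
          f"{n_tests * alpha:.2f} expected by chance")
```

Every test flagged here is a Type I error by construction, so the counts show directly what each threshold costs when nothing is really going on.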

Source: https://blog.csdn.net/Nancyninghao/article/details/122682164