其他分享
首页 > 其他分享> > 用PyMC3进行贝叶斯统计分析(代码+实例)

用PyMC3进行贝叶斯统计分析(代码+实例)

作者:互联网

 

问题类型1:参数估计

真实值是否等于X?

给出数据,对于参数,可能的值的概率分布是多少?

例子1:抛硬币问题

硬币扔了n次,正面朝上是h次。

参数问题

想知道 p 的可能性。给定 n 扔的次数和 h 正面朝上次数,p 的值很可能接近 0.5,比如说在 [0.48,0.52]?

说明

import pymc3 as pm
import numpy.random as npr
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
from collections import Counter
import seaborn as sns

sns.set_style('white')
sns.set_context('poster')

%load_ext autoreload
%autoreload 2
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
import warnings
warnings.filterwarnings('ignore')
from random import shuffle
total = 30
n_heads = 11
n_tails = total - n_heads
tosses = [1] * n_heads + [0] * n_tails
shuffle(tosses)

可视化数据:

def plot_coins():
    fig = plt.figure()
    ax = fig.add_subplot(1,1,1)
    ax.bar(list(Counter(tosses).keys()), list(Counter(tosses).values()))
    ax.set_xticks([0, 1])
    ax.set_xticklabels(['tails', 'heads'])
    ax.set_ylim(0, 20)
    ax.set_yticks(np.arange(0, 21, 5))        
    return fig

fig = plot_coins()
plt.show()

 建立模型:

with pm.Model() as coin_model:
    # Specify prior using Uniform object.
    p_prior = pm.Uniform('p', 0, 1)  

    # Specify likelihood using Bernoulli object.
    like = pm.Bernoulli('likelihood', p=p_prior, observed=tosses)     # "observed=data" is key for likelihood.

MCMC Inference Button 

with coin_model:
    step = pm.Metropolis()      # focus on this, the Inference Button:
    coin_trace = pm.sample(2000, step=step)

结果:

pm.traceplot(coin_trace)
plt.show()

pm.plot_posterior(coin_trace[100:], color='#87ceeb',rope=[0.48,0.52], point_estimate='mean', ref_val=0.5)
plt.show()

 

 

例子2:化学活性问题

我有一个新开发的分子X; X在阻止流感方面的效果有多好?

实验

数据

import numpy as np
import pandas as pd

chem_data = [(0.00080, 99),
(0.00800, 91),
(0.08000, 89),
(0.40000, 89),
(0.80000, 79),
(1.60000, 61),
(4.00000, 39),
(8.00000, 25),
(80.00000, 4)]

chem_df = pd.DataFrame(chem_data)
chem_df.columns = ['concentration', 'activity']
chem_df['concentration_log'] = chem_df['concentration'].apply(lambda x:np.log10(x)) 

参数问题

给出数据,化学品的IC50 值是多少, 以及其周围的不确定性?

说明

可视化数据

def plot_chemical_data(log=True):
    fig = plt.figure(figsize=(10,6))
    ax = fig.add_subplot(1,1,1)       
    if log:
        ax.scatter(x=chem_df['concentration_log'], y=chem_df['activity'])
        ax.set_xlabel('log10(concentration (mM))', fontsize=20)    
    else:
        ax.scatter(x=chem_df['concentration'], y=chem_df['activity'])
        ax.set_xlabel('concentration (mM)', fontsize=20)
    ax.set_xticklabels([int(i) for i in ax.get_xticks()], fontsize=18)
    ax.set_yticklabels([int(i) for i in ax.get_yticks()], fontsize=18)

    plt.hlines(y=50, xmin=min(ax.get_xlim()), xmax=max(ax.get_xlim()), linestyles='--',)    
    return fig

fig = plot_chemical_data(log=True)
plt.show()

with pm.Model() as ic50_model:
    beta = pm.HalfNormal('beta', sd=100**2)
    ic50_log10 = pm.Flat('IC50_log10')  # Flat prior
    # MATH WITH DISTRIBUTION OBJECTS!
    measurements = beta / (1 + np.exp(chem_df['concentration_log'].values - ic50_log10))

    y_like = pm.Normal('y_like', mu=measurements, observed=chem_df['activity'])  
    ic50 = pm.Deterministic('IC50', np.power(10, ic50_log10))

# MCMC Inference Button 
with ic50_model:
    step = pm.Metropolis()
    ic50_trace = pm.sample(10000, step=step)
pm.traceplot(ic50_trace[2000:], varnames=['IC50_log10', 'IC50'])  # live: sample from step 2000 onwards.
plt.show()

 

pm.plot_posterior(ic50_trace[4000:], varnames=['IC50'], color='#87ceeb', point_estimate='mean')
plt.show()

 该化学物质的IC50在约 [2mM,2.4mM](95%HPD)

 

标签:统计分析,set,df,贝叶斯,PyMC3,import,ax,chem,pm
来源: https://blog.csdn.net/weixin_38314865/article/details/97367566