
Logistic Regression: Theory and Practice


Logistic Regression

1. What is Logistic Regression?

Logistic regression is a widely used classification method belonging to the family of log-linear models. Using logistic regression, we fit a regression formula for the decision boundary based on the available data, and then use it to classify new samples.

Regression: given a set of data points, fitting a straight line to those points is called regression.

2. Logistic Regression and the Sigmoid Function

The Sigmoid function:

\[\sigma(z) = \frac{1}{1 + e^{-z}} \tag{1} \]

The figure below shows the Sigmoid curve. When z is 0, the Sigmoid value is 0.5. As z increases, the Sigmoid value approaches 1; as z decreases, the Sigmoid value approaches 0.
(Figure: Sigmoid function curve)
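As a quick way to reproduce this curve, here is a minimal plotting sketch (assuming numpy and matplotlib are installed; the variable names are only illustrative):

import numpy as np
import matplotlib.pyplot as plt

z = np.linspace(-10, 10, 200)              # input range for the curve
sigma = 1.0 / (1.0 + np.exp(-z))           # Sigmoid values
plt.plot(z, sigma)
plt.axhline(0.5, linestyle='--')           # sigma(0) = 0.5 reference line
plt.xlabel('z'); plt.ylabel('sigma(z)')
plt.show()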

To build a logistic regression classifier, we multiply each feature by a regression coefficient, sum the results, and feed the sum into the Sigmoid function to obtain a value between 0 and 1. If the value is greater than 0.5, the sample is classified as class 1; otherwise it is classified as class 0.

Logistic regression:

Consider an n-dimensional feature vector \(x = (x_0,x_1,x_2,\cdots,x_n)\) and a parameter vector \(w=(w_0,w_1,w_2,\cdots,w_n)\). Taking a linear weighted sum of the input gives:

\[z = w^Tx = w_0x_0+ w_1x_1 + w_2x_2 + \cdots \cdots + w_nx_n \tag{2} \]

Substituting z into the Sigmoid function yields a value between 0 and 1. If the value is greater than 0.5, the sample is classified as class 1; otherwise as class 0. That is,

\[\sigma(z) = \frac{1}{1 + e^{-z}} = \frac{1}{1 + e^{-w^Tx}} = \sigma(w^Tx) \]
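As a minimal sketch of this 0.5-threshold rule (the helper name predict and the example numbers are illustrative, not part of the implementation in section 3):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, x):
    # classify as 1 if sigma(w^T x) > 0.5, otherwise 0
    return 1 if sigmoid(np.dot(w, x)) > 0.5 else 0

# example: w = (w0, w1, w2), x = (x0 = 1, x1, x2)
print(predict(np.array([4.0, 0.5, -0.6]), np.array([1.0, 1.0, 9.0])))    # prints 0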

The question now becomes: how do we determine the best parameters \(w\) so that classification is as accurate as possible?

3. Parameter Optimization for Logistic Regression

3.1.1 Gradient Ascent

Gradient ascent searches along the gradient direction of a function to find its maximum. The gradient of a function f(x,y) is written as

\[\nabla f(x,y) = \begin{bmatrix} \frac{\partial f(x,y)}{\partial x}\\ \frac{\partial f(x,y)}{\partial y} \end{bmatrix} \]

The iterative update of the gradient ascent algorithm is:

\[w = w + \alpha \nabla_w f(w) \]

This update is applied repeatedly until a stopping condition is reached, such as the error falling within an acceptable range or the iteration count reaching a preset value.

Note: gradient ascent is used to find maxima, while gradient descent is used to find minima.
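For intuition, here is a minimal sketch of this update rule on the simple concave function f(w) = -(w - 2)^2, whose maximum is at w = 2 (the function and the step size are assumptions made purely for illustration):

def grad(w):
    # gradient of f(w) = -(w - 2)**2
    return -2.0 * (w - 2.0)

w = 0.0
alpha = 0.1                     # step size
for _ in range(100):
    w = w + alpha * grad(w)     # w = w + alpha * grad f(w)
print(w)                        # approaches 2.0, the maximizer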

3.1.2 The Objective Function and Gradient Ascent

The objective function

Before applying gradient ascent, we need to know what we are optimizing, i.e. what the objective function is; the algorithm is then applied to that function. Consider a binary classification problem with classes 1 and 0. The prediction function is:

\[f_w(x)= \sigma(w^Tx) = \frac{1}{1 + e^{-w^Tx}} \]

The value of \(f_w(x)\) represents the probability that \(y=1\), so the probabilities of class 1 and class 0 are:

\[P(y=1|x;w) = f_w(x), \qquad P(y=0|x;w) = 1 - f_w(x) \]

That is:

\[P(y|x;w) = (f_w(x))^y (1 - f_w(x))^{1-y} \]

The likelihood function is:

\[L(w) = \prod_{i=1}^{m}P(y^{(i)}|x^{(i)};w) = \prod_{i=1}^{m}(f_w(x^{(i)}))^{y^{(i)}} (1 - f_w(x^{(i)}))^{1-y^{(i)}} \]

where \(m\) is the number of samples.

The log-likelihood function is:

\[l(w) = \ln L(w) = \sum_{i=1}^{m}\left\{y^{(i)}\ln(f_w(x^{(i)})) + (1-y^{(i)})\ln(1 - f_w(x^{(i)}))\right\} \]

Maximum likelihood estimation seeks the \(w\) that maximizes \(l(w)\), so the objective function is \(l(w)\).

The gradient (using \(\sigma'(z) = \sigma(z)(1-\sigma(z))\) and \(\partial (w^Tx)/\partial w_j = x_j\)):

\[\begin{align} \frac{\partial l(w)}{\partial w_j} &= \sum_{i = 1}^{m}\left\{ y^{(i)}\frac{1}{f_w(x^{(i)})}\frac{\partial f_w(x^{(i)})}{\partial w_j} - (1-y^{(i)})\frac{1}{1 - f_w(x^{(i)})}\frac{\partial f_w(x^{(i)})}{\partial w_j} \right\} \\ &=\sum_{i = 1}^{m}\left\{ \frac{\partial \sigma(w^Tx^{(i)})}{\partial w_j}\left(y^{(i)}\frac{1}{\sigma(w^Tx^{(i)})} - (1-y^{(i)})\frac{1}{1 - \sigma(w^Tx^{(i)})}\right) \right\} \\ &=\sum_{i = 1}^{m}\left\{ \sigma(w^Tx^{(i)}) (1 - \sigma(w^Tx^{(i)})) \frac{\partial w^Tx^{(i)}}{\partial w_j} \left(y^{(i)}\frac{1}{\sigma(w^Tx^{(i)})} - (1-y^{(i)})\frac{1}{1 - \sigma(w^Tx^{(i)})}\right) \right\} \\ &=\sum_{i = 1}^{m}\left\{ \left(y^{(i)}(1 - \sigma(w^Tx^{(i)})) - (1-y^{(i)})\sigma(w^Tx^{(i)})\right)x^{(i)}_j \right\} \\ &=\sum_{i = 1}^{m}\left\{ (y^{(i)}- \sigma(w^Tx^{(i)}))x^{(i)}_j \right\} \\ &=\sum_{i = 1}^{m}\left\{ (y^{(i)}- f_w(x^{(i)}))x^{(i)}_j \right\} \end{align} \]
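Collecting all components, the gradient can be written in matrix form as \(X^T(y - \sigma(Xw))\), which is exactly the update used in the code of section 3.1.3 below. A minimal numpy sketch of the objective and its gradient (the function names and array shapes are illustrative assumptions):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(w, X, y):
    # l(w) = sum_i y_i * ln(sigma(w^T x_i)) + (1 - y_i) * ln(1 - sigma(w^T x_i))
    p = sigmoid(X @ w)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def log_likelihood_grad(w, X, y):
    # gradient of l(w): X^T (y - sigma(Xw))
    return X.T @ (y - sigmoid(X @ w))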

3.1.3 Gradient Ascent: Code Implementation

import numpy as np
import matplotlib.pyplot as plt

def loadDataSet():
    dataMat = []
    labelMat = []
    file = open('testSet.txt','r')				# testSet.txt is provided in the appendix
    for line in file:
        strLine = line.strip().split()
        dataMat.append([1.0,float(strLine[0]),float(strLine[1])])   # x0 = 1.0 for the intercept term
        labelMat.append([strLine[2]])
    return dataMat,labelMat

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lossFunction(y,y_hat):		# the (y - y_hat) error term that appears in the gradient
    return y - y_hat

def gradAscent(data,labels):
    dataMat = np.mat(data, dtype = 'float64')                      # convert to numpy matrix types
    labelMat = np.mat(labels, dtype = 'float64')
    m,n = dataMat.shape
    lr = 0.001                                                     # learning rate (step size)
    epochs = 500                                                   # number of iterations
    weights = np.ones((n,1))
    for epoch in range(epochs):
        labelEst = sigmoid(dataMat*weights)                        # sigma(Xw)
        loss = lossFunction(labelMat,labelEst)                     # error term y - sigma(Xw)
        weights = weights + lr * dataMat.transpose() * loss        # ascent step: w = w + lr * X^T(y - sigma(Xw))
    return weights

def plotBestFit(weight):
    weightArray = weight.getA()
    dataMat, labelMat = loadDataSet()
    dataArr = np.array(dataMat)
    n = dataArr.shape[0]
    xcord1 = []; ycord1 = []
    xcord2 = []; ycord2 = []
    for i in range(n):
        if int(labelMat[i][0]) == 1:
            xcord1.append(dataArr[i,1])
            ycord1.append(dataArr[i,2])
        else:
            xcord2.append(dataArr[i,1])
            ycord2.append(dataArr[i,2])
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xcord1,ycord1,s=30,c='red',marker='s')
    ax.scatter(xcord2,ycord2,s=30,c='green')
    x = np.arange(-3.0,3.0,0.1)
    y = (-weightArray[0] - weightArray[1] * x) / weightArray[2]
    ax.plot(x,y)
    plt.xlabel('x1'); plt.ylabel('x2')
    plt.show()

data,labels = loadDataSet()
weights = gradAscent(data,labels)
print(weights)
plotBestFit(weights)

Visualization:

3.1.4 An Improved Stochastic Gradient Ascent Algorithm

Unlike batch gradient descent, stochastic gradient descent updates the parameters using a single sample per iteration, which speeds up training; the same idea applies to gradient ascent here. A recommended post that explains this well: batch gradient descent, stochastic gradient descent, and mini-batch gradient descent.

import random

def stocGradAscent1(dataMatrix, classLabels, numIter=150):
    dataMatrix = np.array(dataMatrix, dtype='float64')             # convert to numpy arrays
    classLabels = np.array(classLabels, dtype='float64').reshape(-1)
    m,n = np.shape(dataMatrix)
    weights = np.ones(n)
    for j in range(numIter):
        dataIndex = list(range(m))                                 # samples not yet used in this pass
        for i in range(m):
            lr = 4 / (1.0 + j + i) + 0.01                          # learning rate decays as training progresses
            randIndex = int(random.uniform(0, len(dataIndex)))     # pick a random remaining sample
            sampleIndex = dataIndex[randIndex]
            h = sigmoid(sum(dataMatrix[sampleIndex] * weights))
            error = classLabels[sampleIndex] - h
            weights = weights + lr * error * dataMatrix[sampleIndex]
            del dataIndex[randIndex]
    return weights
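The linked post also covers mini-batch updates, which use a small random subset of samples per step. Here is a minimal sketch of that idea applied to gradient ascent (the batch size, learning rate, and function name are illustrative assumptions, not part of the original code):

def miniBatchGradAscent(dataMatrix, classLabels, numIter=150, batchSize=10, lr=0.01):
    dataMatrix = np.array(dataMatrix, dtype='float64')
    classLabels = np.array(classLabels, dtype='float64').reshape(-1)
    m, n = dataMatrix.shape
    weights = np.ones(n)
    for _ in range(numIter):
        batch = np.random.choice(m, batchSize, replace=False)      # random subset of sample indices
        h = sigmoid(dataMatrix[batch] @ weights)                   # predictions for the batch
        error = classLabels[batch] - h
        weights = weights + lr * dataMatrix[batch].T @ error       # update from the batch only
    return weights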

4. Appendix

The testSet.txt file:

-0.017612	14.053064	0
-1.395634	4.662541	1
-0.752157	6.538620	0
-1.322371	7.152853	0
0.423363	11.054677	0
0.406704	7.067335	1
0.667394	12.741452	0
-2.460150	6.866805	1
0.569411	9.548755	0
-0.026632	10.427743	0
0.850433	6.920334	1
1.347183	13.175500	0
1.176813	3.167020	1
-1.781871	9.097953	0
-0.566606	5.749003	1
0.931635	1.589505	1
-0.024205	6.151823	1
-0.036453	2.690988	1
-0.196949	0.444165	1
1.014459	5.754399	1
1.985298	3.230619	1
-1.693453	-0.557540	1
-0.576525	11.778922	0
-0.346811	-1.678730	1
-2.124484	2.672471	1
1.217916	9.597015	0
-0.733928	9.098687	0
-3.642001	-1.618087	1
0.315985	3.523953	1
1.416614	9.619232	0
-0.386323	3.989286	1
0.556921	8.294984	1
1.224863	11.587360	0
-1.347803	-2.406051	1
1.196604	4.951851	1
0.275221	9.543647	0
0.470575	9.332488	0
-1.889567	9.542662	0
-1.527893	12.150579	0
-1.185247	11.309318	0
-0.445678	3.297303	1
1.042222	6.105155	1
-0.618787	10.320986	0
1.152083	0.548467	1
0.828534	2.676045	1
-1.237728	10.549033	0
-0.683565	-2.166125	1
0.229456	5.921938	1
-0.959885	11.555336	0
0.492911	10.993324	0
0.184992	8.721488	0
-0.355715	10.325976	0
-0.397822	8.058397	0
0.824839	13.730343	0
1.507278	5.027866	1
0.099671	6.835839	1
-0.344008	10.717485	0
1.785928	7.718645	1
-0.918801	11.560217	0
-0.364009	4.747300	1
-0.841722	4.119083	1
0.490426	1.960539	1
-0.007194	9.075792	0
0.356107	12.447863	0
0.342578	12.281162	0
-0.810823	-1.466018	1
2.530777	6.476801	1
1.296683	11.607559	0
0.475487	12.040035	0
-0.783277	11.009725	0
0.074798	11.023650	0
-1.337472	0.468339	1
-0.102781	13.763651	0
-0.147324	2.874846	1
0.518389	9.887035	0
1.015399	7.571882	0
-1.658086	-0.027255	1
1.319944	2.171228	1
2.056216	5.019981	1
-0.851633	4.375691	1
-1.510047	6.061992	0
-1.076637	-3.181888	1
1.821096	10.283990	0
3.010150	8.401766	1
-1.099458	1.688274	1
-0.834872	-1.733869	1
-0.846637	3.849075	1
1.400102	12.628781	0
1.752842	5.468166	1
0.078557	0.059736	1
0.089392	-0.715300	1
1.825662	12.693808	0
0.197445	9.744638	0
0.126117	0.922311	1
-0.679797	1.220530	1
0.677983	2.556666	1
0.761349	10.693862	0
-2.168791	0.143632	1
1.388610	9.341997	0
0.317029	14.739025	0

Source: https://www.cnblogs.com/Aegsteh/p/16226100.html