This post opens up another important topic in machine learning: logistic regression (Logistic Regression). It actually inherits most of its machinery from linear regression, with only a small modification.

Formula

When we hear logistic regression (Logistic regression), we usually think of binary classification. So why does its name contain the word "regression"? Let us lift the veil.
First, a disclaimer: I cannot yet explain here why the function \(g(z)\) below is the right choice; for now we simply take the hypothesis to be

\[ \begin{align} &h_\theta(x)=g(\theta^Tx)=\frac{1}{1+e^{-\theta^{T}x}}\\ &g(z)=\frac{1}{1+e^{-z}} \end{align} \]
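One property of \(g\) worth noting now, because it is used in the gradient derivation further below, is that its derivative has a particularly simple form:

\[ \begin{align} g'(z)=\frac{e^{-z}}{(1+e^{-z})^{2}}=g(z)\big(1-g(z)\big) \end{align} \]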

Since \(0< h_{\theta}(x)< 1\), it can be interpreted as a probability. Because this is a binary classification problem, the following probabilistic assumption is natural (the last line merely merges the two cases \(y=1\) and \(y=0\) into a single expression):

\[ \begin{align} &P(y=1|x;\theta)=h_{\theta}(x)\\ \\ &P(y=0|x;\theta)=1-h_{\theta}(x)\\ \\ &\Rightarrow p(y|x;\theta)=(h_{\theta}(x))^{y}(1-h_{\theta}(x))^{1-y} \end{align} \]

With this, what we face is a probability problem: we want the observed data to fit this probability model as well as possible, which is, plainly speaking, a parameter-estimation problem. Here we use maximum likelihood estimation, so the likelihood over the \(m\) training examples is

\[ \begin{align} L(\theta)&=\prod_{i=1}^{m}p(y^{(i)}|x^{(i)};\theta)\\ &=\prod_{i=1}^{m}(h_{\theta}(x^{(i)}))^{y^{(i)}}(1-h_{\theta}(x^{(i)}))^{1-y^{(i)}} \end{align} \]
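Since the logarithm is monotonically increasing, maximizing \(L(\theta)\) is equivalent to maximizing the log-likelihood, which turns the product into a sum:

\[ \begin{align} l(\theta)=\log L(\theta)=\sum_{i=1}^{m}\Big(y^{(i)}\log h_{\theta}(x^{(i)})+(1-y^{(i)})\log\big(1-h_{\theta}(x^{(i)})\big)\Big) \end{align} \]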

The next step is to take the partial derivative of \(l(\theta)\) with respect to \(\theta_{j}\); for a single training example \((x,y)\) it works out to

\[ \begin{align} \frac{\partial }{\partial \theta_{j}}l(\theta)=(y-h_{\theta}(x))x_{j} \end{align} \]
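For completeness, here is the short derivation behind that result, using the identity \(g'(z)=g(z)(1-g(z))\) noted earlier:

\[ \begin{align} \frac{\partial}{\partial\theta_{j}}l(\theta)&=\left(\frac{y}{g(\theta^{T}x)}-\frac{1-y}{1-g(\theta^{T}x)}\right)\frac{\partial}{\partial\theta_{j}}g(\theta^{T}x)\\ &=\left(\frac{y}{g(\theta^{T}x)}-\frac{1-y}{1-g(\theta^{T}x)}\right)g(\theta^{T}x)\big(1-g(\theta^{T}x)\big)x_{j}\\ &=\Big(y\big(1-g(\theta^{T}x)\big)-(1-y)g(\theta^{T}x)\Big)x_{j}\\ &=(y-h_{\theta}(x))x_{j} \end{align} \]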

At this point one naturally asks whether there is a closed-form solution. Unfortunately, unlike linear regression, there is none, so we have to resort to an iterative algorithm. Ascending the gradient of the log-likelihood gives the per-example update rule

\[ \theta_{j}:=\theta_{j}+\alpha(y-h_{\theta}(x))x_{j} \]
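Summed over all examples and written in matrix form, the batch version of this update is \(\theta:=\theta+\alpha X^{T}(y-h_{\theta}(X))\). Below is a minimal NumPy sketch of that batch rule, assuming plain ndarrays; the names sigmoid, batch_gradient_ascent, X, y, alpha and num_iters are placeholders for illustration, not taken from the full script that follows.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def batch_gradient_ascent(X, y, alpha=0.01, num_iters=500):
    # X: (m, n) design matrix whose first column is all ones (the intercept term)
    # y: (m, 1) column vector of 0/1 labels
    theta = np.zeros((X.shape[1], 1))
    for _ in range(num_iters):
        error = y - sigmoid(X @ theta)          # residuals y - h_theta(x), shape (m, 1)
        theta = theta + alpha * (X.T @ error)   # theta_j += alpha * sum_i (y_i - h_i) * x_ij
    return theta

The full script below implements the same idea on np.mat data, plus stochastic and "smoothed" stochastic variants.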

Code

import numpy as np
import numpy.matlib
import matplotlib.pyplot as plt
import time

def sigmoid(inX):
    return 1.0 / (1 + np.exp(-inX))

def trainLogRegres(train_x, train_y, opts):
    # calculate training time
    startTime = time.time()
    numSamples, numFeatures = train_x.shape
    alpha = opts['alpha']; maxIter = opts['maxIter']
    theta = np.matlib.ones((numFeatures, 1))
    # optimize through gradient ascent on the log-likelihood
    for k in range(maxIter):
        if opts['optimizeType'] == 'gradDescent':  # batch gradient ascent
            output = sigmoid(train_x * theta)
            error = train_y - output
            theta = theta + alpha * train_x.T * error
        elif opts['optimizeType'] == 'stocGradDescent':  # stochastic gradient ascent
            for i in range(numSamples):
                output = sigmoid(train_x[i, :] * theta)
                error = train_y[i, 0] - output
                theta = theta + alpha * train_x[i, :].T * error
        elif opts['optimizeType'] == 'smoothStocGradDescent':  # smoothed stochastic gradient ascent
            # randomly select samples (without replacement within one pass) to reduce cyclic fluctuations
            dataIndex = list(range(numSamples))
            for i in range(numSamples):
                alpha = 4.0 / (1.0 + k + i) + 0.01  # step size decays as training proceeds
                randIndex = int(np.random.uniform(0, len(dataIndex)))
                sampleIndex = dataIndex[randIndex]  # map back to the original sample index
                output = sigmoid(train_x[sampleIndex, :] * theta)
                error = train_y[sampleIndex, 0] - output
                theta = theta + alpha * train_x[sampleIndex, :].T * error
                del dataIndex[randIndex]  # during one pass, delete the sample just used
        else:
            raise NameError('Not support optimize method type!')
    print('Congratulations, training complete! Took %fs!' % (time.time() - startTime))
    return theta

def testLogRegres(theta, test_x, test_y):
    numSamples, numFeatures = np.shape(test_x)
    matchCount = 0
    for i in range(numSamples):
        predict = sigmoid(test_x[i, :] * theta)[0, 0] > 0.5  # threshold the probability at 0.5
        if predict == bool(test_y[i, 0]):
            matchCount += 1
    accuracy = float(matchCount) / numSamples
    return accuracy

def showLogRegres(theta, train_x, train_y):
    # notice: train_x and train_y are np.mat datatype
    numSamples, numFeatures = np.shape(train_x)
    if numFeatures != 3:
        print("Sorry! I can not draw because the dimension of your data is not 2!")
        return 1
    # draw all samples
    for i in range(numSamples):
        if int(train_y[i, 0]) == 0:
            plt.plot(train_x[i, 1], train_x[i, 2], 'ro')
        elif int(train_y[i, 0]) == 1:
            plt.plot(train_x[i, 1], train_x[i, 2], 'bo')
    # draw the classify line theta0 + theta1*x1 + theta2*x2 = 0
    min_x = train_x[:, 1].min()
    max_x = train_x[:, 1].max()
    theta = theta.getA()  # convert mat to array
    y_min_x = float(-theta[0] - theta[1] * min_x) / theta[2]
    y_max_x = float(-theta[0] - theta[1] * max_x) / theta[2]
    plt.plot([min_x, max_x], [y_min_x, y_max_x], '-g')
    plt.xlabel('X1'); plt.ylabel('X2')
    plt.show()

def loadData():
    data = np.loadtxt('lr_nonlinear_data.txt')
    train_x = data[:, 0:2]
    train_y = data[:, 2:]
    train_x = np.insert(train_x, 0, 1, axis=1)  # prepend the intercept column x0 = 1
    return np.mat(train_x), np.mat(train_y)

## step 1: load data
print("step 1: load data...")
train_x, train_y = loadData()
test_x = train_x; test_y = train_y

## step 2: training...
print("step 2: training...")
opts = {'alpha': 0.01, 'maxIter': 500, 'optimizeType': 'smoothStocGradDescent'}
optimalTheta = trainLogRegres(train_x, train_y, opts)

## step 3: testing
print("step 3: testing...")
accuracy = testLogRegres(optimalTheta, test_x, test_y)

## step 4: show the result
print("step 4: show the result...")
print('The classify accuracy is: %.3f%%' % (accuracy * 100))
showLogRegres(optimalTheta, train_x, train_y)
print(sigmoid(train_x * optimalTheta))

Up to this point we still have not said how regression relates to classification. If you read the printed output carefully, you will notice that after \(\theta^{T}x\) is mapped through \(g(z)\), the result is either close to 1 or close to 0. We are in fact still doing regression: we are fitting the function \(h_{\theta}(x)\); it is just that the learned \(h_{\theta}(x)\) pushes the outputs toward the two extremes, and that is what produces the classification effect. Even more striking, in the plot the line \(\theta_{0}+\theta_{1}x_{1}+\theta_{2}x_{2}=0\) defined by the learned \(\theta\) turns out to be exactly the boundary separating the two classes of points. In short, logistic regression does classification by means of regression.
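That the boundary is a straight line is no coincidence. Assuming the usual rule of predicting class 1 whenever \(h_{\theta}(x)>0.5\) (which is what testLogRegres does), and using the fact that \(g\) is increasing with \(g(0)=0.5\), we get

\[ h_{\theta}(x)>0.5 \iff \theta^{T}x>0 \iff \theta_{0}+\theta_{1}x_{1}+\theta_{2}x_{2}>0 \]

so the learned line in the X1–X2 plane is exactly where the predicted probability crosses 0.5.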

Download the data: Data