2016-11-25

# Linear Classification

$$f(x_i, W, b) = W x_i + b$$

$$f(x_i, W) = W x_i$$
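The bias trick behind the last equation (folding $b$ into $W$ by appending a constant 1 to the input) can be sketched in numpy; the shapes below are a small hypothetical example, not the CIFAR-10 dimensions used later:

```python
import numpy as np

# Hypothetical small example: 4-dimensional inputs, 3 classes.
W = np.random.randn(3, 4)   # weight matrix
b = np.random.randn(3)      # bias vector
x = np.random.randn(4)      # one input column

scores_separate = W.dot(x) + b

# Bias trick: append a constant 1 to x and fold b into W as an extra column.
x_ext = np.append(x, 1.0)            # now 5-dimensional
W_ext = np.hstack([W, b[:, None]])   # now 3 x 5

scores_combined = W_ext.dot(x_ext)
# The two score vectors agree (up to floating point error).
```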

## Multiclass Support Vector Machine Loss

$$L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + \Delta)$$

$$L_i = \max(0, -7 - 13 + 10) + \max(0, 11 - 13 + 10)$$
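This worked example can be checked in a couple of lines of numpy (assuming, consistent with the terms above, scores [13, -7, 11] with the correct class at index 0 and a margin of 10):

```python
import numpy as np

delta = 10.0                           # the margin used in this example
scores = np.array([13.0, -7.0, 11.0])  # assumed scores; correct class is index 0
y = 0

margins = np.maximum(0, scores - scores[y] + delta)
margins[y] = 0                         # drop the j == y_i term
loss_i = margins.sum()                 # max(0, -10) + max(0, 8) = 0 + 8 = 8
```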

$$L_i = \sum_{j \neq y_i} \max(0, w_j^T x_i - w_{y_i}^T x_i + \Delta)$$

$$R(W) = \sum_k \sum_l W_{k,l}^2$$

$$L = \frac{1}{N} \sum_i L_i + \lambda R(W)$$

$$L = \frac{1}{N} \sum_i \sum_{j \neq y_i} \left[ \max(0, f(x_i; W)_j - f(x_i; W)_{y_i} + \Delta) \right] + \lambda \sum_k \sum_l W_{k,l}^2$$
Here $N$ is the number of images in the training set. We add the regularization penalty, weighted by a hyperparameter $\lambda$, to the loss objective. There is generally no simple way of setting this hyperparameter; it is usually determined by cross-validation.

```python
import numpy as np

def L_i(x, y, W):
  """
  unvectorized version. Compute the multiclass svm loss for a single example (x, y)
  - x is a column vector representing an image (e.g. 3073 x 1 in CIFAR-10)
    with an appended bias dimension in the 3073-rd position (i.e. bias trick)
  - y is an integer giving index of correct class (e.g. between 0 and 9 in CIFAR-10)
  - W is the weight matrix (e.g. 10 x 3073 in CIFAR-10)
  """
  delta = 1.0 # see notes about delta later in this section
  scores = W.dot(x) # scores becomes of size 10 x 1, the scores for each class
  correct_class_score = scores[y]
  D = W.shape[0] # number of classes, e.g. 10
  loss_i = 0.0
  for j in range(D): # iterate over all classes
    if j == y:
      # skip the true class to only loop over incorrect classes
      continue
    # accumulate loss for the i-th example
    loss_i += max(0, scores[j] - correct_class_score + delta)
  return loss_i

def L_i_vectorized(x, y, W):
  """
  A faster half-vectorized implementation. half-vectorized
  refers to the fact that for a single example the implementation contains
  no for loops, but there is still one loop over the examples (outside this function)
  """
  delta = 1.0
  scores = W.dot(x)
  # compute the margins for all classes in one vector operation
  margins = np.maximum(0, scores - scores[y] + delta)
  # on y-th position scores[y] - scores[y] canceled and gave delta. We want
  # to ignore the y-th position and only consider margin on max wrong class
  margins[y] = 0
  loss_i = np.sum(margins)
  return loss_i

def L(X, y, W):
  """
  fully-vectorized implementation :
  - X holds all the training examples as columns (e.g. 3073 x 50,000 in CIFAR-10)
  - y is array of integers specifying correct class (e.g. 50,000-D array)
  - W are weights (e.g. 10 x 3073)
  """
  # evaluate loss over all examples in X without using any for loops
  # left as exercise to reader in the assignment
```
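For reference, one possible fully vectorized version can be sketched as follows (a sketch only, not presented as the assignment's reference solution; it computes the data loss without the regularization term):

```python
import numpy as np

def L_vectorized(X, y, W, delta=1.0):
    """
    Sketch of a fully vectorized multiclass SVM data loss.
    X holds examples as columns, y holds correct class indices,
    W is the (num_classes x dim) weight matrix.
    """
    num_train = X.shape[1]
    scores = W.dot(X)                          # num_classes x num_train
    correct = scores[y, np.arange(num_train)]  # correct-class score per column
    margins = np.maximum(0, scores - correct + delta)
    margins[y, np.arange(num_train)] = 0       # zero out the j == y_i terms
    return np.sum(margins) / num_train
```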

The SVM loss is one particular way of quantifying how consistent the predictions on the training data are with the ground-truth labels. Making good predictions on the training set is then equivalent to minimizing the loss.

## Practical Considerations

$$L_i = C \max(0, 1 - y_i w^T x_i) + R(W)$$
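A minimal numeric sketch of this binary hinge loss (all values below are hypothetical, and the regularization term $R(W)$ is omitted; note that the binary labels are in $\{-1, +1\}$):

```python
import numpy as np

C = 1.0                          # hypothetical regularization trade-off
w = np.array([0.5, -0.3])        # hypothetical weights
x = np.array([1.0, 2.0])         # one example
y = -1                           # binary label in {-1, +1}

margin = y * w.dot(x)            # w.x = 0.5 - 0.6 = -0.1, so y * w.x = 0.1
loss_i = C * max(0, 1 - margin)  # hinge loss for this example, R(W) omitted
```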

## Softmax Classifier

The SVM is one of two commonly seen classifiers; the other is the Softmax classifier, which uses a different loss function. If you have heard of the binary logistic regression classifier before, the Softmax classifier is its generalization to multiple classes. Unlike the SVM, which treats the outputs $f(x_i, W)$ as (uncalibrated and possibly hard to interpret) scores for each class, the Softmax classifier gives a more intuitive output. In the Softmax classifier, the mapping $f(x_i; W) = W x_i$ stays unchanged, but we now interpret these scores as unnormalized log probabilities for each class, and replace the hinge loss with a cross-entropy loss of the form:
$$L_i = -\log\left(\frac{e^{f_{y_i}}}{\sum_j e^{f_j}}\right) \quad \text{or equivalently} \quad L_i = -f_{y_i} + \log \sum_j e^{f_j}$$
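The equivalence of the two forms is easy to verify numerically (the scores and class index below are hypothetical):

```python
import numpy as np

f = np.array([1.0, -2.0, 0.0])   # hypothetical class scores for one example
y = 0                            # index of the correct class

# Direct form: negative log of the normalized probability of the correct class.
loss_a = -np.log(np.exp(f[y]) / np.sum(np.exp(f)))
# Rearranged form: -f_y + log sum_j exp(f_j).
loss_b = -f[y] + np.log(np.sum(np.exp(f)))
# loss_a equals loss_b up to floating point error
```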

$$H(p, q) = -\sum_x p(x) \log q(x)$$
The Softmax classifier minimizes the cross-entropy between the estimated class probabilities ($q = e^{f_{y_i}} / \sum_j e^{f_j}$) and the "true" distribution, which places a single 1 at the position of the correct class and 0 everywhere else (e.g. $p = [0, \ldots, 1, \ldots, 0]$). Moreover, since the cross-entropy can be written as the entropy plus the Kullback-Leibler divergence, $H(p, q) = H(p) + D_{KL}(p \| q)$, and the entropy of the delta distribution $p$ is zero, minimizing the cross-entropy is equivalent to minimizing the KL divergence between the two distributions (a measure of how different they are). In other words, the cross-entropy objective wants the predicted distribution to place all of its mass on the correct class.

$$P(y_i \mid x_i; W) = \frac{e^{f_{y_i}}}{\sum_j e^{f_j}}$$

$$\frac{e^{f_{y_i}}}{\sum_j e^{f_j}} = \frac{C e^{f_{y_i}}}{C \sum_j e^{f_j}} = \frac{e^{f_{y_i} + \log C}}{\sum_j e^{f_j + \log C}}$$

```python
import numpy as np

f = np.array([123, 456, 789]) # example with 3 classes and each having large scores
p = np.exp(f) / np.sum(np.exp(f)) # Bad: numeric problem, potential blowup

# instead: first shift the values of f so that the highest number is 0:
f -= np.max(f) # f becomes [-666, -333, 0]
p = np.exp(f) / np.sum(np.exp(f)) # safe to do, gives the correct answer
```

## SVM vs. Softmax

An example of the difference between the SVM and Softmax classifiers. In both cases we compute the same score vector $f$ (by matrix multiplication in this section). The difference lies in the interpretation of the scores in $f$: the SVM interprets them as class scores, and its loss function encourages the correct class to have a score at least $\Delta$ higher than the other classes. The Softmax classifier instead interprets the scores as unnormalized log probabilities for each class, and then encourages the normalized log probability of the correct class to be high (equivalently, the log probabilities of the incorrect classes to be low). In this example the SVM loss is 1.58 and the Softmax loss is 1.04 (note that the logarithm here uses base $e$, not 2 or 10). These numbers are not comparable with each other; they are only meaningful within the same classifier and on the same data.
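Both losses can be computed side by side on a shared score vector. The scores below are an assumed example chosen to reproduce the quoted losses of 1.58 and 1.04 (with the correct class at index 2 and $\Delta = 1$):

```python
import numpy as np

f = np.array([-2.85, 0.86, 0.28])  # assumed scores; correct class is index 2
y = 2
delta = 1.0

# SVM hinge loss; subtracting delta removes the j == y term, which is max(0, delta).
svm_loss = np.sum(np.maximum(0, f - f[y] + delta)) - delta
# Softmax cross-entropy loss (natural log).
softmax_loss = -f[y] + np.log(np.sum(np.exp(f)))
# svm_loss is 1.58; softmax_loss rounds to 1.04
```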

The Softmax classifier provides "probabilities" for each class. Unlike the SVM, which computes uncalibrated and not-easy-to-interpret scores for all classes, the Softmax classifier lets us compute the "probabilities" of all labels (classes). For example, given an image, an SVM classifier might give scores [12.5, 0.6, -23.0] for the classes "cat", "dog" and "ship". The Softmax classifier can instead compute the probabilities of the three labels as [0.9, 0.09, 0.01], which lets you interpret its confidence in each class. The reason we put the word "probabilities" in quotes is that how peaky or diffuse these probabilities are is directly determined by the regularization strength $\lambda$, which you are in charge of as input to the system. For example, suppose the unnormalized log probabilities for three classes are [1, -2, 0]. The softmax function would then compute:
$$[1, -2, 0] \rightarrow [e^1, e^{-2}, e^0] = [2.71, 0.14, 1] \rightarrow [0.7, 0.04, 0.26]$$

If the regularization strength $\lambda$ were higher, the weights $W$ would be penalized more and would shrink; if, say, the weights (and hence the scores) were halved to [0.5, -1, 0], the softmax would compute:

$$[0.5, -1, 0] \rightarrow [e^{0.5}, e^{-1}, e^0] = [1.65, 0.37, 1] \rightarrow [0.55, 0.12, 0.33]$$
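The effect of shrinking the scores on how diffuse the resulting distribution is can be checked directly (this reuses the max-shift stabilization shown earlier; the two score vectors are the ones from the example above):

```python
import numpy as np

def softmax(f):
    f = f - np.max(f)  # shift so the largest score is 0, for numeric stability
    e = np.exp(f)
    return e / e.sum()

p1 = softmax(np.array([1.0, -2.0, 0.0]))  # approx. [0.7, 0.04, 0.26]
p2 = softmax(np.array([0.5, -1.0, 0.0]))  # halved scores: approx. [0.55, 0.12, 0.33]
# p2 is more diffuse (closer to uniform) than p1
```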