Laplacian smoothing

  Concept

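  • Zero-probability problem: when computing the probability of an event, if the event never appears in the observed sample set (training set), its estimated probability is $0$. This is unreasonable: an event that has not been observed cannot be concluded to be impossible (that is, to have probability $0$).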

  Laplacian smoothing is designed to solve the zero-probability problem.

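  • The French mathematician Laplace was the first to propose adding $1$ to the counts to estimate the probability of events that have never been observed.
  • Theoretical assumption: when the training sample is large, adding $1$ to the count of each value of $x$ changes the estimated probabilities negligibly, while conveniently and effectively avoiding the zero-probability problem.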
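A small numerical check can make the large-sample claim concrete. The sketch below is illustrative only: the helper names `mle`, `add_one` and `max_shift` are hypothetical, and the counts are made-up values over three categories that keep the same proportions while the total grows. It prints the total number of observations and the largest change that add-one smoothing makes to any estimated probability.

```python
def mle(counts):
    """Unsmoothed maximum likelihood estimates: count / total."""
    total = sum(counts)
    return [c / total for c in counts]


def add_one(counts):
    """Laplace (add-one) estimates: (count + 1) / (total + k)."""
    total, k = sum(counts), len(counts)
    return [(c + 1) / (total + k) for c in counts]


def max_shift(counts):
    """Largest absolute change in any estimated probability caused by add-one."""
    return max(abs(a - b) for a, b in zip(mle(counts), add_one(counts)))


# Hypothetical counts over k = 3 values, keeping the same proportions
# while the total number of observations grows.
for scale in (1, 10, 100, 1000):
    counts = [0, 99 * scale, 1 * scale]
    print(sum(counts), round(max_shift(counts), 6))
```

For a value observed $c_j$ times out of $m$, the change works out to $|m - k c_j| / (m(m + k))$, so the printed shift should shrink roughly in proportion to $1/m$.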

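  Specific formula: for a random variable $z$ taking values in $\{1, 2, 3, \ldots, k\}$, given the observations $\left\{z^{(1)}, z^{(2)}, z^{(3)}, \ldots, z^{(m)}\right\}$ from $m$ trials, the maximum likelihood estimate of $\varphi_j = P(z = j)$ is computed as follows, where $I\{\cdot\}$ is the indicator function ($1$ if the condition holds, $0$ otherwise):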

    $\varphi_{j}=\frac{\sum_{i=1}^{m} I\left\{z^{(i)}=j\right\}}{m}$

  With Laplace smoothing, the formula becomes:

    $\varphi_{j}=\frac{\sum_{i=1}^{m} I\left\{z^{(i)}=j\right\}+1}{m+k}$

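  That is, add $1$ to the numerator and add the number of possible values to the denominator. In short: the numerator gains one and the denominator gains $k$, where $k$ is the number of categories. The denominator must grow by $k$ because each of the $k$ numerators grows by $1$; this keeps the smoothed estimates summing to $1$.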
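The two estimators can be sketched in Python as follows. This is a minimal illustration, assuming the values of $z$ are the integers $1, \ldots, k$; the function name `estimate` and the sample data are hypothetical. `collections.Counter` returns $0$ for values that never occur, which is exactly the case the smoothing is meant to handle.

```python
from collections import Counter


def estimate(observations, k, smoothing=True):
    """Estimate phi_j = P(z = j) for j = 1..k from a list of observations.

    With smoothing=False this is the maximum likelihood estimate
    sum_i I{z_i = j} / m (requires at least one observation);
    with smoothing=True it is the Laplace estimate (sum_i I{z_i = j} + 1) / (m + k).
    """
    m = len(observations)
    counts = Counter(observations)  # counts[j] = sum_i I{z_i = j}; unseen j -> 0
    if smoothing:
        return {j: (counts[j] + 1) / (m + k) for j in range(1, k + 1)}
    return {j: counts[j] / m for j in range(1, k + 1)}


# Hypothetical sample: z takes values in {1, 2, 3, 4}, but 4 is never observed.
z = [1, 2, 2, 3, 1, 2]
mle = estimate(z, k=4, smoothing=False)
laplace = estimate(z, k=4)
print(mle)      # phi_4 is 0.0 under maximum likelihood
print(laplace)  # every phi_j > 0, and the values still sum to 1
```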

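  Example application: suppose a text-classification task has $3$ classes, $C_1$, $C_2$ and $C_3$. In the given training sample, a word $K_1$ is observed $0$, $990$ and $10$ times in the respective classes. The corresponding probabilities for $K_1$ are then $0$, $0.99$ and $0.01$.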

  Clearly, a probability of $0$ for class $C_1$ is unrealistic.

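  Applying Laplace smoothing to the three counts (adding $1$ to each count and $3$ to the denominator, since there are $k = 3$ classes) gives: $1/1003 \approx 0.001$, $991/1003 \approx 0.988$, $11/1003 \approx 0.011$, which still sum to $1$.

  In practice, instead of adding exactly $1$, it is also common to add a constant $\lambda$ ($0 \le \lambda \le 1$). If $\lambda$ is added to each of the $N$ counts, remember to add $N\lambda$ to the denominator as well; $\lambda = 1$ recovers Laplace smoothing and $\lambda = 0$ the unsmoothed maximum likelihood estimate.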
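The worked example can be reproduced with a short sketch of additive smoothing (the $\lambda$ generalization is also known as Lidstone smoothing). The function name `additive_smoothing` is hypothetical, and $\lambda = 0.5$ in the last line is an arbitrary illustrative value. Using `fractions.Fraction` for the $\lambda = 1$ case keeps the results exact, so they can be compared directly with $1/1003$, $991/1003$ and $11/1003$.

```python
from fractions import Fraction


def additive_smoothing(counts, lam=1):
    """Add lam to every count and N * lam to the denominator (N = len(counts)).

    lam = 1 is Laplace (add-one) smoothing; lam = 0 gives the unsmoothed estimate.
    """
    total = sum(counts) + len(counts) * lam
    return [(c + lam) / total for c in counts]


# Counts of word K1 in classes C1, C2, C3 from the example above.
counts = [0, 990, 10]

exact = additive_smoothing([Fraction(c) for c in counts])
print(exact)                                # exact fractions with denominator 1003
print([round(float(p), 3) for p in exact])  # rounded to three decimals
print(additive_smoothing(counts, lam=0.5))  # a hypothetical lambda = 0.5
```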
