Introduction
Concentration inequalities, or probability bounds, are very important tools for the analysis of machine learning algorithms or randomized algorithms. In statistical learning theory, we often want to show that random variables, given some assumptions, are close to their expectation with high probability. This article provides an overview of the most fundamental inequalities in the analysis of these concentration measures.
Markov's inequality
Markov's inequality is one of the most fundamental bounds and it assumes almost nothing about the random variable. The assumptions that Markov's inequality makes are that the random variable \(X\) is non-negative \(X \geq 0\) and has a finite expectation \(\mathbb{E}\left[X\right] < \infty\). Markov's inequality is given by:
$$ \underbrace{P(X \geq \alpha)}_{\text{probability of being greater than the constant } \alpha} \leq \underbrace{\frac{\mathbb{E}\left[X\right]}{\alpha}}_{\text{bounded above by the expectation over the constant } \alpha} $$
This means that the probability that the random variable \(X\) is greater than or equal to a constant \(\alpha\) is bounded by the expectation of \(X\) divided by that constant. What is remarkable about this bound is that it holds for any distribution with positive values: it doesn't depend on any feature of the probability distribution, requiring only some weak assumptions and the first moment, the expectation.
Example: a grocery store sells an average of 40 beers per day (it's summer!). What is the probability that it will sell 80 or more beers tomorrow?
$$
\begin{align}
P(X \geq \alpha) & \leq \frac{\mathbb{E}\left[X\right]}{\alpha} \\\\
P(X \geq 80) & \leq\frac{40}{80} = 0.5 = 50\%
\end{align}
$$
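As a quick sanity check, here is a minimal Monte Carlo sketch in Python; since the example doesn't specify the sales distribution, it assumes (arbitrarily, for illustration only) exponentially distributed daily sales with mean 40:

```python
import numpy as np

# A minimal Monte Carlo check of Markov's bound for the beer example.
# The true distribution of daily sales is unknown; the exponential
# distribution with mean 40 is an arbitrary assumption, used only to
# illustrate that the bound holds.
rng = np.random.default_rng(42)
sales = rng.exponential(scale=40.0, size=1_000_000)

empirical = (sales >= 80).mean()    # P(X >= 80) under the assumed model
markov_bound = sales.mean() / 80.0  # E[X] / alpha

print(f"empirical P(X >= 80) = {empirical:.4f}")    # ~0.135 for Exp(mean=40)
print(f"Markov bound         = {markov_bound:.4f}") # ~0.5
```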
Markov's inequality doesn't rely on any property of the random variable's probability distribution, so it's clear that there are better bounds to use if information about the probability distribution is available.
Chebyshev's inequality
When we have information about the underlying distribution of a random variable, we can take advantage of properties of this distribution to know more about the concentration of this variable. Let’s take for example a normal distribution with mean \(\mu = 0\) and unit standard deviation \(\sigma = 1\) given by the probability density function (PDF) below:
$$ f(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2} $$
Integrating from -1 to 1: \(\int_{-1}^{1} \frac{1}{\sqrt{2\pi}}e^{-x^2/2} \, dx \approx 0.68\), we know that 68% of the data is within \(1\sigma\) (one standard deviation) from the mean \(\mu\), and 95% is within \(2\sigma\) from the mean. However, when it's not possible to assume normality, any other amount of data can be concentrated within \(1\sigma\) or \(2\sigma\).
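A short sketch that computes these concentrations from the standard normal CDF with SciPy, instead of integrating the PDF by hand:

```python
from scipy.stats import norm

# Probability mass of the standard normal within k standard deviations
# of the mean, computed from the CDF.
for k in (1, 2):
    mass = norm.cdf(k) - norm.cdf(-k)
    print(f"P(|X - mu| < {k} sigma) = {mass:.4f}")
# P(|X - mu| < 1 sigma) = 0.6827
# P(|X - mu| < 2 sigma) = 0.9545
```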
Chebyshev’s inequality provides a way to get a bound on the concentration for any distribution, without assuming any underlying property except a finite mean and variance. Chebyshev’s also holds for any random variable, not only for non-negative variables as in Markov’s inequality.
Chebyshev's inequality is given by the following relation:
$$
P(\mid X - \mu \mid \geq k\sigma) \leq \frac{1}{k^2}
$$
that can also be rewritten as:
$$
P(\mid X - \mu \mid < k\sigma) \geq 1 - \frac{1}{k^2}
$$
For the concrete case of \(k = 2\), Chebyshev's inequality tells us that at least 75% of the data is concentrated within 2 standard deviations of the mean. And this holds for any distribution.
Now, when we compare this result for \(k = 2\) with the 95% concentration of the normal distribution within \(2\sigma\), we can see how conservative Chebyshev's bound is. However, one must not forget that this holds for any distribution, not only for a normally distributed random variable, and all that Chebyshev's inequality needs are the first and second moments of the data. Something important to note is that, in the absence of more information about the random variable, this bound cannot be improved.
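A small numerical sketch of how conservative the bound is in practice; the exponential distribution used here is an arbitrary choice of a non-normal, skewed distribution:

```python
import numpy as np

# Empirical concentration within k standard deviations vs. Chebyshev's
# guarantee of at least 1 - 1/k^2, for a skewed (exponential) distribution.
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)
mu, sigma = x.mean(), x.std()

for k in (2, 3):
    within = (np.abs(x - mu) < k * sigma).mean()
    print(f"k={k}: empirical {within:.4f} >= Chebyshev {1 - 1 / k**2:.4f}")
# k=2: empirical ~0.95 >= Chebyshev 0.75
# k=3: empirical ~0.98 >= Chebyshev 0.89
```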
Chebyshev's inequality and the weak law of large numbers
Chebyshev's inequality can also be used to prove the weak law of large numbers, which says that the sample mean converges in probability to the true mean.
That can be done as follows:
- Consider a sequence of i.i.d. (independent and identically distributed) random variables \(X_1, X_2, X_3, \ldots\) with mean \(\mu\) and variance \(\sigma^2\);
- The sample mean is \(M_n = \frac{X_1 + \ldots + X_n}{n}\) and the true mean is \(\mu\);
- For the expectation of the sample mean we have: $$\mathbb{E}\left[M_n\right] = \frac{\mathbb{E}\left[X_1\right] + \ldots + \mathbb{E}\left[X_n\right]}{n} = \frac{n\mu}{n} = \mu$$
- For the variance of the sample mean we have: $$Var\left[M_n\right] = \frac{Var\left[X_1\right] + \ldots + Var\left[X_n\right]}{n^2} = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$
- By applying Chebyshev's inequality we have: $$ P(\mid M_n - \mu \mid \geq \epsilon) \leq \frac{\sigma^2}{n\epsilon^2}$$ for any (fixed) \(\epsilon > 0\). As \(n\) increases, the right side of the inequality goes to zero. Intuitively, this means that for a large \(n\) the distribution of \(M_n\) will be concentrated around \(\mu\), as the simulation sketch below illustrates.
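The sketch assumes i.i.d. uniform variables on \([0, 1]\) (so \(\mu = 0.5\) and \(\sigma^2 = 1/12\); an arbitrary choice for illustration); both the empirical probability and the Chebyshev bound shrink as \(n\) grows:

```python
import numpy as np

# Weak law of large numbers: estimate P(|M_n - mu| >= eps) over many
# independent sample means and compare with the Chebyshev bound
# sigma^2 / (n * eps^2).
rng = np.random.default_rng(1)
mu, sigma2, eps = 0.5, 1 / 12, 0.05

for n in (10, 100, 1000):
    # 10,000 independent sample means, each averaging n uniform draws
    m_n = rng.uniform(0.0, 1.0, size=(10_000, n)).mean(axis=1)
    empirical = (np.abs(m_n - mu) >= eps).mean()
    # a bound above 1 is vacuous, so it is clipped for readability
    bound = min(sigma2 / (n * eps**2), 1.0)
    print(f"n={n:4d}: empirical {empirical:.4f}, bound {bound:.4f}")
```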
Improving on Markov's and Chebyshev's with the Chernoff bound
Before getting into the Chernoff bound, let's understand the motivation behind it and how one can improve on Chebyshev's bound. To understand it, we first need to understand the difference between pairwise independence and mutual independence. For pairwise independence, we have the following for events A, B, and C:
$$
P(A \cap B) = P(A)P(B) \\
P(A \cap C) = P(A)P(C) \\
P(B \cap C) = P(B)P(C)
$$
Which means that any pair (any two events) are independent, but not necessarily that:
$$
P(A \cap B\cap C) = P(A)P(B)P(C)
$$
which is called “mutual independence” and is a stronger notion of independence. By definition, mutual independence implies pairwise independence, but the opposite isn't always true. And this is where we can improve on Chebyshev's bound, as it isn't possible to do so without making these further assumptions (stronger assumptions lead to stronger bounds).
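A concrete sketch of the classic counterexample: two fair coin flips A and B and their XOR C are pairwise independent but not mutually independent, since C is fully determined by A and B. Enumerating the sample space verifies this:

```python
import itertools

# Sample space: (A, B, A XOR B) over two fair coin flips, each of the
# four outcomes having probability 1/4.
outcomes = [(a, b, a ^ b) for a, b in itertools.product((0, 1), repeat=2)]
p = 1 / len(outcomes)

def prob(event):
    """Probability of an event, given as a predicate over outcomes."""
    return sum(p for o in outcomes if event(o))

pa, pb, pc = (prob(lambda o, i=i: o[i] == 1) for i in range(3))

# Pairwise independence: every pairwise joint equals the product 1/4.
print(prob(lambda o: o[0] == 1 and o[1] == 1), pa * pb)  # 0.25 0.25
print(prob(lambda o: o[0] == 1 and o[2] == 1), pa * pc)  # 0.25 0.25
print(prob(lambda o: o[1] == 1 and o[2] == 1), pb * pc)  # 0.25 0.25

# No mutual independence: P(A=1, B=1, C=1) = 0 but P(A)P(B)P(C) = 1/8.
print(prob(lambda o: o == (1, 1, 1)), pa * pb * pc)      # 0.0 0.125
```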
We'll talk about the Chernoff bound in the second part of this tutorial!