Azuma's inequality

In probability theory, the Azuma–Hoeffding inequality (named after Kazuoki Azuma and Wassily Hoeffding) gives a concentration result for the values of martingales that have bounded differences.

Suppose $\{X_{k}:k=0,1,2,3,\dots \}$ is a martingale (or supermartingale) and

|X_{k}-X_{k-1}|\leq c_{k},\,

almost surely. Then for all positive integers $N$ and all positive reals $\epsilon$ ,

{\text{P}}(X_{N}-X_{0}\geq \epsilon )\leq \exp \left({-\epsilon ^{2} \over 2\sum _{k=1}^{N}c_{k}^{2}}\right).

And symmetrically (when $X_{k}$ is a submartingale):

{\text{P}}(X_{N}-X_{0}\leq -\epsilon )\leq \exp \left({-\epsilon ^{2} \over 2\sum _{k=1}^{N}c_{k}^{2}}\right).

If $X$ is a martingale, combining both inequalities above via the union bound gives a two-sided bound:

{\text{P}}(|X_{N}-X_{0}|\geq \epsilon )\leq 2\exp \left({-\epsilon ^{2} \over 2\sum _{k=1}^{N}c_{k}^{2}}\right).
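The bounds above are simple to evaluate numerically. The following is a minimal sketch, assuming the increment bounds $c_{1},\dots ,c_{N}$ are supplied as a list; the function name `azuma_tail_bound` and its `two_sided` flag are illustrative choices, not part of any library.

```python
import math
from typing import Sequence


def azuma_tail_bound(eps: float, c: Sequence[float], two_sided: bool = False) -> float:
    """Evaluate exp(-eps^2 / (2 * sum_k c_k^2)), doubled for the two-sided bound.

    Hypothetical helper: `c` holds the almost-sure bounds on |X_k - X_{k-1}|.
    The returned value is not capped, so the two-sided bound may exceed 1,
    in which case it carries no information.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    sum_c_sq = sum(ck * ck for ck in c)
    if sum_c_sq <= 0:
        raise ValueError("at least one c_k must be non-zero")
    bound = math.exp(-eps * eps / (2.0 * sum_c_sq))
    return 2.0 * bound if two_sided else bound


if __name__ == "__main__":
    # 100 steps, each increment bounded by 1, deviation of 20.
    print(azuma_tail_bound(20.0, [1.0] * 100))
    print(azuma_tail_bound(20.0, [1.0] * 100, two_sided=True))
```

With $N=100$ steps, $c_{k}=1$ and $\epsilon =20$ the exponent is $-400/200=-2$ , so the one-sided bound is $e^{-2}$ and the two-sided bound is $2e^{-2}$ .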

A general form of Azuma's inequality

Limitation of the vanilla Azuma's inequality

Note that the vanilla Azuma's inequality requires symmetric bounds on martingale increments, i.e. $-c_{t}\leq X_{t}-X_{t-1}\leq c_{t}$ . So, if the known bound is asymmetric, e.g. $a_{t}\leq X_{t}-X_{t-1}\leq b_{t}$ , then to use Azuma's inequality one needs to choose $c_{t}=\max(|a_{t}|,|b_{t}|)$ , which may discard information about the boundedness of $X_{t}-X_{t-1}$ . However, this issue can be resolved, and a tighter probability bound obtained, with the following general form of Azuma's inequality.

Statement

Let $\left\{X_{0},X_{1},\cdots \right\}$ be a martingale (or supermartingale) with respect to filtration $\left\{{\mathcal {F}}_{0},{\mathcal {F}}_{1},\cdots \right\}$ . Assume there are predictable processes $\left\{A_{0},A_{1},\cdots \right\}$ and $\left\{B_{0},B_{1},\dots \right\}$ with respect to $\left\{{\mathcal {F}}_{0},{\mathcal {F}}_{1},\cdots \right\}$ , i.e. for all $t$ , $A_{t},B_{t}$ are ${\mathcal {F}}_{t-1}$ -measurable, and constants $0<c_{1},c_{2},\cdots <\infty$ such that

A_{t}\leq X_{t}-X_{t-1}\leq B_{t}\quad {\text{and}}\quad B_{t}-A_{t}\leq c_{t}

almost surely. Then for all $\epsilon >0$ ,

{\text{P}}(X_{n}-X_{0}\geq \epsilon )\leq \exp \left(-{\frac {2\epsilon ^{2}}{\sum _{t=1}^{n}c_{t}^{2}}}\right).

Since a submartingale is a supermartingale with signs reversed, if instead $\left\{X_{0},X_{1},\dots \right\}$ is a martingale (or submartingale), then

{\text{P}}(X_{n}-X_{0}\leq -\epsilon )\leq \exp \left(-{\frac {2\epsilon ^{2}}{\sum _{t=1}^{n}c_{t}^{2}}}\right).

If $\left\{X_{0},X_{1},\dots \right\}$ is a martingale, then it is both a supermartingale and a submartingale, so applying the union bound to the two inequalities above gives the two-sided bound:

{\text{P}}(|X_{n}-X_{0}|\geq \epsilon )\leq 2\exp \left(-{\frac {2\epsilon ^{2}}{\sum _{t=1}^{n}c_{t}^{2}}}\right).
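To illustrate how much the general form can gain over the vanilla form, here is a small sketch comparing the two one-sided bounds for increments confined to an asymmetric interval. The interval $[-1,3]$ , the horizon $n=50$ and the deviation $\epsilon =40$ are arbitrary values chosen for illustration, and both function names are hypothetical.

```python
import math
from typing import Sequence


def vanilla_azuma_bound(eps: float, c: Sequence[float]) -> float:
    """exp(-eps^2 / (2 * sum c_t^2)), where |X_t - X_{t-1}| <= c_t."""
    return math.exp(-eps * eps / (2.0 * sum(ct * ct for ct in c)))


def general_azuma_bound(eps: float, widths: Sequence[float]) -> float:
    """exp(-2 * eps^2 / sum c_t^2), where c_t bounds the width B_t - A_t."""
    return math.exp(-2.0 * eps * eps / sum(w * w for w in widths))


# Assumed setting: every increment lies in [a, b] = [-1, 3].
n, a, b, eps = 50, -1.0, 3.0, 40.0

# Vanilla form must use the symmetric envelope c_t = max(|a|, |b|) = 3.
vanilla = vanilla_azuma_bound(eps, [max(abs(a), abs(b))] * n)

# General form only needs the interval width b - a = 4.
general = general_azuma_bound(eps, [b - a] * n)

if __name__ == "__main__":
    print(f"vanilla bound: {vanilla:.6f}")
    print(f"general bound: {general:.6f}")
```

Here the vanilla exponent is $-1600/900$ while the general exponent is $-3200/800=-4$ , so the general form gives the smaller bound. Since $b_{t}-a_{t}\leq 2\max(|a_{t}|,|b_{t}|)$ always holds, the general form is never worse than the vanilla one.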

Proof

We prove the supermartingale case only, as the other cases follow from it by the sign reversal and union bound described above. By the Doob decomposition, we can decompose the supermartingale $\left\{X_{t}\right\}$ as $X_{t}=Y_{t}+Z_{t}$ , where $\left\{Y_{t},{\mathcal {F}}_{t}\right\}$ is a martingale and $\left\{Z_{t},{\mathcal {F}}_{t}\right\}$ is a nonincreasing predictable sequence (note that if $\left\{X_{t}\right\}$ itself is a martingale, then $Z_{t}=0$ ). From $A_{t}\leq X_{t}-X_{t-1}\leq B_{t}$ , we have

-(Z_{t}-Z_{t-1})+A_{t}\leq Y_{t}-Y_{t-1}\leq -(Z_{t}-Z_{t-1})+B_{t}

Applying the Chernoff bound to $Y_{n}-Y_{0}$ , we have for $\epsilon >0$ ,

{\begin{aligned}{\text{P}}(Y_{n}-Y_{0}\geq \epsilon )&\leq {\underset {s>0}{\min }}\ e^{-s\epsilon }\mathbb {E} [e^{s(Y_{n}-Y_{0})}]\\&={\underset {s>0}{\min }}\ e^{-s\epsilon }\mathbb {E} \left[\exp \left(s\sum _{t=1}^{n}(Y_{t}-Y_{t-1})\right)\right]\\&={\underset {s>0}{\min }}\ e^{-s\epsilon }\mathbb {E} \left[\exp \left(s\sum _{t=1}^{n-1}(Y_{t}-Y_{t-1})\right)\mathbb {E} \left[\exp \left(s(Y_{n}-Y_{n-1})\right)\mid {\mathcal {F}}_{n-1}\right]\right]\end{aligned}}

For the inner expectation term, since

(i) $\mathbb {E} [Y_{t}-Y_{t-1}\mid {\mathcal {F}}_{t-1}]=0$ as $\left\{Y_{t}\right\}$ is a martingale;

(ii) $-(Z_{t}-Z_{t-1})+A_{t}\leq Y_{t}-Y_{t-1}\leq -(Z_{t}-Z_{t-1})+B_{t}$ ;

(iii) $-(Z_{t}-Z_{t-1})+A_{t}$ and $-(Z_{t}-Z_{t-1})+B_{t}$ are both ${\mathcal {F}}_{t-1}$ -measurable as $\left\{Z_{t}\right\}$ is a predictable process;

(iv) $B_{t}-A_{t}\leq c_{t}$ ;

by applying Hoeffding's lemma [note 1], we have

\mathbb {E} \left[\exp \left(s(Y_{t}-Y_{t-1})\right)\mid {\mathcal {F}}_{t-1}\right]\leq \exp \left({\frac {s^{2}(B_{t}-A_{t})^{2}}{8}}\right)\leq \exp \left({\frac {s^{2}c_{t}^{2}}{8}}\right).
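As a sanity check on the form of this bound, the sketch below evaluates the classical (unconditional) Hoeffding's lemma for a zero-mean random variable taking only the two values $a<0<b$ . This is only an illustration under that assumed distribution: the proof uses the conditional version of the lemma, which this code does not model, and all function names are hypothetical.

```python
import math
from typing import Iterable


def two_point_mgf(s: float, a: float, b: float) -> float:
    """E[exp(s * X)] for the zero-mean X supported on {a, b}, with a < 0 < b."""
    if not (a < 0.0 < b):
        raise ValueError("need a < 0 < b for a zero-mean two-point law")
    p_b = -a / (b - a)  # P(X = b); chosen so that E[X] = 0
    p_a = b / (b - a)   # P(X = a)
    return p_a * math.exp(s * a) + p_b * math.exp(s * b)


def hoeffding_lemma_bound(s: float, a: float, b: float) -> float:
    """Right-hand side of Hoeffding's lemma: exp(s^2 * (b - a)^2 / 8)."""
    return math.exp(s * s * (b - a) ** 2 / 8.0)


def lemma_holds_on_grid(a: float, b: float, s_values: Iterable[float]) -> bool:
    """Check MGF <= bound at each s, allowing a tiny relative rounding slack."""
    return all(
        two_point_mgf(s, a, b) <= hoeffding_lemma_bound(s, a, b) * (1.0 + 1e-12)
        for s in s_values
    )


if __name__ == "__main__":
    grid = [i * 0.05 for i in range(0, 101)]  # s from 0 to 5
    print(lemma_holds_on_grid(-1.0, 3.0, grid))
```

A finite grid check is of course not a proof; it only confirms that the inequality behaves as stated for this particular distribution.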

Repeating this step for $t=n,n-1,\dots ,1$ (conditioning on ${\mathcal {F}}_{t-1}$ each time), we get

{\text{P}}(Y_{n}-Y_{0}\geq \epsilon )\leq {\underset {s>0}{\min }}\ e^{-s\epsilon }\exp \left({\frac {s^{2}\sum _{t=1}^{n}c_{t}^{2}}{8}}\right).

Note that the minimum is achieved at $s={\frac {4\epsilon }{\sum _{t=1}^{n}c_{t}^{2}}}$ , so we have

{\text{P}}(Y_{n}-Y_{0}\geq \epsilon )\leq \exp \left(-{\frac {2\epsilon ^{2}}{\sum _{t=1}^{n}c_{t}^{2}}}\right).

Finally, since $X_{n}-X_{0}=(Y_{n}-Y_{0})+(Z_{n}-Z_{0})$ and $Z_{n}-Z_{0}\leq 0$ (as $\left\{Z_{n}\right\}$ is nonincreasing), the event $\left\{X_{n}-X_{0}\geq \epsilon \right\}$ implies $\left\{Y_{n}-Y_{0}\geq \epsilon \right\}$ , and therefore

{\text{P}}(X_{n}-X_{0}\geq \epsilon )\leq {\text{P}}(Y_{n}-Y_{0}\geq \epsilon )\leq \exp \left(-{\frac {2\epsilon ^{2}}{\sum _{t=1}^{n}c_{t}^{2}}}\right).\square
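The optimisation over $s$ used in the proof can be checked numerically. The sketch below compares the Chernoff objective at the closed-form minimiser $s=4\epsilon /\sum _{t}c_{t}^{2}$ with a brute-force grid search; the values $\epsilon =3$ and $\sum _{t}c_{t}^{2}=10$ are arbitrary assumptions for illustration, and the function names are hypothetical.

```python
import math


def chernoff_objective(s: float, eps: float, sum_c_sq: float) -> float:
    """exp(-s * eps) * exp(s^2 * sum_c_sq / 8), the quantity minimised over s > 0."""
    return math.exp(-s * eps + s * s * sum_c_sq / 8.0)


def closed_form_minimizer(eps: float, sum_c_sq: float) -> float:
    """Stationary point of the exponent -s*eps + s^2*sum_c_sq/8."""
    return 4.0 * eps / sum_c_sq


eps, sum_c_sq = 3.0, 10.0  # assumed example values
s_star = closed_form_minimizer(eps, sum_c_sq)
value_at_s_star = chernoff_objective(s_star, eps, sum_c_sq)

# Brute-force search over s in (0, 5] with step 0.001.
grid_min = min(chernoff_objective(i * 0.001, eps, sum_c_sq) for i in range(1, 5001))

# Final bound from the proof: exp(-2 * eps^2 / sum_c_sq).
final_bound = math.exp(-2.0 * eps * eps / sum_c_sq)

if __name__ == "__main__":
    print(s_star, value_at_s_star, grid_min, final_bound)
```

The exponent is a convex quadratic in $s$ , so the stationary point is its global minimum; the grid search serves only as an independent cross-check.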

Remark

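Note that by setting $A_{t}=-c_{t},B_{t}=c_{t}$ , we obtain the vanilla Azuma's inequality: then $B_{t}-A_{t}=2c_{t}$ , so $2c_{t}$ plays the role of $c_{t}$ in the general statement and the exponent becomes $-2\epsilon ^{2}/\sum _{t=1}^{n}(2c_{t})^{2}=-\epsilon ^{2}/(2\sum _{t=1}^{n}c_{t}^{2})$ .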

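Note that for either a submartingale or a supermartingale, only one side of Azuma's inequality holds. Bounded increments alone say little about how fast a submartingale rises (or a supermartingale falls): for example, the deterministic process $X_{t}=-t$ is a supermartingale with $|X_{t}-X_{t-1}|\leq 1$ , yet $X_{n}-X_{0}=-n$ with probability 1.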

This general form of Azuma's inequality, applied to the Doob martingale, gives McDiarmid's inequality, which is common in the analysis of randomized algorithms.

Simple example of Azuma's inequality for coin flips

Let $F_{i}$ be a sequence of independent and identically distributed random coin flips (i.e., let each $F_{i}$ be equally likely to be $-1$ or $1$ , independently of the other values of $F_{i}$ ). Defining $X_{i}=\sum _{j=1}^{i}F_{j}$ (so that $X_{0}=0$ ) yields a martingale with $|X_{k}-X_{k-1}|\leq 1$ , allowing us to apply Azuma's inequality with $c_{k}=1$ . Specifically, we get

{\text{P}}(X_{n}>t)\leq \exp \left({\frac {-t^{2}}{2n}}\right).

For example, if we set $t$ proportional to $n$ , say $t=\delta n$ with $0<\delta \leq 1$ , the bound becomes $\exp(-\delta ^{2}n/2)$ : although the maximum possible value of $X_{n}$ scales linearly with $n$ , the probability that the sum exceeds a fixed fraction of $n$ decreases exponentially fast with $n$ .

If we set $t={\sqrt {2n\ln n}}$ we get:

{\text{P}}(X_{n}>{\sqrt {2n\ln n}})\leq 1/n,

which means that the probability of deviating more than ${\sqrt {2n\ln n}}$ approaches 0 as n goes to infinity.
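For the coin-flip walk the tail probability can also be computed exactly, which makes it possible to see how conservative the bound is. The following sketch uses the fact that $X_{n}=2H-n$ , where $H$ is the number of $+1$ flips and is binomially distributed; the function names are illustrative, not from any library.

```python
import math


def exact_tail(n: int, t: float) -> float:
    """Exact P(X_n > t) for the sum X_n of n fair +/-1 coin flips.

    X_n = 2*H - n with H ~ Binomial(n, 1/2), so X_n > t iff H > (n + t) / 2.
    """
    min_heads = max(math.floor((n + t) / 2) + 1, 0)  # smallest integer H with H > (n+t)/2
    favourable = sum(math.comb(n, h) for h in range(min_heads, n + 1))
    return favourable / 2**n


def azuma_coin_bound(n: int, t: float) -> float:
    """Azuma bound exp(-t^2 / (2n)) for the same walk (c_k = 1)."""
    return math.exp(-t * t / (2.0 * n))


if __name__ == "__main__":
    for n in (10, 100, 1000):
        t = math.sqrt(2.0 * n * math.log(n))
        print(n, exact_tail(n, t), azuma_coin_bound(n, t))
```

For $t={\sqrt {2n\ln n}}$ the second column printed is $1/n$ up to rounding, and the exact tail in the first column never exceeds it, as the inequality guarantees.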

This article uses material from the Wikipedia article Azuma's_inequality, and is written by contributors. Text is available under a CC BY-SA 4.0 International License; additional terms may apply. Images, videos and audio are available under their respective licenses.

[note 1] This is not a direct application of Hoeffding's lemma, though. The statement of Hoeffding's lemma concerns the total expectation, but it also holds when the expectation is a conditional expectation and the bounds are measurable with respect to the sigma-field being conditioned on. The proof is the same as for the classical Hoeffding's lemma.
