Total variation distance of probability measures
[Figure: Total variation distance is half the absolute area between the two curves (half the shaded area).]
Revision as of 04:27, 2 March 2022
In probability theory, the total variation distance is a distance measure for probability distributions. It is an example of a statistical distance metric, and is sometimes called the statistical distance, statistical difference or variational distance.
Definition
The total variation distance between two probability measures $P$ and $Q$ on a sigma-algebra $\mathcal{F}$ of subsets of the sample space $\Omega$ is defined via[1]
$$\delta(P,Q) = \sup_{A \in \mathcal{F}} \left| P(A) - Q(A) \right|.$$
Informally, this is the largest possible difference between the probabilities that the two probability distributions can assign to the same event.
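On a finite sample space with the power-set sigma-algebra, the supremum in the definition can be evaluated by brute force over all $2^n$ events. The sketch below does exactly that; the function name `tv_by_definition` and the representation of a distribution as a dict from outcomes to probabilities are illustrative assumptions, and the exponential enumeration is only practical for a handful of outcomes.

```python
from fractions import Fraction
from itertools import chain, combinations


def tv_by_definition(p, q):
    """sup over events A of |P(A) - Q(A)|, by enumerating every event.

    Assumes a finite sample space; p and q are dicts mapping outcomes to
    probabilities, and every subset of outcomes counts as an event.
    """
    outcomes = list(set(p) | set(q))
    # All subsets of the sample space, from the empty event to the whole space.
    events = chain.from_iterable(
        combinations(outcomes, k) for k in range(len(outcomes) + 1)
    )
    return max(
        abs(sum(p.get(w, 0) for w in A) - sum(q.get(w, 0) for w in A))
        for A in events
    )


P = {"a": Fraction(1, 2), "b": Fraction(1, 2), "c": Fraction(0)}
Q = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)}
print(tv_by_definition(P, Q))  # 1/2
```

Exact fractions are used so that the supremum can be compared without rounding error; here both $\{a, b\}$ and $\{c\}$ attain the value $1/2$.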
Properties
Relation to other distances
The total variation distance is related to the Kullback–Leibler divergence by Pinsker's inequality:
$$\delta(P,Q) \le \sqrt{\tfrac{1}{2} D_{\mathrm{KL}}(P \parallel Q)}.$$
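Pinsker's inequality, $\delta(P,Q) \le \sqrt{\tfrac{1}{2} D_{\mathrm{KL}}(P \parallel Q)}$ with the divergence measured in nats (natural logarithm), can be checked numerically on a finite example. A minimal sketch, assuming distributions are given as equal-length lists over the same outcomes; the helper names are hypothetical.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) in nats for finite distributions given as lists."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # 0 * log(0 / q) = 0 by convention
        if qi == 0:
            return math.inf  # P is not absolutely continuous w.r.t. Q
        total += pi * math.log(pi / qi)
    return total


def total_variation(p, q):
    # Half the L1 distance between the probability vectors.
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def pinsker_bound(p, q):
    # Upper bound on the total variation distance from Pinsker's inequality.
    return math.sqrt(kl_divergence(p, q) / 2)


P = [0.5, 0.3, 0.2]
Q = [0.2, 0.3, 0.5]
print(total_variation(P, Q), pinsker_bound(P, Q))
```

For this pair the distance is 0.3 and the Pinsker bound is about 0.37.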
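One also has the following inequality, due to Bretagnolle and Huber[2] (see also Tsybakov[3]), which has the advantage of providing a non-vacuous bound even when $D_{\mathrm{KL}}(P \parallel Q) > 2$, where Pinsker's bound exceeds 1 and so says nothing:
$$\delta(P,Q) \le \sqrt{1 - e^{-D_{\mathrm{KL}}(P \parallel Q)}}.$$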
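To see the difference between the two bounds, take two nearly disjoint distributions on two points, for which the divergence is well above 2. The sketch below computes Pinsker's bound $\sqrt{D_{\mathrm{KL}}/2}$ and the Bretagnolle–Huber bound $\sqrt{1 - e^{-D_{\mathrm{KL}}}}$ from the same divergence; the helper names and the list representation are illustrative assumptions.

```python
import math


def kl_divergence(p, q):
    # D_KL(P || Q) in nats; assumes q > 0 wherever p > 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def total_variation(p, q):
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def pinsker_bound(kl):
    return math.sqrt(kl / 2)


def bretagnolle_huber_bound(kl):
    # Always below 1, so it never becomes vacuous.
    return math.sqrt(1 - math.exp(-kl))


P = [0.99, 0.01]
Q = [0.01, 0.99]
kl = kl_divergence(P, Q)  # 0.98 * ln 99, roughly 4.5
print(total_variation(P, Q), pinsker_bound(kl), bretagnolle_huber_bound(kl))
```

Here the distance is 0.98; Pinsker's bound is about 1.5 and therefore uninformative, while the Bretagnolle–Huber bound is about 0.994.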
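When the sample space $\Omega$ is countable, the total variation distance is related to the $L^1$ norm by the identity:[4]
$$\delta(P,Q) = \tfrac{1}{2} \|P - Q\|_1 = \tfrac{1}{2} \sum_{\omega \in \Omega} \left| P(\{\omega\}) - Q(\{\omega\}) \right|.$$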
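The identity can be checked on a finite example. Because the differences $P(\{\omega\}) - Q(\{\omega\})$ sum to zero, the event $A^* = \{\omega : P(\{\omega\}) > Q(\{\omega\})\}$ collects all the positive differences, so $P(A^*) - Q(A^*)$ equals half the sum of their absolute values, and no event does better (the complement of $A^*$ attains the same value with the roles of $P$ and $Q$ swapped). A minimal sketch, assuming distributions are dicts of exact fractions; `tv_half_l1` and `maximizing_event` are hypothetical helper names.

```python
from fractions import Fraction


def tv_half_l1(p, q):
    """Half the L1 distance between two pmfs given as dicts outcome -> prob."""
    support = set(p) | set(q)
    return sum(abs(p.get(w, 0) - q.get(w, 0)) for w in support) / 2


def maximizing_event(p, q):
    """The event {w : P(w) > Q(w)}, on which P(A) - Q(A) reaches the supremum."""
    return {w for w in set(p) | set(q) if p.get(w, 0) > q.get(w, 0)}


P = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
Q = {"a": Fraction(1, 6), "b": Fraction(1, 3), "c": Fraction(1, 2)}
A = maximizing_event(P, Q)
print(A, tv_half_l1(P, Q))  # {'a'} 1/3
```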
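The total variation distance is related to the Hellinger distance $H(P,Q)$ as follows:[5]
$$H^2(P,Q) \le \delta(P,Q) \le \sqrt{2}\, H(P,Q),$$
where the Hellinger distance is normalized so that $0 \le H(P,Q) \le 1$.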
These inequalities follow immediately from the inequalities between the 1-norm and the 2-norm.
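Both Hellinger bounds can be checked on a finite example, using the normalization $H^2(P,Q) = \tfrac{1}{2} \sum_\omega \left(\sqrt{P(\{\omega\})} - \sqrt{Q(\{\omega\})}\right)^2$, which keeps $H$ in $[0,1]$. A minimal sketch; the helper names and the list representation are illustrative assumptions.

```python
import math


def total_variation(p, q):
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def hellinger(p, q):
    # Normalized so that 0 <= H <= 1: H^2 = (1/2) * sum (sqrt(p) - sqrt(q))^2.
    return math.sqrt(
        0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    )


P = [0.7, 0.2, 0.1]
Q = [0.1, 0.3, 0.6]
h = hellinger(P, Q)
tv = total_variation(P, Q)
# Lower bound H^2, the distance itself, and upper bound sqrt(2) * H.
print(h ** 2, tv, math.sqrt(2) * h)
```

For this pair the three printed values are roughly 0.25, 0.6 and 0.70. For mutually singular distributions such as $[1, 0]$ and $[0, 1]$, the lower bound is tight: $H^2 = \delta = 1$.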
Connection to transportation theory
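The total variation distance (or half the $L^1$ norm) arises as the optimal transportation cost when the cost function is $c(x,y) = \mathbf{1}_{x \neq y}$, that is,
$$\tfrac{1}{2} \|P - Q\|_1 = \delta(P,Q) = \inf\left\{ \mathbb{P}(X \neq Y) : \operatorname{Law}(X) = P,\ \operatorname{Law}(Y) = Q \right\} = \inf_\pi \mathbb{E}_\pi\left[\mathbf{1}_{x \neq y}\right],$$
where the expectation is taken with respect to the probability measure $\pi$ on the space where $(x,y)$ lives, and the infimum is taken over all such $\pi$ with marginals $P$ and $Q$, respectively.[6]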
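On a finite sample space the infimum is attained by a maximal coupling: put mass $\min(P(\{x\}), Q(\{x\}))$ on each diagonal pair $(x,x)$ and spread the leftover mass of $P$ and $Q$ off the diagonal. The sketch below builds such a coupling as a matrix `pi[x][y]` and checks that its marginals are $P$ and $Q$ and that $\pi(X \neq Y) = \delta(P,Q)$. The function name `maximal_coupling` and the list-of-fractions representation are illustrative assumptions.

```python
from fractions import Fraction


def maximal_coupling(p, q):
    """Coupling pi of two pmfs (lists over the same outcomes) with pi(X != Y) = TV.

    Diagonal: pi[x][x] = min(p[x], q[x]). The leftover masses (p - q)^+ and
    (q - p)^+ have disjoint supports and are paired off-diagonal in product form.
    """
    n = len(p)
    overlap = [min(a, b) for a, b in zip(p, q)]
    tv = 1 - sum(overlap)  # equals (1/2) * sum |p - q|
    excess_p = [a - m for a, m in zip(p, overlap)]  # (p - q)^+
    excess_q = [b - m for b, m in zip(q, overlap)]  # (q - p)^+
    pi = [[Fraction(0)] * n for _ in range(n)]
    for x in range(n):
        pi[x][x] = overlap[x]
        if tv > 0:
            for y in range(n):
                # Zero whenever x == y, since the two excesses never overlap.
                pi[x][y] += excess_p[x] * excess_q[y] / tv
    return pi


P = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
Q = [Fraction(1, 6), Fraction(1, 3), Fraction(1, 2)]
pi = maximal_coupling(P, Q)
prob_differ = sum(pi[x][y] for x in range(3) for y in range(3) if x != y)
print(prob_differ)  # 1/3
```

The product form for the off-diagonal mass is one convenient choice; any coupling of the normalized surplus $(P-Q)^+$ with the normalized deficit $(Q-P)^+$ gives the same marginals and the same probability of $X \neq Y$.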
References
- [1] Chatterjee, Sourav. "Distances between probability measures". UC Berkeley. Archived from the original on July 8, 2008. Retrieved June 21, 2013.
- [2] Bretagnolle, J.; Huber, C. "Estimation des densités: risque minimax". Séminaire de Probabilités XII (Univ. Strasbourg, Strasbourg, 1976/1977), pp. 342–363. Lecture Notes in Math. 649. Springer, Berlin, 1978. Lemma 2.1 (in French).
- [3] Tsybakov, Alexandre B. Introduction to nonparametric estimation. Revised and extended from the 2004 French original; translated by Vladimir Zaiats. Springer Series in Statistics. Springer, New York, 2009. xii+214 pp. ISBN 978-0-387-79051-0. Equation 2.25.
- [4] Levin, David A.; Peres, Yuval; Wilmer, Elizabeth L. Markov Chains and Mixing Times, 2nd rev. ed. AMS, 2017. Proposition 4.2, p. 48.
- [5] Harsha, Prahladh. "Lecture notes on communication complexity". September 23, 2011.
- [6] Villani, Cédric. Optimal Transport, Old and New. Grundlehren der mathematischen Wissenschaften, Vol. 338. Springer-Verlag, Berlin Heidelberg, 2009. p. 10. doi:10.1007/978-3-540-71050-9. ISBN 978-3-540-71049-3.