Jensen–Shannon Divergence

The Jensen–Shannon divergence measures the similarity between two probability distributions. Named after Johan Jensen and Claude Shannon, it is symmetric and always yields a finite value, and its square root is a distance metric: values closer to zero indicate greater similarity between distributions. The divergence is bounded by 1 for discrete distributions when base-2 logarithms are used, and it is related to mutual information, which makes it a foundation for comparing distributions in fields ranging from statistical thermodynamics to information theory.

In probability theory and statistics, the Jensen–Shannon divergence, named after the mathematician Johan Jensen and Claude Shannon, is a method of measuring the similarity between two probability distributions. It is also known as information radius (IRad)[1][2] or total divergence to the average.[3] It is based on the Kullback–Leibler divergence, with some notable and useful differences, including that it is symmetric and always has a finite value. The square root of the Jensen–Shannon divergence is a metric, often referred to as the Jensen–Shannon distance. The closer the Jensen–Shannon distance is to zero, the more similar the distributions are.[4][5][6]

Definition

Consider the set $M_+^1(A)$ of probability distributions, where $A$ is a set provided with some σ-algebra of measurable subsets. In particular, we can take $A$ to be a finite or countable set with all subsets being measurable.

The Jensen–Shannon divergence (JSD) is a symmetrized and smoothed version of the Kullback–Leibler divergence $D(P \parallel Q)$. It is defined by

$$
\mathrm{JSD}(P \parallel Q) = \frac{1}{2} D(P \parallel M) + \frac{1}{2} D(Q \parallel M),
$$

where $M = \frac{1}{2}(P + Q)$ is a mixture distribution of $P$ and $Q$.

The geometric Jensen–Shannon divergence[7] (or G-Jensen–Shannon divergence) yields a closed-form formula for the divergence between two Gaussian distributions by taking the geometric mean.

A more general definition, allowing for the comparison of more than two probability distributions, is:

$$
\mathrm{JSD}_{\pi_1, \ldots, \pi_n}(P_1, P_2, \ldots, P_n) = \sum_{i=1}^{n} \pi_i D(P_i \parallel M) = H(M) - \sum_{i=1}^{n} \pi_i H(P_i),
$$

where $M := \sum_{i=1}^{n} \pi_i P_i$, $\pi_1, \ldots, \pi_n$ are weights selected for the probability distributions $P_1, P_2, \ldots, P_n$, and $H(P)$ is the Shannon entropy for distribution $P$. For the two-distribution case described above,

$$
P_1 = P, \quad P_2 = Q, \quad \pi_1 = \pi_2 = \frac{1}{2}.
$$

Hence, for those distributions $P$ and $Q$,

$$
\mathrm{JSD}(P \parallel Q) = H(M) - \frac{1}{2}\bigl(H(P) + H(Q)\bigr).
$$

Bounds

The Jensen–Shannon divergence is bounded by 1 for two discrete probability distributions, provided that the base-2 logarithm is used:[8]

$$
0 \leq \mathrm{JSD}(P \parallel Q) \leq 1.
$$

With this normalization, it is a lower bound on the total variation distance between $P$ and $Q$:

$$
\mathrm{JSD}(P \parallel Q) \leq \frac{1}{2} \lVert P - Q \rVert_1 = \frac{1}{2} \sum_{\omega \in \Omega} \lvert P(\omega) - Q(\omega) \rvert.
$$

With the base-e logarithm, which is commonly used in statistical thermodynamics, the upper bound is $\ln(2)$. In general, the bound in base $b$ is $\log_b(2)$:

$$
0 \leq \mathrm{JSD}(P \parallel Q) \leq \log_b(2).
$$

More generally, for more than two probability distributions the Jensen–Shannon divergence is bounded by $\log_b(n)$:[8]

$$
0 \leq \mathrm{JSD}_{\pi_1, \ldots, \pi_n}(P_1, P_2, \ldots, P_n) \leq \log_b(n).
$$

Relation to mutual information

The Jensen–Shannon divergence is the mutual information between a random variable $X$ associated to a mixture distribution between $P$ and $Q$ and the binary indicator variable $Z$ that is used to switch between $P$ and $Q$ to produce the mixture. Let $X$ be some abstract function on the underlying set of events that discriminates well between events, and choose the value of $X$ according to $P$ if $Z = 0$ and according to $Q$ if $Z = 1$, where $Z$ is equiprobable.
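
As a rough numerical check of the statements so far, the following sketch computes the Jensen–Shannon divergence of two discrete distributions from the Kullback–Leibler form and compares it with the mutual information $I(X;Z) = H(X) - H(X \mid Z)$ of the switching construction. It uses only the Python standard library; the function names and the example distributions are illustrative assumptions, not part of any particular library.

```python
import math


def kl_divergence(p, q, base=2.0):
    """Kullback-Leibler divergence D(p || q) of two discrete distributions.

    Terms with p[i] == 0 contribute nothing; the result is infinite if
    q[i] == 0 somewhere that p[i] > 0.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue
        if qi == 0:
            return math.inf
        total += pi * math.log(pi / qi, base)
    return total


def entropy(p, base=2.0):
    """Shannon entropy H(p), with 0 * log 0 taken as 0."""
    return -sum(pi * math.log(pi, base) for pi in p if pi > 0)


def jsd(p, q, base=2.0):
    """Jensen-Shannon divergence via the mixture m = (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m, base) + 0.5 * kl_divergence(q, m, base)


def mutual_information_with_indicator(p, q, base=2.0):
    """I(X; Z) for a fair coin Z, with X ~ p if Z == 0 and X ~ q if Z == 1.

    X is then distributed as the mixture m, and H(X | Z) is the average
    of H(p) and H(q).
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return entropy(m, base) - 0.5 * (entropy(p, base) + entropy(q, base))


# Hypothetical example distributions over three outcomes.
p = [0.5, 0.5, 0.0]
q = [0.0, 0.1, 0.9]

print(jsd(p, q), jsd(q, p))                     # symmetric in its arguments
print(mutual_information_with_indicator(p, q))  # agrees with jsd(p, q)
print(jsd([1.0, 0.0], [0.0, 1.0]))              # disjoint supports reach the base-2 bound
```

Because the mixture is strictly positive wherever either input is, the Kullback–Leibler terms inside `jsd` never hit the infinite branch, which is why the divergence stays finite even when the supports of the two distributions differ.
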
That is, we are choosing $X$ according to the probability measure $M = (P + Q)/2$, and its distribution is the mixture distribution. We compute

$$
\begin{aligned}
I(X; Z) &= H(X) - H(X \mid Z) \\
&= -\sum M \log M + \frac{1}{2}\left[\sum P \log P + \sum Q \log Q\right] \\
&= -\sum \frac{P}{2} \log M - \sum \frac{Q}{2} \log M + \frac{1}{2}\left[\sum P \log P + \sum Q \log Q\right] \\
&= \frac{1}{2} \sum P \left(\log P - \log M\right) + \frac{1}{2} \sum Q \left(\log Q - \log M\right) \\
&= \mathrm{JSD}(P \parallel Q).
\end{aligned}
$$

It follows from the above result that the Jensen–Shannon divergence is bounded by 0 and 1, because mutual information is non-negative and bounded by $H(Z) = 1$ when the base-2 logarithm is used.

One can apply the same principle to a joint distribution and the product of its two marginal distributions (in analogy to Kullback–Leibler divergence and mutual information) to measure how reliably one can decide whether a given response comes from the joint distribution or the product distribution, subject to the assumption that these are the only two possibilities.[9]

Quantum Jensen–Shannon divergence

Generalizing probability distributions to density matrices makes it possible to define the quantum Jensen–Shannon divergence (QJSD).[10][11] It is defined for a set of density matrices $(\rho_1, \ldots, \rho_n)$ and a probability distribution $\pi = (\pi_1, \ldots, \pi_n)$ as

$$
\mathrm{QJSD}(\rho_1, \ldots, \rho_n) = S\left(\sum_{i=1}^{n} \pi_i \rho_i\right) - \sum_{i=1}^{n} \pi_i S(\rho_i),
$$

where $S(\rho)$ is the von Neumann entropy of $\rho$. This quantity was introduced in quantum information theory, where it is called the Holevo information: it gives the upper bound for the amount of classical information encoded by the quantum states $(\rho_1, \ldots, \rho_n)$ under the prior distribution $\pi$ (see Holevo's theorem).[12] The quantum Jensen–Shannon divergence for $\pi = \left(\frac{1}{2}, \frac{1}{2}\right)$ and two density matrices is a symmetric function, everywhere defined, bounded, and equal to zero only if the two density matrices are the same. It is the square of a metric for pure states, and it was recently shown that this metric property holds for mixed states as well.
[13][14][15] The Bures metric is closely related to the quantum JS divergence; it is the quantum analog of the Fisher information metric.

Jensen–Shannon centroid

The centroid $C^*$ of a finite set of probability distributions can be defined as the minimizer of the average sum of the Jensen–Shannon divergences between a probability distribution and the prescribed set of distributions:

$$
C^* = \arg\min_{Q} \sum_{i=1}^{n} \mathrm{JSD}(Q \parallel P_i).
$$

An efficient algorithm[16] (CCCP), based on a difference of convex functions, has been reported for calculating the Jensen–Shannon centroid of a set of discrete distributions (histograms).

Applications

The Jensen–Shannon divergence has been applied in bioinformatics and genome comparison,[17][18] in protein surface comparison,[19] in the social sciences,[20] in the quantitative study of history,[21] in fire experiments,[22] and in machine learning.[23]

Notes

1. Frank Nielsen (2021). "On a variational definition for the Jensen-Shannon symmetrization of distances based on the information radius". Entropy. 23 (4). MDPI: 464. doi: 10.3390/e23040464.
2. Hinrich Schütze; Christopher D. Manning (1999). Foundations of Statistical Natural Language Processing. Cambridge, Mass: MIT Press. p. 304. ISBN 978-0-262-13360-9.
3. Dagan, Ido; Lee, Lillian; Pereira, Fernando C. N. (1997). "Similarity-based methods for word sense disambiguation" https://aclanthology.org/P97-1008/ . In Cohen, Philip R.; Wahlster, Wolfgang (eds.). 35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference, 7–12 July 1997, Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain. Morgan Kaufmann Publishers / ACL. pp. 56–63. arXiv: cmp-lg/9708010. doi: 10.3115/976909.979625.
4. Endres, D. M.; J. E. Schindelin (2003). "A new metric for probability distributions" https://research-repository.st-andrews.ac.uk/bitstream/10023/1591/1/Endres2003-IEEETransInfTheory49-NewMetric.pdf (PDF). IEEE Trans. Inf. Theory. 49 (7): 1858–1860. Bibcode: 2003ITIT...49.1858E. doi: 10.1109/TIT.2003.813506. hdl: 10023/1591. S2CID 14437777.
5. Österreicher, F.; I. Vajda (2003). "A new class of metric divergences on probability spaces and its statistical applications". Ann. Inst. Statist. Math. 55 (3): 639–653. doi: 10.1007/BF02517812. S2CID 13085920.
6. Fuglede, B.; Topsoe, F. (2004). "Jensen-Shannon divergence and Hilbert space embedding" https://www.math.ku.dk/~topsoe/ISIT2004JSD.pdf (PDF). Proceedings of the International Symposium on Information Theory, 2004. IEEE. p. 30. doi: 10.1109/ISIT.2004.1365067. ISBN 978-0-7803-8280-0. S2CID 7891037.
7. Frank Nielsen (2019). "On the Jensen-Shannon symmetrization of distances relying on abstract means". Entropy. 21 (5). MDPI: 485. arXiv: 1904.04017. Bibcode: 2019Entrp..21..485N. doi: 10.3390/e21050485. PMC 7514974. PMID 33267199.
8. Lin, J. (1991). "Divergence measures based on the shannon entropy" https://www.cise.ufl.edu/~anand/sp06/jensen-shannon.pdf (PDF). IEEE Transactions on Information Theory. 37 (1): 145–151. CiteSeerX 10.1.1.127.9167. doi: 10.1109/18.61115. S2CID 12121632.
9. Schneidman, Elad; Bialek, W; Berry, M.J. II (2003). "Synergy, Redundancy, and Independence in Population Codes". Journal of Neuroscience. 23 (37): 11539–11553. doi: 10.1523/JNEUROSCI.23-37-11539.2003. PMC 6740962. PMID 14684857.
10. Majtey, A.; Lamberti, P.; Prato, D. (2005). "Jensen-Shannon divergence as a measure of distinguishability between mixed quantum states". Physical Review A. 72 (5): 052310. arXiv: quant-ph/0508138. Bibcode: 2005PhRvA..72e2310M. doi: 10.1103/PhysRevA.72.052310. S2CID 32062112.
11. Briët, Jop; Harremoës, Peter (2009). "Properties of classical and quantum Jensen-Shannon divergence". Physical Review A. 79 (5): 052311. arXiv: 0806.4472. Bibcode: 2009PhRvA..79e2311B. doi: 10.1103/PhysRevA.79.052311.
12. Holevo, A. S. (1973). "Bounds for the quantity of information transmitted by a quantum communication channel". Problemy Peredachi Informatsii (in Russian). 9: 3–11. English translation: Probl. Inf. Transm. 9: 177–183 (1975). MR 0456936.
13. Braunstein, Samuel; Caves, Carlton (1994). "Statistical distance and the geometry of quantum states". Physical Review Letters. 72 (22): 3439–3443. Bibcode: 1994PhRvL..72.3439B. doi: 10.1103/PhysRevLett.72.3439. PMID 10056200.
14. Virosztek, Dániel (2021). "The metric property of the quantum Jensen-Shannon divergence". Advances in Mathematics. 380: 107595. arXiv: 1910.10447. doi: 10.1016/j.aim.2021.107595. S2CID 204837864.
15. Sra, Suvrit (2019). "Metrics Induced by Quantum Jensen-Shannon-Renyí and Related Divergences". arXiv: 1911.02643 [cs.IT].
16. Frank Nielsen (2021). "On a generalization of the Jensen-Shannon divergence and the Jensen--Shannon centroid". Entropy. 22 (2). MDPI: 221. doi: 10.3390/e22020221. PMC 7516653. PMID 33285995.
17. Sims, GE; Jun, SR; Wu, GA; Kim, SH (2009). "Alignment-free genome comparison with feature frequency profiles (FFP) and optimal resolutions". Proceedings of the National Academy of Sciences of the United States of America. 106 (8): 2677–82. Bibcode: 2009PNAS..106.2677S. doi: 10.1073/pnas.0813249106. PMC 2634796. PMID 19188606.
18. Itzkovitz, S; Hodis, E; Segal, E (2010). "Overlapping codes within protein-coding sequences". Genome Research. 20 (11): 1582–9. doi: 10.1101/gr.105072.110. PMC 2963821. PMID 20841429.
19. Ofran, Y; Rost, B (2003). "Analysing six types of protein-protein interfaces". Journal of Molecular Biology. 325 (2): 377–87. CiteSeerX 10.1.1.6.9207. doi: 10.1016/s0022-2836(02)01223-8. PMID 12488102.
20. DeDeo, Simon; Hawkins, Robert X. D.; Klingenstein, Sara; Hitchcock, Tim (2013). "Bootstrap Methods for the Empirical Study of Decision-Making and Information Flows in Social Systems". Entropy. 15 (6): 2246–2276. arXiv: 1302.0907. Bibcode: 2013Entrp..15.2246D. doi: 10.3390/e15062246.
21. Klingenstein, Sara; Hitchcock, Tim; DeDeo, Simon (2014). "The civilizing process in London's Old Bailey". Proceedings of the National Academy of Sciences of the United States of America. 111 (26): 9419–9424. Bibcode: 2014PNAS..111.9419K. doi: 10.1073/pnas.1405984111. PMC 4084475. PMID 24979792.
22. Flavia-Corina Mitroi-Symeonidis; Ion Anghel; Nicuşor Minculete (2020). "Parametric Jensen-Shannon statistical complexity and its applications on full-scale compartment fire data". Symmetry. 12 (1): 22. doi: 10.3390/sym12010022.
23. Goodfellow, Ian J.; Pouget-Abadie, Jean; Mirza, Mehdi; Xu, Bing; Warde-Farley, David; Ozair, Sherjil; Courville, Aaron; Bengio, Yoshua (2014). Generative Adversarial Networks. NIPS. arXiv: 1406.2661. Bibcode: 2014arXiv1406.2661G.