{"slug": "jensen-shannon-divergence", "title": "Jensen–Shannon Divergence", "summary": "The Jensen–Shannon divergence, a method for measuring similarity between two probability distributions, has been formally defined in probability theory and statistics. Named after Johan Jensen and Claude Shannon, the divergence is symmetric and always yields a finite value, with the square root of the divergence serving as a distance metric where values closer to zero indicate greater similarity between distributions. The divergence is bounded by 1 for discrete distributions using base-2 logarithms and is related to mutual information, providing a foundation for comparing distributions in fields ranging from statistical thermodynamics to information theory.", "body_md": "# Jensen–Shannon divergence\n\nIn [probability theory](/wiki/Probability_theory) and [statistics](/wiki/Statistics), the **Jensen–Shannon divergence**, named after [Johan Jensen](/wiki/Johan_Jensen_(mathematician)) and [Claude Shannon](/wiki/Claude_Shannon), is a method of measuring the similarity between two [probability distributions](/wiki/Probability_distribution). It is also known as **information radius** (**IRad**)[[1]](#cite_note-1)[2] or **total divergence to the average**.[[3]](#cite_note-3)\n\nIt is based on the [Kullback–Leibler divergence](/wiki/Kullback%E2%80%93Leibler_divergence), with some notable (and useful) differences, including that it is symmetric and it always has a finite value. The square root of the Jensen–Shannon divergence is a [metric](/wiki/Metric_(mathematics)) often referred to as the Jensen–Shannon distance. 
The similarity between the distributions is greater when the Jensen–Shannon distance is closer to zero.[[4]](#cite_note-4)[[5]](#cite_note-5)[[6]](#cite_note-6)\n\n## Definition\n\nConsider the set of probability distributions on $A$, where $A$ is a set provided with some [σ-algebra](/wiki/Sigma-algebra) of measurable subsets. In particular, we can take $A$ to be a finite or [countable set](/wiki/Countable_set) with all subsets being measurable.\n\nThe Jensen–Shannon divergence (JSD) is a symmetrized and smoothed version of the [Kullback–Leibler divergence](/wiki/Kullback%E2%80%93Leibler_divergence) $D(P \\parallel Q)$. It is defined by\n\n$$\\mathrm{JSD}(P \\parallel Q) = \\frac{1}{2} D(P \\parallel M) + \\frac{1}{2} D(Q \\parallel M),$$\n\nwhere $M = \\frac{1}{2}(P + Q)$ is a [mixture distribution](/wiki/Mixture_distribution) of $P$ and $Q$.\n\nThe geometric Jensen–Shannon divergence [7] (or G-Jensen–Shannon divergence) yields a closed-form formula for the divergence between two Gaussian distributions by taking the geometric mean.\n\nA more general definition, allowing for the comparison of more than two probability distributions, is:\n\n$$\\mathrm{JSD}_{\\pi_1, \\ldots, \\pi_n}(P_1, P_2, \\ldots, P_n) = \\sum_{i=1}^n \\pi_i D(P_i \\parallel M) = H(M) - \\sum_{i=1}^n \\pi_i H(P_i),$$\n\nwhere\n\n$$M := \\sum_{i=1}^n \\pi_i P_i,$$\n\n$\\pi_1, \\ldots, \\pi_n$ are weights that are selected for the probability distributions $P_1, P_2, \\ldots, P_n$, and $H(P)$ is the [Shannon entropy](/wiki/Shannon_entropy) for distribution $P$. For the two-distribution case described above,\n\n$$P_1 = P, \\quad P_2 = Q, \\quad \\pi_1 = \\pi_2 = \\frac{1}{2}.$$\n\nHence, for those distributions $P$ and $Q$,\n\n$$\\mathrm{JSD}(P \\parallel Q) = H(M) - \\frac{1}{2}\\bigl(H(P) + H(Q)\\bigr).$$\n\n## Bounds\n\nThe Jensen–Shannon divergence is bounded by 1 for two discrete probability distributions, given that one uses the base 2 logarithm:[[8]](#cite_note-Lin-8)\n\n$$0 \\le \\mathrm{JSD}(P \\parallel Q) \\le 1.$$\n\nWith this normalization, it is a lower bound on the [total variation distance](/wiki/Total_variation_distance) between $P$ and $Q$:\n\n$$\\mathrm{JSD}(P \\parallel Q) \\le \\frac{1}{2} \\lVert P - Q \\rVert_1.$$\n\nWith the base-e logarithm, which is commonly used in statistical thermodynamics, the upper bound is $\\ln 2$. 
In general, the bound in base $b$ is $\\log_b 2$:\n\n$$0 \\le \\mathrm{JSD}(P \\parallel Q) \\le \\log_b 2.$$\n\nMore generally, for more than two probability distributions the Jensen–Shannon divergence is bounded by $\\log_2 n$ (again with the base 2 logarithm):[[8]](#cite_note-Lin-8)\n\n$$0 \\le \\mathrm{JSD}_{\\pi_1, \\ldots, \\pi_n}(P_1, P_2, \\ldots, P_n) \\le \\log_2 n.$$\n\n## Relation to mutual information\n\nThe Jensen–Shannon divergence is the [mutual information](/wiki/Mutual_information) between a random variable $X$ associated to a [mixture distribution](/wiki/Mixture_distribution) between $P$ and $Q$ and the binary indicator variable $Z$ that is used to switch between $P$ and $Q$ to produce the mixture. Let $X$ be some abstract function on the underlying set of events that discriminates well between events, and choose the value of $X$ according to $P$ if $Z = 0$ and according to $Q$ if $Z = 1$, where $Z$ is equiprobable. That is, we are choosing $X$ according to the probability measure $M = (P + Q)/2$, and its distribution is the mixture distribution. We compute\n\n$$\\begin{aligned} I(X; Z) &= H(X) - H(X \\mid Z) \\\\\\\\ &= -\\sum M \\log M + \\frac{1}{2}\\left[\\sum P \\log P + \\sum Q \\log Q\\right] \\\\\\\\ &= -\\sum \\frac{P}{2} \\log M - \\sum \\frac{Q}{2} \\log M + \\frac{1}{2}\\left[\\sum P \\log P + \\sum Q \\log Q\\right] \\\\\\\\ &= \\frac{1}{2} \\sum P \\left(\\log P - \\log M\\right) + \\frac{1}{2} \\sum Q \\left(\\log Q - \\log M\\right) \\\\\\\\ &= \\mathrm{JSD}(P \\parallel Q). \\end{aligned}$$\n\nIt follows from the above result that the Jensen–Shannon divergence is bounded by 0 and 1, because mutual information is non-negative and bounded by $H(Z) = 1$ when the base 2 logarithm is used.\n\nOne can apply the same principle to a joint distribution and the product of its two [marginal distributions](/wiki/Marginal_distribution) (in analogy to Kullback–Leibler divergence and mutual information) to measure how reliably one can decide whether a given response comes from the joint distribution or the product distribution, subject to the assumption that these are the only two possibilities.[[9]](#cite_note-9)\n\n## Quantum Jensen–Shannon divergence\n\nThe generalization of probability distributions to [density matrices](/wiki/Density_matrices) allows one to define the quantum Jensen–Shannon divergence (QJSD).[[10]](#cite_note-10)[11] It is defined for a set of [density matrices](/wiki/Density_matrices) $(\\rho_1, \\ldots, \\rho_n)$ and a probability distribution $\\pi = (\\pi_1, \\ldots, \\pi_n)$ as\n\n$$\\mathrm{QJSD}(\\rho_1, \\ldots, \\rho_n) = S\\left(\\sum_{i=1}^n \\pi_i \\rho_i\\right) - \\sum_{i=1}^n \\pi_i S(\\rho_i),$$\n\nwhere $S(\\rho)$ is the [von Neumann entropy](/wiki/Von_Neumann_entropy) of $\\rho$. 
This quantity was introduced in [quantum information](/wiki/Quantum_information) theory, where it is called the Holevo information: it gives the upper bound for the amount of classical information encoded by the quantum states $(\\rho_1, \\ldots, \\rho_n)$ under the prior distribution $\\pi$ (see [Holevo's theorem](/wiki/Holevo%27s_theorem)).[12] The quantum Jensen–Shannon divergence for $\\pi = \\left(\\frac{1}{2}, \\frac{1}{2}\\right)$ and two density matrices is a symmetric function, everywhere defined, bounded, and equal to zero only if the two [density matrices](/wiki/Density_matrices) are the same. It is the square of a metric for [pure states](/wiki/Pure_states),[[13]](#cite_note-13) and it was recently shown that this metric property holds for mixed states as well.[[14]](#cite_note-14)[[15]](#cite_note-15) The [Bures metric](/wiki/Bures_metric) is closely related to the quantum JS divergence; it is the quantum analog of the [Fisher information metric](/wiki/Fisher_information_metric).\n\n## Jensen–Shannon centroid\n\nThe centroid $C^*$ of a [finite set](/wiki/Finite_set) of probability distributions $P_1, \\ldots, P_n$ can be defined as the minimizer of the sum of the Jensen–Shannon divergences between a probability distribution $Q$ and the prescribed set of distributions:\n\n$$C^* = \\arg\\min_Q \\sum_{i=1}^n \\mathrm{JSD}(Q \\parallel P_i).$$\n\nAn efficient algorithm [16] (CCCP), based on a difference of convex functions, has been reported for computing the Jensen–Shannon centroid of a set of discrete distributions (histograms).\n\n## Applications\n\nThe Jensen–Shannon divergence has been applied in [bioinformatics](/wiki/Bioinformatics) and [genome comparison](/wiki/Genome_comparison),[[17]](#cite_note-Sims-17)[18] in protein surface comparison,[[19]](#cite_note-Ofran-19) in the social sciences,[[20]](#cite_note-DeDeo-20) in the quantitative study of history,[[21]](#cite_note-Klingenstein-21) in fire experiments,[[22]](#cite_note-22) and in machine learning.[[23]](#cite_note-23)\n\n## Notes\n\n1. Frank Nielsen (2021). [\"On a variational definition for the Jensen-Shannon symmetrization of distances based on the information radius\"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7514974). *Entropy*. **23** (4). MDPI: 464. [Bibcode](/wiki/Bibcode_(identifier)):[2019Entrp..21..485N](https://ui.adsabs.harvard.edu/abs/2019Entrp..21..485N). [doi](/wiki/Doi_(identifier)):[10.3390/e21050485](https://doi.org/10.3390%2Fe21050485). [PMC](/wiki/PMC_(identifier)) [7514974](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7514974). [PMID](/wiki/PMID_(identifier)) [33267199](https://pubmed.ncbi.nlm.nih.gov/33267199).\n2. Hinrich Schütze; Christopher D. Manning (1999). *Foundations of Statistical Natural Language Processing*. Cambridge, Mass: MIT Press. p. 304. [ISBN](/wiki/ISBN_(identifier)) [978-0-262-13360-9](/wiki/Special:BookSources/978-0-262-13360-9).\n3. Dagan, Ido; [Lee, Lillian](/wiki/Lillian_Lee_(computer_scientist)); Pereira, Fernando C. N. (1997). [\"Similarity-based methods for word sense disambiguation\"](https://aclanthology.org/P97-1008/). In Cohen, Philip R.; Wahlster, Wolfgang (eds.). *35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference, 7–12 July 1997, Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain*. Morgan Kaufmann Publishers / ACL. pp. 56–63. [arXiv](/wiki/ArXiv_(identifier)):[cmp-lg/9708010](https://arxiv.org/abs/cmp-lg/9708010). [doi](/wiki/Doi_(identifier)):[10.3115/976909.979625](https://doi.org/10.3115%2F976909.979625).\n4. Endres, D. M.; J. E. Schindelin (2003). [\"A new metric for probability distributions\"](https://research-repository.st-andrews.ac.uk/bitstream/10023/1591/1/Endres2003-IEEETransInfTheory49-NewMetric.pdf) (PDF). *IEEE Trans. Inf. Theory*. **49** (7): 1858–1860. [Bibcode](/wiki/Bibcode_(identifier)):[2003ITIT...49.1858E](https://ui.adsabs.harvard.edu/abs/2003ITIT...49.1858E). [doi](/wiki/Doi_(identifier)):[10.1109/TIT.2003.813506](https://doi.org/10.1109%2FTIT.2003.813506). [hdl](/wiki/Hdl_(identifier)):[10023/1591](https://hdl.handle.net/10023%2F1591). [S2CID](/wiki/S2CID_(identifier)) [14437777](https://api.semanticscholar.org/CorpusID:14437777).\n5. Österreicher, F.; I. Vajda (2003). \"A new class of metric divergences on probability spaces and its statistical applications\". *Ann. Inst. Statist. Math*. **55** (3): 639–653. [doi](/wiki/Doi_(identifier)):[10.1007/BF02517812](https://doi.org/10.1007%2FBF02517812). [S2CID](/wiki/S2CID_(identifier)) [13085920](https://api.semanticscholar.org/CorpusID:13085920).\n6. Fuglede, B.; Topsoe, F. (2004). [\"Jensen-Shannon divergence and Hilbert space embedding\"](https://www.math.ku.dk/~topsoe/ISIT2004JSD.pdf) (PDF). *Proceedings of the International Symposium on Information Theory, 2004*. IEEE. p. 30. [doi](/wiki/Doi_(identifier)):[10.1109/ISIT.2004.1365067](https://doi.org/10.1109%2FISIT.2004.1365067). [ISBN](/wiki/ISBN_(identifier)) [978-0-7803-8280-0](/wiki/Special:BookSources/978-0-7803-8280-0). [S2CID](/wiki/S2CID_(identifier)) [7891037](https://api.semanticscholar.org/CorpusID:7891037).\n7. Frank Nielsen (2019). [\"On the Jensen-Shannon symmetrization of distances relying on abstract means\"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7514974). *Entropy*. **21** (5). MDPI: 485. [arXiv](/wiki/ArXiv_(identifier)):[1904.04017](https://arxiv.org/abs/1904.04017). [Bibcode](/wiki/Bibcode_(identifier)):[2019Entrp..21..485N](https://ui.adsabs.harvard.edu/abs/2019Entrp..21..485N). [doi](/wiki/Doi_(identifier)):[10.3390/e21050485](https://doi.org/10.3390%2Fe21050485). [PMC](/wiki/PMC_(identifier)) [7514974](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7514974). [PMID](/wiki/PMID_(identifier)) [33267199](https://pubmed.ncbi.nlm.nih.gov/33267199).\n8. Lin, J. (1991). [\"Divergence measures based on the Shannon entropy\"](https://www.cise.ufl.edu/~anand/sp06/jensen-shannon.pdf) (PDF). *IEEE Transactions on Information Theory*. **37** (1): 145–151. [CiteSeerX](/wiki/CiteSeerX_(identifier)) [10.1.1.127.9167](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.127.9167). [doi](/wiki/Doi_(identifier)):[10.1109/18.61115](https://doi.org/10.1109%2F18.61115). [S2CID](/wiki/S2CID_(identifier)) [12121632](https://api.semanticscholar.org/CorpusID:12121632).\n9. Schneidman, Elad; Bialek, W; Berry, M.J. II (2003). [\"Synergy, Redundancy, and Independence in Population Codes\"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6740962). *Journal of Neuroscience*. **23** (37): 11539–11553. [doi](/wiki/Doi_(identifier)):[10.1523/JNEUROSCI.23-37-11539.2003](https://doi.org/10.1523%2FJNEUROSCI.23-37-11539.2003). [PMC](/wiki/PMC_(identifier)) [6740962](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6740962). [PMID](/wiki/PMID_(identifier)) [14684857](https://pubmed.ncbi.nlm.nih.gov/14684857).\n10. Majtey, A.; Lamberti, P.; Prato, D. (2005). \"Jensen-Shannon divergence as a measure of distinguishability between mixed quantum states\". *Physical Review A*. **72** (5) 052310. [arXiv](/wiki/ArXiv_(identifier)):[quant-ph/0508138](https://arxiv.org/abs/quant-ph/0508138). [Bibcode](/wiki/Bibcode_(identifier)):[2005PhRvA..72e2310M](https://ui.adsabs.harvard.edu/abs/2005PhRvA..72e2310M). [doi](/wiki/Doi_(identifier)):[10.1103/PhysRevA.72.052310](https://doi.org/10.1103%2FPhysRevA.72.052310). [S2CID](/wiki/S2CID_(identifier)) [32062112](https://api.semanticscholar.org/CorpusID:32062112).\n11. Briët, Jop; Harremoës, Peter (2009). \"Properties of classical and quantum Jensen-Shannon divergence\". *Physical Review A*. **79** (5) 052311. [arXiv](/wiki/ArXiv_(identifier)):[0806.4472](https://arxiv.org/abs/0806.4472). [Bibcode](/wiki/Bibcode_(identifier)):[2009PhRvA..79e2311B](https://ui.adsabs.harvard.edu/abs/2009PhRvA..79e2311B). [doi](/wiki/Doi_(identifier)):[10.1103/PhysRevA.79.052311](https://doi.org/10.1103%2FPhysRevA.79.052311).\n12. [Holevo, A. S.](/wiki/Alexander_Holevo) (1973). \"Bounds for the quantity of information transmitted by a quantum communication channel\". *Problemy Peredachi Informatsii* (in Russian). **9**: 3–11. English translation: *Probl. Inf. Transm*. **9**: 177–183 (1975). [MR](/wiki/MR_(identifier)) [0456936](https://mathscinet.ams.org/mathscinet-getitem?mr=0456936).\n13. Braunstein, Samuel; Caves, Carlton (1994). \"Statistical distance and the geometry of quantum states\". *Physical Review Letters*. **72** (22): 3439–3443. [Bibcode](/wiki/Bibcode_(identifier)):[1994PhRvL..72.3439B](https://ui.adsabs.harvard.edu/abs/1994PhRvL..72.3439B). [doi](/wiki/Doi_(identifier)):[10.1103/PhysRevLett.72.3439](https://doi.org/10.1103%2FPhysRevLett.72.3439). [PMID](/wiki/PMID_(identifier)) [10056200](https://pubmed.ncbi.nlm.nih.gov/10056200).\n14. Virosztek, Dániel (2021). \"The metric property of the quantum Jensen-Shannon divergence\". [Advances in Mathematics](/wiki/Advances_in_Mathematics). **380** 107595. [arXiv](/wiki/ArXiv_(identifier)):[1910.10447](https://arxiv.org/abs/1910.10447). [doi](/wiki/Doi_(identifier)):[10.1016/j.aim.2021.107595](https://doi.org/10.1016%2Fj.aim.2021.107595). [S2CID](/wiki/S2CID_(identifier)) [204837864](https://api.semanticscholar.org/CorpusID:204837864).\n15. Sra, Suvrit (2019). \"Metrics Induced by Quantum Jensen-Shannon-Renyí and Related Divergences\". [arXiv](/wiki/ArXiv_(identifier)):[1911.02643](https://arxiv.org/abs/1911.02643) [[cs.IT](https://arxiv.org/archive/cs.IT)].\n16. Frank Nielsen (2021). [\"On a generalization of the Jensen-Shannon divergence and the Jensen–Shannon centroid\"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7516653). *Entropy*. **22** (2). MDPI: 221. [doi](/wiki/Doi_(identifier)):[10.3390/e22020221](https://doi.org/10.3390%2Fe22020221). [PMC](/wiki/PMC_(identifier)) [7516653](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7516653). [PMID](/wiki/PMID_(identifier)) [33285995](https://pubmed.ncbi.nlm.nih.gov/33285995).\n17. Sims, GE; Jun, SR; Wu, GA; Kim, SH (2009). [\"Alignment-free genome comparison with feature frequency profiles (FFP) and optimal resolutions\"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2634796). [Proceedings of the National Academy of Sciences of the United States of America](/wiki/Proceedings_of_the_National_Academy_of_Sciences_of_the_United_States_of_America). **106** (8): 2677–82. [Bibcode](/wiki/Bibcode_(identifier)):[2009PNAS..106.2677S](https://ui.adsabs.harvard.edu/abs/2009PNAS..106.2677S). [doi](/wiki/Doi_(identifier)):[10.1073/pnas.0813249106](https://doi.org/10.1073%2Fpnas.0813249106). [PMC](/wiki/PMC_(identifier)) [2634796](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2634796). [PMID](/wiki/PMID_(identifier)) [19188606](https://pubmed.ncbi.nlm.nih.gov/19188606).\n18. Itzkovitz, S; Hodis, E; Segal, E (2010). [\"Overlapping codes within protein-coding sequences\"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2963821). *Genome Research*. **20** (11): 1582–9. [doi](/wiki/Doi_(identifier)):[10.1101/gr.105072.110](https://doi.org/10.1101%2Fgr.105072.110). [PMC](/wiki/PMC_(identifier)) [2963821](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2963821). [PMID](/wiki/PMID_(identifier)) [20841429](https://pubmed.ncbi.nlm.nih.gov/20841429).\n19. Ofran, Y; Rost, B (2003). \"Analysing six types of protein-protein interfaces\". *Journal of Molecular Biology*. **325** (2): 377–87. [CiteSeerX](/wiki/CiteSeerX_(identifier)) [10.1.1.6.9207](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.6.9207). [doi](/wiki/Doi_(identifier)):[10.1016/s0022-2836(02)01223-8](https://doi.org/10.1016%2Fs0022-2836%2802%2901223-8). [PMID](/wiki/PMID_(identifier)) [12488102](https://pubmed.ncbi.nlm.nih.gov/12488102).\n20. DeDeo, Simon; Hawkins, Robert X. D.; Klingenstein, Sara; Hitchcock, Tim (2013). [\"Bootstrap Methods for the Empirical Study of Decision-Making and Information Flows in Social Systems\"](https://doi.org/10.3390%2Fe15062246). *Entropy*. **15** (6): 2246–2276. [arXiv](/wiki/ArXiv_(identifier)):[1302.0907](https://arxiv.org/abs/1302.0907). [Bibcode](/wiki/Bibcode_(identifier)):[2013Entrp..15.2246D](https://ui.adsabs.harvard.edu/abs/2013Entrp..15.2246D). [doi](/wiki/Doi_(identifier)):[10.3390/e15062246](https://doi.org/10.3390%2Fe15062246).\n21. Klingenstein, Sara; Hitchcock, Tim; DeDeo, Simon (2014). [\"The civilizing process in London's Old Bailey\"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4084475). [Proceedings of the National Academy of Sciences of the United States of America](/wiki/Proceedings_of_the_National_Academy_of_Sciences_of_the_United_States_of_America). **111** (26): 9419–9424. [Bibcode](/wiki/Bibcode_(identifier)):[2014PNAS..111.9419K](https://ui.adsabs.harvard.edu/abs/2014PNAS..111.9419K). [doi](/wiki/Doi_(identifier)):[10.1073/pnas.1405984111](https://doi.org/10.1073%2Fpnas.1405984111). [PMC](/wiki/PMC_(identifier)) [4084475](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4084475). [PMID](/wiki/PMID_(identifier)) [24979792](https://pubmed.ncbi.nlm.nih.gov/24979792).\n22. Flavia-Corina Mitroi-Symeonidis; Ion Anghel; Nicuşor Minculete (2020). [\"Parametric Jensen-Shannon statistical complexity and its applications on full-scale compartment fire data\"](https://doi.org/10.3390%2Fsym12010022). *Symmetry*. **12** (1): 22. [doi](/wiki/Doi_(identifier)):[10.3390/sym12010022](https://doi.org/10.3390%2Fsym12010022).\n23. Goodfellow, Ian J.; Pouget-Abadie, Jean; Mirza, Mehdi; Xu, Bing; Warde-Farley, David; Ozair, Sherjil; Courville, Aaron; [Bengio, Yoshua](/wiki/Yoshua_Bengio) (2014). *Generative Adversarial Networks*. [NIPS](/wiki/Advances_in_Neural_Information_Processing_Systems). [arXiv](/wiki/ArXiv_(identifier)):[1406.2661](https://arxiv.org/abs/1406.2661). [Bibcode](/wiki/Bibcode_(identifier)):[2014arXiv1406.2661G](https://ui.adsabs.harvard.edu/abs/2014arXiv1406.2661G).", "url": "https://wpnews.pro/news/jensen-shannon-divergence", "canonical_source": "https://en.wikipedia.org/wiki/Jensen%E2%80%93Shannon_divergence", "published_at": "2026-05-22 19:27:27+00:00", "updated_at": "2026-05-25 20:34:14.301399+00:00", "lang": "en", "topics": ["machine-learning", "artificial-intelligence", "ai-research"], "entities": ["Johan Jensen", "Claude Shannon"], "alternates": {"html": "https://wpnews.pro/news/jensen-shannon-divergence", "markdown": "https://wpnews.pro/news/jensen-shannon-divergence.md", "text": "https://wpnews.pro/news/jensen-shannon-divergence.txt", "jsonld": "https://wpnews.pro/news/jensen-shannon-divergence.jsonld"}}