Continuous Diffusion Models Can Obey Formal Syntax

Researchers introduced Diffinity, a training-free guidance method that enables continuous diffusion language models to satisfy formal syntactic constraints defined by regular expressions. The method constructs an analytic score to estimate the probability that a latent state decodes to a valid string and uses its gradient to steer sampling without auxiliary classifiers. In evaluations on 180 regular-expression constraints over JSON and natural-language benchmarks, Diffinity achieved 68-96% constraint satisfaction with minimal perplexity cost, outperforming autoregressive constrained decoding in both constraint adherence and output quality.
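
To make the idea of an analytic validity score concrete, here is a minimal sketch of one way to compute the probability that a string is accepted by a regular expression. It is not the Diffinity implementation. It assumes the regular expression has already been compiled to a deterministic finite automaton (DFA) and that each output position carries its own independent distribution over symbols. Both are simplifying assumptions made for illustration, and the names `accept_probability`, `DELTA`, and the toy pattern `a*b+` are hypothetical.

```python
def accept_probability(delta, start, accepting, position_probs):
    """Probability that a string drawn position by position is accepted by a DFA.

    delta:          dict mapping (state, symbol) -> next state; a missing entry
                    means the string is rejected from that point on.
    start:          initial DFA state.
    accepting:      set of accepting states.
    position_probs: list with one dict per position, mapping symbol -> probability
                    (positions are assumed independent; illustrative assumption).
    """
    # alpha[state] = probability mass of prefixes that end in `state`
    alpha = {start: 1.0}
    for probs in position_probs:
        nxt = {}
        for state, mass in alpha.items():
            for symbol, p in probs.items():
                target = delta.get((state, symbol))
                if target is None:
                    continue  # dead transition: this mass is lost to rejection
                nxt[target] = nxt.get(target, 0.0) + mass * p
        alpha = nxt
    return sum(mass for state, mass in alpha.items() if state in accepting)


# Hypothetical toy DFA for the pattern a*b+ over the alphabet {a, b}:
# state 0 reads leading a's, state 1 reads trailing b's and is accepting.
DELTA = {(0, "a"): 0, (0, "b"): 1, (1, "b"): 1}

uniform = [{"a": 0.5, "b": 0.5}] * 3
prob_uniform = accept_probability(DELTA, 0, {1}, uniform)

skewed = [{"a": 0.9, "b": 0.1}, {"a": 0.2, "b": 0.8}, {"a": 0.0, "b": 1.0}]
prob_skewed = accept_probability(DELTA, 0, {1}, skewed)
```

With uniform symbol probabilities over three positions, the accepted strings are `aab`, `abb`, and `bbb`, so `prob_uniform` is 3/8. Because the result is built only from sums and products of the per-position probabilities, the same recursion written in an automatic-differentiation framework could be differentiated with respect to whatever produces those probabilities. That is the kind of gradient a guidance method would need, though how Diffinity derives its score from the latent state is not specified here.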

Computer Science, Machine Learning
Submitted on 12 Feb 2026 (v1, https://arxiv.org/abs/2602.12468v1); last revised 27 May 2026 (this version, v2)

Abstract: Diffusion language models offer a promising alternative to autoregressive models due to their global, non-causal generation process, but their continuous latent dynamics make discrete constraints (e.g., the output should be a JSON file that matches a given schema) difficult to impose. We introduce a training-free guidance method for steering continuous diffusion language models to satisfy formal syntactic constraints expressed using regular expressions. Our approach constructs an analytic score estimating the probability that a latent state decodes to a valid string accepted by a given regular expression, and uses its gradient to guide sampling, without training auxiliary classifiers. The denoising process targets the base model conditioned on syntactic validity. We implement our method in Diffinity on top of the PLAID diffusion model and evaluate it on 180 regular-expression constraints over JSON and natural-language benchmarks. Diffinity achieves 68-96% constraint satisfaction while incurring only a small perplexity cost relative to unconstrained sampling, outperforming autoregressive constrained decoding in both constraint satisfaction and output quality. Diffinity is open-sourced at this http URL.

Submission history
From: Jinwoo Kim
v1: Thu, 12 Feb 2026 22:55:05 UTC (71 KB)
v2: Wed, 27 May 2026 11:13:24 UTC (75 KB)