We faced a recurring issue in our content generation pipeline: the LLM frequently output malformed Markdown. Unclosed code blocks, broken list levels, you name it. Relying solely on prompt engineering became a game of whack-a-mole that we couldn't win.
The core problem? Asking an LLM to generate Markdown is a probabilistic process. A prompt is a "soft constraint." No matter how well you phrase it, a slight token fluctuation can break the syntax, and in our case that crashed the frontend.
We realized we were violating the Single Responsibility Principle. We were asking the model to do two jobs at once: decide what to say, and format it as syntactically valid Markdown.
Models are great at semantics but terrible at strict formatting rules. So, we decoupled them.
Instead of asking the LLM to write Markdown, we switched to JSON output and let Jinja2 handle the rendering.
Before (Probabilistic):
prompt = f"Write an article about {topic} in Markdown format."
response = llm.generate(prompt)  # raw Markdown; nothing guarantees valid syntax
After (Deterministic):
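import json

prompt = f"Output data about {topic} in JSON format."
# The model returns JSON text, so parse it before handing it to the template.
json_data = json.loads(llm.generate(prompt))
md_content = jinja_env.get_template('article.md').render(data=json_data)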
This moved the formatting from a "maybe" to a "definitely." If the template is correct, the Markdown structure is correct; only the text inside each field still comes from the model.
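To make the split concrete, here is a minimal sketch of what the rendering side could look like. The template body, the field names (title, sections, heading, body, code, language), and the render_article helper are assumptions for illustration, not our production template. The fence string is built in code and passed in, so every code block the template emits is opened and closed by the template itself.

```python
import json

from jinja2 import DictLoader, Environment

FENCE = "`" * 3  # built programmatically; the template never depends on model output for fences

# Hypothetical article template: all structure (headings, fences) lives here.
ARTICLE_TEMPLATE = """\
# {{ data.title }}

{% for section in data.sections %}
## {{ section.heading }}

{{ section.body }}

{% if section.code %}
{{ fence }}{{ section.language }}
{{ section.code }}
{{ fence }}

{% endif %}
{% endfor %}
"""

jinja_env = Environment(
    loader=DictLoader({"article.md": ARTICLE_TEMPLATE}),
    trim_blocks=True,  # drop the newline that follows each block tag
)


def render_article(raw_json: str) -> str:
    """Parse the model's JSON text and render it through the template."""
    data = json.loads(raw_json)  # raises ValueError on malformed JSON
    return jinja_env.get_template("article.md").render(data=data, fence=FENCE)


# Stand-in for the model's JSON response.
sample = json.dumps({
    "title": "Caching basics",
    "sections": [
        {"heading": "Why cache", "body": "Repeated lookups are slow."},
        {"heading": "Example", "body": "A dictionary works as a tiny cache.",
         "language": "python", "code": "cache = {}"},
    ],
})
markdown = render_article(sample)
print(markdown)
```

A malformed response now fails loudly at json.loads instead of reaching the frontend as broken Markdown, which is the point of moving the constraint out of the prompt.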
Just in case (and for legacy compatibility), we added a post-processing layer with regex validation. It acts as a safety net for unclosed code fences.
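import re

def sanitize_markdown(text):
    # Fence lines come in open/close pairs, so an odd count means a block was left open.
    fence_lines = re.findall(r'^[ \t]*`{3}', text, flags=re.MULTILINE)
    if len(fence_lines) % 2 == 1:
        text = text.rstrip('\n') + '\n' + '`' * 3 + '\n'
    return text

final_markdown = sanitize_markdown(llm_output)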
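Pattern-based checks can misfire when a fence carries a language tag or uses more than three backticks. As a hedged alternative, the sketch below walks the text line by line and tracks which fence is currently open, following the CommonMark rule that a closing fence must be at least as long as the opening one and carry no info string. The function name close_unclosed_fence is hypothetical, and tilde fences are not handled.

```python
import re

# A fence line: up to three spaces of indentation, then three or more backticks.
FENCE_RE = re.compile(r"^ {0,3}(`{3,})")


def close_unclosed_fence(text: str) -> str:
    """Append a closing fence if the text ends inside a code block."""
    open_fence = None  # the backtick run that opened the current block, if any
    for line in text.splitlines():
        match = FENCE_RE.match(line)
        if not match:
            continue
        run = match.group(1)
        if open_fence is None:
            open_fence = run  # this line opens a block
        elif len(run) >= len(open_fence) and line.strip() == run:
            open_fence = None  # a bare, long-enough fence closes the block
    if open_fence is not None:
        text = text.rstrip("\n") + "\n" + open_fence + "\n"
    return text


broken = "Intro\n" + "`" * 3 + "python\nprint('hi')\n"
fixed = close_unclosed_fence(broken)
```

Running the function on its own output changes nothing, so it is safe to apply to both legacy content and freshly rendered documents.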
While fixing the text generation, we also noticed a logic gap in our stock data queries. We treated A-shares, ETFs, and Hong Kong stocks identically. This caused failures because Hong Kong stocks go through a separate API, while A-share and ETF codes must carry .SH or .SZ suffixes.
We implemented a router at the query entry point:
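def get_stock_data(code):
    # Hong Kong stocks are served by a separate API.
    if is_hk_stock(code):
        return hk_api.get_price(code)
    # Bare codes get a default exchange suffix. This always picks .SH,
    # so Shenzhen-listed codes must arrive with their .SZ suffix already attached.
    if ".SH" not in code and ".SZ" not in code:
        code = f"{code}.SH"
    return api.get_price(code)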
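The router leaves two details open: how is_hk_stock decides, and which suffix a bare mainland code should receive. The sketch below fills both in under stated assumptions. It assumes Hong Kong codes either end in .HK or are bare 5-digit numbers, and that 6-digit mainland codes starting with 5, 6, or 9 belong to Shanghai while those starting with 0, 1, 2, or 3 belong to Shenzhen. Those prefix rules are a common convention rather than something our data provider guarantees, so verify them against your own source. The helpers normalize_mainland_code and route are hypothetical names.

```python
def is_hk_stock(code: str) -> bool:
    # Assumption: HK codes carry a .HK suffix or are bare 5-digit numbers.
    code = code.upper()
    return code.endswith(".HK") or (len(code) == 5 and code.isdigit())


def normalize_mainland_code(code: str) -> str:
    """Return a mainland code with an explicit .SH or .SZ suffix."""
    code = code.upper()
    if code.endswith((".SH", ".SZ")):
        return code
    if len(code) != 6 or not code.isdigit():
        raise ValueError(f"unrecognized stock code: {code!r}")
    # Assumed prefix convention; check it against your data provider.
    if code[0] in "569":
        return f"{code}.SH"
    if code[0] in "0123":
        return f"{code}.SZ"
    raise ValueError(f"cannot infer exchange for: {code!r}")


def route(code: str) -> tuple[str, str]:
    """Decide which backend a code belongs to, without calling any API."""
    if is_hk_stock(code):
        return ("hk", code.upper())
    return ("mainland", normalize_mainland_code(code))


print(route("00700"))   # Hong Kong backend
print(route("600519"))  # mainland backend, Shanghai suffix
print(route("159915"))  # mainland backend, Shenzhen suffix
```

Keeping the routing decision separate from the API call makes the rule table testable on its own, and an unrecognized code raises an error instead of silently defaulting to one exchange.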
Both fixes came from the same shift, from "Prompt Optimization" to "Engineering Hard Constraints".
If you are fighting with LLMs to output perfect HTML or Markdown, stop. Use the LLM for what it's good at—generating structured JSON data—and use a template engine like Jinja2 to enforce the view layer. It turns a probabilistic headache into a deterministic pipeline.