I spent hours writing unit tests – so I made an LLM do it (and learned what not to do)

A developer built an automated test generation system using an LLM after spending hours writing repetitive unit tests for 30+ similar validation functions. The engineer created a Python script that feeds function source code into an LLM API, requests comprehensive pytest test cases covering normal, edge, and error scenarios, then validates the output with `ast.parse()` before writing it to a test file. The approach successfully generated tests for a `validate_email` function, producing cases for valid emails, plus-addressing, missing @ symbols, and empty strings.

About a month ago I hit that point in a project where the business logic was solid and the API endpoints were clean, but the test file was a pathetic stub. I had 30+ similar validation functions – each one a slight variation on “does this field exist?”, “is it the right type?”, “does it pass this custom rule?”. The manual approach would mean copying the same assert pattern dozens of times, changing only the function name and the test input. My brain started melting just thinking about it.

I’m a big believer in testing, but I’m also a big believer in not doing boring work twice. So I started looking for ways to automate test generation.

My first instinct was to write a Python generator that parsed the function signatures and spat out basic asserts. Something like:

```python
def generate_test(func_name, params):
    lines = [f"def test_{func_name}():"]
    for p in params:
        lines.append(f"    assert {func_name}({p}) is not None")
    return "\n".join(lines)
```

This worked only for the most trivial cases. As soon as the functions had side effects, required fixtures, or needed specific edge-case values, the template became a nightmare of conditionals. Plus, what about the negative tests, the inputs that should raise errors? My generator didn’t know anything about the domain logic.

Next I tried a rule-based approach with regular expressions. I wrote about 200 lines of heuristics to infer parameter types from docstrings. It sort of worked for one function, then broke completely on the next. I felt like I was rebuilding a tiny compiler for a language nobody uses.

I had a hunch that an LLM could do better if I gave it the right context. The idea was simple: feed the function source plus docstring into a language model, ask it to produce pytest test functions, and then validate the output before writing it to a file.
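Getting “the function source plus docstring” doesn’t require any parsing of your own: the standard library’s `inspect.getsource` returns a function’s text, docstring included. Here is a minimal sketch, assuming the functions to test live in importable `.py` modules; the helper name `function_context` is hypothetical and not part of my original script.

```python
import inspect
import textwrap
import warnings
from typing import Callable


def function_context(func: Callable) -> str:
    """Return the source of `func`, docstring included, ready to paste into a prompt.

    inspect.getsource() reads the text from the function's .py file, so it raises
    OSError for code with no source file on disk and TypeError for built-ins.
    """
    # Dedent so that methods (indented inside a class) read as top-level code.
    source = textwrap.dedent(inspect.getsource(func))
    if inspect.getdoc(func) is None:
        # Without a docstring the model has to guess intent from the code alone.
        warnings.warn(f"{func.__qualname__} has no docstring", stacklevel=2)
    return source
```

Assuming `validate_email` is importable from your own code, `generate_tests(function_context(validate_email))` would hand its source straight to the `generate_tests` loop below.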
Here’s the core loop I ended up with:

````python
import ast

import requests

# For demo purposes – replace with your own endpoint
BASE_URL = "https://ai.interwestinfo.com/v1"  # Example: LLM API
API_KEY = "your-key"


def generate_tests(source_code: str, max_retries: int = 2) -> str:
    prompt = f"""You are an expert Python tester. Given the function below, write comprehensive pytest test functions covering:
- Normal cases
- Edge cases (empty, None, large values)
- Error cases (wrong types, out-of-range)

Do NOT use external libraries beyond pytest.
Return ONLY valid Python code (no explanations).

Function:
```python
{source_code}
```
"""
    for attempt in range(max_retries):
        response = requests.post(
            f"{BASE_URL}/chat/completions",
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={
                "model": "gpt-4o-mini",  # Or whatever model you prefer
                "messages": [{"role": "user", "content": prompt}],
                "temperature": 0.3,
            },
        )
        response.raise_for_status()
        content = response.json()["choices"][0]["message"]["content"]

        # Validate that the output is parseable Python
        try:
            ast.parse(content)
            return content
        except SyntaxError:
            if attempt == max_retries - 1:
                raise  # Out of retries: surface the last SyntaxError
            continue  # Ask again; with temperature > 0 the answer may differ
    # Only reachable when max_retries < 1 (the loop body never ran)
    raise ValueError("max_retries must be at least 1")
````

The validation step is crucial. LLMs love adding markdown fences, stray prose explanations, or incomplete brackets. By parsing the output with `ast.parse`, I catch those before I write bad code to my test file.

I pointed this at a `validate_email` function with three lines of logic. The LLM returned:

```python
import pytest
from validation import validate_email


def test_valid_email():
    assert validate_email("user@example.com") is True


def test_valid_email_with_plus():
    assert validate_email("user+tag@example.com") is True


def test_no_at_symbol():
    assert validate_email("userexample.com") is False


def test_empty_string():
    assert validate_email("") is False


def test_none_input():
    with pytest.raises(TypeError):
        validate_email(None)
```

Not bad – it even guessed I wanted a `TypeError` for `None` (which my function did raise). I ran the tests and they passed.
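Not every response comes back that clean, though. Because `ast.parse` rejects fenced output outright, every fenced answer costs a retry. One option is to strip a single surrounding markdown fence before parsing, then write the validated code to disk. This is a minimal sketch of that idea; the helper names `extract_code` and `write_test_file` and the `tests/test_<name>.py` layout are assumptions for illustration, not part of my original script.

```python
import ast
import re
from pathlib import Path

# A response wrapped in a single markdown code fence: three backticks, an
# optional language tag, the code, then three closing backticks.
FENCE_RE = re.compile(r"^\s*`{3}[\w+-]*[ \t]*\n(.*?)\n\s*`{3}\s*$", re.DOTALL)


def extract_code(content: str) -> str:
    """Strip one surrounding markdown fence, if present, then validate the code.

    Raises SyntaxError if what remains is not parseable Python, so callers can
    retry exactly as generate_tests does.
    """
    match = FENCE_RE.match(content)
    code = match.group(1) if match else content
    ast.parse(code)  # Syntax check only; imports are not resolved here
    return code


def write_test_file(func_name: str, code: str, test_dir: str = "tests") -> Path:
    """Write already-validated test code to <test_dir>/test_<func_name>.py."""
    path = Path(test_dir) / f"test_{func_name}.py"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(code.rstrip() + "\n", encoding="utf-8")
    return path
```

Inside `generate_tests`, calling `extract_code(content)` in place of the bare `ast.parse(content)` keeps the same retry behavior, because it raises the same `SyntaxError` on unparseable output.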
So the simple case was a success. But it wasn’t all roses. For a complex function that involved a database query, the LLM generated tests that mocked things incorrectly. It assumed the function would call `db.fetch` when in reality it used an async ORM. The generated tests were syntactically valid but semantically wrong.

Here’s what I learned:

- Use LLMs for boilerplate, not for domain-specific logic. If your function requires deep knowledge of your database schema or business rules, the generated tests will be too generic. You’re better off hand-writing those tests or providing schema context in the prompt.
- Prompt engineering matters more than the model. Adding "Do NOT include imports that don't exist in your project." and "Use pytest.raises for exceptions." to the prompt dramatically improved the output quality.
- Always validate the output. I parse the response with `ast.parse` and also run a quick `pytest --collect-only` on the generated file to catch any syntax or import errors before the full test run.
- A temperature of 0.2–0.4 is the sweet spot. Too high and it invents random test cases; too low and it repeats the same pattern ad nauseam.

For my validation functions, this approach saved about 20 minutes per function. Over 30 functions, that’s 10 hours I got back. The generated tests aren’t perfect (I still review every file), but they catch the obvious stuff, which is where many bugs hide.

What I’d build next is a small CLI tool that takes a list of function names (or reads a module), generates a test file for each, then opens a diff viewer so I can accept or reject chunks. That’s the next weekend project.

Now I’m curious: how do you handle the boring parts of testing? Do you use any code generation, or do you just accept the grind? Let me know in the comments – I’d love to steal your ideas.
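For anyone who wants to script the `pytest --collect-only` check mentioned in the lessons above, here is a minimal sketch using `subprocess`. It assumes pytest is installed in the same interpreter that runs the script; the helper names `collect_only_cmd` and `collects_cleanly` are hypothetical, not from my original tooling.

```python
import subprocess
import sys
from pathlib import Path


def collect_only_cmd(test_file: Path) -> list[str]:
    """Build a command that makes pytest collect, but not run, a file's tests."""
    # sys.executable ensures we use the pytest installed in this environment.
    return [sys.executable, "-m", "pytest", "--collect-only", "-q", str(test_file)]


def collects_cleanly(test_file: Path) -> bool:
    """Return True if pytest can import the file and finds at least one test.

    pytest exits with 0 when collection succeeds. Collection errors (such as
    an import that doesn't exist) and "no tests collected" (exit code 5) both
    produce a nonzero exit code.
    """
    result = subprocess.run(collect_only_cmd(test_file), capture_output=True, text=True)
    if result.returncode != 0:
        # Show pytest's own report so the bad file is easy to diagnose.
        print(result.stdout, result.stderr, sep="\n")
    return result.returncode == 0
```

Calling `collects_cleanly(path)` right after writing a generated file, and deleting the file when it returns `False`, keeps broken test modules out of the suite before the full run.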