{"slug": "safere-building-a-production-quality-regex-library-with-agents", "title": "SafeRE: Building a production-quality regex library with agents", "summary": "Ethan Afton built SafeRE, a production-quality linear-time regular expression library for Java, using AI agents to write the code while he directed the work. The library aims to prevent ReDoS attacks by ensuring matching time grows linearly with input size, addressing a problem that Google estimated would require two engineer-years to solve. SafeRE is open-source and available on GitHub.", "body_md": "# SafeRE: Building a production-quality regex library with agents\n\n*This is the first in a series of blog posts about SafeRE, my\nlinear-time regular expression library for Java.*\n\nA few months ago, I was having coffee with a friend, and we were talking about how good AI agents had gotten. Recent frontier models like Opus 4.6 and GPT-5.5 felt like a step change: not just better at small coding tasks, but much more capable of working through complex, long-running tasks. I started to wonder: could I build a substantial, production-quality software project purely with agents, with no human-written code at all? How would I ensure correctness if I wasn’t writing every line myself? Could agents make a project like this feasible to attempt in my spare time?\n\nI decided to try an experiment.\n\nWhen I worked on the Java team at Google, we considered building a linear-time regular expression\nlibrary in pure Java. A bit of background: many popular regular expression libraries use\n[backtracking engines](https://en.wikipedia.org/wiki/Regular_expression#:~:text=The%20third%20algorithm%20is%20to%20match%20the%20pattern%20against%20the%20input%20string%20by%20backtracking.),\nwhich can take exponential time on some patterns and inputs. 
Attackers can\nexploit that behavior by sending inputs that cause a service to burn huge amounts of CPU evaluating\na regex – a class of attacks known as [regular expression denial of service, or\nReDoS](https://en.wikipedia.org/wiki/ReDoS). A linear-time regex library avoids that failure mode by\nensuring that matching time grows linearly with the size of the input. While this might sound like a\nniche concern, it was a real problem at Google.\n\nBuilding a new regex library would have been a lot of work. We estimated it at roughly two engineer-years. We couldn’t justify the investment, so we never built it. But I could never fully let go of the idea. It’s the kind of project that’s the reason I got into this field: using computer science to solve a real-world problem.\n\nPerhaps naively, I thought I could build this library in my spare time with agents doing the bulk of the work. So I decided to try it.\n\nThe outcome is SafeRE, which is open-source and available at\n[github.com/eaftan/safere](https://github.com/eaftan/safere).\n\nWhen I say SafeRE was built with agents, I don’t mean that I told an agent “go build a regex engine” and came back a week later to a finished project. I mean that agents wrote the code, while I directed the work: breaking down tasks, reviewing code, steering the agents when they went in the wrong direction, and shaping how I wanted them to approach the problem. My role was somewhere between tech lead and pair programmer.\n\n## Suitability for agents\n\nI initially chose this project because it seemed well-suited to agents. In reality, it turned out to be much harder than I expected. I was overly optimistic at the start.\n\n**Why did it seem well-suited?**\n\nWhile it’s technically difficult to build a linear-time regular expression library, the core ideas\nare well understood. There are existing libraries, [RE2](https://github.com/google/re2) in\nparticular, that SafeRE could learn from. 
Russ Cox, the author of RE2, also wrote [an excellent\nseries of blog posts explaining the ideas behind it](https://swtch.com/~rsc/regexp/). So while the\nwork is difficult, it is not research. We don’t have to invent new techniques to do this.\n\nSafeRE owes a huge debt to RE2. The project started as a Java port of RE2, and I intentionally kept\nRE2’s license and license header to make that lineage clear. As the project evolved, SafeRE diverged\nfrom RE2 because the goal shifted from “RE2 in Java” to drop-in compatibility with\n`java.util.regex`, whose semantics are often different. But RE2 was the starting point, both\ntechnically and intellectually.\n\nRegular expression engines are also unusually testable. They are deterministic and self-contained. You don’t have to wire together a distributed system to test them. There are also extensive open-source test suites that can be reused or adapted, where licenses permit and with appropriate attribution.\n\n**Why was it hard?**\n\nThis is the part where I was overconfident. Regular expressions are a type of programming language,\nand they are very widely used. The popular implementations are incredibly battle-tested. My stated\ngoal was for SafeRE to be a drop-in replacement for `java.util.regex`. That meant SafeRE had to be\nin the same neighborhood as the Java standard library’s regex implementation for correctness.\n\n`java.util.regex` has been around [since Java 1.4 in\n2002](https://docs.oracle.com/en/java/javase/26/docs/api/java.base/java/util/regex/Pattern.html#:~:text=Since%3A,1.4)\nand has widespread usage. SafeRE was\nbuilt from scratch. To be viable for production usage, I was going to have to polish it to an\nincredibly high standard. This turned out to be where I spent most of my time on the project.\n\nA concrete example: SafeRE inherited support for POSIX bracket classes from RE2. In RE2, expressions\nlike `[[:lower:]]` and `[[:digit:]]` have special meaning. 
Java’s regex library accepts those\nstrings, but doesn’t treat them as POSIX bracket classes. In Java, POSIX-style character properties\nare written with escapes like `\\p{Lower}`. So this was not a parser error or a missing feature. It\nwas worse: accepted syntax with different semantics, which means SafeRE could silently return the\nwrong answer.\n\nThat kind of issue came up repeatedly. The hard part was not implementing the core regex engine; it\nwas matching the long tail of behavior that real Java programs may depend on. This is another\nexample of [Hyrum’s Law](https://www.hyrumslaw.com/).\n\n## Not vibe coding, but agentic engineering[1](#user-content-fn-1)\n\nThere’s no shortage of blog posts about building demos or prototypes with agents, but there still aren’t many detailed accounts of building production-quality projects from scratch with agents.\n\nI wanted to learn from this experiment:\n\n- Is it possible to build a production-quality, technically complex project from scratch using only agents?\n- How do you ensure correctness when you’re not writing the code?\n- How do you maintain the code when you’re not intimately familiar with every line?\n- How do you make sure the agent is doing what you want?\n- What infrastructure do you have to put in place?\n- What processes do you need to work effectively with agents?\n- What is it actually like to work with an agent on a project like this?\n\nTo preview one answer: correctness didn’t come from painstakingly reviewing the agents’ code. It came from building increasingly aggressive validation machinery.\n\nMy testing approach started by incorporating the test suites from RE2 and\n[RE2/J](https://github.com/google/re2j) and driving test failures to zero. Then [I substituted SafeRE\nfor java.util.regex in six large open-source Java\nprojects](https://github.com/eaftan/safere/issues/26), ran their tests, and fixed the SafeRE bugs\nthey uncovered. 
Then I implemented a [fuzzer](https://github.com/eaftan/safere/tree/main/safere-fuzz), found more bugs, and fixed them. In the latest phase of the project, I’ve created [sweeps that exhaustively enumerate regexes of certain forms and compare SafeRE’s output to `java.util.regex`](https://github.com/eaftan/safere/tree/main/safere-exhaustive). Those sweeps currently cover around 20 billion test cases and take days to run. They’re slow, expensive, and extremely useful. They find bugs that ordinary unit tests would never find.\n\n## How do I know it’s production grade?\n\nIf the goal of this project is to see if I can build a production-grade, linear-time regex engine\nusing agents, how do I *know* it’s production grade?\n\nTo be honest, I don’t yet. I’ve put a tremendous amount of effort into testing and performance tuning, and I’ve tested it in several large Java open source projects. But the only way to know for sure is to have someone actually deploy it in production.\n\nI have some people kicking the tires and sending me feedback, which you can see from the list of\n[issues](https://github.com/eaftan/safere/issues?q=is%3Aissue%20-author%3A%40eaftan) and\n[PRs](https://github.com/eaftan/safere/pulls?q=is%3Apr+-author%3A%40eaftan) not authored by me. But\nI’d love to have more. SafeRE should be a drop-in replacement for `java.util.regex`, barring a few\n[features that cannot be implemented within the linear-time\nguarantee](https://github.com/eaftan/safere#not-supported). So please give it a try and tell me (1)\nif you run into any problems, and (2) if you do end up deploying it in production.\n\nThe project is open-source and available on GitHub at\n[github.com/eaftan/safere](https://github.com/eaftan/safere). I publish a Maven artifact that you\ncan depend on; instructions are in the README.\n\n## More to come\n\nI’ve spent a lot of time trying to start writing about SafeRE. There’s a lot to say, and I can’t say it all in one post. 
So there will be more to come. But I have to start somewhere, and this post is the start of the series.\n\nI’m not sure yet what I’ll write about next. Some possibilities:\n\n- My agent workflow\n- The testing process, which escalated quickly!\n- Stats about agent usage: tokens, cost, number of sessions, etc.\n- A project timeline\n- Where I had to correct the agents\n- What kinds of bugs the agents introduced\n- What this experiment suggests, and does not suggest, about how agents may change software engineering\n\n## Discuss\n\nDiscussion is happening on\n[LinkedIn](https://www.linkedin.com/posts/eddie-aftandilian-772b267_for-the-past-few-months-ive-been-building-share-7474247197438423040-LSOV/)\nand [Hacker News](https://news.ycombinator.com/item?id=48615570).\n\n*Note: I wrote this post by hand. I used an agent for proofreading and feedback.*", "url": "https://wpnews.pro/news/safere-building-a-production-quality-regex-library-with-agents", "canonical_source": "https://eaftan.github.io/safere-intro/", "published_at": "2026-06-21 04:02:09+00:00", "updated_at": "2026-06-21 04:07:03.621603+00:00", "lang": "en", "topics": ["ai-agents", "developer-tools", "ai-research"], "entities": ["SafeRE", "Google", "RE2", "Russ Cox", "Java", "GitHub", "Opus 4.6", "GPT-5.5"], "alternates": {"html": "https://wpnews.pro/news/safere-building-a-production-quality-regex-library-with-agents", "markdown": "https://wpnews.pro/news/safere-building-a-production-quality-regex-library-with-agents.md", "text": "https://wpnews.pro/news/safere-building-a-production-quality-regex-library-with-agents.txt", "jsonld": "https://wpnews.pro/news/safere-building-a-production-quality-regex-library-with-agents.jsonld"}}