Two months ago I started building voice agents for small service businesses: dental clinics and HVAC companies that lose real money every time a call goes to voicemail. I'm doing it solo, alongside a day job, which means every wrong turn costs me a weekend I don't get back.
Here's what actually went wrong, what I'd tell myself on day one, and the parts of the stack that held up.
Nothing here is exotic. That was the point. I wanted boring, debuggable infrastructure I could reason about at 11pm.
I assumed the voice model would be the scary part. It wasn't. Modern voice platforms handle the conversation surprisingly well out of the box.
The actual pain was everything around the conversation — what happens when the agent needs to check an appointment slot, write to a calendar, or hand off to a human gracefully. That orchestration logic is where I lost the most time, and it's the part no demo video ever shows you.
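One common way to structure that orchestration layer is a dispatcher: the voice platform emits a tool call, your code runs it, and the result goes back to the agent to speak. Below is a minimal sketch under stated assumptions. Tool calls are assumed to arrive as dicts with `name` and `args`. The tool names (`check_slot`, `book_appointment`, `handoff_to_human`) and the in-memory `Calendar` are hypothetical stand-ins, not any particular platform's API.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Callable

ToolCall = dict[str, Any]    # assumed shape: {"name": ..., "args": {...}}
ToolResult = dict[str, Any]


@dataclass
class Calendar:
    """Hypothetical in-memory stand-in for a real calendar/booking API."""
    booked: dict[datetime, str] = field(default_factory=dict)

    def is_free(self, slot: datetime) -> bool:
        return slot not in self.booked

    def book(self, slot: datetime, caller: str) -> None:
        if not self.is_free(slot):
            raise ValueError(f"slot {slot.isoformat()} already taken")
        self.booked[slot] = caller


def make_dispatcher(cal: Calendar) -> Callable[[ToolCall], ToolResult]:
    """Route tool calls from the voice agent to backend actions."""

    def check_slot(args: dict[str, Any]) -> ToolResult:
        slot = datetime.fromisoformat(args["slot"])
        return {"available": cal.is_free(slot)}

    def book_appointment(args: dict[str, Any]) -> ToolResult:
        slot = datetime.fromisoformat(args["slot"])
        cal.book(slot, args["caller"])
        return {"booked": True, "slot": slot.isoformat()}

    def handoff_to_human(args: dict[str, Any]) -> ToolResult:
        # A real system would transfer the call or page staff here;
        # this sketch only returns a result the agent can act on.
        return {"handoff": True, "reason": args.get("reason", "caller asked")}

    tools: dict[str, Callable[[dict[str, Any]], ToolResult]] = {
        "check_slot": check_slot,
        "book_appointment": book_appointment,
        "handoff_to_human": handoff_to_human,
    }

    def dispatch(call: ToolCall) -> ToolResult:
        handler = tools.get(call.get("name", ""))
        if handler is None:
            return handoff_to_human({"reason": f"unknown tool {call.get('name')!r}"})
        try:
            return handler(call.get("args", {}))
        except (KeyError, ValueError) as exc:
            # Missing arguments, bad timestamps, or a double booking all
            # degrade to a human handoff instead of dead air on the call.
            return handoff_to_human({"reason": f"{type(exc).__name__}: {exc}"})

    return dispatch


if __name__ == "__main__":
    dispatch = make_dispatcher(Calendar())
    call = {"name": "book_appointment",
            "args": {"slot": "2024-05-01T09:00", "caller": "+15550100"}}
    print(dispatch(call))  # books the slot
    print(dispatch(call))  # same slot again, so it falls back to a handoff
```

The design choice worth copying, whatever the stack, is that every failure path ends in a graceful handoff rather than an exception the caller hears as silence.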
If you're evaluating this space: budget your time for the plumbing, not the model.
Running n8n in Docker on a small VPS is genuinely fine for low volume. What nobody warned me about: execution data accumulates fast and will quietly eat your disk.
The fix is one environment variable:
Set it early. I found out the way you'd expect: a workflow failing for no obvious reason, an hour of confusion, then a df -h showing a nearly full disk.
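Setting the variable addresses the cause; a cheap disk check catches the symptom before a workflow fails. A minimal sketch, assuming a Linux VPS where cron runs the script. The 85% threshold and the checked path (`/`) are arbitrary choices, so point it at whichever filesystem holds your Docker volumes. Note that `shutil.disk_usage` counts reserved blocks in its total, so its percentage can read slightly lower than the Use% column from df.

```python
import shutil
import sys
from typing import Optional

# Arbitrary alert threshold (an assumption); warn well before 100%.
THRESHOLD = 0.85


def used_fraction(used: int, total: int) -> float:
    """Fraction of the filesystem in use, from 0.0 to 1.0."""
    if total <= 0:
        raise ValueError("total must be positive")
    return used / total


def check(path: str = "/", threshold: float = THRESHOLD) -> Optional[str]:
    """Return a warning if the filesystem holding `path` is over threshold, else None."""
    usage = shutil.disk_usage(path)
    frac = used_fraction(usage.used, usage.total)
    if frac >= threshold:
        free_gib = usage.free / 2**30
        return f"WARNING: {path} is {frac:.0%} full ({free_gib:.1f} GiB free)"
    return None


if __name__ == "__main__":
    # Stay silent when healthy: cron mails any output by default,
    # so a printed line doubles as the alert.
    warning = check("/")
    if warning:
        print(warning, file=sys.stderr)
```

Staying quiet on success keeps the cron mailbox empty until there's something worth reading.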
I ran a cold email campaign to roughly 1,600 leads over two months. Clean domain warmup, SPF/DKIM/DMARC all verified, aggregate reports showing no auth failures.
Replies: basically zero.
That stung, but it was useful. It forced me to accept that technically correct deliverability and a compelling message are completely different problems. The infrastructure was fine; the offer and the targeting weren't sharp enough yet. No amount of DNS hygiene fixes a message that doesn't land.
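For the record, the "aggregate reports showing no auth failures" check is easy to automate. A sketch, assuming the report is already decompressed (they usually arrive as .gz or .zip attachments) and follows the RFC 7489 aggregate schema, where each `row` carries a `count` plus DMARC-aligned `policy_evaluated/dkim` and `policy_evaluated/spf` results. The sample report and its documentation-range IPs are made up for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class DmarcSummary:
    total: int = 0       # messages covered by the report
    dmarc_fail: int = 0  # messages where neither aligned DKIM nor SPF passed
    failing_sources: dict[str, int] = field(default_factory=dict)


def _strip_namespaces(root: ET.Element) -> None:
    # Some reporters add an XML namespace; drop it so plain tag paths match.
    for el in root.iter():
        if isinstance(el.tag, str) and "}" in el.tag:
            el.tag = el.tag.split("}", 1)[1]


def summarize_dmarc(xml_text: str) -> DmarcSummary:
    root = ET.fromstring(xml_text)
    _strip_namespaces(root)
    summary = DmarcSummary()
    for row in root.iter("row"):
        count = int(row.findtext("count", default="0"))
        ip = row.findtext("source_ip", default="unknown").strip()
        dkim = row.findtext("policy_evaluated/dkim", default="").strip().lower()
        spf = row.findtext("policy_evaluated/spf", default="").strip().lower()
        summary.total += count
        # DMARC passes if either aligned mechanism passes.
        if dkim != "pass" and spf != "pass":
            summary.dmarc_fail += count
            summary.failing_sources[ip] = summary.failing_sources.get(ip, 0) + count
    return summary


# Hypothetical sample report, trimmed to the fields the parser reads.
SAMPLE = """<?xml version="1.0"?>
<feedback>
  <record>
    <row><source_ip>192.0.2.10</source_ip><count>40</count>
      <policy_evaluated><disposition>none</disposition><dkim>pass</dkim><spf>pass</spf></policy_evaluated>
    </row>
  </record>
  <record>
    <row><source_ip>198.51.100.7</source_ip><count>3</count>
      <policy_evaluated><disposition>none</disposition><dkim>fail</dkim><spf>fail</spf></policy_evaluated>
    </row>
  </record>
</feedback>"""

if __name__ == "__main__":
    print(summarize_dmarc(SAMPLE))
```

A clean result here confirms only what my campaign confirmed: authentication works. It says nothing about whether anyone will reply.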
Early on I wanted to serve "service businesses." Too vague. The moment I picked one vertical and wrote scripts for specific call patterns — new patient booking, after-hours emergencies, the weird edge cases a real receptionist handles — everything got easier. The demos got sharper. The objections got predictable.
If you're building anything agent-shaped: pick the narrowest viable slice and overfit to it. You can generalize later.
If you're building in the voice-agent or automation space and have hit the same walls, I'd genuinely like to compare notes in the comments. I'm building VoiceIntego, AI voice agents for service businesses, mostly so those businesses stop losing jobs to voicemail. Still early. Happy to talk shop.