Building a Voice AI Agent in Italian with ElevenLabs + n8n: Lessons From 200 Live Bookings/Month

A voice AI agent built on ElevenLabs and n8n runs across seven Italian restaurants, handling 200 bookings per month for a total cost of €87 per month. Key lessons: Italian requires a faster response time (under 1.0 seconds) than English, regional accent variance pushes word error rates as high as 19%, and the agent needed a social warmup phase to match Italian cultural norms. The production stack uses ElevenLabs for voice, self-hosted n8n for orchestration, PostgreSQL on Supabase for data, and Twilio for VoIP, with a daily automated workflow that pulls updated menus from Google Drive.

I deployed a voice AI agent in 7 Italian restaurants. It handles 200 bookings a month, in native Italian, for €87/month total cost. Here is what worked, what broke, and the exact stack that ships in production.

No marketing fluff. Real numbers from 60 days of live deployment: the latency we hit, the edge cases that broke our agent, and the three scenarios where I now refuse to deploy voice AI even when the client begs.

If you've shipped an English voice agent, here's what you should expect to break when you port the same architecture to Italian.

Latency tolerance is lower. Italian conversation has shorter pauses between turns. A 1.5s response time feels "fast" in English, but in Italian anything above 1.2s starts to feel awkward: native speakers begin to repeat themselves or check whether you're still there. We had to drop our agent's first-response target from 1.5s to 1.0s before user satisfaction stopped degrading.

Regional accent variance is huge. A 60-year-old Roman saying "addò sta er ristorante" ("where's the restaurant?" in Romanesco) is still Italian, but ASR-wise it might as well be Catalan. We tested ElevenLabs ASR on three sample populations (Roman elderly, Neapolitan middle-aged, Northern under-30): WER ranged from 4% (Northern under-30) to 19% (Roman elderly). For restaurants in Rome's historic center, we had to add a clarification fallback at the first failed parse, not the third.

Cultural patterns matter for prompt design. Italian restaurant calls open with extended pleasantries ("buongiorno, scusi il disturbo, volevo solo sapere se per stasera...", roughly "good morning, sorry to bother you, I just wanted to know if for tonight..."). English-trained prompts that try to short-circuit straight to "what's your reservation?" feel rude. We added a 1-2 turn "social warmup" phase before pushing toward intent collection, and the booking completion rate went from 71% to 89%.

English-trained TTS sounds robotic in Italian. This isn't just opinion.
Blind test, n=50 Italian native speakers, two voice models: the Italian-trained ElevenLabs voice scored 4.4/5 for naturalness, while an English-trained voice doing TTS in Italian scored 2.1/5 and was correctly identified as AI 78% of the time within the first 10 seconds.

After testing Vapi, Retell, Bland, and a custom Whisper + GPT-4o + OpenAI TTS pipeline, here's what shipped to production:

1. ElevenLabs Conversational AI: voice synthesis + ASR + intent routing in one product. Italian native voices ("Bianca", custom-cloned), with the conversation flow handled in their dashboard. Cost: $0.08/min on the Creator plan. Why I picked this over a custom pipeline: managing Whisper + GPT-4o + TTS as three separate services added ~400ms of latency and required a state machine I didn't want to maintain.
2. n8n self-hosted on a Hetzner CX22 (€5/month): the orchestration layer. When ElevenLabs recognizes an intent ("book table", "ask menu", "modify reservation"), it fires a webhook to n8n; n8n does the actual DB work (Postgres lookup, availability check, write reservation) and responds to the agent with structured data.
3. PostgreSQL on the Supabase free tier handles all 7 restaurants comfortably: restaurant menu, tables, opening hours, reservations, customer history. The schema is boring on purpose: 6 tables, 22 columns total.
4. Twilio for Italian VoIP: €1/month per phone number, €0.013/min inbound. Yes, we evaluated Vonage, Plivo and Bandwidth. Twilio's Italian numbers had the best call quality, and theirs was the only support team that actually answered Italian dialing issues within 24h.

Total monthly cost per restaurant: €12.50 (~€5 Hetzner shared across 7 restaurants + €1 Twilio + ~€6.50 ElevenLabs minutes).

Alternatives I rejected after testing:

Restaurant menus change daily. A static system prompt with "today's menu" baked in goes stale in 24h. The naive solution (paste the new menu into the prompt every morning) breaks at scale: 7 restaurants, 3 staff who don't want to log into a dashboard.
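
One way to keep menu data from going stale without anyone logging into a dashboard is to store it as date-versioned rows and resolve "today's menu" at lookup time instead of baking it into the prompt. Below is a minimal Python sketch of that versioning rule, not the production n8n workflow. It assumes an in-memory list standing in for the Postgres menu table and uses hypothetical field and function names (restaurant_id, valid_from, valid_to, publish_daily_menu, menu_on): each morning's publish closes the restaurant's open rows as of today and appends the new dishes, and a lookup returns the rows whose half-open interval valid_from <= day < valid_to contains the requested day.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from datetime import date, timedelta


@dataclass(frozen=True)
class MenuItem:
    # Hypothetical row shape for a versioned menu table; field names are assumptions.
    restaurant_id: str
    dish: str
    price_eur: float
    valid_from: date
    valid_to: date | None = None  # None means "still the current menu"


def publish_daily_menu(rows: list[MenuItem], restaurant_id: str,
                       dishes: dict[str, float], today: date) -> list[MenuItem]:
    """Close the restaurant's open rows as of `today`, then append today's dishes."""
    updated: list[MenuItem] = []
    for row in rows:
        if row.restaurant_id == restaurant_id and row.valid_to is None:
            # The previous menu stops being valid at `today` (half-open interval).
            updated.append(replace(row, valid_to=today))
        else:
            updated.append(row)
    for dish, price in dishes.items():
        updated.append(MenuItem(restaurant_id, dish, price, valid_from=today))
    return updated


def menu_on(rows: list[MenuItem], restaurant_id: str, day: date) -> list[MenuItem]:
    """What a menu lookup tool could return for `day`: valid_from <= day < valid_to."""
    return [
        r for r in rows
        if r.restaurant_id == restaurant_id
        and r.valid_from <= day
        and (r.valid_to is None or day < r.valid_to)
    ]


if __name__ == "__main__":
    d1 = date(2026, 5, 4)
    d2 = d1 + timedelta(days=1)
    rows = publish_daily_menu([], "trattoria-1", {"Cacio e pepe": 12.0}, d1)
    rows = publish_daily_menu(rows, "trattoria-1", {"Amatriciana": 13.0}, d2)
    print([r.dish for r in menu_on(rows, "trattoria-1", d1)])  # ['Cacio e pepe']
    print([r.dish for r in menu_on(rows, "trattoria-1", d2)])  # ['Amatriciana']
```

A side effect of the half-open interval: if the publish step runs twice on the same day, the morning's rows are closed with valid_from == valid_to (an empty interval), so the second run replaces that day's menu instead of duplicating it.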

The workflow that works. Every day at 06:00:

1. A cron trigger in n8n fires.
2. n8n pulls the menu PDF from each restaurant's shared Google Drive folder.
3. It sends the PDF to Mistral OCR (free tier, 1GB/month, more than enough for menus).
4. It parses the returned text into structured JSON (dishes, prices, allergens).
5. It upserts the dishes into the Postgres menu items table with a valid-from date of today.
6. It marks yesterday's menu as valid only until today.
7. It pings an ElevenLabs webhook to invalidate the runtime cache.

40 lines of n8n nodes. Staff drop the PDF into Drive whenever they want, and the agent picks it up the next morning. Zero manual sync.

The trick that took me 3 iterations to find: don't load the full menu into the prompt context. The agent calls a menu lookup tool (function) only when the user asks about food. That keeps the context lean (cheaper), keeps responses focused, and lets us A/B test menu phrasing without touching the agent prompt.

Aggregate data across 7 restaurants, May 2026:

The 10% that fails:

These three categories cover ~95% of all handoffs. I list them in the agent's system prompt as "always escalate" cases. The remaining 5% are genuinely unclassifiable edge cases (drunk callers, kids playing with phones, sales calls from suppliers).

I get 4-5 inbound requests per week now. I refuse roughly 30% of them. Here's when:

Scenario 1: under 25 calls/week. The ROI math doesn't work. Below that volume, the time the owner spends learning the system, training staff to interpret the dashboard, and dealing with the inevitable first-month edge cases costs more than the time saved. Pay a part-time receptionist.

Scenario 2: elderly-skewed clientele (70% over 65). Voice AI works fine with seniors who are tech-comfortable, but if the bulk of your callers are 70+ and not tech-comfortable, the conversational friction is real. We had one trattoria in Pescara where 80% of bookings came from elderly regulars. The booking completion rate stayed at 62% even after 4 weeks of prompt tuning. We turned the agent off.

Scenario 3: highly consultative calls.
Anything where the value is in extended human conversation (legal intake, medical triage, financial advice) should stay human, for ethical reasons before practical ones. Voice AI can take a callback request. It shouldn't take a substantive professional consultation.

Saying no to these is how I keep the 90% completion rate on the agents that do ship.

If you want the full Sofia case study (architecture diagrams, n8n workflow JSON, the actual ElevenLabs prompt template, screenshots of the dashboard), I documented everything publicly at andreasisofo.it/sofia-ristoranti.

I'm Andrea Sisofo, a freelancer (ex-BBDO) based in Rome. I build voice AI agents in Italian for SMBs. Reach me at andreasisofo.it or on LinkedIn. Open to questions in the comments.