From Voice Demo to Operational Voice Assistant: Reviving Ovela AI

Ovela AI has evolved from a voice demo prototype into an operational voice assistant designed for businesses, shifting focus from natural conversation to reliable behavior in tasks like checking availability, updating reservations, and collecting payments. The system now uses a multi-agent architecture built around Google's Agent Development Kit (ADK), with separate agents handling reservations, business operations, and information requests independently. Key improvements include interruption handling, reduced response delays, and operational safeguards against abuse, guided by the principle that AI should support human operations rather than blindly replace them.

This is a submission for the GitHub Finish-Up-A-Thon Challenge

Ovela AI started as a side project driven by a question that kept pulling me back: What would it take for a business to genuinely trust a voice AI system?

At first, I thought the answer was simple: make conversations sound natural. The original prototype could answer calls, respond to questions, and carry a conversation reasonably well. From a technical perspective, it looked impressive.

But after speaking with accommodation providers and small business owners, I realized I was focused on the wrong problem. Businesses don't trust a system because it sounds human. They trust it because it behaves reliably.

Can it check availability correctly? Can it update reservations safely? Can it collect payments? Can it transfer a call when confidence is low? Can staff see exactly what happened afterward?

That realization changed the direction of the project completely. Ovela AI evolved from a voice demo into an operational voice assistant designed to help businesses handle real customer interactions while keeping humans in control of important decisions.
Today, Ovela can:

More importantly, every improvement is guided by a simple principle: AI should support human operations, not blindly replace them.

📞 Live Demo

Australia Phone: +61 3 4823 6219

Due to abuse protection and testing limits, availability may occasionally be restricted.

Try asking:

🌐 Website: https://ovela.dev

🐙 GitHub Repository: https://github.com/My-CMDhub/Ovela-AI

Like many side projects, Ovela reached a point where the prototype worked well enough to demonstrate the idea. Then it sat untouched, not because the project failed, but because other priorities took over.

Months later, after more conversations with business owners and more exposure to real operational challenges, I came back to the project with a very different perspective.

The biggest lesson was surprisingly non-technical. The challenge isn't making AI speak. The challenge is making AI behave appropriately within human workflows.

A real receptionist doesn't simply answer questions. They:

Most voice demos don't fail because speech recognition is poor. They fail because the operational behavior doesn't match what people expect from a trusted assistant. That became the focus of the revival.

The original system relied on a much simpler flow. The new version uses a multi-agent architecture built around Google's Agent Development Kit (ADK), allowing different agents to handle reservations, business operations, and information requests independently.

Voice interactions are highly sensitive to delays. Several architectural bottlenecks were removed to improve response times and reduce awkward pauses during calls.

One of the most interesting challenges was interruption handling. People interrupt constantly during real conversations. The system now maintains awareness of what information has already been spoken, allowing it to continue naturally instead of restarting or losing context.
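To make the interruption-handling idea concrete, here is a minimal sketch of tracking what the caller has already heard. It is an illustration only, not Ovela's actual implementation: the `SpokenTracker` class, its method names, and the sample sentences are all hypothetical, and it assumes the text-to-speech layer can report when each sentence finishes playing.

```python
from dataclasses import dataclass


@dataclass
class SpokenTracker:
    """Hypothetical helper: tracks how much of a planned reply was heard."""

    sentences: list[str]   # the full reply, split into sentences
    spoken_count: int = 0  # sentences fully played so far

    def mark_spoken(self) -> None:
        """Record that playback of the next sentence completed."""
        if self.spoken_count < len(self.sentences):
            self.spoken_count += 1

    def heard(self) -> list[str]:
        """Sentences the caller heard before the interruption."""
        return self.sentences[: self.spoken_count]

    def remaining(self) -> list[str]:
        """Sentences still unspoken, to resume from instead of restarting."""
        return self.sentences[self.spoken_count:]


reply = SpokenTracker([
    "We have a double room free on Friday.",
    "It costs 180 dollars per night.",
    "Would you like me to book it?",
])
reply.mark_spoken()  # first sentence finished playing
# The caller interrupts during the second sentence, so it is not counted as heard.
print(reply.heard())
print(reply.remaining())
```

With a record like this, the agent can answer the interruption and then pick up from `remaining()` rather than repeating the availability statement the caller already heard.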
Reservation workflows, payment handling, availability checks, and dashboard synchronization were rebuilt to behave more like real business processes rather than isolated AI actions.

Real phone systems attract misuse. Rate limits, call protections, and operational safeguards were added to prevent abuse while keeping legitimate usage frictionless.

Returning to a codebase that has sat inactive for months is often harder than starting a brand new one. You inherit your own past decisions without fully remembering why you made them.

For the revival of Ovela AI, I didn't use GitHub Copilot as a simple autocomplete tool to write boilerplate code. Instead, I used it as a high-level engineering partner and data auditor to manage complex architectural shifts and harden my system's reliability.

Here are the two major ways Copilot helped me cross the finish line, backed by real-world interaction during my workflow development:

As Ovela AI transitioned to a multi-agent setup, mapping out component connections, telephony triggers, and dashboard synchronization endpoints became a major cognitive bottleneck. I leveraged Copilot within my workspace as a principal solutions architect. By feeding it my core file dependencies, it mapped out a clean, production-ready system workflow directly in Mermaid.js syntax for the repository documentation.

Building a reliable voice AI requires robust testing. I simulate conversations between two LLMs and dump the evaluation telemetry into local .json log files. However, default automated grading scripts are notoriously prone to false positives (e.g., grading a hallucinated response highly simply because it sounded polite). I utilized Copilot as a Supreme AI Evaluation Auditor. I passed it raw JSON conversation objects, prompting it to critically audit the automated scores, spot misleading feedback, and generate an adjusted "Supreme Score" with a bulleted logical justification. This drastically reduced noise in my evaluation pipeline.
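One way to see where an audit like this disagrees with the automated grader is to compare the two scores per conversation. The sketch below is an assumption-heavy illustration: the record fields (`id`, `auto_score`, `adjusted_score`), the sample data, and the `find_disagreements` helper are all hypothetical and do not reflect Ovela's real log schema.

```python
import json

# Hypothetical sample of evaluation records; the real log format is not shown
# in this post, so every field name here is an assumption.
SAMPLE_LOG = """
[
  {"id": "call-001", "auto_score": 9.0, "adjusted_score": 8.5},
  {"id": "call-002", "auto_score": 9.0, "adjusted_score": 4.0},
  {"id": "call-003", "auto_score": 6.0, "adjusted_score": 6.0}
]
"""


def find_disagreements(records, margin=2.0):
    """Return (id, gap) pairs where automated and audited scores diverge.

    A positive gap means the automated grader scored the conversation
    higher than the audit did, which is the false-positive case.
    """
    flagged = []
    for rec in records:
        gap = rec["auto_score"] - rec["adjusted_score"]
        if abs(gap) >= margin:
            flagged.append((rec["id"], gap))
    return flagged


records = json.loads(SAMPLE_LOG)
flagged = find_disagreements(records)
for call_id, gap in flagged:
    print(f"{call_id}: automated score exceeds audited score by {gap}")
```

Conversations flagged this way are the ones worth rereading by hand, since a large positive gap suggests the automated grader rewarded tone over correctness.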
Beyond these two core pillars, Copilot served as an excellent "cleanup crew" throughout this journey, even after hitting rate limits ✋. It assisted in tracking down legacy typing issues, reviewing asynchronous edge cases, and generating clean inline documentation.

Ultimately, the biggest value Copilot provided wasn't writing lines of code faster; it was accelerating complex architectural decisions and data validation while I revived a stale codebase.

The most valuable lesson wasn't technical. It was understanding the difference between a convincing demo and a useful product. A demo succeeds when the AI says the right thing. A business system succeeds when the right thing actually happens afterward.

That distinction changed how I think about voice AI. Natural conversation matters. Latency matters. Speech quality matters. But trust matters more. Trust comes from reliability, transparency, and knowing when humans should remain part of the process.

I don't believe current voice AI systems perfectly replicate human interaction, and that's not really the goal. What interests me is the space between humans and AI: How can AI handle repetitive operational work while humans remain responsible for judgment, relationships, and important decisions?

Reviving Ovela helped me explore that question far more deeply than when I first started the project. And honestly, that's what made finishing it worthwhile.