🧠 NeuroDoc: From Broken Prototype to Production-Ready Async AI Documentation Engine

NeuroDoc, an AI-powered documentation engine, was rebuilt from a fragile CLI prototype into a production-ready full-stack web dashboard with RAG capabilities. The original tool suffered from blocking synchronous loops, an in-memory task queue that lost all jobs on crash, and brittle core resolvers, but a complete architectural rewrite using asyncio and aiohttp eliminated these flaws. The new version features a persistent database-backed task queue and a RAG pipeline that embeds queries, retrieves relevant documentation chunks, and generates grounded summaries.

*This is a submission for the GitHub Finish-Up-A-Thon Challenge.*

I abandoned this project. Then I resurrected it. Here's how a fragile CLI script became a full-stack async web dashboard with RAG capabilities.

NeuroDoc started as an ambitious idea: a single tool to fetch, scrape, process, and summarize documentation across Python, scikit-learn, PyTorch, and TensorFlow, powered by NLP and multi-core processing. But it hit a wall fast.

The villain: a blocking synchronous loop that froze everything.

```python
while True:
    query = input("Enter query: ")  # 🚫 BLOCKS the main thread
    result = fetch_docs(query)      # 🚫 BLOCKS background workers
    print(result)
```

The original prototype had three fatal flaws:

| Problem | Impact |
|---|---|
| `input()` loop on main thread | Blocked all background scraping workers |
| In-memory task queue | All pending jobs vanished on crash |
| Brittle core resolver | Failed silently on dynamic imports |

Long-running doc crawls would stall. A single crash wiped the entire task queue. It was a house of cards: impressive from a distance, terrifying up close.

So I shelved it.

Months later, I came back with a clear head and a plan. The rewrite wasn't incremental; it was architectural. Three shifts made everything click:

asyncio + aiohttp

Out went the blocking loop. In came a proper async event loop that lets scraping, processing, and serving happen concurrently without stepping on each other.

```python
import asyncio

import aiohttp

# DocResult and process_content are defined elsewhere in NeuroDoc.

async def fetch_documentation(url: str, session: aiohttp.ClientSession) -> DocResult:
    # Give up on any single request after 30 seconds in total.
    async with session.get(url, timeout=aiohttp.ClientTimeout(total=30)) as response:
        content = await response.text()
        return await process_content(content)

async def run_pipeline(queries: list[str]) -> list[DocResult | BaseException]:
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_documentation(q, session) for q in queries]
        # return_exceptions=True: a failed fetch shows up as an exception
        # object in the result list instead of cancelling the whole batch.
        return await asyncio.gather(*tasks, return_exceptions=True)
```

No more frozen terminals. No more stalled workers.

The in-memory queue was replaced with a persistent, database-backed task queue.
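To make "persistent" concrete, here is a minimal sketch of a queue that survives a process restart. It uses only the standard-library `sqlite3` module; the table layout and the helper names (`open_queue`, `enqueue`, `pending_payloads`) are illustrative assumptions, not NeuroDoc's actual schema or API.

```python
from __future__ import annotations

import os
import sqlite3
import tempfile
import uuid


def open_queue(path: str) -> sqlite3.Connection:
    """Open (or create) the on-disk queue. Hypothetical schema for illustration."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        "id TEXT PRIMARY KEY, status TEXT NOT NULL, payload TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def enqueue(conn: sqlite3.Connection, payload: str) -> str:
    """Insert a pending task and commit it to disk before returning its id."""
    task_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO tasks (id, status, payload) VALUES (?, 'pending', ?)",
        (task_id, payload),
    )
    conn.commit()
    return task_id


def pending_payloads(conn: sqlite3.Connection) -> list[str]:
    """Return pending payloads in insertion order."""
    rows = conn.execute(
        "SELECT payload FROM tasks WHERE status = 'pending' ORDER BY rowid"
    ).fetchall()
    return [row[0] for row in rows]


db_path = os.path.join(tempfile.mkdtemp(), "queue.db")

conn = open_queue(db_path)
enqueue(conn, "crawl pytorch docs")
enqueue(conn, "crawl sklearn docs")
conn.close()  # stand-in for the process dying

conn = open_queue(db_path)  # stand-in for the restart
recovered = pending_payloads(conn)
conn.close()
print(recovered)
```

An in-memory list would come back empty after the restart. Here both tasks are still pending because each `enqueue` committed to disk before returning.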
Now if the server crashes at 3 AM while crawling PyTorch docs, no work is lost. Tasks resume exactly where they left off.

```python
import uuid
from datetime import datetime, timezone

# DocumentationTask, TaskStatus, and the async self.db wrapper are defined elsewhere in NeuroDoc.

class TaskQueue:
    async def enqueue(self, task: DocumentationTask) -> str:
        task_id = str(uuid.uuid4())
        await self.db.execute(
            "INSERT INTO tasks (id, status, payload, created_at) VALUES (?, ?, ?, ?)",
            # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC timestamp.
            (task_id, TaskStatus.PENDING, task.to_json(), datetime.now(timezone.utc)),
        )
        return task_id

    async def get_next(self) -> DocumentationTask | None:
        # Oldest pending task first.
        row = await self.db.fetchone(
            "SELECT * FROM tasks WHERE status = 'pending' ORDER BY created_at LIMIT 1"
        )
        return DocumentationTask.from_row(row) if row else None
```

This is where NeuroDoc levels up from "scraper" to "intelligent documentation assistant." Instead of returning raw docs, it embeds the query, retrieves the most relevant documentation chunks, and generates a summary grounded in them:

```python
class RAGPipeline:
    async def query(self, user_query: str) -> RAGResponse:
        # Step 1: Embed the query
        query_embedding = await self.embedder.embed(user_query)

        # Step 2: Retrieve top-k relevant chunks
        relevant_chunks = await self.vector_store.similarity_search(
            query_embedding, top_k=5
        )

        # Step 3: Generate grounded summary
        context = "\n\n".join(chunk.text for chunk in relevant_chunks)
        summary = await self.llm.generate(
            prompt=f"Answer based on this documentation:\n{context}\n\nQuery: {user_query}"
        )
        return RAGResponse(summary=summary, sources=relevant_chunks)
```

```
┌─────────────────────────────────────────────┐
│           Web Dashboard (FastAPI)           │
│         ┌──────────┬──────────────┐         │
│         │  Submit  │   Results    │         │
│         │  Query   │   Viewer     │         │
│         └────┬─────┴──────┬───────┘         │
└──────────────┼────────────┼─────────────────┘
               │            │
     ┌─────────▼────────────▼──────────┐
     │      Async Task Dispatcher      │
     │    (asyncio + DB task queue)    │
     └──────┬──────────────────┬───────┘
            │                  │
 ┌──────────▼──────┐  ┌────────▼────────────┐
 │ Multi-core      │  │ RAG Pipeline        │
 │ Doc Scraper     │  │ Embed → Retrieve    │
 │ (aiohttp)       │  │ → Generate          │
 └────────┬────────┘  └────────┬────────────┘
          │                    │
 ┌────────▼────────────────────▼────────────┐
 │          SQLite / PostgreSQL DB          │
 │  tasks · chunks · embeddings · results   │
 └──────────────────────────────────────────┘
```

| Library | Sections Scraped | NLP Processing |
|---|---|---|
| 🐍 Python | stdlib, builtins, language ref | Code extraction, summaries |
| 🤖 scikit-learn | API reference, user guide | Table parsing, param docs |
| 🔥 PyTorch | Tensor ops, nn, autograd | Code snippets, examples |
| 🌊 TensorFlow | Keras, tf.data, layers | API signatures, guides |

```bash
# Clone the repo
git clone https://github.com/kaushikcoderpy1/neurodoc
cd neurodoc

# Install dependencies
pip install -r requirements.txt

# Initialize the database
python -m neurodoc.db init

# Start the async dashboard
uvicorn neurodoc.app:app --reload --port 8000
```

Then open http://localhost:8000 and start querying.

Why asyncio over threading?
asyncio handles thousands of concurrent I/O-bound requests on a single thread. There is no GIL contention between workers, and because tasks only yield at `await` points, shared state is far easier to reason about than with threads.

Why SQLite for the task queue instead of Redis?
Zero infrastructure. NeuroDoc is a dev tool, and adding a Redis dependency just to persist a queue adds friction. SQLite WAL mode handles concurrent reads/writes cleanly for this use case.

Why RAG over fine-tuning?
Documentation changes constantly. RAG retrieves from live-scraped content. A fine-tuned model would be stale in weeks.

This section is the heart of the comeback story. NeuroDoc didn't just get rewritten; it got debugged at a deep architectural level, with Copilot as a true pair programmer. Here are four real, production-blocking bugs it helped resolve.

The failure: Under high-concurrency loads via `asyncio.gather`, edge-case exceptions inside sub-coroutines bypassed connection release hooks, leaving asyncpg pool sockets exhausted and the app hanging silently. Standard `try/finally` cleanup blocks failed because they referenced stale async contexts. The pool hit max capacity and froze.
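The general shape of this failure can be sketched without a database. The example below is not a reproduction of the asyncpg bug itself: it uses an `asyncio.Semaphore` as a stand-in for a three-connection pool, and the worker names are hypothetical. It only shows how an exception raised between a manual acquire and release leaks a slot, while an `async with` block returns it on every exit path.

```python
import asyncio

POOL_SIZE = 3  # stand-in for the pool's max size


async def leaky_worker(pool: asyncio.Semaphore, fail: bool) -> None:
    await pool.acquire()
    if fail:
        # The exception skips the release() below, so the slot is never returned.
        raise RuntimeError("scrape failed")
    pool.release()


async def safe_worker(pool: asyncio.Semaphore, fail: bool) -> None:
    async with pool:  # the slot is released however the block exits
        if fail:
            raise RuntimeError("scrape failed")


async def free_slots_after(worker) -> int:
    """Run POOL_SIZE failing workers, then count the slots still available."""
    pool = asyncio.Semaphore(POOL_SIZE)
    await asyncio.gather(
        *(worker(pool, fail=True) for _ in range(POOL_SIZE)),
        return_exceptions=True,  # errors are collected, not raised
    )
    free = 0
    while not pool.locked():  # locked() is True once no slot can be acquired
        await pool.acquire()
        free += 1
    return free


leaky_free = asyncio.run(free_slots_after(leaky_worker))
safe_free = asyncio.run(free_slots_after(safe_worker))
print(leaky_free, safe_free)  # 0 3
```

With `return_exceptions=True` the failures are swallowed into the result list, so nothing crashes visibly. The leaky version simply ends up with zero free slots, and the next caller waits forever.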
How Copilot helped: Copilot introduced a strict connection acquisition pattern bound directly to local transaction lifecycles, with absolute timeout guards:

```python
# Copilot-suggested acquisition pattern
async with pool.acquire() as connection:
    async with connection.transaction():
        result = await asyncio.wait_for(
            connection.fetch(query, *args),
            timeout=5.0,  # Hard boundary: no silent hangs
        )
```

It also added global exception wrappers that translate raw driver errors into clean structured responses, guaranteeing connection cleanup even if the downstream scraping pipeline crashed.

`SpecifierSet.contains` AttributeError Across Packaging Versions

The failure: `formatter.py` runs dependency diagnostics via `DependencyAnalyzer`. On environments with older `packaging` library versions, calling `.contains` on a `SpecifierSet` threw:

```
AttributeError: 'SpecifierSet' object has no attribute 'contains'
```

This crashed the entire diagnostic panel before it could render, silently breaking environment validation for a large chunk of users.

How Copilot helped: Copilot identified that `.contains` is version-specific, but the native `in` operator is backward-compatible across historical releases of `packaging`:

```python
# ❌ Old failing code
elif not raw_spec.contains(local):

# ✅ Copilot's robust fix: works on every packaging version
elif local not in raw_spec:
```

One operator swap. Zero crashes across all environments.

The failure: In `neurodoc.py`, CLI input like `neurodoc fetch os` passed the core ID `"1"` as a raw string into `isinstance(core, Core1PythonBasics)` checks. Since `"1"` is a string, every check silently fell through with:

```
Unknown core type for str
```

Worse, the topic `"os"` was passed into the batch resolver without list wrapping, so the resolver iterated over the characters `'o'` and `'s'` separately instead of treating `"os"` as a single module name.
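The character-iteration half of this bug is easy to reproduce in isolation, because a Python `str` is itself an iterable of one-character strings. The sketch below shows the behavior and one defensive guard; `normalize_topics` is a hypothetical helper for illustration, not a function from NeuroDoc.

```python
from __future__ import annotations


def normalize_topics(topics: str | list[str]) -> list[str]:
    """Treat a bare string as a single topic rather than a sequence of characters."""
    if isinstance(topics, str):
        return [topics]
    return list(topics)


# What a batch resolver sees when it loops over a bare string:
unwrapped = [topic for topic in "os"]

# What it sees once the input is normalized first:
wrapped = [topic for topic in normalize_topics("os")]

print(unwrapped)  # ['o', 's']
print(wrapped)    # ['os']
```

No exception is raised in the unwrapped case, which is why this kind of bug tends to surface as confusing lookups for modules named `o` and `s` rather than as a clear error.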
How Copilot helped: Copilot introduced dynamic string dereferencing that maps string IDs back to their live handler instances, plus list-wrapping for topic encapsulation:

```python
# Dynamic dereference: string → live core handler
if isinstance(core, str):
    core = self.command_handler.available_cores.get(core)

# Topic wrapped as list: no more character iteration
return await self.call_backend("core1", topics=[topic_f], flags=flags)
```

The failure: `nlp_with_cos.py` calculates semantic similarity across documentation topics using PyTorch/TensorFlow models. Queries of varying lengths produced tensors with mismatched dimensions, throwing:

```
RuntimeError: Tensors must be of the same shape
```

This crashed deep multi-core fetches completely, and those are the most expensive operation in the entire pipeline.

How Copilot helped: Copilot suggested a preprocessing step that pads or truncates every input to a fixed length of 512 tokens, so all input vectors are aligned before the cosine similarity matrix calculation:

```python
# Copilot's shape-alignment fix
inputs = tokenizer(
    text,
    padding="max_length",  # pad short inputs up to max_length
    truncation=True,       # cut long inputs down to max_length
    max_length=512,
    return_tensors="pt",
)
```

All tensors now enter the similarity layer at identical dimensions: no shape mismatches, no crashes.

These weren't simple autocomplete suggestions. Copilot reasoned about async lifecycle boundaries, cross-version API compatibility, type system edge cases, and linear algebra constraints. These are the kind of bugs that take hours of debugging to even locate, let alone fix.

The biggest unlock: it didn't just fix the symptom. For each bug, it explained why the original approach was fragile and offered a pattern that would hold up under production conditions. That's the difference between a tool and a collaborator.

Built for the DEV.to hackathon. Powered by stubbornness, async Python, and too much coffee.