🧠 NeuroDoc: From Broken Prototype to Production-Ready Async AI Documentation Engine


This is a submission for the GitHub Finish-Up-A-Thon Challenge

I abandoned this project. Then I resurrected it. Here's how a fragile CLI script became a full-stack async web dashboard with RAG capabilities.

NeuroDoc started as an ambitious idea: a single tool to fetch, scrape, process, and summarize documentation across Python, scikit-learn, PyTorch, and TensorFlow — powered by NLP and multi-core processing.

But it hit a wall fast.

while True:
    query = input("Enter query: ")  # 🚫 BLOCKS the main thread
    result = fetch_docs(query)      # 🚫 BLOCKS background workers
    print(result)

The original prototype had three fatal flaws:

Problem	Impact
`input()` loop on main thread	Blocked all background scraping workers
In-memory task queue	All pending jobs vanished on crash
Brittle core resolver	Failed silently on dynamic imports

Long-running doc crawls would stall. A single crash wiped the entire task queue. It was a house of cards — impressive from a distance, terrifying up close.

So I shelved it.

Months later, I came back with a clear head and a plan. The rewrite wasn't incremental — it was architectural. Three shifts made everything click:

asyncio + aiohttp

Out went the blocking loop. In came a proper async event loop that lets scraping, processing, and serving happen concurrently without stepping on each other.

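import asyncio
import aiohttp

# DocResult and process_content are NeuroDoc's own names, not shown here.
async def fetch_documentation(url: str, session: aiohttp.ClientSession) -> DocResult:
    # Cap each request at 30 seconds so one slow page cannot stall the pipeline.
    async with session.get(url, timeout=aiohttp.ClientTimeout(total=30)) as response:
        content = await response.text()
        return await process_content(content)

async def run_pipeline(urls: list[str]) -> list[DocResult | BaseException]:
    # One shared session reuses connections across all requests.
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_documentation(url, session) for url in urls]
        # With return_exceptions=True, a failed fetch is returned as an exception
        # object in the list instead of propagating the first error to the caller.
        return await asyncio.gather(*tasks, return_exceptions=True)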

No more frozen terminals. No more stalled workers.
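One consequence of `return_exceptions=True` is that failures come back as ordinary list items, so the caller has to separate them from real results. Here is a minimal sketch of that step, with a semaphore added to cap how many fetches run at once. `fake_fetch`, `gather_bounded`, and the limit of 2 are illustrative assumptions, not NeuroDoc code:

```python
import asyncio


async def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for a network fetch; fails for one URL."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    if "bad" in url:
        raise ValueError(f"cannot fetch {url}")
    return f"content of {url}"


async def gather_bounded(urls, limit=2):
    """Fetch every URL with at most `limit` in flight; split results from errors."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(url):
        async with semaphore:
            return await fake_fetch(url)

    outcomes = await asyncio.gather(
        *(guarded(u) for u in urls), return_exceptions=True
    )
    results, errors = {}, {}
    for url, outcome in zip(urls, outcomes):
        # gather preserves input order, so zip pairs each URL with its outcome.
        if isinstance(outcome, BaseException):
            errors[url] = outcome
        else:
            results[url] = outcome
    return results, errors


results, errors = asyncio.run(gather_bounded(["a", "bad", "c"]))
```

The failed URL ends up in `errors` while the other two still complete, which is the behavior a long crawl needs.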

The in-memory queue was replaced with a persistent, database-backed task queue. Now if the server crashes at 3 AM while crawling PyTorch docs, no work is lost. Tasks resume exactly where they left off.

import uuid
from datetime import datetime

class TaskQueue:
    async def enqueue(self, task: DocumentationTask) -> str:
        task_id = str(uuid.uuid4())
        await self.db.execute(
            "INSERT INTO tasks (id, status, payload, created_at) VALUES (?, ?, ?, ?)",
            (task_id, TaskStatus.PENDING, task.to_json(), datetime.utcnow())
        )
        return task_id

    async def get_next(self) -> DocumentationTask | None:
        row = await self.db.fetchone(
            "SELECT * FROM tasks WHERE status = 'pending' ORDER BY created_at LIMIT 1"
        )
        return DocumentationTask.from_row(row) if row else None
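
For tasks to survive a crash, a worker also has to mark a task as claimed, and anything left in the claimed state has to be put back on restart. The sketch below shows one way to do that with the standard-library `sqlite3` module. It is synchronous for brevity, and the `running` status, the schema, and the function names are assumptions rather than NeuroDoc's actual code:

```python
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE IF NOT EXISTS tasks (
    id TEXT PRIMARY KEY,
    status TEXT NOT NULL,
    payload TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def enqueue(db, payload):
    """Persist a new pending task and return its ID."""
    task_id = str(uuid.uuid4())
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT INTO tasks (id, status, payload) VALUES (?, 'pending', ?)",
            (task_id, payload),
        )
    return task_id


def claim_next(db):
    """Claim the oldest pending task; return (id, payload) or None."""
    row = db.execute(
        "SELECT id, payload FROM tasks WHERE status = 'pending' "
        "ORDER BY created_at, rowid LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    with db:
        # The status check makes the claim safe if another worker got there first.
        cur = db.execute(
            "UPDATE tasks SET status = 'running' WHERE id = ? AND status = 'pending'",
            (row[0],),
        )
    return row if cur.rowcount == 1 else None


def requeue_interrupted(db):
    """On startup, return tasks that were mid-flight during a crash to the queue."""
    with db:
        cur = db.execute("UPDATE tasks SET status = 'pending' WHERE status = 'running'")
    return cur.rowcount


db = sqlite3.connect(":memory:")  # a real deployment would use a file path
db.execute(SCHEMA)
first = enqueue(db, "pytorch")
enqueue(db, "tensorflow")
claimed = claim_next(db)              # first task is now 'running'
recovered = requeue_interrupted(db)   # simulate a restart after a crash
reclaimed = claim_next(db)            # the interrupted task is picked up again
```

Note that this recovers at task granularity: an interrupted task is retried from its start, so task handlers should be safe to run twice.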

This is where NeuroDoc levels up from "scraper" to "intelligent documentation assistant."

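Instead of returning raw docs, it embeds the query, retrieves the most relevant documentation chunks, and generates an answer grounded in them: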

class RAGPipeline:
    async def query(self, user_query: str) -> RAGResponse:
        query_embedding = await self.embedder.embed(user_query)

        relevant_chunks = await self.vector_store.similarity_search(
            query_embedding, top_k=5
        )

        context = "\n\n".join(chunk.text for chunk in relevant_chunks)
        summary = await self.llm.generate(
            prompt=f"Answer based on this documentation:\n{context}\n\nQuery: {user_query}"
        )

        return RAGResponse(summary=summary, sources=relevant_chunks)
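
The retrieval step in the middle of that pipeline can be sketched without any vector database. This is a minimal, pure-Python illustration of top-k cosine similarity search; the three-dimensional vectors and the in-memory `store` list are made-up assumptions, and a real vector store would use an index rather than a full scan:

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_search(query_vec, store, top_k=5):
    """Return the top_k (text, score) pairs from store, best match first."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in store]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy embeddings, invented for illustration only.
store = [
    ("torch.nn.Linear applies a linear transformation", [1.0, 0.0, 0.0]),
    ("os.path.join joins path components", [0.0, 1.0, 0.0]),
    ("torch.nn.Conv2d applies a 2D convolution", [0.8, 0.0, 0.6]),
]
hits = similarity_search([1.0, 0.0, 0.0], store, top_k=2)
```

The two PyTorch chunks rank above the unrelated `os.path` chunk, and only those would be passed to the LLM as context.
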
┌─────────────────────────────────────────────────────┐
│                   Web Dashboard (FastAPI)            │
│              ┌──────────┬──────────────┐            │
│              │  Submit  │   Results    │            │
│              │  Query   │   Viewer     │            │
│              └────┬─────┴──────┬───────┘            │
└───────────────────┼────────────┼────────────────────┘
                    │            │
          ┌─────────▼────────────▼──────────┐
          │       Async Task Dispatcher      │
          │    (asyncio + DB task queue)     │
          └──────┬──────────────────┬────────┘
                 │                  │
    ┌────────────▼────┐    ┌────────▼────────────┐
    │  Multi-core     │    │   RAG Pipeline       │
    │  Doc Scraper    │    │  (Embed → Retrieve   │
    │  (aiohttp)      │    │   → Generate)        │
    └────────┬────────┘    └────────┬─────────────┘
             │                      │
    ┌────────▼──────────────────────▼─────────────┐
    │           SQLite / PostgreSQL DB             │
    │   (tasks · chunks · embeddings · results)    │
    └──────────────────────────────────────────────┘

Library	Sections Scraped	NLP Processing
🐍 Python	stdlib, builtins, language ref	Code extraction, summaries
🤖 scikit-learn	API reference, user guide	Table parsing, param docs
🔥 PyTorch	Tensor ops, nn, autograd	Code snippets, examples
🌊 TensorFlow	Keras, tf.data, layers	API signatures, guides

git clone https://github.com/kaushikcoderpy1/neurodoc
cd neurodoc

pip install -r requirements.txt

python -m neurodoc.db init

uvicorn neurodoc.app:app --reload --port 8000

Then open http://localhost:8000 and start querying.

Why asyncio over threading?

`asyncio` handles thousands of concurrent I/O-bound requests on a single thread, which avoids GIL contention and most of the locking and race-condition hazards of shared-state threading.

Why SQLite for the task queue instead of Redis?

Zero infrastructure. NeuroDoc is a dev tool — adding a Redis dependency just to persist a queue adds friction. SQLite WAL mode handles concurrent reads/writes cleanly for this use case.
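
Turning on WAL mode is a one-line pragma. A minimal sketch with the standard-library `sqlite3` module follows; the file name and the 5-second busy timeout are assumptions, not values from NeuroDoc:

```python
import os
import sqlite3
import tempfile

# WAL needs a file-backed database; an in-memory one cannot use it.
db_path = os.path.join(tempfile.mkdtemp(), "neurodoc_tasks.db")  # hypothetical name
conn = sqlite3.connect(db_path)

# The pragma returns the journal mode that is actually in effect.
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]

# Wait up to 5 seconds for a lock instead of failing immediately.
conn.execute("PRAGMA busy_timeout=5000")
busy_timeout = conn.execute("PRAGMA busy_timeout").fetchone()[0]
```

WAL lets readers proceed while a write is in progress, but writes are still serialized, so this suits a queue with modest write volume.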

Why RAG over fine-tuning?

Documentation changes constantly. RAG retrieves from live-scraped content. A fine-tuned model would be stale in weeks.

This section is the heart of the comeback story. NeuroDoc didn't just get rewritten; it got debugged at a deep architectural level, with Copilot as a true pair programmer. Here are four real, production-blocking bugs it helped resolve.

The failure: Under high-concurrency loads via `asyncio.gather`, edge-case exceptions inside sub-coroutines bypassed connection release hooks, leaving `asyncpg` pool sockets exhausted and the app hanging silently.

Standard `try/finally` cleanup blocks failed because they referenced stale async contexts. The pool hit max capacity and froze.

How Copilot helped:

Copilot introduced a strict connection acquisition pattern bound directly to local transaction lifecycles, with absolute timeout guards:

async with pool.acquire() as connection:
    async with connection.transaction():
        result = await asyncio.wait_for(
            connection.fetch(query, *args),
            timeout=5.0  # Hard boundary — no silent hangs
        )

It also added global exception wrappers that translate raw driver errors into clean structured responses — guaranteeing connection cleanup even if the downstream scraping pipeline crashed.
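
To see why scoping the connection to an `async with` block is enough, here is a self-contained sketch that needs no database. `FakePool` is a hypothetical stand-in for a real pool, and `safe_query` illustrates the idea of an exception wrapper rather than reproducing NeuroDoc's own:

```python
import asyncio


class FakePool:
    """Hypothetical stand-in for a connection pool; counts leased connections."""

    def __init__(self, size):
        self._slots = asyncio.Semaphore(size)
        self.in_use = 0

    def acquire(self):
        return _Lease(self)


class _Lease:
    def __init__(self, pool):
        self._pool = pool

    async def __aenter__(self):
        await self._pool._slots.acquire()
        self._pool.in_use += 1
        return self

    async def __aexit__(self, exc_type, exc, tb):
        # Runs on success, on exceptions, and on timeouts alike.
        self._pool.in_use -= 1
        self._pool._slots.release()
        return False  # never swallow the exception


async def run_query(pool, delay, fail=False):
    async with pool.acquire():
        await asyncio.wait_for(asyncio.sleep(delay), timeout=0.05)
        if fail:
            raise RuntimeError("driver error")
        return {"ok": True}


async def safe_query(pool, delay, fail=False):
    """Translate raw errors into a structured response."""
    try:
        return await run_query(pool, delay, fail)
    except asyncio.TimeoutError:
        return {"ok": False, "error": "timeout"}
    except Exception as exc:
        return {"ok": False, "error": str(exc)}


async def main():
    pool = FakePool(size=2)
    responses = await asyncio.gather(
        safe_query(pool, 0.0),
        safe_query(pool, 0.0, fail=True),
        safe_query(pool, 1.0),  # exceeds the 0.05 s timeout
    )
    return responses, pool.in_use


responses, in_use = asyncio.run(main())
```

After one success, one driver error, and one timeout, `in_use` is back to zero: every lease was released because release is tied to leaving the block, not to a separate cleanup path.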

`SpecifierSet` `.contains()` AttributeError Across Packaging Versions

The failure: `formatter.py` runs dependency diagnostics via `DependencyAnalyzer`. On environments with older `packaging` library versions, calling `.contains()` on a `SpecifierSet` threw:

AttributeError: 'SpecifierSet' object has no attribute 'contains'

This crashed the entire diagnostic panel before it could render — silently breaking environment validation for a large chunk of users.

How Copilot helped:

Copilot identified that `.contains()` is version-specific, but the native `in` operator is universally backward-compatible across all historical releases of `packaging`:

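# Before: raised AttributeError on the affected environments
elif not raw_spec.contains(local):

# After: same check using the membership operator
elif local not in raw_spec: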

One operator swap. Zero crashes across all environments.

The failure: In `neurodoc.py`, CLI input like `neurodoc fetch os` passed the core ID `"1"` as a raw string into `isinstance(core, Core1PythonBasics)` checks. Since `"1"` is a string, every check silently fell through with:

Unknown core type for str

Worse: the topic `"os"` was passed into the batch resolver without list wrapping, so it iterated over the characters `'o'` and `'s'` separately instead of treating `"os"` as a unified module name.

How Copilot helped:

Copilot introduced dynamic string dereferencing that maps string IDs back to their live handler instances, plus list-wrapping for topic encapsulation:

if isinstance(core, str):
    core = self.command_handler.available_cores.get(core)

return await self.call_backend("core1", topics=[topic_f], flags=flags)
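
Both halves of this bug come from Python treating a string as just another value or iterable. A small sketch of the two guards follows; `normalize_topics` and `resolve_core` are hypothetical helper names, and unlike the `.get()` call above, this version raises on an unknown ID instead of returning `None`:

```python
def normalize_topics(topics):
    """Return topics as a list, treating a bare string as a single topic."""
    if isinstance(topics, str):
        return [topics]
    return list(topics)


def resolve_core(core, available_cores):
    """Map a string core ID to its handler; pass handler objects through."""
    if isinstance(core, str):
        try:
            return available_cores[core]
        except KeyError:
            raise ValueError(f"Unknown core ID: {core!r}") from None
    return core


# Iterating a bare string yields its characters, which was the original bug.
characters = list("os")
wrapped = normalize_topics("os")
many = normalize_topics(("os", "sys"))

handler = object()  # placeholder for a real core handler instance
resolved = resolve_core("1", {"1": handler})
```

Failing loudly on an unknown ID is a design choice: it surfaces a bad CLI argument at the boundary instead of letting `None` travel into the resolver.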

The failure: `nlp_with_cos.py` calculates semantic similarity across documentation topics using PyTorch/TensorFlow models. Queries of varying lengths produced tensors with mismatched dimensions, throwing:

RuntimeError: Tensors must be of the same shape

This crashed deep multi-core fetches completely — the most expensive operation in the entire pipeline.

How Copilot helped:

Copilot suggested a preprocessing step that pads and truncates every input to a fixed length, so all input vectors are aligned before the cosine similarity matrix calculation:

inputs = tokenizer(
    text,
    padding="max_length",
    truncation=True,
    max_length=512,
    return_tensors="pt"
)

All tensors now enter the similarity layer at identical dimensions — no shape mismatches, no crashes.
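
What `padding="max_length"` and `truncation=True` do to each sequence can be shown in a few lines of plain Python. This is a simplified sketch of the idea, not the tokenizer's implementation; the pad ID of 0 and the length of 4 are assumptions chosen to keep the example readable:

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Clip token_ids to max_length, then right-pad with pad_id to that length."""
    clipped = list(token_ids[:max_length])
    return clipped + [pad_id] * (max_length - len(clipped))


def attention_mask(token_ids, max_length):
    """1 for real tokens, 0 for padding, matching pad_or_truncate's layout."""
    real = min(len(token_ids), max_length)
    return [1] * real + [0] * (max_length - real)


sequences = [[5, 6], [7, 8, 9, 10, 11], []]
batch = [pad_or_truncate(ids, 4) for ids in sequences]
masks = [attention_mask(ids, 4) for ids in sequences]
```

Every row now has the same length, so the batch can be stacked into one tensor. The mask matters downstream: pooling over padded positions without it would skew the similarity scores.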

These weren't simple autocomplete suggestions. Copilot reasoned about async lifecycle boundaries, cross-version API compatibility, type system edge cases, and linear algebra constraints — the kind of bugs that take hours of debugging to even locate, let alone fix.

The biggest unlock: it didn't just fix the symptom. For each bug, it explained why the original approach was fragile and offered a pattern that would hold up under production conditions.

That's the difference between a tool and a collaborator.

Built for the DEV.to hackathon. Powered by stubbornness, async Python, and too much coffee.

Source: dev.to (original article)