cd /news/ai-products/months-of-self-testing-citations-shi… · home topics ai-products article
[ARTICLE · art-13970] src=dev.to pub= topic=ai-products verified=true sentiment=↑ positive

Months of self-testing: Citations shine, other features remain unproven.

A developer built UUMuse, a PDF workspace tool that indexes documents and allows users to ask questions with verifiable citations across multiple AI models. After months of self-testing, the developer found that citation accuracy and cross-model memory work well, while onboarding and retrieval failures when the model sounds confident remain unproven pain points. The tool is designed for students, researchers, PMs, and legal professionals who need to trust document-based answers and reuse context across GPT, Claude, DeepSeek, and other models.

read4 min publishedMay 26, 2026

ok so full disclosure: I'm the person who built this. I'm not here to drop a pitch deck on you — I just need people who actually live in PDF hell to tell me if I'm solving a real problem or cosplaying as a startup.

how I used to work (embarrassing) my "system" was: files in Drive → open ChatGPT → paste half a PDF → pray → switch to Claude → paste again → forget which paragraph the model hallucinated from.

NotebookLM was the first time I went "oh it actually read the doc." but I'd still end up with a chat thread that's useless outside that window. couldn't hand it to a teammate. couldn't put it on a landing page. couldn't switch models without feeling like I'm starting over.

that's basically why I started building UUMuse. yes that's the name. yes my friends say it sounds like a muse who uwu's. moving on.

what it is, if I explain it like I'm drunk at a meetup

you throw files in a workspace. it indexes them. you ask questions. answers try to show little [1][2] things so you can click and be like "ok yeah that's actually on page 4" instead of vibes-based trust.

you can bounce between GPT / Claude / DeepSeek / whatever without re-up the same stupid 80-page PDF. there's memory that carries over — like it learned I hate long intros and prefer bullets. you can also go into settings and delete a memory when it confidently learns the wrong thing about you (ask me how I know).

there's agent mode when I'm lazy — "summarize this folder and write me a memo" — and there's this multi-expert debate mode (Spark) that I built because I have decision paralysis and wanted AI to argue with itself using my files. ngl that's probably a me problem not a market problem.

how it actually feels to use day to day

the first "oh shit" moment for me was boring: I uploaded a messy PDF, asked something specific, clicked [1], and the highlight was actually the right paragraph. I know that sounds like table stakes in 2026 but emotionally it was the first time I stopped copy-pasting quotes back into the chat to fact-check.

second moment: I switched models mid-conversation and it didn't gaslight me with "I don't have access to your files." small thing. huge relief.

third: memory. I told it once I want concise answers and it… mostly listened. when it didn't, I deleted the memory entry instead of fighting the model in chat like it's an ex.

where it still feels clunky for me:

first-time onboarding tries to be helpful and sometimes feels like homework. I'm trimming it again for the 800th time.

when retrieval misses, the model can still sound confident. we're better than embed-and-pray v1 but it's not magic.

I definitely built "publish a docs page / embed widget / MCP hook" before I had ten humans who loved plain Q&A. classic side project disease. if you only try one thing, try upload → ask → click a citation. ignore the rest until that loop feels good.

who I think it's for (not "everyone")

if you're a student/researcher/PM/legal-ish person with a pile of docs and you don't want to rebuild context every time you change models — you'll probably get it.

if you want a clean notes app, use Obsidian. if you want Google's polished doc chat, use NotebookLM. I'm not trying to win that fight on day one.

I'm trying to win "I trust this enough to act on it" + "I can reuse the same brain elsewhere later."

where we're at

early. like… there's no user count I can flex. Product Hunt soon-ish. I'm posting here because my last attempt read like an ad and got nuked (fair).

built with

Next.js · FastAPI · Postgres/pgvector · Redis/Celery · Stripe · Docker. AI through the usual suspects (OpenAI/Anthropic/DeepSeek/Google). happy to yap about RAG failures in comments.

actually want from you

do you still paste docs into chat or did you fix your life with something else?

are citations a must for you or do you not care?

if you bounced off a tool like this — what was the dealbreaker?

if you want the link I'll drop it in a comment — trying not to lead with URL and get flagged again.

thanks for reading. roast welcome. compliments suspicious.

── more in #ai-products 4 stories · sorted by recency
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main
Live at https://your-agent.zahid.host
Get free account → Pricing
from €0/mo · no card required
LIVE [news/months-of-self-testi…] indexed:0 read:4min 2026-05-26 ·