How I built a RAG-grounded Discord brain in 5 weeks (solo, ESL, no funding)

wpnews.pro

A user in our Discord asked the same question for the fourth time that week, in almost the same words. The first three answers were buried in a thread, a pinned message, and a Notion page nobody had bookmarked. A mod typed the answer out again. I watched it happen, opened Cursor, and started typing.

That's the moment Acortia became a product instead of a side note.

I'm Peng. Solo founder. Non-native English speaker. ESL teacher in Taipei by day, building backend software at night and on weekends. No funding. No team. No accelerator yet; my YC F26 application is in. Five weeks ago I committed to building Acortia: a Discord-native Company Brain that answers /ask <q> with a grounded, cited answer pulled from whatever the server has saved with /save. $99/month. Mid-June launch.

This is the build log. Real numbers, real bugs, real tradeoffs. No hype.

Discord communities accumulate institutional knowledge the way a cluttered desk accumulates receipts: faster than anyone can file it. Threads scroll past. Pinned messages cap at 50. Search is keyword-based and stops at the channel boundary. New members ask questions that were answered six months ago in a thread that's now archived.

The cost isn't dramatic — it's grinding. Mods burn out re-answering. Founders re-explain pricing. Engineers re-link the same architecture diagram. Knowledge exists; it just isn't retrievable.

I looked at the existing options. Notion + Discord bots: too much manual upkeep. Generic AI chatbots: hallucinate confidently with no source. Custom in-house RAG: out of reach for the average community. The gap was a thin, opinionated tool that lived where the conversation already happened.

Acortia is three slash commands and a cron job.

/save <url>: ingest a doc, a thread, a webpage, or a PDF. A worker chunks it, embeds it, and stores it.

/ask <q>: retrieve the top-k chunks by cosine similarity, ground a model response in them, and return the answer with citations.

/sources: list what the server has ingested. An audit trail.

Install: OAuth the bot, click through to api.acortia.com/install, and claim the workspace via a magic-link email. Thirty seconds end-to-end if the operator already has Discord admin rights.

That's the whole product surface. Everything else is plumbing.
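For anyone wiring up the same surface, here is a minimal sketch of those three commands as Discord application-command JSON. The option names (url, q) and the descriptions are my assumptions read off the usage above, not Acortia's actual registration payload; command type 1 (CHAT_INPUT) and option type 3 (STRING) come from Discord's API.

```typescript
// Sketch: the three global slash commands as Discord application-command JSON.
// Names and descriptions are illustrative, not Acortia's real payload.

interface CommandOption {
  type: number; // 3 = STRING
  name: string;
  description: string;
  required?: boolean;
}

interface SlashCommand {
  name: string;
  description: string;
  type: 1; // 1 = CHAT_INPUT, i.e. a slash command
  options?: CommandOption[];
}

const STRING_OPTION = 3;

const commands: SlashCommand[] = [
  {
    name: "save",
    description: "Ingest a doc, thread, webpage, or PDF",
    type: 1,
    options: [{ type: STRING_OPTION, name: "url", description: "What to ingest", required: true }],
  },
  {
    name: "ask",
    description: "Ask a question grounded in saved sources",
    type: 1,
    options: [{ type: STRING_OPTION, name: "q", description: "Your question", required: true }],
  },
  { name: "sources", description: "List what this server has ingested", type: 1 },
];

// Global registration is a bulk overwrite: PUT the array above to this URL.
function registrationUrl(applicationId: string): string {
  return `https://discord.com/api/v10/applications/${applicationId}/commands`;
}
```

Because the bulk PUT replaces the whole global command list, re-running it on every deploy is idempotent: a command dropped from the array is deleted.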

Discord is the surface. Three slash commands registered globally, one OAuth flow, webhook-style interaction endpoints handled by the Render web service.
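Discord requires an interactions endpoint to verify every request before acting on it: the X-Signature-Ed25519 header is an Ed25519 signature over the X-Signature-Timestamp header concatenated with the raw request body, checked against the application's public key. A minimal sketch using only node:crypto, assuming a Node version with JWK key import and base64url support; the function name is hypothetical.

```typescript
import { createPublicKey, verify } from "node:crypto";

// Returns true only if Discord signed timestamp + rawBody with the app's key.
function isValidDiscordRequest(
  rawBody: string, // the exact bytes received, before any JSON parsing
  signatureHex: string, // X-Signature-Ed25519 header
  timestamp: string, // X-Signature-Timestamp header
  publicKeyHex: string, // application public key from the Developer Portal
): boolean {
  try {
    // Import the raw 32-byte Ed25519 public key via its JWK form.
    const key = createPublicKey({
      key: {
        kty: "OKP",
        crv: "Ed25519",
        x: Buffer.from(publicKeyHex, "hex").toString("base64url"),
      },
      format: "jwk",
    });
    // For Ed25519, Node's crypto.verify takes null as the algorithm.
    return verify(null, Buffer.from(timestamp + rawBody), key, Buffer.from(signatureHex, "hex"));
  } catch {
    return false; // malformed key or signature
  }
}
```

When it returns false, answer 401; Discord deliberately sends badly signed requests to check that the endpoint rejects them. The PING Discord sends when you save the endpoint URL (interaction type 1) gets a {"type": 1} reply.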

Supabase is the brain. Seven tables. Postgres with the pgvector extension. Row Level Security keyed to workspace_id. A single SQL RPC, match_artifacts, does the vector search. RLS means a misrouted query cannot return another workspace's data, as long as it runs under a role that RLS applies to (Supabase's service-role key bypasses RLS). The database itself enforces tenancy.

Render is the muscle. A web service handles interactive Discord requests under a 3-second deadline. A worker process handles the slow path: fetch the URL, extract text (a PDF connector for application/pdf, a readability-style extractor for HTML), chunk, embed, write. A cron job on a */15 schedule (every 15 minutes) sweeps queued ingest jobs and re-runs anything that timed out.
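The chunk step decides a lot of retrieval quality. The post doesn't give Acortia's chunk size or overlap, so the 1,000-character windows and 200-character overlap below are assumptions; this is a minimal sketch of fixed-size chunking with overlap, one common way to fill that step.

```typescript
// Split extracted text into overlapping fixed-size character windows.
// Whitespace is collapsed first, a simplification that drops layout.
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const clean = text.replace(/\s+/g, " ").trim();
  const chunks: string[] = [];
  const step = chunkSize - overlap; // each window starts this far after the last
  for (let start = 0; start < clean.length; start += step) {
    chunks.push(clean.slice(start, start + chunkSize));
    if (start + chunkSize >= clean.length) break; // this window reached the end
  }
  return chunks;
}
```

Overlap keeps a sentence that straddles a boundary retrievable from at least one chunk; token-based or heading-aware splitting are common refinements.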

Stripe is the till. A Checkout session for the $99/mo plan, a webhook handler with idempotency (every event ID is recorded in stripe_events_seen before any side effect runs), and a portal link for self-serve management. Promo codes are managed in the Stripe dashboard.

Here's the SQL signature of the only RPC the app calls for retrieval. Stylized — the live function has more telemetry, but this is the shape:

-- match_artifacts: cosine similarity search scoped by workspace
create or replace function match_artifacts(
  query_embedding vector(1536),
  workspace_id_input uuid,
  match_count int default 5,
  min_similarity float default 0.15
)
returns table (
  artifact_id uuid,
  chunk_id uuid,
  content text,
  source_url text,
  similarity float
)
language sql stable
as $$
  select
    a.id as artifact_id,
    c.id as chunk_id,
    c.content,
    a.source_url,
    1 - (c.embedding <=> query_embedding) as similarity
  from chunks c
  join artifacts a on a.id = c.artifact_id
  where a.workspace_id = workspace_id_input
    and 1 - (c.embedding <=> query_embedding) >= min_similarity
  order by c.embedding <=> query_embedding
  limit match_count;
$$;

Two numbers in there are worth naming: match_count = 5 and min_similarity = 0.15. I tuned both empirically against my own corpus. A higher k bloats the context window without lifting answer quality, while a lower k makes confident answers brittle when the corpus is sparse. A lower threshold lets junk through, and the model hedges. These are the knobs you'll want to revisit per customer in v2.
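One way to revisit those knobs per customer is to replay the same rule offline. This sketch mirrors match_artifacts in memory: pgvector's <=> operator is cosine distance, so 1 minus the distance is the cosine similarity the RPC filters and sorts on. The Chunk and Match types are illustrative.

```typescript
interface Chunk {
  id: string;
  embedding: number[];
}

interface Match {
  id: string;
  similarity: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Same rule as the RPC: threshold filter, order by similarity, cap at k.
function topK(query: number[], chunks: Chunk[], matchCount = 5, minSimilarity = 0.15): Match[] {
  return chunks
    .map((c) => ({ id: c.id, similarity: cosineSimilarity(query, c.embedding) }))
    .filter((m) => m.similarity >= minSimilarity) // WHERE 1 - distance >= min_similarity
    .sort((x, y) => y.similarity - x.similarity) // ORDER BY distance (ascending)
    .slice(0, matchCount); // LIMIT match_count
}
```

Running a week of logged questions through topK at a few (k, threshold) pairs shows which answers would have changed before any customer sees the new setting.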

Here's /ask, sanitized and stylized. The real handler has more error wrapping and a deferred-response pattern for Discord's 3-second deadline, but the spine looks like this:

// apps/web/src/routes/interactions/ask.ts (illustrative)
import { embed } from "../../lib/embed";
import { supabase } from "../../lib/supabase";
import { groundAnswer } from "../../lib/llm";
// Elided here: the DiscordInteraction type and the resolveWorkspace, reply,
// logQuery, and formatWithCitations helpers.

export async function handleAsk(interaction: DiscordInteraction) {
  const question = interaction.data?.options?.[0]?.value as string | undefined;
  if (!question || !interaction.guild_id) {
    // Guard against a missing option or a DM (no guild_id to resolve).
    return reply(interaction, "Use /ask <question> inside a server.");
  }
  const workspaceId = await resolveWorkspace(interaction.guild_id);

  const queryEmbedding = await embed(question);

  const { data: matches, error } = await supabase.rpc("match_artifacts", {
    query_embedding: queryEmbedding,
    workspace_id_input: workspaceId,
    match_count: 5,
    min_similarity: 0.15,
  });

  if (error) throw error;
  if (!matches?.length) {
    return reply(interaction, "No grounded sources found. Try `/save` first.");
  }

  const answer = await groundAnswer(question, matches);
  await logQuery(workspaceId, question, matches, answer); // queries.metadata

  return reply(interaction, formatWithCitations(answer, matches));
}
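The spine above replies in one shot. The deferred pattern the real handler uses splits that in two: answer the HTTP request immediately with interaction response type 5 (DEFERRED_CHANNEL_MESSAGE_WITH_SOURCE), then edit the original response through Discord's webhook endpoint once the slow work finishes. A sketch, in which doWork stands in for embed, match_artifacts, and the LLM call, and the HTTP function is injected so the example needs no network (both are assumptions for illustration):

```typescript
// Minimal stand-in for fetch; global fetch satisfies this signature.
type HttpPatch = (
  url: string,
  init: { method: "PATCH"; headers: Record<string, string>; body: string },
) => Promise<{ status: number }>;

const DEFERRED_CHANNEL_MESSAGE_WITH_SOURCE = 5;

// Step 1: returned as the HTTP response, inside Discord's 3-second window.
function deferredAck(): { type: number } {
  return { type: DEFERRED_CHANNEL_MESSAGE_WITH_SOURCE };
}

function originalResponseUrl(applicationId: string, interactionToken: string): string {
  return `https://discord.com/api/v10/webhooks/${applicationId}/${interactionToken}/messages/@original`;
}

// Step 2: after the slow work, replace the "thinking..." placeholder.
async function finishDeferred(
  applicationId: string,
  interactionToken: string,
  doWork: () => Promise<string>,
  patch: HttpPatch,
): Promise<number> {
  let content: string;
  try {
    content = await doWork();
  } catch {
    content = "Sorry, something went wrong while answering. Try again in a moment.";
  }
  const res = await patch(originalResponseUrl(applicationId, interactionToken), {
    method: "PATCH",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ content }),
  });
  return res.status;
}
```

The interaction token in that URL stays valid for 15 minutes, which is plenty for embed, retrieve, and generate. The pattern does assume the process keeps running after the HTTP response is sent, which a long-lived web service like the Render one handles naturally.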

The logQuery call writes to queries.metadata, a JSON column that captures which artifacts were retrieved, their similarity scores, latency, and the model used. Telemetry isn't an afterthought: it's the only way to tell, six weeks in, whether a threshold of 0.15 is still right for a given customer.
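The post lists what queries.metadata holds but not its exact shape. One plausible layout, with hypothetical field names, built from the rows match_artifacts returns:

```typescript
// Fields from match_artifacts worth keeping (content is dropped to keep rows small).
interface MatchRow {
  artifact_id: string;
  chunk_id: string;
  similarity: number;
}

// Hypothetical shape for queries.metadata.
interface QueryMetadata {
  retrieved: MatchRow[];
  top_similarity: number | null; // null when nothing cleared the threshold
  min_similarity: number; // threshold in force for this query
  latency_ms: number;
  model: string;
}

function buildQueryMetadata(
  matches: MatchRow[],
  minSimilarity: number,
  startedAtMs: number,
  model: string,
  nowMs: number = Date.now(),
): QueryMetadata {
  return {
    // Copy only the listed fields, even if rows carry content or other columns.
    retrieved: matches.map(({ artifact_id, chunk_id, similarity }) => ({ artifact_id, chunk_id, similarity })),
    top_similarity: matches.length > 0 ? Math.max(...matches.map((m) => m.similarity)) : null,
    min_similarity: minSimilarity,
    latency_ms: nowMs - startedAtMs,
    model,
  };
}
```

Storing the threshold that was in force next to the scores is what keeps "is 0.15 still right?" answerable after the knob has been changed.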

Pinecone is excellent. It's also a second system to bill, monitor, and reconcile RLS against. Acortia's whole tenancy model is workspace_id on every table. If embeddings lived in a separate vector DB, I'd have to re-implement multi-tenant isolation there and trust two systems instead of one.

pgvector keeps embeddings inside the same Postgres that enforces RLS. The retrieval call is a single RPC. Cost at MVP scale: included in the Supabase free tier. The day I outgrow it, migrating to a dedicated vector DB is a few hours of work, not a rewrite.

Discord OAuth tells me who installed the bot. It does not tell me which email owns the workspace for billing. I needed a second factor: a magic link sent to the operator's email so the Stripe Checkout, the invoice, and the workspace ownership all land on the same identity.

The decision inside that decision was implicit flow vs PKCE for the magic-link callback. I went with implicit. PKCE is more secure on paper, but it requires client-side code-verifier storage, which is fragile inside Discord's embedded browser. Implicit flow plus short-lived (10-minute) one-time codes plus server-side verification gave me a flow that worked on the first try in iOS Discord, Android Discord, and desktop. The tradeoff: implicit is theoretically replayable within the 10-minute window. Mitigation: one-time use is enforced server-side, and codes are invalidated on first verification.
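The mitigation half of that tradeoff is small enough to sketch. Below, an in-memory Map stands in for whatever table holds pending claims (an assumption), and only a SHA-256 hash of each code is stored; issueCode and redeemCode are hypothetical names.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

const TTL_MS = 10 * 60 * 1000; // the 10-minute window

interface PendingCode {
  hash: Buffer;
  expiresAt: number;
  used: boolean;
}

// Stand-in for a database table, keyed by workspace id.
const pending = new Map<string, PendingCode>();

function sha256(value: string): Buffer {
  return createHash("sha256").update(value).digest();
}

// Creates a fresh code (replacing any earlier one) and returns it for the email link.
function issueCode(workspaceId: string, now: number = Date.now()): string {
  const code = randomBytes(32).toString("base64url");
  pending.set(workspaceId, { hash: sha256(code), expiresAt: now + TTL_MS, used: false });
  return code;
}

// Accepts a code once, within the window; every later attempt fails.
function redeemCode(workspaceId: string, code: string, now: number = Date.now()): boolean {
  const entry = pending.get(workspaceId);
  if (!entry || entry.used || now > entry.expiresAt) return false;
  // Both sides are 32-byte digests, so timingSafeEqual's equal-length rule holds.
  if (!timingSafeEqual(entry.hash, sha256(code))) return false;
  entry.used = true; // invalidated on first successful verification
  return true;
}
```

Against a real database, "check unused, then mark used" has to be one atomic statement (for example an UPDATE ... WHERE used = false that reports whether it matched a row); otherwise two simultaneous clicks reintroduce the same check-then-act race as the Day-20 bug later in this post.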

I'll revisit PKCE in v2 when I have time to test the embedded-browser edge cases properly.

Vercel is faster to ship for stateless routes. Acortia is not stateless. The ingest pipeline can run longer than a serverless function's hard timeout, PDFs in particular. I needed a long-running worker process and a cron. Render gives me both with one config file and one bill. Web + worker + cron on Render's hobby tier costs less than a sandwich per month at MVP scale.

The day I need autoscale across regions, I'll consider Fly. Not before.

Day 20. A test user installed Acortia in two Discord servers using the same email, within about ninety seconds of each other. Both installs triggered a workspace-claim flow. Both wrote to the workspaces table. The second write silently overwrote the first install's billing pointer. The user ended up with one Stripe customer and two Discord servers, but only one of the servers was correctly linked.

The bug had two causes braided together. The naive implementation was:

// Buggy original — two installs collide
const existing = await supabase
  .from("workspaces")
  .select("id")
  .eq("guild_id", guildId)
  .maybeSingle();

if (existing.data) {
  await supabase.from("workspaces").update({ ... }).eq("id", existing.data.id);
} else {
  await supabase.from("workspaces").insert({ ... });
}

Classic check-then-act. Two concurrent claims both saw existing.data === null, and both ran insert. The unique constraint rejected one, and the other won the race. The losing install thought it had succeeded because nothing checked the insert's error; the row it believed it owned was the winner's.
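The race is easier to see than to describe. In this sketch a Map stands in for the workspaces table (its key plays the unique constraint on guild_id) and each await stands in for a network round trip; none of it is Supabase code.

```typescript
type Row = { id: number; guildId: string; claimToken: string };

const table = new Map<string, Row>(); // key = guildId, acts as the unique constraint
let nextId = 1;
const roundTrip = () => new Promise<void>((resolve) => setTimeout(resolve, 0));

// Reports a duplicate as an error value instead of throwing,
// roughly how supabase-js returns { error } rather than rejecting.
function insertRow(guildId: string, claimToken: string): { error: string | null } {
  if (table.has(guildId)) return { error: "duplicate key (unique_violation)" };
  table.set(guildId, { id: nextId++, guildId, claimToken });
  return { error: null };
}

// Naive check-then-act: the read and the write are separate round trips.
async function naiveClaim(guildId: string, claimToken: string): Promise<"inserted" | "updated"> {
  await roundTrip();
  const existing = table.get(guildId);
  await roundTrip(); // a concurrent claim can run its own read in this gap
  if (existing) {
    existing.claimToken = claimToken;
    return "updated";
  }
  insertRow(guildId, claimToken); // error ignored, as in the buggy snippet
  return "inserted";
}

// Atomic upsert: check and write happen in one step the "database" owns.
async function upsertClaim(guildId: string, claimToken: string): Promise<Row> {
  await roundTrip();
  const row = table.get(guildId) ?? { id: nextId++, guildId, claimToken };
  row.claimToken = claimToken;
  table.set(guildId, row);
  return row;
}
```

Run the naive version twice concurrently and both calls report "inserted" while the table holds only the first claim's token; the error that says otherwise is dropped on the floor. In the upsert version both callers get back the one row that actually exists.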

The fix was an atomic upsert, plus moving email collection from install time to claim time:

// Day-20 fix: atomic, idempotent
const { data, error } = await supabase
  .from("workspaces")
  .upsert(
    {
      guild_id: guildId,
      // claim_email is deliberately left out: it's collected later via magic link,
      // and omitting it means a re-install never overwrites an existing email.
      claim_token: generateToken(), // helper not shown
      claim_expires_at: new Date(Date.now() + 10 * 60 * 1000).toISOString(),
    },
    { onConflict: "guild_id", ignoreDuplicates: false }
  )
  .select()
  .single();

if (error) throw error;

The atomic upsert means the database decides the winner. The deferred email means the second install doesn't even try to write the email column until the magic link is verified, and by then there's a unique session token to disambiguate. I also added a trigger that fails loudly if claim_email ever gets overwritten on a row that already has one: defense in depth.

Stripe webhooks got the same treatment because they always should:

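// Webhook idempotency: claim the event ID atomically before any side effect.
// Assumes a unique constraint on stripe_events_seen.event_id.
const { error: claimError } = await supabase
  .from("stripe_events_seen")
  .insert({ event_id: event.id });

if (claimError) {
  // 23505 is Postgres unique_violation: another delivery already claimed this event.
  if (claimError.code === "23505") return new Response("ok", { status: 200 });
  throw claimError; // anything else: fail the request so Stripe retries
}

// Runs at most once per event ID. If it throws, delete the claim row
// so Stripe's retry can run it again.
await handleStripeEvent(event);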

Idempotent webhooks are non-negotiable. Stripe will retry. You will get duplicates. Plan for it on Day 1, not Day 30.

Three things were on the board and got cut. Each cut was deliberate.

Slack adapter. I scaffolded a platform-adapter abstraction on Day 8. The idea was that /save and /ask would be platform-agnostic and Slack would be a second surface. The scaffolding is in the repo. I did not build the Slack OAuth flow, slash-command registration, or interaction handler. Reason: pre-launch Slack outreach got zero signal, while Discord operators were actively asking for the tool. Building Slack would have cost a week and shipped a feature for a customer I didn't have. Parked until live revenue justifies it.
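The shape of that seam, as a minimal sketch: the interface and type names are hypothetical, since the repo's actual scaffolding isn't shown. Core logic only ever sees the normalized command, so a Slack adapter would be one new implementation rather than a fork of /save and /ask.

```typescript
// Normalized command that core logic consumes, whatever the platform.
interface IncomingCommand {
  platform: "discord" | "slack";
  workspaceKey: string; // Discord guild_id or Slack team_id
  command: "save" | "ask" | "sources";
  argument: string;
}

// Each platform translates its own payloads in and replies out.
interface PlatformAdapter {
  parse(rawPayload: unknown): IncomingCommand;
  reply(command: IncomingCommand, text: string): Promise<void>;
}

// Core handlers never see a platform-specific payload.
async function dispatch(
  adapter: PlatformAdapter,
  rawPayload: unknown,
  core: (cmd: IncomingCommand) => Promise<string>,
): Promise<void> {
  const cmd = adapter.parse(rawPayload);
  const text = await core(cmd);
  await adapter.reply(cmd, text);
}
```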

Notion connector. Considered. Killed. The use case I imagined, pulling Notion pages in as artifacts, is well served by users copy-pasting URLs into /save. The MCP route through Claude Desktop is enough for the operator's personal workflow. A first-party Notion connector adds OAuth, page-permission edge cases, and a separate sync cron. Not worth the complexity at MVP.

Pipedream MCP custom server. I spent a few hours wiring Pipedream as a generic connector tier. Backend was healthy, auth worked, but the abstraction was leaking into the slash-command UX. I cut it and routed power-user workflows through Claude Desktop's MCP instead. Acortia stays focused. Operators who want orchestration use Claude Desktop and call Acortia as a tool.

Telemetry first. I added queries.metadata on Day 6, which was correct, but I didn't build a dashboard around it until Week 4. For the first three weeks I was debugging retrieval quality by reading raw Postgres rows. A 30-minute Metabase dashboard would have saved hours of squinting. If you're building RAG: instrument retrieval before you instrument anything else. You can't tune what you can't see.
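A retrieval dashboard doesn't need much to start. This sketch rolls rows of queries.metadata up into two numbers that track threshold health: how often retrieval comes back empty, and the median best similarity when it doesn't. It assumes each row carries a retrieved array with a similarity per chunk (a hypothetical shape).

```typescript
interface LoggedQuery {
  retrieved: { similarity: number }[];
}

interface RetrievalSummary {
  queries: number;
  zeroHitRate: number; // share of queries where nothing cleared the threshold
  medianTopSimilarity: number | null; // over queries with at least one hit
}

function summarizeRetrieval(rows: LoggedQuery[]): RetrievalSummary {
  // Best similarity per query that returned anything, sorted ascending.
  const tops = rows
    .filter((r) => r.retrieved.length > 0)
    .map((r) => Math.max(...r.retrieved.map((m) => m.similarity)))
    .sort((a, b) => a - b);

  let median: number | null = null;
  if (tops.length > 0) {
    const mid = Math.floor(tops.length / 2);
    median = tops.length % 2 === 1 ? tops[mid] : (tops[mid - 1] + tops[mid]) / 2;
  }

  return {
    queries: rows.length,
    zeroHitRate: rows.length > 0 ? (rows.length - tops.length) / rows.length : 0,
    medianTopSimilarity: median,
  };
}
```

A rising zero-hit rate suggests the threshold is too strict (or the corpus too thin) for that server; a median top similarity sitting just above 0.15 suggests answers are being grounded in weak matches.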

Mid-June 2026 launch. Soft-live now for beta operators.

Install: api.acortia.com/install

Domain: acortia.com

Promo for readers of this post: BETA-FREE-30D. It takes 100% off the first month, is limited to 10 redemptions, and expires 2026-06-30 23:59 UTC. After that the price is $99/month flat. No per-seat pricing. No usage tiers. One Discord server, one bill.

If you operate a Discord community, run a developer relations team, or moderate a paid creator server: this was built for you. If you don't, the architecture above is open notes — steal whatever's useful.

I'm in Taipei. I teach English to fund this build. I am not a native English speaker and I rewrite half of what I publish three times before it reads cleanly. Every line of Acortia was written between lesson plans and weekend mornings. No team. No accelerator yet. No outside capital.

What I'm proving with this build: a solo non-US founder can ship a credible B2B SaaS product end-to-end — auth, billing, RAG, multi-tenant data isolation, idempotent webhooks, a real cron pipeline — in five weeks of nights-and-weekends time, on a stack that costs less than a streaming subscription to run.

If that's interesting to you, the install link is above. If you want to talk shop, I'm on Discord and X under the same handle.

Brief. Concept. Preview. Ship.

Source: dev.to (original article).