{"slug": "how-i-built-reposense-a-github-intelligence-cli-with-coral-sql", "title": "How I Built RepoSense: A GitHub Intelligence CLI With Coral SQL", "summary": "A developer built RepoSense, a terminal-based GitHub intelligence tool that answers questions about any public repository in under 10 seconds using a single command. The tool leverages Coral SQL to query live data from GitHub, Hacker News, and OSV APIs as SQL tables, eliminating the need for data warehouses or ETL pipelines. The developer discovered that using GitHub's search API functions instead of REST-paginated tables reduced query times from over 30 seconds to under 2 seconds, even for repositories with 14,000+ open issues.", "body_md": "Every developer I know has the same problem: 300 open issues, 40 stale PRs, a security label buried somewhere in the noise, and no fast answer to *what actually needs attention right now?*\n\nI built RepoSense to answer that question in under 10 seconds, for any public GitHub repo, with one terminal command and no dashboard.\n\nRepoSense is a terminal intelligence layer for GitHub repos. You point it at any repo and get:\n\nAnd if none of those match what you need, you can just type a question in plain English and the built-in AI agent (Claude, Groq, or GPT-4o) writes the SQL and runs it for you.\n\nThe whole thing runs in your terminal. No browser, no dashboard, no SaaS login.\n\nThe data I needed (GitHub issues, PRs, Hacker News posts, OSV vulnerability records) lives in completely different APIs with completely different schemas and auth systems. The traditional approach is to write a separate HTTP client for each, normalise the responses into Python dicts, and glue them together in application code.\n\nCoral makes all of that disappear. It exposes live APIs as SQL tables. GitHub becomes `github.search_issues()`. Hacker News becomes `hn.search`. OSV becomes `osv.query_by_version`. 
I write one SQL query and Coral handles auth, pagination, and response normalisation for all three.\n\nThe architecture looks like this:\n\n```\nreposense.py  →  coral sql -- \"<SQL>\"  →  GitHub API\n                                       →  Hacker News API\n                                       →  OSV API\n```\n\nThere is no data warehouse. There is no ETL pipeline. Every query hits live data.\n\nHere is the triage query:\n\n```sql\nSELECT number, title, user_login AS author\nFROM github.search_issues(\n  q => 'repo:django/django is:open sort:created-asc'\n)\nLIMIT 15\n```\n\nThat runs against `django/django`, a repo with 14,000+ open issues, and returns in 1.2 seconds. That is not because anything was cached on the first run. It is because `search_issues()` is a Coral table function that pushes the filter to the GitHub Search API server-side. GitHub evaluates the query and returns only the matching items.\n\nRun it a second time within 5 minutes and it returns in 0.0 seconds, served from RepoSense's disk cache, which matches Coral's own 5-minute HTTP cache window. The footer shows `⚡ cached` so you always know.\n\nI learned this the hard way.\n\nMy first version used `github.issues`, the REST-paginated table:\n\n```sql\nSELECT number, title FROM github.issues\nWHERE owner = 'django' AND repo = 'django'\n  AND state = 'open'\nORDER BY created_at ASC\nLIMIT 15\n```\n\nOn `withcoral/coral` (650 issues), this returned in 4 seconds. Fine. I deployed it and tested on `django/django`. It never came back. Coral applies `LIMIT` after fetching all pages, so `LIMIT 15` on 14,000 issues means paging through all 14,000 issues first, then cutting to 15. The query hit Coral's 30-second timeout before returning a single row.\n\nThe fix: `github.search_issues()`. The GitHub Search API evaluates the filter server-side, so `LIMIT 15` on a repo of any size returns 15 items in under 2 seconds.\n\nThis became the foundational pattern for every query in RepoSense. 
**If you can express the filter as a GitHub Search query, always use search_issues().** I documented this as\n\nThe repo health score runs four concurrent queries, each counting items in a category (stale issues, stale PRs, merged PRs, closed issues). My original attempt:\n\n```sql\nSELECT COUNT(*) as count FROM (\n  SELECT 1 FROM github.search_issues(\n    q => 'repo:django/django is:issue is:open created:<2026-05-10 comments:<2'\n  )\n) sub\n```\n\nEven with `search_issues()`, this paginates all matching results before `COUNT(*)` runs. On `django/django`, \"stale issues older than 14 days with fewer than 2 comments\" returns hundreds of items, and each page is another API call.\n\nThe fix: add `LIMIT 50` inside the subquery. Coral stops fetching after 50 results. The health score formula saturates well below 50 anyway (the penalty for 50 stale issues is the same as for 100), so the score is unaffected. Query time: 2.2 seconds on any repo.\n\n```sql\nSELECT COUNT(*) as count FROM (\n  SELECT 1 FROM github.search_issues(\n    q => 'repo:django/django is:issue is:open created:<2026-05-10 comments:<2'\n  ) LIMIT 50\n) sub\n```\n\nFour of these run in parallel via `ThreadPoolExecutor`. Total health score time: ~2.5 seconds.\n\nDuplicate detection requires comparing every issue against every other issue, which is a `CROSS JOIN`. On 14,000 open issues that is 196 million pairs. It will never complete.\n\nMy solution: bound both sides to 50 items with subqueries ordered by `created_at DESC`.\n\n```sql\nSELECT a.number, a.title, b.number, b.title\nFROM (\n  SELECT number, title FROM github.issues\n  WHERE owner = 'django' AND repo = 'django' AND state = 'open'\n  ORDER BY created_at DESC LIMIT 50\n) a\nCROSS JOIN (\n  SELECT number, title FROM github.issues\n  WHERE owner = 'django' AND repo = 'django' AND state = 'open'\n  ORDER BY created_at DESC LIMIT 50\n) b\nWHERE a.number < b.number\nLIMIT 2000\n```\n\nThe `a.number < b.number` filter keeps each unordered pair once, so 50 issues yield at most 1,225 pairs. 
A Python post-processing step then filters for pairs with 2+ shared significant keywords (after stripping stopwords and common commit-prefix verbs like `fixed`, `added`, `updated`). The table shows the matched keywords alongside each pair and color-codes them: bold for 4+ shared words, yellow for 3, dim for 2.\n\nThe SQL CROSS JOIN is also used for the `pulse` command in an entirely different way: joining HN search results against open GitHub issues to show what the tech community is discussing alongside what the project still has open. This is where it gets interesting: `pulse` is a genuine cross-source SQL JOIN across two live external APIs (`github` + `hn`) in a single SQL statement.\n\nMy original hn-buzz design joined Hacker News search results against open issue titles:\n\n```sql\nSELECT h.title, h.points\nFROM hn.search h\nJOIN github.search_issues(...) i ON h.query = i.title\n```\n\nIt returned zero rows. GitHub issue titles like `\"feat(ui): port custom icons from monorepo\"` are too specific for HN full-text search. HN posts discuss technologies and concepts, not individual issue names.\n\nThe design that works: two parallel queries. Query A searches HN for the project name. Query B fetches open GitHub issues. Claude reads both and surfaces *thematic* connections, such as \"HN is discussing SQL-over-APIs performance, which relates to your open issues about query timeouts.\"\n\nThis matches how HN actually works. 
It's documented as [ED-005](https://github.com/athul-2003/reposense/blob/main/docs/engineering-decisions.md).\n\n```\nreposense/\n├── reposense.py          # CLI entrypoint, command routing, interactive loop\n├── agent/\n│   ├── claude_agent.py   # Agentic loop: Claude, Groq, or GPT-4o, with coral_query tool\n│   ├── coral_runner.py   # run_query(), substitute_tokens(), disk cache\n│   ├── mcp_server.py     # MCP stdio server: run_command, coral_sql, list_sources\n│   └── prompts.py        # System prompt + grounding rules (no hallucination)\n├── queries/              # SQL files, one per feature\n│   ├── triage.sql\n│   ├── stale_prs.sql\n│   ├── release_notes.sql\n│   ├── hn_buzz.sql\n│   ├── cve_scan.sql      # Three queries: Dependabot + keyword + OSV\n│   ├── contributors.sql\n│   ├── duplicates.sql\n│   ├── pulse.sql         # CROSS JOIN: github.search_issues × hn.search\n│   ├── so_buzz.sql       # Stack Overflow top questions by vote score\n│   ├── dev_buzz.sql      # Dev.to trending articles by tag\n│   └── scorecard.sql     # OpenSSF Scorecard checks, sorted by score ASC\n├── sources/\n│   ├── stackoverflow/\n│   │   ├── manifest.yaml # Custom Coral DSL v3 source: Stack Exchange API\n│   │   └── README.md\n│   ├── devto/\n│   │   ├── manifest.yaml # Custom Coral DSL v3 source: Dev.to API\n│   │   └── README.md\n│   └── scorecard/\n│       ├── manifest.yaml # Custom Coral DSL v3 source: OpenSSF Scorecard API\n│       └── README.md\n└── ui/\n    ├── splash.py         # Logo, health score bar, 4 concurrent signals\n    ├── chat.py           # SQL panel, spinner, error panel\n    └── tables.py         # Rich table renderers, one per command\n```\n\nEvery command is a `.sql` file. Adding a new feature means writing the SQL and adding a command mapping. No new Python code required.\n\nAdding a new *source* means writing a Coral DSL v3 YAML manifest and running `coral source add`. 
RepoSense ships a Stack Overflow source spec as an example.\n\nThe SQL files use runtime tokens that get substituted before execution:\n\n| Token | Resolves to |\n|---|---|\n| `{owner}` | GitHub org or username |\n| `{repo}` | Repository name |\n| `{30_days_ago}` | ISO date 30 days back |\n| `{14_days_ago}` | ISO date 14 days back |\n| `{7_days_ago}` | ISO date 7 days back |\n| `{hn_query}` | HN search term (from `HN_QUERY` env var, default: repo name) |\n| `{package_name}` | Dependency to scan (from `PACKAGE_NAME` env var) |\n| `{package_ecosystem}` | e.g. PyPI, npm, Go (from `PACKAGE_ECOSYSTEM` env var) |\n| `{package_version}` | Version to check CVEs for (from `PACKAGE_VERSION` env var) |\n\nRunning `reposense --repo owner/repo` without a command drops into interactive mode. You can type commands (`triage`, `cve-scan`) or ask questions in plain English:\n\n```\n> which contributor should I thank this week?\n> are there any security issues related to authentication?\n> what did we ship in the last two weeks?\n```\n\nPlain English questions go to an AI agent (Claude, Groq, or GPT-4o, whichever API key you have) that has one tool: `coral_query(sql)`. The agent writes the SQL, runs it via Coral, reads the result, and answers in natural language. 
The SQL it generates is displayed in the terminal so you can see exactly what it ran.\n\nThe agent auto-detects which provider to use:\n\n```python\nimport os\n\n# Priority: Claude → Groq (free) → GPT-4o\ndef _get_backend():\n    if key := os.getenv(\"ANTHROPIC_API_KEY\"):\n        return _ClaudeBackend(key)   # uses claude-sonnet-4-6\n    if key := os.getenv(\"GROQ_API_KEY\"):\n        return _GroqBackend(key)     # uses llama-3.3-70b-versatile (free)\n    if key := os.getenv(\"OPENAI_API_KEY\"):\n        return _OpenAIBackend(key)   # uses gpt-4o-mini\n    return None                      # shows setup instructions\n```\n\nThe main loop is provider-agnostic: every backend implements the same interface (`call`, `is_final`, `get_text`, `get_tool_calls`, `append_tool_results`).\n\nThe agent can run multiple queries if needed. For example, \"show me stale PRs and who opened them\" runs two separate Coral queries before composing a final answer.\n\n`so-buzz` isn't backed by a built-in Coral source. It uses a community source manifest I wrote for Stack Overflow. The file is `sources/stackoverflow/manifest.yaml` and follows the Coral DSL v3 format.\n\nThe Stack Exchange API is zero-auth for public data: 300 requests/day with no key, 10,000/day with a free `STACK_EXCHANGE_KEY`. The manifest handles REST pagination, Unix timestamp → UTC conversion, and nested field access (`owner.display_name` for the asker's display name). Once installed, this is valid SQL:\n\n```sql\nSELECT title, score, answer_count, is_answered\nFROM stackoverflow.questions\nWHERE tagged = 'django' AND site = 'stackoverflow'\nORDER BY score DESC\nLIMIT 10\n```\n\nThis is Coral's extensibility model in practice: any JSON HTTP API can become a queryable SQL table in ~80 lines of YAML. No SDK, no custom connector, no deployment. Write the manifest, run `coral source add`, and the table is live.\n\nI submitted the OpenSSF Scorecard manifests as a community contribution to the Coral repo. 
The Scorecard source uses an unusual DSL pattern: `{{filter.owner}}` and `{{filter.repo}}` are injected directly into the URL path, not passed as query parameters, which is rare across Coral's 90+ community sources.\n\nI tested all 12 commands across 4 repos of very different sizes and characteristics: `withcoral/coral` (~650 issues), `django/django` (14,000+ issues), `expressjs/express`, and `facebook/react`.\n\n| Command | withcoral/coral | django/django |\n|---|---|---|\n| triage | 1.4s ✅ | 1.2s ✅ |\n| stale-prs | 1.8s ✅ | 1.4s ✅ |\n| contributors | 3.0s ✅ | 2.7s ✅ |\n| hn-buzz | 1.6s ✅ | 1.5s ✅ |\n| cve-scan | 2.3s ✅ | 2.2s ✅ |\n| release-notes | 2.4s ✅ | 1.7s ✅ |\n| duplicates | 15–20s ✅ | 25–35s ✅ |\n| health | 2.5s ✅ | 2.2s ✅ |\n| pulse | 2.4s ✅ | 2.4s ✅ |\n| so-buzz | 1.1s ✅ | 0.9s ✅ |\n| dev-buzz | 1.7s ✅ | 1.7s ✅ |\n| scorecard | 1.0s ✅ | 0.9s ✅ |\n\nAll 48 runs (12 commands × 4 repos) pass. 11 of 12 commands complete in under 5 seconds. `duplicates` is slower by design (CROSS JOIN + similarity scan) but completes in under 90 seconds on any public repo.\n\n```\ngit clone https://github.com/athul-2003/reposense\ncd reposense\nbash setup.sh\n```\n\n`setup.sh` installs Coral, connects all 6 sources (GitHub, HN, OSV, Stack Overflow, Dev.to, OpenSSF Scorecard), and installs the `reposense` command globally. The only prompt is for your GitHub Personal Access Token, which is needed once.\n\n```\nreposense --repo withcoral/coral triage\nreposense --repo django/django health\nreposense --repo expressjs/express cve-scan\nreposense --repo withcoral/coral       # interactive agent mode\n```\n\nAll 12 commands work without any LLM key. 
Agent mode (plain English questions) uses whichever key you have: Claude, Groq (free), or GPT-4o, in that priority order.\n\nRepoSense also runs as an MCP server for Claude Desktop and other MCP clients:\n\n```\n# Add to Claude Desktop in one command\nclaude mcp add reposense -- reposense --mcp\n```\n\nOr add manually to your Claude config:\n\n```\n{\n  \"mcpServers\": {\n    \"reposense\": {\n      \"command\": \"reposense\",\n      \"args\": [\"--mcp\"]\n    }\n  }\n}\n```\n\nMCP tools exposed: `run_command` (all 12 commands), `coral_sql` (arbitrary SQL), `list_sources` (schema discovery).\n\nCoral is not a database. It's a SQL runtime that turns live APIs into queryable tables. The mental shift is: **you're not querying stored data, you're designing API call patterns**.\n\nEvery `FROM` clause is an API request. Every filter you express in the `q` argument of a `search_issues()` call is a filter you're pushing to GitHub's servers. Every `JOIN` across two tables is two sets of API calls whose results get joined in memory.\n\nOnce that clicks, the optimisation instincts are the same as database query optimisation, but the bottleneck is API call count, not disk I/O. The engineering decisions in this project are all expressions of that mental model.\n\nThe `search_issues()` table function is particularly powerful. It lets you express complex filters in GitHub Search Query Language and get server-side evaluation, something the REST-paginated `github.issues` table doesn't give you. 
Coral wraps it cleanly as a SQL table function.\n\nA few things I'd add with more time:\n\n- `coral source add --file slack/manifest.yaml` and RepoSense could show which GitHub issues are being discussed in your team's Slack channels.\n- `reposense digest --repo org/repo --email me@example.com` as a scheduled cron using the same SQL queries.\n- `reposense <TAB>` to see available commands.\n- `reposense --repos org/repo1,org/repo2 triage`.\n\nAll of these are purely additive: new `.sql` files, new Coral source additions, or new CLI flags. The architecture supports them without changes to the core.\n\nRepoSense already ships a custom Stack Overflow source spec (`sources/stackoverflow/manifest.yaml`) as a worked example of writing a Coral DSL v3 manifest for a new API. Anyone can follow the same pattern to add any HTTP API as a queryable SQL table.\n\n*RepoSense is open source under the MIT license.*\n\n*Built with Coral, Claude, rich, click, and uv.*\n\n[github.com/athul-2003/reposense](https://github.com/athul-2003/reposense)\n\n*Demo video: youtu.be/7hxAJ9SiKqU*", "url": "https://wpnews.pro/news/how-i-built-reposense-a-github-intelligence-cli-with-coral-sql", "canonical_source": "https://dev.to/athulkrishnan_h_210b5ea44/how-i-built-reposense-a-github-intelligence-cli-with-coral-sql-4jgm", "published_at": "2026-05-31 08:17:38+00:00", "updated_at": "2026-05-31 08:41:05.085028+00:00", "lang": "en", "topics": ["ai-tools", "ai-products", "ai-infrastructure"], "entities": ["RepoSense", "Coral SQL", "GitHub", "Claude", "Groq", "GPT-4o", "Hacker News", "OSV"], "alternates": {"html": "https://wpnews.pro/news/how-i-built-reposense-a-github-intelligence-cli-with-coral-sql", "markdown": "https://wpnews.pro/news/how-i-built-reposense-a-github-intelligence-cli-with-coral-sql.md", "text": "https://wpnews.pro/news/how-i-built-reposense-a-github-intelligence-cli-with-coral-sql.txt", "jsonld": 
"https://wpnews.pro/news/how-i-built-reposense-a-github-intelligence-cli-with-coral-sql.jsonld"}}