[ARTICLE · art-19042] src=dev.to pub= topic=ai-tools verified=true sentiment=↑ positive

How I Built RepoSense: A GitHub Intelligence CLI With Coral SQL

A developer built RepoSense, a terminal-based GitHub intelligence tool that answers questions about any public repository in under 10 seconds using a single command. The tool leverages Coral SQL to query live data from GitHub, Hacker News, and OSV APIs as SQL tables, eliminating the need for data warehouses or ETL pipelines. The developer discovered that using GitHub's search API functions instead of REST-paginated tables reduced query times from over 30 seconds to under 2 seconds, even for repositories with 14,000+ open issues.

12 min read. Published May 31, 2026

Every developer I know has the same problem: 300 open issues, 40 stale PRs, a security label buried somewhere in the noise, and no fast answer to the question of what actually needs attention right now.

I built RepoSense to answer that question in under 10 seconds, for any public GitHub repo, with one terminal command and no dashboard.

RepoSense is a terminal intelligence layer for GitHub repos. You point it at any repo and get:

And if none of those match what you need, you can just type a question in plain English and the built-in AI agent (Claude, Groq, or GPT-4o) writes the SQL and runs it for you.

The whole thing runs in your terminal. No browser, no dashboard, no SaaS login.

The data I needed (GitHub issues, PRs, Hacker News posts, OSV vulnerability records) lives in completely different APIs with completely different schemas and auth systems. The traditional approach is to write a separate HTTP client for each, normalise the responses into Python dicts, and glue them together in application code.

Coral makes all of that disappear. It exposes live APIs as SQL tables. GitHub becomes github.search_issues(). Hacker News becomes hn.search. OSV becomes osv.query_by_version. I write one SQL query and Coral handles auth, pagination, and response normalisation for all three.

The architecture looks like this:

reposense.py  →  coral sql -- "<SQL>"  →  GitHub API
                                       →  Hacker News API
                                       →  OSV API

There is no data warehouse. There is no ETL pipeline. Every query hits live data.
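In code, the pipeline above amounts to shelling out to the Coral CLI. Here is a minimal sketch of that call. It assumes only the `coral sql -- "<SQL>"` invocation shown in the diagram; the function names, the 30-second default timeout, and the injectable `runner` parameter are illustrative and not taken from RepoSense's source.

```python
import subprocess
from typing import Callable, List


def build_coral_argv(sql: str) -> List[str]:
    """Build the argv for `coral sql -- "<SQL>"` (the form shown in the diagram)."""
    # Passing a list (no shell=True) keeps quotes inside the SQL from being
    # reinterpreted by a shell.
    return ["coral", "sql", "--", sql]


def run_query(
    sql: str,
    timeout_s: float = 30.0,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> str:
    """Run one SQL statement through the Coral CLI and return its raw stdout.

    `runner` is injectable (an assumption of this sketch) so the function can
    be exercised without Coral installed.
    """
    result = runner(
        build_coral_argv(sql),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

How the output is parsed into rows depends on Coral's output format, which this sketch deliberately leaves alone.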

Here is the triage query:

SELECT number, title, user_login AS author
FROM github.search_issues(
  q => 'repo:django/django is:open sort:created-asc'
)
LIMIT 15

That runs against django/django, a repo with 14,000+ open issues, and returns in 1.2 seconds. Not because I cached anything the first time. Because search_issues() is a Coral table function that pushes the filter to the GitHub Search API server-side. GitHub evaluates the query and returns only the matching items.

Run it a second time within 5 minutes and it returns in 0.0 seconds, served from RepoSense's disk cache, which matches Coral's own 5-minute HTTP cache window. The footer shows ⚑ cached so you always know.
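A disk cache with a 5-minute window can be sketched in a few lines. This is not RepoSense's actual cache: the file layout, the SHA-256 key, and the `cached_query` name are assumptions made for illustration. Only the 5-minute TTL and the "cached" flag come from the description above.

```python
import hashlib
import json
import time
from pathlib import Path
from typing import Any, Callable, Tuple

CACHE_TTL_SECONDS = 300  # 5 minutes, the window described above


def cached_query(
    sql: str,
    fetch: Callable[[str], Any],
    cache_dir: Path,
    now: Callable[[], float] = time.time,
) -> Tuple[Any, bool]:
    """Return (rows, was_cached). `fetch` is whatever actually runs the SQL."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Key the entry on the exact SQL text, so a different repo or date is a miss.
    key = hashlib.sha256(sql.encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.json"

    if path.exists():
        entry = json.loads(path.read_text(encoding="utf-8"))
        if now() - entry["stored_at"] < CACHE_TTL_SECONDS:
            return entry["rows"], True  # fresh hit: no API call made

    rows = fetch(sql)  # miss or expired entry: go to the live API
    path.write_text(
        json.dumps({"stored_at": now(), "rows": rows}), encoding="utf-8"
    )
    return rows, False
```

The boolean in the return value is what a footer such as "cached" would be driven by.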

I learned this the hard way.

My first version used github.issues, the REST-paginated table:

SELECT number, title FROM github.issues
WHERE owner = 'django' AND repo = 'django'
  AND state = 'open'
ORDER BY created_at ASC
LIMIT 15

On withcoral/coral (650 issues), this returned in 4 seconds. Fine. I deployed it and tested on django/django. It never came back. Coral applies LIMIT after fetching all pages: LIMIT 15 on 14,000 issues means paging through all 14,000 issues first, then cutting to 15. The query hit Coral's 30-second timeout before returning a single row.

The fix: github.search_issues(). The GitHub Search API evaluates the filter server-side. LIMIT 15 on any size repo returns 15 items in under 2 seconds.
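The gap between the two approaches is easiest to see as a request count. The arithmetic below assumes a page size of 100 items (GitHub's documented maximum for `per_page`); the page size Coral actually uses is not stated in this article, so treat the numbers as an order-of-magnitude illustration.

```python
import math


def requests_when_limit_applied_late(total_rows: int, page_size: int) -> int:
    """Every page is fetched before LIMIT trims the result."""
    return math.ceil(total_rows / page_size)


def requests_when_limit_stops_paging(limit: int, page_size: int) -> int:
    """Only enough pages to satisfy LIMIT are fetched."""
    return math.ceil(limit / page_size)


late = requests_when_limit_applied_late(14_000, page_size=100)
early = requests_when_limit_stops_paging(15, page_size=100)
print(late, early)  # 140 1
```

Sequential round trips on that scale are enough to run into a 30-second timeout, while a single request is not.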

This became the foundational pattern for every query in RepoSense. If you can express the filter as a GitHub Search query, always use search_issues(). I documented this as

The repo health score runs four concurrent queries, each counting items in a category (stale issues, stale PRs, merged PRs, closed issues). My original attempt:

SELECT COUNT(*) as count FROM (
  SELECT 1 FROM github.search_issues(
    q => 'repo:django/django is:issue is:open created:<2026-05-10 comments:<2'
  )
) sub

Even with search_issues(), this paginates all matching results before COUNT(*) runs. On django/django, "stale issues older than 14 days with fewer than 2 comments" returns hundreds of items, and each page is another API call.

The fix: add LIMIT 50 inside the subquery. Coral stops fetching after 50 results. The health score formula saturates well below 50 anyway (the penalty for 50 stale issues is the same as for 100), so accuracy is unchanged. Query time: 2.2 seconds on any repo.

SELECT COUNT(*) as count FROM (
  SELECT 1 FROM github.search_issues(
    q => 'repo:django/django is:issue is:open created:<2026-05-10 comments:<2'
  ) LIMIT 50
) sub

Four of these run in parallel via ThreadPoolExecutor. Total health score time: ~2.5 seconds.
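Here is a rough sketch of that fan-out. The four signal names follow the categories listed earlier, but the search qualifiers are simplified placeholders (the real queries also carry date filters), and `run_count` stands in for whatever function executes a query and returns its count.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

COUNT_CAP = 50  # matches the LIMIT 50 inside each subquery

# Hypothetical, simplified search qualifiers, for illustration only.
SIGNALS: Dict[str, str] = {
    "stale_issues": "is:issue is:open comments:<2",
    "stale_prs": "is:pr is:open",
    "merged_prs": "is:pr is:merged",
    "closed_issues": "is:issue is:closed",
}


def capped_count_sql(repo: str, qualifiers: str, cap: int = COUNT_CAP) -> str:
    """Build a capped COUNT(*) query in the shape shown above."""
    return (
        "SELECT COUNT(*) as count FROM (\n"
        "  SELECT 1 FROM github.search_issues(\n"
        f"    q => 'repo:{repo} {qualifiers}'\n"
        f"  ) LIMIT {cap}\n"
        ") sub"
    )


def collect_signals(repo: str, run_count: Callable[[str], int]) -> Dict[str, int]:
    """Run all four count queries concurrently and return {signal: count}."""
    with ThreadPoolExecutor(max_workers=len(SIGNALS)) as pool:
        futures = {
            name: pool.submit(run_count, capped_count_sql(repo, qualifiers))
            for name, qualifiers in SIGNALS.items()
        }
        # .result() blocks until that query finishes and re-raises its errors.
        return {name: future.result() for name, future in futures.items()}
```

Because the queries are I/O-bound, threads are enough: total wall time is roughly that of the slowest query rather than the sum of all four.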

Duplicate detection requires comparing every issue against every other issue: a CROSS JOIN. On 14,000 open issues that is 196 million pairs. It will never complete.

My solution: bound both sides to 50 items with subqueries ordered by created_at DESC.

SELECT a.number, a.title, b.number, b.title
FROM (
  SELECT number, title FROM github.issues
  WHERE owner = 'django' AND repo = 'django' AND state = 'open'
  ORDER BY created_at DESC LIMIT 50
) a
CROSS JOIN (
  SELECT number, title FROM github.issues
  WHERE owner = 'django' AND repo = 'django' AND state = 'open'
  ORDER BY created_at DESC LIMIT 50
) b
WHERE a.number < b.number
LIMIT 2000

Maximum 1,225 pairs from 50 issues. A Python post-processing step then filters for pairs with 2+ shared significant keywords (after stripping stopwords and common commit-prefix verbs like fixed, added, updated). The table shows the matched keywords alongside each pair and color-codes them: bold for 4+ shared words, yellow for 3, dim for 2.
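The post-processing step might look something like the sketch below. The stopword list is a tiny illustrative subset and the function names are hypothetical; only the three prefix verbs, the 2-keyword threshold, and the bold/yellow/dim tiers come from the description above.

```python
import re
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Illustrative word lists; the real lists are not shown in this article.
STOPWORDS = {"a", "an", "the", "in", "on", "of", "for", "to", "and", "is", "with", "when"}
PREFIX_VERBS = {"fixed", "added", "updated"}


def significant_words(title: str) -> Set[str]:
    """Lowercase the title, split on non-alphanumerics, drop noise words."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return {w for w in words if w not in STOPWORDS and w not in PREFIX_VERBS}


def style_for(shared_count: int) -> str:
    """Map the number of shared keywords to the display tier."""
    if shared_count >= 4:
        return "bold"
    if shared_count == 3:
        return "yellow"
    return "dim"


def likely_duplicates(
    issues: Dict[int, str], min_shared: int = 2
) -> List[Tuple[int, int, List[str], str]]:
    """Return (a, b, shared_keywords, style) for each pair above the threshold."""
    keywords = {number: significant_words(title) for number, title in issues.items()}
    results = []
    # combinations() yields each unordered pair once, like a.number < b.number.
    for a, b in combinations(sorted(issues), 2):
        shared = keywords[a] & keywords[b]
        if len(shared) >= min_shared:
            results.append((a, b, sorted(shared), style_for(len(shared))))
    return results
```

With 50 issues, `combinations` produces 50 × 49 / 2 = 1,225 pairs, which is where the upper bound comes from.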

The SQL CROSS JOIN is also used for the pulse command in an entirely different way: joining HN search results against open GitHub issues to show what the tech community is discussing alongside what the project still has open. This is where it gets interesting: pulse is a genuine cross-source SQL JOIN across two live external APIs (github and hn) in a single SQL statement.

My original hn-buzz design joined Hacker News search results against open issue titles:

SELECT h.title, h.points
FROM hn.search h
JOIN github.search_issues(...) i ON h.query = i.title

Returns zero rows. GitHub issue titles like "feat(ui): port custom icons from monorepo" are too specific for HN full-text search. HN posts discuss technologies and concepts, not individual issue names.

The correct design: two parallel queries. Query A searches HN for the project name. Query B fetches open GitHub issues. Claude reads both and surfaces thematic connections, such as "HN is discussing SQL-over-APIs performance, which relates to your open issues about query timeouts."

This is more accurate to how HN actually works. It's documented as ED-005.

reposense/
├── reposense.py          # CLI entrypoint, command routing, interactive loop
├── agent/
│   ├── claude_agent.py   # Agentic loop (Claude, Groq, or GPT-4o) with coral_query tool
│   ├── coral_runner.py   # run_query(), substitute_tokens(), disk cache
│   ├── mcp_server.py     # MCP stdio server: run_command, coral_sql, list_sources
│   └── prompts.py        # System prompt + grounding rules (no hallucination)
├── queries/              # SQL files, one per feature
│   ├── triage.sql
│   ├── stale_prs.sql
│   ├── release_notes.sql
│   ├── hn_buzz.sql
│   ├── cve_scan.sql      # Three queries: Dependabot + keyword + OSV
│   ├── contributors.sql
│   ├── duplicates.sql
│   ├── pulse.sql         # CROSS JOIN: github.search_issues × hn.search
│   ├── so_buzz.sql       # Stack Overflow top questions by vote score
│   ├── dev_buzz.sql      # Dev.to trending articles by tag
│   └── scorecard.sql     # OpenSSF Scorecard checks, sorted by score ASC
├── sources/
│   ├── stackoverflow/
│   │   ├── manifest.yaml # Custom Coral DSL v3 source: Stack Exchange API
│   │   └── README.md
│   ├── devto/
│   │   ├── manifest.yaml # Custom Coral DSL v3 source: Dev.to API
│   │   └── README.md
│   └── scorecard/
│       ├── manifest.yaml # Custom Coral DSL v3 source: OpenSSF Scorecard API
│       └── README.md
└── ui/
    ├── splash.py         # Logo, health score bar, 4 concurrent signals
    ├── chat.py           # SQL panel, spinner, error panel
    └── tables.py         # Rich table renderers, one per command

Every command is a .sql file. Adding a new feature is: write the SQL, add a command mapping. No new Python code required.

Adding a new source is: write a Coral DSL v3 YAML manifest and run coral source add. RepoSense ships a Stack Overflow source spec as an example.

The SQL files use runtime tokens that get substituted before execution:

Token                 Resolves to
{owner}               GitHub org or username
{repo}                Repository name
{30_days_ago}         ISO date 30 days back
{14_days_ago}         ISO date 14 days back
{7_days_ago}          ISO date 7 days back
{hn_query}            HN search term (from HN_QUERY env var, default: repo name)
{package_name}        Dependency to scan (from PACKAGE_NAME env var)
{package_ecosystem}   e.g. PyPI, npm, Go (from PACKAGE_ECOSYSTEM env var)
{package_version}     Version to check CVEs for (from PACKAGE_VERSION env var)
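Token substitution of this kind is a small regex pass over the SQL text. The sketch below is not the actual substitute_tokens() from coral_runner.py; it shares the name for readability, covers only the repo and date tokens, and adds a hypothetical `build_tokens` helper. A regex is used instead of `str.format` because token names such as `{30_days_ago}` start with a digit.

```python
import re
from datetime import date, timedelta
from typing import Dict, Optional

TOKEN_RE = re.compile(r"\{([A-Za-z0-9_]+)\}")


def build_tokens(owner: str, repo: str, today: Optional[date] = None) -> Dict[str, str]:
    """Resolve the repo and date tokens; env-var tokens are omitted here."""
    today = today or date.today()
    tokens = {"owner": owner, "repo": repo}
    for days in (30, 14, 7):
        tokens[f"{days}_days_ago"] = (today - timedelta(days=days)).isoformat()
    return tokens


def substitute_tokens(sql: str, tokens: Dict[str, str]) -> str:
    """Replace every {token} in the SQL, failing loudly on unknown names."""

    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in tokens:
            raise KeyError(f"unknown token: {{{name}}}")
        return tokens[name]

    return TOKEN_RE.sub(replace, sql)
```

Since the values are spliced into SQL as text, anything that comes from user input (such as the owner and repo) should be validated before substitution.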

Running reposense --repo owner/repo without a command drops into interactive mode. You can type commands (triage, cve-scan) or ask questions in plain English:

> which contributor should I thank this week?
> are there any security issues related to authentication?
> what did we ship in the last two weeks?

Plain English questions go to an AI agent (Claude, Groq, or GPT-4o, whichever API key you have) that has one tool: coral_query(sql). The agent writes the SQL, runs it via Coral, reads the result, and answers in natural language. The SQL it generates is displayed in the terminal so you can see exactly what it ran.

The agent auto-detects which provider to use:

import os

def _get_backend():
    if key := os.getenv("ANTHROPIC_API_KEY"):
        return _ClaudeBackend(key)   # uses claude-sonnet-4-6
    if key := os.getenv("GROQ_API_KEY"):
        return _GroqBackend(key)     # uses llama-3.3-70b-versatile (free)
    if key := os.getenv("OPENAI_API_KEY"):
        return _OpenAIBackend(key)   # uses gpt-4o-mini
    return None                      # shows setup instructions

The main loop is provider-agnostic: all three backends implement the same interface (call, is_final, get_text, get_tool_calls, append_tool_results).
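A loop over that interface could be sketched as follows. The five method names come from the list above; their signatures, the shape of a tool call (a dict with a `"sql"` key), and the turn limit are assumptions made for this example rather than RepoSense's real definitions.

```python
from typing import Any, Callable, Dict, List, Protocol


class Backend(Protocol):
    """The five methods named above; the signatures are assumed."""

    def call(self) -> Any: ...
    def is_final(self, response: Any) -> bool: ...
    def get_text(self, response: Any) -> str: ...
    def get_tool_calls(self, response: Any) -> List[Dict[str, str]]: ...
    def append_tool_results(self, response: Any, results: List[str]) -> None: ...


def run_agent(
    backend: Backend,
    coral_query: Callable[[str], str],
    max_turns: int = 5,
) -> str:
    """Drive any backend until it produces a final answer or turns run out."""
    for _ in range(max_turns):
        response = backend.call()
        if backend.is_final(response):
            return backend.get_text(response)
        # The model asked for one or more SQL queries: run each through Coral.
        results = [
            coral_query(tool_call["sql"])
            for tool_call in backend.get_tool_calls(response)
        ]
        # Feed the rows back so the next call() can use them.
        backend.append_tool_results(response, results)
    return "Stopped: turn limit reached before a final answer."
```

The loop never touches a provider SDK directly, so adding another provider means writing one more class with these five methods.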

The agent can run multiple queries if needed. For example, "show me stale PRs and who opened them" runs two separate Coral queries before composing a final answer.

so-buzz isn't backed by a built-in Coral source. It uses a community source manifest I wrote for Stack Overflow. The file is sources/stackoverflow/manifest.yaml and follows the Coral DSL v3 format.

The Stack Exchange API is zero-auth for public data: 300 requests/day with no key, 10,000/day with a free STACK_EXCHANGE_KEY. The manifest handles REST pagination, Unix timestamp to UTC conversion, and nested field access (owner.display_name for the asker's display name). Once installed, this is valid SQL:

SELECT title, score, answer_count, is_answered
FROM stackoverflow.questions
WHERE tagged = 'django' AND site = 'stackoverflow'
ORDER BY score DESC
LIMIT 10
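To make the manifest's two transformations concrete, here is what they amount to in plain Python. This is not Coral DSL syntax (the manifest itself is YAML and is not reproduced here). The field names `title`, `score`, `owner.display_name`, and `creation_date` follow the Stack Exchange question object; the helper names and the output column names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def get_path(record: Dict[str, Any], dotted: str) -> Any:
    """Follow a dotted path such as 'owner.display_name'; None if missing."""
    current: Any = record
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def to_row(item: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one API item into a table-like row."""
    return {
        "title": get_path(item, "title"),
        "score": get_path(item, "score"),
        "asker": get_path(item, "owner.display_name"),
        # creation_date is Unix epoch seconds; make the timezone explicit.
        "created_at": datetime.fromtimestamp(
            item["creation_date"], tz=timezone.utc
        ).isoformat(),
    }
```

Passing `tz=timezone.utc` matters: without it, `fromtimestamp` would convert to the local timezone of whatever machine runs the query.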

This is Coral's extensibility model in practice: any JSON HTTP API can become a queryable SQL table in ~80 lines of YAML. No SDK, no custom connector, no deployment. Write the manifest, run coral source add, and the table is live.

I submitted the OpenSSF Scorecard manifests as a community contribution to the Coral repo. The Scorecard source uses an unusual DSL pattern: {{filter.owner}} and {{filter.repo}} are injected directly into the URL path, not as query parameters, which is rare across Coral's 90+ community sources.

All 12 commands were tested across 4 repos of very different sizes and characteristics: withcoral/coral (~650 issues), django/django (14,000+ issues), expressjs/express, and facebook/react.

Command         withcoral/coral   django/django
triage          1.4s ✅           1.2s ✅
stale-prs       1.8s ✅           1.4s ✅
contributors    3.0s ✅           2.7s ✅
hn-buzz         1.6s ✅           1.5s ✅
cve-scan        2.3s ✅           2.2s ✅
release-notes   2.4s ✅           1.7s ✅
duplicates      15–20s ✅         25–35s ✅
health          2.5s ✅           2.2s ✅
pulse           2.4s ✅           2.4s ✅
so-buzz         1.1s ✅           0.9s ✅
dev-buzz        1.7s ✅           1.7s ✅
scorecard       1.0s ✅           0.9s ✅

48/48 command runs pass (12 commands × 4 repos). 11 of 12 commands complete in under 5 seconds. duplicates is slower by design (CROSS JOIN + similarity scan) but completes in under 90 seconds on any public repo.

git clone https://github.com/athul-2003/reposense
cd reposense
bash setup.sh

setup.sh installs Coral, connects all 6 sources (GitHub, HN, OSV, Stack Overflow, Dev.to, OpenSSF Scorecard), and installs the reposense command globally. The only prompt is for your GitHub Personal Access Token, needed once.

reposense --repo withcoral/coral triage
reposense --repo django/django health
reposense --repo expressjs/express cve-scan
reposense --repo withcoral/coral       # interactive agent mode

All 12 commands work without any LLM key. Agent mode (plain English questions) uses whichever key you have: Claude, Groq (free), or GPT-4o, in that priority order.

RepoSense also runs as an MCP server for Claude Desktop and other MCP clients:

claude mcp add reposense -- reposense --mcp

Or add manually to your Claude config:

{
  "mcpServers": {
    "reposense": {
      "command": "reposense",
      "args": ["--mcp"]
    }
  }
}

MCP tools exposed: run_command (all 12 commands), coral_sql (arbitrary SQL), list_sources (schema discovery).

Coral is not a database. It's a SQL runtime that turns live APIs into queryable tables. The mental shift is: you're not querying stored data, you're designing API call patterns.

Every FROM clause is an API request. Every filter you put in the q argument of a search_issues() call is a filter you're pushing to GitHub's servers. Every JOIN across two tables is two sets of API calls whose results get joined in memory.

Once that clicks, the optimisation instincts are the same as database query optimisation, but the bottleneck is API call count, not disk I/O. The engineering decisions in this project are all expressions of that mental model.

The search_issues() table function is particularly powerful. It lets you express complex filters in GitHub's search query syntax and get server-side evaluation, something GitHub's paginated REST list endpoints don't expose directly. Coral wraps it cleanly as a SQL table function.

A few things I'd add with more time:

- coral source add --file slack/manifest.yaml, and RepoSense could show which GitHub issues are being discussed in your team's Slack channels.
- reposense digest --repo org/repo --email me@example.com as a scheduled cron job using the same SQL queries.
- reposense <TAB> to see available commands.
- reposense --repos org/repo1,org/repo2 triage.

All of these are purely additive: new .sql files, new Coral source additions, or new CLI flags. The architecture supports them without changes to the core.

RepoSense already ships a custom Stack Overflow source spec (sources/stackoverflow/manifest.yaml) as a worked example of writing a Coral DSL v3 manifest for a new API. Anyone can follow the same pattern to add any HTTP API as a queryable SQL table.

RepoSense is open source under the MIT license.

Built with Coral, Claude, rich, click, and uv.

github.com/athul-2003/reposense

Demo video: youtu.be/7hxAJ9SiKqU
