One repo clone, shared forever

wpnews.pro

At Falconer we build agents that answer the kinds of questions that come up constantly in engineering teams: what does this code do, what changed recently, what did we decide in that meeting last week? Most of those questions we can answer from indexed documents and recent activity. But some questions are different.

“Who introduced this bug, and what were they trying to fix?” “What actually changed between the v1.2 and v1.3 releases?” These aren’t questions about recent context — they’re requests for time travel through code version history. Answering them well means giving the agent real git access, not a filtered API view. Getting there turned out to be a surprisingly fun infrastructure problem, and we shipped the full solution in about six weeks. Here’s what we built, why we built it that way, and what we learned along the way.

EBS and the GitHub API both fall short

Before this project, Falconer’s background pipeline ran on ECS tasks backed by Elastic Block Store (EBS) volumes. Every time a job needed to process a customer’s codebase, it cloned the repo fresh and discarded the clone afterwards. Fast, simple, and wasteful.

Exposing git tools to Falcon agents is not straightforward. When a user asks “what has changed in our repo this week?”, a cold `git clone` on a fresh EBS volume can take ten seconds for a small repo and many minutes for a large one. That latency is unacceptable in a conversational context. EBS volumes are also bound to a single instance, so you either pin customer traffic to specific workers or you’re constantly managing EBS snapshots and copies.

The natural next question was whether we even needed a clone at all. The GitHub APIs cover a lot of ground, and proxying git operations through them would avoid the storage problem entirely. That works for simple cases but falls apart quickly under real agent workloads. Rate limits bite when an agent issues a burst of exploratory queries in a single conversation, and with a shared installation token the budget drains fast. More fundamentally, the GitHub API does not expose the full git surface: there is no equivalent of `git blame` with line-range precision, no `git log -S` pickaxe search, and no `git diff` between arbitrary refs with rename detection. The agent would be working with a constrained, mediated view of history rather than the real thing.

As a startup of fewer than ten people, we also had a hard constraint: whatever we built had to be low-maintenance and run itself. What we needed was a shared, persistent filesystem that the ingest service could write to and the UI service could read from without owning a clone of its own. The ingest service would do the heavy lifting once, and every subsequent read, whether from a background job or a live agent conversation, would hit the same pre-populated tree. Because several services had to mount it at once, that pointed us to a multi-mountable filesystem implementing the Network File System (NFS) protocol.

S3 Files vs. Elastic File System

Our first stop was Amazon EFS. It supports NFS v4 and works well with ECS containers. However, a friend working at AWS told us about the newly launched S3 Files, an AWS offering that presents an NFS v4.1 interface over an S3 bucket. We benchmarked EFS and S3 Files, with local EBS as a baseline, across clone time, `find`, and `ripgrep` on a 28,000-file, 343 MB Next.js repository. These are deliberately punishing access patterns, far more aggressive than typical git operations, but we wanted to understand the worst case. EFS and S3 Files landed close together on every test: within ten percent of each other on three of the five, and less than 25 percent apart even in the worst case (cold `find`). That is not a coincidence. S3 Files is not a FUSE layer over S3 object APIs. It is a real NFS server that uses EFS as a high-performance caching layer, with S3 as the durable backing store and source of truth. Your data never leaves S3; EFS only accelerates access to the active working set.

Operation	EBS	EFS	S3 Files
`git clone` (343 MB)	10.5s	9m 59s	9m 33s
`find *.js` cold	0.51s	58.5s	47.2s
`find *.tsx` warm	0.44s	13.3s	13.1s
`rg "function"` cold	2.3s	2m 01s	1m 55s
`rg "use strict"` warm	1.5s	32.8s	37.2s

The cold/warm gap follows from how the cache works: file metadata and smaller files are loaded lazily into the EFS cache on first access, so subsequent reads are fast. Large reads (≥1 MiB) bypass the cache entirely and stream straight from S3 at no additional S3 Files throughput cost. Cached data expires after 30 days by default and cold data is evicted automatically; S3 remains the authoritative copy throughout.
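As a mental model of that read path, here is a toy TypeScript sketch. It is not how S3 Files is implemented: `ToyReadPath` and its route names are hypothetical, and it assumes the 1 MiB threshold applies per read request and that the 30-day expiry is measured from a file's last access.

```typescript
// Toy model of the read routing described above; not AWS's implementation.
// Assumptions: the 1 MiB threshold is checked per read request, and the
// 30-day expiry is measured from a cached file's last access.
const LARGE_READ_BYTES = 1024 * 1024; // reads of at least 1 MiB bypass the cache
const DAY_MS = 24 * 60 * 60 * 1000;
const CACHE_EXPIRY_MS = 30 * DAY_MS; // default 30-day expiry

type ReadRoute = "cache-hit" | "cache-fill" | "s3-direct";

class ToyReadPath {
  // Last access time per cached path.
  private lastAccessMs = new Map<string, number>();

  read(path: string, readBytes: number, nowMs: number): ReadRoute {
    // Large reads stream straight from S3 and never populate the cache.
    if (readBytes >= LARGE_READ_BYTES) return "s3-direct";

    const last = this.lastAccessMs.get(path);
    const warm = last !== undefined && nowMs - last < CACHE_EXPIRY_MS;
    // Either way the path ends up cached with a fresh access time.
    this.lastAccessMs.set(path, nowMs);
    // Warm reads hit the cache; cold or expired reads are loaded lazily from S3.
    return warm ? "cache-hit" : "cache-fill";
  }
}

const model = new ToyReadPath();
console.log(model.read("src/app.tsx", 4096, 0));                 // cache-fill (cold)
console.log(model.read("src/app.tsx", 4096, DAY_MS));            // cache-hit (warm)
console.log(model.read("public/video.mp4", 8 * 1024 * 1024, 0)); // s3-direct
console.log(model.read("src/app.tsx", 4096, 40 * DAY_MS));       // cache-fill (expired)
```

In this model a cold `rg` over a fresh clone pays a cache fill for every small file it touches, while a warm rerun is served from the cache, which is consistent with the cold/warm split in the table.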

For us, this means the first access to a freshly cloned repo is cold and later ones are fast. That is a fine tradeoff: the ingest service clones and warms the repo, and the agent inherits a pre-warmed filesystem. S3 Files stores data at roughly $0.023 per GB-month (standard S3 rates), compared to $0.30 per GB-month for EFS, which makes it about 13× cheaper. Across thousands of customer repos, that difference adds up. S3 Files also adds no operational overhead: it inherits S3’s elastic, multi-AZ durable storage and requires no capacity planning or sync pipelines to maintain.
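The 13× figure is simply the ratio of the two per-GB-month prices. The sketch below makes the arithmetic explicit for a hypothetical fleet of 5,000 repos averaging 0.5 GB each (our assumption, not Falconer's numbers). It counts storage only and ignores request, throughput, and any caching-layer charges.

```typescript
// Back-of-the-envelope storage cost comparison at the per-GB-month prices
// quoted above. The fleet size and average repo size are hypothetical inputs.
const S3_FILES_USD_PER_GB_MONTH = 0.023; // standard S3 rate
const EFS_USD_PER_GB_MONTH = 0.3;

function monthlyStorageUsd(totalGb: number, usdPerGbMonth: number): number {
  return totalGb * usdPerGbMonth;
}

// Hypothetical fleet: 5,000 repos averaging 0.5 GB each (2,500 GB total).
const totalGb = 5000 * 0.5;
const s3Files = monthlyStorageUsd(totalGb, S3_FILES_USD_PER_GB_MONTH);
const efs = monthlyStorageUsd(totalGb, EFS_USD_PER_GB_MONTH);
const ratio = EFS_USD_PER_GB_MONTH / S3_FILES_USD_PER_GB_MONTH;

console.log(`S3 Files: $${s3Files.toFixed(2)}/month`); // S3 Files: $57.50/month
console.log(`EFS: $${efs.toFixed(2)}/month`);          // EFS: $750.00/month
console.log(`EFS costs ${ratio.toFixed(1)}x as much`); // EFS costs 13.0x as much
```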

How we set up S3 Files

Setting up the S3 Files filesystem meant provisioning a bucket with versioning enabled (S3 Files requires it) and server-side encryption, a bucket policy enforcing HTTPS-only access, and a restriction that limits direct S3 API writes to the S3 Files service role. Mount targets go in each availability zone behind a dedicated security group. At Falconer we use Pulumi to model our infrastructure on AWS. The UI Fargate service mounts the filesystem read-write at `/repos`, and each repo lives under a path namespaced by organization, provider, owner, and repo: `/repos/{orgId}/github/{owner}/{repo}`.
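Because `owner` and `repo` ultimately come from webhook payloads, it is worth building that path in one place and refusing anything that could escape the namespace. A minimal sketch, assuming the layout above; `repoPath` and its character rules are hypothetical, not Falconer's code.

```typescript
// Hypothetical helper: builds /repos/{orgId}/github/{owner}/{repo} and rejects
// any segment that is empty, contains a slash or space, or is "." or "..".
const REPOS_ROOT = "/repos";

// Conservative character set covering GitHub owner and repo names; orgId is
// assumed to be an opaque ID drawn from the same set.
const SEGMENT = /^[A-Za-z0-9._-]+$/;

function assertSegment(label: string, value: string): void {
  if (!SEGMENT.test(value) || value === "." || value === "..") {
    throw new Error(`invalid ${label}: ${JSON.stringify(value)}`);
  }
}

function repoPath(orgId: string, owner: string, repo: string): string {
  assertSegment("orgId", orgId);
  assertSegment("owner", owner);
  assertSegment("repo", repo);
  return `${REPOS_ROOT}/${orgId}/github/${owner}/${repo}`;
}

console.log(repoPath("org_123", "vercel", "next.js"));
// /repos/org_123/github/vercel/next.js
```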

One friction point worth calling out: S3 Files went GA in April 2026, and the tooling ecosystem had not fully caught up. The standard `@pulumi/aws` provider still does not support `s3FilesVolumeConfiguration` on ECS task definitions, and neither does CloudFormation. To get full IaC coverage, we migrated the UI task definition to `@pulumi/aws-native`, Pulumi’s Cloud Control API-backed provider, which does expose the field natively. The rest of the infrastructure stays on `@pulumi/aws`, where support is complete. Having to switch providers mid-stack to get end-to-end declarative coverage showed how early we were: at the time of writing, S3 Files support in the standard Terraform AWS provider is still pending.

Integration with Falconer services

Here is how the UI service, the ingest service, the shared filesystem, and the repo sync workflows fit together. We built two complementary repo sync mechanisms.

The webhook path handles real-time freshness. Whenever a customer’s GitHub app installation receives a `push` event on the default branch, or an `installation_repositories` event adding a new repo, the ingest service enqueues a `github-persistent-repo-sync` job. For a repo that is already cloned, the job runs `git fetch` against the existing clone, fast-forwarding HEAD to match origin; a newly added repo gets its initial clone instead.

The cron path handles repos that went quiet or were missed during downtime: a daily `github-repo-sync-cron` job walks every connected repo across all organizations and enqueues a sync for each one. Both paths route through the same `auto-update-queue`, so they share a cluster-wide concurrency cap and do not overwhelm the ingest workers.
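A sketch of how the two paths can converge on one job shape. The queue and job names are from the post, and the payload fields (`ref`, `repository.default_branch`, `repository.full_name`, `action`, `repositories_added`) follow GitHub's webhook schemas. `SyncJob`, `ConnectedRepo`, and the function names are hypothetical, and handing the jobs to an actual queue client is left out.

```typescript
// Hypothetical sketch: both sync paths produce the same job for one queue.
const QUEUE_NAME = "auto-update-queue";
const JOB_NAME = "github-persistent-repo-sync";

interface SyncJob {
  queue: string;
  name: string;
  data: { orgId: string; owner: string; repo: string };
}

interface PushPayload {
  ref: string; // e.g. "refs/heads/main"
  repository: { full_name: string; default_branch: string };
}

interface InstallationReposPayload {
  action: string; // "added" or "removed"
  repositories_added: { full_name: string }[];
}

interface ConnectedRepo {
  orgId: string;
  fullName: string; // "owner/repo"
}

function makeJob(orgId: string, fullName: string): SyncJob {
  const [owner, repo] = fullName.split("/");
  return { queue: QUEUE_NAME, name: JOB_NAME, data: { orgId, owner, repo } };
}

// Webhook path: only default-branch pushes and newly added repos trigger a sync.
function jobsForWebhook(orgId: string, event: string, payload: unknown): SyncJob[] {
  if (event === "push") {
    const p = payload as PushPayload;
    const isDefaultBranch = p.ref === `refs/heads/${p.repository.default_branch}`;
    return isDefaultBranch ? [makeJob(orgId, p.repository.full_name)] : [];
  }
  if (event === "installation_repositories") {
    const p = payload as InstallationReposPayload;
    if (p.action !== "added") return [];
    return p.repositories_added.map((r) => makeJob(orgId, r.full_name));
  }
  return [];
}

// Cron path: one sync per connected repo across all organizations.
function jobsForCron(repos: ConnectedRepo[]): SyncJob[] {
  return repos.map((r) => makeJob(r.orgId, r.fullName));
}
```

Because both functions emit identical jobs for the same repo, a webhook and the daily cron that hit the same repo simply produce two syncs on one queue, and any per-repo serialization can live in a single place.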

On top of exposing git tools to Falcon in the UI service, we also migrated our PR update workflow in the ingest service to read from the same persistent repos.

Making repo sync robust and reliable

The most interesting implementation challenge was the initial clone itself. Cloning directly to an NFS mount is a known problem: git calls `fsync()` as it writes object and pack data, and on NFS each call can block for a full server round-trip or fail with `EIO`. A clone of a typical repo writes tens of thousands of small files, so the round-trips stack up fast and the clone either fails outright or crawls.

The fix was a two-step process. The ingest service:

- Clones the repo to a local EBS temporary directory (`/data-ebs/persistent-repo-clone-tmp/{attemptId}/{owner}/{repo}`) with `core.fsync=none`. That flag disables all of git’s `fsync()` calls (loose objects, pack files, metadata, everything) for the duration of the clone. It’s safe here because the local staging clone is disposable: if the worker crashes mid-clone, the next retry starts fresh.
- Bulk-copies the completed directory tree to the S3 Files mount, with an atomic rename at the end.

The atomic rename is the key correctness guarantee: there is never a window in which an incomplete clone is visible to readers at the canonical path. A reader sees either a complete clone or nothing at that path, never a half-written intermediate state. On first clone, the rename creates the path; on subsequent syncs, `git fetch` runs in place against the already-mounted repo, since fetches only append pack objects and do not trigger the same metadata explosion.
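The two steps above, plus the in-place fetch, can be sketched with Node's `child_process` and `fs` modules (`cpSync` needs Node 16.7 or later). This is an illustration under assumptions rather than Falconer's code: `syncRepo`, `publishAtomically`, `SyncTarget`, and the hidden `.incoming-{attemptId}` staging directory are hypothetical. The one detail it does encode is that `rename()` is only atomic within a single filesystem, so the tree is first copied onto the mount and then renamed into place there.

```typescript
import { execFileSync } from "node:child_process";
import { cpSync, existsSync, mkdirSync, renameSync, rmSync } from "node:fs";
import { dirname, join } from "node:path";

interface SyncTarget {
  remoteUrl: string;
  canonicalPath: string; // e.g. /repos/{orgId}/github/{owner}/{repo} on the mount
  localDir: string;      // e.g. /data-ebs/persistent-repo-clone-tmp/{attemptId}/{owner}/{repo}
  attemptId: string;
}

// Runs git without a shell; execFileSync throws on a non-zero exit code.
function git(args: string[], cwd?: string): void {
  execFileSync("git", args, { cwd, stdio: "inherit" });
}

// Copies a finished local tree onto the mount, then renames it into place.
// The copy first lands in a hidden sibling of the canonical path (a
// hypothetical naming scheme) so the final rename stays on one filesystem.
function publishAtomically(localDir: string, canonicalPath: string, attemptId: string): void {
  const parent = dirname(canonicalPath);
  const incoming = join(parent, `.incoming-${attemptId}`);
  mkdirSync(parent, { recursive: true });
  try {
    cpSync(localDir, incoming, { recursive: true });
    renameSync(incoming, canonicalPath); // readers see nothing, then the full tree
  } finally {
    rmSync(incoming, { recursive: true, force: true }); // no-op after a successful rename
  }
}

// Assumes callers serialize syncs per repo, so the existsSync check cannot race.
function syncRepo(t: SyncTarget): void {
  if (existsSync(t.canonicalPath)) {
    // Subsequent syncs: fetch in place. The post does not say how HEAD is then
    // fast-forwarded, so that step is left out of this sketch.
    git(["fetch", "origin"], t.canonicalPath);
    return;
  }
  try {
    // First sync: clone to local disk with git's fsync calls disabled.
    git(["-c", "core.fsync=none", "clone", t.remoteUrl, t.localDir]);
    publishAtomically(t.localDir, t.canonicalPath, t.attemptId);
  } finally {
    rmSync(t.localDir, { recursive: true, force: true }); // the staging clone is disposable
  }
}
```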

Because multiple webhook events can arrive for the same repo in quick succession, we added a per-repo Redis lock keyed on `persistent-repo-sync:{orgId}:github:{owner}/{repo}`. Each sync job tries to acquire it before touching the filesystem. If another worker already holds the lock, the job throws a `DelayedError` and reschedules itself for 5 seconds later, so events are serialized rather than dropped. The lock carries a 15-minute TTL as a safeguard against worker crashes that would otherwise leave it acquired forever.
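A sketch of that locking logic that runs without Redis. The key format, 15-minute TTL, and 5-second retry come from the post; `ToyLockStore`, `runSyncJob`, and this `DelayedError` class are local stand-ins, since the post does not say which queue library it uses. Against real Redis, acquiring maps to `SET key token NX PX <ttl>` and releasing to a compare-and-delete script, so a worker whose TTL lapsed cannot free a lock that another worker now holds.

```typescript
// In-memory stand-in for the per-repo Redis lock; all class and function
// names here are hypothetical.
const LOCK_TTL_MS = 15 * 60 * 1000; // frees the lock if its holder crashes
const RETRY_DELAY_MS = 5000;

class DelayedError extends Error {
  readonly retryAtMs: number;
  constructor(retryAtMs: number) {
    super("repo sync lock is held; rescheduling");
    this.name = "DelayedError";
    this.retryAtMs = retryAtMs;
  }
}

class ToyLockStore {
  private locks = new Map<string, { token: string; expiresAtMs: number }>();

  // Like SET NX PX: succeeds only if the key is absent or its TTL has lapsed.
  tryAcquire(key: string, token: string, ttlMs: number, nowMs: number): boolean {
    const held = this.locks.get(key);
    if (held && held.expiresAtMs > nowMs) return false;
    this.locks.set(key, { token, expiresAtMs: nowMs + ttlMs });
    return true;
  }

  // Deletes the key only if this worker still owns it.
  release(key: string, token: string): void {
    if (this.locks.get(key)?.token === token) this.locks.delete(key);
  }
}

function lockKey(orgId: string, owner: string, repo: string): string {
  return `persistent-repo-sync:${orgId}:github:${owner}/${repo}`;
}

async function runSyncJob(
  store: ToyLockStore,
  key: string,
  token: string,
  nowMs: number,
  sync: () => Promise<void>,
): Promise<void> {
  if (!store.tryAcquire(key, token, LOCK_TTL_MS, nowMs)) {
    // Another worker is syncing this repo: retry later instead of dropping the event.
    throw new DelayedError(nowMs + RETRY_DELAY_MS);
  }
  try {
    await sync();
  } finally {
    store.release(key, token);
  }
}
```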

A Falcon agent that can time travel

With a persistent, up-to-date clone available on the shared filesystem, we could expose a proper git tool to the agent, one that runs allowlisted read-only subcommands like `log`, `show`, `diff`, `blame`, `grep`, and `ls-tree` directly against the code repos on S3 Files. Here’s an example of Falcon using it.
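The post lists the subcommands but not how arguments are checked. Some read-only subcommands accept flags that are not read-only: `git diff --output=<file>` writes a file, `git grep -O<pager>` launches a program, and `git blame --contents <file>` reads an arbitrary file. A minimal guard might look like this, assuming Node's `child_process`; `runGit`, `checkOption`, and the blocked-option list are our own and deliberately incomplete.

```typescript
import { execFileSync } from "node:child_process";

// Read-only subcommands named in the post.
const ALLOWED_SUBCOMMANDS = new Set(["log", "show", "diff", "blame", "grep", "ls-tree"]);

// Long options that write files, read files outside the repo, or launch programs.
// Illustrative, not exhaustive; a per-subcommand flag allowlist is stricter.
const BLOCKED_LONG_OPTIONS = [
  "--output",              // diff/log/show: write output to a file
  "--open-files-in-pager", // grep: run a pager program on matching files
  "--no-index",            // diff: compare paths outside the repository
  "--contents",            // blame: annotate using an arbitrary file's contents
];

function checkOption(arg: string): void {
  if (arg.startsWith("--")) {
    const name = arg.split("=")[0];
    // git accepts unambiguous abbreviations (e.g. --out), so reject any prefix.
    if (BLOCKED_LONG_OPTIONS.some((blocked) => blocked.startsWith(name))) {
      throw new Error(`option not allowed: ${arg}`);
    }
  } else if (/^-[A-Za-z]*O/.test(arg)) {
    // grep's short form -O[<pager>], also when bundled (e.g. -nO). This
    // conservatively rejects some harmless arguments such as -SOops.
    throw new Error(`option not allowed: ${arg}`);
  }
}

function runGit(repoPath: string, subcommand: string, args: string[]): string {
  if (!ALLOWED_SUBCOMMANDS.has(subcommand)) {
    throw new Error(`subcommand not allowed: ${subcommand}`);
  }
  // Everything after a bare "--" is a pathspec, not an option.
  const end = args.indexOf("--");
  (end === -1 ? args : args.slice(0, end)).forEach(checkOption);
  // No shell is involved; -C pins the repository and --no-pager stops git
  // from spawning a pager around the output.
  return execFileSync("git", ["-C", repoPath, "--no-pager", subcommand, ...args], {
    encoding: "utf8",
    maxBuffer: 16 * 1024 * 1024,
    timeout: 30000,
  });
}
```

With this guard, the history questions from the start of the post map to calls like `runGit(path, "blame", ["-L", "10,20", "--", "src/app.ts"])` or `runGit(path, "log", ["-S", "retryDelay", "--oneline"])`, while writes and program launches are refused before git ever runs.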

If you’re building agents that need to reason about code history, the pattern here is worth stealing: clone once to a shared NFS mount, keep the clone fresh with webhooks and a daily cron, and expose git as a read-only subprocess behind an allowlist. The agent gets full git access without GitHub API round-trips or rate limits, and you get a foundation you can keep building on as your agent’s needs change.

source & further reading

falconer.com — original article