How Sarva keeps the same GPU multiplier on the cost side and the earn side without the ledger drifting

Sarva, a hub-and-spoke compute grid, uses a small ledger design to keep its cost and earn multipliers consistent without drifting. It combines a per-node quality score based on GPU tier with a per-user region rate. The cost is locked at job submission using the submitter's region, and the earn is calculated at completion using the node's region. The design replaces an earlier "credit" abstraction that lost precision, and an audit log lets any dispute be reconstructed after the fact.

Sarva is a hub-and-spoke compute grid: a FastAPI backend, a Next.js 14 dashboard, and a Python node agent that runs on contributor machines. This post is not about the hub-and-spoke pattern; that one is well-trodden. It's about the one piece of business logic I was most worried about getting right. When a job is assigned, how do you decide what it costs the submitter and what the node earns? And how do you make sure the audit log captures enough state to reconstruct any disagreement after the fact?

I am writing this after the third time I rewrote `submit_job` and `complete_job`. The first two versions diverged: they used different `gr` lookups, different rounding, and a "credit" abstraction that quietly lost precision. This is the version that has been live on Railway for the last few weeks, and it is the one I am willing to defend.

There are two multiplier functions, both tiny:

```python
# GPU tier -> quality multiplier
GPU_MULT = {
    "rtx-4090": 3.0, "rtx-5090": 3.0,
    "rtx-3090": 2.5, "rtx-4070": 2.5,
    "rtx-3060": 2.0, "rtx-2070": 2.0,
    "gtx-1080ti": 1.5, "gtx-1080": 1.5,
    "gtx-1660": 1.3,
    "cpu": 0.8,
}
# Region -> rate multiplier
GEO_RATE = {"in": 0.7, "india": 0.7, "us": 1.0, "uk": 1.0, "eu": 0.95}
# Share of the earn kept by the platform
PLATFORM_FEE = 0.20


def qs(g: str) -> float:
    # Unknown GPU tiers fall back to 1.0
    return GPU_MULT.get(g.lower(), 1.0)


def gr(r: str) -> float:
    # Unknown regions fall back to 1.0
    return GEO_RATE.get(r.lower(), 1.0)
```

`qs` is the node's quality score: it is set once at registration and never changes. `gr` is a per-user region rate. Both functions are called on the cost side and the earn side. The catch is whose `gr` you use on each side, and that asymmetry is intentional.
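To make the lookup behaviour concrete, here is a small, self-contained check of `qs` and `gr` exactly as defined above. The inputs `"RTX 4090"` (a space instead of a hyphen) and `"jp"` are hypothetical examples of strings that are not in the tables; the point is that anything unlisted silently falls back to 1.0 rather than raising.

```python
# Tables and lookups as defined in the post.
GPU_MULT = {
    "rtx-4090": 3.0, "rtx-5090": 3.0,
    "rtx-3090": 2.5, "rtx-4070": 2.5,
    "rtx-3060": 2.0, "rtx-2070": 2.0,
    "gtx-1080ti": 1.5, "gtx-1080": 1.5,
    "gtx-1660": 1.3,
    "cpu": 0.8,
}
GEO_RATE = {"in": 0.7, "india": 0.7, "us": 1.0, "uk": 1.0, "eu": 0.95}


def qs(g: str) -> float:
    # Case-insensitive lookup; unknown tiers fall back to 1.0
    return GPU_MULT.get(g.lower(), 1.0)


def gr(r: str) -> float:
    # Case-insensitive lookup; unknown regions fall back to 1.0
    return GEO_RATE.get(r.lower(), 1.0)


if __name__ == "__main__":
    print(qs("RTX-4090"))         # 3.0
    print(gr("India"), gr("IN"))  # 0.7 0.7
    print(qs("RTX 4090"))         # 1.0 (hypothetical input: space, not hyphen)
    print(gr("jp"))               # 1.0 (hypothetical input: region not listed)
```

Whether a silent 1.0 is the right default is a judgment call: it keeps registration from failing on an unfamiliar GPU string, at the cost of pricing that node as a baseline machine.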
```python
from fastapi import Depends, HTTPException
from sqlalchemy.orm import Session


@app.post("/jobs/submit")
def submit_job(
    type: str,
    submitter_id: str,
    script: str = None,
    slices: int = 1,
    priority: int = 0,
    db: Session = Depends(get_db),
):
    user = db.query(User).filter(User.id == submitter_id).first()
    if not user:
        raise HTTPException(status_code=404, detail="User not found")

    gpu_cost = {"ml": 2.5, "gaming": 3.0, "compute": 1.0}.get(type.lower(), 1.0)
    cost = slices * gpu_cost * gr(user.region)  # ← submitter's region
    final_cost = 0.0 if user.tier == UserTier.GOD else cost

    if user.tier != UserTier.GOD and user.balance < final_cost:
        raise HTTPException(
            status_code=400,
            detail=f"Insufficient credits. Need {final_cost}, have {user.balance}",
        )

    # job_id generation is not shown in this excerpt
    job = Job(
        id=job_id,
        type=type,
        status=JobStatus.PENDING,
        submitter_id=submitter_id,
        script=script,
        slices=slices,
        credits_cost=final_cost,  # locked here, never recomputed
        priority=priority,
    )
    db.add(job)

    if user.tier != UserTier.GOD:
        user.balance -= final_cost
        user.spent_total += final_cost
        tx = Transaction(
            # ... other fields elided ...
            type="spend",
            amount=-final_cost,
        )
        db.add(tx)

    audit(db, "job_submitted", {"job_id": job_id, "type": type, "submitter_id": submitter_id})
    db.commit()
```

Two things to notice:

1. `gr(user.region)` is the submitter's region.
2. `Job.credits_cost` is locked at submission time. If the node that eventually runs the job is in a different region, that does not change what the submitter was charged.

The earn side runs when the job completes:

```python
@app.post("/jobs/{job_id}/complete")
def complete_job(
    job_id: str,
    result_cid: str = None,
    error: str = None,
    db: Session = Depends(get_db),
):
    job = db.query(Job).filter(Job.id == job_id).first()
    # ... mark COMPLETED / FAILED ...

    if job.assigned_node_id:
        node = db.query(Node).filter(Node.id == job.assigned_node_id).first()
        if node:
            node.status = NodeStatus.ONLINE
            if job.credits_cost > 0 and not error:
                earn_mult = qs(node.gpu_tier) * gr(node.region)  # ← node's region
                earned = job.credits_cost * earn_mult * (1 - PLATFORM_FEE)
                owner = db.query(User).filter(User.id == node.owner_id).first()
                if owner:
                    owner.balance += earned
                    owner.earned_total += earned
                    tx = Transaction(
                        # ... other fields elided ...
                        type="earn",
                        amount=earned,
                        job_id=job_id,
                    )
                    db.add(tx)

    audit(db, "job_completed", {"job_id": job_id, "error": error})
    db.commit()
```

`gr(node.region)` here is the node's region, not the submitter's. A node in India running a job submitted by a US user gets 0.7x: the cheaper-region rate flows through to the earner. This is symmetric in spirit (cheap power → cheap earn), but the two `gr` calls live in different functions, run hours apart in real time, and take different arguments. That asymmetry used to be a source of bugs. I eventually settled on the rule: "use the subject's region, always."

`PLATFORM_FEE = 0.20` is taken off the top of the earn, not added to the cost. So the 20% comes out of what the node would have earned, not out of the submitter's pocket. This is the part where a lot of decentralized-compute projects get the framing wrong: "we take 20% from the worker" sounds bad; "the worker keeps 80% of whatever the submitter paid" sounds fine. They are the same number; in both cases the split is applied after the node's `earn_mult`. The latter framing is what we ship.

The credit math is small enough to hold in your head. The reason I sleep at night is `audit(db, ...)`. Every state-changing operation writes a row to `audit_logs` with a `type` and a `data` JSON blob:

```python
import uuid
from sqlalchemy.orm import Session

def audit(db: Session, log_type: str, data: dict):
    log = AuditLog(id=uuid.uuid4().hex[:12], type=log_type, data=data)
    db.add(log)  # no commit here: the row lands with the caller's db.commit()
```

The events I currently log: `user_registered`, `node_registered`, `job_submitted`, `job_assigned`, `job_completed`, `topup`, `cashout`. The `data` blob is whatever I have at the time of the call; it is not normalized. That is a deliberate choice. The alternative is a clean event schema, but clean event schemas are how you end up with `event_v2` and a migration that nobody wants to run. The JSON blob is messy, but it is complete: if a node owner and the platform disagree about whether a job ran, the audit log has the `job_id`, the `node_id`, the `error` field, and the timestamp. I can replay the credit math from the audit log and the immutable transactions table to figure out who owes whom what.
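As a worked example, the sketch below restates the two formulas from `submit_job` and `complete_job` as pure functions and adds a replay check of the kind described above: given a job's locked `credits_cost` and the node's tier and region, recompute the expected earn and compare it against a recorded `earn` amount. The function names and the `JOB_TYPE_COST` name are hypothetical (the post inlines that dict in `submit_job`), and the god-tier branch is left out.

```python
import math

# Tables copied from the post.
GPU_MULT = {
    "rtx-4090": 3.0, "rtx-5090": 3.0, "rtx-3090": 2.5, "rtx-4070": 2.5,
    "rtx-3060": 2.0, "rtx-2070": 2.0, "gtx-1080ti": 1.5, "gtx-1080": 1.5,
    "gtx-1660": 1.3, "cpu": 0.8,
}
GEO_RATE = {"in": 0.7, "india": 0.7, "us": 1.0, "uk": 1.0, "eu": 0.95}
PLATFORM_FEE = 0.20
# Hypothetical name for the dict that submit_job inlines.
JOB_TYPE_COST = {"ml": 2.5, "gaming": 3.0, "compute": 1.0}


def qs(g: str) -> float:
    return GPU_MULT.get(g.lower(), 1.0)


def gr(r: str) -> float:
    return GEO_RATE.get(r.lower(), 1.0)


def cost_at_submit(job_type: str, slices: int, submitter_region: str) -> float:
    """Cost side as in submit_job (non-god user): uses the submitter's region."""
    return slices * JOB_TYPE_COST.get(job_type.lower(), 1.0) * gr(submitter_region)


def earn_at_complete(credits_cost: float, gpu_tier: str, node_region: str) -> float:
    """Earn side as in complete_job: node's quality score and node's region."""
    earn_mult = qs(gpu_tier) * gr(node_region)
    return credits_cost * earn_mult * (1 - PLATFORM_FEE)


def earn_matches(recorded: float, credits_cost: float, gpu_tier: str, node_region: str) -> bool:
    """Replay check: does a recorded 'earn' amount match the formula?

    Uses a relative tolerance because the amounts are floats.
    """
    expected = earn_at_complete(credits_cost, gpu_tier, node_region)
    return math.isclose(recorded, expected, rel_tol=1e-9)


if __name__ == "__main__":
    cost = cost_at_submit("ml", 4, "us")               # US submitter
    earned = earn_at_complete(cost, "rtx-3060", "in")  # India node
    print(f"cost={cost:.2f} earned={earned:.2f}")      # cost=10.00 earned=11.20
```

With these inputs the node earns about 11.2 credits on a job that cost the submitter 10.0. Because `earn_mult` (here 2.0 × 0.7 = 1.4) multiplies the locked cost before the 80% split, the earn side is not capped by what the submitter paid. The `isclose` comparison is there because float products like these need not land exactly on the decimal value you would compute by hand.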
There is a `/logs` endpoint that returns the most recent 50 entries. It is the first thing I check when somebody opens a ticket.

A few things I have not solved yet:

- No `gr` snapshot on the Job row. The Job table stores `credits_cost` (locked), but not the `gr` value at submission time. If we ever change the `GEO_RATE` dict and a dispute arises, the audit log plus the transactions table is enough to reconstruct it, but that is annoying, not instant. I am 70% convinced this is fine and 30% convinced I should add a `cost_gr` snapshot column to the Job row tomorrow.
- `users.balance` is the source of truth, but I do not yet have a job that walks every user's transactions and asserts `balance == sum(tx.amount)`. I run this query by hand once a week. It is fine for a few hundred users. It is not fine at 10,000.
- No `MIN_DISK_GB` enforcement on assignment. `diskFreeGb` is recorded at registration, but the orchestrator does not check it before handing out a job. This is on the list.
- `final_cost = 0.0 if user.tier == UserTier.GOD else cost` lets the god user submit anything for free. That is intentional for dev, but a god user ID in production would be a small catastrophe. I am aware.

The pricing formula is the single thing I would want a second pair of eyes on. Specifically: should `PLATFORM_FEE` be a flat 20%, or should it scale with `slices` (lower for short jobs, higher for long ones) so that the platform has a stronger incentive to keep cheap jobs flowing?

Sarva is open source at [github.com/AmSach/sarva](https://github.com/AmSach/sarva). The monorepo is `/backend` (FastAPI + Postgres), `/frontend` (Next.js 14), and `/node` (Python agent). The backend is live on Railway, the dashboard is on Vercel, and the node agent is a single-file Python script you can run on any machine with `HUB_URL` and `AUTH_TOKEN` set. If you have shipped a two-sided credit ledger before, I'd genuinely like to know whether the audit-blob approach scales or whether I am about to regret it. Comments welcome.

Tags: python, fastapi, opensource, distributed