Originally published on dev.to.

I built a 3B lease risk scanner that runs without an external LLM API

A developer built Lease Lens, a 3B-parameter contract risk scanner that runs without an external LLM API, for the Hugging Face Build Small Hackathon. The fine-tuned Llama 3.2 3B model achieved about a 242% relative F1 improvement over the base model and outperformed the developer's own 8B fine-tune on legal clause extraction. Lease Lens analyzes leases for risky clauses, highlights them in the source text, and drafts negotiation emails without sending contract text to a closed external LLM API; a GGUF build also supports fully local use.

4 min read · Published Jun 14, 2026

I built Lease Lens for the Hugging Face Build Small Hackathon.

The idea is simple: most people sign contracts they do not really read.

That is true for apartment leases, freelance agreements, gym memberships, SaaS terms, and small-business office leases. The risk is not that every contract is malicious. The risk is that a normal person can miss a renewal clause, late-fee stack, deposit condition, indemnity clause, repair burden, or arbitration waiver until it is too late.

Lease Lens is a small-model contract review assistant. It reads a lease or contract, finds risky clauses, quotes the exact language, highlights it in the source text, scores the contract, and drafts a plain-English negotiation email.

Demo: https://youtu.be/M-v3OAKO5-k

Space: https://huggingface.co/spaces/build-small-hackathon/lease-lens

GitHub: https://github.com/bO-05/lease-lens

Model: https://huggingface.co/giladam01/lease-lens-legal-3b

GGUF: https://huggingface.co/giladam01/lease-lens-legal-3b-gguf

For this problem, the small-model constraint is not just a hackathon rule. It is part of the product.

Contracts can contain private addresses, payments, business terms, and personal details. A user should not have to send that text to a closed external LLM API just to understand whether a lease contains obvious risk.

Lease Lens runs the model inside the Hugging Face Space and also ships a GGUF build for local llama.cpp / Ollama usage. The app does not call an external LLM API.

That gives the project a clear target:

The app checks for common contract risk categories:

For every accepted flag, Lease Lens shows:

Then it can draft a negotiation email from the grounded flags.
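The article doesn't say whether the email is written by the model or assembled from a template. The point worth illustrating is that only accepted, grounded flags feed the draft. Below is a deterministic template sketch under that assumption; `GroundedFlag`, `draft_email`, and the wording are hypothetical, not Lease Lens's code.

```python
from dataclasses import dataclass


@dataclass
class GroundedFlag:
    category: str  # hypothetical label, e.g. "auto-renewal"
    quote: str     # exact text copied from the contract


def draft_email(flags: list[GroundedFlag], recipient: str = "Landlord") -> str:
    """Assemble a plain-English email that cites each grounded clause verbatim."""
    if not flags:
        return ""  # nothing grounded, nothing to negotiate
    lines = [f"Dear {recipient},", "", "Before signing, I would like to discuss these clauses:", ""]
    for i, flag in enumerate(flags, start=1):
        lines.append(f'{i}. {flag.category.capitalize()}: "{flag.quote}"')
    lines += ["", "Could we talk about adjusting or clarifying these terms?", "", "Best regards,"]
    return "\n".join(lines)
```

Quoting the clause verbatim keeps the email tied to the same evidence the user already saw highlighted.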

It is not legal advice. It is a review assistant: evidence first, user judgment second.

The shipped model is a fine-tuned Llama 3.2 3B legal extraction model.

I fine-tuned on CUAD-style legal clause extraction and evaluated on 100 held-out CUAD extraction items with the same setup across models.
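The post doesn't include the scoring code. CUAD-style extraction is commonly scored with SQuAD-style token F1 and exact match, so the sketch below assumes that convention. Treat it as an illustration of what the two columns in the results table measure, not as the evaluation script actually used.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles, squeeze whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall after normalization."""
    pred = normalize(prediction).split()
    ref = normalize(reference).split()
    if not pred or not ref:
        # Two empty answers ("no clause") agree; one empty answer scores zero.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def evaluate(predictions: list[str], references: list[str]) -> tuple[float, float]:
    """Mean token F1 and mean exact match over paired items."""
    n = len(references)
    f1 = sum(token_f1(p, r) for p, r in zip(predictions, references)) / n
    em = sum(exact_match(p, r) for p, r in zip(predictions, references)) / n
    return f1, em
```

Exact match is the stricter metric: a prediction that drops one word of a clause still earns most of its F1 but scores zero on exact match, which is why the exact-match column is much lower across all models.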

The headline result:

Model                 F1     Exact match
Llama 3.2 3B base     0.119  0.010
Lease Lens 3B         0.406  0.280
Llama 3.1 8B base     0.206  0.020
My 8B fine-tune       0.357  0.230

The 3B fine-tune improved F1 by about 242% relative to the base 3B model (0.119 → 0.406) and even beat my own 8B fine-tune on the same held-out items (0.406 vs. 0.357 F1, 0.280 vs. 0.230 exact match).
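For readers checking the arithmetic: relative improvement is (new − old) / old. With the rounded scores from the table it comes out at roughly 241%, so the reported +242% presumably reflects unrounded F1 values. Either way, the fine-tune scores about 3.4 times its base model.

```python
def relative_gain(new: float, old: float) -> float:
    """Relative improvement of `new` over `old` as a fraction (2.0 means +200%)."""
    return (new - old) / old


# Rounded F1 scores from the results table.
base_3b, tuned_3b, tuned_8b = 0.119, 0.406, 0.357

print(f"3B fine-tune vs. 3B base: {relative_gain(tuned_3b, base_3b):+.1%}")  # +241.2%
print(f"3B fine-tune vs. 8B fine-tune: {tuned_3b - tuned_8b:+.3f} F1")      # +0.049
```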

That is the part I like most about the project: small did not mean worse by default. For a specific extraction task, a tuned 3B model was enough to become useful.

The first version had an important failure mode: trained mostly on positive examples, the bare model over-extracted, returning a "clause" even for clause types the contract did not contain. In other words, it was too eager to find something.

So the app does not trust generation alone.

Lease Lens wraps the model with deterministic guards:

For long contracts, the app reads the first 80k characters, splits the text into overlapping windows, routes each clause category only to windows containing relevant keywords, and runs the checks as a batched generation call.
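A minimal sketch of that long-document path, assuming character-based windows. The 80k cap comes from the article. The window size, overlap, category names, and keyword lists are hypothetical placeholders, not Lease Lens's actual values.

```python
from dataclasses import dataclass

MAX_CHARS = 80_000  # cap from the article: only the first 80k characters are read

# Hypothetical categories and keywords; the real routing table is not shown in the post.
CATEGORY_KEYWORDS: dict[str, list[str]] = {
    "auto_renewal": ["renew", "extension"],
    "security_deposit": ["security deposit", "deposit"],
    "late_fee": ["late charge", "late fee", "past due"],
    "indemnity": ["indemnif", "hold harmless"],
}


@dataclass
class Window:
    start: int  # character offset of the window within the truncated contract
    text: str


def split_windows(text: str, size: int = 4_000, overlap: int = 500) -> list[Window]:
    """Cut the contract into fixed-size windows that overlap by `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    text = text[:MAX_CHARS]
    if not text:
        return []
    step = size - overlap
    # Stop once the previous window already reaches the end of the text.
    return [Window(s, text[s:s + size]) for s in range(0, max(len(text) - overlap, 1), step)]


def route(
    windows: list[Window], keywords: dict[str, list[str]] = CATEGORY_KEYWORDS
) -> list[tuple[str, Window]]:
    """Pair each category only with the windows that mention one of its keywords."""
    jobs = []
    for category, words in keywords.items():
        for window in windows:
            lowered = window.text.lower()
            if any(word in lowered for word in words):
                jobs.append((category, window))
    return jobs
```

Each `(category, window)` pair from `route(split_windows(contract_text))` would then become one prompt, and the whole list can be sent to the model as a single batch. Keyword routing trades a little recall for far fewer model calls: a category is never checked against a window that doesn't mention it.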

This makes the output less magical, but much more inspectable. A user can look at the quote, look at the highlighted source text, and decide whether it matters.
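The post doesn't list the guards themselves, but it stresses that every accepted flag carries an exact quote that is highlighted in the source. Here is a minimal sketch of such a grounding check. It assumes the model returns either a quote or the literal string NONE (the smoke-run training data includes synthesized NONE examples; the exact output format is an assumption). `Flag`, `ground`, and `highlight` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Flag:
    category: str
    quote: str  # the span exactly as it appears in the source
    start: int  # character offsets used for highlighting
    end: int


def ground(category: str, model_output: str, source: str) -> Optional[Flag]:
    """Keep a flag only if the model's quote occurs verbatim in the source text.

    Differences in whitespace (line breaks, double spaces) are tolerated;
    any other deviation (paraphrase, invented wording) rejects the flag.
    """
    quote = model_output.strip()
    if not quote or quote.upper() == "NONE":
        return None  # the model reports that this clause type is absent
    pattern = r"\s+".join(re.escape(token) for token in quote.split())
    match = re.search(pattern, source)
    if match is None:
        return None  # not grounded in the source: drop it
    return Flag(category, match.group(0), match.start(), match.end())


def highlight(source: str, flags: list[Flag], open_mark: str = "[[", close_mark: str = "]]") -> str:
    """Wrap each grounded span in markers; assumes spans do not overlap."""
    out = source
    # Insert from the end backwards so earlier offsets stay valid.
    for flag in sorted(flags, key=lambda f: f.start, reverse=True):
        out = out[:flag.start] + open_mark + out[flag.start:flag.end] + close_mark + out[flag.end:]
    return out
```

Because rejection is deterministic, an over-eager model can still waste a call, but it cannot put an invented clause in front of the user.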

The Space includes real executed commercial leases from SEC EDGAR filings.

That matters because benchmark scores are not enough. A demo can look good on short synthetic examples and then fall apart on actual legal documents.

The built-in examples include:

The Boston example is a good quick demo: Lease Lens finds 3 grounded flags and catches the exact $125,301.33 security-deposit clause.

The Addison example is a stress test: long text, partial coverage, and enough complexity to show why the UI needs to be evidence-first instead of just a chatbot answer.

I started with a Gradio app, but the final submission needed to feel less like a stock demo and more like a focused tool.

The current UI is a "redline legal evidence desk":

The goal is that a judge can understand the whole product path in under a minute:

I used Modal for the v2.5 training path and smoke verification.

The smoke run used an A100-40GB, loaded a CUAD smoke split of 400 positives and 100 synthesized NONE examples, trained for 60 steps, and completed cleanly in about 160 seconds. I ran it with --no-push and kept it as evidence: it verified the Modal path without overwriting the published model.
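Mixing in NONE examples is a common way to counter the over-extraction failure mode: the model has to see clause types that are absent and learn to say so. How Lease Lens actually synthesizes its NONE examples is in the training script linked in this post. The sketch below assumes the simplest approach: every (contract, absent category) pair becomes a NONE target, then both pools are sampled to the requested counts. `Example` and `build_split` are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Example:
    context: str   # contract text (or a window of it)
    category: str  # clause type the prompt asks about
    target: str    # exact clause text, or "NONE" if the clause type is absent


def build_split(
    contracts: list[tuple[str, dict[str, str]]],
    categories: list[str],
    n_pos: int,
    n_neg: int,
    seed: int = 0,
) -> list[Example]:
    """Mix labeled extractions with synthesized NONE examples.

    Each contract is (text, {category: clause_text}); a category missing from
    the dict is treated as absent from that contract.
    """
    positives, negatives = [], []
    for text, answers in contracts:
        for category in categories:
            if category in answers:
                positives.append(Example(text, category, answers[category]))
            else:
                negatives.append(Example(text, category, "NONE"))
    rng = random.Random(seed)
    split = rng.sample(positives, min(n_pos, len(positives)))
    split += rng.sample(negatives, min(n_neg, len(negatives)))
    rng.shuffle(split)  # interleave so training batches see both kinds
    return split
```

A 4:1 call such as `build_split(contracts, categories, n_pos=400, n_neg=100)` mirrors the proportions of the smoke split described above.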

The repo also includes the training script:

https://github.com/bO-05/lease-lens/blob/main/training/finetune_legal_3b_modal_v2.py

For local usage, I published a GGUF build:

ollama pull hf.co/giladam01/lease-lens-legal-3b-gguf
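Once pulled, the model can be called through Ollama's local HTTP API (port 11434 by default), so contract text never leaves the machine. The following is a minimal standard-library sketch using the `/api/generate` endpoint. The prompt wording is a placeholder, because the fine-tuned model's expected prompt format isn't given in this post.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "hf.co/giladam01/lease-lens-legal-3b-gguf"


def build_payload(prompt: str, model: str = MODEL) -> bytes:
    """JSON body for a single, non-streaming completion."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")


def generate(prompt: str, url: str = OLLAMA_URL, timeout: float = 120.0) -> str:
    """Send one prompt to a running Ollama server and return the generated text."""
    request = urllib.request.Request(
        url, data=build_payload(prompt), headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]


if __name__ == "__main__":
    # Placeholder prompt; adapt it to the format the fine-tuned model was trained on.
    print(generate("Quote the security deposit clause from this lease, or answer NONE:\n..."))
```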

I also built and finalized the submission with OpenAI Codex as my coding agent. The public GitHub history contains Codex-attributed commits, and the repo includes a Codex build log:

https://github.com/bO-05/lease-lens/blob/main/docs/codex-build-log.md

There are still obvious next steps:

The big lesson for me was that a small legal model can be useful if the product does not ask it to be a lawyer.

Ask it to extract. Ground the quote. Highlight the evidence. Show the limitation. Let the human decide.

That is the shape of Lease Lens.

Demo: https://youtu.be/M-v3OAKO5-k

Live Space: https://huggingface.co/spaces/build-small-hackathon/lease-lens

GitHub: https://github.com/bO-05/lease-lens

Model: https://huggingface.co/giladam01/lease-lens-legal-3b

Field notes: https://huggingface.co/blog/giladam01/lease-lens-article
