{"slug": "building-a-free-osha-compliance-tool-8-weeks-solo", "title": "Building a Free OSHA Compliance Tool — 8 Weeks Solo", "summary": "A solo developer built SafetyVision, an open-source PPE compliance monitor that performs core workplace-safety detection for free, over eight weeks. The tool uses a fine-tuned YOLOv8 model to analyze camera feeds for violations like missing hard hats or safety vests, generating OSHA-grounded incident reports and compliance forecasts without requiring paid infrastructure. SafetyVision runs on three free-tier surfaces—a web app, a Gradio demo, and a serverless API—and achieves a 0.763 mAP@50 while transparently documenting its failure modes.", "body_md": "Commercial workplace-safety software — Protex AI, Intenseye, and the rest — runs $500 to $2,000 a month. It watches camera feeds for PPE violations: a worker without a hard hat, a missing high-vis vest, no fall harness at height. The technology isn't exotic anymore. The price tag is.\n\nSo over eight weeks, solo, I built **SafetyVision** — an open-source PPE compliance monitor that does the core job for free and runs on $0 of infrastructure. Not a toy: a fine-tuned detection model, explainable predictions, OSHA-grounded incident reports, compliance forecasting, a documented API and SDK, and a one-command self-host. Three live surfaces, all free-tier.\n\nThis is the story of the decisions that mattered — including the ones that didn't go to plan.\n\nUpload a worksite photo. SafetyVision finds each worker, flags missing PPE in red ranked by risk, shows you *why* it flagged it (a GradCAM heatmap and SHAP attribution), writes an incident report citing the actual OSHA regulation, exports an audit-ready PDF, and forecasts the site's 7-day compliance trend. 
Every inspection is saved to your history.\n\nIt runs three ways: a Next.js web app on Vercel (the product), a no-signup Gradio demo on Hugging Face Spaces (the open-source try-it), and a serverless REST API on AWS Lambda (for developers). Same core powers all three.\n\nThe compromises in this project are about *scale* — free tiers, a small model, a modest training set — never about *sophistication*. Here's where the sophistication went.\n\nDetection is a fine-tuned YOLOv8, exported to ONNX so it runs on a plain CPU — no GPU required for end users. Version 1 was YOLOv8**n** (nano), trained on ~58k images, landing at **0.701 mAP@50**. Decent, but it had a clear weakness: it was biased toward frontal poses and missed workers seen from the side, the back, or partially occluded.\n\nFor v2 I went bigger — YOLOv8**s** (small), 80k+ images, and an aggressive Albumentations augmentation pipeline (random occlusion, brightness/contrast jitter, motion blur, mosaic) specifically to fight that frontal bias. The target was **mAP@50 ≥ 0.78**.\n\nIt landed at **0.763**.\n\nI could have buried that. Instead it's in the README, the model card, and the demo's closing line. Here's why: a recruiter or a safety officer evaluating this doesn't trust a project with no failure modes — they trust one that knows exactly where it's weak. v2 is a real improvement (Fall-Detected hits 0.956, hard hats 0.936), and the per-class breakdown shows precisely which classes still struggle (NO-Safety-Vest at 0.382). An honest 0.763 with a documented gap is worth more than a suspicious 0.78.\n\nThat became the project's organizing principle: **the demo is curated, the model card is honest.** The demo shows the best-case path because that's what every product demo does; the model card lists every failure mode because that's what every responsible model card does. Both exist on purpose.\n\nEvery detection ships with *both* a GradCAM heatmap and SHAP attribution. 
That's deliberate redundancy, and it's the feature I'm most attached to.\n\nGradCAM answers \"where did the model look?\" — it paints a heatmap over the image so you can see it attended to the head region when it flagged a missing hard hat. It's spatial and immediately intuitive; a safety officer with no ML background gets it in two seconds.\n\nSHAP answers a different question: \"which pixels actually moved the prediction?\" — per-pixel attribution that a technical reviewer can interrogate. It's slower to compute (the heaviest step in the pipeline) and harder to read, but it's the one that holds up under scrutiny.\n\nA black-box safety tool is a non-starter — if the system flags a worker, someone needs to be able to ask *why*. Shipping both means the answer satisfies the floor manager and the auditor.\n\nA generic \"this worker is missing a hard hat\" message isn't useful. A citation of **29 CFR 1910.135(a)(1)** is. So the incident report is generated by a multimodal Gemini Flash model that receives three things: the annotated image (so it sees what the camera sees), the structured violation data, and the relevant OSHA regulation text — retrieved by a RAG pipeline (Qdrant vector store + BGE embeddings) over the actual 29 CFR 1910 and 1926 standards.\n\nDoes the RAG grounding actually help, or is it theater? I A/B tested it. With RAG vs. without, judged on report quality: **RAG wins, Cohen's d = 0.65, p = 0.0197** (paired t-test, N=16). Small sample, but a real and significant effect. I ran a second A/B on the detection confidence threshold (0.40 vs 0.55): **0.40 wins, McNemar p = 4×10⁻⁵** on 200 held-out images. Decisions backed by numbers, not vibes.\n\nI planned to train on GCP with the $300 free credit. Every GPU VM request bounced — across dozens of zones and machine types. The error messages pointed at regional quotas that *looked* fine. 
The real culprit took systematic testing to find: a global `GPUS_ALL_REGIONS` umbrella quota that defaults to **0** on new paid accounts and silently overrides every regional quota. You can have regional GPU quota of 1 and still be blocked because the global cap is 0.\n\nFor v1 I pivoted to Kaggle's free 2×T4 notebooks and trained around the 12-hour session cap with checkpoint-resume. For v2, after the account aged and an explicit quota request cleared the global cap, I trained on a single GCP L4 — then wound the whole GCP footprint down to $0 once the weights were on Hugging Face. Documented the entire diagnosis as an architecture decision record, because the next person hitting that wall deserves better than the error message I got.\n\nFor the API, I chose a **Lambda Function URL** over API Gateway. The reasoning: Function URLs are free *forever*, while API Gateway's free tier expires after 12 months — and for a single `/analyze` endpoint, I didn't need API Gateway's usage plans or request transformations. API-key auth and rate-limiting live at the handler level instead. It's the kind of trade you make explicit so the alternative is on record, not the kind you default into.\n\nLambda Function URLs cap payloads at 6MB. I built the frontend to that limit, and 5MB images started returning 413s. The cause: **base64 inflation.** A 6MB on-the-wire cap is really ~4MB of raw image once you account for the ~33% base64 overhead in the JSON envelope. And it bites the *response* too — my annotated image, GradCAM, and SHAP visuals were going out as PNG and blowing the ceiling. Fix: JPEG q85 instead of PNG, cap input resolution at 1280px, and set the real frontend limit to 4MB. 
The kind of constraint that's invisible until production traffic finds it.\n\nThe hard constraint was zero ongoing cost, and every runtime service honors it: AWS Lambda/S3/DynamoDB/ECR (always-free, no 12-month cliff), Supabase for Postgres + auth, Vercel for the frontend, Hugging Face for hosting and weights, Qdrant Cloud for vectors, Google AI Studio for the LLM. Cost per analysis: $0.\n\nThat's not a limitation to apologize for — for a small factory that can't justify $2,000/month, the free version *is* the product-relevant version. And the architecture is built so the expensive upgrades (a bigger model on a GPU endpoint, a frontier LLM, multi-seed evals) are config flags away, not rewrites. I built the cheap version of an upgrade-ready system.\n\nNone of these are blind spots — each was a conscious trade against \"ship the rigorous free version.\"\n\nEight weeks, solo, $0: a fine-tuned and ONNX-exported detector, dual explainability, RAG-grounded multimodal reporting, time-series forecasting with a baseline, statistically-validated A/B tests, a three-surface deployment (Next.js + Vercel, Gradio + HF Spaces, serverless AWS via Terraform), a published PyPI SDK with a CLI, and honest metrics throughout.\n\nThe point was never to out-spend the incumbents. 
It was to show that the capability is no longer the moat — and to build the free version well enough that someone would actually use it.\n\n*SafetyVision is an AI-assisted pre-screening tool to support human safety officers — not a replacement for human judgment.*", "url": "https://wpnews.pro/news/building-a-free-osha-compliance-tool-8-weeks-solo", "canonical_source": "https://dev.to/ayushgupta07xx/building-a-free-osha-compliance-tool-8-weeks-solo-325p", "published_at": "2026-05-30 10:13:05+00:00", "updated_at": "2026-05-30 10:41:28.089794+00:00", "lang": "en", "topics": ["computer-vision", "ai-products", "ai-tools", "ai-startups"], "entities": ["SafetyVision", "Protex AI", "Intenseye", "OSHA", "Vercel", "Hugging Face", "AWS Lambda", "Gradio"], "alternates": {"html": "https://wpnews.pro/news/building-a-free-osha-compliance-tool-8-weeks-solo", "markdown": "https://wpnews.pro/news/building-a-free-osha-compliance-tool-8-weeks-solo.md", "text": "https://wpnews.pro/news/building-a-free-osha-compliance-tool-8-weeks-solo.txt", "jsonld": "https://wpnews.pro/news/building-a-free-osha-compliance-tool-8-weeks-solo.jsonld"}}