Turn any company website into structured B2B data (one API call)

A developer built an API that turns any company website into structured B2B data in a single call. The API reads live site content, never guesses missing fields, and returns clean JSON with company name, sector, description, social links, contact email, and tech stack. It uses a two-pass tech detection system and strict schema validation to ensure reliability.

If you've ever needed to go from a company's website to clean, structured data (its name, sector, a short description, social links, a contact email, and the technologies it runs on), you know the options aren't great:

- Build your own scraper. Brittle, and every site is different. You'll spend more time maintaining selectors than using the data.
- Pay a heavyweight data provider. Expensive, and the data is often a stale snapshot from months ago.
- Paste HTML into an LLM and pray. Sometimes you get valid JSON. Sometimes you get a hallucinated CEO email that doesn't exist.

I kept hitting this wall while working with lists of company domains, so I built a small API that does one thing well: send a company URL, get back clean JSON.

The two rules that shaped it

1. It reads the live site at request time. Not a database snapshot from last quarter. If a company rebranded yesterday, you get today's version.
2. It never guesses. This was the hardest constraint to enforce with an LLM in the pipeline. Missing fields come back as null, never invented. If there's no contact email on the site, you get "email": null, not a plausible-looking fake you'd import straight into your CRM.

What a call looks like

And the response:

How it works under the hood

A few design decisions, for the curious:

- Two-pass tech detection. A fast pattern-matching pass first (think Wappalyzer-style fingerprints), then an LLM enrichment pass only for what patterns can't catch. Cheaper and faster than going full-LLM on everything.
- Hard content trimming before the LLM. Page content is capped before any model call. This keeps latency and cost predictable instead of exploding on heavy JS-rendered sites.
- Caching with a 14-day TTL. Repeat lookups on the same domain return in ~200 ms instead of re-scraping. The cached field in the response tells you which path you hit.
- Strict schema validation. Every response is validated against a strict schema (Pydantic v2) before it leaves the API. Either the JSON conforms, or you get a proper error, never half-broken output.

Use cases I built it for

- Lead enrichment: turn a list of prospect domains into CRM-ready records.
- Tech-based targeting: filter prospects by their stack ("show me companies running Shopify").
- Data hygiene: verify and refresh company records against the live web instead of stale databases.

Try it

There's a free tier (100 requests/month), enough to test it against your own data:

👉 AI Live Company Enrichment & Tech Detector on RapidAPI https://rapidapi.com/coinduciel143/api/ai-live-company-enrichment-tech-detector

I'd genuinely love feedback from other builders on the positioning, the pricing, and especially: what field would you want it to extract next? Drop a comment below.
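
For readers who want to see how the decisions listed under "How it works under the hood" could fit together, here is a minimal Python sketch. It is not the API's actual code. The fingerprint table, the character cap, the email regex, the `llm_pass` callback, and every function name are hypothetical, chosen only for illustration. The 14-day TTL, the `cached` flag, and the null-instead-of-guess rule come from the post.

```python
import re
import time
from typing import Callable, Optional

# Everything here is an illustrative assumption, not the API's real internals.
MAX_CHARS = 20_000                    # hypothetical cap applied before any model call
CACHE_TTL_SECONDS = 14 * 24 * 3600    # 14-day TTL, as described in the post

# Pass 1: a tiny, hypothetical fingerprint table (real tables are far larger).
FINGERPRINTS = {
    "Shopify": re.compile(r"cdn\.shopify\.com", re.I),
    "WordPress": re.compile(r"/wp-content/", re.I),
    "Google Analytics": re.compile(r"google-analytics\.com|googletagmanager\.com", re.I),
}
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# domain -> (timestamp of the lookup, record built at that time)
_cache: dict[str, tuple[float, dict]] = {}


def trim(content: str, limit: int = MAX_CHARS) -> str:
    """Hard cap on page content so later steps have a bounded input size."""
    return content[:limit]


def detect_by_pattern(html: str) -> list[str]:
    """Cheap first pass: report every technology whose fingerprint matches."""
    return sorted(name for name, rx in FINGERPRINTS.items() if rx.search(html))


def enrich(
    domain: str,
    html: str,
    llm_pass: Optional[Callable[[str, list[str]], list[str]]] = None,
    now: Callable[[], float] = time.time,
) -> dict:
    """Build a record for `domain`, reusing a cached one while it is fresh."""
    hit = _cache.get(domain)
    if hit and now() - hit[0] < CACHE_TTL_SECONDS:
        return {**hit[1], "cached": True}

    page = trim(html)
    tech = detect_by_pattern(page)
    if llm_pass is not None:
        # Pass 2 (stand-in for the LLM step): may add what patterns missed.
        tech = sorted(set(tech) | set(llm_pass(page, tech)))

    emails = EMAIL_RE.findall(page)
    record = {
        "domain": domain,
        "tech_stack": tech,
        # Nothing found on the page means None (null in JSON), never a guess.
        "email": emails[0] if emails else None,
    }
    _cache[domain] = (now(), record)
    return {**record, "cached": False}
```

Calling `enrich("example.com", html)` twice in a row should take the slow path the first time and the cache path the second, which is the distinction the `cached` field is meant to expose. A real pipeline would also validate the record against a schema before returning it, which this sketch leaves out.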