Turn any company website into structured B2B data (one API call)


[ARTICLE · art-25782] src=dev.to ↗ pub=2026-06-13T00:08Z topic=ai-products verified=true sentiment=↑ positive

A developer built an API that turns any company website into structured B2B data in a single call. The API reads live site content, never guesses missing fields, and returns clean JSON with company name, sector, description, social links, contact email, and tech stack. It uses a two-pass tech detection system and strict schema validation to ensure reliability.

read 2 min · views 24 · published Jun 13, 2026

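If you've ever needed to go from a company's website to clean, structured data (its name, sector, a short description, social links, a contact email, and the technologies it runs on), you know the options aren't great:

Build your own scraper. Brittle, and every site is different. You'll spend more time maintaining selectors than using the data.

Pay a heavyweight data provider. Expensive, and the data is often a stale snapshot from months ago.

Paste HTML into an LLM and pray. Sometimes you get valid JSON. Sometimes you get a hallucinated CEO email that doesn't exist.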

I kept hitting this wall while working with lists of company domains, so I built a small API that does one thing well: send a company URL, get back clean JSON.

The two rules that shaped it

1. It reads the live site at request time. Not a database snapshot from last quarter. If a company rebranded yesterday, you get today's version.

2. It never guesses. This was the hardest constraint to enforce with an LLM in the pipeline. Missing fields come back as null, never invented. If there's no contact email on the site, you get "email": null, not a plausible-looking fake you'd import straight into your CRM.
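One way to enforce a "never guess" rule is a grounding check: accept a model-extracted value only if it literally appears in the fetched page. The sketch below is an assumption about how such a guard could look, not the author's implementation; `grounded_email` and `EMAIL_RE` are hypothetical names.

```python
import re
from typing import Optional

# Deliberately simple email shape check; illustrative, not RFC-complete.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def grounded_email(candidate: Optional[str], page_text: str) -> Optional[str]:
    """Return the candidate only if it is well-formed AND present in the page.

    Anything the model produced that cannot be found verbatim
    (case-insensitively) in the source text is dropped to None.
    """
    if not candidate:
        return None
    candidate = candidate.strip()
    if not EMAIL_RE.fullmatch(candidate):
        return None
    if candidate.lower() not in page_text.lower():
        return None
    return candidate
```

With this guard, a plausible but invented address such as `ceo@acme.com` becomes `None` when the page only lists `hello@acme.com`.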

What a call looks like
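A minimal sketch of such a call in Python, using only the standard library. The host, path, and `url` query parameter are hypothetical placeholders, and the GET method is an assumption; the real values are on the API's RapidAPI page. The `X-RapidAPI-Key` and `X-RapidAPI-Host` headers are the standard way RapidAPI authenticates requests.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical values: replace with the host and path shown on RapidAPI.
API_HOST = "example-enrichment.p.rapidapi.com"
API_PATH = "/enrich"


def build_request(company_url: str, api_key: str) -> urllib.request.Request:
    """Build the HTTP request without sending it."""
    query = urllib.parse.urlencode({"url": company_url})  # assumed parameter name
    return urllib.request.Request(
        f"https://{API_HOST}{API_PATH}?{query}",
        headers={"X-RapidAPI-Key": api_key, "X-RapidAPI-Host": API_HOST},
        method="GET",  # assumption; the real endpoint may expect POST
    )


def call_api(company_url: str, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body."""
    request = build_request(company_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```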

And the response:
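As an illustration only, a response carrying the fields this article describes might be consumed as below. The field names and every value in `SAMPLE_RESPONSE` are assumptions made up for the example, not real API output; `to_crm_row` is a hypothetical helper.

```python
import json

# Hypothetical payload: names and values are invented for illustration.
SAMPLE_RESPONSE = """
{
  "name": "Example Co",
  "sector": "E-commerce",
  "description": "Online store for example goods.",
  "social_links": {"linkedin": "https://www.linkedin.com/company/example"},
  "email": null,
  "tech_stack": ["Shopify"],
  "cached": false
}
"""


def to_crm_row(payload: dict) -> dict:
    """Keep only fields that were actually found on the site.

    Null fields are skipped rather than filled in, and the `cached`
    flag is dropped because it describes the lookup, not the company.
    """
    return {
        key: value
        for key, value in payload.items()
        if value is not None and key != "cached"
    }


row = to_crm_row(json.loads(SAMPLE_RESPONSE))
```

Here `row` has no `email` key at all, so a downstream import cannot mistake a missing address for a real one.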

How it works under the hood

A few design decisions, for the curious:

Two-pass tech detection. A fast pattern-matching pass first (think Wappalyzer-style fingerprints), then an LLM enrichment pass only for what patterns can't catch. Cheaper and faster than going full-LLM on everything.
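The two passes could be sketched roughly as follows. The fingerprints are a few illustrative patterns, not Wappalyzer's actual rule set, and `llm_pass` is a hypothetical callable standing in for the model call, which is not shown in the article.

```python
import re
from typing import Callable

# Illustrative fingerprints only; a real list would be far longer.
FINGERPRINTS = {
    "Shopify": re.compile(r"cdn\.shopify\.com", re.IGNORECASE),
    "WordPress": re.compile(r"/wp-content/", re.IGNORECASE),
    "Google Analytics": re.compile(r"googletagmanager\.com/gtag/js", re.IGNORECASE),
}

# Signature of the second pass: (html, already_detected) -> additional names.
LlmPass = Callable[[str, set[str]], set[str]]


def detect_by_pattern(html: str) -> set[str]:
    """Pass 1: cheap regex fingerprints over the raw HTML."""
    return {name for name, pattern in FINGERPRINTS.items() if pattern.search(html)}


def detect_tech(html: str, llm_pass: LlmPass) -> list[str]:
    """Run pass 1, then let pass 2 add only what the patterns missed."""
    found = detect_by_pattern(html)
    extra = llm_pass(html, found)  # receives `found` so it can skip known tech
    return sorted(found | extra)
```

Passing `lambda html, known: set()` as `llm_pass` gives a pattern-only run, which is handy for checking the fingerprints in isolation.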

Hard content trimming before the LLM. Page content is capped before any model call. This keeps latency and cost predictable instead of exploding on heavy JS-rendered sites.
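A minimal sketch of a hard cap, assuming the trimming works on visible text and uses a character budget. The `MAX_CHARS` value and the regex-based tag stripping are assumptions for illustration; the article does not state the actual limit or method.

```python
import re

MAX_CHARS = 8000  # hypothetical budget; the real cap is not stated


def trim_for_model(html: str, max_chars: int = MAX_CHARS) -> str:
    """Reduce a page to plain text and cut it at a fixed length."""
    # Drop script/style blocks entirely: they are large and rarely useful as prose.
    text = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    # Replace remaining tags with spaces, then collapse whitespace.
    text = re.sub(r"(?s)<[^>]+>", " ", text)
    text = re.sub(r"\s+", " ", text).strip()
    # The hard cap: input size to the model is bounded regardless of page size.
    return text[:max_chars]
```

Because the cap is applied after tag stripping, the budget is spent on readable content rather than markup.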

Caching with a 14-day TTL. Repeat lookups on the same domain return in ~200 ms instead of re-scraping. The cached field in the response tells you which path you hit.
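The caching behaviour can be sketched with an in-memory store. This is an assumption-laden illustration: the article does not say which cache backend is used, and `TTLCache`, `lookup`, and the `fetch` callable are hypothetical names.

```python
import time
from typing import Callable, Optional

TTL_SECONDS = 14 * 24 * 60 * 60  # 14 days, as stated in the article


class TTLCache:
    """Tiny in-memory cache whose entries expire after a fixed TTL."""

    def __init__(self, ttl: float = TTL_SECONDS,
                 clock: Callable[[], float] = time.time) -> None:
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be tested
        self._store: dict[str, tuple[float, dict]] = {}

    def get(self, key: str) -> Optional[dict]:
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            del self._store[key]  # expired: force a fresh scrape
            return None
        return value

    def set(self, key: str, value: dict) -> None:
        self._store[key] = (self._clock(), value)


def lookup(domain: str, cache: TTLCache, fetch: Callable[[str], dict]) -> dict:
    """Serve from cache when possible and report which path was taken."""
    hit = cache.get(domain)
    if hit is not None:
        return {**hit, "cached": True}
    record = fetch(domain)  # slow path: scrape and enrich the live site
    cache.set(domain, record)
    return {**record, "cached": False}
```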

Strict schema validation. Every response is validated against a strict schema (Pydantic v2) before it leaves the API. Either the JSON conforms, or you get a proper error — never half-broken output.
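A sketch of what strict validation with Pydantic v2 can look like. The model and its field names are assumptions based on the fields this article lists; the real schema is not published here. `validate_or_error` and the 502 status code are likewise hypothetical choices.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, ValidationError


class CompanyRecord(BaseModel):
    # strict=True disables type coercion; extra="forbid" rejects unknown keys.
    model_config = ConfigDict(strict=True, extra="forbid")

    name: Optional[str] = None
    sector: Optional[str] = None
    description: Optional[str] = None
    email: Optional[str] = None  # None means "not found on the site"
    social_links: dict[str, str] = {}
    tech_stack: list[str] = []
    cached: bool = False


def validate_or_error(payload: dict) -> tuple[int, dict]:
    """Return (status, body): a conforming record or a clean error."""
    try:
        record = CompanyRecord.model_validate(payload)
    except ValidationError as exc:
        # Never forward a half-valid payload; report the failure instead.
        return 502, {"error": "schema_validation_failed",
                     "issues": exc.error_count()}
    return 200, record.model_dump()
```

A payload where `tech_stack` is a string rather than a list, or one that carries an unexpected key, is rejected as a whole instead of being passed through partially.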

Use cases I built it for

Lead enrichment: turn a list of prospect domains into CRM-ready records.

Tech-based targeting: filter prospects by their stack ("show me companies running Shopify").

Data hygiene: verify and refresh company records against the live web instead of stale databases.

Try it

There's a free tier (100 requests/month), enough to test it against your own data:

👉 AI Live Company Enrichment & Tech Detector on RapidAPI

I'd genuinely love feedback from other builders on the positioning, the pricing, and especially: what field would you want it to extract next? Drop a comment below.


Read original on dev.to → dev.to/cdcsaas/turn-any-company-website-into-str…

