When scraping orchestration is the wrong abstraction for LLM workflows

A developer argues that full scraping orchestration platforms introduce unnecessary complexity for most LLM workflows, where the real need is a simple typed extraction interface. The post advocates for wrapping scraping providers behind a lightweight adapter that returns either structured data or a typed error, keeping polling and job management hidden from the application. This approach, exemplified by the tool Wire, avoids the overhead of actor lifecycles, scheduling, and dataset retrieval that platforms like Apify are built around.

A lot of LLM workflows start with the same small problem: the model needs fresh data from a web page. Then the integration grows sideways. You add a scraper, a queue, a dataset store, polling logic, retries, and a parser. By the end, the code that moves data around is larger than the code that uses the data.

This is not because scraping platforms are bad. It is because they solve a broader problem than many LLM apps actually have.

Platforms like Apify are built around actors: reusable scraping or automation jobs with inputs, runs, logs, datasets, scheduling, and platform-managed execution. That model makes sense when you run recurring jobs across many targets, chain multiple scraping tasks, or need shared actors across a team.

For example, a batch pipeline might look like this:

    schedule -> run actor -> wait for completion -> read dataset -> normalize rows -> store results -> trigger downstream job

That is useful if you are refreshing competitor pricing every night or maintaining a long-lived dataset.

An LLM tool call usually looks different:

    prompt -> fetch one page -> extract fields -> pass JSON back to the model

If you use a full actor lifecycle for that second case, you pay for concepts you may not need: actor discovery, input schemas, run state, dataset retrieval, and actor-specific output formats.

The failure modes also spread out.
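To make that lifecycle concrete for a single page, here is a minimal sketch. It uses a hypothetical in-memory stand-in for an actor platform: `FakePlatform`, `startRun`, `getRun`, `readDataset`, and `fetchOnePage` are names invented for this illustration and are not Apify's API. It only shows which steps the caller ends up owning, and how a run's status and its data can disagree.

```typescript
// Hypothetical in-memory stand-in for an actor platform.
// None of these names come from a real SDK; they only mirror the lifecycle steps.
type RunStatus = "RUNNING" | "SUCCEEDED" | "FAILED";
type Row = Record<string, unknown>;

interface Run {
  id: string;
  status: RunStatus;
  datasetId: string;
}

class FakePlatform {
  private runs = new Map<string, Run>();
  private datasets = new Map<string, Row[]>();
  private polls = new Map<string, number>();
  private rowsFor: (url: string) => Row[];

  // rowsFor decides what the pretend scraper finds for a URL (assumed behaviour).
  constructor(rowsFor: (url: string) => Row[]) {
    this.rowsFor = rowsFor;
  }

  startRun(input: { url: string }): Run {
    const id = `run-${this.runs.size + 1}`;
    const run: Run = { id, status: "RUNNING", datasetId: `ds-${id}` };
    this.runs.set(id, run);
    this.datasets.set(run.datasetId, this.rowsFor(input.url));
    this.polls.set(id, 0);
    return { ...run };
  }

  // Reports RUNNING for the first two polls, then SUCCEEDED.
  getRun(id: string): Run {
    const run = this.runs.get(id);
    if (!run) throw new Error(`unknown run ${id}`);
    const seen = (this.polls.get(id) ?? 0) + 1;
    this.polls.set(id, seen);
    if (seen > 2) run.status = "SUCCEEDED";
    return { ...run };
  }

  readDataset(datasetId: string): Row[] {
    return this.datasets.get(datasetId) ?? [];
  }
}

// The lifecycle a single tool call has to drive: start, poll, read.
function fetchOnePage(platform: FakePlatform, url: string, maxPolls = 10) {
  let run = platform.startRun({ url });
  let polls = 0;
  while (run.status === "RUNNING" && polls < maxPolls) {
    run = platform.getRun(run.id);
    polls += 1;
  }
  const rows = run.status === "SUCCEEDED" ? platform.readDataset(run.datasetId) : [];
  return { status: run.status, polls, rows };
}

const platform = new FakePlatform((url) =>
  url.includes("empty") ? [] : [{ title: "Example", price: 19.99 }]
);

const good = fetchOnePage(platform, "https://example.com/product");
const empty = fetchOnePage(platform, "https://example.com/empty");
console.log(good.status, good.rows.length);
console.log(empty.status, empty.rows.length);
```

In this sketch both calls end with the same run status, but only the first one has rows, so a caller that checks status alone cannot tell them apart.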
A run can succeed while the dataset is empty. A page can render differently and produce partial data. A parser can return HTML where your downstream tool expects JSON.

That is where the abstraction matters more than the vendor.

For most agentic workflows, the cleanest internal interface is not “run scraper X.” It is “given this target and extraction intent, return typed data or a typed error.”

Something like this:

type ExtractRequest = { url: string; schema: Record