How to stop holding AI agents back

Developers of agentic AI have been making some big claims. The promise is autonomous systems that can do everything from booking our flights and monitoring competitors in real time to handling entire procurement cycles, all without needing an actual human to hit “confirm.” And while the technology needed to achieve most of these marvels largely exists already, the infrastructure necessary to make it work reliably at scale still leaves much to be desired.

Gartner recently projected that over 40% of agentic AI projects will be canceled before the end of 2027, citing escalating costs, unclear business value, and inadequate risk controls. That’s pretty striking, especially given the expectation that autonomous agents would finally herald AI’s coming-of-age. And yet, this should not really surprise anyone who has seen the limitations these agents exhibit in the real world. Most people assume the underlying issue is the quality of the models themselves. Although this might seem plausible, it is a little off the mark.

Why the Web Resists Agents #

Consider what a capable agent actually needs. Accessing a website and getting a response is just the start; the agent then has to translate that response into something usable. Not only that, it has to do so consistently, in real time, and at a scale that makes the whole exercise worthwhile to begin with.

Given the web’s current shape, this is a daunting task. Just take online platforms as an example. There is no technical reason why an independent agent could not compare different platforms and make the choice that best suits users’ preferences.

However, those same platforms currently depend on that information not being readily available. To maintain their advantage, they rely on increasingly personalized results, sponsored placements, and urgency cues to shape user behavior and tip the scales in their favor. Without access to pertinent data, no AI agent will be able to complete tasks on the web or automate selecting the best option for its users.

The result is a web that works reasonably well for general browsing but systematically discourages automated access. I will give a sneak peek into some findings that illustrate this clearly.

Oxylabs is about to release the Web Openness Index, which scores over 120 countries based on various aspects of web accessibility. The findings show:

- The global average score for practical reachability (essentially, how well a site responds to standard automated HTTP requests) stands at 83.4 out of 100.
- The score for anti-automation friction, such as CAPTCHAs, rate limiting, fingerprinting, and bot detection, is 62.8 on average (the lower the score, the more friction there is).
- And structured data interoperability (whether sites return data in formats that machines can actually work with) drops even further, to 60.3.

Those 20-plus-point differences reflect a structural gap. Sites generally respond to requests for automated access. At the same time, restrictions abound, and data is often returned in machine-unfriendly ways. Agents that depend on reliable, timely, structured information will often fall into that gap.
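
The size of that gap follows directly from the three averages quoted above. Here is a minimal Python sketch of the arithmetic; the dictionary keys and the function name are illustrative labels chosen for this example, not names used by the index itself.

```python
# Global average scores as quoted in the text (each out of 100).
# The key names are illustrative labels, not official metric identifiers.
SCORES = {
    "practical_reachability": 83.4,
    "anti_automation_friction": 62.8,
    "structured_data_interoperability": 60.3,
}


def gaps_from_reachability(scores: dict[str, float]) -> dict[str, float]:
    """Return how far each other score falls below practical reachability."""
    baseline = scores["practical_reachability"]
    return {
        name: round(baseline - value, 1)
        for name, value in scores.items()
        if name != "practical_reachability"
    }


if __name__ == "__main__":
    for name, gap in gaps_from_reachability(SCORES).items():
        print(f"{name}: {gap} points below reachability")
```

Both differences come out above 20 points (20.6 and 23.1), which is the gap the paragraph above refers to.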

Data-Starved AI #

Within organizations, agents face a different but related problem: a lack of usable data. In other words, the relevant data exists but has not been cleaned, tagged, or structured in a way that an AI system can understand.

The same applies to customer-facing applications built on agentic systems. Without real-time web data (current prices, live inventory, policy updates, market movements), they have no choice but to reason based on a frozen version of the world.

Latency is another problem. Put simply, an agent that eventually returns the right answer is far less useful than one that returns it fast enough to act on. When dealing with autonomous systems, the tolerance for delay is even lower.

In each case, the constraint is the same: agents need context they can trust, and they’re not getting it, neither from their own organizational data nor from the web.
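
One way to picture this constraint is a gate that an agent applies before acting on a piece of data. The sketch below is illustrative only: the `Observation` type, the five-minute freshness window, and the two-second latency budget are assumptions made for this example, not values taken from any particular system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds; real limits depend on the use case.
MAX_AGE = timedelta(minutes=5)   # how stale the data may be
MAX_FETCH_SECONDS = 2.0          # how long retrieval may take


@dataclass(frozen=True)
class Observation:
    """A single data point an agent might act on (illustrative type)."""
    value: float
    observed_at: datetime   # when the source data was captured
    fetch_seconds: float    # how long it took to retrieve


def is_actionable(obs: Observation, now: datetime) -> bool:
    """Accept data only if it is both fresh and retrieved fast enough."""
    fresh = now - obs.observed_at <= MAX_AGE
    fast = obs.fetch_seconds <= MAX_FETCH_SECONDS
    return fresh and fast


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    live_price = Observation(19.99, now - timedelta(minutes=1), 0.4)
    stale_price = Observation(19.99, now - timedelta(hours=1), 0.4)
    print(is_actionable(live_price, now))   # fresh and fast
    print(is_actionable(stale_price, now))  # too old to act on
```

A correct but hour-old price fails the gate just as a slow response does, which is the sense in which stale or late data leaves an agent reasoning about a frozen world.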

Solving a Problem That’s Been Solved Before #

It is easy to forget, but this is not the first time the sheer volume of information has eclipsed our capacity to process it. The early web is particularly instructive here. It already held a vast amount of knowledge, but that knowledge was of little use in its raw state. What made the difference back then was infrastructure built for scale: web crawlers were deployed to index pages, scrapers were used to compare prices online, and monitoring systems were put in place to track fraudulent ads and brand impersonation across thousands of domains. All of these innovations require the ability to collect public web data reliably and at scale.

A more recent example comes from our pro bono Project 4β partners Debunk.org. This non-profit, fighting online disinformation and fraud, conducted an investigation that uncovered a large-scale, multilingual scam operation targeting former fraud victims. The investigation identified over 50,000 ads, 459 domains, and more than 1,100 related web pages, with an estimated reach of 52 million people across Europe. That kind of coverage requires systematic, automated data collection at scale.

Agentic AI needs an infrastructure of the same kind, except with even higher demands, because agents do more with data than any previous application. They need information that is structured, current, complete, and returned fast enough to support real-time action.

The Three Cs of Reliable Agent Infrastructure #

As noted above, all of this is unlikely to happen organically. For platforms, opening up to frictionless automated access means ceding control over discovery, ranking, and customer relationships. While this is beneficial for the consumer and invites reshaping business models accordingly, it is also a threat to short-term revenue.

The infrastructure that makes agentic systems work reliably has to be built independently. Three requirements, or three Cs, stand out:

Consistency: agents that encounter unreliable data sources produce unreliable behavior, and unreliable behavior is the fastest route to project cancellations.

Currency: real-time access to prices, inventory, availability, and policy is what separates an agent reasoning from current facts from one reasoning from stale assumptions. In most commercial contexts, the latter creates more problems than it solves.

Compliance: access built outside fair standards tends to provoke countermeasures that raise barriers for all automated systems, so any infrastructure worth building has to be sustainable, not just technically but in practice.
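
The article does not name a specific standard for the compliance requirement. As one commonly cited convention, and purely as an assumption for illustration, the sketch below checks a site's robots.txt rules before fetching a URL, using Python's standard `urllib.robotparser` module. The rules, the agent name, and the URLs are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content for illustration; a real agent would
# download this file from the site it intends to access.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Allow: /
"""

# Hypothetical agent name used only in this example.
AGENT_NAME = "example-agent"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a reusable rule checker."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, url: str) -> bool:
    """Return True if the parsed rules allow AGENT_NAME to fetch the URL."""
    return parser.can_fetch(AGENT_NAME, url)


if __name__ == "__main__":
    rules = build_parser(EXAMPLE_ROBOTS_TXT)
    print(may_fetch(rules, "https://example.com/products/42"))
    print(may_fetch(rules, "https://example.com/checkout/cart"))
```

A check like this covers only one narrow aspect of fair access; rate limits, terms of service, and applicable law are outside what the sketch shows.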

The web was not designed for agents. Within organizations, the context agents need is often not easily accessible to them or even readily available. These are data quality problems that can be solved and infrastructure problems that we are actively solving. Finally, what we as a society truly need is to decide if we are ready to welcome AI agents or if we want to keep holding them back.

TNW newsroom and editorial staff were not involved in the creation of this content.

Source: thenextweb.com (original article)
