The CDP Is the AI

wpnews.pro

I was at a Bloomreach breakfast recently, a room full of fintech marketers from brands large and small, all there to talk about "AI". We didn't, really. Or rather, we talked about it the way you talk about a holiday you can't afford yet. The recurring theme, from everyone in the room regardless of company size, was getting their data into a usable shape in the first place. Nobody was stuck on which model to use or which clever feature to switch on. They were stuck on the fact that their data is a mess, spread across half a dozen systems that don't talk to each other, and until that's fixed none of the clever stuff means anything.

That's the whole piece, really, and I could stop there. But the gap that room kept bumping into is bigger and more structural than "we need to sort our data out", and it's widening.

At one end of the market there's a small number of consumer platforms running production machine learning for messaging that is several capability generations ahead of anything a brand can buy. At the other end there's the long tail of mid-market and SMB brands whose access to ML in their lifecycle stack runs from "a few predictive features in our ESP" to "nothing we've actually turned on". In between, a newer category of product is trying to sell brands something much closer to the frontier. The catch, and it's the catch the breakfast table kept circling, is that all of it depends on having your data in order, and most brands don't.

The thing the trade press calls "channel maturity" is more accurately described as sorting by AI sophistication. Some senders can afford to make a channel work. Most can't. And increasingly, the deciding factor isn't budget or headcount. It's whether your data is in a state that lets you do anything at all.

The frontier #

A handful of consumer technology companies publish peer-reviewed work on production notification and messaging systems. These are not white papers or vendor case studies. These are KDD, RecSys, WSDM and CIKM submissions, written by PhD-staffed teams who have to put their methodology in front of academic reviewers and answer for it. The papers aren't exhaustive descriptions of what the companies do internally (they never are), but they're a useful floor on capability. The internal systems are at least as good as what gets published, usually better, and the act of publishing signals that the team has the organisational backing to do this work seriously.

A quick tour of the visible frontier:

Pinterest set a weekly notification budget per user, optimising against long-term site engagement rather than click-through, on the finding that the incremental value of a notification is highest for casual users; the heavy openers have high click-through because they engage with everything, not because the notification moved them.1
Duolingo used a bandit algorithm to pick which reminder template to send each user, and reported a 0.5% lift in daily active users and a 2% lift in new-user retention over a strong baseline.2
Twitter used model-based reinforcement learning to decide whether to send a push at all, modelling the effect over a multi-day horizon. The published trade-off is the interesting part: the settings that cut volume hardest pushed open rate up by as much as 14%, but those same settings reduced daily active users; only the most conservative setting, an open-rate gain of about 8%, improved daily actives at all, and then by 0.2%. Maximising the headline number and serving the real objective pointed in opposite directions.3
LinkedIn framed notification decisioning as offline reinforcement learning, a Double Deep Q-Network with a conservatism penalty, trained on logged data and deployed: sessions up a quarter of a percent, click-through up a couple of points, notification volume down, all at once.4 By 2026 the same lineage had reached email: BanditLP pairs neural Thompson Sampling with a linear program large enough for billions of variables to choose, under business constraints, what each member is sent.5
Zillow governs email and push volume with a boosted-tree classifier deciding send-or-don't per user, tuned to keep 98% of the clicks while shedding the surplus sends and the unsubscribes they cause. No reinforcement learning required, which is its own lesson: the cheapest method on this list still wins by sending less.6
Meta treated Instagram's notification slots as an auction: the 550-plus internal teams that want to message you bid against each other (with the platform able to subsidise bids) so no single user is flooded by competing product surfaces. In test it sent slightly fewer notifications, lifted click-through and left reach untouched, across 77 million users per arm.7
And the frontier keeps moving. PushGen, deployed at Kuaishou, the Chinese short-video platform, and presented at WSDM in February 2026, generates push copy with an LLM under style controls, then ranks the candidates with a learned reward model that predicts click-through and picks the winner, across hundreds of millions of users a day.8 Pinterest's TransAct points the same way: a transformer reading a user's realtime activity, now feeding ranking across Homefeed, Search and Notifications, with push open rate and email click-through up a point or two each.9
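The send-or-don't gate in the Zillow entry is the easiest item on the tour to picture in code. What follows is a minimal sketch of the thresholding step only, assuming a classifier has already produced a click score per user; the function names, the scores and the click labels are all hypothetical, and nothing here is taken from Zillow's system.

```python
def threshold_keeping_clicks(scores, clicked, keep_fraction=0.98):
    """Return the highest score cutoff that still retains keep_fraction of logged clicks."""
    total_clicks = sum(clicked)
    if total_clicks == 0:
        raise ValueError("no clicks in the logged data")
    # Walk users from highest predicted score to lowest, accumulating clicks.
    ranked = sorted(zip(scores, clicked), key=lambda pair: pair[0], reverse=True)
    kept = 0
    for score, was_click in ranked:
        kept += was_click
        if kept >= keep_fraction * total_clicks:
            return score  # send only to users scoring at or above this value
    return ranked[-1][0]


def share_of_sends_shed(scores, cutoff):
    """Fraction of users who would no longer be messaged under the cutoff."""
    return sum(1 for s in scores if s < cutoff) / len(scores)


# Hypothetical logged data: one predicted score and one click label per user.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
clicked = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

cutoff = threshold_keeping_clicks(scores, clicked)
print("cutoff:", cutoff, "sends shed:", share_of_sends_shed(scores, cutoff))
```

In this toy data every click sits in the top half of the ranking, so the gate keeps all of them while dropping half the sends. Real logs are never that clean, which is why the retained fraction is a tunable target and not a guarantee.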

These systems share four characteristics:

Built on first-party event streams collected by the platform owner at full session-level granularity.
Operated by in-house engineering teams that include researchers, ML engineers and platform engineers in numbers most B2C brands could not staff if they tried.
Tuned against the platform's own long-term value functions, sessions, weekly actives, multi-day retention, sitewide engagement, rather than the open or the click.
Premised on the user's response being something that changes as a function of the messages sent, rather than a fixed signal to be exploited.

That last characteristic is the one almost everything sold to brands gets wrong. A static model asks which users are most likely to open or buy and aims at them. An adaptive one asks a different question: how does sending this, now, change what this user does, against the version of them you left alone? The response isn't a fixed trait you discover and exploit. It's an outcome the message itself moves, up for some people and down for others, which is why the same model has to be willing to send nothing at all. Optimise for who looks likely to engage and you keep messaging the people who were going to engage regardless, while quietly annoying the ones a message pushes the wrong way.

The lift on the real objective, sessions or daily actives, is almost always under a single percent; the bigger-looking gains sit on proxy metrics like open rate. The frontier's advantage isn't enormous lift. It's that a platform with hundreds of millions of users can bank a reliable 0.3% on sessions, which is a vast sum in absolute terms, while a mid-market brand chasing the same 0.3% on a fraction of the base can't justify the engineering to capture it. The gap is one of scale economics, not magnitude.

The gains nearly all come from sending less, or more precisely from sending differently. Volume falls while engagement holds or rises. The comparison is relative, mind: these platforms are trimming a per-user volume that already sits well above what most brands send, so "cutting volume" starts from a regime the long tail was never in. And even within that regime, "send less" is too blunt a reading, because these systems aren't cutting uniformly; they're reallocating, more to some users, none to others. Twitter's best setting actually raised the per-user send ceiling even as total sends dropped, because the policy got more selective about who was worth the headroom. The skill is knowing whom not to message, which is harder to learn than whom to message and needs exactly the per-user signal the long tail lacks. Uplift studies make the same point: the cumulative incremental effect peaks well before you've reached the whole list, usually far earlier than intuition suggests, and past that peak more targeting reduces it, because you start reaching people the message turns off rather than wins over. Send to the whole list and you are well into zero-or-negative territory.
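The shape of that uplift curve is easier to see with numbers. This is a minimal sketch on invented data: the list is split into ten equal buckets ranked by a hypothetical uplift score, and each bucket has a messaged arm and a held-out arm of the same size. None of the figures come from a published study.

```python
# Hypothetical conversions per 1,000 users in each bucket, best-ranked bucket first.
treated_conversions = [80, 65, 56, 52, 50, 49, 48, 47, 45, 42]  # messaged arm
control_conversions = [50, 50, 50, 50, 50, 50, 50, 50, 50, 50]  # held-out arm


def cumulative_incremental(treated, control):
    """Running total of extra conversions as targeting extends down the ranking."""
    running = 0
    curve = []
    for t, c in zip(treated, control):
        running += t - c  # negative once the message starts costing conversions
        curve.append(running)
    return curve


curve = cumulative_incremental(treated_conversions, control_conversions)
best_depth = curve.index(max(curve)) + 1  # number of buckets worth messaging

print("cumulative incremental conversions:", curve)
print("peak after", best_depth, "of", len(curve), "buckets")
print("whole-list total:", curve[-1])
```

In this invented data the curve tops out at four buckets in ten, and every bucket added after the fifth subtracts from the total. The whole-list send still shows a positive number, which is exactly why the damage is easy to miss without a holdout.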

What brands can actually buy: support versus decisioning #

The distinction most often collapsed here is between machine learning that supports a marketer's decision and machine learning that makes it.

Most of the ML that lands in a brand's lifecycle stack is decision support. A smaller, newer category is decision making. Conflating the two is how you end up badly overestimating what the average brand is actually running.

Decision support is what the big customer engagement platforms ship:

Send-time and best-channel optimisation that pick the hour and the route for a message you already decided to send.
Predictive scores for churn, conversion, purchase propensity and lifetime value that you can build a segment around.
Predictive RFM and other "AI" segments that group users by recency, frequency, spend or likely next action.
Product recommendations off purchase history.
Subject-line suggestions and generative copy, the specialism Phrasee, now Jacquard, built on well before the LLM wave, generating brand-voice variants and ranking them on past performance.
Frequency capping and send-volume governing.

Most of these do something. Whether they do much is harder to say: whether send-time optimisation actually beats sending at a sensible fixed hour, for instance, rests almost entirely on vendor case studies, while the independent academic work that exists tends to favour fixed, scheduled times and to warn about notification overload.10 But the common thread, lift or no lift, is that the marketer still designs the journey. The ML scores, suggests, ranks and optimises at specific points inside a structure the human built. It assists the decision. It doesn't make it.

This is the bracket almost every vendor sits in:

Salesforce Marketing Cloud's Einstein, with send-time, content selection and predictive scoring, now wrapped into the Agentforce branding.
Adobe CX Enterprise (rebranded from Experience Cloud at Summit 2026), with Sensei, the AI Assistant, predictive audiences, automated send-time, content selection, generative variants, and the new tier of AI Coworkers and purpose-built agents.
Klaviyo's predictive analytics for CLV and churn, plus send-time, subject-line optimisation and generative copy.
Iterable's predictive goals, send-time optimisation, Brand Affinity and channel optimisation.
HubSpot's Breeze across the workflow.
Emarsys, now folded into SAP Engagement Cloud, with its AI Scores for conversion, churn and spend, continuously-updated predictive segments, Predict product recommendations, and a growing pile of generative helpers like natural-language catalogue search and an AI report builder.
Braze's Sage AI for send-time, channel, predictive churn and content suggestions.
The likes of CleverTap, MoEngage and Insider play here too, though they're more niche, strong in mobile-first and in particular regions rather than across the board.

If you read three of these vendors' marketing pages back to back you can't tell them apart. The vocabulary is identical and the screenshots are interchangeable. That isn't a coincidence. There's a finite set of things a multi-tenant ML feature can do well, and the vendors have all converged on roughly the same set: send-time, predictive segments, generative copy, journey-level orchestration with some scoring at the branch points. MessageGears documents the standard kit about as plainly as anyone, send-time and channel optimisation, propensity and churn scoring, predictive RFM, product recommendations. And all of it, across vendors, is built around one shape of data and, more to the point, one kind of feedback. The data is the purchase history, browse and engagement of a transactional retail customer. The feedback is short-term engagement, the open and the click, because that is the only label that means the same thing across thousands of unrelated tenants. That's who and what the median model fits. Point it at a niche company with a long, irregular, high-consideration sales cycle and two things go wrong at once: there's less for it to learn from, and the only signal it can optimise, the click, pulls it toward winning the open at the expense of the sale. It underperforms less because the brand's data is exotic than because its objective is the wrong one.

Decision making, or "AI decisioning" as the category has ended up calling itself, is the newer and more interesting thing. Here the system decides, per user, what to send, when, on which channel, which offer or creative to use, and increasingly whether to send anything at all. The marketer sets the goal, the eligible audience, the allowed messages and the guardrails. The system explores within those guardrails and learns from what happens. This is the bracket that actually borrows the platforms' methods: reinforcement learning, contextual bandits, holdout-based measurement, optimisation against an outcome rather than an open. There's a clean theoretical reason it beats the test-and-roll-out habit the support tools encourage: running an A/B test and committing to the winner is provably worse, by a fixed factor, than a system that keeps reallocating as it learns. Every fortnight with half your traffic parked on the losing variant is regret you never get back.11
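The difference between testing then committing and reallocating as you learn can be simulated in a few lines. This is a toy sketch, not the proof the cited result rests on: two variants with hypothetical conversion rates, a split test that spends half the available sends before committing, and Beta-Bernoulli Thompson sampling as the adaptive policy. Every name and number in it is an assumption made for illustration.

```python
import random


def expected_regret(pulls, true_rates):
    """Conversions forgone versus always sending the best variant."""
    best = max(true_rates)
    return sum(n * (best - rate) for n, rate in zip(pulls, true_rates))


def split_then_commit(true_rates, horizon, test_size, rng):
    """Split test_size sends evenly, then give every remaining send to the observed winner."""
    arms = len(true_rates)
    pulls = [0] * arms
    wins = [0] * arms
    for i in range(test_size):
        arm = i % arms
        pulls[arm] += 1
        wins[arm] += rng.random() < true_rates[arm]
    winner = max(range(arms), key=lambda a: wins[a] / pulls[a])
    pulls[winner] += horizon - test_size
    return pulls


def thompson_sampling(true_rates, horizon, rng):
    """Before each send, sample a rate per variant from its Beta posterior and send the best draw."""
    arms = len(true_rates)
    wins = [0] * arms
    losses = [0] * arms
    for _ in range(horizon):
        draws = [rng.betavariate(wins[a] + 1, losses[a] + 1) for a in range(arms)]
        arm = draws.index(max(draws))
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return [wins[a] + losses[a] for a in range(arms)]


TRUE_RATES = [0.02, 0.04]  # hypothetical conversion rates, unknown to both policies
HORIZON = 20_000           # total sends available

rng = random.Random(0)
ab_pulls = split_then_commit(TRUE_RATES, HORIZON, test_size=HORIZON // 2, rng=rng)
ts_pulls = thompson_sampling(TRUE_RATES, HORIZON, rng)

print("split-then-commit sends per variant:", ab_pulls,
      "regret:", round(expected_regret(ab_pulls, TRUE_RATES), 1))
print("Thompson sampling sends per variant:", ts_pulls,
      "regret:", round(expected_regret(ts_pulls, TRUE_RATES), 1))
```

The split test cannot do better than about 100 forgone conversions here, because 5,000 sends go to the weaker variant by construction. The adaptive policy has no such floor; how far below it lands varies with the seed and the size of the gap, so treat any single run as an illustration and not a benchmark.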

Aampe is the clearest example of the idea. Founded in 2020, before the LLM wave, by a team out of data science and military intelligence, it assigns a notional agent to every individual user and uses contextual bandits, Thompson sampling and difference-in-differences to work out what works for that person, message by message. No predefined journeys, no static segments. You give it a high-fidelity event feed, which it can read from a data warehouse, a CDP, cloud storage or its own streaming API, and a messaging provider to send through, you write a pile of message variants, and the agents explore the combinations. It tracks hundreds of downstream outcomes rather than just click-through. It raised an $18m Series A at the end of 2024 on the pitch of "billions-scale" personalisation. And unusually for a vendor, it publishes: in a two-month randomised trial on 6.4 million users of a financial-services app over a tax-filing season, agent-led messaging cut unsubscribes by 21% against the rule-based baseline, not by sending less but by sending more relevantly, and shifted people towards filing earlier rather than simply more often.12
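Difference-in-differences, one of the techniques named above, is simple enough to show as arithmetic. This is the textbook two-group, two-period version with invented numbers; it is not a description of how Aampe implements it.

```python
def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change in the messaged group minus the change in the comparison group."""
    return (treated_after - treated_before) - (control_after - control_before)


# Hypothetical weekly sessions per 100 users, before and after a message variant is introduced.
effect = diff_in_diff(
    treated_before=400, treated_after=460,  # users who received the variant
    control_before=400, control_after=430,  # comparable users who did not
)
print("sessions per 100 users attributable to the variant:", effect)
```

The messaged group rose by 60 sessions, but the comparison group rose by 30 on its own, so only the remaining 30 can be credited to the message. A plain before-and-after view would have claimed double that.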

Hightouch comes at the same problem from the data layer. It started life as a reverse-ETL tool, became a "composable CDP", and has repositioned again as an agentic marketing platform. I've had my disagreements with the company elsewhere, but the product is a serious piece of work. Its AI Decisioning runs on top of your data warehouse, reading from it rather than storing your data, and uses reinforcement learning to choose message, offer, channel, creative, timing and frequency for each customer one at a time, including whether to contact them at all. The marketer defines the audiences, the allowed content, the channels, the contact limits and what counts as success. The system optimises within that, measures every decision against a control group, and reports back what it chose and why. The architecture is more specific than "real-time" suggests, though. It decides on the warehouse and acts through your ESP; its learning loop, action out, outcome back through the warehouse, next decision, runs in hours, not the seconds of a platform optimising live inside its own app. Real-time events can feed it, but it isn't TransAct, and the gap is the latency of that loop and the round-trip through tools it doesn't own.

The incumbents have noticed. OfferFit, an AI decisioning company built on reinforcement learning agents that replace manual A/B testing, was acquired by Braze for $325m, announced in March 2025 and closed in June; Braze is now stitching its engine into the platform alongside its own native agent project. Movable Ink sits adjacent, using ML to assemble email content per individual at open time. But it's one acquisition, not a wave: Aampe is an independent that only does this, Hightouch built its capability in-house from the data layer up, and OfferFit is the lone case of a big CEP buying its way in. Up the stack the picture is different. Adobe Journey Optimizer's real-time decisioning runs natively on Adobe Experience Platform, Salesforce's Agentforce Personalization and Journey Decisioning run natively on Data Cloud, and Pega Customer Decision Hub has been doing next-best-action for banks and insurers for over a decade, with contextual bandits and adaptive models better documented than anything the consumer marketing clouds expose. The honest read is that decisioning is emerging on multiple tracks at once: independent specialists at the mid-market, a single CEP acquisition in the middle, and native enterprise builds above. It's too early to call which wins for the brands caught between them.

So that's the landscape. Frontier ML at the platforms, mostly invisible. Decision-support ML in every CEP, real but assistive. A decisioning category that genuinely brings frontier methods within a brand's reach, mostly from independents, with the incumbents circling.

Most of them tell you almost nothing about how they work. From the big CEPs there are blog posts that wave at "machine learning" and case studies that report lift, but nothing like a Pinterest or Twitter paper that lays out the model, the training, the evaluation and the result. The exception cuts against the cliché that vendors never publish: Aampe puts out actual papers, with randomised controlled trials, confidence intervals and counterfactual measurement.12, 13 OfferFit sits in between, publishing white papers that explain the method in marketing language without the rigour of a reviewed venue. So the honest version isn't "vendors don't publish". It's that the one decisioning vendor behaving like a research lab is the one already worth pointing to as the clearest example, while the CEP features most brands actually run stay a black box you can only judge by watching your own numbers, which most brands don't rigorously do either. Martech Square has made the same case at the category level: each vendor has defined "decisioning" to mean whatever their strongest capability does best, leaving the discipline shaped by vendor positioning rather than independent definition.

The support features are tuned for the average customer, because the vendor is multi-tenant and the cost of deep per-brand customisation would break the unit economics. The feature works pretty well across most brands and exceptionally well for none. The decisioning products are different here, because they learn against your data and your outcomes, which is exactly what makes them more capable and exactly what makes them dependent on your data being any good.

The decisioning products, the ones that could actually narrow the gap, are gated by the data problem. Aampe needs a high-fidelity event feed, whether from a warehouse, a CDP or a stream, and a clean messaging connection. Hightouch reads from your warehouse, so you need a warehouse with your behaviour in it. OfferFit needs the same kind of structured outcome data to optimise against. Adobe Journey Optimizer's real-time decisioning sits on Adobe Experience Platform the same way, and Salesforce's Agentforce Personalization and Journey Decisioning sit on Data Cloud: the offer-ranking and adaptive-journey machinery at the enterprise tier can't fire until the unified profile underneath is in production, which is the same multi-year programme by a different name and a bigger invoice. The category that brings the frontier within reach is precisely the category that can't do anything until your data is in order. Which is what the room at the breakfast had worked out for themselves, even if they wouldn't have put it that way.

The deeper shift is that the CDP itself has been reframed. For most of the 2015-to-2022 window the CDP was a moderately useful, frequently disappointing category: heavily marketed as a "single view of customer", steadily undermined by warehouse-native critics, beset by partial implementations and complaints that the unifying layer had just become yet another silo. Plenty of brands skipped or stalled their CDP investments on the reasonable view that the ROI didn't justify the cost. The AI wave has changed the stakes entirely. The CDP is no longer the optional unification layer marketers might want; it is the substrate the next generation of decisioning and agent products cannot function without. Adobe's Real-Time CDP is the data layer AJO decisioning depends on; Salesforce's Data Cloud is what Agentforce reads from; Pega's Customer Decision Hub runs against its own unified customer profile. The brands that wrote the CDP off as a marketing fad in 2020 find themselves, in 2026, locked out of the AI capabilities they want by the very absence of the data layer they declined to build.

Salesforce is where this reframing has played out most publicly. The product has been rebranded six times since 2020 (Customer 360 Audiences, Salesforce CDP, Marketing Cloud CDP, Genie, Data Cloud, and now Data 360 under the Agentforce 360 umbrella), partly in search of a story stakeholders could understand. The community ecosystem publishes openly on why Data Cloud implementations fail. The Martech Weekly traced the pattern further back to a 2023 whistleblower retaliation suit from the former Evergage CEO whose product Salesforce had acquired and rebuilt into Genie, alleging Salesforce knew at the 2022 Dreamforce launch that Genie was not really real-time; a federal judge denied Salesforce's motion to dismiss in September 2024 and the case has been allowed to proceed. Salesforce is now staking its next decade on the substrate it spent five years failing to explain.

Most brands aren't using any of it #

Most brands aren't using the decision-support ML, never mind the decisioning kind. They are still batching and blasting: one message to one broad segment, everyone at once, with maybe a subject-line test that, on a list that size, is too underpowered to call. Where there's any automation, it amounts to a handful of generic flows that every contact drops into the same way regardless of who they are. The send-time optimisation toggle sits unused. Hardly anyone is building churn or propensity scores in the first place. The ML is in the contract; it isn't in the workflow.

This changes what the gap actually is for most of the market. For the typical mid-market brand the binding constraint isn't that their vendor's ML lags the platforms'. It's that they aren't running ML at all. There's a frontier-versus-vendor gap, and underneath it a much larger gap between the ML a brand has bought and the ML it actually uses, and most brands are losing on the second one long before the first becomes relevant.

The data isn't in a state where the features can work, so switching them on changes nothing visible and they get switched off again. But a lot of it is simply people. The team hasn't got the expertise or the experience, and the business KPIs they're judged on leave no room to acquire it: there's a number to hit this week, and learning to operate a decisioning system is not how you hit it. The advice they'd reach for is its own problem, a mountain of confident, authoritative-sounding marketing content written by people who don't actually understand how any of this works. So the path of least resistance stays the same broad send, because that's the thing a marketer under a weekly target can ship before lunch. The sophisticated option needs setup, trust, time and data the brand hasn't got; the broad send needs a subject line.

And it's getting actively more expensive to keep doing, which is the genuinely new part. Under the receiver-side regime I described in an earlier piece, sender-level engagement is the currency: consistently weak engagement from a dormant list drags down placement for the engaged part of the list too, because the provider treats engagement as a property of the sender, not just the recipient. What used to be merely wasteful is now self-harming, and the brands still doing it are competing in a channel that has quietly started charging them for the privilege.

Why the gap exists and is widening #

The data problem. The frontier systems run on first-party event streams collected by the platform owner: every session, every scroll, every dwell, every back-out, every notification opened and ignored, every search and click. The training data is unified, real-time, granular and owned end to end.

A typical mid-market brand has its event stream fragmented across an ESP, a marketing automation platform, a mobile measurement partner, a web analytics platform, an ad platform, a half-implemented CDP and several point-solution vendors each with their own SDK. Unifying that into a training-ready dataset is the CDP-or-warehouse problem, and most brands haven't solved it. The Salesforce, Snowflake and Databricks pitch for fixing it is real and the products are real, but the implementation cost is high, the time to value is measured in years rather than quarters, and the marketer is rarely the budget owner for the project. The brands that have solved it tend to be the ones large enough to have an enterprise data warehouse function reporting to a CIO or CDO, which is a small fraction of the mid-market.

This is the bit the breakfast confirmed for me in a way the abstract argument never quite does. The large, publicly-traded banks in the room, with deep pockets and proper engineering functions, had the problem as much as the small fintechs did. It isn't a budget problem you grow out of, it's a structural problem you have to actually solve, and the brands that hadn't solved it were all stuck at the same starting line regardless of how much they were spending downstream.

It's also why the data problem sits underneath everything else. Without unified, training-ready data you can't build the kind of system the platforms publish, and you can't feed the decisioning products that would otherwise hand you a chunk of that capability off the shelf. The decision-support features in your CEP will run on whatever's in the CEP's own database, which is a subset of what your brand has, which is itself a subset of what the user actually does. The signal is degraded twice before training even starts. The academic systems literature on this kind of decisioning makes the same point from the engineering side: the algorithms assume clean, correctly logged data as their input, and producing that is the hard, unglamorous part nobody writes the headline papers about.14

There's a harder version of this, about measurement rather than training. The real question in lifecycle marketing isn't "did the user open it?" but "did sending it change what they did, against an identical user left alone?" That's the uplift question, and it separates the persuadable from the sure-things who convert anyway and the do-not-disturbers, who would have converted if you'd stayed quiet and whom messaging actively loses. Telling those groups apart needs holdout groups and enough volume to see a small effect through a lot of noise. Meta could read a 0.42% lift in click-through because it was testing across 77 million users an arm; a brand with a list in the tens of thousands cannot detect a sub-percent move in anything, and so can only ever measure the gross, high-frequency proxies like opens, never the small shifts in retention or revenue that decide the business. So the long tail isn't clinging to open rates because nobody told them opens are a weak signal; that argument has been made well, including by me. They cling to opens because, at their size, it's the only thing they have the statistical power to measure at all. "Measure outcomes, not opens" is good advice that most brands cannot act on.15
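The power argument can be made concrete with the standard sample-size approximation for comparing two proportions. A rough sketch follows, assuming a two-sided z-test at 5% significance and 80% power. The 5% baseline click-through rate is an assumption chosen for illustration, not a figure from Meta's paper, and `users_per_arm` is a hypothetical helper name.

```python
from math import ceil
from statistics import NormalDist


def users_per_arm(base_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate users needed in each arm to detect a relative lift in a rate."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


ASSUMED_BASE_CTR = 0.05  # hypothetical baseline click-through rate

small_lift = users_per_arm(ASSUMED_BASE_CTR, 0.0042)  # a 0.42% relative lift
large_lift = users_per_arm(ASSUMED_BASE_CTR, 0.10)    # a 10% relative lift

print(f"users per arm for a 0.42% lift: {small_lift:,}")
print(f"users per arm for a 10% lift:   {large_lift:,}")
```

Under those assumed inputs the 0.42% lift needs roughly 17 million users in each arm, which 77 million clears comfortably and a list in the tens of thousands never will. The 10% case needs around 31,000 per arm, which is about the smallest effect a brand of that size can hope to see, and effects that large are rare on anything but proxy metrics.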

There's a layer above even that. The most valuable thing these systems do is explore, deliberately send something the model is unsure about to learn from the result, and exploration reads as flat or negative on short-term metrics while paying back only over months. Google had to build bespoke experiments simply to prove that exploration on a video platform serving billions was worth anything, because the standard A/B tests showed it as neutral or harmful; the measurement was the hard part, not the method.16 LinkedIn's email bandit shows the same shape from the other end: switching exploration on lifted long-term revenue by about three percent and cut unsubscribes, while the short-term conversion rate didn't move at all.5 If platforms with that depth of instrumentation have to work this hard to see the payoff, a brand reading a per-campaign open rate has no chance of seeing it.

The talent problem. The engineers who can build and operate the systems in the Twitter and LinkedIn papers are expensive, scarce, and concentrated at the platforms that pay the most and offer the most interesting high-impact problems: a notification system touching 800 million people is a more compelling place to work than a mid-market retailer's email programme, almost regardless of salary. But it isn't only the engineers; the scarcity runs through the whole chain. The marketers who genuinely understand how to wield this are nearly as rare, and rarer still is the senior management willing to fund a multi-year, hard-to-attribute capability over the campaign that books revenue this quarter, even when the report crediting it with that revenue is bunk. Lack any one of the three and the effort stalls; most mid-market brands lack all three. Even the engineer alone is a stretch to hire and keep, let alone the team around them: the data engineer, the platform engineer, the experimentation infrastructure, the analytics support. The vendors absorb this cost across thousands of customers, which is the entire point of buying ML from a vendor. But the resulting feature is generic by construction: the average of what's good for everyone, not what's optimal for you. The decisioning vendors are, in effect, a way to rent a slice of that scarce talent, a genuinely good answer to this particular problem. It just doesn't fix the one above it.

The signal problem. Even with clean data and a competent team, a brand only sees the user inside its own product and at the boundary: opens, clicks, web visits, app sessions, transactions. The platforms see the user across all their surfaces, and in the case of Meta and Google across much of the rest of the web. They train on a signal an order of magnitude richer than anything a brand can assemble. You can't replicate that without becoming a platform yourself, which is a different business, or buying signal from a data partner of dubious provenance, which the privacy regulators will keep finding ways to stop. This one has no answer at the brand level no matter how much you spend, and it caps how far the gap can ever close.

The infrastructure problem. Production decisioning needs constant online experimentation, real-time scoring at request time, careful off-policy evaluation and a feature store that keeps up. Building that in-house is a multi-year investment most marketers can't justify against next quarter's number. This is the problem the decisioning vendors solve most cleanly, by offering the substrate as a service. It's the third link in a chain, though: you can't use the rented infrastructure without the data to feed it.

The vendor incentive problem. A vendor's roadmap is set by what most customers will pay for, not by the frontier. And there's a subtler version of this that explains the support-versus-decisioning split directly. Decision-support features are easy to sell because the marketer stays in control and can explain to their boss exactly what the system is doing: it picked a better send time, it scored these users as likely to churn. Decisioning asks the marketer to hand the wheel to a reinforcement learning system and trust that the holdout numbers will vindicate it later. That's a much harder sell, it requires more trust and more data maturity, and it's why decisioning is still the newer, smaller category despite being the more capable one. The frontier systems at the platforms don't have this constraint, because the team that built the system is the customer. They never have to explain it to a CMO defending a quarterly budget.

These compound. The data problem makes the talent problem moot, because there's nothing for the engineer to work on. It starves the infrastructure and the decisioning products of their input. The signal problem caps the ceiling regardless. And the vendor incentive problem keeps the most capable category harder to adopt than it needs to be. The gap widens every year because the platforms publish more, hire more and invest more, while the average brand's marketing budget is flat, and because the platforms are now also running frontier ML on the receiving end, editing and suppressing the messages brands send before they ever arrive.

Consequences

Squeezed at both ends. The receiving end is now its own frontier ML system, and a heavily published one. The inbox providers parse, extract, rank, summarise and suppress commercial mail with models documented across a decade of papers and patents, which I went through separately; the same has happened to push, where on-device models now rewrite and reorder notifications before the user sees them, which I cover in a companion piece. Between them they touch most of what a brand sends: Apple's notification summaries, Gmail's tabs and AI-generated priority views, Outlook's focused inbox, the on-device summarisation shipping on both major mobile platforms. As the receivers get better at this and the largest senders get better at deciding what to send, the long tail loses at both ends. The brand's message is more likely to be batched, summarised or suppressed before it lands, and the user's attention is more likely to have been spent already on a better-targeted message before the brand's even arrives. The competitor that spent it is rarely the brand the marketer was worried about. It's the platform itself. A mid-market retailer isn't losing attention to other mid-market retailers so much as to Instagram, TikTok and LinkedIn, each of which has trained harder on this user than any brand ever could. There's even a floor effect: below a minimum number of recipients per template, the providers' extraction pipelines may not process a sender's mail at all, so the smallest senders are partly invisible to the very systems the larger ones are optimising against. Two opaque ML systems, one at each end of the wire, both pulling attention away from the brand in the middle. That's the squeeze.

More of your competitive position depends on vendor choice. As in-house ML gets harder to justify, which vendor you picked and how well they invest in ML becomes strategic in a way it wasn't five years ago. Choosing one suite over another, or deciding whether to bolt a decisioning layer onto your warehouse at all, is partly a bet on whose roadmap pays off three years out, which is a bet most brands aren't equipped to make. It hands pricing power to the larger vendors, and the OfferFit deal is the first sign of them buying the capability in rather than building it.

You can't actually test the vendors against each other. Every vendor's ML is undisclosed in method, runs on your data inside their pipeline, and is wired so deep into the journey builder that you can't cleanly hold it against a rival's on the same audience. There's no bake-off. You can't run Vendor A's send-time model and Vendor B's on the same users in the same period and compare; switching is a migration, not a toggle. So the one decision that's become most strategic, which vendor's ML to bet on, is also the one you can least de-risk with evidence. You choose on demos, case studies and sales narrative, then find out over a two-year contract. The decisioning vendors at least lean on holdout measurement against a control, which tells you whether their system beats your status quo, but still not whether it beats the vendor you didn't pick.

Talent flight from brand-side to platform-side. The brand can't afford the engineer the platform can. So the brand-side lifecycle team becomes more about vendor selection, orchestration and data plumbing, and less about building anything. The senior role in 2026 is increasingly a procurement-and-architecture role. The build roles are at the platforms and, to a lesser extent, at the vendors.

Lock-in by integration depth. A brand that's built its CRM on a vendor's commodity ML can't easily switch it off to do something more sophisticated, because the data infrastructure assumes the vendor's pipeline. The data lives in the vendor's database, the decisioning in the vendor's journey builder, the events in the vendor's SDK. You're locked in not by contract but by integration depth. This is also why the large vendors keep bundling CDP functionality inwards: they want your data in their stack, not in a warehouse you could point a competitor's decisioning product at. Notably, the warehouse-native decisioning vendors are selling against exactly this, which is a real point in their favour for any brand that cares about optionality.

Channel decline at the long tail. The economics of any sender-paid channel are that it works for the senders who can afford to make it work and stops working for the rest. SMS went through it. Email has been going through it for a decade. Search went through it years ago. Push is going through it now. The trade press calls it "channel maturity", which is the polite version. The accurate version is that the channel sorts senders by AI sophistication, the ones sorted to the bottom find it no longer pays for itself, and they leave. The ones at the top get more concentrated returns from a channel cleared of noise. The brand reading a "death of email" think-piece is usually reading something written by, or for, the senders who've already been sorted out, or by someone trying to sell them the next channel. The ones still making the channel pay are quietly doing fine.

Cross-industry parallels

None of this is unique to messaging. The adjacent markets where the same pattern has already played out tell you which responses actually worked.

Display advertising. Programmatic buying through DSPs commoditised an ML capability, real-time bid and click-through optimisation, that the largest platforms then internalised at higher sophistication. Google and Meta were publishing the production click-prediction methods that underpin this more than a decade ago.17 The result is the market we have: most advertisers buy through Google and Meta because those platforms optimise better internally than any third-party DSP can for the spend they handle. The independent DSP market got pushed toward the inventory the walled gardens don't sell. The analogy to messaging is direct. The platform sender is to the brand sender what the walled garden is to the independent advertiser. The mechanism is different, mind. In display there's a literal shared auction, independents bidding cash against each other for third-party inventory. In messaging there's no such market: a brand sends to its own opted-in list, and the contest for attention is settled not by bids but by the operating system's on-device ranking and filtering, the receiving-end editor again. The parallel holds at the level of structure and outcome, not plumbing. What partly worked here was concentration of buying power through agency holding companies. There's no equivalent in lifecycle. No one consolidates lifecycle spend across brands to negotiate with Apple or Google. The brand stands alone.

E-commerce. Amazon's recommendation and personalisation operation is larger than most commerce companies in total, and publishes accordingly: its work on repeat-purchase recommendation, modelling the rhythm at which a given customer rebuys a given product, is the kind of thing a Shopify seller has no route to.18 Sellers on Shopify or BigCommerce get the platform's commodity recommendations, useful but nowhere near Amazon's level. The sellers who survive compete on something other than personalisation: brand, niche, product specificity, relationship. The lifecycle equivalent is to compete on something other than ML-driven message optimisation.

Content recommendation. TikTok and YouTube run recommenders at a scale no publisher can touch; YouTube has published its recommender since 2010, rebuilt it on deep learning by 2016, and has only compounded the advantage since.19 Publishers on off-the-shelf engagement tools are several generations behind, and the user's attention goes to the better recommender. The result is a publisher industry structurally weaker than a decade ago, which has mostly responded by retreating to email and direct subscriptions or by chasing whatever the algorithm rewards this quarter, which is not a position of strength.

In every case the same question recurs: what does the long tail do? The answers, in rough order of what actually worked: compete on something the platform can't replicate; coordinate across the long tail to build shared infrastructure; abandon the channel for a different way to reach the user; lobby for regulatory intervention; or try to in-house the capability and partly succeed. The first works most consistently, because it sidesteps the ML contest rather than entering it. The second has barely been tried. The third is expensive. The fourth has mostly failed. The fifth works at the top of the market and not in the middle, though the decisioning vendors are quietly turning "in-house it" into "rent it", which is the most interesting shift of the lot.

Open questions

Data first, or ML first? The breakfast answered this for me, mostly. The data layer is the precondition for everything, the in-house build and the bought decisioning product alike. But the CDP investment is expensive, slow and easy to do badly, and there's a real failure mode where you spend two years standing up a warehouse, the ML never materialises because nobody owns it, and you end up roughly where you started, only more locked into Salesforce or Snowflake. Data first, yes. But "data first" is not the same as "data and then we'll see", and too many CDP projects are sold as the second.

At what scale do you in-house rather than buy? The decisioning vendors have genuinely moved this line. Five years ago the honest answer was "almost never, the threshold is tens of millions of users". Now the question is less "build or buy the ML" and more "is your data good enough to feed something you've bought". For most of the mid-market, renting decisioning via Aampe, Hightouch or OfferFit-inside-Braze is going to make more sense than building, assuming the data's there. Full in-housing stays a top-of-market move.

How do you evaluate a vendor's ML when the methodology is undisclosed? Mostly through experiments against real holdouts, which most brands don't run, partly because it's not made easy and partly because they lack the analytical capacity to read the results. A standardised benchmarking framework, like the one the IAB eventually forced on display with viewability, would help. The industry bodies are well placed to push for it and unlikely to, because the vendors are heavily represented in those bodies and don't want a benchmark. The decisioning vendors, to their credit, lean on holdout measurement as a selling point, which at least drags the conversation toward evidence.

Does the decisioning category actually close the gap, or just repackage it? Genuinely open. The methods are real and much closer to the frontier than anything in a standard CEP. But the products are still multi-tenant, still don't see the user beyond the brand's own boundary, and still depend entirely on the brand's data. My instinct is that they narrow the gap meaningfully for brands that have solved their data problem, and do nothing at all for the brands that haven't, which means they widen the spread inside the long tail rather than lifting it uniformly. The brands that get their data right and bolt on decisioning pull away from the brands that don't.

Is there a regulatory frame for the receiving-end editing? There's a case to be made, particularly under the DMA, that on-device summarisation and inbox categorisation are forms of platform self-preferencing when they systematically favour the platform's own surfaces over a brand's message. The counter is that it's consumer-protective and done on the user's behalf, which is also partly true. The self-preferencing argument itself is being made vigorously: the Commission has gone after Google Search's treatment of its own verticals, opened an investigation into AI Overviews using publishers' content, moved on Meta, and is being lobbied to bring AI assistants under the regime as de facto gatekeepers. What nobody is yet arguing is that brand-message editing specifically is a competition problem, and there's a structural reason it's hard to start: when the EU designated its first gatekeepers it explicitly declined to designate Gmail and Outlook, finding they weren't important enough gateways to qualify. The surface where most of the editing happens was deliberately left outside the regime. So the frame exists and the enforcement energy exists, just not pointed here, and the obvious door is bolted.

Is there an open-source or coalition path? A group of mid-market brands funding shared decisioning infrastructure, the way some industries have funded shared data trusts, is theoretically possible and has barely been tried. The brands don't trust each other and the vendors will fight it. But the trade associations could organise it if they wanted to, and the warehouse-native model makes it more plausible than it used to be, because the data needn't leave the brand's own warehouse to be useful.
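On the vendor-evaluation question above, the arithmetic of reading a holdout is less forbidding than the lack of practice makes it seem. What follows is a minimal sketch, assuming users were randomly assigned to the treated and holdout groups and that a simple normal-approximation interval is acceptable. The function name and the counts are hypothetical, chosen only to show the shape of the calculation.

```python
import math


def holdout_lift(treated_n, treated_conv, holdout_n, holdout_conv, z=1.96):
    """Compare conversion in the treated group against a random holdout.

    Returns the absolute and relative lift plus an approximate 95% interval
    on the absolute lift (normal approximation for a difference of two
    proportions).
    """
    p_t = treated_conv / treated_n
    p_h = holdout_conv / holdout_n
    diff = p_t - p_h
    se = math.sqrt(p_t * (1 - p_t) / treated_n + p_h * (1 - p_h) / holdout_n)
    return {
        "treated_rate": p_t,
        "holdout_rate": p_h,
        "absolute_lift": diff,
        # Relative lift is undefined when the holdout never converts.
        "relative_lift": diff / p_h if p_h > 0 else None,
        "ci_low": diff - z * se,
        "ci_high": diff + z * se,
    }


# Hypothetical counts: a 10% holdout against 90% treated by the vendor's system.
result = holdout_lift(treated_n=90_000, treated_conv=4_950,
                      holdout_n=10_000, holdout_conv=500)
for key, value in result.items():
    print(f"{key}: {value:.4f}")
```

If the interval on the absolute lift excludes zero, the system is probably beating the status quo. It still says nothing about the vendor you didn't pick, and a small holdout widens the interval quickly, which is one reason the 1% holdouts vendors sometimes suggest are hard to read.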

What this means for the lifecycle role, and where it ends up

The job has changed. Five years ago the senior lifecycle marketer built segments by hand, wrote copy, ran campaigns and did the analysis the BI team didn't get to. Judgment was applied at the campaign level. Today it's applied at the architecture level: choosing the vendor, configuring the decisioning, plumbing the data, evaluating whether the ML does what it claims, and managing the people who run the campaigns rather than running them. The campaign-level decisions are increasingly made by a system the marketer configured rather than operates.

That's broadly fine for senior people and broadly bad for the entry-level pipeline. The bits that used to teach you the craft, writing the email, building the segment, watching the open rate, iterating, are being absorbed by the vendor. You can spend three years at an ESP-driven brand now and not really learn how messaging works, because the system is doing the messaging and you're configuring it. Same thing that ate analytics a decade ago, when the entry-level analyst stopped learning anything except how to query a vendor's schema. I don't have a fix for it. That doesn't make it less real.

Underneath the pipeline problem is a comprehension one. Most marketers don't really understand data, as I've argued before, and that has to change, because the move from decision support to decisioning is also a move from deterministic answers to probabilistic ones. An A/B test hands you a winner and a loser. A bandit or a decisioning system hands you a shifting set of bets and asks you to trust the holdout rather than the dashboard, to act on a distribution rather than a result. Operating that takes a comfort with probability, with uncertainty, with not being able to point at the single thing that worked, that the discipline has mostly never had to develop.
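To make "a shifting set of bets" concrete, here is a small sketch of Thompson sampling over three message variants. I'm using it as a stand-in for the general idea; it is not a claim about what any particular vendor runs. The click rates are invented, and in production they would be unknown, which is the whole point.

```python
import random


def thompson_allocate(true_rates, rounds, seed=0):
    """Beta-Bernoulli Thompson sampling over message variants.

    true_rates are hidden from the algorithm; it only sees click / no click.
    Returns the number of sends per variant and the posterior mean click
    rate the algorithm ends up believing for each.
    """
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [1] * k  # Beta(1, 1) prior: no opinion before any data
    failures = [1] * k
    sends = [0] * k
    for _ in range(rounds):
        # Draw one plausible click rate per variant from its posterior,
        # then send whichever variant looks best in this draw.
        draws = [rng.betavariate(successes[i], failures[i]) for i in range(k)]
        arm = max(range(k), key=draws.__getitem__)
        sends[arm] += 1
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    means = [successes[i] / (successes[i] + failures[i]) for i in range(k)]
    return sends, means


# Hypothetical click rates for three variants of the same notification.
sends, means = thompson_allocate([0.02, 0.03, 0.08], rounds=20_000)
for i, (n, m) in enumerate(zip(sends, means)):
    print(f"variant {i}: {n} sends, believed click rate {m:.3f}")
```

There is never a moment where the system declares a winner. It keeps sending the weaker variants occasionally, in shrinking volume, because its beliefs are distributions rather than verdicts. That is what the marketer is being asked to get comfortable with.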

There's at least some evidence for how the human's part of this settles. In Aampe's longitudinal study, an active phase where marketers curated content and audiences produced the highest lift (direct app opens up 65%), and was followed by a passive phase where the agents ran autonomously from a fixed library of content for seven months and still sustained a large lift on their own (notification clicks up 57%). The two phases measure different things, so the point isn't a clean subtraction; it's the shape. The reading the authors give, and it matches what I see, is that human intervention drives strategic initialisation and discovery while the autonomous system handles scalable retention, holding the gains once they're found. That's a more useful picture than flat replacement: the craft moves up the stack rather than vanishing, but there's less of it, and it's a different craft.13

Where this ends up, if current trends hold, breaks by tier rather than fitting one template. At the enterprise marketing cloud tier, Adobe and Salesforce most clearly offer a converged stack of ESP, orchestration, decision-support ML and a decisioning engine built natively on their own CDP, with Pega Customer Decision Hub doing the same for banks and insurers in an adjacent bracket and Microsoft, Oracle and SAP carrying lighter variants of the same picture. At the CEP and mid-market tier, the field is fuller: Braze, Klaviyo, Iterable, Customer.io and the mobile-first names compete more directly, with the OfferFit-into-Braze acquisition as the template for those buying their way in and the rest building or partnering. Alongside them, a thinner layer of specialists survives where methodological depth or warehouse-native architecture keeps them out of consolidation: Aampe, Hightouch, the experimentation and content-assembly players. Underneath, a long tail of legacy ESPs in slow decline as the mid-market vendors push down with cheaper tiers and the SMB segment increasingly runs on whatever Klaviyo or Mailchimp ships. Decisioning becomes a tier marker rather than the next universal checkbox: present at the top, absorbed in the middle, theatre at the bottom. The platforms keep investing on both the sending and the receiving end, and edit or suppress a larger share of brand messages than they do today, with the brand mostly accepting it because the alternative is to leave the channel.

And the brands that come out ahead are the few large enough to run real ML in-house, the larger number who got their data into shape and rented a decisioning layer to sit on top of it, or the ones competing on the things ML can't touch: brand, product, relationship, the human craft of knowing what to say. That last group is the most under-discussed, and if I were starting a B2C brand today it's where I'd put the effort, because it's the one position that doesn't depend on out-engineering a platform with more data, more talent and more signal than you will ever have.

Which brings it back to the breakfast. A room of marketers, large brands and small, every one of them held up at the same place: not by the cleverness of the models on offer, which has never been higher, but by the unglamorous job of getting their own data into a state where any of it can be used. The frontier keeps moving. The vendors keep shipping. None of it lands until that's fixed.

The gap isn't really about the AI. It's about whether you've done the hard groundwork that lets you use any.

1. Bo Zhao, Koichiro Narita, Burkay Orten and John Egan, "Notification Volume Control and Optimization System at Pinterest," KDD '18. https://dl.acm.org/doi/10.1145/3219819.3219906

2. Kevin P. Yancey and Burr Settles, "A Sleeping, Recovering Bandit Algorithm for Optimizing Recurring Notifications," KDD '20. https://dl.acm.org/doi/10.1145/3394486.3403351

3. Conor O'Brien et al., "Should I send this notification? Optimizing push notification decision making by modeling the future," 2022. https://arxiv.org/abs/2202.08812

4. Yiping Yuan, Ajith Muralidharan, Preetam Nandy, Miao Cheng and Prakruthi Prabhakar, "Offline Reinforcement Learning for Mobile Notifications," 2022 (https://arxiv.org/abs/2202.03867), with the related "Multi-objective Optimization of Notifications Using Offline Reinforcement Learning," KDD '22 (https://arxiv.org/abs/2207.03029).

5. Phuc Nguyen, Benjamin Zelditch, Joyce Chen, Rohit Patra and Changshuai Wei, "BanditLP: Large-Scale Stochastic Optimization for Personalized Recommendations," 2026, applied in LinkedIn's email marketing system (https://arxiv.org/abs/2601.15552); see also the broader system paper, Changshuai Wei et al., "Neural optimization with adaptive heuristics for intelligent marketing system" (the NOAH framework), KDD '24 (https://arxiv.org/abs/2405.10490).

6. Eric Nichols, Rui Xu, Bhaskar Thiagarajan and Sumit Kamath, "Volume Governing for Email and Push Messages," RecSys '22. https://dl.acm.org/doi/10.1145/3523227.3547399

7. Christian Kroer et al., "Fair Notification Optimization: An Auction Approach," 2023. https://arxiv.org/abs/2302.04835

8. Shifu Bie et al., "PushGen: Push Notifications Generation with LLM," WSDM '26. https://arxiv.org/abs/2512.14490

9. Xue Xia et al., "TransAct: Transformer-based Realtime User Action Model for Recommendation at Pinterest," KDD '23. https://doi.org/10.1145/3580305.3599918

10. The reported gains for send-time optimisation come almost entirely from vendor case studies; the peer-reviewed work on notification timing, much of it in education and online-learning settings, tends to favour fixed, scheduled sends and to flag the risks of notification overload rather than validate per-user optimisation against a sensible baseline.

11. Aurélien Garivier, Emilie Kaufmann and Tor Lattimore, "On Explore-Then-Commit Strategies," NeurIPS 2016. https://arxiv.org/abs/1605.08988

12. Olivier Jeunen and Schaun Wheeler, "Behavioural Effects of Agentic Messaging: A Case Study on a Financial Service Application," 2025. https://arxiv.org/abs/2512.17462

13. Olivier Jeunen, Eleanor Hanna and Schaun Wheeler, "Sustained Impact of Agentic Personalisation in Marketing: A Longitudinal Case Study," 2026. https://arxiv.org/abs/2604.08621

14. Alekh Agarwal et al., "Making Contextual Decisions with Low Technical Debt," 2016. https://arxiv.org/abs/1606.03966

15. For the formal treatment, Chunyuan Zheng et al., "Uplift Modeling with Delayed Feedback: Identifiability and Algorithms," AAAI 2026. The persuadable / do-not-disturber framing and the diminishing-returns curve beyond the most responsive segment are standard in the uplift literature, e.g. work on the public Hillstrom (MineThatData) email dataset.

16. Yi Su et al., "Long-Term Value of Exploration: Measurements, Findings and Algorithms," WSDM '24 (Google and DeepMind, on a short-form video platform serving billions). Standard A/B tests registered exploration as neutral or negative on short-term engagement; the contribution is a set of experiment designs that capture its long-term value. https://arxiv.org/abs/2305.07764

17. Xinran He et al., "Practical Lessons from Predicting Clicks on Ads at Facebook," ADKDD '14. https://dl.acm.org/doi/10.1145/2648584.2648589 See also H. Brendan McMahan et al., "Ad Click Prediction: a View from the Trenches," KDD '13 (Google).

18. Rahul Bhagat, Srevatsan Muralidharan, Alex Lobzhanidze and Shankar Vishwanath, "Buy It Again: Modeling Repeat Purchase Recommendations," KDD '18 (Amazon). https://assets.amazon.science/40/e5/89556a6341eaa3d7dacc074ff24d/buy-it-again-modeling-repeat-purchase-recommendations.pdf

19. James Davidson et al., "The YouTube Video Recommendation System," RecSys '10 (https://dl.acm.org/doi/10.1145/1864708.1864770); and the deep-learning rebuild, Paul Covington, Jay Adams and Emre Sargin, "Deep Neural Networks for YouTube Recommendations," RecSys '16 (https://doi.org/10.1145/2959100.2959190).

Source: jacquescorbytuech.com (original article)