What Customers Are Doing With AI and Honeycomb

By: Rox Williams

At O11yCon, we talked to engineering teams across the industry, and the numbers are starting to get genuinely wild: Mixpanel DevOps Engineer Eddie Bracho told us their engineering team is generating 50% more PRs than before AI came into the mix (sorry).

That kind of velocity is exciting, but it's also a pressure test for every part of your stack that isn't writing code, including your observability practice. Here's what we're hearing from customers about how that's playing out.

Code volume is the new forcing function

Eddie put it plainly: the investment Mixpanel had already made in observability infrastructure is what's letting them absorb the new volume. "We still have the same number of engineers. They're just writing more code now. Being able to rely on Honeycomb there has been really great."

Observability isn’t just a “nice to have” anymore. When your team is opening 50% more PRs, you need to be able to tell quickly whether each new change is behaving as expected. The feedback loop either keeps up or becomes the bottleneck.

Yi-an Lai, Engineering Manager at Gem (a recruiting tech platform), framed it well: "Shipping code is just a little part of an engineer's job. We also do code reviews, design complex interconnected systems, and a big part of our job is participating in on-call rotations where engineers help triage and solve customer-reported issues. We may be faster at shipping code, but it is critical for us to also become faster and more efficient in supporting our product."

AI helps with the left side of the loop. The right side, understanding what your code is doing once it's out in the world, still needs work.

The Honeycomb MCP server

Several of the teams we talked to have been connecting the Honeycomb MCP server to their AI coding agents, and the use cases are interesting.

At Gem, Yi-an's team built AI-assisted tooling to help engineers during on-call rotations: "Instead of just reasoning with static code, it's able to gather concrete evidence, or it's able to dive deep into traces to uncover performance bottlenecks."

Mixpanel is building toward an incident triage bot. The goal: whenever there's a page, automatically open an agent session, pull in data from Honeycomb alongside GCP logs and Kubernetes, and surface as much context as possible before a human even opens their laptop. "How do I surface as much context as possible to an engineer?" is how Eddie described it. The Honeycomb MCP server is a core component of that.
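The shape of that flow can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Mixpanel's bot: the fetcher functions are hypothetical stand-ins for real Honeycomb, GCP logging, and Kubernetes integrations, and no real API is called. The one design point it shows is that a failing source gets recorded rather than blocking the summary.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TriageContext:
    """Everything gathered for one page, keyed by source name."""
    alert: str
    findings: Dict[str, List[str]] = field(default_factory=dict)
    errors: Dict[str, str] = field(default_factory=dict)


def gather_context(
    alert: str, sources: Dict[str, Callable[[str], List[str]]]
) -> TriageContext:
    """Query every source for the alert; record failures instead of raising."""
    ctx = TriageContext(alert=alert)
    for name, fetch in sources.items():
        try:
            ctx.findings[name] = fetch(alert)
        except Exception as exc:  # one broken source should not block triage
            ctx.errors[name] = str(exc)
    return ctx


def summarize(ctx: TriageContext) -> str:
    """Render the gathered context as text for the on-call engineer."""
    lines = [f"Alert: {ctx.alert}"]
    for name, items in ctx.findings.items():
        lines.append(f"[{name}]")
        lines.extend(f"  - {item}" for item in items)
    for name, message in ctx.errors.items():
        lines.append(f"[{name}] unavailable: {message}")
    return "\n".join(lines)


# Hypothetical fetchers; real ones would call the respective services.
def fake_traces(alert: str) -> List[str]:
    return ["p99 latency up on /checkout"]


def fake_logs(alert: str) -> List[str]:
    raise TimeoutError("log query timed out")


def fake_k8s(alert: str) -> List[str]:
    return ["pod checkout-7f9 restarted 3 times"]


ctx = gather_context(
    "checkout latency alert",
    {"traces": fake_traces, "logs": fake_logs, "kubernetes": fake_k8s},
)
print(summarize(ctx))
```

In practice the summary would be handed to the agent session as its starting context instead of being printed.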

StarSling (which builds faster, cheaper GitHub Actions via a self-improving CI loop) has gone the furthest down this path. Their Co-founder Daniel Worku described an interesting approach: start with the Honeycomb MCP server, layer on the Honeycomb skill that wraps the MCP with sub-agents and prompts, then build a custom skill on top of that with specific context about how their architecture works. "When we have an incident, I can just run that skill. I'm like, 'what's going on?' And we usually get an intelligent answer out the backend."

The progression from raw MCP access to a bespoke, context-rich skill is something we expect to see more teams working through.

BubbleUp for Bubble

One thing Shogo Wada at Bubble told us deserves more attention. Bubble is a no-code platform, which comes with its own challenges: AI doesn't know the company’s proprietary language, which makes instrumenting agents harder. But for their backend systems, Honeycomb has become useful in a specific organizational way.

"Before Honeycomb, we were using logs and metrics. There were a few engineers who could see the patterns in the dashboard and come to a conclusion. But they needed to spend years at the organization to get there. With Honeycomb, anybody can see the correlation and come to the conclusion and go from there."

BubbleUp, specifically, comes up a lot in conversations like this. Finding the signal in high-cardinality data used to require deep institutional knowledge. When that knowledge is encoded in a tool anyone can use, you stop depending on the two people who've been around long enough to read the tea leaves.
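To make the idea concrete, here is a simplified sketch of the kind of comparison involved: for each attribute, measure how often each value appears among the slow events versus everything else, then rank the biggest differences. It illustrates the general technique only. It is not Honeycomb's implementation, and the event fields and the 500 ms cutoff are made up for the example.

```python
from collections import Counter
from typing import Dict, List, Tuple

Event = Dict[str, object]


def value_shares(events: List[Event], attr: str) -> Dict[object, float]:
    """Fraction of events carrying each value of attr."""
    if not events:
        return {}
    counts = Counter(e.get(attr) for e in events)
    return {value: n / len(events) for value, n in counts.items()}


def rank_differences(
    selected: List[Event], baseline: List[Event], attrs: List[str]
) -> List[Tuple[float, str, object]]:
    """Rank (attr, value) pairs by over-representation in the selection."""
    ranked = []
    for attr in attrs:
        sel = value_shares(selected, attr)
        base = value_shares(baseline, attr)
        for value, share in sel.items():
            ranked.append((share - base.get(value, 0.0), attr, value))
    return sorted(ranked, key=lambda row: row[0], reverse=True)


events = [
    {"duration_ms": 40, "region": "us-east", "build": "a1"},
    {"duration_ms": 55, "region": "eu-west", "build": "a1"},
    {"duration_ms": 900, "region": "us-east", "build": "b2"},
    {"duration_ms": 1200, "region": "eu-west", "build": "b2"},
    {"duration_ms": 60, "region": "us-east", "build": "a1"},
    {"duration_ms": 45, "region": "eu-west", "build": "b2"},
]

# Split into the "interesting" selection and the baseline.
slow = [e for e in events if e["duration_ms"] > 500]
rest = [e for e in events if e["duration_ms"] <= 500]

top = rank_differences(slow, rest, ["region", "build"])[0]
print(top)
```

Here the slow requests are split evenly across regions but all come from one build, so the build attribute ranks first. No one needed to know in advance which dimension to look at.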

Reactive now, proactive next

Most teams are honest that their current AI-assisted observability workflows are still reactive. Yi-an at Gem said as much: "A majority of it is reactive, but our team is also currently going through a revamp of our frontend observability stack. We're adopting OpenTelemetry, and with this transition, we expect to first standardize and then switch from reactive to proactive monitoring."

That's the arc we keep seeing: instrument properly, standardize on OpenTelemetry, get the data rich enough that AI can start surfacing things before a customer notices. Right now, most teams are using AI to go faster once an alert fires. The teams further along are working toward AI that catches things before the alert even needs to fire.

The infrastructure needs to be there first, though. You can't ask an agent to reason about your system if your telemetry is incomplete, inconsistent, or siloed across three tools that don't talk to each other.
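One rough way to spot incomplete or inconsistent telemetry before pointing an agent at it is to audit how often the attributes you expect are actually present. The sketch below assumes events are plain dictionaries; the attribute names and the 95% threshold are placeholders chosen for illustration, not a Honeycomb or OpenTelemetry requirement.

```python
from typing import Dict, Iterable, List, Sequence

# Placeholder attribute names; substitute whatever your services should emit.
REQUIRED = ("service.name", "trace.id", "http.route")


def coverage(
    events: Iterable[dict], required: Sequence[str] = REQUIRED
) -> Dict[str, float]:
    """Share of events that carry a non-empty value for each attribute."""
    rows = list(events)
    if not rows:
        return {name: 0.0 for name in required}
    return {
        name: sum(1 for e in rows if e.get(name) not in (None, "")) / len(rows)
        for name in required
    }


def gaps(report: Dict[str, float], threshold: float = 0.95) -> List[str]:
    """Attributes whose coverage falls below the threshold."""
    return sorted(name for name, share in report.items() if share < threshold)


sample = [
    {"service.name": "api", "trace.id": "t1", "http.route": "/checkout"},
    {"service.name": "api", "trace.id": "t2", "http.route": ""},
    {"service.name": "worker", "trace.id": "t3"},
    {"service.name": "worker", "http.route": "/jobs"},
]

report = coverage(sample)
print(report)
print(gaps(report))
```

Attributes that show up in the gaps list are the ones an agent would have to guess about.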

What this actually requires

Rich, high-cardinality telemetry is the foundation everything else is built on. Observability is only as useful as the data behind it.

The good news is that the teams doing this well aren't running exotic infrastructure. They rely on OpenTelemetry, distributed traces with enough context attached, and SLOs that measure the customer experience rather than just server health. The AI tooling on top of that is increasingly accessible, but the data quality underneath it still requires deliberate investment.
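As a worked example of an SLO tied to customer experience rather than server health: count a request as "good" only if it succeeded and was fast enough for the user, then compare the share of good requests to the target. The 1000 ms threshold, the 99% target, and the field names are illustrative assumptions.

```python
from typing import Dict, List


def is_good(event: dict, max_ms: float = 1000.0) -> bool:
    """A request is good if it did not fail server-side and was fast enough."""
    return event["status"] < 500 and event["duration_ms"] <= max_ms


def slo_report(events: List[dict], target: float = 0.99) -> Dict[str, float]:
    """Compute the SLI and the fraction of the error budget consumed."""
    total = len(events)
    good = sum(1 for e in events if is_good(e))
    bad = total - good
    sli = good / total if total else 1.0
    allowed_bad = (1 - target) * total  # bad events the target tolerates
    if allowed_bad:
        budget_used = bad / allowed_bad
    else:
        budget_used = 0.0 if bad == 0 else float("inf")
    return {"sli": sli, "budget_used": budget_used}


# 1000 requests: 996 fine, 3 too slow, 1 server error.
events = (
    [{"status": 200, "duration_ms": 120}] * 996
    + [{"status": 200, "duration_ms": 2500}] * 3
    + [{"status": 503, "duration_ms": 80}]
)

report = slo_report(events)
print(report)
```

With a 99% target, 1000 requests allow about 10 bad ones; 4 bad requests put the SLI at 99.6% and consume roughly 40% of the budget. Note that the three slow requests returned 200 and would look healthy to a check that only watches status codes.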

If your AI coding agents are already shipping 50% more code, that investment is probably due.

source & further reading

honeycomb.io (original article)
Observability Engineering (2nd Edition)
Instrumenting AI Agents for the Agent Timeline: A Practical OpenTelemetry Guide
Honeycomb Incident Report: Kafka Maintenance on May 4 and 7, 2026