What is Your DORA Dashboard Really Telling You?

A new analysis warns that DORA dashboards may provide misleading quality metrics because most organizations cannot accurately measure change failure rate, leading to risky speed-versus-quality decisions, especially as AI adoption accelerates. The author argues that change failure must be deliberately captured by teams, not mined from existing data, and that underinvestment in DevOps undermines meaningful feedback loops.

What is Your DORA Dashboard Really Telling You about quality?

You might have a DORA dashboard… but are you getting only half the story? The problem with change failure rate isn’t the number of failures you see; it’s that your organisation probably can’t really measure it, and that makes any push to ship faster with AI risky.

The short version

Here’s the test for whether this is a problem for you: can you pinpoint the specific deployment that caused your last production incident without anyone having to read a ticket or debug things to work it out? If the answer is no, then your change failure rate is probably just a guess and you’re making speed versus quality calls based on only half the picture. That leads to a bigger risk of reputational, legal, regulatory and cost impacts for your company.

If you take one thing away from this post, know that change failure is something teams have to deliberately capture, and they need to be empowered and supported in doing that.

Engineering teams want to ship and deliver value faster, with AI adoption being the main way that we’re now improving this capability. But as I’ve said when discussing AI adoption readiness (https://cakehurstryan.com/2026/06/12/ai-readiness-radar/), teams need meaningful feedback loops to make sure that delivering faster isn’t destroying value through AI slop and failure. The obvious answer is to use your probably already implemented DORA dashboard to get a steer on quality, but from my observations, underinvestment in DevOps will mean you don’t get a meaningful picture from it, and that’ll lead you to make the wrong decisions.

The problem you have with DORA

Note: I’m going to use change failure rate as my worked example throughout, but recovery time (how long it takes to recover from a failed deployment) has the exact same disease. You can’t measure how long a recovery took if you’ve never recorded when the failure started and when service came back.
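To make the recovery time point concrete, here is a minimal Python sketch. The `Deployment` record, its field names and the release IDs are all hypothetical and not taken from any particular tool; the only thing it is meant to show is that recovery time can be calculated when both timestamps were recorded and is simply unknown when they weren’t.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record shape)."""
    deploy_id: str
    deployed_at: datetime
    # Both timestamps have to be recorded by the team; nothing creates them for free.
    failure_started_at: Optional[datetime] = None
    service_restored_at: Optional[datetime] = None


def recovery_time(d: Deployment) -> Optional[timedelta]:
    """Return how long recovery took, or None if it was never recorded."""
    if d.failure_started_at is None or d.service_restored_at is None:
        return None  # no timestamps means no measurement, only a guess
    return d.service_restored_at - d.failure_started_at


recorded = Deployment(
    "rel-101",
    deployed_at=datetime(2025, 1, 6, 9, 0),
    failure_started_at=datetime(2025, 1, 6, 9, 15),
    service_restored_at=datetime(2025, 1, 6, 10, 0),
)
unrecorded = Deployment("rel-102", deployed_at=datetime(2025, 1, 7, 9, 0))

print(recovery_time(recorded))    # 0:45:00
print(recovery_time(unrecorded))  # None
```

Returning `None` rather than zero is a deliberate choice in this sketch: an unrecorded recovery should show up on a dashboard as missing data, not as an instant recovery.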
For any readers unaware, DORA (DevOps Research and Assessment) metrics are the industry standard for software delivery performance. They’re used to track team throughput (deployment frequency and lead time) as well as product stability (change failure rate and recovery time), basically giving a speed vs. quality assessment for changes made by your teams. From my observations, many teams have a gap where change failure rate and recovery time per deployment get skipped because the foundations for tracking them just aren’t there.

DORA is a form of observability or shift-right testing (testing in production; https://cakehurstryan.com/2025/12/05/why-arent-we-talking-about-shift-right-in-quality-engineering/), something that I’ve said is the hard part of Quality Engineering for teams. It’s not just me: Nicola Sedgwick calls out a whole observability quadrant in her Quality Radar (https://medium.com/cazoo/quality-radar-a-new-way-to-visualise-quality-f1131668cf95), a tool for visualising quality maturity, specifically because most teams don’t know how to solve this as a testing problem (https://www.youtube.com/watch?v=lFbFKEwlK9w&list=PLKBhokJ0qd3 Qms3DloAbdq0zTGLQ0pFE&t=1s).

You can’t track these measures unless specific time and effort are put into solving this problem. Teams need to set up specific data for tracking change failure; it can’t really be meaningfully mined from other naturally occurring team data.

Teams don’t do DevOps

DevOps means an engineering team owns the deploying and running of a product in production; it’s not handed over to another team (deployments / support engineering) to manage. A team that does this can meaningfully track a failed deployment and, importantly, create the data that feeds the change failure rate by manually marking a deployment as failed.
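As a rough illustration of what “manually marking a deployment as failed” could look like as data, here is a minimal Python sketch. The `Release` record, the `mark_failed` helper and the example notes are hypothetical and only there for illustration; the calculation itself is the usual one of failed deployments divided by total deployments.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Release:
    """A deployment the team owns (hypothetical record shape)."""
    release_id: str
    failed: bool = False    # set deliberately by the team, never inferred
    failure_note: str = ""  # why the team marked this release as failed


def mark_failed(release: Release, note: str) -> None:
    """The deliberate act of capture: a person decides this release failed."""
    release.failed = True
    release.failure_note = note


def change_failure_rate(releases: List[Release]) -> Optional[float]:
    """Failed deployments divided by all deployments; None with no record."""
    if not releases:
        return None  # no deployment record means there is no rate to report
    failures = sum(1 for r in releases if r.failed)
    return failures / len(releases)


releases = [Release(f"rel-{n}") for n in range(1, 11)]  # ten deployments
mark_failed(releases[2], "rolled back: checkout errors")
mark_failed(releases[7], "fixed forward: broken copy on pricing page")

print(change_failure_rate(releases))  # 0.2
print(change_failure_rate([]))        # None
```

Both inputs matter here: without the list of releases there is nothing to divide by, and without someone calling something like `mark_failed` the rate will read as a reassuring zero.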
Change failure is something deliberately marked by teams, as opposed to a naturally occurring data point; if a team doesn’t have the process or data hygiene to do this then it’s really difficult to track change failure. Yes, you could use an agent to track this, or try to reverse engineer failure from support tickets and escaped defects, but that also needs amazing data hygiene.

A siloed ops team likely won’t have the context to know what a meaningful failure is, won’t have the need to track development change failure, or might even have its own separate processes and deployments, which blocks tracing failures back to the changes that were made. Making change failure rate tracking a shared concern across two possibly very siloed teams requires complex and careful management and communication.

There’s a second reason that owning your deployments matters. Change failure rate is a rate (it’s failures per deployment), so you need a clean count of deployments to divide by, not just a count of failures. A team that owns its own pipeline has that record… every release, when it went out and what was in it. A team that’s handed deployment to someone else often gets a pulled-together, integrated release candidate, so it can’t see the number of deployments, let alone link failures back to the release that caused them. No deployment record and no link back to the change means you don’t have a rate; you’ve just got a pile of disconnected incidents.

Teams don’t know what failure means

This is something I mention in my other blogs and talks… engineering teams don’t know what good quality or failure looks like. They might think in limited terms like “did it break everything” or “is the system up” rather than thinking holistically about non-functional requirements, UX, market fit and edge cases.
If your teams say that change failure just means an outage, then your quality signals are limited to knowing that you have software that stays up (high availability, but not much else). To move beyond just seeing failure as a system crashing, teams need to build an agreed view of what good enough means, which holistically tracks what counts as failure:

- Full outage that needs a rollback?
- Something we can fix forward?
- Copy or UX failures?
- Non-functional issues (security, maintainability, accessibility…)?
- Regression issues?
- Deploying something that customers don’t like or engage with?
- Dips in net promoter score?
- Too expensive to run?

DORA’s classic definition of change failure (a deployment that degrades service and needs remediation) is deliberately narrow… treat that as the floor and not the ceiling for quality.

Without having this clearly articulated, teams cannot create a view of what failure means. It needs a level of realism mixed with pragmatism: just saying a complete outage is a failure means we miss the opportunity to learn about quality, but saying every small thing is a failure is gold plating and impacts team success and velocity.

To reach a holistic and pragmatic view of good enough, a team should be supported by the wider organisation to define what they think a good product means. In my experience, many teams don’t have a clearly codified standard for success and failure set as a foundation. This is risky because success is then defined by convention, or at the whims of whatever the person or team feels that day, making any tracking of it untrustworthy.

Leaders use this incorrectly

DORA metrics should be an in-team feedback loop and not a cross-team or cross-engineer leaderboard. Leaders using these metrics and signals to rank teams have not understood their purpose and risk teams hiding actual quality signals from them.
You want transparent reporting so you know whether team ways of working are supporting safe deployments at pace or impacting quality, which in turn supports informed decision making; that comes from fostering the psychological safety to report transparently, not from a competitive culture. If leaders turn change failure rate into a ranking, teams will just stop marking deployments as failed (remember, the data isn’t naturally occurring and has to be deliberately tracked by teams themselves). That’s Goodhart’s law in action: as soon as the measure becomes a target it stops being a useful measure. You don’t get better quality… just a better score on a report at the expense of demoralised teams and poisoned signals, leaving you with a falsely positive, speed-only picture.

Half the measure leads to half the outcome

Not being able to meaningfully track change failure rate means your DORA dashboard is incomplete and is likely focused just on speed without the balance of quality. That means you’ll likely push to optimise on speed without quality, resulting in delivering bad results faster.

I’ve seen leadership teams take half the measures (deployment frequency and lead time) and make decisions and act upon them as if they’re the only thing that matters. They then pushed teams to optimise on cycle time and throughput without the safety net of quality metrics (or worse, without understanding anything about quality), leading the team to cut corners and release lower quality features and code. With no signals to tell leadership about this, no action gets taken to solve it and things get worse.

This matters a lot in the current state of the industry; doing more with less and starting AI adoption (https://cakehurstryan.com/2026/06/12/ai-readiness-radar/) across engineering means that teams are being pushed to deliver more and faster.
Any organisation that’s unable to measure change failure rate meaningfully is putting itself at massive business risk when it tries to deliver faster:

- Reputational damage from poor products.
- Legal non-compliance from security or accessibility holes.
- Failure to comply with regulatory body standards.
- Compounding engineering and maintainability issues that make supporting a product slower.
- Higher support costs.
- Not knowing what or how to fix problems.

These risks get worse over time; the likelihood and impact trend upwards. Initial speed gains might appear fine to begin with, but the problems compound over months or years, leaving you with systemic organisational and engineering issues to manage.

What can be done?

Check your foundations

Use a Quality Radar or AI Readiness Radar to assess where your teams are at with being able to provide change failure rate metrics meaningfully.

- Do they own their deployments?
- Have they documented what good enough (and so failure) looks like?
- Do they link incidents, defects or support back to releases?
- Is change failure marked on releases routinely?
- Are rollbacks and fixes marked and discoverable?

Understanding whether DORA metrics can be extracted and checked is an important first step. This needs to be a blameless audit to understand where people are (not to punish them) so that foundations can be retrofitted and built upon.

If foundations are not there, then look at the signals you’ve been using about team performance with a critical eye. Do you get a meaningful signal on the safety of releases at pace, or are the decisions being made uninformed and putting the business at risk?

Look at other signals of quality

Your organisation might be informed by other signals of quality; these may be being used in place of change failure rate.

- Escaped defect count.
- Engineering quality reports.
- Test results and reports.
- Support burden.
- Customer feedback.
If meaningful links between defects or rollbacks and releases exist, you may be able to reverse engineer change failure rate for team deployments. But this relies on good data hygiene in other places, such as linking defects and incidents directly back to releases.

Look to understand what underpins these signals: are they a real reflection of what good enough means, and are they holistically reported alongside other metrics? Leadership teams should also look to understand how they’re interacting with such signals: are they rubber stamping or actively trying to understand quality, and are decisions being driven from a view of quality alongside speed?

Build DORA capability

Make tracking quality a first-class concern for your organisation and empower teams to do so. Know that tracking change failure is something teams have to actively do per release and that this needs to be based on codified standards of good enough.

- Create and set meaningful organisational and team quality standards.
- Empower teams to track and document failure.
- Create an environment of psychological safety for reporting failure.
- Set up observability and tooling to measure for failure (as part of releases and beyond).
- Ensure support tickets and escaped defects go back to the engineering team for fix.
- Ensure data hygiene is in place and maintained for releases and JIRA tickets (e.g. linking issues back to releases and marking failure in a standard way).

Know that building this capability will not guarantee backfilling metrics; you will likely only be able to track things going forwards.

So what is your DORA dashboard really telling you?

If your teams don’t own their deployments, haven’t agreed what good looks like and aren’t deliberately marking their failures, then the honest answer is you’re only getting half the picture. You’ve got a speed number and a quality number that’s either mostly noise or non-existent. If that’s the case then you’re likely making partially informed decisions as if both are solid.
The fix isn’t a better dashboard or a new tool… it’s the foundations underneath it. Get those right and change failure rate becomes what it was always meant to be: an honest team feedback loop that tells you whether you’re shipping fast and well. Get them wrong and you’re just measuring your speed towards building the wrong thing, and possibly a lot of AI slop.

Thanks for taking the time to read

If you found this helpful and would like to learn more, be sure to check out my other posts on the blog. You can also connect with me on LinkedIn for additional content, updates and discussions; I’d love to hear your thoughts and continue the conversation.