After a core rollback, halt the rest — a safety design we arrived at the hard way

A developer on a WordPress maintenance automation project redesigned the system to halt all remaining plugin updates after a core update rollback, after discovering that the original "keep going" design caused false-positive logs and masked unrelated bugs. The new approach stops subsequent updates when core breaks, defers them to the next maintenance run, and preserves rollback records and notifications for clearer operational traceability.

In WordPress maintenance automation, you inevitably run into points where you have to decide: keep going, or stop right here? One that took us a long time to get right was this: when a WordPress core update goes wrong and gets rolled back, should the remaining plugin updates continue, or stop?

We eventually switched to the "stop" design, but we started with "keep going," and several traps surfaced only after running it in production. Here's how the redesign happened.

The outcome of a core update, viewed through a rollback lens, falls into three patterns. Case 1 (rollback succeeded + recovery confirmed) is clearly "keep going," and Cases 2 and 3 are clearly "abnormal." But what to do next isn't as simple as that framing suggests.

The original design kept maintenance running through Cases 2 and 3:

- skip_http_check = True: disable the HTTP check after a core anomaly
- keep going: remaining plugin / theme / translation updates still run

The reasoning was: "Once core is broken, of course a plugin update will return 5xx, so disable the HTTP check and we won't mistakenly roll back unrelated plugins." In practice, this did reduce false-positive rollbacks. But as the tool ran in real environments, two problems emerged.

First, the logs reported false successes. If 20 plugins are updated while core is still broken, the log records "20 updates succeeded." The site is still broken, but the log reads as healthy. The next day, when the agency tries to trace where it broke, there's no way to tell whether core was the cause, one of the later updates was, or some combination of them. A safety mechanism intended to reduce noise was actually inflating investigation cost.

Second, unrelated bugs were masked. Setting skip_http_check = True disables the HTTP check uniformly, including for plugin-side bugs that have nothing to do with core (memory leaks, dependency conflicts, PHP version incompatibility). What was supposed to be "skip the HTTP check while core is broken" was actually "make all anomalies in this window invisible." That's equivalent to intentionally disabling a safety device.
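To make both problems concrete, here is a minimal Python sketch of the old "keep going" loop. Only `skip_http_check` comes from the original design. `Site`, `CoreOutcome`, `keep_going` and the plugin names are hypothetical, and so is the rule that a broken core or a buggy plugin shows up as an HTTP 500. The sketch is not the project's actual code.

```python
from dataclasses import dataclass, field
from enum import Enum


class CoreOutcome(Enum):
    # Hypothetical labels: Case 1 = rollback succeeded + recovery confirmed;
    # Cases 2 and 3 are the two abnormal outcomes.
    CASE_1_RECOVERED = 1
    CASE_2_ABNORMAL = 2
    CASE_3_ABNORMAL = 3


@dataclass
class Site:
    core_broken: bool
    buggy_plugins: set[str] = field(default_factory=set)  # bugs unrelated to core
    installed: list[str] = field(default_factory=list)

    def http_status(self) -> int:
        # Assumption: a broken core or any installed buggy plugin yields a 5xx.
        if self.core_broken or self.buggy_plugins & set(self.installed):
            return 500
        return 200


def keep_going(site: Site, outcome: CoreOutcome, plugins: list[str]) -> list[str]:
    """Old design: after a core anomaly, disable the HTTP check and continue."""
    skip_http_check = outcome is not CoreOutcome.CASE_1_RECOVERED
    log = []
    for name in plugins:
        site.installed.append(name)  # stand-in for applying the update
        status = site.http_status()
        # With the check skipped, a 5xx is ignored and logged as a success.
        if skip_http_check or status < 500:
            log.append(f"{name}: update succeeded")
        else:
            site.installed.remove(name)  # roll back only this plugin
            log.append(f"{name}: rolled back (HTTP {status})")
    return log


plugins = [f"plugin-{i:02d}" for i in range(1, 21)]
site = Site(core_broken=True, buggy_plugins={"plugin-07"})
log = keep_going(site, CoreOutcome.CASE_2_ABNORMAL, plugins)
print(sum("succeeded" in line for line in log))  # 20
print(site.http_status())  # 500
```

With core still broken, all 20 updates are logged as successes. The core-unrelated bug in `plugin-07` also stays installed unnoticed. On a healthy site (Case 1), the same loop catches `plugin-07` and rolls it back.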
Based on these problems, Cases 2 and 3 now stop all subsequent plugin / theme / translation updates entirely:

    if halt_remaining:  # set to True in Case 2 / 3
        # record step_rollbacks first, then early return
        return

The key is that this isn't just "stop and walk away":

- step_rollbacks records are kept
- visual check / browser automation / email notification still run

Case 1 (rollback succeeded + recovery confirmed) continues as before. The site is healthy again, so the precondition for safely running the remaining plugin updates with HTTP checks is intact.

This change means "if core breaks today, all subsequent updates for the day stop." If a single core rollback happens, a scheduled batch of 20 plugin updates gets deferred to the next maintenance run. In the short term, that feels inconvenient. But in real operation, halting while keeping the rollback records and still running the checks and notifications gives far more traceable behavior than the alternative that silently continues while broken. The cost is skipping that day's plugin updates.

Designs that "disable the check during abnormal conditions" can look clever but tend to make anomalies invisible. Stopping the moment something abnormal is detected, and handing the decision back to the agency, generally gives more predictable behavior across the workflow.

When a maintenance-automation design choice is hard to settle, a useful heuristic is: don't try to fix it automatically; communicate it clearly to a human. Surprisingly often, that's what saves the operation.
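The halt-and-defer flow can be sketched the same way. This is another hypothetical Python illustration. Only `halt_remaining` and `step_rollbacks` come from the design above. `run_maintenance`, `apply_remaining_updates`, `RunResult` and the stubbed post-update step names were invented for this sketch. The real update and HTTP-check calls are replaced by list appends.

```python
from dataclasses import dataclass, field
from enum import Enum


class CoreOutcome(Enum):
    # Hypothetical labels: Case 1 = rollback succeeded + recovery confirmed;
    # Cases 2 and 3 are the two abnormal outcomes.
    CASE_1_RECOVERED = 1
    CASE_2_ABNORMAL = 2
    CASE_3_ABNORMAL = 3


@dataclass
class RunResult:
    step_rollbacks: list[str] = field(default_factory=list)  # kept for tracing
    updated: list[str] = field(default_factory=list)
    deferred: list[str] = field(default_factory=list)  # for the next maintenance run
    post_steps: list[str] = field(default_factory=list)


def apply_remaining_updates(outcome: CoreOutcome, remaining: list[str],
                            result: RunResult) -> None:
    halt_remaining = outcome in (CoreOutcome.CASE_2_ABNORMAL,
                                 CoreOutcome.CASE_3_ABNORMAL)
    if halt_remaining:
        # step_rollbacks were recorded by the caller; defer everything, return early.
        result.deferred.extend(remaining)
        return
    for name in remaining:
        # Stand-in for "apply the update, then run the HTTP check".
        result.updated.append(name)


def run_maintenance(outcome: CoreOutcome, core_rollback: str,
                    remaining: list[str]) -> RunResult:
    result = RunResult()
    result.step_rollbacks.append(core_rollback)  # record rollbacks first
    apply_remaining_updates(outcome, remaining, result)
    # These run whether or not the updates were halted (stubbed as names).
    result.post_steps.extend(["visual check", "browser automation", "email notification"])
    return result


plugins = [f"plugin-{i:02d}" for i in range(1, 21)]
halted = run_maintenance(CoreOutcome.CASE_2_ABNORMAL, "core: rolled back", plugins)
print(len(halted.updated), len(halted.deferred))  # 0 20
print(halted.post_steps[-1])  # email notification
```

The early return only leaves the update step. The rollback record, the post-update checks and the notification all still happen, which is what keeps a halted day traceable.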