{"slug": "did-claude-increase-bugs-in-rsync", "title": "Did Claude increase bugs in rsync?", "summary": "A distributional analysis of every rsync release with bug data shows that Claude-assisted releases are not unusually buggy. The analysis was conducted in response to a viral May 2026 Mastodon post and subsequent GitHub issue accusing the rsync project of introducing regressions through AI-assisted development, which escalated into harassment and death threats against maintainers. The findings contradict widespread claims that Claude-assisted commits caused a decline in the tool's stability.", "body_md": "A simple distributional analysis of every rsync release with bug data. Nothing complicated; it answers only one question: are the Claude-assisted releases unusually buggy?\n\nIn order to avoid accusations of this being \"just Claude defending Claude,\" \"AI slop,\" \"probably all hallucinations,\" etc., I've decided it's probably worth explaining a few key points about how this report was created:\n\nIn late May 2026, rsync blew up. First, an [evidence-free Mastodon post\nwas made](https://mastodon.gamedev.place/@JeremiahFieldhaven/116654345332213390) pointing to a spurious correlation between a regression that particular user experienced\nupon upgrading to a release, and that release having Claude commits in it. It was viewed an unknown number\nof times, but likes and boosts alone passed the thousands mark handily, and it gained significant traction,\nas all spurious\nanti-AI hate does, drawing 58 replies from 32 unique users. One reply raged about \"cognitive surrender\" with\nno evidence; another suggested adding rsync to the famous [open-slopware](https://codeberg.org/small-hack/open-slopware) [blacklist](https://en.wikipedia.org/wiki/Index_Librorum_Prohibitorum). 
From\nthere, it spread to [Hacker News](https://news.ycombinator.com/item?id=48334021), with 81 comments\nfull of mixed dread, anger, and crowing about how this finally proves, once and for all, that no one can use LLMs\nsafely. Among all that was [one particular\ncomment](https://news.ycombinator.com/item?id=48334270) which further spurred\nthe view that the regressions and bugs were caused by Claude.\n\nOn May 30, 2026, this burgeoning outrage coalesced into a single focal point: a GitHub issue\ntitled [\"Please Do\nNot Vibe Fuck Up This Software\"](https://github.com/RsyncProject/rsync/issues/929), opened against the rsync repository. It attached a screenshot of\nthe Mastodon post criticizing the project's use of Claude. That's it. No bug report, no technical content,\nno attempt to actually ascertain if the concern was real or justified; just **350+ comments**\nranging from thoughtful concern to outright\nharassment (the most egregious, unreasonable, and outright violent comments have mostly been deleted since;\nfew thought to preserve them).\n\nThe thread did not stop at words. As is [typical\nfor anti-AI users](https://www.msn.com/en-us/news/technology/animator-jorge-gutierrez-got-death-threats-over-ai-then-he-quit/ar-AA24ph6X), it eventually escalated to\nfantasies of violence. One user posted a now-deleted comment including My Little Pony drawings of themselves\nstrangling the\n\"project janitor that pushed vibecoded commits\".\n\nCompleting the internet outrage cycle, this issue in turn spread to [Hacker News](https://news.ycombinator.com/item?id=48342705), generating hundreds more comments.\nSome [attempted](https://news.ycombinator.com/item?id=48346124) to point at the number of\nregressions after the introduction of Claude\n(*\"The Linux Mint Timeshift tool has an issue open documenting a number of regressions that are currently\nopen on the rsync issues page, that were only introduced post-vibecoding\"*) as evidence that rsync had gotten\nworse. 
Others [pointed out](https://news.ycombinator.com/item?id=48348708) that those regressions\nwere not caused by Claude, and in response, the goalposts were moved again. Over and over, the core theme\nwas one\ncentral claim, repeated everywhere: Claude-assisted development introduced bugs\ninto a previously stable tool. AI is cognitive surrender, is cocaine, is loss of craft, and the\nusers are right to be angry as a result:\n\nPeople are very justifiably angry that a very stable, well trusted tool, has started to immediately go downhill…all because the main dev is vibecoding that software.\n\n— fao_ on Hacker News\n\nHowever, this doesn't have to be a question settled only on the basis of, ironically, vibes. This is something that could be, at least to a degree, empirically tested. Some even pointed that out:\n\nOn Lobste.rs, under [the Medium\nessay Tridge himself posted in response](https://medium.com/@tridge60/rsync-and-outrage-d9849599e5a0), some users like `boramalper` finally began to\nactually ask for evidence one way or another:\n\nIt'd be interesting if someone actually did a timechart of regressions after each release (if at all possible) to see if the number actually went up recently or not.\n\n— boramalper on Lobsters\n\nUser `bitshift` replied: *\"I would also love to see such a chart. It wouldn't be completely\ninformative… But at least it would be something objective we could measure.\"*\n\n**This analysis is that chart.** Or, well, as best as it can be made, given the limitations of\nthe data (see the previous section).\n\nThe analysis uses a single metric: **bugs per 10 commits** (bugs/10c). For each release, divide\nthe number of bugs attributed to it by the number of commits in its range, then multiply by 10. This\nnormalizes for release size.\n\nEvery commit on the default branch was ordered by committer date to produce a sequential timeline. Each git tag points to a specific commit in this timeline. 
A release's range is all commits between the previous tag and its own tag. Pre-release tags (\"pre\", \"rc\") are skipped as boundaries and absorbed into their final release. Every commit belongs to exactly one release.\n\nBug counts come from three sources: GitHub issues, mailing-list bug reports, and Bugzilla entries.\n\nGitHub issues and mailing-list bugs are attributed to the most recent release that shipped before the bug was reported. For Bugzilla, each entry has a \"Version\" field that explicitly states which release the bug was reported against, and bugs are attributed to that release.\n\nWhy group commits by release, bugs by release, and then ascertain the correlation (or lack thereof) between Claude commits and bugs through the intermediary of releases? This is for two reasons.\n\nFirst, the critics' claim is itself made in terms of releases: that\nhaving any Claude commits in a release makes the whole release noticeably buggier,\nnot just that Claude-authored commits may introduce more bugs. The latter is a different metric, because\n*later Claude- or human-authored commits could correct for those bugs within the same release*;\nnobody would then notice them in the release, and overall it wouldn't matter to users. Additionally,\nit's simply important, as stated elsewhere, to meet the critics' claim where it's at. 
If this forces\nthem to make their claims more nuanced (or otherwise move the goalposts), then\n*mission accomplished*.\n\nSecond, it's a problem of attribution: the vast, vast majority of bug reports do not state exactly which commit\ncaused them, because finding out would require extensive research and analysis that is often not worth it compared to\nsimply fixing forward. Even if that analysis *was* done (via something like\n`git bisect`), it wouldn't necessarily result in anything useful, or anything at all.\nMany bugs can result from a combination of multiple commits, often separated significantly over time,\nwhere it's unclear whether one commit or the other really introduced the bug. Or, one commit can reveal\nseveral latent bugs introduced by other commits at once, and so on.\n\nThe critics' claim is simplistic, absolute, and universalistic: the rate of bugs in the Claude-exposed\nreleases went up. Therefore, the simplest honest response is to analyze precisely what is being claimed:\nbugs, commits, releases, and Claude-exposed commits. If the Claude releases sit in the middle of the\nhistorical distribution, the burden shifts to the critics to explain why this particular middle is somehow\nworse than all the other middles that came before it. Even if that results in shifting the conversation\ntoward a more nuanced discussion of the *quality* and *type* and *user impact* of the bugs in\nthe releases, it will already have been a major win for the pro-AI crowd, and a shifting of the goalposts\nfor the anti-AI crowd, and then we can do further analysis based on that. And the ball's in the anti-AI\ncourt for that game.\n\nI'm aware that this metric does not control for commit complexity, security intensity, or bug severity. It does not distinguish between a one-line typo fix and a CVE patch. It is a blunt instrument. But the critics' accusation is also blunt: \"Claude is making things worse.\" A blunt instrument is what is required in response. 
Blood begets blood.\n\nBefore we jump into deeper analysis, let's just look at the two Claude releases themselves, to get a sense for them:\n\nIf that doesn't look like a red flag to you, you'd be right.\n\nSo the question is: are the Claude releases unusually buggy, or could you easily pull a group just as bad\nout of the historical distribution by dumb luck? The way you answer that question statistically is an\n[exact permutation test](https://en.wikipedia.org/wiki/Permutation_test), which\njust enumerates all possible pairs of releases and asks: what fraction have a\nmean bug rate as bad or worse than the one we actually observed? That fraction is the p-value under the\nnull hypothesis that the Claude releases are no different from any other pair.\n\nWhat this p-value tells us is that the hypothesis that Claude makes releases worse has, at least so far, about as much predictive power as a coin flip: if you closed your eyes and picked 2 releases at random, you'd do as badly or worse nearly half the time. There's nothing unusual about the Claude group.\n\nThe permutation test asks: how likely is it that a random group of releases scores as badly as the\nClaude group? But there's another way to pose the question:\nare Claude releases more likely than non-Claude releases to fall above the historical\nmedian? That's a textbook 2×2 contingency table, and the standard test for it is\n[Fisher's exact test](https://en.wikipedia.org/wiki/Fisher%27s_exact_test).\n\n| | ≤ median | > median |\n|---|---|---|\n| Non-Claude | 18 | 17 |\n| Claude | 1 | 1 |\n\nIn case you're not convinced, here's a visual aid, showing where these releases fall in the distribution of all prior releases:\n\n**How to read this graph:** Each dot is a release. 
The **shaded green band** is the\n[interquartile range](https://en.wikipedia.org/wiki/Interquartile_range):\nthe middle 50% of historical releases, from **0.65** to **6.82** bugs/10c.\nThe darker regions on either side are the lower and upper quarters.\n\nThis is another way of saying the same thing the previous two tests said, but more\nintuitively: that the Claude releases (green dots) **both fall inside the IQR**;\ntheir bug rates fall within the typical historical range. (Check the numbers if you don't\nbelieve the graph.)\n\nThe obvious counterargument is that maybe *earlier* rsync releases were less in maintenance mode, and so\nhad more bugs, but *recent* rsync releases have been more stable, so comparing the two Claude-exposed\nreleases to the full historical distribution is masking the fact that they're actually outliers for their\nregime. Luckily, there's a way to test this statistically.\n\nYes, the historical mean (7.59 bugs/10c) was driven by a bimodal distribution: v2.x releases average 2.04 bugs/10c; v3.x releases average 11.46. But even within the v3.x regime, the Claude releases sit in the middle of the pack or better:\n\nSo the regime-shift argument doesn't just fail; it fails *backwards*. The v3.x era has a\nmuch *higher* mean bugs/10c than v2.x. If you restrict the comparison to v3.x only, Claude releases\ndon't stand out at all, and one of them is *better* than most. The only way to make Claude look\nlike an outlier is to compare it against a *quieter* era and then blame the shift on Claude, when\nthe data says the shift predates Claude entirely.\n\nWe can further test whether there are meaningfully different regimes in the version history, and thus\nwhether using the full historical data is valid, by doing a\n[runs test](https://en.wikipedia.org/wiki/Wald%E2%80%93Wolfowitz_runs_test).\nIf such regimes existed, the runs test would detect non-random clustering. It doesn't:\n\nHere's my favorite part, though. 
Digging into the data, one of the first things that jumped out at me with\nblinding clarity was that *the worst release, by far, in rsync history* was *entirely prior to the\nintroduction of Claude*:\n\nAnd yet **nobody noticed.** There was no AI to blame **so** there was no GitHub\nissue\nwith\n300 comments, no death threats, no threats to fork or move to openrsync. A maintainer\nshipped a broken release and fixed it, just like normal. The only thing that made v3.4.3\nspecial was the availability of an enemy *everyone had already decided to hate*.\n\n| Release | Bugs | Commits | Claude | Bugs/10c | Percentile |\n|---|---|---|---|---|---|\n| v2.4.6 | 2 | 13 | 0 | 1.54 | 46th percentile |\n| v2.5.0 | 4 | 73 | 0 | 0.55 | 14th percentile |\n| v2.5.1 | 4 | 69 | 0 | 0.58 | 17th percentile |\n| v2.5.2 | 6 | 117 | 0 | 0.51 | 11th percentile |\n| v2.5.4 | 5 | 21 | 0 | 2.38 | 57th percentile |\n| v2.5.5 | 22 | 88 | 0 | 2.50 | 60th percentile |\n| v2.5.6 | 14 | 239 | 0 | 0.59 | 20th percentile |\n| v2.6.0 | 8 | 267 | 0 | 0.30 | 9th percentile |\n| v2.6.1 | 5 | 444 | 0 | 0.11 | 0th percentile |\n| v2.6.2 | 29 | 17 | 0 | 17.06 | 89th percentile |\n| v2.6.3 | 49 | 381 | 0 | 1.29 | 37th percentile |\n| v2.6.4 | 22 | 760 | 0 | 0.29 | 6th percentile |\n| v2.6.5 | 16 | 146 | 0 | 1.10 | 34th percentile |\n| v2.6.7 | 15 | 649 | 0 | 0.23 | 3rd percentile |\n| v2.6.8 | 12 | 72 | 0 | 1.67 | 49th percentile |\n| v2.6.9 | 53 | 261 | 0 | 2.03 | 51st percentile |\n| v3.0.0 | 64 | 909 | 0 | 0.70 | 26th percentile |\n| v3.0.1 | 6 | 102 | 0 | 0.59 | 23rd percentile |\n| v3.0.2 | 10 | 9 | 0 | 11.11 | 83rd percentile |\n| v3.0.3 | 22 | 55 | 0 | 4.00 | 71st percentile |\n| v3.1.0 | 170 | 571 | 0 | 2.98 | 63rd percentile |\n| v3.1.1 | 68 | 66 | 0 | 10.30 | 77th percentile |\n| v3.1.2 | 55 | 57 | 0 | 9.65 | 74th percentile |\n| v3.1.3 | 87 | 61 | 0 | 14.26 | 86th percentile |\n| v3.2.0 | 24 | 304 | 0 | 0.79 | 29th percentile |\n| v3.2.1 | 9 | 63 | 0 | 1.43 | 43rd percentile |\n| v3.2.2 | 20 | 58 | 
0 | 3.45 | 66th percentile |\n| v3.2.3 | 166 | 157 | 0 | 10.57 | 80th percentile |\n| v3.2.4 | 29 | 213 | 0 | 1.36 | 40th percentile |\n| v3.2.5 | 12 | 53 | 0 | 2.26 | 54th percentile |\n| v3.2.6 | 11 | 28 | 0 | 3.93 | 69th percentile |\n| v3.2.7 | 128 | 60 | 0 | 21.33 | 94th percentile |\n| v3.3.0 | 76 | 38 | 0 | 20.00 | 91st percentile |\n| v3.4.0 | 6 | 60 | 0 | 1.00 | 31st percentile |\n| v3.4.1 | 102 | 9 | 0 | 113.33 | 97th percentile |\n| v3.4.2 | 4 | 50 | 9 | 0.80 | 31st percentile |\n| v3.4.3 | 23 | 34 | 28 | 6.76 | 74th percentile |\n\nSo, why do people feel like they've been betrayed? A lot of it is just sheer, blind outrage at the use of LLMs. However, there are some confounders that might have caused people to feel that way:\n\nOn the HN thread, user `zos_kia` [pointed](https://news.ycombinator.com/item?id=48347728) at the confound directly:\n\nFrom a cursory look, it looks like a security fix in response to a CVE surfaced a coding error which has been present in the code since 2007. This is so banal that it's actually hilarious to see people lose their shit over it.\n\n— zos_kia on Hacker News\n\nOn Lobsters, user `jbert` [spelled\nout](https://lobste.rs/s/k1b0za/rsync_outrage#c_2iowov) the causal chain:\n\nThe trigger for the increased volume of changes (and hence increased number of regressions) was the influx of (mostly) LLM-enabled security issues. i.e. the causal chain was: LLMs → more known security issues → more changes needed than usual → more regressions than usual.\n\n— jbert on Lobsters\n\nEssentially, this isn't a \"Claude\" problem, it's a \"more security work\" problem, something that\n[Tridge himself confirmed](https://medium.com/@tridge60/rsync-and-outrage-d9849599e5a0) in\nhis response, describing how a flood of AI-generated CVE reports forced rapid,\nextensive\nchanges to rsync's attack surface.\n\nBut, as with all things AI, it doesn't matter. 
In the end, the outrage isn't about whether rsync is worse\nor better now; it's about people not liking AI, and arguing from *a priori* definitions, not\nempirical results, to the desired conclusion that AI is bad:\n\nLike I said, the author \"tried to balance security against feature regression.\" I don't dispute that he tried. I merely dispute that the chatbots are good at writing code; in fact, they are bad at writing code.\n\nIf the author had approached these security bugs by hand with a mental model (a Naur theory!) which preserves their desired features and functionality then they would have caused fewer regressions...\n\n— Corbin on Lobste.rs\n\nIn response to this sweeping, absolute, causal claim made with no evidence (and in fact, counter to the evidence), based on an old philosophical claim about the epistemology of programming, it is perhaps best to leave the victim of this outrage himself with the final word:\n\n…for the people saying things like \"I'm a PhD from xyz uni and I'm telling you LLMs are just stochastic tools that make everything up and the world will fall apart if you use them\", I'm here to tell you that you are out of date. The world of software engineering has changed dramatically in the last few months. The world of IT security and maintaining software in the face of the flood of reports has completely and utterly changed just in the last few weeks. Anything you learned about this stuff last year might as well be from another planet… Bottom line is I do know (well, roughly!) how LLMs work, but that doesn't make them not useful. 
It does mean you have to be cautious, but I am being cautious, or as cautious as I can be given my desire to be sailing and not dealing with a flood of gunk from so-called internet experts.\n\n— Andrew Tridgell", "url": "https://wpnews.pro/news/did-claude-increase-bugs-in-rsync", "canonical_source": "https://alexispurslane.github.io/rsync-analysis/", "published_at": "2026-06-05 12:43:33+00:00", "updated_at": "2026-06-05 18:43:40.470941+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "ai-safety", "ai-ethics", "ai-products"], "entities": ["Claude", "rsync", "Jeremiah Fieldhaven", "Hacker News", "Mastodon", "open-slopware"], "alternates": {"html": "https://wpnews.pro/news/did-claude-increase-bugs-in-rsync", "markdown": "https://wpnews.pro/news/did-claude-increase-bugs-in-rsync.md", "text": "https://wpnews.pro/news/did-claude-increase-bugs-in-rsync.txt", "jsonld": "https://wpnews.pro/news/did-claude-increase-bugs-in-rsync.jsonld"}}