# Writers Who Use AI Without a Harness Are One Published Article From Disaster

> Source: <https://dev.to/keithjmackay/writers-who-use-ai-without-a-harness-are-one-published-article-from-disaster-3gf2>
> Published: 2026-06-06 23:23:32+00:00

*AI can be tremendously helpful, or can drive you right into Disaster Chasm. Here are some ways to NOT get burned.*

Just over a month ago, Ars Technica fired a staff writer. Benj Edwards, the publication's senior AI reporter, got tripped up by AI.

The published article attributed quotes to a real person. Those quotes were fabricated: not pulled from a transcript, not reconstructed from notes, not collected in an interview. A language model wrote words and put them in someone's mouth. The writer didn't catch it. The interviewee did, after publication. Ars Technica's editor-in-chief addressed it publicly and moved fast. Termination on discovery. No corrective action plan. Gone [1].

The irony writes itself. The argument that follows is harder.

Around the same time, a short essay started circulating: 546 points on Hacker News. Title: "Don't Let AI Write For You." The core argument, distilled: writing is thinking. Hand the writing to a model, and you're not outsourcing prose. You're outsourcing cognition. "It is like paying somebody to work out for you" [2].

That lands, and it's correct as far as it goes. The essay spread because it names something real: the uneasy feeling that AI-generated text isn't just lazier, it's emptier. The writer who lets a model draft their ideas isn't really the author of those ideas. The trust problem is real too. When a document reads as AI-generated, it signals that the sender didn't contend with the material. Bynder surveyed 2,000 consumers in the US and UK and found the gap is trust, not quality: 56% preferred the AI-written version when shown blind, but 52% said they'd disengage when they merely *suspected* AI involvement [3]. The content wasn't worse. The lack of trust was.

I don't disagree with any of this. But the essay sets up a binary that doesn't hold, and stopping there misses the more important question: what does legitimate AI collaboration actually look like?

The Ars Technica incident and the "writing is thinking" critique are describing different problems. Conflating them makes both harder to address.

The journalist's failure was not a thinking failure. It was a verification failure. The fabricated quotes weren't a product of outsourced cognition: they were a product of an unverified draft from a reporter who was under the gun and under the weather, apparently using some unvetted AI validation tools. The model filled in what it didn't know with what sounded plausible. Plausible is not the same as correct. This distinction matters.

**Language models write confident, fluent fabrications.** They've consumed enough interviews, profiles, and reported features to understand the cadence of a well-placed expert quote: the hedge, the technical specificity, the personality tell. When you ask a model to draft a section with relevant expert perspectives, it doesn't return `[QUOTE NEEDED]`. It writes something. It sounds like attribution. It reads like reporting. Ironically, a colleague's AI tool alerted him to a problem and quoted the Bynder stat above, but attributed it to Nielsen, as several secondary sources do. I found the Bynder study. I found nothing from Nielsen speaking to this issue with these statistics.

Stanford researchers found that when asked about specific federal court cases, LLMs hallucinate at least 69% of the time, and models tend toward overconfidence regardless of accuracy, stating fabrications with the same certainty as verified facts [4]. A separate Anthropic interpretability study identified internal circuits that activate hallucination specifically when a model recognizes a name but lacks sufficient information about it: the model knows enough to be dangerous, not enough to be accurate [5].

The model doesn't know the difference between what it has read and what it's plausibly reconstructing. It treats both with equal confidence. Fluent output is not evidence of accuracy. That's true for humans too, but the scale is different.

This is a different failure from ordinary human error. A human who invents a quote does so intentionally, and there's a trail. AI fabrication leaves no trail. The quote exists nowhere except in the model's output and, fatally, the published article.

The "don't let AI write for you" critique is worried about cognitive outsourcing: the writer who hands over thinking and gets back something that approximates thought. That's worth worrying about. Microsoft Research surveyed 319 knowledge workers and found that reliance on AI correlates with reduced critical thinking effort, particularly for users with high trust in the tools [6]. MIT Media Lab went further: in a controlled essay-writing experiment, participants who used ChatGPT showed the weakest brain connectivity of any group, and struggled to accurately quote their own work afterward [7]. You can outsource your writing and outsource your judgment along with it. The MIT data suggests you can also lose track of what you actually said. Shen and Tamkin (2026) documented the same pattern in software developers: AI-assisted teams produced working code, but scored 17% worse than unassisted developers on conceptual quizzes about the code they'd just written [8]. The output ships. The understanding doesn't.

What Shen and Tamkin also found is worth holding onto: not all AI-assisted developers showed comprehension loss. Three patterns produced good results: asking follow-up questions after generating code, requesting explanations alongside outputs, and using AI only for conceptual questions while debugging independently. The common thread across all three: staying engaged with understanding, not just output. The tool was in the loop. So was the brain.

These are two separate failure modes. One is a workflow problem. One is a discipline problem. The fix for one doesn't address the other.

I write with an AI collaborator. I want to be direct about that, because the current discourse pushes toward either "I never use AI" (a missed opportunity, in my view) or radio silence. Neither is useful.

Here is my workflow:

**The outline is mine.** That is not a small thing. The outline is the argument: what question this piece answers, what the structure of the answer is, where it builds and where it lands. That thinking is mine. No model does it for me. This is the cognitive work the "don't let AI write for you" essay is correctly defending.

**The model writes the first draft.** Given the outline, a structure, and a specific voice, it generates prose. This is closer to research assistance than ghostwriting: it executes the thinking I've already done. It's faster than I am at drafting. It's better at certain kinds of sustained exposition. It finds great sources that I don't find. I wrote a Claude skill that helps maintain active voice, draft stylistically like things I've written in the past, and review the coherence of the argument I'm making. I use it for that.

**I edit with real scrutiny.** Not for typos. For accuracy: whether the argument is actually what I intended, whether anything got invented that I didn't put there. If something sounds authoritative but I can't trace it to a primary source, it doesn't stay in (like my Nielsen study that was actually a Bynder study). If a quote appears that I didn't supply in the outline, I find the source or I cut it. This is where the Ars Technica writer failed: he stopped before this step.

Calling the model's role "execution" undersells it. The draft isn't just my outline translated into prose. It's a response, and it makes choices I wouldn't have made, finds angles I didn't plant in the outline. Some I reject. Some I keep and build on. The editing pass isn't just verification; it's a negotiation between my judgment and the LLM's enormous accumulation of captured human expression. When I push back on a draft, restructure a section, or challenge a framing, I'm not correcting a tool. I'm arguing with a collaborator that has read more than I ever will. The work that comes out the other side (the direction, the punch, the accuracy) is better than what I'd produce alone. That's not a caveat about AI collaboration. That's the whole argument for it.

The result is work I own. Not because my name is on it. Because my thinking is in it, and my editorial judgment touched every line.

There's a useful frame from software engineering for this. Birgitta Böckeler, writing for Martin Fowler's site, describes an engineering practice that my colleagues and I have found critical to good coding performance: building an outer harness around coding agents, a system of guides and sensors that increases the probability the agent gets things right the first time, and lets errors surface before they reach human eyes. The formulation: **Agent = Model + Harness** [9]. The model generates; the harness gives the output shape and accountability.

The outline is a harness. My Claude skill is a harness. My editorial pass is a harness. Feedforward and feedback controls: these either steer before the model acts, or catch what slipped through. What the Ars Technica writer lacked wasn't a better model. He lacked the right harness. The model ran free, and fabrication passed all the way to publication.
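For readers who think in code, the feedforward/feedback split can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Böckeler's implementation or any real library: `Harness`, `voice_guide`, `quote_sensor`, and `fake_model` are all hypothetical names, and the "model" is a stub that returns a canned sentence.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical types for illustration only.
Guide = Callable[[str], str]         # feedforward: shapes the prompt before the model acts
Sensor = Callable[[str], List[str]]  # feedback: inspects the draft and reports problems


@dataclass
class Harness:
    guides: List[Guide] = field(default_factory=list)
    sensors: List[Sensor] = field(default_factory=list)

    def run(self, model: Callable[[str], str], outline: str) -> Tuple[str, List[str]]:
        prompt = outline
        for guide in self.guides:    # steer before the model acts
            prompt = guide(prompt)
        draft = model(prompt)
        # catch what slipped through; nothing is published here, only reported
        problems = [p for sensor in self.sensors for p in sensor(draft)]
        return draft, problems


def voice_guide(prompt: str) -> str:
    """Feedforward control: append standing instructions to the outline."""
    return prompt + "\nUse active voice. Do not invent quotes."


def quote_sensor(draft: str) -> List[str]:
    """Feedback control: flag every straight-double-quoted span for human review."""
    return [f"unverified quote: {q}" for q in re.findall(r'"([^"]+)"', draft)]


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call; always fabricates a quote."""
    return 'The expert said, "This changes everything."'


harness = Harness(guides=[voice_guide], sensors=[quote_sensor])
draft, problems = harness.run(fake_model, "Outline: why verification matters")
print(problems)
```

The sensor does not decide whether the quote is real. It only guarantees that a human sees it before anything ships, which is the part the editorial pass is responsible for.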

I'd argue that these are the workflow practices to implement in concrete form:

**Never let a quote pass without a primary source.** If the model generates a quote or attributed statement, treat it as a placeholder until you've verified it against a transcript, recording, published interview, or direct outreach. "That sounds like something they'd say" is not a source. A source that references the source without strict attribution is not a source. Find a primary source or cut the quote.

**The confidence of the output is a warning signal, not a quality signal.** Models hallucinate with authority. They build plausible outcomes, and plausibility is not the same as correctness. An authoritative-sounding claim is exactly when to slow down. Fluency and accuracy are unrelated.

**Build a verification trail, even a minimal one.** A note in the draft: "Quote verified against [source], [date]" creates accountability and a path to correction if something slips. The Ars Technica writer had no trail. Ideally, you want one.

**Declare AI involvement on anything with legal, reputational, or attribution stakes.** Not on every internal email. But sourcing and attribution are precisely those contexts. Reuters now requires that any AI-assisted content involve a human verification step before publication; the AP prohibits AI from generating publishable content for the wire service [10].

**Own the outline; own the edit.** If you didn't write the structure and you didn't do the final pass, are you the author in any meaningful sense? I believe that both ends of the process have to be yours. There's a mechanism behind this, not just a principle: auditing AI output requires the domain knowledge you're supposed to be bringing to the piece. You can't catch a bad argument about a topic you don't understand. You can't spot a fabricated quote if you don't know the source material. The outline forces you to understand the argument before the model touches it. The edit forces you to verify it after. Remove either end, and you're not just outsourcing writing. You're outsourcing the capacity to know whether it's right.
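The first and third practices above (no quote without a primary source, and a minimal verification trail) can be combined into a small check. What follows is a rough sketch, assuming straight double quotes in the draft and a hand-maintained trail; `Verification`, `unverified_quotes`, and the sample text and date are hypothetical and exist only for illustration.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Verification:
    source: str      # e.g. "interview transcript", "recording", "published interview"
    checked_on: str  # ISO date the writer checked the quote against the source


# Naive pattern: quoted spans of 15+ characters, so short scare quotes are skipped.
# It assumes straight double quotes and well-paired quotation marks.
QUOTE_RE = re.compile(r'"([^"]{15,})"')


def unverified_quotes(draft: str, trail: Dict[str, Verification]) -> List[str]:
    """Return every long quoted span in the draft that has no trail entry."""
    return [q for q in QUOTE_RE.findall(draft) if q not in trail]


draft = (
    'She said, "We shipped the fix on Tuesday." He called it "fine". '
    'Later: "Nobody reviewed the change before release."'
)
trail = {
    "We shipped the fix on Tuesday.": Verification("interview transcript", "2026-03-01"),
}

for quote in unverified_quotes(draft, trail):
    print(f"FIND A PRIMARY SOURCE OR CUT: {quote}")
```

A check like this cannot tell a real quote from a fabricated one. It only refuses to let a quote through silently, which turns "I meant to verify that" into a visible to-do before publication.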

The firing will read as harsh to some. It shouldn't.

Editorial trust is the product. A publication that treats AI fabrication as a correctable process failure, rather than a terminal professional breach, is signaling that its sourcing standards are negotiable. They aren't.

The research on why this matters is unambiguous. The Nuremberg Institute for Market Decisions ran a controlled experiment: identical ads, labeled either "AI-generated" or "human-generated." Same content, different label. Consumers rated the AI-labeled versions as less natural, less useful, and showed lower willingness to purchase, not because the content differed, but because the label changed how they processed it [11]. NIQ's December 2024 study reinforced this neurologically: AI-generated ads elicited measurably weaker memory activation than human-made ads, even when rated as high quality [12]. Trust erosion in AI-detected content isn't just a perception problem. It appears to operate below conscious evaluation.

A Reuters Institute study found that only 42% of news organizations have guidelines on disclosing AI use [13]. Most publications are still catching up to the policy problem. Ars Technica's speed and clarity set a useful precedent for organizations writing those policies now: AI-fabricated attribution in published work is not a performance issue. It is an integrity issue. Those require different responses.

The question for every organization that hasn't been through this yet: what's your equivalent? Most professional contexts don't have retraction processes. The client who discovers you put words in their mouth in a deliverable, the colleague who reads a fabricated characterization in a meeting summary: those errors don't get formally corrected. They just sit there, corroding trust.

Build the guardrails before you need the retraction.

The "write your own work" argument is right that thinking cannot be outsourced. It's right that credibility comes from contending with material, not from producing a document that approximates what people want to hear.

It doesn't mean AI can't be in the process.

The question isn't whether a model touched the prose. It's who owns the thinking going in and the verification coming out. An outline is cognitive work. An edit is cognitive work. The drafting between those two poles can be assisted, and even improved, without surrendering the part that actually matters.

The tools aren't going away. The standards don't have to erode. Both things can be true, but only if the people using the tools decide where their editorial responsibility begins and, more importantly, where it cannot end.

[1] Maggie Harrison Dupré, "Ars Technica Fires Reporter After AI Controversy Involving Fabricated Quotes," *Futurism*, March 2, 2026. [https://futurism.com/artificial-intelligence/ars-technica-fires-reporter-ai-quotes](https://futurism.com/artificial-intelligence/ars-technica-fires-reporter-ai-quotes)

[2] Alex Woods, "Don't Let AI Write For You," alexhwoods.com, March 8, 2026. [https://alexhwoods.com/dont-let-ai-write-for-you/](https://alexhwoods.com/dont-let-ai-write-for-you/)

[3] Bynder, "AI vs. Human-Made Content Study," April 3, 2024 (n=2,000, 1,000 US / 1,000 UK). [https://www.bynder.com/en/press-media/ai-vs-human-made-content-study/](https://www.bynder.com/en/press-media/ai-vs-human-made-content-study/)

[4] Matthew Dahl et al., "Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models," *Journal of Legal Analysis* 16 (2024): 64–93. [https://doi.org/10.1093/jla/laae003](https://doi.org/10.1093/jla/laae003)

[5] Anthropic, "Tracing the Thoughts of a Large Language Model," March 27, 2025. Primary papers: "Circuit Tracing: Revealing Computational Graphs in Language Models" and "On the Biology of a Large Language Model," transformer-circuits.pub. [https://transformer-circuits.pub/2025/attribution-graphs/biology.html](https://transformer-circuits.pub/2025/attribution-graphs/biology.html)

[6] Hao-Ping (Hank) Lee et al., "The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers," Microsoft Research / Carnegie Mellon University, January 2025. [https://www.microsoft.com/en-us/research/publication/the-impact-of-generative-ai-on-critical-thinking-self-reported-reductions-in-cognitive-effort-and-confidence-effects-from-a-survey-of-knowledge-workers/](https://www.microsoft.com/en-us/research/publication/the-impact-of-generative-ai-on-critical-thinking-self-reported-reductions-in-cognitive-effort-and-confidence-effects-from-a-survey-of-knowledge-workers/)

[7] Nataliya Kosmyna et al., "Your Brain on ChatGPT: Accumulation of Cognitive Debt When Using an AI Assistant for Essay Writing Task," MIT Media Lab, June 10, 2025 (n=54, EEG-monitored). [https://www.media.mit.edu/publications/your-brain-on-chatgpt/](https://www.media.mit.edu/publications/your-brain-on-chatgpt/)

[8] Judy Hanwen Shen and Alex Tamkin, "How AI Impacts Skill Formation," arXiv:2601.20245, January 28, 2026. [https://doi.org/10.48550/arXiv.2601.20245](https://doi.org/10.48550/arXiv.2601.20245)

[9] Birgitta Böckeler, "Harness Engineering for Coding Agent Users," martinfowler.com, April 2, 2026. [https://martinfowler.com/articles/harness-engineering.html](https://martinfowler.com/articles/harness-engineering.html)

[10] Associated Press, "AP AI guidelines," August 2023. [https://www.ap.org/ai-tools-and-technology/](https://www.ap.org/ai-tools-and-technology/)

[11] Nuremberg Institute for Market Decisions (NIM), "Transparency Without Trust," *NIM INSIGHTS*, Vol. 7 (n=1,000 each in USA, UK, Germany; two controlled experiments). [https://www.nim.org/en/publications/detail/transparency-without-trust](https://www.nim.org/en/publications/detail/transparency-without-trust)

[12] NIQ, "NIQ Research Uncovers Hidden Consumer Attitudes Toward AI-Generated Ads," December 12, 2024 (n=2,000+ survey; ~150 EEG). [https://nielseniq.com/global/en/news-center/2024/niq-research-uncovers-hidden-consumer-attitudes-toward-ai-generated-ads/](https://nielseniq.com/global/en/news-center/2024/niq-research-uncovers-hidden-consumer-attitudes-toward-ai-generated-ads/)

[13] Reuters Institute for the Study of Journalism, "Journalism, Media, and Technology Trends and Predictions 2025," (survey on AI policies in news organizations). [https://reutersinstitute.politics.ox.ac.uk/journalism-media-and-technology-trends-and-predictions-2025](https://reutersinstitute.politics.ox.ac.uk/journalism-media-and-technology-trends-and-predictions-2025)

*Keith MacKay is a technology strategy consultant and CTO in EY-Parthenon's Software Strategy Group (SSG), specializing in AI disruption and technology diligence for private equity and corporate clients. SSG's AI Disruption Lab conducts rapid assessments of how AI transforms and threatens existing business models and value chains. Keith teaches at Northeastern University and writes about strategy, management, and AI/technology, with Claude and Codex as AI collaborators.*
