Can A.I. Produce Writing That We Actually Want to Read?


In the previous installment of this series on the future of higher education, I talked with professors about the ways that A.I. has changed their classrooms. Most felt despair over the breakdown of a contract between student and teacher, one predicated on the faith that, even if students weren’t always perfect, they would at least challenge themselves to think every once in a while. If students rely on A.I. summaries to do their “reading” for them, if they don’t attempt to put their ideas into prose, are they really learning anything?

When I consider the original question of this series—whether my nine-year-old daughter will go to college—I find myself wondering whether she will actually struggle through the writing process in that old-fashioned way. Readers will always want literature written by humans, but, for everything else—e-mails, advertising copy, legal briefs, student papers—the resistance to A.I.-generated writing will almost certainly slip as technology improves and it becomes functionally impossible to see the difference between writing by a person and writing by a machine. When that happens, the major incentive that educators hold over students—“I will fail you if you cheat”—will disappear, because there will simply be no way to know.

With that in mind, I want to take a step back from the implications of A.I. for higher education and ask a more fundamental question: How far are we from that moment? Right now, I believe it’s still easy for people to spot obvious examples of A.I. writing. A professor who reads hundreds of papers and has a decent grasp of her students’ writing ability can recognize the fakes. A manager who starts getting tidy, bullet-pointed, and mostly cheery e-mails from her employees will rightly suspect that robots have autocompleted their messages. Robot writing is also frequently filled with tells: copious em dashes, “not X but Y” constructions, conspicuous verbs (“delve” comes to mind).

But those tells generally show up only in A.I.’s most rudimentary outputs. What about the kind of prose that we actually want to read? Can A.I. produce that?

This question, or some version of it, was asked by thousands of enraged readers during the past couple of weeks, after the literary magazine Granta published a Commonwealth Prize-winning story by a writer named Jamir Nazir that seemed to bear all the hallmarks of A.I. writing. People noted the strange recurrence of the word “hum,” for instance, and, especially, the awkward, constipated metaphors that didn’t make much sense. The publisher of Granta then put out a bizarrely ambivalent statement, concluding that “perhaps we never will know” whether A.I. had written the story. Nazir, for his part, denied the allegation. A whole bunch of writers screamed that the end times had arrived, or, less persuasively, insisted that the reason A.I. writing could win the Commonwealth Prize was that literary fiction was in such a bad place. (Is literary fiction better or worse today than it was twenty or thirty or forty years ago? I have no idea, but I do know that every generation of writers has made more or less the same complaint.)

Using Claude, I vibe-coded a simple game that presented a passage of roughly two hundred words and asked the player whether it was written by a human or generated by A.I. The human-written passages all came from Project Gutenberg, an online library of public-domain literature; I asked the robots to scan through works by writers including George Eliot, James Joyce, Ernest Hemingway, and Arthur Conan Doyle and come up with passages in their respective styles. The robot would then display the results and let me and a few of my friends guess whether each was the real deal or a fabrication. The test rounds were fairly easy. The A.I. writing had tells, including formatting and punctuation problems, and an overreliance on tortured similes and metaphors. A.I. also had a weird habit of making its characters fidget constantly, always running a finger along the edge of a table or adjusting a collar. The most reliable marker, though, was something more abstract, and, I suppose, upon reflection, even a little spooky. The scenes generated by A.I. had characters, but, apart from fidgeting, they mostly did nothing.

Consider this passage that Claude generated in the style of Henry Fielding:

There is very little action and no certainty. Sophia doesn’t say much, and Mr. Western can’t interpret her expression, which she herself does not fully understand. And, after Western says his piece, which is described with both an “as if” and an “as though” clause, Sophia doesn’t respond, and looks to the fireplace that is burning a pointless flame.

In early rounds, the people I shared such deadened passages with immediately assumed that they were fake, even if the robots had done a decent job of approximating a given writer’s style.

For the next couple of days, I chatted with Claude about how to get rid of these tells. I told it to avoid similes and to cut down on such words as “nowhere” and “something,” which tended to betray its odd, core ambivalence. For a while, Claude kept spitting out the same inert passages, in which Jay Gatsby or Sherlock Holmes did a whole lot of nothing and had no opinion about the very little that was happening around them. I told Claude that it wasn’t doing a very good job of unlearning its bad habits, and suggested that it create another agent to scan through the fakes and catch any mistakes it made. A third agent made notes with instructions on how best to imitate each author. I imagined these as cue cards that the agent would hold up to make sure everyone remembered to make Dorothea Brooke actually do something. Here’s a sampling of the rules, which I had no part in writing—these are Claude’s instructions to itself regarding how to mimic each author’s style. (I have included only a few; there were typically about ten instructions in each “Does” and “Does Not” category.)

Multiplying the robot workforce and reminding the bot of its task seemed to work, at least in part. (When I asked a friend who teaches computer science and machine learning at U.C. Berkeley why the robots needed other robots to check their work, he replied, “One hundred percent serious answer: No one knows.”) The similes went away. But Claude took some of the new directives a bit too seriously; suddenly, every fake passage was filled with characters hopping on a horse, or delivering an important package, or running. This, for whatever reason, led to very short sentences that were easy for people to spot as fake. So I loosened the rules a bit, and let Claude do its usual thing, with a handful of strict rules about vague words and similes.

After a few days of testing, I posted a link to the test on my X account. Within five days, I had more than thirty thousand responses. The people who took the test correctly distinguished a real passage from a fake one roughly fifty-two per cent of the time, barely better than a coin flip, which might be another way of saying that they couldn’t actually tell the two apart. But roughly ten per cent of players seemed good at the game, whether because they had prior knowledge of the original material or a particularly keen eye for A.I. tics that I still don’t recognize.

By this point, I had figured out how to make slightly better fakes. I deployed another A.I. employee and had it double-check both samples for tells. And, by the end of the week, I was fooling more than half of the people who played the game. The sample that tricked the most people came from a robot Bram Stoker. Only seventeen per cent of players were able to discern that it was fake.

What struck me was that, although this is definitely a better facsimile of Bram Stoker than the ones in earlier iterations of the game, it still describes absence and stasis. The narrator is trying to avoid a “course of reflection” through “constant activity,” but can’t find enough to do to occupy his mind. The Count is nowhere to be found, leaving the narrator to walk through empty corridors where he hears “no sound but the wind in the chimney in the hall.” Not all of the fake samples contained this degree of emptiness, but enough of them did to suggest that, although Claude can generate imitations of famous public-domain authors good enough to fool most readers, even discerning ones (though not all of them), it still can’t reliably have those characters do much of anything. No amount of additional cue cards or feedback could fix this problem; the second I asked it to make things more active, the stunted and more easily identifiable A.I. prose kicked in again.

I hesitate to claim that this is the great tell, because it sounds, well, far too literary, or even corny—I am a bit too bashful to fully indulge in what it might mean that the robots cannot quite bring a scene to life. I will leave that to the poets and the anti-clankers. My only humble submission in this dialogue: the art of fiction relies, in heavy measure, on the reader accepting these descriptive, atmospheric passages that Claude seems to favor as what the literary critic James Wood has called “a camera’s easy swipe.” Wood has argued that an author’s choices, both big and small, always push up through the surface. A.I. makes choices, too, not by drawing on its personal reveries about, say, a street in Paris at dusk but rather by lifting from pretty much every word that’s ever been written. If Claude prefers to write these passages in which nothing seemingly happens and the hallways are always empty and the characters do nothing except idly touch nearby furniture, it’s because we do, too.

Claude, I am sure, will soon be able to have one of these characters at least fire up a stove or drive a buggy to Norwich, and all of this will just feel like a weird hiccup. Still, I am ultimately heartened by this silly experiment in robot mimicry, because at no point did I or any of the test-takers conclude that we wanted to read literature written by A.I., nor were we left with the revelation that reading and writing were no longer necessary.

Whenever I start thinking about this technology and all the possibilities it holds for replacing us, I remind myself, almost as a matter of mental hygiene, that the top grandmasters have not been able to beat the best chess computers for two decades, and yet hundreds of thousands of kids now follow chess influencers on TikTok. We still value the human process of chess, how the game makes our brains move. The superiority of the machines is irrelevant when it comes to why we play, even if computers have had a lot of influence on human strategy.

The same, of course, will be true of writing, which means that we can probably do away with both the doomerism and all the iterations and inversions of littera scripta manet—“the written word remains”—and simply be confident that people will always need to understand things, and that they will need to convert those things into words that can then be used to communicate with other human beings. Skipping that process will always feel like cheating, even if there might be some near future in which a portion of the words we produce comes from robots. The nasty feeling that arises when you read an e-mail or an article or a short story written by A.I. isn’t really dread that our usefulness here on Earth is coming to an end but, rather, the same discomfort and disappointment you would feel if you found out that your opponent in chess was using a bot to plan their next moves. As long as that displeasure remains, a million large language models can write a million copies of the great works of literature, and some might even stumble upon discoveries that could expand how we write sentences, but the basic relationship between humans and writing will stay the same. ♦

Source: newyorker.com (original article)

