There’s a fix for AI-generated essays. Why aren’t we using it?

It’s the start of the school year, and thus the start of a fresh round of discourse on generative AI’s new role in schools. In the space of about three years, essays have gone from a mainstay of classroom education everywhere to a much less useful tool, for one reason: ChatGPT. Estimates of how many students use ChatGPT for essays vary, but it’s commonplace enough to force teachers to adapt.

While generative AI has many limitations, student essays fall into the category of tasks that it’s very good at: There are lots of examples of essays on the assigned topics in its training data, there’s demand for an enormous volume of such essays, and the standards for prose quality and original research in student essays are not all that high.

Right now, cheating on essays via the use of AI tools is hard to catch. A number of tools advertise they can verify that text is AI-generated, but they’re not very reliable. Since falsely accusing students of plagiarism is a big deal, these tools would have to be extremely accurate to work at all — and they simply aren’t.

AI fingerprinting with technology

But there is a technical solution here. Back in 2022, a team at OpenAI, led by quantum computing researcher Scott Aaronson, developed a “watermarking” solution that makes AI text virtually unmistakable — even if the end user changes a few words here and there or rearranges text. The solution is a bit technically complicated, but bear with me, because it’s also very interesting.

At its core, the way that AI text generation works is that the AI “guesses” a bunch of possible next tokens given what appears in a text so far. In order not to be overly predictable and produce the same repetitive output constantly, AI models don’t just guess the most probable token — instead, they include an element of randomization, favoring “more likely” completions but sometimes selecting a less likely one.
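For readers who like to see the mechanics, here is a minimal Python sketch of that weighted guessing. Everything in it is illustrative: the four candidate words and their scores are made up, and `sample_next_token` is a hypothetical helper, not any real model’s API.

```python
import math
import random

def sample_next_token(scores, temperature=1.0, rng=random):
    """Pick one token at random, weighted by softmax(score / temperature)."""
    tokens = list(scores)
    top = max(scores.values())  # subtract the max for numerical stability
    weights = [math.exp((scores[t] - top) / temperature) for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Made-up scores for the word that follows "The cat sat on the"
toy_scores = {"mat": 2.0, "sofa": 1.0, "roof": 0.5, "verandah": -1.0}

rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [sample_next_token(toy_scores, rng=rng) for _ in range(1000)]
counts = {token: draws.count(token) for token in toy_scores}
print(counts)
```

Over 1,000 draws, the likeliest word wins most often, but the less likely ones still turn up now and then. That leftover randomness is the room a watermark has to work with.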

The watermarking works at this stage. Instead of having the AI generate the next token according to random selection, it has the AI use a nonrandom process: favoring next tokens that get a high score in an internal “scoring” function OpenAI invented. It might, for example, favor words with the letter V just slightly, so that text generated with this scoring rule will have 20 percent more Vs than normal human text (though the actual scoring functions are more complicated than this). Readers wouldn’t normally notice this — in fact, I edited this newsletter to increase the number of Vs in it, and I doubt this variation in my normal writing stood out.
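To make the letter-V example concrete, here is a toy Python sketch. It is not OpenAI’s scoring function, which has not been published and is more complicated than this. The word list, the size of the bonus, and the helper names are all assumptions for illustration.

```python
import math
import random

def watermark_bonus(token, bonus):
    """Toy scoring rule: a small boost for any token containing the letter v."""
    return bonus if "v" in token.lower() else 0.0

def sample_with_bonus(scores, rng, bonus):
    """Sample a token after adding the watermark bonus to each raw score."""
    tokens = list(scores)
    adjusted = [scores[t] + watermark_bonus(t, bonus) for t in tokens]
    top = max(adjusted)  # subtract the max for numerical stability
    weights = [math.exp(a - top) for a in adjusted]
    return rng.choices(tokens, weights=weights, k=1)[0]

def v_share(words):
    """Fraction of the words that contain the letter v."""
    return sum("v" in w for w in words) / len(words)

# Four made-up synonyms the model considers equally likely
toy_scores = {"big": 1.0, "vast": 1.0, "large": 1.0, "massive": 1.0}

rng = random.Random(1)  # fixed seed so the run is repeatable
plain = [sample_with_bonus(toy_scores, rng, bonus=0.0) for _ in range(5000)]
marked = [sample_with_bonus(toy_scores, rng, bonus=0.5) for _ in range(5000)]
print(f"share of v-words without bonus: {v_share(plain):.2f}")
print(f"share of v-words with bonus:    {v_share(marked):.2f}")
```

Any single word choice still looks natural, since all four synonyms were plausible to begin with. The tilt only shows up in aggregate, across many choices.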

Similarly, the watermarked text will not, at a glance, be different from normal AI output. But it would be straightforward for OpenAI, which knows the secret scoring rule, to evaluate whether a given body of text gets a much higher score on that hidden scoring rule than human-generated text ever would. If, for example, the scoring rule were my above example about the letter V, you could run this newsletter through a verification program and see that it has about 90 Vs in 1,200 words, more than you’d expect based on how often V is used in English. It’s a clever, technically sophisticated solution to a hard problem, and OpenAI has had a working prototype for two years.
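A verification program for the toy letter-V rule could be as simple as the Python sketch below. The baseline rate of about 1 percent of letters is my assumption about ordinary English, the threshold is arbitrary, and a real detector would rely on a secret scoring rule rather than a letter count.

```python
import math

ASSUMED_V_RATE = 0.01  # assumption: roughly 1% of letters in ordinary English are "v"

def v_score(text, baseline=ASSUMED_V_RATE):
    """How many standard deviations the count of v's sits above the baseline."""
    letters = [c for c in text.lower() if c.isalpha()]
    n = len(letters)
    if n == 0:
        raise ValueError("text contains no letters")
    observed = sum(c == "v" for c in letters)
    expected = n * baseline
    std = math.sqrt(n * baseline * (1 - baseline))
    return (observed - expected) / std

def looks_watermarked(text, threshold=4.0):
    """Flag text only when the excess of v's is very unlikely to be chance."""
    return v_score(text) > threshold

ordinary = "the cat sat on the mat and looked at the moon " * 20
v_heavy = "every vivid visitor valued very vibrant velvet views " * 20
print(looks_watermarked(ordinary), looks_watermarked(v_heavy))
```

The threshold is deliberately high, because a false accusation of plagiarism is costly. The longer the text, the more confident a check like this can be; a single sentence gives it almost nothing to go on.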

So if we wanted to solve the problem of AI text masquerading as human-written text, it’s very much solvable. But OpenAI hasn’t released their watermarking system, nor has anyone else in the industry. Why not?

It’s all about competition

If OpenAI — and only OpenAI — released a watermarking system for ChatGPT, making it easy to tell when generative AI had produced a text, this wouldn’t affect student essay plagiarism in the slightest. Word would get out fast, and everyone would just switch over to one of the many AI options available today: Meta’s Llama, Anthropic’s Claude, Google’s Gemini. Plagiarism would continue unabated, and OpenAI would lose a lot of its user base. So it’s not shocking that they would keep their watermarking system under wraps.

In a situation like this, it might seem appropriate for regulators to step in. If every generative AI system is required to have watermarking, then it’s not a competitive disadvantage. This is the logic behind a bill introduced this year in the California state Assembly, known as the California Digital Content Provenance Standards, which would require generative AI providers to make their AI-generated content detectable, along with requiring providers to label generative AI and remove deceptive content. OpenAI is in favor of the bill — not surprisingly, as they’re the only generative AI provider known to have a system that does this. Their rivals are mostly opposed.

I’m broadly in favor of some kind of watermarking requirements for generative AI content. AI can be incredibly useful, but its productive uses don’t require it to pretend to be human-created. And while I don’t think it’s the place of government to ban newspapers from replacing us journalists with AI, I certainly don’t want outlets to misinform readers about whether the content they’re reading was created by real humans.

Though I’d like some kind of watermarking obligation, I am not sure it’s possible to implement. The best of the “open” AI models that have been released (like the latest Llama), models that you can run yourself on your own computer, are very high quality — certainly good enough for student essays. They’re already out there, and there’s no way to go back and add watermarking to them because anyone can run the current versions, whatever updates are applied in future versions. (This is among the many ways I have complicated feelings about open models. They enable an enormous amount of creativity, research, and discovery — and they also make it impossible to do all kinds of common-sense anti-impersonation or anti-child sexual abuse material measures that we otherwise might really like to have.)

So even though watermarking is possible, I don’t think we can count on it, which means we’ll have to figure out how to address the ubiquity of easy, AI-generated content as a society. Teachers are already switching to in-class essay requirements and other approaches to cut down on student cheating. We’re likely to see a switch away from college admissions essays as well — and, honestly, it’ll be good riddance, as those were probably never a good way to select students.

But while I won’t mourn much over the college admissions essay, and while I think teachers are very much capable of finding better ways to assess students, I do notice some troubling trends in the whole saga. There was a simple way to let us harness the benefits of AI without obvious downsides like impersonation and plagiarism, yet AI development happened so fast that we as a society more or less just let the opportunity pass us by. Individual labs could do it, but they won’t because it’d put them at a competitive disadvantage, and there isn’t likely to be a good way to make everyone do it.

In the school plagiarism debate, the stakes are low. But the same dynamic reflected in the AI watermarking debate — where commercial incentives stop companies from self-regulating and the pace of change stops external regulators from stepping in until it’s too late — seems likely to remain as the stakes get higher.