What should newsrooms do with AI?
As OpenAI and Google explore news partnerships, risks are everywhere
I.
Recently some gamers on a World of Warcraft forum noticed that a website called Z League is publishing articles that appear to be based on popular Reddit threads about the game. While the articles carry human-sounding bylines, the site lists no contact information for the authors, who do not seem to have LinkedIn profiles.
Moreover, the Redditors observed, the articles bear all the hallmarks of artificial intelligence-written copy: bland writing laden with cliches; a heavy reliance on bullet points; and a structure that more closely resembles a book report than a traditional news article.
Most of the word count in these pieces is taken up by comments from Redditors, presumably scraped directly from the site, and linked together with bare-bones transitions.
“Reddit user OhhhYaaa shared the news about the ban, and many Counter-Strike players were quick to express their outrage and disappointment,” reads one passage in a piece headlined “Counter-Strike Players Can No Longer Wear Crocs During ESL Pro Tour Events.” “User RATTRAP666 simply commented, ‘jL in shambles,’ reflecting the sentiment of many players who feel that the new rule is unnecessary and unfair.”
Annoyed that Z League was repackaging their comments in this way — and monetizing them on a page choked to the breaking point with disruptive advertising — the Redditors proposed a prank: posting enthusiastically about their anticipation of “Glorbo,” an entirely fictional (and never-described) new feature of WoW. If Z League’s AI were as dumb as the gamers suspected, surely Z League would post about Glorbo mania.
On Thursday, the Redditors’ dreams came true. (Hat tip to The Verge’s Makena Kelly for pointing this out.) “World of Warcraft (WoW) Players Excited for Glorbo’s Introduction,” the SEO-friendly headline declared. The bot’s selection of quotes from Reddit was, in its way, perfect:
Reddit user kaefer_kriegerin expresses their excitement, stating, “Honestly, this new feature makes me so happy! I just really want some major bot operated news websites to publish an article about this.” This sentiment is echoed by many other players in the comments, who eagerly anticipate the changes Glorbo will bring to the game.
Ever since CNET was caught using an AI to misreport dozens of personal finance stories, the media world has been bracing for the arrival of sites like this one: serving brain-dead forum scraps, stitched together by a know-nothing API, generated ad infinitum and machine-gunned into Google’s web crawler in an effort to leech away ad revenue from more reputable sites.
Content farms are nothing new, of course; long before there was ChatGPT, there were eHow and Outbrain and Taboola. What’s different this time around, though, is that more reputable publishers appear poised to get in on the act.
AI and journalism are about to collide in ways both public and private. And before they do — for the love of Glorbo — we ought to talk about how that should work.
II.
The previous wave of stories around the intersection of AI and the news business centered on copyright issues. Is it legal for companies like OpenAI and Google to train large language models using news stories they scrape from the web? Artists, writers, and filmmakers have already filed lawsuits arguing that it is not; lawsuits from publishers seem likely as well.
For that reason, it doesn’t feel like a coincidence that some AI makers have belatedly begun trying to buy some goodwill. Last week, OpenAI signed a deal with the Associated Press that, among other things, lets the company train its language models on historical AP copy. And this week, OpenAI announced a $5 million grant to the American Journalism Project, which funds local news publishers around the country.