The Irish Times view on the New York Times vs OpenAI: rights and wrongs of artificial intelligence

It seems reasonable that those who create the original content without which these systems could not exist should have a claim to a share of any profits

The ChatGPT application developed by US artificial intelligence research organization OpenAI

When the New York Times filed suit against the artificial intelligence company OpenAI recently, it set in train what promises to be one of the most intriguing legal battles of 2024. The issues at stake might at first appear esoteric, but they go to the heart of unresolved questions raised by recent developments in large language models powered by AI. These models use immense computing power to ingest vast amounts of information from many different sources. They then use that knowledge to answer user requests and carry out complex tasks. The issue raised by the suit is whether by doing this with New York Times articles, OpenAI is breaching copyright law.

The New York Times alleges that ChatGPT, the chatbot built on OpenAI’s large language models, along with the similar service provided by Microsoft’s Bing, is illegally using the newspaper’s original content. It cites the example of Bing generating responses to search requests that contain “verbatim excerpts” of its articles. OpenAI responds that direct reproduction is an error which it is willing to correct, and accuses the New York Times of being too quick to go to court.

Some may feel we have been here before. The technological advances of the past three decades delivered massive profits for search and social media companies, while decimating the revenues of traditional content creators in publishing, music and media. But OpenAI argues, as Google and Facebook did before it, that it would prefer to have a “constructive relationship” with the New York Times and other publishers.

These issues go well beyond the commercial interests of any one company. The new language models draw on copyrighted material from across the world for the training datasets they require. Whether such activity is covered by the traditional concept of “fair use” has yet to be determined. It seems reasonable, though, that companies and individuals who actually create the original content without which these systems could not exist should have a legitimate claim to a share of any profits that ensue.