Owners of 400 newspapers sued Microsoft and OpenAI Wednesday, claiming their work product is being used without permission. The plaintiffs include WEHCO Newspapers, which owns the McDonald County Press, the Little Rock Democrat-Gazette and the Northwest Arkansas Democrat-Gazette in Fayetteville, and Rust Newspapers, which owns the Nevada Daily Mail and Fort Scott Tribune.
The lawsuit was filed in U.S. District Court for the Southern District of New York.
From the petition:
This lawsuit arises from Defendants’ systematic and willful theft of hundreds of thousands of copyrighted articles belonging to the Publishers, who collectively own and operate nearly 400 local and regional newspaper outlets across the country, all of whom have spent decades—and in some cases over a century—investing in the journalists, editors, and infrastructure required to produce the trusted, original reporting on which their communities depend.
Without permission and without any compensation to the Publishers, Defendants scraped, copied, and ingested that content to build and commercialize their generative artificial intelligence (“GenAI”) products, including ChatGPT and Microsoft Copilot.
Those products have generated hundreds of billions of dollars (and counting) in market value for Defendants. Not a cent of it has gone to the Publishers whose work made it possible.
Using automated systems, Defendants systematically and secretly crawled the Publishers’ websites—including content behind paywalls and other access restrictions—and copied the Publishers’ articles, stories, and other original works onto their own servers without authorization.
As part of that process, Defendants’ systems stripped from the Publishers’ works all copyright management information (“CMI”) embedded in and associated with those works, such as author credits, publication names, copyright notices, and terms of use information, that establish ownership and signal that a work is protected.
That CMI-stripping, an instrumental part of Defendants’ ingestion pipeline, helped sever the link between the copied content and its rightful owners and authorizations. The scraped, stripped content was then used to train Defendants’ large language models (“LLMs”), which have “memorized” that material and likely reproduced it, verbatim or near-verbatim, in response to user prompts for years.
And because Defendants’ models must be continuously updated with new material to remain current and commercially viable, these processes have been repeated over and over and over again.
That Defendants’ conduct was willful is beyond dispute. OpenAI’s founder, Sam Altman, acknowledged as much in testimony before the British House of Lords, conceding that it would be “impossible to train today’s leading AI models without using copyrighted materials.”
Defendants made deliberate engineering choices to copy the Publishers’ content and strip its CMI, knowing it would obscure the origins of the works they were taking and impair the Publishers’ ability to detect and prove the theft. And even as litigation from other publishers mounted, and as courts began to recognize the validity of these claims, Defendants pressed forward undeterred.
Defendants’ data scraping processes, their products’ storage and reproduction of “memorized” content, and therefore their violations of the Copyright Act and the Digital Millennium Copyright Act, continue to this day.
On the backs of the Publishers, Defendants built some of the most valuable businesses in human history. OpenAI, once styled as a nonprofit, now commands a valuation worth nearly one trillion dollars.
Microsoft’s deployment of its Copilot product has added hundreds of billions of dollars to its market capitalization. These are not the fruits of Defendants’ ingenuity alone. The Publishers’ journalism was essential to the Defendants’ explosive growth, and unless Defendants are held accountable for stealing, stripping, and misusing the Publishers’ content, the AI boom Defendants orchestrated and benefit from will be a death knell for local journalism—which remains the most trusted news sources in America.
The Publishers are generally independently-owned newspaper companies. Many are family-owned small businesses. They are the lifeblood of the communities they serve. They send reporters to city council meetings and school board hearings. They investigate corruption and hold local officials accountable. They are the outlets that cover the latest high school football game, the new restaurant opening downtown, or the storm bearing down on the coast. They publish obituaries, job listings, and apartment notices. They convey to their readers everyday stories of local civic life that national outlets do not cover.
The Publishers have spent billions of dollars to sustain this work. Defendants helped themselves to all of it—without providing a cent of compensation.
The U.S. Constitution has, since the nation’s founding, charged Congress with protecting authors’ and publishers’ exclusive rights in their work. Congress has exercised that authority to implement robust protections against copyright theft, including through enacting the Copyright Act and the Digital Millennium Copyright Act, and authorizing substantial penalties for willful violations of both.
Defendants invoke those protections vigorously for their own products, shielding their code, their models, and their systems behind licenses, paywalls, and legal threats.
In bringing this action, the Publishers seek to hold Defendants to the same standard they insist upon for themselves.
Novel as the technology at issue may be, this is not a case of first impression. The Publishers, along with other news publishers, authors, and other copyright holders across the country have brought these same claims against these same Defendants, and those cases have survived motions to dismiss largely intact. Defendants have chosen to continue their unlawful conduct rather than rectify it. This lawsuit seeks to hold Defendants fully accountable for every violation—past, present, and ongoing.
The three-count lawsuit alleges copyright infringement, vicarious copyright infringement and violation of the Digital Millennium Copyright Act. It asks for compensatory damages, an end to the use of the publishers' material and the removal of all copies of that material from ChatGPT.
The plaintiffs are asking for a jury trial.
