The development of large language models (LLMs) is once again colliding with copyright law. This time, Nvidia is in the crosshairs, facing a class-action lawsuit filed by a group of authors. At the heart of the dispute is a massive training dataset called “The Pile,” which includes a subset known as “Books3.” The plaintiffs argue that this collection was sourced from the private torrent tracker Bibliotik and contains over 197,000 illegally shared e-books that were used to train the corporation’s models.
Nvidia’s lawyers tried to get the allegations thrown out, building their defense on the Supreme Court ruling in the Cox v. Sony case. They argued that the NeMo Megatron Framework has many legitimate uses and that the corporation is acting merely as an intermediary, much like an internet service provider (ISP), and therefore cannot be held liable for its users’ piracy.
US District Judge Jon Tigar categorically rejected this narrative. He noted that the issue isn’t the framework itself, but specific scripts built into Nvidia’s tools that were allegedly used almost exclusively to automate and accelerate the mass downloading of pirated datasets.
“The scripts are alleged to have no other purpose than to speed up the process of infringement, unlike the digital video recorder systems at issue in Sony Corp. or the internet service provided in Cox,” Judge Tigar wrote in his ruling (Tom’s Hardware).
The court’s decision sparked heated discussion online, with the tech enthusiast community tearing into Nvidia and pointing out the absurdity of its defense strategy. A user going by the handle bigdragon bluntly noted that citing Cox v. Sony is ridiculous, since in the AI training process Nvidia was itself the consumer of the pirated content, not a passive middleman like a traditional telecom operator.
Commenters also highlighted the purely financial side of the whole operation. As user GenericUser2001 calculated, the trillion-dollar corporation could have avoided the lawsuit simply by buying every title it needed. Assuming an average price of $20 per book, legally acquiring 197,000 works would have cost Nvidia about $3.94 million, peanuts for the semiconductor leader.
The dominant theme of the discussion, however, is the accusation of double standards. User Shiznizzle emphasized that an ordinary citizen who built a business on illegally downloaded content would quickly end up in jail, whereas tech corporations perform sophisticated mental gymnastics to turn theft into a perfectly legal practice. User DRagor summed up the situation:
“Normal person copies a copyrighted data for personal use: It’s a Piracy! Meanwhile big company copies thousands of those for commercial use: it’s a fair use” (Tom’s Hardware).
Judge Tigar’s ruling means the case against Nvidia will move forward, and it could set a significant precedent for the entire industry. Authors and creators have no intention of backing down: Meta has been fighting similar lawsuits since last year, and giants like Google are lobbying increasingly hard to have AI training explicitly recognized as fair use.

