News listAnthropic shredded and destroyed 2 million books directly after scanning them to train Claude
動區 BlockTempo2026-05-19 08:38:32

Anthropic shredded and destroyed 2 million books directly after scanning them to train Claude

ORIGINALAnthropic 掃描 200 萬本書籍訓練 Claude 後,直接送入碎紙機銷毀
AI Impact AnalysisGrok analyzing...
📄Full Article· Automatically extracted by trafilaturaGemini 翻譯2169 words
The lawsuit stemming from Anthropic secretly purchasing 2 million books, cutting off their spines, scanning them, and sending them to the shredder reached its final hearing last week. But once public discussion died down, there was no new legislation, no industry standards, and no second company named. (Recap: Anthropic stealthily used 7 million books to train Claude, facing a trillion-dollar piracy lawsuit! The AI giant's reckless sprint and the legal boundaries) (Background: Musk loses to OpenAI! His $134 billion damages claim fails, removing the biggest legal obstacle to Altman's push for an IPO) On May 14, 2026, the final hearing of Bartz v. Anthropic was held in San Francisco. The lawsuit's origin was Anthropic's "Project Panama" — a plan that secretly purchased 2 million books, cut off their spines, scanned them, and destroyed the originals. The lawsuit was settled in September 2025 for $1.5 billion, and last week officially came to a close. But although the lawsuit has ended, the problems it exposed have not. The plaintiffs: Andrea Bartz (thriller novelist), Charles Graeber, and Kirk Wallace Johnson (nonfiction writers), representing approximately 500,000 authors, filed the class action lawsuit in 2024 at the U.S. District Court for the Northern District of California. Presiding judge: William Alsup. Rulings: - June 2025: Judge Alsup ruled that legally purchased books used for AI training qualify as fair use; however, pirated books downloaded through shadow libraries such as LibGen are not protected by fair use - September 5, 2025: Anthropic reached a settlement with the plaintiffs, paying $1.5 billion (the largest copyright settlement in U.S. history), approximately $3,000 per book - September 25, 2025: Judge Alsup preliminarily approved the settlement - May 14, 2026: Final approval hearing In the files unsealed at the beginning of the year, there was an internal Anthropic planning document that read: "Project Panama is our plan to perform destructive scanning of all the books in the world." In the notes column beside it, there was another line: "We don't want the outside world to know we are doing this." At the time, Anthropic commissioned a professional archive scanning vendor, which processed 500,000 to 2 million books over about six months. The process had three steps: - A hydraulic book cutter neatly removed the spines - High-speed production-grade scanners scanned page by page into digital files - Finally, a recycling company periodically hauled away the paper waste The book sources came from used bookseller The Strand — a nearly century-old used bookstore in New York that was on the procurement list — along with online used-book platforms and libraries. What they purchased weren't rare antiquarian books, but ordinary used books that had been read and could be resold, procured in batches of hundreds of thousands. Anthropic's legal logic was clear: the "first sale doctrine" allows the buyer to dispose of legally purchased physical books in any way; destroying the originals prevents illegal recirculation, further reinforcing the claim of "transformative use." The judge ultimately found this conduct to be fair use. But the large number of pirated books used during the same period did not have this protection, ultimately resulting in a settlement of approximately $1.5 billion — about $3,000 in compensation per book. Anthropic also knew this didn't look good if it got out, so the phrase "don't want the public to know" appeared more than once in internal files. This deliberate low-key approach stands in direct contrast to the company's long-standing high-profile rhetoric on AI safety issues. Anthropic's public image is built on "responsible AI development"; but this secret book-cutting plan does not align with that image. Actually, back in 2004, Google publicly announced partnerships with top research libraries such as Harvard, Stanford, and Oxford, launching the "Google Books" project. Google likewise scanned tens of millions of books, likewise faced a copyright lawsuit (Authors Guild v. Google), and likewise was ruled fair use by the court in 2015. But Google did two things that Anthropic did not, and community perception was completely different. First, the books Google scanned all remained in the libraries: the originals were preserved intact, the libraries also received digital backups, and the public gained additional channels to access these books. Second, the purpose was to make the books findable: search indexing and snippet previews helped readers discover works they didn't know existed. The beneficiaries were everyone who used the search engine. The project was public, and publishers could opt out. What Anthropic did was something different: the contents of the books entered the parameter layer of a private model, becoming the core competitive advantage of a commercial AI product. The originals were destroyed, later readers had no way to find those books, the authors received about $3,000 in settlement money, and the beneficiaries were Anthropic's shareholders and valuation. At the same time, the plan was secret, with no opt-out mechanism whatsoever. In both cases, the judge ultimately ruled fair use. But the concept of "transformative use," in Google's context, means turning a book into a findable entry point; in Anthropic's context, it means digesting a book into a private AI and then making the original disappear. The reason humans store written knowledge is so that the next person can read it. Libraries, the used book market, public circulation after copyright expiration — they're all based on the same assumption: knowledge should be able to be transmitted and accessed again. But Claude's Project Panama broke this chain: the contents of the books went into Anthropic's private model, the originals disappeared, the authors received settlement money, and later readers lost one book they could have found. And harder to resolve than copyright compensation is that there is currently no mechanism at all to allow the decision of "which knowledge is worth being trained into the model" to be publicly discussed, questioned, and adjusted. The selection of training data is one of the most upstream decisions in AI, yet it is almost the least touched part of regulatory discussion. Project Panama is the first case to be unsealed. But this is not an isolated case — it's just that most cases haven't had a copyright lawsuit help walk them out from the shadow of confidentiality agreements. The boundaries of fair use in the future will need more discussion in the AI era.
Data Status✓ Full text extractedRead Original (動區 BlockTempo)
🔍Historical Similar Events· Keyword + Asset Matching6 items
💡 Currently matching via keywords + symbols (MVP) · Will be upgraded to embedding semantic search later
Raw Information
ID:290d947f2b
Source:動區 BlockTempo
Published:2026-05-19 08:38:32
Category:zh_news · Export Category zh
Symbols:Unspecified
Community Votes:+0 /0 · ⭐ 0 Important · 💬 0 Comments