Global news organizations are blocking the Internet Archive's web crawlers to prevent AI companies from using archived content to train models without permission. This move follows ongoing lawsuits against firms like OpenAI for copyright infringement. A separate court ruling denied copyright protection for fully AI-generated works, potentially preserving human roles in creative industries.
theweek.com
Around 245 news organizations from nine countries have begun blocking the Internet Archive's crawlers, which capture and archive web pages for the Wayback Machine. The blocks aim to stop AI companies from using historical content to train large language models without compensation or consent.
The Internet Archive holds over one trillion web pages dating back to 1996, including past articles from outlets like CNN, The New York Times, The Guardian, and USA Today. More than 20 major news organizations already block the main crawler, ia_archiverbot, according to an analysis by Originality AI.
At least one of the Archive's four bots is blocked by 241 global news sites, many of them owned by USA Today Co, the largest U.S. newspaper publisher. As a result, hundreds of local publications have effectively disappeared from the Archive's historical record.
Content from the Internet Archive has appeared in key AI datasets, prompting lawsuits against companies like Perplexity and OpenAI for alleged copyright violations. News organizations argue this use competes directly with their original journalism.
“The issue is that Times content on the Internet Archive is being used by AI companies in violation of copyright law to directly compete with us,” said Graham James, a spokesperson for The New York Times, as cited by The Next Web. Some outlets, such as The Guardian, have opted to limit rather than fully block the Archive's access. The Internet Archive's director said the organization is collateral damage in the dispute and that AI companies are the real issue. The Archive has introduced measures such as preventing large downloads and limiting automated extraction to curb misuse.
Ninety lawsuits have been filed by creators, including authors, musicians, artists, and news publishers, accusing AI firms such as OpenAI, Meta, and Anthropic of using copyrighted works to train models without permission. The Atlantic is involved in one such lawsuit, against Cohere.
These cases highlight concerns about the future of creative labor. In Thaler v. Perlmutter, a federal court ruled that works generated autonomously by AI cannot receive copyright protection, because copyright requires a human author. The Supreme Court declined to review the case in March.
The ruling leaves open the question of how much AI involvement renders a work uncopyrightable. It also has economic implications for industries that monetize intellectual property through licensing: film studios, record labels, and book publishers all depend on copyright to generate revenue from films, music, and books.
Major players have avoided fully AI-generated content to keep their work copyrightable. Netflix's production guidelines warn against using AI for main characters, key visuals, or central settings without approval. Hachette pulled the book Shy Girl after allegations that portions of it were AI-written.
These decisions reflect business pragmatism: AI-generated content that cannot be copyrighted cannot be licensed or protected from copying. The bar on copyrighting such works therefore gives companies an incentive to keep human creators involved and preserve profitable IP models. OpenAI's video tool Sora, announced alongside a licensing deal with Disney, was shut down months later.
Sources cited high costs and limited popularity as factors, but the inability to copyright AI-generated output may also have played a role by making large investments unviable.
The Copyright Office has indicated that human prompting alone is not enough to make AI outputs copyrightable, though courts have not yet ruled on the question. Some have called for harsher penalties for misrepresenting AI involvement in copyright registrations.
Advocacy group Fight for the Future launched a petition, signed by 100 journalists, protesting the blocks on the Archive. They argue that preservation is crucial for accountability, since the Wayback Machine tracks edits to articles. Some news organizations are seeking compromises with the Archive that limit access without blocking it entirely.
nypost.com
Super PACs tied to Anthropic and OpenAI have spent more than $37 million on congressional primaries this cycle. The groups have outspent candidates in some races and have focused on candidates who back differing approaches to AI regulation.
flipboard.com
President Trump met Anthropic CEO Dario Amodei at the G7 summit and described talks on restoring access to Fable 5 and Mythos 5 as progressing. The company disabled the models for all users after an administration order to block access by foreign nationals.
techcentral.co.za
Amazon Web Services is in early talks to sell its Trainium chips outside its own data centers. The move follows statements in Amazon CEO Andy Jassy's April shareholder letter projecting a potential $50 billion annual run rate.