Google’s “AI Overviews” Draws a Formal Complaint in Germany under the EU Digital Services Act

A coalition of NGOs, media associations, and publishers in Germany has filed a formal Digital Services Act (DSA) complaint against Google’s AI Overviews, arguing the feature diverts traffic and revenue from independent media, increases misinformation risks via opaque systems, and threatens media plurality. Under the DSA, violations can carry fines up to 6% of global revenue—a potentially multibillion-dollar exposure.

The complaint claims that AI Overviews answer users’ queries inside Google, short-circuiting click-throughs to the original sources and starving publishers of ad and subscription revenues. Because users can’t see how answers are generated or verified, the coalition warns of heightened misinformation risk and erosion of democratic discourse.

Why the Digital Services Act Matters

As I understand the DSA, news publishers have four main routes: (1) lodge a complaint with their national Digital Services Coordinator alleging a platform's DSA breach, which triggers regulatory scrutiny; (2) use the platform's dispute tools: first the internal complaint-handling system, then certified out-of-court dispute settlement for moderation and search-display disputes, often the faster practical relief; (3) sue for damages in national courts for losses caused by a provider's DSA infringement (Art. 54); or (4) act collectively, by mandating a qualified entity or through the EU Representative Actions Directive, to seek injunctions or redress (roughly analogous to U.S. class actions, but narrower in scope).

Under the DSA, Very Large Online Platforms (VLOPs) and Very Large Online Search Engines (VLOSEs) are services with more than 45 million EU users (approximately 10% of the EU population). Once formally designated by the European Commission, they face stricter obligations than smaller platforms: conducting annual systemic risk assessments, implementing mitigation measures, submitting to independent audits, providing data access to researchers, and ensuring transparency in recommender systems and advertising. Enforcement is centralized at the Commission, with penalties up to 6% of global revenue. This matters because VLOPs like Google, Meta, and TikTok must alter core design choices that directly affect media visibility and revenue. In parallel, the European Commission and national DSCs retain powerful public-enforcement tools against designated platforms.

As a designated Very Large Online Platform, Google faces strict duties to mitigate systemic risks, provide algorithmic transparency, and avoid conduct that undermines media pluralism. The complaint contends AI Overviews violate these requirements by replacing outbound links with Google’s own synthesized answers.

The U.S. Angle: Penske lawsuit

A Major Publisher Has Sued Google in Federal Court Over AI Overviews

On Sept. 14, 2025, Penske Media (Rolling Stone, Billboard, Variety) sued Google in D.C. federal court, alleging AI Overviews repurpose its journalism, depress clicks, and damage revenue—marking the first lawsuit by a major U.S. publisher aimed squarely at AI Overviews. The claims include a training-use allegation: that Google enriched itself by using PMC's works to train and ground the models powering Gemini and AI Overviews, for which Penske seeks restitution and disgorgement. Penske also argues that Google abuses its search monopoly to coerce publishers: indexing is effectively tied to letting Google (a) republish and summarize their material in AI Overviews, Featured Snippets, and AI Mode, and (b) use their works to train Google's LLMs—reducing click-throughs and revenues while letting Google expand its monopoly into online publishing.

Trade Groups Urged FTC/DOJ Action

The News/Media Alliance had previously asked the FTC and DOJ to investigate AI Overviews for diverting traffic and ‘misappropriating’ publishers’ investments, calling for enforcement under FTC Act §5 and Sherman Act §2.

Data Showing Traffic Harm

Industry analyses indicate material referral declines tied to AI Overviews. Digital Content Next reports Google Search referrals down 1%–25% for most member publishers over recent weeks; Digiday pegs impacts at as much as 25%. The trend feeds a broader ‘Google Zero’ concern—zero-click results displacing publisher visits.

Why Europe vs. U.S. Paths Differ

The EU/DSA offers a procedural path to assess systemic risk and platform design choices like AI Overviews and levy platform-wide remedies and fines. In the U.S., the fight currently runs through private litigation (Penske) and competition/consumer-protection advocacy at FTC/DOJ, where enforcement tools differ and take longer to mobilize.

RAG vs. Training Data Issues

AI Overviews are best understood as a Retrieval-Augmented Generation (RAG) issue. Readers will recall that RAG is probably the most direct example of verbatim copying in AI outputs. The harms arise because Google, as middleman, retrieves live publisher content and synthesizes it into an answer inside the Search Engine Results Page (SERP), reducing traffic to the sources. This is distinct from the training-data lawsuits (Kadrey, Bartz) that allege unlawful ingestion of works during model pretraining.

Kadrey: Indirect Market Harm

A RAG case like Penske’s could also be characterized as indirect market harm. Judge Chhabria’s ruling in Kadrey highlights that, for fair use purposes under U.S. law, market harm isn’t limited to direct substitution. Factor 4 in fair use analysis includes foreclosure of licensing and derivative markets. For AI/search, that means reduced referrals depress ad and subscription revenue, while widespread zero-click synthesis may foreclose an emerging licensing market for summaries and excerpts. Evidence of harm includes before/after referral data, revenue deltas, and qualitative harms like brand erasure and loss of attribution. Remedies could include more prominent linking, revenue-sharing, compliance with robots/opt-outs, and provenance disclosures.

I like them RAG cases.

The Essential Issue is Similar in EU and US

Whether in Brussels or Washington, the core dispute is very similar: Who captures the value of journalism in an AI-mediated search world? Germany’s DSA complaint and Penske’s U.S. lawsuit frame twin fronts of a larger conflict—one about control of distribution, payment for content, and the future of a pluralistic press. Not to mention the usual free-riding and competition issues swirling around Google as it extracts rents by inserting itself into places it’s not wanted.

How an AI Moratorium Would Preclude Penske’s Lawsuit

Many “AI moratorium” proposals function as broad safe harbors with preemption. A moratorium to benefit AI and pick national champions was the subject of an IP Subcommittee hearing on September 18. If Congress enacted a moratorium that (1) expressly immunizes core AI practices (training, grounding, and SERP-level summaries), (2) preempts overlapping state claims, and (3) channels disputes into agency processes with exclusive public enforcement, it would effectively close the courthouse door to private suits like Penske and make the US more like Europe without the enforcement apparatus. Here’s how:

– Express immunity for covered conduct. If the statute declares that using publicly available content for training and for retrieval-augmented summaries in search is lawful during the moratorium, Penske’s core theory (RAG substitution plus training use) loses its predicate.
– No private right of action / exclusive public enforcement. Limiting enforcement to the FTC/DOJ (or a designated tech regulator) would bar private plaintiffs from seeking damages or injunctions over covered AI conduct.
– Antitrust carve-out or agency preclearance. Congress could provide that covered AI practices (AI Overviews, featured snippets powered by generative models, training/grounding on public web content) cannot form the basis for Sherman/Clayton liability during the moratorium, or must first be reviewed by the agency—undercutting Penske’s §1/§2 counts.
– Primary-jurisdiction plus statutory stay. Requiring first resort to the agency with a mandatory stay of court actions would pause (or dismiss) Penske until the regulator acts.
– Preemption of state-law theories. A preemption clause would sweep in state unjust-enrichment and consumer-protection claims that parallel the covered AI practices.
– Limits on injunctive relief. Barring courts from enjoining covered AI features (e.g., SERP-level summaries) and reserving design changes to the agency would eliminate the centerpiece remedy Penske seeks.
– Potential retroactive shield. If drafted to apply to past conduct, a moratorium could moot pending suits by deeming prior training/RAG uses compliant for the moratorium period.

A moratorium with safe harbors, preemption, and agency-first review would either stay, gut, or bar Penske’s antitrust and unjust-enrichment claims—reframing the dispute as a regulatory matter rather than a private lawsuit. Want to bet that White House AI Viceroy David Sacks will be sitting in judgment?

Judge Failla’s Opinion in Dow Jones v. Perplexity: RAG as Mechanism of Infringement

Judge Failla’s opinion in Dow Jones v. Perplexity doesn’t just keep the case alive—it frames RAG itself as the act of copying, and raises the specter of inducement liability under Grokster.

Although Judge Katherine Polk Failla’s August 21, 2025 opinion in Dow Jones & Co. v. Perplexity is technically a procedural ruling denying Perplexity’s motions to dismiss or transfer, Judge Failla offers an unusually candid window into how the Court may view the substance of the case. In particular, her treatment of retrieval-augmented generation (RAG) is striking: rather than describing it as Perplexity’s background plumbing, she identified it as the mechanism by which copyright infringement and trademark misattribution allegedly occur.  

Remember, Perplexity’s CEO described the company to Forbes as “It’s almost like Wikipedia and ChatGPT had a kid.” I’m still looking for that attribution under the Wikipedia Creative Commons license.

As readers may recall, I’ve been very interested in RAG as an open door for infringement actions, so naturally this discussion caught my eye. So we’re all on the same page: retrieval-augmented generation (RAG) uses a “vector database” to expand an AI system’s knowledge beyond what is locked in its training data, including, for example, recent news sources.

When you prompt a RAG-enabled model, it first searches the database for context, then weaves that information into its generated answer. This architecture makes outputs more accurate, current, and domain-specific, but it also raises questions about copyright, data governance, and intentional use of third-party content, mostly because RAG may rely on information outside its training data. For example, if I queried “single bullet theory,” the AI might have a copy of the Warren Commission report, but it would need to go out on the web for the latest declassified JFK materials, or news reports about them, to give a complete answer.
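The prompt-then-retrieve-then-weave pipeline described above can be sketched in a few lines. This is a deliberately toy illustration: real systems use embedding models and a vector database rather than word overlap, and the corpus entries here (a Warren Commission snippet and a news item) are invented for the example.

```python
# Minimal RAG sketch: retrieve the most relevant source, then weave it
# into the prompt handed to the language model.

def embed(text: str) -> set[str]:
    """Toy 'embedding': the set of lowercase words in the text."""
    return set(text.lower().split())

def retrieve(query: str, corpus: dict[str, str], k: int = 1) -> list[str]:
    """Rank documents by word overlap with the query; return the top k ids."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda d: len(q & embed(corpus[d])), reverse=True)
    return ranked[:k]

def build_prompt(query: str, corpus: dict[str, str], doc_ids: list[str]) -> str:
    """Weave the retrieved source text verbatim into the generation prompt."""
    context = "\n".join(corpus[d] for d in doc_ids)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# A two-document 'index': one static source, one live news item.
corpus = {
    "warren-report": "The Warren Commission report on the single bullet theory.",
    "jfk-news": "Newly declassified JFK files were released this week.",
}

top = retrieve("latest declassified JFK materials", corpus)
prompt = build_prompt("latest declassified JFK materials", corpus, top)
```

Note that the retrieved publisher text lands verbatim inside the prompt, which is why the copying question attaches to the retrieval step itself, not just the model’s final wording.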

You can also think of Google Search or Bing as a kind of RAG index—and you can see how that would give search engines a big leg up in the AI race, even though none of their various safe harbors, Creative Commons licenses, Google Books or direct licenses were for this RAG purpose.  So there’s that.

Judge Failla’s RAG Analysis

As Judge Failla explained, Perplexity’s system “relies on a retrieval-augmented generation (‘RAG’) database, comprised of ‘content from original sources,’ to provide answers to users,” with the indices “comprised of content that [Perplexity] want[s] to use as source material from which to generate the ‘answers’ to user prompts and questions.’” The model then “repackages the original, indexed content in written responses … to users,” with the RAG technology “tell[ing] the LLM exactly which original content to turn into its ‘answer.’” Or as another judge once said, “One who distributes a device with the object of promoting its use to infringe copyright, as shown by clear expression or other affirmative steps taken to foster infringement, going beyond mere distribution with knowledge of third-party action, is liable for the resulting acts of infringement by third parties using the device, regardless of the device’s lawful uses.” Or something like that.

On that basis, Judge Failla recognized Plaintiffs’ claim that infringement and misattribution occurred at multiple points in the process: “first, by ‘copying a massive amount of Plaintiffs’ copyrighted works as inputs into its RAG index’; second, by providing consumers with outputs that ‘contain full or partial verbatim reproductions of Plaintiffs’ copyrighted articles’; and third, by ‘generat[ing] made-up text (hallucinations) … attribut[ed] … to Plaintiffs’ publications using Plaintiffs’ trademarks.’” In her jurisdictional analysis, Judge Failla stressed that these “inputs are significant because they cause Defendant’s website to produce answers that are reproductions or detailed summaries of Plaintiffs’ copyrighted works,” thus tying the alleged misconduct directly to Perplexity’s business activities in New York, although she was not making a substantive ruling in this instance.

What is RAG and Why It Matters

Retrieval-augmented generation is a method that pairs two steps: (1) retrieval of content from external databases or the open web, and (2) generation of a synthetic answer using a large language model. Instead of relying solely on the model’s pre-training, RAG systems point the model toward selected source material such as news articles, scientific papers, legal databases and instruct it to weave that content into an answer. 

From a user perspective, this can produce more accurate, up-to-date results. But from a legal perspective, the same pipeline can directly copy or closely paraphrase copyrighted material, often without attribution, and can even misattribute hallucinated text to legitimate sources. This dual role of RAG—retrieving copyrighted works as inputs and reproducing them as outputs—is exactly what made it central to Judge Failla’s opinion procedurally, but also may show where she is thinking substantively.

RAG in Frontier Labs

RAG is not a niche technique. It has become standard practice at nearly every frontier AI lab:

– OpenAI uses retrieval plug-ins and Bing integrations to ground ChatGPT answers.
– Anthropic deploys RAG pipelines in Claude for enterprise customers.
– Google DeepMind integrates RAG into Gemini and search-linked models.
– Meta builds retrieval into LLaMA applications and its Meta AI assistant.
– Microsoft has made Copilot fundamentally a RAG product, pairing Bing with GPT.
– Cohere, Mistral, and other independents market RAG as a service layer for enterprises.

Why Dow Jones Matters Beyond Perplexity

Perplexity just happened to produce the first reported opinion, as far as I know. The technical structure of its answer engine—indexing copyrighted content into a RAG system, then repackaging it for users—is not unique. It mirrors how the rest of the frontier labs are building their flagship products. What makes this case important is not that Perplexity is an outlier, but that it illustrates the legal vulnerability inherent in the RAG architecture itself.

Is RAG the Low-Hanging Fruit?

What makes this case so consequential is not just that Judge Failla recognized, at least for purposes of this ruling, that RAG is a mechanism of infringement, but that RAG cases may be easier to prove than disputes over model training inputs. Training claims often run into evidentiary hurdles: plaintiffs must show that their works were included in massive opaque training corpora, that those works influenced model parameters, and that the resulting outputs are “substantially similar.” That chain of proof can be complex and indirect.

By contrast, RAG systems operate in the open. They index specific copyrighted articles, feed them directly into a generation process, and sometimes output verbatim or near-verbatim passages. Plaintiffs can point to before-and-after evidence: the copyrighted article itself, the RAG index that ingested it, and the system’s generated output reproducing it. That may make proving copyright infringement far more straightforward to demonstrate than in a pure training case.
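That before-and-after evidence can be as simple as laying the source article next to the generated answer and flagging verbatim runs. A minimal sketch, using the standard-library `difflib` and invented example texts; this is an illustrative check, not a legal test for substantial similarity:

```python
# Flag the longest verbatim character run shared by a source article and
# a generated answer -- the side-by-side exhibit a RAG plaintiff can build.
from difflib import SequenceMatcher

def longest_verbatim_run(source: str, output: str) -> str:
    """Return the longest character run shared verbatim by the two texts."""
    m = SequenceMatcher(None, source, output).find_longest_match(
        0, len(source), 0, len(output))
    return source[m.a : m.a + m.size]

article = "The merger collapsed after regulators signaled they would sue to block it."
answer = "According to reports, the merger collapsed after regulators signaled opposition."

run = longest_verbatim_run(article, answer)
```

A long shared run, paired with the RAG index entry showing the article was ingested, makes the copying chain concrete in a way that opaque training corpora rarely allow.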

For that reason, Perplexity just happened to be first, but it will not be the last. Nearly every frontier lab (OpenAI, Anthropic, Google, Meta, Microsoft) is relying on RAG as the architecture of choice to ground their models. If RAG is the legal weak point, this opinion could mark the opening salvo in a much broader wave of litigation aimed at AI platforms, with courts treating RAG not as a technical curiosity but as a direct, provable conduit for infringement.

And lurking in the background is a bigger question: is Grokster going to be Judge Failla’s roundhouse kick? That irony is delicious.  By highlighting how Perplexity (and the others) deliberately designed its system to ingest and repackage copyrighted works, the opinion sets the stage for a finding of intentionality that could make RAG the twenty-first-century version of inducement liability.

How Google’s “AI Overviews” Product Exposes a New Frontier in Copyright Infringement and Monopoly Abuse: Lessons from the Chegg Lawsuit

In February 2025, Chegg, Inc.—a Santa Clara education technology company—filed what I think will be a groundbreaking antitrust lawsuit against Google and Alphabet over Google’s use of “retrieval augmented generation” or “RAG.” Chegg alleges that the search monopolist’s new AI-powered search product, AI Overviews, is the latest iteration of its longstanding abuse of monopoly power.

The Chegg case may be the first major legal test of how RAG tools, like those powering Google’s AI search features, can be weaponized to maintain dominance in a core market—while gutting adjacent industries.

What Is at Stake?

Chegg’s case is more than a business dispute over search traffic. It’s a critical turning point in how regulators, courts, and the public understand Google’s dual role as:
– The gatekeeper of the web, and
– The competitor to every content publisher, educator, journalist, or creator whose material feeds its systems.

According to Chegg, Google’s AI Overviews scrapes and repackages publisher content—including Chegg’s proprietary educational explanations—into neatly summarized answers, which are then featured prominently at the top of search results. These AI responses provide zero compensation and little visibility for the original source, effectively diverting traffic and revenue from publishers who are still needed to produce the underlying content. Very Googley.

Chegg alleges it has experienced a 49% drop in non-subscriber traffic from Google searches, directly attributing the collapse to the introduction of AI Overviews. Google, meanwhile, offers its usual “What, Me Worry?” defense and insists its AI summaries enhance the user experience and are simply the next evolution of search—not a monopoly violation. Yeah, right, that’s the ticket.

But the implications go far beyond Chegg’s case.

Monopoly Abuse, Evolved for AI

The Chegg lawsuit revives a familiar pattern from Google’s past:

– In the 2017 Google Shopping case, the EU fined Google €2.42 billion for self-preferencing—boosting its own comparison shopping service in search while demoting rivals.
– In the U.S. DOJ monopoly case (2020–2024), a federal court found that Google illegally maintained its monopoly by locking in default search placement on mobile browsers and devices.

Now with AI Overviews, Google is not just favoring its own product in the search interface—it is repurposing the product of others to power that offering. And unlike traditional links, AI Overviews can satisfy a query without any click-through, undermining both the economic incentive to create content and the infrastructure of the open web.

Critically, publishers who have opted out of AI training via robots.txt or Google’s own tools like Google-Extended find that this does not block RAG-based uses in AI Overviews—highlighting a regulatory gap that Google exploits. This should come as no surprise given Google’s long history of loophole seeking arbitrage.
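The gap is visible in the robots.txt mechanics themselves. A sketch, based on Google’s published crawler controls: the Google-Extended token governs use of content for Gemini training and grounding, while AI Overviews is built on ordinary Search crawling by Googlebot, so the training opt-out leaves a site’s content available to AI Overviews.

```text
# robots.txt (sketch): opting out of AI training via Google-Extended
# does not remove a site from AI Overviews.
User-agent: Google-Extended
Disallow: /

# Blocking Googlebot would keep content out of AI Overviews, but only
# at the cost of disappearing from Google Search entirely:
# User-agent: Googlebot
# Disallow: /
```

That all-or-nothing choice is the tying dynamic publishers complain about: stay indexed and feed AI Overviews, or opt out and vanish from search.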

Implications Under EU Law

The European Union should take note. Article 102 of the Treaty on the Functioning of the European Union (TFEU) prohibits dominant firms from abusing their market position to distort competition. The same principles that justified the €2.42B Google Shopping fine and the 2018 €4.1B Android fine apply here:

– Leveraging dominance in general search to distort competition in education, journalism, and web publishing.
– Self-preferencing and vertical integration via AI systems that cannibalize independent businesses.
– Undermining effective consent mechanisms (like AI training opt-outs) to maintain data advantage.

Chegg’s case may be the canary in the coal mine for what’s to come globally as more AI systems become integrated into dominant platforms. Google’s strategy with AI Overviews represents not just feature innovation, but a structural shift in how monopolies operate: they no longer just exclude rivals—they absorb them.

A Revelatory Regulatory Moment

The Chegg v. Google case matters because it pushes antitrust law into the AI litigation arena. It challenges regulators to treat search-AI hybrids as more than novel tech. They are economic chokepoints that extend monopoly control through invisible algorithms and irresistible user interfaces.

Rights holders, US courts and the European Commission should watch closely: this is not just a copyright fight—it’s a competition law flashpoint.

How RAG Affects Different Media and Web Publishers

Note: RAG systems can use audiovisual content, but typically through textual intermediaries like transcripts, not by directly retrieving and analyzing raw audio/video files. But that could be next.

Category | Examples of Rights Holders | How RAG Uses the Content
Film Studios / Scriptwriters | Paramount, Amazon, Disney | Summarizes plots, reviews, and character arcs (e.g., ‘What happens in Oppenheimer?’)
Music Publishers / Songwriters | Universal, Concord, Peer; Taylor Swift, Bob Dylan, Kendrick Lamar | Displays lyrics, interpretations, and credits (e.g., ‘Meaning of Anti-Hero by Taylor Swift’)
News Organizations | CNN, Reuters, BBC | Generates summaries from live news feeds (e.g., ‘What’s happening in Gaza today?’)
Book Publishers / Authors | HarperCollins, Hachette, Macmillan | Synthesizes themes, summaries, and reviews (e.g., ‘Theme of Beloved by Toni Morrison’)
Gaming Studios / Reviewers | GameFAQs, IGN, Reddit | Explains gameplay strategies using fan walkthroughs (e.g., ‘How to defeat Fire Giant in Elden Ring’)
Visual Artists / Photojournalists | ArtNet, Museum Sites, Personal Portfolios | Explains style and methods from exhibition texts and bios (e.g., ‘How does Banksy create his art?’)
Podcasters / Transcription Services | Podcast transcripts, show notes | Pulls quotes and summaries from transcript databases (e.g., ‘What did Ezra Klein say about AI regulation?’)
Educational Publishers / EdTech | Khan Academy, Chegg, Pearson | Delivers step-by-step solutions and concept explanations (e.g., ‘Explain the Pythagorean Theorem’)
Science and Medical Publishers | Mayo Clinic, MedlinePlus, PubMed | Answers medical questions with clinical and scientific data (e.g., ‘Symptoms of lupus’)