Since it’s 1999, What MGM v. Grokster Teaches Us About Perplexity’s Bizarre Infringement Defense

Nate Garhart, writing in Reuters, analyzes Perplexity AI’s novel—some might say bizarre—legal defense in copyright suits filed by the New York Times and the Chicago Tribune in December 2025.  Rather than relying primarily on fair use, the typical defense in AI infringement cases, Perplexity instead argues it lacked “volitional conduct” sufficient for direct copyright infringement, contending that it did not “make” the infringing copies in a legally relevant sense.  The defense in Perplexity’s motion to dismiss draws on the Second Circuit’s 2008 Cartoon Network v. CSC Holdings decision, where a DVR service was not held directly liable because the user, not the service, initiated the recording of each specific work.  Sound familiar?  That’s one straight outta 1999.  You know, the technology made me do it.

Why Generative AI Is Not a Passive Conduit

Mr. Garhart makes clear that Perplexity’s attempt to cast itself as a mere automated tool triggered by user prompts is fundamentally at odds with how generative AI systems actually work. There are several reasons why the “passive conduit” framing fails.

Deliberate System Architecture Embodies Volition

The Grokster Inducement Framework Reinforces This Analysis

The Court identified three particularly notable features of intent evidence:

  1. Aiming at a known market for infringement: Each defendant promoted itself to former Napster users, a group known to be seeking copyrighted works.

  2. Failure to implement filtering or safeguards: Neither defendant developed tools to diminish infringing activity, which—while not independently sufficient—was probative of intent alongside other evidence.

  3. An advertising-based business model: The services sold ad space, so revenue turned on high-volume use that the defendants knew was largely infringing.

Moreover, at each stage of Perplexity’s training pipeline, human decision-making is deeply embedded: engineers and researchers decide what content to tokenize, how to structure training data, and which model behaviors to reinforce or suppress through “reinforcement learning from human feedback” (RLHF) and other fine-tuning methods. The resulting system is curated by humans at multiple points in the typical workflow, from dataset selection and preprocessing to model alignment and quality control, meaning the outputs are not the product of a purely autonomous process but rather of layered, intentional design choices made by people, or more precisely, by Perplexity.

Tokenization itself is a telling example of design choice: by selecting a tokenization scheme and deciding which corpora to process (and spend scarce compute resources on), the system’s developers are making both editorial and commercial judgments about what material the model will learn from and be capable of reproducing. These upstream human choices further undercut the notion that the system is a passive conduit simply responding to downstream user prompts.

Importantly, these tokenization decisions are not made in a vacuum or for altruistic reasons—they are driven by the commercial imperative of delivering a product sufficiently useful that consumers will pay Perplexity for it, rather than paying the New York Times or other original publishers for their journalism. The economic logic is plain: the more effectively the system can ingest and repackage high-quality copyrighted content, the more valuable the product becomes to subscribers, and the more extracted revenue flows to Perplexity instead of to the creators whose work fuels the system.  Sound familiar?
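To make concrete how much deliberate human judgment these upstream steps involve, here is a minimal, purely illustrative sketch in Python; the toy tokenizer, the corpus names, and the compute budget are all hypothetical stand-ins, not Perplexity’s actual pipeline:

```python
# Illustrative sketch only: every step below is a human design choice,
# not an autonomous act of the model -- which corpora to ingest, how to
# split text into tokens, and which documents are worth scarce compute.

def tokenize(text):
    """A toy whitespace tokenizer; real systems choose among schemes
    (BPE, WordPiece, etc.), and that choice shapes what the model learns."""
    return text.lower().split()

def build_training_set(corpora, budget_tokens):
    """Select documents until a compute budget is exhausted.
    'corpora' is an editorially ranked list -- someone decided the order."""
    selected, used = [], 0
    for name, text in corpora:           # ranking = editorial judgment
        tokens = tokenize(text)
        if used + len(tokens) > budget_tokens:
            break                        # scarce compute spent deliberately
        selected.append((name, tokens))
        used += len(tokens)
    return selected

# Hypothetical inputs for illustration:
corpora = [
    ("nyt_article", "Quality journalism the model will learn to repackage"),
    ("forum_post", "low quality text ranked last by the curators"),
]
print(build_training_set(corpora, budget_tokens=10))
```

Even in this toy version, the ranking of the corpora and the size of the token budget are choices a person made long before any user typed a prompt.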

Applying Grokster’s Logic to Generative AI

Several design features of a generative AI answer engine map onto the Grokster framework, even without identical facts.

The Causal Chain Is Not Broken by a User Prompt

I think Mr. Garhart’s most compelling point is that a user’s query is not the kind of discrete, volitional act that broke the causal chain in Cartoon Network.  A user who types “What does the New York Times say about X?” is asking a question—not selecting a specific copyrighted work and pressing “copy” as with a DVR. The Perplexity system then selects, processes, and generates expressive content drawn from copyrighted sources because that’s how it was trained.   The Grokster Court rejected the notion that intermediaries like Perplexity could hide behind user-initiated actions when those intermediaries had built systems designed to facilitate infringement and had taken affirmative steps to encourage it. 

Critically, the generative AI system’s response to a prompt is shaped by decisions made long before the user ever typed a query. Humans selected the training corpora, decided how text would be tokenized and encoded, fine-tuned the model’s outputs through iterative RLHF and other quality-control processes, and designed the retrieval and generation architecture. Each of these steps reflects purposeful human conduct—not the behavior of a neutral pipe. A system in which humans curate the inputs, architect the processing, and refine the outputs at multiple stages is, by any reasonable measure, an active participant in producing the allegedly infringing content.

In sum, generative AI systems are not passive conduits. They are purpose-built products whose design choices—what to crawl, what to tokenize, how to store it, when to reproduce it, and how to monetize it—reflect exactly the kind of upstream volition and deliberate architecture that both the Cartoon Network volitional conduct doctrine and the Grokster inducement framework are designed to capture. The fact that a user prompt triggers the final output does not absolve a company that engineered every step in the chain leading to that output.

Why did Perplexity scrape leading newspapers for content to feed their AI?  Because it was high-value, well-written, well-edited journalism, and it was valuable to them.  In short, they did it for the money.

They robbed the authors for the same famous reason Willie Sutton robbed the banks.  Because that’s where the money is.

And going back to 1999 won’t save them.

AI’s Legal Defense Team Looks Familiar — Because It Is

If you feel like you’ve seen this movie before, you have.

Back in the 2003-ish runup to the 2005 MGM Studios, Inc. v. Grokster, Ltd. Supreme Court case, I met with the founder of one of the major p2p platforms in an effort to get him to go legal.  I reminded him that he knew there was all kinds of bad stuff that got uploaded to his platform.  However much he denied it, he was filtering it out, and he was able to do that because he had control over the content that he (and all his cohorts) denied he had.

I reminded him that if this case ever went bad, someone was going to invade his space and find out exactly what he was up to.  Even though the whole distributed p2p model (unlike Napster, by the way) was built both to avoid knowledge and to run as a perpetual motion machine, there was going to come a day when none of that legal advice would matter.  Within a few months the platform shut down, not because he didn’t want to go legal, but because he couldn’t, at least not without actually devoting himself to respecting other people’s rights.

Everything Old is New Again

Back in the early 2000s, peer-to-peer (P2P) piracy platforms claimed they weren’t responsible for the illegal music and videos flooding their networks. Today, AI companies claim they don’t know what’s in their training data. The defense is essentially the same: “We’re just the neutral platform. We don’t control the content.”  It’s that distorted view of the DMCA and Section 230 safe harbors that put many lawyers’ children through prep school, college and graduate school.

But just like with Morpheus, eDonkey, Grokster, and LimeWire, everyone knew that was BS because the evidence said otherwise — and here’s the kicker: many of the same lawyers are now running essentially the same playbook to defend AI giants.

The P2P Parallel: “We Don’t Control Uploads… Except We Clearly Do”

In the 2000s, platforms like Kazaa and LimeWire were like my little buddy: magically, they never had illegal pornography or extreme violence available to consumers, they prioritized popular music and movies, and they filtered out the worst of the web.

That selective filtering made it clear: they knew what was on their network. It wasn’t even a question of “should have known”; they actually knew, and they did it anyway.  Courts caught on.

In Grokster, the Supreme Court sidestepped the hosting issue and essentially said that if you design a platform with the intent to enable infringement, you’re liable.

The Same Playbook in the AI Era

Today’s AI platforms — OpenAI, Anthropic, Meta, Google, and others — essentially argue:
“Our model doesn’t remember where it learned [fill in the blank]. It’s just statistics.”

But behind the curtain, they:
– Run deduplication tools to avoid redundant copies of, for example, copyrighted books
– Filter out NSFW or toxic content
– Choose which datasets to include and exclude
– Fine-tune models to align with somebody’s social norms or optics

This level of control shows they’re not ignorant — they’re deflecting liability just like they did with p2p.

Déjà Vu — With Many of the Same Lawyers

Many of the same law firms that defended Grokster, Kazaa, and other p2p pirate defendants (as well as some of the ISPs) are now representing AI companies—and the AI companies are very often some, not all, but some of the same ones that have been screwing us on the DMCA and the rest for the last 25 years.  You’ll see familiar names, all of whom have done their best to destroy the creative community for big, big bucks in litigation and lobbying billable hours while filling their pockets to overflowing.

This legal cadre pioneered the “willful blindness” defense and is now polishing it up for AI, hoping courts haven’t learned the lesson.  And judging…no pun intended…from some recent rulings, maybe they haven’t.

Why do they drive their clients into a position where they pose an existential threat to all creators?  Do they not understand that they are creating a vast community of humans that really, truly, hate their clients?  I think they do understand, but there is a corresponding hatred of the super square Silicon Valley types who hate “Hollywood” right back.

Because, you know, information wants to be free—unless they are selling it.  And your data is their new oil. They apply this “ethic” not just to data, but to everything: books, news, music, images, and voice. Copyright? A speed bump. Terms of service? A suggestion. Artist consent? Optional.  Writing a song is nothing compared to the complexities of Biggest Tech.

Why do they do this?  OCPD Much?

Because control over training data is strategic dominance and these people are the biggest control freaks that mankind has ever produced.  They exhibit persistent and inflexible patterns of behavior characterized by an excessive need to control people, environments, and outcomes, often associated with traits of obsessive-compulsive personality disorder.  

So empathy will get you nowhere with these people, although their narcissism allows them to believe that they are extremely empathetic.  Pathetic, yes, empathetic, not so much.  

Pay No Attention to that Pajama Boy Behind the Curtain

The driving force behind AI is very similar to the driving force behind the Internet.   If pajama boy can harvest the world’s intellectual property and use it to train his proprietary AI model, he now owns a simulation of the culture he is not otherwise part of, and not only can he monetize it without sharing profits or credit, he can deny profits and credit to the people who actually created it.

So just like in the heyday of Pirate Bay, Grokster & Co. (and Daniel Ek’s pirate incarnation), the goal isn’t innovation.  The goal is control over language, imagery, and the markets that used to rely on human creators.  This should all sound familiar if you were around for the p2p era.

Why This Matters

Like the p2p platforms, it’s just not believable that the AI companies don’t know what’s in their models.  They may build their chatbot interface so that the public can’t ask the chatbot to blow the whistle on the platform operator, but that doesn’t mean the company can’t tell what it is training on.  These operators have to be able to know what’s in the training materials and manipulate that data daily.

They fingerprint, deduplicate, and sanitize their datasets.  How else could they avoid having multiple copies of books, for example, which would be a compute nightmare?  They store “embeddings” in a way that lets them optimize their AI to use only the best copy of any particular book.  They control the pipeline.
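The fingerprinting-and-deduplication step described above can be sketched in a few lines; this is an assumed, simplified workflow (exact SHA-256 hashing stands in for the fuzzier fingerprints real pipelines use), not any company’s actual code:

```python
# Illustrative sketch (assumed workflow, not any platform's actual code):
# to deduplicate a training corpus you must fingerprint every document,
# which means the operator necessarily inspects what it is training on.

import hashlib

def fingerprint(text):
    """Exact-duplicate fingerprint. Production pipelines typically use
    fuzzier schemes (e.g., MinHash), but the point is the same:
    every document is read, normalized, and indexed."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def deduplicate(documents):
    """Keep one copy per fingerprint -- the operator now holds a complete
    index of the corpus, the opposite of willful blindness."""
    seen, kept = set(), []
    for doc in documents:
        fp = fingerprint(doc)
        if fp not in seen:
            seen.add(fp)
            kept.append(doc)
    return kept

docs = [
    "Chapter One of a bestselling novel.",
    "chapter one of a   bestselling novel.",   # same text, different formatting
    "An unrelated news article.",
]
print(len(deduplicate(docs)))   # the near-identical copies collapse to one
```

The point of the sketch is structural: you cannot deduplicate a corpus without computing a fingerprint of every document in it, which is incompatible with claiming not to know what the model is trained on.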

It’s not about the model’s memory. It’s about the platform’s intent and awareness.

If they’re smart enough to remove illegal content and prioritize clean data, they’re smart enough to be held accountable.

We’re not living through the first digital content crisis — just the most powerful one yet. The legal defenses haven’t changed much. But the stakes — for copyright, competition, and consumer protection — are much higher now.

Courts, Congress, and the public should recognize this for what it is: a recycled defense strategy in service of unchecked AI power. Eventually Grokster ran into Grokster— and all these lawyers are praying that there won’t be an AI version of the Grokster case.