The Devil’s Greatest Trick: Ro Khanna’s “Creator Bill of Rights” Is a Political Shield, Not a Charter for Creative Labor

La plus belle des ruses du Diable est de vous persuader qu’il n’existe pas! (“The greatest trick the Devil ever pulled was convincing the world he didn’t exist.”)

Charles Baudelaire, Le Joueur généreux

Ro Khanna’s so‑called “Creator Bill of Rights” is being sold as a long‑overdue charter for fairness in the digital economy—you know, like for gig workers. In reality, it functions as a political shield for Silicon Valley platforms: a non‑binding, influencer‑centric framework built on a false revenue‑share premise that bypasses child labor, unionized creative labor, professional creators, non‑featured artists, and the central ownership and consent crises posed by generative AI. 

Mr. Khanna’s resolution treats transparency as leverage, consent as vibes, and platform monetization as a deus ex machina-style natural law of the singularity—while carefully avoiding enforceable rights, labor classification, copyright primacy, artist consent for AI training, work‑for‑hire abuse, and real remedies for artists against AI labs. What flows from his assumptions is not a “bill of rights” for creators, but a narrative framework designed to pacify the influencer economy and legitimize platform power at the exact moment that judges are determining that creative labor is being illegally scraped, displaced, and erased by AI leviathans, including some publicly traded companies with trillion-dollar market caps.

The First Omission: Child Labor in the Creator Economy

Rep. Khanna’s newly unveiled “Creator Bill of Rights” has been greeted with the kind of headlines Silicon Valley loves: Congress finally standing up for creators, fairness, and transparency in the digital economy. But the very first thing it doesn’t do should set off alarm bells. The resolution never meaningfully addresses child labor in the creator economy, a sector now infamous for platform-driven exploitation of minors through user-generated content, influencer branding, algorithmic visibility contests, and monetized childhood. (Wikipedia is Exhibit A, Facebook Exhibit B, YouTube Exhibit C, and Instagram Exhibit D.)

There is no serious discussion of child worker protections and all that comes with it, often under state laws: working-hour limits, trust accounts, consent frameworks, or the psychological and economic coercion baked into platform monetization systems. For a document that styles itself as a “bill of rights,” that omission alone is disqualifying. But perhaps understandable given AI Viceroy David Sacks’ obsession with blocking enforcement of state laws that “impede” AI.

And it’s not an isolated miss. Once you read Khanna’s framework closely, a pattern emerges. This isn’t a bill of rights for creators. It’s a political shield for platforms that is built on a false economic premise, framed around influencers, silent on professional creative labor, evasive on AI ownership and training consent, and carefully structured to avoid enforceable obligations.

The Foundational Error: Treating Revenue Share as Natural Law That Justifies a Stream Share Threshold

The foundational error appears right at the center of the resolution: its uncritical embrace of the Internet’s coin of the realm, revenue-sharing. Khanna calls for “clear, transparent, and predictable revenue-sharing terms” between platforms and creators. That phrase sounds benign, even progressive. But it quietly locks in the single worst idea anyone ever had for royalty economics: big-pool platform revenue share, an idea that is being rejected by pretty much everyone except Spotify with its stream share threshold. In case Mr. Khanna didn’t get the memo, artist-centric is the new new thing.

Revenue sharing treats creators as participants in a platform monetization program, not as rights-holders. You know, “partners.” Artists don’t get a share of Spotify stock; they get a “revenue share” because they’re “partnering” with Spotify. If that’s how Spotify treats “partners”….

Under that revenue share model, the platform defines what counts as revenue, what gets excluded, how it’s allocated, which metrics matter, and how the rules change. The platform controls all the data. The platform controls the terms. And the platform retains unilateral power to rewrite the deal. Hey “partner,” that’s not compensation grounded in intellectual property or labor rights. It’s a dodge grounded in platform policy.

We already know how this story ends. Big-pool revenue share regimes hide cross-subsidies, reward algorithm gaming over quality, privilege viral noise over durable cultural work, and collapse bargaining power into opaque market share payments of microscopic proportions. Revenue share deals destroy price signals, hollow out licensing markets, and make creative income volatile and non-forecastable. This is exceptionally awful for songwriters, and nobody can tell a songwriter today what that burger on Tuesday will actually bring.

An advertising revenue-share model penalizes artists because they receive only a tiny fraction of the revenue from ads served against their own music, while platforms like Google capture roughly half of the total advertising revenue generated across the entire network. Naturally they love it.

Rev shares of advertising revenue are the core economic pathology behind what happened to music, journalism, and digital publishing over the last fifteen years.  As we have seen from Spotify’s stream share threshold, a platform can unilaterally decide to cut off payments at any time for any absurd reason and get away with it.  And Khanna’s resolution doesn’t challenge that logic. It blesses it.

He doesn’t say creators are entitled to enforceable royalties tied to uses of their work at rates set by the artist. He doesn’t say there should be statutory floors, audit rights, underpayment penalties, nondiscrimination rules, or retaliation protections. He doesn’t say platforms should be prohibited from unilaterally redefining the pie. He says let’s make the revenue share more “transparent” and “predictable.” That’s not a power shift. That’s UX optimization for exploitation.

This Is an Influencer Bill, Not a Creator Bill

The second fatal flaw is sociological. Khanna’s resolution is written for the creator economy, not the creative economy.

The “creator” in Khanna’s bill is a YouTuber, a TikToker, a Twitch streamer, a podcast personality, a Substack writer, a platform-native entertainer (but no child labor protection). Those are real jobs, and the people doing them face real precarity. But they are not the same thing as professional creative labor. They are usually not professional musicians, songwriters, composers, journalists, photographers, documentary filmmakers, authors, screenwriters, actors, directors, designers, engineers, visual artists, or session musicians. They are not non-featured performers. They are not investigative reporters. They are not the people whose works are being scraped at industrial scale to train generative AI systems.

Those professional creators are workers who produce durable cultural goods governed by copyright, contract, and licensing markets. They rely on statutory royalties, collective bargaining, residuals, reuse frameworks, audit rights, and enforceable ownership rules. They face synthetic displacement and market destruction from AI systems trained on their work without consent. Khanna’s resolution barely touches any of that. It governs platform participation. It does not govern creative labor.  It’s not that influencers shouldn’t be able to rely on legal protections; it’s that if you’re going to have a bill of rights for creators it should include all creators and very often the needs are different.  Starting with collective bargaining and unions.

The Total Bypass of Unionized Labor

Nowhere is this shortcoming more glaring than in the complete bypass of unionized labor. The framework lives in a parallel universe where SAG-AFTRA, WGA, DGA, IATSE, AFM, Equity, newsroom unions, residuals, new-use provisions, grievance procedures, pension and health funds, minimum rates, credit rules, and collective bargaining simply do not exist. That entire legal architecture is invisible.  And Khanna’s approach could easily roll back the gains on AI protections that unions have made through collective bargaining.

Which means the resolution is not attempting to interface with how creative work actually functions in film, television, music, journalism, or publishing. It is not creative labor policy. It is platform fairness rhetoric.

Invisible Labor: Non-Featured Artists and the People the Platform Model Erases

The same erasure applies to non-featured artists and invisible creative labor. Session musicians, backup singers, supporting actors, dancers, crew, editors, photographers on assignment, sound engineers, cinematographers — these people don’t live inside platform revenue-share dashboards. They are paid through wage scales, reuse payments, residuals, statutory royalty regimes, and collective agreements.

None of that exists in Khanna’s world. His “creator” is an account, not a worker.

AI Without Consent Is Not Accountability

The AI plank in the resolution follows the same pattern of rhetorical ambition and structural emptiness. Khanna gestures at transparency, consent, and accountability for AI and synthetic media. But he never defines what consent actually means.

Consent for training? For style mimicry? For voice cloning? For archival scraping of journalism and music catalogs? For derivative outputs? For model fine-tuning? For prompt exploitation? For replacement economics?

The bill carefully avoids the training issue. Which is the whole issue.

A real AI consent regime would force Congress to confront copyright primacy, opt-in licensing, derivative works, NIL rights, data theft, model ownership, and platform liability. Khanna’s framework gestures at harms while preserving the industrial ingestion model intact.

The Ownership Trap: Work-for-Hire and AI Outputs

This omission is especially telling. Nowhere does Khanna say platforms may not claim authorship or ownership of AI outputs by default. Nowhere does he say AI-assisted works are not works made for hire. Nowhere does he say users retain rights in their contributions and edits. Nowhere does he say WFH boilerplate cannot be used to convert prompts into platform-owned assets.

That silence is catastrophic.

Right now, platforms are already asserting ownership contractually, claiming assignments of outputs, claiming compilation rights, claiming derivative rights, controlling downstream licensing, locking creators out of monetization, and building synthetic catalogs they own. Even though U.S. law says purely AI-generated content isn’t copyrightable absent human authorship, platforms can still weaponize terms of service, automated enforcement, and contractual asymmetry to create “synthetic  ownership” or “practical control.” Khanna’s resolution says nothing about any of it.

Portable Benefits as a Substitute for Labor Rights

Then there’s the portable-benefits mirage. Portable benefits sound progressive. They are also the classic substitute for confronting misclassification. So first of all, Khanna starts out saying that “gig workers” in the creative economy don’t get health care—aside from the union health plans, I guess. But then he starts with the portable benefits mirage. So which is it? Surely he doesn’t mean nothing from nothing leaves nothing?

If you don’t want to deal with whether creators are actually employees, whether platforms owe payroll taxes, whether wage-and-hour law applies, whether unemployment insurance applies, whether workers’ comp applies, whether collective bargaining rights attach, or…wait for it…stock options apply…you propose portable benefits without dealing with the reality that there are no benefits. You preserve contractor status. You socialize costs and privatize upside. You deflect labor-law reform and health insurance reform for that matter. You look compassionate. And you change nothing structurally.

Khanna’s framework sits squarely in that tradition of nothing from nothing leaves nothing.

A Non-Binding Resolution for a Reason

The final tell is procedural. Khanna didn’t introduce a bill. He introduced a non-binding resolution.

No enforceable rights. No regulatory mandates. No private causes of action. No remedies. No penalties. No agency duties. No legal obligations.

This isn’t legislation. It’s political signaling.

What This Really Is: A Political Shield

Put all of this together and the picture becomes clear. Khanna’s “Creator Bill of Rights” is built on a false revenue-share premise. It is framed around influencers. It bypasses professional creators. It bypasses unions. It bypasses non-featured artists. It bypasses child labor. It bypasses training consent. It bypasses copyright primacy. It bypasses WFH abuse. It bypasses platform ownership grabs. It bypasses misclassification. It bypasses enforceability. I give you…Uber.

It doesn’t fail because it’s hostile to creators; it fails because it is indifferent to them, redefining “creator” downward until every hard political and legal question disappears.

And in doing so, it functions as a political shield for the very platforms headquartered in Khanna’s district.

When the Penny Drops

Ro Khanna’s “Creator Bill of Rights” isn’t a rights charter.

It’s a narrative framework designed to stabilize the influencer economy, legitimize platform compensation models, preserve contractor status, soften AI backlash, avoid copyright primacy, avoid labor-law reform, avoid ownership reform, and avoid real accountability.

It treats transparency as leverage. It treats consent as vibes. It treats revenue share as natural law. It treats AI as branding. It treats creative labor as content. It treats platforms as inevitable.

And it leaves out the people who are actually being scraped, displaced, devalued, erased, and replaced: musicians, journalists, photographers, actors, directors, songwriters, composers, engineers, non-featured performers, visual artists, and professional creators.

If Congress actually wants a bill of rights for creators, it won’t start with influencer UX and non-binding resolutions. It will start with enforceable intellectual-property rights, training consent, opt-in regimes, audit rights, statutory floors, collective bargaining, exclusion of AI outputs from work-for-hire, limits on platform ownership claims, labor classification clarity, and real remedies.

Until then, this isn’t a bill of rights.

It’s a press release with footnotes.

From Plutonium to Prompt Engineering: Big Tech’s Land Grab at America’s Nuclear Sites–and Who’s Paying for It?

In a twist of post–Cold War irony, the same federal sites that once forged the isotopes of nuclear deterrence are now poised to fuel the arms race of artificial intelligence under the leadership of Special Government Employee and Silicon Valley Viceroy David Sacks. Under a new Department of Energy (DOE) initiative, 16 legacy nuclear and lab sites — including Savannah River, Idaho National Lab, and Oak Ridge, Tennessee — are being opened to private companies to host massive AI data centers. That’s right–Tennessee, where David Sacks is riding roughshod over the ELVIS Act.

But as this techno-industrial alliance gathers steam, one question looms large: Who benefits — and how will the American public be compensated for leasing its nuclear commons to the world’s most powerful corporations? Spoiler alert: We won’t.

A New Model, But Not the Manhattan Project

This program is being billed in headlines as a “new Manhattan Project for AI.” But that comparison falls apart quickly. The original Manhattan Project was:
– Owned by the government
– Staffed by public scientists
– Built for collective defense

Today’s AI infrastructure effort is:
– Privately controlled
– Driven by monopolies and venture capital
– Structured to avoid transparency and public input
– Built on free leases of public land with private nuclear reactors

Call it the Manhattan Project in reverse — not national defense, but national defense capture.

The Art of the Deal: Who gets what?

What Big Tech Is Getting

– Access to federal land already zoned, secured, and wired
– Exemption from state and local permitting
– Bypass of grid congestion via nuclear-ready substations
– DOE’s help fast-tracking nuclear microreactors (SMRs)
– Potential sovereign AI training enclaves, shielded from export controls and oversight

And all of it is being made available to private companies called the “Frontier labs”: Microsoft, Oracle, Amazon, OpenAI, Anthropic, xAI — the very firms at the center of the AI race.

What the Taxpayer Gets (Maybe)

Despite this extraordinary access, almost nothing is disclosed about how the public is compensated. No known revenue-sharing models. No guaranteed public compute access. No equity. No royalties.

– Land lease payments? Not disclosed. Probably none.
– Local tax revenue? Minimal (federal lands are exempt).
– Infrastructure benefit sharing? Unclear or limited.

It’s all being negotiated quietly, under vague promises of “national competitiveness.”

Why AI Labs Want DOE Sites

Frontier labs like OpenAI and Anthropic — and their cloud sponsors — need:
– Gigawatts of energy
– Secure compute environments
– Freedom from export rules and Freedom of Information Act requests
– Permitting shortcuts and national branding

The DOE sites offer all of that — plus built-in federal credibility. The same labs currently arguing in court that their training practices are “fair use” now claim they are defenders of democracy training AI on taxpayer-built land.

This Isn’t the Manhattan Project — It’s the Extraction Economy in a Lab Coat

The tech industry loves to invoke patriotism when it’s convenient — especially when demanding access to federal land, nuclear infrastructure, or diplomatic cover from the EU’s AI Act. But let’s be clear:

This isn’t the Manhattan Project. Or rather, we should hope it isn’t, because that one didn’t end well and still hasn’t.
It’s not public service.
It’s Big Tech lying about fair use, wrapped in an American flag — and for all we know, it might be the first time David Sacks ever saw one.

When companies like OpenAI and Microsoft claim they’re defending democracy while building proprietary systems on DOE nuclear land, we’re not just being gaslit — we’re being looted.

If the AI revolution is built on nationalizing risk and privatizing power, it’s time to ask whose country this still is — and who gets to turn off the lights.

Deduplication and Discovery: The Smoking Gun in the Machine

WINSTON

“Wipe up all those little pieces of brains and skull”

From Pulp Fiction, screenplay by Quentin Tarantino and Roger Avary

Deduplication—the process of removing identical or near-identical content from AI training data—is a critical yet often overlooked indicator that AI platforms actively monitor and curate their training sets. It is exactly the kind of process one would expect given the “scrape, ready, aim” business practices of AI platforms that have ready access to large amounts of fairly high-quality data from users of other products placed into commerce by the platforms’ business affiliates or confederates.

For example, Google Gemini could have access to Gmail, YouTube, at least “publicly available” Google Docs, Google Translate, or Google for Education, and then of course one of the great scams of all time, Google Books. Microsoft uses Bing searches, MSN browsing, the consumer Copilot experience, and ad interactions. Amazon uses Alexa prompts, Facebook uses “public” posts, and so on.

This kind of hoovering up of indiscriminate amounts of “data” in the form of your baby pictures posted on Facebook and your user-generated content on YouTube is bound to produce duplicates. After all, how many users have posted their favorite Billie Eilish or Taylor Swift music video? AI doesn’t need 10,000 versions of “Shake It Off”; the platforms probably just need the official video. Enter deduplication–which by definition means the platform knows what it has scraped and also knows what it wants to get rid of.

“Get rid of” is a relative concept. In many systems—particularly in storage environments like backup servers or object stores—deduplication means keeping only one physical copy of a file. Any other instances of that data don’t get stored again; instead, they’re represented by pointers to the original copy. This approach, known as inline deduplication, happens in real time and minimizes storage waste without actually deleting anything of functional value. It requires knowing what you have, knowing you have more than one version of the same thing, and being able to tell the system where to look to find the “original” copy without disturbing the process and burning compute inefficiently.
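To make the mechanics concrete, here is a minimal sketch of inline, content-addressed deduplication in Python. The class, data, and names are hypothetical and illustrative, not any platform’s actual code; the point is only that the system must hash and recognize every piece of content in order to decide not to store it twice.

```python
import hashlib

class InlineDedupStore:
    """Toy content-addressed store: one physical copy per unique blob;
    every later duplicate becomes a pointer (hash reference) to that copy."""

    def __init__(self):
        self.blobs = {}      # sha256 digest -> the single stored copy
        self.pointers = []   # every write, recorded as a reference to a digest

    def write(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:   # first time this content is seen
            self.blobs[digest] = data  # store it once
        self.pointers.append(digest)   # duplicates are just references
        return digest

store = InlineDedupStore()
store.write(b"official music video file")
store.write(b"official music video file")  # recognized, not stored again
print(len(store.pointers), "writes,", len(store.blobs), "physical copy")
```

Even the most basic flavor of deduplication is, in other words, a running record of what the system has already seen.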

In other cases, such as post-process deduplication, the system stores data initially, then later scans for and eliminates redundancies. Again, the AI platform knows there are two or more versions of the same thing, say the book Being and Nothingness, knows where to find the copies and has been trained to keep only one version. Even here, the duplicates may not be permanently erased—they might be archived, versioned, or logged for auditing, compliance, or reconstruction purposes.

In AI training contexts, deduplication usually means removing redundant examples from the training set to improve model quality and reduce the risk of regurgitating memorized—and potentially copyrighted—text. The duplicate content may be discarded from the training pipeline but often isn’t destroyed. Instead, AI companies may retain it in a separate filtered corpus or keep hashed fingerprints to ensure future models don’t retrain on the same material unknowingly.
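Here is a minimal sketch, with hypothetical names and toy data, of what that pipeline step could look like: exact duplicates are dropped from what the model trains on, but the fingerprints (the record of what was scraped and what was excluded) remain.

```python
import hashlib

def dedupe_training_set(documents):
    """Keep one copy of each exact duplicate for training, but retain the
    fingerprints of everything seen, including what was filtered out."""
    kept = {}       # digest -> the one copy that stays in the training set
    removed = []    # digests of duplicates excluded from training (but logged)
    for doc in documents:
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest in kept:
            removed.append(digest)
        else:
            kept[digest] = doc
    return list(kept.values()), set(kept), removed

corpus, fingerprints, removed = dedupe_training_set([
    "full text of Being and Nothingness (copy scraped from mirror A)",
    "full text of Being and Nothingness (copy scraped from mirror A)",
    "a syndicated news article",
])
print(len(corpus), "documents kept for training;", len(removed), "duplicate logged")
```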

So they know what they have, and likely know where it came from. They just don’t want to tell any plaintiffs.

Ultimately, deduplication is less about destruction and more about optimization. It’s a way to reduce noise, save resources, and improve performance—while still allowing systems to track, reference, or even rehydrate the original data if needed.

Its existence directly undermines claims that companies are unaware of which copyrighted works were ingested. Indeed, it only makes sense that one of the hidden consequences of the indiscriminate scraping that underpins large-scale AI training is the proliferation of duplicated data. Web crawlers ingest everything they can access—news articles republished across syndicates, forum posts echoed in aggregation sites, Wikipedia mirrors, boilerplate license terms, spammy SEO farms repeating the same language over and over. Without any filtering, this avalanche of redundant content floods the training pipeline.

This is where deduplication becomes not just useful, but essential. It’s the cleanup crew after a massive data land grab. The more messy and indiscriminate the scraping, the more aggressively the model must filter for quality, relevance, and uniqueness to avoid training inefficiencies or—worse—model behaviors that are skewed by repetition. If a model sees the same phrase or opinion thousands of times, it might assume it’s authoritative or universally accepted, even if it’s just a meme bouncing around low-quality content farms.
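Near-duplicates, such as the same press release echoed across aggregators with minor edits, cannot be caught by exact hashing alone, so pipelines generally compare content-level fingerprints. The following is a minimal illustration of the general technique using word shingles and Jaccard overlap; it is a sketch, not any platform’s actual tooling.

```python
def shingles(text, k=5):
    """k-word shingles: a standard building block for near-duplicate detection."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}

def jaccard(a, b):
    """Overlap between two shingle sets; identical texts score 1.0."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

original = "the same press release republished across a dozen aggregator sites word for word"
echo = "the same press release republished across a dozen aggregator sites nearly word for word"
score = jaccard(original, echo)
print(f"similarity score: {score:.2f}")  # pipelines drop or down-weight high scorers
```

The detail that matters for litigation is buried inside that comparison: to score two documents against each other, the system has to read and tokenize both of them.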

Deduplication is sort of the Winston Wolf of AI. And if the cleaner shows up, somebody had to order the cleanup. It is a direct response to the excesses of indiscriminate scraping. It’s both a technical fix and a quiet admission that the underlying data collection strategy is, by design, uncontrolled. But while the scraping may be uncontrolled so they can get copies of as much of your data as they can lay hands on, even by cleverly changing their terms of use boilerplate so they can do all this under the effluvia of legality, they send in the cleaner to take care of the crime scene.

So to summarize: To deduplicate, platforms must identify content-level matches (e.g., multiple copies of Being and Nothingness by Jean-Paul Sartre). This process requires tools that compare, fingerprint, or embed full documents—meaning the content is readable and classifiable–and, oh, yes, discoverable.

Platforms may choose the ‘cleanest’ copy to keep, showing knowledge and active decision-making about which version of a copyrighted work is retained. And–big finish–removing duplicates only makes sense if operators know which datasets they scraped and what those datasets contain.

Drilling down on a platform’s deduplication tools and practices may prove up knowledge and intent to a precise degree—contradicting arguments of plausible deniability in litigation. “Johnny ate the cookies” isn’t going to fly. There’s a market-clearing level of record-keeping necessary for deduping to work at all, so it’s likely that there are internal deduplication logs or tooling pipelines that are discoverable.
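If such a log or manifest exists, answering the narrowed question is mechanical: compute the fingerprint of the known work and look it up. The sketch below assumes a purely hypothetical manifest format; whatever the real artifacts look like, they are what discovery should target.

```python
import hashlib

def fingerprint(text):
    """The same kind of content hash a deduplication pipeline would compute."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def was_work_seen(known_work_text, dedup_manifest):
    """Look up one specific known work in a (hypothetical) dedup manifest
    mapping content hashes to where and when the content was scraped."""
    return dedup_manifest.get(fingerprint(known_work_text))

# Illustrative manifest entry: the kind of record-keeping dedup requires.
manifest = {
    fingerprint("full text of the plaintiff's book"): {
        "source": "shadow-library mirror",
        "first_seen": "2023-04-01",
        "action": "kept one copy; 37 duplicates excluded from training",
    },
}
print(was_work_seen("full text of the plaintiff's book", manifest) or "no record retained")
```

Whether the lookup returns a hit, a miss, or an explanation that the records were deleted, each answer is itself probative.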

When AI platforms object to discovery about deduplication, plaintiffs can often overcome those objections by narrowing their focus. Rather than requesting broad details about how a model deduplicates its entire training set, plaintiffs should ask a simple, specific question: Were any of these known works—identified by title or author—deduplicated or excluded from training?

This approach avoids objections about overbreadth or burden. It reframes discovery as a factual inquiry, not a technical deep dive. If the platform claims the data was not retained, plaintiffs can ask for existing artifacts—like hash filters, logs, or manifests—or seek a sworn statement explaining the loss and when it occurred. That, in turn, opens the door to potential spoliation arguments.

If trade secrets are cited, plaintiffs can propose a protective order, limiting access to outside counsel or experts like we’ve done 100,000 times before in other cases. And if the defendant claims “duplicate” is too vague, plaintiffs can define it functionally—as content that’s identical or substantially similar, by hash, tokens, or vectors.

Most importantly, deduplication is relevant. If a platform identified a plaintiff’s work and trained on it anyway, that speaks to volitional use, copying, and lack of care—key issues in copyright and fair use analysis. And if they lied about it, particularly to the court—Helloooooo Harper & Row. Discovery requests that are focused, tailored, and anchored in specific works stand a far better chance of surviving objections and yielding meaningful evidence that leads to other positive results.

AI’s Legal Defense Team Looks Familiar — Because It Is

If you feel like you’ve seen this movie before, you have.

Back in the 2003-ish runup to the 2005 MGM Studios, Inc. v. Grokster, Ltd. Supreme Court case, I met with the founder of one of the major p2p platforms in an effort to get him to go legal.  I reminded him that he knew there was all kinds of bad stuff that got uploaded to his platform.  However much he denied it, he was filtering it out and he was able to do that because he had the control over the content that he (and all his cohorts) denied he had.  

I reminded him that if this case ever went bad, someone was going to invade his space and find out exactly what he was up to. Even though the whole distributed p2p model (unlike Napster, by the way) was built to both avoid knowledge and be a perpetual motion machine, there was going to come a day when none of that legal advice was going to matter. Within a few months the platform shut down, not because he didn’t want to go legal, but because he couldn’t, at least not without actually devoting himself to respecting other people’s rights.

Everything Old is New Again

Back in the early 2000s, peer-to-peer (P2P) piracy platforms claimed they weren’t responsible for the illegal music and videos flooding their networks. Today, AI companies claim they don’t know what’s in their training data. The defense is essentially the same: “We’re just the neutral platform. We don’t control the content.”  It’s that distorted view of the DMCA and Section 230 safe harbors that put many lawyers’ children through prep school, college and graduate school.

But just like with Morpheus, eDonkey, Grokster, and LimeWire, everyone knew that was BS because the evidence said otherwise — and here’s the kicker: many of the same lawyers are now running essentially the same playbook to defend AI giants.

The P2P Parallel: “We Don’t Control Uploads… Except We Clearly Do”

In the 2000s, platforms like Kazaa and LimeWire were like my little buddy: magically, they never had illegal pornography or extreme violence available to consumers, they prioritized popular music and movies, and they filtered out the worst of the web.

That selective filtering made it clear: they knew what was on their network. It wasn’t even a question of “should have known”; they actually knew, and they did it anyway. Courts caught on.

In Grokster, the Supreme Court sidestepped the hosting issue and essentially said that if you design a platform with the intent to enable infringement, you’re liable.

The Same Playbook in the AI Era

Today’s AI platforms — OpenAI, Anthropic, Meta, Google, and others — essentially argue:
“Our model doesn’t remember where it learned [fill in the blank]. It’s just statistics.”

But behind the curtain, they:
– Run deduplication tools to avoid overloading the training pipeline with, for example, multiple copies of copyrighted books
– Filter out NSFW or toxic content
– Choose which datasets to include and exclude
– Fine-tune models to align with somebody’s social norms or optics

This level of control shows they’re not ignorant — they’re deflecting liability just like they did with p2p.

Déjà Vu — With Many of the Same Lawyers

Many of the same law firms that defended Grokster, Kazaa, and other P2P pirate defendants as well as some of the ISPs are now representing AI companies—and the AI companies are very often some, not all, but some of the same ones that have been screwing us on the DMCA and the like for the last 25 years. You’ll see familiar names, all of whom have done their best to destroy the creative community for big, big bucks in litigation and lobbying billable hours while filling their pockets to overflowing.

The legal cadre pioneered the ‘willful blindness’ defense and are now polishing it up for AI, hoping courts haven’t learned the lesson.  And judging…no pun intended…from some recent rulings, maybe they haven’t.

Why do they drive their clients into a position where they pose an existential threat to all creators?  Do they not understand that they are creating a vast community of humans that really, truly, hate their clients?  I think they do understand, but there is a corresponding hatred of the super square Silicon Valley types who hate “Hollywood” right back.

Because, you know, information wants to be free—unless they are selling it.  And your data is their new oil. They apply this “ethic” not just to data, but to everything: books, news, music, images, and voice. Copyright? A speed bump. Terms of service? A suggestion. Artist consent? Optional.  Writing a song is nothing compared to the complexities of Biggest Tech.

Why do they do this?  OCPD Much?

Because control over training data is strategic dominance and these people are the biggest control freaks that mankind has ever produced.  They exhibit persistent and inflexible patterns of behavior characterized by an excessive need to control people, environments, and outcomes, often associated with traits of obsessive-compulsive personality disorder.  

So empathy will get you nowhere with these people, although their narcissism allows them to believe that they are extremely empathetic.  Pathetic, yes, empathetic, not so much.  

Pay No Attention to that Pajama Boy Behind the Curtain

The driving force behind AI is very similar to the driving force behind the Internet.   If pajama boy can harvest the world’s intellectual property and use it to train his proprietary AI model, he now owns a simulation of the culture he is not otherwise part of, and not only can he monetize it without sharing profits or credit, he can deny profits and credit to the people who actually created it.

So just like in the heyday of Pirate Bay, Grokster & Co. (and Daniel Ek’s pirate incarnation), the goal isn’t innovation. The goal is control over language, imagery, and the markets that used to rely on human creators. This should all sound familiar if you were around for the p2p era.

Why This Matters

Like the p2p platforms, it’s just not believable that the AI companies don’t know what’s in their models. They may build their chatbot interface so that the public can’t ask the chatbot to blow the whistle on the platform operator, but that doesn’t mean the company can’t tell what it is training on. These operators have to be able to know what’s in the training materials and manipulate that data daily.

They fingerprint, deduplicate, and sanitize their datasets. How else could they avoid keeping multiple copies of books, for example, which would be a compute nightmare? They store “embeddings” so that they can optimize their AI to use only the best copy of any particular book. They control the pipeline.
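Here is a toy illustration of that best-copy selection, with a made-up grouping key and quality score standing in for whatever embedding or fingerprint comparisons a real pipeline would run:

```python
from collections import defaultdict

def pick_best_copies(copies):
    """Given several scraped copies of the same works, keep one 'best' copy of
    each. The grouping key and quality score are toy stand-ins for the
    embedding or fingerprint comparisons a real pipeline would run."""
    def quality(text):
        ocr_noise = text.count("\ufffd")   # crude proxy: count garbled characters
        return len(text) - 25 * ocr_noise  # penalize noisy scans

    grouped = defaultdict(list)
    for work_id, text in copies:
        grouped[work_id].append(text)
    return {work: max(texts, key=quality) for work, texts in grouped.items()}

best = pick_best_copies([
    ("Being and Nothingness", "Being and N\ufffdthingness, badly OCR'd scan"),
    ("Being and Nothingness", "Being and Nothingness, clean digital text"),
])
print(best["Being and Nothingness"])  # the 'cleanest' copy wins
```

Picking the best copy presupposes knowing how many copies there are and which work they are copies of.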

It’s not about the model’s memory. It’s about the platform’s intent and awareness.

If they’re smart enough to remove illegal content and prioritize clean data, they’re smart enough to be held accountable.

We’re not living through the first digital content crisis — just the most powerful one yet. The legal defenses haven’t changed much. But the stakes — for copyright, competition, and consumer protection — are much higher now.

Courts, Congress, and the public should recognize this for what it is: a recycled defense strategy in service of unchecked AI power. Eventually Grokster ran into Grokster— and all these lawyers are praying that there won’t be an AI version of the Grokster case.