Schrödinger’s Training Clause: How Platforms Like WeTransfer Say They’re Not Using Your Files for AI—Until They Are

Tech companies want your content. Not just to host it, but for their training pipeline—to train models, refine algorithms, and “improve services” in ways that just happen to lead to new commercial AI products. But as public awareness catches up, we’ve entered a new phase: deniable ingestion.

Welcome to the world of the Schrödinger’s training clause—a legal paradox where your data is simultaneously not being used to train AI and fully licensed in case they decide to do so.

The Door That’s Always Open

Let’s take the WeTransfer case. For a brief period this month (in July 2025), their Terms of Service included an unmistakable clause: users granted them rights to use uploaded content to “improve the performance of machine learning models.” That language was direct. It caused backlash. And it disappeared.

Many mea culpas later, their TOS has been scrubbed clean of AI references. I appreciate the sentiment, really I do. But—and there’s always a but–the core license hasn’t changed. It’s still:

– Perpetual

– Worldwide

– Royalty-free

– Transferable

– Sub-licensable

They’ve simply returned the problem clause to its quantum box. No machine learning references. But nothing that stops it either.

 A Clause in Superposition

Platforms like WeTransfer—and others—have figured out the magic words: Don’t say you’re using data to train AI. Don’t say you’re not using it either. Instead, claim a sweeping license to do anything necessary to “develop or improve the service.”

That vague phrasing allows future pivots. It’s not a denial. It’s a delay. And to delay is to deny.

That’s what makes it Schrödinger’s training clause: Your content isn’t being used for AI. Unless it is. And you won’t know until someone leaks it, or a lawsuit makes discovery public.

The Scrape-Then-Scrub Scenario

Let’s reconstruct what could have happened–not saying it did happen, just could have–following the timeline in The Register:

1. Early July 2025: WeTransfer silently updates its Terms of Service to include AI training rights.

2. Users continue uploading sensitive or valuable content.

3. [Somebody’s] AI systems quickly ingest that data under the granted license.

4. Public backlash erupts mid-July.

5. WeTransfer removes the clause—but to my knowledge never revokes the license retroactively or promises to delete what was scraped. In fact, here’s their statement which includes this non-denial denial: “We don’t use machine learning or any form of AI to process content shared via WeTransfer.” OK, that’s nice but that wasn’t the question. And if their TOS was so clear, then why the amendment in the first place?

Here’s the Potential Legal Catch

Even if WeTransfer removed the clause later, any ingestion that occurred during the ‘AI clause window’ is arguably still valid under the terms then in force. As far as I know, they haven’t promised:

– To destroy any trained models

– To purge training data caches

– Or to prevent third-party partners from retaining data accessed lawfully at the time

What Would ‘Undoing’ Scraping Require?

– Audit logs to track what content was ingested and when

– Reversion of any models trained on user data

– Retroactive license revocation and sub-license termination

None of this has been offered that I have seen.

What ‘We Don’t Train on Your Data’ Actually Means

When companies say, “we don’t use your data to train AI,” ask:

– Do you have the technical means to prevent that?

– Is it contractually prohibited?

– Do you prohibit future sublicensing?

– Can I audit or opt out at the file level?

If the answer to those is “no,” then the denial is toothless.

How Creators Can Fight Back

1. Use platforms that require active opt-in for AI training.

2. Encrypt files before uploading.

3. Include counter-language in contracts or submission terms:

   “No content provided may be used, directly or indirectly, to train or fine-tune machine learning or artificial intelligence systems, unless separately and explicitly licensed for that purpose in writing” or something along those lines.

4. Call it out. If a platform uses Schrödinger’s language, name it. The only thing tech companies fear more than litigation is transparency.

What is to Be Done?

The most dangerous clauses aren’t the ones that scream “AI training.” They’re the ones that whisper, “We’re just improving the service.”

If you’re a creative, legal advisor, or rights advocate, remember: the future isn’t being stolen with force. It’s being licensed away in advance, one unchecked checkbox at a time.

And if a platform’s only defense is “we’re not doing that right now”—that’s not a commitment. That’s a pause.

That’s Schrödinger’s training clause.

From Plutonium to Prompt Engineering: Big Tech’s Land Grab at America’s Nuclear Sites–and Who’s Paying for It?

In a twist of post–Cold War irony, the same federal sites that once forged the isotopes of nuclear deterrence are now poised to fuel the arms race of artificial intelligence under the leadership of Special Government Employee and Silicon Valley Viceroy David Sacks. Under a new Department of Energy (DOE) initiative, 16 legacy nuclear and lab sites — including Savannah River, Idaho National Lab, and Oak Ridge Tennessee — are being opened to private companies to host massive AI data centers. That’s right–Tennessee where David Sacks is riding roughshod over the ELVIS Act.

But as this techno-industrial alliance gathers steam, one question looms large: Who benefits — and how will the American public be compensated for leasing its nuclear commons to the world’s most powerful corporations? Spoiler alert: We won’t.

A New Model, But Not the Manhattan Project

This program is being billed in headlines as a “new Manhattan Project for AI.” But that comparison falls apart quickly. The original Manhattan Project was:
– Owned by the government
– Staffed by public scientists
– Built for collective defense

Today’s AI infrastructure effort is:
– Privately controlled
– Driven by monopolies and venture capital
– Structured to avoid transparency and public input
– Uses free leases on public land with private nuclear reactors

Call it the Manhattan Project in reverse — not national defense, but national defense capture.

The Art of the Deal: Who gets what?

What Big Tech Is Getting

– Access to federal land already zoned, secured, and wired
– Exemption from state and local permitting
– Bypass of grid congestion via nuclear-ready substations
– DOE’s help fast-tracking nuclear microreactors (SMRs)
– Potential sovereign AI training enclaves, shielded from export controls and oversight

And all of it is being made available to private companies called the “Frontier labs”: Microsoft, Oracle, Amazon, OpenAI, Anthropic, xAI — the very firms at the center of the AI race.

What the Taxpayer Gets (Maybe)

Despite this extraordinary access, almost nothing is disclosed about how the public is compensated. No known revenue-sharing models. No guaranteed public compute access. No equity. No royalties.

Land lease payments? Not disclosed. Probably none.
Local tax revenue? Minimal (federal lands exempt)
Infrastructure benefit sharing? Unclear or limited

It’s all being negotiated quietly, under vague promises of “national competitiveness.”

Why AI Labs Want DOE Sites

Frontier labs like OpenAI and Anthropic — and their cloud sponsors — need:
– Gigawatts of energy
– Secure compute environments
– Freedom from export rules and Freedom of Information Act requests
– Permitting shortcuts and national branding

The DOE sites offer all of that — plus built-in federal credibility. The same labs currently arguing in court that their training practices are “fair use” now claim they are defenders of democracy training AI on taxpayer-built land.

This Isn’t the Manhattan Project — It’s the Extraction Economy in a Lab Coat

The tech industry loves to invoke patriotism when it’s convenient — especially when demanding access to federal land, nuclear infrastructure, or diplomatic cover from the EU’s AI Act. But let’s be clear:

This isn’t the Manhattan Project. Or rather we should hope it isn’t because that one didn’t end well and still hasn’t.
It’s not public service.
It’s Big Tech lying about fair use, wrapped in an American flag — and for all we know, it might be the first time David Sacks ever saw one.

When companies like OpenAI and Microsoft claim they’re defending democracy while building proprietary systems on DOE nuclear land, we’re not just being gaslit — we’re being looted.

If the AI revolution is built on nationalizing risk and privatizing power, it’s time to ask whose country this still is — and who gets to turn off the lights.

When Viceroy David Sacks Writes the Tariffs: How One VC Could Weaponize U.S. Trade Against the EU

David Sacks is a “Special Government Employee”, Silicon Valley insider and a PayPal mafioso who has become one of the most influential “unofficial” architects of AI policy under the Trump administration. No confirmation hearings, no formal role—but direct access to power.

He:
– Hosts influential political podcasts with Musk and Thiel-aligned narratives.
– Coordinates behind closed doors with elite AI companies who are now PRC-style “national champions” (OpenAI, Anthropic, Palantir).
– Has reportedly played a central role in shaping the AI Executive Orders and industrial strategy driving billions in public infrastructure to favored firms.

Under 18 U.S.C. § 202(a), a Special Government Employee is:

  • Temporarily retained to perform limited government functions,
  • For no more than 130 days per year (which for Sacks ends either April 14 or May 30, 2025), unless reappointed in a different role,
  • Typically serves in an advisory or consultative role, or
  • Without holding actual decision-making or operational authority over federal programs or agencies.

SGEs are used to avoid conflict-of-interest entanglements for outside experts while still tapping their expertise for advisory purposes. They are not supposed to wield sweeping executive power or effectively run a government program. Yeah, right.

And like a good little Silicon Valley weasel, Sacks supposedly is alternating between his DC side hustle and his VC office to stay under 130 days. This is a dumbass reading of the statute which says “‘Special Government employee’ means… any officer or employee…retained, designated, appointed, or employed…to perform…temporary duties… for not more than 130 days during any period of 365 consecutive days.” That’s not the same as “worked” 130 days on the time card punch. But oh well.

David Sacks has already exceeded the legal boundaries of his appointment as a Special Government Employee (SGE) both in time served but also by directing the implementation of a sweeping, whole-of-government AI policy, including authoring executive orders, issuing binding directives to federal agencies, and coordinating interagency enforcement strategies—actions that plainly constitute executive authority reserved for duly appointed officers under the Appointments Clause. As an SGE, Sacks is authorized only to provide temporary, nonbinding advice, not to exercise operational control or policy-setting discretion across the federal government. Accordingly, any executive actions taken at his direction or based on his advisement are constitutionally infirm as the unlawful product of an individual acting without valid authority, and must be deemed void as “fruit of the poisonous tree.”

Of course, one of the states that the Trump AI Executive Orders will collide with almost immediately is the European Union and its EU AI Act. Were they 51st? No that’s Canada. 52nd? Ah, right that’s Greenland. Must be 53rd.

How Could David Sacks Weaponize Trade Policy to Help His Constituents in Silicon Valley?

Here’s the playbook:

Engineer Executive Orders

Through his demonstrated access to Trump and senior White House officials, Sacks could promote executive orders under the International Emergency Economic Powers Act (IEEPA) or Section 301 of the Trade Act, aimed at punishing countries (like EU members) for “unfair restrictions” on U.S. AI exports or operations.

Something like this: “The European Union’s AI Act constitutes a discriminatory and protectionist measure targeting American AI innovation, and materially threatens U.S. national security and technological leadership.” I got your moratorium right here.

Leverage the USTR as a Blunt Instrument

The Office of the U.S. Trade Representative (USTR) can initiate investigations under Section 301 without needing new laws. All it takes is political will—and a nudge from someone like Viceroy Sacks—to argue that the EU’s AI Act discriminates against U.S. firms. See Canada’s “Tech Tax”. Gee, I wonder if Viceroy Sacks had anything to do with that one.

Redefine “National Security”

Sacks and his allies can exploit the Trump administration’s loose definition of “national security” claiming that restricting U.S. AI firms in Europe endangers critical defense and intelligence capabilities.

Smear Campaigns and Influence Operations

Sacks could launch more public campaigns against the EU like his attacks on the AI diffusion rule. According to the BBC, “Mr. Sacks cited the alienation of allies as one of his key arguments against the AI diffusion plan”. That’s a nice ally you got there, be a shame if something happened to it.

After all, the EU AI Act does what Sacks despises like protects artists and consumers, restricts deployment of high-risk AI systems (like facial recognition and social scoring), requires documentation of training data (which exposes copyright violations), and applies extraterritorially (meaning U.S. firms must comply even at home).

And don’t forget, Viceroy Sacks actually was given a portfolio that at least indirectly includes the National Security Council, so he can use the NATO connection to put a fine edge on his “industrial patriotism” just as war looms over Europe.

When Policy Becomes Personal

In a healthy democracy, trade retaliation should be guided by evidence, public interest, and formal process.

But under the current setup, someone like David Sacks can short-circuit the system—turning a private grievance into a national trade war. He’s already done it to consumers, wrongful death claims and copyright, why not join war lords like Eric Schmidt and really jack with people? Like give deduplication a whole new meaning.

When one man’s ideology becomes national policy, it’s not just bad governance.

It’s a broligarchy in real time.

Beyond Standard Oil: How the AI Action Plan Made America a Command Economy for Big Tech That You Will Pay For

When the White House requested public comments earlier this year on how the federal government should approach artificial intelligence, thousands of Americans—ranging from scientists to artists, labor leaders to civil liberties advocates—responded with detailed recommendations. Yet when America’s AI Action Plan was released today, it became immediately clear that those voices were largely ignored. The plan reads less like a response to public input and more like a pre-written blueprint drafted in collaboration with the very corporations it benefits. The priorities, language, and deregulatory thrust suggest that the real consultations happened behind closed doors—with Big Tech executives, not the American people.

In other words, business as usual.

By any historical measure—Standard Oil, AT&T, or even the Cold War military-industrial complex—the Trump Administration’s “America’s AI Action Plan” represents a radical leap toward a command economy built for and by Big Tech. Only this time, there are no rate regulations, no antitrust checks, and no public obligations—just streamlined subsidies, deregulation, and federally orchestrated dominance by a handful of private AI firms.

“Frontier Labs” as National Champions

The plan doesn’t pretend to be neutral. It picks winners—loudly. Companies like OpenAI, Anthropic, Meta, Microsoft, and Google are effectively crowned as “national champions,” entrusted with developing the frontier of artificial intelligence on behalf of the American state.

– The National AI Research Resource (NAIRR) and National Science Foundation partnerships funnel taxpayer-funded compute and talent into these firms.
– Federal procurement standards now require models that align with “American values,” but only as interpreted by government-aligned vendors.
– These companies will receive priority access to compute in a national emergency, hard-wiring them into the national security apparatus.
– Meanwhile, so-called “open” models will be encouraged in name only—no requirement for training data transparency, licensing, or reproducibility.

This is not a free market. This is national champion industrial policy—without the regulation or public equity ownership that historically came with it.

Infrastructure for Them, Not Us

The Action Plan reads like a wishlist from Silicon Valley’s executive suites:

– Federal lands are being opened up for AI data centers and energy infrastructure.
– Environmental and permitting laws are gutted to accelerate construction of facilities for private use.
– A national electrical grid expansion is proposed—not to serve homes and public transportation, but to power hyperscaler GPUs for model training.
– There’s no mention of public access, community benefit, or rural deployment. This is infrastructure built with public expense for private use.

Even during the era of Ma Bell, the public got universal service and price caps. Here? The public is asked to subsidize the buildout and then stand aside.

Deregulation for the Few, Discipline for the Rest

The Plan explicitly orders:
– Rescission of Biden-era safety and equity requirements.
– Reviews of FTC investigations to shield AI firms from liability.
– Withholding of federal AI funding from states that attempt to regulate the technology for safety, labor, or civil rights purposes.

Meanwhile, these same companies are expected to supply the military, detect cyberattacks, run cloud services for federal agencies, and set speech norms in government systems.

The result? An unregulated cartel tasked with executing state functions.

More Extreme Than Standard Oil or AT&T

Let’s be clear: Standard Oil was broken up. AT&T had to offer regulated universal service. Lockheed, Raytheon, and the Cold War defense contractors were overseen by procurement auditors and GAO enforcement.

This new AI economy is more privatized than any prior American industrial model—yet more dependent on the federal government than ever before. It’s an inversion of free market principles wrapped in American flags and GPU clusters.

Welcome to the Command Economy—For Tech Oligarchs

There’s a word for this: command economy. But instead of bureaucrats in Soviet ministries, we now have a handful of unelected CEOs directing infrastructure, energy, science, education, national security, and labor policy—all through cozy relationships with federal agencies.

If we’re going to nationalize AI, let’s do it honestly—with public governance, democratic accountability, and shared benefit. But this halfway privatized, fully subsidized, and wholly unaccountable structure isn’t capitalism. It’s capture.

AI Needs Ever More Electricity—And Google Wants Us to Pay for It

Uncle Sugar’s “National Emergency” Pitch to Congress

At a recent Congressional hearing, former Google CEO Eric “Uncle Sugar” Schmidt delivered a message that was as jingoistic as it was revealing: if America wants to win the AI arms race, it better start building power plants. Fast. But the subtext was even clearer—he expects the taxpayer to foot the bill because, you know, the Chinese Communist Party. Yes, when it comes to fighting the Red Menace, the all-American boys in Silicon Valley will stand ready to fight to the last Ukrainian, or Taiwanese, or even Texan.

Testifying before the House Energy & Commerce Committee on April 9, Schmidt warned that AI’s natural limit isn’t chips—it’s electricity. He projected that the U.S. would need 92 gigawatts of new generation capacity—the equivalent of nearly 100 nuclear reactors—to keep up with AI demand.

Schmidt didn’t propose that Google, OpenAI, Meta, or Microsoft pay for this themselves, just like they didn’t pay for broadband penetration. No, Uncle Sugar pushed for permitting reform, federal subsidies, and government-driven buildouts of new energy infrastructure. In plain English? He wants the public sector to do the hard and expensive work of generating the electricity that Big Tech will profit from.

Will this Improve the Grid?

And let’s not forget: the U.S. electric grid is already dangerously fragile. It’s aging, fragmented, and increasingly vulnerable to cyberattacks, electromagnetic pulse (EMP) weapons, and even extreme weather events. Pouring public money into ultra-centralized AI data infrastructure—without first securing the grid itself—is like building a mansion on a cracked foundation.

If we are going to incur public debt, we should prioritize resilience, distributed energy, grid security, and community-level reliability—not a gold-plated private infrastructure buildout for companies that already have trillion-dollar valuations.

Big Tech’s Growing Appetite—and Private Hoarding

This isn’t just a future problem. The data center buildout is already in full swing and your Uncle Sugar must be getting nervous about where he’s going to get the money from to run his AI and his autonomous drone weapons. In Oregon, where electricity is famously cheap thanks to the Bonneville Power Administration’s hydroelectric dams on the Columbia River, tech companies have quietly snapped up huge portions of the grid’s output. What was once a shared public benefit—affordable, renewable power—is now being monopolized by AI compute farms whose profits leave the region to the bank accounts in Silicon Valley.

Meanwhile, Microsoft is investing in a nuclear-powered data center next to the defunct Three Mile Island reactor—but again, it’s not about public benefit. It’s about keeping Azure’s training workloads running 24/7. And don’t expect them to share any of that power capacity with the public—or even with neighboring hospitals, schools, or communities.

Letting the Public Build Private Fortresses

The real play here isn’t just to use public power—it’s to get the public to build the power infrastructure, and then seal it off for proprietary use. Moats work both ways.

That includes:
– Publicly funded transmission lines across hundreds of miles to deliver power to remote server farms;
– Publicly subsidized generation capacity (nuclear, gas, solar, hydro—you name it);
– And potentially, prioritized access to the grid that lets AI workloads run while the rest of us face rolling blackouts during heatwaves.

All while tech giants don’t share their models, don’t open their training data, and don’t make their outputs public goods. It’s a privatized extractive model, powered by your tax dollars.

Been Burning for Decades

Don’t forget: Google and YouTube have already been burning massive amounts of electricity for 20 years. It didn’t start with ChatGPT or Gemini. Serving billions of search queries, video streams, and cloud storage events every day requires a permanent baseload—yet somehow this sudden “AI emergency” is being treated like a surprise, as if nobody saw it coming.

If they knew this was coming (and they did), why didn’t they build the power? Why didn’t they plan for sustainability? Why is the public now being told it’s our job to fix their bottleneck?

The Cold War Analogy—Flipped on Its Head

Some industry advocates argue that breaking up Big Tech or slowing AI infrastructure would be like disarming during a new Cold War with China. But Gail Slater, the Assistant Attorney General leading the DOJ’s Antitrust Division, pushed back forcefully—not at a hearing, but on the War Room podcast.

In that interview, Slater recalled how AT&T tried to frame its 1980s breakup as a national security threat, arguing it would hurt America’s Cold War posture. But the DOJ did it anyway—and it led to an explosion of innovation in wireless technology.

“AT&T said, ‘You can’t do this. We are a national champion. We are critical to this country’s success. We will lose the Cold War if you break up AT&T,’ in so many words. … Even so, [the DOJ] moved forward … America didn’t lose the Cold War, and … from that breakup came a lot of competition and innovation.”

“I learned that in order to compete against China, we need to be in all these global races the American way. And what I mean by that is we’ll never beat China by becoming more like China. China has national champions, they have a controlled economy, et cetera, et cetera.

We win all these races and history has taught by our free market system, by letting the ball rip, by letting companies compete, by innovating one another. And the reason why antitrust matters to that picture, to the free market system is because we’re the cop on the beat at the end of the day. We step in when competition is not working and we ensure that markets remain competitive.”

Slater’s message was clear: regulation and competition enforcement are not threats to national strength—they’re prerequisites to it. So there’s no way that the richest corporations in commercial history should be subsidized by the American taxpayer.

Bottom Line: It’s Public Risk, Private Reward

Let’s be clear:

– They want the public to bear the cost of new electricity generation.
– They want the public to underwrite transmission lines.
– They want the public to streamline regulatory hurdles.
– And they plan to privatize the upside, lock down the infrastructure, keep their models secret and socialize the investment risk.

This isn’t a public-private partnership. It’s a one-way extraction scheme. America needs a serious conversation about energy—but it shouldn’t begin with asking taxpayers to bail out the richest companies in commercial history.

Deduplication and Discovery: The Smoking Gun in the Machine

WINSTON

“Wipe up all those little pieces of brains and skull”

From Pulp Fiction, screenplay by Quentin Tarantino and Roger Avary

Deduplication—the process of removing identical or near-identical content from AI training data—is a critical yet often overlooked indicator that AI platforms actively monitor and curate their training sets. This is the kind of process that one would expect given the kind of “scrape, ready, aim” business practices that seems precisely the approach of AI platforms that have ready access to large amounts of fairly high quality data from users of other products placed into commerce by business affiliates or confederates of the AI platforms.

For example, Google Gemini could have access to gmail, YouTube, at least “publicly available” Google Docs, Google Translate, or Google for Education, and then of course one of the great scams of all time, Google Books. Microsoft uses Bing searches, MSN browsing, the consumer Copilot experience, and ad interactions. Amazon uses Alexa prompts, Facebook uses “public” posts and so on.

This kind of hoovering up of indiscriminate amounts of “data” in the form of your baby pictures posted on Facebook and your user generated content on YouTube is bound to produce duplicates. After all, how may users have posted their favorite Billie Eilish or Taylor Swift music video. AI doesn’t need 10000 versions of “Shake it Off” they probably just need the official video. Enter deduplication–which by definition means the platform knows what it has scraped and also knows what it wants to get rid of.

“Get rid of” is a relative concept. In many systems—particularly in storage environments like backup servers or object stores—deduplication means keeping only one physical copy of a file. Any other instances of that data don’t get stored again; instead, they’re represented by pointers to the original copy. This approach, known as inline deduplication, happens in real time and minimizes storage waste without actually deleting anything of functional value. It requires knowing what you have, knowing you have more than one version of the same thing, and being able to tell the system where to look to find the “original” copy without disturbing the process and burning compute inefficiently.

In other cases, such as post-process deduplication, the system stores data initially, then later scans for and eliminates redundancies. Again, the AI platform knows there are two or more versions of the same thing, say the book Being and Nothingness, knows where to find the copies and has been trained to keep only one version. Even here, the duplicates may not be permanently erased—they might be archived, versioned, or logged for auditing, compliance, or reconstruction purposes.

In AI training contexts, deduplication usually means removing redundant examples from the training set to avoid copyright risk. The duplicate content may be discarded from the training pipeline but often isn’t destroyed. Instead, AI companies may retain it in a separate filtered corpus or keep hashed fingerprints to ensure future models don’t retrain on the same material unknowingly.

So they know what they have, and likely know where it came from. They just don’t want to tell any plaintiffs.

Ultimately, deduplication is less about destruction and more about optimization. It’s a way to reduce noise, save resources, and improve performance—while still allowing systems to track, reference, or even rehydrate the original data if needed.

Its existence directly undermines claims that companies are unaware of which copyrighted works were ingested. Indeed, it only makes sense that one of the hidden consequences of the indiscriminate scraping that underpins large-scale AI training is the proliferation of duplicated data. Web crawlers ingest everything they can access—news articles republished across syndicates, forum posts echoed in aggregation sites, Wikipedia mirrors, boilerplate license terms, spammy SEO farms repeating the same language over and over. Without any filtering, this avalanche of redundant content floods the training pipeline.

This is where deduplication becomes not just useful, but essential. It’s the cleanup crew after a massive data land grab. The more messy and indiscriminate the scraping, the more aggressively the model must filter for quality, relevance, and uniqueness to avoid training inefficiencies or—worse—model behaviors that are skewed by repetition. If a model sees the same phrase or opinion thousands of times, it might assume it’s authoritative or universally accepted, even if it’s just a meme bouncing around low-quality content farms.

Deduplication is sort of the Winston Wolf of AI. And if the cleaner shows up, somebody had to order the cleanup. It is a direct response to the excesses of indiscriminate scraping. It’s both a technical fix and a quiet admission that the underlying data collection strategy is, by design, uncontrolled. But while the scraping may be uncontrolled to get copies of as much of your data has they can lay hands on, even by cleverly changing their terms of use boilerplate so they can do all this under the effluvia of legality, they send in the cleaner to take care of the crime scene.

So to summarize: To deduplicate, platforms must identify content-level matches (e.g., multiple copies of Being and Nothingness by Jean-Paul Sartre). This process requires tools that compare, fingerprint, or embed full documents—meaning the content is readable and classifiable–and, oh, yes, discoverable.

Platforms may choose the ‘cleanest’ copy to keep, showing knowledge and active decision-making about which version of a copyrighted work is retained. And–big finish–removing duplicates only makes sense if operators know which datasets they scraped and what those datasets contain.

Drilling down on a platform’s deduplication tools and practices may prove up knowledge and intent to a precise degree—contradicting arguments of plausible deniability in litigation. Johnny ate the cookies isn’t going to fly. There’s a market clearing level of record keeping necessary for deduping to work at all, so it’s likely that there are internal deduplication logs or tooling pipelines that are discoverable.

When AI platforms object to discovery about deduplication, plaintiffs can often overcome those objections by narrowing their focus. Rather than requesting broad details about how a model deduplicates its entire training set, plaintiffs should ask a simple, specific question: Were any of these known works—identified by title or author—deduplicated or excluded from training?

This approach avoids objections about overbreadth or burden. It reframes discovery as a factual inquiry, not a technical deep dive. If the platform claims the data was not retained, plaintiffs can ask for existing artifacts—like hash filters, logs, or manifests—or seek a sworn statement explaining the loss and when it occurred. That, in turn, opens the door to potential spoliation arguments.

If trade secrets are cited, plaintiffs can propose a protective order, limiting access to outside counsel or experts like we’ve done 100,000 times before in other cases. And if the defendant claims “duplicate” is too vague, plaintiffs can define it functionally—as content that’s identical or substantially similar, by hash, tokens, or vectors.

Most importantly, deduplication is relevant. If a platform identified a plaintiff’s work and trained on it anyway, that speaks to volitional use, copying, and lack of care—key issues in copyright and fair use analysis. And if they lied about it, particularly to the court—Helloooooo Harper & Row. Discovery requests that are focused, tailored, and anchored in specific works stand a far better chance of surviving objections and yielding meaningful evidence which hopefully will be useful and lead to other positive results.

Uncle Sugar, the Lord of War: Drones, Data, and Don’t Be Evil

“You know who’s going to inherit the Earth? Arms dealers. Because everyone else is too busy killing each other.”

The Lord of War, Screenplay by Andrew Niccol

Aren’t you glad that we allowed YouTube to jack us around, let Google distribute pirate tracks and sell advertising to pirate sites? Oh, and don’t forget allowing Google to scan all the world’s books–good thing they’re not using any of that to train AI. All thanks to Google’s former CEO Eric Schmidt, aka Uncle Sugar.

This week, Ukraine’s Office of the President announced a strategic partnership with Swift Beat, an AI drone technology company reportedly linked to Eric Schmidt who is showing up everywhere like a latter day Zelig. Yes, that’s right–your Uncle Sugar is back. The Ukraine memorandum of understanding adds yet another layer to the quiet convergence of Silicon Valley money and 21st century warfare that is looking to be Uncle Sugar’s sweet spot. Given that Ukraine depends on the United States to fund roughly half of its defense budget, it’s a fairly safe assumption that somehow, some way, Uncle Sugar’s Washington buddies are helping to fund this deal.

The President of Ukraine’s announcement says that “[Swift Beat] will produce interceptor drones for the Armed Forces of Ukraine to destroy Russian UAVs and missiles, quadcopters for reconnaissance, surveillance, fire adjustment, and logistics, as well as medium-class strike drones for engaging enemy targets.” All based on US intel. So if Swift Beat uses US money received by Ukraine to manufacture this kit, you don’t suppose that Uncle Sugar might be planning on selling it to the good old US of A at some point in the future? Particularly given that the Russia-Ukraine war is frequently cited as a proving ground for the AI driven battle space?

Swift Beat has been portrayed as a nimble startup positioned to bring real-time battlefield intelligence and autonomous drone operations to Ukraine’s army. But as Defence-UA reported, the company’s website is opaque, its corporate structure elusive, and its origins murky. Despite the gravity of the deal—delivering critical defense technology to a country in a kinetic war—Swift Beat appears to lack a documented track record, a history of defense contracting, or even a clear business address. Reporting suggests that Swift Beat is owned by Volya Robotics OÜ, registered in Tallinn, Estonia, with Eric Schmidt as the sole beneficiary. Yeah, that’s the kind of rock solid pedigree I want from someone manufacturing a weapon system to defend my capitol.

Defence-UA raises further questions: why did Ukraine partner with a new firm (apparently founded in 2023) whose founders are tightly linked to U.S. defense tech circles, but whose public presence is nearly nonexistent? What role, if any, did Eric Schmidt’s extensive political and financial connections play in sealing the agreement? Is this a case of wartime innovation at speed—or something more…shall we say…complicated?

The entire arrangement feels eerily familiar. Nicholas Cage’s character in *Lord of War* wasn’t just trafficking weapons—he was selling access, power, and plausible deniability. Substitute advanced AI for Kalashnikovs and you get a contemporary upgrade to the AI bubble: an ecosystem where elite technologists and financiers claim to be “helping,” while building opaque commercial networks through jurisdictions with far less oversight that your uncle would have back home in the US. Cage’s arms dealer character had swagger, but also cover. You know, babes dig the drone. Not that Uncle Sugar would know anything about that angle. Schmidt’s Swift Beat seems to be playing a similar game to Yuri Orlov—with more money, but no less ambiguity.

And this isn’t Schmidt’s first dance in this space. As readers will recall, his growing entanglement in defense procurement, battlefield innovation, and AI-powered surveillance raises not just ethical questions—but geopolitical ones. The revolving door between Big Tech and government has never spun faster, and now it’s air-dropping influence into actual war zones.

Dr. Sarah Myers West of the AI Now Institute warns that figures like Eric Schmidt—who bridge Big Tech and national security—are crafting frameworks that sideline accountability in favor of accelerated deployment. That critique lands squarely in the case of Swift Beat, whose shadowy profile and deep ties to Silicon Valley make it a case study in how defense contracts and contractors can be opaque and deeply unaccountable. And Swift Beat is definitely a company that Dr. West calls “Eric Schmidt adjacent.”

While no public allegations have been made, the unusual structure of the Swift Beat–Ukraine agreement—paired with the company’s lack of operational history and the involvement of high-profile U.S. individuals—may raise important questions under the Foreign Corrupt Practices Act (FCPA). The FCPA prohibits U.S. entities from offering anything of value to foreign officials to secure business advantages, directly or indirectly. When so-far unaudited wartime procurement contracts are awarded through opaque processes and international actors operate through newly formed entities…dare I say “cutouts”–the risk of FCPA violations needs to be explored. In other words, if Google were to get in to the military hardware business like Meta, there would be an employee revolt at the Googleplex. But if they do it through a trusted source, even one over yonder way across the river, well…what’s the evil in helping an old friend? The whole thing sounds pretty spooky.

As Ukraine deepens its relationships with U.S. technology suppliers, and as prominent U.S. investors and executives like Uncle Sugar increase their involvement with all of the above, it may be appropriate for U.S. oversight bodies to take a closer look—not as a condemnation, but in service of transparency, compliance, and public trust. You know, don’t be evil.

The Duty Comes From the Data: Rethinking Platform Liability in the Age of Algorithmic Harm

For too long, dominant tech platforms have hidden behind Section 230 of the Communications Decency Act, claiming immunity for any harm caused by third-party content they host or promote. But as platforms like TikTok, YouTube, and Google have long ago moved beyond passive hosting into highly personalized, behavior-shaping recommendation systems, the legal landscape is shifting in the personal injury context. A new theory of liability is emerging—one grounded not in speech, but in conduct. And it begins with a simple premise: the duty comes from the data.

Surveillance-Based Personalization Creates Foreseeable Risk

Modern platforms know more about their users than most doctors, priests, or therapists. Through relentless behavioral surveillance, they collect real-time information about users’ moods, vulnerabilities, preferences, financial stress, and even mental health crises. This data is not inert or passive. It is used to drive engagement by pushing users toward content that exploits or heightens their current state.

If the user is a minor, a person in distress, or someone financially or emotionally unstable, the risk of harm is not abstract. It is foreseeable. When a platform knowingly recommends payday loan ads to someone drowning in debt, promotes eating disorder content to a teenager, or pushes a dangerous viral “challenge” to a 10-year-old child, it becomes an actor, not a conduit. It enters the “range of apprehension,” to borrow from Judge Cardozo’s reasoning in Palsgraf v. Long Island Railroad (one of my favorite law school cases). In tort law, foreseeability or knowledge creates duty. And here, the knowledge is detailed, intimate, and monetized. In fact it is so detailed we had to coin a new name for it: Surveillance capitalism.

Algorithmic Recommendations as Calls to Action

Defenders of platforms often argue that recommendations are just ranked lists—neutral suggestions, not expressive or actionable speech. But I think in the context of harm accruing to users for whatever reason, speech misses the mark. The speech argument collapses when the recommendation is designed to prompt behavior. Let’s be clear, advertisers don’t come to Google because speech, they come to Google because Google can deliver an audience. As Mr. Wanamaker said, “Half the money I spend on advertising is wasted; the trouble is I don’t know which half.” If he’d had Google, none of his money would have been wasted–that’s why Google is a trillion dollar market cap company.

When TikTok serves the same deadly challenge over and over to a child, or Google delivers a “pharmacy” ad to someone seeking pain relief that turns out to be a fentanyl-laced fake pill, the recommendation becomes a call to action. That transforms the platform’s role from curator to instigator. Arguably, that’s why Google paid a $500,000,000 fine and entered a non prosecution agreement to keep their executives out of jail. Again, nothing to do with speech.

Calls to action have long been treated differently in tort and First Amendment law. Calls to action aren’t passive; they are performative and directive. Especially when based on intimate surveillance data, these prompts and nudges are no longer mere expressions—they are behavioral engineering. When they cause harm, they should be judged accordingly. And to paraphrase the gambling bromide, the get paid their money and they takes their chances.

Eggshell Skull Meets Platform Targeting

In tort law, the eggshell skull rule (Smith v. Leech Brain & Co. Ltd. my second favorite law school tort case) holds that a defendant must take their victim as they find them. If a seemingly small nudge causes outsized harm because the victim is unusually vulnerable, the defendant is still liable. Platforms today know exactly who is vulnerable—because they built the profile. There’s nothing random about it. They can’t claim surprise when their behavioral nudges hit someone harder than expected.

When a child dies from a challenge they were algorithmically fed, or a financially desperate person is drawn into predatory lending through targeted promotion, or a mentally fragile person is pushed toward self-harm content, the platform can’t pretend it’s just a pipeline. It is a participant in the causal chain. And under the eggshell skull doctrine, it owns the consequences.

Beyond 230: Duty, Not Censorship

This theory of liability does not require rewriting Section 230 or reclassifying platforms as publishers although I’m not opposed to that review. It’s a legal construct that may have been relevant in 1996 but is no longer fit for purpose. Duty as data bypasses the speech debate entirely. What it says is simple: once you use personal data to push a behavioral outcome, you have a duty to consider the harm that may result and the law will hold you accountable for your action. That duty flows from knowledge, very precise knowledge that is acquired with great effort and cost for a singular purpose–to get rich. The platform designed the targeting, delivered the prompt, and did so based on a data profile it built and exploited. It has left the realm of neutral hosting and entered the realm of actionable conduct.

Courts are beginning to catch up. The Third Circuit’s 2024 decision in Anderson v. TikTok reversed the district court and refused to grant 230 immunity where the platform’s recommendation engine was seen as its own speech. But I think the tort logic may be even more powerful than a 230 analysis based on speech: where platforms collect and act on intimate user data to influence behavior, they incur a duty of care. And when that duty is breached, they should be held liable.

The duty comes from the data. And in a world where your data is their new oil, that duty is long overdue.

David Sacks Is Learning That the States Still Matter

For a moment, it looked like the tech world’s powerbrokers had pulled it off. Buried deep in a Republican infrastructure and tax package was a sleeper provision — the so-called AI moratorium — that would have blocked states from passing their own AI laws for up to a decade. It was an audacious move: centralize control over one of the most consequential technologies in history, bypass 50 state legislatures, and hand the reins to a small circle of federal agencies and especially to tech industry insiders.

But then it collapsed.

The Senate voted 99–1 to strike the moratorium. Governors rebelled. Attorneys general sounded the alarm. Artists, parents, workers, and privacy advocates from across the political spectrum said “no.” Even hardline conservatives like Ted Cruz eventually reversed course when it came down to the final vote. The message to Big Tech or the famous “Little Tech” was clear: the states still matter — and America’s tech elite ignore that at their peril.  (“Little Tech” is the latest rhetorical deflection promoted by Big Tech aka propaganda.)

The old Google crowd pushed the moratorium–their fingerprints were obvious. Having gotten fabulously rich off of their two favorites: The DMCA farce and the Section 230 shakedown. But there’s increasing speculation that White House AI Czar and Silicon Valley Viceroy David Sacks, PayPal alum and vocal MAGA-world player, was calling the ball. If true, that makes this defeat even more revealing.

Sacks represents something of a new breed of power-hungry tech-right influencer — part of the emerging “Red Tech” movement that claims to reject woke capitalism and coastal elitism but still wants experts to shape national policy from Silicon Valley, a chapter straight out of Philip Dru: Administrator. Sacks is tied to figures like Peter Thiel, Elon Musk, and a growing network of Trump-aligned venture capitalists. But even that alignment couldn’t save the moratorium.

Why? Because the core problem wasn’t left vs. right. It was top vs. bottom.

In 1964, Ronald Reagan’s classic speech called A Time for Choosing warned about “a little intellectual elite in a far-distant capitol” deciding what’s best for everyone else. That warning still rings true — except now the “capitol” might just be a server farm in Menlo Park or a podcast studio in LA.

The AI moratorium was an attempt to govern by preemption and fiat, not by consent. And the backlash wasn’t partisan. It came from red states and blue ones alike — places where elected leaders still think they have the right to protect their citizens from unregulated surveillance, deepfakes, data scraping, and economic disruption.

So yes, the defeat of the moratorium was a blow to Google’s strategy of soft-power dominance. But it was also a shot across the bow for David Sacks and the would-be masters of tech populism. You can’t have populism without the people.

If Sacks and his cohort want to play a long game in AI policy, they’ll have to do more than drop ideas into the policy laundry of think tank white papers and Beltway briefings. They’ll need to win public trust, respect state sovereignty, and remember that governing by sneaky safe harbors is no substitute for legitimacy.  

The moratorium failed because it presumed America could be governed like a tech startup — from the top, at speed, with no dissent. Turns out the country is still under the impression they have something to say about how they are governed, especially by Big Tech.

The Patchwork They Fear Is Accountability: Why Big AI Wants a Moratorium on State Laws

Why Big Tech’s Push for a Federal AI Moratorium Is Really About Avoiding State Investigations, Liability, and Transparency

As Congress debates the so-called “One Big Beautiful Bill Act,” one of its most explosive provisions has stayed largely below the radar: a 10-year or 5-year or any-year federal moratorium on state and local regulation of artificial intelligence. Supporters frame it as a common sense way to prevent a “patchwork” of conflicting state laws. But the real reason for the moratorium may be more self-serving—and more ominous.

The truth is, the patchwork they fear is not complexity. It’s accountability.

Liability Landmines Beneath the Surface

As has been well-documented by the New York Times and others, generative AI platforms have likely ingested and processed staggering volumes of data that implicate state-level consumer protections. This includes biometric data (like voiceprints and faces), personal communications, educational records, and sensitive metadata—all of which are protected under laws in states like Illinois (BIPA), California (CCPA/CPRA), and Texas.

If these platforms scraped and trained on such data without notice or consent, they are sitting on massive latent liability. Unlike federal laws, which are often narrow or toothless, many state statutes allow private lawsuits and statutory damages. Class action risk is not hypothetical—it is systemic.  It is crucial for policymakers to have a clear understanding of where we are today with respect to the collision between AI and consumer rights, including copyright.  The corrosion of consumer rights by the richest corporations in commercial history is not something that may happen in the future.  Massive violations have  already occurred, are occurring this minute, and will continue to occur into the future at an increasing rate.  

The Quiet Race to Avoid Discovery

State laws don’t just authorize penalties; they open the door to discovery. Once an investigation or civil case proceeds, AI platforms could be forced to disclose exactly what data they trained on, how it was retained, and whether any red flags were ignored.

This mirrors the arc of the social media addiction lawsuits now consolidated in multidistrict litigation. Platforms denied culpability for years—until internal documents showed what they knew and when. The same thing could happen here, but on a far larger scale.

Preemption as Shield and Sword

The proposed AI moratorium isn’t a regulatory timeout. It’s a firewall. By halting enforcement of state AI laws, the moratorium could prevent lawsuits, derail investigations, and shield past conduct from scrutiny.

Even worse, the Senate version conditions broadband infrastructure funding (BEAD) on states agreeing to the moratorium—an unconstitutional act of coercion that trades state police powers for federal dollars. The legal implications are staggering, especially under the anti-commandeering doctrine of Murphy v. NCAA and Printz v. United States.

This Isn’t About Clarity. It’s About Control.

Supporters of the moratorium, including senior federal officials and lobbying arms of Big Tech, claim that a single federal standard is needed to avoid chaos. But the evidence tells a different story.

States are acting precisely because Congress hasn’t. Illinois’ BIPA led to real enforcement. California’s privacy framework has teeth. Dozens of other states are pursuing legislation to respond to harms AI is already causing.

In this light, the moratorium is not a policy solution. It’s a preemptive strike.

Who Gets Hurt?
– Consumers, whose biometric data may have been ingested without consent
– Parents and students, whose educational data may now be part of generative models
– Artists, writers, and journalists, whose copyrighted work has been scraped and reused
– State AGs and legislatures, who lose the ability to investigate and enforce

Google Is an Example of Potential Exposure

Google’s former executive chairman Eric Schmidt has seemed very, very interested in writing the law for AI.  For example, Schmidt worked behind the scenes for the two years at least to establish US artificial intelligence policy under President Biden. Those efforts produced the “Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence“, the longest executive order in history. That EO was signed into effect by President Biden on October 30.  In his own words during an Axios interview with Mike Allen, the Biden AI EO was signed just in time for Mr. Schmidt to present that EO as what Mr. Schmidt calls “bait” to the UK government–which convened a global AI safety conference at Bletchley Park in the UK convened by His Excellency Rishi Sunak (the UK’s tech bro Prime Minister) that just happened to start on November 1, the day after President Biden signed the EO.  And now look at the disaster that the UK AI proposal would be.  

As Mr. Schmidt told Axios:

So far we are on a win, the taste of winning is there.  If you look at the UK event which I was part of, the UK government took the bait, took the ideas, decided to lead, they’re very good at this,  and they came out with very sensible guidelines.  Because the US and UK have worked really well together—there’s a group within the National Security Council here that is particularly good at this, and they got it right, and that produced this EO which is I think is the longest EO in history, that says all aspects of our government are to be organized around this.

Apparently, Mr. Schmidt hasn’t gotten tired of winning.  Of course, President Trump rescinded the Biden AI EO which may explain why we are now talking about a total moratorium on state enforcement which percolated at a very pro-Google shillery called R Street Institute, apparently by one Adam Thierer .  But why might Google be so interested in this idea?

Google may face exponentially acute liability under state laws if it turns out that biometric or behavioral data from platforms like YouTube Kids or Google for Education were ingested into AI training sets. 

These services, marketed to families and schools, collect sensitive information from minors—potentially implicating both federal protections like COPPA and more expansive state statutes. As far back as 2015, Senator Ben Nelson raised alarms about YouTube Kids, calling it “ridiculously porous” in terms of oversight and lack of safeguards. If any of that youth-targeted data has been harvested by generative AI tools, the resulting exposure is not just a regulatory lapse—it’s a landmine. 

The moratorium could be seen as an attempt to preempt the very investigations that might uncover how far that exposure goes.

What is to be Done?

Instead of smuggling this moratorium into a must-pass bill, Congress should strip it out and hold open hearings. If there’s merit to federal preemption, let it be debated on its own. But do not allow one of the most sweeping power grabs in modern tech policy to go unchallenged.

The public deserves better. Our children deserve better.  And the states have every right to defend their people. Because the patchwork they fear isn’t legal confusion.

It’s accountability.