Schrödinger’s Training Clause: How Platforms Like WeTransfer Say They’re Not Using Your Files for AI—Until They Are

Tech companies want your content. Not just to host it, but for their training pipeline—to train models, refine algorithms, and “improve services” in ways that just happen to lead to new commercial AI products. But as public awareness catches up, we’ve entered a new phase: deniable ingestion.

Welcome to the world of the Schrödinger’s training clause—a legal paradox where your data is simultaneously not being used to train AI and fully licensed in case they decide to do so.

The Door That’s Always Open

Let’s take the WeTransfer case. For a brief period in July 2025, their Terms of Service included an unmistakable clause: users granted them rights to use uploaded content to “improve the performance of machine learning models.” That language was direct. It caused backlash. And it disappeared.

Many mea culpas later, their TOS has been scrubbed clean of AI references. I appreciate the sentiment, really I do. But, and there’s always a but, the core license hasn’t changed. It’s still:

– Perpetual

– Worldwide

– Royalty-free

– Transferable

– Sub-licensable

They’ve simply returned the problem clause to its quantum box. No machine learning references. But nothing that stops it either.

A Clause in Superposition

Platforms like WeTransfer have figured out the magic words: Don’t say you’re using data to train AI. Don’t say you’re not using it either. Instead, claim a sweeping license to do anything necessary to “develop or improve the service.”

That vague phrasing allows future pivots. It’s not a denial. It’s a delay. And to delay is to deny.

That’s what makes it Schrödinger’s training clause: Your content isn’t being used for AI. Unless it is. And you won’t know until someone leaks it, or a lawsuit makes discovery public.

The Scrape-Then-Scrub Scenario

Let’s reconstruct what could have happened (not saying it did happen, just could have), following the timeline reported by The Register:

1. Early July 2025: WeTransfer silently updates its Terms of Service to include AI training rights.

2. Users continue uploading sensitive or valuable content.

3. [Somebody’s] AI systems quickly ingest that data under the granted license.

4. Public backlash erupts mid-July.

5. WeTransfer removes the clause, but, to my knowledge, never revokes the license retroactively or promises to delete anything already ingested. In fact, their statement includes this non-denial denial: “We don’t use machine learning or any form of AI to process content shared via WeTransfer.” That’s nice, but it wasn’t the question. And if their TOS was already so clear, why the amendment in the first place?

Here’s the Potential Legal Catch

Even if WeTransfer removed the clause later, any ingestion that occurred during the ‘AI clause window’ is arguably still valid under the terms then in force. As far as I know, they haven’t promised:

– To destroy any trained models

– To purge training data caches

– Or to prevent third-party partners from retaining data accessed lawfully at the time

What Would ‘Undoing’ Scraping Require?

– Audit logs to track what content was ingested and when

– Reversion of any models trained on user data

– Retroactive license revocation and sub-license termination

None of this has been offered, as far as I have seen.

What ‘We Don’t Train on Your Data’ Actually Means

When companies say, “we don’t use your data to train AI,” ask:

– Do you have the technical means to prevent that?

– Is it contractually prohibited?

– Do you prohibit future sublicensing?

– Can I audit or opt out at the file level?

If the answer to any of those is “no,” then the denial is toothless.

How Creators Can Fight Back

1. Use platforms that require active opt-in for AI training.

2. Encrypt files before uploading.

3. Include counter-language in contracts or submission terms:

   “No content provided may be used, directly or indirectly, to train or fine-tune machine learning or artificial intelligence systems, unless separately and explicitly licensed for that purpose in writing” or something along those lines.

4. Call it out. If a platform uses Schrödinger’s language, name it. The only thing tech companies fear more than litigation is transparency.
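Step 2 above, encrypting before uploading, deserves a concrete illustration: if a file is encrypted on your machine before it ever reaches the platform, no license clause can make the ciphertext useful as training data. Here is a minimal sketch using the third-party `cryptography` library’s Fernet recipe (symmetric, authenticated encryption); the filenames are hypothetical, and in practice you would store the key securely and share it with the recipient out of band, never through the same platform.

```python
from cryptography.fernet import Fernet

# Generate a fresh symmetric key. Keep this secret; whoever holds it
# can decrypt the file. Share it with your recipient out of band.
key = Fernet.generate_key()
f = Fernet(key)

# Stand-in for reading a real file, e.g. open("design.psd", "rb").read()
plaintext = b"confidential design files"

# Encrypt locally, then upload only the ciphertext to the platform.
ciphertext = f.encrypt(plaintext)

# The recipient, holding the same key, recovers the original bytes.
recovered = f.decrypt(ciphertext)
assert recovered == plaintext
```

The point is not this particular library; it is that the platform only ever sees opaque bytes, so even a perpetual, worldwide, sub-licensable license to the upload grants nothing worth training on.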

What is to Be Done?

The most dangerous clauses aren’t the ones that scream “AI training.” They’re the ones that whisper, “We’re just improving the service.”

If you’re a creative, legal advisor, or rights advocate, remember: the future isn’t being stolen with force. It’s being licensed away in advance, one unchecked checkbox at a time.

And if a platform’s only defense is “we’re not doing that right now”—that’s not a commitment. That’s a pause.

That’s Schrödinger’s training clause.