AI training – Music Tech Solutions

What would Freud do?

It’s a strange question to ask about AI and copyright, but a useful one. When generative-AI fans insist that training models on copyrighted works is merely “learning like a human,” they rely on a metaphor that collapses under even minimal scrutiny. Psychoanalysis—whatever one thinks of Freud’s conclusions—begins from a premise that modern AI rhetoric quietly denies: the unconscious is not a database, and humans are not machines.

As Freud wrote in The Interpretation of Dreams, “Our memory has no guarantees at all, and yet we bow more often than is objectively justified to the compulsion to believe what it says.” No AI truthiness there.

Human learning does not involve storing perfect, retrievable copies of what we read, hear, or see. Memory is reconstructive, shaped by context, emotion, repression, and time. Dreams do not replay inputs; they transform them. What persists is meaning, not a file.

AI training works in the opposite direction—obviously. Training begins with high-fidelity copying at industrial scale. It converts human expressive works into durable statistical parameters designed for reuse, recall, and synthesis for eternity. Where the human mind forgets, distorts, and misremembers as a feature of cognition, models are engineered to remember as much as possible, as efficiently as possible, and to deploy those memories at superhuman speed. Nothing like humans.

Calling these two processes “the same kind of learning” is not analogy—it is misdirection. And that misdirection matters, because copyright law was built around the limits of human expression: scarcity, imperfection, and the fact that learning does not itself create substitute works at scale.

Dream-Work Is Not a Training Pipeline

Freud’s theory of dreams turns on a simple but powerful idea: the mind does not preserve experience intact. Instead, it subjects experience to dream-work—processes like condensation (many ideas collapsed into one image), displacement (emotional significance shifted from one object to another), and symbolization (one thing representing another, allowing humans to create meaning and understanding through symbols). The result is not a copy of reality but a distorted, overdetermined construction whose origins cannot be cleanly traced.

This matters because it shows what makes human learning human. We do not internalize works as stable assets. We metabolize them. Our memories are partial, fallible, and personal. Two people can read the same book and walk away with radically different understandings—and neither “contains” the book afterward in any meaningful sense. There is no Rashamon effect for an AI.

AI training is the inverse of dream-work. It depends on perfect copying at ingestion, retention of expressive regularities across vast parameter spaces, and repeatable reuse untethered from embodiment, biography, or forgetting. If Freud’s model describes learning as transformation through loss, AI training is transformation through compression without forgetting.

One produces meaning. The other produces capacity.

The Unconscious Is Not a Database

Psychoanalysis rejects the idea that memory functions like a filing cabinet. The unconscious is not a warehouse of intact records waiting to be retrieved. Memory is reconstructed each time it is recalled, reshaped by narrative, emotion, and social context. Forgetting is not a failure of the system; it is a defining feature.

AI systems are built on the opposite premise. Training assumes that more retention is better, that fidelity is a virtue, and that expressive regularities should remain available for reuse indefinitely. What human cognition resists by design—perfect recall at scale—machine learning seeks to maximize.

This distinction alone is fatal to the “AI learns like a human” claim. Human learning is inseparable from distortion, limitation, and individuality. AI training is inseparable from durability, scalability, and reuse.

In The Divided Self, R. D. Laing rejects the idea that the mind is a kind of internal machine storing stable representations of experience. What we encounter instead is a self that exists only precariously, defined by what Laing calls “ontological security” or its absence—the sense of being real, continuous, and alive in relation to others. Experience, for Laing, is not an object that can be detached, stored, or replayed; it is lived, relational, and vulnerable to distortion. He warns repeatedly against confusing outward coherence with inner unity, emphasizing that a person may present a fluent, organized surface while remaining profoundly divided within. That distinction matters here: performance is not understanding, and intelligible output is not evidence of an interior life that has “learned” in any human sense.

Why “Unlearning” Is Not Forgetting

Once you understand this distinction, the problem with AI “unlearning” becomes obvious.

In human cognition, there is no clean undo. Memories are never stored as discrete objects that can be removed without consequence. They reappear in altered forms, entangled with other experiences. Freud’s entire thesis rests on the impossibility of clean erasure.

AI systems face the opposite dilemma. They begin with discrete, often unlawful copies, but once those works are distributed across parameters, they cannot be surgically removed with certainty. At best, developers can stop future use, delete datasets, retrain models, or apply partial mitigation techniques (none of which they are willing to even attempt). What they cannot do is prove that the expressive contribution of a particular work has been fully excised.

This is why promises (especially contractual promises) to “reverse” improper ingestion are so often overstated. The system was never designed for forgetting. It was designed for reuse.

Why This Matters for Fair Use and Market Harm

The “AI = human learning” analogy does real damage in copyright analysis because it smuggles conclusions into fair-use factor one (transformative purpose and character) and obscures factor four (market harm).

Learning has always been tolerated under copyright law because learning does not flood markets. Humans do not emerge from reading a novel with the ability to generate thousands of competing substitutes at scale. Generative models do exactly that—and only because they are trained through industrial-scale copying.

Copyright law is calibrated to human limits. When those limits disappear, the analysis must change with them. Treating AI training as merely “learning” collapses the very distinction that makes large-scale substitution legally and economically significant.

The Pensieve Fallacy

There is a world in which minds function like databases. It is a fictional one.

In Harry Potter and the Goblet of Fire, wizards can extract memories, store them in vials, and replay them perfectly using a Pensieve. Memories in that universe are discrete, stable, lossless objects. They can be removed, shared, duplicated, and inspected without distortion. As Dumbledore explained to Harry, “I use the Pensieve. One simply siphons the excess thoughts from one’s mind, pours them into the basin, and examines them at one’s leisure. It becomes easier to spot patterns and links, you understand, when they are in this form.”

That is precisely how AI advocates want us to imagine learning works.

But the Pensieve is magic because it violates everything we know about human cognition. Real memory is not extractable. It cannot be replayed faithfully. It cannot be separated from the person who experienced it. Arguably, Freud’s work exists because memory is unstable, interpretive, and shaped by conflict and context.

AI training, by contrast, operates far closer to the Pensieve than to the human mind. It depends on perfect copies, durable internal representations, and the ability to replay and recombine expressive material at will.

The irony is unavoidable: the metaphor that claims to make AI training ordinary only works by invoking fantasy.

Humans Forget. Machines Remember.

Freud would not have been persuaded by the claim that machines “learn like humans.” He would have rejected it as a category error. Human cognition is defined by imperfection, distortion, and forgetting. AI training is defined by reproduction, scale, and recall.

To believe AI learns like a human, you have to believe humans have Pensieves. They don’t. That’s why Pensieves appear in Harry Potter—not neuroscience, copyright law, or reality.

Tech companies want your content. Not just to host it, but for their training pipeline—to train models, refine algorithms, and “improve services” in ways that just happen to lead to new commercial AI products. But as public awareness catches up, we’ve entered a new phase: deniable ingestion.

Welcome to the world of the Schrödinger’s training clause—a legal paradox where your data is simultaneously not being used to train AI and fully licensed in case they decide to do so.

The Door That’s Always Open

Let’s take the WeTransfer case. For a brief period this month (in July 2025), their Terms of Service included an unmistakable clause: users granted them rights to use uploaded content to “improve the performance of machine learning models.” That language was direct. It caused backlash. And it disappeared.

Many mea culpas later, their TOS has been scrubbed clean of AI references. I appreciate the sentiment, really I do. But—and there’s always a but–the core license hasn’t changed. It’s still:

– Perpetual

– Worldwide

– Royalty-free

– Transferable

– Sub-licensable

They’ve simply returned the problem clause to its quantum box. No machine learning references. But nothing that stops it either.

A Clause in Superposition

Platforms like WeTransfer—and others—have figured out the magic words: Don’t say you’re using data to train AI. Don’t say you’re not using it either. Instead, claim a sweeping license to do anything necessary to “develop or improve the service.”

That vague phrasing allows future pivots. It’s not a denial. It’s a delay. And to delay is to deny.

That’s what makes it Schrödinger’s training clause: Your content isn’t being used for AI. Unless it is. And you won’t know until someone leaks it, or a lawsuit makes discovery public.

The Scrape-Then-Scrub Scenario

Let’s reconstruct what could have happened–not saying it did happen, just could have–following the timeline in The Register:

1. Early July 2025: WeTransfer silently updates its Terms of Service to include AI training rights.

2. Users continue uploading sensitive or valuable content.

3. [Somebody’s] AI systems quickly ingest that data under the granted license.

4. Public backlash erupts mid-July.

5. WeTransfer removes the clause—but to my knowledge never revokes the license retroactively or promises to delete what was scraped. In fact, here’s their statement which includes this non-denial denial: “We don’t use machine learning or any form of AI to process content shared via WeTransfer.” OK, that’s nice but that wasn’t the question. And if their TOS was so clear, then why the amendment in the first place?

Here’s the Potential Legal Catch

Even if WeTransfer removed the clause later, any ingestion that occurred during the ‘AI clause window’ is arguably still valid under the terms then in force. As far as I know, they haven’t promised:

– To destroy any trained models

– To purge training data caches

– Or to prevent third-party partners from retaining data accessed lawfully at the time

What Would ‘Undoing’ Scraping Require?

– Audit logs to track what content was ingested and when

– Reversion of any models trained on user data

– Retroactive license revocation and sub-license termination

None of this has been offered that I have seen.

What ‘We Don’t Train on Your Data’ Actually Means

When companies say, “we don’t use your data to train AI,” ask:

– Do you have the technical means to prevent that?

– Is it contractually prohibited?

– Do you prohibit future sublicensing?

– Can I audit or opt out at the file level?

If the answer to those is “no,” then the denial is toothless.

How Creators Can Fight Back

1. Use platforms that require active opt-in for AI training.

2. Encrypt files before uploading.

3. Include counter-language in contracts or submission terms:

“No content provided may be used, directly or indirectly, to train or fine-tune machine learning or artificial intelligence systems, unless separately and explicitly licensed for that purpose in writing” or something along those lines.

4. Call it out. If a platform uses Schrödinger’s language, name it. The only thing tech companies fear more than litigation is transparency.

What is to Be Done?

The most dangerous clauses aren’t the ones that scream “AI training.” They’re the ones that whisper, “We’re just improving the service.”

If you’re a creative, legal advisor, or rights advocate, remember: the future isn’t being stolen with force. It’s being licensed away in advance, one unchecked checkbox at a time.

And if a platform’s only defense is “we’re not doing that right now”—that’s not a commitment. That’s a pause.

That’s Schrödinger’s training clause.

Music Tech Solutions

Chris Castle's Solutions for Music-Tech Entrepreneurs

Tag: AI training

What Would Freud Do? The Unconscious Is Not a Database — and Humans Are Not Machines

What would Freud do?

Dream-Work Is Not a Training Pipeline

The Unconscious Is Not a Database

Why “Unlearning” Is Not Forgetting

Why This Matters for Fair Use and Market Harm

The Pensieve Fallacy

Humans Forget. Machines Remember.

Schrödinger’s Training Clause: How Platforms Like WeTransfer Say They’re Not Using Your Files for AI—Until They Are

The Door That’s Always Open

A Clause in Superposition

The Scrape-Then-Scrub Scenario

Here’s the Potential Legal Catch

What Would ‘Undoing’ Scraping Require?

What ‘We Don’t Train on Your Data’ Actually Means

How Creators Can Fight Back

What is to Be Done?

Menu

Music Tech Solutions

Chris Castle's Solutions for Music-Tech Entrepreneurs

What would Freud do?

Dream-Work Is Not a Training Pipeline

The Unconscious Is Not a Database

Why “Unlearning” Is Not Forgetting

Why This Matters for Fair Use and Market Harm

The Pensieve Fallacy

Humans Forget. Machines Remember.

Share this:

The Door That’s Always Open

A Clause in Superposition

The Scrape-Then-Scrub Scenario

Here’s the Potential Legal Catch

What Would ‘Undoing’ Scraping Require?

What ‘We Don’t Train on Your Data’ Actually Means

How Creators Can Fight Back

What is to Be Done?

Share this:

Menu