The Return of the Bubble Rider: Masa, OpenAI, and the New AI Supercycle

“Hubris gives birth to the tyrant; hubris, when glutted on vain visions, plunges into an abyss of doom.”
Agamemnon by Aeschylus

Masayoshi Son has always believed he could see farther into the technological future than everyone else. Sometimes he does. Sometimes he rides straight off a cliff. But the pattern is unmistakable: he is the market’s most fearless—and sometimes most reckless—Bubble Rider.

In the late 1990s, Masa became the patron saint of the early internet. SoftBank took stakes in dozens of dot-coms, anchored by its wildly successful bet on Yahoo! (yes, Yahoo! Ask your mom.). For a moment, Masa was one of the world’s richest men on paper. Then the dot-bomb hit. Overnight, SoftBank lost nearly everything. Masa has said he personally watched $70 billion evaporate, the largest individual wealth wipeout ever recorded at the time. But his instinct wasn’t to retreat. It was to reload.

That same pattern returned with SoftBank’s Vision Fund. Masa raised unprecedented capital from sovereign wealth pools and bet big on the “AI + data” megatrend—then plowed it into companies like WeWork, Zume, Brandless, and other combustion-ready unicorns. When those valuations collapsed, SoftBank again absorbed catastrophic losses. And yet the thesis survived, just waiting for its next bubble.

We’re now in what I’ve called the AI Bubble—the largest capital-formation mania since the original dot-com wave, powered by foundation AI labs, GPU scarcity, and a global arms race to capture platform rents. And here comes Masa again, right on schedule.

SoftBank has now sold its entire Nvidia stake—the hottest AI infrastructure trade of the decade—freeing up nearly $6 billion. That money is being redirected straight into OpenAI’s secondary stock offering at an eyewatering marked-to-fantasy $500 billion valuation. In the same week, SoftBank confirmed it is preparing even larger AI investments. This is Bubble Riding at its purest: exiting one vertical where returns may be peaking, and piling into the center of speculative gravity before the froth crests.

What I suspect Masa sees is simple: if generative AI succeeds, the model owners will become the new global monopolies alongside the old global monopolies like Google and Microsoft.  You know, democratizing the Internet. If it fails, the whole electric grid and water supply may crash along with it. He’s choosing a side—and choosing it at absolute top-of-market pricing.

The other difference between the dot-com bubble and the AI bubble is legal, not just financial. Pets.com and its peers (who I refer to generically as “Socks.com” the company that uses the Internet to find socks under the bed) were silly, but they weren’t being hauled into court en masse for building their core product on other people’s property. 

Today’s AI darlings are major companies being run like pirate markets. Meta, Anthropic, OpenAI and others are already facing a wall of litigation from authors, news organizations, visual artists, coders, and music rightsholders who all say the same thing: your flagship models exist only because you ingested our work without permission, at industrial scale, and you’re still doing it. 

That means this bubble isn’t just about overpaying for growth; it’s about overpaying for businesses whose main asset—trained model weights—may be encumbered by unpriced copyright and privacy claims. The dot-com era mispriced eyeballs. The AI era may be mispricing liability.  And that’s serious stuff.

There’s another distortion the dot-com era never had: the degree to which the AI bubble is being propped up by taxpayers. Socks.com didn’t need a new substation, a federal loan guarantee, or a 765 kV transmission corridor to find your socks. Today’s Socks.ai does need all that to use AI to find socks under the bed.  All the AI giants do. Their business models quietly assume public willingness to underwrite an insanely expensive buildout of power plants, high-voltage lines, and water-hungry cooling infrastructure—costs socialized onto ratepayers and communities so that a handful of platforms can chase trillion-dollar valuations. The dot-com bubble misallocated capital; the AI bubble is trying to reroute the grid.

In that sense, this isn’t just financial speculation on GPUs and model weights—it’s a stealth industrial policy, drafted in Silicon Valley and cashed at the public’s expense.

The problem, as always, is timing. Bubbles create enormous winners and equally enormous craters. Masa’s career is proof. But this time, the stakes are higher. The AI Bubble isn’t just a capital cycle; it’s a geopolitical and industrial reordering, pulling in cloud platforms, national security, energy systems, media industries, and governments with a bad case of FOMO scrambling to regulate a technology they barely understand.

And now, just as Masa reloads for his next moonshot, the market itself is starting to wobble. The past week’s selloff may not be random—it feels like a classic early-warning sign of a bubble straining under its own weight. In every speculative cycle, the leaders crack first: the most crowded trades, the highest-multiple stories, the narratives everyone already believes. This time, those leaders are the AI complex—GPU giants, hyperscale clouds, and anything with “model” or “inference” in the deck. When those names roll over together, it tells you something deeper than normal volatility is at work.

What the downturn may expose is the growing narrative about an “earnings gap.” Investors have paid extraordinary prices for companies whose long-term margins remain theoretical, whose energy demands are exploding, and whose regulatory and copyright liabilities are still unpriced. The AI story is enormous, but the business model remains unresolved. A selloff forces the market to remember the thing it forgets at every bubble peak: cash flow eventually matters.

Back in the late-cycle of the dot com era, I had lunch in December of 1999 with a friend who had worked 20 years in a division of a huge conglomerate, bought his division in a leveraged buyout, ran that company for 10 years then took that public, sold it to another company that then went public.  He asked me to explain how these dot coms were able to go public, a process he equated with hard work and serious people.  I said, well we like them to have four quarters of top line revenue.  He stared at me.  I said, I know it’s stupid, but that’s what they say.  He said, it’s all going to crash.  And boy did it ever.

And ironically, nothing captures this late-cycle psychology better than Masa’s own behavior. SoftBank selling Nvidia—the proven cash-printing side of AI—to buy OpenAI at a $500 billion valuation isn’t contrarian genius; it’s the definition of a crowded climax trade, the moment when everyone is leaning the same direction. When that move coincides with the tape turning red, the message is unmistakable: the AI supercycle may not be over, but the easy phase is.

Whether this is the start of a genuine deflation or just the first hard jolt before the final manic leg, the pattern is clear. The AI Bubble is no longer hypothetical—it is showing up on the trading screens, in the sentiment, and in the rotation of capital itself.

Masa may still believe the crest of the wave lies ahead. But the market has begun to ask the question every bubble eventually faces: What if this is the top of the ride?

Masa is betting that the crest of the curve lies ahead—that we’re in Act Two of an AI supercycle. Maybe he’s right. Or maybe he’s gearing up for his third historic wipeout.

Either way, he’s back in the saddle.

The Bubble Rider rides again.

Judge Failla’s Opinion in Dow Jones v. Perplexity: RAG as Mechanism of Infringement

Judge Failla’s opinion in Dow Jones v. Perplexity doesn’t just keep the case alive—it frames RAG itself as the act of copying, and raises the specter of inducement liability under Grokster.

Although Judge Katherine Polk Failla’s August 21, 2025 opinion in Dow Jones & Co. v. Perplexity is technically a procedural ruling denying Perplexity’s motions to dismiss or transfer, Judge Failla offers an unusually candid window into how the Court may view the substance of the case. In particular, her treatment of retrieval-augmented generation (RAG) is striking: rather than describing it as Perplexity’s background plumbing, she identified it as the mechanism by which copyright infringement and trademark misattribution allegedly occur.  

Remember, Perplexity’s CEO described the company to Forbes as “It’s almost like Wikipedia and ChatGPT had a kid.” I’m still looking for that attribution under the Wikipedia Creative Commons license.

As readers may recall, I’ve been very interested in RAG as an open door for infringement actions, so naturally this discussion caught my eye.  So we’re all on the same page, retrieval-augmented generation (RAG) uses a “vector database” to expand an AI system’s knowledge beyond what is locked in its training data, including recent news sources for example.

When you prompt a RAG-enabled model, it first searches the database for context, then weaves that information into its generated answer. This architecture makes outputs more accurate, current, and domain-specific, but also raises questions about copyright, data governance, and intentional use of third-party content mostly because RAG may rely on information outside of its training data.  Like if I queried “single bullet theory” the AI might have a copy of the Warren Commission report, but would need to go out on the web for the latest declassified JFK materials or news reports about those materials to give a complete answer.

You can also think of Google Search or Bing as a kind of RAG index—and you can see how that would give search engines a big leg up in the AI race, even though none of their various safe harbors, Creative Commons licenses, Google Books or direct licenses were for this RAG purpose.  So there’s that.

Judge Failla’s RAG Analysis

As Judge Failla explained, Perplexity’s system “relies on a retrieval-augmented generation (‘RAG’) database, comprised of ‘content from original sources,’ to provide answers to users,” with the indices “comprised of content that [Perplexity] want[s] to use as source material from which to generate the ‘answers’ to user prompts and questions.’” The model then “repackages the original, indexed content in written responses … to users,” with the RAG technology “tell[ing] the LLM exactly which original content to turn into its ‘answer.’” Or as another judge once said, “One who distributes a device with the object of promoting its use to infringe copyright, as shown by clear expression or other affirmative steps taken to foster infringement, going beyond mere distribution with knowledge of third-party action, is liable for the resulting acts of infringement by third parties using the device, regardless of the device’s lawful uses.” Or something like that.

On that basis, Judge Failla recognized Plaintiffs’ claim that infringement occurred at multiple points in the process: “first, by ‘copying a massive amount of Plaintiffs’ copyrighted works as inputs into its RAG index’; second, by providing consumers with outputs that ‘contain full or partial verbatim reproductions of Plaintiffs’ copyrighted articles’; and third, by ‘generat[ing] made-up text (hallucinations) … attribut[ed] … to Plaintiffs’ publications using Plaintiffs’ trademarks.’” In her jurisdictional analysis, Judge Failla stressed that these “inputs are significant because they cause Defendant’s website to produce answers that are reproductions or detailed summaries of Plaintiffs’ copyrighted works,” thus tying the alleged misconduct directly to Perplexity’s business activities in New York, although she was not making a substantive ruling in this instance.

What is RAG and Why It Matters

Retrieval-augmented generation is a method that pairs two steps: (1) retrieval of content from external databases or the open web, and (2) generation of a synthetic answer using a large language model. Instead of relying solely on the model’s pre-training, RAG systems point the model toward selected source material such as news articles, scientific papers, legal databases and instruct it to weave that content into an answer. 
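
For readers who want to see the two steps in miniature, here is a toy sketch in Python. Everything in it is my own assumption for illustration: the three "articles" are made-up placeholders, the word-count similarity is a crude stand-in for a real vector database, and the names `INDEX`, `retrieve`, and `build_prompt` are hypothetical. It is not a description of Perplexity's system or anyone else's, and it stops at building the prompt rather than calling a model.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder "articles" standing in for an indexed source collection.
INDEX = {
    "article-1": "The city council approved the new transit budget on Tuesday.",
    "article-2": "Researchers reported a faster method for cooling data centers.",
    "article-3": "The transit agency will expand bus service next year.",
}

def vectorize(text):
    """Bag-of-words counts: a crude stand-in for learned embeddings."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(query, k=2):
    """Step 1 (retrieval): rank indexed documents by similarity to the query."""
    q = vectorize(query)
    ranked = sorted(
        INDEX.items(), key=lambda kv: cosine(q, vectorize(kv[1])), reverse=True
    )
    return ranked[:k]

def build_prompt(query, k=2):
    """Step 2 (generation input): paste the retrieved text into the model prompt."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieve(query, k))
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"

print(build_prompt("transit budget"))
```

Even in a toy like this, the source text is copied into the index and then copied again into the prompt, and the operator decides what goes into the index in the first place.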

From a user perspective, this can produce more accurate, up-to-date results. But from a legal perspective, the same pipeline can directly copy or closely paraphrase copyrighted material, often without attribution, and can even misattribute hallucinated text to legitimate sources. This dual role of RAG—retrieving copyrighted works as inputs and reproducing them as outputs—is exactly what made it central to Judge Failla’s opinion procedurally, but also may show where she is thinking substantively.

RAG in Frontier Labs

RAG is not a niche technique. It has become standard practice at nearly every frontier AI lab:

– OpenAI uses retrieval plug-ins and Bing integrations to ground ChatGPT answers.
– Anthropic deploys RAG pipelines in Claude for enterprise customers.
– Google DeepMind integrates RAG into Gemini and search-linked models.
– Meta builds retrieval into LLaMA applications and experimental assistants.
– Microsoft has made Copilot fundamentally a RAG product, pairing Bing with GPT.
– Cohere, Mistral, and other independents market RAG as a service layer for enterprises.

Why Dow Jones Matters Beyond Perplexity

Perplexity just happened to be the subject of the first reported opinion, as far as I know. The technical structure of its answer engine (indexing copyrighted content into a RAG system, then repackaging it for users) is not unique. It mirrors how the rest of the frontier labs are building their flagship products. What makes this case important is not that Perplexity is an outlier, but that it illustrates the legal vulnerability inherent in the RAG architecture itself.

Is RAG the Low-Hanging Fruit?

What makes this case so consequential is not just that Judge Failla recognized, for purposes of this ruling, that RAG is at least one mechanism of infringement, but that RAG cases may be easier to prove than disputes over model training inputs. Training claims often run into evidentiary hurdles: plaintiffs must show that their works were included in massive opaque training corpora, that those works influenced model parameters, and that the resulting outputs are “substantially similar.” That chain of proof can be complex and indirect.

By contrast, RAG systems operate in the open. They index specific copyrighted articles, feed them directly into a generation process, and sometimes output verbatim or near-verbatim passages. Plaintiffs can point to before-and-after evidence: the copyrighted article itself, the RAG index that ingested it, and the system’s generated output reproducing it. That may make copyright infringement far more straightforward to prove than in a pure training case.
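
To make the "before-and-after" point concrete, here is a minimal Python sketch of one way to find the longest word-for-word run shared by a source text and a generated answer. The function name `longest_shared_run` and both sample strings are hypothetical, and word overlap is only an illustration of how visible verbatim copying can be. It is not a legal test for substantial similarity and not anyone's actual forensic method.

```python
import re
from difflib import SequenceMatcher

def words(text):
    """Lowercase the text and split it into words, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

def longest_shared_run(source, output):
    """Return the longest contiguous run of words found in both texts."""
    a, b = words(source), words(output)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return " ".join(a[match.a : match.a + match.size])

# Made-up sample texts, for illustration only.
source = "The council approved the new transit budget on Tuesday after a long debate."
output = "According to reports, the council approved the new transit budget on Tuesday."

print(longest_shared_run(source, output))
```

Against a real article and a real answer, a long shared run is the kind of side-by-side exhibit that is hard to come by in a pure training case.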

For that reason, Perplexity just happened to be first, but it will not be the last. Nearly every frontier lab (OpenAI, Anthropic, Google, Meta, Microsoft) is relying on RAG as the architecture of choice to ground its models. If RAG is the legal weak point, this opinion could mark the opening salvo in a much broader wave of litigation aimed at AI platforms, with courts treating RAG not as a technical curiosity but as a direct, provable conduit for infringement.

And lurking in the background is a bigger question: is Grokster going to be Judge Failla’s roundhouse kick? That irony is delicious.  By highlighting how Perplexity (and the others) deliberately designed its system to ingest and repackage copyrighted works, the opinion sets the stage for a finding of intentionality that could make RAG the twenty-first-century version of inducement liability.

AI Frontier Labs and the Singularity as a Modern Prophetic Cult

It gets rid of your gambling debts 
It quits smoking 
It’s a friend, it’s a companion 
It’s the only product you will ever need
From Step Right Up, written by Tom Waits

The AI “frontier labs” — OpenAI, Anthropic, DeepMind, xAI, and their constellation of evangelists — often present themselves as the high priests of a coming digital transcendence. This is sometimes called “the singularity” which refers to a hypothetical future point when artificial intelligence surpasses human intelligence, triggering rapid, unpredictable technological growth. Often associated with self-improving AI, it implies a transformation of society, consciousness, and control, where human decision-making may be outpaced or rendered obsolete by machines operating beyond our comprehension. 

But viewed through the lens of social psychology, the behavior of the AI evangelists increasingly resembles that of cognitive dissonance cults, as famously documented in When Prophecy Fails, the important study of a UFO cult (a la Heaven’s Gate) by Dr. Leon Festinger and his team.  (See also The Great Disappointment.)

In that foundational social psychology study, a group of believers centered around a woman named “Marian Keech” predicted the world would end in a cataclysmic flood, only to be rescued by alien beings. But when the prophecy failed, they doubled down. Rather than abandoning their beliefs, the group rationalized the outcome (“We were spared because of our faith”) and became even more committed. They get this self-hypnotized look, kind of like this guy (and remember, this is what the Meta marketing people thought was the flagship spot for Meta’s entire superintelligence hustle):


This same psychosis permeates Singularity narratives and the AI doom/alignment discourse:
– The world is about to end — not by water, but by unaligned superintelligence.
– A chosen few (frontier labs) hold the secret knowledge to prevent this.
– The public must trust them to build, contain, and govern the very thing they fear.
– And if the predicted catastrophe doesn’t come, they’ll say it was their vigilance that saved us.

Like cultic prophecy, the Singularity promises transformation:
– Total liberation or annihilation (including liberation from annihilation by the Red Menace, i.e., the Chinese Communist Party).
– A timeline (“AGI by 2027”, “everything will change in 18 months”).
– An elite in-group with special knowledge and “Don’t be evil” moral responsibility.
– A strict hierarchy of belief and loyalty — criticism is heresy, delay is betrayal.

This serves multiple purposes:
1. Maintains funding and prestige by positioning the labs as indispensable moral actors.
2. Deflects criticism of copyright infringement, resource consumption, or labor abuse with existential urgency (because China, don’t you know).
3. Converts external threats (like regulation) into internal persecution, reinforcing group solidarity.

The rhetoric of “you don’t understand how serious this is” mirrors cult defenses exactly.

Here’s the rub: the timeline keeps slipping. Every six months, we’re told the leap to “godlike AI” is imminent. GPT‑4 was supposed to upend everything. That didn’t happen, so GPT‑5 will do it for real. Gemini flopped, but Claude 3 might still be the one.

When prophecy fails, they don’t admit error — they revise the story:
– “AI keeps accelerating.”
– “It’s a slow takeoff, not a fast one.”
– “We stopped the bad outcomes by acting early.”
– “The doom is still coming — just not yet.”

Leon Festinger’s theories seen in When Prophecy Fails, especially cognitive dissonance and social comparison, influence AI by shaping how systems model human behavior, resolve conflicting inputs, and simulate decision-making. His work guides developers of interactive agents, recommender systems, and behavioral algorithms that aim to mimic or respond to human inconsistencies, biases, and belief formation.   So this isn’t a casual connection.

As with Festinger’s study, the failure of predictions intensifies belief rather than weakening it. And the deeper the believer’s personal investment, the harder it is to turn back. For many AI cultists, this includes financial incentives, status, and identity.

Unlike spiritual cults, AI frontier labs have material outcomes tied to their prophecy:
– Federal land allocations, as we’ve seen with DOE site handovers.
– Regulatory exemptions, by presenting themselves as saviors.
– Massive capital investment, driven by the promise of world-changing returns.

In the case of AI, this is not just belief — it’s belief weaponized to secure public assets, shape global policy, and monopolize technological futures. And when the same people build the bomb, sell the bunker, and write the evacuation plan, it’s not spiritual salvation — it’s capture.

The pressure to sustain the AI prophecy—that artificial intelligence will revolutionize everything—is unprecedented because the financial stakes are enormous. Trillions of dollars in market valuation, venture capital, and government subsidies now hinge on belief in AI’s inevitable dominance. Unlike past tech booms, today’s AI narrative is not just speculative; it is embedded in infrastructure planning, defense strategy, and global trade. This creates systemic incentives to ignore risks, downplay limitations, and dismiss ethical concerns. To question the prophecy is to threaten entire business models and geopolitical agendas. As with any ideology backed by capital, maintaining belief becomes more important than truth.

The Singularity, as sold by the frontier labs, is not just a future hypothesis — it’s a living ideology. And like the apocalyptic cults before them, these institutions demand public faith, offer no accountability, and position themselves as both priesthood and god.

If we want a secular, democratic future for AI, we must stop treating these frontier labs as prophets — and start treating them as power centers subject to scrutiny, not salvation.

From Plutonium to Prompt Engineering: Big Tech’s Land Grab at America’s Nuclear Sites–and Who’s Paying for It?

In a twist of post–Cold War irony, the same federal sites that once forged the isotopes of nuclear deterrence are now poised to fuel the arms race of artificial intelligence under the leadership of Special Government Employee and Silicon Valley Viceroy David Sacks. Under a new Department of Energy (DOE) initiative, 16 legacy nuclear and lab sites, including Savannah River, Idaho National Lab, and Oak Ridge, Tennessee, are being opened to private companies to host massive AI data centers. That’s right–Tennessee, where David Sacks is riding roughshod over the ELVIS Act.

But as this techno-industrial alliance gathers steam, one question looms large: Who benefits — and how will the American public be compensated for leasing its nuclear commons to the world’s most powerful corporations? Spoiler alert: We won’t.

A New Model, But Not the Manhattan Project

This program is being billed in headlines as a “new Manhattan Project for AI.” But that comparison falls apart quickly. The original Manhattan Project was:
– Owned by the government
– Staffed by public scientists
– Built for collective defense

Today’s AI infrastructure effort is:
– Privately controlled
– Driven by monopolies and venture capital
– Structured to avoid transparency and public input
– Built on free leases of public land with private nuclear reactors

Call it the Manhattan Project in reverse — not national defense, but national defense capture.

The Art of the Deal: Who gets what?

What Big Tech Is Getting

– Access to federal land already zoned, secured, and wired
– Exemption from state and local permitting
– Bypass of grid congestion via nuclear-ready substations
– DOE’s help fast-tracking small modular reactors (SMRs)
– Potential sovereign AI training enclaves, shielded from export controls and oversight

And all of it is being made available to private companies called the “Frontier labs”: Microsoft, Oracle, Amazon, OpenAI, Anthropic, xAI — the very firms at the center of the AI race.

What the Taxpayer Gets (Maybe)

Despite this extraordinary access, almost nothing is disclosed about how the public is compensated. No known revenue-sharing models. No guaranteed public compute access. No equity. No royalties.

Land lease payments? Not disclosed. Probably none.
Local tax revenue? Minimal (federal lands exempt)
Infrastructure benefit sharing? Unclear or limited

It’s all being negotiated quietly, under vague promises of “national competitiveness.”

Why AI Labs Want DOE Sites

Frontier labs like OpenAI and Anthropic — and their cloud sponsors — need:
– Gigawatts of energy
– Secure compute environments
– Freedom from export rules and Freedom of Information Act requests
– Permitting shortcuts and national branding

The DOE sites offer all of that — plus built-in federal credibility. The same labs currently arguing in court that their training practices are “fair use” now claim they are defenders of democracy training AI on taxpayer-built land.

This Isn’t the Manhattan Project — It’s the Extraction Economy in a Lab Coat

The tech industry loves to invoke patriotism when it’s convenient — especially when demanding access to federal land, nuclear infrastructure, or diplomatic cover from the EU’s AI Act. But let’s be clear:

This isn’t the Manhattan Project. Or rather we should hope it isn’t because that one didn’t end well and still hasn’t.
It’s not public service.
It’s Big Tech lying about fair use, wrapped in an American flag — and for all we know, it might be the first time David Sacks ever saw one.

When companies like OpenAI and Microsoft claim they’re defending democracy while building proprietary systems on DOE nuclear land, we’re not just being gaslit — we’re being looted.

If the AI revolution is built on nationalizing risk and privatizing power, it’s time to ask whose country this still is — and who gets to turn off the lights.

AI’s Legal Defense Team Looks Familiar — Because It Is

If you feel like you’ve seen this movie before, you have.

Back in the 2003-ish runup to the 2005 MGM Studios, Inc. v. Grokster, Ltd. Supreme Court case, I met with the founder of one of the major p2p platforms in an effort to get him to go legal.  I reminded him that he knew there was all kinds of bad stuff that got uploaded to his platform.  However much he denied it, he was filtering it out and he was able to do that because he had the control over the content that he (and all his cohorts) denied he had.  

I reminded him that if this case ever went bad, someone was going to invade his space and find out exactly what he was up to. Even though the whole distributed p2p model (unlike Napster, by the way) was built to both avoid knowledge and be a perpetual motion machine, there was going to come a day when none of that legal advice was going to matter.  Within a few months the platform shut down, not because he didn’t want to go legal, but because he couldn’t, at least not without actually devoting himself to respecting other people’s rights.

Everything Old is New Again

Back in the early 2000s, peer-to-peer (P2P) piracy platforms claimed they weren’t responsible for the illegal music and videos flooding their networks. Today, AI companies claim they don’t know what’s in their training data. The defense is essentially the same: “We’re just the neutral platform. We don’t control the content.”  It’s that distorted view of the DMCA and Section 230 safe harbors that put many lawyers’ children through prep school, college and graduate school.

But just like with Morpheus, eDonkey, Grokster, and LimeWire, everyone knew that was BS because the evidence said otherwise — and here’s the kicker: many of the same lawyers are now running essentially the same playbook to defend AI giants.

The P2P Parallel: “We Don’t Control Uploads… Except We Clearly Do”

In the 2000s, platforms like Kazaa and LimeWire were like my little buddy: magically, they never had illegal pornography or extreme violence available to consumers, they prioritized popular music and movies, and they filtered out the worst of the web.

That selective filtering made it clear: they knew what was on their network. It wasn’t even a question of “should have known”, they actually knew and they did it anyway.  Courts caught on. 

In Grokster, the Supreme Court sidestepped the hosting issue and essentially said that if you design a platform with the intent to enable infringement, you’re liable.

The Same Playbook in the AI Era

Today’s AI platforms — OpenAI, Anthropic, Meta, Google, and others — essentially argue:
“Our model doesn’t remember where it learned [fill in the blank]. It’s just statistics.”

But behind the curtain, they:
– Run deduplication tools to avoid overloading on repeated copies of the same work, for example copyrighted books
– Filter out NSFW or toxic content
– Choose which datasets to include and exclude
– Fine-tune models to align with somebody’s social norms or optics

This level of control shows they’re not ignorant — they’re deflecting liability just like they did with p2p.

Déjà Vu — With Many of the Same Lawyers

Many of the same law firms that defended Grokster, Kazaa, and other P2P pirate defendants as well as some of the ISPs are now representing AI companies—and the AI companies are very often some, not all, but some of the same ones that started screwing us on DMCA, etc., for the last 25 years.  You’ll see familiar names all of whom have done their best to destroy the creative community for big, big bucks in litigation and lobbying billable hours while filling their pockets to overflowing. 

The legal cadre pioneered the ‘willful blindness’ defense and are now polishing it up for AI, hoping courts haven’t learned the lesson.  And judging…no pun intended…from some recent rulings, maybe they haven’t.

Why do they drive their clients into a position where they pose an existential threat to all creators?  Do they not understand that they are creating a vast community of humans that really, truly, hate their clients?  I think they do understand, but there is a corresponding hatred of the super square Silicon Valley types who hate “Hollywood” right back.

Because, you know, information wants to be free—unless they are selling it.  And your data is their new oil. They apply this “ethic” not just to data, but to everything: books, news, music, images, and voice. Copyright? A speed bump. Terms of service? A suggestion. Artist consent? Optional.  Writing a song is nothing compared to the complexities of Biggest Tech.

Why do they do this?  OCPD Much?

Because control over training data is strategic dominance and these people are the biggest control freaks that mankind has ever produced.  They exhibit persistent and inflexible patterns of behavior characterized by an excessive need to control people, environments, and outcomes, often associated with traits of obsessive-compulsive personality disorder.  

So empathy will get you nowhere with these people, although their narcissism allows them to believe that they are extremely empathetic.  Pathetic, yes, empathetic, not so much.  

Pay No Attention to that Pajama Boy Behind the Curtain

The driving force behind AI is very similar to the driving force behind the Internet.   If pajama boy can harvest the world’s intellectual property and use it to train his proprietary AI model, he now owns a simulation of the culture he is not otherwise part of, and not only can he monetize it without sharing profits or credit, he can deny profits and credit to the people who actually created it.

So just like the heyday of Pirate Bay, Grokster & Co.  (and Daniel Ek’s pirate incarnation) the goal isn’t innovation. The goal is control over language, imagery, and the markets that used to rely on human creators.  This should all sound familiar if you were around for the p2p era.

Why This Matters

As with p2p platforms, it’s just not believable that the AI companies don’t know what’s in their models.  They may build their chatbot interface so that the public can’t ask the chatbot to blow the whistle on the platform operator, but that doesn’t mean the company can’t tell what it is training on.  These operators have to be able to know what’s in the training materials and manipulate that data daily.

They fingerprint, deduplicate, and sanitize their datasets. How else can they avoid having multiple copies of books, for example, which would be a compute nightmare? They store “embeddings” in a way that lets them optimize their AI to use only the best copy of any particular book. They control the pipeline.
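For readers who want to see what “fingerprint and deduplicate” can mean in practice, here is a minimal sketch in Python. It is an illustration under stated assumptions, not any company’s actual pipeline: the function names, the sample corpus, and the choice of a SHA-256 hash over whitespace-normalized text are all hypothetical. The point it makes is narrow. To throw away a duplicate, the operator has to keep a record of what it has already seen.

```python
import hashlib
import re


def fingerprint(text: str) -> str:
    """Hash a normalized copy of the text so trivially different copies match.

    Assumption: exact-match hashing after lowercasing and collapsing whitespace.
    Real pipelines may use fuzzier near-duplicate methods.
    """
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(documents):
    """Keep the first copy of each distinct work and log what was dropped.

    `documents` is an iterable of (title, text) pairs (hypothetical format).
    Returns (kept_titles, dropped), where dropped pairs each discarded title
    with the title of the copy that was kept instead.
    """
    seen = {}      # fingerprint -> title of the copy we kept
    dropped = []   # (discarded title, title of the kept copy)
    for title, text in documents:
        fp = fingerprint(text)
        if fp in seen:
            dropped.append((title, seen[fp]))
        else:
            seen[fp] = title
    return list(seen.values()), dropped


# Hypothetical sample corpus: two copies of one book plus a different work.
corpus = [
    ("novel_scan_1", "Call me   Ishmael."),
    ("novel_scan_2", "call me ishmael.\n"),
    ("other_book", "It was the best of times."),
]
kept, dropped = deduplicate(corpus)
print(kept)
print(dropped)
```

Notice that the `seen` table is, in effect, an inventory of the training set. A pipeline that can drop the second copy of a book is a pipeline that knows the first copy is there.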

It’s not about the model’s memory. It’s about the platform’s intent and awareness.

If they’re smart enough to remove illegal content and prioritize clean data, they’re smart enough to be held accountable.

We’re not living through the first digital content crisis — just the most powerful one yet. The legal defenses haven’t changed much. But the stakes — for copyright, competition, and consumer protection — are much higher now.

Courts, Congress, and the public should recognize this for what it is: a recycled defense strategy in service of unchecked AI power. Eventually Grokster the company ran into Grokster the Supreme Court case, and all these lawyers are praying that there won’t be an AI version of the Grokster case.

What Bell Labs and Xerox PARC Can Teach Us About the Future of Music

When we talk about the great innovation engines of the 20th century, two names stand out: Bell Labs and Xerox PARC. These legendary research institutions didn’t just push the boundaries of science and technology; they delivered breakthrough solutions to real challenges. The transistor, the laser, the UNIX operating system, the graphical user interface, and Ethernet networking all trace their origins to these hubs of long-range, cross-disciplinary thinking.

These breakthroughs didn’t happen by accident. They were the product of institutions that were intentionally designed to explore what might be possible outside the pressures of quarterly earnings reports–which means monthly which means weekly. Bell Labs and Xerox PARC proved that bold ideas need space, time, and a mandate to explore—even if commercial applications aren’t immediately apparent. You cannot solve big problems with an eye on weekly revenues–and I know that because I worked at A&M Records.

Now imagine if music had something like Bell Labs and Xerox PARC.

What if there were a Bell Labs for Music, an independent research and development hub where songwriters, engineers, logisticians, rights experts, and economists could collaborate to solve deep-rooted industry challenges? Instead of letting dominant tech platforms dictate the future, the music industry could build its own innovation engine, tailored to the needs of creators. Let’s consider how similar institutions could empower the music industry to reclaim its creative and economic future, particularly when confronted by AI and its institutional takeover.

Big Tech’s Self-Dealing: A $500 Million Taxpayer-Funded Windfall

While creators are being told to “adapt” to the age of AI, Big Tech has quietly written itself a $500 million check—funded by taxpayers—for AI infrastructure. Buried within the sprawling “innovation and competitiveness” sections of legislation being promoted as part of Trump’s “big beautiful bill,” this provision would hand over half a billion dollars in public funding—more accurately, public debt—to cloud providers, chipmakers, and AI monopolists with little transparency and even fewer obligations to the public.

Don’t bother looking–it will come as no surprise that there are no offsetting provisions for musicians, authors, educators, or even news publishers whose work is routinely scraped to train these AI models. There are no earmarks for building fair licensing infrastructure or consent-based AI training databases. There is no “AI Bell Labs” for the creative economy.

Once again, we see that innovation policy is being written by and for the same old monopolists who already control the platforms and the Internet itself, while the people whose work fills those platforms are left unprotected, uncompensated, and uninformed. If we are willing to borrow hundreds of millions to accelerate private AI growth, we should be at least as willing to invest in creator-centered infrastructure that ensures innovation is equitable—not extractive.

Innovation Needs a Home—and a Conscience

Bell Labs and Xerox PARC were designed not just to build technology, but to think ahead. They solved many future challenges often before the world even knew they existed.

The music industry can—and must—do the same. Instead of waiting for another monopolist to exercise its political clout to grant itself new safe harbors to upend the rules–like AI platforms are doing right now–we can build a space where songwriters, developers, and rights holders collaborate to define a better future. That means metadata that respects rights and tracks payments to creators. That means fair discovery systems. That means artist-first economic models.

It’s time for a Bell Labs for music. And it’s time to fund it not through government dependency—but through creator-led coalitions, industry responsibility, and platform accountability.

Because the future of music shouldn’t be written in Silicon Valley boardrooms. It should be composed, engineered, and protected by the people who make it matter.

Now with added retroactive acrobatics: @DamianCollins calls on UK Prime Minister to stop Google’s “Text and Data Mining” Circus

Damian Collins (former chair of the UK Parliament’s Digital, Culture, Media and Sport Select Committee) warns of Google’s latest AI shenanigans in a must-read opinion piece in the Daily Mail that highlights Google’s attempt to lobby its way into what is essentially a retroactive safe harbor to protect Google and its confederates in the AI land grab. While Mr. Collins writes about Google’s efforts to rewrite the laws of the UK to free ride in his home country, which is egregious bullying, the episode he documents is instructive for all of us. If Google & Co. will do it to the Mother of Parliaments, it’s only a matter of time until Google & Co. do the same everywhere or know the reason why. Their goal is to hoover up all the world’s culture that the AI platforms have not scraped already and–crucially–to get away with it. And as Guy Forsyth says, “…nothing says freedom like getting away with it.”

Understanding the timeline of AI’s appropriation of all the world’s culture is critical to appreciating just how depraved Big Tech’s unbridled greed really is. The important thing to remember is that AI platforms like Google have been scraping the Internet to train their AI for some time now, possibly many years. This apparently includes social media platforms they control. My theory is that Google Books was an early effort at digitization for large language models to support products like corpus machine translation as a predecessor to Gemini (“your twin”) and other Google AI products. We should ask Ray Kurzweil.

There is increasing evidence that this is exactly what these people are up to.

The New York Times Uncovers the Crimes

According to an extensive long-form report in the New York Times by a team of very highly respected journalists, it turns out that Google has been planning this “Text and Data Mining” land grab for some time. At the very moment YouTube was issuing press releases about their Music AI Incubator and their “partners”–Google was stealing anything that was not nailed down that anyone had hosted on their massive platforms, including Google Docs, Google Maps, and…YouTube. The Times tells us:

Google transcribed YouTube videos to harvest text for its A.I. models, five people with knowledge of the company’s practices said. That potentially violated the copyrights to the videos, which belong to their creators….Google said that its A.I. models “are trained on some YouTube content,” which was allowed under agreements with YouTube creators, and that the company did not use data from office apps outside of an experimental program. 

I find it hard to believe either that YouTube was allowed to transcribe and scrape under all its content deals, or that they parsed through all videos to find the unprotected ones subject to their interpretation of the YouTube terms of use. So as we say in Texas, that sounds like bullshit for starters.

How does this relate to the Text and Data Mining exception that Mr. Collins warns of? Note that the NYT tells us “Google transcribed YouTube videos to harvest text.” That’s a clue.

As Mr. Collins tells us:

Google [recently] published a policy paper entitled: Unlocking The UK’s AI Potential.

What’s not to like?, you might ask. Artificial intelligence has the potential to revolutionise our economy and we don’t want to be left behind as the rest of the world embraces its benefits.

But buried in Google’s report is a call for a ‘text and data mining’ (TDM) exception to copyright.

This TDM exception would allow Google to scrape the entire history of human creativity from the internet without permission and without payment.

And, of course, Mr. Collins is exactly correct: that’s exactly what Google have in mind.

The Conspiracy of Dunces and the YouTube Fraud

In fairness, it wasn’t just Google ripping us off, but Google didn’t do anything to stop it as far as I can tell. One thing to remember is that YouTube was, and I think still is, not very crawlable by outsiders. It is almost certainly the case that Google would know who was crawling youtube.com, such as Bingbot, DuckDuckBot, Yandex Bot, or Yahoo Slurp, if for no other reason than that those spiders were not googlebot. With that understanding, the Times also tells us:

OpenAI researchers created a speech recognition tool called Whisper. It could transcribe the audio from YouTube videos, yielding new conversational text that would make an A.I. system smarter.

Some OpenAI employees discussed how such a move might go against YouTube’s rules, three people with knowledge of the conversations said. YouTube, which is owned by Google, prohibits use of its videos for applications that are “independent” of the video platform. [Whatever “independent” means.]

Ultimately, an OpenAI team transcribed more than one million hours of YouTube videos, the people said. The team included Greg Brockman, OpenAI’s president, who personally helped collect the videos, two of the people said. The texts were then fed into a system called GPT-4, which was widely considered one of the world’s most powerful A.I. models and was the basis of the latest version of the ChatGPT chatbot….

OpenAI eventually made Whisper, the speech recognition tool, to transcribe YouTube videos and podcasts, six people said. But YouTube prohibits people from not only using its videos for “independent” applications, but also accessing its videos by “any automated means (such as robots, botnets or scrapers).”

OpenAI employees knew they were wading into a legal gray area, the people said, but believed that training A.I. with the videos was fair use. 

And strangely enough, many (if not all) of the AI platforms sued by creators raise “fair use” as a defense, which is reminiscent of the kind of crap we have been hearing from these people since 1999.

Now why might Google have permitted OpenAI to crawl YouTube and transcribe videos (and who knows what else)? Probably because Google was doing the same thing. In fact, the Times tells us:

Some Google employees were aware that OpenAI had harvested YouTube videos for data, two people with knowledge of the companies said. But they didn’t stop OpenAI because Google had also used transcripts of YouTube videos to train its A.I. models, the people said. That practice may have violated the copyrights of YouTube creators. So if Google made a fuss about OpenAI, there might be a public outcry against its own methods, the people said.

So Google and its confederate OpenAI may well have conspired to commit massive copyright infringement against the owner of a valid copyright, did so willingly, and for purposes of commercial advantage and private financial gain. (Attempts to infringe are prohibited to the same extent as the completed act). The acts of these confederates vastly exceed the limits for criminal prosecution for both infringement and conspiracy.

But to Mr. Collins’ concern, the big AI platforms transcribed likely billions of hours of YouTube videos to manipulate text and data–you know, TDM.

The New Retroactive Safe Harbor: The Flying Googles Bring their TDM Circus Act to the Big Tent With Retroactive Acrobatics

But also realize the effect of the new TDM exception that Google and their Big Tech confederates are trying to slip past the UK government (and our own for that matter). A lot of the discussion about AI rulemaking acts as if new rules would be for future AI data scraping. Au contraire mes amis–on the contrary, the bad acts have already happened and they happened on an unimaginable scale.

So what Google is actually trying to do is get the UK to pass a retroactive safe harbor that would deprive citizens of valuable property rights–and also pass a prospective safe harbor so they can keep doing the bad acts with impunity.

Fortunately for UK citizens, the UK Parliament has not passed idiotic retroactive safe harbor legislation like the U.S. Congress has. I am, of course, thinking of the vaunted Music Modernization Act (MMA) that drooled its way to a retroactive safe harbor for copyright infringement, a shining example of the triumph of corruption that has yet to be properly challenged in the US on Constitutional grounds.

There’s nothing like the MMA absurdity in the UK, at least not yet. However, that retroactive safe harbor was not lost on Google, who benefited directly from it. They loved it. They hung it over the mantel next to their other Big Game trophy, the DMCA. And now they’d like to do it again for the triptych of legislative taxidermy.

Because make no mistake–a retroactive safe harbor would be exactly the effect of Google’s TDM exception. Not to mention it would also be a form of retroactive eminent domain, or what the UK analogously might call the compulsory purchase of property under the Compulsory Purchase of Property Act. Well…”purchase” might be too strong a word, more like “transfer” because these people don’t intend to pay for a thing.

The effect of passing Google’s TDM exception would be to take property rights and other personal rights from UK citizens without anything like the level of process or compensation required under the Compulsory Purchase of Property–even when the government requires the sale of private property to another private entity (such as a railroad right of way or a utility easement).

The government is on very shaky ground with a TDM exception imposed by the government for the benefit of a private company, indeed foreign private companies who can well afford to pay for it. It would lack government oversight on a case-by-case basis, lack any proper valuation, and serve entirely commercial purposes with no public benefit. In the US, it would likely violate the Takings Clause of our Constitution, among other things.

It’s Not Just the Artists

Mr. Collins also makes a very important point that might get lost among the stars–it’s not just the stars that AI is ripping off–it is everyone. As the New York Times story points out (and it seems that there are more whistleblowers on this point every day), the AI platforms are hoovering up EVERYTHING that is on the Internet, especially on their affiliated platforms. That includes baby videos, influencers, everything.

This is why it is cultural appropriation on a grand scale, indeed a scale of depravity that we haven’t seen since the Nuremberg Trials. A TDM exception would harm all Britons in one massive offshoring of British culture.