Damian Collins (former chair of the UK Parliament’s Digital Culture Media and Sport Select Committee) warns of Google’s latest AI shenanigans in a must-read opinion piece in the Daily Mail that highlights Google’s attempt to lobby its way into what is essentially a retroactive safe harbor to protect Google and its confederates in the AI land grab. While Mr. Collins writes about Google’s efforts to rewrite the laws of the UK to free ride in his home country which is egregious bullying, the episode he documents is instructive for all of us. If Google & Co. will do it to the Mother of Parliaments, it’s only a matter of time until Google & Co. do the same everywhere or know the reason why. Their goal is to hoover up all the world’s culture that the AI platforms have not scraped already and–crucially–to get away with it. And as Guy Forsyth says, “…nothing says freedom like getting away with it.”
The timeline of AI’s appropriation of all the world’s culture is a critical understanding to appreciate just how depraved Big Tech’s unbridled greed really is. The important thing to remember is that AI platforms like Google have been scraping the Internet to train their AI for some time now, possibly many years. This apparently includes social media platforms they control. My theory is that Google Books was an early effort at digitization for large language models to support products like corpus machine translation as a predecessor to Gemini (“your twin”) and other Google AI products. We should ask Ray Kurzweil.
There is starting to be increasing evidence that this is exactly what these people are up to.
The New York Times Uncovers the Crimes
According to an extensive long-form report in the New York Times by a team of very highly respected journalists, it turns out that Google has been planning this “Text and Data Mining” land grab for some time. At the very moment YouTube was issuing press releases about their Music AI Incubator and their “partners”–Google was stealing anything that was not nailed down that anyone had hosted on their massive platforms, including Google Docs, Google Maps, and…YouTube. The Times tells us:
Google transcribed YouTube videos to harvest text for its A.I. models, five people with knowledge of the company’s practices said. That potentially violated the copyrights to the videos, which belong to their creators….Google said that its A.I. models “are trained on some YouTube content,” which was allowed under agreements with YouTube creators, and that the company did not use data from office apps outside of an experimental program.
I find it hard to believe that YouTube was both allowed to transcribe and scrape under all its content deals, or that they parsed through all videos to find the unprotected ones subject to their interpretation of the YouTube terms of use. So as we say in Texas, that sounds like bullshit for starters.
How does this relate to the Text and Data Mining exception that Mr. Collins warns of? Note that the NYT tells us “Google transcribed YouTube videos to harvest text.” That’s a clue.
As Mr. Collins tells us:
Google [recently] published a policy paper entitled: Unlocking The UK’s AI Potential.
What’s not to like?, you might ask. Artificial intelligence has the potential to revolutionise our economy and we don’t want to be left behind as the rest of the world embraces its benefits.
But buried in Google’s report is a call for a ‘text and data mining’ (TDM) exception to copyright.
This TDM exception would allow Google to scrape the entire history of human creativity from the internet without permission and without payment.
And, of course, Mr. Collins is exactly correct, that’s exactly what Google have in mind.
The Conspiracy of Dunces and the YouTube Fraud
In fairness, it wasn’t just Google ripping us off, but Google didn’t do anything to stop it as far as I can tell. One thing to remember is that YouTube was, and I think still is, not very crawlable by outsiders. It is almost certainly the case that Google would know who was crawling youtube.com, such as Bingbot, DuckDuckBot, Yandex Bot, or Yahoo Slurp if for no other reason that those spiders were not googlebot. With that understanding, the Times also tells us:
OpenAI researchers created a speech recognition tool called Whisper. It could transcribe the audio from YouTube videos, yielding new conversational text that would make an A.I. system smarter.
Some OpenAI employees discussed how such a move might go against YouTube’s rules, three people with knowledge of the conversations said. YouTube, which is owned by Google, prohibits use of its videos for applications that are “independent” of the video platform. [Whatever “independent” means.]
Ultimately, an OpenAI team transcribed more than one million hours of YouTube videos, the people said. The team included Greg Brockman, OpenAI’s president, who personally helped collect the videos, two of the people said. The texts were then fed into a system called GPT-4, which was widely considered one of the world’s most powerful A.I. models and was the basis of the latest version of the ChatGPT chatbot….
OpenAI eventually made Whisper, the speech recognition tool, to transcribe YouTube videos and podcasts, six people said. But YouTube prohibits people from not only using its videos for “independent” applications, but also accessing its videos by “any automated means (such as robots, botnets or scrapers).”
OpenAI employees knew they were wading into a legal gray area, the people said, but believed that training A.I. with the videos was fair use.
And strangely enough, many of the AI platforms sued by creators raise “fair use” as a defense (if not all of the cases) which is strangely reminiscent of the kind of crap we have been hearing from these people since 1999.
Now why might Google have permitted OpenAI to crawl YouTube and transcribe videos (and who knows what else)? Probably because Google was doing the same thing. In fact, the Times tells us:
Some Google employees were aware that OpenAI had harvested YouTube videos for data, two people with knowledge of the companies said. But they didn’t stop OpenAI because Google had also used transcripts of YouTube videos to train its A.I. models, the people said. That practice may have violated the copyrights of YouTube creators. So if Google made a fuss about OpenAI, there might be a public outcry against its own methods, the people said.
So Google and its confederate OpenAI may well have conspired to commit massive copyright infringement against the owner of a valid copyright, did so willingly, and for purposes of commercial advantage and private financial gain. (Attempts to infringe are prohibited to the same extent as the completed act). The acts of these confederates vastly exceed the limits for criminal prosecution for both infringement and conspiracy.
But to Mr. Collins’ concern, the big AI platforms transcribed likely billions of hours of YouTube videos to manipulate text and data–you know, TDM.
The New Retroactive Safe Harbor: The Flying Googles Bring their TDM Circus Act to the Big Tent With Retroactive Acrobatics
But also realize the effect of the new TDM exception that Google and their Big Tech confederates are trying to slip past the UK government (and our own for that matter). A lot of the discussion about AI rulemaking acts as if new rules would be for future AI data scraping. Au contraire mes amis–on the contrary, the bad acts have already happened and they happened on an unimaginable scale.
So what Google is actually trying to do is get the UK to pass a retroactive safe harbor that would deprive citizens of valuable property rights–and also pass a prospective safe harbor so they can keep doing the bad acts with impunity.
Fortunately for UK citizens, the UK Parliament has not passed idiotic retroactive safe harbor legislation like the U.S. Congress has. I am, of course, thinking of the vaunted Music Modernization Act (MMA) that drooled its way to a retroactive safe harbor for copyright infringement, a shining example of the triumph of corruption that has yet to be properly challenged in the US on Constitutional grounds.
There’s nothing like the MMA absurdity in the UK, at least not yet. However, that retroactive safe harbor was not lost on Google, who benefited directly from it. They loved it. They hung it over the mantle next to their other Big Game trophy, the DMCA. And now they’d like to do it again for the triptych of legislative taxidermy.
Because make no mistake–a retroactive safe harbor would be exactly the effect of Google’s TDM exception. Not to mention it would also be a form of retroactive eminent domain, or what the UK analogously might call the compulsory purchase of property under the Compulsory Purchase of Property Act. Well…”purchase” might be too strong a word, more like “transfer” because these people don’t intend to pay for a thing.
The effect of passing Google’s TDM exception would be to take property rights and other personal rights from UK citizens without anything like the level of process or compensation required under the Compulsory Purchase of Property–even when the government requires the sale of private property to another private entity (such as a railroad right of way or a utility easement).
The government is on very shaky ground with a TDM exception imposed by the government for the benefit of a private company, indeed foreign private companies who can well afford to pay for it. It would be missing government oversight on a case-by-base basis, no proper valuation, and for entirely commercial purposes with no public benefit. In the US, it would likely violate the Takings Clause of our Constitution, among other things.
It’s Not Just the Artists
Mr. Collins also makes a very important point that might get lost among the stars–it’s not just the stars that AI is ripping off–it is everyone. As the New York Times story points out (and it seems that there’s more whistleblowers on this point every day), the AI platforms are hoovering up EVERYTHING that is on the Internet, especially on their affiliated platforms. That includes baby videos, influencers, everything.
This is why it is cultural appropriation on a grand scale, indeed a scale of depravity that we haven’t seen since the Nurenberg Trials. A TDM exception would harm all Britons in one massive offshoring of British culture.
