Now with added retroactive acrobatics: @DamianCollins calls on UK Prime Minister to stop Google’s “Text and Data Mining” Circus

Damian Collins (former chair of the UK Parliament’s Digital, Culture, Media and Sport Select Committee) warns of Google’s latest AI shenanigans in a must-read opinion piece in the Daily Mail that highlights Google’s attempt to lobby its way into what is essentially a retroactive safe harbor to protect Google and its confederates in the AI land grab. While Mr. Collins writes about Google’s efforts to rewrite the laws of the UK to free ride in his home country, which is egregious bullying, the episode he documents is instructive for all of us. If Google & Co. will do it to the Mother of Parliaments, it’s only a matter of time until Google & Co. do the same everywhere or know the reason why. Their goal is to hoover up all the world’s culture that the AI platforms have not scraped already and–crucially–to get away with it. And as Guy Forsyth says, “…nothing says freedom like getting away with it.”

Understanding the timeline of AI’s appropriation of all the world’s culture is critical to appreciating just how depraved Big Tech’s unbridled greed really is. The important thing to remember is that AI platforms like Google have been scraping the Internet to train their AI for some time now, possibly many years. This apparently includes social media platforms they control. My theory is that Google Books was an early effort at digitization for large language models to support products like corpus machine translation as a predecessor to Gemini (“your twin”) and other Google AI products. We should ask Ray Kurzweil.

There is increasing evidence that this is exactly what these people are up to.

The New York Times Uncovers the Crimes

According to an extensive long-form report in the New York Times by a team of very highly respected journalists, it turns out that Google has been planning this “Text and Data Mining” land grab for some time. At the very moment YouTube was issuing press releases about their Music AI Incubator and their “partners”–Google was stealing anything that was not nailed down that anyone had hosted on their massive platforms, including Google Docs, Google Maps, and…YouTube. The Times tells us:

Google transcribed YouTube videos to harvest text for its A.I. models, five people with knowledge of the company’s practices said. That potentially violated the copyrights to the videos, which belong to their creators….Google said that its A.I. models “are trained on some YouTube content,” which was allowed under agreements with YouTube creators, and that the company did not use data from office apps outside of an experimental program. 

I find it hard to believe either that YouTube was allowed to transcribe and scrape under all its content deals, or that they parsed through all videos to find the unprotected ones subject to their interpretation of the YouTube terms of use. So as we say in Texas, that sounds like bullshit for starters.

How does this relate to the Text and Data Mining exception that Mr. Collins warns of? Note that the NYT tells us “Google transcribed YouTube videos to harvest text.” That’s a clue.

As Mr. Collins tells us:

Google [recently] published a policy paper entitled: Unlocking The UK’s AI Potential.

What’s not to like?, you might ask. Artificial intelligence has the potential to revolutionise our economy and we don’t want to be left behind as the rest of the world embraces its benefits.

But buried in Google’s report is a call for a ‘text and data mining’ (TDM) exception to copyright.

This TDM exception would allow Google to scrape the entire history of human creativity from the internet without permission and without payment.

And, of course, Mr. Collins is exactly correct: that is precisely what Google have in mind.

The Conspiracy of Dunces and the YouTube Fraud

In fairness, it wasn’t just Google ripping us off, but Google didn’t do anything to stop it as far as I can tell. One thing to remember is that YouTube was, and I think still is, not very crawlable by outsiders. It is almost certainly the case that Google would know who was crawling youtube.com, such as Bingbot, DuckDuckBot, Yandex Bot, or Yahoo Slurp, if for no other reason than that those spiders were not Googlebot. With that understanding, the Times also tells us:

OpenAI researchers created a speech recognition tool called Whisper. It could transcribe the audio from YouTube videos, yielding new conversational text that would make an A.I. system smarter.

Some OpenAI employees discussed how such a move might go against YouTube’s rules, three people with knowledge of the conversations said. YouTube, which is owned by Google, prohibits use of its videos for applications that are “independent” of the video platform. [Whatever “independent” means.]

Ultimately, an OpenAI team transcribed more than one million hours of YouTube videos, the people said. The team included Greg Brockman, OpenAI’s president, who personally helped collect the videos, two of the people said. The texts were then fed into a system called GPT-4, which was widely considered one of the world’s most powerful A.I. models and was the basis of the latest version of the ChatGPT chatbot….

OpenAI eventually made Whisper, the speech recognition tool, to transcribe YouTube videos and podcasts, six people said. But YouTube prohibits people from not only using its videos for “independent” applications, but also accessing its videos by “any automated means (such as robots, botnets or scrapers).”

OpenAI employees knew they were wading into a legal gray area, the people said, but believed that training A.I. with the videos was fair use. 

And strangely enough, many (if not all) of the AI platforms sued by creators raise “fair use” as a defense, which is reminiscent of the kind of crap we have been hearing from these people since 1999.

Now why might Google have permitted OpenAI to crawl YouTube and transcribe videos (and who knows what else)? Probably because Google was doing the same thing. In fact, the Times tells us:

Some Google employees were aware that OpenAI had harvested YouTube videos for data, two people with knowledge of the companies said. But they didn’t stop OpenAI because Google had also used transcripts of YouTube videos to train its A.I. models, the people said. That practice may have violated the copyrights of YouTube creators. So if Google made a fuss about OpenAI, there might be a public outcry against its own methods, the people said.

So Google and its confederate OpenAI may well have conspired to commit massive copyright infringement against the owner of a valid copyright, did so willingly, and for purposes of commercial advantage and private financial gain. (Attempts to infringe are prohibited to the same extent as the completed act). The acts of these confederates vastly exceed the limits for criminal prosecution for both infringement and conspiracy.

But to Mr. Collins’ concern, the big AI platforms transcribed likely billions of hours of YouTube videos to manipulate text and data–you know, TDM.

The New Retroactive Safe Harbor: The Flying Googles Bring their TDM Circus Act to the Big Tent With Retroactive Acrobatics

But also realize the effect of the new TDM exception that Google and their Big Tech confederates are trying to slip past the UK government (and our own for that matter). A lot of the discussion about AI rulemaking acts as if new rules would be for future AI data scraping. Au contraire mes amis–on the contrary, the bad acts have already happened and they happened on an unimaginable scale.

So what Google is actually trying to do is get the UK to pass a retroactive safe harbor that would deprive citizens of valuable property rights–and also pass a prospective safe harbor so they can keep doing the bad acts with impunity.

Fortunately for UK citizens, the UK Parliament has not passed idiotic retroactive safe harbor legislation like the U.S. Congress has. I am, of course, thinking of the vaunted Music Modernization Act (MMA) that drooled its way to a retroactive safe harbor for copyright infringement, a shining example of the triumph of corruption that has yet to be properly challenged in the US on Constitutional grounds.

There’s nothing like the MMA absurdity in the UK, at least not yet. However, that retroactive safe harbor was not lost on Google, who benefited directly from it. They loved it. They hung it over the mantel next to their other Big Game trophy, the DMCA. And now they’d like to do it again for the triptych of legislative taxidermy.

Because make no mistake–a retroactive safe harbor would be exactly the effect of Google’s TDM exception. Not to mention it would also be a form of retroactive eminent domain, or what the UK analogously might call the compulsory purchase of property under the Compulsory Purchase of Property Act. Well…”purchase” might be too strong a word, more like “transfer” because these people don’t intend to pay for a thing.

The effect of passing Google’s TDM exception would be to take property rights and other personal rights from UK citizens without anything like the level of process or compensation required under the Compulsory Purchase of Property–even when the government requires the sale of private property to another private entity (such as a railroad right of way or a utility easement).

The government is on very shaky ground with a TDM exception imposed by the government for the benefit of a private company, indeed foreign private companies who can well afford to pay for it. It would lack government oversight on a case-by-case basis, lack any proper valuation, and serve entirely commercial purposes with no public benefit. In the US, it would likely violate the Takings Clause of our Constitution, among other things.

It’s Not Just the Artists

Mr. Collins also makes a very important point that might get lost among the stars–it’s not just the stars that AI is ripping off–it is everyone. As the New York Times story points out (and it seems that there are more whistleblowers on this point every day), the AI platforms are hoovering up EVERYTHING that is on the Internet, especially on their affiliated platforms. That includes baby videos, influencers, everything.

This is why it is cultural appropriation on a grand scale, indeed a scale of depravity that we haven’t seen since the Nuremberg Trials. A TDM exception would harm all Britons in one massive offshoring of British culture.

Grifting Under Heaven: What happens if TikTok Shuts Itself Down?

It finally happened–Congress passed the Protecting Americans from Foreign Adversary Controlled Applications Act that prohibits the distribution, maintenance, or provision of internet hosting services for applications that are directly or indirectly operated by foreign adversaries. This legislation would include applications owned by ByteDance, Ltd. (the company that owns TikTok) or social media companies controlled by foreign adversaries that pose a significant threat to national security.

According to a Reuters exclusive, the response from ByteDance is that they would rather shut down TikTok than sell it–if the sale included the TikTok algorithm:

“The algorithms TikTok relies on for its operations are deemed core to ByteDance’s overall operations, which would make a sale of the app with algorithms highly unlikely, said the sources close to the parent….

TikTok shares the same core algorithms with ByteDance domestic apps like short video platform Douyin, three of the sources said. Its algorithms are considered better than ByteDance rivals such as Tencent and Xiaohongshu, said one of them.

It would be impossible to divest TikTok with its algorithms as their intellectual property licence is registered under ByteDance in China and thus difficult to disentangle from the parent company, said the sources.”

Well then. Of course, one of the primary national security arguments supporting any First Amendment defense on a challenge by TikTok to the content neutral, time, place and manner regulation will involve both the data privacy and foreign actor mass media manipulation evidentiary hearings. I don’t know how you make that defense without access to the algorithm. So why so secretive?

One could therefore plausibly argue that refusing to put the algorithm on the table is as good as admitting that TikTok is manipulating US users through algorithmic emotional targeting and scraping their users’ private data to do so. That would directly undermine their First Amendment attack on the US government and be a big step toward proving the government’s case.

And, of course, that secret algorithm uses music as the honeypot to attract users from the very young to the not so young. Remember, if this issue ever comes up in a court as a defense for the government, it will likely be because TikTok brought the underlying lawsuit that gave rise to the defense, and then refused to comply with a subpoena for the key piece of evidence. We call that “bootstrapping” in the trade.

In the interest of full disclosure, I’ve been supporting a version of the foreign adversary divestment legislation since 2020 and did so publicly that year when I moderated a great panel at the Music Biz conference on this very subject. If that panel or this topic made you uncomfortable, it may be because you felt such a strong…let’s say attraction…to TikTok as either a marketer or user that you couldn’t imagine living without it. Or maybe you bought into the “exposure” benefits of TikTok. Or maybe you’d had no reason to think about the larger implications. More about that another time.

After the legislation passed–despite a US lobbying campaign against it worthy of The Internet Association…ahem–people are asking, now what? So let’s think about that.

The Universal Connection

TikTok’s future cannot be well understood without taking into account the withdrawal of Universal’s recordings and songs from the platform for commercial reasons. That withdrawal now looks even more prescient given the foreign adversary divestiture legislation. Is it materially different to make a deal with a company that is just another piggy Big Tech company that doesn’t value music and considers it a loss leader to get to the really big bundle of cash like Spotify stock, or to do a deal with that piggy company who has also been declared a tool of a strategic foreign adversary of the United States by none other than the President of the United States?

I think it rather is. So the two events are in some ways quite connected.

First of all, in the short run I would expect TikTok to immediately expand their direct licensing campaign, which evidently has already snared Taylor Swift, and to do it quickly before anyone notices that what was just a crappy licensing deal the day before President Biden signed the legislation into law is now a crappy licensing deal from a declared foreign adversary of the United States. How that twist will affect the brand of Miss Americana remains to be seen.

One solution I would expect to get floated in coming days is the need for TikTok executives to register as foreign agents under the Foreign Agents Registration Act. According to the Congressional Research Service:

In 1938, the Foreign Agents Registration Act (22 U.S.C. §§611-621; FARA) was enacted to require individuals doing political or advocacy work on behalf of foreign entities in the United States to register with the Department of Justice and to disclose their relationship, activities, receipts, and disbursements in support of their activities. The FARA does not prohibit any specific activities; rather it seeks to require registration and disclosure of them….In 1966, FARA was amended to shift the focus from political propagandists to agents representing the economic interests of foreign principals. These amendments were partially the result of an investigation by the Senate Foreign Relations Committee into foreign sugar interests and other lobbying activities. The 1966 amendments changed several definitions in the law, prohibited contingent fee contracts, broadened exemptions to ensure legitimate commercial activities were not burdened, strengthened provisions for the disclosure and labeling of propaganda, and required the Department of Justice to issue regulations on the act (28 C.F.R. §5.1 et seq.).

FARA enforcement languished for a bit over the years. However, FARA enforcement against those who fail to register as a foreign agent has had a resurgence in popularity at the Department of Justice. I think it can fairly be said that requiring TikTok executives to register would be consistent with DOJ’s actions and is worth a discussion. The policy underlying FARA is for the public to be aware of who is who–disclosure not imprisonment, or at least disclosure first.

Enter the Miasma of Angst

There is something of a miasma of angst around passing the foreign adversary divestment legislation as applied to TikTok which is partly due to an extraordinary amount of commercial activity between the US and China which may tend to mask the underlying kinetic tensions between our countries. It’s quite difficult for Americans to grasp this kinetic part due to the Great Firewall of China, the language and cultural barrier, and China’s own propaganda which is way, way more effective and long lasting than anything the Nazis dreamed up. TikTok is, after all, a danger close propaganda missile battery.

The legislation seems to assume that China is an “adversary” and not a “belligerent”. Is that actually true?

There are other rather inescapable events that suggest that the U.S. is already in a war with China, at least as far as the Chinese government are concerned. It helps to understand that when people say the Chinese Communist Party or “the CCP”, they mean the Chinese government and vice versa, a government ruled by Chairman for Life Xi Jinping. The Chinese constitution is, for example, the Constitution of the CCP.

Always remember that Usama Bin Laden declared war on the US but nobody took him seriously. Nuff said.

Why is that relevant to TikTok? Well, here’s another declaration of war on the US that nobody noticed. On May 14, 2019, the CCP government declared a “people’s war” against the United States as reported in the Pravda of China, the Global Times operated by Xinhua News Agency (the cabinet-level “news” agency run by the CCP):

“The most important thing is that in the China-US trade war, the US side fights for greed and arrogance … and morale will break at any point…The Chinese side is fighting back to protect its legitimate interests. The trade war in the US is the creation of one person and one administration, but it affects that country’s entire population…In China, the entire country and all its people are being threatened. For us, this is a real ‘people’s war.'”

What is the “people’s war”? It is an old Maoist phrase (remembering that Xi Jinping’s father fought with Mao during China’s Communist Revolution). It has a very specific meaning in the history of the Chinese Communist Party according to Wikipedia:

People’s war, also called protracted people’s war, is a Maoist military strategy. First developed by the Chinese communist revolutionary leader Mao Zedong (1893–1976), the basic concept behind people’s war is to maintain the support of the population and draw the enemy deep into the countryside (stretching their supply lines) where the population will bleed them dry through a mix of mobile warfare and guerrilla warfare. 

So in the dimension of “unrestricted warfare,” what end state would the CCP like to see? Bear in mind that they will avoid a shooting war in favor of the various other dimensions of civil-military fusion, following Sun Tzu’s admonishment to subdue the enemy without fighting. One way would be to impose economic damage on the United States.

The Unrestricted Warfare Dimension

What is this “unrestricted warfare”? That is a much bigger topic and I cannot emphasize enough the importance for every American and really everyone to understand it. Literally “Unrestricted Warfare” is one of the most important books on military strategy and geopolitics that nobody has read.

We think the book was published in Mandarin in 1999; it could have been earlier. It was written by two colonels in the People’s Liberation Army of the People’s Republic of China and entitled Unrestricted Warfare. The title is variously translated as Unrestricted Warfare: Two Air Force Senior Colonels on Scenarios for War and the Operational Art in an Era of Globalization, or the more bellicose Unrestricted Warfare: China’s Master Plan to Destroy America.

Why is this important? You must understand that when the colonels say “to destroy America” they actually mean that very thing. China’s military and civil goal is to replace the United States as the global hegemon under the “mandate of Heaven.” (See 2050 China: Understanding Xi Jinping Thought.)

No kidding.

The thesis of the book is that it is a mistake for a contemporary great power to think of war solely in military terms; war includes an economic, cyber, space, information war (especially social media like TikTok), and other dimensions–including kinetic–depending on the national interest at the time. I think of Unrestricted Warfare as an origin story for China’s civil and military fusion policy, later expressed in various statutes of the Chinese Communist Party that were on full display in the TikTok hearings before Congress.

Although the book was translated and certain of the cognoscenti read it in Mandarin (see Josh Rogin, Michael Pillsbury and Gen. Robert Spalding), it was largely unnoticed until recently. Except in China–the CCP rewarded the authors handsomely: Colonel Qiao Liang retired as a major general in the PLA and Colonel Wang Xiangsui is a professor at Beihang University in Beijing following his retirement as a senior colonel in the PLA (OF-5).

The point of Bin Laden’s 1998 fatwa, Unrestricted Warfare, and the 2019 people’s war declaration is that each of them declared war on America, and that no one paid attention. We know where that got us with Bin Laden; there are movies about it.

To War or Not?

So the first question is what is the argument that we are not at war currently with China under their definition? Particularly given that they declared war on us with just enough plausible deniability to make you feel bad about shutting down TikTok–see what I did there? (I think the CCP’s declared war started much, much longer ago, but let’s stick with their people’s war declaration as a recent tangible event to keep it manageable and ignore, oh, say island building, expanding to the largest navy in the world, and the rest of it. Read The Hundred-Year Marathon and see what you think. It may be worth reviewing the history of the Anglo-German Naval Agreement indirectly referenced in a Noël Coward song.)

Don’t Let’s Be Beastly to the Germans by Noël Coward is a reflection on “excessive humanitarians.”

It is also worth remembering that should open hostilities with China actually break out, i.e., in the colonels’ words should the current level of unrestricted warfare go kinetic, CCP-owned companies operating in the US will fall under an entirely more intense level of scrutiny. This is permitted by international laws of armed conflict and doesn’t even require additional US national laws although there surely will be many.

In the first instance, is the ostensibly private company actually private? What if good old chummy Mr. Tok turned out to be a colonel in the People’s Liberation Army and just didn’t get around to telling anyone? (I don’t think anyone in Congressional hearings ever asked him.)

And what if TikTok complied with the CCP laws that apply to ByteDance for sure and may apply to TikTok that require there to be a CCP cadre in each company? (See Article 19 of China’s “Company Law.”) If a private company’s staff members are also members of the armed forces of a state or have combat functions for an organized armed group belonging to a party in the conflict, they are not considered civilians. Further, if a private company is directly involved in military operations (e.g., cyber attacks or psy ops), it may lose its civilian status and become a legitimate military target under the Geneva Conventions. (Further reading: an excellent article from West Point on topic. I don’t think anyone ever asked Mr. Chew if he was a serving member of the PLA.)

So if China invades Taiwan and the US comes in on the side of Taiwan, but TikTok assists in even psychological warfare ops to support that war effort for China against Taiwan (and possibly the US), then what happens? What if it turns out that senior Tiks are reservists or active duty in the People’s Liberation Army that they just kind of didn’t mention before? Good old Uncle Chew? This kind of thing can also get you sanctioned if you try hard enough. Remember this came up with Elon Musk when Starlink allegedly thwarted an attack by Ukraine (which he denied for other reasons).

So about those licenses….Do artists really want to be used as a honeypot? Especially if TikTok keeps its algorithm, ostensibly shuts down in the US, but parks outside the US and still assaults US users?