Now with added retroactive acrobatics: @DamianCollins calls on UK Prime Minister to stop Google’s “Text and Data Mining” Circus

Damian Collins (former chair of the UK Parliament’s Digital, Culture, Media and Sport Select Committee) warns of Google’s latest AI shenanigans in a must-read opinion piece in the Daily Mail that highlights Google’s attempt to lobby its way into what is essentially a retroactive safe harbor to protect Google and its confederates in the AI land grab. While Mr. Collins writes about Google’s efforts to rewrite the laws of the UK so it can free ride in his home country, which is egregious bullying, the episode he documents is instructive for all of us. If Google & Co. will do it to the Mother of Parliaments, it’s only a matter of time until Google & Co. do the same everywhere or know the reason why. Their goal is to hoover up all the world’s culture that the AI platforms have not scraped already and–crucially–to get away with it. And as Guy Forsyth says, “…nothing says freedom like getting away with it.”

The timeline of AI’s appropriation of all the world’s culture is critical to appreciating just how depraved Big Tech’s unbridled greed really is. The important thing to remember is that AI platforms like Google have been scraping the Internet to train their AI for some time now, possibly many years. This apparently includes the social media platforms they control. My theory is that Google Books was an early effort at digitization for large language models to support products like corpus machine translation as a predecessor to Gemini (“your twin”) and other Google AI products. We should ask Ray Kurzweil.

There is growing evidence that this is exactly what these people are up to.

The New York Times Uncovers the Crimes

According to an extensive long-form report in the New York Times by a team of very highly respected journalists, it turns out that Google has been planning this “Text and Data Mining” land grab for some time. At the very moment YouTube was issuing press releases about their Music AI Incubator and their “partners”–Google was stealing anything that was not nailed down that anyone had hosted on their massive platforms, including Google Docs, Google Maps, and…YouTube. The Times tells us:

Google transcribed YouTube videos to harvest text for its A.I. models, five people with knowledge of the company’s practices said. That potentially violated the copyrights to the videos, which belong to their creators….Google said that its A.I. models “are trained on some YouTube content,” which was allowed under agreements with YouTube creators, and that the company did not use data from office apps outside of an experimental program. 

I find it hard to believe either that YouTube was allowed to both transcribe and scrape under all its content deals, or that they parsed through all videos to find the unprotected ones subject to their interpretation of the YouTube terms of use. So as we say in Texas, that sounds like bullshit for starters.

How does this relate to the Text and Data Mining exception that Mr. Collins warns of? Note that the NYT tells us “Google transcribed YouTube videos to harvest text.” That’s a clue.

As Mr. Collins tells us:

Google [recently] published a policy paper entitled: Unlocking The UK’s AI Potential.

What’s not to like?, you might ask. Artificial intelligence has the potential to revolutionise our economy and we don’t want to be left behind as the rest of the world embraces its benefits.

But buried in Google’s report is a call for a ‘text and data mining’ (TDM) exception to copyright.

This TDM exception would allow Google to scrape the entire history of human creativity from the internet without permission and without payment.

And, of course, Mr. Collins is exactly correct: that’s exactly what Google has in mind.

The Conspiracy of Dunces and the YouTube Fraud

In fairness, it wasn’t just Google ripping us off, but Google didn’t do anything to stop it as far as I can tell. One thing to remember is that YouTube was, and I think still is, not very crawlable by outsiders. It is almost certainly the case that Google would know who was crawling youtube.com, such as Bingbot, DuckDuckBot, Yandex Bot, or Yahoo Slurp, if for no other reason than that those spiders were not Googlebot. With that understanding, the Times also tells us:

OpenAI researchers created a speech recognition tool called Whisper. It could transcribe the audio from YouTube videos, yielding new conversational text that would make an A.I. system smarter.

Some OpenAI employees discussed how such a move might go against YouTube’s rules, three people with knowledge of the conversations said. YouTube, which is owned by Google, prohibits use of its videos for applications that are “independent” of the video platform. [Whatever “independent” means.]

Ultimately, an OpenAI team transcribed more than one million hours of YouTube videos, the people said. The team included Greg Brockman, OpenAI’s president, who personally helped collect the videos, two of the people said. The texts were then fed into a system called GPT-4, which was widely considered one of the world’s most powerful A.I. models and was the basis of the latest version of the ChatGPT chatbot….

OpenAI eventually made Whisper, the speech recognition tool, to transcribe YouTube videos and podcasts, six people said. But YouTube prohibits people from not only using its videos for “independent” applications, but also accessing its videos by “any automated means (such as robots, botnets or scrapers).”

OpenAI employees knew they were wading into a legal gray area, the people said, but believed that training A.I. with the videos was fair use. 

And strangely enough, many (if not all) of the AI platforms sued by creators raise “fair use” as a defense, which is reminiscent of the kind of crap we have been hearing from these people since 1999.

Now why might Google have permitted OpenAI to crawl YouTube and transcribe videos (and who knows what else)? Probably because Google was doing the same thing. In fact, the Times tells us:

Some Google employees were aware that OpenAI had harvested YouTube videos for data, two people with knowledge of the companies said. But they didn’t stop OpenAI because Google had also used transcripts of YouTube videos to train its A.I. models, the people said. That practice may have violated the copyrights of YouTube creators. So if Google made a fuss about OpenAI, there might be a public outcry against its own methods, the people said.

So Google and its confederate OpenAI may well have conspired to commit massive copyright infringement against the owner of a valid copyright, did so willingly, and for purposes of commercial advantage and private financial gain. (Attempts to infringe are prohibited to the same extent as the completed act). The acts of these confederates vastly exceed the limits for criminal prosecution for both infringement and conspiracy.

But to Mr. Collins’ concern, the big AI platforms transcribed likely billions of hours of YouTube videos to manipulate text and data–you know, TDM.

The New Retroactive Safe Harbor: The Flying Googles Bring their TDM Circus Act to the Big Tent With Retroactive Acrobatics

But also realize the effect of the new TDM exception that Google and their Big Tech confederates are trying to slip past the UK government (and our own for that matter). A lot of the discussion about AI rulemaking acts as if new rules would be for future AI data scraping. Au contraire mes amis–on the contrary, the bad acts have already happened and they happened on an unimaginable scale.

So what Google is actually trying to do is get the UK to pass a retroactive safe harbor that would deprive citizens of valuable property rights–and also pass a prospective safe harbor so they can keep doing the bad acts with impunity.

Fortunately for UK citizens, the UK Parliament has not passed idiotic retroactive safe harbor legislation like the U.S. Congress has. I am, of course, thinking of the vaunted Music Modernization Act (MMA) that drooled its way to a retroactive safe harbor for copyright infringement, a shining example of the triumph of corruption that has yet to be properly challenged in the US on Constitutional grounds.

There’s nothing like the MMA absurdity in the UK, at least not yet. However, that retroactive safe harbor was not lost on Google, who benefited directly from it. They loved it. They hung it over the mantel next to their other Big Game trophy, the DMCA. And now they’d like to do it again for the triptych of legislative taxidermy.

Because make no mistake–a retroactive safe harbor would be exactly the effect of Google’s TDM exception. Not to mention it would also be a form of retroactive eminent domain, or what the UK analogously might call the compulsory purchase of property under the Compulsory Purchase of Property Act. Well…“purchase” might be too strong a word, more like “transfer” because these people don’t intend to pay for a thing.

The effect of passing Google’s TDM exception would be to take property rights and other personal rights from UK citizens without anything like the level of process or compensation required under the Compulsory Purchase of Property Act–even when the government requires the sale of private property to another private entity (such as a railroad right of way or a utility easement).

The government is on very shaky ground with a TDM exception imposed by the government for the benefit of a private company, indeed foreign private companies who can well afford to pay for it. The exception would come with no government oversight on a case-by-case basis and no proper valuation, and it would serve entirely commercial purposes with no public benefit. In the US, it would likely violate the Takings Clause of our Constitution, among other things.

It’s Not Just the Artists

Mr. Collins also makes a very important point that might get lost among the stars–it’s not just the stars that AI is ripping off–it is everyone. As the New York Times story points out (and it seems that there are more whistleblowers on this point every day), the AI platforms are hoovering up EVERYTHING that is on the Internet, especially on their affiliated platforms. That includes baby videos, influencers, everything.

This is why it is cultural appropriation on a grand scale, indeed a scale of depravity that we haven’t seen since the Nuremberg Trials. A TDM exception would harm all Britons in one massive offshoring of British culture.

Chronology: The Week in Review: Search is Ripping Us Off Again, this time for AI; #FreeJimmyLai; the MLC loves the MLC almost as much as the NMPA

In case you think that infringement of copyrights in AI training happens only to rock stars, photographers, record companies or movie studios, let me give you a personal anecdote that might cause you to reconsider. I asked Bing’s Copilot AI tool to list the deal points in a record producer agreement, and this is what I got back:

Copilot

Certainly! When negotiating a producer agreement, several key points should be considered. Here are some common negotiation points from the artist’s perspective:

  1. Scope of Engagement:
    1. The producer is generally engaged for one album project at a time.
    1. Clarify the scope of work, including the number of songs or recordings to be produced.
  2. Budget and Compensation:
    1. Determine the budget for the project (e.g., $10,000 in your case).
    1. Agree on the producer’s compensation, which may include an upfront fee, points on sales, or a combination.
  3. Ownership and Rights:
    1. Address ownership of the master recordings:
      1. Will the artist own and control the masters?
      1. Does the producer retain any rights?
  4. Credits and Royalties:
    1. Specify how the producer will be credited on releases.
    1. Discuss any royalties or backend points for the producer.
  5. Recording Schedule:
    1. Set a timeline for recording sessions.
    1. Define the two-week period for your project.
  6. Logistics:
    1. Determine where the recordings will take place.
    1. Outline the artist’s approval process.
    1. Define the producer’s deliverables.
  7. Termination Clause:
    1. Include provisions for termination if either party breaches the agreement.
    1. Address payment for work completed up to that point.

Remember that each negotiation is unique, and it’s essential to tailor the agreement to your specific needs and circumstances. Legal advice is recommended to ensure compliance with local laws and industry standards [1][3][4].

Source(s)

1. An Artist’s Guide to Negotiating Record Producer Agreements – Medium

2. Negotiating a Music Producer Agreement: 7 Key Issues

3. 5 Things to Look for in a Music Producer Agreement – Lawyer Drummer

4. How to Negotiate: Steps, Phrases & Tips to Win the Deal – Selling Signals

Notice that Copilot lists “Source(s)” and that source #1 is an article from Medium.

My article.

I’ve written a number of articles about negotiating producer agreements and I can categorically say I haven’t given permission or been asked to give permission for any of those articles to be scraped and used to train AI. So how did it come to be in Copilot’s training materials?

My guess is that Medium (and probably a lot of other sites) was crawled by Bing’s spider and that copies of Medium pages were stored in Bing’s (or Microsoft’s) data centers. When Microsoft launched its Copilot AI tool, one could imagine that the temptation would have been strong to use those copies as part of the training materials for Copilot. In fact, the Verge reports that “companies around the web are using your site and its data to build massive sets of training data, in order to build models and products that may not acknowledge your existence at all.”

Are you surprised then that two of the biggest operators in the AI space are the search engine operators Google and Microsoft? This is another example of how Big Tech helps itself to your data and work product without you even knowing it’s happening. So now what? Now I know I’m being ripped off, and I’m wondering if Medium is in on it.

The Verge tells us:

The ability to download, store, organize, and query the modern internet gives any company or developer something like the world’s accumulated knowledge to work with. In the last year or so, the rise of AI products like ChatGPT, and the large language models underlying them, have made high-quality training data one of the internet’s most valuable commodities. That has caused internet providers of all sorts to reconsider the value of the data on their servers, and rethink who gets access to what. 

Ya think?

#FreeJimmyLai

In case you were wondering if the Chinese Communist Party is serious about using TikTok to collect data as they please, Hong Kong publisher Jimmy Lai is a perfect example of what happens when a company refuses to cooperate. The CCP took over Jimmy’s Apple Daily newspaper and arrested him. Jimmy has been in prison for three years and has still not come to “trial” (whatever that means under the CCP).

Blake Morgan sounds off in Hypebot on how TikTok uses music as a honeypot to hook innocent users

The MLC Reup

The MLC gets a five-year review of how they are doing. The Copyright Office conducts that review of both the MLC and the DLC. The public (that’s you) gets a chance to weigh in now because the MLC and the DLC filed their respective written statements about their respective awesomeness.

One of the most interesting exchanges happened between Graham Davies (who essentially is the public face of the DLC and is the head of the Digital Media Association) and the NMPA. Graham offered muted criticism of the MLC which irked the easily irked NMPA who offered a rather scathing response.

The most interesting thing about the exchange was that it was the NMPA–not the MLC–that responded to Graham. Tell you anything?

Chronology: The Week in Review: MLC Redesignation Proceeding Highlights Ownership Issues for the Government’s Musical Works Database; TikTok’s SOPA Problem; Google’s Nonindemnity Indemnity for AI

One of the few things Congress got right in Title I of the Music Modernization Act is the five-year review of the mechanical licensing collective. Or more precisely, whether the private company previously designated by the Copyright Office to conduct the functions of the Mechanical Licensing Collective should have another five years to continue doing whatever it is they do.

Impliedly, and I think a bit unfairly, Congress told the Copyright Office to approve its own decision to appoint the current MLC or admit they made a mistake. This is yet another one of the growing list of oversights in the oversight. Wouldn’t it make more sense for someone not involved in the initial decision to be evaluating the performance of the MLC? Particularly when there are at least tens of millions changing hands as well as some highly compensated MLC employees, any one of whom makes more than the Copyright Royalty Judges.

What happens if the Register of Copyrights actually fires The MLC, Inc. and designates a new MLC operator? The first question probably should be what happens to the vaunted MLC musical works database and the attendant software and accounting systems, which seem to be maintained out of the UK for some reason.

I actually raised this question in a comment to the Copyright Office back in 2020. In short, my question was probably more of a statement: “The musical works database does not belong to the MLC or The MLC and if there is any confusion about that, it should be cleared up right away.” The Copyright Office had a very clear response:

While the mechanical licensing collective must “establish and maintain a database containing information relating to musical works,” the statute and legislative history emphasize that the database is meant to benefit the music industry overall and is not “owned” by the collective itself….Any use by the Office referring to the public database as “the MLC’s database” or “its database” was meant to refer to the creation and maintenance of the database, not ownership. [85 FR at 58172, text accompanying notes 30 and 31.]

So if the current operator of the MLC is fired, we know from the MMA and the Copyright Office guidance that one thing The MLC, Inc. cannot do is hold the database and its attendant systems hostage, or demand payment, or any other shadiness. These items do not belong to them so they must not assert control over that which they do not own.

Which would include the hundreds of millions of dollars in black box money that The MLC, Inc. has failed to distribute in going on four years. I’ve even heard cynics suggest that the market share distribution of black box will occur immediately following The MLC, Inc.’s redesignation and the corresponding renewal of HFA’s back office contract, which seems to be worth about $10 million a year all by itself.

What would also have been helpful would be for Congress to have required the Copyright Office to publish evaluation criteria for what they expected the MLC’s operator to actually do as well as performance benchmarks. Like I said, it’s a bit unfair of Congress to put the Copyright Office in the unprecedented position of evaluating such an important role with no guidance whatsoever. Surely Congress did not intend for the Copyright Office to have unfettered autonomy in deciding what standards to apply to their review of a quasi-governmental agency like the MLC, yet it seems to have defaulted to the guardrail of the Administrative Procedure Act or some other backstop to sustain checks and balances on the situation.

But at least the ownership question is settled.

Breaking the Internet Yet Again: TikTok’s SOPA Problem

TikTok users swarmed over the Capitol to protest and impede a Congressional vote that would force the sale of the ubiquitous TikTok. Can Camp Pelosi redux be far behind? Well, no, because this was a digital swarm which is just different, you see. It’s just different when Big Tech tries to protect an IPO.

TikTok’s tactics are very reminiscent of Google’s tactics with SOPA or Napster’s tactics with Camp Chaos.

But not even Napster had the brass to go to full on papal indulgences. Yes, that’s right: NunTok will save the IPO.

Nuns good, TikTok bad!

I wonder which Washington lobbyist thought of NunTok? Perhaps this guy:

Google’s Nonindemnity Indemnity for AI

Some generative AI platforms are trying to make users believe that the company will actually protect them from copyright infringement claims. When you drill down on what the promise actually is, it’s pretty flimsy and may itself be consumer fraud.

Chronology: The week in review, Spotify layoffs, mechanical rate increase, FTC on copyright issues in AI

What Spotify needs is a good pandemic.  

Harsh? Not really, at least not from a share price point of view. Spotify’s all time highest share price was during the COVID pandemic.

Spotify CEO Daniel Ek and the press tell us that Spotify is cutting 1,500 jobs, which works out to about 17% of Spotify employees. That in turn implies a pre-layoff workforce of about 8,823. So let’s start there: that workforce number seems very high and is completely out of line with some recent data from Statista, which is usually reliable.

If Statista is correct, Spotify employed 5,584 as of last year. Yet somehow Spotify’s 2023 workforce grew to 9,200 according to the Guardian, nearly 2/3 over that 2022 level without a commensurate and offsetting growth in revenue. That’s a governance question in and of itself.
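For anyone who wants to check the back-of-the-envelope math, here is a minimal sketch in Python. It uses only the figures quoted above (1,500 jobs, 17%, Statista’s 5,584 and the Guardian’s 9,200); the function names are mine and purely illustrative, and the results are only as good as those reported numbers.

```python
def implied_workforce(jobs_cut: int, share_cut: float) -> float:
    """Headcount implied when `jobs_cut` jobs equal `share_cut` of all staff."""
    return jobs_cut / share_cut


def growth_rate(before: int, after: int) -> float:
    """Fractional growth from `before` to `after` (0.5 means +50%)."""
    return (after - before) / before


if __name__ == "__main__":
    # Figures as reported in the press; not independently verified here.
    implied = implied_workforce(1_500, 0.17)
    growth = growth_rate(5_584, 9_200)
    print(f"Implied pre-layoff workforce: {implied:,.1f}")
    print(f"Growth from 5,584 to 9,200: {growth:.1%}")
```

Run as written, this gives an implied workforce of roughly 8,823.5 and growth of about 64.8%, which is a shade under two-thirds.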

Why the layoffs? The Guardian reports that Spotify CEO Daniel Ek is concerned about costs. He says “Despite our efforts to reduce costs this past year, our cost structure for where we need to be is too big.” Maybe I missed it, but the only time I can recall Daniel Ek being vocally concerned about Spotify’s operating costs was when it came to paying royalties. Then it was full-blown poor mouthing while signing leases for very expensive office space in 4 World Trade Center as well as other pricy real estate, executive compensation and podcasters like Harry & Meghan.

Mr. Ek announced his new, new thing:

Over the last two years, we’ve put significant emphasis on building Spotify into a truly great and sustainable business – one designed to achieve our goal of being the world’s leading audio company and one that will consistently drive profitability and growth into the future. While we’ve made worthy strides, as I’ve shared many times, we still have work to do. Economic growth has slowed dramatically and capital has become more expensive. Spotify is not an exception to these realities.

Which “economic growth” is that?

But, he is definitely right about capital costs.

Still, Spotify’s job cuts are not necessarily that surprising considering the macro economy, most specifically rents and interest rates. As recently as 2018, Spotify was the second largest tenant at 4 WTC. Considering the sheer size of Spotify’s New York office space, it’s not surprising that Spotify began subletting five floors of 4 WTC earlier this year. That’s right, the company had a spare five floors. Can that excess just be more people working at home given Mr. Ek’s decision to expand Spotify’s workforce? But why does Spotify need to be a major tenant in the World Trade Center in the first place? Renting the big New York office space is the corporate equivalent of playing house. That’s an expensive game of pretend.

Remember that Spotify is one of the many companies that rose to dominance during the era of easy money that was the hallmark of quantitative easing and the Federal Reserve’s Zero Interest Rate Policy, adopted in response to the financial crisis beginning around 2008. Spotify’s bankers were able to fuel Daniel Ek’s desire to IPO and cash out in the public markets by enabling Spotify to run at a loss because money was cheap and the stock market had a higher tolerance for risky investments. When you get a negative interest rate for saving money, Spotify stock doesn’t seem like a totally insane investment by comparison. This may have contributed to two stock buy-back programs of $1 billion each, Spotify’s deal with FC Barcelona and other notorious excesses.

As a great man said, don’t confuse leverage for genius. It was only a matter of time until the harsh new world of quantitative tightening and sharply higher inflation came back to bite. For many years, Spotify told Wall Street a growth story which deflected attention away from the company’s loss making operations. A growth story pumps up the stock price until the chickens start coming home to roost. (Growth is also the reason to put off exercising pricing power over subscriptions.) Investors bought into the growth story in the absence of alternatives, not just for Spotify but for the market in general (compare Russell Growth and Value indexes from 2008-2023). Cutting costs and seeking profit is an example of what public company CEOs might do in anticipation of a rotational shift from growth to value investing that could hit their shares.

Never forget that due to Daniel Ek’s super-voting stock (itself an ESG fail), he is in control of Spotify. So there’s nowhere to hide when the iconography turns to blame. It’s not that easy or cheap to fire him, but if the board really wanted to give him the heave, they could do it.

I expect that Ek’s newly found parsimony will be even more front and center in renegotiations of Spotify’s royalty deals since he’s always blamed the labels for why Spotify can’t turn a profit. Not that WTC lease, surely. This would be a lot more tolerable from someone you thought was actually making an effort to cut all costs not just your revenue. Maybe that will happen, but even if Spotify became a lean mean machine, it will take years to recover from the 1999 levels of stupid that preceded it.

Hellooo Apple. One big thinker in music business issues calls it “Spotify drunk” which describes the tendency of record company marketers to focus entirely on Spotify and essentially ignore Apple Music as a distribution partner. If you’re in that group drinking the Spotify Kool Aid, you may want to give Apple another look. One thing that is almost certain is that Apple will still be around in five years.

Just sayin.

Mechanicals on Physical and Downloads Get COLA Increase; Nothing for Streaming

Recall that the “Phonorecords IV” minimum mechanical royalties paid by record companies on physical and downloads increased from 9.1¢ to 12¢, with a cost of living adjustment in each year of the PR IV rate period. The first adjustment was calculated by the Copyright Royalty Judges and was announced this week. It raises the rate from 12¢ to 12.40¢ and takes effect automatically on January 1, 2024.
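To put those rate changes in percentage terms, here is a rough sketch in Python. It only computes the percentage implied by the rates quoted above; it is not the formula the Copyright Royalty Judges use to calculate the COLA, and the function name is my own.

```python
from decimal import Decimal


def percent_increase(old_rate: Decimal, new_rate: Decimal) -> Decimal:
    """Percentage change from old_rate to new_rate (rates in cents)."""
    return (new_rate - old_rate) / old_rate * 100


if __name__ == "__main__":
    # Rates in cents per copy, as quoted in the text above.
    initial_bump = percent_increase(Decimal("9.1"), Decimal("12"))
    first_cola = percent_increase(Decimal("12"), Decimal("12.40"))
    print(f"9.1 cents to 12 cents: {initial_bump:.2f}%")
    print(f"12 cents to 12.40 cents: {first_cola:.2f}%")
```

On these figures the initial increase works out to about 31.87% and the first cost of living adjustment to about 3.33%.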

Note that there is no COLA increase for streaming for reasons I personally do not understand. There really is no justification for not applying a COLA to a government mandated rate that blocks renegotiation to cover inflation expectations. After all, it works for Edmund Phelps.

The Federal Trade Commission on Copyright and AI

The FTC’s comment in the Copyright Office AI inquiry offers an interesting insight into the Commission’s thinking on some of the same copyright issues that bother us about AI, especially AI training. Despite Elon Musk’s refreshing candor about the obvious truth of AI training on copyrights, the usual suspects in the Copyleft (Pam Samuelson, Sy Damle, etc.) seem to have a hard time acknowledging the unfair competition aspects of AI and AI training. The FTC does not (at p. 5):

Conduct that may violate the copyright laws––such as training an AI tool on protected expression without the creator’s consent or selling output generated from such an AI tool, including by mimicking the creator’s writing style, vocal or instrumental performance, or likeness—may also constitute an unfair method of competition or an unfair or deceptive practice, especially when the copyright violation deceives consumers, exploits a creator’s reputation or diminishes the value of her existing or future works, reveals private information, or otherwise causes substantial injury to consumers. In addition, conduct that may be consistent with the copyright laws nevertheless may violate Section 5.

We’ve seen unfair competition claims pleaded in the AI cases–maybe we should be thinking about trying to engage the FTC in prosecutions.