Machines Don’t Let Machines Do Opt Outs: Why robots.txt won’t get it done for AI Opt Outs

[Following is based on an excerpt from the Artist Rights Institute’s submission to the UK Intellectual Property Office consultation on a UK AI legislative proposal]

The fundamental element of any rights reservation regime is knowing which work is being blocked by which rights owner.  This will require creating a metadata identification regime for all works of authorship, a regime that has never existed and must be created from whole cloth.  As the IPO is aware, metadata for songs is quite challenging as was demonstrated in the IPO’s UK Industry Agreement on Music Streaming Metadata Working Groups.

Using machine-readable formats for reservations sounds like it would be an easy fix, but it creates an enormous burden on the artist, i.e., the target of the data scraper, and is a major gift to the AI platform delivered by government.  We can look to the experience with robots.txt for guidance.

Using a robots.txt file or similar “do not index” file puts far too big a bet on machines getting it right in the silence of the Internet.  Big Tech has used this opt-out mantra for years in a somewhat successful attempt to fool lawmakers into believing that blocking is all so easy.  If only there were a database, even a machine could do it.  And yet massive numbers of webpages are still being copied, and the pages that were copied for search (or the Internet Archive) are now being used to train AI.

It also must be said that a “disallow” signal is designed to work with file types or folders, not millions of song titles or sound recordings (see GEMA’s lawsuits against AI platforms). For example, this robots.txt code blocks all unnamed crawlers from a “private-directory” folder, gives Googlebot its own rule allowing it to index the entire site, and blocks Bingbot only from indexing images (note that because a crawler follows only the most specific group that names it, Googlebot and Bingbot are not bound by the “*” group at all):

User-agent: *
Disallow: /private-directory/

User-agent: Googlebot
Allow: /

User-agent: Bingbot
Disallow: /images/

Theoretically, existing robots.txt files could be configured to block AI crawlers entirely by naming the known crawlers as user-agents, such as OpenAI’s GPTBot.  However, there are many known ways in which robots.txt can fail to block web crawlers or AI data scrapers, including:
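By way of illustration, a robots.txt aimed at the AI crawlers might look like the following. The user-agent tokens shown (GPTBot, ClaudeBot, Google-Extended, CCBot) are the tokens published by OpenAI, Anthropic, Google and Common Crawl as of this writing, but the list changes constantly, and only compliant crawlers honor it in the first place:

```text
# Block known AI training crawlers by their published user-agent tokens.
# The tokens below are current as of this writing; new crawlers appear
# constantly, which is itself part of the problem.
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /
```

Note that every new crawler requires the site owner to learn of its existence, look up its token, and update the file, for every site they operate.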

  • Malicious or non-compliant crawlers may simply ignore the directives in a robots.txt file and continue to scrape the site anyway.
  • Incorrect syntax in a robots.txt file can lead to unintended results, such as failing to block the intended paths or blocking too many.
  • Server configuration issues can prevent the robots.txt file from being read or accessed by crawlers at all.
  • Content generated dynamically through JavaScript or AJAX requests may not be blocked if robots.txt is not configured to account for those resources.
  • Unlisted crawlers or scrapers unknown to the site owner may not adhere to the intended rules.
  • Crawlers using cached versions of a site may bypass the rules in a robots.txt file, particularly rules updated after the cache was created.
  • Rules scoped to particular subdomains or subdirectories may fail to block all of the subdomains or subdirectories the owner intended.
  • Entire lists of songs, recordings, or audiovisual works may simply be missed.
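The voluntary nature of the standard can be seen in miniature with Python’s standard-library robots.txt parser, which models how a compliant crawler reads the rules; nothing in the protocol forces a scraper to run this check at all. This sketch reuses the example rules above:

```python
# Minimal sketch: robots.txt compliance is voluntary. A well-behaved crawler
# checks the rules before fetching; a scraper simply never runs this code.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /private-directory/

User-agent: Bingbot
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler asks before fetching:
print(parser.can_fetch("*", "https://example.com/private-directory/songs.html"))  # False
print(parser.can_fetch("Bingbot", "https://example.com/images/cover.jpg"))        # False

# Footgun: only the most specific user-agent group applies, so Bingbot is not
# bound by the "*" group at all and the private directory is open to it:
print(parser.can_fetch("Bingbot", "https://example.com/private-directory/songs.html"))  # True
```

The last line is exactly the kind of “incorrect syntax, unintended result” failure described above: the rules look airtight to a human reader and are not.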

While robots.txt and similar techniques are theoretically useful tools for managing crawler access, they are not foolproof. Implementing additional security measures, such as IP blocking, CAPTCHA, rate limiting, and monitoring server logs, can help strengthen a site’s defenses against unwanted scraping.  However, like the other tools that were supposed to level the playing field for artists against Big Tech, none of these tools are free, all of them require more programming knowledge than can reasonably be expected, all require maintenance, and at scale, all of them can be gamed or will eventually fail.

 It must be said that all of the headaches and expense of keeping Big Tech out is because Big Tech so desperately wants to get in.

The difference between blocking a search engine crawler and an AI data scraper (which could each be operated by the same company in the case of Meta, Bing or Google) is that failing to block a search engine crawler is inconvenient for artists, but failing to block an AI data scraper is catastrophic for artists.

Even if the crawlers worked seamlessly, should any of these folders change names and the site admin forget to update the robots.txt file, the blocking silently fails.  Expecting that level of vigilance from every website on the Internet is asking a lot.
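To make that maintenance burden concrete, here is a hypothetical sketch (not a real tool) of the kind of check a site admin would have to run routinely: it parses the Disallow lines of a robots.txt and flags any path that no longer exists under the document root, since a renamed folder silently stops being blocked. The stale_disallows helper is invented for illustration:

```python
# Hypothetical maintenance check (a sketch, not a real tool): flag Disallow
# paths in a robots.txt that no longer correspond to anything on disk, since
# a renamed folder silently stops being blocked.
from pathlib import Path

def stale_disallows(robots_text: str, doc_root: str) -> list:
    """Return Disallow paths that no longer match anything under doc_root."""
    stale = []
    for raw in robots_text.splitlines():
        line = raw.split("#", 1)[0].strip()          # strip comments
        if line.lower().startswith("disallow:"):
            path = line.split(":", 1)[1].strip()
            if path and path != "/":                 # "/" trivially exists
                if not (Path(doc_root) / path.strip("/")).exists():
                    stale.append(path)
    return stale

# Example rules; running stale_disallows(robots, "/var/www/site") would report
# any Disallow path that no longer matches a folder under the document root.
robots = "User-agent: *\nDisallow: /private-directory/\nDisallow: /images/\n"
```

And of course someone has to remember to run the check, interpret the report, and fix the file, on every site, forever.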

It must also be said that sites using machine-readable blocking tools may find their pages downranked, particularly by AI platforms closely associated with search engines.  Robots.txt blocking already has problems with crawlers and downranking for several reasons. A robots.txt file itself doesn’t directly cause pages to be downranked in search results. However, it can indirectly affect rankings by limiting search engine crawlers’ access to certain parts of a website. Here’s how:

Restricted Crawling: If you block crawlers from accessing important pages using robots.txt, those pages won’t be indexed. Without indexing, they won’t appear in search results, let alone rank.

Crawl Budget Mismanagement: For large websites, search engines allocate a “crawl budget”—the number of pages they crawl in a given time. If robots.txt doesn’t guide crawlers efficiently, important pages may be left uncrawled and unindexed.

No Content Evaluation: If a page is blocked by robots.txt but still linked elsewhere, search engines might index its URL without evaluating its content. This can result in poor rankings since the page’s relevance and quality can’t be assessed.

The TDM safe harbor is too valuable and potentially too dangerous to leave to machines.

The Delay’s The Thing: Anthropic Leapfrogs Its Own November Valuation Despite Litigation from Authors and Songwriters in the Heart of Darkness

If you’ve read Joseph Conrad’s Heart of Darkness, you’ll be familiar with the Congo Free State, a private colony of Belgian King Leopold II that is today largely the Democratic Republic of the Congo. When I say “private” I mean literally privately owned by his Leopoldness. Why would old King Leo be so interested in owning a private colony in Africa? Why for the money, of course. Leo had to move some pieces around the board and get other countries to allow him to get away with essentially “buying” the place, if “buying” is the right description.

So Leo held an international conference in Berlin to discuss the idea and get international buy-in, kind of like the World Economic Forum with worse food and no skiing. Rather than acknowledging his very for-profit intention to ravage the Congo for ivory (aka slaughtering elephants) and rubber (the grisly extraction of which was accomplished by uncompensated slave labor) with brutal treatment of all concerned, Leo convinced the assembled nations that his intentions were humanitarian and philanthropic. You know, don’t be evil. Just lie.

Of course, however much King Leopold may have foreshadowed our sociopathic overlords from Silicon Valley, it must be said that Leo’s real envy wouldn’t so much be the money as what he could have done with AI himself had he only known. Oh well, he just had to make do with Kurtz.

Which brings us to AI in general and Anthropic in particular. Anthropic’s corporate slogan is equally humanitarian and philanthropic: “Anthropic is an AI research company that focuses on the safety and alignment of AI systems with human values.” Oh yes, all very jolly.

All very innocent and high minded, until you get punched in the face (to coin a phrase). It turns out–quelle horreur–that Anthropic stands accused of massive copyright infringement rather than lauded for its humanitarianism. Even more shocking? The company’s valuation is going through the stratosphere! These innocents surely must be falsely accused! The VCs are voting with their bucks, so they wouldn’t put their shareholders’ money or their limiteds’ money on the line for a–RACKETEER INFLUENCED AND CORRUPT ORGANIZATION?!?

Not only have authors brought this class action against Anthropic (which is both Google’s stalking horse and cat’s paw, to mix a metaphor), but the songwriters and music publishers have sued them as well. Led by Concord and Universal, the publishers have sued for largely the same reasons as the authors but for their quite distinct copyrights.

So let’s understand the game that’s being played here–as the Artist Rights Institute submitted in a comment to the UK Intellectual Property Office in the IPO’s current consultation on AI and copyright, the delay is the thing. And thanks to Anthropic, we can now put a price on the delay: on top of the $4,000,000,000 the company raised in November 2024, another $3,500,000,000. This one company is valued at $61.5 billion, roughly half of the entire creative industries in the UK and roughly equal to the entire U.S. music industry. No wonder delay is their business model.

However antithetical, copyright and AI must be discussed together for a very specific reason:  Artificial intelligence platforms operated by Google, Microsoft/OpenAI, Meta and the like have scraped and ingested works of authorship from baby pictures to Sir Paul McCartney as fast and as secretly as possible.  And the AI platforms know that the longer they can delay accountability, the more of the world’s culture they will have devoured—or as they might say, the more data they will have ingested.  Not to mention the billions in venture capital they will have raised, just like Anthropic. For the good of humanity, of course, just like old King Leo.

As the Hon. Alison Hume, MP recently told Parliament, this theft is massive and has already happened, another example of why any “opt out” scheme (as had been suggested by the UK government) has failed before it starts:

This week, I discovered that the subtitles from one of my episodes of New Tricks have been scraped and are being used to create learning materials for artificial intelligence.  Along with thousands of other films and television shows, my original work is being used by generative AI to write scripts which one day may replace versions produced by mere humans like me.

This is theft, and it’s happening on an industrial scale.  As the law stands, artificial intelligence companies don’t have to be transparent about what they are stealing.[1]

Any delay[2] in prosecuting AI platforms simply increases their de facto “text and data mining” safe harbor while they scrape ever more of world culture.  As Ms. Hume states, this massive “training” has transferred value to these data-hungry mechanical beasts to a degree that confounds human understanding of its industrial scale infringement.  This theft dwarfs even the Internet piracy that drove broadband penetration, Internet advertising and search platforms in the 1999-2010 period.  It must be said that for Big Tech, commerce and copyright are once again inherently linked for even greater profit.

As the Right Honourable Baroness Kidron said in her successful opposition to the UK Government’s AI legislation in the House of Lords:

The Government are doing this not because the current law does not protect intellectual property rights, nor because they do not understand the devastation it will cause, but because they are hooked on the delusion that the UK’s best interests and economic future align with those of Silicon Valley.[3]  

Baroness Kidron identifies a question of central importance that mankind is forced to consider by the sheer political brute force of the AI lobbying steamroller:  What if AI is another bubble like the Dot Com bubble?  AI is, to a large extent, a black box utterly lacking in transparency much less recordkeeping or performance metrics.  As Baroness Kidron suggests, governments and the people who elect them are making a very big bet that AI is not pursuing an ephemeral bubble like the last time.

Indeed, the AI hype has the earmarks of a bubble, just as the Dot Com bubble did.  Baroness Kidron also reminds us of these fallacious economic arguments surrounding AI:

The Prime Minister cited an IMF report that claimed that, if fully realised, the gains from AI could be worth up to an average of £47 billion to the UK each year over a decade. He did not say that the very same report suggested that unemployment would increase by 5.5% over the same period. This is a big number—a lot of jobs and a very significant cost to the taxpayer. Nor does that £47 billion account for the transfer of funds from one sector to another. The creative industries contribute £126 billion per year to the economy. I do not understand the excitement about £47 billion when you are giving up £126 billion.[4]  

As the Hon. Chris Kane, MP said in Parliament, the Government runs the risk of enabling a wealth transfer that produces no new value but would make old King Leo feel right at home:

Copyright protections are not a barrier to AI innovation and competition, but they are a safeguard for the work of an industry worth £125 billion per year, employing over two million people.  We can enable a world where much of this value  is transferred to a handful of big tech firms or we can enable a win-win situation for the creative industries and AI developers, one where they work together based on licensed relationships with remuneration and transparency at its heart.


[1] Paul Revoir, AI companies are committing ‘theft’ on an ‘industrial scale’, claims Labour MP – who has written for TV series including New Tricks, Daily Mail (Feb. 12, 2025) available at https://www.dailymail.co.uk/news/article-14391519/AI-companies-committing-theft-industrial-scale-claims-Labour-MP-wrote-TV-shows-including-New-Tricks.html

[2] See, e.g., Kerry Muzzey, [YouTube Delay Tactics with DMCA Notices], Twitter (Feb. 13, 2020) available at https://twitter.com/kerrymuzzey/status/1228128311181578240  (Film composer with Content ID account notes “I have a takedown pending against a heavily-monetized YouTube channel w/a music asset that’s been fine & in use for 7 yrs & 6 days. Suddenly today, in making this takedown, YT decides “there’s a problem w/my metadata on this piece.” There’s no problem w/my metadata tho. This is the exact same delay tactic they threw in my way every single time I applied takedowns against broadcast networks w/monetized YT channels….And I attached a copy of my copyright registration as proof that it’s just fine.”); Zoë Keating, [Content ID secret rules], Twitter (Feb. 6. 2020) available at https://twitter.com/zoecello/status/1225497449269284864  (Independent artist with Content ID account states “[YouTube’s Content ID] doesn’t find every video, or maybe it does but then it has selective, secret rules about what it ultimately claims for me.”).

[3] The Rt. Hon. Baroness Kidron, Speech regarding Data (Use and Access) Bill [HL] Amendment 44A, House of Lords (Jan. 28, 2025) available at https://hansard.parliament.uk/Lords%E2%80%8F/2025-01-28/debates/9BEB4E59-CAB1-4AD3-BF66-FE32173F971D/Data(UseAndAccess)Bill(HL)#contribution-9A4614F3-3860-4E8E-BA1E-53E932589CBF 

[4] Id. 

UK Government’s AI Legislation is Defeated in the House of Lords

The new-ish UK government led by Labour Prime Minister Sir Keir Starmer faced a defeat in the House of Lords regarding its AI bill. The defeat was specifically about measures to protect copyrighted material from being used to train AI models without permission or compensation. Members of the House of Lords (known as “Peers”) voted 145 to 126 in favor of amendments to the UK Government’s Data (Use and Access) Bill, proposed by film director Beeban Tania Kidron, the Baroness Kidron (a “crossbench” peer), which aim to safeguard the intellectual property of creatives. Lady Kidron said:

There is a role in our economy for AI… and there is an opportunity for growth in the combination of AI and creative industries, but this forced marriage on slave terms is not it.

So there’s that. We need a film director in the Senate, don’t you think? Yes, let’s have one of those, please.

Bill Dies With Amendments

The amendments proposed by Baroness Kidron received cross-party support (what would be called “bi-partisan” in the US, but the UK has more than two political parties represented in Parliament). The amendments include provisions to ensure, among other things, that AI companies comply with UK copyright law, disclose the names and owners of web crawlers doing their dastardly deeds in the dark recesses of the Internet, and allow copyright owners to know when and how their work is used. It might even protect users of Microsoft or Google products from having their drafts crawled and scraped for AI training.

This defeat highlights the growing concerns within Parliament about the unregulated use of copyrighted material by major tech firms. Starmer’s Data (Use and Access) Bill was proposed by the UK government to excuse the use of copyrighted material by AI models. However, thanks in part to Lady Kidron it faced significant opposition in the House of Lords, leading to its defeat.

Here’s a summary of why it failed:

  1. Cross-Party Support for Amendments: The amendments proposed by Baroness Kidron received strong support from both Labour and Conservative peers. They argued that the bill needed stronger measures to protect the intellectual property of creatives.
  2. Transparency and Redress: The amendments aimed to improve transparency by requiring AI companies to disclose the names and owners of web crawlers and allowing copyright owners to know when and how their work is used.
  3. Government’s Preferred Option: The government suggested an “opt-out” system for text and data mining, which would allow AI developers to scrape copyrighted content unless rights holders actively opted out. This approach was heavily criticized as it would lead to widespread unauthorized use of intellectual property, or as we might say in Texas, that’s bullshit for starters.
  4. Economic Impact: Supporters of the amendments argued that the bill, in its original form, would transfer wealth from individual creatives and small businesses to big tech companies, undermining the sustainability of the UK’s creative industries. Because just like Google’s products, it was a thinly disguised wealth transfer.

The defeat highlights the growing concerns within Parliament about the unregulated use of copyrighted material by major tech firms and the need for stronger protections for creatives. Several prominent artists voiced their opposition to the UK government’s AI bill. Sir Elton John and Sir Paul McCartney were among the most prominent critics. They argued that the government’s proposed changes would allow AI companies to use copyrighted material without proper compensation, which could threaten the livelihoods of artists, especially emerging ones.

Elton John expressed concerns that the bill would enable big tech companies to “ride roughshod over traditional copyright laws,” potentially diluting and threatening young artists’ earnings. As a fellow former member of Long John Baldry’s backup band, I say well done, Reg. Paul McCartney echoed these sentiments, emphasizing that the new laws would allow AI to rip off creators and hinder younger artists who might not have the means to protect their work–and frankly, the older artists don’t either when going up against Google and Microsoft, backed by SoftBank and freaking countries.

Their opposition highlights the broader concerns within the creative community like Ivors Academy and ECSA about the potential impact of AI on artists’ rights and earnings.

Role of the House of Lords

The House of Lords is one of the two houses of the UK Parliament, the other being the House of Commons. It plays a crucial role in the legislative process and functions as a revising chamber. Here are some key aspects of the House of Lords:

Functions of the House of Lords

  1. Scrutiny and Revision of Legislation:
    • The House of Lords reviews and scrutinizes bills passed by the House of Commons.
    • It can suggest amendments and revisions to bills, although it cannot ultimately block legislation.
  2. Debate and Deliberation:
    • The Lords engage in detailed debates on a wide range of issues, contributing their expertise and experience.
    • These debates can influence public opinion and policy-making.
  3. Committees:
    • The House of Lords has several committees that investigate specific issues, scrutinize government policies, and produce detailed reports.
    • Committees play a vital role in examining the impact of proposed legislation and holding the government to account.
  4. Checks and Balances:
    • The House of Lords acts as a check on the power of the House of Commons and the executive branch of the government.
    • It ensures that legislation is thoroughly examined and that diverse perspectives are considered.

Composition of the House of Lords

  • Life Peers: Appointed by the King on the advice of the Prime Minister, these members serve for life but do not pass on their titles.
  • Bishops: A number of senior bishops from the Church of England have seats in the House of Lords.
  • Hereditary Peers: A limited number of hereditary peers remain, but most hereditary peerages no longer carry the right to sit in the House of Lords.
  • Law Lords: Senior judges who used to sit in the House of Lords as the highest court of appeal, a function now transferred to the Supreme Court of the United Kingdom.

Limitations

While the House of Lords can delay legislation and suggest amendments, it does not have the power to prevent the House of Commons from passing laws. Its role is more about providing expertise, revising, and advising rather than blocking legislation.

Now What?

Following the defeat in the House of Lords, the government’s Data (Use and Access) Bill will need to be reconsidered by the UK government. They will have to decide whether to accept the amendments proposed by the Lords or to push back and attempt to pass the bill in its original form.

It’s not entirely unusual for Labour peers to vote against a Labour government, especially on issues where they have strong differing opinions or concerns. The House of Lords operates with a degree of independence from the House of Commons, where I would say it would be highly unusual for the government to lose a vote on something as visible as the AI issue.

The AI bill would no doubt be subject to a “three-line whip”, a strict instruction issued by a political party to its members, usually in the House of Commons (in this case the Labour Party), requiring them to attend a vote and vote according to the party’s official stance in support of the Government. It’s the most serious form of voting instruction, indicating that the vote is crucial and that party discipline must be strictly enforced. Despite the sadomasochistic overtones of a “three-line whip”, familiar as caning to British public school boys, peers in the Lords often vote based on their own judgment and expertise rather than strict party lines. This can lead to situations where Labour peers might oppose government proposals if they believe it is in the best interest of the public or aligns with their principles. Imagine that!

So, while it’s not the norm, it’s also not entirely unexpected for Labour peers to vote against a Labour government when significant issues are at stake like, oh say the destruction of the British creative industries.

Crucially, the government is currently consulting on the issue of text and data mining through the Intellectual Property Office. The IPO is accepting public comments on the AI proposals with a deadline of February 25, 2025. This feedback will likely influence their next steps. Did I say that the IPO is accepting public comments, even from Americans? Hint, hint. Read all about the IPO consultation here.

Big Tech’s Misapprehensions About the AI Appropriation Invasion: Artist Rights are Not “Regulation”

It was a rough morning. I ran across both reports from Davos where they are busy blowing AI bubbles yet again and also read about a leading Silicon Valley apologist discussing the current crop of AI litigation. That was nauseating. But once the bile settled down, I had a realization: This is all straight out of the Woodrow Wilson rule-by-technocrats playbook.

Wilson believed that experts and intellectuals, rather than the voting public, should guide the creation and implementation of public policy. The very model of a modern technocrat. The present day technocrats and their enablers in the legal profession are heirs to Wilsonian rule by experts. They view copyright and other human rights of artists as regulation impeding innovation. Innovation is the godhead to which all mankind must–emphasis on must–aspire, whether mankind likes it or not.

Not human rights–artist rights are human rights, so that proposition cannot be allowed. The technocrats want to normalize “innovation” as the superior value that need not be humanized or even explained. Artist rights must yield and even be shattered in the advance of “innovation”. The risible Lessig is already talking about “the right to train” for AI, a human rights exception you can drive a tank through, as is his wont in the coin-operated policy laundry. In the Wilsonian tradition, we are asked to believe that public policy must be the handmaiden to appropriation by technology even if by doing so the experts destroy culture.

We went through this before with Internet piracy. There are many familiar faces in the legal profession showing up on AI cases who were just getting warmed up on the piracy cases of the 1999-2015 period that did their best to grind artist rights into bits. AI is far beyond the massive theft and wealth transfer that put a generation of acolyte children through prep school and higher education. AI takes extracting profit from cultural appropriation to a whole new level–it’s like shoplifting compared to carpet bombing.

“I got the shotgun, you got the brief case…”

And since the AI lawyers are fascinated by Nazi metaphors, let me give you one myself: Internet piracy is to Guernica what AI is to Warsaw. The Luftwaffe was essentially on a training run when they bombed Guernica during the Spanish Civil War. Guernica was a warm up act; the main event was carpet bombing a culture out of existence in Warsaw and after. It was all about the Luftwaffe testing and refining their aerial bombing tactics that opened the door to hell and allowed Satan to walk the Earth swishing his tail as he does to this day. But in the words of Stefan Starzyński, the Mayor of Warsaw who broadcast through the German attack, “We are fighting for our freedom, for our honor, and for our future. We will not surrender.”

This is what these crusader technocrats do not seem to understand no matter how they enrich themselves from the wealth transfer of cultural appropriation. AI litigation and policy confrontation is not about the money–there is no license fee big enough and nobody trusts Silicon Valley to give a straight count in any event.

Artists, songwriters, authors and other creators have nowhere to go. The battle of human rights against the AI appropriation invasion may well be humanity’s last stand.

Now with added retroactive acrobatics: @DamianCollins calls on UK Prime Minister to stop Google’s “Text and Data Mining” Circus

Damian Collins (former chair of the UK Parliament’s Digital, Culture, Media and Sport Select Committee) warns of Google’s latest AI shenanigans in a must-read opinion piece in the Daily Mail that highlights Google’s attempt to lobby its way into what is essentially a retroactive safe harbor to protect Google and its confederates in the AI land grab. While Mr. Collins writes about Google’s efforts to rewrite the laws of the UK to free ride in his home country, which is egregious bullying, the episode he documents is instructive for all of us. If Google & Co. will do it to the Mother of Parliaments, it’s only a matter of time until Google & Co. do the same everywhere or know the reason why. Their goal is to hoover up all the world’s culture that the AI platforms have not scraped already and–crucially–to get away with it. And as Guy Forsyth says, “…nothing says freedom like getting away with it.”

The timeline of AI’s appropriation of all the world’s culture is critical to appreciating just how depraved Big Tech’s unbridled greed really is. The important thing to remember is that AI platforms like Google have been scraping the Internet to train their AI for some time now, possibly many years. This apparently includes social media platforms they control. My theory is that Google Books was an early effort at digitization for large language models to support products like corpus machine translation, a predecessor to Gemini (“your twin”) and other Google AI products. We should ask Ray Kurzweil.

There is increasing evidence that this is exactly what these people are up to.

The New York Times Uncovers the Crimes

According to an extensive long-form report in the New York Times by a team of very highly respected journalists, it turns out that Google has been planning this “Text and Data Mining” land grab for some time. At the very moment YouTube was issuing press releases about their Music AI Incubator and their “partners”–Google was stealing anything that was not nailed down that anyone had hosted on their massive platforms, including Google Docs, Google Maps, and…YouTube. The Times tells us:

Google transcribed YouTube videos to harvest text for its A.I. models, five people with knowledge of the company’s practices said. That potentially violated the copyrights to the videos, which belong to their creators….Google said that its A.I. models “are trained on some YouTube content,” which was allowed under agreements with YouTube creators, and that the company did not use data from office apps outside of an experimental program. 

I find it hard to believe that YouTube was both allowed to transcribe and scrape under all its content deals, or that they parsed through all videos to find the unprotected ones subject to their interpretation of the YouTube terms of use. So as we say in Texas, that sounds like bullshit for starters.

How does this relate to the Text and Data Mining exception that Mr. Collins warns of? Note that the NYT tells us “Google transcribed YouTube videos to harvest text.” That’s a clue.

As Mr. Collins tells us:

Google [recently] published a policy paper entitled: Unlocking The UK’s AI Potential.

What’s not to like?, you might ask. Artificial intelligence has the potential to revolutionise our economy and we don’t want to be left behind as the rest of the world embraces its benefits.

But buried in Google’s report is a call for a ‘text and data mining’ (TDM) exception to copyright.

This TDM exception would allow Google to scrape the entire history of human creativity from the internet without permission and without payment.

And, of course, Mr. Collins is exactly correct, that’s exactly what Google have in mind.

The Conspiracy of Dunces and the YouTube Fraud

In fairness, it wasn’t just Google ripping us off, but Google didn’t do anything to stop it as far as I can tell. One thing to remember is that YouTube was, and I think still is, not very crawlable by outsiders. It is almost certainly the case that Google would know who was crawling youtube.com, such as Bingbot, DuckDuckBot, YandexBot, or Yahoo Slurp, if for no other reason than that those spiders were not Googlebot. With that understanding, the Times also tells us:

OpenAI researchers created a speech recognition tool called Whisper. It could transcribe the audio from YouTube videos, yielding new conversational text that would make an A.I. system smarter.

Some OpenAI employees discussed how such a move might go against YouTube’s rules, three people with knowledge of the conversations said. YouTube, which is owned by Google, prohibits use of its videos for applications that are “independent” of the video platform. [Whatever “independent” means.]

Ultimately, an OpenAI team transcribed more than one million hours of YouTube videos, the people said. The team included Greg Brockman, OpenAI’s president, who personally helped collect the videos, two of the people said. The texts were then fed into a system called GPT-4, which was widely considered one of the world’s most powerful A.I. models and was the basis of the latest version of the ChatGPT chatbot….

OpenAI eventually made Whisper, the speech recognition tool, to transcribe YouTube videos and podcasts, six people said. But YouTube prohibits people from not only using its videos for “independent” applications, but also accessing its videos by “any automated means (such as robots, botnets or scrapers).”

OpenAI employees knew they were wading into a legal gray area, the people said, but believed that training A.I. with the videos was fair use. 

And strangely enough, many if not all of the AI platforms sued by creators raise “fair use” as a defense, which is reminiscent of the kind of crap we have been hearing from these people since 1999.

Now why might Google have permitted OpenAI to crawl YouTube and transcribe videos (and who knows what else)? Probably because Google was doing the same thing. In fact, the Times tells us:

Some Google employees were aware that OpenAI had harvested YouTube videos for data, two people with knowledge of the companies said. But they didn’t stop OpenAI because Google had also used transcripts of YouTube videos to train its A.I. models, the people said. That practice may have violated the copyrights of YouTube creators. So if Google made a fuss about OpenAI, there might be a public outcry against its own methods, the people said.

So Google and its confederate OpenAI may well have conspired to commit massive copyright infringement against the owners of valid copyrights, and did so willingly and for purposes of commercial advantage and private financial gain. (Attempts to infringe are prohibited to the same extent as the completed act.) The acts of these confederates vastly exceed the thresholds for criminal prosecution for both infringement and conspiracy.

But to Mr. Collins’ concern, the big AI platforms likely transcribed billions of hours of YouTube videos to mine text and data–you know, TDM.

The New Retroactive Safe Harbor: The Flying Googles Bring their TDM Circus Act to the Big Tent With Retroactive Acrobatics

But also realize the effect of the new TDM exception that Google and their Big Tech confederates are trying to slip past the UK government (and our own for that matter). A lot of the discussion about AI rulemaking acts as if new rules would be for future AI data scraping. Au contraire mes amis–on the contrary, the bad acts have already happened and they happened on an unimaginable scale.

So what Google is actually trying to do is get the UK to pass a retroactive safe harbor that would deprive citizens of valuable property rights–and also pass a prospective safe harbor so they can keep doing the bad acts with impunity.

Fortunately for UK citizens, the UK Parliament has not passed idiotic retroactive safe harbor legislation like the U.S. Congress has. I am, of course, thinking of the vaunted Music Modernization Act (MMA) that drooled its way to a retroactive safe harbor for copyright infringement, a shining example of the triumph of corruption that has yet to be properly challenged in the US on Constitutional grounds.

There’s nothing like the MMA absurdity in the UK, at least not yet. However, that retroactive safe harbor was not lost on Google, who benefited directly from it. They loved it. They hung it over the mantle next to their other Big Game trophy, the DMCA. And now they’d like to do it again for the triptych of legislative taxidermy.

Because make no mistake–a retroactive safe harbor would be exactly the effect of Google’s TDM exception. Not to mention it would also be a form of retroactive eminent domain, or what the UK analogously might call a compulsory purchase of property under the Compulsory Purchase Act 1965. Well…“purchase” might be too strong a word, more like “transfer,” because these people don’t intend to pay for a thing.

The effect of passing Google’s TDM exception would be to take property rights and other personal rights from UK citizens without anything like the level of process or compensation required under compulsory purchase law–even when the government requires the sale of private property to another private entity (such as a railroad right of way or a utility easement).

The government is on very shaky ground with a TDM exception imposed for the benefit of private companies, indeed foreign private companies who can well afford to pay for what they take. There would be no government oversight on a case-by-case basis, no proper valuation, and the taking would be for entirely commercial purposes with no public benefit. In the US, it would likely violate the Takings Clause of our Constitution, among other things.

It’s Not Just the Artists

Mr. Collins also makes a very important point that might get lost among the stars–it’s not just the stars that AI is ripping off–it is everyone. As the New York Times story points out (and it seems that there’s more whistleblowers on this point every day), the AI platforms are hoovering up EVERYTHING that is on the Internet, especially on their affiliated platforms. That includes baby videos, influencers, everything.

This is why it is cultural appropriation on a grand scale, indeed a scale of depravity that we haven’t seen since the Nuremberg Trials. A TDM exception would harm all Britons in one massive offshoring of British culture.

Open the Pod Bay Doors, HAL: Why Eric Schmidt is Insane in his own words

In the GAI, no one can hear you scream. Let’s remember that this man has already stolen world culture–twice. It will be a dark kind of fun watching Schmidt get the World Economic Forum, Lawrence Lessig and Greta Thunberg to do a 180 on climate change. Don’t laugh–if anyone can do it, he can. You watch, the Berkman Center and EFF will lead the charge.

Chronology: The Week in Review: Search is Ripping Us Off Again, this time for AI; #FreeJimmyLai; the MLC loves the MLC almost as much as the NMPA

In case you think that infringement of copyrights in AI training happens only to rock stars, photographers, record companies or movie studios, let me give you a personal anecdote that might cause you to reconsider. I asked Bing’s Copilot AI tool to list the deal points in a record producer agreement, and this is what I got back:

Copilot

Certainly! When negotiating a producer agreement, several key points should be considered. Here are some common negotiation points from the artist’s perspective:

  1. Scope of Engagement:
     - The producer is generally engaged for one album project at a time.
     - Clarify the scope of work, including the number of songs or recordings to be produced.
  2. Budget and Compensation:
     - Determine the budget for the project (e.g., $10,000 in your case).
     - Agree on the producer’s compensation, which may include an upfront fee, points on sales, or a combination.
  3. Ownership and Rights:
     - Address ownership of the master recordings:
       - Will the artist own and control the masters?
       - Does the producer retain any rights?
  4. Credits and Royalties:
     - Specify how the producer will be credited on releases.
     - Discuss any royalties or backend points for the producer.
  5. Recording Schedule:
     - Set a timeline for recording sessions.
     - Define the two-week period for your project.
  6. Logistics:
     - Determine where the recordings will take place.
     - Outline the artist’s approval process.
     - Define the producer’s deliverables.
  7. Termination Clause:
     - Include provisions for termination if either party breaches the agreement.
     - Address payment for work completed up to that point.

Remember that each negotiation is unique, and it’s essential to tailor the agreement to your specific needs and circumstances. Legal advice is recommended to ensure compliance with local laws and industry standards [1][3][4].

Source(s)

1. An Artist’s Guide to Negotiating Record Producer Agreements – Medium

2. Negotiating a Music Producer Agreement: 7 Key Issues

3. 5 Things to Look for in a Music Producer Agreement – Lawyer Drummer

4. How to Negotiate: Steps, Phrases & Tips to Win the Deal – Selling Signals

Notice that Copilot lists “Source(s)” and that source #1 is an article from Medium.

My article.

I’ve written a number of articles about negotiating producer agreements and I can categorically say I haven’t given permission or been asked to give permission for any of those articles to be scraped and used to train AI. So how did it come to be in Copilot’s training materials?

My guess is that Medium (and probably a lot of other sites) were crawled by Bing’s spider and that copies of Medium pages were stored in Bing’s (or Microsoft’s) data centers. When Microsoft launched its Copilot AI tool, one could imagine that the temptation would have been strong to use those copies as part of the training tools for Copilot. In fact, the Verge reports that “companies around the web are using your site and its data to build massive sets of training data, in order to build models and products that may not acknowledge your existence at all.”

Are you surprised then that two of the biggest operators in the AI space are the search engine operators Google and Microsoft? This is another example of how Big Tech helps itself to your data and work product without you even knowing it’s happening. So now what? Now I know I’m being ripped off, and I’m wondering if Medium is in on it.

The Verge tells us:

The ability to download, store, organize, and query the modern internet gives any company or developer something like the world’s accumulated knowledge to work with. In the last year or so, the rise of AI products like ChatGPT, and the large language models underlying them, have made high-quality training data one of the internet’s most valuable commodities. That has caused internet providers of all sorts to reconsider the value of the data on their servers, and rethink who gets access to what. 

Ya think?

#FreeJimmyLai

In case you were wondering if the Chinese Communist Party is serious about using TikTok to collect data as they please, Hong Kong publisher Jimmy Lai is a perfect example of what happens when a company refuses to cooperate. The CCP took over Jimmy’s Apple Daily newspaper and arrested him. Jimmy has been in prison for three years and has still not come to “trial” (whatever that means under the CCP).

Blake Morgan sounds off in Hypebot on how TikTok uses music as a honeypot to hook innocent users

The MLC Reup

The MLC gets a five-year review of how they are doing. The Copyright Office conducts that review of both the MLC and the DLC. The public (that’s you) gets a chance to weigh in now because the MLC and the DLC filed their respective written statements about their respective awesomeness.

One of the most interesting exchanges happened between Graham Davies (who essentially is the public face of the DLC and is the head of the Digital Media Association) and the NMPA. Graham offered muted criticism of the MLC which irked the easily irked NMPA who offered a rather scathing response.

The most interesting thing about the exchange was that it was the NMPA–not the MLC–that responded to Graham. Tell you anything?

Chronology: The Week in Review: Could Spotify Extend Stream Discrimination to Songs, the No AI Fraud Act, Chairman Issa Has Questions on MLC Investment Policy

Spotify has announced they are “Modernizing Our Royalty System.” Beware of geeks bearing “modernization”–that almost always means they get what they want to your disadvantage. Also sounds like yet another safe harbor. At a minimum, they are demonstrating the usual lack of understanding of the delicate balance of the music business they now control. But if they can convince you not to object, then they get away with it.

Don’t let them.

An Attack on Property Rights

There are some serious questions about whether Spotify has the right to unilaterally change the way it counts royalty-bearing streams and to encroach on the private property rights of artists.

Here’s their plan: evidently Spotify will only pay on tracks with more than 1,000 streams accrued during the previous 12 months. I seriously doubt that they can engage in this terribly modern “stream discrimination” in a way that doesn’t breach a negotiated direct license with a minimum guarantee (if not others).

That doubt also leads me to think that Spotify’s unilateral change in “royalty policy” (whatever that is) is unlikely to affect everyone the same. Taking a page from 1984 newspeakers, Spotify calls this discrimination policy “Track Monetization Eligibility”. It’s not discrimination, you see, it’s “eligibility”, a whole new thing. Kind of like war is peace, right? Or bouillabaisse.

According to Spotify’s own announcement, this proposed change is not an increase in the total royalty pool that Spotify pays out (God forbid the famous “pie” should actually grow): “There is no change to the size of the music royalty pool being paid out to rights holders from Spotify; we will simply use the tens of millions of dollars annually [of your money] to increase the payments to all eligible tracks, rather than spreading it out into $0.03 payments [that we currently owe you].”

Yep, you won’t even miss it, and you should sacrifice for all those deserving artists who are more eligible than you. They are not growing the pie, they are shifting money around–rearranging the deck chairs.
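To make the deck-chair shuffle concrete, here is a minimal sketch of how a fixed pro-rata (“big pool”) royalty split behaves once an eligibility threshold is bolted on. The track names, stream counts, and pool size are purely illustrative assumptions, not Spotify’s actual figures or code:

```python
# Illustrative model of a fixed "big pool" pro-rata split with an
# eligibility threshold bolted on. All figures are hypothetical.

def payouts(stream_counts, pool, threshold=0):
    """Split a fixed royalty pool pro rata across tracks, paying only
    tracks at or above the annual stream threshold."""
    eligible = {t: s for t, s in stream_counts.items() if s >= threshold}
    total_eligible_streams = sum(eligible.values())
    return {t: pool * eligible.get(t, 0) / total_eligible_streams
            for t in stream_counts}

tracks = {"hit": 900_000, "mid": 50_000, "niche": 800}   # annual streams
before = payouts(tracks, pool=10_000.0)                  # everyone paid
after = payouts(tracks, pool=10_000.0, threshold=1_000)  # "eligibility"

# The pool does not grow; the sub-threshold track's share simply
# moves to the tracks above the line.
```

Run it and the sub-1,000-stream track drops to zero while the pool total is unchanged, which is the whole point: no new money, just redistribution.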

Spotify’s Need for Living Space

So why is Spotify doing this to you? The simple answer is the same reason monopolists always use: they need living space for Greater Spotify. Or more simply, because they can, or they can try. They’ll tell you it’s to address “streaming fraud” but there are a lot more direct ways to address streaming fraud such as establishing a simple “know your vendor” policy, or a simple pruning policy similar to that established by record companies to cut out low-sellers (excluding classical and instrumental jazz). But that would require Spotify to get real about their growth rates and be honest with their shareholders and partners. Based on the way Spotify treated the country of Uruguay, they are more interested in despoiling a country’s cultural resources than they are in fairly compensating musicians.

Of course, they won’t tell you that side of the story. They won’t even tell you if certain genres or languages will be more impacted than others (like the way labels protected classical and instrumental jazz from getting cut out measured by pop standards). Here’s their explanation:

It’s more impactful [says who?] for these tens of millions of dollars per year to increase payments to those most dependent on streaming revenue — rather than being spread out in tiny payments that typically don’t even reach an artist (as they do not surpass distributors’ minimum payout thresholds). 99.5% of all streams are of tracks that have at least 1,000 annual streams, and each of those tracks will earn more under this policy.

This reference to “minimum payout thresholds” is a very Spotifyesque generalization wrapped in a cross-reference inside of spin. Because of the insane “big pool” or “market centric” royalty model that made Spotify rich, the sums Spotify pays artists are so tiny that payment itself becomes a challenge.

Plus, if they want to make allegations about third party distributors, they should say which distributors they are speaking of and cite directly to specific terms and conditions of those services. We can’t ask these anonymous distributors about their policies if we don’t know who they are. 

What’s more likely is that tech platforms like PayPal stack up transaction fees to make the payment cost more than the royalty paid. Of course, you could probably say that about all streaming if you calculate the cost of accounting on a per stream basis, but that’s a different conversation.

So Spotify wants you to ignore the fact that they impose this “market centric” royalty rate that pays you bupkis in the first place. Since your distributor holds the tiny slivers of money anyway, Spotify just won’t pay you at all. It’s all the same to you, right? You weren’t getting paid anyway, so Spotify will just give your money to these other artists who didn’t ask for it and probably wouldn’t want it if you asked them.

There is a narrative going around that somehow the major labels are behind this. I seriously doubt it–if they ever got caught with their fingers in the cookie jar on this scam, would it be worth the pittance that they will end up getting in pocket after all mouths are fed? The scam is also 180 out from Lucian Grainge’s call for artist-centric royalty rates, so as a matter of policy it’s inconsistent with at least Universal’s stated goals. So I’d be careful about buying into that theory without some proof.

What About Mechanical Royalties?

What’s interesting about this scam is that, switching to Spotify’s obligations on the song side, the accounting rules for mechanical royalties (37 CFR § 210.6(g)(6) for those reading along at home) seem to contradict the very suckers’ deal that Spotify is cramming down on the recording side:

Royalties under 17 U.S.C. 115 shall not be considered payable, and no Monthly Statement of Account shall be required, until the compulsory licensee’s [i.e., Spotify’s] cumulative unpaid royalties for the copyright owner equal at least one cent. Moreover, in any case in which the cumulative unpaid royalties under 17 U.S.C. 115 that would otherwise be payable by the compulsory licensee to the copyright owner are less than $5, and the copyright owner has not notified the compulsory licensee in writing that it wishes to receive Monthly Statements of Account reflecting payments of less than $5, the compulsory licensee may choose to defer the payment date for such royalties and provide no Monthly Statements of Account until the earlier of the time for rendering the Monthly Statement of Account for the month in which the compulsory licensee’s cumulative unpaid royalties under section 17 U.S.C. 115 for the copyright owner exceed $5 or the time for rendering the Annual Statement of Account, at which time the compulsory licensee may provide one statement and payment covering the entire period for which royalty payments were deferred.

Much has been made of the fact that Spotify may think it can unilaterally change its obligations to pay sound recording royalties, but they still have to pay mechanicals because of the statute. And when they pay mechanicals, the accounting rules have some pretty low thresholds that require them to pay small amounts. This seems to be the very issue they are criticizing with their proposed change in “royalty policy.”
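Those thresholds in the regulation quoted above can be sketched in a few lines. This is a simplified model: it covers only the one-cent payability floor and the $5 deferral band, and ignores the annual-statement backstop in the rule. The function name and the boolean opt-in flag are my own illustrative choices:

```python
# Simplified model of 37 CFR § 210.6(g)(6): royalties are not payable
# until cumulative unpaid royalties reach one cent, and monthly
# statements/payments may be deferred while they stay under $5 unless
# the copyright owner has requested sub-$5 statements in writing.
# (The annual-statement deadline in the rule is not modeled here.)

def must_render_statement(cumulative_unpaid, owner_wants_small_statements=False):
    """cumulative_unpaid is in dollars."""
    if cumulative_unpaid < 0.01:   # not yet payable at all
        return False
    if cumulative_unpaid < 5.00:   # deferrable band; owner opt-in controls
        return owner_wants_small_statements
    return True                    # $5 or more: statement and payment due
```

The point of the sketch is the contrast: the statutory accounting rules force Spotify to track and eventually pay even sub-$5 accruals, which is exactly the kind of small money its “royalty policy” writes off on the recording side.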

But remember that the only reason that Spotify has to pay mechanical royalties on the stream discrimination is because they haven’t managed to get that free ride inserted into the mechanical royalty rates alongside all the other safe harbors and goodies they seem to have bought for their payment of historical black box.

So I would expect that Spotify will show up at the Copyright Royalty Board for Phonorecords V and insist on a safe harbor to enshrine stream discrimination into the Rube Goldberg streaming mechanical royalty rates. After all, controlled compositions are only paid on royalty bearing sales, right? And since it seems like they get everything else they want, everyone will roll over and give this to them, too. Then the statutory mechanical will give them protection.

To Each According to Their Needs

Personally, I have an issue with any exception that results in any artist being forced to accept a royalty free deal. Plus, it seems like what should be happening here is that underperforming tracks get dropped, but that doesn’t support the narrative that all the world’s music is on offer. Just not paid for.

Is it a lot of money to any one person? Not really, but it’s obviously enough money to make the exercise worthwhile to Spotify. And notice that they haven’t really told you how much money is involved. It may be that Spotify isn’t holding back any small payments from distributors if all payments are aggregated. But either way it does seem like this new new thing should start with a clean slate–and all accrued royalties should be paid.

This idea that you should be forced to give up any income at all for the greater good of someone else is kind of an odd way of thinking. Or as they say back in the home country, from each according to their ability and to each according to their needs. And you don’t really need the money, do you?

By the way, can you break a $20?

The NO AI Fraud Act

Thanks to U.S. Representatives Salazar and Dean, there’s an effort underway to limit Big Tech’s AI rampage just in time for Davos. (Remember, the AI bubble got started at last year’s World Economic Forum Winter Games in Davos, Switzerland).

Chairman Issa Questions MLC’s Secretive Investment Policy for Hundreds of Millions in Black Box

As we’ve noted a few times, the MLC has a nontransparent–some might say “secretive”–investment policy that has the effect of a government rule. This has caught the attention of Chairman Darrell Issa and Rep. Ben Cline at a recent House oversight hearing. Chairman Issa asked for more information about the investment policy in follow-up “questions for the record” directed to MLC CEO Kris Ahrend. It’s worth getting smart about what the MLC is up to in advance of the upcoming “redesignation” proceeding at the Copyright Office. We all know the decision is cooked and scammed already as part of the Harry Fox Preservation Act (AKA Title I of the MMA), but it will be interesting to see if anyone actually cares and the investment policy is a perfect example. It will also be interesting to see which Copyright Office examiner goes to work for one of the DiMA companies after the redesignation as is their tradition.

Chronology: The week in review, Spotify layoffs, mechanical rate increase, FTC on copyright issues in AI

What Spotify needs is a good pandemic.  

Harsh?  Not really, at least not from a share price point of view. Spotify’s all time highest share price was during the COVID pandemic.

Spotify CEO Daniel Ek and the press tell us that Spotify is cutting 1,500 jobs, which works out to about 17% of Spotify employees–implying a pre-layoff workforce of about 8,823. So let’s start there: that workforce number seems very high and is completely out of line with some recent data from Statista, which is usually reliable.

If Statista is correct, Spotify employed 5,584 people as of last year. Yet somehow Spotify’s 2023 workforce grew to 9,200 according to the Guardian, fully two-thirds over that 2022 level, without a commensurate and offsetting growth in revenue. That’s a governance question in and of itself.

Why the layoffs? The Guardian reports that Spotify CEO Daniel Ek is concerned about costs. He says, “Despite our efforts to reduce costs this past year, our cost structure for where we need to be is too big.” Maybe I missed it, but the only time I can recall Daniel Ek being vocally concerned about Spotify’s operating costs was when it came to paying royalties. Then it was full-blown poor-mouthing while signing leases for very expensive office space at 4 World Trade Center, as well as other pricey real estate, executive compensation, and podcasters like Harry & Meghan.

Mr. Ek announced his new, new thing:

Over the last two years, we’ve put significant emphasis on building Spotify into a truly great and sustainable business – one designed to achieve our goal of being the world’s leading audio company and one that will consistently drive profitability and growth into the future. While we’ve made worthy strides, as I’ve shared many times, we still have work to do. Economic growth has slowed dramatically and capital has become more expensive. Spotify is not an exception to these realities.

Which “economic growth” is that?

But, he is definitely right about capital costs.

Still, Spotify’s job cuts are not necessarily that surprising considering the macro economy, most specifically rents and interest rates. As recently as 2018, Spotify was the second largest tenant at 4 WTC. Considering the sheer size of Spotify’s New York office space, it’s not surprising that Spotify began subletting five floors of 4 WTC earlier this year. That’s right, the company had a spare five floors. Could that excess just be more people working from home, given Mr. Ek’s decision to expand Spotify’s workforce? But why does Spotify need to be a major tenant at the World Trade Center in the first place? Renting the big New York office space is the corporate equivalent of playing house. That’s an expensive game of pretend.

Remember that Spotify is one of the many companies that rose to dominance during the era of easy money that followed the financial crisis, the era of quantitative easing and the Federal Reserve’s Zero Interest Rate Policy beginning around 2008. Spotify’s bankers were able to fuel Daniel Ek’s desire to IPO and cash out in the public markets by enabling Spotify to run at a loss, because money was cheap and the stock market had a higher tolerance for risky investments. When you get a negative interest rate for saving money, Spotify stock doesn’t seem like a totally insane investment by comparison. This may have contributed to two stock buy-back programs of $1 billion each, Spotify’s deal with Barcelona FC, and other notorious excesses.

As a great man said, don’t confuse leverage for genius. It was only a matter of time until the harsh new world of quantitative tightening and sharply higher inflation came back to bite. For many years, Spotify told Wall Street a growth story which deflected attention away from the company’s loss making operations. A growth story pumps up the stock price until the chickens start coming home to roost. (Growth is also the reason to put off exercising pricing power over subscriptions.) Investors bought into the growth story in the absence of alternatives, not just for Spotify but for the market in general (compare Russell Growth and Value indexes from 2008-2023). Cutting costs and seeking profit is an example of what public company CEOs might do in anticipation of a rotational shift from growth to value investing that could hit their shares.

Never forget that due to Daniel Ek’s super-voting stock (itself an ESG fail), he is in control of Spotify. So there’s nowhere to hide when the iconography turns to blame. It’s not that easy or cheap to fire him, but if the board really wanted to give him the heave, they could do it.

I expect that Ek’s newly found parsimony will be even more front and center in renegotiations of Spotify’s royalty deals, since he’s always blamed the labels for why Spotify can’t turn a profit. Not that WTC lease, surely. This would be a lot more tolerable coming from someone you thought was actually making an effort to cut all costs, not just your revenue. Maybe that will happen, but even if Spotify became a lean, mean machine, it would take years to recover from the 1999 levels of stupid that preceded it.

Hellooo Apple. One big thinker on music business issues calls it “Spotify drunk,” which describes the tendency of record company marketers to focus entirely on Spotify and essentially ignore Apple Music as a distribution partner. If you’re in that group drinking the Spotify Kool-Aid, you may want to give Apple another look. One thing that is almost certain is that Apple will still be around in five years.

Just sayin.

Mechanicals on Physical and Downloads Get COLA Increase; Nothing for Streaming

Recall that the “Phonorecords IV” minimum mechanical royalties paid by record companies on physical and downloads increased from 9.1¢ to 12¢ with an annual cost of living adjustment each year of the PR IV rate period. The first increase was calculated by the Copyright Royalty Judges and was announced this week. That increase was from 12¢ to 12.40¢ and is automatic effective January 1, 2024.
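For those reading along at home, the arithmetic of the adjustment can be sketched as follows. The inflation figure here is simply back-derived from the announced 12¢-to-12.40¢ move; it is not the Copyright Royalty Judges’ actual CPI-U computation or rounding convention:

```python
# Hypothetical sketch of a COLA applied to the statutory penny rate,
# rounded to a tenth of a cent. The 3.33% figure is back-derived from
# the announced 12¢ -> 12.40¢ increase, not an official CPI-U input.

def apply_cola(rate_cents, inflation):
    """Adjust a rate (in cents) by an inflation factor, rounded to 0.1¢."""
    return round(rate_cents * (1 + inflation), 1)

apply_cola(12.0, 0.0333)  # -> 12.4 (cents), effective January 1, 2024
```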

Note that there is no COLA increase for streaming, for reasons I personally do not understand. There really is no justification for not applying a COLA to a government-mandated rate that blocks renegotiation to cover inflation expectations. After all, it works for Edmund Phelps.

The Federal Trade Commission on Copyright and AI

The FTC’s comment in the Copyright Office AI inquiry offers an interesting insight into the Commission’s thinking on some of the same copyright issues that bother us about AI, especially AI training. Despite Elon Musk’s refreshing candor about the obvious truth of AI training on copyrighted works, the usual suspects in the Copyleft (Pam Samuelson, Sy Damle, etc.) seem to have a hard time acknowledging the unfair competition aspects of AI and AI training (at p. 5):

Conduct that may violate the copyright laws––such as training an AI tool on protected expression without the creator’s consent or selling output generated from such an AI tool, including by mimicking the creator’s writing style, vocal or instrumental performance, or likeness—may also constitute an unfair method of competition or an unfair or deceptive practice, especially when the copyright violation deceives consumers, exploits a creator’s reputation or diminishes the value of her existing or future works, reveals private information, or otherwise causes substantial injury to consumers. In addition, conduct that may be consistent with the copyright laws nevertheless may violate Section 5.

We’ve seen unfair competition claims pleaded in the AI cases–maybe we should be thinking about trying to engage the FTC in prosecutions.

@LinaKhanFTC Launches Investigation into AI-Enabled Voice Cloning