Does “Publicly Available” AI Scraping Mean They Take Everything or Just Anything That’s Not Nailed Down?

Let’s be clear: It is not artificial intelligence as a technology that’s the existential threat. It’s the people who make the decisions about how to train and use artificial intelligence that are the existential threat. Just like nuclear power is not an existential threat, it’s the Tsar Bomba that measured 50 megatons on the bangometer that’s the existential threat.

If you think that the tech bros can be trusted not to use your data scraped from their various consumer products for their own training purposes, please point to five things they’ve done in the last 20 years that give you that confidence. Or point to even one thing.

Here’s an example. Back in the day when we were trying to build a library of audio fingerprints, we first had to rip millions of tracks in order to create the fingerprints. One employee who came to us from a company with a free email service said that there were millions of emails with audio file attachments just sitting there in users’ sent mail folders. Maybe we could just grab those audio files? Obviously that would have been off-limits for a host of reasons, but he didn’t see it. It’s not that he was an immoral person–immoral people recognize that there are some rules and they just want to break them. He was amoral–he didn’t see the rules and he didn’t think anything was wrong with his suggestion.

But the moral of the story–so to speak–is that I fully believe every consumer product is being scraped. That means that there’s a fairly good chance that Google, Microsoft, Meta/Facebook and probably other Big Tech players are using all of their consumer products to train AI. I would not bet against it.

If you think that’s crazy, I would suggest you think again. While these companies keep that kind of thing fairly quiet, it’s not the first time that the issue has come up–Big Tech telling you one thing, but using you to gain a benefit for something entirely different that you probably would never have agreed to had you known.

Take the Google Books saga. The whole point of Google’s effort at digitizing all the world’s books wasn’t some do-gooder desire to create the digital library of Alexandria, or even the snippets that were the heart of the case. No–it was the “nondisplay uses” like training Google’s translation engine using “corpus machine translation”. The “corpus” of all the digitized books was the real value, and of course was the main thing that Google wouldn’t share with the authors and didn’t want to discuss in the case.

Another random example would be “GOOG-411”. We can thank Marissa Mayer for spilling the beans on that one.

According to PC World back in 2010:

Google will close down 1-800-GOOG-411 next month, saying the free directory assistance service has served its purpose in helping the company develop other, more sophisticated voice-powered technologies.

GOOG-411, which will be unplugged on Nov. 12, was the search company’s first speech recognition service and led to the development of mobile services like Voice Search, Voice Input and Voice Actions.

Google, which recorded calls made to GOOG-411, has been candid all along about the motivations behind running the service, which provides phone numbers for businesses in the U.S. and Canada.

In 2007, Google Vice President of Search Products & User Experience Marissa Mayer said she was skeptical that free directory assistance could be a viable business, but that she had no doubt that GOOG-411 was key to the company’s efforts to build speech-to-text services.

GOOG-411 is a prime example of how Big Tech plays the thimblerig, especially the “has been candid all along about the motivations behind running the service” part. Doesn’t that phrase just ooze corporate flak? That, as we say in the trade, is a freaking lie.

None of the GOOG-411 collateral ever said, “Hey idiot, come help us get even richer by using our dumbass ‘free’ directory assistance ‘service’.” Just like they’re not saying, “Hey idiot, use our ‘free’ products so we can train our AI to take your job.” That’s the thimblerig, but played at our expense.

This subterfuge has big consequences for people like lawyers. As I wrote in my 2014 piece in Texas Lawyer:

“A lawyer’s duty to maintain the confidentiality of privileged communications is axiomatic. Given Google’s scanning and data mining capabilities, can lawyers using Gmail comply with that duty without their clients’ informed consent? In addition to scanning the text, senders and recipients, Google’s patents for its Gmail applications claim very broad functionality to scan file attachments. (The main patent is available on Google’s site. A good discussion of these patents is in Jeff Gould’s article, “The Natural History of Gmail Data Mining”, available on Medium.)”

Google has made a science of enticing users into handing over their data for free so that Google can evolve even more products that may or may not be useful beyond the “free” part. Does the world really need another free email program? Maybe not, but Google does need a way to snarf down data for its artificial intelligence platforms–deceptively.

Fast forward ten years or so and here we are with the same problem–except it’s entirely possible that all of the Big Tech AI platforms are using their consumer products to train AI. Nothing has changed for lawyers, and some version of these rules would be prudent for anyone with a duty of confidentiality like a doctor, accountant, stockbroker or any of the many licensed professions. Not to mention social workers, priests, and the list goes on. If you call Big Tech on the deception, they will all say that they operate within their privacy policies, “de-identify” user data, only use “public” information, or other excuses.

I think the point of all this is that the platforms have far too many opportunities to cross-collateralize our data for the law to permit any confusion about what data they scrape.

What We Think We Know

Microsoft’s AI Training Practices

Microsoft has publicly stated that it does not use data from its Microsoft 365 products (e.g., Word, Excel, Outlook) to train its AI models. The company wants us to believe it relies on “de-identified” data from sources such as Bing searches, Copilot interactions, and “publicly available” information, whatever that means. Microsoft emphasizes its commitment to responsible AI practices, including removing metadata and anonymizing data to protect user privacy. See what I mean? Since Microsoft takes these precautions, that makes it all fine.

However, professionals using Microsoft’s tools must remain vigilant. While Microsoft claims not to use customer data from enterprise accounts for AI training, any inadvertent sharing of sensitive information through other Microsoft services (e.g., Bing or Copilot) could pose risks for users, particularly people with a duty of confidentiality like lawyers and doctors. And we haven’t even discussed child users yet.

Google’s AI Training Practices

For nearly two decades, Google has faced scrutiny for its data practices, particularly with products like Gmail, Google Docs, and Google Drive. Google’s updated privacy policy explicitly allows the use of “publicly available” information and user data for training its AI models, including Bard and Gemini. While Google claims to anonymize and de-identify data, concerns remain about the potential for sensitive information to be inadvertently included in training datasets.

For licensed professionals, these practices raise significant red flags. Google advises users not to input confidential or sensitive information into its AI-powered tools–typically Googley. Human reviewers accessing “de-identified” data is a risk for anyone, but why in the world would you ever trust Google?

Does “Publicly Available” Mean Everything or Does It Mean Anything That’s Not Nailed Down?

These companies speak of “publicly available” data as if data that is publicly available is free to scrape and use for training. So what does that mean?

Based on the context and some poking around, it appears that there is no legally recognizable definition of what “publicly available” actually means. If you were going to draw a line between “publicly available” and the opposite, where would you draw it? You won’t be surprised to know that Big Tech will probably draw the line in an entirely different place than a normal person.

As far as I can tell, “publicly available” data would include data or content that is accessible by a data-scraping crawler or by the general public without a subscription, payment, or special access permissions. This likely includes web pages, posts on social media (like baby pictures on Facebook or Instagram), and other platforms that do not restrict access to their content through paywalls, registration requirements, or other barriers such as terms of service prohibiting data scraping, API-only access, or a robots.txt file (which, like a lot of other people including Ed Newton-Rex, I’m skeptical even works).
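For reference, a robots.txt opt-out is nothing more than a plain text file politely asking crawlers to stay away. A minimal sketch of what the AI opt-outs look like (GPTBot and Google-Extended are the published tokens for OpenAI’s and Google’s training crawlers):

```
# robots.txt -- a request, not an enforcement mechanism.
# Nothing here technically prevents a crawler from fetching anything.

User-agent: GPTBot           # OpenAI's training crawler
Disallow: /

User-agent: Google-Extended  # Google's opt-out token for AI training
Disallow: /

User-agent: *                # everyone else
Allow: /
```

The file is honored purely at the crawler’s discretion, which is exactly why that skepticism is warranted.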

While discussions of terms of service, notices prohibiting scraping and automated directions to crawlers sound good, in reality there’s no way to stop a determined crawler. Big Tech’s vulpine lust for data and cold hard cash is not realistically possible to stop at this point. That is why stopping the existential onslaught requires escalating punishment for these violations to a level that may seem extreme, or at least unusually harsh, for now.

Yet the massive and intentional copyright infringement, privacy violations, and who knows what else are so vast that they are beyond civil penalties, particularly for a defendant that seemingly prints money.



@FTC: AI (and other) Companies: Quietly Changing Your Terms of Service Could Be Unfair or Deceptive

An important position paper from the Federal Trade Commission about AI (emphasis mine where indicated):

You may have heard that “data is the new oil”—in other words, data is the critical raw material that drives innovation in tech and business, and like oil, it must be collected at a massive scale and then refined in order to be useful. And there is perhaps no data refinery as large-capacity and as data-hungry as AI.

Companies developing AI products, as we have noted, possess a continuous appetite for more and newer data, and they may find that the readiest source of crude data are their own userbases. But many of these companies also have privacy and data security policies in place to protect users’ information. These companies now face a potential conflict of interest: they have powerful business incentives to turn the abundant flow of user data into more fuel for their AI products, but they also have existing commitments to protect their users’ privacy….

It may be unfair or deceptive for a company to adopt more permissive data practices—for example, to start sharing consumers’ data with third parties or using that data for AI training—and to only inform consumers of this change through a surreptitious, retroactive amendment to its terms of service or privacy policy. (emphasis in original)…

The FTC will continue to bring actions against companies that engage in unfair or deceptive practices—including those that try to switch up the “rules of the game” on consumers by surreptitiously re-writing their privacy policies or terms of service to allow themselves free rein to use consumer data for product development. Ultimately, there’s nothing intelligent about obtaining artificial consent.

Read the post on the FTC’s site.

Now with added retroactive acrobatics: @DamianCollins calls on UK Prime Minister to stop Google’s “Text and Data Mining” Circus

Damian Collins (former chair of the UK Parliament’s Digital, Culture, Media and Sport Select Committee) warns of Google’s latest AI shenanigans in a must-read opinion piece in the Daily Mail that highlights Google’s attempt to lobby its way into what is essentially a retroactive safe harbor to protect Google and its confederates in the AI land grab. While Mr. Collins writes about Google’s efforts to rewrite the laws of the UK to free ride in his home country, which is egregious bullying, the episode he documents is instructive for all of us. If Google & Co. will do it to the Mother of Parliaments, it’s only a matter of time until they do the same everywhere or know the reason why. Their goal is to hoover up all the world’s culture that the AI platforms have not scraped already and–crucially–to get away with it. And as Guy Forsyth says, “…nothing says freedom like getting away with it.”

The timeline of AI’s appropriation of all the world’s culture is critical to appreciating just how depraved Big Tech’s unbridled greed really is. The important thing to remember is that AI platforms like Google have been scraping the Internet to train their AI for some time now, possibly many years. This apparently includes the social media platforms they control. My theory is that Google Books was an early effort at digitization for large language models to support products like corpus machine translation, a predecessor to Gemini (“your twin”) and other Google AI products. We should ask Ray Kurzweil.

Evidence is mounting that this is exactly what these people are up to.

The New York Times Uncovers the Crimes

According to an extensive long-form report in the New York Times by a team of highly respected journalists, it turns out that Google has been planning this “Text and Data Mining” land grab for some time. At the very moment YouTube was issuing press releases about its Music AI Incubator and its “partners”, Google was stealing anything that was not nailed down that anyone had hosted on its massive platforms, including Google Docs, Google Maps, and…YouTube. The Times tells us:

Google transcribed YouTube videos to harvest text for its A.I. models, five people with knowledge of the company’s practices said. That potentially violated the copyrights to the videos, which belong to their creators….Google said that its A.I. models “are trained on some YouTube content,” which was allowed under agreements with YouTube creators, and that the company did not use data from office apps outside of an experimental program. 

I find it hard to believe that YouTube was allowed to transcribe and scrape under all its content deals, or that anyone parsed through all the videos to find the unprotected ones subject to Google’s interpretation of the YouTube terms of use. So as we say in Texas, that sounds like bullshit for starters.

How does this relate to the Text and Data Mining exception that Mr. Collins warns of? Note that the NYT tells us “Google transcribed YouTube videos to harvest text.” That’s a clue.

As Mr. Collins tells us:

Google [recently] published a policy paper entitled: Unlocking The UK’s AI Potential.

What’s not to like? you might ask. Artificial intelligence has the potential to revolutionise our economy and we don’t want to be left behind as the rest of the world embraces its benefits.

But buried in Google’s report is a call for a ‘text and data mining’ (TDM) exception to copyright.

This TDM exception would allow Google to scrape the entire history of human creativity from the internet without permission and without payment.

And, of course, Mr. Collins is exactly correct: that’s exactly what Google has in mind.

The Conspiracy of Dunces and the YouTube Fraud

In fairness, it wasn’t just Google ripping us off, but Google didn’t do anything to stop it as far as I can tell. One thing to remember is that YouTube was, and I think still is, not very crawlable by outsiders. It is almost certainly the case that Google would know who was crawling youtube.com, such as Bingbot, DuckDuckBot, YandexBot, or Yahoo! Slurp, if for no other reason than that those spiders were not Googlebot. With that understanding, the Times also tells us:

OpenAI researchers created a speech recognition tool called Whisper. It could transcribe the audio from YouTube videos, yielding new conversational text that would make an A.I. system smarter.

Some OpenAI employees discussed how such a move might go against YouTube’s rules, three people with knowledge of the conversations said. YouTube, which is owned by Google, prohibits use of its videos for applications that are “independent” of the video platform. [Whatever “independent” means.]

Ultimately, an OpenAI team transcribed more than one million hours of YouTube videos, the people said. The team included Greg Brockman, OpenAI’s president, who personally helped collect the videos, two of the people said. The texts were then fed into a system called GPT-4, which was widely considered one of the world’s most powerful A.I. models and was the basis of the latest version of the ChatGPT chatbot….

OpenAI eventually made Whisper, the speech recognition tool, to transcribe YouTube videos and podcasts, six people said. But YouTube prohibits people from not only using its videos for “independent” applications, but also accessing its videos by “any automated means (such as robots, botnets or scrapers).”

OpenAI employees knew they were wading into a legal gray area, the people said, but believed that training A.I. with the videos was fair use. 

And sure enough, many of the AI platforms sued by creators (if not all of them) raise “fair use” as a defense, which is strangely reminiscent of the kind of crap we have been hearing from these people since 1999.

Now why might Google have permitted OpenAI to crawl YouTube and transcribe videos (and who knows what else)? Probably because Google was doing the same thing. In fact, the Times tells us:

Some Google employees were aware that OpenAI had harvested YouTube videos for data, two people with knowledge of the companies said. But they didn’t stop OpenAI because Google had also used transcripts of YouTube videos to train its A.I. models, the people said. That practice may have violated the copyrights of YouTube creators. So if Google made a fuss about OpenAI, there might be a public outcry against its own methods, the people said.

So Google and its confederate OpenAI may well have conspired to commit massive copyright infringement against the owners of valid copyrights, did so willfully, and for purposes of commercial advantage and private financial gain. (Attempts to infringe are prohibited to the same extent as the completed act.) The acts of these confederates vastly exceed the thresholds for criminal prosecution for both infringement and conspiracy.

But to Mr. Collins’ concern, the big AI platforms transcribed likely billions of hours of YouTube videos to mine text and data–you know, TDM.

The New Retroactive Safe Harbor: The Flying Googles Bring their TDM Circus Act to the Big Tent With Retroactive Acrobatics

But also realize the effect of the new TDM exception that Google and their Big Tech confederates are trying to slip past the UK government (and our own for that matter). A lot of the discussion about AI rulemaking acts as if new rules would be for future AI data scraping. Au contraire mes amis–on the contrary, the bad acts have already happened and they happened on an unimaginable scale.

So what Google is actually trying to do is get the UK to pass a retroactive safe harbor that would deprive citizens of valuable property rights–and also pass a prospective safe harbor so they can keep doing the bad acts with impunity.

Fortunately for UK citizens, the UK Parliament has not passed idiotic retroactive safe harbor legislation like the U.S. Congress has. I am, of course, thinking of the vaunted Music Modernization Act (MMA) that drooled its way to a retroactive safe harbor for copyright infringement, a shining example of the triumph of corruption that has yet to be properly challenged in the US on Constitutional grounds.

There’s nothing like the MMA absurdity in the UK, at least not yet. However, that retroactive safe harbor was not lost on Google, who benefited directly from it. They loved it. They hung it over the mantle next to their other Big Game trophy, the DMCA. And now they’d like to do it again to complete the triptych of legislative taxidermy.

Because make no mistake–a retroactive safe harbor would be exactly the effect of Google’s TDM exception. Not to mention it would also be a form of retroactive eminent domain, or what the UK analogously calls the compulsory purchase of property under the Compulsory Purchase Act. Well…“purchase” might be too strong a word, more like “transfer”, because these people don’t intend to pay for a thing.

The effect of passing Google’s TDM exception would be to take property rights and other personal rights from UK citizens without anything like the level of process or compensation required for a compulsory purchase–even when the government requires the sale of private property to another private entity (such as a railroad right of way or a utility easement).

The government is on very shaky ground with a TDM exception imposed for the benefit of private companies, indeed foreign private companies who can well afford to pay for what they take. There would be no case-by-case government oversight, no proper valuation, and the taking would be for entirely commercial purposes with no public benefit. In the US, it would likely violate the Takings Clause of our Constitution, among other things.

It’s Not Just the Artists

Mr. Collins also makes a very important point that might get lost among the stars–it’s not just the stars that AI is ripping off, it is everyone. As the New York Times story points out (and it seems that there are more whistleblowers on this point every day), the AI platforms are hoovering up EVERYTHING that is on the Internet, especially on their affiliated platforms. That includes baby videos, influencers, everything.

This is why it is cultural appropriation on a grand scale, indeed a scale of depravity that we haven’t seen since the Nuremberg Trials. A TDM exception would harm all Britons in one massive offshoring of British culture.

Chronology: The Week in Review: Search is Ripping Us Off Again, this time for AI; #FreeJimmyLai; the MLC loves the MLC almost as much as the NMPA

In case you think that infringement of copyrights in AI training happens only to rock stars, photographers, record companies or movie studios, let me give you a personal anecdote that might cause you to reconsider. I asked Bing’s Copilot AI tool to list the deal points in a record producer agreement, and this is what I got back:

Copilot

Certainly! When negotiating a producer agreement, several key points should be considered. Here are some common negotiation points from the artist’s perspective:

  1. Scope of Engagement:
     - The producer is generally engaged for one album project at a time.
     - Clarify the scope of work, including the number of songs or recordings to be produced.
  2. Budget and Compensation:
     - Determine the budget for the project (e.g., $10,000 in your case).
     - Agree on the producer’s compensation, which may include an upfront fee, points on sales, or a combination.
  3. Ownership and Rights:
     - Address ownership of the master recordings:
       - Will the artist own and control the masters?
       - Does the producer retain any rights?
  4. Credits and Royalties:
     - Specify how the producer will be credited on releases.
     - Discuss any royalties or backend points for the producer.
  5. Recording Schedule:
     - Set a timeline for recording sessions.
     - Define the two-week period for your project.
  6. Logistics:
     - Determine where the recordings will take place.
     - Outline the artist’s approval process.
     - Define the producer’s deliverables.
  7. Termination Clause:
     - Include provisions for termination if either party breaches the agreement.
     - Address payment for work completed up to that point.

Remember that each negotiation is unique, and it’s essential to tailor the agreement to your specific needs and circumstances. Legal advice is recommended to ensure compliance with local laws and industry standards [1][3][4].

Source(s)

1. An Artist’s Guide to Negotiating Record Producer Agreements – Medium

2. Negotiating a Music Producer Agreement: 7 Key Issues

3. 5 Things to Look for in a Music Producer Agreement – Lawyer Drummer

4. How to Negotiate: Steps, Phrases & Tips to Win the Deal – Selling Signals

Notice that Copilot lists “Source(s)” and that source #1 is an article from Medium.

My article.

I’ve written a number of articles about negotiating producer agreements and I can categorically say I haven’t given permission or been asked to give permission for any of those articles to be scraped and used to train AI. So how did it come to be in Copilot’s training materials?

My guess is that Medium (and probably a lot of other sites) was crawled by Bing’s spider and that copies of Medium pages were stored in Bing’s (or Microsoft’s) data centers. When Microsoft launched its Copilot AI tool, one can imagine the temptation would have been strong to use those copies as part of Copilot’s training data. In fact, the Verge reports that “companies around the web are using your site and its data to build massive sets of training data, in order to build models and products that may not acknowledge your existence at all.”
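To spell out the pattern I’m speculating about: the crawl-store-train pipeline is technically trivial. Here is a minimal sketch in Python of the generic pattern–the URL and file name are hypothetical, purely for illustration, and nothing here is Bing’s actual infrastructure:

```python
# Hypothetical sketch of the crawl -> store -> training-corpus pipeline.
# The URL and output path are illustrative, not any real system.
import urllib.request
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the visible text nodes from a crawled HTML page."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data.strip())

# Fetch one page, as a search crawler would.
html = urllib.request.urlopen("https://medium.com/some-article").read()
extractor = TextExtractor()
extractor.feed(html.decode("utf-8", errors="ignore"))

# A stored copy like this is what lets a search index double as a training corpus.
with open("training_corpus.txt", "a", encoding="utf-8") as corpus:
    corpus.write(" ".join(c for c in extractor.chunks if c) + "\n")
```

Multiply that by a few billion pages and a search index is also a training set.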

Are you surprised then that two of the biggest operators in the AI space are the search engine operators Google and Microsoft? This is another example of how Big Tech helps itself to your data and work product without you even knowing it’s happening. So now what? Now I know I’m being ripped off, and I’m wondering if Medium is in on it.

The Verge tells us:

The ability to download, store, organize, and query the modern internet gives any company or developer something like the world’s accumulated knowledge to work with. In the last year or so, the rise of AI products like ChatGPT, and the large language models underlying them, have made high-quality training data one of the internet’s most valuable commodities. That has caused internet providers of all sorts to reconsider the value of the data on their servers, and rethink who gets access to what. 

Ya think?

#FreeJimmyLai

In case you were wondering if the Chinese Communist Party is serious about using TikTok to collect data as they please, Hong Kong publisher Jimmy Lai is a perfect example of what happens when a company refuses to cooperate. The CCP took over Jimmy’s Apple Daily newspaper and arrested him. Jimmy has been in prison for three years and has still not come to “trial” (whatever that means under the CCP).

Blake Morgan sounds off in Hypebot on how TikTok uses music as a honeypot to hook innocent users

The MLC Reup

The MLC gets a five-year review of how it is doing. The Copyright Office conducts that review of both the MLC and the DLC. The public (that’s you) gets a chance to weigh in now because the MLC and the DLC have filed their respective written statements about their awesomeness.

One of the most interesting exchanges happened between Graham Davies (who is essentially the public face of the DLC and the head of the Digital Media Association) and the NMPA. Graham offered muted criticism of the MLC, which irked the easily irked NMPA, which offered a rather scathing response.

The most interesting thing about the exchange was that it was the NMPA–not the MLC–that responded to Graham. Tell you anything?

Chronology: The Week in Review: MLC Redesignation Proceeding Highlights Ownership Issues for the Government’s Musical Works Database; TikTok’s SOPA Problem; Google’s Nonindemnity Indemnity for AI

One of the few things Congress got right in Title I of the Music Modernization Act is the five-year review of the mechanical licensing collective. Or more precisely, whether the private company previously designated by the Copyright Office to conduct the functions of the Mechanical Licensing Collective should have another five years to continue doing whatever it is they do.

Impliedly, and I think a bit unfairly, Congress told the Copyright Office to approve its own decision to appoint the current MLC or admit it made a mistake. This is yet another one of the growing list of oversights in the oversight. Wouldn’t it make more sense for someone not involved in the initial decision to be evaluating the performance of the MLC? Particularly when there are at least tens of millions of dollars changing hands, as well as some highly compensated MLC employees, any one of whom makes more than the Copyright Royalty Judges.

What happens if the Register of Copyrights actually fires The MLC, Inc. and designates a new MLC operator? The first question probably should be what happens to the vaunted MLC musical works database and the attendant software and accounting systems, which seem to be maintained out of the UK for some reason.

I actually raised this question in a comment to the Copyright Office back in 2020. In short, my question was probably more of a statement: “The musical works database does not belong to the MLC or The MLC and if there is any confusion about that, it should be cleared up right away.” The Copyright Office had a very clear response:

While the mechanical licensing collective must “establish and maintain a database containing information relating to musical works,” the statute and legislative history emphasize that the database is meant to benefit the music industry overall and is not “owned” by the collective itself….Any use by the Office referring to the public database as “the MLC’s database” or “its database” was meant to refer to the creation and maintenance of the database, not ownership. [85 FR at 58172, text accompanying notes 30 and 31.]

So if the current operator of the MLC is fired, we know from the MMA and the Copyright Office guidance that one thing The MLC, Inc. cannot do is hold the database and its attendant systems hostage, demand payment, or engage in any other shadiness. These items do not belong to them, so they must not assert control over that which they do not own.

That would include the hundreds of millions in black box money that The MLC, Inc. has failed to distribute in going on four years. I’ve even heard cynics suggest that the market share distribution of the black box will occur immediately following The MLC, Inc.’s redesignation and the corresponding renewal of HFA’s back office contract, which seems to be worth about $10 million a year all by itself.

What would also have been helpful would be for Congress to have required the Copyright Office to publish evaluation criteria for what it expected the MLC’s operator to actually do, as well as performance benchmarks. Like I said, it’s a bit unfair of Congress to put the Copyright Office in the unprecedented position of evaluating such an important role with no guidance whatsoever. Surely Congress did not intend for the Copyright Office to have unfettered autonomy in deciding what standards to apply to its review of a quasi-governmental agency like the MLC, yet Congress seems to have defaulted to the guardrail of the Administrative Procedure Act or some other backstop to sustain checks and balances on the situation.

But at least the ownership question is settled.

Breaking the Internet Yet Again: TikTok’s SOPA Problem

TikTok users swarmed over the Capitol to protest and impede a Congressional vote that would force the sale of the ubiquitous TikTok. Can Camp Pelosi redux be far behind? Well, no, because this was a digital swarm, which is just different, you see. It’s just different when Big Tech tries to protect an IPO.

TikTok’s tactics are very reminiscent of Google’s tactics with SOPA or Napster’s tactics with Camp Chaos.

But not even Napster had the brass to go full-on papal indulgences. Yes, that’s right: NunTok will save the IPO.

Nuns good, TikTok bad!

I wonder which Washington lobbyist thought of NunTok?

Google’s Nonindemnity Indemnity for AI

Some generative AI platforms are trying to make users believe that the platform will actually protect them from copyright infringement claims. When you drill down on what the promise actually is, it’s pretty flimsy and may itself be consumer fraud.

Chronology: The week in review, Spotify layoffs, mechanical rate increase, FTC on copyright issues in AI

What Spotify needs is a good pandemic.  

Harsh? Not really, at least not from a share-price point of view. Spotify’s all-time highest share price was during the COVID pandemic.

Spotify CEO Daniel Ek and the press tell us that Spotify is cutting 1,500 jobs, which works out to about 17% of Spotify employees. That implies a pre-layoff workforce of about 8,823. So let’s start there—that workforce number seems very high and is completely out of line with some recent data from Statista, which is usually reliable.

If Statista is correct, Spotify employed 5,584 people as of last year. Yet somehow Spotify’s 2023 workforce grew to 9,200 according to the Guardian, fully two-thirds over that 2022 level, without a commensurate and offsetting growth in revenue. That’s a governance question in and of itself.

Why the layoffs? The Guardian reports that Spotify CEO Daniel Ek is concerned about costs. He says, “Despite our efforts to reduce costs this past year, our cost structure for where we need to be is too big.” Maybe I missed it, but the only time I can recall Daniel Ek being vocally concerned about Spotify’s operating costs was when it came to paying royalties. Then it was full-blown poor-mouthing while signing leases for very expensive office space in 4 World Trade Center as well as other pricey real estate, executive compensation and podcasters like Harry & Meghan.

Mr. Ek announced his new, new thing:

Over the last two years, we’ve put significant emphasis on building Spotify into a truly great and sustainable business – one designed to achieve our goal of being the world’s leading audio company and one that will consistently drive profitability and growth into the future. While we’ve made worthy strides, as I’ve shared many times, we still have work to do. Economic growth has slowed dramatically and capital has become more expensive. Spotify is not an exception to these realities.

Which “economic growth” is that?

But, he is definitely right about capital costs.

Still, Spotify’s job cuts are not necessarily that surprising considering the macro economy, most specifically rents and interest rates. As recently as 2018, Spotify was the second largest tenant at 4 WTC. Considering the sheer size of Spotify’s New York office space, it’s not surprising that Spotify began subletting five floors of 4 WTC earlier this year. That’s right, the company had a spare five floors. Could that excess just be more people working from home, given Mr. Ek’s decision to expand Spotify’s workforce? But why does Spotify need to be a major tenant in the World Trade Center in the first place? Renting the big New York office space is the corporate equivalent of playing house. That’s an expensive game of pretend.

Remember that Spotify is one of the many companies that rose to dominance during the era of easy money that followed the financial crisis, the hallmark of quantitative easing and the Federal Reserve’s Zero Interest Rate Policy beginning around 2008. Spotify’s bankers were able to fuel Daniel Ek’s desire to IPO and cash out in the public markets by enabling Spotify to run at a loss, because money was cheap and the stock market had a higher tolerance for risky investments. When you get a negative interest rate for saving money, Spotify stock doesn’t seem like a totally insane investment by comparison. This may have contributed to two stock buy-back programs of $1 billion each, Spotify’s deal with Barcelona FC and other notorious excesses.

As a great man said, don’t confuse leverage for genius. It was only a matter of time until the harsh new world of quantitative tightening and sharply higher inflation came back to bite. For many years, Spotify told Wall Street a growth story which deflected attention away from the company’s loss making operations. A growth story pumps up the stock price until the chickens start coming home to roost. (Growth is also the reason to put off exercising pricing power over subscriptions.) Investors bought into the growth story in the absence of alternatives, not just for Spotify but for the market in general (compare Russell Growth and Value indexes from 2008-2023). Cutting costs and seeking profit is an example of what public company CEOs might do in anticipation of a rotational shift from growth to value investing that could hit their shares.

Never forget that due to Daniel Ek’s super-voting stock (itself an ESG fail), he is in control of Spotify. So there’s nowhere to hide when the iconography turns to blame. It’s not that easy or cheap to fire him, but if the board really wanted to give him the heave, they could do it.

I expect that Ek’s newfound parsimony will be even more front and center in renegotiations of Spotify’s royalty deals, since he’s always blamed the labels for why Spotify can’t turn a profit. Not that WTC lease, surely. This would be a lot more tolerable coming from someone you thought was actually making an effort to cut all costs, not just your revenue. Maybe that will happen, but even if Spotify became a lean mean machine, it will take years to recover from the 1999 levels of stupid that preceded it.

Hellooo Apple. One big thinker on music business issues calls it “Spotify drunk,” which describes the tendency of record company marketers to focus entirely on Spotify and essentially ignore Apple Music as a distribution partner. If you’re in that group drinking the Spotify Kool-Aid, you may want to give Apple another look. One thing that is almost certain is that Apple will still be around in five years.

Just sayin.

Mechanicals on Physical and Downloads Get COLA Increase; Nothing for Streaming

Recall that the “Phonorecords IV” minimum mechanical royalties paid by record companies on physical and downloads increased from 9.1¢ to 12¢, with an annual cost-of-living adjustment (COLA) in each year of the PR IV rate period. The first increase was calculated by the Copyright Royalty Judges and was announced this week. That increase was from 12¢ to 12.40¢ and is automatic effective January 1, 2024.
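For the arithmetic-minded, that works out to roughly a 3.3% adjustment. A minimal sketch of the COLA math (the percentage here is back-computed from the published rates, not the Judges’ official CPI figure):

```python
def cola_adjust(base_cents: float, cpi_change: float) -> float:
    """Apply a cost-of-living adjustment to a statutory rate,
    rounded to the nearest tenth of a cent."""
    return round(base_cents * (1 + cpi_change), 1)

# Illustrative only: a ~3.3% CPI change takes the 12.0 cent minimum to 12.4 cents.
print(cola_adjust(12.0, 0.033))  # -> 12.4
```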

Note that there is no COLA for streaming, for reasons I personally do not understand. There really is no justification for not applying a COLA to a government-mandated rate that blocks renegotiation to cover inflation expectations. After all, it works for Edmund Phelps.

The Federal Trade Commission on Copyright and AI

The FTC’s comment in the Copyright Office AI inquiry offers an interesting insight into the Commission’s thinking on some of the same copyright issues that bother us about AI, especially AI training. Despite Elon Musk’s refreshing candor about the obvious truth of AI training on copyrights, the usual suspects in the Copyleft (Pam Samuelson, Sy Damle, etc.) seem to have a hard time acknowledging the unfair competition aspects of AI and AI training (at p. 5):

Conduct that may violate the copyright laws––such as training an AI tool on protected expression without the creator’s consent or selling output generated from such an AI tool, including by mimicking the creator’s writing style, vocal or instrumental performance, or likeness—may also constitute an unfair method of competition or an unfair or deceptive practice, especially when the copyright violation deceives consumers, exploits a creator’s reputation or diminishes the value of her existing or future works, reveals private information, or otherwise causes substantial injury to consumers. In addition, conduct that may be consistent with the copyright laws nevertheless may violate Section 5.

We’ve seen unfair competition claims pleaded in the AI cases–maybe we should be thinking about trying to engage the FTC in prosecutions.

Does the Metaverse Have Rights? Permissionless Innovation Bias and Artificial Intelligence

As Susan Crawford told us in 2010:

I was brought up and trained in the Internet Age by people who really believed that nation states were on the verge of crumbling…and we could geek around it.  We could avoid it.  These people [and their nation states] were irrelevant.

Ms. Crawford had a key tech role in the Obama Administration and is now a law professor. She crystallized the wistful disappointment of technocrats when the Internet is confronted with the generational expectations of non-technocrats (i.e., you and me): the disappointment that ownership means something, privacy means something, and that permission defines a self-identity boundary that is not something to “geek around” in a quest for “permissionless innovation.”

Seeking permission recognizes humanity. Failing to do so takes these rights away from the humans and gives them to the people who own the machines–at least until the arrival of artificial general intelligence, which may find us working for the machines.

These core concepts of civil society are not “irrelevant”. They define humanity. What assurance do we have that empowered AI machines won’t capture these rights?

All these concepts are at issue in the “metaverse” plan announced by Mark Zuckerberg, who has a supermajority of Facebook voting shares and has decided to devote an initial investment of $10 billion (that we know of) to expanding the metaverse. Given the addictive properties of social media and the scoring potential of social credit, it is increasingly important that we acknowledge that the AI behind the metaverse (and soon almost everything else) is itself a hyper-efficient implementation of the biases of those who program that AI.

AI bias and the ethics of AI are all the rage. Harvard Business Review tells us that “AI can help identify and reduce the impact of human biases, but it can also make the problem worse by baking in and deploying biases at scale in sensitive application areas.” Cathy O’Neil’s 2016 book Weapons of Math Destruction is a deep dive into how databases discriminate and exhibit the biases of those who create them.

We can all agree that insurance redlining, gender stereotyping and comparable social biases need to be dealt with. But concerns about bias don’t end there. An even deeper dive needs to be done into the more abstract biases: the drive to geek around the nation state and the fundamental human rights corrupted by the “permissionless innovation” bias that is built into major platforms like Facebook, and from which its employees and kingpin enjoy unparalleled riches.

That bias will be incorporated into the Zuckerberg version of the metaverse and the AI that will power it.

Here’s an example. We know that Facebook’s architecture never contemplated a music or movie licensing process. Zuckerberg built it that way on purpose–the architecture reflected his bias against respecting copyright, user data and really any private property rights not his own. Not only does Zuckerberg take copyright and data for his own purposes, he has convinced billions of people to create free content for him and then to pay him to advertise that content to Facebook users and elsewhere. He takes great care to be sure that there is extraordinarily complex programming to maximize his profit from selling other people’s property, but he refuses to do the same when it comes to paying the people who create the content, and by extension the data he then repackages and sells.

He does this for a reason–he was allowed to get away with it. The music and movie industries failed to stop him and let him get away with it year after year until he finally agreed to make a token payment to a handful of large companies. That cash arrives with no really accurate reporting because reporting would require reversing the bias against licensing and reporting that was built into the Facebook systems to begin with.

A bias that is almost certainly going to be extended into the Facebook metaverse.

The metaverse is likely going to be a place where everything is for sale and product placements abound. The level of data collection on individuals will likely increase exponentially. Consider this TechCrunch description of “Project Cambria,” the Metaverse replacement for the standard VR headset:

Cambria will include capabilities that currently aren’t possible on other VR headsets. New sensors in the device will allow your virtual avatar to maintain eye contact and reflect your facial expressions. The company says that’s something that will allow people you’re interacting with virtually to get a better sense of how you’re feeling. Another focus of the headset will be mixed-reality experiences. With the help of new sensors and reconstruction algorithms, Facebook claims Cambria will have the capability to represent objects in the physical world with a sense of depth and perspective.

If past is prologue, the Metaverse will exhibit an even greater disregard for human rights and the laws that protect us than Facebook. That anti-human bias will be baked into the architecture and the AI that supports it. The machines don’t look kindly on those pesky humans and all their petty little rights that stand in the way of the AI getting what it wants.

If you don’t think that’s true, try reading the terms of service for these platforms. Or consider why the technocrats are so interested in safe harbors where their machines can run free of liability for collateral damage. The terms of service should make clear that AI has greater rights than you. We are way beyond pronouns now.

If the only concern of AI ethics is protection against stereotypes or insurance redlining (a version of the social credit score), we will be missing huge fundamental parts of the bias problem. Should we be content if AI is allowing its owner (for so long as it has an owner) to otherwise rob us blind by taking our property or selling our data while using the right pronoun as it geeks around the nation state?