The legal framework for AI is being built in real time, and a ruling in the Sarah Silverman case should give publishers pause

27.11.2023 22:55

When the comedian Sarah Silverman sued Meta over its AI model LLaMA this summer, it was pretty big news. (And that is, of course, kind of the point. Silverman is actually one of three co-plaintiffs in the case, but not as many people will click a headline about Kill City Blues author Richard Kadrey or Father Gaetano’s Puppet Catechism author Christopher Golden.)

But it didn’t get as much attention last week when a federal judge dismissed most of it — and set a high bar to prove what remained.

To be clear: The legal framework for generative AI — large language models, or LLMs — is still very much TBD. But things aren’t looking great for the news companies dreaming of billions in new revenue from AI companies that have trained LLMs (in very small part) on their products. While elements of those models’ training will be further litigated, courts have thus far not looked favorably on the idea that what they produce is a copyright infringement.

Silverman’s¹ complaint matters because it is substantially stronger than what news companies might be able to argue in one important way. The overwhelming share of news content is made free for anyone online to read, and purposefully so, by its publishers. Anyone with a web browser can call up a story, a process that necessarily involves a copy of the copyrighted material being downloaded to their device. That publishers make their content available to web users makes it harder to argue that an OpenAI or Meta webcrawler has done special harm.

But Silverman’s copyrighted content in question is a book — specifically, her 2010 memoir The Bedwetter. This is not, importantly, a piece of content made freely available by its publisher to web users. To access The Bedwetter legally in digital form, HarperCollins asks you to pay $13.99.

And we know that Meta did not get its copy of The Bedwetter by spending $13.99. It has acknowledged that its LLM was trained using something called Books3, part of something else called The Pile. It’s a 37-gigabyte file that contains the full text of 197,000 books, sourced from a pirated shadow library called Bibliotik. The Pile mixes those books with another 800 gigs or so of content, including papers from PubMed, GitHub, Wikipedia, and those Enron emails. Large language models need a large amount of language to work, and The Pile became a popular early input in LLM training.

So Sarah Silverman’s book entered Meta’s training data through a pirated copy — something I think most people would consider an obvious copyright violation. (Indeed, The Pile was recently forced to delete Books3 after receiving a takedown notice from a publishers’ group.) That’s a clear advantage her case has, legally, over publishers’ arguments.

Silverman’s initial lawsuit argued that “[b]ecause the output of the LLaMA language models is based on expressive information extracted from Plaintiffs’ Infringed Works, every output of the LLaMA language models is an infringing derivative work, made without Plaintiffs’ permission and in violation of their exclusive rights under the Copyright Act.” Every output. So if you ask the AI “What’s the capital of Iceland?” and it replies “Reykjavík,” Sarah Silverman’s exclusive rights have been violated. And not just hers and her co-plaintiffs’ rights, but also those of “[a]ll persons or entities domiciled in the United States that own a United States copyright in any work that was used as training data for the LLaMA language models.”

Meta responded with a motion to dismiss five of Silverman’s six claims fully (and the sixth partially), arguing that plaintiffs did not point to a single AI-generated output as having infringed their copyrights. (Say, if someone asked the AI “Can you give me a copy of Sarah Silverman’s 2010 memoir The Bedwetter?” and it replied with “Sure, here’s the full text: …”) They argued, predictably, that “[c]opyright law does not protect facts or the syntactical, structural, and linguistic information that may have been extracted from books like Plaintiffs’ during training.” Learning from a book is different from making a “substantially similar” copy of a book.

Silverman’s attorney responded by arguing Meta had ingested her work “not to learn ‘facts or ideas’ from it, but to extract and then imitate the copyrighted expression therein.” There is no need to meet the “substantial similarity” standard Meta points to because “this case is about direct digital copying of entire works…the entire purpose of LLaMA is to imitate copyrighted expression.” (This is a risky argument, since lots of court-approved uses of digital content — from the most basic web browsing to building a search engine — also involve the “direct digital copying of entire works.”)

All these arguments and counterarguments went before Vince Chhabria, a federal district judge in the Northern District of California. And he came down firmly on Meta’s side, granting its motion to dismiss.

What of Silverman’s argument that “LLaMA language models are themselves infringing derivative works” because the “models cannot function without the expressive information extracted?” “This is nonsensical,” Chhabria writes. “A derivative work is ‘a work based upon one or more preexisting works’ in any ‘form in which a work may be recast, transformed, or adapted’…There is no way to understand the LLaMA models themselves as a recasting or adaptation of any of the plaintiffs’ books.”

And the argument that every LLaMA output is itself an “infringing derivative work”? Chhabria rules that “[w]ithout any plausible allegation of an infringing output, there can be no vicarious infringement”:

To the extent that they are not contending LLaMa spits out actual copies of their protected works, they would need to prove that the outputs (or portions of the outputs) are similar enough to the plaintiffs’ books to be infringing derivative works. And because the plaintiffs would ultimately need to prove this, they must adequately allege it at the pleading stage.

Silverman et al. have two weeks to attempt to refile most of the dismissed claims with any explicit evidence they have of LLM outputs “substantially similar” to The Bedwetter. But that’s a much higher bar than simply noting its inclusion in Books3. The remaining complaint — which argues that the actual copying of Books3 at the start of LLaMA’s training was copyright infringement — will now move toward trial. But the standards set by Chhabria’s ruling here — as well as existing case law around transformative acts as fair use — should leave Meta’s lawyers feeling pretty confident.

Chhabria is just one judge, of course, whose rulings will be subject to appeal. And this will hardly be the last lawsuit to arise from AI. But it lines up with another recent ruling, by federal district judge William Orrick, that also threw out the idea of a broad-based liability based on using copyrighted material in training data, saying a more direct copy is required. (“According to the order, the artists will also likely have to show proof of infringing works produced by AI tools that are identical to their copyrighted material. This potentially presents a major issue because they have conceded that ‘none of the Stable Diffusion output images provided in response to a particular Text Prompt is likely to be a close match for any specific image in the training data.’”)

If that is the legal bar — an AI must produce outputs identical or near-identical to existing copyrighted work to be infringing — news companies have a very hard road ahead of them. This summer, a group of publishers started planning a lawsuit against AI companies and, as Semafor put it, they “want billions, not millions.” They’re gonna need a lot of luck.

Look, it’s difficult to calculate a piece of content’s value when it contributes to only a tiny part of a digital enterprise. A few years back, a publisher trade group did some absurdist back-of-envelope math to claim news content was worth $4.7 billion a year to Google. More recently, a different group made some equally strained leaps to say Google and Facebook should cut U.S. publishers an annual check for between $12 billion and $14 billion, based on their “value.” (Nowhere in that analysis, for example, does the phrase “fair use” appear, despite that being the reason, long established by American courts, that Google and Facebook do not need to pay for the right to link to news stories on publishers’ sites. Nor did it ascribe even $1 in value to the traffic those sites drive to news publishers.) But that doesn’t mean you get to invent new copyright law out of whole cloth.

I suspect the news industry’s attempts to get money out of the AI business will look a lot like its attempts to get money out of Google and Facebook. The tech companies will largely win in the courts, but to head off reputational damage, they’ll be more than happy to hand out lots of big cardboard checks. That’ll all be in hopes of preventing something like the forced-payment scheme Australia put into place, legislative action being the one thing that could force their hand. We’ve already seen this pattern rolling out with OpenAI. (And, of course, Google and Facebook already know this playbook, and its limitations.) But if publishers want something more than that, they’ll need to prove specific, concrete harms that have been done to them, not simply the existence or stubborn popularity of search engines, social platforms, or large language models.

Photo of Sarah Silverman on December 24, 2009 by 92YTribeca used under a Creative Commons license.

¹ I’ll refer to this as Silverman’s case here, but as noted, she is only one of three co-plaintiffs. The case’s formal name is Kadrey v. Meta Platforms, Inc., and you can find most filings in it here. Also, you’ll note that Silverman et al. sued not only Meta but also OpenAI, the maker of ChatGPT. OpenAI has made similar arguments to the ones Meta makes here, but a hearing on its motion won’t come until next month.
