🙈 Can AI Ignore Copyright?

AI Governance Professional Edition | Paid-Subscriber Only | #147

Nov 15, 2024

👋 Hi, Luiza Jarovsky here. Welcome to the 147th edition of this newsletter on AI policy, compliance & regulation, read by 38,500+ subscribers in 155+ countries.

💎 This is an exclusive AI Governance Professional Edition, with my critical analyses of AI compliance and regulation topics, helping you stay ahead in the fast-paced field of AI governance. I hope you enjoy reading it as much as I enjoy writing it.

❄️ Don't miss my intensive winter training! This December, join me for a 3-week AI Governance Training (8 live lessons; 12 hours total), already in its 15th cohort. Join over 1,000 people who have benefited from our training programs. Learn more and register here.

Copyright infringement in the context of AI is a fascinating topic. I have covered it numerous times in this newsletter, including the top 10 papers, the lawsuits that have been piling up, and how the EU AI Act deals with it.

To begin with, if we consider the current Generative AI wave as having started on 30 November 2022, when OpenAI launched ChatGPT, we are almost two years into it. From academic papers to lawsuits and heated debates, much has been said about the topic. Nevertheless, the main issues are far from settled, including in the EU, where most experts thought artists and creators were safe from AI intrusion.

Today, I want to highlight the main unresolved issues in the context of AI and copyright infringement, as well as three recent legal decisions that may signal how things will evolve in the coming months and help us understand the future of copyright in the age of AI.

╰┈➤ What are the main issues at stake?

Looking at both copyright law and AI development, and keeping in mind the content of recent lawsuits (which have evolved over the last two years), the main contentious issues today are, in simple terms:

➜ The unauthorized use of copyrighted material to train AI;

➜ The removal of Copyright Management Information (CMI) to train AI;

➜ Infringing outputs from the AI system (due to being trained using copyrighted works).

How the topics above are dealt with will vary from country to country. Here are the main copyright arguments from the perspective of the United States and the European Union legal frameworks:

🇺🇸 United States

➜ Regarding the unauthorized use of copyrighted material to train AI, the main argument raised by AI companies is that it's a form of fair use, meaning that it would be a legal exception to copyright protection.

To justify that, one of the arguments raised is that what happens during the training phase is not “copying” per se, or at least not the type of copying that copyright law restricts.

Companies have said that during the training phase, the AI model/AI system learns the patterns in the data but does not copy the data itself. For example, in a recent lawsuit by major record labels against AI music generator Suno AI, the AI company responded that the copies of the songs made during AI training are:

“made solely to analyze the sonic and stylistic patterns of the universe of pre-existing musical expression”

and that

“It is fair use under copyright law to make a copy of a protected work as part of a back-end technological process, invisible to the public, in the service of creating an ultimately non-infringing new product.”

➜ Regarding infringing outputs from the AI system (due to being trained using copyrighted works), AI companies have argued that if the output is similar to an existing work, it's because the AI system “was inspired” by the patterns and style of the original work, the same way that the work of other artists can inspire an artist or writer.

They often say there can't be plagiarism because the original work was never “copied”; rather, only its “tokenized” version was used to train the AI model or system.

For example, in the same lawsuit filed by record labels against Suno AI, the AI company argues:

“(…) the outputs of tools like Suno, which do not reprise “the actual sounds fixed” in any “recording” owned by any record label, are not and cannot be even prima facie copyright infringements. The outputs generated by Suno are new sounds, informed precisely by the ‘styles, arrangements and tones’ of previous ones. They are per se lawful.”

🇪🇺 European Union

In the EU, the answers to the issues above will usually focus on the provisions of the EU Copyright Directive (Directive (EU) 2019/790), which each EU Member State was required to transpose into national law.

In this directive, articles 3 and 4 deal with text and data mining exceptions, which have been applied in the AI context.

➜ Article 3 specifies the text and data mining exception for the purposes of scientific research:

“1. Member States shall provide for an exception to the rights provided for in Article 5(a) and Article 7(1) of Directive 96/9/EC, Article 2 of Directive 2001/29/EC, and Article 15(1) of this Directive for reproductions and extractions made by research organisations and cultural heritage institutions in order to carry out, for the purposes of scientific research, text and data mining of works or other subject matter to which they have lawful access. (…)”

The entities who benefit from this exception can lawfully use copyrighted works in the context of text and data mining (including AI). To be able to benefit from this exception, the directive specifies two essential elements:

→ the existence of a research organization or cultural heritage institution;

→ carrying out, for the purposes of scientific research, text and data mining of works or other subject matter to which they have lawful access.

➜ Article 4 of the same directive, on the other hand, establishes a broader exception for text and data mining, as the organization benefiting from it does not need to be a research organization or cultural heritage institution.

However, in this case, the entity using copyrighted material for text and data mining will have to respect the copyright owner's reservation of rights. I quote below the first and third paragraphs of this article:

“1. Member States shall provide for an exception or limitation to the rights provided for in Article 5(a) and Article 7(1) of Directive 96/9/EC, Article 2 of Directive 2001/29/EC, Article 4(1)(a) and (b) of Directive 2009/24/EC and Article 15(1) of this Directive for reproductions and extractions of lawfully accessible works and other subject matter for the purposes of text and data mining.”

and

“3. The exception or limitation provided for in paragraph 1 shall apply on condition that the use of works and other subject matter referred to in that paragraph has not been expressly reserved by their rightholders in an appropriate manner, such as machine-readable means in the case of content made publicly available online. (…)”

Article 4 of the Copyright Directive clarifies that there is a copyright exception for text and data mining on the condition that the copyright owner has not exercised their reservation of rights. It means that if the copyright owner has stated that they don't want their work to be used for text and data mining (e.g., to train AI), then an AI company cannot rely on this exception to use the copyrighted material (and if it uses the material anyway, the use will be considered unlawful).
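What could a “machine-readable” reservation look like in practice? The directive does not prescribe a single format. One commonly discussed option is a robots.txt file, so here is a minimal Python sketch under that assumption. The crawler name ExampleTDMBot, the file content, the example.com URL, and the may_mine helper are all hypothetical and only for illustration. Whether a robots.txt rule legally counts as a valid reservation of rights is a separate question that this sketch does not answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a rightholder might publish: it blocks one
# (made-up) text-and-data-mining crawler and allows everyone else.
ROBOTS_TXT = """\
User-agent: ExampleTDMBot
Disallow: /

User-agent: *
Allow: /
"""


def may_mine(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt does not block user_agent from url.

    Assumption: the reservation of rights is expressed through robots.txt.
    This is a technical check only, not a legal assessment.
    """
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines (no network access).
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/1"
    print("ExampleTDMBot allowed:", may_mine(ROBOTS_TXT, "ExampleTDMBot", page))
    print("SearchIndexer allowed:", may_mine(ROBOTS_TXT, "SearchIndexer", page))
```

A crawler that respects the opt-out would run a check like this before collecting a page and skip any URL for which the result is False.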

╰┈➤ Three recent legal decisions that might shape the future of copyright

Above, we saw the legal background of the U.S. and the EU on the topic. Now, let's take a look at the lawsuits.

First, a reminder that there are still dozens of ongoing AI copyright lawsuits, the majority in the U.S., which have not been decided yet (you can use this website to track recent developments, month-to-month).

Among the lawsuits in which a court has already issued a decision, there are three that everyone in AI should know, as they might change the course of the AI copyright debate. Let's take a look at them.

1️⃣ Robert Kneschke vs. LAION (Germany)

A German court has recently dismissed the AI copyright lawsuit filed by photographer Robert Kneschke against LAION for using his images to train AI without his consent.

The photographer filed the AI copyright lawsuit against the non-profit LAION (Large-scale Artificial Intelligence Open Network) in April 2023, arguing that his images were used to build the LAION-5B dataset, which is used to train AI, without his consent.

The Hamburg Regional Court dismissed the infringement allegations and decided that LAION's use of Robert Kneschke’s images fell under Section 60d of the German Copyright Law (implementing Article 3 of the EU Copyright Directive, covered above), which establishes an exception for text and data mining for the purposes of scientific research.

This is what Section 60d of the German Copyright Law says:

"Text and data mining for scientific research purposes
(1) It is permitted to make reproductions to carry out text and data mining (...) for scientific research purposes in accordance with the following provisions.
(2) Research organisations are authorised to make reproductions.
ʻResearch organisationsʼ means universities, research institutes and other establishments conducting scientific research if they
1.  pursue non-commercial purposes,
2.  reinvest all their profits in scientific research or
3.  act in the public interest based on a state-approved mandate. (...)"

➜ Why has this decision shocked many copyright experts?

The court based its decision on Section 60d (Article 3 of the Copyright Directive), which reduces the control creators can have over their work, as it does not allow the reservation of rights established in Section 44b (which implements Article 4 of the Copyright Directive).

This means that even if a creator opts out and states upfront (in a machine-readable format) that they don't want their work used to train AI, this reservation of rights is not applicable, as the scientific research exception was invoked.
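To make the interaction between the two exceptions concrete, here is a rough Python sketch of the logic as summarized in this newsletter. It is a simplification for illustration, not legal advice and not a statement of how any court applies the law. The MiningUse class, its field names, and the applicable_exception function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MiningUse:
    """Simplified facts about one text and data mining use (hypothetical model)."""
    lawful_access: bool
    research_or_heritage_org: bool
    scientific_research_purpose: bool
    rights_reserved: bool  # the creator opted out, e.g. in machine-readable form


def applicable_exception(use: MiningUse) -> Optional[str]:
    """Return which exception would cover the use under this simplified reading."""
    if not use.lawful_access:
        return None
    # Scientific research exception: the opt-out is not part of the test.
    if use.research_or_heritage_org and use.scientific_research_purpose:
        return "Article 3 (scientific research)"
    # General exception: only available if rights were not reserved.
    if not use.rights_reserved:
        return "Article 4 (general text and data mining)"
    return None


if __name__ == "__main__":
    research_use = MiningUse(True, True, True, rights_reserved=True)
    commercial_use = MiningUse(True, False, False, rights_reserved=True)
    print(applicable_exception(research_use))
    print(applicable_exception(commercial_use))
```

In this simplified model, the same opt-out blocks the commercial use but has no effect on the research use, which is the outcome that surprised many creators.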

➜ What might this decision mean for the future of copyright in the EU?

Many creators believed that the reservation of rights mechanism established by the EU Copyright Directive was a safe way to ensure they would remain in control and be able to opt out of AI training. This decision seemingly expanded the potential application of Section 60d and created doubts about the interpretation of EU copyright law in the context of AI training.

In any case, Robert Kneschke can still appeal.

2️⃣ Raw Story & AlterNet vs. OpenAI (US)

Last week, a U.S. judge sided with OpenAI in an AI copyright lawsuit filed by Raw Story and AlterNet over copyright management information (CMI) removal.

In simple terms, the lawsuit filed by Raw Story & AlterNet rested on the claims below:

→ OpenAI removed copyright management information (CMI), infringed the Digital Millennium Copyright Act (DMCA), and caused harm;

→ They needed a court order to stop OpenAI from continuing to do that.

The judge was not convinced and dismissed both claims, siding with OpenAI. The judge concluded that plaintiffs actually wanted to sue OpenAI over the lack of compensation for using their content to train AI. Here's what the judge said:

"Let us be clear about what is really at stake here. The alleged injury for which Plaintiffs truly seek redress is not the exclusion of CMI from Defendants' training sets, but rather Defendants' use of Plaintiffs' articles to develop ChatGPT without compensation to Plaintiffs. See Compl. ¶ 57 ("The OpenAI Defendants have acknowledged that use of copyright-protected works to train ChatGPT requires a license to that content, and in some instances, have entered licensing agreements with large copyright owners ... They are also in licensing talks with other copyright owners in the news industry, but have offered no compensation to Plaintiffs."). Whether or not that type of injury satisfies the injury-in-fact requirement, it is not the type of harm that has been "elevated" by Section 1202(b)(i) of the DMCA. (...). Whether there is another statute or legal theory that does elevate this type of harm remains to be seen. But that question is not before the Court today."

Key contested issues in AI copyright, such as consent, compensation, and infringing outputs, were not covered in this decision (plaintiffs can still appeal).

➜ What might this decision mean for the future of copyright in the US?

In the decision, the judge made clear that the claims, as framed, did not match the harm the plaintiffs were really seeking to redress: what they wanted was to be compensated for having their content used to train AI.

My personal reading of this decision, keeping in mind the various recent deals between AI companies and media companies (where media companies agree to license their content to train AI), is that the judge was ready to accept the lack of compensation claim. If the lawyers successfully appeal or amend the lawsuit along these lines, in my view, they'll probably win.

Extrapolating from this decision to a broader legal scenario, and having read recent lawsuits on the topic and seen how the claims have evolved, it's realistic to expect this to become a broader trend, in which AI companies will have to close licensing deals with the owners of datasets to gain access to them.

In the case of artists and creators, we can imagine broad associations of creators being formed to build specific datasets for AI training, where those who wish to be part of licensing deals (and be compensated for that) can choose to have their works included in the dataset.

➜ The problem of infringing outputs

As I mentioned, my arguments above deal with licensing copyrighted works to train AI, and they don't cover the issue of infringing outputs.

For the licensing model to work well for creators, AI companies would have to make additional commitments in the form of strong guardrails against infringing outputs.

If a deal includes commitments against infringing outputs and the AI system still generates them, the AI company should be held liable.
