AI Companies: The GDPR Is Knocking
A deep dive into EDPB Opinion 28/2024 | Paid-Subscriber Only | Edition #157
Hi, Luiza Jarovsky here. Welcome to the 157th edition of this newsletter on AI policy, compliance & regulation, read by 42,800+ subscribers in 160+ countries. I hope you enjoy reading it as much as I enjoy writing it.
This is a paid subscriber-only edition featuring my weekly analyses of AI compliance and regulation topics, which you won't find anywhere else. It's an excellent way to stay ahead in the fast-evolving field of AI governance. If you're not a paid subscriber, you can upgrade your subscription here.
Level up your career! This January, join me for the 16th cohort of our acclaimed AI Governance Training (8 live lessons; 12 hours total). Over 1,000 professionals have already benefited from our programs; don't miss this opportunity. Students, NGO members, and professionals in career transition can request a discount.
AI Companies: The GDPR Is Knocking
Yesterday, when many of us were already thinking about our end-of-year break, the European Data Protection Board (EDPB) published the much-awaited Opinion 28/2024, covering essential topics in the context of AI companies' data protection compliance. Here's what you need to know:
1️⃣ AI, Legitimate Interest, and Why Opinion 28/2024 Matters
If you've read this newsletter over the past two years (or participated in my AI Governance Training program), you know I've been addressing AI companies' data protection compliance since November 2022. It's clear to me, and to many other experts in the field, that AI companies have failed to comply with several core provisions of the EU General Data Protection Regulation (GDPR).
One of the most important topics in this context is compliance with Article 6 of the GDPR, which governs the lawfulness of data processing. The GDPR establishes that to process personal data, you must rely on one of the lawful grounds listed in the article. In commercial contexts, companies that process personal data typically rely on consent (Art. 6(1)(a)), contract (Art. 6(1)(b)), or legitimate interest (Art. 6(1)(f)).
Regarding AI companies developing AI models, it has become evident in recent months that most of them rely on legitimate interest (see, for example, this page from OpenAI's website, particularly the section titled “How does the development of ChatGPT comply with privacy laws?”).
The problem is that legitimate interest is not a magic wand, and to be compliant, three cumulative conditions must be fulfilled:
“First, the pursuit of a legitimate interest by the controller or by a third party;
Second, the need to process personal data for the purposes of the legitimate interest(s) pursued; and
Third, the interests or fundamental freedoms and rights of the concerned data subjects do not take precedence over the legitimate interest(s) of the controller or of a third party.”
The main challenge for AI companies has been complying with the third condition in the assessment above, also called the balancing test. To pass this test, AI companies must, among other things, comply with GDPR principles and respect data subjects' interests, fundamental rights, and freedoms. Companies must also assess the impact of the processing on data subjects, as well as data subjects' reasonable expectations.
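For readers who think in code, the cumulative structure of the test can be sketched as a simple conjunction. The Python below is purely illustrative, and every name in it is hypothetical: the actual assessment is a legal analysis, not a computation.

```python
# Purely illustrative sketch of the GDPR legitimate-interest test's
# cumulative structure. All names here are hypothetical; the real
# assessment is a legal analysis, not a computation.
from dataclasses import dataclass


@dataclass
class LegitimateInterestAssessment:
    interest_is_legitimate: bool   # condition 1: a legitimate interest is pursued
    processing_is_necessary: bool  # condition 2: processing is needed for that interest
    balancing_test_passed: bool    # condition 3: data subjects' rights do not take precedence

    def provides_lawful_ground(self) -> bool:
        # The three conditions are cumulative: failing any one of them
        # (in practice, usually the balancing test) defeats the legal basis.
        return (
            self.interest_is_legitimate
            and self.processing_is_necessary
            and self.balancing_test_passed
        )


# Example: a company that fails the balancing test has no lawful ground.
assessment = LegitimateInterestAssessment(True, True, False)
print(assessment.provides_lawful_ground())  # False
```

Note that there is no partial credit: one failed condition defeats the entire legal basis.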
Most AI companies developing AI models today fail to pass the balancing test. As a consequence, they lack a lawful ground to process personal data for training AI in the EU and are not GDPR compliant.
This has been the case since the beginning of the ongoing generative AI wave, which started in November 2022 when ChatGPT was launched. This is also unsustainable: if EU data protection law matters, it must be enforced, irrespective of the technology.
EU authorities, particularly the European Data Protection Board (EDPB), have been aware of this issue and knew that the ball was in their court.
If AI models (and the AI systems based on them) offered in the EU are not GDPR compliant, there are three available options for EU authorities:
1. EU authorities start enforcement actions against these AI companies until they start complying with the GDPR;
2. EU authorities explain what AI companies must do to comply with the GDPR and the specific measures that will be accepted;
3. EU authorities publish official opinions or guidelines explaining why the GDPR is interpreted differently in the context of AI models' development and deployment, or why it does not apply in those cases.
In May 2024, the EDPB published a document titled “ChatGPT Taskforce Report,” in which they clarified to the public that they were aware of the issues I mentioned above and were looking into them, offering initial thoughts on potential solutions.
As I have commented a few times in this newsletter, the report was incomplete, and more guidelines were needed: ideally, a comprehensive EDPB Opinion addressing the topic.
The official document from the EDPB clarifying these issues has now arrived and was published yesterday. It's called Opinion 28/2024.
The Opinion also makes it clear that the path chosen by the EDPB is the one I described in item 2: “explain what AI companies must do to comply with the GDPR and the specific measures that will be accepted.” What AI companies will soon realize, and what I will clarify below, is that this represents the strictest alternative for them. The EDPB and EU authorities are well aware of AI-powered technology, its capabilities, and its limitations, and they have established the strictest feasible measures. There will be no place for “it's not feasible” excuses or claims of “AI exceptionalism.” The EDPB has made clear to AI companies that the GDPR is knocking.
2️⃣ What Issues Does Opinion 28/2024 Cover?
As the title clarifies, it covers data protection issues related to the processing of personal data in the context of AI models and aims to answer the four questions below:
➤ when and how an AI model can be considered as “anonymous”;
➤ how controllers can demonstrate the appropriateness of legitimate interest as a legal basis in the development phase;
➤ how controllers can demonstrate the appropriateness of legitimate interest as a legal basis in the deployment phase;
➤ what are the consequences of the unlawful processing of personal data in the development phase of an AI model on the subsequent processing or operation of the AI model.
The focus of the EDPB opinion is how companies can comply with legitimate interest and, consequently, have a lawful ground to process personal data according to the GDPR.
The Opinion does not cover all EU data protection issues in the context of AI development and deployment. A company may, for example, satisfy the legitimate interest requirements while still failing to comply with other GDPR provisions.
The first point discussed in the Opinion, “when and how an AI model can be considered as anonymous,” addresses the cases in which the GDPR will not apply. In these cases, legitimate interest is irrelevant: if the model is anonymous, there is no personal data processing, and the GDPR does not apply.
The fourth point discussed in the Opinion (“what are the consequences of the unlawful processing of personal data in the development phase of an AI model on the subsequent processing or operation of the AI model”) is extremely relevant. When we think of the AI value chain and AI commercial strategies, AI models are usually made available for integration into downstream AI systems. If an AI model is deemed unlawful (for instance, due to non-compliance with legitimate interest), and the consequence is that all AI systems based on this model are also unlawful, the entire ecosystem will fall apart.
3️⃣ What Are the Most Important Conclusions of Opinion 28/2024?
1. When and how an AI model can be considered as “anonymous”
Before the EDPB clarifies the conditions for a model to be considered anonymous, it states the following:
“First and foremost, the EDPB would like to provide the following general considerations. AI models, regardless of whether they are trained with personal data or not, are usually designed to make predictions or draw conclusions, i.e. they are designed to infer. Furthermore, AI models trained with personal data are often designed to make inferences about individuals different from those whose personal data were used to train the AI model. However, some AI models are specifically designed to provide personal data regarding individuals whose personal data were used to train the model, or in some way to make such data available. In these cases, such AI models will inherently (and typically necessarily) include information relating to an identified or identifiable natural person, and so will involve the processing of personal data. Therefore, these types of AI models cannot be considered anonymous. This would be the case, for example, (i) of a generative model fine-tuned on the voice recordings of an individual to mimic their voice; or (ii) any model designed to reply with personal data from the training when prompted for information regarding a specific person.”
and
“In sum, the EDPB considers that, for an AI model to be considered anonymous, using reasonable means, both (i) the likelihood of direct (including probabilistic) extraction of personal data regarding individuals whose personal data were used to train the model; as well as (ii) the likelihood of obtaining, intentionally or not, such personal data from queries, should be insignificant for any data subject. By default, SAs [Supervisory Authorities] should consider that AI models are likely to require a thorough evaluation of the likelihood of identification to reach a conclusion on their possible anonymous nature. This likelihood should be assessed taking into account ‘all the means reasonably likely to be used’ by the controller or another person, and should also consider unintended (re)use or disclosure of the model.” (page 16)
Pay close attention to the EDPB's wording in these passages. The EDPB and most EU data protection authorities view anonymity very narrowly, especially given the studies showing that it's almost impossible to truly anonymize personal data.
In the second quote above, the EDPB states that the likelihood of identification should consider “all the means reasonably likely to be used.” The EDPB did not write “most means”; they wrote “all the means.” This will be extremely difficult to prove, especially given the ongoing developments in identification tactics, including adversarial attacks.
As a consequence, anonymity claims will be strongly scrutinized, and my guess is that very few AI companies, if any, will manage to claim that their AI models are anonymous. In general, the GDPR will apply to AI model development and deployment, and compliance with legitimate interest will be relevant for most AI companies.
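The two-pronged structure of this anonymity test can be sketched in the same illustrative spirit. Everything below is hypothetical, including the function name and the numeric threshold; the EDPB prescribes a case-by-case legal assessment, not a computation.

```python
# Purely illustrative sketch of the two-pronged anonymity test described
# in Opinion 28/2024. The function name and threshold are hypothetical;
# the EDPB prescribes a case-by-case legal assessment, not a computation.

def model_may_be_anonymous(
    extraction_likelihood: float,        # prong (i): direct (incl. probabilistic) extraction
    query_disclosure_likelihood: float,  # prong (ii): obtaining personal data via queries
    insignificance_threshold: float = 0.01,  # hypothetical placeholder; the EDPB sets no number
) -> bool:
    # Both likelihoods must be insignificant for ANY data subject,
    # assessed against "all the means reasonably likely to be used",
    # including unintended (re)use or disclosure of the model.
    return (
        extraction_likelihood < insignificance_threshold
        and query_disclosure_likelihood < insignificance_threshold
    )


# Failing either prong defeats the anonymity claim, and the GDPR applies.
print(model_may_be_anonymous(0.2, 0.001))  # False: extraction risk is not insignificant
```

Given the EDPB's narrow reading of “insignificant,” the practical bar for passing both prongs is, as noted above, very high.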
2. How controllers can demonstrate the appropriateness of legitimate interest as a legal basis in the development phase
As I mentioned above, the main challenge lies in complying with the balancing test (the third step of the legitimate interest three-part test). In this regard, the EDPB proposes a list of measures that may help mitigate risks and tip the balancing test in favor of the AI company. These measures are (see pages 28–30 for further details):