Will regulation leave the EU out of the AI party?
The Privacy Whisperer #52
Today's newsletter is sponsored by Ubiscore:
Do you want to ensure that your business partners respect privacy regulations? Look no further than Ubiscore's public privacy database, which offers independent compliance ratings for over 5,000 startups. The database is expanding to include the entire DACH region, Italy, and other GDPR countries, providing a comprehensive view of how companies handle personal data. With just a few clicks, you can determine if your potential business partners or vendors comply with privacy regulations. Instead of taking their word for it, consider our evidence. Take action and assess your business’ or competitors’ privacy score now.
🔥 The CNIL's action plan: when data protection meets AI
This week, the French Data Protection Authority - CNIL - published its four-step action plan on AI, and it is the best official document I have seen so far that deals with the intersection between data protection and AI. The four steps highlighted in this action plan are: 1. Understanding the functioning of AI systems and their impacts on people; 2. Allowing and guiding the development of AI that respects personal data; 3. Federating and supporting innovative players in the AI ecosystem in France and Europe; and 4. Auditing and controlling AI systems to protect people. For practitioners in the fields of data protection & AI, this is a must-read document. Some of the aspects of the CNIL's document that the reader should pay special attention to are: a) the mention of protecting publicly available data against scraping; b) the need to conduct a data protection impact assessment for AI systems that process personal data - on the topic, check out this infographic I prepared, illustrating some of the intersections between the AI Act and the GDPR; and c) the CNIL's upcoming doctrinal work on the intersection between data protection and AI, such as recommendations on how to deal with data subjects' rights in the context of AI.
🔥 Is synthetic data an effective privacy-preserving tool?
There is a lot of enthusiasm around synthetic data, with some saying that it could potentially be better than real data in the sense of helping to preserve privacy, reflect more diversity, expand existing AI uses, improve AI performance, and even democratize AI research. Synthetic data is produced from a sample of real-world data; according to Mostly AI, “the algorithm first learns the patterns, correlations and statistical properties of the sample data. Once trained, the generator can create statistically identical, synthetic data.” Although it is still a “niche” field today, Gartner has estimated that, by 2030, synthetic data will completely overshadow real data in the context of AI models. Various companies currently offer synthetic data-based AI tools, which can be applied in the training phase or used as a final B2B or B2C product. Despite the advantages, synthetic data also seems to have meaningful downsides. According to Syntheticus, some of the main issues are lower accuracy, lower effectiveness for some AI applications, validation challenges, and persistent bias and privacy concerns. In this sense, Theresa Stadler has argued that “synthetic data is far from the holy grail of privacy-preserving data publishing. The promise that it provides a higher gain in privacy than traditional sanitization at a lower cost in utility rarely holds true.”
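To make the "learn the statistical properties, then sample" idea concrete, here is a minimal, purely illustrative sketch in Python. It uses a toy Gaussian model rather than the deep generative models commercial vendors actually use, and all names and the age/income dataset are hypothetical; Stadler's point is precisely that such statistically faithful samples can still leak information about the original individuals.

```python
import numpy as np

def fit_synthetic_generator(real_data: np.ndarray):
    """Learn simple statistical properties (mean and covariance) of the
    sample data - a toy stand-in for a trained generative model."""
    mean = real_data.mean(axis=0)
    cov = np.cov(real_data, rowvar=False)
    rng = np.random.default_rng(0)

    def generate(n_records: int) -> np.ndarray:
        # Draw new records from the learned distribution: statistically
        # similar to the sample, but no row corresponds to a real person.
        return rng.multivariate_normal(mean, cov, size=n_records)

    return generate

# Hypothetical "real" dataset: 500 records with two correlated
# attributes, e.g. age and income.
rng = np.random.default_rng(42)
ages = rng.normal(40, 10, 500)
incomes = 1000 * ages + rng.normal(0, 5000, 500)
real = np.column_stack([ages, incomes])

generate = fit_synthetic_generator(real)
synthetic = generate(500)
# Aggregate statistics (means, the age/income correlation) are
# preserved, even though every synthetic row is newly sampled.
```

The privacy question then becomes whether the learned model memorizes outliers from the training sample, which is why validation remains one of the challenges listed above.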
🔥 Will regulation leave the EU out of the AI party?
As I wrote on Monday, Google has recently launched Bard - its large language model (LLM) chatbot. It is available in three languages (US English, Japanese, and Korean) and in 180 countries. Curiously, it is not available in Canada or in any of the 27 EU member states.