👋 Hi, Luiza Jarovsky here. Welcome to the 90th edition of this newsletter on Privacy, Tech & AI. Thanks to 18,300+ email subscribers and 63,000+ social media followers who are with me on this journey!
Yesterday, I launched my Instagram, Threads & TikTok accounts (besides the LinkedIn, X, and YouTube ones). If you use any of these networks, follow me there for more privacy, tech & AI content (and come say hi).
A special thanks to MineOS, this edition's sponsor. Check out their guide:
New Jersey became the 14th state to pass comprehensive data privacy legislation, starting 2024 hot for American data privacy. But the Garden State's law isn't just another state-level carbon copy. With the 2nd most stringent applicability threshold, impact assessments required before data is processed, a never-before-seen expansion of universal opt-out mechanisms, and $10,000 fines per violation, this is a law you'll need to know well. Get the full breakdown in MineOS's guide.
Want to sponsor this newsletter? Get in touch (next available spot: May 14)
Rethinking AI training
I'll begin today's edition with a quote from Mark Zuckerberg, Meta's CEO, during the company's Q4/2023 earnings call, which happened on February 1. You can find the full transcript here.
In the context of Meta's AI strategy, Zuckerberg said:
"On Facebook and Instagram there are hundreds of billions of publicly shared images and tens of billions of public videos, which we estimate is greater than the Common Crawl dataset and people share large numbers of public text posts in comments across our services as well. But even more important than the upfront training corpus is the ability to establish the right feedback loops with hundreds of millions of people interacting with AI services across our products."
I find this quote, and the whole approach to AI training it reveals, unsettling; this should not be an acceptable mindset behind AI development. Let me explain why:
✔ In the quote above, he is "bragging" about the immense volume of "publicly available" images, videos, and text on Facebook and Instagram, ready to be harvested for AI training. So not only was our data harvested for behavioral advertising and to fund ad-based business models (perhaps the biggest scam of the last decade, as people did not grasp this "value exchange"), but the same data will now be harvested again to train their AI models, once more without any notice, choice, or compensation.
✔ Maybe this won't surprise anyone, but all of Facebook's and Instagram's marketing talk about building meaningful connections through their platforms is suddenly trashed by their CEO, who is in practice saying, "We invented this meaningful connection bulls**t to make you share more data so that we can gain a true commercial advantage with the biggest AI model in the world." Is that ethical? As soon as users' personal data became Meta's main leverage point in AI development, shouldn't they have made that clear through their products?
✔ Pay attention to Zuckerberg's language above and how he treats users as unpaid sources of publicly available content to feed their AI models. Any new feature is basically a new trap to capture the data they need.
✔ Out of curiosity, I searched the transcript of the earnings call for the words "transparency," "autonomy," "choice," "consent" (excluding mentions of the FTC consent order), "privacy," "data protection," "ethical," "fairness," and "fair" (excluding their Fundamental AI Research group, called FAIR), and I found exactly zero occurrences. They know this document is publicly available, and they did not even try to hide that they have no concern beyond harvesting as much data as possible to train their AI models.
*
AI regulations and policies around the world should challenge that approach and make sure that people are at the core of AI development, including during the AI training phase. By that, I mean:
✔ Transparency: Any website or app that is open to AI scraping and training should make this clear to its users through a user-friendly notice, especially if it is a social network or any platform where users can post content;
✔ Opt-out: Any AI-based functionality that relies on user input to establish feedback loops (in the context of AI reinforcement training) must make this clear to users and allow them to opt out;
✔ Choice: Users must always have a clear choice not to participate in AI training, whether by leaving the platform, staying but easily deleting all of their (older) content, or actively opting out;
✔ Network effects: Essential services, and those with strong network effects (such as messaging and social media platforms), cannot tell users to quit if they want to avoid AI training; they must allow individual opt-out;
✔ Publicly available: Contextual privacy, in the sense developed by Helen Nissenbaum, is more important than ever. In the age of ubiquitous AI harvesting, "publicly available" must simply mean "available for other people to see." If data is collected and processed for secondary uses, such as AI training, it should be protected;
✔ Legal basis: From an EU perspective, taking into consideration Article 6 of the GDPR, given the unpredictable, maximizing, decontextualized, and profit-driven characteristics of AI development, legitimate interest should not be an acceptable legal basis, as the three-part balancing test does not stand. The only acceptable legal basis should be consent (which is feasible if implemented through the website or platform).
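The opt-out and choice principles above imply a concrete engineering requirement: a training pipeline must exclude content from users who opted out before that content ever enters the corpus. Here is a minimal, purely illustrative Python sketch of such a filter; all names (Post, filter_training_data, opted_out_ids) are hypothetical and do not correspond to any platform's real API.

```python
from dataclasses import dataclass

@dataclass
class Post:
    author_id: str
    text: str
    public: bool  # visible to other people, per the contextual-privacy point above

def filter_training_data(posts, opted_out_ids):
    """Keep only posts that are public AND whose author has not opted out of AI training."""
    return [p for p in posts if p.public and p.author_id not in opted_out_ids]

# Hypothetical example: carol has opted out, bob's post is not public.
posts = [
    Post("alice", "hello world", True),
    Post("bob", "private note", False),
    Post("carol", "public post", True),
]
opted_out = {"carol"}
print([p.author_id for p in filter_training_data(posts, opted_out)])  # ['alice']
```

The point of the sketch is where the check happens: opt-out status is enforced at corpus-construction time, not retrofitted after a model has already been trained on the data.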
*
Technology can be exciting, groundbreaking, and fun. It can help us solve some of the world's most pressing issues, one challenge at a time.
But we must constantly make sure that humans are at its core, including during AI training.
Essential (and free) privacy, tech & AI resources:
✔ Join our AI Book Club (715+ members). We are currently reading "Unmasking AI" by Dr. Joy Buolamwini
✔ Check out and subscribe to our privacy and AI job boards and land your dream job sooner
✔ Watch or listen to our latest podcast episode on dark patterns and online manipulation with Prof. Woodrow Hartzog & Prof. Cristiana Santos
✔ Watch the previous conversations with global experts on privacy & AI
✔ Check out my privacy, tech & AI content on LinkedIn, X, YouTube, and now also on Instagram, Threads & TikTok
Enjoying this newsletter? Send it to friends using your personal link and get a complimentary premium subscription.
10% discount on cohorts 4 & 5 (March)
As the January and February cohorts of our 4-week Privacy, Tech & AI Bootcamp sold out fast, today we are announcing the March cohorts with a special promotion: the first 10 people who write to me will get a 10% discount.
Do you want to learn more about AI, get valuable tools to deal with emerging technological and interdisciplinary challenges, and advance your career? This is your chance.
A reminder that the Bootcamp gives you a certificate of completion and 8 CPE credits pre-approved by the IAPP. You can check out the March dates and the full program here. Let me know if you have any questions.
AI Briefing (for premium subscribers)
Turn off the unnecessary noise and focus on meaningful AI-related issues and trends. Here's my commentary on the most important AI topics this week: