💰 Selling user data to train AI

Luiza's Newsletter #93

Mar 05, 2024

👋 Hi, Luiza Jarovsky here. Welcome to the 93rd edition of this newsletter on privacy, tech & AI, read by 18,809 email subscribers in 115+ countries.

📲 For daily privacy, tech & AI content, join 64,595 followers on LinkedIn, X, YouTube, Instagram, Threads & TikTok (and come say hi).

A special thanks to MineOS, this week's sponsor. Check out their guide:

AI governance is all the rage, but given our limited understanding of how exactly data governance will shift to manage AI, we need to start at the beginning to make real progress on safe AI. What are the risks of using AI-powered systems, and how can companies detect AI usage and its risks? Only after answering these questions can we approach AI governance from a position of strength. Get a free guide from MineOS on assessing the risks of AI systems.

💰 Selling user data to train AI

Recently, there have been important developments in the sale of user data for AI training, and everyone on the internet should know about them.

Automattic, the company behind Tumblr and WordPress, is the latest company selling user data (from blog posts, comments, articles, etc.) to OpenAI and Midjourney.

Last week, Automattic published a public statement titled "Protecting User Choice."

My comments:

➳ blocking major AI platform crawlers by default is an interesting proactive step;

➳ the statement's last bullet, about sharing only public content, is extremely important, especially after 404 Media reported that Automattic had mistakenly shared "private posts on public blogs, posts on deleted or suspended blogs, unanswered asks, private answers, posts that are marked ‘explicit’" and other content that should not have been shared;

➳ the fact that no laws or obligations force web crawlers to respect platform or user preferences is problematic, as we are unfortunately still in the AI Wild West. AI companies can say "oops, I crawled that by mistake" at any time.

This is one more deal to sell user data for AI training, and others are happening too.

Reddit is licensing its content to Google for AI training in a deal reportedly worth $60 million per year. And in 2023, we learned that Shutterstock had signed a new six-year agreement with OpenAI to provide images to train DALL-E.

Regarding these deals: it's a positive development that AI companies are looking for alternative sources of training data, as scraping raises many questionable issues, including privacy, as I've discussed in previous editions of this newsletter.

However, it surprises me that no revenue-sharing agreements with users are being proposed. Platforms such as Reddit, Tumblr and WordPress are mere hosts of user-created content, and the original deal, when we started using those platforms and posting content, did not involve AI training.

They've shifted their business models to cash in on the current generative AI wave. In that context, especially given that a) this was not the deal when we created our accounts and started posting content, and b) the content being sold to AI companies is publicly available, user-created content, my opinion is that users should be compensated.

If you have a blog, post on social networks, or post, comment, or upload content anywhere on the internet, beware: your content is already being used, or soon will be, to train AI. If you don't agree with that, I recommend checking your alternatives (opting out, or moving to platforms that allow you to opt out).

Watch my 2-minute video about the topic here. To learn more about this and other emerging issues in privacy, tech & AI, join my 4-week Bootcamp starting next week (more info below). It would be great to see you there.

🎓 Emerging challenges in privacy, tech & AI

While deals to supply AI training data continue, essential privacy-related questions remain unanswered, and privacy & tech professionals should get ready for new challenges in their fields.

If you are looking for professional development opportunities that will help you navigate these new scenarios, register for the 4th edition of our 4-week Bootcamp on Emerging Challenges in Privacy, Tech & AI, starting on March 13. You can read more about the program and sign up here.

If you work for a company: you can ask your manager to book a private Bootcamp for your team (up to 30 people). Contact us if you are interested or if you have any questions.

I personally designed the Bootcamp to be as up-to-date and informative as possible, with enough learning material to feed your curiosity for four weeks. I'll also be leading the sessions, and I hope to see you there!

🔥 AI Briefing: legal analyses on trending AI topics

Some of the hottest AI topics this week are:

1. AI-based election misinformation
2. Elon Musk vs. Sam Altman
3. EU vs. OpenAI
4. Mistral

Yesterday, paid subscribers received the weekly AI Briefing with my analysis of each of the topics above. To stay up to date, start your week with the AI Briefing in your inbox, and support my work: upgrade to paid. Thank you!

🎤 This Thursday: live session on AI governance

If you are interested in AI, you can't miss our upcoming live session. I invited four experts (Alexandra Vesalga, Kris Johnston, Katharina Koerner, and Ravit Dotan) to discuss emerging issues in AI governance with me. This will be a fascinating session, full of practical and actionable insights. Read more about the topics we'll cover, register here, and join us live on March 7.

📚 AI Book Club: 780+ members

Interested in AI? Love reading and would like to read more? Our AI Book Club is for you! There are 780+ people registered, and we are currently reading the book “Unmasking AI” by Dr. Joy Buolamwini. We'll meet next week, on March 14, to discuss it. During the 1-hour meeting, five book commentators will share their perspectives, and everybody can join the discussion. Join the AI Book Club here.

🔍 Kal-AI-doscope #4

If tech companies are selling publicly available user data for AI training, users should be compensated. I explain why in this 2-minute video:

Every week, I share a short video with my commentary on an AI-related topic. Watch the full Kal-AI-doscope playlist here.

🤖 Job opportunities

If you are looking for a job or know someone who is, check out our privacy job board and our AI job board, containing hundreds of open positions. In addition to that, we send a weekly email alert with selected job openings; visit the links above and subscribe.

If you have comments on this week's edition of the newsletter, I'll be happy to hear them! Write to me, and I'll get back to you soon.

From human to human, I wish you a great week, and see you next Tuesday.

All the best, Luiza
