Reddit Is Making a Deal With the AI Devil

The social media giant's $60 million real-time data deal with Google is the latest example of internet companies selling out users who have nominally "consented" to share their personal information but have no control over it. Blockchains and ZK-proofs could help prevent corporate overreach, Nym CEO and privacy advocate Harry Halpin writes.

Mar 14, 2024 at 3:39 p.m. UTC
Updated Mar 14, 2024 at 3:41 p.m. UTC

Selling user data to artificial intelligence (AI) companies is simply mass surveillance under a new guise. People are rightfully worried about governmental mass surveillance. Yet most people are blithely unaware of the surveillance they sign up to when opening an account with a Web 2.0 company.

Harry Halpin is the CEO and co-founder of Nym Technologies.

Lately, we have all been forced to accept new “terms of service.” What most people don’t know is that these contracts allow their raw data to be sold to train AI models. The latest of these data heists is between Reddit and Google, in which Reddit gives Google real-time data for a reported $60 million.

This hits me personally. The late Aaron Swartz, Reddit’s co-founder, would be spinning in his grave if he knew of this deal.

Just as Soylent Green turned out to be made of people, AI models are made of data created by humans. Every time you contribute data to a platform like Reddit or Instagram, the company captures and owns it. It can then sell all of it under the conditions to which you have “consented.” Of course, no one reads these terms: they are long, tedious and often purposefully inscrutable.

Generative AI models compete on training data, and the more data the better. Yet some of this data may be copyrighted or even personal. No wonder there are so many lawsuits, like the one the New York Times has brought against OpenAI. While it’s true that AI models retain only statistical representations of the data, the right prompt may elicit the underlying data itself. This can in turn reveal potentially private information.

A safer situation for everyone would be if AI companies trained only on publicly available data whose creators gave consent, and consent can only be meaningful if users control their data.

The real problem is that when you put data on social media sites like Reddit, your data becomes the product. So even though you are creating the data, you have no control over it and no ownership of it. By using the app, you’ve already legally “consented” to your own surveillance in order to enjoy the “free” privilege of using the platform.

The entire idea of Web3 was that users – not platforms – would own and control their data, even if, like a Reddit post, it is meant to be public. Ownership could be cryptographically inscribed in a decentralized blockchain so that no single platform could sell your data without your permission.

Sure, AI is exciting, yet we seem to have forgotten this vision in which users are remunerated for their own data. Reddit killed its tokenized community points program last October, but do we really want to throw this vision out the window to welcome our new AI overlords?

Aaron Swartz, the co-founder of Reddit via a twisted history with Infogami and Y Combinator, was the greatest child prodigy of the internet generation. I knew him through his standards work on decentralizing social media with RSS and the Semantic Web at the World Wide Web Consortium at MIT, where I worked on WebCrypto and related standards.

Aaron was an incredibly kind and thoughtful programmer. He is best known for his push to open up government and research data to the public. Yet Aaron was also a staunch defender of personal privacy. He was interested in decentralizing WikiLeaks via his work on DeadDrop (later SecureDrop), and even in using a blockchain to decentralize domain names.

After selling Reddit, Aaron was convinced the future would require political change from inside the U.S. government. However, the very political system he hoped to reform drove him to suicide when the government threatened him with 50 years of imprisonment for using MIT computers to access and download a massive number of paywalled academic articles to share freely.

I suspect that, like me, Aaron would be personally excited by AI. I equally believe he would support a world where zero-knowledge proofs and mixnets defend citizens against government corruption and corporate overreach. He would want a world where publicly funded data is free to access and use, but where ordinary people have the choice to protect and control their own data.

As the cypherpunks say: “Transparency for the powerful, privacy for the weak.”





Harry Halpin

Harry Halpin is the CEO and co-founder of Nym Technologies. He was previously a senior research scientist at MIT, where he led the standardization of the Web Cryptography API across all browsers, and at Inria de Paris where he led interdisciplinary research on socio-technical systems and privacy.