【AI】Data for training LLMs vs. Reddit (UGC)
Updates:
2024-Jun-01 TechCrunch
<AI training data has a price tag that only Big Tech can afford>
Few independent, not-for-profit efforts to create massive datasets anyone can use to train a generative AI model:
2024-May-16 Reddit
<Reddit and OpenAI Build Partnership>
2024-Apr-25 TechCrunch
<Carv raises $10M Series A to help gamers monetize their data>
"Carv’s initial focus is on two key industries, gaming and AI, where it sees the biggest opportunity to help users control their data and monetize it. Users can choose to provide their data to Carv’s corporate customers in a way that preserves their privacy and is compliant with regulations, so that companies can use it for training AI models, market research and more."
"Carv offers three solutions: CARV Protocol, a modular data layer with cross-chain connectivity that connects web2 identities to web3 tokens; CARV Play, a cross-platform credentialing system and game distribution platform; and CARV’s AI Agent, CARA, a personalized gaming assistant that integrates with web3 wallets and can recommend games, activities and projects."
“Carv differentiates itself by putting data ownership and monetization rights in the hands of users. Any revenue generated from leveraging users’ data gets shared back with the data creators and themselves,” Yu said. “Additionally, we’ve created a unified user ID standard (ERC-7231) that bridges web2 and web3, enabling seamless data portability versus today’s siloed solutions.”
2024-Apr-13 TechCrunch
<Vana plans to let users rent out their Reddit data to train AI>
A startup, Vana, says it wants users to get paid for training data
"We think users should be able to bring their personal data from walled gardens, like Instagram, Facebook and Google, to your application, so you can create amazing personalized experience from the very first time a user interacts with your consumer AI application."
"Vana makes money by charging users a monthly subscription (starting at $3.99) and levying a “data transaction” fee on devs (e.g. for transferring data sets for AI model training)"
"This month, Vana launched what it’s calling the Reddit Data DAO (Digital Autonomous Organization), a program that pools multiple users’ Reddit data (including their karma and post history) and lets them decide together how that combined data is used."
"Then there’s the matter of how to fairly distribute payments that the DAO might receive from data buyers."
"Kazlauskas floats the idea that members of the DAO could choose to share their cross-platform and demographic data, making the DAO potentially more valuable and incentivizing sign-ups."
2024-Apr-06 Reuters
<Inside Big Tech's underground race to buy AI training data>
"Seattle-based Defined.ai licenses data to a range of companies including Google, Meta, Apple, Amazon and Microsoft ... $1 to $2 per image, $2 to $4 per short-form video and $100 to $300 per hour of longer films. The market rate for text is $0.001 per word. Images of nudity, which require the most sensitive handling, go for $5 to $7... Defined.ai splits those earnings with content providers."
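To make these per-unit rates concrete, here is a minimal sketch that turns them into a low/high licensing-cost range for a dataset. Only the rates come from the Reuters quote; the unit names, the `RATES` table, the `estimate_cost` helper and the dataset mix are hypothetical and for illustration only. The sketch also ignores the split with content providers, since the article does not give its terms.

```python
from decimal import Decimal

# Per-unit licensing rates in USD, as quoted by Reuters for Defined.ai.
# Each entry is a (low, high) range; text has a single quoted rate.
RATES = {
    "image": (Decimal("1"), Decimal("2")),
    "sensitive_image": (Decimal("5"), Decimal("7")),  # images of nudity
    "short_video": (Decimal("2"), Decimal("4")),
    "film_hour": (Decimal("100"), Decimal("300")),    # per hour of longer film
    "word": (Decimal("0.001"), Decimal("0.001")),
}


def estimate_cost(quantities: dict[str, int]) -> tuple[Decimal, Decimal]:
    """Return the (low, high) licensing cost in USD for the given unit counts."""
    low = sum((RATES[kind][0] * n for kind, n in quantities.items()), Decimal("0"))
    high = sum((RATES[kind][1] * n for kind, n in quantities.items()), Decimal("0"))
    return low, high


if __name__ == "__main__":
    # Hypothetical dataset mix, chosen only to illustrate the arithmetic.
    dataset = {
        "image": 10_000,
        "short_video": 1_000,
        "film_hour": 50,
        "word": 1_000_000_000,
    }
    low, high = estimate_cost(dataset)
    print(f"Estimated cost: ${low:,.2f} to ${high:,.2f}")
```

For this example mix the script prints `Estimated cost: $1,017,000.00 to $1,039,000.00`. At these rates, a billion-word text corpus alone accounts for $1,000,000, far more than the images, videos and film hours combined. `Decimal` is used so that the $0.001-per-word rate is summed exactly rather than with floating-point rounding.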
Relevant article: