AI has already trained on your data — but don't be afraid!
VentureBeat
It's happening: Millennials are officially "old."
I know this not only because I am one (born in the late 1980s; anyone born in the "late 1900s," as the kids say, is old news) but because this week brought evidence of members of my perennially online and economically disadvantaged generation succumbing to the habits of the Boomers: posting chain messages on Instagram and Facebook demanding that parent company Meta Platforms not train its AI models on their posts.
As one variation of the message goes: "I own the copyright to all images and posts submitted to my Instagram profile and therefore DO NOT consent to Meta or other companies using them to train generative AI platforms. This includes all FUTURE and PAST posts/stories on my profile."
Not the first time a chain letter against big tech has trended
This, of course, has happened before — many times over the last few decades — though it seemed in the past to be a target of older, less digitally savvy generations than Millennials (who were raised on the PC) trying to opt out of new Terms of Service updates they didn't like.
Yet, now that generative AI is making its way into more of the most popular tech products people use, from Google to Facebook, Instagram, and work tools such as Slack, Adobe Photoshop, and others, people seem to be getting more freaked out and bandying about these "I do not consent" posts like they're charms to ward off evil spirits.
Except, as has been the case in the past when prior objection chain messages trended on social media, the message itself is actually legally toothless and doesn't stop any of the companies from training on your data.
Adobe AI anxiety
A similar drama unfolded this week among designers, filmmakers, and creatives, specifically surrounding Adobe's new Terms of Service, which also seemed to imply that the company behind Photoshop, Lightroom, and numerous other creative apps was going to surveil and train AI on your works. As Adobe clarified, it wasn't doing this and didn't intend to; instead, it was using machine learning to scan and analyze content uploaded to the cloud or cloud-enabled features for illegal activity, such as child abuse imagery.
More to the point: in all these cases, the companies have already trained AI models on your data and before that, targeted advertising models and other technologies.
Meta's president of global affairs Nick Clegg came right out and said it openly less than a year ago at the company's Connect conference, and Meta released a research paper showing it trained on 1.1 billion image-text pairs, presumed to be user-uploaded Instagram and Facebook posts.
Adobe, for its part, already trained its Firefly AI model on images uploaded by third-party contributors to Adobe Stock.
Be smart
C'mon, people — of all generations — we should know this by now! Anything we post publicly online, or even into a cloud account, is highly likely to be analyzed by various algorithms we have no knowledge of, control over, or power to resist.
That's the implicit bargain we make in using these services to begin with — the companies behind them get our data, we get free (or low cost) tech with which to post our funny memes, communicate, make art, etc. That's the trade-off we have all enjoyed since the birth of the internet, a trend that only increased with the advent of targeted advertising in the 2000s.
More to the point, simply by using the services previously, we already agreed to have our data scraped, chopped, screwed, analyzed, remixed, etc., by the companies and whatever third parties they decide to deal with, and by continuing to use them, we once again accept the new Terms and whatever they entail.
I realize this may come off as a cynical, futile, or perhaps even disturbing stance to some of my European Union readers and friends, who have much more stringent data protections than we do in the U.S. (thanks to GDPR). But still, even over there, generative AI is rising fast, and your data is not entirely your own to control as you wish.
I also understand the desire to protect our data and not have it be used to train AI models that could compete with or replace us, but just to be clear: the horse has already left the barn.
The tech companies already have mountains of our data, plus synthetic data derived from it, and can use it to continue making more advanced, more powerful, and more capable services that, yes, could replace some of the things we do. Opting out at this stage seems like wishful thinking.
There are new services, such as the social network Cara, that have gained a huge following in just the last few weeks by promising not to train AI on user uploads. For now, however, these remain relatively niche in the face of the pro-gen AI tech behemoths. And as long as those big tech companies and platforms remain, it will be hard for upstarts to stand out simply by virtue of their privacy and data policies. People have shown time and time again that they are willing to trade privacy and share data in exchange for convenience, entertainment, and access to more powerful tech.
Join or die
My view: you can't beat 'em, so you might as well join 'em. Learn to use gen AI. Experiment with it. Even if you don't like it or prefer the human hand, at least develop enough competence with it that you can offer an informed position on why your human-only work is better or preferable, even if it takes longer, costs more, or doesn't satisfy the "embrace AI" mandate of your bosses or clients.
Your unique voice and creative personality will still shine through, whatever tools you choose to use. Don't be afraid of gen AI, or of your data being used to train it. They've already got it, and you're still here. It's up to you to stand out, even in a world where everything can be slurped up and spat out by algorithms. Do with this information what you will.
That's all for this week. Thanks for reading, sharing, subscribing, being you.