Google is now the only search engine that can surface results from Reddit, making one of the web’s most valuable repositories of user-generated content exclusive to the internet’s already dominant search engine. Reddit made this change in the aftermath of its $60 million AI deal with Google. Reddit's position is perhaps understandable: many companies were training AI models on Reddit's data, and Reddit wanted to monetize at least some of that use. But Reddit's data actually belongs to Reddit's users.
- Reddit's New Policy: If you use Bing, DuckDuckGo, Mojeek, Qwant, or any other alternative search engine that doesn’t rely on Google’s indexing, you won't find any Reddit content in your results.
- Google's position: The second-order effect is concerning. Google’s dominance in search is now actively hindering other companies’ ability to compete, which is not a great outcome for search users globally.
- Mojeek CEO's Perspective: Colin Hayhurst, CEO of the search engine Mojeek, said, “They’re [Reddit] killing everything for search but Google.” He tried contacting Reddit when Mojeek noticed it was blocked from crawling the site in early June but received no response. “It's never happened to us before,” he said.
- Use of robots.txt: Many websites are updating their robots.txt files to block bots that AI companies use to scrape them for training data. Reddit’s robots.txt is now much stricter, disallowing all bots from scraping any part of the site.
- Discontent with AI Scraping - Why Reddit chose this path: Reddit has taken public and aggressive steps to stop AI companies from scraping the site to train large language models. Last year, Reddit started charging to access its API, making many third-party apps too expensive to operate. Earlier this year, Reddit announced a $60 million deal with Google to license Reddit content for training its AI products.
- Erosion of the Web: Reddit and Google’s deal makes it harder to offer alternative ways of searching the web. “It’s part of a wider trend, isn’t it?” Hayhurst said. “It concerns us greatly. The web has been gradually killed and eroded.” Will other publishers follow suit? Will the press start licensing their content to the highest bidder too? That looks like the most likely next step.
- Recently, many websites have updated their robots.txt files to block AI companies' bots from scraping.
- Google introduced Google-Extended, a robots.txt token that lets publishers control whether their content is used to improve its Gemini apps.
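To make the robots.txt points above concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The two robots.txt bodies are illustrative assumptions, not copies of Reddit's or any other site's real file, and the URL and the crawler names passed in (`MojeekBot`, `GPTBot`, `Google-Extended`) are just example inputs. The first file disallows every bot from every path; the second blocks only AI-related agents and leaves ordinary search crawlers alone.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt bodies (assumptions, not any site's real file).
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

BLOCK_AI_ONLY = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://www.example.com/r/python/"  # hypothetical URL
    for name, body in [("block-all", BLOCK_ALL), ("block-AI-only", BLOCK_AI_ONLY)]:
        for agent in ["MojeekBot", "GPTBot", "Google-Extended"]:
            verdict = "allowed" if is_allowed(body, agent, url) else "blocked"
            print(f"{name:14} {agent:16} {verdict}")
```

Note that robots.txt is advisory: it states what the site wants and relies on crawlers to honor it, so a check like this only tells a well-behaved bot what it is being asked to do.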
Conclusion: It’s time to discuss whether AI and large tech deals are making the internet less open and free. The Reddit and Google deal is a worrying sign, and we need to consider whether we are losing the open nature of the web. What are your thoughts? Is AI making the web less open? Let's discuss!
Have you seen other interesting crawlers and robots.txt files? I would love to hear from you in the comments section or even over a DM!
4 months ago: Yes, this is true. By the end of 2023, around 2.7% of websites globally had used robots.txt to block ChatGPT and other AI tools from scraping and indexing their content for training data. The numbers are rising this year as well, and the trend appears to be accelerating. Many more tie-ups and deals similar to Reddit's may be on the anvil. #gpt #ai #web