Slack Faces Backlash for Scraping Customer Data for AI Training
It's Friday and Digital Frontiers of Data is here! In this edition we dig into AI and scraping, because Microsoft's AI raises security and privacy concerns!
Take your seat and prepare for more interesting details!
TikTok turns to generative AI to boost its ads business
TikTok is the newest tech company to adopt generative AI for its advertising business. On Tuesday, the company introduced “TikTok Symphony”, an AI suite designed to help brands. These tools will aid marketers in scriptwriting, video production, and enhancing existing assets.
A key component of the suite is the “Symphony Creative Studio,” an AI video generator. TikTok claims that this tool can produce TikTok-ready videos with just a few inputs from advertisers. The studio also offers ready-to-use videos for ad campaigns, leveraging assets from TikTok Ads Manager or product information.
The new “Symphony Assistant” is an AI tool designed to help advertisers enhance their campaigns by generating and refining scripts and providing recommendations on best practices.
For example, brands can ask the assistant to craft attention-grabbing lines for a new lipstick launch, show what’s trending on TikTok, or generate promotional ideas for a specific industry.
Additionally, TikTok’s new “Symphony Ads Manager Integration” can automatically optimize a brand’s existing videos. This tool can enhance pre-created videos, making them more engaging and effective.
OpenAI strikes major deal with News Corp to boost ChatGPT
OpenAI has entered into a significant deal with News Corp to access content for training its AI models, the companies announced on Wednesday.
The multiyear agreement, valued at up to $250 million, grants OpenAI access to a wide range of content from News Corp’s extensive portfolio of titles across multiple countries. These include The Wall Street Journal, MarketWatch, The New York Post, The Times, The Sunday Times, The Sun, The Australian, news.com.au, The Daily Telegraph, The Courier Mail, and the Herald Sun.
The agreement allows OpenAI to display content from News Corp in response to user questions and access News Corp’s extensive database of material. This will enable OpenAI to train its AI models, enhancing its ChatGPT chatbot and other AI-powered products and services.
News Corp chief Robert Thomson hailed the deal as a “historic agreement” that will set new standards in the digital age. OpenAI has also made similar agreements with the Financial Times, Axel Springer, and the Associated Press.
These deals are increasingly common as generative-AI companies aim to avoid legal issues and copyright claims. OpenAI, previously using web data to train its models, now seeks agreements with publishers for approved content access due to growing discontent among creatives.
YouTube Becomes Latest Battlefront for Phishing, Deepfakes
According to a report by security vendor Avast, YouTube has become a new battleground for malicious activities such as phishing, malware distribution, and fraudulent investment schemes.
The researchers focused on Lumma and RedLine, particularly highlighting their involvement in phishing, scam landing pages, and the dissemination of malicious software. YouTube serves as a platform for directing users to these harmful sites and pages, facilitating scams of different levels of severity.
Furthermore, there is a growing prevalence of deepfake videos on the platform, deceiving viewers with convincingly realistic but fabricated content and propagating misinformation. Avast discovered several accounts with over 50 million subscribers each that were compromised and exploited to disseminate cryptocurrency scams leveraging deepfake videos. These videos incorporate fabricated comments to mislead other viewers and feature malicious links.
According to the researchers, threat actors exploit YouTube in several ways: sending personalized phishing emails to creators, adding malicious links to video descriptions, hijacking channels, and spreading cryptocurrency scams through deepfake videos.
User Outcry as Slack Scrapes Customer Data for AI Model Training
Slack, the enterprise workplace collaboration platform, faces privacy backlash following revelations of scraping customer data, including messages and files, for AI and ML model development.
Slack admits to analyzing customer data and usage information by default, without user opt-in, to enhance its software with AI/ML models. Although the company claims technical controls prevent access to underlying content and ensure data doesn't cross workplaces, corporate Slack admins are rushing to opt out of data scraping.
Slack's communication stirred social media controversy as users realized their content, including direct messages and sensitive material, was used for AI/ML model development. Opting out required emailing a request. Some CISOs feel it's expected for big-tech vendors like Slack to develop AI/ML models but believe customers shouldn't bear the burden of opting out.
In response to critics, Slack stated that it utilizes platform-level machine-learning models for features like channel and emoji recommendations and search results. The company assures customers that they can exclude their data from training these non-generative ML models.
Additionally, Slack offers Slack AI as a separate add-on, which employs Large Language Models (LLMs) but doesn't train them on customer data. Slack AI hosts the models on its own infrastructure, ensuring data remains under the customer's control and exclusively for organizational use, without third-party access.
According to Slack's documentation, data will not transfer between workspaces. The company ensures that models used universally among customers are not trained to learn, memorize, or reproduce any part of customer data.
Elon Musk’s X loses lawsuit against Bright Data over data scraping
A federal judge in California threw out a lawsuit brought by Elon Musk’s X against Israel’s Bright Data. The case centered around the scraping of public online data and its permissible uses.
X, previously known as Twitter, accused Bright Data of scraping its data and selling it, allegedly using sophisticated methods to bypass X Corp.’s anti-scraping technology. X also alleged violations of its terms of service and copyright.
Data scraping involves automated programs gathering data from publicly accessible websites, which can be used for various purposes like AI model training and targeted advertising. In the U.S., scraping public data is generally legal, as confirmed by a 2022 ruling involving LinkedIn.
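At its simplest, the "automated programs" mentioned above are scripts that fetch a public page and pull structured data out of its HTML. Here is a minimal sketch using only Python's standard-library `html.parser`; the page content and `/profile/...` URLs are made up for illustration:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    # Collects the href attribute of every <a> tag in the document.
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

# A tiny static page standing in for a fetched public listing.
page = '<ul><li><a href="/profile/1">Alice</a></li><li><a href="/profile/2">Bob</a></li></ul>'
parser = LinkExtractor()
parser.feed(page)
print(parser.links)  # ['/profile/1', '/profile/2']
```

Real scrapers like Bright Data's operate at vastly larger scale, but the core loop — fetch, parse, extract — is the same.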
Previously, X sought over $1 million in damages from unidentified defendants for unlawfully scraping data associated with Texas residents. However, Judge William Alsup dismissed the complaint, noting that X wanted the benefits of safe harbors while also exercising copyright control.
The judge cautioned against granting social networks total control over public web data, warning of potential information monopolies. He criticized X for prioritizing profit over user privacy, stating that X allowed content extraction as long as it received payment. X's representative did not respond to a comment request.
Meta, formerly Facebook, had also unsuccessfully sued Bright Data. Bright Data emphasized that public online information belongs to everyone, asserting that attempts to restrict public access will be unsuccessful.
"The current situation is unprecedented, with implications spanning general business, research, AI, and more," stated the company.
Bright Data emphasizes that it exclusively scrapes publicly available data accessible to anyone without a login. At the time of the lawsuit's filing, X had made the information scraped by Bright Data available to all users.
Microsoft’s AI ‘Recall’ feature raises security, privacy concerns
On Monday, Microsoft unveiled its AI-optimized Copilot+ PCs, introducing a new feature that has sparked concerns among certain security experts.
These Copilot+ PCs will come equipped with a preview version of a feature called “Recall,” intended to replicate a "photographic memory" of the user's PC activities.
Recall captures "snapshots" of the active screen every few seconds, allowing users to review their activity through a timeline or search function to locate previously viewed webpages, apps, or files.
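Conceptually, Recall is a searchable index over timestamped screen captures. The toy model below (not Microsoft's implementation — all names are invented for illustration) shows the idea, and also why security experts worry: everything, including any sensitive text on screen, sits in the store in plain form:

```python
import time
from dataclasses import dataclass

@dataclass
class Snapshot:
    timestamp: float
    window_title: str
    screen_text: str  # text extracted from the captured frame (e.g. via OCR)

class Timeline:
    # Naive in-memory store: content is kept in plain text, which is
    # exactly the property that worries security researchers.
    def __init__(self):
        self.snapshots = []

    def capture(self, window_title, screen_text):
        self.snapshots.append(Snapshot(time.time(), window_title, screen_text))

    def search(self, query):
        q = query.lower()
        return [s for s in self.snapshots
                if q in s.window_title.lower() or q in s.screen_text.lower()]

tl = Timeline()
tl.capture("Banking - Browser", "account number 12345678")
tl.capture("Docs - Editor", "quarterly report draft")
hits = tl.search("account")
print([s.window_title for s in hits])  # ['Banking - Browser']
```

Anything that can query such a timeline — the user, or malware running as the user — can retrieve whatever was once on screen.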
During a Wall Street Journal interview, Microsoft CEO Satya Nadella demonstrated how AI models in Copilot+ PCs can search for content like photos using natural language queries. However, Microsoft's FAQ clarifies that the Recall feature "does not perform content moderation" and won't hide sensitive information like passwords or financial data, raising concerns about potential exposure of sensitive data to threat actors.
The feature's constant monitoring of computer activity has led to comparisons with spyware. Security experts warn of security risks and invasion of privacy, with Patrick Tiquet from Keeper Security emphasizing the potential dangers of storing sensitive information without proper security measures.
GitHub Enterprise Server patches 10-outta-10 critical hole
GitHub has addressed a critical security flaw in its Enterprise Server software that earned the maximum severity score of 10 out of 10. The vulnerability, tracked as CVE-2024-4985, grants attackers full administrative access to GitHub Enterprise Server instances running versions before 3.13.0. It primarily affects instances that use SAML single sign-on (SSO) authentication with the optional encrypted-assertions feature enabled. Exploiting the flaw allows attackers to forge a SAML response and gain unauthorized administrator privileges.
Furthermore, the bug is absent in versions stemming from the latest 3.13.x branch, but is present in the 3.9.x, 3.10.x, 3.11.x, and 3.12.x branches. Given that many users continue to utilize older software versions, the vulnerability is likely to have a significant impact.
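Because the fix landed as separate patch releases on each affected branch, "am I vulnerable?" is a per-branch version comparison rather than a single cutoff. A sketch of that check follows; the patched release numbers in `PATCHED` are placeholders, not taken from GitHub's advisory — consult the advisory for the authoritative fix versions:

```python
# Hypothetical patched release per affected branch (illustrative values only;
# check GitHub's advisory for CVE-2024-4985 for the real fix versions).
PATCHED = {
    (3, 9): (3, 9, 15),
    (3, 10): (3, 10, 12),
    (3, 11): (3, 11, 10),
    (3, 12): (3, 12, 4),
}

def parse(version):
    # "3.11.9" -> (3, 11, 9), so tuples compare numerically.
    return tuple(int(part) for part in version.split("."))

def is_vulnerable(version):
    v = parse(version)
    branch = v[:2]
    if branch not in PATCHED:
        return False  # e.g. 3.13.x never contained the bug
    return v < PATCHED[branch]

print(is_vulnerable("3.11.9"))   # True under these placeholder numbers
print(is_vulnerable("3.13.0"))   # False
```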
GitHub, owned by Microsoft (which has pledged to enhance its security measures), discovered the flaw through its bug bounty program, which rewards researchers who identify vulnerabilities in GitHub software. More severe bugs yield larger rewards, and under the program's tiers the discoverer of this issue received a substantial payout of $20,000 to $30,000.
Critical Netflix Genie Bug Opens Big Data Orchestration to RCE
A critical vulnerability, identified as CVE-2024-4701, has been discovered in the open-source version of Netflix's Genie job orchestration engine for big data applications. This flaw allows remote attackers to potentially execute arbitrary code on affected systems. It has a critical severity score of 9.9 out of 10 on the CVSS scale and targets organizations running their own instance of Genie OSS, exploiting the local file system used for uploading and storing user-submitted file attachments.
Organizations use Genie to orchestrate, run, and monitor various big data jobs and workflows across different frameworks and distributed computational clusters. Genie provides APIs to manage metadata, configuration of these clusters, and the applications running on them. Additionally, it offers APIs for users to access computational resources needed for big data environments such as Hadoop, Spark, Pig, Hive, Sqoop, and Presto.
Researchers from Contrast Security found a critical vulnerability in Netflix's Genie, enabling remote code execution during file uploads. This flaw can expose files outside the application's root directory, including back-end credentials, application code, and sensitive system files.
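The class of bug described here — trusting a client-supplied attachment filename when writing to the local file system — is classic path traversal. The sketch below is not Genie's code; it is a generic illustration (with an invented `UPLOAD_ROOT`) of the vulnerable pattern next to a defensive one, assuming POSIX-style paths:

```python
import os

UPLOAD_ROOT = "/srv/app/uploads"  # hypothetical attachment directory

def naive_path(filename):
    # Vulnerable pattern: trusting the client-supplied name outright.
    # "../" segments let the caller escape the upload directory.
    return os.path.normpath(os.path.join(UPLOAD_ROOT, filename))

def safe_path(filename):
    # Defensive pattern: discard any directory components, then verify
    # the resolved path still lives under the upload root.
    name = os.path.basename(filename.replace("\\", "/"))
    resolved = os.path.realpath(os.path.join(UPLOAD_ROOT, name))
    root = os.path.realpath(UPLOAD_ROOT)
    if os.path.commonpath([resolved, root]) != root:
        raise ValueError("path escapes upload directory")
    return resolved

print(naive_path("../../../etc/passwd"))  # /etc/passwd -- escaped the root
print(safe_path("../../../etc/passwd"))   # stays inside .../uploads
```

An attacker who can plant files outside the upload root (or read files there) is most of the way to remote code execution, which is why this Genie flaw scored 9.9.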
What is an HTTP cookie and what is it used for?
HTTP cookies enhance web browsing by improving user experience and ensuring website functionality, though many people are unaware of how they work. This article explains the functions, types, benefits, and drawbacks of HTTP cookies. It also discusses their importance in web scraping. Understanding HTTP cookies can help both internet users and web developers navigate the online world more efficiently and make informed privacy decisions.
What are HTTP Cookies?
Web cookies are pieces of information stored by a web browser on a user's device. When you visit a website, it sends cookies to your browser, which stores them and sends them back to the server on future visits. This helps websites remember your preferences, login information, and customized settings. Cookies manage sessions, track user activity, and retain user engagement details, supporting features like shopping carts, personalized recommendations, and persistent logins.
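The round trip described above is visible in two HTTP headers: the server's `Set-Cookie` and the browser's `Cookie`. Python's standard-library `http.cookies` module can parse one side and reconstruct the other; the `session_id` value here is invented for illustration:

```python
from http.cookies import SimpleCookie

# The server's response carries a Set-Cookie header like this one.
set_cookie_header = "session_id=abc123; Path=/; HttpOnly"

jar = SimpleCookie()
jar.load(set_cookie_header)
print(jar["session_id"].value)  # abc123

# On the next request, the browser echoes the cookie back in a
# Cookie header, which is just name=value pairs joined by "; ".
cookie_header = "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())
print(cookie_header)  # session_id=abc123
```

Attributes like `HttpOnly` and `Path` are instructions to the browser (don't expose this cookie to JavaScript; only send it for this path) and are not echoed back to the server.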
What are HTTP cookies used for?
HTTP cookies enhance website functionality and user experience in several ways:
- Session management: keeping users signed in and preserving state such as shopping carts as they move between pages.
- Personalization: remembering language, theme, and other preferences between visits.
- Tracking: recording browsing activity so site owners can run analytics and advertisers can target ads.
Types of Cookies
There are different types of HTTP cookies, each serving unique purposes:
- Session cookies: temporary cookies deleted when the browser closes, typically used for login state.
- Persistent cookies: cookies with an expiry date that survive browser restarts, used to remember preferences and keep users logged in.
- First-party cookies: set by the site you are visiting.
- Third-party cookies: set by a different domain embedded in the page (often an advertiser), commonly used for cross-site tracking.
HTTP Cookies in web scraping
When web scraping, it's crucial to mimic human behavior to avoid detection. Web servers can identify and block bot activity, increasing the chances of your scraper being blocked. Even if not blocked, you might still encounter error messages from the target websites.
Properly managing HTTP cookies is essential for web scraping. Using cookies when sending requests helps access necessary data and avoid detection. Attempting to access pages without the main site's cookies can flag your scraping activities as suspicious.
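In practice, "using cookies when sending requests" means carrying a cookie jar across the whole session, the way a browser would. The self-contained sketch below spins up a throwaway local HTTP server (standing in for a target site, with invented paths `/login` and `/data`) and shows Python's standard-library `http.cookiejar` automatically replaying the cookie on the follow-up request:

```python
import http.cookiejar
import http.server
import threading
import urllib.request

class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/login":
            # First response hands the client a session cookie.
            self.send_response(200)
            self.send_header("Set-Cookie", "session=abc123; Path=/")
            self.end_headers()
            self.wfile.write(b"logged in")
        else:
            # Later responses reveal whether the cookie came back.
            cookie = self.headers.get("Cookie", "<none>")
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"cookie seen: " + cookie.encode())

    def log_message(self, *args):  # silence per-request logging
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_address[1]}"

# A cookie-aware opener plays the browser's role for the scraper.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
opener.open(base + "/login").read()
body = opener.open(base + "/data").read().decode()
server.shutdown()
print(body)  # the /data request arrived carrying session=abc123
```

A scraper that instead fires a bare request at `/data` would arrive with no `Cookie` header at all — exactly the pattern that gets flagged as suspicious.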
How are cookies used in web scraping? Read the full article to find out more!
Welcome to "Three Truths and a Lie"! We'll serve you up with four headlines, but only one is fibbing. Can you spot the rogue headline among the truth-tellers? Let the guessing games begin!
Answer: Get ready to watch the SpaceX Raptor engine attempt a flawless ballet routine during its testing phase. Spoiler alert: instead of executing a graceful dance, the engine opted for a dramatic finale by blowing up spectacularly! Think of it as less of a refined performance and more of an explosive exit! Take care, guys!
Until next time, stay curious, stay tech-savvy, and we'll catch you in the next edition!
Want to gather data without breaking a sweat? Jump on board with our proxy solutions and let's make data collection a breeze!
No boring stuff here – just tech with a side of swagger!