UK Watchdog: Meta Accused of Data Harvesting for AI Training
Hello and welcome back to Digital Frontiers of Data!
Ready for some exciting updates? Here's a sneak peek of what's coming your way! We'll be exploring intriguing stories, such as China's push for a network upgrade blitz as IPv6 adoption slows, and Elon Musk's X losing its lawsuit against Bright Data over data scraping.
So, sit back, relax, and get ready for some thrilling insights!
Hong Kong Testing ChatGPT-Style Tool After OpenAI Took Steps to Block Access
Hong Kong’s government is testing a ChatGPT-style tool for its employees, with the aim of eventually making it available to the public. This initiative follows OpenAI's decision to implement stricter measures to block access from the city and other unsupported regions.
Sun Dong, Secretary for Innovation, Technology and Industry, mentioned on a Saturday radio show that his bureau is experimenting with the artificial intelligence program, known in Chinese as the “document assistance application for civil servants," to enhance its functionality. He intends to roll it out to the rest of the government later this year.
The program was developed by a generative AI research center led by the Hong Kong University of Science and Technology, in collaboration with other universities.
Sun mentioned that future functions might include graphics and video design, though it's unclear how it will compare to ChatGPT. His bureau did not respond to The Associated Press' inquiries about the model's functions.
On the radio show, Sun stated that both industry and the government would contribute to the model's future development. He noted that due to Hong Kong's situation, attracting support from companies like Microsoft and Google is challenging, so the government had to initiate the project.
Intel can’t stay silent for much longer
One Reddit user described instability issues with their Core i9-13900K: they went through four of these CPUs over several months, leaving them without a working gaming PC for three months during repairs. Another user, who had never had problems with Intel products before, replaced every component in their system except the CPU, only to find that the Core i9-14900K was the culprit.
Intel needs to address the instability issues with their 13th-gen and 14th-gen CPUs. While there have been BIOS updates, comments from PR managers, and documentation shared with motherboard vendors, the company must make a public statement. More users, both personal and business, are reporting instability problems, with some even labeling Intel’s high-end chips as “defective.”
Let's rewind. Initial reports of instability with the Core i9-13900K and Core i9-14900K surfaced late last year, but it was a report from Korea that brought significant attention to the issue. Gamers were returning these CPUs en masse due to crashes in Tekken 8 and other Unreal Engine games. This prompted Intel to collaborate with motherboard vendors to release the Intel Baseline Profile, which adjusted numerous BIOS settings to enhance stability. However, some tests indicated that this new profile could result in up to a 9% drop in performance.
The discussion quieted down but is now resurging. YouTube channel Level1Techs released a video investigating the source of the crashes and uncovered troubling data. One developer mentioned they could lose over $100,000 due to multiplayer server crashes on servers using the Core i9-13900K or Core i9-14900K. Sources from Dell, Lenovo, and HP suggested that 10% to 25% of these chips might have issues, while Level1Techs speculated the number could be as high as 50%.
Apple issues spyware warnings to iPhone users across 98 countries
Apple has issued a new round of threat notifications to iPhone users in 98 countries, warning them about potential use of Pegasus spyware on their phones. This is the second such campaign this year, following a similar effort in April that reached users in 92 countries.
According to a support document on Apple’s website, the company has been regularly issuing these notifications since 2021, impacting users in over 150 countries. The latest warnings, sent on July 10th, did not specify the attackers or the countries affected.
The notification to affected customers stated: “Apple detected that you are being targeted by a mercenary spyware attack attempting to remotely compromise the iPhone associated with your Apple ID.” Apple highlighted the targeted nature of these attacks, explaining, “This attack is likely targeting you specifically because of who you are or what you do.” Despite the challenges in preventing nation-state attacks, Apple stressed the importance of taking these warnings seriously.
Pegasus, which Apple has described as military-grade spyware, is developed by Israel's NSO Group and has been used to target journalists and activists by exploiting zero-day vulnerabilities on mobile devices.
Recent reports cite Apple's notifications to users in India and previous warnings to journalists and politicians there. Amnesty International confirmed Pegasus on iPhones of Indian journalists.
Apple emphasized the sensitivity of its threat detection methods, refraining from detailed disclosures to prevent evasion by attackers. Apple now describes these incidents as "mercenary spyware attacks," marking a shift in terminology from "state-sponsored," possibly reflecting evolving threat classification.
China pushes for network upgrade blitz as IPv6 adoption slows
State-controlled media covered the third China IPv6 Innovation and Development Conference on July 11, 2024. Officials disclosed that by May 2024, China had 794 million users of the protocol. They also reported that 64.56% of mobile traffic and 21.21% of fixed network traffic now use IPv6 networks.
In the previous conference in July last year, it was announced that as of May 2023, there were 763 million active IPv6 users, up from 697 million in July 2022 according to China's State Council.
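Even though the latest period is slightly longer (twelve months, versus roughly ten for the one before), growth has clearly slowed: about 31 million new users in the year to May 2024, compared with 66 million between July 2022 and May 2023.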
The country is nearing its targets for 2025, aiming to reach 800 million IPv6 users and have 70% of mobile traffic utilizing the protocol. It has already achieved the goal of having 15% of fixed traffic on IPv6.
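The slowdown and the distance to the 2025 targets can be checked with a few lines of Ruby. This is a minimal sketch that only reuses the figures quoted above (697, 763, and 794 million users; 64.56% of mobile traffic against a 70% target); the helper names `growth_steps` and `gap` are illustrative and not from any official source.

```ruby
# IPv6 user counts in millions, as quoted in the article.
REPORTS = [
  { label: "July 2022", users: 697 },
  { label: "May 2023",  users: 763 },
  { label: "May 2024",  users: 794 }
].freeze

# Absolute user growth between consecutive reports.
def growth_steps(reports)
  reports.each_cons(2).map { |a, b| b[:users] - a[:users] }
end

# Distance from a 2025 target (millions of users, or percentage points of traffic).
def gap(target, current)
  (target - current).round(2)
end

p growth_steps(REPORTS)             # => [66, 31]
puts gap(800, REPORTS.last[:users]) # users still needed, in millions
puts gap(70.0, 64.56)               # mobile-traffic gap, in percentage points
```

The two increments make the slowdown concrete, while the remaining gaps (6 million users and about 5.4 percentage points of mobile traffic) show how close the 2025 targets are.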
To accelerate IPv6 adoption, Beijing issued a directive last year mandating that all new Wi-Fi routers sold in China must support IPv6 and enable it by default upon activation, potentially driving significant progress.
More Context
Beijing plans to launch a year-long initiative to promote IPv6 adoption in eight major cities, collectively home to over 110 million people. This initiative will involve consumers, government agencies, and datacenter operators in cities such as Shanghai, Beijing, Hangzhou, and Shenzhen in a focused effort to increase IPv6 usage. The goal is twofold: to achieve short-term objectives and to support China's ambition to lead globally in IPv6 deployment.
Despite these efforts, current assessments of IPv6 adoption indicate that China is falling behind its targets. According to Akamai, countries like India, Malaysia, Germany, and France lead in IPv6 adoption, while China ranks 61st with a 22.2% adoption rate. The Asia Pacific Network Information Centre reports India as the global leader with 79.85% of networks capable of IPv6, followed by Malaysia, Saudi Arabia, France, and Germany, with China at 36.71%.
China recognizes the importance of networked technology for its economic development. However, its allocation of IPv4 addresses, just over 330 million, works out to roughly one address for every four people, a lower ratio than in many other countries.
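The per-capita figure is easy to sanity-check. This sketch assumes a population of about 1.41 billion, a number not given in the article, and rounds the IPv4 allocation down to 330 million; `people_per_address` is a hypothetical helper.

```ruby
# Assumption: China's population is roughly 1.41 billion (not stated in the article).
POPULATION = 1_410_000_000
# The article's "just over 330 million" IPv4 addresses, rounded down.
IPV4_ALLOCATION = 330_000_000

# People sharing each IPv4 address, rounded to one decimal place.
def people_per_address(population, addresses)
  (population.to_f / addresses).round(1)
end

puts people_per_address(POPULATION, IPV4_ALLOCATION) # prints 4.3
```

Because the real allocation is slightly above 330 million, the true ratio is a little lower than this estimate.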
While network address translation provides a workaround for connecting more devices to IPv4 networks, IPv6 offers superior features for network performance, management, and monitoring. These capabilities are particularly crucial in a surveillance-oriented state like China, where internet activities are closely monitored to censor objectionable content and identify users.
Elon Musk’s X loses lawsuit against Bright Data over data scraping
A federal judge in California has dismissed a lawsuit that Elon Musk's X brought against Israel-based Bright Data over the scraping of public online data and how that data may be used.
X, formerly Twitter, accused Bright Data of scraping and selling its data while circumventing X's anti-scraping measures. X also alleged violations of its terms of service and copyright.
Data scraping involves automated programs extracting data from publicly accessible websites, often for purposes such as training AI models and targeting online ads. In the U.S., scraping publicly accessible data is generally legal following a 2022 ruling that concluded a lengthy legal battle involving LinkedIn.
Previously, X filed a suit in Dallas County seeking over $1 million in damages from unknown defendants for allegedly unlawfully scraping data associated with Texas residents.
Judge William Alsup, in dismissing the complaint, argued that "X Corp. wants it both ways: to maintain its safe harbors while also exercising a copyright owner’s right to restrict access and charge fees for extracting and copying X users’ content."
The judge cautioned against granting social networks full control over the collection and utilization of public web data, warning of potential monopolies that would not serve the public interest. He criticized X for prioritizing profit over protecting users' privacy, alleging the company was willing to permit data extraction and copying as long as it received payment.
A spokesperson for X did not respond immediately to a request for comment.
Meta had previously sued Bright Data as well, with a similar outcome.
Bright Data responded in an email statement, asserting that its victories against Meta and X affirm that public information online "belongs to everyone, and attempts to restrict access will be unsuccessful."
The company emphasized that it only scrapes publicly available data accessible to anyone without needing to log in. At the time of the lawsuit, X had made the scraped information available to the public.
Privacy warriors gripe to UK watchdog about Meta harvesting user data to train AI
A UK-based data rights campaign group has filed a complaint with the country's data regulator against Meta's updated privacy policy. The policy change allows Meta to collect user data for developing AI models.
Open Rights Group (ORG), a UK organization advocating for online privacy and free speech, highlighted that Meta notified Facebook and Instagram users in the UK about upcoming changes to its privacy policy via email at the end of May. These changes, set to take effect on June 26, would permit Meta to utilize individuals' data under the legal basis of "legitimate interests" for AI development purposes.
This complaint mirrors a similar action in the European Union under the General Data Protection Regulation (GDPR). In response to GDPR concerns, Meta agreed to halt its plans to train AI models using posts from EU users on Facebook and Instagram.
Although UK data protection laws currently align with those of the EU, the UK's departure from the EU at the end of 2020 has introduced some distinctions in regulatory approach.
ORG noted that Meta has not formally amended its privacy policy to make any halt to processing user data for its "AI technologies" legally enforceable in the UK. As a result, the organization has lodged a formal complaint with the UK's data protection regulator, the Information Commissioner’s Office (ICO).
Mariano delli Santi, the complainant and Legal and Policy Officer at Open Rights Group, expressed concerns: “Meta’s intentions to ingest data, posts, and pictures from its users will impact over 50 million Instagram and Facebook users in the UK. It is unacceptable for the company to merely offer users the option to opt out rather than seek their consent for such invasive data processing.
“The proposed actions appear to contravene several aspects of the UK GDPR, and we urge the ICO to conduct a thorough investigation and put an end to these practices,” he added. The ORG further noted that while Meta informed users of their right to object, it did not commit to universally honoring objections, and consent could not be retroactively applied once a user's data had been utilized by the company.
The ICO has not responded to The Register's request for comment.
Meta paused its AI training plans for EU users' Facebook and Instagram posts in June after pressure from EU regulators. The pause delays Meta's plans to launch Meta AI in the region.
In a blog post, Meta's Global Engagement Director Stefano Fratta expressed confidence in their compliance with European laws regarding AI training. He emphasized Meta's transparency compared to other industry players.
Hackers Exploit Flaw in Squarespace Migration to Hijack Domains
Last week, several cryptocurrency platforms faced a crisis as hackers exploited vulnerabilities in Squarespace's domain migration process. The attacks, beginning on July 9, targeted domains transferred to Squarespace following its acquisition of domain registrations and customers from Google Domains last year.
Squarespace had been migrating approximately 10 million domain names from Google Domains, but a flaw in its migration procedure enabled hackers to seize control of accounts and alter DNS records for these domains.
Security Alliance reported that Squarespace aimed for a seamless transition by migrating all email addresses from Google Domains, presuming they would be used by domain owners and collaborators to establish Squarespace accounts. However, by associating emails with domains beforehand, Squarespace inadvertently enabled attackers to potentially create accounts ahead of legitimate users.
Moreover, Squarespace does not mandate email validation during account creation, even for accounts secured with passwords. This oversight has enabled attackers to create accounts by guessing email addresses associated with domains transferred from Google Domains.
As a result, attackers could take control of Squarespace accounts and gain unrestricted access to the linked domains without confirming the legitimacy of the email addresses associated with those accounts.
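The missing safeguard described above, proving control of an email address before granting access to anything tied to it, can be sketched in a few lines of Ruby. This is a hypothetical illustration, not Squarespace's code: `AccountRegistry`, `sign_up`, `confirm`, and `domains_for` are invented names, and the `mailer` callable stands in for a real email delivery service.

```ruby
require "securerandom"

# Hypothetical registry: none of these names come from Squarespace.
class AccountRegistry
  # migrated_domains maps an email address to domains pre-linked during a migration.
  def initialize(migrated_domains)
    @migrated_domains = migrated_domains
    @pending = {}  # one-time token => email awaiting confirmation
    @verified = {} # email => true once the inbox owner has confirmed
  end

  # Signing up only sends a one-time token to the inbox; it grants no access.
  def sign_up(email, mailer)
    token = SecureRandom.hex(16)
    @pending[token] = email
    mailer.call(email, token) # delivered to the mailbox, not returned to the caller
    nil
  end

  # Presenting a valid token proves control of the mailbox.
  def confirm(token)
    email = @pending.delete(token)
    return false if email.nil?

    @verified[email] = true
  end

  # Pre-linked domains stay hidden until the email has been verified.
  def domains_for(email)
    return [] unless @verified[email]

    @migrated_domains.fetch(email, [])
  end
end
```

An attacker who signs up with a guessed address never receives the token, so `domains_for` keeps returning an empty list. The design choice is to make verification a precondition for exposing pre-linked resources rather than an optional step.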
Microsoft finally fixes Outlook alerts bug caused by December updates
Microsoft has resolved a longstanding Outlook issue identified in February, where incorrect security alerts were triggered following the installation of December's desktop updates.
The company acknowledged the bug in early February, prompted by reports from numerous Microsoft 365 users encountering unexpected warnings such as "This location may be unsafe" and "Microsoft Office has identified a potential security concern" when opening ICS calendar files.
These alerts were determined to be false alarms linked to the Outlook security updates. These updates addressed an information disclosure vulnerability (CVE-2023-35636), which could potentially allow attackers to steal NTLM hashes through maliciously crafted files.
The stolen NTLM hashes can be exploited in pass-the-hash attacks on Windows systems, enabling unauthorized access to sensitive data or lateral movement within the network.
Microsoft initially addressed the issue in early April but later withdrew the fix after encountering problems during testing with Office Insiders in the Beta Channel. "The Outlook Team identified issues with the fix during testing in the Insider channels," Microsoft explained.
In a recent update to its support document, Microsoft confirmed the long-standing issue was resolved in the July 9th public update for Outlook Desktop. Users who implemented Microsoft's workaround—adding registry keys to disable the security notice—are now instructed to reverse this before installing the updated Outlook versions to ensure the bug fix is effective.
Google’s Biggest Acquisition Ever: Plans to Buy Cybersecurity Startup Wiz for $23 Billion
Google is reportedly in advanced negotiations to buy cybersecurity startup Wiz for about $23 billion. If successful, the deal would be Google's largest acquisition ever, nearly doubling the $12.5 billion it spent on Motorola Mobility in 2012.
Meet Wiz, the cybersecurity powerhouse. Founded in 2020 and based in New York, Wiz has quickly become a leading provider of cloud-based cybersecurity solutions. Using artificial intelligence for real-time threat detection and response, Wiz serves 40% of Fortune 100 companies and works with leading cloud providers including Amazon, Microsoft, and Google. With revenue of around $350 million in 2023 and a recent $1 billion funding round that valued the company at $12 billion, Wiz has also been considering an initial public offering (IPO).
Google's acquisition of Wiz is viewed as a pivotal step to bolster Google Cloud's standing in the fiercely competitive cloud services sector. Currently trailing Amazon Web Services and Microsoft Azure, Google Cloud intends to utilize Wiz's cutting-edge cybersecurity expertise to set itself apart and appeal to more corporate customers. This move aligns with Google's strategy to expand revenue sources beyond its core search advertising business, prioritizing growth in cloud computing and security services.
Web Scraping Using Ruby
Web scraping has become indispensable for companies and developers who need to collect data from the internet efficiently. Ruby, known for its simplicity and readability, is a popular choice for scraping projects.
Web scraping means automatically extracting data from websites, and Ruby's syntax and robust libraries make it well suited to the task. This guide looks at whether Ruby is a good fit for scraping and which gems are most useful, from basic HTML parsing to pages that rely on JavaScript.
Is Ruby Good for Web Scraping?
Ruby is an interpreted, open-source, dynamically typed programming language that supports both object-oriented and procedural programming. Designed with simplicity in mind, Ruby has a syntax that is easy to write and naturally readable. These qualities have made Ruby popular for a variety of applications, including web scraping.
Ruby's abundance of third-party libraries, referred to as "gems," makes it particularly well-suited for web scraping. These gems encompass a wide range of tasks, simplifying the downloading of web pages, analysis of HTML content, and extraction of data.
In summary, conducting web scraping with Ruby is not only feasible but also straightforward, thanks to the multitude of available libraries.
Best Gems for Web Scraping in Ruby
Ruby's ecosystem includes a wealth of gems that boost the effectiveness and simplicity of web scraping tasks. These gems provide functionalities such as managing HTTP requests, parsing HTML content, handling cookies, and more. Here are some highly recommended Ruby gems for web scraping:
Nokogiri
Nokogiri is widely regarded as the essential gem in the Ruby community for parsing HTML and XML. It excels in navigating documents using CSS selectors and XPath, making it invaluable for extracting information from web pages.
HTTParty
HTTParty simplifies the task of sending HTTP requests with its user-friendly interface. It seamlessly integrates with other popular Ruby gems like Nokogiri, enhancing its functionality.
Mechanize
Mechanize is a versatile tool crafted for automating website interactions, including cookie management, session handling, and form submissions. It's particularly useful for extracting data from pages that require login credentials or other forms of user engagement.
Watir
Watir, short for Web Application Testing in Ruby, is a tool specialized in automating web browsers. It proves invaluable for extracting content that depends on the execution of JavaScript.
Kimurai
Kimurai is a web scraping framework built for Ruby, leveraging Nokogiri, Watir, and Capybara as its core components. It provides a user-friendly interface to manage multiple spiders, making it a powerful solution for complex scraping tasks.
Conclusion
Ruby is a powerful tool for web scraping, enabling the collection of information from websites with ease. Its simplicity and the availability of gems like Nokogiri, HTTParty, Mechanize, and Watir make it an excellent choice for both beginners and experienced developers. Whether you're extracting data from dynamic pages or tackling other scraping challenges, Ruby provides the necessary tools to accomplish tasks effectively. Dive into the world of web scraping with Ruby and unlock the potential of automated data retrieval.
Welcome to "Three Truths and a Lie"! We'll serve up four headlines, but only one is fibbing. Can you spot the rogue headline among the truth-tellers? Let the guessing games begin!
1. Didero is using AI to solve supply chain management at mid-market companies
2. YouTube furious after Apple and Anthropic steal their data to train AI
3. Google, Amazon and the problem with Big Tech’s climate claims
4. Gen Z absolutely loves tech jobs, especially coding in Pig Latin
Answer: The idea that "Gen Z Absolutely Loves Tech Jobs, Especially Coding in Pig Latin" is as far-fetched as finding Bigfoot on a scooter. Gen Z isn't souring on tech jobs; they're discovering that it's less about coding in Pig Latin and more about creating the next big thing in tech—whether it's AI that understands their TikTok obsessions or apps that solve real-world problems with a swipe and a tap.
Until next time, stay curious, stay tech-savvy, and we'll catch you in the next edition!
Want to gather data without breaking a sweat? Jump on board with our proxy solutions and let's make data collection a breeze!
No boring stuff here – just tech with a side of swagger!