Top Websites Block Apple’s AI Scraping
Good day! It’s Friday the 13th, and guess what? It’s the first of two this year—mark your calendars, the next one’s coming in December!
Every time one of these unlucky days rolls around, we ask people to share their personal superstitions and their go-to methods for keeping bad luck or evil spirits at bay.
What's your go-to superstition for keeping bad luck or evil spirits away? Share it with us!
EU court rules Google, Apple must pay billions of euros in antitrust, tax cases
On Tuesday, the EU's top court delivered two major wins for Brussels. It ruled that Google must pay a €2.4 billion fine imposed by the bloc's antitrust regulators seven years ago, and it dismissed Apple's last legal challenge against a European Commission order requiring it to repay €13 billion in back taxes to Ireland. The decisions are a boost for outgoing competition chief Margrethe Vestager, who had faced previous setbacks in EU courts.
Sweetheart tax arrangements
This case was one of many investigations into favorable tax deals between big companies and EU countries over the past decade.
In 2020, Apple initially won when the EU's General Court overturned the tax repayment order, leading Brussels to appeal.
In November, the European Court of Justice's legal adviser suggested overturning the General Court's decision due to legal errors.
The top court then ruled that Apple must pay the back taxes, avoiding a return to the lower court.
Microsoft Says Windows Update Zero-Day Being Exploited to Undo Security Fixes
On Tuesday, Microsoft warned of active exploitation of a critical flaw in Windows Update, where attackers are reversing security fixes on certain Windows versions. The flaw, CVE-2024-43491, is rated critical with a CVSS score of 9.8/10. Microsoft did not disclose public exploitation details or provide indicators of compromise, noting the issue was reported anonymously. The bug appears similar to the ‘Windows Downdate’ problem discussed at Black Hat this year.
From the Microsoft bulletin: Microsoft has identified a vulnerability in the Servicing Stack that has reverted fixes for certain issues on Windows 10, version 1507 (released July 2015). This flaw allows attackers to exploit previously mitigated vulnerabilities on systems with the March 12, 2024 update (KB5035858) or subsequent updates until August 2024. Later versions of Windows 10 are not affected.
Microsoft has advised affected Windows users to first install this month’s Servicing Stack Update (SSU KB5043936) and then the September 2024 Windows Security Update (KB5043083).
This Windows Update vulnerability is one of four zero-days currently flagged by Microsoft’s security response team as actively exploited.
The other three are CVE-2024-38226 (Office Publisher), CVE-2024-38217 (Windows Mark of the Web), and CVE-2024-38014 (Windows Installer).
Microsoft has reported 21 zero-day attacks on Windows flaws this year.
The September Patch Tuesday fixes around 80 security issues across products like Microsoft Office, Azure, and SQL Server, with seven rated critical.
Meta outlines WhatsApp and Messenger’s cross-app messaging features
Despite the rise in messaging platforms, most remain isolated, limiting communication to users within the same app. For example, Facebook Messenger users can only chat with other Messenger users, and WhatsApp users face similar restrictions. However, this will change for consumers in Europe.
The shift is driven by the European Union’s Digital Markets Act (DMA), which requires major tech companies to integrate with third-party services. This will enable users of WhatsApp and Messenger to communicate with people on other apps. Meta has been preparing for this change, and the details are now emerging.
Although no launch date has been set for the first phase, Meta plans to introduce one-to-one third-party chats first, followed by group messaging in 2025 and calling features by 2027. Currently, this requirement applies only in Europe, but it's likely that cross-platform messaging will eventually expand to other regions, including the U.S.
Meta has shared more details about the upcoming chat features for WhatsApp and Messenger. The company aims to enhance cross-platform messaging by incorporating user and third-party feedback. According to Meta:
“We’ve heard that connecting with users on other apps should be straightforward and easy to find. Therefore, we’ve added new notifications in WhatsApp and Messenger to inform users about third-party chats and will alert them when new messaging apps become available.”
Meta is also focusing on user control, creating a simple onboarding process to help users understand and manage third-party chats. Users will be able to choose which third-party apps to connect with and how to handle their inbox, including options to merge messages from all apps or keep them separate.
Beyond basic interoperability required by the DMA, Meta plans to support advanced features like message reactions, direct replies, typing indicators, and read receipts in cross-platform interactions.
Major Sites Are Saying No to Apple’s AI Scraping
Less than three months after Apple introduced a tool allowing publishers to opt out of its AI training, several major news outlets and social platforms have taken advantage of it.
WIRED can confirm that Facebook, Instagram, Craigslist, Tumblr, The New York Times, The Financial Times, The Atlantic, Vox Media, the USA Today network, and WIRED’s parent company, Condé Nast, are all opting out of having their data used for Apple’s AI training. This response highlights a major shift in attitudes towards web crawlers, which, once used to gather data indiscriminately, are now at the center of debates over intellectual property and the future of the internet.
Apple has introduced a new tool called Applebot-Extended, an extension of its web-crawling bot that allows website owners to opt out of having their data used for AI training. Apple refers to this as “controlling data usage” in a blog post about the feature. Originally launched in 2015, the Applebot was designed to gather data for Apple’s search products like Siri and Spotlight. Recently, its role has expanded to include training foundational models for Apple’s AI initiatives.
Applebot-Extended is designed to respect publishers' rights, according to Apple spokesperson Nadine Haija. It does not stop the original Applebot from crawling websites, which would affect how content appears in Apple search products. Instead, it prevents the collected data from being used to train Apple’s large language models and other AI projects. Essentially, it customizes how another bot operates.
Publishers can block Applebot-Extended by updating their robots.txt file, which has long regulated how bots scrape the web. This file is now central to the ongoing debate about AI training. Many publishers have already modified their robots.txt files to block AI bots from companies like OpenAI and Anthropic.
The robots.txt file allows website owners to control which bots can access their sites on a case-by-case basis. While there's no legal requirement for bots to follow these rules, adherence is a well-established practice. (However, some bots ignore it; a WIRED investigation earlier this year found that AI startup Perplexity was bypassing robots.txt and scraping websites covertly.)
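As a rough illustration of how such a rule is written and evaluated, the sketch below parses a hypothetical robots.txt with Python's standard `urllib.robotparser`. The file contents, the `example.com` URL, and the `is_allowed` helper are assumptions made for this example, not any publisher's actual configuration; only the `Applebot` and `Applebot-Extended` agent names come from the reporting above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: decline AI-training use, allow everything else.
ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/news/story.html"  # placeholder URL
    for agent in ("Applebot-Extended", "Applebot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, page))
```

With these rules the `Applebot-Extended` token is denied, while plain `Applebot` falls through to the wildcard group and stays allowed. That mirrors the split described above: search crawling continues, but use of the content for AI training is declined. As the article notes, this only works for bots that choose to honor the file.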
Since Applebot-Extended is new, few websites have blocked it so far. A recent analysis by Ontario-based AI-detection startup Originality AI of 1,000 high-traffic sites found that about 7 percent—mostly news and media outlets—were blocking the bot. Dark Visitors conducted a similar analysis and found that approximately 6 percent of 1,000 high-traffic sites had blocked Applebot-Extended. These findings suggest that most website owners either do not object to Apple’s AI training or are unaware of the option to block the bot.
Data journalist Ben Welsh found that just over 25% of news websites surveyed are blocking Applebot-Extended, compared to 53% blocking OpenAI’s bot and nearly 43% blocking Google’s AI bot, Google-Extended. This suggests that Applebot-Extended may still be less prominent. Welsh notes that the number of sites blocking Applebot-Extended is gradually increasing.
Welsh’s ongoing project reveals a divide among news publishers on whether to block major AI bots. Some publishers may be negotiating licensing deals, which could influence their decisions. Since last year, Apple has been seeking AI partnerships with publishers, and competitors like OpenAI and Perplexity have formed various agreements. Jon Gillham of Originality AI suggests that some publishers may be strategically withholding data until formal partnerships are established.
New AI standards group wants to make data scraping opt-in
Initially, major generative AI tools were trained on data scraped from the internet, often encompassing anything publicly available. Now, as data sources become more restrictive and seek licensing agreements, new startups are emerging to secure additional data.
The Dataset Providers Alliance, formed this summer, aims to make the AI industry more standardized and fair. The group has released a position paper detailing its views on key AI issues. The alliance includes seven AI licensing firms, such as music rights-management company Rightsify, Japanese stock-photo marketplace Pixta, and generative-AI copyright-licensing startup Calliope Networks. Additional members are expected to join in the fall.
The DPA advocates for an opt-in system, requiring explicit consent from creators and rights holders before their data can be used. This approach contrasts with many major AI companies, which often rely on opt-out systems that place the burden on data owners to remove their work individually, or offer no opt-out options at all.
The DPA, which expects its members to follow the opt-in rule, considers this method more ethical. Alex Bestall, CEO of Rightsify and Global Copyright Exchange, who led the initiative, believes that opt-in is both a pragmatic and moral choice. He notes, “Selling publicly available datasets can lead to lawsuits and loss of credibility.”
Ed Newton-Rex, former AI executive and head of the ethical AI nonprofit Fairly Trained, argues that opt-outs are “fundamentally unfair to creators,” as some may not even be aware of the option. He praises the DPA’s call for opt-ins as a positive step.
Shayne Longpre, head of the Data Provenance Initiative, praises the DPA's commitment to ethical data sourcing but doubts the opt-in standard is practical, given the volume of data AI models require. He warns that only large tech companies might be able to afford comprehensive data licensing.
The DPA opposes government-mandated licensing, advocating instead for direct negotiations between data originators and AI companies. The DPA proposes various compensation models, including subscription-based, usage-based, and outcome-based licensing, to ensure fair payment for data.
Bill Rosenblatt, a copyright technologist, supports the idea of standardizing compensation structures, believing it could facilitate mainstream adoption of licensing practices. He emphasizes the need for incentives and ease of use to encourage AI companies to adopt these models.
Looking for 240/4 addresses
The IANA’s IPv4 address registry includes a block of addresses under the prefix 240/4, marked as ‘Future Use’ per RFC 1112, which reserved these addresses for potential future applications. This block includes 268,435,455 addresses, raising questions about why such a large portion remains unused while IPv4 addresses are nearly exhausted. The unused addresses could potentially alleviate IPv4 depletion issues, leading to debate over why they haven't been repurposed if no future use has been defined.
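The size of the block can be sanity-checked with Python's standard `ipaddress` module. One assumption to flag: the figure quoted above is one short of 2^28, and the sketch reads that as leaving out the limited-broadcast address 255.255.255.255, which the source does not state explicitly.

```python
import ipaddress

# The reserved "Future Use" block discussed above.
block = ipaddress.ip_network("240.0.0.0/4")

# A /4 prefix leaves 28 host bits, so the block holds 2**28 addresses.
total_in_block = block.num_addresses

# Assumption: the quoted count excludes the limited-broadcast address,
# which sits at the very top of this range.
limited_broadcast = ipaddress.ip_address("255.255.255.255")
future_use = total_in_block - int(limited_broadcast in block)

# Fraction of the whole 32-bit IPv4 space covered by the block.
share_of_ipv4 = total_in_block / 2**32

print(total_in_block, future_use, f"{share_of_ipv4:.2%}")
```

The block works out to one sixteenth of the entire IPv4 address space, which is why it keeps attracting attention.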
In 2008, key discussions about the use of the 240/4 address block were documented in several Internet drafts. One notable draft, draft-wilson-class-e, proposed redesignating this address block for private use. The idea was to expand the pool of local addresses to aid in the transition from IPv4 to IPv6, particularly for large networks operating in a dual-stack environment.
In such scenarios, reusing the 10/8 address block was deemed impractical due to existing allocations and potential address conflicts. Instead, the 240/4 block was suggested as a more straightforward alternative. It would enable the connection of Customer Premises Equipment (CPE) Network Address Translators (NATs) to border Carrier NATs, avoiding the need for more complex solutions like Dual-Stack Lite (RFC 6333), NAT464, or 464XLAT (RFC 6877). This approach aimed to simplify network management while supporting the IPv6 transition.
The two main 2008 proposals differed on how the block should be used. draft-wilson-class-e suggested private use only, because many IP implementations discard packets carrying these addresses, which makes them unsuitable for general use. In contrast, draft-fuller-240space recommended reclassifying the block as conventional unicast space, arguing that it should not remain unused given the rapid depletion of IPv4 addresses.
With IPv4 exhaustion expected between 2010 and 2012 and the slow transition of mobile networks to IPv6, there was a pressing need to optimize the use of available IPv4 addresses.
By 2009, annual consumption of IPv4 addresses had reached 190 million, so adding the 268 million addresses in the 240/4 block would likely have extended the IPv4 pool by only about 16 months. Updating all IP hosts to accept these addresses would have taken a comparable amount of time and could have distracted from the urgent need to transition to IPv6.
At that point, the focus shifted to dual-stack transition mechanisms, with over 30 proposals emerging around 2010. Managing the remaining IPv4 addresses took precedence, and the proposals for the 240/4 block were sidelined.
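The arithmetic behind that estimate is a simple division. The sketch below uses only the two figures quoted above and assumes a constant consumption rate, which is a simplification; the function name is illustrative.

```python
ADDRESSES_IN_240_4 = 268_435_455   # block size quoted in the article
ANNUAL_CONSUMPTION = 190_000_000   # 2009 consumption rate quoted in the article


def runway_months(pool_size: int, consumed_per_year: int) -> float:
    """Months a pool would last at a constant annual consumption rate."""
    return pool_size / consumed_per_year * 12


months = runway_months(ADDRESSES_IN_240_4, ANNUAL_CONSUMPTION)
print(f"{months:.1f} months")
```

A straight division lands at roughly 17 months, in the same range as the figure cited above. Either way, the block would have bought well under two years.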
WordPress Mandates Two-Factor Authentication for Plugin and Theme Developers
WordPress.org is implementing a new security policy requiring two-factor authentication (2FA) for accounts that can update plugins and themes. This requirement will take effect on October 1, 2024.
The WordPress maintainers emphasized that accounts with commit access can make updates to plugins and themes used by millions of WordPress sites globally. Securing these accounts is crucial to prevent unauthorized access and to uphold the security and trust within the WordPress.org community.
In addition to mandatory 2FA, WordPress.org is introducing SVN passwords—unique passwords specifically for committing changes. This measure adds an extra layer of security by separating code commit access from main WordPress.org account credentials.
The new SVN passwords function like application-specific passwords, protecting your primary password from exposure and allowing easy revocation of SVN access without altering your main account credentials.
Due to technical limitations, 2FA cannot yet be applied to existing code repositories. As a result, WordPress.org will implement a combination of account-level 2FA, high-entropy SVN passwords, and other security features like Release Confirmations.
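To make the idea of a separate, revocable commit credential concrete, here is a minimal sketch of an application-specific password store in Python. It is not WordPress.org's implementation: the `AppPasswordStore` class, its method names, and the choice of a 32-byte token with a SHA-256 digest are all assumptions made for illustration.

```python
import hashlib
import hmac
import secrets


class AppPasswordStore:
    """Hypothetical store for per-account commit passwords."""

    def __init__(self) -> None:
        # Maps account name to the SHA-256 hex digest of its commit password.
        self._digests: dict[str, str] = {}

    def issue(self, account: str) -> str:
        """Generate a high-entropy password, keep only its digest, return it once."""
        password = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe text
        self._digests[account] = hashlib.sha256(password.encode()).hexdigest()
        return password

    def verify(self, account: str, candidate: str) -> bool:
        """Check a candidate password using a constant-time comparison."""
        stored = self._digests.get(account)
        if stored is None:
            return False
        digest = hashlib.sha256(candidate.encode()).hexdigest()
        return hmac.compare_digest(stored, digest)

    def revoke(self, account: str) -> None:
        """Remove commit access without touching the main account login."""
        self._digests.pop(account, None)
```

Because the commit password is generated randomly and stored apart from the main login, it can be revoked or reissued without changing the account's primary credentials, which is the property the announcement describes.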
These steps aim to prevent scenarios where attackers could gain control of accounts and inject malicious code into legitimate plugins and themes, mitigating the risk of large-scale supply chain attacks.
This update comes as Sucuri warns of ongoing ClearFake campaigns targeting WordPress sites to distribute the RedLine information stealer by deceiving visitors into running malicious PowerShell scripts.
Fake password manager coding test used to hack Python developers
Members of the North Korean hacker group Lazarus, posing as recruiters, are targeting Python developers with a fake coding test for password management software that contains malware.
These attacks are part of the 'VMConnect campaign,' which was first identified in August 2023. In this campaign, attackers have been targeting software developers by uploading malicious Python packages to the PyPI repository.
ReversingLabs, which has been monitoring the campaign for over a year, reports that Lazarus hackers host these malicious projects on GitHub. Victims are lured by README files that provide instructions for completing the test.
The instructions are designed to create an impression of professionalism and urgency in the process.
ReversingLabs discovered that Lazarus impersonates major U.S. banks, such as Capital One, to lure job candidates with attractive employment offers.
Additional evidence from a victim indicates that Lazarus also reaches out to targets via LinkedIn, a tactic that has been previously documented for the group.
The README file directs the victim to first run the malicious password manager application ('PasswordManager.py') on their system, and then proceed to identify and fix any errors.
What is a transparent proxy and how does it work?
In today's digital landscape, where privacy and data management are major concerns, proxies play a vital role in controlling and protecting network traffic. Transparent proxies are particularly notable because they operate silently, without requiring any user intervention or noticeable configuration changes.
Transparent proxies are distinct among proxy types because they function seamlessly without requiring user awareness or manual setup on the client side. This article offers an in-depth look at transparent proxies, including their operation, use cases, advantages and disadvantages, and ways to detect them.
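One common detection heuristic, offered here as an assumption rather than something taken from the article, is to look at the request headers as the destination server received them and check for fields that intermediaries typically add. The sketch below assumes you already have those headers in a dictionary (for example from an echo service you control); the `proxy_hints` helper and the header list are illustrative.

```python
# Headers that intermediaries commonly add to forwarded requests.
PROXY_HINT_HEADERS = ("via", "x-forwarded-for", "forwarded")


def proxy_hints(received_headers: dict[str, str]) -> dict[str, str]:
    """Return any proxy-related headers found, matched case-insensitively."""
    lowered = {name.lower(): value for name, value in received_headers.items()}
    return {name: lowered[name] for name in PROXY_HINT_HEADERS if name in lowered}


if __name__ == "__main__":
    # Made-up example of headers seen by the destination server.
    seen_by_server = {
        "Host": "example.com",
        "Via": "1.1 gateway.example",
        "X-Forwarded-For": "203.0.113.7",
    }
    print(proxy_hints(seen_by_server))
```

An empty result does not prove that no proxy is present, since an intercepting proxy is free to add no headers at all. A non-empty result is only a hint that the request passed through an intermediary.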
How Does a Transparent Proxy Work?
Transparent proxies, also known as inline or see-through proxies, intercept user requests without altering the user’s browsing experience or requiring manual setup. They operate discreetly at network gateways, such as routers or firewalls. Here’s a breakdown of their operation:
There are various types of transparent proxies, each serving different purposes:
Transparent Proxy Use Cases
Transparent proxies are versatile tools used across various scenarios due to their ability to operate without user interaction. Here are key use cases:
Welcome to "Three Truths and a Lie"! We'll serve up four headlines, but only one is fibbing. Can you spot the rogue headline among the truth-tellers? Let the guessing games begin!
Answer: As amazing as it would be to grab the entire internet for less than the cost of a concert ticket, that’s not how it went down. Researchers actually spent $20 to control just a small portion of the internet, more like a test drive than a full-blown ownership deal. So don’t worry, Google isn’t suddenly under new management because of a bargain sale!
Until next time, stay curious, stay tech-savvy, and we'll catch you in the next edition!
No boring stuff here – just tech with a side of swagger!