Is Web Scraping Illegal? Depends on Who You Ask

Segun Aderibigbe MBA, CISSP.

Consultant | Board Member | Innovator

Published: February 6, 2024

Web scraping has existed for a long time, and depending on who you ask, it can be loved or hated. But where is the line drawn between extracting data for legitimate business purposes and malicious data extraction that hurts a business? That line is getting blurrier by the day, and the introduction of generative artificial intelligence (AI) and large language models (LLMs) has complicated things even further. Legal actions against web scraping are slow and vary by country, leaving organizations to fend for themselves.

What is Web Scraping?

Web scraping is a technique for swiftly pulling large amounts of data from websites using automated software (bots). OWASP classifies it as an automated threat (OAT-011). Web scraping differs from screen scraping in that it can extract underlying HTML code and data stored in databases, while screen scraping only copies pixels displayed on screen.
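To make the distinction concrete, here is a minimal sketch of the extraction step in Python, using only the standard library. The HTML fragment, the class names, the data-sku attribute, and the scrape_products helper are all hypothetical, invented for illustration. A real bot would fetch pages over HTTP; this sketch parses an inline string instead so that it runs offline.

```python
from html.parser import HTMLParser

# Hypothetical page fragment (an assumption for illustration only).
SAMPLE_HTML = """
<ul>
  <li class="product" data-sku="A100"><span class="name">Widget</span> <span class="price">$9.99</span></li>
  <li class="product" data-sku="B200"><span class="name">Gadget</span> <span class="price">$24.50</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collects one dict per <li class="product"> element."""

    def __init__(self):
        super().__init__()
        self.products = []
        self._field = None  # name of the span currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "product":
            # data-sku lives in the markup and is never rendered on screen.
            self.products.append({"sku": attrs.get("data-sku")})
        elif tag == "span" and attrs.get("class") in ("name", "price") and self.products:
            self._field = attrs["class"]

    def handle_data(self, data):
        # Store text only while inside a name or price span.
        if self._field and self.products:
            self.products[-1][self._field] = data.strip()
            self._field = None

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def scrape_products(html):
    """Return the structured records found in an HTML string."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return parser.products


if __name__ == "__main__":
    for product in scrape_products(SAMPLE_HTML):
        print(product)
```

The sku value comes from an attribute in the markup rather than from anything displayed, which is the kind of underlying data a screen scraper copying pixels would not see.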

Web scraping is not a new phenomenon; it has been around since the dawn of the internet. Early web scraping was manual and involved individuals copying and pasting data from web pages. However, as the internet grew more complex, so did the data extraction methods. Developers started writing code to automate the process, and with the advent of machine learning and AI, web scraping has become more sophisticated and efficient. In the age of AI, web scraping has become a critical tool for businesses to gather data for machine learning models, market research, competitor analysis, and more.

Web Scraping Uses: The Good, the Bad, and the Shady

Not all web scraping is bad: the difference is rooted in how it is conducted and how that data is being used. In its positive form, web scraping is a vital underpinning of the internet that is helpful for organizations and consumers alike. Good bots that perform web scraping enable search engines to create and maintain a searchable index of web pages, price comparison services to save consumers money, and market researchers to gauge sentiment on social media.

In contrast, bad bots fetch content from a website for purposes outside the site owner’s control, often violating the terms of service. For example, competitors may scrape your pricing information to gain a competitive edge and disrupt your business or, worse, steal your proprietary content. Content scraping is outright theft at a large scale, and if your content appears elsewhere on the web, your SEO rankings are bound to decrease. In addition, unethical practitioners may scrape personal or sensitive information without consent, leading to privacy violations and potential identity theft.

Alarmingly, bad bots make up 30% of all web traffic today, and web scraping remains one of the most prominent use cases.

Are Web Scraping Services a Legitimate Business?

In recent years, organizations engaged in web scraping have invested heavily in positioning it as a legitimate business. These attempts to rebrand “bad bots as a service” show up in many ways. First, these organizations adopt professional-looking websites offering business intelligence services called pricing intelligence, alternative data for finance, or competitive insights. Typically, these businesses provide data products focused on specific industries. Second, there is increased pressure to purchase scraped data within your industry. No company wants to lose in the marketplace because the competition has access to data that is available to buy. Finally, there is the growth of job postings looking for people to fill positions with titles like Web Data Extraction Specialist or Data Scraping Specialist.

A quick look at the website or LinkedIn page of these dubious organizations engaged in web scraping operations reveals numerous articles justifying the use of bots to scrape data. I have seen multiple blog posts discussing why an organization must employ web scraping bots to remain competitive in its market. Some blog posts even boast about their bots’ ability to stay “under the radar” while masquerading as legitimate human users, for example, by using residential ISPs as proxies. This raises the question: Why are these bots trying to evade security measures if such a business is legitimate?

Is Web Scraping Legal?

While web scraping is not inherently illegal, how it is conducted and the data’s subsequent use can raise legal and ethical concerns. Actions such as scraping copyrighted content and personal information without consent or engaging in activities that disrupt the normal functioning of a website may be deemed illegal.

The legality of web scraping largely depends on the jurisdiction and specific circumstances. In the United States, for instance, web scraping can be considered legal as long as it does not infringe upon the Computer Fraud and Abuse Act (CFAA), the Digital Millennium Copyright Act (DMCA), or violate any terms of service agreements.

Can Legal Action Be Taken to Prevent Web Scraping?

Yes, it is possible to take legal action against web scraping, but it largely depends on the context. If a website can prove that scraping has caused harm to its operations or has violated terms of service, intellectual property, or privacy rights, a court may rule against the scraping activity. However, without a comprehensive law against web scraping, each case is evaluated individually, leading to varying outcomes. Several landmark lawsuits have shaped the legal landscape of web scraping:

In the case of eBay vs. Bidder’s Edge in 2000, eBay successfully sued Bidder’s Edge for scraping its auction data, arguing that the scraping activity consumed its system resources and could potentially cause more harm.
In Facebook vs. Power Ventures in 2009, the court sided with Facebook, ruling that Power Ventures violated intellectual property rights by scraping Facebook user data.
One of the most recent and significant cases is LinkedIn vs. hiQ Labs. In 2019, the U.S. Court of Appeals for the Ninth Circuit ruled that scraping data publicly accessible on the internet likely does not violate the CFAA, a decision with implications for future web scraping activities.

Enforcement of web scraping laws can be challenging due to the global nature of the internet and differing regulations. Some entities actively enforce their terms of service through technological measures or legal action, especially if the scraping leads to tangible harm, such as data breaches, privacy violations, or financial losses. However, the extent of enforcement often depends on the severity of the violation and the resources available to the affected parties or relevant authorities.

This situation poses a moral dilemma for organizations. As the need to leverage specific techniques to avoid being disadvantaged grows, so does the probability of turning to web scraping. In an environment where constant efforts are made to legitimize web scraping, it is difficult to see the bot problem going away soon.

The Legality of Web Scraping in the Age of Artificial Intelligence

The rise of artificial intelligence (AI) and large language models (LLMs) has brought the discussion about the legality and ethics of web scraping back to center stage. Web scraping has become a crucial component in training AI systems and LLMs. These models, such as OpenAI’s GPT-4, rely on vast amounts of data to learn and generate coherent outputs.

By scraping data from the internet, these models can be trained on diverse and extensive data corpora, improving their ability to understand and respond to a wide range of inputs. However, this practice has also raised complex legal and ethical questions that businesses must navigate.

Recently, OpenAI faced lawsuits alleging that it unlawfully copied text from books without obtaining consent from copyright holders. These lawsuits have sparked a debate about the boundaries of data collection for AI training. While some argue that this data is necessary for advancing AI technologies, others contend it infringes copyright laws and privacy rights.

The ethical implications of web scraping extend beyond legality. As AI systems and LLMs are trained on scraped data, they may inadvertently amplify and proliferate private information, posing potential risks to individuals and society. Moreover, the lack of transparency in how this data is used and the difficulty in removing data once it has been incorporated into a model raise additional ethical concerns.

Conclusion

The legality of using bots to grab information from public websites remains unclear. It is indeed a grey area, as many applicable laws were written well before the widespread use of the internet or the development of generative AI, and which laws take priority hasn’t been resolved yet.

In an environment where constant efforts are made to legitimize web scraping, it is difficult to see this bot problem going away soon. Do the existing laws need to be updated to deal with the problem? Should new legislation be introduced to provide more clarity? Certainly, but as courts try to decide the legality of further scraping, companies still have their data stolen and the business logic of their websites abused.

Source: https://www.imperva.com/blog/is-web-scraping-illegal/
