Do you agree with Google’s privacy policy?
Chandani Patel Thompson
Director, Legal and Privacy Leader | Speaker | Author | Expert Problem Solver
In Google’s Privacy Policy, Google states that it “uses information to improve [its] services and to develop new products, features and technologies that benefit [its] users and the public. For example, [Google] use[s] publicly available information to help train Google’s AI models and build products and features like Google Translate, Bard, and Cloud AI capabilities.”
But is scraping publicly available information for this purpose legal or ethical?
The legality of web scraping, even of publicly available information, varies based on several factors, including your jurisdiction, the specific website's terms of use, and how the scraping is conducted. Here are some key points to consider:
1. Website Terms of Use: Many websites include terms of use or terms of service that outline what is allowed and prohibited on their platform. Some websites explicitly prohibit web scraping in their terms of use. If you violate these terms, the website could take legal action against you.
2. Copyright and Intellectual Property: If the content being scraped is protected by copyright or other intellectual property laws, using or distributing that content without proper authorization could lead to legal issues.
3. Computer Fraud and Abuse Act (CFAA) in the United States: In the U.S., the CFAA has been used to prosecute individuals and entities that engage in unauthorized access to computer systems, including through scraping. If scraping involves bypassing security measures, gaining unauthorized access, or causing damage to the website's servers, it could be seen as a violation of the CFAA.
4. Ethics and Respect for Website Owners: Even if scraping is legally permissible, it's important to consider the ethical implications. Websites often invest time and resources in creating and maintaining their content. Scraping large amounts of data could strain their servers or negatively impact their business model.
5. Robots.txt and Crawl Delay: Some websites use a file called "robots.txt" to indicate which parts of their site are off-limits to web crawlers and scrapers; some also include a Crawl-delay directive asking bots to limit how frequently they request pages. Respecting these instructions can help you avoid legal issues and reduce the load you place on a site's servers.
6. Public vs. Private Data: Just because information is publicly accessible does not necessarily mean it's okay to scrape it. For example, scraping personal or sensitive data, even if it's publicly available, could still raise legal and ethical concerns.
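To illustrate point 5 above, here is a minimal sketch, using Python's standard urllib.robotparser module, of how a well-behaved scraper can check a site's robots.txt rules before fetching a page. The rules shown are hypothetical; a real scraper would load them from the target site via set_url() and read().

```python
from urllib import robotparser

# Hypothetical robots.txt rules for illustration only; a real
# scraper would fetch the live file from the target site.
rules = [
    "User-agent: *",
    "Crawl-delay: 10",
    "Disallow: /private/",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Check whether a generic crawler may fetch each URL.
print(rp.can_fetch("*", "https://example.com/private/data"))  # False
print(rp.can_fetch("*", "https://example.com/index.html"))    # True

# Honor any requested delay (in seconds) between requests.
print(rp.crawl_delay("*"))  # 10
```

Checking can_fetch() before each request, and sleeping for the crawl delay between requests, addresses both the legal caution in point 5 and the server-load concern in point 4.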
As the writer of this newsletter, I recognize that Google’s privacy policy suggests it could use these writings to help build its products and features. Some content creators, who hold US copyrights in their work, have already filed complaints against Google alleging copyright infringement. Unfortunately, it will take some time before we learn what the courts decide. Until then, what do you think about Google’s use of publicly available information to train its AI models? Would you argue for the option to opt out? I'd say, yes please!