Web scraping is one of the most common data collection methods, but its legality is still a much-debated topic. So, is web scraping legal? While the answer is not so straightforward, in this post we take a look at what web scraping is, its legal implications and best practices. 👀 Let’s dive in!
Web scraping involves extracting data from a website. The information collected is then exported in a format that is more useful for the user.
🔍 In general, web scraping is done with dedicated, automated tools, which work much faster than collecting the data manually.
While web scraping can get quite technical and often involves developers, it is a valuable tool for researchers, journalists, academics, and more.
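💡 To make the "extract, then export" idea concrete, here is a minimal sketch in Python that uses only the standard library. It is an illustration rather than a production scraper: the sample HTML, the `TableCellParser` class, the `scrape_to_csv` function and the `name`/`price` column names are all hypothetical, and a real tool would download the page first (after checking that scraping it is allowed).

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample page; a real scraper would download this HTML instead.
SAMPLE_HTML = """
<table>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>24.50</td></tr>
</table>
"""


class TableCellParser(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a list of cell strings
        self._row = None        # row currently being read, if any
        self._in_cell = False   # True while inside a <td> element

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._in_cell and self._row is not None:
            self._row[-1] += data.strip()


def scrape_to_csv(html):
    """Extract table rows from the HTML and return them as CSV text."""
    parser = TableCellParser()
    parser.feed(html)
    parser.close()

    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["name", "price"])  # assumed column names for the sample
    writer.writerows(parser.rows)
    return out.getvalue()


csv_text = scrape_to_csv(SAMPLE_HTML)
print(csv_text)
```

The extraction step (the parser) and the export step (the CSV writer) are kept separate on purpose, so the output format could be swapped without touching the parsing logic.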
Web scraping can be used for:
Just like most people who research this topic, you might be wondering: is scraping data legal? Don’t get too enthusiastic: unfortunately, the entire subject remains a gray area.
Web scraping is generally allowed where:
In general, responsible web scraping requires you to be cautious about applicable Terms of Service, copyrighted data and personal data (as personal data is typically protected by privacy laws).
🔍 Take a look at our detailed guide on what is considered personal information across major privacy laws.
The major privacy laws to date, such as the GDPR in the EU and California’s CPRA in the US, aim to protect users’ personal data and set a framework for how this data can be used.
They do not refer to web scraping or state that it is illegal. However, they regulate the collection of personal data by businesses and what they can do with it. In brief – because yes, the law is much more complicated than that! – it usually involves:
🔍 In short, if your web scraping activities involve scraping personal information, you must make sure you are compliant with data privacy laws.
Some noteworthy web scraping cases that you should be aware of involve individuals or companies accused of abusing web scraping and violating Terms of Service, copyright or privacy rules.
📌 Ruling by the US Court of Appeals for the Ninth Circuit – LinkedIn vs. hiQ
LinkedIn fought a legal battle to stop a competitor, hiQ, from scraping personal information from users’ public LinkedIn profiles.
In 2019, the court found that the CFAA (the US Computer Fraud and Abuse Act) was likely not violated, since the data scraped from LinkedIn was public (not behind a password wall). The court reaffirmed this position in 2022.
📌 Clearview AI Fine
The facial recognition firm received a heavy fine for scraping millions of pictures of people’s faces from social media.
It was declared that Clearview AI was processing sensitive data without a valid legal basis. Read the full story on our blog.
✅ Be careful if downloading data from a website that requires you to log in, as this could mean that you have agreed to Terms of Service which may forbid web scraping activities.
✅ Make sure to check the website’s Terms and Conditions to ensure you’re not in breach of contract.
✅ Even if the data is publicly available, make sure it isn’t protected by copyright. Protected content can include articles, videos and designs.
✅ Lastly, and most importantly, consider the ethics involved. Even if an activity isn’t illegal, it can still cause harm or reputational damage to you or others.
To protect your website from having its information scraped, you can:
🔒 Copyright your website and write a copyright clause;
🔒 Add web scraping restrictions to your website’s Terms and Conditions document. When doing so, make sure the language is specific and, for example, forbids third parties from scraping information and using it for commercial purposes.
👋 Here’s how to easily do this with iubenda software solutions:
🚀 Use iubenda’s Terms and Conditions Generator;
🚀 Create your customized Terms and Conditions document;
🚀 Select our pre-drafted clauses (copyright, etc.) or create a custom clause;
🚀 Follow our instructions to quickly install the document on your website!