January 2025 product updates, new integrations, and new resources for your team
Hi, hi! Welcome back. Here, in Web Scraping Made Simple, we share new insights, interesting ideas, and tools to help you simplify your web scraping, speed up your data pipelines, and increase your productivity without adding to your workload.
If you'd like to be part of the conversation, don't forget to subscribe and comment at the end. I'll make sure to answer all your questions!
Integrate Zapier into your workflow
Streamlining your data collection and delivery processes can transform your team’s operations. Integrating ScraperAPI with Zapier unlocks powerful workflow automation that reduces manual effort and increases efficiency. This setup allows you to scrape data and channel the results directly into tools like Google Sheets, Slack, or even your CRM without writing complex code for each step.
In this article, I’ll guide you through:
By the end, you’ll understand how to automate repetitive data tasks, so your team can focus on what matters most.
New resources and tips
We want our website to become the most comprehensive resource for learning web scraping and leveling up your skills, no matter your experience level.
In this hub, you'll find everything from the core concepts you need to get started to full guides on scraping any type of content with different languages (Python, Node.js, C#, R, Golang, etc.) and libraries.
Of course, you'll also find more specific guides, such as how to scrape popular sites and how to handle common web scraping issues.
Note: If there's a topic you'd like us to cover, please let us know!
If you're in the ecommerce space, running competitive analysis on Amazon data gives you a huge advantage.
In this guide, we'll show you, step by step, how to collect, clean, and analyze Amazon data to get actionable insights.
ScraperAPI just got better in January!
LLM-ready output format
Training LLMs properly requires a lot of high-quality, unbiased data, and the web is full of goldmines of it. However, processing and cleaning all this data, especially at a large scale, is time-consuming and taxing on your resources.
So, if you're already using ScraperAPI to bypass bot-blockers and collect this training data, wouldn't it make sense to also use it to prepare your datasets?
That's precisely what this new feature does for you! Just set the "output" parameter to "text" or "markdown", and ScraperAPI will return the page's content in that format, ready to feed into your LLM training pipeline. No additional work needed!
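If you call ScraperAPI from a script, a minimal shell sketch of this request might look like the one below. It assumes the parameter is named "output", exactly as described above, and is sent as a query parameter next to "api_key" and "url"; please check the ScraperAPI docs for the current parameter name before relying on it. "fetch_llm_ready" is a hypothetical helper of our own, not part of ScraperAPI.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: fetch a page through ScraperAPI as LLM-ready text or Markdown.
# Assumption: the parameter is named "output" (as announced) and accepts
# "text" or "markdown"; verify against the ScraperAPI docs.
# -G turns every --data-urlencode pair into a percent-encoded query parameter.
fetch_llm_ready() {
  local api_key="$1" target="$2" format="${3:-markdown}"
  case "$format" in
    text|markdown) ;;
    *) echo "format must be 'text' or 'markdown', got '$format'" >&2; return 1 ;;
  esac
  curl -sS -G "https://api.scraperapi.com/" \
    --data-urlencode "api_key=$api_key" \
    --data-urlencode "output=$format" \
    --data-urlencode "url=$target"
}

# Usage (sends a real request; replace API_KEY with your key):
#   fetch_llm_ready API_KEY "https://example.com/" markdown > page.md
```

Letting curl encode each value with --data-urlencode means a target URL that carries its own query string reaches ScraperAPI intact.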
Redfin 'For Sale' endpoint
As promised, we're expanding our suite of endpoints to the real estate industry, starting with one of the most scraped sites: Redfin!
As you might have guessed from the name, this endpoint focuses solely on properties for sale. The layouts of for-sale and for-rent property pages are so different that we decided to keep the two separate. But don't worry, rentals are coming next ;)
Endpoint:
https://api.scraperapi.com/structured/redfin/forsale
ScraperAPI will return all property details as JSON or CSV.
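Here's a minimal shell sketch of calling the endpoint. It assumes the endpoint takes "api_key" and "url" (the Redfin listing page) as query parameters, the same way the main scraping API does. We don't show how to choose between JSON and CSV, so check the ScraperAPI docs for that parameter. "fetch_redfin_forsale" and "LISTING_URL" are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: fetch a Redfin 'For Sale' listing via the structured endpoint.
# Assumption: the endpoint accepts api_key and url (the listing page) as query
# parameters, like the main scraping API. JSON vs. CSV selection is not shown.
fetch_redfin_forsale() {
  local api_key="$1" listing_url="$2"
  curl -sS -G "https://api.scraperapi.com/structured/redfin/forsale" \
    --data-urlencode "api_key=$api_key" \
    --data-urlencode "url=$listing_url"
}

# Usage (sends a real request; LISTING_URL is a placeholder for a Redfin
# property page that is for sale):
#   fetch_redfin_forsale API_KEY "$LISTING_URL" > listing.json   # assuming JSON by default
```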
Note: You can also use this endpoint with our Async Scraper or DataPipeline.
Take screenshots
We took a big step with our Rendering Instruction Set, but we left an important function behind: taking screenshots.
However, we didn't want to force you to write different logic just for this action, since sometimes a screenshot is all you want. For this reason, you can now add the "screenshot=true" parameter to your requests to the scraping API or the Async Scraper.
Example:
curl -i "https://api.scraperapi.com/?api_key=API_KEY&screenshot=true&url=https://example.com/"
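The example above works as-is for https://example.com/, but if the page you want to capture has its own query string, its "&" characters would be read as separate ScraperAPI parameters. Here's a minimal shell sketch that lets curl percent-encode the target first, using the same "screenshot=true" parameter. "take_screenshot" is a hypothetical helper name, and where the screenshot appears in the response isn't covered here.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: request a screenshot, percent-encoding the target URL so
# any '&' or '?' inside it is not mistaken for a ScraperAPI parameter.
# -i prints the response headers along with the body, as in the example above.
take_screenshot() {
  local api_key="$1" target="$2"
  curl -sS -i -G "https://api.scraperapi.com/" \
    --data-urlencode "api_key=$api_key" \
    --data-urlencode "screenshot=true" \
    --data-urlencode "url=$target"
}

# Usage (sends a real request; replace API_KEY with your key):
#   take_screenshot API_KEY "https://example.com/search?q=shoes&page=2"
```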
It's a smaller update, but it's still pretty cool!
Thanks a lot for reading all the way to the end! I hope you found something useful. Please don't forget to share any ideas or comments below. I genuinely want to know what you'd like us to cover next. This is just the beginning, and more improvements to this newsletter are coming.
Until next time, happy scraping!