Effortless Web Scraping with Selenium and Python: A Step-by-Step Guide
In today's data-driven world, extracting valuable insights from websites is an increasingly useful skill, and web scraping with Python makes it approachable. In this guide, we'll build a small Selenium scraping script from start to finish.
We will start by launching a web browser with Selenium and Chrome WebDriver to search for a specific keyword on a website. Following that, we'll scrape the h3 tags from the web page and save the results as a CSV file.
Before we dive into the details, let's explore what Selenium is and why a Chrome Driver is essential for web scraping.
What is Selenium?
Selenium is an open-source browser automation tool. It is primarily used for automated testing of web applications, but because it drives a real browser programmatically, it also works well for web scraping, especially on pages that rely on JavaScript to render content.
What is ChromeDriver?
"ChromeDriver" is a tool that allows you to automate tasks in Google Chrome. It's essentially a program that acts as a bridge between your test scripts and the Chrome browser. "ChromeDriver" works along with the "Selenium WebDriver" to automate various tasks in the chrome browser.
Setting Up Your Environment
I hope you have Python installed on your system. To scrape websites using Selenium, we need to install the Selenium package and download "ChromeDriver".
Install Selenium in Python
To install Selenium in Python, you can run the following command in the terminal:
pip install selenium
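To confirm the installation succeeded, you can print the installed version from the terminal:
python -c "import selenium; print(selenium.__version__)"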
Download ChromeDriver
You can download "ChromeDriver" by searching for it in your browser or by clicking on this link.
When you download "ChromeDriver", you need to make sure that the version of "ChromeDriver" matches the version of your Chrome browser.
To check this, you can open your Chrome browser and click on the three dots in the top right corner of the browser window, near your profile image. Then click on "Settings" and select "About Chrome."
You can see your Chrome browser version in this section.
Once you know your Chrome version, you can download the matching version of "ChromeDriver" from the same link.
After downloading "ChromeDriver", you need to unzip it and store it somewhere on your system so that you don’t accidentally delete it and you can easily use it in your code. For convenience, I like to store this file in the “Program Files (x86)” folder.
Getting Started
After installing the "Selenium" package and "ChromeDriver", we can start scraping data from websites.
Step 1: Import libraries
In this section, we will import "webdriver", "By", "Keys", and "Service" from "selenium", as well as "csv" and "time".
import csv
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.service import Service
Step 2: Store ChromeDriver path and website URL in variables
Now, we need to store the "ChromeDriver" path and the URL of the website in variables. Keeping these values in named variables, rather than hard-coding them where they are used, makes the script easier to read and update.
# Path to ChromeDriver
PATH = "C:\Program Files (x86)\chromedriver.exe"
# URL of the website to scrape
URL = 'https://www.bikewale.com/'
Step 3: Get the page content
Next, we will initialize the "ChromeDriver" service by passing the driver path to the "Service()" function. After that, we can create a new instance of the "ChromeDriver" using "webdriver.Chrome(service=service)". This action opens an automated Chrome window on your system, enabling you to navigate to your target website by passing the website URL to the "get()" method on the driver instance.
# Initialize ChromeDriver service
service = Service(PATH)
# Create a new instance of the Chrome driver
driver = webdriver.Chrome(service=service)
# Navigate to the URL
driver.get(URL)
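A side note: if you are on Selenium 4.6 or newer, the bundled Selenium Manager can locate (or download) a matching driver automatically, so the explicit path becomes optional. A minimal sketch:
# With Selenium 4.6+, Selenium Manager resolves the driver for you
driver = webdriver.Chrome()
driver.get(URL)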
Step 4: Search for a keyword on the website
Next, we locate the div tag that contains the search field using its class name "o-bfyaNx". Within this div, we find the search input field by its tag name "input".
After that, we can insert the search keyword "Honda Bikes" into the search input field by passing the string to the "send_keys()" method.
To allow any search suggestions to load, we pause the script for 1 second.
Finally, we simulate pressing the Enter key by passing "Keys.RETURN" into the "send_keys()" method.
# Find the search box div by its class name
search_box_div = driver.find_element(By.CLASS_NAME, 'o-bfyaNx')
# Find the input field within the search box div
input_field = search_box_div.find_element(By.TAG_NAME, 'input')
# Enter the search keyword
input_field.send_keys('Honda Bikes')
# Wait for a second to let the suggestions load
time.sleep(1)
# Simulate pressing the Enter key
input_field.send_keys(Keys.RETURN)
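An aside on waiting: fixed "time.sleep()" pauses can be flaky when the page loads slowly. A more robust alternative, sketched below assuming the same "o-bfyaNx" container, is an explicit wait with "WebDriverWait":
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 seconds for the search input to appear instead of sleeping
wait = WebDriverWait(driver, 10)
search_input = wait.until(
    EC.presence_of_element_located((By.CSS_SELECTOR, 'div.o-bfyaNx input'))
)
search_input.send_keys('Honda Bikes')
search_input.send_keys(Keys.RETURN)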
Step 5: Find the h3 tags containing bike names
After executing the search, we now have a list of bikes on our screen. To find the bike names from the list, we need to locate all "h3" tags on the website.
Note: When we inspect the web page after the search execution, there are no other h3 tags on the site except those that contain bike names. If the h3 tag were used elsewhere as well, we would need to filter the page content to remove unwanted data, as shown in the sketch after the code below.
# Find all h3 elements on the page
bike_names = driver.find_elements(By.TAG_NAME, 'h3')
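If the page did contain unrelated h3 tags, one way to filter them out is to scope the lookup to the container that holds the results. The "search-results" class name below is a placeholder; you would replace it with the real class name found via DevTools:
# Hypothetical: limit the lookup to the results container so unrelated
# h3 tags are ignored ('search-results' is a placeholder class name)
results_container = driver.find_element(By.CLASS_NAME, 'search-results')
bike_names = results_container.find_elements(By.TAG_NAME, 'h3')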
Step 6: Store the bike names in a CSV file
After finding all the bike names from the site, we proceed to store these names in a CSV file. We open a CSV file named "bike_headings.csv" in write mode and create a CSV writer object. The first row in the file contains the header "Bike Names". Then, we iterate through each bike name found on the page and write it to the CSV file.
# Open a CSV file to write the bike names (newline='' avoids blank rows on Windows)
with open('bike_headings.csv', mode='w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['Bike Names'])  # Write the header
    # Write each bike name to the CSV file
    for bike_name in bike_names:
        writer.writerow([bike_name.text])
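As a quick sanity check, you can read the file back and print the first few rows:
# Read the CSV back to confirm it was written correctly
with open('bike_headings.csv', newline='') as file:
    for row in list(csv.reader(file))[:5]:
        print(row)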
Step 7: Quit the browser window
After writing the bike names to the CSV file, we can proceed to close the browser. Although it is not necessary, we can wait for a few seconds before closing the browser to ensure all operations are complete. This can be done using the "time.sleep(5)" function. Finally, we close the browser window using the "driver.quit()" method.
# Wait for a few seconds before closing the browser
time.sleep(5)
# Close the browser window
driver.quit()
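One last design note: if any step raises an exception before "driver.quit()" runs, the browser window is left open. A common defensive pattern, sketched here, is to wrap the scraping steps in a try/finally block:
# Ensure the browser closes even if a scraping step fails
driver = webdriver.Chrome(service=Service(PATH))
try:
    driver.get(URL)
    # ... search, scrape, and save steps go here ...
finally:
    driver.quit()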
Conclusion
You've now learned the essentials of web scraping using Selenium and Python. By following this step-by-step guide, you've gained the skills to launch an automated browser, search a website for a keyword, scrape the h3 tags containing the results, and save them to a CSV file.
Web scraping can unlock a wealth of information and insights from various websites, making it a powerful tool in the data-driven world we live in. Whether you're gathering data for research, business analysis, or personal projects, mastering web scraping techniques can provide you with valuable skills.
While web scraping is incredibly useful, it's essential to respect website terms of service and ensure compliance with any legal requirements related to data extraction and usage.
If you have any questions or run into any issues, please feel free to ask in the comments below or reach out to me directly. I'm here to help!
Happy scraping!