Effortless Web Scraping with Selenium and Python: A Step-by-Step Guide
In today's data-driven world, extracting valuable insights from websites is an increasingly useful skill, and web scraping with Python makes it approachable. In this guide, we'll build a small Selenium scraping script from start to finish.
We will start by launching a web browser with Selenium and Chrome WebDriver to search for a specific keyword on a website. Following that, we'll scrape the h3 tags from the web page and save the results as a CSV file.
Before we dive into the details, let's explore what Selenium is and why a Chrome Driver is essential for web scraping.
What is Selenium?
Selenium is an open-source browser automation tool. It is primarily used for automated testing of web applications, but because it drives a real browser programmatically, it also works well for web scraping, especially on pages that rely on JavaScript to render content.
What is ChromeDriver?
"ChromeDriver" is a tool that allows you to automate tasks in Google Chrome. It's essentially a program that acts as a bridge between your test scripts and the Chrome browser. "ChromeDriver" works along with the "Selenium WebDriver" to automate various tasks in the chrome browser.
Setting Up Your Environment
I hope you have Python installed on your system. To scrape websites using Selenium, we need to install the Selenium package and download "ChromeDriver".
Install Selenium in Python
To install Selenium in Python, you can run the following command in the terminal:
pip install selenium
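To confirm the installation succeeded, you can print the installed version from the terminal:
python -c "import selenium; print(selenium.__version__)"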
Download ChromeDriver
You can download "ChromeDriver" by searching for it in your browser or by clicking on this link.
When you download "ChromeDriver", you need to make sure that the version of "ChromeDriver" matches the version of your Chrome browser.
To check this, you can open your Chrome browser and click on the three dots in the top right corner of the browser window, near your profile image. Then click on "Settings" and select "About Chrome."
You can see your Chrome browser version in this section.
Once you know your Chrome version, you can download the matching version of "ChromeDriver" from the same link.
After downloading "ChromeDriver", you need to unzip it and store it somewhere on your system so that you don’t accidentally delete it and you can easily use it in your code. For convenience, I like to store this file in the “Program Files (x86)” folder.
Getting Started
After installing the "Selenium" package and "ChromeDriver", we can start scraping data from websites.
Step 1: Import libraries
In this section, we will import "webdriver", "By", "Keys", and "Service" from "selenium", as well as "csv" and "time".
import csv
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.service import Service
Step 2: Store ChromeDriver path and website URL in variables
Now, we need to store the "ChromeDriver" path and the URL of the website in variables. Keeping these values in named variables, rather than hard-coding them where they are used, makes the script easier to read and update.
# Path to ChromeDriver
PATH = "C:\Program Files (x86)\chromedriver.exe"
# URL of the website to scrape
URL = 'https://www.bikewale.com/'
Step 3: Get the page content
Next, we will initialize the "ChromeDriver" service by passing the driver path to the "Service()" function. After that, we can create a new instance of the "ChromeDriver" using "webdriver.Chrome(service=service)". This action opens an automated Chrome window on your system, enabling you to navigate to your target website by passing the website URL to the "get()" method on the driver instance.
# Initialize ChromeDriver service
service = Service(PATH)
# Create a new instance of the Chrome driver
driver = webdriver.Chrome(service=service)
# Navigate to the URL
driver.get(URL)
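A side note: if you are on Selenium 4.6 or newer, the bundled Selenium Manager can locate (or download) a matching driver automatically, so the explicit path becomes optional. A minimal sketch:
# With Selenium 4.6+, Selenium Manager resolves the driver for you
driver = webdriver.Chrome()
driver.get(URL)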
Step 4: Search for a keyword on the website
Next, we locate the div tag that contains the search field using its class name "o-bfyaNx". Within this div, we find the search input field by its tag name "input".
After that, we can insert the search keyword "Honda Bikes" into the search input field by passing the string to the "send_keys()" method.
To allow any search suggestions to load, we pause the script for 1 second.
Finally, we simulate pressing the Enter key by passing "Keys.RETURN" into the "send_keys()" method.
# Find the search box div by its class name
search_box_div = driver.find_element(By.CLASS_NAME, 'o-bfyaNx')
# Find the input field within the search box div
input_field = search_box_div.find_element(By.TAG_NAME, 'input')
# Enter the search keyword
input_field.send_keys('Honda Bikes')
# Wait for a second to let the suggestions load
time.sleep(1)
# Simulate pressing the Enter key
input_field.send_keys(Keys.RETURN)
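An aside on waiting: fixed "time.sleep()" pauses can be flaky when the page loads slowly. A more robust alternative, sketched below assuming the same "o-bfyaNx" container, is an explicit wait with "WebDriverWait":
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 seconds for the search input to appear instead of sleeping
wait = WebDriverWait(driver, 10)
search_input = wait.until(
    EC.presence_of_element_located((By.CSS_SELECTOR, 'div.o-bfyaNx input'))
)
search_input.send_keys('Honda Bikes')
search_input.send_keys(Keys.RETURN)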
Step 5: Find the h3 tags containing bike names
After executing the search, we now have a list of bikes on our screen. To find the bike names from the list, we need to locate all "h3" tags on the website.
Note: When we inspect the web page after the search execution, there are no other h3 tags on the site except those that contain bike names. If the h3 tag were used elsewhere as well, we would need to filter the page content to remove unwanted data, as shown in the sketch after the code below.
# Find all h3 elements on the page
bike_names = driver.find_elements(By.TAG_NAME, 'h3')
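If the page did contain unrelated h3 tags, one way to filter them out is to scope the lookup to the container that holds the results. The "search-results" class name below is a placeholder; you would replace it with the real class name found via DevTools:
# Hypothetical: limit the lookup to the results container so unrelated
# h3 tags are ignored ('search-results' is a placeholder class name)
results_container = driver.find_element(By.CLASS_NAME, 'search-results')
bike_names = results_container.find_elements(By.TAG_NAME, 'h3')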
Step 6: Store the bike names in a CSV file
After finding all the bike names from the site, we proceed to store these names in a CSV file. We open a CSV file named "bike_headings.csv" in write mode and create a CSV writer object. The first row in the file contains the header "Bike Names". Then, we iterate through each bike name found on the page and write it to the CSV file.
# Open a CSV file to write the bike names (newline='' avoids blank rows on Windows)
with open('bike_headings.csv', mode='w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['Bike Names'])  # Write the header
    # Write each bike name to the CSV file
    for bike_name in bike_names:
        writer.writerow([bike_name.text])
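As a quick sanity check, you can read the file back and print the first few rows:
# Read the CSV back to confirm it was written correctly
with open('bike_headings.csv', newline='') as file:
    for row in list(csv.reader(file))[:5]:
        print(row)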
Step 7: Quit the browser window
After writing the bike names to the CSV file, we can proceed to close the browser. Although it is not necessary, we can wait for a few seconds before closing the browser to ensure all operations are complete. This can be done using the "time.sleep(5)" function. Finally, we close the browser window using the "driver.quit()" method.
# Wait for a few seconds before closing the browser
time.sleep(5)
# Close the browser window
driver.quit()
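One last design note: if any step raises an exception before "driver.quit()" runs, the browser window is left open. A common defensive pattern, sketched here, is to wrap the scraping steps in a try/finally block:
# Ensure the browser closes even if a scraping step fails
driver = webdriver.Chrome(service=Service(PATH))
try:
    driver.get(URL)
    # ... search, scrape, and save steps go here ...
finally:
    driver.quit()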
Conclusion
You've now learned the essentials of web scraping using Selenium and Python. By following this step-by-step guide, you've gained the skills to launch an automated browser, search a website for a keyword, scrape the h3 tags containing the results, and save them to a CSV file.
Web scraping can unlock a wealth of information and insights from various websites, making it a powerful tool in the data-driven world we live in. Whether you're gathering data for research, business analysis, or personal projects, mastering web scraping techniques can provide you with valuable skills.
While web scraping is incredibly useful, it's essential to respect website terms of service and ensure compliance with any legal requirements related to data extraction and usage.
If you have any questions or run into any issues, please feel free to ask in the comments below or reach out to me directly. I'm here to help!
Happy scraping!