Automate Finding Items on Craigslist || Python & Selenium to the Rescue

If necessity is the mother of invention, then laziness is sometimes its father!

Craigslist, especially in the United States, is one of the truly revolutionary Internet tools, making life much easier for buyers, sellers, job seekers, lonely hearts, and many more. However, it has one BIG demerit: it can be a pain to check in regularly, hoping not to miss the deal you could have had if only you had contacted the seller earlier.

I was looking for deals on a computer desk and chair, but I kept missing the good ones because I ended up in a queue behind several people who had contacted the seller before me. The trick is to be first, or at most second, in the queue.

The problem at hand: is there an easy way to get updates regularly without being at the computer all the time?

The photo above shows the page of "furniture" items for sale near my location. Obviously, I could not keep refreshing the page every minute to get in touch with the sellers. So I wrote a simple Python script, which took me 10 minutes and saved me quite some money.

Purpose of the Script: Find new items that are added on Craigslist by refreshing the page at a set interval (which can be randomized as well). If a new item matches what I want, the script should email me its description. From the description in the email, I can check whether the listed item fits my requirements and then contact the seller if I want it.
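
The matching and randomized-interval idea described above can be sketched independently of the browser. This is only an illustration, not the script used in this article: the helper names find_new_matches and next_wait_seconds, the jitter parameter, and the sample listing titles are all hypothetical.

```python
import random


def find_new_matches(titles, seen, keywords):
    """Return titles that are unseen and contain at least one keyword.

    `seen` is a set that is updated in place, so a listing is reported only once.
    """
    # normalize keywords and ignore empty ones, which would match every title
    wanted = [k.lower().strip() for k in keywords if k.strip()]
    fresh = []
    for title in titles:
        lowered = title.lower()
        if title not in seen and any(k in lowered for k in wanted):
            seen.add(title)
            fresh.append(title)
    return fresh


def next_wait_seconds(base_minutes, jitter_minutes=0.0):
    """Return the base interval shifted by a random jitter, in seconds."""
    jitter = random.uniform(-jitter_minutes, jitter_minutes)
    # never wait less than one minute between refreshes
    return max(1.0, base_minutes + jitter) * 60


# hypothetical listing titles, for illustration only
seen = set(['Old oak desk'])
titles = ['Old oak desk', 'Computer Chair - like new', 'Free sofa']
print(find_new_matches(titles, seen, ['chair', 'desk']))
# prints ['Computer Chair - like new']
```

Keeping the seen titles in a set makes the "is this new?" check cheap, and the jitter avoids hitting the page at perfectly regular intervals.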

Tools/Software Needed: I used Python 2.7.11 with the selenium, text_unidecode, smtplib, and email modules. You also need the Firefox browser on the computer.

Time Needed to Set Up: 5 minutes!!

Let’s dig into the Craigslist site first to see how it encodes searches. Rather than a single site, Craigslist is broken down into many regional subdomains such as “hartford.craigslist.org”. This means that the very first step is to run a search and see what the resulting URL looks like. Searching for “computer chair” items that are available within 15 miles of my area and have a maximum price of $40 gives us this URL

Let's look at the terms here:

  • query refers to what is being searched. In the example above, query will be "computer+chair". Note that the space is replaced by a +. Also, in the actual query, the " should not be present.
  • distanceinMiles is the search radius. Obviously, you are not going to pick up something from 50 miles away, so choose this parameter as per your convenience.
  • zipcode is your area zipcode.
  • maxPrice is the maximum price in US $ you want to pay for the item.

This gives us a filtered page with only the items that match the requirement.
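
As a rough sketch of how such a URL could be assembled in code: the parameter names search_distance, postal, and max_price and the /search/zip path are taken from the script later in this article, and query is the term named in the list above. The function name build_search_url and the zipcode '00000' are hypothetical, and Craigslist may change its URL scheme, so check the result against a real search in your browser.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2


def build_search_url(subdomain, query, radius_miles, zipcode, max_price, category='zip'):
    """Assemble a Craigslist search URL for one regional subdomain."""
    # a list of pairs keeps the parameters in a predictable order
    params = [
        ('query', query),
        ('search_distance', radius_miles),
        ('postal', zipcode),
        ('max_price', max_price),
    ]
    base = 'https://{0}.craigslist.org/search/{1}'.format(subdomain, category)
    # urlencode turns the space in "computer chair" into a +
    return base + '?' + urlencode(params)


# '00000' is a placeholder zipcode, not a real one
print(build_search_url('hartford', 'computer chair', 15, '00000', 40))
# prints https://hartford.craigslist.org/search/zip?query=computer+chair&search_distance=15&postal=00000&max_price=40
```

Letting urlencode handle the escaping avoids having to replace spaces with + by hand.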

The next step is to get these items into your inbox. The Python script below does exactly that: it checks for new items and emails you if anything is found. Save this script as craiglist.py

## import all the necessary modules. Make sure you have them.
## third-party modules (selenium, text_unidecode) can be installed using pip
from selenium import webdriver
from send_email import test_sendmail
import time
from text_unidecode import unidecode


## parameters (the values below are examples only; replace them with your own)
YOUR_ZIPCODE = '00000'  # enter your zipcode here (placeholder value)
RADIUS = 15  # enter the radius in miles within which you can go to collect the item
MAX_PRICE = 40  # enter the max price you want to pay
WAIT = 10  # time in minutes the code should wait before checking for updates
ITEMS_TO_LOOK = ['chair', 'desk']  # list of items to look for

## generate the url for the above parameters for your area.
## here it is generated for the hartford area, but you might want to change it based on
## your location. Open craigslist and it will direct you to the relevant subdomain
url = 'https://hartford.craigslist.org/search/zip?search_distance=' + str(RADIUS) + '&postal=' + str(YOUR_ZIPCODE) + '&max_price=' + str(MAX_PRICE)

## instantiate a firefox browser instance
driver = webdriver.Firefox()
## load the url
driver.get(url)

## declare an empty list to keep the items. This will also come in handy to check items
## that are newly added
items = []

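## get all the items on the first page. Not concerned about the other pages
## note: this uses the older Selenium API; Selenium 4 replaced find_elements_by_css_selector
## with driver.find_elements(By.CSS_SELECTOR, ...)
results = driver.find_elements_by_css_selector('.result-row')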

## iterate through the results to find the title text of each listing
for result in results:
    ## use the current result (not results[0]) so every listing is recorded
    a = result.find_element_by_css_selector('.result-info')
    items.append(unidecode(a.find_element_by_tag_name('a').text))

## run in a while loop. This makes it an infinite loop
## LOGIC - keep checking every WAIT minutes and find new items that are added
## if any new item is found, email it to the user
## else sleep.
while True:
    driver.get(url)
    time.sleep(5)  ## give the page a few seconds to load
    results = driver.find_elements_by_css_selector('.result-row')
    for result in results:
        ## use the current result (not results[0]) so every listing is checked
        a = result.find_element_by_css_selector('.result-info')
        text = unidecode(a.find_element_by_tag_name('a').text)
        if (text not in items) and (any(ext.lower().strip() in text.lower().strip() for ext in ITEMS_TO_LOOK)):
            items.append(text)
            test_sendmail(text)

    time.sleep(WAIT*60)  ## sleep

The script below is imported by the previous script, so both scripts should be present in the same folder. It is responsible for sending the email from the code to your inbox. Save this script as send_email.py

## import required modules
## smtplib and email ship with Python, so no pip install is needed
import smtplib
from email.mime.text import MIMEText as text

## declare a function that sends email
## this currently works only with gmail but can work with other
## email providers as well with slight modification
## remember to turn on "Access for less secure apps" in GMail via the link below beforehand
## https://www.google.com/settings/security/lesssecureapps


def test_sendmail(subject):
    """ Sets the parameters required for sending the email and sends it """
    to_address = 'recipient@example.com' ## placeholder address, change this
    body = subject  ## the listing title is used as both subject and body
    sendmail(to_address, subject, body)
 
 
def sendmail(to_address, subject, body):
    from_address = 'sender@example.com' ## placeholder address, change this
    smtp_server = 'smtp.gmail.com'
    smtp_port = 587
    smtp_user = "sender@example.com" ## placeholder address, change this
    smtp_password = "yourPassword" ## change this

    msg = text(body)
    msg['Subject'] = subject
    msg['From'] = from_address
    msg['To'] = to_address

    server = smtplib.SMTP(smtp_server, smtp_port)
    server.ehlo()
    server.starttls()
    server.login(smtp_user, smtp_password)
    server.sendmail(from_address, to_address, msg.as_string())
    server.quit()

Once both scripts have been saved, you can run the first script (craiglist.py) from a terminal/command prompt, from Python's IDLE, or from any other Python IDE. It will open a Firefox browser and get to work. As long as the computer is running, the script will keep working. You can also create a free Amazon account and transfer the script there.

Note: This script requires your Gmail password to be entered in send_email.py, so it is sensitive and should not be shared with others without removing the password first. This has also been published on my blog at https://analyticsbot.ml/2016/11/finding-free-items-on-craigslist-python-selenium-to-the-rescue/
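
One possible way to keep the password out of the file altogether is to read it from environment variables. This is a minimal sketch, not part of the original scripts: the function load_smtp_credentials and the variable names CRAIGSLIST_SMTP_USER and CRAIGSLIST_SMTP_PASSWORD are hypothetical names chosen for illustration.

```python
import os


def load_smtp_credentials(environ=None):
    """Read the SMTP user and password from environment variables.

    Raises a RuntimeError naming whichever variable is missing.
    """
    # default to the real process environment; a dict can be passed in for testing
    environ = os.environ if environ is None else environ
    values = {}
    for name in ('CRAIGSLIST_SMTP_USER', 'CRAIGSLIST_SMTP_PASSWORD'):
        value = environ.get(name)
        if not value:
            raise RuntimeError('missing environment variable: ' + name)
        values[name] = value
    return values['CRAIGSLIST_SMTP_USER'], values['CRAIGSLIST_SMTP_PASSWORD']
```

Inside sendmail, the two hard-coded lines could then become smtp_user, smtp_password = load_smtp_credentials(), and the script can be shared without editing it first.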

Hope this helps someone out there. It surely saved me some money by finding good deals!


Aleksandr A.

SCM | Logistics & Freight | Buyer | Analyst | Technical Procurement

5 months ago

Please realize that it's a totally unworkable code. You have to understand that you don't call fetching the email function at all, you call for import driver, search for results row, but where you are going to send your emails to if you have not fetched the elements that contain email addresses to begin with.

Cool. I understand that in the very near future everyone will have access to "Agents", a new UI from Google, and that will be super convenient and powerful.

Sean Doohan

Mechanical or Industrial Engineering Professional

7 years ago

Very cool! Do you know how to run this script for multiple zip codes at once?

Shriket Pai

Strategy & Analytics at Rausch Sturm

8 years ago

Thanks, Ravi, for explaining. It's really very cool.

Shreshth Bahuguna

Internal Audit Analytics Manager @ Voya Financial

8 years ago

That's awesome!
