Automate Finding Items on Craigslist || Python & Selenium to the Rescue
If necessity is the mother of invention, then laziness is sometimes its father!
Craigslist, especially in the United States, is one of the truly revolutionary Internet tools, making life much easier for buyers, sellers, job seekers, lonely hearts, and many more. However, it has one BIG drawback: it can be a pain to check in regularly, hoping not to miss a deal you could have gotten if only you had contacted the seller earlier.
I was looking for deals on a computer desk and chair, but I kept missing good deals because several people had already contacted the seller before me, leaving me at the back of the queue. The trick is to be first, or at most second, in the queue.
The problem at hand: is there an easy way to get regular updates without sitting at the computer all the time?
The photo above shows the page of "furniture" items for sale near my location. Obviously, I could not keep refreshing the page every minute to get in touch with sellers. So I wrote a simple Python script, which took me about 10 minutes, and it saved me quite a bit of money.
Purpose of the Script: Find new items that are added on Craigslist by refreshing the page at a set interval (it can be randomized as well). If there is a new item that matches what I want, the script should email me its description. By looking at the description in the email, I can check the listed item and whether it fits my requirements, and then contact the seller if I want it.
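Randomizing the refresh interval, as mentioned above, keeps the checks from landing at perfectly regular times. A minimal sketch, assuming a hypothetical helper `next_wait_seconds` and an arbitrary default jitter of two minutes; neither is part of the original script:

```python
import random

def next_wait_seconds(base_minutes, jitter_minutes=2):
    """Return a wait time in seconds (hypothetical helper).

    The wait is base_minutes plus or minus up to jitter_minutes,
    and never less than one minute.
    """
    minutes = base_minutes + random.uniform(-jitter_minutes, jitter_minutes)
    return max(1.0, minutes) * 60

# Usage: in the loop of the script below, a fixed time.sleep(WAIT*60)
# could become time.sleep(next_wait_seconds(WAIT))
```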
Tools/Software Needed: I used Python 2.7.11 with the selenium, text_unidecode, smtplib and email modules (smtplib and email are part of Python's standard library). You also need the Firefox browser installed on the computer.
Time Needed to Set Up: 5 minutes!
Let’s dig into the Craigslist site first to see how it encodes searches. Rather than being a single site, Craigslist is broken down into many regional subdomains such as “hartford.craigslist.org”. This means the very first step is to do a search and see what the resulting URL looks like. Searching for “computer chair” within 15 miles of my area with a maximum price of $40 gives us this URL:
Let's look at the terms here:
- query refers to what is being searched. In the example above, query will be "computer+chair": the space is replaced by a +, and the quotation marks are not part of the actual query.
- distanceinMiles is the search radius. You are probably not going to drive 50 miles to pick something up, so choose this parameter based on what is convenient for you.
- zipcode is your area's ZIP code.
- maxPrice is the maximum price in US $ you want to pay for the item.
This gives us a filtered page with only the items that match the requirement.
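The URL can also be assembled in code rather than by hand. This is a hedged sketch: the `search_distance`, `postal` and `max_price` parameter names and the `zip` category path are taken from the script further down, while the `query` parameter name follows the term list above and is an assumption about Craigslist's URL format. `build_search_url` is a hypothetical helper, and `urlencode` takes care of replacing spaces with `+`:

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2.7, as used in this post

def build_search_url(subdomain, query, radius, zipcode, max_price, category='zip'):
    """Build a Craigslist search URL (hypothetical helper).

    urlencode() quotes each value with quote_plus, so a query such as
    'computer chair' becomes 'computer+chair'.
    """
    params = [  # a list of pairs keeps the parameters in a fixed order
        ('query', query),
        ('search_distance', radius),
        ('postal', zipcode),
        ('max_price', max_price),
    ]
    return 'https://%s.craigslist.org/search/%s?%s' % (
        subdomain, category, urlencode(params))

print(build_search_url('hartford', 'computer chair', 15, '00000', 40))
# https://hartford.craigslist.org/search/zip?query=computer+chair&search_distance=15&postal=00000&max_price=40
```

The ZIP code '00000' is only a placeholder; replace it with your own.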
The next step is to get these items into your inbox. The Python script below checks for new items and emails you if it finds any. Save this script as craiglist.py.
## import all the necessary modules. Make sure you have them.
## modules can be installed using pip
from selenium import webdriver
from selenium.webdriver.common.by import By
from send_email import test_sendmail
import time
from text_unidecode import unidecode
## parameters (example values; change them to suit your search)
YOUR_ZIPCODE = '00000'  ## enter your zipcode here
RADIUS = 15  ## radius in miles within which you can go to collect the item
MAX_PRICE = 40  ## the max price you want to pay, in US $
WAIT = 10  ## time in minutes the code should wait before checking for updates
ITEMS_TO_LOOK = ['chair', 'desk']  ## list of items to look for

## generate the url for the above parameters for your area.
## here it is generated for the hartford area, but you might want to change it based on
## your location. Open craigslist and it will direct you to the relevant subdomain
url = 'https://hartford.craigslist.org/search/zip?search_distance=' + str(RADIUS) + '&postal=' + str(YOUR_ZIPCODE) + '&max_price=' + str(MAX_PRICE)
## instantiate a firefox browser instance
driver = webdriver.Firefox()
## load the url
driver.get(url)
## declare an empty list to keep the items. This also comes in handy to check
## for items that are newly added
items = []
## get all the items on the first page. Not concerned about the other pages
## (the .result-row / .result-info class names match Craigslist's markup when this
## was written; they will need updating if the site's HTML changes)
results = driver.find_elements(By.CSS_SELECTOR, '.result-row')
## iterate through the results to find the title text of each listing
for result in results:
    a = result.find_element(By.CSS_SELECTOR, '.result-info')
    items.append(unidecode(a.find_element(By.TAG_NAME, 'a').text))
## run in an infinite while loop
## LOGIC - keep checking every WAIT minutes and find new items that are added
## if any new item is found, email it to the user
## else sleep
while True:
    driver.get(url)
    time.sleep(5)  ## give the page a few seconds to load
    results = driver.find_elements(By.CSS_SELECTOR, '.result-row')
    for result in results:
        a = result.find_element(By.CSS_SELECTOR, '.result-info')
        text = unidecode(a.find_element(By.TAG_NAME, 'a').text)
        if (text not in items) and (any(ext.lower().strip() in text.lower().strip() for ext in ITEMS_TO_LOOK)):
            items.append(text)
            test_sendmail(text)
    time.sleep(WAIT*60)  ## sleep before checking again
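The new-item check inside the loop of craiglist.py can be pulled out into a small function so it can be tried without opening a browser. This is a sketch, not the original code: `find_new_matches` is a hypothetical name, and it keeps seen titles in a `set` rather than a list, so the "not seen yet" test stays fast as the number of seen items grows. Unlike the loop in the script, it also marks non-matching titles as seen, so each listing is examined only once:

```python
def find_new_matches(titles, seen, keywords):
    """Return titles not seen before that contain any keyword.

    Every title passed in is added to `seen`, so a listing is reported
    at most once. Matching is case-insensitive, as in the script's loop.
    """
    wanted = [k.lower().strip() for k in keywords]
    new_matches = []
    for title in titles:
        if title in seen:
            continue
        seen.add(title)
        if any(k in title.lower() for k in wanted):
            new_matches.append(title)
    return new_matches

seen = set(['Old desk'])
print(find_new_matches(['Old desk', 'Office chair', 'Lamp'], seen, ['chair', 'desk']))
# ['Office chair']
```

In the script, the loop would collect the page's titles into a list, call this function, and pass each returned title to test_sendmail.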
Save the script below as send_email.py. It is imported by craiglist.py, so both scripts should be in the same folder, and it takes care of sending the email from the code to your inbox.
## import required modules
## both modules are part of Python's standard library, so no pip install is needed
import smtplib
from email.mime.text import MIMEText as text
## declare a function that sends email
## this currently works only with gmail but can work with other
## email providers as well with slight modification
## remember to turn on "Access for less secure apps" in Gmail beforehand:
## https://www.google.com/settings/security/lesssecureapps
## (Google has since retired that setting; current Gmail accounts need
## 2-Step Verification and an App Password used in place of the normal password)
def test_sendmail(subject):
    """ Set the recipient and send the listing title as both subject and body """
    to_address = 'your_address@gmail.com'  ## placeholder, change this
    body = subject
    sendmail(to_address, subject, body)

def sendmail(to_address, subject, body):
    from_address = 'your_address@gmail.com'  ## placeholder, change this
    smtp_server = 'smtp.gmail.com'
    smtp_port = 587
    smtp_user = 'your_address@gmail.com'  ## placeholder, change this
    smtp_password = 'yourPassword'  ## change this
    msg = text(body)
    msg['Subject'] = subject
    msg['From'] = from_address
    msg['To'] = to_address
    server = smtplib.SMTP(smtp_server, smtp_port)
    server.ehlo()
    server.starttls()  ## upgrade the connection to TLS before logging in
    server.login(smtp_user, smtp_password)
    server.sendmail(from_address, to_address, msg.as_string())
    server.quit()
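Building the message can be separated from sending it, which makes it easy to check the headers before any connection to Gmail is made. A minimal sketch with a hypothetical `build_message` helper, using only `email.mime.text.MIMEText` from the standard library; the addresses are placeholders. Passing `'utf-8'` as the charset is a choice made here so that titles with non-ASCII characters would also survive, whereas the original script converts titles to ASCII with unidecode:

```python
from email.mime.text import MIMEText

def build_message(from_address, to_address, subject, body):
    """Return a plain-text, UTF-8 email message with the usual headers set."""
    msg = MIMEText(body, 'plain', 'utf-8')
    msg['Subject'] = subject
    msg['From'] = from_address
    msg['To'] = to_address
    return msg

msg = build_message('me@example.com', 'you@example.com',
                    'New item: office chair', 'New item: office chair')
print(msg['Subject'])
```

sendmail() could then pass `msg.as_string()` to `server.sendmail` exactly as before.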
Once both scripts have been saved, you can run the first script (craiglist.py) from a terminal/command prompt, or simply with Python's IDLE or any other Python IDE. It will open a Firefox browser and get to work, and it will keep working as long as the computer is running. You can also create a free Amazon Web Services (AWS) account and move the script to a server there.
Note: This script requires your Gmail password to be entered in send_email.py, so it is sensitive and should not be shared with others without removing the password first. This has also been published on my blog at https://analyticsbot.ml/2016/11/finding-free-items-on-craigslist-python-selenium-to-the-rescue/
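One way to keep the password out of send_email.py altogether is to read the credentials from environment variables. A sketch under that assumption: the variable names `GMAIL_USER` and `GMAIL_PASSWORD` and the helper `load_smtp_credentials` are hypothetical, and `sendmail` would call the helper instead of using its hard-coded `smtp_user` and `smtp_password` values:

```python
import os

def load_smtp_credentials():
    """Return (user, password) read from hypothetical environment variables.

    Raises RuntimeError with a readable message if either one is missing,
    so the script fails early instead of at login time.
    """
    user = os.environ.get('GMAIL_USER')
    password = os.environ.get('GMAIL_PASSWORD')
    if not user or not password:
        raise RuntimeError('Set GMAIL_USER and GMAIL_PASSWORD before running the script')
    return user, password
```

The two variables are set in the terminal before starting craiglist.py, so the scripts themselves can be shared without exposing the password.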
Hope this helps someone out there. It surely saved me some money by finding good deals!