Product data scraping from Flipkart.com using Python

Nikita Patil

Published: November 20, 2019

This tutorial explains how to extract product data from Flipkart.com using Python.

Data we can extract using Python:

- URL
- Product Name
- Price
- EMI
- Number of reviews
- Number of ratings
- Highlights
- Specifications
- Product Description
- Description

Screenshot of the product page from which the data will be extracted using Python

Inspecting elements for data extraction

To locate the data on the website, we first have to inspect the page and understand which HTML tag is associated with each piece of data.

Follow the steps below to find the tags:

1. Open a browser (Google Chrome, Mozilla Firefox).
2. Copy and paste the URL you want to scrape.
3. Press F12 to view the HTML structure of the page.

Finding tags for the required data

Here we show how to find the tag for the price; the tags for the other fields can be found in the same way.
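Once a tag has been identified in the inspector, it can be addressed from Python with an XPath query. The following is only a minimal sketch: the HTML snippet, the class names and the helper find_text are made up for illustration and do not reflect Flipkart's real markup.

```python
from lxml import html

# Hypothetical markup standing in for what the browser inspector shows;
# the tag and class names are invented for this example.
SAMPLE_HTML = """
<html><body>
  <h1 class="product-title">Sample TV</h1>
  <div class="product-price">Rs. 20,999</div>
</body></html>
"""


def find_text(page_html, xpath_query):
    """Return the stripped text of the first node matching the XPath, or None."""
    tree = html.fromstring(page_html)
    matches = tree.xpath(xpath_query)
    if not matches:
        return None
    return matches[0].text_content().strip()


# Look up the price by the tag and class found while inspecting the page
print(find_text(SAMPLE_HTML, '//div[@class="product-price"]'))
```

Returning None for a missing tag keeps the caller from crashing when a page does not contain the expected element.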

How to set up your computer for web scraper development

We will use Python 3 for this tutorial. The code will not run if you are using Python 2.7. To start, you need a computer with Python 3 and pip installed on it.

Let's check your Python version. Open a terminal (on Linux and macOS) or Command Prompt (on Windows) and type

python --version

and press Enter. If the output looks something like Python 3.x.x, you have Python 3 installed. If it says Python 2.x.x, you have Python 2. If it prints an error, you probably don't have Python installed.
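The same check can also be made from inside a script, so that it stops early on the wrong interpreter. This is a small sketch; the helper name is_python3 is our own.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version tuple belongs to Python 3 or later."""
    return version_info[0] >= 3


# Stop right away if the script is started with Python 2
if not is_python3():
    raise SystemExit("This tutorial needs Python 3, found " + sys.version.split()[0])
print("Python", sys.version.split()[0], "is fine for this tutorial")
```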

If you don’t have Python 3, install it first.

Install Python 3 and Pip

Linux – https://www.python.org/downloads/source/

Mac Users can follow this guide – https://www.python.org/downloads/mac-osx/

Windows Users go here – https://www.python.org/downloads/windows/

For PIP installation visit this link – https://www.liquidweb.com/kb/install-pip-windows/

Install Packages

- Python Requests, to make requests and download the HTML content of the pages (https://docs.python-requests.org/en/master/user/install/).
- Python LXML, for parsing the HTML tree structure using XPath (learn how to install it here: https://lxml.de/installation.html).
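After installing, a quick way to confirm that both packages can be imported is a check like the one below. It is only a sketch, and the helper missing_packages is a name chosen for this example.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "requests" and "lxml" are the import names of the two packages above
missing = missing_packages(["requests", "lxml"])
if missing:
    print("Install these first with pip:", ", ".join(missing))
else:
    print("requests and lxml are both available")
```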

Python Code to Scrape Flipkart.com

import json

import requests
from lxml import html
from urllib3.exceptions import InsecureRequestWarning

# Send an HTTP GET request to flipkart.com
# and return the page content as a string.
# Library reference:
# 1: requests
def getRequest(url):
    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
        # 'br' is left out: requests can only decode Brotli responses
        # when a Brotli package is installed
        'Accept-Encoding': 'gzip, deflate',
        'Accept-Language': 'en-GB,en;q=0.9,en-US;q=0.8,ml;q=0.7',
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36'
    }
    # verify=False skips TLS certificate verification, so the warning
    # it would otherwise trigger is silenced here
    requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
    response = requests.get(url, verify=False, headers=headers)
    return response.text

# Parse the product data out of the HTML string
# and return a dictionary with the data.
# Library reference:
# 1: lxml
# 2: json
def parseData(strHtml):
    parser = html.fromstring(strHtml)
    # The page state is embedded as JSON in the <script id="is_script"> tag
    strJson = parser.xpath('//script[@id="is_script"]')[0]
    jObject = json.loads(strJson.text.replace(
        'window.__INITIAL_STATE__ = ', '').replace('};', '}'))
    # Shortcuts for the two subtrees that every lookup below starts from
    pageContext = jObject["pageDataV4"]["page"]["pageData"]["pageContext"]
    pageWidgets = jObject["pageDataV4"]["page"]["data"]
    url = pageContext["seo"]["webUrl"]
    productName = pageContext["titles"]["title"]
    price = pageContext["pricing"]["finalPrice"]["value"]
    emi = pageWidgets["10002"][2]["widget"]["data"][
        "emiDetails"]["nbfcEmi"]["details"][0]["tenures"][1]["installment"]
    reviews = pageContext["rating"]["reviewCount"]
    ratings = pageContext["rating"]["count"]
    highlights = '\n '.join(
        pageWidgets["10004"][0]["widget"]["data"]["highlights"]["value"]["text"])
    specifications = pageWidgets["10005"][4][
        "widget"]["data"]["renderableComponents"][0]["value"]["attributes"]
    productDescription = pageWidgets["10005"][3]["widget"]["data"]["overview"]["description"]["text"]
    description = pageWidgets["10005"][2]["widget"]["data"]["renderableComponents"][0]["value"]["text"]
    return {
        'URL': url,
        'Product Name': productName,
        'Price': price,
        'EMI': emi,
        'Number Of Reviews': reviews,
        'Number Of Rating': ratings,
        'Highlights': highlights,
        'Specifications': specifications,
        'Product Description': productDescription,
        'Description': description
    }

if __name__ == "__main__":
    print('Scraping data from Flipkart')
    url = 'https://www.flipkart.com/vu-iconium-109cm-43-inch-ultra-hd-4k-led-smart-tv/p/itmexyhfetqnrhha'
    print('Url :- ' + url)
    strHtml = getRequest(url)
    strResult = parseData(strHtml)
    result = json.dumps(strResult, sort_keys=True, indent=4)
    print(result)

The scraper code above was written for Python 3.x. Run it in any IDE or editor, such as PyCharm or Sublime Text. The result is printed as JSON; with some changes to the code, the same data can also be stored in an SQL database or exported to CSV or Excel.
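As a rough sketch of the CSV and SQL options, the example below writes one record to CSV text and to an in-memory SQLite database using only the standard library. The record is invented and shows just three of the fields; the helper names, the table name products and its columns are assumptions made for this example.

```python
import csv
import io
import sqlite3

# Hypothetical record shaped like the dictionary the parser returns
# (values invented; only a few of the fields are shown)
record = {
    'URL': 'https://www.flipkart.com/example/p/itm000',
    'Product Name': 'Sample TV',
    'Price': 20999,
}


def to_csv(rows):
    """Serialise a list of dicts to CSV text, using the first row's keys as header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_sqlite(rows, connection):
    """Store the rows in a `products` table (table and columns are illustrative)."""
    connection.execute(
        'CREATE TABLE IF NOT EXISTS products (url TEXT, name TEXT, price INTEGER)')
    connection.executemany(
        'INSERT INTO products (url, name, price) VALUES (?, ?, ?)',
        [(r['URL'], r['Product Name'], r['Price']) for r in rows])
    connection.commit()


print(to_csv([record]))
connection = sqlite3.connect(':memory:')
to_sqlite([record], connection)
print(connection.execute('SELECT name, price FROM products').fetchall())
```

To keep the data on disk, the CSV text can be written to a file and ':memory:' replaced with a file path. Excel export needs a third-party package and is not covered in this sketch.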

Here the data is extracted using the lxml library; the same can be done with Beautiful Soup 4, and the approach can be adapted to extract data from other websites.
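For comparison, a tag lookup with Beautiful Soup 4 might look like the sketch below. It assumes the beautifulsoup4 package is installed; the HTML snippet, the class names and the helper find_text_bs4 are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; tag and class names are invented for this example
SAMPLE_PAGE_HTML = """
<html><body>
  <h1 class="product-title">Sample TV</h1>
  <div class="product-price">Rs. 20,999</div>
</body></html>
"""


def find_text_bs4(page_html, tag, class_name):
    """Return the text of the first `tag` with the given class, or None."""
    # 'html.parser' is Python's built-in parser, so no extra parser is needed
    soup = BeautifulSoup(page_html, 'html.parser')
    node = soup.find(tag, class_=class_name)
    return node.get_text(strip=True) if node is not None else None


print(find_text_bs4(SAMPLE_PAGE_HTML, 'h1', 'product-title'))
print(find_text_bs4(SAMPLE_PAGE_HTML, 'div', 'product-price'))
```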

Run the scraper in any Python IDE and you will get the result as JSON. To test it, you can use another product URL from Flipkart.
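The parser reads fixed keys and list positions from the JSON assigned to window.__INITIAL_STATE__, so another product page may not have every field in the same place. One way to cope is a lookup helper that returns a default instead of raising an error. The sketch below uses a made-up page with the same script tag; load_state and dig are names chosen for this example, and the real state object is much larger.

```python
import json

from lxml import html

# Made-up page that mimics the embedded state; the real layout may differ
SAMPLE_STATE_PAGE = """
<html><body>
<script id="is_script">window.__INITIAL_STATE__ = {"pageDataV4": {"page": {"pageData": {"pageContext": {"titles": {"title": "Sample TV"}}}}}};</script>
</body></html>
"""


def load_state(page_html):
    """Extract and decode the JSON assigned to window.__INITIAL_STATE__."""
    tree = html.fromstring(page_html)
    script = tree.xpath('//script[@id="is_script"]')[0]
    # Keep what follows the first '=' and drop the trailing semicolon
    raw = script.text.split('=', 1)[1].strip().rstrip(';')
    return json.loads(raw)


def dig(data, path, default=None):
    """Follow a sequence of keys or indexes; return `default` if a step is missing."""
    current = data
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return default
    return current


state = load_state(SAMPLE_STATE_PAGE)
context = ["pageDataV4", "page", "pageData", "pageContext"]
print(dig(state, context + ["titles", "title"]))
print(dig(state, ["pageDataV4", "page", "data", "10002", 2], default="not available"))
```

With a helper like this, a field that is absent on a given page comes back as the default value rather than stopping the whole run.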

Clarification: The code in this tutorial is for learning purposes only. We are not responsible for how it is used and assume no liability for any detrimental usage of the source code. The code is intended only to expand knowledge of programming. With this tutorial we do not encourage Flipkart scraping or web scraping in general; it is meant to help in understanding scraping. We are also not responsible for providing any support for this code. Users may modify it for learning purposes.
