Python IO - Locked Down Lessons

Python offers very easy-to-use functions for accessing data from PDF / JSON / text files / the web / RDBMS. Sample code is attached for reference. I use the Anaconda Jupyter IDE for testing the code.

Steps to read a PDF using the PyPDF2 library

You need to import the PyPDF2 library (a third-party package, so install it first if needed)

import PyPDF2


# reading the pdf file
pdf_object = open('example.pdf', 'rb')
pdf_reader = PyPDF2.PdfFileReader(pdf_object)


# the numPages attribute gives the number of pages in the PDF file
print(pdf_reader.numPages)


# get a page object for a specific page (pages are zero-indexed, so 1 is the second page)
page_object = pdf_reader.getPage(1)


# extract text from the page object; string operations can then be used to dissect the extracted text

print(page_object.extractText())
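
To read the whole document rather than a single page, the same reader can be looped over. This is only a minimal sketch that reuses the pdf_object and pdf_reader created above with the classic PyPDF2 interface (PdfFileReader / getPage / extractText):

# loop over all pages and collect their text, reusing pdf_reader from above
all_text = []
for page_number in range(pdf_reader.numPages):
    page = pdf_reader.getPage(page_number)
    all_text.append(page.extractText())

# close the file handle once extraction is done
pdf_object.close()

print(len(all_text))   # number of pages whose text was extracted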




Below are the two steps to request and read an API:

  1. Use the requests module to connect to a URL and fetch a response
  2. Use json.loads() to convert the JSON response into a Python dictionary
import requests, json
import pprint




url = "<API endpoint including the API key>"   # placeholder only; supply your own endpoint
r = requests.get(url)  # pass the URL with the API key and get the response


# converting the json object to a dict using json.loads()
r_dict = json.loads(r.text)

# the JSON is now a Python dictionary, a key-value data structure
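
The pprint module imported above is handy for inspecting the resulting dictionary. A minimal sketch, assuming the r_dict created above; the key names a real API returns will of course differ:

# pretty-print the nested dictionary to see how it is structured
pprint.pprint(r_dict)

# dictionary values are read by key; list the top-level keys the API returned
print(list(r_dict.keys()))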


Getting Data from websites for analysis

Web scraping refers to the art of programmatically getting data from the internet. One of the coolest features of Python is that it makes it easy to scrape websites. In Python 3, the most popular library for web scraping is BeautifulSoup.

The general procedure to get data from websites is:

  1. Use requests to connect to a URL and get data from it
  2. Create a BeautifulSoup object
  3. Get attributes of the BeautifulSoup object (i.e. the HTML elements that you want)
import requests, bs4


# getting HTML from the Amazon web page
url = "https://www.amazon.in/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=sport+shoes"
req = requests.get(url)  # requests fetches the HTML page from the URL

# create a bs4 object
# To avoid warnings, provide "html5lib" explicitly
soup = bs4.BeautifulSoup(req.text, "html5lib")
#print(soup)
# soup.select('div > p')  # selects all <p> elements that are direct children of <div> tags
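
As an illustration, the soup object created above can be queried with CSS selectors. This is just a sketch using a generic selector (anchor tags), since Amazon's real markup changes frequently:

# print the text of the first few links on the page, reusing the soup object above
for link in soup.select('a')[:10]:
    text = link.get_text(strip=True)
    if text:
        print(text)

# tag attributes are available via a dict-like interface, e.g. link.get('href')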


Getting Data from a CSV file

The easiest way to read delimited files is

pd.read_csv(filepath, sep, header), specifying a separator (delimiter); the default separator is a comma.

import numpy as np
import pandas as pd
# Using encoding = "ISO-8859-1"
file = pd.read_csv("xxxxxx.txt", sep="\t", encoding = "ISO-8859-1")

file.head()
# head() prints the first 5 rows by default
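
The sep and header arguments mentioned above are what change for other delimited formats. A minimal sketch with a hypothetical comma-separated file data.csv:

# comma-separated file with a header row ("data.csv" is only an example name)
df = pd.read_csv("data.csv")                          # sep="," and header=0 are the defaults
df_no_header = pd.read_csv("data.csv", header=None)   # treat the first row as data, not column names

print(df.shape)     # (number of rows, number of columns)
print(df.columns)   # column names inferred from the header row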

