Python IO - Locked Down Lessons

Nithin Pk

Chief Product Owner | Data Science Expert

Published: May 25, 2020

Python offers easy-to-use functions for reading data from PDF files, JSON, text files, the web and relational databases (RDBMS). Sample code is included below for reference. I use Jupyter from the Anaconda distribution to test the code.

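Steps to read a PDF using ready-made Python functions

First import the PyPDF2 library. It is a third-party package, so it has to be installed before use (for example with pip install PyPDF2).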

import PyPDF2


# Open the PDF in binary mode; the with-block closes the file afterwards
with open('example.pdf', 'rb') as pdf_object:
    # PdfReader is the current name; PyPDF2 releases before 3.0 also accepted PdfFileReader
    pdf_reader = PyPDF2.PdfReader(pdf_object)

    # Number of pages in the PDF file (older API: pdf_reader.numPages)
    print(len(pdf_reader.pages))

    # Get one page as a Python object; pages are zero-indexed, so index 1 is the second page
    page_object = pdf_reader.pages[1]

    # Extract the text of that page; string operations can then be used to dissect it
    print(page_object.extract_text())
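The last comment above mentions using string operations on the extracted text. A minimal sketch of what that could look like, assuming the page text is already available as a plain string. The sample text and the helper name find_lines_with are hypothetical and only serve as an illustration.

```python
def find_lines_with(page_text, keyword):
    """Return the non-empty lines of page_text that contain keyword (case-insensitive)."""
    matches = []
    for line in page_text.splitlines():
        stripped = line.strip()
        if stripped and keyword.lower() in stripped.lower():
            matches.append(stripped)
    return matches


# Hypothetical text standing in for the text extracted from one PDF page
sample_text = "Invoice No: 1001\n\nCustomer: Acme Ltd\nTotal: 250.00\n"

total_lines = find_lines_with(sample_text, "total")
print(total_lines)

# Split a "key: value" line into its two parts
key, value = total_lines[0].split(":", 1)
print(key.strip(), "->", value.strip())
```

Text extracted from real PDFs is often less regular than this sample, so the splitting rules usually need adjusting per document.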

Requesting and reading data from an API takes two steps:

Use the requests module to connect to a URL and fetch a response
Use json.loads() to convert the JSON object to a Python dictionary

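import json
import pprint
import requests

# Placeholder: set this to the real API endpoint, including your API key
url = "<API endpoint with your API key>"

r = requests.get(url)  # send the request and get the response

# Convert the JSON text of the response to a dict using json.loads()
r_dict = json.loads(r.text)

# The JSON is now a Python dictionary, which is a key-value data structure
pprint.pprint(r_dict)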
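To see what the conversion produces without calling a real API, here is a small sketch that runs json.loads() on a literal string. The JSON content is made up for illustration; a real response body would come from r.text.

```python
import json
import pprint

# Hypothetical response body; a real API would supply this as r.text
sample_body = '{"city": "Kochi", "temps": [31, 29, 30], "units": {"temp": "C"}}'

data = json.loads(sample_body)   # JSON object -> Python dict

pprint.pprint(data)              # readable view of the nested structure
print(data["city"])              # look up a top-level key
print(data["temps"][0])          # JSON arrays become Python lists
print(data["units"]["temp"])     # nested objects become nested dicts
print(data.get("humidity"))      # .get() returns None when the key is missing
```

The requests library also provides r.json(), which parses the response body in one call and can replace json.loads(r.text).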

Getting Data from websites for analysis

Web scraping refers to the art of programmatically getting data from the internet. One of the coolest features of Python is that it makes it easy to scrape websites. In Python 3, one of the most popular libraries for web scraping is BeautifulSoup.

The general procedure to get data from websites is:

Use requests to connect to a URL and get data from it
Create a BeautifulSoup object
Get attributes of the BeautifulSoup object (i.e. the HTML elements that you want)

import requests, bs4


# Get the HTML of an Amazon search results page
url ="https://www.amazon.in/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=sport+shoes"
req = requests.get(url)  # requests fetches the HTML page at this URL

# Create a bs4 object
# To avoid warnings, name the parser ("html5lib") explicitly; it must be installed separately
soup = bs4.BeautifulSoup(req.text, "html5lib")
# print(soup)
# soup.select('div > p')  # selects all <p> elements that are direct children of a <div>
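The search URL above contains an encoded query string (%3D stands for "=" and "+" stands for a space). As a sketch of how such a URL can be built and decoded rather than typed by hand, the example below uses only the standard library's urllib.parse. The split into a base address and a params dictionary is my own reading of the URL, not something the site documents.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

base = "https://www.amazon.in/s/ref=nb_sb_noss"
params = {"url": "search-alias=aps", "field-keywords": "sport shoes"}

# urlencode percent-encodes "=" as %3D and turns spaces into "+"
url = base + "?" + urlencode(params)
print(url)

# Going the other way: recover the parameters from the finished URL
decoded = parse_qs(urlsplit(url).query)
print(decoded["field-keywords"])
```

With requests, the same dictionary can be passed directly as requests.get(base, params=params), which does the encoding for you.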

Getting Data from CSV file

The easiest way to read delimited files is pd.read_csv(filepath, sep, header), where sep specifies the separator (delimiter). The default separator is a comma.

import pandas as pd

# The file is tab-separated, so pass sep="\t"
# encoding="ISO-8859-1" reads files that are not saved as UTF-8
file = pd.read_csv("xxxxxx.txt", sep="\t", encoding="ISO-8859-1")

# head() returns the first 5 rows by default
file.head()
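To illustrate why the sep and encoding arguments matter, the sketch below uses the standard library's csv module on a small in-memory sample instead of pandas. It is meant to show what the two settings mean, not how pandas works internally, and the sample data is invented.

```python
import csv
import io

# Hypothetical tab-separated content, stored as ISO-8859-1 (Latin-1) bytes
raw_bytes = "name\tcity\nJosé\tMálaga\nAnna\tKöln\n".encode("ISO-8859-1")

# Decoding with the wrong codec fails on the accented characters
try:
    raw_bytes.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print("decodes as UTF-8:", utf8_ok)

# With the right codec and the right delimiter, the rows split cleanly
text = raw_bytes.decode("ISO-8859-1")
rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
header, records = rows[0], rows[1:]
print(header)
print(records)
```

If read_csv raises a UnicodeDecodeError on a file, trying a different encoding such as the one above is a common first step.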
