Basic Data Preparation with Python (Stock Price Data)

Chee-Chuan Foo

Data Scientist | Writer | Consultant

发布日期: 2022年12月31日

+ 关注

Python stock price analysis 03: Learn typical data preparation process with stock prices data

Python Stock Price Data Analysis Series

Introduction

In my previous?article, I talked about extracting a list of stock prices online using Python. At the end of the tutorial, I stored the data in multiple JSON files.

As a data scientist or data engineer, you might be asked to convert hundreds if not thousands of files into 1 table before pushing them into the database. These files can be CSV, JSON, .xlsb, etc.

In this tutorial, we’re going to combine these JSON files that we get in the previous tutorials into one file.

Data & Resources

Data:?GDrive Link

Medium:?medium.com/@foocheechuan

Youtube:?Chee-Chuan

If you didn’t follow the tutorial in my previous article, download the data above to follow along.

Read the files

I used a relative path to navigate to the folder used in my previous article. In your case, you can just use path = ‘your-folder-path’.

Then we use os.listdir to list out all the files in that folder.

import os
path = os.path.abspath(f"../04 Extract Stock Prices in Bulk/JSON")
json_folder = os.listdir(path)
json_folder

No alt text provided for this image — List of files in json_folder

Since there are multiple files in the folder, we need to loop through the files and read the file one-by-one.

import json

for files in json_folder:
    # Reading from json file
    with open(f'{path}\\{files}', 'r') as openfile:
        json_object = json.load(openfile)

It’s nothing if we just keep replacing the data every time we read a file. So let’s convert json_object into a pandas data frame before continuing to the next element.

Convert into a Pandas Data?Frame

Convert json_object into a pandas data frame in each loop. I’ll copy the code I used in my previous article about extracting stock prices online.

import pandas as pd
import json

for files in json_folder:
    # Reading from json file
    with open(f'{path}\\{files}', 'r') as openfile:
        json_object = json.load(openfile)
    data = json_object['Time Series (Daily)']

    # Convert into data frame
    df = pd.DataFrame.from_dict({(i): data[i] 
                           for i in data.keys()},
                       orient='index')
    df_date = df.rename_axis('Date').reset_index()

This will help convert each of our JSON files into the following format.

Label with its respective ticker?symbol

There is one problem with the data frame. When we loop through all stock data, we wouldn’t know which stock the data belong to after we append them into one table. With that said, we need to create a column in the data frame to label its respective stock ticker symbol.

In python, a string variable can be treated as an iterable array so we can use a substring method to retrieve the first 4 letters of the file names

# Extract first 4 letters of the file names
for files in json_folder:
    print(files[:4])

Benjamin Bennett Alexander 1 年前

Mastering Data Analysis with Python: A 50-Day Journey…

Benjamin Bennett Alexander 10 个月前

Data Analysis 101 with Python: Stop Reading and Start…

Benjamin Bennett Alexander 6 个月前

Add the code into the loop.

import pandas as pd
import json

for files in json_folder:
    # Reading from json file
    with open(f'{path}\\{files}', 'r') as openfile:
        json_object = json.load(openfile)
    data = json_object['Time Series (Daily)']
    
    # Convert into data frame
    df = pd.DataFrame.from_dict({(i): data[i] 
                           for i in data.keys()},
                       orient='index')
    df_date = df.rename_axis('Date').reset_index()
    
    # Add a column for the stock's ticker
    df_date['Ticker'] = files[:4]

Now we can append the data tables into one table.

Append multiple?tables

We need to first create an empty data frame called df_all before we can append all stock price tables into that empty table. Note that this declaration is done before the for loop.

With that, we will replace df and df_date in each of the loops but df_all will not be replaced and will store all tables in the previous loop.

import pandas as pd
import json

df_all = pd.DataFrame()

for files in json_folder:
    # Reading from json file
    with open(f'{path}\\{files}', 'r') as openfile:
        json_object = json.load(openfile)
    data = json_object['Time Series (Daily)']
    
    # Convert into data frame
    df = pd.DataFrame.from_dict({(i): data[i] 
                           for i in data.keys()},
                       orient='index')
    df_date = df.rename_axis('Date').reset_index()
    
    # Add a column for the stock's ticker
    df_date['Ticker'] = files[:4]
    df_all = pd.concat([df_all,df_date])
else:
    display(df_all)

Check that df_all has all the stocks we required.

df_all['Ticker'].unique()

Reset index

Note that there are 30653 rows of data but the last index is 3142. This happened because the index of each table wasn’t reset when we append the tables together and the index remained as its index in the original table. So we need to reset the index in the combined table.

df_out = df_all.reset_index(drop=True)

Output Data

Output data to the same folder (created a folder called Output to store the final data)

output_path = os.path.abspath(f".\\Output")
df_out.to_csv(f'{output_path}\\FAANG-MT Combined Data.csv',index=False)

Video Tutorial

Conclusion

That’s all for today’s article. Thank you for reading today’s article. See you again in my next article.

About Me

Currently working as a Data Scientist. I provide consultancy, training, and professional services for data analytics problems to my clients worldwide. Would love to share my experience as a consultant so that everyone can learn something from it.

LinkedIn: https://www.dhirubhai.net/in/foocheechuan/

Medium: medium.com/@foocheechuan

Youtube: Chee-Chuan

Basic Data Preparation with Python (Stock Price Data)

Chee-Chuan Foo

Data Scientist | Writer | Consultant

Python Stock Price Data Analysis Series

Introduction

Data & Resources

Read the files

Convert into a Pandas Data?Frame

Label with its respective ticker?symbol

领英推荐

Append multiple?tables

Output Data

Video Tutorial

Conclusion

About Me

更多精彩文章

社区洞察

其他会员也浏览了

D-TALE

Effortless Data Tabulation in Python with Tabulate Library

Sweetviz

What makes Python a brilliant choice for Data Analysis?

Python Pandas DataFrame

Unlocking Insights: The Power Of Python For Data Analysis

The Top 10 Python-Based Data Science Skills

Python Basics for Data Science

Do You Read Excel Files with Python? There is a 1000x Faster?Way

High-Performance Data Analysis with Polars: A Comprehensive Guide

Python Stock Price Data Analysis Series

Introduction

Data & Resources

Read the files

Convert into a Pandas Data?Frame

Label with its respective ticker?symbol

领英推荐

Append multiple?tables

Output Data

Video Tutorial

Conclusion

About Me

Machine Learning For Beginners

2023年3月12日

Setup an AWS database for FREE

2023年2月6日

Looping API Request in Python to retrieve data in bulk

2022年12月24日

Extract stock price data with Python

2022年12月13日

Pivot Table in?Python

2022年11月29日

Tableau Quick Tip: Dynamic Chart Title

2022年11月21日

Tableau Desktop Quick Tip: The Problem with Pie Chart & Legend

2022年11月20日

Simple Steps — Data Exploration

2022年8月8日

Problem Solving: List Comprehension (Python)

2022年1月28日

List Comprehension (Python) Explained

2022年1月18日

社区洞察

其他会员也浏览了

D-TALE

Effortless Data Tabulation in Python with Tabulate Library

Sweetviz

What makes Python a brilliant choice for Data Analysis?

Python Pandas DataFrame

Unlocking Insights: The Power Of Python For Data Analysis

The Top 10 Python-Based Data Science Skills

Python Basics for Data Science

Do You Read Excel Files with Python? There is a 1000x Faster?Way

High-Performance Data Analysis with Polars: A Comprehensive Guide