Python In Finance Part 1
What will you learn in this series?
The goal of this series is to showcase how to collect and analyze financial data. Throughout the tutorial you will learn how to use Pandas Dataframes and the Pandas library to obtain and visualize financial data.
Before Starting
Before you start make sure you have the following libraries installed on your system
- pandas
- numpy
- matplotlib
If you do not have those libraries you can instal them by running pip install pandas numpy matplotlib inside your python terminal. Alternatevely if you are using conda you can use conda install pandas numpy matplotlib to install the packages
1. Getting Financial Information
To start off we are going to import the needed libraries
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pandas_datareader.data as web
import datetime as dt
import os
from dateutil.relativedelta import relativedelta
Now we can start by writing our first function which will get financial data from Yahoo!. Do do that we pass a list of tickers and the timeframe in which we are interested (i.e start and end dates). Additionally we would like to save the data locally in a .csv file so we can access it easily after.
#function to get data and save it locally
def get_data_from_yahoo(tickers, startDate, endDate):# Create a path to save all the stock data as .csv filesif not os.path.exists('./stock_csvs'):
os.makedirs('./stock_csvs')
# save data for the listed companies
for ticker in tickers:
#if the file does not exist create itif not os.path.exists('./stock_csvs/{}.csv'.format(ticker)):
df = web.DataReader(ticker, 'yahoo', startDate, endDate)
df.to_csv('./stock_csvs/{}.csv'.format(ticker))
#if the file exists update itelse:
os.remove('./stock_csvs/{}.csv'.format(ticker))
df = web.DataReader(ticker, 'yahoo', startDate, endDate)
df.to_csv('./stock_csvs/{}.csv'.format(ticker))
Now we are ready to try out the code. We can do that by
tickers = ['AAPL', 'TSLA', 'GOOGL', 'MSFT']
"""The dates are represented by strings of the format YYYY-MM-DD. You can also use the datetime
libary to get stock data for some time prior to now."""#Uncomment accordingly to the method you want (either direct string or using datetime library)
endDate = '2019-02-19' #using direct string format
endDate = endDate = dt.datetime.now().strftime("%Y-%m-%d") #using datetime library
startDate = '2014-02-19'
startDate = (dt.datetime.now() + relativedelta(years=-5)).strftime("%Y-%m-%d") #set start date 5 years back
get_data_from_yahoo(tickers, startDate, endDate)
Now you can see that our new /stock_csvs directory is starting to populate.
Now that we have our data saved and stored we can begin exploring how to read it and analyze it using Pandas Dataframes. Pandas are very widely used in industry and is a very good library to get started with any kind of data analysis.
2. Getting Familiar with our data
In this section we will talk about how to convert the .csv files into panda dataframes and then learn about how we can get information about our data and how to manipulate it. This is a very usuful skill to have since in order to properly analyze the data in a meaningful way we first need to understand how our data looks like and precisely what we need to get out of it
We will start off by learning how to read one .csv file and then use that as a template for a function that can do that with any of the .csv files. To start of we need to initiate a dataframe object. It is easy to think of a Dataframes as a data table which you can manipulate but also combine with other tables. To initialize a dataframe:
df = pd.DataFrame()
Now it is time to populate our dataframe using the values saved on our .csv file. Hopefully for us the pandas library gives includes a special function for that which prses the given .csv file to a dataframe object. To do so select the stock you want (In this example I selected Microsoft) and run the following:
df = pd.read_csv('./stock_csvs/MSFT.csv')
Now that we have our dataframe object we can manipulate it in a variety of ways. Below are some useful functions to know when working with dataframes
- df['Column Name'] : This function returns the column specified in the argument
- df.head() : returns the top rows of our DataFrame Object
- df.tail() : returns the bottom rows of our DataFrame Object
- df.shape() : returns the shape of the Dataframe (i.e. number of rows x number of columns
- df.drop(['column1'], 1, inplace=True) : This function drops the list of columns given as arguments, from the dataframe
- pd.concat([df1, df2]) : This function is used to merge together two dataframes
Now that you have a grasp of the basics let's apply some of them to our dataframe
#Print the shape of our dataframe
print(df.shape())
#Print the head of our Dataframe to get some more information about the columns and the data
print(df.head())
During those series we will be concerned ony with the Price of our stocks on each Date. As such we can drop all other unecessary columns from our dataframe.
# Drop unecessary columns
df.drop(['Open', 'High', 'Low', 'Close', 'Volume'], 1, inplace=True)
Now that we have a basic understanding of dataframes let's create a function that will combine all of our csv data into one dataframe. To do that we will use a loop to go through all the different tickers obtain a dataframe for them and then merge that with a main DataFrame that will hold al of the information. Here is how we can implement that
#function to compile data in usable form
def compile_data(tickers):
#initialize a mian dataframe
main_df = pd.DataFrame()
#go through all the tickers
for ticker in tickers:
#convert the .csv file to a dataframe and drop the unecessary column
df = pd.read_csv('./stock_csvs/{}.csv'.format(ticker))
df.drop(['Open', 'High', 'Low', 'Close', 'Volume'], 1, inplace=True)
#Add a new column with the ticker name and merge with the main dataframe
df['Ticker']=ticker
main_df = pd.concat([main_df, df])
main_df = main_df.reset_index()
main_df.drop(['index'], 1, inplace=True)
return main_df
As simple as that! Now that we have our function we can directly pass a list of tickers and look at the final result.
main_df = compile_data(tickers)
print(main_df.shape())
print(main_df.head())
Closing
I will now leave it up to you to make sense of the data and by all means have a play with it. Additionally I will encourage you to take a look at the pandas documentation which you can find here and definetely try manipulating the data in different ways in order to familiarize yourself as much as possible. Again it worth stressing that using Pandas is a very seful skill to have and will definetely serve you well in the future.
Thank you very much for listening and I hope to see you back for Part 2 where we will be using all that we have learned to do some basic Markowitz Portfolio optimization.
If you want to learn more about me feel free to check out my personal website https://www.theodorosouzounellis.com or follow me on Twitter @OuzounellisT.