登录查看更多内容

Python In Finance Part 1

Theodoros Ouzounellis

Credit Desk Strat Associate at Goldman Sachs

发布日期: 2019年3月18日

What will you learn in this series?

The goal of this series is to showcase how to collect and analyze financial data. Throughout the tutorial you will learn how to use Pandas Dataframes and the Pandas library to obtain and visualize financial data.

Before Starting

Before you start make sure you have the following libraries installed on your system

pandas
numpy
matplotlib

If you do not have those libraries you can instal them by running pip install pandas numpy matplotlib inside your python terminal. Alternatevely if you are using conda you can use conda install pandas numpy matplotlib to install the packages

1. Getting Financial Information

To start off we are going to import the needed libraries

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    import pandas_datareader.data as web
    import datetime as dt  
    import os
    from dateutil.relativedelta import relativedelta

Now we can start by writing our first function which will get financial data from Yahoo!. Do do that we pass a list of tickers and the timeframe in which we are interested (i.e start and end dates). Additionally we would like to save the data locally in a .csv file so we can access it easily after.

    #function to get data and save it locally
    def get_data_from_yahoo(tickers, startDate, endDate):# Create a path to save all the stock data as .csv filesif not os.path.exists('./stock_csvs'):
            os.makedirs('./stock_csvs')
    
        # save data for the listed companies
        for ticker in tickers:
            #if the file does not exist create itif not os.path.exists('./stock_csvs/{}.csv'.format(ticker)):
                df = web.DataReader(ticker, 'yahoo', startDate, endDate)
                df.to_csv('./stock_csvs/{}.csv'.format(ticker))
            #if the file exists update itelse:
                os.remove('./stock_csvs/{}.csv'.format(ticker))
                df = web.DataReader(ticker, 'yahoo', startDate, endDate)
                df.to_csv('./stock_csvs/{}.csv'.format(ticker))

Now we are ready to try out the code. We can do that by

    tickers = ['AAPL', 'TSLA', 'GOOGL', 'MSFT']

    """The dates are represented by strings of the format YYYY-MM-DD. You can also use the datetime 
    libary to get stock data for some time prior to now."""#Uncomment accordingly to the method you want (either direct string or using datetime library)
    endDate = '2019-02-19' #using direct string format
    endDate = endDate = dt.datetime.now().strftime("%Y-%m-%d") #using datetime library
    startDate = '2014-02-19'
    startDate = (dt.datetime.now() + relativedelta(years=-5)).strftime("%Y-%m-%d") #set start date 5 years back

    get_data_from_yahoo(tickers, startDate, endDate)

Now you can see that our new /stock_csvs directory is starting to populate.

Now that we have our data saved and stored we can begin exploring how to read it and analyze it using Pandas Dataframes. Pandas are very widely used in industry and is a very good library to get started with any kind of data analysis.

2. Getting Familiar with our data

In this section we will talk about how to convert the .csv files into panda dataframes and then learn about how we can get information about our data and how to manipulate it. This is a very usuful skill to have since in order to properly analyze the data in a meaningful way we first need to understand how our data looks like and precisely what we need to get out of it

We will start off by learning how to read one .csv file and then use that as a template for a function that can do that with any of the .csv files. To start of we need to initiate a dataframe object. It is easy to think of a Dataframes as a data table which you can manipulate but also combine with other tables. To initialize a dataframe:

df = pd.DataFrame()

Now it is time to populate our dataframe using the values saved on our .csv file. Hopefully for us the pandas library gives includes a special function for that which prses the given .csv file to a dataframe object. To do so select the stock you want (In this example I selected Microsoft) and run the following:

df = pd.read_csv('./stock_csvs/MSFT.csv')

Now that we have our dataframe object we can manipulate it in a variety of ways. Below are some useful functions to know when working with dataframes

df['Column Name'] : This function returns the column specified in the argument
df.head() : returns the top rows of our DataFrame Object
df.tail() : returns the bottom rows of our DataFrame Object
df.shape() : returns the shape of the Dataframe (i.e. number of rows x number of columns
df.drop(['column1'], 1, inplace=True) : This function drops the list of columns given as arguments, from the dataframe
pd.concat([df1, df2]) : This function is used to merge together two dataframes

Now that you have a grasp of the basics let's apply some of them to our dataframe

#Print the shape of our dataframe

print(df.shape())

#Print the head of our Dataframe to get some more information about the columns and the data 

print(df.head())

During those series we will be concerned ony with the Price of our stocks on each Date. As such we can drop all other unecessary columns from our dataframe.

  # Drop unecessary columns
  df.drop(['Open', 'High', 'Low', 'Close', 'Volume'], 1, inplace=True)

Now that we have a basic understanding of dataframes let's create a function that will combine all of our csv data into one dataframe. To do that we will use a loop to go through all the different tickers obtain a dataframe for them and then merge that with a main DataFrame that will hold al of the information. Here is how we can implement that

    #function to compile data in usable form
    def compile_data(tickers):
       #initialize a mian dataframe
       main_df = pd.DataFrame()
       #go through all the tickers
       for ticker in tickers:
          #convert the .csv file to a dataframe and drop the unecessary column
          df = pd.read_csv('./stock_csvs/{}.csv'.format(ticker))
          df.drop(['Open', 'High', 'Low', 'Close', 'Volume'], 1, inplace=True)
          #Add a new column with the ticker name and merge with the main dataframe
          df['Ticker']=ticker
          main_df = pd.concat([main_df, df])

          main_df = main_df.reset_index()
          main_df.drop(['index'], 1, inplace=True)
        
          return main_df

As simple as that! Now that we have our function we can directly pass a list of tickers and look at the final result.

    main_df = compile_data(tickers)
    print(main_df.shape())
    print(main_df.head())

Closing

I will now leave it up to you to make sense of the data and by all means have a play with it. Additionally I will encourage you to take a look at the pandas documentation which you can find here and definetely try manipulating the data in different ways in order to familiarize yourself as much as possible. Again it worth stressing that using Pandas is a very seful skill to have and will definetely serve you well in the future.

Thank you very much for listening and I hope to see you back for Part 2 where we will be using all that we have learned to do some basic Markowitz Portfolio optimization.

If you want to learn more about me feel free to check out my personal website https://www.theodorosouzounellis.com or follow me on Twitter @OuzounellisT.

Theodoros Ouzounellis的更多文章

Visualizing climate change using Azure Notebooks

2019年2月25日

Visualizing climate change using Azure Notebooks

A bit of a background In today’s blog I will be sharing my experience of using Azure/Jupyter notebooks to visualize…

2 条评论

Python In Finance Part 1

Theodoros Ouzounellis

Credit Desk Strat Associate at Goldman Sachs

What will you learn in this series?

Before Starting

1. Getting Financial Information

2. Getting Familiar with our data

Closing

Theodoros Ouzounellis的更多文章

社区洞察

其他会员也浏览了

AIML26-Pandas and Python Tips and Tricks for Data Science and Data Analysis

The lambda() and more

Ducks vs Pythons: How to write Iceberg tables using PyArrow and analyze them using Polars

Data Manipulation with Pandas

Use Python In Power Query to retrieve H3 indices

Pandas - Create DataFrame

Pandas - Create Basic Visualizations

Pandas DataFrame Operations: A Comprehensive Guide

Using Python Pandas to turn ISO Country Codes into a string to use as values for a SQL Query

Crafting Visuals in Python

What will you learn in this series?

Before Starting

1. Getting Financial Information

2. Getting Familiar with our data

Closing

Theodoros Ouzounellis的更多文章

Visualizing climate change using Azure Notebooks

社区洞察

其他会员也浏览了

AIML26-Pandas and Python Tips and Tricks for Data Science and Data Analysis

The lambda() and more

Ducks vs Pythons: How to write Iceberg tables using PyArrow and analyze them using Polars

Data Manipulation with Pandas

Use Python In Power Query to retrieve H3 indices

Pandas - Create DataFrame

Pandas - Create Basic Visualizations

Pandas DataFrame Operations: A Comprehensive Guide

Using Python Pandas to turn ISO Country Codes into a string to use as values for a SQL Query

Crafting Visuals in Python