登录查看更多内容

Python for Finance in Excel — Filling in Blanks in Financial Trading Data

Bogdan Tudose

Co-Head of Data Science at Training The Street

发布日期: 2023年8月30日

A common data cleaning exercise in finance is to fill in blank rows when pulling in share prices of tickers from different stock exchanges. Data providers such as Bloomberg, Yahoo Finance and Refinitiv usually skip holidays and weekends when you are pulling in trading data. However, when you merge data from a company that is dual listed, you could have holidays on the NYSE that don’t exist on TSX, and vice versa. If you then plot the share prices, you might get your line charts randomly dropping to 0 if your data has blanks.

One solution in Excel is to do an IF Statement that says, “if there is no price, pull in the price from row above.” With Python, this is much simpler with a quick?.fillna() formula from the pandas package.

The demo below walks through how I use?.fillna() to populate missing share prices for Enbridge (ENB), a Canadian energy company that’s dual-listed on NYSE and TSX.

The demo Excel file can be downloaded from my Github repo, where I will be posting more demos in the future: https://github.com/dbogt/PythonExcel

Here is also the direct link to the file: https://github.com/dbogt/PythonExcel/raw/main/Python%20in%20Excel%20Demos%20-%20Demo%201%20-%20Fillna.xlsx

Step 1?—?Grab the data with?Excel

I used the Stocks feature on the Data ribbon of Excel to create the tickers and then the STOCKHISTORY formula to pull in data from Jan 4, 2022 to Dec 31, 2022.

Formulas in cells A6 and D6 below:

=STOCKHISTORY(A1,$A$3,$A$4,0,1)

=STOCKHISTORY(D1,$A$3,$A$4,0,1)

Stockhistory formula in Excel to pull historical share prices.

Step 2?—?Merge the Data and Fill Blanks with Python in?Excel

Before we can merge the two data sets with Python and fill in the blanks, we have to first create two variables that convert normal ranges in Excel into Python “dataframes” which are a special data type in the pandas package in Python. These dataframes are the equivalent of the Table tool in Excel. Once data is stored as a dataframe, you can easily refer to columns using tableName[‘Header’], very similar to how Excel uses Table1[Header] instead of cell references.

One nice thing I noticed while playing with the Python implementation in Excel, is that it handles dynamic arrays fairly well. The codes seen in screenshot below:

In cell H4 I created the variable to keep track of the TSX data of Enbridge. You can quickly generate this code by writing =PY( df_tsx= and then selecting the share prices that were pulled in Step 1 with StockHistory.

df_tsx = xl("A6#", headers=True)

In cell H5 I did the same type of code with the NYSE data of Enbridge:

df_nyse = xl("D6#", headers=True)

In cell H6 I merged the two tables using the pandas formula?.merge and saved as variable “merged”

merged = df_tsx.merge(df_nyse, how='outer', on='Date', suffixes=('_TSX','_NYSE'))

In cell H7 I fixed the merged data set by sorting on Date (old to new) and then using the?.fillna() formula to “copy” over prices from rows above to empty rows below whenever there was a blank

merged_fixed = merged.sort_values('Date').fillna(method='ffill')

This was done using the method called ffill that stands for “forward fill”
Of course, pandas package has thought of everything, and you can also do a bfill for backwards fill, in case your dates are sorted new to old
Full documentation of the fillna formula is here: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.fillna.html

领英推荐

Navigating the Data Analytics Landscape: Python's Edge…

Gauch Nikolas, MBA, CBAP 1 年前

Pandas - Create DataFrame

David Rojas, E.I. 9 个月前

Pandas - Sort DataFrame

David Rojas, E.I. 9 个月前

Python code directly in Excel to merge data sets and fill in blanks.

Let’s take a step back and dive deeper on what the?.merge and?.fillna formulas are doing.

Merge Formula

The pandas cheat sheet brochure has a nice graphic on the second page on how merge function works:

https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf

In the pandas documentation there is also a whole section explaining how to merge data sets:

https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html

Let’s breakdown our code and explain piece by piece what each part of the formula was doing:

merged = df_tsx.merge(df_nyse, how='outer', on='Date', suffixes=('_TSX','_NYSE'))

The formula can be used in this style table1.merge(table2)
In our case table1 was df_tsx (TSX data of Enbridge) and table2 df_nyse (NYSE data of Enbridge)
These are also referred to as the table on the “left” and table on the “right” because one is on the left of the word?.merge, and the other on the right, inside the formula
how = ‘outer’ is telling Python to keep all the rows for all the dates
Outer means keep “all” the data, even if there may be missing rows from one table or the other
If we had said how=’inner’, it would keep only dates that overlap between the two exchanges and exclude any days where there was a holiday in either country
If we had said how=’left’, it would have kept only the TSX trading days and “miss” out on the dates where there was a holding in Canada (even if NYSE was open those days) and vice versa if we had said how=’right’ (keep NYSE trading days, miss out on U.S. holidays)
on = ‘Date’ is telling Python to merge on the “Date” column
Think of this as what are we using for the “VLOOKUP” if we had done this using only Excel
suffixes = (‘_TSX’, ‘_NYSE’) is telling Python to add a suffix to columns that have the same name but are not being merged on
Since Date was used for the merge, it only shows up once in the new consolidated table
However, “Close” column is different in each data set (one is the CAD prices on TSX, and the other the USD prices on NYSE) so we need to keep track of which Close column is which
If we don’t specify a suffixes part in the formula, pandas package will just create “Close_x” and “Close_y” header names

Once the data sets are merged, we will have “NA’s” or blanks on days where there wasn’t an overlap in dates.

If you want to see what days will have “problems” check out the trading holidays on NYSE and TSX:

For example, in May 2022, in Canada markets were closed on Victoria Day on May 23, while in US markets were closed on May 30 for Memorial Day. The same issue happens for Thanksgiving. In Canada, markets were closed on October 10, 2022, and in US, markets were closed on November 24, 2022.

Missing share prices in May for Canadian and US holidays

Fillna formula

Let’s now go over how we fixed the merged data:

merged_fixed = merged.sort_values('Date').fillna(method='ffill')

We took the “merged” variable and made sure the dates were sorted first in ascending order by date using?.sort_values(‘Date’)
.fillna(x) is then used to fill in any NA errors or blanks with “x”
For example?.fillna(0) would fill in any blanks with the number 0
In this case we don’t want to have a day where the share price was 0, since it could ruin our time-series charts and any analysis we’re doing (e.g. if we’re calculating daily returns later, it might show a day where stock dropped 100%, even though that wasn’t the case)
Instead we used method=‘ffill’ to say that we want to “forward fill”, i.e. take the row above and fill it forward (or downward) in the blanks rows below

If we wanted to sort our table in descending order (newest date to old), then the code would look like this:

merged_fixed = merged.sort_values('Date', ascending=False).fillna(method='bfill')

Here is what those May 2022 days look like now after the fillna formula, the blue cells were copied over to the yellow blank rows below:

Disclaimer: Depending on the analysis you are trying to do, this might not be the correct solution. Sometimes you may want to populate blanks with a 0, sometimes with the median or average of the numbers in the column, and sometimes you may just want to remove any blank rows. With the pandas package in Python you can remove rows with?.dropna() which we’ll cover in a future demo.

Below are all the codes used in the Excel file:

df_tsx = xl("A6#", headers=True) 

df_nyse = xl("D6#", headers=True) 

merged = df_tsx.merge(df_nyse, how='outer', on='Date', suffixes=('_TSX','_NYSE')) 

merged_fixed = merged.sort_values('Date').fillna(method='ffill')

要查看或添加评论，请登录

Bogdan Tudose的更多文章

Python for Finance in Excel — Moving Averages Chart

2023年9月6日

Python for Finance in Excel — Moving Averages Chart

Creating Interactive Moving Average Charts with Python in Excel In this week’s article I am showcasing the power of…

1 条评论

Python for Finance in Excel — Filling in Blanks in Financial Trading Data

Bogdan Tudose

Co-Head of Data Science at Training The Street

Step 1?—?Grab the data with?Excel

Step 2?—?Merge the Data and Fill Blanks with Python in?Excel

领英推荐

Merge Formula

Fillna formula

Bogdan Tudose的更多文章

社区洞察

其他会员也浏览了

Using Python Pandas to turn ISO Country Codes into a string to use as values for a SQL Query

Day 3: Exploring Python Data Types – Built-in Types, Getting, and Setting Data Types (Boy and Monk Series)

Data Analytics tool and their implementation with Python.

The application of Python for analysis and visualization of survey form data and the use of Spearman's correlation

Pandas or Excel

Pandas 1.0 - Summary of enhancements and recommendations

Why R is a Natural Fit for Excel Users Entering Data Science

Converting a $100k+ Trading Algorithm from Excel to Python

How to Visualise a Data Frame as a Heat Map in Python + R

11 Must-Know SQL String Functions in Python for Data Analysts and Engineer's

Step 1?—?Grab the data with?Excel

Step 2?—?Merge the Data and Fill Blanks with Python in?Excel

领英推荐

Merge Formula

Fillna formula

Bogdan Tudose的更多文章

Python for Finance in Excel — Moving Averages Chart

社区洞察

其他会员也浏览了

Using Python Pandas to turn ISO Country Codes into a string to use as values for a SQL Query

Day 3: Exploring Python Data Types – Built-in Types, Getting, and Setting Data Types (Boy and Monk Series)

Data Analytics tool and their implementation with Python.

The application of Python for analysis and visualization of survey form data and the use of Spearman's correlation

Pandas or Excel

Pandas 1.0 - Summary of enhancements and recommendations

Why R is a Natural Fit for Excel Users Entering Data Science

Converting a $100k+ Trading Algorithm from Excel to Python

How to Visualise a Data Frame as a Heat Map in Python + R

11 Must-Know SQL String Functions in Python for Data Analysts and Engineer's