Python for Finance in Excel — Filling in Blanks in Financial Trading Data
A common data cleaning exercise in finance is to fill in blank rows when pulling in share prices of tickers from different stock exchanges. Data providers such as Bloomberg, Yahoo Finance and Refinitiv usually skip holidays and weekends when you are pulling in trading data. However, when you merge data from a company that is dual listed, you could have holidays on the NYSE that don’t exist on TSX, and vice versa. If you then plot the share prices, you might get your line charts randomly dropping to 0 if your data has blanks.
One solution in Excel is to do an IF Statement that says, “if there is no price, pull in the price from row above.” With Python, this is much simpler with a quick?.fillna() formula from the pandas package.
The demo below walks through how I use?.fillna() to populate missing share prices for Enbridge (ENB), a Canadian energy company that’s dual-listed on NYSE and TSX.
The demo Excel file can be downloaded from my Github repo, where I will be posting more demos in the future: https://github.com/dbogt/PythonExcel
Here is also the direct link to the file: https://github.com/dbogt/PythonExcel/raw/main/Python%20in%20Excel%20Demos%20-%20Demo%201%20-%20Fillna.xlsx
Step 1?—?Grab the data with?Excel
I used the Stocks feature on the Data ribbon of Excel to create the tickers and then the STOCKHISTORY formula to pull in data from Jan 4, 2022 to Dec 31, 2022.
Formulas in cells A6 and D6 below:
=STOCKHISTORY(A1,$A$3,$A$4,0,1)
=STOCKHISTORY(D1,$A$3,$A$4,0,1)
Step 2?—?Merge the Data and Fill Blanks with Python in?Excel
Before we can merge the two data sets with Python and fill in the blanks, we have to first create two variables that convert normal ranges in Excel into Python “dataframes” which are a special data type in the pandas package in Python. These dataframes are the equivalent of the Table tool in Excel. Once data is stored as a dataframe, you can easily refer to columns using tableName[‘Header’], very similar to how Excel uses Table1[Header] instead of cell references.
One nice thing I noticed while playing with the Python implementation in Excel, is that it handles dynamic arrays fairly well. The codes seen in screenshot below:
df_tsx = xl("A6#", headers=True)
df_nyse = xl("D6#", headers=True)
merged = df_tsx.merge(df_nyse, how='outer', on='Date', suffixes=('_TSX','_NYSE'))
merged_fixed = merged.sort_values('Date').fillna(method='ffill')
领英推荐
Let’s take a step back and dive deeper on what the?.merge and?.fillna formulas are doing.
Merge Formula
The pandas cheat sheet brochure has a nice graphic on the second page on how merge function works:
In the pandas documentation there is also a whole section explaining how to merge data sets:
Let’s breakdown our code and explain piece by piece what each part of the formula was doing:
merged = df_tsx.merge(df_nyse, how='outer', on='Date', suffixes=('_TSX','_NYSE'))
Once the data sets are merged, we will have “NA’s” or blanks on days where there wasn’t an overlap in dates.
If you want to see what days will have “problems” check out the trading holidays on NYSE and TSX:
For example, in May 2022, in Canada markets were closed on Victoria Day on May 23, while in US markets were closed on May 30 for Memorial Day. The same issue happens for Thanksgiving. In Canada, markets were closed on October 10, 2022, and in US, markets were closed on November 24, 2022.
Fillna formula
Let’s now go over how we fixed the merged data:
merged_fixed = merged.sort_values('Date').fillna(method='ffill')
If we wanted to sort our table in descending order (newest date to old), then the code would look like this:
merged_fixed = merged.sort_values('Date', ascending=False).fillna(method='bfill')
Here is what those May 2022 days look like now after the fillna formula, the blue cells were copied over to the yellow blank rows below:
Disclaimer: Depending on the analysis you are trying to do, this might not be the correct solution. Sometimes you may want to populate blanks with a 0, sometimes with the median or average of the numbers in the column, and sometimes you may just want to remove any blank rows. With the pandas package in Python you can remove rows with?.dropna() which we’ll cover in a future demo.
Below are all the codes used in the Excel file:
df_tsx = xl("A6#", headers=True)
df_nyse = xl("D6#", headers=True)
merged = df_tsx.merge(df_nyse, how='outer', on='Date', suffixes=('_TSX','_NYSE'))
merged_fixed = merged.sort_values('Date').fillna(method='ffill')