Data Analysis Using Python pandas
In this article, I’ll demonstrate some simple data analysis using pandas, the Python Data Analysis Library.
For the dataset, I’ll use the stock price history of Microsoft Corporation (NASDAQ: MSFT) from Jan. 01, 1991 through Dec. 31, 2016, downloaded from Yahoo! Finance with the pandas_datareader module.
- Import the requisite libraries.
import pandas as pd
from pandas_datareader import data
from datetime import datetime
# Jupyter/IPython magic: display plots inside the notebook
%matplotlib inline
- Define the time frame for the data.
# Get all available data between Jan. 01, 1991 and Dec. 31, 2016
start, end = datetime(1991, 1, 1), datetime(2016, 12, 31)
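If the price history is already on hand (from an earlier download or another source), the same time frame can be applied with label-based slicing on a sorted DatetimeIndex, where both endpoints are included. The sketch below uses a small made-up frame; `prices` and `in_range` are hypothetical names, and the values are not real MSFT data.

```python
import pandas as pd
from datetime import datetime

# Hypothetical daily values with a DatetimeIndex (made-up, not real MSFT data)
idx = pd.date_range("2016-12-28", "2017-01-04", freq="D")
prices = pd.DataFrame({"Close": range(len(idx))}, index=idx)

start, end = datetime(1991, 1, 1), datetime(2016, 12, 31)

# Label-based slicing on an ascending DatetimeIndex keeps both endpoints
in_range = prices.sort_index().loc[start:end]
print(in_range.index.min().date(), in_range.index.max().date())
```

Dec. 31, 2016 is kept and the January 2017 rows are dropped, because `.loc` slicing with labels is inclusive on both ends.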
- Download data from Yahoo! Finance into a DataFrame.
# Round prices to two decimal places and list the most recent trading day first
msft_history = data.get_data_yahoo('MSFT', start, end).round(2).sort_index(ascending=False)
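To see what the `.round(2).sort_index(ascending=False)` post-processing does without hitting the network, here is a sketch on a tiny hypothetical frame (`raw` and `tidy` are illustrative names; the prices are made up).

```python
import pandas as pd

# Hypothetical unrounded prices in ascending date order (made-up, not real MSFT data)
raw = pd.DataFrame(
    {"High": [27.123, 27.456, 27.789], "Low": [26.501, 26.994, 27.003]},
    index=pd.to_datetime(["2013-01-02", "2013-01-03", "2013-01-04"]),
)

# Same post-processing as the download step: round to cents, newest rows first
tidy = raw.round(2).sort_index(ascending=False)
print(tidy)
```

Every column is rounded to two decimals, and the row for 2013-01-04 now comes first.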
Now that we have the dataset, let’s perform some simple data analysis.
Let’s find the highest and lowest prices of MSFT’s stock for each year in the dataset.
- Group by year and get the highest stock price for each year.
year_highest = msft_history['High'].groupby(msft_history.index.year).max()
- Group by year and get the lowest stock price for each year.
year_lowest = msft_history['Low'].groupby(msft_history.index.year).min()
- Combine the two Series generated above into a single DataFrame.
annual_highest_lowest = pd.concat([year_highest, year_lowest], axis=1)
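The two groupby calls plus `pd.concat` can also be collapsed into a single pass with named aggregation (available since pandas 0.25). This is an alternative, not the approach used above; the sketch runs on a hypothetical `history` frame with made-up prices and checks that both routes agree.

```python
import pandas as pd

# Hypothetical daily prices spanning two years (made-up, not real MSFT data)
history = pd.DataFrame(
    {"High": [30.0, 32.5, 31.0, 40.0, 38.0],
     "Low":  [29.0, 30.5, 28.0, 37.5, 36.0]},
    index=pd.to_datetime(["2013-03-01", "2013-06-03", "2013-09-02",
                          "2014-02-03", "2014-11-03"]),
)

# One groupby pass: max of 'High' and min of 'Low' per calendar year
annual = history.groupby(history.index.year).agg(
    High=("High", "max"), Low=("Low", "min")
)

# The two-groupby + pd.concat route, for comparison
by_year = history.groupby(history.index.year)
combined = pd.concat([by_year["High"].max(), by_year["Low"].min()], axis=1)
print(annual.equals(combined))
```

Named aggregation lets the output column names be set in the same call, which avoids building and aligning two separate Series.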
The resulting DataFrame lists the highest and lowest prices of the stock for each year. For example, in 2013, MSFT’s stock traded as high as $38.98 and as low as $26.28.
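Because the years form the index of `annual_highest_lowest`, a single year can be read with `.loc`, and `idxmax`/`idxmin` return the year in which an extreme occurred. The sketch below builds a stand-in table with made-up values for hypothetical years, so its numbers are not MSFT’s.

```python
import pandas as pd

# Hypothetical table shaped like annual_highest_lowest (made-up values, not real MSFT data)
annual_highest_lowest = pd.DataFrame(
    {"High": [35.0, 42.0, 39.5], "Low": [25.0, 30.0, 33.0]},
    index=[2001, 2002, 2003],
)

# Select one year's row by its index label (the year)
print(annual_highest_lowest.loc[2002])

# Year with the overall highest high, and year with the overall lowest low
peak_year = annual_highest_lowest["High"].idxmax()
trough_year = annual_highest_lowest["Low"].idxmin()
print(peak_year, trough_year)
```

On the real data, `annual_highest_lowest.loc[2013]` would return the 2013 high and low quoted above.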
Let’s see a graphical representation.
# Plot both columns; loc=2 places the legend in the upper-left corner
annual_highest_lowest.plot(grid=True, figsize=(20, 10)).legend(loc=2, prop={'size': 16})