Data Analysis with Python: Stop Reading and Start Doing (Analyzing Financial Data)
"I hear, and I forget. I see, and I remember. I do, and I understand." ~ Confucius
Data analysis is best learned by working on different types of data and getting your hands dirty. In this week's article, we are going to analyze some fictional financial data using Python. Analyzing financial statements with Python provides valuable insights into business performance and helps businesses make data-driven decisions. We will tackle four questions about four aspects of the data: revenue, costs, profit, and correlations.
Analyzing Financial Data
We will begin by importing the Python libraries needed for this analysis and loading the data that we are going to use:
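The original code for this step is not reproduced here, so what follows is only a minimal sketch of the import-and-load step. The column names come from later sections of the article; the inline CSV text, its values, and the file name mentioned in the comment are placeholders, not the article's dataset.

```python
import io

import pandas as pd

# Placeholder CSV standing in for the article's file; the values are made up.
csv_text = """date,sales,direct_costs,overheads
2024-02-05,1200,500,300
2024-02-06,1350,560,300
2024-02-07,1100,470,310
"""

# With a real file this would look like pd.read_csv("financial_data.csv"),
# where the file name is hypothetical.
data = pd.read_csv(io.StringIO(csv_text))
print(data.head())
```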
Before we start working on the dataset, we will create a copy of it. We will use the info() method to get some insights into the data. This method is helpful for understanding the size and composition of the dataset.
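A sketch of the copy-and-inspect step, assuming the loaded DataFrame is called `data` and the working copy `df` (both names, and the sample values, are assumptions made for illustration):

```python
import pandas as pd

# Made-up stand-in for the loaded dataset (placeholder values).
data = pd.DataFrame({
    "date": ["2024-02-05", "2024-02-06", "2024-02-07"],
    "sales": [1200, 1350, 1100],
    "direct_costs": [500, 560, 470],
    "overheads": [300, 300, 310],
})

# Work on a copy so the original DataFrame stays untouched.
df = data.copy()

# Prints the row count, column names, non-null counts, and dtypes.
df.info()
```

Working on a copy means any columns added later do not alter the data as it was loaded.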
So, this dataset has 19 rows and 4 columns. We do not have any missing values. The data types of the columns are int64 and object. Let's start tackling the questions.
1. Compare the sales of the first seven days (week one) to the sales of the next seven days (week two). Use a bar graph to visualize the comparison.
This question asks us to compare the sales of the first week of data with those of the second week. Since we are dealing with time series data, the first thing we need to do is convert the "date" column from text to a datetime type. This step ensures that the date values are properly parsed and can be manipulated with the pandas datetime tools.
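The conversion can be sketched with `pd.to_datetime`. The sample rows below are made up, and the date format is an assumption about how the real file stores dates:

```python
import pandas as pd

# Made-up rows; dates start out as plain strings.
df = pd.DataFrame({
    "date": ["2024-02-05", "2024-02-06", "2024-02-07"],
    "sales": [1200, 1350, 1100],
})

# Parse the strings into pandas datetime values.
df["date"] = pd.to_datetime(df["date"])
print(df.dtypes)
```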
Once we have converted the "date" column to a datetime type, we need a way to extract the weeks from the DataFrame. To achieve this, we are going to use dt.isocalendar().week to get the ISO week number of each date in the "date" column. We will store the result in a new column called "number_week" in the DataFrame.
A "number_week" column has now been added to the DataFrame. Its values, 6, 7, and 8, are ISO week numbers. Since we want the first and second weeks of the data, we are going to use the values 6 and 7 as weeks one and two, respectively. ISO weeks run from Monday to Sunday, so this mapping matches the first seven days only when the data starts on a Monday.
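A minimal sketch of the week-number step. The 19 dates are invented and were chosen only so that the ISO weeks come out as 6, 7, and 8, the values the article reports; the real dataset's dates may differ.

```python
import pandas as pd

# 19 made-up consecutive days starting on a Monday.
df = pd.DataFrame({"date": pd.date_range("2024-02-05", periods=19, freq="D")})

# isocalendar() returns year, week, and day columns; keep only the week.
df["number_week"] = df["date"].dt.isocalendar().week

# How many rows fall into each ISO week.
print(df["number_week"].value_counts().sort_index())
```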
Please note that this is just one way to solve this challenge. Other methods will give the same output. Once we have summed the sales for weeks 1 and 2, we need a bar chart to visualize the difference in sales. We are going to use Matplotlib to create the plot.
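One way to sketch the weekly totals and the bar chart is shown below. The sales figures are made up, and the variable names `week_one_sales` and `week_two_sales` are assumptions; only the "number_week" values 6 and 7 come from the article.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up daily sales over 19 consecutive days.
df = pd.DataFrame({
    "date": pd.date_range("2024-02-05", periods=19, freq="D"),
    "sales": [1000 + 50 * i for i in range(19)],
})
df["number_week"] = df["date"].dt.isocalendar().week

# Total sales for ISO weeks 6 and 7 (weeks one and two of the data).
week_one_sales = df.loc[df["number_week"] == 6, "sales"].sum()
week_two_sales = df.loc[df["number_week"] == 7, "sales"].sum()

# Bar chart comparing the two totals.
fig, ax = plt.subplots()
ax.bar(["Week 1", "Week 2"], [week_one_sales, week_two_sales])
ax.set_ylabel("Total sales")
ax.set_title("Sales: week one vs. week two")
# Call plt.show() to display the chart in an interactive session.
```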
Visualization makes everything better. You can see that we had more sales in week 2 than week 1.
2. What percentage of total costs do direct costs represent on average? What is the mean cost variation change between the minimum cost and the maximum cost as a percentage of total costs?
We will calculate total costs by adding the "direct_costs" column to the "overheads" column. We then divide "direct_costs" by "total_costs," use the mean() method to average the daily ratios, and multiply by 100 to express the result as a percentage.
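The calculation described above could look roughly like this. The cost figures are invented, and `direct_cost_pct` is a name chosen for illustration:

```python
import pandas as pd

# Made-up cost figures for three days.
df = pd.DataFrame({
    "direct_costs": [400, 600, 500],
    "overheads": [100, 200, 500],
})

# Total cost per day.
df["total_costs"] = df["direct_costs"] + df["overheads"]

# Average of the daily direct-cost shares, as a percentage.
direct_cost_pct = (df["direct_costs"] / df["total_costs"]).mean() * 100
print(f"Direct costs are {direct_cost_pct:.2f}% of total costs on average")
```

This averages the daily ratios, as the text describes. Dividing the period's total direct costs by the period's total costs is a different measure and will generally give a different number.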
We can now calculate the cost variation. We find the difference between the maximum and minimum values in total_costs, divide it by the mean() of total_costs, and multiply by 100 to convert it to a percentage. The result expresses the spread between the lowest and highest total cost relative to the average total cost over the period.
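A small sketch of that formula, using made-up totals and the hypothetical name `cost_variation_pct`:

```python
import pandas as pd

# Made-up daily total costs.
total_costs = pd.Series([500, 800, 1000], name="total_costs")

# Spread between the highest and lowest total cost.
cost_range = total_costs.max() - total_costs.min()

# Express the spread as a percentage of the average total cost.
cost_variation_pct = cost_range / total_costs.mean() * 100
print(f"Cost variation: {cost_variation_pct:.2f}%")
```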
According to Forbes, data analysis is one of the high-income skills to learn in 2024. There is no better way to become proficient at data analysis than by getting your hands dirty and tackling some challenges. Start your 50-day journey today.
3. What is the difference between profit values for Fridays and Mondays? Visualize the difference using the bar chart. Visualize the daily profitability trend over time (whole period) using a line plot (use the date column).
To answer the first part of the question, we first need to add a "profit" column to the DataFrame. We are going to filter the DataFrame using the "day_of_week" column to get profits for Mondays and Fridays. We will calculate the total profit for Mondays and the total profit for Fridays. We will use the values to calculate the difference and plot the bar chart.
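The steps above can be sketched as follows. Two things here are assumptions: profit is taken to be sales minus direct costs minus overheads, and "day_of_week" is derived from the date column with `dt.day_name()` in case it is not already present. All figures are made up.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up figures for 19 consecutive days starting on a Monday.
df = pd.DataFrame({
    "date": pd.date_range("2024-02-05", periods=19, freq="D"),
    "sales": [1000 + 50 * i for i in range(19)],
    "direct_costs": [400 + 10 * i for i in range(19)],
    "overheads": [300] * 19,
})

# Assumed definition of profit: sales minus both cost columns.
df["profit"] = df["sales"] - df["direct_costs"] - df["overheads"]

# Derive the weekday name from the date (assumption; the real data may already have it).
df["day_of_week"] = df["date"].dt.day_name()

# Total profit on Mondays and on Fridays, and the gap between them.
monday_profit = df.loc[df["day_of_week"] == "Monday", "profit"].sum()
friday_profit = df.loc[df["day_of_week"] == "Friday", "profit"].sum()
difference = friday_profit - monday_profit
print(f"Friday minus Monday profit: {difference}")

# Bar chart of the two totals.
fig, ax = plt.subplots()
ax.bar(["Monday", "Friday"], [monday_profit, friday_profit])
ax.set_ylabel("Total profit")
ax.set_title("Profit: Mondays vs. Fridays")
# Call plt.show() to display the chart in an interactive session.
```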
You can clearly see that we have more money coming in at the end of the week than at the beginning of the week.
The second part of the question asks us to visualize daily profitability over the whole period with a line graph, plotting profit against the "date" column.
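A minimal sketch of such a line plot with Matplotlib. The profit values are invented, so the shape of the line says nothing about the real data:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up daily profit for 19 consecutive days.
df = pd.DataFrame({
    "date": pd.date_range("2024-02-05", periods=19, freq="D"),
    "profit": [300 + 40 * i for i in range(19)],
})

# Profit on the y-axis against the date on the x-axis.
fig, ax = plt.subplots()
ax.plot(df["date"], df["profit"], marker="o")
ax.set_xlabel("Date")
ax.set_ylabel("Profit")
ax.set_title("Daily profit over the period")

# Tilt the date labels so they do not overlap.
fig.autofmt_xdate()
# Call plt.show() to display the chart in an interactive session.
```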
Overall, the profitability trend is upward.
4. What are the potential correlations between sales, direct costs, and overheads? Use a heatmap to visualize the correlations.
To calculate the correlations, we are going to use the pandas corr() method, which computes pairwise Pearson correlation coefficients by default. We will use this method on the "sales," "direct_costs," and "overheads" columns. Correlation values range from -1 to 1. A coefficient close to 1 indicates a strong positive correlation (as one variable increases, the other tends to increase as well). A coefficient close to -1 indicates a strong negative correlation (as one variable increases, the other tends to decrease). A coefficient close to 0 indicates little to no linear correlation between the variables.
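A sketch of the correlation step on made-up numbers. The name `corr_matrix` is an assumption, and the coefficients this produces are not the article's results:

```python
import pandas as pd

# Made-up figures for five days.
df = pd.DataFrame({
    "sales": [100, 120, 90, 150, 130],
    "direct_costs": [40, 50, 35, 62, 51],
    "overheads": [20, 21, 20, 23, 22],
})

# Pairwise correlation between the three columns.
corr_matrix = df[["sales", "direct_costs", "overheads"]].corr()
print(corr_matrix.round(2))
```

The result is a symmetric 3x3 table with 1.0 on the diagonal, since every column correlates perfectly with itself.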
These results show that there is a strong positive correlation between the variables. The correlation is even stronger between sales and direct costs. Let's use a heatmap to visualize the correlation.
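The plotting code is not shown in this text, so the sketch below is one possible way to draw the heatmap. It uses Matplotlib's `imshow` rather than a dedicated heatmap helper such as the one in seaborn; that choice, the colormap, and the made-up data are all assumptions.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up figures for five days.
df = pd.DataFrame({
    "sales": [100, 120, 90, 150, 130],
    "direct_costs": [40, 50, 35, 62, 51],
    "overheads": [20, 21, 20, 23, 22],
})
corr_matrix = df.corr()

# Draw the matrix as a colored grid on a fixed -1 to 1 scale.
fig, ax = plt.subplots()
im = ax.imshow(corr_matrix.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)

# Label both axes with the column names.
ax.set_xticks(range(len(corr_matrix.columns)))
ax.set_xticklabels(corr_matrix.columns, rotation=45, ha="right")
ax.set_yticks(range(len(corr_matrix.index)))
ax.set_yticklabels(corr_matrix.index)

# Write each coefficient inside its cell.
for i in range(corr_matrix.shape[0]):
    for j in range(corr_matrix.shape[1]):
        ax.text(j, i, f"{corr_matrix.iloc[i, j]:.2f}", ha="center", va="center")

fig.colorbar(im, ax=ax, label="Correlation")
# Call plt.show() to display the chart in an interactive session.
```

Fixing the color scale to the full -1 to 1 range keeps the colors comparable across datasets, instead of stretching them to whatever values happen to appear.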
You can now clearly see the strong correlation between sales and direct costs.
Conclusion
This demonstrates how you can carry out financial data analysis using Python. Doing challenges is by far the most effective way to learn data analysis. You can download the dataset used in this article from GitHub. Thanks for reading this article. Please like, share, and subscribe to this newsletter if you are not yet a subscriber.