Working with Time Series Data in Python

Ime Eti-mfon

Data Scientist | Machine Learning Engineer | Data Program Community Ambassador @ ALX

Published: November 13, 2024

Analyzing Trends with Pandas

Time series data analysis is crucial for identifying patterns, forecasting, and understanding trends across time. From stock prices to weather patterns, time series data is everywhere. Python’s Pandas library provides powerful tools to manipulate and analyze this data effectively. This article covers working with time series data using Pandas, including date parsing, resampling, and aggregating data over specific time intervals.

1. Setting Up the Data: Loading and Parsing Dates

Time series data usually includes a date or time column, but that column is often stored as plain text and must be converted to a datetime type before analysis. Pandas can parse date information directly during data import, allowing for smoother handling of time-based data.

Example: Loading a CSV with Date Parsing

Let’s use a sample dataset, sales_data.csv, which includes daily sales figures with date information:

import pandas as pd

# Load data and parse dates in the 'date' column
df = pd.read_csv("sales_data.csv", parse_dates=['date'])

# Display the first few rows
print(df.head())

Using parse_dates=['date'] tells Pandas to convert the ‘date’ column to datetime values while reading the file, enabling us to perform date-specific operations like filtering and resampling.
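A quick way to confirm the parsing worked is to inspect the column's dtype. The sketch below is illustrative only: it reads a small in-memory CSV in place of sales_data.csv, and the column names and values are assumptions, not the real file's contents.

```python
import io
import pandas as pd

# Hypothetical sample standing in for sales_data.csv (columns and values assumed)
csv_text = """date,sales_amount
2023-01-01,120.5
2023-01-02,98.0
2023-01-03,143.25
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# The 'date' column should be a datetime dtype rather than object (plain strings)
print(df.dtypes)

# The .dt accessor is only available on datetime columns
df["weekday"] = df["date"].dt.day_name()
print(df)
```

If the dtype still shows as object, the dates were probably in a format Pandas could not infer, and converting explicitly with pd.to_datetime is one option.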

2. Indexing with Dates for Easy Access

Setting the date column as the index is useful for time series analysis because it allows us to slice data by date ranges efficiently.

# Set 'date' column as the index
df.set_index('date', inplace=True)

# View the first few rows to confirm the index change
print(df.head())

Now, we can easily filter data by date. For example, we can view sales data for January 2023:

# Filter data for January 2023
# (use .loc: selecting rows with df['2023-01'] raises a KeyError in pandas 2.x)
january_data = df.loc['2023-01']
print(january_data)
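The same date-based selection also accepts ranges. Here is a minimal sketch on synthetic data (the values are made up for illustration) showing a whole-month selection next to an explicit date range:

```python
import pandas as pd

# Synthetic daily data for illustration only
idx = pd.date_range("2022-12-25", "2023-02-10", freq="D")
df = pd.DataFrame({"sales_amount": range(len(idx))}, index=idx)

# A partial date string selects every row in that month
january = df.loc["2023-01"]

# A slice of date strings includes both endpoints
first_week = df.loc["2023-01-01":"2023-01-07"]

print(len(january), len(first_week))  # 31 7
```

Unlike ordinary Python slicing, label-based slices on a date index include the end date.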

3. Resampling Data for Time-Based Aggregation

Time series data often requires aggregation over larger intervals. Resampling allows us to create these summary statistics by aggregating data at different frequencies (e.g., daily to monthly).

Monthly Sales Aggregation Example

Let’s calculate the total monthly sales:

# Resample data to get monthly sales
monthly_sales = df['sales_amount'].resample('M').sum()
print(monthly_sales)

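In this example, resample('M') changes the frequency to monthly (labeled by month end), and .sum() aggregates sales within each month. You could replace M with other time intervals like W (weekly) or Q (quarterly). In pandas 2.2 and later, the month-end and quarter-end aliases are spelled ME and QE, and the older M and Q emit a deprecation warning.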
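Resampling is not limited to a single statistic. As a rough sketch on synthetic data (a constant 10 per day, chosen so the totals are easy to check), .agg can compute several summaries per weekly bin at once:

```python
import pandas as pd

# Four full Monday-to-Sunday weeks of synthetic daily sales
idx = pd.date_range("2023-01-02", periods=28, freq="D")
sales = pd.Series(10.0, index=idx, name="sales_amount")

# 'W' bins end on Sunday by default; compute three statistics per bin
weekly = sales.resample("W").agg(["sum", "mean", "count"])
print(weekly)
```

Each row is labeled with the Sunday that closes the week, so a series that starts or ends mid-week produces partial bins at the edges.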

4. Time-Based Rolling Windows

Rolling windows allow us to analyze trends by calculating moving averages or other statistics over a defined period. This technique is useful for smoothing out fluctuations in time series data, giving us a clearer view of trends.

7-Day Moving Average Example

Let’s calculate a 7-day moving average for sales data:

# Calculate 7-day moving average
df['7_day_avg'] = df['sales_amount'].rolling(window=7).mean()
print(df[['sales_amount', '7_day_avg']].head(15))

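Using .rolling(window=7).mean() averages the seven most recent rows, which equals a 7-day average only when the data has exactly one row per day with no gaps. The first six values are NaN because a full window is not yet available. The result smooths daily fluctuations, giving us a better sense of the underlying trend.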
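When the index is a DatetimeIndex, the window can also be given as a time span such as '7D'. The following sketch, on synthetic values, contrasts the two forms. It is only an illustration of the difference, not a recommendation for either:

```python
import pandas as pd

# Ten days of synthetic sales: 1.0, 2.0, ..., 10.0
idx = pd.date_range("2023-01-01", periods=10, freq="D")
sales = pd.Series(range(1, 11), index=idx, dtype="float64")

# Row-based window: NaN until seven rows are available
by_rows = sales.rolling(window=7).mean()

# Time-based window: averages whatever rows fall within the last 7 days
by_time = sales.rolling(window="7D").mean()

print(pd.DataFrame({"sales": sales, "by_rows": by_rows, "by_time": by_time}))
```

The time-based form produces values from the first row onward and stays correct when some days are missing, at the cost of averaging fewer observations near gaps.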

5. Time Series Data Visualization

Visualization helps interpret time series data effectively. Let’s use Matplotlib to plot the sales data and the 7-day moving average.

import matplotlib.pyplot as plt

# Plot daily sales and 7-day moving average
plt.figure(figsize=(12,6))
plt.plot(df.index, df['sales_amount'], label="Daily Sales")
plt.plot(df.index, df['7_day_avg'], label="7-Day Moving Average", color="orange")
plt.title("Daily Sales and 7-Day Moving Average")
plt.xlabel("Date")
plt.ylabel("Sales Amount")
plt.legend()
plt.show()

This visualization reveals sales patterns and how the moving average smooths out short-term fluctuations, helping us focus on long-term trends.

Case Study: Monthly Revenue Patterns of a Retail Store

Imagine you’re working as a data analyst for a retail store with daily revenue data. The goal is to analyze monthly revenue patterns and provide insights for the store’s finance team.

Step 1: Load and Explore the Data

First, load the data, parse the date column, and set it as the index for easy slicing and resampling.

# Load data and set up time series
data = pd.read_csv("retail_revenue.csv", parse_dates=['date'])
data.set_index('date', inplace=True)

Step 2: Aggregate Revenue on Monthly Basis

With the data loaded, we calculate monthly total revenue to observe revenue trends over time.

# Aggregate revenue monthly
monthly_revenue = data['revenue'].resample('M').sum()
print(monthly_revenue)

This gives us the total revenue for each month, showing how performance varies from month to month.

Step 3: Visualize Monthly Revenue Trends

A line plot of monthly revenue reveals patterns and seasonal variations.

plt.figure(figsize=(10,6))
plt.plot(monthly_revenue.index, monthly_revenue, marker='o')
plt.title("Monthly Revenue Trend")
plt.xlabel("Month")
plt.ylabel("Total Revenue")
plt.show()

Step 4: Add a Rolling Window for Quarterly Insights

To observe quarterly trends, add a 3-month rolling average, helping the finance team identify any sustained increases or decreases.

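# Calculate 3-month moving average on the monthly totals
# (rolling over the daily rows would average 3 days, not 3 months)
three_month_avg = monthly_revenue.rolling(window=3).mean()
print(three_month_avg)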

Now, we can see both monthly totals and a rolling average for a clearer, smoother trend.
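The case study steps can be checked end to end on synthetic data. This sketch assumes a flat revenue of 100 per day in place of retail_revenue.csv, so the monthly totals simply reflect the number of days in each month. It also uses the 'MS' alias (month-start labels) rather than 'M', an assumption made here because 'MS' behaves the same across pandas versions:

```python
import pandas as pd

# Synthetic stand-in for retail_revenue.csv: 100 per day for six months
idx = pd.date_range("2023-01-01", "2023-06-30", freq="D")
data = pd.DataFrame({"revenue": 100.0}, index=idx)

# Monthly totals, labeled by the first day of each month
monthly_revenue = data["revenue"].resample("MS").sum()

# 3-month moving average computed on the monthly totals, not the daily rows
three_month_avg = monthly_revenue.rolling(window=3).mean()

summary = pd.DataFrame({
    "monthly_revenue": monthly_revenue,
    "three_month_avg": three_month_avg,
})
print(summary)
```

The first two months have no rolling value because three months of history are not yet available.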

Conclusion

Time series analysis with Pandas opens up opportunities to derive meaningful insights from time-bound data, providing analysts with the tools to parse, aggregate, and visualize trends effectively. Whether it’s daily sales or quarterly revenue, understanding these patterns enables informed decisions and better strategic planning. With Pandas’ powerful time series functions, time-based analysis becomes intuitive and highly insightful.

For more insightful articles on data analysis, follow me on Medium.
