Beneath the Surface: A Python-Based Mining Data Analysis

Have you ever wondered what happens beneath the surface of a mining operation? Just like a detective piecing together clues, I found myself diving into a world of data to uncover the mysteries of a flotation plant. As a data analyst, I was tasked with a project that revolved around June 1, 2017—a day marked for unusual activity. The stakes were high, and I was curious to see what secrets the data would reveal.

Why THIS Project?

The inspiration for this project came from the need to understand how the flotation plant operated on that particular day. The mining company flagged June 1 because of unexpected fluctuations and wanted to know whether operational inefficiencies, equipment issues, or other factors had contributed to them. My background in data analysis and passion for solving complex issues made this project not just a job, but a chance to contribute to the mining industry’s efficiency.

What Readers Will Gain

In this article, I’ll share insights from my analysis, discussing the findings and the journey I took to reach them. You’ll learn about the data I used, the analysis process, and the surprising results I encountered.

Key Takeaways

  • No anomalies were found on June 1, 2017.
  • There’s an inverse relationship between % Iron Concentrate and % Silica Concentrate.
  • Ore pulp pH levels were mostly stable, with a high frequency of values between 9.9 and 10.1.
  • Surprisingly, the main variables had no significant correlation, suggesting other influencing factors.

Dataset Details

For my analysis, I utilized a dataset sourced from Kaggle, which contained a staggering 737,454 rows and 24 variables spanning from March to September 2017. This dataset was perfect for the project as it provided minute-by-minute and hourly data, giving me the granularity needed to uncover the nuances of flotation plant operations.

Library Installation

I used Deepnote, a browser-based Integrated Development Environment (IDE), to write the Python code for exploring and visualizing the data. Before starting, I installed and imported three libraries: Pandas for data manipulation, and Seaborn and Matplotlib for creating visualizations.


Library installation code
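The screenshot for this step is not reproduced here, so the following is only a minimal sketch of what the setup could look like. The aliases (pd, sns, plt) are the conventional ones and are an assumption, not necessarily what the original notebook used.

```python
# Install once per environment (in a notebook cell, prefix the command with "!"):
#   pip install pandas seaborn matplotlib

import pandas as pd              # data loading and manipulation
import seaborn as sns            # statistical visualizations
import matplotlib.pyplot as plt  # plotting library that Seaborn builds on

# Quick check that the imports resolved
print(pd.__version__, sns.__version__)
```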

Next, I loaded the dataset into a Pandas DataFrame and reviewed a preview of the data.


Connecting Dataset to Python code


Preview of Dataset code


Dataset preview result
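As a rough illustration of loading and previewing, the sketch below reads a tiny in-memory sample instead of the real file. The rows are made up for the example, and the layout (quoted numbers with decimal commas) is an assumption about how the Kaggle CSV is formatted.

```python
import io
import pandas as pd

# Made-up sample rows in the assumed layout of the Kaggle file:
# comma-separated fields, numbers written with a decimal comma.
sample_csv = io.StringIO(
    'date,% Iron Concentrate,% Silica Concentrate,Ore Pulp pH\n'
    '2017-06-01 00:00:00,"66,91","1,31","10,07"\n'
    '2017-06-01 01:00:00,"67,06","1,11","10,02"\n'
)

# With the real download, pass the path to the CSV file instead of sample_csv.
df = pd.read_csv(sample_csv)

print(df.shape)   # (rows, columns)
print(df.head())  # first rows as a preview
```

At this point the numeric columns still hold text such as "66,91", which is what the cleaning step addresses.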

Analysis Process

The analysis began with data cleaning, which was crucial for ensuring accuracy. I noticed that some numeric values used a comma as the decimal separator, so I replaced the commas with decimal points.


Changing comma to decimal code


Changing comma to decimal result
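One way to do this replacement, sketched on made-up values: swap the comma for a point in every non-date column, then cast to float. The column names and numbers below are illustrative assumptions.

```python
import pandas as pd

# Made-up sample: numeric columns arrive as text with decimal commas
df = pd.DataFrame({
    "date": ["2017-06-01 00:00:00", "2017-06-01 01:00:00"],
    "% Iron Concentrate": ["66,91", "67,06"],
    "Ore Pulp pH": ["10,07", "10,02"],
})

# Every column except the date is expected to be numeric
numeric_cols = [col for col in df.columns if col != "date"]

for col in numeric_cols:
    # Replace the literal comma with a point, then convert text to float
    df[col] = df[col].str.replace(",", ".", regex=False).astype(float)

print(df.dtypes)
```

An alternative worth knowing is `pd.read_csv(..., decimal=",")`, which handles the conversion while the file is being read.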

Additionally, I had to redefine the data types of certain columns to ensure they were in the correct format for analysis.


Data variable type checking code


Data variable type result

The date column was imported as a string. To fix that, the code below converts the column to a datetime type using Pandas' to_datetime() function.


Redefine date code
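A small sketch of the type check and the date conversion together, using made-up rows. Printing `df.dtypes` before and after makes the change visible.

```python
import pandas as pd

# Made-up sample where the date column is still plain text
df = pd.DataFrame({
    "date": ["2017-06-01 00:00:00", "2017-06-01 01:00:00"],
    "Ore Pulp pH": [10.07, 10.02],
})
print(df.dtypes)  # "date" is listed as text here

# Parse the strings into proper timestamps
df["date"] = pd.to_datetime(df["date"])
print(df.dtypes)  # "date" is now a datetime column
```

Once the column is a datetime type, date-based filtering and time-axis plots work without further string handling.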

Once the data was clean, I generated summary statistics covering the average, median, minimum, and maximum values. I focused on the first week of June and filtered the data to home in on the variables of interest: date, % Iron Concentrate, % Silica Concentrate, ore pulp pH, and flotation column levels.


Data summary code


Data summary result
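Summary statistics like these typically come from a single `describe()` call. The sketch below uses made-up numbers purely to show where the mean, median (the 50% row), minimum, and maximum appear.

```python
import pandas as pd

# Made-up values for illustration only
df = pd.DataFrame({
    "% Iron Concentrate": [66.0, 65.0, 64.0, 67.0],
    "% Silica Concentrate": [1.0, 2.0, 3.0, 2.0],
})

summary = df.describe()

# Keep only the rows discussed in the text; "50%" is the median
print(summary.loc[["mean", "50%", "min", "max"]])
```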

Were there any anomalies on June 1, 2017?

Management indicated that something unusual occurred on June 1, 2017, and requested an investigation. To begin, I filtered the data for the first week of June and created a new data frame, df_june.


data frame for 1st week of June code


df_june result
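Filtering to a date window can be done with a boolean mask, as in this sketch. The exact boundaries (June 1 through June 7, inclusive) are an assumption about what "first week of June" meant in the original notebook, and the rows are made up.

```python
import pandas as pd

# Made-up rows straddling the assumed window
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-05-31 23:00", "2017-06-01 00:00",
        "2017-06-07 23:00", "2017-06-08 00:00",
    ]),
    "Ore Pulp pH": [9.8, 10.0, 10.1, 9.9],
})

start = pd.Timestamp("2017-06-01")
end = pd.Timestamp("2017-06-08")  # exclusive upper bound

# Keep rows with start <= date < end
mask = (df["date"] >= start) & (df["date"] < end)
df_june = df.loc[mask].reset_index(drop=True)

print(df_june)
```

Using an exclusive upper bound avoids having to spell out the last timestamp of June 7.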


I then created a new DataFrame, df_june_important, by selecting the five key columns from df_june. This kept the rest of the analysis focused on those variables for the first week of June.


df_june_important code


df_june_important result
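Selecting a subset of columns might look like the sketch below. The exact column spellings, the extra "Starch Flow" column, and the values are assumptions made for illustration.

```python
import pandas as pd

# Made-up frame with one extra column that should be dropped
df_june = pd.DataFrame({
    "date": pd.to_datetime(["2017-06-01 00:00", "2017-06-01 01:00"]),
    "Starch Flow": [3019.5, 3024.4],  # hypothetical extra column
    "% Iron Concentrate": [66.91, 67.06],
    "% Silica Concentrate": [1.31, 1.11],
    "Ore Pulp pH": [10.07, 10.02],
    "Flotation Column 05 Level": [502.3, 496.4],
})

important_cols = [
    "date",
    "% Iron Concentrate",
    "% Silica Concentrate",
    "Ore Pulp pH",
    "Flotation Column 05 Level",
]

# .copy() makes the subset independent of df_june
df_june_important = df_june[important_cols].copy()

print(df_june_important.head())
```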

Using Python’s Seaborn library, I visualized the data with a pair plot to explore potential relationships between the variables.


Seaborn pairplot code


Seaborn pairplot

It was surprising to see that the relationship between the key variables was weak, indicating that other factors might be influencing the concentrations.

To validate this, I generated a correlation matrix. As expected, it showed low correlation values, confirming the weak relationships seen in the pair plot.


df_june_important correlation code


df_june_important correlation result
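A correlation matrix is one method call in Pandas. The sketch below uses made-up numbers, so the coefficients it prints say nothing about the real plant data.

```python
import pandas as pd

# Made-up values for illustration only
df_june_important = pd.DataFrame({
    "% Iron Concentrate": [66.9, 67.1, 65.3, 64.8],
    "% Silica Concentrate": [1.3, 1.1, 2.9, 3.4],
    "Ore Pulp pH": [10.07, 10.02, 9.95, 9.98],
})

# Pairwise Pearson correlation between all numeric columns
corr = df_june_important.corr()

print(corr.round(2))
```

Values near 0 indicate a weak linear relationship, while values near -1 or 1 indicate a strong one.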


Fluctuations in % Iron Concentrate, % Silica Concentrate, Flotation Column 05 Level

Management wanted to understand how the concentrate percentages change throughout the day, as the earlier findings had raised more questions. I used Seaborn to create line plots to visualize these daily fluctuations.

Line

The line plot for % Iron Concentrate showed fluctuations, particularly around 11 a.m. This was fascinating to observe, as it prompted questions about what operational changes were occurring at that time.


Fluctuations in % Iron Concentrate

The line plot for % Silica Concentrate showed multiple fluctuations, particularly around 5 a.m., 11 a.m., and 6 p.m.


Fluctuations in % Silica Concentrate

Similarly, a pronounced drop in the Flotation Column 05 Level around 3 p.m. raised eyebrows.


Fluctuations in Flotation Column 05 Level

Ore Pulp pH Level Histogram

In examining the ore pulp pH levels, I created histograms that showed a high frequency of values between 9.9 and 10.1, which fell within acceptable limits. This consistency was a relief to see, as it indicated a stable process.


Ore pulp pH histogram code


Ore pulp pH histogram result
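A histogram like this can be drawn directly with Matplotlib. The sketch below uses made-up pH readings and hand-picked bin edges (both assumptions) so that the 9.9 to 10.1 band is its own bar.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs as a script; not needed in a notebook

import matplotlib.pyplot as plt
import pandas as pd

# Made-up pH readings for illustration only
df = pd.DataFrame({
    "Ore Pulp pH": [9.6, 9.8, 9.95, 10.0, 10.0, 10.05, 10.08, 10.2],
})

bin_edges = [9.5, 9.7, 9.9, 10.1, 10.3]

fig, ax = plt.subplots()
counts, edges, _ = ax.hist(df["Ore Pulp pH"], bins=bin_edges, edgecolor="black")
ax.set_xlabel("Ore Pulp pH")
ax.set_ylabel("Frequency")

# Share of readings that fall inside the 9.9 to 10.1 band
share_in_band = df["Ore Pulp pH"].between(9.9, 10.1).mean()
print(counts, share_in_band)
```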

Main Takeaways

This project reinforced several key points in my data analysis journey:

  • Cleaning data in Python is critical for accurate analysis; simple formatting changes can make a big difference.
  • Generating summary statistics quickly provides essential insights that guide further analysis.
  • Visualization tools like line plots and correlation matrices are invaluable in identifying trends and relationships, or the lack thereof.
  • The unexpected lack of correlation among key variables suggests that factors outside the data set may be influencing the results, prompting further investigation.

Conclusion and Personal Reflections

Reflecting on this project, I learned the importance of thorough data cleaning and the value of visualizing data to uncover trends. While I faced challenges, such as the initial formatting issues, I found solutions through patience and experimentation. This project has shaped my perspective on data analysis, highlighting the complexities of operational data in the mining industry.

I’m excited about the future and how I can apply these insights to improve processes in various industries.

Call To Action

I would love to hear your thoughts on this analysis! Connect with me on LinkedIn, and if you're looking to hire a data analyst or have questions about my project, let’s chat. Leave a comment below with your insights or queries!



Such a clear and interesting article as always! Love the fact that you explained everything so well, and that you mentioned useful tools like pairplot() and a correlation matrix - Bravo!

Laura S.

Data Analyst | Scientist | Excel | Power BI | MySQL | Tableau | R | Python

2 months ago

Great project and write up Omhari!!!

Joseph Pascual

Data Analyst | Leveraging SQL, Tableau, and Excel to Drive Data-Driven Insights

2 months ago

I enjoyed how well this flowed and how you broke down each step. The screenshots made it easy to follow along. I noticed in some charts the X-axis date/time stamps are a bit blended together and make it hard to read. (I’ve seen that LinkedIn articles will squeeze large images into a smaller aspect ratio and throw off chart formatting!) Curious if you’ve tried rotating the x-axis values? A line of code I like to use before plt.show() is:

plt.xticks(rotation=45)  # 45 being degrees of rotation

I hope this helps! Overall, you’ve put together a great analysis and communicated your insights effectively.

Erin Balatayo

Data Analyst | Business Analyst | Driving Business Insights for Growth | SQL, Tableau, Excel, Python

2 months ago

Great job communicating your findings. If I was learning Python for the first time, it would be easy to follow!

Isaac Oresanya

Data Analyst @ DCJ | Helping businesses find clarity in data | Web scraping & analytics with Python, Tableau & SQL | Open to freelance gigs

2 months ago

Your insights are very well communicated. Nice work!
