Digging into Data: What I Discovered at Metals R' Us Flotation Plant

When I first walked into the world of data analysis, I never imagined I would be knee-deep in mining data, specifically from a flotation plant. While working on my project, I felt a bit like a detective, uncovering the hidden stories behind numbers. Who knew that mining operations could be so complex? This project not only piqued my interest in data but also made me appreciate the careful balance of nature and technology in the mining industry.

My journey began when I stumbled upon a dataset from a flotation plant at a mining company called Metals R' Us. As someone fascinated by how data can drive business decisions, I was drawn to this project. The prospect of analyzing real-world data to help improve mining efficiency felt both exciting and important. I wanted to find patterns that could lead to better resource management and optimization in a field I hadn’t previously explored.

In this article, I’ll share the insights I discovered while analyzing the flotation data. You’ll learn about the relationships (or lack thereof) between key variables in the mining process, the surprises I encountered, and how this data analysis journey reshaped my understanding of the mining sector.

Key Takeaways

  • No significant correlations were found between % Iron Concentrate, % Silica Concentrate, and other key variables on selected dates and throughout the week.
  • An intriguing inverse relationship exists between % Silica Feed and % Iron Feed: when one increases, the other decreases.
  • The pH levels of Ore Pulp generally follow a normal distribution, with the majority of samples falling between 9.5 and 10.2. This suggests that the plant's pH control is stable.

Dataset Details

I used a dataset sourced from Kaggle, comprising 737,453 rows and 24 columns of data collected between March and September 2017. The data was a bit messy, with some columns sampled every 20 seconds while others were sampled hourly. This mix made it a unique challenge, but also added to the richness of the analysis.


Code to learn more about the dataframe.

Analysis Process

My analysis journey started with data cleaning and transformation. I converted the ‘date’ column to a datetime format to ensure proper analysis. I then explored the dataset using various Python libraries like Pandas and Seaborn.


Code to learn more about the structure of the dataframe.
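A minimal sketch of this loading and conversion step might look like the following. The column names and the comma decimal separator are assumptions based on the public Kaggle file, `load_flotation_csv` is a hypothetical helper name, and the three sample rows are made-up stand-ins rather than real plant readings.

```python
import io

import pandas as pd

# Tiny stand-in for the Kaggle CSV (made-up values, not real plant data).
# Assumption: the real file quotes numbers and uses a comma as the decimal mark.
SAMPLE_CSV = io.StringIO(
    'date,% Iron Feed,% Silica Feed,Ore Pulp pH\n'
    '2017-06-01 00:00:00,"55,2","16,98","10,06"\n'
    '2017-06-01 01:00:00,"55,2","16,98","9,97"\n'
    '2017-06-02 00:00:00,"57,1","14,10","9,85"\n'
)


def load_flotation_csv(source):
    """Read the CSV and parse the 'date' column as datetime (hypothetical helper)."""
    df = pd.read_csv(source, decimal=",")
    df["date"] = pd.to_datetime(df["date"])
    return df


df = load_flotation_csv(SAMPLE_CSV)
df.info()  # prints column names, dtypes, and non-null counts
```

With the real file, `source` would be the path to the downloaded CSV instead of the in-memory sample.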

I wanted to explore descriptive statistics for the columns to provide a comprehensive summary.


Code to get descriptive statistics from dataframe.
Code to get the min and max date in the dataframe.
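As a rough illustration, the summary statistics and the date range can be pulled with two short pandas calls. The frame below is a made-up placeholder with assumed column names, not the plant data.

```python
import pandas as pd

# Placeholder frame with made-up values; column names are assumed.
df = pd.DataFrame({
    "date": pd.to_datetime(["2017-03-15 01:00", "2017-06-01 12:00", "2017-09-01 23:00"]),
    "% Iron Concentrate": [66.9, 65.1, 64.0],
    "% Silica Concentrate": [1.3, 2.1, 3.4],
})

# count, mean, std, min, quartiles, and max for every numeric column
summary = df.select_dtypes("number").describe()

# Earliest and latest timestamps in the data
first_ts, last_ts = df["date"].min(), df["date"].max()

print(summary)
print(first_ts, "to", last_ts)
```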

To check for relationships between variables, I used a combination of pair plots and correlation matrices. It was surprising to find that many of the correlations I expected simply weren’t there. For instance, on June 1, 2017, flotation levels and concentrate percentages showed no clear relationship.

One of the most interesting findings was the inverse relationship between % Silica Feed and % Iron Feed. This alone made me rethink how the inputs in mining operations could affect outputs in unexpected ways.

Visuals and Insights

A supervisor asked me to look specifically at June 1, 2017.


Code to create dataframe for June 1, 2017.
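One way to pull out a single day, sketched here with a made-up frame, is to compare each timestamp's calendar date against the target day. `rows_for_day` is a hypothetical helper name, and the values are placeholders.

```python
import pandas as pd

# Placeholder frame with made-up values; 'date' is already datetime.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-05-31 23:00", "2017-06-01 00:00",
        "2017-06-01 13:00", "2017-06-02 00:00",
    ]),
    "% Iron Concentrate": [65.0, 66.1, 64.8, 65.5],
})


def rows_for_day(frame, day):
    """Return the rows whose timestamp falls on the given calendar day."""
    target = pd.Timestamp(day).normalize()
    return frame[frame["date"].dt.normalize() == target]


june_first = rows_for_day(df, "2017-06-01")
print(june_first)
```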


Code to create visualization for "important columns" and display.

  • Pairplot Visuals: I used a pairplot in order to visually see how the data related. My pairplot didn’t show any strong relationships between the variables. This was surprising and made me reconsider how I approached the data.


Code to create pairplot to visualize the relationships between % Iron Concentrate, % Silica Concentrate, Ore Pulp pH, and the Average Flotation Column 5


Pairplot to visualize the relationships between % Iron Concentrate, % Silica Concentrate, Ore Pulp pH, and the Average Flotation Column 5.


Pairplots to visualize the relationships between % Iron Concentrate, % Silica Concentrate, Ore Pulp pH, and the Average Flotation Column 5.

  • Correlation Matrix: The matrix confirmed what the pairplot suggested—no strong correlations. It made me ponder the complexities of the flotation process and why these variables might not connect as anticipated.


Code and Correlation Matrix
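For reference, a correlation matrix over the four variables can be computed roughly as follows. The values are made up, and "Flotation Column 05 Level" is an assumed column name standing in for the Flotation Column 5 average.

```python
import pandas as pd

# Assumed column names; the numbers are placeholders, not plant data.
cols = [
    "% Iron Concentrate",
    "% Silica Concentrate",
    "Ore Pulp pH",
    "Flotation Column 05 Level",
]
df = pd.DataFrame({
    "% Iron Concentrate": [66.9, 65.2, 64.8, 66.1, 65.5],
    "% Silica Concentrate": [1.3, 2.4, 3.0, 1.8, 2.2],
    "Ore Pulp pH": [10.1, 9.7, 9.9, 9.6, 10.0],
    "Flotation Column 05 Level": [450.0, 520.0, 480.0, 510.0, 470.0],
})

# Pairwise Pearson correlation (the pandas default); values range from -1 to 1
corr = df[cols].corr()
print(corr.round(2))
```

Values near 0 indicate little linear relationship, which is what the analysis found for these variables.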

  • Line Charts: I created line charts to observe how the data changed over time and across different days of the week. After seeing this visual, I decided to create one with all four variables. The lack of correlation again stood out, leading me to think about external factors that could impact these metrics, such as operational shifts or environmental changes.


Code and Line Chart with the averages for % Iron Concentrate on June 1st, 2017.


Code to create Line Charts with the averages for % Iron Concentrate, % Silica Concentrate, Ore Pulp pH, and the Average Flotation Column 5 for June 1st, 2017.


Line Charts with the averages for % Iron Concentrate, % Silica Concentrate, Ore Pulp pH, and the Average Flotation Column 5 for June 1st, 2017.

I wanted to know how the variables changed over the entire period.


Code to create Line Charts with the averages for % Iron Concentrate, % Silica Concentrate, Ore Pulp pH, and the Average Flotation Column 5 between March and September.
Line Charts with the averages for % Iron Concentrate, % Silica Concentrate, Ore Pulp pH, and the Average Flotation Column 5 between March and September.

I wanted a closer look at the data and broke it down to averages by the days of the week.


Code to convert the date column to datetime and create a new column for the day of the week.


Code to find the average values for each day of the week.
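The day-of-week breakdown can be sketched like this, using a small made-up frame. The `day_of_week` column name is an assumption for illustration.

```python
import pandas as pd

# Placeholder frame with made-up values.
df = pd.DataFrame({
    "date": ["2017-06-05 08:00", "2017-06-05 09:00",
             "2017-06-06 08:00", "2017-06-12 08:00"],
    "Ore Pulp pH": [9.8, 10.0, 9.6, 10.2],
})

# Convert to datetime, then label each row with its weekday name
df["date"] = pd.to_datetime(df["date"])
df["day_of_week"] = df["date"].dt.day_name()

# Average per weekday, ordered Monday to Sunday; days without data show NaN
weekday_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]
by_day = df.groupby("day_of_week")["Ore Pulp pH"].mean().reindex(weekday_order)
print(by_day)
```

Reindexing matters because `groupby` sorts the weekday names alphabetically by default, which would scramble the order on a chart.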

I then added a visual element to the data.


Code to create Line Charts for averages for % Iron Concentrate, % Silica Concentrate, Ore Pulp pH, and the Average Flotation Column 5 by the day of the week


Line Charts comparing the averages for % Iron Concentrate, % Silica Concentrate, Ore Pulp pH, and the Average Flotation Column 5 by the day of the week.

How did the flotation columns relate to each other across the days of the week?


Code to create Line Charts for the flotation rates.


Line Chart Comparing the Average Flotation Rate

The average flotation rate shows no clear pattern across the days of the week.

Next, I compared % Feed against % Concentrate for both iron and silica.


Code to create Line Chart for % Silica Feed vs % Silica Concentrate


Code to create line charts for % Iron Feed and % Iron Concentrate


Line Chart comparing % Silica Feed vs % Silica Concentrate from March to September


Line Chart comparing % Iron Feed vs % Iron Concentrate from March to September

There is an intriguing inverse relationship between % Silica Feed and % Iron Feed: as one increases, the other decreases.
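An inverse relationship like this can also be put into a single number with a Pearson correlation coefficient, where values close to -1 mean the two series move in opposite directions. The sketch below uses made-up feed percentages purely to show the call; it is not a result from the plant data.

```python
import pandas as pd

# Made-up feed percentages that move in opposite directions.
df = pd.DataFrame({
    "% Iron Feed": [50.0, 55.0, 60.0, 65.0],
    "% Silica Feed": [25.0, 19.0, 12.0, 6.0],
})

# Pearson correlation between the two feed columns
r = df["% Iron Feed"].corr(df["% Silica Feed"])
print(f"Pearson r = {r:.3f}")
```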

  • Bubble Plot: The bubble plot illustrated the inverse relationship between % Silica Feed and % Iron Feed perfectly. It was almost like a dance—when one went up, the other went down. This prompted me to consider the operational decisions miners might need to make based on these findings. I chose to analyze the data on a monthly basis and updated the code accordingly to reflect each month separately.


Code to individualize Bubble Plot based on Month
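Splitting the data by month could be done along these lines. This is a sketch with made-up values; the `month` column and the `monthly_frames` dictionary are names I am assuming for illustration, and each monthly frame would then feed its own plot.

```python
import pandas as pd

# Placeholder frame with made-up values.
df = pd.DataFrame({
    "date": pd.to_datetime(["2017-03-15", "2017-03-20", "2017-04-02", "2017-04-18"]),
    "% Iron Feed": [52.0, 58.0, 55.0, 61.0],
    "% Silica Feed": [22.0, 14.0, 18.0, 10.0],
})

# Label each row with its month name
df["month"] = df["date"].dt.month_name()

# One sub-frame per month, kept in order of first appearance
monthly_frames = {month: group for month, group in df.groupby("month", sort=False)}

for month, group in monthly_frames.items():
    print(month, len(group), "rows")
```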


Code to create Bubble Plot


Bubble Plots of % Iron Feed vs % Silica Feed Overall and March-May


Bubble Plots of % Iron Feed vs % Silica Feed June-September

  • Histogram: The histogram shows the distribution of Ore Pulp pH levels across the dataset. The x-axis shows the pH levels, while the y-axis shows the frequency of each pH range. The bell-shaped curve indicates that most pH levels cluster around the average, which is consistent with an approximately normal distribution.


Code to create the Histogram for Ore Pulp pH.


Histogram showing the Frequency of the Ore Pulp pH.


The pH levels of Ore Pulp generally follow a normal distribution, with the majority of samples falling between 9.5 and 10.2. This suggests that the plant's pH control is stable. The bump at pH 8.75 is noteworthy: more samples fall near this value than a bell curve would predict, indicating a possible anomaly or a specific condition affecting this subset of the data.
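A claim like "the majority of samples fall between 9.5 and 10.2" can be checked numerically with something like the sketch below. The pH values and the bin edges are made up for illustration, so the printed share is not a result from the plant data.

```python
import pandas as pd

# Made-up pH readings, including one low value near 8.75.
ph = pd.Series([8.75, 9.4, 9.6, 9.8, 9.9, 10.0, 10.1, 10.3], name="Ore Pulp pH")

# Share of samples inside the 9.5-10.2 band (both ends inclusive)
share_in_band = ph.between(9.5, 10.2).mean()

# Histogram-style counts per half-unit pH bin; the bin edges are an assumption
counts = pd.cut(ph, bins=[8.5, 9.0, 9.5, 10.0, 10.5]).value_counts(sort=False)

print(f"{share_in_band:.1%} of samples fall between 9.5 and 10.2")
print(counts)
```

A bin far from the main cluster with an unexpectedly high count, like the one around 8.75 in the real data, is the kind of bump worth investigating.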

Main Takeaways

From this analysis, I learned that mining data can be as unpredictable as it is vast.

  • The absence of strong correlations suggests that other factors, like operational practices or environmental conditions, could be influencing the flotation process.
  • Understanding the inverse relationship between % Silica and % Iron Feed could lead to more informed decisions about resource allocation and processing approaches in the flotation plant.
  • This experience has reshaped my perspective on the mining industry, emphasizing the need for comprehensive data analysis to uncover hidden insights.
  • The presence of a bell curve indicates that most pH levels cluster around the average, reinforcing the assumption of a normal distribution within the specified range. The unexpected increase in frequency at pH 8.75 could indicate that there are specific conditions or factors that lead to this lower pH level. It may be worth investigating further to understand why this is happening.

Conclusion and Personal Reflections

Reflecting on this project, I faced challenges, especially during the data cleaning phase, where I had to wrangle the messy data into a usable format. With persistence and a little creativity, I managed to overcome these hurdles. This project has not only enriched my analytical skills but also sparked a deeper interest in how data can drive operational improvements in various industries.

If you found this project intriguing or have insights on data analysis in the mining sector, I’d love to connect! Let’s discuss your thoughts, share experiences, or even explore potential job opportunities. Feel free to leave a comment or reach out to me on LinkedIn (Dianna Green M.Ed)!
