Digging Some Mines With Python

Introduction

Imagine joining a company where every second counts. This company transforms massive clumps of dirt into valuable minerals, extracting iron from dirt, sand, and other impurities. Purity is the goal, and your job is to determine the optimal moments to extract specific substances.

Mine.


The challenge doesn't end there—the data is a mess. It's even messier than the clumps the company digs up. You need to use your analytical skills to make sense of this chaos and uncover insights that could help the company extract more iron to sell.

Through a connection with Avery Smith, I got the opportunity to analyze this dataset using Python. Python has always been my favorite programming language, standing out from others like C++ or Java. Its readability and ease of use make it particularly appealing, so I chose to tackle this project with Python to extract valuable insights from the mineral dataset.

What I Learned

Correlation of Several Important Features

  • The strong negative correlation between the concentration of Iron and Silica: It turns out that the processes or conditions that result in higher iron purity tend to result in lower silica content. This is valuable information for optimizing the flotation plant’s operations to maximize iron purity while minimizing silica impurities.
  • Visualizing the data with the Seaborn package made the evidence clearer: Line graphs of the Iron and Silica concentrations over time make the strong negative correlation between the two even more evident.
  • Flow of each substance might change due to the adjustment of operational schedules: The flow of reagents for the mixing process changes throughout the week, likely to meet weekly targets, followed by maintenance on the weekend.
  • August turns out to have more low pH substances in ore pulp: By comparing four months from May to August, it's interesting to note that more low pH chemicals were mixed in the pulp during August. This could significantly impact the flotation process results.

The Data

The data for this project was sourced from Kaggle and covers the mining dataset from March to September 2017.

Features Include:

  • Concentration percent of Iron and Silica
  • Starch and Amina flow during the mixing procedure of the flotation plant
  • Froth level, air flow, pH value, and density of ore pulp

In this analysis, I used Deepnote for all Python-based analysis tasks. Additionally, I used Carbon to export nicely formatted images of select portions of the Python code.

For the full code, please check out my Deepnote notebook here!


Analysis

Data Preparation

Here are some fundamental steps I took before analyzing the insights from this dataset.

Fundamental Python Packages

First, I utilized common Python packages such as Pandas, Seaborn, and Matplotlib to assist with the analysis.

Make sure the date values are properly displayed
The maximum and minimum date value

Since the data exported from Kaggle was in CSV format, I imported it using the read_csv function.

Next, I noticed that the values in the date column were imported as strings, which isn't suitable for time series analysis. To resolve this, I converted the date column to Timestamp format, ensuring the dates could be accurately handled in my analysis.

Finally, by retrieving the maximum and minimum date values, I confirmed that the data covered the period from March 2017 to September 2017.
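For readers who cannot see the code images, the loading steps described here might look roughly like the sketch below. It is not the notebook's actual code: the `load_mining_data` helper, the `decimal=","` setting, and the three sample rows are assumptions made so the example runs on its own. Only the `date`, `% Iron Concentrate`, and `% Silica Concentrate` column names come from the article.

```python
import io

import pandas as pd


def load_mining_data(source) -> pd.DataFrame:
    """Read the CSV and convert the 'date' column from strings to Timestamps."""
    # Assumption: decimal="," suits a file that writes numbers as "66,91".
    # Remove the argument if your copy of the data uses decimal points.
    df = pd.read_csv(source, decimal=",")
    df["date"] = pd.to_datetime(df["date"])
    return df


# A tiny in-memory stand-in for the real file (made-up values).
sample_csv = io.StringIO(
    'date,% Iron Concentrate,% Silica Concentrate\n'
    '2017-03-10 01:00:00,"66,91","1,31"\n'
    '2017-06-15 12:00:00,"65,05","2,40"\n'
    '2017-09-09 23:00:00,"64,03","1,87"\n'
)

df = load_mining_data(sample_csv)
print(df.dtypes)  # 'date' should now be a datetime column
print("First record:", df["date"].min())
print("Last record: ", df["date"].max())
```

With the real export, passing the path of the downloaded CSV to `load_mining_data` in place of `sample_csv` should behave the same way.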

The Strong Negative Impact Between Iron & Silica

Suppose I were asked by a stakeholder to determine if there's a relationship between the extracted amounts of Iron and Silica.

The first step I took was to extract values from a specific day, storing the data in df_july for July 1st.

In addition to the essential columns like 'date', '% Iron Concentrate', and '% Silica Concentrate', I included 'Ore Pulp pH' and 'Flotation Column 05 Level' to explore any additional relationships between these key elements and other attributes.
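A minimal sketch of that extraction step follows. The `select_day` helper is a hypothetical name, and the placeholder readings are made up purely so the snippet runs without the real file. The column names and the `df_july_important` variable are the ones mentioned in the article.

```python
import pandas as pd

# Columns named in the article.
IMPORTANT_COLUMNS = [
    "date",
    "% Iron Concentrate",
    "% Silica Concentrate",
    "Ore Pulp pH",
    "Flotation Column 05 Level",
]


def select_day(df: pd.DataFrame, day: str, columns: list) -> pd.DataFrame:
    """Return the rows whose 'date' falls on the given calendar day."""
    start = pd.Timestamp(day)
    end = start + pd.Timedelta(days=1)
    on_day = (df["date"] >= start) & (df["date"] < end)
    return df.loc[on_day, columns].reset_index(drop=True)


# Made-up hourly readings spanning June 30 to July 2, 2017.
dates = pd.date_range("2017-06-30 22:00", periods=30, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"date": dates})
for column in IMPORTANT_COLUMNS[1:]:
    df[column] = 1.0  # placeholder values only, not real measurements

df_july_important = select_day(df, "2017-07-01", IMPORTANT_COLUMNS)
print(df_july_important.shape)  # (24, 5): one row per hour of July 1st
```

Filtering with a half-open interval (`>= start` and `< end`) keeps every timestamp on July 1st without accidentally including midnight of July 2nd.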

Code to extract data from a specific date
Overview of the data stored in df_july_important

To examine the relationships among these values, I used correlation matrices. Initially, I created a pair plot to visualize any patterns, as visual representations like charts and graphs provide an intuitive understanding of relationships.

To ensure there were no biases in the patterns, I also generated a correlation heatmap to support the visual analysis with numerical data.

Code for Pair Plot and Correlation Matrices

From the analysis, it is evident that there's a strong negative relationship between the Iron and Silica concentration percentages. The heatmap indicates a correlation of approximately -0.96 between them.

This finding suggests that, at least for the July 1st sample, a higher percentage of Iron in the concentrate corresponds to a lower percentage of Silica, and vice versa.

Correlation Pairplot Between Each Feature


Correlation Heatmap Between Each Feature

Substance Change During a Week

To gain a more precise view of substance changes during the chemical process, I examined the changes over the course of a week.

Using the code below, I extracted data from the first seven days of August to observe any variations occurring each day. Along with the concentration percentages of Iron and Silica, I included the flow of Ore pulp, Amina, and Starch to inspect changes in other chemical quantities during the process.
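The weekly view could be built along the lines of the sketch below. It uses plain Matplotlib instead of whatever plotting call the original notebook used, and the `plot_week` helper and the three flow column names (`Ore Pulp Flow`, `Amina Flow`, `Starch Flow`) are assumptions based on the description above.

```python
import matplotlib.pyplot as plt
import pandas as pd

# The flow column names are assumed; adjust them to match your file.
WEEK_COLUMNS = [
    "% Iron Concentrate",
    "% Silica Concentrate",
    "Ore Pulp Flow",
    "Amina Flow",
    "Starch Flow",
]


def plot_week(df: pd.DataFrame, start: str, columns=WEEK_COLUMNS):
    """Plot each column for the seven days beginning at `start`."""
    first = pd.Timestamp(start)
    last = first + pd.Timedelta(days=7)
    week = df[(df["date"] >= first) & (df["date"] < last)]

    # Average readings that share a timestamp so each line has one
    # point per time value.
    per_timestamp = week.groupby("date")[list(columns)].mean()

    fig, axes = plt.subplots(len(columns), 1, sharex=True, squeeze=False,
                             figsize=(10, 2.5 * len(columns)))
    for ax, column in zip(axes[:, 0], columns):
        ax.plot(per_timestamp.index, per_timestamp[column])
        ax.set_ylabel(column)
    axes[-1, 0].set_xlabel("date")
    fig.tight_layout()
    return per_timestamp, fig
```

For the first week of August, the call would be `plot_week(df, "2017-08-01")`, with `df` being the full DataFrame after the date conversion.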

Code to create line plots for specific attributes in a week

First, it's apparent that all three flows (Ore pulp, Amina, and Starch) showed increased activity towards the end of the week, peaking on Saturday. This suggests heightened plant activity and processing efforts towards the end of the week, likely to meet specific production targets before the week ends.

Next, there are sharp drops around Sunday and Monday. These drops in flow rates and concentrate levels might indicate weekend adjustments, such as reduced operations and maintenance activities. This could also suggest that after meeting production targets on Saturday, the plant undergoes maintenance to prevent potential issues in future operations.

Lastly, the sharp increase in Silica concentrate on Friday and the corresponding drop in Iron concentrate highlight the significant inverse relationship observed in the correlation matrices.

These insights emphasize the importance of continuous monitoring and adjustments to maintain optimal performance throughout the week.

Line plots for the change of a week

Analysis of pH Levels

Code for creating histograms and line plots for a specific period

In the final step of this analysis, I aimed to identify any adjustments in pH levels across different months.

Using the code above, I created histograms and line plots to visualize pH levels over time.
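Because the code image is not reproduced in the text, here is a rough sketch of how such a monthly comparison could be set up. Both helper functions are hypothetical, the bin count and layout are arbitrary choices, and grouping by month number assumes all rows fall within a single year, as they do in this 2017 dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd


def monthly_ph_mean(df: pd.DataFrame, months, column: str = "Ore Pulp pH"):
    """Average pH for each of the requested calendar months."""
    in_range = df[df["date"].dt.month.isin(months)]
    return in_range.groupby(in_range["date"].dt.month)[column].mean()


def plot_ph_by_month(df: pd.DataFrame, months, column: str = "Ore Pulp pH"):
    """Top row: pH histogram per month. Bottom row: daily mean pH."""
    fig, axes = plt.subplots(2, len(months), squeeze=False,
                             figsize=(4 * len(months), 6))
    for i, month in enumerate(months):
        monthly = df[df["date"].dt.month == month]
        daily = monthly.groupby(monthly["date"].dt.date)[column].mean()

        axes[0, i].hist(monthly[column], bins=30)
        axes[0, i].set_title(f"Month {month}")
        axes[0, i].set_xlabel(column)

        axes[1, i].plot(daily.index, daily.values)
        axes[1, i].tick_params(axis="x", rotation=45)
    fig.tight_layout()
    return fig
```

For the May to August comparison, `monthly_ph_mean(df, [5, 6, 7, 8])` gives the numbers behind the plots, and `plot_ph_by_month(df, [5, 6, 7, 8])` draws them.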

From the histograms and line graphs, a noticeable decline in average pH levels is observed, particularly towards the end of August. This may indicate that more low pH chemicals were added during the process.

Several factors could contribute to this lowering of pH levels:

  1. Operational Adjustments: Changes in process parameters or the introduction of different chemicals to optimize the flotation process.
  2. Input Material Variability: Variations in the composition of input materials, necessitating adjustments in chemical additions.
  3. Process Inefficiencies: Possible inefficiencies or issues within the flotation process that required corrective measures, such as adding low pH chemicals.

Histograms and Line Plots for pH Levels

Conclusion

This project demonstrates the significant role that data analysis plays in optimizing the operations of a mining company. By leveraging Python for data processing and visualization, I was able to uncover valuable insights into the flotation process.

Key findings from the analysis include:

  1. Strong Negative Correlation Between Iron and Silica: A notable inverse relationship was identified between the percentages of Iron and Silica concentrates, with a correlation coefficient of approximately -0.96. This indicates that higher Iron extraction is associated with lower Silica content and vice versa.
  2. Substance Changes During the Week: The analysis revealed increased plant activity towards the end of the week, with peaks in the flow rates of Ore pulp, Amina, and Starch on Saturdays. This likely reflects efforts to meet production targets. Sharp drops in activity on Sundays and Mondays suggest maintenance or reduced operations following peak production periods.
  3. pH Level Adjustments: A decline in average pH levels, especially towards the end of August, was observed. This decrease likely resulted from the addition of low pH chemicals due to operational adjustments, input material variability, or process inefficiencies.

The insights gained from this project underscore the importance of continuous monitoring and timely adjustments in the chemical process to maintain optimal performance and efficiency. By understanding the relationships between various process parameters and their impacts on output quality, the company can make informed decisions to enhance its operations.

Overall, this analysis highlights the power of data-driven approaches in industrial settings and the value of using tools like Python to unlock actionable insights from complex datasets!


Call to Action

I hope you enjoyed this deep dive into mining data analysis! Exploring the complexities of the flotation process and discovering actionable insights has been an exciting and enlightening experience.

If you have any questions or are interested in collaborating on future projects, please reach out to me at [email protected].

For more intriguing projects and analyses, visit my portfolio website here.

Have a fantastic day!

Jercika Procel

Order Management Analyst | Data Analytics | Excel | SQL | R | Tableau | Data Visualization

8 months ago

Such a cool analysis using Python! Can't wait to get there. Really well done, Andy!

Victor Yakubu

Technical Storyteller, Simplifying the web and writing technical documentation | Tech Community Manager | Software Engineer | Technical Writer | DevRel

9 months ago

Great article Andy. Particularly loved how detailed and diagrams you included. Well done. Do you have a blog where you publish?

Melody Santos

Charge Integrity Analyst with a background in Physical Therapy

9 months ago

Well done Andy Chang!

Sridhar Manthripragada

Data scientist with 17+ years of experience, seeking new and challenging opportunities to do well.

9 months ago

Nice. Can you share the data set please? I would like to have a go at it myself.

Alex Haycock

Project Coordinator | Business Analyst | Excel, Google Sheets, SQL, R, Tableau, Power BI

9 months ago

This is a great project Andy! Really well written
