Unearthing Insights: Python Data Analysis in the Mining Industry

Introduction:

This project involves analyzing real-world data from a flotation plant at Metals R' Us, a mining company that specializes in extracting iron from impurities. The main focus of the analysis is the "% Iron Concentrate" variable, which indicates the purity of the extracted iron. By using Python to analyze the data, this project aims to uncover hidden insights that can improve the extraction process, reduce costs, and increase efficiency. The project showcases the power of data analysis in optimizing traditional processes and highlights the importance of skilled data analysts in unlocking valuable insights from complex data sets.

The dataset for this project can be found here.

These are the business questions I set out to answer:

  1. Was there an unnatural occurrence in the sample collection process on 6.1.2017?
  2. Is there any correlation/relationship between the variables we found?
  3. Is there a change in the Amina Flow throughout the day? Are there other variables that we can present on?
  4. What is the difference between Iron Ore before flotation versus after?
  5. Is there a month that had greater concentration in Iron? What about Silica?
  6. Does Starch Flow affect the Concentration of Iron?

Key Insights:

  • According to the data studied, nothing unusual happened on June 1, 2017.
  • There is no association between the variables we pulled and looked at, according to the Seaborn pair plots and the correlation matrix.
  • The readings do change from hour to hour within the day, according to the line charts for Amina Flow, Iron Concentrate, Silica Concentrate, Ore Pulp pH, and Flotation Column 05 Level.
  • The best results came from the ore treated between May 13 and June 15.
  • The average iron content after purification is roughly 65%. Silica, however, ranges from 2% to 3%.
  • A starch flow of 4,500 or more reveals more impurities, producing iron of greater purity.

The Analysis:

  1. Was there an unnatural occurrence in the sample collection process on 6.1.2017?

Let's start by finding out how long this data set spans by returning its earliest and latest dates. Next, let's filter the rows with a boolean mask and create a new dataframe, df_june:

df_june = df[(df['date'] > "2017-05-31 23:59:59") & (df['date'] < "2017-06-02")].reset_index(drop=True)
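
For readers who want to try both steps without the original file, here is a minimal sketch. The four rows are made up for illustration and are not taken from the plant data; only the `date` column and the mask itself come from the text above.

```python
import pandas as pd

# Made-up rows standing in for the plant data (illustration only).
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-05-31 22:00:00",
        "2017-06-01 00:00:00",
        "2017-06-01 13:00:00",
        "2017-06-02 00:00:00",
    ]),
    "% Iron Concentrate": [65.1, 64.8, 65.3, 64.9],
})

# Earliest and latest timestamps show how long the data set spans.
start, end = df["date"].min(), df["date"].max()

# Boolean mask: keep only the rows that fall on June 1, 2017.
mask = (df["date"] > "2017-05-31 23:59:59") & (df["date"] < "2017-06-02")
df_june = df[mask].reset_index(drop=True)

print(start, end, len(df_june))
```

The mask works on strings because pandas parses them as timestamps when comparing against a datetime column, so `date` needs to be converted with `pd.to_datetime` first.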

That narrows down the rows, but we still have all the columns.

We'll create a variable that holds a list of the important columns we want to focus on, and call it important_cols. Once this is done, we create a new dataframe called df_june_important by selecting only the important_cols columns from the older dataframe df_june.
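
A minimal sketch of that column selection follows. The exact contents of `important_cols` are an assumption based on the variables discussed in this article, and the sample rows are made up.

```python
import pandas as pd

# Made-up rows standing in for df_june (illustration only).
df_june = pd.DataFrame({
    "date": pd.to_datetime(["2017-06-01 00:00:00", "2017-06-01 01:00:00"]),
    "Amina Flow": [557.4, 563.9],
    "Ore Pulp pH": [10.1, 10.0],
    "Starch Flow": [3019.5, 3024.4],
    "% Iron Concentrate": [66.9, 66.7],
    "% Silica Concentrate": [1.3, 1.4],
})

# Assumed list of columns to keep; adjust to the columns you care about.
important_cols = [
    "date",
    "% Iron Concentrate",
    "% Silica Concentrate",
    "Ore Pulp pH",
    "Amina Flow",
]

# Indexing a dataframe with a list of names returns only those columns.
df_june_important = df_june[important_cols]
print(df_june_important.columns.tolist())
```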

2. Is there any correlation/relationship between the variables we found?

To answer this question, we import Seaborn as sns and request a pair plot, passing the dataframe as the argument.

Looking at these plots, there does not appear to be any correlation or association between the variables. This can be confirmed with a correlation matrix, where all the correlation values are low.
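
As a rough sketch of the correlation matrix step, the snippet below uses pandas' `corr()` on the numeric columns. The values are made up, so the coefficients it prints say nothing about the real plant data.

```python
import pandas as pd

# Made-up readings standing in for df_june_important (illustration only).
df_june_important = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-06-01 00:00:00", "2017-06-01 01:00:00",
        "2017-06-01 02:00:00", "2017-06-01 03:00:00",
    ]),
    "% Iron Concentrate": [66.9, 65.2, 66.1, 64.8],
    "% Silica Concentrate": [1.3, 2.4, 1.9, 2.8],
    "Amina Flow": [557.4, 512.0, 601.3, 540.7],
})

# Keep only numeric columns, then compute pairwise Pearson correlations.
corr = df_june_important.select_dtypes("number").corr()
print(corr.round(2))
```

Each cell lies between -1 and 1; values near 0 indicate little linear relationship. Passing the same dataframe to `sns.pairplot` gives the visual counterpart.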

3. Is there a change in the Amina Flow throughout the day? Are there other variables that we can present on?

The line chart of Amina Flow shows that the flow changes a great deal throughout the day. Since the line chart was very helpful, we want to see how other variables change over the same timeframe as the Amina Flow.

Based on the charts produced with Python, we can see that Iron Concentrate, Silica Concentrate, Ore Pulp pH and the Flotation Column 05 Level all change throughout the day.
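
One way to put numbers behind the line charts is to average each variable per hour. This is a sketch with made-up readings and is an assumption about how the check could be done, not the exact code behind the charts above.

```python
import pandas as pd

# Made-up readings standing in for df_june_important (illustration only).
df_june_important = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-06-01 00:00:00", "2017-06-01 00:20:00",
        "2017-06-01 01:00:00", "2017-06-01 01:20:00",
    ]),
    "Amina Flow": [550.0, 560.0, 520.0, 530.0],
    "Ore Pulp pH": [10.0, 10.2, 9.8, 9.9],
})

# Group the rows by hour of day and average each variable within the hour.
hour_of_day = df_june_important["date"].dt.hour
hourly = df_june_important.groupby(hour_of_day)[["Amina Flow", "Ore Pulp pH"]].mean()
hourly.index.name = "hour"
print(hourly)
```

The resulting frame has one row per hour, which could then be handed to `sns.lineplot` to draw one line per variable.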

4. What is the difference between Iron Ore before flotation versus after?

The iron content is first measured as the ore is fed into the processing equipment. The data repeats the same value multiple times because the iron is tested hourly.

Even though this is the information I need, it lacks sufficient context on its own. I created a bar graph of the date against the difference to better illustrate where the iron stood at the beginning and at the end.

The bar chart shows the "Difference", that is, the gap between the initial ore (% Iron Ore) and the end product (% Iron Concentrate). Between May 13 and June 15 there was barely any difference, which makes me think that the ore processed in that period was of higher quality. It would be wise to go back and look at where this ore was mined.
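
The "Difference" column can be sketched as a simple subtraction followed by a daily average. The column name `% Iron Feed` for the before-flotation measurement is an assumption (rename it to match your dataframe), and the rows are made up.

```python
import pandas as pd

# Made-up readings (illustration only). "% Iron Feed" is an assumed name
# for the iron measurement taken before flotation.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-05-12 08:00:00", "2017-05-12 09:00:00",
        "2017-05-13 08:00:00", "2017-05-13 09:00:00",
    ]),
    "% Iron Feed": [50.0, 50.0, 64.0, 64.0],
    "% Iron Concentrate": [65.0, 65.0, 65.0, 65.0],
})

# Gap between the iron content after and before flotation, row by row.
df["Difference"] = df["% Iron Concentrate"] - df["% Iron Feed"]

# Average the gap per calendar day so it can be drawn as one bar per date.
daily_diff = df.groupby(df["date"].dt.date)["Difference"].mean()
print(daily_diff)
```

A small daily value means the feed was already close to the final concentrate, which is the pattern described for the May 13 to June 15 window.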

5. Is there a month that had greater concentration in Iron? What about Silica?

First I check exactly which time frame I have available. Returning the earliest and latest dates shows that the data runs from 10th April 2017 to 9th September 2017.

I now have the time period over which to investigate the iron and silica concentrations. I filter the data further and concentrate on the three columns that matter here: the date and the percentages of iron and silica concentrate.
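
Since the question is about months, one possible follow-up to the three-column filter is a monthly average. This is a sketch on made-up rows, and the monthly grouping is an addition for illustration rather than the method behind the graphs in this section.

```python
import pandas as pd

# Made-up readings (illustration only).
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-04-10 08:00:00", "2017-04-20 08:00:00",
        "2017-05-10 08:00:00", "2017-05-20 08:00:00",
    ]),
    "Starch Flow": [3000.0, 3100.0, 3200.0, 3300.0],
    "% Iron Concentrate": [64.0, 66.0, 65.0, 65.0],
    "% Silica Concentrate": [2.0, 3.0, 2.0, 2.0],
})

# Keep only the three columns needed for this question.
df_conc = df[["date", "% Iron Concentrate", "% Silica Concentrate"]]

# Average both concentrations per calendar month (4 = April, 5 = May).
monthly = df_conc.groupby(df_conc["date"].dt.month)[
    ["% Iron Concentrate", "% Silica Concentrate"]
].mean()
monthly.index.name = "month"
print(monthly)
```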

I pass this data to Seaborn to produce the visualizations. The first graph shows the "% Iron Concentrate" values over the time period the dataset covers. I can see that the iron is, on average, about 65% at the end of the purification process.

The second graph shows the "% Silica Concentrate" at the end of the iron's purification process. Over this period, the silica is between 2% and 3%.

6. Does Starch Flow affect the Concentration of Iron?

Flotation is a process that enables the separation of valuable minerals from waste rock. The minerals float, rising to the surface in gas bubbles. Using depressants is another method of mineral separation, and starch is a common depressant. Starch is added after the iron has been magnetically separated to help keep the iron clean. Since silica is regarded as an impurity in this dataset, it is the variable we must examine.

I plotted the starch flow against the percentage of silica concentrate, since the starch flow might have had an impact on the final silica reading. It seems that when the starch flow is 4,500 or greater, the percentage of silica concentrate is also higher, indicating that more contaminants are being detected.
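
The 4,500 threshold can be checked numerically by splitting the rows into two bands and comparing the average silica in each. The rows below are made up to show the mechanics only and do not reproduce the plant's results; the band labels are hypothetical names chosen for this sketch.

```python
import pandas as pd

# Made-up readings (illustration only, not a result from the dataset).
df = pd.DataFrame({
    "Starch Flow": [3000.0, 4000.0, 4500.0, 5000.0],
    "% Silica Concentrate": [2.0, 2.0, 3.0, 3.0],
})

# Label each row by whether its starch flow reaches the 4,500 threshold.
df["starch_band"] = (df["Starch Flow"] >= 4500).map(
    {True: "4500 or more", False: "below 4500"}
)

# Compare the average silica concentrate between the two bands.
silica_by_band = df.groupby("starch_band")["% Silica Concentrate"].mean()
print(silica_by_band)
```

On real data, a clear gap between the two averages would support the pattern seen in the graph, while similar averages would suggest the threshold matters less than it appears.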

Conclusion:

I enjoyed working on this project because the dataset was unfamiliar and I had to do some research on the topic. I am trying to expand my data skills, so if you have any suggestions or feedback, feel free to reach out to me. I am looking for a career in data, so if you have any opportunities, I'd be open to exploring them.

Feel free to connect with me on LinkedIn and be on the look out for future data projects from me!
