Python extracted Iron Ore from the Earth's Core: Mining Analysis using Python

Introduction:

Metals are used in almost all the products we use in our daily lives. Iron in particular is among the most traded metals in the market because it underpins so many steel products. Iron ore is mined and then processed to extract the iron, which occurs in the form of iron oxides such as hematite and magnetite. Once extracted, the ore undergoes a series of processes, including crushing, grinding, magnetic separation, and sometimes flotation, to produce a concentrate suitable for steelmaking. This concentrated form, often shaped into iron ore pellets or sinter, is then melted in a blast furnace along with other materials such as coke (a form of carbon) and limestone. The result is molten iron, which is further refined in a basic oxygen furnace or an electric arc furnace to produce steel.


In this study I acted as a Data Analyst for a mining company that digs large pits and extracts iron ore from the ground, where the ore is surrounded by impurities like dirt, sand, and silica. The company runs the ore through a flotation plant to produce cleaner iron. As a Data Analyst, I used Python to examine the dataset from their flotation plant and came up with valuable insights.

Key Findings:

  • The manager suspected that something unusual happened on 1st June in the flotation plant. The paired scatterplots, correlation matrix, and heatmap indicate that nothing unusual happened on that particular day.
  • % Iron Concentrate is highest in the afternoon between 3pm – 4pm and lowest at night between 9pm – 12am on 1st June.
  • % Silica Concentrate is highest around noon between 12pm – 1pm and lowest in the evening between 6pm – 7pm on 1st June.
  • Ore Pulp pH is highest in the morning between 5am – 6am and lowest in the evening between 6pm – 7pm on 1st June.
  • Flotation Column 05 Level is high almost throughout the day and drops in the evening around 6pm on 1st June.

Let’s Dive into Data:

This is real data collected from March 2017 to September 2017. Every row is a time point at 20-second intervals. The date column records the day, month, year, and hour, but does not show minutes. There are 24 columns and 75k+ rows. Among the columns, flow is how fast something is moving, and level is the height of the froth produced by the bubbles. The second-to-last variable, "% Iron Concentrate", is the one to focus on: it measures how pure the iron is. If you wish to dive deeper into the data set, here’s the link for the DATASET.

Here is the data dictionary explaining what every column represents.

Let’s Get started with Python Analysis:

For this study, a cloud-based notebook environment called Deepnote is used. Deepnote is one of the most user-friendly notebook IDEs out there, and a big plus is that it stores the dataset in its cloud.

  • Installing the Libraries:

Python libraries are used to analyze the data from the flotation plant: Pandas for data manipulation, and seaborn and matplotlib for data visualization.
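The setup can be sketched as follows, assuming a standard scientific-Python environment (the exact import aliases in the original notebook are not shown, so the conventional ones are used here):

```python
# Core analysis stack: pandas for data manipulation,
# matplotlib and seaborn for visualization.
# If any are missing, install them first:
#   pip install pandas matplotlib seaborn
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

print(pd.__version__)
```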

Installing Libraries

  • Loading and Previewing the data:

As said earlier firstly the dataset (CSV) file is imported into Deepnote. Then the dataset is read using the below command in python:

df = pd.read_csv('MiningProcess_Flotation_Plant_Database.csv')

Previewed the first 5 and last 5 rows of the dataset using df.head() and df.tail().

Displayed the number of rows and columns in the dataset using df.shape (note that shape is an attribute, not a method, so it takes no parentheses).
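These preview steps can be sketched as follows; since the real CSV is not available here, a tiny stand-in frame with a couple of the dataset's columns is used:

```python
import pandas as pd

# Tiny stand-in for the flotation-plant dataset; in the notebook, df comes
# from pd.read_csv('MiningProcess_Flotation_Plant_Database.csv').
df = pd.DataFrame({
    "date": ["2017-06-01 00:00:00"] * 6,
    "% Iron Concentrate": [66.9, 67.0, 66.8, 66.7, 67.1, 66.9],
})

print(df.head())   # first 5 rows
print(df.tail())   # last 5 rows
print(df.shape)    # (rows, columns) -- shape is an attribute, not a method
```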

  • Data Cleaning:

The dataset does not display the numeric values properly: columns that should hold decimal values show a comma (,) in place of the dot (.).

This is simply fixed by telling pandas that the CSV uses a comma (,) in place of the dot (.) as the decimal separator, via the decimal argument of read_csv.

Replace the earlier read command with this one and rerun the notebook. There you go, the dataset is clean.
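A minimal sketch of the fix, using a two-row inline stand-in for the real CSV (which writes decimals with commas, e.g. "66,91" instead of 66.91):

```python
import io
import pandas as pd

# Stand-in for the real file; note the comma-as-decimal values.
raw = 'date,% Iron Concentrate\n'
raw += '2017-06-01 00:00:00,"66,91"\n'
raw += '2017-06-01 01:00:00,"67,02"\n'

# decimal=',' tells pandas to treat the comma as the decimal separator,
# so the column parses as float64 instead of strings.
df = pd.read_csv(io.StringIO(raw), decimal=',')
print(df['% Iron Concentrate'].dtype)
```

Against the real file this would read `pd.read_csv('MiningProcess_Flotation_Plant_Database.csv', decimal=',')`.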

  • Changing the Data Type:

In the dataset from the mining flotation plant, the data type of the date column is string, and it needs to be changed to datetime format because the dates are essential to extracting valuable insights from the dataset.

Firstly, I checked the data type of the dataframe, specifically of the date column and a date value, using the code below.

Secondly, I converted the date column to datetime format using the code below.

Now the date column is in datetime/timestamp format. Let’s move on to further analysis.
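Both steps can be sketched as follows (a small synthetic frame stands in for the real one; the conversion itself uses pandas' standard to_datetime):

```python
import pandas as pd

# Stand-in frame: the date column arrives as plain strings.
df = pd.DataFrame({"date": ["2017-06-01 00:00:00", "2017-06-01 01:00:00"]})
print(df.dtypes)                 # date column shows as 'object' (string)
print(type(df['date'].iloc[0]))  # str

# Convert the column to pandas' datetime64 type.
df['date'] = pd.to_datetime(df['date'])
print(df['date'].dtype)          # datetime64[ns]
```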

  • Descriptive Statistics:

In Python, specifically in the context of data analysis using the Pandas library, df.describe() is a method used to generate descriptive statistics of a DataFrame. The DataFrame is a two-dimensional, tabular data structure provided by Pandas. This method provides a summary of the central tendency, dispersion, and shape of the distribution of a DataFrame's numerical columns.

Here's a breakdown of what df.describe() provides:

  1. Count: Number of non-null values in each column.
  2. Mean: Mean (average) value of each column.
  3. Std: Standard deviation, which measures the amount of variation or dispersion of a set of values.
  4. Min: Minimum value in each column.
  5. 25% (Percentile): 25th percentile, also known as the first quartile. It is the value below which 25% of the observations fall.
  6. 50% (Percentile): 50th percentile, also known as the median or second quartile.
  7. 75% (Percentile): 75th percentile, also known as the third quartile.
  8. Max: Maximum value in each column.
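The statistics above can be seen on a tiny stand-in column (the real call is simply df.describe() on the full dataframe):

```python
import pandas as pd

# Four sample values standing in for the real column.
df = pd.DataFrame({"% Iron Concentrate": [65.0, 66.0, 67.0, 68.0]})

# count, mean, std, min, 25%, 50%, 75%, max for every numeric column.
stats = df.describe()
print(stats)
```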

  • Data on particular date:

The manager asked me to look into a few important columns (% Iron Concentrate, % Silica Concentrate, Ore Pulp pH, & Flotation Column 05 Level) from the dataset on a particular date (1st June) at different hours, suspecting that something unusual had happened on that day.

Firstly, I listed the date range covered by the dataset using the functions below:
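A sketch of that step, assuming the date column has already been converted to datetime (a three-row synthetic frame stands in for the full dataset):

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime([
    "2017-03-10 01:00:00",
    "2017-06-01 12:00:00",
    "2017-09-09 23:00:00",
])})

# Earliest and latest timestamps give the span of the dataset.
print(df['date'].min())
print(df['date'].max())
```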

Secondly, I created a new dataframe for the mentioned particular date (1st June).

The above command says: let's create a new dataframe called df_june that is just the old dataframe, but only where the date is later than May 31, 2017 at midnight and earlier than June 2, 2017. The ‘&’ sign requires both conditions to be met, and the individual conditions are enclosed in round parentheses and then in square brackets to filter the rows.
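A minimal sketch of that filter, using a tiny synthetic frame in place of the full dataset (the exact boundary expressions in the original notebook are not shown, so half-open bounds covering 1st June are assumed here):

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2017-05-31 12:00:00",
                            "2017-06-01 06:00:00",
                            "2017-06-01 18:00:00",
                            "2017-06-02 00:00:00"]),
    "% Iron Concentrate": [66.5, 66.9, 67.1, 66.7],
})

# Keep only rows that fall on 1st June: each condition in parentheses,
# combined with '&', the whole expression in square brackets.
df_june = df[(df['date'] >= '2017-06-01') & (df['date'] < '2017-06-02')]
print(df_june)
```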

Thirdly, I created one more dataframe based on df_june, including only the above-mentioned columns for the particular date (1st June).


The new dataframe df_june_important consists of only the 5 important columns and 4,320 rows, filtered down to the one particular date (1st June).
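The column selection can be sketched as follows; a two-row synthetic df_june with one extra column stands in for the real 24-column frame:

```python
import pandas as pd

# Stand-in for df_june (rows already filtered to 1st June).
df_june = pd.DataFrame({
    "date": pd.to_datetime(["2017-06-01 06:00:00", "2017-06-01 18:00:00"]),
    "% Iron Concentrate": [66.9, 67.1],
    "% Silica Concentrate": [1.2, 1.1],
    "Ore Pulp pH": [9.8, 9.6],
    "Flotation Column 05 Level": [450.0, 420.0],
    "Ore Pulp Density": [1.68, 1.70],   # example of a column we drop
})

# Keep only the five columns the manager asked about.
cols = ['date', '% Iron Concentrate', '% Silica Concentrate',
        'Ore Pulp pH', 'Flotation Column 05 Level']
df_june_important = df_june[cols]
print(df_june_important.shape)
```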

Finally, to look for correlations among all the important columns, I plotted a scatter-plot matrix for the above-mentioned columns using the pairplot function, to see if there is anything unusual among them on the given date, 1st June.

I personally do not see any correlation between the important columns flagged by the manager on that particular date (1st June). We can confirm this with a correlation matrix, which in pandas is computed with the corr() function.
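The correlation matrix can be sketched as follows (same synthetic stand-in for df_june_important; corr() computes pairwise Pearson coefficients, with values near 0 meaning no linear relationship):

```python
import pandas as pd

df_june_important = pd.DataFrame({
    "% Iron Concentrate": [66.9, 67.1, 66.8, 67.0, 66.7],
    "% Silica Concentrate": [1.2, 1.1, 1.3, 1.2, 1.4],
    "Ore Pulp pH": [9.8, 9.6, 9.7, 9.9, 9.5],
    "Flotation Column 05 Level": [450.0, 420.0, 440.0, 460.0, 430.0],
})

# Pairwise Pearson correlations between all numeric columns.
corr = df_june_important.corr()
print(corr.round(2))
```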

A visualization gives an even better view of the above correlations, so for further confirmation a heat map is plotted using the code below.
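A sketch of that heat map with seaborn, using a hand-written 2×2 correlation matrix as a stand-in (in the notebook, the `corr` frame computed above would be passed in):

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Stand-in correlation matrix; the real one comes from df.corr().
corr = pd.DataFrame(
    [[1.0, -0.8], [-0.8, 1.0]],
    index=['% Iron Concentrate', '% Silica Concentrate'],
    columns=['% Iron Concentrate', '% Silica Concentrate'],
)

# annot=True prints the coefficient in each cell; the diverging colormap
# makes strong positive/negative correlations stand out at a glance.
ax = sns.heatmap(corr, annot=True, cmap='coolwarm', vmin=-1, vmax=1)
plt.tight_layout()
```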

Then the manager wanted to see how % Iron Concentrate, % Silica Concentrate, Ore Pulp pH, & Flotation Column 05 Level varied throughout the whole day (1st June alone) at different hours, as he suspected something unusual on that day. A line graph is a natural choice for a time series, so I plotted line graphs for all the above parameters at once using a for loop in Python. Python is a great tool when it comes to visualizing data.
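The for-loop approach can be sketched as follows; an hourly synthetic frame with two of the columns stands in for df_june_important (the real data has a row every 20 seconds):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hourly stand-in for 1st June.
idx = pd.to_datetime([f'2017-06-01 {h:02d}:00:00' for h in range(24)])
df_june_important = pd.DataFrame({
    "date": idx,
    "% Iron Concentrate": 66.5 + (idx.hour % 5) * 0.1,
    "% Silica Concentrate": 1.0 + (idx.hour % 3) * 0.1,
}).set_index('date')

# One line chart per column, all produced by a single loop.
figs = []
for col in df_june_important.columns:
    fig, ax = plt.subplots()
    df_june_important[col].plot(ax=ax, title=f'{col} on 1st June')
    figs.append(fig)
    plt.close(fig)
```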

My notebook can be accessed here.


Conclusion/Recommendations:

  • There aren’t any outliers in % Iron Concentrate, % Silica Concentrate, Ore Pulp pH, & Flotation Column 05 Level on 1st June, contrary to the manager's suspicion. Everything was working fine in the flotation plant on 1st June.
  • Explore and implement advanced flotation technologies and equipment that can enhance the separation efficiency and improve the quality of the iron concentrate.
  • Evaluate and optimize the selection of flotation reagents to maximize recovery while minimizing the consumption of chemicals. Consider eco-friendly and cost-effective alternatives.
  • Implement advanced process control systems and automation to optimize the flotation process. This includes feedback control loops and sensors for real-time monitoring.

Call to Action:

My analysis of this dataset is business driven. Any questions or suggestions about the analysis? Want to discuss data analytics? Feel free to reach out to me on LinkedIn, write an email to [email protected], or catch me at MyData Portfolio.

Thank you for reading this article. You have a good day now!

I would like to extend my sincere thanks to?Avery Smith?for helping me on my data journey and guiding me in the right direction of this project.

