Iron Mining Python Data Analysis

Christy Ehlert-Mackie

Data Analyst | Bridging Business and Technical Sides to Power Data-Driven Decisions | MSBA, MBA | Excel, SQL, Power BI, Tableau | Background in Accounting and Finance

Published: May 23, 2023

For this project, I analyzed data for a hypothetical mining company called Metals R’ Us. One of the metals they mine is iron. In the mining process, Metals R’ Us digs up big clumps of earth that contain iron along with impurities such as dirt, sand, and silica. The resulting iron ore pulp is run through a flotation plant, where starch and amine are mixed in to strip the impurities away from the iron. Air bubbles are added to the liquid mixture so the metal rises to the top and minerals remain at the bottom, which increases the purity of the iron concentrate. This video explains the flotation process.

The boss at Metals R’ Us was concerned that something unusual happened on June 1, 2017. The data was cleaned, manipulated, and analyzed to determine if this was the case.

Insights

The pairplot shows no clear relationships between % Iron Concentrate, % Silica Concentrate, Ore Pulp pH, and Flotation Column 05 Level.
The correlation is weak between % Iron Concentrate, % Silica Concentrate, Ore Pulp pH, and Flotation Column 05 Level.
Line plots for % Iron Concentrate, % Silica Concentrate, and Ore Pulp pH show nothing concerning for June 1, 2017.
The line plot for Flotation Column 05 Level shows a sharp drop-off around 7:00 pm on June 1, 2017.

Dataset

The dataset used for this project came from Kaggle. It is real data taken from March to September 2017. The data was downloaded as a CSV file and analyzed using Python, a free open-source programming language used by many large companies.

Important columns for this analysis were:

Date = date & time stamp of sample reading
% Iron Concentrate = % of iron at end of flotation process
% Silica Concentrate = % of silica at end of flotation process
Ore Pulp pH = pH on scale from 0 - 14
Flotation Column 05 Level = froth level in the flotation cell; the lower the level, the higher the grade of the concentrate

Analysis

I used Python to analyze the dataset. There are thousands of packages available to use in Python. Common packages used for data analysis include:

NumPy for mathematical operations
Pandas for data manipulation and analysis (built on top of NumPy)
Matplotlib for data visualization
Seaborn for data visualization (built on top of Matplotlib)

For this project, I used DeepNote to run Python. DeepNote is a browser-based IDE (integrated development environment) that is used to write and run the code for a project in a notebook.

To begin, I imported the packages I needed. Note that the packages are given an abbreviation (such as pd for Pandas) so it is easier to refer to them later in the code.

The CSV file of the dataset was read into a Pandas dataframe using the pd.read_csv function. The df.head() method gives a preview of the dataframe.

The preview shows that the numbers use commas rather than periods as the decimal separator. To fix this, the CSV file is re-read into the dataframe, this time specifying that the commas signify decimals. The df.head() method is re-run to show that this is now fixed.
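A minimal sketch of this step, using a tiny made-up sample in place of the Kaggle file. The lowercase column name date and all of the values are assumptions for illustration only:

```python
import io

import pandas as pd

# Tiny stand-in for the Kaggle CSV (values invented for illustration).
# Numbers use a comma as the decimal mark, so each one is quoted to keep
# it inside a single field.
sample_csv = io.StringIO(
    'date,% Iron Concentrate,Ore Pulp pH\n'
    '2017-06-01 01:00:00,"66,91","10,07"\n'
    '2017-06-01 02:00:00,"67,06","10,02"\n'
)

# decimal="," tells pandas to treat the comma as the decimal separator,
# so the numeric columns come in as floats rather than strings.
df = pd.read_csv(sample_csv, decimal=",")

print(df.head())
print(df.dtypes)
```

With the real file, the io.StringIO object would be replaced by the path to the downloaded CSV.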

To get the number of rows and columns in the dataframe, I used df.shape. There are 737,453 rows of data and 24 columns.

Dates sometimes are not imported in the format that is best suited for analysis. I used the print(type(df)) function to check the variable type. This shows that dates were imported as strings instead of datetime. This was corrected using pd.to_datetime.
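One way to check and convert the date column is sketched below. The column name date and the sample rows are assumptions, and inspecting the column's dtype is just one of several ways to see how the dates were imported:

```python
import pandas as pd

# Small made-up frame where the timestamps are still plain strings.
df = pd.DataFrame(
    {
        "date": ["2017-06-01 01:00:00", "2017-06-01 02:00:00"],
        "Ore Pulp pH": [10.07, 10.02],
    }
)

# Before conversion the column holds strings, not datetimes.
print(df["date"].dtype)

# Convert the strings to real datetime values.
df["date"] = pd.to_datetime(df["date"])

# After conversion the column has a datetime64 dtype.
print(df["date"].dtype)
```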

The df.describe() function gives summary descriptive statistics for each numeric column:

Count = number of non-empty values
Mean = average value
Std = standard deviation
Min = minimum value
25% = 25th percentile
50% = 50th percentile
75% = 75th percentile
Max = maximum value
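As a small illustration of these statistics, here is df.describe() on a made-up four-row column (the values are invented, not taken from the dataset):

```python
import pandas as pd

# Four invented pH readings, just to show the shape of the output.
df = pd.DataFrame({"Ore Pulp pH": [9.8, 10.0, 10.2, 10.4]})

# describe() returns one row per statistic and one column per numeric column.
summary = df.describe()
print(summary)
```

The row labels of the result are count, mean, std, min, 25%, 50%, 75%, and max, matching the list above.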

With the preliminary work done, I turned to the boss's concern that something weird happened on June 1, 2017. First, I checked the date range of the data by finding the maximum and minimum dates, which are 9/9/17 and 3/10/17, respectively. Then I filtered the data by creating a new dataframe called df_june1 that contains only June 1st.
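A sketch of the date-range check and the June 1st filter, assuming the timestamp column is named date and has already been converted to datetime. The sample rows are invented, and a boolean mask is only one of several ways to do the filtering:

```python
import pandas as pd

# Made-up rows spanning the days around June 1, 2017.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            [
                "2017-05-31 23:00:00",
                "2017-06-01 00:00:00",
                "2017-06-01 19:00:00",
                "2017-06-02 00:00:00",
            ]
        ),
        "Flotation Column 05 Level": [500.0, 498.0, 300.0, 501.0],
    }
)

# Earliest and latest timestamps in the data.
print(df["date"].min(), df["date"].max())

# Keep rows from midnight on June 1 up to (but not including) midnight on June 2.
start = pd.Timestamp("2017-06-01")
end = pd.Timestamp("2017-06-02")
df_june1 = df[(df["date"] >= start) & (df["date"] < end)]

print(df_june1)
```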

According to the engineering department, the most important variables are % Iron Concentrate, % Silica Concentrate, Ore Pulp pH, and Flotation Column 05 Level. I created another dataframe called df_june_important which has just these columns along with the datetime stamp for only June 1st.
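Selecting just these columns could look like the sketch below. The date column name, the placeholder column Other Column, and the values are assumptions for illustration:

```python
import pandas as pd

# Made-up June 1st frame with one extra column that should be dropped.
df_june1 = pd.DataFrame(
    {
        "date": pd.to_datetime(["2017-06-01 01:00:00", "2017-06-01 02:00:00"]),
        "% Iron Concentrate": [66.9, 67.1],
        "% Silica Concentrate": [1.3, 1.1],
        "Ore Pulp pH": [10.1, 10.0],
        "Flotation Column 05 Level": [500.0, 505.0],
        "Other Column": [1.0, 2.0],  # hypothetical column not needed here
    }
)

# Columns flagged as important, plus the timestamp.
important_cols = [
    "date",
    "% Iron Concentrate",
    "% Silica Concentrate",
    "Ore Pulp pH",
    "Flotation Column 05 Level",
]

# Passing a list of column names returns a dataframe with only those columns.
df_june_important = df_june1[important_cols]
print(df_june_important.head())
```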

A pairplot was run using Seaborn to determine if any of the variables correlate with each other. The pairplot results show no clear relationships.

I also ran a correlation matrix. All of the correlation coefficients are fairly low, which confirms weak correlation. As a rule of thumb, values beyond +/-0.3 are slightly correlated, beyond +/-0.6 are moderately correlated, and beyond +/-0.8 are strongly correlated. The sign of the correlation coefficient indicates the direction of the relationship: positive values indicate a positive correlation (as one increases, the other increases), while negative values indicate a negative correlation (as one increases, the other decreases). The highest correlation in this matrix is 0.30, between % Iron Concentrate and Ore Pulp pH.
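A correlation matrix like this can be computed with the dataframe's corr() method, which uses Pearson correlation by default. The sketch below uses invented values, so its coefficients will differ from the ones reported here:

```python
import pandas as pd

# Invented readings for the four variables of interest.
df_june_important = pd.DataFrame(
    {
        "% Iron Concentrate": [66.9, 67.1, 66.4, 65.8, 66.0],
        "% Silica Concentrate": [1.3, 1.1, 1.8, 2.4, 2.2],
        "Ore Pulp pH": [10.1, 10.0, 9.9, 10.2, 10.0],
        "Flotation Column 05 Level": [500.0, 505.0, 498.0, 300.0, 495.0],
    }
)

# Pairwise Pearson correlation between every pair of columns.
corr = df_june_important.corr()

# Round for readability; each coefficient lies between -1 and 1.
print(corr.round(2))
```

The matrix is symmetric, and the diagonal is always 1 because each variable is perfectly correlated with itself.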

The boss also wants to see line plots for each of the important variables to see how they changed throughout the day on June 1st. I used a for loop, which iterates over each variable and creates a plot. The plots for % Iron Concentrate, % Silica Concentrate, and Ore Pulp pH show variation throughout the day but nothing concerning. The plot for Flotation Column 05 Level shows fairly steady values around 500 most of the day, with a sharp drop-off to about 300 around 19:00 (7:00 pm) that recovered within a couple of hours. It is unknown if this is unusual, so it will be brought to the boss's attention for possible further investigation.
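The loop can be sketched as follows using Matplotlib directly. The hourly values are invented to mimic the pattern described above, only two of the four variables are included to keep the sample short, and the date column name is an assumption:

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; not needed inside a notebook

import matplotlib.pyplot as plt
import pandas as pd

# One invented reading per hour for June 1, 2017.
hours = pd.to_datetime([f"2017-06-01 {h:02d}:00:00" for h in range(24)])
level = [500.0] * 24
level[19] = 300.0  # made-up dip around 19:00
level[20] = 400.0  # made-up partial recovery

df_june_important = pd.DataFrame(
    {
        "date": hours,
        "Ore Pulp pH": [10.0 + 0.01 * (h % 5) for h in range(24)],
        "Flotation Column 05 Level": level,
    }
)

variables = ["Ore Pulp pH", "Flotation Column 05 Level"]
figures = []

# Draw one line plot per variable, with time on the x-axis.
for col in variables:
    fig, ax = plt.subplots(figsize=(10, 3))
    ax.plot(df_june_important["date"], df_june_important[col])
    ax.set_title(col)
    ax.set_xlabel("Time on June 1, 2017")
    ax.set_ylabel(col)
    figures.append(fig)
```

In a notebook the figures display automatically after the cell runs; in a script, plt.show() or fig.savefig() would be needed to see them.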

Key Takeaways

The pairplot shows no clear relationships between % Iron Concentrate, % Silica Concentrate, Ore Pulp pH, and Flotation Column 05 Level.
The correlation is weak between % Iron Concentrate, % Silica Concentrate, Ore Pulp pH, and Flotation Column 05 Level.
Line plots for % Iron Concentrate, % Silica Concentrate, and Ore Pulp pH show nothing concerning for June 1, 2017.
The line plot for Flotation Column 05 Level shows a sharp drop-off around 7:00 pm on June 1, 2017.

Conclusion

Thank you for reading my article! Leave a comment below or connect with me. You can also check out my data analysis project portfolio website here.