Let's Get The Iron
Mining and manufacturing can be a very complicated process. Besides finding a proper location to dig, professionals also need to separate the materials they want from the ones they don’t. In this case, iron is the desired element to obtain. The problem is that it’s surrounded by dirt, silica or sand.
In this case study, we’ll analyze data from a mining company called Metals R’Us using Python. Our main objective is to check that there are no anomalies in the process of obtaining iron. One of the most important values we’ll be looking at is ‘% Iron Concentrate’, which represents the iron’s purity.
The Data Set:
We’ll be working with a real dataset taken from March to September 2017. There are 24 columns and 737453 rows. Here’s a brief description of each column:
Questions
Installing data libraries
To analyze and visualize our data, we first need to install a few libraries. They will let us clean and query our dataset, and also create simple graphs and visualizations to find trends and insights.
The data libraries we’ll install are:
Reading the Data Set
With our libraries installed, we can import, read and look at our data to get a general understanding of our rows and columns.
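Here is a minimal sketch of the read-and-inspect step, assuming pandas. The article's file name is not shown, so the example reads a three-row stand-in from memory instead. The column names and the values are assumptions modeled on the measurements named in the text.

```python
import io

import pandas as pd

# Stand-in for the real file: a few made-up rows in an assumed layout.
# With the actual export you would pass its path instead, for example
# pd.read_csv("mining_data.csv")  (hypothetical file name).
sample_csv = io.StringIO(
    "date,% Iron Concentrate,% Silica Concentrate\n"
    "2017-03-10 01:00:00,66.91,1.31\n"
    "2017-03-10 02:00:00,67.06,1.11\n"
    "2017-03-10 03:00:00,66.97,1.27\n"
)

df = pd.read_csv(sample_csv)

print(df.shape)   # (3, 3): rows, columns
print(df.head())  # first rows, to eyeball the columns
```

On the full dataset, `df.shape` is the quickest way to confirm the row and column counts quoted above.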
Fixing the Date column
Since we’ll be working with the Date, it’s important to make sure that the data type is the right one. We can check the date data type with the following lines.
As suspected, the date data type is str. This is something we need to change in order to manipulate and group our rows. Let’s fix it with the following code lines:
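The check and the conversion could look like the sketch below. It assumes the column is called `date` and uses `pd.to_datetime`, which is the usual pandas tool for this. The sample rows are made up for illustration.

```python
import pandas as pd

# Illustrative rows; "date" is an assumed column name.
df = pd.DataFrame({
    "date": ["2017-03-10 01:00:00", "2017-06-01 11:00:00", "2017-09-09 23:00:00"],
    "% Iron Concentrate": [66.91, 64.20, 65.05],
})

# Before: every value in the column is a plain Python string.
print(type(df["date"].iloc[0]))  # <class 'str'>

# Convert the whole column to pandas datetimes.
df["date"] = pd.to_datetime(df["date"])

# After: the column has a datetime dtype and exposes the .dt accessor.
print(df["date"].dtype)
print(df["date"].dt.month.tolist())
```

Once the column is a datetime, filtering by month or day becomes a one-line comparison, which the later steps rely on.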
We can also use a single line of code to check the count, mean, min, max and other summary statistics of our dataset.
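In pandas, that single line is most likely `describe()`. The sketch below assumes so and runs it on a few made-up rows.

```python
import pandas as pd

# Illustrative values; column names are assumptions.
df = pd.DataFrame({
    "% Iron Concentrate": [66.0, 65.0, 67.0, 66.0],
    "% Silica Concentrate": [1.5, 2.5, 1.0, 2.0],
})

# count, mean, std, min, quartiles and max for every numeric column.
summary = df.describe()
print(summary)
```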
Before we start our analysis, I think it’s important that we check the min and max dates of our data set. That can be easily done with the following lines:
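A sketch of that check, assuming the `date` column has already been converted to datetimes. The three timestamps are placeholders, not the real bounds of the dataset.

```python
import pandas as pd

# Placeholder timestamps, deliberately out of order.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-06-01 11:00:00",
        "2017-03-10 01:00:00",
        "2017-09-09 23:00:00",
    ]),
})

first_date = df["date"].min()
last_date = df["date"].max()

print("first record:", first_date)
print("last record: ", last_date)
```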
1. Important events in June
In order to analyze our data from June, we can create a different table for that month only. This will make things easier in the next steps.
We will also select just the columns that are important for our analysis. This can be achieved by narrowing down our June table and creating another one from it.
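The two tables could be built as in the sketch below. The column names are assumptions based on the four values discussed later in this article, and `Ore Pulp Flow` stands in for any column we don't need.

```python
import pandas as pd

# Illustrative rows around the June boundaries.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-05-31 23:00", "2017-06-01 00:00",
        "2017-06-15 12:00", "2017-07-01 00:00",
    ]),
    "% Iron Concentrate": [65.1, 64.8, 66.2, 65.5],
    "% Silica Concentrate": [2.1, 2.4, 1.6, 1.9],
    "Ore Pulp pH": [9.8, 9.7, 10.0, 9.9],
    "Flotation Column 05 Level": [420.0, 431.0, 415.0, 428.0],
    "Ore Pulp Flow": [395.0, 398.0, 401.0, 397.0],  # not needed below
})

# Step 1: keep only the rows recorded in June.
# (All records are from 2017, so the month alone is enough.)
june = df[df["date"].dt.month == 6]

# Step 2: narrow the June table down to the columns we care about.
important_cols = [
    "date",
    "% Iron Concentrate",
    "% Silica Concentrate",
    "Ore Pulp pH",
    "Flotation Column 05 Level",
]
june_important = june[important_cols]

print(june_important)
```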
Now that we have the values we need, we can create a series of graphs in order to have a quick look at our correlations or trends. Using the following line, we can get 16 graphs.
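Four numeric columns plotted against each other give a 4×4 grid, which is where the 16 graphs come from. The article's exact call is not shown. One way to get such a grid in a single line is pandas' `scatter_matrix`, used in the sketch below on randomly generated stand-in data. Seaborn's `pairplot` is a common alternative.

```python
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Random stand-in for the June table (assumed column names).
rng = np.random.default_rng(0)
n = 200
june_important = pd.DataFrame({
    "% Iron Concentrate": rng.normal(65.0, 1.0, n),
    "% Silica Concentrate": rng.normal(2.3, 0.5, n),
    "Ore Pulp pH": rng.normal(9.8, 0.3, n),
    "Flotation Column 05 Level": rng.normal(425.0, 30.0, n),
})

# One call: scatter plots off the diagonal, histograms on it.
axes = scatter_matrix(june_important, figsize=(10, 10), diagonal="hist")

print(axes.shape)  # (4, 4): one panel per pair of columns
```

In a script, add `import matplotlib.pyplot as plt` and call `plt.show()` to display the figure. In a notebook it renders on its own.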
By looking at the charts, it’s quite obvious that there’s nothing to be alarmed about in the month of June. But that’s in terms of visuals. We could also compute the correlations between our columns in order to see the numbers.
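A sketch of the correlation step, assuming pandas' `DataFrame.corr()`, which computes Pearson coefficients by default. The data is random stand-in data, so the coefficients printed here will not match the article's.

```python
import numpy as np
import pandas as pd

# Random stand-in for the June table (assumed column names).
rng = np.random.default_rng(0)
n = 200
june_important = pd.DataFrame({
    "date": pd.to_datetime("2017-06-01") + pd.to_timedelta(np.arange(n), unit="h"),
    "% Iron Concentrate": rng.normal(65.0, 1.0, n),
    "% Silica Concentrate": rng.normal(2.3, 0.5, n),
    "Ore Pulp pH": rng.normal(9.8, 0.3, n),
    "Flotation Column 05 Level": rng.normal(425.0, 30.0, n),
})

# Drop the date column so only the numeric measurements are compared.
corr = june_important.drop(columns="date").corr()

# Each cell is a coefficient between -1 and 1; the diagonal is always 1.
print(corr.round(3))
```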
Again, there doesn’t seem to be any trend or important correlation. A correlation coefficient ranges from -1 to 1 and measures how strongly two values move together; it is not a probability. In this case, most of our coefficients are close to zero, with the biggest one being 0.302, which is a weak correlation. It confirms what we saw in the graphs above.
Now, let’s look at the highs and lows of each value during the first day of June. If we run a loop, we can create four line graphs that show the level of each value over the day.
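The loop could be sketched as follows, assuming Matplotlib and hourly readings. The data is randomly generated and the column names are assumptions, so the shapes of the lines will not match the article's charts.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Random stand-in: 48 hourly readings starting on June 1st.
rng = np.random.default_rng(0)
n = 48
june_important = pd.DataFrame({
    "date": pd.to_datetime("2017-06-01") + pd.to_timedelta(np.arange(n), unit="h"),
    "% Iron Concentrate": rng.normal(65.0, 1.0, n),
    "% Silica Concentrate": rng.normal(2.3, 0.5, n),
    "Ore Pulp pH": rng.normal(9.8, 0.3, n),
    "Flotation Column 05 Level": rng.normal(425.0, 30.0, n),
})

# Keep only the rows from the first day of June.
start = pd.Timestamp("2017-06-01")
end = pd.Timestamp("2017-06-02")
first_day = june_important[
    (june_important["date"] >= start) & (june_important["date"] < end)
]

value_cols = [c for c in first_day.columns if c != "date"]

# One line graph per value, stacked vertically and sharing the time axis.
fig, axes = plt.subplots(len(value_cols), 1, figsize=(8, 10), sharex=True)
for ax, col in zip(axes, value_cols):
    ax.plot(first_day["date"], first_day[col])
    ax.set_title(col)
axes[-1].set_xlabel("Time of day")
fig.tight_layout()
```

Sharing the x-axis makes it easier to spot events that happen at the same hour in different charts.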
For the % Iron Concentrate, there was a decline around 11am. Curiously, there was also a spike in % Silica Concentrate at the same time. This makes sense, since those 2 values are the most important ones when obtaining iron. The purer the iron, the smaller the percentage of Silica Concentrate is.
For the Pulp pH and the Flotation 05 Level, there was a decline and a spike, respectively, between 2 and 5pm.
2. Flotation Level 05: First and Last Month
To check the min and max Flotation Levels in the first and last month, we will need to create two new tables: one for the first month and one for the last.
Now that we have both tables, we can easily check the min and max of the Flotation Level 05 and compare them to see if there’s anything to be worried about.
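A sketch of both steps, assuming March and September are the first and last months and that the column is named `Flotation Column 05 Level`. The readings are made up.

```python
import pandas as pd

# Illustrative readings; the column name is an assumption.
col = "Flotation Column 05 Level"
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-03-10", "2017-03-20", "2017-06-15", "2017-09-01", "2017-09-09",
    ]),
    col: [410.0, 455.0, 430.0, 412.0, 452.0],
})

# One table per month of interest.
first_month = df[df["date"].dt.month == 3]  # March
last_month = df[df["date"].dt.month == 9]   # September

# Put the min and max of both months side by side for comparison.
comparison = pd.DataFrame({
    "first month": first_month[col].agg(["min", "max"]),
    "last month": last_month[col].agg(["min", "max"]),
})

print(comparison)
```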
Seeing the results, all the numbers are really close to each other. That suggests that our mining process has been steady and consistent.
3. Ore Pulp pH: Min and Max
Now let’s check the min and max levels of Ore Pulp pH in our entire data frame. This means checking the values across the 6 months of records that our data set offers. In this case, we’ll create a table with our important values and use it to make a heatmap of the Ore Pulp pH from March to September.
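The article's heatmap code is not shown, so the sketch below is only one possible layout: daily mean pH, with months as rows and days of the month as columns, drawn with Matplotlib's `imshow`. The readings are random stand-ins, and seaborn's `heatmap` would work equally well on the same pivot table.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Random stand-in: one reading per day over the recorded period.
rng = np.random.default_rng(1)
dates = pd.date_range("2017-03-10", "2017-09-09", freq="D")
df = pd.DataFrame({
    "date": dates,
    "Ore Pulp pH": rng.normal(9.8, 0.3, len(dates)),
})

# Table of important values: mean pH per (month, day of month).
df["month"] = df["date"].dt.month
df["day"] = df["date"].dt.day
ph_by_day = df.pivot_table(
    index="month", columns="day", values="Ore Pulp pH", aggfunc="mean"
)

# Draw the table as a heatmap; days without data stay blank.
fig, ax = plt.subplots(figsize=(12, 4))
image = ax.imshow(ph_by_day.to_numpy(), aspect="auto", cmap="viridis")
ax.set_yticks(range(len(ph_by_day.index)))
ax.set_yticklabels(ph_by_day.index)
ax.set_xlabel("Day of month")
ax.set_ylabel("Month")
fig.colorbar(image, ax=ax, label="Ore Pulp pH (daily mean)")
```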
Our heatmap offers us a general representation of our Ore Pulp pH values and suggests that it’s been stable during those 6 months of data. But it’s hard to see what the exact values are, and when they happened. For this, we’ll use the following code:
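One way to get the exact values and their dates is `idxmin()` and `idxmax()`, which return the row labels of the extremes. The rows below are illustrative. Their lowest and highest readings are set to the figures reported in this article so the printed output matches the text.

```python
import pandas as pd

# Illustrative rows; the two extremes mirror the values reported in the article.
df = pd.DataFrame({
    "date": pd.to_datetime(["2017-03-15", "2017-05-02", "2017-07-20", "2017-09-01"]),
    "Ore Pulp pH": [8.753, 9.81, 10.8081, 9.64],
})

# Look up the full rows holding the lowest and highest pH.
lowest = df.loc[df["Ore Pulp pH"].idxmin()]
highest = df.loc[df["Ore Pulp pH"].idxmax()]

print("min:", lowest["Ore Pulp pH"], "on", lowest["date"].date())
print("max:", highest["Ore Pulp pH"], "on", highest["date"].date())
```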
Now we can see that our min is 8.753, recorded on March 15th. Our max is 10.8081, recorded on July 20th.
4. Correlation between % Iron Concentrate and % Silica Concentrate
At this point it’s well known that in order to get pure iron, our % Iron Concentrate needs to be high and our % Silica Concentrate low. It would be interesting to verify this by creating a scatterplot that shows us this. We will make a graph for the first month and a graph for the last one.
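The two scatterplots could be drawn side by side as in the sketch below. It assumes Matplotlib, puts % Silica Concentrate on the x-axis and % Iron Concentrate on the y-axis, and uses randomly generated data with a built-in negative relationship. The real tables would come from filtering the dataset by month.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)


def make_sample(n=200):
    """Random stand-in for one month: iron falls as silica rises."""
    silica = rng.uniform(1.0, 5.0, n)
    iron = 68.0 - 0.9 * silica + rng.normal(0.0, 0.4, n)
    return pd.DataFrame({
        "% Silica Concentrate": silica,
        "% Iron Concentrate": iron,
    })


first_month = make_sample()
last_month = make_sample()

# One scatterplot per month, with shared axes so they are directly comparable.
fig, axes = plt.subplots(1, 2, figsize=(12, 5), sharex=True, sharey=True)
panels = [("First month", first_month), ("Last month", last_month)]
for ax, (label, table) in zip(axes, panels):
    ax.scatter(table["% Silica Concentrate"], table["% Iron Concentrate"],
               s=8, alpha=0.5)
    ax.set_title(label)
    ax.set_xlabel("% Silica Concentrate")
axes[0].set_ylabel("% Iron Concentrate")
fig.tight_layout()
```

Sharing both axes keeps the two panels on the same scale, so a difference in shape between the months would stand out immediately.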
These two graphs look quite similar, which indicates that there’s no anomaly in our mining process. We can see most of the points that are on the right side of the chart are also at the bottom, and the ones at the top are mostly on the left. That indicates that the smaller % Silica Concentrate is, the bigger the % Iron Concentrate is.
Findings: