Python in the Engineering World
Dan Waterstradt, MBA
Data Analyst @ University of Rochester | I take dirty data and make it clean then create stunning visuals
Intro
Data can be confusing. We have so many tools to learn and use. These tools can be helpful and insightful, but also VERY overwhelming! Python can make your job easier because it is an amazingly powerful tool.
In this version of my latest project, I focused on using Python in an engineering setting. My role is data analyst for Metals 'R' Us, and I have been given data from the flotation plant to answer some business questions and obtain data for leadership.
THE DATA
The dataset we are using is from Kaggle. It focuses on, essentially, the dirt samples that Metals 'R' Us collects to verify that they contain large amounts of iron, so that the iron can be collected, refined, cleaned and sold.
The business questions and topics I worked to answer were the following:
Insights
1. Count: 737453.0 Mean: 56.2947 Min: 42.74 Max: 65.78
2. There was no unnatural occurrence on that date. Nothing major in the data suggests that the sample collection was off.
3. The correlation values between the variables we gathered are low, and there is no apparent correlation between the variables we created.
4. The % Iron Concentrate did, in fact, fluctuate throughout the day on 6/1/2017. The other variables we researched fluctuated throughout the day as well.
Analysis
My boss wants to get some summary stats for each column. They want the count, median, min and max for every column in this dataset. Surprisingly enough, this can be found with a single call in Python (the median appears as the 50% row of the output):
df.describe()
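As a minimal sketch of what that call returns, here is describe() on a small made-up frame. The column names and values below are illustrative assumptions, not the actual Kaggle data. The sketch also shows one way to keep only the rows that were requested.

```python
import pandas as pd

# Hypothetical sample standing in for the flotation data (values are made up).
df = pd.DataFrame({
    "% Iron Feed": [55.2, 57.1, 56.4, 54.9],
    "% Silica Feed": [16.9, 13.8, 14.6, 17.2],
})

# describe() returns count, mean, std, min, quartiles and max per numeric column.
summary = df.describe()

# Keep only the requested statistics; the "50%" row is the median.
requested = summary.loc[["count", "50%", "min", "max"]]
print(requested)
```

Selecting rows from the describe() output keeps everything in one table, which is handy when the result has to be handed to leadership as is.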
First, we need to figure out the timeline, that is, the period of time our dataset spans.
We do this with the following code:
max_date = df['date'].max()
print('The max date is ' + str(max_date))
min_date = df['date'].min()
print('The min date is ' + str(min_date))
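One caveat: min() and max() only give a reliable chronological answer when the date column holds real datetimes rather than plain strings. The sketch below assumes the dates were loaded as text and converts them first. The timestamps and the value column are made up for illustration.

```python
import pandas as pd

# Hypothetical rows; in the real file the 'date' column may arrive as text.
df = pd.DataFrame({
    "date": ["2017-03-10 01:00:00", "2017-09-09 23:00:00", "2017-06-01 12:00:00"],
    "% Iron Concentrate": [66.9, 64.3, 65.1],
})

# Convert the text to datetimes so comparisons are chronological.
df["date"] = pd.to_datetime(df["date"])

min_date = df["date"].min()
max_date = df["date"].max()
span = max_date - min_date  # a Timedelta covering the whole dataset

print(f"The data runs from {min_date} to {max_date} ({span.days} days)")
```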
This helps us narrow down the rows. However, we still need to address the columns.
From here, we create a variable that is a list of all the important columns we want to focus on. Let's call this variable important_cols.
Once this is done, we create a new dataframe called df_june_important by selecting only the columns in important_cols from the older dataframe df_june.
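The two steps can be sketched as follows, assuming df_june holds only the rows for June 1, 2017. The sample rows, the column names, and the choice of which columns count as important are all assumptions made for this illustration.

```python
import pandas as pd

# Hypothetical rows around the day of interest (values are made up).
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-05-31 23:00:00", "2017-06-01 00:00:00",
        "2017-06-01 12:00:00", "2017-06-02 00:00:00",
    ]),
    "% Iron Concentrate": [66.1, 65.8, 64.9, 66.4],
    "% Silica Concentrate": [1.9, 2.1, 2.8, 1.7],
    "Ore Pulp pH": [9.8, 10.0, 10.1, 9.7],
    "Flotation Column 05 Level": [450.2, 447.9, 452.3, 449.0],
    "Starch Flow": [3100.0, 3050.0, 2980.0, 3120.0],
})

# Rows: keep June 1, 2017 only, using a half-open interval [start, end).
day_start = pd.Timestamp("2017-06-01")
day_end = day_start + pd.Timedelta(days=1)
df_june = df[(df["date"] >= day_start) & (df["date"] < day_end)]

# Columns: list the ones to focus on, then select them in one step.
important_cols = [
    "date",
    "% Iron Concentrate",
    "% Silica Concentrate",
    "Ore Pulp pH",
    "Flotation Column 05 Level",
]
df_june_important = df_june[important_cols]

print(df_june_important)
```

Passing a list of column names inside the brackets returns a new dataframe with just those columns, in the order given.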
To address this question from our boss, we simply import seaborn as sns and call its pairplot function with the dataframe as an argument.
After looking at these plots, there does not seem to be any correlation or relationship among these variables.
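A pair plot is judged by eye, so it can help to put numbers next to it. The sketch below is a complementary check rather than the method used in this project: it computes the pairwise Pearson correlations with pandas. The frame is a made-up stand-in for df_june_important with the date column already dropped.

```python
import pandas as pd

# Hypothetical numeric columns (values are made up for illustration).
df_june_important = pd.DataFrame({
    "% Iron Concentrate": [65.8, 64.9, 66.2, 65.1, 64.7],
    "% Silica Concentrate": [2.1, 2.8, 1.6, 2.2, 3.0],
    "Ore Pulp pH": [10.0, 10.1, 9.8, 9.9, 10.2],
})

# Pearson correlation between every pair of columns.
corr = df_june_important.corr()

# List each unique pair once, skipping the diagonal of 1.0 values.
cols = list(corr.columns)
pairs = [
    (a, b, corr.loc[a, b])
    for i, a in enumerate(cols)
    for b in cols[i + 1:]
]

for a, b, r in pairs:
    print(f"{a} vs {b}: r = {r:.2f}")
```

Values near 0 support the "no apparent relationship" reading of the plots, while values near 1 or -1 would be worth a second look.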
We can see from the above line chart that the Amina Flow our boss wanted to see changes a great deal throughout the day. Since the line chart was very helpful, leadership wants to see how other variables change during the exact same timeframe as the Amina Flow.
Based on the information obtained via Python, we can see that the iron concentrate, silica concentrate, Ore Pulp pH and the flotation column level all change throughout the day.
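To back up the line charts with numbers, one option is to summarize how much each variable moves during the day. This is a rough sketch on made-up readings, not the project's actual code. It assumes a datetime column named date and uses two example variables.

```python
import pandas as pd

# Hypothetical readings for part of one day (values are made up).
df_june_important = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-06-01 00:00:00", "2017-06-01 00:30:00",
        "2017-06-01 01:00:00", "2017-06-01 01:30:00",
    ]),
    "Amina Flow": [550.0, 560.0, 520.0, 500.0],
    "Ore Pulp pH": [9.8, 10.0, 10.1, 9.9],
})

value_cols = ["Amina Flow", "Ore Pulp pH"]

# Average of each variable within each hour of the day.
hour_of_day = df_june_important["date"].dt.hour
hourly_mean = df_june_important.groupby(hour_of_day)[value_cols].mean()

# Spread between the highest and lowest reading of the day.
daily_range = df_june_important[value_cols].max() - df_june_important[value_cols].min()

print(hourly_mean)
print(daily_range)
```

The hourly table has one row per hour, so it can also be drawn as a line chart to compare with the raw readings.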
Recommendations/Further insights
THANK YOU so much for checking out my latest project! This was one of the more challenging projects for me trying to understand and use Python. I hope you enjoy it!
For more of my work please check out the featured section of my profile!