Python in the Engineering World


Intro

Data can be confusing. There are so many tools to learn and use, and they can be helpful, insightful and also VERY overwhelming! Python aims to make your job easier, and it is an amazingly powerful tool.

In my latest project, I focused on using Python in an engineering setting. My role is data analyst for Metals 'R' Us, and I have been given data from the flotation plant to answer some business questions and provide figures for leadership.


THE DATA

The dataset we are using is from Kaggle and focuses on, essentially, the dirt samples that Metals 'R' Us collects to verify that they contain large amounts of iron, so that the iron can be collected, refined, cleaned and sold.


I worked on the following business questions:


  • Can we find the Count, Median, Min and Max for every column in the dataset?
  • Was there an unnatural occurrence in the sample collection process on 6/1/2017?
  • We have multiple variables. Is there a correlation between the multiple variables we have found?
  • Is there a change in the Amina Flow throughout the day? Also, are there other variables that we can present on?


Insights

  1. The count, mean, min and max show as follows:

Count: 737453.0 Mean: 56.2947 Min: 42.74 Max: 65.78

2. There was no unnatural occurrence on that date. Nothing major in the sample data suggests that the collection process was off.

3. The correlation between the variables we examined is low, and there is no apparent relationship between them.

4. The % Iron Concentrate did, in fact, fluctuate throughout the day on 6/1/2017. The other variables we researched fluctuated throughout the day as well.


Analysis

  • Can we find the Count, Median, Min and Max for every column in the dataset?


My boss wants some summary stats for each column: the count, median, min and max. Surprisingly enough, this can be found with a single call in Python (the median appears as the 50% row of the output):

df.describe()
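Since df.describe() reports the mean by default and several other statistics besides, it can help to ask for exactly the four values leadership wants. Here is a minimal sketch of one way to do that, assuming a small made-up dataframe in place of the plant data (the values below are placeholders, not real readings):

```python
import pandas as pd

# Hypothetical stand-in for the flotation data; the values are made up.
df = pd.DataFrame({
    "Amina Flow": [557.4, 563.9, 568.0, 545.1],
    "Ore Pulp pH": [10.0, 9.9, 10.2, 10.1],
})

# agg() takes a list of statistic names and returns one row per statistic,
# with one column for every column in the dataframe.
summary = df.agg(["count", "median", "min", "max"])
print(summary)
```

The result is itself a dataframe, so a single value such as summary.loc["median", "Amina Flow"] can be pulled out for a report.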




  • Was there an unnatural occurrence in the sample collection process on 6/1/2017?

First, we need to figure out the timeline, that is, how long a span our dataset covers.

We do this with the following code:

max_date = df['date'].max()

print('The max date is ' + str(max_date))

min_date = df['date'].min()

print('The min date is ' + str(min_date))


This tells us the range of rows available, so we can filter them down to 6/1/2017 in a dataframe called df_june. However, we still have all of the columns to address.

From here we need to create a variable that holds a list of all the important columns we want to focus on. Let's call this variable important_cols.

Once this is done, we create a new dataframe called df_june_important and set it equal to df_june restricted to the columns in important_cols.
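The row and column steps above can be sketched roughly as follows. This is only an illustration under assumptions: the sample rows are made up, "Unused Column" is a hypothetical placeholder, and the contents of important_cols are my guess at a reasonable selection rather than the exact list used in the project.

```python
import pandas as pd

# Hypothetical sample rows; the real data comes from the Kaggle file.
df = pd.DataFrame({
    "date": ["2017-05-31 23:00:00", "2017-06-01 00:00:00",
             "2017-06-01 12:00:00", "2017-06-02 00:00:00"],
    "Amina Flow": [550.0, 557.4, 563.9, 568.0],
    "Ore Pulp pH": [10.1, 10.0, 9.9, 10.2],
    "Unused Column": [1, 2, 3, 4],
})

# Parse the text timestamps so comparisons work on real datetimes.
df["date"] = pd.to_datetime(df["date"])

# Keep only the rows whose timestamp falls on 1 June 2017.
start = pd.Timestamp("2017-06-01")
end = pd.Timestamp("2017-06-02")
df_june = df[(df["date"] >= start) & (df["date"] < end)]

# Assumed list of columns of interest; adjust it to the analysis at hand.
important_cols = ["date", "Amina Flow", "Ore Pulp pH"]
df_june_important = df_june[important_cols]
print(df_june_important)
```

Using a half-open range (on or after midnight of 6/1, before midnight of 6/2) keeps every reading from that day without pulling in the first reading of the next one.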




  • We have multiple variables. Is there a correlation between the multiple variables we have found?

To address this question from our boss, we simply import seaborn as sns and call sns.pairplot with the dataframe as its argument.


After looking at these plots, there does not seem to be any correlation or relationship among these variables.
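A pair plot is judged by eye, so a numeric check can be a useful companion to it. The sketch below is not part of the original analysis; it assumes a small made-up dataframe and uses the pandas corr() method to print the Pearson correlation for every pair of columns:

```python
import pandas as pd

# Hypothetical readings for a handful of samples; the values are made up.
df_june_important = pd.DataFrame({
    "Amina Flow": [557.4, 563.9, 568.0, 545.1, 551.3],
    "Ore Pulp pH": [10.0, 9.9, 10.2, 10.1, 9.8],
    "% Iron Concentrate": [66.9, 67.1, 66.4, 65.9, 66.7],
})

# Pearson correlation between every pair of numeric columns.
# Values near 0 suggest little linear relationship; values near
# 1 or -1 suggest a strong one.
corr = df_june_important.corr()
print(corr.round(2))
```

Each column correlates perfectly with itself, so the diagonal is always 1; the off-diagonal values are the ones to compare against what the pair plot suggests.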



  • Is there a change in the Amina Flow throughout the day? Also, are there other variables that we can present on?



We can see from the above line chart that the Amina Flow our boss asked about changes a great deal throughout the day. Since the line chart was very helpful, leadership wants to see how other variables change during the exact same timeframe.
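For readers who want to build a similar chart, here is a minimal sketch using matplotlib. It is an assumption on my part that this resembles how the chart was drawn; the readings are made up, and the output file name amina_flow.png is a placeholder.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical readings across 1 June 2017; the values are made up.
df_june_important = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-06-01 00:00", "2017-06-01 04:00", "2017-06-01 08:00",
        "2017-06-01 12:00", "2017-06-01 16:00", "2017-06-01 20:00",
    ]),
    "Amina Flow": [557.4, 563.9, 568.0, 545.1, 551.3, 560.2],
})

# One line: time of day on the x-axis, the measured flow on the y-axis.
fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(df_june_important["date"], df_june_important["Amina Flow"])
ax.set_xlabel("Time on 6/1/2017")
ax.set_ylabel("Amina Flow")
ax.set_title("Amina Flow throughout the day")
fig.autofmt_xdate()  # tilt the time labels so they do not overlap
fig.savefig("amina_flow.png")
```

Swapping "Amina Flow" for another column name gives the same chart for any other variable over the same timeframe.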


Based on the information obtained via Python, we can see that the iron concentrate, silica concentrate, Ore Pulp pH and flotation column level all change throughout the day.



Recommendations/Further insights

  1. I was able to find the count, median, min and max for all columns within this dataset.
  2. There was no odd occurrence on 6/1 that showed in the data collected.
  3. There is no apparent correlation between the variables that we pulled and studied.
  4. The line charts for the Amina Flow and the other variables show that the readings vary from hour to hour throughout the day.


THANK YOU so much for checking out my latest project! This was one of the more challenging projects for me trying to understand and use Python. I hope you enjoy it!

For more of my work please check out the featured section of my profile!

Dan Waterstradt, MBA | LinkedIn
