Python in the Engineering World

Dan Waterstradt, MBA

Data Analyst @ University of Rochester | I take dirty data and make it clean then create stunning visuals

Published: February 7, 2023

Intro

Data can be confusing. We have so many tools to learn and use. These tools can be helpful, insightful and also VERY overwhelming! Python looks to make your job easier as it is an amazingly powerful tool.

In my latest project, I focused on using Python in an engineering setting. My role is data analyst for Metals 'R' Us, and I have been given data from the flotation plant to answer some business questions and obtain data for leadership.

THE DATA

The dataset we are using is from Kaggle and focuses on, essentially, the dirt samples that Metals 'R' Us collects to verify the large amounts of iron so that it can be collected, refined, cleaned and sold.

To further explore the data -->Click Here

The business questions and topics I worked on to answer were the following:

Can we find the Count, Median, Min and Max for every column in the dataset?
Was there an unnatural occurrence in the sample collection process on 6/1/2017?
We have multiple variables. Is there a correlation between the multiple variables we have found?
Is there a change in the Amina Flow throughout the day? Also, are there other variables that we can present on?

Insights

1. The count, mean, min and max show as follows:

Count: 737453.0 Mean: 56.2947 Min: 42.74 Max: 65.78

2. There was no unnatural occurrence on 6/1/2017. Nothing major in the data suggests that the sample collection was off on that date.

3. The correlation values between the variables we gathered are low, and there is no apparent correlation between them.

4. The % Iron Concentrate did, in fact, fluctuate throughout the day on 6/1/2017. The other variables we researched show fluctuations throughout the day as well.

Analysis

Can we find the Count, Median, Min and Max for every column in the dataset?

My boss wants to get some summary stats for each column. They want the count, median, min and max for every column in this dataset. Surprisingly enough, this can be found with a single call in Python:

df.describe()
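One thing worth knowing is that describe() labels the median as "50%". Here is a minimal sketch of pulling out just the four stats my boss asked for. The small dataframe and its values are made up for illustration and stand in for the real Kaggle file.

```python
import pandas as pd

# Tiny stand-in for the flotation data; the values are illustrative only.
df = pd.DataFrame({
    "% Iron Feed": [55.2, 57.1, 56.0, 54.8, 58.3],
    "Amina Flow": [557.4, 563.9, 568.0, 560.2, 571.5],
})

# describe() reports the median as the "50%" row, so select it by that label.
summary = df.describe().loc[["count", "50%", "min", "max"]]

# Rename the row so the output matches the wording of the request.
summary = summary.rename(index={"50%": "median"})

print(summary)
```

The result has one column per numeric column in the dataframe and one row per statistic, which is easy to hand over to leadership.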

Was there an unnatural occurrence in the sample collection process on 6/1/2017?

First, we need to figure out the timeline of our dataset, that is, the earliest and latest dates it covers.

We do this with the following code:

max_date = df['date'].max()

print('The max date is ' + str(max_date))

min_date = df['date'].min()

print('The min date is ' + str(min_date))
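If the date column was loaded as text, min() and max() compare strings rather than dates. A minimal sketch of parsing the column first and then measuring the span is below. The three timestamps are invented for illustration, and I am assuming the column is named date as in the snippets above.

```python
import pandas as pd

# Illustrative timestamps only; the real file has many more rows.
df = pd.DataFrame({
    "date": [
        "2017-03-10 01:00:00",
        "2017-06-01 12:00:00",
        "2017-09-09 23:00:00",
    ],
})

# Parse the text into real timestamps so min/max compare dates, not strings.
df["date"] = pd.to_datetime(df["date"])

min_date = df["date"].min()
max_date = df["date"].max()

# Subtracting two timestamps gives a Timedelta describing the span.
span = max_date - min_date

print("The min date is " + str(min_date))
print("The max date is " + str(max_date))
print("The data spans " + str(span.days) + " days")
```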

This helps us with the rows. However, we still have all the columns to address.

From here we need to create a variable that is a list of all the important columns we want to focus on. Let's call this variable important_cols.

Once this is done, we create a new dataframe called df_june_important and set it equal to df_june[important_cols], which keeps only those columns from the older dataframe df_june.
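Put together, the row filter and the column selection could look like the sketch below. It assumes df_june holds only the rows from 6/1/2017 and that the date column has already been parsed into timestamps. The sample rows and the particular columns in important_cols are my own picks for illustration.

```python
import pandas as pd

# Small made-up sample that straddles June 1, 2017.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-05-31 23:00:00",
        "2017-06-01 00:00:00",
        "2017-06-01 13:00:00",
        "2017-06-02 00:00:00",
    ]),
    "% Iron Concentrate": [66.1, 66.4, 65.9, 66.8],
    "% Silica Concentrate": [1.9, 1.7, 2.1, 1.5],
    "Ore Pulp pH": [9.8, 10.0, 9.9, 10.1],
    "Starch Flow": [3000.0, 3100.0, 2950.0, 3050.0],
})

# Keep only the rows stamped on June 1, 2017 (start inclusive, end exclusive).
start = pd.Timestamp("2017-06-01")
end = pd.Timestamp("2017-06-02")
df_june = df[(df["date"] >= start) & (df["date"] < end)]

# Hypothetical list of columns to focus on.
important_cols = ["date", "% Iron Concentrate", "% Silica Concentrate", "Ore Pulp pH"]

# Indexing with a list of names returns a dataframe with just those columns.
df_june_important = df_june[important_cols]

print(df_june_important)
```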

We have multiple variables. Is there a correlation between the multiple variables we have found?

To address this question from our boss, we simply import seaborn as sns and call sns.pairplot with the data frame as an argument.

After looking at these plots, there does not seem to be any correlation or relationship between the variables.
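A pair plot is judged by eye, so it can help to back it up with numbers. One option, which is my suggestion rather than a step from the original analysis, is the correlation matrix from pandas. The values below are invented and do not reproduce the results from the real data.

```python
import pandas as pd

# Made-up readings; these do not reflect the real dataset.
df_june_important = pd.DataFrame({
    "% Iron Concentrate": [66.1, 66.4, 65.9, 66.8, 66.2],
    "% Silica Concentrate": [1.9, 1.7, 2.1, 1.5, 1.8],
    "Ore Pulp pH": [9.8, 10.0, 9.9, 10.1, 9.7],
})

# corr() computes pairwise Pearson correlation by default.
# Values near 0 mean little linear relationship; values near 1 or -1 mean a strong one.
corr = df_june_important.corr()

print(corr.round(2))
```

Each cell is the correlation between the row variable and the column variable, so the diagonal is always 1.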

Is there a change in the Amina Flow throughout the day? Also, are there other variables that we can present on?

We can see from the line chart that the Amina Flow changes a great deal throughout the day, which is what our boss wanted to know. Since the line chart was very helpful, leadership wants to see how other variables change during the exact same timeframe as the Amina Flow.

Based on the information obtained via Python, we can see that the iron concentrate, silica concentrate, Ore Pulp pH and the flotation column level all change throughout the day.
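To put numbers behind a line chart like this, one approach is to average each variable by hour and look at the hour-to-hour change. This is a minimal sketch with invented readings. It assumes df_june holds the rows for the single day and that date is already a timestamp column.

```python
import pandas as pd

# Invented half-hourly readings for part of one day.
df_june = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-06-01 00:00:00", "2017-06-01 00:30:00",
        "2017-06-01 01:00:00", "2017-06-01 01:30:00",
        "2017-06-01 02:00:00", "2017-06-01 02:30:00",
    ]),
    "Amina Flow": [550.0, 560.0, 540.0, 530.0, 580.0, 590.0],
})

# Group the readings by hour of day and average them.
hourly = df_june.groupby(df_june["date"].dt.hour)["Amina Flow"].mean()
hourly.index.name = "hour"

# diff() gives the change from the previous hour; the first hour has no previous value.
change = hourly.diff()

print(hourly)
print(change)
```

The same groupby works for any other column, so swapping "Amina Flow" for another column name gives the matching hourly series for that variable.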

Recommendations/Further insights

I was able to find the count, median, min and max for all columns within this dataset.
There was no odd occurrence on 6/1 that showed in the data collected.
There is no correlation between the multiple variables that we pulled and studied.
The line charts for the Amina Flow and the other variables show that the mineral change rates vary from hour to hour.

THANK YOU so much for checking out my latest project! This was one of the more challenging projects for me trying to understand and use Python. I hope you enjoy it!

For more of my work please check out the featured section of my profile!

Dan Waterstradt, MBA | LinkedIn
