Processing Plant Data with Python.
Picture 1

In this project I have been “hired” as a data analyst for a company in the manufacturing, engineering and science space. More specifically, the company is a mining company called Metals R' Us, and I have been given data from their froth flotation processing plant. The main goal of this analysis is to investigate a possible issue that occurred on June 1, 2017. The plant manager wants to know whether there is a problem that needs to be addressed. First, let's get an idea of what the froth flotation process involves.

Froth Flotation Process: Wiki Explanation

The froth flotation process is widely used in mineral processing. It separates the desired material from unwanted material by bubbling air or nitrogen through large water-filled tanks so that the desired material, the concentrate, floats to the surface, as seen in Diagram 1. The pulp is a mixture of water and ore that is brought in to be processed. This process is important because it makes it possible to extract desired metals from large amounts of lower-grade material. It is also used in wastewater treatment plants, where water is separated from solids or oils.

Diagram 1

The Data

The data used for this analysis is real. It comes from Kaggle, where it is used to predict quality in the froth flotation process. The data set covers March 2017 through September 2017. Sampling is uneven: some columns are sampled every 20 seconds and others every hour. There are 24 columns and 737,453 rows in this dataset.

Three libraries need to be loaded into Python: Pandas, Seaborn and Matplotlib. Pandas is used for loading and manipulating the data, while Seaborn and Matplotlib are used for data visualization.

To get an idea of what the dataset contains, df.head() previews the first five rows of data and df.shape returns the number of rows and columns.
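As a minimal sketch of these two calls, the snippet below builds a tiny stand-in dataframe. The values are invented for illustration, and the column names are assumed to match the Kaggle file's headers.

```python
import pandas as pd

# Tiny synthetic stand-in for the plant data; every value here is invented.
df = pd.DataFrame({
    "date": [f"2017-06-01 0{h}:00:00" for h in range(6)],
    "% Iron Concentrate": [64.0, 64.5, 65.1, 65.3, 64.8, 65.0],
    "Ore Pulp pH": [9.8, 9.9, 10.0, 9.7, 9.8, 9.9],
})

preview = df.head()        # first five rows by default
n_rows, n_cols = df.shape  # shape is an attribute, not a method call

print(preview)
print(f"{n_rows} rows x {n_cols} columns")
```

On the full dataset, df.shape should report the 737,453 rows and 24 columns mentioned above.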

The dates in this dataset are stored as text, so they had to be converted to a datetime column before they could be aggregated. The Python code used for this task is: df['date'] = pd.to_datetime(df['date']). A data dictionary describes the columns used in the dataset.
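To show why the conversion matters, here is a small sketch on invented rows. Once the column holds real datetimes, it can be grouped by calendar day. The daily mean is only an example aggregation, not necessarily one used in this analysis.

```python
import pandas as pd

# Invented rows; the 'date' column starts out as plain text.
df = pd.DataFrame({
    "date": ["2017-06-01 00:00:00", "2017-06-01 01:00:00", "2017-06-02 00:00:00"],
    "Ore Pulp pH": [9.8, 10.0, 10.2],
})

# Convert the text column to datetime64 so date-based operations work.
df["date"] = pd.to_datetime(df["date"])

# Example aggregation: average pH per calendar day.
daily_ph = df.groupby(df["date"].dt.date)["Ore Pulp pH"].mean()
print(daily_ph)
```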

Another issue with the dataset is that commas were used as the decimal separator in the numerical data. I had Pandas treat the comma as the decimal point when loading the file, so the values are read as numbers with periods instead of commas. The code I used to address this is df = pd.read_csv('MiningProcess_Flotation_Plant_Database.csv', decimal=",")
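The effect of the decimal argument can be seen on a two-row sample. This is a sketch under assumptions: the in-memory text below is invented, and it assumes the real file quotes its comma-decimal fields in the same way.

```python
import io
import pandas as pd

# Invented sample in the assumed file layout: comma-separated fields,
# with comma-decimal numbers wrapped in quotes.
raw = io.StringIO(
    'date,% Iron Concentrate,Ore Pulp pH\n'
    '2017-06-01 00:00:00,"64,03","9,82"\n'
    '2017-06-01 01:00:00,"65,10","9,95"\n'
)

# Without decimal=",", the numbers are read as text.
as_text = pd.read_csv(raw)

# With decimal=",", the same fields are parsed as floats.
raw.seek(0)
as_numbers = pd.read_csv(raw, decimal=",")

print(as_text.dtypes)
print(as_numbers.dtypes)
```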

Picture 2. Diagram of a cylindrical flotation cell with camera and light used in image analysis of the froth surface. Source: Wikipedia


Getting the Results

To get a handle on some statistics of the data, I used df.describe() to find the mean, max, min and other information for the different columns.
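A minimal sketch of df.describe() on invented values follows. The result is itself a dataframe, so individual statistics can be pulled out with .loc.

```python
import pandas as pd

# Invented values for two of the plant columns.
df = pd.DataFrame({
    "Ore Pulp pH": [9.5, 9.8, 10.1, 10.4],
    "Flotation Column 05 Level": [420.0, 450.0, 480.0, 510.0],
})

# count, mean, std, min, quartiles and max for each numeric column.
summary = df.describe()
print(summary)

# Individual statistics are addressed by row label and column name.
ph_mean = summary.loc["mean", "Ore Pulp pH"]
level_max = summary.loc["max", "Flotation Column 05 Level"]
```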

I filtered the data for the month of June by creating a new dataframe called df_june. This is done to speed up any searches that need to be made. I only want to concentrate on a few columns, so I created a new variable called important_cols. I then created a new dataframe, df_june_important, and set it equal to the older dataframe (df_june) restricted to the columns in important_cols.
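The original filtering code was shown only as an image, so the following is one possible way to write it, not necessarily the exact code used. The rows are invented, and the column list is an assumption based on the columns compared later in the article.

```python
import pandas as pd

# Invented rows spanning late May to early July.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-05-31 23:00:00", "2017-06-01 00:00:00", "2017-06-01 12:00:00",
        "2017-06-15 00:00:00", "2017-07-01 00:00:00",
    ]),
    "% Iron Concentrate": [64.9, 65.2, 65.0, 64.7, 65.4],
    "% Silica Concentrate": [2.4, 2.1, 2.3, 2.6, 2.0],
    "Ore Pulp pH": [9.8, 9.9, 10.0, 9.7, 9.8],
    "Flotation Column 05 Level": [451.0, 448.0, 455.0, 449.0, 452.0],
    "Starch Flow": [3100.0, 3050.0, 3200.0, 3150.0, 3000.0],
})

# Keep only rows whose timestamp falls in June 2017.
in_june = (df["date"] >= pd.Timestamp("2017-06-01")) & (df["date"] < pd.Timestamp("2017-07-01"))
df_june = df[in_june]

# Assumed column list; adjust to whichever columns matter for the question.
important_cols = [
    "date", "% Iron Concentrate", "% Silica Concentrate",
    "Ore Pulp pH", "Flotation Column 05 Level",
]
df_june_important = df_june[important_cols]

# Narrow further to the day under investigation.
june_first = df_june_important[df_june_important["date"].dt.date == pd.Timestamp("2017-06-01").date()]
print(june_first)
```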

The result of the code was then inspected for June 1st.

Next, I called on the Seaborn library to compare % Iron Concentrate, % Silica Concentrate, Ore Pulp pH and Flotation Column 05 Level side by side. There were some suspicions that Flotation Column 05 may be having issues.

There doesn’t appear to be any correlation between these columns. Even so, this is still valuable information to keep on hand for future reference. To be sure I am reading the plots correctly, I can run the .corr() method on the df_june_important dataframe. The resulting table makes it easier to see that the correlation values are very low.
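For reference, a small sketch of .corr() on invented values. Unlike the real data, the iron and silica columns here are deliberately constructed to move in opposite directions, so the example shows what a strong correlation looks like in the table.

```python
import pandas as pd

# Invented values; iron and silica are built to be perfectly anti-correlated.
df_june_important = pd.DataFrame({
    "% Iron Concentrate": [64.0, 65.0, 66.0, 67.0],
    "% Silica Concentrate": [3.0, 2.5, 2.0, 1.5],
    "Ore Pulp pH": [9.8, 10.1, 9.7, 10.0],
    "Flotation Column 05 Level": [450.0, 430.0, 470.0, 440.0],
})

# Pairwise Pearson correlation between all numeric columns.
corr_matrix = df_june_important.corr()
print(corr_matrix.round(2))

# A single pair can be read off by row and column label.
iron_vs_silica = corr_matrix.loc["% Iron Concentrate", "% Silica Concentrate"]
```

Values near 0 indicate little linear relationship, while values near 1 or -1 indicate a strong one.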

It can also be useful to view the same information in a line chart. Seaborn will be used again to create this graph. Separate graphs had to be used because the units of measure are too different.


Conclusion

There does not appear to be anything troubling happening on June 1, as all readings are within normal ranges. The one outlier that needs to be researched further is the large drop in Flotation Column 05 Level, which fell to a reading of 167.36. Not all data analysis is going to produce obvious results. Even when that is the case, the data can be useful for future comparisons. Python and its libraries can produce charts fairly easily to show the information you are trying to analyze, which can help keep businesses running smoothly.
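A drop like this can also be located without a chart. The sketch below uses idxmin() to find the row with the lowest level and compares it with the day's median. Only the 167.36 reading comes from the analysis above. The timestamps and the other values are invented.

```python
import pandas as pd

# Invented hourly readings; only 167.36 is taken from the analysis.
df = pd.DataFrame({
    "date": pd.date_range("2017-06-01", periods=6, freq="h"),
    "Flotation Column 05 Level": [452.1, 448.7, 455.3, 167.36, 450.9, 449.5],
})

level = df["Flotation Column 05 Level"]

# Row holding the minimum level, including its timestamp.
lowest_row = df.loc[level.idxmin()]

# How far the minimum sits below a typical reading for the day.
drop_from_median = level.median() - lowest_row["Flotation Column 05 Level"]

print(lowest_row)
print(f"Drop from median: {drop_from_median:.2f}")
```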


Thank you for taking the time to read my analysis. Feel free to reach out if you have any questions or would like to talk analytics.




References:

Picture 1 https://www.thermofisher.com/blog/mining/how-to-improve-mining-and-mineral-operations-heres-a-guide/

Diagram 1 https://commons.wikimedia.org/wiki/File:FlCell.PNG

Picture 2 https://commons.wikimedia.org/wiki/File:Flotation_cell.jpg
