登录查看更多内容

Part 2: Predicting results, and working with Command Boards using Machine Learning

Jose Ignacio Ariznabarreta Fossati

Making data speak...

发布日期: 2022年1月17日

+ 关注

Analyzing the Employees TurnOver: Part 2

If you haven't seen Part 1, please refer to Part 1: Introduction

As an introductory overview of what we should do from an analytical perspective, we must take into Data Science follows the next steps:

Understanding the Problem
Exploratory Data Analysis
Data Visualization
Feature Engineering (Preparing the Dataset for the Machine Learning models)
Model Building
Model Deployment

In this report, we will go through all these steps (the last one we will just simulate some data).

After introducing the company's concern, understanding the problem, and showing the strategy we will take, it's time to start with the Exploratory Data Analysis.

A. Exploratory Analysis

Importing

import pandas as pd
import numpy as np
import math
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme()
from pandas.plotting import scatter_matrix

df = pd.read_csv('dataset/employee_churn_data.csv')df.head()

Data Exploration

df.info()

As we mentioned in Part 1: Introduction of the Report, the variables (features) found are:

'department' -> The department where the employees works/used to work.
'promoted' -> If the employee was promoted in the past 2 years or not.
'review' -> From 0 to 1, the review score the employee received.
'projects' -> The number of projects each employee has been working on.
'salary' -> The level of salary. In this case, the variable is categorical and it can be: 'low', 'medium', and 'high'.
'tenure' -> The employee number of years in the company.
'satisfaction' -> From 0 to 1, the level of satisfaction of the employee (this was captured through surveys).
'bonus' -> Whether the employee has received any bonus or not.
'avr_hrs_mont' -> The average worked hours per month of the employee.
'left' -> If the employee left the company or not. This is our so-called 'target'. This is the value we will try to predict later.

A.1. Info Summary

We have 9540 entries, in 10 columns
We have no NULL values and all the attributes seem to have the correct DataType

df.describe()

Statistical Summary

Satisfaction: its mean is 50%
Average Hours Worked per month: 184hs
In most countries, the legal working schedule per day is (8+1)hs = 9hs. Monthly, this means 180hs.
Here, we are saying that the mean is already above the legal number (above this number, the Company should extra compensate the employee). And, on top of that, the first 25th percentil already pass this quantity of hours. This means that more than 75% of the Company works more than what the law establishes (in a different description, we would have to analyze if the employee chose to do the overtime hours or if they have to, and if they are well retributed for these extra-hours or not. As this might impact on the employee's satisfaction)
Promotion: Only 3% of the employees received any kind of promotion in the past two years.
Years in the Company (tenure): the average time in the Company is 6.5 years.
Bonus: Only 21% of the employees received any kind of Bonus.
Projects: The average # of projects managed by an employee is 3.

A.2. TurnOver (target value)

df.left.value_counts()
no     6756
yes    2784
Name: left, dtype: int64

sns.countplot(x='left', data=df)
plt.xticks(rotation=45)
plt.show()

Overview of summary (Turnover V.S. Non-turnover)

turnover_Summary = df.groupby('left')
turnOverSummaryMean=turnover_Summary.mean()
turnOverSummaryMean

turnover_rate = df.left.value_counts() / df.shape[0]
turnover_rate
no     0.708176
yes    0.291824
Name: left, dtype: float64

Looks like about 71% of employees stayed and 29% of employees left.

A.3.Correlation

Is there any correlation between the variables?

#For analysis purposes, we label-encode the target
dfNumeric=df.copy()
dfNumeric['leftNumeric']=0
dfNumeric.loc[dfNumeric['left']=='yes','leftNumeric']=1
dfNumeric.drop(columns='left',axis=1,inplace=True)
dfNumeric.rename(columns={'leftNumeric':'left'},inplace=True)

firstCol=dfNumeric.loc[:,'left']
restDF=dfNumeric.drop(columns='left',axis=1)
dfNumericCorr=pd.concat([firstCol,restDF], axis=1)
corrmat = dfNumericCorr.corr()
corrmat = (corrmat)
f, ax = plt.subplots(figsize=(12,9))
sns.heatmap(corrmat, annot = True, vmax=.8, square=True)
plt.title('Heatmap of Correlation Matrix')
corrmat

A.3.1. Correlation against the Target

dfNumeric.drop(columns='left',axis=1).corrwith(dfNumeric.loc[:,'left'])
promoted        -0.036777
review           0.304294
projects        -0.012408
tenure           0.010521
satisfaction    -0.009721
bonus           -0.011485
avg_hrs_month    0.009008
dtype: float64

A.3.2. Summary of Correlations

Positive Correlations:

There is a strong positive relation between 'tenure' and 'avg_hrs_month.' This could mean that the longer the employee is in the company, the more hours he/she works.
Also, there is a curious case in the relation between 'review' and 'left' as it looks like the higher the review, the higher the possibilities of leaving the company.

Negative Correlations:

There is a low-level negative relation between 'review' and 'avg_hrs_month'. This might mean that the lower the review, the employees tend to work more hours.
There is a negative relation between 'review' and 'satisfaction'. This means that the higher the review of the employee is, the lower the satisfaction is. This is strange, actually. Maybe the satisfaction level is not acquiring the real employees' satisfaction.

A.4.Plots

A.4.1. Distribution

# Graph Employee Satisfaction

# Sort the dataframe by target
target_0 = dfNumeric[dfNumeric.left==0]
target_1 = dfNumeric[dfNumeric.left==1]

sns.distplot(target_0[['satisfaction']], rug=True,kde=False,label='Stayed')
sns.distplot(target_1[['satisfaction']], kde=False, rug=True,label='Left')
plt.title('Employees Satisfaction')
plt.legend()

plt.show()

The distribution tends to be a normal dist.

The distribution of the employees who left its 'inside' to one of the employees who Stayed, but it is slightly right-skewed.

# Graph Employee Review

# Sort the dataframe by target
target_0 = dfNumeric[dfNumeric.left==0]
target_1 = dfNumeric[dfNumeric.left==1]

sns.distplot(target_0[['review']], rug=True,kde=False,label='Stayed')
sns.distplot(target_1[['review']], kde=False, rug=True,label='Left')
plt.title('Employees Review')
plt.legend()

plt.show()

The distribution of the employees who stayed tends to be a normal dist.
The distribution of the employees who left it's left-skewed.

# Graph Employee avg_hrs_month

# Sort the dataframe by target
target_0 = dfNumeric[dfNumeric.left==0]
target_1 = dfNumeric[dfNumeric.left==1]

sns.distplot(target_0[['avg_hrs_month']], rug=True,kde=False,label='Stayed')
sns.distplot(target_1[['avg_hrs_month']], kde=False, rug=True,label='Left')
plt.title('Employees Average Hrs Worked per Month')
plt.legend()

plt.show()

Here, we see that the maximum values of Average Hours worked per Month belong to the employees who stayed.
However, there is an important peak between 185 and 190 hours from the employees who left the company.

A.4.2. Scatter

# scatter plot matrix

sns.set()
sns.pairplot(df, size = 2.5)
plt.show()

A.5. Correlations between pair of values

A.5.1. Relation between Salary and Target (employees who left)

ax = plt.subplots(figsize=(15, 4))
sns.countplot(x="salary", hue='left', data=df).set_title('Employee Salary Turnover')
plt.show()

There are no important remarks in regards to the Salary distribution

A.5.2. Relation between Department and Target (employees who left)

dfSalaryByDepartment=df
dfSalaryByDepartment['salaryNumeric'] = 0
dfSalaryByDepartment.loc[dfSalaryByDepartment['salary']=='medium','salaryNumeric']=1
dfSalaryByDepartment.loc[dfSalaryByDepartment['salary']=='high','salaryNumeric']=2

dfSalaryByDepartment=df.groupby(['department','salary','salaryNumeric']).count()/df.groupby(['department']).count()
dfSalaryByDepartment=dfSalaryByDepartment[['bonus']]

dfSalaryByDepartment.reset_index(inplace=True)
dfSalaryByDepartment.iloc[:,[0,1,3]]

sns.barplot(x='department', y='bonus',hue='salary',data=dfSalaryByDepartment.sort_values(by='salaryNumeric')).set_title('Employee Department Quantity')
plt.xticks(rotation=90)
plt.show()

totalPerDepartment=df.groupby(['department'])[['left']].count()
totalPerDepartment.rename(columns={'left':'totalEmployees'},inplace=True)
totalPerDepartment=totalPerDepartment.reset_index().sort_values(by='totalEmployees',ascending=False)
totalPerDepartment

# Employee distribution
sns.barplot(x='department', y='totalEmployees',data=totalPerDepartment).set_title('Employee Department Quantity')
 
# Rotate x-labels
plt.xticks(rotation=90)
plt.show()

numberOfLeftOver=dfNumeric.groupby('department')[['left']].sum()
totalPerDep=totalPerDepartment.rename(columns={'department':'departmentTotal'}).sort_values(by='departmentTotal')
numberOfLeftOver=numberOfLeftOver.reset_index().sort_values(by='left',ascending=False)
numberOfLeft2=numberOfLeftOver.sort_values(by='department')
departmentsDF=pd.concat([numberOfLeft2,totalPerDep],axis=1)
departmentsDF.drop(columns=['departmentTotal'],inplace=True)
departmentsDF['absoluteRatio']=departmentsDF['left']/departmentsDF['totalEmployees'].sum()
departmentsDF['relativeRatio']=departmentsDF['left']/departmentsDF['totalEmployees']
departmentsDF

f, ax = plt.subplots(figsize=(15, 5))
sns.countplot(x="department", hue='left', data=df).set_title('Employee Department Turnover')
plt.xticks(rotation=90)
plt.show()

The sales, retail, and engineering department were the top 3 employee turnover departments, in absolute numbers.
In terms of TurnOver Ratio: IT is the first one, then logistics and then marketing.
Why is it important also to consider the ratio? The sales department is the biggest one and had 537 persons who left the company (turnover ratio: 0.285183). This number is already bigger than the whole IT department, which had 110 employees who left the company. Now, taking into consideration that the total # of employees the IT department had was 356, the turnover ratio (0.308989) is bigger than the one from the sales department.
The finance department had the smallest amount of turnover both in terms of absolute and relative levels.

Analyzing by Absolute Values/Ratio

dfNumeric[dfNumeric.department=='sales'].groupby('left').mean()

We see that, on average, the people who left the Company had a lower bonus than one who stayed, but a better review, which might mean they were good employees.

The bigger the review is, the more the employees tend to leave the Company

dfNumeric[dfNumeric.department=='finance'].groupby('left').mean()

We see that the employees from Finance (the sector with the smallest turnover both in absolute and relative levels) who left the Company had the highest review than the ones who stayed, and also the biggest bonus, but they were less satisfied. Anyway, both groups have a bonus bigger than the Company's average one.

The satisfaction level is important
The bigger the review is, the more the employees tend to leave the Company

Analyzing by Relative Levels

Analyzing the employees from IT

dfNumeric[dfNumeric.department=='IT'].groupby('left').mean()

In terms of relative TurnOver, the employees who left the Company

tended to work more hours per month.
had a bigger review
had been promoted a lot less

领英推荐

DECISION TREES AND TITANIC DATASET

Giancarlo Ronci 4 个月前

Check Regional Information via Coarsen

Chonghua Yin 2 年前

Does Excel Deserve a Seat at the "Big Boy" Analytics…

Chris Dutton 9 年前

A.5.3. Relation between # of Projects and Target (employees who left)

ax = sns.countplot(x="projects", hue="left", data=df)
plt.show()

No remarks from this analysis

A.5.4. Relation between Reviews and Target (employees who left)

turnOverSummaryMean

target_0=df['left'] == 'no'
target_1=df['left'] == 'yes'
# Kernel Density Plot
fig = plt.figure(figsize=(15,4),)
ax=sns.kdeplot(df.loc[(target_0),'review'] , color='b',shade=True,label='Stayed')
ax=sns.kdeplot(df.loc[(target_1),'review'] , color='r',shade=True, label='Left')
plt.title('Employee Review Distribution - Left the Company V.S. Stayed')
plt.legend()
plt.show()

ax = sns.boxplot(x="left", y="review", data=df)

Breaking the Analysis in Ranges

def splitRange(theDf,column,minValue,maxValue,numberOfSplits=0,theSteps=0):
    """This function receives a Dataframe, the column that you want to create Ranges from, the Starting Value, the Finishing Value
- the Number of Splits you would like to have, or the steps you would like your range to have.
The first 4 values are mandatory, and then you need to enter either of the last two values."""

    if (theSteps==0) & (numberOfSplits==0):
        print ('Enter the Number or Steps ot the Number of Splits.')
        return
    if numberOfSplits!=0:
        theSteps=maxValue/numberOfSplits

    for i in np.arange(minValue,maxValue,theSteps):
        if i == minValue:
            theDf.loc[(theDf[column]==i),column+'Range']=(str(np.round(i,1))+' - '+str(np.round(i+theSteps,1)))
        
        theDf.loc[(theDf[column]>i) & (theDf[column]<=(i+theSteps)),column+'Range']=(str(np.round(i,1))+' - '+str(np.round(i+theSteps,1)))

        if i+theSteps == maxValue:
            theDf.loc[(theDf[column]==np.float(maxValue)),column+'Range']=(str(np.round(i,1))+' - '+str(np.round(i+theSteps,1)))
    return theDf
tuenumberOfSplits=10
theColumn='review'
theMinValue=0
theMaxValue=1

rangedDf=splitRange(dfNumeric,theColumn,theMinValue,theMaxValue,tuenumberOfSplits)
rangedDf.sort_values(theColumn,inplace=True)
sns.countplot(x='reviewRange',hue='left',data=rangedDf)
plt.xticks(rotation=45)
plt.show()

rangedDf.groupby('reviewRange').mean()

sns.scatterplot(x='review',y='tenure',hue='left',data=rangedDf)
plt.show()

The mean of the employees who left the company was 0.69.
We can see that the higher the review score, the more chances the employee leave the company
On top of that, employees with a review above 0.8 leave the company 90% of the time ((0.8+1)/2)

A.5.5. Relation between Average Hours worked per Month and Target (employees who left)

# Kernel Density Plot
fig = plt.figure(figsize=(15,4),)
ax=sns.kdeplot(df.loc[(target_0),'avg_hrs_month'] , color='b',shade=True,label='Stayed')
ax=sns.kdeplot(df.loc[(target_1),'avg_hrs_month'] , color='r',shade=True, label='Left')
plt.title('Employee Average Hours worked per Month Distribution - Left the Company V.S. Stayed')
plt.legend()
plt.show()

For the employees who stayed in the company, there is a right skewness: the mean is smaller than the median.
For the employees who left the company, there is a clear left skewness. In this case, they tend to work many more hours than the average of the company.
In fact, there is a big peak between 185 and 190 hours.

In general, in all the company the employees tend to do overtime hours.

total=df.groupby('left')[['department']].count()
overTimeHours=df[(df.avg_hrs_month>=185) & (df.avg_hrs_month<=190)]
overTimeHours.groupby('left')[['department']].count()/total

56% of the people who left the company were doing between 185 and 190 hs per month. This means between 9:15 and 9:30 hours per day.

Breaking the Analysis in Ranges

tuenumberOfSplits=0
theColumn='avg_hrs_month'
theMinValue=170
theMaxValue=205
steps=5

rangedDf=splitRange(dfNumeric,theColumn,theMinValue,theMaxValue,tuenumberOfSplits,steps)
rangedDf.sort_values(theColumn,inplace=True)
sns.countplot(x='avg_hrs_monthRange',hue='left',data=rangedDf)
plt.xticks(rotation=45)
plt.show()

rangedAVGDf=rangedDf.groupby('avg_hrs_monthRange').mean()
rangedAVGDf.reset_index(inplace=True)
rangedAVGDf

A.5.6. Relation between Satisfaction and Target (employees who left)

# Kernel Density Plot
fig = plt.figure(figsize=(15,4),)
ax=sns.kdeplot(df.loc[(target_0),'satisfaction'] , color='b',shade=True,label='Stayed')
ax=sns.kdeplot(df.loc[(target_1),'satisfaction'] , color='r',shade=True, label='Left')
plt.title('Employee Average Hours worked per Month Distribution - Left the Company V.S. Stayed')
plt.legend()
plt.show()

For the employees that stayed in the company, the distribution is slightly normal.
However, for the employees who left the company, we can see a small right skewness, and we can see a peak below the mean.

Breaking the Analysis in Ranges

tuenumberOfSplits=10
theColumn='satisfaction'
theMinValue=0
theMaxValue=1
steps=0

rangedDf=splitRange(dfNumeric,theColumn,theMinValue,theMaxValue,tuenumberOfSplits,steps)
#The database has a an error where the row 1755 has a Satisfaction of 1.0000000000000002, when it should be 1.
#With this line, we modify the error
theSatisfaction=df.iloc[1755].satisfaction
df.index[rangedDf['satisfaction']==theSatisfaction]
rangedDf.iloc[9539,12]='0.9 - 1.0'
rangedDf.sort_values('satisfaction',inplace=True)
sns.countplot(x='satisfactionRange',hue='left',data=rangedDf)
plt.xticks(rotation=45)
plt.show()

rangedDf.groupby('satisfactionRange').mean()

A.5.7. Relation between Satisfaction and Review

#The database has a an error where the row 1755 has a Satisfaction of 1.0000000000000002, when it should be 1.
#With this line, we modify the error
theSatisfaction=df.iloc[1755].satisfaction
df.index[rangedDf['satisfaction']==theSatisfaction]
rangedDf.iloc[9539,12]='0.9 - 1.0'
sns.lineplot(x='reviewRange',y='tenure',data=rangedDf.sort_values('review'))
plt.xticks(rotation=45)
plt.show()

The bigger the satisfaction is, the lower the review is. And vice versa. (with exception of the first steps of the range)

sns.scatterplot(x='satisfaction',y='review',hue='left',data=df)
plt.show()

In terms of employees who left the company, we find two distinctive clusters:

Employees with low satisfaction and high review: those might have been good employees but not happy in the company.
Most of the employees who left were grouped with a satisfaction above the mean and with an average review.

A.5.8. Relation between Tenure (years in the company) and Target (employees who left)

f, ax = plt.subplots(figsize=(15, 5))
sns.countplot(x="tenure", hue='left', data=df).set_title('Employee years in the company Turnover')
plt.show()

f, ax = plt.subplots(figsize=(15, 5))
sns.barplot(x="tenure", y="tenure", hue="left", data=df, estimator=lambda x: len(x) / len(df) * 100)
ax.set(ylabel="Percent")
plt.show()

We see that in the years 3 & 4, and 7 & 8 years in the company, the turnover ratio almost reaches a parity.
However, due to statistical significance, I would pay special attention to the years 7 and 8
This might mean that some of the employees who left the company might have been headhunted.

A.5.9. Relation between Tenure and Average Hours worked in the Month

sns.lineplot(x='avg_hrs_month',y='tenure',hue='left',data=df)
plt.xticks(rotation=45)
plt.show()

Both the employees who stay as the ones who left tend to work more as they have a bigger tenure (more years in the company)

A.5.10 Relation between Review and Tenure

sns.lineplot(x='reviewRange',y='tenure',data=rangedDf.sort_values('review'))
plt.xticks(rotation=45)
plt.show()

Also, we see that the employees who got the higher reviews were the employees with fewer years in the company

Preliminary Summary

The unique indicator found that directly (and clearly) impacts the outcome (if the employee leaves the Company or not) is the Review of the Employee. The higher the review, the more chances the employee has to leave the Company.
Linked to the Review indicator, we have the Satisfaction, as higher the review is, the lower the Satisfaction is (it's almost a perfect negative relation with the exception of the first Satisfaction's ranges).

Statistical Summary

Satisfaction: its mean is 50%.
Average Hours Worked per month: 184hs
Promotion: Only 3% of the employees received any kind of promotion in the past two years.
Years in the Company (tenure): the average time in the Company is 6.5 years.
Bonus: Only 21% of the employees received any kind of Bonus.
Projects: The average # of projects managed by an employee is 3.

Review

Employees with a review above 0.8 leave the Company 90% of the times
Employees who got the higher reviews were the employees with fewer years in the Company

TurnOver

about 71% of employees stayed, and 29% of employees left the Company in the last analyzed period.

Correlations

Positive Correlations

There is a strong positive relation between 'tenure' and 'avg_hrs_month'. This could mean that the longer the employee is in the Company, the more hours he/she works.
Also, there is a curious case in the relation between 'review' and 'left' as it looks like the higher the review, the higher the possibilities of leaving the Company.

Negative Correlations

There is a low-level negative relation between 'review' and 'avg_hrs_month'. This might mean that the lower the review, the employees tend to work more hours.
There is a negative relation between 'review' and 'satisfaction'. This means that the higher the review of the employee is, the lower the Satisfaction is. This is strange, actually. Maybe the satisfaction level is not acquiring the real employees' Satisfaction.

ANALYSIS OF THE EMPLOYEES FROM SALES

We see that, on average, the people who left the Company had a lower bonus than one who stayed, but a better review, which might mean they were good employees.

ANALYSIS OF THE EMPLOYEES FROM FINANCES

We see that the employees from Finance (the sector with the smallest turnover both in absolute and relative levels) who left the Company had the highest review than the ones who stayed, and also the biggest Bonus, but they were less satisfied. Anyway, both groups had a bonus bigger than the Company's average one.

ANALYSIS OF THE EMPLOYEES FROM IT

In terms of relative TurnOver, the employees who left the Company:

tended to work more hours per month.
had a bigger review
had been promoted a lot less

In the next post, we will start answering the specific questions, and creating & deploying the Machine Learning chosen model. Finally, we will finish with the specific insights.

Go to Part 3: Predictive Analysis

---------------------------------------------------------------------------------------------------------------

This case is part of a DataCamp competition: Employees TurnOver - DataCamp

The full report is available in my GitHub repository GitHub - vascoarizna

Here you will find not only the full explanation with the graphics and the solution but also the Jupyter Notebook codes in case you want to take anything for you.

Author: Ignacio Ariznabarreta - JIAF Consulting

要查看或添加评论，请登录

Jose Ignacio Ariznabarreta Fossati的更多文章

2022 Beijing Winter Olympic Games: Doing a Sentiment Analysis of the tweets

2022年2月6日

2022 Beijing Winter Olympic Games: Doing a Sentiment Analysis of the tweets

How can we analyze the tweets in almost real-time? Communication during the Olympic Games is fantastic because many…

1 条评论
Olympic Data: Making Olympic Games' Data Speak - A Generic Exploration

2022年1月22日

Olympic Data: Making Olympic Games' Data Speak - A Generic Exploration

Did you know that the youngest Olympian ever was 10 years old when he participated and even won a medal? (also being…

1 条评论
Part 4 (Final): Predicting results, and working with Command Boards using Machine Learning

2022年1月20日

Part 4 (Final): Predicting results, and working with Command Boards using Machine Learning

Analyzing the Employees TurnOver: Part 4 (Final) - Descriptive & Predictive analysis If you haven't seen Parts 1, 2, or…
Part 3: Predicting results, and working with Command Boards using Machine Learning

2022年1月18日

Part 3: Predicting results, and working with Command Boards using Machine Learning

Analyzing the Employees TurnOver: Part 3 - Descriptive & Predictive analysis If you haven't seen Part 1 or Part 2…
Predicting results, and working with Command Boards using Machine Learning

2022年1月16日

Predicting results, and working with Command Boards using Machine Learning

Analyzing the Employees TurnOver Machine Learning, and the whole AI spectrum, give us a lot of resources to attack many…

1 条评论

See all articles

Analyzing the Employees TurnOver: Part 2

A. Exploratory Analysis

Data Exploration

A.1. Info Summary

A.2. TurnOver (target value)

Overview of summary (Turnover V.S. Non-turnover)

A.3.Correlation

A.3.1. Correlation against the Target

A.3.2. Summary of Correlations

A.4.Plots

A.4.1. Distribution

A.4.2. Scatter

A.5. Correlations between pair of values

A.5.1. Relation between Salary and Target (employees who left)

A.5.2. Relation between Department and Target (employees who left)

领英推荐

A.5.3. Relation between # of Projects and Target (employees who left)

A.5.4. Relation between Reviews and Target (employees who left)

A.5.5. Relation between Average Hours worked per Month and Target (employees who left)

A.5.6. Relation between Satisfaction and Target (employees who left)

A.5.7. Relation between Satisfaction and Review

A.5.8. Relation between Tenure (years in the company) and Target (employees who left)

A.5.9. Relation between Tenure and Average Hours worked in the Month

A.5.10 Relation between Review and Tenure

Preliminary Summary

Statistical Summary

Review

TurnOver

Correlations

ANALYSIS OF THE EMPLOYEES FROM SALES

ANALYSIS OF THE EMPLOYEES FROM FINANCES

ANALYSIS OF THE EMPLOYEES FROM IT

Jose Ignacio Ariznabarreta Fossati的更多文章

2022 Beijing Winter Olympic Games: Doing a Sentiment Analysis of the tweets

Olympic Data: Making Olympic Games' Data Speak - A Generic Exploration

Part 4 (Final): Predicting results, and working with Command Boards using Machine Learning

Part 3: Predicting results, and working with Command Boards using Machine Learning

Predicting results, and working with Command Boards using Machine Learning

社区洞察

其他会员也浏览了

I ran 580 model-dataset experiments to show that, even if you try very hard, it is almost impossible to know that a model is degrading just by looking

Quantico: Plotting

The Art and Comedy of Data Analysis: Laughing Through the Chaos

Delivering The Right Level Of Analytical Detail

Optimizations: Trendlines

Feature Selection for faster analytics

How to speed up tabular data processing by 1053x in pandas/cudf

Time Series Analysis - Basics

Make predictions on Test Data set using Principal Components in R

Linear regression for housing data using randomized search, cross-validation, search grid, or combines: