Data Science Salaries 2023 Dataset

Leonardo A.

Data Analyst

Published: May 28, 2023

Introduction

In today’s data-driven world, Data Science has emerged as a field with immense potential and exciting career prospects. With the increasing reliance on data for informed decision-making, companies are actively seeking skilled professionals who can navigate the complex world of data analysis and interpretation. Apart from the intellectually stimulating nature of the work, one of the key factors that make Data Science an attractive career choice is the potential for high salaries.

The Data Science Job Salaries Dataset provides us with valuable insights into the earning potential of different roles within the Data Science domain. By exploring this dataset, we can gain a comprehensive understanding of the salary trends, identify the most in-demand job titles, and uncover the factors that contribute to variations in salaries across regions and industries.

Importing Essential Libraries

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.ticker as ticker

Loading the Data Science Salaries Dataset

df = pd.read_csv('/kaggle/input/data-science-salaries-2023/ds_salaries.csv')
df.head()

[Image: output of df.head()]

Checking the Shape of the DataFrame

df.shape

(3755, 11)

This means that the DataFrame df has 3755 rows and 11 columns. Each row represents a unique data point (in this case, a data scientist's salary information), and each column represents a different attribute of the data point (such as work year, experience level, employment type, job title, salary, etc.).

Obtaining Information about the DataFrame

df.info()

The DataFrame consists of 3755 rows and 11 columns, with integer and object dtypes (object columns typically hold strings in pandas). Every column is fully non-null, so there are no missing values. The DataFrame uses approximately 322.8 KB of memory.

Counting Null Values in the DataFrame

df.isnull().sum()

This returns a Series with the total count of null values in each column, a quick way to check whether the data has missing values and, if so, where they are. Here every count is zero: the dataset has no missing values.
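
Because the real dataset has no gaps, the check is easier to appreciate on data that does. The following is a minimal sketch on a made-up toy frame (the values and the names `toy`, `null_counts`, `null_share`, and `rows_with_gaps` are illustrative, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Toy data with deliberate gaps (made-up values, not from the dataset).
toy = pd.DataFrame({
    "job_title": ["Data Scientist", None, "Data Analyst", "Data Engineer"],
    "salary_in_usd": [150000.0, 120000.0, np.nan, np.nan],
})

# Count of missing values per column, as in df.isnull().sum().
null_counts = toy.isnull().sum()
print(null_counts)

# Share of missing values per column, often easier to judge than raw counts.
null_share = toy.isnull().mean()
print(null_share)

# Rows that contain at least one missing value.
rows_with_gaps = toy[toy.isnull().any(axis=1)]
print(rows_with_gaps)
```

On the salaries DataFrame the same calls would report zero for every column.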

Statistical Summary of the DataFrame

df.describe().astype(int)

The average salary in USD is around 137,570. The standard deviation shows how spread out the values are: a high standard deviation in salary means that salaries vary a lot.

The minimum and maximum values give the range of salaries, which in USD runs from 5,132 to 450,000. The 25%, 50%, and 75% values are percentiles: 25% of data scientists earn 95,000 USD or less, and half of them earn 135,000 USD or less. Interestingly, there is a big difference between the maximum of the raw salary column and the maximum salary in USD. The likely cause is that the raw salary column is recorded in each employee's local currency, so its values are not on a common scale, though outliers or data-entry errors could also contribute. This is worth investigating further.
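
One way to investigate the gap between the raw and USD maxima is to compare the two columns per currency. This is a rough sketch on made-up rows, assuming the dataset's `salary`, `salary_currency`, and `salary_in_usd` columns; the amounts and the helper column `raw_to_usd` are hypothetical:

```python
import pandas as pd

# Toy rows (made-up values) mimicking the salary, salary_currency and
# salary_in_usd columns of the dataset.
toy = pd.DataFrame({
    "salary": [120000, 9000000, 70000, 2500000],
    "salary_currency": ["USD", "INR", "EUR", "INR"],
    "salary_in_usd": [120000, 110000, 75000, 30000],
})

# Largest raw salary and largest USD salary per currency.
by_currency = toy.groupby("salary_currency")[["salary", "salary_in_usd"]].max()
print(by_currency)

# Ratio of raw amount to USD amount: 1 for USD rows, far from 1 for
# currencies whose unit is worth much less (or more) than a dollar.
toy["raw_to_usd"] = toy["salary"] / toy["salary_in_usd"]
print(toy[["salary_currency", "raw_to_usd"]])
```

If the largest raw values all belong to non-USD currencies, the difference is a matter of units rather than of data errors.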

Number of job titles

unique_job_titles = df['job_title'].nunique()
print(f'Number of unique job titles: {unique_job_titles}')

Number of unique job titles: 93

Top 10 Job Titles in the Data Science Field

# Calculate the top 10 most common job titles
top_job_titles = df['job_title'].value_counts().nlargest(10)

# Create a new figure for the plot
plt.figure(figsize=(10, 6))

# Generate a bar plot using seaborn
barplot = sns.barplot(y=top_job_titles.index, x=top_job_titles.values, palette='viridis')

# Add text annotations to the bar plot
for i in range(top_job_titles.shape[0]):
    # Use .iloc for positional access, since the Series is indexed by job title
    barplot.text(top_job_titles.iloc[i] + 10, i, top_job_titles.iloc[i], va='center')

# Add title and labels to the plot
plt.title('Top 10 Job Titles')
plt.xlabel('Count')
plt.ylabel('Job Title')

# Display the plot
plt.show()

Salary Statistics for Various Data Science Roles

# Define a dictionary mapping each role label to the job title used in the dataset
job_roles = {
    'Data Engineer': 'Data Engineer',
    'Data Scientist': 'Data Scientist',
    'Data Analyst': 'Data Analyst',
    'Machine Learning Engineer': 'Machine Learning Engineer',
    'Analytics Engineer': 'Analytics Engineer'}


# Iterate over the job roles and calculate the highest, lowest, and average salaries
for role, title in job_roles.items():
    salaries = df[df['job_title'] == title]['salary_in_usd']
    max_salary = salaries.max()
    min_salary = salaries.min()
    avg_salary = int(salaries.mean())

    # Print the salary summary for each role
    print(role + ':')
    print('  - Highest Salary:', max_salary)
    print('  - Lowest Salary:', min_salary)
    print('  - Average Salary:', avg_salary)
    print()
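
The per-role loop can also be written as a single grouped aggregation. The sketch below is an alternative under stated assumptions, not the notebook's own code: it runs on a made-up toy frame, and the names `toy`, `roles`, and `summary` are illustrative.

```python
import pandas as pd

# Toy rows (made-up values) with the two columns the loop uses.
toy = pd.DataFrame({
    "job_title": ["Data Engineer", "Data Engineer", "Data Analyst",
                  "Data Analyst", "Data Scientist", "ML Engineer"],
    "salary_in_usd": [130000, 150000, 80000, 100000, 160000, 170000],
})

roles = ["Data Engineer", "Data Analyst", "Data Scientist"]

# Keep only the roles of interest, then compute all three statistics at once.
summary = (
    toy[toy["job_title"].isin(roles)]
    .groupby("job_title")["salary_in_usd"]
    .agg(highest="max", lowest="min", average="mean")
)
print(summary)
```

The result is one table with a row per role, which is easier to sort or plot than printed text.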

Comparison of Salary Summaries

# Define the job role
job_roles = ['Data Engineer', 'Data Scientist', 'Data Analyst', 'Machine Learning Engineer', 'Analytics Engineer']

# Set the seaborn palette
sns.set_palette("viridis")

# Create a grid of subplots for job roles' salary summary
fig, axs = plt.subplots(len(job_roles), 1, figsize=(8, 5 * len(job_roles)))

# Iterate over job roles
for i, role in enumerate(job_roles):
    # Filter the dataset for the specific job role
    salaries = df[df['job_title'] == role]['salary_in_usd']

    # Calculate the highest, lowest, and average salaries
    max_salary = salaries.max()
    min_salary = salaries.min()
    avg_salary = int(salaries.mean())

    # Plot the salary summary
    sns.barplot(x=['Highest', 'Lowest', 'Average'], y=[max_salary, min_salary, avg_salary], ax=axs[i])
    axs[i].set_title(f'{role} Salary')
    axs[i].set_ylabel('Salary (USD)')

    # Add value labels to the bars
    for j, value in enumerate([max_salary, min_salary, avg_salary]):
        axs[i].text(j, value, f'${value:,}', ha='center', va='bottom')

# Adjust spacing between subplots and remove any excess blank space
plt.tight_layout()

# Show the plot
plt.show()

Salary distribution

# Distribution of salaries
salary_distribution = df['salary_in_usd'].describe().round(2)
print('\nSalary Distribution:')
print(salary_distribution)

# Salary distribution
plt.figure(figsize=(10, 6))  # Create a new figure for the plot with a specific size of 10 inches in width and 6 inches in height
sns.histplot(df[df['salary_in_usd'] < df['salary_in_usd'].quantile(0.95)], x='salary_in_usd', bins=20, kde=True, color='skyblue')  # Generate a histogram plot using seaborn, restricting the data to values below the 95th percentile of 'salary_in_usd' column, with 20 bins, including a kernel density estimate, and setting the color to 'skyblue'
plt.title('Salary Distribution')  # Add a title to the plot as 'Salary Distribution'
plt.xlabel('Salary in USD')  # Label the x-axis as 'Salary in USD'
plt.ylabel('Frequency')  # Label the y-axis as 'Frequency'
plt.show()  # Display the plot

This code creates a histogram of the 'salary_in_usd' column, but only includes salaries below the 95th percentile to exclude outliers.
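
To see what the 95th-percentile filter does to the data, here is a small sketch on made-up salaries (the values and the names `salaries`, `cutoff`, and `trimmed` are illustrative):

```python
import pandas as pd

# Made-up salaries with one extreme value at the top.
salaries = pd.Series([50000, 60000, 70000, 80000, 90000,
                      100000, 110000, 120000, 130000, 450000])

# 95th percentile; pandas interpolates linearly between data points by default.
cutoff = salaries.quantile(0.95)

# Keep only values strictly below the cutoff, as the histogram code does.
trimmed = salaries[salaries < cutoff]

print(f"cutoff: {cutoff:,.0f}")
print(f"kept {len(trimmed)} of {len(salaries)} values")
print(f"mean before: {salaries.mean():,.0f}, after: {trimmed.mean():,.0f}")
```

Note that the filter changes only what is plotted; the describe() summary printed earlier is still computed on the full column.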

Top job salaries

# Subset the data for the most frequent job title
top_job_titles = df['job_title'].value_counts().nlargest(10).index
subset_df = df[df['job_title'].isin(top_job_titles)]

# Set the figure size and adjust spacing
plt.figure(figsize=(12, 8))
sns.set(font_scale=1.0)
sns.set_style("whitegrid")

# Create the box plot
sns.boxplot(data=subset_df, x='job_title', y='salary_in_usd', order=top_job_titles, palette='viridis')

# Set plot title and labels
plt.title('Salary Distribution for Top Job Titles', fontsize=16)
plt.xlabel('Job Title', fontsize=14)
plt.ylabel('Salary in USD', fontsize=14)

# Rotate x-axis labels for better readability
plt.xticks(rotation=45, ha='right')

# Adjust y-axis limits for better visualization of outliers
plt.ylim(bottom=0)

# Add more space between plots
plt.tight_layout()

# Show the plot
plt.show()

Each box in the plot represents the salary range for a specific job title. By comparing the positions and lengths of the boxes, we can gain insights into the salary distribution across different job titles. If a box is positioned higher on the y-axis, it indicates a higher median salary for that job title compared to others. Similarly, if a box is longer, it suggests a wider salary range and potentially more variability in salaries.

The whiskers extend from each box to the most extreme data points that still lie within 1.5 times the interquartile range (IQR) of the box edges. Any data points beyond the whiskers are considered outliers and are drawn as individual points on the plot.
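
The 1.5 × IQR rule can be reproduced by hand. The following sketch uses made-up salaries and illustrative variable names; it assumes the default whisker setting of 1.5 and linearly interpolated quartiles:

```python
import pandas as pd

# Made-up salaries with one extreme value at the top.
salaries = pd.Series([50000, 60000, 70000, 80000, 90000,
                      100000, 110000, 120000, 130000, 450000])

# Box edges: first and third quartiles.
q1 = salaries.quantile(0.25)
q3 = salaries.quantile(0.75)
iqr = q3 - q1

# Fences: the furthest a whisker is allowed to reach.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points inside the fences.
inside = salaries[(salaries >= lower_fence) & (salaries <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Everything outside the fences is drawn as an individual outlier point.
outliers = salaries[(salaries < lower_fence) | (salaries > upper_fence)]

print(f"Q1={q1:,.0f}  Q3={q3:,.0f}  IQR={iqr:,.0f}")
print(f"whiskers: {whisker_low:,.0f} to {whisker_high:,.0f}")
print("outliers:", outliers.tolist())
```

The whiskers stop at actual data points, so they are usually shorter than the fences themselves.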

By examining this box plot, we can observe the range of salaries and the variations across the top job titles. We can identify job titles with higher or lower median salaries, as well as those with wider or narrower salary distributions. This information can provide insights into salary trends, potential salary gaps, and the overall salary landscape for different job titles in the dataset.

Distribution of Employment Types

# Replace specific values in the 'employment_type' column
df['employment_type'] = df['employment_type'].replace({'FT': 'Full-time', 'PT': 'Part-time', 'C': 'Contract', 'I': 'Internship', 'F': 'Freelance', 'CT': 'Contract', 'FL': 'Freelance'})

# Calculate the distribution of values in the 'employment_type' column
employment_type_distribution = df['employment_type'].value_counts()

# Print the heading for the employment type distribution
print('Employment Type Distribution:')

# Print the distribution of employment types
print(employment_type_distribution)

# Create a new figure for the plot with a specific size of 10 inches in width and 6 inches in height
plt.figure(figsize=(10, 6))

# Generate a countplot using seaborn's 'countplot' function
barplot = sns.countplot(x='employment_type', data=df, order=df['employment_type'].value_counts().index, palette='viridis')
# The x-axis represents the 'employment_type' column from the DataFrame 'df'.
# The bars are ordered based on the value counts of each employment type.
# The color palette is set to 'viridis'.

# Iterate over the bars in the countplot
for p in barplot.patches:
    height = p.get_height()
    # Get the height (count) of the bar

    # Annotate each bar with its count
    barplot.annotate(format(round(height), ','), (p.get_x() + p.get_width() / 2., height), ha='center', va='center', xytext=(0, 5), textcoords='offset points')
    # The count is formatted with comma separators.
    # The annotation is positioned at the center of the bar's height with a small offset.

# Add a title to the plot as 'Employment Type Distribution'
plt.title('Employment Type Distribution')

# Label the x-axis as 'Employment Type'
plt.xlabel('Employment Type')

# Label the y-axis as 'Count'
plt.ylabel('Count')

# Display the plot
plt.show()

Most data scientists in the dataset are employed full-time (3,718 rows), which matches the common expectation for this profession. There are also 17 part-time, 10 contract, and 10 freelance positions.
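
Raw counts this lopsided are easier to read as percentages. A short sketch that rebuilds the counts quoted above as a Series (the names `counts` and `share` are illustrative):

```python
import pandas as pd

# Counts quoted in the text above, rebuilt as a Series.
counts = pd.Series(
    {"Full-time": 3718, "Part-time": 17, "Contract": 10, "Freelance": 10},
    name="count",
)

# Percentage share of each employment type.
share = (counts / counts.sum() * 100).round(2)
print(share)
```

On the DataFrame itself, `df['employment_type'].value_counts(normalize=True)` returns the same shares as fractions.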

Average Salary by Job Title

# Calculate the average salary by job title
average_salary_by_job_title = df.groupby('job_title')['salary_in_usd'].mean().sort_values(ascending=False).astype(int)

# Print the heading for average salary by job title
print('\nAverage Salary by Job Title:')

# Print the average salary values for each job title
print(average_salary_by_job_title)

The job title with the highest average salary is "Data Science Tech Lead", with an average salary of $375,000.

The job title with the lowest average salary is "Power BI Developer", with an average salary of $5,409.
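
A title with very few rows can top or bottom such a ranking on the strength of a single salary, so it helps to report the row count next to each average. This is a sketch on made-up data; the title "Rare Title", the threshold `min_rows`, and the other names are hypothetical:

```python
import pandas as pd

# Toy rows (made-up values): two common titles and one that appears once.
toy = pd.DataFrame({
    "job_title": ["Data Scientist"] * 4 + ["Data Engineer"] * 3 + ["Rare Title"],
    "salary_in_usd": [140000, 150000, 160000, 150000,
                      130000, 140000, 150000, 375000],
})

# Average salary and number of rows per title, highest average first.
stats = (
    toy.groupby("job_title")["salary_in_usd"]
    .agg(mean_salary="mean", n="count")
    .sort_values("mean_salary", ascending=False)
)
print(stats)

# Hypothetical cutoff: keep only titles backed by at least this many rows.
min_rows = 3
reliable = stats[stats["n"] >= min_rows]
print(reliable)
```

The right threshold depends on the dataset; the point is only that the count makes thinly supported averages visible.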

Top 15 Average Salaries by Job Title in the Data Science Field

# Select the top 15 average salaries from the 'average_salary_by_job_title' Series
top_15_salaries = average_salary_by_job_title.nlargest(15)

# Create a new figure for the plot with a specific size of 10 inches in width and 6 inches in height
plt.figure(figsize=(10, 6))

# Generate a bar plot using seaborn's 'barplot' function
barplot = sns.barplot(y=top_15_salaries.index, x=top_15_salaries.values, palette='viridis')

# Add text annotations to the bar plot
for i in range(top_15_salaries.shape[0]):
    # Create a label for the salary value by dividing it by 1000 and formatting it to display as a string with one decimal place followed by 'k'
    # (.iloc gives positional access, since the Series is indexed by job title)
    salary_label = f'${top_15_salaries.iloc[i] / 1000:.1f}k'

    # Add text annotations to the bar plot, positioning the salary value slightly to the right of the bar and the label at the center of the bar
    barplot.text(top_15_salaries.iloc[i] + 1000, i, salary_label, ha='left', va='center')

# Add a title to the plot as 'Top 15 Average Salaries by Job Title'
plt.title('Top 15 Average Salaries by Job Title')

# Label the x-axis as 'Average Salary (USD)'
plt.xlabel('Average Salary (USD)')

# Label the y-axis as 'Job Title'
plt.ylabel('Job Title')

# Display the plot
plt.show()

The plot displays the top 15 job titles in terms of average salaries. Each bar represents a job title, and the length of the bar indicates the average salary associated with that job title.

Bottom salaries

# Select the bottom 15 average salaries from the 'average_salary_by_job_title' Series
bottom_15_salaries = average_salary_by_job_title.nsmallest(15)[::-1]

# Create a new figure for the plot with a specific size of 10 inches in width and 6 inches in height
plt.figure(figsize=(10, 6))

# Generate a bar plot using seaborn's 'barplot' function
barplot = sns.barplot(y=bottom_15_salaries.index, x=bottom_15_salaries.values, palette='viridis')

# Add text annotations to the bar plot
for i in range(bottom_15_salaries.shape[0]):
    # Create a label for the salary value by dividing it by 1000 and formatting it to display as a string with one decimal place followed by 'k'
    salary_label = f'${bottom_15_salaries.values[i] / 1000:.1f}k'

    # Add text annotations to the bar plot, positioning the salary value at the corresponding position on the x-axis and the label at the center of the bar
    barplot.text(bottom_15_salaries.values[i], i, salary_label, va='center')

# Add a title to the plot as 'Bottom 15 Average Salaries by Job Title'
plt.title('Bottom 15 Average Salaries by Job Title')

# Label the x-axis as 'Average Salary (USD)'
plt.xlabel('Average Salary (USD)')

# Label the y-axis as 'Job Title'
plt.ylabel('Job Title')

# Display the plot
plt.show()

The plot displays the bottom 15 job titles in terms of average salaries. Each bar represents a job title, and the length of the bar indicates the average salary associated with that job title.

Experience level distribution

# Replace specific values in the 'experience_level' column
df['experience_level'] = df['experience_level'].replace({'SE': 'Senior', 'MI': 'Mid-level', 'EN': 'Entry-level', 'EX': 'Executive'})

# Calculate the distribution of values in the 'experience_level' column
experience_level_distribution = df['experience_level'].value_counts()

# Print the heading for experience level distribution
print('\nExperience Level Distribution:')

# Print the distribution of experience levels
print(experience_level_distribution)

# Create a new figure for the plot with a specific size of 8 inches in width and 6 inches in height
plt.figure(figsize=(8, 6))

# Generate a countplot using seaborn's 'countplot' function
barplot = sns.countplot(data=df, x='experience_level', palette='viridis', order=['Executive', 'Senior', 'Mid-level', 'Entry-level'])
# The data is sourced from the DataFrame 'df', and the x-axis represents the 'experience_level' column.
# The color palette is set to 'viridis', and the order of the bars is specified as ['Executive', 'Senior', 'Mid-level', 'Entry-level'].

for p in barplot.patches:
    height = p.get_height()
    # Get the height (count) of each bar.

    barplot.annotate(format(round(height), ','), (p.get_x() + p.get_width() / 2., height), ha='center', va='center', xytext=(0, 5), textcoords='offset points')
    # Add text annotations to each bar, displaying the count with comma separators.
    # The annotation is positioned at the center of each bar's height with a small offset.

# Add a title to the plot as 'Experience Level Distribution'
plt.title('Experience Level Distribution')

# Label the x-axis as 'Experience Level'
plt.xlabel('Experience Level')

# Label the y-axis as 'Count'
plt.ylabel('Count')

# Display the plot
plt.show()

The plot shows the distribution of data scientists across different experience levels. The taller bars represent a higher number of data scientists in the 'Mid-level' and 'Senior' categories. On the other hand, the 'Executive' category has the fewest data scientists, indicated by the shortest bar. The 'Entry-level' category falls in between, with a moderate number of data scientists. This distribution gives us insights into the composition of data scientists based on their experience levels, helping us understand the experience requirements and expertise within the DS field.

Average salary by experience level

# Map experience level codes to full labels (a no-op if the column was already relabeled)
df['experience_level'] = df['experience_level'].replace({'EX': 'Executive', 'SE': 'Senior', 'MI': 'Mid-level', 'EN': 'Entry-level'})

# Calculate the average salary by experience level
average_salary_by_experience_level = df.groupby('experience_level')['salary_in_usd'].mean().sort_values(ascending=False).astype(int)

# Print the heading for average salary by experience level
print('\nAverage Salary by Experience Level:')

# Print the average salary values by experience level
print(average_salary_by_experience_level)

The experience level with the highest average salary is "EX" (Executive), with an average salary of approximately $194,931.

The experience level with the lowest average salary is "EN" (Entry-level), with an average salary of approximately $78,546.

# Map experience level codes to full labels (a no-op if the column was already relabeled)
df['experience_level'] = df['experience_level'].replace({'EX': 'Executive', 'SE': 'Senior', 'MI': 'Mid-level', 'EN': 'Entry-level'})

# Calculate average salary by experience level
average_salary_by_experience_level = df.groupby('experience_level')['salary_in_usd'].mean().sort_values(ascending=False).astype(int)

# Set a lighter font style
sns.set(font_scale=0.8)

# Set the style without the dark background
sns.set_style('ticks')

# Create the plot
plt.figure(figsize=(10, 6))
barplot = sns.barplot(x=average_salary_by_experience_level.values, y=average_salary_by_experience_level.index, palette='viridis')

# Add value labels to the bars
for i, value in enumerate(average_salary_by_experience_level.values):
    barplot.text(value + 1000, i, f'${value/1000:.1f}k', ha='left', va='center')

# Set plot title and labels
plt.title('Average Salary by Experience Level')
plt.xlabel('Average Salary (USD)')
plt.ylabel('Experience Level')

# Show the plot
plt.show()

The "Executive" experience level has the highest average salary, followed by "Senior", "Mid-level", and "Entry-level". This suggests that as professionals gain more experience and progress in their careers, they tend to earn higher salaries.

The visualization provides insights into the relationship between experience level and salary in the field. It highlights the importance of experience in determining salary levels and can help individuals in understanding the salary expectations associated with different experience levels.
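
Sorting by salary happens to put the levels in career order here, but that is a coincidence of the data. One option is to store the levels as an ordered categorical so that tables and plots always follow the career ladder. The sketch below uses made-up salaries; the names `toy`, `labels`, `order`, and `by_level` are illustrative:

```python
import pandas as pd

# Toy rows (made-up values) using the dataset's experience level codes.
toy = pd.DataFrame({
    "experience_level": ["EN", "EN", "MI", "MI", "SE", "SE", "EX"],
    "salary_in_usd": [60000, 80000, 100000, 110000, 150000, 160000, 200000],
})

labels = {"EN": "Entry-level", "MI": "Mid-level", "SE": "Senior", "EX": "Executive"}
order = ["Entry-level", "Mid-level", "Senior", "Executive"]

# Relabel the codes and record the career order in the column's dtype.
toy["experience_level"] = pd.Categorical(
    toy["experience_level"].map(labels), categories=order, ordered=True
)

# Grouped results now come back in career order, not alphabetical order.
by_level = (
    toy.groupby("experience_level", observed=True)["salary_in_usd"]
    .agg(["mean", "median"])
)
print(by_level)
```

Reporting the median alongside the mean also shows whether a few very high salaries are pulling an average upward.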

Remote work ratios


# Replace specific values in the 'remote_ratio' column
df['remote_ratio'] = df['remote_ratio'].replace({0: 'In-office', 50: 'Hybrid', 100: 'Fully Remote'})

# Calculate the distribution of values in the 'remote_ratio' column
remote_ratio_distribution = df['remote_ratio'].value_counts()

# Print the heading for remote work ratio distribution
print('Remote Work Ratio Distribution:')

# Print the distribution of remote work ratios
print(remote_ratio_distribution)

# Reindex the 'remote_ratio_distribution' Series to ensure the desired order of categories: 'In-office', 'Hybrid', 'Fully Remote'
remote_ratio_distribution = remote_ratio_distribution.reindex(['In-office', 'Hybrid', 'Fully Remote'])


# Create a new figure for the plot with a specific size of 10 inches in width and 6 inches in height
plt.figure(figsize=(10, 6))

# Generate a bar plot using seaborn's 'barplot' function
barplot = sns.barplot(x=remote_ratio_distribution.index, y=remote_ratio_distribution.values, palette='viridis')
# The x-axis represents the categories from the 'remote_ratio_distribution' index,
# the y-axis represents the count values from the 'remote_ratio_distribution' values,
# and the color palette is set to 'viridis'.

# Add text annotations to the bars
for i, value in enumerate(remote_ratio_distribution.values):
    # Iterate over the values in the 'remote_ratio_distribution' values
    barplot.text(i, value + 100, f'{value}', ha='center', va='bottom')
    # Add text annotations to the bars, displaying the count values just above each bar.

# Add a title to the plot as 'Remote Work Ratio Distribution'
plt.title('Remote Work Ratio Distribution')

# Label the x-axis as 'Remote Work Ratio'
plt.xlabel('Remote Work Ratio')

# Remove the y-axis label
plt.ylabel('')

# Remove the y-axis tick labels
barplot.set_yticklabels([])

# Adjust the y-axis limit to improve the visibility of the bars and annotations
plt.ylim(0, remote_ratio_distribution.max() + 500)

# Improve the spacing between the elements of the plot
plt.tight_layout()

# Display the plot
plt.show()

Conclusion

In conclusion, the Data Science Job Salaries Dataset provides valuable insights into the job market for data science professionals. By examining the dataset, we were able to uncover patterns and trends that shed light on salary ranges, job titles, experience levels, employment types, and remote work arrangements in the field.

However, it is important to note that our analysis is not exhaustive, and there is still much more exploration and interpretation that can be done with this dataset. To continue the analysis and delve deeper into the findings, I recommend referring to the Kaggle notebook associated with this dataset. The notebook provides a platform for further exploration, data visualization, and advanced modeling techniques.

By leveraging the power of the Kaggle platform and the comprehensive dataset, researchers, aspiring data scientists, and industry professionals can continue to gain valuable insights and make informed decisions in the ever-evolving landscape of data science job opportunities.

Note: The Kaggle notebook and the dataset are available for further analysis and exploration.
