5 Tricky Data Science Interview questions asked by Top Companies

Shivam Modi

46K Followers | I help people build their AI & Data Science career | Founder & CEO - Learn Everything AI | IIT Bombay | Click "Follow" to learn AI & Data Science daily

Published: August 14, 2022

1. You are given a data set consisting of variables with more than 30 percent missing values. How will you deal with them?

There are several ways to handle missing values. If the data set is large, we can simply remove the rows that contain missing values and work with the rest of the data; this is the quickest approach. For smaller data sets, we can substitute each missing value with the mean of the remaining values in that column, using a pandas DataFrame in Python. For example, df.mean() computes the column means and df.fillna(df.mean()) fills the gaps with them.
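
The two options above can be sketched as follows. This is a minimal illustration, assuming pandas and NumPy are installed; the column names and values are made up, and the 30 percent threshold is taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 35.0, np.nan],
    "income": [50.0, 60.0, np.nan, 80.0, 70.0],
})

# Share of missing values per column, and the columns above 30 percent.
missing_share = df.isna().mean()
flagged = missing_share[missing_share > 0.30].index

# Option A (large data sets): drop every row that has a missing value.
rows_dropped = df.dropna()

# Option B (small data sets): fill each gap with that column's mean.
imputed = df.fillna(df.mean())

print(list(flagged))
print(len(rows_dropped))
print(imputed)
```

Note that option A keeps only two of the five rows here, which is why dropping rows is usually reserved for data sets large enough to absorb the loss.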

2. What is dimensionality reduction, and what are its benefits?

Dimensionality reduction refers to the process of converting a data set with a large number of dimensions (fields) into one with fewer dimensions that conveys similar information concisely. This reduction helps compress the data and reduce storage space. It also reduces computation time, since fewer dimensions mean less computing. Finally, it removes redundant features; for example, there's no point in storing the same value in two different units (meters and inches).
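
As a rough illustration of the redundant-feature point, the sketch below drops any column that is almost perfectly correlated with a column already kept. This is plain feature removal, not a projection method such as PCA; the helper names, the data, and the 0.999 threshold are all assumptions made for the example.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def drop_redundant(columns, threshold=0.999):
    """Keep a column only if it is not near-perfectly correlated with a kept one."""
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return kept

# Hypothetical data: the same height stored in meters and in inches.
heights_m = [1.50, 1.62, 1.75, 1.80, 1.91]
data = {
    "height_m": heights_m,
    "height_in": [m * 39.3701 for m in heights_m],
    "weight_kg": [52.0, 70.0, 64.0, 90.0, 77.0],
}

reduced = drop_redundant(data)
print(list(reduced))
```

The inches column carries no information beyond the meters column, so only one of the two survives, while the weight column is kept.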

3. How can you select k for k-means?

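We can use the elbow method to select k for k-means clustering. The idea is to run k-means clustering on the data set for a range of values of k, where k is the number of clusters, and to compute the within-cluster sum of squares (WSS) for each run. WSS is defined as the sum of the squared distances between each member of a cluster and its centroid. Plotting WSS against k, we pick the k at the "elbow" of the curve, beyond which adding more clusters reduces WSS only slightly.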
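
The sketch below shows how WSS behaves as k grows on a toy data set with three obvious groups. It is a bare-bones k-means written with the standard library only, with a deterministic (and deliberately naive) choice of starting centroids; the function names and the data are assumptions for illustration. In practice a library implementation such as scikit-learn's KMeans, which exposes WSS as its inertia_ attribute, would normally be used instead.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_wss(points, k, max_iter=100):
    """Run a simple k-means and return the within-cluster sum of squares."""
    n = len(points)
    # Naive deterministic start: k points spread evenly through the list.
    centroids = [points[i * n // k] for i in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members (keep it if empty).
        new_centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

# Hypothetical 2-D data with three well-separated groups of four points.
points = [
    (1, 1), (1, 2), (2, 1), (2, 2),
    (8, 8), (8, 9), (9, 8), (9, 9),
    (1, 8), (1, 9), (2, 8), (2, 9),
]

wss_by_k = {k: kmeans_wss(points, k) for k in range(1, 5)}
for k, wss in wss_by_k.items():
    print(k, round(wss, 2))
```

On this data WSS falls sharply up to k = 3 and barely changes at k = 4, so the elbow sits at three clusters.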

4. What is the significance of the p-value?

A p-value ≤ 0.05 (the typical cutoff) indicates strong evidence against the null hypothesis, so you reject the null hypothesis. A p-value > 0.05 indicates weak evidence against the null hypothesis, so you fail to reject it; this is not the same as proving the null hypothesis true. A p-value very close to the 0.05 cutoff is considered marginal, meaning it could go either way.
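
To make the decision rule concrete, here is a small sketch of a two-sided one-sample z-test using only the standard library. The sample values, the hypothesized mean of 50, and the assumption of a known population standard deviation of 3 are all invented for the example.

```python
from math import sqrt
from statistics import NormalDist, mean

def two_sided_z_test(sample, mu0, sigma):
    """Return (z, p) for H0: population mean == mu0, with known sigma."""
    z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

def decide(p, alpha=0.05):
    """Apply the usual cutoff: reject H0 when p <= alpha."""
    return "reject H0" if p <= alpha else "fail to reject H0"

# Hypothetical measurements with a sample mean of 52.
sample = [49, 55, 52, 50, 54, 53, 51, 48, 56]

z, p = two_sided_z_test(sample, mu0=50, sigma=3)
print(round(z, 2), round(p, 4), decide(p))
```

Here the p-value lands just under 0.05, which is exactly the kind of marginal result where the conclusion is sensitive to the chosen cutoff.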

5. You are given a dataset on cancer detection. You have built a classification model and achieved an accuracy of 96 percent. Why shouldn't you be happy with your model's performance? What can you do about it?

Cancer detection data is typically imbalanced: far fewer patients have cancer than do not. On an imbalanced dataset, accuracy should not be used as the measure of performance. It is important to focus on the remaining four percent, which represents the patients who were wrongly diagnosed. Early diagnosis is crucial in cancer detection and can greatly improve a patient's prognosis. Hence, to evaluate model performance, we should use Sensitivity (True Positive Rate), Specificity (True Negative Rate), and the F-measure to determine the class-wise performance of the classifier.
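
The sketch below illustrates why 96 percent accuracy can be misleading. It assumes a hypothetical set of 100 patients, 4 of whom have cancer, and a useless model that predicts "no cancer" for everyone; the function name and the data are invented for the example.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and F1 for labels 1 (positive) and 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num, den):
        # Report 0.0 when a rate is undefined (empty denominator).
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)  # true positive rate (recall)
    specificity = ratio(tn, tn + fp)  # true negative rate
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(y_true)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }

# Hypothetical data: 4 cancer patients among 100, model always predicts 0.
y_true = [1] * 4 + [0] * 96
y_pred = [0] * 100

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

The model scores 96 percent accuracy while detecting none of the four cancer patients, which the sensitivity and F1 values expose immediately.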

