- What is data science?
- What is the difference between supervised and unsupervised learning?
- What are precision and recall? How are they different?
- What is a confusion matrix? How do you interpret it?
- What is cross-validation? Why is it important?
- What is overfitting and underfitting in machine learning models?
- Explain the bias-variance tradeoff.
- What is the difference between correlation and causation?
- What is the Central Limit Theorem, and why is it important in statistics?
- Explain the p-value in hypothesis testing.
- What is the difference between Type I and Type II errors?
- What is a normal distribution? Why is it important?
- Explain Bayes’ Theorem and its application in machine learning.
- What is A/B testing? How would you use it in a business context?
- How would you handle missing data in a dataset?
- What are data normalization and standardization? How do they differ?
- Explain the difference between L1 and L2 regularization.
- How would you detect outliers in your data?
- What techniques would you use for feature selection?
- How does a decision tree algorithm work?
- What is the difference between bagging and boosting in ensemble methods?
- Explain how random forests work.
- How does the K-Nearest Neighbors (KNN) algorithm work?
- What is the purpose of gradient descent in machine learning?
- How does a support vector machine (SVM) work?
- Explain K-Means clustering. How do you choose the value of K?
- What is the difference between NumPy and Pandas in Python?
- How do you merge two dataframes in Pandas?
- What is the difference between a list, a tuple, and a dictionary in Python?
- How would you use Python to implement a linear regression model?
- Explain the difference between apply(), map(), and applymap() in Pandas. (Note that applymap() is deprecated as of pandas 2.1 in favor of DataFrame.map().)
- What is deep learning? How does it differ from traditional machine learning?
- Explain how a neural network works.
- What is a convolutional neural network (CNN)? Where is it used?
- What is reinforcement learning, and where is it applied?
- How does natural language processing (NLP) work?
- How would you handle imbalanced datasets?
- How would you explain a complex model to a non-technical stakeholder?
- If you find that your model performs well on training data but poorly on test data, what steps would you take?
- You are given a dataset. How would you approach building a predictive model?
- How do you measure the success of a machine learning model?
- What are some key data visualization techniques you use?
- How would you visualize the correlation between multiple variables?
- What is the difference between a histogram and a bar chart?