Data Science and Machine Learning Q&A
1. What is DBSCAN clustering?
- Answer: DBSCAN (Density-Based Spatial Clustering of Applications with Noise) groups points that are densely packed together and labels points in low-density regions as noise. It takes two parameters: a neighborhood radius (commonly called eps) and the minimum number of points needed to form a dense region. Because clusters are defined by local density, DBSCAN can find clusters of arbitrary shape and works well on spatial data. Unlike K-Means, it is robust to outliers and does not require the number of clusters to be specified in advance.
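To make the density idea concrete, here is a minimal pure-Python sketch of the algorithm. It is a teaching illustration under stated assumptions, not a production implementation: the function names, the toy points, and the values eps=1.5 and min_samples=3 are all made up for this example, and the neighbor search is a brute-force scan.

```python
from math import dist

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already visited
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # core point: keep growing the cluster
    return labels

# Toy data (assumed): two tight squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_samples=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Note that the number of clusters (two) was never passed in; it falls out of the density parameters, and the isolated point is reported as noise rather than forced into a cluster.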
2. What are the different types of joins in SQL?
- Answer: SQL offers multiple join types to define relationships between tables in a query. These include:
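- INNER JOIN: Returns only the rows that have a match in both tables.
- OUTER JOIN: Returns the matching rows plus unmatched rows, with NULLs filling the missing side. LEFT OUTER JOIN keeps all rows from the left table, RIGHT OUTER JOIN keeps all rows from the right table, and FULL OUTER JOIN keeps all rows from both.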
- SELF JOIN: Joins a table to itself.
- CROSS JOIN: Produces the Cartesian product of two tables.
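The join types above can be tried out with a small sketch using Python's built-in sqlite3 module. The employees and departments tables and their rows are hypothetical, invented for this example. RIGHT and FULL OUTER JOIN are left out because older SQLite versions do not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER, manager_id INTEGER
    );
    INSERT INTO departments VALUES (1, 'Data'), (2, 'Ops');
    INSERT INTO employees VALUES
        (1, 'Ana', 1, NULL), (2, 'Ben', 1, 1), (3, 'Cy', NULL, 1);
""")

def rows(sql):
    """Run a query and return all result rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# INNER JOIN: only employees that have a matching department.
inner = rows("""SELECT e.name, d.name FROM employees e
                INNER JOIN departments d ON e.dept_id = d.id
                ORDER BY e.id""")

# LEFT OUTER JOIN: every employee, with NULL where no department matches.
left = rows("""SELECT e.name, d.name FROM employees e
               LEFT OUTER JOIN departments d ON e.dept_id = d.id
               ORDER BY e.id""")

# SELF JOIN: employees joined to the same table to look up their manager.
self_join = rows("""SELECT e.name, m.name FROM employees e
                    INNER JOIN employees m ON e.manager_id = m.id
                    ORDER BY e.id""")

# CROSS JOIN: every employee paired with every department (3 x 2 rows).
cross = rows("SELECT e.name, d.name FROM employees e CROSS JOIN departments d")

print(inner)      # [('Ana', 'Data'), ('Ben', 'Data')]
print(left)       # [('Ana', 'Data'), ('Ben', 'Data'), ('Cy', None)]
print(self_join)  # [('Ben', 'Ana'), ('Cy', 'Ana')]
print(len(cross)) # 6
```

Cy has no department, so Cy is missing from the inner join but appears in the left outer join with a NULL (None in Python) department.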
3. How does grid search differ from random search in hyperparameter tuning?
- Answer: Hyperparameter tuning optimizes a model's performance by selecting the best parameter combinations.
- Grid Search: Systematically evaluates every specified combination of parameters.
- Random Search: Evaluates a fixed number of randomly sampled parameter combinations. It is usually cheaper and scales better to large parameter spaces, but results vary from run to run and the best combination may be missed. Grid search is exhaustive over the values you list, so its cost multiplies with every parameter you add.
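The difference in how the two strategies pick candidates can be sketched in plain Python. This is only an illustration: the score function is a made-up stand-in for a cross-validated model score, and the parameter names and values are assumptions. In practice a library routine such as scikit-learn's GridSearchCV or RandomizedSearchCV would normally be used instead.

```python
import itertools
import random

def score(params):
    """Hypothetical stand-in for a validation score; best at lr=0.1, depth=5."""
    return -abs(params["learning_rate"] - 0.1) - 0.01 * abs(params["max_depth"] - 5)

# Assumed search space for the example.
grid = {"learning_rate": [0.001, 0.01, 0.1, 1.0], "max_depth": [3, 5, 7]}

def grid_search(grid, score):
    """Evaluate every combination of the listed values."""
    names = list(grid)
    candidates = [dict(zip(names, values))
                  for values in itertools.product(*grid.values())]
    return max(candidates, key=score), len(candidates)

def random_search(grid, score, n_iter, seed=0):
    """Evaluate n_iter combinations drawn at random from the listed values."""
    rng = random.Random(seed)
    candidates = [{name: rng.choice(values) for name, values in grid.items()}
                  for _ in range(n_iter)]
    return max(candidates, key=score), len(candidates)

grid_best, grid_evals = grid_search(grid, score)
rand_best, rand_evals = random_search(grid, score, n_iter=5)

print(grid_best, grid_evals)  # {'learning_rate': 0.1, 'max_depth': 5} 12
print(rand_best, rand_evals)
```

Grid search needs 4 x 3 = 12 evaluations here and is guaranteed to find the best listed combination. Random search uses only 5 evaluations, and its result depends on the seed. Real random search implementations often sample from continuous distributions rather than a fixed list, which this sketch does not show.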
4. How should you maintain a deployed model?
- Answer: A deployed model requires ongoing monitoring and periodic retraining to maintain its accuracy. Key steps include:
- Tracking model predictions and actual outcomes for retraining.
- Performing root cause analysis on incorrect predictions.
- Incorporating new data over time to improve model performance and adjust for changes in data patterns.
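The monitoring steps above could look something like the following sketch. Everything here is an assumption for illustration: the PredictionMonitor class, the window size, the accuracy threshold, and the toy fraud records are hypothetical, and a real system would typically log to a database and track more than accuracy.

```python
from collections import deque

class PredictionMonitor:
    """Hypothetical helper: tracks recent predictions against actual outcomes."""

    def __init__(self, window=100, min_accuracy=0.8):
        self.window = deque(maxlen=window)  # rolling record of correct/incorrect
        self.min_accuracy = min_accuracy
        self.errors = []  # misclassified cases, kept for root cause analysis

    def record(self, features, predicted, actual):
        """Log one prediction once its actual outcome is known."""
        correct = predicted == actual
        self.window.append(correct)
        if not correct:
            self.errors.append((features, predicted, actual))

    def accuracy(self):
        """Accuracy over the rolling window, or None if nothing is recorded."""
        return sum(self.window) / len(self.window) if self.window else None

    def needs_retraining(self):
        """True once the window is full and accuracy is below the threshold."""
        full = len(self.window) == self.window.maxlen
        return full and self.accuracy() < self.min_accuracy

# Toy usage with made-up records.
monitor = PredictionMonitor(window=4, min_accuracy=0.75)
for features, predicted, actual in [
    ({"amount": 10}, "ok", "ok"),
    ({"amount": 900}, "ok", "fraud"),
    ({"amount": 15}, "ok", "ok"),
    ({"amount": 700}, "ok", "fraud"),
]:
    monitor.record(features, predicted, actual)

print(monitor.accuracy())          # 0.5
print(monitor.needs_retraining())  # True
print(len(monitor.errors))         # 2
```

The stored errors give a starting point for root cause analysis (here, both misses are high-amount transactions), and the retraining flag signals when newly collected data should be folded back into the model.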