"From Application to Offer: How to Succeed in Your Data Science & Analytics Interview"
Aditya Prasad (MTech Data Science and Engineering)
Senior Data Scientist || AI, Analytics & Digital Transformation @ Empower India || MTech, Data Science and Engineering, BITS Pilani || Believe in Data and not Miracles
Hello readers, I'm Aditya Prasad, a data scientist with 4.5 years of experience in the field. Throughout my career, I have worked on a variety of projects across different domains (Airline, FMCG, Technology, Marketing, Finance, and Retirement), from predictive modeling to natural language processing, helping organizations extract insights from their data and make data-driven decisions.
With a strong professional career and interviews at more than 30 companies behind me, I have developed a deep understanding of how to prepare for a Data Science and Analytics job.
In this article, I will share some of my insights and experience on how to prepare for, and succeed in, Data Science and Analytics interviews.
Whether you're a seasoned data scientist or just starting out in the field, you should find this article informative. The content is aimed at candidates pursuing careers as Data Scientists, Data Analysts, Business Analysts, and Machine Learning Engineers.
NOTE: This is a long article, so be patient and go through it carefully to the end. You can keep it handy for all your upcoming job interviews.
Topics to be Covered in the Article:
• Data Science Applications in Different Industries (Retail, Airline, Finance, Healthcare)
• Traits Required for a Data Scientist Job
• How to Prepare a CV for a Data Science/Analytics Job
• Interview Rounds in a Data Science/Data Analytics Job
• Microsoft Excel Skills for a Data Science/Analytics Job
• SQL Skills for a Data Science/Analytics Job
• Python: Important Topics for Data Science/Analytics Jobs
• R Programming for Data Science/Analytics Jobs
• Mathematical Foundations for Data Science
• Data Structures and Algorithm Design
• Statistics for Data Science/Analytics
• Machine Learning
• Data Engineering (only tools mentioned)
• Cloud Computing (only technologies mentioned)
• Dashboarding (only tools mentioned)
• Natural Language Processing
• Conclusion
Data Science in different Industries:
Data Science and Analytics is used across many different types of industries and has a vast number of applications. Based on my work experience, I have laid out how companies in four major industries apply data science.
Retail:
1. Amazon: Amazon uses #datascience extensively to personalize its product recommendations, optimize its #supplychain, and improve customer experience.
2. Walmart: Walmart uses data science to optimize its #inventorymanagement, reduce costs, and improve customer experience. It also uses #predictiveanalytics to forecast demand and adjust pricing.
3. Target: Target uses data science to personalize its #marketingcampaign, optimize its #inventorymanagement, and improve #customerexperience. It also uses #machinelearning to forecast demand and adjust pricing.
4. Starbucks: Starbucks uses data science to optimize its store locations, personalize its marketing campaigns, and improve customer experience. It also uses machine learning to forecast demand and adjust inventory levels.
Finance:
1. #fidelityinvestments: Fidelity Investments is a financial services company that uses data science to offer personalized investment advice and products to its clients.
2. #empower: Empower is a financial services company that uses data science for forecasting and capacity planning across different market and business segments in the US and the rest of the world.
3. S&P Global Market Intelligence: S&P Global Market Intelligence is a provider of financial data and analytics, offering a range of tools to help financial institutions make informed decisions.
4. FICO: FICO is a data analytics company that provides solutions for credit scoring, #frauddetection, and #riskmanagement.
Healthcare:
1. Pfizer: Pfizer is a leading pharmaceutical company that uses #datascience to accelerate drug discovery and development and to predict the effectiveness of new drug candidates.
2. Johnson & Johnson: A multinational medical device, pharmaceutical, and consumer goods company that has developed a #predictiveanalytics platform using machine learning algorithms to identify patients at risk of developing certain diseases.
3. IBM Watson Health: IBM Watson Health focuses on using #datascience and #artificialintelligence to improve healthcare outcomes. They offer a range of products and services, including a #machinelearning platform that helps healthcare providers identify patterns and trends in patient data.
4. Optum: Optum is a healthcare services and technology company that uses #machinelearning algorithms to identify areas where healthcare providers can cut costs and improve efficiency.
Airlines:
1. Delta Air Lines: Delta has a team of data scientists that uses #machinelearning algorithms to optimize flight schedules, crew assignments, and maintenance operations. They also use data to personalize customer experiences, such as offering customized travel recommendations and targeted promotions.
2. United Airlines: United uses #datascience to improve its revenue management, pricing strategies, and customer service.
3. American Airlines: American Airlines has a #datascience team that uses #machinelearning algorithms to optimize flight schedules, reduce delays, and improve maintenance and food-and-beverage options.
4. Southwest Airlines: Southwest uses predictive analytics to forecast demand and optimize its flight schedules, as well as to improve its baggage handling and on-time performance.
Traits Required for a Data Scientist Job:
1. Analytical skills: Data scientists need strong #analyticalskills to be able to identify patterns and trends in large and complex datasets. They should be comfortable working with #statisticalanalysis, #algorithms, and data visualization tools to extract insights from data.
2. Programming skills: Data scientists should be proficient in at least one programming language, such as #pythonprogramming or #r. They should be able to write code to manipulate data, create models, and perform analyses.
3. Problem-solving skills: Data scientists should be able to define and frame problems in a way that can be solved using data-driven approaches. They should be able to apply their #analyticalskills and #technicalskills to develop solutions that meet business goals.
4. Communication skills: Data scientists should be able to explain technical concepts to non-technical stakeholders in a clear and concise manner. They should be able to communicate the implications of their analyses and recommendations to decision-makers.
5. Curiosity and creativity: Data scientists should be curious and open-minded, always looking for new ways to approach problems and find solutions. They should be creative in their thinking and willing to experiment with new techniques and tools.
6. Business acumen: A data scientist should have a good understanding of the business domain in which they work. This includes an understanding of the company's goals, customer needs, and industry trends.
7. Attention to detail: Data scientists should be meticulous in their work and pay close attention to detail. They should be able to identify and correct errors in the data.
8. Adaptability: The field of data science is constantly evolving, and data scientists should be able to adapt to new technologies and techniques as they emerge. They should also be willing to learn and continue to develop their skills throughout their career.
How to prepare your CV for the Data Science/Analytics Job:
1. Highlight your technical skills: As a data scientist or analyst, you should have expertise in programming languages (e.g., #python, #r), statistical software (e.g., #spss, SAS), data visualization tools (e.g., #tableau, #powerbi), and database management (e.g., #sql). Make sure to highlight your proficiency in these areas.
2. Showcase your experience: Your CV should highlight your work experience, especially if you have experience in #dataanalysis, #datamining, #statisticalmodeling, or #machinelearning. If you have relevant projects or research experience, make sure to highlight those as well.
3. Emphasize your achievements: Use quantitative metrics to describe your achievements and impact in your previous roles. For example, highlight how your analysis or insights led to increased revenue or cost savings for your company.
4. Tailor your CV to the job description: Read the job description carefully and tailor your CV to highlight the specific skills and experiences the employer is looking for. This will show that you have carefully considered the requirements of the job and are a good fit for the role.
5. Keep it concise: While you want to highlight your skills and experiences, your CV should be concise and easy to read. Try to keep it to one or two pages.
6. Include a summary or objective statement: A brief summary or objective statement at the beginning of your CV can help to quickly convey your skills and goals as a data scientist or analyst.
7. Education: List your educational background, including the name of the school, degree obtained, and the date of graduation. You can also include any relevant coursework, honors, or awards.
8. Personal Information: This should include your full name, address, phone number, and email address.
9. Proofread and format carefully: Make sure to proofread your CV for errors and format it carefully to make it visually appealing and easy to read.
NOTE: Detailed information about the study materials and the topics to cover for the interview is provided in the sections below.
Interview rounds in a Data Science/Data Analytics Job:
Although every interview is different, hiring managers and recruiters are typically looking to learn three main things about you during the interview process:
1. How interested are you in the company and the role?
2. How well does your skill set match the job's requirements?
3. Would you be a good 'culture fit'?
For a #datascience role, the interview process generally follows a standard set of stages. Not every company covers all of them, but the skill areas covered in the following sections map to the rounds you can typically expect in a #datascience interview.
Microsoft Excel Skills for Data Science/ Analytics Job:
There are many Excel skills that are valuable for data science jobs, including:
1. Data Cleaning and Preparation: Excel can be used to clean, organize, and format data before it is analyzed in other tools. This includes tasks such as removing duplicates, filtering, sorting, and using formulas to fill in missing data.
2. Data Analysis: Excel has powerful tools for analyzing data, including pivot tables, charts, and various statistical functions. Being able to use these tools to summarize and analyze large datasets is an important skill for data scientists (see the pandas sketch after this list for a Python analogue).
3. Data Visualization: Excel can be used to create professional-looking charts and graphs to help communicate insights from data. Being able to create clear and effective visualizations is an important skill for data scientists.
4. Macros and VBA: Macros and VBA (Visual Basic for Applications) can be used to automate repetitive tasks in Excel. Being able to create and modify macros can save time and increase productivity.
5. Advanced Formulas: Excel has many powerful formulas for working with data, including lookup functions, array formulas, and conditional formatting.
6. Data Import and Export: Excel can be used to import data from a variety of sources, including databases, text files, and web pages. Being able to import and export data from different sources is a valuable skill for data scientists.
7. Excel Add-ins: Excel has many add-ins that can extend its functionality, such as Power Query, Power Pivot, and Solver.
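To connect these Excel skills to a typical data-science workflow, here is a minimal sketch in Python (my own illustration, not part of the original article) that mirrors the pivot-table and import/export tasks above with pandas; the file name, sheet name, and column names are hypothetical.

# A minimal sketch (my own illustration): an Excel-style workflow in Python with pandas —
# read a workbook, then summarize it the way a pivot table would.
# The file name, sheet, and column names here are hypothetical.
import pandas as pd

df = pd.read_excel("sales.xlsx", sheet_name="raw_data")   # reading .xlsx requires openpyxl installed

df = df.drop_duplicates()                      # data cleaning, as in Excel's Remove Duplicates
pivot = pd.pivot_table(
    df,
    index="region",            # rows of the pivot
    columns="product",         # columns of the pivot
    values="revenue",          # field being aggregated
    aggfunc="sum",
)
pivot.to_excel("revenue_summary.xlsx")         # export the summary back to Excel
print(pivot)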
References for upskilling Excel:
SQL Skills for Data Science/Analytics Job:
1. SQL basics: This includes understanding the syntax of SQL, data types, operators, functions, and clauses.
2. Data manipulation: This includes creating, retrieving, updating, and deleting data in SQL databases using SELECT, INSERT, UPDATE, and DELETE statements.
3. Data aggregation and grouping: This includes using GROUP BY, HAVING, and aggregate functions like COUNT, SUM, AVG, MAX, and MIN to summarize and analyze data.
4. Joins: This includes understanding different types of joins, such as INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN, and how to use them to combine data from multiple tables.
5. Subqueries: This includes using subqueries in SELECT, INSERT, UPDATE, and DELETE statements to retrieve data from nested queries.
6. Set operations: This includes using UNION, UNION ALL, INTERSECT, and EXCEPT operators to combine or compare datasets.
7. Window functions: This includes using OVER and PARTITION BY clauses to perform calculations on subsets of data (see the sqlite3 sketch after this list).
8. Indexes: This includes understanding how indexes work and how to use them to speed up queries.
9. Performance tuning: This includes optimizing SQL queries, using query execution plans, and understanding how to use indexes to improve performance.
10. Data modeling: This includes understanding how to design databases, create tables, define relationships between tables, and use normalization techniques.
11. Stored procedures and functions: This includes creating and using stored procedures and functions to simplify complex SQL queries.
12. Views: This includes creating views to simplify queries and provide a simplified view of data.
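To make the aggregation and window-function topics concrete, here is a minimal, self-contained sketch (my own illustration, not an official exercise) using Python's built-in sqlite3 module; the table and data are invented for demonstration, and window functions require SQLite 3.25 or newer.

# A minimal sketch (my own illustration): practising GROUP BY and a window function
# with Python's built-in sqlite3 module (window functions need SQLite 3.25+).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, order_date TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('alice', '2024-01-05', 120.0),
        ('alice', '2024-02-10', 80.0),
        ('bob',   '2024-01-20', 200.0),
        ('bob',   '2024-03-01', 50.0);
""")

# Aggregation: total spend per customer
for row in conn.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer"):
    print(row)

# Window function: running total per customer, ordered by date
query = """
    SELECT customer, order_date, amount,
           SUM(amount) OVER (PARTITION BY customer ORDER BY order_date) AS running_total
    FROM orders
"""
for row in conn.execute(query):
    print(row)

Practising the same queries against an in-memory database like this is a quick way to check your understanding of how PARTITION BY changes the result compared with GROUP BY.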
SQL Interview questions link:
References for upskilling SQL:
PYTHON important topics for Data Science/Analytics Jobs:
NOTE: All the machine learning algorithms should be implemented in Python at least once. This will strengthen both your Python skills and your practical understanding of the algorithms.
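As an example of the kind of from-scratch practice I recommend, here is a minimal sketch (my own illustration) of simple linear regression trained with batch gradient descent using only NumPy; the data, learning rate, and iteration count are arbitrary choices for demonstration.

# A minimal sketch (my own illustration): simple linear regression fitted with
# batch gradient descent on mean squared error, using NumPy only.
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 100)                     # single feature
y = 3.0 * x + 5.0 + rng.normal(0, 1, 100)       # true slope 3, intercept 5, plus noise

w, b = 0.0, 0.0      # parameters to learn
lr = 0.01            # learning rate
for _ in range(5000):                  # gradient descent iterations
    error = (w * x + b) - y
    w -= lr * 2 * np.mean(error * x)   # d(MSE)/dw
    b -= lr * 2 * np.mean(error)       # d(MSE)/db

print(f"learned w = {w:.2f}, b = {b:.2f} (expected roughly 3 and 5)")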
References for Python Interview questions:
Important References for upskilling python:
Topics to study in R programming for Data Science/Analytics Jobs:
1. R basics: This includes understanding the syntax of R, data types, variables, functions, and packages.
2. Data manipulation: This includes creating, retrieving, updating, and deleting data in R using functions like subset(), merge(), and dplyr verbs like select(), filter(), arrange(), mutate(), summarise(), and group_by().
3. Data visualization: This includes using ggplot2 to create visualizations such as scatter plots, bar charts, histograms, and heatmaps to explore data.
4. Data analysis: This includes using statistical methods to analyze data, such as descriptive statistics, inferential statistics, hypothesis testing, and regression analysis.
5. Machine learning: This includes using machine learning techniques such as classification, regression, clustering, and dimensionality reduction algorithms to build models that predict or classify data. Some popular packages for machine learning are caret, randomForest, xgboost, glmnet, and neuralnet.
6. Text mining and natural language processing: This includes using text mining and NLP packages such as tm, tidytext, and quanteda to preprocess text data, create text corpora, and perform sentiment analysis, topic modeling, and text classification.
7. Time series analysis: This includes using time series packages like ts, forecast, and prophet to analyze and forecast time series data.
8. Web scraping: This includes using the rvest and httr packages to scrape data from websites.
9. Big data: This includes using packages like sparklyr and data.table to analyze large datasets in R.
10. Data preprocessing: This includes data cleaning, missing data imputation, data transformation, and data normalization using packages like tidyr, dplyr, and purrr.
References for R Programming Interview questions:
Important References for upskilling R Programming for Data Science:
MATHEMATICAL FOUNDATIONS FOR DATA SCIENCE:
(The following concepts are very important for machine learning, deep learning algorithms, and statistics. Direct questions from these topics are rarely asked, but it is important to know them before diving deep into the ML algorithms.)
1. Gauss Elimination method
2. Norms of a Matrix
3. Conditioned Systems and Condition Number
4. LU Factorization and LU Decomposition
5. Doolittle Method / Crout's Method / Cholesky Method
6. Gauss-Seidel and Gauss-Jacobi Methods
• Vector Spaces and Subspaces:
- Linear Dependence and Independence of Vectors
- Basis and Dimension of a Vector Space
- Row Space and Column Space of a Matrix
- Rank-Nullity Theorem
- Linear Transformations
• Eigenvalues and Eigenvectors:
- Eigenvectors and Eigenspaces
- Eigenvalues, Repeated Eigenvalues, and related numericals
- Geometric Multiplicity, Algebraic Multiplicity
- Similarity of Matrices, Diagonalization of Matrices
- Dominant Matrix, Power Method for approximating Eigenvalues
- Singular Value Decomposition (see the NumPy sketch after this list)
- QR Decomposition
- Inner Product Spaces
- Dimensionality Reduction
• Calculus (Derivatives and Integration):
- Definite Integrals
- Probability Density Function and Cumulative Distribution Function
- Gradient and Directional Derivatives
- Lagrange Multipliers and Gradient Descent
• Permutations and Combinations:
- Basic counting principles: the Product, Sum, and Division Rules
- Pigeonhole Principle (for understanding permutation concepts)
- Permutations and Combinations refresher (generally follow the well-known R.D. Sharma book)
• Binomial Theorem:
- Powers of Binomial Expressions
- Formula of the Binomial Theorem (generally used in the Binomial distribution under statistics)
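For hands-on intuition about eigenvalues and singular values, here is a minimal NumPy sketch (my own illustration, using an arbitrary 2x2 matrix) showing eigen decomposition and Singular Value Decomposition side by side.

# A minimal sketch (my own illustration): eigen decomposition and SVD with NumPy,
# the two factorizations that underpin PCA and other dimensionality-reduction methods.
import numpy as np

A = np.array([[4.0, 2.0],
              [1.0, 3.0]])

# Eigen decomposition: A v = lambda v
eigvals, eigvecs = np.linalg.eig(A)
print("eigenvalues:", eigvals)          # expect 5 and 2 for this matrix
print("eigenvectors (columns):\n", eigvecs)

# Singular Value Decomposition: A = U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(A)
print("singular values:", S)

# Sanity check: reconstruct A from its SVD factors
A_rebuilt = U @ np.diag(S) @ Vt
print("reconstruction error:", np.max(np.abs(A - A_rebuilt)))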
Important Reference for upskilling Higher Engineering Mathematics:
Link for Gauss-Seidel/Jacobi/LU Factorization/LU Decomposition:
Link for Eigen values, Vector, Linear Transformation, Vector subspaces:
Link for Singular Value Decomposition:
Link for Gradient Descent :
Link for Permutation Combinations (Pigeonhole Principle):
Link for Binomial Theorem:
DATA STRUCTURE AND ALGORITHM DESIGN:
Tip: Keep practicing all the algorithms in Python, as they can be asked about in interviews.
• Time Complexity and Space Complexity:
- Big O, Big Omega, Big Theta asymptotic analysis
- Time complexity of nested algorithms
- Time complexity of recursive algorithms
• Methods of Solving Recurrences:
- Substitution method
- Master Method (go through the master theorem cases and numericals)
• Abstract Data Types (understand the modes of operation and the time complexity of each)
• Stack and Array Implementation of a Stack
• Queue and Array Implementation of a Queue
• Singly Linked List / Doubly Linked List (Insertion, Deletion, Search)
• Infix, Prefix, and Postfix notation and their conversions
• Tree Data Structures:
- Balanced binary trees and different types of trees
- Binary trees (Array representation & Linked representation)
- Tree traversal (In-order, Pre-order, Post-order)
- Heap data structure and time complexity
- Max and Min heaps
- Insertion, Deletion, Search, and Sorting in Max and Min heaps
- Heapification process
• Graphs:
- Graph terminologies and different types of graphs
- Graph implementation strategies (practice numericals), Adjacency Matrix, Adjacency List
- Graph traversal:
- Breadth First Search (BFS), Depth First Search (DFS) (see the Python sketch after this list)
- Directed graphs, Transitive Closure of graphs (Floyd-Warshall Algorithm)
• Binary Search Tree (Insertion, Deletion, Search)
• AVL Trees (Adelson-Velskii and Landis Trees)
- AVL tree terminologies and balanced AVL trees
- AVL operations (Insert, Delete)
• Optimization Problem-Solving Techniques:
- Greedy Method algorithms
- Task Scheduling algorithm
- The Knapsack Problem (Dynamic Programming)
- Kruskal's Algorithm and Prim's Algorithm
- Floyd-Warshall Algorithm for Transitive Closure
- Dijkstra's Algorithm
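As a quick practice example for graph traversal, here is a minimal breadth-first search sketch in Python (my own illustration; the adjacency list is a hypothetical graph).

# A minimal sketch (my own illustration): breadth-first search over an adjacency-list
# graph, using collections.deque as the queue.
from collections import deque

def bfs(graph, start):
    """Return the nodes reachable from `start` in BFS order."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()          # FIFO: this is what makes it breadth-first
        order.append(node)
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return order

# Example adjacency list (hypothetical graph for demonstration)
graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["E"],
    "E": [],
}
print(bfs(graph, "A"))  # ['A', 'B', 'C', 'D', 'E']

Swapping the deque for a plain list used as a stack (pop from the end) turns the same skeleton into depth-first search, which is a useful exercise in its own right.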
Important references of Data Structure for interviews:
Important links for Upskilling Data Structure:
For Time and space complexity:
Infix, Postfix and Prefix expressions:
Stack and Queue:
Tree Traversals:
Heap Data Structure and operations:
Graph traversals:
AVL Tree Operation:
Knapsack Problem:
Kruskal Algorithm:
Floyd-Warshall Algorithm:
Dijkstra's Algorithm:
STATISTICS FOR DATA SCIENCE:
• Mean, Median, Mode, Standard Deviation, Variance, Range
• Population and Sample
• Five-point summary of data, Box Plot, concept of Outliers
• Probability (solve numericals to build better concepts):
- Discrete and continuous sample spaces
- Axioms of probability; different types of events
- Independent events, Conditional probability, and Bayes' theorem
- Naive Bayes classifier, its assumptions, Bayesian learning, Laplace smoothing
• Random Variables:
- Discrete and Continuous
- Discrete probability distributions, Expected value (mean) and Variance
- Continuous probability distribution functions: Mean and Variance
- Joint probability distribution functions, Marginal probability
• Distributions:
- Binomial Distribution: numericals and properties
- Bernoulli Distribution: numericals and properties
- Poisson Distribution: Mean and Variance
- Normal Distribution (Gaussian Distribution): properties, Z value
- t distribution, Chi-square distribution, F distribution and their properties
• Testing of Hypothesis:
- Sampling, Sampling error, Types of sampling, Sampling distribution of the mean
- Central Limit Theorem, Z score of sample means
- Sampling from a finite population: its rules and formulas
- Point estimate, Confidence interval estimate, Estimating the population proportion
• Parametric tests and Non-parametric tests
• P-value, Alpha error and Beta error
• Z-test, t-test (for the difference between two means), Chi-square test (Independence and Goodness of Fit), F-test (for the ratio of variances), ANOVA (One-Way ANOVA): assumptions of each test, rejection criteria, Null and Alternate hypotheses (see the SciPy sketch after this list)
• Correlation and Regression:
- Correlation and Covariance: concepts and numericals
- Karl Pearson's correlation coefficient
- Spearman's rank correlation
• Regression concepts: Method of least squares (numericals)
- Assumptions of linear regression
- Total sum of squares, Residual sum of squares
- Overfitting, Multicollinearity, Variance Inflation Factor
- R-squared and Adjusted R-squared, MSE, MAE, MAPE
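To see hypothesis testing in code, here is a minimal sketch (my own illustration on simulated data) of a two-sample t-test with SciPy; the group sizes, means, and significance level are arbitrary choices for demonstration.

# A minimal sketch (my own illustration): a two-sample t-test with SciPy on simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(loc=100, scale=10, size=50)   # e.g., a control group metric
group_b = rng.normal(loc=105, scale=10, size=50)   # e.g., a treatment group metric

# Null hypothesis: the two groups have equal means
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)  # Welch's t-test
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")

alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: the difference in means is statistically significant.")
else:
    print("Fail to reject the null hypothesis.")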
Important References for Upskilling Statistics:
Naive Bayes Classification:
Probability Distribution:
Normal Distribution:
P-Value, Z-test, T-test, ANOVA test:
Correlation and Regression:
Linear Regression:
Evaluation Metrics for Regression model:
Important references for statistics for a data science job:
MACHINE LEARNING: Implement these concepts using Python and R for practice.
• Regression:
- Simple regression and multiple regression models
- Intuition behind the cost function and least squares
- Closed-form solution approach & gradient descent approach for numericals
- Concepts of learning rate
- Cross-validation, hyperparameter tuning
- Training set, validation set, and test set
- Bias-variance tradeoff; different error metrics (MAE, MAPE, MSE)
• Classification:
- Discrete variables, decision boundaries
- Types of classification: binary and multi-label classification
- Logistic regression concepts and numericals (see the scikit-learn sketch after this list)
• Bayesian Learning:
- Maximum Likelihood Estimate
- Maximum A Posteriori Estimation
- Bayes' theorem and conditional probability
- Naive Bayes and conditional independence
- Naive Bayes classification and Laplace smoothing
- Text classification using Naive Bayes
• Neural Networks:
- Perceptron, nodes, neurons, weights, and biases
- Input, output, and hidden layers
- Forward, backward, and forward-backward propagation
- Multilayer perceptron and numericals
- Activation functions (Sigmoid, Tanh, ReLU)
- Dropout, epochs, early stopping
• K-Nearest Neighbours:
- Measures of similarity and dissimilarity (Manhattan, Minkowski, and Euclidean)
- Handling of ordinal variables in the KNN approach
- KNN regression with mixed-datatype datasets
- Hyperparameter tuning, Elbow Method
• Ensemble Learning:
- Bagging (Bootstrap Aggregating), the bootstrap sampling idea, Bagging's effect on variance and bias
- Random Forest for feature selection
- Boosting:
- Gradient Boosting
- Adaptive Boosting
- XGBoost
• SVM Classification:
- Linear SVM: optimization with the Lagrangian
- Non-Linear SVM
- Kernel trick: different kernel functions
- Support vectors
- Maximum-margin hyperplane: large margin & small margin
- Slack variables, soft-margin classification
- Effect of margin size and misclassification cost
• Clustering:
- K-Means clustering numericals
- Expectation Maximization algorithm
- Minimizing the K-Means cost function
- Limitations of K-Means
- K-Means for outlier detection
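To tie several of these topics together (train/validation/test splits, cross-validation, hyperparameter tuning, and logistic regression), here is a minimal scikit-learn sketch (my own illustration) on the library's built-in breast cancer dataset; the parameter grid and split sizes are arbitrary choices.

# A minimal sketch (my own illustration): binary classification with scikit-learn,
# covering the train/test split, cross-validation, and a simple hyperparameter grid.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale features, then fit logistic regression; tune the regularization strength C with 5-fold CV
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
param_grid = {"logisticregression__C": [0.01, 0.1, 1, 10]}
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)

print("best C:", search.best_params_)
print("test accuracy:", accuracy_score(y_test, search.predict(X_test)))

Wrapping the scaler and the model in one pipeline ensures the scaler is fitted only on each training fold, which avoids data leakage during cross-validation.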
TIME SERIES FORECASTING:
• Time Series Analysis:
- Continuous and discrete time series
- Seasonality, Trend, Noise, Cyclical variation, Irregular variation, and Residuals in a time series
- Additive decomposition and Multiplicative decomposition
- Forecasting: Short-term, Medium-term, and Long-term forecasting
• Forecasting Techniques:
- Smoothing methods: Moving Average, Exponential Smoothing (see the pandas sketch after this list)
- Double Exponential Smoothing, Triple Exponential Smoothing
- ARIMA: Auto-Regressive Integrated Moving Average, SARIMAX
- Holt-Winters method, Prophet forecasting method
- Measurement of forecast error: MSE, MAPE, MAD
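As a small illustration of the smoothing methods above, here is a minimal pandas sketch (my own illustration on a synthetic series) computing a simple moving average and exponential smoothing; the trend, seasonality, and smoothing parameter are arbitrary choices.

# A minimal sketch (my own illustration): simple moving-average and exponential
# smoothing of a noisy synthetic series with pandas.
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
dates = pd.date_range("2023-01-01", periods=120, freq="D")
# Synthetic series: upward trend + weekly seasonality + noise
values = (np.arange(120) * 0.5
          + 10 * np.sin(2 * np.pi * np.arange(120) / 7)
          + rng.normal(0, 3, 120))
series = pd.Series(values, index=dates)

moving_avg = series.rolling(window=7).mean()             # 7-day simple moving average
exp_smooth = series.ewm(alpha=0.3, adjust=False).mean()  # exponential smoothing, alpha = 0.3

print(moving_avg.tail(3))
print(exp_smooth.tail(3))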
Important references for Upskilling Machine learning:
Bias-Variance Tradeoff
Cross Validation & Hyperparameter tuning:
Logistic Regression for classification:
Text Classification using Naive Bayes:
Neural Networks:
K-Nearest Neighbours:
Random forest:
Xgboost Algorithm:
SVM Kernels:
K-Means Clustering:
Time series Forecasting:
Important references for interview questions:
Regression:
Classification:
Random Forest Algorithm:
Xgboost Algorithm:
Time series Forecasting and Analysis:
• Data Engineering:
- Alteryx
- Apache Airflow
- Hadoop
- PySpark
- Model Development, Monitoring, and Deployment (MLOps)
• Cloud Computing:
- AWS
- GCP
- Azure
• Dashboarding:
- Tableau
- Power BI
- R Shiny
Natural Language Processing:
When studying Natural Language Processing (NLP), there are several fundamental concepts, techniques, and subfields that you should consider. Key areas to focus on include text preprocessing and tokenization, parts-of-speech tagging, named entity recognition, sentiment analysis, topic modeling, and text classification.
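As a starting point, here is a minimal sketch (my own illustration) of a few of these building blocks using NLTK; note that the exact names of the resources to download can vary slightly across NLTK versions.

# A minimal sketch (my own illustration) of common NLP building blocks with NLTK;
# the resource names passed to nltk.download() may differ slightly by NLTK version.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")
nltk.download("vader_lexicon")

text = "The flight was delayed again, but the crew handled it brilliantly."

tokens = nltk.word_tokenize(text)          # tokenization
pos_tags = nltk.pos_tag(tokens)            # parts-of-speech tagging
sentiment = SentimentIntensityAnalyzer().polarity_scores(text)  # rule-based sentiment (VADER)

print(tokens)
print(pos_tags)
print(sentiment)   # e.g., {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}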
Important References for upskilling NLP:
Sentiment Analysis:
Named Entity Recognition:
Parts of Speech Tagging:
Conclusion:
In conclusion, preparing for a data science interview requires a strategic approach and a well-rounded skill set. By focusing on fundamental concepts, technical proficiency, and effective communication, you can enhance your chances of success in this competitive field. Remember, preparing for a data science interview is a journey rather than a destination. It requires dedication, perseverance, and an ongoing commitment to self-improvement.
Lastly, embracing a growth mindset and maintaining a curiosity-driven approach are crucial for success in the ever-evolving field of data science. Demonstrating a passion for continuous learning and a willingness to explore new domains or techniques will highlight your ability to adapt to emerging challenges and contribute to the growth of an organization.
Good luck on your data science interview journey!
Thank you for reading this article. For more such informative articles, please subscribe to this newsletter. Being Knowledgeable is one of the best strengths a person can develop. Also, if you want me to write on any particular topic, do mention it in the comment section.
Contact
Phone number : 7752041517
Email Id: [email protected]
Linkedin profile: