Data Science Teams: Good Questions for Data Scientists to Ask
In a previous newsletter, I stressed the importance of asking compelling questions when serving as a member of a data science team. After all, questions are the impetus for exploration and discovery. I also recommended several techniques for initiating question sessions.
However, the techniques I recommend aren't helpful unless you and others on your data science team are comfortable asking questions. In this newsletter, I present four common reasons that data science team members may be uncomfortable asking questions. Simply by recognizing the common barriers to asking questions, you are better equipped to overcome those barriers on your own.
Self-Protection
Asking questions may be very uncomfortable, especially when you're asking someone who's in a position of authority and especially when the person you're asking has an intimidating presence. After all, your question may be perceived as being dumb or as challenging or threatening the other person. No doubt about it — some people have even been fired over asking very good questions.
As a result, many employees, even those who serve on a data science team, may be reluctant to ask compelling questions. They have a natural desire to protect themselves. Nobody wants to seem dumb, wrong, or confrontational.
Overcoming this barrier requires working up the courage to ask compelling questions. Sometimes, you just need to do it — force yourself. If you can't work up the courage, try the opposite tactic — fear. Remind yourself that your job is to ask good questions. If you don't ask, you're not doing your job. And if you don't do your job, your team will fail, and you'll all end up in the unemployment line.
The good news is that over time and with lots of practice, asking tough questions becomes second nature. When you begin to see that asking questions isn't a threat, and you begin to reap the benefits of asking good questions, any fear you may have had quickly disappears.
Insufficient Time
Some data science teams just don't have enough time and energy to ask compelling questions. Asking questions is hard work. It's exhausting, especially when you're just getting started on a project. It might seem as though each question meeting gets longer. Instead of feeling as though you're making progress toward an answer or solution, you may feel as though you're getting further and further from it. At this point, the team can quickly become discouraged and stop asking.
Many data science teams fall into this trap, and as soon as they stop asking questions, they turn their attention to routine work, such as capturing and cleaning data or implementing new data analytics and visualization tools.
Often, the rest of the organization celebrates this shift from what's perceived as esoteric to more practical endeavors — real work. Many organizations prefer a busy team over an effective one. When this happens, everyone gets so focused on rowing that no one takes the time to question where the ship is headed and why.
Remember that there is no prize for the most data, the cleanest data set, or the best data analytics and visualizations. Prizes are given out for delivering insights and creating business value. You can't do that unless you spend quality time coming up with compelling and relevant questions.
Insufficient Experience
Some data science teams struggle to ask questions simply because they have little experience doing so. This is especially prevalent when team members are engineers, software developers, or project managers — people who have built their careers on answering questions and solving problems. These people want to do, not ask. Team members who come from science or academia tend to have an easier time making the transition.
Nothing is wrong with answers and solutions. In fact, a data science team often needs its members to propose answers and solutions, so those can be tested. However, during question sessions, the team needs to find a way to transform some statements into questions. For example, a team member who is unaccustomed to asking questions may say something like, "I see that more women than men are buying running shoes on our website. Maybe it's because our marketing department caters mostly to women." The team could easily convert those statements into a question: "Why do more women than men buy running shoes on our website?"
Remember: statements don't spark discussion. Usually, the only option is for the other person to agree or disagree. With a question, the team can begin to consider a range of possibilities and discuss the data it needs to examine for answers.
A Corporate Culture That Stifles Questions
Some data science teams are stifled by a corporate culture that discourages employees from asking questions. In his book The Magic of Dialogue: Transforming Conflict into Cooperation, social scientist Daniel Yankelovich points out that most organizations in the U.S. have a culture of action. When they encounter a problem, their first instinct is to fix what's broken. In such a culture, asking questions is seen as something that impedes progress.
Quick, decisive action is often needed in organizations, but it's counterproductive in data science, where the focus is on learning and innovation. One thing you don’t want to see the data science team doing is getting wrapped up in routine work to accomplish something practical. You don’t want the research lead saying something like, “You can ask questions once you finish uploading all the data to the cluster.” The team shouldn't be focused on completing projects but on coming up with new insights.
When you’re working on a data science team, watch out for an individual or organizational bias against questions. Questioning is one of the first steps toward discovery. If you skip this step, your team, and the organization overall, will have trouble learning anything new.
Frequently Asked Questions
Which questions should I ask to understand the data science roles within the team?
Questions like "Can you describe the different data science roles within the team?" and "How do data scientists collaborate with data analysts and data engineers here?" will help you understand the team dynamics and your potential day-to-day responsibilities.
How do supervised and unsupervised learning differ in data science?
Supervised learning trains a model on data with labeled responses, whereas unsupervised learning works with data that has no labeled responses. Interviewers may ask about these concepts to assess your understanding of different machine learning models.
Can you explain the importance of ensemble learning in data science?
Ensemble learning combines multiple machine learning models to improve accuracy and robustness. Understanding ensemble techniques like bagging, boosting, and stacking can be important for discussing advanced data science topics during interviews.
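To make the idea of combining models concrete, here is a minimal from-scratch sketch of bagging, one of the ensemble techniques named above. It assumes a toy one-feature dataset, and the function names (fit_stump, bagging_fit, bagging_predict) are hypothetical names chosen for illustration, not part of any library. Boosting and stacking are not shown.

```python
import random
from collections import Counter


def fit_stump(points):
    """Fit a one-feature threshold classifier: predict 1 when x >= threshold."""
    best_threshold, best_errors = None, None
    for threshold, _ in points:
        # Count how many points this candidate threshold misclassifies.
        errors = sum((x >= threshold) != bool(label) for x, label in points)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def bagging_fit(points, n_models=25, seed=0):
    """Train each stump on its own bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_models):
        sample = [rng.choice(points) for _ in points]
        stumps.append(fit_stump(sample))
    return stumps


def bagging_predict(stumps, x):
    """Combine the stumps by majority vote."""
    votes = Counter(int(x >= threshold) for threshold in stumps)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (feature value, label).
points = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 1), (5.0, 1), (6.0, 1)]
stumps = bagging_fit(points)
print(bagging_predict(stumps, 0.5), bagging_predict(stumps, 6.5))
```

Each stump on its own is a weak model that depends heavily on the sample it happened to see. Averaging many of them by vote is what gives the ensemble its added robustness.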
How important is Python in a data science job?
Python is extremely important in data science roles because it offers robust libraries for data analysis, machine learning, and data visualization. Libraries like pandas, scikit-learn, and matplotlib are widely used in the field of data science to manipulate data, build machine learning models, and create insightful visualizations.
What are the key differences between supervised and unsupervised learning?
Supervised learning involves training a machine learning model on labeled data, meaning the data points have known outcomes. Examples include classification and regression tasks. Unsupervised learning, on the other hand, deals with unlabeled data and is used in data science to identify patterns or groupings within the data, such as in clustering or association tasks.
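The difference is easiest to see side by side. The sketch below is a deliberately simplified illustration that uses only the Python standard library. The data and the function names (fit_centroids, classify, two_means) are hypothetical. The supervised part learns from known labels, while the unsupervised part has to find the two groups on its own. It assumes the input contains at least two distinct values.

```python
from statistics import mean


def fit_centroids(values, labels):
    """Supervised: labels are known, so average the values for each label."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return {label: mean(members) for label, members in groups.items()}


def classify(centroids, value):
    """Predict the label whose average is closest to the new value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))


def two_means(values, iterations=10):
    """Unsupervised: no labels, so split the values into two clusters
    (one-dimensional k-means with k=2)."""
    low, high = min(values), max(values)
    for _ in range(iterations):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        low, high = mean(near_low), mean(near_high)
    return near_low, near_high


# Hypothetical data: weekly hours spent on a website.
hours = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
labels = ["casual", "casual", "casual", "frequent", "frequent", "frequent"]

centroids = fit_centroids(hours, labels)   # uses the labels
print(classify(centroids, 3.0))            # prints "casual"
print(two_means(hours))                    # never sees the labels
```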
What regularization techniques are commonly used in data science?
Common regularization techniques used in data science include L1 regularization (Lasso), L2 regularization (Ridge), and Elastic Net, which is a combination of both L1 and L2. These techniques are used to prevent overfitting by adding a penalty to the complexity of the model.
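As a rough illustration of how these penalties shrink a model, the sketch below fits a single slope w with no intercept by minimizing sum((y - w*x)**2) + l1*abs(w) + l2*w**2, which has a closed-form solution in this one-parameter case. The function name fit_slope, the parameter names l1 and l2, and the data are assumptions made for this example. Real libraries scale the penalty terms differently, so the numbers here are not comparable to their outputs.

```python
def fit_slope(xs, ys, l1=0.0, l2=0.0):
    """Minimize sum((y - w*x)**2) + l1*abs(w) + l2*w**2 for a single slope w."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # The L1 penalty soft-thresholds the fit: small effects become exactly zero.
    shrunk = max(abs(sxy) - l1 / 2, 0.0)
    sign = 1.0 if sxy >= 0 else -1.0
    # The L2 penalty shrinks the slope by enlarging the denominator.
    return sign * shrunk / (sxx + l2)


# Hypothetical data that follows y = 2x exactly.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

print(fit_slope(xs, ys))                    # no penalty: 2.0
print(fit_slope(xs, ys, l2=14.0))           # Ridge-style penalty: 1.0
print(fit_slope(xs, ys, l1=28.0))           # Lasso-style penalty: 1.0
print(fit_slope(xs, ys, l1=100.0))          # strong L1 penalty: 0.0
print(fit_slope(xs, ys, l1=28.0, l2=14.0))  # Elastic Net-style penalty: 0.5
```

Note how a strong enough L1 penalty drives the slope all the way to zero, while the L2 penalty only shrinks it toward zero. That is why Lasso is often associated with feature selection.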
How does machine learning fit into the field of data science?
Machine learning is a core part of data science that focuses on building algorithms to learn from and make predictions on data. It involves training models using various machine learning algorithms to identify patterns and make decisions with minimal human intervention. Machine learning is widely used in data science to automate data-driven decision-making and predictive analysis.
Why is data visualization important in data science?
Data visualization is crucial in data science because it allows data scientists to communicate complex data insights in an understandable and accessible way. Visualizations help to illustrate trends, patterns, and correlations that may not be discernible from raw data. Effective data visualization can drive better decision-making by providing clear and actionable insights.
What are the main responsibilities of a data engineer in a data science team?
A data engineer in a data science team is responsible for constructing, maintaining, and scaling the data infrastructure necessary for data collection, storage, and analysis. They ensure that data pipelines are efficient and reliable, enabling data scientists to access and interpret data for their analyses and model building. Data engineers also work on data cleaning and transformation to ensure data quality.
How do you handle missing data in a data science project?
Handling missing data is a critical step in data science projects. Techniques include simple imputation, such as filling in missing values with the mean, median, or mode, and more sophisticated methods such as k-nearest neighbors (KNN) imputation. In some cases, data points with missing values may be removed if doing so does not significantly affect the dataset. The choice of method depends on the amount of missing data and its potential impact on analysis or model performance.
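The simpler options can be sketched in a few lines of standard-library Python. The example assumes missing values are represented as None, and the function names (impute, drop_missing) and the data are hypothetical. Nearest-neighbor imputation is not shown.

```python
from statistics import mean, median, mode


def impute(values, strategy="mean"):
    """Replace None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in values]


def drop_missing(rows):
    """Remove every row that contains at least one missing value."""
    return [row for row in rows if None not in row]


# Hypothetical data: customer ages with one missing entry.
ages = [20, None, 30, 30, 60]

print(impute(ages, "mean"))
print(impute(ages, "median"))
print(impute(ages, "mode"))
print(drop_missing([(20, "A"), (None, "B"), (30, None)]))
```

In this toy data the mean is pulled upward by the single large value, while the median and mode are not. That kind of difference is one reason the choice of strategy depends on the data.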
This is my weekly newsletter that I call The Deep End because I want to go deeper than results you’ll see from searches or AI, incorporating insights from the history of data and data science. Each week I’ll go deep to explain a topic that’s relevant to people who work with technology. I’ll be posting about artificial intelligence, data science, and data ethics.
This newsletter is 100% human written (*aside from a quick run through grammar and spell check).