Course: Data Analytics for Business Professionals

Guidelines for formulating questions

- In a Harvard Business Review article, data scientists from Booz Allen Hamilton described a recent data project in which they wanted to predict fraudulent behavior before it occurred. The data scientists had a particularly rich dataset, with over 400 variables and 10 years of data, which one would think would allow them to predict bad behavior. However, as the data scientists described it, "The fraud had manifested itself in hundreds of different ways, but there was so much of it and the fraudsters moved so quickly that we couldn't keep up with the patterns needed to track it." This is a great example of a question that simply didn't align with what the data analysis could be used to prove. So the data scientists tried a creative alternative: they reframed the analysis by changing the fundamental question they were asking. Instead of looking for the characteristics of bad behavior, they pivoted to look for indications of good behavior. What were the features of honest people who follow the rules? This turned out to be a tractable question, and one that was far easier to predict.

Data analytics is a powerful tool, but to use it appropriately, you must understand the data you're working with and stay organized so that you remain focused on the question you want to answer. In a Gartner research study on why big data projects fail, nearly all participants cited poor organization as the biggest factor. The process starts by asking the right questions. What am I trying to learn? Why does it matter? Imagine starting with an ideal-world analysis: assuming you have access to all the information you need, what data would you want, and why? If there are multiple data sets, which one will best answer your question? Can you merge the data sets together? This exercise accomplishes a few things.
First, if you don't have access to all the data you want, imagining the ideal data set is often an effective way to identify a better or more complete set of data to collect in the future. Second, the ideal world is usually a fairly simple place to start, which helps your data team get oriented with the project, especially if they're not familiar with the topic. Often, you'll find the ideal analysis impossible to perform. In that case, you can begin revising the question to fit your constraints. At times, problems with the original data will require a different analysis altogether. As you analyze potential problems, encourage your data personnel to ask as many questions as they want. This will save you and them headaches down the road, because everyone will know the goal, the audience, the data set to build, and all deadlines. Collaborate with your data team as much as possible, early and often, to make sure everyone is on the same page. If you're asking the wrong question, or a question that isn't specific enough to give you a clear answer, your team will end up disorganized. Data can be overwhelming, and crafting the question carefully sets your team up for success. But these initial decisions, no matter how carefully they're thought out and communicated, will need to be updated as events unfold. Often, more data will come in while you're analyzing, and your question must change to reflect that new data. Sometimes data isn't clean, which can put a time crunch on your project if you don't account for it. Keep in mind that as the analysis evolves, your goals may change as well.
