Making Assumptions - Musings from a Silicon Valley Data Scientist
The following article discusses one of the many things that I have learned as a data scientist. It does not involve any popular new methods or deep learning, and perhaps does not sound as cool. It is something much more primary, but absolutely crucial to any scientist’s career.
Whenever a problem is given to someone and their task is to find the best possible solution, one of the most important aspects is to efficiently search through the possible solutions. However, when presented with a problem, the first step people usually take is to quickly begin searching for a solution. This search happens in a virtual space of possible solutions that is created (delineated) by one’s mind almost automatically. We are able to create this virtual search space very quickly by making many assumptions about the problem.
We have been pushed by evolution to make assumptions in order to survive and learn quickly. Making assumptions is a very useful tool: it allows us to skip many steps in understanding things so that we can incorporate new knowledge at a fast pace. In fact, we have been driven to make assumptions so strongly that doing so is not only automatic but often completely invisible to us.
Making these fast assumptions is not always a bad thing; it has aided us for millennia in learning to build upon others’ knowledge and in adapting quickly to new occurrences in our environment. As long as you realize that you are always making assumptions, you can benefit from them without letting them hurt your work or problem-solving skills.
I would now like to prove to you that you do make assumptions. First, consider this: most of the solutions to puzzles/brainteasers are difficult to find because they are outside of the solution space you create to begin with. This is true because the problems are worded in such a way as to induce a certain automatic assumption by the listener. Look at the problems below and think about them for a bit before you continue reading this article.
Here is the first puzzle. Simply figure out which number is in the spot in which the car is parked.
In this second puzzle, imagine 4 toothpicks on a table forming a cross. You are only allowed to move 1 toothpick once, and your goal is to make a square. You are not allowed to use a toothpick to move the other toothpicks or to break a toothpick.
Both examples above have very simple solutions; however, these solutions are only hard to find because your initial search space does not include them. It is human nature to assume that if a solution is not found, it is because one did not look carefully enough. That is the normal behavior with puzzles - search for a solution, and after not finding it, search some more, again and again, when in reality you need to stop and question the search space and not the search itself. This applies to work in data science and software engineering.
In my academic years, I spent my fair share of time coding and fixing bugs; I also competed in coding competitions where both quality and speed were being tested, so finding bugs at a fast pace was crucial. One thing that I learned was to always be aware of my assumptions. The assumptions allow you to search in the most probable places first and get to your solution more quickly. However, you need to be aware of these assumptions so that when you are not finding your solution you can revisit them.
A behavior that I have seen over and over again among software engineers in industry is making assumptions about their code when looking for a bug and never questioning those assumptions. It goes something like this:
Coder 1 (the one with the bug): “Could you help me fix this bug?”
Coder 2 (friend helping out): “Where do you think the bug is?”
Coder 1: “Somewhere around here.”
Coder 2: “Did you look over here (different place)?”
Coder 1: “No, but I don’t have to. It is impossible for a bug to be there because …”
The explanations for why a bug cannot possibly exist in certain parts of the code are endless. I always repeat myself and say “question your assumptions,” because the bug is usually hiding exactly where people are not looking. If the bug were hiding where the person was already looking, it wouldn’t really be hidden or hard to find in the first place. This is a hard lesson to learn.
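One way to practice this while debugging is to write your assumptions down as explicit checks, so that the “impossible” places get tested instead of argued about. The snippet below is only a minimal sketch: the function `check_assumptions`, the sample `rows`, and the three assumptions themselves are hypothetical and chosen purely for illustration.

```python
def check_assumptions(rows):
    """Return the names of the assumptions that do NOT hold for `rows`.

    `rows` is a hypothetical list of dicts with "id" and "amount" keys.
    """
    # Each assumption is named and paired with a check, so nothing stays invisible.
    assumptions = {
        "rows is non-empty": lambda r: len(r) > 0,
        "ids are unique": lambda r: len({row["id"] for row in r}) == len(r),
        "amounts are non-negative": lambda r: all(row["amount"] >= 0 for row in r),
    }
    return [name for name, holds in assumptions.items() if not holds(rows)]


# Illustrative data only: the duplicate id is the "impossible" case.
rows = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": 5.0},
    {"id": 2, "amount": 5.0},
]

print(check_assumptions(rows))  # prints ['ids are unique']
```

The point is not these particular checks but the habit: once an assumption has a name and a test, revisiting it costs one function call instead of a debate.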
Part of the reason this lesson is hard is that making no assumptions may make you look like you do not know what you are doing. By checking parts of the code where a problem is supposedly “impossible,” for whatever reason, you might seem foolish. Make no mistake: as a species we have shown time and again that the only foolish mistake is to think we know more than we actually do. There is no shame in being open to learning new things; there is shame, however, in refusing to learn because of posturing, pride, and wanting to appear smart.
To conclude this article, I would like to tell you a story about a true scientist with no fear of questioning assumptions: Dr. Ignaz Semmelweis, who is credited with discovering the importance of hand-washing before medical procedures.
First, he made use of statistics and noticed that women giving birth in one maternity clinic of the Vienna General Hospital had a significantly higher mortality rate than those in the hospital’s other clinic. What followed was a truly methodical and relentless pursuit of the reason why. He proceeded to test everything that differed between the two clinics, including the ringing of bells. When I tell this story, people think that testing bell ringing was silly, that he must have been superstitious, and that a true scientist would have known that bell ringing could not possibly affect mortality rates. Does this sound familiar? This is another example of assumptions being reinforced by arrogance and fear of ridicule, which is unfortunately a very common behavior. Dr. Semmelweis was a TRUE scientist; he was not going to let embarrassment or ridicule get in the way of science. He proceeded beautifully and methodically and held “no assumptions,” eventually finding the true answer to the problem at hand.
It is hard to follow the above story with any last few words that drive the point home more poignantly. Remember, first put some thought into the problem definition and use assumptions to help you reduce the search space. Second, remember to revisit this step if you can’t find the solution you are looking for. You might have to question some of the assumptions you made, or try to think of assumptions you might have made inadvertently, which are much trickier to spot. At the end of the day, you should view assumptions as a useful tool in solving problems, one that must be used with consideration and care.
Good luck in your pursuit of true (data) science. If you are interested, I will be posting the answers to the puzzles above on my LinkedIn profile, https://www.dhirubhai.net/in/pedroalvesds, together with a series of other “Musings from a Silicon Valley Data Scientist.”
Cheers,
Pedro Alves - Founder and CEO @ Ople.ai
Bringing the true power of AI to the world.
Question everything!
7 years ago: I agree with you wholeheartedly. People who work strictly with data analysis and who are not asked or required to provide interpretations should never take their experience or beliefs into account. A data scientist may be asked for an interpretation, but for every positive they should show a negative. All statistics can say one thing or another based on how much data is taken into account and what factors are used for the selected data. I think questioning the person requesting the data is one of the biggest responsibilities of the data scientist.
Project, Product, Operations, Satisfaction & not always in that order
7 years ago: Nice one.
Senior Data Scientist
7 years ago: Excellent article, Pedro, thanks!