
Making Assumptions - Musings from a Silicon Valley Data Scientist

Pedro Alves

A.I. + Bitcoin + Hedge Fund = financial success

Published: May 19, 2017

The following article discusses one of the many things that I have learned as a data scientist. It does not involve any popular new methods or deep learning, and perhaps does not sound as cool. It is something much more primary, but absolutely crucial to any scientist’s career.

Whenever someone is given a problem and asked to find the best possible solution, one of the most important aspects is searching through the possible solutions efficiently. Yet the first step people usually take when presented with a problem is to begin searching for a solution right away. This search happens in a virtual space of possible solutions that one’s mind creates (delineates) almost automatically. We are able to create this virtual search space very quickly by making many assumptions about the problem.

We have been pushed by evolution to make assumptions in order to survive and learn quickly. Making assumptions is a very useful tool: it allows us to skip many steps in understanding things so that we can incorporate new knowledge at a fast pace. In fact, we have been driven to make assumptions so strongly that doing so is not only automatic, but many times completely invisible to us.

Making these fast assumptions is not always a bad thing, as it has aided us for millennia in quickly learning to build upon others’ knowledge and in adapting quickly to new occurrences in our environment. As long as you realize that you are always making assumptions, you can benefit from them without letting them hurt your work or problem-solving skills.

I would now like to prove to you that you do make assumptions. First, consider this: most of the solutions to puzzles/brainteasers are difficult to find because they are outside of the solution space you create to begin with. This is true because the problems are worded in such a way as to induce a certain automatic assumption by the listener. Look at the problems below and think about them for a bit before you continue reading this article.

Here is the first puzzle. Simply figure out which number is in the spot in which the car is parked.

In this second puzzle, imagine 4 toothpicks on a table forming a cross. You are only allowed to move 1 toothpick once, and your goal is to make a square. You are not allowed to use a toothpick to move the other toothpicks or to break a toothpick.

Both examples above have very simple solutions; however, these solutions are only hard to find because your initial search space does not include them. It is human nature to assume that if a solution is not found, it is because one did not look carefully enough. That is the normal behavior with puzzles - search for a solution, and after not finding it, search some more, again and again, when in reality you need to stop and question the search space and not the search itself. This applies to work in data science and software engineering.

In my academic years, I spent my fair share of time coding and fixing bugs; I also competed in coding competitions where both quality and speed were being tested, so finding bugs at a fast pace was crucial. One thing that I learned was to always be aware of my assumptions. The assumptions allow you to search in the most probable places first and get to your solution more quickly. However, you need to be aware of these assumptions so that when you are not finding your solution you can revisit them.

A behavior that I have seen over and over again in industry is software engineers making assumptions about their code when looking for a bug and never questioning those assumptions. It goes something like this:

Coder 1 (the one with the bug) “Could you help me fix this bug?”

Coder 2 (friend helping out) “Where do you think the bug is?”

Coder 1 “Somewhere around here.”

Coder 2 “Did you look over here (different place)?”

Coder 1 “No, but I don’t have to. It is impossible for a bug to be there because …”

The explanations are endless for why it is not possible for a bug to exist in certain locations of the code. I always repeat myself and say “question your assumptions,” because the bug is usually hiding exactly where people are not looking. If the bug were hiding where the person was already looking, it wouldn’t really be hidden or hard to find in the first place. This is a hard lesson to learn.
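One way to act on "question your assumptions" while debugging is to write each assumption down as an executable check instead of arguing about why it must hold. The snippet below is a minimal sketch of that idea, not a prescribed method: the helper `failed_assumptions`, the sample `records`, and the three checks are all hypothetical names invented for illustration.

```python
from typing import Callable, List, Tuple

# An assumption is a human-readable description plus a check that tests it.
# (This type alias and everything below are hypothetical illustrations.)
Assumption = Tuple[str, Callable[[], bool]]


def failed_assumptions(assumptions: List[Assumption]) -> List[str]:
    """Return the descriptions of all assumptions whose check does not hold."""
    failed = []
    for description, check in assumptions:
        try:
            holds = check()
        except Exception:
            # A check that crashes is treated as a broken assumption.
            holds = False
        if not holds:
            failed.append(description)
    return failed


# Hypothetical data that the "impossible to be wrong" code receives.
records = [{"id": 1, "price": 9.5}, {"id": 2, "price": None}]

assumptions = [
    ("input is not empty", lambda: len(records) > 0),
    ("every record has a price",
     lambda: all(r["price"] is not None for r in records)),
    ("ids are unique",
     lambda: len({r["id"] for r in records}) == len(records)),
]

print(failed_assumptions(assumptions))  # ['every record has a price']
```

Listing the checks this way makes the invisible assumptions visible, so the places that are "impossible" to contain the bug get tested rather than defended.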

Part of the difficulty is that making no assumptions may make you look like you do not know what you are doing. By checking parts of the code where a problem is "impossible," for whatever reason, you might seem foolish. Make no mistake: as a species we have shown time and again that the only foolish mistake is to think we know more than we actually do. There is no shame in embracing the openness to learning new things; there is shame, however, in refusing to learn because of posturing, pride and wanting to appear smart.

To conclude this article, I would like to tell you a story about a true scientist with no fear of questioning assumptions, Dr. Ignaz Semmelweis, who is credited with the discovery of the importance of hand-washing before medical procedures.

First, he used statistics and noticed that women giving birth in one maternity clinic had a significantly higher mortality rate than women giving birth in another. What followed was a truly methodical and relentless pursuit of the reason why. He proceeded to test everything that was different between the two places, including the ringing of bells. When I tell this story, people think that it was silly to test bell ringing, that he must have been superstitious, and that a true scientist would have known that bell ringing could not possibly affect mortality rates. Does this sound familiar? This is another example of assumptions being reinforced by arrogance and fear of ridicule, which is unfortunately a very common behavior. Dr. Semmelweis was a TRUE scientist; he was not going to let embarrassment or ridicule get in the way of science. He proceeded beautifully and methodically and held “no assumptions,” eventually finding the true answer to the problem at hand.

It is hard to follow the above story with any last few words that drive the point home more poignantly. Remember, first put some thought into the problem definition and use assumptions to help you reduce the search space. Second, remember to revisit this step if you can’t find the solution you are looking for. You might have to question some of the assumptions you made, or try to think of assumptions you might have made inadvertently, which are much trickier to spot. At the end of the day, you should view assumptions as a useful tool in solving problems, one that must be used with consideration and care.
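The two steps above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the function `search_with_assumptions` and its parameters are hypothetical, and a real search space is rarely a small list that can be enumerated. The sketch searches the candidates that satisfy an assumption first, and only when that fails does it go back and search the candidates the assumption excluded.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def search_with_assumptions(
    candidates: Iterable[T],
    is_solution: Callable[[T], bool],
    assumption: Callable[[T], bool],
) -> Optional[T]:
    """Search the assumed space first, then the space the assumption ruled out."""
    pool = list(candidates)

    # Step 1: use the assumption to reduce the search space.
    for candidate in pool:
        if assumption(candidate) and is_solution(candidate):
            return candidate

    # Step 2: no solution found, so question the assumption and
    # search the candidates it excluded.
    for candidate in pool:
        if not assumption(candidate) and is_solution(candidate):
            return candidate

    return None


# Hypothetical puzzle: find x with x**3 == -27 while assuming x is non-negative.
answer = search_with_assumptions(
    range(-10, 11),
    is_solution=lambda x: x ** 3 == -27,
    assumption=lambda x: x >= 0,
)
print(answer)  # -3, found only after the assumption was dropped
```

The assumption still pays off whenever it is right, because the smaller space is searched first; the second pass is what keeps a wrong assumption from hiding the answer.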

Good luck in your pursuit of true (data) science. If you are interested, I will be posting the answers to the puzzles above on my LinkedIn profile https://www.dhirubhai.net/in/pedroalvesds together with a series of other “Musings from a Silicon Valley data scientist.”

Cheers,

Pedro Alves - Founder and CEO @ Ople.ai

Bringing the true power of AI to the world.

Heidi Huber

Question everything!

7 years ago

I agree with you wholeheartedly. People who work strictly with data analysis and who are not asked or required to provide interpretations should never take their experience or beliefs into account. A data scientist may be asked for an interpretation, but for every positive they should show a negative. All statistics can say one thing or another based on how much data is taken into account and what factors are used for the selected data. I think questioning the person requesting the data is one of the biggest responsibilities of the data scientist.

Nicholas Kostikis (co-stick-keys)

Project, Product, Operations, Satisfaction & not always in that order

7 years ago

Nice one.

Nicolas Martin

Senior Data Scientist

7 years ago

Excellent article Pedro, thanks!
