
Where It All Started: How I Became a Data Scientist—(1) Follow the Data

Michael Wu PhD

Chief AI Strategist at PROS / Lecturer / Behavior Economist / Neuroscientist

Published: June 23, 2017

One of the most frequently asked questions I get is “how did you end up as a social media/gamification data scientist from your biophysics PhD background?”

Looking back, I have answered this question (in one form or another) more than 100 times: in journalist and blogger interviews, in keynote Q&As, and in casual conversations with colleagues and acquaintances. Although that sample is biased, it's not hard to conclude that many people are probably still interested in the answer. So even though I don't like to talk about myself, I'll answer this question once more today.

From Particle Physics to the Machines that Created It

I’m an academic at heart. As an undergraduate at UC Berkeley (UCB), I started as a physics major because I was interested in particle physics. It’s an area of physics where you smash things in particle accelerators to reveal their fundamental constituents, in order to understand what makes up the universe (e.g. matter and antimatter). At the time, several particle accelerators (e.g. the Large Electron-Positron Collider at CERN and the Tevatron at Fermilab) were running experiments that seemed to generate an unlimited supply of new data waiting to be analyzed, waiting to tell the story of how everything came to be. It certainly sounded exciting. But to fully comprehend and appreciate the deep mathematics used in particle physics, I had to take so many advanced math classes that I needed to declare a second major in Applied Mathematics.

Then life took a sharp turn. My dream job in particle physics was shattered when construction of what would have been the world’s largest particle accelerator, the Superconducting Super Collider (SSC), was cancelled in 1993 after Congress cut its funding. Meanwhile, through my math and physics studies, I had become interested in an area of mathematics called complex systems (also known as, or closely related to, nonlinear dynamics and chaos theory, among other names). One reason I was so fascinated by complex systems is that they seem to be everywhere. Chaos and unpredictability show up again and again across disciplines: math, physics, chemistry, biology, engineering, computer science, economics, sociology, psychology, and many more. The common theme is finding the order within the seemingly chaotic behavior of these systems.

This opened my mind to one of the most complex systems known to mankind: the human brain. UCB had no neuroscience major, but I got so interested in the subject that I declared a third major, Molecular and Cell Biology (MCB), with an emphasis in neurobiology, to learn the basics of neuroscience. By the time I completed the required coursework for all three majors, I was virtually kicked out of school for having been at UCB too long!

Reverse Engineering the Brain

After receiving my undergraduate degree, I was readmitted to UCB, this time into the biophysics graduate program. During my PhD, I was drawn to Prof. Jack Gallant’s Visual Neuroscience Lab because the lab had pioneered a method for collecting large amounts of data from its experiments to study how our brains process visual information. As the resident math/stats geek, I focused on developing new algorithms and techniques to analyze the data collected from those experiments and make sense of what the brain was doing. If you are curious, here are a few publications with detailed descriptions of the algorithms I developed:

Computational methods for functional characterization of visual neurons (my PhD dissertation)
Complete functional characterization of sensory neurons by system identification. Annu. Rev. Neurosci. 29 (2006): 477-505
Nonlinear V1 responses to natural scenes revealed by neural network analysis. Neural Networks 17.5 (2004): 663-679
The Berkeley wavelet transform: a biologically inspired orthogonal wavelet transform. Neural Computation 20.6 (2008): 1537-1564

Prof. Gallant’s experiments consisted of measuring a subject’s brain activity while he watched a movie under controlled conditions. The movie is the input to a complex system (the brain), and the measured brain activity is its output. With both the input and the output in hand, what’s left is to use statistics, machine learning, and other mathematical techniques to figure out the brain’s functional representation, that is, the mapping between input and output. The video here should give you a good idea of what we did.

Still confused? If you have seen The Imitation Game, this is essentially what Alan Turing had to do to crack the Enigma machine. Except we were reverse engineering a much more complex machine, the brain, and we had more powerful computer clusters and more sophisticated algorithms to help us. Our validation that we had modeled the brain’s visual processing well enough was that we could predict what the subject saw, with a fairly high degree of accuracy, just from his measured brain activity. Pretty cool, huh!
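The input/output idea above is easier to see in a small example. What follows is a minimal Python sketch of system identification in its simplest form. It is a toy under stated assumptions and not the lab's method; the publications listed above describe the actual, far more sophisticated models. The "brain" is replaced by a hypothetical linear black box (`black_box`, with hidden filters `TRUE_W`) plus a little noise. The mapping is estimated by ridge regression (`fit_ridge`). The model is then validated the way described above: it identifies which of several candidate stimuli was shown using only the measured outputs. All names, sizes, and noise levels are illustrative choices.

```python
import random

# Toy illustration only: assumes a purely linear, simulated system.
# This is NOT the method from the publications above.


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_ridge(X, y, lam=1e-3):
    """Weights w minimizing ||X w - y||^2 + lam * ||w||^2, via the normal equations."""
    d = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(d)]
    return solve(XtX, Xty)


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


# Hypothetical "black box": 4 output channels, each a hidden linear filter over
# 3 stimulus features, plus a little measurement noise.
rng = random.Random(0)
TRUE_W = [[0.5, -1.0, 2.0],
          [1.5, 0.0, -0.5],
          [-2.0, 1.0, 0.5],
          [0.0, 2.0, 1.0]]


def black_box(x):
    return [dot(w, x) + rng.gauss(0.0, 0.02) for w in TRUE_W]


def random_stimulus():
    return [rng.gauss(0.0, 1.0) for _ in range(3)]


# 1. Fit: record input/output pairs and estimate one filter per output channel.
train_X = [random_stimulus() for _ in range(200)]
train_Y = [black_box(x) for x in train_X]
W_hat = [fit_ridge(train_X, [y[k] for y in train_Y]) for k in range(len(TRUE_W))]

# 2. Validate: "show" one of several new candidate stimuli, then identify which
#    one it was from the measured outputs alone, by picking the candidate whose
#    predicted outputs are closest (in squared error) to the measurement.
candidates = [random_stimulus() for _ in range(10)]
shown = 3
measured = black_box(candidates[shown])


def mismatch(x):
    return sum((dot(w, x) - m) ** 2 for w, m in zip(W_hat, measured))


guess = min(range(len(candidates)), key=lambda i: mismatch(candidates[i]))
print("shown:", shown, "identified:", guess)
```

A linear model is used here only because it keeps the sketch short and dependency-free. The publications above deal specifically with nonlinear responses, which a model like this would not capture.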

A Whole New World of Social Media Analytics

As I was wrapping up my dissertation, I went through the usual academic job search with four universities: Columbia, Cornell, Carnegie Mellon, and NYU. Although I was still very much an academic at heart, I was frankly a bit frustrated with, and tired of, the publication process in academia. I also found the cutthroat politics of peer review quite distasteful, and it tarnished my idealistic impression of academia’s purity. So, even though I already had three offers, I did a little exploring beyond academia, in government labs and industry.

I didn’t expect much from this exploration. If anything, I was looking for confirmation that government and industry were even less suited to my idealistic expectations. I found what I was looking for. I hated the slow-moving, risk-averse nature of large government labs, and I disliked industry’s purely monetary motives and its constant tradeoff between quality and time-to-market. Although some large enterprises give employees the title “Scientist,” I learned from talking to those scientists that they weren’t doing fundamental research. Few scientists in industry are able to define a completely new scientific inquiry on their own and pursue it their own way.

I was pretty convinced that I should just go back to academia. That’s when an old high-school friend, Lyle Fong, then CEO and co-founder of Lithium, introduced me to his startup. Lithium had collected a boatload of user behavior data on its community platform. The platform offered plenty of descriptive analytics (i.e. summary reports of the data), but not much in the way of predictive or prescriptive analytics.

Meanwhile, I was told that customers had asked for a number of things involving more advanced analytics, and I heard about several interesting problems, use cases, and challenges in social analytics. Truth be told… they could’ve told me anything. Since social media was completely new to me, I had no way to tell whether any of it was true. What was true, however, was that Lithium had no plans to build an analytics product at that time. So I could pick any problem in the social/community space that interested me and solve it any way I liked. I was very fortunate that Lithium let me be the kind of scientist I’m proud to be. The freedom to play with this huge and rich, yet unfamiliar, dataset is what got me to join Lithium. As they say, the rest is history.

Conclusion

This is part 1 of the story of how I became a data scientist. If you are a perceptive reader, you have probably noticed a pattern running through all the twists and turns of my educational and professional pursuits.

To be a data scientist (at least a good one), you need to follow the data. Wherever there’s an abundance of data, that’s where you need to go.

I have no proof that this observation is universally true, so it’s up to you whether to believe it. But I followed the data: from the massive datasets generated by particle accelerators, to the neural data from visual response experiments, and finally to user behavior data on social media. These data couldn’t be more different, but they have one thing in common: they are big. You may call it big data today, but it’s not new. Many scientists were working with big data before the term was even coined.

Although I’ve just sped through almost 15 years of my life, my journey to becoming a data scientist has just begun. And like the data I analyze, life itself is complex and interesting, because it rarely unfolds as you plan it.

*Twitter: @mich8elwu, Youtube: my channel.
