How to break into Data Science the easy way
Scratch that; there’s not an easy way.
Data science has become a hot topic over the past few years alongside machine learning. The rise of machine learning has made data king, and as a result, there is a huge demand for data scientists. Becoming a data scientist through formal education is a product of the times, and getting into the industry in the modern era requires a bit of work.
Traditional Route
I am a data scientist, but I do not have a degree that says Data Science. I became a data scientist as a result of working with a lot of data. I have a Ph.D. from the prehistoric days when one had to read academic papers to implement machine learning. Traditionally, if you wanted to become a data scientist, you became a scientist first. Then you dealt with so much data that you became a data scientist just to be able to analyze it all. You started by majoring in STEM in undergrad, and then you went to a graduate program. The program wasn’t focused on data science, but as a result of your research, you handled a lot of data.
After years of working with data in graduate school and then a job or two in your field, you had built up the acumen of a data scientist. You had learned techniques for analyzing data and for being confident in your results. This experience came from working on academic papers and then applying those skills in industry.
Modern Era
Now the industry is hot! It seems like everyone wants in, and the prevalence of nano-degrees has given many newcomers the impression that there are shortcuts to becoming a data scientist.
There is no easier, softer way to become a data scientist. One issue with trying to shortcut the process is that you don’t learn how to look at data. Even though the concepts can be taught in short order, you need to look through lots of data, design data collections, collect data, clean data, train on data, analyze data, do failure analysis, and repeat. Graduate school is a great vehicle for this process because you have to produce something better than what’s currently out there.
No Shortcuts
Toy datasets are easily available these days, and even machine learning algorithms can be had essentially off the shelf. This gives people the sense that it is easy to take the training wheels off and apply these tools to new datasets.
Therein lies the problem: with a nano-degree, you may feel like you have accomplished something great and learned a great deal, but you’ve only been introduced to the field. A nano-degree in data science is great for someone already steeped in data through graduate school or their day job.
Many people also get master’s degrees in data science, and while being a generalist is great, I still prefer someone who has depth in at least one field. If you want to break into data science, consider getting a degree in an area that’s interesting to you and uses a lot of data. You can learn a lot of the data science stuff on the side or as part of the journey.
The best data scientists are the ones who love data throughout their lives. I know that sounds like saying some people have a natural inclination towards it; that has been my experience, even though it is not everyone’s. I have found ways to use data to improve my life, such as budgeting, buying a car, determining when to leave a company, making espresso, and assessing the impact of articles I write. This is very natural to me and doesn’t feel like work.
Something to Consider
Until the past 2 or 3 years, the majority of data scientists had master’s degrees or Ph.D.s. They (we) can see the difference between people with skin-deep expertise and people with a great depth of field.
Even a newly graduated Ph.D. will not be called senior for a few years. So if you come out of a master’s or a bachelor’s, do a nano-degree, and within a year or two have the title of Senior Data Scientist, I’m skeptical. Even though you may want to think you have the same depth of field as other senior members in the field, most likely, you don’t. That’s okay; just be realistic about your skill level.
There’s a reason my hiring search for a data scientist in 2018 failed: I couldn’t find a good one. One could argue I passed over good candidates, but hiring is usually by committee consensus. Everyone on my interview panel has a master’s degree, and half have a Ph.D. They want to work with reliable people whose skills they trust, so they err on the side of saying no. Out of 100 applicants, 40 phone interviews, and 6 in-person interviews, I ended up hiring nobody.
In graduate school, my advisor told me that they have to be careful about who they graduate with a Ph.D., because a new Ph.D. could be graduating other Ph.D.s within a few years. The cycle is short, and unqualified candidates will dilute the field. The same is true in data science: as less qualified people enter the field, they will gladly let more people of similar caliber in.
In Closing
Part of the difficulty of breaking into Data Science is simply the years of grinding it takes before people trust you to do the job of a data scientist. There’s no free lunch and no shortcuts, so work hard on some interesting problems, ingest all that data, and one day, you will form a cocoon and pop out as a data science butterfly.
---------------------
If you like, follow me on Twitter and YouTube, where I post videos of espresso shots on different machines and other espresso-related stuff. You can also find me on LinkedIn.
Professional Services industry at Gartner | ex-Deloitte, IBM, Intel
5 years ago
Nice graphics to supplement the article! I was wondering why, in the flow chart, the “enough” decision point went back to clean data instead of collect more data... the latter seemed to make more sense, but I’m just curious. Another curiosity I had was that the article didn’t seem to focus much on pragmatism and applicability. I’ve met plenty of “data scientists” in my own practice, and especially while consulting, who offered near-zero value to clients. They got stuck on which neural net or SVM they used and couldn’t persuade a business person of the validity or usefulness of the model. But as you said, deep knowledge in the domain you practice is key, because programming, stats programs, and ML algos are pretty easy to pick up in comparison. Also, I would love to learn how your espresso improved as a result of your experiments!
Data science | Ex-LinkedIn
5 years ago
Whatever the easy way is, I’m pretty sure my 10-year PhD+postdoc wasn’t it.