Advice for Aspiring Data Scientists
Kate Strachnyi
Data & AI Content Creator at DATAcated (Influencer Marketing) | Tech Industry Analyst | DATAcated Speaker Hub
Data scientists are in high demand. For those thinking about getting into this field, here is some advice.
Don’t try to learn it all at once
Take your time. Don’t try to learn multiple languages at once (e.g., both Python and R), and don’t start with something complex like machine learning. Make sure you learn the basics before you jump into something more difficult.
This was a lesson that Vaibhav Gedigeri from Honeywell had to learn early on. “That’s the first mistake I made: I wanted to put my hands on everything. I was like a small kid; you know, you give them a lot of toys and they are like, ‘I want that toy.’ That’s exactly what I was doing. I wanted to learn Python, I wanted to learn SAS, I wanted to learn big data, Spark, etc., and it got me nowhere. Then I decided I have to focus on one or two tools; since then I decided I will only work on open source tools. I gave up on SAS and focused pretty much on Python and R, and then progressed to Spark.”
Vaibhav Gedigeri - Data Scientist at Honeywell
I can personally relate to this. When I started learning R, the first project I gave myself was to pull down some Twitter data using R. I ran into an issue that was preventing me from succeeding, and I talked to a data science colleague of mine about it. She mentioned that she had done this with Python before and had a script that could help me. When I asked for it, she told me to stick with R, as Python would have its own issues and I was better off learning one language at a time. I’m very glad I listened to her, because within an hour of our conversation I was able to address the issue and complete my project successfully. Side note: the issue I was running into was actually due to the fact that my Twitter App wasn’t working properly; it wasn’t even an issue with R!
Learn the underlying logic
It is important to understand what takes place under the hood of the algorithms; this can help you avoid erroneous output. You will be able to address issues as they come up and actually understand what you are trying to do.
“To answer the business problem is more important than the tool knowledge. It is important that you know the tools and how to use them, but only focusing on the tools, only focusing on how to start programming in Python or other tools, is not as important. I mean, that is a very simple part. What is important is to understand how to reduce the problem, how do you know which variables to choose, how do you know which algorithm to apply, when to use machine learning and when not to use it: this is the real value of a data scientist.
“It is still important to start off with fundamentals. I started with two years of reading about mathematics and statistics, and it actually helped me to understand the mechanism of how algorithms work. When I know what’s under the hood and how it works, it makes me feel more comfortable.”
Vaibhav Gedigeri - Data Scientist at Honeywell
Learn by doing projects
There are several places where you can find data sets and projects to work on; Kaggle is one of them.
Study job descriptions of the types of roles you would want to have and get an idea of what is required. This way you will be able to know what skills are needed and what experiences are preferred. You can basically work backwards to engineer your own skills based on what is in demand.
When you work on projects or Kaggle competitions, document the steps you took and your approach, as well as the lessons learned, to build up a project portfolio that you can show to recruiters and hiring managers. It is also a great way to keep track of what you are learning and to reuse it in future projects.
Andrew Paul Acosta, from Milesius Capital Resources, has the following advice: “Number one, you need to practice. So, find a project to work on. You need to understand how to solve data problems; and then once you figure that out, move on to another one. The second thing is to keep updated on current technologies, languages and so on. You don’t have to know everything about R or Python, but know enough about the libraries available and the uses of the tools, and have an opinion. Lastly, you need to have a natural curiosity; don’t be satisfied with the answers that you find, continue to ask questions and be curious.”
Andrew Paul Acosta - Data Scientist at Milesius Capital Resources, LLC
For additional advice from data scientists, you can read Journey to Data Scientist (link to book on Amazon).
Senior NLP Engineer
7 years ago: Learn the holy trinity of math - linear algebra, regression, and probability. Also, don't call yourself a "Data Scientist". The term went out of fashion years ago.
NLP Researcher | Author
7 years ago: Stop using the word "aspiring", sounds really very dumb.