Advice on Starting to Learn Data Science
"Metadata record of the person of Johann Wolfgang Goethe in exchange format MARC 21" by Deutsche National Bibliothek, licensed under CC BY 4.0

Over the last few months, people have asked me how to start learning #datascience given the explosion of news around #artificialintelligence. I have sent the same ideas out a few times now and thought: why not share them with everyone? It is not a secret, and I am happy to help. NB: Given the recent news that Python will soon run within Excel, this is all the more timely.

Many new learners, to me, start in the wrong place. People hear that Neural Networks (NNs) are the backbone of Generative AI, so they try to learn how a Neural Network works but end up ‘rage quitting’ when trying to understand linear algebra, calculus, and other math concepts. Not a perfect analogy, but this is akin to learning how an internal combustion engine works when wanting to learn to drive a car – it is not the right approach.

IMHO, the best way to learn data science is to learn firsthand. Start by finding a problem, then learn how and when to use the available tools to solve it. By applying the concepts to a digestible problem, you will not only understand them better, you will also hold your attention, stay focused, and learn the real-world applicability of the tools.

Ok, Mike, where can I find good problems to solve? A great resource is Kaggle, a site where people compete to solve data problems. One of the simpler use cases on Kaggle is predicting who survived or perished in the Titanic disaster. To me it is a good starting point: it has only a few variables, the data is easy to understand, and it does not require AI to get to a top answer (simpler data science tools will work here).

There are plenty of courses, materials, and more available on this topic. I like this series on YouTube (which the author has updated) for explaining some of the basic data science principles needed to analyze the Titanic data: separating the data into train and test sets, cleaning and preparing data for learning, various approaches to building models, etc. Again, there are others, but you can start with this video on YouTube (with the caveat that the test file has changed a bit since the recording, so you’ll need to manipulate it a bit – a real-world problem for sure), then move on to concepts that are a bit more complex, such as Cross Validation. (These videos use the R programming language, which I personally prefer over Python for working with data and statistics, but both are good languages for this stuff. I am positive you can find comparable videos in Python.)
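
To make those ideas concrete, here is a minimal sketch in plain Python (standard library only). Everything in it is an assumption for illustration: the rows are made up rather than taken from the real Titanic file, the helper names are hypothetical, and the "model" is just a lookup of the most common outcome per group. In practice you would reach for a data science library in R or Python instead, but the flow is the same: split, clean, fit, score, then cross-validate.

```python
import random
from statistics import median

# Made-up passenger rows in the spirit of the Titanic data (NOT the real file).
# Each row: (sex, age or None when missing, passenger class, survived 0/1).
ROWS = [
    ("female", 29.0, 1, 1), ("male", 35.0, 3, 0), ("female", None, 3, 1),
    ("male", 54.0, 1, 0), ("female", 4.0, 2, 1), ("male", None, 3, 0),
    ("male", 22.0, 2, 0), ("female", 38.0, 1, 1), ("male", 2.0, 3, 1),
    ("female", 27.0, 3, 0), ("male", 40.0, 1, 1), ("female", 58.0, 2, 1),
]


def split_train_test(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and hold out a fraction as the test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fill_missing_age(rows, fill_value):
    """Cleaning step: replace missing ages with a chosen fill value."""
    return [(sex, fill_value if age is None else age, pclass, survived)
            for sex, age, pclass, survived in rows]


def features(row):
    """Reduce a cleaned row to two simple features: sex and 'is a child'."""
    sex, age, _pclass, _survived = row
    return (sex, age < 16)


def fit_rule(train):
    """'Train' a tiny model: for each feature group, remember the majority outcome."""
    counts = {}
    for row in train:
        counts.setdefault(features(row), [0, 0])[row[3]] += 1
    return {key: int(c[1] > c[0]) for key, c in counts.items()}


def accuracy(model, rows, default=0):
    """Share of rows predicted correctly; unseen groups fall back to `default`."""
    hits = sum(model.get(features(row), default) == row[3] for row in rows)
    return hits / len(rows)


def cross_validate(rows, k=4):
    """k-fold cross validation: each fold is held out once, then scores are averaged."""
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i, held_out in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        scores.append(accuracy(fit_rule(train), held_out))
    return sum(scores) / k


train, test = split_train_test(ROWS)
# The fill value comes from the training rows only, so the test set stays unseen.
fill = median(age for _, age, _, _ in train if age is not None)
train, test = fill_missing_age(train, fill), fill_missing_age(test, fill)

model = fit_rule(train)
test_accuracy = accuracy(model, test)
# Simplification: one fill value is reused across folds instead of one per fold.
cv_accuracy = cross_validate(fill_missing_age(ROWS, fill))
print("held-out accuracy:", test_accuracy)
print("4-fold CV accuracy:", cv_accuracy)
```

With only twelve made-up rows the numbers themselves mean nothing; the point is the habit of scoring a model on data it did not see while it was being built.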

This is it; this is how I recommend starting: play around with the Titanic dataset and learn the tools and concepts. Do not cheat by looking up who survived – the goal is not to get 100% accuracy. It is not possible to get 100% accuracy from this dataset alone (which is another good real-world lesson to understand).
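
One way to see why a perfect score can be out of reach: if two passengers look identical in the features you feed the model but had different outcomes, no model can get both right. The sketch below is my own illustration of that idea, not something from the videos. The rows are made up (not real Titanic data) and `accuracy_ceiling` is a hypothetical helper name.

```python
from collections import Counter, defaultdict

# Made-up rows (NOT real Titanic data): (sex, passenger class, survived 0/1).
ROWS = [
    ("female", 1, 1), ("female", 1, 1), ("female", 1, 0),
    ("male", 3, 0), ("male", 3, 0), ("male", 3, 1), ("male", 3, 0),
    ("female", 3, 1), ("female", 3, 0),
    ("male", 1, 1),
]


def accuracy_ceiling(rows):
    """Best accuracy any model could reach using only the listed features."""
    outcomes = defaultdict(Counter)
    for *feature_values, survived in rows:
        outcomes[tuple(feature_values)][survived] += 1
    # Within each group of identical rows, the best a model can do is
    # predict the majority outcome; the minority rows are always missed.
    best_hits = sum(max(counter.values()) for counter in outcomes.values())
    return best_hits / len(rows)


ceiling = accuracy_ceiling(ROWS)
print(f"accuracy ceiling: {ceiling:.0%}")  # prints: accuracy ceiling: 70%
```

Adding more informative features can raise that ceiling, but a score that looks too good usually means the answers leaked in somewhere.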

Once you feel comfortable with this analysis, look at Kaggle for other real-world “competitions.” Learn other use cases for data, other techniques, etc. At some point you will want to move from a Random Forest or Gradient Boosting to a real Neural Network. Sure, NNs are cool, but they are overkill for many problems; sometimes a “simple” multivariable regression works fine. As an aside, with data science I recommend you follow the adage “don’t use a cannon to kill a mosquito.”

I am happy to answer questions, give pointers, etc., but there are so many resources out there to aid you. Find some that work for you! And good luck; I hope you find this a worthwhile effort.

A note on Generative AI: once you understand the basics… you MIGHT want to look at the transformer models behind GPT-3/4 (which are based on Neural Networks). You can watch some videos on how they work too, but that is really 1) just an extension of Neural Networks, and 2) more in the Natural Language Processing space than in problem solving / data science. It is apples and oranges.
