Data Science: Essentials

This is for those who are new to data science or interested in getting involved. These are the attributes I consider key to functioning as a data scientist.

Fundamental knowledge

  • An engineering or science field of study gives a great leg up.
  • Data science can be applied to any kind of data in any field, but having a primary field of study before data science helps you decide how to look at data, determine whether you have the right data, and figure out how to get the right data. Think of it like partial fractions: if you don't know how to solve partial fractions, you can't solve differential equations using Fourier Transforms.
  • Do you understand statistics? Do you know what a t-test is? Two-tailed t-test? Statistical significance? Sample population? Normal distribution?
  • Standard Deviation: Do you understand what standard deviation is, how it is used, and how it is related to a normal distribution?
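To make the statistics questions above concrete, here is a minimal sketch using only Python's standard library. The sample values and the helper names `welch_t` and `fraction_within` are made up for illustration. It computes a two-sample t statistic (Welch's version, which does not assume equal variances) and shows how standard deviation ties into the normal distribution.

```python
import math
import statistics
from statistics import NormalDist


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # statistics.variance is the sample variance (divides by n - 1).
    se_a = statistics.variance(sample_a) / len(sample_a)
    se_b = statistics.variance(sample_b) / len(sample_b)
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(sample_a) - 1) + se_b ** 2 / (len(sample_b) - 1)
    )
    return t, df


def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    standard = NormalDist(mu=0.0, sigma=1.0)
    return standard.cdf(k) - standard.cdf(-k)


# Hypothetical measurements, invented for this example.
control = [5.1, 4.9, 5.0, 5.2, 4.8]
treated = [5.6, 5.4, 5.7, 5.5, 5.3]

t_stat, dof = welch_t(treated, control)
print(f"t = {t_stat:.2f}, degrees of freedom = {dof:.1f}")
print(f"sample standard deviation of control = {statistics.stdev(control):.3f}")

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {fraction_within(k):.4f}")
```

For these made-up samples the statistic comes out to about 5 with roughly 8 degrees of freedom. Turning that into a two-tailed p-value needs the t distribution, which the standard library does not provide; a table or a package such as SciPy (`scipy.stats.ttest_ind` with `equal_var=False`) covers that step.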

Data

  • Do you have data?
  • Design of Experiment (DOE): Do you know how to get more useful data?
  • Do you know how to clean data?
  • Do you know how to check data after data collection?
  • Do you know how to analyze data?
  • Do you know how to present data outside of the curse of knowledge?
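As one small illustration of the cleaning and checking questions above, here is a sketch of a cleaning pass over raw readings. The row layout, the function name `clean_readings`, and the rejection reasons are all assumptions made for this example; real cleaning rules depend on how the data was collected.

```python
import math


def clean_readings(rows):
    """Split (sample_id, raw_value) rows into clean values and rejected rows.

    Rejected rows are kept with a reason so they can be reviewed later
    instead of silently disappearing.
    """
    clean = {}
    rejected = []
    for sample_id, raw_value in rows:
        try:
            value = float(raw_value)
        except (TypeError, ValueError):
            # Empty strings, None, and text such as "n/a" end up here.
            rejected.append((sample_id, raw_value, "not a number"))
            continue
        if math.isnan(value):
            rejected.append((sample_id, raw_value, "missing"))
            continue
        if sample_id in clean:
            rejected.append((sample_id, raw_value, "duplicate id"))
            continue
        clean[sample_id] = value
    return clean, rejected


# Hypothetical raw rows, invented for this example.
rows = [
    ("s1", "5.1"),
    ("s2", ""),
    ("s3", "n/a"),
    ("s1", "5.1"),
    ("s4", " 4.9 "),
    ("s5", None),
    ("s6", "nan"),
]

clean, rejected = clean_readings(rows)
print("clean:", clean)
for sample_id, raw_value, reason in rejected:
    print(f"rejected {sample_id!r} ({raw_value!r}): {reason}")
```

Keeping the rejected rows and their reasons is a deliberate choice here: the count of each reason is a quick check on the collection process itself.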

Analyzing/Presenting Data

Remember: You are bound by the curse of knowledge. You are so deep in the data you're analyzing that it may be difficult to communicate the results concisely to non-experts. Most execs spend less than a few minutes a month looking at your work, so they don't have the depth you do, and you need to make the message clear.

The best plot is one where the conclusion is clear and the audience draws the same conclusion as you without prompting. Here are some good methods you should know when it comes to data presentation:

Tools

  1. Programming Language: MATLAB, Python, or R
  2. Data Inspection: Numbers and/or Excel
  3. Data Presentation: Keynote and/or PowerPoint
  4. Scripting: Bash, Python, Perl, etc.

Advanced Knowledge

AI, machine learning, and computer vision have begun to really affect our lives. I think it is important to understand the fundamentals of these techniques to be an effective data scientist, because most data will be filtered through these methods before analysis.

  1. Neural Networks (NN) and/or Convolutional Neural Networks (CNN)
  2. MapReduce
  3. Natural Language Processing
  4. General computer vision or signal processing
  5. K-means Clustering
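To give a feel for one item on the list, here is a minimal sketch of K-means clustering in plain Python. The points and helper names are invented for illustration, and the random starting centroids are just one simple choice; libraries such as scikit-learn use more careful initialization.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
        for p in points
    ]


def kmeans(points, k, initial_centroids=None, max_iterations=100, seed=0):
    """Alternate between assigning points and moving centroids until stable."""
    if initial_centroids is None:
        centroids = random.Random(seed).sample(points, k)
    else:
        centroids = list(initial_centroids)
    for _ in range(max_iterations):
        labels = assign(points, centroids)
        updated = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(mean_point(members) if members else centroids[i])
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


# Hypothetical 2-D points forming two well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]

# Start from one point in each group so the run is easy to follow by hand.
centroids, labels = kmeans(points, 2, initial_centroids=[points[0], points[3]])
print("centroids:", centroids)
print("labels:", labels)
```

The result depends on where the centroids start, which is why real analyses usually run the algorithm several times from different starting points and keep the best fit.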

In Conclusion

These skills do not develop overnight or over the course of a few weeks or months. Usually, they take years to mature as you're given opportunities to improve them. Good opportunities are challenging problems with no clear or obvious end in sight, and solving them requires hard work, persistent energy, and trudging onwards through the depths of despair.

Paschal Chukwuemeka Amah

Chief Technology Officer @ Wowzi | Technology + Software + Product + Data Engineering Strategy and Management | Fintech | SaaS Platforms| Serverless Microservices | Event-Driven Architecture | Automation Junkie

6 years ago

Nice write up. However, I'm not too sure this will excite newcomers. In short, it sort of made it a bit too technical and so boring and unattractive.

Dudley Elvery

Curious, Always Striving to Grow

6 years ago

Thanks for sharing this! I'm transitioning from a fire service career into data analysis and this confirmed for me where I need to buttress my skills. I have a degree in mathematics, so the statistics and fundamentals are there; I'm quite proficient with MS Office, including Excel and PowerPoint, as well. So I think I need to focus on my programming. I'm leaning towards Python as the language to learn (my prior comp sci is S-Basic, Pascal, and V-Basic). Do you have any insights on what differentiates Python, R, and MATLAB? Do certain industries prefer one over the other or is it more a personal preference from one scientist to the next?

Neha V S

Analytical Lead, Financial Services

6 years ago

Hey Robert McKeon Aloe! Thanks for sharing this, it surely outlines the data science job! What is really essential to me for doing data science is a problem-solving mindset and curiosity about the dataset or behavior and what can be modeled in developing results!

