Data Science: Essentials
This is for those who are new to data science or interested in getting involved. These are the attributes I think are key for me to function as a data scientist.
Fundamental knowledge
- An engineering or science field of study gives a great leg up.
- Data science can be applied to any kind of data in any field, but having a primary field of study before data science helps you decide how to look at data, determine whether you have the right data, and figure out how to get the right data. Think of it like partial fractions: if you don't know how to solve partial fractions, you can't solve differential equations using Fourier transforms.
- Do you understand statistics? Do you know what a t-test is? Two-tailed t-test? Statistical significance? Sample population? Normal distribution?
- Standard Deviation: Do you understand what standard deviation is, how it is used, and how it is related to a normal distribution?
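As a rough illustration of the statistics questions above, here is a minimal Python sketch that uses only the standard library. The sample values and the helper names `welch_t` and `fraction_within` are made up for this example. It computes Welch's two-sample t statistic and shows how standard deviation relates to a normal distribution, as the share of values that fall within 1, 2, and 3 standard deviations of the mean.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical measurements from two groups (made-up numbers for illustration).
group_a = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8]
group_b = [5.6, 5.4, 5.9, 5.5, 5.7, 5.8]


def welch_t(x, y):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    # Sample variance of each group divided by its size (squared standard error).
    vx = statistics.variance(x) / len(x)
    vy = statistics.variance(y) / len(y)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(k) - std_normal.cdf(-k)


t_stat, dof = welch_t(group_a, group_b)
print(f"t = {t_stat:.2f}, degrees of freedom = {dof:.1f}")

for k in (1, 2, 3):
    # Roughly 68%, 95%, and 99.7% of values, respectively.
    print(f"within {k} standard deviation(s): {fraction_within(k):.4f}")
```

Turning the t statistic into a two-tailed p-value requires the t distribution, which the standard library does not provide. A statistics package is one option, for example SciPy's `scipy.stats.ttest_ind` with `equal_var=False`.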
Data
- Do you have data?
- Design of Experiments (DOE): Do you know how to get more useful data?
- Do you know how to clean data?
- Do you know how to check data after data collection?
- Do you know how to analyze data?
- Do you know how to present data without falling into the curse of knowledge?
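To make the cleaning and checking questions above concrete, here is a small Python sketch under stated assumptions. The raw readings are invented, the helper names `clean_readings` and `flag_outliers` are hypothetical, and the outlier rule (a modified z-score based on the median absolute deviation, with the commonly used 3.5 cutoff) is just one of several reasonable checks, not a prescribed method.

```python
import math
import statistics

# Hypothetical raw readings as they might arrive from a spreadsheet export.
raw_readings = ["10.1", "9.8", "", "n/a", "10.3", "99.0", None, "10.0", "nan", "9.9"]


def clean_readings(raw):
    """Convert raw entries to floats, dropping blanks, text, None, and NaN."""
    cleaned = []
    for value in raw:
        try:
            number = float(value)
        except (TypeError, ValueError):
            continue  # not a number at all, so skip it
        if math.isfinite(number):
            cleaned.append(number)
    return cleaned


def flag_outliers(values, threshold=3.5):
    """Return values whose modified z-score exceeds the threshold."""
    median = statistics.median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    return [v for v in values if abs(0.6745 * (v - median) / mad) > threshold]


readings = clean_readings(raw_readings)
suspects = flag_outliers(readings)
print("cleaned:", readings)
print("worth a second look:", suspects)
```

A flagged value is not automatically wrong. The point of the check is to prompt a look back at how that data point was collected before deciding whether to keep it.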
Analyzing/Presenting Data
Remember: you are bound by the curse of knowledge. You are so deep in the data you're analyzing that it can be difficult to communicate the results concisely to non-experts. Most execs spend less than a few minutes a month looking at your work, so they don't have the depth you do, and you need to make the message clear.
The best plot is the one where the conclusion is clear and the audience draws the same conclusion as you without prompting. Here are some good methods you should know when it comes to data presentation:
Tools
- Programming Language: MATLAB, Python, or R
- Data Inspection: Numbers and/or Excel
- Data Presentation: Keynote and/or PowerPoint
- Scripting: Bash, Python, Perl, etc.
Advanced Knowledge
The advent of AI, machine learning, and computer vision has begun to really affect our lives. Therefore, I think it is important to understand the fundamentals of these techniques to be an effective data scientist because most data will be filtered through these methods before analysis.
- Neural Networks (NN) and/or Convolutional Neural Networks (CNN)
- MapReduce
- Natural Language Processing
- General computer vision or signal processing
- K-means Clustering
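For a feel of what one of the techniques above involves, here is a minimal sketch of k-means clustering (Lloyd's algorithm) in plain Python. The sample points, the function name `kmeans`, and its parameters are assumptions made for this example. Real projects would more likely rely on an established library and a more careful initialization.

```python
import math
import random


def kmeans(points, k, initial_centroids=None, max_iterations=100, seed=0):
    """Cluster points (tuples of floats) into k groups with Lloyd's algorithm."""
    if initial_centroids is None:
        # Simple default: pick k distinct points at random as starting centroids.
        initial_centroids = random.Random(seed).sample(points, k)
    centroids = [tuple(c) for c in initial_centroids]

    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)

        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                mean = tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid

        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids

    return centroids, clusters


# Hypothetical 2-D data: two visually obvious groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]

# Start from one point in each group so the run is easy to follow by hand.
centroids, clusters = kmeans(points, k=2, initial_centroids=[points[0], points[3]])
for centroid, cluster in zip(centroids, clusters):
    print(f"centroid ({centroid[0]:.2f}, {centroid[1]:.2f}) has {len(cluster)} points")
```

The result depends on where the centroids start, which is why the example fixes the starting points. With random starts it is common to run the algorithm several times and keep the best result.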
In Conclusion
These skills do not develop overnight or over the course of a few weeks or months. Usually, they take years to mature as you're given opportunities to improve them. Good opportunities are challenging problems with no clear or obvious end in sight, and solving them requires hard work, persistent energy, and trudging onwards through the depths of despair.
Chief Technology Officer @ Wowzi | Technology + Software + Product + Data Engineering Strategy and Management | Fintech | SaaS Platforms| Serverless Microservices | Event-Driven Architecture | Automation Junkie
(6 years ago) Nice write-up. However, I'm not too sure this will excite newcomers. In short, it is a bit too technical, which makes it boring and unattractive.
Curious, Always Striving to Grow
(6 years ago) Thanks for sharing this! I'm transitioning from a fire service career into data analysis and this confirmed for me where I need to buttress my skills. I have a degree in mathematics, so the statistics and fundamentals are there; I'm quite proficient with MS Office, including Excel and PowerPoint, as well. So I think I need to focus on my programming. I'm leaning towards Python as the language to learn (my prior comp sci is S-Basic, Pascal, and V-Basic). Do you have any insights on what differentiates Python, R, and MATLAB? Do certain industries prefer one over the other, or is it more a personal preference from one scientist to the next?
Analytical Lead, Financial Services
(6 years ago) Hey Robert McKeon Aloe! Thanks for sharing this; it surely outlines the data science job! What is really essential to me for doing data science is a problem-solving mindset and curiosity about the dataset, the behavior behind it, and everything that can be modeled from it to develop results!