Data Science Resources For Beginners
Raghoo Bokam
I help entrepreneurs convert website traffic into paying customers | Achieved 500% more revenue in online sales | E-commerce Strategist | Digital Marketer | Sales Funnel Expert
Hello and Welcome to Data Science Resources. I am creating this article so that I can curate all the resources I found online (by other publishers) that are related to data science for my future reference and for other data science aspirants.
You can find links to the original authors' pages in the articles. This is mostly Ctrl+C and Ctrl+V, so feel free to suggest modifications.
Happy Learning...
Table Of Contents
3. Product
4. Career Resources
Data Science Getting Started
Data Science is a multidisciplinary field covering, at the very minimum, statistics, programming, and machine learning (see Drew Conway's Venn diagram or the Cheat Sheet of a Modern Data Scientist). These topics are covered throughout this repo. I personally find the best way to learn a topic is to get my hands dirty quickly. With that in mind, I would get to work in Python and then add different tools or theories to my toolkit as they are understood. If you haven't used Python before, I would strongly urge you to use the Codecademy course to familiarize yourself with the language and how to program. Good luck and have fun.
A note about order: I arranged the contents of the Pipeline & Tools section to follow the data pipeline, starting with acquisition, then exploratory data analysis, data cleaning, model selection & evaluation, and finally visualization.
Start
Data Science Courses:
Data Pipeline & Tools
Python
Python is my workhorse language, specifically because it has many data science and statistics libraries, can run in production environments, and works on problems outside of data science. There are many other languages that could be useful but are not covered here: Julia, R, Cython, Pig, Scala, Java, etc.
Data Structures & CS Topics
Statistics
Some primers on understanding statistics and other resources to get a deeper understanding.
Stats/Engineering Libraries
A collection of workhorse libraries that are essential for any Python data scientist.
Data Acquisition
Libraries that are very helpful for abstracting away some of the complications of scraping or working with HTTP.
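As a minimal sketch of the kind of work these libraries abstract away, the snippet below pulls link targets out of an HTML page using only Python's standard library. The HTML string and the names LinkCollector and extract_links are made up for illustration; in practice the page would first be fetched over HTTP, and a dedicated scraping library would likely cope better with messy markup.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets found in an HTML string, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page content, standing in for a downloaded document
sample_html = """
<html><body>
  <a href="/datasets">Datasets</a>
  <p>No link here</p>
  <a href="/tutorials">Tutorials</a>
</body></html>
"""

print(extract_links(sample_html))
```

Writing even this small parser by hand shows why a higher-level library is attractive once pages get larger or less tidy.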
Processing & Exploratory Data Analysis
A collection of documents explaining some of the ways to do processing & EDA.
Databases/Frameworks
A collection of databases & frameworks that are helpful for data management and are the industry standard.
Machine Learning
There is a lot of information available online about the theory, mathematical intuition, and tuning for this discipline. Here are some tools that are currently available.
Machine Learning Theory
Deep Learning
Deep learning is getting a lot of media attention. Get your feet wet with some of these resources:
Time-Series
Model Selection
Resources about how to decide on your model.
Model Evaluation
Resources to help with understanding model evaluation.
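To make the idea concrete, here is a small sketch of three common classification metrics (accuracy, precision and recall) computed from scratch. The labels are invented for illustration and the function names are my own; a machine learning library would normally provide these metrics ready-made.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate(y_true, y_pred, positive=1):
    """Return accuracy, precision and recall as a dict (0.0 when undefined)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up ground truth and predictions for a binary classifier
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(evaluate(y_true, y_pred))
```

Accuracy alone can be misleading when one class is rare, which is why precision and recall are usually reported alongside it.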
Feature Engineering
A critical element of data science for improving model performance, yet one that is rarely discussed.
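Two of the simplest feature engineering steps are rescaling a numeric column and one-hot encoding a categorical one. The sketch below shows both in plain Python; the sample values and function names are assumptions chosen for illustration, not taken from any of the linked resources.

```python
def min_max_scale(values):
    """Rescale numbers linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map everything to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(categories):
    """Turn a list of labels into 0/1 indicator rows, one column per distinct label."""
    levels = sorted(set(categories))
    rows = [[1 if c == level else 0 for level in levels] for c in categories]
    return rows, levels


# Hypothetical raw features
ages = [20, 30, 40]
colors = ["red", "blue", "red"]

print(min_max_scale(ages))
print(one_hot(colors))
```

Scaling keeps features with large numeric ranges from dominating distance-based models, and one-hot encoding lets models that expect numbers work with categories.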
Additional Tools or Processes
Resources on other topics that are very helpful for data scientists and products.
Data Visualization
Collection of the best libraries that I know for easy and powerful data visualizations.
Other available Visualization Resources.
Design Theory
The importance of design theory in data visualization, storytelling and presentations cannot be overstated. Poor design can take great content and make it confusing or virtually unusable, while good design can make content sing and connect with the audience. Through a better understanding of design theory and UI principles, a data scientist (or anyone) can convey more understandable information to the intended audience and give their content a strong story.
IPython Notebook Tutorials
Collection of IPython notebooks that are helpful as examples, either of using tools or of explaining certain topics.
Data Sources
Collection of sites to access data if you want to build out a project or just use some of the tools for EDA.
New Data Tools
This section aims to keep track of developing trends and new tech that are helpful for the practising Data Scientist. "New" might be a misnomer.
Other Useful Scripts
Product
Product Metrics
Understanding product, user behaviour, and product metrics is helpful for data scientists in the industry. Being able to help your product manager and team execute strategies, by understanding the problem, the metrics, and what they understand, makes for a more fruitful relationship.
Team Communication & Business Tools
There are some very innovative new companies that are producing very effective tools to minimize and abstract away inefficient processes at companies. While it isn't strictly data science-related, these products could be very helpful to integrate with your teams to improve overall productivity.
Best Practices
Using source control and keeping accurate documentation, so that you and your colleagues can follow and reproduce your work, are very important. I will add some best coding practices & data science practices.
Career Resources
Data Science Career Path
Types of Data Scientists
Not all Data Scientists are the same and it's critical for organizations to understand what it is they need, and how best to fill those roles and/or complement the skills of their team. Finding the organizational structure that enables the data scientists/data engineers within the organization and generates better results is also crucial. It should be given thorough consideration.
Data Science Applications/Use Cases
Data Science has many different applications and use cases within the industry, and more are continually being discovered. These resources provide some potential ideas.
Data Science Websites/Books
More resources for community-based information or hard copy books.
Data Science Meetups in the Bay Area
A great way to meet other Data Scientists and keep up to date with best practices.
Data Science Blogs
Data Science Conferences
Data Science Presentations
Relevant Business Processes
Start-Up Resources
Open Source Data Science Resources
While the name might sound redundant, this section covers other sites or repos that have aggregated information on similar topics. There is tons of great content on these sites, so definitely go check them out.
Other Open Source Data Science Content
There are some really great resources linked within this section covering all of Data Science, the entire data pipeline, machine learning, statistics, python, etc. Go check them out.
Auxiliary Content & Apps