Introduction to Data Science for Python
Malini Shukla
What is Data Science?
Before we start this Data Science Tutorial, let us establish what data science really is.
Data science is the practice of discovering hidden patterns in raw data. To achieve this goal, it draws on algorithms, machine learning (ML) principles, and scientific methods. The data it extracts insights from can be structured or unstructured, so in a way it resembles data mining. Data science encompasses data analysis, statistics, and machine learning. As more practices are labeled data science, the term risks becoming diluted beyond usefulness, which leads to variation in the curricula of introductory data science courses worldwide.
Data Science Tutorial – History
Despite the recent hype around data science, the term has been around for over thirty years. What we could once use as a synonym for practices like business analytics, business intelligence, or predictive modeling now refers broadly to working with data to find relationships within it. A rough timeline goes something like this:
a. Before 2000
- 1960- Peter Naur uses the term as a substitute for computer science.
- 1974- Peter Naur publishes Concise Survey of Computer Methods and uses the term in a survey of contemporary data processing methods.
- 1996- Biennial conference in Kobe; members of the IFCS (International Federation of Classification Societies) include the term in the conference title.
- 1997- November- Professor C.F. Jeff Wu delivers inaugural lecture on the topic “Statistics=Data Science?”.
b. 2000 onward
- 2001- William S. Cleveland introduces data science as an independent discipline in the article Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics.
- 2002- April- The ICSU (International Council for Science): Committee on Data for Science and Technology (CODATA) starts Data Science Journal- this publication is to focus on issues pertaining to data systems- description, publication, application, and also legal issues.
- 2003- January- Columbia University publishes journal The Journal of Data Science- a platform that allows data workers to exchange ideas.
- 2005- National Science Board publishes Long-lived Digital Data Collections: Enabling Research and Education in the 21st Century- this provides a new definition to the term “data scientists”.
- 2007- Jim Gray, Turing awardee, envisions data-driven science as the fourth paradigm of science.
- 2012- Harvard Business Review article attributes coinage of the term to DJ Patil and Jeff Hammerbacher in 2008.
- 2013- IEEE launches a task force on Data Science and Advanced Analytics; the first European Conference on Data Analysis (ECDA) is organized in Luxembourg; the European Association for Data Science (EuADS) comes into existence.
- 2014- IEEE launches first international conference International Conference on Data Science and Advanced Analytics; General Assembly launches student-paid Bootcamp, The Data Incubator launches data science fellowship for free.
- 2015- Springer launches the International Journal of Data Science and Analytics.
Data Science Tutorial – Methodologies
In this Data Science Tutorial, we will cover the following methodologies in data science:
a. Machine Learning for Pattern Discovery
This is where clustering comes into play. Clustering is an unsupervised technique for discovering patterns: when you have no labeled outcomes on which to base predictions, it lets you find hidden groupings within a dataset.
One such use case is a telephone company using clustering to determine tower locations for optimum signal strength.
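To make the tower example concrete, here is a minimal sketch of k-means, one common clustering algorithm (the article does not name a specific one, so treat this choice as an assumption). The subscriber coordinates are made up, and the function name `kmeans` is hypothetical; each resulting centroid can be read as a candidate tower location.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Group 2-D points into k clusters with plain k-means (Lloyd's algorithm)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the data points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for x, y in points:
            nearest = min(
                range(k),
                key=lambda i: (x - centroids[i][0]) ** 2 + (y - centroids[i][1]) ** 2,
            )
            clusters[nearest].append((x, y))
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                cx = sum(p[0] for p in cluster) / len(cluster)
                cy = sum(p[1] for p in cluster) / len(cluster)
                new_centroids.append((cx, cy))
            else:
                new_centroids.append(centroids[i])  # leave an empty cluster's centroid alone
        if new_centroids == centroids:
            break  # nothing moved, so the assignment is stable
        centroids = new_centroids
    return centroids, clusters

# Made-up subscriber locations forming two neighbourhoods.
subscribers = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
towers, groups = kmeans(subscribers, k=2)
print(sorted(towers))
```

In practice you would more likely reach for a library implementation and real coordinates; the point here is only that no labels are needed to find the two groups.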
b. Machine Learning for Making Predictions
When we have labeled data to train on, we can use supervised learning, for example on transactional data. Using machine learning algorithms, we can build a model and forecast the trends we are likely to see in the future.
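As a small illustration of learning from past data to forecast a trend, the sketch below fits a straight line by ordinary least squares. Linear regression is just one simple supervised method chosen for illustration; the monthly sales figures and the name `fit_line` are invented.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up training data: month number and sales in that month.
months = [1, 2, 3, 4, 5]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(months, sales)
forecast = slope * 6 + intercept  # predict month 6
print(forecast)  # 22.0
```

Real transactional data is rarely this tidy, so a real model would also need a held-out test set to check how well the forecast generalizes.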
c. Predictive Causal Analytics
Causal analytics lets us make predictions based on a cause. It tells us how probable an event is to occur in the future. One use case is to run such analytics on the payment histories of a bank’s customers, which tells us how likely they are to repay their loans.
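One way to turn a payment history into a repayment probability is logistic regression. The article does not name a technique, so this is a swapped-in example: a minimal sketch that assumes a single made-up feature (number of late payments) and invented outcomes. The function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit p(y = 1 | x) = sigmoid(w * x + b) by batch gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Invented history: late payments per customer, and whether the loan was repaid (1) or not (0).
late_payments = [0, 0, 1, 1, 2, 3, 4, 5, 6, 7]
repaid = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0]

w, b = fit_logistic(late_payments, repaid)

def repayment_probability(n_late):
    return sigmoid(w * n_late + b)

print(round(repayment_probability(0), 2), round(repayment_probability(7), 2))
```

The fitted weight comes out negative, so more late payments mean a lower estimated chance of repayment. Note that a model like this captures association in the data; establishing an actual cause takes more careful study design.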
d. Prescriptive Analytics
Prescriptive analytics goes a step further: it prescribes actions and the outcomes associated with them. This intelligence lets a system make decisions and modify them using dynamic parameters. As a use case, consider Google’s self-driving car. With the algorithms in place, it can decide when to speed up or slow down, when to turn, and which road to take.
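The idea of predicting an outcome and then prescribing an action can be shown with a toy rule. This is in no way how a real self-driving system works; the deceleration value, safety margin, and function names below are all assumptions made for illustration.

```python
def stopping_distance(speed_mps, deceleration=6.0):
    """Predicted distance (m) needed to brake to a halt: v^2 / (2a)."""
    return speed_mps ** 2 / (2 * deceleration)

def recommend_action(speed_mps, gap_m, speed_limit_mps, margin_m=5.0):
    """Prescribe an action from the predicted stopping distance and the current gap."""
    if stopping_distance(speed_mps) + margin_m >= gap_m:
        return "brake"       # predicted stop would not fit in the gap ahead
    if speed_mps < speed_limit_mps:
        return "accelerate"  # safe gap and room below the limit
    return "hold"

print(recommend_action(speed_mps=20, gap_m=30, speed_limit_mps=25))
print(recommend_action(speed_mps=10, gap_m=100, speed_limit_mps=25))
```

The parameters passed in on each call play the role of the dynamic inputs mentioned above: as speed and gap change, the prescribed action changes with them.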
Data Science Applications
Let’s see some applications in this Data Science Tutorial:
a. Image Recognition
Face and image recognition algorithms get a lot done. Did Facebook ever suggest people to tag in your pictures? Have you tried the search-by-image feature from Google? Do you remember scanning a QR code to log in to WhatsApp Web using your smartphone?
b. Speech Recognition
Siri, Alexa, Cortana, and Google Voice all use speech recognition to understand your commands. Owing to issues like different accents and ambient noise, recognition isn’t always completely accurate, though it gets things right most of the time. It enables conveniences like dictating the content of a text message, asking your virtual assistant to set an alarm, or using it to play music, inquire about the weather, or make a call.
c. Internet Search
Search engines like Google, DuckDuckGo, Yahoo, and Bing make good use of data science to make fast, real-time searching possible.
d. Digital Advertisements
Data science algorithms let us understand customer behavior. Using this information, we can show relevant advertisements curated for each user. This applies both to banner advertisements on websites and to digital billboards at airports.
e. Recommender Systems
Names like Amazon and YouTube show suggestions for similar items beside or below the product or video you are browsing. This enriches the user experience (UX) and helps retain customers and users. These suggestions also take into account the user’s search history and wishlist.
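A very small version of "similar items" can be built from cosine similarity between rating vectors. This is a sketch of one common approach (item-based similarity), not a description of how Amazon or YouTube actually do it; the ratings table and function names are made up.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Invented ratings: one row per item, one column per user, 0 means "not rated".
ratings = {
    "laptop":   [5, 4, 0, 1],
    "mouse":    [4, 5, 0, 1],
    "novel":    [0, 1, 5, 4],
    "cookbook": [1, 0, 4, 5],
}

def similar_items(item, ratings, top_n=2):
    """Rank the other items by how similarly users rated them."""
    scores = [
        (cosine(ratings[item], vector), other)
        for other, vector in ratings.items()
        if other != item
    ]
    scores.sort(reverse=True)
    return [name for _, name in scores[:top_n]]

print(similar_items("laptop", ratings, top_n=1))
```

Items rated alike by the same users end up close together, which is why someone viewing the laptop would be shown the mouse before the novel.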
f. Price Comparison Websites
Websites like Junglee and PriceDekho let us compare prices for the same products across different platforms. This facility lets you make sure you grab the best deal. These websites work in the domains of technology, apparel, and policy among many others, and use APIs and RSS feeds to fetch data.
g. Gaming
As a player levels up, a machine learning algorithm can improve or upgrade itself. It is also possible for a computer-controlled opponent to analyze the player’s moves and add an element of difficulty to the game. Companies like Sony and Nintendo make use of this.
h. Delivery Logistics
Freight giants like UPS, FedEx, and DHL use practices of data science to discover optimal routes, delivery times, and transport modes among many others. A plus with logistics is the data obtained from the GPS devices installed.
i. Fraud and Risk Detection
Customer profiling and analysis of past expenditures let us estimate whether a customer is likely to default. This lets banks avoid bad debts and losses.
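On the fraud side, a simple starting point is to flag transactions that are far from a customer's usual spending. The sketch below uses a z-score threshold, which is only one basic heuristic; the amounts, the threshold of 3, and the name `flag_unusual` are assumptions.

```python
from statistics import mean, stdev

def flag_unusual(history, new_amount, threshold=3.0):
    """Return True if new_amount is more than `threshold` standard deviations from the mean of history."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return new_amount != mu  # every past amount was identical
    return abs(new_amount - mu) / sigma > threshold

# Invented past card transactions for one customer.
history = [42.0, 38.5, 51.0, 45.0, 40.0, 47.5]

print(flag_unusual(history, 900.0))
print(flag_unusual(history, 46.0))
```

A flag like this would normally trigger a review rather than an automatic block, since unusual spending is not always fraudulent.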
Business Intelligence vs Data Science
In this part of the Data Science Tutorial, we compare data science with business intelligence (BI). The two aren’t exactly the same thing.
- BI works on structured data; data science works on both structured and unstructured data.
- Where BI focuses on the past and the present, data science considers the present and the future.
- BI relies on statistics and visualization; data science relies on statistics, machine learning, graph analysis, and NLP.
- Some tools for BI are Pentaho, Microsoft BI, and R; those for data science are RapidMiner, BigML, and R.