An Intro to The Industry of the 21st Century - Data Science

We live in a world full of data: roughly 2.5 quintillion bytes are produced daily (that’s 2.5 followed by a staggering 18 zeros!). This is why about 80% of firms across the globe are investing a large part of their earnings into building a skillful data analytics division. Yet businesses use only about 5% of the information available to them, because 80 to 90% of their data is unstructured, making it nearly impossible to organize and mine for insights. Data science is one of the only data disciplines that deals with unstructured data, making it one of the most valuable jobs in the industry.

So what is data science? The sexiest job of the 21st century.


What Is Data Science?


Data science is the field of study that combines mathematics, computer science, and specific domain knowledge to derive meaningful information from data. Data scientists use machine learning algorithms (programs that imitate human learning by finding patterns in data) to create predictive models (models that predict likely future outcomes from historical and existing data), helping extract valuable information from both structured and unstructured data so businesses can make better decisions. Getting valuable results from data is a lengthy process known as the data science lifecycle.



How Does Data Science Work??

1.) Identify and understand the specific problem

Creating a specific and clear problem statement is one of the first and most critical steps in any data science project. Many companies are too vague when defining data problems, such as:

  1. I want to increase the revenues of my company.
  2. I want to predict stock prices.
  3. I want to recommend personalized products to customers on my website.

So, it's the data scientist's job to communicate actively in meetings and ask the right questions to create a clear, goal-oriented problem statement.

Here’s an example of a specific and well-defined problem statement that we will be using throughout the article:

I want to predict seniors' falls before they happen.

Having a well-defined problem statement gives data scientists a clear direction on which sources to collect data from.


2.) Data Collection and Cleansing

Data collection is the process of gathering relevant information from a variety of sources. Depending on the problem being solved, data collection falls into two categories.

  1. Primary Data Collection: When you have a unique problem where no public data is available, new data must be collected through surveys and interviews.

(Image: Example of a dog image dataset labeled with the breed of the dog.)


  2. Secondary Data Collection: This is data from openly available sources such as GitHub and Kaggle.


For our problem statement on predicting senior falls, you can collect data from online sources such as PointClickCare and also interview seniors in senior homes.


After collecting data, one of the lengthiest and most tedious steps in the data science lifecycle comes into play: data cleansing. Data comes in a variety of formats and can be sorted into one of two categories: structured and unstructured. The skill of working with unstructured data is largely exclusive to data scientists, since it requires understanding both the topic of the data and how the data points relate to each other. When multiple data sources are combined, records can be incorrect, corrupted, incorrectly formatted, duplicated, or incomplete, which can lead to inaccurate models and insignificant variables being chosen for statistical analysis.
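A minimal sketch of what cleansing might look like for our senior-falls example, using Pandas. The column names and values here are invented for illustration, not from a real dataset:

```python
import pandas as pd

# Hypothetical records merged from two sources; names and values are invented.
df = pd.DataFrame({
    "resident_id": [1, 1, 2, 3, 3],
    "steps_per_day": [3200, 3200, None, 4100, 4100],
    "blood_pressure": ["120/80", "120/80", "135/90", None, None],
})

df = df.drop_duplicates()                 # remove exact duplicate rows
df = df.dropna(subset=["steps_per_day"])  # drop rows missing a key measurement
df["blood_pressure"] = df["blood_pressure"].fillna("unknown")  # flag missing values

print(df)
```

Real cleansing pipelines add many more checks (type coercion, range validation, unit normalization), but the pattern of deduplicating, dropping, and flagging shown here is the core of it.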


3.) Exploratory Data Analysis

After data collection and cleanup, we are finally able to perform data analysis and build familiarity with the data and see its potential. Data can be understood and analyzed through statistical and visualization methods which can be done through excellent open-source data science libraries.

Here are some examples:

NumPy - https://numpy.org/


  • Excellent tool for performing data analysis due to its strong and fast numerical computations with arrays and functions, e.g. computing the average or standard deviation of a set of values.
  • Contains multidimensional arrays that can hold several columns of data at once.
  • Ideal for machine learning due to its linear algebra capabilities.
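A quick sketch of those NumPy capabilities applied to our example; the step counts below are invented numbers, purely for illustration:

```python
import numpy as np

# Hypothetical daily step counts for five residents (invented values).
steps = np.array([3200, 4100, 2800, 5000, 3600])

print(steps.mean())  # average steps per day
print(steps.std())   # standard deviation

# A 2-D array can hold several measurements per resident at once,
# e.g. [steps_per_day, systolic_blood_pressure].
readings = np.array([[3200, 120], [4100, 135], [2800, 110]])
print(readings.mean(axis=0))  # column-wise averages
```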


Pandas - https://pandas.pydata.org/

Uses fast and flexible data structures (Series and DataFrames) that are designed to make working with structured, tabular data very easy.
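A small sketch of Pandas in action for our seniors example. The columns and values are hypothetical, just to show the DataFrame workflow:

```python
import pandas as pd

# A DataFrame is a labeled, tabular structure; these rows are invented.
seniors = pd.DataFrame({
    "age": [72, 85, 78, 91],
    "steps_per_day": [4100, 2200, 3600, 1500],
})

print(seniors.describe())            # summary statistics per column
print(seniors[seniors["age"] > 80])  # filter rows with a boolean mask
```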


Matplotlib - https://matplotlib.org/


  • Extensively used for data visualization due to the graphs and plots it produces.
  • Applications: correlation analysis of variables, outlier detection using a scatter plot.
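A minimal sketch of the scatter-plot use case mentioned above. The age and fall counts here are invented data points, and the `Agg` backend is used so the figure renders to a file without needing a display:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display required
import matplotlib.pyplot as plt

# Hypothetical data: age vs. number of recorded falls per year.
ages = [65, 70, 75, 80, 85, 90]
falls = [0, 1, 1, 2, 4, 5]

fig, ax = plt.subplots()
ax.scatter(ages, falls)
ax.set_xlabel("Age")
ax.set_ylabel("Falls per year")
ax.set_title("Age vs. fall frequency (illustrative data)")
fig.savefig("age_vs_falls.png")
```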

In our example of predicting seniors' falls, we can explore important factors such as the number of steps taken per day, blood pressure, medications taken, gait, and so on.


Overall, exploratory data analysis is an important step since it helps us understand the data better so we can make a better model selection.


4.) Data Modeling

Data modeling is the process of producing a descriptive diagram of the relationships between the key data points used and stored to reach a solution. Probability and inferential statistics are used to establish relationships between variables in the data.


For the seniors' data, if we propose that there’s a relationship between age and the risk of falling, we could model the data in a graph that might look like this:

(Image: Risk of falls & age relationship.)
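One simple way to quantify such a relationship is to fit a linear trend with NumPy. The observations below are invented for illustration; a real model would need far more data and likely a more suitable technique (e.g. logistic regression for fall risk):

```python
import numpy as np

# Hypothetical observations: age and yearly fall count (invented values).
ages = np.array([65, 70, 75, 80, 85, 90])
falls = np.array([0, 1, 1, 2, 4, 5])

# Fit a simple linear trend: falls ≈ slope * age + intercept.
slope, intercept = np.polyfit(ages, falls, 1)
print(f"estimated extra falls per year of age: {slope:.2f}")
```

A positive slope would support the proposed relationship that fall risk rises with age.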


5.) Data Communication

This is the final step, where results from the analysis are presented to stakeholders. Findings are usually presented to a non-technical audience such as the marketing team or business executives, so you must explain how you reached a specific conclusion. Results need to be communicated simply; graphs and presentations are used to convey them, and this is where the Python libraries above come into play.

  1. Know your audience and speak their language.
  2. Focus on values and outcomes.
  3. Communicate assumptions and limitations.


Key Takeaways:

  1. Data science is a very important field whose demand will increase as companies continue to produce more data and become more data-driven in their decision-making.
  2. When implementing data science in a company, we go through a process in which, at a high level, we choose a specific problem statement, gather data, clean it, analyze it, and model it.
  3. There are lots of resources out there that make it easy to model and analyze data, such as Matplotlib, NumPy, and Pandas!


Thanks so much for taking the time to read my article! If you have any questions comment down below or email me at [email protected] .

Until next time...
