
For Statistics Beginners - Types of data

Shintaro Nakabayashi

CEO, Board member of KPMG Advisory Lighthouse, Inc. KPMG Japan - CDO (Chief Data Officer), Strategy and Intelligence, Data & AI-Advanced Analytics Japan lead.

Published: July 2, 2018

Question: A supermarket collects its cardholders' monthly purchase data. Which of the following is a quantitative variable?

Number of items purchased
Purchasing store name
Day of purchase

Data types are an important concept in statistics: you need to understand them to apply statistical measures to your data correctly. There are various kinds of data, and because the appropriate graphs and methods of analysis differ from type to type, it is very important to understand the features of each.

Quantitative data (quantitative variables)

Quantitative data is data that can be counted or expressed numerically. It answers questions of "how much" or "how many" and can be used to study events or levels of occurrence. Because it is numerical, quantitative data is both definitive and objective. It also lends itself to statistical analysis and mathematical computation, and is therefore typically illustrated in charts or graphs.

There are two main types of quantitative data: discrete and continuous. Discrete data can take only a limited, countable set of values, typically whole numbers, with gaps between neighboring values on a number line. For example, if a teacher gives an exam with 100 questions, the scores reflect the number of correct answers out of the 100 possible. Likewise, a study of the number of vehicles owned by households in America would collect whole numbers.

Continuous data is data whose values fall on a continuum, so fractions and decimals are possible. Continuous data is usually a physical measurement, such as height, age, or distance.

Continuous data: values that continue without interruption, such as height, time, and temperature, and that can be subdivided as finely as measurement allows. Example: right next to 175.0 cm is 175.000...001 cm.
Discrete data (discontinuous data): values that in general cannot be subdivided, such as the number of people or the number of times something happens. Example: when counting people, the value after one person is two people, not 1.000...001 people.
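A small Python sketch can make the "next value" idea concrete. It is only an illustration under simple assumptions: the sample values are hypothetical, and `midpoint` and `is_valid_count` are helpers introduced here, not part of any library. Repeatedly halving the gap between two heights never runs out of valid heights, whereas the halfway point between two head counts is not itself a possible count.

```python
def midpoint(a: float, b: float) -> float:
    """Value halfway between a and b (hypothetical helper)."""
    return (a + b) / 2

def is_valid_count(x: float) -> bool:
    """A count of people must be a non-negative whole number."""
    return x >= 0 and float(x).is_integer()

# Continuous: between any two heights there is always another height.
low_cm, high_cm = 175.0, 176.0
for _ in range(5):
    high_cm = midpoint(low_cm, high_cm)  # keeps shrinking toward 175.0
print(high_cm)  # 175.03125, still a legitimate height

# Discrete: halfway between 1 and 2 people is 1.5, which is not a possible count.
print(is_valid_count(midpoint(1, 2)))  # False
print(is_valid_count(2))               # True
```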

Qualitative data (qualitative variables)

Qualitative variables are those whose data is indicated by category. As the name implies, the "quality" differs between data values. Examples include favorite color, room layout, sex, and name. Since these are not numerical data, they cannot be used for calculations as they are; special measures are required before they can be. Favorite sports, blood type, car (license plate) numbers, and the like serve only to distinguish categories and types.
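One common "special measure" for qualitative data is sketched below in Python, assuming a hypothetical list of blood types. Counting category frequencies works directly on the labels; to feed categories into calculations, the sketch uses one-hot (dummy) encoding, a widely used technique but only one of several options and not necessarily the one the author had in mind.

```python
from collections import Counter

blood_types = ["A", "O", "B", "A", "AB", "O", "A"]  # hypothetical records

# The labels cannot be averaged, but how often each category occurs can be counted.
counts = Counter(blood_types)
print(counts.most_common(1))  # [('A', 3)]

# One-hot (dummy) encoding: each category becomes its own 0/1 indicator,
# which can then be used in ordinary numerical calculations.
categories = sorted(counts)  # ['A', 'AB', 'B', 'O']
one_hot = [[1 if value == c else 0 for c in categories] for value in blood_types]
print(one_hot[0])  # [1, 0, 0, 0] for the first record, "A"

# For example, the mean of an indicator column is that category's share.
share_a = sum(row[categories.index("A")] for row in one_hot) / len(one_hot)
print(round(share_a, 3))  # 0.429
```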

Flow data (flow)

Flow data shows the amount of change over a certain period of time. Example: the amount of water flowing into a tub minus the amount flowing out (liters per minute).

Stock data (stock)

Stock data shows the amount accumulated at a certain point in time. Example: the amount of water in a tub (liters at 1:00 pm).

Qualitative data, like quantitative data, is further divided into two types. First, qualitative data is classified into the "nominal scale" and the "ordinal scale".
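Flow and stock, as defined above, describe the same process from two angles: the stock at any moment is the starting amount plus every net flow since then. The Python sketch below replays the tub example with hypothetical per-minute readings and an assumed starting volume of 20 liters; `itertools.accumulate` with `initial=` requires Python 3.8 or later.

```python
from itertools import accumulate

# Hypothetical per-minute rates for the tub, in liters per minute.
inflow = [10.0, 10.0, 8.0, 0.0, 0.0]
outflow = [0.0, 2.0, 2.0, 3.0, 5.0]

# Flow data: the net change during each one-minute interval
# (rate in liters per minute times 1 minute = liters).
net_flow = [i - o for i, o in zip(inflow, outflow)]

# Stock data: the amount in the tub at the end of each minute, i.e. the
# assumed starting amount plus all net flows so far.
initial_liters = 20.0
stock = list(accumulate(net_flow, initial=initial_liters))[1:]

print(net_flow)  # [10.0, 8.0, 6.0, -3.0, -5.0]
print(stock)     # [30.0, 38.0, 44.0, 41.0, 36.0]
```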

Nominal scale

A scale whose values are meaningful only as distinctions. For example, coding men as 1 and women as 2 merely labels the categories, and of course the order of the choices has no meaning.

Ordinal scale

The order of the choices carries meaning, as in (1 = very good, 2 = somewhat good, 3 = somewhat bad, 4 = very bad). The median is a meaningful measure.

Scale levels

Based on the nature of the information, scale levels can be roughly divided into two types, qualitative and quantitative, and further classified into four levels: nominal and ordinal (qualitative), and interval and ratio (quantitative).
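The four levels and the summary statistics this article associates with each can be collected into a small lookup table. The sketch below is a hypothetical helper (the `Scale` enum and the `is_quantitative` and `meaningful_summaries` functions are illustrative names, not a standard API); it encodes the rule that anything meaningful at a lower level remains meaningful at every higher level.

```python
from enum import IntEnum

class Scale(IntEnum):
    """Four scale levels, ordered from least to most informative."""
    NOMINAL = 1   # qualitative: labels only
    ORDINAL = 2   # qualitative: labels with an order
    INTERVAL = 3  # quantitative: equal differences, no true zero
    RATIO = 4     # quantitative: equal differences and a true zero

# Lowest level at which each summary statistic is meaningful.
MIN_LEVEL = {
    "mode": Scale.NOMINAL,
    "median": Scale.ORDINAL,
    "mean": Scale.INTERVAL,
    "standard deviation": Scale.INTERVAL,
    "coefficient of variation": Scale.RATIO,
}

def is_quantitative(scale: Scale) -> bool:
    return scale >= Scale.INTERVAL

def meaningful_summaries(scale: Scale) -> list:
    """Every summary whose minimum level is at or below `scale`."""
    return [name for name, level in MIN_LEVEL.items() if scale >= level]

print(is_quantitative(Scale.ORDINAL))       # False
print(meaningful_summaries(Scale.ORDINAL))  # ['mode', 'median']
```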

Proportional scale (ratio scale)

On a proportional scale, zero means "nothing", so it has a special meaning as a base point (origin). Examples include elapsed time, speed, height, weight, and blood pressure. Because ratios are meaningful, all four arithmetic operations (addition, subtraction, multiplication, and division) can be applied to the data, most statistics are meaningful, and many analysis methods can be used. Proportional-scale data is a quantitative variable.
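A quick Python check shows what "ratios are meaningful" buys. The weights are hypothetical, and the conversion uses the international pound, defined as exactly 0.45359237 kg. Because zero kilograms means "no weight", a ratio such as "twice as heavy" stays the same whichever unit is used.

```python
import math

KG_PER_LB = 0.45359237  # the international pound, by definition

def to_lb(kg: float) -> float:
    """Convert kilograms to pounds."""
    return kg / KG_PER_LB

a_kg, b_kg = 40.0, 80.0  # hypothetical weights of two people

# On a ratio scale all four arithmetic operations give meaningful answers.
print(a_kg + b_kg)  # 120.0 (combined weight)
print(b_kg - a_kg)  # 40.0 (how much heavier)
print(a_kg * 2)     # 80.0 (doubling a weight)
print(b_kg / a_kg)  # 2.0 ("twice as heavy")

# Because 0 kg means "no weight", the ratio does not depend on the unit.
print(math.isclose(to_lb(b_kg) / to_lb(a_kg), b_kg / a_kg))  # True
```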

Nominal scale

Nominal scales are used for labeling variables, without any quantitative value. “Nominal” scales could simply be called “labels.” Examples include sex, blood type, and favorite color. Notice that the categories of a nominal scale are mutually exclusive (no overlap) and none of them has any numerical significance. A good way to remember all of this is that “nominal” sounds a lot like “name,” and nominal scales are kind of like “names” or labels.

Note: a sub-type of nominal scale with only two categories (e.g. male/female) is called “dichotomous.” If you are a student, you can use that to impress your teacher.
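To see why the numbers in a nominal coding carry no meaning, the sketch below codes the same hypothetical survey answers two ways: men as 1 and women as 2, and the reverse. The "average code" shifts with this arbitrary choice, while the category counts, and therefore the mode, do not.

```python
from collections import Counter
from statistics import mean

responses = ["female", "male", "female", "female", "male"]  # hypothetical

# Two equally valid, arbitrary numeric codings of the same labels.
coding_a = {"male": 1, "female": 2}
coding_b = {"male": 2, "female": 1}

# The "mean code" changes with the arbitrary coding, so it means nothing...
print(mean(coding_a[r] for r in responses))  # 1.6
print(mean(coding_b[r] for r in responses))  # 1.4

# ...whereas category counts (and hence the mode) do not depend on the coding.
print(Counter(responses).most_common(1))  # [('female', 3)]
```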


Ordinal scale

With ordinal scales, it is the order of the values that is important and significant, but the differences between values are not really known. For example, on a rating scale we know that a #4 is better than a #3 or a #2, but we don’t know, and cannot quantify, how much better it is. Is the difference between “OK” and “Unhappy” the same as the difference between “Very Happy” and “Happy”? We can’t say. Ordinal scales are typically measures of non-numeric concepts like satisfaction, happiness, and discomfort.

“Ordinal” is easy to remember because it sounds like “order,” and that’s the key to remember with “ordinal scales”: it is the order that matters, but that’s all you really get from these.

Advanced note: The best way to determine central tendency for a set of ordinal data is to use the mode or median; the mean is not meaningful for an ordinal set.
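The median of ordinal answers can be found by ranking the categories and taking the middle rank. The sketch below assumes the four-level scale from the ordinal example earlier, ordered from worst to best, plus a hypothetical set of answers. It uses `statistics.median_low`, which always returns one of the observed ranks, so the result maps back to a real category even when the number of answers is even; picking the lower rather than the higher middle value is an arbitrary convention.

```python
from statistics import median_low

# The four-level scale from the ordinal example, ordered from worst to best.
LEVELS = ["very bad", "somewhat bad", "somewhat good", "very good"]
RANK = {label: i for i, label in enumerate(LEVELS)}

answers = ["somewhat good", "very good", "somewhat bad", "somewhat good", "very bad"]  # hypothetical

# Replace each answer by its rank, then take the middle rank. median_low
# always returns an observed rank, so it maps back to a real category.
middle_rank = median_low(RANK[a] for a in answers)
print(LEVELS[middle_rank])  # somewhat good

# By contrast, the mean rank (8 / 5 = 1.6) falls between two categories
# and silently assumes the steps between categories are equal.
```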

Interval scales

Interval scales are numeric scales in which we know not only the order but also the exact differences between values. The classic example of an interval scale is Celsius temperature, because the difference between adjacent values is the same everywhere on the scale: the difference between 60 and 50 degrees is a measurable 10 degrees, as is the difference between 80 and 70 degrees. Clock time (time of day) is another good example of an interval scale in which the increments are known, consistent, and measurable.

Interval scales are nice because they open up the realm of statistical analysis. For example, central tendency can be measured by mode, median, or mean, and the standard deviation can also be calculated.
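With interval data, the usual measures of center and spread can all be computed. The readings below are hypothetical and reuse the 50 to 80 degree range from the example in the text; the functions come from Python's standard `statistics` module, where `stdev` is the sample standard deviation.

```python
from statistics import mean, median, mode, stdev

temps_c = [50.0, 60.0, 60.0, 70.0, 80.0]  # hypothetical readings in degrees Celsius

# Equal intervals: both differences are the same 10 degrees.
print(60.0 - 50.0 == 80.0 - 70.0)  # True

# Mode, median, and mean are all meaningful, and so is the spread.
print(mode(temps_c), median(temps_c), mean(temps_c))  # 60.0 60.0 64.0
print(round(stdev(temps_c), 2))  # 11.4 (sample standard deviation)
```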

Like the others, the key points of an “interval scale” are easy to remember. “Interval” itself means “space in between,” which is the important thing to remember: interval scales tell us not only about order but also about the distance between items.

Here’s the problem with interval scales: they don’t have a “true zero.” For example, 0 degrees Celsius does not mean “no temperature.” Without a true zero, ratios are meaningless: with interval data we can add and subtract, but we cannot meaningfully multiply or divide. Confused? Consider this: raising 10 degrees by another 10 degrees gives 20 degrees. No problem there. However, 20 degrees is not twice as hot as 10 degrees, because 0 on the Celsius scale is not an absence of temperature. Bottom line: interval scales are great, but we cannot calculate ratios, which brings us to our last measurement scale…
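The "20 degrees is not twice as hot" point can be checked numerically. The sketch converts the same two Celsius readings to Fahrenheit and to Kelvin; Kelvin is not discussed in the article and is brought in here only as a temperature scale whose zero is absolute. Because the ratio changes with the choice of scale, it says nothing about the temperatures themselves, while differences convert consistently.

```python
def c_to_f(c: float) -> float:
    """Celsius to Fahrenheit."""
    return c * 9 / 5 + 32

def c_to_k(c: float) -> float:
    """Celsius to Kelvin (0 K is absolute zero)."""
    return c + 273.15

# Taken at face value, the Celsius numbers suggest "twice as hot"...
print(20 / 10)  # 2.0
# ...but the same two temperatures in Fahrenheit give a different ratio,
print(c_to_f(20) / c_to_f(10))  # 1.36 (68 F / 50 F)
# and on the Kelvin scale the ratio is only about 1.035.
print(round(c_to_k(20) / c_to_k(10), 3))  # 1.035

# Differences, by contrast, convert consistently: 10 C apart is 18 F apart.
print(c_to_f(20) - c_to_f(10))  # 18.0
```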

Ratio

Ratio scales are the ultimate nirvana when it comes to measurement scales because they tell us about the order, they tell us the exact value between units, and they also have an absolute zero, which allows a wide range of both descriptive and inferential statistics to be applied. At the risk of repeating myself, everything above about interval data applies to ratio scales, plus ratio scales have a clear definition of zero. Good examples of ratio variables include height and weight.

Ratio scales provide a wealth of possibilities when it comes to statistical analysis. These variables can be meaningfully added, subtracted, multiplied, and divided (ratios). Central tendency can be measured by mode, median, or mean; measures of dispersion, such as the standard deviation and the coefficient of variation, can also be calculated from ratio scales.
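The coefficient of variation (standard deviation divided by the mean) is a good example of a statistic that needs a ratio scale, because dividing by the mean only makes sense when zero is a true zero. The sketch below uses hypothetical heights and a helper name, `coefficient_of_variation`, introduced here for illustration; converting centimeters to meters leaves the coefficient unchanged, which would not hold for an interval scale such as Celsius.

```python
import math
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (a unitless ratio)."""
    return stdev(values) / mean(values)

heights_cm = [160.0, 165.0, 170.0, 175.0, 180.0]  # hypothetical sample
heights_m = [h / 100 for h in heights_cm]

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)

print(round(cv_cm, 4))            # 0.0465
print(math.isclose(cv_cm, cv_m))  # True: the unit cancels out
```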
