The Engineer’s Guide to Machine Learning: Data Processing | Data Types

Introduction

Wow, what a crazy couple of months. I’ve started a new job, and the change has understandably taken up a huge part of my time! Luckily, things have quieted down a bit and I can get back to my writing. So, let's jump right in!

Data processing: Data Types

Machine learning, deep learning, and AI are fancy number crunchers that can produce amazing results given good data. The first step, however, is to properly understand your data so you can make informed decisions about which algorithms and data-cleaning methods to use, and one of the first parts of understanding your data is knowing what kind of data you have! Here are the four most common types of data you will come across.

Nominal Data

Nominal data is the least informative of the four data types. These are variables with no inherent order or ranking. Nominal scales could simply be called labels: the categories are mutually exclusive, and none of them carries any numerical significance. An easy way to remember this is that “nominal” sounds a lot like “name”, and nominal values behave a lot like names.
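
To make this concrete, here is a minimal sketch in plain Python, assuming a hypothetical eye-colour column (the data and the `one_hot` helper are illustrative, not part of any library). It shows two things that are safe to do with nominal data: count labels to find the most common one (the mode), and one-hot encode them so a model cannot read an order into labels that have none.

```python
from collections import Counter

# Hypothetical nominal variable: eye colour has no natural order.
eye_colours = ["brown", "blue", "green", "brown", "blue", "brown"]

# The mode (most frequent label) is a meaningful summary for nominal data.
mode_label, mode_count = Counter(eye_colours).most_common(1)[0]

# Fix a column order for the encoding; sorting is arbitrary but repeatable.
categories = sorted(set(eye_colours))  # ['blue', 'brown', 'green']

def one_hot(label, categories):
    """Return a 0/1 list with a single 1 at the label's position."""
    return [1 if label == c else 0 for c in categories]

# Each label gets its own column, so "green" is never treated as "more" than "blue".
encoded = [one_hot(c, categories) for c in eye_colours]

print(mode_label, mode_count)  # brown 3
print(encoded[0])              # [0, 1, 0]
```

Encoding the labels as 0, 1, 2 instead would quietly invent an order and equal spacing, which is exactly the information nominal data does not have.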

Ordinal Data

Ordinal variables can be ranked by order (rank, position), but the relative difference between values is not known. Like nominal data, ordinal data is categorical. “Ordinal” is easy to remember because it sounds like “order”, and that is the key point about ordinal scales: the order matters, but that is all you really get from them.
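
A small sketch of what that means in practice, assuming a hypothetical five-level customer satisfaction scale (the labels and the `RANK` mapping are illustrative). Each label is replaced by its position in the order; comparisons and the median are meaningful because they depend only on that order.

```python
import statistics

# Hypothetical survey scale, listed from lowest to highest.
LEVELS = ["very unsatisfied", "unsatisfied", "neutral", "satisfied", "very satisfied"]
RANK = {level: i for i, level in enumerate(LEVELS)}

responses = ["satisfied", "neutral", "very satisfied", "satisfied", "unsatisfied"]

# Ordinal encoding: replace each label by its position in the order.
codes = [RANK[r] for r in responses]  # [3, 2, 4, 3, 1]

# Order comparisons are meaningful...
print(RANK["satisfied"] > RANK["neutral"])  # True

# ...and so is the median, because it only depends on the ranking.
# median_low always returns an actual code, so it maps back to a real label.
median_level = LEVELS[statistics.median_low(codes)]
print(median_level)  # satisfied
```

Averaging `codes` would assume the gap between “neutral” and “satisfied” equals the gap between “satisfied” and “very satisfied”, which an ordinal scale does not tell us.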

Interval Data

Interval scales are numerical scales where we know both the order and the exact differences between values. The classic example of an interval scale is temperature in Celsius. The difference between 60 and 50 degrees is a measurable 10 degrees, as is the difference between 80 and 70 degrees. However, there is no true zero (0 °C does not mean “no temperature”), so ratios are not meaningful: 20 °C is not twice as hot as 10 °C.

Interval scales are nice because they open up the realm of statistical analysis: you can compute the mode, median, mean, and standard deviation. Like the others, the key point of an “interval scale” is easy to remember. “Interval” itself means “space in between”, and that is the important thing: interval scales tell us not only about order but also about the distance between values.
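
The Celsius example can be checked with a short sketch, assuming a few hypothetical readings. Converting to Fahrenheit (another interval scale with a different, equally arbitrary zero) keeps equal differences equal but changes ratios, which is why differences, means, and standard deviations are meaningful here while ratios are not.

```python
import statistics

def c_to_f(celsius):
    """Convert Celsius to Fahrenheit; both are interval scales with arbitrary zeros."""
    return celsius * 9 / 5 + 32

temps_c = [50, 60, 70, 80]  # hypothetical readings in °C

# Differences are meaningful: equal 10 °C steps stay equal (18 °F) after a unit change.
diffs_c = [b - a for a, b in zip(temps_c, temps_c[1:])]
diffs_f = [c_to_f(b) - c_to_f(a) for a, b in zip(temps_c, temps_c[1:])]
print(diffs_c)  # [10, 10, 10]
print(diffs_f)  # [18.0, 18.0, 18.0]

# Ratios are not: "20 °C is twice 10 °C" stops being true once the unit changes.
print(20 / 10)                  # 2.0
print(c_to_f(20) / c_to_f(10))  # 1.36

# Mean, median and standard deviation are all meaningful on an interval scale.
print(statistics.mean(temps_c), statistics.median(temps_c), statistics.stdev(temps_c))
```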

Ratio Data

Ratio scales are the ultimate nirvana when it comes to measurement scales because they tell us the order, they tell us the exact difference between values, AND they have an absolute zero, which allows a wide range of both descriptive and inferential statistics to be applied. At the risk of repeating myself, everything above about interval data applies to ratio scales, plus ratio scales have a clear definition of zero. Good examples of ratio variables include height and weight.

Ratio scales provide a wealth of possibilities for statistical analysis. These variables can be meaningfully added, subtracted, multiplied, and divided (hence “ratio”). Central tendency can be measured by the mode, median, or mean, and measures of dispersion such as the standard deviation and the coefficient of variation can also be calculated.
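
Here is a sketch using hypothetical heights. Because 0 cm and 0 in both mean “no height”, ratios survive a change of units, and so does the coefficient of variation (sample standard deviation divided by the mean). The `coefficient_of_variation` helper is written out by hand for illustration.

```python
import math
import statistics

CM_PER_INCH = 2.54

heights_cm = [150.0, 160.0, 170.0, 180.0]  # hypothetical sample
heights_in = [h / CM_PER_INCH for h in heights_cm]

# Ratios are meaningful: "1.2 times as tall" holds whichever unit you use.
ratio_cm = heights_cm[3] / heights_cm[0]
ratio_in = heights_in[3] / heights_in[0]
print(ratio_cm)                          # 1.2
print(math.isclose(ratio_cm, ratio_in))  # True

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (a unit-free number)."""
    return statistics.stdev(values) / statistics.mean(values)

# The coefficient of variation is itself a ratio, so it does not depend on units.
cv_cm = coefficient_of_variation(heights_cm)
cv_in = coefficient_of_variation(heights_in)
print(round(cv_cm, 4))
print(math.isclose(cv_cm, cv_in))  # True
```

On an interval scale such as Celsius, the same calculation would change whenever the zero point moves, which is why the coefficient of variation is reserved for ratio data.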

Summary

In summary, nominal variables are used to “name”, or label, a series of values. Ordinal scales provide good information about the order of choices, such as in a customer satisfaction survey. Interval scales give us the order of values plus the ability to quantify the difference between each one. Finally, ratio scales give us the ultimate: order, interval values, and the ability to calculate ratios, since a “true zero” can be defined.
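
The same hierarchy can be written down as a lookup, which can be handy as a sanity check before summarizing a column. This is a minimal sketch: the `SCALES` list, the `ADDED` table, and `meaningful_stats` are hypothetical names, and the table only covers the statistics mentioned in this article. Each scale inherits everything allowed on the scales below it.

```python
# Scales from least to most informative.
SCALES = ["nominal", "ordinal", "interval", "ratio"]

# Statistics that first become meaningful at each scale.
ADDED = {
    "nominal": ["mode"],
    "ordinal": ["median"],
    "interval": ["mean", "standard deviation"],
    "ratio": ["coefficient of variation"],
}

def meaningful_stats(scale):
    """Return every summary statistic that is meaningful on the given scale."""
    idx = SCALES.index(scale)  # raises ValueError for an unknown scale name
    stats = []
    for s in SCALES[: idx + 1]:
        stats.extend(ADDED[s])
    return stats

print(meaningful_stats("ordinal"))  # ['mode', 'median']
print(meaningful_stats("ratio"))
```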

Thank you for reading! I value your comments and shares and would love to connect on Twitter, LinkedIn, and Facebook. For updates on the most recent and interesting Machine Learning research papers out there, subscribe to AI Scholar Weekly. Cheers!


Michelle Castillo

Full-stack Digital marketer | Content strategist | Ecommerce and social media nerd

5 年

Nice, easily digestible breakdown of data types

回复
Lorenzo B.

Business Development Manager @WujiangChanghua | Industrial Automation (OEM) | HVAC | Italian Manufacturing

5 年

Working right now on ML for music?

回复

要查看或添加评论,请登录

More articles by Christopher D.

社区洞察

其他会员也浏览了