登录查看更多内容

Grouping Data

Roland Schubert

Entscheidungs- und ergebnisorienierte Datenanalyse für Finance, Marketing & HR - Von der Strategie bis zur Umsetzung | Planungs- und Analyseprozesse effizient gestalten, optimieren und automatisieren | Alteryx ACE

发布日期: 2022年12月20日

Grouping data in some way is an essential part of day-to-day business for data analysts. Many people immediately think of classification or clustering, probably these are the two terms that first come to a Data Scientist's mind.

But sometimes (actually most of the time) things are much simpler - you just want to group products according to their revenue, suppliers according to their on-time delivery or customers based on their payment history - there is simply a criterion (or several criteria) that decide who belongs to a group.

In a simple example, the following data could be available:

Es wurde kein Alt-Text für dieses Bild angegeben.

To form groups here, you can of course work with a FORMULA tool and use an appropriate IF construction:

You could also use a sequence of FILTER tools that filter out the respective group one by one:

In both cases, the threshold of the specific groups must be defined and implemented in a condition. These values are always valid and independent of the data supplied - in many cases this may be correct.?

Often, however, this logic is not really fitting. If the products are to be classified according to their cumulative revenues, the threshold values would have to be adjusted every month - not really nice.

But there certainly is the TILE tool - and this simplifies the process of grouping and makes it much more flexible.

Of course, you can also specify the respective limit values for individual groups within the TILE tool:

As a result, the TILE tool returns a group number (Tile_Num) for the different data records, in addition they are numbered, however, independent of the data values, but exclusively according to the order in the table.?

But be careful: The TILE tool always specifies the upper limit of a group - i.e. a "<=" always applies to the "manual" setting! There is no chance to change this operator!

It gets really interesting, however, if we take a look at the other options offered by the TILE tool. Besides the manual grouping, there are several other methods:

领英推荐

How to Answer Data Questions for Non-Technical…

Quantum Analytics NG 2 个月前

Amazed by the power data holds

Endava 2 年前

10 Mistakes Every Data Analyst Should Avoid

Walter Shields 1 个月前

Equal Sum:

The data records are divided into a predefined number of groups so that the sum of the values in the different groups is identical. Sorting can be done in advance (always in ascending order), but it is not required.

The result can show, for example, in the style of an ABC analysis, the importance of the individual products for the total revenue; each of the groups has the same total revenue, but group 1 only contains two products, while group 5 includes seven.

Without sorting, by the way, the total is added up in the order delivered - here the result is not related to the size ratios in the table, but to the (random) order.

Equal Records:?

Using the "Equal Records" option, on the other hand, groups of equal size are created - regardless of the values. Without sorting, the records are arranged according to their order; with sorting, groups of equal size are formed, but each group contains records containing high and low values:

Smart Tile:

I especially like the option "Smart Tile" - the data sets are divided according to the standard deviation, so the grouping essentially depends on the structure of the supplied data.?

Data values that fall within +/- the standard deviation are assigned to the group "0", between +/- standard deviation and +/- 2 * standard deviation to the groups "-1" and "1", and the process continues accordingly. Here, neither the number nor the size of the individual groups is fixed in advance.

Of course, the type of grouping must be chosen according to the specific application, but most real-life situations should be covered.

#AlteryxAdventCalendar #alteryxace #alteryxinnovator #analytics?

要查看或添加评论，请登录

Roland Schubert的更多文章

Time For Summary

2022年12月24日

Time For Summary

When I receive a new table or file, I always try to first get an overview. Usually by bringing an INPUT DATA tool onto…
A Little Bit More: Oversampling

2022年12月23日

A Little Bit More: Oversampling

When selecting data records, it is often simply a matter of selecting only the first (or last) data from a table…

1 条评论
Building Groups Based on Relations

2022年12月22日

Building Groups Based on Relations

Grouping is not necessarily a very unusual task - customer groups always come to my mind spontaneously. Common…
Different Types of Correlation

2022年12月21日

Different Types of Correlation

I have to admit it - I intuitively tend to look for relationships between different data. And indeed, I often recognize…
Comparing To Lists

2022年12月19日

Comparing To Lists

Long years ago, a DIY chain in Germany launched a discount campaign entitled "20% discount on all items - except pet…
Famous (or Not-So-Famous) Last Words

2022年12月18日

Famous (or Not-So-Famous) Last Words

Sometimes you just have to have the last word - the last word from a text field, of course. When it comes to "breaking…

1 条评论
Break on Error

2022年12月17日

Break on Error

An error has occurred in a workflow and it continues to run anyway? Sometimes that's all right, but only sometimes…
Year-To-Date Calculations

2022年12月16日

Year-To-Date Calculations

If you are working in Finance/FP&A/Controlling, calculating "Year to Date" (YTD) values is an essential part of your…
Compare Date and Time

2022年12月15日

Compare Date and Time

We often need data only for a specific period of time - a year, a month or a week, sometimes just a few hours, but the…
Sample or Random Sample?

2022年12月14日

Sample or Random Sample?

In Alteryx there are some tools that can be used for very different tasks and whose name can sometimes be confusing…

See all articles

Grouping Data

Roland Schubert

Entscheidungs- und ergebnisorienierte Datenanalyse für Finance, Marketing & HR - Von der Strategie bis zur Umsetzung | Planungs- und Analyseprozesse effizient gestalten, optimieren und automatisieren | Alteryx ACE

领英推荐

Roland Schubert的更多文章

社区洞察

其他会员也浏览了

3 Best Practices for Creating an Effective Data Strategy

How to know your data?

What are the Do’s and Don’ts while analysing data?

How Data Products Change the Game

Data Quality & Cleaning: The Foundation for Reliable Analysis

No Data, No Chocolate

Data Advisory @BigID

Essential guide to understanding Time Series Analysis and different models

Unlocking the Power of Data: Understanding the DIKW Pyramid for Data Analysts

How To Stop Your Data Team Becoming A Bottleneck

领英推荐

Roland Schubert的更多文章

Time For Summary

A Little Bit More: Oversampling

Building Groups Based on Relations

Different Types of Correlation

Comparing To Lists

Famous (or Not-So-Famous) Last Words

Break on Error

Year-To-Date Calculations

Compare Date and Time

Sample or Random Sample?

社区洞察

其他会员也浏览了

3 Best Practices for Creating an Effective Data Strategy

How to know your data?

What are the Do’s and Don’ts while analysing data?

How Data Products Change the Game

Data Quality & Cleaning: The Foundation for Reliable Analysis

No Data, No Chocolate

Data Advisory @BigID

Essential guide to understanding Time Series Analysis and different models

Unlocking the Power of Data: Understanding the DIKW Pyramid for Data Analysts

How To Stop Your Data Team Becoming A Bottleneck