Grouping Data

Grouping Data

Grouping data in some way is an essential part of day-to-day business for data analysts. Many people immediately think of classification or clustering, probably these are the two terms that first come to a Data Scientist's mind.

But sometimes (actually most of the time) things are much simpler - you just want to group products according to their revenue, suppliers according to their on-time delivery or customers based on their payment history - there is simply a criterion (or several criteria) that decide who belongs to a group.

In a simple example, the following data could be available:

Es wurde kein Alt-Text für dieses Bild angegeben.

To form groups here, you can of course work with a FORMULA tool and use an appropriate IF construction:

Es wurde kein Alt-Text für dieses Bild angegeben.

You could also use a sequence of FILTER tools that filter out the respective group one by one:

Es wurde kein Alt-Text für dieses Bild angegeben.

In both cases, the threshold of the specific groups must be defined and implemented in a condition. These values are always valid and independent of the data supplied - in many cases this may be correct.?

Often, however, this logic is not really fitting. If the products are to be classified according to their cumulative revenues, the threshold values would have to be adjusted every month - not really nice.

But there certainly is the TILE tool - and this simplifies the process of grouping and makes it much more flexible.

Of course, you can also specify the respective limit values for individual groups within the TILE tool:

Es wurde kein Alt-Text für dieses Bild angegeben.

As a result, the TILE tool returns a group number (Tile_Num) for the different data records, in addition they are numbered, however, independent of the data values, but exclusively according to the order in the table.?

Es wurde kein Alt-Text für dieses Bild angegeben.

But be careful: The TILE tool always specifies the upper limit of a group - i.e. a "<=" always applies to the "manual" setting! There is no chance to change this operator!

It gets really interesting, however, if we take a look at the other options offered by the TILE tool. Besides the manual grouping, there are several other methods:

Equal Sum:

The data records are divided into a predefined number of groups so that the sum of the values in the different groups is identical. Sorting can be done in advance (always in ascending order), but it is not required.

The result can show, for example, in the style of an ABC analysis, the importance of the individual products for the total revenue; each of the groups has the same total revenue, but group 1 only contains two products, while group 5 includes seven.

Es wurde kein Alt-Text für dieses Bild angegeben.

Without sorting, by the way, the total is added up in the order delivered - here the result is not related to the size ratios in the table, but to the (random) order.

Equal Records:?

Using the "Equal Records" option, on the other hand, groups of equal size are created - regardless of the values. Without sorting, the records are arranged according to their order; with sorting, groups of equal size are formed, but each group contains records containing high and low values:

Es wurde kein Alt-Text für dieses Bild angegeben.

Smart Tile:

I especially like the option "Smart Tile" - the data sets are divided according to the standard deviation, so the grouping essentially depends on the structure of the supplied data.?

Data values that fall within +/- the standard deviation are assigned to the group "0", between +/- standard deviation and +/- 2 * standard deviation to the groups "-1" and "1", and the process continues accordingly. Here, neither the number nor the size of the individual groups is fixed in advance.

Es wurde kein Alt-Text für dieses Bild angegeben.

Of course, the type of grouping must be chosen according to the specific application, but most real-life situations should be covered.


#AlteryxAdventCalendar #alteryxace #alteryxinnovator #analytics?




要查看或添加评论,请登录

Roland Schubert的更多文章

  • Time For Summary

    Time For Summary

    When I receive a new table or file, I always try to first get an overview. Usually by bringing an INPUT DATA tool onto…

  • A Little Bit More: Oversampling

    A Little Bit More: Oversampling

    When selecting data records, it is often simply a matter of selecting only the first (or last) data from a table…

    1 条评论
  • Building Groups Based on Relations

    Building Groups Based on Relations

    Grouping is not necessarily a very unusual task - customer groups always come to my mind spontaneously. Common…

  • Different Types of Correlation

    Different Types of Correlation

    I have to admit it - I intuitively tend to look for relationships between different data. And indeed, I often recognize…

  • Comparing To Lists

    Comparing To Lists

    Long years ago, a DIY chain in Germany launched a discount campaign entitled "20% discount on all items - except pet…

  • Famous (or Not-So-Famous) Last Words

    Famous (or Not-So-Famous) Last Words

    Sometimes you just have to have the last word - the last word from a text field, of course. When it comes to "breaking…

    1 条评论
  • Break on Error

    Break on Error

    An error has occurred in a workflow and it continues to run anyway? Sometimes that's all right, but only sometimes…

  • Year-To-Date Calculations

    Year-To-Date Calculations

    If you are working in Finance/FP&A/Controlling, calculating "Year to Date" (YTD) values is an essential part of your…

  • Compare Date and Time

    Compare Date and Time

    We often need data only for a specific period of time - a year, a month or a week, sometimes just a few hours, but the…

  • Sample or Random Sample?

    Sample or Random Sample?

    In Alteryx there are some tools that can be used for very different tasks and whose name can sometimes be confusing…

社区洞察

其他会员也浏览了