Time For Summary

When I receive a new table or file, I always try to get an overview first, usually by bringing an INPUT DATA tool onto the canvas, connecting it to a BROWSE tool, and running the workflow.

Our sample data is pretty up-to-date - a lot of data is collected at soccer matches nowadays, and at the end of a World Cup there is a lot to analyze. For example, the data shows very clearly that the German team was not that bad according to expected goals (xg1, xg2) - but only the goals actually scored count ... and we won't talk about that any more.

But let's have a look at the result in the Results Window first:

[Screenshot: BROWSE tool output in the Results window]

This gives you a very quick overview and a first feel for field contents and data quality. Alternatively, you can look at the field types by simply switching to "Metadata":

[Screenshot: field types in the Metadata view]

If you need more details, that's possible too. For the selected field, the BROWSE tool provides additional information: depending on the field type, it shows parameters such as minimum, maximum, average or field lengths.

[Screenshot: BROWSE tool details for the selected field]

Basically, the BROWSE tool provides us with all the information we need, but going into more detail takes a few extra steps.

But there is also another approach that gives us a summary overview in one step. For this we can use the FIELD SUMMARY tool.

Here we just have to select the fields for which we want to display information. By default, no field is selected, so you have to go into the configuration in any case.

[Screenshot: field selection in the FIELD SUMMARY tool configuration]

If the dataset is very large, we can also use only a representative part of the data and specify the number or proportion of records to be used. This can save considerable time, but of course we sacrifice accuracy. In each individual case it must be decided whether the speed gained is worth the loss of accuracy.
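The trade-off between speed and accuracy can be illustrated outside of Alteryx with a minimal Python sketch. It is not how the FIELD SUMMARY tool works internally; the function `sample_records` and the generated values are hypothetical and only show how statistics from a subset can drift from those of the full data.

```python
import random
import statistics


def sample_records(records, n=None, fraction=None, seed=0):
    """Return a random subset, given either a record count or a proportion."""
    if (n is None) == (fraction is None):
        raise ValueError("specify exactly one of n or fraction")
    if fraction is not None:
        if not 0 < fraction <= 1:
            raise ValueError("fraction must be in (0, 1]")
        n = max(1, round(len(records) * fraction))
    n = min(n, len(records))
    return random.Random(seed).sample(records, n)


# Hypothetical numeric field with 100,000 generated values (not the article's data).
rng = random.Random(42)
full = [rng.gauss(1.4, 0.8) for _ in range(100_000)]

# Use 1% of the records, as one might for a very large table.
subset = sample_records(full, fraction=0.01)

for label, values in (("full", full), ("sample", subset)):
    print(label, len(values), min(values), max(values), statistics.fmean(values))
```

The mean of the subset is usually close to the full mean, while minimum and maximum are the figures most likely to be missed, because extreme values are rare and may not land in the sample.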

For date fields, however, all records are always used. For this field type the tool determines the interval (daily, weekly, monthly), and a partial dataset is not sufficient for that.

[Screenshot: FIELD SUMMARY tool configuration]
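Why a partial dataset is not enough for date fields can be shown with a small Python sketch. The function `detect_interval` is a hypothetical illustration that looks at the gaps between consecutive dates; it is an assumption about the general idea, not the algorithm Alteryx actually uses.

```python
from datetime import date, timedelta


def detect_interval(dates):
    """Guess a regular interval from the gaps between distinct, sorted dates."""
    ordered = sorted(set(dates))
    if len(ordered) < 2:
        return None
    pairs = list(zip(ordered, ordered[1:]))
    gaps = {(b - a).days for a, b in pairs}
    if gaps == {1}:
        return "daily"
    if gaps == {7}:
        return "weekly"
    # Monthly: same day of month, each date one calendar month after the last.
    month_steps = {(b.year - a.year) * 12 + (b.month - a.month) for a, b in pairs}
    same_day = len({d.day for d in ordered}) == 1
    if month_steps == {1} and same_day:
        return "monthly"
    return None  # no regularity found


# Hypothetical date field: 28 consecutive days.
start = date(2022, 11, 20)
daily = [start + timedelta(days=i) for i in range(28)]

print(detect_interval(daily))       # complete data, one-day gaps
print(detect_interval(daily[::7]))  # every seventh day, seven-day gaps

# A partial dataset drops records, so the gaps become irregular.
partial = [daily[0], daily[1], daily[5], daily[6], daily[20]]
print(detect_interval(partial))
```

With all records the one-day gaps are recognized. The partial list has gaps of 1, 4 and 14 days, so no interval can be identified from it.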

When configured correctly, the FIELD SUMMARY tool provides a very comprehensive overview in a concentrated form:

[Screenshot: FIELD SUMMARY tool output]

For each field, the tool returns one data record with the respective information. In addition to the field name and field category (numeric, text, date, spatial), further descriptive parameters are shown depending on the category: for numeric data, for example, minimum and maximum; for spatial data, the type (point, line, area); for date fields, the first and last date as well as the interval (day, month, year). For our data the interval cannot be determined, because no regularity can be identified.
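The idea of one summary record per field, with parameters that depend on the field category, can be sketched in plain Python. This is only an illustration under assumptions: the functions `summarize_field` and `summarize_table`, the field names other than xg1, and all values are made up, and the real tool reports more parameters than shown here.

```python
from datetime import date
from numbers import Number


def summarize_field(name, values):
    """Build one summary record for a field; the keys depend on its category."""
    present = [v for v in values if v is not None]
    record = {"name": name, "count": len(values), "missing": len(values) - len(present)}
    if not present:
        record["category"] = "unknown"
    elif all(isinstance(v, Number) and not isinstance(v, bool) for v in present):
        record.update(category="numeric", min=min(present), max=max(present),
                      mean=sum(present) / len(present))
    elif all(isinstance(v, date) for v in present):
        record.update(category="date", first=min(present), last=max(present))
    else:
        lengths = [len(str(v)) for v in present]
        record.update(category="text", shortest=min(lengths), longest=max(lengths))
    return record


def summarize_table(rows):
    """Return one summary record per field of a list of row dictionaries."""
    field_names = list(rows[0]) if rows else []
    return [summarize_field(n, [row.get(n) for row in rows]) for n in field_names]


# Made-up rows for illustration only (not the article's match data).
rows = [
    {"team1": "Team A", "xg1": 3.1, "date": date(2022, 11, 23)},
    {"team1": "Team B", "xg1": 0.9, "date": date(2022, 11, 27)},
    {"team1": "Team CC", "xg1": None, "date": date(2022, 12, 1)},
]

summary = summarize_table(rows)
for record in summary:
    print(record)
```

Because the result is ordinary data (a list of records), it can be filtered or joined further, for example to list only the fields with missing values.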

Additionally, for numeric data the distribution is visualized; this representation can be displayed using the BROWSE tool.

[Screenshot: distribution of a numeric field displayed in the BROWSE tool]

Most of the key figures can of course also be determined using other tools, e.g. the BROWSE tool. However, there is a crucial difference: the BROWSE tool does not have an output anchor, whereas the FIELD SUMMARY tool does, so its results can be used further along in the workflow. In addition, all descriptive variables can be seen at a glance.

In short - an easy to create, very comprehensive summary of the contents of a table or file with a variety of uses.

Author: Roland Schubert