Time For Summary
Roland Schubert
Decision- and results-oriented data analysis for Finance, Marketing & HR: from strategy to implementation | Designing, optimizing and automating planning and analysis processes efficiently | Alteryx ACE
When I receive a new table or file, I always try to get an overview first, usually by dragging an INPUT DATA tool onto the canvas, connecting it to a BROWSE tool, and running the workflow.
Our sample data is pretty up-to-date: a lot of data is collected at soccer matches nowadays, and at the end of a World Cup there is plenty to analyze. For example, the data shows quite clearly that, going by expected goals (xg1, xg2), the German team was not that bad. But only the goals actually scored count ... and we won't dwell on that any further.
But let's have a look at the result in the Results Window first:
This gives you a very quick overview and a first feel for field contents and data quality. Alternatively, you can look at the field types by simply switching to "Metadata":
If you need more detail, that is possible too: for the selected field, the BROWSE tool shows additional information, with parameters such as minimum, maximum, average or field length depending on the field type.
In principle, the BROWSE tool gives us all the information we need, but going into more detail takes a few extra steps.
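The BROWSE tool does this at the click of a button. Purely to illustrate the idea of statistics that depend on the field type, here is a minimal Python sketch. It is not how Alteryx implements it, and the function name `describe_field` and the sample values are made up for illustration.

```python
from datetime import date
from statistics import mean


def describe_field(values):
    """Return type-dependent statistics for one field (hypothetical helper).

    None is treated as a null value and excluded from min/max/avg.
    """
    present = [v for v in values if v is not None]
    stats = {"count": len(values), "nulls": len(values) - len(present)}
    if not present:
        return stats
    if all(isinstance(v, (int, float)) for v in present):
        # Numeric field: minimum, maximum and average
        stats.update(min=min(present), max=max(present), avg=mean(present))
    elif all(isinstance(v, str) for v in present):
        # Text field: shortest and longest value length
        lengths = [len(v) for v in present]
        stats.update(min_length=min(lengths), max_length=max(lengths))
    elif all(isinstance(v, date) for v in present):
        # Date field: first and last date
        stats.update(first=min(present), last=max(present))
    return stats


# Made-up expected-goals values, including one null
print(describe_field([0.8, 2.1, None, 1.4]))
print(describe_field(["Germany", "Japan", None]))
```

The same field list yields different key figures depending on its type, which is exactly what the BROWSE tool shows for the selected field.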
But there is also another approach that gives us a summary overview in one step: the FIELD SUMMARY tool.
Here we just have to select the fields we want information on. By default, no field is selected, so you always have to open the configuration.
If the dataset is very large, we can also use only a representative part of it by specifying the number or percentage of records to use. This can save considerable time, but of course at the cost of accuracy. Whether the time saved is worth it has to be decided case by case.
For date fields, however, all records are always used: for this field type the tool determines the interval (daily, weekly, monthly), and a partial dataset is not sufficient for that.
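To make the speed-versus-accuracy trade-off concrete, here is a minimal Python sketch that draws a subset either by record count or by fraction. It assumes a simple random sample. The FIELD SUMMARY tool's own sampling method may differ, and `sample_records` is a hypothetical helper.

```python
import random
from statistics import mean


def sample_records(records, n=None, fraction=None, seed=42):
    """Return a random subset of records: either n records or a fraction of them.

    Hypothetical helper; a fixed seed keeps the sample reproducible.
    """
    if (n is None) == (fraction is None):
        raise ValueError("specify exactly one of n or fraction")
    if fraction is not None:
        n = round(len(records) * fraction)
    n = min(n, len(records))  # never ask for more records than exist
    return random.Random(seed).sample(records, n)


full = list(range(1, 100_001))  # a made-up numeric field with 100,000 values
subset = sample_records(full, fraction=0.01)

# The exact mean is 50000.5; the sample mean is only an estimate of it.
print(len(subset), mean(full), mean(subset))
```

Working on 1% of the records is much cheaper, but every key figure computed from `subset` is an approximation of the value for the full field.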
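Why a sample is not enough for date fields becomes clear in a small sketch: dropping records leaves gaps, and evenly spaced dates suddenly look irregular. The following Python code is an assumption about how such an interval check could work (consecutive dates exactly 1 day, 7 days or one calendar month apart), not the tool's actual logic, and `detect_interval` is a hypothetical name.

```python
from datetime import date, timedelta


def detect_interval(dates):
    """Classify dates as 'daily', 'weekly', 'monthly' or None (no regularity).

    Assumed rule: every gap between consecutive distinct dates must be the same
    1 day, 7 days, or one calendar month on the same day of the month.
    Month-end series (31 Jan, 28 Feb, ...) are not handled in this sketch.
    """
    days = sorted(set(dates))
    if len(days) < 2:
        return None
    gaps = {(b - a).days for a, b in zip(days, days[1:])}
    if gaps == {1}:
        return "daily"
    if gaps == {7}:
        return "weekly"
    month_steps = {(b.year - a.year) * 12 + (b.month - a.month)
                   for a, b in zip(days, days[1:])}
    if month_steps == {1} and len({d.day for d in days}) == 1:
        return "monthly"
    return None


daily = [date(2022, 11, 20) + timedelta(days=i) for i in range(29)]
# Dropping every third record, as a sample might, leaves gaps of 1 and 2 days
thinned = [d for i, d in enumerate(daily) if i % 3 != 1]

print(detect_interval(daily))    # daily
print(detect_interval(thinned))  # None
```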
When configured correctly, the FIELD SUMMARY tool provides a very comprehensive overview in a concentrated form:
For each field, the tool returns one data record with the respective information. Besides the field name and field category (numeric, text, date, spatial), further descriptive parameters are shown depending on the category: for numeric data, for example, minimum and maximum; for spatial data, the type (point, line, area); for date fields, the interval (day, month, year) in addition to the first and last date. For our data, no interval is reported because no regularity can be identified in the dates.
In addition, for numeric data the distribution is visualized; this chart can be viewed with the BROWSE tool.
Most of these key figures can of course also be determined with other tools, e.g. the BROWSE tool. However, there is a crucial difference: the BROWSE tool has no output anchor, whereas the FIELD SUMMARY tool does, so its results can be used further on in the workflow. In addition, all descriptive variables can be seen at a glance.
In short: an easy-to-create, very comprehensive summary of the contents of a table or file, with a variety of uses.
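What the output anchor buys us can be sketched in plain Python: if the summary is itself a table with one record per field, any later step can filter or join it like ordinary data. The helper `field_summary`, its two categories and the sample records are simplifying assumptions for illustration only.

```python
from statistics import mean


def field_summary(rows, fields):
    """Return one summary record (a dict) per selected field.

    Hypothetical helper: only 'numeric' and 'text' categories are distinguished,
    and a missing key or None counts as a null.
    """
    summary = []
    for name in fields:
        values = [r.get(name) for r in rows if r.get(name) is not None]
        numeric = bool(values) and all(isinstance(v, (int, float)) for v in values)
        record = {"field": name,
                  "category": "numeric" if numeric else "text",
                  "nulls": len(rows) - len(values)}
        if numeric:
            record.update(min=min(values), max=max(values), avg=mean(values))
        summary.append(record)
    return summary


rows = [  # made-up records; field names loosely follow the article's xg1/xg2
    {"team1": "A", "team2": "B", "xg1": 1.2, "xg2": 0.4},
    {"team1": "C", "team2": "D", "xg1": 2.0, "xg2": None},
]
summary = field_summary(rows, ["team1", "xg1", "xg2"])

# Because the summary is ordinary data, a later step can work with it,
# e.g. list every field that contains nulls.
fields_with_nulls = [s["field"] for s in summary if s["nulls"] > 0]
print(fields_with_nulls)  # ['xg2']
```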