SAP BW Data Mining Analytics: Clustering Reporting

Sergey Lukyanchikov

AI Automation Expert

Published: September 13, 2021

Summary

Clustering analysis is another standard method available in SAP BW Data Mining. Models based on this method can combine various parameters (e.g., maximum number of clusters, minimum fraction of inter-cluster hops per iteration) to implement different clustering approaches. The method’s clustering-specific reporting makes it possible to analyze the modeling results. In this paper we discuss extensions to the standard reporting that improve insight into the results of clustering modeling. These extensions are implemented via the following analytics:

Dashboard - SAP BW Data Mining Clustering Reporting

Business Requirements

We will focus the discussion on the method-specific (not problem-specific) indicators that are included in the standard clustering reporting in BW Data Mining. In other words, we will consider neither the part of the standard BW Data Mining reporting that visualizes clusters and their attributes (the variables participating in the definitions of clusters) nor the cluster influence coefficients for particular clustering models. Instead, we will focus on the indicators that provide insight into the volume of the models’ input data and into the quality of segmentation achieved by the models.

The method-specific indicators mentioned above can be viewed via either the modeling results overviews of models involved in analysis processes (transaction RSANWB, display the analysis process, right-click on the model and select to view modeling results) or directly via the modeling results overviews in model definitions (transaction RSDMWB, display the model, choose the modeling results button in the model’s toolbar).

An example of visualization available via the standard reporting of modeling results is provided in the below screenshot:

The standard visualization functionality covers the basic needs of a user who would like to obtain insight into the results of clustering modeling. Based on our practical experience with clustering modeling in SAP BW Data Mining, the following additional business requirements could be suggested:

Ability to browse specific clustering models to visualize their most important details without clicking into those models and studying them via a Display/Edit mode
Ability to visualize the method-specific indicators of clustering models without going into modeling results overviews of individual processes and models
Ability to visualize the method-specific indicators per model variable with sufficient precision (i.e. not rounded to an integer or to only three digits after the decimal point)
Ability to select and visualize in a graphical mode a meaningful combination of the method-specific indicators (and to obtain additional valuable insight from the modeling results)
Ability to select and visualize only the details (indicators and data properties) of specific model variables

Analytics

The implementation of the above business requirements in the “SAP BW Data Mining Clustering Reporting” dashboard is based on combining the functionality of the “SAP BW Data Mining Model Reporting” dashboard (find more details on this dashboard in SAP BW Data Mining Analytics: Model Reporting) with insight that is specific for the SAP_CLUSTERING method.

At startup, the “SAP BW Data Mining Clustering Reporting” dashboard displays four tabs:

Model Master: contains the overall list of SAP BW Data Mining models defined in our system with their most important data properties, plus a set of controls for browsing and filtering that list
Clustering Models – Distances Analysis: contains the list of models based on the SAP_CLUSTERING method with their variables and variable values, value frequencies (total percentage of a variable’s values assigned to data points in a specific cluster) and intra-cluster distance indicators, plus a set of controls for browsing and filtering that list
Clustering Models – Values Analysis: contains the list of models based on the SAP_CLUSTERING method with their variables and variable values and the minimum and maximum limits of each value, plus a set of controls for browsing and filtering that list
Clustering Models – Graph: contains a bubble chart that visualizes the average intra-cluster distance indicator for the models and variables chosen using the other tabs

The selectors of the dashboard match the columns of the lists and allow limiting the models and variables visualized via the lists to specific criteria. Each time a specific value is selected, the respective selector’s status indicator turns green.

The following columns have been enabled in the list at the Model Master tab (see the screenshot below):

Model ID – a unique identifier, the “technical name” of the model in the overall model list
Model Field Name – an identifier of a variable (field) of the model in the overall model list
Modeling Method – the modeling method on which the model is based
Version – the version of the model
Field Data Type – the type of the data contained in a model field
Field Content Type – the role that a model field plays in the model
Field Is Predictable – contains “X” if the field is a predictable field, i.e. one that is filled with prediction results
Character Field Length – the total length of a character field
Numeric Field Length – the total length of a numeric field
Numeric Field Precision – the number of digits after the decimal point in a numeric field

The following columns have been enabled in the list at the Clustering Models – Distances Analysis tab (see the screenshot below):

Model ID – a unique identifier, the “technical name” of the model in the clustering-specific model list
Cluster ID – an identifier of a cluster that has been generated by the model in the clustering-specific model list
Model Field Name – an identifier of a variable (field) of the model in the clustering-specific model list
Attribute Value ID – an identifier of a variable’s value
Attribute Value Frequency – the total percentage of a variable’s values assigned to data points in a specific cluster
Count of Data Points – the number of data points loaded into a clustering model during its training that have been assigned to a specific cluster
Minimum Distance – an indicator of the minimum intra-cluster distance per a specific cluster
Maximum Distance – an indicator of the maximum intra-cluster distance per a specific cluster
Average Distance – an indicator of the average intra-cluster distance per a specific cluster
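
To make the three distance columns and the Count of Data Points column more concrete, the following minimal Python sketch derives comparable per-cluster indicators from a toy data set. The source does not state how SAP BW Data Mining measures intra-cluster distance; the sketch assumes the Euclidean distance of each data point to the centroid of its cluster. The function name `cluster_distance_summary` and all sample values are hypothetical.

```python
import math
from collections import defaultdict


def cluster_distance_summary(points, assignments, centroids):
    """Return count, minimum, maximum and average distance per cluster.

    Assumption: "intra-cluster distance" is taken as the Euclidean distance
    between a data point and the centroid of the cluster it is assigned to.
    """
    distances = defaultdict(list)
    for point, cluster_id in zip(points, assignments):
        distances[cluster_id].append(math.dist(point, centroids[cluster_id]))
    return {
        cluster_id: {
            "count": len(values),
            "min_distance": min(values),
            "max_distance": max(values),
            "avg_distance": sum(values) / len(values),
        }
        for cluster_id, values in distances.items()
    }


# Hypothetical toy data: four data points assigned to two clusters.
points = [(0.0, 0.0), (3.0, 4.0), (10.0, 10.0), (10.0, 16.0)]
assignments = [1, 1, 2, 2]
centroids = {1: (0.0, 0.0), 2: (10.0, 10.0)}

summary = cluster_distance_summary(points, assignments, centroids)
for cluster_id, indicators in sorted(summary.items()):
    print(cluster_id, indicators)
```

Each entry of `summary` corresponds to one cluster and carries the same four indicators that the list exposes per Cluster ID.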

The following columns have been enabled in the list at the Clustering Models – Values Analysis tab (see the screenshot below):

Model ID – a unique identifier, the “technical name” of the model in the clustering-specific model list
Model Field Name – an identifier of a variable (field) of the model in the clustering-specific model list
Model Field Content Type – the role that a model variable plays in the model
Attribute Value ID – an identifier of a variable’s value
Attribute Value – the variable value (exact for discrete variables or a binning range for continuous variables)
Minimum Value – 0 for discrete variables (because of exact attribute values) or the lower limit of a binning range for continuous variables
Maximum Value – 0 for discrete variables (because of exact attribute values) or the upper limit of a binning range for continuous variables
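
The relationship between Attribute Value ID, Attribute Value, Minimum Value and Maximum Value can be illustrated with a small lookup. This is only a sketch under stated assumptions: the `AttributeValueRow` class, the `continuous` flag, the sample rows and the choice of a half-open range (lower limit inclusive, upper limit exclusive) are hypothetical and are not taken from SAP BW Data Mining.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union


@dataclass(frozen=True)
class AttributeValueRow:
    """One hypothetical row of the values list for a single model field."""
    value_id: int
    attribute_value: str  # exact value, or a label describing the binning range
    minimum: float        # 0 for discrete variables
    maximum: float        # 0 for discrete variables
    continuous: bool      # hypothetical flag: True if the row is a binning range


def find_value_id(rows: Sequence[AttributeValueRow],
                  raw: Union[str, float]) -> Optional[int]:
    """Return the Attribute Value ID that a raw value falls under, if any."""
    for row in rows:
        if row.continuous:
            # Assumption: ranges include the lower and exclude the upper limit.
            if isinstance(raw, (int, float)) and row.minimum <= raw < row.maximum:
                return row.value_id
        elif str(raw) == row.attribute_value:
            return row.value_id
    return None


# Hypothetical rows for a continuous and a discrete variable.
range_rows = [
    AttributeValueRow(1, "0 - 30", 0.0, 30.0, True),
    AttributeValueRow(2, "30 - 60", 30.0, 60.0, True),
]
discrete_rows = [
    AttributeValueRow(1, "RED", 0.0, 0.0, False),
    AttributeValueRow(2, "BLUE", 0.0, 0.0, False),
]

print(find_value_id(range_rows, 35.0))
print(find_value_id(discrete_rows, "BLUE"))
```

For discrete variables only the exact Attribute Value is compared, which is why their Minimum Value and Maximum Value carry no information and are shown as 0.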

Finally, in the Clustering Models – Graph tab we obtain graphical visualization of the average intra-cluster distance indicators per specific clusters of a specific model. In the below screenshot we can see all of the clusters generated by the PIO_MRO_CL model (X-axis corresponding to the cluster IDs, Y-axis corresponding to the average intra-cluster distances and bubble size reflecting the count of data points assigned to a specific cluster):

Typical Use Cases

The following are examples of typical use cases in which the SAP BW Data Mining Clustering Reporting dashboard could bring benefits:

1) A data mining specialist would like to visualize the models with Y as the predictable variable and to study the method-specific indicators of those models that are based on the SAP_CLUSTERING method.

Use scenario: in the Model Master tab, select the records that correspond to the SAP_CLUSTERING method using the Modeling Method selector, then further limit your selection by choosing Y in the Model Field Name selector and X in the Field Is Predictable selector. The dropdown list of the Model ID selector will contain the technical names of the models we are interested in. Choose those models one by one in the Model ID selector and study their method-specific indicators in the Clustering Models – Distances Analysis, Clustering Models – Values Analysis and Clustering Models – Graph tabs.

2) A data mining specialist would like to visualize the models based on the SAP_CLUSTERING method that contain the variable Y and have generated up to 10 clusters.

Use scenario: in the Model Master tab, select the records that correspond to the SAP_CLUSTERING method using the Modeling Method selector, and then further limit your selection by choosing Y in the Model Field Name selector. The model list in the Model Master tab will display the technical names of the models matching all of the above criteria except for having generated up to 10 clusters. To apply this last criterion, switch to the Clustering Models – Distances Analysis tab and choose 10 in the Cluster ID selector (if 10 is not available, there are no models that match this criterion). The model list in the Clustering Models – Distances Analysis tab will display the models satisfying all of the above criteria.
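
The selector steps of this scenario amount to intersecting two filtered lists. The sketch below mirrors that logic on hypothetical in-memory rows; the dictionary keys, the model names and the function `select_models` are illustrative assumptions and do not reflect actual SAP BW table or field names.

```python
def select_models(master_rows, distance_rows, method, field_name, cluster_id):
    """Return the models that pass the selectors of both tabs.

    master_rows   -- rows of the overall model list (one row per model field)
    distance_rows -- rows of the clustering-specific list (one row per cluster)
    """
    # Model Master tab: Modeling Method and Model Field Name selectors.
    by_method_and_field = {
        row["model_id"]
        for row in master_rows
        if row["method"] == method and row["field_name"] == field_name
    }
    # Distances Analysis tab: Cluster ID selector.
    with_cluster = {
        row["model_id"]
        for row in distance_rows
        if row["cluster_id"] == cluster_id
    }
    return sorted(by_method_and_field & with_cluster)


# Hypothetical rows; none of these models exists in the source.
master_rows = [
    {"model_id": "M_A", "field_name": "Y", "method": "SAP_CLUSTERING"},
    {"model_id": "M_B", "field_name": "Y", "method": "SAP_CLUSTERING"},
    {"model_id": "M_C", "field_name": "Z", "method": "SAP_CLUSTERING"},
    {"model_id": "M_D", "field_name": "Y", "method": "OTHER_METHOD"},
]
distance_rows = [
    {"model_id": "M_A", "cluster_id": 10},
    {"model_id": "M_B", "cluster_id": 4},
    {"model_id": "M_C", "cluster_id": 10},
]

selected = select_models(master_rows, distance_rows, "SAP_CLUSTERING", "Y", 10)
print(selected)
```

Note that the Cluster ID step keeps the models that have a cluster with the chosen ID; how this relates to the number of clusters a model generated depends on how cluster IDs are numbered, which is not specified here.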

3) A data mining specialist would like to visually evaluate the quality of the clustering produced by the model M from the point of view of the homogeneity of the data points assigned to clusters and of the distribution of the data points across the clusters.

Use scenario: in the Model Master tab, select M in the Model ID selector to limit the evaluation to the model M. After that, switch to the Clustering Models – Graph tab to interpret the graphical visualization of the model M’s clusters (let us assume that we find there a visualization identical to the one presented in the below screenshot):

Based on the visual analysis of the above graph, we could make the following evaluation:

Clusters 10, 15, 16, 19, 21 and 23 have relatively small average intra-cluster distances (from 300 through 800); the data points are distributed across them relatively evenly (the fewest data points are assigned to cluster 19, the most to cluster 21)
Clusters 5, 12 and 20 have relatively large average intra-cluster distances (from 1300 through 1600); the data points are distributed across them unevenly, with the vast majority of the data points assigned to cluster 20
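
The same kind of evaluation can be approximated numerically by grouping clusters on their average intra-cluster distance and comparing their shares of data points. The following is a rough sketch only: the threshold, the group labels, the function `evaluate_clusters` and the sample values are hypothetical and are not read from the graph discussed above.

```python
def evaluate_clusters(clusters, distance_threshold):
    """Group clusters by average distance and report each cluster's share of data points."""
    total = sum(cluster["count"] for cluster in clusters)
    groups = {"compact": [], "dispersed": []}
    for cluster in clusters:
        # Hypothetical rule: at or below the threshold counts as "compact".
        label = "compact" if cluster["avg_distance"] <= distance_threshold else "dispersed"
        groups[label].append(
            {"cluster_id": cluster["cluster_id"], "share": cluster["count"] / total}
        )
    return groups


# Hypothetical indicators: cluster ID, average distance, count of data points.
clusters = [
    {"cluster_id": 1, "avg_distance": 400.0, "count": 100},
    {"cluster_id": 2, "avg_distance": 700.0, "count": 100},
    {"cluster_id": 3, "avg_distance": 1500.0, "count": 200},
]

groups = evaluate_clusters(clusters, distance_threshold=1000.0)
for label, members in groups.items():
    print(label, members)
```

A group whose members have similar shares corresponds to an even distribution of data points, while one dominant share points to an uneven distribution.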
