SAP BW Data Mining Analytics: Regression Reporting

Sergey Lukyanchikov

AI Automation Expert

Published: September 12, 2021

Summary

Regression analysis is one of the methods supplied “built-in” with SAP BW Data Mining. Based on this method, regression models can be created and configured to satisfy specific analysis requirements (e.g., the choice between linear and non-linear approximation). The method includes regression-specific reporting that allows analysis of the modeling results. In this paper we suggest a number of ways to extend this reporting in order to improve insight into the results of regression modeling. These extensions are implemented via the following analytics:

Dashboard - SAP BW Data Mining Regression Reporting

Business Requirements

We will focus on the method-specific (not problem-specific) indicators that appear in regression reporting in BW Data Mining. That means that we will not consider the part of the standard BW Data Mining reporting that visualizes regression scores (predicted values) or regression coefficients for particular regression models. Instead, we will concentrate on the indicators that provide insight into the volume and quality of the models’ input data, as well as the quality of approximation achieved via the models.

The abovementioned method-specific indicators can be viewed via either the basic statistics of models involved in analysis processes (transaction RSANWB, display the analysis process, right-click on the model and select to display basic statistics) or via the general statistics in a model’s results (transaction RSDMWB, display the model, choose the modeling results button in the model’s toolbar, choose the general statistics button in the toolbar of the main results graph).

An example of visualization available via the basic statistics of an analysis process is provided in the below screenshot:

An example of visualization available via general statistics in a regression model’s results is provided in the below screenshot:

The above visualization functionalities cover the basic needs of a user who would like to obtain insight into the results of regression modeling. Based on our practical experience with regression modeling in SAP BW Data Mining, the following additional business requirements could be suggested:

Ability to browse specific regression models to visualize their most important details without clicking into those models and studying them via a Display/Edit mode
Ability to visualize the method-specific indicators of regression models without going into either basic statistics or general statistics of individual processes and models
Ability to visualize the method-specific indicators per model variable with sufficient precision (i.e. not rounded to an integer or to only three digits after the decimal point)
Ability to select and visualize in a graphical mode only the comparable method-specific indicators (in order to avoid difficulties with their visualization among the other indicators, usually with greater values)
Ability to select and visualize only the details (indicators and data properties) of specific model variables

Analytics

The implementation of the above business requirements in the “SAP BW Data Mining Regression Reporting” dashboard combines the functionality of the “SAP BW Data Mining Model Reporting” dashboard (find more details on this dashboard in SAP BW Data Mining Analytics: Model Reporting) with insight that is specific to the SAP_REGRESSION and SAP_SCORING_WT_TABLE methods.

At startup, the “SAP BW Data Mining Regression Reporting” dashboard displays three tabs:

Model Master: contains the overall list of SAP BW Data Mining models defined in our system with their most important data properties, plus a set of controls allowing to browse and filter that list
Regression/Scoring Models – Table: contains the list of models based on SAP_REGRESSION and SAP_SCORING_WT_TABLE methods with their variables and method-specific indicators, plus a set of controls allowing to browse and filter that list
Regression/Scoring Models – Graphs: contains the bar chart to visualize the method-specific indicators for the models and variables chosen using the two other tabs, plus a set of controls to select for visualization specific indicators

The following columns have been enabled in the list at the Model Master tab (see the screenshot below):

Model ID – a unique identifier, the “technical name” of the model in the overall model list
Model Field Name – an identifier of a variable (field) of the model in the overall model list
Modeling Method – the modeling method on which the model is based
Version – the version of the model
Field Data Type – the type of the data contained in a model field
Field Content Type – the role that a model field plays in the model
Field Is Predictable – contains “X” if the field is a predictable field, i.e. one that is filled with prediction results
Character Field Length – the total length of a character field
Numeric Field Length – the total length of a numeric field
Numeric Field Precision – the number of digits after the decimal point in a numeric field

The following columns have been enabled in the list at the Regression/Scoring Models – Table tab (see the screenshot below):

Model ID – a unique identifier, the “technical name” of the model in the regression-specific model list
Row Number – an identifier of a variable (field) of the model in the regression-specific model list
Goodness Indicator I – an indicator of goodness of approximation
Goodness Indicator R – an indicator of goodness of approximation
Total of Absolute Differences – the sum of absolute predicted/observed differences
Total of Predicted Values – the sum of predicted values
Total of Observed Values – the sum of observed values
Number of Data Records – the number of data points loaded into a regression model during its training
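
To make the definitions of the last four columns concrete, here is a minimal Python sketch that derives them from a list of observed values and a list of predicted values. It only illustrates the definitions given above and is not how SAP BW Data Mining computes them internally. The function and class names, as well as the sample values, are hypothetical. The two goodness indicators are left out because their formulas are not given in this article.

```python
from dataclasses import dataclass


@dataclass
class RegressionTotals:
    """Hypothetical container for the four sum/count indicators."""
    total_abs_diff: float   # Total of Absolute Differences
    total_predicted: float  # Total of Predicted Values
    total_observed: float   # Total of Observed Values
    n_records: int          # Number of Data Records


def regression_totals(observed, predicted):
    """Compute the sum/count indicators from paired observed/predicted values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return RegressionTotals(
        total_abs_diff=sum(abs(p - o) for o, p in zip(observed, predicted)),
        total_predicted=sum(predicted),
        total_observed=sum(observed),
        n_records=len(observed),
    )


# Hypothetical sample data (not taken from any real model).
observed = [10.0, 12.5, 9.0]
predicted = [11.0, 12.0, 9.5]
totals = regression_totals(observed, predicted)
print(totals)
```

Even in this tiny example the total of absolute differences (2.0) is an order of magnitude smaller than the totals of predicted and observed values, which is the scale mismatch the Graphs tab has to deal with.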

The selectors of the dashboard match the columns of the lists and allow limiting the models and variables visualized via the lists to specific criteria. Each time a specific value is selected, the respective selector’s status indicator turns green.

In the below screenshot, the PIO_INV_RG_L value is selected via the Model ID selector. The lists of variables in the Model Master and Regression/Scoring Models – Table tabs are immediately refreshed to display only the records corresponding to the model with the technical name PIO_INV_RG_L:

In order to visualize only the details of a specific variable, we must select this variable using the Model Field Name and Row Number selectors. The Model Field Name selector applies to the Model Master tab:

The Row Number selector applies to the Regression/Scoring Models – Table tab:

Two different identifiers are needed for the same variable because the variables are coded differently in the internal tables of SAP BW Data Mining: the overall model master table uses the model field names indicated in the model definitions, while the regression-specific model master table uses the numbers of the rows occupied by the respective variables in the regression model definitions. We are working on the link between the two sets of identifiers and will implement it in future versions of this dashboard. For the time being, a reasonable workaround relies on the fact that the model field names and the row numbers for a selected model appear in the same order in their dropdown lists. This allows us to count the same number of labels from the top of the dropdown lists in the Model Field Name and Row Number selectors to make sure that the same variable is selected in both tabs.
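
The positional workaround can be sketched in a few lines of Python: if the two dropdown lists for one model are in the same order, pairing them by position yields the field-name-to-row-number mapping. This is a sketch under that assumption only. The function name and the field names `FIELD_A` and `FIELD_B` are hypothetical placeholders; `PIOINV08` and its row number 3 come from the example discussed in this article.

```python
def pair_by_position(field_names, row_numbers):
    """Map each field name to the row number at the same list position.

    Assumes both lists belong to the same model and are in the same order,
    as the dropdown lists in the dashboard are described to be.
    """
    if len(field_names) != len(row_numbers):
        raise ValueError("both lists must have the same number of entries")
    return dict(zip(field_names, row_numbers))


# Dropdown contents for one selected model (first two names are placeholders).
field_names = ["FIELD_A", "FIELD_B", "PIOINV08"]
row_numbers = [1, 2, 3]

mapping = pair_by_position(field_names, row_numbers)
print(mapping["PIOINV08"])
```

The length check guards against the one case where counting labels from the top would silently select the wrong variable: the two lists not covering the same set of variables.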

Finally, we may need a graphical visualization of the values of the method-specific indicators displayed in the Regression/Scoring Models – Table tab. Such visualization is implemented in the Regression/Scoring Models – Graphs tab. In the below screenshot, all six method-specific indicators for the variable PIOINV08 (which corresponds to row number 3, displayed next to the graph’s Y-axis) are displayed. However, the values of the two goodness-of-fit indicators and of the total of absolute differences indicator are so small compared to the values of the other indicators that we can only see three bars in the bar chart (the bars corresponding to the indicators with smaller values are “molded” into the Y-axis):

A similar inconvenience, as mentioned before, can be found in the standard SAP BW Data Mining reporting related to regression modeling results. Our dashboard offers an efficient workaround: we can uncheck the indicators with greater values directly in the graph’s legend and have the indicators with smaller values visualized along the automatically rescaled X-axis:
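
The idea of plotting only comparable indicators can be expressed as a simple selection rule. The Python sketch below keeps the indicators whose magnitude is within a chosen ratio of the smallest non-zero one, which mimics unchecking the large-valued indicators in the legend. The rule, the `max_ratio` parameter, and all sample values are assumptions made for illustration; in the dashboard itself the selection is done manually.

```python
def comparable_indicators(values, max_ratio=100.0):
    """Return the indicators that share a scale with the smallest one.

    An indicator is kept if its absolute value does not exceed
    max_ratio times the smallest non-zero absolute value.
    """
    nonzero = [abs(v) for v in values.values() if v != 0]
    if not nonzero:
        # Nothing to compare against: keep everything.
        return dict(values)
    limit = min(nonzero) * max_ratio
    return {name: v for name, v in values.items() if abs(v) <= limit}


# Hypothetical indicator values for one model variable.
indicators = {
    "Goodness Indicator I": 0.91,
    "Goodness Indicator R": 0.95,
    "Total of Absolute Differences": 12.4,
    "Total of Predicted Values": 48210.0,
    "Total of Observed Values": 48195.0,
    "Number of Data Records": 1200,
}

visible = comparable_indicators(indicators)
print(list(visible))
```

With these sample values the three large indicators are dropped, leaving the two goodness indicators and the total of absolute differences on a common scale.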

Typical Use Cases

The following could be examples of the typical use cases in which the usage of the SAP BW Data Mining Regression Reporting dashboard could bring benefits:

1) A data mining specialist would like to visualize the models with Y as the predictable variable and to study the method-specific indicators of those models that are based on the SAP_REGRESSION method.

2) A data mining specialist would like to visualize the models based on the SAP_REGRESSION method that contain at least 10 variables with the predictable variable having exactly N digits after the decimal point.

Use scenario: in the Model Master tab, select the records that correspond to the SAP_REGRESSION method using the Modeling Method selector, and then narrow the selection by choosing X in the Field Is Predictable selector and the value closest to or exactly matching N in the Num. Field Precision selector. The model list in the Model Master tab will display the technical names of the models matching all of the above criteria except the one requiring at least 10 variables. To apply this last criterion, switch to the Regression/Scoring Models – Table tab and choose 10 in the Row Number selector (if 10 is not available, no models match this criterion). Then choose, one by one, the model technical names displayed in the Model ID selector in the Regression/Scoring Models – Table tab to verify whether the models selected in the Model Master tab also satisfy the criterion of having at least 10 variables. If, on selecting a model technical name via the Model ID selector, the model list in the Regression/Scoring Models – Table tab displays at least one row, the respective model satisfies all the criteria. If the list is empty, the respective model has fewer than 10 rows.
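
The same two-step selection can be sketched in Python as an intersection of two sets of model IDs: one obtained from the model master list, the other from the regression-specific list. This is an illustration only. The dictionary keys, the model names `MODEL_A` to `MODEL_C`, and the data are hypothetical, and the sketch uses an exact match on precision rather than "closest to N". It also assumes, as the scenario does, that a model with a row number 10 has at least 10 variables.

```python
def candidate_models(master_rows, n_digits):
    """Model IDs with a predictable SAP_REGRESSION field of the given precision."""
    return {
        row["model_id"]
        for row in master_rows
        if row["method"] == "SAP_REGRESSION"
        and row["is_predictable"] == "X"
        and row["precision"] == n_digits
    }


def models_with_row(regression_rows, row_number):
    """Model IDs that have the given row number in the regression-specific list."""
    return {r["model_id"] for r in regression_rows if r["row_number"] == row_number}


# Hypothetical extract of the Model Master list (predictable fields only).
master_rows = [
    {"model_id": "MODEL_A", "method": "SAP_REGRESSION", "is_predictable": "X", "precision": 2},
    {"model_id": "MODEL_B", "method": "SAP_REGRESSION", "is_predictable": "X", "precision": 2},
    {"model_id": "MODEL_C", "method": "SAP_SCORING_WT_TABLE", "is_predictable": "X", "precision": 2},
]

# Hypothetical regression-specific list: MODEL_A has 12 rows, MODEL_B 4, MODEL_C 11.
regression_rows = (
    [{"model_id": "MODEL_A", "row_number": n} for n in range(1, 13)]
    + [{"model_id": "MODEL_B", "row_number": n} for n in range(1, 5)]
    + [{"model_id": "MODEL_C", "row_number": n} for n in range(1, 12)]
)

matches = sorted(candidate_models(master_rows, 2) & models_with_row(regression_rows, 10))
print(matches)
```

Here `MODEL_B` drops out because it has fewer than 10 rows, and `MODEL_C` drops out because it is not based on the SAP_REGRESSION method.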

3) A data mining specialist would like to visualize models based on the SAP_REGRESSION method that contain variable Y as predictable variable and to find out which of those models provides the highest value of the goodness indicator R.

Use scenario: in the Model Master tab, select the records that correspond to the SAP_REGRESSION method using the Modeling Method selector, then narrow the selection by choosing Y in the Model Field Name selector and X in the Field Is Predictable selector. This leaves in the dropdown list of the Model ID selector only the models that satisfy the above criteria. Switch to the Regression/Scoring Models – Table tab and choose, one by one, the model technical names displayed in the Model ID selector to visualize the values of the goodness indicator R. Those values may differ across a model’s variables, so to identify the highest value per model we can either sort the Goodness Indicator R column (by clicking on that column’s caption) or switch to the Regression/Scoring Models – Graphs tab and deselect all the options in the graph’s legend except Goodness Indicator R. By choosing the model technical names available in the Model ID selector and observing the highest value of the goodness indicator R per model, we can identify the model that provides the highest value for this indicator.
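
The comparison described in this scenario amounts to taking the maximum of the goodness indicator R per model and then the maximum across models. A minimal Python sketch follows, assuming the rows of the Regression/Scoring Models – Table tab are available as dictionaries. The keys, model names, and values are hypothetical.

```python
def best_r_per_model(rows):
    """Return the highest Goodness Indicator R found for each model."""
    best = {}
    for row in rows:
        model_id = row["model_id"]
        best[model_id] = max(best.get(model_id, float("-inf")), row["goodness_r"])
    return best


# Hypothetical rows: one entry per model variable (row number).
rows = [
    {"model_id": "MODEL_A", "row_number": 1, "goodness_r": 0.81},
    {"model_id": "MODEL_A", "row_number": 2, "goodness_r": 0.88},
    {"model_id": "MODEL_B", "row_number": 1, "goodness_r": 0.93},
    {"model_id": "MODEL_B", "row_number": 2, "goodness_r": 0.79},
]

per_model = best_r_per_model(rows)
best_model = max(per_model, key=per_model.get)
print(per_model, best_model)
```

This mirrors the manual procedure: sorting the Goodness Indicator R column gives the per-model maximum, and stepping through the Model ID selector gives the comparison across models.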
