AI and Machine Learning for Qlik Developers - Case Study 1

Robert Svebeck

”Success consists of going from failure to failure without a loss of enthusiasm.” Innovation and Digitalization Leader | AI Strategist | Solution Architect

Published: April 14, 2020

This is a simple example in Qlik that uses a supervised, univariate gradient descent algorithm to predict the total payment for insurance claims given the number of claims.

(The purpose of this example is to help you understand something about Machine Learning. This task could of course be solved much more easily and quickly without ML, by using built-in Qlik functionality or by calculating Pearson's Correlation Coefficient and Linear Regression.)

For more in-depth information on the Qlik script, review the earlier articles in this series from the start before you begin.

If you want a copy of the Qlik Sense QVF, contact me on LinkedIn and I will send it to you.

Reference: Swedish Committee on Analysis of Risk Premium in Motor Insurance

Data Source: Auto Insurance in Sweden (download data here)

Data extract (first 10 rows)

Hypothesis

Y = θ0 + X * θ1

Example: Qlik Sense Chart after training

X = number of claims
Y = total payment for all the claims in thousands of Swedish Kronor
Red Line = The result of our algorithm after 15'000 iterations

Example: Qlik Sense Log report after training

We can see that the value of θ1 became stable after around 5'000 iterations and θ0 (the bias) after around 10'000 iterations, using a fixed learning rate (α) of 0.001.

Script

Sheet 1 (GENERAL SETUP)

// REGIONAL SETTINGS

SET ThousandSep=' ';
SET DecimalSep='.';

Sheet 2 (INPUT DATA)

Training_table:

LOAD

  num(num#(X,'',',')) as X, 
  num(num#(Y,'',',')) as Y
    
FROM [lib://AI_DATA/INSURANCE/data.txt]
(txt, codepage is 28591, embedded labels, delimiter is '\t', msq);

Sheet 3 (LOG TABLE)

// SUB TO GENERATE A LOGFILE IN A LOG TABLE
// EVERY TIME THE SUB IS CALLED, IT GENERATES A NEW ROW IN THE TABLE Log.

Sub Log

 Log: 

 LOAD 

  rowno() as logRow,
  $(θ0) as θ0,
  $(θ1) as θ1

 autogenerate(1);
End sub

Sheet 4 (GRADIENT DESCENT IN ACTION)

// GRADIENT DESCENT FUNCTION



// CHECK THE TIME NOW (TO BE ABLE TO VISUALIZE HOW MUCH TIME LEARNING TAKES)
Let StartTrainingTime = now();

// DEFINE HOW MANY TIMES TO LOOP
Let iterations= 15000;

// DEFINE LEARNING RATE
Let α = 0.001;

// DEFINE START VALUES FOR ALL THETAS (WEIGHTS). 
Let θ0 = 0;
Let θ1 = 0;

// GET HOW MANY ROWS TRAINING DATA HAS.

Let m = NoOfRows('Training_table');

// START LOOP
For i = 1 to iterations

    // CREATE A SUMMARY TEMP TABLE
    temp:
    LOAD

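        // RUNNING SUM OF THE ERRORS. Peek() TAKES THE FIELD NAME AS A QUOTED STRING
        Rangesum(($(θ0) + (X * $(θ1)) - Y),
                    peek('deviation_0')) as deviation_0,

        // RUNNING SUM OF THE ERRORS MULTIPLIED BY X
        Rangesum(($(θ0) + (X * $(θ1)) - Y) * X,
                    peek('deviation_X')) as deviation_X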

    Resident Training_table;

    // GET THE LAST ROW FROM THE temp TABLE 
    // THAT HAS THE TOTAL SUM OF ALL ROWS
    Let deviation_0 = Peek('deviation_0',-1,'temp');
    Let deviation_X = Peek('deviation_X',-1,'temp');
    
    // DROP THE temp TABLE. NO LONGER NEEDED
    drop table temp;
    
    // CHANGE THE VALUE OF EACH θ TOWARDS A BETTER θ
    Let θ0 = θ0 - ( α * 1/m * deviation_0);
    Let θ1 = θ1 - ( α * 1/m * deviation_X);
    

    // CREATE A LOG OF EACH θ
    Call Log;

// REPEAT UNTIL ITERATIONS HAVE REACHED THE GIVEN MAX
next i;


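// STORE END TIME AND CALCULATE TOTAL TIME AND SPEED
Let EndTrainingTime = now();
Let TrainingTime = timestamp(EndTrainingTime - StartTrainingTime);
// TIMESTAMPS ARE MEASURED IN DAYS, SO MULTIPLY BY 86400 TO GET TOTAL SECONDS.
// Second() WOULD ONLY RETURN THE SECONDS PART (0-59) OF THE ELAPSED TIME.
Let learningSpeed = iterations / ((EndTrainingTime - StartTrainingTime) * 86400);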
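To make the loop above easier to follow outside Qlik, here is a minimal Python sketch of the same batch gradient descent. It is not the author's code: the function name `gradient_descent` and the toy data are assumptions made for illustration, and the data is not taken from the insurance file. Both sums are computed before either θ is changed, as in the script.

```python
def gradient_descent(xs, ys, alpha=0.001, iterations=15000):
    """Batch gradient descent for the hypothesis y = theta0 + theta1 * x."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # Same sums as deviation_0 and deviation_X in the Qlik script
        deviation_0 = sum(theta0 + theta1 * x - y for x, y in zip(xs, ys))
        deviation_x = sum((theta0 + theta1 * x - y) * x for x, y in zip(xs, ys))
        # Move each theta a small step against its gradient
        theta0 -= alpha * deviation_0 / m
        theta1 -= alpha * deviation_x / m
    return theta0, theta1


# Toy data (made up for illustration) lying exactly on y = 2 + 3x
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0, 5.0, 8.0, 11.0, 14.0]

theta0, theta1 = gradient_descent(xs, ys, alpha=0.05, iterations=5000)
print(theta0, theta1)
```

The toy data has small X values, so a larger α than 0.001 works here. With the real data, where X goes above 100, a much smaller α is needed.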

Sheet 5 (User interface preparation)

// RENAME X AND Y TO ACTUAL NAMES FROM IMPORTED DATA

rename field X to [Number of claims];
rename field Y to [Total payment];

Improvements

I wanted to keep the code above as simple as possible, but there are some improvements you can make yourself once you understand how the code works.

The main improvement would be to try a slightly larger learning rate (α) so that fewer iterations are needed. If α is too high, you will get this kind of weird error:

This happens because the algorithm diverges (fails to converge) and the values of θ become too big for Qlik Sense to handle. To avoid this problem, add checks on how θ0 and θ1 change after each iteration, and exit the loop if they do not converge.

Since θ0 reaches its target more slowly than θ1, you can try using a separate learning rate for each θ. Try α=0.01 for θ0 and keep α=0.001 for θ1, for instance.

Finally, add code to exit the loop once you have reached a reasonable level of convergence. There is no need to run all 15'000 iterations.
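The two checks suggested above (stop on divergence, stop on convergence) could look roughly like this in Python. This is only a sketch under assumptions: the function name `train_with_early_stop`, the `tolerance` parameter, the status strings, and the toy data are all made up for illustration. In Qlik the same idea would use an `Exit For` inside the loop.

```python
import math


def train_with_early_stop(xs, ys, alpha, max_iterations=15000, tolerance=1e-9):
    """Gradient descent that stops early on convergence or divergence."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for i in range(1, max_iterations + 1):
        deviation_0 = sum(theta0 + theta1 * x - y for x, y in zip(xs, ys))
        deviation_x = sum((theta0 + theta1 * x - y) * x for x, y in zip(xs, ys))
        new0 = theta0 - alpha * deviation_0 / m
        new1 = theta1 - alpha * deviation_x / m
        # Divergence guard: the values have overflowed to inf or nan
        if not (math.isfinite(new0) and math.isfinite(new1)):
            return theta0, theta1, i, "diverged"
        # Largest change of any theta in this iteration
        step = max(abs(new0 - theta0), abs(new1 - theta1))
        theta0, theta1 = new0, new1
        # Convergence check: the thetas have practically stopped moving
        if step < tolerance:
            return theta0, theta1, i, "converged"
    return theta0, theta1, max_iterations, "max_iterations"


# Toy data (made up for illustration) lying exactly on y = 2 + 3x
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0, 5.0, 8.0, 11.0, 14.0]

result_ok = train_with_early_stop(xs, ys, alpha=0.05)
result_bad = train_with_early_stop(xs, ys, alpha=1.0)  # far too high
print(result_ok)
print(result_bad)
```

Checking the size of the last step is a simple stopping rule. Another option is to track the cost itself and stop when it no longer decreases.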

Linear Regression and Pearson Correlation Coefficient (r)

And now for something completely different!

Pearson Correlation Coefficient (r):

This section uses the same data as the ML example above to find the constants θ0 and θ1, but with a more traditional mathematical approach.

The goal is to find a and b in the form f(x) = a + b*x by first finding r.

// GET AVERAGE OF X AND Y
temp_avg:
Load 
 sum(X)/count(X) as avgX,
    sum(Y)/count(Y) as avgY 
resident Training_table;
Let avgX = Peek('avgX',-1,'temp_avg');
Let avgY = Peek('avgY',-1,'temp_avg');
drop table temp_avg;

// CALCULATE THINGS NEEDED LATER...
temp_calc:
Load 
 (X-$(avgX))*(Y-$(avgY)) as [X-avgX*Y-avgY],
 sqr((X-$(avgX))) as [SQR(X-avgX)],
 sqr((Y-$(avgY))) as [SQR(Y-avgY)]
Resident Training_table;

// SUM UP
temp_sum:
LOAD 
 sum([X-avgX*Y-avgY]) as [SUM X-avgX*Y-avgY], 
    sum([SQR(X-avgX)]) as [SQR(X-avgX)], 
    sum([SQR(Y-avgY)]) as [SQR(Y-avgY)] 
Resident temp_calc;
drop table temp_calc;

Let sumXavgX_YavgY = Peek('SUM X-avgX*Y-avgY',-1,'temp_sum');
Let sqr_X_avgX   = Peek('SQR(X-avgX)',-1,'temp_sum');
Let sqr_Y_avgY   = Peek('SQR(Y-avgY)',-1,'temp_sum');

drop table temp_sum;

// Pearson Correlation Coefficient (r)
// This is a measure of the linear correlation between two variables X and Y
Let r = sumXavgX_YavgY / sqrt(sqr_X_avgX*sqr_Y_avgY);

// GET NUMBER OF ROWS (NEEDED FOR STANDARD DEVIATION CALC)
Let m = NoOfRows('Training_table');

//Standard Deviations of X and Y
let stddev_x = sqrt(sqr_X_avgX/m);
let stddev_y = sqrt(sqr_Y_avgY/m);

// CALCULATE b
let b = r * (stddev_y/stddev_x);

// CALCULATE a
Let a = avgY - b * avgX;

And there we go!

The values of a and b should come out more or less the same as θ0 and θ1, respectively. This is much quicker and needs less code, but there is no Machine Learning here. Just math.

Good luck with your coding! More advanced case studies will follow.
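As a cross-check of the steps in the script above, the same calculation can be sketched in a few lines of Python. The function name `fit_via_pearson` and the toy data are assumptions made for illustration and are not taken from the insurance file.

```python
import math


def fit_via_pearson(xs, ys):
    """Return (a, b, r) for f(x) = a + b*x, following the script's steps."""
    m = len(xs)
    avg_x = sum(xs) / m
    avg_y = sum(ys) / m
    # Sums of cross products and squared deviations from the averages
    sum_xy = sum((x - avg_x) * (y - avg_y) for x, y in zip(xs, ys))
    sqr_x = sum((x - avg_x) ** 2 for x in xs)
    sqr_y = sum((y - avg_y) ** 2 for y in ys)
    # Pearson Correlation Coefficient
    r = sum_xy / math.sqrt(sqr_x * sqr_y)
    # Population standard deviations (divide by m, as in the script)
    stddev_x = math.sqrt(sqr_x / m)
    stddev_y = math.sqrt(sqr_y / m)
    b = r * (stddev_y / stddev_x)
    a = avg_y - b * avg_x
    return a, b, r


# Toy data (made up for illustration), roughly following y = 2 + 3x
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [5.1, 7.9, 11.2, 13.8, 17.0]

a, b, r = fit_via_pearson(xs, ys)
print(a, b, r)
```

Because the standard deviations only appear as a ratio, dividing by m or by m-1 gives the same b.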

Thank you for reading. Please share if you like it, and comment on the post if you have questions or feedback about the content.

Comment below, or connect with me here on LinkedIn if you want to get a copy of the Qlik Sense App.
