登录查看更多内容

The Chain-Ladder IBNR Method in SQL

Daniel Nolan

Insurance Analytics

发布日期: 2021年2月25日

The purpose of this article is to share an approach I use to produce a quick-and-dirty IBNR estimate using the Chain-Ladder method, without opening a spreadsheet. Note that I do not regard the Chain-Ladder estimate as an end result, but rather an intermediate step in the Bornhuetter-Ferguson method. The benefit of this approach, from my point of view, is that it can be implemented as a stored procedure in an SQL database (I use SQL Server so the code below is all T-SQL) and the results can then be piped into whatever software the analyst uses to produce the Bornhuetter-Ferguson estimate. So let's dive right in...

SQL Environment

I'm going to start by assuming that you have a summary table or view -- call it Claims -- that includes the following columns: COV_MONTH, PAID_MONTH, and PAID_AMT. For now let's assume the PAID_AMT column is net of Stop Loss recoveries (adjusting a lag triangle for Stop Loss recoveries will be the subject of a future article). Assume COV_MONTH and PAID_MONTH are VARCHAR and represent dates in yyyy-mm format. The output of this procedure is a separate table called Loss_Dev_Factor with two columns: COV_MONTH as VARCHAR, like in the Claims table, and LDF, the loss development factor as, e.g., FLOAT. The idea is that you can then carry out a join such as the following, which summarizes for each coverage month i) the paid amount to date, ii) estimated IBNR, and iii) ultimate incurred claims estimate:

SELECT C.COV_MONTH
     , SUM(PAID_AMT) AS PAID_AMT
     , SUM(L.LDF - 1) * PAID_AMT) AS IBNR
     , SUM(L.LDF * PAID_AMT) AS INCURRED_AMT
FROM Claims C LEFT OUTER JOIN Loss_Dev_Factor L ON C.COV_MONTH = L.COV_MONTH
GROUP BY C.INCURRED_MONTH

The Chain-Ladder Method

There are tons of excellent free resources on the Chain-Ladder method, but I'll still include a recap here since there are numerous variations of the method and it's worthwhile to establish which exact one I'm reproducing in SQL. First though, a concept: when a claim is incurred, i.e., when the billable event occurs, it may get adjudicated right away but in a lot of cases it goes through lengthy administrative review or back-and-forth negotiations or even litigation, and ends up getting settled and paid at a later date. In the meantime the plan sponsor may not even know about it. However, at the end of each accounting period the plan sponsor needs to estimate the unpaid claims liability, or IBNR (which stands for Incurred But Not Reported), so they can accrue the expense, which, per GAAP, they do by means of the following identity: [Incurred Claims] = [Paid Claims] + [Change in IBNR]. The Chain-Ladder method is a simple and intuitive way of estimating IBNR. It is predicated on the assumption that the loss development from one month to the next will mirror the same pattern as recent months.

For us in the health insurance industry, the unit of account is the coverage month (after all, members purchase coverage one month at a time even though open enrollment is, ordinarily, only an annual event), so we keep track of the incurred-paid lag described in the previous paragraph in terms of months. A convenient way of visualizing the lag is by setting up a lower triangular array (could be upper triangular -- this is just a matter of preference) with a column for each coverage month and a row for each paid month. Below is an example of what this looks like using the 12 months ended Dec. 31, 2020 and some hypothetical data. (In practice one should use 36- or 48 months but these won't display nicely in the amount of space I have.)

The first thing to do is carry out what I like to call a "lag-transform" on the triangle, looking at it in terms of how many months have elapsed between the incurred date and the paid date, rather than in terms of specific paid months. Notice these are the same numbers, just reorganized according to lag:

Next, tabulate the cumulative paid amount per coverage month at each lag:

Now we are in a position to calculate the incremental Loss Development Factors (LDFs), i.e., the factor by which the cumulative total increases between each lag:

And this is where the variations of the Chain-Ladder method begin to diverge. The approach I'm going to take here is, at each lag step, drill down on the most recent 6 months' worth of LDFs:

What I've denoted by "Trunc Avg" is the truncated average of each row's values; it's truncated because I'm excluding the max (red font) and min (blue font) as these could be outliers. It is reassuring to note that the reds and blues aren't concentrated in one region of the table, which might indicate there is an evolving trend in the development pattern that would likely undermine the model results. You can implement a truncated average in Excel using Excel's AVERAGEIFS function with the appropriate criteria. For example, if the table range above started in cell A1, then you'd enter the following in cell H2: =AVERAGEIFS(B2:G2,B2:G2,"<>"&MAX(B2:G2),B2:G2,"<>"&MIN(B2:G2)).

But we're not here to talk about Excel! In SQL you could instead create a user-defined-function, or UDF, which is then -- most conveniently -- a part of the database:

CREATE FUNCTION [dbo].[TRUNC_AVG] (@v1 FLOAT, @v2 FLOAT, @v3 FLOAT, @v4 FLOAT, @v5 FLOAT, @v6 FLOAT)
RETURNS FLOAT
AS

BEGIN

     DECLARE @min FLOAT, @max FLOAT;
     SET @min = (SELECT MIN(x) FROM (VALUES (@v1), (@v2), (@v3), (@v4), (@v5), (@v6)) AS value(x));
     SET @max = (SELECT MAX(x) FROM (VALUES (@v1), (@v2), (@v3), (@v4), (@v5), (@v6)) AS value(x));
     RETURN (@v1 + @v2 + @v3 + @v4 + @v5 + @v6 - @min - @max)/4;

END;

GO

Before we delve any further into SQL though, I must finish describing the Chain-Ladder method. Now that we have a set of incremental LDFs, we need to "compound" them up to the present. What do I mean by compound? Well, by default we have to assume the first coverage month in our data is 100% complete (which is why you want to use 36- or even 48 months rather than 12, since there is often prior period development more than 12 months after the fact). To make things concrete, the first month in our data was January 2020 with $2.163MM paid and we're regarding this as 100.0% complete, i.e., we're not expecting any further development. The second month was February 2020 with $2.097MM paid but we expect it to undergo the 10-11 LDF of 0.999 (the fact this is less than 1 means we expect there to be net claims reversals for February 2020 next month, resulting in it ultimately completing at $2.095MM with IBNR therefore being -$2K). Put differently, we're assuming February 2020 is currently 1/0.999 = 100.1% complete. Here is the compounding aspect: Regarding March 2020, we're expecting it to undergo the 9-10 and 10-11 LDFs so that ultimate incurred claims are estimated to be $2.200MM x 0.996 x 0.999 = $2.189MM (and IBNR is therefore -$12K). Proceeding in this manner, you can see how the completion factor CF is the reciprocal of the cumulative product of the incremental LDFs. Here's what the results look like:

SQL Procedure

Let's turn our attention back to the matter at hand, implementing the model in SQL. First let's create the Claims table and populate it with the same data as above so that we can have the satisfaction of obtaining the same result in SQL as we did in Excel:

CREATE TABLE #Claims (COV_MONTH VARCHAR(7), PAID_MONTH VARCHAR(7), PAID_AMT MONEY)
INSERT INTO #Claims VALUES ('2020-01', '2020-01', 613639.04)
INSERT INTO #Claims VALUES ('2020-01', '2020-02', 1134532.74)
INSERT INTO #Claims VALUES ('2020-01', '2020-03', 188594.78)
INSERT INTO #Claims VALUES ('2020-01', '2020-04', 70137.52)
INSERT INTO #Claims VALUES ('2020-01', '2020-05', 124190.26)
INSERT INTO #Claims VALUES ('2020-01', '2020-06', 18138.45)
INSERT INTO #Claims VALUES ('2020-01', '2020-07', 18531.28)
INSERT INTO #Claims VALUES ('2020-01', '2020-08', -1482.38)
INSERT INTO #Claims VALUES ('2020-01', '2020-09', 720.42)
INSERT INTO #Claims VALUES ('2020-01', '2020-10', 686.29)
INSERT INTO #Claims VALUES ('2020-01', '2020-11', -2911.24)
INSERT INTO #Claims VALUES ('2020-01', '2020-12', -2167.44)
INSERT INTO #Claims VALUES ('2020-02', '2020-02', 584617.08)
INSERT INTO #Claims VALUES ('2020-02', '2020-03', 1047852.64)
INSERT INTO #Claims VALUES ('2020-02', '2020-04', 99173.93)
INSERT INTO #Claims VALUES ('2020-02', '2020-05', 78100.60)
INSERT INTO #Claims VALUES ('2020-02', '2020-06', 21968.19)
INSERT INTO #Claims VALUES ('2020-02', '2020-07', 216390.93)
INSERT INTO #Claims VALUES ('2020-02', '2020-08', 10914.69)
INSERT INTO #Claims VALUES ('2020-02', '2020-09', 2337.36)
INSERT INTO #Claims VALUES ('2020-02', '2020-10', 50575.20)
INSERT INTO #Claims VALUES ('2020-02', '2020-11', 80.11)
INSERT INTO #Claims VALUES ('2020-02', '2020-12', -15320.99)
INSERT INTO #Claims VALUES ('2020-03', '2020-03', 576962.31)
INSERT INTO #Claims VALUES ('2020-03', '2020-04', 1286557.68)
INSERT INTO #Claims VALUES ('2020-03', '2020-05', 202195.81)
INSERT INTO #Claims VALUES ('2020-03', '2020-06', 53004.17)
INSERT INTO #Claims VALUES ('2020-03', '2020-07', 20011.32)
INSERT INTO #Claims VALUES ('2020-03', '2020-08', 17255.51)
INSERT INTO #Claims VALUES ('2020-03', '2020-09', 4927.42)
INSERT INTO #Claims VALUES ('2020-03', '2020-10', 6975.73)
INSERT INTO #Claims VALUES ('2020-03', '2020-11', 2875.00)
INSERT INTO #Claims VALUES ('2020-03', '2020-12', 29476.26)
INSERT INTO #Claims VALUES ('2020-04', '2020-04', 597969.87)
INSERT INTO #Claims VALUES ('2020-04', '2020-05', 1364946.64)
INSERT INTO #Claims VALUES ('2020-04', '2020-06', 175720.85)
INSERT INTO #Claims VALUES ('2020-04', '2020-07', 454843.82)
INSERT INTO #Claims VALUES ('2020-04', '2020-08', 13753.23)
INSERT INTO #Claims VALUES ('2020-04', '2020-09', 329825.52)
INSERT INTO #Claims VALUES ('2020-04', '2020-10', 85833.35)
INSERT INTO #Claims VALUES ('2020-04', '2020-11', 16380.46)
INSERT INTO #Claims VALUES ('2020-04', '2020-12', 6017.90)
INSERT INTO #Claims VALUES ('2020-05', '2020-05', 767536.93)
INSERT INTO #Claims VALUES ('2020-05', '2020-06', 1872765.36)
INSERT INTO #Claims VALUES ('2020-05', '2020-07', 190876.16)
INSERT INTO #Claims VALUES ('2020-05', '2020-08', 78906.67)
INSERT INTO #Claims VALUES ('2020-05', '2020-09', 25733.30)
INSERT INTO #Claims VALUES ('2020-05', '2020-10', 10015.18)
INSERT INTO #Claims VALUES ('2020-05', '2020-11', 9378.80)
INSERT INTO #Claims VALUES ('2020-05', '2020-12', 20574.25)
INSERT INTO #Claims VALUES ('2020-06', '2020-06', 870292.01)
INSERT INTO #Claims VALUES ('2020-06', '2020-07', 1405074.69)
INSERT INTO #Claims VALUES ('2020-06', '2020-08', 386522.82)
INSERT INTO #Claims VALUES ('2020-06', '2020-09', 42604.17)
INSERT INTO #Claims VALUES ('2020-06', '2020-10', 19436.75)
INSERT INTO #Claims VALUES ('2020-06', '2020-11', 142223.15)
INSERT INTO #Claims VALUES ('2020-06', '2020-12', 19072.64)
INSERT INTO #Claims VALUES ('2020-07', '2020-07', 877692.58)
INSERT INTO #Claims VALUES ('2020-07', '2020-08', 1140596.65)
INSERT INTO #Claims VALUES ('2020-07', '2020-09', 110424.90)
INSERT INTO #Claims VALUES ('2020-07', '2020-10', 48749.23)
INSERT INTO #Claims VALUES ('2020-07', '2020-11', 51669.82)
INSERT INTO #Claims VALUES ('2020-07', '2020-12', 19693.93)
INSERT INTO #Claims VALUES ('2020-08', '2020-08', 961458.48)
INSERT INTO #Claims VALUES ('2020-08', '2020-09', 1272300.45)
INSERT INTO #Claims VALUES ('2020-08', '2020-10', 330720.49)
INSERT INTO #Claims VALUES ('2020-08', '2020-11', 218056.45)
INSERT INTO #Claims VALUES ('2020-08', '2020-12', 265020.38)
INSERT INTO #Claims VALUES ('2020-09', '2020-09', 872773.67)
INSERT INTO #Claims VALUES ('2020-09', '2020-10', 923483.82)
INSERT INTO #Claims VALUES ('2020-09', '2020-11', 330758.94)
INSERT INTO #Claims VALUES ('2020-09', '2020-12', 79095.82)
INSERT INTO #Claims VALUES ('2020-10', '2020-10', 774649.34)
INSERT INTO #Claims VALUES ('2020-10', '2020-11', 1255499.92)
INSERT INTO #Claims VALUES ('2020-10', '2020-12', 190224.73)
INSERT INTO #Claims VALUES ('2020-11', '2020-11', 1108094.52)
INSERT INTO #Claims VALUES ('2020-11', '2020-12', 877579.26)
INSERT INTO #Claims VALUES ('2020-12', '2020-12', 1280162.27)

As a starting point, I'm going to create variables representing the report month (the date I'm calculating IBNR in reference to), the valuation date (potentially a later date, to allow for the possibility that this is a recast based on some additional claims runout, even though in this case I'm just setting it equal to the last day of the report month), and the start date (the first day of the first coverage month in the lag triangle):

DECLARE @rptMonth VARCHAR(7), @startDt DATE, @valDt DATE

SET @rptMonth = '2020-12' -- Latest month, in "YYYY-MM" format
SET @startDt = DATEADD(MONTH, -11, @rptMonth + '-01')
SET @valDt = DATEADD(DAY, -1, DATEADD(MONTH, 1, @rptMonth + '-01'))

I also need variables to keep track of months in terms of relative references since it would be bad practice to create columns named after specific months:

DECLARE @month01 VARCHAR(7)
      , @month02 VARCHAR(7)
      , @month03 VARCHAR(7)
      , @month04 VARCHAR(7)
      , @month05 VARCHAR(7)
      , @month06 VARCHAR(7)
      , @month07 VARCHAR(7)
      , @month08 VARCHAR(7)
      , @month09 VARCHAR(7)
      , @month10 VARCHAR(7)
      , @month11 VARCHAR(7)
      , @month12 VARCHAR(7)

SET @month01 = LEFT(@startDt, 7)
SET @month02 = LEFT(DATEADD(MONTH,  1, @startDt), 7)
SET @month03 = LEFT(DATEADD(MONTH,  2, @startDt), 7)
SET @month04 = LEFT(DATEADD(MONTH,  3, @startDt), 7)
SET @month05 = LEFT(DATEADD(MONTH,  4, @startDt), 7)
SET @month06 = LEFT(DATEADD(MONTH,  5, @startDt), 7)
SET @month07 = LEFT(DATEADD(MONTH,  6, @startDt), 7)
SET @month08 = LEFT(DATEADD(MONTH,  7, @startDt), 7)
SET @month09 = LEFT(DATEADD(MONTH,  8, @startDt), 7)
SET @month10 = LEFT(DATEADD(MONTH,  9, @startDt), 7)
SET @month11 = LEFT(DATEADD(MONTH, 10, @startDt), 7)
SET @month12 = LEFT(DATEADD(MONTH, 11, @startDt), 7)

Next, summarize PAID_AMT by COV_MONTH and PAID_MONTH and store the results in a temporary table called #t. This is our lag triangle (if you run SELECT * FROM #t ORDER BY PAID_MONTH you'll see that it's identical to the spreadsheet version of the lower triangular array above):

IF OBJECT_ID('tempdb.dbo.#t') IS NOT NULL
BEGIN
     DROP TABLE #t
END
SELECT PAID_MONTH
     , ROW_NUMBER() OVER(ORDER BY PAID_MONTH) - 1 AS r
     , SUM(CASE WHEN COV_MONTH = @month01 THEN PAID_AMT ELSE 0 END) AS MONTH01
     , SUM(CASE WHEN COV_MONTH = @month02 THEN PAID_AMT ELSE 0 END) AS MONTH02
     , SUM(CASE WHEN COV_MONTH = @month03 THEN PAID_AMT ELSE 0 END) AS MONTH03
     , SUM(CASE WHEN COV_MONTH = @month04 THEN PAID_AMT ELSE 0 END) AS MONTH04
     , SUM(CASE WHEN COV_MONTH = @month05 THEN PAID_AMT ELSE 0 END) AS MONTH05
     , SUM(CASE WHEN COV_MONTH = @month06 THEN PAID_AMT ELSE 0 END) AS MONTH06
     , SUM(CASE WHEN COV_MONTH = @month07 THEN PAID_AMT ELSE 0 END) AS MONTH07
     , SUM(CASE WHEN COV_MONTH = @month08 THEN PAID_AMT ELSE 0 END) AS MONTH08
     , SUM(CASE WHEN COV_MONTH = @month09 THEN PAID_AMT ELSE 0 END) AS MONTH09
     , SUM(CASE WHEN COV_MONTH = @month10 THEN PAID_AMT ELSE 0 END) AS MONTH10
     , SUM(CASE WHEN COV_MONTH = @month11 THEN PAID_AMT ELSE 0 END) AS MONTH11
     , SUM(CASE WHEN COV_MONTH = @month12 THEN PAID_AMT ELSE 0 END) AS MONTH12
INTO #t
FROM #Claims
WHERE PAID_MONTH BETWEEN LEFT(@startDt, 7) AND LEFT(@valDt, 7)
GROUP BY PAID_MONTH

The next snippet carries out the "lag transform" on the original triangle and stores the results in a temporary table called #s:

IF OBJECT_ID('tempdb.dbo.#s') IS NOT NULL
BEGIN
     DROP TABLE #s
END
SELECT T01.PAID_MONTH
     , T01.r
     , T01.MONTH01 AS c01
     , ISNULL(T02.MONTH02, 0) AS c02
     , ISNULL(T03.MONTH03, 0) AS c03
     , ISNULL(T04.MONTH04, 0) AS c04
     , ISNULL(T05.MONTH05, 0) AS c05
     , ISNULL(T06.MONTH06, 0) AS c06
     , ISNULL(T07.MONTH07, 0) AS c07
     , ISNULL(T08.MONTH08, 0) AS c08
     , ISNULL(T09.MONTH09, 0) AS c09
     , ISNULL(T10.MONTH10, 0) AS c10
     , ISNULL(T11.MONTH11, 0) AS c11
     , ISNULL(T12.MONTH12, 0) AS c12
INTO #s
FROM #t T01 LEFT OUTER JOIN #t T02 ON T01.r = T02.r - 1
            LEFT OUTER JOIN #t T03 ON T01.r = T03.r - 2
            LEFT OUTER JOIN #t T04 ON T01.r = T04.r - 3
            LEFT OUTER JOIN #t T05 ON T01.r = T05.r - 4
            LEFT OUTER JOIN #t T06 ON T01.r = T06.r - 5
            LEFT OUTER JOIN #t T07 ON T01.r = T07.r - 6
            LEFT OUTER JOIN #t T08 ON T01.r = T08.r - 7
            LEFT OUTER JOIN #t T09 ON T01.r = T09.r - 8
            LEFT OUTER JOIN #t T10 ON T01.r = T10.r - 9
            LEFT OUTER JOIN #t T11 ON T01.r = T11.r - 10
            LEFT OUTER JOIN #t T12 ON T01.r = T12.r - 11

At this point we still need to get the cumulative paid amounts per coverage month, for each successive paid month so we can calculate the LDFs. The following SQL produces a temporary table #u containing the cumulative paid amounts:

IF OBJECT_ID('tempdb.dbo.#u') IS NOT NULL
BEGIN
     DROP TABLE #u
END
SELECT S.r
     , SUM(T.c01) AS c01
     , SUM(T.c02) AS c02
     , SUM(T.c03) AS c03
     , SUM(T.c04) AS c04
     , SUM(T.c05) AS c05
     , SUM(T.c06) AS c06
     , SUM(T.c07) AS c07
     , SUM(T.c08) AS c08
     , SUM(T.c09) AS c09
     , SUM(T.c10) AS c10
     , SUM(T.c11) AS c11
     , SUM(T.c12) AS c12
INTO #u
FROM #s S LEFT OUTER JOIN #s T ON S.r >= T.r
GROUP BY S.r

And from there it's a simple matter to get the LDFs, which I've decided to store in a temporary table called #v:

IF OBJECT_ID('tempdb.dbo.#v') IS NOT NULL
BEGIN
     DROP TABLE #v
END
SELECT U.r
     , CAST(U.r AS VARCHAR) + '-' + CAST(V.r AS VARCHAR) AS LAG_STEP
     , V.c01/U.c01 AS c01
     , V.c02/U.c02 AS c02
     , V.c03/U.c03 AS c03
     , V.c04/U.c04 AS c04
     , V.c05/U.c05 AS c05
     , V.c06/U.c06 AS c06
     , V.c07/U.c07 AS c07
     , V.c08/U.c08 AS c08
     , V.c09/U.c09 AS c09
     , V.c10/U.c10 AS c10
     , V.c11/U.c11 AS c11
     , V.c12/U.c12 AS c12
INTO #v
FROM #u U INNER JOIN #u V ON U.r = V.r - 1

We're finally ready to calculate the truncated average LDFs, and this is where the SQL gets a bit more advanced. We need to create a table to store the factors and populate it with our truncated averages (note the function calls to the UDF we created earlier):

IF OBJECT_ID('tempdb.dbo.#ldf') IS NOT NULL
BEGIN
     DROP TABLE #ldf
END
CREATE TABLE #ldf (r INT PRIMARY KEY, LAG_STEP VARCHAR(5), AVG_LDF FLOAT, TRUNC_LDF FLOAT, CF FLOAT, COV_MONTH VARCHAR(7))
INSERT INTO #ldf
SELECT r
     , LAG_STEP
     , CASE WHEN r = 0 THEN (c11 + c10 + c09 + c08 + c07 + c06)/6
            WHEN r = 1 THEN (c10 + c09 + c08 + c07 + c06 + c05)/6
            WHEN r = 2 THEN (c09 + c08 + c07 + c06 + c05 + c04)/6
            WHEN r = 3 THEN (c08 + c07 + c06 + c05 + c04 + c03)/6
            WHEN r = 4 THEN (c07 + c06 + c05 + c04 + c03 + c02)/6
            WHEN r = 5 THEN (c06 + c05 + c04 + c03 + c02 + c01)/6
            WHEN r = 6 THEN (c05 + c04 + c03 + c02 + c01)/5
            WHEN r = 7 THEN (c04 + c03 + c02 + c01)/4
            WHEN r = 8 THEN (c03 + c02 + c01)/3
            WHEN r = 9 THEN (c02 + c01)/2
            WHEN r = 10 THEN (c01)/1
            ELSE NULL
       END AS AVG_LDF
     , CASE WHEN r = 0 THEN [dbo].TRUNC_AVG(c11, c10, c09, c08, c07, c06)
            WHEN r = 1 THEN [dbo].TRUNC_AVG(c10, c09, c08, c07, c06, c05)
            WHEN r = 2 THEN [dbo].TRUNC_AVG(c09, c08, c07, c06, c05, c04)
            WHEN r = 3 THEN [dbo].TRUNC_AVG(c08, c07, c06, c05, c04, c03)
            WHEN r = 4 THEN [dbo].TRUNC_AVG(c07, c06, c05, c04, c03, c02)
            WHEN r = 5 THEN [dbo].TRUNC_AVG(c06, c05, c04, c03, c02, c01)
            WHEN r = 6 THEN (c05 + c04 + c03 + c02 + c01)/5
            WHEN r = 7 THEN (c04 + c03 + c02 + c01)/4
            WHEN r = 8 THEN (c03 + c02 + c01)/3
            WHEN r = 9 THEN (c02 + c01)/2
            WHEN r = 10 THEN (c01)/1
            ELSE NULL
       END AS TRUNC_LDF
     , 1.0 AS COMP_PCT
     , NULL AS COV_MONTH
FROM #v

INSERT INTO #ldf VALUES (11, '11-12', 1.0, 1.0, 1.0, NULL)

The reason I said the SQL is about to get more advanced is because, in order to compound the incremental LDFs, I've found it helpful to use a cursor. There is a more elegant solution utilizing relational concepts, but it's less intuitive in my opinion and anyhow the cursor does the job:

DECLARE @r INT, @LAG_STEP VARCHAR(5), @AVG_LDF FLOAT, @TRUNC_LDF FLOAT, @CF FLOAT, @COV_MONTH VARCHAR(7), @prev FLOAT

DECLARE lag CURSOR
FOR SELECT * FROM #ldf ORDER BY r DESC
FOR UPDATE OF CF

OPEN lag

	FETCH NEXT FROM lag
	INTO @r, @LAG_STEP, @AVG_LDF, @TRUNC_LDF, @CF, @COV_MONTH
	UPDATE #ldf SET CF = 1.0 WHERE CURRENT OF lag
	SET @prev = 1.0

	WHILE @@FETCH_STATUS = 0
	BEGIN
		FETCH NEXT FROM lag INTO @r, @LAG_STEP, @AVG_LDF, @TRUNC_LDF, @CF, @COV_MONTH
		UPDATE #ldf SET CF = @prev/@TRUNC_LDF WHERE CURRENT OF lag
		SET @prev = @prev/@TRUNC_LDF
	END

CLOSE lag;
DEALLOCATE lag;

UPDATE #ldf SET COV_MONTH = @month01 WHERE r = 11
UPDATE #ldf SET COV_MONTH = @month02 WHERE r = 10
UPDATE #ldf SET COV_MONTH = @month03 WHERE r = 9
UPDATE #ldf SET COV_MONTH = @month04 WHERE r = 8
UPDATE #ldf SET COV_MONTH = @month05 WHERE r = 7
UPDATE #ldf SET COV_MONTH = @month06 WHERE r = 6
UPDATE #ldf SET COV_MONTH = @month07 WHERE r = 5
UPDATE #ldf SET COV_MONTH = @month08 WHERE r = 4
UPDATE #ldf SET COV_MONTH = @month09 WHERE r = 3
UPDATE #ldf SET COV_MONTH = @month10 WHERE r = 2
UPDATE #ldf SET COV_MONTH = @month11 WHERE r = 1
UPDATE #ldf SET COV_MONTH = @month12 WHERE r = 0

Now let's put our SQL procedure to the test by running the query that I proposed at the beginning of the article (with the exception that the reciprocal of the completion factor is playing the role of the LDF):

SELECT C.COV_MONTH
     , SUM(PAID_AMT) AS PAID_AMT
     , SUM((1/L.CF - 1) * PAID_AMT) AS IBNR
     , SUM(1/L.CF * PAID_AMT) AS INCURRED_AMT
FROM #Claims C LEFT OUTER JOIN #ldf L ON C.COV_MONTH = L.COV_MONTH

GROUP BY C.COV_MONTH
ORDER BY C.COV_MONTH

Note that the query results match the spreadsheet version above:

COV_MONTH   PAID_AMT    CF      INCURRED_AMT  IBNR
2020-01	    2162609.72  1.0000  2162609.72    0
2020-02	    2096689.74  1.0011  2094383.38    -2306.36
2020-03	    2200241.21  1.0055  2188150.53    -12090.68
2020-04	    3045291.64  1.0009  3042488.64    -2803
2020-05	    2975786.65  0.9940  2993858.96    18072.31
2020-06	    2885226.23  0.9909  2911747.07    26520.84
2020-07	    2248827.11  0.9851  2282888.25    34061.14
2020-08	    3047556.25  0.9411  3238423.6     190867.35
2020-09	    2206112.25  0.9297  2372997.7     166885.45
2020-10	    2220373.99  0.8912  2491514.51    271140.52
2020-11	    1985673.78  0.7951  2497537.74    511863.96
2020-12	    1280162.27  0.3421  3741731.34    2461569.07

Final Thoughts

That completes the procedure. I thank you for reading, and if you have any comments/questions/corrections, please feel free to reach out and discuss. Before I close though, I would be remiss if I didn't enumerate the pros and cons of the SQL approach vs a spreadsheet-based model. First the pros:

The SQL stored procedure, if coded correctly, is airtight. I've had situations where I updated my Excel workbook and then my SQL output revealed errors in the workbook as a result of the updates I tried to make -- I accidentally introduced human error into the spreadsheet, which is easy to do and often goes unnoticed!
The SQL approach separates the model from the data. Whereas the user can run the SQL procedure on whatever data he or she wants -- within the database, in the spreadsheet version of the model one has to create a new tab for every new lag triangle input (or write a macro to create a copy of the model output with the formulas stripped out, which is actually more technically demanding than the SQL procedure!).
As I mentioned in my first paragraph, the model results can be piped into a different software environment to carry out, e.g., the Bornhuetter-Ferguson method. Personally, I use R for the next step in my process. This is more cumbersome in Excel unless you use robust macros, which come with drawbacks in my opinion.
The procedure can be attached to a trigger in SQL, so that when you upload new paid claims data, the LDFs update automatically and virtually instantaneously, without the need to i) open a file, ii) paste new data -- which is of course prone to human error, and iii) run a macro. So the SQL approach is time saving as well.

There are only a few cons that I can think of, one of them much more significant than the other two:

Unless this were a part of an interactive UI -- which is beyond the scope of my work, there's no room to incorporate the analyst's professional judgement. For example there might have been a high dollar claim with an abnormally long lag, that has a disproportionate effect on the completion factors. In a situation like that, while you'd of course take the claims dollars as given, you might nonetheless want to smooth out the LDFs around that lag. Unfortunately, this is impractical in SQL but very easy and transparent in a spreadsheet. In that way spreadsheets are more conducive to ad hoc sensitivity analysis.

A couple of additional, minor cons are:

To expand on the transparency point at the end of the previous bullet, when it comes to sharing the model, the "disadvantage" of the data being wedded to the model can actually be an advantage. This makes spreadsheets more conducive to audit.
If your lag triangle doesn't exist in the database, then it's easier to paste it into a spreadsheet-based model (although in so doing, you do introduce the possibility of human error).

Doug K MBA, CHFP

Healthcare Finance / Cost / Data Analysis

10 个月

Thank you, Daniel. I just slightly modified the query for the final results to include both the LDF and CF%. Now, to test on real claims data!

1 次回应

Kevin Roberts

Employee Benefits Health Insurance Broker

4 年

I was following you right up to the “dive right in” part.

1 次回应

Tom Zimmerman, CEBS

Benefits Consultant at Seubert & Associates, Inc.

4 年

Daniel, Thank you for sharing your process. Very impressive congruence of SQL coding and Actuarial expertise!

1 次回应

Carlos Flores

CSL Vifor/ Former Apple, Gilead, McKesson and BMS

4 年

Wow... very impressive, Daniel. I enjoyed the well written explanation. It was easy to follow. I will try to implement this new knowledge in the near future. Thank you for sharing.

1 次回应

查看更多评论

要查看或添加评论，请登录

Daniel Nolan的更多文章

Modeling Investment Portfolio Returns with GBM

2021年3月17日

Modeling Investment Portfolio Returns with GBM

Background and Purpose In my role as an actuarial analyst for a health and welfare benefits trust, my primary…

3 条评论
Pharmacy Rebate Projections from the Carrier's Point of View: an Alternative Approach

2021年3月2日

Pharmacy Rebate Projections from the Carrier's Point of View: an Alternative Approach

Background & Purpose In this article I plan to talk about a topic that, in my opinion, doesn't get enough attention in…

1 条评论

The Chain-Ladder IBNR Method in SQL

Daniel Nolan

Insurance Analytics

SQL Environment

The Chain-Ladder Method

SQL Procedure

Final Thoughts

Daniel Nolan的更多文章

社区洞察

其他会员也浏览了

SQL 101: Top Mistakes Beginners Should Avoid

Top 9 Best Practices When Writing SQL

SQL Clause (Select, Where, Order by, Limit) | SQL-MySQL | Belayet Hossain

5 things I wish I had learned early in my SQL career

Sameer Gaikwad Triggers in SQL mesameergaikwad

SQL Query Order Execution

SQL Subqueries: Queries within Queries

DELETE, TRUNCATE, DROP: SQL, Simplified with a Paper Analogy

SQL Challenge #9

Intermediate SQL takeaways

SQL Environment

The Chain-Ladder Method

SQL Procedure

Final Thoughts

Daniel Nolan的更多文章

Modeling Investment Portfolio Returns with GBM

Pharmacy Rebate Projections from the Carrier's Point of View: an Alternative Approach

社区洞察

其他会员也浏览了

SQL 101: Top Mistakes Beginners Should Avoid

Top 9 Best Practices When Writing SQL

SQL Clause (Select, Where, Order by, Limit) | SQL-MySQL | Belayet Hossain

5 things I wish I had learned early in my SQL career

Sameer Gaikwad Triggers in SQL mesameergaikwad

SQL Query Order Execution

SQL Subqueries: Queries within Queries

DELETE, TRUNCATE, DROP: SQL, Simplified with a Paper Analogy

SQL Challenge #9

Intermediate SQL takeaways