Don't let reality ruin your analysis result
As we explained in our blog post “Evaluating Performance Pricing Solutions (Part 1)”, the first and most critical aspect of any Performance Pricing solution is understanding that one analysis method (one mathematical approach) cannot handle all the cases you face in everyday work. There is simply no “one size fits all” approach to statistical price analysis/performance pricing.
OK, we admit that we still have to prove this statement. This time, we elaborate on the topic with examples. They immediately show why this aspect matters, and that there is no “good enough” result when you apply the wrong analysis method to your data.
Before we start, some context
Performance pricing is based on the idea of using regression analysis for price analysis. Regression analysis determines the relationship between a dependent variable (price) and one or more independent variables (product properties). It helps us understand how the value of the dependent variable (price) changes when any one of the independent variables (product properties) changes. Regression analysis is widely used for prediction and forecasting.
The result of a performance pricing analysis based on regression analysis is a target-price formula. You feed the product properties of a part into this formula and get back the target-price/should-cost of that particular product. Once you have a target-price formula, you can predict the prices of other products described by the same properties. This is what such a formula for a machined part might look like:
target-price = 3.558 + 0.021 * ‘Quantity [Pcs]’ + 0.102 * ‘Weight [kg]’ + 0.013 * ‘Diameter [mm]’ + 0.020 * ‘Width [mm]’
Quantity, Weight, Diameter, and Width are product properties that differentiate between different machined parts.
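Evaluating such a formula is plain arithmetic. The minimal Python sketch below assumes the commas in the formula are decimal separators (so the intercept is 3.558); the property values of the example part are made up for illustration.

```python
# Example target-price formula for a machined part. The commas in the
# original formula are read as decimal separators (assumption).
INTERCEPT = 3.558
COEFFICIENTS = {
    "Quantity [Pcs]": 0.021,
    "Weight [kg]": 0.102,
    "Diameter [mm]": 0.013,
    "Width [mm]": 0.020,
}


def target_price(properties: dict) -> float:
    """Return INTERCEPT + sum of coefficient * property value."""
    return INTERCEPT + sum(coef * properties[name] for name, coef in COEFFICIENTS.items())


# Hypothetical machined part; the values are made up for illustration.
part = {"Quantity [Pcs]": 1000, "Weight [kg]": 2.5, "Diameter [mm]": 40, "Width [mm]": 20}
print(round(target_price(part), 3))  # 25.733
```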
A performance pricing solution uses the information (price and product properties) of a set of known machined parts as input to find this target-price formula.
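For the simplest case of a single product property, the idea can be sketched as follows: fit price = intercept + slope * property to a few known parts using the closed-form ordinary least-squares solution. The part data and the function name are hypothetical, and a real solution would handle several properties (and, as argued below, several regression methods).

```python
def fit_linear_target_price(props: list, prices: list) -> tuple:
    """Ordinary least squares for price = intercept + slope * property.

    Returns (intercept, slope) minimizing the sum of squared residuals.
    """
    n = len(props)
    if n != len(prices) or n < 2:
        raise ValueError("need at least two (property, price) pairs of equal length")
    mean_x = sum(props) / n
    mean_y = sum(prices) / n
    sxx = sum((x - mean_x) ** 2 for x in props)
    if sxx == 0:
        raise ValueError("the product property must vary across parts")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(props, prices))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical known parts: product property values and their prices.
known_props = [10.0, 20.0, 30.0, 40.0]
known_prices = [15.0, 25.0, 35.0, 45.0]
intercept, slope = fit_linear_target_price(known_props, known_prices)
print(f"target-price = {intercept:.3f} + {slope:.3f} * 'product-property'")
# target-price = 5.000 + 1.000 * 'product-property'
```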
From theory to reality
So far, so good. But reality makes such a regression analysis for price analysis pretty complicated. The following list shows some aspects that need to be considered:
- You want to get a reliable and realistic target-price formula you can trust. It doesn’t make sense if a performance pricing solution gives you target-prices that are unrealistic.
- Regression methods provide reliable results (in our case the target-price formula) only if their mathematical pre-conditions are fulfilled, and the method correctly captures the structure (how product properties impact the price) of the input data.
- It is easy to calculate many different regression models (target-price formulas) that do not capture the structure of the input data, and then to calculate an unreliable and incorrect target-price for each part number.
- Only reliable regression models can capture the relationship between product properties and price.
- The regression methods used must extract the maximum amount of information from the input data (gain of knowledge) to calculate a regression model (target-price formula) with the best possible predicting power.
In summary, this leads to:
Only models that capture the structure of the input data and extract as much information as possible give reliable and usable results.
And now comes the big problem: every regression method makes assumptions about how the input data is structured, and the result is reliable only if these assumptions hold. Unfortunately, we cannot verify these assumptions about the data structure up front for the methods we want to use.
What can we do in such a situation?
The first thing that should be clear is that it’s very likely that one regression method alone cannot give us a reliable target-price formula in all cases.
Next, if we are unable to prove the assumption about how the input data is structured before analysis, the only option left is to check how good the target-price formula is afterward.
If we have several regression methods at hand, we need to find out which one gives the best target-price formula. Our Non-Linear Performance Pricing solution NLPP supports six different regression methods and can automatically determine which of them best captures the structure of the input data.
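The post does not describe how NLPP ranks its six methods. As a stand-in, the sketch below shows one common after-the-fact check: fit each candidate method, compare their leave-one-out prediction errors, and keep the method with the lowest error. The two candidates (a linear fit and an exponential fit via least squares on log(price)) and the noise-free data are assumptions for illustration, not NLPP's actual methods.

```python
import math


def ols(xs, ys):
    """Closed-form least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def fit_linear(xs, prices):
    a, b = ols(xs, prices)
    return lambda x: a + b * x


def fit_exponential(xs, prices):
    # Least squares on log(price), i.e. price = exp(a + b * x); prices must be > 0.
    a, b = ols(xs, [math.log(p) for p in prices])
    return lambda x: math.exp(a + b * x)


def loo_rmse(fit, xs, prices):
    """Leave-one-out RMSE: refit without each part, then predict that part."""
    errors = []
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], prices[:i] + prices[i + 1:])
        errors.append(prices[i] - model(xs[i]))
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def pick_method(xs, prices, candidates):
    """Return the candidate name with the lowest leave-one-out RMSE, plus all scores."""
    scores = {name: loo_rmse(fit, xs, prices) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores


# Hypothetical noise-free data with an exponential structure.
xs = [float(x) for x in range(0, 100, 5)]
prices = [math.exp(4.923 + 0.049 * x) for x in xs]
best, scores = pick_method(xs, prices, {"linear": fit_linear, "exponential": fit_exponential})
print(best)  # exponential
```

Leave-one-out error is used here because it judges each formula on parts it was not fitted to, which is closer to how a target-price formula is used in practice.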
How reality ruins your analysis result
If you plan to do performance pricing with statistical tools such as Minitab, or if you are evaluating different performance pricing solutions, you will end up using a method called LPP-LSM, which stands for Linear-Performance-Pricing using the Least-Squares-Method. You can read more about the underlying Ordinary Least Squares (OLS) method on Wikipedia.
Using Synthetically Generated Data
We created a tool that generates data with a specific structure using a Monte Carlo simulation and adds a bit of randomness to the result, so that the data is not perfect. For the generated data, we therefore know its structure up front.
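The generator itself is not published here. A minimal sketch of the idea, assuming a normally distributed product property and the two structures used later in this post (linear with additive noise, exponential with multiplicative noise), could look like this; all coefficients, noise levels, and the function name are hypothetical.

```python
import math
import random


def generate_data(structure: str, n: int = 100, seed: int = 1) -> list:
    """Generate (product property, price) pairs with a known structure.

    structure="linear":      price = a + b * x, plus normally distributed noise
    structure="exponential": price = exp(a + b * x), with multiplicative noise
    All coefficients and noise levels below are hypothetical.
    """
    rng = random.Random(seed)  # fixed seed so the data set is reproducible
    rows = []
    for _ in range(n):
        x = rng.gauss(50.0, 15.0)  # product property, normally distributed
        if structure == "linear":
            price = 11000.0 + 370.0 * x + rng.gauss(0.0, 500.0)
        elif structure == "exponential":
            price = math.exp(4.9 + 0.05 * x + rng.gauss(0.0, 0.1))
        else:
            raise ValueError(f"unknown structure: {structure}")
        rows.append((x, price))
    return rows


linear_rows = generate_data("linear")
exp_rows = generate_data("exponential")
print(len(linear_rows), len(exp_rows))  # 100 100
```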
With such a data-set we can evaluate two things pretty easily:
- Does the performance pricing tool recognize the structure of the input data correctly?
- What happens if you apply a wrong method to the data?
The following sections show what happens if you apply the wrong method to your input data. This is exactly what happens if you have only one method at hand and your data does not fit it.
For our tests, we created data sets containing 100 entries (products), each with one product property and a price. Such input data has the simplest possible structure for a performance pricing analysis because there is only one parameter. Here is what such a generated file looks like:
Correct LPP-LSM Case
Let us start with the simplest case, which most statistical software tools and performance pricing solutions support: the data is linear and normally distributed.
The following two graphics show distribution plots of the independent variable (product property) and the dependent variable (price). The dashed line marks the average, the solid line marks the median, and the red box contains the middle 50% of all values. To make the test even simpler, there are no outliers.
Distribution Plot of the “Product Property” for LPP-LSM data:
Distribution Plot of the “Price” for LPP-LSM data:
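The quantities drawn in these plots can be computed directly. Here is a short sketch using Python's statistics module; quartile conventions differ between tools, and statistics.quantiles uses its default "exclusive" method here, so the box edges may differ slightly from those of the plotting tool used for the figures. The function name is hypothetical.

```python
import statistics


def box_plot_stats(values: list) -> dict:
    """Summary statistics shown in the distribution plots.

    mean -> dashed line, median -> solid line,
    q1/q3 -> edges of the box holding the middle 50% of the values.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
    }


print(box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```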
Now let us take a look at the performance pricing result. The following graphic shows the actual price from our input data on the vertical axis and, on the horizontal axis, the target-price calculated with the target-price formula from our regression result.
Actual vs. Target price LPP-LSM:
The three lines are benchmark lines that show the most likely upper (red), target (blue), and lower (green) price bounds for every data point.
The result looks good, which is no surprise because we apply the LPP-LSM method to LPP-LSM data. The target-price formula (regression analysis result) looks like this:
target-price = 11,087.462 + 368.099 * ‘product-property’
Since NLPP supports six different regression methods, it calculates how many times more likely the above result is than the result of each of the other five methods.
As you can see, the next most likely model is the non-linear version (NLPP) using the same structure (LSM), followed by the linear version (LPP) with the QR structure, and so on.
The order in the list is exactly as expected and shows that NLPP determines the structure of the input data reliably and correctly.
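The post does not say how NLPP computes these ratios. One standard way to express "how many times more likely" one fitted model is than another is the Akaike relative likelihood, exp((AIC_i - AIC_min) / 2). The sketch below assumes that approach; the AIC values are hypothetical, and a fair comparison requires all models to be evaluated on the same price scale.

```python
import math


def gaussian_aic(rss: float, n: int, k: int) -> float:
    """AIC of a least-squares fit with Gaussian errors, up to an additive constant.

    rss: residual sum of squares on the price scale, n: number of products,
    k: number of estimated parameters (including the error variance).
    """
    return n * math.log(rss / n) + 2 * k


def times_more_likely(aic_by_model: dict) -> dict:
    """How many times more likely the best model is than each model.

    Uses the Akaike relative likelihood exp((AIC_model - AIC_best) / 2);
    the best model itself gets 1.0.
    """
    best = min(aic_by_model.values())
    return {name: math.exp((aic - best) / 2.0) for name, aic in aic_by_model.items()}


# Hypothetical AIC values for three of the candidate methods.
ratios = times_more_likely({"LPP-LSM": 100.0, "NLPP-LSM": 104.0, "LPP-QR": 110.0})
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name}: {ratio:.1f}x")
```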
Correct NLPP-QR Case
Now let us do the same with an NLPP-QR case. The data is non-linear and exponentially distributed. Such input data is pervasive in real-life price analysis.
Distribution Plot of the “Product Property” for NLPP-QR data:
Distribution Plot of the “Price” for NLPP-QR data:
As you can see, the distribution plot for the product property looks much the same as in the LPP-LSM case, but the distribution plot for the price looks entirely different. This shows that the relationship between product property and price must be something other than linear.
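This reasoning can be checked numerically: if the product property is symmetric and the price is a linear function of it, the price distribution has the same shape (and the same skewness), while an exponential relation stretches the high end into a long right tail. To keep the sketch deterministic, it uses evenly spaced, hence symmetric, property values instead of a normal sample; the coefficients are hypothetical.

```python
import math


def sample_skewness(values):
    """Moment-based sample skewness: about 0 for symmetric data, > 0 for a long right tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


props = [float(x) for x in range(100)]  # symmetric product property values
linear_prices = [11000.0 + 370.0 * x for x in props]  # hypothetical linear structure
exp_prices = [math.exp(4.9 + 0.05 * x) for x in props]  # hypothetical exponential structure

print(sample_skewness(linear_prices), sample_skewness(exp_prices))
```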
Again, the plot of the regression result:
The three benchmark lines are no longer parallel because the data is non-linear. Here is the regression formula:
target-price = exp(4.923 + 0.049 * ‘product-property’)
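The post does not explain what the QR part of NLPP-QR stands for. For illustration only, the sketch below fits a formula of the same shape, price = exp(a + b * property), by ordinary least squares on log(price), which is a different and simpler technique than whatever NLPP-QR uses. On noise-free data generated from the formula above (decimal commas read as decimal points), it recovers the coefficients.

```python
import math


def fit_exponential(props, prices):
    """Fit price = exp(a + b * property) by least squares on log(price).

    Requires all prices to be positive. Returns (a, b).
    """
    logs = [math.log(p) for p in prices]
    n = len(props)
    mx, my = sum(props) / n, sum(logs) / n
    sxx = sum((x - mx) ** 2 for x in props)
    b = sum((x - mx) * (y - my) for x, y in zip(props, logs)) / sxx
    return my - b * mx, b


props = [float(x) for x in range(0, 100, 10)]  # hypothetical property values
prices = [math.exp(4.923 + 0.049 * x) for x in props]  # noise-free, from the formula above
a, b = fit_exponential(props, prices)
print(f"target-price = exp({a:.3f} + {b:.3f} * 'product-property')")
# target-price = exp(4.923 + 0.049 * 'product-property')
```

Fitting on the log scale makes the errors multiplicative on the price scale, so a band of constant width around log(price) widens as the price grows. That is one way benchmark lines can end up non-parallel.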
It is now quite interesting to look at how many times more likely the above result is than the results of the other five methods:
For NLPP-QR structured input data, the other non-linear methods are much more likely than any linear method. This result makes a lot of sense.
If you compare the above table to the LPP-LSM table, you see that applying a non-linear method to linear data is much “better” than using a linear method with non-linear data.
Wrong Case: Using LPP-LSM method on NLPP-QR data
In this section, we show what happens if you only have one method like LPP-LSM available and apply it to data that does not fulfill its assumptions.
The distribution of the input data is, of course, the same as in the previous case, so we do not repeat it here.
Here is the analysis result plot "Actual vs. Target price misusing LPP-LSM on NLPP-QR data":
To be clear, we used the same input data as before but analyzed it with the LPP-LSM method this time. It is pretty obvious that this result plot looks suspicious and strange.
Since we use perfectly generated data, you can even see that there seems to be a non-linear structure in the data. But software using LPP-LSM cannot do any better: it gives you a result that simply does not fit.
But please keep in mind that we used perfectly generated data; that is why you can see the problem. Real-life data is not perfect, and you would not immediately see that a result cannot be correct.
If you are lucky, you might find the problem by looking at every data point in detail. One hint, which you can also see above, is that some of the predicted target-prices are negative: all points to the left of 0 on the horizontal axis have negative target-prices.
That is not very reliable for the result of a price analysis tool. By the way, the target-price regression formula is:
target-price = -10,678.879 + 249.936 * ‘product-property’
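A simple plausibility check would catch this symptom automatically. The sketch below uses the LPP-LSM formula above (decimal commas read as decimal points) to compute the property value at which the predicted target-price crosses zero and to flag parts with negative predictions; the function names and example property values are hypothetical.

```python
INTERCEPT = -10678.879
SLOPE = 249.936


def lpp_lsm_target_price(x: float) -> float:
    """Target-price formula obtained by misusing LPP-LSM on the NLPP-QR data."""
    return INTERCEPT + SLOPE * x


def negative_predictions(props):
    """Return the product property values whose predicted target-price is below zero."""
    return [x for x in props if lpp_lsm_target_price(x) < 0]


zero_crossing = -INTERCEPT / SLOPE  # property value where the target-price hits 0
print(round(zero_crossing, 2))  # 42.73
print(negative_predictions([20.0, 40.0, 60.0]))  # [20.0, 40.0]
```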
We can now compare the two results and plot the absolute difference between the correct NLPP-QR result and the wrong LPP-LSM result. This shows how dramatically wrong the LPP-LSM results are:
We used 100 products in our test data set. As you can see, almost none of the LPP-LSM target-prices are just slightly off or close enough to be used; these target-prices are simply wrong.
If you use a result calculated with the wrong method, reality has just ruined your analysis result and every decision you base on it.
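The size of the error can also be read straight from the two formulas. This sketch evaluates the correct NLPP-QR formula and the wrong LPP-LSM formula (decimal commas read as decimal points) at a few product property values and prints their absolute difference; the property values are hypothetical, since the post does not state the range of the generated data.

```python
import math


def nlpp_qr_price(x: float) -> float:
    """Correct NLPP-QR target-price formula from the post."""
    return math.exp(4.923 + 0.049 * x)


def lpp_lsm_price(x: float) -> float:
    """Wrong LPP-LSM target-price formula for the same data."""
    return -10678.879 + 249.936 * x


def absolute_difference(x: float) -> float:
    """Absolute gap between the correct and the wrong target-price."""
    return abs(nlpp_qr_price(x) - lpp_lsm_price(x))


# Hypothetical product property values.
for x in (0.0, 50.0, 100.0):
    print(x, round(nlpp_qr_price(x), 1), round(lpp_lsm_price(x), 1), round(absolute_difference(x), 1))
```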
Conclusion
We have shown that you need to use the regression method that fits the structure of the data. Otherwise, the results are simply nonsense.
Our test case is a straightforward one. We used perfectly generated data and only one independent variable. It cannot get any simpler for analysis tools. Our NLPP tool can reliably recognize the structure of your input data fully automatically and choose a regression method which gives the best target-price formula.
However, most tools cannot even recognize and handle this simple case correctly. How likely is it that their results will be reliable for real-life data?
And with real-life data, you may not even recognize this type of problem, yet the results will still be totally wrong. If the target-price for some parts happens to be correct, it is pure luck.
In terms of reliability and quality, using the wrong regression method on your data is equivalent to throwing dice to get a target-price. You would not do the latter, so do you really want to do the former?
If you would like to get our perfectly generated data sets to evaluate your current tool, or because you are evaluating performance pricing solutions, just send us an email, and we will be more than happy to help.
This text was also published on our Performance Pricing blog at https://www.nlpp.ch/blog