Why do we do MCP?
When evaluating the wind energy potential of a proposed wind farm site, measurements of wind conditions are acquired for a period of at least one year, but preferably significantly longer. However, no matter how large the budget committed to pre-construction data acquisition and analysis (which is typically heavily subject to false economies anyway), the duration of the pre-construction measurement campaign will inevitably be much shorter than the expected lifetime of the wind farm.
Therefore it is necessary to extrapolate these measurements in some way, to estimate the long term conditions that will prevail over the project lifetime. We want to have data that are representative of the wind conditions that will determine the annual average energy production (AEP) over the life of the wind farm. This extrapolation allows us to calculate the energy yield with sufficient confidence to support pre-construction investment decisions. It also allows us to classify the site so that appropriate turbine technology can be selected.
We calculate a median energy yield (P50) that allows the project to be valued for equity purposes. We also perform an uncertainty analysis to reflect, among other things, decadal variations in yield, so that debt can be sized according to a percentile of AEP that represents production that is sufficiently reliable for financial purposes, such as the 10th percentile (P90).
So we must ascertain the relationship between the long term conditions typical of the project lifetime and the conditions that prevailed during the short term pre-construction measurement campaign, from which project site data are available. This allows the short term measurements to be adjusted to reflect longer term conditions on site, from which a P50 can be derived. The uncertainty in this adjustment contributes (among other things) to the analysis that determines the P90.
- Data describing the short term conditions are available. We have measured these.
- Data describing the long term conditions at the proposed wind farm location (the target site) are not available. We need to predict these.
- Data describing long term conditions at another site (the reference site) may be available. If we can establish a relationship between the short term conditions at the target and reference sites, we can apply this relationship to the long term conditions at the reference site to predict the long term conditions at the target site.
A relationship can be established by correlating concurrent short term measurements from the target and reference sites. Hence this procedure is called Measure-Correlate-Predict, or MCP. Short term measurements from the reference site that coincide with short term measurements from the target site are correlated with target site measurements to determine a relationship that can be applied to long term data from the reference site. The reference site could be a weather station where observations have historically been acquired for a meteorological service, or could be a nearby node of a reanalysis data product, or something similar, or some combination thereof.
Conventionally, the relationship is obtained using some form of linear regression. A straight line is fitted through the plot of data points showing pairs of simultaneous wind speeds measured at the target and reference sites. This could be a two parameter fit (including slope and offset) or, historically, sometimes a one parameter fit was used (with no offset, so a relationship forced through the origin). Ordinary least squares (OLS) linear regression might be used. Alternatively, some other method that attributes measurement uncertainty to both axes, such as bisector or orthogonal methods, might be used.
Different relationships corresponding to different wind direction sectors, seasonal phenomena, or diurnal variations in wind conditions, can all be accommodated by undertaking different fits to the data for these circumstances.
Whatever the case, it is assumed that the relationships are linear.
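The conventional straight-line fits described above can be sketched as follows. This is a minimal illustration rather than any particular package's implementation: the function names and sample wind speeds are hypothetical, and the orthogonal fit assumes equal error variance on both axes.

```python
import math

def ols_fit(x, y):
    """Two-parameter OLS fit y = slope * x + offset (all error attributed to y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def origin_fit(x, y):
    """One-parameter least squares fit y = slope * x, forced through the origin."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

def orthogonal_fit(x, y):
    """Orthogonal (total least squares) fit, assuming equal error on both axes."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)
    return slope, my - slope * mx

# Hypothetical concurrent wind speeds in m/s (reference x, target y).
x = [3.1, 4.8, 6.2, 7.9, 9.5, 11.0]
y = [4.0, 5.9, 7.1, 9.4, 10.8, 12.9]
print("OLS (slope, offset):       ", ols_fit(x, y))
print("Through origin (slope):    ", origin_fit(x, y))
print("Orthogonal (slope, offset):", orthogonal_fit(x, y))
```

Sector-wise, seasonal or diurnal relationships would simply apply the same functions to the corresponding subsets of the concurrent data.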
As you might have guessed, I wouldn't raise such an apparently simple question as "why do we do MCP?" if I didn't think it was an opportunity to be provocative and to scrutinise our assumptions. Of course we need to measure (the "M") and of course we need to predict (the "P"), but is linear regression really the best way to establish the relationship that connects the M and P by correlation (the "C" of "MCP")? In fact we can establish a relationship without performing a correlation.
Consider Measure-Correlate-Predict using short term concurrent data from both target and reference sites with the target site measurements denoted y and the reference site measurements denoted x. Long term measurements x are also available from the reference site. We want to predict long term conditions y at the target site.
Let's assume a relationship exists. In the ideal case, perfect rank correlation exists between the short term data sets. This means low wind measurements at one site coincide with low wind measurements at the other, high wind measurements coincide, individual percentiles of each set of measurements coincide, and so on. Then the cumulative distribution functions (CDFs) can be equated. That's what perfect rank correlation implies.
The data sets can still be segmented to reflect directional, diurnal and seasonal effects and variations as described above, where different relationships prevail under different circumstances, and multi-modal distributions need to be teased apart into their different constituent parts. Relationships can still be established by equating CDFs.
This can be done empirically, by sorting the data from each site into ascending order and then pairing them off by rank to establish an empirical relationship. There is likely to still be some degree of scatter so some binning and averaging can be done and the scatter analysed to inform an uncertainty analysis. However, it is also possible to do this analytically for some CDFs that are described by amenable functions.
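The empirical route can be sketched as below. This is one possible reading of "pairing off by rank", under stated assumptions: both samples are reduced to a common set of probability levels, so that they need not have the same length, and the function names, the choice of 99 levels and the sample values are all hypothetical.

```python
import bisect

def empirical_quantiles(values, probs):
    """Linearly interpolated empirical quantiles of `values` at each probability."""
    s = sorted(values)
    n = len(s)
    out = []
    for p in probs:
        pos = p * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        out.append(s[lo] + (pos - lo) * (s[hi] - s[lo]))
    return out

def rank_matched_pairs(reference, target, n_levels=99):
    """Pair reference and target wind speeds that share the same percentile."""
    probs = [(i + 1) / (n_levels + 1) for i in range(n_levels)]
    return list(zip(empirical_quantiles(reference, probs),
                    empirical_quantiles(target, probs)))

def predict_from_pairs(pairs, x):
    """Piecewise-linear lookup of the target speed for reference speed x.

    Values outside the matched range are clamped to the end points.
    """
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect.bisect_right(xs, x)
    return ys[i - 1] + (x - xs[i - 1]) * (ys[i] - ys[i - 1]) / (xs[i] - xs[i - 1])

# Hypothetical short term samples in m/s; note they need not be time-aligned.
reference = [5.2, 3.1, 7.4, 6.0, 4.4, 8.9, 2.5, 6.6]
target = [6.9, 4.0, 9.8, 8.1, 5.7, 11.5, 3.6, 8.8]
pairs = rank_matched_pairs(reference, target)
print(predict_from_pairs(pairs, 6.0))
```

The scatter analysis mentioned above is not covered here; the sketch only builds the rank-matched transfer function.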
Assuming wind speeds follow a Weibull distribution, equating the CDFs $F_t(y)$ for target wind speeds y and $F_r(x)$ for reference wind speeds x gives:
$$1 - \exp\left[-\left(\frac{y}{\lambda_t}\right)^{k_t}\right] = 1 - \exp\left[-\left(\frac{x}{\lambda_r}\right)^{k_r}\right]$$
This gives a relationship of the form:
$$y = m x^{\alpha}$$
where
$$\alpha = \frac{k_r}{k_t}, \qquad m = \frac{\lambda_t}{\lambda_r^{\alpha}}$$
($k_r$ and $k_t$ are the reference and target shape parameters, and $\lambda_r$ and $\lambda_t$ are the reference and target scale parameters respectively)
MCP assumes the relationship between the sites that is manifested in the short term data persists over the long term, otherwise the relationship inferred from the short term data would be useless for prediction of long term target site conditions. The relationship we have inferred above then allows the long term target Weibull parameters to be predicted from the short term target and reference parameters and the long term reference parameters thus:
$$k_t' = k_t \frac{k_r'}{k_r}$$
and
$$\lambda_t' = \lambda_t \left(\frac{\lambda_r'}{\lambda_r}\right)^{k_r / k_t}$$
(Primed quantities represent the long term period, and unprimed quantities represent the short term period)
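A minimal sketch of this arithmetic follows, assuming the standard two-parameter Weibull CDF F(v) = 1 - exp(-(v/lambda)^k). With that CDF, equating the short term distributions gives a power law y = m * x^alpha with alpha = k_ref / k_tgt, and pushing the long term reference distribution through the same power law gives the long term target parameters. The function names and the numerical values are hypothetical.

```python
import math

def weibull_cdf(v, k, lam):
    """Two-parameter Weibull CDF with shape k and scale lam."""
    return 1.0 - math.exp(-((v / lam) ** k))

def mws_relationship(k_ref, lam_ref, k_tgt, lam_tgt):
    """Return (m, alpha) such that y = m * x**alpha equates the short term CDFs."""
    alpha = k_ref / k_tgt
    m = lam_tgt / lam_ref ** alpha
    return m, alpha

def mws_long_term(k_ref, lam_ref, k_tgt, lam_tgt, k_ref_lt, lam_ref_lt):
    """Predict long term target (shape, scale) from short term parameters at
    both sites and long term parameters at the reference site."""
    m, alpha = mws_relationship(k_ref, lam_ref, k_tgt, lam_tgt)
    k_tgt_lt = k_ref_lt / alpha          # shape is divided by the exponent
    lam_tgt_lt = m * lam_ref_lt ** alpha  # scale follows the power law
    return k_tgt_lt, lam_tgt_lt

# Hypothetical Weibull parameters, for illustration only (scales in m/s).
m, alpha = mws_relationship(k_ref=1.8, lam_ref=6.0, k_tgt=2.2, lam_tgt=8.5)
k_lt, lam_lt = mws_long_term(1.8, 6.0, 2.2, 8.5, k_ref_lt=1.9, lam_ref_lt=6.3)
print(f"short term relationship: y = {m:.3f} * x^{alpha:.3f}")
print(f"long term target: k = {k_lt:.3f}, lambda = {lam_lt:.3f} m/s")
```

A useful sanity check is that feeding the short term reference parameters in as the "long term" ones returns the short term target parameters unchanged.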
The term “Modified Weibull Scaling” (MWS) has been used to distinguish this method because the term “Weibull Scaling” has previously been used to denote a method implemented in WindPro. This earlier method adjusts only the scale parameter. This means it assumes target and reference shape parameters are equal. If the shape parameters are equal we can see the relationship reduces to the familiar linear form of the sort used to model the data in conventional MCP methods. The point of the method presented here is to accommodate the intrinsic non-linearity of the relationship. If target and reference shape parameters are not equal this can introduce significant non-linearity. This is illustrated in the plot below.
A target and reference Weibull distribution are shown, and the relationship between pairs of equally ranked values is plotted, empirically matching values by rank as described above. It is clear this relationship is not linear. This is a consequence of the shape parameters of the Weibull distributions not being equal, such that the exponent in the relationship between them does not reduce to unity. The one- and two-parameter straight line fits through these pairs of equally ranked values are also shown. It is clear that forcing the relationship between the two sites to be linear will introduce a bias in the wind conditions at one site predicted using measurements at the other.
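The size of this effect can be explored numerically with a short sketch like the one below. It pairs equally ranked values from two Weibull distributions using the Weibull quantile function, fits a two-parameter OLS line through the pairs, and reports how far the line departs from the rank-matched values. The shape and scale parameters are hypothetical, chosen only to make the curvature visible.

```python
import math

def weibull_quantile(p, k, lam):
    """Inverse of the two-parameter Weibull CDF."""
    return lam * (-math.log(1.0 - p)) ** (1.0 / k)

def ols_line(x, y):
    """Two-parameter OLS fit y = slope * x + offset."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def linear_fit_residuals(k_ref, lam_ref, k_tgt, lam_tgt, n_levels=99):
    """Return (reference speed, line minus rank-matched target) at each percentile."""
    probs = [(i + 1) / (n_levels + 1) for i in range(n_levels)]
    x = [weibull_quantile(p, k_ref, lam_ref) for p in probs]
    y = [weibull_quantile(p, k_tgt, lam_tgt) for p in probs]
    slope, offset = ols_line(x, y)
    return [(xi, slope * xi + offset - yi) for xi, yi in zip(x, y)]

# Hypothetical parameters: unequal shapes give a curved relationship.
residuals = linear_fit_residuals(k_ref=1.6, lam_ref=6.0, k_tgt=2.4, lam_tgt=9.0)
worst = max(abs(r) for _, r in residuals)
print(f"largest departure of the straight line: {worst:.2f} m/s")

# With equal shapes the relationship is exactly linear, so the residuals vanish.
same_shape = linear_fit_residuals(k_ref=2.0, lam_ref=6.0, k_tgt=2.0, lam_tgt=9.0)
print(f"equal shapes: {max(abs(r) for _, r in same_shape):.2e} m/s")
```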
There are additional advantages to using this non-linear method, over and above the reduction in bias:
- Information loss is minimised. It is often the case that the averaging intervals of the target and reference site data are different, so that data from one must be averaged to the averaging interval of the other. This is so that pairs of wind speeds can be obtained. For example, target site data are typically 10-minute met mast data, whereas reference data are typically hourly. Therefore information is lost as the target site data are averaged to 1-hour intervals for linear correlation to be performed.
- This information is not lost when you simply equate the CDFs of the Weibull distributions observed during the short term period at both sites. It is possible to use the relationship between short term 10-minute data at the target site and short term hourly data at the reference site to predict long term 10-minute data at the target site from long term hourly reference site data. No averaging of 10-minute data to hourly data is necessary to equate CDFs, whereas regression of the short term data to establish relationships for extrapolation to long term conditions may require matching the longer averaging interval of the reference dataset, which incurs loss of information.
- MWS is based on the comparison of the wind speed distributions, so it remains robust when different averaging intervals are used at the target and reference sites. The minimal duration of some measurement campaigns requires robustness with limited data. The minimisation of information loss and the accommodation of non-linearity make MWS more robust with limited data sets compared to linear regression.
So, for example, meaningful results can be obtained with 6 months of short term data. I can think of a particular case in which only 6 months of data were available. There was a met mast a few km away for which several years of data were also available. Linear MCP and MWS were both used with the 6 month data set, using a nearby reanalysis node as the reference site. The results were compared with CFD extrapolation of the several years of data from the nearby met mast. The MWS method performed significantly better in this comparison, providing a result that was considered satisfactory while linear MCP failed to adequately predict the long term conditions.
In addition:
- Linear methods introduce bias when the shape parameters of the wind speed distributions at the target and reference sites are different. The difference in the shape parameters means the exponent in the relationship between the sites is not unity, and so the relationship is non-linear. This non-linearity is captured by MWS.
- Scaling methods relate wind speed distributions observed at the target and reference site directly. No intermediate step is required to perform a correlation that imposes a linear relationship derived using OLS linear regression irrespective of whether a linear model is appropriate. Data synchronisation issues do not arise to the same extent. Regression requires that the time series are very closely synchronised. This can represent a non-trivial technical challenge. For example, a data logger may experience clock drift. MWS is less sensitive to deviations from strict synchronisation. In MWS, a time offset between the two data sets only matters in proportion to the duration of the short term measurement campaign, whereas in the regression used in conventional MCP it matters in proportion to the duration of the averaging interval.
- When performing linear regression, the most common method used is OLS. This assigns all measurement error to the y-axis, and assumes the measurements plotted against the x-axis are perfect. That's like saying the reference instruments were perfect, and only the target instruments were liable to error.
- Biases arise when using OLS due to this attribution of error to one axis. These biases can be significant in relationships with a high degree of scatter. They can be avoided by using scaling methods like MWS, where scatter corresponds to the quality of the Weibull fits. This is therefore attributed to the variables plotted against both axes, since it is observed in the fits for both the target and reference sites. No assumptions about the attribution of error are necessary and the simpler uncertainty analysis is also the more realistic uncertainty analysis.
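The OLS bias described in the last two points can be illustrated with a small synthetic experiment. This is a sketch under loud assumptions: the two "sites" see the same underlying Weibull-distributed wind speed (so the true relationship is y = x), each series then receives independent Gaussian scatter with a standard deviation of 2 m/s, and the sample size and seed are arbitrary.

```python
import random

def ols_line(x, y):
    """Two-parameter OLS fit y = slope * x + offset."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

rng = random.Random(0)  # fixed seed so the experiment is repeatable

# Synthetic underlying wind speed: Weibull with scale 8 m/s and shape 2.
true_speed = [rng.weibullvariate(8.0, 2.0) for _ in range(5000)]

# Same underlying speed at both sites, with independent scatter on each axis.
ref = [v + rng.gauss(0.0, 2.0) for v in true_speed]
tgt = [v + rng.gauss(0.0, 2.0) for v in true_speed]

slope_tgt_on_ref, _ = ols_line(ref, tgt)
slope_ref_on_tgt, _ = ols_line(tgt, ref)

# The true slope is 1 in both directions, yet both fitted slopes come out
# below 1, because OLS treats whichever variable is on the x-axis as exact.
print(f"target regressed on reference: slope = {slope_tgt_on_ref:.3f}")
print(f"reference regressed on target: slope = {slope_ref_on_tgt:.3f}")
```

In this setup the flattening of the slope grows with the scatter relative to the spread of the wind speeds, which is consistent with the point above that the bias matters most for relationships with a high degree of scatter.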
I provided some anecdotal evidence about the performance of MWS above. I performed a more systematic comparison to evaluate this. A (crude) comparison of MWS and OLS was performed by cross-predicting between sites for which both long term and short term data were available using data from 11 MERRA 2 reanalysis nodes in the south of England.
The results were somewhat surprising, in that the difference in the average prediction error using the two methods was 0.01%. They both performed in a remarkably similar way in terms of prediction accuracy. MWS was expected to perform better than OLS. Why didn’t it? Well, we can see that some of the advantages of MWS I listed above didn't apply on this occasion:
- The same averaging interval was used at all sites over both the short and long term, so there was no OLS information loss relative to MWS.
- The wind speed Weibull distributions from all nodes had very similar Weibull shape parameters, all being associated with the same height in very similar terrain, and so the relationships were very close to linear anyway.
- I used a satisfactory duration of short term period for prediction (two years), so the robustness of MWS over shorter periods did not come into play.
- The OLS error attribution bias cancelled out between node pairs in the cross-prediction (i.e. the bias predicting Node 1 from Node 2 cancels out with the bias predicting Node 2 from Node 1).
So, to put it another way: even giving conventional MCP every possible advantage, removing various sources of error in an unrealistic manner that is not typical of real world projects, it still couldn't out-perform MWS. In general MWS performs better when working with:
- More constrained short term datasets, e.g. 6 months site assessment measurement campaigns
- A target and reference site with significantly different conditions, i.e. different Weibull shape parameters (e.g. the situation where the long term reference data are from a 10 m airstrip met mast, while the short term target data are from a 100 m target site met mast)
- Long and short term data with different averaging intervals.
Senior scientific consultant at Met Office.
5 years ago: Have we scrutinised the assumption of a negligible 'c'? ...for those still correlating an anemometer at 100m on top of a hill to one at 10m in a flat airfield 20km away? (Do people still do that?) If I understand correctly that would require use of a 3-parameter Weibull distribution which, as far as I'm aware, is not supported in standard software packages? It's been a long while, I'm rusty, and things may have moved on, but if this is still the case then wouldn't it be a very good argument for correlating the on-site met mast to a long term model that has been downscaled to the location and height of interest? Any deviation from m=1, alpha=1, c=0 is then model error only, rather than differences in physical attributes of the locations. So long as the model itself is consistent and the model bias distribution in the measurement period is representative of that of the long term (our way of saying "MCP assumes the relationship between the sites that is manifested in the short term data persists over the long term") then it is an ideal candidate for Modified Weibull Scaling.
Bespoke Solutions Team Lead at Tomorrow.io (formerly ClimaCell)
5 years ago: Ideally the reference site would be at or near hub-height. Is it difficult to find such measurements freely available? In my area there are only a few 30-m towers, which I suppose is better than nothing, and many 10-m towers.
Classically termed the Garrad Hassan (Andrew Tindal) method.
Head of Operational Excellence at Vattenfall
5 years ago: Next step, consider systematic directional differences in correlation (preferably not by "binning" by direction)
Finder of Pareto Improvements
5 years ago: I've used a similar method of CDF transformation based on King et al 2005 (https://journals.sagepub.com/doi/abs/10.1260/030952405774354868). In my experience this method can be more sensitive to the data period than linear methods (as you might expect when using a more powerful algorithm). Seasonal variation in particular must be carefully addressed. Your caveat about representativeness is extra important for more powerful non-linear methods!