Stars Cut Point Calculations (Clustering Method)

How are the “raw scores” converted into Stars for each measure?

Up to this point, this "deep dive" into Stars has been pretty straightforward. Now things get more complex. You know the metrics that go into the Stars, but how does CMS take that raw data and come up with cut points to divide up the contracts and turn those scores into an actual Star rating? Well...

Up until 2016, CMS would use older data to calculate and publish cut points so that provider organizations would know ahead of time what scores they needed to achieve on certain measures in order to be awarded a specific Star level. They stopped doing that when they noted that plans improved more if they weren’t given cut points, so now you don’t know how well you need to perform in order to get to the Star score you’re shooting for. Your score is based on how well your contract performed on a given measure compared to the other contracts on that measure that same year.

CMS does this by applying one of two methods: “clustering,” or “relative distribution and significance testing.” Of course, both are pretty complicated. Clustering is applied to the majority of the Star Ratings measures (pretty much everything except for those collected through CAHPS surveys), so let’s start there. I'll get to relative distribution and significance testing in my next post.

Clustering method: CMS uses software called SAS (an integrated system of software products provided by SAS Institute Inc.) to perform the calculations used in producing the Star Ratings. What they’re trying to do is take all of the scores on a given measure and group (or cluster) them in such a way that natural gaps become evident. Those gaps become the cut points, which determine the Star rating a score falls into. On a simple level, the software performs a series of calculations where it gradually groups together the scores that are more similar to each other than to any other scores (or groups of scores) until it ends up with 5 different groups. That’s all you need to know for a basic understanding of this method, so you can stop reading here. If you want to really get into the nitty gritty, go grab a big cup of coffee and forge on.

OK, I can see you’re a masochist (or a Stars or math geek), so let’s get into this. (Oh, I should state at this point that I’m no mathematician or statistician, so if I get something wrong, please comment.) The way it works is that for each individual measure, they calculate the difference between each contract’s score and every other contract’s qualifying score (in whole numbers only, no decimals). Picture a grid with each contract listed both on the X-axis and on the Y-axis. There’d be a diagonal where each contract pairs off with itself, and half the grid would be a duplicate of the other half, just flipped (i.e. contract A on the X-axis pairs with contract B on the Y-axis, and then contract B pairs with contract A when it’s next on the X-axis). The diagram below shows example scores and just such a grid of the differences between contracts.
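If it helps to see the grid as code, here is a minimal sketch in Python (not the SAS that CMS actually runs). The contract names and scores are made up for illustration, and `difference_grid` is a hypothetical helper name of my own.

```python
def difference_grid(scores):
    """Return an n x n grid where cell [i][j] is scores[i] - scores[j]."""
    return [[a - b for b in scores] for a in scores]

# Hypothetical whole-number scores for four contracts on one measure.
example_scores = {"A": 72, "B": 75, "C": 81, "D": 90}
names = list(example_scores)
grid = difference_grid(list(example_scores.values()))

# Print one row per contract; the diagonal is each contract paired with itself.
for name, row in zip(names, grid):
    print(name, row)

# Squaring the cells makes the two halves of the grid identical.
squared_grid = [[cell ** 2 for cell in row] for row in grid]
```

The diagonal is all zeros, and cell (A, B) is just the negative of cell (B, A), which is why half of the grid carries no new information.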

Those differences are then squared. This number, the square of the difference between 2 scores, is what will be used again and again to divide up all of the scores into 5 distinct groups. To do this, they run a calculation called “Ward’s minimum variance method.” Initially, every score stands alone (even though they call them clusters, at the beginning each cluster simply contains a single score). The program then goes through all of those squared differences to find the two scores that, when combined, give the smallest sum of squared differences compared to every other possible combination. Those contracts then get combined to make a new cluster, with its score essentially the average of the cluster’s components. The program keeps combining the pair of clusters whose scores are as close to each other as possible.

If you’re following, you can tell that, over time, each cluster will have a different number of scores, but the scores within a given cluster will be as close to each other as possible, while the average scores of different clusters will be as far apart as possible. This continues until we’re down to 5 clusters (unless there’s a rare case where two clusters have exactly the same score, in which case they’ll be combined as well). In the colorful diagram below, pretend that those letter pairs were individual contracts that had similar scores and got clustered together at the start of the process. Then those clusters get further combined with other clusters until we get to the point where there are 5 total clusters (denoted by the vertical black line, with each cluster numbered and marked with a different color rectangle).
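Here is a small sketch of that merging loop in plain Python. It is my own simplified illustration, not CMS’s SAS code. It uses the standard Ward merge cost (the increase in within-cluster squared differences when two clusters are joined), and the function names and example scores are hypothetical.

```python
def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return len(a) * len(b) / (len(a) + len(b)) * (mean_a - mean_b) ** 2

def ward_clusters(scores, k=5):
    """Merge the cheapest pair of clusters repeatedly until k clusters remain."""
    clusters = [[s] for s in sorted(scores)]  # every score starts on its own
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost(clusters[i], clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    # Order the final clusters from lowest to highest average score.
    return sorted(clusters, key=lambda c: sum(c) / len(c))

# Hypothetical scores for twelve contracts on one measure.
scores = [50, 52, 53, 60, 61, 70, 72, 73, 80, 81, 90, 92]
final_clusters = ward_clusters(scores)
print(final_clusters)
```

With these made-up scores, the natural gaps fall between the 50s, 60s, 70s, 80s, and 90s, so that is where the five clusters end up. Real measure data is messier, and this brute-force loop would be slow on large inputs, but the idea is the same.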

As if that isn’t confusing enough, they actually run the above calculation 10 different times. How does that work? Well, before they even start doing this clustering business, they randomly assign each contract to one of ten groups. They then run the clustering calculation 10 different times, each time excluding one of those groups, so that they come up with 10 sets of 5 final clusters. This part of the process is called “mean resampling,” and it was introduced in Star year 2022 to reduce the impact of outliers (apparently, Ward’s minimum variance method is sensitive to outliers, which contributed to the decision to include the Tukey Outlier Deletion explained a bit later).
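The splitting step can be sketched like this in Python. This is only an illustration under my own assumptions: the contract IDs are invented, the fixed `seed` is there just to make the example repeatable, and `leave_one_group_out` is a hypothetical helper name.

```python
import random

def leave_one_group_out(contracts, n_groups=10, seed=0):
    """Randomly split contracts into n_groups, then build one sample per
    group that contains every contract except that group's members."""
    rng = random.Random(seed)
    shuffled = contracts[:]
    rng.shuffle(shuffled)
    groups = [shuffled[i::n_groups] for i in range(n_groups)]
    samples = []
    for held_out in range(n_groups):
        sample = [c for g, group in enumerate(groups) if g != held_out
                  for c in group]
        samples.append(sample)
    return groups, samples

# Hypothetical contract IDs.
contracts = [f"H{n:04d}" for n in range(50)]
groups, samples = leave_one_group_out(contracts)

# Each of the 10 samples is missing exactly one group of 5 contracts.
print([len(s) for s in samples])
```

Each contract is left out of exactly one run and included in the other nine, and the clustering is then run once on each of the 10 samples.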

OK, so now we have 10 sets of 5 clusters, and what we need is clear cut points. Let’s take the top cluster from each of our 10 runs. These represent the likely 5-Star candidates, but we’re not done yet. We’ll take the lowest score from each of those 10 best-performing clusters and calculate the average. That is now the cut point to achieve 5 Stars. Any contract that has that score or higher on that particular measure will be awarded 5 Stars on that measure. The same is done for each of the lower clusters to get the remaining cut points. This is how it works for all of the higher-is-better measures. For lower-is-better measures, it is the flip. They take the maximum score of each cluster in each run and average those maximums across the runs to calculate the effective cut points. 5 Stars would be awarded to any score at or below the average maximum score of the lowest-scoring cluster.
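A rough Python sketch of the averaging step follows. To keep it short I use just 2 made-up runs instead of 10, and I assume each run’s clusters are already ordered from lowest to highest scores. The function names are hypothetical, and I am assuming a score exactly on a cut point earns the higher rating, as described above.

```python
from statistics import mean

def cut_points_higher_is_better(runs):
    """Average the minimum score of clusters 2 through 5 across all runs."""
    return [mean(min(run[level]) for run in runs) for level in range(1, 5)]

def cut_points_lower_is_better(runs):
    """Average the maximum score of the four lowest clusters across all runs."""
    return [mean(max(run[level]) for run in runs) for level in range(0, 4)]

def stars_higher_is_better(score, cut_points):
    """1 Star plus one more for every cut point the score meets or beats."""
    return 1 + sum(score >= cp for cp in cut_points)

def stars_lower_is_better(score, cut_points):
    """1 Star plus one more for every cut point the score is at or below."""
    return 1 + sum(score <= cp for cp in cut_points)

# Two hypothetical runs (CMS uses 10), each with 5 clusters of scores.
runs = [
    [[50, 52], [60, 61], [70, 72], [80, 81], [90, 92]],
    [[50, 53], [62, 63], [72, 73], [82, 83], [90, 94]],
]

high_cuts = cut_points_higher_is_better(runs)
low_cuts = cut_points_lower_is_better(runs)
print(high_cuts, stars_higher_is_better(85, high_cuts))
print(low_cuts, stars_lower_is_better(62, low_cuts))
```

In this toy example the 5-Star cut point for a higher-is-better measure is the average of 90 and 90, and a contract scoring 85 lands in 4 Stars.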

One other note on HEDIS measure scores. If CMS deems the data result to be inaccurate or biased in some way, their policy is to reduce that rating to 1 star. Not only that, but if the data files are not successfully submitted and validated by the submission deadline, all of the related ratings will drop to 1 star.

Given that these calculations occur every year on a new set of data, those cut points could move around quite a bit, adding stress and risk to contracts that don’t even know what they’re striving to achieve until it’s too late. So, starting in 2022, CMS implemented guardrails to help protect contracts from large swings in the measure cutoff thresholds. These guardrails prevent the cut points from moving more than 5% in either direction (again, we’re talking about just the non-CAHPS measures). This should be a good thing, but CMS tripped over this when they implemented the Tukey Outlier Deletion in Star Year 2024. We’ll get to that a bit later.
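In code, a guardrail is just a clamp around last year’s cut point. The sketch below is a simplification under my own assumptions: I treat the 5% cap as 5 points on a 0 to 100 scale, and the function name and numbers are hypothetical. Check the CMS technical notes for exactly how the cap is defined for each type of measure.

```python
def apply_guardrail(new_cut, prior_cut, cap=5.0):
    """Keep this year's cut point within +/- cap of last year's cut point.

    Assumption: cap is expressed in points on a 0 to 100 scale.
    """
    lower, upper = prior_cut - cap, prior_cut + cap
    return max(lower, min(upper, new_cut))

# Hypothetical 5-Star cut points: last year's was 82.
print(apply_guardrail(90, 82))  # clustering says 90, capped on the way up
print(apply_guardrail(70, 82))  # clustering says 70, capped on the way down
print(apply_guardrail(84, 82))  # within the cap, left unchanged
```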

Now you know how CMS comes up with the cut points to determine the Star rating for the non-CAHPS measures based on each contract's raw scores for those measures. Please let me know if that helped clarify anything for you or if it was still muddy and confusing.
