Model Efficacy for CVSS v3 prediction

We recently began predicting CVSS v3 scores (our earlier models were trained on v2 scores). The move made sense: v3 has become the de facto CVSS standard and v2 has faded out (in fact, based on our observations, NVD has stopped computing v2 scores).

As with any ML and AI work, we believe transparency and responsible AI are of utmost importance, and sharing how the model behaves in back testing is one way to practice that. It also helps with retraining the models to keep improving their accuracy.

So let us dive into the analysis.

For this round of back testing we looked at two different data sets:

  1. Pre-Existing Scores: Vulnerabilities that already had a CVSS v3 score published by the associated publisher. This population will expand to ~400K vulnerabilities in the months to come.
  2. Awaiting Analysis (Real World Scenario): New and emerging vulnerabilities that did not yet have a published CVSS v3 score when we predicted one. This phase of back testing included ~6.1K vulnerabilities, and since the models are now deployed in production, this population grows with each passing day.


Model Accuracy For Pre-Existing Scores

ThreatWorx Attenu8 Model Performance


Model Accuracy For New & Emerging Vulnerabilities

ThreatWorx Attenu8 Model Performance


Over 60% of the time, the model's predicted CVSS v3 score exactly matches the score that is eventually published (i.e. better than every other vulnerability). 80% of the time, the predicted score is at least 85% accurate, meaning it falls within ±1.5 points of the published score.
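
The two figures quoted above can be read as an exact-match rate and a within-tolerance rate. Below is a minimal Python sketch of how such rates might be computed from back-test pairs. The function name `score_match_rates`, the `tolerance` parameter, and the sample pairs are illustrative assumptions, not the actual ThreatWorx evaluation code or data.

```python
from typing import Iterable, Tuple


def score_match_rates(
    pairs: Iterable[Tuple[float, float]], tolerance: float = 1.5
) -> Tuple[float, float]:
    """Return (exact_match_rate, within_tolerance_rate) for (predicted, published) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (predicted, published) pair")
    # CVSS scores carry one decimal place, so round before comparing to limit float noise.
    diffs = [abs(round(predicted, 1) - round(published, 1)) for predicted, published in pairs]
    exact = sum(1 for d in diffs if d < 0.05)
    within = sum(1 for d in diffs if d <= tolerance + 1e-9)
    return exact / len(pairs), within / len(pairs)


# Made-up sample pairs, for illustration only (not real back-test data).
sample = [(9.8, 9.8), (7.5, 7.5), (8.8, 7.5), (5.3, 6.5), (9.8, 5.9)]
exact_rate, within_rate = score_match_rates(sample)
print(f"exact: {exact_rate:.0%}, within +/-1.5: {within_rate:.0%}")
```

On the 0 to 10 CVSS scale, a deviation of 1.5 points is 15% of the range, which is one way to read the "85% accurate" figure.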

Observations

  • For both (1) and (2), the model accuracy is nearly identical when operating over similar-sized populations that are spread out over a period of time and across all available vulnerability sources. This is a good indicator that the model has stabilized. Expanding the population to cover more vulnerabilities with available scores in the next set of cycles will give a better understanding of model performance on data spread over an even larger time period.
  • This gives ThreatWorx consumers the ability to analyze and prioritize unstructured content that represents vulnerability reports in the form of mailing list posts, blogs or any other raw text. Other ThreatWorx Attenu8 models that aid in content extraction take this a step further and make it relevant to the exposure of each deployment / organization.
  • CVSS scores for a given vulnerability also differ across publishers. For this analysis we compared against the score from the primary publisher (e.g. NVD for CVE reports, and other sources based on their respective advisories) and stuck with it, even where the model's accuracy fares better against scores published by other publishers. Below are a couple of examples:

CVE-2023-38060: ThreatWorx predicted a score of 8.8 and NVD scores it as 5.4; however, the publisher who actually reported this finding has scored it as 6.3.
CVE-2023-35134: ThreatWorx predicted a score of 9.8 and NVD scores it as 5.9; however, the publisher who actually reported this finding (ICS-CERT) has scored it as 7.4.
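
To make the gap concrete, the following Python sketch computes how far the predicted score sits from each publisher's score, using only the numbers quoted in the two examples above. The `deviations` helper is a hypothetical name introduced here for illustration.

```python
# Scores quoted in the two examples above: (predicted, NVD, reporting publisher).
examples = {
    "CVE-2023-38060": (8.8, 5.4, 6.3),
    "CVE-2023-35134": (9.8, 5.9, 7.4),
}


def deviations(predicted: float, nvd: float, reporter: float) -> tuple:
    """Absolute gap between the prediction and each publisher's score, to one decimal."""
    return round(abs(predicted - nvd), 1), round(abs(predicted - reporter), 1)


for cve, scores in examples.items():
    vs_nvd, vs_reporter = deviations(*scores)
    print(f"{cve}: off by {vs_nvd} vs NVD, {vs_reporter} vs reporting publisher")
```

In both cases the prediction is closer to the reporting publisher's score than to the NVD score, yet the back test counts only the deviation from the primary publisher.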

The Value Of Lead Time

Now that we have discussed prediction for prioritization, a natural outcome is the lead time it gives the operational team to analyze and act. A delay in assigning scores has a significant bearing on determining the severity of a vulnerability, and even some traditional scanning vendors will likely not add checks for vulnerabilities that are still awaiting analysis.

Before going further, it should be noted that CVSS scores on their own are not a great predictor of whether a vulnerability will eventually be weaponized. They are, however, one important factor in prioritization, and hence the lead times matter.

In the time to come we will also publish data on the lead times gained from early prediction. This is an important consideration, as it shrinks the overall window of compromise.
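
As a rough sketch of what a lead-time figure could look like, the Python snippet below treats lead time as the number of days between an early predicted score and the publisher's official score. The function name `lead_time_days` and the dates are hypothetical and serve only as an illustration; they are not ThreatWorx data.

```python
from datetime import date


def lead_time_days(predicted_on: date, published_on: date) -> int:
    """Days gained between an early predicted score and the official published score."""
    return (published_on - predicted_on).days


# Hypothetical dates, for illustration only.
print(lead_time_days(date(2023, 7, 3), date(2023, 7, 24)))  # 21
```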

Write to us with any feedback or comments: [email protected]
