What I learned making an SCA tool in 2024

Introduction

Software Composition Analysis (SCA) has been a staple in the developer's toolkit for years. With numerous commercial and open-source solutions available, one might assume it's a solved problem space. But is it really?

It's 2024, so I decided to take a fresh look at SCA methodology, challenging myself with two key constraints:

1. Rely solely on established standards

2. Minimize complexity by limiting custom code to no more than adopting a proprietary SCA solution would require

The goal was to explore whether modern package ecosystems and vulnerability databases could offer a streamlined approach to SCA, potentially rivaling traditional tools in effectiveness while maintaining simplicity.

What is SCA?

Software Composition Analysis (SCA) is like running a fine-toothed comb through your project's dependency tree. It's not just about listing what libraries you're using; it's about understanding the DNA of your software supply chain.

At its core, SCA involves:

1. Identifying direct and transitive dependencies

2. Determining the versions of these dependencies

3. Checking for known vulnerabilities

4. Analyzing licensing information

5. Assessing the overall health of your software composition

Here's a simplified example of what an SCA scan might reveal:

project/
├── main.go
└── go.mod

Dependencies:
├── github.com/pkg/errors v0.9.1
└── golang.org/x/crypto v0.0.0-20210921155107-089bfa567519
    └── golang.org/x/sys v0.0.0-20210615035016-665e8c7367d1

Vulnerabilities:
- CVE-2022-27191 in golang.org/x/crypto
- No known vulnerabilities in github.com/pkg/errors

Licenses:
- github.com/pkg/errors: BSD-2-Clause
- golang.org/x/crypto: BSD-3-Clause        
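To make the first step concrete, here is a minimal Python sketch of separating direct from transitive dependencies. The `DEPENDENCY_GRAPH` dictionary is hand-written from the example above, and the root name `project` and the helper `transitive_dependencies` are assumptions for illustration; a real tool would read the graph from a lockfile or an SBOM instead.

```python
from collections import deque

# Hand-written graph mirroring the example above: each key depends on the listed modules.
# "project" is a hypothetical name for the root module.
DEPENDENCY_GRAPH = {
    "project": ["github.com/pkg/errors", "golang.org/x/crypto"],
    "github.com/pkg/errors": [],
    "golang.org/x/crypto": ["golang.org/x/sys"],
    "golang.org/x/sys": [],
}


def transitive_dependencies(graph, root):
    """Return dependencies reachable from root that are not direct dependencies."""
    direct = set(graph.get(root, []))
    seen = set(direct)
    queue = deque(direct)
    # Breadth-first walk; the seen set also protects against dependency cycles.
    while queue:
        current = queue.popleft()
        for dep in graph.get(current, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return sorted(seen - direct)


if __name__ == "__main__":
    print("direct:", sorted(DEPENDENCY_GRAPH["project"]))
    print("transitive:", transitive_dependencies(DEPENDENCY_GRAPH, "project"))
```

A module that is both a direct and a transitive dependency is reported as direct only, which is a simplification chosen for this sketch.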

A Trip Down Memory Lane

The evolution of SCA tools is a testament to our growing awareness of supply chain security. Let's break it down:

Early Days: We started with basic license scanners. Remember when that was our biggest concern? Those were simpler times.

$ license-checker
Package: left-pad
License: MIT        

OWASP Enters the Chat: OWASP brought us game-changers like Dependency-Check. Suddenly, we were doing more than just license checks:

$ dependency-check --project MyProject --scan /path/to/project
[INFO] Checking for updates
[INFO] Analyzing dependencies
[INFO] Found 42 dependencies
[WARN] CVE-2021-44228 detected in log4j-core-2.14.1.jar        

Commercial Solutions: The big players came in with their fancy UIs and integrations. They promised the moon, and to be fair, they delivered some pretty stellar features.

The Good, The Bad, and The Buggy

Each tool had its strengths and quirks.

Some were like overeager interns – enthusiastic but prone to false positives.

Others were more like seasoned pros – reliable but sometimes set in their ways.

Users loved the comprehensive reports:

{
  "projectName": "MyAwesomeApp",
  "scanDate": "2024-06-30",
  "dependencies": [
    {
      "name": "lodash",
      "version": "4.17.20",
      "vulnerabilities": [
        {
          "id": "CVE-2021-23337",
          "severity": "HIGH",
          "description": "Command injection via the template function in lodash"
        }
      ]
    }
  ]
}        

But the complaints? Oh, they were plenty:

- "Too many false positives!"

- "It takes forever to scan our monorepo!"

- "Why does it need 16GB of RAM just to run?"

Package Managers: The Unsung Heroes

Here's where things got interesting. Our everyday package managers had some serious SCA capabilities hiding in plain sight.

Go: (kind of builtin, also not really)

$ go list -m all
github.com/pkg/errors v0.9.1
golang.org/x/crypto v0.0.0-20210921155107-089bfa567519        

npm: (builtin)

$ npm audit
found 0 vulnerabilities        

PyPI: (everything is a PEP, nothing is builtin)

$ pip-audit
No known vulnerabilities found        

SBOM: The New Silver Bullet

Software Bill of Materials (SBOM) became the new lingua franca of dependency tracking. SPDX and PURL strings turned out to be powerful tools for describing our software supply chain.

Here's a snippet of an SPDX document:

SPDXVersion: SPDX-2.2
DataLicense: CC0-1.0
SPDXID: SPDXRef-DOCUMENT
DocumentName: my-project-sbom
DocumentNamespace: https://spdx.org/spdxdocs/my-project-sbom-v1.0
Creator: Person: Jane Doe ([email protected])
Created: 2024-06-30T18:00:00Z

PackageName: my-project
SPDXID: SPDXRef-Package-my-project
PackageVersion: 1.0.0
PackageDownloadLocation: git+https://github.com/example/[email protected]
FilesAnalyzed: true
PackageLicenseConcluded: MIT        

And a PURL example:

pkg:golang/github.com/pkg/errors@v0.9.1
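A PURL is easy to take apart once you know its shape: `pkg:type/namespace/name@version`, optionally followed by `?qualifiers` and `#subpath`. The sketch below is a simplified Python parser for that common shape, not a full implementation of the PURL specification. The `Purl` tuple and `parse_purl` are names made up for this example, and the sample string is built from the `github.com/pkg/errors v0.9.1` dependency shown earlier.

```python
from typing import NamedTuple, Optional
from urllib.parse import unquote


class Purl(NamedTuple):
    type: str
    namespace: Optional[str]
    name: str
    version: Optional[str]


def parse_purl(purl: str) -> Purl:
    """Parse the common pkg:type/namespace/name@version shape (simplified)."""
    scheme, _, rest = purl.partition(":")
    if scheme != "pkg" or not rest:
        raise ValueError(f"not a PURL: {purl!r}")
    # Drop the optional #subpath and ?qualifiers parts; this sketch ignores them.
    rest = rest.split("#", 1)[0].split("?", 1)[0]
    # The version, when present, follows the last "@".
    path, _, version = rest.rpartition("@") if "@" in rest else (rest, "", "")
    # Segments are percent-encoded, e.g. %40angular for the npm scope @angular.
    segments = [unquote(part) for part in path.strip("/").split("/") if part]
    if len(segments) < 2:
        raise ValueError(f"PURL needs a type and a name: {purl!r}")
    pkg_type, *namespace, name = segments
    return Purl(pkg_type.lower(), "/".join(namespace) or None, name, unquote(version) or None)


if __name__ == "__main__":
    print(parse_purl("pkg:golang/github.com/pkg/errors@v0.9.1"))
    print(parse_purl("pkg:npm/lodash@4.17.21"))
```

For anything beyond a quick script, an existing PURL library is likely a better choice than hand-rolled parsing.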

OSV.dev: The One Database to Rule Them All

OSV.dev was like stumbling upon a treasure trove of vulnerability data. It standardized how we consume and share vulnerability information.

A sample query to OSV.dev might look like:

$ curl -X POST -d '{"package":{"ecosystem":"npm","name":"lodash"},"version":"4.17.21"}' \
  https://api.osv.dev/v1/query        
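The same query can be sketched in Python using only the standard library. The function names here (`build_query`, `query_osv`, `vulnerability_ids`) are my own for illustration. The payload mirrors the curl example, and the response handling assumes what I have seen from the API: a JSON object with a `vulns` list, or an empty object when nothing matches.

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"  # same endpoint as the curl example


def build_query(ecosystem: str, name: str, version: str) -> dict:
    """Build the request body for a single package/version lookup."""
    return {"package": {"ecosystem": ecosystem, "name": name}, "version": version}


def query_osv(payload: dict, timeout: float = 10.0) -> dict:
    """POST the payload to OSV.dev and return the decoded JSON response."""
    request = urllib.request.Request(
        OSV_QUERY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def vulnerability_ids(result: dict) -> list:
    """Return the advisory IDs in a response; an empty object means no matches."""
    return [vuln["id"] for vuln in result.get("vulns", [])]


# Usage (performs a network call):
#   result = query_osv(build_query("npm", "lodash", "4.17.21"))
#   print(vulnerability_ids(result))
```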

Putting It All Together

Now, here's where the magic happened. I cobbled together a basic SCA capability with just a few lines of code: no complex vendor authentication, no secrets of any kind, and about as many lines (or fewer) as we would add to a DevOps pipeline step for SCA anyway.

#!/usr/bin/env bash

set -euo pipefail

# Generate an SPDX SBOM from the lockfile, skipping dev dependencies
npm sbom --omit dev --package-lock-only --sbom-format spdx | jq '.' > sbom.spdx.json

# Extract PURL strings from the SPDX external references
jq -r '.packages[] | .externalRefs[]? | select(.referenceType=="purl") | .referenceLocator' sbom.spdx.json > purls.txt

# Prepare JSON for the OSV bulk query; querybatch expects {"queries": [...]}
jq -nR '{queries: [inputs | {package: {purl: .}}]}' purls.txt > query.json

# Query OSV in bulk
curl -sSL -X POST -H "Content-Type: application/json" -d @query.json \
     https://api.osv.dev/v1/querybatch > vulnerabilities.json

# Output results
echo "SPDX (truncated):"
jq '.packages | length' sbom.spdx.json
echo "packages found in SBOM"

echo -e "\nPURL strings:"
cat purls.txt

echo -e "\nVulnerabilities:"
# Results come back in query order, so pair each one with its PURL by position
jq --slurpfile q query.json '[$q[0].queries, .results] | transpose[] | select(.[1].vulns != null) | {package: .[0].package.purl, vulnerabilities: [.[1].vulns[].id]}' vulnerabilities.json

This script demonstrates the core concepts of our homegrown SCA solution.
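One detail worth spelling out is how batch results map back to packages. As far as I can tell from the OSV API, `/v1/querybatch` takes a `{"queries": [...]}` envelope and returns a `results` list in the same order as the queries, without echoing the package back, so the pairing has to be done by position. Here is a minimal Python sketch of that step. The helper names are mine, and the sample response and its advisory IDs are made up for illustration.

```python
def build_batch_query(purls):
    """Wrap PURLs in the {"queries": [...]} envelope used by /v1/querybatch."""
    return {"queries": [{"package": {"purl": purl}} for purl in purls]}


def pair_results(purls, response):
    """Match each PURL to its advisory IDs, relying on results being in query order."""
    findings = {}
    for purl, result in zip(purls, response.get("results", [])):
        ids = [vuln["id"] for vuln in result.get("vulns", [])]
        if ids:  # packages without known vulnerabilities come back as empty objects
            findings[purl] = ids
    return findings


if __name__ == "__main__":
    purls = ["pkg:npm/lodash@4.17.20", "pkg:npm/left-pad@1.3.0"]
    # Hypothetical response; EXAMPLE-0001 is a placeholder, not a real advisory ID.
    sample_response = {"results": [{"vulns": [{"id": "EXAMPLE-0001"}]}, {}]}
    print(build_batch_query(purls))
    print(pair_results(purls, sample_response))
```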

The Moment of Realisation

Running this bare-bones SCA solution was like watching a Rube Goldberg machine in action. Would all the pieces fall into place? To my amazement, it worked. It wasn't as polished as commercial solutions, but it got the job done with minimal fuss and returned more results than the commercial options I compared it against. The SBOM also made reachability analysis and transitive dependency tree tracing a non-issue.

What I Learned

This adventure in SCA showed me that sometimes, the best solutions are hiding in plain sight. By leveraging standards and existing tools, we can create powerful, yet simple, security solutions.

It's not about having the fanciest tool; it's about understanding your stack and using the right tools for the job.

In the rapidly evolving world of software development, sometimes the most elegant solutions are the ones that make you wonder why you didn't think of them sooner.

So, the next time you're faced with a complex problem, take a step back. Look at the tools you already have. You might be surprised at what you can achieve with a little creativity and some elbow grease.

Now, go forth and compose your software wisely!

Bonus: Enriching Your Vulnerability Data

If you find yourself craving richer vulnerability information, consider exploring VulnCheck before jumping to one of the big-name vendors in the industry.

VulnCheck offers a PURL API that includes valuable research metadata with each vulnerability. This additional context can be crucial for understanding and prioritizing threats effectively. They also provide free access to two other noteworthy APIs:

1. NVD++: An enhanced version of the National Vulnerability Database, offering deeper insights into each vulnerability.

2. KEV (Known Exploited Vulnerabilities): Focuses on vulnerabilities actively exploited in the wild, helping prioritize security efforts.

Using these APIs in conjunction with our homegrown SCA tool could significantly boost its capabilities, allowing for more nuanced vulnerability assessment and prioritization.
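As a rough illustration of what that prioritization could look like, the Python sketch below moves findings with a known-exploited CVE to the top of the list. It deliberately makes no API calls: how you fetch the KEV list is left out, and the `id` and `aliases` fields follow the shape of full OSV records as I understand them. The function name and every ID in the sample data are invented placeholders.

```python
def prioritize(findings, kev_cve_ids):
    """Sort findings so those with an alias in the KEV set come first, then by ID."""
    kev = set(kev_cve_ids)

    def sort_key(finding):
        exploited = bool(kev.intersection(finding.get("aliases", [])))
        # False sorts before True, so exploited findings lead the list.
        return (not exploited, finding["id"])

    return sorted(findings, key=sort_key)


if __name__ == "__main__":
    # Placeholder data for illustration only; none of these IDs are real.
    findings = [
        {"id": "EXAMPLE-0001", "aliases": ["CVE-0000-0001"]},
        {"id": "EXAMPLE-0002", "aliases": ["CVE-0000-0002"]},
        {"id": "EXAMPLE-0003", "aliases": []},
    ]
    kev_cve_ids = {"CVE-0000-0002"}
    for finding in prioritize(findings, kev_cve_ids):
        print(finding["id"])
```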

Disclaimer: This is not a sponsored section. I'm sharing this information because I believe in the value of enriched vulnerability data and I've found VulnCheck's approach interesting and potentially useful for developers looking to enhance their SCA capabilities.

Keep an eye on GitHub Marketplace for my app, "Triage, by Trivial Security".

https://triage.trivialsec.com is live but in stealth.

Follow the Trivial Security page for announcements.
