If you have been following my recent posts, you have probably noticed that I have been learning about CodeQL lately.
Today I looked at the CodeQL CLI, which lets you run CodeQL analysis from the command line.
What is CodeQL?
CodeQL is a query language and semantic code analysis engine developed by GitHub for analyzing code and identifying potential security vulnerabilities or coding errors.
- It allows developers to write queries that can analyze codebases to find specific patterns, such as potential security flaws, bad coding practices, or logic errors.
- The queries are written in QL, a declarative, object-oriented query language with SQL-like syntax that can reason about code structure, data flow, control flow, and other semantic programming concepts.
- CodeQL supports analyzing code written in many popular programming languages like JavaScript, Python, Java, C/C++, C#, Go and more.
- It integrates into the development workflow, allowing queries to run as part of CI/CD pipelines or code editors.
- CodeQL queries are shareable and can be combined into query suites to comprehensively analyze projects.
- It powers the code scanning security feature in GitHub to help find vulnerabilities across repositories.
CodeQL CLI
The CodeQL CLI allows you to run CodeQL queries and analysis from the command line on your local machine or CI/CD environment.
- It provides a way to create CodeQL databases from your source code to run queries against. These databases contain the data flows, control flows and other semantic information about the codebase.
- You can execute custom CodeQL query files or query suites against these databases using the CLI.
- For compiled languages, it works with build systems like Make, MSBuild, Gradle etc. by observing your build so that the database is populated during compilation.
- The results of the queries are written to an output file (for example SARIF or CSV), which you can then grep, filter and analyze.
- You can run the CLI in a containerized environment to have a consistent CodeQL setup.
- The CLI supports features like autobuilding databases, running multiple queries in parallel, diffing results between databases and more.
- It provides a way to integrate CodeQL as part of your CI/CD pipelines for continuous code analysis.
Getting started with CodeQL CLI
1. Install the CodeQL CLI
2. Add CodeQL CLI to PATH
- On Windows: Add the directory containing the extracted codeql executable to your System PATH
- On Linux/macOS: Add the directory to PATH in your .bashrc or .zshrc file: export PATH=$PATH:/path/to/codeql
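For the Linux/macOS case, a minimal sketch of a PATH update that is safe to run more than once might look like this. The `CODEQL_HOME` variable and its default of `$HOME/codeql` are assumptions for illustration; point it at wherever you actually extracted the CLI.

```shell
# Assumption: the CodeQL CLI was extracted to $HOME/codeql (adjust as needed).
CODEQL_HOME="${CODEQL_HOME:-$HOME/codeql}"

# Append the directory to PATH only if it is not already listed.
case ":$PATH:" in
  *":$CODEQL_HOME:"*) ;;
  *) export PATH="$PATH:$CODEQL_HOME" ;;
esac

# Confirm the shell can find the binary; print a hint otherwise.
if command -v codeql >/dev/null 2>&1; then
  codeql version
else
  echo "codeql not found; check that $CODEQL_HOME contains the codeql executable" >&2
fi
```

Putting these lines in .bashrc or .zshrc keeps PATH from growing every time a new shell starts.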
3. Create CodeQL Databases
- Navigate to the root of your source code repository
- Run codeql database create <database> --language=<language> --source-root . (e.g. codeql database create codeql-database --language=javascript)
- This will create a CodeQL database for your project in the directory you named (here, codeql-database)
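As a rough sketch, the database creation step could be wrapped in a small helper like the one below. The function name `create_db` and the database directory name are hypothetical, and the example assumes a JavaScript project in the current directory.

```shell
# Hypothetical helper wrapping the database-creation step.
# $1 = directory to create the database in, $2 = language, $3 = source root
create_db() {
  codeql database create "$1" --language="$2" --source-root="$3"
}

# Assumed values for illustration: a JavaScript project in the current directory.
DB_DIR="codeql-database"
if command -v codeql >/dev/null 2>&1; then
  create_db "$DB_DIR" javascript .
else
  echo "codeql not on PATH; skipping database creation" >&2
fi
```

If the database directory already exists, the command will likely refuse to run; adding the `--overwrite` flag is one way to recreate it.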
4. Run CodeQL Queries
- Visit https://github.com/github/codeql and find/write the query you want to run
- Save the .ql query file locally
- Run codeql database analyze <database> <query> --format=<format> --output=<results>, e.g. codeql database analyze codeql-database /path/to/query.ql --format=sarif-latest --output=results.sarif
- This executes the query against the CodeQL database
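The analysis step can be sketched the same way. The helper name `analyze_db`, the database directory, and the output file name are assumptions; the query path is a placeholder. This also assumes the query carries the metadata that `database analyze` needs to interpret its results.

```shell
# Hypothetical helper wrapping the analysis step.
# $1 = database directory, $2 = query file (.ql) or suite (.qls), $3 = output file
analyze_db() {
  codeql database analyze "$1" "$2" --format=sarif-latest --output="$3"
}

DB_DIR="codeql-database"      # assumed database location
QUERY="/path/to/query.ql"     # placeholder path, replace with a real query
if command -v codeql >/dev/null 2>&1 && [ -d "$DB_DIR" ]; then
  analyze_db "$DB_DIR" "$QUERY" results.sarif
else
  echo "codeql or $DB_DIR missing; skipping analysis" >&2
fi
```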
5. View Results
- Open the output file (the path you passed to --output) to see the query results
- Interpret any findings based on the documentation of the query
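For a quick first look at the results without opening the file, something like the following could work. It assumes the output is SARIF and simply counts `"ruleId"` keys as a stand-in for the number of findings; this is a crude text match, not a real JSON parse, and the helper name and file name are hypothetical.

```shell
# Hypothetical helper: count result entries in a SARIF file by counting
# "ruleId" keys. This is a text match, so treat the number as approximate.
count_findings() {
  grep -o '"ruleId"' "$1" | wc -l | tr -d ' '
}

RESULTS="results.sarif"   # assumed output file name
if [ -f "$RESULTS" ]; then
  echo "findings: $(count_findings "$RESULTS")"
else
  echo "$RESULTS not found; run the analysis step first" >&2
fi
```

A JSON-aware tool would be more reliable for anything beyond a rough count.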
Subcommands / Options
Subcommands:
- database create - Create a CodeQL database from source code for querying
- database analyze - Run CodeQL queries against an existing database
- test run - Run unit tests for CodeQL queries
- pack init - Create a new CodeQL pack (generates a qlpack.yml file)
- pack install - Install the dependencies of a CodeQL pack
- database bundle - Create a relocatable archive of a CodeQL database
Options:
- --language=<lang> - Specify the language(s) for analysis
- --source-root - Root directory of source files to extract into database
- --threads=<n> - Number of threads to use for parallelization
- --ram=<MB> - Amount of memory to use, in megabytes (e.g. --ram=4096)
- --extractor-option=<name=value> - Pass options to the language extractor during database creation
- --output=<file> - Where to save results
- --format=<format> - Format of the results file (e.g. --format=sarif-latest for SARIF)
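To see how several options combine, here is a sketch that prints one analysis invocation so it can be reviewed before running. The helper name `analyze_cmd`, the database name, the query path, and the specific values are assumptions for illustration.

```shell
# Hypothetical helper: print the analyze command for a given thread count and
# memory budget (in MB) instead of executing it.
analyze_cmd() {
  echo codeql database analyze codeql-database /path/to/query.ql \
    --threads="$1" --ram="$2" --format=sarif-latest --output=results.sarif
}

# Example: 4 threads and roughly 4 GB of memory.
analyze_cmd 4 4096
```

Removing the `echo` turns the dry run into a real invocation.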
SARIF: Static Analysis Results Interchange Format
SARIF stands for "Static Analysis Results Interchange Format". It is an open, standardized file format for representing static analysis results, defined in a standard specification published by OASIS.
- It is designed to make static analysis results shareable and interoperable between different tools and platforms.
- SARIF files contain structured data about defects, metrics, code locations, suppressed alerts, and more output by static analysis tools.
- The format is defined as a JSON schema, making SARIF files readable by both humans and machines.
- It supports comprehensive metadata about the analysis run, tool details, artifact locations, code flows, call stacks, and rich result information.
- SARIF enables integrating static analysis findings into development workflows, IDEs, CI/CD pipelines, and reporting tools.
- Many IDEs and DevOps platforms can view or import SARIF, for example through the SARIF Viewer extension for VS Code or SARIF uploads to GitHub code scanning.
For CodeQL CLI specifically, passing --format=sarif-latest together with --output=<file> to database analyze saves the query analysis results directly in the SARIF format. This makes it easy to:
- Review results in IDEs/viewers with SARIF support
- Integrate into engineering systems consuming SARIF data
- Archive/compare results over time
- Exchange results between different teams/tools
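To make the structure a little more concrete, the sketch below writes a hand-made, minimal SARIF log to a temporary file and lists the rule IDs it mentions. The tool name, rule ID, message, and file path are invented for illustration and are not real CodeQL output; the extraction is a text match rather than a proper JSON parse.

```shell
# Write a hand-made, minimal SARIF log to a temporary file (illustrative data only).
SAMPLE="$(mktemp)"
cat > "$SAMPLE" <<'EOF'
{
  "version": "2.1.0",
  "runs": [
    {
      "tool": { "driver": { "name": "ExampleTool" } },
      "results": [
        {
          "ruleId": "example/rule-id",
          "message": { "text": "Example finding." },
          "locations": [
            {
              "physicalLocation": {
                "artifactLocation": { "uri": "src/app.js" },
                "region": { "startLine": 10 }
              }
            }
          ]
        }
      ]
    }
  ]
}
EOF

# List the distinct rule IDs mentioned in the file.
RULE_IDS="$(grep -o '"ruleId": *"[^"]*"' "$SAMPLE" | sed 's/.*: *"\(.*\)"/\1/' | sort -u)"
echo "$RULE_IDS"
```

Each entry under `results` is one finding, tied to a rule and a code location, which is the information viewers and CI integrations read.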
Key Takeaways
I am learning a lot about CodeQL, but I am not yet proficient with it. I would like to keep learning, and these are the next steps I have in mind:
- Go through the official CodeQL tutorials: GitHub has an excellent set of interactive tutorial lessons on https://codeql.github.com/docs/codeql-overview/tutorials/. These cover writing basic queries, understanding CodeQL concepts like data/control flow, and more with sample code to practice on.
- Explore sample queries and query suites: GitHub maintains open source repositories with many sample/example CodeQL queries across languages like https://github.com/github/codeql. Go through these to understand real-world security queries.
- Set up a local CodeQL environment: Install the CodeQL CLI on your machine as per the instructions earlier. Create CodeQL databases for an open source project you use/understand. Run various queries against those databases to see results.
- Write queries for your own code: Once you understand the basics, try writing simple CodeQL queries for your own application codebases.