C++, Memory Safety and Rust
An executive summary, plus the details
Authors
Benedetto Proietti, Principal Software Engineer, Freelancer
Toby Rawlinson, Chief Business Officer, Janea Systems
Executive summary
In the last 18 months, two significant bodies of the US Government (the NSA and the White House) have published reports that essentially discourage the use of 'memory unsafe' languages, explicitly naming C++ among them.
Although 'memory unsafe' is not a formally defined term, there is general agreement on its meaning: it refers to the use of objects in a programming language when it is not possible to know whether those objects are still 'alive'.
C++ is a pervasive and foundational language. Most (if not all) critical machinery and systems in the world run programs written in C++. This is because C++ allows complete control of the machine while still providing powerful abstractions, like other 'higher level' languages. That control comes at a cost: the language does not fully track or enforce object ownership and lifetimes, which is a very complex problem.
C++ is standardized by an ISO committee chaired by Herb Sutter, whose members also include the creator of C++, Bjarne Stroustrup. The committee has proposed a plan to address this issue. The plan is still at an early stage and, in the authors' view, does not look easy for the general population of software engineers to adopt. Bjarne also replied to a LinkedIn post by one of the authors, pointing to the various proposals that make up this plan.
The consequences of this 'memory safety' issue vary enormously from industry to industry. Industries that need extremely low latency (HFT) or complete control of the machine (automotive, aerospace) will likely keep their core code in C++. However, in all industries, non-critical application code might partially and slowly migrate to Rust.
Rust’s 'borrow checker
For all these reasons, the authors believe that even if C++ does not immediately solve or mitigate its 'memory safety' issues, it will persist as a foundational language in many industries.
Even within the government, the authors doubt that anything will change drastically in the next decade or so. We invite readers to remember what happened at the US DoD in the 1970s and 1980s, when there was a great push to use the Ada language. That push led to a decade of technical problems and delays, and C++ eventually emerged as the de facto programming language for mission-critical scenarios.
What happened?
NSA Report
On November 10, 2022, the US National Security Agency (NSA) published guidance to help software developers and operators prevent and mitigate software memory safety issues.
Here is the announcement link, and here is the full report.
In the report the NSA recommends ‘using a memory safe language when possible.’
C and C++ are explicitly mentioned as languages that “provide a lot of freedom and flexibility in memory management” but “Simple mistakes can lead to exploitable memory-based vulnerabilities”.
The report does not provide formal definitions for "memory safety" or "memory-safe language."
The authors requested these definitions from the NSA, which kindly replied:
While the guidance does not include formal definitions for those terms, there are descriptions. While not formal definitions, here are some further descriptions that may help.
Memory safety involves managing memory allocations and protecting memory accesses so that insecure conditions that could be exploited are unlikely to arise.
Memory safe languages are computer languages that automatically manage memory allocations and accesses in most cases to reduce the risk of programs having exploitable memory related vulnerabilities.
One of the authors, who has a background in mathematics, would prefer more formal definitions. But this is a start.
White House Report
On February 26, 2024, the US White House released a report titled “Back to the Building Blocks: A Path Toward Secure and Measurable Software” (link).
As part of its ongoing effort to “defend the cyberspace”, the report singles out memory safety vulnerabilities as among the most dangerous. In its words: “… in order to reduce memory safety vulnerabilities at scale, creators of software and hardware can better secure the building blocks of cyberspace. This report focuses on the programming language as a primary building block”.
The report explicitly mentions C and C++ as “not safe programming languages” and states that Rust is “one example of a memory safe language”.
The paper does not define the terms "memory safe programming language" or "memory safety." Similarly, other White House papers on the subject mentioned in this document do not provide definitions for these terms either.
The authors requested these definitions from the White House but had not received a response at the time of writing.
What does ‘memory safety’ mean?
We have not found a widely recognized formal definition of ‘memory safety’.
However, there seems to be an informal agreement on what the ‘memory safety’ issues are. Let’s list them:
· using an object after its destruction
· dereferencing “bad” pointers (e.g., accessing an STL iterator after the underlying vector has changed)
· using uninitialized memory
· buffer overflows
· concurrency issues
We have purposefully ignored ‘lower level’ issues around pointers (ex: calling ‘free’ twice on the same pointer).
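#include <utility>
#include <vector>

struct Class { std::vector<int> data{1, 2, 3}; void do_something() {} };  // placeholder type
void bar(Class) {}   // takes `Class` by value, i.e. takes ownership
void bar2(int) {}

void foo() {
    Class obj;
    bar(std::move(obj));
    obj.do_something();  // error!!! Ownership was moved to `bar`; `obj` is now in a moved-from state
}

void foo2() {
    std::vector<int> vec(10, 10);
    auto iter = vec.begin();
    vec.resize(100'000);
    bar2(*iter);  // error!!! Growing past capacity reallocates `vec`, so `iter` dangles (undefined behavior)
}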
Two examples of unsafe operations are shown above; many, many more exist. This article does not aim to be a comprehensive list of 'memory unsafe' operations in C++.
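For comparison, here is a minimal sketch of how `foo2` could avoid its bug using only standard C++. It holds an index rather than an iterator across the `resize`. It also uses `std::vector::at`, which checks bounds and throws `std::out_of_range`, as a guard against the buffer-overflow item in the list. The helper names `first_after_resize` and `checked_read` are hypothetical. These are conventions the programmer must remember; the compiler does not enforce them.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Safe variant of foo2: remember a position by index, not by iterator.
// An index stays meaningful after `resize` reallocates; an iterator does not.
int first_after_resize() {
    std::vector<int> vec(10, 10);
    const std::size_t pos = 0;
    vec.resize(100'000);   // may move the elements to a new buffer
    return vec[pos];       // reads through the current buffer
}

// Bounds-checked read: an out-of-range index throws std::out_of_range
// instead of silently reading past the end of the buffer.
int checked_read(const std::vector<int>& vec, std::size_t i) {
    return vec.at(i);
}
```

Calling `checked_read(v, v.size())` throws rather than reading one element past the end. The cost is a branch on every access, which is one reason `operator[]` stays unchecked by default.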
In general, a function that receives an object as a parameter (by reference or by pointer) cannot know for sure whether that object is fully constructed and still valid.
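With a raw pointer or a reference, the callee has no way to ask whether the object is still alive. The standard library does offer one opt-in mitigation, `std::weak_ptr`. Its `lock()` returns an empty `std::shared_ptr` once the object has been destroyed. Below is a minimal sketch with a hypothetical `Session` type and a hypothetical `describe` function. It only helps for objects that are already managed by `std::shared_ptr`.

```cpp
#include <memory>
#include <string>

struct Session {          // hypothetical example type
    std::string user;
};

// The callee receives a non-owning handle and checks liveness before use.
std::string describe(const std::weak_ptr<Session>& handle) {
    if (std::shared_ptr<Session> s = handle.lock()) {  // non-null only if still alive
        return "active: " + s->user;
    }
    return "expired";
}
```

A raw `Session*` passed in the same situation would give `describe` nothing to check.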
In software engineering, local reasoning refers to the ability to understand, verify, and predict the behavior of a piece of code by examining only a small, localized section of the codebase, without needing to consider the entire system. This concept is essential for making code easier to comprehend, maintain, and debug.
In C++ this is generally not possible where memory safety is concerned. You often cannot look at a piece of code and tell whether the objects it uses are still valid without also reading code elsewhere, for example its callers.
Herb Sutter’s opinion
Herb Sutter is a prominent and authoritative voice in the C++ world. He has been the Chair of the ISO C++ committee for more than a decade.
He wrote a blog post titled ‘C++ safety, in context’. Here’s the link; we suggest reading it.
Herb has set a target of 98% fewer CVEs for C++.
That is a 50x reduction: not exactly a trivial goal.
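The arithmetic behind the 50x figure, writing N for today's number of C++ CVEs:

$$N_{\text{target}} = (1 - 0.98)\,N = 0.02\,N = \frac{N}{50}$$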
CVE
CVE stands for ‘Common Vulnerabilities and Exposures’.
When a vulnerability is found in an operating system or any other software, it is assigned an identifier (e.g., CVE-2024-23583: “An attacker could potentially intercept credentials via the task manager and perform unauthorized access to the Client Deploy Tool on Windows systems.”) and added to a database of CVEs (such as the National Vulnerability Database maintained by NIST).
When you run a security ‘tool’ on your system, it essentially checks your installed software against these public databases (yes, that’s it).
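To make the idea concrete, here is a minimal sketch of that matching step. Everything in it is hypothetical: the `Advisory` record, the `find_matches` function, and the product names and advisory IDs used with it. None of these is a real database schema or a real CVE. Real scanners also normalize product names and compare version ranges; this sketch reduces that to an exact list of affected versions.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical advisory record: which product versions a vulnerability affects.
struct Advisory {
    std::string id;                              // placeholder ID, not a real CVE
    std::string product;
    std::vector<std::string> affected_versions;  // exact-match list for simplicity
};

// Return the IDs of advisories that match the installed software (product -> version).
std::vector<std::string> find_matches(const std::map<std::string, std::string>& installed,
                                      const std::vector<Advisory>& advisories) {
    std::vector<std::string> hits;
    for (const Advisory& adv : advisories) {
        auto it = installed.find(adv.product);
        if (it == installed.end()) continue;     // product not installed
        for (const std::string& v : adv.affected_versions) {
            if (v == it->second) {               // installed version is affected
                hits.push_back(adv.id);
                break;
            }
        }
    }
    return hits;
}
```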
When we talk about ‘C++ CVEs’ we do NOT refer to vulnerabilities of the language itself, but rather to vulnerabilities in applications written with C++.
CVEs per year
First, let’s look at how many CVEs per year were reported for each language over the last five years.
Unsurprisingly, C++ has many more. That is understandable: C++ has been around for 40 years, and far more software written in it is running today.
C++ Industries
C++ is a powerful language that allows you to completely control the machine underneath your keyboard. But with great power comes … yes, great responsibility.
Over the last four decades, the language has evolved and now offers several layers of abstraction that can, when needed, shield you from low-level machine details. Those details, however, remain available whenever you need them.
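To illustrate, here is the same sum written twice in standard C++: once with a library algorithm, and once with raw pointer arithmetic over the same storage. The function names are hypothetical. The low-level version is the kind of direct access described above. It is also the version where staying inside the buffer is entirely up to the programmer.

```cpp
#include <numeric>
#include <vector>

// High level: the algorithm handles iteration and bounds.
long sum_high_level(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0L);
}

// Low level: raw pointer arithmetic over the same storage.
// Only the loop condition we wrote keeps `p` inside the buffer.
long sum_low_level(const std::vector<int>& v) {
    long total = 0;
    const int* p = v.data();
    const int* end = p + v.size();
    for (; p != end; ++p) {
        total += *p;
    }
    return total;
}
```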
Now, for some industries it is crucial to access and control the hardware. Think about the embedded industry: it’s obvious. But also think about High Frequency Trading (HFT). Those developers need nanosecond-level responses. They need to control everything happening on the computer, from the memory bus to the PCIe bus.
In the authors' opinion, those industries should generally be more sensitive to 'memory safety' issues. However, the top talent present in those companies probably manages and controls all core aspects of the software, at least in critical components, to the extent that 'memory safety' issues in these contexts are less impactful (in those core components). The authors kindly ask developers working in those industries to confirm (or contradict) this hypothesis.
In our opinion, other industries possibly less sensitive to ‘memory safety’ issues (for the same reason) are: Aerospace and Defense, Blockchain and Cryptography, VR and AR, Audio and Video Processing, HPC, Energy Sector, AI and ML, Robotics, Medical Devices, HFT and Telecommunications.
In contrast, the industries more sensitive to ‘memory safety’ issues are, in our opinion: Game Development (excluding the 3D and Physics engine), Finance (excluding HFT), Healthcare, Web Browsers, Education, E-commerce and Retail, and Transportation.
C++: plan to mitigate “memory issues”
One of the authors wrote a post on LinkedIn expressing concern about these ‘memory safety’ issues. The creator of C++, Bjarne Stroustrup, kindly replied, pointing to the proposals currently before the ISO C++ committee that are intended to mitigate or solve such problems.
These are the mentioned proposals:
These are early-stage proposals, with neither a clear roadmap nor any indication that this approach will be successfully adopted by the general population of C++ developers.
The authors believe that the ‘Profiles’ approach might not be easily understood by the majority of C++ developers. Additionally, it will probably require too much effort from developers. We hope to be wrong.
Programming language history shows that simplicity and ease of learning are key factors for languages (or language features) to quickly gain developer adoption (think of JavaScript and Python).
Rust
Rust development started around 2010, and the first stable version (1.0) was released in 2015. In 2016 Rust won the first of several "most loved programming language" awards in Stack Overflow's Developer Survey, reflecting its growing popularity and positive reception within the developer community.
The basic principles of Rust were (and still are) safety, concurrency and performance. Rust’s most important feature is without a doubt the Borrow Checker, which we will examine in a moment.
Rust started being used in Windows and Linux components around 2021.
Borrow Checker
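The Borrow Checker enforces language rules at compile time. It ensures that each object has only one owner at a time. It also ensures that references to the object (borrows) never outlive it, and that at any moment there is either a single mutable reference or any number of shared, read-only ones.
In safe Rust (code outside `unsafe` blocks), this prevents data races, dangling pointers, use of uninitialized memory and use after move at compile time. Out-of-bounds indexing is caught by a run-time check instead.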
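For contrast, in C++ a counter shared between threads without synchronization compiles without complaint and is a data race (undefined behavior). Avoiding it is entirely the programmer's job. Below is a minimal sketch of the correct C++ version using `std::atomic`; the function name and its parameters are hypothetical. In safe Rust, the unsynchronized variant would be rejected at compile time.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Each thread increments a shared counter. With a plain `int` counter this
// would still compile but be a data race; std::atomic makes each increment indivisible.
int count_with_threads(int threads, int increments_per_thread) {
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (std::thread& w : workers) {
        w.join();   // joining also makes all increments visible to this thread
    }
    return counter.load();
}
```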
Ownership Model
Ownership: Each value in Rust has a single owner, which is the variable that holds the value. When the owner goes out of scope, the value is automatically deallocated.
Move Semantics: When an owner variable is assigned to another variable (or passed by value to a function), ownership is transferred (moved), unless the type is `Copy` (such as integers). The original variable can no longer be used.
As noted, these rules allow the compiler to verify the code statically and prevent ‘memory safety’ issues.
This comes at the cost of a syntax and a set of rules that are not simple and take time to learn, at least for the majority of professional software developers. This topic is frequently discussed wherever developers meet online.
As mentioned earlier, ease of use is a key factor for wide language adoption.
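The two ownership rules described above have a close, opt-in analogue in C++: `std::unique_ptr`. It has a single owner, transfers ownership through `std::move`, and frees the object when its owner goes out of scope. The sketch below shows all three using a hypothetical `Tracked` type that counts live instances; the helper names are also hypothetical. The difference is enforcement. Rust rejects any later use of a moved-from variable at compile time. In C++, the moved-from `unique_ptr` is simply null, and dereferencing it still compiles.

```cpp
#include <memory>
#include <utility>

// Hypothetical type that counts how many instances are currently alive.
struct Tracked {
    static inline int alive = 0;   // C++17 inline static member
    Tracked() { ++alive; }
    ~Tracked() { --alive; }
};

// Takes ownership; the object is destroyed when `owned` goes out of scope.
void consume(std::unique_ptr<Tracked> owned) {
    (void)owned;
}

// True if the object had one owner, was freed after the move,
// and the moved-from pointer was left null.
bool demo_ownership() {
    auto first = std::make_unique<Tracked>();   // single owner
    const bool one_alive = (Tracked::alive == 1);
    consume(std::move(first));                  // ownership moves into consume()
    return one_alive && first == nullptr && Tracked::alive == 0;
}
```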
Critical CVE
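We want to mention a very critical CVE affecting Rust to show that even Rust is not immune to severe vulnerabilities. Strictly speaking, this one is not a ‘memory safety’ issue but a command-injection flaw in Rust’s standard library.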
Consider CVE-2024-24576.
This CVE is clearly explained in this blog post. Here’s a small excerpt: ‘This flaw is due to OS command and argument injection weaknesses that can let attackers execute unexpected and potentially malicious commands on the operating system.’
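The general pattern behind this class of flaw can be sketched in C++ without running anything. Untrusted text ends up interpreted by a shell; in this CVE's case that is `cmd.exe` when batch files are invoked on Windows. Naive string concatenation lets an argument smuggle in a second command, and a hypothetical allow-list check rejects it. The function names and the allow-list policy are assumptions for illustration. This is not how the Rust standard library addressed the issue.

```cpp
#include <string>

// Naive: the argument is pasted into a shell command line as-is.
// With arg = "a.txt & calc", a shell would run two commands.
std::string build_command_naive(const std::string& arg) {
    return "type " + arg;
}

// Hypothetical allow-list: accept only letters, digits, '.' and '_'.
bool is_safe_argument(const std::string& arg) {
    if (arg.empty()) return false;
    for (unsigned char c : arg) {
        const bool ok = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                        (c >= '0' && c <= '9') || c == '.' || c == '_';
        if (!ok) return false;
    }
    return true;
}
```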
Back to the Government
The lost decade: the rise and fall of the Ada programming language in the 70s and 80s
In the 1970s, the DoD was concerned about the proliferation of different programming languages used across various projects, leading to issues with software maintenance, interoperability, and costs. This diversity made it difficult to ensure software reliability and security in defense systems.
The DoD decided to create a new programming language that could address these concerns, focusing on reliability, maintainability, and efficiency. In 1983, the DoD mandated the use of Ada for all defense-related software development projects where feasible, intending to ensure standardization and improve software quality across military systems.
The mandate required Ada to be used unless a waiver was granted, usually justified by the specific needs of a project or existing investments in other languages. Ada was adopted for many defense projects and was praised for its strong typing, modularity, and support for real-time programming.
Despite its strengths, there was resistance from some developers and organizations due to the perceived complexity of Ada, the learning curve, and the costs associated with transitioning existing systems to the new language.
In the 1990s, the DoD began to ease the mandate as other languages, such as C and C++, became more popular and were seen as capable alternatives for many applications.
Conclusions
About 30 years later, the history described above may be repeating itself. Or maybe not: time will tell.
As shown above, C++ permeates many different industries. It is certainly possible that in some of them Rust will take a central role because of its Borrow Checker and the reduced risk of ‘memory safety’ issues.
In other industries, such as HFT or Aerospace, C++ will still have a key role in critical components and functionalities.
It should be clear that there is no alternative to C++ when hardware control and performance are needed (unless you build your own hardware, such as an FPGA or ASIC, which is another story).
Rust could penetrate these industries, replacing C++ in minor and non-critical components and functionalities.
Get in touch directly
Discover why tech giants trust our expertise to deliver outstanding technical outcomes.
Contact us for expert help to discuss how to assess, safeguard and deliver on your tech initiatives and development practices.
We help software engineering leaders solve their most complex software & product challenges.