Proposed C++ Manifold Operator
The following details the design, implementation, and WG21 discussion on proposed manifold operators for the C++ programming language.
It is suggested that four new operators be introduced into ISO/IEC 14882:2020(E) Standard – Programming Language C++.
The stated operators aim to simplify common code patterns, improve clarity when interpreting multifaceted conditionals, and reduce unnecessary evaluation when compared to alternative solutions; see below.
The chosen syntax was selected to align with similar bitwise operations XOR, AND, OR, and NOT. Surrounded by the subscript operator tokens to signify a sequence of values. It is hoped this would lead to a more intuitive interpretation of the operators given a basic understanding of existing programming languages.
?? all-of-operator syntax was altered during testing to avoid lambda conflicts.
The following example was liberated from LLVM's SmartPtrModeling.cpp:460.
Before the Proposal
const OverloadedOperatorKind OOK = FD->getOverloadedOperator();
if (!(OOK == OO_EqualEqual || OOK == OO_ExclaimEqual ||
OOK == OO_Less || OOK == OO_LessEqual ||
OOK == OO_Greater || OOK == OO_GreaterEqual ||
OOK == OO_Spaceship))
return false;
After the Proposal
const OverloadedOperatorKind OOK = FD->getOverloadedOperator();
if (OOK [!] OO_EqualEqual, OO_ExclaimEqual, OO_Less, OO_LessEqual,
OO_Greater, OO_GreaterEqual, OO_Spaceship)
return false;
The post-proposal logic results in a 21.7% decrease in character count and a reduction in cognitive load — no requirement exists to study each comparison operator delimited by subsequent logical operators.
Constraints
Datatypes and expressions valid on the right-hand side (RHS) of a comparison operator remain valid with the manifold operators; where it does not conflict with the following:
Expectations
The suggested manifold operators are solving a niche problem with somewhat subjective reasoning used to justify their introduction. It is highly unlikely that the proposal will make any headway with WG21.
Alternative Solutions
A number of existing solutions can be crafted using standard language constructs, and it is important that we justify any new operators by demonstrating where existing capabilities fall short.
?? Design of the alternative solutions are geared towards readability, they are not intended to be exhaustive, optimal, or complete.
Macro Solution
A variadic macro implementation has all the shortcomings of macros but most importantly presents a difficult solution to debug and requires the addition of new definitions as the argument list expands.
#define NOT_OF0(lhs,rhs,...) lhs != rhs
#define NOT_OF1(lhs,rhs,...) lhs != rhs && NOT_OF0(lhs, __VA_ARGS__)
// Add NOT_OF2, 3, 4, 5... as desired.
#define NONE_OF(lhs, ...) NOT_OF1(lhs, __VA_ARGS__)
const OverloadedOperatorKind OOK = FD->getOverloadedOperator();
if (NONE_OF(OOK, OO_EqualEqual, OO_ExclaimEqual,
OO_Less, OO_LessEqual,
OO_Greater, OO_GreaterEqual,
OO_Spaceship))
return false;
Template Solution
A variadic template combined with a C++17 fold expression offers an improved solution. Yet, it fails to avoid unnecessary evaluation of the RHS, as there is no concept of left-to-right or right-to-left evaluation of an expression in C++ — see Order of evaluation for details.
template <typename LHS, typename... Args>
auto none_of(LHS lhs, Args... args)
{
const auto predicate = std::not_equal_to<LHS>();
return (predicate(lhs, args) && ...);
}
const OverloadedOperatorKind OOK = FD->getOverloadedOperator();
if (none_of(OOK, OO_EqualEqual, OO_ExclaimEqual,
OO_Less, OO_LessEqual,
OO_Greater, OO_GreaterEqual,
OO_Spaceship))
return false;
STL Solution
The Standard Template Library (STL) produces a verbose conditional construction with the pitfalls (order of evaluation) of the variadic template solution.
const std::set<OverloadedOperatorKind> ComparisonOperators = {
OO_EqualEqual, OO_ExclaimEqual,
OO_Less, OO_LessEqual,
OO_Greater, OO_GreaterEqual,
OO_Spaceship
};
const OverloadedOperatorKind OOK = FD->getOverloadedOperator();
if (std::none_of(std::cbegin(ComparisonOperators),
std::cend(ComparisonOperators),
[&](auto op) { return op == OOK; }))
return false;
Helper Functions
Clean, concise, and readable - a hard case to dismiss.
However, results in the proliferation of many helper functions to address all possible combinations throughout a codebase e.g. is_comparison_operator, is_equality_operator, is_not_equality_operator.
It too suffers from the order of evaluation pitfall, if the argument list grows.
const OverloadedOperatorKind OOK = FD->getOverloadedOperator();
if (is_not_comparison_operator(OOK))
return false;
C++ Operators
The language has a number of existing operators that can be classified into one of six categories Arithmetic, Relational, Logical, Bitwise, Assignment, and Conditional.
The manifold operators align with the singular existing conditional operator within the language, Ternary.
conditional-expression: [C99 6.5.15]
logical-OR-expression
logical-OR-expression '?' expression ':' conditional-expression
[GNU]
logical-OR-expression '?' ':' conditional-expression
As with the Ternary operator the manifold operators cannot be overridden, this is to ensure that true/false expressions are only evaluated based on the truth or falseness of the conditional expression.
The order of evaluation is maintained due to operator associativity, overcoming the stated pitfall of the alternative solutions above.
Associativity is the left-to-right or right-to-left order for grouping operands to operators that have the same precedence. An operator's precedence is meaningful only if other operators with higher or lower precedence are present.
Three-letter operators have been a contentious area within the C++ community. This barrier was breached with the release of C++20 introducing the three-way comparison operator (P0515R0); more commonly known as the spaceship operator due to its designated token <=>.
Fortunately, the suggested manifold syntax does not intersect with any pre-existing token sequence.
Implementation
The LLVM compiler infrastructure was selected as the target for operator implementation.
LLVM began as a research project at the University of Illinois, with the goal of providing a modern, SSA-based compilation strategy capable of supporting both static and dynamic compilation of arbitrary programming languages
?? No requirement exists to implement a proposal prior to WG21 submission.
Internals
LLVM is broken up into a number of distinct modules tasked with the primary stages of the compilation process, from lexical analysis to code generation.
Given the new operators can be represented by existing conditionals, we'll focus on three key stages.
Lexical tokenization is conversion of a text into (semantically or syntactically) meaningful lexical tokens belonging to categories defined by a "lexer" program.
LLVM's Lexer library implements an interface that consumes a text buffer and generates a stream of symbols, one or more characters, for parsing.
Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar.
The Parser library interprets an incoming stream of symbols from the Lexer and performs syntactic analysis and Abstract Syntax Tree (AST) generation.
Semantic analysis or context sensitive analysis is a process in compiler construction, usually after parsing, to gather necessary semantic information from the source code.
The Sema library refines and builds out the AST.
We shouldn't have to perform modifications in this area. However, the Sema library appears to be the default dumping ground for common logic, and can't be completely ruled out.
Additional Reading
For the inquisitive reader, the following links are a great start prior to code-base [LLVM] exploration.
Approach
LLVM's documentation details an approach to adding a new expression or statement. Additionally, the three-way comparison operator's commits will hopefully illuminate a potential implementation.
That being said, we will lean heavily towards treating the problem as a software defect and attempt to get it to perform an operation it was not intended to do.
Development
As stated in the build documentation, compilation will be achieved through the CMake build scripts on a Linux distro (pop-os 6.2.6-76060206-generic).
$ git clone https://github.com/llvm/llvm-project
Cloning into 'llvm-project'...
...
Receiving objects: 100% (5588761/5588761), 1.89 GiB | 4.35 MiB/s, done.
?? A debug build will consume ~124GB of storage, not including test cases.
$ cd llvm-project
$ mkdir build && cd build/
$ cmake -DLLVM_ENABLE_PROJECTS=clang -DCMAKE_BUILD_TYPE=Debug -G "Unix Makefiles" ../llvm
...
$ make
...
$ make check-all
...
$ cd .. && build/bin/clang++ --version
clang version 18.0.0 (https://github.com/llvm-project f8b2544c422404b0c8d1225d9240301f50de1cdb)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/bcrowhurst/Development/llvm-project/build/bin
Step 1 - Create an Error ??
Starting with the most complex operator let's compile the desired syntax.
Out of the four new operators, the most involved is the one-of-operator. Its equivalent syntax requires two distinct comparison operators (==, !=), whereas the alternatives only require a single comparison operator.
$ build/bin/clang++ -fsyntax-only manifold-proposal/one-of-rfc.cpp
one-of-rfc.cpp:7:13: error: type name requires a specifier or qualifier
7 | if (var [^] 1,2)
| ^
one-of-rfc.cpp:7:13: error: expected expression
2 errors generated.
Step 2 - Walk the Codebase
Scanning for the rendered messages gives rise to DiagnosticParseKinds.td:201.
$ grep --recursive "expected expression"
...
DiagnosticParseKinds.td:def err_expected_expression : Error<"expected expression">;
...
Subsequently, scanning for 'err_expected_expression' determines the origin of the error message in the Parse library at one of five possible locations.
$ grep --recursive "err_expected_expression"
...
ParseExpr.cpp: Diag(Tok, diag::err_expected_expression);
ParseExpr.cpp: Diag(Tok, diag::err_expected_expression);
ParseExpr.cpp: Diag(CCLoc, diag::err_expected_expression);
ParseExpr.cpp: Diag(Tok, diag::err_expected_expression);
ParseOpenMP.cpp: Diag(Tok.getLocation(), diag::err_expected_expression);
Studying the candidates, we see a comment on line 3705, just above one of the reported diagnostics within ParseExpr.cpp:3705.
Breaking down the code line by line.
Tok (class Token) is the current token being parsed. If its kind (enum TokenKind) is not an opening left brace, enter the body of the if-statement.
3704 if (!Tok.is(tok::l_brace)) {
Report a diagnostic error message indicating malformed syntax, expression expected — see diagnostic documentation for further details.
The ERROR severity is used for diagnostics indicating the program is never acceptable under any circumstances. When an error is emitted, the AST for the input code may not be fully built.
3706 Diag(Tok, diag::err_expected_expression);
Discard the evaluation context associated with the malformed block of code.
3707 Actions.ActOnBlockError(CaretLoc, getCurScope());
Inform the caller of ParseBlockLiteralExpression that an error has occurred.
3708 return ExprError();
Step 3 - Mapping the call stack
I've summarised the program control flow. The reality is the LLVM project has thousands of lines of code and a short article is not the appropriate place to communicate the minutiae.
$ cloc .
156889 text files.
147840 unique files.
32119 files ignored.
cloc v 1.90 T=231.71 s (550.0 files/s, 145368.8 lines/s)
--------------------------------------------------------------------
Language files blank comment code
--------------------------------------------------------------------
C++ 29317 948242 1856402 5487231
LLVM IR 35402 633797 5838837 2977776
C/C++ Header 11741 316619 496652 1593157
D 2980 2 0 1038679
C 10383 257156 1566090 997444
YAML 4846 66451 64219 863733
Assembly 10582 470100 1185436 808791
JSON 111 18 0 443909
Python 2315 51776 57258 222274
... .... .... .... ....
--------------------------------------------------------------------
SUM: 127446 3039636 14947767 15695664
--------------------------------------------------------------------
?? ParseStmt.cpp
Line 37: Enter ParseStatement.
Line 45: Invoke ParseStatementOrDeclaration.
Line 303: Invoke ParseIfStatement.
Line 1515: Invoke ParseParenExprOrCondition.
Line 1307: Invoke ParseCXXCondition.
?? ParseExprCXX.cpp
Line 2042: Enter ParseCXXCondition.
Line 2103: Invoke ParseExpression.
?? ParseExpr.cpp
Line 126: Enter ParseExpression — a precedence-based parser.
Line 128: Invoke ParseRHSOfBinaryExpression.
Line 549: Invoke ParseCastExpression.
Line 1766: Finally, ?? invoke ParseBlockLiteralExpression.
Step 4 - Circumventing the error message
At this stage, we know the source of the diagnostic message and the path through the code that leads to the error. What we desire is to avoid the case expression at ParseExpr.cpp:1765.
Recall our new one-of-operator needs to be parsed as a complete token, not processed as the individual characters [, ^, ]. Introducing this token into the Lexer library should lead to new diagnostics and an area of further research.
The first step is to introduce the token definition into TokenKinds.def:244.
PUNCTUATOR(manifoldoneof, "[^]")
?? The token was introduced into the C++ Support section. For unknown reasons, the Spaceship operator is listed under C99 6.4.6 Punctuators.
Now at Lexer.cpp:3934 we introduce additional logic for operator tokenisation.
?? Ah! The leading character of the Spaceship operator is an opening angle bracket, which is tokenised under the C99 6.4.6: Punctuators section in the CPP file — header sections marry up to the CPP file layout based on the first character of each token.
As a stream of bytes is fed into the lexer they are organised into tokens consisting of one or more characters.
When an opening square bracket is encountered (Lexer.cpp:3934), we peek ahead to the next character testing if it equals caret, and peek ahead one more to find a matching closing square bracket. If the stream satisfies the requirement, we consume the two proceeding characters and assign a token kind of manifold-one-of.
Later we'll extend this logic to support the any-of, all-of, and none-of operators.
Given that we are inside a precedence-based parser (ParseExpression), we need to assign a precedence level to the token by creating an additional case expression at OperatorPrecedence.cpp:53.
A complete diff of the changes can be reviewed in commit a437fd6.
As before, let's (re)build LLVM and attempt to compile the new syntax.
$ build/bin/clang++ -fsyntax-only manifold-proposal/one-of-rfc.cpp
one-of-rfc.cpp:7:18: error: expected ':'
7 | if (var [^] 1,2)
| ^
| :
one-of-rfc.cpp:7:11: note: to match this '?'
7 | if (var [^] 1,2)
| ^
one-of-rfc.cpp:7:18: error: expected expression
7 | if (var [^] 1,2)
| ^
one-of-rfc.cpp:7:15: warning: left operand of comma operator has no effect [-Wunused-value]
7 | if (var [^] 1,2)
| ^
1 warning and 2 errors generated
Awesome! ?? This is precisely the kind of feedback we hoped to see.
The new token is being processed as if it were a conditional operator. Additionally, we have a warning that the comma operator is being parsed, indicating the entirety of the expression has been evaluated.
We see the original error 'expected expression' and a new entry 'expected ':''. This looks as if our expression is being parsed as the Ternary operator; as anticipated.
The source of the new diagnostics can be tracked down to ParseRHSOfBinaryExpression at ParseExpr.cpp:522.
Step 5 - Erecting the scaffolding
Reviewing ParseRHSOfBinaryExpression we see a special block of logic for handling the ternary operator, and the source of our diagnostic messages.
The logic is tasked with parsing the inner expression of the Ternary operator...
logical-OR-expression '?' expression ':' conditional-expression
...if one exists (Elvis operator), and then passing it into ActOnConditionalOp for semantic parsing at ParseExpr.cpp:641.
This [function] seems to be an appropriate location to introduce the new manifold parsing routines.
The following commit (580f147) erects the scaffolding to avoid the special Ternary logic and invoke a distinct path for the manifold operators — resulting in a deliberate expression error at ParseExpr.cpp:3854.
?? The ParseManifoldExpression location changed with commit 95e08c3.
Step 6 - Crafting the AST
Clang has a built-in AST-dump mode, which can be enabled with the flags '-ast-dump' and '-ast-dump-xml' — XML generation is only supported in debug builds.
Generating the desired structure is as easy as dumping the equivalent syntax of the one-of-operator.
?? Minimal implementations produce direct output when dumping the AST.
Breaking down the output line by line details the required steps for emitting a well-structured AST entry.
Finally, the CompoundStmt at line 19 represents a group of one or more statements making up the if-body.
Step 7 - Parsing the operator
See commit ea2f67d or ParseExprCXX.cpp:4120 for details.
Line 4121-4123: Debug assertion; will improve this later with a 'ManifoldExpressions' feature switch.
Line 4125-4140: Parse the first RHS expression and produce an error if we fail to find two or more comma-delimited expressions.
Line 4142-4155: Iterate over the remaining RHS expressions, parse them, and ensure we end with closing parentheses.
Line 4158-4176: Stitch the RHS expressions together while propagating the equality comparison operator further up for each outer iteration — O(n*m).
Line 4178-4187: For each outer iteration wrap the stitched expressions in parentheses to ensure order of evaluation.
Step 8 - Testing
We've updated the final form of the test code to take a single command line argument and to prompt if the input is inside or outside of the acceptable set.
Additionally, test cases were introduced in commit c832a78.
build/bin/clang++ -fsyntax-only -Xclang -verify clang/test/Parser/manifold.cpp
And finally, we rerun LLVMs complete test suite.
$ make check-all
...
[100%] Running all regression tests
llvm-lit: /home/bcrowhurst/Development/llvm-project/llvm/utils/lit/lit/llvm/config.py:487: note: using clang: /home/bcrowhurst/Development/llvm-project/build/bin/clang
Testing Time: 1624.06s
Skipped : 50
Unsupported : 1994
Passed : 88521
Expectedly Failed: 180
[100%] Built target check-all
Step 9 - Resolved Bugs
auto fn = [&]{ } // error: expected manifold expression
Step 10 - Round off
// Enable
clang++ -fmanifold-expressions test.cpp -o test
// Disable (default)
clang++ -fno-manifold-expressions test.cpp -o test
?? Warning this implementation is a proof-of-concept; bugs may still reside.
Step 11 - Retrospective
As a first swing at the LLVM project, the resulting implementation is sufficient for a proof-of-concept.
However, refactoring ParseRHSOfBinaryExpression and splitting the parsing logic for conditional operators from that of other expressions, would make it easier to introduce new conditionals, and help confine the 277 lines of existing logic into a more digestible unit.
Additionally, the newly introduced manifold logic would be better positioned if transferred, or split between the Sema library with a specific action, or extension of the existing action ActOnConditionalOp.
The current approach allows us to reuse much of the existing framework, yet it falls short of a mergeable implementation. Refactoring is required given the manifold operators gain any traction during discussions with the WG21 committee.
WG21 Submission
ISOCPP outlines the proposal process as follows:
Feedback
Over the course of a month, the proposal was discussed and a general disagreement towards introducing new operators was widely received. The preferred approach is a template-based solution with much of the thread discussing the nuances of such an implementation.
Other comments suggested this would lead to even more complex conditionals, countering any arguments on a reduction in cognitive load.
if(Alpha [!] Beta, Gamma, Delta && Alpha[^] Epsilon, Zeta, Eta && Theta [^] ... && ....) I'd argue cognitive load goes up from needing to track the three-character symbols and their relative positions between a bunch of names (especially consider if the names themselves become compounded names or function calls); over a more traditional function call syntax. It adds more characters, yes, but verbosity is not a dirty word if it means making the code clearer. I will agree that the need to insert a wrapping "matcher" template is unideal, but that's how the language works; and adding something as extraordinarily fundamental as a new set of operators to the language requires an extraordinary motivation which in the nicest possible way I don't quite see here.?
One member was concerned with how daunting the addition of new operators would be when wording the change into the standard.
...have you looked at how it would interact with the existing standard in terms of wording; including all the various fringe cases and bits that adding new operators and conditionals will have when pathologically arranged? I dread to think how many places the standard refers to operators and constructs which can be treated as conditionals and predicates, so there may well be wording awkwardness aplenty.
Conclusion
To conclude, we've explored the LLVM Compiler Infrastructure Project, implementing a number of new operators for the C++ language.
The WG21 response was somewhat lacklustre but to be expected due to such a fundamental set of new language features. I acknowledge my naivety when it comes to the impact on the C++ Standard that four new operators would cause; as shown in my initial post to the working group.
Section '7.6 Compound expressions' requires the addition of new grammar and wording detailing the Manifold operator syntax. TBD.
Going into the project I was well aware it would be difficult to get much traction however, my motivation was primarily driven by curiosity with respect to LLVM's compiler design and implementation.
A working copy of the compiler with all the manifold operators implemented can be located in the project's GitHub repository.
For those seeking to scratch a C++ itch, this year's CppCon has concluded with videos now available at https://cppcon.org/. ??