The C++ language defines the observable behaviour of a program and uses terms such as ill-formed or undefined behaviour to describe it. The C++26 standard introduces a new one, called erroneous behaviour. In this post, we’ll look at what these terms mean.
Well-formed
This is the simplest of them all. It means that a program is constructed according to the syntactic and semantic rules of the C++ language.
Ill-formed
By contrast, a program that is not well-formed is called ill-formed. An ill-formed program has syntactic errors or semantic errors that compilers can diagnose. For such a program, the compiler is required to issue a diagnostic. Here are a couple of examples:
int main()
{
    int a = 42 // syntax error, missing ;
}

int main()
{
    int a = 42;
    auto l = [a](int const a) {return a + 1;}; // error: lambda capture and lambda parameter have the same name
}
(Trivia: the term ill-formed is used about 500 times in the standard.)
Ill-formed, no diagnostic required
This means the program is syntactically correct but has semantic errors that might not be diagnosable. The compiler is not required to issue a diagnostic, and executing such a program is undefined behaviour. That doesn’t mean that compilers do not diagnose these problems; in practice, many of these cases are diagnosed. A couple of examples are shown next:
class foo
{
    foo(int a) : foo(a + 1) {} // constructor delegates to itself
};

float operator ""Z(const char*) // literal suffix identifiers that do not start with an underscore are reserved
{
    return 42;
}
(Trivia: the term ill-formed, no diagnostic required appears about 50 times in the standard.)
Unspecified behaviour
This term describes behaviour that depends on the implementation. Examples of such behaviour include the order in which function arguments are evaluated (which can be left to right, right to left, or any other order) and the amount of memory overhead for an array allocation.
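To make the unspecified evaluation order visible, here is a minimal sketch. The helpers `record`, `add`, and `argument_order` are hypothetical names introduced only for this illustration; each argument expression leaves a tag in a log, so the order the compiler chose can be read back afterwards.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: appends a tag to the log and returns the value.
inline int record(std::string& log, char tag, int value)
{
    log += tag;
    return value;
}

inline int add(int lhs, int rhs) { return lhs + rhs; }

// Returns the order in which the two arguments of add() were evaluated.
// Either "ab" or "ba" is a conforming result.
inline std::string argument_order()
{
    std::string log;
    int sum = add(record(log, 'a', 1), record(log, 'b', 2));
    assert(sum == 3); // the result is fixed; only the order is unspecified
    (void)sum;        // avoids an unused-variable warning when NDEBUG is set
    return log;
}
```

Calling `argument_order()` may return a different string on different compilers. Code that needs a particular order should evaluate the arguments into named variables first.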
Implementation-defined behaviour
Unspecified behaviour should not be confused with implementation-defined behaviour. The difference between the two is small: in the case of implementation-defined behaviour, the implementation must document its choice, which is not the case for unspecified behaviour.
Examples of implementation-defined behaviour include the signedness of the char type (which behaves like either signed char or unsigned char, although char is a distinct type) and the default minimum number of buckets for constructing an std::unordered_multiset (when a value is not explicitly specified). Often, such implementation-defined choices are described in the specification with a comment such as the following:
unordered_multiset( std::initializer_list<value_type> init,
                    size_type bucket_count = /* implementation-defined */,
                    const Hash& hash = Hash(),
                    const key_equal& equal = key_equal(),
                    const Allocator& alloc = Allocator() );
(Trivia: the term implementation-defined is used 400 times in the standard.)
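The char example can be checked at compile time. The following is a small sketch that uses only standard type traits; the name `char_is_signed` is introduced here for illustration. It shows that char is a distinct type on every implementation, while its signedness is something the program has to query.

```cpp
#include <limits>
#include <type_traits>

// char is always a distinct type, whatever its signedness.
static_assert(!std::is_same_v<char, signed char>);
static_assert(!std::is_same_v<char, unsigned char>);

// Whether char is signed is implementation-defined; the answer is
// available at compile time.
constexpr bool char_is_signed = std::numeric_limits<char>::is_signed;

// Consistency check: the minimum of char matches the chosen signedness.
static_assert(char_is_signed
    ? std::numeric_limits<char>::min() == std::numeric_limits<signed char>::min()
    : std::numeric_limits<char>::min() == 0);
```

Portable code that depends on the sign of small integer values is usually better off spelling out signed char or unsigned char explicitly.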
Undefined behaviour
This is one of the best-known traits of the C++ language. For a range of cases that are semantically incorrect, the standard does not impose any restrictions on the behaviour, and does not require the compiler to diagnose the situation (even though compilers often do so when they can detect such cases). This is called undefined behaviour. A program with undefined behaviour can appear to execute normally, can crash, or can produce unexpected results, and none of these outcomes makes the implementation non-conforming. Examples of undefined behaviour include:
- dereferencing a null pointer
- indexing an array out of bounds
- accessing an object through a pointer to an incompatible type
- overflow of signed integers
- converting a floating-point value to a type in which the value cannot be represented
- modifying a constant object
- division by zero (using the / or % operators)
- reading uninitialized variables
int* i = nullptr;
std::cout << *i << '\n'; // dereferencing a null pointer

int a[5];
a[5] = 42; // indexing an array out of bounds

int step_it(int a) {return a + 1;}

int main()
{
    int x;
    std::cout << step_it(x) << '\n'; // uninitialized read
}
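Since the compiler is not required to catch these cases, the program has to avoid them itself. As one possible sketch for the signed overflow item in the list above, the hypothetical helper `checked_add` (not a standard library function) tests the operands before adding, so the overflowing addition is never executed.

```cpp
#include <limits>
#include <optional>

// Hypothetical helper: returns a + b, or std::nullopt if the
// mathematical result does not fit in an int.
constexpr std::optional<int> checked_add(int a, int b)
{
    constexpr int max = std::numeric_limits<int>::max();
    constexpr int min = std::numeric_limits<int>::min();
    if (b > 0 && a > max - b) return std::nullopt; // would overflow upwards
    if (b < 0 && a < min - b) return std::nullopt; // would overflow downwards
    return a + b;
}
```

Because the function is constexpr, it can also be used in constant expressions, where the compiler must reject an evaluation that would have undefined behaviour.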
(Trivia: the term undefined behaviour appears a bit more than 100 times in the standard.)
The standard specifies the following about undefined behaviour:
3.65 [defns.undefined]
undefined behavior
behavior for which this document imposes no requirements
[Note 1 to entry: Undefined behavior may be expected when this document omits any explicit definition of behavior or when a program uses an erroneous construct or erroneous data. […] ]
C++ Standard, N4917, paragraph 3.65
Erroneous behaviour
The C++26 standard defines a new kind of behaviour known as erroneous behaviour. It covers incorrect code in a well-formed program: the behaviour is well-defined, but it indicates a bug that compilers are encouraged to diagnose. The paper P2795R5, Erroneous behaviour for uninitialized reads, by Thomas Köppe, introduces this new behaviour for uninitialized reads (the last example in the previous section). It is defined as follows:
3.?
erroneous behavior
well-defined behavior that the implementation is recommended to diagnose
[Note 1 to entry: Erroneous behavior is always the consequence of incorrect program code. Implementations are allowed, but not required, to diagnose it ([4.1.1, intro.compliance.general]). Evaluation of a constant expression ([7.7, expr.const]) never exhibits behavior specified as erroneous in [4, intro] through [15, cpp]. — end note]
P2795R5
It also changes the definition of undefined behaviour, replacing "erroneous construct" with "incorrect construct" and "erroneous data" with "invalid data":
3.65 [defns.undefined]
undefined behavior
behavior for which this document imposes no requirements
[Note 1 to entry: Undefined behavior may be expected when this document omits any explicit definition of behavior or when a program uses an incorrect construct or invalid data. […] ]
P2795R5
In the future, other programming errors could be categorized as erroneous behaviour:
- signed integer overflow
- array access out of bounds
- dereferencing null pointers
- type-punning (such as reinterpret_cast-ing a floating-point value to an integer value)
Identifying erroneous behaviour may have runtime costs. Therefore, users would be able to opt out of it with the help of an attribute. For uninitialized reads, this attribute is not yet defined (although the proposal is to use [[indeterminate]]), but it will be usable on variable definitions and on function parameters.
int step_it(int a) {return a + 1;}

int main()
{
    [[indeterminate]] int x;
    std::cout << step_it(x) << '\n'; // uninitialized read
}
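A place where opting out could make sense is a buffer that is always written in full before it is read. The sketch below assumes the attribute ends up being spelled [[indeterminate]], as the paper proposes. The macro `MAYBE_INDETERMINATE` and the function `sum_first_four` are hypothetical names used only here; the macro expands to nothing on compilers that do not report support for the attribute.

```cpp
// Use the attribute only if the compiler reports support for it.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(indeterminate)
#    define MAYBE_INDETERMINATE [[indeterminate]]
#  endif
#endif
#ifndef MAYBE_INDETERMINATE
#  define MAYBE_INDETERMINATE
#endif

int sum_first_four()
{
    // Assumption: the attribute opts this variable out of the
    // erroneous behaviour rules for uninitialized reads.
    MAYBE_INDETERMINATE int buf[4];

    // Every element is written before it is read, so no
    // uninitialized read ever happens.
    for (int i = 0; i < 4; ++i) buf[i] = i + 1;

    int total = 0;
    for (int v : buf) total += v;
    return total;
}
```

Unlike the example above, this function never reads an uninitialized value, so it behaves the same with or without the attribute.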
For more information about this, including motivation, implications, and examples, you should read the proposal paper, mentioned earlier.