Erroneous behaviour has entered the chat

The C++ language defines the observable behaviour of a program and uses terms such as ill-formed or undefined behaviour to describe it. The C++26 standard introduces a new one, called erroneous behaviour. In this post, we’ll look at what these terms mean.

Well-formed

This is the simplest of them all. It means that a program is constructed according to the syntactic and semantic rules of the C++ language.

Ill-formed

By contrast, a program that is not well-formed is called ill-formed. An ill-formed program has syntactic errors or semantic errors that compilers can diagnose, and the compiler is required to issue a diagnostic for it. Here are a couple of examples:

int main()
{
   int a = 42   // syntax error, missing ;
}

int main()
{
   int a = 42;

   auto l = [a](int const a) {return a + 1;};  // error: lambda capture and lambda parameter have the same name
}

(Trivia: the term ill-formed is used about 500 times in the standard.)

Ill-formed, no diagnostic required

This means the program is syntactically correct but has semantic errors that might not be diagnosable. The compiler is not required to issue a diagnostic, and executing such a program is undefined behaviour. That doesn’t mean that compilers do not diagnose these problems. In practice, many of these cases do get a diagnostic. A couple of examples are shown next:

class foo
{
   foo(int a) : foo(a + 1) {} // constructor delegates to itself
};

float operator ""Z(const char*) // literal suffix identifiers that do not start with an underscore are reserved
{ return 42; }

(Trivia: the term ill-formed, no diagnostic required appears about 50 times in the standard.)

Unspecified behaviour

This term describes behaviour that depends on the implementation, which is not required to document its choice. Examples of such behaviour include the order in which function arguments are evaluated (which can be from left to right, from right to left, or any other order) and the amount of memory overhead for an array allocation.
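The argument evaluation order can be observed with a small experiment. What follows is only a sketch: trace_and_return, add, and argument_evaluation_order are names made up for this illustration, and the result you get depends on your compiler.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper for this sketch: records a tag in the trace
// and hands back the value, so we can see when it was evaluated.
int trace_and_return(std::string& trace, char tag, int value)
{
   trace += tag;
   return value;
}

int add(int a, int b) { return a + b; }

// Returns "LR" if the left argument was evaluated first and "RL" otherwise.
// Both answers are valid: the standard leaves the order unspecified.
std::string argument_evaluation_order()
{
   std::string trace;
   int const sum = add(trace_and_return(trace, 'L', 1),
                       trace_and_return(trace, 'R', 2));
   assert(sum == 3);   // the result of the call itself is well-defined
   return trace;
}
```

Printing the result of argument_evaluation_order() from main shows which order your compiler picked. Correct code must not rely on either answer.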

Implementation-defined behaviour

Unspecified behaviour should not be confused with implementation-defined behaviour. The difference between the two is actually small: in the case of implementation-defined behaviour, the implementation must document its choice, which is not the case for unspecified behaviour.

Examples of implementation-defined behaviour include the choice of underlying type for the char type (which can be either signed char or unsigned char, although char is a distinct type) and the minimum number of buckets for constructing a std::unordered_multiset (when a value is not explicitly specified). Often, such implementation-defined choices are described in the specification with a comment such as the following:

unordered_multiset( std::initializer_list<value_type> init,
                    size_type bucket_count = /* implementation-defined */,
                    const Hash& hash = Hash(),
                    const key_equal& equal = key_equal(),
                    const Allocator& alloc = Allocator() );
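The choice made for char can be queried from code. The following is a minimal sketch, assuming a C++17 compiler; describe_char is a name invented for this example.

```cpp
#include <cassert>
#include <climits>
#include <limits>
#include <string>
#include <type_traits>

// char is always a distinct type, whatever signedness the implementation picks.
static_assert(!std::is_same_v<char, signed char>);
static_assert(!std::is_same_v<char, unsigned char>);

// Hypothetical helper for this sketch: reports the implementation's choice.
std::string describe_char()
{
   // Two ways of asking the same question must agree.
   assert(std::numeric_limits<char>::is_signed == (CHAR_MIN < 0));
   return std::numeric_limits<char>::is_signed ? "char is signed"
                                               : "char is unsigned";
}
```

The answer typically differs between platforms, and some compilers let you change it with a command-line option, so portable code should not depend on it.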

(Trivia: the term implementation-defined is used 400 times in the standard.)

Undefined behaviour

This is one of the best-known traits of the C++ language. For a range of cases that are semantically incorrect, the standard does not impose any restrictions on the behaviour and does not require the compiler to diagnose the situation (even though compilers often do so when they can detect such cases). This is called undefined behaviour. A program with undefined behaviour can execute normally, can crash, or can do something entirely unexpected, and none of these outcomes violates the standard. Examples of undefined behaviour include:

  • dereferencing a null pointer
  • indexing an array out of bounds
  • converting between pointers to objects of incompatible types
  • overflow of signed integers
  • converting a floating-point value to a type in which the value cannot be represented
  • modifying a constant object
  • division by zero (using / or % operators)
  • reading uninitialized variables

int* i = nullptr;
std::cout << *i << '\n'; // dereferencing a null pointer

int a[5];
a[5] = 42; // indexing an array out of bounds

#include <iostream>

int step_it(int a) {return a + 1;}

int main()
{
  int x;
  std::cout << step_it(x) << '\n'; // uninitialized read
}

(Trivia: Undefined behaviour appears a bit over 100 times in the standard.)

The standard specifies the following about undefined behaviour:

3.65 [defns.undefined]

undefined behavior

behavior for which this document imposes no requirements

[Note 1 to entry: Undefined behavior may be expected when this document omits any explicit definition of behavior or when a program uses an erroneous construct or erroneous data. […] ]

C++ Standard, N4917, paragraph 3.65

Erroneous behaviour

The C++26 standard defines a new kind of behaviour known as erroneous behaviour. It covers incorrect code in a well-formed program: the behaviour is well-defined, but it indicates a bug that compilers are encouraged to diagnose. The paper P2795R5, Erroneous behaviour for uninitialized reads, by Thomas Köppe, introduces this new behaviour for uninitialized reads (the last example in the previous section). It is defined as follows:

3.?

erroneous behavior

well-defined behavior that the implementation is recommended to diagnose

[Note 1 to entry: Erroneous behavior is always the consequence of incorrect program code. Implementations are allowed, but not required, to diagnose it ([4.1.1, intro.compliance.general]). Evaluation of a constant expression ([7.7, expr.const]) never exhibits behavior specified as erroneous in [4, intro] through [15, cpp]. — end note]

P2795R5

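It also changes the definition of undefined behaviour to the following, replacing "erroneous construct" with "incorrect construct" and "erroneous data" with "invalid data":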

3.65 [defns.undefined]

undefined behavior

behavior for which this document imposes no requirements

[Note 1 to entry: Undefined behavior may be expected when this document omits any explicit definition of behavior or when a program uses an incorrect construct or invalid data. […] ]

P2795R5

In the future, other programming errors could be categorized as erroneous behaviour:

  • signed integer overflow
  • array access out of bounds
  • dereferencing null pointers
  • type-punning (such as reading a floating-point object through a pointer that was reinterpret_cast to an integer type)

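Replacing undefined behaviour with erroneous behaviour may have runtime costs, because the implementation has to give every uninitialized variable some value. Therefore, users are able to opt out of this behaviour with the help of an attribute. For uninitialized reads, the paper proposes [[indeterminate]], which is usable on definitions of local variables and on function parameters. A variable marked this way has an indeterminate value again, and reading it before it is assigned is undefined behaviour, as it was before C++26.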

#include <iostream>

int step_it(int a) {return a + 1;}

int main()
{
  [[indeterminate]] int x;         // opts out of erroneous behaviour for x
  std::cout << step_it(x) << '\n'; // uninitialized read: undefined behaviour again
}
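A more realistic use of the attribute is a buffer that is always written before it is read, where the extra initialization would be wasted work. The sketch below rests on some assumptions: HYPOTHETICAL_INDETERMINATE, capacity, and sum_of_first_squares are names invented for this example, and the macro expands to nothing on compilers that do not report support for the attribute.

```cpp
#include <cassert>
#include <cstddef>

// HYPOTHETICAL_INDETERMINATE is made up for this sketch. It expands to
// [[indeterminate]] only when the compiler reports support for the attribute.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(indeterminate)
#    define HYPOTHETICAL_INDETERMINATE [[indeterminate]]
#  endif
#endif
#ifndef HYPOTHETICAL_INDETERMINATE
#  define HYPOTHETICAL_INDETERMINATE
#endif

constexpr std::size_t capacity = 8;

// Sums the squares 0*0, 1*1, ... of the first `count` indices.
// Every element is written before it is read, so there is no
// uninitialized read with or without the attribute.
int sum_of_first_squares(std::size_t count)
{
   assert(count <= capacity);   // precondition of this sketch

   HYPOTHETICAL_INDETERMINATE int buffer[capacity];
   for (std::size_t i = 0; i < count; ++i)
   {
      buffer[i] = static_cast<int>(i * i);
   }

   int sum = 0;
   for (std::size_t i = 0; i < count; ++i)
   {
      sum += buffer[i];   // only elements written above are read
   }
   return sum;
}
```

The attribute does not change what this function computes. It only tells the compiler that no initial value is needed for the buffer.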

For more information about this, including motivation, implications, and examples, you should read the proposal paper, mentioned earlier.
