Modernizing legacy code

In the past decade and a half I’ve been working with large legacy code bases started in early ’90s. Therefore, I had to deal with lots of code using old styles and conventions such raw pointers, void pointers, declaring all variables before using them, public data members accessed from everywhere, and many others. I believe in change and therefore I’m trying to make as many changes as possible. Of course, this is not always possible, or desirable (due to various constraints). Moreover, nobody will stop a large project for months or years to modernize the code. However, applying small but incremental changes is always possible, and over time, large code-bases can improve. This is a strategy I’m constantly applying to parts of code that I have to modify. In this blog post I will be listing a series of improvements you can do with old C++ code in order to modernize and improve it.

Macros

It is a very common case to use macros for constants. Here is an example that defines some printer types:

This is bad because there is no way to restrict the use of these values (0, 1, and 2 in this example) wherever a printer type is needed:

The best way in this case is to define a scoped enumeration type.

By using printer_type instead of an int value wherever a printer type is necessarily we can make sure we always use a legal value.

Sometimes, macros are used for defining values that do not represent an enumeration. For instance, the size of a buffer:

In this case, the best solution is to define a constexpr value instead.

There are also function-like macros. Here is an example:

These kind of macros can be replaced with a constexpr function. Here is how:

In C++20, the mega_bytes() function can be an immediate function, instead. An immediate function is a function that must produce a compile-time constant. Such a function exists only at compile time. There is no symbol emitted for one and you cannot take its address. Therefore, an immediate function is much more similar to a macro. An immediate function is declared with the consteval keyword (which cannot be used together with constexpr). This is how it the mega_bytes() function could be declared in C++20:

You can read more about constants, macros and alternatives in the following articles:

Type aliases

I have seen more times than it was necessary the following style of defining structures used by people with a background of programming in C:

typedef struct is not necessarily in C++. Therefore, the C++ definition should look like this:

However, we can do better. Because C++11 provides type aliases that enable us to define aliases for types is a more readable way. The above typedef is equivalent to the following:

This is more relevant when you need to define function pointers. Let’s consider that you have the following functions, foo() and foobar() and 3rd function doit() that needs to take the address of one of these two functions:

So, how do you define the function pointer type fn_foo? I have to confess that all my life I had problems remembering the syntax for doing this. I always had to look it up. This is how you do it:

However, the using definition syntax enables us to write a much more readable and easier to remember definition:

This is the same syntax used for declaring std::function objects. Here is an example:

I’ve stopped using typedefs a while ago because I find the using definitions more natural to write and read.

Data member initialization

I have encountered the following pattern for initialization of data members countless times:

This is wrong because data member initialization should be done in the initialization list (except for the view cases when this is not possible). When you do it like above, each member is initialized twice (which might not be significant for built-in numeric types such as int but it is for larger objects). This is because before the constructor body is executed, initialization of all direct bases, virtual bases, and non-static data members is performed. If you want to specify a non-default initialization for the non-static data members, you should use the initialization list. Here is how:

Keep in mind that the order of the initialization in the list is not important. Non-static data member are initialized in order of declaration in the class definition.

The problem is that the more data members the class has the more likely it is to forget to initialize the member. In C++11, you can simplify initialization by providing it within the declaration of the data member.

If you need to initialize data-members from constructor arguments, you still need to use the constructor initialization list. If both initializations are present, the one from the initialization list take precedence. This is useful for classes that have multiple constructors with different sets of parameters.

Avoiding memory allocation and deallocation

The use of standard containers that allocate memory internally and deallocate it automatically when objects go out of scope helps avoiding explicit allocation and deallocation of memory. An example where a standard container, such as std::vector can be used is for variable size buffers required when calling Windows system APIs. There are many Windows API functions that need to fill a buffer passed from the caller but the caller must first determine the size of the buffer. This is solved by first calling the function with a null buffer which will determine the function to return the required size. Then, you allocate the memory for the buffer and invoke the same function with a sufficiently sized buffer. Below is an example of this pattern.

This code has two issues. The points marked with [1] and [2] would leak memory. At [1], we return without deleting the allocated buffer. At [2], an exception occurs so the next line that deletes the buffer would not execute, again, leaking memory. This can be simplified with the help of a std::vector as follows:

With this new implementation returning from the function either normally (with a return statement) or because an exception occurred will have the effect that the pBuffer object is destroyed and when that happens its internal memory will be deleted. Therefore, this implementation is shorter and more robust.

This example concerned the use of a buffer (a contiguous memory chunk). But the same problems appear when you allocate single objects and use raw pointers. Take a look at the following snippet:

We have a function called example() that allocates a foo object that it will eventually pass to the function give_up_ownership(). Before doing so, it does some checks and may return without calling that function. However, before returning, the foo object must be deleted. Which is easy to forget when you code like this, as exemplified at the line marked with [1]. This introduces a memory leak. Again, this implementation can be simplified, this time with the help of a smart pointer, std::unique_ptr.

There is no explicit calls to new (replaced with std::make_unique()) and delete here. Moreover, the give_up_ownership() remains unchanged. The call to std::unique_ptr::release detaches the unique_ptr object from the underlaying raw pointer and returns the raw pointer, so that when the smart pointer goes out of scope it will not attempt to delete the object. As with the previous example with std::vector the new implementation is simpler and more robust.

Avoiding C-like arrays

C-like arrays can be replace with standard containers, such as std::vector or std::array. A pattern I have encountered many times is shown in the next snippet:

There is an array of Objects and a macro, NUM_OBJECTS used to represent the number of elements in the array, in order to avoid hard-coded values (which are error-prone, especially in the face of actual changes to the number of elements in the array). std::vector or std::array are always a better alternative here:

Not only using a standard container avoid the use of a macro, since the method size() can be used to retrieve the number of elements in the container, but it also enables the use of range-based for loops.

If you have functions that take an array as input in the form of a pointer (to the first element) and a size (to specify the number of elements), it can remain untouched regardless you are calling it with arguments that are arrays or standard containers (including std::array). Consider the following example:

This can be called as follows:

However, the result would be the same if this code changed as follows:

Proper casting

C-style cast expressions, in the form (type)value are widely used by C++ developers, although they should not. C++ provides four cast operators, as follows:

  • static_cast: converts between types using implicit and user-defined conversions (examples include converting enums to integral types, floating-point types to integral types, pointer types to pointer to void, pointers to a base class to pointers to a derived class, etc.)
  • reinterpret_cast: does conversion between types by reinterpreting the underlying bit pattern (such as converting between pointer and integral types)
  • dynamic_cast: performs a safe conversion between pointers or references to classes up, down, and sideways along the inheritance hierarchy
  • const_cast: converts between types with different cv-qualification

However, an explicit C-like casting is interpreted as following (which the first choice that satisfies the respective cast operator being selected):

  1. const_cast
  2. static_cast
  3. static_cast followed by const_cast
  4. reinterpret_cast
  5. reinterpret_cast followed by const_cast

Instead of writing code as this:

you should get into the habit of writing the following:

This better expresses the intention of the user which helps the compiler flag inappropriate casting attempts. C++ casts are also easier to find with a simple text search which can be useful at times.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.