For the past decade and a half I have been working with large legacy code bases started in the early ’90s. As a result, I have had to deal with lots of code using old styles and conventions such as raw pointers, void pointers, declaring all variables before using them, public data members accessed from everywhere, and many others. I believe in change, and therefore I try to make as many improvements as possible. Of course, this is not always possible, or even desirable (due to various constraints). Moreover, nobody will stop a large project for months or years to modernize the code. However, applying small but incremental changes is always possible, and over time large code bases can improve. This is a strategy I constantly apply to the parts of code I have to modify. In this blog post I list a series of improvements you can make to old C++ code in order to modernize and improve it.
Macros
It is a very common case to use macros for constants. Here is an example that defines some printer types:
#define PRINT_STANDARD 0
#define PRINT_PDF      1
#define PRINT_TEXT     2
This is bad because there is nothing to restrict the values (0, 1, and 2 in this example) that can be passed wherever a printer type is needed:
void print(document doc, int const printer)
{
   // ...
}

print(doc, 42); // oops
The best way in this case is to define a scoped enumeration type.
enum class printer_type
{
   standard = 0,
   pdf,
   text
};
By using printer_type instead of an int value wherever a printer type is needed, we can make sure we always use a legal value.
void print(document doc, printer_type const printer)
{
   // ...
}

print(doc, printer_type::pdf);  // OK
print(doc, 42);                 // compiler error
Sometimes, macros are used for defining values that do not represent an enumeration. For instance, the size of a buffer:
#define BUFFER_SIZE 1024

int main()
{
   char buffer[BUFFER_SIZE];
}
In this case, the best solution is to define a constexpr
value instead.
constexpr size_t BUFFER_SIZE = 1024;
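For completeness, here is a minimal sketch (mine, not from the original snippet) showing that the constexpr constant can be used exactly where the macro was, such as an array bound:

#include <cstddef>

constexpr std::size_t BUFFER_SIZE = 1024;

int main()
{
   char buffer[BUFFER_SIZE] = {}; // works wherever the macro did, but has a type and obeys scope
}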
There are also function-like macros. Here is an example:
#define MEGA_BYTES(MB) (MB * 1048576)

enum class GROW_BY
{
   mb1 = MEGA_BYTES(1),
   mb4 = MEGA_BYTES(4),
   mb8 = MEGA_BYTES(8)
};
This kind of macro can be replaced with a constexpr function. Here is how:
constexpr auto mega_bytes(unsigned const mb)
{
   return mb * 1048576;
}

enum class GROW_BY
{
   mb1 = mega_bytes(1),
   mb4 = mega_bytes(4),
   mb8 = mega_bytes(8)
};
In C++20, the mega_bytes() function can be an immediate function instead. An immediate function is a function that must produce a compile-time constant; such a function exists only at compile time. No symbol is emitted for it and you cannot take its address. In these respects, an immediate function is much closer to a macro. An immediate function is declared with the consteval keyword (which cannot be combined with constexpr). This is how the mega_bytes() function could be declared in C++20:
consteval auto mega_bytes(unsigned const mb)
{
   return mb * 1048576;
}
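To illustrate the difference, here is a minimal sketch (the runtime variable is hypothetical) showing that an immediate function can only be called with compile-time constants and cannot have its address taken:

consteval auto mega_bytes(unsigned const mb)
{
   return mb * 1048576;
}

static_assert(mega_bytes(1) == 1048576);  // evaluated entirely at compile time

int main()
{
   unsigned runtime_mb = 8;
   (void)runtime_mb;
   // auto s = mega_bytes(runtime_mb); // error: the argument is not a compile-time constant
   // auto p = &mega_bytes;            // error: cannot take the address of an immediate function
}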
Type aliases
I have seen, more times than necessary, the following style of defining structures, used by people with a background in C programming:
typedef struct TransInfo
{
   INT iUniqueNo;
   INT iDocNo;
} TRANSINFO, *PTRANSINFO;
typedef struct is not necessary in C++. Therefore, the C++ definition should look like this:
struct TRANSINFO
{
   INT iUniqueNo;
   INT iDocNo;
};

typedef TRANSINFO* PTRANSINFO;
However, we can do better, because C++11 provides type aliases that enable us to define aliases for types in a more readable way. The above typedef is equivalent to the following:
using PTRANSINFO = TRANSINFO*;
This is even more relevant when you need to define function pointer types. Let's consider that you have the following functions, foo() and foobar(), and a third function, doit(), that needs to take the address of one of these two functions:
#include <iostream>

bool foo(int const a, double const b)
{
   // ...
   return true;
}

bool foobar(int a, double b)
{
   // ...
   return true;
}

void doit(fp_foo f)   // fp_foo is the function pointer type defined below
{
   std::cout << f(42, 100) << '\n';
}

int main()
{
   doit(foo);
   doit(foobar);
}
So, how do you define the function pointer type fp_foo? I have to confess that all my life I have had trouble remembering the syntax for this; I always had to look it up. This is how you do it:
typedef bool (*fp_foo)(int const, double const);
However, the using definition syntax enables us to write a much more readable and easier to remember definition:
using fp_foo = bool(*)(int const, double const);
This is the same syntax used for declaring std::function
objects, except for the (*)
part. Here is an example:
void doit(std::function<bool(int const, double const)> f)
{
   std::cout << f(42, 100) << '\n';
}
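A nice side effect of the std::function version, sketched below as an illustration rather than taken from the original snippet, is that doit() can now also accept lambdas and other callable objects, not just free functions:

#include <functional>
#include <iostream>

bool foo(int const a, double const b)
{
   return a > b;
}

void doit(std::function<bool(int const, double const)> f)
{
   std::cout << f(42, 100) << '\n';
}

int main()
{
   doit(foo);                                               // a regular function still works
   doit([](int const a, double const b) { return a < b; }); // so does a lambda
}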
I stopped using typedefs a while ago because I find using definitions more natural to write and read.
Data member initialization
I have encountered the following pattern for initialization of data members countless times:
struct TRANSINFO
{
   INT iUniqueNo;
   INT iDocNo;
   // ...

   TRANSINFO();
};

TRANSINFO::TRANSINFO()
{
   iUniqueNo = 0;
   iDocNo = 0;
}
This is wrong because data member initialization should be done in the constructor's initialization list (except for the few cases when this is not possible). When you do it like above, each member is first initialized and then assigned (which might not be significant for built-in numeric types such as int, but it is for larger objects). This is because, before the constructor body is executed, the initialization of all direct bases, virtual bases, and non-static data members is performed. If you want to specify a non-default initialization for the non-static data members, you should use the initialization list. Here is how:
TRANSINFO::TRANSINFO()
   : iUniqueNo(0), iDocNo(0)
{
}
Keep in mind that the order of the initializers in the list does not matter: non-static data members are initialized in the order of their declaration in the class definition.
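Because the declaration order always wins, initializers that depend on other members can be a trap. Here is a minimal sketch (with a made-up class, not from the original post) of the kind of bug this causes:

#include <cstddef>

struct buffer
{
   // members are initialized in declaration order: data first, then size
   char*       data;
   std::size_t size;

   buffer(std::size_t const n)
      : size(n), data(new char[size]) // bug: data is initialized first, reading size before it is set
   {
   }
   // fix: declare size before data, or simply write data(new char[n])
};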
The problem is that the more data members a class has, the more likely it is to forget to initialize one of them. In C++11, you can simplify initialization by providing it within the declaration of the data member.
struct TRANSINFO
{
   INT iUniqueNo = 0;
   INT iDocNo = 0;
   // ...
};
If you need to initialize data members from constructor arguments, you still need to use the constructor initialization list. If both initializations are present, the one from the initialization list takes precedence. This is useful for classes that have multiple constructors with different sets of parameters.
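As an illustrative sketch (this exact class does not appear in the original post; the INT alias stands in for the Windows typedef), a default member initializer covers the constructors that do not mention a member, while a constructor's initialization list overrides it when present:

using INT = int; // stand-in for the Windows INT typedef used in the earlier snippets

struct TRANSINFO
{
   INT iUniqueNo = 0;   // used when a constructor does not initialize the member
   INT iDocNo   = 0;

   TRANSINFO() = default;                  // iUniqueNo = 0, iDocNo = 0

   TRANSINFO(INT const unique, INT const doc)
      : iUniqueNo(unique), iDocNo(doc)     // the initialization list takes precedence
   {
   }
};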
Avoiding memory allocation and deallocation
The use of standard containers, which allocate memory internally and deallocate it automatically when objects go out of scope, helps avoid explicit allocation and deallocation of memory. An example where a standard container such as std::vector can be used is the variable-size buffers required when calling Windows system APIs. Many Windows API functions fill a buffer passed by the caller, but the caller must first determine the required size of that buffer. This is solved by first calling the function with a null buffer, which makes the function return the required size. Then you allocate the memory for the buffer and invoke the same function again with a sufficiently sized buffer. Below is an example of this pattern.
void GetUsernameAndDoSomething()
{
   ULONG nSize = 0;

   // determine the size of the buffer
   if (!::GetUserName(NULL, &nSize))
   {
      if (GetLastError() != ERROR_INSUFFICIENT_BUFFER)
      {
         return;
      }
   }

   // allocate some buffer memory
   LPSTR pBuffer = new CHAR[nSize];
   if (pBuffer == NULL)
      return;

   // fill the buffer
   if (!::GetUserName(pBuffer, &nSize))
   {
      // [1] oops... failed to delete allocated memory
      return;
   }

   // do something
   // [2] what if it throws? oops...

   // clean up
   delete [] pBuffer;
}
This code has two issues: the points marked with [1] and [2] can leak memory. At [1], we return without deleting the allocated buffer. At [2], if an exception occurs, the later line that deletes the buffer never executes, again leaking memory. This can be simplified with the help of a std::vector as follows:
void GetUsernameAndDoSomething()
{
   ULONG nSize = 0;

   // determine the size of the buffer
   if (!::GetUserName(nullptr, &nSize))
   {
      if (GetLastError() != ERROR_INSUFFICIENT_BUFFER)
      {
         return;
      }
   }

   // allocate some buffer memory
   std::vector<char> pBuffer(nSize);

   // fill the buffer
   if (!::GetUserName(pBuffer.data(), &nSize))
   {
      // no need to deallocate anything
      return;
   }

   // do something
   // what if it throws? we're OK
}
With this new implementation, leaving the function, either normally (with a return statement) or because an exception occurred, destroys the pBuffer object, and when that happens its internal memory is deleted. Therefore, this implementation is shorter and more robust.
This example concerned the use of a buffer (a contiguous memory chunk). But the same problems appear when you allocate single objects and use raw pointers. Take a look at the following snippet:
void give_up_ownership(foo* ptr)
{
   // do something
   delete ptr;
}

void example()
{
   foo* ptr = new foo();

   if(...)
   {
      delete ptr;
      return;
   }

   if(...)
   {
      // [1] oops... failed to delete object
      return;
   }

   give_up_ownership(ptr);
}
We have a function called example() that allocates a foo object that it will eventually pass to the function give_up_ownership(). Before doing so, it performs some checks and may return without calling that function. However, before returning, the foo object must be deleted, which is easy to forget when you code like this, as shown at the line marked with [1]. This introduces a memory leak. Again, the implementation can be simplified, this time with the help of a smart pointer, std::unique_ptr.
void example()
{
   std::unique_ptr<foo> ptr = std::make_unique<foo>();

   if(...)
   {
      return;
   }

   if(...)
   {
      return;
   }

   give_up_ownership(ptr.release());
}
There are no explicit calls to new (replaced by std::make_unique()) and delete here. Moreover, the give_up_ownership() function remains unchanged. The call to std::unique_ptr::release detaches the unique_ptr object from the underlying raw pointer and returns that raw pointer, so that when the smart pointer goes out of scope it will not attempt to delete the object. As with the previous example with std::vector, the new implementation is simpler and more robust.
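If you are also free to change give_up_ownership(), a further step, shown here only as a sketch (the bool parameter stands in for the elided conditions), is to make the transfer of ownership explicit in the signature by taking a std::unique_ptr by value, so that neither release() nor delete is needed at all:

#include <memory>

struct foo {};

// ownership is transferred into the function; the object is
// deleted automatically when ptr goes out of scope
void give_up_ownership(std::unique_ptr<foo> ptr)
{
   // do something
}

void example(bool const check)
{
   auto ptr = std::make_unique<foo>();

   if (check)
   {
      return; // no leak: the unique_ptr cleans up
   }

   give_up_ownership(std::move(ptr));
}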
Avoiding C-like arrays
C-like arrays can be replaced with standard containers such as std::vector or std::array. A pattern I have encountered many times is shown in the next snippet:
struct Object
{
   int ID;
   int Parent;
};

static Object Alist [] =
{
   {1, 0},
   {2, 0},
   {3, 1},
   {4, 3},
};

#define NUM_OBJECTS (sizeof(Alist) / sizeof(Object))

for(int i = 0; i < NUM_OBJECTS; ++i)
{
   // do something with Alist[i]
}
There is an array of Object values and a macro, NUM_OBJECTS, used to represent the number of elements in the array, in order to avoid hard-coded values (which are error-prone, especially when the number of elements actually changes). std::vector or std::array are always a better alternative here:
// with std::vector
static std::vector<Object> Alist
{
   {1, 0},
   {2, 0},
   {3, 1},
   {4, 3},
};

// or with std::array
static std::array<Object, 4> Alist
{{
   {1, 0},
   {2, 0},
   {3, 1},
   {4, 3},
}};

for(size_t i = 0; i < Alist.size(); ++i)
{
   // do something with Alist[i]
}
Not only does using a standard container avoid the use of a macro, since the size() member function can be used to retrieve the number of elements in the container, but it also enables the use of range-based for loops.
for(auto const & element : Alist)
{
   // do something with element
}
If you have functions that take an array as input in the form of a pointer (to the first element) and a size (the number of elements), they can remain untouched regardless of whether you call them with arrays or standard containers (including std::array). Consider the following example:
void foo(int* arr, unsigned const size)
{
   if(arr != nullptr && size > 0)
   {
      for(unsigned i = 0; i < size; ++i)
         std::cout << arr[i] << '\n';
   }
}
This can be called as follows:
int main()
{
   int arr[3] = {1, 2, 3};
   foo(arr, sizeof(arr)/sizeof(int));
}
However, the result would be the same if this code changed as follows:
int main()
{
   std::vector<int> vec {1, 2, 3};
   foo(vec.data(), vec.size());
}
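Since std::array is mentioned as well, here is a minimal, self-contained sketch of the same unchanged function called with a std::array argument:

#include <array>
#include <iostream>

void foo(int* arr, unsigned const size)
{
   if(arr != nullptr && size > 0)
   {
      for(unsigned i = 0; i < size; ++i)
         std::cout << arr[i] << '\n';
   }
}

int main()
{
   std::array<int, 3> arr {1, 2, 3};
   foo(arr.data(), static_cast<unsigned>(arr.size())); // foo() itself is unchanged
}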
Proper casting
C-style cast expressions, of the form (type)value, are widely used by C++ developers, although they should not be. C++ provides four cast operators, as follows:
static_cast: converts between types using implicit and user-defined conversions (examples include converting enums to integral types, floating-point types to integral types, pointer types to pointer to void, pointers to a base class to pointers to a derived class, etc.)
reinterpret_cast: converts between types by reinterpreting the underlying bit pattern (such as converting between pointer and integral types)
dynamic_cast: performs a safe conversion between pointers or references to classes up, down, and sideways along the inheritance hierarchy
const_cast: converts between types with different cv-qualification
However, an explicit C-style cast is interpreted as the first of the following choices that satisfies the requirements of the respective cast operator:

const_cast
static_cast
static_cast followed by const_cast
reinterpret_cast
reinterpret_cast followed by const_cast
Instead of writing code like this:
int margin = (int)((cy - GetHeight())/2);
MyEnum e = (MyEnum)value;
foo* f = (foo*)lParam;
you should get into the habit of writing the following:
int margin = static_cast<int>((cy - GetHeight())/2);
MyEnum e = static_cast<MyEnum>(value);
foo* f = reinterpret_cast<foo*>(lParam);
This better expresses the intent, which helps the compiler flag inappropriate casting attempts. C++ casts are also easier to find with a simple text search, which can be useful at times.
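As a small illustrative sketch (the types below are my own, not from the article), the named casts let the compiler reject a conversion that a C-style cast silently turns into a reinterpret_cast, and dynamic_cast adds a runtime check for downcasts:

#include <iostream>

struct widget { virtual ~widget() = default; };
struct button : widget {};

int main()
{
   double d = 3.14;
   int*   p = (int*)&d;                 // compiles, silently reinterprets the bits
   // int* q = static_cast<int*>(&d);   // error: the compiler flags the invalid conversion
   (void)p;

   widget* w = new button();
   if (button* b = dynamic_cast<button*>(w)) // checked downcast along the hierarchy
   {
      std::cout << "w points to a button\n";
   }
   delete w;
}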