The static analysis of our code base has identified lately several several issues in the C++ code that I had to fix. Once again, this help me I realize how it is to make mistakes that are usually hard to find by just looking at the code (with a human eye). I believe it is worth sharing a few of these issues. I cannot post my actual code, it would be too complex anyway, but I will use some simple snippets that exhibit the same problems found in code. These will help you (hopefully) to easily understand the problem and the solution.
Deleting elements from a vector in a loop
The first problem to discuss is erasing elements from a container, such as std::vector
, in a loop. Our code looked similar to the following, simplified snippet:
int main() { std::vector<int> data{ 1,1,2,3,5,8,13,21,34,55 }; for (auto it = data.begin(); it != data.end();) { /* do something with *it */ if (*it % 2 == 0) { data.erase(it); } else { ++it; } } for (auto const& e : data) std::cout << e << '\n'; }
Here, we have a vector containing data and a loop where each element of the vector is checked to match some condition and those who do are removed from the container. Can you spot the problem?
To remove an element, we use std::vector::erase. This removes the element at the specified position and invalidates the iterators and references at or after the point of the erase, including the end()
iterator. What happens is that when an element is removed, the iterator is not incremented, so the next loop will process the element that was just removed. And that’s not going to end well.
The fix is pretty simple: assigned the value returned by erase()
, which is the iterator following the removed element, to the it
variable.
it = data.erase(it);
That’s the fix. But can we do better? Yes, we can use a generic algorithm to remove all the elements that we want. For this purpose, we can use std::remove_if. We can replace the for loop from above with the following:
data.erase(std::remove_if(data.begin(), data.end(), [](int const n) {return n % 2 == 0; }), data.end());
You should prefer using standard algorithms when possible because, in general, you’ll write less code and that code is less likely to be prone to bugs.
Returning without releasing dynamically allocated memory
The second example I want to bring up is a memory leak that occurs in a function that allocates memory and then returns without releasing it. This can be exemplified with the following snippet:
void Process(int* ptr) { std::cout << *ptr << '\n'; delete ptr; } void Demo(int const limit) { int* ptr = new int(rand() % 100); if (*ptr > limit) { return; } Process(ptr); }
This may look silly, but imagine that instead of an int
there is a user-defined class and there are several checks on the object (or maybe related data) and in some cases the execution of the function should stop. Otherwise, the allocated object is passed to a function (that perhaps takes ownership of it). It only takes a little lack of attention (especially if this is an longer, more complex function) to forget to delete the allocated memory before returning from the function.
The fix is to use a std::unique_ptr which has the advantage of releasing the own dynamically allocated object when going out of scope.
void Demo(int const limit) { std::unique_ptr<int> ptr = std::make_unique<int>(rand() % 100); if (*ptr > limit) { return; } Process(ptr.release()); }
If you need to pass the underlying object to a function that takes a raw pointer and assumes ownership of the object, just use std::unique_ptr::release(). This disconnects the managed object from the smart pointer giving ownership to the caller who is responsible for releasing it.
And this leads to the next issue…
Confusion between release and reset
The std::unique_ptr
class has two methods:
- release(): releases the managed object giving ownership to the caller.
- reset(): replaces the managed object with another one (it takes a null argument as default value); if the smart pointer holds a non-null pointer to an object that object is deleted.
The following snippet contains an error:
std::unique_ptr<int> MakeObject(int const limit) { std::unique_ptr<int> ptr = std::make_unique<int>(rand() % 100); if (*ptr > limit) { ptr.release(); } return ptr; }
If the condition is not met, the managed object is released. But the actual intention is to delete it. This leads to a memory leak. The correct call is to reset():
std::unique_ptr<int> MakeObject(int const limit) { std::unique_ptr<int> ptr = std::make_unique<int>(rand() % 100); if (*ptr > limit) { ptr.reset(); } return ptr; }
More about memory leaks
We’ve seen above how we can use std::unique_ptr
to avoid leaking objects allocated on the heap. But what if we need to allocate and release an array of objects? The std::unique_ptr
class supports arrays but it comes with trade offs such as: the size is fixed, copying is not possible, std::unique_ptr<[]>
is not a container and therefore not compatible with algorithms and concepts. But there are containers, such as std::vector
, that can be used for this purpose.
As an example, let’s take the following snippet:
void Demo() { DWORD size = 0; TCHAR* buffer = nullptr; BOOL ret = ::GetUserName(nullptr, &size); if(!ret && ::GetLastError() == ERROR_INSUFFICIENT_BUFFER) buffer = new TCHAR[size]; if (buffer) { ::GetUserName(buffer, &size); std::cout << buffer << '\n'; } }
Many Windows API require a buffer of a certain size. But you can call them with a null buffer and they return the size required for the buffer so that you can allocate the necessary memory. This is what happens in this snippet, that retrieves and prints the name of the current user. But the allocated buffer is never released (because of a mistake). This can be entirely avoided by using a std::vector
which takes care of allocations and deallocations internally.
void Demo() { DWORD size = 0; std::vector<TCHAR> buffer; BOOL ret = ::GetUserName(nullptr, &size); if (!ret && ::GetLastError() == ERROR_INSUFFICIENT_BUFFER) buffer.resize(size); if (!buffer.empty()) { ::GetUserName(buffer.data(), &size); std::cout << buffer.data() << '\n'; } }
No matter how this function returns (normally or even because of an exception) the memory allocated by the std::vector
object is released at that point.
The number of elements in an array
I have seen this pattern plenty of times in legacy code. There’s an array declared somewhere (typically in global scope) and then there is a macro used for determining the number of elements in the array. Here is a minimal snippet:
int AnArray[] = {1, 1, 2, 3, 5, 8}; #define COUNT_OF_ARRAY sizeof(AnArray) / sizeof(int) int main() { for (size_t i = 0; i < COUNT_OF_ARRAY; ++i) { std::cout << AnArray[i] << '\n'; } }
This is a poor style that is prone to errors. Such an error is the following (that could be hard to spot). But static analysis caught it immediately:
int AnArray[] = {1, 1, 2, 3, 5, 8}; int main() { for (size_t i = 0; i < sizeof(AnArray); ++i) { std::cout << AnArray[i] << '\n'; } }
The intention is to iterate through all the elements of the array, but sizeof(AnArray)
returns the size it occupies in memory, not its number of elements. This will result in out-of-bounds access.
There are multiple solutions for this problem. On of them is to use the non-member function std::size()
to get the count of elements in the array:
int AnArray[] = {1, 1, 2, 3, 5, 8}; int main() { for (size_t i = 0; i < std::size(AnArray); ++i) { std::cout << AnArray[i] << '\n'; } }
A better approach could arguably be to use std::array
instead of C-like arrays. This will have multiple benefits including easily retrieving the number of elements with the size()
member function (or the std::size()
non-member function).
std::array<int, 6> AnArray {1, 1, 2, 3, 5, 8}; int main() { for (size_t i = 0; i < AnArray.size(); ++i) { std::cout << AnArray[i] << '\n'; } }
Static analysis is a great way to help identify problems with your code that are otherwise difficult to spot. The few examples shows here should demonstrate that.
Yeah, `release` was a problematic naming choice for detaching the object from its smart pointer, since `release` has common precedent in other API’s (e.g. COM’s Release method) to actually release (that is, destroy) the object when the last reference is gone. Oh well.
What are some of the issues highlighted in the provided article about static analysis in C++ code, and can you elaborate on the first issue discussed regarding erasing elements from a vector in a loop?
Telkom Informatika