C++11 introduced the atomic operations library, a set of classes and functions for performing atomic operations using lock-free mechanisms. Its two main types are the std::atomic class template and the std::atomic_flag class. The latter, which defines an atomic boolean type, is guaranteed to always be lock-free and is implemented using lock-free atomic CPU instructions. The former, however, may actually be implemented using mutexes or other locking operations. In this article, we will look at a new class template introduced in C++20, std::atomic_ref.
The std::atomic class template has several specializations in C++11:
- The primary template that can be instantiated with any type T that is trivially copyable and satisfies both the CopyConstructible and CopyAssignable requirements.
- Partial specialization for all pointer types.
- Specializations for integral types, which include the character types, the signed and unsigned integer types, and any additional integral types needed by the typedefs in the header <cstdint>.
In C++20, the following specializations have been added:
- Specializations for floating-point types float, double, and long double.
- Partial specializations std::atomic<std::shared_ptr<U>> for std::shared_ptr and std::atomic<std::weak_ptr<U>> for std::weak_ptr.
What std::atomic does not support is references. But let’s start with an example of using std::atomic.
    #include <atomic>
    #include <chrono>
    #include <iostream>
    #include <thread>
    #include <vector>

    int do_count(int value)
    {
        std::atomic<int> counter { value };

        std::vector<std::thread> threads;
        for (int i = 0; i < 10; ++i)
        {
            // Each thread increments the shared counter 10 times.
            threads.emplace_back([&counter]() {
                using namespace std::chrono_literals;
                for (int j = 0; j < 10; ++j)
                {
                    ++counter;
                    std::this_thread::sleep_for(50ms);
                }
            });
        }

        for (auto& t : threads) t.join();

        return counter;
    }

    int main()
    {
        int result = do_count(0);
        std::cout << result << '\n'; // prints 100
    }
In this example, the do_count() function creates 10 threads, and each thread increments a variable in a loop. The variable is a shared resource, so data races must be avoided. Using the std::atomic<int> type guarantees that no data race occurs on the counter, although we don’t necessarily have the guarantee of a lock-free implementation. The is_lock_free() member function and the non-member std::atomic_is_lock_free() function, as well as the compile-time constant is_always_lock_free, indicate whether the atomic object is implemented using lock-free mechanisms.
    std::atomic<int> counter { value };
    static_assert(decltype(counter)::is_always_lock_free, "Atomic int is not lock free!");
However, keep in mind that the standard allows atomic types to be lock-free only sometimes. That is, we might know only at runtime whether an atomic type is lock-free, when only some sub-architectures support lock-free atomic access for a given type (such as the CMPXCHG16B instruction on x86-64).
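The difference between the compile-time and run-time queries can be sketched as follows. This is only an illustration; the names always_lock_free and lock_free_at_runtime are hypothetical helpers, not part of the standard library.

```cpp
#include <atomic>
#include <cassert>

// Compile-time query: true only when std::atomic<T> is lock-free on every
// CPU the compiled program may run on.
template <typename T>
constexpr bool always_lock_free = std::atomic<T>::is_always_lock_free;

// Run-time query on one concrete object (hypothetical helper name).
template <typename T>
bool lock_free_at_runtime()
{
    std::atomic<T> obj{};
    // The member function and the non-member function ask the same question.
    assert(obj.is_lock_free() == std::atomic_is_lock_free(&obj));
    return obj.is_lock_free();
}
```

If always_lock_free<T> is true, lock_free_at_runtime<T>() must also return true. The reverse does not hold: a type can turn out to be lock-free on the machine the program happens to run on without being lock-free everywhere.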
If we change the do_count() function so that the argument is passed by reference, the result changes:
    void do_count_ref(int& value)
    {
        std::atomic<int> counter{ value };

        std::vector<std::thread> threads;
        for (int i = 0; i < 10; ++i)
        {
            threads.emplace_back([&counter]() {
                using namespace std::chrono_literals;
                for (int j = 0; j < 10; ++j)
                {
                    ++counter;
                    std::this_thread::sleep_for(50ms);
                }
            });
        }

        for (auto& t : threads) t.join();
    }

    int main()
    {
        int value = 0;
        do_count_ref(value);
        std::cout << value << '\n'; // prints 0
    }
In this case, the value printed to the console is 0 and not 100. This is because std::atomic does not work with references. It makes a copy of the value it is initialized with, so the do_count_ref() function does not actually modify its argument.
There are many scenarios where an object should be accessed atomically only in some parts of a program. Performing atomic operations, even lock-free, when they are not necessary could potentially impact performance. This is especially true when working with large arrays. Parallel operations such as initializations and reads do not have conflicting access, but updates require atomic operations. However, with std::atomic, this is not possible, as shown in the following example:
    void array_inc(std::vector<int>& arr, size_t const i)
    {
        std::atomic<int> elem{ arr[i] };
        elem++;
    }

    int main()
    {
        std::vector<int> arr{ 1,2,3 };
        array_inc(arr, 0);
        std::cout << arr[0] << '\n'; // prints 1
    }
The array_inc() function is supposed to atomically increment an element of the provided vector. However, for the same reason mentioned earlier, this does not work: the atomic object holds a copy of the element, and back in main(), the arr vector is left untouched.
To help with this problem, the C++20 standard provides an atomic type that works with references. However, instead of providing a specialization of std::atomic for references (std::atomic<T&>), it introduces a new class template called std::atomic_ref. This has an interface very similar to that of std::atomic, and similar specializations:
- The primary template that can be instantiated with any type T that is trivially copyable.
- Partial specialization for all pointer types.
- Specializations for integral types, which include the character types, the signed and unsigned integer types, and any additional integral types needed by the typedefs in the header <cstdint>.
- Specializations for the floating-point types float, double, and long double.
There are several requirements when using std::atomic_ref:
- The lifetime of the referenced object must exceed the lifetime of the atomic_ref object itself.
- As long as an object is referenced by an atomic_ref instance, it must be accessed exclusively through atomic_ref instances.
- No sub-object of the referenced object can be concurrently referenced by any other atomic_ref object.
You also need to keep in mind that:
- The fact that std::atomic is lock-free for a given type does not imply that the corresponding atomic_ref is also lock-free.
- It is possible to modify the referenced value through a const atomic_ref object.
All that we have to do to fix our examples is to replace std::atomic with std::atomic_ref. Here is the first:
    void do_count_ref(int& value)
    {
        std::atomic_ref<int> counter{ value };

        std::vector<std::thread> threads;
        for (int i = 0; i < 10; ++i)
        {
            threads.emplace_back([&counter]() {
                using namespace std::chrono_literals;
                for (int j = 0; j < 10; ++j)
                {
                    ++counter;
                    std::this_thread::sleep_for(50ms);
                }
            });
        }

        for (auto& t : threads) t.join();
    }
At the end of the execution of this function, the value argument will have been incremented by exactly 100, so it is always 100 when it starts at 0.
Similarly, the array_inc() function will now atomically update the specified element of a vector:
    void array_inc(std::vector<int>& arr, size_t const i)
    {
        std::atomic_ref<int> elem{ arr[i] };
        elem++;
    }
The generated code is also very efficient. This is what the Compiler Explorer is showing when compiling using GCC and the options -std=gnu++2a -Wall -O3:
    array_inc(std::vector<int, std::allocator<int> >&, unsigned long):
            mov     rax, QWORD PTR [rdi]
            lock add        DWORD PTR [rax+rsi*4], 1
            ret
I mentioned earlier that it is possible to modify a referenced object through a constant atomic_ref object. This is because the constness of the atomic_ref object is shallow and does not affect the referenced object. The following snippet provides an example:
    int a = 42;
    {
        const std::atomic_ref ra(a);
        ra++;
    }
    std::cout << a << '\n'; // prints 43
At the time of writing, std::atomic_ref is supported only in GCC 10.
The proposal paper is available here: Atomic Ref.