You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
A few weeks ago, I posted the first part of this article, where I explained the advantages of Boost.MultiIndex over the standard containers when you need to have multiple views on a set of data.
In this second part, I would like to talk about the benefits you can get from using Boost.MultiIndex as a single-index hash table, as a replacement
of std::unordered_map.
One interesting and powerful aspect of Boost.MultiIndex is that it allows you to add an index of type T, where T is different from the stored type. And it is more frequent and useful that you could think.
std::string as a key
Who never used a std::unordered_map<std::string, X>? This is the most straightforward way to lookup for an object with a string: it is simple, relatively fast, and convenient as std::hash<std::string> is provided by the standard library.
However, what if you get another string type as input for your lookup? I already found myself in this situation multiple times:
when reading from shared memory, a socket, or a device, you get a const char*
when using APIs that provide their own string implementation, like Qt: even if your entire codebase is using std::string, you will have some boundary zones where both types are used
when using third party libraries: you get most of the time a const char * or std::string
struct stock
{
std::string ref;
double price;
int volume;
};
struct message_handler
{
void receive(const char* buffer, size_t size)
{
// no choice here, lookup has to be on std::string
// a std::string is constructed: memory allocation + bytes copy
auto it = _stocks.find({buffer, size});
// ...
}
One would say that you could simply convert your string to std::string before the lookup. However this might not be acceptable from a performance point of view, as it will perform a memory allocation. Nowadays, many STL implementations use “small string optimization* (SSO) — up to 15 bytes with g++ 6.2 and 22 chars with clang 3.9 — in order to avoid the memory allocation. Bytes are still copied though, and the performance of your system heavily relies on implementation details.
The solution here is to use C++17’s std::string_view. This is not possible with standard containers, though: with std::unordered_map and std::unordered_set, you cannot query on another type than Key — and using a std::unordered_map<std::string_view, X> is a bad choice, for a number of reasons I explain in the appendix.
Easy task with Boost.MultiIndex: simlpy add a method that returns a string view and use it as index!
struct stock
{
std::experimental::string_view ref_view() const { return ref; }
// same attributes as before
};
struct message_handler
{
void receive(const char* buffer, size_t size)
{
// no memory allocation, no copy!
std::experimental::string_view ref_view(buffer, size);
auto it = _stocks.find(ref_view);
// ...
}
The magic is done on line 23, which defined the hashed unique index as const_mem_fun<stock, std::experimental::string_view, &stock::ref_view>: a const member function of stock, that returns
a string view, and can be found at &stock::ref_view.
unordered_set<unique_ptr<T>>
Keeping a set of unique_ptr<T> is less common, but sometimes convenient. Such pattern can be used if a class owns a collection of polymorphic T, where this owner class does not know anything about the concrete type.
private:
void on_error(connection& conn)
{
conn.close("bye bye");
_connections.erase(&conn); // won't compile: Key is a unique_ptr<connection>, not a connection*
}
Note: on_error is private. If it was public, this would be a poor API, as the client code would have a dangling reference the line right after the call to node::on_error. node owns and does not share connection ; otherwise it could simply use a std::unordered_set<std::shared_ptr<T>>.
There are a few ways to solve this with unordered STL containers, but…
std::unordered_set<connection*>: you lose the advantages of RAII
std::unordered_map<connection*, std::unique_ptr<connection>>: ugly and under-performant solution
Boost.MultiIndex allows you here to store unique_ptr while being able to access it from raw pointers:
struct node
{
// ...
private:
void on_error(connection& conn)
{
conn.close("bye bye");
_connections.erase(&conn); // this line now builds!
}
using connection_ptr = std::unique_ptr<connection>;
Composite keys is another quite common pattern when storing objects into associative containers. As the name implies, it allows to add an index composed by multiple fields. Let’s take a quick example.
// more attributes ...
std::chrono::time_point<std::chrono::system_clock> _started_time;
std::vector<process> _processes;
};
This class represents a session of a user on a particular machine. A user can only log once on a host, which allows us to uniquely identify a session by the association
of the username and the hostname.
What are our options to do that with standard containers?
The most straightforward solution is to use a std::unordered_map<std::pair<std::string, std::string>, session>, as you don’t have to provide any custom std::hash nor operator== implementations. I see a lot of problems with this solution though: it is bad from a performance point of view, because it uses four std::string instances
while we only need two and it allocates two std::string on each lookup. Also, the code is not very elegant and is error prone as you don’t know if username is the first element of the pair or the second.
A better way to do it in my opinion is to have a std::unordered_map<session_id, session>, with session_id a struct with two std::string. This gets rid of a few drawbacks that we have in the solution 1, but requires a std::hash specialization and operator== definition. Not dramatic, but this raises more questions: how to implement a good hash function that takes two std::string as input?
I usually use Boost.Functional/Hash for that, but not every application can support Boost as a dependency. Apparently, a good start would be to use a xor with some bit shifting on the two std::string hashes, but who knows if this is really good?
A last solution I could think of is to use std::unordered_set<session>, but it is still far from ideal: you have all the problems cited in the solution 2. The only advantage is that
you’re using less memory as you don’t have the two additional std::string instances as key… but now you have to const_cast all the time your stored objects, which can be annoying or simply not
desirable depending on your API.
As you can see, there is no simple and straightforward solution to that (simple!) problem. Here is what is looks like with Boost.MultiIndex:
auto it = sessions.find(boost::make_tuple("foobar", "localhost"));
std::cout << it->user_name << std::endl;
The composite_key takes in this case two const_mem_fun — which is the argument used to refer to const member functions. For the lookup, you will have to build a Boost.Tuple as a key.
As I wrote in the first paragraph, building std::string to do a lookup take a significant time. If you care enough about performance, here is the solution that is going to squeeze the
last bits and nanoseconds:
Same approach as before: the composite key is still composed by the two methods of your class, that we modify of course in order to return a std::string_view and not a std::string anymore. The only
annoying bit is that we have to explicitly define the composite_key_hash as Boost.MultiIndex doesn’t pick the std::string_view specialization!
Appendix
Why using std::unordered_map<std::string_view, X> is bad design choice?
It is a bad design because it is fragile and error-prone. The key is const std::string_view, but as it is a view, it can actually change anytime. The only
acceptable scenario in my mind would be if X owns the viewed string as const std::string.
is wrong because x is going to be copied, while the key is pointing to x.str
is wrong for the same reason
is wrong because there is no guarantee on the order of evaluation of arguments, thus x could be moved before x.str is evaluated, resulting in the construction of a string view that points to some undefined memory area
is correct: we construct the view, and move x
via Thoughts from a Wall Street developer
November 1, 2024 at 02:54PM
The text was updated successfully, but these errors were encountered:
Why you should use Boost.MultiIndex (Part II)
https://ift.tt/td25yjU
David Gross
read
A few weeks ago, I posted the first part of this article, where I explained the advantages of Boost.MultiIndex over the standard containers when you need to have multiple views on a set of data.
In this second part, I would like to talk about the benefits you can get from using Boost.MultiIndex as a single-index hash table, as a replacement of std::unordered_map.
One interesting and powerful aspect of Boost.MultiIndex is that it allows you to add an index of type T, where T is different from the stored type. And it is more frequent and useful that you could think.
std::string as a key
Who never used a std::unordered_map<std::string, X>? This is the most straightforward way to lookup for an object with a string: it is simple, relatively fast, and convenient as std::hash<std::string> is provided by the standard library.
However, what if you get another string type as input for your lookup? I already found myself in this situation multiple times:
One would say that you could simply convert your string to std::string before the lookup. However this might not be acceptable from a performance point of view, as it will perform a memory allocation. Nowadays, many STL implementations use “small string optimization* (SSO) — up to 15 bytes with g++ 6.2 and 22 chars with clang 3.9 — in order to avoid the memory allocation. Bytes are still copied though, and the performance of your system heavily relies on implementation details.
The solution here is to use C++17’s std::string_view. This is not possible with standard containers, though: with std::unordered_map and std::unordered_set, you cannot query on another type than Key — and using a std::unordered_map<std::string_view, X> is a bad choice, for a number of reasons I explain in the appendix.
Easy task with Boost.MultiIndex: simlpy add a method that returns a string view and use it as index!
The magic is done on line 23, which defined the hashed unique index as
const_mem_fun<stock, std::experimental::string_view, &stock::ref_view>
: a const member function of stock, that returns a string view, and can be found at &stock::ref_view.unordered_set<unique_ptr<T>>
Keeping a set of unique_ptr<T> is less common, but sometimes convenient. Such pattern can be used if a class owns a collection of polymorphic T, where this owner class does not know anything about the concrete type.
Note: on_error is private. If it was public, this would be a poor API, as the client code would have a dangling reference the line right after the call to node::on_error. node owns and does not share connection ; otherwise it could simply use a std::unordered_set<std::shared_ptr<T>>.
There are a few ways to solve this with unordered STL containers, but…
std::unordered_set<connection*>
: you lose the advantages of RAIIstd::unordered_map<connection*, std::unique_ptr<connection>>
: ugly and under-performant solutionBoost.MultiIndex allows you here to store unique_ptr while being able to access it from raw pointers:
composite keys
Composite keys is another quite common pattern when storing objects into associative containers. As the name implies, it allows to add an index composed by multiple fields. Let’s take a quick example.
This class represents a session of a user on a particular machine. A user can only log once on a host, which allows us to uniquely identify a session by the association of the username and the hostname.
What are our options to do that with standard containers?
The most straightforward solution is to use a
std::unordered_map<std::pair<std::string, std::string>, session>
, as you don’t have to provide any custom std::hash nor operator== implementations. I see a lot of problems with this solution though: it is bad from a performance point of view, because it uses four std::string instances while we only need two and it allocates two std::string on each lookup. Also, the code is not very elegant and is error prone as you don’t know if username is the first element of the pair or the second.A better way to do it in my opinion is to have a
std::unordered_map<session_id, session>
, with session_id a struct with two std::string. This gets rid of a few drawbacks that we have in the solution 1, but requires a std::hash specialization and operator== definition. Not dramatic, but this raises more questions: how to implement a good hash function that takes two std::string as input? I usually use Boost.Functional/Hash for that, but not every application can support Boost as a dependency. Apparently, a good start would be to use a xor with some bit shifting on the two std::string hashes, but who knows if this is really good?A last solution I could think of is to use
std::unordered_set<session>
, but it is still far from ideal: you have all the problems cited in the solution 2. The only advantage is that you’re using less memory as you don’t have the two additional std::string instances as key… but now you have to const_cast all the time your stored objects, which can be annoying or simply not desirable depending on your API.As you can see, there is no simple and straightforward solution to that (simple!) problem. Here is what is looks like with Boost.MultiIndex:
The composite_key takes in this case two const_mem_fun — which is the argument used to refer to const member functions. For the lookup, you will have to build a Boost.Tuple as a key.
As I wrote in the first paragraph, building std::string to do a lookup take a significant time. If you care enough about performance, here is the solution that is going to squeeze the last bits and nanoseconds:
Same approach as before: the composite key is still composed by the two methods of your class, that we modify of course in order to return a std::string_view and not a std::string anymore. The only annoying bit is that we have to explicitly define the composite_key_hash as Boost.MultiIndex doesn’t pick the std::string_view specialization!
Appendix
It is a bad design because it is fragile and error-prone. The key is const std::string_view, but as it is a view, it can actually change anytime. The only acceptable scenario in my mind would be if X owns the viewed string as const std::string.
m.emplace(x.str, x)
std::experimental::string_view sv{x.str};
m.emplace(sv, x.str);
m.emplace(x.str, std::move(x));
std::experimental::string_view sv{x.str};
m.emplace(sv, std::move(x));
… and the answer is:
via Thoughts from a Wall Street developer
November 1, 2024 at 02:54PM
The text was updated successfully, but these errors were encountered: