Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

In-place containers for fun and profit #11860

Open
guevara opened this issue Nov 1, 2024 · 0 comments
Open

In-place containers for fun and profit #11860

guevara opened this issue Nov 1, 2024 · 0 comments

Comments

@guevara
Copy link
Owner

guevara commented Nov 1, 2024

In-place containers for fun and profit



https://ift.tt/lQT0G6i



David Gross


read

Last year I posted about static_any and I am back with a similar container that I implemented recently: inplace_string.

For the record, static_any should have been be called inplace_any, but at this time I chose the prefix static as Boost.StaticVector was popular and it was clear to everybody what to expect from such a container.

inplace_string

The idea behind inplace_string is to get a full replacement of C++17’s std::string, with an in-place memory storage:

using Name = inplace_string<15>; // 16 bytes on stack, size included
Name name = "foo"; 

auto it = name.find("r");
assert(it == Name::npos);

name += "bar";

std::string str(name); // implicit string_view construction


What you get as benefits:

  • Faster string operations: string construction and destruction are much faster since no memory allocation takes place. Other operations are also slightly faster: as its buffer does not grow, inplace_string benefits from a simpler code with less branches.
  • Cache-friendly: it stays close to your other class members, your cache likes it.
  • Simplicity: it allows you to avoid dynamic allocations without implementing your own memory allocator.

As for the implementation, no surprises: the underlying container is std::array<CharT, N> and it uses the famous trick — made popular by fbstring — of storing the remaining size within the last byte of the string — if you don’t know it yet look here and think how the value will change when the string reaches the maximum capacity.

One of the only difference with std::string is the presence of a constructor from const CharT(&)[M]), allowing a compile-time error it the input exceeds the maximum capacity:

inplace_string<5> too_small;
too_small = "foobar"; // error: static_assert failed "basic_inplace_string: size exceeds maximum capacity"


std::string’s SSO

At first glance, it might seem that inplace_string overlaps with std::string’s SSO (Short String Optimization), although it does not.

One of the reason is that SSO is implementation dependent:

  • 15 bytes with gcc 5
  • 22 bytes with clang 5
  • 15 bytes with VS 2015 and 2017

Another reason is that even though everybody thinks that using gcc 5 is enough to get SSO, it is wrong: as of this writing, most of the popular Linux distributions (CentOS, Debian, RedHat) patched the glibc/gcc to keep the previous std::string implementation without SSO (and with COW!). This has been done to not break std::string’s ABI when C++11 has been release, cf this post on RedHat developers’ blog.

More importantly, inplace_string guarantees that your code will never allocate. If, for any reason, your string grows more than you expected, it will throw… and you will know that you were wrong, as your application will likely crash! On the other hand, crossing fingers and relying on std::string’s SSO is somewhat a bet, and you will never know if one day, strings get bigger and start allocating.

string_view

In the current C++ world, std::string has a monopoly and the following code is common practice:

void foo(const std::string& bar)
{
   ...
}


In a code base with std::string as unique string class, this code is reasonable — it is not when multiple string classes live together, as it defeats the purpose of not using std::string if you need to do conversions all over the place.

foo("foobar"); // compiles, but allocates

inplace_string<15> ss = "foobar";
foo(ss); // does not compile, std::string does not know inplace_string

foo(ss.c_str()); // compiles, but allocates and copies the string


To solve this issue, C++17 brings us std::string_view: std::string, inplace_string and friends implement an implicit cast operator to std::string_view, and mixing all of them work nicely:

void foo(std::string_view bar)
{
   ...
}

foo("foobar"); // compiles, does not allocate

inplace_string<15> ss = "foobar";
foo(ss); // compiles, building string_view is cheap (taking pointer + size)

std::string s = "foobar";
foo(s); // compiles


Mixing in-place containers

Let’s consider the following code:

using MetadataTag = std::string;
using MetadataValue = std::experimental::any;

using Metadata = std::pair<MetadataTag, MetadataValue>;

struct MetadataTree
{
template <typename StringT, typename ValueT>
void Add(StringT&& str, ValueT&& value)
{
_metadata.emplace_back(std::forward<StringT>(str), std::forward<ValueT>(value));
}

<span>std</span><span>::</span><span>vector</span><span>&lt;</span><span>Metadata</span><span>&gt;</span> <span>_metadata</span><span>;</span>

};


This object is very similar to QVariantMap, which is a QMap<QString, QVariant>QVariant is actually not like std::variant, but similar to std::any. I use this pattern a lot, as it is convenient to use a string to index various objects. One issue with this was speed: having to allocate memory due to many std::string is a major obstacle.

Its equivalent with in-place containers could be:

using MetadataTag = inplace_string<15>;
using MetadataValue = static_any<16>;

using Metadata = std::pair<MetadataTag, MetadataValue>;

struct MetadataTree
{
template <typename StringT, typename ValueT>
void Add(StringT&& str, ValueT&& value)
{
_metadata.emplace_back(std::forward<StringT>(str), std::forward<ValueT>(value));
}

<span>static</span> <span>constexpr</span> <span>MaxCapacity</span> <span>=</span> <span>8</span><span>;</span>
<span>boost</span><span>::</span><span>container</span><span>::</span><span>static_vector</span><span>&lt;</span><span>Metadata</span><span>,</span> <span>MaxCapacity</span><span>&gt;</span> <span>_metadata</span><span>;</span>

};

Now, let’s benchmark these two guys — I know, it is unfair, but we all love looking at numbers! For that, let’s take a simple usage of our MetadataTree class:

template <typename TreeT>
TreeT GetTree(int i, double d, bool b)
{
  TreeT tree;
  tree.Add("metadata1", i);
  tree.Add("metadata2", d);
  tree.Add("metadata3", b);
  return tree;
}

… and the result (time, instructions and cycles per iteration):

Test                        Time (ns)          INS          CYC 
---------------------------------------------------------------
MetadataTree                      536        2,205        1,010 
in-place MetadataTree               3            3            2


You can find the benchmark on my github. Here is the full output of perf:

$ perf stat -e cycles,instructions,cache-misses ./inplace_examples notinplace
notinplace3000000

Performance counter stats for './inplace_examples notinplace':

 <span>1</span> <span>005</span> <span>689</span> <span>341</span> <span>cycles</span>                   
 <span>2</span> <span>209</span> <span>217</span> <span>853</span> <span>instructions</span>              <span>#</span>    <span>2</span><span>,</span><span>20</span>  <span>insns</span> <span>per</span> <span>cycle</span>        
        <span>20</span> <span>05</span><span>9</span> <span>cache</span><span>-</span><span>misses</span>                                                

   <span>0</span><span>,</span><span>531293829</span> <span>seconds</span> <span>time</span> <span>elapsed</span>

$ perf stat -e cycles,instructions,cache-misses ./inplace_examples inplace
inplace3000000

Performance counter stats for './inplace_examples inplace':

     <span>2</span> <span>540</span> <span>791</span> <span>cycles</span>                   
     <span>2</span> <span>384</span> <span>036</span> <span>instructions</span>              <span>#</span>    <span>0</span><span>,</span><span>94</span>  <span>insns</span> <span>per</span> <span>cycle</span>        
        <span>10</span> <span>970</span> <span>cache</span><span>-</span><span>misses</span>                                                

   <span>0</span><span>,</span><span>0015</span><span>84907</span> <span>seconds</span> <span>time</span> <span>elapsed</span></code></pre></div>






via Thoughts from a Wall Street developer

November 1, 2024 at 02:55PM
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant