
Further thoughts on performance improvements #418

Closed

TurpentineDistillery opened this issue Jan 6, 2017 · 6 comments

@TurpentineDistillery

My analysis of the performance bottlenecks showed that passthrough (non-instantiating) parsing performance is comparable to that of wc (i.e. near-perfect); the major bottleneck is the usual bane of performant programming: mallocs and frees of many small objects. Can this be improved without resorting to the custom-allocator game? I think so.

Performance becomes an issue when there are many JSON records to parse. In real-world scenarios the records will share the same schema; only the values differ. Under this assumption, if we have an old JSON object that is no longer needed, it can be reused by updating the values within, rather than letting it expire, freeing its contents, and then allocating everything de novo in a new json. This would entail a function like json::parse(istream& istr, json& dest) that takes an existing record and reuses its existing fields where possible, making allocations only when necessary (longer strings, longer arrays, keys in maps that are absent in the old object, etc.).
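A minimal usage sketch of the idea, assuming a hypothetical two-argument json::parse(istream&, json&) overload and an example file name, neither of which is part of the library's public API:

#include <fstream>
#include <string>
#include <nlohmann/json.hpp>   // single-header include; path may differ

using nlohmann::json;

int main()
{
    std::ifstream istr("records.jsonl");  // hypothetical input: one record per line
    std::string line;
    json record;  // allocated once, reused for every record below

    while (std::getline(istr, line))
    {
        // today: every parse allocates the whole object tree de novo
        record = json::parse(line);

        // proposed (hypothetical overload): reuse record's existing
        // strings/arrays/maps, allocating only where values outgrow them
        // json::parse(line, record);

        // ... process record ...
    }
}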

@TurpentineDistillery changed the title from “Further thoughts on performance impronements” to “Further thoughts on performance improvements” on Feb 15, 2017
@TurpentineDistillery
Author

Another area with room for improvement:
Currently operator[] accepts a key_type (a string). This means that calls like j["foo"] necessitate the creation of a std::string under the hood just to access the value. This could be avoided by adding overloads along the lines of

reference at(const char* const key)
{
    // reuse a thread-local key to avoid constructing a temporary string
    static thread_local object_t::key_type k;
    k.assign(key);
    return at(k); // call the original `at` taking a key_type
}
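With such an overload, repeated lookups like j.at("foo") in a hot loop would reuse the thread-local key's capacity instead of constructing and destroying a temporary std::string on every call.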

Having said that, modern standard-library implementations use SBO-optimized std::string, so the needless malloc/free churn from creating a temporary string occurs only in GCC < 5; this might not be worth the effort.
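(As a quick sanity check, one can test empirically whether a given standard library short-string-optimizes a short key; the snippet below is just a heuristic, not part of the library:)

#include <iostream>
#include <string>

int main()
{
    std::string s = "foo";
    // if the character buffer lives inside the string object itself,
    // no heap allocation took place (SBO/SSO in effect)
    const char* buf  = s.data();
    const char* self = reinterpret_cast<const char*>(&s);
    std::cout << std::boolalpha
              << (buf >= self && buf < self + sizeof(s)) << '\n';
}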

@nlohmann
Owner

I recently (bd0326c, b1441f3, 0f04e42) started with some micro-optimizations, using cachegrind to measure the number of executed instructions. This may not be a precise measurement that transfers to every architecture, but it helps to get a feeling for where time is actually spent. Of course, wall-clock benchmarks are also required.

I shall continue focusing on the dump function. In particular, I want to move all related functions into a serializer class so that buffers can be recycled and the locale information can be cached.
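A rough sketch of the shape that class could take (illustrative names only, not the actual implementation):

#include <clocale>
#include <string>

class serializer
{
  public:
    serializer()
    {
        // query the locale once instead of for every dumped number
        const std::lconv* loc = std::localeconv();
        decimal_point = (loc->decimal_point != nullptr)
                        ? *(loc->decimal_point) : '.';
    }

    // dump_* member functions would append into `buffer`, which keeps
    // its capacity across calls instead of being reallocated each time

  private:
    std::string buffer;   // recycled output buffer
    char decimal_point;   // cached locale information
};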

(Sorry for hijacking this issue, but I did not find a better place to dump these thoughts)

@TurpentineDistillery
Author

This is precisely what the ticket was about, so you're not hijacking anything : )

Another place for optimization is unescaping Unicode escape sequences. Currently the code allocates a string for every character (not really a problem with SBO string implementations).
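For instance (a sketch with assumed names, not the library's actual code), the decoded code point can be appended byte-by-byte into the result buffer, so one buffer is reused for the whole unescaped string:

#include <iostream>
#include <string>

// append the UTF-8 encoding of `cp` to `result` in place
void append_codepoint(std::string& result, unsigned int cp)
{
    if (cp < 0x80)
    {
        result.push_back(static_cast<char>(cp));
    }
    else if (cp < 0x800)
    {
        result.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        result.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
    else if (cp < 0x10000)
    {
        result.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        result.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        result.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
    else
    {
        result.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        result.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        result.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        result.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

int main()
{
    std::string out;
    append_codepoint(out, 0x20AC); // U+20AC EURO SIGN -> "\xE2\x82\xAC"
    std::cout << out << '\n';
}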

nlohmann added a commit that referenced this issue Feb 28, 2017
Moved all dump()-related functions into a class "serializer". This fix includes a lot of performance improvements, yielding a 7% speedup for serialization. Details on the individual steps can be found in the commit messages.

Individual benchmark numbers:

before:

dump jeopardy.json                            5   374555228 ns/op
dump jeopardy.json with indent                5   430953700 ns/op
dump numbers/floats.json                      5   622938509 ns/op
dump numbers/signed_ints.json                20    82177979 ns/op

after:

dump jeopardy.json                            5   335449757 ns/op -11%
dump jeopardy.json with indent                5   375467773 ns/op -13%
dump numbers/floats.json                      5   584611852 ns/op -7%
dump numbers/signed_ints.json                20    68289574 ns/op -17%
@TurpentineDistillery
Author

TurpentineDistillery commented Mar 4, 2017

I tried adding an optimizing overload for string literals, but both of the variants below result in ambiguous overload resolution : /

// prevent heap-allocating construction of a temporary string when
// accessing by string literal by reusing thread_local storage

#if 0
    const_reference operator[](const char* const key) const
#else
    template<int N>
    const_reference operator[](const char (&key)[N]) const
#endif
    {
        static thread_local typename object_t::key_type s;
        s.assign(key);
        return (*this)[s];
    }

@theodelrieu
Contributor

You must add some enable_if to avoid the array-to-pointer decay; the parse method does exactly that ;)
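A standalone sketch of that pattern (illustrative names, not the library's actual signatures): the enable_if constrains the generic overload to const char*, while string literals bind to the array overload without decaying.

#include <cstddef>
#include <iostream>
#include <type_traits>

// generic pointer overload, constrained to const char* only
template<typename T,
         typename std::enable_if<
             std::is_same<T, const char*>::value, int>::type = 0>
void lookup(T key)
{
    std::cout << "pointer overload: " << key << '\n';
}

// array overload: string literals bind here without decaying
template<std::size_t N>
void lookup(const char (&key)[N])
{
    std::cout << "literal overload: " << key << '\n';
}

int main()
{
    const char* p = "bar";
    lookup(p);     // picks the pointer overload
    lookup("foo"); // picks the literal overload -- no temporary string
}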

nlohmann added a commit that referenced this issue Jul 22, 2017
Internally, the parser now writes its result into a JSON value provided as a reference. To be usable, the public interfaces need to be extended.
@nlohmann
Owner

I think the ideas here are realized by now.
