
Further thoughts on performance improvements #418

Closed

TurpentineDistillery opened this issue Jan 6, 2017 · 6 comments

@TurpentineDistillery

My analysis of the performance bottlenecks showed that passthrough (non-instantiating) parsing performance is comparable to that of wc (i.e. near-perfect); the major bottleneck is the usual bane of performant programming: mallocs and frees of many small objects. Can this be improved without resorting to the custom-allocator game? I think so.

Performance becomes an issue when there are many JSON records to parse. In real-world scenarios the records will share the same schema; only the values differ. Under this assumption, if we have an old JSON object that is no longer needed, it can be reused by updating the values within, rather than letting it expire, freeing its contents, and then allocating everything de novo in a new json. This would entail a function like json::parse(istream& istr, json& dest) that takes an existing record and reuses its existing fields where possible, making allocations only when necessary (longer strings, longer arrays, keys in maps that are absent in the old object, etc.).
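A minimal usage sketch of the idea, assuming a hypothetical two-argument json::parse(istream&, json&) overload and an example file name, neither of which is part of the library's public API:

#include <fstream>
#include <string>
#include <nlohmann/json.hpp>   // single-header include; path may differ

using nlohmann::json;

int main()
{
    std::ifstream istr("records.jsonl");  // hypothetical input: one record per line
    std::string line;
    json record;  // allocated once, reused for every record below

    while (std::getline(istr, line))
    {
        // today: every parse allocates the whole object tree de novo
        record = json::parse(line);

        // proposed (hypothetical overload): reuse record's existing
        // strings/arrays/maps, allocating only where values outgrow them
        // json::parse(line, record);

        // ... process record ...
    }
}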

@TurpentineDistillery changed the title from “Further thoughts on performance impronements” to “Further thoughts on performance improvements” on Feb 15, 2017
@TurpentineDistillery
Author

Another area with room for improvement:
Currently operator[] accepts a key_type (a string). This means that calls like j["foo"] necessitate the creation of a std::string under the hood just to access the value. This could be avoided by adding overloads along the lines of

reference at(const char* const key)
{
    // reuse a thread-local key to avoid constructing a temporary string
    static thread_local object_t::key_type k;
    k.assign(key);
    return at(k); // call the original `at` taking a key_type
}
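With such an overload, repeated lookups like j.at("foo") in a hot loop would reuse the thread-local key's capacity instead of constructing and destroying a temporary std::string on every call.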

Having said that, modern standard-library implementations use SBO-optimized std::string, so the needless malloc/free churn from creating a temporary string occurs only in GCC < 5; this might not be worth the effort.
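(As a quick sanity check, one can test empirically whether a given standard library short-string-optimizes a short key; the snippet below is just a heuristic, not part of the library:)

#include <iostream>
#include <string>

int main()
{
    std::string s = "foo";
    // if the character buffer lives inside the string object itself,
    // no heap allocation took place (SBO/SSO in effect)
    const char* buf  = s.data();
    const char* self = reinterpret_cast<const char*>(&s);
    std::cout << std::boolalpha
              << (buf >= self && buf < self + sizeof(s)) << '\n';
}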

@nlohmann
Owner

I recently (bd0326c, b1441f3, 0f04e42) started with some micro-optimizations, using cachegrind to measure the number of executed instructions. This may not be a precise measurement that transfers to every architecture, but it helps to get a feeling for where time is actually spent. Of course, wall-clock benchmarks are also required.

I shall continue focusing on the dump function. In particular, I want to move all related functions into a serializer class so that buffers can be recycled and the locale information can be cached.
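A rough sketch of the shape that class could take (illustrative names only, not the actual implementation):

#include <clocale>
#include <string>

class serializer
{
  public:
    serializer()
    {
        // query the locale once instead of for every dumped number
        const std::lconv* loc = std::localeconv();
        decimal_point = (loc->decimal_point != nullptr)
                        ? *(loc->decimal_point) : '.';
    }

    // dump_* member functions would append into `buffer`, which keeps
    // its capacity across calls instead of being reallocated each time

  private:
    std::string buffer;   // recycled output buffer
    char decimal_point;   // cached locale information
};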

(Sorry for hijacking this issue, but I did not find a better place to dump these thoughts)

@TurpentineDistillery
Author

This is precisely what the ticket was about, so you're not hijacking anything : )

Another place for optimization is unescaping Unicode escape sequences. Currently the code allocates a string for every character (not really a problem with SBO string implementations).
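For instance (a sketch with assumed names, not the library's actual code), the decoded code point can be appended byte-by-byte into the result buffer, so one buffer is reused for the whole unescaped string:

#include <iostream>
#include <string>

// append the UTF-8 encoding of `cp` to `result` in place
void append_codepoint(std::string& result, unsigned int cp)
{
    if (cp < 0x80)
    {
        result.push_back(static_cast<char>(cp));
    }
    else if (cp < 0x800)
    {
        result.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        result.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
    else if (cp < 0x10000)
    {
        result.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        result.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        result.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
    else
    {
        result.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        result.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        result.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        result.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

int main()
{
    std::string out;
    append_codepoint(out, 0x20AC); // U+20AC EURO SIGN -> "\xE2\x82\xAC"
    std::cout << out << '\n';
}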

nlohmann added a commit that referenced this issue Feb 28, 2017
Moved all dump()-related functions into a class "serializer". This fix includes a lot of performance improvements, yielding a 7% speedup for serialization. Details on the individual steps can be found in the commit messages.

Individual benchmark numbers:

before:

dump jeopardy.json                            5   374555228 ns/op
dump jeopardy.json with indent                5   430953700 ns/op
dump numbers/floats.json                      5   622938509 ns/op
dump numbers/signed_ints.json                20    82177979 ns/op

after:

dump jeopardy.json                            5   335449757 ns/op -11%
dump jeopardy.json with indent                5   375467773 ns/op -13%
dump numbers/floats.json                      5   584611852 ns/op -7%
dump numbers/signed_ints.json                20    68289574 ns/op -17%
@TurpentineDistillery
Author

TurpentineDistillery commented Mar 4, 2017

I tried adding an optimizing overload for string literals, but both of the variants below result in ambiguous overload resolution : /

// prevent heap-allocating construction of a temporary string when
// accessing by string literal by reusing thread_local storage

#if 0
    const_reference operator[](const char* const key) const
#else
    template<int N>
    const_reference operator[](const char (&key)[N]) const
#endif
    {
        static thread_local typename object_t::key_type s;
        s.assign(key);
        return (*this)[s];
    }

@theodelrieu
Contributor

You must add some enable_if to avoid the array-to-pointer decay; the parse method does exactly that ;)
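A standalone sketch of that pattern (illustrative names, not the library's actual signatures): the enable_if constrains the generic overload to const char*, while string literals bind to the array overload without decaying.

#include <cstddef>
#include <iostream>
#include <type_traits>

// generic pointer overload, constrained to const char* only
template<typename T,
         typename std::enable_if<
             std::is_same<T, const char*>::value, int>::type = 0>
void lookup(T key)
{
    std::cout << "pointer overload: " << key << '\n';
}

// array overload: string literals bind here without decaying
template<std::size_t N>
void lookup(const char (&key)[N])
{
    std::cout << "literal overload: " << key << '\n';
}

int main()
{
    const char* p = "bar";
    lookup(p);     // picks the pointer overload
    lookup("foo"); // picks the literal overload -- no temporary string
}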

nlohmann added a commit that referenced this issue Jul 22, 2017
Internally, the parser now writes its result into a JSON value provided as a reference. To be usable, the public interfaces need to be extended.
@nlohmann
Owner

I think the ideas here are realized by now.
