Request: binary serialization/deserialization #358

CyberBrainX · 2016-11-09T16:32:17Z

Wouldn’t it be useful to have the possibility to serialize the json object to a more compact form?
This would make sense for storing the container or sending it over an internet connection.

The MessagePack (http://msgpack.org/) format or some proprietary format directly derived from your internal representation would be fine.
Or could you probably just provide some generic interface for the binary serialization/deserialization?

I think this approach could be more useable than just gzipping the whole JSON data.

As an example this library did implement some alternative representations of a json container: http://greatpanic.com/progdocs/UniversalContainer.html
(But this library doesn’t work properly and doesn’t use C++ 11)

nlohmann · 2016-11-09T16:58:11Z

I also had a look at MessagePack, and it seems to be a reasonable approach. As it can express more than JSON, I hope that users would understand that the library would not be able to parse arbitrary MessagePack input.

nlohmann · 2016-11-11T16:26:29Z

I started to implement MessagePack. Commit 543745a contains a basic serializer/deserializer for JSON values as static functions. It works with a std::vector<uint8_t> and currently has no diagnosis information in case of a parse error.

Feedback would be greatly appreciated:

naming of the functions
static function vs. member function vs. free function
move implementation outside the main header?
organization of the functions (I am thinking of reorganizing the large if in the parser, maybe using re2c)
improvements of the low-level shifting/conversion stuff
byte vector vs. containers vs. ???

CyberBrainX · 2016-11-14T16:49:49Z

Wow, I couldn’t believe how fast you did implement a first version. And it seems to be working!
I think it’s a great benefit being able to exchange data with the MessagePack world. Except binary data, but binary data and JSON is a bad match anyway (base 64 encoding).

I did test the space savings in some of my use cases and got the following results.
In simple cases (small data set) with almost no numeric data the saving was 12% over the compacted JSON file. Zipping alone showed a 15% reduction. First converting to msgpack and zipping the result gave a 19% reduction. As expected, with more numeric data the compression rate did grow constantly. Nice!

The naming to_msgpack and from_msgpack makes sense and is future save if you like to add more serializers/deserializers.
I prefer the static function. It’s the right context and can be still used in a flexible way.
At the moment it doesn’t hurt to have just one header. It’s just more convenient.
I’am not sure if re2c would be necessary in the parsing part. A hand crafted parser could be faster (the same is true for the json part - rapid json). Using a switch statement (for the fixed values) or a jump table could possibly speed up parsing of the MessagePack data(needs some tests to be verified).
The low-level shifting/conversion stuff could be improved but the code will get more complicated (dealing with endianness).
A byte vector is fine. It has some nice properties. A lot of frameworks have chosen byte vectors for handling serialized data.

Thank you for your great JSON implementation!

user1095108 · 2016-11-17T09:51:11Z

I beg to differ. MsgPack is not standardized, whereas CBOR, for example, is. There are a myriad of binary data formats out there, but, for better or worse, CBOR is standardized in RFC 7049, whereas MsgPack is not.

marton78 · 2016-11-21T09:08:42Z

+1 for CBOR!

nlohmann · 2016-11-23T16:48:20Z

I shall have a look at CBOR.

CyberBrainX · 2016-11-25T09:24:21Z

Ok, standards are great, I like them. CBOR is a PROPOSED standard and could still get changed. The proposal has some nice properties and it seems to be an improved MessagePack protocol. Unfortunately it’s not widely adopted and there are a lot of poor implementations. There’s no modern C++ 11 implementation and the old style C++ implementation is looking more like C code.

user1095108 · 2016-11-25T16:25:50Z

Oh, you wanted to call a library, I thought you wanted to write everything yourselves. But anyway, even it is "only" a proposed standard, this is still better than MSGPACK. Should it change, we will know about it, but we might not find out if MSGPACK changes.

nlohmann · 2016-11-26T23:12:52Z

I started to also implement CBOR. As the implementation tends to be rather verbose, I am thinking of moving the binary serialization/deserialization into a new header. In any case, I need to add more test cases.

marton78 · 2016-11-27T15:27:30Z

Hi @nlohmann, I've had a look at your code. The implementation could be less verbose by

Exploiting the orthogonality between length and content type. You could write a decode_length helper function that is called for all content types that can have a length. The function would take the length from the first byte if first_byte & 0x1F < 0x18 and call decode_uint8 / 16 / 32 / 64 otherwise. Then it would return a pointer to the first content byte. Similar for encoding.
Writing a jump table with 256 entries. This way you would also be sure that you don't forget anything.

nlohmann · 2016-11-27T15:49:15Z

@marton78 Good points. There may be even more possibilities to generalize between the MessagePack and CBOR sections given the two formats are so similar.

nlohmann · 2016-11-27T18:14:32Z

@marton78 I started to refactor the code and indeed it got much more concise, see a820d68. However, I think even after more refactoring, the whole CBOR/MessagePack business needs some 1000 lines of code...

nlohmann · 2016-12-01T21:32:25Z

I really want to move the code for MessagePack and CBOR out of the main header (it's over 1000 lines of code...) and I need some ideas how to best do this.

My thoughts:

C++ does not allow to extend the basic_json class in an additional file, so I can't just move the static functions (to_msgpack, from_msgpack, etc.) to a new header.
I could move these functions in the nlohmann namespace. So far, I think I do not rely on private functions of the basic_json class, so this could be possible.

The syntax would then be:

// assuming vec is a std::vector<uint8_t>
json j = nlohmann::from_msgpack(vec);
auto vec2 = nlohmann::to_msgpack(j);

I would be happy about opinions, pointers to best practices, etc.

Furthermore, any feedback on the interfaces, in particular using std::vector<uint8_t> as input/ouput would be more than welcome.

gregmarr · 2016-12-01T21:42:06Z

You could keep it in the class but in another file through a #include in the middle of the class definition, but it's really ugly.

If you did need access to private functions/data, you could forward declare a msgpack class in the nlohmann namespace and make it a friend of basic_json.

Probably the easiest is to just put the declarations of the long statics in the class definition, and then put the full definitions in another header.

user1095108 · 2016-12-01T21:46:19Z

Just write a separate library. You are outgrowing the bounds of this one. Slap in some RPC while writing the new one :)

nlohmann · 2016-12-01T21:47:36Z

If I would include the other header (say, json_binary.hpp), I would end json.hpp's independence, right? I would like to have the binary stuff separate from the rest, so you would only need to include the other header if you need MessagePack/CBOR support.

gregmarr · 2016-12-01T21:50:49Z

Yes, the second header should only be needed for callers of those functions.

user1095108 · 2016-12-01T21:52:14Z

I think the best thing to do would be to make a new repository entirely.

TurpentineDistillery · 2016-12-01T21:55:05Z

Or keep the functionality in the same file, ifdefed-out by default. Whoever needs it can enable it with #define NLOHMANN_JSON_ENABLE_CBOR In the source, or via a compiler switch.

nlohmann · 2016-12-01T21:58:17Z

Any idea how to move the code into a second file without also needing a preprocessor macro?

TurpentineDistillery · 2016-12-01T22:03:40Z

Put declarations in the main file, and definitions in another. Such files are typically named *.inl

TurpentineDistillery · 2016-12-01T22:07:51Z

I.e what @gregmarr said

nlohmann · 2016-12-01T22:19:18Z

Or another idea (just thinking): What about splitting the code into several files for development and have them merge by a script into a single (large) header?

nlohmann · 2016-12-01T22:25:28Z

(Maybe my judgement of a large header is wrong, but I fear that letting it grow (a) increases compilation times, (b) makes understanding the code harder, and (c) pushes a lot of functionality into code that may only need a fraction.)

TurpentineDistillery · 2016-12-02T00:30:57Z

A big benefit of having one file with declarations and one-or-many files with definitions is that it would allow for explicit template instantiation, so that the compilation overhead related to json.hpp can be factored out into a separate translation unit.

nlohmann · 2016-12-10T23:41:17Z

I am currently cleaning up and bringing the test coverage to 100%. I hope to release CBOR/MessagePack by next week.

nlohmann · 2016-12-11T15:14:06Z

Merged e906933.

marton78 · 2016-12-11T15:42:38Z

Hi @nlohmann, great job! I have looked at your code, some remarks:

When converting single / double, you have reimplemented memcpy. Why don't you just use it? (or std::copy, would be actually even nicer).
you could #define all those magic numbers (or put them in an enum)
msgpack / cbor are so similar that a lot of code is duplicated. probably you could template those conversions based on serialization format (so, two structs: msgpack and cbor, both of which contain all magic numbers, limits for the integers to fit in the first byte and the conversion functions from small integer to one byte)

All in all very nice job, thanks!

nlohmann · 2016-12-11T15:48:33Z

@marton78 Thanks for the feedback.

Regarding memcpy: I need to have another look.
Regarding the magic numbers / reduce code duplication: There are in fact a lot of similarities between CBOR and MessagePack, but I doubt that creating a configurable super type from which both formats inherit makes the code much more readable. I may play around with it, but right now I'd rather focus on writing down the limitations of the implementations (i.e., which types are not supported).

nlohmann · 2016-12-14T21:34:23Z

@marton78 Sorry, but I can't find the spot where I can use memcpy/std::copy.

TurpentineDistillery · 2016-12-15T07:32:41Z

Maybe it's this part?

            case value_t::number_float:
            {
                // Double-Precision Float
                v.push_back(0xfb);
                const uint8_t* helper = reinterpret_cast<const uint8_t*>(&(j.m_value.number_float));
                for (size_t i = 0; i < 8; ++i)
                {
                    v.push_back(helper[7 - i]);
                }
                break;
            }

Also, if this part of code expects the number_float_t to be double, a static_assert would be appropriate?

marton78 · 2016-12-15T08:12:02Z

Yes, it's this part, std::copy_backward could be used here instead of memcpy though, since copying is done in reverse.

TurpentineDistillery · 2016-12-15T08:52:41Z

This looks like the endianness handling business?
If so, it can't be assumed that the host platform is little-endian. The ntoh and hton family functions are there to take the host endianness into account and reverse the bytes efficiently as appropriate, but these are not portable. It looks like boost endian library also does this and compiles to efficient byte-swapping instructions too (not saying we should add boost dependency).

user1095108 · 2016-12-15T09:01:10Z

Lol definetly not, look at my swap file to get some ideas:

https://github.com/user1095108/generic/blob/master/swap.hpp

nlohmann · 2016-12-16T21:00:06Z

FYI: I asked to advertize the latest release on the CBOR (cbor/cbor.github.io#25) and MessagePack (msgpack/msgpack#221) website.

nlohmann added the kind: enhancement/improvement label Nov 9, 2016

nlohmann added a commit that referenced this issue Nov 11, 2016

starting to integrate MessagePack (#358)

543745a

nlohmann self-assigned this Nov 11, 2016

nlohmann added state: please discuss please discuss the issue or vote for your favorite option kind: improvement and removed kind: improvement labels Nov 11, 2016

nlohmann added this to the Release 2.0.8 milestone Nov 15, 2016

nlohmann removed this from the Release 2.0.8 milestone Nov 23, 2016

nlohmann added a commit that referenced this issue Nov 26, 2016

🚧 CBOR support (#358)

a73f43a

nlohmann removed the state: please discuss please discuss the issue or vote for your favorite option label Nov 26, 2016

nlohmann added the state: help needed the issue needs help to proceed label Dec 1, 2016

nlohmann added this to the Release 2.0.9 milestone Dec 2, 2016

This was referenced Dec 6, 2016

Add test cases from RFC 7049 #384

Closed

Implement indefinite-length types from RFC 7049 #387

Closed

nlohmann added a commit that referenced this issue Dec 11, 2016

📝 clean up and added documentation for #358

dc6fc3e

nlohmann closed this as completed Dec 11, 2016

nlohmann added the aspect: binary formats BSON, CBOR, MessagePack, UBJSON label Mar 28, 2017

Request: binary serialization/deserialization #358

Request: binary serialization/deserialization #358

Comments

CyberBrainX commented Nov 9, 2016

nlohmann commented Nov 9, 2016

nlohmann commented Nov 11, 2016 • edited Loading

CyberBrainX commented Nov 14, 2016

user1095108 commented Nov 17, 2016

marton78 commented Nov 21, 2016

nlohmann commented Nov 23, 2016

CyberBrainX commented Nov 25, 2016

user1095108 commented Nov 25, 2016

nlohmann commented Nov 26, 2016

marton78 commented Nov 27, 2016 • edited Loading

nlohmann commented Nov 27, 2016

nlohmann commented Nov 27, 2016

nlohmann commented Dec 1, 2016

gregmarr commented Dec 1, 2016 • edited Loading

user1095108 commented Dec 1, 2016

nlohmann commented Dec 1, 2016

gregmarr commented Dec 1, 2016

user1095108 commented Dec 1, 2016

TurpentineDistillery commented Dec 1, 2016

nlohmann commented Dec 1, 2016

TurpentineDistillery commented Dec 1, 2016

TurpentineDistillery commented Dec 1, 2016

nlohmann commented Dec 1, 2016

nlohmann commented Dec 1, 2016

TurpentineDistillery commented Dec 2, 2016

nlohmann commented Dec 10, 2016

nlohmann commented Dec 11, 2016

marton78 commented Dec 11, 2016

nlohmann commented Dec 11, 2016

nlohmann commented Dec 14, 2016

TurpentineDistillery commented Dec 15, 2016

marton78 commented Dec 15, 2016 • edited Loading

TurpentineDistillery commented Dec 15, 2016

user1095108 commented Dec 15, 2016

nlohmann commented Dec 16, 2016

nlohmann commented Nov 11, 2016 •

edited

Loading

marton78 commented Nov 27, 2016 •

edited

Loading

gregmarr commented Dec 1, 2016 •

edited

Loading

marton78 commented Dec 15, 2016 •

edited

Loading