ToolBehavior
Performance numbers are useful, but they can be misleading. For example:
- some tools perform more safety checks than others — these checks might be worth the performance hit
- some tools will let you serialize your own hand-written Java classes; others require that you use Java classes generated from a schema file
This page describes how each tool is used in the benchmark. It is not an exhaustive list of each tool’s features. For example, earlier versions of our “json/jackson” test used abbreviated field names; this doesn’t mean that the Jackson JSON library requires abbreviated field names.
Tool | Language-Neutral | Data Structure | Serialization | Formats | Cycle/Shared Object detection |
---|---|---|---|---|---|
java-built-in | | | | binary | yes |
java-manual | | | manual | binary | |
scala/java-built-in | | | | binary | yes |
scala/sbinary | | | manual | binary | |
kryo | | optional annotate | | binary | yes |
kryo-manual | | | manual | binary | |
fast-serialization (“fst”) | | optional annotate | | binary | yes |
wobly/wobly-compact | | annotate | | binary | |
protobuf | yes | schema+gen | | binary, json | |
thrift | yes | schema+gen | | binary, json-like | |
protobuf/activemq-alt | yes | schema+gen | | binary, json | |
protostuff | yes | schema+gen | | protostuff, protobuf, json, smile | optional (not used in test) |
protostuff-manual | yes | | manual | protostuff, protobuf, json, smile | |
protostuff-runtime | yes | | | protostuff, protobuf, json, smile | |
avro | yes | schema+gen | | binary, json | |
jackson | yes | | manual++ | json, smile | ? |
jackson-databind | yes | optional annotate | | json, smile | ? |
xml/javolution | yes | | manual | xml+abbrev | ? |
…xstream | yes | | | xml | ? |
…xstream+c | yes | | manual | xml+abbrev | ? |
xml-fi/… | yes | | | binary xml | ? |
Language-neutral: Whether the tool is a viable option if you want to be language-neutral. For example, though it is possible for other languages to consume Java’s built-in serialization format, it is definitely not convenient — that’s why Java’s built-in serialization format is not listed as being language-neutral.
The “yes” here just indicates some level of cross-language support. Obviously the different tools have different levels of support for other languages. You should probably do your own research to see if the tool has robust support for the language you need.
Data structure: Whether the tool places any restrictions on the classes it can serialize. If this entry is empty, it means the tool will generally serialize any Java class.
- schema+gen: you write a schema file (in the tool’s schema language) and the tool generates Java classes that you must use
- annotate: the tool will let you use your own Java classes, but you need to annotate the Java classes to help the tool out
- optional annotate: the tool can serialize any Java class, but you can improve or configure the serialization through annotations
The tools that generate code from a schema file vary a lot in the type of code they generate. This affects the “create” time. The generated classes fall roughly into three styles:
- very simple classes that allow you to directly manipulate fields
- classes with private fields and get/set methods
- classes whose instances are immutable — helps reduce certain kinds of bugs
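To make the three styles concrete, here is a minimal sketch of what each might look like for the same record. The class names (`ImageFields`, `ImageBean`, `ImageValue`) and their fields are hypothetical and are not the actual output of any tool listed on this page.

```java
// Three hypothetical shapes a schema compiler might generate for one "image" record.
// None of these is the real output of any tool in the benchmark.

// Style 1: very simple class, fields are manipulated directly.
class ImageFields {
    public String uri;
    public int width;
}

// Style 2: private fields behind get/set methods.
class ImageBean {
    private String uri;
    private int width;

    public String getUri() { return uri; }
    public void setUri(String uri) { this.uri = uri; }
    public int getWidth() { return width; }
    public void setWidth(int width) { this.width = width; }
}

// Style 3: immutable instances, created through a builder that validates on build().
final class ImageValue {
    private final String uri;
    private final int width;

    private ImageValue(String uri, int width) {
        this.uri = uri;
        this.width = width;
    }

    public String getUri() { return uri; }
    public int getWidth() { return width; }

    static final class Builder {
        private String uri;
        private int width;

        Builder setUri(String uri) { this.uri = uri; return this; }
        Builder setWidth(int width) { this.width = width; return this; }

        ImageValue build() {
            // The extra check (and the extra builder object) is part of the "create" cost.
            if (uri == null) {
                throw new IllegalStateException("uri is required");
            }
            return new ImageValue(uri, width);
        }
    }
}

class GeneratedStylesDemo {
    public static void main(String[] args) {
        ImageFields fields = new ImageFields();
        fields.uri = "http://example.org/a.png";
        fields.width = 640;

        ImageBean bean = new ImageBean();
        bean.setUri(fields.uri);
        bean.setWidth(fields.width);

        ImageValue value = new ImageValue.Builder().setUri(fields.uri).setWidth(fields.width).build();
        System.out.println(value.getUri() + " " + bean.getWidth());
    }
}
```

The third style does the most work per object, which is one plausible reason a tool that generates it would show a higher “create” time.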
The tools that generate code from a schema don’t have equivalent schema languages. Some schema languages are more expressive than others. Some let you perform low-level optimizations. Make sure you take a look at the different schema languages to see what each has to offer.
Serialization: How much additional work the programmer has to do to serialize data.
- manual: The tool will take care of the low-level details of the format (like syntax) but you essentially have to write all the code to serialize your specific data structure.
- manual++: The same thing as ‘manual’, but even more tedious.
Formats: The primary format is listed first. The binary formats usually have a human-readable alternative format as well, used for specifying data values by hand or for debugging.
- binary: some custom binary format.
- json: the JSON format.
- json-like: some custom text format that is similar to JSON/YAML/plist, etc.
java-built-in:
- ObjectOutput.writeObject(java.io.Serializable)
- Part of the reason it’s slow is that it’s the only test here that preserves arbitrary object graphs (all the other serializers flatten graphs to trees). To do this, the serializer keeps track of every object’s identity, which is an expensive operation.
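The graph-preserving behaviour can be seen with a small sketch that uses only the JDK. The `Person` class and the `roundTrip` helper are made up for this illustration and are not part of the benchmark.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class SharedGraphDemo {
    // Hypothetical data class, used only for this sketch.
    static final class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;

        Person(String name) {
            this.name = name;
        }
    }

    // Serializes the list with Java's built-in serialization and reads it back.
    @SuppressWarnings("unchecked")
    static List<Person> roundTrip(List<Person> people) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(people);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (List<Person>) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Person alice = new Person("alice");
        List<Person> people = new ArrayList<>();
        people.add(alice);
        people.add(alice); // the same object, referenced twice

        List<Person> copy = roundTrip(people);
        // The two entries still point at one shared object after deserialization.
        System.out.println(copy.get(0) == copy.get(1)); // prints "true"
    }
}
```

A serializer that flattens graphs to trees would instead produce two separate `Person` copies here.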
java-manual:
- Hand-written serialization code: JavaManual.java, lines 52-143
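For readers who have not written this kind of code, the following is a minimal sketch of hand-written serialization, assuming plain `java.io.DataOutputStream` and `DataInputStream`. It is not the code from JavaManual.java; the `Image` class and its fields are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class ManualImageCodec {
    // Hypothetical record used only for this sketch; not the benchmark's data class.
    static final class Image {
        final String uri;
        final int width;
        final int height;

        Image(String uri, int width, int height) {
            this.uri = uri;
            this.width = width;
            this.height = height;
        }
    }

    // Every field is written by hand, in a fixed order that the reader must mirror.
    static byte[] write(Image image) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(image.uri);
            out.writeInt(image.width);
            out.writeInt(image.height);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the fields back in exactly the order they were written.
    static Image read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String uri = in.readUTF();
            int width = in.readInt();
            int height = in.readInt();
            return new Image(uri, width, height);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Image copy = read(write(new Image("http://example.org/a.png", 640, 480)));
        System.out.println(copy.uri + " " + copy.width + "x" + copy.height);
    }
}
```

Nothing in the byte stream describes the fields, so the output is compact, but any change to the data structure means editing both methods by hand.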
fast-serialization:
- The serializer test case applies optimizations similar to the others (class registration, unshared mode).
- fst-serialization is compatible with JDK serialization in that it supports the readObject/writeObject/readReplace/putField methods. This costs a few additional CPU cycles but makes it easier to use as a drop-in replacement for JDK serialization.
kryo:
- We register each class with Kryo, an optional step that improves performance. Kryo.java, lines 95-100
- The “kryo-opt” test further improves performance by giving the serializer more information about the data values being serialized. Kryo.java, lines 105-130
- The regular variant uses the default Kryo serialization code (runtime databind).
- The manual variant contains custom handwritten serialization code.
protobuf (Google Protocol Buffers):
- The generated data classes are relatively heavy-weight. Data values are immutable and are built by “Builder” objects that check the data for conformance to the schema. This makes things safer and also accounts for why protobuf’s “create” time is so high.
- We set optimize_for=SPEED
- Schema language
protobuf/activemq-alt:
- Java-only protobuf-format-compatible implementation from the ActiveMQ project.
- We use the “alternative” bindings generator, which parses the input on-demand (i.e. when the fields are actually accessed).
protostuff:
- Serialization API with built-in forward-backward compatibility for POJOs.
- Formats featured in the benchmark: protostuff (native), protobuf, json, smile.
- The regular variant serializes POJOs generated from a schema by protostuff-compiler.
- The runtime variant serializes the existing POJOs (databind) found in tpc/src/data.
- The manual variant contains a handwritten schema which looks like the generated schema, but with no autoboxing of singular primitive types.
Note that the serialized size differs between these variants because tpc/src/data/media/Media.java has an extra boolean field “hasBitrate”, which accounts for the 2-byte difference (1 byte for the varint and 1 byte for the boolean value).
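The 2-byte figure can be checked with a small sketch, assuming protobuf-style field encoding: a varint key of `(fieldNumber << 3) | wireType` followed by the value. The field number 12 is a made-up example and not the real number of “hasBitrate”.

```java
import java.io.ByteArrayOutputStream;

class BoolFieldSize {
    // Writes an unsigned varint: 7 bits per byte, high bit set on all but the last byte.
    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Encodes one boolean field the way the protobuf wire format does (wire type 0 = varint).
    static byte[] encodeBoolField(int fieldNumber, boolean value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, (fieldNumber << 3) | 0); // key: field number plus wire type
        out.write(value ? 1 : 0);                 // value: a single byte, 0 or 1
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Hypothetical field number; any number from 1 to 15 gives a 1-byte key.
        System.out.println(encodeBoolField(12, true).length); // prints "2"
    }
}
```

Field numbers of 16 and above need a second key byte, so the difference is 2 bytes only when the field number is 15 or lower.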
thrift (Apache Thrift):
- Very similar to protobuf.
- “thrift” uses the standard TBinaryProtocol serializer.
- “thrift-compact” uses the newer TCompactProtocol serializer.
- Schema language
avro (Apache Avro):
- Schema language
- Avro’s data structures use UTF-8 encoded strings (instead of Java’s native UTF-16 encoded strings). Its “create” times and post-deserialize times are so high because the benchmark converts to/from native Java strings. In actual use, careful programming may allow you to avoid these conversions.
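The conversion in question is roughly the one sketched below, using only the JDK. The helper names are illustrative, and the sketch does not use Avro's own string class; it shows only the encode and decode steps, each of which allocates a new array or string.

```java
import java.nio.charset.StandardCharsets;

class Utf8ConversionDemo {
    // Java String (UTF-16 internally) to UTF-8 bytes: allocates and encodes.
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // UTF-8 bytes back to a Java String: allocates and decodes.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String title = "h\u00e9llo";           // 5 chars, one of them non-ASCII
        byte[] encoded = toUtf8(title);
        System.out.println(encoded.length);    // prints "6": the accented char takes 2 bytes
        System.out.println(fromUtf8(encoded).equals(title)); // prints "true"
    }
}
```

Code that can work with the UTF-8 form directly skips both steps, which is the kind of careful programming mentioned above.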
json/jackson:
- Lots of hand-written serialization code: JsonJackson.java, lines 62-292, with additional support code in FieldMapping.java
- Uses full names (earlier versions used abbreviated names)
- “json/jackson-databind” uses automatic serialization, with default settings (no custom annotations or configuration)
xml/javolution:
- Hand-written serialization code: JavolutionXml.java, lines 59-181
- Uses abbreviated names.
xstream:
- The non-“+c” variants use XStream’s built-in object-to-XML conversion (no extra code, full names).
- The “+c” variants use a hand-written conversion function with abbreviated names. XStream.java, lines 110-257
- “xml/xstream” uses XStream’s own XML reader/writer.
- The “xml/…-xstream” variants use various StAX implementations to do the XML reading/writing.
xml-manual/woodstox, xml-manual/aalto:
- Uses hand-written serialization code on top of a StAX implementation (Woodstox or Aalto)
- Full names
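As an illustration of what hand-written code on top of StAX looks like, here is a minimal sketch that uses the StAX implementation bundled with the JDK rather than Woodstox or Aalto. The element names and the `writeImage` helper are hypothetical and are not taken from the benchmark.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class StaxManualDemo {
    // Writes one element containing only text.
    static void writeTextElement(XMLStreamWriter xml, String name, String text)
            throws XMLStreamException {
        xml.writeStartElement(name);
        xml.writeCharacters(text);
        xml.writeEndElement();
    }

    // Hand-written serialization of a hypothetical image record, using full element names.
    static String writeImage(String uri, int width, int height) {
        StringWriter buffer = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(buffer);
            xml.writeStartElement("image");
            writeTextElement(xml, "uri", uri);
            writeTextElement(xml, "width", Integer.toString(width));
            writeTextElement(xml, "height", Integer.toString(height));
            xml.writeEndElement();
            xml.close(); // flushes any buffered output to the StringWriter
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(writeImage("http://example.org/a.png", 640, 480));
    }
}
```

Because StAX is a standard API, the same code can run on a different implementation by changing which `XMLOutputFactory` is picked up at runtime.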
wobly:
- The compact version produces smaller output at a small cost in speed; it is enabled through optional annotation properties.