ToolBehavior

Performance numbers are useful, but they can be misleading. For example:

Some tools perform more safety checks than others, and those checks may be worth the performance hit.
Some tools can serialize your own hand-written Java classes; others require you to use Java classes generated from a schema file.

This page describes how each tool is used in the benchmark; it is not an exhaustive list of each tool’s features. For example, an earlier version of our “json/jackson” test used abbreviated field names, but that does not mean the Jackson JSON library requires abbreviated field names.

Tool	Language-Neutral	Data Structure	Serialization	Formats	Cycle/Shared Object detection
java-built-in				binary	yes
java-manual			manual	binary
scala/java-built-in				binary	yes
scala/sbinary			manual	binary
kryo		optional annotate		binary	yes
kryo-manual			manual	binary
fast-serialization (“fst”)		optional annotate		binary	yes
wobly/wobly-compact		annotate		binary
protobuf	yes	schema+gen		binary, json
thrift	yes	schema+gen		binary, json-like
protobuf/activemq-alt	yes	schema+gen		binary, json
protostuff	yes	schema+gen		protostuff, protobuf, json, smile	optional (not used in test)
protostuff-manual	yes		manual	protostuff, protobuf, json, smile
protostuff-runtime	yes			protostuff, protobuf, json, smile
avro	yes	schema+gen		binary, json
jackson	yes		manual++	json, smile	?
jackson-databind	yes	optional annotate		json, smile	?
xml/javolution	yes		manual	xml+abbrev	?
…xstream	yes			xml	?
…xstream+c	yes		manual	xml+abbrev	?
xml-fi/…	yes			binary xml	?

Language-neutral: Whether the tool is a viable option if you need language neutrality. For example, although other languages can consume Java’s built-in serialization format, doing so is far from convenient, which is why Java’s built-in serialization is not listed as language-neutral.

A “yes” here only indicates some level of cross-language support; the tools differ considerably in how well they support other languages. Do your own research to check whether a tool has robust support for the language you need.

Data structure: Whether the tool places any restrictions on the classes it can serialize. If this entry is empty, it means the tool will generally serialize any Java class.

schema+gen: you write a schema file (in the tool’s schema language) and the tool generates Java classes that you must use
annotate: the tool will let you use your own Java classes, but you need to annotate the Java classes to help the tool out
optional annotate: the tool can serialize any Java class, but you can improve or configure the serialization through annotations

The tools that generate code from a schema file vary a lot in the kind of classes they generate, which affects the “create” time. Generated classes range from:

very simple classes that allow you to directly manipulate fields
classes with private fields and get/set methods
classes whose instances are immutable, which helps prevent certain kinds of bugs

The tools that generate code from a schema don’t have equivalent schema languages. Some schema languages are more expressive than others. Some let you perform low-level optimizations. Make sure you take a look at the different schema languages to see what each has to offer.

Serialization: How much additional work the programmer has to do to serialize data.

manual: The tool will take care of the low-level details of the format (like syntax) but you essentially have to write all the code to serialize your specific data structure.
manual++: The same thing as ‘manual’, but even more tedious.

Formats: The primary format is listed first. The binary formats usually have a human-readable alternative format as well, used for specifying data values by hand or for debugging.

binary: some custom binary format.
json: the JSON format.
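json-like: some custom text format that is similar to JSON/YAML/plist, etc.
smile: Smile, a binary JSON-compatible format from the Jackson project.
xml+abbrev: XML with abbreviated element names.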

Tool Specifics

java-built-in:

Uses ObjectOutput.writeObject() on objects that implement java.io.Serializable.
Part of the reason it’s slow is that it’s the only test here that preserves arbitrary object graphs (all the other serializers flatten graphs to trees). To do this, the serializer keeps track of every object’s identity, which is an expensive operation.
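A minimal sketch of what preserving the object graph means, using only the JDK: the same instance added twice to a list, plus a two-node cycle, survive a round trip through ObjectOutputStream/ObjectInputStream as shared references rather than duplicated copies. The `Node` class and the `roundTrip` helper are hypothetical, written only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class SharedRefDemo {
    // Hypothetical class for this illustration only.
    static class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        Node next; // may point back to an earlier node, forming a cycle

        Node(String name) { this.name = name; }
    }

    // Writes a value with the JDK's built-in serialization and reads it back.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Node a = new Node("a");
        Node b = new Node("b");
        a.next = b;
        b.next = a; // cycle: a -> b -> a

        List<Node> list = new ArrayList<>();
        list.add(a);
        list.add(a); // the same instance appears twice

        List<Node> copy = roundTrip(list);
        // The stream records object identity, so sharing and cycles survive.
        System.out.println(copy.get(0) == copy.get(1));           // true
        System.out.println(copy.get(0).next.next == copy.get(0)); // true
    }
}
```

A serializer that flattens graphs to trees would typically write `a` twice and recurse indefinitely on the cycle; avoiding that is exactly what the identity bookkeeping pays for.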

java-manual:

Hand-written serialization code: JavaManual.java, lines 52-143
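The benchmark's own code lives in JavaManual.java; what follows is only a minimal sketch of the same style of hand-written serialization, assuming a hypothetical `Image` class with a URI, an optional title and two dimensions. Every field is written and read explicitly, in a fixed order, with `DataOutputStream`/`DataInputStream`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class ManualSerializationDemo {
    // Hypothetical data class; the field names are assumptions for this sketch.
    static final class Image {
        final String uri;
        final String title; // optional, may be null
        final int width;
        final int height;

        Image(String uri, String title, int width, int height) {
            this.uri = uri;
            this.title = title;
            this.width = width;
            this.height = height;
        }
    }

    // Writes every field explicitly, in a fixed order.
    static void write(DataOutputStream out, Image img) throws IOException {
        out.writeUTF(img.uri);
        out.writeBoolean(img.title != null); // presence flag for the optional field
        if (img.title != null) {
            out.writeUTF(img.title);
        }
        out.writeInt(img.width);
        out.writeInt(img.height);
    }

    // Reads the fields back in exactly the order write() produced them.
    static Image read(DataInputStream in) throws IOException {
        String uri = in.readUTF();
        String title = in.readBoolean() ? in.readUTF() : null;
        int width = in.readInt();
        int height = in.readInt();
        return new Image(uri, title, width, height);
    }

    static byte[] toBytes(Image img) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                write(out, img);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Image fromBytes(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return read(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = toBytes(new Image("http://example.com/a.jpg", null, 800, 600));
        Image copy = fromBytes(data);
        System.out.println(data.length); // 35
        System.out.println(copy.uri + " " + copy.width + "x" + copy.height);
    }
}
```

The payoff is a compact format with no reflection and no class metadata in the stream; the cost is that every change to the class must be mirrored by hand in both `write` and `read`.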

fast-serialization:

The test case applies optimizations similar to those used for the other serializers (class registration, unshared mode).
fst is compatible with JDK serialization in that it supports the readObject/writeObject/readResolve methods and putFields. This costs a few additional CPU cycles but makes fst easier to use as a drop-in replacement for JDK serialization.
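For reference, this is what those JDK hooks look like: a minimal sketch, with a hypothetical `Temperature` class, in which private `writeObject`/`readObject` methods rebuild a transient, derived field after deserialization. It runs with plain JDK serialization; according to the text above, fst honors the same private methods, but fst itself is not used here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class CustomHooksDemo {
    // Hypothetical class for this illustration only.
    static class Temperature implements Serializable {
        private static final long serialVersionUID = 1L;
        private double celsius;
        private transient double fahrenheit; // derived; not written to the stream

        Temperature(double celsius) {
            this.celsius = celsius;
            this.fahrenheit = toFahrenheit(celsius);
        }

        private static double toFahrenheit(double c) {
            return c * 9.0 / 5.0 + 32.0;
        }

        // Called by ObjectOutputStream in place of the default field writer.
        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject(); // writes the non-transient field (celsius)
        }

        // Called by ObjectInputStream: restore fields, then rebuild derived state.
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            fahrenheit = toFahrenheit(celsius);
        }

        double fahrenheit() { return fahrenheit; }
    }

    static Temperature roundTrip(Temperature t) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(t);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Temperature) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new Temperature(100.0)).fahrenheit()); // 212.0
    }
}
```

Because these hooks are looked up per class and invoked reflectively, supporting them is one plausible source of the extra CPU cycles mentioned above.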

kryo:

We register each class with Kryo, an optional step that improves performance. Kryo.java, lines 95-100
The “kryo-opt” test further improves performance by giving the serializer more information about the data values being serialized. Kryo.java, lines 105-130
The regular variant uses Kryo’s default serialization code (runtime databinding).
The manual variant uses custom hand-written serialization code.
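One reason registration helps is that a registered class can be identified in the stream by a small integer instead of its fully qualified name. The sketch below shows only that idea; `ClassRegistrySketch` and its methods are hypothetical and are not Kryo's API or implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of class registration; not Kryo's code.
class ClassRegistrySketch {
    private final Map<Class<?>, Integer> ids = new HashMap<>();
    private final List<Class<?>> classes = new ArrayList<>();

    // Writer and reader must register the same classes in the same order.
    int register(Class<?> type) {
        Integer existing = ids.get(type);
        if (existing != null) return existing;
        int id = classes.size();
        ids.put(type, id);
        classes.add(type);
        return id;
    }

    // Registered: the class is written as a small varint ID.
    byte[] writeRegistered(Class<?> type) {
        Integer id = ids.get(type);
        if (id == null) throw new IllegalArgumentException("not registered: " + type.getName());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int v = id;
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80); // low 7 bits, continuation bit set
            v >>>= 7;
        }
        out.write(v);
        return out.toByteArray();
    }

    Class<?> readRegistered(byte[] data) {
        int result = 0;
        for (int i = 0, shift = 0; i < data.length && shift < 32; i++, shift += 7) {
            result |= (data[i] & 0x7F) << shift;
            if ((data[i] & 0x80) == 0) return classes.get(result);
        }
        throw new IllegalArgumentException("malformed varint");
    }

    // Unregistered: the fully qualified class name is written instead.
    static byte[] writeByName(Class<?> type) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeUTF(type.getName());
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Class<?> readByName(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return Class.forName(in.readUTF());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ClassRegistrySketch registry = new ClassRegistrySketch();
        registry.register(String.class);
        registry.register(ArrayList.class);
        System.out.println(writeByName(ArrayList.class).length);              // 21
        System.out.println(registry.writeRegistered(ArrayList.class).length); // 1
    }
}
```

The trade-off is coordination: both sides must agree on the registration order, and reading by name also needs a `Class.forName` lookup that an ID lookup avoids.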

protobuf (Google Protocol Buffers):

The generated data classes are relatively heavy-weight. Data values are immutable and are built by “Builder” objects that check the data for conformance to the schema. This makes things safer and also accounts for why protobuf’s “create” time is so high.
We set option optimize_for = SPEED in the .proto schema.
Schema language
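To see where that “create” cost comes from, here is a hand-written sketch of the immutable-message-plus-Builder pattern. It is not protobuf's generated code; `Person`, its fields, and the required-field rule are assumptions chosen for illustration. Each object needs an extra Builder allocation, and validation plus a defensive copy happen in `build()`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

class BuilderPatternDemo {
    // Immutable "message": all fields are final and set only through the Builder.
    static final class Person {
        private final String name;         // required
        private final List<String> emails; // repeated

        private Person(Builder b) {
            this.name = b.name;
            // Defensive copy, so later changes to the Builder cannot leak in.
            this.emails = Collections.unmodifiableList(new ArrayList<>(b.emails));
        }

        String getName() { return name; }
        List<String> getEmails() { return emails; }

        static Builder newBuilder() { return new Builder(); }
    }

    static final class Builder {
        private String name;
        private final List<String> emails = new ArrayList<>();

        Builder setName(String name) {
            this.name = Objects.requireNonNull(name, "name");
            return this;
        }

        Builder addEmail(String email) {
            emails.add(Objects.requireNonNull(email, "email"));
            return this;
        }

        // Checks the data against the (hypothetical) schema before creating the object.
        Person build() {
            if (name == null) {
                throw new IllegalStateException("missing required field: name");
            }
            return new Person(this);
        }
    }

    public static void main(String[] args) {
        Person p = Person.newBuilder()
                .setName("Ada")
                .addEmail("ada@example.com")
                .build();
        System.out.println(p.getName() + " " + p.getEmails()); // Ada [ada@example.com]
    }
}
```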

protobuf/activemq-alt:

Java-only protobuf-format-compatible implementation from the ActiveMQ project.
We use the “alternative” bindings generator, which parses the input on-demand (i.e. when the fields are actually accessed).
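The idea of on-demand parsing can be sketched as follows: keep the raw bytes and decode them only the first time a field is read. `LazyImage` and its two-field layout are hypothetical; for simplicity it decodes the whole message on first access rather than individual fields, and it does not reproduce the ActiveMQ bindings.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch of lazy decoding; not thread-safe.
class LazyImage {
    private final byte[] raw;
    private boolean decoded;
    private String uri;
    private int width;
    private int decodeCount; // how many times decoding actually ran

    LazyImage(byte[] raw) { this.raw = raw.clone(); }

    static byte[] encode(String uri, int width) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeUTF(uri);
                out.writeInt(width);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decoding is deferred until a field is first accessed, then cached.
    private void ensureDecoded() {
        if (decoded) return;
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            uri = in.readUTF();
            width = in.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        decoded = true;
        decodeCount++;
    }

    String uri() { ensureDecoded(); return uri; }
    int width() { ensureDecoded(); return width; }
    int decodeCount() { return decodeCount; }

    // This sketch has no setters, so the original bytes can be returned without re-encoding.
    byte[] toBytes() { return raw.clone(); }

    public static void main(String[] args) {
        LazyImage img = new LazyImage(encode("http://example.com/a.jpg", 800));
        System.out.println(img.decodeCount()); // 0: nothing parsed yet
        System.out.println(img.width());       // 800
        System.out.println(img.uri());         // http://example.com/a.jpg
        System.out.println(img.decodeCount()); // 1: parsed once, then cached
    }
}
```

Messages that are only forwarded, or of which only a few fields are read, skip most of the parsing work; a message whose fields are all read pays roughly the same cost, just later.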

protostuff:

A serialization API with built-in forward and backward compatibility for POJOs.
Formats featured in the benchmark: protostuff (native), protobuf, json, smile.
The regular variant serializes POJOs generated from a schema by protostuff-compiler.
The runtime variant serializes the existing POJOs (databinding) found in tpc/src/data.
The manual variant uses a hand-written schema that looks like the generated schema, but without autoboxing of singular primitive types.
Note that the serialized sizes differ between the variants because tpc/src/data/media/Media.java has an extra boolean field, “hasBitrate”, which accounts for the 2-byte difference (1 byte for the varint-encoded field tag and 1 byte for the boolean value).
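The 2-byte arithmetic can be checked with a small sketch of protobuf-style wire encoding: a boolean field is written as a varint tag, `(fieldNumber << 3) | wireType` with wire type 0 (varint), followed by a varint value of 0 or 1. The field number used here is an assumption; any field number from 1 to 15 gives a 1-byte tag, while 16 and above need at least 2 bytes.

```java
import java.io.ByteArrayOutputStream;

// Sketch of protobuf-style encoding for a single boolean field.
class BoolFieldSize {
    static final int WIRETYPE_VARINT = 0;

    // Base-128 varint: 7 bits per byte, high bit set on all but the last byte.
    static void writeVarInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    static byte[] encodeBoolField(int fieldNumber, boolean value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarInt(out, (fieldNumber << 3) | WIRETYPE_VARINT); // field tag
        writeVarInt(out, value ? 1 : 0);                        // field value
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(encodeBoolField(5, true).length);  // 2
        System.out.println(encodeBoolField(16, true).length); // 3
    }
}
```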

thrift (Apache Thrift):

Very similar to protobuf.
“thrift” uses the standard TBinaryProtocol serializer.
“thrift-compact” uses the newer TCompactProtocol serializer.
Schema language
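One source of the size difference between the two protocols is integer encoding: TBinaryProtocol writes an i32 as 4 fixed big-endian bytes, whereas TCompactProtocol writes ZigZag-encoded varints, so small values (including small negative ones) take a single byte. The sketch below reproduces only those two encodings for a lone value; it is not Thrift's code and ignores field headers.

```java
import java.io.ByteArrayOutputStream;

// Illustrative comparison of two integer encodings; not Thrift's implementation.
class IntEncodingSizes {
    // Fixed-width, big-endian 32-bit integer: always 4 bytes.
    static byte[] fixed32(int v) {
        return new byte[] {(byte) (v >>> 24), (byte) (v >>> 16), (byte) (v >>> 8), (byte) v};
    }

    // ZigZag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ... so small magnitudes stay small.
    static int zigZag(int v) {
        return (v << 1) ^ (v >> 31);
    }

    // ZigZag followed by a base-128 varint: 1 to 5 bytes.
    static byte[] zigZagVarInt(int v) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int n = zigZag(v);
        while ((n & ~0x7F) != 0) {
            out.write((n & 0x7F) | 0x80);
            n >>>= 7;
        }
        out.write(n);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(fixed32(-1).length);      // 4
        System.out.println(zigZagVarInt(-1).length); // 1
    }
}
```

For large magnitudes the varint form can take 5 bytes, one more than the fixed form, so the compact protocol wins only when most values are small.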

avro (Apache Avro):

Schema language
Avro’s data structures use UTF-8 encoded strings (instead of Java’s native UTF-16 encoded strings). Its “create” times and post-deserialize times are so high because the benchmark converts to/from native Java strings. In actual use, careful programming may allow you to avoid these conversions.
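The conversion cost mentioned above is the round trip between Java's UTF-16 `String` and UTF-8 bytes. A minimal sketch, assuming you only need to compare a decoded field against a known value: encoding the constant to UTF-8 once and comparing bytes avoids creating a `String` per record. `Utf8Fields` and `isJpeg` are hypothetical names, and Avro's own string type is not used here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helpers illustrating UTF-16 <-> UTF-8 conversion costs.
class Utf8Fields {
    // Java String (UTF-16) -> UTF-8 bytes: allocates and transcodes.
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // UTF-8 bytes -> Java String: allocates and transcodes again.
    static String fromUtf8(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Encoded once, then reused for every comparison.
    private static final byte[] JPEG = toUtf8("image/jpeg");

    // Compares a raw UTF-8 field against the constant without building a String.
    static boolean isJpeg(byte[] utf8Field) {
        return Arrays.equals(JPEG, utf8Field);
    }

    public static void main(String[] args) {
        String s = "h\u00e9llo";
        System.out.println(s.length());       // 5 UTF-16 code units
        System.out.println(toUtf8(s).length); // 6 bytes: the accented letter needs two
        System.out.println(isJpeg(toUtf8("image/jpeg"))); // true
    }
}
```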

json/jackson:

Lots of hand-written serialization code: JsonJackson.java, lines 62-292, with additional support code in FieldMapping.java
Uses full names (earlier versions used abbreviated names)
“json/jackson-databind” uses automatic serialization, with default settings (no custom annotations or configuration)

xml/javolution:

Hand-written serialization code: JavolutionXml.java, lines 59-181
Uses abbreviated names.

xstream:

The non-“+c” variants use XStream’s built-in object-to-XML conversion (no extra code, full names).
The “+c” variants use a hand-written conversion function with abbreviated names. XStream.java, lines 110-257
“xml/xstream” uses XStream’s own XML reader/writer.
The “xml/…-xstream” variants use various StAX implementations to do the XML reading/writing.

xml-manual/woodstox, xml-manual/aalto:

Use hand-written serialization code on top of a StAX implementation (Woodstox, Aalto)
Full names
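A minimal sketch of hand-written serialization on top of StAX, using the `javax.xml.stream` API that ships with the JDK (so it runs on whichever StAX implementation the JDK selects, not necessarily Woodstox or Aalto). The `image`, `uri` and `width` element names are hypothetical; with full names, each name appears in full in both the start and end tags.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

class StaxManualDemo {
    // Writes a hypothetical <image> record using full element names.
    static String write(String uri, int width) {
        try {
            StringWriter sw = new StringWriter();
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(sw);
            w.writeStartDocument();
            w.writeStartElement("image");
            w.writeStartElement("uri");
            w.writeCharacters(uri); // escapes markup characters such as < and &
            w.writeEndElement();
            w.writeStartElement("width");
            w.writeCharacters(Integer.toString(width));
            w.writeEndElement();
            w.writeEndElement(); // </image>
            w.writeEndDocument();
            w.flush();
            w.close();
            return sw.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    // Pulls events and picks out the text of the elements it knows about.
    static String[] read(String xml) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            String uri = null;
            String width = null;
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT) {
                    String name = r.getLocalName();
                    if (name.equals("uri")) {
                        uri = r.getElementText();
                    } else if (name.equals("width")) {
                        width = r.getElementText();
                    }
                }
            }
            r.close();
            return new String[] {uri, width};
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = write("http://example.com/a.jpg", 800);
        System.out.println(xml);
        String[] fields = read(xml);
        System.out.println(fields[0] + " " + fields[1]);
    }
}
```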

wobly:

The compact variant produces smaller output at a small cost in speed; it is enabled through optional annotation properties.
