
Jil

A fast JSON (de)serializer, built on Sigil with a number of somewhat crazy optimization tricks.

Releases are available on NuGet in addition to this repository.

Usage

Serializing

using(var output = new StringWriter())
{
    JSON.Serialize(
        new
        {
            MyInt = 1,
            MyString = "hello world",
            // etc.
        },
        output
    );
}

There is also a Serialize method that returns a string.

The first time Jil is used to serialize a given configuration and type pair, it will spend extra time building the serializer. Subsequent invocations will be much faster, so if a consistently fast runtime is necessary in your code you may want to "prime the pump" with an earlier "throw away" serialization.

Dynamic Serialization

If you need to serialize types that are unknown at compile time (including subclasses and virtual properties), you should use JSON.SerializeDynamic instead. JSON.SerializeDynamic does not require a generic type parameter, and can cope with subclasses, object/dynamic members, and DLR-participating types such as ExpandoObject and DynamicObject.

Deserializing

using(var input = new StringReader(myString))
{
    var result = JSON.Deserialize<MyType>(input);
}

There is also a Deserialize method that takes a string as input.

The first time Jil is used to deserialize a given configuration and type pair, it will spend extra time building the deserializer. Subsequent invocations will be much faster, so if a consistently fast runtime is necessary in your code you may want to "prime the pump" with an earlier "throw away" deserialization.

Jil is case sensitive as a rule, so when deserializing make sure your member names match what is in your JSON.

Dynamic Deserialization

using(var input = new StringReader(myString))
{
    var result = JSON.DeserializeDynamic(input);
}

There is also a DeserializeDynamic method that works directly on strings.

These methods return dynamic, and support the following operations:

  • Casts
    • e.g. (int)JSON.DeserializeDynamic("123")
  • Member access
    • e.g. JSON.DeserializeDynamic(@"{""A"":123}").A
  • Indexers
    • e.g. JSON.DeserializeDynamic(@"{""A"":123}")["A"]
    • or JSON.DeserializeDynamic("[0, 1, 2]")[0]
  • Foreach loops
    • e.g. foreach(var keyValue in JSON.DeserializeDynamic(@"{""A"":123}")) { ... }
      • in this example, keyValue is a dynamic with Key and Value properties
    • or foreach(var item in JSON.DeserializeDynamic("[0, 1, 2]")) { ... }
      • in this example, item is a dynamic and will have values 0, 1, and 2
  • Common unary operators (+, -, and !)
  • Common binary operators (&&, ||, +, -, *, /, ==, !=, <, <=, >, and >=)
  • .Length & .Count on arrays
  • .ContainsKey(string) on objects

Supported Types

Jil will only (de)serialize types that can be reasonably represented as JSON.

The following types (and any user defined types composed of them) are supported:

  • Strings (including char)
  • Booleans
  • Integer numbers (int, long, byte, etc.)
  • Floating point numbers (float, double, and decimal)
  • DateTimes & DateTimeOffsets
    • Note that DateTimes are converted to UTC to allow for round-tripping; use DateTimeOffsets if you need to preserve timezone information
    • See Configuration for further details
  • TimeSpans
    • See Configuration for further details
  • Nullable types
  • Enumerations
    • Including [Flags]
  • Guids
  • IList<T>, ICollection<T>, and IReadOnlyList<T> implementations
  • IDictionary<TKey, TValue> implementations where TKey is a string or enumeration
  • ISet<T>

Jil (de)serializes public fields and properties; the order in which they are serialized is not defined (it is unlikely to be in declaration order). Jil respects the DataMemberAttribute.Name property and IgnoreDataMemberAttribute, as well as the ShouldSerializeXXX() pattern. For situations where DataMemberAttribute and IgnoreDataMemberAttribute cannot be used, Jil provides the JilDirectiveAttribute, which offers equivalent functionality.

Primitive types (int, long, etc.) can be strongly typed by annotating a wrapper type with [JilPrimitiveWrapper]. Such a type should have a single declared field or property, and either a default constructor or a single-parameter constructor.

Unions

Jil has limited support for "unions" (fields on JSON objects that may contain one of several types), provided that they can be distinguished by their first character.

In other words:

class LegalUnion
{
	[JilDirective(Name = "Foo", IsUnion = true)]
	public string FooString { get; set; }
	[JilDirective(Name = "Foo", IsUnion = true)]
	public int FooInt { get; set; }
}

Is allowed because the first character of a JSON string is always ", while the first character of a JSON number is a digit or -.

The following would not be legal, however.

class IllegalUnion
{
	[JilDirective(Name = "Foo", IsUnion = true)]
	public uint FooUInt { get; set; }
	[JilDirective(Name = "Foo", IsUnion = true)]
	public double FooDouble { get; set; }
}

This is because both properties could start with a digit.

You can also use a Type member to determine which field was (de)serialized.

class WithUnionType
{
	[JilDirective(Name = "Foo", IsUnion = true, IsUnionType = true)]
	public Type FooType { get; set; }

	[JilDirective(Name = "Foo", IsUnion = true)]
	public uint FooUInt { get; set; }
	[JilDirective(Name = "Foo", IsUnion = true)]
	public List<int> FooList { get; set; }

}

When serializing, this Type member must be set.
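Jil's rule amounts to this: the sets of characters that each union member's JSON can begin with must not overlap. The following is a minimal sketch of that check in C# (top-level statements). It is not Jil's implementation. The kind labels and the FirstChars and IsLegalUnion helpers are hypothetical, and only a few JSON value kinds are modeled.

```csharp
using System;
using System.Collections.Generic;

// Characters a JSON value of each (illustrative) kind may begin with.
static HashSet<char> FirstChars(string kind) => kind switch
{
    "string" => new HashSet<char> { '"' },
    "uint"   => new HashSet<char>("0123456789"),   // unsigned: no leading '-'
    "number" => new HashSet<char>("-0123456789"),  // int, double, etc.
    "array"  => new HashSet<char> { '[' },
    "object" => new HashSet<char> { '{' },
    _ => throw new ArgumentException($"unknown kind: {kind}")
};

// A union is unambiguous only if no two members share a possible first character.
static bool IsLegalUnion(params string[] kinds)
{
    var seen = new HashSet<char>();
    foreach (string kind in kinds)
        foreach (char c in FirstChars(kind))
            if (!seen.Add(c)) return false; // c could start two different members
    return true;
}

Console.WriteLine(IsLegalUnion("string", "number")); // like LegalUnion: True
Console.WriteLine(IsLegalUnion("uint", "number"));   // like IllegalUnion: False
Console.WriteLine(IsLegalUnion("uint", "array"));    // like WithUnionType: True
```

The three calls mirror the LegalUnion, IllegalUnion, and WithUnionType classes above. Checking first characters is all that is needed because the deserializer only has to peek at one character to know which member to populate.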

Configuration

Jil's JSON.Serialize and JSON.Deserialize methods take an optional Options parameter which controls:

  • The format of DateTimes, DateTimeOffsets, and TimeSpans; one of
    • MicrosoftStyleMillisecondsSinceUnixEpoch, a string
      • "/Date(##...##)/" for DateTimes & DateTimeOffsets
      • "1.23:45:56.78" for TimeSpans
    • MillisecondsSinceUnixEpoch, a number
    • SecondsSinceUnixEpoch, a number
    • ISO8601, a string
      • for DateTimes & DateTimeOffsets, e.g. "2011-07-14T19:43:37Z"
        • DateTimes are always serialized in UTC (timezone offset = 00:00), because Local DateTimes cannot reliably round-trip
        • DateTimeOffsets include their timezone offset when serialized
      • for TimeSpans, e.g. "P40DT11H10M9.4S"
    • RFC1123, a string
      • for DateTimes and DateTimeOffsets, e.g. "Thu, 10 Apr 2008 13:30:00 GMT"
      • "1.23:45:56.78" for TimeSpans
  • What to treat DateTimes with an Unspecified DateTimeKind as; one of
    • IsLocal, will treat an unspecified DateTime as if it were in local time
    • IsUtc, will treat an unspecified DateTime as if it were in UTC
  • Whether or not to exclude null values when serializing dictionaries and object members
  • Whether or not to "pretty print" while serializing, which adds extra linebreaks and whitespace for presentation's sake
  • Whether or not the serialized JSON will be used as JSONP (which requires slightly more work be done w.r.t. escaping)
  • Whether or not to include inherited members when serializing
  • The way to format member names; one of
    • Verbatim
      • As it appears in source, unless modified by a [MemberName] or [JilDirective]
    • CamelCase
      • lowercasing the first letter of members, e.g. "CamelCase" would become "camelCase"
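To make the listed DateTime formats concrete, here is a sketch that renders a single DateTime in each style using only BCL calls (C# top-level statements). It is an illustration under stated assumptions, not Jil's code. Jil uses its own hand-written writers, and the FormatAll helper is hypothetical. Unspecified DateTimes are treated as UTC, as with the IsUtc option. Sub-second precision is dropped from the ISO8601 output, and TimeSpans are not covered.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: renders one DateTime in each date format listed above.
static string[] FormatAll(DateTime value)
{
    // Mimic the IsUtc option: an Unspecified DateTime is taken to already be UTC.
    if (value.Kind == DateTimeKind.Unspecified)
        value = DateTime.SpecifyKind(value, DateTimeKind.Utc);

    // DateTimes are normalized to UTC before being written.
    var utc = new DateTimeOffset(value.ToUniversalTime());
    string ms = utc.ToUnixTimeMilliseconds().ToString(CultureInfo.InvariantCulture);
    string secs = utc.ToUnixTimeSeconds().ToString(CultureInfo.InvariantCulture);

    return new[]
    {
        "\"/Date(" + ms + ")/\"",  // MicrosoftStyleMillisecondsSinceUnixEpoch (a string)
        ms,                        // MillisecondsSinceUnixEpoch (a number)
        secs,                      // SecondsSinceUnixEpoch (a number)
        "\"" + utc.ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture) + "\"", // ISO8601
        "\"" + utc.ToString("R", CultureInfo.InvariantCulture) + "\"",                       // RFC1123
    };
}

foreach (string s in FormatAll(new DateTime(2011, 7, 14, 19, 43, 37)))
    Console.WriteLine(s);
```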

Benchmarks

Jil aims to be the fastest general purpose JSON (de)serializer for .NET. Flexibility and "nice to have" features are explicitly discounted in the pursuit of speed.

These benchmarks were run on a machine with the following specs:

  • Operating System: Windows 8.1 Enterprise 64-bit (6.3, Build 9600) (9600.winblue_r3.140827-1500)
  • System Manufacturer: Apple Inc.
  • System Model: MacBookPro11,3
  • Processor: Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz (8 CPUs), ~2.6GHz
  • Memory: 16384MB RAM
    • DDR3
    • Dual Channel
    • 798.1 MHz

As with all benchmarks, take these with a grain of salt.

Serialization

For comparison, here's how Jil stacks up against other popular .NET serializers in a synthetic benchmark:

All three libraries are in use at Stack Exchange in various production roles.

Note that the bars in each group of each graph are scaled so that the fastest library is 100.

Numbers, including millisecond timings, can be found in this Google Document.

The Question, Answer, and User types are taken from the Stack Exchange API.

Data for each type is randomly generated from a fixed seed. Random text is biased towards ASCII*, but includes all of Unicode.

*This is meant to simulate typical content from the Stack Exchange API.

Deserialization

The same libraries and same types were used to test deserialization.

Note that the bars in each group of each graph are scaled so that the fastest library is 100.

Numbers, including millisecond timings, can be found in the same Google Document.

Tricks

Jil has a lot of tricks to make it fast. These may be interesting, even if Jil itself is too limited for your use.

Sigil

Jil does a lot of IL generation to produce tight, focused code. While this is possible with ILGenerator, Jil instead uses the Sigil library. Sigil automatically does a lot of the busy work you'd normally have to do manually to produce ideal IL. Using Sigil also makes hacking on Jil much more productive, as debugging IL generation without it is slow going.

Trade Memory For Speed

Jil's internal serializers and deserializers are (in the absence of recursive types) monolithic and per-type, which avoids extra runtime lookups and gives .NET's JIT more context when generating machine code.

The methods Jil creates also do no Options checking at serialization time; Options are baked in at first use. This means that Jil may create up to 32 different serializers and 8 different deserializers for a single type (though in practice, far fewer).

Optimizing Member Access Order

Perhaps the most arcane code in Jil determines the preferred order to access members, so the CPU doesn't stall waiting for values from memory.

Members are divided up into 4 groups:

  • Simple
    • primitive ValueTypes such as int, double, etc.
  • Nullable Types
  • Recursive Types
  • Everything Else

Members within each group are ordered by the offset of the fields backing them (properties are decompiled to determine the fields they use).

This is a fairly naive implementation of the idea; there's almost certainly more that could be squeezed out, especially with regard to consistency of gains.

Don't Allocate If You Can Avoid It

.NET's GC is excellent, but no-GC is still faster than any-GC.

Jil tries to avoid allocating any reference types, with some exceptions:

  • a 36-length char[] if any integer numbers, DateTimes, or GUIDs are being serialized
  • a 32-length char[] if any strings, user defined objects, or ISO8601 DateTimes are being deserialized

Depending on the data being deserialized, a StringBuilder may also be allocated. If a TextWriter does not have an invariant culture, strings may also be allocated when serializing floating point numbers.

Escaping Tricks

JSON has escaping rules for \, ", and control characters. These can be time-consuming to deal with. Jil avoids them as much as possible in two ways.

First, all known key names are determined once and baked into the generated delegates. Known keys are member names and enumeration values.

Second, rather than look up encoded characters in a dictionary or a long series of branches, Jil does explicit checks for " and \ and turns the rest into a subtraction and jump table lookup. This comes out to about three branches per character (with mostly consistently taken paths, which in theory is good for branch prediction).

This works because control characters in .NET strings (basically UTF-16, but might as well be ASCII for this trick) are sequential, occupying the range [0, 31].

JSONP also requires escaping of line separator (\u2028) and paragraph separator (\u2029) characters. When configured to serialize JSONP, Jil escapes them in the same manner as \ and ".
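The following is a rough C# rendering of the escaping strategy described above (top-level statements). It is a sketch, not Jil's generated IL. The " and \ characters get explicit checks. Control characters are handled with a range test and an array lookup, which stands in for the IL jump table; the range starts at 0, so the subtraction is a no-op here. U+2028 and U+2029 are escaped only when JSONP output is requested. WriteEscaped and controlEscapes are hypothetical names.

```csharp
using System;
using System.IO;

// Escapes for U+0000..U+001F, indexed directly by the character's value.
// JSON's short forms are used where they exist; the rest become \u00XX.
string[] controlEscapes = new string[32];
for (int i = 0; i < controlEscapes.Length; i++)
    controlEscapes[i] = "\\u" + i.ToString("x4");
controlEscapes['\b'] = "\\b";
controlEscapes['\t'] = "\\t";
controlEscapes['\n'] = "\\n";
controlEscapes['\f'] = "\\f";
controlEscapes['\r'] = "\\r";

void WriteEscaped(TextWriter w, string s, bool jsonp)
{
    w.Write('"');
    foreach (char c in s)
    {
        if (c == '"') { w.Write("\\\""); continue; }   // explicit check for the quote
        if (c == '\\') { w.Write("\\\\"); continue; }  // explicit check for the backslash
        if (c < 32)                                     // contiguous control range 0..31
        {
            w.Write(controlEscapes[c]);                 // one indexed lookup, no branch chain
            continue;
        }
        if (jsonp && (c == '\u2028' || c == '\u2029'))  // JSONP-only escapes
        {
            w.Write(c == '\u2028' ? "\\u2028" : "\\u2029");
            continue;
        }
        w.Write(c);
    }
    w.Write('"');
}

var output = new StringWriter();
WriteEscaped(output, "tab\there\u2028", jsonp: true);
Console.WriteLine(output.ToString()); // prints "tab\there\u2028" with literal backslashes
```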

Custom Number Formatting

While number formatting in .NET is pretty fast, it has a lot of baggage to handle custom number formatting.

Since JSON has a strict definition of a number, a Write() implementation without configuration is noticeably faster. To go the extra mile, Jil contains separate implementations for int, uint, ulong, and long.

Jil does not include custom decimal, double, or single Write() implementations, as despite my best efforts I haven't been able to beat the ones built into .NET. If you think you're up to the challenge, I'd be really interested in seeing code that is faster than the included implementations.
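The following is a minimal sketch of a culture-free int writer in this spirit (C# top-level statements). It assumes nothing about Jil's real code beyond what is described above. Digits are produced right-to-left into a reusable char buffer; the 36-length size echoes the buffer mentioned under Don't Allocate If You Can Avoid It. The result is flushed with a single TextWriter.Write call. WriteInt is a hypothetical name.

```csharp
using System;
using System.IO;

// Reusable scratch buffer, allocated once; an int needs at most 11 chars ("-2147483648").
char[] buffer = new char[36];

void WriteInt(TextWriter w, int value)
{
    int pos = buffer.Length;
    // Work on the non-positive magnitude so int.MinValue needs no special case.
    int n = value > 0 ? -value : value;
    do
    {
        int q = n / 10;                              // truncates toward zero
        buffer[--pos] = (char)('0' + (q * 10 - n));  // q*10 - n is the next digit, 0..9
        n = q;
    } while (n != 0);
    if (value < 0) buffer[--pos] = '-';
    w.Write(buffer, pos, buffer.Length - pos);       // one call, no culture lookups
}

var sw = new StringWriter();
WriteInt(sw, int.MinValue);
Console.WriteLine(sw.ToString()); // -2147483648
```

Negating positive values, rather than negative ones, is the usual way to avoid overflow: int.MinValue has no positive counterpart.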

Custom Date Formatting

Similarly to numbers, each of Jil's date formats has a custom Write() implementation.

Custom Guid Formatting

Noticing a pattern?

Jil has a custom Guid writer (which is one of the reasons Jil only supports the D format).

Fun fact about this method: I tested a more branch-heavy version (which removed the byte lookup) that turned out to be considerably slower than the built-in method due to branch prediction failures. Type 4 Guids being random makes for something quite close to the worst case for branch prediction.
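Below is a sketch of the byte-lookup approach for the D format (xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx), written as C# top-level statements. A 256-entry table maps each byte to its two lowercase hex digits, so there is no per-nibble branching on random data. This is not Jil's writer. WriteGuid, byteToHex, and layout are hypothetical names, and the code relies on Guid.ToByteArray() storing the first three groups little-endian.

```csharp
using System;
using System.IO;

// byteToHex[b] holds the two lowercase hex digits of byte b.
string[] byteToHex = new string[256];
for (int i = 0; i < 256; i++)
    byteToHex[i] = i.ToString("x2");

// Index into Guid.ToByteArray() for each byte of the "D" string, in output order;
// -1 marks a dash. The first three groups are stored little-endian, hence reversed.
int[] layout = { 3, 2, 1, 0, -1, 5, 4, -1, 7, 6, -1, 8, 9, -1, 10, 11, 12, 13, 14, 15 };

void WriteGuid(TextWriter w, Guid g)
{
    byte[] bytes = g.ToByteArray(); // allocates; kept for brevity in this sketch
    char[] chars = new char[36];    // 32 hex digits + 4 dashes
    int pos = 0;
    foreach (int idx in layout)
    {
        if (idx < 0) { chars[pos++] = '-'; continue; } // fixed pattern, easy to predict
        string hex = byteToHex[bytes[idx]];             // table lookup instead of branches
        chars[pos++] = hex[0];
        chars[pos++] = hex[1];
    }
    w.Write(chars);
}

var guid = Guid.NewGuid();
var writer = new StringWriter();
WriteGuid(writer, guid);
Console.WriteLine(writer.ToString() == guid.ToString("D")); // True
```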

Different Code For Arrays

Although arrays implement IList<T>, the JIT generates much better code if you give it array-ish IL to chew on, so Jil does so.

Special Casing Enumerations With Sequential Values

Many enums end up having sequential values; Jil will exploit this if possible and generate a subtraction and jump table lookup. Non-sequential enumerations are handled with a long series of branches.
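The following sketches the sequential-values check in plain C# (top-level statements) under stated assumptions. The enum's underlying values are assumed to fit in a long. An array indexed by value - min stands in for the IL jump table Jil generates. TryBuildSequentialTable is a hypothetical name. When it returns a null table, a real implementation would fall back to branches or a dictionary.

```csharp
#nullable enable
using System;
using System.Linq;

// Returns a name table indexed by (value - min) if TEnum's distinct values are
// contiguous, or (null, 0) if they are not. Assumes values fit in a long.
static (string[]? Names, long Min) TryBuildSequentialTable<TEnum>() where TEnum : struct, Enum
{
    long[] values = Enum.GetValues(typeof(TEnum))
        .Cast<object>()
        .Select(v => Convert.ToInt64(v))
        .Distinct()
        .OrderBy(v => v)
        .ToArray();

    if (values.Length == 0 || values[^1] - values[0] != values.Length - 1)
        return (null, 0); // gaps: fall back to branches or a dictionary

    long min = values[0];
    var names = new string[values.Length];
    foreach (long v in values)
        names[v - min] = Enum.GetName(typeof(TEnum), Enum.ToObject(typeof(TEnum), v))!;
    return (names, min);
}

var (table, offset) = TryBuildSequentialTable<DayOfWeek>();
long day = (long)DayOfWeek.Wednesday;
Console.WriteLine(table![day - offset]); // one subtraction, one lookup: Wednesday
```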

Custom Number Readers

Just like Jil maintains many different methods for writing integer types, it also maintains different methods for reading them. These methods omit unnecessary sign checks, overflow checks, and culture-specific formatting support.
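As an illustration only, here is a hypothetical ReadUInt32 (C# top-level statements) that parses a plain digit string destined for a uint; it is not Jil's reader. There is no sign handling, since a uint cannot be negative, and no culture lookup. Instead of checking for overflow on every digit, it caps the digit count at 10, accumulates in a ulong, and range-checks once at the end. Fractions and exponents are not handled.

```csharp
using System;
using System.IO;

static uint ReadUInt32(TextReader reader)
{
    ulong acc = 0;
    int digits = 0;
    while (true)
    {
        int c = reader.Peek();                 // -1 at end of input
        if (c < '0' || c > '9') break;         // no '-' or '+' handling needed for a uint
        if (++digits > 10) throw new FormatException("too many digits for a uint");
        reader.Read();
        acc = acc * 10 + (ulong)(c - '0');     // cannot overflow a ulong with <= 10 digits
    }
    if (digits == 0) throw new FormatException("expected a digit");
    if (acc > uint.MaxValue) throw new OverflowException(); // single range check
    return (uint)acc;
}

var input = new StringReader("4294967295,");
Console.WriteLine(ReadUInt32(input));  // 4294967295
Console.WriteLine((char)input.Read()); // the ',' is left for the caller
```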

Automata Based Member Name Lookups

Rather than read a member name into a string or buffer when deserializing, Jil will try to match it one character at a time using an automaton.
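Here is a minimal sketch of the idea (C# top-level statements). It assumes a trie is built up front from the known member names; Jil instead emits the equivalent branches as IL. The members, edges, accepts, and MatchMember names are hypothetical, and escape sequences inside names are not handled. Matching is case sensitive, consistent with Jil's behavior described under Deserializing.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

string[] members = { "Id", "Ids", "Name" };

// Trie over the member names: edges[n] maps a character to the next node,
// accepts[n] is the index of the member ending at node n, or -1.
var edges = new List<Dictionary<char, int>> { new Dictionary<char, int>() };
var accepts = new List<int> { -1 };

for (int m = 0; m < members.Length; m++)
{
    int node = 0;
    foreach (char c in members[m])
    {
        if (!edges[node].TryGetValue(c, out int next))
        {
            next = edges.Count;
            edges.Add(new Dictionary<char, int>());
            accepts.Add(-1);
            edges[node][c] = next;
        }
        node = next;
    }
    accepts[node] = m;
}

// Consumes a member name (opening quote already read) through the closing quote,
// one character at a time; returns the member index, or -1 if the name is unknown.
int MatchMember(TextReader reader)
{
    int state = 0; // -1 is a dead state: no member can match any more
    while (true)
    {
        int ch = reader.Read();
        if (ch == -1) throw new EndOfStreamException("unterminated member name");
        if (ch == '"') return state >= 0 ? accepts[state] : -1;
        if (state >= 0)
        {
            state = edges[state].TryGetValue((char)ch, out int target) ? target : -1;
        }
    }
}

Console.WriteLine(MatchMember(new StringReader("Ids\":[1,2]"))); // 1
Console.WriteLine(MatchMember(new StringReader("ids\":[1,2]"))); // -1 (case sensitive)
```

No string is ever allocated for the name: each character either advances the automaton or sends it to the dead state, which then simply skips to the closing quote.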

Avoid Abstractions If Able

If you're serializing to a string (as indicated by using the Serialize<T> method that returns one), Jil will avoid the overhead of virtually dispatching calls against TextWriter, and instead statically call against its own specialized StringBuilder-esque class. In the general case Jil prefers to write against a TextWriter so as to keep memory pressure low (a real concern in many real-world deployments), but when Jil is going to allocate a string anyway, avoiding virtual dispatch results in a noticeable speed up.

Similarly, when deserializing from a string (as indicated by using the Deserialize<T> method that takes one), Jil avoids using TextReader, and instead issues static calls against a lightweight wrapper of its own.