A minimal and portable JSON tokenizer written in standard C and C++ (two separate versions). Performs validating and highly efficient parsing suitable for reading JSON directly into custom data structures. There are no code dependencies; simply include jsont.{h,hh,c,cc} in your project.
Build and run unit tests:
make
C API:
jsont_ctx_t* S = jsont_create(0);
jsont_reset(S, uint8_t* inbuf, size_t inbuf_len);
tok = jsont_next(S);
// branch on `tok` ...
V = jsont_*_value(S[, ...]);
jsont_destroy(S);
New C++ API:
jsont::Tokenizer S(const char* inbuf, size_t length);
jsont::Token token;
while ((token = S.next())) {
  if (token == jsont::Float) {
    printf("%g\n", S.floatValue());
  } ... else if (token == jsont::Error) {
    // handle error
    break;
  }
}
jsont::Builder json;
json.startObject()
    .fieldName("foo").value(123.45)
    .fieldName("bar").startArray()
      .value(678)
      .value("nine \"ten\"")
    .endArray()
    .endObject();
std::cout << json.toString() << std::endl;
// {"foo":123.45,"bar":[678,"nine \"ten\""]}
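The expected output above shows the inner double quotes of the string value escaped. As a rough illustration of the escaping any JSON writer has to perform, here is a minimal sketch. `quoteJsonString` is a hypothetical helper written for this example; it is not part of jsont and is not taken from the Builder's actual implementation.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of jsont): wrap `in` in double quotes and
// escape the characters that JSON requires to be escaped inside a string.
std::string quoteJsonString(const std::string& in) {
  std::string out;
  out.reserve(in.size() + 2);
  out.push_back('"');
  for (unsigned char c : in) {
    switch (c) {
      case '"':  out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n";  break;
      case '\r': out += "\\r";  break;
      case '\t': out += "\\t";  break;
      default:
        if (c < 0x20) {
          // Remaining control characters use the six-character hex escape.
          char buf[7];
          std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(c));
          out += buf;
        } else {
          out.push_back(static_cast<char>(c));
        }
    }
  }
  out.push_back('"');
  return out;
}
```

Bytes at or above 0x20 are copied through unchanged, so UTF-8 input stays UTF-8 in this sketch.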
See jsont.h and jsont.hh for a complete overview of the API, including more detailed documentation. Here's an overview:
Builder build()
— convenience builder factory
Reads a sequence of bytes and produces tokens and values while doing so.
Tokenizer(const char* bytes, size_t length, TextEncoding encoding)
— initialize a new Tokenizer to read bytes of length in encoding
void reset(const char* bytes, size_t length, TextEncoding encoding)
— Reset the tokenizer, making it possible to reuse this parser and so avoid unnecessary memory allocation and deallocation.
const Token& next() throw(Error)
— Read next token, possibly throwing an Error
const Token& current() const
— Access current token
bool hasValue() const
— True if the current token has a value
size_t dataValue(const char const** bytes)
— Returns a slice of the input which represents the current value, or nothing (returns 0) if the current token has no value (e.g. start of an object).
std::string stringValue() const
— Returns a copy of the current string value.
double floatValue() const
— Returns the current value as a double-precision floating-point number.
int64_t intValue() const
— Returns the current value as a signed 64-bit integer.
ErrorCode error() const
— Returns the error code of the last error
const char* errorMessage() const
— Returns a human-readable message for the last error. Never returns NULL.
const char* inputBytes() const
— A pointer to the input data as passed to reset or the constructor.
size_t inputSize() const
— Total number of input bytes
size_t inputOffset() const
— The byte offset into input where the tokenizer is currently at. In the event of an error, this will point to the source of the error.
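Because the error position is reported as a byte offset, a caller that wants a line and column for an error message has to derive them from the input. The following is a minimal sketch of one way to do that. `SourcePosition` and `positionAt` are hypothetical names introduced for this example and are not part of jsont.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical type (not part of jsont): a 1-based line/column position.
struct SourcePosition {
  std::size_t line;
  std::size_t column;
};

// Translate a byte offset into `bytes` into a line and column by counting
// the newlines before it. Columns count bytes, not Unicode code points.
SourcePosition positionAt(const char* bytes, std::size_t size,
                          std::size_t offset) {
  assert(bytes != nullptr || size == 0);
  if (offset > size) offset = size;  // clamp out-of-range offsets
  SourcePosition pos = {1, 1};
  for (std::size_t i = 0; i < offset; ++i) {
    if (bytes[i] == '\n') {
      ++pos.line;
      pos.column = 1;
    } else {
      ++pos.column;
    }
  }
  return pos;
}
```

With a tokenizer S that has just reported an error, the call would presumably look like positionAt(S.inputBytes(), S.inputSize(), S.inputOffset()), using the three accessors listed above.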
End
— Input ended
ObjectStart
— {
ObjectEnd
— }
ArrayStart
— [
ArrayEnd
— ]
True
— true
False
— false
Null
— null
Integer
— number value without a fraction part (access as int64 through Tokenizer::intValue())
Float
— number value with a fraction part (access as double through Tokenizer::floatValue())
String
— string value (access value through Tokenizer::stringValue() et al)
FieldName
— field name (access value through Tokenizer::stringValue() et al)
Error
— an error occurred (access error code through Tokenizer::error() et al)
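To illustrate how a consumer might branch on the token types listed above, here is a small sketch that tracks object/array nesting over a token sequence. The `Tok` enum is a stand-in defined only so the example compiles without jsont.hh. Its names mirror the list, but its values and the `maxDepth` function are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for jsont::Token so the sketch compiles on its own. The names
// mirror the list above; the numeric values are NOT taken from jsont.hh.
enum class Tok {
  End, ObjectStart, ObjectEnd, ArrayStart, ArrayEnd, True, False, Null,
  Integer, Float, String, FieldName, Error
};

// Return the deepest object/array nesting level seen before End or Error.
std::size_t maxDepth(const std::vector<Tok>& tokens) {
  std::size_t depth = 0, deepest = 0;
  for (Tok t : tokens) {
    switch (t) {
      case Tok::ObjectStart:
      case Tok::ArrayStart:
        if (++depth > deepest) deepest = depth;
        break;
      case Tok::ObjectEnd:
      case Tok::ArrayEnd:
        if (depth > 0) --depth;
        break;
      case Tok::End:
      case Tok::Error:
        return deepest;  // stop at end of input or at the first error
      default:
        break;  // scalar values and field names do not change depth
    }
  }
  return deepest;
}
```

In real use the tokens would come one at a time from Tokenizer::next() rather than from a vector; the switch structure stays the same.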
UTF8TextEncoding
— Unicode UTF-8 text encoding
UnspecifiedError
— Unspecified error
UnexpectedComma
— Unexpected comma
UnexpectedTrailingComma
— Unexpected trailing comma
InvalidByte
— Invalid input byte
PrematureEndOfInput
— Premature end of input
MalformedUnicodeEscapeSequence
— Malformed Unicode escape sequence
MalformedNumberLiteral
— Malformed number literal
UnterminatedString
— Unterminated string
SyntaxError
— Illegal JSON (syntax error)
Aids in building JSON, providing a final sequential byte buffer.
Builder()
— initialize a new builder with an empty backing buffer
Builder& startObject()
— Start an object (appends a '{' character to the backing buffer)
Builder& endObject()
— End an object (a '}' character)
Builder& startArray()
— Start an array ('[')
Builder& endArray()
— End an array (']')
const void reset()
— Reset the builder to its neutral state. Note that the backing buffer is reused in this case.
Builder& fieldName(const char* v, size_t length, TextEncoding encoding=UTF8TextEncoding)
— Adds a field name by copying length bytes from v.
Builder& fieldName(const std::string& name, TextEncoding encoding=UTF8TextEncoding)
— Adds a field name by copying name.
Builder& value(const char* v, size_t length, TextEncoding encoding=UTF8TextEncoding)
— Adds a string value by copying length bytes from v, whose content is encoded according to encoding.
Builder& value(const char* v)
— Adds a string value by copying strlen(v) bytes from c-string v. Uses the default encoding of value(const char*,size_t,TextEncoding).
Builder& value(const std::string& v)
— Adds a string value by copying v. Uses the default encoding of value(const char*,size_t,TextEncoding).
Builder& value(double v)
— Adds a possibly fractional number
Builder& value(int64_t v), void value(int v), void value(unsigned int v), void value(long v)
— Adds an integer number
Builder& value(bool v)
— Adds the "true" or "false" atom, depending on v
Builder& nullValue()
— Adds the "null" atom
size_t size() const
— Number of readable bytes at the pointer returned by bytes()
const char* bytes() const
— Pointer to the backing buffer, holding the resulting JSON.
std::string toString() const
— Return a std::string object holding a copy of the backing buffer, representing the JSON.
const char* seizeBytes(size_t& size_out)
— "Steal" the backing buffer. After this call, the caller is responsible for calling free() on the returned pointer. Returns NULL on failure. Sets the value of size_out to the number of readable bytes at the returned pointer. The builder will be reset and ready to use (which will act on a new backing buffer).
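Since a seized buffer must be released with free(), one way to avoid leaks is to hand the pointer to an owner that calls free() automatically. The sketch below shows that pattern. `FreeDeleter`, `SeizedBytes` and `consumeSeized` are hypothetical names for this example only, and the buffer is passed in as a plain pointer so the code compiles without jsont.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <string>

// Deleter that releases a malloc-family buffer with free().
struct FreeDeleter {
  void operator()(const char* p) const {
    std::free(const_cast<char*>(p));
  }
};

// Hypothetical alias (not part of jsont): owns a seized buffer.
typedef std::unique_ptr<const char, FreeDeleter> SeizedBytes;

// Take ownership of `bytes` (for example, a pointer returned by seizeBytes),
// copy `size` bytes into a std::string, and free the buffer on return.
std::string consumeSeized(const char* bytes, std::size_t size) {
  SeizedBytes owner(bytes);  // free() runs when `owner` goes out of scope
  if (!owner) return std::string();  // a NULL pointer signals failure
  return std::string(owner.get(), size);
}
```

If only a copy is needed, toString() is simpler; taking ownership is mainly useful when the bytes are passed on to code that expects a malloc-allocated buffer.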
jsont_ctx_t
— A tokenizer context ("instance" in OOP lingo.)
jsont_tok_t
— A token type (see "Token types".)
jsont_err_t
— A user-configurable error type, which defaults to const char*.
jsont_ctx_t* jsont_create(void* user_data)
— Create a new JSON tokenizer context.
void jsont_destroy(jsont_ctx_t* ctx)
— Destroy a JSON tokenizer context.
void jsont_reset(jsont_ctx_t* ctx, const uint8_t* bytes, size_t length)
— Reset the tokenizer to parse the data pointed to by bytes.
jsont_tok_t jsont_next(jsont_ctx_t* ctx)
— Read and return the next token.
jsont_tok_t jsont_current(const jsont_ctx_t* ctx)
— Returns the current token (last token read by jsont_next).
int64_t jsont_int_value(jsont_ctx_t* ctx)
— Returns the current integer value.
double jsont_float_value(jsont_ctx_t* ctx)
— Returns the current floating-point number value.
size_t jsont_data_value(jsont_ctx_t* ctx, const uint8_t** bytes)
— Returns a slice of the input which represents the current value.
char* jsont_strcpy_value(jsont_ctx_t* ctx)
— Retrieve a newly allocated c-string.
bool jsont_data_equals(jsont_ctx_t* ctx, const uint8_t* bytes, size_t length)
— Returns true if the current data value is equal to bytes of length
bool jsont_str_equals(jsont_ctx_t* ctx, const char* str)
— Returns true if the current data value is equal to c string str.
Note that the data is not parsed until you call one of these functions. This means that if you know that a value transferred as a string will fit in a 64-bit signed integer, it's completely valid to call jsont_int_value to parse the string as an integer.
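The idea of converting a raw value slice on demand can be sketched independently of the library. The helper below, `parseInt64Slice`, is hypothetical and is not how jsont_int_value is implemented. It only shows what "parse these bytes as a signed 64-bit integer when asked" can look like for a slice that is not NUL-terminated.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of jsont): interpret `length` bytes at
// `bytes` as a base-10 signed 64-bit integer. Returns false if the slice is
// empty, is not an optional '-' followed only by digits, or is out of range.
bool parseInt64Slice(const char* bytes, std::size_t length, int64_t* out) {
  assert(out != nullptr);
  if (bytes == nullptr || length == 0) return false;
  // The slice is not NUL-terminated, so copy it before calling strtoll.
  const std::string text(bytes, length);
  // Reject leading whitespace and '+', which strtoll would accept.
  if (text[0] != '-' && (text[0] < '0' || text[0] > '9')) return false;
  errno = 0;
  char* end = nullptr;
  const long long v = std::strtoll(text.c_str(), &end, 10);
  if (errno == ERANGE) return false;                    // does not fit
  if (end != text.c_str() + text.size()) return false;  // trailing bytes
  *out = static_cast<int64_t>(v);
  return true;
}
```

The copy keeps the sketch short; a conversion that works directly on the slice would avoid the allocation.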
uint8_t jsont_current_byte(jsont_ctx_t* ctx)
— Get the last byte read.
size_t jsont_current_offset(jsont_ctx_t* ctx)
— Get the current offset of the last byte read.
jsont_err_t jsont_error_info(jsont_ctx_t* ctx)
— Get information on the last error.
void* jsont_user_data(const jsont_ctx_t* ctx)
— Returns the value passed to jsont_create
JSONT_END
— Input ended.
JSONT_ERR
— Error. Retrieve details through jsont_error_info
JSONT_OBJECT_START
— {
JSONT_OBJECT_END
— }
JSONT_ARRAY_START
— [
JSONT_ARRAY_END
— ]
JSONT_TRUE
— true
JSONT_FALSE
— false
JSONT_NULL
— null
JSONT_NUMBER_INT
— number value without a fraction part (access through jsont_int_value or jsont_float_value)
JSONT_NUMBER_FLOAT
— number value with a fraction part (access through jsont_float_value)
JSONT_STRING
— string value (access through jsont_data_value or jsont_strcpy_value)
JSONT_FIELD_NAME
— field name (access through jsont_data_value or jsont_strcpy_value)
- See example*.c for working sample programs.
- See LICENSE for the MIT-style license under which this project is licensed.