Update to simdjson 3.9.5 (#83)
* switched to ondemand parser

* removed LIBS_PATH and debug symbols

* a bit smarter about `pushinteger` and `pushnumber` for Lua 5.3+
FourierTransformer authored Jul 31, 2024
1 parent 4ea3cc3 commit 0f609dd
Showing 7 changed files with 166,866 additions and 16,462 deletions.
2 changes: 1 addition & 1 deletion Makefile
@@ -25,7 +25,7 @@ TARGET = simdjson.$(LIBEXT)
all: $(TARGET)

$(TARGET):
$(CXX) $(SRC) $(FLAGS) $(INCLUDE) $(LIBS_PATH) $(LIBS) -o $@
$(CXX) $(SRC) $(FLAGS) $(INCLUDE) $(LIBS) -o $@

clean:
rm *.$(LIBEXT)
41 changes: 21 additions & 20 deletions README.md
@@ -1,9 +1,9 @@
# lua-simdjson (WIP)
# lua-simdjson
[![Build Status](https://github.com/FourierTransformer/lua-simdjson/actions/workflows/ci.yml/badge.svg?branch=master)](https://github.com/FourierTransformer/lua-simdjson/actions?query=branch%3Amaster)

A basic lua binding to [simdjson](https://simdjson.org). The simdjson library is an incredibly fast JSON parser that uses SIMD instructions and fancy algorithms to parse JSON very quickly. It's been tested with LuaJIT 2.0/2.1 and Lua 5.1, 5.2, 5.3, and 5.4 on linux/osx. It has a general parsing mode and a lazy mode that uses a JSON pointer.
A basic Lua binding to [simdjson](https://simdjson.org). The simdjson library is an incredibly fast JSON parser that uses SIMD instructions and fancy algorithms to parse JSON very quickly. It's been tested with LuaJIT 2.0/2.1 and Lua 5.1, 5.2, 5.3, and 5.4 on linux/osx/windows. It has a general parsing mode and a lazy mode that uses a JSON pointer.

Current simdjson version: 0.5.0
Current simdjson version: 3.9.5

## Installation
If all the requirements are met, lua-simdjson can be installed via luarocks with:
@@ -15,28 +15,29 @@ Otherwise it can be installed manually by pulling the repo and running luarocks

## Requirements
* lua-simdjson only works on 64bit systems.
* a lua build environment with support for C++11
* a Lua build environment with support for C++11
* g++ version 7+ and clang++ version 6+ or newer should work!

## Parsing
There are two main ways to parse JSON in lua-simdjson:
1. With `parse`: this parses JSON and returns a lua table with the parsed values
1. With `parse`: this parses JSON and returns a Lua table with the parsed values
2. With `open`: this reads in the JSON and keeps it in simdjson's internal format. The values can then be accessed using a JSON pointer (examples below)

Both of these methods also have support to read files on disc with `parseFile` and `openFile` respectively. If handling JSON from disk, these methods should be used and are incredibly fast.

## Typing
* lua-simdjson uses `simdjson.null` to represent `null` values from parsed JSON.
* Any application should use that for comparison as needed.
* it uses `lua_pushnumber` and `lua_pushinteger` for JSON floats and ints respectively, so your lua version may handle that slightly differently.
* All other types map as expected.
* lua-simdjson uses `simdjson.null` to represent `null` values from parsed JSON.
* Any application should use that for comparison as needed.
* it uses `lua_pushnumber` and `lua_pushinteger` for JSON floats and ints respectively, so your Lua version may handle that slightly differently.
* `lua_pushinteger` uses signed ints. A number from JSON larger than `LUA_MAXINTEGER` will be represented as a float/number
* All other types map as expected.
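
The integer/float split described in the Typing notes can be observed from plain Lua. The sketch below does not call lua-simdjson at all; `number_kind` is a hypothetical helper name, and it assumes `math.type` is only present on Lua 5.3+ (older versions and LuaJIT fall back to a single "number" kind).

```lua
-- Illustrative only: plain Lua, no lua-simdjson calls.
-- number_kind is a hypothetical helper, not part of the library.
local function number_kind(n)
  if math.type then
    -- Lua 5.3+: distinguishes "integer" from "float"
    return math.type(n)
  end
  -- Lua 5.1/5.2/LuaJIT: there is only one number type
  return "number"
end

print(number_kind(1.2))

if math.maxinteger then
  -- The largest value that still fits in a Lua integer
  print(number_kind(math.maxinteger))
  -- Adding a float promotes the result to a float, which is the kind
  -- of value a JSON number above LUA_MAXINTEGER ends up as
  print(number_kind(math.maxinteger + 1.0))
end
```

Checking `math.type` on a parsed value is the same technique the new spec in this commit uses to verify the behaviour.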

### Parse some JSON
The `parse` methods will return a normal lua table that can be interacted with.
The `parse` methods will return a normal Lua table that can be interacted with.
```lua
local simdjson = require("simdjson")
local response = simdjson.parse([[
{
{
"Image": {
"Width": 800,
"Height": 600,
@@ -60,11 +61,11 @@ print(fileResponse["statuses"][1]["id"])
```

### Open some json
The `open` methods currently require the use of a JSON pointer, but are very quick.
The `open` methods currently require the use of a JSON pointer, but are very quick. They are best used when you only need a part of a response. In the example below, it could be useful for just getting the `Thumbnail` object with `:atPointer("/Image/Thumbnail")` which will then only create a Lua table with those specific values.
```lua
local simdjson = require("simdjson")
local response = simdjson.open([[
{
{
"Image": {
"Width": 800,
"Height": 600,
@@ -82,21 +83,21 @@ local response = simdjson.open([[
print(response:atPointer("/Image/Width"))

-- OR to parse a file from disk
local fileResponse = simdjson.open("jsonexamples/twitter.json")
local fileResponse = simdjson.openFile("jsonexamples/twitter.json")
print(fileResponse:atPointer("/statuses/0/id")) --using a JSON pointer

```
Starting with version 0.5.0, the the `atPointer` method is JSON pointer compliant. The previous pointer implementation is considered deprecated, but is still available with the `at` method.
Starting with version 0.5.0, the `atPointer` method is JSON pointer compliant. The previous pointer implementation is considered deprecated, but is still available with the `at` method.

The `open` and `parse` codeblocks should print out the same values. It's worth noting that the JSON pointer indexes from 0.
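
To make the 0-based versus 1-based difference concrete, here is a minimal sketch of resolving a JSON pointer against an ordinary Lua table. `resolve_pointer` and the sample `demo` table are hypothetical and written for illustration; this is not how lua-simdjson implements `atPointer`, which works on simdjson's internal format.

```lua
-- Hypothetical helper: walks a plain Lua table using a JSON pointer.
-- Not part of lua-simdjson; shown only to illustrate index translation.
local function resolve_pointer(doc, pointer)
  local current = doc
  for token in pointer:gmatch("/([^/]*)") do
    if type(current) ~= "table" then
      return nil
    end
    -- JSON pointer escapes: "~1" stands for "/", "~0" stands for "~"
    token = token:gsub("~1", "/"):gsub("~0", "~")
    if token:match("^%d+$") and current[token] == nil then
      -- JSON pointer array indexes start at 0, Lua arrays start at 1
      current = current[tonumber(token) + 1]
    else
      current = current[token]
    end
  end
  return current
end

-- Sample data made up for this sketch
local demo = {
  Image = {
    Width = 800,
    Height = 600,
    Thumbnail = { Height = 125 },
    IDs = { 116, 943, 234, 38793 },
  },
}

print(resolve_pointer(demo, "/Image/Width"))
print(resolve_pointer(demo, "/Image/IDs/1"))
```

With a table returned by `parse`, the second lookup corresponds to `response["Image"]["IDs"][2]`, whereas the pointer form uses index `1`.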

This lazy style of using the simdjson data structure could also be used with array access in the future, and would result in ultra-fast JSON "parsing".
This lazy style of using the simdjson data structure could also be used with array access in the future.

## Error Handling
lua-simdjson will error out with any errors from simdjson encountered while parsing. They are very good at helping identify what has gone wrong during parsing.
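
Because parse failures are raised as Lua errors, callers that want to keep running can wrap the call in `pcall`. The following is a sketch under assumptions: `safe_parse` is a hypothetical wrapper, and `stub_parse` with its error message is a made-up stand-in so the example runs without the module installed. In real use you would pass `simdjson.parse` instead.

```lua
-- safe_parse is a hypothetical wrapper, not part of lua-simdjson.
-- `parse` is any function that raises on bad input, e.g. simdjson.parse.
local function safe_parse(parse, text)
  local ok, result = pcall(parse, text)
  if ok then
    return result
  end
  -- On failure, `result` holds the error value raised by the parser
  return nil, tostring(result)
end

-- Stand-in parser with an invented error message, for illustration only
local function stub_parse(text)
  if text == "" then
    error("stub parse error: empty input")
  end
  return { raw = text }
end

local value, err = safe_parse(stub_parse, "")
if not value then
  print("parse failed: " .. err)
end
```

Returning `nil` plus a message follows the common Lua convention for recoverable failures, so the caller can decide whether to log, retry, or re-raise.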

## Benchmarks
I ran some benchmarks against lua-cjson, rapidjson, and dkjson. For each test, I loaded the JSON into memory, and then had the parsers go through each file 100 times and took the average time it took to parse to a lua table. You can see all the results in the [benchmark](benchmark/) folder. I've included a sample output run via Lua (the LuaJIT graph looks very similar, also in the benchmark folder). The y-axis is logarithmic, so every half step down is twice as fast.
I ran some benchmarks against lua-cjson, rapidjson, and dkjson. For each test, I loaded the JSON into memory, and then had the parsers go through each file 100 times and took the average time it took to parse to a Lua table. You can see all the results in the [benchmark](benchmark/) folder. I've included a sample output run via Lua (the LuaJIT graph looks very similar, also in the benchmark folder). The y-axis is logarithmic, so every half step down is twice as fast.

![Lua Performance Column Chart](benchmark/lua-perf.png)

@@ -109,13 +110,13 @@ All tested files are in the [jsonexamples folder](jsonexamples/).
lua-simdjson, like the simdjson library, performs better on more modern hardware. These benchmarks were run on a ninth-gen i7 processor. On an older processor, rapidjson may perform better.

## Caveats & Alternatives
* there is no encoding/dumping a lua table to JSON (yet! Most other lua JSON libraries can handle this)
* it only works on 64 bit systems (untested on Windows...)
* there is no encoding/dumping a Lua table to JSON (yet! Most other lua JSON libraries can handle this)
* it only works on 64 bit systems
* it builds a large binary. On a modern linux system, it ended up being \~200k (lua-cjson comes in at 42k)
* since it's an external module, it's not quite as easy to just grab the file and go (dkjson has you covered here!)

## Philosophy
I plan to keep it fairly inline with what the original simdjson library is capable of doing, which really means not adding too many additional options. The big _thing_ that's missing so far is encoding a lua table to JSON. I may add in an encoder at some point (likely modified from an existing lua library). There are some rumours that simdjson _may_ support creating JSON structure in the future. If that happens, I would likely switch to it.
I plan to keep it fairly inline with what the original simdjson library is capable of doing, which really means not adding too many additional options. The big _thing_ that's missing so far is encoding a lua table to JSON. I may add in an encoder at some point.

## Licenses
* The jsonexamples, src/simdjson.cpp, src/simdjson.h are unmodified from the released version simdjson under the Apache License 2.0.
@@ -1,8 +1,8 @@
package="lua-simdjson"
version="0.0.2-1"
version="0.0.3-1"
source = {
url = "git://github.com/FourierTransformer/lua-simdjson",
tag = "0.0.2"
tag = "0.0.3"
}
description = {
summary = "This is a simple Lua binding for simdjson",
115 changes: 87 additions & 28 deletions spec/compile_spec.lua
@@ -11,34 +11,93 @@ end


local files = {
"apache_builds.json",
"canada.json",
"citm_catalog.json",
"github_events.json",
"google_maps_api_compact_response.json",
"google_maps_api_response.json",
"gsoc-2018.json",
"instruments.json",
"marine_ik.json",
"mesh.json",
"mesh.pretty.json",
"numbers.json",
"random.json",
"repeat.json",
"twitter_timeline.json",
"update-center.json",
"small/adversarial.json",
"small/demo.json",
"small/flatadversarial.json",
"small/smalldemo.json",
"small/truenull.json"
"apache_builds.json",
"canada.json",
"citm_catalog.json",
"github_events.json",
"google_maps_api_compact_response.json",
"google_maps_api_response.json",
"gsoc-2018.json",
"instruments.json",
"marine_ik.json",
"mesh.json",
"mesh.pretty.json",
"numbers.json",
"random.json",
"repeat.json",
"twitter_timeline.json",
"update-center.json",
"small/adversarial.json",
"small/demo.json",
"small/flatadversarial.json",
"small/smalldemo.json",
"small/truenull.json"
}

describe("Make sure everything compiled correctly", function()
for _, file in ipairs(files) do
it("should parse the file: " .. file, function()
local fileContents = loadFile("jsonexamples/" .. file)
assert.are.same(cjson.decode(fileContents), simdjson.parse(fileContents))
end)
end
describe("Make sure it parses strings correctly", function()
for _, file in ipairs(files) do
it("should parse the file: " .. file, function()
local fileContents = loadFile("jsonexamples/" .. file)
local cjsonDecodedValues = cjson.decode(fileContents)
assert.are.same(cjsonDecodedValues, simdjson.parse(fileContents))
end)
end
end)

describe("Make sure it parses files correctly", function()
for _, file in ipairs(files) do
it("should parse the file: " .. file, function()
local fileContents = loadFile("jsonexamples/" .. file)
local cjsonDecodedValues = cjson.decode(fileContents)
assert.are.same(cjsonDecodedValues, simdjson.parseFile("jsonexamples/" .. file))
end)
end
end)

describe("Make sure json pointer works with a string", function()
it("should handle a string", function()
local fileContents = loadFile("jsonexamples/small/demo.json")
local decodedFile = simdjson.open(fileContents)
assert.are.same(800, decodedFile:atPointer("/Image/Width"))
assert.are.same(600, decodedFile:atPointer("/Image/Height"))
assert.are.same(125, decodedFile:atPointer("/Image/Thumbnail/Height"))
assert.are.same(943, decodedFile:atPointer("/Image/IDs/1"))
end)
end)

describe("Make sure json pointer works with openfile", function()
it("should handle opening a file", function()
local decodedFile = simdjson.openFile("jsonexamples/small/demo.json")
assert.are.same(800, decodedFile:atPointer("/Image/Width"))
assert.are.same(600, decodedFile:atPointer("/Image/Height"))
assert.are.same(125, decodedFile:atPointer("/Image/Thumbnail/Height"))
assert.are.same(943, decodedFile:atPointer("/Image/IDs/1"))
end)
end)

local major, minor = _VERSION:match('([%d]+)%.(%d+)')
if tonumber(major) >= 5 and tonumber(minor) >= 3 then
describe("Make sure ints and floats parse correctly", function ()
it("should handle decoding numbers appropriately", function()

local numberCheck = simdjson.parse([[
{
"float": 1.2,
"min_signed_integer": -9223372036854775808,
"max_signed_integer": 9223372036854775807,
"one_above_max_signed_integer": 9223372036854775808,
"min_unsigned_integer": 0,
"max_unsigned_integer": 18446744073709551615
}
]])

assert.are.same("float", math.type(numberCheck["float"]))
assert.are.same("integer", math.type(numberCheck["max_signed_integer"]))
assert.are.same("integer", math.type(numberCheck["min_signed_integer"]))
assert.are.same("float", math.type(numberCheck["one_above_max_signed_integer"]))
assert.are.same("integer", math.type(numberCheck["min_unsigned_integer"]))
assert.are.same("float", math.type(numberCheck["max_unsigned_integer"]))

end)
end)
end
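
The version gate above checks `major >= 5 and minor >= 3`, which is fine for every Lua release the README lists, but would skip a hypothetical version whose minor number is below 3 (for example "6.0"). A possible alternative, sketched here with the hypothetical helper name `supports_integers`, compares the major number first:

```lua
-- supports_integers is a hypothetical helper, not part of the spec file.
-- It reports whether a _VERSION-style string is Lua 5.3 or newer.
local function supports_integers(version)
  local major, minor = version:match("(%d+)%.(%d+)")
  major, minor = tonumber(major), tonumber(minor)
  if not major then
    return false
  end
  return major > 5 or (major == 5 and minor >= 3)
end

print(supports_integers(_VERSION))
```

Another option is to test for the feature directly with `math.type ~= nil`, which avoids parsing the version string at all.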