feat: byte slice JSON parser #1415

Merged
merged 80 commits into from
Mar 29, 2024
Changes from 46 commits
f1308d3
numberKind byte slice parser
notJoon Dec 6, 2023
c30634c
remove unused packages
notJoon Dec 6, 2023
7b0e6c2
add utf16 package
notJoon Dec 7, 2023
cf0655e
add escape handler
notJoon Dec 7, 2023
dbeb921
parse float
notJoon Dec 8, 2023
d39ad52
basic lexer
notJoon Dec 11, 2023
1d2a600
update lexer
notJoon Dec 11, 2023
26f9c70
key position range finder
notJoon Dec 12, 2023
32760c4
find multiple key positions in JSON
notJoon Dec 12, 2023
1de7f0c
parsing Integer and float values
notJoon Dec 12, 2023
e19a06d
number parser refactoring
notJoon Dec 12, 2023
9ba8808
parse primitive types
notJoon Dec 12, 2023
534f0f7
create searchKeys
notJoon Dec 13, 2023
f12375d
get type value
notJoon Dec 13, 2023
6868c10
parse array
notJoon Dec 13, 2023
1a1657f
remove Lexer struct
notJoon Dec 13, 2023
39bfa64
refactor
notJoon Dec 13, 2023
33ccb14
re-refactor
notJoon Dec 13, 2023
789abfd
parse uint value
notJoon Dec 13, 2023
7de39d5
revert
notJoon Dec 14, 2023
cc136d5
JSON PoC
notJoon Dec 15, 2023
7e9ccb8
remove unused errors
notJoon Dec 15, 2023
b69eadb
flatten
notJoon Dec 15, 2023
6e72466
investigate flattening
notJoon Dec 15, 2023
83a58eb
refactor
notJoon Dec 15, 2023
57916bc
type assertion in flatten
notJoon Dec 17, 2023
5a54e9a
save
notJoon Dec 19, 2023
5f138b6
key extract
notJoon Dec 20, 2023
ebef5d5
re-organize the files
notJoon Dec 20, 2023
2b35743
save
notJoon Dec 20, 2023
40faddb
fix array
notJoon Dec 23, 2023
fbc13fc
struct parser
notJoon Jan 4, 2024
7041cfc
struct parser finish
notJoon Jan 4, 2024
9fee201
marshaling
notJoon Jan 8, 2024
16e9cf3
handle type formatting
notJoon Jan 8, 2024
e5964bd
handle type formatting
notJoon Jan 8, 2024
d3d0ff7
fmt
notJoon Jan 8, 2024
fffe9d1
add more types for marshaling
notJoon Jan 10, 2024
188e39c
define dynamic structs and generate JSON
notJoon Jan 11, 2024
cc0c1ed
reduce memory alloc when marshal
notJoon Jan 12, 2024
3135854
CRUD for struct instance
notJoon Jan 16, 2024
4f93bc2
add output
notJoon Jan 16, 2024
e8a8251
Merge branch 'master' into json
notJoon Jan 16, 2024
7a62f93
add basic linter and example
notJoon Jan 17, 2024
807d77d
add lookup table
notJoon Jan 22, 2024
6fd80ce
Merge branch 'master' into json
notJoon Feb 6, 2024
a49efbb
change fmt package into p/demo/ufmt
notJoon Feb 11, 2024
95e41ed
regex as global
notJoon Feb 11, 2024
ee9eec9
fmt
notJoon Feb 11, 2024
db0c8ab
fixup
notJoon Feb 11, 2024
48049c2
create EachKey
notJoon Feb 12, 2024
ecbc853
Merge branch 'master' into json
notJoon Feb 12, 2024
d8c6498
fixup
notJoon Feb 15, 2024
ac77d71
fmt
notJoon Feb 15, 2024
680f152
create error code
notJoon Feb 19, 2024
e8ab3cf
basic buffer
notJoon Feb 19, 2024
2ab0eed
License
notJoon Feb 19, 2024
e3298e3
json path parser
notJoon Feb 19, 2024
ecaab8a
json path token handler
notJoon Feb 21, 2024
2faabe2
add test case
notJoon Feb 22, 2024
f7716c4
save
notJoon Feb 22, 2024
2c6fc6e
Merge branch 'master' into json
notJoon Feb 28, 2024
de817ba
fmt and mv license
notJoon Mar 2, 2024
29224d9
state machine decoder and some node getters
notJoon Mar 5, 2024
42b4a15
rewrite JSON code to avoid struct
notJoon Mar 12, 2024
03a9d17
use ryu algorithm to format float and reorganize the structure
notJoon Mar 12, 2024
cc4cd3e
tidy
notJoon Mar 12, 2024
2218981
fix DeleteIndex
notJoon Mar 13, 2024
faf9890
Improve determining int and float type
notJoon Mar 13, 2024
33ab79a
resolve conflict
notJoon Mar 13, 2024
a688fc5
Merge branch 'master' into json
notJoon Mar 13, 2024
a54cfab
remove utf16 package from json
notJoon Mar 13, 2024
c999121
Merge branch 'master' into json
notJoon Mar 28, 2024
88e93cc
refactor
notJoon Mar 28, 2024
8bffa48
some lint
notJoon Mar 28, 2024
aee3526
save
notJoon Mar 29, 2024
447718a
Merge branch 'master' into json
notJoon Mar 29, 2024
bd14f50
refactor and update README
notJoon Mar 29, 2024
c693ccc
Merge branch 'master' into json
notJoon Mar 29, 2024
209a754
typo
notJoon Mar 29, 2024
2 changes: 1 addition & 1 deletion docs/reference/go-gno-compatibility.md
@@ -258,7 +258,7 @@ Legend:
| time | `full`[^7] |
| time/tzdata | `tbd` |
| unicode | `full` |
-| unicode/utf16 | `tbd` |
+| unicode/utf16 | `full` |
| unicode/utf8 | `full` |
| unsafe | `nondet` |

22 changes: 22 additions & 0 deletions examples/gno.land/p/demo/json/const.gno
@@ -0,0 +1,22 @@
package json

const (
SupplementalPlanesOffset = 0x10000
HighSurrogateOffset = 0xD800
LowSurrogateOffset = 0xDC00

SurrogateEnd = 0xDFFF
BasicMultilingualPlaneOffset = 0xFFFF

BadHex = -1

unescapeStackBufSize = 64
)

const (
absMinInt64 = 1 << 63
maxInt64 = 1<<63 - 1
maxUint64 = 1<<64 - 1
intSize = 32 << (^uint(0) >> 63)
IntSize = intSize
)
912 changes: 912 additions & 0 deletions examples/gno.land/p/demo/json/eisel_lemire.gno

Large diffs are not rendered by default.

56 changes: 56 additions & 0 deletions examples/gno.land/p/demo/json/errors.gno
@@ -0,0 +1,56 @@
package json

import "errors"

var (
// KeyPathNotFoundError occurs when a specified key path does not exist in the JSON structure.
KeyPathNotFoundError error = errors.New("JSON Error: key path not found in the JSON structure")

// ArrayIndexNotFound occurs when the specified index is beyond the range of the array.
ArrayIndexNotFound = errors.New("JSON Error: array index not found or out of range")

// TokenNotFound occurs when a particular token (expected as part of the structure) is not found.
TokenNotFound = errors.New("JSON Error: expected token not found in the JSON input")

// KeyLevelNotMatched occurs when the key levels do not match the expected structure or depth.
KeyLevelNotMatched = errors.New("JSON Error: key level does not match the expected structure or depth")

// Overflow occurs when a number in the JSON exceeds the range that can be handled.
Overflow = errors.New("JSON Error: numeric value exceeds the range limit")

// EmptyBytes occurs when the JSON input is empty or has no content.
EmptyBytes = errors.New("JSON Error: empty bytes: the JSON input is empty or has no content")

// InvalidArrayIndex occurs when the index used for an array is not an integer or out of valid range.
InvalidArrayIndex = errors.New("JSON Error: invalid array index: index should be an integer and within the valid range")

// InvalidExponents occurs when there's an error related to the format or range of exponents in numbers.
InvalidExponents = errors.New("JSON Error: invalid format or range of exponents in a numeric value")

// NonDigitCharacters occurs when there are non-digit characters where a number is expected.
NonDigitCharacters = errors.New("JSON Error: non-digit characters found where a number is expected")

// MultipleDecimalPoints occurs when a number has more than one decimal point.
MultipleDecimalPoints = errors.New("JSON Error: multiple decimal points found in a number")

// MalformedType occurs when the type of a value does not match the expected type.
MalformedType = errors.New("JSON Error: malformed type: the type of the value does not match the expected type")

// MalformedString occurs when a string is improperly formatted, like unescaped characters or incorrect quotes.
MalformedString = errors.New("JSON Error: malformed string: improperly formatted string, check for unescaped characters or incorrect quotes")

// MalformedValue occurs when a value does not conform to the expected format or structure.
MalformedValue = errors.New("JSON Error: malformed value: the value does not conform to the expected format or structure")

// MalformedObject occurs when a JSON object is improperly formatted.
MalformedObject = errors.New("JSON Error: malformed object: the JSON object is improperly formatted or structured")

// MalformedArray occurs when a JSON array is improperly formatted.
MalformedArray = errors.New("JSON Error: malformed array: the JSON array is improperly formatted or structured")

// MalformedJson occurs when the entire JSON structure is improperly formatted or structured.
MalformedJson = errors.New("JSON Error: malformed JSON: the entire JSON structure is improperly formatted or structured")

// UnknownValueType occurs when the JSON contains a value of an unrecognized or unsupported type.
UnknownValueType = errors.New("JSON Error: unknown value type: the value type is unrecognized or unsupported")
)
162 changes: 162 additions & 0 deletions examples/gno.land/p/demo/json/escape.gno
@@ -0,0 +1,162 @@
package json

import (
"bytes"
"errors"
"unicode/utf8"
)

// unescape takes an input byte slice, processes it to unescape certain characters,
// and writes the result into an output byte slice.
//
// It returns the processed slice and any error encountered during the unescape operation.
func unescape(input, output []byte) ([]byte, error) {
// find the index of the first backslash in the input slice.
firstBackslash := bytes.IndexByte(input, BackSlashToken)
if firstBackslash == -1 {
return input, nil
}

// ensure the output slice has enough capacity to hold the result.
if cap(output) < len(input) {
output = make([]byte, len(input))
} else {
// if the capacity is sufficient, slice the output to the length of the input.
output = output[0:len(input)]
}

copy(output, input[:firstBackslash])
input = input[firstBackslash:]
buf := output[firstBackslash:]

for len(input) > 0 {
// inLen is the number of bytes consumed in the input
// bufLen is the number of bytes written to buf.
if inLen, bufLen := processEscapedUTF8(input, buf); inLen == -1 {
return nil, errors.New("JSON Error: Encountered an invalid escape sequence in a string.")
} else {
input = input[inLen:]
buf = buf[bufLen:]
}

// find the next backslash in the remaining input.
nextBackslash := bytes.IndexByte(input, BackSlashToken)
if nextBackslash == -1 {
copy(buf, input)
buf = buf[len(input):]
break
} else {
copy(buf, input[:nextBackslash])
buf = buf[nextBackslash:]
input = input[nextBackslash:]
}
}

return output[:len(output)-len(buf)], nil
}

// isSurrogatePair reports whether the rune is a UTF-16 surrogate code point,
// i.e. whether it lies in the range U+D800 to U+DFFF.
//
// Surrogate pairs are used in UTF-16 encoding to encode characters
// outside the Basic Multilingual Plane (BMP).
func isSurrogatePair(r rune) bool {
return HighSurrogateOffset <= r && r <= SurrogateEnd
}

// combineSurrogates reconstructs the original Unicode code point in the
// supplementary planes by combining the high and low surrogates.
//
// The high surrogate is in the range U+D800 to U+DBFF,
// and the low surrogate is in the range U+DC00 to U+DFFF.
//
// The formula to combine the surrogates is:
// (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000
func combineSurrogates(high, low rune) rune {
return ((high - HighSurrogateOffset) << 10) + (low - LowSurrogateOffset) + SupplementalPlanesOffset
}

// decodeSingleUnicodeEscape decodes a single Unicode escape sequence (e.g., \uXXXX) into a rune.
func decodeSingleUnicodeEscape(b []byte) (rune, bool) {
if len(b) < 6 {
return utf8.RuneError, false
}

// convert each hex digit to its integer value
h1, h2, h3, h4 := h2i(b[2]), h2i(b[3]), h2i(b[4]), h2i(b[5])
if h1 == BadHex || h2 == BadHex || h3 == BadHex || h4 == BadHex {
return utf8.RuneError, false
}

return rune(h1<<12 + h2<<8 + h3<<4 + h4), true
}

// decodeUnicodeEscape decodes a Unicode escape sequence from a byte slice.
func decodeUnicodeEscape(b []byte) (rune, int) {
r, ok := decodeSingleUnicodeEscape(b)
if !ok {
return utf8.RuneError, -1
}

// determine valid unicode escapes within the BMP
if r <= BasicMultilingualPlaneOffset && !isSurrogatePair(r) {
return r, 6
}

// Decode the following escape sequence to verify a UTF-16 surrogate pair.
r2, ok := decodeSingleUnicodeEscape(b[6:])
if !ok {
return utf8.RuneError, -1
}

if r2 < LowSurrogateOffset || r2 > SurrogateEnd {
return utf8.RuneError, -1
}

return combineSurrogates(r, r2), 12
}

var escapeByteSet = [256]byte{
'"': DoublyQuoteToken,
'\\': BackSlashToken,
'/': SlashToken,
'b': BackSpaceToken,
'f': FormFeedToken,
'n': NewLineToken,
'r': CarriageReturnToken,
't': TabToken,
}

// processEscapedUTF8 processes the escape sequence at the start of the given byte slice
// and converts it to UTF-8 characters. The function returns the length of the processed input and output.
//
// The input 'in' must contain the escape sequence to be processed,
// and 'out' provides space to store the converted characters.
//
// The function returns (input length, output length) if the escape sequence is correct.
// Unicode escape sequences (e.g. \uXXXX) are decoded to UTF-8; other default escape sequences are
// converted to their corresponding special characters (e.g. \n -> newline).
//
// If the escape sequence is invalid, or if 'in' does not completely enclose the escape sequence,
// the function returns (-1, -1) to indicate an error.
func processEscapedUTF8(in, out []byte) (inLen int, outLen int) {
if len(in) < 2 || in[0] != BackSlashToken {
return -1, -1
}

escapeSeqLen := 2
escapeChar := in[1]
if escapeChar == 'u' {
if r, size := decodeUnicodeEscape(in); size != -1 {
outLen = utf8.EncodeRune(out, r)
return size, outLen
}
} else {
val := escapeByteSet[escapeChar]
if val != 0 {
out[0] = val
return escapeSeqLen, 1
}
}

return -1, -1
}