add option to not parse beyond end of structure #435

Merged (4 commits), Jun 9, 2022
42 changes: 42 additions & 0 deletions codec/dagjson/nongreedy_test.go
@@ -0,0 +1,42 @@
package dagjson

import (
	"bytes"
	"testing"

	"github.com/ipld/go-ipld-prime/datamodel"
	"github.com/ipld/go-ipld-prime/node/basicnode"
)

func TestNonGreedy(t *testing.T) {
	// Two JSON objects back to back, with no separator between them.
	buf := bytes.NewBufferString(`{"a": 1}{"b": 2}`)
	opts := DecodeOptions{
		ParseLinks:         false,
		ParseBytes:         false,
		DontParseBeyondEnd: true,
	}
	// The first decode must stop right after the first object.
	nb1 := basicnode.Prototype.Map.NewBuilder()
	err := opts.Decode(nb1, buf)
	if err != nil {
		t.Fatalf("first decode (%v)", err)
	}
	n1 := nb1.Build()
	if n1.Kind() != datamodel.Kind_Map {
		t.Errorf("expecting a map")
	}
	if _, err := n1.LookupByString("a"); err != nil {
		t.Fatalf("missing first key")
	}
	// The second decode picks up where the first one stopped.
	nb2 := basicnode.Prototype.Map.NewBuilder()
	err = opts.Decode(nb2, buf)
	if err != nil {
		t.Fatalf("second decode (%v)", err)
	}
	n2 := nb2.Build()
	if n2.Kind() != datamodel.Kind_Map {
		t.Errorf("expecting a map")
	}
	if _, err := n2.LookupByString("b"); err != nil {
		t.Fatalf("missing second key")
	}
}
10 changes: 10 additions & 0 deletions codec/dagjson/unmarshal.go
@@ -32,6 +32,13 @@ type DecodeOptions struct {
	// If true, parse DAG-JSON `{"/":{"bytes":"base64 bytes..."}}` as a Bytes kind
	// node rather than nested plain maps
	ParseBytes bool

	// If true, the decoder stops reading from the stream at the end of the JSON
	// structure, i.e. it does not slurp remaining whitespace and EOF.
	// If false (the default), as per standard IPLD behavior, the parser considers
	// the entire block to be part of the JSON structure and will error if there
	// is extraneous non-whitespace data.
	DontParseBeyondEnd bool
}

// Decode deserializes data from the given io.Reader and feeds it into the given datamodel.NodeAssembler.
@@ -43,6 +50,9 @@ func (cfg DecodeOptions) Decode(na datamodel.NodeAssembler, r io.Reader) error {
	if err != nil {
		return err
	}
	if cfg.DontParseBeyondEnd {
		return nil
	}
Comment on lines +53 to +55

@petar IIUC this is perhaps a nice config option, but only slightly helps fix the problem you're likely currently worrying about (i.e. streaming decodes of multiple dag-json objects for Reframe using Edelweiss).

Since the specific application format (in this case Reframe's HTTP+dag-json transport) has its own way of dealing with concatenating dag-json blobs (in this case appending \n) you're going to need some custom code anyhow to parse the \n.

At that point is it so different from just using Unmarshal(na, json.NewDecoder(r), cfg) directly and then trying to slurp up one more \n before continuing instead of using Decode?

Contributor Author

The issue with this proposal is that json.NewDecoder won't decode bytes and other IPLD-specific objects. Right?

The solution to Edelweiss' problem could be composed of two steps:

  • this PR, which makes sure nothing is slurped after the JSON object, and
  • a manual reader.ReadChar, following every invocation of ipld.StreamingUnmarshal(dagjson.Decoder, ...) in the Edelweiss-generated code for reading multiple \n-separated results.

I just wanted to flag more concretely that this PR just allows for using:

DecodeOptions{ParseLinks: true, ParseBytes: true, DontParseBeyondEnd: true}.Decode(na, r)

rather than:

Unmarshal(na, json.NewDecoder(r), DecodeOptions{ParseLinks: true, ParseBytes: true}).

Is this config mostly about helping discoverability so people know how to do this (i.e. is the Unmarshal code path not obvious enough)?

Contributor Author

It turns out that Unmarshal is deprecated.

// Slurp any remaining whitespace.
// This behavior may be due for review.
// (This is relevant if our reader is tee'ing bytes to a hasher, and