Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proto Error returned when using aetest context -- proto: invalid UTF-8 string #136

Closed
egag opened this issue May 12, 2018 · 22 comments
Closed
Assignees

Comments

@egag
Copy link

egag commented May 12, 2018

The error is set inside the Iterator on the call to datastore.Query.Run.
It is initially returned by the call to internal.Call line 534 in datastore/query.go
Which returns the error at the call to proto.Unmarshall line 534 (also coincidently lol) in internal/api.go
The error is: proto: invalid UTF-8 string

This occurs with the most recent version of github.com/golang/protobuf which was updated in response to this issue: golang/protobuf#484

The following code reproduces the issue:

func TestB() {
	inst, err := aetest.NewInstance(nil)
	if err != nil {
		panic(err.Error())
	}
	req, err := inst.NewRequest("GET", "/", nil)
	if err != nil {
		panic(err.Error())
	}
	ctx := appengine.NewContext(req)
	iter := datastore.NewQuery("link").KeysOnly().Run(ctx)
	_, err = iter.Cursor()
	if err != nil {
		panic(err.Error())
	}
}
@sbuss
Copy link
Contributor

sbuss commented May 17, 2018

Thank you for the bug report, I can reproduce this. This appears to be due to golang/protobuf#499, which changes how the protobuf library handles strings: they are now expected to be valid UTF-8 strings.

Unfortunately, the way we encode protos in other protos in the datastore library doesn't produce valid UTF-8 strings.

@sbuss
Copy link
Contributor

sbuss commented May 17, 2018

This is also the reason our tests on Travis just started failing.

@egag
Copy link
Author

egag commented May 17, 2018

Thanks for getting back to me on this, @sbuss!

As a temporary resolution, I have added this to our deploy scripts
pushd ${GOPATH}/src/github.com/golang/protobuf/ && git checkout ac606b176499 && popd

@dsnet
Copy link
Member

dsnet commented May 22, 2018

I'm not familiar with data store, but IIUC, it is the user who provides the proto message that is to be marshaled and unmarshaled.

For your use-case, are you able to switch the proto field type from string to bytes?

@dsnet
Copy link
Member

dsnet commented May 22, 2018

I just merged golang/protobuf#616, which should help identify which field is problematic.

@sbuss
Copy link
Contributor

sbuss commented May 22, 2018

Thanks, @dsnet. The error is now:

main_test.go:24: testB: proto: string field "appengine.CompiledQuery.PrimaryScan.index_name" contains invalid UTF-8

@dsnet
Copy link
Member

dsnet commented May 23, 2018

Is that a field where the proto type can be easily switch from string to bytes?

@sbuss
Copy link
Contributor

sbuss commented May 25, 2018

I have been experimenting with fixing this, and it looks like just replacing query.Encode() with any arbitrary string might be the right solution.

@egag Can you try this?

pushd $(dirname $(which gcloud))/../platform/google_appengine/google/appengine/datastore
sed -i "s/query.Encode()/'devindex'/" datastore_stub_util.py
popd
go get -u github.com/golang/protobuf

I'm not 100% sure that this will work, but I'd like to get feedback from a real-world use case.

@egag
Copy link
Author

egag commented May 25, 2018

@sbuss This did fix this error for me!

Unfortunately, now I get this error:
proto: string field "appengine.PropertyValue.StringValue" contains invalid UTF-8

Is this happening because of a particular field on the Query?

That did fix the error from the query though.

@lukebjerring
Copy link

I'm also seeing proto: string field "appengine.CompiledQuery.PrimaryScan.index_name" contains invalid UTF-8 inside my Docker image (FROM gcr.io/gcp-runtimes/go1-builder:1.10), but the error does not occur outside Docker (on both macOS and Linux).

Anyone know what aetest and/or dev_appserver.py will be missing in terms of environment that will make the difference? I tried this:

RUN apt-get install -y locales
RUN locale-gen en_US.UTF-8

ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV PYTHONIOENCODING=UTF-8

but made no difference.

@sbuss
Copy link
Contributor

sbuss commented May 25, 2018

@egag Well, that's progress :) I'm pretty sure that if you use #141 the error will disappear:

pushd $GOPATH/src/google.golang.org/appengine && git fetch && git checkout fixdatastore && popd

If that's the case, then that's going to make fixing it in prod a little more difficult for me, but at least I'll know for sure that that's the right approach.

@sbuss
Copy link
Contributor

sbuss commented May 25, 2018

@lukebjerring I think that's happening because your system's github.com/golang/protobufis out of date, but the docker image is using the latest protobuf package.

The error message does make it sound like it's a locale issue, but it's actually because protobuf started validating all string fields to be valid UTF-8, which an encoded protobuf message cast to a string is not.

@egag
Copy link
Author

egag commented May 25, 2018

@sbuss I will give that a try. I really appreciate your help with this!

@egag
Copy link
Author

egag commented May 25, 2018

@sbuss proto.Marshal at line 484 in internal/api.go returns proto: string field "appengine.PropertyValue.StringValue" contains invalid UTF-8 when marshaling binary data from a byte slice
the error is returned from XXX_Marshal at line 2625 in github.com/golang/protobuf/proto/tablemarshal.go

@sbuss
Copy link
Contributor

sbuss commented May 25, 2018

@egag unfortunately, you can no longer store binary data in the StringValue.

You might be doing something like this:

m, err := proto.Marshal(...)
s = string(m)

But now you need to do this:

s = proto.MarshalTextString(...)

My biggest concern is that you've stored invalid UTF-8 strings in datastore which you'll be unable to retrieve using the latest protobuf package. This is one of the things I'm investigating as fallout from this change. I may have to ask the protobuf team to undo the change :\

@egag
Copy link
Author

egag commented May 25, 2018

@sbuss Interesting. That's unfortunate. Is binary data no longer supported? It's still listed in the docs

Or is there a more appropriate way to store binary data than in a slice of bytes? This error is occurring when I use datastore.Put for an entity with a property of type Data []byte that holds the binary.

@egag
Copy link
Author

egag commented May 25, 2018

@lukebjerring try adding this to your docker file (after downloading all dependencies):
RUN pushd ${GOPATH}/src/github.com/golang/protobuf/ && git checkout ac606b176499 && popd

@lukebjerring
Copy link

@egag thanks, did something similar:

 go_medium_test: go_deps dev_appserver_deps
   # Hack to work around https://github.com/golang/appengine/issues/136
   cd $(GOPATH)/src/github.com/golang/protobuf; git checkout ac606b1
   cd $(WPTD_GO_PATH); go test -tags=medium -v $(FLAGS) ./...

@sbuss
Copy link
Contributor

sbuss commented May 25, 2018

@egag Can you show me what your code does in as much detail as you're able? I suspect it's something akin to the following, but I want to be sure.

k := datastore.NewKey(ctx, "Data", "invalidbytes", 0, nil)
b := "\xa0\x01"  // \xa0 is an invalid utf-8 byte
_, err := datastore.Put(ctx, k, b)

This indicates we might actually be able to fix this instead of pushing back against protobuf, but I need to sync up with someone on the datastore team to be sure.

@egag
Copy link
Author

egag commented May 25, 2018

@sbuss Great! This should reproduce the error:

	key := datastore.NewKey(ctx, "data", "data1", 0, nil)
	data := &struct{ Data []byte }{Data: []byte("\xa0\x01")}  // Data: []byte("abc") works fine
	_, err := datastore.Put(ctx, key, data)
	if err != nil {
		panic(err.Error())
	}

I think it's because the byte slice is treated as a string under the covers.

@sbuss
Copy link
Contributor

sbuss commented May 25, 2018

Thanks for that code snippet @egag! I'm closing this issue and replacing it with #143 now that the dev_appserver issue has been addressed (fix to be released in a couple weeks) and to reduce on the noise in the issue.

@sbuss sbuss closed this as completed May 25, 2018
@egag
Copy link
Author

egag commented May 25, 2018

Much appreciated! For reference/if others run into this, this fixed the original issue:

pushd $(dirname $(which gcloud))/../platform/google_appengine/google/appengine/datastore
sed -i "s/query.Encode()/'devindex'/" datastore_stub_util.py
popd
go get -u github.com/golang/protobuf

Hexcles added a commit to web-platform-tests/wpt.fyi that referenced this issue Jul 27, 2018
golang/appengine#136 has been fixed so we
don't need the hack.

This change only stops to pin protobuf in new Go workspace. If your
protobuf pkg has already been pinned, you'd need to manually:

cd $GOPATH/src/github.com/golang/protobuf && git checkout master && git pull

Or if you use Docker, simply remove the Docker instance.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants