Add unsafe package to perform zero-allocation conversion from string to byte slice #49

xichen2020 · 2017-03-14T22:46:04Z

cc @kobolog @robskillington @jeromefroe @cw9 @prateek

This PR adds the unsafe package to perform zero-allocation conversion from string to byte slices, which will significantly reduce GC when we need to encode string ids as byte slices.

…to byte slice

coveralls · 2017-03-14T22:46:58Z

Coverage increased (+0.01%) to 85.033% when pulling c84f0c5 on xichen-unsafe-string-byteslice-conversion into 2adc966 on master.

kobolog · 2017-03-14T23:12:16Z

unsafe/string.go

+	if len(s) == 0 {
+		return nil
+	}
+	sh := (*reflect.StringHeader)(unsafe.Pointer(&s))


You actually have to do all casts (string -> StringHeader -> SliceHeader -> []byte) in the same expression, because currently after you cast s to StringHeader, a potential GC from the future version of Go could preempt the goroutine and move the pointer in s, while sh's uintptr will be left unchanged.

This may not actually come to fruition. There's a proposal to add two new types unsafe.String and unsafe.Slice to replace reflect.StringHeader and reflect.SliceHeader. The Data field of the new types would be unsafe.Pointer instead of uintptr and unsafe.Pointer types will be updated by GC from what I can gather on this thread . The proposal is marked Go2 though so I gather it's not very high on the list of priorities. With that being said, I also don't know if Go will have a compacting GC anytime soon.

jeromefroe · 2017-03-14T23:51:29Z

unsafe/string.go

+	// where we assign the data pointer of the string header to the data
+	// pointer of the slice header to make sure the underlying bytes don't
+	// get GC'ed before the assignment happens.
+	return *(*[]byte)(unsafe.Pointer(&reflect.SliceHeader{


The unsafe package states: "In general, reflect.SliceHeader and reflect.StringHeader should be used only as *reflect.SliceHeader and *reflect.StringHeader pointing at actual slices or strings, never as plain structs. A program should not declare or allocate variables of these struct types." And they offer the following pattern as being a valid way to update a string's Data field:

var s string hdr := (*reflect.StringHeader)(unsafe.Pointer(&s)) // case 1 hdr.Data = uintptr(unsafe.Pointer(p)) // case 6 (this case) hdr.Len = n

Consequently, I think we may be better off doing something of the form:

var b []byte byteHeader := (*reflect.SliceHeader)(unsafe.Pointer(&b)) byteHeader.Data = (*reflect.StringHeader)(unsafe.Pointer(&s)).Data l := len(s) byteHeader.Len = l byteHeader.Cap = l return b

This way we:

Address Andrey's comment above by setting the Data field of the byte slice to be the same as the Data field of the string in one expression.

Ensure s is live after updating the Data fields by calling len(s).

My comment still stays here, since there's a gap between reading the pointer from string header and writing a pointer to slice header, which potentially in the future can be used by the GC to preempt the goroutine. You literally need to read and write in the same expression, e.g.:

return *(*[]byte)(unsafe.Pointer(&reflect.SliceHeader{ Data: (*reflect.StringHeader)(unsafe.Pointer(&s)).Data, Cap: len(s), Len: len(s), }))

Also I'd like to point out that since we dynamically create a slice here, its memory won't be tracked by the GC and if the "underlying" string gets collected at any point in time while the slice is still being used, then the slice will end up having a dangling reference leading to all kinds of weird behavior.

I fail to see how you're comment applies here, why would Data: (*reflect.StringHeader)(unsafe.Pointer(&s)).Data, be okay but not byteHeader.Data = (*reflect.StringHeader)(unsafe.Pointer(&s)).Data? It appears equivalent to your example to me since afterwards we have a byte slice pointing to the memory.

I also don't see why the memory wouldn't be tracked by GC. When the GC looks at the root pointers on the stack it will see a byte slice pointing to the memory that used to be pointed to by the string.

What I mean to say is that if the GC can interrupt the load and subsequent store in the expression byteHeader.Data = (*reflect.StringHeader)(unsafe.Pointer(&s)).Data then surely it could do the same when we create the struct and do Data: (*reflect.StringHeader)(unsafe.Pointer(&s)).Data,.

To clarify, my comment about dangling pointers applies to all examples and is valid even in the light of the documentation code snippet provided by Jerome since p there (on line 3) is a valid pointer all along, unlike sh.Data in our examples.
My comment about GC tracking memory after the string gets collected applies to the original code, not the proposed changes where we start with allocating a real []byte.

I agree that there is a problem in the xxhash implementation, I offered it just to highlight the fact that it takes the approach of allocating a real byte slice (i.e. var b []byte) and not allocating a byte slice through the use of a SliceHeader.

I also agree about the possibility of a dangling pointer, but it will only be an issue for a GC that moves memory and updates references. There is already a proposal to introduce a new unsafe representation for slices and strings whose Data field would be an unsafe.Pointer instead of an uintptr. If that proposal lands before a compacting GC is introduced then we can update this package to use those new types and won't run into this issue even with a compacting GC. Needless to say, we'll need to be vigilant during new Go releases.

Hmm, actually, on second thought, we should be fine even if the GC begins moving references with this approach because what we are doing is using the reflect Header structs to update the internal unsafe.Pointers in the runtime representation of strings and slices.

I don't see why we'd have a dangling pointer. If GC relocates the string's underlying memory, it'll need to fix up the str field in the stringStruct object on the stack:

type stringStruct struct { str unsafe.Pointer len int }

If that's the case, it means it'll automatically fix up the Data pointer in the string header because the string header pointer is literally just reinterpreting cast the stringStruct pointer and points to the exact same location on the stack where the stringStruct variable sits.

… and tests

coveralls · 2017-03-15T18:03:12Z

Coverage increased (+0.05%) to 85.072% when pulling a6ce31f on xichen-unsafe-string-byteslice-conversion into 2adc966 on master.

coveralls · 2017-03-15T18:03:12Z

Coverage increased (+0.05%) to 85.072% when pulling a6ce31f on xichen-unsafe-string-byteslice-conversion into 2adc966 on master.

robskillington

LGTM

Add unsafe package to perform zero-allocation conversion from string …

c84f0c5

…to byte slice

kobolog suggested changes Mar 14, 2017

View reviewed changes

jeromefroe reviewed Mar 14, 2017

View reviewed changes

Fix dangling pointer and GC relocation issue along with more comments…

a6ce31f

… and tests

kobolog approved these changes Mar 15, 2017

View reviewed changes

robskillington approved these changes Mar 15, 2017

View reviewed changes

xichen2020 merged commit bf13462 into master Mar 15, 2017

xichen2020 deleted the xichen-unsafe-string-byteslice-conversion branch March 15, 2017 21:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add unsafe package to perform zero-allocation conversion from string to byte slice #49

Add unsafe package to perform zero-allocation conversion from string to byte slice #49

xichen2020 commented Mar 14, 2017

coveralls commented Mar 14, 2017 •

edited

Loading

kobolog Mar 14, 2017

jeromefroe Mar 14, 2017 •

edited

Loading

jeromefroe Mar 14, 2017

kobolog Mar 15, 2017

kobolog Mar 15, 2017

jeromefroe Mar 15, 2017

jeromefroe Mar 15, 2017

kobolog Mar 15, 2017

jeromefroe Mar 15, 2017

jeromefroe Mar 15, 2017

jeromefroe Mar 15, 2017

xichen2020 Mar 15, 2017

coveralls commented Mar 15, 2017

coveralls commented Mar 15, 2017

robskillington left a comment

Add unsafe package to perform zero-allocation conversion from string to byte slice #49

Add unsafe package to perform zero-allocation conversion from string to byte slice #49

Conversation

xichen2020 commented Mar 14, 2017

coveralls commented Mar 14, 2017 • edited Loading

Choose a reason for hiding this comment

jeromefroe Mar 14, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coveralls commented Mar 15, 2017

coveralls commented Mar 15, 2017

robskillington left a comment

Choose a reason for hiding this comment

coveralls commented Mar 14, 2017 •

edited

Loading

jeromefroe Mar 14, 2017 •

edited

Loading