Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sstable: add (*sstable.Writer).RangeKey{Set,Unset,Delete} methods #1445

Merged
merged 1 commit into from
Jan 11, 2022

Conversation

nicktrav
Copy link
Contributor

@nicktrav nicktrav commented Jan 7, 2022

Users of sstable.Writer other than Pebble itself (e.g. Cockroach)
require a means of adding range keys to an sstable that is slightly more
flexible than the existing AddRangeKey method in the way in which key
spans can be added to the table.

Specifically, AddRangeKey requires that spans be sorted (start / end
key ascending, sequence number and key kind descending), in addition to
fragmented and coalesced (i.e. multiple suffixes within the same span
are coalesced into a single value).

Add the RangeKey{Set,Unset,Delete} methods on the sstable.Writer
that allow adding spans in order of start key. These methods have no
requirement that the spans be fragmented or coalesced. Instead, the
implementation handles the fragmenting of spans as keys are added. A
keyspan.Fragmenter emits completed spans into a rangekey.coalescer,
which aggregates the fragments for the same span into a
rangekey.CoalescedSpan, which can then be used to write the individual
RangeKey{Set,Unset,Delete} key / value pairs into the rang key block
in the sstable.

The new RangeKey{Set,Unset,Delete} methods have similar semantics to
the existing Set,Delete,DeleteRange,Merge methods that write the key /
value pairs at sequence number zero. In the range key case, these new
methods are intended to be used by external callers when preparing
sstables for ingestion.

Add a data-driven test specific to the new range key methods. These
tests are separate to the existing data driven test cases for the
AddRangeKey method, which has stricter requirements on how keys are
added (i.e. already ordered, fragmented and coalesced).

Rename AddInternalRangeKey to AddRangeKey to better match the
existing naming of (*sstable.Writer).Add.

Add new utility methods to the rangekey package to aid in encoding
range key values.

Related to #1339.


Some open questions:

  • Should the API allow for passing in multiple suffix / value pairs with the same start / end key - currently the caller would need to make multiple calls.
  • Safety guarantees of passing byte slices to the new functions. Specifically, we only perform a copy when a span is finalized. Depending on how the caller re-uses any byte slices, the underlying values in the fragmenter may change.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor Author

@nicktrav nicktrav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 6 files reviewed, 3 unresolved discussions


internal/rangekey/rangekey.go, line 73 at r1 (raw file):

// SuffixValues, when encoded for a RangeKeySet. It may be used to construct a
// buffer of the appropriate size before encoding.
func EncodedSetSuffixValuesLen(suffixValues []SuffixValue) int {

I found myself doing a lot of gymnastics with just the suffix values (i.e. the value without the end key encoded). I split these functions out.


sstable/writer.go, line 398 at r1 (raw file):

// not required to be fragmented.
func (w *Writer) RangeKeySet(start, end, suffix, value []byte) error {
	// FIXME(travers): Should the API allow for adding multiple (suffix, value)

Open question: is this sufficient? Or should we support allowing the caller to pass in multiple suffix / value pairs. Same for RangeKeyUnset.


sstable/writer.go, line 401 at r1 (raw file):

	// pairs at the same time?
	startKey := base.MakeInternalKey(start, 0, base.InternalKeyKindRangeKeySet)
	return w.addRangeKeySpan(startKey, end, suffix, value)

Open question: there's some subtlety in tracking the lifetimes of the byte slices that are passed to these methods. As we only perform a copy when the fragmented span is coalesced and finalized, it's possible that the caller could alter the underlying data before we add to the block. Do we want to preemptively copy?

Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

Reviewed 2 of 6 files at r1.
Reviewable status: 2 of 6 files reviewed, 11 unresolved discussions (waiting on @nicktrav and @sumeerbhola)


internal/rangekey/rangekey.go, line 180 at r1 (raw file):

}

// EncodeUnsetSuffixValues encodes a slice of SuffixValues for a RangeKeyUnset

nit: s/slice of SuffixValues/slice of suffixes/


internal/rangekey/rangekey.go, line 182 at r1 (raw file):

// EncodeUnsetSuffixValues encodes a slice of SuffixValues for a RangeKeyUnset
// into dst. The length of dst must be greater than or equal to
// EncodedSetSuffixValuesLen. EncodeSetSuffixValues returns the number of bytes

nit: s/EncodeSetSuffixValues/EncodeUnsetSuffixes/


internal/rangekey/rangekey.go, line 185 at r1 (raw file):

// written, which should always equal the EncodedSetValueLen with the same
// arguments.
func EncodeUnsetSuffixValues(dst []byte, suffixes [][]byte) int {

s/EncodeUnsetSuffixValues/EncodeUnsetSuffixes/


sstable/writer.go, line 398 at r1 (raw file):

Previously, nicktrav (Nick Travers) wrote…

Open question: is this sufficient? Or should we support allowing the caller to pass in multiple suffix / value pairs. Same for RangeKeyUnset.

I think this is sufficient. 👍


sstable/writer.go, line 401 at r1 (raw file):

Previously, nicktrav (Nick Travers) wrote…

Open question: there's some subtlety in tracking the lifetimes of the byte slices that are passed to these methods. As we only perform a copy when the fragmented span is coalesced and finalized, it's possible that the caller could alter the underlying data before we add to the block. Do we want to preemptively copy?

I vote preemptively copy to err towards guarding against misuse, since we expect range keys to be rare in CockroachDB. We can walk that decision back if we need to during performance testing.

We could share a bytes.Buffer for all range key buffered data (including the encoded suffix-values tuples) to reduce the allocations..


sstable/writer.go, line 451 at r1 (raw file):

	case base.InternalKeyKindRangeKeyUnset:
		suffixes := [][]byte{suffix}
		buf := make([]byte, rangekey.EncodedUnsetSuffixValuesLen(suffixes))

I think it might be worthwhile to use a shared bytes.Buffer on the Writer for these encoded Span.Values to reduce allocations.


sstable/writer.go, line 468 at r1 (raw file):

	// only flush when we've moved past the previous key.
	if w.compare(span.Start.UserKey, w.prevRangeKeyUserKey) > 0 {
		w.fragmenter.TruncateAndFlushTo(w.prevRangeKeyUserKey)

Is this necessary? I think Fragmenter.Add will flush if necessary. I think we can Add each key span and call Finish when finishing the sstable, and that should be enough to ensure everything is flushed.


sstable/writer.go, line 496 at r1 (raw file):

}

func (w *Writer) addCoalescedSpan(s rangekey.CoalescedSpan) {

Can you add a comment here stating that this method is used as the emit function of the Coalescer?


sstable/testdata/writer_range_keys, line 1 at r1 (raw file):

# NOTE: The operations SET, UNSET, and DEL in this test file are aliases for

These test cases are great (and so easy to read!)


sstable/testdata/writer_range_keys, line 85 at r1 (raw file):

d.RANGEKEYSET.0: e [(@9=v9)]
d.RANGEKEYDEL.0: e
f.RANGEKEYSET.0: i [(@4=v4),(@5=v5),(@7=v7),(@8=v8),(@9=v9)]

nit: The ordering of these suffix-values is opposite, because the test uses base.DefaultComparer which compares suffixes as byte slices. I think it'd be worthwhile to use testkeys.Comparer and reverse the order here to avoid confusion.

Copy link
Contributor Author

@nicktrav nicktrav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dismissed @jbowens from a discussion.
Reviewable status: 1 of 6 files reviewed, 4 unresolved discussions (waiting on @jbowens and @sumeerbhola)


internal/rangekey/rangekey.go, line 185 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

s/EncodeUnsetSuffixValues/EncodeUnsetSuffixes/

Done.


sstable/writer.go, line 398 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

I think this is sufficient. 👍

👍


sstable/writer.go, line 401 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

I vote preemptively copy to err towards guarding against misuse, since we expect range keys to be rare in CockroachDB. We can walk that decision back if we need to during performance testing.

We could share a bytes.Buffer for all range key buffered data (including the encoded suffix-values tuples) to reduce the allocations..

Done. I added (*Writer).tempRangeKeyBuf, which is similar to how we do things over in the block property collector.


sstable/writer.go, line 451 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

I think it might be worthwhile to use a shared bytes.Buffer on the Writer for these encoded Span.Values to reduce allocations.

Same as above. Will re-use the same mechanism for buffering.


sstable/writer.go, line 468 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

Is this necessary? I think Fragmenter.Add will flush if necessary. I think we can Add each key span and call Finish when finishing the sstable, and that should be enough to ensure everything is flushed.

Removed.


sstable/writer.go, line 496 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

Can you add a comment here stating that this method is used as the emit function of the Coalescer?

Done.


sstable/testdata/writer_range_keys, line 1 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

These test cases are great (and so easy to read!)

👍


sstable/testdata/writer_range_keys, line 85 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

nit: The ordering of these suffix-values is opposite, because the test uses base.DefaultComparer which compares suffixes as byte slices. I think it'd be worthwhile to use testkeys.Comparer and reverse the order here to avoid confusion.

Fixed.

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 4 of 5 files at r2.
Reviewable status: 5 of 6 files reviewed, 7 unresolved discussions (waiting on @jbowens, @nicktrav, and @sumeerbhola)


sstable/writer.go, line 442 at r2 (raw file):

	// caller to avoid re-use of buffers until the Writer is closed.
	endCopy := w.tempRangeKeyBuf(len(end))
	copy(endCopy, end)

where are we copying start? I am assuming that like the other slices it can be mutated by the caller when this method returns.


sstable/writer.go, line 497 at r2 (raw file):

	var unsets [][]byte
	for _, item := range s.Items {
		sv := rangekey.SuffixValue{Suffix: item.Suffix}

nit: initializing sv here adds a level of indirection that is unnecessary for the if-block.
how about

 		if item.Unset {
			unsets = append(unsets, item.Suffix)
		} else {
			sets = append(sets, rangekey.SuffixValue{Suffix: item.Suffix, Value: item.Value)
		}

sstable/writer_rangekey_test.go, line 50 at r2 (raw file):

			start, end := startEndSplit[0], startEndSplit[1]

			switch kind {

Consider scrambling the bytes in the byte slices passed as parameters after the return from the method. We've had hard to track down bugs in the past.

Users of `sstable.Writer` other than Pebble itself (e.g. Cockroach)
require a means of adding range keys to an sstable that is slightly more
flexible than the existing `AddRangeKey` method in the way in which key
spans can be added to the table.

Specifically, `AddRangeKey` requires that spans be sorted (start / end
key ascending, sequence number and key kind descending), in addition to
fragmented and coalesced (i.e. multiple suffixes within the same span
are coalesced into a single value).

Add the `RangeKey{Set,Unset,Delete}` methods on the `sstable.Writer`
that allow adding spans in order of start key. These methods have no
requirement that the spans be fragmented or coalesced. Instead, the
implementation handles the fragmenting of spans as keys are added. A
`keyspan.Fragmenter` emits completed spans into a `rangekey.coalescer`,
which aggregates the fragments for the same span into a
`rangekey.CoalescedSpan`, which can then be used to write the individual
`RangeKey{Set,Unset,Delete}` key / value pairs into the rang key block
in the sstable.

The new `RangeKey{Set,Unset,Delete}` methods have similar semantics to
the existing `Set,Delete,DeleteRange,Merge` methods that write the key /
value pairs at sequence number zero. In the range key case, these new
methods are intended to be used by external callers when preparing
sstables for ingestion.

Add a data-driven test specific to the new range key methods. These
tests are separate to the existing data driven test cases for the
`AddRangeKey` method, which has stricter requirements on how keys are
added (i.e. already ordered, fragmented and coalesced).

Rename `AddInternalRangeKey` to `AddRangeKey` to better match the
existing naming of `(*sstable.Writer).Add`.

Add new utility methods to the `rangekey` package to aid in encoding
range key values.

Related to cockroachdb#1339.
Copy link
Contributor Author

@nicktrav nicktrav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 3 of 6 files reviewed, 6 unresolved discussions (waiting on @jbowens and @sumeerbhola)


sstable/writer.go, line 442 at r2 (raw file):

Previously, sumeerbhola wrote…

where are we copying start? I am assuming that like the other slices it can be mutated by the caller when this method returns.

Added.


sstable/writer.go, line 497 at r2 (raw file):

Previously, sumeerbhola wrote…

nit: initializing sv here adds a level of indirection that is unnecessary for the if-block.
how about

 		if item.Unset {
			unsets = append(unsets, item.Suffix)
		} else {
			sets = append(sets, rangekey.SuffixValue{Suffix: item.Suffix, Value: item.Value)
		}

Done.


sstable/writer_rangekey_test.go, line 50 at r2 (raw file):

Previously, sumeerbhola wrote…

Consider scrambling the bytes in the byte slices passed as parameters after the return from the method. We've had hard to track down bugs in the past.

Good point. Done.

@nicktrav nicktrav mentioned this pull request Jan 11, 2022
29 tasks
@nicktrav
Copy link
Contributor Author

cc: @erikgrinaker - FYI. Feel free to take this for a spin. There might be some subtleties we're not aware of that you may be able point out.

@nicktrav
Copy link
Contributor Author

TFTRs!

@nicktrav nicktrav merged commit 0c50304 into cockroachdb:master Jan 11, 2022
@nicktrav nicktrav deleted the nickt.writer-rangekeys branch January 11, 2022 20:25
@erikgrinaker
Copy link
Contributor

Thanks @nicktrav, will do!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants