Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CBG-3853: Implementation of range sequence entry for skipped sequence slice #6764

Merged
merged 10 commits into from
Apr 19, 2024

Conversation

gregns1
Copy link
Contributor

@gregns1 gregns1 commented Apr 10, 2024

CBG-3853

  • Addition of the range entry handling of the skipped slice
  • Addition of test cases for range entries in the slice
  • Careful testing of edge case scenarios (especially when it comes to removing sequences from the slice). Test comments describe these edge cases.
  • Updates to benchmarks
  • Changed compact method top return int64 instead for ease of adding this output to a stat
  • Refactored binarySearchFunc for simplicity
  • Removed mutex from insert as its unnecessary (its always called form method already acquiring the mutex)

Pre-review checklist

  • Removed debug logging (fmt.Print, log.Print, ...)
  • Logging sensitive data? Make sure it's tagged (e.g. base.UD(docID), base.MD(dbName))
  • Updated relevant information in the API specifications (such as endpoint descriptions, schemas, ...) in docs/api

db/skipped_sequence.go Outdated Show resolved Hide resolved
db/skipped_sequence.go Outdated Show resolved Hide resolved
db/skipped_sequence.go Outdated Show resolved Hide resolved
db/skipped_sequence.go Outdated Show resolved Hide resolved
db/skipped_sequence.go Show resolved Hide resolved
@adamcfraser adamcfraser assigned gregns1 and unassigned adamcfraser Apr 11, 2024
@gregns1 gregns1 assigned adamcfraser and unassigned gregns1 Apr 12, 2024
@gregns1 gregns1 requested a review from adamcfraser April 12, 2024 16:24
Copy link
Collaborator

@adamcfraser adamcfraser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks fine, a few minor suggestions to help readability and to ensure we have complete test coverage.

db/skipped_sequence.go Outdated Show resolved Hide resolved
db/skipped_sequence.go Outdated Show resolved Hide resolved
db/skipped_sequence.go Outdated Show resolved Hide resolved
db/skipped_sequence.go Outdated Show resolved Hide resolved
name: "list_full_range_items",
inputList: []uint64{5, 15, 25, 35, 45},
expectedStartSeqList: []uint64{5, 15, 25, 28, 35, 45},
expectedEndSeqList: []uint64{10, 20, 26, 30, 40, 50},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about making these test definitions a bit more readable, so that it's more immediately obvious what the input ranges are and what the expected results are? In particular the fact that we're creating ranges of [input, input+5] isn't really obvious, and it's also hard to eyeball what the expected end ranges are.

I was thinking of something like:

input: [][]uint64{{5,10}, {15,20}, {25,30}, {35,40}, {45,50}}
expected: [][]uint64{{5,10}, {15,20}, {25,26}, {28,30}, {35,40}, {45,50}

With that approach you could use {25} for cases where we expect a single sequence, and {25,25} where we expect a range of length 1. You could also include the range len=2 and range len=3 cases in this test a bit more easily.

It might be personal preference, if you disagree you can ignore.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, want to ensure we've got coverage for all permutations of removals, including:

  • length 1 range
  • length 2 ranges
  • length 3 ranges, for removal of first/second/third (I think test below just removes the middle element)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, have made changes to have input and expected be 2-D array of uint64[][].

Added test cases to the test TestRemoveSeqFromSkipped for length 1 range seq removal, and length 2 seq removal. Have extended test TestRemoveSeqFromThreeSequenceRange to include a test for removing the start/last seq on a three sequence range.

@adamcfraser adamcfraser assigned gregns1 and unassigned adamcfraser Apr 12, 2024
@gregns1 gregns1 assigned adamcfraser and bbrks and unassigned gregns1 Apr 15, 2024
bbrks
bbrks previously requested changes Apr 16, 2024
db/skipped_sequence.go Show resolved Hide resolved
Comment on lines 292 to 300
// if x == startSeq set startSeq+1, if x == lastSeq set lastSeq-1
if rangeElem.getStartSeq() == x {
rangeElem.setStartSeq(x + 1)
return nil
}
if rangeElem.getLastSeq() == x {
rangeElem.setLastSeq(x - 1)
return nil
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If numSequences is 2 for these cases, does that mean this range becomes a single sequence without changing the type?

Does that impact anything the next time it's run through code that does isRange()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the original approach was if the above code lines makes a range of 1, we were going to delete this and insert a new single entry in its place. Adam suggested we could avoid extra allocations by just altering the range to be one sequence. It seems to work well!
With regards to it running through isRange(), it would still return true for being a range entry but I think this has no functional impact. Where isRange() is used in binary search function, it will still find the sequence if the range is length 1. Also for removeSeq we have a check for 1 sequence (above this code) that will just delete the entry. The only other way this is used is for adding new entries in PushSkippedSequenceEntry which I feel doesn't have functional impact.

Comment on lines 46 to 47
var _ SkippedSequenceListEntry = &SingleSkippedSequence{}
var _ SkippedSequenceListEntry = &SkippedSequenceRange{}
Copy link
Member

@bbrks bbrks Apr 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious what the overhead of the interface will be vs. a single concrete type that can handle single sequences and ranges. ( https://syslog.ravelin.com/go-interfaces-but-at-what-cost-961e0f58a07b )

Most of the difference in behaviour is already in isRange checks, and it'd save all of the no-op method implementations.

Having said that, this would be closer to a micro-optimisation than the planned refactor of skipped sequences, so I'm OK waiting for all of that work to be done first, since the refactor to change the interface to a single type wouldn't be too intrusive.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this idea, and am fine doing it in a later optimization. However, it does make me wonder whether anything that's a no-op belongs on the interface. (setStartSeq/setLastSeq/extendRange). Looking at the code, this goes back to the fact that we're using isRange instead of a type switch. Was this intentional for performance reasons?

db/skipped_sequence.go Outdated Show resolved Hide resolved
@bbrks bbrks assigned gregns1 and unassigned bbrks and adamcfraser Apr 16, 2024
@gregns1 gregns1 assigned adamcfraser and unassigned gregns1 Apr 16, 2024
@adamcfraser
Copy link
Collaborator

Overall looks good, but I'm also wondering about the slightly non-standard use of interfaces here (with the isRange/no-ops).

@adamcfraser adamcfraser removed their assignment Apr 16, 2024
@gregns1 gregns1 removed their assignment Apr 17, 2024
@gregns1 gregns1 requested a review from adamcfraser April 17, 2024 15:53
db/skipped_sequence.go Outdated Show resolved Hide resolved
db/skipped_sequence.go Show resolved Hide resolved
@adamcfraser adamcfraser assigned gregns1 and unassigned adamcfraser Apr 17, 2024
@gregns1 gregns1 assigned torcolvin and unassigned gregns1 Apr 18, 2024
@gregns1 gregns1 requested a review from torcolvin April 18, 2024 11:51
@torcolvin torcolvin enabled auto-merge (squash) April 19, 2024 01:23
@torcolvin torcolvin dismissed adamcfraser’s stale review April 19, 2024 01:25

changes addressed

@torcolvin torcolvin dismissed bbrks’s stale review April 19, 2024 01:26

changes addressed

@torcolvin torcolvin merged commit e8a27bf into main Apr 19, 2024
30 checks passed
@torcolvin torcolvin deleted the CBG-3853 branch April 19, 2024 01:26
gregns1 added a commit that referenced this pull request Jul 9, 2024
* CBG-3852: Implementation of single sequence entry for skipped sequence slice (#6747)

* CBG-3852: implementation of single sequence entry for skipped sequence slice

* minor updates to make setLastSeq no-op for single values

* updates for review round

* remove not needed comment

* updates based off review

* CBG-3853: Implementation of range sequence entry for skipped sequence slice (#6764)

* CBG-3854: Integration of the skipped sequence slice into the cache (#6789)

* CBG-3854: Integeration of the skipped sequence slice into the cache

* remove old chnages not needed anymore + gh actions failures

* updates based of review. Move stat update to updateStats and alter soem naming. Test updates to reflect stat chnages too.

* remove incorrect incrementing of num current sequences inside PushSkippedSequenceEntry

* Fix race condition in updateStats()

* updates off review

* tidy CleanSkippedSequenceQueue

* stats update

* CBG-3855: Sequence range removal support (#6801)

* CBG-3855: support for seqeuence range removal

* add comment to function + remove repeated code

* refactor error handling, simplify stat calculation update

* CBG-3850: Optimise releaseUnusedSequenceRange  (#6831)

* CBG-3850: Optimise releaseUnusedSequenceRange function + support for ranges in pending queue

* updates to add _pushRangeToPending and _removeSubsetOfRangeFromSkipped to process duplicates sequences between unused rnage and pending/skipped lists

* fix race condition in test

* fix for incorrect logic for removing skipped range

* updates based off review

* updates to fix incorrect code + add extra test coverage of changes

* change error name

* CBG-3945: Fix for panic in _clip method on skipped sequence slice (#6842)

* fix test to use old api
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants