Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CBG-4270: Implement sequence-based storage for BlipTesterClient #7226

Merged
merged 19 commits into from
Dec 11, 2024
Merged

Conversation

bbrks
Copy link
Member

@bbrks bbrks commented Dec 3, 2024

CBG-4270

BlipTesterClient now maintains a client-side sequence number, associated with every document update made on the client. This is independent of the Sync Gateway sequence.

Document storage is sequence-based, and we store all versions of documents for the purposes of history and test assertions. No cleanup is made on these old versions, since it's not expected that we'll be writing extremely long-running tests using this.

We build a client-side "changes feed" by iterating over all sequences and emitting the latest version of each document. There is no client-side checkpointing, but tests can start changes feeds from a specific sequence when required. Most tests can just use a continous changes feed running in the background.

The client's changes feed produces ProposeChanges in batches of only 1 item at a time, rather than a more real-world example of 20 - since the code to produce batches and a flush timeout of changes is not implemented. Filed CBG-4401 as follow-up.

TODO

  • Self-review/cleanup
  • Ctx/close investigation
  • fix data race
2024-12-03T17:55:25.9788096Z ==================
2024-12-03T17:55:25.9788177Z WARNING: DATA RACE
2024-12-03T17:55:25.9788285Z Write at 0x00c000d1c108 by goroutine 198918:
2024-12-03T17:55:25.9788567Z   github.com/couchbase/sync_gateway/rest.(*BlipTesterCollectionClient).upsertDoc()
2024-12-03T17:55:25.9788906Z       /home/runner/work/sync_gateway/sync_gateway/rest/blip_client_test.go:1294 +0xcf5
2024-12-03T17:55:25.9789163Z   github.com/couchbase/sync_gateway/rest.(*BlipTesterCollectionClient).AddRev()
2024-12-03T17:55:25.9789496Z       /home/runner/work/sync_gateway/sync_gateway/rest/blip_client_test.go:1315 +0x99
2024-12-03T17:55:25.9789784Z   github.com/couchbase/sync_gateway/rest.(*BlipTestClientRunner).AddRev()
2024-12-03T17:55:25.9790112Z       /home/runner/work/sync_gateway/sync_gateway/rest/blip_client_test.go:1668 +0x55
2024-12-03T17:55:25.9790477Z   github.com/couchbase/sync_gateway/rest.TestMinRevPosWorkToAvoidUnnecessaryProveAttachment.func1()
2024-12-03T17:55:25.9790865Z       /home/runner/work/sync_gateway/sync_gateway/rest/attachment_test.go:2417 +0x86b
2024-12-03T17:55:25.9791099Z   github.com/couchbase/sync_gateway/rest.(*BlipTestClientRunner).Run.func2()
2024-12-03T17:55:25.9791424Z       /home/runner/work/sync_gateway/sync_gateway/rest/blip_client_test.go:857 +0xbd
2024-12-03T17:55:25.9791513Z   testing.tRunner()
2024-12-03T17:55:25.9791811Z       /opt/hostedtoolcache/go/1.23.3/x64/src/testing/testing.go:1690 +0x226
2024-12-03T17:55:25.9791921Z   testing.(*T).Run.gowrap1()
2024-12-03T17:55:25.9792207Z       /opt/hostedtoolcache/go/1.23.3/x64/src/testing/testing.go:1743 +0x44
2024-12-03T17:55:25.9792216Z 
2024-12-03T17:55:25.9792356Z Previous read at 0x00c000d1c108 by goroutine 199068:
2024-12-03T17:55:25.9792847Z   github.com/couchbase/sync_gateway/rest.(*BlipTesterCollectionClient).docsSince.func1.(*BlipTesterCollectionClient).OneShotDocsSince.1()
2024-12-03T17:55:25.9793171Z       /home/runner/work/sync_gateway/sync_gateway/rest/blip_client_test.go:109 +0x7ce
2024-12-03T17:55:25.9793454Z   github.com/couchbase/sync_gateway/rest.(*BlipTesterCollectionClient).docsSince.func1()
2024-12-03T17:55:25.9793778Z       /home/runner/work/sync_gateway/sync_gateway/rest/blip_client_test.go:119 +0x321

Integration Tests

rest/blip_api_crud_test.go Outdated Show resolved Hide resolved
rest/blip_api_delta_sync_test.go Outdated Show resolved Hide resolved
rest/blip_api_delta_sync_test.go Outdated Show resolved Hide resolved
rest/blip_client_test.go Outdated Show resolved Hide resolved
base/collection_xattr.go Outdated Show resolved Hide resolved
db/crud.go Outdated Show resolved Hide resolved
rest/api_test_helpers.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@torcolvin torcolvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only looked through part of this and mostly not the changes in blip_client_test.go but it would be helpful to me to have an overall design strategy possibly even in a comment for how document bodies and revisions are stored on the body for a pull replication and push replication.

It might make sense for me to meet and have you walk me through the design strategy a bit to make the review faster for me to do.

base/collection_xattr.go Outdated Show resolved Hide resolved
rest/api_test_helpers.go Show resolved Hide resolved
rest/attachment_test.go Show resolved Hide resolved
rest/attachment_test.go Show resolved Hide resolved
rest/attachment_test.go Show resolved Hide resolved
rest/utilities_testing_resttester.go Outdated Show resolved Hide resolved
rest/utilities_testing.go Outdated Show resolved Hide resolved
rest/blip_client_test.go Show resolved Hide resolved
rest/blip_client_test.go Outdated Show resolved Hide resolved
rest/blip_client_test.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@torcolvin torcolvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the code generally looks fine. I have some readability improvements I've made to avoid so many ctx.Err() != nil checks which I think were made defensively, and split up some of the complexity in docsSince. As discussed offline, this code might not live forever anyway and get replaced by active replicator code.

I did spend a considerable amount of time debugging why some of the blip delta sync tests fail on my local machine, only when running multiple tests at once. I never got to the bottom of why they failed and I am not sure that they were totally successfully passing before either. Running these on a different system leads to passing. That said, the delta sync tests are not run through github actions, so they might fail.

I will push up a branch (on top of this one) where you can look at the changes I've made to help debug, and decide to keep or remove them.

  • remove AssertfCtx / err requirements so that there is a better traceback in the case of failure. In the case of PushUnsolicitedRev we need more information from the error message, but generally we just want a blip message to fail the test harness, preferably without passing the error back up. This results in a lot of code changes.
  • removes most of the logging
  • removes channel close from docSince since this is unused
  • remove very confusing UnsubPushChanges function.
  • Added some docstrings to try to understand the code better.

I am fine if you don't want this branch.

I think it is worth considering whether some of the assertion handling to push into testify require assertions with tracebacks will make it easier to debug the code on anemone, and if we want to do a painful merge once. Maybe evaluate how the merge/rebase of this PR will be onto anemone and whether significant changes were made there.

rest/blip_client_test.go Outdated Show resolved Hide resolved
rest/blip_client_test.go Outdated Show resolved Hide resolved
Comment on lines 130 to 148
if ctx.Err() != nil {
close(ch)
return
}
c.TB().Logf("docsSince: sinceVal=%d", sinceVal)
for _, doc := range c.OneShotDocsSince(ctx, sinceVal) {
select {
case <-ctx.Done():
close(ch)
return
case ch <- doc:
c.TB().Logf("sent doc %q to changes feed", doc.id)
sinceVal = doc.latestSeq()
}
}
if !continuous {
c.TB().Logf("opts.Continuous=false, breaking changes loop")
break
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

readability nit here:

Suggested change
if ctx.Err() != nil {
close(ch)
return
}
c.TB().Logf("docsSince: sinceVal=%d", sinceVal)
for _, doc := range c.OneShotDocsSince(ctx, sinceVal) {
select {
case <-ctx.Done():
close(ch)
return
case ch <- doc:
c.TB().Logf("sent doc %q to changes feed", doc.id)
sinceVal = doc.latestSeq()
}
}
if !continuous {
c.TB().Logf("opts.Continuous=false, breaking changes loop")
break
}
defer close(ch)
for _, doc := range c.OneShotDocsSince(ctx, sinceVal) {
select {
case <-ctx.Done():
return
case ch <- doc:
sinceVal = doc.latestSeq()
}
}
if !continuous {
break
}

Without this change, I don't understand how the loop closes in the non continuous case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It closes from the context when that is cancelled, but otherwise doesn't.

The loop will sit waiting in the OneShotDocsSince iterator until a new sequence is available on the client via the c._seqCond.Wait()

rest/blip_client_test.go Outdated Show resolved Hide resolved
rest/blip_client_test.go Outdated Show resolved Hide resolved
@bbrks bbrks requested a review from torcolvin December 10, 2024 17:22
@bbrks bbrks merged commit b5f0e80 into main Dec 11, 2024
38 checks passed
@bbrks bbrks deleted the CBG-4270 branch December 11, 2024 11:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants