refactor: sharky cleanup and simplification of code #3500
base: master
Conversation
Force-pushed from 1f6576a to ec5fe69 (Compare)
pkg/sharky/store.go
Outdated
s.wg.Add(1)
go func() {
	defer sh.slots.wg.Done()
	defer s.wg.Done()
	sh.process()
Currently a new goroutine is started for each shard instance; would it be possible to avoid these goroutines? Are they really necessary?
All that sh.process() does is essentially two operations:
slot := sh.slots.Next()
sh.slots.Use(slot)
Would it be possible to just call these two functions directly (store.go:136) instead of reading from a channel? Probably one more utility function would be needed to get the shard, but overall the code might be simpler.
good idea actually :)
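A minimal sketch of the suggested direct-call approach, assuming hypothetical names (write, nextShard, shard.write) that are not taken from the actual code:

// Hypothetical simplified write path: instead of each shard running its own
// goroutine and handing out slots over a channel, the store picks a shard
// and reserves the next free slot with two direct calls.
func (s *Store) write(buf []byte) (Location, error) {
	sh := s.nextShard()        // hypothetical helper returning the shard to write into
	slot := sh.slots.Next()    // next free slot index in this shard
	sh.slots.Use(slot)         // mark the slot as taken
	return sh.write(buf, slot) // hypothetical: write the blob into the reserved slot
}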
sl.size = uint32(len(sl.data) * 8)
sl.head = sl.next(0)
return err
sl.data = data
Should the statement
sl.data = data
be guarded with a mutex?
No, this function is only used during bootup, so no mutex is needed.
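For context, a hedged sketch of what such a bootup-time load might look like; the names file, data, size, head and next are assumptions based on the diff above, not the actual implementation:

// load reads the free-slot bitmap from disk during bootup. It runs before any
// other goroutine touches the slots, which is why no mutex is taken here.
func (sl *slots) load() error {
	data, err := io.ReadAll(sl.file) // sl.file: hypothetical handle to the on-disk bitmap
	if err != nil {
		return err
	}
	sl.data = data
	sl.size = uint32(len(sl.data) * 8) // one bit per slot, so 8 slots per byte
	sl.head = sl.next(0)               // first free slot, scanning from index 0
	return nil
}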
This sharky implementation does not guarantee which shard and location is returned next. As an outcome this could lead to 1) probably more cache invalidation (at the HDD/SSD level); and 2) fragmented data. Would it make sense to have sharky always return the first available location in a consistent order? That way disk IO would probably hit the same sector(s) on disk (probably giving a cache hit).
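A free-standing illustration of the "first available location in consistent order" idea (not the sharky implementation): scanning the free-slot bitmap from the start always yields the lowest free index, which keeps new writes clustered near the beginning of the shard file.

// firstFree returns the lowest free slot index in a bitmap where a set bit
// means "free", or -1 when every slot is taken.
func firstFree(bitmap []byte) int {
	for i, b := range bitmap {
		if b == 0 {
			continue // all 8 slots covered by this byte are in use
		}
		for j := 0; j < 8; j++ {
			if b&(1<<j) != 0 {
				return i*8 + j
			}
		}
	}
	return -1
}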
@@ -44,12 +48,11 @@ type Store struct {
// - maxDataSize - positive integer representing the maximum blob size to be stored
func New(basedir fs.FS, shardCnt int, maxDataSize int) (*Store, error) {
Is the store ever writing data down to disk (closing and saving shards)?
It asks the shards to ask their slots to save and close.
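A rough sketch of that save-and-close chain, with hypothetical method names (Close, close, save) and errors.Join used purely for illustration:

// Close asks every shard to close; each shard in turn asks its slots to
// persist their state before the underlying file is closed.
func (s *Store) Close() error {
	var errs error
	for _, sh := range s.shards {
		// hypothetical: sh.close() calls sh.slots.save() and then closes the shard file
		if err := sh.close(); err != nil {
			errs = errors.Join(errs, err)
		}
	}
	return errs
}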
@@ -28,12 +27,17 @@ var (
// - read prioritisation over writing
// - free slots allow write
type Store struct {
This file should be tested.
Order of writes does not guarantee order of reads, however. We cannot tell which chunk will be retrieved in the future, so it does not matter in which shard we place them.
Force-pushed from 0a9a83e to 2e4f076 (Compare)
Consider this again after the major localstore releases.
Checklist
Description
Write operations now follow a basic round-robin strategy, distributing requests across the shards in circular order.
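A minimal, self-contained sketch of that round-robin selection (illustrative only; the type and field names here are not taken from the sharky code):

package main

import (
	"fmt"
	"sync/atomic"
)

// roundRobin hands out shard indices in circular order, so consecutive
// write requests are spread evenly across all shards.
type roundRobin struct {
	next     uint32
	shardCnt uint32
}

func (r *roundRobin) nextShard() uint32 {
	// AddUint32 returns the incremented value; subtract 1 so indices start at 0.
	return (atomic.AddUint32(&r.next, 1) - 1) % r.shardCnt
}

func main() {
	r := &roundRobin{shardCnt: 4}
	for i := 0; i < 8; i++ {
		fmt.Print(r.nextShard(), " ") // prints: 0 1 2 3 0 1 2 3
	}
	fmt.Println()
}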
Open API Spec Version Changes (if applicable)
Motivation and Context (Optional)
Related Issue (Optional)
Screenshots (if appropriate):