storage: concurrent disk image write. #2395
base: main
Conversation
Write the disk image concurrently to speed it up over the previous sequential write.
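The idea, roughly: treat the image as fixed-size chunks and copy each chunk in its own goroutine with an errgroup, using io.NewOffsetWriter / io.NewSectionReader so each chunk lands at the right offset. A minimal sketch, the chunk size, signature, and names are illustrative, not the PR's actual code:

```go
package storage

import (
	"io"
	"os"

	"golang.org/x/sync/errgroup"
)

// chunkSize is an illustrative value, not the PR's actual constant.
const chunkSize = int64(64 << 20) // 64 MiB per goroutine

// diskWrite is a hypothetical stand-in for DiskWrite: it splits the image
// into fixed-size chunks and copies each one concurrently from the
// remote-backed source into the target file.
func diskWrite(path string, source io.ReaderAt, size int64) error {
	file, err := os.OpenFile(path, os.O_RDWR|os.O_CREATE, 0644)
	if err != nil {
		return err
	}
	defer file.Close()

	// errgroup: no point in continuing if one of the goroutines fails.
	var group errgroup.Group
	for start := int64(0); start < size; start += chunkSize {
		start := start // capture the loop variable (needed before Go 1.22)
		length := chunkSize
		if start+length > size {
			length = size - start
		}
		group.Go(func() error {
			wr := io.NewOffsetWriter(file, start) // writes into [start, start+length) of the file
			rd := io.NewSectionReader(source, start, length)
			_, err := io.Copy(wr, rd)
			return err
		})
	}
	return group.Wait()
}
```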
In general, this looks good. But I would love to see a benchmark that compares the performance of the two approaches.
@muhamadazmy I've tried to parallelize the io.Copy with this environment:
the results:
That's cool. One thing I have to add: flists with raw images are obsolete and should not be used anymore. Zos still supports them for backwards compatibility only, but new workloads with this kind of image should definitely not be allowed.
Forgot to say: make sure to clean up the cache between the benchmark runs, since rfs caches the downloaded content in zos-cache. That means a second run will always go faster than the first one, since it doesn't have to download the image again.
Sure, it only took ~2 mins with the cache.
Actually, I not only deleted the cache but also deleted all the qemu cow disks, because I don't know how to delete it.
var/cache/modules/flistd # ls
cache flist log mountpoint pid ro
pkg/storage/disk.go
Outdated
// the sequential write is slow because the data source is from the remote server.
var (
	// use errgroup because there is no point in continuing if one of the goroutines failed
	group = new(errgroup.Group)
should be usable on the zero value or with a context if required
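For reference, the two constructions this comment alludes to, in a minimal sketch (the task helpers are hypothetical):

```go
package storage

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// zeroValue: errgroup.Group is ready to use as a zero value,
// no new(...) constructor needed.
func zeroValue(tasks []func() error) error {
	var g errgroup.Group
	for _, t := range tasks {
		g.Go(t)
	}
	return g.Wait()
}

// withContext: the returned ctx is cancelled as soon as the first
// task fails, so the remaining tasks can stop early.
func withContext(ctx context.Context, tasks []func(context.Context) error) error {
	g, ctx := errgroup.WithContext(ctx)
	for _, t := range tasks {
		t := t
		g.Go(func() error { return t(ctx) })
	}
	return g.Wait()
}
```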
should be usable on the zero value
could you elaborate more on this?
or with a context if required
not possible for now:
- the func doesn't have a context
- io.Copy itself doesn't have a context

I think making it context-aware will need a separate ticket
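For that separate ticket, one common pattern is to wrap the reader so every Read checks the context; a rough sketch, not part of this PR:

```go
package storage

import (
	"context"
	"io"
)

// ctxReader makes an io.Copy loop cancellable by failing the next Read
// once the context is done; io.Copy itself has no context parameter.
type ctxReader struct {
	ctx context.Context
	r   io.Reader
}

func (c ctxReader) Read(p []byte) (int, error) {
	if err := c.ctx.Err(); err != nil {
		return 0, err
	}
	return c.r.Read(p)
}

// copyCtx behaves like io.Copy but stops between reads when ctx is cancelled.
func copyCtx(ctx context.Context, dst io.Writer, src io.Reader) (int64, error) {
	return io.Copy(dst, ctxReader{ctx: ctx, r: src})
}
```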
I think making it context-aware will need a separate ticket

DiskWrite is handled by storaged and called by provisiond through zbus. So a plain Go context will not really work here. But of course we can start with a context inside storaged.

As a side note, I noticed that DiskWrite keeps going even though provisiond has already cancelled the deployment/contract. So we already have a real cancellation issue over zbus.
can you please mention the cancellation problem as a known issue at the top of the PR, so it can be easily found later on?
can you please mention the cancellation problem as a known issue at the top of the PR, so it can be easily found later on?
Maybe create a separate issue, because it is not really specific to this PR.
pkg/storage/disk.go
Outdated
}
wr := io.NewOffsetWriter(file, start)
rd := io.NewSectionReader(source, start, len)
_, err = io.Copy(wr, rd)
I believe using io.CopyBuffer could avoid temporary buffer allocations: https://pkg.go.dev/io#CopyBuffer
also, no need to call file.Sync?
I believe using io.CopyBuffer could avoid temporary buffer allocations: https://pkg.go.dev/io#CopyBuffer
The difference between io.Copy and io.CopyBuffer is that CopyBuffer lets us choose our own buffer; io.Copy also uses a buffer internally. The slowness here is because of downloading the file from the hub, so I don't think the buffer will make a big difference here.
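For comparison, a minimal sketch of what the io.CopyBuffer variant would look like (the buffer size is illustrative); note that per the io docs the buffer is not used if the source implements WriterTo or the destination implements ReaderFrom:

```go
package storage

import "io"

// copyWithBuffer reuses one caller-provided buffer across the copy,
// instead of io.Copy allocating its own 32 KiB buffer internally.
func copyWithBuffer(dst io.Writer, src io.Reader) (int64, error) {
	buf := make([]byte, 1<<20) // 1 MiB; size is illustrative
	return io.CopyBuffer(dst, src, buf)
}
```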
also, no need to call file.Sync?
I don't think so, we're copying to rfs anyway. CMIIW
io.CopyBuffer uses a buffer you provide, instead of io.Copy allocating one on every call (it's still an allocation to save imo)
instead of io.Copy allocating one on every call
No, the buffer used by io.Copy is also only allocated once. See https://github.com/golang/go/blob/master/src/io/io.go#L417-L427

If what you mean is creating a global buffer outside the DiskWrite func, then it would be a lot of work for very small benefit.
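For completeness, a sketch of what such a shared buffer outside DiskWrite could look like (purely illustrative, not something this PR adds):

```go
package storage

import (
	"io"
	"sync"
)

// bufPool lets concurrent copies reuse buffers instead of each
// io.CopyBuffer call allocating a fresh one.
var bufPool = sync.Pool{
	New: func() any { return make([]byte, 1<<20) }, // 1 MiB; illustrative
}

// pooledCopy borrows a buffer from the pool for the duration of the copy.
func pooledCopy(dst io.Writer, src io.Reader) (int64, error) {
	buf := bufPool.Get().([]byte)
	defer bufPool.Put(buf)
	return io.CopyBuffer(dst, src, buf)
}
```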
All in all it looks good. I also wonder whether that code path should be enabled in the case of HDD-only nodes?
can you elaborate more on this?
zos right now also supports HDD nodes; I believe the sequential nature of HDDs would make the performance worse with the concurrency.
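If this ever becomes a concern, a sketch of how the concurrency could be capped per disk type, assuming the node knows whether it runs on HDDs (the limits are illustrative, not part of this PR):

```go
package storage

import "golang.org/x/sync/errgroup"

// newWriteGroup caps concurrency based on the disk type, so HDD-only
// nodes effectively keep the old sequential behaviour.
func newWriteGroup(isHDD bool) *errgroup.Group {
	var g errgroup.Group
	if isHDD {
		g.SetLimit(1) // sequential for rotational disks
	} else {
		g.SetLimit(8) // illustrative limit for SSDs
	}
	return &g
}
```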
On another note, I'd also add the concept of retries to the code if possible.
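A rough sketch of what such a retry helper could look like (attempt count and delay are illustrative, not part of this PR):

```go
package storage

import (
	"fmt"
	"time"
)

// withRetries re-runs fn up to attempts times with a fixed delay between
// tries, returning the last error if all attempts fail.
func withRetries(attempts int, delay time.Duration, fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		time.Sleep(delay)
	}
	return fmt.Errorf("after %d attempts: %w", attempts, err)
}
```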
As a side note, the current code is favoring SSD disks.
why not handle it inside
I assume it won't happen because the slowness is from the remote.
Description
Change image DiskWrite from sequential to concurrent, to make it faster.
Changes
Related Issues
Fixes :
Checklist