upload-product is not efficient #196
Comments
We have created an issue in Pivotal Tracker to manage this. Unfortunately, the Pivotal Tracker project is private so you may be unable to view the contents of the story. The labels on this github issue will be updated when the story is started. |
That's really dope! Question: why didn't you just modify the formcontent package in place instead of splitting this out as a repo?
|
It can be as generic as a standalone library, I guess? I didn't find a good enough library, so I just wanted to write one. |
@ljfranklin curious for your thoughts on this one since I know you've spent time in the past looking at improvements to upload speeds |
Definitely something we'd like to investigate further. At minimum we shouldn't be leaving 9GB temp files after exit. I haven't looked at our code closely but I wonder if it's related to this. Extra disk space aside, are you seeing any speed up in upload time with your library? |
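(As context for the temp-file point: the usual Go pattern for not leaving temp files behind is to remove them as soon as the upload is done. The sketch below is purely illustrative, not om's actual code; the function and prefix names are made up.)

```go
package upload

import (
	"io"
	"os"
)

// stageToTemp copies a payload to a temp file and returns a cleanup function
// the caller should defer, so multi-GB files never outlive the command.
// Illustrative sketch only, not om's implementation.
func stageToTemp(payload io.Reader) (*os.File, func(), error) {
	tmp, err := os.CreateTemp("", "om-upload-")
	if err != nil {
		return nil, nil, err
	}
	cleanup := func() {
		tmp.Close()
		os.Remove(tmp.Name()) // don't leave a 9GB file in /tmp after exit
	}
	if _, err := io.Copy(tmp, payload); err != nil {
		cleanup()
		return nil, nil, err
	}
	return tmp, cleanup, nil
}
```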
That is interesting, but I believe it is a separate issue. As for the time saved: given the assumption of a 9G file on a rotational disk (write speed ~150MB/s), it takes about one minute just to write the tempfile to disk (9GB ÷ ~150MB/s ≈ 60s), and all that time the command just shows "processing product". The library wipes out that minute. |
@fredwangwang could you run a quick timing comparison of the two approaches? |
There is no point in timing it... When uploading a 9G file, network bandwidth fluctuation would be the primary factor causing any time difference. The library shortens the total time by roughly one minute given the exact same network situation, that's all. For a smaller file (1G for example), writing to disk doesn't take that long anyway, so the (noticeable, mainly time-wise) difference from the library is not obvious.
If you dig into the source code, it is writing the entire multipart payload, including the full product file, into a tempfile. If you really want to see the improvement of the library without touching the source code: go to an upload-product task, watch it run, and time how long it spends on "processing product". |
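(To make the cost concrete, here is a minimal sketch of what a tempfile-based multipart upload looks like with Go's stdlib. It illustrates the approach described above, not om's actual code; the form-field name and request handling are assumptions.)

```go
package upload

import (
	"io"
	"mime/multipart"
	"net/http"
	"os"
)

// requestViaTempFile stages the entire multipart body (boundaries, headers,
// and the full product file) on disk before the upload starts. For a 9G
// product that means an extra 9G in /tmp and roughly a minute of disk writes
// before any bytes reach the network. Illustrative sketch, not om's code.
func requestViaTempFile(url, productPath string) (*http.Request, error) {
	tmp, err := os.CreateTemp("", "multipart-")
	if err != nil {
		return nil, err
	}
	// Note: nothing removes tmp here, which is how large payloads end up
	// lingering in /tmp after the command exits.

	mw := multipart.NewWriter(tmp)
	part, err := mw.CreateFormFile("file", productPath) // field name is an assumption
	if err != nil {
		return nil, err
	}
	f, err := os.Open(productPath)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	if _, err := io.Copy(part, f); err != nil { // a second full copy of the product
		return nil, err
	}
	if err := mw.Close(); err != nil {
		return nil, err
	}

	if _, err := tmp.Seek(0, io.SeekStart); err != nil {
		return nil, err
	}
	req, err := http.NewRequest("POST", url, tmp)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", mw.FormDataContentType())
	if info, err := tmp.Stat(); err == nil {
		req.ContentLength = info.Size()
	}
	return req, nil
}
```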
@fredwangwang confirmed that your implementation does speed things up: Current
Your code (<1 second processing time):
For your implementation, I'm still hoping we can have the stdlib do the heavy lifting and not have to deal with the low-level boundary calculations. Could we instead use an approach similar to this to have an io.Pipe stream the multipart body? |
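(For reference, a minimal sketch of that io.Pipe pattern, assuming a single file part; the field name is illustrative, not om's actual API.)

```go
package upload

import (
	"io"
	"mime/multipart"
	"net/http"
	"os"
)

// streamingRequest shows the io.Pipe approach: a goroutine writes the
// multipart body straight into the request, so nothing is staged on disk and
// the stdlib still does all the boundary bookkeeping. Illustrative sketch.
func streamingRequest(url, productPath string) (*http.Request, error) {
	pr, pw := io.Pipe()
	mw := multipart.NewWriter(pw)

	go func() {
		f, err := os.Open(productPath)
		if err != nil {
			pw.CloseWithError(err)
			return
		}
		defer f.Close()

		part, err := mw.CreateFormFile("file", productPath)
		if err != nil {
			pw.CloseWithError(err)
			return
		}
		if _, err := io.Copy(part, f); err != nil {
			pw.CloseWithError(err)
			return
		}
		// Closing the multipart writer emits the final boundary; closing the
		// pipe signals EOF to the HTTP transport.
		pw.CloseWithError(mw.Close())
	}()

	req, err := http.NewRequest("POST", url, pr)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", mw.FormDataContentType())
	return req, nil
}
```

One trade-off with the pipe approach is that the total Content-Length is not known up front, so the request goes out chunked unless the length is computed separately.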
I have that here: https://github.com/fredwangwang/formcontent/tree/threaded
That was my first attempt (using io.Pipe), as mentioned in my original post. So you could take a look at that; the code does look simpler. |
@fredwangwang Let's cross-team pair on getting this merged in. I'm liking the direction the threaded branch is going in, but I don't understand why we have a
Then in |
@ljfranklin yeah, I do think your solution is cleaner. The reason I did it that way is that I didn't want to create additional structs to hold all the different properties, so the |
Moving convo from slack back here:
I prefer the one writer approach just because it's easier for me to wrap my head around and matches most closely with the canonical multipart upload examples listed in the godocs. My main goal is to minimize the amount of low-level boundary calculations we have to do to keep the code as maintainable as possible, e.g. |
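(For context on avoiding boundary math while still knowing the body length: one pattern is to let multipart.Writer emit the part header and the closing boundary into two small buffers, then splice the file between them with io.MultiReader. The sketch below only illustrates that idea, with an assumed field name; it is not necessarily what the merged code does.)

```go
package upload

import (
	"bytes"
	"io"
	"mime/multipart"
	"os"
)

// switchWriter lets us redirect multipart.Writer's output mid-stream, so the
// part header lands in one buffer and the closing boundary in another.
type switchWriter struct{ w io.Writer }

func (s *switchWriter) Write(p []byte) (int, error) { return s.w.Write(p) }

// knownLengthBody returns a streaming multipart body whose exact length is
// known up front, with every boundary generated by the stdlib.
// Illustrative sketch; the field name is an assumption.
func knownLengthBody(productPath string) (body io.Reader, contentType string, length int64, err error) {
	f, err := os.Open(productPath)
	if err != nil {
		return nil, "", 0, err
	}
	info, err := f.Stat()
	if err != nil {
		return nil, "", 0, err
	}

	var head, tail bytes.Buffer
	sw := &switchWriter{w: &head}
	mw := multipart.NewWriter(sw)

	// Writes the opening boundary and part headers into head.
	if _, err := mw.CreateFormFile("file", info.Name()); err != nil {
		return nil, "", 0, err
	}
	// Redirect so the closing boundary written by Close lands in tail.
	sw.w = &tail
	if err := mw.Close(); err != nil {
		return nil, "", 0, err
	}

	body = io.MultiReader(&head, f, &tail)
	length = int64(head.Len()) + info.Size() + int64(tail.Len())
	return body, mw.FormDataContentType(), length, nil
}
```

The caller can then set the request's ContentLength and stream the body without ever touching the boundary strings directly.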
Paired for a bit with @fredwangwang and sounds like we're in sync about the potential implementation details. He's planning on submitting what he has as a PR and we'll try to get it merged in. |
Merged #211 to fix this. Thanks for working through this tricky issue @fredwangwang! |
I tried the following and don't see an obvious improvement. Is my test case valid? New version (
Old version (
oh... looks like I should just be paying attention to the time spent in "processing product"? |
Yep, the improvement would not speed up the network connection (I wish it could...); it's about reducing the "processing product" time and the temp disk space occupied by the command. |
Get ts:
dm $ brew install moreutils

How big is this thing? (1.9G)
dm $ ls -lh ~/Downloads/p-isolation-segment-2.3.0-build.182.pivotal
-rw-r--r--@ 1 pivotal staff 1.9G Jul 25 11:32 /Users/pivotal/Downloads/p-isolation-segment-2.3.0-build.182.pivotal

Before (8s):
dm $ om -k upload-product --product ~/Downloads/p-isolation-segment-2.3.0-build.182.pivotal | ts
Jul 25 11:49:04 processing product
Jul 25 11:49:12 beginning product upload to Ops Manager
...

After (0s):
dm $ om -k upload-product --product ~/Downloads/p-isolation-segment-2.3.0-build.182.pivotal | ts
Jul 25 11:51:38 processing product
Jul 25 11:51:38 beginning product upload to Ops Manager
...
|
And if you dig into the /tmp folder, you will find several copies of the ~1.9G file, which were generated by the old command's executions.
|
Related to issue #196: I created formcontent in my repo; now there is a request to move the code into a Pivotal repo to remove the dependency on a personal repo. [#159835480]
Had a branch to move formcontent into |
We have created an issue in Pivotal Tracker to manage this. Unfortunately, the Pivotal Tracker project is private so you may be unable to view the contents of the story. The labels on this github issue will be updated when the story is started. |
merged |
Noticed this yesterday: when I ran upload-product --product pas.pivotal, om spent a long time saying "processing product". Eventually I found that it writes the content payload to temp disk. This makes sense because we never want to load a 9G file into memory, but it is not efficient. And after the upload process finishes, it keeps the payload in the /tmp folder. Although it will eventually be cleaned up by the operating system, it occupies a lot of space.

So I tried to find a way to multipart-upload a big file without consuming much disk or memory, but failed to find a good library to do that. Then I tried to write one, with the aim of plugging into om as easily as possible.

Then I came up with this:
https://github.com/fredwangwang/formcontent/

It loads the content of the file from the disk on demand, so it doesn't eat additional disk or consume much memory.

There are two versions in the repository: one is threaded (master branch), the other is non-threaded (non-threaded branch). The non-threaded branch works better, but the code clarity is worse, as it basically implements a cheap state machine in a custom Read method.
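(For anyone curious what that looks like, here is a rough sketch of the idea: a Reader that serves the opening boundary and headers, then the file, then the closing boundary, advancing a state field as each piece is exhausted. This is illustrative only, not the actual formcontent code; the boundary and header strings are made up, and io.MultiReader would achieve the same thing with less ceremony.)

```go
package upload

import (
	"io"
	"os"
	"strings"
)

// stateReader is a rough illustration of a "cheap state machine in a custom
// Read method": it serves the part header, then the file bytes, then the
// closing boundary. Not the actual formcontent implementation.
type stateReader struct {
	state int // 0 = header, 1 = file, 2 = trailer, 3 = done
	head  io.Reader
	file  *os.File
	tail  io.Reader
}

func (r *stateReader) Read(p []byte) (int, error) {
	for {
		var src io.Reader
		switch r.state {
		case 0:
			src = r.head
		case 1:
			src = r.file
		case 2:
			src = r.tail
		default:
			return 0, io.EOF
		}
		n, err := src.Read(p)
		if err == io.EOF {
			r.state++ // current piece exhausted, move to the next one
			if n > 0 {
				return n, nil
			}
			continue
		}
		return n, err
	}
}

// newStateReader wires the pieces together for a single file part. The
// boundary and header strings are written by hand here, which is exactly the
// low-level bookkeeping that the stdlib-based approaches avoid.
func newStateReader(boundary string, f *os.File) *stateReader {
	head := "--" + boundary + "\r\n" +
		`Content-Disposition: form-data; name="file"; filename="product.pivotal"` + "\r\n" +
		"Content-Type: application/octet-stream\r\n\r\n"
	tail := "\r\n--" + boundary + "--\r\n"
	return &stateReader{head: strings.NewReader(head), file: f, tail: strings.NewReader(tail)}
}
```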