Streaming Google storage upload #759

Open
Fonsan opened this issue Nov 21, 2018 · 17 comments
Labels
type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@Fonsan

Fonsan commented Nov 21, 2018

I am trying to solve the case of uploading files in a streaming fashion, without keeping the full contents in memory.

I have previously implemented the same feature in aws/aws-sdk-ruby#1711

As identified in googleapis/google-cloud-ruby#1997, there are limitations in google-api-client, which hard-codes the use of the Content-Length header.

I have written a minimal wrapper library that monkey-patches it, which should serve as a good source of inspiration: https://github.com/Fonsan/gcs_stream_upload

require 'gcs_stream_upload'

storage = Google::Cloud::Storage.new(timeout: 5 * 60)
bucket = storage.bucket('bucket')
gcs_stream_upload = GCSStreamUpload.new(bucket)

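gcs_stream_upload.upload('object') do |io|
  # Stream a subprocess's output into the object; the block form of
  # IO.popen closes the pipe once the copy is done.
  IO.popen('yes | head -n 10') { |pipe| IO.copy_stream(pipe, io) }
end
# => Written "y\n" * 10

gcs_stream_upload.upload('object') do |io|
  io << 'data'
  io << 'dat2'
end
# => Written "datadat2"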
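For readers wondering how a block-based API like the one above can stream without buffering the whole payload, here is a minimal Ruby sketch under stated assumptions: the caller writes into the write end of an `IO.pipe` while a background thread drains the read end. This is not necessarily how gcs_stream_upload is implemented; `stream_through_pipe` and the byte-counting consumer are hypothetical names, and the consumer merely stands in for the code that would send the HTTP upload.

```ruby
# Hypothetical sketch: stream_through_pipe is NOT part of gcs_stream_upload or
# google-cloud-storage; `consumer` stands in for the code that sends the upload.
def stream_through_pipe(consumer)
  reader, writer = IO.pipe
  # Drain the read end in a background thread so the producer never has to
  # hold the whole payload in memory.
  thread = Thread.new do
    begin
      consumer.call(reader)
    ensure
      reader.close
    end
  end
  begin
    yield writer
  ensure
    writer.close # closing the write end signals EOF to the consumer
  end
  thread.value # waits for the consumer and re-raises any error it raised
end

# Example consumer: reads 16 KiB at a time and keeps only a running byte count.
count_bytes = lambda do |io|
  total = 0
  while (chunk = io.read(16 * 1024))
    total += chunk.bytesize
  end
  total
end

bytes = stream_through_pipe(count_bytes) do |io|
  io << 'data'
  io << 'dat2'
end
puts bytes # prints 8
```

Because the pipe buffer is bounded, a fast producer simply blocks until the consumer catches up, which is what keeps memory use flat regardless of the payload size.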
@quartzmo quartzmo self-assigned this Nov 26, 2018
@quartzmo
Member

@Fonsan Thank you for your patience with this request! I finally got the chance today to run https://github.com/Fonsan/gcs_stream_upload and the examples work for me.

@quartzmo
Member

@Fonsan Can you see a backward-compatible way to add this functionality to googleapis/google-api-ruby-client? I think that might be the first step toward incorporating it into google-cloud-storage. Even then, however, I'm not sure whether other considerations might prevent adding this feature, for example the use of #rewind, as mentioned by @blowmage.
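The #rewind concern comes from retries: a pipe or socket cannot be re-read, so a failed request cannot simply start over from the beginning of the source. One common workaround, sketched here as an assumption rather than as anything google-api-client does, is to read the source in fixed-size chunks and keep only the current chunk in memory so it can be resent. `upload_in_chunks` is a hypothetical helper and the block stands in for a single chunk request; for GCS resumable uploads, non-final chunks are expected to be multiples of 256 KiB, so the 4-byte chunk size below is only for illustration.

```ruby
require 'stringio'

# Hypothetical sketch, not part of google-api-client: reads a non-rewindable
# IO in fixed-size chunks and retries a failed chunk from memory, so only one
# chunk is ever buffered.
def upload_in_chunks(io, chunk_size, max_attempts: 3)
  offset = 0
  while (chunk = io.read(chunk_size))
    attempts = 0
    begin
      attempts += 1
      yield chunk, offset # the block stands in for one chunk request
    rescue IOError
      retry if attempts < max_attempts # resend the buffered chunk
      raise
    end
    offset += chunk.bytesize
  end
  offset # total number of bytes sent
end

# Usage: a sender that fails once, simulating a transient network error.
calls = 0
total = upload_in_chunks(StringIO.new('datadat2'), 4) do |chunk, offset|
  calls += 1
  raise IOError, 'transient failure' if calls == 1
  puts "sent #{chunk.inspect} at offset #{offset}"
end
puts total # prints 8
```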

@Fonsan
Author

Fonsan commented Jan 18, 2019

@quartzmo Given the way request_header works in #send_start_command, stubbing does not get us far: even if we stubbed #size on objects that do not respond to it, and then stubbed #to_s, we would still end up in a scenario where UPLOAD_CONTENT_LENGTH => nil. The header is still set in that case, and the GCS backend currently rejects the request.

What I am describing above is a bit of a hack; I would much rather see the change made in:

def send_start_command(client)
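The change being asked for in send_start_command amounts to not sending an upload-length header when the size is unknown. As a minimal sketch, assuming the header names used by the GCS JSON API resumable-upload initiation request (`X-Upload-Content-Type`, and `X-Upload-Content-Length`, which the GCS docs describe as optional, as far as I know), a hypothetical `start_request_headers` helper could look like the following. It is not the actual google-api-client implementation.

```ruby
require 'stringio'

# Hypothetical helper, not the actual google-api-client code.
def start_request_headers(upload_io, content_type)
  headers = { 'X-Upload-Content-Type' => content_type }
  # Announce the total length only when the source can report it. A pipe or
  # socket has no #size, so the header is omitted instead of being sent empty.
  if upload_io.respond_to?(:size)
    headers['X-Upload-Content-Length'] = upload_io.size.to_s
  end
  headers
end

known = start_request_headers(StringIO.new('datadat2'), 'text/plain')
puts known['X-Upload-Content-Length'] # prints 8

reader, writer = IO.pipe
unknown = start_request_headers(reader, 'text/plain')
puts unknown.key?('X-Upload-Content-Length') # prints false
reader.close
writer.close
```

A plain `IO` (such as a pipe) does not define `#size`, whereas `StringIO` and `File` do, which is why `respond_to?` is enough to tell the two cases apart here.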

Is there a specific reason why google-api-ruby-client could not be extended to allow for IO objects that do not respond to size? @blowmage

BTW, at https://www.kaiko.com/ we are running the gem above in production until a cleaner alternative comes along.

@quartzmo
Copy link
Member

Is there a specific reason why google-api-ruby-client could not be extended to allow for IO objects that do not respond to size?

Should this conversation be moved to a new issue on googleapis/google-api-ruby-client?

@quartzmo
Copy link
Member

@Fonsan If it's OK with you, I will transfer this issue to googleapis/google-api-ruby-client for further discussion.

@Fonsan
Author

Fonsan commented Jan 19, 2019

@quartzmo any effort to move this forward would be excellent :)

@quartzmo quartzmo transferred this issue from googleapis/google-cloud-ruby Jan 29, 2019
@quartzmo quartzmo added the type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. label Jan 29, 2019
@quartzmo quartzmo removed their assignment Jan 29, 2019
@dazuma
Member

dazuma commented Jul 25, 2019

@quartzmo I'll let you investigate the feasibility of this.

@quartzmo
Member

@blowmage Any comment on this?

Is there a specific reason why google-api-ruby-client could not be extended to allow for IO objects that do not respond to size?

@blowmage
Contributor

I believe HttpClient is checking for size. But yes, google-api-client could be rewritten to use something other than HttpClient and not check File#size.
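To illustrate that an HTTP client does not inherently need the size up front, here is a minimal sketch using Ruby's standard `Net::HTTP`, which can send a body read from an IO without a Content-Length header when `Transfer-Encoding: chunked` is set. `build_streaming_put` and the session URL are hypothetical, and whether GCS accepts a chunked body for a given upload type is an assumption here, not something this thread confirms.

```ruby
require 'net/http'
require 'uri'
require 'stringio'

# Hypothetical helper: builds (but does not send) a PUT whose body is read
# from an IO of unknown length. session_url stands in for the URL returned by
# a resumable-upload start request.
def build_streaming_put(session_url, io, content_type: 'application/octet-stream')
  req = Net::HTTP::Put.new(URI(session_url))
  req['Content-Type'] = content_type
  # With chunked transfer encoding, Net::HTTP writes the body in chunks as it
  # reads the stream, so no Content-Length header is required.
  req['Transfer-Encoding'] = 'chunked'
  req.body_stream = io
  req
end

req = build_streaming_put('https://example.com/upload?upload_id=hypothetical',
                          StringIO.new('datadat2'))
puts req['Transfer-Encoding'] # prints chunked
puts req['Content-Length'].inspect # prints nil
```

Actually sending it would take something like `Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }` plus an authorization header; without the chunked header, `Net::HTTP` refuses to send a `body_stream` that has no Content-Length.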

@ianks

ianks commented Oct 19, 2019

Judging from the documentation, the Content-Length header is not required, so this may be a straightforward fix. @quartzmo do you still plan on transferring the issue?

@quartzmo quartzmo removed their assignment Nov 13, 2019
@quartzmo
Member

quartzmo commented Dec 7, 2020

quartzmo transferred this issue from googleapis/google-cloud-ruby on Jan 29, 2019

@tinco

tinco commented Feb 16, 2021

Hi @quartzmo, are you actively working on streaming support for this? I noticed this issue has no one assigned to it right now.

@dazuma
Member

dazuma commented Feb 16, 2021

It's not being actively worked on right at this point. Likely this will require first completing #2348 (which is on my plate but I won't get to it for a few more weeks).

@simi

simi commented Mar 10, 2021

Just noting that googleapis/google-cloud-ruby#8235 is related as well. Any news? Would a PR for #2348 be welcome?

@simi

simi commented Apr 13, 2021

@dazuma is there anything I can help with? Would help pushing #2348 forward be welcome?

@simi

simi commented Jul 28, 2021

:'( no progress? Anything I can help with here?

@simi

simi commented May 29, 2023

@quartzmo @frankyn @dazuma It is 2023 and, if I understand correctly, this is still not part of the official library. 😢

Projects
None yet
Development

No branches or pull requests

7 participants