Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

large file is going to blow memory #1780

Closed
philister opened this issue Nov 24, 2015 · 10 comments
Closed

large file is going to blow memory #1780

philister opened this issue Nov 24, 2015 · 10 comments
Labels

Comments

@philister
Copy link

I believe the commit 927479c (PR #1517 and issue #1338) leads to a memory-problem when uploading large files:
I guess file.read reads the complete file-content into memory. Fog can not "stream" the file from disk, using the multipart-feature correctly (when set fog_attributes[:multipart_chunk_size])
At least, when I revert this commit, my large files (more GB than available mem) are uploaded. Current master returns a "NoMemoryError: failed to allocate memory".

philister added a commit to pictrs/carrierwave that referenced this issue Nov 24, 2015
@thomasfedb
Copy link
Contributor

Hi @philister, can you provide an example of an uploader that exhibits this issue?

@thomasfedb thomasfedb added the bug label Nov 25, 2015
@philister
Copy link
Author

Gemfile.lock (excerpt)

remote: git@github.com:carrierwaveuploader/carrierwave.git
revision: d6e7a4480ae595ad28f4cd19c5b1c6a7918b015f

rails (4.2.5)
fog (1.36.0)

animal.rb

class Animal < ActiveRecord::Base
  mount_uploader :picture
end

rails c

a = Animal.new; a.picture = File.open('10GB.zip') # more than RAM
=> #<File:/home/philipp/testrails/10GB.zip>
>> a.save
   (0.1ms)  begin transaction
  SQL (0.8ms)  INSERT INTO "animals" ("picture", "created_at", "updated_at") VALUES (?, ?, ?)  [["picture", "10GB.zip"], ["created_at", "2015-11-25 10:46:04.552696"], ["updated_at", "2015-11-25 10:46:04.552696"]]
   (0.2ms)  rollback transaction
NoMemoryError: failed to allocate memory
    from /home/philipp/.rvm/gems/ruby-2.2.3/bundler/gems/carrierwave-5217412e56ca/lib/carrierwave/storage/fog.rb:315:in `read'
    from /home/philipp/.rvm/gems/ruby-2.2.3/bundler/gems/carrierwave-5217412e56ca/lib/carrierwave/storage/fog.rb:315:in `store'
    from /home/philipp/.rvm/gems/ruby-2.2.3/bundler/gems/carrierwave-5217412e56ca/lib/carrierwave/storage/fog.rb:78:in `store!'
    from /home/philipp/.rvm/gems/ruby-2.2.3/bundler/gems/carrierwave-5217412e56ca/lib/carrierwave/uploader/store.rb:59:in `block in store!'
    from /home/philipp/.rvm/gems/ruby-2.2.3/bundler/gems/carrierwave-5217412e56ca/lib/carrierwave/uploader/callbacks.rb:17:in `with_callbacks'
    from /home/philipp/.rvm/gems/ruby-2.2.3/bundler/gems/carrierwave-5217412e56ca/lib/carrierwave/uploader/store.rb:58:in `store!'
    from /home/philipp/.rvm/gems/ruby-2.2.3/bundler/gems/carrierwave-5217412e56ca/lib/carrierwave/mounter.rb:99:in `each'
    from /home/philipp/.rvm/gems/ruby-2.2.3/bundler/gems/carrierwave-5217412e56ca/lib/carrierwave/mounter.rb:99:in `store!'
    from /home/philipp/.rvm/gems/ruby-2.2.3/bundler/gems/carrierwave-5217412e56ca/lib/carrierwave/mount.rb:387:in `store_picture!'
    from /home/philipp/.rvm/gems/ruby-2.2.3/gems/activesupport-4.2.5/lib/active_support/callbacks.rb:432:in `block in make_lambda'
    from /home/philipp/.rvm/gems/ruby-2.2.3/gems/activesupport-4.2.5/lib/active_support/callbacks.rb:228:in `call'
    from /home/philipp/.rvm/gems/ruby-2.2.3/gems/activesupport-4.2.5/lib/active_support/callbacks.rb:228:in `block in halting_and_conditional'
    from /home/philipp/.rvm/gems/ruby-2.2.3/gems/activesupport-4.2.5/lib/active_support/callbacks.rb:506:in `call'
    from /home/philipp/.rvm/gems/ruby-2.2.3/gems/activesupport-4.2.5/lib/active_support/callbacks.rb:506:in `block in call'
    from /home/philipp/.rvm/gems/ruby-2.2.3/gems/activesupport-4.2.5/lib/active_support/callbacks.rb:506:in `each'
    from /home/philipp/.rvm/gems/ruby-2.2.3/gems/activesupport-4.2.5/lib/active_support/callbacks.rb:506:in `call'
... 8 levels...
    from /home/philipp/.rvm/gems/ruby-2.2.3/gems/activerecord-4.2.5/lib/active_record/connection_adapters/abstract/database_statements.rb:213:in `block in transaction'
    from /home/philipp/.rvm/gems/ruby-2.2.3/gems/activerecord-4.2.5/lib/active_record/connection_adapters/abstract/transaction.rb:184:in `within_new_transaction'
    from /home/philipp/.rvm/gems/ruby-2.2.3/gems/activerecord-4.2.5/lib/active_record/connection_adapters/abstract/database_statements.rb:213:in `transaction'
    from /home/philipp/.rvm/gems/ruby-2.2.3/gems/activerecord-4.2.5/lib/active_record/transactions.rb:220:in `transaction'
    from /home/philipp/.rvm/gems/ruby-2.2.3/gems/activerecord-4.2.5/lib/active_record/transactions.rb:348:in `with_transaction_returning_status'
    from /home/philipp/.rvm/gems/ruby-2.2.3/gems/activerecord-4.2.5/lib/active_record/transactions.rb:286:in `block in save'
    from /home/philipp/.rvm/gems/ruby-2.2.3/gems/activerecord-4.2.5/lib/active_record/transactions.rb:301:in `rollback_active_record_state!'
    from /home/philipp/.rvm/gems/ruby-2.2.3/gems/activerecord-4.2.5/lib/active_record/transactions.rb:285:in `save'
    from (irb):1
    from /home/philipp/.rvm/gems/ruby-2.2.3/gems/railties-4.2.5/lib/rails/commands/console.rb:110:in `start'
    from /home/philipp/.rvm/gems/ruby-2.2.3/gems/railties-4.2.5/lib/rails/commands/console.rb:9:in `start'
    from /home/philipp/.rvm/gems/ruby-2.2.3/gems/railties-4.2.5/lib/rails/commands/commands_tasks.rb:68:in `console'
    from /home/philipp/.rvm/gems/ruby-2.2.3/gems/railties-4.2.5/lib/rails/commands/commands_tasks.rb:39:in `run_command!'
    from /home/philipp/.rvm/gems/ruby-2.2.3/gems/railties-4.2.5/lib/rails/commands.rb:17:in `<top (required)>'
    from bin/rails:9:in `require'
    from bin/rails:9:in `<main>'

@thomasfedb
Copy link
Contributor

@philister Can you provide a full stack trace please?

@philister
Copy link
Author

Added above.

@thomasfedb
Copy link
Contributor

@bensie Are you able to look over this?

@bensie
Copy link
Member

bensie commented Nov 27, 2015

This definitely seems problematic, though I'm not sure when I'll have time to dig in and take a look.

@leandro
Copy link

leandro commented Dec 27, 2016

I'm going through a similar problem but on the opposite direction. I'm fetching a large file and I want to be able to (according to @philister example) forward it to the request's response by chunks:

tmp_path = Tempfile.create('animal-tmp') do |output|
  while buffer = animal.picture.read(4096)
    output.write(buffer)
  end
  output.path
end

send_file tmp_path, type: animal.picture.content_type, disposition: 'inline'

And I noticed, that the CarrierWave::Uploader::Proxy#read doesn't allow me to do so:
https://github.com/carrierwaveuploader/carrierwave/blob/master/lib/carrierwave/uploader/proxy.rb#L43

PS.: I'm not using send_data because using send_file is apparently faster.

@leandro
Copy link

leandro commented Dec 27, 2016

Sorry. Never mind. animal.picture.file points to a CarrierWave::Storage::Fog::File's instance. So I'll check out their code to see what I can do about it.

@tmedford
Copy link

Is there any update on this?

DarkArc added a commit to topechelon/carrierwave that referenced this issue Apr 12, 2018
This fixes an issue with file upload streaming. Previously passing the file was added in PR carrierwaveuploader#468, and was subsequently broken by PR carrierwaveuploader#1517.

This corrects the approach implemented in PR carrierwaveuploader#468, without introducing the problem in PR carrierwaveuploader#1517.

New behavior:

  In the case of a fog based retrieval request, the file will be fetched as it was previously. The file read method will then see that file.body is set to a non-nil value, and return the file content from the file object.

  In the case of a fog based storage request, the file will be uploaded either as full text, or in batches as it was in PR carrierwaveuploader#468. However, to address the issue observed in PR carrierwaveuploader#1517 we set the body to nil, and then store a reference to the source file (the CarrierWave::SanitizedFile/read compatible object passed). This in turn results in the read method seeing no body on the file object, and consulting the source file for the file contents.

The approach to storage requests is taken over two other approaches:

  Approach carrierwaveuploader#1:

    We could read the file into memory as carrierwave currently does, however, this results in large files often exhausting the heap space of the process, thus making carrierwave a poor choice for large file uploads.

  Approach carrierwaveuploader#2:

    We could not store a reference to the original source file, however, this would result in an additional retrieval requests carrierwave does not currently make, potentially costing in bandwidth for current users.
DarkArc added a commit to topechelon/carrierwave that referenced this issue Apr 12, 2018
This fixes an issue with file upload streaming. Previously passing the file was added in PR carrierwaveuploader#468, and was subsequently broken by PR carrierwaveuploader#1517.

This corrects the approach implemented in PR carrierwaveuploader#468, without introducing the problem in PR carrierwaveuploader#1517.

New behavior:

  In the case of a fog based retrieval request, the file will be fetched as it was previously. The file read method will then see that file.body is set to a non-nil value, and return the file content from the file object.

  In the case of a fog based storage request, the file will be uploaded either as full text, or in batches as it was in PR carrierwaveuploader#468. However, to address the issue observed in PR carrierwaveuploader#1517 we set the body to nil, and then store a reference to the source file (the CarrierWave::SanitizedFile/read compatible object passed). This in turn results in the read method seeing no body on the file object, and consulting the source file for the file contents.

The approach to storage requests is taken over two other approaches:

  Approach carrierwaveuploader#1:

    We could read the file into memory as carrierwave currently does, however, this results in large files often exhausting the heap space of the process, thus making carrierwave a poor choice for large file uploads.

  Approach carrierwaveuploader#2:

    We could not store a reference to the original source file, however, this would result in an additional retrieval requests carrierwave does not currently make, potentially costing in bandwidth for current users.
DarkArc added a commit to topechelon/carrierwave that referenced this issue Apr 12, 2018
This fixes an issue with file upload streaming. Previously passing the file was added in PR carrierwaveuploader#468, and was subsequently broken by PR carrierwaveuploader#1517.

This corrects the approach implemented in PR carrierwaveuploader#468, without introducing the problem in PR carrierwaveuploader#1517.

New behavior:

  In the case of a fog based retrieval request, the file will be fetched as it was previously. The file read method will then see that file.body is set to a non-nil value, and return the file content from the file object.

  In the case of a fog based storage request, the file will be uploaded either as full text, or in batches as it was in PR carrierwaveuploader#468. However, to address the issue observed in PR carrierwaveuploader#1517 we set the body to nil, and then store a reference to the source file (the CarrierWave::SanitizedFile/read compatible object passed). This in turn results in the read method seeing no body on the file object, and consulting the source file for the file contents.

The approach to storage requests is taken over two other approaches:

  Approach carrierwaveuploader#1:

    We could read the file into memory as carrierwave currently does, however, this results in large files often exhausting the heap space of the process, thus making carrierwave a poor choice for large file uploads.

  Approach carrierwaveuploader#2:

    We could not store a reference to the original source file, however, this would result in an additional retrieval requests carrierwave does not currently make, potentially costing in bandwidth for current users.
DarkArc added a commit to topechelon/carrierwave that referenced this issue Apr 12, 2018
This fixes an issue with file upload streaming. Previously passing the file was added in PR carrierwaveuploader#468, and was subsequently broken by PR carrierwaveuploader#1517.

This corrects the approach implemented in PR carrierwaveuploader#468, without introducing the problem in PR carrierwaveuploader#1517.

New behavior:

  In the case of a fog based retrieval request, the file will be fetched as it was previously. The file read method will then see that file.body is set to a non-nil value, and return the file content from the file object.

  In the case of a fog based storage request, the file will be uploaded either as full text, or in batches as it was in PR carrierwaveuploader#468. However, to address the issue observed in PR carrierwaveuploader#1517 we set the body to nil, and then store a reference to the source file (the CarrierWave::SanitizedFile/read compatible object passed). This in turn results in the read method seeing no body on the file object, and consulting the source file for the file contents.

The approach to storage requests is taken over two other approaches:

  Approach carrierwaveuploader#1:

    We could read the file into memory as carrierwave currently does, however, this results in large files often exhausting the heap space of the process, thus making carrierwave a poor choice for large file uploads.

  Approach carrierwaveuploader#2:

    We could not store a reference to the original source file, however, this would result in an additional retrieval requests carrierwave does not currently make, potentially costing in bandwidth for current users.
@mshibuya
Copy link
Member

Fixed by #2314.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants