forked from pboling/each_in_batches
-
Notifications
You must be signed in to change notification settings - Fork 1
Makes it easy to execute really large computations on each row of really large data sets
License
wickkidd/each_in_batches
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Introduction ================ From BolingForBatches: I often need to execute really large computations on really large data sets. I usually end up writing a rake task to do it, which calls methods in my models. But something about the process bugged me. Each time I had to re-implement my 'batching code' that allowed me to not chew up GB after GB of memory due to klass.find(:all, :include => [:everything_under_the_sun]). Re-implementation of the same logic over and over across many projects is not very DRY, so I got out my blow torch and lit it up. The difficulty was that the part that was different each time I batched was at the center of the code, right in the middle of the batch loop. But I didn't let that stop me! EachInBatches: I needed to iterate over the results and perform more actions than a single method would provide. I didn't want to write a method in my app that performed the needed functionality as I felt the plugin should support this directly. I modified the original plugin so that it takes a block instead of a method. It will pass the object instance to the block. It works pretty much the same as Class.find(:all).each {|x| do something}, except in batches n that you specify with :batch_size. Installation ============ ./script/plugin install git://github.com/wickkidd/each_in_batches.git Example ======= To create a new Batch, call Batch#new pass it the class and any additional arguements (all as a hash). batch = EachInBatches::Batch.new(:klass => Payment, :select => \"DISTINCT transaction_id\", :batch_size => 50, :order => 'transaction_id') To process the batched data, pass a block to Batch#run the same way you would to an object returned by Class.find(:all).each. Batch#run will pass the data to your block, one at a time, in batches set by the :batch_size arguement. batch.run {|x| puts x.id; puts x.transaction_id} Print the results! batch.print_results Or... Consolidate your code if you prefer EachInBatches::Batch.new(:klass => Payment, :select => \"DISTINCT transaction_id\", :batch_size => 50, :order => 'transaction_id', :show_results => true).run{|x| puts x.id; puts x.transaction_id} Configuration ============= Arguements for the initializer (Batch.new) method are: Required: :klass - Usage: :klass => MyClass Required, as this is the class that will be batched Optional: :include - Usage: :include => [:assoc] Optional :select - Usage: :select => "DISTINCT field_name" or :select => "field1, field2, field3" :order - Usage: :order => "field DESC" :conditions - Usage: :conditions => ["field1 is not null and field2 = ?", x] :verbose - Usage: :verbose => true or false Sets verbosity of output Default: false (if not provided) :batch_size - Usage: :batch_size => x Where x is some number. How many AR Objects should be processed at once? Default: 50 (if not provided) :last_batch - Usage: :last_batch => x Where x is some number. Only process up to and including batch #x. Batch numbers start at 0 for the first batch. Default: won't be used (no limit if not provided) :first_batch - Usage: first_batch => x Where x is some number. Begin processing batches beginning at batch #x. Batch numbers start at 0 for the first batch. Default: won't be used (no offset if not provided) :show_results - Usage: :show_results => true or false Prints statistics about the results of Batch#run. Default: true if verbose is set to true and :show_results is not provided, otherwise false Output ====== Interpreting the output: '[O]' means the batch was skipped due to an offset. '[L]' means the batch was skipped due to a limit. '[P]' means the batch is processing. '[C]' means the batch is complete. and yes... it was a coincidence. This class is not affiliated with 'one laptop per child' License ======= Copyright (c) 2008 Peter H. Boling, released under the MIT license Or in other words have fun, and don't blame me!
About
Makes it easy to execute really large computations on each row of really large data sets
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published
Languages
- Ruby 100.0%