Skip to content

Makes it easy to execute really large computations on each row of really large data sets

License

Notifications You must be signed in to change notification settings

wickkidd/each_in_batches

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Introduction
================

  From BolingForBatches:
    I often need to execute really large computations on really large data sets.
    I usually end up writing a rake task to do it, which calls methods in my models.
    But something about the process bugged me.  Each time I had to re-implement my
    'batching code' that allowed me to not chew up GB after GB of memory due to
    klass.find(:all, :include => [:everything_under_the_sun]). Re-implementation of
    the same logic over and over across many projects is not very DRY, so I got out
    my blow torch and lit it up.  The difficulty was that the part that was different
    each time I batched was at the center of the code, right in the middle of the
    batch loop.  But I didn't let that stop me!

  EachInBatches:
    I needed to iterate over the results and perform more actions than a single
    method would provide.  I didn't want to write a method in my app that performed
    the needed functionality as I felt the plugin should support this directly.
    I modified the original plugin so that it takes a block instead of a method.
    It will pass the object instance to the block.  It works pretty much the same
    as Class.find(:all).each {|x| do something}, except in batches n that you
    specify with :batch_size.

Installation
============

    ./script/plugin install git://github.com/wickkidd/each_in_batches.git

Example
=======
          
  To create a new Batch, call Batch#new pass it the class and any additional arguements (all as a hash).

    batch = EachInBatches::Batch.new(:klass => Payment, :select => \"DISTINCT transaction_id\", :batch_size => 50, :order => 'transaction_id')

  To process the batched data, pass a block to Batch#run the same way you would to an object returned by Class.find(:all).each.
  Batch#run will pass the data to your block, one at a time, in batches set by the :batch_size arguement.

    batch.run {|x| puts x.id; puts x.transaction_id}


  Print the results!

    batch.print_results

  Or...

  Consolidate your code if you prefer
  
    EachInBatches::Batch.new(:klass => Payment, :select => \"DISTINCT transaction_id\", :batch_size => 50, :order => 'transaction_id', :show_results => true).run{|x| puts x.id; puts x.transaction_id}


Configuration
=============

  Arguements for the initializer (Batch.new) method are:

    Required:

      :klass         - Usage: :klass => MyClass
                        Required, as this is the class that will be batched

    Optional:

      :include       - Usage: :include => [:assoc]
                        Optional

      :select        - Usage: :select => "DISTINCT field_name"
                                  or
                              :select => "field1, field2, field3"

      :order         - Usage: :order => "field DESC"

      :conditions    - Usage: :conditions => ["field1 is not null and field2 = ?", x]

      :verbose       - Usage: :verbose => true or false
                        Sets verbosity of output
                        Default: false (if not provided)

      :batch_size    - Usage: :batch_size => x
                        Where x is some number.
                        How many AR Objects should be processed at once?
                        Default: 50 (if not provided)

      :last_batch    - Usage: :last_batch => x
                        Where x is some number.
                        Only process up to and including batch #x.
                          Batch numbers start at 0 for the first batch.
                        Default: won't be used (no limit if not provided)

      :first_batch   - Usage: first_batch => x
                        Where x is some number.
                        Begin processing batches beginning at batch #x.
                          Batch numbers start at 0 for the first batch.
                        Default: won't be used (no offset if not provided)

      :show_results  - Usage: :show_results => true or false
                        Prints statistics about the results of Batch#run.
                        Default: true if verbose is set to true and :show_results is not provided, otherwise false
                    
Output
======
      
  Interpreting the output:
    '[O]' means the batch was skipped due to an offset.
    '[L]' means the batch was skipped due to a limit.
    '[P]' means the batch is processing.
    '[C]' means the batch is complete.
    and yes... it was a coincidence.  This class is not affiliated with 'one laptop per child'

License
=======

	Copyright (c) 2008 Peter H. Boling, released under the MIT license
	
	Or in other words have fun, and don't blame me!

About

Makes it easy to execute really large computations on each row of really large data sets

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Ruby 100.0%