- Wanted to speed up an export to CSV
- Use PostgreSQL COPY function to improve performance
- Create a view in PostgreSQL with 1,000,000 rows
- Create a
ActiveRecord
object namedExampleModel
backed by this view - Extend
ActiveRecord::Relation
with two methodsto_csv
andto_csv_copy
.to_csv
uses Ruby's built in CSV whileto_csv_copy
uses the PostgreSQL COPY command.
To use the PostgreSQL COPY command we need to do a few things:
- Get access to the raw
pg
connection withconnection.raw_connection
- Execute the COPY command. We do this by using the
to_sql
method ofActiveRecord::Relation
to get the base SQL we want to execute and then running the copy, printing tostdout
. - We then call the
get_copy_data
to get the copy data in CSV form.
For small number of records (~1,000), there is no noticeable difference in performance between the to_csv
and the to_csv_copy
.
But as the amount of data increases, you can start to see gigantic performance improvements.
On my development machine, I see speed ups of ~9x for 50,000 rows and 30x for 500,000 rows.
$ bundle install
$ rake db:migrate
$ rake benchmark
user system total real
1000 records with CSV library : 0.090000 0.010000 0.100000 ( 0.450539)
1000 records with PG COPY : 0.000000 0.000000 0.000000 ( 0.327618)
user system total real
50000 records with CSV library : 2.650000 0.100000 2.750000 ( 3.157916)
50000 records with PG COPY : 0.040000 0.020000 0.060000 ( 0.431258)
user system total real
500000 records with CSV library : 29.190000 0.820000 30.010000 ( 31.398112)
500000 records with PG COPY : 0.460000 0.180000 0.640000 ( 1.308529)