The man getting you down?
Theman lets you import lots of data into PostgreSQL very fast.
gem install 'theman'
Say we have a csv file called sample.csv
and we want to count how many rows we created:
conn = PGconn.open(:dbname => 'test') agent = Theman::Agency.new(conn, 'sample.csv') agent.create! res = conn.exec("SELECT count(*) FROM #{agent.table_name}") res.getvalue(0,0)
conn = ActiveRecord::Base.connection.raw_connection agent = Theman::Agency.new(conn, 'sample.csv') agent.create! model = Theman::Object.new(agent.table_name, ActiveRecord::Base) model.count
Theman will call the create!
method if you pass in a block.
conn = ActiveRecord::Base.connection.raw_connection agent = Theman::Agency.new conn, 'ugly.csv' do |smith| smith.nulls /"N"/, /"UNKNOWN"/, /""/ smith.chop 15 smith.delimiter "|" smith.table do |t| t.string :name, :limit => 50 t.date :date t.integer :ext_id t.float :amount t.boolean :exited end end MyModel.table_name = agent.table_name MyModel.where(:exited => true).count
In the above example we omitted the last 15 rows, made some things null and specified some column data types.
If you do not provide a table block your columns will be VARCHAR(255); you can cherry pick the columns that you want to change the data types for.
The temp table has no id column by default, but you can add one by calling add_primary_key!
, this will add the agents_pkey
column.
If you want to use ON COMMIT DROP
you will need to pass in :on_commit => :drop
into options and do everthing inside a transacton.
agent = Theman::Agency.new conn, 'sample.csv', :on_commit => :drop agent.transaction do agent.create! # do stuff end
If you data does not have headers pass into options :headers => false
, but each column must be specified or the import will fail.
Ah dates, the joy! Use datestyle to tell Theman to then tell PostgreSQL:
agent = Theman::Agency.new conn, 'uber_foie_gras.csv' do |schmit| schmit.datestyle 'European' schmit.table do |t| t.date :start_date t.date :end_date end end
Refer to PostgreSQL docs for more info in the mean time here is some copy and paste action:
ISO
Use ISO 8601-style dates and times (YYYY-MM-DD HH:MM:SS). This is the default.
SQL
Use Oracle/Ingres-style dates and times.
PostgreSQL
Use traditional PostgreSQL format.
German
dd.mm.yyyy
European
dd/mm/yyyy
US
mm/dd/yyyy
PostgreSQL COPY
requires that the data be well formed, any rows that are different to what is expected by the table the import will raise a Theman::Agency::Error
.
If you are importing very large files and the import fails space on disc will still be used untill VACUUM
.
(The MIT License)
Copyright © 2010 mynameisrufus (Rufus Post)
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.