Purview

A framework designed to simplify data warehousing

Installation

Add this line to your application's Gemfile:

gem 'purview'

And then execute:

$ bundle

Or install it yourself as:

$ gem install purview

Usage

Load the MySQL client (for MSSQL simple change 'mysql2' to 'tiny_tds'; for PostgreSQL simply change 'mysql2' to 'pg' -- when using this gem in a JRuby environment the 'jdbc/jtds', 'jdbc/mysql' and/or 'jdbc/postgres'library must be installed/available)

require 'mysql2'

Set the table-name (this can be anything, but it must exist)

table_name = :users

Define the Column(s) (available column-types: Bigint, Boolean, CreatedTimestamp, Date, Float, Id, Integer, Money, String, Text, Time, Timestamp, UpdatedTimestamp & UUID -- the Id, CreatedTimestamp & UpdatedTimestamp columns are required for all BaseSyncable tables)

id_column = Purview::Columns::Id.new(:id),
name_column = Purview::Columns::String.new(:name, :nullable => false),
email_column = Purview::Columns::String.new(:email, :nullable => false, :limit => 100),
created_at_column = Purview::Columns::CreatedTimestamp.new(:created_at),
updated_at_column = Purview::Columns::UpdatedTimestamp.new(:updated_at),

columns = [
  id_column,
  name_column,
  email_column,
  created_at_column,
  updated_at_column,
]

Define the Indices (availble index-types: Composite & Simple). By default Indices will be added for the required column-types (CreatedTimestamp & UpdatedTimestamp)

indices = [
  Purview::Indices::Simple.new(email_column, :unique => true),
]

Configure the Puller (available puller-types: MSSQL, MySQL, PostgreSQL & URI)

puller_opts = {
  :type => Purview::Pullers::URI,
  :uri => 'http://feed.test.com/users',
}

Configure the Parser (available parser-types: CSV, SQL & TSV)

parser_opts = {
  :type => Purview::Parsers::TSV,
}

Configure the Loader (for PostgreSQL simply change MySQL to PostgreSQL)

loader_opts = {
  :type => Purview::Loaders::MySQL,
}

Combine all the configuration options and instantiate the Table

table_opts = {
  :columns => columns,
  :indices => indices,
  :loader => loader_opts,
  :parser => parser_opts,
  :puller => puller_opts,
}

table = Purview::Tables::Raw.new(
  table_name,
  table_opts
)

Set the database-name (this can be anything, but it must exist)

database_name = :data_warehouse_raw

Combine all the configuration options and instantiate the Database (for PostgreSQL simply change MySQL to PostgreSQL)

database_opts = {
  :tables => [table],
}

database = Purview::Databases::MySQL.new(
  database_name,
  database_opts
)

Create the Table (in the DB). Recommended for testing purposes only. For production environments you will likely want an external process to manage the schema (for PostgreSQL simply change Mysql2::Error to PG::DuplicateTable)

begin
  database.create_table(table)
rescue Mysql2::Error
  # Swallow
end

Initialize the Table (in the DB). This process sets the max_timestamp_pulled value in the table_metadata table and is used by the candidate Table selection algorithm to determine which Table should be synchronized next (the least recently synchronized Table will be selected). This value is also used as the high-water mark for records pulled from its source

database.initialize_table(table, timestamp)

Baseline the Table. This process will quickly get the state of the Table as close to the current state as possible. This is generally useful when adding a new Table to an existing schema (ideally this should be done while the Table is disabled)

database.baseline_table(table)

Enable the Table (in the DB). This process sets the enabled_at value in the table_metadata table and is used by the candidate Table selection algorithm to determine the pool of Table(s) available for synchronization (to remove a Table from the pool simply execute disable_table)

database.enable_table(table)

Sync the Table. This process will pull data from its [remote-]source and reconcile the new data against the main-table (e.g. perform 'INSERT', 'UPDATE' and 'DELETE' operations).

database.sync_table(table)

Sync the Database. The result of this process is the same as sync_table except that the process itself will select a candidate table. When multiple Table(s) are configured the least recently pulled and available (enabled, initialized and unlocked) Table will be selected.

database.sync

Fetch the metadata for a Table. This process will return a Struct representation of the current state for the given Table

database.table_metadata(table)

Contributing

Fork it ( http://github.com/jzaleski/purview/fork )
Create your feature branch (git checkout -b my-new-feature)
Commit your changes (git commit -am 'Add some feature')
Push to the branch (git push origin my-new-feature)
Create new Pull Request

Name		Name	Last commit message	Last commit date
Latest commit History 84 Commits
lib		lib
spec		spec
.gitignore		.gitignore
.travis.yml		.travis.yml
CHANGELOG		CHANGELOG
Gemfile		Gemfile
LICENSE.txt		LICENSE.txt
README.md		README.md
Rakefile		Rakefile
TODO		TODO
purview.gemspec		purview.gemspec

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Purview

Installation

Usage

Contributing

About

Releases

Packages

Languages

License

sermo/purview

Folders and files

Latest commit

History

Repository files navigation

Purview

Installation

Usage

Contributing

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages