Responsible for synchronizing duplicate data between collections

amit221/mongodb-denormalized-data-sync

mongodb-data-sync

Duplicating data across multiple collections (denormalization) is common in MongoDB. It makes searching, sorting, and field projection efficient.

Handling duplicate data is a pain: you have to create jobs to sync the data, or update every collection that holds the duplicated data in place.

mongodb-data-sync solves this problem. With mongodb-data-sync you declare the dependencies in a logical place (for instance, alongside the schemas), and mongodb-data-sync takes care of syncing the data in almost real time.

It uses the native MongoDB Change Streams in order to keep track of changes.

Core Features

  1. It is designed to do all the synchronization with minimum overhead on the database. Most of the checks are done in memory.

  2. It uses the native MongoDB Change Streams to keep track of changes.

  3. It has a plan A and a plan B for recovering after a crash.

  4. It gives you an easy way to declare dependencies without having to handle them yourself.

  5. After declaring your dependencies, you can retroactively sync your data.

  6. From version 0.0.25, you can add a MySQL dependency. This is a one-way dependency: the refCollection must be a MongoDB collection.

  7. From version 0.0.29, you can create triggers for update, insert, replace, and delete.

Notice

mongodb-data-sync is still experimental.

Pros and cons of having duplicate data in multiple collections

Pros

  1. No need for joins.
  2. Index all fields.
  3. Faster and easier searching and sorting.

Cons

  1. More storage usage.
  2. Hard to maintain: you need to keep track of all the connections (this is the problem mongodb-data-sync solves).
  3. More write operations: every update has to be applied to multiple collections.

Requirements

  • MongoDB v4 or higher, running as a replica set
  • Node.js 7.6 or higher

Architecture

mongodb-data-sync is built from 2 separate parts.

  1. The engine (there should be only one): a Node.js server application that you run on your own machine (the next steps show how). The engine runs all the update and recovery logic. It was designed to work as a single process, and it knows where to continue after a restart or crash. Do not auto-scale it or run 2 containers for high availability. In short, never run more than 1 engine.

  2. The SDK - responsible for managing the database dependencies of your application. It connects your app with the engine.

Instructions

The Instructions will address the 2 parts separately: the engine and the SDK.

The engine

Run

npm install mongodb-data-sync -g

Then, from the command line, run

mongodb-data-sync --key "some key" --url "mongodb connection url"
Options:

  --debug                console log important information
  
  -p, --port <port>      server port. (default: 6500)
  
  -d, --dbname <dbname>  the database name for the package. (default: "mongodb_data_sync_db")
  
  -k, --key <key>        API key to use for authentication of the SDK requests, required
  
  -u, --url <url>        MongoDB connection url, required
  
  -h, --help             output usage information

That's it for running the server. Let's move on to the SDK.

SDK

You can look at the example on GitHub.

Install
npm install mongodb-data-sync --save

init

First, initialize the client. Do this as early as possible in your app.

const SynchronizerClient = require('mongodb-data-sync');

// sets up the communication between your app and the engine.
// call this method once for each database you want to work on
SynchronizerClient.init({

    // the name of the database the package should synchronize (required)
    dbName: 'mydb',

    // the URL of the engine you are running (required)
    engineUrl: 'http://localhost:6500',

    // the authentication key you declared on the engine application (required)
    apiKey: 'my cat is brown',
});

Returns a Promise.

getInstance

const synchronizerClientInstance = SynchronizerClient.getInstance({

    // the name of the database you want to work on
    dbName: 'mydb',

}); 

Returns an instance tied to your database (it is not a MongoDB db instance) that you use for dependency operations.
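Because init returns a Promise, one way to combine the two calls is to wait for it before asking for an instance. The following is a minimal sketch, assuming getInstance should only be called once init has resolved. The helper name connectSynchronizer is hypothetical and not part of the package; the module is passed in as a parameter so the helper can also be tried against a stub.

```javascript
// Hypothetical helper, not part of mongodb-data-sync.
// `client` is expected to be the module returned by require('mongodb-data-sync').
async function connectSynchronizer(client, { dbName, engineUrl, apiKey }) {
  // init() returns a Promise; waiting for it means any rejection
  // surfaces here, before the instance is used.
  await client.init({ dbName, engineUrl, apiKey });

  // getInstance() is called synchronously, as in the example above.
  return client.getInstance({ dbName });
}

// Possible usage inside an async function:
//   const SynchronizerClient = require('mongodb-data-sync');
//   const instance = await connectSynchronizer(SynchronizerClient, {
//     dbName: 'mydb',
//     engineUrl: 'http://localhost:6500',
//     apiKey: 'my cat is brown',
//   });
```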

addDependency

// 'addDependency' lets you declare a dependency between 2 collections
synchronizerClientInstance.addDependency({

   // the dependent collection is the collection that needs to be updated automatically (required)
   // if the dependent collection is a MySQL table, write it as mysql.dbname.tablename
   dependentCollection: 'orders',

   // the referenced collection is the collection that your application updates (required)
   refCollection: 'users',

   // the dependent collection field to connect with (required)
   localField: 'user_id',

   // the referenced collection field to connect with; defaults to _id.
   // using a field other than _id causes an extra join for each check (optional)
   foreignField: '_id', // default

   // an object describing the fields that need to be updated.
   // the keys are the fields you want to be updated
   // the values are the fields you want to take the value from (required)
   fieldsToSync: {
       user_first_name: 'first_name',
       user_last_name: 'last_name',
       user_email: 'email'
   },

    // the engine uses a resume token to know where to continue the change stream.
    // if the engine was down long enough that the oplog no longer holds this token, the engine will resync all the dependencies from the beginning.
    // it is recommended to supply an update field (if you have one) so the engine only syncs documents updated after the crash
    refCollectionLastUpdateField: 'last_update'

});

Returns a Promise that resolves with the id of the dependency.
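To make the direction of the fieldsToSync mapping concrete, here is a small model of what the dependency above asks for, written with plain objects. This is an illustration only and not the engine's code: the function name describeSync and the returned { filter, update } shape are assumptions chosen for the example.

```javascript
// Illustration only: NOT the engine's implementation.
// Given a dependency declaration and a changed document from refCollection,
// describe which dependent documents match and which fields they receive.
function describeSync(dependency, changedRefDoc) {
  const { localField, foreignField = '_id', fieldsToSync } = dependency;

  // Dependent documents match where localField equals the
  // referenced document's foreignField.
  const filter = { [localField]: changedRefDoc[foreignField] };

  // Keys are fields on the dependent collection;
  // values are the refCollection fields the data is copied from.
  const $set = {};
  for (const [dependentField, refField] of Object.entries(fieldsToSync)) {
    $set[dependentField] = changedRefDoc[refField];
  }
  return { filter, update: { $set } };
}

const result = describeSync(
  {
    localField: 'user_id',
    fieldsToSync: {
      user_first_name: 'first_name',
      user_last_name: 'last_name',
      user_email: 'email',
    },
  },
  { _id: 42, first_name: 'Ada', last_name: 'Lovelace', email: 'ada@example.com' }
);
console.log(result.filter); // the orders whose user_id is 42
console.log(result.update); // user_first_name, user_last_name, user_email
```

In this model, a change to user 42 reaches every order whose user_id is 42, and the order field user_first_name receives the user's first_name.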

removeDependency

// deletes a dependency based on id 
synchronizerClientInstance.removeDependency(id);

Returns a Promise.
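Since removeDependency needs the id that addDependency resolved with, it can be convenient to keep those ids around. The sketch below shows one way to do that; the helper names addDependencies and removeDependencies are hypothetical, and the only assumptions about the SDK are the ones stated above (addDependency resolves with an id, removeDependency accepts it).

```javascript
// Hypothetical helpers, not part of mongodb-data-sync.
// `instance` is the object returned by SynchronizerClient.getInstance().

// Registers each dependency in order and collects the ids.
async function addDependencies(instance, dependencies) {
  const ids = [];
  for (const dependency of dependencies) {
    // addDependency resolves with the id of the new dependency.
    ids.push(await instance.addDependency(dependency));
  }
  return ids;
}

// Removes previously registered dependencies by id.
async function removeDependencies(instance, ids) {
  for (const id of ids) {
    await instance.removeDependency(id);
  }
}
```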

getDependencies

// used to get the database dependencies
synchronizerClientInstance.getDependencies();

Returns a Promise that resolves with all your database dependencies.

syncAll

// used to sync all the data in your database according to your dependencies.
// most of the time this function only needs to be called when you add a new dependency on existing data
synchronizerClientInstance.syncAll();

Returns a Promise.
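The typical case described above (a new dependency declared over data that already exists) could be sketched as follows. The helper name addDependencyWithBackfill is hypothetical. Note that syncAll covers every dependency in the database, not only the new one, so it is presumably best kept to one-off situations like this.

```javascript
// Hypothetical helper, not part of mongodb-data-sync.
// `instance` is the object returned by SynchronizerClient.getInstance().
async function addDependencyWithBackfill(instance, dependency) {
  // Declare the dependency first so the engine knows about it.
  const id = await instance.addDependency(dependency);

  // Documents written before the declaration have not been synced yet,
  // so ask the engine to sync according to all declared dependencies.
  await instance.syncAll();

  return id;
}
```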

addTrigger

synchronizerClientInstance.addTrigger({

    // the dependent collection to subscribe triggers on (required)
    dependentCollection: 'orders',

    // the type of the trigger: insert, update, replace, or delete (required)
    triggerType: 'insert',

    // when triggerType is update, defines which fields should fire the trigger
    triggerFields: [],

    // when knowledge is set to true, the engine retries firing the event until it gets an OK HTTP status
    knowledge: false, // default

    // the URL the trigger will call
    url: 'http://localhost/insert-trigger'
});

Returns a Promise that resolves with the id of the trigger.
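On the receiving side, the trigger URL has to be served by something that answers with an OK status. Below is a minimal sketch of such a receiver using Node's built-in http module. The HTTP method and the body format the engine sends are not documented here, so this sketch accepts any method and hands the raw body to a callback; createTriggerReceiver and onTrigger are hypothetical names.

```javascript
const http = require('http');

// Sketch of a trigger receiver. The request method and payload shape are
// assumptions left open: any method is accepted and the body is passed on raw.
function createTriggerReceiver(onTrigger) {
  return http.createServer((req, res) => {
    if (req.url !== '/insert-trigger') {
      res.writeHead(404);
      res.end();
      return;
    }
    const chunks = [];
    req.on('data', (chunk) => chunks.push(chunk));
    req.on('end', () => {
      try {
        onTrigger(Buffer.concat(chunks).toString('utf8'));
        // An OK status tells the caller the event was handled.
        res.writeHead(200);
      } catch (err) {
        // A non-OK status; with knowledge: true the engine is described
        // as retrying until it gets an OK status.
        res.writeHead(500);
      }
      res.end();
    });
  });
}

// Possible usage, matching url: 'http://localhost/insert-trigger' (port 80):
//   createTriggerReceiver((body) => console.log('trigger fired:', body)).listen(80);
```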

removeTrigger

// deletes a trigger based on id 
synchronizerClientInstance.removeTrigger(id);

Returns a Promise.
