It synchronizes data from a replica set to another MongoDB deployment: a standalone instance, a replica set, or a sharded cluster.
It is oplog-based and provides real-time data synchronization.
It's written in Python 2.7.
- MongoDB 2.4
- MongoDB 2.6
- MongoDB 3.0
- MongoDB 3.2
- MongoDB 3.4
- initial sync and oplog-based incremental sync
- sync the specified databases and collections
- concurrent oplog replaying
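One way to picture concurrent oplog replay is to partition entries across workers by a hash of the document `_id`, so that operations on the same document stay in order within a single worker. The following is a minimal sketch under that assumption, not the tool's actual implementation: `doc_key`, `partition_oplog`, and the worker count are hypothetical, and `zlib.crc32` stands in for a hash such as mmh3 to keep the example dependency-free.

```python
import zlib


def doc_key(entry):
    """Return the _id an oplog entry targets.

    Updates ('u') carry the _id in 'o2'; inserts ('i') and deletes ('d')
    carry it in 'o'.
    """
    source = entry['o2'] if entry['op'] == 'u' else entry['o']
    return source['_id']


def partition_oplog(entries, n_workers):
    """Split entries into n_workers ordered queues, keyed by document _id."""
    queues = [[] for _ in range(n_workers)]
    for entry in entries:
        # Hash a stable text form of the _id (assumption: repr is stable
        # enough for illustration; a real tool would hash the BSON value).
        key = repr(doc_key(entry)).encode('utf-8')
        index = (zlib.crc32(key) & 0xffffffff) % n_workers
        queues[index].append(entry)
    return queues
```

Each queue could then be replayed by its own worker (for example, a gevent greenlet); ordering is preserved per document but not across documents.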
See requirements for details.
- gevent
- toml
- mmh3
- pymongo
Always use pymongo 3.5.1.
The reason is given in the PyMongo 3.6 changelog (https://api.mongodb.com/python/3.6.0/changelog.html):
Version 3.6 adds support for MongoDB 3.6, drops support for CPython 3.3 (PyPy3 is still supported), and drops support for MongoDB versions older than 2.6. If connecting to a MongoDB 2.4 server or older, PyMongo now throws a ConfigurationError.
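The version constraint can be expressed as a small check. This is only an illustrative sketch; `supports_mongodb_24` is a hypothetical helper, not part of the tool or of PyMongo.

```python
def supports_mongodb_24(pymongo_version):
    """Return True if this PyMongo version can still connect to MongoDB 2.4.

    Based on the changelog excerpt above: support for servers older than
    2.6 was dropped in PyMongo 3.6.
    """
    major, minor = (int(part) for part in pymongo_version.split('.')[:2])
    return (major, minor) < (3, 6)
```

With PyMongo installed, `supports_mongodb_24(pymongo.version)` would check the running environment.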
- source MUST be a replica set
- ignore system databases
- admin
- local
- ignore system collections
- system.*
- create users on the destination manually if necessary
- authenticating as an administrator is recommended if the source has authentication enabled
- geospatial indexes are not supported
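The rules for ignored system databases and collections can be sketched as a namespace filter. This is an assumption about how such a filter might look, not the tool's own code; `should_sync` and `SYSTEM_DATABASES` are hypothetical names.

```python
SYSTEM_DATABASES = ('admin', 'local')


def should_sync(namespace):
    """Return True if a 'db.collection' namespace is eligible for sync."""
    # Split at the first dot only: collection names may contain dots.
    db, _, coll = namespace.partition('.')
    if db in SYSTEM_DATABASES:
        return False
    if coll.startswith('system.'):
        return False
    return True
```

For example, `should_sync('test.users')` is true, while `should_sync('local.oplog.rs')` and `should_sync('test.system.indexes')` are false.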
If the source is a sharded cluster:
- first, stop the balancer
- then, start a separate sync process for each shard
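Starting one process per shard could look roughly like the sketch below, which only builds the command lines. It assumes the `--src` and `--dst` options listed in the usage text; `build_sync_commands` and the host names in the usage note are hypothetical.

```python
def build_sync_commands(shard_members, dst):
    """Build one sync.py command line per shard.

    shard_members: one hostportstr per shard, each a member of that
    shard's replica set.
    dst: hostportstr of the destination mongos or mongod.
    """
    return [
        ['python', 'sync.py', '--src', member, '--dst', dst]
        for member in shard_members
    ]
```

Once the balancer has been stopped, each command could be launched with `subprocess.Popen`, e.g. for `build_sync_commands(['shard0-a:27018', 'shard1-a:27018'], 'dst-mongos:27017')`.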
The configuration file uses the TOML format.
Refer to mongo_conf.toml.
Source config items.
- src.hosts - hostportstr of a replica set member
- src.username - username
- src.password - password
- src.authdb - authentication database
Destination config items.
- dst.mongo.hosts
- dst.mongo.authdb
- dst.mongo.username
- dst.mongo.password
Custom options for synchronization.
sync.dbs specifies the databases to sync.
sync.dbs.colls specifies the collections to sync.
- sync.dbs - databases to sync; all databases are synced if not specified
- sync.dbs.db - source database name
- sync.dbs.rename_db - destination database name; same as the source name if not specified
- sync.dbs.colls - collections to sync; all collections are synced if not specified
coll, in a sync.dbs.colls element, specifies the collection to sync.
fields, in a sync.dbs.colls element, specifies which fields of that collection to sync.
- log.filepath - log file path, write to stdout if empty or not set
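To make the nesting concrete, the sketch below shows the items above as the Python dictionary a TOML parser might return, together with a hypothetical helper that looks up where a source namespace should go. mongo_conf.toml remains the authoritative reference for the layout; the host names, database names, the `fields` key spelling, and `resolve_target` are illustrative assumptions.

```python
# Assumed shape of the parsed configuration (all values are examples).
conf = {
    'src': {'hosts': '127.0.0.1:27017', 'authdb': 'admin',
            'username': '', 'password': ''},
    'dst': {'mongo': {'hosts': '127.0.0.1:27018', 'authdb': 'admin',
                      'username': '', 'password': ''}},
    'sync': {'dbs': [
        {'db': 'app', 'rename_db': 'app_copy',
         'colls': [{'coll': 'users', 'fields': ['name', 'email']},
                   {'coll': 'orders'}]},
        {'db': 'logs'},
    ]},
    'log': {'filepath': ''},
}


def resolve_target(conf, db, coll):
    """Return (dst_db, fields) for a source namespace, or None if skipped.

    fields is None when whole documents are synced.
    """
    dbs = conf.get('sync', {}).get('dbs')
    if not dbs:
        return db, None  # sync.dbs not specified: sync every database
    for item in dbs:
        if item['db'] != db:
            continue
        dst_db = item.get('rename_db', db)
        colls = item.get('colls')
        if not colls:
            return dst_db, None  # colls not specified: sync every collection
        for elem in colls:
            if elem['coll'] == coll:
                return dst_db, elem.get('fields')
        return None  # database listed, collection not selected
    return None  # database not selected
```

Here `resolve_target(conf, 'app', 'users')` yields the renamed database with a field list, while a collection of `app` that is not listed is skipped.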
Command-line options have functional limitations. Using a config file is strongly recommended.
usage: sync.py [-h] [-f [CONFIG]] [--src [SRC]] [--src-authdb [SRC_AUTHDB]]
[--src-username [SRC_USERNAME]] [--src-password [SRC_PASSWORD]]
[--dst [DST]] [--dst-authdb [DST_AUTHDB]]
[--dst-username [DST_USERNAME]] [--dst-password [DST_PASSWORD]]
[--start-optime [START_OPTIME]]
[--optime-logfile [OPTIME_LOGFILE]] [--logfile [LOGFILE]]
Sync data from a replica-set to another MongoDB/Elasticsearch.
optional arguments:
-h, --help show this help message and exit
-f [CONFIG], --config [CONFIG]
configuration file, note that command options will
override items in config file
--src [SRC] source should be hostportstr of a replica-set member
--src-authdb [SRC_AUTHDB]
src authentication database, default is 'admin'
--src-username [SRC_USERNAME]
src username
--src-password [SRC_PASSWORD]
src password
--dst [DST] destination should be hostportstr of a mongos or
mongod instance
--dst-authdb [DST_AUTHDB]
dst authentication database, default is 'admin', for
MongoDB
--dst-username [DST_USERNAME]
dst username, for MongoDB
--dst-password [DST_PASSWORD]
dst password, for MongoDB
--start-optime [START_OPTIME]
timestamp in second, indicates oplog based increment
sync
--optime-logfile [OPTIME_LOGFILE]
optime log file path, use this as start optime if
without '--start-optime'
--logfile [LOGFILE] log file path
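The help text notes that command options override items in the config file. A minimal sketch of that precedence, covering only a few of the options, might look like this. `parse_args` and `apply_overrides` are hypothetical helpers, and the mapping of `--src`, `--dst`, and `--logfile` onto `src.hosts`, `dst.mongo.hosts`, and `log.filepath` is an assumption.

```python
import argparse


def parse_args(argv):
    """Parse a subset of the options shown in the usage text."""
    parser = argparse.ArgumentParser(prog='sync.py')
    parser.add_argument('-f', '--config', nargs='?')
    parser.add_argument('--src', nargs='?')
    parser.add_argument('--dst', nargs='?')
    parser.add_argument('--logfile', nargs='?')
    return parser.parse_args(argv)


def apply_overrides(conf, args):
    """Return a copy of conf in which options given on the command line win."""
    merged = {
        'src': dict(conf.get('src', {})),
        'dst': {'mongo': dict(conf.get('dst', {}).get('mongo', {}))},
        'log': dict(conf.get('log', {})),
    }
    # An option left off the command line is None and leaves conf untouched.
    if args.src is not None:
        merged['src']['hosts'] = args.src
    if args.dst is not None:
        merged['dst']['mongo']['hosts'] = args.dst
    if args.logfile is not None:
        merged['log']['filepath'] = args.logfile
    return merged
```

For instance, running with `-f mongo_conf.toml --src other-host:27017` would keep every config item except the source host.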
- command options tuning
- config file format tuning
- sync sharding config (enableSharding & shardCollection)