This program tails the oplog of a Mongo server, and publishes changes to Redis. It's designed to work with the redis-oplog Meteor package.
Unlike redis-oplog's default behavior, which requires that everything that writes to Mongo also publish matching notifications to Redis, using oplogtoredis in combination with redis-oplog's `externalRedisPublisher` option guarantees that every change written to Mongo will be automatically published to Redis.
The project is currently stable and used in production at Tulip. Before using this project in production, review the Known Limitations below, and test it out in a staging environment.
There are a few things that don't currently work in redis-oplog when using the `externalRedisPublisher` option, so those features won't work when using redis-oplog together with oplogtoredis. These features are all part of redis-oplog's fine-tuning options; if you don't use any of those options, you won't run into these limitations.
- Custom namespaces and channels (redis-oplog issue #279)
- Synthetic mutations (redis-oplog issue #277)
We use Nix at Tulip, so we maintain a `default.nix` and a `flake.nix` wrapper around it for Flake support. The `default.nix` file contains a `vendorHash` attribute, which is the calculated hash of the downloaded modules to be vendored. To update this value after a `go.sum` change (which means that different modules were downloaded), follow these steps:
- Edit `default.nix` and change the `vendorHash` attribute to an empty string.
- Run `nix build` and observe the output. You are looking for this section:

  ```
  error: hash mismatch in fixed-output derivation '/nix/store/by8bc99ywfw6j1i6zjxcwdmc5b3mmgzv-oplogtoredis-3.0.0-go-modules.drv':
           specified: sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
              got:    sha256-ceToA2DC1bhmg9WIeNSAfoNoU7sk9PrQqgqt5UbpivQ=
  ```

- Use the `got` value (`sha256-ceToA2DC1bhmg9WIeNSAfoNoU7sk9PrQqgqt5UbpivQ=`) as the value of `vendorHash` and verify that `nix build` succeeds.
MongoDB v5 makes a substantial change to the format of the oplog, which makes it much more difficult to assemble the list of changed fields from an oplog entry. Meteor has handled this with a full oplog v2 to v1 converter, which entails substantial complexity and a high level of dependence on the (undocumented) oplog format, and requires testing against every new version of Mongo. To reduce this maintenance burden, oplogtoredis implements a simplified version of this algorithm that just extracts changed top-level fields.
Take, for example, the command `db.Foo.update({ _id: 'someId' }, { $set: { "one.two.three": 123, "four": 4 } })`.

With MongoDB v4, oplogtoredis would produce the record `{"e":"u","d":{"_id":"someId"},"f":["one.two.three", "four"]}`. However, with MongoDB v5, oplogtoredis would instead produce the record `{"e":"u","d":{"_id":"someId"},"f":["one", "four"]}`.
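Here's a minimal Go sketch of that simplified extraction (illustrative only; the `topLevelFields` helper is hypothetical, not oplogtoredis's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// topLevelFields reduces a set of (possibly dotted) changed-field paths to
// their deduplicated top-level field names, mirroring the simplified
// algorithm described above: "one.two.three" becomes "one", and "four"
// stays "four".
func topLevelFields(paths []string) []string {
	seen := map[string]bool{}
	var fields []string
	for _, p := range paths {
		top, _, _ := strings.Cut(p, ".")
		if !seen[top] {
			seen[top] = true
			fields = append(fields, top)
		}
	}
	return fields
}

func main() {
	// The $set paths from the example update above:
	fmt.Println(topLevelFields([]string{"one.two.three", "four"}))
	// Output: [one four]
}
```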
This produces correct behavior when used with the redis-oplog Meteor package: we're simply indicating a superset of the actual change. However, it's possible that it produces sub-optimal performance. For example, say you're observing the query `Foo.find({ _id: 'someId' }, { fields: { 'one.two.three': 1 }})`. With MongoDB v4, the command `db.Foo.update({ _id: 'someId' }, { $set: { "one.two.xx": 123 } })` would not be considered by redis-oplog to be a possible change to the observed query results (because it considers the sub-fields, and detects that a change to `one.two.xx` cannot affect the observed field `one.two.three`). However, with the new MongoDB v5 behavior, redis-oplog would detect that this could have changed the observed query results, because it just sees a change to the top-level field `one`.
This is a trade-off between optimal performance and maintainability: implementing a full v2-to-v1 conversion layer like Meteor does would provide slightly better performance in these specific cases, but with a much increased risk of incorrect behavior (e.g. not triggering an update when we should) if Mongo changes the oplog format, which they've indicated they reserve the right to do even in a patch release.
To use this with redis-oplog, configure redis-oplog with:

- `externalRedisPublisher: true`
- `globalRedisPrefix: "<name of the Mongo database>."`

For example, if your `MONGO_URL` for Meteor is `mongodb://mymongoserver/mydb`, you might use this config:
```json
{
  "redisOplog": {
    "redis": {
      "port": 6379,
      "host": "myredisserver"
    },
    "globalRedisPrefix": "mydb.",
    "externalRedisPublisher": true
  }
}
```
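To sanity-check that oplogtoredis is publishing, you can subscribe directly to a collection's channel. Here's a minimal sketch using the go-redis client; the `mydb.Foo` channel name is an assumption based on the `globalRedisPrefix` above and a hypothetical `Foo` collection:

```go
package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	client := redis.NewClient(&redis.Options{Addr: "myredisserver:6379"})

	// With globalRedisPrefix "mydb.", updates to the Foo collection would be
	// published on the "mydb.Foo" channel (illustrative assumption).
	pubsub := client.Subscribe(ctx, "mydb.Foo")
	defer pubsub.Close()

	for msg := range pubsub.Channel() {
		// Payloads look like {"e":"u","d":{"_id":"..."},"f":["..."]}.
		fmt.Println(msg.Payload)
	}
}
```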
You can build oplogtoredis from source with `go build .`, which produces a statically-linked binary you can run. Alternatively, you can use the public docker image. We previously hosted images on Docker Hub, but those are no longer maintained.
You must set the following environment variables:

- `OTR_MONGO_URL`: Required. Mongo URL to read the oplog from. This should point to the `local` database of the Mongo server and will match the `MONGO_OPLOG_URL` you give to your Meteor server.
- `OTR_REDIS_URL`: Required. Redis URL to publish updates to. To connect to an instance over TLS, be sure to specify the `OTR_REDIS_URL` with the `rediss://` protocol; otherwise, use `redis://`.
You may also set the following environment variables to configure the level of logging:

- `OTR_LOG_DEBUG`: Optional. Use debug logging, which is more detailed. See `lib/log/main.go` for details.
- `OTR_LOG_QUIET`: Optional. Don't print any logs. Useful when running unit tests.
There are a number of other environment variables you can set to tune various performance and reliability settings. See the config package docs for more details.
oplogtoredis includes a number of features to support its use in production-critical scenarios.
oplogtoredis uses Redis to deduplicate messages, so it's safe (and recommended!) to run multiple instances of oplogtoredis to ensure availability in the event of a partial outage or a bug in oplogtoredis that causes it to crash or hang.
Just run two copies of oplogtoredis with the same configuration. Each copy will process each oplog message and send it to Redis, where we use a small Lua script to deduplicate the messages. It's not recommended to run more than 2 or 3 copies of oplogtoredis, because the load it puts on your Mongo and Redis databases increases linearly with the number of copies of oplogtoredis that you're running.
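The deduplication is conceptually first-write-wins on a per-message key. The real implementation uses a small Lua script to make this atomic; the Go sketch below illustrates the idea with `SETNX` (the `publishOnce` helper, the `dedup:` key prefix, and the TTL are all illustrative assumptions, not oplogtoredis's actual code):

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

// publishOnce publishes payload on channel only if no other oplogtoredis
// instance has already claimed this message ID. Unlike the real Lua script,
// the claim and the publish here are not atomic; this is a sketch of the
// concept, not the implementation.
func publishOnce(ctx context.Context, client *redis.Client, msgID, channel, payload string) error {
	claimed, err := client.SetNX(ctx, "dedup:"+msgID, 1, 2*time.Minute).Result()
	if err != nil {
		return err
	}
	if !claimed {
		return nil // another instance already published this message
	}
	return client.Publish(ctx, channel, payload).Err()
}

func main() {
	ctx := context.Background()
	client := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	err := publishOnce(ctx, client, "oplog-ts-12345", "mydb.Foo", `{"e":"u","d":{"_id":"someId"},"f":["four"]}`)
	fmt.Println(err)
}
```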
oplogtoredis uses Redis to keep track of the last message it processed. When it starts up, it checks to see where it left off, and will resume tailing the oplog from that timestamp, as long as it's not too far in the past (we don't want to replay too much of the oplog, or we could overload the Redis or Meteor servers). The config package docs have more detail on how to tune this behavior, but this feature should keep your system working properly even if every copy of oplogtoredis that you're running goes down for a brief period.
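A rough sketch of that startup decision (the `lastProcessed` key, its format, and the 60-second catchup window are illustrative assumptions; the real behavior is tunable via the config package):

```go
package main

import (
	"context"
	"fmt"
	"strconv"
	"time"

	"github.com/redis/go-redis/v9"
)

// resumeTimestamp decides where to start tailing the oplog: from the last
// recorded position if it's recent enough, otherwise from "now".
func resumeTimestamp(ctx context.Context, client *redis.Client) time.Time {
	val, err := client.Get(ctx, "lastProcessed").Result()
	if err != nil {
		return time.Now() // no saved position; start from the present
	}
	unix, err := strconv.ParseInt(val, 10, 64)
	if err != nil {
		return time.Now()
	}
	last := time.Unix(unix, 0)
	if time.Since(last) > 60*time.Second {
		// Too far behind: replaying that much oplog could overload
		// the Redis or Meteor servers, so skip ahead.
		return time.Now()
	}
	return last
}

func main() {
	client := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	fmt.Println("resuming from:", resumeTimestamp(context.Background(), client))
}
```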
oplogtoredis exposes an HTTP server that can be used to monitor the state of the program. By default, it serves on `0.0.0.0:9000`, but you can change this with the environment variable `OTR_HTTP_SERVER_ADDR`.
The HTTP server exposes a health-checking endpoint at `/healthz`. This endpoint checks connectivity to Mongo and Redis, and then returns 200. If HTTP requests to this endpoint time out or return non-200 codes for more than a brief period (10-15 seconds), you should consider the program unhealthy and restart it. You can do this using Kubernetes liveness probes, an Icinga health check, or any other mechanism.
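For example, a minimal external probe of the health endpoint might look like the following sketch (it assumes the default `:9000` listen address and treats timeouts and non-200 responses as unhealthy):

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"time"
)

func main() {
	// A short timeout so a hung oplogtoredis reads as unhealthy rather
	// than blocking the probe indefinitely.
	client := &http.Client{Timeout: 5 * time.Second}

	resp, err := client.Get("http://localhost:9000/healthz")
	if err != nil {
		fmt.Println("unhealthy:", err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		fmt.Println("unhealthy: status", resp.StatusCode)
		os.Exit(1)
	}
	fmt.Println("healthy")
}
```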
The HTTP server also exposes a Prometheus endpoint at `/metrics` that your Prometheus server can scrape to collect a number of useful metrics. In particular, if you see the rate of the metric `otr_redispub_processed_messages` with the label `status=sent` fall below the rate of writes to your Mongo database, it likely indicates an issue with oplogtoredis.
oplogtoredis by default emits info, warning, and error messages as JSON, for easy consumption by structured logging systems. When debugging oplogtoredis, you may want to set `OTR_LOG_DEBUG=true`, which will log more detailed messages in a human-readable format for manual review.
You can use `go build` to build and test oplogtoredis, or you can use the docker-compose environment (`docker-compose up`), which spins up a full environment with Mongo, Redis, and oplogtoredis.
The components of the docker-compose environment are:
- `oplogtoredis`: A container running oplogtoredis, live-reloading whenever the source code changes (via fresh).
- `mongo`: A MongoDB server. The other containers are configured to use the `dev` db. Connect to Mongo with `docker-compose exec mongo mongo dev`.
- `redis`: A Redis server. Connect to the CLI with `docker-compose exec redis redis-cli`. The `monitor` command in that CLI will show you everything being published.
You can optionally also spin up 2 Meteor app servers with `docker-compose -f docker-compose.yml -f docker-compose.meteor.yml up`. These servers are running a simple todos app, using redis-oplog, and pointing at the same Mongo and Redis servers. Note that the first run will take a long time (5-10 minutes) while Meteor does initial downloads/builds; caching makes subsequent startups much, much faster.
The additional `docker-compose.meteor.yml` file contains:

- `meteor1` and `meteor2`: Two Meteor servers, serving the app at `testapp/`. You can go to `localhost:9091` and `localhost:9092` to hit the two app servers. When everything's working correctly, changes made on a client connected to one app server (or made directly in the Mongo database) should show up in real-time to clients on the other app server.
There are a number of testing tools and suites that are used to check the correctness of oplogtoredis. All of these are run by Travis on each commit.
You can run all of the tests locally with `scripts/runAllTests.sh`.
We use `gometalinter` to detect stylistic and correctness issues. Run `scripts/runLint.sh` to run the suite.

You'll need `gometalinter` and its dependencies installed. You can install them with `go get github.com/alecthomas/gometalinter; gometalinter --install`.
We use the standard `go test` tool. We wrap it as `scripts/runUnitTests.sh` to set a timeout and enable the race detector.
These acceptance tests test a production-ready docker build of oplogtoredis. They run the production docker container alongside Mongo and Redis docker containers (using docker-compose), and then test the behavior of that whole system.
They are run both with and without the race detector, and against a number of different versions of Mongo and Redis.
Run these tests with `scripts/runIntegrationAcceptance.sh`. This suite takes a while to run, because it's run against many different combinations of Redis, Mongo, and race-detection settings. Use `scripts/runIntegrationAcceptanceSingle.sh` for a quick run with only a single configuration.
These tests observe the behavior of oplogtoredis under a number of fault conditions, such as a crash and restart of oplogtoredis, and temporary unavailability of Mongo and Redis. To provide the test harness more control over the environment, they operate on a compiled binary of oplogtoredis rather than a docker image. They run inside a single docker container, with oplogtoredis, Mongo, and Redis spun up and down by the test harness itself.
Run these tests with `scripts/runIntegrationFaultInjection.sh`.
These tests use docker-compose to spin up oplogtoredis, Mongo, Redis, and two Meteor application servers (running the app from `./testapp`), and then connect to the Meteor servers via websockets/DDP and observe the behavior of traffic on the wire to ensure that oplogtoredis is working correctly in concert with redis-oplog.
Run these tests with `scripts/runIntegrationMeteor.sh`.
These tests are run in a similar environment to the acceptance tests, but only against a single Mongo and Redis version, and without the race detector.
We run two tests: one where we just fire-and-forget a bunch of writes to Mongo, and one where we do the same writes, but wait until we've received notifications about all of those writes from Redis. The difference between these two numbers gives us an upper bound on the latency overhead of using oplogtoredis. These tests fail if the overhead is greater than 35%.
Run these tests with `scripts/runIntegrationPerformance.sh`.