
GlareDB/openraft

 
 


Openraft

Advanced Raft in 🦀 Rust using Tokio. Please ⭐ on github!


🪵🪵🪵 Raft is not yet good enough. This project aims to improve Raft as the next-generation consensus protocol for distributed data storage systems (SQL, NoSQL, KV, streaming, graph, or maybe something more exotic).

Currently, Openraft is the consensus engine of the meta-service cluster in Databend.

Versions

  • The Openraft API is not stable yet. Before 1.0.0, an upgrade may contain incompatible changes; check the change-log. Each commit message starts with a keyword that indicates the type of modification:

    • Change: if it introduces incompatible changes.
    • Feature: if it introduces compatible non-breaking new features.
    • Fix: if it just fixes a bug.
  • Branch release-0.6: v0.6.8 is the latest version in this release branch. release-0.6 accepts only bug fixes, not new features.

  • Branch release-0.7: v0.7.0 is the latest version in this release branch. See the upgrade guide from 0.6 to 0.7.

  • Branch main is under active development.

Roadmap

  • Extended joint membership

  • Reduce the complexity of vote and pre-vote: get rid of the pre-vote RPC.

  • Reduce the conflict rate during elections: allow leadership to be taken within one term by a node with a greater node-id.

  • Support flexible quorums, e.g. Hierarchical Quorums.

  • Consider introducing separate read and write quorums to improve efficiency in clusters with an even number of nodes.

  • Goal performance is 1,000,000 put/sec.

    Bench history:

    • 2022 Jul 01: 41,000 put/sec; 23,255 ns/op;
    • 2022 Jul 07: 43,000 put/sec; 23,218 ns/op; Use Progress to track replication.
    • 2022 Jul 09: 45,000 put/sec; 21,784 ns/op; Batch purging of applied logs.

    Run the benchmark: make bench_cluster_of_3

    Benchmark setting:

    • No network.
    • In memory store.
    • A cluster of 3 nodes on one server.
    • Single client.
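The roadmap item on letting a node with a greater node-id take leadership within one term can be pictured as ordering votes by `(term, node_id)` instead of by term alone. The sketch below is a hypothetical illustration under that assumption, not Openraft's implementation: `Vote`, `Voter` and `handle_vote_request` are names made up for this example, and the log up-to-date check that Raft also performs before granting a vote is deliberately left out.

```rust
/// Hypothetical vote: derived `Ord` compares fields in declaration order,
/// so votes are ordered lexicographically by (term, node_id).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Vote {
    pub term: u64,
    pub node_id: u64,
}

/// Hypothetical voter state: the largest vote this node has granted so far.
pub struct Voter {
    pub current: Vote,
}

impl Voter {
    /// Grants the request if it is at least as large as the vote already held.
    /// - A repeated request from the same candidate is granted again.
    /// - A request from a greater node id in the same term also wins.
    /// NOTE: Raft's "candidate log is up to date" check is omitted here.
    pub fn handle_vote_request(&mut self, req: Vote) -> bool {
        if req >= self.current {
            self.current = req;
            true
        } else {
            false
        }
    }
}
```

Because a voter's granted vote only ever increases under this ordering, it never returns to a smaller `(term, node_id)` it has already passed: having voted for node 2 in term 5, it can still switch to node 3 in term 5, but not to node 1.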
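The read-quorum/write-quorum idea rests on one intersection requirement: every read (or election) quorum must overlap every write quorum, which for simple counting quorums on `n` nodes means `read + write > n`. The hypothetical `QuorumConfig` below is a sketch under that assumption; treating the read quorum as the election quorum is an interpretation, not something the roadmap specifies. With 4 nodes, majority quorums need 3 acknowledgements for every write, while a split of `read = 3, write = 2` still satisfies the intersection rule and lets a write commit with 2 acknowledgements.

```rust
/// Hypothetical count-based quorum sizes for a cluster of `n` nodes.
#[derive(Debug, Clone, Copy)]
pub struct QuorumConfig {
    pub n: usize,
    pub read: usize,
    pub write: usize,
}

impl QuorumConfig {
    /// Classic Raft: both quorums are a strict majority, n / 2 + 1.
    pub fn majority(n: usize) -> Self {
        let q = n / 2 + 1;
        QuorumConfig { n, read: q, write: q }
    }

    /// Any read quorum intersects any write quorum iff read + write > n
    /// (and neither quorum is larger than the cluster).
    pub fn is_safe(&self) -> bool {
        self.read <= self.n && self.write <= self.n && self.read + self.write > self.n
    }

    /// A write is committed once `acks` nodes have stored it.
    pub fn write_committed(&self, acks: usize) -> bool {
        acks >= self.write
    }
}
```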

Features

  • It is fully reactive and embraces the async ecosystem. It is driven by actual Raft events taking place in the system as opposed to being driven by a tick operation. Batching of messages during replication is still used whenever possible for maximum throughput.

  • Storage and network integration is well defined via two traits, RaftStorage and RaftNetwork. This gives applications maximum flexibility in choosing their storage and networking media.

  • All interaction with the Raft node is well defined via a single public Raft type, which is used both to spawn the Raft async task and to interact with that task. The API for this system is clear and concise.

  • Log replication is fully pipelined and batched for optimal performance. Log replication also uses a congestion control mechanism to help keep nodes up-to-date as efficiently as possible.

  • It fully supports dynamic cluster membership changes with joint config. The single-step membership change algorithm is not used because of its known bug. See the dynamic membership chapter in the guide.

  • Details on initial cluster formation, and how to effectively do so from an application's perspective, are discussed in the cluster formation chapter in the guide.

  • Automatic log compaction with snapshots, as well as snapshot streaming from the leader node to follower nodes is fully supported and configurable.

  • The entire code base is instrumented with tracing. This can be used for standard logging, or for distributed tracing, and the verbosity can be statically configured at compile time to completely remove all instrumentation below the configured level.
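Trait-based integration means the consensus code is written against an interface and each application supplies its own backend. The following sketch shows that pattern with a deliberately tiny, synchronous, hypothetical `LogStorage` trait and an in-memory `MemLog`. Openraft's actual `RaftStorage` and `RaftNetwork` traits are async and have different, richer methods, so consult the crate documentation for their real signatures.

```rust
use std::collections::BTreeMap;

/// Hypothetical, heavily simplified storage trait. This is NOT openraft's
/// `RaftStorage`; it only illustrates pluggable storage behind a trait.
pub trait LogStorage {
    fn append(&mut self, index: u64, payload: Vec<u8>);
    fn get(&self, index: u64) -> Option<&[u8]>;
    fn last_index(&self) -> Option<u64>;
}

/// One possible backend: log entries kept in memory, keyed by log index.
#[derive(Default)]
pub struct MemLog {
    entries: BTreeMap<u64, Vec<u8>>,
}

impl LogStorage for MemLog {
    fn append(&mut self, index: u64, payload: Vec<u8>) {
        self.entries.insert(index, payload);
    }

    fn get(&self, index: u64) -> Option<&[u8]> {
        self.entries.get(&index).map(|v| v.as_slice())
    }

    fn last_index(&self) -> Option<u64> {
        // BTreeMap keys are sorted, so the last key is the highest index.
        self.entries.keys().next_back().copied()
    }
}

/// Generic code written only against the trait works with any backend.
pub fn append_all<S: LogStorage>(store: &mut S, start: u64, payloads: Vec<Vec<u8>>) {
    for (i, p) in payloads.into_iter().enumerate() {
        store.append(start + i as u64, p);
    }
}
```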
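Pipelined replication ends with the leader deciding which entries are committed. In standard Raft, the leader tracks, for each node, the highest log index known to be replicated there, and an index is committed once a majority of nodes store it. The hypothetical `majority_committed` function sketches that calculation over a slice of per-node matched indexes. It is not Openraft's `Progress` code, and it omits Raft's extra rule that only entries from the leader's current term may be committed by counting replicas.

```rust
/// `matched[i]` is the highest log index known to be stored on node i
/// (the leader included). Returns the highest index stored on a majority.
/// NOTE: real Raft also requires that entry to be from the current term.
pub fn majority_committed(matched: &[u64]) -> Option<u64> {
    if matched.is_empty() {
        return None;
    }
    let mut sorted = matched.to_vec();
    // Descending order: sorted[k] is stored on at least k + 1 nodes.
    sorted.sort_unstable_by(|a, b| b.cmp(a));
    let majority = matched.len() / 2 + 1;
    Some(sorted[majority - 1])
}
```

For three nodes with matched indexes `[10, 7, 3]`, index 7 is stored on two of three nodes, so it is the committed index; with four nodes at `[8, 2, 8, 2]`, index 8 is only on two of four nodes and the committed index stays at 2.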
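Joint config (joint consensus) works by requiring agreement from both configurations while a membership change is in flight: an election or a log entry succeeds only with a majority of the old member set and a majority of the new member set. The hypothetical `is_joint_quorum` function below sketches that rule with node ids stored in `BTreeSet`s; it is an illustration of the rule, not Openraft's membership code.

```rust
use std::collections::BTreeSet;

/// True if `granted` contains a strict majority of `members`.
fn is_majority(members: &BTreeSet<u64>, granted: &BTreeSet<u64>) -> bool {
    let votes = members.intersection(granted).count();
    votes * 2 > members.len()
}

/// Joint consensus (C_old,new): accepted only if `granted` holds a majority
/// of the old config AND a majority of the new config.
pub fn is_joint_quorum(
    old: &BTreeSet<u64>,
    new: &BTreeSet<u64>,
    granted: &BTreeSet<u64>,
) -> bool {
    is_majority(old, granted) && is_majority(new, granted)
}
```

For example, with old members {1, 2, 3} and new members {3, 4, 5}, acknowledgements from {1, 2, 4} are not enough (only one of three new members), whereas {1, 3, 4} forms a majority in both.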
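Log compaction replaces a prefix of the log with a snapshot, after which the covered entries can be dropped. The sketch below uses a hypothetical `CompactingLog` that keeps entries in a `VecDeque` and purges everything up to the snapshot's last included index in one batch. The names, the in-memory layout and the clamping behavior are assumptions for illustration, not Openraft's storage format.

```rust
use std::collections::VecDeque;

/// Hypothetical in-memory log; `entries[0]` has log index `first_index`.
pub struct CompactingLog {
    first_index: u64,
    entries: VecDeque<Vec<u8>>,
    /// Last log index covered by the most recent snapshot, if any.
    snapshot_last_index: Option<u64>,
}

impl CompactingLog {
    pub fn new() -> Self {
        // Log indexes start at 1, as in the Raft paper.
        CompactingLog { first_index: 1, entries: VecDeque::new(), snapshot_last_index: None }
    }

    /// Appends an entry and returns its log index.
    pub fn append(&mut self, payload: Vec<u8>) -> u64 {
        let index = self.first_index + self.entries.len() as u64;
        self.entries.push_back(payload);
        index
    }

    pub fn get(&self, index: u64) -> Option<&Vec<u8>> {
        // Indexes below `first_index` now live only in the snapshot.
        if index < self.first_index {
            return None;
        }
        self.entries.get((index - self.first_index) as usize)
    }

    pub fn last_index(&self) -> Option<u64> {
        if self.entries.is_empty() {
            self.snapshot_last_index
        } else {
            Some(self.first_index + self.entries.len() as u64 - 1)
        }
    }

    /// Records a snapshot covering entries up to `upto` and purges them in
    /// one batch. `upto` is clamped to the last stored entry.
    /// Returns the number of purged entries.
    pub fn compact(&mut self, upto: u64) -> usize {
        if upto < self.first_index {
            return 0; // already covered by an earlier snapshot
        }
        let n = ((upto - self.first_index + 1) as usize).min(self.entries.len());
        self.entries.drain(..n);
        self.first_index += n as u64;
        if n > 0 {
            self.snapshot_last_index = Some(self.first_index - 1);
        }
        n
    }
}
```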

Who uses it

Contributing

Check out the CONTRIBUTING.md guide for more details on getting started with contributing to this project.

License

Openraft is licensed under the terms of the MIT License or the Apache License 2.0, at your option.
