Adrien Béraud edited this page Mar 29, 2024 · 58 revisions

Introduction

OpenDHT offers the following features:

  • Distributed shared key->value data-store.
  • IPv4 and IPv6 support.
  • Storage of arbitrary binary values up to 64 KiB. Keys are 160 bits long.
  • Different values under the same key can be distinguished by a key-unique 64-bit ID.
  • Every value also has a "value type". Each value type defines potentially complex storage, edition and expiration policies, allowing for instance different value expiration times. The set of supported "value types" is hardcoded and known by every node.

Note that OpenDHT is not compatible with the Mainline BitTorrent DHT.

An optional public-key cryptography layer on top of the DHT allows putting signed or encrypted data on the DHT. Signed values can then be edited, but only by their owner (as verified cryptographically). Signed values retrieved from the DHT are automatically checked and are only presented to the user if the signature verification succeeds.

The identity layer also publishes a (usually self-signed) certificate on the DHT that can be used to encrypt data for other nodes. Encrypted values are always signed, and the signature is part of the encrypted data, to hide the signer identity during transmission. For this reason, like other non-signed values, encrypted values can't be edited (because storage nodes can't verify the identity of the author).

The OpenDHT API

OpenDHT uses the dht C++ namespace and is composed of a few major classes:

  • InfoHash represents a key or a node ID, both of which are 20-byte (160-bit) bitstrings. InfoHash instances can be compared with the comparison operator ==. The user can compute hashes from strings or binary data using the static methods InfoHash::get(); for instance, InfoHash::get("my_key") returns the SHA1 hash of the string "my_key".
  • Value represents a value potentially stored on the DHT. dht::Value is the result type of get operations and the argument type of put operations. A dht::Value can be easily built from any binary object, for instance using the constructor dht::Value::Value(const std::vector<uint8_t>&) or C-style with dht::Value::Value(const uint8_t* ptr, size_t len).
  • ValueType defines how data is stored on the DHT: preservation time, storage and edition constraints, etc. Every stored Value has an associated value type. Note that ValueType usually has no impact on data serialization.
  • Value::Filter is a class inheriting from std::function<bool(const Value&)>. It lets you define whether a value should be returned to the user. It also defines some useful methods like chain(Value::Filter&&) and chainOr(Value::Filter&&).
  • Query, much like a filter, lets you filter values, but also select fields within each value. It roughly corresponds to the SELECT and WHERE clauses of an SQL statement. In fact, one of its constructors takes an SQL-like formatted string as parameter. Fields on which SELECT and WHERE operations are permitted are listed in Value::Fields. This is a subset of the fields a Value contains. The most meaningful distinction between a query and a filter is that the query is executed by the remote nodes, giving you better control over the traffic triggered by your usage of the library.
  • Dht is the class implementing the actual distributed hash table and providing basic operations. It requires an already-open UDP socket to send packets. When used alone, the Dht::periodic method must be called regularly and when a packet is received.
  • SecureDht is a child class of dht::Dht that exposes its APIs, transparently checks signed values (for get and listen operations), decrypts encrypted values, and provides additional methods to publish signed or encrypted values.
  • DhtRunner provides a thread-safe interface to SecureDht and manages UDP sockets. DhtRunner is what most applications implementing OpenDHT should use: the instance can be safely shared to be used independently by various components or threads, with networking managed transparently. DhtRunner can launch a dedicated thread or be integrated in the program loop.
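The filter combinators mentioned above can be pictured with a small standalone model. This is only a sketch using the standard library: ToyValue, ToyFilter, chainAnd and chainEither are hypothetical names, not OpenDHT types, and the sketch assumes that chain accepts a value only when both filters accept it while chainOr accepts it when at least one does.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for dht::Value; only the fields used here.
struct ToyValue {
    std::uint64_t id;
    std::string user_type;
    std::vector<std::uint8_t> data;
};

// Hypothetical stand-in for Value::Filter: a predicate over values.
using ToyFilter = std::function<bool(const ToyValue&)>;

// Assumed meaning of chain: both filters must accept the value.
ToyFilter chainAnd(ToyFilter a, ToyFilter b) {
    return [a, b](const ToyValue& v) { return a(v) && b(v); };
}

// Assumed meaning of chainOr: at least one filter must accept the value.
ToyFilter chainEither(ToyFilter a, ToyFilter b) {
    return [a, b](const ToyValue& v) { return a(v) || b(v); };
}

// Small self-check showing how the combinators behave.
void demoFilters() {
    ToyFilter isCool = [](const ToyValue& v) { return v.user_type == "cool"; };
    ToyFilter isSmall = [](const ToyValue& v) { return v.data.size() < 64; };

    ToyValue coolBig{1, "cool", std::vector<std::uint8_t>(100)};
    ToyValue dullSmall{2, "dull", std::vector<std::uint8_t>(3)};

    ToyFilter both = chainAnd(isCool, isSmall);
    ToyFilter either = chainEither(isCool, isSmall);

    assert(!both(coolBig));     // cool, but too large
    assert(!both(dullSmall));   // small, but not cool
    assert(either(coolBig));    // being cool is enough
    assert(either(dullSmall));  // being small is enough
}
```

Composing small predicates this way keeps each condition testable on its own, which is presumably the motivation for offering such helpers on the filter class.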

Callbacks

Get/listen operations take a callback argument of type GetCallback or GetCallbackSimple (both can be used):

using GetCallback = std::function<bool(const std::vector<std::shared_ptr<dht::Value>>& values)>;

using GetCallbackSimple = std::function<bool(const std::shared_ptr<dht::Value>& value)>;

Listen operations can also take a callback argument of type ValueCallback, which reports both when a new value is found and when it expires (expired is false when the value is first found and true when it expires):

using ValueCallback = std::function<bool(const std::vector<std::shared_ptr<Value>>& values, bool expired)>;

Query operations take a callback argument of type QueryCallback, defined as:

using QueryCallback = std::function<bool(const std::vector<std::shared_ptr<dht::FieldValueIndex>>& fields)>;

Many operations also use an "operation completed" callback DoneCallback, defined as:

using DoneCallback = std::function<void(bool success)>;
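The contract between a value callback and the completion callback can be illustrated without the library. The sketch below is a toy simulation, not OpenDHT code: ToyValue, ToyGetCallback, ToyDoneCallback and runToySearch are hypothetical names, and the sketch assumes that returning false from the value callback stops further deliveries and that the completion callback runs once afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for dht::Value.
struct ToyValue { std::string payload; };

using ToyBatch = std::vector<std::shared_ptr<ToyValue>>;
using ToyGetCallback = std::function<bool(const ToyBatch& values)>;
using ToyDoneCallback = std::function<void(bool success)>;

// Simulates a search that finds values in several batches.
// Returns the number of batches handed to the value callback.
std::size_t runToySearch(const std::vector<ToyBatch>& batches,
                         const ToyGetCallback& cb,
                         const ToyDoneCallback& done) {
    std::size_t delivered = 0;
    for (const auto& batch : batches) {
        ++delivered;
        if (!cb(batch)) break;  // returning false stops further deliveries
    }
    if (done) done(true);       // called once, after the last value callback
    return delivered;
}

void demoSearch() {
    auto make = [](const char* s) { return std::make_shared<ToyValue>(ToyValue{s}); };
    std::vector<ToyBatch> batches;
    batches.push_back(ToyBatch{make("a")});
    batches.push_back(ToyBatch{make("b"), make("c")});
    batches.push_back(ToyBatch{make("d")});

    std::size_t seen = 0;
    bool finished = false;
    std::size_t delivered = runToySearch(
        batches,
        [&](const ToyBatch& values) {
            seen += values.size();
            return seen < 3;    // stop once three values were seen
        },
        [&](bool success) { finished = success; });

    assert(delivered == 2);     // the third batch was never delivered
    assert(seen == 3);
    assert(finished);
}
```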

dht::Dht

This class provides the core API. Important methods are:

  • Constructor
Dht::Dht(int s, int s6, const InfoHash& id)

The constructor takes open IPv4 and IPv6 UDP sockets used to send packets, and the node ID. At least one open socket must be provided for the Dht instance to be considered running. If no valid socket is available for one of the address families, the value -1 should be passed instead.

Most apps using OpenDHT should use the class DhtRunner that will instantiate Dht, handle networking transparently and provide a thread-safe interface to the dht instance.

  • Get
void Dht::get(const InfoHash& key, GetCallback cb, DoneCallback donecb={}, Value::Filter f = {}, Query q = {});

Get initiates a search on the network for values associated with the provided key. Results are provided during the search through the second argument cb. The callback is called each time new values are found on the network, until the search completes or the callback returns false. An optional DoneCallback is called on operation completion (success or failure), after which no further callback is called.
Filter: optional predicate to pre-filter values before they are passed to the callback.
Query: optional query to filter values on remote nodes.

Example using Dht::get:

//node is a running instance of dht::Dht
node.get(
    dht::InfoHash::get("some_key"),
    [](const std::vector<std::shared_ptr<dht::Value>>& values) {
        for (const auto& v : values)
            std::cout << "Got value: " << *v << std::endl;
        return true; // keep looking for values
    },
    [](bool success) {
        std::cout << "Get finished with " << (success ? "success" : "failure") << std::endl;
    }
);
  • Query
void Dht::query(const InfoHash& key, QueryCallback cb, DoneCallback done_cb = {}, Query&& q = {});

Query initiates a search on the network at the provided key for specific value fields. Results are provided during the search through the second argument cb. The callback is called each time new field values are found on the network, until the search completes or the callback returns false. An optional DoneCallback is called on operation completion (success or failure), after which no further callback is called.
Query: optional query to select fields and filter values on remote nodes.

Example using Dht::query:

//node is a running instance of dht::Dht
node.query(
    dht::InfoHash::get("some_key"),
    [](const std::vector<std::shared_ptr<dht::FieldValueIndex>>& fields) {
        for (const auto& i : fields)
            std::cout << "Got index: " << *i << std::endl;
        return true; // keep looking for field value index
    },
    [](bool success) {
        std::cout << "Query finished with " << (success ? "success" : "failure") << std::endl;
    }
);
  • Put
void Dht::put(const InfoHash& key, const std::shared_ptr<Value>& value, DoneCallback cb = {});

Put initiates publication of a value on the network at the provided key. See Data serialization for more information about how to build a dht::Value instance. An optional DoneCallback is called on operation completion (success or failure).
If the value ID is dht::Value::INVALID_ID (0) when put is called, the Value::id field is set during the operation to identify the value.
A value remains on the network for its lifetime (default 10 minutes). Use put with the same key and value to refresh the expiration deadline. Values can't be edited by default (with the exception of signed values). If a value with the same value ID exists on the network, the new value is by default ignored by the network.

Example using Dht::put:

const char* my_data = "42 cats";

//node is a running instance of dht::Dht
node.put(
    dht::InfoHash::get("some_key"),
    dht::Value((const uint8_t*)my_data, std::strlen(my_data))
);
  • Listen
size_t Dht::listen(const InfoHash& key, ValueCallback cb, Value::Filter f = {}, Query q = {});

Listen initiates a search on the network to find values associated with the provided key, and keeps the caller informed of new values published at key. The provided callback cb is called every time there is a new or changed value at key, until cb returns false or the operation is canceled with bool cancelListen(const InfoHash& key, size_t token), where token is the return value of listen. Calling cancelListen has the same effect as returning false from the callback.

Example using Dht::listen:

auto key = dht::InfoHash::get("some_key");
auto token = node.listen(key,
    [](const std::vector<std::shared_ptr<dht::Value>>& values, bool expired) {
        for (const auto& v : values)
            std::cout << "Found value: " << *v << ", " << (expired ? "expired" : "added") << std::endl;
        return true; // keep listening
    }
);

// later
node.cancelListen(key, std::move(token));
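The token mechanism described above can be modeled in a few lines. This is a standalone sketch and not the library's implementation: ToyListenTable and ToyListener are hypothetical names, values are plain strings, and a single key is assumed so that the key argument can be left out.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

using ToyListener = std::function<bool(const std::string& value, bool expired)>;

// Toy listener registry for one key.
class ToyListenTable {
public:
    // Registers a listener and returns a token identifying it.
    std::size_t listen(ToyListener cb) {
        std::size_t token = ++lastToken_;
        listeners_.emplace(token, std::move(cb));
        return token;
    }
    // Removes the listener; returns false if the token is unknown.
    bool cancelListen(std::size_t token) { return listeners_.erase(token) > 0; }

    // Notifies every listener; a listener returning false is dropped.
    void notify(const std::string& value, bool expired) {
        for (auto it = listeners_.begin(); it != listeners_.end();) {
            if (it->second(value, expired)) ++it;
            else it = listeners_.erase(it);
        }
    }
    std::size_t size() const { return listeners_.size(); }

private:
    std::size_t lastToken_ = 0;
    std::map<std::size_t, ToyListener> listeners_;
};

void demoListen() {
    ToyListenTable table;
    int firstCalls = 0, secondCalls = 0;
    std::size_t t1 = table.listen([&](const std::string&, bool) {
        ++firstCalls;
        return false;  // stop after the first notification
    });
    std::size_t t2 = table.listen([&](const std::string&, bool) {
        ++secondCalls;
        return true;   // keep listening
    });
    table.notify("v1", false);  // both are called; the first one is dropped
    table.notify("v1", true);   // only the second one is called
    bool cancelledFirst = table.cancelListen(t1);   // already removed
    bool cancelledSecond = table.cancelListen(t2);  // explicit cancellation
    assert(firstCalls == 1 && secondCalls == 2);
    assert(!cancelledFirst && cancelledSecond);
    assert(table.size() == 0);
}
```

In this model, returning false and calling cancelListen both end up erasing the same map entry, which mirrors the statement that the two have the same effect.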

Listen with type template for automatic deserialization:

struct Cloud {
    uint32_t altitude;
    double width, height;
    bool rainbow;
    MSGPACK_DEFINE_MAP(altitude, width, height, rainbow);
};
std::vector<Cloud> found_clouds;

auto key = dht::InfoHash::get("some_key");
auto token = node.listen<Cloud>(key, [](Cloud&& value) {
        // warning: called from another thread
        found_clouds.emplace_back(std::move(value));
        return true; // keep listening
    }
);

// later
node.cancelListen(key, token);

Filters and queries

Filters

A filter is an std::function<bool(const dht::Value&)> predicate to filter values.

auto coolValueFilter = [](const dht::Value& v) {
    return v.user_type == "cool" and v.data.size() < 64;
};
node.get(
    dht::InfoHash::get("coolKey"),
    [](const std::shared_ptr<dht::Value>& value) {
        std::cout << "That's a cool value: " << *value << std::endl;
        return true; // keep looking for values
    },
    [](bool success) {
        std::cout << "Op went " << (success ? "cool" : "not cool") << std::endl;
    },
    coolValueFilter);

As you can see, the Value::Filter class is really flexible. However, this filtering is only applied on the local node, after values have been received in a response. What if you know that the storage you're interested in hosts a large number of values, and you don't want to trigger heavy traffic? Use queries!

Queries

The same kind of get operation, this time filtered on the remote nodes with a query, looks as follows:

dht::Where w;
w.id(5); /* the same as dht::Where w("WHERE id=5"); */
node.get(
    dht::InfoHash::get("some_key"),
    [](const std::vector<std::shared_ptr<dht::Value>>& values) {
        for (const auto& v : values)
            std::cout << "This value has passed through the remote nodes' filters: " << *v << std::endl;
        return true; // keep looking for values
    },
    [](bool success) {
        std::cout << "Get finished with " << (success ? "success" : "failure") << std::endl;
    }, {}, w
);

All available fields are listed below:

  • Id
  • ValueType
  • OwnerPk
  • UserType

Note: in string initialization, field names are written in snake case (for example value_type, user_type).

A query can tell whether it is satisfied by another query. For example:

Query q1;
q1.where.id(5); // the whole value with id=5 will be sent

Query q2 {{"SELECT value_type"}};
// q2 the same as Query q("SELECT value_type WHERE value_type=10,user_type=foo_type");
q2.where.valueType(10).userType("foo_type");

Query q3("SELECT id WHERE id=5"); // only the id=5 will be sent

q1.isSatisfiedBy(q3); // false
q2.isSatisfiedBy(q1); // false
q3.isSatisfiedBy(q1); // true because q1 yields all the response data q3 would have or more
q2.isSatisfiedBy(q3); // false
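The reasoning behind these four results can be reproduced with a deliberately simplified model. ToyQuery below is hypothetical and is not the dht::Query class: it assumes that a query is satisfied by another one when the other query is no more restrictive (each of its WHERE conditions is also a condition of this query) and returns at least the selected fields (an empty selection meaning the whole value).

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>

// Simplified, hypothetical model of a query.
struct ToyQuery {
    std::set<std::string> select;              // empty = whole value
    std::map<std::string, std::string> where;  // field -> required value

    // True if `other` returns at least everything this query asks for.
    bool isSatisfiedBy(const ToyQuery& other) const {
        // `other` must not be more restrictive than this query.
        for (const auto& cond : other.where) {
            auto it = where.find(cond.first);
            if (it == where.end() || it->second != cond.second) return false;
        }
        // `other` must return every field this query selects.
        if (other.select.empty()) return true;  // whole value covers anything
        if (select.empty()) return false;       // we want the whole value
        return std::includes(other.select.begin(), other.select.end(),
                             select.begin(), select.end());
    }
};

void demoQueries() {
    ToyQuery q1;                        // WHERE id=5
    q1.where["id"] = "5";

    ToyQuery q2;                        // SELECT value_type WHERE value_type=10,user_type=foo_type
    q2.select.insert("value_type");
    q2.where["value_type"] = "10";
    q2.where["user_type"] = "foo_type";

    ToyQuery q3;                        // SELECT id WHERE id=5
    q3.select.insert("id");
    q3.where["id"] = "5";

    assert(!q1.isSatisfiedBy(q3));  // q3 only returns the id field
    assert(!q2.isSatisfiedBy(q1));  // q1 restricts to id=5, q2 does not
    assert(q3.isSatisfiedBy(q1));   // q1 returns whole values with id=5
    assert(!q2.isSatisfiedBy(q3));  // q3 restricts to id=5, q2 does not
}
```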

dht::SecureDht

This class extends dht::Dht, and provides the same API methods (get, put, listen). It adds a public-key cryptography layer on top of the DHT.

A user-provided Identity (RSA key pair and optional Certificate) can be used for signing and decryption.

If SecureDht is configured with a Certificate, it will be published on the DHT, and automatically retrieved by other nodes in order to identify, authenticate and encrypt values exchanged on the DHT.

Values returned by SecureDht::get and SecureDht::listen are checked beforehand and filtered: signed values are dropped if their signature verification fails. Similarly, encrypted values that can't be decrypted are dropped.

As a layer on top of Dht, SecureDht can also be used for plain values. Methods like get and put will behave the same as Dht for non-encrypted and non-signed values.

Signed values

The user can know if a dht::Value provided by ::get and ::listen is signed by checking the owner field of the Value (which would be the public key of the signer). The public key ID of the signer can then be checked with value->owner->getId() or value->owner->getLongId().

The following fields of dht::Value are authenticated by the signature:

  • owner
  • recipient
  • seq
  • user_type
  • type
  • data

Note that the value ID is not part of the signed data and is not authenticated by the signature.

SecureDht adds the following method:

  • PutSigned
void putSigned(const InfoHash& hash, const std::shared_ptr<Value>& val, DoneCallback callback);

This method requires SecureDht to be configured with a private key, used for signing.

Value edition

Value edition is only possible with signed values. It allows replacing the value stored at a specific key and value ID with different content.

To edit a value, perform multiple calls to putSigned with the same key and value id. It is possible to reuse and modify the same dht::Value instance, as the signature is recomputed at every call to putSigned. In that case, avoid modifying the value instance between the call to putSigned and the completion callback.

Value edition in OpenDHT enforces the following requirements:

  • The value must keep the same signer.
  • The seq field of the value must be increasing. This is done automatically when performing putSigned multiple times on the same SecureDht instance.
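The two requirements above amount to a simple acceptance check on the storing side. The sketch below is an illustration only: ToySignedValue and acceptsEdit are hypothetical names, the owner is reduced to a string standing in for the signer's public key ID, signature verification is not modeled, and "increasing" is assumed to mean strictly greater.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical, reduced view of a signed value.
struct ToySignedValue {
    std::uint64_t id;
    std::string owner;  // stand-in for the signer's public key ID
    unsigned seq;       // sequence number
    std::string data;
};

// Returns true if `update` may replace `stored` under the two rules above.
// Signature verification itself is out of scope for this sketch.
bool acceptsEdit(const ToySignedValue& stored, const ToySignedValue& update) {
    if (stored.id != update.id) return false;        // not an edit of this value
    if (stored.owner != update.owner) return false;  // signer must stay the same
    return update.seq > stored.seq;                  // assumed: strictly increasing
}

void demoEdition() {
    ToySignedValue stored{42, "alice", 1, "v1"};

    ToySignedValue goodEdit{42, "alice", 2, "v2"};
    ToySignedValue staleEdit{42, "alice", 1, "v2"};
    ToySignedValue foreignEdit{42, "mallory", 2, "v2"};

    assert(acceptsEdit(stored, goodEdit));      // same signer, higher seq
    assert(!acceptsEdit(stored, staleEdit));    // seq did not increase
    assert(!acceptsEdit(stored, foreignEdit));  // different signer
}
```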

Encrypted values

The user can know if a value was received encrypted by checking the recipient field of the Value (which would be our public key ID if the value was encrypted for us).

In OpenDHT, encrypted values are always also privately signed (the signature is only visible to the recipient). Every field of dht::Value authenticated by the signature is also concealed from everyone but the recipient when using encryption.

SecureDht adds the following methods:

  • PutEncrypted
void putEncrypted(const InfoHash& hash, const InfoHash& to, std::shared_ptr<Value> val, DoneCallback callback, bool permanent = false);

This method requires SecureDht to be configured with a private key, used for signing. The public key of the recipient (to) will be searched using the provided certificate or public key lookup callback, or automatically on the DHT.

void putEncrypted(const InfoHash& hash, const crypto::PublicKey& to, Sp<Value> val, DoneCallback callback, bool permanent = false);

This method allows providing the full recipient public key directly.

dht::DhtRunner

DhtRunner provides thread-safe access to the running DHT instance and exposes all methods from SecureDht. For more information, see: Running a node in your program