Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: pebbleds profile and plugin #10530

Merged
merged 13 commits into from
Oct 3, 2024
22 changes: 22 additions & 0 deletions config/init.go
Original file line number Diff line number Diff line change
Expand Up @@ -138,6 +138,28 @@ func DefaultDatastoreConfig() Datastore {
}
}

func pebbleSpec() map[string]interface{} {
lidel marked this conversation as resolved.
Show resolved Hide resolved
return map[string]interface{}{
"type": "measure",
"prefix": "pebble.datastore",
"child": map[string]interface{}{
"type": "pebbleds",
"path": "pebbleds",
"bytesPerSync": 0,
"bisableWAL": false,
gammazero marked this conversation as resolved.
Show resolved Hide resolved
"cacheSize": 0,
"l0CompactionThreshold": 0,
"l0StopWritesThreshold": 0,
"lBaseMaxBytes": 0,
"maxConcurrentCompactions": 0,
"memTableSize": 0,
"memTableStopWritesThreshold": 0,
"walBytesPerSync": 0,
"walMinSyncSeconds": 0,
},
}
}

func badgerSpec() map[string]interface{} {
return map[string]interface{}{
"type": "measure",
Expand Down
39 changes: 37 additions & 2 deletions config/profile.go
Original file line number Diff line number Diff line change
Expand Up @@ -135,7 +135,11 @@ You should use this datastore if:
* You want to minimize memory usage.
* You are ok with the default speed of data import, or prefer to use --nocopy.

This profile may only be applied when first initializing the node.
See configuration documentation at:
https://github.com/ipfs/kubo/blob/master/docs/datastores.md#flatfs

NOTE: This profile may only be applied when first initializing node at IPFS_PATH
via 'ipfs init --profile flatfs'
`,

InitOnly: true,
Expand All @@ -144,6 +148,32 @@ This profile may only be applied when first initializing the node.
return nil
},
},
"pebbleds": {
Description: `Configures the node to use the pebble high-performance datastore.

Pebble is a LevelDB/RocksDB inspired key-value store focused on performance
and internal usage by CockroachDB.
You should use this datastore if:

- You need a datastore that is focused on performance.
- You need reliability by default, but may choose to disable WAL for maximum performance when reliability is not critical.
- This datastore is good for multi-terabyte data sets.
- May benefit from tuning depending on read/write patterns and throughput.
- Performance is helped significantly by running on a system with plenty of memory.

See configuration documentation at:
https://github.com/ipfs/kubo/blob/master/docs/datastores.md#pebbleds

NOTE: This profile may only be applied when first initializing node at IPFS_PATH
via 'ipfs init --profile pebbleds'
`,

InitOnly: true,
Transform: func(c *Config) error {
c.Datastore.Spec = pebbleSpec()
return nil
},
},
"badgerds": {
Description: `Configures the node to use the legacy badgerv1 datastore.

Expand All @@ -160,7 +190,12 @@ Other caveats:
* Good for medium-size datastores, but may run into performance issues
if your dataset is bigger than a terabyte.

This profile may only be applied when first initializing the node.`,
See configuration documentation at:
https://github.com/ipfs/kubo/blob/master/docs/datastores.md#badgerds

NOTE: This profile may only be applied when first initializing node at IPFS_PATH
via 'ipfs init --profile badgerds'
`,

InitOnly: true,
Transform: func(c *Config) error {
Expand Down
10 changes: 10 additions & 0 deletions docs/changelogs/v0.31.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,13 +6,23 @@

- [Overview](#overview)
- [🔦 Highlights](#-highlights)
- [Experimental Pebble Datastore](#experimental-pebble-datastore)
- [📝 Changelog](#-changelog)
- [👨‍👩‍👧‍👦 Contributors](#-contributors)

### Overview

### 🔦 Highlights

#### Experimental Pebble Datastore

[Pebble](https://github.com/ipfs/kubo/blob/master/docs/config.md#pebbleds-profile) provides a high-performance alternative to leveldb as the datastore, and provides a modern replacement for [legacy badgerv1](https://github.com/ipfs/kubo/blob/master/docs/config.md#badgerds-profile).

A fresh Kubo node can be initialized with [`pebbleds` profile](https://github.com/ipfs/kubo/blob/master/docs/config.md#pebbleds-profile) via `ipfs init --profile pebbleds`.

There are a number of parameters available for tuning pebble's performance to your specific needs. Default values are used for any parameters that are not configured or are set to their zero-value.
For a description of the available tuning parameters, see [kubo/docs/datastores.md#pebbleds](https://github.com/ipfs/kubo/blob/master/docs/datastores.md#pebbleds).

### 📝 Changelog

### 👨‍👩‍👧‍👦 Contributors
45 changes: 34 additions & 11 deletions docs/config.md
Original file line number Diff line number Diff line change
Expand Up @@ -181,6 +181,7 @@ config file at runtime.
- [`local-discovery` profile](#local-discovery-profile)
- [`default-networking` profile](#default-networking-profile)
- [`flatfs` profile](#flatfs-profile)
- [`pebbleds` profile](#pebbleds-profile)
- [`badgerds` profile](#badgerds-profile)
- [`lowpower` profile](#lowpower-profile)
- [`legacy-cid-v0` profile](#legacy-cid-v0-profile)
Expand Down Expand Up @@ -522,13 +523,8 @@ Spec defines the structure of the ipfs datastore. It is a composable structure,
where each datastore is represented by a json object. Datastores can wrap other
datastores to provide extra functionality (eg metrics, logging, or caching).

This can be changed manually, however, if you make any changes that require a
different on-disk structure, you will need to run the [ipfs-ds-convert
tool](https://github.com/ipfs/ipfs-ds-convert) to migrate data into the new
structures.

For more information on possible values for this configuration option, see
[docs/datastores.md](datastores.md)
> [!NOTE]
> For more information on possible values for this configuration option, see [`kubo/docs/datastores.md`](datastores.md)

Default:
```
Expand Down Expand Up @@ -2401,9 +2397,9 @@ Inverse profile of the test profile.

### `flatfs` profile

Configures the node to use the flatfs datastore. Flatfs is the default datastore.
Configures the node to use the flatfs datastore.
Flatfs is the default, most battle-tested and reliable datastore.

This is the most battle-tested and reliable datastore.
You should use this datastore if:

- You need a very simple and very reliable datastore, and you trust your
Expand All @@ -2414,7 +2410,30 @@ You should use this datastore if:
- You want to minimize memory usage.
- You are ok with the default speed of data import, or prefer to use `--nocopy`.

This profile may only be applied when first initializing the node.
> [!WARNING]
> This profile may only be applied when first initializing the node via `ipfs init --profile flatfs`

> [!NOTE]
> See caveats and configuration options at [`datastores.md#flatfs`](datastores.md#flatfs)

### `pebbleds` profile

Configures the node to use the pebble high-performance datastore.

Pebble is a LevelDB/RocksDB inspired key-value store focused on performance and internal usage by CockroachDB.
You should use this datastore if:

- You need a datastore that is focused on performance.
- You need reliability by default, but may choose to disable WAL for maximum performance when reliability is not critical.
- This datastore is good for multi-terrabyte data sets.
- May benefit from tuning depeending on read/write patterns and throughput.
- Performance is helped significnatly by running on a system with plenty of memory.
gammazero marked this conversation as resolved.
Show resolved Hide resolved

> [!WARNING]
> This profile may only be applied when first initializing the node via `ipfs init --profile pebbleds`

> [!NOTE]
> See other caveats and configuration options at [`datastores.md#pebbleds`](datastores.md#pebbleds)

### `badgerds` profile

Expand All @@ -2435,7 +2454,11 @@ Also, be aware that:
- Good for medium-size datastores, but may run into performance issues if your dataset is bigger than a terabyte.
- The current implementation is based on old badger 1.x which is no longer supported by the upstream team.

This profile may only be applied when first initializing the node.
> [!WARNING]
> This profile may only be applied when first initializing the node via `ipfs init --profile badgerds`

> [!NOTE]
> See other caveats and configuration options at [`datastores.md#pebbleds`](datastores.md#pebbleds)

### `lowpower` profile

Expand Down
36 changes: 36 additions & 0 deletions docs/datastores.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,13 @@
This document describes the different possible values for the `Datastore.Spec`
field in the ipfs configuration file.

- [flatfs](#flatfs)
- [levelds](#levelds)
- [pebbleds](#pebbleds)
- [badgerds](#badgerds)
- [mount](#mount)
- [measure](#measure)

## flatfs

Stores each key value pair as a file on the filesystem.
Expand Down Expand Up @@ -35,6 +42,35 @@ Uses a leveldb database to store key value pairs.
}
```

## pebbleds

Uses [pebble](https://github.com/cockroachdb/pebble) as a key value store.

```json
{
"type": "pebbleds",
"path": "<location of pebble inside repo>",
}
```

The following options are availble for tuning pebble.
If they are not configured (or assigned their zero-valued), then default values are used.

* `bytesPerSync`: int, Sync sstables periodically in order to smooth out writes to disk. (default: 512KB)
* `bisableWAL`: true|false, Disable the write-ahead log (WAL) at expense of prohibiting crash recovery. (default: false)
* `cacheSize`: Size of pebble's shared block cache. (default: 8MB)
* `l0CompactionThreshold`: int, Count of L0 files necessary to trigger an L0 compaction.
* `l0StopWritesThreshold`: int, Limit on L0 read-amplification, computed as the number of L0 sublevels.
* `lBaseMaxBytes`: int, Maximum number of bytes for LBase. The base level is the level which L0 is compacted into.
* `maxConcurrentCompactions`: int, Maximum number of concurrent compactions. (default: 1)
* `memTableSize`: int, Size of a MemTable in steady state. The actual MemTable size starts at min(256KB, MemTableSize) and doubles for each subsequent MemTable up to MemTableSize (default: 4MB)
* `memTableStopWritesThreshold`: int, Limit on the number of queued of MemTables. (default: 2)
* `walBytesPerSync`: int: Sets the number of bytes to write to a WAL before calling Sync on it in the background. (default: 0, no background syncing)
* `walMinSyncSeconds`: int: Sets the minimum duration between syncs of the WAL. (default: 0)

> [!TIP]
> Start using pebble with only default values and configure tuning items are needed for your needs. For a more complete description of these values, see: `https://pkg.go.dev/github.com/cockroachdb/pebble@vA.B.C#Options` (where `A.B.C` is pebble version from Kubo's `go.mod`).

## badgerds

Uses [badger](https://github.com/dgraph-io/badger) as a key value store.
Expand Down
12 changes: 12 additions & 0 deletions docs/examples/kubo-as-a-library/go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ require (
require (
bazil.org/fuse v0.0.0-20200117225306-7b5117fecadc // indirect
github.com/AndreasBriese/bbloom v0.0.0-20190825152654-46b345b51c96 // indirect
github.com/DataDog/zstd v1.4.5 // indirect
github.com/Jorropo/jsync v1.0.1 // indirect
github.com/alecthomas/units v0.0.0-20240626203959-61d1e3462e30 // indirect
github.com/alexbrainman/goissue34681 v0.0.0-20191006012335-3fc7a47baff5 // indirect
Expand All @@ -25,6 +26,12 @@ require (
github.com/cenkalti/backoff/v4 v4.3.0 // indirect
github.com/ceramicnetwork/go-dag-jose v0.1.0 // indirect
github.com/cespare/xxhash/v2 v2.3.0 // indirect
github.com/cockroachdb/errors v1.11.3 // indirect
github.com/cockroachdb/fifo v0.0.0-20240606204812-0bbfbd93a7ce // indirect
github.com/cockroachdb/logtags v0.0.0-20230118201751-21c54148d20b // indirect
github.com/cockroachdb/pebble v1.1.2 // indirect
github.com/cockroachdb/redact v1.1.5 // indirect
github.com/cockroachdb/tokenbucket v0.0.0-20230807174530-cc333fc44b06 // indirect
github.com/containerd/cgroups v1.1.0 // indirect
github.com/coreos/go-systemd/v22 v22.5.0 // indirect
github.com/crackcomm/go-gitignore v0.0.0-20231225121904-e25f5bc08668 // indirect
Expand All @@ -43,6 +50,7 @@ require (
github.com/francoispqt/gojay v1.2.13 // indirect
github.com/fsnotify/fsnotify v1.7.0 // indirect
github.com/gabriel-vasile/mimetype v1.4.4 // indirect
github.com/getsentry/sentry-go v0.27.0 // indirect
github.com/go-logr/logr v1.4.2 // indirect
github.com/go-logr/stdr v1.2.2 // indirect
github.com/go-task/slim-sprig/v3 v3.0.0 // indirect
Expand Down Expand Up @@ -74,6 +82,7 @@ require (
github.com/ipfs/go-ds-flatfs v0.5.1 // indirect
github.com/ipfs/go-ds-leveldb v0.5.0 // indirect
github.com/ipfs/go-ds-measure v0.2.0 // indirect
github.com/ipfs/go-ds-pebble v0.3.2-0.20241002075519-c174835dc84a // indirect
github.com/ipfs/go-fs-lock v0.0.7 // indirect
github.com/ipfs/go-ipfs-blockstore v1.3.1 // indirect
github.com/ipfs/go-ipfs-delay v0.0.1 // indirect
Expand Down Expand Up @@ -103,6 +112,8 @@ require (
github.com/klauspost/compress v1.17.9 // indirect
github.com/klauspost/cpuid/v2 v2.2.8 // indirect
github.com/koron/go-ssdp v0.0.4 // indirect
github.com/kr/pretty v0.3.1 // indirect
github.com/kr/text v0.2.0 // indirect
github.com/libp2p/go-buffer-pool v0.1.0 // indirect
github.com/libp2p/go-cidranger v1.1.0 // indirect
github.com/libp2p/go-doh-resolver v0.4.0 // indirect
Expand Down Expand Up @@ -172,6 +183,7 @@ require (
github.com/quic-go/quic-go v0.45.2 // indirect
github.com/quic-go/webtransport-go v0.8.0 // indirect
github.com/raulk/go-watchdog v1.3.0 // indirect
github.com/rogpeppe/go-internal v1.12.0 // indirect
github.com/samber/lo v1.46.0 // indirect
github.com/spaolacci/murmur3 v1.1.0 // indirect
github.com/stretchr/testify v1.9.0 // indirect
Expand Down
Loading
Loading