fix(indexer): fix bulletpoints, break lines (#94)
leohhhn authored Jun 5, 2024
1 parent 41fc121 commit bce6343
94 changes: 66 additions & 28 deletions posts/2024-05-10_kv-stores-indexer/README.md
authors: [ajnavarro]

In this article, we'll discuss how we achieved a tenfold increase in the processing speed of the tx-indexer by applying four key concepts related to our use of key/value storage:

- Understanding Key/Value Store Variability
- The Importance of Efficient Data Encoding
- Implementing Secondary Indexes on a Key/Value Store
- The Role of Batch Inserts in Enhancing Performance
  - Data consistency
  - Speed
    - Old
    - New
- Conclusion

The Transaction Indexer ([tx-indexer](https://github.com/gnolang/tx-indexer)) is
the primary tool Gno.land uses to index its networks. It is in charge of keeping
up with block production, fetching new data, indexing it, and serving it to users
while providing filtering and subscription capabilities. The tx-indexer makes
on-chain data versatile and easy to use, which is one of the key aspects of a
fully functioning decentralized application.

## Understanding Key/Value Store Variability

Not all key/value stores are created equal. They vary significantly, and
depending on their internal data structures, some are better suited to certain
use cases than others. A deep understanding of the key/value store you plan to
use will help you organize data for efficient writes and reads, and help you
choose the best store for your specific needs.

While [PebbleDB](https://github.com/cockroachdb/pebble) is based on
[RocksDB](https://github.com/facebook/rocksdb/wiki/RocksDB-Overview), the two
databases differ significantly. Both use LSM trees built upon SSTables;
however, PebbleDB supports only a subset of the features available in RocksDB.
For instance, PebbleDB lacks built-in transactions, but similar guarantees can
be implemented using Batches and/or Snapshots.

## The Importance of Efficient Data Encoding

Our indexing involved elements defined by consecutive integers, with Blocks on
one side and Transactions within a Block on the other.

Initially, Blocks were indexed using a combination of `block_prefix` and `block_id`
encoded in little-endian byte order. Because the store orders keys byte by byte,
little-endian integers do not sort numerically (for example, 256 sorts before 1),
so we could not use iterators for ordered data retrieval. We were forced to fetch
elements individually, resulting in excessive and inefficient database queries.

After refactoring, we adopted an order-preserving binary encoding scheme with
custom encodings for strings and integers. This flexibility enabled iteration in
ascending or descending order, which let us read data sequentially through
iterators and, consequently, reduced query times dramatically.

Here is a small example of how we encoded uint32 values in ascending order:

`func encodeUint32Ascending(b []byte, v uint32) []byte`

## Implementing Secondary Indexes on a Key/Value Store

While most filters are applied on the fly due to their low cost, we implemented
secondary indexes to fetch Transactions by Hash efficiently.

Secondary indexes are specialized key groups that directly reference the primary
index key where the data resides. For example, a transaction with ID `3` in block
`42` is stored under `/data/txs/[uint64]42[uint32]3`. These transactions are also
uniquely identified by a hash representing the entire transaction content.

To fetch transactions by hash, we created a secondary index that points to the
primary index:

`/index/txh/[HASH] -> /data/txs/[uint64]42[uint32]3`

Although our secondary indexes do not require ordered iteration, this capability
remains available, allowing us to apply additional filters as necessary. For
instance, we could index transactions by year:

`/index/txYear/[uint16]2024[uint64]42[uint32]3 -> /data/txs/[uint64]42[uint32]3`

This format allows us to iterate through all transactions within a specific
year. For example, to cover 2023 from start to end, we iterate over this range:

- from: `/index/txYear/[uint16]2023[uint64]0[uint32]0`
- to: `/index/txYear/[uint16]2023[uint64]MAX_UINT64[uint32]MAX_UINT32`

## The Role of Batch Inserts in Enhancing Performance

The advantages of write batches are often overlooked but crucial. Inserting
elements individually can lead to data consistency issues and slower operations.

### Data consistency

Batches ensure atomicity—either all elements are persisted, or none are.
Without batches, a failure during insertion could result in a block being saved
without some of its transactions.
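The all-or-nothing behavior can be illustrated with a toy in-memory store in Go: writes are buffered in a batch and become visible together on `Commit`. This is a hypothetical sketch (`memDB`, `batch`, and the readable string keys are illustrative, not the tx-indexer's types), and it models only visibility, not crash durability; a real store such as Pebble commits a batch atomically.

```go
package main

import (
	"fmt"
	"sync"
)

// kv is one buffered write.
type kv struct {
	key   string
	value []byte
}

// memDB is a toy in-memory store (hypothetical) used to illustrate batches.
type memDB struct {
	mu   sync.RWMutex
	data map[string][]byte
}

func newMemDB() *memDB { return &memDB{data: map[string][]byte{}} }

// Get reads a single key.
func (db *memDB) Get(key string) ([]byte, bool) {
	db.mu.RLock()
	defer db.mu.RUnlock()
	v, ok := db.data[key]
	return v, ok
}

// batch buffers writes; none of them reach the store until Commit.
type batch struct {
	db  *memDB
	ops []kv
}

func (db *memDB) NewBatch() *batch { return &batch{db: db} }

// Set records a write in the batch only.
func (b *batch) Set(key string, value []byte) {
	b.ops = append(b.ops, kv{key, value})
}

// Commit applies every buffered write while holding the write lock, so
// readers observe either none of the batch or all of it.
func (b *batch) Commit() {
	b.db.mu.Lock()
	defer b.db.mu.Unlock()
	for _, op := range b.ops {
		b.db.data[op.key] = op.value
	}
	b.ops = nil
}

func main() {
	db := newMemDB()

	// A block and its transactions are written through one batch.
	b := db.NewBatch()
	b.Set("block/42", []byte("block"))
	b.Set("tx/42/0", []byte("tx0"))
	b.Set("tx/42/1", []byte("tx1"))

	// Before Commit (e.g. if the process stopped here), nothing is visible.
	_, ok := db.Get("block/42")
	fmt.Println(ok) // false

	b.Commit()
	_, ok = db.Get("tx/42/1")
	fmt.Println(ok) // true
}
```

Taking the lock once per batch rather than once per write also hints at the speed benefit: fixed per-write overhead is paid once for the whole group.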

### Speed

Each insertion involves internal processes that slow down the operation. By
grouping several entries into one batch, we significantly improve insertion speed.
These are new benchmarks comparing the old way of writing elements (without
batches) with the new way (with batches). Note that these are synthetic
benchmarks; the 10x improvement was measured running the indexer itself, where
the new storage changes brought indexing time down from 30 minutes to 3 minutes:

#### Old


## Conclusion

If you found this interesting and want to take a deeper look at
[how it is done](https://github.com/gnolang/tx-indexer/tree/main/storage), or just
want to try our indexer, it is as simple as spinning up a Docker image:

```
docker run -it -p 8546:8546 ghcr.io/gnolang/tx-indexer:latest start -remote http://test3.gno.land:36657
```
