revise TSI proposal
benbjohnson committed Aug 25, 2016
1 parent d6b68cf commit 4b84e3d
Showing 1 changed file with 47 additions and 13 deletions.
60 changes: 47 additions & 13 deletions doc/tsm/tsi.md
@@ -7,19 +7,17 @@ This is a proposal for a TSI (Time-Series Index) file format to provide fast, me

## Block types

-The file contains multiple block types:
The file contains multiple variable-length block types:

- Series Block - This block stores a series block ID and a list of series ids that map to series keys. The list is sorted by series key. The series id is only used internally within the index file and no guarantees are provided about preserving keys after a compaction.

- Measurement Block - This block stores the measurement name and a list of tag keys. Each tag key points to a tag value block.

- Tag Value Block - This block stores the list of values for a tag key, and each value points to a sorted list of series IDs. Series IDs are used instead of series keys because the IDs are much smaller than the keys and sorted integer sets can be operated on more quickly.

- WAL Block - This block stores a list of series keys that have not been compacted yet.
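
The Tag Value Block entry above relies on sorted integer sets being cheap to combine. As a rough illustration (not code from the proposal), intersecting two sorted series ID lists needs only a single linear pass. The function name and the sample IDs below are hypothetical.

```go
package main

import "fmt"

// IntersectSorted returns the IDs present in both a and b.
// Both inputs must be sorted ascending and free of duplicates.
func IntersectSorted(a, b []uint64) []uint64 {
	var out []uint64
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			i++
		case a[i] > b[j]:
			j++
		default: // equal: the ID is in both sets
			out = append(out, a[i])
			i++
			j++
		}
	}
	return out
}

func main() {
	hostA := []uint64{1, 4, 7, 9}     // hypothetical series IDs for host=a
	usEast := []uint64{2, 4, 9, 12}   // hypothetical series IDs for region=us-east
	fmt.Println(IntersectSorted(hostA, usEast)) // prints [4 9]
}
```

The same two-pointer walk extends to unions and differences, which is why sorted ID lists suit tag-based filtering.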

## Usage

-Initially, series are appended to a WAL block and the series information is held in-memory. Once this reaches a threshold, the WAL block is compacted to a set of series, measurement, & tag value blocks. The WAL is discarded and these new blocks will serve as the index and will be operated upon through an mmap.
Initially, series are appended to a WAL file and the series information is held in memory. Once the WAL reaches a size threshold, it is compacted into a set of series, measurement, and tag value blocks. The WAL file is then discarded, and the new blocks serve as the index and are accessed through an mmap.

Series keys can be queried by measurement and tag value by first jumping to the appropriate measurement block. References to measurement blocks are held in memory once the files are read in. Within the measurement block, a binary search over the tag keys finds a particular key. Once the key is found, you can jump to its tag value block and binary search for a particular value and its list of associated series IDs.
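
The lookup path described above can be sketched with in-memory stand-ins for the blocks. This is only an illustration under assumed types: `Measurement`, `TagKey`, `TagValue`, and `SeriesIDsFor` are hypothetical names, and a real implementation would read the same data out of the mmap rather than Go slices.

```go
package main

import (
	"fmt"
	"sort"
)

// TagValue pairs one tag value with its sorted series IDs.
type TagValue struct {
	Value     string
	SeriesIDs []uint64
}

// TagKey holds the values for one tag key, sorted by Value.
type TagKey struct {
	Key    string
	Values []TagValue
}

// Measurement holds the tag keys for one measurement, sorted by Key.
type Measurement struct {
	Name string
	Keys []TagKey
}

// SeriesIDsFor jumps to the measurement, binary searches its tag keys,
// then binary searches that key's values. It returns nil if any step misses.
func SeriesIDsFor(index map[string]*Measurement, name, key, value string) []uint64 {
	m, ok := index[name]
	if !ok {
		return nil
	}
	i := sort.Search(len(m.Keys), func(i int) bool { return m.Keys[i].Key >= key })
	if i == len(m.Keys) || m.Keys[i].Key != key {
		return nil
	}
	vals := m.Keys[i].Values
	j := sort.Search(len(vals), func(j int) bool { return vals[j].Value >= value })
	if j == len(vals) || vals[j].Value != value {
		return nil
	}
	return vals[j].SeriesIDs
}

func main() {
	index := map[string]*Measurement{
		"cpu": {
			Name: "cpu",
			Keys: []TagKey{
				{Key: "host", Values: []TagValue{
					{Value: "a", SeriesIDs: []uint64{1, 4}},
					{Value: "b", SeriesIDs: []uint64{2}},
				}},
				{Key: "region", Values: []TagValue{
					{Value: "us-east", SeriesIDs: []uint64{1, 2}},
				}},
			},
		},
	}
	fmt.Println(SeriesIDsFor(index, "cpu", "host", "a")) // prints [1 4]
}
```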

@@ -35,19 +33,49 @@ These measurement blocks may duplicate measurement blocks created earlier by oth

As TSI files grow and measurements are duplicated, query-time merging becomes more costly, so a larger compaction will be necessary to consolidate all series, measurement, and tag value blocks into one.

When an index file or WAL file is compacted, it is locked and new index and WAL data begin in a new file. This allows snapshotting to take hard links on the files.

#### Two Compaction Types

There are two separate types of compaction that can occur:

1. Fast compaction -- compacts a set of WAL entries into series, measurement, and tag value blocks. These blocks are layered on top of previously compacted block sets.

2. Full compaction -- compacts sets of block sets into a single block set. This takes the multiple layers of series, measurement, and tag value blocks and merges them into one. It also removes any entries which have been tombstoned.
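
A full compaction can be pictured as collapsing the layers into one sorted set. The sketch below is a simplification under stated assumptions: entries are modeled as series keys only, a newer layer is assumed to supersede an older one for the same key, and a map is used in place of a streaming merge. `Entry` and `FullCompact` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// Entry is a simplified index entry: a series key plus its tombstone flag.
type Entry struct {
	Key       string
	Tombstone bool
}

// FullCompact merges block sets (oldest first) into a single sorted set.
// Assumption: for a duplicated key the newest layer wins.
// Entries whose surviving version is tombstoned are removed.
func FullCompact(layers ...[]Entry) []Entry {
	latest := make(map[string]Entry)
	for _, layer := range layers {
		for _, e := range layer {
			latest[e.Key] = e // later layers overwrite earlier ones
		}
	}
	out := make([]Entry, 0, len(latest))
	for _, e := range latest {
		if !e.Tombstone {
			out = append(out, e)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Key < out[j].Key })
	return out
}

func main() {
	older := []Entry{{Key: "cpu,host=a"}, {Key: "cpu,host=b"}}
	newer := []Entry{{Key: "cpu,host=b", Tombstone: true}, {Key: "mem,host=a"}}
	for _, e := range FullCompact(older, newer) {
		fmt.Println(e.Key)
	}
	// prints:
	// cpu,host=a
	// mem,host=a
}
```

A fast compaction would skip this merge and simply write the new layer on top, leaving duplicate and tombstoned entries to be resolved at query time.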


## Detailed Block Information

### Series Blocks

-Series blocks contain a `uint32` block ID followed by a list of `uint32` local series IDs. Combining the block id and local series ID provides a globally addressable `uint64` series ID: `(blockID,seriesID)`
Series blocks contain a `uint32` block ID followed by a list of `uint32` local series IDs. Combining the block ID and local series ID provides a globally addressable `uint64` series ID: `(blockID,seriesID)`. Block IDs are assigned sequentially and are only unique within an index file.

For example, if the block ID was 1 and the local series ID was 5 then the global ID is (1,5) and is represented as `1 << 32 | 5`.
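
The packing in the example above can be written as a pair of small helpers. The function names are illustrative and not part of the proposal.

```go
package main

import "fmt"

// GlobalSeriesID packs a block ID and a local series ID into one uint64,
// with the block ID in the high 32 bits.
func GlobalSeriesID(blockID, localID uint32) uint64 {
	return uint64(blockID)<<32 | uint64(localID)
}

// SplitSeriesID reverses GlobalSeriesID.
func SplitSeriesID(id uint64) (blockID, localID uint32) {
	return uint32(id >> 32), uint32(id)
}

func main() {
	id := GlobalSeriesID(1, 5)
	blockID, localID := SplitSeriesID(id)
	fmt.Println(id, blockID, localID) // prints 4294967301 1 5
}
```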

The layout on disk for the block would be:

```
BLOCKID<uint32>
KEYN<uint32>
(KEYPOS<uint64>)*
(FLAG<uint8>,KEYSZ<uint16>,KEY<string>)*
```

With a global ID, the key can be looked up by moving to the appropriate block and then moving to the `KEYPOS` based on the local series ID and then finally jumping to the key itself using the key position.

The `FLAG` contains boolean flags for the series. The first bit is a tombstone flag to mark the series as deleted. Other bits are reserved for future use.
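
To make the lookup concrete, here is a minimal sketch that writes a series block in the layout above and then resolves a local series ID to its key. Several details are assumptions because the proposal does not specify them: big-endian byte order, `KEYPOS` measured from the start of the block, local IDs used as zero-based indexes into the `KEYPOS` table, and the tombstone stored in the lowest bit of `FLAG`. All identifiers are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const flagTombstone = 0x01 // assumed: lowest bit of FLAG marks deletion

// Series is one entry to be written into a block.
type Series struct {
	Key     string
	Deleted bool
}

// EncodeSeriesBlock writes BLOCKID, KEYN, the KEYPOS table, then the entries.
func EncodeSeriesBlock(blockID uint32, series []Series) []byte {
	buf := make([]byte, 8+8*len(series))
	binary.BigEndian.PutUint32(buf[0:], blockID)
	binary.BigEndian.PutUint32(buf[4:], uint32(len(series)))
	for i, s := range series {
		// KEYPOS: offset of this entry from the start of the block.
		binary.BigEndian.PutUint64(buf[8+8*i:], uint64(len(buf)))
		var flag byte
		if s.Deleted {
			flag |= flagTombstone
		}
		var sz [2]byte
		binary.BigEndian.PutUint16(sz[:], uint16(len(s.Key)))
		buf = append(buf, flag)
		buf = append(buf, sz[:]...)
		buf = append(buf, s.Key...)
	}
	return buf
}

// LookupKey resolves a local series ID to its key and tombstone state.
func LookupKey(block []byte, localID uint32) (key string, deleted bool, err error) {
	if len(block) < 8 {
		return "", false, errors.New("block too short")
	}
	n := binary.BigEndian.Uint32(block[4:])
	if localID >= n {
		return "", false, errors.New("local series ID out of range")
	}
	posOff := 8 + 8*uint64(localID)
	if posOff+8 > uint64(len(block)) {
		return "", false, errors.New("truncated position table")
	}
	pos := binary.BigEndian.Uint64(block[posOff:])
	if pos+3 > uint64(len(block)) {
		return "", false, errors.New("key position out of range")
	}
	flag := block[pos]
	sz := uint64(binary.BigEndian.Uint16(block[pos+1:]))
	if pos+3+sz > uint64(len(block)) {
		return "", false, errors.New("truncated key")
	}
	return string(block[pos+3 : pos+3+sz]), flag&flagTombstone != 0, nil
}

func main() {
	block := EncodeSeriesBlock(1, []Series{
		{Key: "cpu,host=a"},
		{Key: "cpu,host=b", Deleted: true},
	})
	key, deleted, err := LookupKey(block, 1)
	fmt.Println(key, deleted, err) // prints cpu,host=b true <nil>
}
```

The bounds checks matter because positions read from an mmap'd file cannot be trusted to stay inside the block.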

Because the local ID is a `uint32`, a block is limited to approximately 4 billion series. Once the local ID space is exhausted, additional series must be written to a new block.

#### Histogram

There will also be a smaller bucketed index at the end of the series block which will allow us to narrow down the set of pages to search. This index can be searched first and then a binary search can be performed within a limited range of the series block.

The layout of this histogram would be:

```
KEYN<uint32>
(KEYPOS<uint64>)*
(KEYSZ<uint16>,KEY<string>)*
```
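
One way to read the histogram is as a sparse index over the sorted keys: consult the small index first, then binary search only the range it points to. The sketch below assumes one bucket per fixed number of keys, which the proposal does not state; `Bucket`, `BuildBuckets`, and `Find` are hypothetical names, and in-memory slices stand in for the on-disk layout.

```go
package main

import (
	"fmt"
	"sort"
)

// Bucket records the first key of a run of keys and where that run starts.
type Bucket struct {
	FirstKey string
	Start    int
}

// BuildBuckets samples every size-th key from the sorted key list.
// size must be greater than zero.
func BuildBuckets(keys []string, size int) []Bucket {
	var buckets []Bucket
	for i := 0; i < len(keys); i += size {
		buckets = append(buckets, Bucket{FirstKey: keys[i], Start: i})
	}
	return buckets
}

// Find returns the index of key in keys, or -1 if it is absent.
// It searches the bucket index first, then only the matching run of keys.
func Find(keys []string, buckets []Bucket, size int, key string) int {
	// Last bucket whose first key is <= key.
	b := sort.Search(len(buckets), func(i int) bool { return buckets[i].FirstKey > key }) - 1
	if b < 0 {
		return -1
	}
	lo := buckets[b].Start
	hi := lo + size
	if hi > len(keys) {
		hi = len(keys)
	}
	i := lo + sort.SearchStrings(keys[lo:hi], key)
	if i < hi && keys[i] == key {
		return i
	}
	return -1
}

func main() {
	keys := []string{"cpu,host=a", "cpu,host=b", "disk,host=a", "mem,host=a", "mem,host=b"}
	buckets := BuildBuckets(keys, 2)
	fmt.Println(Find(keys, buckets, 2, "mem,host=a")) // prints 3
}
```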

@@ -58,12 +86,13 @@ Measurement blocks hold the measurement name and a list of tags with pointers to
The layout on disk for the block would be:

```
-NAMESZ<uint16>,NAME<string>
FLAG<uint8>,NAMESZ<uint16>,NAME<string>
KEYN<uint64>
(KEYPOS<uint64>,VALBLKPOS<uint64>)*
-(KEYSZ<uint16>,KEY<string>)*
(KEYFLAG<uint8>,KEYSZ<uint16>,KEY<string>)*
```

The `FLAG` and `KEYFLAG` contain boolean flags for the measurement and key, respectively. The first bit is a tombstone flag to mark the object as deleted. Other bits are reserved for future use.

### Tag Value Blocks

@@ -74,14 +103,19 @@ The layout on disk for the block would be:
```
VALN<uint32>
(VALPOS<uint64>)*
-(VALSZ<uint16>,VAL<string>)*
(FLAG<uint8>,VALSZ<uint16>,VAL<string>)*
```

The `FLAG` contains boolean flags for the tag value. The first bit is a tombstone flag to mark the tag value as deleted. Other bits are reserved for future use.

-## Other considerations
-
-Some other data we may want to include are:
-
-- Checksums on blocks
-- Block sizes to avoid out-of-range reference bugs
-- Trailers on measurement blocks to allow files to be parsed backwards. This may conflict with having WAL blocks because partial writes to a WAL will make it difficult to read backwards.

### WAL File

The Write Ahead Log (WAL) exists to quickly append new data. Once it reaches a size threshold it can be compacted into a set of blocks on the index.

The format of each entry in the WAL is:

```
FLAG<uint8>,KEYSZ<uint16>,KEY<string>,CHECKSUM<uint8>
```

The `FLAG` holds boolean flags for the series key, while `KEYSZ` and `KEY` store the series key as a length-prefixed string. The `CHECKSUM` at the end is used to verify that all bytes of the entry were written to disk correctly.
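
A WAL entry in this format could be encoded and verified roughly as follows. The proposal does not name a checksum algorithm or byte order, so this sketch assumes a simple XOR over the preceding bytes and big-endian sizes purely for illustration. `EncodeWALEntry` and `DecodeWALEntry` are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// xorChecksum is a placeholder checksum (assumption): XOR of all bytes.
func xorChecksum(b []byte) byte {
	var c byte
	for _, x := range b {
		c ^= x
	}
	return c
}

// EncodeWALEntry writes FLAG, KEYSZ, KEY, CHECKSUM.
// The key must fit in a uint16 length (at most 65535 bytes).
func EncodeWALEntry(flag byte, key string) []byte {
	buf := make([]byte, 3, 3+len(key)+1)
	buf[0] = flag
	binary.BigEndian.PutUint16(buf[1:], uint16(len(key)))
	buf = append(buf, key...)
	return append(buf, xorChecksum(buf))
}

// DecodeWALEntry reads one entry from the front of b and reports how many
// bytes it consumed. A short buffer or bad checksum suggests a partial write.
func DecodeWALEntry(b []byte) (flag byte, key string, n int, err error) {
	if len(b) < 4 {
		return 0, "", 0, errors.New("entry too short")
	}
	sz := int(binary.BigEndian.Uint16(b[1:]))
	n = 3 + sz + 1
	if len(b) < n {
		return 0, "", 0, errors.New("truncated entry")
	}
	if xorChecksum(b[:n-1]) != b[n-1] {
		return 0, "", 0, errors.New("checksum mismatch")
	}
	return b[0], string(b[3 : 3+sz]), n, nil
}

func main() {
	entry := EncodeWALEntry(0, "cpu,host=a")
	flag, key, n, err := DecodeWALEntry(entry)
	fmt.Println(flag, key, n, err) // prints 0 cpu,host=a 14 <nil>
}
```

Returning the consumed length lets a reader walk a WAL file entry by entry and stop at the first entry that fails verification.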
