Skip to content

Latest commit

 

History

History
335 lines (255 loc) · 13.1 KB

synthetic-size.md

File metadata and controls

335 lines (255 loc) · 13.1 KB

Synthetic size

Neon storage has copy-on-write branching, which makes it difficult to answer the question "how large is my database"? To give one reasonable answer, we calculate synthetic size for a project.

The calculation is called "synthetic", because it is based purely on the user-visible logical size, which is the size that you would see on a standalone PostgreSQL installation, and the amount of WAL, which is also the same as what you'd see on a standalone PostgreSQL, for the same set of updates.

The synthetic size does not depend on the actual physical size consumed in the storage, or implementation details of the Neon storage like garbage collection, compaction and compression. There is a strong correlation between the physical size and the synthetic size, but the synthetic size is designed to be independent of the implementation details, so that any improvements we make in the storage system simply reduce our COGS. And vice versa: any bugs or bad implementation where we keep more data than we would need to, do not change the synthetic size or incur any costs to the user.

The synthetic size is calculated for the whole project. It is not straighforward to attribute size to individual branches. See "What is the size of an individual branch?" for discussion on those difficulties.

The synthetic size is designed to:

  • Take into account the copy-on-write nature of the storage. For example, if you create a branch, it doesn't immediately add anything to the synthetic size. It starts to affect the synthetic size only as it diverges from the parent branch.

  • Be independent of any implementation details of the storage, like garbage collection, remote storage, or compression.

Terms & assumptions

  • logical size is the size of a branch at a given point in time. It's the total size of all tables in all databases, as you see with "\l+" in psql for example, plus the Postgres SLRUs and some small amount of metadata. NOTE that currently, Neon does not include the SLRUs and metadata in the logical size. See comment to get_current_logical_size_non_incremental().

  • a "point in time" is defined as an LSN value. You can convert a timestamp to an LSN, but the storage internally works with LSNs.

  • PITR horizon can be set per-branch.

  • PITR horizon can be set as a time interval, e.g. 5 days or hours, or as amount of WAL, in bytes. If it's given as a time interval, it's converted to an LSN for the calculation.

  • PITR horizon can be set to 0, if you don't want to retain any history.

Calculation

Inputs to the calculation are:

  • logical size of the database at different points in time,
  • amount of WAL generated, and
  • the PITR horizon settings

The synthetic size is based on an idealistic model of the storage system, where we pretend that the storage consists of two things:

  • snapshots, containing a full snapshot of the database, at a given point in time, and
  • WAL.

In the simple case that the project contains just one branch (main), and a fixed PITR horizon, the synthetic size is the sum of:

  • the logical size of the branch at the beginning of the PITR horizon, i.e. at the oldest point that you can still recover to, and
  • the size of the WAL covering the PITR horizon.

The snapshot allows you to recover to the beginning of the PITR horizon, and the WAL allows you to recover from that point to any point within the horizon.

                             WAL
   -----------------------#########>
                          ^
                       snapshot

Legend:
  ##### PITR horizon. This is the region that you can still access
        with Point-in-time query and you can still create branches
        from.
  ----- history that has fallen out of the PITR horizon, and can no
        longer be accessed

NOTE: This is not how the storage system actually works! The actual implementation is also based on snapshots and WAL, but the snapshots are taken for individual database pages and ranges of pages rather than the whole database, and it is much more complicated. This model is a reasonable approximation, however, to make the synthetic size a useful proxy for the actual storage consumption.

Example: Data is INSERTed

For example, let's assume that your database contained 10 GB of data at the beginning of the PITR horizon, and you have since then inserted 5 GB of additional data into it. The additional insertions of 5 GB of data consume roughly 5 GB of WAL. In that case, the synthetic size is:

10 GB (snapshot) + 5 GB (WAL) = 15 GB

If you now set the PITR horizon on the project to 0, so that no historical data is retained, then the beginning PITR horizon would be at the end of the branch, so the size of the snapshot would be calculated at the end of the branch, after the insertions. Then the synthetic size is:

15 GB (snapshot) + 0 GB (WAL) = 15 GB.

In this case, the synthetic size is the same, regardless of the PITR horizon, because all the history consists of inserts. The newly inserted data takes up the same amount of space, whether it's stored as part of the logical snapshot, or as WAL. (*)

(*) This is a rough approximation. In reality, the WAL contains headers and other overhead, and on the other hand, the logical snapshot includes empty space on pages, so the size of insertions in WAL can be smaller or greater than the size of the final table after the insertions. But in most cases, it's in the same ballpark.

Example: Data is DELETEd

Let's look at another example:

Let's start again with a database that contains 10 GB of data. Then, you DELETE 5 GB of the data, and run VACUUM to free up the space, so that the logical size of the database is now only 5 GB.

Let's assume that the WAL for the deletions and the vacuum take up 100 MB of space. In that case, the synthetic size of the project is:

10 GB (snapshot) + 100 MB (WAL) = 10.1 GB

This is much larger than the logical size of the database after the deletions (5 GB). That's because the system still needs to retain the deleted data, because it's still accessible to queries and branching in the PITR window.

If you now set the PITR horizon to 0 or just wait for time to pass so that the data falls out of the PITR horizon, making the deleted data inaccessible, the synthetic size shrinks:

5 GB (snapshot) + 0 GB (WAL) = 5 GB

Branching

Things get more complicated with branching. Branches in Neon are copy-on-write, which is also reflected in the synthetic size.

When you create a branch, it doesn't immediately change the synthetic size at all. The branch point is within the PITR horizon, and all the data needed to recover to that point in time needs to be retained anyway.

However, if you make modifications on the branch, the system needs to keep the WAL of those modifications. The WAL is included in the synthetic size.

Example: branch and INSERT

Let's assume that you again start with a 10 GB database. On the main branch, you insert 2 GB of data. Then you create a branch at that point, and insert another 3 GB of data on the main branch, and 1 GB of data on the child branch

  child                 +#####>
                        |
                        |    WAL
  main    ---------###############>
                   ^
                snapshot

In this case, the synthetic size consists of:

  • the snapshot at the beginning of the PITR horizon (10 GB)
  • the WAL on the main branch (2 GB + 3 GB = 5 GB)
  • the WAL on the child branch (1 GB)

Total: 16 GB

Diverging branches

If there is only a small amount of changes in the database on the different branches, as in the previous example, the synthetic size consists of a snapshot before the branch point, containing all the shared data, and the WAL on both branches. However, if the branches diverge a lot, it is more efficient to store a separate snapshot of branches.

Example: diverging branches

You start with a 10 GB database. You insert 5 GB of data on the main branch. Then you create a branch, and immediately delete all the data on the child branch and insert 5 GB of new data to it. Then you do the same on the main branch. Let's assume that the PITR horizon requires keeping the last 1 GB of WAL on the both branches.

                              snapshot
                                  v     WAL
  child                 +---------##############>
                        |
                        |
  main     -------------+---------##############>
                                  ^     WAL
                              snapshot

In this case, the synthetic size consists of:

  • snapshot at the beginning of the PITR horizon on the main branch (4 GB)
  • WAL on the main branch (1 GB)
  • snapshot at the beginning of the PITR horizon on the child branch (4 GB)
  • last 1 GB of WAL on the child branch (1 GB)

Total: 10 GB

The alternative way to store this would be to take only one snapshot at the beginning of branch point, and keep all the WAL on both branches. However, the size with that method would be larger, as it would require one 10 GB snapshot, and 5 GB + 5 GB of WAL. It depends on the amount of changes (WAL) on both branches, and the logical size at the branch point, which method would result in a smaller synthetic size. On each branch point, the system performs the calculation with both methods, and uses the method that is cheaper, i.e. the one that results in a smaller synthetic size.

One way to think about this is that when you create a branch, it starts out as a thin branch that only stores the WAL since the branch point. As you modify it, and the amount of WAL grows, at some point it becomes cheaper to store a completely new snapshot of the branch and truncate the WAL.

What is the size of an individual branch?

Synthetic size is calculated for the whole project, and includes all branches. There is no such thing as the size of a branch, because it is not straighforward to attribute the parts of size to individual branches.

Example: attributing size to branches

(copied from neondatabase#2884 (comment))

Imagine that you create two branches, A and B, at the same point from main branch, and do a couple of small updates on both branches. Then six months pass, and during those six months the data on the main branch churns over completely multiple times. The retention period is, say 1 month.

                      +------> A
                     /
--------------------*-------------------------------> main
                     \
                      +--------> B

In that situation, the synthetic tenant size would be calculated based on a "logical snapshot" at the branch point, that is, the logical size of the database at that point. Plus the WAL on branches A and B. Let's say that the snapshot size is 10 GB, and the WAL is 1 MB on both branches A and B. So the total synthetic storage size is 10002 MB. (Let's ignore the main branch for now, that would be just added to the sum)

How would you break that down per branch? I can think of three different ways to do it, and all of them have their own problems:

Subtraction method

For each branch, calculate how much smaller the total synthetic size would be, if that branch didn't exist. In other words, how much would you save if you dropped the branch. With this method, the size of branches A and B is 1 MB.

With this method, the 10 GB shared logical snapshot is not included for A nor B. So the size of all branches is not equal to the total synthetic size of the tenant. If you drop branch A, you save 1 MB as you'd expect, but also the size of B suddenly jumps from 1 MB to 10001 MB, which might feel surprising.

Division method

Divide the common parts evenly across all branches that need them. With this method, the size of branches A and B would be 5001 MB.

With this method, the sum of all branches adds up to the total synthetic size. But it's surprising in other ways: if you drop branch A, you might think that you save 5001 MB, but in reality you only save 1 MB, and the size of branch B suddenly grows from 5001 to 10001 MB.

Addition method

For each branch, include all the snapshots and WAL that it depends on, even if some of them are shared by other branches. With this method, the size of branches A and B would be 10001 MB.

The surprise with this method is that the sum of all the branches is larger than the total synthetic size. And if you drop branch A, the total synthetic size doesn't fall by 10001 MB as you might think.

Alternatives

A sort of cop-out method would be to show the whole tree of branches graphically, and for each section of WAL or logical snapshot, display the size of that section. You can then see which branches depend on which sections, which sections are shared etc. That would be good to have in the UI anyway.

Or perhaps calculate per-branch numbers using the subtraction method, and in addition to that, one more number for "shared size" that includes all the data that is needed by more than one branch.

Which is the right method?

The bottom line is that it's not straightforward to attribute the synthetic size to individual branches. There are things we can do, and all of those methods are pretty straightforward to implement, but they all have their own problems. What makes sense depends a lot on what you want to do with the number, what question you are trying to answer.