Snapshoted indices data (disk space & more) #23479

Closed
eirc opened this issue Mar 3, 2017 · 4 comments
Labels
discuss :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs

Comments


eirc commented Mar 3, 2017

Describe the feature:

What I really need is a way to tell how much disk space is used up by specific indices on snapshots.

Currently the only way to get statistics on the disk space used by snapshots is to query the snapshot repository's filesystem directly. This is even harder on 5.0, where the repository stores each index in a folder named by an ID rather than by the index name.

Two possible ideas on how to fix this:

  • Provide a map of snapshot index IDs to index names in some API and let users query the filesystem themselves
  • Keep the mapping internal and instead expose the disk space used (and possibly other data too) directly in some API, essentially keeping users away from the snapshot repository filesystem
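To illustrate the first idea, here is a minimal Python sketch of what a user could do once an ID-to-name map is available. It assumes a shared filesystem repository laid out as `<repo_root>/indices/<id>/`; that layout, the `id_to_name` dictionary, and both function names are assumptions made for illustration, not part of any existing API.

```python
import os
from pathlib import Path
from typing import Dict


def dir_size_bytes(path: Path) -> int:
    """Sum the sizes of all regular files under path, recursively."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += (Path(root) / name).stat().st_size
    return total


def index_sizes(repo_root: Path, id_to_name: Dict[str, str]) -> Dict[str, int]:
    """Map each index name to the bytes used by its repository directory.

    Assumes (hypothetically) that each index lives in <repo_root>/indices/<id>/.
    IDs without a matching directory are reported as 0 bytes.
    """
    sizes = {}
    for index_id, name in id_to_name.items():
        index_dir = repo_root / "indices" / index_id
        sizes[name] = dir_size_bytes(index_dir) if index_dir.is_dir() else 0
    return sizes
```

For example, `index_sizes(Path("/mnt/backups"), {"jbD8iHU_Toy1kNRGXhS6sA": "nicelogs-2017.03.02"})` would return a one-entry dictionary keyed by the index name; the repository path here is made up.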

Both approaches could be surfaced, for example, in the _snapshot API by changing the indices array to contain objects with per-index data:

/_snapshot/{repository}/_all
{
  "snapshots" : [
    {
      "snapshot" : "mysnapshot-20170302141731",
      "uuid" : "qB7R3TIkQIKcKrDkPIlsLg",
      ...
      "indices" : [
        {
          "name": "nicelogs-2017.03.02",
          "id_on_disk": "jbD8iHU_Toy1kNRGXhS6sA",
          "size_on_disk_bytes": "1000000",
          "more_data_here": "???"
        },
        ...
      ],
      ...
    },
    ...
  ]
}

This is just an example; I know nothing about ES's naming conventions for the API.

My question that prompted the feature request

imotov added the :Distributed Coordination/Snapshot/Restore, discuss, and >feature labels and removed the >feature label on Mar 20, 2017
Contributor

imotov commented Mar 20, 2017

This might be tricky. Files that belong to individual indices are shared between multiple snapshots, so providing stats at the per-index, per-snapshot level could be misleading. We might be able to provide overall per-index stats for the entire repository. @abeyad what do you think?


abeyad commented Mar 20, 2017

Index disk space per snapshot will not make much sense, as @imotov mentioned, because index data that stays the same is shared across snapshots. So deleting a snapshot will not necessarily free all the disk space used by the indices in that particular snapshot, because another snapshot may contain the same index and share much of the same data.

> We might be able to provide overall per index stats for the entire repository.

Yes, this would be much more feasible to implement.
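As a rough illustration of the sharing problem, the toy Python model below treats a repository as snapshots that reference named files per index. The data structure, function names, and sizes are all invented for this sketch and do not reflect how Elasticsearch actually stores or tracks snapshot files.

```python
from typing import Dict

# Toy model: snapshot name -> index name -> {file name: size in bytes}.
Repo = Dict[str, Dict[str, Dict[str, int]]]


def bytes_freed_by_deleting(repo: Repo, snapshot: str) -> int:
    """Bytes reclaimed by deleting one snapshot.

    Only files that no other snapshot references are actually freed.
    """
    still_referenced = {
        (index, file_name)
        for other, indices in repo.items() if other != snapshot
        for index, files in indices.items()
        for file_name in files
    }
    return sum(
        size
        for index, files in repo[snapshot].items()
        for file_name, size in files.items()
        if (index, file_name) not in still_referenced
    )


def repository_bytes_per_index(repo: Repo) -> Dict[str, int]:
    """Repository-wide bytes per index, counting each shared file once."""
    unique: Dict[str, Dict[str, int]] = {}
    for indices in repo.values():
        for index, files in indices.items():
            unique.setdefault(index, {}).update(files)
    return {index: sum(files.values()) for index, files in unique.items()}


repo = {
    "snap-1": {"logs-a": {"seg_1": 100, "seg_2": 50}},
    "snap-2": {"logs-a": {"seg_1": 100, "seg_2": 50, "seg_3": 25}},
}
print(bytes_freed_by_deleting(repo, "snap-1"))  # 0
print(bytes_freed_by_deleting(repo, "snap-2"))  # 25
print(repository_bytes_per_index(repo))         # {'logs-a': 175}
```

In this model a naive per-snapshot figure would report 150 bytes for `logs-a` in `snap-1`, yet deleting `snap-1` frees nothing, which is why the repository-wide per-index total is the more meaningful number.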

Author

eirc commented Mar 21, 2017

That's all true; it would make more sense to get global (repository-wide) index stats. For my logging case this is exactly what I need to answer the question of how much disk space retaining past days' logs costs. Finding which snapshots need to be deleted to fully delete a specific past day can already be done with the current API.
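A minimal Python sketch of that last step, assuming the response of `/_snapshot/{repository}/_all` has already been fetched and parsed into a dictionary, and that each snapshot's `indices` field is a plain list of index names (the current shape, not the one proposed above). The function name and the sample data are made up for illustration.

```python
from typing import Any, Dict, List


def snapshots_containing(response: Dict[str, Any], index_name: str) -> List[str]:
    """Return the names of all snapshots whose indices list includes index_name."""
    return [
        snap["snapshot"]
        for snap in response.get("snapshots", [])
        if index_name in snap.get("indices", [])
    ]


# Sample data shaped like a parsed /_snapshot/{repository}/_all response.
response = {
    "snapshots": [
        {"snapshot": "mysnapshot-20170301", "indices": ["nicelogs-2017.03.01"]},
        {
            "snapshot": "mysnapshot-20170302",
            "indices": ["nicelogs-2017.03.01", "nicelogs-2017.03.02"],
        },
    ]
}
print(snapshots_containing(response, "nicelogs-2017.03.01"))
# ['mysnapshot-20170301', 'mysnapshot-20170302']
```

Every snapshot in the returned list would have to be deleted before the data for that day is fully gone from the repository.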

Member

tlrx commented Mar 26, 2018

We talked about this today with @ywelsch and this issue is similar to #18543, so I'm closing this.

tlrx closed this as completed on Mar 26, 2018