How to back up and recover JanusGraph with large data (using HBase and ES)? #2745

bidiudiu · 2021-08-04T12:50:02Z


Is there any official CLI or API to do that? If not, do you have any suggestions? Thanks~

porunov · 2021-08-04T13:20:58Z

Maintainer

Typically, you don't need to back up data from ES, because you can always reindex it from your backed-up storage data, but it depends on the scenario. If you need to restore your cluster from a backup in a shorter period of time, then backing up ES data makes sense as well.
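If ES is not backed up, the mixed indexes can be rebuilt from the restored HBase data with JanusGraph's reindex operation. A minimal sketch, assuming the index definitions are already present in the restored schema: the Python helper below only builds the Groovy script text (it does not connect to anything). The index names `byNameMixed` and `byAgeMixed` are hypothetical placeholders, and the script assumes a Gremlin Console session in which `graph` is the opened JanusGraph instance.

```python
from typing import List


def build_reindex_script(index_names: List[str]) -> str:
    """Build a Groovy script that reindexes each named JanusGraph graph index.

    Intended for a Gremlin Console session where `graph` is the opened
    JanusGraph instance. Nothing is executed here; only text is built.
    """
    if not index_names:
        raise ValueError("at least one index name is required")
    lines: List[str] = []
    for name in index_names:
        # One management transaction per index; .get() blocks until the
        # REINDEX job has finished, then the transaction is committed.
        lines.append("mgmt = graph.openManagement()")
        lines.append(
            f"mgmt.updateIndex(mgmt.getGraphIndex('{name}'), SchemaAction.REINDEX).get()"
        )
        lines.append("mgmt.commit()")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical index names: replace them with the mixed indexes in your schema.
    print(build_reindex_script(["byNameMixed", "byAgeMixed"]))
```

For very large graphs the in-process reindex may be slow; JanusGraph also documents a MapReduce-based reindex (`MapReduceIndexManagement`), which may fit an HBase/Hadoop deployment better.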
To back up data, you can use the out-of-the-box tools of the underlying storage / index backends. I haven't worked with HBase much, but it has a Snapshots concept similar to the one Cassandra has.
The first link I found that explains how to back up HBase using Snapshots is: https://blog.cloudera.com/approaches-to-backup-and-disaster-recovery-in-hbase/#snapshots
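A minimal sketch of the Snapshots approach, assuming JanusGraph uses its default HBase table name `janusgraph` (the `storage.hbase.table` setting); the backup destination `hdfs://backup-cluster/hbase` is a hypothetical placeholder. The helpers only build commands: the `snapshot`, `disable`, `restore_snapshot` and `enable` lines are meant for `hbase shell`, while `ExportSnapshot` is run from the operating-system shell to copy a snapshot to another cluster.

```python
from datetime import datetime, timezone
from typing import List, Optional


def snapshot_name(table: str, when: Optional[datetime] = None) -> str:
    """Timestamped snapshot name, e.g. janusgraph_snapshot_20210804_125002."""
    when = when or datetime.now(timezone.utc)
    return f"{table}_snapshot_{when:%Y%m%d_%H%M%S}"


def backup_shell_commands(table: str, snap: str) -> List[str]:
    """hbase shell command that takes an online snapshot of the table."""
    return [f"snapshot '{table}', '{snap}'"]


def export_snapshot_argv(snap: str, copy_to: str) -> List[str]:
    """OS-level command that copies a snapshot to another HBase root dir."""
    return [
        "hbase",
        "org.apache.hadoop.hbase.snapshot.ExportSnapshot",
        "-snapshot", snap,
        "-copy-to", copy_to,
    ]


def restore_shell_commands(table: str, snap: str) -> List[str]:
    """hbase shell commands that roll the table back to a snapshot.

    The table must be disabled before restore_snapshot, so stop the
    JanusGraph instances that write to it first.
    """
    return [f"disable '{table}'", f"restore_snapshot '{snap}'", f"enable '{table}'"]


if __name__ == "__main__":
    name = snapshot_name("janusgraph")
    print("\n".join(backup_shell_commands("janusgraph", name)))
    # Hypothetical backup destination.
    print(" ".join(export_snapshot_argv(name, "hdfs://backup-cluster/hbase")))
    print("\n".join(restore_shell_commands("janusgraph", name)))
```

The argv list from `export_snapshot_argv` can be passed to `subprocess.run` as-is, which avoids shell-quoting issues.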

Of course, Snapshots aren't the only way to back up data, so see the other mechanisms described at the link above, such as Replication / Export / CopyTable / HTable API / Offline Backup of Raw HDFS Data.

As I said, backing up Elasticsearch data is probably unnecessary if you are OK with reindexing (which recreates the Elasticsearch data itself), but in some situations it could be useful to restart quickly from a backup without reindexing. If so, you can read more about Elasticsearch backups, such as Snapshots, here: https://www.elastic.co/guide/en/elasticsearch/reference/current/snapshot-restore.html
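If you do snapshot Elasticsearch, the snapshot and restore REST endpoints from the guide above can be called directly. A minimal sketch using only the Python standard library, assuming a node at `http://localhost:9200`, a shared-filesystem (`fs`) repository whose location is listed under `path.repo` in `elasticsearch.yml` on every node, and JanusGraph's default index-name prefix `janusgraph`. The repository name, snapshot name and path are hypothetical.

```python
import json
import urllib.request
from typing import Any, Optional


def _json_request(url: str, method: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON request for the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        url, data=data, method=method, headers={"Content-Type": "application/json"}
    )


def register_fs_repository(es_url: str, repo: str, location: str) -> urllib.request.Request:
    # The location must be listed under path.repo on every node.
    body = {"type": "fs", "settings": {"location": location}}
    return _json_request(f"{es_url}/_snapshot/{repo}", "PUT", body)


def create_snapshot(es_url: str, repo: str, snap: str,
                    indices: str = "janusgraph*") -> urllib.request.Request:
    # Assumes JanusGraph's default index-name prefix "janusgraph".
    url = f"{es_url}/_snapshot/{repo}/{snap}?wait_for_completion=true"
    return _json_request(url, "PUT", {"indices": indices})


def restore_snapshot(es_url: str, repo: str, snap: str,
                     indices: str = "janusgraph*") -> urllib.request.Request:
    # Restoring over an open index with the same name fails:
    # close or delete the existing JanusGraph indices first.
    url = f"{es_url}/_snapshot/{repo}/{snap}/_restore"
    return _json_request(url, "POST", {"indices": indices})


def send(req: urllib.request.Request) -> Any:
    """Send a request and decode the JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    es = "http://localhost:9200"  # hypothetical node address
    send(register_fs_repository(es, "janusgraph_backup", "/mnt/es-backups"))
    print(send(create_snapshot(es, "janusgraph_backup", "snapshot_1")))
```

Keeping request construction separate from `send` makes it easy to inspect or log the exact calls before running them against a live cluster.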

Another way would be to use the Gremlin IO step (https://tinkerpop.apache.org/docs/current/reference/#io-step) directly if you can't or don't want to interact with HBase / Elasticsearch yourself. This produces the data in your chosen format, which you can later restore into any other graph (not necessarily a new one). Note that the IO step stores only the data, not your schema (i.e. you will need to create the same schema in the target graph before you load the data from a file / input stream). I don't know how scalable this solution is, but you could probably write your own reader / writer implementations to make it scalable.
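The IO step route can be sketched as two one-line Gremlin scripts: one run against the source graph, and one run against the target graph after its schema has been recreated. The Python helper below only builds the script text; the file path is a hypothetical example, and TinkerPop chooses the format from the file extension (`.kryo` for Gryo, `.json` for GraphSON, `.xml` for GraphML). When submitted to a Gremlin Server, the path is resolved on the server host.

```python
from typing import Tuple

# TinkerPop's io() step picks the reader/writer from the file extension.
SUPPORTED_EXTENSIONS = (".kryo", ".json", ".xml")


def build_io_scripts(path: str) -> Tuple[str, str]:
    """Return (export_script, import_script) for the Gremlin io() step.

    Run the export script against the source graph and the import script
    against the target graph, only after the target schema exists:
    io() copies vertices, edges and properties, not the JanusGraph schema.
    """
    if not path.endswith(SUPPORTED_EXTENSIONS):
        raise ValueError(f"unsupported extension for io(): {path}")
    if "'" in path:
        raise ValueError("path must not contain single quotes")
    export_script = f"g.io('{path}').write().iterate()"
    import_script = f"g.io('{path}').read().iterate()"
    return export_script, import_script


if __name__ == "__main__":
    # Hypothetical backup location on the Gremlin Server / Console host.
    for script in build_io_scripts("/backup/janusgraph.kryo"):
        print(script)
```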


bidiudiu Aug 4, 2021
Author

Thanks for your help! I'll try the first way (since the data is quite large, the second, full-export approach may not be very suitable).
