-
Typically, you don't necessarily need to back up data from ES, since you can always reindex from your backed-up data, but it depends on the scenario. If you need to restore your cluster from backup in a short period of time, then backing up ES data makes sense as well.

Of course, snapshots aren't the only way to back up data, so see the other mechanisms from the link above, such as Replication / Export / CopyTable / HTable API / Offline Backup of Raw HDFS Data.

As I said, backing up Elasticsearch data is probably unnecessary if you are OK with reindexing (which recreates the Elasticsearch data itself), but in some situations it can be useful to restart quickly from a backup without reindexing. If so, you can read more about Elasticsearch backups, such as snapshots, here: https://www.elastic.co/guide/en/elasticsearch/reference/current/snapshot-restore.html

Another way would be to use the Gremlin IO step (https://tinkerpop.apache.org/docs/current/reference/#io-step) directly if you can't or don't want to interact with HBase / Elasticsearch yourself. It writes the data in your chosen format, which you can later load into any other graph (not necessarily a new one). Note that the IO step stores data only, not your schema (i.e. you will need to create the same schema before you load the data from a file / input stream into your new graph). I don't know how scalable this solution is, but you can probably write your own reader / writer implementations to make it scalable.
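For the Elasticsearch snapshot route mentioned above, here is a minimal sketch of the REST calls involved, using only the Python standard library. It is not an official JanusGraph tool. The cluster URL, the repository name `janusgraph_backup`, the snapshot name, the filesystem location, and the `janusgraph_*` index pattern are all assumptions for illustration; adjust them to your setup. A shared-filesystem (`fs`) repository also requires its location to be listed under `path.repo` in `elasticsearch.yml`.

```python
import json
import urllib.request

# Assumption: a local cluster without authentication or TLS.
ES_URL = "http://localhost:9200"


def _json_request(method, path, body):
    """Build (but do not send) a JSON request against the cluster."""
    return urllib.request.Request(
        ES_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


def register_repo_request(repo, location):
    """PUT /_snapshot/<repo>: register a shared-filesystem repository."""
    body = {"type": "fs", "settings": {"location": location}}
    return _json_request("PUT", f"/_snapshot/{repo}", body)


def create_snapshot_request(repo, snapshot, indices):
    """PUT /_snapshot/<repo>/<snapshot>: snapshot the given indices."""
    body = {"indices": ",".join(indices), "include_global_state": False}
    path = f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true"
    return _json_request("PUT", path, body)


def restore_snapshot_request(repo, snapshot, indices):
    """POST /_snapshot/<repo>/<snapshot>/_restore: restore the given indices."""
    body = {"indices": ",".join(indices)}
    return _json_request("POST", f"/_snapshot/{repo}/{snapshot}/_restore", body)


def send(request):
    """Send a prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical names; the index pattern assumes JanusGraph's default prefix.
    repo, snapshot = "janusgraph_backup", "snapshot_1"
    indices = ["janusgraph_*"]
    print(send(register_repo_request(repo, "/mount/backups/janusgraph")))
    print(send(create_snapshot_request(repo, snapshot, indices)))
    # To restore later (existing indices with the same names must be
    # closed or deleted first):
    # print(send(restore_snapshot_request(repo, snapshot, indices)))
```

Keeping request construction separate from `send` lets you inspect the exact URL and body before anything touches the cluster. The snapshot only covers the index data, so it should be taken together with a consistent backup of the storage backend.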
-
Is there any official CLI or API to do that? If not, do you have any suggestions? Thanks~