This repository has been archived by the owner on Jan 3, 2023. It is now read-only.

Batch base Sync to improve large scale file async #1172

Closed
qiyuangong opened this issue Sep 27, 2017 · 0 comments

Description:
Currently, baseSync fetches the file list and then inserts the diffs into the metastore before it starts processing file sync. This procedure works well when the number of files is small. However, when there are billions of files in the src dir, the processing time before file sync begins may be unacceptable.

Basic solution/idea (process the file list asynchronously):

  1. Load the full file list into memory (very fast).
  2. Insert the diffs into the metastore in batches (slow, due to metastore and network latency).
  3. Process file diffs as soon as they are in the metastore (fast).
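
The three steps above could be sketched roughly as follows. This is only an illustration of the idea, not the project's code: `InMemoryMetastore`, `batch_base_sync`, `sync_file`, and `batch_size` are hypothetical names, the metastore is simulated with an in-memory list, and each diff is represented by a plain path string. A single worker thread starts syncing a batch as soon as that batch has been inserted, while the main thread continues inserting the next one.

```python
import queue
import threading
from itertools import islice


def batched(items, size):
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


class InMemoryMetastore:
    """Hypothetical stand-in for the real metastore (assumption)."""

    def __init__(self):
        self.rows = []
        self.insert_calls = 0

    def insert_batch(self, diffs):
        # One call here models one (slow) round trip to the metastore.
        self.rows.extend(diffs)
        self.insert_calls += 1


def batch_base_sync(paths, metastore, sync_file, batch_size=1000):
    # Step 1: hold the whole file list in memory.
    file_list = list(paths)

    ready = queue.Queue()  # batches that are already in the metastore
    synced, failed = [], []

    def worker():
        # Step 3: process diffs as soon as their batch has been inserted.
        while True:
            batch = ready.get()
            if batch is None:  # sentinel: no more batches
                return
            for diff in batch:
                try:
                    sync_file(diff)
                    synced.append(diff)
                except Exception as exc:  # keep going; record the failure
                    failed.append((diff, exc))

    thread = threading.Thread(target=worker)
    thread.start()
    try:
        # Step 2: insert diffs batch by batch instead of all at once.
        for batch in batched(file_list, batch_size):
            metastore.insert_batch(batch)
            ready.put(batch)
    finally:
        ready.put(None)
        thread.join()
    return synced, failed
```

With this shape, file sync for the first batch can begin after one batch insert rather than after the whole diff set has been written. A real implementation would also need to bound memory use and decide how to retry failed diffs, which this sketch leaves out.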
@qiyuangong qiyuangong self-assigned this Sep 27, 2017