Write multiple copies when uploading file for durability improvement #25
Comments
I think the general idea of having a version of uploading a file that lets you know when it has been replicated to meet the replication policy rather than just after the first copy is uploaded is a good one. You could implement it like you suggested or just have a version of the existing upload methods which take one extra optional parameter indicating whether you want a "reliable" or "fully replicated" upload. I guess that's up to the mogilefs-server developers and their API but I can definitely see value in having something like this. |
There are several possible strategies for uploading a file to multiple destinations. I think the two basic ones are (1) a trivial single-thread upload and (2) a multi-thread producer-consumer upload. I wrote some PoC code, posted here, to benchmark these two strategies. I ran the code in an environment with multiple hosts and disks. Since the network is expected to be the bottleneck in this environment, I ran the code over a 1 Gbps network and a 2 × 1 Gbps = 2 Gbps network to see how performance (latency/bandwidth) varies in each setup. The results are described below. Note that the "original upload" shown below uploads only a single copy and acts as a control group.
Personally, I think the first version should adopt the single-thread upload for its simplicity and acceptable performance overhead. It is designed for users who value durability over performance. |
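For readers following along, the two strategies compared above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the PoC code from this thread: `MultiCopy`, `singleThread` and `producerConsumer` are hypothetical names, and plain `OutputStream`s stand in for the connections to storage nodes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper (not part of moji): copies one source to several destinations.
final class MultiCopy {
    private static final byte[] EOF = new byte[0]; // sentinel, compared by reference

    // Strategy 1: a single thread reads a chunk and writes it to each destination in turn.
    static void singleThread(InputStream in, List<? extends OutputStream> outs) throws IOException {
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            for (OutputStream out : outs) {
                out.write(buf, 0, n);
            }
        }
        for (OutputStream out : outs) {
            out.flush();
        }
    }

    // Strategy 2: the calling thread produces chunks; one consumer thread per
    // destination drains its own bounded queue and writes the chunks.
    // Simplification: a failing consumer is not detected until the end, so a real
    // implementation would need to unblock the producer on write errors.
    static void producerConsumer(InputStream in, List<? extends OutputStream> outs)
            throws IOException, InterruptedException, ExecutionException {
        if (outs.isEmpty()) {
            throw new IllegalArgumentException("at least one destination is required");
        }
        ExecutorService pool = Executors.newFixedThreadPool(outs.size());
        List<BlockingQueue<byte[]>> queues = new ArrayList<>();
        List<Future<?>> writers = new ArrayList<>();
        try {
            for (OutputStream out : outs) {
                BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(16);
                queues.add(queue);
                writers.add(pool.submit(() -> {
                    byte[] chunk;
                    while ((chunk = queue.take()) != EOF) {
                        out.write(chunk);
                    }
                    out.flush();
                    return null;
                }));
            }
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                byte[] chunk = Arrays.copyOf(buf, n); // consumers only read the shared chunk
                for (BlockingQueue<byte[]> queue : queues) {
                    queue.put(chunk);
                }
            }
            for (BlockingQueue<byte[]> queue : queues) {
                queue.put(EOF);
            }
            for (Future<?> writer : writers) {
                writer.get(); // rethrows any write failure as ExecutionException
            }
        } finally {
            pool.shutdownNow();
        }
    }
}
```

In the single-thread variant the slowest destination paces every chunk, while the producer-consumer variant lets destinations proceed independently up to the queue capacity, at the cost of extra threads and buffer copies.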
API design
The original API design lets the storage class be assigned in
Then, when a file is uploaded as in the following examples, it will have multiple copies before the upload finishes.
The file will be lost only in the rare scenario that the two disks holding its replicas both fail. P.S. I am posting the progress here so that anyone interested can join the design discussion. |
I have a general API design comment; personally I'd avoid a boolean flag to
Elliot. On 5 October 2016 at 09:21, hrchu notifications@github.com wrote:
|
@teabot I agree that the usage of a boolean flag is hard to understand without an IDE. I like the enum parameter approach, with values WriteStrategy.DEFAULT and WriteStrategy.DURABLE. I think that not using a number in the enum values prevents users from confusing this with storageClass (which also implies the number of replicas that must be retained). Thank you for the suggestion (and for this project!) |
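A minimal sketch of what the enum approach discussed above might look like. Only `WriteStrategy.DEFAULT` and `WriteStrategy.DURABLE` come from this thread; `UploadRequest`, its fields and `waitsForReplicas` are hypothetical names used for illustration and are not moji's actual API.

```java
// The two values proposed in this thread; the comments reflect the behaviour described here.
enum WriteStrategy {
    DEFAULT, // upload returns once the first copy is stored
    DURABLE  // upload returns only after multiple copies are stored
}

// Hypothetical request object showing how the enum could replace a boolean flag.
final class UploadRequest {
    final String key;
    final String storageClass;
    final WriteStrategy strategy;

    // Existing-style call: callers that do not care keep the current behaviour.
    UploadRequest(String key, String storageClass) {
        this(key, storageClass, WriteStrategy.DEFAULT);
    }

    UploadRequest(String key, String storageClass, WriteStrategy strategy) {
        if (strategy == null) {
            throw new IllegalArgumentException("strategy must not be null");
        }
        this.key = key;
        this.storageClass = storageClass;
        this.strategy = strategy;
    }

    // True when the upload should block until the extra copies are written.
    boolean waitsForReplicas() {
        return strategy == WriteStrategy.DURABLE;
    }
}
```

At the call site, `new UploadRequest("k", "images", WriteStrategy.DURABLE)` reads more clearly than a trailing `true`, and because the enum values carry no numbers they are less likely to be mistaken for a replica count.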
+1 for the enum suggestion with the values proposed by @hrchu |
I have finished the first version in the branch enhance/durableWrite. I think it should not be merged before mogilefs/MogileFS-Server#39 is accepted. Since the mogilefs team is inactive, I am going to use it in my own production environment first. |
Sounds good and agree we shouldn't merge it here until it gets supported upstream (or in an upstream fork that is available for end users). |
The branch enhance/durableWrite has disappeared, weird. |
Hmmm, not sure how that happened. I still have a version of that branch checked out locally; I can do a diff against master and send you a patch if that would help? A lot has changed in master since then, so it would take quite a bit of work to get it mergeable, but at least it would be a start. |
Hi guys,
To prevent the risk of losing a file when the host/disk holding the first copy fails before replication is done, I am doing some work to address the issue.
I have sent a PR to mogilefs-server: mogilefs/MogileFS-Server#39
I will try to add support for this new feature to moji later.
Any ideas about this?