This repository has been archived by the owner on Sep 6, 2019. It is now read-only.

Use hmac value to compare cloud and local copy of files #29

Open
kaloyan-raev opened this issue Nov 13, 2017 · 6 comments

@kaloyan-raev
Contributor

The bridge stores an hmac value for each file. It is returned in the JSON response for each file when listing the bucket's files, e.g.

"hmac": {
  "value": "3c3cd36d8046484141c41c49d9395b7a6bc4be6ddf6fb2936b0bc3524f026182d6f18a5233203cca2cbe07a4422852b3558afcbd240939a98dde6b9750516e99",
  "type": "sha512"
}

We need to check if we can take advantage of this value. It might be useful in cases where a file is present both on the cloud and in the local storage, but not in the sync DB. In such a case, the sync app will detect a conflict. If it can compare the hmac of the local and cloud copies, it can detect that the copies are equal and mark the file as synced without reporting a conflict.

A valid user scenario would be a user who wants to recreate the sync DB due to some bug in the app. The user would just delete the sync DB and run the sync app again. The app would recreate the sync DB without transferring any files or reporting conflicts, just by comparing the hmac values.

@kaloyan-raev
Contributor Author

I asked about the hmac value in the Storj community chat: https://community.storj.io/channel/dev?msg=56nQn8TXDRtLQk6qn

@kaloyan-raev
Contributor Author

This would work if libstorj provided a function that can calculate the HMAC for the local file using the same logic it uses when uploading files.

I opened an issue in the libstorj project: storj-archived/libstorj#401

@kaloyan-raev
Contributor Author

I pushed a related PR to libstorj: storj-archived/libstorj#402

This is not the PR that provides the API function, requested in storj-archived/libstorj#401, but one for properly populating the HMAC available in the bridge to the file metadata returned to the client.

@jkawamoto
Member

If libstorj decides not to provide a function that computes only the HMAC, we can compute HMACs ourselves with a library such as The Ripple Java Library. However, we should keep in mind that files have to be encrypted before the HMAC can be computed in either case. Maybe we should compare file sizes first and compare HMACs only afterwards, so that we reduce the computational cost.

@kaloyan-raev
Contributor Author

The built-in crypto functions in Java are enough for computing an HMAC. The problem is that the HMAC value stored in the bridge is not a simple HMAC-SHA512 checksum of the local file. It is computed with a more complex formula that looks like this: HMACSHA512([RIPEMD160(SHA256(shard_1_data))|RIPEMD160(SHA256(shard_2_data))|...])

So, if we want to do it in pure Java, we need to replicate the whole sharding process done by libstorj. This would be complex to implement and maintain. Hence, it's better to have it provided by libstorj, which has already implemented that logic.
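To make the shape of that formula concrete, here is a rough Java sketch. It is not libstorj's implementation: the HMAC key, the way shard hashes are concatenated (raw bytes here), and the class and method names are all assumptions for illustration. As far as I know, RIPEMD160 is not available from the default JDK providers, so the sketch takes it as a parameter (for example backed by a third-party crypto library). It also assumes the shards passed in are already the encrypted shard data.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.List;
import java.util.function.UnaryOperator;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper: mirrors the structure of the formula only,
// not libstorj's exact key derivation or byte layout.
final class FileHmacSketch {

    /** Lower-case hex encoding, the same format as the bridge's "value" field. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    /**
     * Computes HMAC-SHA512 over the sequence ripemd160(SHA-256(shard)).
     *
     * @param key       HMAC key (assumption: supplied by the caller)
     * @param shards    encrypted shard contents, in upload order
     * @param ripemd160 RIPEMD160 implementation supplied by the caller
     */
    static String fileHmac(byte[] key, List<byte[]> shards,
                           UnaryOperator<byte[]> ripemd160) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            Mac mac = Mac.getInstance("HmacSHA512");
            mac.init(new SecretKeySpec(key, "HmacSHA512"));
            for (byte[] shard : shards) {
                // digest() also resets sha256 for the next shard
                byte[] shardHash = ripemd160.apply(sha256.digest(shard));
                // Assumption: shard hashes are fed to the HMAC as raw bytes
                mac.update(shardHash);
            }
            return toHex(mac.doFinal());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Required algorithm unavailable", e);
        }
    }
}
```

Even in this simplified form, the result only matches the bridge's value if the shard boundaries, encryption, and key are reproduced exactly, which is the maintenance burden described above.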

Regarding the computational cost, I totally agree. Calculating the HMAC, especially for big files, is a very expensive operation. So, we should first check the file size and compare HMAC only if the file size of the two copies is equal.
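A minimal sketch of that check order, assuming the size reported by the bridge is directly comparable to the local file size. `CopyComparator`, `sameContent`, and the `localHmac` supplier are hypothetical names; the supplier stands in for whatever ends up computing the HMAC of the local file (a libstorj function or our own code).

```java
import java.util.function.Supplier;

// Hypothetical helper illustrating "size first, HMAC second".
final class CopyComparator {

    /**
     * Returns true if the local and cloud copies can be treated as equal.
     * The expensive HMAC computation runs only when the sizes match.
     *
     * @param localSize size of the local file in bytes
     * @param cloudSize size reported by the bridge in bytes
     * @param cloudHmac hex-encoded hmac value returned by the bridge
     * @param localHmac lazily computes the hex-encoded HMAC of the local file
     */
    static boolean sameContent(long localSize, long cloudSize,
                               String cloudHmac, Supplier<String> localHmac) {
        if (localSize != cloudSize) {
            return false; // cheap check failed, skip the HMAC entirely
        }
        // Hex strings are compared case-insensitively
        return cloudHmac.equalsIgnoreCase(localHmac.get());
    }
}
```

Passing the HMAC as a supplier keeps the comparison independent of how the value is eventually computed.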

@jkawamoto
Member

First of all, I agree that we should keep asking libstorj to export a function that computes the HMAC. What I meant is that, in the worst-case scenario, we would need to replicate the whole sharding process, and the library mentioned above might help with that. The process is complicated, but exporting such a function from libstorj does not seem easy either. So we should have a plan B.
