-
-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Store blobs in subdirs with random names (#4309) #5495
base: main
Are you sure you want to change the base?
Conversation
b248bbf
to
ed33f30
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, functionality wise, thanks! imu, the PR just always adds a suffix to the stem, not just sometimes (please correct me if i summed that up wrongly :)
i think, this is a reasonable way forward to get rid of weird filenames. however, i leave code-level review to ppl who are more into rust and python.
deltachat.h still says "File name may be mangled" - but that should be "File name will be mangled", or?
wondering what is left in #4583 - i am a little confused by the the two PR :)
ideally, before this gets merged, imu UIs should check their usage of dc_msg_get_file()
and change "affected" code to BLOBDATA+dc_msg_get_filename()
or dc_msg_save_file()
- again imu, UIs can test that already now by forcing a suffix by sending the same file twice
@iequidoo can you please check/refine and file issues for that in all UIs? this is useful in any case
Exactly
Yes, forgot to move the documentation changes to this PR from the follow-up one.
It removes the original file stem at all, we don't need it as we have It's still a question what to do if we need to display a file (with random name) in some external program. This could be solved by preserving file names, but storing files in dirs having random names to avoid copying. But afaik there are plans to encrypt the whole blobstorage (especially on Desktop because there's no sandboxing), so copying can't be avoided anyway.
I think this one can be merged even before switching all UIs to the new API (though it can break sending multipart archives as @adbenitez noted), but definitely not the next PR. I can check Desktop and Android on my own, maybe also iOS, but i don't use it and never looked into its sources :) |
76c82f3
to
deca65d
Compare
So far i see the following tasks for the UIs:
(I'm going to file tasks, but need to study the apps code a bit more. DONE: deltachat/deltachat-desktop#3851, deltachat/deltachat-android#3091.)
|
deca65d
to
0610b7d
Compare
This is likely will never be merged because it breaks opening attachments in external programs (they will always have weird names). The previous approach (copying to a tmp file) failed as well, we don't want to do extra work in general and also it's difficult to maintain that UIs don't cause blob copying when it's not needed, e.g. the UI may want to get the blob path to display an image, this mustn't cause copying as the path isn't passed to an external program, so we need two separate API functions for this. So my suggestion is to save blobs to separate dirs having random names. These dirs should be garbage-collected in housekeeping (as far as i can tell this is already done in |
I think if we change blob storage, then we should already go for a solution that has support for deduplication and encryption. I think it could make sense to have a second sqlite db (maybe even with a different page size more optimised for storing files, or compressed like in https://sqlite.org/sqlar.html) Then that sqlite db could have two tables, one with filenames and hash and one with hash, size and content or sth like that. Or just an SQLite Archive (as described in https://sqlite.org/sqlar.html) archive with hashes as filenames and reusing the blobs table in the existing database. |
I still think this may be excessive on platforms with apps sandboxing like Android. And this makes useless the fs-level dedup if any (my Android just uses ext4, but maybe there are devices using Btrfs or so, though i don't see it on https://source.android.com/docs/core/architecture/android-kernel-file-system-support). Fs-level dedup is useful if files are exported outside of Delta Chat. If we want to use SQLite Archive, then maybe just switch to it? Reading the "Disadvantages" section on https://sqlite.org/sqlar.html, i don't see any critical ones. Still measurements make sense. And on some platforms we still may want to store blobs directly on fs, moreover there may be already a block device encryption or fs-level encryption. For now i suggest to just store blobs in separate dirs, it's not that huge change and should resolve all issues with file names intersection. EDIT: Done. |
0610b7d
to
8bfb6e9
Compare
8a315e8
to
d2e299d
Compare
Recently there was an accident with a chatbot that replaced its avatar set from the command line with an unrelated avatar of a contact. Both the `selfavatar` setting and the contact avatar `i` param pointed to `$BLOBDIR/avatar.png` at the time it was detected. How this happened is unclear, but it is possible that `avatar.png` was removed, unmounted or otherwise not detected by the core, and the core stored avatar received from the contact as `avatar.png`, while `selfavatar` config still pointed to `$BLOBDIR/avatar.png`. Such bugs are unavoidable even if the core itself has no bugs as we cannot rely on blobdir not reside on the faulty network filesystem, being incorrectly backed up and restored etc., so we should assume that files may be randomly removed. Then there may be dangling `$BLOBDIR/...` references in the database which may accidentally point to unrelated files, could even be an `avatar.png` file sent to the bot in private. To prevent such bugs, store blobs in blobdir subdirs with random names. Also this helps when we receive multiple attachments having the same name -- before, random filename suffixes were added to subsequent attachments, now attachments preserve their filenames which is important if they are opened in external programs.
d2e299d
to
cb1d008
Compare
Close #4309
Related discussions: #4555, #4583.
TODO: If this is merged, consider reverting #5778.