feat(server): near-duplicate detection #8228
Can I help with anything regarding this PR? I am happy to work on the UI.
I cleaned it up so the backend part is essentially good to go (might need to adjust the response if you want them to be grouped by duplicates and not just sorted). The UI... has a lot of room for improvement haha. It'd be great if you could help with that 😄
In terms of UI, would it make sense if photo stacks were automatically created for near-duplicate photos? It's something that the App-Which-Must-Not-Be-Named introduced a while ago and I personally find it very useful.
The idea right now is to have the duplicates displayed in a dedicated page where there are options to convert them to stacks or deduplicate based on some criteria, but otherwise treat them as separate assets until the user elects to do this. Auto-stacking would be very useful and convenient, though, so a later PR to add this functionality would be nice.
Is it possible to have some kind of notification so that the user can interact with stacking? It's nice when images are stacked automatically, but I find that this sometimes occurs erroneously and I'd like to at least know when a stack has been made.
A notification in the web UI would be straightforward. But auto-stacking would be a later addition, so discussion on that is a bit out of scope for this PR.
The duplicate threshold will be exposed in the admin settings. I was debating between defaulting to 0.02 or 0.03, so maybe 0.03 is the better default after all.
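To make the threshold discussion concrete, here is a minimal sketch of how a distance cutoff like 0.02 or 0.03 changes what counts as a near-duplicate. It assumes the threshold is a maximum cosine distance between image embeddings; the thread does not spell out the metric, and `cosineDistance` and `isNearDuplicate` are hypothetical names, not functions from this PR.

```typescript
// Hypothetical helper: cosine distance between two embedding vectors.
// Assumption: the duplicate threshold is a maximum cosine distance.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("cosine distance is undefined for zero vectors");
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Two assets are treated as near-duplicates when their embeddings are
// at most `maxDistance` apart. 0.03 mirrors the default floated above.
function isNearDuplicate(a: number[], b: number[], maxDistance = 0.03): boolean {
  return cosineDistance(a, b) <= maxDistance;
}
```

With this definition, a pair of embeddings at distance 0.025 would be flagged with a threshold of 0.03 but not with 0.02, so the lower value is the stricter one.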
This
And this will be separate. Once this is implemented, these features can be worked on, so they will basically be built off of the code in this feature.
I would suggest taking some inspiration from Samsung Gallery for the UI. When you hit delete duplicates, it selects all but one of each of the duplicates (so if there are 3 it will select 2). I'm pretty sure it takes date modified or something, and if they are different resolutions or file sizes it selects the lower resolution or file size. Then you can hit delete with all of the duplicates selected. I think implementing something similar wouldn't be too difficult, and it doesn't even have to select for you, but the side-by-side view is the most important thing. Having a button to select duplicates which could prefer selecting the lower resolution/file size would be an added bonus. FYI the Testing Immich Album has photos I copied over for testing immich and I only select this album in the mobile app to protect my photos from bugs etc., so they are duplicates.
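The "select all but one" behavior described above could be sketched roughly as follows. This is only an illustration of the idea, not code from this PR or from Samsung Gallery: the `DuplicateCandidate` shape and the `selectForDeletion` name are hypothetical, and the ranking (pixel count first, file size as a tiebreaker) is one assumed choice among several.

```typescript
// Hypothetical shape for an asset within one duplicate group.
interface DuplicateCandidate {
  id: string;
  width: number;
  height: number;
  fileSizeInBytes: number;
}

// Returns the assets that would be pre-selected for deletion:
// everything except the best-ranked asset in the group.
// Assumed ranking: more pixels wins, then larger file size.
function selectForDeletion(group: DuplicateCandidate[]): DuplicateCandidate[] {
  if (group.length < 2) {
    return []; // nothing to deduplicate
  }
  // Copy before sorting so the caller's array is left untouched.
  const ranked = [...group].sort((a, b) => {
    const byPixels = b.width * b.height - a.width * a.height;
    return byPixels !== 0 ? byPixels : b.fileSizeInBytes - a.fileSizeInBytes;
  });
  return ranked.slice(1); // keep ranked[0], select the rest
}
```

For a group of three assets this pre-selects two, matching the behavior described, and the user could still change the selection in a side-by-side view before deleting.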
Hey there, amazing work on this PR! Just a thought - how about we get rid of those JPEG duplicates when we've got the original HEIC files? What's your take on this?
If the files are basically identical then this should pick that up. If you're suggesting that it automatically prefer HEIC, that's for the UI which is coming eventually.
There will be an option to deduplicate based on resolution, file size, etc. That will get you most of the way there, except in cases where the HEIF is smaller than JPEG purely because it's a more efficient format. Doing it based on format sounds iffy. You can have a high resolution, high quality JPEG that looks similar to a poor quality HEIF, not to mention that we'd need an arbitrary ranking for which format is better. We can always expand on this in the future, possibly with a measure of compression artifacts and selecting the image with the least artifacts. But for the first cut, it's better to keep it simple.
@mertalev what do you think? I haven't done much but at least you can get out of there.
Nice! I'll reduce the scope of this PR to just be the backend changes so we can do the UI separately.
After removing the UI changes, this PR is ready for review. The current behavior is that the feature is disabled by default and not exposed to the user except through the config file. The only blocker is that a seemingly unrelated E2E test is failing.
Another compliment about this functionality. Since it's AI based, it picks up 2 different pictures taken directly after one another at slightly different angles or distances. So when I take multiple pictures just in case one of them is blurry but then later have a bunch of extras, this should be the solution. Ignore the sidebar, I tapped the button which scrolls down and adds to the screenshot and it did that.
In addition to hashing, the exif spec contains a field for |
I think that would be saved for later, for the UI. Alex and I discussed UI development and Alex will develop most of it, but I'll try and start on it this week. We are thinking of a Utilities page which has the deduplication page. I am taking note of what you said. I'm not sure how much logic will go into the deduplication page, but that seems like a good idea.
* Add Duplicate Detection Flag
* Use Duplicate Detection Flag
* Attempt Fixes for Failing Checks
* lower minimum `maxDistance`
* fix tests

---------

Co-authored-by: mertalev <101130780+mertalev@users.noreply.github.com>
Code looks good to me! Awesome :)
Man @mertalev, I'm sorry for bringing this up so late, but I just checked #1968 and someone there made a really good point. They were asking about the community deduplication project, but it made me realize this could also be an issue for the implementation in this PR:
This is actually a really good idea, but how would something like this be implemented? There would have to be a blacklist of some kind and preferably a way to remove items from the blacklist. If deleting files on the web could delete them on mobile this wouldn't be a problem. But I'm worried that in the current state of immich, deleting a duplicate on the server that was backed up from the phone will result in it being reuploaded from the phone.
I would prefer to see this moved to a `DuplicateController` and `DuplicateService` and served on the `/duplicates` route, but since it isn't used in the UI yet we can do this in a follow-up pull request.
Imo this is a separate ongoing issue we need to address separately as its impact is wider than just duplicates.
Yeah, I guess so.
I believe currently the mobile app will not reupload files that it has already uploaded in the past (but I could be wrong). If you reinstalled the app or something like that it still would though.
Even then, you'd have the duplicates sitting on your phone and if you reinstalled the app or switched phones and copied everything over then all of a sudden the photos reappear. But that's a different problem because it doesn't just affect duplicates.
We have a plan to sync deletions to synced devices, and we will likely be delivering that before stable.
I'm looking forward to this feature, it's going to be a highlight of immich.
immich v1.106.1 was just released, the release includes this feature.
Description
This PR adds a new job to detect duplicate assets and aggregate them with a new `duplicateId` column. This PR only implements the backend for duplicate detection. It does not expose the results in the UI or take any actions relating to the assets: this is left for future work.

The data model is such that each `(duplicateId, assetId)` pair uniquely identifies a duplicate asset and each `duplicateId` can have many associated assets.

To do:
`duplicateIds` exist among the found duplicates

Implements #1968
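As a rough illustration of the data model described above, the sketch below groups asset rows by their `duplicateId`, which is also what a "grouped by duplicates" response would need. The `AssetRow` shape and the `groupByDuplicateId` name are hypothetical, and treating assets without duplicates as having a `null` `duplicateId` is an assumption, not something this description states.

```typescript
// Hypothetical row shape: one asset and the duplicate group it belongs to.
// Assumption: assets with no detected duplicates carry a null duplicateId.
interface AssetRow {
  assetId: string;
  duplicateId: string | null;
}

// Collects asset IDs per duplicateId, so each duplicateId maps to
// the (possibly many) assets associated with it.
function groupByDuplicateId(rows: AssetRow[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const row of rows) {
    if (row.duplicateId === null) {
      continue; // not part of any duplicate group
    }
    const assetIds = groups.get(row.duplicateId) ?? [];
    assetIds.push(row.assetId);
    groups.set(row.duplicateId, assetIds);
  }
  return groups;
}
```

Each key in the returned map corresponds to one set of near-duplicates that a later UI could show side by side.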
How Has This Been Tested?
Tested by running the new job on all assets through the job panel and inspecting logs to confirm that some assets have duplicates.
Tested that the duplicates displayed in the web view are actually near-duplicates.
Tested that changing the duplicate threshold changes the strictness of the results.