Better support for hardlink detection #534
Open
The previous approach on Linux scanned from the filesystem root for the given file: /home/user/files/video_file1.mp4 => scanned / for all files.
With the 30-second timeout, this generally never completed. And even when it did complete, it meant scanning the entire filesystem for every single possible duplicate.
Now the scan will occur multiple times but only at the base of each included directory:
IncludeDir: /home/user1/files/
IncludeDir: /home/user2/videos/
Scans => /home/user1/files/ and /home/user2/videos/
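For illustration, a minimal Python sketch of this per-directory scan (the function name and structure are hypothetical, not the project's actual code): given a target file, walk only the base of each included directory and collect paths that share the target's (st_dev, st_ino) pair.

```python
import os

def find_hardlinks(target_path, include_dirs):
    """Find paths under the include dirs that are hard links to target_path.

    Walks only the base of each included directory instead of the
    filesystem root, mirroring the approach described above.
    """
    st = os.stat(target_path)
    key = (st.st_dev, st.st_ino)  # inode is only unique per filesystem
    target_abs = os.path.abspath(target_path)
    links = []
    for base in include_dirs:
        for root, _dirs, files in os.walk(base):
            for name in files:
                path = os.path.join(root, name)
                try:
                    s = os.stat(path)
                except OSError:
                    continue  # file vanished or is unreadable; skip it
                if (s.st_dev, s.st_ino) == key and os.path.abspath(path) != target_abs:
                    links.append(path)
    return links
```

An early exit when `st.st_nlink == 1` would also be possible, since such a file has no other hard links at all.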
Ideally, I think the hard-link approach on Linux should simply collect every file's inode number (st_ino) at the start of file enumeration and use those values for comparison, without even invoking ffmpeg or any other heuristics, since sharing an inode number is the definition of a hard link on Linux. But that approach requires modifying the FileEntry struct, which involves protobuf attributes, and I'm not really familiar with that.
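The enumeration-time idea above could be sketched like this in Python (hypothetical code, not the project's; note it keys on (st_dev, st_ino) rather than st_ino alone, since inode numbers are only unique within a single filesystem):

```python
import os
from collections import defaultdict

def group_hardlinks(include_dirs):
    """Group enumerated files by (st_dev, st_ino).

    Any group containing more than one path is a set of hard links by
    definition, so no ffmpeg or content heuristics are needed. A real
    implementation would store this key on each FileEntry during
    enumeration instead of walking the directories a second time.
    """
    groups = defaultdict(list)
    for base in include_dirs:
        for root, _dirs, files in os.walk(base):
            for name in files:
                path = os.path.join(root, name)
                try:
                    s = os.stat(path)
                except OSError:
                    continue  # file vanished or is unreadable; skip it
                groups[(s.st_dev, s.st_ino)].append(path)
    return {key: paths for key, paths in groups.items() if len(paths) > 1}
```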