Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

duplicate titles when using Symlinks in Library #8009

Open
mixxxbot opened this issue Aug 22, 2022 · 24 comments
Open

duplicate titles when using Symlinks in Library #8009

mixxxbot opened this issue Aug 22, 2022 · 24 comments

Comments

@mixxxbot
Copy link
Collaborator

Reported by: mk42
Date: 2015-05-05T10:39:29Z
Status: Confirmed
Importance: Medium
Launchpad Issue: lp1451778


I'm using build branch-1.12 r5411.

I have symlinked /home/user/Music/ to /mnt/data/Music.
I've set /home/user/Music/SomeFolder as my LibraryRoot.

Some titles that I play in Mixxx 1.12 will be inserted a second time to my library with the path /mnt/data/Music/...
I can't at the moment reproduce when excactly this happens, but I now have multiple files two times in my library, once with path /home/... and once with /mnt/...
the one from /mnt/... is the newer one and mostly has some additional metadata (as album-artist) set. Maybe this could hint in the right direction.

Sometimes I noticed that titles will be analyzed when loading to a deck even if they have bmp and key attached. This could be issued by that.

Cheers,
Markus

@mixxxbot
Copy link
Collaborator Author

Commented by: uklotzde
Date: 2015-05-05T16:56:24Z


I see many invocations of QFileInfo::canonicalFilePath() in the code base which replaces symbolic links in the file path. But QFileInfo::absoluteFilePath() is also used which preserves symbolic links.

What is missing is a common function that normalizes file paths consistently instead of calling different Qt functions that are scattered all around.

@mixxxbot
Copy link
Collaborator Author

Commented by: uklotzde
Date: 2015-05-05T17:30:20Z


Using QFileInfo::canonicalFilePath() is basically correct, but some post-processing is needed. The prefix of the canonical file path has to be matched against the canonical file path of the configured music directories. If a match is found it has to be replaced by QFileInfo::absoluteFilePath() of that specific music directory which might contain a symbolic link.

Note: Music directories must be referenced by their absolute file path! Their canonical file path may vary, i.e. if the target directory of a symbolic link is changed in the file system.

Precondition: The canonical file paths of all music directories must be disjunct, i.e. none is the prefix of any other. This restriction should be checked not only when adding new music directories, but also on every startup of Mixxx! The canonical file paths may vary even while Mixxx is running, but that is definitely out of scope.

@mixxxbot
Copy link
Collaborator Author

Commented by: uklotzde
Date: 2015-05-05T17:38:43Z


Please note that my previous proposal does not account for symbolic inside/underneath a music directory. This would require even more post-processing, not only of the prefix (= reference of music directory = absolute path of music directory) but also of the suffix (= relative path inside the referenced music directory).

@mixxxbot
Copy link
Collaborator Author

Commented by: uklotzde
Date: 2015-05-05T17:43:49Z


Simple rule: QFileInfo::canonicalFilePath() is needed for normalization purposes. But since canonical file paths may vary (-> symbolic links) they should never be stored persistently as a reference to any file or directory!

@mixxxbot
Copy link
Collaborator Author

Commented by: uklotzde
Date: 2015-05-07T05:56:10Z


I will try to find a solution that is both robust and future proof. But definitely out of scope for 1.12.

@mixxxbot
Copy link
Collaborator Author

Commented by: uklotzde
Date: 2015-05-08T09:40:46Z


I was able to reproduce this behavior! It happens when you open a track from the library through the browse view. This track is then stored in the library without symlinks in the file path, i.e. the "canonical" file path.

A quick fix for 1.12 might be possible.

@mixxxbot
Copy link
Collaborator Author

Commented by: uklotzde
Date: 2015-05-08T22:15:22Z


I've found a pragmatic solution:

  • basically use the absolute path of files and directories
  • normalize it by using QDir::cleanPath()
  • use the canonical path only for security tokens

This strategy preserves symlinks, while redundant parts of the path are removed. The corresponding static functions will be located in MFile and MDir. All invocations of QDir::absolutePath(), QFileInfo::absolutePath(), and QFileInfo::absoluteFilePath() will be replace by MDir::normalizedPath(), MFile::normalizedPath(), and MFile::normalizedFilePath() respectively.

PR will follow shortly. I've prepared a branch for 1.12, but rebasing it on master would also be possible.

@mixxxbot
Copy link
Collaborator Author

Commented by: uklotzde
Date: 2015-05-10T20:40:33Z


Quick fix for 1.12: #581

@mixxxbot
Copy link
Collaborator Author

Commented by: uklotzde
Date: 2015-05-11T12:07:59Z


A short note about the scope and limitations of this fix:

There is no efficient way to reliably detect if an arbitrary file is already contained in the library if the path may contain symbolic links. The same file can be referenced by any number of paths. It is not possible to determine all paths to a single file.

For the decision if two files are the same their canonical paths have to be compared. But the canonical path of a file may change over time and cannot be stored and indexed in the database as I already mentioned. A full scan of the whole library to determine the current canonical path of each file for comparison is definitely not an option.

If you browse and open a file through the same path that is referenced in the library, no duplicates should be created with this fix applied. Browsing and opening the file through a different path will still add duplicate entries to the library.

@mixxxbot
Copy link
Collaborator Author

Commented by: daschuer
Date: 2015-05-11T12:53:07Z


The only thing we can do is to compare files with equal or Artist and Title.

Like this:

  • User picks a track from outside the library path
  • Find tracks with same Artist / Title
  • compare canonical path
  • Use the library path for the folowing operations if the canonical path matches.

@mixxxbot
Copy link
Collaborator Author

Commented by: mk42
Date: 2015-05-12T08:24:57Z


Sorry for not giving feedback till now, didn't get notifications from Launchpad.

Thanks for your fix. But I think there must be another way then using the "Browse-View" since I never used it with mixxx.

Is there any possibility to clean up my library now?

@mixxxbot
Copy link
Collaborator Author

Commented by: daschuer
Date: 2015-05-12T11:14:56Z


Did you use drag and drop from a file manager?

It looks like your original issue is still open.

To clean up your DB can Hide the duplicates and then Purge them from the hidden track view.
Unfortunately your will loose the history of these tracks.

If you wish to start with a fresh database, you may rename or delete
~/.mixxx/mixxxdb.sqlite

@mixxxbot
Copy link
Collaborator Author

Commented by: mk42
Date: 2015-05-13T09:13:30Z


No, I don't do that normally - means I can't remember that i've done that.
I think its something else.

I will try to reproduce this.

@mixxxbot
Copy link
Collaborator Author

Commented by: mk42
Date: 2015-05-13T09:17:48Z


I got it. I looked over the files and tryed to remember.

It happens when you drag and drop such a file from Library to a Deck (maybe in some other circumstances as well). Then a file with the path outside of Music-Directory (which is a symlinked path) will be added to the library

@mixxxbot
Copy link
Collaborator Author

Commented by: mk42
Date: 2015-05-13T09:32:54Z


Happens also when dragging to preview-Deck

@mixxxbot
Copy link
Collaborator Author

Commented by: uklotzde
Date: 2015-05-17T12:20:43Z


I experienced this issue one again when analyzing a track that has not yet been played. Unfortunately I was not able to reproduce this error.

File system access indeed needs to be reworked by avoiding to directly invoke filePath()/absoluteFilePath()/relativeFilePath() from the various Qt classes.

@mixxxbot
Copy link
Collaborator Author

Commented by: mk42
Date: 2015-05-19T08:42:39Z


Maybe a dumb question, but isn't double-clicking and context-menu -> load to deck X doing it 'right'?
When I use those ways to load tracks i won't get duplicate tracks.
Wouldn't it be possible to use the way tracks (and paths) are loaded for now until it is reworked as you said?

@mixxxbot
Copy link
Collaborator Author

Commented by: mk42
Date: 2015-05-19T18:11:45Z


I've just tested it.

For me replacing canonicalFilePath with absoluteFilePath in the onDrop functions works. Should I file a pull request for those changes?

@mixxxbot
Copy link
Collaborator Author

Commented by: uklotzde
Date: 2015-05-19T20:05:26Z


I was unsure if I should change the onDrop functions. Of course, another pull request for those functions (-> 1.12) makes sense, if you've tested it.

@mixxxbot
Copy link
Collaborator Author

Commented by: uklotzde
Date: 2015-12-17T09:00:38Z


Another fix for the duplicate tracks issue:
#814

@mixxxbot
Copy link
Collaborator Author

Commented by: jclaveau
Date: 2016-03-03T17:32:59Z


There is question I think about for a while now :
I have a lot of duplicates in my library due to the fact that my music is stored on an external hdd and the default location of Ubuntu can change whether or not it's plugged during boot.

More, I often move my tracks from a folder to another as I store them by genre and split my folders in subgenres when they are too big.

I need their location to be well tidied to use them on the CDJs of my friends with a USB pen.

So my question is : why don't we use Chromaprint's fingerprints to find duplicates and ?

I don't know if the idea has already been discussed or not. Tell me if I should file a new bug.

@mixxxbot
Copy link
Collaborator Author

Commented by: uklotzde
Date: 2016-03-04T09:35:07Z


If you store your music on external storage (like I do) use symbolic links for your music library. This ensures that your files always appear at the same location, independent of where the volume is actually mounted. Example: I use <home directory>/Music/Collection

Each track in Mixxx has a "location" that references the corresponding file. Track locations are normalized by QFileInfo::absoluteFilePath(). This value that is stored in the library persistently.

Track locations might contain symbolic links. Different locations can reference the same file, this is what you noticed and might lead to duplicates. At runtime we additionally use QFileInfo::canonicalFilePath() to determine the actual file path. But this path can change dynamically (depending on mount points) so we must not store it in the library.

See also my latest PR that encapsulates those concepts in TrackRef:
#901

The chroma fingerprint (as any other "hash" value) is not guaranteed to be unique. We should think about using it for relocating moved files and finding duplicates, nice idea. But it is not a replacement for the track's location that we need to store in the library.

@mixxxbot
Copy link
Collaborator Author

Commented by: rryan
Date: 2018-09-20T17:51:31Z


Uwe, is this fixed?

@mixxxbot
Copy link
Collaborator Author

Commented by: uklotzde
Date: 2020-08-17T22:32:58Z


Related: Since canonical file paths might not be unique within the library we need to account for this fact when caching tracks.

#3027

@mixxxbot mixxxbot transferred this issue from another repository Aug 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant