Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TrackRef = Track Location + Track ID #901

Closed
wants to merge 5 commits into from
Closed

TrackRef = Track Location + Track ID #901

wants to merge 5 commits into from

Conversation

uklotzde
Copy link
Contributor

@uklotzde uklotzde commented Mar 3, 2016

A class for referencing tracks with or without an id. It is needed for the introduction of a TrackCache that manages tracks by their TrackRef.

The new class encapsulates both the location and id of a track. Each track has a unique location, but only tracks that are stored in the library have an id. The id is created and assigned when storing the track in the database for the first time.

This class also consistently calculates the track location from QFileInfo.

@daschuer
Copy link
Member

daschuer commented Mar 4, 2016

I think I have understand what you aim, even if the new TrackRef is not used.

... but IMHO for the additional self documentation we introduce a new degree of complexity, another Object type other developers have to learn.
Can't we move the new features to the TrackInfoObject or to a track info base class?

Always calculating and storing the canonical path, might be an overhead that we don't need for all use-cases of TrackRef. I would prefer a solution that caches the canonical path only on demand.
Such caching is already done by QFileInfo, so way may want to wrap that, instead of our own caching.

};

inline
bool operator==(const TrackRef& lhs, const TrackRef& rhs) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a candidate for missus since it might fail even if we reference the same track.
I would prefer to introduce a named function for comparing.
Like isRefferencingTheSameTrack() or similar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree. This is the canonical implementation of the equality operator
for a simple value object, member by member.

If on the other hand the class name of TrackRef is misleading than we should
instead choose a different one.

On 03/04/2016 04:02 PM, Daniel Schürmann wrote:

In src/track/trackref.h
#901 (comment):

  • }
  • bool isValid() const {
  •    return hasId() || hasCanonicalLocation();
    
  • }

+private:

  • bool verifyConsistency() const;
  • QString m_location;
  • QString m_canonicalLocation;
  • TrackId m_id;
    +};

+inline
+bool operator==(const TrackRef& lhs, const TrackRef& rhs) {

This is a candidate for missus since it might fail even if we reference
the same track.
I would prefer to introduce a named function for comparing.
Like isRefferencingTheSameTrack() or similar.


Reply to this email directly or view it on GitHub
https://github.com/mixxxdj/mixxx/pull/901/files#r55039820.

...because construction from QFileInfo is costly and should be used
consciously.
@uklotzde
Copy link
Contributor Author

uklotzde commented Mar 5, 2016

TrackRef is a new class, but it encapsulates a separate aspect of tracks
that does not fit into TrackInfoObject. TrackInfoObject is still too large
and needs to be decomposed.

I don't think that we should introduce an additional layer of caching.
Storing the QFileInfo and calculating canonicalLocation() dynamically is not
an option, because this class is used as a key in the TrackCache and keys
must be immutable! It is only allowed to set the ID once.

  • Added comments to location()/canonicalLocation() about potential file
    system access
  • Moved the construction from a QFileInfo into a separate factory function
    to avoid misuse

On 03/04/2016 04:00 PM, Daniel Schürmann wrote:

I think I have understand what you aim, even if the new TrackRef is not used.

... but IMHO for the additional self documentation we introduce a new
degree of complexity, another Object type other developers have to learn.
Can't we move the new features to the TrackInfoObject or to a track info
base class?

Always calculating and storing the canonical path, might be an overhead
that we don't need for all use-cases of TrackRef. I would prefer a
solution that caches the canonical path only on demand.
Such caching is already done by QFileInfo, so way may want to wrap that,
instead of our own caching.


Reply to this email directly or view it on GitHub
#901 (comment).

@daschuer
Copy link
Member

daschuer commented Mar 5, 2016

It looks like we have a different view on this PR. Maybe it is because the
new object is still unused and I have other target issues in mind than you.

The object itself, locks nice and you are right in separating different
things to extra objects, so that is not the point.

For me, the main issue we have to solve is a concurrent file access by two
track objects. This may happen because Mixxx may crate two track objects in
different parts that reference the same file. Due to originating from
external libraries or duplicates by symlinks.

As I understand the way to solve it, is a track cache that tracks tracks by
its canonical path.
Once Mixxx accesses the file, to look up the track in this cache an respect
sync facilities.
Locking up the canonical path is a non cheap system call, so we should only
look it up for this purpose.
Mixxx should be remain to be able to instantiate two Track objects for the
same file, since this reflects the truth if thy presents two tracks in the
library with different meta data or as members of different playlist.
Of cause this is an annoying situation we should be able to handle. That
can be a part of a duplicate merge class.

The canonical path is cached in the Qfileinfo class which is part of
tracking.
From this point of view the is no need for a new trackref class?

Will this approach work, or do I miss something?

Which use case do you have in mind?
When do we need for example the compare operators?
TrackRef is a new class, but it encapsulates a separate aspect of tracks
that does not fit into TrackInfoObject. TrackInfoObject is still too large
and needs to be decomposed.

I don't think that we should introduce an additional layer of caching.
Storing the QFileInfo and calculating canonicalLocation() dynamically is
not
an option, because this class is used as a key in the TrackCache and keys
must be immutable! It is only allowed to set the ID once.

  • Added comments to location()/canonicalLocation() about potential file
    system access
  • Moved the construction from a QFileInfo into a separate factory function
    to avoid misuse

On 03/04/2016 04:00 PM, Daniel Schürmann wrote:

I think I have understand what you aim, even if the new TrackRef is not
used.

... but IMHO for the additional self documentation we introduce a new
degree of complexity, another Object type other developers have to learn.
Can't we move the new features to the TrackInfoObject or to a track info
base class?

Always calculating and storing the canonical path, might be an overhead
that we don't need for all use-cases of TrackRef. I would prefer a
solution that caches the canonical path only on demand.
Such caching is already done by QFileInfo, so way may want to wrap that,
instead of our own caching.


Reply to this email directly or view it on GitHub
#901 (comment).


Reply to this email directly or view it on GitHub
#901 (comment).

@uklotzde
Copy link
Contributor Author

uklotzde commented Mar 5, 2016

We must not instantiate two TrackInfoObjects for the same file! Otherwise
writing tags back into files is unsafe. This is the current situation.

The TrackCache will create and hand out managed pointers to tracks, both
from the library and from the file system. TrackInfoObjects MUST always be
allocated through and managed by this cache. Tracks are registered with both
their canonical path and their ID in this cache. The ID is missing for
tracks that have not been added to the library (yet).

Maybe we can delay the calculation of the canonical path until it is
actually needed, i.e. after looking up the track by ID failed. But this
optimization will only affect the interface of TrackCache, i.e. TrackRef
will no longer appear as a parameter in the lookup()/resolve() functions.
The TrackRef class itself is not affected. It is just only created
internally by TrackCache on demand.

I thought we could pull this PR out of the TrackCache stuff, because
TrackRef represents concepts that are independent of the TrackCache. But if
it causes too much confusion I will close this PR.

On 03/05/2016 05:58 PM, Daniel Schürmann wrote:

It looks like we have a different view on this PR. Maybe it is because the
new object is still unused and I have other target issues in mind than you.

The object itself, locks nice and you are right in separating different
things to extra objects, so that is not the point.

For me, the main issue we have to solve is a concurrent file access by two
track objects. This may happen because Mixxx may crate two track objects in
different parts that reference the same file. Due to originating from
external libraries or duplicates by symlinks.

As I understand the way to solve it, is a track cache that tracks tracks by
its canonical path.
Once Mixxx accesses the file, to look up the track in this cache an respect
sync facilities.
Locking up the canonical path is a non cheap system call, so we should only
look it up for this purpose.
Mixxx should be remain to be able to instantiate two Track objects for the
same file, since this reflects the truth if thy presents two tracks in the
library with different meta data or as members of different playlist.
Of cause this is an annoying situation we should be able to handle. That
can be a part of a duplicate merge class.

The canonical path is cached in the Qfileinfo class which is part of
tracking.

From this point of view the is no need for a new trackref class?

Will this approach work, or do I miss something?

Which use case do you have in mind?
When do we need for example the compare operators?
TrackRef is a new class, but it encapsulates a separate aspect of tracks
that does not fit into TrackInfoObject. TrackInfoObject is still too large
and needs to be decomposed.

I don't think that we should introduce an additional layer of caching.
Storing the QFileInfo and calculating canonicalLocation() dynamically is
not
an option, because this class is used as a key in the TrackCache and keys
must be immutable! It is only allowed to set the ID once.

  • Added comments to location()/canonicalLocation() about potential file
    system access
  • Moved the construction from a QFileInfo into a separate factory function
    to avoid misuse

On 03/04/2016 04:00 PM, Daniel Schürmann wrote:

I think I have understand what you aim, even if the new TrackRef is not
used.

... but IMHO for the additional self documentation we introduce a new
degree of complexity, another Object type other developers have to learn.
Can't we move the new features to the TrackInfoObject or to a track info
base class?

Always calculating and storing the canonical path, might be an overhead
that we don't need for all use-cases of TrackRef. I would prefer a
solution that caches the canonical path only on demand.
Such caching is already done by QFileInfo, so way may want to wrap that,
instead of our own caching.


Reply to this email directly or view it on GitHub
#901 (comment).


Reply to this email directly or view it on GitHub
#901 (comment).


Reply to this email directly or view it on GitHub
#901 (comment).

@uklotzde
Copy link
Contributor Author

uklotzde commented Mar 5, 2016

Delaying the calculation of the canonical file path was a good idea! That
even improves the interface of TrackCache :)

But as I already mentioned, the class TrackRef is unaffected.

On 03/05/2016 07:27 PM, Uwe Klotz wrote:

We must not instantiate two TrackInfoObjects for the same file! Otherwise
writing tags back into files is unsafe. This is the current situation.

The TrackCache will create and hand out managed pointers to tracks, both
from the library and from the file system. TrackInfoObjects MUST always be
allocated through and managed by this cache. Tracks are registered with
both their canonical path and their ID in this cache. The ID is missing
for tracks that have not been added to the library (yet).

Maybe we can delay the calculation of the canonical path until it is
actually needed, i.e. after looking up the track by ID failed. But this
optimization will only affect the interface of TrackCache, i.e. TrackRef
will no longer appear as a parameter in the lookup()/resolve() functions.
The TrackRef class itself is not affected. It is just only created
internally by TrackCache on demand.

I thought we could pull this PR out of the TrackCache stuff, because
TrackRef represents concepts that are independent of the TrackCache. But
if it causes too much confusion I will close this PR.

On 03/05/2016 05:58 PM, Daniel Schürmann wrote:

It looks like we have a different view on this PR. Maybe it is because the
new object is still unused and I have other target issues in mind than you.

The object itself, locks nice and you are right in separating different
things to extra objects, so that is not the point.

For me, the main issue we have to solve is a concurrent file access by two
track objects. This may happen because Mixxx may crate two track objects in
different parts that reference the same file. Due to originating from
external libraries or duplicates by symlinks.

As I understand the way to solve it, is a track cache that tracks tracks by
its canonical path.
Once Mixxx accesses the file, to look up the track in this cache an respect
sync facilities.
Locking up the canonical path is a non cheap system call, so we should only
look it up for this purpose.
Mixxx should be remain to be able to instantiate two Track objects for the
same file, since this reflects the truth if thy presents two tracks in the
library with different meta data or as members of different playlist.
Of cause this is an annoying situation we should be able to handle. That
can be a part of a duplicate merge class.

The canonical path is cached in the Qfileinfo class which is part of
tracking.

From this point of view the is no need for a new trackref class?

Will this approach work, or do I miss something?

Which use case do you have in mind?
When do we need for example the compare operators?
TrackRef is a new class, but it encapsulates a separate aspect of tracks
that does not fit into TrackInfoObject. TrackInfoObject is still too large
and needs to be decomposed.

I don't think that we should introduce an additional layer of caching.
Storing the QFileInfo and calculating canonicalLocation() dynamically is
not
an option, because this class is used as a key in the TrackCache and keys
must be immutable! It is only allowed to set the ID once.

  • Added comments to location()/canonicalLocation() about potential file
    system access
  • Moved the construction from a QFileInfo into a separate factory function
    to avoid misuse

On 03/04/2016 04:00 PM, Daniel Schürmann wrote:

I think I have understand what you aim, even if the new TrackRef is not
used.

... but IMHO for the additional self documentation we introduce a new
degree of complexity, another Object type other developers have to learn.
Can't we move the new features to the TrackInfoObject or to a track info
base class?

Always calculating and storing the canonical path, might be an overhead
that we don't need for all use-cases of TrackRef. I would prefer a
solution that caches the canonical path only on demand.
Such caching is already done by QFileInfo, so way may want to wrap that,
instead of our own caching.


Reply to this email directly or view it on GitHub
#901 (comment).


Reply to this email directly or view it on GitHub
#901 (comment).


Reply to this email directly or view it on GitHub
#901 (comment).

@daschuer
Copy link
Member

daschuer commented Mar 5, 2016

I am starting to understand your approach.

We must not instantiate two TrackInfoObjects for the same file! Otherwise writing tags back into files is unsafe. This is the current situation.

This is a true requirement from your point of view. The underlying root requirement is:
We must not open a file twice.

Even if we have only a single track object there is no guarantee. But the advantage of your approach is that it allows us to set-up locking mechanism quite easy.
So lets head to a solution that combines the best of both ideas.

Currently a TrackInfo if build around the unique database ID. If we now prevent Mixxx from instantiating a TrackInfo object from a different database ID of a file that is only referenced by an other Track, it will change a lot of the meaning of a track info object.
This will move the "remove duplicates" task to every instantiation of a track info object.
I am afraid this will leads to various issues.

For example in SetlogFeature::slotJoinWithPrevious the TrackInfo object is only required to set the played counter, which has nothing to do with the track file on the HDD.
If we return a track ID from an other track info object which already reference the same file. We will write the wrong DB entry.

This is the source of my Objections: If we do the deduplicate it at the Track level, we will have problems to graceful handle duplicates, which we will have in some legacy databases. Of cause, duplicates are not desired, but they exists and we need keep a way to write the Database entries of a specific Track ID in the database even if we have already an other track object around which referenced the same file but from and on other DB ID.

How do you think about this idea:
Keep two caches:

  • The track ID cache (as we have already)
  • An canonical path cache, required to allow only one open file operation at a time.

@uklotzde
Copy link
Contributor Author

uklotzde commented Mar 6, 2016

I'll close this PR for now. The commits are still available in my "WriteAudioTags" branch/PR.

I have to think about the implications of an alternative solution or how to handle inconsistencies/duplicates in existing libraries.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants