Move metadata from info.json files to DB #148

hkalexling · 2021-01-14T06:43:25Z

Background

Currently, all metadata except tags is stored in the info.json file in each title folder. The data includes reading progress, sorting options, custom cover images, and custom display names. The reason we use info.json is explained in #37 (comment).

The Issue

- Reading and writing the info.json files is much slower than using a proper DB
- Some data, like tags and thumbnails, can only be stored in the DB, so when the library is renamed or moved, that data is lost (see [Bug Report] Lost Tags after library root directory change #146)
- Some users might want to keep their library unchanged

Proposed Solution

We can have two tables in DB:

==========
TITLES
----------
id
path
signature
==========

==========
TITLE_INFO
----------
id
tags
progress
... other info of the title
==========

In the TITLES table, signature is a (mostly) unique value for a title. We can calculate it using the following procedure:

1. Get all entries in the title (a list of cbz/cbr files)
2. Get the file sizes of the entries as an array and sort it
3. Join the array as a long string
4. Calculate the CRC32 checksum of the string, and use that as the signature of the title

If anything in the title changes, the checksum would likely change as well.
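The procedure above could be sketched roughly as follows. This is a minimal Python illustration, not Mango's actual code: the function name `title_signature`, the extension filter, and the comma separator used when joining the sizes are all assumptions, since the proposal does not specify them.

```python
import zlib
from pathlib import Path

# Assumption: entries are recognised by extension; the proposal only says
# "a list of cbz/cbr files".
ENTRY_EXTENSIONS = {".cbz", ".cbr"}


def title_signature(title_dir):
    """Return a CRC32 signature built from the sorted entry file sizes."""
    sizes = sorted(
        p.stat().st_size
        for p in Path(title_dir).iterdir()
        if p.is_file() and p.suffix.lower() in ENTRY_EXTENSIONS
    )
    # Assumption: a comma separator, so that sizes [1, 23] and [12, 3]
    # do not collapse into the same string.
    joined = ",".join(str(size) for size in sizes)
    return zlib.crc32(joined.encode("ascii"))
```

Because only file sizes feed the checksum, renaming an entry leaves the signature untouched, while adding, removing, or resizing an entry will usually change it.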

On library scan, if a title's path and signature match a row in the TITLES table, we assign the corresponding id to the title, and it can then retrieve its information from the TITLE_INFO table. If a title's signature matches the DB record, but the path doesn't (or the other way around), we still use the id, and we update the unmatched field to the correct value. In this way, even if a title is moved or renamed, we can still match it in the DB because its signature is still the same.
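To make the matching rules concrete, here is a hedged sketch using Python's built-in `sqlite3` module. The lowercase table and column names mirror the TITLES table above, but the column types, the `create_schema` and `resolve_title_id` helpers, and the order in which the fallbacks are tried are assumptions made for illustration only.

```python
import sqlite3


def create_schema(conn):
    # Assumed column types; the proposal only lists the column names.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS titles ("
        "id INTEGER PRIMARY KEY, path TEXT, signature INTEGER)"
    )


def resolve_title_id(conn, path, signature):
    """Return the id of a scanned title, updating or inserting its row."""
    cur = conn.cursor()

    # Case 1: both path and signature match an existing row.
    row = cur.execute(
        "SELECT id FROM titles WHERE path = ? AND signature = ?",
        (path, signature),
    ).fetchone()
    if row:
        return row[0]

    # Case 2: signature matches but the path does not (title moved or renamed).
    row = cur.execute(
        "SELECT id FROM titles WHERE signature = ?", (signature,)
    ).fetchone()
    if row:
        cur.execute("UPDATE titles SET path = ? WHERE id = ?", (path, row[0]))
        conn.commit()
        return row[0]

    # Case 3: path matches but the signature does not (contents changed).
    row = cur.execute("SELECT id FROM titles WHERE path = ?", (path,)).fetchone()
    if row:
        cur.execute(
            "UPDATE titles SET signature = ? WHERE id = ?", (signature, row[0])
        )
        conn.commit()
        return row[0]

    # Case 4: nothing matches, so treat it as a new title.
    cur.execute(
        "INSERT INTO titles (path, signature) VALUES (?, ?)", (path, signature)
    )
    conn.commit()
    return cur.lastrowid
```

Since the signature is only "(mostly) unique", a real implementation would also need a policy for two titles that share a signature; this sketch simply takes the first matching row.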

Conclusion

This issue serves as an RFC, so any comments and suggestions are welcome!


Leeingnyo · 2021-01-25T22:18:11Z

Sounds good to me.

Proposed fault tolerance when matching the same titles:

- directories
  - may be moved or renamed
  - nested contents must not change (other than being renamed) when the directory is moved or renamed
- files: may be moved or renamed

There might be other requirements, but I think this tolerance is enough in practice, since people usually move or rename entire root titles. Above all, this prevents thumbnails from being generated repeatedly! 😄

By the way, are the calculated signatures cached automatically?

hkalexling · 2021-01-26T12:16:35Z

@Leeingnyo Thanks for the feedback! I took some time to implement this (not pushed yet), and I am leaning towards simply using the inode numbers as the signatures for both titles and entries. On most file systems, the inode number of a file/folder is preserved when the file is moved, renamed, or even edited.

Some operations that would cause the inode number to change:

Reboot/remount on some file systems
Replaced with a copied file
Moved to a different device

But since we are also comparing the file paths, we won't lose information as long as the above changes do not happen together with a file/folder rename, with no library scan in between.

The difference between using the inode number and the original plan mentioned above is that the inode number stays the same even when the file/folder content changes, but I think this is not an issue.

The inode number and filesize/modification date are metadata, and reading them is very fast, so I don't think we need to cache the signatures. I tested it a bit, and the scanning time does not appear to be much longer. But I am not sure how this would affect the scanning performance for network-mounted drives (see #118), so I would need to test this a bit before releasing the changes.
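For illustration, reading an inode number in Python is a single `os.stat` call. The helper name `inode_signature` is hypothetical, and this sketch covers only a single file or directory, not the full scheme for titles.

```python
import os


def inode_signature(path):
    """Return the inode number the OS reports for a file or directory."""
    return os.stat(path).st_ino
```

On typical POSIX file systems, calling this before and after a rename on the same device, or after appending to the file in place, returns the same value, which is the property the signature relies on.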

Again, feel free to let me know what you think!

Leeingnyo · 2021-01-26T16:44:14Z

Oh, I see! Then it has more generous fault tolerance. Great!
You mean that the signature of a title or entry is simply the inode number of its directory or file (read directly from that single node, with no recursion), right? Not as written in the dev branch?

hkalexling · 2021-01-27T03:32:11Z

Oh, I should have made it clearer that for titles we do generate the signatures recursively: https://github.com/hkalexling/Mango/blob/5779d225f6afece178aa5a8785f34045e84a4253/src/util/signature.cr#L10-L51

hkalexling · 2022-03-19T13:26:17Z

Update:

With the new metadata and library caching features in v0.24.0, Mango can handle large libraries pretty well, so we don't desperately need this feature any more. I am keeping this open so maybe we can revisit it someday.

afknst · 2022-04-18T14:28:20Z

Please have a look at #295

hkalexling · 2022-04-22T08:55:03Z

Yeah, good point: the JSON files are less resilient than the DB. Let me see what we can do.

hkalexling added enhancement New feature or request help wanted Extra attention is needed labels Jan 14, 2021

hkalexling pinned this issue Jan 14, 2021

hkalexling added the rfc Request for Comments label Jan 27, 2021

hkalexling mentioned this issue Jan 29, 2021

v0.20.0 #156

Merged

hkalexling mentioned this issue Mar 2, 2021

[Feature Request] MangaDex in API Improvements #170

Open

hkalexling unpinned this issue Apr 9, 2021

hkalexling mentioned this issue May 4, 2021

Support for large libraries #186

Closed

hkalexling mentioned this issue Aug 5, 2021

[Question] speed up the loading of the home #209

Closed

hkalexling removed the help wanted Extra attention is needed label Mar 19, 2022

hkalexling changed the title ~~[Plan/RFC] Move metadata from info.json files to DB~~ Move metadata from info.json files to DB Mar 19, 2022
