Upload API instability (fileparse and extract_thumbnail) #1127

Open
psilabs-dev opened this issue Dec 11, 2024 · 2 comments

@psilabs-dev
Contributor

psilabs-dev commented Dec 11, 2024

OS: Ubuntu/Linux, Docker.

When uploading an archive to the server via the API, there seems to be a ~10-20% chance that an unhandled exception occurs in the call chain extract_thumbnail -> get_filelist -> is_pdf. Wrapping is_pdf in a try/catch and printing the ID shows the following:

[Archive] [error] Failed to parse file: "" - error: fileparse(): need a valid pathname at /home/koyomi/lanraragi/script/../lib/LANraragi/Utils/Archive.pm line 43.

For some reason, the ID of this archive is empty. However, the archive has been uploaded to the server: if I try to upload it manually, I get a duplicate archive error, and running docker exec into the container shows that the archive has been uploaded in its entirety.

Checking Redis, there is no trace of this archive, so the archive is uploaded but not registered in the server/database. Running server-side cleanup such as rescan archive, clean search, etc. doesn't seem to fix the issue. I will probably be checking the database logic for now.

Perhaps more importantly, if such an archive is physically present in LRR, rescanning for new archives does not pick it up. This results in "dead" archives occupying the contents directory that LRR can't read, while blocking manual or API uploads of the same archive on duplicate grounds.
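For anyone trying to find these "dead" archives, here is a rough sketch of how the files on disk could be compared against the IDs the database knows about. It assumes the archive ID is the SHA-1 hex digest of the first 512,000 bytes of the file; that is an assumption, not something confirmed in this thread. `find_orphans`, `known_ids`, and the suffix list are hypothetical names for illustration, and collecting `known_ids` from the database is left out.

```python
import hashlib
from pathlib import Path

# Assumption: the archive ID is the SHA-1 hex digest of the file's
# first 512,000 bytes. Adjust if the server computes IDs differently.
ID_PREFIX_BYTES = 512_000

# Hypothetical list of extensions to treat as archives.
ARCHIVE_SUFFIXES = {".zip", ".rar", ".7z", ".cbz", ".cbr", ".pdf", ".epub"}


def compute_id(path, prefix_bytes=ID_PREFIX_BYTES):
    """Hash the leading bytes of a file to get its presumed archive ID."""
    with open(path, "rb") as fh:
        return hashlib.sha1(fh.read(prefix_bytes)).hexdigest()


def find_orphans(content_dir, known_ids):
    """Return (id, path) pairs for archive files whose ID is not in known_ids."""
    orphans = []
    for path in sorted(Path(content_dir).rglob("*")):
        if path.is_file() and path.suffix.lower() in ARCHIVE_SUFFIXES:
            archive_id = compute_id(path)
            if archive_id not in known_ids:
                orphans.append((archive_id, path))
    return orphans
```

Any file this reports is on disk but has no matching record, which matches the "uploaded but not registered" state described above.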

For the record, this hasn't happened to me in the past 200k archives I uploaded via the API, though I recently turned on vm.overcommit, so I'm lowkey wondering if that could be a cause.

@Difegue
Owner

Difegue commented Dec 11, 2024

This is likely caused by the filemap in the Shinobu file watcher, although I find it odd that it wouldn't just re-register the archive if the ID is empty?
IDs are at the core of every file detection, so unless this file specifically makes an empty ID every time, it'd eventually be picked up as soon as it gets a different one.

@psilabs-dev
Contributor Author

psilabs-dev commented Dec 12, 2024

I moved the affected instance's archives to an SSD; the number of files in LRR now matches the number of files in the directory (±1). I think I'll add some logs before running my next upload job; it seems this is only happening in a prod environment, but fortunately there aren't many places where something could go wrong here.
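As a rough idea of what client-side logging around an upload job could look like, here is a minimal sketch. The `upload` and `lookup` callables are hypothetical stand-ins for whatever the upload script already uses to send a file and to check that an ID is registered; nothing here is the actual LRR API.

```python
import logging
import os

log = logging.getLogger("upload_job")


def upload_with_logging(path, upload, lookup):
    """Upload one file and log enough detail to spot a silent failure.

    upload(path) -> archive ID string (hypothetical callable)
    lookup(archive_id) -> True if the server knows the ID (hypothetical callable)
    Returns one of: "ok", "error", "empty_id", "unregistered".
    """
    size = os.path.getsize(path)
    log.info("uploading %s (%d bytes)", path, size)
    try:
        archive_id = upload(path)
    except Exception:
        # Keep the traceback so the failing step can be identified later.
        log.exception("upload raised for %s", path)
        return "error"
    if not archive_id:
        log.error("server returned an empty ID for %s", path)
        return "empty_id"
    if not lookup(archive_id):
        log.error("ID %s for %s is not registered after upload", archive_id, path)
        return "unregistered"
    log.info("registered %s as %s", path, archive_id)
    return "ok"
```

Counting the returned statuses over a whole job would show whether the failures are empty IDs, missing records, or exceptions.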
