Archiving feature doesn't work on latest docker image with Internet Archive integration enabled in settings #256

Closed
OPerepadia opened this issue May 16, 2022 · 4 comments · Fixed by #281

Comments

@OPerepadia
Contributor

Hello. First of all, thank you for developing linkding. I really like it.

Yesterday I updated the Docker image to the latest version (1.9.0). After logging in, there was a message about the archiving feature now being opt-in. I went into the settings and enabled it, but now when I add a new bookmark, no archive is created. Restarting Docker didn't help. I can also reproduce this behavior on a newly created user account.

The option LD_DISABLE_BACKGROUND_TASKS in docker-compose.yml is set to False.

@OPerepadia changed the title from "Archiving feature doesn't work on latest docker image with Internet Archive integration enabled in settingss" to "Archiving feature doesn't work on latest docker image with Internet Archive integration enabled in settings" on May 16, 2022
@sissbruecker
Owner

Did a quick check on my installation, with my user as well as a new user, and it's still working. The steps you took would have been pretty much what I would have suggested.

I was going to suggest checking the admin panel for locked tasks that were not cleared when restarting Docker, but it looks like I never integrated tasks into the admin panel 😑. I'll see if I can add the tasks to the admin and maybe add some debug output to the task generation.

One thing to note is that if you had a large queue of tasks waiting to be processed before the update, it would simply take longer until the background process gets to the new bookmark. If you are able to, you could open the SQLite database and check whether there are tasks in the background_task table, and whether adding a bookmark adds a task. You could also just delete all tasks from the table, and then log out and log in again, which should schedule tasks for all bookmarks that do not have an archive link yet.
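
The database check described above could be sketched roughly as follows. This is only an illustration, not part of linkding: the helper names `existing_columns`, `summarize_tasks` and `clear_tasks` are made up for this example, and the column list is an assumption, so the code first asks SQLite which columns really exist and only selects those. The table name `background_task` comes from the comment above; the database path is a placeholder you would need to replace. Back up the database file before deleting anything.

```python
import sqlite3


def existing_columns(conn, table):
    """Return the column names SQLite reports for `table`."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def summarize_tasks(conn, table="background_task"):
    """Return (row count, selected column names, rows) for the task table."""
    # Assumed column names; any that are missing are silently skipped.
    wanted = ["id", "task_name", "attempts", "run_at", "locked_by", "last_error"]
    available = existing_columns(conn, table)
    if not available:
        raise ValueError(f"table {table!r} not found")
    columns = [name for name in wanted if name in available] or available
    count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    rows = conn.execute(f"SELECT {', '.join(columns)} FROM {table}").fetchall()
    return count, columns, rows


def clear_tasks(conn, table="background_task"):
    """Delete every queued task and return how many rows were removed."""
    count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    conn.execute(f"DELETE FROM {table}")
    conn.commit()
    return count


# Example usage (the path is a placeholder, point it at your own database file):
# conn = sqlite3.connect("/path/to/db.sqlite3")
# count, columns, rows = summarize_tasks(conn)
# print(count, columns)
```

Running `summarize_tasks` before and after adding a bookmark shows whether a new task was queued; `clear_tasks` corresponds to the "delete all tasks" suggestion.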

@OPerepadia
Contributor Author

OPerepadia commented May 16, 2022

I found the cause. I opened the database and there were 16 tasks in the table. Apparently some of those tasks belong to bookmarks that either link to already archived articles (the original link was already dead when I added the bookmark) or link to pages that are not accessible at the moment. I manually removed those bookmarks, and after a restart archiving works again, but as soon as I add one of those links, it gets stuck.

After manually re-adding one of these links, I get this error in the last_error field:

Traceback (most recent call last):
  File "/opt/venv/lib/python3.9/site-packages/background_task/tasks.py", line 43, in bg_runner
    func(*args, **kwargs)
  File "/etc/linkding/bookmarks/services/tasks.py", line 44, in _create_web_archive_snapshot_task
    archive = wayback.save()
  File "/opt/venv/lib/python3.9/site-packages/waybackpy/wrapper.py", line 223, in save
    self._archive_url = "https://" + _archive_url_parser(
  File "/opt/venv/lib/python3.9/site-packages/waybackpy/utils.py", line 447, in _archive_url_parser
    raise WaybackError(exc_message)
waybackpy.exceptions.WaybackError: No archive URL found in the API response. If 'https://web.archive.org/web/202205161012/https://web.archive.org/web/20220321023949/http://teslacoil.ru/katushki-tesla/tranzistornyie-katushki/polumostovaya-sstc/' can be accessed via your web browser then either this version of waybackpy (2.4.3) is out of date or WayBack Machine is malfunctioning. Visit 'https://github.com/akamhy/waybackpy' for the latest version of waybackpy.
Header:
save redirected

It's interesting that before the update it just skipped such links.
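
In the traceback above, the URL being saved nests one `web.archive.org` address inside another, which fits a bookmark that already points at a Wayback Machine snapshot. One way such links could be skipped is a host check before requesting a snapshot. This is a hypothetical pre-check, not what linkding did before or after the update; the function names are invented for this sketch.

```python
from urllib.parse import urlparse

WAYBACK_HOST = "web.archive.org"


def is_wayback_url(url: str) -> bool:
    """Return True if the URL's host is the Wayback Machine itself."""
    host = urlparse(url).hostname or ""
    return host.lower() == WAYBACK_HOST


def should_request_snapshot(url: str) -> bool:
    """Hypothetical guard: skip bookmarks that already are archive links."""
    return not is_wayback_url(url)
```

With this guard, the teslacoil.ru snapshot link from the error message would be skipped, while an ordinary URL would still be sent for archiving.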

@sissbruecker
Owner

Thanks for checking. The logic around broken links could definitely be improved. In general, the task processor handles any error that is raised and then reschedules the task at increasing intervals, up to 5 (?) times. So it's expected that the task stays in the queue for a while; I think it can take up to several hours to go through all 5 iterations. Note that rescheduling does not block other tasks, so if you add a new bookmark, it should get processed in between.
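
The rescheduling behavior described above can be illustrated with a small simulation. It is a toy model, not the actual task processor: the `Task` class, `run_queue`, the delay formula in `backoff_seconds`, and the limit of 5 attempts (taken from the tentative "5 (?)" above) are all assumptions made for this sketch. It shows a failing task being pushed back with a growing delay while a later task runs in between.

```python
import heapq
from dataclasses import dataclass, field

MAX_ATTEMPTS = 5  # assumption, based on the tentative figure in the comment


def backoff_seconds(attempts: int) -> int:
    """Hypothetical delay that doubles with each failed attempt."""
    return 60 * 2 ** attempts


@dataclass(order=True)
class Task:
    run_at: int  # simulated time in seconds; the only field used for ordering
    name: str = field(compare=False)
    attempts: int = field(default=0, compare=False)


def run_queue(tasks, handler):
    """Run tasks in run_at order; reschedule failures with a growing delay."""
    heap = list(tasks)
    heapq.heapify(heap)
    log = []
    while heap:
        task = heapq.heappop(heap)
        now = task.run_at
        try:
            handler(task.name)
            log.append((now, task.name, "ok"))
        except Exception:
            task.attempts += 1
            if task.attempts >= MAX_ATTEMPTS:
                log.append((now, task.name, "gave up"))
            else:
                # Push the task back instead of blocking the queue on it.
                task.run_at = now + backoff_seconds(task.attempts)
                heapq.heappush(heap, task)
                log.append((now, task.name, "retry"))
    return log
```

Because the failed task is re-queued with a later `run_at`, any task that becomes due before that time is processed first, which is the non-blocking behavior described in the comment.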

If one or more failed tasks block other tasks, that would be an actual problem, but that needs some more debugging. I can give this a try later with the URL from the error message.

Apart from that, a future improvement could be to mark bookmarks as broken after X failures, and then never schedule a background task for these again.
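
A rough sketch of what "mark as broken after X failures" might look like follows. Nothing here exists in linkding: `BookmarkArchiveState`, `record_failure`, `should_schedule_snapshot` and the threshold of 3 are placeholders chosen only to make the idea concrete.

```python
from dataclasses import dataclass

MAX_FAILURES = 3  # stands in for the "X" above; the value is arbitrary


@dataclass
class BookmarkArchiveState:
    """Hypothetical per-bookmark record of archiving failures."""
    url: str
    failures: int = 0
    broken: bool = False


def record_failure(state, max_failures=MAX_FAILURES):
    """Count one failed snapshot attempt and flag the bookmark if needed."""
    state.failures += 1
    if state.failures >= max_failures:
        state.broken = True
    return state


def should_schedule_snapshot(state) -> bool:
    """Only schedule a background task for bookmarks not marked broken."""
    return not state.broken
```

The flag would have to be stored with the bookmark so that it survives restarts; how and where to store it is left open here.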

@sprklinginfo

I installed the application in Docker today and found lots of tasks with similar errors with the Internet Archive integration enabled.

Traceback (most recent call last):
  File "/opt/venv/lib/python3.9/site-packages/background_task/tasks.py", line 43, in bg_runner
    func(*args, **kwargs)
  File "/etc/linkding/bookmarks/services/tasks.py", line 44, in _create_web_archive_snapshot_task
    archive = wayback.save()
  File "/opt/venv/lib/python3.9/site-packages/waybackpy/wrapper.py", line 223, in save
    self._archive_url = "https://" + _archive_url_parser(
  File "/opt/venv/lib/python3.9/site-packages/waybackpy/utils.py", line 447, in _archive_url_parser
    raise WaybackError(exc_message)
waybackpy.exceptions.WaybackError: No archive URL found in the API response. If 'https://corporatefinanceinstitute.com/course/learn-accounting-fundamentals-corporate-finance/' can be accessed via your web browser then either this version of waybackpy (2.4.3) is out of date or WayBack Machine is malfunctioning. Visit 'https://github.com/akamhy/waybackpy' for the latest version of waybackpy.
Header:
save redirected

3 participants