Fix bookmarking of PDFs #826

Merged
merged 1 commit into from
Jun 22, 2024

Conversation

BPerlakiH
Collaborator

@BPerlakiH BPerlakiH commented Jun 22, 2024

Fixes: #817

Currently this is how it looks with all the Water docs bookmarked on macOS:

Screenshot 2024-06-22 at 14 25 39

@BPerlakiH BPerlakiH requested review from rgaudin and kelson42 June 22, 2024 09:46
@BPerlakiH BPerlakiH linked an issue Jun 22, 2024 that may be closed by this pull request
@kelson42
Contributor

kelson42 commented Jun 22, 2024

@BPerlakiH You should not need a PDF parser if you don't need an HTML parser (for the bookmarks). This is also not what is requested in the issue. I really don't get it.

@BPerlakiH
Collaborator Author

@kelson42 Bookmarks currently contain the following fields, which we display:

  • title
  • (optional) short snippet
  • (optional) image url

So far we have been getting these fields only via an HTML parser, which did not work as expected for PDF files. Therefore I thought we might use a PDF parser to get the same fields where possible.

@BPerlakiH
Collaborator Author

BPerlakiH commented Jun 22, 2024

It could then be simplified as follows. For a non-HTML (text) document:

  • the title replaced with the path
  • no snippet
  • no image
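
A minimal sketch of that simplification, not the code merged in this PR. The `BookmarkInfo` type, the `makeBookmark` function, and the way the MIME type and path are passed in are all hypothetical names chosen for illustration; only the three displayed fields and the fallback rule come from the discussion above.

```swift
import Foundation

/// Hypothetical model holding the three fields a bookmark displays.
struct BookmarkInfo: Equatable {
    let title: String
    let snippet: String?   // optional short snippet
    let imageURL: URL?     // optional image URL
}

/// Builds the bookmark fields for an entry.
/// - Parameters:
///   - mimeType: MIME type of the entry (assumed to be known to the caller).
///   - path: path of the entry, used as the title for non-HTML content.
///   - parsedHTML: fields extracted by an HTML parser, if one was run (hypothetical input).
func makeBookmark(mimeType: String, path: String, parsedHTML: BookmarkInfo?) -> BookmarkInfo {
    // HTML documents keep the fields the HTML parser produced.
    if mimeType.lowercased().hasPrefix("text/html"), let parsed = parsedHTML {
        return parsed
    }
    // Anything else (for example a PDF): title is the path, no snippet, no image.
    return BookmarkInfo(title: path, snippet: nil, imageURL: nil)
}

// Example with a made-up path:
let pdfBookmark = makeBookmark(mimeType: "application/pdf",
                               path: "docs/water.pdf",
                               parsedHTML: nil)
```

With this rule a PDF bookmark always has a usable title, without needing any PDF-specific parsing.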

@kelson42
Contributor

@BPerlakiH Indeed, but not sure what you mean?

@BPerlakiH
Collaborator Author

BPerlakiH commented Jun 22, 2024

The result of such simplification would look like this:
Screenshot 2024-06-22 at 14 26 38

@kelson42
Contributor

kelson42 commented Jun 22, 2024

Yes, this is what is requested in the issue, and it looks appropriate to me.

So, as far as I understand, you wanted to get a better bookmark by extracting more details from the PDF. Can you please:

  • open a dedicated issue to explain the problem and proposal
  • move the appropriate code in a draft PR

@benoit74

As discussed for nautilus and zimit, I think it makes much more sense to enhance the scrapers so that they properly populate the title and provide proper indexing data for search. This would allow all readers to benefit from such an enhancement at once. I don't mind if this is implemented in the Apple reader, but it might soon be "obsolete" once all PDFs have a proper title thanks to the implementation in the scraper. I intend to add this to python-scraperlib in the coming weeks.

@kelson42
Contributor

kelson42 commented Jun 22, 2024

@benoit74 At the core of this issue is a lack of PDF support on the scraper side. But even if this has to be fixed there, it also has to be handled properly at the reader level when no title metadata is available, for PDF or any other supported MIME type.

@BPerlakiH
Collaborator Author

I've updated this PR, and created a new issue for the more detailed solution: #827

@kelson42 kelson42 merged commit 6f55765 into main Jun 22, 2024
4 checks passed
@kelson42 kelson42 deleted the 817-fix-bookmark-titles branch June 22, 2024 13:45
@rgaudin
Member

rgaudin commented Jun 22, 2024

I'm a bit worried by this PR. I feel there's no room for discussion and everything is rushed.

My first opinion when reading the PR description was the same as @kelson42's, but after reading the implementation I think it was the appropriate way to go: PDF is a natively supported format on Apple systems, hence the PDF parser is built in.
The PR had the appropriate fallback to the ZIM entry, so it respected the same concept as for HTML: if the document itself has a title, use it (as in a regular browser or PDF reader), and if not, fall back to the ZIM entry.
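
The fallback described here can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: `resolveTitle` and `embeddedPDFTitle` are hypothetical helper names, and reading the title through PDFKit's `documentAttributes` is one plausible way to use the built-in parser, guarded so the sketch still compiles where PDFKit is unavailable.

```swift
import Foundation
#if canImport(PDFKit)
import PDFKit
#endif

/// Prefers the title embedded in the document itself and falls back to the
/// ZIM entry title when it is missing or blank. (Hypothetical helper.)
func resolveTitle(embeddedTitle: String?, zimEntryTitle: String) -> String {
    if let title = embeddedTitle?.trimmingCharacters(in: .whitespacesAndNewlines),
       !title.isEmpty {
        return title
    }
    return zimEntryTitle
}

#if canImport(PDFKit)
/// Reads the title attribute from PDF data using PDFKit, if there is one.
/// (Hypothetical helper; assumes the raw PDF bytes are already loaded.)
func embeddedPDFTitle(from data: Data) -> String? {
    let attributes = PDFDocument(data: data)?.documentAttributes
    return attributes?[PDFDocumentAttribute.titleAttribute] as? String
}
#endif

// Example with made-up values: no embedded title, so the ZIM entry title is used.
let shownTitle = resolveTitle(embeddedTitle: nil, zimEntryTitle: "Water treatment guide")
```

Keeping the decision in one small function means the same rule can apply to HTML and PDF alike; only the way the embedded title is obtained differs per format.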

@kelson42
Contributor

@rgaudin See my comment to the dedicated ticket.

Development

Successfully merging this pull request may close these issues.

Bookmarking a PDF fails
4 participants