
feature request: pdf support #28

Closed
asg0451 opened this issue Mar 27, 2024 · 2 comments
Labels
feature request New feature or request

Comments


asg0451 commented Mar 27, 2024

I'd like to be able to capture a PDF URL, e.g. https://gavinadair.files.wordpress.com/2017/03/baker-changes-of-mind.pdf

Currently, the link is captured, but no tags are added and no text is extracted.

Logs:

hoarder-workers 2024-03-27T16:53:27.624Z info: [Crawler][9] Will crawl "https://gavinadair.files.wordpress.com/2017/03/baker-changes-of-mind.pdf" for link with id "h03n4dihn2gp0kn8giwiyir7"
hoarder-workers 2024-03-27T16:53:27.813Z info: [search][30] Completed successfully
hoarder-workers 2024-03-27T16:53:27.822Z error: [Crawler][9] Crawling job failed: {}
@MohamedBassem
Collaborator

Yeah, currently only the HTML Content-Type works. PDF support is a reasonable feature request, though. Will add it to the backlog. Thanks!
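To illustrate the point that only an HTML Content-Type is handled, here is a minimal TypeScript sketch of how a crawler might branch on a response's Content-Type header so that PDFs could take their own path instead of failing. The type `CrawlKind` and the function `classifyContentType` are hypothetical names for illustration; they are not taken from Hoarder's crawler.

```typescript
// Hypothetical sketch: decide how to process a crawled URL from its Content-Type.
// CrawlKind and classifyContentType are illustrative names, not Hoarder's API.
type CrawlKind = "html" | "pdf" | "unsupported";

function classifyContentType(contentType: string | null): CrawlKind {
  if (!contentType) return "unsupported";
  // Drop parameters such as "; charset=utf-8" and normalize case,
  // since MIME types are case-insensitive.
  const mime = (contentType.split(";")[0] ?? "").trim().toLowerCase();
  if (mime === "text/html" || mime === "application/xhtml+xml") return "html";
  if (mime === "application/pdf") return "pdf";
  return "unsupported";
}
```

In practice the header would come from the fetched response, e.g. `res.headers.get("content-type")` with the Fetch API. Checking the MIME type rather than a `.pdf` suffix also covers PDF links whose URLs don't end in `.pdf`.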

MohamedBassem added the "feature request" (New feature or request) label on Mar 27, 2024

MarkLuk commented Jun 8, 2024

Exactly my use case! I research and bookmark a lot of PDF files and would like to be able to view their content in the preview.

kamtschatka added a commit to kamtschatka/hoarder-app that referenced this issue Jun 9, 2024
extended the database to allow storing pdf assets alongside links
added downloading of pdfs
added aiinference for pdfs
updated the UI to display the same as for asset bookmarks
kamtschatka added a commit to kamtschatka/hoarder-app that referenced this issue Jun 10, 2024
Added a new sourceUrl column to the asset bookmarks
Added transforming a link bookmark pointing at a pdf to an asset bookmark
made sure the "View Original" link is also shown for asset bookmarks that have a sourceURL
updated gitignore for IDEA
kamtschatka added a commit to kamtschatka/hoarder-app that referenced this issue Jun 18, 2024
Added a new sourceUrl column to the asset bookmarks
Added transforming a link bookmark pointing at a pdf to an asset bookmark
made sure the "View Original" link is also shown for asset bookmarks that have a sourceURL
updated gitignore for IDEA
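The commits above describe turning a link bookmark that points at a PDF into an asset bookmark that remembers where it came from, so "View Original" still works. A minimal TypeScript sketch of that idea follows. Only the name `sourceUrl` comes from the commit messages; the types `LinkBookmark`/`AssetBookmark` and the helpers `linkToPdfAsset`/`viewOriginalUrl` are hypothetical and are not the code in kamtschatka's branch.

```typescript
// Hypothetical data model; only the sourceUrl field name comes from the commits.
interface LinkBookmark {
  kind: "link";
  id: string;
  url: string;
}

interface AssetBookmark {
  kind: "asset";
  id: string;
  assetId: string;
  assetType: "pdf" | "image";
  // URL the asset was downloaded from; absent for direct uploads.
  sourceUrl?: string;
}

type Bookmark = LinkBookmark | AssetBookmark;

// Convert a link bookmark whose target turned out to be a PDF into an asset
// bookmark, keeping the original URL so the UI can still link back to it.
function linkToPdfAsset(link: LinkBookmark, assetId: string): AssetBookmark {
  return { kind: "asset", id: link.id, assetId, assetType: "pdf", sourceUrl: link.url };
}

// Target for a "View Original" link: the link URL itself, or the asset's
// sourceUrl when one was recorded (undefined means: hide the link).
function viewOriginalUrl(b: Bookmark): string | undefined {
  return b.kind === "link" ? b.url : b.sourceUrl;
}
```

Keeping the bookmark id unchanged during the conversion means any tags or lists already attached to the bookmark would not need to be migrated, under the assumption that they reference the bookmark by id.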
Development

No branches or pull requests

3 participants