Move scraping of site title and site description to the client (bookmarklet and browser extension) #118

Open
eyriewow opened this issue Apr 29, 2021 · 5 comments
Labels
enhancement New feature or request

Comments

@eyriewow

Currently, when adding a bookmark through the Firefox browser add-on, the add-on populates certain fields automatically.
I could be wrong here, but it seems like the actual scraping of that data, when the bookmark gets added to linkding, is handled by the server.

Most of the time this is not an issue, but with certain sites, like those protected by Cloudflare, this leads to unexpected behavior, as illustrated here:

Adding a bookmark for a Path of Exile forum post via the browser extension:
Notice the pre-populated title field, as expected
[Screenshot: YoloMouse_PDT1GLgAb7]

How that bookmark then appears in linkding:
[Screenshot: YoloMouse_3RlxhE769i]

@eyriewow changed the title from "Feature request: Move scraping of site title and site-description entirely to the browser addon" to "Feature request: Move scraping of site title and site description entirely to the browser addon" on Apr 29, 2021
@sissbruecker
Owner

For now I would say that this is by design. The scraping happens on the server because:

  • it can be reused by the internal bookmark form, by the extension, as well as other tools using the REST API
  • fetching a website using AJAX methods from the browser would likely lead to cross-origin issues. While the extension might be able to circumvent CORS checks, the internal bookmark form definitely could not

While it's unfortunate that some sites block requests coming from servers, I would prefer to keep things simple and keep the logic in one place rather than implement it multiple times in different places / languages.
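For readers unfamiliar with what "scraping on the server" involves, here is a minimal sketch of extracting a title and meta description from fetched HTML. It is not linkding's actual implementation; `MetadataParser` and `extract_metadata` are hypothetical names, and only the Python standard library is used.

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collects the <title> text and the meta description of an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            # Only <meta name="description" content="..."> is considered here.
            if (attributes.get("name") or "").lower() == "description":
                self.description = (attributes.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


def extract_metadata(html):
    """Return (title, description) found in the given HTML string."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.description
```

The function only sees whatever HTML the site returned to the requester. If a site answers a server request with a different page than it shows in the browser, the extracted title is that page's title, which would be consistent with the behavior reported above.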

An alternative I can think of is to extend the extension to:

  • provide a setting to always set an explicit title + description and get these from the current tab
  • ATM the extension only reads the tab title, so it would also need to be extended to determine a description from the document

@sissbruecker added the "question" label (Further information is requested) on May 13, 2021
@sissbruecker changed the title from "Feature request: Move scraping of site title and site description entirely to the browser addon" to "Move scraping of site title and site description to the client (bookmarklet and browser extension)" on Aug 13, 2022
@sissbruecker
Owner

Changed the title to include the bookmarklet in this issue. See #292 for the original request. As mentioned there, if the website metadata is provided by a client, then scraping on the server could be skipped.
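The "skip scraping when the client provides metadata" idea could be sketched roughly as follows. This is an illustration under assumptions, not the project's code: `resolve_metadata` is a hypothetical helper, and `scrape` stands in for whatever zero-argument function performs the server-side fetch and returns a `(title, description)` pair.

```python
def resolve_metadata(client_title, client_description, scrape):
    """Prefer client-provided metadata; call scrape() only if something is missing."""
    title = (client_title or "").strip()
    description = (client_description or "").strip()

    # Both values supplied by the client: no server-side request is needed.
    if title and description:
        return title, description

    # Otherwise scrape once and fill in only the missing pieces.
    scraped_title, scraped_description = scrape()
    return title or scraped_title, description or scraped_description
```

With this shape, a client that sends only a title still gets a scraped description, while a client that sends both never triggers a request to the target site.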

I'm more open to this now, as there are bug reports around this from time to time. Ideally the client should provide both the website title and description. Getting the title is straightforward; the description is not. There are websites (GitHub, Reddit) that do not update the website's meta description tag while navigating through the page, which means the description provided by the client might not be correct. It's kind of hard to say which method (client or server scraping) would provide better results on average.

For now I assume server-side scraping is still the better alternative. If someone has ideas around the description issue, feel free to share.

@joshdick
Contributor

joshdick commented Jan 2, 2023

Regarding the description issue, I would love it if any currently selected text on a page were used as the description when invoking the bookmarklet/extension (the current behavior would be kept if no text is selected).

Barring that, it would be nice to at least have the ability to manually provide a description parameter to /new in order to homebrew the functionality described above by customizing the bookmarklet on my own, using it in Apple Shortcuts, etc.
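As a rough illustration of that homebrew approach, the sketch below builds a link to `/new` that carries the selected text as the description and falls back to another description when nothing is selected. The helper name `build_new_bookmark_url`, the query parameter names `url`, `title`, and `description`, and the base URL in the usage note are assumptions made for this example, not a documented linkding API.

```python
from urllib.parse import urlencode


def build_new_bookmark_url(base_url, page_url, title="", selected_text="", fallback_description=""):
    """Build a /new link, preferring selected text as the description.

    Parameter names in the query string (url, title, description) are assumed.
    """
    description = selected_text.strip() or fallback_description.strip()

    params = {"url": page_url}
    if title:
        params["title"] = title
    if description:
        params["description"] = description

    # urlencode percent-encodes the values so they are safe in a query string.
    return base_url.rstrip("/") + "/new?" + urlencode(params)
```

For example, `build_new_bookmark_url("https://linkding.example/bookmarks", "https://example.com/a?b=1", title="A page", selected_text="quoted text")` yields a link whose query string contains the encoded page URL, the title, and the selection as the description. A script or shortcut could open that link in place of the stock bookmarklet.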

acbgbca added a commit to acbgbca/linkding that referenced this issue May 30, 2023
acbgbca added a commit to acbgbca/linkding that referenced this issue May 30, 2023
acbgbca added a commit to acbgbca/linkding that referenced this issue May 30, 2023
sissbruecker pushed a commit that referenced this issue May 30, 2023
* Added ability to set title and description #118

* Updated bookmarklet to pass site title #118

* Revert "Updated bookmarklet to pass site title #118"

This reverts commit 873d901.
@sissbruecker
Owner

The browser extension now allows using the title and description of the current browser tab instead of fetching those through the server. The bookmarklet still needs to be updated.

@ccxuy

ccxuy commented Oct 8, 2024

Thanks for your great work! I have now stored over 50,000 bookmarks with this project on my little server.

4 participants