Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add LinkContentFetcher Haystack 2.0 component #5724

Merged
merged 44 commits into from
Sep 20, 2023
Merged

Conversation

vblagoje
Copy link
Member

@vblagoje vblagoje commented Sep 5, 2023

Why:

This PR introduces the LinkContentFetcher component to Haystack 2.0. This component is responsible for fetching content from a given URL and converting it into a Document object, which can then be used within the Haystack 2.0 pipeline.

Part of #5614

What:

A new LinkContentFetcher component has been added to the newly created fetchers package. This package is intended to be the future home for all components that fetch data via network calls.

How can it be used:

Here's a code snippet demonstrating how to use the new LinkContentFetcher component:

from haystack.preview import Document
from haystack.preview.components.fetchers import LinkContentFetcher

lcf = LinkContentFetcher()
doc: Document = lcf.fetch(url="example.com")
print(doc)

How did you test it:

Unit tests have been added to cover the new component. Additionally, manual tests were conducted, as described in the "How can it be used" section above.

Notes For Reviewer:

Please ensure that the new component and its unit tests are correctly implemented.

@vblagoje vblagoje requested a review from a team as a code owner September 5, 2023 12:21
@vblagoje vblagoje requested review from ZanSara and removed request for a team September 5, 2023 12:21
@github-actions github-actions bot added topic:tests type:documentation Improvements on the docs labels Sep 5, 2023
@vblagoje vblagoje changed the title feat: Add LinkContentFetcher feat: Add LinkContentFetcher Haystack 2.0 component Sep 5, 2023
@vblagoje vblagoje requested a review from a team as a code owner September 5, 2023 12:25
@vblagoje vblagoje requested review from dfokina and removed request for a team September 5, 2023 12:25
@ZanSara ZanSara added the 2.x Related to Haystack v2.0 label Sep 5, 2023
Copy link
Contributor

@ZanSara ZanSara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation is solid but I think it can be simplified a lot. Sorry that I left many comments, I tried to provide the code where possible.

haystack/preview/components/fetchers/link_content.py Outdated Show resolved Hide resolved
haystack/preview/components/fetchers/link_content.py Outdated Show resolved Hide resolved
haystack/preview/components/fetchers/link_content.py Outdated Show resolved Hide resolved
haystack/preview/components/fetchers/link_content.py Outdated Show resolved Hide resolved
haystack/preview/components/fetchers/link_content.py Outdated Show resolved Hide resolved
haystack/preview/components/fetchers/link_content.py Outdated Show resolved Hide resolved
haystack/preview/components/fetchers/link_content.py Outdated Show resolved Hide resolved
haystack/preview/components/fetchers/link_content.py Outdated Show resolved Hide resolved
@ZanSara ZanSara changed the title feat: Add LinkContentFetcher Haystack 2.0 component feat: Add LinkContentFetcher Haystack 2.0 component Sep 6, 2023
@ZanSara
Copy link
Contributor

ZanSara commented Sep 6, 2023

I've just realized that there's no proper way to store the binary returned with the response into the current Document class. As this is not the first time we have trouble with the limitations of Document, I'm going to first expand the Document class and then wrap up this PR.

@ZanSara ZanSara mentioned this pull request Sep 6, 2023
Base automatically changed from new-document to main September 11, 2023 15:40
@coveralls
Copy link
Collaborator

coveralls commented Sep 20, 2023

Pull Request Test Coverage Report for Build 6245834225

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.1%) to 49.255%

Totals Coverage Status
Change from base Build 6244882304: 0.1%
Covered Lines: 11904
Relevant Lines: 24168

💛 - Coveralls

Copy link
Contributor

@ZanSara ZanSara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@ZanSara ZanSara merged commit 0983fb6 into main Sep 20, 2023
67 checks passed
@ZanSara ZanSara deleted the new_content_fetcher branch September 20, 2023 09:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.x Related to Haystack v2.0 topic:tests type:documentation Improvements on the docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants