Haystack 1.x supports retrievers that perform a web search instead of looking into a document store.
In Haystack 2.0 we decided to split this component into smaller components, which would look like this in a web retrieval RAG pipeline:
...
WebSearch
WebSearch components take a question and return a list of links, just like the WebSearch components in Haystack 1.x. As in Haystack 1.x, we expect to have about four implementations:
SerpWebSearch
SerperWebSearch
GoogleWebSearch
BingWebSearch
and any other search engine backend we may want to support in the future.
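Because the backends are meant to be interchangeable, one way to picture them is as subclasses of a shared base class with a single `run` entry point. The sketch below is plain Python rather than the Haystack 2.0 component API; `BaseWebSearch`, `_search`, `top_k`, and the two `Fake*` backends are hypothetical names, and the fakes return canned links instead of calling a real search engine.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class BaseWebSearch(ABC):
    """Hypothetical common contract: a question goes in, a list of links comes out."""

    def __init__(self, top_k: int = 5):
        self.top_k = top_k

    @abstractmethod
    def _search(self, query: str) -> List[str]:
        """Backend-specific call; a real backend would query a search API here."""

    def run(self, query: str) -> Dict[str, List[str]]:
        # Every backend produces the same output shape, truncated to top_k links.
        return {"links": self._search(query)[: self.top_k]}


class FakeSerperWebSearch(BaseWebSearch):
    # Stand-in for SerperWebSearch: canned links, no network access.
    def _search(self, query: str) -> List[str]:
        return [f"https://serper.example/{i}?q={query}" for i in range(10)]


class FakeBingWebSearch(BaseWebSearch):
    # Stand-in for BingWebSearch: same contract, different "backend".
    def _search(self, query: str) -> List[str]:
        return [f"https://bing.example/{i}?q={query}" for i in range(10)]


for backend in (FakeSerperWebSearch(top_k=2), FakeBingWebSearch(top_k=2)):
    print(type(backend).__name__, backend.run("haystack")["links"])
```

Since every backend emits the same `links` output, swapping one search engine for another should not require rewiring the rest of the pipeline.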
Note that WebSearch components can return links, snippets, or both. We can choose the behavior we prefer with an init flag, or simply make the component always return both and discard or ignore the outputs we are not interested in.
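The init-flag option could look roughly like this. The `output` parameter and its `"links"`, `"snippets"`, and `"both"` values are hypothetical names chosen for this sketch, and the search hits are hard-coded rather than coming from a real backend.

```python
from dataclasses import dataclass
from typing import Dict, List, Literal


@dataclass
class SearchHit:
    link: str
    snippet: str


class WebSearchOutputFilter:
    """Hypothetical helper: keep only the outputs selected by an init flag."""

    def __init__(self, output: Literal["links", "snippets", "both"] = "both"):
        if output not in ("links", "snippets", "both"):
            raise ValueError(f"unsupported output mode: {output!r}")
        self.output = output

    def run(self, hits: List[SearchHit]) -> Dict[str, List[str]]:
        result: Dict[str, List[str]] = {}
        if self.output in ("links", "both"):
            result["links"] = [h.link for h in hits]
        if self.output in ("snippets", "both"):
            result["snippets"] = [h.snippet for h in hits]
        return result


hits = [SearchHit("https://a.example", "first snippet"), SearchHit("https://b.example", "second snippet")]
print(WebSearchOutputFilter("links").run(hits))
# {'links': ['https://a.example', 'https://b.example']}
```

With `output="both"` the component behaves like the second option in the note: both outputs are always produced, and downstream components simply ignore the one they do not need.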
LinkFetcher
LinkFetcher also works very similarly to its Haystack 1.x version, but it is much simpler: it has no callbacks for document processing and no preprocessor instance. Instead, it works like a FileTypeClassifier: it checks the type of the file it obtained from the input link and returns that file on an output edge named after the type, for example html, pdf, json, and so on.
The file conversion and the preprocessing will be done later in the pipeline by the respective components.
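A minimal sketch of this routing, assuming the fetch step is injected as a function so the example runs without network access. `LinkFetcherSketch`, `EDGE_BY_MIME`, and the `"unknown"` fallback edge are hypothetical; the type is read from the Content-Type header and, when that is empty, guessed from the URL extension with the standard `mimetypes` module.

```python
import mimetypes
from typing import Callable, Dict, List, Tuple

# Hypothetical mapping from MIME type to output edge name (html, pdf, json, ...).
EDGE_BY_MIME = {
    "text/html": "html",
    "application/pdf": "pdf",
    "application/json": "json",
    "text/plain": "text",
}

# Hypothetical fetch signature: url -> (Content-Type header value, raw body).
Fetcher = Callable[[str], Tuple[str, bytes]]


class LinkFetcherSketch:
    """Fetches each link and routes the raw file to an edge named after its type."""

    def __init__(self, fetch: Fetcher):
        self.fetch = fetch  # injected so the sketch runs without network access

    def run(self, links: List[str]) -> Dict[str, List[Tuple[str, bytes]]]:
        routed: Dict[str, List[Tuple[str, bytes]]] = {}
        for url in links:
            content_type, body = self.fetch(url)
            # Strip parameters such as "; charset=utf-8" from the header value.
            mime = content_type.split(";")[0].strip().lower()
            if not mime:
                # No header: guess the type from the URL's file extension.
                mime = mimetypes.guess_type(url)[0] or ""
            edge = EDGE_BY_MIME.get(mime, "unknown")
            # Keep the URL with the body so converters can store it in metadata.
            routed.setdefault(edge, []).append((url, body))
        return routed


def fake_fetch(url: str) -> Tuple[str, bytes]:
    # Canned responses standing in for real HTTP requests.
    pages = {
        "https://a.example/index": ("text/html; charset=utf-8", b"<html></html>"),
        "https://b.example/paper.pdf": ("", b"%PDF-1.7"),
    }
    return pages[url]


routed = LinkFetcherSketch(fake_fetch).run(["https://a.example/index", "https://b.example/paper.pdf"])
print(sorted(routed))  # names of the edges that received at least one file
```

Keeping the URL next to each raw file is a design assumption here: it lets the downstream converters record the URL in document metadata, which CacheChecker relies on.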
CacheChecker
CacheChecker is a document-store-aware component that checks whether a document is present in a document store. In this case, rather than checking for Document instances, it should check for documents that have a specific URL in their metadata. It also supports cache expiration, so documents may also need to store their retrieval datetime in the metadata.
Note that CacheChecker returns two different outputs: missing, the list of links it could not find in the document store (or whose cached version is too old), and found, the list of Documents that correspond to the links it received as input.
Tasks
SerperDevWebSearch Haystack 2.0 component #5712
LinkContentFetcher Haystack 2.0 component #5724
UrlCacheChecker 2.0 #5840
Join component #5852
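The CacheChecker behavior described in its section can be sketched as follows. This is a minimal illustration, not the planned implementation: it assumes documents carry hypothetical `url` and `retrieved_at` metadata fields, uses a plain in-memory list instead of a real Haystack document store, and treats entries older than `max_age` as missing.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


@dataclass
class Document:
    content: str
    meta: Dict[str, object] = field(default_factory=dict)


class InMemoryStoreSketch:
    """Stand-in document store; a real one would filter on metadata itself."""

    def __init__(self, documents: List[Document]):
        self.documents = documents

    def filter_by_url(self, url: str) -> List[Document]:
        # Hypothetical "url" metadata key.
        return [d for d in self.documents if d.meta.get("url") == url]


class CacheCheckerSketch:
    """Splits input links into cached Documents (found) and links to fetch (missing)."""

    def __init__(self, store: InMemoryStoreSketch, max_age: Optional[timedelta] = None):
        self.store = store
        self.max_age = max_age  # None disables cache expiration

    def _is_fresh(self, doc: Document, now: datetime) -> bool:
        if self.max_age is None:
            return True
        retrieved_at = doc.meta.get("retrieved_at")  # hypothetical metadata key
        # Without a retrieval datetime the entry cannot be validated: treat it as stale.
        return isinstance(retrieved_at, datetime) and now - retrieved_at <= self.max_age

    def run(self, links: List[str], now: Optional[datetime] = None) -> Dict[str, list]:
        now = now or datetime.now(timezone.utc)
        found: List[Document] = []
        missing: List[str] = []
        for url in links:
            fresh = [d for d in self.store.filter_by_url(url) if self._is_fresh(d, now)]
            if fresh:
                found.extend(fresh)
            else:
                missing.append(url)
        return {"found": found, "missing": missing}


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
store = InMemoryStoreSketch([
    Document("cached page", {"url": "https://a.example", "retrieved_at": now - timedelta(hours=1)}),
    Document("stale page", {"url": "https://b.example", "retrieved_at": now - timedelta(days=30)}),
])
checker = CacheCheckerSketch(store, max_age=timedelta(days=7))
result = checker.run(["https://a.example", "https://b.example", "https://c.example"], now=now)
print([d.content for d in result["found"]], result["missing"])
# ['cached page'] ['https://b.example', 'https://c.example']
```

Passing `now` explicitly keeps the expiration check deterministic in tests; in a pipeline it would default to the current time.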