Browserbase is a developer platform to reliably run, manage, and monitor headless browsers.
Power your AI data retrievals with:
- Serverless Infrastructure providing reliable browsers to extract data from complex UIs
- Stealth Mode with included fingerprinting tactics and automatic captcha solving
- Session Debugger to inspect your Browser Session with network timeline and logs
- Live Debug to quickly debug your automation
- Get an API key and Project ID from browserbase.com and set them in environment variables (BROWSERBASE_API_KEY, BROWSERBASE_PROJECT_ID).
- Install the required dependencies:
pip install browserbase-haystack
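Before constructing the fetcher, it can help to confirm that both variables are actually set. The following is a minimal sketch using only the standard library; the helper name `missing_browserbase_vars` is hypothetical and not part of browserbase-haystack.

```python
import os

# Variable names taken from the setup step above.
REQUIRED_VARS = ("BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID")


def missing_browserbase_vars(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; not part of browserbase-haystack.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_browserbase_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("Browserbase credentials found.")
```

Passing a plain dictionary instead of the real environment makes the check easy to exercise in isolation.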
You can load webpages into Haystack using BrowserbaseFetcher. Optionally, you can set the text_content parameter to convert the pages to a text-only representation.
from browserbase_haystack import BrowserbaseFetcher
browserbase_fetcher = BrowserbaseFetcher()
browserbase_fetcher.run(urls=["https://example.com"], text_content=False)
# Example pipeline: fetch pages with Browserbase, then ask an LLM for their titles.
from haystack import Pipeline
from haystack.components.generators import OpenAIGenerator
from haystack.components.builders import PromptBuilder
from browserbase_haystack import BrowserbaseFetcher

# The fetched documents are rendered into the prompt through the "documents" variable.
prompt_template = (
    "Tell me the titles of the given pages. Pages: {{ documents }}"
)
prompt_builder = PromptBuilder(template=prompt_template)
# OpenAIGenerator reads its key from the OPENAI_API_KEY environment variable by default.
llm = OpenAIGenerator()
browserbase_fetcher = BrowserbaseFetcher()

pipe = Pipeline()
pipe.add_component("fetcher", browserbase_fetcher)
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", llm)

# Wire the components: fetcher -> prompt_builder -> llm.
pipe.connect("fetcher.documents", "prompt_builder.documents")
pipe.connect("prompt_builder.prompt", "llm.prompt")

result = pipe.run(data={"fetcher": {"urls": ["https://example.com"]}})
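To read the answer from `result`, a sketch like the following may help. It assumes the generator component is registered as "llm" (as in the pipeline above) and that it returns its generated texts as a list under the "replies" key, which is how Haystack's OpenAIGenerator reports output. `first_reply` is a hypothetical helper, and the sample dictionary is made up for illustration rather than real pipeline output.

```python
def first_reply(result, component="llm"):
    """Return the first generated reply from a pipeline result, or None.

    Hypothetical helper; assumes the component's output has a "replies" list.
    """
    replies = result.get(component, {}).get("replies", [])
    return replies[0] if replies else None


# Illustrative stand-in for the dictionary returned by pipe.run(...).
sample_result = {"llm": {"replies": ["Example Domain"], "meta": [{}]}}
print(first_reply(sample_result))  # prints: Example Domain
```

Returning None for a missing component or an empty reply list keeps the caller from hitting a KeyError or IndexError when a run produces no output.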
- urls: Required. A list of URLs to fetch.
- text_content: Retrieve only text content. Default is False.
- session_id: Optional. Provide an existing Session ID.
- proxy: Optional. Enable/Disable Proxies.
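One way to combine the parameters listed above is to assemble them into a dictionary of keyword arguments and leave out any option that was not set. This is only a sketch: `build_fetch_args` is a hypothetical helper, and it assumes that session_id is a string, that proxy is a boolean flag, and that the optional parameters are passed alongside urls and text_content. The source does not spell out these details.

```python
def build_fetch_args(urls, text_content=False, session_id=None, proxy=None):
    """Assemble keyword arguments for a fetch call, omitting unset options.

    Hypothetical helper; parameter types are assumptions, not documented API.
    """
    if not urls:
        raise ValueError("urls is required and must contain at least one URL")
    args = {"urls": list(urls), "text_content": text_content}
    if session_id is not None:
        args["session_id"] = session_id  # assumed: an existing Session ID string
    if proxy is not None:
        args["proxy"] = proxy  # assumed: True/False to enable or disable proxies
    return args


args = build_fetch_args(["https://example.com"], text_content=True)
print(args)
# The result could then be passed on, e.g. browserbase_fetcher.run(**args)
```

Omitting unset options lets the fetcher fall back to its own defaults instead of receiving explicit None values.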