- Digital Nomad
- ddelange@delange.dev
etl
A graph-based functional API for building complex scikit-learn pipelines.
Move fast from data science prototype to pipeline. Capture, analyze, and transform messy notebooks into data pipelines with just two lines of code.
An aiohttp web server API that scrapes Google and returns the scrape results as the response. Supports proxies, multiple geos, and a configurable number of results.
Scrape the Twitter Frontend API without authentication.
Get unified metadata from websites using Open Graph, Microdata, RDFa, Twitter Cards, JSON-LD, HTML, and more.
Geziyor, a blazing fast web crawling & scraping framework for Go. Supports JS rendering.
Scrape job websites into a single spreadsheet with no duplicates.
A Rust library to extract useful data from HTML documents, suitable for web scraping.
Hide your scraper's IP behind the cloud. Provision proxy servers across different cloud providers to improve your scraping success.
DataHen Till is a companion tool to your existing web scraper that instantly makes it scalable, maintainable, and more unblockable, with minimal code changes on your scraper. Integrates with any sc…
A library that scrapes LinkedIn for user data.
🥫 The simple, fast, and modern web scraping library
The web scraper that's nearly impossible to block - now called @ulixee/hero
Universal Reddit Scraper - A comprehensive Reddit scraping/archival command-line tool.
A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers, user info, images...
Lightweight package to query popular search engines and scrape result titles, links, and descriptions.
dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators
A Python library to utilize AWS API Gateway's large IP pool as a proxy to generate pseudo-infinite IPs for web scraping and brute forcing.
curl-impersonate: A special build of curl that can impersonate Chrome & Firefox
The Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch…
Extract data from a wide range of Internet sources into a pandas DataFrame.
An open-source, low-code machine learning library in Python
A suite of utilities for converting to and working with CSV, the king of tabular file formats.
Python pathlib-style classes for cloud storage services such as Amazon S3, Azure Blob Storage, and Google Cloud Storage.
A fast serialization and validation library, with built-in support for JSON, MessagePack, YAML, and TOML.
Intake is a lightweight package for finding, investigating, loading and disseminating data.
Finds Instagram location IDs near a specified latitude and longitude.