ddelange
💥
["translatio", "imitatio", "aemulatio"]

Stars

etl

Extract-Transform-Load, Data Wrangling, Data Mining, ...
249 repositories

A graph-based functional API for building complex scikit-learn pipelines.

Python 592 30 Updated Dec 8, 2022

Move fast from data science prototype to pipeline. Capture, analyze, and transform messy notebooks into data pipelines with just two lines of code.

Jupyter Notebook 663 58 Updated May 15, 2024

Doubt your data, find bad labels.

Python 506 17 Updated Jul 15, 2024

An aiohttp web server API that scrapes Google and returns the scrape results in the response. Supports proxies, multiple geos, and a configurable number of results.

HTML 54 21 Updated Jan 29, 2024

Scrape the Twitter Frontend API without authentication.

Python 3,940 606 Updated Oct 30, 2023

Get unified metadata from websites using Open Graph, Microdata, RDFa, Twitter Cards, JSON-LD, HTML, and more.

HTML 2,361 168 Updated Dec 7, 2024

Geziyor, a blazing-fast web crawling and scraping framework for Go. Supports JS rendering.

Go 2,645 151 Updated Aug 12, 2024

Scrape job websites into a single spreadsheet with no duplicates.

Python 1,911 218 Updated Oct 15, 2024

A Rust library to extract useful data from HTML documents, suitable for web scraping.

Rust 976 69 Updated Jun 21, 2024

Hide your scraper's IP behind the cloud. Provision proxy servers across different cloud providers to improve your scraping success.

Python 1,408 80 Updated Jun 2, 2023

DataHen Till is a companion tool to your existing web scraper that instantly makes it scalable, maintainable, and more unblockable, with minimal code changes on your scraper. Integrates with any sc…

Go 813 23 Updated Dec 5, 2021

A library that scrapes LinkedIn for user data.

Python 2,146 590 Updated Dec 13, 2024

🥫 The simple, fast, and modern web scraping library

Python 748 55 Updated Dec 7, 2023

The web scraper that's nearly impossible to block - now called @ulixee/hero

TypeScript 676 46 Updated Mar 7, 2023

Crawl and scrape web pages in Rust.

Rust 743 35 Updated Jun 20, 2023

Universal Reddit Scraper - A comprehensive Reddit scraping/archival command-line tool.

Python 824 108 Updated Oct 18, 2023

A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers, user info, images...

Python 1,079 226 Updated Jul 30, 2024

Lightweight package to query popular search engines and scrape for result titles, links and descriptions

Python 460 87 Updated May 3, 2024

dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators

Python 426 19 Updated Dec 9, 2024

A Python library to utilize AWS API Gateway's large IP pool as a proxy to generate pseudo-infinite IPs for web scraping and brute forcing.

Python 1,372 140 Updated Nov 13, 2023

curl-impersonate: A special build of curl that can impersonate Chrome & Firefox

Python 3,869 260 Updated Jul 18, 2024

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch…

Python 1,804 284 Updated Dec 2, 2023

Extract data from a wide range of Internet sources into a pandas DataFrame.

Python 2,979 683 Updated Aug 8, 2024

An open-source, low-code machine learning library in Python

Jupyter Notebook 9,019 1,779 Updated Dec 13, 2024

A suite of utilities for converting to and working with CSV, the king of tabular file formats.

Python 6,035 603 Updated Aug 26, 2024

Python pathlib-style classes for cloud storage services such as Amazon S3, Azure Blob Storage, and Google Cloud Storage.

Python 483 63 Updated Nov 30, 2024

A fast serialization and validation library, with builtin support for JSON, MessagePack, YAML, and TOML

Python 2,485 77 Updated Nov 15, 2024

Iterative JSON parser with Pythonic interfaces

Python 860 53 Updated Nov 26, 2024

Intake is a lightweight package for finding, investigating, loading and disseminating data.

Python 1,014 142 Updated Nov 13, 2024

Finds Instagram location IDs near a specified latitude and longitude.

Python 578 82 Updated Apr 9, 2024