ddelange

💥

["translatio", "imitatio", "aemulatio"]

🔯 Data Engineer / Reliability Engineer ⚛️ Scalable Data Products ☸️ MLOps ft. Kubernetes

92 followers · 51 following

Digital Nomad
ddelange@delange.dev

Stars

etl

Extract-Transform-Load, Data Wrangling, Data Mining, ...

249 repositories

alegonz / baikal

A graph-based functional API for building complex scikit-learn pipelines.

Python · 592 stars · 30 forks · Updated Dec 8, 2022

LineaLabs / lineapy

Move fast from data science prototype to pipeline. Capture, analyze, and transform messy notebooks into data pipelines with just two lines of code.

Jupyter Notebook · 663 stars · 58 forks · Updated May 15, 2024

koaning / doubtlab

Doubt your data, find bad labels.

Python · 506 stars · 17 forks · Updated Jul 15, 2024

EdmundMartin / SearchScraperAPI

Aiohttp web server API that scrapes Google and returns the scraped results as the response. Supports proxies, multiple geos, and a configurable number of results.

HTML · 54 stars · 21 forks · Updated Jan 29, 2024

bisguzar / twitter-scraper

Scrape the Twitter Frontend API without authentication.

Python · 3,940 stars · 606 forks · Updated Oct 30, 2023

microlinkhq / metascraper

Get unified metadata from websites using Open Graph, Microdata, RDFa, Twitter Cards, JSON-LD, HTML, and more.

HTML · 2,361 stars · 168 forks · Updated Dec 7, 2024

geziyor / geziyor

Geziyor, a blazing fast web crawling & scraping framework for Go. Supports JS rendering.

Go · 2,645 stars · 151 forks · Updated Aug 12, 2024

PaulMcInnis / JobFunnel

Scrape job websites into a single spreadsheet with no duplicates.

Python · 1,911 stars · 218 forks · Updated Oct 15, 2024

utkarshkukreti / select.rs

A Rust library to extract useful data from HTML documents, suitable for web scraping.

Rust · 976 stars · 69 forks · Updated Jun 21, 2024

claffin / cloudproxy

Hide your scraper's IP behind the cloud. Provision proxy servers across different cloud providers to improve your scraping success.

Python · 1,408 stars · 80 forks · Updated Jun 2, 2023

DataHenHQ / till

DataHen Till is a companion tool to your existing web scraper that instantly makes it scalable, maintainable, and more unblockable, with minimal code changes on your scraper. Integrates with any sc…

Go · 813 stars · 23 forks · Updated Dec 5, 2021

joeyism / linkedin_scraper

A library that scrapes LinkedIn for user data

Python · 2,146 stars · 590 forks · Updated Dec 13, 2024

maxhumber / gazpacho

🥫 The simple, fast, and modern web scraping library

Python · 748 stars · 55 forks · Updated Dec 7, 2023

ulixee / secret-agent

The web scraper that's nearly impossible to block - now called @ulixee/hero

TypeScript · 676 stars · 46 forks · Updated Mar 7, 2023

mattsse / voyager

Crawl and scrape web pages in Rust

Rust · 743 stars · 35 forks · Updated Jun 20, 2023

JosephLai241 / URS

Universal Reddit Scraper - A comprehensive Reddit scraping/archival command-line tool.

Python · 824 stars · 108 forks · Updated Oct 18, 2023

Altimis / Scweet

A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers, user info, images...

Python · 1,079 stars · 226 forks · Updated Jul 30, 2024

bisohns / search-engine-parser

Lightweight package to query popular search engines and scrape for result titles, links and descriptions

Python · 460 stars · 87 forks · Updated May 3, 2024

roniemartinez / dude

dude (dude uncomplicated data extraction): a simple framework for writing web scrapers using Python decorators

Python · 426 stars · 19 forks · Updated Dec 9, 2024

Ge0rg3 / requests-ip-rotator

A Python library to utilize AWS API Gateway's large IP pool as a proxy to generate pseudo-infinite IPs for web scraping and brute forcing.

Python · 1,372 stars · 140 forks · Updated Nov 13, 2023

lwthiker / curl-impersonate

curl-impersonate: A special build of curl that can impersonate Chrome & Firefox

Python · 3,869 stars · 260 forks · Updated Jul 18, 2024

uber / petastorm

Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch…

Python · 1,804 stars · 284 forks · Updated Dec 2, 2023

pydata / pandas-datareader

Extract data from a wide range of Internet sources into a pandas DataFrame.

Python · 2,979 stars · 683 forks · Updated Aug 8, 2024

pycaret / pycaret

An open-source, low-code machine learning library in Python

Jupyter Notebook · 9,019 stars · 1,779 forks · Updated Dec 13, 2024

wireservice / csvkit

A suite of utilities for converting to and working with CSV, the king of tabular file formats.

Python · 6,035 stars · 603 forks · Updated Aug 26, 2024

drivendataorg / cloudpathlib

Python pathlib-style classes for cloud storage services such as Amazon S3, Azure Blob Storage, and Google Cloud Storage.

Python · 483 stars · 63 forks · Updated Nov 30, 2024

jcrist / msgspec

A fast serialization and validation library, with built-in support for JSON, MessagePack, YAML, and TOML

Python · 2,485 stars · 77 forks · Updated Nov 15, 2024

ICRAR / ijson

Iterative JSON parser with Pythonic interfaces

Python · 860 stars · 53 forks · Updated Nov 26, 2024

intake / intake

Intake is a lightweight package for finding, investigating, loading and disseminating data.

Python · 1,014 stars · 142 forks · Updated Nov 13, 2024

bellingcat / instagram-location-search

Finds Instagram location IDs near a specified latitude and longitude.

Python · 578 stars · 82 forks · Updated Apr 9, 2024
