Available data sources
Sal Hagen edited this page on Aug 20, 2024 · 16 revisions
This page lists the scripts for each data source. Some are fully functional; others are deprecated. Let us know if you have a new data source to add.
For data source-specific information, check the README file in the folder of the respective data source.
Name | Source | Active | Objects | Local (Continuous scraper) | Notes |
---|---|---|---|---|---|
4chan | 4chan API | Yes | Comments + OPs | Yes | We wrote several scripts in the helper-scripts folder to import data from 4chan archives, e.g. this script to import CSV dumps from 4plebs. |
8chan | 4chan API | No (Archives only) | Comments + OPs | Yes | 8chan is now defunct. We scraped live data when it was still online. Let us know in case you are interested in a database copy. |
8kun | 8chan API | Yes | Comments + OPs | Yes | Similar to the 4chan data source. |
9gag | Zeeschuimer | Yes | Posts | No | Must be actively scraped via your browser and the Zeeschuimer plugin. |
BitChute | Scraping | No (issue) | Videos + comments | No | Uses BitChute's web search endpoint and scrapes data from the live website. |
Douban | Scraping | Yes | Comments + OPs | No | Small datasets can be collected; due to rate-limiting, large searches may not complete properly. |
Douyin | Zeeschuimer | Yes | Posts | No | Must be actively scraped via your browser and the Zeeschuimer plugin. |
Imgur | Zeeschuimer | Yes | Posts | No | Must be actively scraped via your browser and the Zeeschuimer plugin. |
Import from tool | Files from other tools | Yes | - | No | Imports files exported by other tools, such as CrowdTangle. |
| | Zeeschuimer | Yes | Posts | No | Must be actively scraped via your browser and the Zeeschuimer plugin. |
| | Zeeschuimer | Yes | Posts | No | Must be actively scraped via your browser and the Zeeschuimer plugin. |
| | | No (Archive only) | | | |
Telegram | Telegram API | Yes | Messages in open groups | No | Requires a personal API key, which can be obtained by anyone with a Telegram account here. |
TikTok | Zeeschuimer | Yes | Posts | No | Must be actively scraped via your browser and the Zeeschuimer plugin. |
Tumblr | Tumblr API | Yes | Posts + reblogs | No | Requires API keys, which you can obtain here. |
X/Twitter | Twitter API & Zeeschuimer | Yes | Tweets | No | When collecting via Zeeschuimer, tweets must be actively scraped via your browser and the Zeeschuimer plugin. |
Usenet | - | | Comments + OPs | Yes | Requires a local, static Usenet database. |
VK | | | | | |
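Several rows above collect "Comments + OPs" through the 4chan API. As a rough illustration of what that object split means, here is a minimal sketch, assuming the publicly documented 4chan API thread format: a JSON object with a `posts` list in which the opening post (OP) has `resto == 0` and every reply carries the OP's post number in `resto`. The function names `split_thread` and `thread_url` and the sample thread are hypothetical and made up for illustration; this is not the code the data source scripts actually use.

```python
import json
from typing import Any

# Hypothetical sample in the shape of a 4chan API thread response
# (https://a.4cdn.org/{board}/thread/{thread_no}.json); all values are made up.
SAMPLE_THREAD = """
{"posts": [
  {"no": 1000, "resto": 0, "time": 1700000000, "sub": "Example thread", "com": "Opening post"},
  {"no": 1001, "resto": 1000, "time": 1700000060, "com": "First reply"},
  {"no": 1002, "resto": 1000, "time": 1700000120, "com": "Second reply"}
]}
"""


def split_thread(raw_json: str) -> tuple[dict[str, Any], list[dict[str, Any]]]:
    """Split a thread response into its opening post (OP) and its comments.

    Assumes the 4chan API convention: the OP has resto == 0, and
    replies store the OP's post number in resto.
    """
    posts = json.loads(raw_json)["posts"]
    ops = [p for p in posts if p.get("resto", 0) == 0]
    if len(ops) != 1:
        raise ValueError(f"expected exactly one OP, found {len(ops)}")
    op = ops[0]
    # Keep only posts that reply to this OP.
    comments = [p for p in posts if p.get("resto") == op["no"]]
    return op, comments


def thread_url(board: str, thread_no: int) -> str:
    # Read-only JSON endpoint for a single thread, per the 4chan API docs.
    return f"https://a.4cdn.org/{board}/thread/{thread_no}.json"


if __name__ == "__main__":
    op, comments = split_thread(SAMPLE_THREAD)
    print(thread_url("pol", op["no"]))
    print(op["no"], len(comments))
```

Fetching the URL and passing the response body to `split_thread` would give one OP and its replies per thread; the sketch keeps parsing separate from network access so it can be tried offline.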