Version 3.0.0
A rabbit hole you shouldn't enter; once entered, you can't get out.
Created by: N4O
Last Updated: 26/12/2021
VTHell v3 is a major rewrite of the previous version. While the previous version used multiple scripts, this version is a single webserver (plus some helpers) that will automatically download/upload/archive your stuff.
The program uses the Holodex API to fetch YouTube streams and information about them.
The program also uses a specific dataset to map upload paths; if it needs to be improved, feel free to open a new pull request.
- Python 3.7+
- mkvmerge (mkvtoolnix)
- rclone
- ytarchive
This project uses Poetry to manage its dependencies; please follow these instructions to install Poetry.
After you have installed Poetry, run the following commands:
poetry install
cp .env.example .env
This will install all the requirements and copy the example environment file into a proper `.env` file.
- Install `rclone`: https://rclone.org/install/
- Set up `rclone` by referring to their documentation
A simple setup using Google Drive looks like this:
$ rclone config
Current remotes:
Name Type
==== ====
e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q>
- Type `n` to create a new remote
e/n/d/r/c/s/q> n
name> [enter whatever you want]
- After that you will be asked to enter the number/name of the storage. Find `Google Drive` and type the number beside it.
Type of storage to configure.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
[...]
12 / Google Cloud Storage (this is not Google Drive)
\ "google cloud storage"
13 / Google Drive
\ "drive"
14 / Google Photos
\ "google photos"
[...]
Storage> 13
- When asked for `Google Application Client Id`, just press Enter.
- When asked for `Google Application Client Secret`, just press Enter.
- When asked for `scope`, press `1` and then Enter.
- When asked for `root_folder_id`, just press Enter.
- When asked for `service_account_file`, just press Enter.
- When asked if you want to edit the advanced config, press `n` and Enter.
- When asked this:
Remote config
Use auto config?
* Say Y if not sure
* Say N if you are working on a remote or headless machine
y) Yes (default)
n) No
y/n>
Press `y` if you have GUI access, or `n` if you're using SSH/console only.
If you use SSH/console, you will be given a link; open it, authorize your account, and copy the verification code back to the console.
If you use a GUI, it will open a browser and you can authorize it normally.
When asked `Configure this as a team drive?`, press `n` if you don't use one, or `y` if you're using a team drive.
--------------------
[vthell]
type = drive
scope = drive
token = {"access_token":"REDACTED","token_type":"Bearer","refresh_token":"REDACTED","expiry":"2020-04-12T11:07:42.967625371Z"}
--------------------
y) Yes this is OK (default)
e) Edit this remote
d) Delete this remote
y/e/d>
- Press `y` to complete the setup, or `e` if you want to edit it again.
- You can exit by typing `q` and pressing Enter after this.
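If you want to sanity-check the new remote, you can list its top-level directories with `rclone lsd vthell:`. A minimal sketch doing the same from Python, assuming your remote is named `vthell` as in the example config above:

```py
import subprocess

# "rclone lsd <remote>:" lists the top-level directories of a remote;
# "vthell" is the remote name created in the walkthrough above.
subprocess.run(["rclone", "lsd", "vthell:"], check=True)
```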
YTArchive is a tool to download a YouTube stream from the very beginning of the stream. For now, this tool works much better than Streamlink.
- Download the latest version of ytarchive: https://github.com/Kethsar/ytarchive/releases/latest
- Select the correct distribution
- Extract the file
- Create a new folder called `bin` in the root folder of vthell
- Copy the extracted file into it; it should now look like this:
[agrius ~/vthell/bin] ls -alh
total 7.3M
drwx------ 2 mizore mizore 4.0K Dec 14 21:57 .
drwxr-xr-x 11 mizore mizore 4.0K Dec 14 21:57 ..
-rwxr-xr-x 1 mizore mizore 7.3M Oct 20 23:58 ytarchive
VTHell v3 needs the following configuration:
# -- Web Server Config --
# The port to run the web server on
PORT=12790
# Enable if you're planning to use Reverse Proxy like Nginx
WEBSERVER_REVERSE_PROXY=false
# Set the secret key here if you want to use reverse proxy
WEBSERVER_REVERSE_PROXY_SECRET=this-is-a-very-secure-reverse-proxy-secret
# Set the web password, it will be used for authentication
WEBSERVER_PASSWORD=this-is-a-very-secure-web-password
# -- VTHell Config --
# Database name
VTHELL_DB=vth.db
# The waiting time for each download check in seconds
VTHELL_LOOP_DOWNLOADER=60
# The waiting time for each auto scheduler check in seconds
VTHELL_LOOP_SCHEDULER=180
# The grace period for the downloader before starting the download
# waiting process in seconds
VTHELL_GRACE_PERIOD=120
# Your Holodex API Key, you can get it from your profile section
HOLODEX_API_KEY=
# Binary path location and more
RCLONE_BINARY=rclone
RCLONE_DISABLE=0
RCLONE_DRIVE_TARGET=
MKVMERGE_BINARY=mkvmerge
YTARCHIVE_BINARY=ytarchive
# Notification helper
NOTIFICATION_DISCORD_WEBHOOK=
- `PORT`: the port the web server will run on (if you run the app file directly)
- `WEBSERVER_REVERSE_PROXY`: enable this if you need the reverse proxy feature
- `WEBSERVER_REVERSE_PROXY_SECRET`: must be set if you enable the reverse proxy, learn more here. You can generate a random one with: `openssl rand -hex 32`
- `WEBSERVER_PASSWORD`: your password for accessing protected resources
- `VTHELL_DB`: your database filename
- `VTHELL_LOOP_DOWNLOADER`: your downloader timer; the download check will run every X seconds as specified (default 60 seconds)
- `VTHELL_LOOP_SCHEDULER`: your auto scheduler timer; it will run every X seconds as specified (default 180 seconds). This runs the auto scheduler, which fetches and automatically adds new jobs to the database
- `VTHELL_GRACE_PERIOD`: how long the program should wait before it starts trying to download the stream (in seconds, default 2 minutes)
- `HOLODEX_API_KEY`: your Holodex API key, which you can get from your profile page
- `RCLONE_BINARY`: the full path to your rclone (or you can add it to your system PATH)
- `RCLONE_DISABLE`: if you set this to `1`, the rclone/upload step is disabled and the data is saved to your local disk at `streamdump/`
- `RCLONE_DRIVE_TARGET`: your target drive or the remote name that you set up in Setup Rclone
- `MKVMERGE_BINARY`: your mkvmerge path
- `YTARCHIVE_BINARY`: your ytarchive path; you can follow Setup YTArchive to get ytarchive up and running
- `NOTIFICATION_DISCORD_WEBHOOK`: will be used to announce any update to your schedules. Must be a valid Discord webhook link.
After you have configured it properly, you can start it with Uvicorn or by invoking the app.py file directly.
Via Uvicorn
poetry run uvicorn asgi:app
You can see more information here
Invoking directly
- Make sure you're in the virtualenv
- Modify the port you want in the `.env` file
- Run `python3 app.py` to start the webserver
POST `/api/schedule`: schedule a single video.
Returns 200 with the added video on success.
Authentication needed.
On failure it returns JSON with an `error` field.
This route allows you to schedule a video manually. If the video is already scheduled, some fields will be replaced, but not everything.
This route accepts JSON data in this format:
{
"id": "abcdef12345"
}
`id` is the YouTube video ID, which will be checked against the Holodex API to verify it is still live/upcoming.
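For example, a minimal sketch using only the Python standard library; the base URL, password, and video ID below are placeholders for your own values:

```py
import json
import urllib.request

BASE_URL = "http://localhost:12790"  # placeholder: your VTHell instance
PASSWORD = "this-is-a-very-secure-web-password"  # placeholder: WEBSERVER_PASSWORD

payload = json.dumps({"id": "abcdef12345"}).encode("utf-8")
req = urllib.request.Request(
    f"{BASE_URL}/api/schedule",
    data=payload,
    method="POST",
    headers={
        "Content-Type": "application/json",
        # the Authorization header value must be prefixed with "Password"
        "Authorization": f"Password {PASSWORD}",
    },
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))  # the added video on success
```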
DELETE `/api/schedule`: delete a single scheduled video.
Returns 200 with the deleted video on success.
Authentication needed.
On failure it returns JSON with an `error` field.
This route deletes a specific video and returns the deleted video if found; the data looks like this:
{
"id": "bFNvQFyTBx0",
"title": "【ウマ娘】本気の謝罪ガチャをさせてください…【潤羽るしあ/ホロライブ】",
"start_time": 1639559148,
"channel_id": "UCl_gCybOJRIgOXw6Qb4qJzQ",
"is_member": false,
"status": "DOWNLOADING",
"error": null
}
The deletion only works if the status is one of:
- `WAITING`
- `DONE`
- `CLEANUP`
If it's anything else, the route returns a 406 Not Acceptable status code.
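A deletion sketch in the same style; it catches the 406 case described above (base URL and password are placeholders):

```py
import json
import urllib.error
import urllib.request

BASE_URL = "http://localhost:12790"  # placeholder
PASSWORD = "this-is-a-very-secure-web-password"  # placeholder

payload = json.dumps({"id": "bFNvQFyTBx0"}).encode("utf-8")
req = urllib.request.Request(
    f"{BASE_URL}/api/schedule",
    data=payload,
    method="DELETE",
    headers={
        "Content-Type": "application/json",
        "X-Password": PASSWORD,  # alternative to the Authorization header
    },
)
try:
    with urllib.request.urlopen(req) as resp:
        print("Deleted:", json.load(resp))
except urllib.error.HTTPError as exc:
    # 406 Not Acceptable: the job is in a state that cannot be deleted
    print("Could not delete:", exc.code)
```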
GET `/api/status`: get the status of all scheduled videos.
Returns 200 with a list of scheduled videos on success.
This route accepts the following query parameter:
- `include_done`: setting this to `1` or `true` will include every scheduled video, even those that are already finished.
[
{
"id": "bFNvQFyTBx0",
"title": "【ウマ娘】本気の謝罪ガチャをさせてください…【潤羽るしあ/ホロライブ】",
"start_time": 1639559148,
"channel_id": "UCl_gCybOJRIgOXw6Qb4qJzQ",
"is_member": false,
"status": "DOWNLOADING",
"error": null
}
]
All the data is self-explanatory; the `status` field is one of the following enum values:
- `WAITING`: the job has not started yet
- `PREPARING`: the recording process has started and is waiting for the stream to begin
- `DOWNLOADING`: the stream is being recorded
- `MUXING`: the stream has finished downloading and is being muxed into the `.mkv` format
- `UPLOAD`: the stream is being uploaded to the specified folder
- `CLEANING`: the upload is done and the program is cleaning up the downloaded files
- `DONE`: the job is finished
- `ERROR`: an error occurred; see the `error` field to learn more
- `CANCELLED`: the job was cancelled because of an unexpected error (members-only, private, and more)
GET `/api/status/:id`: get the status of a single job.
Returns 200 with the requested video on success.
On failure it returns JSON with an `error` field.
It does the same thing as the route above, but only for a single job, and it returns a dictionary instead of a list.
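A quick sketch of polling both status routes with the standard library (base URL and video ID are placeholders; these routes do not need authentication):

```py
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:12790"  # placeholder

# All jobs, including finished ones, via include_done=1
query = urllib.parse.urlencode({"include_done": "1"})
with urllib.request.urlopen(f"{BASE_URL}/api/status?{query}") as resp:
    for job in json.load(resp):
        print(job["id"], job["status"])

# A single job: returns a dictionary instead of a list
with urllib.request.urlopen(f"{BASE_URL}/api/status/bFNvQFyTBx0") as resp:
    print(json.load(resp))
```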
The auto scheduler is a feature where the program checks the Holodex API every X seconds for ongoing/upcoming live streams and schedules anything that matches the criteria.
Routes
The following routes are available to add/remove/modify scheduler filters:
GET `/api/auto-scheduler`: fetch all the auto scheduler filters.
Returns 200 on success with the following data:
{
"include": [
{
"id": 1,
"type": "channel",
"data": "UC1uv2Oq6kNxgATlCiez59hw",
"chains": null
},
{
"id": 2,
"type": "word",
"data": "ASMR",
"chains": [
{
"type": "group",
"data": "hololive"
}
]
}
],
"exclude": [
{
"id": 3,
"type": "word",
"data": "(cover)",
"chains": null
}
]
}
The data format as seen above includes:
- `type`: the type of the data. It must be one of the following enum values:
  - `word`: check if a specific word exists in the title (case-insensitive)
  - `regex_word`: same as above, but using regex (case-insensitive)
  - `group`: check if it matches the organization or group (case-insensitive)
  - `channel`: check if the channel ID matches (case-sensitive)
- `data`: a string following the format of the specified `type`
- `chains`: a list of data to be chained with the original data check. If chains are defined, all of them must match for the stream to be scheduled.
  - This only works on the following types: `word`, `regex_word`
  - This only works on `include` filters right now.
You can add a new scheduler filter by sending a POST request to the following route:
POST `/api/auto-scheduler`: add a new scheduler filter.
Returns 201 on success.
Authentication needed.
On failure it returns JSON with an `error` field.
This route accepts JSON data in this format:
{
"type": "string-or-type-enum",
"data": "string",
"chains": null,
"include": true
}
`type` must be one of the enum values specified above, `data` must be a string, and `include` determines whether the filter is an include or an exclude when the filters are processed later.
Chains can be either a dictionary/map for a single chain, or a list for multiple chains. It can also be null if you don't need it.
Chains are ignored automatically if `type` is not `word` or `regex_word`.
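As a sketch, this adds an include filter for titles containing "ASMR", chained to the hololive group, mirroring the example data earlier (base URL and password are placeholders):

```py
import json
import urllib.request

BASE_URL = "http://localhost:12790"  # placeholder
PASSWORD = "this-is-a-very-secure-web-password"  # placeholder

payload = json.dumps({
    "type": "word",
    "data": "ASMR",
    "chains": {"type": "group", "data": "hololive"},  # single chain as a map
    "include": True,
}).encode("utf-8")

req = urllib.request.Request(
    f"{BASE_URL}/api/auto-scheduler",
    data=payload,
    method="POST",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Password {PASSWORD}",
    },
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)  # 201 on success
```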
PATCH `/api/auto-scheduler/:id`: modify a specific scheduler filter.
Returns 204 on success.
Authentication needed.
On failure it returns JSON with an `error` field.
This route accepts any of this JSON data:
{
"type": "string-or-type-enum",
"data": "string",
"chains": null,
"include": true
}
All fields are optional, but you must specify at least one if you want to modify the filter.
`:id` can be found by using GET `/api/auto-scheduler`.
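A minimal PATCH sketch, assuming filter id 2 from the earlier example and changing only its `data` field:

```py
import json
import urllib.request

BASE_URL = "http://localhost:12790"  # placeholder
PASSWORD = "this-is-a-very-secure-web-password"  # placeholder

# Unspecified fields keep their current values
payload = json.dumps({"data": "karaoke"}).encode("utf-8")
req = urllib.request.Request(
    f"{BASE_URL}/api/auto-scheduler/2",
    data=payload,
    method="PATCH",
    headers={
        "Content-Type": "application/json",
        "X-Auth-Token": PASSWORD,
    },
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)  # 204 on success
```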
DELETE `/api/auto-scheduler/:id`: delete a specific scheduler filter.
Returns 200 on success with the deleted data.
Authentication needed.
On failure it returns JSON with an `error` field.
`:id` can be found by using GET `/api/auto-scheduler`.
The auto scheduler has been rewritten; if you still have the old one, you might want to run the migration script:
$ python3 migrations/auto_scheduler.py
Make sure you have the `_auto_scheduler.json` file in the `dataset` folder, and make sure the webserver is running.
Some routes are protected with a password to make sure not everyone can use them. To access them, you need to set the `WEBSERVER_PASSWORD` and copy the value somewhere handy.
After that, to access them, you need to set one of the following headers:
- `Authorization`: you also need to prefix the value with `Password` (e.g. `Password 123`)
- `X-Auth-Token`: no extra prefix
- `X-Password`: no extra prefix
The program checks the `Authorization` header first, then both `X-*` headers.
Sample request
curl -X POST -H "Authorization: Password SecretPassword123" http://localhost:12790/api/add
curl -X POST -H "X-Password: SecretPassword123" http://localhost:12790/api/add
curl -X POST -H "X-Auth-Token: SecretPassword123" http://localhost:12790/api/add
Note
If you are running with Uvicorn or anything else, make sure to disable the ping timeout and ping interval. We have our own ping mechanism that you need to answer, and the built-in ping will break if you use an Nginx deployment or something similar.
VTHell v3 now has a WebSocket server ready to be connected to. To start, connect to the following route: `/api/event`
For example in JS:
const ws = new WebSocket("ws://127.0.0.1:12790/api/event");
Websocket messages have the following format:
{
"event": "event name",
"data": "can be anything"
}
The raw data is sent as a string, so you need to parse it as JSON before doing anything else. The data can be a dictionary, list, string, or even null, so make sure you read the following section, which shows every event name with its data.
Event and Data:
`job_update` event
Emitted every time there is an update on a job's status. It broadcasts the following data:
{
"id": "123",
"title": "optional",
"start_time": "optional",
"channel_id": "optional",
"is_member": "optional",
"status": "DOWNLOADING",
"error": "An error if possible"
}
or
{
"id": "123",
"status": "DOWNLOADING",
"error": "An error if possible"
}
The `error` field might not be present if the `status` is not `ERROR`.
The only fields that will always be sent are `id` and `status`. If you get extra fields like `title`, it means someone called the `/api/schedule` API and the existing job data was replaced with new data. Please make sure you handle it properly!
`job_scheduled` event
Emitted every time the auto scheduler automatically adds a new scheduled job. As an example, it contains the following data:
{
"id": "bFNvQFyTBx0",
"title": "【ウマ娘】本気の謝罪ガチャをさせてください…【潤羽るしあ/ホロライブ】",
"start_time": 1639559148,
"channel_id": "UCl_gCybOJRIgOXw6Qb4qJzQ",
"is_member": false,
"status": "DOWNLOADING"
}
`job_deleted` event
Emitted whenever a job is deleted from the database. It contains the following data:
{
"id": "bFNvQFyTBx0"
}
`connect_job_init` event
Emitted as soon as you establish a connection with the Socket.IO server. It lets you store the current state without needing to use the API.
The data is the same as a request to `/api/status` (without jobs in the `DONE` status).
`ping` and `pong` events
This ping/pong packet or event is used to make sure the connection is alive and well.
The server will send a `ping` request with the following content:
{
"t" 1234567890,
"sid": "user-id"
}
`t` is the server's Unix time in milliseconds; you need to respond with the `pong` event carrying the same data.
If you don't answer within 30 seconds, the connection will be closed immediately.
When you connect to the socket, you will get a `ping` event immediately!
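Putting the events together, here is a minimal client sketch using the third-party `websockets` package. It assumes a plain WebSocket endpoint as in the JS example above, and that outgoing messages use the same `{event, data}` envelope as incoming ones (the exact `pong` reply shape is an assumption based on the description above):

```py
import asyncio
import json

import websockets  # assumption: pip install websockets


async def listen() -> None:
    # Placeholder URL; match the host/port of your VTHell instance
    async with websockets.connect("ws://127.0.0.1:12790/api/event") as ws:
        async for raw in ws:
            message = json.loads(raw)  # payloads arrive as JSON strings
            event, data = message["event"], message["data"]
            if event == "ping":
                # Echo the same data back within 30 seconds, or the
                # server will close the connection
                await ws.send(json.dumps({"event": "pong", "data": data}))
            elif event == "job_update":
                print("job update:", data["id"], data["status"])
            else:
                print(event, data)


asyncio.run(listen())
```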
It is recommended to run in direct mode if you want to use multiple workers.
Although it is supported, it might do some unexpected things.
To run in multiple-workers mode, just add the parameter `--workers` or `-W` when invoking the `app.py` file:
$ source .venv/bin/activate
$ (.venv) python3 app.py -W 4
The above command will run the server with 4 workers.
Version 3.0 of VTHell is very different from the original 2.x and 1.x versions. It includes a full web server to monitor your recordings externally, better task management that lets you fire off multiple downloads at once, and a Socket.IO feature to better monitor your data via websocket.
It also now uses the Holodex API rather than the Holotools API, since it supports many more VTubers.
Another change is the move from a JSON file to an SQLite3 database for all jobs, which improves performance since we don't need to read from and write to disk multiple times.
Oh, and I guess it now supports Windows, since it no longer relies on Linux-only features.
With v3, the dataset is now in its own repository; you can access it here: https://github.com/noaione/vthell-dataset
The dataset repo is fetched every hour to see if the deployed hash has changed.
If you have suggestions for new datasets, removals, and more, please visit the repo and open a PR or issue there!
This project is licensed under the MIT License; learn more here.