This is a database for AdVent, the TV ads arrestor. For a description of how AdVent works, refer to its repository. This repository contains a database of TV jingle hashes, which AdVent uses to decide when the sound of the ads has to be cut.
```
(advent-pyenv) $ db-djv-pg dbinfo
Dejavu database info:
  Fingerprinted / total tracks = 412 / 412
  Peak groups = 132228 (avg. ~= 321 per track)
  Fingerprints = 3571509 (avg. ~= 8669 per track)
  Total fingerprinted time ~= 2007 s (avg. ~= 4.9 s per track)
  Database size ~= 514 MB (avg. ~= 1.25 MB per track)
  Fingerprinting frequency ~= 1780 Hz (~= 4.04% of sampling frequency 44100 Hz)
  Hash size = 10 B
  Hash collisions ~= 43.48%
  First update ~= 2024-05-14 00:02:17
  Last update ~= 2024-10-14 23:40:47
  Last vacuum ~= 2024-10-14 23:43:32
AdVent database info:
  Countries = 1
  TV channels = 14 (avg. ~= 14 per country)
  Jingles = 412 (avg. ~= 29 per TV channel)
  Pure entry / entry jingles = 94 / 293
  Pure exit / exit jingles = 119 / 318
  No action jingles = 0
  Time coverage from = 2022-02-05
  Time coverage till = 2024-10-05
(advent-pyenv) $
```
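You can check the derived figures in this output with simple arithmetic. The sketch below rests on three assumptions that the tool itself does not state:

- each "avg." value is a total divided by the number of tracks or channels;
- the fingerprinting frequency is the number of fingerprints per second of fingerprinted audio;
- a jingle usable both at entry and at exit is counted in both the entry and the exit totals.

```python
# Totals copied from the dbinfo output above ("~=" values are approximate).
tracks = 412
peak_groups = 132228
fingerprints = 3571509
fingerprinted_time_s = 2007
db_size_mb = 514
sampling_hz = 44100
tv_channels = 14
entry_jingles, exit_jingles, no_action_jingles = 293, 318, 0

# Averages, assumed to be plain totals divided by the number of items.
avg_peak_groups = round(peak_groups / tracks)
avg_fingerprints = round(fingerprints / tracks)
avg_time_s = round(fingerprinted_time_s / tracks, 1)
avg_size_mb = round(db_size_mb / tracks, 2)
avg_per_channel = round(tracks / tv_channels)

# Fingerprinting frequency, assumed to be fingerprints per second of audio.
freq_hz = fingerprints / fingerprinted_time_s
freq_share = freq_hz / sampling_hz

# Every jingle is entry, exit, both or "no action"; inclusion-exclusion
# gives the number flagged both ways, and from it the "pure" counts.
both = entry_jingles + exit_jingles + no_action_jingles - tracks
pure_entry = entry_jingles - both
pure_exit = exit_jingles - both

print(avg_peak_groups, avg_fingerprints, avg_time_s, avg_size_mb, avg_per_channel)
print(f"{round(freq_hz)} Hz, {freq_share:.2%} of {sampling_hz} Hz")
print(both, pure_entry, pure_exit)
```

Under these assumptions, 199 jingles are flagged for both entry and exit. Subtracting them from the entry and exit totals reproduces the printed "pure" counts of 94 and 119.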
It is assumed that you have installed AdVent. Make sure to switch to its virtual environment:

```
$ source advent-pyenv/bin/activate
```

where `advent-pyenv` is the folder created during AdVent installation.
To pull the latest updates into your database:

- Download the latest snapshot (clone, pull or get a zip):

  ```
  (advent-pyenv) $ git clone https://github.com/denis-stepanov/advent-db.git
  (advent-pyenv) $ cd advent-db/DB
  ```

- (recommended) Delete the countries or TV channels you do not expect to watch. This decreases CPU load and reduces the number of false positives while AdVent is running:

  ```
  (advent-pyenv) $ rm -r VA # ....
  ```

- Import the update:

  ```
  (advent-pyenv) $ db-djv-pg import -s .
  ```

The import might take a few minutes, depending on how far your database is out of sync.
Some TV channels are known to use multiple jingles during ad breaks (e.g., `1-2-3` instead of the regular `1-2`), at least in prime time. This information could potentially be added to the database, but it is currently not provided. You need to pass the number of exit jingles (`1` by default) to AdVent using its `-j` option, or change it at run time using the `j` / `J` keys. The table below lists the affected channels and their preferences.
| TV Channel | Jingle Pattern | Number of Exit Jingles (`-j`) |
|---|---|---|
| FR/6TER | 1-2 or 1-2-3 | 1 or 2 |
| FR/M6 | 1-2 or 1-2-3 | 1 or 2 |
| FR/RMCSTORY | 1-2-3 | 2 |
| FR/TF1 | 1-2-3 | 2 |
| FR/TF1SERIESFILMS | 1-2-3 | 2 |
| FR/TFX | 1-2+3-4 | 2 |
| FR/TMC | 1-2-3 | 2 |
| FR/W9 | 1-2 or 1-2(truncated)-3 | 1 or 2 |
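The following is a hypothetical model of what the exit jingle count means. It is not AdVent's actual code. The modelled controller mutes on an entry jingle and unmutes only after the configured number of exit jingles has been heard. The class and method names are illustrative, and so are the flag values, which are borrowed from the file naming convention (0x1 for entry, 0x2 for exit). The model also assumes that the first jingle of a pattern is the entry jingle and that the following ones are exit jingles.

```python
from dataclasses import dataclass

ENTRY = 0x1  # flag: jingle starts the ads
EXIT = 0x2   # flag: jingle ends the ads


@dataclass
class MuteController:
    """Hypothetical model of -j: unmute only after N exit jingles."""
    exit_jingles: int = 1   # value passed with -j (1 by default)
    muted: bool = False
    exits_seen: int = 0

    def on_hit(self, flags: int) -> None:
        if not self.muted:
            # While unmuted, only a jingle that can start the ads matters.
            if flags & ENTRY:
                self.muted = True
                self.exits_seen = 0
        elif flags & EXIT:
            # While muted, count exit jingles up to the configured number.
            self.exits_seen += 1
            if self.exits_seen >= self.exit_jingles:
                self.muted = False


# A "1-2-3" break: jingle 1 is entry-only, jingles 2 and 3 are exit-only.
tv = MuteController(exit_jingles=2)
states = []
for flags in (ENTRY, EXIT, EXIT):
    tv.on_hit(flags)
    states.append(tv.muted)
print(states)  # [True, True, False]
```

Under the default `exit_jingles=1`, the same sequence would unmute after jingle 2, so the sound would come back one jingle too early.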
You will need:
- installed and configured AdVent (only the sound capturing part; the TV control module is not required);
- a basic audio editor capable of track trimming (plus some familiarity with it).
Make sure to switch to the AdVent virtual environment:

```
$ source advent-pyenv/bin/activate
```

where `advent-pyenv` is the folder created during AdVent installation.
The example below is given for Fedora Linux, with AdVent configured to listen to PulseAudio output (see "Capturing a TV Webcast") and Audacity as the sound editor.

As far as possible, stick to the same audio source that you will use when running AdVent. Usually the simplest way, which does not require messing with cables, is to record a portion of a TV broadcast from the Internet. If you have a choice between analog and digital recording, always prefer digital. Stereo sources are OK and even preferred (Dejavu treats them as two-in-one, fingerprinting each channel separately).
The preferred audio format is PCM (WAV), 16-bit signed little-endian, 44.1 kHz, 2 channels (stereo). Many audio tools produce this by default. Other formats or parameters have not been tested and may or may not work.
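Before going further, you can check that a recording matches these parameters. Below is a minimal sketch using Python's standard `wave` module; the function name `check_wav` is illustrative.

In a WAV file, 16-bit PCM samples are always signed little-endian, so checking the sample width covers that part. Note that `wave` reads only uncompressed PCM files and raises `wave.Error` for other encodings. Older Python versions may also reject the `WAVE_FORMAT_EXTENSIBLE` header variant.

```python
import sys
import wave


def check_wav(path: str) -> list[str]:
    """Return deviations from 16-bit PCM, 44.1 kHz, stereo (empty if OK)."""
    problems = []
    # wave.open only accepts uncompressed PCM; other encodings raise wave.Error.
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            problems.append(f"sample width {8 * w.getsampwidth()} bit, expected 16")
        if w.getframerate() != 44100:
            problems.append(f"sampling rate {w.getframerate()} Hz, expected 44100")
        if w.getnchannels() != 2:
            problems.append(f"{w.getnchannels()} channel(s), expected 2 (stereo)")
    return problems


if __name__ == "__main__":
    # Usage: python check_wav.py FILE.wav [FILE.wav ...]
    for p in sys.argv[1:]:
        issues = check_wav(p)
        print(p, "OK" if not issues else "; ".join(issues))
```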
If you configured AdVent for PulseAudio input, you should be able to record with the default settings:
```
(advent-pyenv) $ parecord -v 6ter_20220725.wav
Opening a recording stream with sample specification 's16le 2ch 44100Hz' and channel map 'front-left,front-right'.
Connection established.
Stream successfully created.
Buffer metrics: maxlength=4194304, fragsize=352800
Using sample spec 's16le 2ch 44100Hz', channel map 'front-left,front-right'.
Connected to device alsa_output.pci-0000_00_1b.0.analog-stereo.monitor (index: 45, suspended: no).
Time: 4.608 sec; Latency: 608164 usec.
...
(stop with Ctrl-C)
(advent-pyenv) $
```
You can also use a recorder of your choice, including Audacity. The file name is not important at this point; here it hints at the channel and the date of recording.
Load the recording into Audacity: `File > Open...`. Seek through the track to locate the ad jingle; use zoom if needed. To select the desired interval, click in the track area at the jingle start and drag the mouse to the jingle end. Whenever possible, keep your jingle recording at least two seconds long; anything below one second will not work and must be avoided. Take note of the jingle position (before or after the ads).
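Once the jingle has been exported to WAV, you can double-check its length against these limits. Below is a small sketch with the standard `wave` module. The duration is the number of frames divided by the frame rate. The function names and the returned labels are illustrative.

```python
import wave

MIN_SECONDS = 1.0          # below this, the jingle will not work
RECOMMENDED_SECONDS = 2.0  # keep jingles at least this long when possible


def jingle_duration(path: str) -> float:
    """Length of a PCM WAV file in seconds: frames / frames per second."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def assess_length(seconds: float) -> str:
    """Label a jingle length according to the limits above."""
    if seconds < MIN_SECONDS:
        return "too short"
    if seconds < RECOMMENDED_SECONDS:
        return "short, keep only if nothing longer is available"
    return "ok"
```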
At this point it is recommended to test the jingle with AdVent, to make sure that it is not already known. Run AdVent in a console, then press `Play` in Audacity.
```
(advent-pyenv) $ advent -t nil
AdVent v1.6.1
TV control is nil with action 'mute' for 600 s max and 1 exit jingle
TV status: unmuted, volume: 100
Recognition interval is 2 s with confidence of 10%
Started 2 listening thread(s)
Type 'h' for help
....:::oooo:o.::ooooo:.......
```
If AdVent does not detect your jingle during playback (no "hit" printed), you are good to continue.

1. Trim the track: `Edit > Remove Special > Trim Audio`.
2. Shift the result to the beginning: `Tracks > Align Tracks > Start to Zero`.
3. Export the track: `File > Export > Export as WAV`. Name it according to the naming convention described below; in this case it would be something like `FR_6TER_220725_ELEMENTARY1_1.wav`. Leave the encoding as `Signed 16-bit PCM` (the default). There is no need to fill in any metadata; just click `OK`.
You can leave AdVent running until the end of the session; it will seamlessly pick up changes in the database.
Fingerprint the jingle using the standard Dejavu approach:
```
(advent-pyenv) $ dejavu -f . wav
Fingerprinting all .wav files in the . directory
Fingerprinting channel 1/2 for ./FR_6TER_220725_ELEMENTARY1_1.wav
Finished channel 1/2 for ./FR_6TER_220725_ELEMENTARY1_1.wav
Fingerprinting channel 2/2 for ./FR_6TER_220725_ELEMENTARY1_1.wav
Finished channel 2/2 for ./FR_6TER_220725_ELEMENTARY1_1.wav
(advent-pyenv) $
```
It is recommended (but not required) to keep already processed jingle WAV files around in case fingerprinting needs to be re-executed. You can re-submit entire folders as many times as you like; Dejavu recognizes and skips files that have already been submitted.
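Skipping already-submitted files can be done by comparing file hashes. The sketch below assumes that the hash is a SHA1 of the file's raw bytes; the `.djv` format stores such a hash as `file_hash`. It illustrates the idea only and is not Dejavu's code. The in-memory `known` set stands in for the database lookup, and both function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_sha1(path: Path, block_size: int = 1 << 20) -> str:
    """SHA1 of a file's contents, read in blocks to bound memory use."""
    h = hashlib.sha1()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


def files_to_fingerprint(folder: Path, known: set[str]) -> list[Path]:
    """Return the .wav files in folder whose hash is not already known."""
    pending = []
    for wav in sorted(folder.glob("*.wav")):
        digest = file_sha1(wav)
        if digest not in known:
            pending.append(wav)
            known.add(digest)  # also skips identical copies within the folder
    return pending
```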
(optional) At this point you can undo your changes in Audacity (press `Ctrl-Z` twice) and press `Play`; AdVent running in the background should now recognize the track:
```
....:::oO
Hit: FR_6TER_220725_ELEMENTARY1_1
TV muted
OOOo...
```
Export the hash from the database for future use:
```
(advent-pyenv) $ db-djv-pg export FR_6TER_220725_ELEMENTARY1_1
FR_6TER_220725_ELEMENTARY1_1 13349
(advent-pyenv) $
```
Check the printed number, which is the number of fingerprints in the jingle. A good jingle might have 5000 or more fingerprints. Unfortunately, you do not have much freedom here: the number of fingerprints depends on the length of the jingle and on its sound "richness". Jingles with fewer than 500 fingerprints should not be let into a shared database, as they often result in false positives. Jingles with 500-1000 fingerprints are in the risk zone.
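You can write these thresholds down as a small helper, for example to screen the output of `db-djv-pg export` before opening a pull request. This is a sketch with two assumptions. First, the export prints `<name> <count>`, as in the example above. Second, the label "acceptable" for the 1001-4999 range is invented here, since the text only qualifies the other ranges. The function names are illustrative.

```python
def parse_export_line(line: str) -> tuple[str, int]:
    """Split a '<name> <count>' line as printed by the export command."""
    name, count = line.split()
    return name, int(count)


def assess_fingerprints(count: int) -> str:
    """Classify a jingle by its number of fingerprints."""
    if count < 500:
        return "reject"      # too few: frequent false positives when shared
    if count <= 1000:
        return "risky"       # the 500-1000 risk zone
    if count < 5000:
        return "acceptable"  # label assumed; not named in the text
    return "good"


name, count = parse_export_line("FR_6TER_220725_ELEMENTARY1_1 13349")
print(name, assess_fingerprints(count))  # FR_6TER_220725_ELEMENTARY1_1 good
```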
If you think your hashes could be of use to others, please submit a pull request. Make sure you follow the folder structure defined in this repository and the file naming conventions.
```
11_2..2_333333_4..4_5..5.ext
FR_TF1_220214_EVENING1_3.djv
```

- `11`: ISO 3166 two-letter country code
- `2..2`: TV channel name
- `333333`: jingle capture date, YYMMDD (approximate if unsure)
- `4..4`: jingle name (free format, alphanumeric)
- `5..5`: binary flags in decimal:
  - 0x1: jingle starts the ads (1 if unsure)
  - 0x2: jingle ends the ads (1 if unsure)
  - ...
- `ext`: file extension, indicating the recognizer provider:
  - djv: Dejavu
If a jingle can be seen both at entry and at exit, the flags are `0x1 | 0x2 = 0x3` = decimal 3 (i.e., just `3` in the name).
Unless you have a specific reason, it is recommended to use capital letters only and to avoid any special characters (spaces, apostrophes, punctuation...).
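The convention is easy to check mechanically. Below is a sketch of a parser for such file names. It assumes upper-case alphanumeric fields, as recommended above, and a 20xx century for the YYMMDD date. The regular expression and the class and function names are illustrative and not part of AdVent. Flags are combined with bitwise OR and tested with bitwise AND.

```python
import re
from dataclasses import dataclass
from datetime import date

ENTRY = 0x1  # jingle starts the ads
EXIT = 0x2   # jingle ends the ads

# CC_CHANNEL_YYMMDD_NAME_FLAGS.ext; upper-case alphanumerics assumed.
NAME_RE = re.compile(
    r"^(?P<country>[A-Z]{2})_(?P<channel>[A-Z0-9]+)_(?P<date>\d{6})_"
    r"(?P<name>[A-Z0-9]+)_(?P<flags>\d+)\.(?P<ext>[a-z]+)$"
)


@dataclass
class JingleFile:
    country: str
    channel: str
    captured: date
    name: str
    flags: int
    ext: str

    @property
    def is_entry(self) -> bool:
        return bool(self.flags & ENTRY)

    @property
    def is_exit(self) -> bool:
        return bool(self.flags & EXIT)


def parse_jingle_filename(filename: str) -> JingleFile:
    m = NAME_RE.match(filename)
    if m is None:
        raise ValueError(f"not a valid jingle file name: {filename!r}")
    d = m["date"]
    # YYMMDD; assuming all capture dates fall in the 2000s.
    captured = date(2000 + int(d[:2]), int(d[2:4]), int(d[4:]))
    return JingleFile(m["country"], m["channel"], captured, m["name"],
                      int(m["flags"]), m["ext"])


j = parse_jingle_filename("FR_TF1_220214_EVENING1_3.djv")
print(j.channel, j.captured, j.is_entry, j.is_exit)  # TF1 2022-02-14 True True
```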
A CSV (comma-separated values) format with a variable number of fields per line is used.
The first line describes the format:

```
<format>,<format_version>
```

The only supported format is `djv`. The current format version is `1`. Backward compatibility (reading files of older formats) is supported; forward compatibility (reading files of newer formats) is not.
The second line describes the jingle:

```
<name>,<fingerprinted>,<file_hash>,<num_fingerprints>
```

- `name` is the full name of the jingle, as defined above (without the file extension). It is expected (but not required) that the name stored in the file matches the file name.
- `fingerprinted` is a `0`/`1` flag from the Dejavu database; it should read `1` in all regular AdVent usage scenarios.
- `file_hash` is the SHA1 hash of the audio file as submitted to Dejavu.
- `num_fingerprints` is the number of fingerprints generated for the jingle.
The remaining lines are individual fingerprints. Their number should match `num_fingerprints`.

```
<offset>,<hash>
```

- `offset` is the fingerprint offset inside the jingle.
- `hash` is the fingerprint itself.

It is normal to have several fingerprints for the same offset. The order of the fingerprint lines in the file is not important; to make exports reproducible, they are sorted on save.
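Below is a sketch of how such a file could be written with Python's `csv` module. The `Jingle` class and the `dump_djv` function are hypothetical. Sorting by `(offset, hash)` is also an assumption about the order used on save. The point is that any fixed ordering makes repeated exports of the same jingle identical.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Jingle:
    name: str
    fingerprinted: int  # 0/1 flag from the Dejavu database
    file_hash: str      # SHA1 of the submitted audio file
    fingerprints: list[tuple[int, str]] = field(default_factory=list)  # (offset, hash)


def dump_djv(jingle: Jingle) -> str:
    """Serialize a jingle in the djv,1 layout described above."""
    out = io.StringIO()
    w = csv.writer(out, lineterminator="\n")
    w.writerow(["djv", 1])
    w.writerow([jingle.name, jingle.fingerprinted, jingle.file_hash,
                len(jingle.fingerprints)])
    # A fixed order makes repeated exports byte-identical;
    # (offset, hash) is assumed here.
    for offset, fp_hash in sorted(jingle.fingerprints):
        w.writerow([offset, fp_hash])
    return out.getvalue()
```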
Example of a file `FR_6TER_220725_ELEMENTARY1_1.djv`:

```
djv,1
FR_6TER_220725_ELEMENTARY1_1,1,72e621563a21a344ead619ef6bfe14fa6a2d219c,13349
0,1d65a451ac6be35206b8
0,2b4157cae1e958cfe6e7
(.. 13347 more lines ..)
```
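Reading the format back is equally simple. Below is a sketch of a reader that applies the rules above:

- it accepts format versions up to the current one and rejects newer ones;
- it checks that the number of fingerprint lines matches `num_fingerprints`.

The function name `load_djv` and the layout of the returned dictionary are hypothetical.

```python
import csv
from typing import Iterable

CURRENT_VERSION = 1


def load_djv(lines: Iterable[str]) -> dict:
    """Parse a .djv file (given as lines) and check it for consistency."""
    rows = csv.reader(lines)
    fmt, version = next(rows)
    if fmt != "djv":
        raise ValueError(f"unsupported format {fmt!r}")
    # Older versions are readable (backward compatible); newer ones are not.
    if not 1 <= int(version) <= CURRENT_VERSION:
        raise ValueError(f"unsupported format version {version}")
    name, fingerprinted, file_hash, num = next(rows)
    fingerprints = []
    for row in rows:
        if not row:
            continue  # tolerate blank lines
        offset, fp_hash = row
        fingerprints.append((int(offset), fp_hash))
    if len(fingerprints) != int(num):
        raise ValueError(f"{name}: expected {num} fingerprints, found {len(fingerprints)}")
    return {
        "name": name,
        "fingerprinted": fingerprinted == "1",
        "file_hash": file_hash,
        "fingerprints": fingerprints,
    }
```

Since `csv.reader` accepts any iterable of lines, you can pass the reader an open file, e.g. `load_djv(open(path, newline=""))`.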