AdVent Database

This is a database for AdVent, the TV ads arrestor. For a description of how AdVent works, refer to its repository. This repository contains a database of TV jingle hashes, which AdVent uses to decide when to cut the sound of the ads.

Database v20241014

(advent-pyenv) $ db-djv-pg dbinfo
Dejavu database info:
  Fingerprinted / total tracks = 412 / 412
  Peak groups                  = 132228 (avg. ~= 321 per track)
  Fingerprints                 = 3571509 (avg. ~= 8669 per track)
  Total fingerprinted time    ~= 2007 s (avg. ~= 4.9 s per track)
  Database size               ~= 514 MB (avg. ~= 1.25 MB per track)
  Fingerprinting frequency    ~= 1780 Hz (~= 4.04% of sampling frequency 44100 Hz)
  Hash size                    = 10 B
  Hash collisions             ~= 43.48%
  First update                ~= 2024-05-14 00:02:17
  Last update                 ~= 2024-10-14 23:40:47
  Last vacuum                 ~= 2024-10-14 23:43:32

AdVent database info:
  Countries                    = 1
  TV channels                  = 14 (avg. ~= 14 per country)
  Jingles                      = 412 (avg. ~= 29 per TV channel)
  Pure entry / entry jingles   = 94 / 293
  Pure exit / exit jingles     = 119 / 318
  No action jingles            = 0
  Time coverage from           = 2022-02-05
  Time coverage till           = 2024-10-05
(advent-pyenv) $
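
The jingle counts in the output above can be cross-checked with a little arithmetic. The sketch below assumes (this is an interpretation, not something the tool documents here) that "entry jingles" counts every jingle with the entry flag set, while "pure entry" counts those with only the entry flag, and likewise for exit.

```python
# Figures copied from the "AdVent database info" output above.
total = 412
pure_entry, entry = 94, 293
pure_exit, exit_ = 119, 318
no_action = 0

# Assumption: jingles flagged as both entry and exit are the difference
# between the full count and the "pure" count, on either side.
both_from_entry = entry - pure_entry
both_from_exit = exit_ - pure_exit
print(both_from_entry, both_from_exit)  # 199 199

# Every jingle is then entry-only, exit-only, both, or no-action.
print(pure_entry + pure_exit + both_from_entry + no_action)  # 412
```

Under that reading the two sides agree and the categories add up to the total number of jingles.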

Database Population or Update for Regular Users

It is assumed that you have installed AdVent. Make sure to switch to its virtual environment:

$ source advent-pyenv/bin/activate

where advent-pyenv is the folder created during AdVent installation.

To pull the latest updates into your database:

  1. Download the latest snapshot (clone, pull or get a zip):
(advent-pyenv) $ git clone https://github.com/denis-stepanov/advent-db.git
(advent-pyenv) $ cd advent-db/DB
  2. (recommended) Delete countries or TV channels you do not expect to watch. This decreases CPU load and reduces the number of false positives while running AdVent:
(advent-pyenv) $ rm -r VA # ....
  3. Import the update:
(advent-pyenv) $ db-djv-pg import -s .

Import might take a few minutes (depending on how much your database is out of sync).

Exit Jingles

Some TV channels are known to use multiple jingles during ad breaks (e.g., 1-2-3 instead of the regular 1-2), at least in prime time. While this information could potentially be added to the database, it is currently not provided. You need to pass the number of exit jingles (1 by default) to AdVent using its -j option, or change it at run time using the j / J keys. The table below lists the affected channels and their preferences.

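| TV Channel        | Jingle Pattern   | N of Exit Jingles (-j) |
|-------------------|------------------|------------------------|
| FR/6TER           | 1-2              | 1                      |
| FR/6TER           | 1-2-3            | 2                      |
| FR/M6             | 1-2              | 1                      |
| FR/M6             | 1-2-3            | 2                      |
| FR/RMCSTORY       | 1-2-3            | 2                      |
| FR/TF1            | 1-2-3            | 2                      |
| FR/TF1SERIESFILMS | 1-2-3            | 2                      |
| FR/TFX            | 1-2+3-4          | 2                      |
| FR/TMC            | 1-2-3            | 2                      |
| FR/W9             | 1-2              | 1                      |
| FR/W9             | 1-2(truncated)-3 | 2                      |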

If You Want to Create Your Own Hashes, Read Further

You will need:

  1. an installed and configured AdVent (only the sound capturing part; the TV control module is not required);
  2. a basic audio editor capable of track trimming (plus some familiarity with it).

Make sure to switch to the AdVent virtual environment:

$ source advent-pyenv/bin/activate

where advent-pyenv is the folder created during AdVent installation.

The example below is given for Fedora Linux, with AdVent configured to listen to PulseAudio output (see "Capturing a TV Webcast") and Audacity as the sound editor.

Step 1: Record TV Audio Containing Ads

As far as possible, stick to the same audio source that you will use when running AdVent. Usually the simplest way, which avoids messing with cables, is to record a portion of a TV broadcast from the Internet. If you have a choice between analog and digital recording, always prefer digital. Stereo sources are OK and even preferred (Dejavu will treat them as two-in-one).

The preferred audio format is PCM (WAV), 16-bit little-endian signed, 44.1 kHz, 2 channels (stereo). Many audio tools produce this by default. Other formats or parameters have not been tested and may or may not work.
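
To verify that a recording matches the preferred format, the parameters can be read back with Python's standard `wave` module. This is a minimal sketch and not part of AdVent; the function name `check_wav_format` is hypothetical. The `wave` module reads uncompressed PCM only, and 16-bit PCM samples in a WAV file are little-endian signed by definition, so checking the sample width covers that part.

```python
import wave

def check_wav_format(path):
    """Return a list of deviations from the preferred format (empty if none)."""
    problems = []
    # wave.open() raises wave.Error for files that are not uncompressed PCM WAV.
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            problems.append(f"{8 * wav.getsampwidth()}-bit samples, expected 16-bit")
        if wav.getframerate() != 44100:
            problems.append(f"{wav.getframerate()} Hz, expected 44100 Hz")
        if wav.getnchannels() != 2:
            problems.append(f"{wav.getnchannels()} channel(s), expected 2")
    return problems
```

For example, `check_wav_format("6ter_20220725.wav")` should return an empty list for a file recorded with the `parecord` defaults shown in this section.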

If you configured AdVent for PulseAudio input, you should be OK to record with default settings:

(advent-pyenv) $ parecord -v 6ter_20220725.wav
Opening a recording stream with sample specification 's16le 2ch 44100Hz' and channel map 'front-left,front-right'.
Connection established.
Stream successfully created.
Buffer metrics: maxlength=4194304, fragsize=352800
Using sample spec 's16le 2ch 44100Hz', channel map 'front-left,front-right'.
Connected to device alsa_output.pci-0000_00_1b.0.analog-stereo.monitor (index: 45, suspended: no).
Time: 4.608 sec; Latency: 608164 usec.
...
(stop with Ctrl-C)
(advent-pyenv) $

You can also use a recorder of your choice, including Audacity. The file name is not important at this point; here it hints at the channel and the date of recording.

Step 2: Single Out a Jingle of Interest

Load the recording into Audacity: File > Open.... Seek through the track to locate the ad jingle; use zoom if needed. Select the desired interval by clicking in the track area at the jingle start and dragging the mouse to the jingle end. Whenever possible, keep your jingle recording at least two seconds long; anything below one second will not work and should be avoided. Take note of the jingle position (before or after the ads).
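
Once a jingle has been exported as WAV, its length can be checked against the two thresholds mentioned above (two seconds recommended, one second minimum). This is an illustrative sketch using the standard `wave` module; the function names and the rating labels are hypothetical.

```python
import wave

def jingle_duration(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()

def rate_duration(seconds):
    """Rate a jingle length against the guidance given in the text."""
    if seconds < 1.0:
        return "too short"  # below one second: will not work
    if seconds < 2.0:
        return "short"      # usable, but two seconds or more is recommended
    return "ok"
```

For example, `rate_duration(jingle_duration("FR_6TER_220725_ELEMENTARY1_1.wav"))` rates the trimmed jingle from this walkthrough.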

(Figure: Selecting a jingle in Audacity)

At this point it is recommended to test the jingle with AdVent to make sure that you are not looking at a piece that is already known. Run AdVent in a console, then press Play in Audacity.

(advent-pyenv) $ advent -t nil
AdVent v1.6.1
TV control is nil with action 'mute' for 600 s max and 1 exit jingle
TV status: unmuted, volume: 100
Recognition interval is 2 s with confidence of 10%
Started 2 listening thread(s)
Type 'h' for help
....:::oooo:o.::ooooo:.......

If AdVent does not detect your jingle during playback (no "hit" printed), you are good to continue. Trim the track: Edit > Remove Special > Trim Audio; then shift the result to the beginning: Tracks > Align Tracks > Start to Zero. Export the track: File > Export > Export as WAV. Give it a name following the naming convention described below (in this case, something like FR_6TER_220725_ELEMENTARY1_1.wav); leave the encoding at Signed 16-bit PCM (default). There is no need to fill in any metadata; just click OK.

You can leave AdVent running until the end of the session; it will seamlessly pick up changes in the database.

Step 3: Generate a Hash

Fingerprint the jingle using the standard Dejavu approach:

(advent-pyenv) $ dejavu -f . wav
Fingerprinting all .wav files in the . directory
Fingerprinting channel 1/2 for ./FR_6TER_220725_ELEMENTARY1_1.wav
Finished channel 1/2 for ./FR_6TER_220725_ELEMENTARY1_1.wav
Fingerprinting channel 2/2 for ./FR_6TER_220725_ELEMENTARY1_1.wav
Finished channel 2/2 for ./FR_6TER_220725_ELEMENTARY1_1.wav
(advent-pyenv) $ 

It is recommended (but not required) to keep already processed jingle WAV files around in case fingerprinting needs to be rerun. You can re-submit entire folders many times; Dejavu will recognize and skip records that have already been submitted.

(optional) At this point you can undo your changes in Audacity (press Ctrl-Z twice) and press Play; AdVent running in the background should now recognize the track:

....:::oO
Hit: FR_6TER_220725_ELEMENTARY1_1
TV muted
OOOo...

Export the hash from the database for future use:

(advent-pyenv) $ db-djv-pg export FR_6TER_220725_ELEMENTARY1_1
FR_6TER_220725_ELEMENTARY1_1 13349
(advent-pyenv) $ 

Check the printed number, which is the number of fingerprints in the jingle. A good jingle might have 5000 or more fingerprints. Unfortunately, you do not have much freedom here, as the number of fingerprints depends on the length of the jingle and on the "richness" of its sound. Jingles with fewer than 500 fingerprints should not be admitted into a shared database, as they often produce false positives. Jingles with 500 to 1000 fingerprints are in the risk zone.
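
The thresholds above can be summarized in a small helper. This is only a sketch: the function name and labels are hypothetical, and the text does not rate the range between 1000 and 5000 fingerprints, so "acceptable" for that range is an assumption.

```python
def rate_fingerprint_count(count):
    """Rate a jingle by its number of fingerprints, following the guidance above."""
    if count < 500:
        return "reject"      # prone to false positives; keep out of a shared database
    if count <= 1000:
        return "risky"       # the "risk zone"
    if count >= 5000:
        return "good"
    return "acceptable"      # assumption: the text does not rate this range

print(rate_fingerprint_count(13349))  # the jingle exported above -> good
```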

If you consider that your hashes could be of use for others, please submit a pull request. Make sure you follow the folder structure defined in this repository and the file naming conventions.

Jingle Naming Convention

11_2..2_333333_4..4_5..5.ext
FR_TF1_220214_EVENING1_3.djv
  1. ISO 3166 two letter country code
  2. TV channel name
  3. Jingle capture date YYMMDD (approximate if unsure)
  4. Jingle name (free format, alphanumeric)
  5. Binary flags in decimal
    1. 0x1 - Jingle starts the ads (1 if unsure)
    2. 0x2 - Jingle ends the ads (1 if unsure)
    3. ...
  6. File extension indicates recognizer provider
    1. djv - Dejavu

If a jingle can be seen both at entry and exit, the flags would be 0x1 | 0x2 = 0x3 = decimal 3 (i.e., just 3 in the name).

Unless you have a specific reason, it is recommended to use capital letters only and to avoid any special characters (spaces, apostrophes, punctuation...).
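
As an illustration of the convention, the sketch below splits a jingle file name into its fields and decodes the flag bits. It is not AdVent code: the function and constant names are hypothetical, and the pattern assumes that channel and jingle names contain no underscores.

```python
import re
from datetime import datetime
from pathlib import Path

FLAG_ENTRY = 0x1  # jingle starts the ads
FLAG_EXIT = 0x2   # jingle ends the ads

# Assumption: channel and jingle names contain no underscores.
NAME_RE = re.compile(
    r"^(?P<country>[A-Z]{2})_(?P<channel>[^_]+)_(?P<date>\d{6})"
    r"_(?P<name>[^_]+)_(?P<flags>\d+)$"
)

def parse_jingle_name(file_name):
    """Split a jingle file name into the fields of the naming convention."""
    path = Path(file_name)
    match = NAME_RE.match(path.stem)
    if match is None:
        raise ValueError(f"not a valid jingle name: {file_name!r}")
    flags = int(match["flags"])
    return {
        "country": match["country"],
        "channel": match["channel"],
        # strptime raises ValueError for an impossible YYMMDD date
        "date": datetime.strptime(match["date"], "%y%m%d").date(),
        "name": match["name"],
        "is_entry": bool(flags & FLAG_ENTRY),
        "is_exit": bool(flags & FLAG_EXIT),
        "provider": path.suffix.lstrip("."),
    }

info = parse_jingle_name("FR_TF1_220214_EVENING1_3.djv")
print(info["channel"], info["date"], info["is_entry"], info["is_exit"])
# TF1 2022-02-14 True True
```

Flags are tested with bitwise AND and combined with bitwise OR, so a jingle seen at both entry and exit carries `FLAG_ENTRY | FLAG_EXIT`, which is 3.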

Jingle Hash File Format (.djv)

Variable CSV (comma-separated) format is used.

The first line describes the format:

<format>,<format_version>

The only supported format is djv. Current format version is 1. Backward compatibility (reading files of older formats) is supported; forward compatibility (reading files of newer formats) is not supported.

The second line describes the jingle:

<name>,<fingerprinted>,<file_hash>,<num_fingerprints>

name is the full name of the jingle, as defined above (without the file extension). It is expected (but not required) that the name stored in the file corresponds to the file name. fingerprinted is a 0/1 flag from the Dejavu database; it should read 1 in all regular AdVent usage scenarios. file_hash is the SHA1 hash of the audio file as submitted to Dejavu. num_fingerprints is the number of fingerprints generated for the jingle.

The remaining lines are individual fingerprints. Their number should correspond to the number of fingerprints defined.

<offset>,<hash>

offset is the offset of a fingerprint inside the jingle. hash is the fingerprint itself. It is normal to have several fingerprints for the same offset. The order of the fingerprint lines in the file is not important; for reproducibility of export they are ordered on save.

Example of a file FR_6TER_220725_ELEMENTARY1_1.djv:

djv,1
FR_6TER_220725_ELEMENTARY1_1,1,72e621563a21a344ead619ef6bfe14fa6a2d219c,13349
0,1d65a451ac6be35206b8
0,2b4157cae1e958cfe6e7
(.. 13347 more lines ..)
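
The format is simple enough to read with a few lines of Python. The reader below is a minimal sketch based only on the description above, not the importer used by db-djv-pg; the function name `parse_djv` and the jingle name in the sample are hypothetical.

```python
import csv
import io

SUPPORTED_VERSION = 1

def parse_djv(text):
    """Parse the contents of a .djv file into a dictionary."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if len(rows) < 2:
        raise ValueError("format line or jingle line is missing")
    fmt, version = rows[0]
    if fmt != "djv":
        raise ValueError(f"unsupported format: {fmt}")
    if int(version) > SUPPORTED_VERSION:
        raise ValueError(f"format version {version} is newer than supported")
    name, fingerprinted, file_hash, num_fingerprints = rows[1]
    # Remaining lines are <offset>,<hash> pairs.
    fingerprints = [(int(offset), fp_hash) for offset, fp_hash in rows[2:]]
    if len(fingerprints) != int(num_fingerprints):
        raise ValueError("fingerprint count does not match the jingle line")
    return {
        "name": name,
        "fingerprinted": fingerprinted == "1",
        "file_hash": file_hash,
        "fingerprints": fingerprints,
    }

# Hypothetical two-fingerprint sample in the format described above.
sample = """djv,1
FR_TEST_220101_DEMO_1,1,72e621563a21a344ead619ef6bfe14fa6a2d219c,2
0,1d65a451ac6be35206b8
0,2b4157cae1e958cfe6e7
"""
jingle = parse_djv(sample)
print(jingle["name"], len(jingle["fingerprints"]))  # FR_TEST_220101_DEMO_1 2
```

The version check mirrors the compatibility rule stated earlier: older versions are accepted, newer ones are refused.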
