Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Driver/Input: Migrate audio backend to Symphonia #89

Merged
merged 215 commits into from
Jul 23, 2022

Conversation

FelixMcFelix
Copy link
Member

@FelixMcFelix FelixMcFelix commented Aug 23, 2021

This begins work on #67, which will be a huge step towards purely in-process audio processing (i.e., one day eliminating ffmpeg).

  • Audiopus-based Symphonia codec and DCA1 framing.
  • Ensure that Opus packet passthrough can be performed through Symphonia.
  • Port mixing infrastructure (and packet passthrough).
  • Rework input interface.
  • Implement basic DASH handling for youtube and other multimedia handlers extracted from youtube-dl.
  • Investigate upstreaming MKV support to Symphonia.
  • Port our stream-cacheing wrappers to this ecosystem.

There will be some slight issues stemming from the fact that we will now be parsing metadata from bytestreams as they arrive:

  • How will metadata be accessible to users before it enters the mixer? Is this just a spawn_blocking on the initial Format parse phase? Pass messages out from the main mixer context?
  • Users will need to be able to both pass in raw MediaSourceStreams, Reads, and fully processed (Format, Metadata, ...) sets -- due to the above, and as some extra metadata will only be exposed by youtube-dl, for instance.
  • Until MKV support is impl'd, we'll probably need to peek at bytes internally and pass over to ffmpeg on a decode failure. Having such a fallback should probably be toggleable.
  • We'll probably need to offer one or two wrappers to hand over AsyncReads into the mixing system (as well as automating extension/MIME type extraction from Files or HTTP responses).

Adds generics to any `Id` types on `Call`. Also includes the overlooked `Songbird::get_or_insert`.

Closes serenity-rs#94. Tested using `cargo make ready`.
This matches a recent serenity change where `ClientBuilder` no longer has an explicit lifetime parameter.

This was tested using `cargo make ready`.
MP3s now work great under the convert -> resample -> mix pipeline, even across unclean frame boundaries. Main issue seemed to be the number of internal subchunks in the resampler, which is still totally opaque... Oh well.

Actual instantiation of Lazy->Wrapped and Wrapped->Parsed elements in the driver is not yet covered.

Other formats are broken due to handling of the default track id, which is currently a) messy, and b) incorrect.
Oggs have some frames whose length is 0, yet must be decoded. I assume this covers the entire coder delay.
Tested with a URL extracted from bandcamp via youtube-dl, also includes the scaffolding necessary to have Reads/Seeks pass between sync/async boundaries.

This format is MP3, streamed in over an HTTP request via reqwest. Seeks not tested with e.g. Async Files -- they are programmed, though.
@FelixMcFelix
Copy link
Member Author

Rough status update: we now have a decent mixing pipeline for audio files of different formats, samples rates, packet lengths etc. into a single buffer. This is horrifically messy. Opus packet passthrough isn't there yet, but should be possible with some care around codec resets. There are some issues around track data cleanup right now.

We also have a wrapper for running seeks/reads over async audio sources (i.e., reqwest calls) and passing those bytes in via ring buffer, which seem to be working great for exposing some amount of flexibility around input sources. In tandem, mkv support in symphonia seems to be coming along well -- hopefully with that youtube-dl should integrate nicely to allow e.g. WebMs. I hope that will come down to simple parsing, with occasional renegotiation for a new DASH source if we don't want the stream to cut out halfway through.

Moves all sync creation/parsing/seeking etc. over to an elastically-sized thread pool. Since basically everything is now a restartable, this means that the `ForwardOnly` distinction can be handled really cleanly in general and we can try to recreate sources as needed.
Asks for any streams without webm, since they're likely to be golden right now.
FelixMcFelix and others added 8 commits July 11, 2022 16:56
This is most relevant for queue users -- an extra track would cause opus passthrough to end, even though only one track was being actively used in mixing.
Corollary: this same new clippy lint adds a ton of false positives which *will* fail to compile.
Will leave 're-computing filesize guesses' to a future issue.
@FelixMcFelix FelixMcFelix marked this pull request as ready for review July 23, 2022 22:15
@FelixMcFelix
Copy link
Member Author

I've put a fairly in-depth review and commenting round into the core driver changes, and I've had a more cursory look over the rest of the changes (inputs, adapters, tracks). I'll be merging shortly -- between this and the adventurous use in prod from a handful of users I think this is really quite stable now. Most importantly, it's fast and it's lightweight as a pure Rust implementation should be.

One caveat is that we're waiting on one symphonia bug fix being merged and released before this can go to crates.io, at some point this might need to be pushed out with a disclaimer that the current symphonia fork should be Cargo patched in instead. Hopefully this won't be the case!

@FelixMcFelix FelixMcFelix merged commit 5547de2 into serenity-rs:next Jul 23, 2022
FelixMcFelix added a commit that referenced this pull request Jul 26, 2022
This extensive PR rewrites the internal mixing logic of the driver to use symphonia for parsing and decoding audio data, and rubato to resample audio. Existing logic to decode DCA and Opus formats/data have been reworked as plugins for symphonia. The main benefit is that we no longer need to keep yt-dlp and ffmpeg processes alive, saving a lot of memory and CPU: all decoding can be done in Rust! In exchange, we now need to do a lot of the HTTP handling and resumption ourselves, but this is still a huge net positive.

`Input`s have been completely reworked such that all default (non-cached) sources are lazy by default, and are no longer covered by a special-case `Restartable`. These now span a gamut from a `Compose` (lazy), to a live source, to a fully `Parsed` source. As mixing is still sync, this includes adapters for `AsyncRead`/`AsyncSeek`, and HTTP streams.

`Track`s have been reworked so that they only contain initialisation state for each track. `TrackHandles` are only created once a `Track`/`Input` has been handed over to the driver, replacing `create_player` and related functions. `TrackHandle::action` now acts on a `View` of (im)mutable state, and can request seeks/readying via `Action`.

Per-track event handling has also been improved -- we can now determine and propagate the reason behind individual track errors due to the new backend. Some `TrackHandle` commands (seek etc.) benefit from this, and now use internal callbacks to signal completion.

Due to associated PRs on felixmcfelix/songbird from avid testers, this includes general clippy tweaks, API additions, and other repo-wide cleanup. Thanks go out to the below co-authors.

Co-authored-by: Gnome! <45660393+GnomedDev@users.noreply.github.com>
Co-authored-by: Alakh <36898190+alakhpc@users.noreply.github.com>
FelixMcFelix added a commit to FelixMcFelix/songbird that referenced this pull request Nov 20, 2023
This extensive PR rewrites the internal mixing logic of the driver to use symphonia for parsing and decoding audio data, and rubato to resample audio. Existing logic to decode DCA and Opus formats/data have been reworked as plugins for symphonia. The main benefit is that we no longer need to keep yt-dlp and ffmpeg processes alive, saving a lot of memory and CPU: all decoding can be done in Rust! In exchange, we now need to do a lot of the HTTP handling and resumption ourselves, but this is still a huge net positive.

`Input`s have been completely reworked such that all default (non-cached) sources are lazy by default, and are no longer covered by a special-case `Restartable`. These now span a gamut from a `Compose` (lazy), to a live source, to a fully `Parsed` source. As mixing is still sync, this includes adapters for `AsyncRead`/`AsyncSeek`, and HTTP streams.

`Track`s have been reworked so that they only contain initialisation state for each track. `TrackHandles` are only created once a `Track`/`Input` has been handed over to the driver, replacing `create_player` and related functions. `TrackHandle::action` now acts on a `View` of (im)mutable state, and can request seeks/readying via `Action`.

Per-track event handling has also been improved -- we can now determine and propagate the reason behind individual track errors due to the new backend. Some `TrackHandle` commands (seek etc.) benefit from this, and now use internal callbacks to signal completion.

Due to associated PRs on felixmcfelix/songbird from avid testers, this includes general clippy tweaks, API additions, and other repo-wide cleanup. Thanks go out to the below co-authors.

Co-authored-by: Gnome! <45660393+GnomedDev@users.noreply.github.com>
Co-authored-by: Alakh <36898190+alakhpc@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Will either cause existing code to fail to compile, or cause substantial behaviour changes enhancement New feature or request input Relates to raw audio data handling. tracks Relates to audio control, state, or lifecycle management.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants