Support all voice encryption modes #599

Merged
merged 8 commits into from
May 20, 2024
4 changes: 2 additions & 2 deletions guides/advanced/multi_node.md
@@ -40,7 +40,7 @@ changing your application definition in `mix.exs` as follows:
mod: {MyBot.Application, []},
included_applications: [:nostrum],
# You can see this with `mix app.tree nostrum`
extra_applications: [:certifi, :gun, :inets, :jason, :kcl, :mime]
extra_applications: [:certifi, :gun, :inets, :jason, :mime]
# ...
]
end
@@ -53,7 +53,7 @@ as command frameworks like `:nosedrum`:
```elixir
defp deps do
[
{:nostrum, "~> 0.8", runtime: false},
{:nostrum, "~> 0.9", runtime: false},
# {:nosedrum, "~> 0.6", runtime: false},
]
end
57 changes: 57 additions & 0 deletions guides/functionality/voice.md
@@ -177,3 +177,60 @@ packets returned per invocation and the option to return the raw RTP packet. In
likely won't be missed when consuming incoming voice packets asynchronously.
Note that the third element in the event is of type
`t:Nostrum.Struct.VoiceWSState.t/0` and not `t:Nostrum.Struct.WSState.t/0`.

## Encryption Modes

Nostrum supports all of Discord's available encryption modes for voice channels.
The encryption mode is invisible to the user, and you will likely never need to touch it.

Different encryption modes may have different performance characteristics depending on the
hardware architecture your bot is running on. If you're interested, keep reading.

#### Encryption Mode Configuration Options

This is a compile-time configuration option. If you wish to set it,
do so in `config.exs` or one of its imported config files, *not* in `runtime.exs`.

```elixir
config :nostrum, :voice_encryption_mode, :aes256_gcm # Default
```

Available configuration options are as follows:
- `:xsalsa20_poly1305`
- `:xsalsa20_poly1305_suffix`
- `:xsalsa20_poly1305_lite`
- `:xsalsa20_poly1305_lite_rtpsize` *(not yet documented by Discord)*
- `:aead_xchacha20_poly1305_rtpsize` *(not yet documented by Discord)*
- `:aead_aes256_gcm` *(not yet documented by Discord)*
- `:aead_aes256_gcm_rtpsize` *(not yet documented by Discord)*
- `:xchacha20_poly1305` (alias for `:aead_xchacha20_poly1305_rtpsize`)
- `:aes256_gcm` (alias for `:aead_aes256_gcm_rtpsize`)

The first seven are Discord's available options, while the last two are shorter aliases.
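
The alias handling can be sketched as a simple lookup. This is a minimal illustration that assumes the aliases expand to the mode names listed above; `EncryptionModeSketch` and `discord_name/1` are hypothetical names, not part of Nostrum's public API.

```elixir
defmodule EncryptionModeSketch do
  # Hypothetical helper for illustration only.
  # Maps the two short aliases to the full Discord mode names.
  @aliases %{
    xchacha20_poly1305: "aead_xchacha20_poly1305_rtpsize",
    aes256_gcm: "aead_aes256_gcm_rtpsize"
  }

  # Aliases expand to the full name; every other mode is passed through as a string.
  def discord_name(mode) when is_atom(mode) do
    Map.get(@aliases, mode, Atom.to_string(mode))
  end
end

IO.puts(EncryptionModeSketch.discord_name(:aes256_gcm))
IO.puts(EncryptionModeSketch.discord_name(:xsalsa20_poly1305_lite))
```

Setting either an alias or the full name it expands to should therefore select the same mode.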

The last four of Discord's seven modes are not yet documented, but [will be soon](https://github.com/discord/discord-api-docs/pull/6801).

#### Implementation Details

The seven supported modes use three different ciphers between them. The remaining differences
are variations in how the nonce is determined and where the encrypted portion of the RTP packet begins.
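
The nonce variations can be sketched as follows. This is an illustration under assumptions rather than Nostrum's actual code: `NonceSketch` and its function names are hypothetical, and the three strategies shown are a header-derived nonce, a random nonce appended to the packet, and a 32-bit counter padded with zeros.

```elixir
defmodule NonceSketch do
  # Hypothetical module for illustration; the names are not Nostrum's API.

  # The 12-byte RTP header padded with 12 zero bytes to a 24-byte nonce.
  def from_header(<<header::binary-size(12)>>) do
    header <> <<0::size(12)-unit(8)>>
  end

  # 24 random bytes, which must be sent in full alongside the cipher text.
  def random, do: :crypto.strong_rand_bytes(24)

  # A 32-bit counter. Only the 4 counter bytes travel with the packet;
  # the cipher sees them zero-padded to its own nonce length.
  def from_counter(counter, nonce_length) when nonce_length >= 4 do
    unpadded = <<counter::32>>
    {unpadded, unpadded <> <<0::size(nonce_length - 4)-unit(8)>>}
  end
end

{unpadded, padded} = NonceSketch.from_counter(7, 12)
IO.inspect({byte_size(unpadded), byte_size(padded)}, label: "counter nonce sizes")
```

A counter nonce keeps the per-packet overhead at 4 bytes, whereas a random nonce costs 24 bytes per packet.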

Erlang's `:crypto` module is used wherever possible, because its ciphers are implemented as NIFs.

##### xsalsa20_poly1305

The entire Salsa20/XSalsa20 cipher is implemented in Elixir. The Poly1305 MAC function is handled by the `:crypto` module.
As a result, the `xsalsa20_poly1305` modes will likely have the slowest performance.
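
The MAC step on its own can be sketched with `:crypto.mac/3`. The key and cipher text below are placeholders: in the NaCl secretbox construction the one-time Poly1305 key is taken from the start of the XSalsa20 keystream, and that derivation is omitted here.

```elixir
# Placeholder one-time key. In secretbox it comes from the first 32 bytes
# of the XSalsa20 keystream; random bytes stand in for it in this sketch.
one_time_key = :crypto.strong_rand_bytes(32)

# Placeholder for the output of the Elixir XSalsa20 implementation.
cipher_text = "cipher text produced by XSalsa20"

# Poly1305 yields a 16-byte authentication tag over the cipher text.
tag = :crypto.mac(:poly1305, one_time_key, cipher_text)

IO.inspect(byte_size(tag), label: "tag bytes")
```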

##### xchacha20_poly1305

The `:crypto` module supports the `chacha20_poly1305` AEAD cipher. The only part implemented in Elixir
is the HChaCha20 hash function, which derives a sub-key from the key and the longer nonce that XChaCha20
specifies. That sub-key is then passed to the `chacha20_poly1305` cipher.
If your hardware lacks AES acceleration, the `:xchacha20_poly1305` mode may perform
best for you.
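
The final step, handing a sub-key to `chacha20_poly1305`, can be sketched with `:crypto.crypto_one_time_aead`. This is a sketch under assumptions: the HChaCha20 derivation is not shown, so a random 32-byte value stands in for the sub-key, and the header and payload are made-up placeholders.

```elixir
# Placeholder for the HChaCha20 output; a real sub-key is derived from the
# secret key and the first 16 bytes of the 24-byte nonce.
sub_key = :crypto.strong_rand_bytes(32)

# A 4-byte counter zero-padded to the 24 bytes that XChaCha20 expects.
xnonce = <<1::32, 0::size(20)-unit(8)>>

# In the XChaCha20 construction, the inner 12-byte nonce is four zero bytes
# followed by the last 8 bytes of the long nonce.
<<_hchacha_input::binary-size(16), nonce_tail::binary-size(8)>> = xnonce
nonce = <<0::32>> <> nonce_tail

# Made-up 12-byte RTP header, used as additional authenticated data.
header = <<0x80, 0x78, 1::16, 960::32, 1234::32>>
opus = "opus frame bytes"

{cipher_text, tag} =
  :crypto.crypto_one_time_aead(:chacha20_poly1305, sub_key, nonce, opus, header, true)

decrypted =
  :crypto.crypto_one_time_aead(:chacha20_poly1305, sub_key, nonce, cipher_text, header, tag, false)

IO.inspect(decrypted == opus, label: "round trip ok")
```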

##### aes256_gcm

The `:crypto` module fully supports AES-256 in GCM mode, so no cipher code is implemented in Elixir.
Many CPUs have hardware acceleration specifically for AES. For these reasons, Nostrum defaults to `aes256_gcm`.
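
A round trip through AES-256-GCM using only `:crypto` might look like the sketch below. The key, header, and payload are made-up placeholders. The nonce is a 4-byte counter padded to the 12 bytes GCM expects, and the RTP header is passed as additional authenticated data, so changing it makes decryption fail.

```elixir
key = :crypto.strong_rand_bytes(32)

# 4-byte counter zero-padded to a 12-byte GCM nonce.
nonce = <<42::32, 0::size(8)-unit(8)>>

# Made-up 12-byte RTP header, authenticated but not encrypted.
header = <<0x80, 0x78, 42::16, 960::32, 1234::32>>
opus = "opus frame bytes"

{cipher_text, tag} =
  :crypto.crypto_one_time_aead(:aes_256_gcm, key, nonce, opus, header, true)

plain =
  :crypto.crypto_one_time_aead(:aes_256_gcm, key, nonce, cipher_text, header, tag, false)

# Decrypting with a different header fails authentication and returns :error.
tampered =
  :crypto.crypto_one_time_aead(:aes_256_gcm, key, nonce, cipher_text, <<0::96>>, tag, false)

IO.inspect({plain == opus, tampered}, label: "round trip, tampered")
```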
10 changes: 6 additions & 4 deletions guides/intro/intro.md
@@ -5,10 +5,10 @@ nostrum is an Elixir library that can be used to interact with Discord.
To see documentation about a specific part of the library, please visit one of
the following:

* [API](api.html) - Methods to interact with the RESTful API (and some other goodies).
* [API](api-1.html) - Methods to interact with the RESTful API (and some other goodies).
* [State](state.html) - Caches that keep information from Discord fresh at your disposal.
* [Events](event_handling.html) - Handling events from Discord as they come in.
* [Voice](voice.html) - Playing audio through Discord voice channels.
* [Voice](voice-2.html) - Playing audio through Discord voice channels.

## Setup

@@ -67,15 +67,17 @@ Apart from the `token` field mentioned above, the following fields are also supp
livestream audio with streamlink support. Defaults to `"streamlink"`.
- `audio_timeout` - Milliseconds that input must begin generating audio by
upon invoking `play`. More information about this option can be found in the
[voice](./voice.html) documentation page. Defaults to `20_000` (20s).
[voice](./voice-2.html) documentation page. Defaults to `20_000` (20s).
- `audio_frames_per_burst` - Number of opus frames to send at a time while
playing audio. More information about this option can be found in the
[voice](./voice.html) documentation page. Defaults to `10`.
[voice](./voice-2.html) documentation page. Defaults to `10`.
- `voice_auto_connect` - This will determine if Nostrum automatically connects
to voice websockets gateways upon joining voice channels. If set to `false`
but you still wish to connect to the voice gateway, you can do so manually
by calling `Nostrum.Voice.connect_to_gateway/1` after joining a voice
channel. Defaults to `true`.
- `voice_encryption_mode` - Defaults to `:aes256_gcm`. More information about this
option can be found [here](./voice-2.html#encryption-modes).


### Development & debugging
21 changes: 5 additions & 16 deletions lib/nostrum/voice/audio.ex
@@ -7,11 +7,10 @@ defmodule Nostrum.Voice.Audio do
alias Nostrum.Struct.VoiceState
alias Nostrum.Util
alias Nostrum.Voice
alias Nostrum.Voice.Crypto
alias Nostrum.Voice.Opus
alias Nostrum.Voice.Ports

@encryption_mode "xsalsa20_poly1305"

# Default value
@frames_per_burst 10

@@ -20,8 +19,6 @@ defmodule Nostrum.Voice.Audio do
@ytdl "youtube-dl"
@streamlink "streamlink"

def encryption_mode, do: @encryption_mode

def ffmpeg_executable, do: Application.get_env(:nostrum, :ffmpeg, @ffmpeg)
def youtubedl_executable, do: Application.get_env(:nostrum, :youtubedl, @ytdl)
def streamlink_executable, do: Application.get_env(:nostrum, :streamlink, @streamlink)
@@ -40,13 +37,6 @@ defmodule Nostrum.Voice.Audio do
>>
end

def encrypt_packet(%VoiceState{} = voice, data) do
header = rtp_header(voice)
# 12 byte header + 12 null bytes
nonce = header <> <<0::8*12>>
header <> Kcl.secretbox(data, nonce, voice.secret_key)
end

def open_udp do
{:ok, socket} =
:gen_udp.open(0, [
@@ -58,17 +48,16 @@ defmodule Nostrum.Voice.Audio do
socket
end

def get_rtp_packet(%VoiceState{secret_key: key, udp_socket: socket} = v) do
def get_rtp_packet(%VoiceState{udp_socket: socket} = v) do
{:ok, {_ip, _port, payload}} = :gen_udp.recv(socket, 1024)

case payload do
# Skip RTCP packets
<<2::2, 0::1, 1::5, 201::8, _rest::binary>> ->
get_rtp_packet(v)

<<header::binary-size(12), data::binary>> ->
nonce = header <> <<0::8*12>>
{header, Kcl.secretunbox(data, nonce, key)}
<<header::bytes-size(12), _::binary>> = data ->
{header, Crypto.decrypt(v, data)}
end
end

@@ -144,7 +133,7 @@ defmodule Nostrum.Voice.Audio do
v.udp_socket,
v.ip |> ip_to_tuple(),
v.port,
encrypt_packet(v, f)
Crypto.encrypt(v, f)
)

%{
209 changes: 209 additions & 0 deletions lib/nostrum/voice/crypto.ex
@@ -0,0 +1,209 @@
defmodule Nostrum.Voice.Crypto do
@moduledoc false

alias Nostrum.Struct.VoiceState
alias Nostrum.Struct.VoiceWSState
alias Nostrum.Voice.Audio
alias Nostrum.Voice.Crypto.Aes
alias Nostrum.Voice.Crypto.Chacha
alias Nostrum.Voice.Crypto.Salsa

@type cipher_rtpsize ::
:xsalsa20_poly1305_lite_rtpsize
| :aead_xchacha20_poly1305_rtpsize
| :aead_aes256_gcm_rtpsize

@type cipher_alias :: :aes256_gcm | :xchacha20_poly1305

@type cipher_non_rtpsize ::
:xsalsa20_poly1305
| :xsalsa20_poly1305_suffix
| :xsalsa20_poly1305_lite
| :aead_aes256_gcm

@type cipher :: cipher_non_rtpsize() | cipher_alias() | cipher_rtpsize()

@mode Application.compile_env(:nostrum, :voice_encryption_mode, :aes256_gcm)

@mode_string Map.get(
%{
xchacha20_poly1305: "aead_xchacha20_poly1305_rtpsize",
aes256_gcm: "aead_aes256_gcm_rtpsize"
},
@mode,
"#{@mode}"
)

def encryption_mode, do: @mode_string

def encrypt(voice, data) do
header = Audio.rtp_header(voice)
unquote(:"encrypt_#{@mode}")(voice, data, header)
end

def decrypt(%VoiceState{secret_key: key}, data), do: decrypt(key, data)
def decrypt(%VoiceWSState{secret_key: key}, data), do: decrypt(key, data)
def decrypt(key, data), do: unquote(:"decrypt_#{@mode}")(key, data)

def encrypt_xsalsa20_poly1305(%VoiceState{secret_key: key}, data, header) do
nonce = header <> <<0::unit(8)-size(12)>>

[header, Salsa.encrypt(data, key, nonce)]
end

def encrypt_xsalsa20_poly1305_suffix(%VoiceState{secret_key: key}, data, header) do
nonce = :crypto.strong_rand_bytes(24)

[header, Salsa.encrypt(data, key, nonce), nonce]
end

def encrypt_xsalsa20_poly1305_lite(%VoiceState{secret_key: key} = voice, data, header) do
{unpadded_nonce, nonce} = lite_nonce(voice)

[header, Salsa.encrypt(data, key, nonce), unpadded_nonce]
end

def encrypt_xsalsa20_poly1305_lite_rtpsize(voice, data, header),
do: encrypt_xsalsa20_poly1305_lite(voice, data, header)

def encrypt_xchacha20_poly1305(voice, data, header),
do: encrypt_aead_xchacha20_poly1305_rtpsize(voice, data, header)

def encrypt_aead_xchacha20_poly1305_rtpsize(%VoiceState{secret_key: key} = voice, data, header) do
{unpadded_nonce, nonce} = lite_nonce(voice)

[header, Chacha.encrypt(data, key, nonce, _aad = header), unpadded_nonce]
end

def encrypt_aead_aes256_gcm(voice, data, header), do: encrypt_aes256_gcm(voice, data, header)

def encrypt_aead_aes256_gcm_rtpsize(voice, data, header),
do: encrypt_aes256_gcm(voice, data, header)

def encrypt_aes256_gcm(%VoiceState{secret_key: key} = voice, data, header) do
{unpadded_nonce, nonce} = lite_nonce(voice, 12)

[header, Aes.encrypt(data, key, nonce, _aad = header), unpadded_nonce]
end

def decrypt_xsalsa20_poly1305(key, <<header::bytes-size(12), cipher_text::binary>>) do
nonce = header <> <<0::unit(8)-size(12)>>

Salsa.decrypt(cipher_text, key, nonce)
end

def decrypt_xsalsa20_poly1305_lite(key, data) do
{_header, cipher_text, _tag = <<>>, nonce} = decode_packet(data, 4, 24, 0)

Salsa.decrypt(cipher_text, key, nonce)
end

def decrypt_xsalsa20_poly1305_suffix(key, data) do
{_header, cipher_text, _tag = <<>>, nonce} = decode_packet(data, 24, 24, 0)

Salsa.decrypt(cipher_text, key, nonce)
end

def decrypt_xsalsa20_poly1305_lite_rtpsize(key, data) do
{_header, cipher_text, _tag, nonce, ext_len} = decode_packet_rtpsize(data, 24, 0)

<<_exts::unit(32)-size(ext_len), opus::binary>> = Salsa.decrypt(cipher_text, key, nonce)

opus
end

def decrypt_xchacha20_poly1305(key, data),
do: decrypt_aead_xchacha20_poly1305_rtpsize(key, data)

def decrypt_aead_xchacha20_poly1305_rtpsize(key, data) do
{header, cipher_text, tag, nonce, ext_len} = decode_packet_rtpsize(data, 24, 16)

<<_exts::unit(32)-size(ext_len), opus::binary>> =
Chacha.decrypt(cipher_text, key, nonce, _aad = header, tag)

opus
end

def decrypt_aes256_gcm(key, data), do: decrypt_aead_aes256_gcm_rtpsize(key, data)

def decrypt_aead_aes256_gcm_rtpsize(key, data) do
{header, cipher_text, tag, nonce, ext_len} = decode_packet_rtpsize(data, 12, 16)

<<_exts::unit(32)-size(ext_len), opus::binary>> =
Aes.decrypt(cipher_text, key, nonce, _aad = header, tag)

opus
end

def decrypt_aead_aes256_gcm(key, data) do
{header, cipher_text, tag, nonce} = decode_packet(data, 4, 12, 16)

Aes.decrypt(cipher_text, key, nonce, _aad = header, tag)
end

@lite_nonce_length 4

defp lite_nonce(%VoiceState{rtp_sequence: rtp_sequence}, nonce_length \\ 24) do
unpadded_nonce = <<rtp_sequence::32>>
nonce = unpadded_nonce <> <<0::unit(8)-size(nonce_length - @lite_nonce_length)>>
{unpadded_nonce, nonce}
end

# Discord's newer encryption modes ending in '_rtpsize' leave the first 4 bytes of the RTP
# header extension in plaintext while encrypting the elements themselves. The AAD is the
# 12-byte RTP header concatenated with the first 4 bytes of the RTP header extension.

# Much like is done within the function `Nostrum.Voice.Opus.strip_rtp_ext/1`, we pattern match
# on the `0xBEDE` constant and the 16-bit big-endian extension length that denotes the length
# in 32-bit words of the extension elements. Because the elements are a part of the cipher text,
# the extension length is the number of 32-bit words to discard after decryption to obtain
# solely the opus packet.

# This function returns a 5-element tuple with
# - RTP header
# - Fixed 12 byte header concatenated with the first 4 bytes of the extension
# - Used as the AAD for AEAD ciphers
# - cipher text
# - RTP extension elements prepended to the opus packet
# - cipher tag (MAC)
# - nonce (padded)
# - RTP header extension length
# - for isolating the opus after decryption
defp decode_packet_rtpsize(
<<header::bytes-size(12), 0xBE, 0xDE, ext_len::integer-16, rest::binary>>,
nonce_length,
tag_length
)
when byte_size(rest) - (@lite_nonce_length + tag_length) > ext_len * 4 do
header = header <> <<0xBE, 0xDE, ext_len::integer-16>>

{cipher_text, tag, unpadded_nonce} = split_data(rest, @lite_nonce_length, tag_length)

nonce = unpadded_nonce <> <<0::unit(8)-size(nonce_length - @lite_nonce_length)>>

{header, cipher_text, tag, nonce, ext_len}
end

# Non "rtpsize" modes where everything is encrypted beyond the 12-byte header
defp decode_packet(
<<header::bytes-size(12), rest::binary>>,
unpadded_nonce_length,
nonce_length,
tag_length
) do
{cipher_text, tag, unpadded_nonce} = split_data(rest, unpadded_nonce_length, tag_length)

nonce = unpadded_nonce <> <<0::unit(8)-size(nonce_length - unpadded_nonce_length)>>

{header, cipher_text, tag, nonce}
end

defp split_data(data, unpadded_nonce_length, tag_length) do
cipher_text_length = byte_size(data) - (unpadded_nonce_length + tag_length)

<<cipher_text::bytes-size(cipher_text_length), tag::bytes-size(tag_length),
unpadded_nonce::bytes-size(unpadded_nonce_length)>> = data

{cipher_text, tag, unpadded_nonce}
end
end