Kraigie · jchristgit · May 20, 2024 · May 16, 2024 · May 17, 2024 · May 17, 2024
diff --git a/guides/advanced/multi_node.md b/guides/advanced/multi_node.md
@@ -40,7 +40,7 @@ changing your application definition in `mix.exs` as follows:
       mod: {MyBot.Application, []},
       included_applications: [:nostrum],
       # You can see this with `mix app.tree nostrum`
-      extra_applications: [:certifi, :gun, :inets, :jason, :kcl, :mime]
+      extra_applications: [:certifi, :gun, :inets, :jason, :mime]
       # ...
     ]
   end
@@ -53,7 +53,7 @@ as command frameworks like `:nosedrum`:
 ```elixir
   defp deps do
     [
-      {:nostrum, "~> 0.8", runtime: false},
+      {:nostrum, "~> 0.9", runtime: false},
       # {:nosedrum, "~> 0.6", runtime: false},
     ]
   end

diff --git a/guides/functionality/voice.md b/guides/functionality/voice.md
@@ -177,3 +177,60 @@ packets returned per invocation and the option to return the raw RTP packet. In
 likely won't be missed when consuming incoming voice packets asynchronously.
 Note that the third element in the event is of type
 `t:Nostrum.Struct.VoiceWSState.t/0` and not `t:Nostrum.Struct.WSState.t/0`.
+
+## Encryption Modes
+
+Nostrum supports all of Discord's available encryption modes for voice channels.
+The encryption mode is invisible to the user, and you will likely never need to touch it.
+
+Different encryption modes may have different performance characteristics depending on the
+hardware architecture your bot is running on. If you're interested, keep reading.
+
+#### Encryption Mode Configuration Options
+
+This is a compile-time configuration option, so should you wish to set it,
+do it in `config.exs` or one of its imported config files, *not* `runtime.exs`.
+
+```elixir
+config :nostrum, :voice_encryption_mode, :aes256_gcm # Default
+```
+
+Available configuration options are as follows:
+- `:xsalsa20_poly1305`
+- `:xsalsa20_poly1305_suffix`
+- `:xsalsa20_poly1305_lite`
+- `:xsalsa20_poly1305_lite_rtpsize` *(not yet documented by Discord)*
+- `:aead_xchacha20_poly1305_rtpsize` *(not yet documented by Discord)*
+- `:aead_aes256_gcm` *(not yet documented by Discord)*
+- `:aead_aes256_gcm_rtpsize` *(not yet documented by Discord)*
+- `:xchacha20_poly1305` (alias for `:aead_xchacha20_poly1305_rtpsize`)
+- `:aes256_gcm` (alias for `:aead_aes256_gcm_rtpsize`)
+
+The first seven are Discord's available options, while the last two are shorter aliases.
+
+The latter four of Discord's seven modes are not yet documented, but [will be soon](https://github.com/discord/discord-api-docs/pull/6801).
+
+#### Implementation Details
+
+The seven supported modes use three different ciphers. The remaining differences are
+variations in how the nonce is determined and where the encrypted portion of the RTP packet begins.
+
+Erlang's `:crypto` module is used wherever possible, since its ciphers are implemented as NIFs.
+
+##### xsalsa20_poly1305
+
+The entire Salsa20/XSalsa20 cipher is implemented in Elixir. The Poly1305 MAC function is handled by the `:crypto` module.
+As a result, the `xsalsa20_poly1305` modes will likely have the slowest performance.
+
+##### xchacha20_poly1305
+
+The `:crypto` module supports the `chacha20_poly1305` AEAD cipher. The only thing implemented in Elixir
+is the HChaCha20 hash function, which generates a sub-key from the key and the longer nonce that XChaCha20
+specifies. The sub-key is then passed to the `chacha20_poly1305` cipher.
+If your hardware doesn't have AES hardware acceleration, the `xchacha20_poly1305` option may perform
+the best for you.
+
+##### aes256_gcm
+
+The `:crypto` module completely supports AES256 in GCM mode, requiring no implementation in Elixir.
+Many CPUs have hardware acceleration specifically for AES. For these reasons, Nostrum defaults to `aes256_gcm`.
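
The nonce variations and the AES path described in the guide above can be sketched as follows. This is a minimal illustration, not Nostrum's implementation: the module `VoiceCryptoSketch` and its function names are hypothetical, and the `Aes` helper that the PR calls is not shown in this diff, so the `:crypto.crypto_one_time_aead/6,7` calls below are an assumption about how such a helper could work.

```elixir
defmodule VoiceCryptoSketch do
  # Hypothetical module for illustration only; not part of Nostrum.

  # Header-derived nonce: the 12-byte RTP header padded with zeros to 24 bytes.
  def header_nonce(<<header::binary-size(12)>>), do: header <> <<0::size(96)>>

  # Counter-derived nonce: a 32-bit big-endian counter padded with zeros up to
  # the length the cipher expects (24 bytes for XSalsa20/XChaCha20, 12 for AES-GCM).
  # Returns both forms, since only the unpadded counter travels in the packet.
  def counter_nonce(counter, nonce_length) do
    unpadded = <<counter::32>>
    {unpadded, unpadded <> <<0::unit(8)-size(nonce_length - 4)>>}
  end

  # AES-256-GCM via :crypto, authenticating the RTP header as additional data.
  # Returns {cipher_text, tag}.
  def aes_encrypt(plain, key, nonce, aad) do
    :crypto.crypto_one_time_aead(:aes_256_gcm, key, nonce, plain, aad, true)
  end

  # Returns the plain text, or :error if the tag or the AAD does not verify.
  def aes_decrypt(cipher_text, key, nonce, aad, tag) do
    :crypto.crypto_one_time_aead(:aes_256_gcm, key, nonce, cipher_text, aad, tag, false)
  end
end

# Round trip with a made-up key, header, and payload.
key = :crypto.strong_rand_bytes(32)
header = <<0x80, 0x78, 1::16, 0::32, 42::32>>
{_unpadded, nonce} = VoiceCryptoSketch.counter_nonce(1, 12)
{cipher_text, tag} = VoiceCryptoSketch.aes_encrypt("opus frame", key, nonce, header)
"opus frame" = VoiceCryptoSketch.aes_decrypt(cipher_text, key, nonce, header, tag)
```

Passing the header as additional authenticated data means a packet whose header was altered in transit fails to decrypt instead of yielding audio.
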
diff --git a/guides/intro/intro.md b/guides/intro/intro.md
@@ -5,10 +5,10 @@ nostrum is an Elixir library that can be used to interact with Discord.
 To see documentation about a specific part of the library, please visit one of
 the following:
 
-* [API](api.html) - Methods to interact with the RESTful API (and some other goodies).
+* [API](api-1.html) - Methods to interact with the RESTful API (and some other goodies).
 * [State](state.html) - Caches that keep information from Discord fresh at your disposal.
 * [Events](event_handling.html) - Handling events from Discord as they come in.
-* [Voice](voice.html) - Playing audio through Discord voice channels.
+* [Voice](voice-2.html) - Playing audio through Discord voice channels.
 
 ## Setup
 
@@ -67,15 +67,17 @@ Apart from the `token` field mentioned above, the following fields are also supp
   livestream audio with streamlink support. Defaults to `"streamlink"`.
 - `audio_timeout` - Milliseconds that input must begin generating audio by
   upon invoking `play`. More information about this option can be found in the
-  [voice](./voice.html) documentation page. Defaults to `20_000` (20s).
+  [voice](./voice-2.html) documentation page. Defaults to `20_000` (20s).
 - `audio_frames_per_burst` - Number of opus frames to send at a time while
   playing audio. More information about this option can be found in the
-  [voice](./voice.html) documentation page. Defaults to `10`.
+  [voice](./voice-2.html) documentation page. Defaults to `10`.
 - `voice_auto_connect` - This will determine if Nostrum automatically connects
   to voice websockets gateways upon joining voice channels. If set to `false`
   but you still wish to connect to the voice gateway, you can do so manually
   by calling `Nostrum.Voice.connect_to_gateway/1` after joining a voice
   channel. Defaults to `true`.
+- `voice_encryption_mode` - Defaults to `:aes256_gcm`. More information about this
+  option can be found [here](./voice-2.html#encryption-modes).
 
 
 ### Development & debugging

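As a sketch of how the new `voice_encryption_mode` option could be set to something other than the default, assuming a conventional `config/config.exs` (the file path and the chosen mode are illustrative only):

```elixir
# config/config.exs
import Config

# Pick the ChaCha-based mode, e.g. on hardware without AES acceleration.
# The value is read at compile time, so it must not go in runtime.exs.
config :nostrum,
  voice_encryption_mode: :xchacha20_poly1305
```

Because the option is read with `Application.compile_env/3`, nostrum likely needs to be recompiled after the value changes, for example with `mix deps.compile nostrum --force`.
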
diff --git a/lib/nostrum/voice/audio.ex b/lib/nostrum/voice/audio.ex
@@ -7,11 +7,10 @@ defmodule Nostrum.Voice.Audio do
   alias Nostrum.Struct.VoiceState
   alias Nostrum.Util
   alias Nostrum.Voice
+  alias Nostrum.Voice.Crypto
   alias Nostrum.Voice.Opus
   alias Nostrum.Voice.Ports
 
-  @encryption_mode "xsalsa20_poly1305"
-
   # Default value
   @frames_per_burst 10
 
@@ -20,8 +19,6 @@ defmodule Nostrum.Voice.Audio do
   @ytdl "youtube-dl"
   @streamlink "streamlink"
 
-  def encryption_mode, do: @encryption_mode
-
   def ffmpeg_executable, do: Application.get_env(:nostrum, :ffmpeg, @ffmpeg)
   def youtubedl_executable, do: Application.get_env(:nostrum, :youtubedl, @ytdl)
   def streamlink_executable, do: Application.get_env(:nostrum, :streamlink, @streamlink)
@@ -40,13 +37,6 @@ defmodule Nostrum.Voice.Audio do
     >>
   end
 
-  def encrypt_packet(%VoiceState{} = voice, data) do
-    header = rtp_header(voice)
-    # 12 byte header + 12 null bytes
-    nonce = header <> <<0::8*12>>
-    header <> Kcl.secretbox(data, nonce, voice.secret_key)
-  end
-
   def open_udp do
     {:ok, socket} =
       :gen_udp.open(0, [
@@ -58,17 +48,16 @@ defmodule Nostrum.Voice.Audio do
     socket
   end
 
-  def get_rtp_packet(%VoiceState{secret_key: key, udp_socket: socket} = v) do
+  def get_rtp_packet(%VoiceState{udp_socket: socket} = v) do
     {:ok, {_ip, _port, payload}} = :gen_udp.recv(socket, 1024)
 
     case payload do
       # Skip RTCP packets
       <<2::2, 0::1, 1::5, 201::8, _rest::binary>> ->
         get_rtp_packet(v)
 
-      <<header::binary-size(12), data::binary>> ->
-        nonce = header <> <<0::8*12>>
-        {header, Kcl.secretunbox(data, nonce, key)}
+      <<header::bytes-size(12), _::binary>> = data ->
+        {header, Crypto.decrypt(v, data)}
     end
   end
 
@@ -144,7 +133,7 @@ defmodule Nostrum.Voice.Audio do
           v.udp_socket,
           v.ip |> ip_to_tuple(),
           v.port,
-          encrypt_packet(v, f)
+          Crypto.encrypt(v, f)
         )
 
         %{

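The RTCP-skipping clause in `get_rtp_packet/1` above matches on the first two bytes of the datagram: version 2, no padding, a report count of 1, and packet type 201 (a receiver report in RFC 3550). A minimal sketch of that discrimination in isolation, using a hypothetical `PacketKindSketch` module that is not part of Nostrum:

```elixir
defmodule PacketKindSketch do
  # Hypothetical module for illustration only.

  # Same bit pattern the diff uses to skip RTCP packets.
  def classify(<<2::2, 0::1, 1::5, 201::8, _rest::binary>>), do: :rtcp

  # Anything else with at least the fixed 12-byte RTP header is treated as RTP.
  def classify(<<_header::binary-size(12), _rest::binary>>), do: :rtp

  def classify(_too_short), do: :unknown
end

:rtcp = PacketKindSketch.classify(<<0x81, 201, 0, 1, 0::32>>)
:rtp = PacketKindSketch.classify(<<0x80, 0x78, 1::16, 0::32, 42::32, "payload">>)
```
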
diff --git a/lib/nostrum/voice/crypto.ex b/lib/nostrum/voice/crypto.ex
@@ -0,0 +1,209 @@
+defmodule Nostrum.Voice.Crypto do
+  @moduledoc false
+
+  alias Nostrum.Struct.VoiceState
+  alias Nostrum.Struct.VoiceWSState
+  alias Nostrum.Voice.Audio
+  alias Nostrum.Voice.Crypto.Aes
+  alias Nostrum.Voice.Crypto.Chacha
+  alias Nostrum.Voice.Crypto.Salsa
+
+  @type cipher_rtpsize ::
+          :xsalsa20_poly1305_lite_rtpsize
+          | :aead_xchacha20_poly1305_rtpsize
+          | :aead_aes256_gcm_rtpsize
+
+  @type cipher_alias :: :aes256_gcm | :xchacha20_poly1305
+
+  @type cipher_non_rtpsize ::
+          :xsalsa20_poly1305
+          | :xsalsa20_poly1305_suffix
+          | :xsalsa20_poly1305_lite
+          | :aead_aes256_gcm
+
+  @type cipher :: cipher_non_rtpsize() | cipher_alias() | cipher_rtpsize()
+
+  @mode Application.compile_env(:nostrum, :voice_encryption_mode, :aes256_gcm)
+
+  @mode_string Map.get(
+                 %{
+                   xchacha20_poly1305: "aead_xchacha20_poly1305_rtpsize",
+                   aes256_gcm: "aead_aes256_gcm_rtpsize"
+                 },
+                 @mode,
+                 "#{@mode}"
+               )
+
+  def encryption_mode, do: @mode_string
+
+  def encrypt(voice, data) do
+    header = Audio.rtp_header(voice)
+    unquote(:"encrypt_#{@mode}")(voice, data, header)
+  end
+
+  def decrypt(%VoiceState{secret_key: key}, data), do: decrypt(key, data)
+  def decrypt(%VoiceWSState{secret_key: key}, data), do: decrypt(key, data)
+  def decrypt(key, data), do: unquote(:"decrypt_#{@mode}")(key, data)
+
+  def encrypt_xsalsa20_poly1305(%VoiceState{secret_key: key}, data, header) do
+    nonce = header <> <<0::unit(8)-size(12)>>
+
+    [header, Salsa.encrypt(data, key, nonce)]
+  end
+
+  def encrypt_xsalsa20_poly1305_suffix(%VoiceState{secret_key: key}, data, header) do
+    nonce = :crypto.strong_rand_bytes(24)
+
+    [header, Salsa.encrypt(data, key, nonce), nonce]
+  end
+
+  def encrypt_xsalsa20_poly1305_lite(%VoiceState{secret_key: key} = voice, data, header) do
+    {unpadded_nonce, nonce} = lite_nonce(voice)
+
+    [header, Salsa.encrypt(data, key, nonce), unpadded_nonce]
+  end
+
+  def encrypt_xsalsa20_poly1305_lite_rtpsize(voice, data, header),
+    do: encrypt_xsalsa20_poly1305_lite(voice, data, header)
+
+  def encrypt_xchacha20_poly1305(voice, data, header),
+    do: encrypt_aead_xchacha20_poly1305_rtpsize(voice, data, header)
+
+  def encrypt_aead_xchacha20_poly1305_rtpsize(%VoiceState{secret_key: key} = voice, data, header) do
+    {unpadded_nonce, nonce} = lite_nonce(voice)
+
+    [header, Chacha.encrypt(data, key, nonce, _aad = header), unpadded_nonce]
+  end
+
+  def encrypt_aead_aes256_gcm(voice, data, header), do: encrypt_aes256_gcm(voice, data, header)
+
+  def encrypt_aead_aes256_gcm_rtpsize(voice, data, header),
+    do: encrypt_aes256_gcm(voice, data, header)
+
+  def encrypt_aes256_gcm(%VoiceState{secret_key: key} = voice, data, header) do
+    {unpadded_nonce, nonce} = lite_nonce(voice, 12)
+
+    [header, Aes.encrypt(data, key, nonce, _aad = header), unpadded_nonce]
+  end
+
+  def decrypt_xsalsa20_poly1305(key, <<header::bytes-size(12), cipher_text::binary>>) do
+    nonce = header <> <<0::unit(8)-size(12)>>
+
+    Salsa.decrypt(cipher_text, key, nonce)
+  end
+
+  def decrypt_xsalsa20_poly1305_lite(key, data) do
+    {_header, cipher_text, _tag = <<>>, nonce} = decode_packet(data, 4, 24, 0)
+
+    Salsa.decrypt(cipher_text, key, nonce)
+  end
+
+  def decrypt_xsalsa20_poly1305_suffix(key, data) do
+    {_header, cipher_text, _tag = <<>>, nonce} = decode_packet(data, 24, 24, 0)
+
+    Salsa.decrypt(cipher_text, key, nonce)
+  end
+
+  def decrypt_xsalsa20_poly1305_lite_rtpsize(key, data) do
+    {_header, cipher_text, _tag, nonce, ext_len} = decode_packet_rtpsize(data, 24, 0)
+
+    <<_exts::unit(32)-size(ext_len), opus::binary>> = Salsa.decrypt(cipher_text, key, nonce)
+
+    opus
+  end
+
+  def decrypt_xchacha20_poly1305(key, data),
+    do: decrypt_aead_xchacha20_poly1305_rtpsize(key, data)
+
+  def decrypt_aead_xchacha20_poly1305_rtpsize(key, data) do
+    {header, cipher_text, tag, nonce, ext_len} = decode_packet_rtpsize(data, 24, 16)
+
+    <<_exts::unit(32)-size(ext_len), opus::binary>> =
+      Chacha.decrypt(cipher_text, key, nonce, _aad = header, tag)
+
+    opus
+  end
+
+  def decrypt_aes256_gcm(key, data), do: decrypt_aead_aes256_gcm_rtpsize(key, data)
+
+  def decrypt_aead_aes256_gcm_rtpsize(key, data) do
+    {header, cipher_text, tag, nonce, ext_len} = decode_packet_rtpsize(data, 12, 16)
+
+    <<_exts::unit(32)-size(ext_len), opus::binary>> =
+      Aes.decrypt(cipher_text, key, nonce, _aad = header, tag)
+
+    opus
+  end
+
+  def decrypt_aead_aes256_gcm(key, data) do
+    {header, cipher_text, tag, nonce} = decode_packet(data, 4, 12, 16)
+
+    Aes.decrypt(cipher_text, key, nonce, _aad = header, tag)
+  end
+
+  @lite_nonce_length 4
+
+  defp lite_nonce(%VoiceState{rtp_sequence: rtp_sequence}, nonce_length \\ 24) do
+    unpadded_nonce = <<rtp_sequence::32>>
+    nonce = unpadded_nonce <> <<0::unit(8)-size(nonce_length - @lite_nonce_length)>>
+    {unpadded_nonce, nonce}
+  end
+
+  # Discord's newer encryption modes ending in '_rtpsize' leave the first 4 bytes of the RTP
+  # header extension in plaintext while encrypting the elements themselves. The AAD is the
+  # 12-byte RTP header concatenated with the first 4 bytes of the RTP header extension.
+
+  # Much like `Nostrum.Voice.Opus.strip_rtp_ext/1`, we pattern match on the `0xBEDE` constant
+  # and the 16-bit big-endian extension length, which gives the length of the extension
+  # elements in 32-bit words. Because the elements are part of the cipher text, the
+  # extension length is the number of 32-bit words to discard after decryption to obtain
+  # solely the opus packet.
+
+  # This function returns a 5-element tuple with
+  # - RTP header
+  #   - Fixed 12 byte header concatenated with the first 4 bytes of the extension
+  #   - Used as the AAD for AEAD ciphers
+  # - cipher text
+  #   - RTP extension elements prepended to the opus packet
+  # - cipher tag (MAC)
+  # - nonce (padded)
+  # - RTP header extension length
+  #   - for isolating the opus after decryption
+  defp decode_packet_rtpsize(
+         <<header::bytes-size(12), 0xBE, 0xDE, ext_len::integer-16, rest::binary>>,
+         nonce_length,
+         tag_length
+       )
+       when byte_size(rest) - (@lite_nonce_length + tag_length) > ext_len * 4 do
+    header = header <> <<0xBE, 0xDE, ext_len::integer-16>>
+
+    {cipher_text, tag, unpadded_nonce} = split_data(rest, @lite_nonce_length, tag_length)
+
+    nonce = unpadded_nonce <> <<0::unit(8)-size(nonce_length - @lite_nonce_length)>>
+
+    {header, cipher_text, tag, nonce, ext_len}
+  end
+
+  # Non "rtpsize" modes where everything is encrypted beyond the 12-byte header
+  defp decode_packet(
+         <<header::bytes-size(12), rest::binary>>,
+         unpadded_nonce_length,
+         nonce_length,
+         tag_length
+       ) do
+    {cipher_text, tag, unpadded_nonce} = split_data(rest, unpadded_nonce_length, tag_length)
+
+    nonce = unpadded_nonce <> <<0::unit(8)-size(nonce_length - unpadded_nonce_length)>>
+
+    {header, cipher_text, tag, nonce}
+  end
+
+  defp split_data(data, unpadded_nonce_length, tag_length) do
+    cipher_text_length = byte_size(data) - (unpadded_nonce_length + tag_length)
+
+    <<cipher_text::bytes-size(cipher_text_length), tag::bytes-size(tag_length),
+      unpadded_nonce::bytes-size(unpadded_nonce_length)>> = data
+
+    {cipher_text, tag, unpadded_nonce}
+  end
+end
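
The `_rtpsize` packet layout described in the comments above (12-byte header, 4 plaintext extension bytes, cipher text, tag, 4-byte counter) can be exercised on a synthetic packet. This is a sketch under assumptions: `RtpsizeLayoutSketch` and its map keys are hypothetical names, and the packet bytes are made up rather than captured from Discord.

```elixir
defmodule RtpsizeLayoutSketch do
  # Hypothetical module for illustration only; mirrors the layout described above.
  @counter_length 4

  def split(<<header::binary-size(12), 0xBE, 0xDE, ext_len::16, rest::binary>>, tag_length) do
    body_length = byte_size(rest) - tag_length - @counter_length

    <<cipher_text::binary-size(body_length), tag::binary-size(tag_length),
      counter::binary-size(@counter_length)>> = rest

    %{
      # 12-byte header plus the 4 plaintext extension bytes, used as the AAD
      aad: header <> <<0xBE, 0xDE, ext_len::16>>,
      cipher_text: cipher_text,
      tag: tag,
      # unpadded nonce; the caller pads it with zeros to the cipher's nonce length
      counter: counter,
      # number of 32-bit words to drop from the decrypted payload
      ext_words: ext_len
    }
  end
end

# Build a fake packet: header, extension preamble, 10 body bytes, 16-byte tag, counter 5.
header = <<0x90, 0x78, 1::16, 0::32, 42::32>>
body = :binary.copy(<<0xAA>>, 10)
tag = :binary.copy(<<0xBB>>, 16)
packet = header <> <<0xBE, 0xDE, 1::16>> <> body <> tag <> <<0, 0, 0, 5>>

parts = RtpsizeLayoutSketch.split(packet, 16)
<<0, 0, 0, 5>> = parts.counter
```

Unlike the PR's `decode_packet_rtpsize/3`, this sketch has no guard on the remaining length, so a packet that is too short raises a `MatchError` rather than failing to match the function head.
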