-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Try to fully support key rotation AND maxSessionCacheSize on devices with very limited key slots #1511
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
peaBerberian
added
DRM
Relative to DRM (EncryptedMediaExtensions)
work-in-progress
This Pull Request or issue is not finished yet
labels
Aug 21, 2024
peaBerberian
force-pushed
the
misc/max-session-cache-size-current-content
branch
from
August 29, 2024 20:00
d65e155
to
19f6f54
Compare
peaBerberian
force-pushed
the
misc/max-session-cache-size-current-content
branch
from
October 29, 2024 15:40
19f6f54
to
999730a
Compare
peaBerberian
force-pushed
the
misc/max-session-cache-size-current-content
branch
from
October 31, 2024 16:03
18a4984
to
0dd5214
Compare
…ng over the limit This is a retry of #1511 (I even re-used the same git branch) because I found the work in that PR to be too complex. If you already read the previous PR, you can skip the `The issue` below. The issue ========= Conditions ---------- This work is about a specific and for now never seen issue that has chance to pop up under the following conditions: - The current content make use of per-Period key rotation (NOTE: this is also possible without key rotation, but this is made much more probable by it). - The device on which we we play that content has a very limited number of key slots available simultaneously for decryption (sometimes **VERY** limited, e.g. `6`, so we can at most rely on 6 keys simultaneously. We for now know of four different set-top boxes with that type of limitation, from two constructors. The `maxSessionCacheSize` option --------------------------------------------- Theoretically, an application can rely there on the `keySystems[].maxSessionCacheSize` option to set a maximum number of `MediaKeySession` we may keep at at the same time. _Note that we prefer here to rely on a number of `MediaKeySession` and not of keys, because the RxPlayer is not able to predict how many keys it will find inside a license (NOTE: to simplify, let's just say that 1 `MediaKeySession` == 1 license here, as it's very predominantly the case) nor is it able to not communicate some keys that are found in a given license._ _Yet an application generally has a rough idea of how many keys it's going to find in a license a most, and can set up its `maxSessionCacheSize` accordingly (e.g. if there's max `10` key slots available on the device and `5` keys per license maximum, it could just communicate to us a `maxSessionCacheSize` of `2`)._ So problem solved right? WRONG! The RxPlayer, when exploiting the `maxSessionCacheSize` property, will when that limit is reached just close the `MediaKeySession` it has least recently seen the need for (basically, when the RxPlayer loads segment for a new Period/track/Representation, it asks our decryption logic to make sure it has the right key, this is how our decryption logic know that a `MediaKeySession` or more precizely a key it can make use of, has been needed recently). Example scenario ---------------- So let's just imagine a simple scenario: 1. we're currently loading a Period `A` with encrypted content and buffering a future Period `B` with content encrypted with a different key. 2. we're asking the decryption logic to make sure the key is loaded for future Period `B` 3. The decryption logic sees that it doesn't have the key yet, and thus has to create a new `MediaKeySession`. Yet, it sees that it cannot create a new `MediaKeySession` for that new key without closing an old one to respect the `keySystems[].maxSessionCacheSize` option, and it turns out one relied on to play Period A was the least recently needed for some reason. 4. The decryption logic closes a `MediaKeySession` for Period `A` that was currently relied on. 5. ??? I don't know what happens, the `MediaKeySession` closure may fail in which case we could be left with too many key slots used on the device and some random error, or content playback may just fail directly. In any case, I wouldn't bet on something good happening. Other types of scenarios are possible, e.g. we could be closing a `MediaKeySession` needed in the future and not think to re-create it when playing that future Period, potentially leading to a future infinite rebuffering. Solution I'm proposing here =========================== In a previous PR, I tried to handle all cases but that became too complex and I know think that doing it in multiple steps may be easier to architects: we handle first the "simple" cases (which sadly are not the most frequent ones), we'll then see the harder cases. The simpler case it to just close `MediaKeySession` that are known to not be needed anymore if we go over the `maxSessionCacheSize` limit on the current content. To have a vague non-perfect idea of what is currently needed, we look at all `Period`s from the current position onward, list their key ids, compare with the keys currently handled by our DRM logic, and just close the ones that haven't been found. How I'm implementing this ========================= Detecting the issue ------------------- As we now have a difference in our `MediaKeySession`-closing algorithm depending on if the `MediaKeySession` is linked to the current content or not, I chose in our `ContentDecryptor` module that: 1. `MediaKeySession` that are not linked to the current content keep being closed like they were before: least recently needed first. 2. `MediaKeySession` that are linked to the current content are never directly closed by the `ContentDecryptor`. Instead, the `ContentDecryptor` module basically signals a `tooMuchSessions` event when only left with `MediaKeySession` for the current content yet going over the `maxSessionCacheSize` limit. It also doesn't create the `MediaKeySession` in that case. Fixing the situation -------------------- The `ContentDecryptor` now exposes a new method, `freeKeyIds`. The idea is that you communicate to it the key id you don't need anymore, then the `ContentDecryptor` will see if can consequently close some `MediaKeySession`. It is the role of the `ContentInitializer` to call this `freeKeyIds` method on key ids it doesn't seem to have the use of anymore (all key ids are in the payload of the `tooMuchSessions` event). Note: the new `ActiveSessionsStore` ----------------------------------- To allow the `ContentDecryptor` to easily know when it can restart creating `MediaKeySession` after encountering this `tooMuchSessions` situation and then having its `freeKeyIds` method called, I replaced its simple `_currentSessions` private array into a new kind of "MediaKeySession store" (a third one after the `LoadedSessionsStore` and the `PersistentSessionsStore`), called the `ActiveSessionsStore`, which also keeps a `isFull` boolean around. This new store's difference with the `LoadedSessionsStore` may be unclear at first but there's one: - The `LoadedSessionsStore` stores information on all `MediaKeySession` currently attached to a `MediaKeys` (and also creates / close them). - The `ActiveSessionsStore` is technically just an array of `MediaKeySession` information and a `isFull` flag, and its intended semantic is to represent all `MediaKeySession` that are "actively-used" by the `ContentDecryptor` (in implementation, it basically means all `MediaKeySession` linked to the current content). If you followed, note that the session information stored by the `LoadedSessionsStore` is a superset of the ones stored by the `ActiveSessionsStore` (the former contains all information from the latter) as "active" sessions are all currently "loaded". Writing that, I'm still unsure if the `isFull` flag would have more its place on the `LoadedSessionsStore` instead. After all `maxSessionCacheSize` technically applies to all "loaded" sessions, not just the "active" ones which is a concept only relied on by RxPlayer internals. We may discuss on what makes the most sense here. Remaining issues ================ This PR only fixes a fraction of the issue, actually the simpler part where we can close older `MediaKeySession` linked to the current content that we don't need anymore, like for example for a previous DASH Period. But there's also the risk of encountering that limit while preloading future contents encrypted through a different keys, or when seeking back at a previous DASH Period with different keys. In all those scenarios (which actually seems more probable), there's currently no fix, just error logs and multiple FIXME mentions in the code. Fixing this issue while keeping a readable code is very hard right now, moreover for what is only a suite of theoretical problems that has never been observed for now. So I sometimes wonder if the best compromise would not be to just let it happen, and have an heuristic somewhere else detecting the issue and fixing it by slightly reducing the experience (e.g. by reloading + disabling future Period pre-loading)...
peaBerberian
added a commit
that referenced
this pull request
Oct 31, 2024
…ng over the limit This is a retry of #1511 (I even re-used the same git branch) because I found the work in that PR to be too complex. If you already read the previous PR, you can skip the `The issue` below. The issue ========= Conditions ---------- This work is about a specific and for now never seen issue that has chance to pop up under the following conditions: - The current content make use of per-Period key rotation (NOTE: this is also possible without key rotation, but this is made much more probable by it). - The device on which we we play that content has a very limited number of key slots available simultaneously for decryption (sometimes **VERY** limited, e.g. `6`, so we can at most rely on 6 keys simultaneously. We for now know of four different set-top boxes with that type of limitation, from two constructors. The `maxSessionCacheSize` option --------------------------------------------- Theoretically, an application can rely there on the `keySystems[].maxSessionCacheSize` option to set a maximum number of `MediaKeySession` we may keep at at the same time. _Note that we prefer here to rely on a number of `MediaKeySession` and not of keys, because the RxPlayer is not able to predict how many keys it will find inside a license (NOTE: to simplify, let's just say that 1 `MediaKeySession` == 1 license here, as it's very predominantly the case) nor is it able to not communicate some keys that are found in a given license._ _Yet an application generally has a rough idea of how many keys it's going to find in a license a most, and can set up its `maxSessionCacheSize` accordingly (e.g. if there's max `10` key slots available on the device and `5` keys per license maximum, it could just communicate to us a `maxSessionCacheSize` of `2`)._ So problem solved right? WRONG! The RxPlayer, when exploiting the `maxSessionCacheSize` property, will when that limit is reached just close the `MediaKeySession` it has least recently seen the need for (basically, when the RxPlayer loads segment for a new Period/track/Representation, it asks our decryption logic to make sure it has the right key, this is how our decryption logic know that a `MediaKeySession` or more precizely a key it can make use of, has been needed recently). Example scenario ---------------- So let's just imagine a simple scenario: 1. we're currently loading a Period `A` with encrypted content and buffering a future Period `B` with content encrypted with a different key. 2. we're asking the decryption logic to make sure the key is loaded for future Period `B` 3. The decryption logic sees that it doesn't have the key yet, and thus has to create a new `MediaKeySession`. Yet, it sees that it cannot create a new `MediaKeySession` for that new key without closing an old one to respect the `keySystems[].maxSessionCacheSize` option, and it turns out one relied on to play Period A was the least recently needed for some reason. 4. The decryption logic closes a `MediaKeySession` for Period `A` that was currently relied on. 5. ??? I don't know what happens, the `MediaKeySession` closure may fail in which case we could be left with too many key slots used on the device and some random error, or content playback may just fail directly. In any case, I wouldn't bet on something good happening. Other types of scenarios are possible, e.g. we could be closing a `MediaKeySession` needed in the future and not think to re-create it when playing that future Period, potentially leading to a future infinite rebuffering. Solution I'm proposing here =========================== In a previous PR, I tried to handle all cases but that became too complex and I know think that doing it in multiple steps may be easier to architects: we handle first the "simple" cases (which sadly are not the most frequent ones), we'll then see the harder cases. The simpler case it to just close `MediaKeySession` that are known to not be needed anymore if we go over the `maxSessionCacheSize` limit on the current content. To have a vague non-perfect idea of what is currently needed, we look at all `Period`s from the current position onward, list their key ids, compare with the keys currently handled by our DRM logic, and just close the ones that haven't been found. How I'm implementing this ========================= Detecting the issue ------------------- As we now have a difference in our `MediaKeySession`-closing algorithm depending on if the `MediaKeySession` is linked to the current content or not, I chose in our `ContentDecryptor` module that: 1. `MediaKeySession` that are not linked to the current content keep being closed like they were before: least recently needed first. 2. `MediaKeySession` that are linked to the current content are never directly closed by the `ContentDecryptor`. Instead, the `ContentDecryptor` module basically signals a `tooMuchSessions` event when only left with `MediaKeySession` for the current content yet going over the `maxSessionCacheSize` limit. It also doesn't create the `MediaKeySession` in that case. Fixing the situation -------------------- The `ContentDecryptor` now exposes a new method, `freeKeyIds`. The idea is that you communicate to it the key id you don't need anymore, then the `ContentDecryptor` will see if can consequently close some `MediaKeySession`. It is the role of the `ContentInitializer` to call this `freeKeyIds` method on key ids it doesn't seem to have the use of anymore (all key ids are in the payload of the `tooMuchSessions` event). Note: the new `ActiveSessionsStore` ----------------------------------- To allow the `ContentDecryptor` to easily know when it can restart creating `MediaKeySession` after encountering this `tooMuchSessions` situation and then having its `freeKeyIds` method called, I replaced its simple `_currentSessions` private array into a new kind of "MediaKeySession store" (a third one after the `LoadedSessionsStore` and the `PersistentSessionsStore`), called the `ActiveSessionsStore`, which also keeps a `isFull` boolean around. This new store's difference with the `LoadedSessionsStore` may be unclear at first but there's one: - The `LoadedSessionsStore` stores information on all `MediaKeySession` currently attached to a `MediaKeys` (and also creates / close them). - The `ActiveSessionsStore` is technically just an array of `MediaKeySession` information and a `isFull` flag, and its intended semantic is to represent all `MediaKeySession` that are "actively-used" by the `ContentDecryptor` (in implementation, it basically means all `MediaKeySession` linked to the current content). If you followed, note that the session information stored by the `LoadedSessionsStore` is a superset of the ones stored by the `ActiveSessionsStore` (the former contains all information from the latter) as "active" sessions are all currently "loaded". Writing that, I'm still unsure if the `isFull` flag would have more its place on the `LoadedSessionsStore` instead. After all `maxSessionCacheSize` technically applies to all "loaded" sessions, not just the "active" ones which is a concept only relied on by RxPlayer internals. We may discuss on what makes the most sense here. Remaining issues ================ This PR only fixes a fraction of the issue, actually the simpler part where we can close older `MediaKeySession` linked to the current content that we don't need anymore, like for example for a previous DASH Period. But there's also the risk of encountering that limit while preloading future contents encrypted through a different keys, or when seeking back at a previous DASH Period with different keys. In all those scenarios (which actually seems more probable), there's currently no fix, just error logs and multiple FIXME mentions in the code. Fixing this issue while keeping a readable code is very hard right now, moreover for what is only a suite of theoretical problems that has never been observed for now. So I sometimes wonder if the best compromise would not be to just let it happen, and have an heuristic somewhere else detecting the issue and fixing it by slightly reducing the experience (e.g. by reloading + disabling future Period pre-loading)...
peaBerberian
added a commit
that referenced
this pull request
Oct 31, 2024
…ng over the limit This is a retry of #1511 (I even re-used the same git branch) because I found the work in that PR to be too complex. If you already read the previous PR, you can skip the `The issue` below. The issue ========= Conditions ---------- This work is about a specific and for now never seen issue that has chance to pop up under the following conditions: - The current content make use of per-Period key rotation (NOTE: this is also possible without key rotation, but this is made much more probable by it). - The device on which we we play that content has a very limited number of key slots available simultaneously for decryption (sometimes **VERY** limited, e.g. `6`, so we can at most rely on 6 keys simultaneously. We for now know of four different set-top boxes with that type of limitation, from two constructors. The `maxSessionCacheSize` option --------------------------------------------- Theoretically, an application can rely there on the `keySystems[].maxSessionCacheSize` option to set a maximum number of `MediaKeySession` we may keep at at the same time. _Note that we prefer here to rely on a number of `MediaKeySession` and not of keys, because the RxPlayer is not able to predict how many keys it will find inside a license (NOTE: to simplify, let's just say that 1 `MediaKeySession` == 1 license here, as it's very predominantly the case) nor is it able to not communicate some keys that are found in a given license._ _Yet an application generally has a rough idea of how many keys it's going to find in a license a most, and can set up its `maxSessionCacheSize` accordingly (e.g. if there's max `10` key slots available on the device and `5` keys per license maximum, it could just communicate to us a `maxSessionCacheSize` of `2`)._ So problem solved right? WRONG! The RxPlayer, when exploiting the `maxSessionCacheSize` property, will when that limit is reached just close the `MediaKeySession` it has least recently seen the need for (basically, when the RxPlayer loads segment for a new Period/track/Representation, it asks our decryption logic to make sure it has the right key, this is how our decryption logic know that a `MediaKeySession` or more precizely a key it can make use of, has been needed recently). Example scenario ---------------- So let's just imagine a simple scenario: 1. we're currently loading a Period `A` with encrypted content and buffering a future Period `B` with content encrypted with a different key. 2. we're asking the decryption logic to make sure the key is loaded for future Period `B` 3. The decryption logic sees that it doesn't have the key yet, and thus has to create a new `MediaKeySession`. Yet, it sees that it cannot create a new `MediaKeySession` for that new key without closing an old one to respect the `keySystems[].maxSessionCacheSize` option, and it turns out one relied on to play Period A was the least recently needed for some reason. 4. The decryption logic closes a `MediaKeySession` for Period `A` that was currently relied on. 5. ??? I don't know what happens, the `MediaKeySession` closure may fail in which case we could be left with too many key slots used on the device and some random error, or content playback may just fail directly. In any case, I wouldn't bet on something good happening. Other types of scenarios are possible, e.g. we could be closing a `MediaKeySession` needed in the future and not think to re-create it when playing that future Period, potentially leading to a future infinite rebuffering. Solution I'm proposing here =========================== In a previous PR, I tried to handle all cases but that became too complex and I know think that doing it in multiple steps may be easier to architects: we handle first the "simple" cases (which sadly are not the most frequent ones), we'll then see the harder cases. The simpler case it to just close `MediaKeySession` that are known to not be needed anymore if we go over the `maxSessionCacheSize` limit on the current content. To have a vague non-perfect idea of what is currently needed, we look at all `Period`s from the current position onward, list their key ids, compare with the keys currently handled by our DRM logic, and just close the ones that haven't been found. How I'm implementing this ========================= Detecting the issue ------------------- As we now have a difference in our `MediaKeySession`-closing algorithm depending on if the `MediaKeySession` is linked to the current content or not, I chose in our `ContentDecryptor` module that: 1. `MediaKeySession` that are not linked to the current content keep being closed like they were before: least recently needed first. 2. `MediaKeySession` that are linked to the current content are never directly closed by the `ContentDecryptor`. Instead, the `ContentDecryptor` module basically signals a `tooMuchSessions` event when only left with `MediaKeySession` for the current content yet going over the `maxSessionCacheSize` limit. It also doesn't create the `MediaKeySession` in that case. Fixing the situation -------------------- The `ContentDecryptor` now exposes a new method, `freeKeyIds`. The idea is that you communicate to it the key id you don't need anymore, then the `ContentDecryptor` will see if can consequently close some `MediaKeySession`. It is the role of the `ContentInitializer` to call this `freeKeyIds` method on key ids it doesn't seem to have the use of anymore (all key ids are in the payload of the `tooMuchSessions` event). Note: the new `ActiveSessionsStore` ----------------------------------- To allow the `ContentDecryptor` to easily know when it can restart creating `MediaKeySession` after encountering this `tooMuchSessions` situation and then having its `freeKeyIds` method called, I replaced its simple `_currentSessions` private array into a new kind of "MediaKeySession store" (a third one after the `LoadedSessionsStore` and the `PersistentSessionsStore`), called the `ActiveSessionsStore`, which also keeps a `isFull` boolean around. This new store's difference with the `LoadedSessionsStore` may be unclear at first but there's one: - The `LoadedSessionsStore` stores information on all `MediaKeySession` currently attached to a `MediaKeys` (and also creates / close them). - The `ActiveSessionsStore` is technically just an array of `MediaKeySession` information and a `isFull` flag, and its intended semantic is to represent all `MediaKeySession` that are "actively-used" by the `ContentDecryptor` (in implementation, it basically means all `MediaKeySession` linked to the current content). If you followed, note that the session information stored by the `LoadedSessionsStore` is a superset of the ones stored by the `ActiveSessionsStore` (the former contains all information from the latter) as "active" sessions are all currently "loaded". Writing that, I'm still unsure if the `isFull` flag would have more its place on the `LoadedSessionsStore` instead. After all `maxSessionCacheSize` technically applies to all "loaded" sessions, not just the "active" ones which is a concept only relied on by RxPlayer internals. We may discuss on what makes the most sense here. Remaining issues ================ This PR only fixes a fraction of the issue, actually the simpler part where we can close older `MediaKeySession` linked to the current content that we don't need anymore, like for example for a previous DASH Period. But there's also the risk of encountering that limit while preloading future contents encrypted through a different keys, or when seeking back at a previous DASH Period with different keys. In all those scenarios (which actually seems more probable), there's currently no fix, just error logs and multiple FIXME mentions in the code. Fixing this issue while keeping a readable code is very hard right now, moreover for what is only a suite of theoretical problems that has never been observed for now. So I sometimes wonder if the best compromise would not be to just let it happen, and have an heuristic somewhere else detecting the issue and fixing it by slightly reducing the experience (e.g. by reloading + disabling future Period pre-loading)...
peaBerberian
added a commit
that referenced
this pull request
Oct 31, 2024
…ng over the limit This is a retry of #1511 (I even re-used the same git branch) because I found the work in that PR to be too complex. If you already read the previous PR, you can skip the `The issue` below. The issue ========= Conditions ---------- This work is about a specific and for now never seen issue that has chance to pop up under the following conditions: - The current content make use of per-Period key rotation (NOTE: this is also possible without key rotation, but this is made much more probable by it). - The device on which we we play that content has a very limited number of key slots available simultaneously for decryption (sometimes **VERY** limited, e.g. `6`, so we can at most rely on 6 keys simultaneously. We for now know of four different set-top boxes with that type of limitation, from two constructors. The `maxSessionCacheSize` option --------------------------------------------- Theoretically, an application can rely there on the `keySystems[].maxSessionCacheSize` option to set a maximum number of `MediaKeySession` we may keep at at the same time. _Note that we prefer here to rely on a number of `MediaKeySession` and not of keys, because the RxPlayer is not able to predict how many keys it will find inside a license (NOTE: to simplify, let's just say that 1 `MediaKeySession` == 1 license here, as it's very predominantly the case) nor is it able to not communicate some keys that are found in a given license._ _Yet an application generally has a rough idea of how many keys it's going to find in a license a most, and can set up its `maxSessionCacheSize` accordingly (e.g. if there's max `10` key slots available on the device and `5` keys per license maximum, it could just communicate to us a `maxSessionCacheSize` of `2`)._ So problem solved right? WRONG! The RxPlayer, when exploiting the `maxSessionCacheSize` property, will when that limit is reached just close the `MediaKeySession` it has least recently seen the need for (basically, when the RxPlayer loads segment for a new Period/track/Representation, it asks our decryption logic to make sure it has the right key, this is how our decryption logic know that a `MediaKeySession` or more precizely a key it can make use of, has been needed recently). Example scenario ---------------- So let's just imagine a simple scenario: 1. we're currently loading a Period `A` with encrypted content and buffering a future Period `B` with content encrypted with a different key. 2. we're asking the decryption logic to make sure the key is loaded for future Period `B` 3. The decryption logic sees that it doesn't have the key yet, and thus has to create a new `MediaKeySession`. Yet, it sees that it cannot create a new `MediaKeySession` for that new key without closing an old one to respect the `keySystems[].maxSessionCacheSize` option, and it turns out one relied on to play Period A was the least recently needed for some reason. 4. The decryption logic closes a `MediaKeySession` for Period `A` that was currently relied on. 5. ??? I don't know what happens, the `MediaKeySession` closure may fail in which case we could be left with too many key slots used on the device and some random error, or content playback may just fail directly. In any case, I wouldn't bet on something good happening. Other types of scenarios are possible, e.g. we could be closing a `MediaKeySession` needed in the future and not think to re-create it when playing that future Period, potentially leading to a future infinite rebuffering. Solution I'm proposing here =========================== In a previous PR, I tried to handle all cases but that became too complex and I know think that doing it in multiple steps may be easier to architects: we handle first the "simple" cases (which sadly are not the most frequent ones), we'll then see the harder cases. The simpler case it to just close `MediaKeySession` that are known to not be needed anymore if we go over the `maxSessionCacheSize` limit on the current content. To have a vague non-perfect idea of what is currently needed, we look at all `Period`s from the current position onward, list their key ids, compare with the keys currently handled by our DRM logic, and just close the ones that haven't been found. How I'm implementing this ========================= Detecting the issue ------------------- As we now have a difference in our `MediaKeySession`-closing algorithm depending on if the `MediaKeySession` is linked to the current content or not, I chose in our `ContentDecryptor` module that: 1. `MediaKeySession` that are not linked to the current content keep being closed like they were before: least recently needed first. 2. `MediaKeySession` that are linked to the current content are never directly closed by the `ContentDecryptor`. Instead, the `ContentDecryptor` module basically signals a `tooMuchSessions` event when only left with `MediaKeySession` for the current content yet going over the `maxSessionCacheSize` limit. It also doesn't create the `MediaKeySession` in that case. Fixing the situation -------------------- The `ContentDecryptor` now exposes a new method, `freeKeyIds`. The idea is that you communicate to it the key id you don't need anymore, then the `ContentDecryptor` will see if can consequently close some `MediaKeySession`. It is the role of the `ContentInitializer` to call this `freeKeyIds` method on key ids it doesn't seem to have the use of anymore (all key ids are in the payload of the `tooMuchSessions` event). Note: the new `ActiveSessionsStore` ----------------------------------- To allow the `ContentDecryptor` to easily know when it can restart creating `MediaKeySession` after encountering this `tooMuchSessions` situation and then having its `freeKeyIds` method called, I replaced its simple `_currentSessions` private array into a new kind of "MediaKeySession store" (a third one after the `LoadedSessionsStore` and the `PersistentSessionsStore`), called the `ActiveSessionsStore`, which also keeps a `isFull` boolean around. This new store's difference with the `LoadedSessionsStore` may be unclear at first but there's one: - The `LoadedSessionsStore` stores information on all `MediaKeySession` currently attached to a `MediaKeys` (and also creates / close them). - The `ActiveSessionsStore` is technically just an array of `MediaKeySession` information and a `isFull` flag, and its intended semantic is to represent all `MediaKeySession` that are "actively-used" by the `ContentDecryptor` (in implementation, it basically means all `MediaKeySession` linked to the current content). If you followed, note that the session information stored by the `LoadedSessionsStore` is a superset of the ones stored by the `ActiveSessionsStore` (the former contains all information from the latter) as "active" sessions are all currently "loaded". Writing that, I'm still unsure if the `isFull` flag would have more its place on the `LoadedSessionsStore` instead. After all `maxSessionCacheSize` technically applies to all "loaded" sessions, not just the "active" ones which is a concept only relied on by RxPlayer internals. We may discuss on what makes the most sense here. Remaining issues ================ This PR only fixes a fraction of the issue, actually the simpler part where we can close older `MediaKeySession` linked to the current content that we don't need anymore, like for example for a previous DASH Period. But there's also the risk of encountering that limit while preloading future contents encrypted through a different keys, or when seeking back at a previous DASH Period with different keys. In all those scenarios (which actually seems more probable), there's currently no fix, just error logs and multiple FIXME mentions in the code. Fixing this issue while keeping a readable code is very hard right now, moreover for what is only a suite of theoretical problems that has never been observed for now. So I sometimes wonder if the best compromise would not be to just let it happen, and have an heuristic somewhere else detecting the issue and fixing it by slightly reducing the experience (e.g. by reloading + disabling future Period pre-loading)...
peaBerberian
added a commit
that referenced
this pull request
Nov 15, 2024
…ng over the limit This is a retry of #1511 (I even re-used the same git branch) because I found the work in that PR to be too complex. If you already read the previous PR, you can skip the `The issue` below. The issue ========= Conditions ---------- This work is about a specific and for now never seen issue that has chance to pop up under the following conditions: - The current content make use of per-Period key rotation (NOTE: this is also possible without key rotation, but this is made much more probable by it). - The device on which we we play that content has a very limited number of key slots available simultaneously for decryption (sometimes **VERY** limited, e.g. `6`, so we can at most rely on 6 keys simultaneously. We for now know of four different set-top boxes with that type of limitation, from two constructors. The `maxSessionCacheSize` option --------------------------------------------- Theoretically, an application can rely there on the `keySystems[].maxSessionCacheSize` option to set a maximum number of `MediaKeySession` we may keep at at the same time. _Note that we prefer here to rely on a number of `MediaKeySession` and not of keys, because the RxPlayer is not able to predict how many keys it will find inside a license (NOTE: to simplify, let's just say that 1 `MediaKeySession` == 1 license here, as it's very predominantly the case) nor is it able to not communicate some keys that are found in a given license._ _Yet an application generally has a rough idea of how many keys it's going to find in a license a most, and can set up its `maxSessionCacheSize` accordingly (e.g. if there's max `10` key slots available on the device and `5` keys per license maximum, it could just communicate to us a `maxSessionCacheSize` of `2`)._ So problem solved right? WRONG! The RxPlayer, when exploiting the `maxSessionCacheSize` property, will when that limit is reached just close the `MediaKeySession` it has least recently seen the need for (basically, when the RxPlayer loads segment for a new Period/track/Representation, it asks our decryption logic to make sure it has the right key, this is how our decryption logic know that a `MediaKeySession` or more precizely a key it can make use of, has been needed recently). Example scenario ---------------- So let's just imagine a simple scenario: 1. we're currently loading a Period `A` with encrypted content and buffering a future Period `B` with content encrypted with a different key. 2. we're asking the decryption logic to make sure the key is loaded for future Period `B` 3. The decryption logic sees that it doesn't have the key yet, and thus has to create a new `MediaKeySession`. Yet, it sees that it cannot create a new `MediaKeySession` for that new key without closing an old one to respect the `keySystems[].maxSessionCacheSize` option, and it turns out one relied on to play Period A was the least recently needed for some reason. 4. The decryption logic closes a `MediaKeySession` for Period `A` that was currently relied on. 5. ??? I don't know what happens, the `MediaKeySession` closure may fail in which case we could be left with too many key slots used on the device and some random error, or content playback may just fail directly. In any case, I wouldn't bet on something good happening. Other types of scenarios are possible, e.g. we could be closing a `MediaKeySession` needed in the future and not think to re-create it when playing that future Period, potentially leading to a future infinite rebuffering. Solution I'm proposing here =========================== In a previous PR, I tried to handle all cases but that became too complex and I know think that doing it in multiple steps may be easier to architects: we handle first the "simple" cases (which sadly are not the most frequent ones), we'll then see the harder cases. The simpler case it to just close `MediaKeySession` that are known to not be needed anymore if we go over the `maxSessionCacheSize` limit on the current content. To have a vague non-perfect idea of what is currently needed, we look at all `Period`s from the current position onward, list their key ids, compare with the keys currently handled by our DRM logic, and just close the ones that haven't been found. How I'm implementing this ========================= Detecting the issue ------------------- As we now have a difference in our `MediaKeySession`-closing algorithm depending on if the `MediaKeySession` is linked to the current content or not, I chose in our `ContentDecryptor` module that: 1. `MediaKeySession` that are not linked to the current content keep being closed like they were before: least recently needed first. 2. `MediaKeySession` that are linked to the current content are never directly closed by the `ContentDecryptor`. Instead, the `ContentDecryptor` module basically signals a `tooMuchSessions` event when only left with `MediaKeySession` for the current content yet going over the `maxSessionCacheSize` limit. It also doesn't create the `MediaKeySession` in that case. Fixing the situation -------------------- The `ContentDecryptor` now exposes a new method, `freeKeyIds`. The idea is that you communicate to it the key id you don't need anymore, then the `ContentDecryptor` will see if can consequently close some `MediaKeySession`. It is the role of the `ContentInitializer` to call this `freeKeyIds` method on key ids it doesn't seem to have the use of anymore (all key ids are in the payload of the `tooMuchSessions` event). Note: the new `ActiveSessionsStore` ----------------------------------- To allow the `ContentDecryptor` to easily know when it can restart creating `MediaKeySession` after encountering this `tooMuchSessions` situation and then having its `freeKeyIds` method called, I replaced its simple `_currentSessions` private array into a new kind of "MediaKeySession store" (a third one after the `LoadedSessionsStore` and the `PersistentSessionsStore`), called the `ActiveSessionsStore`, which also keeps a `isFull` boolean around. This new store's difference with the `LoadedSessionsStore` may be unclear at first but there's one: - The `LoadedSessionsStore` stores information on all `MediaKeySession` currently attached to a `MediaKeys` (and also creates / close them). - The `ActiveSessionsStore` is technically just an array of `MediaKeySession` information and a `isFull` flag, and its intended semantic is to represent all `MediaKeySession` that are "actively-used" by the `ContentDecryptor` (in implementation, it basically means all `MediaKeySession` linked to the current content). If you followed, note that the session information stored by the `LoadedSessionsStore` is a superset of the ones stored by the `ActiveSessionsStore` (the former contains all information from the latter) as "active" sessions are all currently "loaded". Writing that, I'm still unsure if the `isFull` flag would have more its place on the `LoadedSessionsStore` instead. After all `maxSessionCacheSize` technically applies to all "loaded" sessions, not just the "active" ones which is a concept only relied on by RxPlayer internals. We may discuss on what makes the most sense here. Remaining issues ================ This PR only fixes a fraction of the issue, actually the simpler part where we can close older `MediaKeySession` linked to the current content that we don't need anymore, like for example for a previous DASH Period. But there's also the risk of encountering that limit while preloading future contents encrypted through a different keys, or when seeking back at a previous DASH Period with different keys. In all those scenarios (which actually seems more probable), there's currently no fix, just error logs and multiple FIXME mentions in the code. Fixing this issue while keeping a readable code is very hard right now, moreover for what is only a suite of theoretical problems that has never been observed for now. So I sometimes wonder if the best compromise would not be to just let it happen, and have an heuristic somewhere else detecting the issue and fixing it by slightly reducing the experience (e.g. by reloading + disabling future Period pre-loading)...
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
DRM
Relative to DRM (EncryptedMediaExtensions)
work-in-progress
This Pull Request or issue is not finished yet
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Since Monday, I'm trying to fix the theoretical issue described here, but it proved very hard to actually implement.
As such, I chose to open its Pull Request despite it still being non-functional, to expose (to other developers and also to myself), what I tried and what wall I encountered here.
We may in the end try different solutions.
The issue
Preamble: conditions
This work is about a specific and for now never seen issue that has chance to pop up under the following conditions:
The current content make use of per-Period key rotation (NOTE: this is also possible without key rotation, but this is made much more probable by it).
The device on which we we play that content has a very limited number of key slots available simultaneously for decryption (sometimes VERY limited, e.g.
6
, so we can at most rely on 6 keys simultaneously.We for now know of two different set-top boxes with that limitation.
The
maxSessionCacheSize
optionTheoretically, an application can rely there on the
keySystems[].maxSessionCacheSize
option to set a maximum number ofMediaKeySession
we may keep at at the same time.Note that we prefer here to rely on a number of
MediaKeySession
and not of keys, because the RxPlayer is not able to predict how many keys it will find inside a license (NOTE: to simplify, let's just say that 1MediaKeySession
== 1 license here, as it's very predominantly the case) nor is it able to not communicate some keys that are found in a given license.Yet an application generally has a rough idea of how many keys it's going to find in a license a most, and can set up its
maxSessionCacheSize
accordingly (e.g. if there's max10
key slots available on the device and5
keys per license maximum, it could just communicate to us amaxSessionCacheSize
of2
).So problem solved right? WRONG!
The RxPlayer, when exploiting the
maxSessionCacheSize
property, will when that limit is reached just close theMediaKeySession
it has least recently seen the need for (basically, when the RxPlayer loads segment for a new Period/track/Representation, it asks our decryption logic to make sure it has the right key, this is how our decryption logic know that aMediaKeySession
or more precizely a key it can make use of, has been needed recently.Example scenario
So let's just imagine a scenario:
we're currently loading a Period A with encrypted content and buffering a future Period B with content encrypted with a different key.
we're asking the decryption logic to make sure the key is loaded for future Period B
The decryption logic sees that it doesn't have the key yet, and thus has to create a new
MediaKeySession
.Yet, it sees that it cannot create a new
MediaKeySession
for that new key without closing an old one to respect thekeySystems[].maxSessionCacheSize
option, and it turns out one relied on to play Period A was the least recently needed for some reason.The decryption logic closes a
MediaKeySession
for PeriodA
that was currently relied on.??? I don't know what happens, the
MediaKeySession
closure may fail in which case we could be left with too many key slots used on the device and some random error, or content playback may just fail directly.In any case, I wouldn't bet on something good happening.
Other types of scenarios are possible, e.g. we could be closing a
MediaKeySession
needed in the future and not think to re-create it when playing that future Period, potentially leading to a future infinite rebuffering.Solution I'm proposing here
The solution I tried to implement tries the following logic:
When closing a
MediaKeySession
linked to the current content, it will stop relying on "least-recently used" principles (this algorithm will still be relied on for "cached"MediaKeySession
- e.g. when switching between multiple TV channels).The idea here is more aligned with usual buffer-linked eviction algorithms:
We'll begin to close
MediaKeySession
linked only to already-playedPeriod
, in theory beginning by those further in the past.In reality let's just close all of them.
if it's not sufficient, we'll close
MediaKeySession
linked toPeriod
distant in the future (like further in the future first).if it's not sufficient, e.g. if even just for the current Period there's too many
MediaKeySession
, I'm not sure of what to do for now (probably do nothing and let things break)This only take into consideration the concept of
Period
, even if theoretically aMediaKeySession
could only be linked to a particular track orRepresentation
.This is only because - when implementing the solution - my brain already hurt thinking about the very simple concept of per-Period licenses, I didn't want to go to other aspects, we'll see those details later (even if they are also very possible - the solution I write here also work for them, even if we could also have an even better one taking into consideration e.g. the current tracks).
How I'm implementing this
On the
ContentDecryptor
-sideThis feature has been a little hard to actually implement.
First, as we now have a difference in our
MediaKeySession
-closing algorithm depending on if theMediaKeySession
is linked to the current content or not, I chose in ourContentDecryptor
module that:MediaKeySession
that are not linked to the current content keep being closed as before: least recently needed first.MediaKeySession
that are linked to the current content are never directly closed by theContentDecryptor
.Instead, the
ContentDecryptor
module basically signals, when only left withMediaKeySession
for the current content yet going over themaxSessionCacheSize
limit, atooMuchSessions
event and lock its queue of protection data (it stops basically creating newMediaKeySession
until older sessions are closed).The
ContentDecryptor
also exposes a new method,freeKeyIds
. You communicate to it the key id you don't need anymore, then theContentDecryptor
will see if can consequently closeMediaKeySession
, then check if themaxSessionCacheSize
is respected, restart its queue if it is, and so on.On the
ContentInitializer
-sideThe
ContentInitializer
module then receptiontooMuchSessions
events coming from theContentDecryptor
, and rely on our newly-defined algorithm to know whichMediaKeySession
it can let go.Though this is where for now I'm lost due to edge cases:
When closing
MediaKeySession
linked to futurePeriod
, theStream
modules loading their linked content might be in the process of running, and they won't know for now that theMediaKeySession
has been closed and thus not think about re-asking for the decryption key.There's many potential solutions here (relying on the concept of
lockedStream
, reloading, actually communicating to those streams that for now aMediaKeySession
is closed, re-asking for the right key once the content is playing inside theContentInitializer
), yet any of them are very hard to get right and all of them have inconvenients.There's still the issue of what should happen if we're left only with the current Period. I already lost many neurons on the simpler first simpler future-Period case, and in consequence had none to spare for this more complex possibility.
This is on another level, but what if while we were running this algorithm, some random
MediaKeySession
is closed by the browser (or some RxPlayer/application logic running in concurrence)?We may theoretically go below the
maxSessionCacheSize
limit at any time, due to some outside behavior, and in the end not actually need to continue running the algorithm.For now, even if that one is possible, I chose that having perfect code that would restart processing init data by itself due to that type of event too difficult to actually implement.
So we'll probably should not show broken behavior if that actually happens, but I wasn't explicitely handling it here.