DRM: Second attempt to manage a maxSessionCacheSize on contents going over the limit#1590
DRM: Second attempt to manage a maxSessionCacheSize on contents going over the limit#1590peaBerberian wants to merge 1 commit intodevfrom
maxSessionCacheSize on contents going over the limit#1590Conversation
d6a59a0 to
f870862
Compare
f870862 to
e229ef3
Compare
| for (let i = 0; i < toDelete; i++) { | ||
| const entry = entries[i]; | ||
| proms.push(loadedSessionsStore.closeSession(entry.mediaKeySession)); | ||
| const sessionsMetadata = loadedSessionsStore.getAll().slice(); // clone |
There was a problem hiding this comment.
What the purpose of cloning here?
There was a problem hiding this comment.
It doesn't seem necessary here.
I often do that to either ensure that infinite loops are prevented (due to a permanently updated array) or when I want to take a snapshot of an array at a particular point in time.
Here none of that appears to be necessary, do you think I should remove the slice call?
| return; | ||
| } | ||
| log.info("DRM: LSS cache limit exceeded", limit, loadedSessionsStore.getLength()); | ||
| const proms: Array<Promise<unknown>> = []; |
There was a problem hiding this comment.
can be typed as Promise boolean ?
There was a problem hiding this comment.
Maybe, but here the return type was not important because we never rely on it. I typed it as unknown so it is not an obstacle now or in future evolutions.
e229ef3 to
b95dc6d
Compare
b95dc6d to
c4e8b88
Compare
00fc806 to
b7216b4
Compare
196f443 to
1baa96c
Compare
1baa96c to
2366c19
Compare
2366c19 to
1aaaa27
Compare
1aaaa27 to
0888009
Compare
0888009 to
5dfd9cd
Compare
716b2c8 to
aab2c57
Compare
5dfd9cd to
e44b017
Compare
1184032 to
85ee050
Compare
6cfd206 to
1e55170
Compare
a42734c to
29372ac
Compare
85ee050 to
b0cc5fa
Compare
d4be192 to
9ad6758
Compare
0142e34 to
1fd9df3
Compare
…ng over the limit This is a retry of #1511 (I even re-used the same git branch) because I found the work in that PR to be too complex. If you already read the previous PR, you can skip the `The issue` below. The issue ========= Conditions ---------- This work is about a specific and for now never seen issue that has chance to pop up under the following conditions: - The current content make use of per-Period key rotation (NOTE: this is also possible without key rotation, but this is made much more probable by it). - The device on which we we play that content has a very limited number of key slots available simultaneously for decryption (sometimes **VERY** limited, e.g. `6`, so we can at most rely on 6 keys simultaneously. We for now know of four different set-top boxes with that type of limitation, from two constructors. The `maxSessionCacheSize` option --------------------------------------------- Theoretically, an application can rely there on the `keySystems[].maxSessionCacheSize` option to set a maximum number of `MediaKeySession` we may keep at at the same time. _Note that we prefer here to rely on a number of `MediaKeySession` and not of keys, because the RxPlayer is not able to predict how many keys it will find inside a license (NOTE: to simplify, let's just say that 1 `MediaKeySession` == 1 license here, as it's very predominantly the case) nor is it able to not communicate some keys that are found in a given license._ _Yet an application generally has a rough idea of how many keys it's going to find in a license a most, and can set up its `maxSessionCacheSize` accordingly (e.g. if there's max `10` key slots available on the device and `5` keys per license maximum, it could just communicate to us a `maxSessionCacheSize` of `2`)._ So problem solved right? WRONG! The RxPlayer, when exploiting the `maxSessionCacheSize` property, will when that limit is reached just close the `MediaKeySession` it has least recently seen the need for (basically, when the RxPlayer loads segment for a new Period/track/Representation, it asks our decryption logic to make sure it has the right key, this is how our decryption logic know that a `MediaKeySession` or more precizely a key it can make use of, has been needed recently). Example scenario ---------------- So let's just imagine a simple scenario: 1. we're currently loading a Period `A` with encrypted content and buffering a future Period `B` with content encrypted with a different key. 2. we're asking the decryption logic to make sure the key is loaded for future Period `B` 3. The decryption logic sees that it doesn't have the key yet, and thus has to create a new `MediaKeySession`. Yet, it sees that it cannot create a new `MediaKeySession` for that new key without closing an old one to respect the `keySystems[].maxSessionCacheSize` option, and it turns out one relied on to play Period A was the least recently needed for some reason. 4. The decryption logic closes a `MediaKeySession` for Period `A` that was currently relied on. 5. ??? I don't know what happens, the `MediaKeySession` closure may fail in which case we could be left with too many key slots used on the device and some random error, or content playback may just fail directly. In any case, I wouldn't bet on something good happening. Other types of scenarios are possible, e.g. we could be closing a `MediaKeySession` needed in the future and not think to re-create it when playing that future Period, potentially leading to a future infinite rebuffering. Solution I'm proposing here =========================== In a previous PR, I tried to handle all cases but that became too complex and I know think that doing it in multiple steps may be easier to architects: we handle first the "simple" cases (which sadly are not the most frequent ones), we'll then see the harder cases. The simpler case it to just close `MediaKeySession` that are known to not be needed anymore if we go over the `maxSessionCacheSize` limit on the current content. To have a vague non-perfect idea of what is currently needed, we look at all `Period`s from the current position onward, list their key ids, compare with the keys currently handled by our DRM logic, and just close the ones that haven't been found. How I'm implementing this ========================= Detecting the issue ------------------- As we now have a difference in our `MediaKeySession`-closing algorithm depending on if the `MediaKeySession` is linked to the current content or not, I chose in our `ContentDecryptor` module that: 1. `MediaKeySession` that are not linked to the current content keep being closed like they were before: least recently needed first. 2. `MediaKeySession` that are linked to the current content are never directly closed by the `ContentDecryptor`. Instead, the `ContentDecryptor` module basically signals a `tooMuchSessions` event when only left with `MediaKeySession` for the current content yet going over the `maxSessionCacheSize` limit. It also doesn't create the `MediaKeySession` in that case. Fixing the situation -------------------- The `ContentDecryptor` now exposes a new method, `freeKeyIds`. The idea is that you communicate to it the key id you don't need anymore, then the `ContentDecryptor` will see if can consequently close some `MediaKeySession`. It is the role of the `ContentInitializer` to call this `freeKeyIds` method on key ids it doesn't seem to have the use of anymore (all key ids are in the payload of the `tooMuchSessions` event). Note: the new `ActiveSessionsStore` ----------------------------------- To allow the `ContentDecryptor` to easily know when it can restart creating `MediaKeySession` after encountering this `tooMuchSessions` situation and then having its `freeKeyIds` method called, I replaced its simple `_currentSessions` private array into a new kind of "MediaKeySession store" (a third one after the `LoadedSessionsStore` and the `PersistentSessionsStore`), called the `ActiveSessionsStore`, which also keeps a `isFull` boolean around. This new store's difference with the `LoadedSessionsStore` may be unclear at first but there's one: - The `LoadedSessionsStore` stores information on all `MediaKeySession` currently attached to a `MediaKeys` (and also creates / close them). - The `ActiveSessionsStore` is technically just an array of `MediaKeySession` information and a `isFull` flag, and its intended semantic is to represent all `MediaKeySession` that are "actively-used" by the `ContentDecryptor` (in implementation, it basically means all `MediaKeySession` linked to the current content). If you followed, note that the session information stored by the `LoadedSessionsStore` is a superset of the ones stored by the `ActiveSessionsStore` (the former contains all information from the latter) as "active" sessions are all currently "loaded". Writing that, I'm still unsure if the `isFull` flag would have more its place on the `LoadedSessionsStore` instead. After all `maxSessionCacheSize` technically applies to all "loaded" sessions, not just the "active" ones which is a concept only relied on by RxPlayer internals. We may discuss on what makes the most sense here. Remaining issues ================ This PR only fixes a fraction of the issue, actually the simpler part where we can close older `MediaKeySession` linked to the current content that we don't need anymore, like for example for a previous DASH Period. But there's also the risk of encountering that limit while preloading future contents encrypted through a different keys, or when seeking back at a previous DASH Period with different keys. In all those scenarios (which actually seems more probable), there's currently no fix, just error logs and multiple FIXME mentions in the code. Fixing this issue while keeping a readable code is very hard right now, moreover for what is only a suite of theoretical problems that has never been observed for now. So I sometimes wonder if the best compromise would not be to just let it happen, and have an heuristic somewhere else detecting the issue and fixing it by slightly reducing the experience (e.g. by reloading + disabling future Period pre-loading)...
b0cc5fa to
1a2f91b
Compare
|
✅ Automated performance checks have passed on commit DetailsPerformance tests 1st run outputNo significative change in performance for tests:
|
This is a retry of #1511 (I even re-used the same git branch) because I found the work in that first PR to be too complex.
If you already read the previous PR, you can skip the
The issuechapter below.The issue
Conditions
This work is about a specific and for now never seen issue that has chance to appear under the following conditions:
The current content make use of per-Period key rotation (NOTE: this is also possible without key rotation, but this is made much more probable by it).
The device on which we we play that content has a very limited number of key slots available simultaneously for decryption (sometimes VERY limited, e.g.
6, so we can at most rely on 6 keys simultaneously.We for now know of four different set-top boxes with that type of limitation, from two constructors.
The
maxSessionCacheSizeoptionTheoretically, an application can rely there on the
keySystems[].maxSessionCacheSizeoption to set a maximum number ofMediaKeySessionwe may keep at at the same time.Note that we prefer here to rely on a number of
MediaKeySessionand not of keys, because the RxPlayer is not able to predict how many keys it will find inside a license (NOTE: to simplify, let's just say that 1MediaKeySession== 1 license here, as it's very predominantly the case) nor is it able to not communicate some keys that are found in a given license. Yet an application generally has a rough idea of how many keys it's going to find in a license a most, and can set up itsmaxSessionCacheSizeaccordingly (e.g. if there's max10key slots available on the device and5keys per license maximum, it could just communicate to us amaxSessionCacheSizeof2).So problem solved right? WRONG!
The RxPlayer, when exploiting the
maxSessionCacheSizeproperty, will when that limit is reached just close theMediaKeySessionit has least recently seen the need for (basically, when the RxPlayer loads segment for a new Period/track/Representation, it asks our decryption logic to make sure it has the right key, this is how our decryption logic know that aMediaKeySessionor more precizely a key it can make use of, has been needed recently).Example scenario
So let's just imagine a simple scenario:
we're currently loading a Period
Awith encrypted content and buffering a future PeriodBwith content encrypted with a different key.we're asking the decryption logic to make sure the key is loaded for future Period
BThe decryption logic sees that it doesn't have the key yet, and thus has to create a new
MediaKeySession.Yet, it sees that it cannot create a new
MediaKeySessionfor that new key without closing an old one to respect thekeySystems[].maxSessionCacheSizeoption, and it turns out one relied on to play Period A was the least recently needed for some reason.The decryption logic closes a
MediaKeySessionfor PeriodAthat was currently relied on.??? I don't know what happens, the
MediaKeySessionclosure may fail in which case we could be left with too many key slots used on the device and some random error, or content playback may just fail directly.In any case, I wouldn't bet on something good happening.
Other types of scenarios are possible, e.g. we could be closing a
MediaKeySessionneeded in the future and not think to re-create it when playing that future Period, potentially leading to a future infinite rebuffering.Solution I'm proposing here
In a previous PR, I tried to handle all cases but that became too complex and I now think that doing it in multiple steps may be easier to write: we handle first the "simple" cases (which sadly are not the most frequent ones), we'll then see the harder cases.
The simpler case it to just close
MediaKeySessionthat are known to not be needed anymore if we go over themaxSessionCacheSizelimit on the current content.To have a vague imperfect idea of what is currently needed, we look at all
Periods from the current position onward, list their key ids, compare with the keys currently handled by our DRM logic, and just close the ones that haven't been found.How I'm implementing this
Detecting the issue
As we now have a difference in our
MediaKeySession-closing algorithm depending on if theMediaKeySessionis linked to the current content or not, I chose in ourContentDecryptormodule that:MediaKeySessionthat are not linked to the current content keep being closed like they were before: least recently needed first.MediaKeySessionthat are linked to the current content are never directly closed by theContentDecryptor.Instead, the
ContentDecryptormodule basically signals atooMuchSessionsevent when only left withMediaKeySessionfor the current content yet going over themaxSessionCacheSizelimit.It also doesn't create the
MediaKeySessionin that case.Fixing the situation
The
ContentDecryptornow exposes a new method,freeKeyIds. The idea is that you communicate to it the key id you don't need anymore, then theContentDecryptorwill see if can consequently close someMediaKeySession.It is the role of the
ContentInitializerto call thisfreeKeyIdsmethod on key ids it doesn't seem to have the use of anymore (all key ids are in the payload of thetooMuchSessionsevent).Note: the new
ActiveSessionsStoreTo allow the
ContentDecryptorto easily know when it can restart creatingMediaKeySessionafter encountering thistooMuchSessionssituation and then having itsfreeKeyIdsmethod called, I replaced its simple_currentSessionsprivate array into a new kind of "MediaKeySession store" (a third one after theLoadedSessionsStoreand thePersistentSessionsStore), called theActiveSessionsStore, which also keeps aisFullboolean around.This new store's difference with the
LoadedSessionsStoremay be unclear at first but there's one:The
LoadedSessionsStorestores information on allMediaKeySessioncurrently attached to aMediaKeys(and also creates / close them).The
ActiveSessionsStoreis technically just an array ofMediaKeySessioninformation and aisFullflag, and its intended semantic is to represent allMediaKeySessionthat are "actively-used" by theContentDecryptor(in implementation, it basically means allMediaKeySessionlinked to the current content).If you followed, note that the session information stored by the
LoadedSessionsStoreis a superset of the ones stored by theActiveSessionsStore(the former contains all information from the latter) as "active" sessions are all currently "loaded".Writing that, I'm still unsure if the
isFullflag would have more its place on theLoadedSessionsStoreinstead. After allmaxSessionCacheSizetechnically applies to all "loaded" sessions, not just the "active" ones which is a concept only relied on by RxPlayer internals.We may discuss on what makes the most sense here.
Remaining issues
This PR only fixes a fraction of the issue, actually the simpler part where we can close older
MediaKeySessionlinked to the current content that we don't need anymore, like for example for a previous DASH Period.But there's also the risk of encountering that limit while preloading future contents encrypted through different keys, or when seeking back at a previous DASH Period with different keys. In all those scenarios (which actually seems more probable), there's currently no fix, just error logs and multiple FIXME mentions in the code.
Fixing this issue while keeping a readable code is very hard right now, moreover for what is only a suite of theoretical problems that has never been observed for now.
So I sometimes wonder if the best compromise would not be to just let it happen, and have an heuristic somewhere else detecting the issue and fixing it by slightly reducing the experience (e.g. by reloading + disabling future Period pre-loading)...