Skip to content

DRM: Second attempt to manage a maxSessionCacheSize on contents going over the limit#1590

Open
peaBerberian wants to merge 1 commit intodevfrom
misc/max-session-cache-size-current-content
Open

DRM: Second attempt to manage a maxSessionCacheSize on contents going over the limit#1590
peaBerberian wants to merge 1 commit intodevfrom
misc/max-session-cache-size-current-content

Conversation

@peaBerberian
Copy link
Copy Markdown
Collaborator

@peaBerberian peaBerberian commented Oct 31, 2024

This is a retry of #1511 (I even re-used the same git branch) because I found the work in that first PR to be too complex.

If you already read the previous PR, you can skip the The issue chapter below.

The issue

Conditions

This work is about a specific and for now never seen issue that has chance to appear under the following conditions:

  • The current content make use of per-Period key rotation (NOTE: this is also possible without key rotation, but this is made much more probable by it).

  • The device on which we we play that content has a very limited number of key slots available simultaneously for decryption (sometimes VERY limited, e.g. 6, so we can at most rely on 6 keys simultaneously.

    We for now know of four different set-top boxes with that type of limitation, from two constructors.

The maxSessionCacheSize option

Theoretically, an application can rely there on the keySystems[].maxSessionCacheSize option to set a maximum number of MediaKeySession we may keep at at the same time.

Note that we prefer here to rely on a number of MediaKeySession and not of keys, because the RxPlayer is not able to predict how many keys it will find inside a license (NOTE: to simplify, let's just say that 1 MediaKeySession == 1 license here, as it's very predominantly the case) nor is it able to not communicate some keys that are found in a given license. Yet an application generally has a rough idea of how many keys it's going to find in a license a most, and can set up its maxSessionCacheSize accordingly (e.g. if there's max 10 key slots available on the device and 5 keys per license maximum, it could just communicate to us a maxSessionCacheSize of 2).

So problem solved right? WRONG!

The RxPlayer, when exploiting the maxSessionCacheSize property, will when that limit is reached just close the MediaKeySession it has least recently seen the need for (basically, when the RxPlayer loads segment for a new Period/track/Representation, it asks our decryption logic to make sure it has the right key, this is how our decryption logic know that a MediaKeySession or more precizely a key it can make use of, has been needed recently).

Example scenario

So let's just imagine a simple scenario:

  1. we're currently loading a Period A with encrypted content and buffering a future Period B with content encrypted with a different key.

  2. we're asking the decryption logic to make sure the key is loaded for future Period B

  3. The decryption logic sees that it doesn't have the key yet, and thus has to create a new MediaKeySession.

    Yet, it sees that it cannot create a new MediaKeySession for that new key without closing an old one to respect the keySystems[].maxSessionCacheSize option, and it turns out one relied on to play Period A was the least recently needed for some reason.

  4. The decryption logic closes a MediaKeySession for Period A that was currently relied on.

  5. ??? I don't know what happens, the MediaKeySession closure may fail in which case we could be left with too many key slots used on the device and some random error, or content playback may just fail directly.

    In any case, I wouldn't bet on something good happening.

Other types of scenarios are possible, e.g. we could be closing a MediaKeySession needed in the future and not think to re-create it when playing that future Period, potentially leading to a future infinite rebuffering.

Solution I'm proposing here

In a previous PR, I tried to handle all cases but that became too complex and I now think that doing it in multiple steps may be easier to write: we handle first the "simple" cases (which sadly are not the most frequent ones), we'll then see the harder cases.

The simpler case it to just close MediaKeySession that are known to not be needed anymore if we go over the maxSessionCacheSize limit on the current content.

To have a vague imperfect idea of what is currently needed, we look at all Periods from the current position onward, list their key ids, compare with the keys currently handled by our DRM logic, and just close the ones that haven't been found.

How I'm implementing this

Detecting the issue

As we now have a difference in our MediaKeySession-closing algorithm depending on if the MediaKeySession is linked to the current content or not, I chose in our ContentDecryptor module that:

  1. MediaKeySession that are not linked to the current content keep being closed like they were before: least recently needed first.

  2. MediaKeySession that are linked to the current content are never directly closed by the ContentDecryptor.

    Instead, the ContentDecryptor module basically signals a tooMuchSessions event when only left with MediaKeySession for the current content yet going over the maxSessionCacheSize limit.

    It also doesn't create the MediaKeySession in that case.

Fixing the situation

The ContentDecryptor now exposes a new method, freeKeyIds. The idea is that you communicate to it the key id you don't need anymore, then the ContentDecryptor will see if can consequently close some MediaKeySession.

It is the role of the ContentInitializer to call this freeKeyIds method on key ids it doesn't seem to have the use of anymore (all key ids are in the payload of the tooMuchSessions event).

Note: the new ActiveSessionsStore

To allow the ContentDecryptor to easily know when it can restart creating MediaKeySession after encountering this tooMuchSessions situation and then having its freeKeyIds method called, I replaced its simple _currentSessions private array into a new kind of "MediaKeySession store" (a third one after the LoadedSessionsStore and the PersistentSessionsStore), called the ActiveSessionsStore, which also keeps a isFull boolean around.

This new store's difference with the LoadedSessionsStore may be unclear at first but there's one:

  • The LoadedSessionsStore stores information on all MediaKeySession currently attached to a MediaKeys (and also creates / close them).

  • The ActiveSessionsStore is technically just an array of MediaKeySession information and a isFull flag, and its intended semantic is to represent all MediaKeySession that are "actively-used" by the ContentDecryptor (in implementation, it basically means all MediaKeySession linked to the current content).

If you followed, note that the session information stored by the LoadedSessionsStore is a superset of the ones stored by the ActiveSessionsStore (the former contains all information from the latter) as "active" sessions are all currently "loaded".

Writing that, I'm still unsure if the isFull flag would have more its place on the LoadedSessionsStore instead. After all maxSessionCacheSize technically applies to all "loaded" sessions, not just the "active" ones which is a concept only relied on by RxPlayer internals.

We may discuss on what makes the most sense here.

Remaining issues

This PR only fixes a fraction of the issue, actually the simpler part where we can close older MediaKeySession linked to the current content that we don't need anymore, like for example for a previous DASH Period.

But there's also the risk of encountering that limit while preloading future contents encrypted through different keys, or when seeking back at a previous DASH Period with different keys. In all those scenarios (which actually seems more probable), there's currently no fix, just error logs and multiple FIXME mentions in the code.

Fixing this issue while keeping a readable code is very hard right now, moreover for what is only a suite of theoretical problems that has never been observed for now.

So I sometimes wonder if the best compromise would not be to just let it happen, and have an heuristic somewhere else detecting the issue and fixing it by slightly reducing the experience (e.g. by reloading + disabling future Period pre-loading)...

@peaBerberian peaBerberian added bug This is an RxPlayer issue (unexpected result when comparing to the API) DRM Relative to DRM (EncryptedMediaExtensions) labels Oct 31, 2024
@peaBerberian peaBerberian force-pushed the misc/max-session-cache-size-current-content branch from d6a59a0 to f870862 Compare October 31, 2024 18:27
@peaBerberian peaBerberian added this to the 4.3.0 milestone Nov 6, 2024
@peaBerberian peaBerberian force-pushed the misc/max-session-cache-size-current-content branch from f870862 to e229ef3 Compare November 15, 2024 18:01
@peaBerberian peaBerberian added the Priority: 2 (Medium) This issue or PR has a medium priority. label Nov 22, 2024
for (let i = 0; i < toDelete; i++) {
const entry = entries[i];
proms.push(loadedSessionsStore.closeSession(entry.mediaKeySession));
const sessionsMetadata = loadedSessionsStore.getAll().slice(); // clone
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What the purpose of cloning here?

Copy link
Copy Markdown
Collaborator Author

@peaBerberian peaBerberian Dec 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem necessary here.
I often do that to either ensure that infinite loops are prevented (due to a permanently updated array) or when I want to take a snapshot of an array at a particular point in time.

Here none of that appears to be necessary, do you think I should remove the slice call?

return;
}
log.info("DRM: LSS cache limit exceeded", limit, loadedSessionsStore.getLength());
const proms: Array<Promise<unknown>> = [];
Copy link
Copy Markdown
Collaborator

@Florent-Bouisset Florent-Bouisset Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be typed as Promise boolean ?

Copy link
Copy Markdown
Collaborator Author

@peaBerberian peaBerberian Dec 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, but here the return type was not important because we never rely on it. I typed it as unknown so it is not an obstacle now or in future evolutions.

@peaBerberian peaBerberian force-pushed the misc/max-session-cache-size-current-content branch from e229ef3 to b95dc6d Compare December 12, 2024 13:47
@peaBerberian peaBerberian added Priority: 3 (Low) This issue or PR has a low priority. and removed Priority: 2 (Medium) This issue or PR has a medium priority. labels Dec 23, 2024
@peaBerberian peaBerberian force-pushed the misc/max-session-cache-size-current-content branch from b95dc6d to c4e8b88 Compare January 28, 2025 14:05
@peaBerberian peaBerberian modified the milestones: 4.3.0, 4.4.0 Apr 7, 2025
@peaBerberian peaBerberian force-pushed the dev branch 2 times, most recently from 00fc806 to b7216b4 Compare April 15, 2025 18:14
@peaBerberian peaBerberian force-pushed the misc/max-session-cache-size-current-content branch 2 times, most recently from 196f443 to 1baa96c Compare April 22, 2025 18:27
@peaBerberian peaBerberian force-pushed the misc/max-session-cache-size-current-content branch from 1baa96c to 2366c19 Compare June 30, 2025 12:41
@peaBerberian peaBerberian force-pushed the misc/max-session-cache-size-current-content branch from 2366c19 to 1aaaa27 Compare July 8, 2025 10:45
@peaBerberian peaBerberian removed this from the 4.4.0 milestone Jul 23, 2025
@peaBerberian peaBerberian force-pushed the misc/max-session-cache-size-current-content branch from 1aaaa27 to 0888009 Compare August 1, 2025 13:27
@canalplus canalplus deleted a comment from github-actions bot Aug 1, 2025
@canalplus canalplus deleted a comment from github-actions bot Aug 1, 2025
@canalplus canalplus deleted a comment from github-actions bot Aug 1, 2025
@canalplus canalplus deleted a comment from github-actions bot Aug 1, 2025
@peaBerberian peaBerberian force-pushed the misc/max-session-cache-size-current-content branch from 0888009 to 5dfd9cd Compare August 29, 2025 15:46
@peaBerberian peaBerberian force-pushed the dev branch 4 times, most recently from 716b2c8 to aab2c57 Compare September 25, 2025 15:25
@peaBerberian peaBerberian force-pushed the misc/max-session-cache-size-current-content branch from 5dfd9cd to e44b017 Compare September 25, 2025 20:44
@canalplus canalplus deleted a comment from github-actions bot Sep 26, 2025
@canalplus canalplus deleted a comment from github-actions bot Sep 26, 2025
@peaBerberian peaBerberian force-pushed the misc/max-session-cache-size-current-content branch 2 times, most recently from 1184032 to 85ee050 Compare October 10, 2025 09:08
@peaBerberian peaBerberian force-pushed the dev branch 4 times, most recently from 6cfd206 to 1e55170 Compare October 13, 2025 20:54
@peaBerberian peaBerberian force-pushed the misc/max-session-cache-size-current-content branch from 85ee050 to b0cc5fa Compare December 10, 2025 17:32
@peaBerberian peaBerberian force-pushed the dev branch 2 times, most recently from d4be192 to 9ad6758 Compare December 19, 2025 19:49
@peaBerberian peaBerberian force-pushed the dev branch 9 times, most recently from 0142e34 to 1fd9df3 Compare January 27, 2026 11:59
…ng over the limit

This is a retry of #1511 (I even re-used the same git branch) because
I found the work in that PR to be too complex.

If you already read the previous PR, you can skip the `The issue` below.

The issue
=========

Conditions
----------

This work is about a specific and for now never seen issue that has chance to pop up under the following conditions:

  - The current content make use of per-Period key rotation (NOTE: this is also possible without key rotation, but this is made much more probable by it).

  - The device on which we we play that content has a very limited number of key slots available simultaneously for decryption  (sometimes **VERY** limited, e.g. `6`, so we can at most rely on 6 keys simultaneously.

    We for now know of four different set-top boxes with that type of limitation, from two constructors.

The `maxSessionCacheSize` option
---------------------------------------------

Theoretically, an application can rely there on the `keySystems[].maxSessionCacheSize` option to set a maximum number of `MediaKeySession` we may keep at at the same time.

_Note that we prefer here to rely on a number of `MediaKeySession` and not of keys, because the RxPlayer is not able to predict how many keys it will find inside a license (NOTE: to simplify, let's just say that 1 `MediaKeySession` == 1 license here, as it's very predominantly the case) nor is it able to not communicate some keys that are found in a given license._
_Yet an application generally has a rough idea of how many keys it's going to find in a license a most, and can set up its `maxSessionCacheSize` accordingly (e.g. if there's max `10` key slots available on the device and `5` keys per license maximum, it could just communicate to us a `maxSessionCacheSize` of `2`)._

So problem solved right? WRONG!

The RxPlayer, when exploiting the `maxSessionCacheSize` property, will when that limit is reached just close the `MediaKeySession` it has least recently seen the need for (basically, when the RxPlayer loads segment for a new Period/track/Representation, it asks our decryption logic to make sure it has the right key, this is how our decryption logic know that a `MediaKeySession` or more precizely a key it can make use of, has been needed recently).

Example scenario
----------------

So let's just imagine a simple scenario:

  1. we're currently loading a Period `A` with encrypted content and buffering a future Period `B` with content encrypted with a different key.

  2. we're asking the decryption logic to make sure the key is loaded for future Period `B`

  3. The decryption logic sees that it doesn't have the key yet, and thus has to create a new `MediaKeySession`.

     Yet, it sees that it cannot create a new `MediaKeySession` for that new key without closing an old one to respect the `keySystems[].maxSessionCacheSize` option, and it turns out one relied on to play Period A was the least recently needed for some reason.

  4. The decryption logic closes a `MediaKeySession` for Period `A` that was currently relied on.

  5. ??? I don't know what happens, the `MediaKeySession` closure may fail in which case we could be left with too many key slots used on the device and some random error, or content playback may just fail directly.

     In any case, I wouldn't bet on something good happening.

Other types of scenarios are possible, e.g. we could be closing a `MediaKeySession` needed in the future and not think to re-create it when playing that future Period, potentially leading to a future infinite rebuffering.

Solution I'm proposing here
===========================

In a previous PR, I tried to handle all cases but that became too
complex and I know think that doing it in multiple steps may be easier
to architects: we handle first the "simple" cases (which sadly are not
the most frequent ones), we'll then see the harder cases.

The simpler case it to just close `MediaKeySession` that are known to
not be needed anymore if we go over the `maxSessionCacheSize` limit on
the current content.

To have a vague non-perfect idea of what is currently needed, we look
at all `Period`s from the current position onward, list their key ids,
compare with the keys currently handled by our DRM logic, and just
close the ones that haven't been found.

How I'm implementing this
=========================

Detecting the issue
-------------------

As we now have a difference in our `MediaKeySession`-closing algorithm depending on if the `MediaKeySession` is linked to the current content or not, I chose in our `ContentDecryptor` module that:

  1. `MediaKeySession` that are not linked to the current content keep being closed like they were before: least recently needed first.

  2. `MediaKeySession` that are linked to the current content are never directly closed by the `ContentDecryptor`.

     Instead, the `ContentDecryptor` module basically signals a `tooMuchSessions` event when only left with `MediaKeySession` for the current content yet going over the `maxSessionCacheSize` limit.

     It also doesn't create the `MediaKeySession` in that case.

Fixing the situation
--------------------

The `ContentDecryptor` now exposes a new method, `freeKeyIds`. The idea is that you communicate to it the key id you don't need anymore, then the `ContentDecryptor` will see if can consequently close some `MediaKeySession`.

It is the role of the `ContentInitializer` to call this `freeKeyIds` method on key ids it doesn't seem to have the use of anymore (all key ids are in the payload of the `tooMuchSessions` event).

Note: the new `ActiveSessionsStore`
-----------------------------------

To allow the `ContentDecryptor` to easily know when it can restart creating
`MediaKeySession` after encountering this `tooMuchSessions` situation
and then having its `freeKeyIds` method called, I replaced its simple `_currentSessions` private array into a new kind of "MediaKeySession store" (a third one after the `LoadedSessionsStore` and the `PersistentSessionsStore`), called the `ActiveSessionsStore`, which also keeps a `isFull` boolean around.

This new store's difference with the `LoadedSessionsStore` may be unclear
at first but there's one:

  - The `LoadedSessionsStore` stores information on all `MediaKeySession`
    currently attached to a `MediaKeys` (and also creates / close them).

  - The `ActiveSessionsStore` is technically just an array of
    `MediaKeySession` information and a `isFull` flag, and its intended
    semantic is to represent all `MediaKeySession` that are "actively-used"
    by the `ContentDecryptor` (in implementation, it basically means all
    `MediaKeySession` linked to the current content).

If you followed, note that the session information stored by the `LoadedSessionsStore` is a superset of the ones stored by the
`ActiveSessionsStore` (the former contains all information from the
latter) as "active" sessions are all currently "loaded".

Writing that, I'm still unsure if the `isFull` flag would have more its place on
the `LoadedSessionsStore` instead. After all `maxSessionCacheSize` technically
applies to all "loaded" sessions, not just the "active" ones which is a
concept only relied on by RxPlayer internals.

We may discuss on what makes the most sense here.

Remaining issues
================

This PR only fixes a fraction of the issue, actually the simpler part
where we can close older `MediaKeySession` linked to the current content
that we don't need anymore, like for example for a previous DASH Period.

But there's also the risk of encountering that limit while preloading
future contents encrypted through a different keys, or when seeking back
at a previous DASH Period with different keys. In all those scenarios
(which actually seems more probable), there's currently no fix, just
error logs and multiple FIXME mentions in the code.

Fixing this issue while keeping a readable code is very hard right now,
moreover for what is only a suite of theoretical problems that has
never been observed for now.

So I sometimes wonder if the best compromise would not be to just let
it happen, and have an heuristic somewhere else detecting the issue and
fixing it by slightly reducing the experience (e.g. by reloading +
disabling future Period pre-loading)...
@peaBerberian peaBerberian force-pushed the misc/max-session-cache-size-current-content branch from b0cc5fa to 1a2f91b Compare February 27, 2026 15:31
@github-actions
Copy link
Copy Markdown

✅ Automated performance checks have passed on commit f127a957c265c18d7f7654142421c3267020e9f6 with the base branch dev.

Details

Performance tests 1st run output

No significative change in performance for tests:

Name Mean Median
loading 25.37ms -> 25.41ms (-0.040ms, z: 0.59789) 36.00ms -> 36.15ms
seeking 384.56ms -> 363.16ms (21.397ms, z: 1.78609) 16.65ms -> 17.10ms
audio-track-reload 31.85ms -> 31.89ms (-0.042ms, z: 0.59034) 47.55ms -> 47.40ms
cold loading multithread 50.78ms -> 50.16ms (0.616ms, z: 10.80412) 75.15ms -> 74.25ms
seeking multithread 55.44ms -> 45.49ms (9.953ms, z: 0.57685) 19.80ms -> 19.80ms
audio-track-reload multithread 30.27ms -> 30.14ms (0.128ms, z: 1.97146) 44.40ms -> 44.25ms
hot loading multithread 20.04ms -> 19.93ms (0.117ms, z: 3.99744) 29.40ms -> 29.10ms

@canalplus canalplus deleted a comment from github-actions bot Feb 27, 2026
@canalplus canalplus deleted a comment from github-actions bot Feb 27, 2026
@canalplus canalplus deleted a comment from github-actions bot Feb 27, 2026
@canalplus canalplus deleted a comment from github-actions bot Feb 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug This is an RxPlayer issue (unexpected result when comparing to the API) DRM Relative to DRM (EncryptedMediaExtensions) Priority: 3 (Low) This issue or PR has a low priority.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants