Skip to content

Comments

Fix VRAM leak: use weak references in ToplevelHandleState and stop capture sessions#2095

Open
MartinKavik wants to merge 7 commits intopop-os:masterfrom
MartinKavik:weak_window_upstream_smithay
Open

Fix VRAM leak: use weak references in ToplevelHandleState and stop capture sessions#2095
MartinKavik wants to merge 7 commits intopop-os:masterfrom
MartinKavik:weak_window_upstream_smithay

Conversation

@MartinKavik
Copy link

Summary

  • Fix VRAM leak when closing windows: store Weak references instead of strong CosmicSurface clones in ToplevelHandleState, so protocol
    handles don't keep WindowInner (and its GPU textures) alive
  • Fix VRAM leak when using workspace overview (Super+W): use WeakCosmicSurface in ImageCaptureSourceKind::Toplevel and explicitly drop
    capture sessions on toplevel removal so Session::drop() fails active frames and releases GPU buffers
  • Bump smithay to upstream 599857c (includes mem::forget fix for Frame::success()/Frame::fail())

More details and investigation notes: #2084

Test results

5 rounds of 20 cosmic-term windows + 5 workspace overview (Super+W) open/close cycles each. NVIDIA RTX 2070.

Round cosmic-comp cosmic-workspaces Total
0 (baseline) 32 MiB 14 MiB 508 MiB
1 207 MiB 142 MiB 893 MiB
2 73 MiB 142 MiB 759 MiB
3 248 MiB 142 MiB 893 MiB
4 140 MiB 142 MiB 868 MiB
5 290 MiB 14 MiB 765 MiB
6 248 MiB 142 MiB 893 MiB

VRAM fluctuates but does not accumulate across rounds. Before this fix, cosmic-comp leaked ~200 MiB per round, growing linearly.

Actively using the fixed cosmic-comp binary in live OS right now to make sure it's stable and not leaking.

Resolves

  • Closing windows VRAM leak (stale ToplevelHandleState.window references)
  • Workspace overview VRAM leak (capture session / ImageCaptureSource reference cycle)

MartinKavik and others added 7 commits February 13, 2026 03:31
ZcosmicToplevelHandleV1 protocol objects persist until the client
destroys them. Previously, ToplevelHandleState held a strong
CosmicSurface (Arc<WindowInner>) reference, keeping the entire
surface chain alive and trapping GPU textures in smithay's
append-only data_map.

Change ToplevelHandleState.window from Option<W> to Option<W::Weak>
so the reference cycle is broken structurally. When the compositor
drops its last strong reference to a window, the weak reference
becomes invalid and all resources (including VRAM) are freed
automatically.

- Add WeakCosmicSurface wrapping smithay's WeakWindow
- Add Window::Weak associated type and WindowWeak trait
- Store window.downgrade() instead of window.clone() in from_window()
- Use weak.upgrade() in window_from_handle()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move upgrade() from a separate WindowWeak trait to a static method
on the Window trait. Eliminates the circular associated type
relationship and reduces boilerplate.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…VRAM leak

The Toplevel variant stored a strong CosmicSurface reference in capture
source user data, creating a reference cycle that prevented GPU texture
cleanup. This caused ~203 MiB VRAM leak over 5 workspace overview cycles
with 20 windows. Follows the same weak reference pattern used for the
ToplevelHandleState fix and the existing Output(WeakOutput) variant.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…iew VRAM leak

Add stop_all_capture_sessions() that drains owned Session/CursorSession from
a surface's user data. Called in remove_toplevel() and refresh() dead-window
path. When sessions drop, Session::drop() fails active frames and releases
GPU buffers.

Needed because with WeakCosmicSurface in ImageCaptureSourceKind, the
session_destroyed callback can't upgrade() the weak ref after the surface
is gone, so remove_session() never runs without explicit draining.

Also points smithay to local fork with cherry-picked mem::forget fix
(3d3f9e35) — required for Frame::drop() to actually release GPU buffers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
No cherry-pick needed — upstream smithay works with system cosmic-workspaces
when all cosmic-comp fixes are present.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Explain why ToplevelHandleStateInner stores W::Weak instead of W,
why ImageCaptureSourceKind::Toplevel uses WeakCosmicSurface, and
why stop_all_capture_sessions is needed at toplevel removal time.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Member

@Drakulix Drakulix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the weak-CosmicSurface fix looks correct, however the changes to image-copy-capture don't seem to be. Please just drop them.

ImageCaptureSourceKind::Toplevel(mut toplevel) => {
ImageCaptureSourceKind::Toplevel(weak) => {
let Some(mut toplevel) = weak.upgrade() else {
return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this can just return, we need to invalidate the session here.

/// `Toplevel` uses `WeakCosmicSurface` (not a strong `CosmicSurface`) so
/// that capture sources created by cosmic-workspaces for window thumbnails
/// don't prevent the underlying window from being dropped. `Output` already
/// uses `WeakOutput` for the same reason.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please drop this comment.

/// Protocol handle objects persist until the *client* destroys them, so
/// a strong reference here would keep `Arc<WindowInner>` (and its GPU
/// textures) alive indefinitely.
type Weak: Clone + Send + 'static;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please drop this comment.

/// destroys them — long-lived clients (cosmic-panel, cosmic-workspaces, and
/// even cosmic-term itself) hold handles indefinitely. A strong reference
/// here would prevent `WindowInner` from dropping, trapping GPU textures
/// in `SurfaceUserData::data_map` → `MultiTextureInternal`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Half of this text isn't accurate and repetitive. Please drop this comment from the commit.

let mut data = data.borrow_mut();
data.sessions.clear();
data.cursor_sessions.clear();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole function seems unnecessary and the documentation wrong?
The user-data of the toplevel is dropped, when the wayland-object is destroyed.

As long as ImageCopySessionsData doesn't hold any recursive structures, this isn't necessary.

(Looking at our code this, it might be that output-session which are created with a DamageTracker holding a reference to an output might be self-referencial, but that doesn't affect this code path. @ids1024 We need to make sure to manually drop sessions on remove_output.)

@delicious-pancakes
Copy link

delicious-pancakes commented Feb 13, 2026

Heads up: Drakulix's comment about not just returning when the weak upgrade fails is worth looking at closely. The Destroyed / failed-upgrade paths in new_session, new_cursor_session, and frame currently just return, but the session is still alive from the client's perspective. It needs to be invalidated (e.g. session.stop()) so the client knows to stop submitting frames, otherwise it'll keep sending requests against a source that can never produce output.

The session_destroyed and cursor_session_destroyed paths are fine since the session is already going away there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants