Fix VRAM leak: use weak references in ToplevelHandleState and stop capture sessions#2095
Fix VRAM leak: use weak references in ToplevelHandleState and stop capture sessions#2095MartinKavik wants to merge 7 commits intopop-os:masterfrom
Conversation
ZcosmicToplevelHandleV1 protocol objects persist until the client destroys them. Previously, ToplevelHandleState held a strong CosmicSurface (Arc<WindowInner>) reference, keeping the entire surface chain alive and trapping GPU textures in smithay's append-only data_map. Change ToplevelHandleState.window from Option<W> to Option<W::Weak> so the reference cycle is broken structurally. When the compositor drops its last strong reference to a window, the weak reference becomes invalid and all resources (including VRAM) are freed automatically. - Add WeakCosmicSurface wrapping smithay's WeakWindow - Add Window::Weak associated type and WindowWeak trait - Store window.downgrade() instead of window.clone() in from_window() - Use weak.upgrade() in window_from_handle() Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move upgrade() from a separate WindowWeak trait to a static method on the Window trait. Eliminates the circular associated type relationship and reduces boilerplate. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…VRAM leak The Toplevel variant stored a strong CosmicSurface reference in capture source user data, creating a reference cycle that prevented GPU texture cleanup. This caused ~203 MiB VRAM leak over 5 workspace overview cycles with 20 windows. Follows the same weak reference pattern used for the ToplevelHandleState fix and the existing Output(WeakOutput) variant. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…iew VRAM leak Add stop_all_capture_sessions() that drains owned Session/CursorSession from a surface's user data. Called in remove_toplevel() and refresh() dead-window path. When sessions drop, Session::drop() fails active frames and releases GPU buffers. Needed because with WeakCosmicSurface in ImageCaptureSourceKind, the session_destroyed callback can't upgrade() the weak ref after the surface is gone, so remove_session() never runs without explicit draining. Also points smithay to local fork with cherry-picked mem::forget fix (3d3f9e35) — required for Frame::drop() to actually release GPU buffers. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
No cherry-pick needed — upstream smithay works with system cosmic-workspaces when all cosmic-comp fixes are present. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Explain why ToplevelHandleStateInner stores W::Weak instead of W, why ImageCaptureSourceKind::Toplevel uses WeakCosmicSurface, and why stop_all_capture_sessions is needed at toplevel removal time. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Drakulix
left a comment
There was a problem hiding this comment.
Thanks, the weak-CosmicSurface fix looks correct, however the changes to image-copy-capture don't seem to be. Please just drop them.
| ImageCaptureSourceKind::Toplevel(mut toplevel) => { | ||
| ImageCaptureSourceKind::Toplevel(weak) => { | ||
| let Some(mut toplevel) = weak.upgrade() else { | ||
| return; |
There was a problem hiding this comment.
I don't think this can just return, we need to invalidate the session here.
| /// `Toplevel` uses `WeakCosmicSurface` (not a strong `CosmicSurface`) so | ||
| /// that capture sources created by cosmic-workspaces for window thumbnails | ||
| /// don't prevent the underlying window from being dropped. `Output` already | ||
| /// uses `WeakOutput` for the same reason. |
| /// Protocol handle objects persist until the *client* destroys them, so | ||
| /// a strong reference here would keep `Arc<WindowInner>` (and its GPU | ||
| /// textures) alive indefinitely. | ||
| type Weak: Clone + Send + 'static; |
| /// destroys them — long-lived clients (cosmic-panel, cosmic-workspaces, and | ||
| /// even cosmic-term itself) hold handles indefinitely. A strong reference | ||
| /// here would prevent `WindowInner` from dropping, trapping GPU textures | ||
| /// in `SurfaceUserData::data_map` → `MultiTextureInternal`. |
There was a problem hiding this comment.
Half of this text isn't accurate and repetitive. Please drop this comment from the commit.
| let mut data = data.borrow_mut(); | ||
| data.sessions.clear(); | ||
| data.cursor_sessions.clear(); | ||
| } |
There was a problem hiding this comment.
This whole function seems unnecessary and the documentation wrong?
The user-data of the toplevel is dropped, when the wayland-object is destroyed.
As long as ImageCopySessionsData doesn't hold any recursive structures, this isn't necessary.
(Looking at our code this, it might be that output-session which are created with a DamageTracker holding a reference to an output might be self-referencial, but that doesn't affect this code path. @ids1024 We need to make sure to manually drop sessions on remove_output.)
|
Heads up: Drakulix's comment about not just returning when the weak upgrade fails is worth looking at closely. The The |
Summary
Weakreferences instead of strongCosmicSurfaceclones inToplevelHandleState, so protocolhandles don't keep
WindowInner(and its GPU textures) aliveWeakCosmicSurfaceinImageCaptureSourceKind::Topleveland explicitly dropcapture sessions on toplevel removal so
Session::drop()fails active frames and releases GPU buffers599857c(includesmem::forgetfix forFrame::success()/Frame::fail())More details and investigation notes: #2084
Test results
5 rounds of 20 cosmic-term windows + 5 workspace overview (Super+W) open/close cycles each. NVIDIA RTX 2070.
VRAM fluctuates but does not accumulate across rounds. Before this fix, cosmic-comp leaked ~200 MiB per round, growing linearly.
Actively using the fixed
cosmic-compbinary in live OS right now to make sure it's stable and not leaking.Resolves
ToplevelHandleState.windowreferences)ImageCaptureSourcereference cycle)