fix: pace low-power GPU presentation with vsync #13119

Draft
oz-for-oss[bot] wants to merge 1 commit into master from oz-agent/implement-issue-2319

@oz-for-oss (bot) commented Jun 26, 2026

Closes #2319

Summary

  • Thread the GPU power preference into WGPU surface configuration.
  • Use PresentMode::AutoVsync for low-power rendering so typing and scrolling are paced to display refresh instead of using the non-vsync presentation path.
  • Preserve PresentMode::AutoNoVsync for the high-performance rendering path to keep the existing low-latency behavior when users opt out of low-power rendering.
  • Add unit tests for both present-mode mappings.
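The mapping described in the bullets above can be restated as a small sketch. The actual change is in Rust against wgpu and is not shown in this thread, so the Python below is only an illustration: the enum and function names (`PowerPreference`, `PresentMode`, `present_mode_for`) are hypothetical stand-ins, and only the `AutoVsync` / `AutoNoVsync` mode names come from the summary.

```python
from enum import Enum


class PowerPreference(Enum):
    # Hypothetical mirror of the GPU power preference setting.
    LOW_POWER = "low_power"
    HIGH_PERFORMANCE = "high_performance"


class PresentMode(Enum):
    # Mode names as they appear in the PR summary.
    AUTO_VSYNC = "AutoVsync"
    AUTO_NO_VSYNC = "AutoNoVsync"


def present_mode_for(preference: PowerPreference) -> PresentMode:
    """Return the present mode the summary describes for each preference."""
    if preference is PowerPreference.LOW_POWER:
        # Low-power rendering is paced to display refresh.
        return PresentMode.AUTO_VSYNC
    # High-performance rendering keeps the existing low-latency path.
    return PresentMode.AUTO_NO_VSYNC


if __name__ == "__main__":
    for pref in PowerPreference:
        print(pref.name, "->", present_mode_for(pref).value)
```

The two branches correspond to the two unit-test cases mentioned in the last bullet.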

Validation

  • ./script/format --check
  • cargo test -p warpui presentation_mode --no-fail-fast
  • cargo clippy --workspace --exclude warp_completer --all-targets --tests -- -D warnings
  • cargo clippy -p warp_completer --all-targets --tests -- -D warnings

Notes

  • No approved spec context was available for this issue, so this implements the smallest rendering-layer mitigation aligned with recent reports of GPU spikes while typing or scrolling.
  • A direct cargo clippy --workspace --all-targets --all-features --tests -- -D warnings run is not the current presubmit path and fails on unrelated all-features dead-code/unused-item warnings across terminal modules.

Co-Authored-By: Elijah Lynn <ElijahLynn@users.noreply.github.com>

Co-Authored-By: Oz <oz-agent@warp.dev>
@ElijahLynn

Manual A/B test results (native Wayland, Intel Iris Xe)

I reproduced the high GPU usage from #2319 on a local dev build of this PR branch and ran an automated A/B comparison between the two present-mode paths. The PR change did not meaningfully reduce GPU load under a heavy typing/scroll workload.

Environment

  • Hardware: Intel Iris Xe Graphics (ADL GT2), single integrated GPU (no discrete GPU)
  • OS: Arch Linux, GNOME on Wayland
  • Build: local cargo dev build (warp-oss binary), PR branch checked out locally
  • Windowing: native Wayland (system.force_x11 = false, launched with WARP_ENABLE_WAYLAND=1)
  • Measurement: whole-GPU i915 PMU rcs0-busy via perf_event_open (same source btop / intel_gpu_top use), not per-process fdinfo

Workload

Automated, repeatable input via ydotool (kernel uinput):

  • rapid typing into the prompt + clear (Ctrl+U)
  • seq 1 4000 output flood + autoscroll
  • PageUp/PageDown scrollback churn

~18s workload, ~20s GPU sampling at 200ms intervals.

Results

| Mode | prefer_low_power_gpu | Present mode | Render avg | Render peak |
| --- | --- | --- | --- | --- |
| vsync (PR path) | true | AutoVsync | 49.9% | 76.8% |
| novsync (old path) | false | AutoNoVsync | 51.0% | 82.1% |

Both runs used the same binary, same native Wayland config, and the same automated workload. With one run per mode, the difference (about 1 percentage point in average, about 5 percentage points in peak) is within noise.
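For reference, the gap between the two runs can be worked out directly from the table. This is a small arithmetic sketch: the four figures are copied from the results above, while the helper names are made up for illustration.

```python
# Whole-GPU render-busy percentages copied from the results table.
results = {
    "vsync": {"avg": 49.9, "peak": 76.8},
    "novsync": {"avg": 51.0, "peak": 82.1},
}


def delta_points(metric: str) -> float:
    """Absolute gap in percentage points (novsync minus vsync)."""
    return round(results["novsync"][metric] - results["vsync"][metric], 1)


def relative_change(metric: str) -> float:
    """Gap as a percentage of the novsync baseline."""
    base = results["novsync"][metric]
    return round(100.0 * (base - results["vsync"][metric]) / base, 1)


if __name__ == "__main__":
    for metric in ("avg", "peak"):
        print(metric, delta_points(metric), "points,", relative_change(metric), "% relative")
```

With only one run per mode there is no variance estimate, so these deltas say how large the gap is, not whether it is significant.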

Observations

  1. On a single-iGPU Linux machine, enabling "prefer low power GPU" was already possible (integrated GPU detected), but stable builds did not change present mode — only adapter selection. This PR adds the missing vsync pacing for that path, but it did not fix the reported symptom for me.
  2. AutoVsync caps frame rate, not per-frame render cost. Under sustained typing/scrolling Warp still drives the render engine hard (~50% avg, ~77–82% peak whole-GPU).
  3. On Wayland, a meaningful fraction of compositing cost may land outside Warp's process (GNOME/mutter), so whole-GPU PMU is the right metric for user-visible impact.
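Observation 2 can be illustrated with a toy model: GPU busy fraction is roughly frames presented per second times GPU time per frame, and vsync only limits the first factor. The function and all numbers below are hypothetical and were not measured. They only show why a frame-rate cap changes nothing when the redraw rate is already at or below the refresh rate.

```python
from typing import Optional


def gpu_busy_fraction(requested_fps: float,
                      frame_cost_ms: float,
                      refresh_hz: Optional[float] = None) -> float:
    """Toy model: busy fraction = presented frames per second * GPU ms per frame.

    refresh_hz=None models the non-vsync path (no cap on presented frames).
    """
    fps = requested_fps if refresh_hz is None else min(requested_fps, refresh_hz)
    return min(1.0, fps * frame_cost_ms / 1000.0)


if __name__ == "__main__":
    # Hypothetical case A: redraws already at 60 fps, expensive frames (8 ms each).
    print(gpu_busy_fraction(60, 8.0), gpu_busy_fraction(60, 8.0, refresh_hz=60))
    # Hypothetical case B: 300 cheap frames per second (2 ms each), where a cap helps.
    print(gpu_busy_fraction(300, 2.0), gpu_busy_fraction(300, 2.0, refresh_hz=60))
```

In case A the cap is a no-op, which is consistent with the near-identical A/B numbers. In case B the cap cuts load substantially, which is the over-presentation scenario the PR targets.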

Suggestion

This looks like a reasonable small mitigation for uncapped over-presentation, but for cases like mine (single Iris Xe, native Wayland, heavy terminal redraw) a deeper fix may be needed — e.g. frame coalescing/throttling on input, dirty-region rendering, or reducing full-grid redraws during typing/scroll.
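As a rough illustration of the frame coalescing idea, here is a minimal sketch. It assumes a design where input events only mark the surface dirty and a periodic tick decides whether to render. `FrameCoalescer` and its methods are hypothetical and are not part of Warp's code.

```python
class FrameCoalescer:
    """Collapse bursts of redraw requests into at most one frame per interval."""

    def __init__(self, min_interval_ms: int) -> None:
        self.min_interval_ms = min_interval_ms
        self.dirty = False
        self.last_render_ms = None
        self.frames_rendered = 0

    def request_redraw(self) -> None:
        # Input events only mark the surface dirty; they never render directly.
        self.dirty = True

    def tick(self, now_ms: int) -> bool:
        """Render if something changed and the interval has elapsed."""
        if not self.dirty:
            return False
        if (self.last_render_ms is not None
                and now_ms - self.last_render_ms < self.min_interval_ms):
            return False
        self.dirty = False
        self.last_render_ms = now_ms
        self.frames_rendered += 1
        return True


if __name__ == "__main__":
    coalescer = FrameCoalescer(min_interval_ms=16)
    # Simulate one input event per millisecond for one second.
    for t in range(1000):
        coalescer.request_redraw()
        coalescer.tick(t)
    print(coalescer.frames_rendered, "frames for 1000 redraw requests")
```

This bounds the frame count but not the cost of each frame, so dirty-region rendering or fewer full-grid redraws would still be needed to lower per-frame GPU time.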

Happy to re-run with different configs (XWayland vs Wayland, longer scrollback, etc.) if useful.

@ElijahLynn

Reproducibility: methodology + scripts

Follow-up to the A/B results above — full reproduction steps and the scripts used.

Prerequisites

# Build the PR branch locally
gh pr checkout 13119
cargo build

# GPU measurement: i915 PMU via perf (same source as btop/intel_gpu_top)
# btop ships with cap_perfmon; a standalone script needs equivalent access, for example:
sudo sysctl kernel.perf_event_paranoid=0   # revert: =2

# Input automation under native GNOME Wayland (kernel uinput)
sudo pacman -S --needed ydotool   # or apt equivalent
# ydotoold runs as your user if /dev/uinput ACL grants access

btop reference: cloned https://github.com/aristocratos/btop and traced GPU measurement to vendored intel_gpu_top (src/linux/intel_gpu_top/), which reads i915 PMU *-busy events (rcs0-busy = render) via perf_event_open with PERF_FORMAT_TOTAL_TIME_ENABLED. Utilization = Δbusy_ns / Δtime_enabled_ns. Whole-GPU (pid=-1), not per-process fdinfo.
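The utilization formula above (Δbusy_ns / Δtime_enabled_ns) can be sketched as follows. This covers only the arithmetic on two successive counter readings. It does not open the PMU via perf_event_open, and the `PmuReading` type and function name are hypothetical rather than taken from the gist.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PmuReading:
    busy_ns: int          # cumulative engine busy time (e.g. rcs0-busy)
    time_enabled_ns: int  # cumulative time the counter has been enabled


def utilization_percent(prev: PmuReading, cur: PmuReading) -> float:
    """Utilization over one sampling window, as a percentage."""
    d_enabled = cur.time_enabled_ns - prev.time_enabled_ns
    if d_enabled <= 0:
        # No time elapsed on the counter; avoid dividing by zero.
        return 0.0
    d_busy = cur.busy_ns - prev.busy_ns
    return 100.0 * d_busy / d_enabled


if __name__ == "__main__":
    # Hypothetical 200 ms window in which the render engine was busy for 100 ms.
    prev = PmuReading(busy_ns=0, time_enabled_ns=0)
    cur = PmuReading(busy_ns=100_000_000, time_enabled_ns=200_000_000)
    print(utilization_percent(prev, cur))
```

Dividing by the enabled-time delta rather than by wall-clock time keeps the ratio correct even when the sampling interval drifts.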

Per-process /proc/<pid>/fdinfo drm-engine-render under-counts on Wayland because compositor work lands in mutter/gnome-shell.

Config (dev build ~/.config/warp-oss/settings.toml)

[system]
force_x11 = false
prefer_low_power_gpu = true   # vsync / AutoVsync path (PR)
# prefer_low_power_gpu = false  # novsync / AutoNoVsync path (baseline)

Launch with native Wayland (also hides the Wayland settings toggle in UI by design):

WARP_ENABLE_WAYLAND=1 /path/to/target/debug/warp-oss

Scripts

Public gist with the three scripts used:

https://gist.github.com/ElijahLynn/dc18971e77101b32a823215b8fa67a98

| File | Role |
| --- | --- |
| gpu_pmu_sampler.py | Samples whole-GPU rcs0-busy (render) via i915 PMU, writes CSV |
| workload.sh | Fixed typing / seq 1 4000 / PageUp-Down workload via ydotool |
| run_mode_auto.sh | Writes config, relaunches warp-oss, countdown, runs sampler + workload |

Run commands (what I actually ran)

mkdir -p ~/warp-gpu-test
# copy scripts from gist into ~/warp-gpu-test, chmod +x

ydotoold --socket-path "$XDG_RUNTIME_DIR/.ydotool_socket" --socket-own "$(id -u):$(id -g)" &

# AutoVsync (PR path)
./run_mode_auto.sh vsync true 10 20 18
# -> /tmp/gpu_vsync.csv

# AutoNoVsync (baseline) — re-focus Warp during countdown
./run_mode_auto.sh novsync false 10 20 18
# -> /tmp/gpu_novsync.csv

Important: ydotool sends input to the focused window. Click the Warp terminal pane during the countdown and don't touch keyboard/mouse during the ~18s workload.

Summarize CSVs

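for f in /tmp/gpu_vsync.csv /tmp/gpu_novsync.csv; do
  printf '%s: ' "$f"
  # Skip the header row; column 2 holds the render-busy percentage.
  awk -F, 'NR>1{r+=$2; n++; if($2>p)p=$2} END{if(n) printf "render avg=%.1f%% peak=%.1f%%\n", r/n, p; else print "no samples"}' "$f"
done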
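The same summary can be computed in Python, which may be handy next to the sampler script. Like the awk one-liner, this sketch assumes a header row and the render percentage in the second column. The column names in the sample data are made up.

```python
import csv
import io


def summarize(lines):
    """Return (average, peak) of column 2, skipping the header; None if no rows."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    values = [float(row[1]) for row in reader if len(row) > 1]
    if not values:
        return None
    return sum(values) / len(values), max(values)


if __name__ == "__main__":
    # Hypothetical two-sample CSV; with real data, pass an open file instead.
    sample = io.StringIO("time_s,render_pct\n0.0,40.0\n0.2,60.0\n")
    avg, peak = summarize(sample)
    print(f"render avg={avg:.1f}% peak={peak:.1f}%")
```

For a real run, `with open("/tmp/gpu_vsync.csv", newline="") as f: print(summarize(f))` gives the equivalent of one loop iteration above.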


Development

Successfully merging this pull request may close these issues.

Too high GPU memory usage
