Ichristo/cvs monitor helios jumphost#212
Open
cijohnson wants to merge 6 commits into
Implements a new persistent worker mode that maintains long-lived SSH processes across multiple operations, improving performance for workloads with repeated SSH commands to the same hosts.

Key Features:
- PersistentPsshSharder: Long-lived worker processes with process pooling
- Worker state management via WorkerTable/WorkerState interfaces
- Configurable via CVS_PERSISTENT_SHARDS environment variable
- Intelligent host chunking with max_workers limit enforcement
- Backwards compatible - defaults to existing transient mode

Components Added:
- cvs/lib/parallel/persistent_pssh_sharder.py: Core persistent worker implementation
- Enhanced ParallelConfig with persistent_shards option
- SharderInterface abstraction for pluggable sharding strategies
- Comprehensive test coverage including performance integration tests

Host Chunking Improvements:
- Simplified chunk_hosts algorithm for better maintainability
- Respects max_workers configuration to prevent resource exhaustion
- Ensures contiguous host distribution across workers

This architecture enables significant performance gains for CVS monitor operations and other SSH-intensive workloads by eliminating connection setup overhead on repeated operations.

Signed-off-by: Ignatious Johnson <ichristo@amd.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
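The chunking rules listed above (at most max_workers chunks, contiguous hosts per worker) can be illustrated with a minimal sketch. This is not the chunk_hosts implementation from this PR; the signature and the near-equal sizing policy are assumptions made for illustration.

```python
from typing import List


def chunk_hosts(hosts: List[str], max_workers: int) -> List[List[str]]:
    """Split hosts into at most max_workers contiguous chunks.

    Hypothetical signature: the real chunk_hosts in cvs.lib.parallel may
    take different arguments or size its chunks differently.
    """
    if max_workers < 1:
        raise ValueError("max_workers must be at least 1")
    if not hosts:
        return []

    # Never create more chunks than there are hosts or allowed workers.
    n_chunks = min(max_workers, len(hosts))
    base, extra = divmod(len(hosts), n_chunks)

    chunks: List[List[str]] = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional host each.
        size = base + (1 if i < extra else 0)
        chunks.append(hosts[start:start + size])
        start += size
    return chunks
```

With five hosts and max_workers=2 this yields one chunk of three hosts followed by one chunk of two, and the original host order is preserved across chunks.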
registry

Jump host (bastion) support for parallel SSH: ParallelConfig/Pssh/MultiProcessPssh and the persistent sharder now accept jump_host/user/password/pkey/port and tunnel target connections via parallel-ssh proxy parameters (new JumpHostManager helper).

Because this introduced several new CVS_JUMP_* variables, centralize all supported environment variables in a single registry (cvs/lib/env_vars.py): name, default, type, and description declared once, read via get(). Refactor config.from_env and the CLUSTER_FILE / CVS_EXTENSION_PKG_NAMES call sites to use it, and add a `cvs env` command (listing + masked current values + quick table) that renders the registry so docs can't drift. `cvs env` is ordered after `exec` in help output.

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Ignatious Johnson <ichristo@amd.com>
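A rough sketch of the registry idea described above: each variable is declared once with name, default, type, and description, read through get(), and rendered with secrets masked. The EnvVar class, the `secret` flag, the render() helper, and the EXAMPLE_* variable names are hypothetical; only CVS_PERSISTENT_SHARDS and get() come from the commit text, and its boolean type is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Mapping, Optional


@dataclass(frozen=True)
class EnvVar:
    name: str
    default: Any
    type: Callable[[str], Any]   # parser applied to the raw string value
    description: str
    secret: bool = False         # hypothetical flag used for masking


def _parse_bool(raw: str) -> bool:
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# Single source of truth. The EXAMPLE_* entries are placeholders, not
# the actual CVS_JUMP_* variables introduced by this change.
REGISTRY: Dict[str, EnvVar] = {v.name: v for v in (
    EnvVar("CVS_PERSISTENT_SHARDS", False, _parse_bool,
           "Enable persistent worker mode (type assumed boolean)"),
    EnvVar("EXAMPLE_JUMP_PORT", 22, int, "Hypothetical jump host port"),
    EnvVar("EXAMPLE_JUMP_PASSWORD", None, str,
           "Hypothetical jump host password", secret=True),
)}


def get(name: str, environ: Optional[Mapping[str, str]] = None) -> Any:
    """Return the typed value of a registered variable, or its default."""
    env = os.environ if environ is None else environ
    var = REGISTRY[name]
    raw = env.get(name)
    return var.default if raw is None else var.type(raw)


def render(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Render one line per variable, masking secrets that are set."""
    rows = []
    for var in REGISTRY.values():
        value = get(var.name, environ)
        shown = "****" if var.secret and value is not None else repr(value)
        rows.append(f"{var.name}={shown}  # {var.description}")
    return rows
```

Because both the typed reads and the rendered listing come from the same table, a command in the spirit of `cvs env` cannot list a variable that the code does not actually read.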
unittests

Establish a unittest test suite for the cluster-mon backend ahead of the SSH migration, capturing the behavior the migration must preserve.

- backend/run_all_unittests.py: backend-rooted unittest discovery
- app/unittests/testing.py: FakeSshManager test double for the SSH-manager API
- per-module unittests/ packages under app/core, app/collectors, app/api
- collector contract tests pinning exec_async -> parsed-output (incl. ERROR/ABORT)
- api logs tests (grep validation + /search filtering)
- SSH-manager contract test against current Pssh (parallel-ssh + probe mocked), including event-loop non-blocking assertion
- cluster-mon Makefile (ut + docker-build); repo Makefile ut delegates to it

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Ignatious Johnson <ichristo@amd.com>
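To make the "exec_async -> parsed-output" contract concrete, here is a small sketch of a fake SSH manager feeding a collector-style parser. The FakeSshManager below is a stand-in written for this example, not the class in app/unittests/testing.py, and parse_uptime along with the exact "ERROR"/"ABORT" string prefixes are hypothetical.

```python
import asyncio
import unittest
from typing import Dict, List


class FakeSshManager:
    """Stand-in test double: returns canned per-host output."""

    def __init__(self, canned: Dict[str, str]):
        self.canned = dict(canned)
        self.commands: List[str] = []

    async def exec_async(self, cmd: str) -> Dict[str, str]:
        self.commands.append(cmd)   # record what the collector asked for
        return dict(self.canned)


def parse_uptime(raw: Dict[str, str]) -> Dict[str, dict]:
    """Hypothetical collector parser that tolerates ERROR/ABORT output."""
    parsed = {}
    for host, text in raw.items():
        if text.startswith(("ERROR", "ABORT")):
            parsed[host] = {"status": "unavailable"}
        else:
            parsed[host] = {"status": "ok",
                            "uptime_s": float(text.split()[0])}
    return parsed


class CollectorContractTest(unittest.TestCase):
    def test_error_and_abort_are_reported_unavailable(self):
        fake = FakeSshManager({
            "n1": "12.5 3.0",
            "n2": "ERROR: timeout",
            "n3": "ABORT: unreachable",
        })
        raw = asyncio.run(fake.exec_async("cat /proc/uptime"))
        parsed = parse_uptime(raw)

        self.assertEqual(parsed["n1"], {"status": "ok", "uptime_s": 12.5})
        self.assertEqual(parsed["n2"], {"status": "unavailable"})
        self.assertEqual(parsed["n3"], {"status": "unavailable"})
        self.assertEqual(fake.commands, ["cat /proc/uptime"])
```

Pinning the parsed output this way means the SSH backend can later be swapped without touching the collector tests, as long as the replacement honors the same exec_async return shape.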
Phase 1 (adapter + TDD parity gate):
- Add app/core/cluster_ssh_manager.py wrapping MultiProcessPssh. Preserves cluster-mon's API (exec_async/exec/exec_cmd_list, get_*_hosts, refresh_host_reachability, recreate_client, destroy_clients and the host_list/reachable_hosts/unreachable_hosts attrs). Implements the parity behaviors not provided by the lib: ABORT-merge for pre-probe-unreachable hosts, all-ABORT short-circuit when nothing is reachable, and an exec_async offload point (asyncio.to_thread + lazy asyncio.Lock). Direct path runs the TCP pre-probe; jump-host path uses libssh2 proxy_* and skips it.
- Refactor test_ssh_manager_contract.py into SshManagerContractMixin so the same 10 assertions run against both the legacy Pssh and the new adapter, proving parity. 51 tests green.
- Makefile ut target installs cvs (--no-deps, editable) so the adapter import resolves without clobbering backend's parallel-ssh pin.

Phase 2 (rewire main.py):
- Construct ClusterSshManager for both direct and jump-host branches in lifespan startup and reload_configuration; drop the Pssh/JumpHostPssh imports and the dead max_parallel handling. refresh/recreate/destroy call sites unchanged.

Co-authored-by: Cursor <cursoragent@cursor.com>
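The three parity behaviors named in Phase 1 (ABORT-merge, all-ABORT short-circuit, and the asyncio.to_thread offload behind a lazily created asyncio.Lock) might look roughly like the sketch below. It is not the ClusterSshManager code: the AdapterSketch class, the injected run_blocking and probe callables, and the ABORT placeholder text are assumptions made so the example runs without SSH.

```python
import asyncio
from typing import Callable, Dict, List, Optional

# Placeholder marker; the real ABORT text is not shown in this PR page.
ABORT = "ABORT: host unreachable"


class AdapterSketch:
    def __init__(
        self,
        host_list: List[str],
        run_blocking: Callable[[List[str], str], Dict[str, str]],
        probe: Callable[[str], bool],
    ):
        self.host_list = list(host_list)
        self._run_blocking = run_blocking  # stands in for the blocking lib call
        # Pre-probe: split hosts into reachable and unreachable up front.
        self.reachable_hosts = [h for h in self.host_list if probe(h)]
        self.unreachable_hosts = [
            h for h in self.host_list if h not in self.reachable_hosts
        ]
        # Created lazily so the constructor does not need a running loop.
        self._lock: Optional[asyncio.Lock] = None

    def exec(self, cmd: str) -> Dict[str, str]:
        aborted = {h: ABORT for h in self.unreachable_hosts}
        if not self.reachable_hosts:
            # All-ABORT short-circuit: never call into the SSH layer.
            return aborted
        results = dict(self._run_blocking(self.reachable_hosts, cmd))
        # ABORT-merge: unreachable hosts still appear in the result.
        results.update(aborted)
        return results

    async def exec_async(self, cmd: str) -> Dict[str, str]:
        if self._lock is None:
            self._lock = asyncio.Lock()
        async with self._lock:
            # Offload the blocking call so the event loop stays responsive.
            return await asyncio.to_thread(self.exec, cmd)
```

The lock serializes concurrent exec_async callers onto one blocking call at a time, while asyncio.to_thread (Python 3.9+) keeps that call off the event loop thread.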
Refactor the image to a multi-stage build that builds the cvs wheel from the repo-root build context, so `docker build` / `docker compose build` works with no separate host wheel step. Move the build context to the repo root, add a root .dockerignore to keep the context lean, ignore stray build/ artifacts, and extend `make clean` to remove venv, build, and cache artifacts. The docker-build target auto-detects daemon permissions and retries with sudo.

Co-authored-by: Cursor <cursoragent@cursor.com>
The migration to ClusterSshManager (backed by cvs.lib.parallel) is complete and main.py no longer constructs the legacy Pssh/JumpHostPssh classes. Delete those two modules, drop the legacy Pssh arm of the SSH-manager contract test (keeping the ClusterSshManager contract), and update README/testing docstrings to point at cluster_ssh_manager.

Co-authored-by: Cursor <cursoragent@cursor.com>
Motivation
Technical Details
Test Plan
Test Result
Submission Checklist