Skip in-container sshd setup for single-node container runs #224
Merged
ContainerOrchestrator.setup_sshd() ran a fixed command list ending in `/usr/sbin/sshd -p2224` and asserted every step succeeded, for every container run regardless of node count. The orch pytest fixture calls it unconditionally, so a single-node run on a minimal image with no /usr/sbin/sshd failed the whole fixture with a generic "SSH setup command failed" message and never ran the workload. The in-container sshd exists only so MPI (mpirun's plm_rsh_args -p 2224) can reach peer ranks on other nodes. A single-node run execs directly via docker exec and never distributes over MPI, so the sshd setup is dead weight there. Guard setup_sshd() to return True early when len(self.hosts) <= 1, after the container_id precondition. The host count lives on the orchestrator, so the decision belongs there; multinode runs are unchanged.
cijohnson approved these changes on Jun 15, 2026
Summary
`ContainerOrchestrator.setup_sshd()` ran a fixed command list ending in `/usr/sbin/sshd -p2224` and asserted that every step exited 0, for every container run regardless of node count (`cvs/tests/conftest.py` calls it unconditionally). The in-container sshd exists solely so MPI (`mpirun`'s `plm_rsh_args -p 2224`, see `BaremetalOrchestrator.get_mpi_command`) can reach peer ranks on other nodes. A single-node run execs directly via `docker exec` and never distributes over MPI, so the sshd setup is dead weight on one host.
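To make that dependency concrete, here is a sketch of the kind of multinode launch line that needs the in-container sshd. `build_mpirun_command`, its signature, and the `./workload` argument are hypothetical and are not the real `BaremetalOrchestrator.get_mpi_command`; only port 2224 comes from the PR text, and `--mca plm_rsh_args` is Open MPI's parameter for extra arguments passed to its ssh launcher.

```python
import shlex
from typing import List

SSHD_PORT = 2224  # port the in-container sshd listens on, per the PR text


def build_mpirun_command(hosts: List[str], workload: List[str]) -> List[str]:
    """Hypothetical helper: build an Open MPI launch line spanning several hosts."""
    return [
        "mpirun",
        "-np", str(len(hosts)),  # one rank per host, for simplicity
        "--host", ",".join(hosts),
        # Open MPI's rsh/ssh launcher adds these args to its ssh command, so each
        # peer host must run an sshd on this port inside its container.
        "--mca", "plm_rsh_args", f"-p {SSHD_PORT}",
        *workload,
    ]


print(shlex.join(build_mpirun_command(["node0", "node1"], ["./workload"])))
# mpirun -np 2 --host node0,node1 --mca plm_rsh_args '-p 2224' ./workload
```

A single-node run never builds a line like this, which is why the sshd it relies on can be skipped there.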
Consequences before this change:
- A single-node run on a minimal image without `/usr/sbin/sshd` failed the whole `orch` fixture (`pytest.fail("Failed to setup sshd in container")`) and never ran the workload; the failure surfaced only as the generic `SSH setup command failed.`
- … that can fail on unrelated edge cases (e.g. an empty `~/.ssh` mount).
Change
`setup_sshd()` now returns `True` early when `len(self.hosts) <= 1`, placed after the `container_id` precondition. The host count lives on the orchestrator, so the decision belongs there; the fixture stays unchanged and any future caller inherits the behavior.
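The guard ordering can be sketched as below. `ContainerOrchestratorSketch`, the `exec_fn(container_id, command) -> exit code` callable, the one-item command list, and the use of `RuntimeError` are all illustrative assumptions; the real class runs a longer command list and its own assertion and exception types.

```python
from typing import Callable, List, Optional

# Hypothetical executor signature: exec_fn(container_id, command) -> exit code.
ExecFn = Callable[[str, str], int]


class ContainerOrchestratorSketch:
    """Illustrative stand-in for ContainerOrchestrator; only setup_sshd() is modeled."""

    # Assumed stand-in: the real list has several steps ending in this command.
    SSHD_COMMANDS = ["/usr/sbin/sshd -p2224"]

    def __init__(self, hosts: List[str], container_id: Optional[str], exec_fn: ExecFn):
        self.hosts = hosts
        self.container_id = container_id
        self.exec_fn = exec_fn

    def setup_sshd(self) -> bool:
        # 1. Precondition first: a missing container is an error at any node count.
        if not self.container_id:
            raise RuntimeError("setup_sshd() called before the container was started")
        # 2. New guard: one host means no MPI fan-out over ssh, so skip sshd.
        if len(self.hosts) <= 1:
            return True
        # 3. Multinode path unchanged: every setup step must exit 0.
        for cmd in self.SSHD_COMMANDS:
            if self.exec_fn(self.container_id, cmd) != 0:
                raise RuntimeError(f"SSH setup command failed: {cmd}")
        return True


executed: List[str] = []


def fake_exec(container_id: str, cmd: str) -> int:
    executed.append(cmd)  # record the command instead of running it
    return 0


single = ContainerOrchestratorSketch(["node0"], "c1", fake_exec)
print(single.setup_sshd(), executed)  # True []
multi = ContainerOrchestratorSketch(["node0", "node1"], "c1", fake_exec)
print(multi.setup_sshd(), executed)  # True ['/usr/sbin/sshd -p2224']
```

Keeping the precondition ahead of the host-count check means a misconfigured single-node run still fails loudly instead of silently reporting success.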
Tests
`cvs/core/orchestrators/unittests/test_container.py`: single-node skips setup (no `runtime.exec`), the `container_id` precondition still raises even on a single node, and multinode still attempts setup.
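The three cases above can be sketched with `unittest.mock`, asserting on calls to a mocked runtime rather than touching Docker. `_StubOrchestrator`, the `runtime.exec(container_id, command)` call shape, and `RuntimeError` are hypothetical stand-ins that follow the guard order the PR describes; the real tests in `test_container.py` may be structured differently.

```python
import unittest
from unittest.mock import MagicMock


class _StubOrchestrator:
    """Hypothetical stand-in with the guard order described in the PR."""

    def __init__(self, hosts, container_id, runtime):
        self.hosts, self.container_id, self.runtime = hosts, container_id, runtime

    def setup_sshd(self):
        if not self.container_id:
            raise RuntimeError("container_id is not set")
        if len(self.hosts) <= 1:
            return True
        # Stand-in for the real command list ending in `/usr/sbin/sshd -p2224`.
        return self.runtime.exec(self.container_id, "/usr/sbin/sshd -p2224") == 0


class TestSetupSshd(unittest.TestCase):
    def test_single_node_skips_exec(self):
        runtime = MagicMock()
        self.assertTrue(_StubOrchestrator(["node0"], "c1", runtime).setup_sshd())
        runtime.exec.assert_not_called()

    def test_missing_container_id_raises_even_single_node(self):
        with self.assertRaises(RuntimeError):
            _StubOrchestrator(["node0"], None, MagicMock()).setup_sshd()

    def test_multinode_still_attempts_setup(self):
        runtime = MagicMock()
        runtime.exec.return_value = 0  # pretend sshd started cleanly
        self.assertTrue(_StubOrchestrator(["node0", "node1"], "c1", runtime).setup_sshd())
        runtime.exec.assert_called_once_with("c1", "/usr/sbin/sshd -p2224")
```

Saved as a `test_*.py` file, this runs under `pytest` or `python -m unittest`.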
Out of scope
… message could be clearer; tracked separately.
Gate
`make test` (mirrors CI, `.github/workflows/ci.yml`, Python 3.10): `make fmt` clean (213 files unchanged), `Ran 370 tests ... OK`, 42 CLI tests passed.