94 changes: 93 additions & 1 deletion docs/Maestro/scheduling.md
Original file line number Diff line number Diff line change
Expand Up @@ -271,7 +271,99 @@ flux run -n 1 -N 1 -c 1 --setopt=optiona myapplication
!!! note

Flux directives are used here for illustration even though Maestro submits jobs through the Python API. The directives are written into the step scripts, which keeps a repeatable record of what was submitted and makes it viewable with the dry run feature. The batch/allocation arguments are normalized to the long form (`--setattr` instead of `-S`) and show up that way in the serialized batch scripts.


### Binding Mode
----

The `binding mode` key was added to steps in Maestro version :material-tag:`1.1.12` (currently Flux only; other schedulers to follow). This key controls, independently, whether node/task info is attached to the batch job/jobspec and to the flux run calls generated by `$(LAUNCHER)`. Whether you want nodes attached to either one depends on context:

* `True`: Launching a job that needs exclusive use of a node, even on core-scheduled partitions (see the exclusive job example)
* `False`: Launching a job to a nested broker (an existing allocation/batch job) that only needs a slice of a single node (see the small, nested/non-exclusive job example)

The two simplified Maestro specifications below show these options together with the batch scripts they generate:

#### Exclusive job
---
=== "Maestro Specification"

``` yaml
batch:
    type: flux
    host: machineA
    bank: guests
    queue: debug

study:
    - name: step1
      description: Sample step that grabs exclusive use of a node even with only one task
      run:
          cmd: |
              $(LAUNCHER) small_application

          procs: 1
          nodes: 1
          walltime: "00:01:00"
          binding mode:
              allocation: tasks and nodes
              launcher: tasks only

```

=== "Generated Batch Script"

Assuming the step is launched at the root level (the step gets its own allocation):

``` console
#flux: -N 1
#flux: -n 1
#flux: -c 1
#flux: -q debug
#flux: --bank=guests
#flux: -t 60s

flux run -n 1 -c 1 small_application
```

#### Small, nested/non-exclusive job
---
=== "Maestro Specification"

``` yaml
batch:
    type: flux
    host: machineA
    bank: guests
    queue: debug

study:
    - name: step1
      description: Sample step that only uses part of a node (allocation packing)
      run:
          cmd: |
              $(LAUNCHER) small_application

          procs: 1
          nodes: 1
          walltime: "00:01:00"
          binding mode:
              allocation: tasks only
              launcher: tasks only

```

=== "Generated Batch Script"

Assuming the step is launched inside an allocation where there are no queues (-q directive omitted):

``` console
#flux: -n 1
#flux: -c 1
#flux: --bank=guests
#flux: -t 60s

flux run -n 1 -c 1 small_application
```
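As a rough illustration of how the two `binding mode` entries in the examples above could translate into flux flags, here is a minimal Python sketch. `flux_resource_flags` is a hypothetical helper written for this note, not Maestro's implementation, and it assumes that `cores per task` defaults to 1 when the step does not set it.

```python
def flux_resource_flags(nodes, procs, cores_per_task, mode):
    """Build flux resource flags for one target (allocation or launcher).

    Hypothetical helper: `mode` is one of the strings used in the
    examples above, "tasks only" or "tasks and nodes".
    """
    if mode not in ("tasks only", "tasks and nodes"):
        raise ValueError(f"unknown binding mode: {mode!r}")

    flags = []
    if mode == "tasks and nodes":
        # Node count is only attached when the mode asks for it
        flags += ["-N", str(nodes)]
    # Task count and cores per task are attached in both modes
    flags += ["-n", str(procs), "-c", str(cores_per_task)]
    return flags


# Values taken from the exclusive job example
binding = {"allocation": "tasks and nodes", "launcher": "tasks only"}

alloc_flags = flux_resource_flags(1, 1, 1, binding["allocation"])
launch_flags = flux_resource_flags(1, 1, 1, binding["launcher"])
launcher_cmd = " ".join(["flux", "run", *launch_flags, "small_application"])

print(alloc_flags)
print(launcher_cmd)
```

With the exclusive job settings the allocation carries `-N 1` while the launcher line does not, which matches the generated batch script shown for that example.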

## LSF: a Tale of Two Launchers
----

3 changes: 2 additions & 1 deletion docs/Maestro/specification.md
Expand Up @@ -430,13 +430,14 @@ before using them:
| **Key** | **Type** | **Description** |
| :- | :-: | :- |
| `cores per task` | str/int | Number of cores to use for each task |
| `exclusive` | str | Flag for ensuring batch job has exclusive access to it's requested resources |
| `exclusive` | str/mapping | Flag for ensuring batch job has exclusive access to its requested resources |
| `gpus` | str/int | Number of gpus to allocate with this step |
| `tasks per rs` | str/int | Number of tasks per resource set (LSF/jsrun) |
| `rs per node` | str/int | Number of resource sets per node |
| `cpus per rs` | str/int | Number of cpus in each resource set |
| `bind` | str | Controls binding of tasks in resource sets |
| `bind gpus` | str | Controls binding of gpus in resource sets |
| `binding mode` | mapping | Controls whether to bind nodes/tasks to allocation directives/launcher args |


## Parameters: `global.parameters`
34 changes: 34 additions & 0 deletions maestrowf/specification/schemas/yamlspecification.json
Original file line number Diff line number Diff line change
Expand Up @@ -113,6 +113,40 @@
}
]
},
"resource mapping": {
"type": "object",
"properties": {
"allocation": {
"type": "object",
"properties": {
"nodes": { "type": "boolean" },
"tasks": { "type": "boolean" }
},
"additionalProperties": false,
"anyOf": [
{ "required": ["nodes"] },
{ "required": ["tasks"] }
]
},
"launcher": {
"type": "object",
"properties": {
"nodes": { "type": "boolean" },
"tasks": { "type": "boolean" }
},
"additionalProperties": false,
"anyOf": [
{ "required": ["nodes"] },
{ "required": ["tasks"] }
]
}
},
"additionalProperties": false,
"anyOf": [
{ "required": ["allocation"] },
{ "required": ["launcher"] }
]
},
"nested": {"type": "boolean"},
"waitable": {"type": "boolean"},
"priority": {
Expand Down
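To make the constraints in the `resource mapping` schema fragment above easier to read, the following is a hand-written Python sketch that mirrors them with plain checks: only `allocation` and `launcher` are allowed, at least one must be present, and each may hold only boolean `nodes`/`tasks` entries with at least one of them set. `check_resource_mapping` is a hypothetical name used for illustration; it is not how Maestro validates specifications. The sample inputs are the `resource mapping` blocks from the test specs added in this change.

```python
ALLOWED_TARGETS = ("allocation", "launcher")
ALLOWED_FLAGS = ("nodes", "tasks")


def check_resource_mapping(mapping):
    """Return a list of problems found in a 'resource mapping' value."""
    if not isinstance(mapping, dict):
        return ["'resource mapping' must be a mapping"]

    errors = []
    # additionalProperties: false at the top level
    for key in sorted(set(mapping) - set(ALLOWED_TARGETS), key=str):
        errors.append(f"unexpected key {key!r}")
    # anyOf: at least one of allocation/launcher is required
    if not any(target in mapping for target in ALLOWED_TARGETS):
        errors.append("need at least one of 'allocation' or 'launcher'")

    for target in ALLOWED_TARGETS:
        if target not in mapping:
            continue
        flags = mapping[target]
        if not isinstance(flags, dict):
            errors.append(f"{target!r} must be a mapping")
            continue
        for key in sorted(set(flags) - set(ALLOWED_FLAGS), key=str):
            errors.append(f"unexpected key {key!r} in {target!r}")
        if not any(flag in flags for flag in ALLOWED_FLAGS):
            errors.append(f"{target!r} needs 'nodes' or 'tasks'")
        for flag in ALLOWED_FLAGS:
            if flag in flags and not isinstance(flags[flag], bool):
                errors.append(f"{target}.{flag} must be a boolean")
    return errors


# From valid_resource_mapping_1.yaml
valid = {"allocation": {"tasks": False},
         "launcher": {"nodes": False, "tasks": True}}
# From error_resource_mapping_1.yaml: 'alloc' is not an allowed key
bad_key = {"alloc": {"tasks": False},
           "launcher": {"nodes": False, "tasks": True}}
# From error_resource_mapping_2.yaml: 'all' is not a boolean
bad_type = {"allocation": {"tasks": "all"}}

for name, sample in [("valid", valid), ("bad_key", bad_key), ("bad_type", bad_type)]:
    print(name, check_resource_mapping(sample))
```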
7 changes: 6 additions & 1 deletion maestrowf/specification/yamlspecification.py
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,9 @@
)
from maestrowf.datastructures import environment

from rich.traceback import install
install(show_locals=True)

logger = logging.getLogger(__name__)


Expand Down Expand Up @@ -426,7 +429,9 @@ def validate_schema(parent_key, instance, schema):
 raise jsonschema.ValidationError(
     f"In {parent_key}, {path} must be of type "
     f"'{expected_type}', but found "
-    f"'{type(instance[path]).__name__}'."
+    f"'{type(error.instance).__name__}'."
+    # Below can't work for nested keys: rework all this
+    # f"'{type(instance[path]).__name__}'."
 )

elif error.validator == "required":
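The comment in the change above says the old `instance[path]` lookup cannot work for nested keys. The sketch below shows one way that can happen, under the assumption (not confirmed by this diff) that `path` is a single string built by joining the nested keys. `resolve` and the sample data are hypothetical and exist only for illustration. In `jsonschema`, `ValidationError.instance` already holds the offending value, so no lookup is needed.

```python
# Nested step data, as in error_resource_mapping_2.yaml
instance = {"resource mapping": {"allocation": {"tasks": "all"}}}

# Stand-in for the keys jsonschema reports in error.path
error_path = ["resource mapping", "allocation", "tasks"]
# Assumption: the joined form that ends up in the error message
path = ".".join(error_path)

# A flat lookup with the joined path fails once keys are nested
try:
    instance[path]
    flat_lookup = "ok"
except KeyError:
    flat_lookup = "KeyError"


def resolve(data, keys):
    """Walk nested mappings one key at a time (hypothetical helper)."""
    current = data
    for key in keys:
        current = current[key]
    return current


# Walking the keys reaches the value that error.instance would carry
offending = resolve(instance, error_path)
found_type = type(offending).__name__

print(flat_lookup)
print(found_type)
```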
63 changes: 63 additions & 0 deletions tests/specification/test_specs/error_resource_mapping_1.yaml
@@ -0,0 +1,63 @@
description:
    name: error_resource_mapping_1
    description: Test variant for validation errors for 'resource mapping' step keys. Exercises invalid keys

batch:
    type : flux
    host : rzadams
    bank : guests
    queue : pdebug

env:
    variables:
        OUTPUT_PATH: hello_bye_world
    labels:
        OUT_FORMAT: $(GREETING)_$(NAME).txt

study:
    - name: hello_world
      description: Say hello to someone!
      run:
          cmd: |
              $(LAUNCHER) echo "$(GREETING), $(NAME)!" > $(OUT_FORMAT)
              $(LAUNCHER) sleep 1

          nodes: 1
          procs: 4
          resource mapping:
              alloc:
                  tasks: False
              launcher:
                  nodes: False
                  tasks: True
          nested: True
          exclusive: True # Testing mixed batch/step
          walltime: "00:60"

    - name: bye_world
      description: Say bye to someone!
      run:
          cmd: |
              $(LAUNCHER) echo "Bye, World!" > bye.txt
              $(LAUNCHER) sleep 1
          procs: 1
          nested: True
          exclusive: # Test overriding launcher from batch block
              allocation: True
              launcher: False
          walltime: "00:60"
          depends: [hello_world]

global.parameters:
    # NAME:
    #     values: [Pam, Jim, Michael, Dwight]
    #     label: NAME.%%
    # GREETING:
    #     values: [Hello, Ciao, Hey, Hi]
    #     label: GREETING.%%
    NAME:
        values: [Pam]
        label: NAME.%%
    GREETING:
        values: [Hello]
        label: GREETING.%%
61 changes: 61 additions & 0 deletions tests/specification/test_specs/error_resource_mapping_2.yaml
@@ -0,0 +1,61 @@
description:
    name: error_resource_mapping_2
    description: Test variant for validation errors for 'resource mapping' step keys. Exercises invalid value types

batch:
    type : flux
    host : rzadams
    bank : guests
    queue : pdebug

env:
    variables:
        OUTPUT_PATH: hello_bye_world
    labels:
        OUT_FORMAT: $(GREETING)_$(NAME).txt

study:
    - name: hello_world
      description: Say hello to someone!
      run:
          cmd: |
              $(LAUNCHER) echo "$(GREETING), $(NAME)!" > $(OUT_FORMAT)
              $(LAUNCHER) sleep 1

          nodes: 1
          procs: 4
          resource mapping:
              allocation:
                  tasks: 'all'

          nested: True
          exclusive: True # Testing mixed batch/step
          walltime: "00:60"

    - name: bye_world
      description: Say bye to someone!
      run:
          cmd: |
              $(LAUNCHER) echo "Bye, World!" > bye.txt
              $(LAUNCHER) sleep 1
          procs: 1
          nested: True
          exclusive: # Test overriding launcher from batch block
              allocation: True
              launcher: False
          walltime: "00:60"
          depends: [hello_world]

global.parameters:
    # NAME:
    #     values: [Pam, Jim, Michael, Dwight]
    #     label: NAME.%%
    # GREETING:
    #     values: [Hello, Ciao, Hey, Hi]
    #     label: GREETING.%%
    NAME:
        values: [Pam]
        label: NAME.%%
    GREETING:
        values: [Hello]
        label: GREETING.%%
63 changes: 63 additions & 0 deletions tests/specification/test_specs/valid_resource_mapping_1.yaml
@@ -0,0 +1,63 @@
description:
    name: valid_resource_mapping_1
    description: Test variant for verification of valid 'resource mapping' step keys

batch:
    type : flux
    host : rzadams
    bank : guests
    queue : pdebug

env:
    variables:
        OUTPUT_PATH: hello_bye_world
    labels:
        OUT_FORMAT: $(GREETING)_$(NAME).txt

study:
    - name: hello_world
      description: Say hello to someone!
      run:
          cmd: |
              $(LAUNCHER) echo "$(GREETING), $(NAME)!" > $(OUT_FORMAT)
              $(LAUNCHER) sleep 1

          nodes: 1
          procs: 4
          resource mapping:
              allocation:
                  tasks: False
              launcher:
                  nodes: False
                  tasks: True
          nested: True
          exclusive: True # Testing mixed batch/step
          walltime: "00:60"

    - name: bye_world
      description: Say bye to someone!
      run:
          cmd: |
              $(LAUNCHER) echo "Bye, World!" > bye.txt
              $(LAUNCHER) sleep 1
          procs: 1
          nested: True
          exclusive: # Test overriding launcher from batch block
              allocation: True
              launcher: False
          walltime: "00:60"
          depends: [hello_world]

global.parameters:
    # NAME:
    #     values: [Pam, Jim, Michael, Dwight]
    #     label: NAME.%%
    # GREETING:
    #     values: [Hello, Ciao, Hey, Hi]
    #     label: GREETING.%%
    NAME:
        values: [Pam]
        label: NAME.%%
    GREETING:
        values: [Hello]
        label: GREETING.%%