Add support for multi-clusters. #2087
@copilot resolve the merge conflicts in this pull request
Pull request overview
This pull request introduces a new multi-cluster pipeline intended to let a single Rally load driver benchmark against multiple Elasticsearch clusters during the same benchmark run, and extends metrics/reporting to surface per-cluster request metrics.
Changes:
- Adds a new `multi-cluster` pipeline and CLI/docs guidance for configuring multiple named clusters via `--target-hosts`/`--client-options`.
- Updates the driver and metrics pipeline to tag/request metrics per cluster and report them in a multi-column summary table.
- Extends option parsing utilities to better support “no `default` cluster key” scenarios and adds related tests.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| `tests/utils/opts_test.py` | Adds tests for single-cluster wrappers over multi-cluster host/option configs. |
| `tests/racecontrol_test.py` | Adds pipeline listing expectation and tests for multi-cluster pipeline validation/behavior. |
| `esrally/utils/opts.py` | Adds `ClientOptions.default` fallback behavior and single-cluster wrapper classes. |
| `esrally/tracker/tracker.py` | Chooses a “default or first cluster” host/options when generating tracks. |
| `esrally/reporter.py` | Adds multi-cluster summary table rendering for per-cluster op metrics. |
| `esrally/rally.py` | Updates `--target-hosts` help text to document multi-cluster JSON format. |
| `esrally/racecontrol.py` | Adds `multi-cluster` pipeline and adjusts Docker pipeline restrictions + result calculation call signature. |
| `esrally/metrics.py` | Adds `cluster_name` filtering support in metrics store APIs and returns per-cluster `GlobalStats` lists. |
| `esrally/mechanic/mechanic.py` | Uses “default or first cluster” host selection for engine startup. |
| `esrally/driver/driver.py` | Adds multi-cluster execution mode (runs each step against each cluster), and tags samples with cluster name. |
| `docs/pipelines.rst` | Documents the new `multi-cluster` pipeline and adds it to the pipeline list. |
| `docs/command_line_reference.rst` | Mentions the `multi-cluster` pipeline in `--target-hosts` documentation. |
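As a rough illustration of the driver change summarized above (each step runs against each cluster, and every sample is tagged with the cluster name), here is a minimal sketch. The function `run_step_on_all_clusters`, the sample dictionary layout, and the sequential loop are assumptions made for illustration; they are not taken from `esrally/driver/driver.py`, which may well run clusters concurrently.

```python
def run_step_on_all_clusters(step, clients_by_cluster):
    """Run one benchmark step against every cluster and tag each sample.

    `step` is any callable taking a client; `clients_by_cluster` maps a
    cluster name to its client. Both are hypothetical stand-ins.
    """
    samples = []
    for cluster_name, client in clients_by_cluster.items():
        value = step(client)
        # Tagging with the cluster name lets later reporting split metrics per cluster.
        samples.append({"cluster_name": cluster_name, "value": value})
    return samples
```

With samples tagged this way, a metrics store could filter on `cluster_name` to build one report column per cluster.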
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Removing es-perf review request as there are still Copilot comments open.
…e name In calculate_results(), only enter the multi-cluster result path when race.pipeline == "multi-cluster", rather than inferring it from the presence of cluster names in the metrics store alone. This prevents existing multi-cluster --target-hosts setups that use custom runners (without the multi-cluster pipeline) from accidentally producing multi-cluster reporting.
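The gating described in this commit message could look roughly like the sketch below. Only the `race.pipeline == "multi-cluster"` check comes from the commit message; the `Race` dataclass, the helper `use_multi_cluster_reporting`, and the extra requirement that cluster names are present are hypothetical stand-ins, not Rally's actual `calculate_results()` code.

```python
from dataclasses import dataclass


@dataclass
class Race:
    # Hypothetical stand-in for Rally's race object; only `pipeline` matters here.
    pipeline: str


def use_multi_cluster_reporting(race, cluster_names):
    """Decide whether to take the multi-cluster result path.

    Cluster names found in the metrics store are not sufficient on their own:
    the multi-cluster pipeline must have been selected explicitly.
    (Also requiring non-empty cluster names is an assumption of this sketch.)
    """
    return race.pipeline == "multi-cluster" and len(cluster_names) > 0
```

A run that uses multi-cluster `--target-hosts` with custom runners under a different pipeline then keeps the existing single-table reporting.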
gbanasiak
left a comment
Thanks for adding this. I left comments after the first code pass. I haven't tested this yet.
gbanasiak
left a comment
This looks good in testing. I've left a few additional minor comments while going through the code after recent modifications.
gbanasiak
left a comment
Many thanks for all the iterations, LGTM. I left non-blocking comments.
I have also noticed esrally compare does not handle multi-cluster results properly. Something to work on in a follow-up.
    @property
    def default(self):
        return self._clients.get("default") or next(iter(self._clients.values()), None)
nit: default_or_first method name would be more consistent with ClientOptions class
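For reference, a self-contained sketch of the fallback under the suggested name. The class `ClusterClients` and its constructor are hypothetical; only the body of the property mirrors the snippet under review.

```python
class ClusterClients:
    """Hypothetical container mapping cluster names to clients."""

    def __init__(self, clients):
        self._clients = dict(clients)

    @property
    def default_or_first(self):
        # Prefer the "default" cluster; otherwise fall back to the first
        # entry in insertion order, or None if no clusters are configured.
        return self._clients.get("default") or next(iter(self._clients.values()), None)
```

One thing to be aware of: because the expression uses `or`, a falsy value stored under `"default"` (for example an empty list) also triggers the fallback to the first entry.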
Co-authored-by: Grzegorz Banasiak <grzegorz.banasiak@elastic.co>
This adds support for a single Rally load driver to run benchmarks against multiple Elasticsearch clusters at the same time. This differs from the current behaviour of providing a list of hosts or a list of clusters, where requests are either spread out across the hosts or the track has to cater specifically for the multiple clusters.
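To make the distinction concrete, here is a minimal sketch of the two input shapes: a plain host list versus a JSON mapping of named clusters. The helper `hosts_by_cluster`, the cluster names, and the host addresses are all hypothetical, and treating a plain list as a single `default` cluster is an assumption of this sketch rather than a description of Rally's parser.

```python
import json


def hosts_by_cluster(target_hosts):
    """Normalize a target-hosts string into {cluster_name: [hosts]}.

    Accepts either a comma-separated host list or a JSON object mapping
    cluster names to host lists (illustrative only).
    """
    text = target_hosts.strip()
    if text.startswith("{"):
        parsed = json.loads(text)
        return {name: list(hosts) for name, hosts in parsed.items()}
    # A plain list is treated as one cluster named "default" (assumption).
    return {"default": [h.strip() for h in text.split(",") if h.strip()]}


# Plain host list: requests are spread across the hosts of one cluster.
single = hosts_by_cluster("10.0.0.1:9200,10.0.0.2:9200")

# Named clusters: with the multi-cluster pipeline, each would be benchmarked.
multi = hosts_by_cluster('{"cluster-a": ["10.0.0.1:9200"], "cluster-b": ["10.0.1.1:9200"]}')
```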