SOLR-18255 Jans initial port from OSB to solr-benchmark (6 logical commits) by janhoy · Pull Request #3 · apache/solr-orbit

janhoy · 2026-05-21T23:21:26Z

https://issues.apache.org/jira/browse/SOLR-18255

This PR contains the initial port of OpenSearch Benchmark (OSB) to work with Apache Solr. The fork point from OSB is tagged osb_fork_point (OSB commit 92982c56).

The codebase retains the OSB Python package name (osbenchmark) and directory structure for now; known work to do is tracked in TODO.md and will likely be converted into JIRA tasks.

How to review

The PR is structured as 6 commits in logical progression order. Each commit is independently coherent and reviewable in isolation. The recommended approach is to review one commit at a time using GitHub's commit view or git log -p. The final commit is the largest, but by that point the project shape is established and the changes read more clearly in context.

| # | Commit | Files | What to focus on |
|---|--------|-------|------------------|
| 1 | Establish ASF legal and governance files | 12 | NOTICE attribution, license header format, CONTRIBUTING accuracy |
| 2 | Update GitHub/CI infrastructure | 20 | Workflow correctness, removed vs. kept actions |
| 3 | Rewrite documentation | 84 | Install steps, CLI examples, converter docs accuracy |
| 4 | Remove OSB-specific dead code and binaries | 41 | Verify nothing Solr-relevant was swept up |
| 5 | Add new Solr-specific modules | 25 | Conversion logic (schema.py, query.py), provisioner correctness |
| 6 | Port core benchmark framework | 195 | client.py, telemetry.py, runner.py (see functional notes below) |

Summary of major changes

1. Solr-native client (`osbenchmark/client.py`)

The OpenSearch Python client (opensearch-py) has been replaced with a purpose-built SolrAdminClient and SolrClient that communicate with Solr over HTTP using requests/pysolr. All collection management, document indexing, and query execution now goes through Solr's REST API (Collections API, /select, /update, etc.).
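As a rough illustration of the kind of endpoints such a client talks to, the sketch below builds URLs for a `/select` query and a Collections API `CREATE` call. It is not the PR's SolrClient or SolrAdminClient: the function names and defaults are hypothetical, and it uses only the standard library rather than requests/pysolr.

```python
from urllib.parse import urlencode


def select_url(base_url, collection, query, rows=10):
    """Build a /select URL for a collection (hypothetical helper, not from the PR)."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url.rstrip('/')}/solr/{collection}/select?{params}"


def create_collection_url(base_url, name, num_shards=1, replication_factor=1):
    """Build a Collections API CREATE URL (hypothetical helper, not from the PR)."""
    params = urlencode({
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    })
    return f"{base_url.rstrip('/')}/solr/admin/collections?{params}"


search = select_url("http://localhost:8983", "demo", "*:*")
create = create_collection_url("http://localhost:8983/", "demo", num_shards=2)
```

A real client would send these with an HTTP library and handle errors, authentication, and retries, all of which are left out here.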

2. Solr provisioner (`osbenchmark/builder/solr_provisioner.py`)

A new SolrProvisioner replaces the OpenSearch node provisioning machinery. It supports three deployment modes:

- from-distribution: downloads a released Solr binary from downloads.apache.org or the ASF archive (including pre-9.0 paths).
- from-sources: builds Solr from a local checkout with Gradle.
- docker: pulls and starts the official Solr Docker image, including nightly builds.

SolrDockerLauncher handles container lifecycle. Version-aware logic handles the API differences between Solr 9.x and 10.x (e.g. collection creation flags).
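To make the "pre-9.0 paths" remark concrete, here is a minimal sketch of how a download URL might be chosen for the from-distribution mode. The function name is hypothetical and the directory layout is an assumption about the ASF download and archive servers, not code taken from SolrProvisioner; check the layout against the servers before relying on it.

```python
def distribution_url(version, archived=False):
    """Return a candidate download URL for a Solr release (hypothetical helper).

    Assumption: releases before 9.0 live under lucene/solr/ on the archive,
    while 9.0 and later live under solr/solr/.
    """
    major = int(version.split(".")[0])
    filename = f"solr-{version}.tgz"
    if not archived:
        return f"https://downloads.apache.org/solr/solr/{version}/{filename}"
    prefix = "solr/solr" if major >= 9 else "lucene/solr"
    return f"https://archive.apache.org/dist/{prefix}/{version}/{filename}"


current = distribution_url("9.7.0")
legacy = distribution_url("8.11.2", archived=True)
```

A provisioner would typically try the primary download host first and fall back to the archive when the release has been rotated out.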

3. Solr-specific telemetry devices (`osbenchmark/telemetry.py`)

Six new SolrTelemetryDevice subclasses collect Solr-specific metrics during a run: SolrJvmStats, SolrNodeStats, SolrCollectionStats, SolrQueryStats, SolrIndexingStats, SolrCacheStats. These poll the Solr Metrics API and write results via the existing ResultWriter pipeline.
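A telemetry device of this kind boils down to fetching a metrics document and picking values out of it. The sketch below shows only the extraction step. The response shape (a `metrics` object containing a `solr.jvm` registry with `memory.heap.*` keys) is an assumption modelled on the JSON returned by `/solr/admin/metrics?group=jvm` in Solr 9.x, and the function name is hypothetical; verify both against the Solr version being benchmarked.

```python
def extract_heap_stats(metrics_response):
    """Pull JVM heap figures out of a Metrics API response (assumed shape)."""
    jvm = metrics_response.get("metrics", {}).get("solr.jvm", {})
    return {
        "heap_used_bytes": jvm.get("memory.heap.used"),
        "heap_max_bytes": jvm.get("memory.heap.max"),
    }


# Hand-written sample standing in for a real response.
sample = {
    "responseHeader": {"status": 0},
    "metrics": {"solr.jvm": {"memory.heap.used": 1024, "memory.heap.max": 4096}},
}
stats = extract_heap_stats(sample)
```

Missing keys come back as `None`, so a device can skip a sample instead of failing the whole run.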

4. Solr runner operations (`osbenchmark/worker_coordinator/runner.py`)

56 OpenSearch-specific runner classes have been removed (KNN, ML connectors, vector datasets, data streams, index templates, pipelines, etc.). In their place, Solr-specific runners have been added under SolrRunner: SolrBulkIndex, SolrSearch, SolrPaginatedSearch, SolrCommit, SolrOptimize, SolrWaitForMerges, SolrCreateCollection, SolrDeleteCollection.
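For readers unfamiliar with Solr's update handler, the sketch below shows the general idea behind a bulk-index runner: split documents into batches and post each batch as a JSON array to `/solr/<collection>/update`. It is not the PR's SolrBulkIndex; the helper names are hypothetical and the HTTP call itself is omitted.

```python
import json
from itertools import islice


def batches(docs, size):
    """Yield lists of at most `size` documents from any iterable."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def update_request(collection, chunk):
    """Describe one /update request: path, headers, and JSON body."""
    path = f"/solr/{collection}/update"
    headers = {"Content-Type": "application/json"}
    body = json.dumps(chunk).encode("utf-8")
    return path, headers, body


docs = [{"id": str(i), "title": f"doc {i}"} for i in range(5)]
requests_to_send = [update_request("demo", chunk) for chunk in batches(docs, 2)]
```

Commits are deliberately left out of the request; per the list above, committing is a separate operation (SolrCommit) in this port.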

5. Workload model: index → collection (`osbenchmark/workload/`)

The workload domain model has been updated throughout:

- Index / DataStream / IndexTemplate → Collection
- IndexTemplate, ComponentTemplate, DataStream and serverless/vector-related types removed
- New CreateCollectionParamSource / DeleteCollectionParamSource / SolrSearchParamSource
- OpenSearch Query DSL validation removed; Solr query params used instead

6. OSB-to-Solr workload converter (`osbenchmark/conversion/`)

A new converter pipeline (workload_converter.py, detector.py, query.py, schema.py, field.py) translates an OpenSearch Benchmark workload into Solr format:

- Detects OSB-specific operations and query DSL automatically
- Translates bulk → bulk-index, force-merge → optimize, index mappings → Solr configsets
- Generates a minimal solrconfig.xml / managed-schema.xml configset skeleton
- Invoked via solr-benchmark convert-workload; see docs/converter/ for details
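The translation steps listed above can be sketched in a few lines. The operation renames come from this description; everything else (function names, the handful of supported query types, the lack of escaping) is a simplifying assumption and does not reflect what query.py actually implements.

```python
# Operation-type renames named in the PR description.
OPERATION_RENAMES = {"bulk": "bulk-index", "force-merge": "optimize"}


def convert_operation_type(op_type):
    """Map an OSB operation type to its Solr counterpart, if one is listed."""
    return OPERATION_RENAMES.get(op_type, op_type)


def convert_query(body):
    """Translate a tiny subset of OpenSearch Query DSL into Solr params.

    Hypothetical and simplified: no escaping of special characters,
    and only match_all, term, and match are handled.
    """
    query = body.get("query", {})
    params = {}
    if "match_all" in query:
        params["q"] = "*:*"
    elif "term" in query:
        field, value = next(iter(query["term"].items()))
        if isinstance(value, dict):
            value = value.get("value")
        params["q"] = f'{field}:"{value}"'
    elif "match" in query:
        field, value = next(iter(query["match"].items()))
        if isinstance(value, dict):
            value = value.get("query")
        params["q"] = f"{field}:({value})"
    else:
        raise ValueError(f"unsupported query type: {sorted(query)}")
    if "size" in body:
        params["rows"] = body["size"]
    return params


converted = convert_query({"query": {"match_all": {}}, "size": 5})
```

Raising on unsupported query types, rather than guessing, keeps a converted workload from silently measuring something different from the original.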

7. Metrics store simplified (`osbenchmark/metrics.py`)

OsMetricsStore, OsTestRunStore, OsResultsStore, and IndexTemplateProvider (all backed by OpenSearch) have been removed. The single supported store is now FilesystemMetricsStore (JSON + CSV + SQLite on local disk), accessed via LocalFilesystemResultWriter.
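To show what "JSON + CSV + SQLite on local disk" can look like in practice, here is a minimal standard-library sketch that writes the same rows in all three formats. The file names, table schema, and function name are hypothetical; they are not the layout used by FilesystemMetricsStore.

```python
import csv
import json
import sqlite3
import tempfile
from pathlib import Path

FIELDS = ["name", "value", "unit"]


def write_results(out_dir, rows):
    """Write metric rows as JSON, CSV, and SQLite files (hypothetical layout)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "results.json").write_text(json.dumps(rows, indent=2))
    with open(out / "results.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    conn = sqlite3.connect(str(out / "results.db"))
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS metrics (name TEXT, value REAL, unit TEXT)"
            )
            conn.executemany(
                "INSERT INTO metrics VALUES (:name, :value, :unit)", rows
            )
    finally:
        conn.close()
    return out


SAMPLE = [
    {"name": "throughput", "value": 1250.0, "unit": "docs/s"},
    {"name": "latency_p99", "value": 42.5, "unit": "ms"},
]

with tempfile.TemporaryDirectory() as tmp:
    out = write_results(tmp, SAMPLE)
    written = sorted(p.name for p in out.iterdir())
    stored = json.loads((out / "results.json").read_text())
```

Keeping everything on the local filesystem removes the need for a running search cluster just to store benchmark results.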

8. Documentation site (`docs/`)

A full user-facing documentation site is included, built with Jekyll + just-the-docs. Key sections: user-guide/ (install, configure, workload authoring), reference/ (telemetry, metrics, workload schema, commands), converter/ (OSB migration guide), cluster-config/. Deployed to GitHub Pages via .github/workflows/docs.yml. See docs/README.md for local build instructions.

9. ASF licence headers and housekeeping

- All modified files carry a two-line ASF modification notice above the original OpenSearch header.
- OSB-specific GitHub workflows (release, backport, integ-test, PyPI publish) removed; a docs deploy workflow added.
- Bundled pbzip2 binaries removed; pbzip2 is now an optional system prerequisite.
- CONTRIBUTING.md, DEVELOPER_GUIDE.md, README.md rewritten for the Solr/ASF context.
- TODO.md tracks remaining incubation steps (package rename, CI, release process, etc.).

The changes are described by the 9 functional areas above regardless of which commit they land in. The 6-commit structure exists purely to aid review — it does not reflect the order in which the work was done.

Copilot

Copilot wasn't able to review this pull request because it exceeds the maximum number of files (300). Try reducing the number of changed files and requesting a review from Copilot again.

epugh

I poked around, and other than noticing some solr-benchmark where I expected solr-orbit, this looks good. Maybe in a future PR we fix the directory names?

janhoy · 2026-05-22T00:25:56Z

> Copilot wasn't able to review this pull request because it exceeds the maximum number of files (300). Try reducing the number of changed files and requesting a review from Copilot again.

Haha :)

> I poked around, and other than noticing some solr-benchmark where I expected solr-orbit, this looks good. Maybe in a future PR we fix the directory names?

Not yet re-branded this repo, so that is expected.

Since the project is still in flux and needs more steps, I'll do CTR (commit-then-review) for this PR and merge the 6 commits as is. Normally I'd leave it open for 3-4 days to allow reviews, but I believe that at this early stage, as you say Eric, it is acceptable to focus on progress as long as we follow best practices and others can review after the fact.

Replace OpenSearch-specific project governance with ASF-compatible equivalents: add NOTICE and create-notice.sh for ASF IP compliance, bump version to 0.1.0 to signal the fresh start, update CONTRIBUTING.md for Solr context, remove MAINTAINERS/RELEASE/TRIAGE files that don't apply to an ASF incubating project (those processes are defined by the ASF), drop .whitesource/.fossa.yml OSS-scanning configs that were tied to the OpenSearch project infrastructure. Part of apache#3

Adapt CI/CD and project metadata for the Solr port:

- Remove workflows that depended on OpenSearch infrastructure (backport, add-untriaged, integ-test, publish-release, docker-push-release); these will be rebuilt once the project has its own ASF infrastructure
- Add docs.yml workflow (commented out pending docs host decision)
- Simplify unit-test and docker-build workflows to remove OpenSearch-specific steps
- Update .ci/build.sh and check_deprecated_terms.py for Solr naming
- Remove CODEOWNERS and issue templates tied to the old team structure
- Add AGENTS.md: guidance for AI coding assistants working in this repo
- Refresh Makefile, tox.ini, .pylintrc, .gitignore for the new project shape

Part of apache#3

Complete documentation overhaul for the Solr port:

- Replace OpenSearch-focused Jekyll docs site with Solr-specific content covering installation, configuration, running benchmarks, workload creation, and the new OpenSearch→Solr converter
- Remove legacy docs/api/ (OpenSearch API reference) and docs/user-guides/ in favor of the new docs/user-guide/ and docs/reference/ structure
- Update README.md, DEVELOPER_GUIDE.md, PYTHON_SUPPORT_GUIDE.md, CREATE_WORKLOAD_GUIDE.md to use Solr terminology and solr-benchmark CLI
- Add TODO.md: incubation checklist and known remaining work
- Add it/README.md: integration test setup instructions
- Remove opensearch_benchmark.png splash image

Part of apache#3

Clean out everything that has no place in a Solr benchmark tool:

**Kafka / async HTTP / gRPC:**

- Remove kafka_client.py (Kafka producer for OpenSearch metrics streaming)
- Remove async_connection.py (OpenSearch async HTTP connection layer)
- Remove worker_coordinator/proto_helpers/ (gRPC bulk/query helpers)
- Remove osbenchmark/data_streaming/ package (Kafka data pipeline)
- Remove all corresponding unit tests

**Bundled binaries:**

- Remove osbenchmark/decompressors/pbzip2-{Darwin,Linux}-{arm64,x86_64,aarch64}
- Remove scripts/pbzip2

These binaries are not redistributable in an ASF project; decompression will use the system pbzip2 or fall back to Python's bz2 module.

**OpenSearch-specific infrastructure:**

- Remove scripts/terraform/ (Terraform cluster provisioning for OpenSearch on AWS)
- Remove samples/ccr/ (OpenSearch cross-cluster replication sample)
- Remove tests for all of the above

Part of apache#3
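The pbzip2 fallback mentioned in the commit message above can be sketched as follows. This is an illustration under stated assumptions, not the project's decompressor code: the function name is hypothetical, and it assumes a system pbzip2 that accepts the usual `-d` (decompress), `-c` (write to stdout), and `-k` (keep input) flags.

```python
import bz2
import os
import shutil
import subprocess
import tempfile


def decompress_bz2(src, dest):
    """Decompress src to dest, preferring a system pbzip2 when one is on PATH."""
    pbzip2 = shutil.which("pbzip2")
    if pbzip2:
        with open(dest, "wb") as out:
            subprocess.run([pbzip2, "-d", "-c", "-k", src], stdout=out, check=True)
        return "pbzip2"
    # Fallback: single-threaded, but needs nothing beyond the standard library.
    with bz2.open(src, "rb") as inp, open(dest, "wb") as out:
        shutil.copyfileobj(inp, out)
    return "bz2"


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "data.txt.bz2")
    dest = os.path.join(tmp, "data.txt")
    with open(src, "wb") as f:
        f.write(bz2.compress(b"hello solr\n"))
    method = decompress_bz2(src, dest)
    with open(dest, "rb") as f:
        restored = f.read()
```

Either path produces the same output file; pbzip2 only matters for speed on large corpora.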

All files in this commit are net-new; no existing code is modified.

**osbenchmark/conversion/ (OpenSearch→Solr workload converter):**

- detector.py: identify whether a workload targets OpenSearch or Solr
- field.py: field name normalization rules
- schema.py: translate OpenSearch index mappings to Solr schema XML
- query.py: translate OpenSearch Query DSL operations to Solr query syntax
- workload_converter.py: orchestrate full workload directory conversion

Tests: tests/unit/solr/conversion/ and tests/unit/solr/test_workload_converter.py

**osbenchmark/builder/solr_provisioner.py:** Provision and configure a Solr cluster (collection creation, configset upload, schema application) as a drop-in replacement for the OpenSearch provisioner. Test: tests/unit/solr/test_provisioner.py

**osbenchmark/builder/installers/preparers/solr_preparer.py:** Prepare a Solr node installation (derived from opensearch_preparer.py, adapted for Solr directory layout and startup options). Test: tests/builder/installers/preparers/solr_preparer_test.py

**osbenchmark/result_writer.py:** Write benchmark results to filesystem in JSON/CSV formats for Solr runs. Test: tests/unit/solr/test_result_writer.py

**solrbenchmark/ package:** Thin top-level package and entry point (solr-benchmark CLI) that will replace opensearch-benchmark once the project is accepted into ASF.

**tests/unit/solr/:** Full unit test suite for all new Solr modules. tests/unit/test_telemetry.py: new telemetry test replacing the old telemetry_test.py.

None of these modules are wired into the main CLI yet; that happens in the next commit.

Part of apache#3

This is the main functional change of the Solr port, touching all layers of the benchmark tool. This commit wires the previous five together.

**setup.py / entry points:**

- Remove opensearch-py, opensearch-protobufs, aiokafka dependencies
- Add requests for Solr HTTP communication
- Rename entry points: opensearch-benchmark→solr-benchmark, osb→sb (pending ASF acceptance)

**osbenchmark/client.py (complete rewrite):** Replace the opensearch-py async client with a synchronous requests-based Solr HTTP client. Supports collection management, document indexing, and query execution against a Solr cluster.

**osbenchmark/telemetry.py:** Replace OpenSearch-specific telemetry devices (JVM heap, GC, hot threads, etc.) with Solr equivalents using the Solr Metrics API and node stats endpoints.

**osbenchmark/worker_coordinator/runner.py:** Adapt operation runners for Solr: bulk indexing via /update, queries via /select, collection admin operations. Remove OpenSearch-specific operations (snapshot, shrink, force-merge semantics, etc.).

**osbenchmark/builder/ (OSB→Solr naming cleanup):**

- Rename opensearch_distribution_downloader.py → distribution_downloader.py
- Rename opensearch_source_downloader.py → source_downloader.py
- Rename opensearch_distribution_repository_provider.py → distribution_repository_provider.py
- Delete opensearch_preparer.py (replaced by solr_preparer.py in PR 5)
- Delete core_plugin_source_downloader.py, external_plugin_source_downloader.py, plugin_distribution_downloader.py (OSB plugin infrastructure, not needed for Solr)
- Update builder.py, provisioner.py, supplier.py to use the new Solr provisioner

**osbenchmark/config.py, context.py, benchmark.py, benchmarkd.py:** Update configuration keys, context variables, and CLI help text for Solr. Remove OpenSearch-specific commands and flags; add Solr cluster URL handling.

**osbenchmark/workload/ and osbenchmark/workload_generator/:** Adapt workload loading and workload generation for Solr collection schema.

**osbenchmark/metrics.py, publisher.py:** Update metric names and summary report labels from OpenSearch to Solr terminology.

**osbenchmark/resources/ (cluster config cleanup):** Remove resources/cluster_configs/1.0/ entirely (OpenSearch 1.x configs). Simplify resources/cluster_configs/main/ to Solr-relevant entries. Update benchmark.ini default configuration.

**Integration and unit tests:** Update all existing tests to match new API shapes. Delete tests for removed functionality (telemetry_test.py replaced by tests/unit/test_telemetry.py, workload_generator corpus/index tests removed as workload_generator was refactored).

Part of apache#3

janhoy mentioned this pull request May 21, 2026

SOLR-18255 Jans initial port from OSB to solr-benchmark #1

Closed

janhoy requested review from Copilot and epugh May 21, 2026 23:24

Copilot AI reviewed May 21, 2026

epugh approved these changes May 21, 2026

janhoy added 6 commits May 22, 2026 02:29

janhoy force-pushed the port/apache-solr-benchmark branch from 964651b to 5559b28 Compare May 22, 2026 00:34

janhoy merged commit 57387ed into apache:main May 22, 2026
3 checks passed

janhoy deleted the port/apache-solr-benchmark branch May 22, 2026 00:53

janhoy merged 6 commits into apache:main from janhoy:port/apache-solr-benchmark
