Wherobots Benchmark Results

Monthly benchmark results from Wherobots performance benchmarks, comparing spatial analytics performance across multiple cloud platforms.

Repository Structure

benchmark-results/
├── tpch/                          # TPC-H benchmark results
│   ├── parquet/
│   │   ├── sf100/YYYY-MM.parquet  # Scale factor 100
│   │   └── sf1000/YYYY-MM.parquet # Scale factor 1000
│   └── csv/
│       ├── sf100/YYYY-MM.csv
│       └── sf1000/YYYY-MM.csv
├── spatial/                       # SpatialBench benchmark results
│   ├── parquet/
│   │   └── ...
│   └── csv/
│       └── ...
├── short-spatial/                 # ShortSpatialBench results (if applicable)
│   └── ...
└── metadata/
    └── YYYY-MM.json               # Run metadata (timing, CLI args, platform configs)

File Formats

Parquet (Full Schema)

Machine-readable format with complete result data:

Column           Type     Description
query            string   Query identifier (e.g., q1, q2)
platform         string   Platform name (e.g., WDB-1, EMR)
benchmark        string   Benchmark name (e.g., TPC-H, SpatialBench)
scale_factor     int64    Data scale factor (e.g., 100, 1000)
final_status     string   Query execution status (SUCCESS, ERROR, TIMEOUT)
runtime          float64  Query runtime in seconds (null if failed)
result_count     int64    Number of result rows (null if failed)
cost             float64  Estimated cost in USD (null if unavailable)
execution_name   string   Run identifier (typically YYYY-MM-DD)
exception        string   Exception message if query failed (null otherwise)
platform_config  string   Platform configuration as JSON string

CSV (Simplified Schema)

Human-readable format with the same schema as the Parquet files, minus the exception and platform_config columns.
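The CSV files load directly with pandas. The snippet below parses a small inline sample in the simplified schema; the rows are illustrative, not real benchmark results:

```python
import io

import pandas as pd

# Illustrative sample in the simplified CSV schema; values are made up.
sample_csv = """query,platform,benchmark,scale_factor,final_status,runtime,result_count,cost,execution_name
q1,WDB-1,TPC-H,100,SUCCESS,12.3,4,0.05,2025-07-01
q2,WDB-1,TPC-H,100,TIMEOUT,,,,2025-07-01
"""

df = pd.read_csv(io.StringIO(sample_csv))
# Failed queries leave runtime/result_count/cost empty, which pandas
# reads as NaN, so filter on final_status before aggregating.
succeeded = df[df["final_status"] == "SUCCESS"]
print(succeeded[["query", "runtime"]])
```

For real data, replace the `StringIO` buffer with a path such as `tpch/csv/sf100/2025-07.csv`.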

Metadata JSON

Each run produces a metadata file containing:

  • execution_name: Run identifier
  • start_time / end_time: ISO 8601 timestamps
  • duration_seconds: Total run duration
  • cli_args: CLI arguments used for the run
  • platform_configs: Per-platform configuration details
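Based on the fields listed above, a metadata file might look like the following. The values are illustrative placeholders, not taken from a real run:

```json
{
  "execution_name": "2025-07-01",
  "start_time": "2025-07-01T00:00:00Z",
  "end_time": "2025-07-01T06:30:00Z",
  "duration_seconds": 23400,
  "cli_args": ["--benchmark", "tpch", "--scale-factor", "100"],
  "platform_configs": {
    "WDB-1": {}
  }
}
```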

How Results Are Published

Results are automatically published by the benchmark-dashboard CI pipeline after each benchmark run. The automated benchmark runs on the 1st of every month.

Reading the Data

import pyarrow.parquet as pq

# Read a specific month's results
table = pq.read_table("tpch/parquet/sf100/2025-07.parquet")
df = table.to_pandas()
print(df[df["final_status"] == "SUCCESS"].groupby("platform")["runtime"].mean())

License

This data is published by Wherobots. See the main benchmark-dashboard repository for details.
