perf: reduce per-query allocations on hot path #239

Open
adamyeats wants to merge 1 commit into main from perf/query-hot-path

@adamyeats adamyeats commented Apr 19, 2026

Summary

Reduces per-query allocations on the sqlds hot path. Every change is mechanical and preserves existing behaviour. Because sqlds is shared by ~15 Grafana SQL datasource plugins (snowflake, bigquery, clickhouse, databricks, redshift, mssql, postgres, mysql, oracle, saphana, athena, …), each of them picks up the savings in proportion to its query volume.

Note

Related: #240 (perf: Plumb RowCapacityHint through DBQuery to presize frame fields). The two PRs are independent and can land in either order. #240 targets per-column slice growth inside sqlutil.FrameFromRows (and is blocked on grafana/grafana-plugin-sdk-go#1536), while this PR targets the allocations in sqlds itself around that call. The performance wins are additive.

Five targeted changes:

  • connector: cache defaultKey once on Connector instead of fmt.Sprintfing it per GetConnectionFromQuery / Connect call.
  • datasource: cache driverSettings, Converters(), and the six optional mutator type assertions (QueryDataMutator, QueryMutator, ResponseMutator, QueryArgSetter, QueryErrorMutator, CheckHealthMutator) at construction; hoist settings := ds.DriverSettings() at the top of handleQuery instead of re-copying the struct 10× per query.
  • query: replace applyHeaders' json.Unmarshal → map → json.Marshal round-trip with a byte-level JSON-key injection fast path; fall back to the original code on malformed input. Preallocate fixFrameForLongToMulti's time slice to the known length.
  • bench_test (new): benchmark harness covering every change plus two deferred candidates (DriverSettings/Converters direct, Response_ConcurrentSet). TestMain silences backend.Logger so hclog doesn't interleave with go test output and corrupt samples.
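The applyHeaders change in the list above can be illustrated with a minimal sketch. It is not the PR's code: `injectHeaders`, `injectHeadersSlow`, the `"headers"` key name, and the exact validity checks are assumptions made for illustration. The idea is to splice the new key in front of the closing brace of an already-valid JSON object and to take the map-based path only when the input does not look like an object.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// injectHeaders is a hypothetical helper (name and "headers" key are
// assumptions). It appends a "headers" member to a JSON object by byte
// splicing instead of decoding the whole document into a map.
func injectHeaders(raw []byte, headers map[string]string) ([]byte, error) {
	hdr, err := json.Marshal(headers)
	if err != nil {
		return nil, err
	}
	body := bytes.TrimSpace(raw)
	if len(body) == 0 || bytes.Equal(body, []byte("null")) {
		body = []byte("{}") // treat missing JSON as an empty object
	}
	// Anything that is not a well-formed object goes to the slow path,
	// which reports the same error the map-based code would.
	if body[0] != '{' || body[len(body)-1] != '}' || !json.Valid(body) {
		return injectHeadersSlow(raw, hdr)
	}
	inner := bytes.TrimSpace(body[1 : len(body)-1])
	out := make([]byte, 0, len(body)+len(hdr)+len(`,"headers":`))
	out = append(out, body[:len(body)-1]...) // everything except the final '}'
	if len(inner) > 0 {
		out = append(out, ',') // the object already has members
	}
	out = append(out, `"headers":`...)
	out = append(out, hdr...)
	out = append(out, '}')
	return out, nil
}

// injectHeadersSlow is the map round-trip used as the fallback.
func injectHeadersSlow(raw, hdr []byte) ([]byte, error) {
	var m map[string]json.RawMessage
	if err := json.Unmarshal(raw, &m); err != nil {
		return nil, err
	}
	if m == nil {
		m = map[string]json.RawMessage{}
	}
	m["headers"] = hdr
	return json.Marshal(m)
}

func main() {
	out, err := injectHeaders([]byte(`{"rawSql":"SELECT 1"}`), map[string]string{"X-Id": "abc"})
	fmt.Println(string(out), err)
}
```

One design point in this sketch: if the input already contains a `"headers"` member, the splice produces a duplicate key rather than overwriting it. Go's encoding/json keeps the last occurrence when decoding, but other consumers may behave differently, so a real implementation would need to decide how to handle that case.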

Benchmark results

`go test -bench=. -benchmem -count=10 -run=^$ ./...` on Apple M2 Pro (Darwin arm64), compared against main with `benchstat`:

sec/op

| Benchmark | Before | After | Δ |
| --- | ---: | ---: | ---: |
| ApplyHeaders/Nil | 1097.5n ± 26% | 614.9n ± 22% | -43.98% |
| ApplyHeaders/Empty | 1013.5n ± 4% | 598.0n ± 2% | -41.00% |
| ApplyHeaders/Small | 1395.0n ± 6% | 621.6n ± 5% | -55.44% |
| ApplyHeaders/Large | 3552.5n ± 25% | 747.3n ± 13% | -78.96% |
| FixFrameForLongToMulti/100 | 6.389µ ± 22% | 4.516µ ± 14% | -29.31% |
| FixFrameForLongToMulti/1000 | 48.80µ ± 5% | 36.02µ ± 10% | -26.19% |
| FixFrameForLongToMulti/10000 | 528.9µ ± 2% | 353.4µ ± 22% | -33.19% |
| Connector_GetConnectionFromQuery_SingleConn | 118.15n ± 5% | 42.63n ± 1% | -63.92% |
| HandleQueryMutatorChecks | 4.766n ± 16% | 2.076n ± 10% | -56.44% |
| DriverSettings | 2.785n ± 16% | 2.842n ± 11% | ~ |
| HandleQuery_SettingsReads | 1.986n ± 2% | 1.994n ± 1% | ~ |
| DriverConverters | 1.975n ± 2% | 1.990n ± 6% | ~ |
| DefaultKey | 56.06n ± 20% | 57.96n ± 13% | ~ |
| Response_ConcurrentSet/1 | 458.1n ± 25% | 469.6n ± 9% | ~ |
| Response_ConcurrentSet/10 | 4.315µ ± 5% | 4.229µ ± 10% | ~ |
| Response_ConcurrentSet/50 | 24.11µ ± 6% | 26.81µ ± 14% | ~ |
| geomean | 499.3n | 338.9n | -32.14% |
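The HandleQueryMutatorChecks row measures the effect of resolving optional interfaces once rather than on every query. A minimal sketch of that pattern follows. The types here (`driver`, `queryMutator`, `datasource`, and the two example drivers) are simplified stand-ins invented for illustration, not the sqlds definitions.

```go
package main

import "fmt"

// driver and queryMutator are hypothetical stand-ins for the real
// driver interface and one of its optional mutator interfaces.
type driver interface{ Name() string }

type queryMutator interface {
	MutateQuery(q string) string
}

type mutatingDriver struct{}

func (mutatingDriver) Name() string                { return "mutating" }
func (mutatingDriver) MutateQuery(q string) string { return q + " /* mutated */" }

type plainDriver struct{}

func (plainDriver) Name() string { return "plain" }

type datasource struct {
	driver driver
	// queryMutator is resolved once at construction; it stays nil when
	// the driver does not implement the optional interface.
	queryMutator queryMutator
}

func newDatasource(d driver) *datasource {
	ds := &datasource{driver: d}
	if m, ok := d.(queryMutator); ok {
		ds.queryMutator = m
	}
	return ds
}

// handleQuery only does a nil check on the hot path instead of a
// dynamic type assertion per call.
func (ds *datasource) handleQuery(q string) string {
	if ds.queryMutator != nil {
		q = ds.queryMutator.MutateQuery(q)
	}
	return q
}

func main() {
	fmt.Println(newDatasource(mutatingDriver{}).handleQuery("SELECT 1"))
	fmt.Println(newDatasource(plainDriver{}).handleQuery("SELECT 1"))
}
```

This only works when the driver is fixed for the lifetime of the datasource. If the driver could be swapped after construction, the cached fields would need to be refreshed at the same time.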

B/op

| Benchmark | Before | After | Δ |
| --- | ---: | ---: | ---: |
| ApplyHeaders/Nil | 992 | 536 | -45.97% |
| ApplyHeaders/Empty | 992 | 544 | -45.16% |
| ApplyHeaders/Small | 1153 | 576 | -50.04% |
| ApplyHeaders/Large | 2322 | 776 | -66.58% |
| FixFrameForLongToMulti/100 | 14.594Ki | 9.867Ki | -32.39% |
| FixFrameForLongToMulti/1000 | 121.94Ki | 87.96Ki | -27.86% |
| FixFrameForLongToMulti/10000 | 1588.9Ki | 874.9Ki | -44.94% |
| Connector_GetConnectionFromQuery_SingleConn | 40 | 0 | -100% |
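The Connector_GetConnectionFromQuery_SingleConn row dropping to 0 B/op is the result one would expect from computing a formatted key once and reusing it. The sketch below shows the pattern under stated assumptions: the `connector` type, the key format, and the method names are hypothetical, and it counts allocations with `testing.AllocsPerRun` rather than measuring bytes.

```go
package main

import (
	"fmt"
	"testing"
)

// connector is a hypothetical stand-in; the key format is an assumption.
type connector struct {
	uid        string
	defaultKey string // computed once in newConnector
}

func newConnector(uid string) *connector {
	return &connector{uid: uid, defaultKey: fmt.Sprintf("%s-%s", uid, "default")}
}

// keyPerCall rebuilds the key on every call and allocates each time.
func (c *connector) keyPerCall() string { return fmt.Sprintf("%s-%s", c.uid, "default") }

// keyCached returns the precomputed string without allocating.
func (c *connector) keyCached() string { return c.defaultKey }

// sink keeps the compiler from optimizing the calls away.
var sink string

func main() {
	c := newConnector("ds-uid")
	perCall := testing.AllocsPerRun(1000, func() { sink = c.keyPerCall() })
	cached := testing.AllocsPerRun(1000, func() { sink = c.keyCached() })
	fmt.Printf("per call: %.0f allocs, cached: %.0f allocs\n", perCall, cached)
}
```

Caching is safe here only because the inputs to the key do not change after construction. A key that depends on per-query arguments would still have to be built per call.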

allocs/op

| Benchmark | Before | After | Δ |
| --- | ---: | ---: | ---: |
| ApplyHeaders/Nil | 17 | 9 | -47.06% |
| ApplyHeaders/Empty | 17 | 10 | -41.18% |
| ApplyHeaders/Small | 24 | 10 | -58.33% |
| ApplyHeaders/Large | 57 | 10 | -82.46% |
| FixFrameForLongToMulti/100 | 122 | 115 | -5.74% |
| FixFrameForLongToMulti/1000 | 1025 | 1015 | -0.98% |
| FixFrameForLongToMulti/10000 | 10.03k | 10.02k | -0.17% |
| Connector_GetConnectionFromQuery_SingleConn | 2 | 0 | -100% |

Benchmarks that show no significant change (DriverSettings, HandleQuery_SettingsReads, DriverConverters, DefaultKey, Response_ConcurrentSet) were added as controls / deferred candidates — they were included in the harness before picking which fixes to land, and are kept so future regressions are detectable.

Running the benchmarks and tests

  • `go test ./...`
  • `go vet ./...`
  • `go test -bench=. -benchmem -count=10 -run=^$ ./...` on main and on this branch, then compare the two result sets with `benchstat`

@adamyeats adamyeats requested a review from a team as a code owner April 19, 2026 00:17
@adamyeats adamyeats changed the title from "perf: reduce per-query allocations on sqlds hot path" to "perf: reduce per-query allocations on hot path" Apr 19, 2026