Skip to content

Fix correlated subquery COUNT(*) returning NULL in ungrouped aggregates#5678

Closed
bendechrai wants to merge 6 commits intotursodatabase:mainfrom
bendechrai:fix/correlated-subquery-ungrouped-aggregate
Closed

Fix correlated subquery COUNT(*) returning NULL in ungrouped aggregates#5678
bendechrai wants to merge 6 commits intotursodatabase:mainfrom
bendechrai:fix/correlated-subquery-ungrouped-aggregate

Conversation

@bendechrai
Copy link
Copy Markdown

@bendechrai bendechrai commented Feb 28, 2026

Summary

Correlated scalar subqueries with COUNT(*) return NULL instead of 0 in ungrouped
aggregate queries when no rows match the outer WHERE clause. This is a data
corruption bug: the wrong NULL gets persisted via INSERT...SELECT and cannot be
recovered by a future patch.

This PR includes both the bug fix and a DST simulator improvement that catches
this bug class via differential testing.

Repro:

CREATE TABLE t1(a INTEGER, b INTEGER);
CREATE TABLE t2(a INTEGER, b INTEGER);
SELECT COUNT(*), (SELECT COUNT(*) FROM t2 WHERE t2.a = t1.a) FROM t1;

Expected (SQLite): 0|0
Before this PR: 0|NULL

Data corruption proof

CREATE TABLE dst(cnt INTEGER, sub_cnt INTEGER);
INSERT INTO dst SELECT COUNT(*), (SELECT COUNT(*) FROM t2 WHERE t2.a = t1.a) FROM t1;
SELECT cnt, typeof(cnt), sub_cnt, typeof(sub_cnt) FROM dst;

Expected (SQLite): 0|integer|0|integer
Before this PR: 0|integer||null -- NULL persisted to disk

Root cause

Correlated subqueries are emitted inside the scan loop body (in open_loop via
main_loop.rs). When no rows match, the loop body never executes at runtime,
the subquery bytecode never runs, and the result register keeps its initial NULL.

In emit_ungrouped_aggregation, the fallback path re-evaluates non-aggregate
columns via translate_expr_no_constant_opt, but this only reads the (still-NULL)
result register -- it does not re-execute the subquery.

Fix

  • Before open_loop consumes the subquery plans, clone the plans for correlated
    subqueries in ungrouped aggregate queries (new field
    TranslateCtx::deferred_ungrouped_agg_subqueries).
  • In emit_ungrouped_aggregation's fallback path (loop never ran), drain and
    emit these deferred subqueries via emit_non_from_clause_subquery. Cursors
    are in NullRow state, so correlated column refs resolve to NULL and COUNT(*)
    correctly returns 0.
  • Skip this for contains_constant_false_condition (WHERE 0), where the Goto
    skips cursor initialization entirely.

DST simulator improvement

The simulator previously generated no aggregate functions and no correlated
subqueries. This PR adds query generation for:

SELECT COUNT(*), (SELECT COUNT(*) FROM t2 WHERE t2.col = t1.col) FROM t1 WHERE ...

Changes to sql_generation:

  • Predicate::count_star(), subquery(), qualified_column() helpers
  • Select::count_with_correlated_subquery() factory method
  • 15% chance branch in Select::arbitrary() when 2+ tables exist
  • dependencies() updated to track inner tables of subquery expressions

The differential testing oracle (Turso vs SQLite) now catches this bug class
automatically. Without this generation, the simulator passes on the buggy code;
with it, the mismatch is detected within seconds.

Before / After / SQLite comparison

Query SQLite Before After
Empty table: SELECT COUNT(*), (SELECT COUNT(*) ...) 0|0 0|NULL 0|0
WHERE filters all rows 0|0 0|NULL 0|0
INSERT...SELECT typeof(sub_cnt) integer null integer
Positive case (rows match) 3|2 3|2 3|2

Related PRs (different bugs)

Test plan

  • New sqltest: correlated-subquery-ungrouped-aggregate.sqltest (6 cases)
  • Full sqltest suite: 7223 passed, 0 new failures
  • Simulator differential mode catches bug on buggy code, passes on fixed code
  • cargo clippy --workspace --all-features --all-targets -- --deny=warnings
  • cargo fmt

Correlated scalar subqueries with COUNT(*) return NULL instead of 0 in
ungrouped aggregate queries when no rows match the outer WHERE clause.
This is a data corruption bug: NULL gets persisted via INSERT...SELECT.

Root cause: correlated subqueries are emitted inside the scan loop body.
When no rows match, the loop never executes, the subquery bytecode never
runs, and the result register keeps its initial NULL value.

Fix: clone correlated subquery plans before open_loop consumes them, then
re-emit them in emit_ungrouped_aggregation's fallback path (when the loop
never ran). Cursors are in NullRow state so COUNT(*) correctly returns 0.
Copy link
Copy Markdown

@turso-bot turso-bot bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please review @PThorpe92

SQLite returns 0|0 for this query but Turso returns 0|NULL because the
WHERE 0 path skips cursor initialization entirely. This is a separate
pre-existing issue unrelated to the correlated subquery fix.
…ator

Generate queries like SELECT COUNT(*), (SELECT COUNT(*) FROM t2
WHERE t2.col = t1.col) FROM t1 WHERE ... in the differential
testing simulator. This exercises the code path where correlated
scalar subqueries with COUNT(*) must return 0 (not NULL) when
no outer rows match.

- Add count_star(), subquery(), qualified_column() helpers to Predicate
- Add Select::count_with_correlated_subquery() factory
- 15% chance branch in Select::arbitrary() when 2+ tables exist
- Fix dependencies() to track inner tables of subquery expressions
- Update COVERAGE.md for COUNT(*) and correlated scalar subqueries
@bendechrai bendechrai marked this pull request as draft March 3, 2026 16:37
collect_select_table_names was extracting just qname.name (e.g.
"tablename") but the simulator tracks attached-database tables
with the full prefix (e.g. "aux0.tablename"). Reconstruct the
full name from QualifiedName.db_name + QualifiedName.name.
- qualified_column() now handles "aux0.table" names by emitting
  DoublyQualified(db, table, column) instead of Qualified
- Disable correlated subquery generation (set to 0%) because it
  surfaces a pre-existing cursor bug ("cursor id 1 is None") in
  correlated subquery execution after DDL on referenced tables.
  The model code (count_star, subquery, count_with_correlated_subquery)
  is ready to enable once that bug is fixed.
- Use WHERE TRUE to avoid constant-false-condition optimizer path
@codspeed-hq
Copy link
Copy Markdown

codspeed-hq bot commented Mar 3, 2026

Merging this PR will degrade performance by 15.5%

❌ 1 regressed benchmark
✅ 278 untouched benchmarks
⏩ 105 skipped benchmarks1

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Mode Benchmark BASE HEAD Efficiency
Simulation ltrim_with_pattern 1.2 µs 1.4 µs -15.5%

Comparing bendechrai:fix/correlated-subquery-ungrouped-aggregate (92895ad) with main (35ffd30)2

Open in CodSpeed

Footnotes

  1. 105 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

  2. No successful run was found on main (adb9f4b) during the generation of this report, so 35ffd30 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

@LeMikaelF
Copy link
Copy Markdown
Collaborator

I'm looking at the EXPLAINs for this, SQLite, and main, and there's something fishy with this fix, there's 3 AggFinal opcodes and the program is almost twice as long as SQLite's for the following query:

select count(*), (select count(*) where t.a) from t;

I see that you marked this as a draft again, so I'll let you work on it and review it again when you mark it ready again.

@bendechrai bendechrai closed this Mar 26, 2026
@bendechrai bendechrai deleted the fix/correlated-subquery-ungrouped-aggregate branch March 26, 2026 04:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants