Commit f73da4d

Add Reproducer for Issues with LEFT joins on Fixed Size Binary Columns (#19800)
## Which issue does this PR close?

Adds a reproducer for #19067 and closes #19067.

## Rationale for this change

The bug has been fixed in arrow-rs (apache/arrow-rs#8981). To ensure this case is covered in the tests, we add a reproducer.

## What changes are included in this PR?

- SLT test case exhibiting the issue.

DISCLAIMER: First version was generated using AI from the original reproducer and improved by me. Happy to incorporate further suggestions for improvements.

## Are these changes tested?

Yes. I've also ensured that the test case exhibits the issue on `branch-51`.

Diff when running the test on `branch-51`:

```
[Diff] (-expected|+actual)
    1 aaaaaaaa aaaaaaaa 1000
    2 bbbbbbbb bbbbbbbb 2000
-   3 cccccccc NULL NULL
+   3 cccccccc aaaaaaaa NULL
```

## Are there any user-facing changes?

No
1 parent c4f039f commit f73da4d

1 file changed: +70 -0 lines changed


datafusion/sqllogictest/test_files/joins.slt

Lines changed: 70 additions & 0 deletions
@@ -5197,3 +5197,73 @@ DROP TABLE t1_c;
 
 statement ok
 DROP TABLE t2_c;
+
+# Reproducer of https://github.com/apache/datafusion/issues/19067
+statement count 0
+set datafusion.explain.physical_plan_only = true;
+
+# Setup Left Table with FixedSizeBinary(4)
+statement count 0
+CREATE TABLE issue_19067_left AS
+SELECT
+column1 as id,
+arrow_cast(decode(column2, 'hex'), 'FixedSizeBinary(4)') as join_key
+FROM (VALUES
+(1, 'AAAAAAAA'),
+(2, 'BBBBBBBB'),
+(3, 'CCCCCCCC')
+);
+
+# Setup Right Table with FixedSizeBinary(4)
+statement count 0
+CREATE TABLE issue_19067_right AS
+SELECT
+arrow_cast(decode(column1, 'hex'), 'FixedSizeBinary(4)') as join_key,
+column2 as value
+FROM (VALUES
+('AAAAAAAA', 1000),
+('BBBBBBBB', 2000)
+);
+
+# Perform Left Join. Third row should contain NULL in `right_key`.
+query I??I
+SELECT
+l.id,
+l.join_key as left_key,
+r.join_key as right_key,
+r.value
+FROM issue_19067_left l
+LEFT JOIN issue_19067_right r ON l.join_key = r.join_key
+ORDER BY l.id;
+----
+1 aaaaaaaa aaaaaaaa 1000
+2 bbbbbbbb bbbbbbbb 2000
+3 cccccccc NULL NULL
+
+# Ensure usage of HashJoinExec
+query TT
+EXPLAIN
+SELECT
+l.id,
+l.join_key as left_key,
+r.join_key as right_key,
+r.value
+FROM issue_19067_left l
+LEFT JOIN issue_19067_right r ON l.join_key = r.join_key
+ORDER BY l.id;
+----
+physical_plan
+01)SortExec: expr=[id@0 ASC NULLS LAST], preserve_partitioning=[false]
+02)--ProjectionExec: expr=[id@2 as id, join_key@3 as left_key, join_key@0 as right_key, value@1 as value]
+03)----HashJoinExec: mode=CollectLeft, join_type=Right, on=[(join_key@0, join_key@1)]
+04)------DataSourceExec: partitions=1, partition_sizes=[1]
+05)------DataSourceExec: partitions=1, partition_sizes=[1]
+
+statement count 0
+set datafusion.explain.physical_plan_only = false;
+
+statement count 0
+DROP TABLE issue_19067_left;
+
+statement count 0
+DROP TABLE issue_19067_right;
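
For readers who want to check the expected rows without running DataFusion, here is a minimal Python sketch of the LEFT JOIN semantics this test asserts. It is not DataFusion's or arrow-rs's implementation. The names `LEFT_ROWS`, `RIGHT_ROWS`, `to_key`, and `left_join` are hypothetical, and plain `bytes` values stand in for `FixedSizeBinary(4)`. It only illustrates that an unmatched left row must carry NULL in every right-side column, including `right_key`.

```python
# Hypothetical in-memory stand-ins for issue_19067_left and issue_19067_right.
LEFT_ROWS = [(1, "AAAAAAAA"), (2, "BBBBBBBB"), (3, "CCCCCCCC")]
RIGHT_ROWS = [("AAAAAAAA", 1000), ("BBBBBBBB", 2000)]


def to_key(hex_text):
    """Decode a hex string into a 4-byte key (assumed analogue of FixedSizeBinary(4))."""
    key = bytes.fromhex(hex_text)
    if len(key) != 4:
        raise ValueError(f"expected 4 bytes, got {len(key)}")
    return key


def left_join(left, right):
    """Simple hash-based LEFT JOIN: build a table on the right input, probe with the left."""
    build = {}
    for hex_key, value in right:
        build.setdefault(to_key(hex_key), []).append(value)

    out = []
    for row_id, hex_key in left:
        key = to_key(hex_key)
        matches = build.get(key)
        if matches is None:
            # Unmatched left row: all right-side columns, including the key, are NULL.
            out.append((row_id, key.hex(), None, None))
        else:
            for value in matches:
                out.append((row_id, key.hex(), key.hex(), value))
    # Mirror the ORDER BY l.id in the test query.
    return sorted(out, key=lambda row: row[0])


rows = left_join(LEFT_ROWS, RIGHT_ROWS)
for row in rows:
    print(*("NULL" if value is None else value for value in row))
```

The third printed row reads `3 cccccccc NULL NULL`, matching the expected output in the test. The failing output on `branch-51` instead showed `aaaaaaaa` in the `right_key` column for that row.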
