Skip to content

feat(duckdb)!: support transpilation of function LEFT from Snowflake …#7419

Open
fivetran-ashashankar wants to merge 7 commits intomainfrom
RD-1147725-transpile-LEFT
Open

feat(duckdb)!: support transpilation of function LEFT from Snowflake …#7419
fivetran-ashashankar wants to merge 7 commits intomainfrom
RD-1147725-transpile-LEFT

Conversation

@fivetran-ashashankar
Copy link
Copy Markdown
Collaborator

…to DuckDB

  • Add left_sql() method to handle BINARY/BLOB arguments
  • Convert BLOB to HEX string, multiply length by 2, apply LEFT, convert back with UNHEX
  • For VARCHAR arguments, use native DuckDB LEFT function

@geooo109
Copy link
Copy Markdown
Collaborator

geooo109 commented Mar 31, 2026

@fivetran-ashashankar Let's also add a check on the query, if the second argument (length) is negative we have to return an empty string/blob.

memory D SELECT LEFT('HELLO', -2);
┌─────────────────────┐
│ "left"('HELLO', -2) │
│       varchar       │
├─────────────────────┤
│ HEL                 │
└─────────────────────┘
memory D SELECT RIGHT('HELLO', -2);
┌──────────────────────┐
│ "right"('HELLO', -2) │
│       varchar        │
├──────────────────────┤
│ LLO                  │

This ^ isn't covered for both RIGHT and LEFT. Snowflake for negative length doesn't return a substring both for string and blob inputs.

So, in this approach #7419 (comment) we can also add a check for the length argument.

The check should be added on top of the generated query. As we have already discussed we have to cover cases like the following:

WITH t AS (SELECT -2 AS l) SELECT LEFT(TO_BINARY('SNOWIKOPN', 'utf-8'), l) FROM t;

We don't know the value of l at generation time.

@github-actions
Copy link
Copy Markdown
Contributor

SQLGlot Integration Test Results

Comparing:

  • this branch (sqlglot:RD-1147725-transpile-LEFT, sqlglot version: RD-1147725-transpile-LEFT)
  • baseline (main, sqlglot version: 0.0.1.dev1)

By Dialect

dialect main sqlglot:RD-1147725-transpile-LEFT transitions links
bigquery -> bigquery 23872/23877 passed (100.0%) 21252/21252 passed (100.0%) No change full result / delta
bigquery -> duckdb 2105/2624 passed (80.2%) 0/0 passed (0.0%) No change full result / delta
snowflake -> duckdb 1511/2674 passed (56.5%) 1511/2674 passed (56.5%) No change full result / delta
snowflake -> snowflake 65910/65910 passed (100.0%) 65910/65910 passed (100.0%) No change full result / delta
databricks -> databricks 1364/1364 passed (100.0%) 1364/1364 passed (100.0%) No change full result / delta
postgres -> postgres 6040/6040 passed (100.0%) 6040/6040 passed (100.0%) No change full result / delta
redshift -> redshift 7059/7059 passed (100.0%) 7059/7059 passed (100.0%) No change full result / delta

Overall

main: 109548 total, 107861 passed (pass rate: 98.5%), sqlglot version: 0.0.1.dev1

sqlglot:RD-1147725-transpile-LEFT: 108302 total, 107139 passed (pass rate: 98.9%), sqlglot version: RD-1147725-transpile-LEFT

Transitions:
No change

✅ 1 test(s) passed

@fivetran-ashashankar
Copy link
Copy Markdown
Collaborator Author

@fivetran-ashashankar Let's also add a check on the query, if the second argument (length) is negative we have to return an empty string/blob.

memory D SELECT LEFT('HELLO', -2);
┌─────────────────────┐
│ "left"('HELLO', -2) │
│       varchar       │
├─────────────────────┤
│ HEL                 │
└─────────────────────┘
memory D SELECT RIGHT('HELLO', -2);
┌──────────────────────┐
│ "right"('HELLO', -2) │
│       varchar        │
├──────────────────────┤
│ LLO                  │

This ^ isn't covered for both RIGHT and LEFT. Snowflake for negative length doesn't return a substring both for string and blob inputs.

So, in this approach #7419 (comment) we can also add a check for the length argument.

The check should be added on top of the generated query. As we have already discussed we have to cover cases like the following:

WITH t AS (SELECT -2 AS l) SELECT LEFT(TO_BINARY('SNOWIKOPN', 'utf-8'), l) FROM t;

We don't know the value of l at generation time.
added runtime checks for negative length values:
For VARCHAR: returns '' when length < 0
For BLOB: returns UNHEX('') when length < 0
Optimized to skip CASE for positive literals (e.g., LEFT('HELLO', 2) generates clean LEFT('HELLO', 2))
Adds CASE for all non-literals (columns, functions, expressions) since we can't determine the value at transpile time

WITH t AS (SELECT -2 AS l) SELECT LEFT(TO_BINARY('SNOWIKOPN', 'utf-8'), l) FROM t; transpiles to WITH t AS (SELECT -2 AS l) SELECT CASE WHEN l < 0 THEN UNHEX('') ELSE UNHEX(LEFT(HEX(ENCODE('SNOWIKOPN')), l * 2)) END FROM t;

identified l as a column, adds case and handles the blob convertion

…to DuckDB

- Add left_sql() method to handle BINARY/BLOB arguments
- Convert BLOB to HEX string, multiply length by 2, apply LEFT, convert back with UNHEX
- For VARCHAR arguments, use native DuckDB LEFT function
@fivetran-ashashankar fivetran-ashashankar force-pushed the RD-1147725-transpile-LEFT branch from 9a7b596 to b700525 Compare April 1, 2026 17:06
Comment on lines +3296 to +3330
def _left_right_sql(self, expression: exp.Left | exp.Right, func_name: str) -> str:
arg = expression.this
length = expression.expression

# For BINARY/BLOB: DuckDB doesn't support RIGHT on BLOB
# Convert to HEX string, use RIGHT, then convert back to BLOB
# Only add negative length handling for Snowflake-originated queries
needs_case = expression.args.get("negative_length_returns_empty")

# For BINARY/BLOB: DuckDB doesn't support LEFT/RIGHT on BLOB
# Convert to HEX string, use LEFT/RIGHT, then convert back to BLOB
if _is_binary(arg):
# RIGHT(blob, n) becomes UNHEX(RIGHT(HEX(blob), n * 2))
# LEFT/RIGHT(blob, n) becomes UNHEX(LEFT/RIGHT(HEX(blob), n * 2))
# Each byte becomes 2 hex chars, so multiply length by 2
hex_arg = exp.Hex(this=arg)
hex_length = exp.Mul(this=length, expression=exp.Literal.number(2))
# since this exp.Right is not annotated, it won't enter this _is_binary branch during the recursive call
hex_right = self.func("RIGHT", hex_arg, hex_length)
result = exp.Unhex(this=hex_right)
result = exp.Unhex(this=self.func(func_name, hex_arg, hex_length))

# Handle negative length: return empty blob
if needs_case:
empty_blob = exp.Unhex(this=exp.Literal.string(""))
case_expr = (
exp.case().when(length < exp.Literal.number(0), empty_blob).else_(result)
)
return self.sql(case_expr)
return self.sql(result)

# For VARCHAR: Use native RIGHT function
return self.func("RIGHT", arg, length)
# For VARCHAR: Use native LEFT/RIGHT function
# Handle negative length: return empty string
if needs_case:
func_expr = exp.func(func_name, arg, length)
empty_string = exp.Literal.string("")
case_expr = (
exp.case().when(length < exp.Literal.number(0), empty_string).else_(func_expr)
)
return self.sql(case_expr)
return self.func(func_name, arg, length)
Copy link
Copy Markdown
Collaborator

@geooo109 geooo109 Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's minimize this to something like:

def _left_right_sql(self, expression: exp.Left | exp.Right, func_name: str) -> str:                                                                                                        
    arg = expression.this                                                                                                                                                                  
    length = expression.expression                                                                                                                                                         
    is_binary = _is_binary(arg)                                                                                                                                                            
                                                                                                                                                                                            
    if is_binary:
        hex_arg = exp.Hex(this=arg)
        # LEFT/RIGHT(blob, n) becomes UNHEX(LEFT/RIGHT(HEX(blob), n * 2))
        # Each byte becomes 2 hex chars, so multiply length by 2
        hex_length = exp.Mul(this=length, expression=exp.Literal.number(2))
        result = exp.Unhex(this=self.func(func_name, hex_arg, hex_length))                                                                                                                 
    else:
        result = exp.func(func_name, arg, length)                                                                                                                                          
                                                                                                                                                                                            
    if expression.args.get("negative_length_returns_empty"):
        empty = exp.Literal.string("")       
        if is_binary: 
           empty = exp.Unhex(this=empty)                                                                                 
        result = exp.case().when(length < exp.Literal.number(0), empty).else_(result)
                                                                                                                                                                                            
    return self.sql(result)

   - Add DuckDB roundtrip tests for LEFT/RIGHT with literals and columns
   - Consolidate and expand Snowflake LEFT/RIGHT transpilation tests
   - Test both VARCHAR and BINARY data types
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants