SNOW-3176017: Fix accidental removal of aliases in certain JOIN statements by sfc-gh-joshi · Pull Request #4096 · snowflakedb/snowpark-python

sfc-gh-joshi · 2026-02-27T23:21:13Z

Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

Fixes SNOW-3176017
Fill out the following pre-review checklist:
- I am adding a new automated test(s) to verify correctness of my new code
  - If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
- I am adding new logging messages
- I am adding a new telemetry message
- I am adding new credentials
- I am adding a new dependency
- If this is a new feature/behavior, I'm adding the Local Testing parity changes.
- I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
- If adding any arguments to public Snowpark APIs or creating new public Snowpark APIs, I acknowledge that I have ensured my changes include AST support. Follow the link for more information: AST Support Guidelines
Please describe how your code solves the related issue.

Reverts #4095, restoring the optimizations made in SNOW-2895675.

The original optimization, which replaced a layer of SELECT "A" AS "A", "B" AS "B" with SELECT * in certain join operations, caused a test failure in SnowML's CI pipeline. A minimal version of the test is reproduced in this PR in test_query_generator.py::test_disambiguate_skips_quoted_alias, and is as follows:

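# df1 comes from a file read (ReadFile), df2 from in-memory data
df1 = session.read.parquet(stage_filename)
df2 = session.create_dataframe(data, schema=["ID", "A", "B"])
# Join on ID, then select two columns that come from the file-read side
df_res = df1.join(df2, on=["ID"])[['"COL_0"', '"COL_1"']]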

This would generate SQL like the following:

bad SQL

SELECT 
    "COL_0", 
    "COL_1"
 FROM (
 SELECT  * 
 FROM (
(
 SELECT $1 AS "ID", $2 AS """COL_0""", $3 AS """COL_1""" FROM  VALUES (0 :: INT, 1 :: INT, 2 :: INT), (3 :: INT, 4 :: INT, 5 :: INT
)
) AS SNOWPARK_LEFT 
INNER JOIN 
(
 SELECT 
    "ID", 
    "A", 
    "B"
 FROM (
 SELECT $1 AS "ID", $2 AS "A", $3 AS "B" FROM  VALUES (0 :: INT, 1 :: INT, 2 :: INT), (3 :: INT, 4 :: INT, 5 :: INT)
)
) AS SNOWPARK_RIGHT
 USING (ID)
)
)

The file read operation was producing aliases with triple-quoted column names. In SQL, a doubled quote inside a quoted identifier is an escaped literal quote, so """COL_0""" names a column whose name itself carries one layer of double quotes. These aliases were silently dropped by the join optimization, so the top-level references to COL_0 and COL_1 became invalid.
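To make the quoting concrete, here is a small sketch of how a double-quoted SQL identifier maps to a column name. The helper identifier_to_name is hypothetical (it is not part of Snowpark) and only assumes the standard rule that a doubled quote inside a quoted identifier stands for one literal quote character.

```python
def identifier_to_name(identifier: str) -> str:
    """Return the column name denoted by a double-quoted SQL identifier.

    Hypothetical helper for illustration only; not a Snowpark API.
    """
    if len(identifier) < 2 or not (
        identifier.startswith('"') and identifier.endswith('"')
    ):
        raise ValueError(f"expected a double-quoted identifier, got {identifier!r}")
    # Drop the delimiting quotes, then collapse each escaped "" into one ".
    return identifier[1:-1].replace('""', '"')


# The identifier "COL_0" names the column COL_0 ...
plain = identifier_to_name('"COL_0"')
# ... while """COL_0""" names a column whose name includes the quote characters.
quoted = identifier_to_name('"""COL_0"""')
```

Under that rule, the outer reference "COL_0" in the bad SQL looks for a column named COL_0, while the inner SELECT * only passes through a column whose name still includes the quotes.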

When _disambiguate is called during join operations, the new optimization skipped alias generation if the left and right sides did not share any column names. However, _alias_if_needed also strips quotes from the identifiers it processes, including triple-quoted ones. Even though there were no common column names in this case, aliasing was still necessary to rename """COL_0""" to "COL_0".
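The gap can be sketched with a simplified model. This is not the Snowpark implementation, and it does not claim to be the fix applied in this PR: normalize, shortcut_skips_aliasing, and aliasing_changes_names are hypothetical names, and the column lists are assumed to hold only the non-key columns of each side.

```python
def normalize(identifier: str) -> str:
    # Hypothetical stand-in for the renaming described above:
    # the identifier """COL_0""" becomes "COL_0"; anything else is unchanged.
    if (
        len(identifier) >= 6
        and identifier.startswith('"""')
        and identifier.endswith('"""')
    ):
        return identifier[2:-2]
    return identifier


def shortcut_skips_aliasing(lhs_cols, rhs_cols) -> bool:
    # The optimization as described: skip aliasing when no names are shared.
    return not (set(lhs_cols) & set(rhs_cols))


def aliasing_changes_names(cols) -> bool:
    # True when the alias layer would have renamed at least one column.
    return any(normalize(c) != c for c in cols)


# Assumed non-key columns from the example join (ID is the join key).
lhs_cols = ['"""COL_0"""', '"""COL_1"""']
rhs_cols = ['"A"', '"B"']

skipped = shortcut_skips_aliasing(lhs_cols, rhs_cols)
renames_lost = aliasing_changes_names(lhs_cols)
```

In this model both flags are true for the example: the shortcut fires because nothing is shared, yet the skipped alias layer would have renamed the left-hand columns.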

This error path is triggered only by ReadFile operations, which more or less directly generate their own SQL SELECT/COPY statements with aliasing inside. The join operation then implicitly normalized triple-quoted names like """COL_0""" to "COL_0" (whether this behavior is itself correct should be investigated separately).

sfc-gh-yuwang · 2026-02-28T00:03:51Z

some merge gates are still failing

src/snowflake/snowpark/dataframe.py

sfc-gh-aling · 2026-03-10T18:51:35Z

Can you help run the ML test job to verify that the change won't cause any identifier-related compilation errors?

sfc-gh-joshi · 2026-03-11T20:21:19Z

> Can you help run the ML test job to verify that the change won't cause any identifier-related compilation errors?

The affected test in the SnowML job appears to be failing for database permission reasons, but I manually verified that the issue is no longer present when running the test locally.

graphite-app · 2026-03-20T22:47:51Z

src/snowflake/snowpark/dataframe.py

+    # We use the session of the LHS DataFrame to report this telemetry
+    lhs._session._conn._telemetry_client.send_alias_in_join_telemetry()


Telemetry is now sent unconditionally for all joins, even when there are no common column names requiring aliasing. Previously (line 333), this was conditional on if common_col_names:. This will cause telemetry spam for joins that don't actually need aliasing.

Fix: Restore the conditional check:

if common_col_names:
    # We use the session of the LHS DataFrame to report this telemetry
    lhs._session._conn._telemetry_client.send_alias_in_join_telemetry()

Suggested change

-    # We use the session of the LHS DataFrame to report this telemetry
-    lhs._session._conn._telemetry_client.send_alias_in_join_telemetry()
+    if common_col_names:
+        # We use the session of the LHS DataFrame to report this telemetry
+        lhs._session._conn._telemetry_client.send_alias_in_join_telemetry()
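The effect of the guard can be illustrated with a small stand-in. FakeTelemetryClient and report_join are hypothetical names used only for this sketch; the real client is reached through lhs._session._conn._telemetry_client, and its internals are not modeled here.

```python
class FakeTelemetryClient:
    """Counts calls; a hypothetical stand-in for the real telemetry client."""

    def __init__(self) -> None:
        self.alias_in_join_events = 0

    def send_alias_in_join_telemetry(self) -> None:
        self.alias_in_join_events += 1


def report_join(client: FakeTelemetryClient, common_col_names) -> None:
    # Send the event only when the join has shared column names to alias.
    if common_col_names:
        client.send_alias_in_join_telemetry()


client = FakeTelemetryClient()
report_join(client, [])        # no shared names: nothing is sent
report_join(client, ['"A"'])   # one shared name: one event is sent
```

Without the guard, both calls would record an event, which is the extra telemetry volume the comment warns about.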

sfc-gh-joshi added 2 commits February 27, 2026 14:59

Revert "NO-SNOW: Revert "SNOW-2895675: Skip aliases when source/desti…

e73a75e

…nation column are identical (#4095)" This reverts commit 5453991.

fix de-aliasing

206b173

sfc-gh-joshi requested review from a team as code owners February 27, 2026 23:21

sfc-gh-joshi requested review from sfc-gh-jrose, sfc-gh-mayliu and sfc-gh-yuwang February 27, 2026 23:21

github-actions bot added the local testing label Feb 27, 2026

sfc-gh-joshi and others added 4 commits February 27, 2026 16:41

normalize db/schema path

7f5fdf3

Merge branch 'main' into joshi-SNOW-3176017-fix-join-dealiasing

556dd1f

Merge branch 'main' into joshi-SNOW-3176017-fix-join-dealiasing

79c1f74

Merge branch 'main' into joshi-SNOW-3176017-fix-join-dealiasing

2eabf58

sfc-gh-aling reviewed Mar 10, 2026

src/snowflake/snowpark/dataframe.py

sfc-gh-aling approved these changes Mar 10, 2026

Merge branch 'main' into joshi-SNOW-3176017-fix-join-dealiasing

56f1506

sfc-gh-aalam approved these changes Mar 18, 2026

sfc-gh-helmeleegy approved these changes Mar 19, 2026

sfc-gh-joshi added 2 commits March 20, 2026 15:41

Merge remote-tracking branch 'origin/main' into joshi-SNOW-3176017-fi…

3865def

…x-join-dealiasing

double fix changelog

cae7f55

graphite-app bot reviewed Mar 20, 2026

sfc-gh-joshi wants to merge 9 commits into main from joshi-SNOW-3176017-fix-join-dealiasing

5 participants
