Skip to content

Fix snow stage copy --recursive for vstage (snow://) paths#2786

Open
sfc-gh-moczko wants to merge 2 commits intomainfrom
fix/vstage-recursive-copy
Open

Fix snow stage copy --recursive for vstage (snow://) paths#2786
sfc-gh-moczko wants to merge 2 commits intomainfrom
fix/vstage-recursive-copy

Conversation

@sfc-gh-moczko
Copy link
Copy Markdown
Collaborator

@sfc-gh-moczko sfc-gh-moczko commented Feb 26, 2026

Summary

Fixes two bugs that caused snow stage copy --recursive to fail when downloading from vstage (snow://) paths, particularly DCM deployment stages like snow://project/.../deployments/DEPLOYMENT$1/.

  • Bug 1 — iter_stage() loses snow:// prefix: When ls returns plain relative paths (e.g., deployments/DEPLOYMENT$1/manifest.yml), build_path() misinterprets them as @-prefixed stage paths, generating invalid SQL (get @deployments/... instead of get 'snow://project/...'). Fixed by adding a vstage-aware branch that reconstructs full paths via stage_path.root_path() / file["name"].

  • Bug 2 — StagePath.relative_to() is case-sensitive: Snowflake normalizes unquoted identifiers to lowercase in ls output (DEPLOYMENT$1 becomes deployment$1), but PurePosixPath.relative_to() is case-sensitive, causing ValueError when computing local directory structure. Fixed with case-insensitive part-by-part comparison.

Files changed

File Change
src/snowflake/cli/_plugins/stage/manager.py 2-line elif branch in iter_stage() for snow:// paths
src/snowflake/cli/api/stage_path.py Replaced relative_to() with case-insensitive implementation
tests/stage/test_stage.py 4 new parametrized test cases across 2 test functions
tests/stage/test_stage_path.py 8 new parametrized test cases across 3 test functions

Test plan

  • New unit tests reproduce both bugs (5 failures before fix, 0 after)
  • Full stage test suite passes (327 passed, 50 snapshots)
  • All pre-commit hooks pass (ruff, black, mypy, codespell)
  • Manual verification against real DCM project on Snowflake (7 files downloaded with correct directory structure)
  • CI passes

@sfc-gh-moczko sfc-gh-moczko requested a review from a team as a code owner February 26, 2026 03:41
@sfc-gh-moczko sfc-gh-moczko force-pushed the fix/vstage-recursive-copy branch from de5191c to 926d6a8 Compare February 26, 2026 15:33
@sfc-gh-moczko sfc-gh-moczko force-pushed the fix/vstage-recursive-copy branch from 926d6a8 to 5407c94 Compare March 18, 2026 17:41
Copy link
Copy Markdown

@sfc-gh-olorek sfc-gh-olorek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left some comments

Comment on lines 570 to 573
elif stage_path.absolute_path().startswith(SNOW_PREFIX):
path = stage_path.root_path() / file["name"]
else:
# Snowflake `ls` returns unqualified names; re-attach the original FQN.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: the else branch comment now only applies to @-stage paths. what do you think about
" For @-stage paths: Snowflake ls returns... "

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. The comment on the else branch now reads: "For @-stage paths: Snowflake ls returns unqualified names (e.g. `stage_name/path/file.sql`); re-attach the original FQN prefix so the result uses the correct qualified stage name."

Comment on lines +300 to +301
for s, o in zip(self_parts[: len(other_parts)], other_parts):
if s.lower() != o.lower():
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unsure here about being case-insensite. Snowflake preservers case sensitivity for quoted identifiers: "Data"/file.txt" should be different than "data"/file.txt". we should handle this case

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. relative_to() now uses per-component case rules:

  • Unquoted identifiers (only [A-Za-z0-9_$]) — Snowflake normalises these to lowercase in ls output, so we compare case-insensitively.
  • Quoted identifiers (any character outside that set, e.g. a dot or space) — Snowflake preserves their case, so we compare exactly.

The detection regex _UNQUOTED_IDENTIFIER_RE = re.compile(r"^[A-Za-z0-9_$]+$") is applied to each path component pair. If either component has characters outside the unquoted charset, we fall through to an exact match. Added test_relative_to_quoted_identifier_success and test_relative_to_raises_for_non_subpath to cover both cases.

path = StagePath.get_user_stage() / file["name"]
elif stage_path.is_git_repo():
path = self.build_path(file["name"])
elif stage_path.absolute_path().startswith(SNOW_PREFIX):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, not a blocker: this can be encapsulated with new is_vstage method on StagePath

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Added is_vstage() method to StagePath and updated iter_stage() to use it (elif stage_path.is_vstage():). Also added parametrised test_is_vstage and test_is_vstage_preserved_through_operations to verify the method returns the correct value through joinpath, parent, and root_path.

sfc-gh-moczko added a commit that referenced this pull request Mar 24, 2026
…e iter_stage comment (#2786)

- Add StagePath.is_vstage() method returning _is_snow_prefixed_stage,
  mirroring is_user_stage() and is_git_repo(); use it in iter_stage()
  instead of leaking the SNOW_PREFIX constant into manager.py.
- Fix relative_to() to apply case-insensitive comparison only for
  unquoted identifier components ([A-Za-z0-9_$]+), which Snowflake
  normalizes to lowercase in ls output. Components containing other
  characters (dots, spaces, etc.) imply quoted identifiers whose case
  Snowflake preserves, so those are compared exactly.
- Update the else-branch comment in iter_stage() to clarify it applies
  to @-stage paths only, now that vstage has its own elif branch.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
sfc-gh-moczko added a commit that referenced this pull request Mar 24, 2026
…e iter_stage comment (#2786)

- Add StagePath.is_vstage() method returning _is_snow_prefixed_stage,
  mirroring is_user_stage() and is_git_repo(); use it in iter_stage()
  instead of leaking the SNOW_PREFIX constant into manager.py.
- Fix relative_to() to apply case-insensitive comparison only for
  unquoted identifier components ([A-Za-z0-9_$]+), which Snowflake
  normalizes to lowercase in ls output. Components containing other
  characters (dots, spaces, etc.) imply quoted identifiers whose case
  Snowflake preserves, so those are compared exactly.
- Update the else-branch comment in iter_stage() to clarify it applies
  to @-stage paths only, now that vstage has its own elif branch.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@sfc-gh-moczko sfc-gh-moczko force-pushed the fix/vstage-recursive-copy branch from 80b5581 to d83dada Compare March 24, 2026 13:21
Maciej Oczko and others added 2 commits March 25, 2026 13:08
Two bugs prevented recursive copy from working with vstage paths
(e.g., `snow://project/.../deployments/DEPLOYMENT$1/`):

1. `iter_stage()` lost the `snow://` prefix when reconstructing file
   paths from `ls` output. Snowflake's `ls` returns plain relative
   paths (e.g., `deployments/DEPLOYMENT$1/manifest.yml`) without the
   `snow://` prefix. The existing code passed these to `build_path()`,
   which treated them as `@`-prefixed stage paths, producing invalid
   SQL like `get @deployments/DEPLOYMENT$1/manifest.yml ...` instead
   of `get 'snow://project/.../manifest.yml' ...`.

   Fix: Add a vstage-aware branch in `iter_stage()` that reconstructs
   full paths via `stage_path.root_path() / file["name"]` when the
   stage path starts with `snow://`.

2. `StagePath.relative_to()` used `PurePosixPath.relative_to()` which
   is case-sensitive. Snowflake normalizes unquoted identifiers to
   lowercase in `ls` output (e.g., `DEPLOYMENT$1` becomes
   `deployment$1`), causing `ValueError` when computing relative paths
   for local directory structure during recursive download.

   Fix: Replace with case-insensitive part-by-part comparison that
   preserves original casing in the returned relative path.

Verified against a real DCM project on Snowflake -- 7 files downloaded
successfully with correct directory structure preserved.

.... Generated with [Cortex Code](https://docs.snowflake.com/en/user-guide/cortex-code/cortex-code)

Co-Authored-By: Cortex Code <noreply@snowflake.com>
…e iter_stage comment (#2786)

- Add StagePath.is_vstage() method returning _is_snow_prefixed_stage,
  mirroring is_user_stage() and is_git_repo(); use it in iter_stage()
  instead of leaking the SNOW_PREFIX constant into manager.py.
- Fix relative_to() to apply case-insensitive comparison only for
  unquoted identifier components ([A-Za-z0-9_$]+), which Snowflake
  normalizes to lowercase in ls output. Components containing other
  characters (dots, spaces, etc.) imply quoted identifiers whose case
  Snowflake preserves, so those are compared exactly.
- Update the else-branch comment in iter_stage() to clarify it applies
  to @-stage paths only, now that vstage has its own elif branch.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@sfc-gh-moczko sfc-gh-moczko force-pushed the fix/vstage-recursive-copy branch from d83dada to 53115f9 Compare March 25, 2026 14:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants