Skip to content

Fix streamlit deploy failure when pages/*.py overlaps with pages/ directory#2780

Open
sfc-gh-moczko wants to merge 3 commits intomainfrom
fix/2741-streamlit-deploy-pages-collision
Open

Fix streamlit deploy failure when pages/*.py overlaps with pages/ directory#2780
sfc-gh-moczko wants to merge 3 commits intomainfrom
fix/2741-streamlit-deploy-pages-collision

Conversation

@sfc-gh-moczko
Copy link
Collaborator

@sfc-gh-moczko sfc-gh-moczko commented Feb 24, 2026

Summary

Fixes #2741.

snow streamlit deploy fails with NotInDeployRootError when both pages/ (directory) and pages/*.py (glob) appear in artifacts and a pages/ directory exists on disk.

Root cause

_ArtifactPathMap.put() walks directory sources to mark children in _dest_is_dir, but does not register those children in __dest_to_src. When a glob like pages/*.py later resolves to the same files, __dest_to_src.get(dest) returns None instead of the existing source, so the duplicate mapping is silently added. During symlink_or_copy(), the second mapping calls Path.resolve() on an already-created symlink, which follows it back to the project source directory -- outside the deploy root -- and raises NotInDeployRootError.

Fix

_ArtifactPathMap.put() (bundle_map.py) -- Three changes in the put() method:

  1. Directory walk now registers children in __dest_to_src: Each file discovered during os.walk is recorded as __dest_to_src[child_dest] = child_src. If a different source already maps to that destination, TooManyFilesError is raised immediately.
  2. File-to-file branch returns early on same-source duplicates: When current_source == src, the mapping is redundant (already covered by a parent directory walk), so put() returns without adding a duplicate entry.
  3. Directory-to-directory branch handles glob-then-directory order: When individual files from a glob (e.g. pages/*.py) are added first and then the parent directory (pages/) is added, the directory destination is already marked in _dest_is_dir. Instead of raising TooManyFilesError, we now verify all children of the new directory source are consistent with existing mappings and skip the duplicate if they are.

convert_streamlit_to_v2_data() (definition_conversion.py) -- Defense-in-depth:

Adds _is_path_covered_by_directory() helper. During V1-to-V2 definition conversion, additional_source_files entries like pages/*.py are skipped if they fall under a directory already included as an artifact (e.g. pages/). This prevents the overlapping mappings from reaching _ArtifactPathMap in the first place.

Pre-review checklist

  • I've confirmed that instructions included in README.md are still correct after my changes in the codebase.
  • I've added or updated automated unit tests to verify correctness of my new code.
  • I've added or updated integration tests to verify correctness of my new code.
  • I've confirmed that my changes are working by executing CLI's commands manually on MacOS.
  • I've confirmed that my changes are working by executing CLI's commands manually on Windows.
  • I've confirmed that my changes are up-to-date with the target branch.
  • I've described my changes in the release notes.
  • I've described my changes in the section below.
  • I've described my changes in the documentation.

Test plan

  • test_bundle_map_deduplicates_directory_and_glob_overlap -- directory first, then explicit file; verifies duplicate is silently deduplicated
  • test_bundle_map_deduplicates_glob_and_directory_overlap -- glob/files first, then directory; verifies reverse order also deduplicates
  • test_bundle_map_disallows_different_source_collision_with_directory_child -- different source, same dest; verifies TooManyFilesError is raised
  • test_bundle_deduplicates_pages_directory_and_glob -- end-to-end StreamlitEntity.bundle() with artifacts in dir-then-glob order
  • test_bundle_deduplicates_pages_glob_and_directory -- end-to-end with reversed glob-then-dir order
  • test_v1_to_v2_streamlit_conversion_deduplicates_pages -- V1-to-V2 conversion filters overlapping pages/*.py
  • test_v1_to_v2_streamlit_conversion_keeps_non_overlapping_additional_files -- non-overlapping files preserved

@sfc-gh-moczko sfc-gh-moczko requested a review from a team as a code owner February 24, 2026 01:14
@sfc-gh-moczko sfc-gh-moczko force-pushed the fix/2741-streamlit-deploy-pages-collision branch from 5ee660a to 26c478a Compare February 24, 2026 05:21
@sfc-gh-moczko sfc-gh-moczko force-pushed the fix/2741-streamlit-deploy-pages-collision branch 2 times, most recently from 35039f3 to a4780e7 Compare February 26, 2026 15:33
@sfc-gh-kolszewski sfc-gh-kolszewski self-assigned this Feb 27, 2026
if stage_path.is_user_stage():
path = StagePath.get_user_stage() / file["name"]
else:
elif stage_path.is_git_repo():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you included entire #2777 in this PR

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebased onto main — the #2777 commit is now dropped from this branch.

@sfc-gh-kolszewski
Copy link
Contributor

Please, do not remove Pre-review checklist from the PR description and fill it

Pre-review checklist

  • I've confirmed that instructions included in README.md are still correct after my changes in the codebase.
  • I've added or updated automated unit tests to verify correctness of my new code.
  • I've added or updated integration tests to verify correctness of my new code.
  • I've confirmed that my changes are working by executing CLI's commands manually on MacOS.
  • I've confirmed that my changes are working by executing CLI's commands manually on Windows.
  • I've confirmed that my changes are up-to-date with the target branch.
  • I've described my changes in the release notes.
  • I've described my changes in the section below.
  • I've described my changes in the documentation.

type="streamlit",
identifier="test_streamlit",
main_file="streamlit_app.py",
artifacts=["streamlit_app.py", "pages/", "pages/*.py"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done some testing and this fails with:

│ Multiple file or directories were mapped to one output destination.                                                                                                                                                                                                                                                                                                          │
│
│ destination = pages    

if the order is swapped to ["streamlit_app.py", "pages/*.py", "pages/"]. I think we should make the behaviour consistent so that the order of artifacts does not matter

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — the directory-to-directory branch in _ArtifactPathMap.put() now handles the case where individual files from a glob were registered first. Instead of immediately raising TooManyFilesError, it checks whether all children of the new directory source are consistent with existing mappings and skips the duplicate if they match. Added test_bundle_map_deduplicates_glob_and_directory_overlap and test_bundle_deduplicates_pages_glob_and_directory to cover both orderings.

Copy link
Contributor

@sfc-gh-kolszewski sfc-gh-kolszewski Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sfc-gh-moczko Hm, I don't see any changes compared to the last time I reviewed and these two new tests are missing - did you push the changes?

Copy link
Collaborator Author

@sfc-gh-moczko sfc-gh-moczko Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sfc-gh-kolszewski Good catch! I've just pushed the changes (commit adbcc79). The implementation is now complete with both tests you mentioned:

Tests added:

  1. test_bundle_map_deduplicates_glob_and_directory_overlap (tests/api/artifacts/test_bundle_map.py:484)

    • Unit test for glob-first ordering (file added before directory)
  2. test_bundle_deduplicates_pages_glob_and_directory (tests/streamlit/test_streamlit_entity.py:102)

    • End-to-end test with artifacts=["streamlit_app.py", "pages/*.py", "pages/"]
    • Tests the exact scenario you reported

Fix details:

The issue was in _ArtifactPathMap.put() (bundle_map.py:447-465). When individual files are added via glob before
their parent directory, _update_dest_is_dir() recursively marks the parent as a directory. The old code immediately
raised TooManyFilesError when the directory was later added.

The fix walks the directory to verify all children either:

  • Are not yet mapped, OR
  • Map from the same child source

If consistent, it skips the duplicate via early return. If any child maps from a different source, it raises
TooManyFilesError as before.

Result: Both orderings now work correctly:

  • ["pages/", "pages/*.py"] — directory-first ✓ (existing code path)
  • ["pages/*.py", "pages/"] — glob-first ✓ (new code path)

Let me know if you'd like me to clarify anything!

@sfc-gh-moczko
Copy link
Collaborator Author

Done — added the pre-review checklist to the PR description.

@sfc-gh-moczko sfc-gh-moczko force-pushed the fix/2741-streamlit-deploy-pages-collision branch from a4780e7 to 27e803a Compare March 3, 2026 09:35
@sfc-gh-moczko sfc-gh-moczko force-pushed the fix/2741-streamlit-deploy-pages-collision branch from adbcc79 to f799315 Compare March 6, 2026 22:10
Maciej Oczko and others added 2 commits March 18, 2026 17:43
…ectory (#2741)

When both `pages/` and `pages/*.py` appear in artifacts, the bundle map
created duplicate destination mappings that caused `NotInDeployRootError`
during symlink resolution. Fix by tracking directory-walked children in
`__dest_to_src` to detect and deduplicate overlapping mappings, and by
filtering redundant `additional_source_files` during V1-to-V2 definition
conversion.

.... Generated with [Cortex Code](https://docs.snowflake.com/en/user-guide/cortex-code/cortex-code)

Co-Authored-By: Cortex Code <noreply@snowflake.com>
When individual files are added via glob (e.g., pages/*.py) before
their parent directory (pages/), the directory destination is already
marked in _dest_is_dir due to recursive parent marking in
_update_dest_is_dir(). The previous code immediately raised
TooManyFilesError, even though the mappings were from the same source
and thus not a conflict.

Changes:
- _ArtifactPathMap.put(): Replace immediate error with consistency
  check when current_is_dir=True. Walk the directory to verify all
  children either (a) are not yet mapped, or (b) map from the same
  child source. If consistent, skip duplicate via early return. If
  any child maps from a different source, raise TooManyFilesError.

- test_bundle_map_deduplicates_glob_and_directory_overlap: Unit test
  for glob-first ordering (adds file, then directory).

- test_bundle_deduplicates_pages_glob_and_directory: End-to-end test
  with artifacts=["streamlit_app.py", "pages/*.py", "pages/"] to
  verify StreamlitEntity.bundle() handles both orderings.

Addresses reviewer feedback: artifact order is now consistent.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@sfc-gh-moczko sfc-gh-moczko force-pushed the fix/2741-streamlit-deploy-pages-collision branch from f799315 to cec4b50 Compare March 18, 2026 17:59
…appings

The previous fix walked the new source directory to detect conflicts, but
missed cases where existing dest files came from sources outside the new
source directory. Reverse the check: iterate existing dest mappings and
verify each is consistent with what the new source would produce.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Contributor

@sfc-gh-jwilkowski sfc-gh-jwilkowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution! Please address one issue with local imports and add a note to RELEASE_NOTES.md about this change

Comment on lines +791 to +794
from snowflake.cli.api.project.definition_conversion import (
convert_streamlit_to_v2_data,
)
from snowflake.cli.api.project.schemas.v1.streamlit.streamlit import Streamlit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are those local imports needed? Please move them to the top of the file if not

@sfc-gh-jwilkowski sfc-gh-jwilkowski self-assigned this Mar 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SNOW-3015632: Streamlit deploy fails when pages/*.py is in artifacts and pages/ directory exists

3 participants