fix: resolve .env from CWD and stop _do_deploy env mutation by deanq · Pull Request #236 · runpod/flash

deanq · 2026-02-28T18:20:00Z

Summary

Fix .env resolution: dotenv_values() and load_dotenv() now use find_dotenv(usecwd=True) to walk up from CWD (user's project) instead of from the package source location. Editable installs were loading the library's dev .env (PYTHONPATH=src, FLASH_HOST=localhost) instead of the user's project .env (HF_TOKEN, RUNPOD_API_KEY, etc).
Fix config drift from _do_deploy: Runtime env var injections (RUNPOD_API_KEY, FLASH_MODULE_PATH, FLASH_ENDPOINT_TYPE) now go into self.template.env via a new _inject_template_env() helper instead of mutating self.env. Since env is a hashed field, the old mutation caused false config drift on every subsequent deploy, triggering unnecessary rolling releases.

Test plan

7 new tests for _inject_template_env (adds KV pair, idempotent, no-op with None template, initializes empty env list, non-mutation of self.env, API key injection into template, LB module path + endpoint type injection)
Updated 2 existing dotenv tests to match new import pattern
All 1266 tests pass via make quality-check
Coverage at 73.95% (above 65% threshold)
Manual: flash run --auto-provision on flash-examples/03_mixed_workers provisions with correct env vars, no rolling release on first request

Copilot

Pull request overview

This PR fixes environment variable loading and deployment-time env injection to ensure .env files are resolved from the user’s working directory and to prevent config drift caused by mutating hashed configuration fields during deploy.

Changes:

Resolve .env from CWD via find_dotenv(usecwd=True) for both load_dotenv() and dotenv_values().
Stop mutating self.env during _do_deploy; inject runtime env vars into self.template.env via a new _inject_template_env() helper.
Add/adjust unit tests to cover the new injection behavior and updated dotenv import/call patterns.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
`src/runpod_flash/__init__.py`	Loads `.env` via `find_dotenv(usecwd=True)` before other imports.
`src/runpod_flash/core/resources/environment.py`	Reads `.env` values from a CWD-upwards search instead of package-relative resolution.
`src/runpod_flash/core/resources/serverless.py`	Adds `_inject_template_env()` and switches deploy-time injections to template env to avoid env hash drift.
`src/runpod_flash/core/resources/load_balancer_sls_resource.py`	Injects `FLASH_ENDPOINT_TYPE` into template env instead of mutating `self.env`.
`tests/unit/test_dotenv_loading.py`	Updates assertions to match new dotenv import/call pattern.
`tests/unit/resources/test_serverless.py`	Adds tests validating template env injection and non-mutation of `self.env` during deploy.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

src/runpod_flash/core/resources/serverless.py

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

src/runpod_flash/core/resources/serverless.py

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

src/runpod_flash/core/resources/serverless.py

runpod-Henrik

1. Root cause fix — correct

Both bugs addressed cleanly:

find_dotenv(usecwd=True) in __init__.py and environment.py fixes editable-install .env resolution. The old call with no arguments resolved relative to the package source, which meant dev installs loaded the library's own .env (PYTHONPATH=src, FLASH_HOST=localhost) instead of the user's project.
Moving runtime var injection from self.env → self.template.env via _inject_template_env() stops the false drift loop. self.env stays unchanged across deploys, so the hash is stable.

2. `FLASH_ENDPOINT_TYPE` removal from LB deploy

The deletion of self.env["FLASH_ENDPOINT_TYPE"] = "lb" in load_balancer_sls_resource.py has a comment explaining the intent — resource_provisioner sets it for flash deploy, and it must NOT be set for flash run (would trigger artifact unpacking that doesn't exist for live endpoints). This is the correct fix, but it's worth confirming the resource_provisioner path is exercised by a test so this split behavior is locked in.

3. Question: `has_explicit_template_env` misses one case

has_explicit_template_env = (
    not new_config.env and new_config.template.env is not None
)
skip_env = env_unchanged and not has_explicit_template_env

The condition only checks not new_config.env (env is falsy). If a user has env={"HF_TOKEN": "x"} (truthy) AND explicit template.env entries, has_explicit_template_env is False even when template.env has content. If env_unchanged=True in that case, skip_env=True → the explicit template.env entries are silently dropped from the update payload.

This is a narrow edge case (requires explicitly constructing template.env outside the normal path), but if it's possible, the condition should be new_config.template.env is not None without the not new_config.env guard.

4. Testing

_inject_template_env, _inject_runtime_template_vars, and _build_template_update_payload(skip_env=...) all have unit tests (524 lines added). Coverage at the helper level is solid.

One gap: no integration test for the update() path that confirms skip_env=True is actually triggered when env_unchanged=True — the existing test_update_calls_save_template_with_resolved_template_id tests that update_template is called, but not that it omits env when unchanged. The behavior relies on the helper tests + code inspection, not an end-to-end flow test through update().

Verdict

PASS with one question to confirm (item 3 — template.env drop edge case) and a suggested test addition (item 4). The core fixes for both bugs are correct.

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

src/runpod_flash/core/resources/serverless.py

dotenv_values() and load_dotenv() without arguments use find_dotenv() which walks up from the calling file's directory. For editable installs, this resolves the library's dev .env (PYTHONPATH=src, FLASH_HOST=localhost) instead of the user's project .env (HF_TOKEN, RUNPOD_API_KEY, etc). Pass find_dotenv(usecwd=True) so the search starts from CWD (the user's project directory) in both __init__.py and environment.py.

…rift _do_deploy injected runtime env vars (RUNPOD_API_KEY, FLASH_MODULE_PATH, FLASH_ENDPOINT_TYPE) directly into self.env, which is a hashed field. This caused config drift detection on every subsequent deploy, triggering unnecessary rolling releases. Add _inject_template_env() helper that appends KeyValuePairs to self.template.env instead. Runtime injections now go into the template (which is excluded from hashing) while self.env stays clean for drift detection.

The old self.env["FLASH_ENDPOINT_TYPE"] = "lb" was dead code — env is in _input_only, so model_dump excluded it from the API payload. The refactor to _inject_template_env made it actually reach the worker, which triggered is_flash_deployment() -> maybe_unpack() -> artifact not found error for flash run (live serverless) endpoints. For flash deploy, the runtime resource_provisioner already sets FLASH_ENDPOINT_TYPE=lb. This injection point is not needed.

When updating an endpoint, the saveTemplate mutation previously always sent the user's env vars, which overwrote platform-injected vars like PORT and PORT_HEALTH on LB endpoints. This triggered unnecessary rolling releases. Now _build_template_update_payload accepts skip_env; update() compares old vs new env and omits env from the template payload when unchanged, letting the platform preserve its injected vars.

The comment said env vars were excluded from the hash, but they are included. The exclude set only contains RUNTIME_FIELDS and EXCLUDED_HASH_FIELDS (id), not env.

…v overwrite Extract _inject_runtime_template_vars() from _do_deploy so both initial deploy and update() paths inject RUNPOD_API_KEY and FLASH_MODULE_PATH into template.env. Without this, runtime vars set during _do_deploy were silently dropped when update() overwrote the template env on config drift. Also preserve explicit template.env entries when env dict is empty on both sides.

PodTemplate.env defaults to [] (not None), so `is not None` was always True when new_config.env was empty — forcing skip_env=False on every update and triggering unnecessary rolling releases. Use __pydantic_fields_set__ to detect whether env was explicitly provided.

…ic_fields_set__ __pydantic_fields_set__ is not part of Pydantic's public API and may change across versions. Use the public model_fields_set property (Pydantic v2) for stable explicit-field detection.

runpod-Henrik

Delta review — since 2026-03-12

What's been addressed:

1. has_explicit_template_env logic — Fixed correctly with model_fields_set. My original concern was that the not new_config.env gate would miss the case where env={"A":"1"} is unchanged but template.env is explicitly re-set. The new code uses getattr(new_config.template, "model_fields_set", set()) and checks "env" in template_fields_set, which correctly distinguishes PodTemplate(env=[...]) (explicit, in fields_set) from PodTemplate() (default empty list, not in fields_set). The __pydantic_fields_set__ → public model_fields_set rename in the last commit is also correct.

2. update() test coverage gap — Five new tests added covering the exact scenarios I flagged: env unchanged skips payload, env changed includes payload, runtime vars injected on change, runtime vars skipped when unchanged, explicit template.env with empty env sends env. Gap is closed.

Still open:

My medium-confidence question about FLASH_ENDPOINT_TYPE for LB endpoints on env-changing redeployments remains unanswered in the diff. The comment says "the runtime resource_provisioner sets it" — but resource_provisioner.py runs inside the worker for auto-provisioning, not during flash deploy for directly-deployed LB endpoints. There's a test_flash_endpoint_type_not_injected test confirming FLASH_ENDPOINT_TYPE is NOT injected, but no test confirming LB endpoints work correctly after an env-changing redeploy without it in the template.

If deanq's review resolved this (e.g. confirmed saveTemplate merges rather than replaces, or confirmed resource_provisioner does set it for direct deploys), this is a non-issue. Otherwise worth a clarifying comment before merge.

Overall: The two substantive concerns from the first review have been addressed. The PR is in good shape — still PASS WITH NITS on the FLASH_ENDPOINT_TYPE question as the only remaining unknown.

🤖 Reviewed by Henrik's AI-Powered Bug Finder

deanq

Addressing the remaining question on FLASH_ENDPOINT_TYPE:

resource_provisioner.create_resource_from_manifest() is called from cli/utils/deployment.py during flash deploy (lines 108, 246, 270) — not only inside the worker. So for directly-deployed LB endpoints, FLASH_ENDPOINT_TYPE=lb is set by create_resource_from_manifest (resource_provisioner.py:79) before the resource is passed to ResourceManager for provisioning.

The comment in load_balancer_sls_resource.py:258 is accurate: _do_deploy() does not inject it because resource_provisioner already has. The flash run path (live serverless) intentionally skips it since there are no build artifacts to unpack.

Call chain for flash deploy:

deploy.py → deployment.py:108 → create_resource_from_manifest()
  → resource_provisioner.py:79 sets env["FLASH_ENDPOINT_TYPE"] = "lb"
  → resource passed to ResourceManager → _do_deploy()

deanq requested a review from Copilot February 28, 2026 18:22

Copilot started reviewing on behalf of deanq February 28, 2026 18:22 View session

Copilot AI reviewed Feb 28, 2026

View reviewed changes

src/runpod_flash/core/resources/serverless.py Show resolved Hide resolved

deanq requested a review from Copilot February 28, 2026 21:11

Copilot started reviewing on behalf of deanq February 28, 2026 21:11 View session

Copilot AI reviewed Feb 28, 2026

View reviewed changes

src/runpod_flash/core/resources/serverless.py Show resolved Hide resolved

src/runpod_flash/core/resources/serverless.py Show resolved Hide resolved

deanq requested a review from Copilot March 2, 2026 06:32

Copilot started reviewing on behalf of deanq March 2, 2026 06:33 View session

Copilot AI reviewed Mar 2, 2026

View reviewed changes

src/runpod_flash/core/resources/serverless.py Outdated Show resolved Hide resolved

deanq force-pushed the deanq/ae-1549-env-vars-from-cwd-first branch from c5916e6 to 405f046 Compare March 3, 2026 07:24

jhcipar approved these changes Mar 3, 2026

View reviewed changes

deanq force-pushed the deanq/ae-1549-env-vars-from-cwd-first branch 2 times, most recently from 081d553 to 788808e Compare March 10, 2026 02:52

deanq force-pushed the deanq/ae-1549-env-vars-from-cwd-first branch from 788808e to e69e1c3 Compare March 12, 2026 19:12

runpod-Henrik reviewed Mar 12, 2026

View reviewed changes

deanq force-pushed the deanq/ae-1549-env-vars-from-cwd-first branch 2 times, most recently from 58c57de to 014231e Compare March 18, 2026 23:23

deanq requested review from Copilot and runpod-Henrik March 19, 2026 17:08

Copilot started reviewing on behalf of deanq March 19, 2026 17:08 View session

Copilot AI reviewed Mar 19, 2026

View reviewed changes

src/runpod_flash/core/resources/serverless.py Outdated Show resolved Hide resolved

deanq added 7 commits March 19, 2026 11:02

fix(serverless): correct stale inline comment in config_hash

1fc2eb0

The comment said env vars were excluded from the hash, but they are included. The exclude set only contains RUNTIME_FIELDS and EXCLUDED_HASH_FIELDS (id), not env.

deanq force-pushed the deanq/ae-1549-env-vars-from-cwd-first branch from 014231e to 37d4f00 Compare March 19, 2026 18:04

fix(review): use public model_fields_set instead of internal __pydant…

1166bbe

…ic_fields_set__ __pydantic_fields_set__ is not part of Pydantic's public API and may change across versions. Use the public model_fields_set property (Pydantic v2) for stable explicit-field detection.

runpod-Henrik reviewed Mar 19, 2026

View reviewed changes

deanq commented Mar 19, 2026

View reviewed changes

deanq merged commit 583efae into main Mar 20, 2026
4 checks passed

deanq deleted the deanq/ae-1549-env-vars-from-cwd-first branch March 20, 2026 00:12

runpod-release-please-bot bot mentioned this pull request Mar 20, 2026

chore: release 1.10.1 #278

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: resolve .env from CWD and stop _do_deploy env mutation#236

fix: resolve .env from CWD and stop _do_deploy env mutation#236
deanq merged 8 commits intomainfrom
deanq/ae-1549-env-vars-from-cwd-first

deanq commented Feb 28, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

runpod-Henrik left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

runpod-Henrik left a comment

Uh oh!

deanq left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

deanq commented Feb 28, 2026

Summary

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

runpod-Henrik left a comment

Choose a reason for hiding this comment

1. Root cause fix — correct

2. FLASH_ENDPOINT_TYPE removal from LB deploy

3. Question: has_explicit_template_env misses one case

4. Testing

Verdict

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

runpod-Henrik left a comment

Choose a reason for hiding this comment

Uh oh!

deanq left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

2. `FLASH_ENDPOINT_TYPE` removal from LB deploy

3. Question: `has_explicit_template_env` misses one case