Skip to content

fix(CubeAPI): propagate create-time env vars to envd-backed commands#566

Open
zyl1121 wants to merge 1 commit into
TencentCloud:masterfrom
zyl1121:fix/cubeapi-envd-init
Open

fix(CubeAPI): propagate create-time env vars to envd-backed commands#566
zyl1121 wants to merge 1 commit into
TencentCloud:masterfrom
zyl1121:fix/cubeapi-envd-init

Conversation

@zyl1121

@zyl1121 zyl1121 commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Motivation

For envd-backed command execution, create-time environment variables should be propagated from CubeAPI into envd.

On the current CubeAPI sandbox creation path, this propagation is incomplete:

  • the E2B SDK sends create-time environment variables as envs, while CubeAPI only accepted envVars
  • after CubeMaster creates the sandbox, CubeAPI does not initialize envd with those create-time environment variables

As a result, envd-backed execution such as commands.run cannot see environment variables passed at sandbox creation time.

This PR does not introduce a generic sandbox startup-time environment refresh model across all runtime paths.

What this PR changes

  • accept envs as an alias of envVars in NewSandbox, so both SDK entrypoints are supported

  • reject invalid or dangerous create-time env var names before sandbox creation

  • after CubeMaster successfully creates a sandbox, initialize envd with the create-time environment variable map through CubeProxy path-based routing:

    POST /sandbox/{sandbox_id}/49983/init
    
  • derive the CubeProxy base URL from existing runtime configuration instead of introducing a new configuration item

  • fail sandbox creation if envd initialization fails after CubeMaster creates the sandbox, and clean up the created sandbox to avoid leaving a partial result

  • add focused tests for alias deserialization, CubeProxy URL derivation, path-based envd /init routing, fail-fast failure handling, and post-create cleanup

Scope

This PR restores create-time environment variables for envd-backed execution, such as commands.run.

It does not change run_code. run_code goes through the bundled lightweight code interpreter instead of envd-backed process execution, and that interpreter does not currently read envd /envs.

It also does not claim generic startup-time env refresh support for every sandbox runtime path.

Testing

Unit tests:

cargo test --manifest-path CubeAPI/Cargo.toml services::sandboxes::tests -- --nocapture
cargo test --manifest-path CubeAPI/Cargo.toml models::tests::new_sandbox_accepts_e2b_envs_alias -- --nocapture

Cluster validation:

from e2b_code_interpreter import Sandbox

with Sandbox.create(
    template="tpl-xxxx",
    envs={"CUBE_TEST_CREATE": "from-e2b-create"},
) as sandbox:
    print("commands.run:")
    print(sandbox.commands.run("echo $CUBE_TEST_CREATE").stdout)

    print("run_code:")
    result = sandbox.run_code(
        "import os; print(os.environ.get('CUBE_TEST_CREATE', '<NOT SET>'))"
    )
    print(result.logs.stdout)

Before this PR, on current origin/master:

commands.run:


run_code:
['<NOT SET>\n']

After this PR:

commands.run:
from-e2b-create

run_code:
['<NOT SET>\n']
  • verified the E2B SDK envs path
  • verified the CubeSandbox SDK native envVars path
  • verified that commands.run sees create-time env vars after sandbox creation
  • verified that sandbox creation fails fast if create-time env vars cannot be injected into envd
  • verified that envd init failure triggers sandbox cleanup
  • verified that per-call command env vars override sandbox-level env vars
  • verified that after a per-call override, the next commands.run falls back to the sandbox-level env vars
  • verified that pause / resume preserves the same commands.run env visibility in the current lifecycle model
  • verified that run_code remains <NOT SET> on the current runtime image

Compatibility

This change is backward compatible:

  • existing envVars requests continue to work
  • E2B SDK envs requests now work
  • sandbox creation now fails if requested create-time env vars cannot be injected into envd
  • per-call command environment variable precedence is unchanged

For envd-backed command execution, precedence remains:

sandbox-level env vars from create-time init < per-call command env vars

Documentation

No repository documentation was updated in this PR.

The behavior change is intentionally narrow: CubeAPI now propagates create-time environment variables to envd for envd-backed command execution and treats failed envd initialization as a create failure.

The remaining run_code behavior is tracked in the linked issue and requires runtime/image-side support in the bundled lightweight code interpreter.

Related issue

Refs #565

return Ok(());
};
let url = build_envd_init_url(base_url, sandbox_id);
let resp = self

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing HTTP timeout on envd init call – The shared reqwest::Client in state.rs (line 46-50) is built without .timeout(), which means reqwest::Client defaults to no timeout. If the envd proxy is unresponsive, this POST will hang indefinitely, blocking the sandbox creation endpoint for that handler. Since this call is best-effort (failure is logged and swallowed), it should have a tight timeout so it fails fast.

Suggested change
let resp = self
let resp = self
.http_client
.post(url)
.json(&EnvdInitRequest { env_vars })
.timeout(std::time::Duration::from_secs(10))
.send()
.await
.map_err(|e| AppError::Internal(anyhow::anyhow!("envd init request failed: {}", e)))?;

Comment thread CubeAPI/src/services/sandboxes.rs Outdated
let url = build_envd_init_url(base_url, sandbox_id);
let resp = self
.http_client
.post(url)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing auth header on envd init request – Every other call to the sandbox proxy (e.g. run_envd_command in agenthub.rs:2162) sends Authorization: Basic cm9vdDo=. This new init_sandbox_env_vars method sends no auth header. If the downstream /init endpoint enforces the same proxy-level auth, this will silently fail on every request (logged at warn, caller sees success). If it does not enforce auth, this is an inconsistency that may widen the attack surface.

Suggested change
.post(url)
.http_client
.post(url)
.header("Authorization", "Basic cm9vdDo=")
.json(&EnvdInitRequest { env_vars })


async fn init_sandbox_env_vars(
&self,
sandbox_id: &str,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silent skip when env vars are provided but no proxy URL is configured – When sandbox_proxy_base_url is None and env_vars is non-empty, the init call is skipped silently. The caller (create_sandbox) only logs a warning when the HTTP call itself fails, not when it's skipped. This makes it hard for operators to debug why client-provided env vars are being silently ignored. Consider adding a tracing::warn! here when env_vars is non-empty but the proxy URL is unset.

env_vars: &'a HashMap<String, String>,
}

pub(crate) fn default_sandbox_proxy_base_url() -> Option<String> {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pub(crate) function should have a doc commentdefault_sandbox_proxy_base_url is visible outside this module (called from services/mod.rs:104) but has no doc comment. It reads three environment variables with a fallback chain. A doc comment explaining the env vars (AGENTHUB_SANDBOX_PROXY_URL, CUBE_SANDBOX_NODE_IP, CUBE_PROXY_HTTP_PORT), their meanings, and the fallback priority would help future maintainers.

Comment thread CubeAPI/src/services/sandboxes.rs Outdated
Path((sandbox_id, port)): Path<(String, u16)>,
Json(body): Json<Value>,
) -> StatusCode {
assert_eq!(sandbox_id, "sb-123");

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assertions inside spawned test handler are invisible on failure – These assert_eq! calls run inside a tokio::spawn-ed axum handler. If they fail (say from a code change that alters the URL path), the panic is caught by tokio and logged to stderr, but does not cause the test to fail. The test would silently pass because the handler returning an error response triggers the best-effort fallback. Consider returning the received values in the response and asserting from the caller side, or using a tokio::sync::oneshot channel to propagate assertion failures.

@zyl1121 zyl1121 changed the title fix(CubeAPI): initialize envd for create-time env vars fix(CubeAPI): propagate create-time env vars to envd-backed commands Jun 15, 2026
@kinwin-ustc

Copy link
Copy Markdown
Collaborator

Are you doing the same thing as PR #554?

@zyl1121

zyl1121 commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

Are you doing the same thing as PR #554?

Thanks for pointing this out. It is related to #554, but the fix is applied at a different layer.

#554 calls envd /init from the CubeSandbox SDKs after Sandbox.create(env_vars=...) succeeds. This PR moves that propagation into CubeAPI itself: CubeAPI accepts both envs and envVars, and initializes envd after CubeMaster creates the sandbox.

The main reason is compatibility with clients that do not use the CubeSandbox SDK, especially E2B SDK users, since the E2B SDK sends envs instead of envVars. With this PR, any client creating sandboxes through CubeAPI gets the same create-time env propagation for envd-backed command execution, without implementing the /init call client-side.

const RET_CODE_HTTP_OK: i32 = 200;
const RET_CODE_NOT_FOUND: i32 = 130404;
const RET_CODE_CONFLICT: i32 = 130409;
const ENVD_PORT: u16 = 49983;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security: This hardcoded credential (Basic cm9vdDo= → root with empty password) is shared across all deployments and transmitted over plain HTTP when the proxy URL is derived from CUBE_SANDBOX_NODE_IP. Consider making this configurable via an environment variable (with the current value as a fallback default), and document that production deployments should use per-deployment secrets over HTTPS.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR intentionally reuses the existing sandbox-proxy auth behavior rather than introducing a new auth contract.

The same Authorization: Basic cm9vdDo= value is already used by the existing envd proxy path in run_envd_command (CubeAPI/src/handlers/agenthub.rs). In this PR I only aligned the new create-time /init path with that existing server-side behavior.

I agree that making the proxy auth configurable would be a worthwhile follow-up, but I would prefer to keep that as a separate change because it affects the broader CubeAPI -> sandbox proxy contract rather than this narrow env propagation fix.

return Ok(());
};
let url = build_envd_init_url(base_url, sandbox_id);
let resp = self

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code quality: Wrapping the error with format!("...{}", e) calls Display on the underlying error, which often produces a shorter message than the full chain. Consider using .context("envd init request failed") from the anyhow crate (if AppError::Internal accepts anyhow::Error) to preserve the full error chain for debugging.

@@ -469,6 +497,98 @@
limit,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security (input validation): The env_vars map accepts arbitrary key/value pairs with no size limits. An API client could send thousands of entries or large values, potentially causing memory exhaustion. Consider capping the number of entries and maximum key/value length, either via #[validate] attributes or manual validation in create_sandbox.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I agree that request-size limits are worth considering.

For this PR, I kept the scope limited to restoring the verified CubeAPI -> envd propagation path for create-time envs. Adding entry-count / key-length / value-length limits would introduce a broader API validation contract for NewSandbox, which likely deserves separate discussion so the limits can be applied consistently across callers and documented clearly.

I would prefer to keep this PR focused on the propagation fix and handle request-size validation as a follow-up if maintainers think that should be enforced at the API layer.

derive_sandbox_proxy_base_url(
agenthub_proxy.as_deref(),
node_ip.as_deref(),
proxy_port.as_deref(),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security / Robustness: CUBE_SANDBOX_NODE_IP is used directly in URL construction without validating it's a valid IP address or hostname. Consider validating the value (e.g., parse as IPv4) before using it in URL construction, and reject malformed values instead of silently constructing a potentially unsafe URL.

@zyl1121 zyl1121 force-pushed the fix/cubeapi-envd-init branch from 34e765f to c466139 Compare June 15, 2026 07:52
Comment thread CubeAPI/src/services/sandboxes.rs Outdated
const RET_CODE_CONFLICT: i32 = 130409;
const ENVD_PORT: u16 = 49983;
const ENVD_INIT_TIMEOUT_SECS: u64 = 10;
const SANDBOX_PROXY_BASIC_AUTH: &str = "Basic cm9vdDo=";

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoded weak Basic auth credential — This decodes to root: (username root, empty password). The credential is identical across all deployments, non-rotatable without a code change, and trivially recoverable from source/binary. Anyone with network access to the CubeProxy endpoint can replay it to inject arbitrary env vars into any sandbox.

Suggestion: Promote this to a runtime configuration value (e.g., env var CUBE_PROXY_AUTH_TOKEN) loaded at service startup instead of hardcoding it. At minimum, use a non-empty password and document that operators must override it.

Ref: CWE-798

Comment on lines +181 to +189
if let Some(env_vars) = env_vars.as_ref() {
if let Err(err) = self.init_sandbox_env_vars(&resp.sandbox_id, env_vars).await {
tracing::warn!(
sandbox_id = %resp.sandbox_id,
error = %err,
"envd init after sandbox create failed"
);
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sequential envd init delays sandbox creation response — The init_sandbox_env_vars call is awaited synchronously before returning the sandbox response to the caller. Since this is best-effort (errors are swallowed), the client still waits for envd init latency — up to the 10-second timeout on failure. This inflates tail latency for every create_sandbox even though the sandbox is already usable.

Suggestion: Spawn envd init as a detached background task so the response returns immediately after CubeMaster confirms creation. SandboxService already derives Clone and reqwest::Client is Arc-backed, so cloning is cheap.

let proxy_port = proxy_port
.map(str::trim)
.filter(|value| !value.is_empty())
.unwrap_or("80");

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silent default port 80 masks misconfiguration — When CUBE_PROXY_HTTP_PORT is unset, the code silently defaults to port 80. If the proxy listens on 8080 (a common alternative), envd init silently fails and — because it's best-effort — the sandbox still creates without env vars being applied. The failure surfaces only as a generic HTTP error.

Suggestion: Return None from derive_sandbox_proxy_base_url when CUBE_SANDBOX_NODE_IP is set but CUBE_PROXY_HTTP_PORT is absent, letting the missing configuration be logged as a warning rather than silently assuming port 80.

Comment on lines +590 to 597
fn build_envd_init_url(base_url: &str, sandbox_id: &str) -> String {
format!(
"{}/sandbox/{}/{}/init",
base_url.trim_end_matches('/'),
sandbox_id,
ENVD_PORT
)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing URL path-segment encodingsandbox_id is interpolated directly into the URL path without validation or encoding. The codebase has a validate_path_segment helper (in cubemaster/mod.rs:567) used elsewhere that is not applied here. While the current sandbox_id is server-generated with a constrained character set, inconsistent validation creates a maintenance hazard.

Suggestion: Call validate_path_segment before constructing the URL, or add a comment documenting why it's unnecessary (e.g., the ID is guaranteed UUID-like).

Comment thread CubeAPI/src/models/mod.rs

#[serde(rename = "envVars", skip_serializing_if = "Option::is_none")]
#[serde(
alias = "envs",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

envs alias not documented on the struct — The #[serde(alias = "envs")] attribute is tested but the NewSandbox struct-level doc comment only mentions envVars. API consumers reading the source won't know that envs (used by the E2B SDK) is also accepted.

Suggestion: Update the struct doc comment to mention that envs is an accepted alias for envVars. Also consider documenting the behavior when both envs and envVars are present (serde_json last-writer-wins for aliased fields).

@cubesandboxbot

cubesandboxbot Bot commented Jun 15, 2026

Copy link
Copy Markdown

Overview

This PR adds envd-based environment variable injection during sandbox creation, with input validation, E2B SDK compatibility (envs alias), and a proxy URL resolution mechanism. The code is well-structured with strong test coverage, but there are a few security and maintainability concerns worth addressing.


RED Medium Severity

  1. Envd error response body leaked to API caller (sandboxes.rs:584-588)
    The full envd response body is included verbatim in the error returned to the caller. If envd returns internal paths or stack traces, these are leaked. See inline comment.

  2. Missing env var value validation (sandboxes.rs:596-624)
    Only env var names are validated. Values have no length limit. Consider adding a value length cap.

  3. FORBIDDEN_ENV_NAMES has gaps (sandboxes.rs:37-59)
    Missing dangerous variables: PERL5OPT, SHELLOPTS, BASHOPTS, PYTHONSTARTUP, TMPDIR, TMP, TEMP, HOME. See inline comment.


YELLOW Low-Medium Severity

  1. Partial-move pattern (sandboxes.rs:154-208)
    distribution_scope is left in the destructured body variable after partial move. See inline comment.

  2. Error type loss (sandboxes.rs:558-559)
    CubeMasterError is flattened via to_string(), losing structured info. See inline comment.

  3. spawn_server helper duplicated (sandboxes.rs:1395,1485)
    The spawn_server function is identical across two tests. Extract into a shared utility.

  4. AGENTHUB_SANDBOX_PROXY_URL not validated (sandboxes.rs:669)
    Proxy URL accepted without URL parsing or scheme validation.

  5. Sequential HTTP call on critical path (sandboxes.rs:226-238)
    Envd init adds extra RTT (up to 10s timeout) to every sandbox creation with env vars.


GREEN Minor

  1. Doc comment wording (sandboxes.rs:634)
    If this function is ever extended - this is a struct, not a function. See inline comment.

Overall: Clean, well-tested PR with good security. The information disclosure fix and value validation are the most actionable items.

@zyl1121 zyl1121 force-pushed the fix/cubeapi-envd-init branch from 851ec0d to c7f0071 Compare June 15, 2026 08:18
Comment thread CubeAPI/src/services/sandboxes.rs Outdated
Comment thread CubeAPI/src/services/sandboxes.rs
Comment thread CubeAPI/src/services/sandboxes.rs
Comment thread CubeAPI/src/services/sandboxes.rs
@zyl1121 zyl1121 force-pushed the fix/cubeapi-envd-init branch from c7f0071 to 7afe5f1 Compare June 15, 2026 09:30
Comment thread CubeAPI/src/services/sandboxes.rs Outdated
Comment thread CubeAPI/src/services/sandboxes.rs
Comment thread CubeAPI/src/services/sandboxes.rs
Comment thread CubeAPI/src/services/sandboxes.rs Outdated
Comment thread CubeAPI/src/services/sandboxes.rs
Comment thread CubeAPI/src/services/sandboxes.rs
Comment thread CubeAPI/src/services/sandboxes.rs Outdated
let url = build_envd_init_url(base_url, sandbox_id);

let mut last_err: Option<anyhow::Error> = None;
for attempt in 0..ENVD_INIT_MAX_ATTEMPTS {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should avoid retry here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should avoid retry here.

I agree this is worth reconsidering.

For context, my original design was best-effort and did not retry. The retry was added later in a follow-up commit by @fslongjin while consolidating parts of #554, mainly to avoid false failures when envd is not ready immediately.

I think the trade-off is reliability vs. create-path efficiency. It may be better to discuss the expected behavior and overall design first.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should avoid retry here.

I agree this is worth reconsidering.

For context, my original design was best-effort and did not retry. The retry was added later in a follow-up commit by @fslongjin while consolidating parts of #554, mainly to avoid false failures when envd is not ready immediately.

I think the trade-off is reliability vs. create-path efficiency. It may be better to discuss the expected behavior and overall design first.

I thought about it, maybe it's more reasonable to change this to fail-fast — errors shouldn't be silently swallowed.

@kinwin-ustc

Copy link
Copy Markdown
Collaborator

Can you squash it into one commit?

@zyl1121 zyl1121 force-pushed the fix/cubeapi-envd-init branch 3 times, most recently from 7afe5f1 to 2b485fa Compare June 23, 2026 06:31
@zyl1121 zyl1121 requested a review from lisongqian as a code owner June 23, 2026 06:31
@zyl1121

zyl1121 commented Jun 23, 2026

Copy link
Copy Markdown
Contributor Author

Can you squash it into one commit?

@kinwin-ustc Done. I rebased the branch onto the latest master and squashed the PR into a single commit.

I thought about it, maybe it's more reasonable to change this to fail-fast, errors shouldn't be silently swallowed.

I updated the envd init behavior to fail fast with cleanup based on the discussion with @fslongjin. The PR description has also been updated to match the final behavior.

The current behavior is: when create-time env vars are provided for envd-backed commands, envd init is treated as part of successful sandbox creation. If envd init fails after CubeMaster creates the sandbox, CubeAPI cleans up the created sandbox and returns an error instead of retrying or ignoring the failure.

@zyl1121 zyl1121 force-pushed the fix/cubeapi-envd-init branch from 2b485fa to 68c848a Compare June 25, 2026 05:15
Comment thread CubeAPI/src/services/sandboxes.rs
Comment thread CubeAPI/src/services/sandboxes.rs
Comment thread CubeAPI/src/services/sandboxes.rs
Comment thread CubeAPI/src/services/sandboxes.rs
Comment thread CubeAPI/src/services/sandboxes.rs Outdated
Comment thread CubeAPI/src/services/sandboxes.rs
Comment thread CubeAPI/src/services/sandboxes.rs
Comment thread CubeAPI/src/services/sandboxes.rs Outdated
Comment thread CubeAPI/src/services/sandboxes.rs Outdated
Accept the E2B envs alias on sandbox creation requests and validate
create-time env var names before creating the sandbox.

After CubeMaster creates the sandbox, initialize envd through CubeProxy
using the configured sandbox proxy base URL. If envd init fails, clean
up the created sandbox and return an error instead of retrying or
silently swallowing the failure.

Signed-off-by: zhengyilei <zheng_yilei@qq.com>
Co-authored-by: jinlong <jinlong@tencent.com>
@zyl1121 zyl1121 force-pushed the fix/cubeapi-envd-init branch from 629828b to bd9f786 Compare June 25, 2026 06:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants