Commit 74ea9ba

fix: add runtime version banner, log after unpack for accuracy (#71)
* feat(worker): add version logging after unpack for accurate reporting

  Log worker, runpod-flash, and runpod versions at boot. Banner is emitted after maybe_unpack() so the bundled runpod_flash (from flash build) is on sys.path, reporting the actual deployed version rather than the base image's pip-installed version.

  - Add version.py with __version__ constant and package version helpers
  - Read flash version from bundled runpod_flash.__version__ first
  - Fall back to importlib.metadata for non-bundled environments
  - Move version banner to after unpack in both QB and LB handlers

* refactor(version): read worker version from pyproject.toml instead of hardcoded constant

  Worker version is now read from the co-located pyproject.toml via regex, eliminating manual version maintenance. Falls back to importlib.metadata if pyproject.toml is unavailable.

* fix(version): use __version__ constant instead of pyproject.toml reading

  The pyproject.toml-reading approach fails because maybe_unpack() extracts the user's flash project to /app/, overwriting the worker's pyproject.toml with the user's (version 0.1.0). Use a release-please-managed __version__ constant instead.

* perf(ci): remove 56-min GPU LB build from PR checks

  - Remove docker-test-lb job (pytorch base, build-only, no tests)
  - Update docker-validation gate to 4 check jobs
  - Replace jlumbroso/free-disk-space with targeted rm -rf in prod jobs

  docker-test-lb-cpu already validates LB code (~2 min) and docker-test validates the pytorch base image. GPU LB image still builds on release via docker-prod-lb.

* perf(ci): optimize PR docker jobs and add LB handler test

  Remove unnecessary steps from PR docker jobs that only apply to prod cross-platform builds: lint dependency (validation gate already checks), QEMU (amd64-only), uv setup (Dockerfiles handle deps), disk cleanup (CPU images are ~150MB vs 14GB free), and full git history fetch. Add LB handler test to docker-test-lb-cpu for regression coverage.

* fix(ci): use system python in LB handler test inside Docker

  test-lb-handler.sh unconditionally used `uv run`, which creates a fresh venv and re-downloads 117 packages inside Docker, consuming the entire 30s timeout before the server can start. Apply the same Docker detection pattern used in test-handler.sh: use system python directly when running inside a container where packages are pre-installed.

* fix(ci): use /ping endpoint in LB handler test

  The test script polled /health for readiness but lb_handler.py only exposes /ping. Server started fine but the check always timed out.

* chore(deps): update dependencies

  boto3 1.42.57->1.42.58, ruff 0.15.3->0.15.4, runpod-flash 1.4.1->1.4.2
1 parent 002b787 commit 74ea9ba

9 files changed

+321 -483 lines changed

.github/workflows/ci.yml

Lines changed: 13 additions & 138 deletions
@@ -27,8 +27,6 @@ jobs:
     steps:
       - name: Checkout repository
         uses: actions/checkout@v4
-        with:
-          fetch-depth: 0

       - name: Set up Python ${{ matrix.python-version }}
         uses: actions/setup-python@v5
@@ -75,38 +73,13 @@ jobs:
   docker-test:
     runs-on: ubuntu-latest
     if: github.event_name != 'pull_request' || github.head_ref != 'release-please--branches--main'
-    needs: [lint]
     steps:
-      - name: Clear Space
-        # Note: Manual cleanup sufficient for CPU images (python:3.12-slim base ~150MB)
-        # Only CUDA images need aggressive cleanup due to their ~10-15GB size
-        run: |
-          rm -rf /usr/share/dotnet
-          rm -rf /opt/ghc
-          rm -rf "/usr/local/share/boost"
-          rm -rf "$AGENT_TOOLSDIRECTORY"
-          docker system prune -af
-          df -h
-
       - name: Checkout repository
         uses: actions/checkout@v4
-        with:
-          fetch-depth: 0
-
-      - name: Set up QEMU
-        uses: docker/setup-qemu-action@v3

       - name: Set up Docker Buildx
         uses: docker/setup-buildx-action@v3

-      - name: Set up uv
-        uses: astral-sh/setup-uv@v4
-        with:
-          enable-cache: true
-
-      - name: Setup dependencies
-        run: uv sync
-
       - name: Build CPU Docker image
         uses: docker/build-push-action@v6
         with:
@@ -124,92 +97,16 @@ jobs:
           echo "Testing CPU handler in Docker environment..."
           docker run --rm flash-cpu:test ./test-handler.sh

-  docker-test-lb:
-    runs-on: ubuntu-latest
-    if: github.event_name != 'pull_request' || github.head_ref != 'release-please--branches--main'
-    needs: [lint]
-    steps:
-      - name: Free Disk Space
-        uses: jlumbroso/free-disk-space@v1.3.1
-        with:
-          tool-cache: true
-          android: true
-          dotnet: true
-          haskell: true
-          large-packages: true
-          swap-storage: true
-
-      - name: Additional cleanup and report
-        run: |
-          docker system prune -af
-          df -h
-
-      - name: Checkout repository
-        uses: actions/checkout@v4
-        with:
-          fetch-depth: 0
-
-      - name: Set up QEMU
-        uses: docker/setup-qemu-action@v3
-
-      - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@v3
-
-      - name: Set up uv
-        uses: astral-sh/setup-uv@v4
-        with:
-          enable-cache: true
-
-      - name: Setup dependencies
-        run: uv sync
-
-      - name: Build Load Balancer Docker image
-        uses: docker/build-push-action@v6
-        with:
-          context: .
-          file: ./Dockerfile-lb
-          platforms: linux/amd64
-          push: false
-          tags: flash-lb:test
-          cache-from: type=gha
-          cache-to: type=gha,mode=max
-          load: true
-
   docker-test-lb-cpu:
     runs-on: ubuntu-latest
     if: github.event_name != 'pull_request' || github.head_ref != 'release-please--branches--main'
-    needs: [lint]
     steps:
-      - name: Clear Space
-        # Note: Manual cleanup sufficient for CPU images (python:3.12-slim base ~150MB)
-        # Only CUDA images need aggressive cleanup due to their ~10-15GB size
-        run: |
-          rm -rf /usr/share/dotnet
-          rm -rf /opt/ghc
-          rm -rf "/usr/local/share/boost"
-          rm -rf "$AGENT_TOOLSDIRECTORY"
-          docker system prune -af
-          df -h
-
       - name: Checkout repository
         uses: actions/checkout@v4
-        with:
-          fetch-depth: 0
-
-      - name: Set up QEMU
-        uses: docker/setup-qemu-action@v3

       - name: Set up Docker Buildx
         uses: docker/setup-buildx-action@v3

-      - name: Set up uv
-        uses: astral-sh/setup-uv@v4
-        with:
-          enable-cache: true
-
-      - name: Setup dependencies
-        run: uv sync
-
       - name: Build CPU Load Balancer Docker image
         uses: docker/build-push-action@v6
         with:
@@ -222,14 +119,19 @@ jobs:
           cache-to: type=gha,mode=max
           load: true

+      - name: Test LB handler execution in Docker environment
+        run: |
+          echo "Testing LB handler in Docker environment..."
+          docker run --rm flash-lb-cpu:test ./test-lb-handler.sh
+
   docker-validation:
     runs-on: ubuntu-latest
-    needs: [test, lint, docker-test, docker-test-lb, docker-test-lb-cpu]
+    needs: [test, lint, docker-test, docker-test-lb-cpu]
     if: always()
     steps:
       - name: Check all jobs succeeded
         run: |
-          results=("${{ needs.test.result }}" "${{ needs.lint.result }}" "${{ needs.docker-test.result }}" "${{ needs.docker-test-lb.result }}" "${{ needs.docker-test-lb-cpu.result }}")
+          results=("${{ needs.test.result }}" "${{ needs.lint.result }}" "${{ needs.docker-test.result }}" "${{ needs.docker-test-lb-cpu.result }}")
           for result in "${results[@]}"; do
             if [[ "$result" != "success" && "$result" != "skipped" ]]; then
               echo "One or more quality checks failed (got: $result)"
@@ -262,18 +164,9 @@ jobs:
     needs: [release]
     if: needs.release.outputs.release_created
     steps:
-      - name: Free Disk Space
-        uses: jlumbroso/free-disk-space@v1.3.1
-        with:
-          tool-cache: true
-          android: true
-          dotnet: true
-          haskell: true
-          large-packages: true
-          swap-storage: true
-
-      - name: Additional cleanup and report
+      - name: Clear Space
         run: |
+          rm -rf /usr/share/dotnet /opt/ghc /usr/local/share/boost "$AGENT_TOOLSDIRECTORY"
           docker system prune -af
           df -h

@@ -394,18 +287,9 @@ jobs:
     needs: [release]
     if: needs.release.outputs.release_created
     steps:
-      - name: Free Disk Space
-        uses: jlumbroso/free-disk-space@v1.3.1
-        with:
-          tool-cache: true
-          android: true
-          dotnet: true
-          haskell: true
-          large-packages: true
-          swap-storage: true
-
-      - name: Additional cleanup and report
+      - name: Clear Space
         run: |
+          rm -rf /usr/share/dotnet /opt/ghc /usr/local/share/boost "$AGENT_TOOLSDIRECTORY"
           docker system prune -af
           df -h

@@ -463,18 +347,9 @@ jobs:
     needs: [release]
     if: needs.release.outputs.release_created
     steps:
-      - name: Free Disk Space
-        uses: jlumbroso/free-disk-space@v1.3.1
-        with:
-          tool-cache: true
-          android: true
-          dotnet: true
-          haskell: true
-          large-packages: true
-          swap-storage: true
-
-      - name: Additional cleanup and report
+      - name: Clear Space
         run: |
+          rm -rf /usr/share/dotnet /opt/ghc /usr/local/share/boost "$AGENT_TOOLSDIRECTORY"
           docker system prune -af
           df -h
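
The docker-validation gate in this workflow accepts a job result only when it is `success` or `skipped`, and after this commit it checks four jobs instead of five. As a rough illustration of that rule (not part of the commit), the same check can be sketched in Python. The helper name `gate_passes` and the constants are hypothetical:

```python
# Illustrative sketch only: mirrors the shell loop in the docker-validation job.
# `gate_passes`, ACCEPTED and GATED_JOBS are hypothetical names, not repository code.

ACCEPTED = {"success", "skipped"}

# The four jobs the gate depends on after this commit.
GATED_JOBS = ("test", "lint", "docker-test", "docker-test-lb-cpu")


def gate_passes(results: dict) -> bool:
    """Return True only if every gated job reported success or skipped."""
    return all(results.get(job) in ACCEPTED for job in GATED_JOBS)


if __name__ == "__main__":
    all_green = {job: "success" for job in GATED_JOBS}
    print(gate_passes(all_green))
    print(gate_passes({**all_green, "lint": "failure"}))
```

Treating `skipped` as acceptable appears to be what lets the gate pass on release-please branches, where the docker jobs are skipped by their `if:` conditions.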

release-please-config.json

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@
         {"type": "chore", "section": "Miscellaneous", "hidden": true}
       ],
       "extra-files": [
-        "src/__init__.py"
+        "src/version.py"
       ]
     }
   },

src/handler.py

Lines changed: 6 additions & 2 deletions
@@ -6,15 +6,19 @@

 from logger import setup_logging
 from unpack_volume import maybe_unpack
+from version import format_version_banner

 # Initialize logging configuration
 setup_logging()

+logger = logging.getLogger(__name__)
+
 # Unpack Flash deployment artifacts if running in Flash mode
 # This is a no-op for Live Serverless and local development
 maybe_unpack()

-logger = logging.getLogger(__name__)
+# Log after unpack so bundled runpod_flash is on sys.path
+logger.info(format_version_banner())


 def _load_generated_handler() -> Optional[Any]:
@@ -62,7 +66,7 @@ def _load_generated_handler() -> Optional[Any]:
     except ImportError as e:
         logger.warning(
             "Generated handler %s failed to import (missing dependency: %s). "
-            "Deploy with --use-local-flash to include latest runpod_flash. "
+            "Redeploy to include latest runpod_flash. "
             "Falling back to FunctionRequest handler.",
             handler_file,
             e,
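
The reason the banner is logged only after maybe_unpack() can be illustrated with a small, self-contained sketch. It assumes, as the comments in the diff suggest, that unpacking makes a bundled package importable via sys.path. The package name `demo_bundled_pkg`, the helper `fake_unpack`, and the version string are all hypothetical stand-ins, not code from this repository:

```python
import importlib
import sys
import tempfile
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path

PKG = "demo_bundled_pkg"  # hypothetical name standing in for runpod_flash


def read_version(package: str) -> str:
    """Prefer the importable package's __version__, then installed metadata."""
    try:
        module = importlib.import_module(package)
        return str(module.__version__)
    except (ImportError, AttributeError):
        try:
            return version(package)
        except PackageNotFoundError:
            return "unknown"


def fake_unpack(target: Path) -> str:
    """Stand-in for maybe_unpack(): write a package and add its parent to sys.path."""
    pkg_dir = target / PKG
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text('__version__ = "9.9.9"\n')
    entry = str(target)
    sys.path.insert(0, entry)
    importlib.invalidate_caches()
    return entry


with tempfile.TemporaryDirectory() as tmp:
    before = read_version(PKG)  # nothing bundled yet, so the fallback path is used
    entry = fake_unpack(Path(tmp))
    after = read_version(PKG)   # the bundled copy is now importable
    # Clean up so the sketch leaves the interpreter state as it found it.
    sys.path.remove(entry)
    sys.modules.pop(PKG, None)

print(before, "->", after)
```

Reading the version before the unpack step would report whatever the fallback finds, which is the inaccuracy the commit message describes for the base image's pip-installed package.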

src/lb_handler.py

Lines changed: 4 additions & 0 deletions
@@ -29,6 +29,7 @@

 from logger import setup_logging
 from unpack_volume import maybe_unpack
+from version import format_version_banner

 # Initialize logging configuration
 setup_logging()
@@ -38,6 +39,9 @@
 # This is a no-op for Live Serverless and local development
 maybe_unpack()

+# Log after unpack so bundled runpod_flash is on sys.path
+logger.info(format_version_banner())
+
 # Import from bundled /app/runpod_flash (no system package)
 # These imports must happen AFTER maybe_unpack() so /app is in sys.path
 from runpod_flash.protos.remote_execution import FunctionRequest, FunctionResponse # noqa: E402

src/test-lb-handler.sh

Lines changed: 11 additions & 5 deletions
@@ -24,14 +24,20 @@ trap cleanup EXIT

 # Start FastAPI server in background
 echo "Starting FastAPI server on port $PORT..."
-PYTHONPATH=. uv run python3 -m uvicorn lb_handler:app --host $HOST --port $PORT --log-level error > /tmp/lb_handler.log 2>&1 &
+if [ -f /.dockerenv ]; then
+    # Docker: use system python with pre-installed packages
+    PYTHONPATH=. python3 -m uvicorn lb_handler:app --host $HOST --port $PORT --log-level error > /tmp/lb_handler.log 2>&1 &
+else
+    # Local: use uv run to manage dependencies
+    PYTHONPATH=. uv run python3 -m uvicorn lb_handler:app --host $HOST --port $PORT --log-level error > /tmp/lb_handler.log 2>&1 &
+fi
 SERVER_PID=$!

 # Wait for server to be ready
 echo "Waiting for server to be ready..."
 attempt=0
 while [ $attempt -lt $TEST_TIMEOUT ]; do
-    if curl -s -f "http://$HOST:$PORT/health" > /dev/null 2>&1; then
+    if curl -s -f "http://$HOST:$PORT/ping" > /dev/null 2>&1; then
         echo "✓ Server is ready"
         break
     fi
@@ -46,10 +52,10 @@ while [ $attempt -lt $TEST_TIMEOUT ]; do
     fi
 done

-# Test /health endpoint
+# Test /ping endpoint
 echo ""
-echo "Testing /health endpoint..."
-health_response=$(curl -s "http://$HOST:$PORT/health")
+echo "Testing /ping endpoint..."
+health_response=$(curl -s "http://$HOST:$PORT/ping")
 echo "Response: $health_response"

 # Run /execute tests
# Run /execute tests

src/version.py

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+"""Version utilities for flash-worker boot logging."""
+
+from importlib.metadata import PackageNotFoundError, version
+
+__version__ = "1.1.0" # x-release-please-version
+
+
+def _get_version(package_name: str) -> str:
+    try:
+        return version(package_name)
+    except PackageNotFoundError:
+        return "unknown"
+
+
+def get_worker_version() -> str:
+    return __version__
+
+
+def get_flash_version() -> str:
+    """Read bundled flash version, falling back to pip metadata."""
+    try:
+        from runpod_flash import __version__ as flash_ver
+
+        return str(flash_ver)
+    except (ImportError, AttributeError):
+        return _get_version("runpod-flash")
+
+
+def get_runpod_version() -> str:
+    return _get_version("runpod")
+
+
+def format_version_banner() -> str:
+    return (
+        f"Starting Flash Worker {get_worker_version()} | "
+        f"runpod-flash {get_flash_version()} | "
+        f"runpod {get_runpod_version()}"
+    )
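
The commit message explains why version.py uses a constant instead of reading pyproject.toml: unpacking the user's project overwrites the worker's file. A minimal sketch of that failure mode follows. The regex and the helper `read_pyproject_version` are illustrative guesses at the abandoned approach rather than the code that was actually written, and a temporary directory stands in for /app:

```python
import re
import tempfile
from pathlib import Path

# Hypothetical pattern for a top-level `version = "..."` line in pyproject.toml.
VERSION_RE = re.compile(r'^version\s*=\s*"([^"]+)"', re.MULTILINE)


def read_pyproject_version(app_dir: Path) -> str:
    """Return the version string found in app_dir/pyproject.toml, or 'unknown'."""
    try:
        text = (app_dir / "pyproject.toml").read_text()
    except FileNotFoundError:
        return "unknown"
    match = VERSION_RE.search(text)
    return match.group(1) if match else "unknown"


with tempfile.TemporaryDirectory() as tmp:
    app_dir = Path(tmp)  # stands in for /app

    # The worker image ships its own pyproject.toml.
    (app_dir / "pyproject.toml").write_text(
        '[project]\nname = "worker"\nversion = "1.1.0"\n'
    )
    before_unpack = read_pyproject_version(app_dir)

    # Unpacking the user's project replaces that file with the user's copy.
    (app_dir / "pyproject.toml").write_text(
        '[project]\nname = "user-project"\nversion = "0.1.0"\n'
    )
    after_unpack = read_pyproject_version(app_dir)

print(before_unpack, after_unpack)
```

A constant inside the worker's own module is not affected by what is extracted next to it, provided the extracted project does not also ship a file with the same module name.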

tests/unit/test_lb_handler.py

Lines changed: 4 additions & 0 deletions
@@ -14,10 +14,14 @@


 # Mock heavy dependencies before importing lb_handler to prevent side effects
+_mock_version = MagicMock()
+_mock_version.format_version_banner = MagicMock(return_value="Starting Flash Worker vtest")
+
 _MOCK_MODULES = {
     "logger": MagicMock(),
     "unpack_volume": MagicMock(),
     "remote_executor": MagicMock(),
+    "version": _mock_version,
     "runpod_flash": MagicMock(),
     "runpod_flash.protos": MagicMock(),
     "runpod_flash.protos.remote_execution": MagicMock(),

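The test change registers a mock `version` module because lb_handler now calls format_version_banner() at import time. The pattern can be sketched in isolation as follows. The module `demo_handler` and its source are hypothetical, and `patch.dict` is used here as one way to install the mock. How the real test applies `_MOCK_MODULES` is not shown in this diff:

```python
import importlib
import sys
import tempfile
from pathlib import Path
from unittest.mock import MagicMock, patch

# Hypothetical module under test: it calls into `version` while being imported.
MODULE_SOURCE = (
    "from version import format_version_banner\n"
    "BANNER = format_version_banner()\n"
)

mock_version = MagicMock()
mock_version.format_version_banner = MagicMock(
    return_value="Starting Flash Worker vtest"
)

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "demo_handler.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmp)
    try:
        # While the patch is active, `import version` resolves to the mock.
        with patch.dict(sys.modules, {"version": mock_version}):
            demo_handler = importlib.import_module("demo_handler")
    finally:
        sys.path.remove(tmp)

banner = demo_handler.BANNER
print(banner)
```

Without the mock entry, importing the handler would pull in the real `version` module (or fail if it is not importable), so the import-time side effect has to be stubbed before the import runs.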