fix: release 1.0.3 - security hardening and quality improvements #20

Merged
Romain-Grosos merged 19 commits into main from fix/release-1.0.3 on Mar 12, 2026

@Romain-Grosos

Summary

  • Security hardening: narrowed exception types (crypto, tar extraction), SHA256 pre-verification, TOCTOU fix on analysis cache, OCI per-IP rate limiting (200 req/min)
  • Reliability: graceful shutdown with ThreadPoolExecutor drain, socket timeout (60s) for slowloris mitigation, GC pre-delete audit log
  • Code quality: dead code cleanup, deduplicated resolver logic (-52 lines), removed duplicate imports, config validation for gc/transfer_path/unknown keys
  • Tests: 12 new tests (OCI rate limit, Content-Length, blob streaming 5 MiB, slowloris, config validation, log limits)
  • Docs: CHANGELOG, README install examples, RPM specs updated to 1.0.3

Test plan

  • 597 tests pass, 9 skipped (platform-dependent)
  • ruff check . clean
  • ruff format --check . clean
  • All security fixes covered by dedicated tests

_last_gc_report was read/written without synchronization across
concurrent request threads. Add _gc_lock to protect gc_report()
writes and gc_execute() reads, consistent with _analysis_lock
pattern in BunckerServer.
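
The locking described above can be sketched as follows. This is a minimal illustration, not the project's code: the `GcState` class and the report layout are hypothetical, and only the `_gc_lock` and `_last_gc_report` names come from the commit message.

```python
import threading


class GcState:
    """Hypothetical holder for the most recent GC report."""

    def __init__(self):
        self._gc_lock = threading.Lock()
        self._last_gc_report = None

    def gc_report(self, candidates):
        # Build the report outside the lock, then publish it under the lock.
        report = {"candidates": list(candidates)}
        with self._gc_lock:
            self._last_gc_report = report
        return report

    def gc_execute(self):
        # Take a reference under the lock so a concurrent gc_report()
        # cannot swap the report in the middle of the read.
        with self._gc_lock:
            report = self._last_gc_report
        if report is None:
            raise RuntimeError("no GC report available; run gc_report() first")
        return list(report["candidates"])
```

Holding the lock only for the assignment and the read keeps the critical section short, so request threads do not wait on report construction.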
Switch from BaseHTTPRequestHandler to a WSGI-based architecture using
waitress for production-grade HTTP serving. Waitress provides proper
connection management, request parsing, and thread pooling out of the
box. For TLS mode, fall back to stdlib WSGIServer since waitress does
not support SSL in its async I/O model.

- Convert BunckerHandler to standalone WSGI-compatible class
- Add _WSGIHeaders adapter for environ-based header access
- Add _ResponseWriter for response body buffering
- Add create_wsgi_app() factory function
- Use waitress.create_server() for non-TLS (common case)
- Use ThreadingMixIn + WSGIServer for TLS fallback
- Fix reserved LogRecord attribute conflict (filename -> manifest_filename)
- Fix oversized body test to work with waitress request parsing
- Use case-insensitive HTTPMessage headers in test helpers
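
To illustrate the environ-based header access and the app factory listed above, here is a rough sketch. `HeaderView`, `make_wsgi_app`, and the `handle` callback signature are assumptions made for this example; the actual `_WSGIHeaders` and `create_wsgi_app()` may look different.

```python
class HeaderView:
    """Case-insensitive, read-only view of request headers in a WSGI environ."""

    def __init__(self, environ):
        self._environ = environ

    def get(self, name, default=None):
        key = name.upper().replace("-", "_")
        # PEP 3333: Content-Type and Content-Length carry no HTTP_ prefix.
        if key not in ("CONTENT_TYPE", "CONTENT_LENGTH"):
            key = "HTTP_" + key
        return self._environ.get(key, default)


def make_wsgi_app(handle):
    """Wrap handle(method, path, headers) -> (status, body) as a WSGI app."""

    def app(environ, start_response):
        headers = HeaderView(environ)
        status, body = handle(
            environ["REQUEST_METHOD"], environ.get("PATH_INFO", "/"), headers
        )
        start_response(
            status,
            [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
        )
        return [body]

    return app
```

An app built this way can be served by any WSGI server, for example `wsgiref.simple_server.make_server("127.0.0.1", 8080, app)` from the standard library.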
Use OCIPlatform from shared.oci to properly parse platform strings
with os/arch/variant format (e.g. linux/arm/v7). Previously only
os/arch was parsed, silently ignoring the variant component.
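
As a minimal sketch of the os/arch/variant parsing described above: the `Platform` tuple and `parse_platform` function below are hypothetical stand-ins, not the `OCIPlatform` class from shared.oci.

```python
from typing import NamedTuple, Optional


class Platform(NamedTuple):
    os: str
    arch: str
    variant: Optional[str] = None


def parse_platform(value: str) -> Platform:
    """Parse 'os/arch' or 'os/arch/variant'; reject anything else."""
    parts = value.strip().split("/")
    if len(parts) not in (2, 3) or not all(parts):
        raise ValueError(f"invalid platform string: {value!r}")
    # With two parts, variant keeps its default of None.
    return Platform(*parts)
```

For example, `parse_platform("linux/arm/v7")` keeps `v7` as the variant instead of dropping it, while `parse_platform("linux/amd64")` leaves the variant as `None`.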
Replace waitress with stdlib WSGIServer for both TLS and non-TLS
modes. Waitress cannot serve TLS (async I/O incompatible with SSL),
making it unsuitable for the LAN use case that actually needs a
production-grade server.

- Stream blob responses via WSGI iterator (no full memory buffering)
- Increase chunk size from 64 KiB to 1 MiB for better throughput
- Enable TCP_NODELAY on accepted connections
- Set TCP backlog to 32 for predictable queuing
- Remove python3-waitress dependency
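
Streaming through a WSGI iterator, as described in the list above, could look roughly like this. The `iter_blob` and `blob_app` names and the in-memory stand-in blob are assumptions for illustration; only the 1 MiB chunk size is taken from the commit message.

```python
import io

CHUNK_SIZE = 1024 * 1024  # 1 MiB, the chunk size named above


def iter_blob(fileobj, chunk_size=CHUNK_SIZE):
    """Yield a blob in fixed-size chunks and close the file when done."""
    try:
        while True:
            chunk = fileobj.read(chunk_size)
            if not chunk:
                break
            yield chunk
    finally:
        fileobj.close()


def blob_app(environ, start_response):
    # Stand-in blob; a real server would open the file from its store.
    data = b"x" * (2 * CHUNK_SIZE + 10)
    start_response(
        "200 OK",
        [
            ("Content-Type", "application/octet-stream"),
            ("Content-Length", str(len(data))),
        ],
    )
    # Returning a generator lets the server send one chunk at a time
    # instead of holding the whole body in memory.
    return iter_blob(io.BytesIO(data))
```

Because the generator is the response body, at most one chunk is held by the application at a time, regardless of blob size.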
- Fix FD leak in Store.import_blob error path
- Validate Content-Length as integer in 3 handler locations
- Bounds-check log limit parameter (0-10000)
- Reject symlinks/hardlinks in tar extraction (Python <3.12)
- Handle HMAC decode errors as TransferError
- SHA256 pre-verification before streaming blob response
- Fix TOCTOU on analysis cache (lock covers id comparison)
- Add per-IP OCI rate limiting (200 req/min sliding window)
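
A per-IP sliding window like the one mentioned above can be sketched with one deque of timestamps per client. The `SlidingWindowLimiter` class and its injectable `clock` parameter are hypothetical; only the 200 requests per minute figure comes from this PR.

```python
import threading
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each IP."""

    def __init__(self, limit=200, window=60.0, clock=time.monotonic):
        self._limit = limit
        self._window = window
        self._clock = clock
        self._lock = threading.Lock()
        self._hits = defaultdict(deque)

    def allow(self, ip):
        now = self._clock()
        with self._lock:
            hits = self._hits[ip]
            # Drop timestamps that have fallen out of the window.
            while hits and now - hits[0] >= self._window:
                hits.popleft()
            if len(hits) >= self._limit:
                return False  # caller would answer 429 TOOMANYREQUESTS
            hits.append(now)
            return True
```

This sketch never evicts idle IPs, so a long-running server would also need periodic cleanup of empty deques.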
- Update install examples from 1.0.1 to 1.0.3
- Document OCI rate limiting (200 req/min) in security section
- Bump RPM spec versions from 1.0.1 to 1.0.3
- crypto.py: catch (ValueError, InvalidTag) instead of bare Exception
- transfer.py: catch CryptoError instead of bare Exception on decrypt
- transfer.py: match tarfile exception class names instead of string
  matching on error messages for Python 3.12+ filter errors
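
Matching on exception class names rather than message text might be done along these lines. This is a guess at the approach, not the code in transfer.py: `is_filter_error` is a hypothetical helper, and the names listed are the extraction-filter exceptions that the standard `tarfile` module defines in Python 3.12.

```python
import tarfile

# Class names of the tarfile extraction-filter errors (Python 3.12+).
_FILTER_ERROR_NAMES = frozenset(
    {
        "FilterError",
        "AbsolutePathError",
        "OutsideDestinationError",
        "SpecialFileError",
        "AbsoluteLinkError",
        "LinkOutsideDestinationError",
    }
)


def is_filter_error(exc):
    """True if exc, or one of its base classes, is a tarfile filter error."""
    return any(
        cls.__module__ == "tarfile" and cls.__name__ in _FILTER_ERROR_NAMES
        for cls in type(exc).__mro__
    )
```

Comparing names avoids referencing `tarfile.FilterError` directly, which would raise `AttributeError` on interpreters that do not define it, and it does not break if the wording of an error message changes.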
- server: pool.shutdown(wait=True, cancel_futures=True) drains
  in-flight requests before closing; join timeout raised to 5s
- store: gc_pre_delete audit log emitted before any blob deletion
  with full digest list for forensic recovery
- store: use contextlib.suppress for OSError in FD cleanup
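
The effect of `pool.shutdown(wait=True, cancel_futures=True)` can be seen in a small standalone demo. The `shutdown_demo` function is purely illustrative and unrelated to the server code; it requires Python 3.9 or later for `cancel_futures`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def shutdown_demo():
    pool = ThreadPoolExecutor(max_workers=1)
    started = threading.Event()
    release = threading.Event()

    def in_flight():
        started.set()
        release.wait(timeout=5)
        return "done"

    running = pool.submit(in_flight)
    started.wait(timeout=5)
    # The single worker is busy, so this task stays queued.
    queued = pool.submit(lambda: "never runs")

    # Let the in-flight task finish shortly after shutdown begins.
    threading.Timer(0.1, release.set).start()

    # wait=True blocks until running work completes;
    # cancel_futures=True cancels work that has not started yet.
    pool.shutdown(wait=True, cancel_futures=True)
    return running.result(), queued.cancelled()
```

The in-flight task runs to completion while the queued one is cancelled, which is the behavior the shutdown path relies on.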
- Validate gc.inactive_days_threshold >= 1
- Validate transfer_path is a string when set
- Warn on unknown config keys to catch typos
- Add tests for all new validation rules
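
The three validation rules above could be expressed as follows. `validate_config` and the `KNOWN_KEYS` set are assumptions for this sketch; the real schema has more keys and may report errors differently.

```python
import warnings

KNOWN_KEYS = {"gc", "transfer_path"}  # hypothetical subset of the real schema


def validate_config(cfg):
    for key in cfg:
        if key not in KNOWN_KEYS:
            # Warn rather than fail, so a typo is visible but not fatal.
            warnings.warn(f"unknown config key: {key!r}", stacklevel=2)

    gc_cfg = cfg.get("gc") or {}
    threshold = gc_cfg.get("inactive_days_threshold")
    if threshold is not None:
        # bool is a subclass of int, so reject it explicitly.
        valid = isinstance(threshold, int) and not isinstance(threshold, bool)
        if not valid or threshold < 1:
            raise ValueError("gc.inactive_days_threshold must be an integer >= 1")

    transfer_path = cfg.get("transfer_path")
    if transfer_path is not None and not isinstance(transfer_path, str):
        raise ValueError("transfer_path must be a string when set")
    return cfg
```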
- OCI rate limiter returns 429 with TOOMANYREQUESTS on manifests/blobs
- OCI rate limiter does not affect /v2/ root endpoint
- Non-integer Content-Length returns 400 instead of 500
- 4 MiB blob streaming test verifies multi-chunk transfer
- Log limit bounds tests: negative and excessive values return 400
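
The 400-instead-of-500 behavior for malformed input comes down to validating before converting. In this sketch, `BadRequest`, `parse_content_length`, and `parse_log_limit` are hypothetical names; the 0 to 10000 bound is the one stated in this PR.

```python
class BadRequest(Exception):
    """Hypothetical error that a handler would translate into HTTP 400."""


def _parse_uint(raw, what):
    # Accept ASCII digits only: no sign, whitespace, or underscores.
    if not isinstance(raw, str) or not (raw.isascii() and raw.isdigit()):
        raise BadRequest(f"{what} must be a non-negative integer")
    return int(raw)


def parse_content_length(raw):
    return _parse_uint(raw, "Content-Length")


def parse_log_limit(raw, maximum=10000):
    limit = _parse_uint(raw, "limit")
    if limit > maximum:
        raise BadRequest(f"limit must be between 0 and {maximum}")
    return limit
```

Checking the string first is stricter than calling `int()` directly, which would also accept forms such as `" 12 "` or `"1_000"`.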
- Add graceful shutdown, GC audit log, narrowed exceptions
- Add config validation improvements
- Document all security and quality enhancements
- Log server_stopping with pending worker count before shutdown
- Move socket timeout=60 to _QuietWSGIHandler (was dead code on
  BunckerHandler after WSGI refactor)
- Fix CryptoError double-wrap in decrypt_env_value (use from None)
- Add slowloris timeout test (2s patched, validates connection close
  and server recovery)
- Add 5 MiB blob streaming test (multi-chunk SHA256 integrity)
- __main__.py: remove duplicate import contextlib in _cmd_setup
- __main__.py: remove duplicate import shutil in _cmd_api_setup
- resolver.py: resolve_dockerfile now delegates to _resolve_image_blobs
  instead of duplicating 60 lines of identical blob resolution logic
@Romain-Grosos Romain-Grosos added this to the v1.0.3 milestone Mar 12, 2026
@Romain-Grosos Romain-Grosos self-assigned this Mar 12, 2026
@Romain-Grosos Romain-Grosos merged commit f89d608 into main Mar 12, 2026
7 checks passed
@Romain-Grosos Romain-Grosos deleted the fix/release-1.0.3 branch March 12, 2026 19:19