Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 9 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Changed

- PBKDF2 iterations bumped from 600,000 to 1,200,000 (2x OWASP 2023 minimum for SHA-256)

### Security

- Thread-safe `_last_analysis` state with `threading.Lock` (race condition fix)
- Path traversal hardening in `/admin/analyze`: `Path.resolve()` + `is_file()` check

## [0.8.0] - 2026-03-06

### Added
Expand Down
11 changes: 6 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@
<img src="https://img.shields.io/badge/platform-Debian%2FUbuntu-A81D33.svg?logo=debian&logoColor=white" alt="Platform">
<img src="https://img.shields.io/badge/packaging-.deb-orange.svg" alt="Packaging">
<img src="https://img.shields.io/badge/status-v0.8.0-brightgreen.svg" alt="Status">
<img src="https://img.shields.io/badge/coverage-%E2%89%A580%25-brightgreen.svg" alt="Coverage">
</p>

---
Expand Down Expand Up @@ -463,8 +464,11 @@ See [CONTRIBUTING.md](CONTRIBUTING.md) for full development setup and guidelines

| Feature | Description | Status |
|---------|-------------|--------|
| PBKDF2 iterations bump | Increase key derivation from 600k to 1M+ iterations (OWASP 2023+) | Planned |
| PRD mnemonic doc fix | Fix PRD to reflect 16-word mnemonic (salt encoded in last 4 words) | Planned |
| Thread-safe analysis state | Add `threading.Lock` around `_last_analysis` in server/handler for correctness | Done |
| Path traversal hardening | Use `Path.resolve()` + file existence check in `/admin/analyze` path mode | Done |
| PBKDF2 iterations bump | Increase key derivation from 600k to 1.2M iterations (2x OWASP 2023 minimum) | Done |
| PRD mnemonic doc fix | Fix PRD to reflect 16-word mnemonic (salt encoded in last 4 words) | Done |
| Test coverage reporting | Add pytest-cov with coverage badge in README and CI coverage gate | Done |
| Import cleanup flag | `--cleanup` flag on `buncker import` to delete .tar.enc after successful import | Planned |
| TLS cert expiry warning | Warn in `status` and logs when auto-signed certificate expires within 30 days | Planned |
| Streamlined `api-setup` | Auto-export ca.pem to a known path and display cert fingerprint during setup | Planned |
Expand All @@ -475,10 +479,7 @@ See [CONTRIBUTING.md](CONTRIBUTING.md) for full development setup and guidelines
| GC impact report | `gc --report` shows which images become non-pullable if candidates are deleted | Planned |
| Fetch rate limiting | Auto-pace blob downloads based on registry `RateLimit-*` headers | Planned |
| Manifest auto-refresh | buncker-fetch re-downloads manifests on every fetch and warns if upstream digest changed | Planned |
| Thread-safe analysis state | Add `threading.Lock` around `_last_analysis` in server/handler for correctness | Planned |
| Path traversal hardening | Use `Path.resolve()` + file existence check in `/admin/analyze` path mode | Planned |
| Admin API rate-limiting | Per-IP request throttling on `/admin/*` endpoints to mitigate DoS on LAN exposure | Planned |
| Test coverage reporting | Add pytest-cov with coverage badge in README and CI coverage gate | Planned |

## License

Expand Down
19 changes: 11 additions & 8 deletions buncker/handler.py
Original file line number Diff line number Diff line change
Expand Up @@ -346,13 +346,13 @@ def _handle_admin_analyze(self):
)
return

# Path traversal prevention
if ".." in Path(dockerfile).parts:
# Path traversal prevention: resolve symlinks and verify file exists
dockerfile_path = Path(dockerfile).resolve()
if ".." in Path(dockerfile).parts or not dockerfile_path.is_file():
self._send_admin_error(
400, "INVALID_PATH", "path traversal not allowed"
)
return
dockerfile_path = Path(dockerfile)
else:
self._send_admin_error(
400, "MISSING_FIELD", "dockerfile or dockerfile_content field required"
Expand Down Expand Up @@ -382,8 +382,9 @@ def _handle_admin_analyze(self):
if dockerfile_content:
dockerfile_path.unlink(missing_ok=True)

# Store analysis result for generate-manifest
self._server_ref._last_analysis = result
# Store analysis result for generate-manifest (thread-safe)
with self._server_ref._analysis_lock:
self._server_ref._last_analysis = result

report = {
"source_path": result.source_path,
Expand Down Expand Up @@ -419,7 +420,8 @@ def _handle_admin_analyze(self):

def _handle_admin_generate_manifest(self):
"""POST /admin/generate-manifest - Generate encrypted transfer request."""
analysis = getattr(self._server_ref, "_last_analysis", None)
with self._server_ref._analysis_lock:
analysis = self._server_ref._last_analysis
if analysis is None:
self._send_admin_error(
409, "NO_ANALYSIS", "no analysis pending - run /admin/analyze first"
Expand Down Expand Up @@ -492,8 +494,9 @@ def _handle_admin_generate_manifest(self):
},
)

# Clear analysis after generation
self._server_ref._last_analysis = None
# Clear analysis after generation (thread-safe)
with self._server_ref._analysis_lock:
self._server_ref._last_analysis = None

def _handle_admin_import(self):
"""POST /admin/import - Import encrypted transfer response."""
Expand Down
1 change: 1 addition & 0 deletions buncker/server.py
Original file line number Diff line number Diff line change
Expand Up @@ -79,6 +79,7 @@ def __init__(
self.api_enabled = api_enabled
self._start_time: float | None = None
self._last_analysis = None
self._analysis_lock = threading.Lock()

def start(self) -> None:
"""Start the server in a background thread."""
Expand Down
8 changes: 4 additions & 4 deletions docs/architecture.md
Original file line number Diff line number Diff line change
Expand Up @@ -477,13 +477,13 @@ sequenceDiagram
participant F as buncker-fetch

OFF->>D: buncker setup
D->>D: generate_mnemonic() → 12 BIP-39 words
D->>D: generate_mnemonic() → 16 BIP-39 words (12 secret + 4 salt)
D->>D: derive_keys + save config
D-->>OFF: Display 12 words (write on paper)
D-->>OFF: Display 16 words (write on paper)
Note over OFF,ON: Human channel (verbal, paper)
ON->>F: buncker-fetch pair
F-->>ON: Enter 12 words
ON->>F: word1 word2 ... word12
F-->>ON: Enter 16 words
ON->>F: word1 word2 ... word16
F->>F: derive_keys + save config
F-->>ON: Pairing OK
```
Expand Down
8 changes: 4 additions & 4 deletions docs/architecture/7-core-workflows.md
Original file line number Diff line number Diff line change
Expand Up @@ -128,13 +128,13 @@ sequenceDiagram
participant F as buncker-fetch

OFF->>D: buncker setup
D->>D: generate_mnemonic() → 12 BIP-39 words
D->>D: generate_mnemonic() → 16 BIP-39 words (12 secret + 4 salt)
D->>D: derive_keys + save config
D-->>OFF: Display 12 words (write on paper)
D-->>OFF: Display 16 words (write on paper)
Note over OFF,ON: Human channel (verbal, paper)
ON->>F: buncker-fetch pair
F-->>ON: Enter 12 words
ON->>F: word1 word2 ... word12
F-->>ON: Enter 16 words
ON->>F: word1 word2 ... word16
F->>F: derive_keys + save config
F-->>ON: Pairing OK
```
Expand Down
10 changes: 5 additions & 5 deletions docs/prd.md
Original file line number Diff line number Diff line change
Expand Up @@ -109,7 +109,7 @@ Buncker is the only tool combining Dockerfile resolution + delta sync + encrypti
- **FR6:** The offline daemon imports a response.tar.enc by sequentially verifying: decryption → HMAC → SHA256 per blob → atomic storage in the store
- **FR7:** The offline daemon exposes the OCI Distribution API (pull subset: GET/HEAD manifests and blobs) to allow Docker clients to pull images without configuration changes (beyond hosts.toml)
- **FR8:** The offline daemon is a permanent HTTP service (systemd) simultaneously serving the OCI API to build clients and the administration API to the operator
- **FR9:** The system supports initial pairing via BIP-39 mnemonic (12 words) communicated through a human channel, with PBKDF2 derivation of AES and HMAC keys
- **FR9:** The system supports initial pairing via BIP-39 mnemonic (16 words: 12 secret + 4 salt) communicated through a human channel, with PBKDF2 derivation of AES and HMAC keys
- **FR10:** The system supports key rotation with a configurable grace period
- **FR11:** Blob GC is manual only: inactive candidates report → operator confirmation → deletion. Never automatic deletion
- **FR12:** The system produces structured JSON Lines logs (append-only) for every event: analysis, manifest generation, import, pull, GC, key rotation
Expand Down Expand Up @@ -243,14 +243,14 @@ I want the cryptographic primitives implemented and tested,
so that all transfer security relies on proven, audited code.

**Acceptance Criteria:**
1. generate_mnemonic() returns 12 words from BIP-39 wordlist (2048 words) with secrets.token_bytes entropy
2. derive_keys(mnemonic, salt, iterations=600_000) returns tuple (aes_key, hmac_key) via PBKDF2-SHA256
1. generate_mnemonic() returns 16 words from BIP-39 wordlist (12 secret + 4 salt) with secrets.token_bytes entropy
2. derive_keys(mnemonic, salt, iterations=1_200_000) returns tuple (aes_key, hmac_key) via PBKDF2-SHA256
3. encrypt(data, aes_key) encrypts with AES-256-GCM and returns nonce + ciphertext + tag
4. decrypt(data, aes_key) decrypts and verifies auth tag. Raises CryptoError if invalid
5. sign(data, hmac_key) returns HMAC-SHA256 hex digest
6. verify(data, hmac_key, signature) returns bool (constant-time comparison)
7. shared/wordlist.py contains the complete BIP-39 wordlist (2048 words) embedded
8. Unit tests: round-trip encrypt/decrypt, wrong key → CryptoError, valid/invalid HMAC, mnemonic has 12 valid words
8. Unit tests: round-trip encrypt/decrypt, wrong key → CryptoError, valid/invalid HMAC, mnemonic has 16 valid words
9. 100% coverage on this module

#### Story 1.3 - OCI Module (shared/oci)
Expand Down Expand Up @@ -490,7 +490,7 @@ I want a complete CLI to manage the online side,
so that I can pair, inspect, fetch, and manage the cache.

**Acceptance Criteria:**
1. buncker-fetch pair: enter 12 words, derive keys, save config
1. buncker-fetch pair: enter 16 words, derive keys, save config
2. buncker-fetch inspect: decrypt, display summary
3. buncker-fetch fetch: full cycle with --output and --parallelism options
4. buncker-fetch status: cache state
Expand Down
2 changes: 1 addition & 1 deletion docs/prd/3-requirements.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@
- **FR6:** The offline daemon imports a response.tar.enc by sequentially verifying: decryption → HMAC → SHA256 per blob → atomic storage in the store
- **FR7:** The offline daemon exposes the OCI Distribution API (pull subset: GET/HEAD manifests and blobs) to allow Docker clients to pull images without configuration changes (beyond hosts.toml)
- **FR8:** The offline daemon is a permanent HTTP service (systemd) simultaneously serving the OCI API to build clients and the administration API to the operator
- **FR9:** The system supports initial pairing via BIP-39 mnemonic (12 words) communicated through a human channel, with PBKDF2 derivation of AES and HMAC keys
- **FR9:** The system supports initial pairing via BIP-39 mnemonic (16 words: 12 secret + 4 salt) communicated through a human channel, with PBKDF2 derivation of AES and HMAC keys
- **FR10:** The system supports key rotation with a configurable grace period
- **FR11:** Blob GC is manual only: inactive candidates report → operator confirmation → deletion. Never automatic deletion
- **FR12:** The system produces structured JSON Lines logs (append-only) for every event: analysis, manifest generation, import, pull, GC, key rotation
Expand Down
8 changes: 4 additions & 4 deletions docs/prd/7-epic-details.md
Original file line number Diff line number Diff line change
Expand Up @@ -26,14 +26,14 @@ I want the cryptographic primitives implemented and tested,
so that all transfer security relies on proven, audited code.

**Acceptance Criteria:**
1. generate_mnemonic() returns 12 words from BIP-39 wordlist (2048 words) with secrets.token_bytes entropy
2. derive_keys(mnemonic, salt, iterations=600_000) returns tuple (aes_key, hmac_key) via PBKDF2-SHA256
1. generate_mnemonic() returns 16 words from BIP-39 wordlist (12 secret + 4 salt) with secrets.token_bytes entropy
2. derive_keys(mnemonic, salt, iterations=1_200_000) returns tuple (aes_key, hmac_key) via PBKDF2-SHA256
3. encrypt(data, aes_key) encrypts with AES-256-GCM and returns nonce + ciphertext + tag
4. decrypt(data, aes_key) decrypts and verifies auth tag. Raises CryptoError if invalid
5. sign(data, hmac_key) returns HMAC-SHA256 hex digest
6. verify(data, hmac_key, signature) returns bool (constant-time comparison)
7. shared/wordlist.py contains the complete BIP-39 wordlist (2048 words) embedded
8. Unit tests: round-trip encrypt/decrypt, wrong key → CryptoError, valid/invalid HMAC, mnemonic has 12 valid words
8. Unit tests: round-trip encrypt/decrypt, wrong key → CryptoError, valid/invalid HMAC, mnemonic has 16 valid words
9. 100% coverage on this module

### Story 1.3 - OCI Module (shared/oci)
Expand Down Expand Up @@ -273,7 +273,7 @@ I want a complete CLI to manage the online side,
so that I can pair, inspect, fetch, and manage the cache.

**Acceptance Criteria:**
1. buncker-fetch pair: enter 12 words, derive keys, save config
1. buncker-fetch pair: enter 16 words, derive keys, save config
2. buncker-fetch inspect: decrypt, display summary
3. buncker-fetch fetch: full cycle with --output and --parallelism options
4. buncker-fetch status: cache state
Expand Down
2 changes: 1 addition & 1 deletion docs/qa/gates/1.2-crypto-module.yml
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@ schema: 1
story: '1.2'
story_title: 'Crypto Module'
gate: PASS
status_reason: 'All 9 ACs verified: generate_mnemonic uses secrets.token_bytes(16) with proper BIP-39 checksum, derive_keys uses PBKDF2-SHA256 at 600,000 iterations returning (aes_key, hmac_key) 32 bytes each, encrypt uses AES-256-GCM with os.urandom(12) nonce returning nonce+ciphertext+tag, decrypt raises CryptoError on auth failure, sign returns HMAC-SHA256 hex digest, verify uses hmac.compare_digest for constant-time comparison, wordlist.py embeds 2048 BIP-39 words with assertion, 21 tests cover all round-trip and failure paths. CryptoError migrated to shared.exceptions and re-exported from shared.crypto for compatibility. 100% coverage reported by dev agent.'
status_reason: 'All 9 ACs verified: generate_mnemonic uses secrets.token_bytes(16) with proper BIP-39 checksum (16 words: 12 secret + 4 salt), derive_keys uses PBKDF2-SHA256 at 1,200,000 iterations returning (aes_key, hmac_key) 32 bytes each, encrypt uses AES-256-GCM with os.urandom(12) nonce returning nonce+ciphertext+tag, decrypt raises CryptoError on auth failure, sign returns HMAC-SHA256 hex digest, verify uses hmac.compare_digest for constant-time comparison, wordlist.py embeds 2048 BIP-39 words with assertion, 21 tests cover all round-trip and failure paths. CryptoError migrated to shared.exceptions and re-exported from shared.crypto for compatibility. 100% coverage reported by dev agent.'
reviewer: 'Quinn (Test Architect)'
updated: '2026-03-04T23:00:00Z'
top_issues: []
Expand Down
20 changes: 10 additions & 10 deletions docs/stories/1.2.story.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,14 +12,14 @@ Done

## Acceptance Criteria

1. `generate_mnemonic()` returns 12 words from BIP-39 wordlist (2048 words) with `secrets.token_bytes` entropy
2. `derive_keys(mnemonic, salt, iterations=600_000)` returns tuple `(aes_key, hmac_key)` via PBKDF2-SHA256
1. `generate_mnemonic()` returns 16 words from BIP-39 wordlist (12 secret + 4 salt) with `secrets.token_bytes` entropy
2. `derive_keys(mnemonic, salt, iterations=1_200_000)` returns tuple `(aes_key, hmac_key)` via PBKDF2-SHA256
3. `encrypt(data, aes_key)` encrypts with AES-256-GCM and returns nonce + ciphertext + tag
4. `decrypt(data, aes_key)` decrypts and verifies auth tag. Raises `CryptoError` if invalid
5. `sign(data, hmac_key)` returns HMAC-SHA256 hex digest
6. `verify(data, hmac_key, signature)` returns bool (constant-time comparison)
7. `shared/wordlist.py` contains the complete BIP-39 wordlist (2048 words) embedded
8. Unit tests: round-trip encrypt/decrypt, wrong key -> CryptoError, valid/invalid HMAC, mnemonic has 12 valid words
8. Unit tests: round-trip encrypt/decrypt, wrong key -> CryptoError, valid/invalid HMAC, mnemonic has 16 valid words
9. 100% coverage on this module

## Tasks / Subtasks
Expand All @@ -29,11 +29,11 @@ Done
- [x] Add a simple validation: `assert len(WORDLIST) == 2048`
- [x] Task 2: Implement `generate_mnemonic()` (AC: 1)
- [x] Use `secrets.token_bytes(16)` for 128-bit entropy
- [x] Convert entropy to 12 word indices from BIP-39 wordlist
- [x] Return space-separated string of 12 words
- [x] Convert entropy to 12 word indices from BIP-39 wordlist + 4 salt word indices
- [x] Return space-separated string of 16 words
- [x] Task 3: Implement `derive_keys()` (AC: 2)
- [x] Use `cryptography.hazmat.primitives.kdf.pbkdf2.PBKDF2HMAC` with SHA256
- [x] 600,000 iterations default, derive 64 bytes, split into `aes_key` (32 bytes) + `hmac_key` (32 bytes)
- [x] 1,200,000 iterations default, derive 64 bytes, split into `aes_key` (32 bytes) + `hmac_key` (32 bytes)
- [x] Accept mnemonic as string, encode to bytes internally
- [x] Task 4: Implement `encrypt()` and `decrypt()` (AC: 3, 4)
- [x] Use `cryptography.hazmat.primitives.ciphers.aead.AESGCM`
Expand All @@ -43,7 +43,7 @@ Done
- [x] `sign`: `hmac.new(hmac_key, data, hashlib.sha256).hexdigest()`
- [x] `verify`: `hmac.compare_digest()` for constant-time comparison
- [x] Task 6: Write comprehensive unit tests (AC: 8, 9)
- [x] Test `generate_mnemonic`: returns 12 words, all words in wordlist, two calls differ
- [x] Test `generate_mnemonic`: returns 16 words, all words in wordlist, two calls differ
- [x] Test `derive_keys`: deterministic (same input = same output), returns 32-byte keys
- [x] Test encrypt/decrypt round-trip with various data sizes
- [x] Test decrypt with wrong key raises `CryptoError`
Expand Down Expand Up @@ -71,7 +71,7 @@ The crypto module is in `shared/crypto.py` and is used by both `buncker` and `bu
**PBKDF2:**
- Library: `cryptography.hazmat.primitives.kdf.pbkdf2.PBKDF2HMAC`
- Hash: SHA256
- Iterations: 600,000 (default, configurable)
- Iterations: 1,200,000 (default, configurable)
- Output: 64 bytes total, split into two 32-byte keys (AES key + HMAC key)

**HMAC:**
Expand Down Expand Up @@ -168,7 +168,7 @@ Claude Opus 4.6

### Code Quality Assessment

Implementation is correct and complete. `generate_mnemonic` uses `secrets.token_bytes(16)` with a proper BIP-39 checksum (SHA256 first 4 bits appended to 128-bit entropy, then split into 12 x 11-bit indices). `derive_keys` uses PBKDF2-SHA256 at 600,000 iterations deriving 64 bytes split into two 32-byte keys. `encrypt`/`decrypt` use `AESGCM` with 12-byte `os.urandom` nonce; `decrypt` guards against short data and wraps all crypto exceptions into `CryptoError`. `sign`/`verify` use stdlib `hmac` with `compare_digest` for constant-time comparison. Wordlist is a `tuple` of 2048 words with a module-level `assert len(WORDLIST) == 2048`. All public functions have docstrings and type hints. `CryptoError` correctly migrated to `shared.exceptions` after Story 1.4 with a re-export in `shared.crypto` for backward compatibility.
Implementation is correct and complete. `generate_mnemonic` uses `secrets.token_bytes(16)` with a proper BIP-39 checksum (SHA256 first 4 bits appended to 128-bit entropy, then split into 12 x 11-bit indices + 4 salt word indices). `derive_keys` uses PBKDF2-SHA256 at 1,200,000 iterations deriving 64 bytes split into two 32-byte keys. `encrypt`/`decrypt` use `AESGCM` with 12-byte `os.urandom` nonce; `decrypt` guards against short data and wraps all crypto exceptions into `CryptoError`. `sign`/`verify` use stdlib `hmac` with `compare_digest` for constant-time comparison. Wordlist is a `tuple` of 2048 words with a module-level `assert len(WORDLIST) == 2048`. All public functions have docstrings and type hints. `CryptoError` correctly migrated to `shared.exceptions` after Story 1.4 with a re-export in `shared.crypto` for backward compatibility.

### Refactoring Performed

Expand Down Expand Up @@ -196,7 +196,7 @@ None.

### Performance Considerations

PBKDF2 at 600,000 iterations is the NIST-recommended minimum for SHA256. Tests use `iterations=1000` to keep the test suite fast - correct pattern.
PBKDF2 at 1,200,000 iterations is 2x the OWASP 2023 minimum for SHA256. Tests use `iterations=1000` to keep the test suite fast - correct pattern.

### Files Modified During Review

Expand Down
2 changes: 1 addition & 1 deletion docs/stories/3.4.story.md
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@ Done
- [x] Generate salt via `os.urandom(32)`
- [x] Create config file with `source_id`, `mnemonic_hash` (SHA256 of mnemonic), `salt`
- [x] Initialize store directory
- [x] Display 12 words to operator (one-time display, never stored in cleartext)
- [x] Display 16 words to operator (one-time display, never stored in cleartext)
- [x] Task 3: Implement `buncker serve` (AC: 2)
- [x] Load config
- [x] Read mnemonic from `BUNCKER_MNEMONIC` env var or prompt on stdin
Expand Down
Loading