Skip to content

Feat/windows junction support#2985

Draft
ChanTsune wants to merge 30 commits intomainfrom
feat/windows-junction-support
Draft

Feat/windows junction support#2985
ChanTsune wants to merge 30 commits intomainfrom
feat/windows-junction-support

Conversation

@ChanTsune
Copy link
Copy Markdown
Owner

No description provided.

Move the Windows reparse-point FFI out of pna into cli so pna stays
platform-dependency-free. Add PathnameEditor::edit_junction delegating
to a helper shared with edit_symlink. Declare StoreAs::Junction as
unconditional for exhaustive matching. Accept both absolute and relative
stored targets at extract for forward compatibility. Correct
detect_junction's error-kind mapping, switch to OsStr::encode_wide for
non-lossy path encoding in create_junction, and fix the push branch name.
Codex adversarial review flagged that the plan instructed implementers to
create cli/src/utils/os/windows/fs/mod.rs and edit
cli/src/utils/os/windows/mod.rs, but the repository uses the sibling
layout: cli/src/utils/os/windows.rs declares `pub(crate) mod fs;` and
cli/src/utils/os/windows/fs.rs already holds FileHandle, chmod, lchown,
open_read_metadata, and pub(crate) mod owner. Creating a new fs/mod.rs
would conflict with fs.rs and strand those exports.

Reparse and junction now live as additional submodules of the existing
fs module: Task 1.1 adds `pub(crate) mod reparse;` to windows/fs.rs and
creates windows/fs/reparse.rs; Task 1.4 adds `pub(crate) mod junction;`
and creates windows/fs/junction.rs. The existing contents of
windows/fs.rs are left untouched.
Codex adversarial review flagged that `DataKind::HardLink + fLTP=Directory`
(junction or non-Windows fallback symlink) still went through the default
`restore_metadata()` path in extract.rs, which would chmod/chown/ACL/xattr/
fflag the junction via follow-link syscalls and therefore mutate the
external directory outside the extraction root. Option (A) from the
review discussion closes the hole by bypassing `restore_metadata()` for
junction entries and applying only no-follow timestamp restoration via a
new `restore_link_timestamps_no_follow` helper. A `TODO:
junction-aware no-follow metadata (deferred follow-up)` block on that
helper and a new Out-of-Scope entry document the full option (C) path
for a follow-up plan.

Task 4.4 locks the invariant with a security regression test (T19) that
pre-sets a distinctive mode on the external target, runs extract with
`--keep-permission --allow-unsafe-links`, and asserts the external mode
is unchanged. Self-review, risk summary, testing matrix and
type-consistency sections are updated accordingly.
Codex adversarial review flagged that `detect_junction`'s
`raw_os_error() == Some(4390)` check against ERROR_NOT_A_REPARSE_POINT
would never fire because `read_reparse_point` and `create_junction`
round-trip `windows::core::Error` through
`io::Error::from_raw_os_error(e.code().0)`. The windows crate encodes
Win32 failures as HRESULT via `HRESULT_FROM_WIN32` (0x80070000 | code),
so the stored raw_os_error is the HRESULT form (e.g. 0x80071126), not
the canonical Win32 code (4390).

Introduce `io_error_from_win32` in the reparse module. It inspects the
HRESULT facility bits and, when FACILITY_WIN32 is set, extracts the low
16 bits as the Win32 code; other HRESULTs pass through as raw i32. Every
`CreateFileW` / `DeviceIoControl` error path in read_reparse_point and
create_junction now goes through this helper, so downstream consumers
such as `detect_junction` see Microsoft-documented Win32 codes.

Adds two pure-Rust unit tests that verify the FACILITY_WIN32 extraction
and the non-Win32 pass-through, so the helper is regression-fenced
without needing a Windows CI round-trip.
Writing-plans skipped the brainstorming→spec step and produced a plan
directly, with all design decisions iterated inside the plan through
grill and Codex adversarial review rounds. This commits the spec
retroactively so architectural intent lives separately from task-level
steps.

Captures locked decisions (HardLink+fLTP=Directory encoding, cli-only
module placement, PathnameEditor::edit_junction with shared helper,
Option A metadata safety with early return, HRESULT→Win32 error
translation, non-UTF-8 invariant from UTF-16 gate) alongside the
rejected alternatives (cases B'/C', pna-crate module placement, option
B/C metadata approaches, direct edit_symlink reuse, HRESULT-form error
matching) and the deferred follow-ups (relative-path optimization,
full junction-aware no-follow metadata, public reparse-point API).
The spec `docs/superpowers/specs/2026-04-19-windows-junction-support-design.md`
now owns the WHY — problem, architecture, interfaces, invariants,
rejected alternatives, follow-ups. The plan keeps the HOW — Task 1.1
through Task 6.2 with concrete code snippets, commit commands, and
verification steps.

Removed duplicated sections from the plan (Goal, Architecture, Tech
Stack narrative, full In-scope/Out-of-scope list, Risk Summary table,
Testing Matrix detailed rows, File Structure tables) and replaced them
with short cross-references plus plan-specific anchors: where each of
T1-T19 lands, which task enforces which safety invariant, which files
are touched. The Task List and its embedded implementation details are
untouched.
ASCII lexicographic order places Win32_System_IO before Win32_System_Ioctl
('O' 0x4F < 'o' 0x6F). The plan originally placed them in the opposite
order, which the Task 1.1 code-quality review flagged as the only
sortable-by-ASCII anomaly in the list. Swap the two entries in both
Cargo.toml and the plan so the list is strictly sorted — the stable-diff
rationale the plan cites now actually holds.

No functional impact; feature ordering does not affect compilation.
The Task 1.1 code-quality review flagged that the IO_REPARSE_TAG_SYMLINK
arm of parse_reparse_buffer shipped without direct unit coverage. Add
three tests and the sample_symlink_buffer helper that exercises the
symlink layout (with the extra 4-byte Flags word): one for an absolute
target where \\??\\ is stripped, one for a relative target where the
raw string is preserved, and one for a truncated buffer that should
error with InvalidData before the flags word is read.

Updates the plan's Task 1.1 Step 3 test module to mirror the final
reparse.rs file so plan and implementation stay in sync.
The Task 1.2 code-quality review flagged three consistency gaps against
the rest of the crate's Windows FFI:

- Reuse `crate::utils::str::encode_wide`, which rejects embedded NUL
  bytes before null-terminating. The previous inline
  `encode_wide().collect() + push(0)` would let a path with an embedded
  NUL reach CreateFileW and open a silently-truncated different file.
  All nine sibling call sites in `cli/src/utils/os/windows/` already use
  this helper.
- Use `PCWSTR::from_raw` instead of the tuple-struct constructor, again
  matching the nine sibling call sites.
- Downgrade `read_reparse_point` from `pub` to `pub(crate)` to match the
  spec §5 commitment that these primitives stay CLI-private. Promotion
  into the `pna` crate behind a `windows-fs` feature is captured as an
  explicit follow-up in spec §11.

Plan Task 1.2 Step 4 and spec §5 are updated to mirror the source so
they don't drift again.
The Task 1.3 code-quality review flagged a silent `u16` truncation
pitfall: the reparse-buffer header stores SubstituteName and PrintName
byte lengths as `u16`, and `create_junction` was casting
`(wide.len() * 2) as u16` plus computing `data_len = 8 + subst + print`
as `u16` — a path exceeding ~32 KB of UTF-16 bytes would silently wrap
and emit a malformed reparse buffer. The 16 KB reparse-buffer kernel
cap acts as a backstop, but relying on that to backstop silent
corruption is weaker than the helper's contract should be.

Move the SubstituteName / PrintName construction and the u16 overflow
check ahead of `std::fs::create_dir`, so an oversized target now
returns `InvalidInput` at the helper boundary without ever leaving a
half-created junction directory behind. Use `checked_add` for the
8-byte offset-header total so the check also catches the
32_760-plus-offset-header wrap case.

Plan Task 1.3 Step 4 mirrors the reorder.
The Task 2.1 code-quality review flagged two stylistic gaps against the
file's existing conventions:

- The four new unit tests used an `edit_junction_*` prefix while every
  other test in `cli/src/command/core/path.rs` uses an `editor_*`
  prefix. Rename to `editor_junction_*` so the test list stays
  consistently grep-able.
- The `edit_junction` doc comment captured the design rationale (why a
  separate method exists) but not the contract (substitutions applied;
  leading `/` and --strip-components NOT applied). Mirror the contract
  lines from `edit_symlink` so the two methods read as siblings, not
  as rationale + cross-reference.
Windows CI revealed that every `create_junction_round_trip` run was
hitting `ERROR_INVALID_REPARSE_DATA` (4392) — the kernel rejected our
IO_REPARSE_TAG_MOUNT_POINT buffer. Microsoft's REPARSE_DATA_BUFFER
contract requires both SubstituteName and PrintName inside PathBuffer
to be UTF-16 NUL-terminated, with the length fields excluding the NULs
and PrintNameOffset set after SubstituteNameLength plus the NUL.

Our writer omitted both terminators and pointed PrintNameOffset at
`subst_bytes_len` (should be `subst_bytes_len + 2`). The parser never
noticed because it slices by offset/length and never reads the NULs,
so `read_reparse_point_on_junction` (which reads a `mklink /J`-built
junction) passed all along while `create_junction_round_trip` failed.

Append a u16 0x0000 after each name, bump `PrintNameOffset` by 2, and
make `ReparseDataLength` account for the four extra NUL bytes. The
u16-overflow guard now also checks the two extra NUL bytes so
oversized targets are still caught before any filesystem mutation.
Plan Task 1.3 Step 4 mirrors the fix.
The Task 3.1 code-quality review flagged that the `StoreAs::Junction`
arm in `create_entry` built the `EntryReference` directly via
`EntryReference::from_utf8_preserve_root(target.to_str().unwrap())`,
bypassing the `PathnameEditor::edit_junction` method added in Task 2.1.
The consequence was that user `-s` / `--transform` substitutions
applied to symlink targets but silently NOT to junction targets, and
`edit_junction` itself was dead code on every platform.

Call `pathname_editor.edit_junction(target)` instead. The method
applies substitutions and returns an `EntryReference` via the shared
helper that `edit_symlink` also uses, so junction and symlink targets
behave consistently under user transforms. Invariant I1 (spec §7)
guarantees the target is valid UTF-8, so the helper's internal
`to_string_lossy` is effectively lossless for real inputs.

Updates spec §6 / §8 so the "load-bearing unwrap" paragraph is
replaced with the `edit_junction` rationale, and plan Task 3.1 Step 7
mirrors the final code. Remove the now-unused `EntryReference` import
from `cli/src/command/core.rs`.
Address all four Minor nits flagged by the Task 4.1 code-quality
review; none affect Invariant I2 correctness.

- Junction skip warning: drop the internal encoding term
  "HardLink+fLTP=Directory" and match the peer symlink/hardlink voice
  ("If you need to extract it, use `--allow-unsafe-links`.").
- Non-Windows fallback warning: demote to `log::debug!`. The per-entry
  `warn!` would spam on large Windows-origin archives extracted on
  Unix; the degradation (junction → symlink) is a known platform
  limitation, best surfaced at debug level.
- `create_junction_or_fallback` canonicalize: replace `unwrap_or` with
  an explicit match that logs the underlying error at `debug!` level
  before falling back to the raw join. Prevents permission-denied /
  ELOOP failures from being hidden behind `create_junction`'s generic
  "target must be absolute" message.
- Test pattern: switch `extract_junction_without_allow_unsafe_links_skips`
  to the peer CLI test convention (`utils::setup()` + relative
  `"{base}/{base}.pna"` paths + in-process `cli::Cli::try_parse_from`).
  Drops the subprocess invocation and the macOS `canonicalize`
  workaround, and aligns with existing peer tests in
  `cli/tests/cli/extract/`. The stderr assertion is dropped because
  the filesystem assertion ("link_dir does not exist") is the
  load-bearing check.

Plan Task 4.1 Step 2 and Step 4 mirror the final code.
CI surfaced that `extract_junction_with_allow_unsafe_links_creates_link`
panics under `wasm32-wasip1` with "Operation not permitted (os error 63)"
because wasi-preview1 does not expose symlink creation. The fallback
path `pna::fs::symlink` therefore fails when the test tries to
materialize the non-Windows junction-fallback symlink.

Add `#[cfg(not(target_family = "wasm"))]` to the test, matching the
pattern established by `cli/tests/cli/extract/hardlink.rs:95` and
peers which gate off WASM for the same reason. The skip-case test
(`extract_junction_without_allow_unsafe_links_skips`) is not gated
because it never reaches the symlink-creation path.

Plan Task 4.2 Step 1 code block mirrors the gate.
Match the neighboring `extract_junction_with_allow_unsafe_links_creates_link`
test's `let ft = meta.file_type();` hoist, avoiding three separate
`file_type()` calls inside a single `assert!`. Pure style cleanup flagged
by the Task 4.3 code-quality review.
The Task 4.4 spec-compliance review empirically proved the I2 security
fence was decorative: a regression that swaps
`restore_link_timestamps_no_follow` for the default `restore_metadata`
call slipped through the `extract_junction_does_not_mutate_external_target`
assertion. Root cause: `build_junction_fixture` produced a junction entry
with no Permission chunk, so `restore_metadata`'s chmod/chown branch
(gated on `item.metadata().permission().is_some()`) never entered — no
follow-link syscall ever touched the target, no matter what the extract
code did.

Extend the fixture helper surface:
- `build_junction_fixture_with_permission(target, mode)` stamps a
  Permission chunk with the given mode so extract's `restore_mode` actually
  fires when `--keep-permission` is passed.
- `build_junction_fixture` continues to produce a Permission-less entry
  for the other two tests that don't exercise the mode path.
- Both delegate to a shared
  `build_junction_fixture_with_optional_permission` to avoid duplication.

Retarget the I2 test to `build_junction_fixture_with_permission(..., 0o755)`
while still pre-setting the external directory to 0o700. Under a regression
that removes Task 4.1's early return, extract now calls `chmod(link, 0o755)`
which follows the symlink and changes the external target's mode to 0o755,
firing `baseline_perms != after_perms` with the "I2 violation" message.

Empirically verified by reverting Task 4.1's early return in a throwaway
experiment: the fence fires with `left 0o040700, right 0o040755`. With
the early return restored, the fence passes.

Plan Task 4.4 Step 1 code block mirrors the new helper surface.
Windows CI surfaced that `extract_junction_does_not_mutate_external_target`
panics with `MissingRequiredArgument: --unstable` on the
`--no-default-features` build of the `extract` subcommand. On Windows,
`--keep-permission` (and related metadata-restoration flags) is gated
behind `--unstable` per `cli/src/command/extract.rs:149-150`
(`#[cfg_attr(windows, arg(requires = "unstable", ...))]`).

Add `#[cfg(windows)] "--unstable"` to the test's CLI args, matching
the peer pattern in `cli/tests/cli/keep_all.rs` and
`cli/tests/cli/diff/metadata.rs`. Plan mirrored.
The extract path gates Windows junction creation on --allow-unsafe-links
(see `cli/src/command/extract.rs` junction arm), but the flag's help text
still advertised only "symbolic links and hard links". A user reading
`pna extract --help` or `pna compat bsdtar --help` today would not know
the flag also covers NTFS junctions.

Rewrite the four help strings (positive/negative x extract/bsdtar) to
list symbolic links, hard links, and Windows junctions together, and
regenerate the committed CLI reference so docs stay in sync.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 22, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6f4eda91-3c93-4dc8-b005-fdaf16c4367e

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/windows-junction-support

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added dependencies Pull requests that update a dependency file cli This issue is about cli application labels Apr 22, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for Windows NTFS junctions, enabling them to be detected during archive creation and restored during extraction. Junctions are represented using the PNA encoding for hard links with a directory target type. The implementation includes new Windows-specific utilities for reparse point manipulation and updates to the CLI's path editing and extraction logic. Review feedback identified a bug in the junction creation logic involving incorrect NT namespace prefixing and an invalid string literal, as well as a potential extraction failure when canonicalizing targets that do not yet exist.

Comment on lines +230 to +232
let mut subst_wide: Vec<u16> = r"\??\".encode_utf16().collect();
subst_wide.extend(target.as_os_str().encode_wide());
let print_wide: Vec<u16> = target.as_os_str().encode_wide().collect();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The target path provided to create_junction may come from std::fs::canonicalize, which on Windows prepends the verbatim prefix \\?\. If this prefix is present, prepending \??\ results in an invalid NT namespace path like \??\\\?\C:\.... Additionally, the raw string literal r"\??\" used in the original code is invalid in Rust because raw strings cannot end with an odd number of backslashes.

You should strip the \\?\ prefix (if present) from the target path and use a valid string literal for the NT prefix.

Suggested change
let mut subst_wide: Vec<u16> = r"\??\".encode_utf16().collect();
subst_wide.extend(target.as_os_str().encode_wide());
let print_wide: Vec<u16> = target.as_os_str().encode_wide().collect();
let mut target_wide: Vec<u16> = target.as_os_str().encode_wide().collect();
if target_wide.starts_with(&[92, 92, 63, 92]) {
target_wide.drain(0..4);
}
let mut subst_wide: Vec<u16> = "\\??\\".encode_utf16().collect();
subst_wide.extend(&target_wide);
let print_wide = target_wide;

Comment on lines +2066 to +2076
match std::fs::canonicalize(&joined) {
Ok(canon) => canon,
Err(e) => {
log::debug!(
"Failed to canonicalize junction target {}: {}; using raw join",
joined.display(),
e
);
joined
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using std::fs::canonicalize here will fail if the junction target does not yet exist on disk, which is common when extracting an archive containing both a directory and a junction pointing to it. If canonicalize fails, the code falls back to joined, which might be a relative path. However, create_junction (in reparse.rs) explicitly rejects relative targets.

Consider using std::path::absolute to ensure the path is absolute without requiring it to exist on disk, while still attempting canonicalize for a cleaner path if possible.

            let absolute = std::path::absolute(&joined).unwrap_or_else(|_| joined.clone());
            match std::fs::canonicalize(&absolute) {
                Ok(canon) => canon,
                Err(e) => {
                    log::debug!(
                        "Failed to canonicalize junction target {}: {}; using absolute path",
                        absolute.display(),
                        e
                    );
                    absolute
                }
            }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cli This issue is about cli application dependencies Pull requests that update a dependency file

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant