feature: initial version of a rust based init process for the initramfs#2
feature: initial version of a rust based init process for the initramfs#2JoergZeidler wants to merge 0 commit intoomnect:mainfrom
Conversation
There was a problem hiding this comment.
Code Review — PR1–PR5 Coverage
All five planned deliverables are present and structurally correct. The following issues were found during review, grouped by severity.
🔵 Design Notes (informational, no action required)
Device detection approach: Plan specified stat(2) + sysfs lookup. Implementation uses root= cmdline parsing. This is simpler and equally correct — the kernel has already resolved the device by the time initramfs runs. Acceptable deviation.
GRUB detection ordering: create_bootloader() checks for grub-editenv in rootfs and GrubBootloader::new() checks grubenv existence under rootfs/boot/. Both require rootfs/boot to already be mounted, which the call order in main.rs satisfies. Fragile if call order changes — worth a comment.
ROOTFS_DIR env var in Config: Useful for integration tests but unexpected for PID 1. Low risk since fallback to /rootfs is correct.
JanZachmann
left a comment
There was a problem hiding this comment.
Inline review comments follow. See individual file annotations for details.
| } | ||
|
|
||
| // Determine release mode early for use in error handling | ||
| let is_release_image = Config::load().map(|c| c.is_release_image).unwrap_or(false); |
There was a problem hiding this comment.
🟡 Should Fix — early Config::load() before rootfs is mounted
Config::load() reads /rootfs/etc/os-release to determine is_release_image, but rootfs is not mounted yet at this point. It silently falls back to the initramfs /etc/os-release. If the initramfs image does not carry OMNECT_RELEASE_IMAGE=1, a release image will spawn a debug shell on the first fatal error instead of looping.
Fix: Read only /proc/cmdline here; defer os-release parsing until after mount_partitions succeeds.
src/main.rs
Outdated
| // Mount rootfs read-only | ||
| if let Some(root_dev) = layout.partitions.get("rootCurrent") { | ||
| // Run fsck first | ||
| if let Ok(result) = check_filesystem_lenient(root_dev) { |
There was a problem hiding this comment.
🔴 Critical — FsckRequiresReboot silently swallowed
if let Ok(result) discards Err(FsckRequiresReboot). check_filesystem_lenient intentionally re-propagates fsck exit code 2 as an error variant, but this pattern drops it — the system continues booting when it should reboot.
The same bug is present on all six partition checks below (boot, factory, cert, etc, data).
// ❌ Current
if let Ok(result) = check_filesystem_lenient(root_dev) { ... }
// ✅ Fix
let result = check_filesystem_lenient(root_dev)?;|
|
||
| fn save_fsck_status(&mut self, partition: &str, output: &str, code: i32) -> Result<()> { | ||
| let var_name = format!("{}{}", FSCK_VAR_PREFIX, partition); | ||
| let value = format!("{}:{}", code, output); |
There was a problem hiding this comment.
🔴 Critical — fsck output in grubenv will corrupt the environment block
The raw "code:output" string is written directly to the 1024-byte grubenv block. With up to 5 partitions each producing 80–500+ bytes of fsck output, this will overflow and silently truncate all bootloader environment variables.
Why compress_and_encode (the UBoot approach) is not a reliable fix:
- gzip has ~18 bytes of fixed overhead; base64 then inflates output by 33%.
- Small inputs (healthy partition, ~80 bytes) often produce a larger blob after compression.
- Corrupted-filesystem output can be hundreds of lines — no size guarantee exists.
- UBoot uses this approach too, and is equally at risk; it just has not hit the limit in testing yet.
Correct fix — separate the bootloader signal from diagnostic data:
Store only the exit code in grubenv (this is all the bootloader and ODS need for boot decisions):
// ✅ Fix: store exit code only
let var_name = format!("{}{}", FSCK_VAR_PREFIX, partition);
self.set_env(&var_name, Some(&code.to_string()))Write full fsck output to /data/var/log/fsck/<partition>.log. The /data partition is persistent, unbounded, and readable by ODS after boot. This is the correct place for diagnostic data.
The same change should be applied to UBootBootloader::save_fsck_status.
| // Move critical mounts to new root before switching | ||
| move_mount("/dev", &new_root.join("dev"))?; | ||
| move_mount("/proc", &new_root.join("proc"))?; | ||
| move_mount("/sys", &new_root.join("sys"))?; |
There was a problem hiding this comment.
🔴 Critical — /run not moved before switch_root
Only /dev, /proc, /sys are moved. /run/omnect-device-service/ is populated immediately before switch_root is called (via create_ods_runtime_files), but it is not moved here. After the root pivot, ODS will find an empty /run/omnect-device-service/ and fail to read its runtime state.
// ✅ Fix: add after the /sys line
move_mount("/run", &new_root.join("run"))?;
src/runtime/switch_root.rs
Outdated
| ).map_err(|e| { | ||
| InitramfsError::Io(std::io::Error::new( | ||
| std::io::ErrorKind::Other, | ||
| format!("Failed to move to {}", e), |
There was a problem hiding this comment.
🔴 Critical — error message loses all diagnostic context
format!("Failed to move to {}", e) only prints the errno. The source path and target path are absent, making the error undebuggable in the kernel log.
// ✅ Fix
format!("Failed to move {} → {}: {}", source, target.display(), e)
Cargo.toml
Outdated
| thiserror = { version = "2.0", default-features = false } | ||
|
|
||
| # Serialization | ||
| serde = { version = "1.0", features = ["derive"] } |
There was a problem hiding this comment.
🟡 Should Fix — serde and serde_json missing default-features = false
Required by project coding standards (CLAUDE.md — Dependency Management).
# ✅ Fix
serde = { version = "1.0", default-features = false, features = ["derive"] }
serde_json = { version = "1.0", default-features = false }| .and_then(|s| s.to_str()) | ||
| .unwrap_or(""); | ||
|
|
||
| root_part_str.ends_with('2') || root_part_str.ends_with("p2") |
There was a problem hiding this comment.
🟡 Should Fix — fragile partition suffix check
ends_with('2') also matches partition numbers 12, 22, 32, etc. on devices with 10+ partitions (e.g., nvme0n1p12).
// ✅ Fix: parse the numeric suffix and compare to PARTITION_NUM_ROOT_A
let suffix: u32 = root_part_str
.trim_start_matches(|c: char| !c.is_ascii_digit())
.parse()
.unwrap_or(0);
suffix == PARTITION_NUM_ROOT_A| use std::time::{Duration, Instant}; | ||
| use std::thread; | ||
|
|
||
|
|
There was a problem hiding this comment.
🟡 Should Fix — double blank line, cargo fmt -- --check will fail
Remove one of the two consecutive blank lines here. Also note: src/filesystem/mount.rs has trailing whitespace in move_mount that will similarly fail cargo fmt. Run cargo fmt to fix both.
Know issues:
Remark: Just tested on a rpi4.
Additional changes see: meta-omnect Draft PR: omnect/meta-omnect#636