Skip to content

feature: initial version of a rust based init process for the initramfs#2

Closed
JoergZeidler wants to merge 0 commit intoomnect:mainfrom
JoergZeidler:bootable_omnect_os
Closed

feature: initial version of a rust based init process for the initramfs#2
JoergZeidler wants to merge 0 commit intoomnect:mainfrom
JoergZeidler:bootable_omnect_os

Conversation

@JoergZeidler
Copy link

Know issues:

  • sporadically not bootable regarding fsck filesystem errors

Remark: Just tested on a rpi4.

Additional changes see: meta-omnect Draft PR: omnect/meta-omnect#636

@JoergZeidler JoergZeidler requested a review from JanZachmann March 3, 2026 12:02
Copy link

@JanZachmann JanZachmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — PR1–PR5 Coverage

All five planned deliverables are present and structurally correct. The following issues were found during review, grouped by severity.


🔵 Design Notes (informational, no action required)

Device detection approach: Plan specified stat(2) + sysfs lookup. Implementation uses root= cmdline parsing. This is simpler and equally correct — the kernel has already resolved the device by the time initramfs runs. Acceptable deviation.

GRUB detection ordering: create_bootloader() checks for grub-editenv in rootfs and GrubBootloader::new() checks grubenv existence under rootfs/boot/. Both require rootfs/boot to already be mounted, which the call order in main.rs satisfies. Fragile if call order changes — worth a comment.

ROOTFS_DIR env var in Config: Useful for integration tests but unexpected for PID 1. Low risk since fallback to /rootfs is correct.

Copy link

@JanZachmann JanZachmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline review comments follow. See individual file annotations for details.

}

// Determine release mode early for use in error handling
let is_release_image = Config::load().map(|c| c.is_release_image).unwrap_or(false);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Should Fix — early Config::load() before rootfs is mounted

Config::load() reads /rootfs/etc/os-release to determine is_release_image, but rootfs is not mounted yet at this point. It silently falls back to the initramfs /etc/os-release. If the initramfs image does not carry OMNECT_RELEASE_IMAGE=1, a release image will spawn a debug shell on the first fatal error instead of looping.

Fix: Read only /proc/cmdline here; defer os-release parsing until after mount_partitions succeeds.

src/main.rs Outdated
// Mount rootfs read-only
if let Some(root_dev) = layout.partitions.get("rootCurrent") {
// Run fsck first
if let Ok(result) = check_filesystem_lenient(root_dev) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Critical — FsckRequiresReboot silently swallowed

if let Ok(result) discards Err(FsckRequiresReboot). check_filesystem_lenient intentionally re-propagates fsck exit code 2 as an error variant, but this pattern drops it — the system continues booting when it should reboot.

The same bug is present on all six partition checks below (boot, factory, cert, etc, data).

// ❌ Current
if let Ok(result) = check_filesystem_lenient(root_dev) { ... }

// ✅ Fix
let result = check_filesystem_lenient(root_dev)?;


fn save_fsck_status(&mut self, partition: &str, output: &str, code: i32) -> Result<()> {
let var_name = format!("{}{}", FSCK_VAR_PREFIX, partition);
let value = format!("{}:{}", code, output);
Copy link

@JanZachmann JanZachmann Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Critical — fsck output in grubenv will corrupt the environment block

The raw "code:output" string is written directly to the 1024-byte grubenv block. With up to 5 partitions each producing 80–500+ bytes of fsck output, this will overflow and silently truncate all bootloader environment variables.

Why compress_and_encode (the UBoot approach) is not a reliable fix:

  • gzip has ~18 bytes of fixed overhead; base64 then inflates output by 33%.
  • Small inputs (healthy partition, ~80 bytes) often produce a larger blob after compression.
  • Corrupted-filesystem output can be hundreds of lines — no size guarantee exists.
  • UBoot uses this approach too, and is equally at risk; it just has not hit the limit in testing yet.

Correct fix — separate the bootloader signal from diagnostic data:

Store only the exit code in grubenv (this is all the bootloader and ODS need for boot decisions):

// ✅ Fix: store exit code only
let var_name = format!("{}{}", FSCK_VAR_PREFIX, partition);
self.set_env(&var_name, Some(&code.to_string()))

Write full fsck output to /data/var/log/fsck/<partition>.log. The /data partition is persistent, unbounded, and readable by ODS after boot. This is the correct place for diagnostic data.

The same change should be applied to UBootBootloader::save_fsck_status.

// Move critical mounts to new root before switching
move_mount("/dev", &new_root.join("dev"))?;
move_mount("/proc", &new_root.join("proc"))?;
move_mount("/sys", &new_root.join("sys"))?;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Critical — /run not moved before switch_root

Only /dev, /proc, /sys are moved. /run/omnect-device-service/ is populated immediately before switch_root is called (via create_ods_runtime_files), but it is not moved here. After the root pivot, ODS will find an empty /run/omnect-device-service/ and fail to read its runtime state.

// ✅ Fix: add after the /sys line
move_mount("/run", &new_root.join("run"))?;

).map_err(|e| {
InitramfsError::Io(std::io::Error::new(
std::io::ErrorKind::Other,
format!("Failed to move to {}", e),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Critical — error message loses all diagnostic context

format!("Failed to move to {}", e) only prints the errno. The source path and target path are absent, making the error undebuggable in the kernel log.

// ✅ Fix
format!("Failed to move {} → {}: {}", source, target.display(), e)

Cargo.toml Outdated
thiserror = { version = "2.0", default-features = false }

# Serialization
serde = { version = "1.0", features = ["derive"] }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Should Fix — serde and serde_json missing default-features = false

Required by project coding standards (CLAUDE.md — Dependency Management).

# ✅ Fix
serde      = { version = "1.0", default-features = false, features = ["derive"] }
serde_json = { version = "1.0", default-features = false }

.and_then(|s| s.to_str())
.unwrap_or("");

root_part_str.ends_with('2') || root_part_str.ends_with("p2")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Should Fix — fragile partition suffix check

ends_with('2') also matches partition numbers 12, 22, 32, etc. on devices with 10+ partitions (e.g., nvme0n1p12).

// ✅ Fix: parse the numeric suffix and compare to PARTITION_NUM_ROOT_A
let suffix: u32 = root_part_str
    .trim_start_matches(|c: char| !c.is_ascii_digit())
    .parse()
    .unwrap_or(0);
suffix == PARTITION_NUM_ROOT_A

use std::time::{Duration, Instant};
use std::thread;


Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Should Fix — double blank line, cargo fmt -- --check will fail

Remove one of the two consecutive blank lines here. Also note: src/filesystem/mount.rs has trailing whitespace in move_mount that will similarly fail cargo fmt. Run cargo fmt to fix both.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants