Skip to content

virtme-init: Reap zombie processes#428

Open
amorenoz wants to merge 1 commit intoarighi:mainfrom
amorenoz:fix-zombies
Open

virtme-init: Reap zombie processes#428
amorenoz wants to merge 1 commit intoarighi:mainfrom
amorenoz:fix-zombies

Conversation

@amorenoz
Copy link

@amorenoz amorenoz commented Feb 28, 2026

If the users create ofphaned processes, it's init's task to reap them or else they'll remain as zombies.

For example:

vng --user root
          _      _
   __   _(_)_ __| |_ _ __ ___   ___       _ __   __ _
   \ \ / / |  __| __|  _   _ \ / _ \_____|  _ \ / _  |
    \ V /| | |  | |_| | | | | |  __/_____| | | | (_| |
     \_/ |_|_|   \__|_| |_| |_|\___|     |_| |_|\__  |
                                                |___/
   kernel version: 6.19.0-virtme x86_64
   (CTRL+d to exit)

bash-5.3# (setsid sleep 1 &); sleep 2; ps aux | grep 'sleep'
root       342  0.0  0.0      0     0 ?        Zs   18:11   0:00 [sleep] <defunct>
root       345  0.0  0.0   3464  1932 ttyS0    S+   18:11   0:00 grep 'sleep'
bash-5.3#

Replace the specific waitpid (internal to Command::output()) with a simple loop that will also reap zombies if they appear.

@rbmarliere
Copy link
Contributor

Does the same happen with the shell init? (--no-virtme-ng-init)

@amorenoz
Copy link
Author

amorenoz commented Mar 4, 2026

Does the same happen with the shell init? (--no-virtme-ng-init)

No, that one seems to work all right.

It seems clippy is not happy if we don't call Process::wait so maybe we need to spawn a thread that does the zombie reaping after all (I tried to avoid it precisely because it could interact with the other Process::wait).

@amorenoz
Copy link
Author

amorenoz commented Mar 5, 2026

Is --no-virtme-ng-init planned to stay around or is the plan to remove it eventually?

@marcosps
Copy link
Collaborator

marcosps commented Mar 5, 2026

Is --no-virtme-ng-init planned to stay around or is the plan to remove it eventually?

Well, I find it useful. In places where I just clone virtme-ng, I can start a VM without having cargo/rust around. For most cases, maybe people will use the packaged version, but at least for me, the recent fixes are important for my workloads.

@matttbe
Copy link
Collaborator

matttbe commented Mar 5, 2026

The Bash version is also easier to hack to quickly try things, and it doesn't have some optimisations done in the Rust version which could cause some issues.

Also, please note that some packages still don't contain virtme-ng-init, e.g. on Debian and cie.

If the users create ofphaned processes, it's init's task to
reap them or else they'll remain as zombies.

For example:
  vng --user root
            _      _
     __   _(_)_ __| |_ _ __ ___   ___       _ __   __ _
     \ \ / / |  __| __|  _   _ \ / _ \_____|  _ \ / _  |
      \ V /| | |  | |_| | | | | |  __/_____| | | | (_| |
       \_/ |_|_|   \__|_| |_| |_|\___|     |_| |_|\__  |
                                                  |___/
     kernel version: 6.19.0-virtme x86_64
     (CTRL+d to exit)

  bash-5.3# (setsid sleep 1 &); sleep 2; ps aux | grep 'sleep'
  root       342  0.0  0.0      0     0 ?        Zs   18:11   0:00 [sleep] <defunct>
  root       345  0.0  0.0   3464  1932 ttyS0    S+   18:11   0:00 grep 'sleep'
  bash-5.3#

Replace the specific waitpid (internal to `Command::output()`) with a
simple loop that will also reap zombies if they appear.

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
@amorenoz
Copy link
Author

amorenoz commented Mar 6, 2026

Since there is a workaround, this is not a big issue for me anymore. Feel free to close the PR. If you would like this issue to be solved in the rust init in any other way, let me know and I'll give it a try.

@matttbe
Copy link
Collaborator

matttbe commented Mar 6, 2026

I think it is fine to ignore the clippy warning, but I'm not a rust expert. @arighi WDYT?

@arighi
Copy link
Owner

arighi commented Mar 6, 2026

Is --no-virtme-ng-init planned to stay around or is the plan to remove it eventually?

The bash init is definitely going to stay. We can't cross-compile virtme-ng-init for all the architectures and the bash init allows to easily support them (--arch).

@arighi
Copy link
Owner

arighi commented Mar 7, 2026

If the users create ofphaned processes, it's init's task to reap them or else they'll remain as zombies.

Nit: "ofphaned" -> "orphaned"


fn wait_for_child(child: i32) {
loop {
match wait() {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait() can fail with EINTR when the process receives a signal. Right now any error leads to a log + return, so the direct child is never reaped and can stay a zombie. Maybe add:

Err(Errno::EINTR) => continue,

.output()
.spawn()
.expect("Failed to start shell session");
wait_for_child(child.id() as i32);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should apply the same spawn/wait pattern to run_user_script() as well?

@arighi
Copy link
Owner

arighi commented Mar 7, 2026

I think it is fine to ignore the clippy warning, but I'm not a rust expert. @arighi WDYT?

I think it's correct to ignore: the warning is because we spawned a child without waiting on it, but we do wait (via a custom wait() loop), so it's just lint that doesn't understand this pattern.

Copy link
Collaborator

@matttbe matttbe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(adapting the status after Andrea's review, plus there are some conflicts to solve if you don't mind rebasing)

@arighi
Copy link
Owner

arighi commented Mar 10, 2026

BTW, I'm not able to reproduce this problem, I don't see any zombie tasks. Did it go away with some of the recent changes?

@matttbe
Copy link
Collaborator

matttbe commented Mar 10, 2026

BTW, I'm not able to reproduce this problem, I don't see any zombie tasks. Did it go away with some of the recent changes?

I can still reproduce it on my side:

$ ./vng --user root -r
# (setsid sleep 1 &); sleep 2; ps aux | grep "[s]leep"
root      1102  0.0  0.0      0     0 ?        Zs   11:31   0:00 [sleep] <defunct>

(also without --user root)

As @amorenoz said before, this issue is only with vng-init, not with the bash init script.

How did you try?

@arighi
Copy link
Owner

arighi commented Mar 10, 2026

$ ./vng --user root -r
# (setsid sleep 1 &); sleep 2; ps aux | grep "[s]leep"
root      1102  0.0  0.0      0     0 ?        Zs   11:31   0:00 [sleep] <defunct>

@matttbe Ah yes, with this I can reproduce it. Ok, then we should fix this and once we land this one we can cut a new version, wdyt?

@matttbe
Copy link
Collaborator

matttbe commented Mar 10, 2026

Ok, then we should fix this and once we land this one we can cut a new version, wdyt?

Good idea to cut a new version!

Do you mean you are planning to apply the suggestions you sent here?

@rbmarliere
Copy link
Contributor

@arighi #445 might be worth including in the new version, thx!

@arighi
Copy link
Owner

arighi commented Mar 10, 2026

Ok, then we should fix this and once we land this one we can cut a new version, wdyt?

Good idea to cut a new version!

Do you mean you are planning to apply the suggestions you sent here?

Yeah we need to resolve the conflicts and include my comments, if @amorenoz doesn't have time maybe I can add these changes on top of this commit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants