Date:      Mon, 31 Mar 2014 19:29:49 -0300
From:      Marcelo Gondim <gondim@bsdinfo.com.br>
To:        freebsd-stable@freebsd.org
Cc:        freebsd-net@freebsd.org
Subject:   Re: Process handlers, and zombies, or preap(1)
Message-ID:  <5339EC5D.3030408@bsdinfo.com.br>
In-Reply-To: <f5bfca4537aaca03ef53ae45950ef764.authenticated@ultimatedns.net>
References:  <20140331211147.GA52184@anubis.morrow.me.uk> <f5bfca4537aaca03ef53ae45950ef764.authenticated@ultimatedns.net>

On 31/03/14 18:39, Chris H wrote:
>> Quoth "Chris H" <bsd-lists@bsdforge.com>:
>>>   I'm evaluating/experimenting on releng_9. The install, and now
>>> custom kernel have nothing exotic, or anything out of the ordinary.
>>> top(1), and ps(1) indicate a (1) zombie, or <defunct> process. On
>>> my releng_8 systems, when I occasionally encounter one of these,
>>> they soon disappear (are reaped) from the process table. While I
>>> have not investigated this far enough on both versions to determine
>>> whether the parent process reaped the child on the releng_8 systems,
>>> and the parent on releng_9 is simply an irresponsible parent, eg;
>>> a different parent.
>> What is the parent?
> Sorry, that /should/ have been clearer. :)
> Meaning; the processes (parents) that are reaping the zombies on releng_8
> are different than those I'm seeing on releng_9.
> In other words; On releng_8, I see a zombie, then seconds later, it's
> gone. On releng_9, I see a zombie, and it never leaves.
> Is the "parent" of the dead "child" on releng_9, different than that of
> the parent on releng_8. I couldn't possibly expect you to know. But not
> having been able to catch the parent process reaping the defunct child
> on releng_8, before it has reaped it. I cannot know. Which led me to ask;
> Is there anything different on releng_9 that might cause zombies to
> remain permanently in the process table?
> A bit wordy, perhaps. But makes the point. No? :)
>
>>> Before I do, I was wondering if there was any
>>> specific difference between the 2 versions that might cause better
>>> handling of such situations. While I recognize that resource
>>> starvation is HIGHLY unlikely, except by perhaps a rouge parent
>> A rouge parent? :)
> Yes. An unfit parent, that will not watch after its child(ren). We
> have agencies in the US that seek to end such delinquencies. Maybe
> FreeBSD could employ such tactics. :)
>
>>> spawning multitudes of zombies. I thought it might be useful for
>>> "housekeeping" to 1) provide a process table housekeeper (zombie
>>> reaper),
>> That's called init(8). When the parent exits, init will wait for the
>> zombie.
>>
>>> or 2) create a system utility/command like SunOS/OpenSolaris
>>> has; preap(1).
>> That seems like a bad idea, to me. Generally speaking I would expect it
>> to be safer to kill and restart the parent, allowing init to do its job.
> Maybe. Maybe not. I think it depends on the parent process, and what impact
> HUPing it, will have on the system. Tho this should not be an excuse for
> not fixing the problem parent. But rather, a stop-gap, until a suitable
> fix is created/obtained (for the parent).
>
>
> Thanks for taking the time to respond, Ben.
>
> --Chris
>
>> Ben
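
For reference, the reaping behavior discussed in the quoted thread can be sketched as follows. This is a minimal illustration, not code from any of the programs mentioned: the function name `spawn_and_reap` and the exit code are hypothetical. A child that has exited stays in the process table as a zombie until its parent calls one of the wait(2) family; if the parent exits first, init(8) inherits the child and waits for it.

```python
import os
import time


def spawn_and_reap(exit_code=7):
    """Fork a child that exits at once, then reap it from the parent.

    Returns (reaped_expected_pid, child_exit_code).
    """
    pid = os.fork()
    if pid == 0:
        # Child: exit immediately, skipping the parent's cleanup handlers.
        os._exit(exit_code)

    # Parent: until waitpid() is called, the exited child remains in the
    # process table as a zombie (state "Z" in ps).
    time.sleep(0.1)

    # Reap the child; this releases its process-table entry.
    reaped_pid, status = os.waitpid(pid, 0)
    return reaped_pid == pid, os.WEXITSTATUS(status)
```

A parent that never makes the `waitpid()` call, and never exits, is the "unfit parent" case described above: its dead children stay listed as `<defunct>`.
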
Could the behavior discussed above be related to the problem I'm having
with zombie processes?

[...]
40945  -  Is       0:00.01 sshd: luciele [priv] (sshd)
40946  -  Z        0:00.01 <defunct>
40947  -  IW       0:00.00 sshd: luciele [pam] (sshd)
44376  -  Is       0:00.01 sshd: unknown [priv] (sshd)
44377  -  Z        0:00.01 <defunct>
44378  -  IW       0:00.00 sshd: unknown [pam] (sshd)
58892  -  IW       0:00.00 /usr/local/sbin/httpd -DNOHTTPACCEPT
58978  -  IW       0:00.00 /usr/local/sbin/httpd -DNOHTTPACCEPT
61361  -  IW       0:00.00 /usr/local/sbin/httpd -DNOHTTPACCEPT
61684  -  Is       0:00.01 sshd: unknown [priv] (sshd)
61685  -  Z        0:00.01 <defunct>
61692  -  IW       0:00.00 sshd: unknown [pam] (sshd)
78346  -  Is       0:00.01 sshd: unknown [priv] (sshd)
78347  -  Z        0:00.01 <defunct>
78351  -  IW       0:00.00 sshd: unknown [pam] (sshd)
[...]
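
The listing above does not include a parent PID column, so it cannot answer the earlier question "What is the parent?" on its own. A rough sketch of one way to pair each zombie with its parent is shown here. It assumes `ps` is asked for the `pid`, `ppid`, `stat`, and `command` columns, in that order; the helper names `find_zombies` and `list_zombies` are hypothetical.

```python
import subprocess


def find_zombies(ps_output):
    """Return (pid, ppid) pairs for rows whose state starts with 'Z'.

    Expects a header row followed by "PID PPID STAT COMMAND" rows.
    """
    zombies = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 3)
        if len(fields) < 3:
            continue
        pid, ppid, state = fields[0], fields[1], fields[2]
        if state.startswith("Z"):
            zombies.append((int(pid), int(ppid)))
    return zombies


def list_zombies():
    # Assumed invocation: all processes, with the four columns parsed above.
    out = subprocess.run(
        ["ps", "-axo", "pid,ppid,stat,command"],
        capture_output=True, text=True, check=True,
    ).stdout
    return find_zombies(out)
```

Calling `list_zombies()` on the affected machine would show whether each `<defunct>` entry belongs to the neighbouring `sshd: ... [priv]` process, which the adjacent PIDs suggest but do not prove.
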

# procstat -f 40945
   PID COMM               FD T V FLAGS     REF  OFFSET PRO NAME
40945 sshd              text v r r-------  -       - - /usr/sbin/sshd
40945 sshd               cwd v d r-------  -       - - /
40945 sshd              root v d r-------  -       - - /
40945 sshd                 0 v c rw------  6       0 - /dev/null
40945 sshd                 1 v c rw------  6       0 - /dev/null
40945 sshd                 2 v c rw------  6       0 - /dev/null
40945 sshd                 3 s - rw---n--  2       0 TCP 186.xxx.xx.10:4321 186.xxx.xx.8:64762
40945 sshd                 4 s - rw------  1       0 UDS -
40945 sshd                 5 p - rw------  2       0 - -
40945 sshd                 6 s - rw------  2       0 UDS -

# procstat -f 44376
   PID COMM               FD T V FLAGS     REF  OFFSET PRO NAME
44376 sshd              text v r r-------  -       - - /usr/sbin/sshd
44376 sshd               cwd v d r-------  -       - - /
44376 sshd              root v d r-------  -       - - /
44376 sshd                 0 v c rw------  6       0 - /dev/null
44376 sshd                 1 v c rw------  6       0 - /dev/null
44376 sshd                 2 v c rw------  6       0 - /dev/null
44376 sshd                 3 s - rw---n--  2       0 TCP 186.xxx.xx.10:4321 186.xxx.xx.8:64368
44376 sshd                 4 s - rw------  1       0 UDS -
44376 sshd                 5 p - rw------  2       0 - -
44376 sshd                 6 s - rw------  2       0 UDS -

# procstat -f 61684
   PID COMM               FD T V FLAGS     REF  OFFSET PRO NAME
61684 sshd              text v r r-------  -       - - /usr/sbin/sshd
61684 sshd               cwd v d r-------  -       - - /
61684 sshd              root v d r-------  -       - - /
61684 sshd                 0 v c rw------  6       0 - /dev/null
61684 sshd                 1 v c rw------  6       0 - /dev/null
61684 sshd                 2 v c rw------  6       0 - /dev/null
61684 sshd                 3 s - rw---n--  2       0 TCP 186.xxx.xx.10:4321 186.xxx.xx.8:61415
61684 sshd                 4 s - rw------  1       0 UDS -
61684 sshd                 5 p - rw------  2       0 - -
61684 sshd                 6 s - rw------  2       0 UDS -

# procstat -f 78346
   PID COMM               FD T V FLAGS     REF  OFFSET PRO NAME
78346 sshd              text v r r-------  -       - - /usr/sbin/sshd
78346 sshd               cwd v d r-------  -       - - /
78346 sshd              root v d r-------  -       - - /
78346 sshd                 0 v c rw------  6       0 - /dev/null
78346 sshd                 1 v c rw------  6       0 - /dev/null
78346 sshd                 2 v c rw------  6       0 - /dev/null
78346 sshd                 3 s - rw---n--  2       0 TCP 186.xxx.xx.10:4321 186.xxx.xx.8:50994
78346 sshd                 4 s - rw------  1       0 UDS -
78346 sshd                 5 p - rw------  2       0 - -
78346 sshd                 6 s - rw------  2       0 UDS -

# netstat -n | grep CLOSED
tcp4       0      0 186.xxx.xx.10.4321     186.xxx.xx.8.64368 CLOSED
tcp4       0      0 186.xxx.xx.10.4321     186.xxx.xx.8.61415 CLOSED
tcp4       0      0 186.xxx.xx.10.4321     186.xxx.xx.8.50994 CLOSED
tcp4       0      0 186.xxx.xx.10.4321     186.xxx.xx.8.64762 CLOSED

# uname -a
FreeBSD xxxxx.xxxxx.xxx.xx 10.0-STABLE FreeBSD 10.0-STABLE #6 r263882: Fri Mar 28 20:28:40 BRT 2014 root@xxxxx.xxxxx.xxx.xx:/usr/obj/usr/src/sys/GONDIM10  amd64

Cheers,
Gondim


