Date: Mon, 31 Mar 2014 19:29:49 -0300
From: Marcelo Gondim <gondim@bsdinfo.com.br>
To: freebsd-stable@freebsd.org
Cc: freebsd-net@freebsd.org
Subject: Re: Process handlers, and zombies, or preap(1)
Message-ID: <5339EC5D.3030408@bsdinfo.com.br>
In-Reply-To: <f5bfca4537aaca03ef53ae45950ef764.authenticated@ultimatedns.net>
References: <20140331211147.GA52184@anubis.morrow.me.uk> <f5bfca4537aaca03ef53ae45950ef764.authenticated@ultimatedns.net>

On 31/03/14 18:39, Chris H wrote:
>> Quoth "Chris H" <bsd-lists@bsdforge.com>:
>>> I'm evaluating/experimenting on releng_9. The install, and now
>>> custom kernel have nothing exotic, or anything out of the ordinary.
>>> top(1), and ps(1) indicate a (1) zombie, or <defunct> process. On
>>> my releng_8 systems, when I occasionally encounter one of these,
>>> they soon disappear (are reaped) from the process table. While I
>>> have not investigated this far enough on both versions to determine
>>> whether the parent process reaped the child on the releng_8 systems,
>>> and the parent on releng_9 is simply an irresponsible parent, eg;
>>> a different parent.
>> What is the parent?
> Sorry, that /should/ have been clearer. :)
> Meaning; the processes (parents) that are reaping the zombies on releng_8
> are different than those I'm seeing on releng_9.
> In other words; On releng_8, I see a zombie, then seconds later, it's
> gone. On releng_9, I see a zombie, and it never leaves.
> Is the "parent" of the dead "child" on releng_9, different than that of
> the parent on releng_8. I couldn't possibly expect you to know. But not
> having been able to catch the parent process reaping the defunct child
> on releng_8, before it has reaped it. I cannot know. Which led me to ask;
> Is there anything different on releng_9, that might cause zombies
> terminally within the process table?
> A bit wordy, perhaps. But makes the point. No? :)
>
>>> Before I do, I was wondering if there was any
>>> specific difference between the 2 versions that might cause better
>>> handling of such situations. While I recognize that resource
>>> starvation is HIGHLY unlikely, except by perhaps a rouge parent
>> A rouge parent? :)
> Yes. An unfit parent, that will not watch after its child(ren). We
> have agencies in the US that seek to end such delinquencies. Maybe
> FreeBSD could employ such tactics. :)
>
>>> spawning multitudes of zombies. I thought it might be useful for
>>> "housekeeping" to 1) provide a process table housekeeper (zombie
>>> reaper),
>> That's called init(8). When the parent exits, init will wait for the
>> zombie.
>>
>>> or 2) create a system utility/command like SunOS/OpenSolaris
>>> has; preap(1).
>> That seems like a bad idea, to me. Generally speaking I would expect it
>> to be safer to kill and restart the parent, allowing init to do its job.
> Maybe. Maybe not. I think it depends on the parent process, and what impact
> HUPing it, will have on the system. Tho this should not be an excuse for
> not fixing the problem parent. But rather, a stop-gap, until a suitable
> fix is created/obtained (for the parent).
>
>
> Thanks for taking the time to respond, Ben.
>
> --Chris
>
>> Ben

Could this be related to the problem I'm having with zombie processes?

[...]
40945  -  Is    0:00.01 sshd: luciele [priv] (sshd)
40946  -  Z     0:00.01 <defunct>
40947  -  IW    0:00.00 sshd: luciele [pam] (sshd)
44376  -  Is    0:00.01 sshd: unknown [priv] (sshd)
44377  -  Z     0:00.01 <defunct>
44378  -  IW    0:00.00 sshd: unknown [pam] (sshd)
58892  -  IW    0:00.00 /usr/local/sbin/httpd -DNOHTTPACCEPT
58978  -  IW    0:00.00 /usr/local/sbin/httpd -DNOHTTPACCEPT
61361  -  IW    0:00.00 /usr/local/sbin/httpd -DNOHTTPACCEPT
61684  -  Is    0:00.01 sshd: unknown [priv] (sshd)
61685  -  Z     0:00.01 <defunct>
61692  -  IW    0:00.00 sshd: unknown [pam] (sshd)
78346  -  Is    0:00.01 sshd: unknown [priv] (sshd)
78347  -  Z     0:00.01 <defunct>
78351  -  IW    0:00.00 sshd: unknown [pam] (sshd)
[...]

# procstat -f 40945
  PID COMM    FD T V FLAGS    REF OFFSET PRO NAME
40945 sshd  text v r r-------   -      - -   /usr/sbin/sshd
40945 sshd   cwd v d r-------   -      - -   /
40945 sshd  root v d r-------   -      - -   /
40945 sshd     0 v c rw------   6      0 -   /dev/null
40945 sshd     1 v c rw------   6      0 -   /dev/null
40945 sshd     2 v c rw------   6      0 -   /dev/null
40945 sshd     3 s - rw---n--   2      0 TCP 186.xxx.xx.10:4321 186.xxx.xx.8:64762
40945 sshd     4 s - rw------   1      0 UDS -
40945 sshd     5 p - rw------   2      0 -   -
40945 sshd     6 s - rw------   2      0 UDS -

# procstat -f 44376
  PID COMM    FD T V FLAGS    REF OFFSET PRO NAME
44376 sshd  text v r r-------   -      - -   /usr/sbin/sshd
44376 sshd   cwd v d r-------   -      - -   /
44376 sshd  root v d r-------   -      - -   /
44376 sshd     0 v c rw------   6      0 -   /dev/null
44376 sshd     1 v c rw------   6      0 -   /dev/null
44376 sshd     2 v c rw------   6      0 -   /dev/null
44376 sshd     3 s - rw---n--   2      0 TCP 186.xxx.xx.10:4321 186.xxx.xx.8:64368
44376 sshd     4 s - rw------   1      0 UDS -
44376 sshd     5 p - rw------   2      0 -   -
44376 sshd     6 s - rw------   2      0 UDS -

# procstat -f 61684
  PID COMM    FD T V FLAGS    REF OFFSET PRO NAME
61684 sshd  text v r r-------   -      - -   /usr/sbin/sshd
61684 sshd   cwd v d r-------   -      - -   /
61684 sshd  root v d r-------   -      - -   /
61684 sshd     0 v c rw------   6      0 -   /dev/null
61684 sshd     1 v c rw------   6      0 -   /dev/null
61684 sshd     2 v c rw------   6      0 -   /dev/null
61684 sshd     3 s - rw---n--   2      0 TCP 186.xxx.xx.10:4321 186.xxx.xx.8:61415
61684 sshd     4 s - rw------   1      0 UDS -
61684 sshd     5 p - rw------   2      0 -   -
61684 sshd     6 s - rw------   2      0 UDS -

# procstat -f 78346
  PID COMM    FD T V FLAGS    REF OFFSET PRO NAME
78346 sshd  text v r r-------   -      - -   /usr/sbin/sshd
78346 sshd   cwd v d r-------   -      - -   /
78346 sshd  root v d r-------   -      - -   /
78346 sshd     0 v c rw------   6      0 -   /dev/null
78346 sshd     1 v c rw------   6      0 -   /dev/null
78346 sshd     2 v c rw------   6      0 -   /dev/null
78346 sshd     3 s - rw---n--   2      0 TCP 186.xxx.xx.10:4321 186.xxx.xx.8:50994
78346 sshd     4 s - rw------   1      0 UDS -
78346 sshd     5 p - rw------   2      0 -   -
78346 sshd     6 s - rw------   2      0 UDS -

# netstat -n | grep CLOSED
tcp4       0      0 186.xxx.xx.10.4321     186.xxx.xx.8.64368     CLOSED
tcp4       0      0 186.xxx.xx.10.4321     186.xxx.xx.8.61415     CLOSED
tcp4       0      0 186.xxx.xx.10.4321     186.xxx.xx.8.50994     CLOSED
tcp4       0      0 186.xxx.xx.10.4321     186.xxx.xx.8.64762     CLOSED

# uname -a
FreeBSD xxxxx.xxxxx.xxx.xx 10.0-STABLE FreeBSD 10.0-STABLE #6 r263882: Fri Mar 28 20:28:40 BRT 2014 root@xxxxx.xxxxx.xxx.xx:/usr/obj/usr/src/sys/GONDIM10 amd64

Cheers,
Gondim

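
For readers following the thread, the parent/child relationship being discussed can be illustrated with a small sketch. A child that has exited stays in the process table as a zombie (state Z, <defunct>) until its parent collects its status with one of the wait(2) calls; if the parent exits first, the child is reparented and init(8) does the waiting. The Python below is only an illustration under those assumptions: the function names spawn_child_that_exits and reap are hypothetical, and this is not how sshd itself is written.

```python
import os
import time


def spawn_child_that_exits(code: int = 0) -> int:
    """Fork a child that exits at once; return the child's PID to the parent."""
    pid = os.fork()
    if pid == 0:
        # Child: leave immediately, skipping the parent's cleanup handlers.
        os._exit(code)
    # Parent: the child is now running, or already a zombie awaiting wait().
    return pid


def reap(pid: int) -> int:
    """Wait for the child and return its exit code (negative signal number
    if it was killed by a signal). This call removes the zombie entry."""
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    if os.WIFSIGNALED(status):
        return -os.WTERMSIG(status)
    raise RuntimeError("unexpected wait status: %d" % status)


if __name__ == "__main__":
    child = spawn_child_that_exits(7)
    # Until reap() is called, ps(1) would list the child in state Z.
    time.sleep(0.2)
    print("child", child, "exit code", reap(child))
```

A parent that never makes the waitpid() call, and never exits, leaves its dead children in state Z indefinitely, which is the "unfit parent" case described in the quoted text.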
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?5339EC5D.3030408>