Date: Fri, 21 Mar 2014 22:02:50 -0700
From: Kevin Oberman <rkoberman@gmail.com>
To: Marcelo Gondim <gondim@bsdinfo.com.br>
Cc: FreeBSD Stable Mailing List <freebsd-stable@freebsd.org>
Subject: Re: sshd with zombie process on FreeBSD 10.0-STABLE - workaround
Message-ID: <CAN6yY1sf0z_jBJgBy2dZX0a3JJnyTnq76_DepXzG32GWgHHO6A@mail.gmail.com>
In-Reply-To: <532B7DEC.7010809@bsdinfo.com.br>
References: <53016D97.5030909@bsdinfo.com.br> <CAN6yY1uucfkdXxkCF30w1Q9vffRpDLxM90Sz1XVbdn5W69vQMg@mail.gmail.com> <5329D81E.7040709@bsdinfo.com.br> <201403201058.38555.jhb@freebsd.org> <532B7DEC.7010809@bsdinfo.com.br>

On Thu, Mar 20, 2014 at 4:46 PM, Marcelo Gondim <gondim@bsdinfo.com.br> wrote:

> On 20/03/14 11:58, John Baldwin wrote:
>> On Wednesday, March 19, 2014 1:47:10 pm Marcelo Gondim wrote:
>>> On 19/03/14 13:01, Kevin Oberman wrote:
>>>> On Wed, Mar 19, 2014 at 6:00 AM, Marcelo Gondim <gondim@bsdinfo.com.br> wrote:
>>>>> Hi all,
>>>>>
>>>>> Until a solution appears, I wrote the script below and put it in
>>>>> crontab to automatically kill zombie sshd processes.
>>>>>
>>>>> the_walking_dead.sh:
>>>>>
>>>>> #!/bin/sh
>>>>> kill -9 `ps afx|grep sshd|grep unknown|awk '{print $1}'`
>>>>>
>>>>> Put this in /etc/crontab:
>>>>>
>>>>> 00 1 * * * root the_walking_dead.sh
>>>>
>>>> If 'kill -9' works, the process is not really a zombie. It simply still
>>>> has a socket open and is waiting for it to be closed before exiting.
>>>>
>>>> You might take a look at network sockets with sockstat(1) and see if you
>>>> can get any indication of why these sockets are not being closed. It may
>>>> be that the issue is not sshd but some other issue in the OS leaving
>>>> sockets open.
>>>
>>> Hi Kevin,
>>>
>>> My ps -afx below:
>>>
>>> [...]
>>> 42139  -  Is     0:00.01 sshd: unknown [priv] (sshd)
>>> 42140  -  Z      0:00.01 <defunct>
>>> 42141  -  IW     0:00.00 sshd: unknown [pam] (sshd)
>>> 58445  -  Is     0:00.01 sshd: unknown [priv] (sshd)
>>> 58446  -  Z      0:00.02 <defunct>
>>> 58447  -  IW     0:00.00 sshd: unknown [pam] (sshd)
>>> 65635  -  Is     0:00.01 sshd: vinicius [priv] (sshd)
>>> 65636  -  Z      0:00.01 <defunct>
>>> [...]
>>>
>>> # sockstat | grep 42140
>>> #
>>>
>>> # sockstat | grep 58446
>>> #
>>>
>>> # sockstat | grep 65636
>>> #
>>>
>>> No socket is associated with the zombie processes.
>>
>> Do a pstree. I bet the zombies are children of the other processes that
>> are stuck on a socket as Kevin described.
>
> # ps afx|grep sshd |grep unk
> 10948  -  Is     0:00.02 sshd: unknown [priv] (sshd)
> 10955  -  IW     0:00.00 sshd: unknown [pam] (sshd)  <====
> 11701  -  Is     0:00.02 sshd: unknown [priv] (sshd)
> 11704  -  IW     0:00.00 sshd: unknown [pam] (sshd)
> 25450  -  Is     0:00.01 sshd: unknown [priv] (sshd)
> 25452  -  IW     0:00.00 sshd: unknown [pam] (sshd)
> 41193  -  Is     0:00.02 sshd: unknown [priv] (sshd)
> 41196  -  IW     0:00.00 sshd: unknown [pam] (sshd)
> 42193  -  Is     0:00.02 sshd: unknown [priv] (sshd)
> 42195  -  IW     0:00.00 sshd: unknown [pam] (sshd)
> 80638  -  Is     0:00.02 sshd: unknown [priv] (sshd)
> 80640  -  IW     0:00.00 sshd: unknown [pam] (sshd)
> 81484  -  Is     0:00.02 sshd: unknown [priv] (sshd)
> 81486  -  IW     0:00.00 sshd: unknown [pam] (sshd)
>
> With procstat I could see the socket as follows:
>
> # procstat -f 10955
>   PID COMM   FD T V FLAGS    REF OFFSET PRO NAME
> 10955 sshd text v r r-------   -      - -   /usr/sbin/sshd
> 10955 sshd  cwd v d r-------   -      - -   /
> 10955 sshd root v d r-------   -      - -   /
> 10955 sshd    0 v c rw------   6      0 -   /dev/null
> 10955 sshd    1 v c rw------   6      0 -   /dev/null
> 10955 sshd    2 v c rw------   6      0 -   /dev/null
> 10955 sshd    3 s - rw---n--   2      0 TCP 186.xxx.xx.2:22 186.xxx.xx.8:57035
> 10955 sshd    5 p - rw------   2      0 -   -
> 10955 sshd    6 s - rw------   2      0 UDS -
> 10955 sshd    7 p - rw------   1      0 -   -
> 10955 sshd    8 s - rw------   2      0 UDS -
>
> I do not understand why these connections remain stuck in FreeBSD 10.0.
>
> I'll try this sysctl: net.inet.tcp.delayed_ack=0

If the problem is still showing up, can you see what is going on with the
socket? What is the state of the connection? Try "netstat -f inet -p tcp" and
see what state the connection is in. I'm wondering if there is some sort of
race going on where the socket hangs.

Ideally, I'd try to capture the packets at the end of the session. Can you do
something to trigger this reliably? If so, run a "standard" capture:
"tcpdump -pw file.bpf host HOST". I seem to recall that these connections are
scheduled. If so, you can put the packet capture in a crontab to run at the
same time. If you feed the capture file to a tool like Wireshark, you should
get a good idea of what is happening, if not why. I understand that the timing
of this might be very tricky.

--
R. Kevin Oberman, Network Engineer, Retired
E-mail: rkoberman@gmail.com
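
For the netstat check suggested above, a minimal sketch of a filter that pulls out just the sshd connections and their TCP states may help when there are many sockets to read through. The function name ssh_conn_states is hypothetical, and the sketch assumes FreeBSD-style numeric output (netstat -an), where the port is the last dot-separated field of each address; the -n flag is an addition to the command quoted in the message, used here so that ports appear as numbers.

```shell
#!/bin/sh
# Sketch only: summarize the TCP state of connections to the local sshd port.
# Assumption: input is FreeBSD-style `netstat -an -f inet -p tcp` output, with
# columns Proto, Recv-Q, Send-Q, Local Address, Foreign Address, (state), and
# addresses written as a.b.c.d.port.

# ssh_conn_states (hypothetical helper) reads netstat output on stdin and
# prints "<foreign address> <state>" for each TCP line whose local port
# equals $1 (default 22).
ssh_conn_states() {
    port="${1:-22}"
    awk -v port="$port" '
        $1 ~ /^tcp/ && NF >= 6 {
            # The local port is the last dot-separated field of column 4.
            n = split($4, part, ".")
            if (part[n] == port) print $5, $6
        }'
}

# Possible use on the affected host (left as a comment so that sourcing
# this file has no side effects):
#   netstat -an -f inet -p tcp | ssh_conn_states 22
```

A stuck session should then show up as a line for the remote address seen in the procstat output, and the state column (for example ESTABLISHED versus one of the closing states) indicates which side has not finished closing the connection.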