Date: Sat, 22 Mar 2014 08:54:21 -0300
From: Marcelo Gondim <gondim@bsdinfo.com.br>
To: FreeBSD Stable Mailing List <freebsd-stable@freebsd.org>
Subject: Re: sshd with zombie process on FreeBSD 10.0-STABLE - workaround
Message-ID: <532D79ED.90200@bsdinfo.com.br>
In-Reply-To: <CAN6yY1uEADbTHyrP7=uEgEUQWR%2BcTW2grq=aK00i9idW=ver%2Bg@mail.gmail.com>
References: <53016D97.5030909@bsdinfo.com.br>
 <CAN6yY1uucfkdXxkCF30w1Q9vffRpDLxM90Sz1XVbdn5W69vQMg@mail.gmail.com>
 <5329D81E.7040709@bsdinfo.com.br>
 <201403201058.38555.jhb@freebsd.org>
 <532B7DEC.7010809@bsdinfo.com.br>
 <CAN6yY1sf0z_jBJgBy2dZX0a3JJnyTnq76_DepXzG32GWgHHO6A@mail.gmail.com>
 <532D2852.1010700@bsdinfo.com.br>
 <CAN6yY1uEADbTHyrP7=uEgEUQWR%2BcTW2grq=aK00i9idW=ver%2Bg@mail.gmail.com>
On 22/03/14 04:18, Kevin Oberman wrote:
> On Fri, Mar 21, 2014 at 11:06 PM, Marcelo Gondim <gondim@bsdinfo.com.br> wrote:
> > On 22/03/14 02:02, Kevin Oberman wrote:
> > > On Thu, Mar 20, 2014 at 4:46 PM, Marcelo Gondim <gondim@bsdinfo.com.br> wrote:
> > > > On 20/03/14 11:58, John Baldwin wrote:
> > > > > On Wednesday, March 19, 2014 1:47:10 pm Marcelo Gondim wrote:
> > > > > > On 19/03/14 13:01, Kevin Oberman wrote:
> > > > > > > On Wed, Mar 19, 2014 at 6:00 AM, Marcelo Gondim <gondim@bsdinfo.com.br> wrote:
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > Until a real fix appears, I wrote the script below and put it in
> > > > > > > > crontab to automatically kill zombie sshd processes.
> > > > > > > >
> > > > > > > > the_walking_dead.sh:
> > > > > > > >
> > > > > > > > #!/bin/sh
> > > > > > > > kill -9 `ps afx|grep sshd|grep unknown|awk '{print $1}'`
> > > > > > > >
> > > > > > > > Put this in /etc/crontab:
> > > > > > > >
> > > > > > > > 00 1 * * * root the_walking_dead.sh
> > > > > > >
> > > > > > > If 'kill -9' works, the process is not really a zombie. It simply
> > > > > > > still has a socket open and is waiting for it to be closed before
> > > > > > > exiting.
> > > > > > >
> > > > > > > You might take a look at network sockets with sockstat(1) and see
> > > > > > > if you can get any indication of why these sockets are not being
> > > > > > > closed. It may be that the issue is not sshd but some other issue
> > > > > > > in the OS leaving sockets open.
> > > > > >
> > > > > > Hi Kevin,
> > > > > >
> > > > > > My ps -afx below:
> > > > > >
> > > > > > [...]
> > > > > > 42139  -  Is   0:00.01 sshd: unknown [priv] (sshd)
> > > > > > 42140  -  Z    0:00.01 <defunct>
> > > > > > 42141  -  IW   0:00.00 sshd: unknown [pam] (sshd)
> > > > > > 58445  -  Is   0:00.01 sshd: unknown [priv] (sshd)
> > > > > > 58446  -  Z    0:00.02 <defunct>
> > > > > > 58447  -  IW   0:00.00 sshd: unknown [pam] (sshd)
> > > > > > 65635  -  Is   0:00.01 sshd: vinicius [priv] (sshd)
> > > > > > 65636  -  Z    0:00.01 <defunct>
> > > > > > [...]
> > > > > >
> > > > > > # sockstat | grep 42140
> > > > > > #
> > > > > >
> > > > > > # sockstat | grep 58446
> > > > > > #
> > > > > >
> > > > > > # sockstat | grep 65636
> > > > > > #
> > > > > >
> > > > > > There is no socket associated with the zombie processes.
> > > > >
> > > > > Do a pstree. I bet the zombies are children of the other processes
> > > > > that are stuck on a socket as Kevin described.
> > > >
> > > > # ps afx|grep sshd |grep unk
> > > >
> > > > 10948  -  Is   0:00.02 sshd: unknown [priv] (sshd)
> > > > 10955  -  IW   0:00.00 sshd: unknown [pam] (sshd)   <====
> > > > 11701  -  Is   0:00.02 sshd: unknown [priv] (sshd)
> > > > 11704  -  IW   0:00.00 sshd: unknown [pam] (sshd)
> > > > 25450  -  Is   0:00.01 sshd: unknown [priv] (sshd)
> > > > 25452  -  IW   0:00.00 sshd: unknown [pam] (sshd)
> > > > 41193  -  Is   0:00.02 sshd: unknown [priv] (sshd)
> > > > 41196  -  IW   0:00.00 sshd: unknown [pam] (sshd)
> > > > 42193  -  Is   0:00.02 sshd: unknown [priv] (sshd)
> > > > 42195  -  IW   0:00.00 sshd: unknown [pam] (sshd)
> > > > 80638  -  Is   0:00.02 sshd: unknown [priv] (sshd)
> > > > 80640  -  IW   0:00.00 sshd: unknown [pam] (sshd)
> > > > 81484  -  Is   0:00.02 sshd: unknown [priv] (sshd)
> > > > 81486  -  IW   0:00.00 sshd: unknown [pam] (sshd)
> > > >
> > > > With procstat I could see the socket as follows:
> > > >
> > > > # procstat -f 10955
> > > >   PID COMM FD   T V FLAGS    REF OFFSET PRO NAME
> > > > 10955 sshd text v r r-------   -      - -   /usr/sbin/sshd
> > > > 10955 sshd cwd  v d r-------   -      - -   /
> > > > 10955 sshd root v d r-------   -      - -   /
> > > > 10955 sshd 0    v c rw------   6      0 -   /dev/null
> > > > 10955 sshd 1    v c rw------   6      0 -   /dev/null
> > > > 10955 sshd 2    v c rw------   6      0 -   /dev/null
> > > > 10955 sshd 3    s - rw---n--   2      0 TCP 186.xxx.xx.2:22 186.xxx.xx.8:57035
> > > > 10955 sshd 5    p - rw------   2      0 -   -
> > > > 10955 sshd 6    s - rw------   2      0 UDS -
> > > > 10955 sshd 7    p - rw------   1      0 -   -
> > > > 10955 sshd 8    s - rw------   2      0 UDS -
> > > >
> > > > I do not understand why these connections remain locked in
> > > > FreeBSD 10.0.
> > > >
> > > > I'll try this sysctl: net.inet.tcp.delayed_ack=0
> > >
> > > If the problem is still showing up, can you see what is going on with
> > > the socket? What is the state of the connection? Try
> > > "netstat -f inet -p tcp" and see what state the connection is in. I'm
> > > wondering if there is some sort of race going on where the socket hangs.
> > >
> > > Ideally I'd look to try and capture the packets at the end of the
> > > session. Can you do something to trigger this reliably? If so, run the
> > > "standard" "tcpdump -pw file.bpf host HOST".
> > > I seem to recall that these connections are scheduled. If so, you can
> > > put the packet capture in a crontab to run at the same time. If you
> > > feed this to a tool like wireshark, you should get a good idea of what
> > > is happening, if not why. I understand that the timing of this might
> > > be very tricky.
> >
> > Hi Kevin,
> >
> > Thanks for your help.
> >
> > I ran netstat, and the state of the connection is CLOSED, as you can
> > see below:
> >
> > # procstat -f 26177
> >   PID COMM FD   T V FLAGS    REF OFFSET PRO NAME
> > 26177 sshd text v r r-------   -      - -   /usr/sbin/sshd
> > 26177 sshd cwd  v d r-------   -      - -   /
> > 26177 sshd root v d r-------   -      - -   /
> > 26177 sshd 0    v c rw------   6      0 -   /dev/null
> > 26177 sshd 1    v c rw------   6      0 -   /dev/null
> > 26177 sshd 2    v c rw------   6      0 -   /dev/null
> > 26177 sshd 3    s - rw---n--   2      0 TCP 186.193.48.10:4321 186.193.48.8:50094
> > 26177 sshd 4    s - rw------   1      0 UDS -
> > 26177 sshd 5    p - rw------   2      0 -   -
> > 26177 sshd 6    s - rw------   2      0 UDS -
> >
> > # procstat -f 10110
> >   PID COMM FD   T V FLAGS    REF OFFSET PRO NAME
> > 10110 sshd text v r r-------   -      - -   /usr/sbin/sshd
> > 10110 sshd cwd  v d r-------   -      - -   /
> > 10110 sshd root v d r-------   -      - -   /
> > 10110 sshd 0    v c rw------   6      0 -   /dev/null
> > 10110 sshd 1    v c rw------   6      0 -   /dev/null
> > 10110 sshd 2    v c rw------   6      0 -   /dev/null
> > 10110 sshd 3    s - rw---n--   2      0 TCP 186.193.48.10:4321 186.193.48.8:63048
> > 10110 sshd 4    s - rw------   1      0 UDS -
> > 10110 sshd 5    p - rw------   2      0 -   -
> > 10110 sshd 6    s - rw------   2      0 UDS -
> >
> > # netstat -f inet -p tcp
> > Active Internet connections
> > Proto Recv-Q Send-Q Local Address   Foreign Address       (state)
> > tcp4       0      0 bart.24173      pppoe17250.8728       ESTABLISHED
> > tcp4       0      0 bart.53795      pppoe17249.8728       TIME_WAIT
> > tcp4       0      0 bart.54191      pppoe149.8728         TIME_WAIT
> > tcp4       0      0 bart.12476      pppoe148.8728         TIME_WAIT
> > tcp4       0      0 bart.36846      pppoe142.8728         TIME_WAIT
> > tcp4       0      0 bart.39944      186.193.48.22.8728    TIME_WAIT
> > tcp4       0      0 bart.60233      186.193.48.25.8728    TIME_WAIT
> > tcp4       0      0 bart.50946      186.193.48.9.8728     TIME_WAIT
> > tcp4       0      0 bart.13403      186.193.48.19.8728    TIME_WAIT
> > tcp4       0      0 bart.36982      zeus.linuxinfo.c.8728 TIME_WAIT
> > tcp4       0      0 bart.rwhois     pppoe769.49896        ESTABLISHED
> > tcp4       0      0 bart.mysql      mail.15711            ESTABLISHED
> > tcp4       0      0 bart.mysql      mail.16087            ESTABLISHED
> > tcp4       0      0 bart.mysql      mail.25051            ESTABLISHED
> > tcp4       0      0 bart.mysql      mail.59126            ESTABLISHED
> > tcp4       0      0 bart.mysql      mail.59051            ESTABLISHED
> > tcp4       0      0 bart.mysql      mail.29446            ESTABLISHED
> > tcp4       0      0 bart.mysql      mail.45453            ESTABLISHED
> > tcp4       0      0 bart.mysql      mail.14938            ESTABLISHED
> > tcp4       0      0 bart.mysql      mail.46230            FIN_WAIT_2
> > tcp4       0      0 bart.mysql      mail.16930            FIN_WAIT_2
> > tcp4       0      0 bart.mysql      mail.28074            FIN_WAIT_2
> > tcp4       0      0 bart.mysql      mail.53686            FIN_WAIT_2
> > tcp4       0      0 bart.mysql      mail.14448            FIN_WAIT_2
> > tcp4       0      0 bart.mysql      mail.52487            ESTABLISHED
> > tcp4       0      0 bart.rwhois     186.193.48.8.50094    CLOSED   <====
> > tcp4       0      0 bart.mysql      mail.38286            FIN_WAIT_2
> > tcp4       0      0 bart.mysql      mail.32387            FIN_WAIT_2
> > tcp4       0      0 bart.mysql      mail.52219            ESTABLISHED
> > tcp4       0      0 bart.mysql      mail.52144            ESTABLISHED
> > tcp4       0      0 bart.mysql      mail.18862            FIN_WAIT_2
> > tcp4       0      0 bart.mysql      mail.52636            FIN_WAIT_2
> > tcp4       0      0 bart.mysql      mail.51607            FIN_WAIT_2
> > tcp4       0      0 bart.mysql      mail.62581            FIN_WAIT_2
> > tcp4       0      0 bart.mysql      mail.23071            ESTABLISHED
> > tcp4       0      0 bart.mysql      mail.22862            FIN_WAIT_2
> > tcp4       0      0 bart.rwhois     186.193.48.8.63048    CLOSED   <====
> > tcp4       0      0 bart.mysql      mail.42479            FIN_WAIT_2
> > tcp4       0      0 bart.mysql      mail.18146            ESTABLISHED
> > tcp4       0      0 bart.mysql      mail.46731            FIN_WAIT_2
> > tcp4       0      0 bart.mysql      mail.20498            ESTABLISHED
> > tcp4       0      0 bart.62869      186.193.48.2.1190     ESTABLISHED
> > tcp4       0      0 bart.mysql      mail.55353            ESTABLISHED
>
> I'm sorry. I am now even more confused. Maybe I need to re-read the
> entire thread.
>
> I thought that the hung processes were sshd. These are rwhois. Or is
> there an ssh tunnel carrying the rwhois connections? (I see no sshd
> connections in this list.)
> --
> R. Kevin Oberman, Network Engineer, Retired
> E-mail: rkoberman@gmail.com

Hi Kevin,

Nope, I run sshd on port 4321/tcp, not on port 22/tcp. When I ran
netstat I did not pass the -n parameter, so it displayed port 4321 as
rwhois.

# cat /etc/services |grep rwhois
rwhois          4321/tcp   #Remote Who Is
rwhois          4321/udp   #Remote Who Is
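
For anyone following along, here is a rough sketch of the port-to-name lookup that makes netstat print "rwhois" when -n is omitted. It is only an illustration, not how netstat is implemented: port_to_service is a hypothetical helper name, and the sample file simply reuses the two /etc/services lines from the grep output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): print the service name that a
# services(5)-format file assigns to a port/protocol pair.
# $1 = port, $2 = protocol, $3 = services file (defaults to /etc/services)
port_to_service() {
    awk -v key="$1/$2" '$1 !~ /^#/ && $2 == key { print $1; exit }' \
        "${3:-/etc/services}"
}

# Sample file holding the two rwhois entries quoted above (assumption:
# the real /etc/services on the affected host contains the same lines).
sample=$(mktemp)
printf 'rwhois\t4321/tcp   #Remote Who Is\nrwhois\t4321/udp   #Remote Who Is\n' > "$sample"

# Prints the name that replaces port 4321 in the netstat output.
port_to_service 4321 tcp "$sample"

rm -f "$sample"
```

Running "netstat -n -f inet -p tcp" instead shows the numeric port, so the sshd connections appear as .4321 rather than .rwhois.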