From: Marcelo Gondim
Date: Sat, 22 Mar 2014 03:06:10 -0300
To: FreeBSD Stable Mailing List
Subject: Re: sshd with zombie process on FreeBSD 10.0-STABLE - workaround

On 22/03/14 02:02, Kevin Oberman wrote:
> On Thu, Mar 20, 2014 at 4:46 PM, Marcelo Gondim wrote:
>
>> On 20/03/14 11:58, John Baldwin wrote:
>>
>>> On Wednesday, March 19, 2014 1:47:10 pm Marcelo Gondim wrote:
>>>
>>>> On 19/03/14 13:01, Kevin Oberman wrote:
>>>>> On Wed, Mar 19, 2014 at 6:00 AM, Marcelo Gondim wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Until a proper fix turns up, I wrote the script below and put it in
>>>>>> crontab to automatically kill the zombie sshd processes.
>>>>>>
>>>>>> the_walking_dead.sh:
>>>>>>
>>>>>> #!/bin/sh
>>>>>> kill -9 `ps afx|grep sshd|grep unknown|awk '{print $1}'`
>>>>>>
>>>>>> Put this in /etc/crontab:
>>>>>>
>>>>>> 00 1 * * * root the_walking_dead.sh
>>>>>>
>>>>> If 'kill -9' works, the process is not really a zombie. It simply still
>>>>> has a socket open and is waiting for it to be closed before exiting.
>>>>> You might take a look at network sockets with sockstat(1) and see if you
>>>>> can get any indication of why these sockets are not being closed. It may
>>>>> be that the issue is not sshd but some other issue in the OS leaving
>>>>> sockets open.
>>>>>
>>>> Hi Kevin,
>>>>
>>>> My ps afx output is below:
>>>>
>>>> [...]
>>>> 42139 - Is 0:00.01 sshd: unknown [priv] (sshd)
>>>> 42140 - Z  0:00.01
>>>> 42141 - IW 0:00.00 sshd: unknown [pam] (sshd)
>>>> 58445 - Is 0:00.01 sshd: unknown [priv] (sshd)
>>>> 58446 - Z  0:00.02
>>>> 58447 - IW 0:00.00 sshd: unknown [pam] (sshd)
>>>> 65635 - Is 0:00.01 sshd: vinicius [priv] (sshd)
>>>> 65636 - Z  0:00.01
>>>> [...]
>>>>
>>>> # sockstat | grep 42140
>>>> #
>>>>
>>>> # sockstat | grep 58446
>>>> #
>>>>
>>>> # sockstat | grep 65636
>>>> #
>>>>
>>>> No socket is associated with the zombie processes.
>>>>
>>> Do a pstree. I bet the zombies are children of the other processes that
>>> are stuck on a socket as Kevin described.
>>>
>> # ps afx|grep sshd |grep unk
>> 10948 - Is 0:00.02 sshd: unknown [priv] (sshd)
>> 10955 - IW 0:00.00 sshd: unknown [pam] (sshd) <====
>> 11701 - Is 0:00.02 sshd: unknown [priv] (sshd)
>> 11704 - IW 0:00.00 sshd: unknown [pam] (sshd)
>> 25450 - Is 0:00.01 sshd: unknown [priv] (sshd)
>> 25452 - IW 0:00.00 sshd: unknown [pam] (sshd)
>> 41193 - Is 0:00.02 sshd: unknown [priv] (sshd)
>> 41196 - IW 0:00.00 sshd: unknown [pam] (sshd)
>> 42193 - Is 0:00.02 sshd: unknown [priv] (sshd)
>> 42195 - IW 0:00.00 sshd: unknown [pam] (sshd)
>> 80638 - Is 0:00.02 sshd: unknown [priv] (sshd)
>> 80640 - IW 0:00.00 sshd: unknown [pam] (sshd)
>> 81484 - Is 0:00.02 sshd: unknown [priv] (sshd)
>> 81486 - IW 0:00.00 sshd: unknown [pam] (sshd)
>>
>> With procstat I could see the socket as follows:
>>
>> # procstat -f 10955
>> PID   COMM FD   T V FLAGS    REF OFFSET PRO NAME
>> 10955 sshd text v r r------- -   -      -   /usr/sbin/sshd
>> 10955 sshd cwd  v d r------- -   -      -   /
>> 10955 sshd root v d r------- -   -      -   /
>> 10955 sshd 0    v c rw------ 6   0      -   /dev/null
>> 10955 sshd 1    v c rw------ 6   0      -   /dev/null
>> 10955 sshd 2    v c rw------ 6   0      -   /dev/null
>> 10955 sshd 3    s - rw---n-- 2   0      TCP 186.xxx.xx.2:22 186.xxx.xx.8:57035
>> 10955 sshd 5    p - rw------ 2   0      -   -
>> 10955 sshd 6    s - rw------ 2   0      UDS -
>> 10955 sshd 7    p - rw------ 1   0      -   -
>> 10955 sshd 8    s - rw------ 2   0      UDS -
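The cron workaround quoted above can be made a little more careful. This is a minimal sketch, not from the thread, assuming a POSIX sh and awk; the function names `stuck_sshd_pids` and `kill_stuck_sshd` are hypothetical. It reads `ps -axo pid,command` so the match is made on exact fields of the command column, keeps the original `kill -9`, and only calls kill(1) when something matched, so cron has no usage error to mail on nights with nothing stuck.

```sh
#!/bin/sh
# Sketch of a more careful the_walking_dead.sh (helper names are hypothetical).

# Print the PIDs whose command starts with "sshd: unknown".
# Input: `ps -axo pid,command` output. Matching exact fields means the awk
# process itself (whose second field is the literal text $2) never matches.
stuck_sshd_pids() {
    awk '$2 == "sshd:" && $3 == "unknown" { print $1 }'
}

# Same kill -9 as the original script, but only when something matched,
# so cron does not mail a kill(1) usage error on quiet nights.
kill_stuck_sshd() {
    pids=$(ps -axo pid,command | stuck_sshd_pids)
    if [ -n "$pids" ]; then
        # $pids is left unquoted on purpose: one argument per PID.
        kill -9 $pids
    fi
}
```

Calling `kill_stuck_sshd` at the end of the script reproduces the original job. As the listings in this thread show, the zombie rows have no command text, so both this and the original pipeline only hit the `[priv]` and `[pam]` processes; the zombies should disappear once their parent exits and init reaps them. cron runs the entry with its own PATH, so naming the script by absolute path in /etc/crontab is the safer choice.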
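John's suggestion can also be checked without pstree, using the PPID column ps(1) already provides. A minimal sketch, assuming POSIX sh/awk and `ps -axo pid,ppid,stat,command` output on stdin; `zombie_parents` is a hypothetical helper name. For every process whose STAT starts with `Z`, it prints the zombie's PID, its parent's PID and the parent's command line, so a parent stuck as `sshd: unknown [priv]` or `[pam]` would show up directly.

```sh
#!/bin/sh
# Sketch: list each zombie together with its parent (helper name is hypothetical).
# Input: `ps -axo pid,ppid,stat,command` output, header line first.
zombie_parents() {
    awk '
        NR == 1 { next }                          # skip the header line
        {
            cmd = $0
            sub(/^ *[0-9]+ +[0-9]+ +[^ ]+ */, "", cmd)  # drop pid, ppid, stat
            command[$1] = cmd
            if ($3 ~ /^Z/) {                      # STAT starts with Z: zombie
                n++
                zpid[n] = $1
                zppid[n] = $2
            }
        }
        END {
            # Parents are looked up only after all lines are read, so it
            # does not matter whether a parent is listed before its child.
            for (i = 1; i <= n; i++)
                print zpid[i], zppid[i], command[zppid[i]]
        }'
}
```

Usage would be `ps -axo pid,ppid,stat,command | zombie_parents`; the parent PID it prints is the one worth passing to `procstat -f`.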
>>
>> I do not understand why these connections stay stuck on FreeBSD 10.0.
>>
>> I'll try this sysctl: net.inet.tcp.delayed_ack=0
>>
> If the problem is still showing up, can you see what is going on with the
> socket? What is the state of the connection? Try "netstat -f inet -p tcp"
> and see what state the connection is in. I'm wondering if there is some
> sort of race going on where the socket hangs.
>
> Ideally I'd try to capture the packets at the end of the session.
> Can you do something to trigger this reliably? If so, use the "standard"
> "tcpdump -pw file.bpf host HOST". I seem to recall that these connections
> are scheduled. If so, you can put the packet capture in a crontab to run
> at the same time. If you feed the capture to a tool like wireshark, you
> should get a good idea of what is happening, if not why. I understand that
> the timing of this might be very tricky.

Hi Kevin,

Thanks for your help. I ran netstat, and those connections are in the
CLOSED state, as you can see below (netstat shows port 4321 by its
/etc/services name, rwhois):

# procstat -f 26177
PID   COMM FD   T V FLAGS    REF OFFSET PRO NAME
26177 sshd text v r r------- -   -      -   /usr/sbin/sshd
26177 sshd cwd  v d r------- -   -      -   /
26177 sshd root v d r------- -   -      -   /
26177 sshd 0    v c rw------ 6   0      -   /dev/null
26177 sshd 1    v c rw------ 6   0      -   /dev/null
26177 sshd 2    v c rw------ 6   0      -   /dev/null
26177 sshd 3    s - rw---n-- 2   0      TCP 186.193.48.10:4321 186.193.48.8:50094
26177 sshd 4    s - rw------ 1   0      UDS -
26177 sshd 5    p - rw------ 2   0      -   -
26177 sshd 6    s - rw------ 2   0      UDS -

# procstat -f 10110
PID   COMM FD   T V FLAGS    REF OFFSET PRO NAME
10110 sshd text v r r------- -   -      -   /usr/sbin/sshd
10110 sshd cwd  v d r------- -   -      -   /
10110 sshd root v d r------- -   -      -   /
10110 sshd 0    v c rw------ 6   0      -   /dev/null
10110 sshd 1    v c rw------ 6   0      -   /dev/null
10110 sshd 2    v c rw------ 6   0      -   /dev/null
10110 sshd 3    s - rw---n-- 2   0      TCP 186.193.48.10:4321 186.193.48.8:63048
10110 sshd 4    s - rw------ 1   0      UDS -
10110 sshd 5    p - rw------ 2   0      -   -
10110 sshd 6    s - rw------ 2   0      UDS -

# netstat -f inet -p tcp
Active Internet connections
Proto Recv-Q Send-Q Local Address      Foreign Address        (state)
tcp4       0      0 bart.24173         pppoe17250.8728        ESTABLISHED
tcp4       0      0 bart.53795         pppoe17249.8728        TIME_WAIT
tcp4       0      0 bart.54191         pppoe149.8728          TIME_WAIT
tcp4       0      0 bart.12476         pppoe148.8728          TIME_WAIT
tcp4       0      0 bart.36846         pppoe142.8728          TIME_WAIT
tcp4       0      0 bart.39944         186.193.48.22.8728     TIME_WAIT
tcp4       0      0 bart.60233         186.193.48.25.8728     TIME_WAIT
tcp4       0      0 bart.50946         186.193.48.9.8728      TIME_WAIT
tcp4       0      0 bart.13403         186.193.48.19.8728     TIME_WAIT
tcp4       0      0 bart.36982         zeus.linuxinfo.c.8728  TIME_WAIT
tcp4       0      0 bart.rwhois        pppoe769.49896         ESTABLISHED
tcp4       0      0 bart.mysql         mail.15711             ESTABLISHED
tcp4       0      0 bart.mysql         mail.16087             ESTABLISHED
tcp4       0      0 bart.mysql         mail.25051             ESTABLISHED
tcp4       0      0 bart.mysql         mail.59126             ESTABLISHED
tcp4       0      0 bart.mysql         mail.59051             ESTABLISHED
tcp4       0      0 bart.mysql         mail.29446             ESTABLISHED
tcp4       0      0 bart.mysql         mail.45453             ESTABLISHED
tcp4       0      0 bart.mysql         mail.14938             ESTABLISHED
tcp4       0      0 bart.mysql         mail.46230             FIN_WAIT_2
tcp4       0      0 bart.mysql         mail.16930             FIN_WAIT_2
tcp4       0      0 bart.mysql         mail.28074             FIN_WAIT_2
tcp4       0      0 bart.mysql         mail.53686             FIN_WAIT_2
tcp4       0      0 bart.mysql         mail.14448             FIN_WAIT_2
tcp4       0      0 bart.mysql         mail.52487             ESTABLISHED
tcp4       0      0 bart.rwhois        186.193.48.8.50094     CLOSED       <====
tcp4       0      0 bart.mysql         mail.38286             FIN_WAIT_2
tcp4       0      0 bart.mysql         mail.32387             FIN_WAIT_2
tcp4       0      0 bart.mysql         mail.52219             ESTABLISHED
tcp4       0      0 bart.mysql         mail.52144             ESTABLISHED
tcp4       0      0 bart.mysql         mail.18862             FIN_WAIT_2
tcp4       0      0 bart.mysql         mail.52636             FIN_WAIT_2
tcp4       0      0 bart.mysql         mail.51607             FIN_WAIT_2
tcp4       0      0 bart.mysql         mail.62581             FIN_WAIT_2
tcp4       0      0 bart.mysql         mail.23071             ESTABLISHED
tcp4       0      0 bart.mysql         mail.22862             FIN_WAIT_2
tcp4       0      0 bart.rwhois        186.193.48.8.63048     CLOSED       <====
tcp4       0      0 bart.mysql         mail.42479             FIN_WAIT_2
tcp4       0      0 bart.mysql         mail.18146             ESTABLISHED
tcp4       0      0 bart.mysql         mail.46731             FIN_WAIT_2
tcp4       0      0 bart.mysql         mail.20498             ESTABLISHED
tcp4       0      0 bart.62869         186.193.48.2.1190      ESTABLISHED
tcp4       0      0 bart.mysql         mail.55353             ESTABLISHED

Cheers,
Gondim
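The manual matching done above (the TCP socket from `procstat -f` against the CLOSED rows from netstat) can be scripted. A sketch under stated assumptions: POSIX sh/awk, IPv4 only (`-f inet`), `netstat -n` so addresses and ports stay numeric instead of names like `bart.rwhois`, and hypothetical helper names `closed_peers`, `procstat_tcp_peers` and `stuck_closed_peers`. procstat prints endpoints as `addr:port` while netstat prints `addr.port`, so the procstat side is rewritten before comparing.

```sh
#!/bin/sh
# Sketch: check whether the TCP socket a stuck sshd still holds is one that
# netstat reports as CLOSED. Helper names are hypothetical; IPv4 only.

# Print the foreign address of every connection in the CLOSED state.
# Input: `netstat -n -f inet -p tcp` output, whose columns are
# Proto Recv-Q Send-Q Local-Address Foreign-Address (state).
closed_peers() {
    awk '$6 == "CLOSED" { print $5 }'
}

# Print the foreign endpoint of each TCP descriptor in `procstat -f PID`
# output, converted from procstat's addr:port to netstat's addr.port form.
procstat_tcp_peers() {
    awk '$9 == "TCP" {
        peer = $11
        if (match(peer, /:[0-9]+$/))
            peer = substr(peer, 1, RSTART - 1) "." substr(peer, RSTART + 1)
        print peer
    }'
}

# For one PID, print each of its TCP peers that netstat lists as CLOSED.
stuck_closed_peers() {
    for peer in $(procstat -f "$1" | procstat_tcp_peers); do
        netstat -n -f inet -p tcp | closed_peers | grep -Fx -- "$peer"
    done
}
```

For example, `stuck_closed_peers 26177` would be expected to print `186.193.48.8.50094` while the state shown above persists; an empty result means the process's socket is not (or no longer) in the CLOSED state.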