From: Kevin Oberman
To: Marcelo Gondim
Date: Fri, 21 Mar 2014 22:02:50 -0700
Subject: Re: sshd with zombie process on FreeBSD 10.0-STABLE - workaround
Cc: FreeBSD Stable Mailing List
List-Id: Production branch of FreeBSD source code

On Thu, Mar 20, 2014 at 4:46 PM, Marcelo Gondim wrote:

> On 20/03/14 11:58, John Baldwin wrote:
>
>> On Wednesday, March 19, 2014 1:47:10 pm Marcelo Gondim wrote:
>>
>>> On 19/03/14 13:01, Kevin Oberman wrote:
>>>
>>>> On Wed, Mar 19, 2014 at 6:00 AM, Marcelo Gondim wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Until a solution appears, I wrote the script below and put it in
>>>>> crontab to automatically kill zombie sshd processes.
>>>>>
>>>>> the_walking_dead.sh:
>>>>>
>>>>> #!/bin/sh
>>>>> kill -9 `ps afx|grep sshd|grep unknown|awk '{print $1}'`
>>>>>
>>>>> Put this in /etc/crontab:
>>>>>
>>>>> 00 1 * * * root the_walking_dead.sh
>>>>>
>>>> If 'kill -9' works, the process is not really a zombie. It simply still
>>>> has a socket open and is waiting for it to be closed before exiting.
>>>>
>>>> You might take a look at network sockets with sockstat(1) and see if
>>>> you can get any indication of why these sockets are not being closed.
>>>> It may be that the issue is not sshd but some other issue in the OS
>>>> leaving sockets open.
>>>>
>>> Hi Kevin,
>>>
>>> My ps -afx below:
>>>
>>> [...]
>>> 42139 - Is 0:00.01 sshd: unknown [priv] (sshd)
>>> 42140 - Z 0:00.01
>>> 42141 - IW 0:00.00 sshd: unknown [pam] (sshd)
>>> 58445 - Is 0:00.01 sshd: unknown [priv] (sshd)
>>> 58446 - Z 0:00.02
>>> 58447 - IW 0:00.00 sshd: unknown [pam] (sshd)
>>> 65635 - Is 0:00.01 sshd: vinicius [priv] (sshd)
>>> 65636 - Z 0:00.01
>>> [...]
>>>
>>> # sockstat | grep 42140
>>> #
>>>
>>> # sockstat | grep 58446
>>> #
>>>
>>> # sockstat | grep 65636
>>> #
>>>
>>> No socket is associated with the zombie processes.
>>>
>> Do a pstree. I bet the zombies are children of the other processes that
>> are stuck on a socket as Kevin described.
>>
> # ps afx|grep sshd |grep unk
> 10948 - Is 0:00.02 sshd: unknown [priv] (sshd)
> 10955 - IW 0:00.00 sshd: unknown [pam] (sshd) <====
> 11701 - Is 0:00.02 sshd: unknown [priv] (sshd)
> 11704 - IW 0:00.00 sshd: unknown [pam] (sshd)
> 25450 - Is 0:00.01 sshd: unknown [priv] (sshd)
> 25452 - IW 0:00.00 sshd: unknown [pam] (sshd)
> 41193 - Is 0:00.02 sshd: unknown [priv] (sshd)
> 41196 - IW 0:00.00 sshd: unknown [pam] (sshd)
> 42193 - Is 0:00.02 sshd: unknown [priv] (sshd)
> 42195 - IW 0:00.00 sshd: unknown [pam] (sshd)
> 80638 - Is 0:00.02 sshd: unknown [priv] (sshd)
> 80640 - IW 0:00.00 sshd: unknown [pam] (sshd)
> 81484 - Is 0:00.02 sshd: unknown [priv] (sshd)
> 81486 - IW 0:00.00 sshd: unknown [pam] (sshd)
>
> With procstat I could see the socket as follows:
>
> # procstat -f 10955
>   PID COMM   FD T V FLAGS    REF OFFSET PRO NAME
> 10955 sshd text v r r-------   -      - -   /usr/sbin/sshd
> 10955 sshd  cwd v d r-------   -      - -   /
> 10955 sshd root v d r-------   -      - -   /
> 10955 sshd    0 v c rw------   6      0 -   /dev/null
> 10955 sshd    1 v c rw------   6      0 -   /dev/null
> 10955 sshd    2 v c rw------   6      0 -   /dev/null
> 10955 sshd    3 s - rw---n--   2      0 TCP 186.xxx.xx.2:22 186.xxx.xx.8:57035
> 10955 sshd    5 p - rw------   2      0 -   -
> 10955 sshd    6 s - rw------   2      0 UDS -
> 10955 sshd    7 p - rw------   1      0 -   -
> 10955 sshd    8 s - rw------   2      0 UDS -
>
> I do not understand why these connections remain stuck in FreeBSD 10.0.
>
> I'll try this sysctl: net.inet.tcp.delayed_ack=0
>

If the problem is still showing up, can you see what is going on with the
socket? What is the state of the connection? Try "netstat -f inet -p tcp"
and see what state the connection is in.
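
As a rough sketch of that check, something like the following would tally the states of the ssh connections. The helper name summarize_ssh_states is made up for illustration, and it assumes the numeric output of "netstat -an -f inet -p tcp", where the local address column ends in ".22" (or ".ssh" without -n):

```shell
#!/bin/sh
# Tally TCP connection states for sockets whose local port is 22 (ssh).
# Reads FreeBSD-style netstat output on stdin; summarize_ssh_states is a
# hypothetical helper name, not part of netstat or any other tool.
summarize_ssh_states() {
    awk '
        # Data rows look like: tcp4 0 0 LOCAL.PORT FOREIGN.PORT STATE
        # The header row starts with "Proto" and is skipped by the first test.
        $1 ~ /^tcp/ && $4 ~ /\.(22|ssh)$/ { count[$6]++ }
        END { for (s in count) printf "%s %d\n", s, count[s] }
    ' | sort
}

# Intended use on the affected host (not run here):
#   netstat -an -f inet -p tcp | summarize_ssh_states
```

If most of the leftover entries sit in CLOSE_WAIT, for example, the remote end has already closed and the local process has not yet closed its socket.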

I'm wondering if there is some sort of race going on where the socket
hangs. Ideally I'd like to try to capture the packets at the end of the
session. Can you do something to trigger this reliably? If so, a standard
capture would be "tcpdump -pw file.bpf host HOST". I seem to recall that
these connections are scheduled. If so, you can put the packet capture in
a crontab to run at the same time. If you feed this to a tool like
Wireshark, you should get a good idea of what is happening, if not why. I
understand that the timing of this might be very tricky.
--
R. Kevin Oberman, Network Engineer, Retired
E-mail: rkoberman@gmail.com