From owner-freebsd-stable@FreeBSD.ORG  Sat Mar 22 05:02:51 2014
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id A3A4C2E3
 for <freebsd-stable@freebsd.org>; Sat, 22 Mar 2014 05:02:51 +0000 (UTC)
Received: from mail-pa0-x22d.google.com (mail-pa0-x22d.google.com
 [IPv6:2607:f8b0:400e:c03::22d])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 74C1616D
 for <freebsd-stable@freebsd.org>; Sat, 22 Mar 2014 05:02:51 +0000 (UTC)
Received: by mail-pa0-f45.google.com with SMTP id kl14so3222353pab.18
 for <freebsd-stable@freebsd.org>; Fri, 21 Mar 2014 22:02:50 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:sender:in-reply-to:references:date:message-id:subject
 :from:to:cc:content-type;
 bh=Rgcg+KRMll38kf8pEuKMCtb9WtQpZD7R7ZvvYwU5NF4=;
 b=wm5mKnVZs4Gd19RX5zz2zdSbHzUqIDFNPId5HOAKp5Jqiwn4fQIq3wuiJzNHnss+El
 ZnMpTMm4miY+k3wBN/jaILBBQ37j+AN1Q/fhL1v9FIbrP1O8GAO6iH5u1bYaY5IZdNN8
 ek+faVPq17WXyQKZaxBDZwqYOUHnd0xF9GmVqh5/DHQZK2ygPF8LZ6aE1dAu9dL/YCzl
 U9gqJ306kGk2vT7LsAdj3ECdUhDor2sMzD/XJeB8ZCFUf2K8BBnhV6AdFMxNm3kKiUcU
 6Sizphk1AUwgH+3CJvB1dA9TnxcdhW9KV05wqlaQXVIub72ft4AybaQVAdvdPpCgtYcx
 BO/A==
MIME-Version: 1.0
X-Received: by 10.66.193.161 with SMTP id hp1mr49871466pac.20.1395464570821;
 Fri, 21 Mar 2014 22:02:50 -0700 (PDT)
Sender: kob6558@gmail.com
Received: by 10.66.0.164 with HTTP; Fri, 21 Mar 2014 22:02:50 -0700 (PDT)
In-Reply-To: <532B7DEC.7010809@bsdinfo.com.br>
References: <53016D97.5030909@bsdinfo.com.br>
 <CAN6yY1uucfkdXxkCF30w1Q9vffRpDLxM90Sz1XVbdn5W69vQMg@mail.gmail.com>
 <5329D81E.7040709@bsdinfo.com.br>
 <201403201058.38555.jhb@freebsd.org>
 <532B7DEC.7010809@bsdinfo.com.br>
Date: Fri, 21 Mar 2014 22:02:50 -0700
X-Google-Sender-Auth: ce7NBcvyNBLuckh5m1UAduH0ftA
Message-ID: <CAN6yY1sf0z_jBJgBy2dZX0a3JJnyTnq76_DepXzG32GWgHHO6A@mail.gmail.com>
Subject: Re: sshd with zombie process on FreeBSD 10.0-STABLE - workaround
From: Kevin Oberman <rkoberman@gmail.com>
To: Marcelo Gondim <gondim@bsdinfo.com.br>
Content-Type: text/plain; charset=UTF-8
X-Content-Filtered-By: Mailman/MimeDel 2.1.17
Cc: FreeBSD Stable Mailing List <freebsd-stable@freebsd.org>
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-stable>,
 <mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable/>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
 <mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 22 Mar 2014 05:02:51 -0000

On Thu, Mar 20, 2014 at 4:46 PM, Marcelo Gondim <gondim@bsdinfo.com.br> wrote:

> On 20/03/14 11:58, John Baldwin wrote:
>
>> On Wednesday, March 19, 2014 1:47:10 pm Marcelo Gondim wrote:
>>> On 19/03/14 13:01, Kevin Oberman wrote:
>>>> On Wed, Mar 19, 2014 at 6:00 AM, Marcelo Gondim <gondim@bsdinfo.com.br> wrote:
>>>>> Hi all,
>>>>>
>>>>> Until a proper fix appears, I wrote the script below and put it in
>>>>> crontab to automatically kill zombie sshd processes.
>>>>>
>>>>> the_walking_dead.sh:
>>>>>
>>>>> #!/bin/sh
>>>>> kill -9 `ps afx|grep sshd|grep unknown|awk '{print $1}'`
>>>>>
>>>>>
>>>>> Put this in /etc/crontab:
>>>>>
>>>>> 00 1 * * *    root    the_walking_dead.sh
>>>>>
>>>>>
>>>> If 'kill -9' works, the process is not really a zombie. It simply still
>>>> has a socket open and is waiting for it to be closed before exiting.
>>>>
>>>> You might take a look at network sockets with sockstat(1) and see if you
>>>> can get any indication of why these sockets are not being closed. It may
>>>> be that the issue is not sshd but some other issue in the OS leaving
>>>> sockets open.
>>>>
>>> Hi Kevin,
>>>
>>> My ps -afx below:
>>>
>>> [...]
>>> 42139  -  Is       0:00.01 sshd: unknown [priv] (sshd)
>>> 42140  -  Z        0:00.01 <defunct>
>>> 42141  -  IW       0:00.00 sshd: unknown [pam] (sshd)
>>> 58445  -  Is       0:00.01 sshd: unknown [priv] (sshd)
>>> 58446  -  Z        0:00.02 <defunct>
>>> 58447  -  IW       0:00.00 sshd: unknown [pam] (sshd)
>>> 65635  -  Is       0:00.01 sshd: vinicius [priv] (sshd)
>>> 65636  -  Z        0:00.01 <defunct>
>>> [...]
>>>
>>> # sockstat | grep 42140
>>> #
>>>
>>> # sockstat | grep 58446
>>> #
>>>
>>> # sockstat | grep 65636
>>> #
>>>
>>> There is no socket associated with the zombie processes.
>>>
>> Do a pstree.  I bet the zombies are children of the other processes that
>> are stuck on a socket as Kevin described.
>>
> # ps afx|grep sshd |grep unk
> 10948  -  Is       0:00.02 sshd: unknown [priv] (sshd)
> 10955  -  IW       0:00.00 sshd: unknown [pam] (sshd)       <====
> 11701  -  Is       0:00.02 sshd: unknown [priv] (sshd)
> 11704  -  IW       0:00.00 sshd: unknown [pam] (sshd)
> 25450  -  Is       0:00.01 sshd: unknown [priv] (sshd)
> 25452  -  IW       0:00.00 sshd: unknown [pam] (sshd)
> 41193  -  Is       0:00.02 sshd: unknown [priv] (sshd)
> 41196  -  IW       0:00.00 sshd: unknown [pam] (sshd)
> 42193  -  Is       0:00.02 sshd: unknown [priv] (sshd)
> 42195  -  IW       0:00.00 sshd: unknown [pam] (sshd)
> 80638  -  Is       0:00.02 sshd: unknown [priv] (sshd)
> 80640  -  IW       0:00.00 sshd: unknown [pam] (sshd)
> 81484  -  Is       0:00.02 sshd: unknown [priv] (sshd)
> 81486  -  IW       0:00.00 sshd: unknown [pam] (sshd)
>
> With procstat I could see the socket as follows:
>
> # procstat -f 10955
>   PID COMM               FD T V FLAGS     REF  OFFSET PRO NAME
> 10955 sshd              text v r r-------  -       - - /usr/sbin/sshd
> 10955 sshd               cwd v d r-------  -       - - /
> 10955 sshd              root v d r-------  -       - - /
> 10955 sshd                 0 v c rw------  6       0 - /dev/null
> 10955 sshd                 1 v c rw------  6       0 - /dev/null
> 10955 sshd                 2 v c rw------  6       0 - /dev/null
> 10955 sshd                 3 s - rw---n--  2       0 TCP 186.xxx.xx.2:22 186.xxx.xx.8:57035
> 10955 sshd                 5 p - rw------  2       0 - -
> 10955 sshd                 6 s - rw------  2       0 UDS -
> 10955 sshd                 7 p - rw------  1       0 - -
> 10955 sshd                 8 s - rw------  2       0 UDS -
>
> I do not understand why these connections remain stuck on FreeBSD 10.0.
>
> I'll try this sysctl: net.inet.tcp.delayed_ack=0
>

If the problem is still showing up, can you see what is going on with the
socket? What is the state of the connection? Try "netstat -f inet -p tcp"
and see what state the connection is in. I'm wondering if there is some
sort of race going on where the socket hangs.
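
As a rough sketch, the state check could be scripted along these lines. The
function name summarize_ssh_states is made up for illustration, the added
"-n" flag (numeric addresses, so the local port shows as ".22") is an
assumption on top of the command above, and so is sshd listening on port 22.
It only counts connections per TCP state; it does not explain why a socket
is stuck.

```shell
#!/bin/sh
# Hypothetical helper: count TCP connections per state for local port 22.
# Reads FreeBSD-style "netstat -n" output on stdin, where the local
# address is field 4 (address.port) and the state is field 6.
summarize_ssh_states() {
    awk '$1 ~ /^tcp/ && $4 ~ /\.22$/ { n[$6]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# Intended use on the affected host (assumption, not run here):
#   netstat -n -f inet -p tcp | summarize_ssh_states
```

A pile of connections sitting in one state, for example CLOSE_WAIT (the peer
has closed but the local process has not), would be worth reporting back to
the list.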

Ideally I'd like to try to capture the packets at the end of the session.
Can you do something to trigger this reliably? If so, a standard "tcpdump
-pw file.bpf host HOST" should do. I seem to recall that these connections
are scheduled. If so, you can put the packet capture in a crontab to run at
the same time. If you feed the capture to a tool like Wireshark, you should
get a good idea of what is happening, if not why. I understand that the
timing of this might be very tricky.
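
For the crontab case, one possible way to keep the capture from running
forever is a small wrapper like the one below. This is only a sketch: the
name run_for, the 600-second window and the output path in the usage comment
are all illustrative assumptions, and HOST is the same placeholder as above.
It starts a command in the background, waits a fixed number of seconds, then
sends SIGTERM so the command can exit cleanly.

```shell
#!/bin/sh
# Hypothetical helper: run a command for at most N seconds.
# Usage: run_for SECONDS COMMAND [ARGS...]
run_for() {
    secs=$1
    shift
    "$@" &                      # start the command in the background
    pid=$!
    sleep "$secs"               # let it run for the requested window
    kill "$pid" 2>/dev/null     # SIGTERM; ignored if it already exited
    wait "$pid"                 # reap it and return its exit status
}

# Intended use from a cron-driven script (assumption, not run here):
#   run_for 600 tcpdump -p -w /var/tmp/sshd.bpf host HOST
```

The window has to start before the scheduled connection and outlast it,
which is where the tricky timing comes in.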
-- 
R. Kevin Oberman, Network Engineer, Retired
E-mail: rkoberman@gmail.com