From owner-freebsd-hackers@FreeBSD.ORG Wed Apr 2 14:41:16 2014
Subject: Re: Stuck CLOSED sockets / sshd / zombies...
From: Ian Lepore
To: Karl Pielorz
In-Reply-To: <3FE645E9723756F22EF901AE@Mail-PC.tdx.co.uk>
Date: Wed, 02 Apr 2014 08:41:13 -0600
Message-ID: <1396449673.81853.264.camel@revolution.hippie.lan>
Cc: freebsd-hackers@FreeBSD.org

On Wed, 2014-04-02 at 15:30 +0100, Karl Pielorz wrote:
> Hi All,
>
> This issue started in -xen (subject: *Stuck sshd in urdlck), moved to
> -stable (subject: sshd with zombie process on FreeBSD 10.0-STABLE), and
> -net (subject: Server sockets staying in CLOSED for extended), but seems to
> have died a death in all of them.
>
> It's affecting a number of people - predominately with sshd.
>
> Does anyone know how I can troubleshoot this further, what the cause / fix
> is, or if it's already actually fixed?
>
> "
> # ps ax | grep 4344
> ps axl | grep 4344
> 0 4344 895 0 20 0 84868 6944 urdlck Is - 0:00.01 sshd: unknown
> [priv] (sshd)
> 22 4345 4344 0 20 0 0 0 - Z - 0:00.00
> 0 4346 4344 0 21 0 84868 6952 sbwait I - 0:00.00 sshd: unknown
> [pam] (sshd)
>
> #ps axd
> ...
> 895 - Is 0:00.05 |-- /usr/sbin/sshd
> 3933 - Is 0:00.01 | |-- sshd: unknown [priv] (sshd)
> 3934 - Z 0:00.00 | | |--
> 3935 - I 0:00.00 | | `-- sshd: unknown [pam] (sshd)
> 4338 - Is 0:00.01 | |-- sshd: unknown [priv] (sshd)
> 4339 - Z 0:00.00 | | |--
> 4340 - I 0:00.00 | | `-- sshd: unknown [pam] (sshd)
> 4341 - Is 0:00.01 | |-- sshd: unknown [priv] (sshd)
> 4342 - Z 0:00.00 | | |--
> 4343 - I 0:00.00 | | `-- sshd: unknown [pam] (sshd)
> 4344 - Is 0:00.01 | |-- sshd: unknown [priv] (sshd)
> 4345 - Z 0:00.00 | | |--
> 4346 - I 0:00.00 | | `-- sshd: unknown [pam] (sshd)
> ...
>
> #netstat -a -n | grep CLOSED | wc -l
> 59
>
> #netstat -a | grep 54544
> tcp4 0 0 192.168.0.138.22 192.168.0.45.54544 CLOSED
>
> #sockstat | grep 4343
> root sshd 4343 3 tcp4 192.168.0.138:22 192.168.0.45:54544
> root sshd 4343 6 stream (not connected)
> root sshd 4343 8 stream -> ??
>
> #uname -a
> FreeBSD host 10.0-STABLE FreeBSD 10.0-STABLE #0 r261289M: Thu Jan 30
> 13:33:35 UTC 2014 x@domain.com:/usr/src/sys/amd64/compile/GENERIC amd64
> "
>
> For a box that's doing nothing (apart from people ssh'ing in occasionally)
> - there's obviously something wrong.
>
> What would be next to try and figure out why this is happening? - as I'd
> dearly like to know what's causing it / a fix (or if it's already fixed in
> -STABLE, and at which revision)
>
> Thanks,
>
> -Karl

I don't know anything about the underlying cause of the stuck sockets or
zombies, but I suspect the thing that triggered the appearance of the
problem was the import of a newer OpenSSH in which the default for the
UsePrivilegeSeparation option changed to "sandbox" (or maybe that value
was simply new in that version). I suspect this because the extra child
process forked off with that option exposed some kernel memory-management
problems on the arm platform a few months ago.
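For reference, the two values being compared would appear in sshd's
configuration file roughly as in the sketch below. It assumes the stock
FreeBSD location /etc/ssh/sshd_config, which the thread does not state.
It is only a sketch of the setting, not a recommended configuration:

```
# /etc/ssh/sshd_config (path assumed: stock FreeBSD location)

# Newer default suspected of triggering the problem:
#UsePrivilegeSeparation sandbox

# Temporary workaround only: turns privilege separation off entirely,
# giving up the security benefit it provides. Not a fix.
UsePrivilegeSeparation no
```

sshd only rereads this file when it is restarted or receives SIGHUP. On
FreeBSD, "sshd -t" checks the file for errors and "service sshd restart"
applies it.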
That may imply that adding "UsePrivilegeSeparation no" to sshd_config
could be a workaround for anyone having severe problems with this on a
production server, but it should in no way become mythology that doing
this somehow "fixes" a problem -- it would be purely a workaround, and we
should keep pursuing the actual problem.

-- Ian