From: John Baldwin
To: Karl Pielorz
Cc: freebsd-hackers@freebsd.org
Subject: Re: Stuck CLOSED sockets / sshd / zombies...
Date: Thu, 3 Apr 2014 12:32:16 -0400
Message-Id: <201404031232.16465.jhb@freebsd.org>
In-Reply-To: <6F730B3126CC5AE636D1E2A0@Mail-PC.tdx.co.uk>
References: <3FE645E9723756F22EF901AE@Mail-PC.tdx.co.uk>
 <201404031103.41171.jhb@freebsd.org>
 <6F730B3126CC5AE636D1E2A0@Mail-PC.tdx.co.uk>

On Thursday, April 03, 2014 11:59:07 am Karl Pielorz wrote:
>
> --On 03 April 2014 11:03 -0400 John Baldwin wrote:
>
> > Hmm, that fd value doesn't make any sense now.  Do you have the backtrace
> > for that process?
> > The fd may show up in the arguments to kern_readv().
>
> Ok, bt shows:
>
> "
> #0  sched_switch (td=0xfffff800238bb920, newtd=<value optimized out>,
>     flags=<value optimized out>) at ../../../kern/sched_ule.c:1938
> #1  0xffffffff808be76e in mi_switch (flags=260, newtd=0x0) at
>     ../../../kern/kern_synch.c:494
> #2  0xffffffff808f9002 in sleepq_catch_signals (wchan=0xfffff80002da4c24,
>     pri=104) at ../../../kern/subr_sleepqueue.c:429
> #3  0xffffffff808f8eaf in sleepq_wait_sig (wchan=0x0, pri=0) at
>     ../../../kern/subr_sleepqueue.c:634
> #4  0xffffffff808be195 in _sleep (ident=<value optimized out>,
>     lock=<value optimized out>, priority=360,
>     wmesg=0xffffffff80efbd30 "sbwait", sbt=<value optimized out>, pr=0,
>     flags=<value optimized out>) at ../../../kern/kern_synch.c:254
> #5  0xffffffff8092328c in sbwait (sb=<value optimized out>) at
>     ../../../kern/uipc_sockbuf.c:130
> #6  0xffffffff80926b44 in soreceive_generic (so=0xfffff80002da4ae0,
>     psa=0x0, uio=0xfffffe0000341ab0, mp0=0x0, controlp=0x0, flagsp=0x0)
>     at ../../../kern/uipc_socket.c:1496
> #7  0xffffffff8090346b in dofileread (td=0xfffff800238bb920, fd=8,
>     fp=0xfffff80002cf86e0, auio=0xfffffe0000341ab0,
>     offset=<value optimized out>, flags=0)
>     at file.h:295
>
>
> #8  0xffffffff809031a5 in kern_readv (td=0xfffff800238bb920, fd=8,
>     auio=0xfffffe0000341ab0) at ../../../kern/sys_generic.c:256
>
>
> #9  0xffffffff80903133 in sys_read (td=<value optimized out>,
>     uap=<value optimized out>) at ../../../kern/sys_generic.c:171
> #10 0xffffffff80c96cd7 in amd64_syscall (td=0xfffff800238bb920, traced=0)
>     at subr_syscall.c:134
> #11 0xffffffff80c7d3fb in Xfast_syscall () at
>     ../../../amd64/amd64/exception.S:391
> #12 0x000000080320d9ea in ?? ()
> "
>
> So, fd=8? - fstat seems to show that as:
>
> "
> USER     CMD          PID   FD MOUNT      INUM MODE         SZ|DV R/W
> root     sshd        4346    8* local stream fffff80002e55c30 <-> fffff80002e552d0
> ...
> root     sshd        4344    4* local stream fffff80002e552d0 <-> fffff80002e55c30
> "

Right, so it's just blocked on a UNIX domain socket from the parent waiting
for the parent to tell it to do something.  The root issue is the parent (as
I feared).  Is 4344 threaded (procstat -t)?

-- 
John Baldwin
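
For anyone following along, here is a rough userland sketch of the
situation described above: a child blocked in a read on one end of a
UNIX domain socket pair until the parent writes to the other end.  It is
only an illustration in Python, not sshd's actual code; the function
name demo(), the "request" payload, the extra pipe used to report back
and the 0.2 second timeout are all made up for the example.

```python
import os
import select
import socket


def demo():
    # Two connected AF_UNIX stream sockets, comparable to the
    # "local stream A <-> B" pair that fstat shows (hypothetical stand-in).
    parent_end, child_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    # Extra pipe so the child can report what it received (assumption,
    # only needed to observe the child from the parent).
    report_r, report_w = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child: sleeps inside recv() until the parent sends something.
        parent_end.close()
        os.close(report_r)
        data = child_end.recv(64)
        os.write(report_w, data)
        os._exit(0)

    child_end.close()
    os.close(report_w)

    # While the parent stays silent the child has nothing to report.
    ready, _, _ = select.select([report_r], [], [], 0.2)
    blocked_before = (ready == [])

    # Once the parent writes, the child's recv() returns.
    parent_end.sendall(b"request")
    echoed = os.read(report_r, 64)

    os.waitpid(pid, 0)
    parent_end.close()
    os.close(report_r)
    return blocked_before, echoed


if __name__ == "__main__":
    print(demo())
```

If the parent never gets around to the sendall() step, the child stays
asleep in the read indefinitely, which is why the parent is the process
worth looking at.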