From: John Baldwin
To: "Gleb Kozyrev"
Cc: freebsd-current@freebsd.org
Date: Fri, 26 Oct 2007 12:22:28 -0400
Subject: Re: Deadlock, exclusive sx so_rcv_sx, amd64
Message-Id: <200710261222.28656.jhb@freebsd.org>
References: <200710251435.58984.jhb@freebsd.org>
List-Id: Discussions about the use of FreeBSD-current

On Friday 26 October 2007 05:52:07 am Gleb Kozyrev wrote:
> On 25/10/2007, John Baldwin wrote:
> > > Running rtorrent and ftp brings my system to a deadlock
> > > in a few hours. Kernel still responds to pings and sends some
> > > TCP acks.
> ...
> > > Please suggest any other commands to run in DDB if needed.
> > > Cores are saved.
> >
> > show sleepchain will show if it's a real deadlock or not.
> >
>
> This time the freeze was a matter of minutes.
>
> db> ps
>   pid  ppid  pgrp   uid  state  wmesg   wchan               cmd
>  1229   991   991     0  ?                                  smbd
>  1201  1195  1201  1001  SL+    pfault  0xffffffff80b1359c  rtorrent
>  1199  1193  1199  1001  Ss+    ttyin   0xffffff0001211410  tcsh
>  1197  1193  1197  1001  Ss+    ttyin   0xffffff0001218810  tcsh
>  1195  1193  1195  1001  Ss+    pause   0xffffff000624a0c0  tcsh
>  1193  1192  1193  1001  SLs    pfault  0xffffffff80b1359c  screen
>  1192  1190  1190  1001  S+     pause   0xffffff00013c10c0  screen
>  1190  1189  1190  1001  Ss+    pause   0xffffff00065b40c0  tcsh
>  1189  1187  1187  1001  S      select  0xffffffff80af79d0  sshd
>  1187  1097  1187     0  Ss     sbwait  0xffffff00065346cc  sshd
> ...
>
> db> show alllocks
> Process 1187 (sshd) thread 0xffffff00065ad350 (100166)
> exclusive sx so_rcv_sx r = 0 (0xffffff0006534670) locked @
> /usr/src/sys/kern/uipc_sockbuf.c:145
>
> db> show sleepchain 1187
> thread 100166 (pid 1187, sshd) sleeping on 0xffffff00065346cc "sbwait"
> db> show sleepchain 1201
> thread 100164 (pid 1201, rtorrent) sleeping on 0xffffffff80b1359c "pfault"
>
> Nothing interesting I guess...
> Maybe this is not a deadlock, what else can cause such a freeze?
> I won't reboot it for a while -- maybe someone can suggest anything else.

"sbwait" is waiting for data to come in on a socket and "pfault" is waiting on
disk I/O.  It is a bit odd that 1187 is holding a lock while sleeping, though
that is permitted with an sx lock.  Still, if the lock is supposed to protect
the socket's receive buffer, that is odd.  Maybe get a trace of the process
blocked in "sbwait" (tr ) and bug rwatson@ about it.

-- 
John Baldwin