From: Rick Macklem
To: J David
Cc: Konstantin Belousov, freebsd-stable, scottl, Michael Tratz,
    Steven Hartland
Date: Tue, 20 Aug 2013 18:18:16 -0400 (EDT)
Subject: Re: NFS deadlock on 9.2-Beta1
Message-ID: <937358501.11648801.1377037096794.JavaMail.root@uoguelph.ca>

J David wrote:
> On Thu, Aug 15, 2013 at 5:39 PM, Rick Macklem wrote:
> > Have you been able to pass the debugging info on to Kostik?
> >
> > It would be really nice to get this fixed for FreeBSD9.2.
>
> You're probably not talking to me, but headway here is slow. At our
> location, we have been continuing to test releng/9.2 extensively, but
> with r250907 reverted. Since reverting it solves the issue, and since
> there haven't been any further changes to releng/9.2 that might also
> resolve this issue, re-applying r250907 is perceived here as un-fixing
> a problem. Enthusiasm for doing so is correspondingly low, even if
> the purpose is to gather debugging info. :(
>
> However, we finally got clearance to test releng/9.2 r254540 with
> r250907 included and with DDB on five nodes. The problem cropped up
> in about an hour. Two threads in one process deadlocked, which was
> perfect. Got it into DDB and saw the stack trace was scrolling off,
> so there was no way to copy it by hand. Also, the machine's disk is
> smaller than physical RAM, so no dump file. :(
>
> Here's what is available so far:
>
> db> show proc 33362
> Process 33362 (httpd) at 0xcd225b50:
>  state: NORMAL
>  uid: 25000  gids: 25000
>  parent: pid 25104 at 0xc95f92d4
>  ABI: FreeBSD ELF32
>  arguments: /usr/local/libexec/httpd
>  threads: 3
> 100405  D  newnfs  0xc9b875e4  httpd
>
Ok, so this one is waiting for an NFS vnode lock.

> 100393  D  pgrbwt  0xc43a30c0  httpd
>
This one is sleeping in vm_page_grab() (which I suspect has been called
from kern_sendfile() with a shared vnode lock held, from what I saw in
the previous debug info).

> 100755  S  uwait   0xc84b7c80  httpd
>
> Not much to go on. :( Maybe these five can be configured with serial
> consoles.
>
> So, inquiries are continuing, but the answer to "does this still
> happen on 9.2-RC2?" is definitely yes.
>
Since r250027 moves a vn_lock() to before the vm_page_grab() call in
kern_sendfile(), I suspect that is the cause of the deadlock.
(r250027 is one of the 3 commits MFC'd by r250907.)

I don't know whether it would be safe to VOP_UNLOCK() the vnode after
VOP_GETATTR() and then put back the vn_lock() call that comes after
vm_page_grab(), or whether r250027 should be reverted (getting rid of
the VOP_GETATTR() and going back to using the size in the vm stuff).

Hopefully Kostik will know what is best to do with it now, rick

> Thanks!
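
To make the suspected lock ordering concrete, here is a small user-space
sketch of the wait-for cycle described above. It is only a model, not
FreeBSD kernel code. The names find_cycle, "vnode_lock" and "busy_page"
are made up for illustration. It also assumes that thread 100405 owns
the busy page that thread 100393 is sleeping on, and that 100405 wants
the vnode lock in a mode that conflicts with the shared hold; the DDB
output above does not confirm either point.

```python
# Illustrative model only; this is NOT FreeBSD kernel code.
# ASSUMPTION: thread 100405 owns the busy page that 100393 sleeps on.
# ASSUMPTION: 100405 wants the vnode lock in a conflicting mode.


def find_cycle(holds, waits):
    """Return the threads forming a wait-for cycle, or [] if there is none.

    holds: dict mapping resource -> thread that currently owns it
    waits: dict mapping thread -> resource that thread is blocked on
    """
    for start in waits:
        path = []
        thread = start
        while thread is not None and thread not in path:
            path.append(thread)
            resource = waits.get(thread)   # what is this thread blocked on?
            thread = holds.get(resource)   # who owns that resource, if anyone?
        if thread is not None:
            # We came back to a thread already on the path: a cycle.
            return path[path.index(thread):]
    return []


# Ordering suspected after r250027: vn_lock() is taken before
# vm_page_grab(), so 100393 sleeps on the page while holding the vnode.
holds_locked = {"vnode_lock": "100393", "busy_page": "100405"}
waits_locked = {"100393": "busy_page", "100405": "vnode_lock"}

# Hypothetical alternative: the vnode is unlocked before vm_page_grab(),
# so 100405 can take the vnode lock and only 100393 is left waiting.
holds_unlocked = {"vnode_lock": "100405", "busy_page": "100405"}
waits_unlocked = {"100393": "busy_page"}

deadlocked = find_cycle(holds_locked, waits_locked)
resolved = find_cycle(holds_unlocked, waits_unlocked)
print("cycle with vnode held across page grab:", deadlocked)
print("cycle with vnode dropped before page grab:", resolved)
```

In this model the first case reports a cycle between the two threads
and the second reports none, which matches the idea behind both
options above (dropping the lock after VOP_GETATTR(), or reverting
r250027). Whether either is actually safe in the kernel is a separate
question that the model does not answer.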