Date: Thu, 15 Aug 2013 17:39:03 -0400 (EDT)
From: Rick Macklem
To: Michael Tratz
Cc: Konstantin Belousov, freebsd-stable@freebsd.org, scottl, Steven Hartland
Subject: Re: NFS deadlock on 9.2-Beta1
Message-ID: <461392652.9990692.1376602743970.JavaMail.root@uoguelph.ca>

Michael Tratz wrote:
>
> On Jul 27, 2013, at 11:25 PM, Konstantin Belousov wrote:
>
> > On Sat, Jul 27, 2013 at 03:13:05PM -0700, Michael Tratz wrote:
> >> Let's assume the pid which started the deadlock is 14001 (it will
> >> be a different pid when we get the results, because the machine
> >> has been restarted)
> >>
> >> I type:
> >>
> >> show proc 14001
> >>
> >> I get the thread numbers from that output and type:
> >>
> >> show thread xxxxx
> >>
> >> for each one.
> >>
> >> And a trace for each thread with the command?
> >>
> >> tr xxxx
> >>
> >> Anything else I should try to get or do? Or is that not the data
> >> at all you are looking for?
> >>
> > Yes, everything else which is listed in the 'debugging deadlocks'
> > page must be provided, otherwise the deadlock cannot be tracked.
> >
> > The investigator should be able to see the whole deadlock chain
> > (loop) to make any useful advance.
>
> Ok, I have made some excellent progress in debugging the NFS
> deadlock.
>
> Rick! You are genius. :-) You found the right commit r250907 (dated
> May 22) is the definitely the problem.
>
> Here is how I did the testing: One machine received a kernel before
> r250907, the second machine received a kernel after r250907. Sure
> enough within a few hours the machine with r250907 went into the
> usual deadlock state. The machine without that commit kept on
> working fine. Then I went back to the latest revision (r253726), but
> leaving r250907 out. The machines have been running happy and rock
> solid without any deadlocks. I have expanded the testing to 3
> machines now and no reports of any issues.
>
> I guess now Konstantin has to figure out why that commit is causing
> the deadlock. Lovely! :-) I will get that information as soon as
> possible. I'm a little behind with normal work load, but I expect to
> have the data by Tuesday evening or Wednesday.
>
Have you been able to pass the debugging info on to Kostik?

It would be really nice to get this fixed for FreeBSD9.2.

Thanks for your help with this, rick

> Thanks again!!
>
> Michael
>