From: Adrian Chadd <adrian.chadd@gmail.com>
To: Michael Tratz
Cc: Rick Macklem, FreeBSD Stable Mailing List
Date: Sun, 25 Aug 2013 04:42:33 -0700
Subject: Re: NFS deadlock on 9.2-Beta1
In-Reply-To: <40674FAC-33E6-4994-819E-6B8318B9DDB3@esosoft.com>
References: <461392652.9990692.1376602743970.JavaMail.root@uoguelph.ca>
 <40674FAC-33E6-4994-819E-6B8318B9DDB3@esosoft.com>

Hi,

Does -HEAD have this same problem? If so, we should likely just revert the
patch entirely from -HEAD and -9 until it's resolved.

-adrian

On 24 August 2013 23:51, Michael Tratz wrote:

>
> On Aug 15, 2013, at 2:39 PM, Rick Macklem wrote:
>
> > Michael Tratz wrote:
> >>
> >> On Jul 27, 2013, at 11:25 PM, Konstantin Belousov wrote:
> >>
> >>> On Sat, Jul 27, 2013 at 03:13:05PM -0700, Michael Tratz wrote:
> >>>> Let's assume the pid which started the deadlock is 14001 (it will
> >>>> be a different pid when we get the results, because the machine
> >>>> has been restarted)
> >>>>
> >>>> I type:
> >>>>
> >>>> show proc 14001
> >>>>
> >>>> I get the thread numbers from that output and type:
> >>>>
> >>>> show thread xxxxx
> >>>>
> >>>> for each one.
> >>>>
> >>>> And a trace for each thread with the command?
> >>>>
> >>>> tr xxxx
> >>>>
> >>>> Anything else I should try to get or do? Or is that not the data
> >>>> at all you are looking for?
> >>>>
> >>> Yes, everything else which is listed in the 'debugging deadlocks'
> >>> page must be provided, otherwise the deadlock cannot be tracked.
> >>>
> >>> The investigator should be able to see the whole deadlock chain
> >>> (loop) to make any useful advance.
> >>
> >> Ok, I have made some excellent progress in debugging the NFS
> >> deadlock.
> >>
> >> Rick! You are a genius. :-) You found the right commit: r250907
> >> (dated May 22) is definitely the problem.
> >>
> >> Here is how I did the testing: One machine received a kernel before
> >> r250907, the second machine received a kernel after r250907.
> >> Sure enough, within a few hours the machine with r250907 went into
> >> the usual deadlock state. The machine without that commit kept on
> >> working fine. Then I went back to the latest revision (r253726), but
> >> leaving r250907 out. The machines have been running happily and rock
> >> solid without any deadlocks. I have expanded the testing to 3
> >> machines now and there are no reports of any issues.
> >>
> >> I guess now Konstantin has to figure out why that commit is causing
> >> the deadlock. Lovely! :-) I will get that information as soon as
> >> possible. I'm a little behind with my normal workload, but I expect
> >> to have the data by Tuesday evening or Wednesday.
> >>
> > Have you been able to pass the debugging info on to Kostik?
> >
> > It would be really nice to get this fixed for FreeBSD 9.2.
> >
> > Thanks for your help with this, rick
>
> Sorry Rick, I wasn't able to get you guys that info quickly enough. I
> thought I would have enough time before my own wedding and honeymoon came
> along, but everything went a little crazy and stressful. I didn't think it
> would be this nuts. :-)
>
> I'm caught up with everything, and from what I can see from the
> discussions, we now know what the problem is.
>
> I can report that the machines which I have had without r250907 have been
> running without any problems for 27+ days.
>
> If you need me to test any new patches, please let me know. If I should
> test with the partial merge of r253927, I'll be happy to do so.
>
> Thanks,
>
> Michael
>