Date:      Thu, 15 Aug 2013 17:39:03 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Michael Tratz <michael@esosoft.com>
Cc:        Konstantin Belousov <kostikbel@gmail.com>, freebsd-stable@freebsd.org, scottl  <scottl@freebsd.org>, Steven Hartland <killing@multiplay.co.uk>
Subject:   Re: NFS deadlock on 9.2-Beta1
Message-ID:  <461392652.9990692.1376602743970.JavaMail.root@uoguelph.ca>
In-Reply-To: <F20E755D-EE01-4411-8790-1E2BC7D8CD5D@esosoft.com>

Michael Tratz wrote:
> 
> On Jul 27, 2013, at 11:25 PM, Konstantin Belousov
> <kostikbel@gmail.com> wrote:
> 
> > On Sat, Jul 27, 2013 at 03:13:05PM -0700, Michael Tratz wrote:
> >> Let's assume the pid which started the deadlock is 14001 (it will
> >> be a different pid when we get the results, because the machine
> >> has been restarted)
> >> 
> >> I type:
> >> 
> >> show proc 14001
> >> 
> >> I get the thread numbers from that output and type:
> >> 
> >> show thread xxxxx
> >> 
> >> for each one.
> >> 
> >> And a trace for each thread with the command?
> >> 
> >> tr xxxx
> >> 
> >> Anything else I should try to get or do? Or is that not at all
> >> the data you are looking for?
> >> 
> > Yes, everything else which is listed in the 'debugging deadlocks'
> > page must be provided, otherwise the deadlock cannot be tracked.
> > 
> > The investigator should be able to see the whole deadlock chain
> > (loop) to make any useful advance.
> 
> Ok, I have made some excellent progress in debugging the NFS
> deadlock.
> 
> Rick! You are a genius. :-) You found the right commit: r250907
> (dated May 22) is definitely the problem.
> 
> Here is how I did the testing: One machine received a kernel before
> r250907, the second machine received a kernel after r250907. Sure
> enough within a few hours the machine with r250907 went into the
> usual deadlock state. The machine without that commit kept on
> working fine. Then I went back to the latest revision (r253726), but
> left r250907 out. The machines have been running happily and rock
> solid without any deadlocks. I have expanded the testing to 3
> machines now, with no reports of any issues.
> 
> I guess now Konstantin has to figure out why that commit is causing
> the deadlock. Lovely! :-) I will get that information as soon as
> possible. I'm a little behind with normal work load, but I expect to
> have the data by Tuesday evening or Wednesday.
> 
Have you been able to pass the debugging info on to Kostik?

It would be really nice to get this fixed for FreeBSD 9.2.

Thanks for your help with this, rick

> Thanks again!!
> 
> Michael
> 
> 


