Date: Mon, 29 Jul 2013 13:44:39 -0700
From: Michael Tratz <michael@esosoft.com>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: freebsd-stable@freebsd.org, Rick Macklem <rmacklem@uoguelph.ca>, Steven Hartland <killing@multiplay.co.uk>
Subject: Re: NFS deadlock on 9.2-Beta1
Message-ID: <F20E755D-EE01-4411-8790-1E2BC7D8CD5D@esosoft.com>
In-Reply-To: <20130728062545.GE4972@kib.kiev.ua>
References: <780BC2DB-3BBA-4396-852B-0EBDF30BF985@esosoft.com>
 <806421474.2797338.1374956449542.JavaMail.root@uoguelph.ca>
 <20130727205815.GC4972@kib.kiev.ua>
 <602747E8-0EBE-4BB1-8019-C02C25B75FA1@esosoft.com>
 <20130728062545.GE4972@kib.kiev.ua>
On Jul 27, 2013, at 11:25 PM, Konstantin Belousov <kostikbel@gmail.com> wrote:

> On Sat, Jul 27, 2013 at 03:13:05PM -0700, Michael Tratz wrote:
>> Let's assume the pid which started the deadlock is 14001 (it will be a different pid when we get the results, because the machine has been restarted)
>>
>> I type:
>>
>> show proc 14001
>>
>> I get the thread numbers from that output and type:
>>
>> show thread xxxxx
>>
>> for each one.
>>
>> And a trace for each thread with the command?
>>
>> tr xxxx
>>
>> Anything else I should try to get or do? Or is that not the data at all you are looking for?
>>
> Yes, everything else which is listed in the 'debugging deadlocks' page
> must be provided, otherwise the deadlock cannot be tracked.
>
> The investigator should be able to see the whole deadlock chain (loop)
> to make any useful advance.

Ok, I have made some excellent progress in debugging the NFS deadlock.

Rick! You are a genius. :-) You found the right commit: r250907 (dated May 22) is definitely the problem.

Here is how I did the testing: one machine received a kernel from before r250907, and the second machine received a kernel from after r250907. Sure enough, within a few hours the machine with r250907 went into the usual deadlock state. The machine without that commit kept on working fine. Then I went back to the latest revision (r253726), but left r250907 out. The machines have been running happily and rock solid without any deadlocks. I have expanded the testing to 3 machines now, with no reports of any issues.

I guess now Konstantin has to figure out why that commit is causing the deadlock. Lovely! :-) I will get that information as soon as possible. I'm a little behind with my normal workload, but I expect to have the data by Tuesday evening or Wednesday.

Thanks again!!

Michael