Date:      Wed, 5 Jul 2006 14:38:22 +0300
From:      Kostik Belousov <kostikbel@gmail.com>
To:        Robert Watson <rwatson@freebsd.org>
Cc:        freebsd-stable@freebsd.org, Michel Talon <talon@lpthe.jussieu.fr>
Subject:   Re: NFS Locking Issue
Message-ID:  <20060705113822.GM37822@deviant.kiev.zoral.com.ua>
In-Reply-To: <20060705100403.Y80381@fledge.watson.org>
References:  <E1FxzUU-000MMw-5m@cs1.cs.huji.ac.il> <20060705100403.Y80381@fledge.watson.org>

On Wed, Jul 05, 2006 at 10:09:24AM +0100, Robert Watson wrote:
> The most significant problem working with rpc.lockd is creating easy to 
> reproduce test cases.  Not least because they can potentially involve 
> multiple clients.  If you can help to produce simple test cases to 
> reproduce the bugs you're seeing, that would be invaluable.
> 
........
> 
> Reducing complex failure modes to easily reproduced test cases is tricky 
> also, though.  It requires careful analysis, often with ktrace and 
> tcpdump/ethereal to work out what's going on, and not a little luck to 
> perform the reduction of a large trace down to a simple test scenario.  The 
> first step is to try and figure out what, if any, specific workload results 
> in a problem.  For example, can you trigger it using work on just one 
> client against a server, without client<->client interactions?  This makes 
> tracking and reproduction a lot easier, as multi-client test cases are 
> really tricky!  Once you've established whether it can be reproduced with a 
> single client, you have to track down the behavior that triggers it -- 
> normally, this is done by attempting to narrow down the specific program or 
> sequence of events that causes the bug to trigger, removing things one at a 
> time to see what causes the problem to disappear.  This is made more 
> difficult as lock managers are sensitive to timing, so removing a high load 
> item from the list, even if it isn't the source of the problem, might cause 
> it to trigger less frequently.

I made a patch for rpc.lockd that should make it somewhat easier to
obtain debug information. The patch is available at
http://people.freebsd.org/~kib/rpc.lockd-debug.patch

There are no functional changes. The patch only adds a dump of the
currently held locks (as perceived by lockd) on receipt of SIGUSR1.
You need to specify debug level 2 or 3 to obtain the dump.
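
For readers unfamiliar with the dump-on-signal approach, here is a
minimal sketch in C of how such a facility is commonly structured. It
is not taken from the patch: struct held_lock, debug_level,
install_dump_handler() and dump_locks_if_requested() are hypothetical
names made up for illustration, and the real rpc.lockd data structures
and output format differ. The handler only sets a flag; the dump itself
runs later from the main loop, which keeps the handler
async-signal-safe.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical lock record; the real rpc.lockd structures differ. */
struct held_lock {
    const char *client;  /* host name of the client holding the lock */
    long long start;     /* first byte of the locked range */
    long long len;       /* length of the range; 0 means "to end of file" */
};

/* Set by the signal handler, cleared by the main loop. */
static volatile sig_atomic_t dump_requested = 0;

/* Assumed stand-in for the daemon's debug level setting. */
static int debug_level = 2;

/* Async-signal-safe: only records that a dump was requested. */
static void on_sigusr1(int signo)
{
    (void)signo;
    dump_requested = 1;
}

/* Install the SIGUSR1 handler; returns 0 on success, -1 on failure. */
static int install_dump_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/*
 * Meant to be called from the main loop. If a dump was requested and
 * the debug level is at least 2, write one line per lock to 'out'.
 * Returns the number of locks written.
 */
static int dump_locks_if_requested(const struct held_lock *locks, int n,
    FILE *out)
{
    int i;

    assert(locks != NULL || n == 0);
    if (!dump_requested)
        return 0;
    dump_requested = 0;
    if (debug_level < 2)
        return 0;
    for (i = 0; i < n; i++)
        fprintf(out, "lock %d: client=%s start=%lld len=%lld\n",
            i, locks[i].client, locks[i].start, locks[i].len);
    return n;
}
```

In a daemon built this way, the main loop would call
dump_locks_if_requested() once per iteration, and "kill -USR1 <pid>"
from a shell would trigger the dump on the next pass.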

Also, both lockd processes now put identification information in the
proctitle (srv and kern). SIGUSR1 should be sent to the srv process.

