Date: Wed, 5 Jul 2006 23:49:13 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
To: Francisco Reyes <lists@stringsutils.com>
Cc: freebsd-stable@freebsd.org, Michel Talon <talon@lpthe.jussieu.fr>
Subject: Re: NFS Locking Issue
Message-ID: <20060705234514.I70011@fledge.watson.org>
In-Reply-To: <cone.1152136419.991036.72616.1000@zoraida.natserv.net>
References: <E1FxzUU-000MMw-5m@cs1.cs.huji.ac.il> <20060705100403.Y80381@fledge.watson.org> <cone.1152136419.991036.72616.1000@zoraida.natserv.net>

On Wed, 5 Jul 2006, Francisco Reyes wrote:

>> can you trigger it using work on just one client against a server, without
>> client<->client interactions? This makes tracking and reproduction a lot
>> easier
>
> Personally I am experiencing two problems.
> 1- NFS clients freeze/hang if the server goes away.
> We have clients with several mounts so if one of the servers dies then the
> entire operation of the client is put in jeopardy.
>
> This I can reproduce every single time with a 6.X client.. with both a 5.X
> and a 6.X server.
>
> "umount -f" hangs too.

The problems you are experiencing are almost certainly not related to rpc.lockd, but rather to bugs in the NFS client. Let's just look at the normal use hang for now, and revisit umount -f after that.

>> as multi-client test cases are really tricky!
>
> The second case only happens under heavy load and restarting nfsd makes it
> go away. Basically 'b' column in vmstat goes high and the performance of
> the machine falls to the floor.
>
> Going to try
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
>
> And reading up on how to debug with DDB. Have another user who volunteered
> to give me some pointers.. so will try that.. so I am able to actually
> produce more helpful info.

If you can get into DDB when the hang has occurred, output via serial console for the following commands would be very helpful:

    show pcpu
    show allpcpu
    ps
    trace
    traceall
    show locks
    show alllocks
    show uma
    show malloc
    show lockedvnods

Note that the last two will only work if you compile WITNESS in -- WITNESS significantly changes kernel timing, so you may find it closes whatever race you're running into. If you can reproduce the problem with WITNESS and INVARIANTS, that would be very useful. The above output will hopefully tell us the basic state of the system with respect to processes, threads, locking, and so on, and may help us track things down.

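As a rough illustration of what compiling in DDB, WITNESS and INVARIANTS might look like, here is a minimal sketch of kernel configuration lines. It assumes a custom kernel config copied from GENERIC; the choice to include WITNESS_SKIPSPIN is an assumption made here to reduce overhead, not something requested in this message.

```text
# Sketch only: additions to a custom kernel config (assumed to be a copy of GENERIC)
options 	KDB			# kernel debugger framework
options 	DDB			# interactive debugger backend
options 	INVARIANTS		# extra run-time consistency checks
options 	INVARIANT_SUPPORT	# required by INVARIANTS
options 	WITNESS			# lock order verification
options 	WITNESS_SKIPSPIN	# assumption: skip spin mutexes to reduce overhead
```

Since WITNESS changes kernel timing, it may be worth trying to reproduce the hang both with and without these options.
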
For the above, you definitely want a serial console as it will be quite a bit of output. Also, can you send the output of the 'mount' command from the un-hung state? I notice a lot of threads stuck in 'ufs'.

Finally, during the above, could you disable background file system checking by placing the following in /etc/rc.conf:

    background_fsck="NO"

and boot to single user mode, doing a full fsck -p before booting up, in order to make sure the file system is in a good state before beginning?

Robert N M Watson
Computer Laboratory
University of Cambridge
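
The file system check described above could be carried out roughly as follows. This is a sketch under the assumption that the machine is booted from the loader prompt and that all file systems of interest are listed in /etc/fstab; the exact steps may differ on a given setup.

```text
# /etc/rc.conf: disable background file system checking
background_fsck="NO"

# At the loader prompt, boot to single user mode:
boot -s

# At the single-user shell, run a full preen-mode check:
fsck -p

# Leave the single-user shell to continue booting to multi-user:
exit
```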