Date: Mon, 10 Jan 2011 18:48:26 +0100
From: "Ronald Klop" <ronald-freebsd8@klop.yi.org>
To: "Rick Macklem" <rmacklem@uoguelph.ca>, "Kostik Belousov" <kostikbel@gmail.com>
Cc: freebsd-stable@freebsd.org
Subject: Re: Hang in VOP_LOCK1_APV on 8-STABLE with NFS.
Message-ID: <op.vo3s20fg8527sy@212-123-145-58.ip.telfort.nl>
In-Reply-To: <20110107195257.GF12599@deviant.kiev.zoral.com.ua>
References: <op.voxs8lqx8527sy@212-123-145-58.ip.telfort.nl> <1542786719.258389.1294429045433.JavaMail.root@erie.cs.uoguelph.ca> <20110107195257.GF12599@deviant.kiev.zoral.com.ua>

On Fri, 07 Jan 2011 20:52:57 +0100, Kostik Belousov <kostikbel@gmail.com> wrote:

> On Fri, Jan 07, 2011 at 02:37:25PM -0500, Rick Macklem wrote:
>> > Hi,
>> >
>> > OpenOffice hangs on NFS when I try to save a file or even when I try
>> > to
>> > open the save dialog in this case.
>> >
>> >
>> > $ 17:25:35 ronald@ronald [~]
>> > procstat -kk 85575
>> > PID TID COMM TDNAME KSTACK
>> > 85575 100322 soffice.bin initial thread mi_switch+0x176
>> > sleepq_wait+0x3b __lockmgr_args+0x655 vop_stdlock+0x39
>> > VOP_LOCK1_APV+0x46
>> > _vn_lock+0x44 vget+0x67 vfs_hash_get+0xeb nfs_nget+0xa8
>> > nfs_lookup+0x65e
>> > VOP_LOOKUP_APV+0x40 lookup+0x48a namei+0x518 kern_statat_vnhook+0x82
>> > kern_statat+0x15 lstat+0x22 syscallenter+0x186 syscall+0x40
>> > 85575 100502 soffice.bin - mi_switch+0x176
>> > sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _sleep+0x1a0
>> > do_cv_wait+0x639 __umtx_op_cv_wait+0x51 syscallenter+0x186
>> > syscall+0x40
>> > Xfast_syscall+0xe2
>> > 85575 100576 soffice.bin - mi_switch+0x176
>> > sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _sleep+0x1a0
>> > do_cv_wait+0x639 __umtx_op_cv_wait+0x51 syscallenter+0x186
>> > syscall+0x40
>> > Xfast_syscall+0xe2
>> > 85575 100577 soffice.bin - mi_switch+0x176
>> > sleepq_catch_signals+0x309 sleepq_wait_sig+0xc _sleep+0x25d
>> > kern_accept+0x19c accept+0xfe syscallenter+0x186 syscall+0x40
>> > Xfast_syscall+0xe2
>> > 85575 100578 soffice.bin - mi_switch+0x176
>> > sleepq_catch_signals+0x309 sleepq_wait_sig+0xc _cv_wait_sig+0x10e
>> > seltdwait+0xed poll+0x457 syscallenter+0x186 syscall+0x40
>> > Xfast_syscall+0xe2
>> > 85575 100579 soffice.bin - mi_switch+0x176
>> > sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12
>> > _cv_timedwait_sig+0x11d seltdwait+0x79 poll+0x457 syscallenter+0x186
>> > syscall+0x40 Xfast_syscall+0xe2
>> >
>> > $ 17:25:35 ronald@ronald [~]
>> > uname -a
>> > FreeBSD ronald.office.base.nl 8.2-PRERELEASE FreeBSD 8.2-PRERELEASE
>> > #6:
>> > Mon Dec 27 23:49:30 CET 2010
>> > root@ronald.office.base.nl:/usr/obj/usr/src/sys/GENERIC amd64
>> >
>> I think all the above tells us is that the thread is waiting for
>> a vnode lock. The question then becomes "what is holding a lock
>> on that vnode and why?".
>>
>> > It is not possible to exit or kill soffice.bin. I had a slightly
>> > different
>> > procstat stack before, but that was fixed a couple of days ago.
>>
>> Yea, it will be in an uninterruptible sleep when waiting for a vnode
>> lock.
>>
>> > Any thoughts? Enabling local locks in NFS doesn't fix it.
>>
>> Here's some things you could try:
>> 1 - apply the attached patch. It fixes a known problem w.r.t. the
>>     client side of the krpc. Not likely to fix this, but I can hope:-)
> 1a - Look around for other processes in the uninterruptible sleep state;
> quite possibly, one of them also owns the lock that OpenOffice is waiting
> for. Also see
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
>
> Of particular interest are the witness output and backtraces for
> all threads that are reported by witness as owning the vnode locks.
>
>> 2 - If #1 doesn't fix the problem:
>>     - before making it hang, start capturing packets via:
>>       # tcpdump -s 0 -w xxx host server
>>     - then make it hang, kill the above and
>>       # procstat -ka
>>       # ps axHlww
>>       and capture the output of both of these. Hopefully these 2 commands
>>       will indicate what is holding the vnode lock and maybe, why. The
>>       "xxx" file can be looked at in wireshark to see what/if any NFS
>>       traffic is happening.
>>     If you aren't comfortable looking at the above, you can email them
>>     to me and I'll take a stab at them someday.
>> 3 - Try the experimental client to see if it behaves differently.
>>     The mount command is:
>>       # mount -t newnfs -o nfsv3,<the options you already use> server:/path /mntpath
>>     (This might identify if the regular client has an infrequently
>>     executed code path that forgets to unlock the vnode, since it uses
>>     a somewhat different RPC layer. The buffer cache handling etc are
>>     almost the same, but the RPC stuff is fairly different.)
>>
>> > The nfs server is an up-to-date Linux Debian 5 with kernel 2.6.26.
>> >
>> I'm afraid I can't blame Linux (at least not until we have more info;-).
>>
>> > If more info is needed, I can easily reproduce this.
>>
>> See above #2.
>>
>> Good luck with it and let us know how it goes, rick

Hi,

I have got the first steps set up. No solution yet.

1. With the patch OpenOffice opens my homedir (yeah!), but it gives an I/O
   error when saving a file and everything hangs after that.
2. I have dumps and stuff. I will mail some links in private e-mail.
3. Didn't work. It mounts, but ls -l /home gives "Operation not permitted".

I didn't see other processes in uninterruptible state. But maybe you guys
see more than I do.
If you don't see anything in wireshark I will try WITNESS and friends
later this week. Already 2 hours busy with this during work hours.

Ronald.
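
For sifting through a large `procstat -ka` capture like the one suggested in step 2, something along these lines might help. It is only a rough sketch: it assumes one thread per line (as procstat prints it, not as the mail client re-wrapped it above), and the set of "vnode lock" frame names is taken from the soffice.bin stack quoted in this thread, not from any authoritative list. The function name `vnode_lock_waiters` and the `SAMPLE` text are made up for illustration. Note that it lists threads *waiting* on a vnode lock; it cannot show which thread holds the lock.

```python
import re

# Frame names that appear in the blocked soffice.bin stack quoted in this
# thread. Assumption: a stack containing any of them is waiting on a vnode lock.
VNODE_LOCK_FRAMES = {"__lockmgr_args", "vop_stdlock", "VOP_LOCK1_APV", "_vn_lock"}

# PID, TID, COMM, then the rest (thread name plus kernel stack frames).
LINE_RE = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\S+)\s+(.*)$")


def vnode_lock_waiters(procstat_text):
    """Return (pid, tid, comm, frames) for threads whose kernel stack
    contains one of the vnode-lock frames listed above."""
    waiters = []
    for line in procstat_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # header line or blank line
        pid, tid, comm, rest = match.groups()
        # Stack frames look like "name+0xoffset"; thread-name words do not.
        frames = [tok.split("+0x")[0] for tok in rest.split() if "+0x" in tok]
        if VNODE_LOCK_FRAMES.intersection(frames):
            waiters.append((int(pid), int(tid), comm, frames))
    return waiters


# Illustrative input: two threads from the output above, re-joined onto one
# line each and truncated.
SAMPLE = """\
  PID    TID COMM             TDNAME           KSTACK
85575 100322 soffice.bin      initial thread   mi_switch+0x176 sleepq_wait+0x3b __lockmgr_args+0x655 vop_stdlock+0x39 VOP_LOCK1_APV+0x46 _vn_lock+0x44 vget+0x67
85575 100577 soffice.bin      -                mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_wait_sig+0xc _sleep+0x25d kern_accept+0x19c accept+0xfe
"""

for pid, tid, comm, frames in vnode_lock_waiters(SAMPLE):
    print(pid, tid, comm, " ".join(frames))
```

To use it on a real capture, one could save the `procstat -ka` output to a file and pass the file contents to `vnode_lock_waiters()`; the holder of the lock would still have to be found by reading the remaining stacks or via WITNESS, as suggested in 1a.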