Date: Fri, 07 Mar 2008 07:43:46 +0100
From: Tomas Olsson <tol@stacken.kth.se>
To: Alec Kloss <alec-dated-1205290157.d7dd21@SetFilePointer.com>
Cc: afs@FreeBSD.org, arla-drinkers@stacken.kth.se, Robert Watson <rwatson@FreeBSD.org>, Garance A Drosehn <gad@FreeBSD.org>, Rasmus Kaj <kaj@kth.se>
Subject: Re: arla-devel port for FreeBSD (was: Patches to get Arla running on FreeBSD 8-CURRENT)
Message-ID: <1204872226.4059.15.camel@hippo.t.nxs.se>
In-Reply-To: <20080307024916.GC1911@hamlet.SetFilePointer.com>
References: <20080223102922.GF38141@hamlet.setfilepointer.com>
 <20080223110549.GG38141@hamlet.setfilepointer.com>
 <20080223161249.GH38141@hamlet.setfilepointer.com>
 <90334B40754BEDC2991E0147@ganymede.hub.org>
 <p0624081bc3e936674ece@[128.113.24.47]>
 <20080226061140.GI28956@hamlet.SetFilePointer.com>
 <20080301210055.GA8919@hamlet.SetFilePointer.com>
 <20080302161258.L21146@fledge.watson.org>
 <1204477663.4180.36.camel@hippo.t.nxs.se>
 <20080303045554.GC8919@hamlet.SetFilePointer.com>
 <20080307024916.GC1911@hamlet.SetFilePointer.com>

On Thu, 2008-03-06 at 20:49 -0600, Alec Kloss wrote:
> Anyway, Tomas, or others, do you have any hints for me about how
> best to start diagnosing and maybe fixing issues? The most
> repeatable way I've found to get bad behavior is to rsync -a
> /usr/src and /usr/obj into AFS. After 30 seconds or so of this,
> I'll start getting messages like these:
>
> lockmgr: thread 0xc6970840 unlocking unheld lock
> lockmgr: thread 0xc6970840 unlocking unheld lock
> lockmgr: thread 0xc6970840 unlocking unheld lock
> lockmgr: thread 0xc6970840 unlocking unheld lock
> lockmgr: thread 0xc6970840 unlocking unheld lock
>
> on the console. Eventually, rsync will block and generally things
> will decay. Overnight, I'm going to script the console while
> attempting this with nnpfsdeb almost-all set. This is, of course,
> a lot slower than arla normally runs, but I'm hoping someone may be
> able to see the source of the trouble. I'll post the console
> somewhere tomorrow.
>
> Anyway, any hints about debugging arla would be welcome.
>

Some random thoughts:

* If you don't have it yet, get a debug kernel with full vfs sanity
  checking etc.

* Set a breakpoint (or panic) at the lockmgr printf and inspect stack
  trace and other live threads.

* See if you can run into similar problems using arla's tests, if
  you're lucky there will be a faster way to trigger it.

* Perhaps you can cut down on almost-all. Not sure how much. Of course,
  there's always the risk that timing changes with nnpfsdebug on.

* try arlad --tracefile=foo.trace (in the cache dir) and cat it to
  nnpfs/readtrace.py to decipher it when you're done. It's fast and
  gives a complete log of arlad-nnpfs communication.

Hope this helps
/t