Date: Tue, 3 Jan 2006 13:28:02 -0800 (PST)
From: Don Lewis <truckman@FreeBSD.org>
To: gcr+freebsd-stable@tharned.org
Cc: freebsd@McKusick.COM, freebsd-stable@FreeBSD.org, kris@obsecurity.org
Subject: Re: Recurring problem: processes block accessing UFS file system
Message-ID: <200601032128.k03LS2Il007744@gw.catspoiler.org>
In-Reply-To: <20060103120454.O798@nc8000.tharned.org>
On 3 Jan, Greg Rivers wrote:
> On Tue, 22 Nov 2005, I wrote:
>
>> On Mon, 21 Nov 2005, Kris Kennaway wrote:
>>
>>> It may not be the same problem. You should also try to obtain a trace when
>>> snapshots are not implicated.
>>>
>>
>> Agreed. I'll do so at the first opportunity.
>>
>
> First, my thanks to all of you for looking into this.
>
> It's taken more than a month, but the problem has recurred without
> snapshots ever having been run. I've got a good trace of the machine in
> this state (attached). My apologies for the size of the debug output, but
> the processes had really stacked up this time before I noticed it.
>
> I have enough capacity that I can afford to have this machine out of
> production for a while, so I've left it suspended in kdb for the time
> being in case additional information is needed. Please let me know if
> there's anything else I can do to facilitate troubleshooting this.
> Thanks!

There are a large number of sendmail processes waiting on vnode locks
which are held by other sendmail processes that are waiting on other
vnode locks, etc. until we get to sendmail pid 87150 which is holding a
vnode lock and waiting to lock a buf.

Tracing command sendmail pid 87150 tid 100994 td 0xcf1c5480
sched_switch(cf1c5480,0,1,b2c5195e,a480a2bc) at sched_switch+0x158
mi_switch(1,0,c04d7b33,dc713fb0,ec26a6ac) at mi_switch+0x1d5
sleepq_switch(dc713fb0,ec26a6e0,c04bb9ce,dc713fb0,50) at sleepq_switch+0x16f
sleepq_wait(dc713fb0,50,c0618ef5,0,202122) at sleepq_wait+0x11
msleep(dc713fb0,c0658430,50,c0618ef5,0) at msleep+0x3d7
acquire(ec26a748,120,60000,15c2e6e0,0) at acquire+0x89
lockmgr(dc713fb0,202122,c89855cc,cf1c5480,dc76fe30) at lockmgr+0x45f
getblk(c8985550,15c2e6e0,0,4000,0) at getblk+0x211
breadn(c8985550,15c2e6e0,0,4000,0) at breadn+0x52
bread(c8985550,15c2e6e0,0,4000,0) at bread+0x4c
ffs_vget(c8870000,ae58b3,2,ec26a8d4,8180) at ffs_vget+0x383
ffs_valloc(c8d41660,8180,c92e8d00,ec26a8d4,c05f9302) at ffs_valloc+0x154
ufs_makeinode(8180,c8d41660,ec26abd4,ec26abe8,ec26aa24) at ufs_makeinode+0x61
ufs_create(ec26aa50,ec26aa24,ec26ad04,ec26abc0,ec26ab0c) at ufs_create+0x36
VOP_CREATE_APV(c0646cc0,ec26aa50,2,ec26aa50,0) at VOP_CREATE_APV+0x3c
vn_open_cred(ec26abc0,ec26acc0,180,c92e8d00,6) at vn_open_cred+0x1fe
vn_open(ec26abc0,ec26acc0,180,6,c679eacb) at vn_open+0x33
kern_open(cf1c5480,81416c0,0,a03,180) at kern_open+0xca
open(cf1c5480,ec26ad04,c,cf1c5480,8169000) at open+0x36
syscall(3b,bfbf003b,bfbf003b,0,a02) at syscall+0x324
Xint0x80_syscall() at Xint0x80_syscall+0x1f

This doesn't appear to be a buf/memory exhaustion problem because
syncer, bufdaemon, and pagedaemon all appear to be idle.

What does "show lockedbufs" say?
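
For anyone following along, the chain-walking described above (each blocked process waits on a vnode lock held by another process, until we reach one that is blocked on something else) can be pictured with a small Python sketch. This is only an illustration of the reasoning, not how kdb or the kernel reports it. The `waits_on` mapping and pids 101, 102, and 103 are hypothetical; only pid 87150 comes from the trace.

```python
def find_root_holder(waits_on, start):
    """Follow 'waits on' edges from `start` until reaching a process
    that is not waiting on another process's lock.

    waits_on maps a blocked pid to the pid holding the lock it wants.
    Returns (root_pid, chain), where chain lists every pid visited.
    Raises ValueError if the chain loops back on itself (a deadlock cycle).
    """
    seen = []
    current = start
    while current in waits_on:
        if current in seen:
            raise ValueError(f"lock cycle detected at pid {current}")
        seen.append(current)
        current = waits_on[current]
    return current, seen + [current]


# Hypothetical wait-for data: 101 -> 102 -> 103 -> 87150.
# Pid 87150 has no entry because it waits on a buf lock,
# not on a vnode lock held by another process in this map.
waits_on = {101: 102, 102: 103, 103: 87150}

root, chain = find_root_holder(waits_on, 101)
print("root holder:", root)
print("chain:", " -> ".join(str(pid) for pid in chain))
```

The process at the end of the chain is the one worth tracing in detail, which is why the trace for pid 87150 is the interesting one here.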