From owner-freebsd-stable@FreeBSD.ORG Wed Jan 4 01:56:23 2006 Return-Path: X-Original-To: freebsd-stable@FreeBSD.org Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0DC0A16A420 for ; Wed, 4 Jan 2006 01:56:23 +0000 (GMT) (envelope-from truckman@FreeBSD.org) Received: from gw.catspoiler.org (217-ip-163.nccn.net [209.79.217.163]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1FC8743D5E for ; Wed, 4 Jan 2006 01:56:22 +0000 (GMT) (envelope-from truckman@FreeBSD.org) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.13.3/8.13.3) with ESMTP id k041u5Z9008271; Tue, 3 Jan 2006 17:56:09 -0800 (PST) (envelope-from truckman@FreeBSD.org) Message-Id: <200601040156.k041u5Z9008271@gw.catspoiler.org> Date: Tue, 3 Jan 2006 17:56:05 -0800 (PST) From: Don Lewis To: gcr+freebsd-stable@tharned.org In-Reply-To: <20060103182349.X798@nc8000.tharned.org> MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii Cc: freebsd-stable@FreeBSD.org, freebsd@McKusick.COM, kris@obsecurity.org Subject: Re: Recurring problem: processes block accessing UFS file system X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 04 Jan 2006 01:56:23 -0000 On 3 Jan, Greg Rivers wrote: > On Tue, 3 Jan 2006, Don Lewis wrote: > >> There are large number of sendmail processes waiting on vnode locks >> which are held by other sendmail processes that are waiting on other >> vnode locks, etc. until we get to sendmail pid 87150 which is holding a >> vnode lock and waiting to lock a buf. >> >> Tracing command sendmail pid 87150 tid 100994 td 0xcf1c5480 >> sched_switch(cf1c5480,0,1,b2c5195e,a480a2bc) at sched_switch+0x158 >> mi_switch(1,0,c04d7b33,dc713fb0,ec26a6ac) at mi_switch+0x1d5 >> sleepq_switch(dc713fb0,ec26a6e0,c04bb9ce,dc713fb0,50) at sleepq_switch+0x16f >> sleepq_wait(dc713fb0,50,c0618ef5,0,202122) at sleepq_wait+0x11 >> msleep(dc713fb0,c0658430,50,c0618ef5,0) at msleep+0x3d7 >> acquire(ec26a748,120,60000,15c2e6e0,0) at acquire+0x89 >> lockmgr(dc713fb0,202122,c89855cc,cf1c5480,dc76fe30) at lockmgr+0x45f >> getblk(c8985550,15c2e6e0,0,4000,0) at getblk+0x211 >> breadn(c8985550,15c2e6e0,0,4000,0) at breadn+0x52 >> bread(c8985550,15c2e6e0,0,4000,0) at bread+0x4c >> ffs_vget(c8870000,ae58b3,2,ec26a8d4,8180) at ffs_vget+0x383 >> ffs_valloc(c8d41660,8180,c92e8d00,ec26a8d4,c05f9302) at ffs_valloc+0x154 >> ufs_makeinode(8180,c8d41660,ec26abd4,ec26abe8,ec26aa24) at ufs_makeinode+0x61 >> ufs_create(ec26aa50,ec26aa24,ec26ad04,ec26abc0,ec26ab0c) at ufs_create+0x36 >> VOP_CREATE_APV(c0646cc0,ec26aa50,2,ec26aa50,0) at VOP_CREATE_APV+0x3c >> vn_open_cred(ec26abc0,ec26acc0,180,c92e8d00,6) at vn_open_cred+0x1fe >> vn_open(ec26abc0,ec26acc0,180,6,c679eacb) at vn_open+0x33 >> kern_open(cf1c5480,81416c0,0,a03,180) at kern_open+0xca >> open(cf1c5480,ec26ad04,c,cf1c5480,8169000) at open+0x36 >> syscall(3b,bfbf003b,bfbf003b,0,a02) at syscall+0x324 >> Xint0x80_syscall() at Xint0x80_syscall+0x1f >> >> This doesn't appear to be a buf/memory exhausting problem because >> syncer, bufdaemon, and pagedaemon all appear to be idle. >> >> What does "show lockedbufs" say? >> > > db> show lockedbufs [snip] looks like this is the buf that pid 87150 is waiting for: > buf at 0xdc713f50 > b_flags = 0xa00200a0 > b_error = 0, b_bufsize = 16384, b_bcount = 16384, b_resid = 0 > b_bufobj = (0xc8985610), b_data = 0xe0b7b000, b_blkno = 365094624 > lockstatus = 2, excl count = 1, excl owner 0xcfeb5d80 > b_npages = 4, pages(OBJ, IDX, PA): (0xc8984108, 0x2b85cdc, 0xa89e9000),(0xc8984108, 0x2b85cdd, 0xa852a000),(0xc8984108, 0x2b85cde, 0xa850b000),(0xc8984108, 0x2b85cdf, 0xa836c000) which is locked by this thread: Tracing command sendmail pid 87117 tid 101335 td 0xcfeb5d80 sched_switch(cfeb5d80,0,1,fd1926a,640c65f9) at sched_switch+0x158 mi_switch(1,0,c04d7b33,dc76fe8c,ec883b2c) at mi_switch+0x1d5 sleepq_switch(dc76fe8c,ec883b60,c04bb9ce,dc76fe8c,4c) at sleepq_switch+0x16f sleepq_wait(dc76fe8c,4c,c061e9ac,0,0) at sleepq_wait+0x11 msleep(dc76fe8c,c0662f80,4c,c061e9ac,0) at msleep+0x3d7 getdirtybuf(dc76fe30,c0662f80,1,ec883ba8,0) at getdirtybuf+0x221 softdep_update_inodeblock(cd1bc528,dc713f50,1,4000,0) at softdep_update_inodeblo ck+0x267 ffs_update(cd953bb0,1,0,cd953bb0,ec883c78,c0529a59,0,0,0,4,1,cd953c2c) at ffs_up date+0x27f ffs_syncvnode(cd953bb0,1,4,ec883c78,c05f9a70) at ffs_syncvnode+0x52e ffs_fsync(ec883cb4,ec883cd0,c052468a,c0646cc0,ec883cb4) at ffs_fsync+0x1c VOP_FSYNC_APV(c0646cc0,ec883cb4,0,0,0) at VOP_FSYNC_APV+0x3a fsync(cfeb5d80,ec883d04,4,cfeb5d80,ec883d2c) at fsync+0x1db syscall(3b,3b,3b,80c7c1b,bfbfa6b0) at syscall+0x324 Xint0x80_syscall() at Xint0x80_syscall+0x1f --- syscall (95, FreeBSD ELF32, fsync), eip = 0x8830f63f, esp = 0xbfbfa66c, ebp = 0xbfbfaf98 --- Pid 87117 is playing with buf 0xdc76fe30 which is not locked, and is sleeping on the buf's b_xflags member. It looks like 87117 is waiting for an in-progress write to complete. There are a large number of other sendmail processes waiting in this same place. How about "show buffer 0xdc76fe30"? This is getting into an area of the kernel that I do not understand well.