Date: Wed, 19 Jul 2006 14:24:24 +0300
From: Kostik Belousov <kostikbel@gmail.com>
To: User Freebsd <freebsd@hub.org>
Cc: freebsd-stable@freebsd.org, Robert Watson <rwatson@freebsd.org>
Subject: Re: file system deadlock - the whole story?
Message-ID: <20060719112424.GK1464@deviant.kiev.zoral.com.ua>
In-Reply-To: <20060718074804.W1799@ganymede.hub.org>
References: <E1FxzUU-000MMw-5m@cs1.cs.huji.ac.il> <20060705100403.Y80381@fledge.watson.org> <cone.1152136419.991036.72616.1000@zoraida.natserv.net> <20060705234514.I70011@fledge.watson.org> <20060715000351.U1799@ganymede.hub.org> <20060715035308.GJ32624@deviant.kiev.zoral.com.ua> <20060718074804.W1799@ganymede.hub.org>

On Tue, Jul 18, 2006 at 07:51:52AM -0300, User Freebsd wrote:
>
> 'k, had a bunch of fun tonight, but one of the results is that I was able
> to achieve file system deadlock, or so it appears ...
>
> Using the following from DDB:
>
> set $lines=0
> show pcpu
> show allpcpu
> ps
> trace
> alltrace
> show locks
> show alllocks
> show uma
> show malloc
> show lockedvnods
> call doadump
>
> I've been able to produce the attached output, as well as have a core dump
> that can hopefully be used to gather any that I may have missed this time
> *cross fingers*

Marc,

I seriously doubt that the problem the machine is experiencing is a
deadlock. At http://people.freebsd.org/~kib/e1.gif is the graph of the
locking dependencies for the vnode locks. An edge from process a to
process b means that process a holds a lock and process b is waiting for
that lock. A black edge means a dependency through a vnode lock, a red
edge a dependency through a buffer lock. As you can see, the graph is
acyclic, so no group of processes is waiting on each other in a circle.

Basically, there are two groups of blocked processes: one hierarchy is
rooted in pid 66575 and includes shell 806; the second is rooted in
process 32. What are they doing?

Pid 66575:

Tracing command smtpd pid 66575 tid 101396 td 0xceb0a180
sched_switch(ceb0a180,0,1) at sched_switch+0x177
mi_switch(1,0) at mi_switch+0x270
sleepq_switch(dc5b5b20,c0661b60,0,c05fd078,20c) at sleepq_switch+0xc1
sleepq_wait(dc5b5b20,0,c0601d10,e59,8) at sleepq_wait+0x46
msleep(dc5b5b20,c06afde0,44,c061021d,0) at msleep+0x279
bwait(dc5b5b20,44,c061021d) at bwait+0x47
vnode_pager_generic_getpages(c8e85000,ed347c80,1000,0,c8e22000) at vnode_pager_generic_getpages+0x777
ffs_getpages(ed347bbc,c8e85000,0,ed347be8,c0597c41) at ffs_getpages+0x100
VOP_GETPAGES_APV(c063c100,ed347bbc) at VOP_GETPAGES_APV+0xa9
vnode_pager_getpages(c8e22000,ed347c80,1,0) at vnode_pager_getpages+0xa5
vm_fault(c88da4a0,280bb000,1,0,ceb0a180) at vm_fault+0x980
trap_pfault(ed347d38,1,280bb000,280bb000,0) at trap_pfault+0xce
trap(3b,3b,3b,8078d1c,807952c) at trap+0x1eb
calltrap() at calltrap+0x5
--- trap 0xc, eip = 0x280baffd, esp = 0xbfbfe894, ebp = 0xbfbfe8d8 ---

This process waits for the data to be paged in.

Pid 32 (syncer):

Tracing command syncer pid 32 tid 100033 td 0xc8544780
sched_switch(c8544780,0,1) at sched_switch+0x177
mi_switch(1,0) at mi_switch+0x270
sleepq_switch(dc79fe68,c0661b60,0,c05fd078,20c) at sleepq_switch+0xc1
sleepq_wait(dc79fe68,0,c0601d10,e59,c06039a0) at sleepq_wait+0x46
msleep(dc79fe68,c06afde0,4c,c06024dc,0) at msleep+0x279
bwait(dc79fe68,4c,c06024dc) at bwait+0x47
bufwait(dc79fe68,1,0,0,0) at bufwait+0x1a
breadn(c8a0b414,6537700,0,4000,0) at breadn+0x266
bread(c8a0b414,6537700,0,4000,0) at bread+0x20
ffs_update(c9992000,0,6,0,0) at ffs_update+0x228
ffs_syncvnode(c9992000,3) at ffs_syncvnode+0x3be
ffs_sync(c8831400,3,c8544780,c8831400,2) at ffs_sync+0x209
sync_fsync(e817fcbc,c8a11ae0,c8a11bec,e817fcd8,c04ed586) at sync_fsync+0x126
VOP_FSYNC_APV(c0634220,e817fcbc) at VOP_FSYNC_APV+0x9b
sync_vnode(c8a11bec,c8544780) at sync_vnode+0x106
sched_sync(0,e817fd38,0,c04ed614,0) at sched_sync+0x1ed
fork_exit(c04ed614,0,e817fd38) at fork_exit+0xa0
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe817fd6c, ebp = 0 ---

It also waits for the data.

What happens with the blocks? The syncer (pid 32) locked block 0xc8a0b414
and is waiting for data (as shown above). Process 33 (softdepflush) and
umount (pid 73338) are waiting for this block.

You did not provide the output of "show lockedbufs", but even without
that data I doubt that the buf subsystem deadlocked by itself.

My conjecture is that the problem is either with your disk hardware
(i.e., the actual hard drive or disk controller) or in the controller
driver. At the least, you could show us the dmesg.
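
For anyone who wants to run the same check on their own "show alllocks" / "show lockedvnods" output: the acyclicity argument above can be sketched as a depth-first search over the wait-for graph. This is only a minimal illustration, not the tool used to draw e1.gif. The function name find_cycle is made up for this sketch, and the edge list is entered by hand. The edges 32 -> 33 and 32 -> 73338 follow from the description of the block held by the syncer; the direct edge 66575 -> 806 is an assumption, since shell 806 may depend on 66575 only through intermediate processes.

```python
from collections import defaultdict

def find_cycle(edges):
    """Return a list of nodes forming a cycle, or None if the graph is acyclic.

    Each edge is (holder, waiter): `holder` owns a lock that `waiter` wants.
    A deadlock among these processes would show up as a cycle.
    """
    graph = defaultdict(list)
    for holder, waiter in edges:
        graph[holder].append(waiter)

    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current DFS path, finished
    color = defaultdict(int)       # every node starts WHITE (0)
    path = []                      # nodes on the current DFS path, in order

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in graph.get(node, ()):
            if color[nxt] == GREY:
                # Back edge to a node still on the path: that is a cycle.
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None

# Hand-entered edges; 66575 -> 806 is an illustrative assumption.
wait_for = [(32, 33), (32, 73338), (66575, 806)]
print(find_cycle(wait_for))
```

With edges like these the function returns None, which matches the picture: two independent trees of waiters, each rooted in a process that is itself blocked on I/O and not on another process's lock.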