From: Kostik Belousov <kostikbel@gmail.com>
To: User Freebsd
Date: Wed, 19 Jul 2006 14:24:24 +0300
Message-ID: <20060719112424.GK1464@deviant.kiev.zoral.com.ua>
In-Reply-To: <20060718074804.W1799@ganymede.hub.org>
Cc: freebsd-stable@freebsd.org, Robert Watson
Subject: Re: file system deadlock - the whole story?

On Tue, Jul 18, 2006 at 07:51:52AM -0300, User Freebsd wrote:
> 
> 'k, had a bunch of fun tonight, but one of the results is that I was able
> to achieve file system deadlock, or so it appears ...
> 
> Using the following from DDB:
> 
> set $lines=0
> show pcpu
> show allpcpu
> ps
> trace
> alltrace
> show locks
> show alllocks
> show uma
> show malloc
> show lockedvnods
> call doadump
> 
> I've been able to produce the attached output, as well as have a core dump
> that can hopefully be used to gather any that I may have missed this time
> *cross fingers*

Marc,

I seriously doubt that the problem the machine is experiencing is a
deadlock. At http://people.freebsd.org/~kib/e1.gif is a graph of the
locking dependencies for the vnode locks. An edge from process a to
process b means that process a holds a lock and process b is waiting
for that lock. A black edge denotes a dependency through a vnode lock,
a red edge a dependency through a buffer lock.

As you can see, the graph is acyclic. Basically, there are two groups
of blocked processes: one hierarchy rooted at pid 66575 (this one
includes the shell, pid 806), and a second one rooted at process 32.
What are they doing?
Pid 66575:

Tracing command smtpd pid 66575 tid 101396 td 0xceb0a180
sched_switch(ceb0a180,0,1) at sched_switch+0x177
mi_switch(1,0) at mi_switch+0x270
sleepq_switch(dc5b5b20,c0661b60,0,c05fd078,20c) at sleepq_switch+0xc1
sleepq_wait(dc5b5b20,0,c0601d10,e59,8) at sleepq_wait+0x46
msleep(dc5b5b20,c06afde0,44,c061021d,0) at msleep+0x279
bwait(dc5b5b20,44,c061021d) at bwait+0x47
vnode_pager_generic_getpages(c8e85000,ed347c80,1000,0,c8e22000) at vnode_pager_generic_getpages+0x777
ffs_getpages(ed347bbc,c8e85000,0,ed347be8,c0597c41) at ffs_getpages+0x100
VOP_GETPAGES_APV(c063c100,ed347bbc) at VOP_GETPAGES_APV+0xa9
vnode_pager_getpages(c8e22000,ed347c80,1,0) at vnode_pager_getpages+0xa5
vm_fault(c88da4a0,280bb000,1,0,ceb0a180) at vm_fault+0x980
trap_pfault(ed347d38,1,280bb000,280bb000,0) at trap_pfault+0xce
trap(3b,3b,3b,8078d1c,807952c) at trap+0x1eb
calltrap() at calltrap+0x5
--- trap 0xc, eip = 0x280baffd, esp = 0xbfbfe894, ebp = 0xbfbfe8d8 ---

This process waits for the data to be paged in.
Pid 32 (syncer):

Tracing command syncer pid 32 tid 100033 td 0xc8544780
sched_switch(c8544780,0,1) at sched_switch+0x177
mi_switch(1,0) at mi_switch+0x270
sleepq_switch(dc79fe68,c0661b60,0,c05fd078,20c) at sleepq_switch+0xc1
sleepq_wait(dc79fe68,0,c0601d10,e59,c06039a0) at sleepq_wait+0x46
msleep(dc79fe68,c06afde0,4c,c06024dc,0) at msleep+0x279
bwait(dc79fe68,4c,c06024dc) at bwait+0x47
bufwait(dc79fe68,1,0,0,0) at bufwait+0x1a
breadn(c8a0b414,6537700,0,4000,0) at breadn+0x266
bread(c8a0b414,6537700,0,4000,0) at bread+0x20
ffs_update(c9992000,0,6,0,0) at ffs_update+0x228
ffs_syncvnode(c9992000,3) at ffs_syncvnode+0x3be
ffs_sync(c8831400,3,c8544780,c8831400,2) at ffs_sync+0x209
sync_fsync(e817fcbc,c8a11ae0,c8a11bec,e817fcd8,c04ed586) at sync_fsync+0x126
VOP_FSYNC_APV(c0634220,e817fcbc) at VOP_FSYNC_APV+0x9b
sync_vnode(c8a11bec,c8544780) at sync_vnode+0x106
sched_sync(0,e817fd38,0,c04ed614,0) at sched_sync+0x1ed
fork_exit(c04ed614,0,e817fd38) at fork_exit+0xa0
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe817fd6c, ebp = 0 ---

It also waits for the data.

What is happening with the buffers? The syncer (pid 32) locked buffer
0xc8a0b414 and waits for its data (as shown above). Process 33
(softdepflush) and umount (pid 73338) wait for this buffer.

You did not provide the output of "show lockedbufs", but even without
that data, I doubt that the buf subsystem deadlocked by itself.

My conjecture is that the problem is either with your disk hardware
(i.e., the actual hard drive or disk controller) or in the controller
driver. At least, you could show us the dmesg.