From: Lukas Ertl
To: freebsd-geom@FreeBSD.org
Date: Tue, 3 Feb 2004 16:56:23 +0100 (CET)
Message-ID: <20040203164816.X616@korben.in.tern>
Subject: vinum and GEOM deadlock situation

Hi,

I'm running into a deadlock situation with the following scenario: I have a vinum RAID5 with several disks mounted; I pull out one of the disks, and shortly thereafter all I/O hangs.

I managed to identify the deadlock, but haven't come up with a fix yet. Let's see.

Here's the backtrace of the vinum process:

(kgdb) defproc 512
  512 c685da50 e3eac000 0 1 512 000200 1 vinum g_waitfor_event c6852200
frame 0 at 0xe3e865a4: ebp e3e865ec, eip 0xc04e20ba : add $0x4,%esp
frame 1 at 0xe3e865ec: ebp e3e86610, eip 0xc04e19ad : add $0x4,%esp
frame 2 at 0xe3e86610: ebp e3e86638, eip 0xc04b0873 : add $0x14,%esp
frame 3 at 0xe3e86638: ebp e3e86660, eip 0xc04af9c7 : movl $0x0,(%esi)
frame 4 at 0xe3e86660: ebp e3e86674, eip 0xc043e971 : push $0xc0678500
frame 5 at 0xe3e86674: ebp e3e868e8, eip 0xc042ffa2 : add $0x4,%esp
frame 6 at 0xe3e868e8: ebp e3e868f4, eip 0xc042fd07 : add $0x4,%esp
frame 7 at 0xe3e868f4: ebp e3e86918, eip 0xc043e531 : mov $0x0,%edx
frame 8 at 0xe3e86918: ebp e3e86934, eip 0xc04af344 : mov %eax,0xfffffff0(%ebp)
frame 9 at 0xe3e86934: ebp e3e86964, eip 0xc04b3256 : mov %eax,0xffffffe8(%ebp)
frame 10 at 0xe3e86964: ebp e3e869a0, eip 0xc04b171d : lea 0xfffffff4(%ebp),%esp
frame 11 at 0xe3e869a0: ebp e3e869d0, eip 0xc04b3256 : mov %eax,0xffffffe8(%ebp)
frame 12 at 0xe3e869d0: ebp e3e86a0c, eip 0xc04b171d : lea 0xfffffff4(%ebp),%esp
frame 13 at 0xe3e86a0c: ebp e3e86a3c, eip 0xc04b3256 : mov %eax,0xffffffe8(%ebp)
frame 14 at 0xe3e86a3c: ebp e3e86a70, eip 0xc04aecc4 : mov %eax,%edi
frame 15 at 0xe3e86a70: ebp e3e86a94, eip 0xc6780dfb : mov %eax,%edi
frame 16 at 0xe3e86a94: ebp e3e86aa4, eip 0xc6780d62 : add $0x4,%esp
frame 17 at 0xe3e86aa4: ebp e3e86ac8, eip 0xc6781798 : add $0x8,%esp
frame 18 at 0xe3e86ac8: ebp e3e86ad8, eip 0xc677f7e2 : jmp 0xc677f89f
frame 19 at 0xe3e86ad8: ebp e3e86ae0, eip 0xc677f9c4 : mov $0x0,%edx
frame 20 at 0xe3e86ae0: ebp e3e86af8, eip 0xc67828bd : jmp 0xc67828c9
frame 21 at 0xe3e86af8: ebp e3e86b44, eip 0xc67820fe : jmp 0xc67822d4
frame 22 at 0xe3e86b44: ebp e3e86b70, eip 0xc04ad2ea : mov %eax,%esi
frame 23 at 0xe3e86b70: ebp e3e86b7c, eip 0xc04acbef : leave
frame 24 at 0xe3e86b7c: ebp e3e86c34, eip 0xc052f20f : add $0x4,%esp
frame 25 at 0xe3e86c34: ebp e3e86cec, eip 0xc04fc6e8 : add $0x14,%esp
frame 26 at 0xe3e86cec: ebp e3e86d40, eip 0xc060e297 : mov %eax,%ebx

As you can see, it finally hangs in g_waitfor_event()+123:

328         do
329                 tsleep(ep, PRIBIO, "g_waitfor_event", hz);
330         while (!(ep->flag & EV_DONE));

So, what is the g_event thread doing:

(kgdb) defproc 2
    2 c685da50 e1a5e000 0 0 0 000204 1 g_event GEOM topology c069dc58
frame 0 at 0xe1a38c50: ebp e1a38c98, eip 0xc04e20ba : add $0x4,%esp
frame 1 at 0xe1a38c98: ebp e1a38cb0, eip 0xc04bfba9 : movl $0xe4,(%esp,1)
frame 2 at 0xe1a38cb0: ebp e1a38cc8, eip 0xc04e11ec <_sx_xlock+100>: decl 0x48(%ebx)
frame 3 at 0xe1a38cc8: ebp e1a38cfc, eip 0xc04b0352 : add $0x28,%esp
frame 4 at 0xe1a38cfc: ebp e1a38d04, eip 0xc04b0549 : test %eax,%eax
frame 5 at 0xe1a38d04: ebp e1a38d1c, eip 0xc04b12c9 : mov 0xc06ab164,%esi
frame 6 at 0xe1a38d1c: ebp e1a38d34, eip 0xc04cb5f0 : push $0x325

It hangs at one_event()+66:

170         g_topology_lock();

OK, and here's the problem: the topology lock was grabbed in g_dev_close(), which you can see in the backtrace of the vinum process.

Any ideas?

regards,
le

--
Lukas Ertl                         http://mailbox.univie.ac.at/~le/
le@FreeBSD.org                     http://people.freebsd.org/~le/
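
The lock ordering described in this message can be reproduced in miniature in userland. What follows is only a rough sketch under stated assumptions, not the kernel code: a pthread mutex stands in for the GEOM topology lock, an atomic flag stands in for EV_DONE, and the names topology_lock, ev_done, event_thread and wait_for_event are hypothetical. Unlike the tsleep() loop quoted above, the waiter here gives up after a fixed number of polls so that the program terminates.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical stand-ins for the kernel objects named in the message. */
static pthread_mutex_t topology_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int ev_done;

/*
 * Plays the role of the g_event thread: it must take the topology
 * lock before it can run the queued event and mark it done.
 */
static void *event_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&topology_lock);
    atomic_store(&ev_done, 1);
    pthread_mutex_unlock(&topology_lock);
    return NULL;
}

/*
 * Plays the role of the caller that queues an event and waits for it.
 * If hold_lock_while_waiting is nonzero, the caller keeps the topology
 * lock during the wait, which is the situation from the backtraces.
 * Returns 1 if the event completed within the polling budget,
 * 0 if it did not, and -1 if the helper thread could not be created.
 */
static int wait_for_event(int hold_lock_while_waiting)
{
    const struct timespec tick = { 0, 10 * 1000 * 1000 }; /* 10 ms */
    pthread_t tid;
    int completed = 0;

    atomic_store(&ev_done, 0);
    pthread_mutex_lock(&topology_lock);
    if (pthread_create(&tid, NULL, event_thread, NULL) != 0) {
        pthread_mutex_unlock(&topology_lock);
        return -1;
    }
    if (!hold_lock_while_waiting)
        pthread_mutex_unlock(&topology_lock);

    /* Bounded version of the do/tsleep/while loop: poll up to 50 times. */
    for (int i = 0; i < 50 && !completed; i++) {
        nanosleep(&tick, NULL);
        completed = atomic_load(&ev_done);
    }

    /* Release the lock (if still held) so the helper can finish. */
    if (hold_lock_while_waiting)
        pthread_mutex_unlock(&topology_lock);
    pthread_join(tid, NULL);
    return completed;
}
```

Calling wait_for_event(1) models the hang: the event thread stays blocked on the lock for the whole polling budget, so the flag is never set while the caller waits. Calling wait_for_event(0) drops the lock before waiting, and the event completes. Whether dropping the topology lock at that point is safe in the real vinum and GEOM code paths is a separate question that this sketch does not answer.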