From owner-freebsd-bugs@FreeBSD.ORG Tue Jul  8 19:40:12 2008
Message-Id: <200807081937.m68Jb9sR092574@www.freebsd.org>
Date: Tue, 8 Jul 2008 19:37:09 GMT
From: Javier Martín Rueda
To: freebsd-gnats-submit@FreeBSD.org
Subject: kern/125413: Panic when doing zfs raidz with gmirror and ggate

>Number:         125413
>Category:       kern
>Synopsis:       Panic when doing zfs raidz with gmirror and ggate
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Jul 08 19:40:06 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator:     Javier Martín Rueda
>Release:        FreeBSD 7.0-STABLE
>Organization:   DIATEL - UPM
>Environment:
FreeBSD fuego2.pruebas.local 7.0-STABLE FreeBSD 7.0-STABLE #0: Thu Jul  3 17:21:29 CEST 2008 root@fuego2.pruebas.local:/usr/src/sys/i386/compile/REPLICACION i386
>Description:
I have two FreeBSD machines with 8 disks each. I am trying to create a replicated raidz ZFS pool using gmirror and ggate. I export the disks on one of the machines with ggate, and then create 8 gmirrors on the other one, each with two providers: the local disk and the corresponding remote ggate disk.
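The export side on fuego2 is not shown below, so here is a minimal sketch of it, assuming ggated(8) reading its default /etc/gg.exports (one "host access-mode path" entry per line); the exact entries are an assumption, not taken from the report:

# Contents of /etc/gg.exports on fuego2 (grants fuego1 read/write access
# to the eight data disks):
fuego1 RW /dev/da0
fuego1 RW /dev/da1
fuego1 RW /dev/da2
fuego1 RW /dev/da3
fuego1 RW /dev/da4
fuego1 RW /dev/da5
fuego1 RW /dev/da6
fuego1 RW /dev/da7
# Then, still on fuego2, start the GEOM gate daemon:
ggated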
To clarify, this is the output of gmirror status:

# gmirror status
       Name    Status  Components
 mirror/gm0  COMPLETE  ggate0 da0
 mirror/gm1  COMPLETE  ggate1 da1
 mirror/gm2  COMPLETE  ggate2 da2
 mirror/gm3  COMPLETE  ggate3 da3
 mirror/gm4  COMPLETE  ggate4 da4
 mirror/gm5  COMPLETE  ggate5 da5
 mirror/gm6  COMPLETE  ggate6 da6
 mirror/gm7  COMPLETE  ggate7 da7

Now, if I create a non-raidz zpool, everything is fine:

# zpool create z1 mirror/gm0 mirror/gm1 mirror/gm2 mirror/gm3 mirror/gm4 mirror/gm5 mirror/gm6 mirror/gm7
# zpool status
  pool: z1
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        z1            ONLINE       0     0     0
          mirror/gm0  ONLINE       0     0     0
          mirror/gm1  ONLINE       0     0     0
          mirror/gm2  ONLINE       0     0     0
          mirror/gm3  ONLINE       0     0     0
          mirror/gm4  ONLINE       0     0     0
          mirror/gm5  ONLINE       0     0     0
          mirror/gm6  ONLINE       0     0     0
          mirror/gm7  ONLINE       0     0     0

errors: No known data errors

However, if I try to create a pool with raidz or raidz2, I get a panic. The statement that causes the page fault is at vdev_geom.c:420, where it tries to access a NULL pointer. If I create the gmirrors with just the local disk as provider, there is no panic in either case (raidz or raidz2). So it seems that ggate has something to do with it or, more likely, it exposes a problem somewhere else. All of this also happened with 7.0-RELEASE.

I have been looking at the code for a while, and the sequence of function calls that triggers the panic is this:

1) For some reason, zio_vdev_io_assess() tells SPA to reopen the vdev.
2) vdev_reopen() calls vdev_close(), and then it will call vdev_open().
2.1) vdev_close() queues several events to close the 8 devices, but returns before they have been completely closed. The subsequent call to vdev_open() finds the devices still there and reuses them. However, the events from vdev_close() eventually detach them, and that is when the problem appears, because suddenly a provider that was there vanishes. It looks like a race condition.
>How-To-Repeat:
It is explained in detail above. Summarizing: the servers are fuego1 and fuego2. Both have 8 data disks (da0 - da7), apart from the system disks. Execute the following on fuego1:

ggatec create -u 0 fuego2 /dev/da0
ggatec create -u 1 fuego2 /dev/da1
ggatec create -u 2 fuego2 /dev/da2
ggatec create -u 3 fuego2 /dev/da3
ggatec create -u 4 fuego2 /dev/da4
ggatec create -u 5 fuego2 /dev/da5
ggatec create -u 6 fuego2 /dev/da6
ggatec create -u 7 fuego2 /dev/da7

gmirror label -h -b prefer gm0 da0 ggate0
gmirror label -h -b prefer gm1 da1 ggate1
gmirror label -h -b prefer gm2 da2 ggate2
gmirror label -h -b prefer gm3 da3 ggate3
gmirror label -h -b prefer gm4 da4 ggate4
gmirror label -h -b prefer gm5 da5 ggate5
gmirror label -h -b prefer gm6 da6 ggate6
gmirror label -h -b prefer gm7 da7 ggate7

zpool create z1 raidz2 mirror/gm0 mirror/gm1 mirror/gm2 mirror/gm3 mirror/gm4 mirror/gm5 mirror/gm6 mirror/gm7

And you'll get a panic.
>Fix:
I don't know a good fix, but I attach a shoddy patch that seems to work (and reinforces my belief that it is a race condition). Basically, I insert a delay between vdev_close() and vdev_open() in vdev_reopen(), so that by the time vdev_open() gets called all the closes have completely finished.

Patch attached with submission follows:

--- vdev.c	2008-04-17 03:23:33.000000000 +0200
+++ /tmp/vdev.c	2008-07-08 21:27:35.000000000 +0200
@@ -1023,6 +1023,7 @@
 	ASSERT(spa_config_held(spa, RW_WRITER));
 
 	vdev_close(vd);
+	pause("chapuza", 2000);
 	(void) vdev_open(vd);
 
 	/*

>Release-Note:
>Audit-Trail:
>Unformatted:
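A sketch of one way to apply and test the workaround, assuming a stock 7-STABLE source tree (the vdev.c location under sys/contrib/opensolaris and the patch file name are assumptions; REPLICACION is the kernel config from the environment above):

# Apply the delay patch to the in-tree ZFS vdev code.
cd /usr/src/sys/contrib/opensolaris/uts/common/fs/zfs
patch < /tmp/vdev-pause.diff
# Rebuild and install the kernel, reboot, and rerun the raidz2
# "zpool create" from How-To-Repeat to check that the panic is gone.
cd /usr/src
make buildkernel KERNCONF=REPLICACION
make installkernel KERNCONF=REPLICACION
shutdown -r now

With the default HZ of 1000, pause("chapuza", 2000) sleeps for roughly two seconds, which is apparently long enough for the queued GEOM close events to finish before vdev_open() runs again.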