From: "Steven Hartland"
To: "K. Macy"
Cc: Mark Martinec, "freebsd-fs@FreeBSD.org", FreeBSD Stable
Subject: zpool import hangs when out of space - Was: zfs pool import hangs on [tx->tx_sync_done_cv]
Date: Mon, 13 Oct 2014 21:10:11 +0100

----- Original Message -----
From: "K. Macy"

> You are correct.
>
> (kgdb) p ((zio_t *)$r14)->io_reexecute
> $32 = 2 '\002'
> (kgdb) p ((zio_t *)$r14)->io_flags
> $33 = 0
> (kgdb) p ((zio_t *)$r14)->io_spa->spa_suspended
> $34 = 1 '\001'
>
> This means zio_suspend has been called from zio_done:
>
>         else if (zio->io_reexecute & ZIO_REEXECUTE_SUSPEND) {
>                 /*
>                  * We'd fail again if we reexecuted now, so suspend
>                  * until conditions improve (e.g. device comes online).
>                  */
>                 zio_suspend(spa, zio);
>         }
>
> If failure mode were panic we would have panicked when attempting the import:
>
> void
> zio_suspend(spa_t *spa, zio_t *zio)
> {
>         if (spa_get_failmode(spa) == ZIO_FAILURE_MODE_PANIC)
>                 fm_panic("Pool '%s' has encountered an uncorrectable I/O "
>                     "failure and the failure mode property for this pool "
>                     "is set to panic.", spa_name(spa));
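As an aside, failmode is a per-pool property, so this panic path can be forced deliberately when debugging. Assuming an illustrative pool name of "tank", it can be inspected and changed with:

    zpool get failmode tank
    zpool set failmode=panic tank

The default, failmode=wait, is what gives the suspend (and hence the apparent hang) behaviour above.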
Yep, and forcing that panic I got the following stack:

#0  doadump (textdump=1) at pcpu.h:219
#1  0xffffffff80607977 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:452
#2  0xffffffff80607e85 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:759
#3  0xffffffff80607ed3 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:688
#4  0xffffffff81548dfa in zio_suspend (spa=<value optimized out>, zio=<value optimized out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1527
#5  0xffffffff8154ec66 in zio_done (zio=<value optimized out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3264
#6  0xffffffff81548d54 in zio_execute (zio=0xfffff80044a0dac8) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1407
#7  0xffffffff8154ebfc in zio_done (zio=0xfffff8004884b398) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3258
#8  0xffffffff81548d54 in zio_execute (zio=0xfffff8004884b398) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1407
#9  0xffffffff8154ebfc in zio_done (zio=0xfffff80044c0a000) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3258
#10 0xffffffff81548d54 in zio_execute (zio=0xfffff80044c0a000) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1407
#11 0xffffffff8154ebfc in zio_done (zio=0xfffff80044a2fac8) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3258
#12 0xffffffff81548d54 in zio_execute (zio=0xfffff80044a2fac8) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1407
#13 0xffffffff8154ebfc in zio_done (zio=0xfffff80044853398) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3258
#14 0xffffffff81548d54 in zio_execute (zio=0xfffff80044853398) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1407
#15 0xffffffff8154ea2a in zio_done (zio=0xfffff8004877e398) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3313
#16 0xffffffff81548d54 in zio_execute (zio=0xfffff8004877e398) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1407
#17 0xffffffff8154ea2a in zio_done (zio=0xfffff80044cb0730) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3313
#18 0xffffffff81548d54 in zio_execute (zio=0xfffff80044cb0730) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1407
#19 0xffffffff80651410 in taskqueue_run_locked (queue=0xfffff800488cf400) at /usr/src/sys/kern/subr_taskqueue.c:342
#20 0xffffffff80651dcb in taskqueue_thread_loop (arg=<value optimized out>) at /usr/src/sys/kern/subr_taskqueue.c:563

Along with:

(kgdb) print (*(zio_t *)0xfffff80044853398)->io_error
$20 = 28
(kgdb) print (*(zio_t *)0xfffff80044a2fac8)->io_error
$21 = 28

grep 28 /usr/include/sys/errno.h
#define ENOSPC          28              /* No space left on device */

io_error 28 is ENOSPC, so the issue is simply that the pool is out of space: a read-write import needs to write to the pool, and there is no space left for it to do so.
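One workaround follows directly from that, though I've not verified it here: import the pool read-only, which avoids the write, and then copy the data off. Assuming an illustrative pool name of "tank":

    zpool import -o readonly=on tank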
A further problem is that while the import hangs it holds the pool lock, so any subsequent zpool commands are dead in the water: they simply block waiting on that lock. Something to discuss with the OpenZFS guys, but I would say the import should fail outright with a no-space (ENOSPC) error instead of hanging.

So Mark, the mystery is solved: when you upgraded you ran the pool so low on space that it can no longer be imported read-write, as that requires a write.

Regards
Steve