Date:      Mon, 13 Oct 2014 21:10:11 +0100
From:      "Steven Hartland" <killing@multiplay.co.uk>
To:        "K. Macy" <kmacy@freebsd.org>
Cc:        Mark Martinec <Mark.Martinec+freebsd@ijs.si>, "freebsd-fs@FreeBSD.org" <freebsd-fs@freebsd.org>, FreeBSD Stable <freebsd-stable@freebsd.org>
Subject:   zpool import hangs when out of space - Was: zfs pool import hangs on [tx->tx_sync_done_cv]
Message-ID:  <14ADE02801754E028D9A0EAB4A16527E@multiplay.co.uk>
References:  <54372173.1010100@ijs.si> <644FA8299BF848E599B82D2C2C298EA7@multiplay.co.uk> <54372EBA.1000908@ijs.si> <DE7DD7A94E9B4F1FBB3AFF57EDB47C67@multiplay.co.uk> <543731F3.8090701@ijs.si> <543AE740.7000808@ijs.si> <A5BA41116A7F4B23A9C9E469C4146B99@multiplay.co.uk> <CAHM0Q_N%2BC=3qgUnyDkEugOFcL=J8gBjbTg8v45Vz3uT=e=Fn2g@mail.gmail.com> <6E01BBEDA9984CCDA14F290D26A8E14D@multiplay.co.uk> <CAHM0Q_OpV2sAQQAH6Cj_=yJWAOt8pTPWQ-m45JSiXDpBwT6WTA@mail.gmail.com> <E2E24A91B8B04C2DBBBC7E029A12BD05@multiplay.co.uk> <CAHM0Q_Oeka25-kdSDRC2evS1R8wuQ0_XgbcdZCjS09aXJ9_WWQ@mail.gmail.com>

----- Original Message ----- 
From: "K. Macy" <kmacy@freebsd.org>
> You are correct.
>
> (kgdb) p ((zio_t *)$r14)->io_reexecute
> $32 = 2 '\002'
> (kgdb) p ((zio_t *)$r14)->io_flags
> $33 = 0
> (kgdb) p ((zio_t *)$r14)->io_spa->spa_suspended
> $34 = 1 '\001'
>
> This means zio_suspend has been called from zio_done:
> else if (zio->io_reexecute & ZIO_REEXECUTE_SUSPEND) {
>     /*
>      * We'd fail again if we reexecuted now, so suspend
>      * until conditions improve (e.g. device comes online).
>      */
>     zio_suspend(spa, zio);
> }
>
> If the failure mode were panic, we would have panicked when attempting the import:
> void
> zio_suspend(spa_t *spa, zio_t *zio)
> {
>     if (spa_get_failmode(spa) == ZIO_FAILURE_MODE_PANIC)
>         fm_panic("Pool '%s' has encountered an uncorrectable I/O "
>             "failure and the failure mode property for this pool "
>             "is set to panic.", spa_name(spa));

Yep, and by forcing that panic I got the following stack:

#0  doadump (textdump=1) at pcpu.h:219
#1  0xffffffff80607977 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:452
#2  0xffffffff80607e85 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:759
#3  0xffffffff80607ed3 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:688
#4  0xffffffff81548dfa in zio_suspend (spa=<value optimized out>, zio=<value optimized out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1527
#5  0xffffffff8154ec66 in zio_done (zio=<value optimized out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3264
#6  0xffffffff81548d54 in zio_execute (zio=0xfffff80044a0dac8) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1407
#7  0xffffffff8154ebfc in zio_done (zio=0xfffff8004884b398) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3258
#8  0xffffffff81548d54 in zio_execute (zio=0xfffff8004884b398) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1407
#9  0xffffffff8154ebfc in zio_done (zio=0xfffff80044c0a000) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3258
#10 0xffffffff81548d54 in zio_execute (zio=0xfffff80044c0a000) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1407
#11 0xffffffff8154ebfc in zio_done (zio=0xfffff80044a2fac8) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3258
#12 0xffffffff81548d54 in zio_execute (zio=0xfffff80044a2fac8) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1407
#13 0xffffffff8154ebfc in zio_done (zio=0xfffff80044853398) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3258
#14 0xffffffff81548d54 in zio_execute (zio=0xfffff80044853398) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1407
#15 0xffffffff8154ea2a in zio_done (zio=0xfffff8004877e398) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3313
#16 0xffffffff81548d54 in zio_execute (zio=0xfffff8004877e398) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1407
#17 0xffffffff8154ea2a in zio_done (zio=0xfffff80044cb0730) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3313
#18 0xffffffff81548d54 in zio_execute (zio=0xfffff80044cb0730) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1407
#19 0xffffffff80651410 in taskqueue_run_locked (queue=0xfffff800488cf400) at /usr/src/sys/kern/subr_taskqueue.c:342
#20 0xffffffff80651dcb in taskqueue_thread_loop (arg=<value optimized out>) at /usr/src/sys/kern/subr_taskqueue.c:563

Along with:
(kgdb) print (*(zio_t *)0xfffff80044853398)->io_error
$20 = 28
(kgdb) print (*(zio_t *)0xfffff80044a2fac8)->io_error
$21 = 28

grep 28 /usr/include/sys/errno.h
#define ENOSPC          28              /* No space left on device */

So the issue is simply that the pool is out of space to perform the import,
as that process, when not read-only, requires space to write to the pool.

The problem is that the import holds the pool lock throughout this process,
so any subsequent zpool actions are dead in the water: they block waiting
on that lock.

Something to discuss with the OpenZFS guys, but I would say the import
should fail with a no-space error (ENOSPC) rather than hang.

So, Mark, the mystery is solved: when you upgraded, you ran the pool so low
on space that it now can't be imported read-write, as that requires a write.

    Regards
    Steve 



