Date: Sun, 12 Oct 2014 20:54:32 -0700 From: "K. Macy" <kmacy@freebsd.org> To: Steven Hartland <killing@multiplay.co.uk> Cc: "freebsd-fs@FreeBSD.org" <freebsd-fs@freebsd.org>, FreeBSD Stable <freebsd-stable@freebsd.org> Subject: Re: zfs pool import hangs on [tx->tx_sync_done_cv] Message-ID: <CAHM0Q_N%2BC=3qgUnyDkEugOFcL=J8gBjbTg8v45Vz3uT=e=Fn2g@mail.gmail.com> In-Reply-To: <A5BA41116A7F4B23A9C9E469C4146B99@multiplay.co.uk> References: <54372173.1010100@ijs.si> <644FA8299BF848E599B82D2C2C298EA7@multiplay.co.uk> <54372EBA.1000908@ijs.si> <DE7DD7A94E9B4F1FBB3AFF57EDB47C67@multiplay.co.uk> <543731F3.8090701@ijs.si> <543AE740.7000808@ijs.si> <A5BA41116A7F4B23A9C9E469C4146B99@multiplay.co.uk>
next in thread | previous in thread | raw e-mail | index | archive | help
Relavent threads: > > #0 sched_switch (td=0xfffff800446bc000, newtd=<value optimized out>, > flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1945 > #1 0xffffffff806110d9 in mi_switch (flags=260, newtd=0x0) at > /usr/src/sys/kern/kern_synch.c:494 > #2 0xffffffff8064ddc2 in sleepq_switch (wchan=<value optimized out>, > pri=<value optimized out>) at /usr/src/sys/kern/subr_sleepqueue.c:538 > #3 0xffffffff8064dc23 in sleepq_wait (wchan=0xfffff80021f4b220, pri=0) at > /usr/src/sys/kern/subr_sleepqueue.c:617 > #4 0xffffffff805b7c1a in _cv_wait (cvp=0xfffff80021f4b220, > lock=0xfffff80021f4b1b8) at /usr/src/sys/kern/kern_condvar.c:139 > #5 0xffffffff8151ec4b in txg_wait_synced (dp=0xfffff80021f4b000, > txg=<value optimized out>) at /usr/src/sys/modules/zfs/../.. > /cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c:631 > #6 0xffffffff81510937 in spa_load (spa=0xfffffe0003fe1000, > state=SPA_LOAD_IMPORT, type=<value optimized out>, mosconfig=<value > optimized out>) at /usr/src/sys/modules/zfs/../.. > /cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:2777 > #7 0xffffffff8150e5df in spa_load_best (spa=0xfffffe0003fe1000, > state=SPA_LOAD_IMPORT, mosconfig=1, max_request=<value optimized out>, > rewind_flags=1) at /usr/src/sys/modules/zfs/../.. > /cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:2873 > #8 0xffffffff8150e023 in spa_import (pool=0xfffffe0002e2a000 "sys1boot", > config=0xfffff80044f24ba0, props=0x0, flags=<value optimized out>) at > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/ > common/fs/zfs/spa.c:4207 > #9 0xffffffff81566d78 in zfs_ioc_pool_import (zc=0xfffffe0002e2a000) at > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/ > common/fs/zfs/zfs_ioctl.c:1594 > #10 0xffffffff81563c4f in zfsdev_ioctl (dev=<value optimized out>, > zcmd=<value optimized out>, arg=<value optimized out>, flag=<value > optimized out>, td=<value optimized out>) at /usr/src/sys/modules/zfs/../.. > /cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:6158 > #11 0xffffffff805299eb in devfs_ioctl_f (fp=0xfffff8004f020aa0, > com=3222821378, data=0xfffff80038123e60, cred=<value optimized out>, > td=0xfffff800446bc000) at /usr/src/sys/fs/devfs/devfs_vnops.c:757 > #12 0xffffffff8066201b in kern_ioctl (td=<value optimized out>, fd=<value > optimized out>, com=<value optimized out>) at file.h:320 > #13 0xffffffff80661d9c in sys_ioctl (td=0xfffff800446bc000, > uap=0xfffffe011e49fa40) at /usr/src/sys/kern/sys_generic.c:702 > #14 0xffffffff8088defa in amd64_syscall (td=0xfffff800446bc000, traced=0) > at subr_syscall.c:134 > #15 0xffffffff80870ecb in Xfast_syscall () at /usr/src/sys/amd64/amd64/ > exception.S:391 > > > Thread 509 (Thread 100533): > #0 sched_switch (td=0xfffff800446bf000, newtd=<value optimized out>, > flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1945 > #1 0xffffffff806110d9 in mi_switch (flags=260, newtd=0x0) at > /usr/src/sys/kern/kern_synch.c:494 > #2 0xffffffff8064ddc2 in sleepq_switch (wchan=<value optimized out>, > pri=<value optimized out>) at /usr/src/sys/kern/subr_sleepqueue.c:538 > #3 0xffffffff8064e323 in sleepq_timedwait (wchan=0xfffff80044e47210, > pri=0) at /usr/src/sys/kern/subr_sleepqueue.c:652 > #4 0xffffffff805b84f0 in _cv_timedwait_sbt (cvp=0xfffff80044e47210, > lock=0xfffff80044e471b8, sbt=<value optimized out>, pr=<value optimized > out>, flags=<value optimized out>) at /usr/src/sys/kern/kern_condvar.c:325 > #5 0xffffffff8151f72e in txg_thread_wait (tx=<value optimized out>, > cpr=0xfffffe011dc76a00, cv=0xfffff80044e47210, time=<value optimized out>) > at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/ > common/fs/zfs/txg.c:249 > #6 0xffffffff8151e930 in txg_sync_thread (arg=0xfffff80044e47000) at > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/ > common/fs/zfs/txg.c:483 > #7 0xffffffff805d5bd4 in fork_exit (callout=0xffffffff8151e480 > <txg_sync_thread>, arg=0xfffff80044e47000, frame=0xfffffe011dc76ac0) at > /usr/src/sys/kern/kern_fork.c:996 > #8 0xffffffff8087111e in fork_trampoline () at /usr/src/sys/amd64/amd64/ > exception.S:606 > #9 0x0000000000000000 in ?? () > > Thread 508 (Thread 100532): > #0 sched_switch (td=0xfffff80048204920, newtd=<value optimized out>, > flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1945 > #1 0xffffffff806110d9 in mi_switch (flags=260, newtd=0x0) at > /usr/src/sys/kern/kern_synch.c:494 > #2 0xffffffff8064ddc2 in sleepq_switch (wchan=<value optimized out>, > pri=<value optimized out>) at /usr/src/sys/kern/subr_sleepqueue.c:538 > #3 0xffffffff8064dc23 in sleepq_wait (wchan=0xfffff80044e47230, pri=0) at > /usr/src/sys/kern/subr_sleepqueue.c:617 > #4 0xffffffff805b7c1a in _cv_wait (cvp=0xfffff80044e47230, > lock=0xfffff80044e471b8) at /usr/src/sys/kern/kern_condvar.c:139 > #5 0xffffffff8151f73b in txg_thread_wait (tx=<value optimized out>, > cpr=0xfffffe011dc6da00, cv=<value optimized out>, time=<value optimized > out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/ > common/fs/zfs/txg.c:251 > #6 0xffffffff8151e450 in txg_quiesce_thread (arg=0xfffff80044e47000) at > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/ > common/fs/zfs/txg.c:556 > #7 0xffffffff805d5bd4 in fork_exit (callout=0xffffffff8151e030 > <txg_quiesce_thread>, arg=0xfffff80044e47000, frame=0xfffffe011dc6dac0) at > /usr/src/sys/kern/kern_fork.c:996 > #8 0xffffffff8087111e in fork_trampoline () at /usr/src/sys/amd64/amd64/ > exception.S:606 > #9 0x0000000000000000 in ?? () > > > Thread 512 (Thread 100338): > #0 sched_switch (td=0xfffff80041505490, newtd=<value optimized out>, > flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1945 > #1 0xffffffff806110d9 in mi_switch (flags=260, newtd=0x0) at > /usr/src/sys/kern/kern_synch.c:494 > #2 0xffffffff8064ddc2 in sleepq_switch (wchan=<value optimized out>, > pri=<value optimized out>) at /usr/src/sys/kern/subr_sleepqueue.c:538 > #3 0xffffffff8064dc23 in sleepq_wait (wchan=0xfffff80044acf320, pri=0) at > /usr/src/sys/kern/subr_sleepqueue.c:617 > #4 0xffffffff805b7c1a in _cv_wait (cvp=0xfffff80044acf320, > lock=0xfffff80044acf300) at /usr/src/sys/kern/kern_condvar.c:139 > #5 0xffffffff81547c9b in zio_wait (zio=<value optimized out>) at > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/ > common/fs/zfs/zio.c:1442 > #6 0xffffffff814f056c in dsl_pool_sync (dp=0xfffff80021f4b000, > txg=11733518) at /usr/src/sys/modules/zfs/../.. > /cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c:531 > #7 0xffffffff81514030 in spa_sync (spa=0xfffffe0003fe1000, txg=11733518) > at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/ > common/fs/zfs/spa.c:6604 > #8 0xffffffff8151e6cd in txg_sync_thread (arg=0xfffff80021f4b000) at > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/ > common/fs/zfs/txg.c:518 > #9 0xffffffff805d5bd4 in fork_exit (callout=0xffffffff8151e480 > <txg_sync_thread>, arg=0xfffff80021f4b000, frame=0xfffffe011e6fdac0) at > /usr/src/sys/kern/kern_fork.c:996 > #10 0xffffffff8087111e in fork_trampoline () at /usr/src/sys/amd64/amd64/ > exception.S:606 > #11 0x0000000000000000 in ?? () > A recent quick read of the code would lead me to believe that zio_wait not returning there means that the zio never reached the zio_done stage. Parent zios seem to yield in a couple of stages in the pipeline if they have incomplete children. They determine this by calling zio_wait_for_children with zio child types and their corresponding wait type. In so doing they set the io_stall to the count of the number of waiters of the first non-zero check. This parent I/O will be resumed by the last child zio of that type and wait state in zio_notify_parent. I'm sure you know all this - but I wrote it to preface asking for the following fields of the zio being waited on in dsl_pool_sync_mos: io_stall (i.e, which field in io_children is pointed to) *io_stall, io_children[*][*], io_child_list (at a first glance just the addresses). The other alternative is that it reexecuting has gotten in to a bad place in the state machine so io_reexecute. Thanks. -K
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAHM0Q_N%2BC=3qgUnyDkEugOFcL=J8gBjbTg8v45Vz3uT=e=Fn2g>