From owner-freebsd-fs@FreeBSD.ORG Mon Oct 13 03:54:33 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 847859EE; Mon, 13 Oct 2014 03:54:33 +0000 (UTC) Received: from mail-yh0-x232.google.com (mail-yh0-x232.google.com [IPv6:2607:f8b0:4002:c01::232]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 3351E3B5; Mon, 13 Oct 2014 03:54:33 +0000 (UTC) Received: by mail-yh0-f50.google.com with SMTP id a41so3265925yho.23 for ; Sun, 12 Oct 2014 20:54:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=Y8Pu4pQN5ssvKh1dxtAFY4vwN4CziU13Dc6OZwmPSGQ=; b=Fh4FNi2ovq6/C87wRWOwZDVnYLzFhhL83XYs5HBdMbzOcLV3YYTN0suhbIhtA2goOq Mrrq+RrZgfnWtLsGzu08fGYllJvFR7W7to+9DqVmXyGyYSzT0Ms3L8Mux5CIJkzZYUbE duc/rYDlqaGZ1TqRFoHZh6j8EILsyyF2uNjWeKjq9QoN5kvXZK1EKH3cjzpStcFozZyE fg1NbsQGx0OFdY5wpd29gaRCW+0p9gDi/a8co82um/WD0e2aiV2M69215YlxCBlJJb9t q8Nl0q0oPWnvFp5pKiwIh49lUQGGRtulh7p0+6gxiqjC/09Otp4l/8DwtaWlNEqHaVYU 1EGg== MIME-Version: 1.0 X-Received: by 10.236.32.4 with SMTP id n4mr33142798yha.16.1413172472105; Sun, 12 Oct 2014 20:54:32 -0700 (PDT) Sender: kmacybsd@gmail.com Received: by 10.170.82.197 with HTTP; Sun, 12 Oct 2014 20:54:32 -0700 (PDT) In-Reply-To: References: <54372173.1010100@ijs.si> <644FA8299BF848E599B82D2C2C298EA7@multiplay.co.uk> <54372EBA.1000908@ijs.si> <543731F3.8090701@ijs.si> <543AE740.7000808@ijs.si> Date: Sun, 12 Oct 2014 20:54:32 -0700 X-Google-Sender-Auth: Ed-LGs5HF9Eo1bT7G0bE8hehL2Q Message-ID: Subject: Re: zfs pool import hangs on [tx->tx_sync_done_cv] From: "K. Macy" To: Steven Hartland Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 Cc: "freebsd-fs@FreeBSD.org" , FreeBSD Stable X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 13 Oct 2014 03:54:33 -0000 Relavent threads: > > #0 sched_switch (td=0xfffff800446bc000, newtd=, > flags=) at /usr/src/sys/kern/sched_ule.c:1945 > #1 0xffffffff806110d9 in mi_switch (flags=260, newtd=0x0) at > /usr/src/sys/kern/kern_synch.c:494 > #2 0xffffffff8064ddc2 in sleepq_switch (wchan=, > pri=) at /usr/src/sys/kern/subr_sleepqueue.c:538 > #3 0xffffffff8064dc23 in sleepq_wait (wchan=0xfffff80021f4b220, pri=0) at > /usr/src/sys/kern/subr_sleepqueue.c:617 > #4 0xffffffff805b7c1a in _cv_wait (cvp=0xfffff80021f4b220, > lock=0xfffff80021f4b1b8) at /usr/src/sys/kern/kern_condvar.c:139 > #5 0xffffffff8151ec4b in txg_wait_synced (dp=0xfffff80021f4b000, > txg=) at /usr/src/sys/modules/zfs/../.. > /cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c:631 > #6 0xffffffff81510937 in spa_load (spa=0xfffffe0003fe1000, > state=SPA_LOAD_IMPORT, type=, mosconfig= optimized out>) at /usr/src/sys/modules/zfs/../.. > /cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:2777 > #7 0xffffffff8150e5df in spa_load_best (spa=0xfffffe0003fe1000, > state=SPA_LOAD_IMPORT, mosconfig=1, max_request=, > rewind_flags=1) at /usr/src/sys/modules/zfs/../.. > /cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:2873 > #8 0xffffffff8150e023 in spa_import (pool=0xfffffe0002e2a000 "sys1boot", > config=0xfffff80044f24ba0, props=0x0, flags=) at > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/ > common/fs/zfs/spa.c:4207 > #9 0xffffffff81566d78 in zfs_ioc_pool_import (zc=0xfffffe0002e2a000) at > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/ > common/fs/zfs/zfs_ioctl.c:1594 > #10 0xffffffff81563c4f in zfsdev_ioctl (dev=, > zcmd=, arg=, flag= optimized out>, td=) at /usr/src/sys/modules/zfs/../.. > /cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:6158 > #11 0xffffffff805299eb in devfs_ioctl_f (fp=0xfffff8004f020aa0, > com=3222821378, data=0xfffff80038123e60, cred=, > td=0xfffff800446bc000) at /usr/src/sys/fs/devfs/devfs_vnops.c:757 > #12 0xffffffff8066201b in kern_ioctl (td=, fd= optimized out>, com=) at file.h:320 > #13 0xffffffff80661d9c in sys_ioctl (td=0xfffff800446bc000, > uap=0xfffffe011e49fa40) at /usr/src/sys/kern/sys_generic.c:702 > #14 0xffffffff8088defa in amd64_syscall (td=0xfffff800446bc000, traced=0) > at subr_syscall.c:134 > #15 0xffffffff80870ecb in Xfast_syscall () at /usr/src/sys/amd64/amd64/ > exception.S:391 > > > Thread 509 (Thread 100533): > #0 sched_switch (td=0xfffff800446bf000, newtd=, > flags=) at /usr/src/sys/kern/sched_ule.c:1945 > #1 0xffffffff806110d9 in mi_switch (flags=260, newtd=0x0) at > /usr/src/sys/kern/kern_synch.c:494 > #2 0xffffffff8064ddc2 in sleepq_switch (wchan=, > pri=) at /usr/src/sys/kern/subr_sleepqueue.c:538 > #3 0xffffffff8064e323 in sleepq_timedwait (wchan=0xfffff80044e47210, > pri=0) at /usr/src/sys/kern/subr_sleepqueue.c:652 > #4 0xffffffff805b84f0 in _cv_timedwait_sbt (cvp=0xfffff80044e47210, > lock=0xfffff80044e471b8, sbt=, pr= out>, flags=) at /usr/src/sys/kern/kern_condvar.c:325 > #5 0xffffffff8151f72e in txg_thread_wait (tx=, > cpr=0xfffffe011dc76a00, cv=0xfffff80044e47210, time=) > at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/ > common/fs/zfs/txg.c:249 > #6 0xffffffff8151e930 in txg_sync_thread (arg=0xfffff80044e47000) at > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/ > common/fs/zfs/txg.c:483 > #7 0xffffffff805d5bd4 in fork_exit (callout=0xffffffff8151e480 > , arg=0xfffff80044e47000, frame=0xfffffe011dc76ac0) at > /usr/src/sys/kern/kern_fork.c:996 > #8 0xffffffff8087111e in fork_trampoline () at /usr/src/sys/amd64/amd64/ > exception.S:606 > #9 0x0000000000000000 in ?? () > > Thread 508 (Thread 100532): > #0 sched_switch (td=0xfffff80048204920, newtd=, > flags=) at /usr/src/sys/kern/sched_ule.c:1945 > #1 0xffffffff806110d9 in mi_switch (flags=260, newtd=0x0) at > /usr/src/sys/kern/kern_synch.c:494 > #2 0xffffffff8064ddc2 in sleepq_switch (wchan=, > pri=) at /usr/src/sys/kern/subr_sleepqueue.c:538 > #3 0xffffffff8064dc23 in sleepq_wait (wchan=0xfffff80044e47230, pri=0) at > /usr/src/sys/kern/subr_sleepqueue.c:617 > #4 0xffffffff805b7c1a in _cv_wait (cvp=0xfffff80044e47230, > lock=0xfffff80044e471b8) at /usr/src/sys/kern/kern_condvar.c:139 > #5 0xffffffff8151f73b in txg_thread_wait (tx=, > cpr=0xfffffe011dc6da00, cv=, time= out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/ > common/fs/zfs/txg.c:251 > #6 0xffffffff8151e450 in txg_quiesce_thread (arg=0xfffff80044e47000) at > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/ > common/fs/zfs/txg.c:556 > #7 0xffffffff805d5bd4 in fork_exit (callout=0xffffffff8151e030 > , arg=0xfffff80044e47000, frame=0xfffffe011dc6dac0) at > /usr/src/sys/kern/kern_fork.c:996 > #8 0xffffffff8087111e in fork_trampoline () at /usr/src/sys/amd64/amd64/ > exception.S:606 > #9 0x0000000000000000 in ?? () > > > Thread 512 (Thread 100338): > #0 sched_switch (td=0xfffff80041505490, newtd=, > flags=) at /usr/src/sys/kern/sched_ule.c:1945 > #1 0xffffffff806110d9 in mi_switch (flags=260, newtd=0x0) at > /usr/src/sys/kern/kern_synch.c:494 > #2 0xffffffff8064ddc2 in sleepq_switch (wchan=, > pri=) at /usr/src/sys/kern/subr_sleepqueue.c:538 > #3 0xffffffff8064dc23 in sleepq_wait (wchan=0xfffff80044acf320, pri=0) at > /usr/src/sys/kern/subr_sleepqueue.c:617 > #4 0xffffffff805b7c1a in _cv_wait (cvp=0xfffff80044acf320, > lock=0xfffff80044acf300) at /usr/src/sys/kern/kern_condvar.c:139 > #5 0xffffffff81547c9b in zio_wait (zio=) at > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/ > common/fs/zfs/zio.c:1442 > #6 0xffffffff814f056c in dsl_pool_sync (dp=0xfffff80021f4b000, > txg=11733518) at /usr/src/sys/modules/zfs/../.. > /cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c:531 > #7 0xffffffff81514030 in spa_sync (spa=0xfffffe0003fe1000, txg=11733518) > at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/ > common/fs/zfs/spa.c:6604 > #8 0xffffffff8151e6cd in txg_sync_thread (arg=0xfffff80021f4b000) at > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/ > common/fs/zfs/txg.c:518 > #9 0xffffffff805d5bd4 in fork_exit (callout=0xffffffff8151e480 > , arg=0xfffff80021f4b000, frame=0xfffffe011e6fdac0) at > /usr/src/sys/kern/kern_fork.c:996 > #10 0xffffffff8087111e in fork_trampoline () at /usr/src/sys/amd64/amd64/ > exception.S:606 > #11 0x0000000000000000 in ?? () > A recent quick read of the code would lead me to believe that zio_wait not returning there means that the zio never reached the zio_done stage. Parent zios seem to yield in a couple of stages in the pipeline if they have incomplete children. They determine this by calling zio_wait_for_children with zio child types and their corresponding wait type. In so doing they set the io_stall to the count of the number of waiters of the first non-zero check. This parent I/O will be resumed by the last child zio of that type and wait state in zio_notify_parent. I'm sure you know all this - but I wrote it to preface asking for the following fields of the zio being waited on in dsl_pool_sync_mos: io_stall (i.e, which field in io_children is pointed to) *io_stall, io_children[*][*], io_child_list (at a first glance just the addresses). The other alternative is that it reexecuting has gotten in to a bad place in the state machine so io_reexecute. Thanks. -K