Date: Mon, 13 Oct 2014 20:02:12 +0100 From: "Steven Hartland" <killing@multiplay.co.uk> To: "K. Macy" <kmacy@freebsd.org> Cc: Mark Martinec <Mark.Martinec+freebsd@ijs.si>, "freebsd-fs@FreeBSD.org" <freebsd-fs@freebsd.org>, FreeBSD Stable <freebsd-stable@freebsd.org> Subject: Re: zfs pool import hangs on [tx->tx_sync_done_cv] Message-ID: <E2E24A91B8B04C2DBBBC7E029A12BD05@multiplay.co.uk> References: <54372173.1010100@ijs.si> <644FA8299BF848E599B82D2C2C298EA7@multiplay.co.uk> <54372EBA.1000908@ijs.si> <DE7DD7A94E9B4F1FBB3AFF57EDB47C67@multiplay.co.uk> <543731F3.8090701@ijs.si> <543AE740.7000808@ijs.si> <A5BA41116A7F4B23A9C9E469C4146B99@multiplay.co.uk> <CAHM0Q_N%2BC=3qgUnyDkEugOFcL=J8gBjbTg8v45Vz3uT=e=Fn2g@mail.gmail.com> <6E01BBEDA9984CCDA14F290D26A8E14D@multiplay.co.uk> <CAHM0Q_OpV2sAQQAH6Cj_=yJWAOt8pTPWQ-m45JSiXDpBwT6WTA@mail.gmail.com>
next in thread | previous in thread | raw e-mail | index | archive | help
----- Original Message ----- From: "K. Macy" <kmacy@freebsd.org> > On Mon, Oct 13, 2014 at 1:06 AM, Steven Hartland <killing@multiplay.co.uk> > wrote: > >> ----- Original Message ----- From: "K. Macy" <kmacy@freebsd.org> >> >> A recent quick read of the code would lead me to believe that zio_wait not >>> returning there means that the zio never reached the zio_done stage. >>> Parent >>> zios seem to yield in a couple of stages in the pipeline if they have >>> incomplete children. They determine this by calling zio_wait_for_children >>> with zio child types and their corresponding wait type. In so doing they >>> set the io_stall to the count of the number of waiters of the first >>> non-zero check. This parent I/O will be resumed by the last child zio of >>> that type and wait state in zio_notify_parent. I'm sure you know all this >>> - >>> but I wrote it to preface asking for the following fields of the zio being >>> waited on in dsl_pool_sync_mos: io_stall (i.e, which field in io_children >>> is pointed to) *io_stall, io_children[*][*], io_child_list (at a first >>> glance just the addresses). The other alternative is that it reexecuting >>> has gotten in to a bad place in the state machine so io_reexecute. >>> >> >> Yer I would have got the zio details but typically its "optimised out" by >> the >> compiler, so will need some effort to track that down unfortunately :( >> > > Well, let me know if you can. Re-creating a new 10.x VM is taking a while > as it's taking me forever to checkout the sources. > > Things like that need to somehow continue to be accessible. I believe there's some pool corruption here somewhere as every once in a while I trip and ASSERT panic: panic: solaris assert: size >= SPA_MINBLOCKSIZE || range_tree_space(msp->ms_tree) == 0, file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c, line: 1636 #3 0xffffffff80607ed3 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:688 #4 0xffffffff8179321d in assfail (a=<value optimized out>, f=<value optimized out>, l=<value optimized out>) at /usr/src/sys/modules/opensolaris/../../cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:81 #5 0xffffffff81501ecf in metaslab_passivate (msp=0xfffff800091e5800, size=0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:1636 #6 0xffffffff81501ca7 in metaslab_group_alloc (mg=0xfffff80044ef7400, psize=512, asize=512, txg=11733518, min_distance=536281088, dva=<value optimized out>, d=-1) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:2206 #7 0xffffffff81500c9d in metaslab_alloc_dva (spa=0xfffffe00022bb000, mc=0xfffff800045d0c00, psize=512, dva=0xfffffe000e409640, d=0, hintdva=<value optimized out>, flags=<value optimized out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:2381 #8 0xffffffff81500709 in metaslab_alloc (spa=0xfffffe00022bb000, mc=0xfffff800045d0c00, psize=512, bp=0xfffffe000e409640, ndvas=3, txg=11733518, hintbp=0x0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:2586 #9 0xffffffff8154ca8a in zio_dva_allocate (zio=0xfffff80044a57398) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:2444 #10 0xffffffff81548d54 in zio_execute (zio=0xfffff80044a57398) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1407 #11 0xffffffff8154d25f in zio_ready (zio=0xfffff800487d2730) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3031 #12 0xffffffff81548d54 in zio_execute (zio=0xfffff800487d2730) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1407 #13 0xffffffff80651410 in taskqueue_run_locked (queue=0xfffff80054291000) at /usr/src/sys/kern/subr_taskqueue.c:342 #14 0xffffffff80651dcb in taskqueue_thread_loop (arg=<value optimized out>) at /usr/src/sys/kern/subr_taskqueue.c:563 (kgdb) frame 5 #5 0xffffffff81501ecf in metaslab_passivate (msp=0xfffff800091e5800, size=0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:1636 1636 ASSERT(size >= SPA_MINBLOCKSIZE || range_tree_space(msp->ms_tree) == 0); (kgdb) print size $5 = 0 (kgdb) print msp->ms_tree $6 = (range_tree_t *) 0xfffff800091e5400 (kgdb) print *msp->ms_tree $7 = { rt_root = { avl_root = 0xfffff80048fab840, avl_compar = 0xffffffff81502850 <range_tree_seg_compare>, avl_offset = 0, avl_numnodes = 1, avl_size = 64 }, rt_space = 5632, rt_ops = 0xffffffff8160b110, rt_arg = 0xfffff800091e5800, rt_histogram = {0 <repeats 12 times>, 1, 0 <repeats 51 times>}, rt_lock = 0xfffff800091e5800 } (kgdb) print *msp $8 = { ms_lock = { lock_object = { lo_name = 0xffffffff815eb310 "msp->ms_lock", lo_flags = 40960000, lo_data = 0, lo_witness = 0x0 }, sx_lock = 18446735279027798016 }, ms_load_cv = { cv_description = 0xffffffff815eb31e "msp->ms_load_cv", cv_waiters = 0 }, ms_sm = 0xfffff8000924bb80, ms_ops = 0xffffffff8160b100, ms_id = 119, ms_start = 3992977408, ms_size = 33554432, ms_fragmentation = 90, ms_alloctree = {0xfffff800091e5000, 0xfffff800091ec000, 0xfffff800091e3c00, 0xfffff800091e3400}, ms_freetree = {0xfffff800091e4c00, 0xfffff800091e4000, 0xfffff800091e4400, 0xfffff800091e3000}, ms_defertree = {0xfffff800091e3800, 0xfffff800091df400}, ms_tree = 0xfffff800091e5400, ms_condensing = 0, ms_condense_wanted = 0, ms_loaded = 1, ms_loading = 0, ms_deferspace = 0, ms_weight = 13835058055282163712, ms_access_txg = 11733526, ms_size_tree = { avl_root = 0xfffff80048fab858, avl_compar = 0xffffffff81502360 <metaslab_rangesize_compare>, avl_offset = 24, avl_numnodes = 1, avl_size = 64 }, ms_lbas = {0, 0, 0, 0, 0, 0, 0, 0, 0, 4026524672, 4026530816, 0 <repeats 53 times>}, ms_group = 0xfffff80044ef7400, ms_group_node = { avl_child = {0x0, 0x0}, avl_pcb = 18446735278773074705 }, ms_txg_node = { tn_next = {0x0, 0xfffff800091d7b28, 0xfffff800091e2328, 0x0}, tn_member = "\000\001\001" } } Also when it stalls I've seen this in the zfs debug:- 13 39363 :zfs-dprintf dnode.c - dnode_free_range:1655: ds=mos obj=31 blkid=0 nblks=1125899906842624 txg=3078494 13 39363 :zfs-dprintf dbuf.c - dbuf_free_range:816: ds=mos obj=31 start=0 end=0 13 39363 :zfs-dprintf dbuf.c - dbuf_dirty:1133: ds=mos obj=40 lvl=0 blkid=-1 size=140 13 39363 :zfs-dprintf dnode.c - dnode_setdirty:1286: ds=mos obj=40 txg=3078494 nblks looks suspisiously large and I'm not sure blkid should be -1? For reference to see this I'm using: sysctl debug.zfs_flags=64 dtrace -n 'zfs-dprintf {printf("%s - %s:%d: %s", stringof(arg0), stringof(arg1), arg2, stringof(arg3))}' With all that said the following looks like it might indicate the issue: zdb -e -m sys1boot Metaslabs: vdev 0 metaslabs 127 offset spacemap free --------------- ------------------- --------------- ------------- metaslab 0 offset 0 spacemap 417 free 0 metaslab 1 offset 2000000 spacemap 24 free 0 metaslab 2 offset 4000000 spacemap 64 free 0 metaslab 3 offset 6000000 spacemap 1094 free 0 metaslab 4 offset 8000000 spacemap 1091 free 0 metaslab 5 offset a000000 spacemap 1093 free 0 metaslab 6 offset c000000 spacemap 1095 free 0 metaslab 7 offset e000000 spacemap 1096 free 0 metaslab 8 offset 10000000 spacemap 1098 free 0 metaslab 9 offset 12000000 spacemap 1097 free 0 metaslab 10 offset 14000000 spacemap 1099 free 0 metaslab 11 offset 16000000 spacemap 1102 free 0 metaslab 12 offset 18000000 spacemap 113 free 0 metaslab 13 offset 1a000000 spacemap 110 free 0 metaslab 14 offset 1c000000 spacemap 115 free 0 metaslab 15 offset 1e000000 spacemap 1103 free 0 metaslab 16 offset 20000000 spacemap 416 free 0 metaslab 17 offset 22000000 spacemap 81 free 0 metaslab 18 offset 24000000 spacemap 57 free 0 metaslab 19 offset 26000000 spacemap 102 free 0 metaslab 20 offset 28000000 spacemap 100 free 0 metaslab 21 offset 2a000000 spacemap 104 free 0 metaslab 22 offset 2c000000 spacemap 125 free 0 metaslab 23 offset 2e000000 spacemap 105 free 0 metaslab 24 offset 30000000 spacemap 108 free 0 metaslab 25 offset 32000000 spacemap 126 free 0 metaslab 26 offset 34000000 spacemap 248 free 0 metaslab 27 offset 36000000 spacemap 55 free 0 metaslab 28 offset 38000000 spacemap 82 free 0 metaslab 29 offset 3a000000 spacemap 149 free 0 metaslab 30 offset 3c000000 spacemap 152 free 0 metaslab 31 offset 3e000000 spacemap 155 free 0 metaslab 32 offset 40000000 spacemap 156 free 0 metaslab 33 offset 42000000 spacemap 72 free 0 metaslab 34 offset 44000000 spacemap 96 free 0 metaslab 35 offset 46000000 spacemap 159 free 0 metaslab 36 offset 48000000 spacemap 158 free 0 metaslab 37 offset 4a000000 spacemap 160 free 0 metaslab 38 offset 4c000000 spacemap 1271 free 0 metaslab 39 offset 4e000000 spacemap 161 free 0 metaslab 40 offset 50000000 spacemap 1273 free 0 metaslab 41 offset 52000000 spacemap 1277 free 0 metaslab 42 offset 54000000 spacemap 35 free 0 metaslab 43 offset 56000000 spacemap 25 free 0 metaslab 44 offset 58000000 spacemap 32 free 0 metaslab 45 offset 5a000000 spacemap 150 free 0 metaslab 46 offset 5c000000 spacemap 151 free 0 metaslab 47 offset 5e000000 spacemap 163 free 0 metaslab 48 offset 60000000 spacemap 415 free 0 metaslab 49 offset 62000000 spacemap 98 free 0 metaslab 50 offset 64000000 spacemap 109 free 0 metaslab 51 offset 66000000 spacemap 171 free 0 metaslab 52 offset 68000000 spacemap 80 free 0 metaslab 53 offset 6a000000 spacemap 134 free 0 metaslab 54 offset 6c000000 spacemap 135 free 0 metaslab 55 offset 6e000000 spacemap 154 free 0 metaslab 56 offset 70000000 spacemap 140 free 0 metaslab 57 offset 72000000 spacemap 141 free 0 metaslab 58 offset 74000000 spacemap 1272 free 0 metaslab 59 offset 76000000 spacemap 138 free 0 metaslab 60 offset 78000000 spacemap 139 free 0 metaslab 61 offset 7a000000 spacemap 18 free 0 metaslab 62 offset 7c000000 spacemap 148 free 0 metaslab 63 offset 7e000000 spacemap 1270 free 0 metaslab 64 offset 80000000 spacemap 114 free 0 metaslab 65 offset 82000000 spacemap 112 free 0 metaslab 66 offset 84000000 spacemap 116 free 0 metaslab 67 offset 86000000 spacemap 164 free 0 metaslab 68 offset 88000000 spacemap 243 free 0 metaslab 69 offset 8a000000 spacemap 128 free 0 metaslab 70 offset 8c000000 spacemap 1101 free 0 metaslab 71 offset 8e000000 spacemap 153 free 0 metaslab 72 offset 90000000 spacemap 120 free 0 metaslab 73 offset 92000000 spacemap 62 free 0 metaslab 74 offset 94000000 spacemap 122 free 0 metaslab 75 offset 96000000 spacemap 131 free 0 metaslab 76 offset 98000000 spacemap 129 free 0 metaslab 77 offset 9a000000 spacemap 157 free 0 metaslab 78 offset 9c000000 spacemap 133 free 0 metaslab 79 offset 9e000000 spacemap 137 free 0 metaslab 80 offset a0000000 spacemap 165 free 0 metaslab 81 offset a2000000 spacemap 168 free 0 metaslab 82 offset a4000000 spacemap 170 free 0 metaslab 83 offset a6000000 spacemap 127 free 0 metaslab 84 offset a8000000 spacemap 180 free 0 metaslab 85 offset aa000000 spacemap 162 free 0 metaslab 86 offset ac000000 spacemap 1100 free 0 metaslab 87 offset ae000000 spacemap 107 free 0 metaslab 88 offset b0000000 spacemap 119 free 0 metaslab 89 offset b2000000 spacemap 61 free 0 metaslab 90 offset b4000000 spacemap 60 free 0 metaslab 91 offset b6000000 spacemap 123 free 0 metaslab 92 offset b8000000 spacemap 130 free 0 metaslab 93 offset ba000000 spacemap 59 free 0 metaslab 94 offset bc000000 spacemap 167 free 0 metaslab 95 offset be000000 spacemap 136 free 0 metaslab 96 offset c0000000 spacemap 144 free 0 metaslab 97 offset c2000000 spacemap 166 free 0 metaslab 98 offset c4000000 spacemap 169 free 0 metaslab 99 offset c6000000 spacemap 58 free 0 metaslab 100 offset c8000000 spacemap 56 free 0 metaslab 101 offset ca000000 spacemap 54 free 0 metaslab 102 offset cc000000 spacemap 53 free 0 metaslab 103 offset ce000000 spacemap 52 free 0 metaslab 104 offset d0000000 spacemap 106 free 0 metaslab 105 offset d2000000 spacemap 51 free 0 metaslab 106 offset d4000000 spacemap 50 free 0 metaslab 107 offset d6000000 spacemap 121 free 0 metaslab 108 offset d8000000 spacemap 124 free 0 metaslab 109 offset da000000 spacemap 49 free 0 metaslab 110 offset dc000000 spacemap 142 free 0 metaslab 111 offset de000000 spacemap 132 free 0 metaslab 112 offset e0000000 spacemap 48 free 0 metaslab 113 offset e2000000 spacemap 47 free 0 metaslab 114 offset e4000000 spacemap 46 free 0 metaslab 115 offset e6000000 spacemap 45 free 0 metaslab 116 offset e8000000 spacemap 143 free 0 metaslab 117 offset ea000000 spacemap 44 free 0 metaslab 118 offset ec000000 spacemap 43 free 177K metaslab 119 offset ee000000 spacemap 42 free 13.0K metaslab 120 offset f0000000 spacemap 41 free 0 metaslab 121 offset f2000000 spacemap 179 free 0 metaslab 122 offset f4000000 spacemap 40 free 0 metaslab 123 offset f6000000 spacemap 39 free 0 metaslab 124 offset f8000000 spacemap 38 free 0 metaslab 125 offset fa000000 spacemap 37 free 0 metaslab 126 offset fc000000 spacemap 36 free 0 Only two metaslabs with any free? zfs list -r sys1boot NAME USED AVAIL REFER MOUNTPOINT sys1boot 1.76G 2.08G 11K /sys1boot sys1boot/ROOT 1.72G 2.08G 1.20G /sys1boot/ROOT NAME SIZE ALLOC FREE FRAG EXPANDSZ CAP DEDUP HEALTH ALTROOT sys1boot 3.97G 3.97G 190K 0% - 99% 1.00x ONLINE - So basically the zpool says its out of space even though zfs says its under half full? Given this I'm guessing that on import when it goes to write an update its hitting a brick wall and can't, with the zio being flagged as can't fail it seems to get stuck. Regards Steve
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?E2E24A91B8B04C2DBBBC7E029A12BD05>