Date: Wed, 3 Oct 2012 11:43:23 +0300
From: Nikolay Denev <ndenev@gmail.com>
To: Andriy Gapon <avg@freebsd.org>
Cc: "<freebsd-fs@freebsd.org>" <freebsd-fs@FreeBSD.org>
Subject: Re: nfs + zfs hangs on RELENG_9
Message-ID: <CF9C7048-15C1-4C7A-8395-2BAB3AE31322@gmail.com>
In-Reply-To: <506BF372.1090208@FreeBSD.org>
References: <906543F2-96BD-4519-B693-FD5AFB646F87@gmail.com> <506BF372.1090208@FreeBSD.org>
On Oct 3, 2012, at 11:12 AM, Andriy Gapon <avg@freebsd.org> wrote:

> on 02/10/2012 13:26 Nikolay Denev said the following:
>> 7 100537 zfskern txg_thread_enter mi_switch+0x186 sleepq_wait+0x42
>> _cv_wait+0x121 zio_wait+0x61 dsl_pool_sync+0xe0 spa_sync+0x336
>> txg_sync_thread+0x136 fork_exit+0x11f fork_trampoline+0xe
> 
> From my past experience the threads stuck in zio_wait always meant an I/O
> operation stuck in a storage controller driver, controller firmware, etc.
> Not necessarily a case here, but a possibility.
> 
> Perhaps try camcontrol tags <disk devname> -v to see the state of disk queues.
> 

I'm using the mfi(4) driver, which does not seem to be under CAM, but I'm also
running it with the following loader tunable: hw.mfi.max_cmds=254, which is an
increase over the standard 128 tags, and maybe this could be the problem.
I'll revert it now and retest.

> P.S.
> It would be nice if for debugging purposes we had some place in zio to record
> bio that it depends upon.
> E.g. something like:
> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h
> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h
> index 80d9336..75b2fcf 100644
> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h
> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h
> @@ -432,6 +432,7 @@ struct zio {
>  #ifdef _KERNEL
>  	/* FreeBSD only. */
>  	struct ostask io_task;
> +	void *io_bio;
>  #endif
>  };
> 
> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
> index 7d146ff..36bb5ad 100644
> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
> @@ -684,6 +684,7 @@ vdev_geom_io_intr(struct bio *bp)
>  			vd->vdev_delayed_close = B_TRUE;
>  		}
>  	}
> +	zio->io_bio = NULL;
>  	g_destroy_bio(bp);
>  	zio_interrupt(zio);
>  }
> @@ -732,6 +733,7 @@ sendreq:
>  	}
>  	bp = g_alloc_bio();
>  	bp->bio_caller1 = zio;
> +	zio->io_bio = bp;
>  	switch (zio->io_type) {
>  	case ZIO_TYPE_READ:
>  	case ZIO_TYPE_WRITE:
> 
> Then, in situation like yours you could use kgdb, switch to the thread in
> zio_wait, go to zio_wait frame and get bio pointer from zio. From there you
> could try to deduce what is going on with the I/O request.
> 

I'm rebuilding now with these patches and DDB/KDB enabled and will try to get
this information if it happens again.

> -- 
> Andriy Gapon

Thanks!

Regards,
Nikolay