Date:      Wed, 3 Oct 2012 11:43:23 +0300
From:      Nikolay Denev <ndenev@gmail.com>
To:        Andriy Gapon <avg@freebsd.org>
Cc:        "<freebsd-fs@freebsd.org>" <freebsd-fs@FreeBSD.org>
Subject:   Re: nfs + zfs hangs on RELENG_9
Message-ID:  <CF9C7048-15C1-4C7A-8395-2BAB3AE31322@gmail.com>
In-Reply-To: <506BF372.1090208@FreeBSD.org>
References:  <906543F2-96BD-4519-B693-FD5AFB646F87@gmail.com> <506BF372.1090208@FreeBSD.org>


On Oct 3, 2012, at 11:12 AM, Andriy Gapon <avg@freebsd.org> wrote:

> on 02/10/2012 13:26 Nikolay Denev said the following:
>> 7 100537 zfskern          txg_thread_enter mi_switch+0x186 sleepq_wait+0x42
>> _cv_wait+0x121 zio_wait+0x61 dsl_pool_sync+0xe0 spa_sync+0x336
>> txg_sync_thread+0x136 fork_exit+0x11f fork_trampoline+0xe
>
> From my past experience the threads stuck in zio_wait always meant an I/O
> operation stuck in a storage controller driver, controller firmware, etc.
> Not necessarily a case here, but a possibility.
>
> Perhaps try camcontrol tags <disk devname> -v to see the state of disk queues.
>

I'm using the mfi(4) driver, which does not seem to be under CAM, but I'm
also running it with the following loader tunable: hw.mfi.max_cmds=254,
which is an increase over the standard 128 tags, and maybe this could be
the problem.
I'll revert it now and retest.

> P.S.
> It would be nice if for debugging purposes we had some place in zio to record
> bio that it depends upon.
> E.g. something like:
> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h
> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h
> index 80d9336..75b2fcf 100644
> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h
> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h
> @@ -432,6 +432,7 @@ struct zio {
> #ifdef _KERNEL
> 	/* FreeBSD only. */
> 	struct ostask	io_task;
> +	void		*io_bio;
> #endif
> };
>
> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
> index 7d146ff..36bb5ad 100644
> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
> @@ -684,6 +684,7 @@ vdev_geom_io_intr(struct bio *bp)
> 			vd->vdev_delayed_close = B_TRUE;
> 		}
> 	}
> +	zio->io_bio = NULL;
> 	g_destroy_bio(bp);
> 	zio_interrupt(zio);
> }
> @@ -732,6 +733,7 @@ sendreq:
> 	}
> 	bp = g_alloc_bio();
> 	bp->bio_caller1 = zio;
> +	zio->io_bio = bp;
> 	switch (zio->io_type) {
> 	case ZIO_TYPE_READ:
> 	case ZIO_TYPE_WRITE:
>
> Then, in situation like yours you could use kgdb, switch to the thread in
> zio_wait, go to zio_wait frame and get bio pointer from zio.  From there you
> could try to deduce what is going on with the I/O request.
>

I'm rebuilding now with these patches and DDB/KDB enabled and will try to
get this information if it happens again.
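
For anyone reading along, the bookkeeping the patch adds can be modelled in
plain userland C. This is only a sketch under stated assumptions: mock_zio,
mock_bio, mock_io_start() and mock_io_done() are hypothetical stand-ins made
up for illustration, not the kernel's struct zio, struct bio, or the
vdev_geom.c functions. The field names io_bio and bio_caller1 are the only
parts taken from the diff. The idea is that the request holds a pointer to
its in-flight bio from issue until completion, so a debugger can follow it
while the request is stuck.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bio: only the back-pointer is modelled. */
struct mock_bio {
	void	*bio_caller1;	/* owning request, set on the issue path */
};

/* Hypothetical stand-in for struct zio: only the debug field is modelled. */
struct mock_zio {
	void	*io_bio;	/* in-flight bio, NULL when nothing is outstanding */
};

/* Issue path: link request and bio in both directions before dispatch. */
void
mock_io_start(struct mock_zio *zio, struct mock_bio *bp)
{
	/* Assumption of this sketch: one outstanding bio per request. */
	assert(zio->io_bio == NULL);
	bp->bio_caller1 = zio;
	zio->io_bio = bp;
}

/* Completion path: drop the debug pointer before the bio is released. */
void
mock_io_done(struct mock_bio *bp)
{
	struct mock_zio *zio = bp->bio_caller1;

	zio->io_bio = NULL;
}
```

A request that never reaches mock_io_done() keeps a non-NULL io_bio, which
is the state one would hope to see from the zio_wait frame of a hung thread.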

> -- 
> Andriy Gapon

Thanks!

Regards,
Nikolay


