From: Nikolay Denev
To: Andriy Gapon
Date: Wed, 3 Oct 2012 11:43:23 +0300
Subject: Re: nfs + zfs hangs on RELENG_9

On Oct 3, 2012, at 11:12 AM, Andriy Gapon wrote:

> on 02/10/2012 13:26 Nikolay Denev said the following:
>> 7 100537 zfskern txg_thread_enter mi_switch+0x186 sleepq_wait+0x42
>> _cv_wait+0x121 zio_wait+0x61 dsl_pool_sync+0xe0 spa_sync+0x336
>> txg_sync_thread+0x136 fork_exit+0x11f fork_trampoline+0xe
>
> From my past experience the threads stuck in zio_wait always meant an I/O
> operation stuck in a storage controller driver, controller firmware, etc.
> Not necessarily a case here, but a possibility.
>
> Perhaps try camcontrol tags -v to see the state of disk queues.
>

I'm using the mfi(4) driver which does not seem to be under CAM, but I'm also
running it with the following loader tunable: hw.mfi.max_cmds=254, which is an
increase over the standard 128 tags, and maybe this could be the problem.
I'll revert it now and retest.

> P.S.
> It would be nice if for debugging purposes we had some place in zio to record
> bio that it depends upon.
> E.g.
> something like:
> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h
> index 80d9336..75b2fcf 100644
> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h
> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h
> @@ -432,6 +432,7 @@ struct zio {
>  #ifdef _KERNEL
>  	/* FreeBSD only. */
>  	struct ostask	io_task;
> +	void		*io_bio;
>  #endif
>  };
>
> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
> index 7d146ff..36bb5ad 100644
> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
> @@ -684,6 +684,7 @@ vdev_geom_io_intr(struct bio *bp)
>  			vd->vdev_delayed_close = B_TRUE;
>  		}
>  	}
> +	zio->io_bio = NULL;
>  	g_destroy_bio(bp);
>  	zio_interrupt(zio);
>  }
> @@ -732,6 +733,7 @@ sendreq:
>  	}
>  	bp = g_alloc_bio();
>  	bp->bio_caller1 = zio;
> +	zio->io_bio = bp;
>  	switch (zio->io_type) {
>  	case ZIO_TYPE_READ:
>  	case ZIO_TYPE_WRITE:
>
> Then, in situation like yours you could use kgdb, switch to the thread in
> zio_wait, go to zio_wait frame and get bio pointer from zio. From there you
> could try to deduce what is going on with the I/O request.
>

I'm rebuilding now with these patches and DDB/KDB enabled and will try to get
this information if it happens again.

> --
> Andriy Gapon

Thanks!

Regards,
Nikolay
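
P.S. For anyone following the thread, the bookkeeping in the quoted patch can
be tried outside the kernel. What follows is only a minimal userspace sketch in
C, not the patch itself: struct fake_zio, struct fake_bio, issue_io() and
complete_io() are hypothetical stand-ins that exist neither in ZFS nor in GEOM,
and the one-request-per-zio assertion is an assumption of the sketch. It shows
the pattern only: record the outstanding request in the owner when it is
issued, and clear it when it completes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's struct zio and struct bio. */
struct fake_zio;

struct fake_bio {
	struct fake_zio	*bio_caller1;	/* back-pointer to the owning zio */
};

struct fake_zio {
	int	 io_type;
	void	*io_bio;	/* outstanding request, NULL when there is none */
};

/* Submit path: allocate the request and remember it in the zio. */
static struct fake_bio *
issue_io(struct fake_zio *zio)
{
	struct fake_bio *bp;

	/* Assumption of this sketch: at most one request in flight per zio. */
	assert(zio->io_bio == NULL);

	bp = calloc(1, sizeof(*bp));
	if (bp == NULL)
		return (NULL);
	bp->bio_caller1 = zio;
	zio->io_bio = bp;
	return (bp);
}

/* Completion path: forget the request before freeing it. */
static void
complete_io(struct fake_bio *bp)
{
	struct fake_zio *zio = bp->bio_caller1;

	zio->io_bio = NULL;
	free(bp);
}
```

If complete_io() is never called for a request, io_bio stays non-NULL, and
that is the pointer one would hope to read from the zio_wait frame in a
debugger.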