From owner-freebsd-stable@FreeBSD.ORG  Wed Jun 19 15:17:45 2013
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 589F8326
 for <freebsd-stable@freebsd.org>; Wed, 19 Jun 2013 15:17:45 +0000 (UTC)
 (envelope-from jdc@koitsu.org)
Received: from relay3-d.mail.gandi.net (relay3-d.mail.gandi.net
 [217.70.183.195])
 by mx1.freebsd.org (Postfix) with ESMTP id D4A9D19C1
 for <freebsd-stable@freebsd.org>; Wed, 19 Jun 2013 15:17:44 +0000 (UTC)
Received: from mfilter14-d.gandi.net (mfilter14-d.gandi.net [217.70.178.142])
 by relay3-d.mail.gandi.net (Postfix) with ESMTP id 0F627A80D2;
 Wed, 19 Jun 2013 17:17:28 +0200 (CEST)
X-Virus-Scanned: Debian amavisd-new at mfilter14-d.gandi.net
Received: from relay3-d.mail.gandi.net ([217.70.183.195])
 by mfilter14-d.gandi.net (mfilter14-d.gandi.net [10.0.15.180]) (amavisd-new,
 port 10024)
 with ESMTP id KJUUtnSiRICC; Wed, 19 Jun 2013 17:16:56 +0200 (CEST)
X-Originating-IP: 76.102.14.35
Received: from jdc.koitsu.org (c-76-102-14-35.hsd1.ca.comcast.net
 [76.102.14.35]) (Authenticated sender: jdc@koitsu.org)
 by relay3-d.mail.gandi.net (Postfix) with ESMTPSA id 52505A80F1;
 Wed, 19 Jun 2013 17:16:54 +0200 (CEST)
Received: by icarus.home.lan (Postfix, from userid 1000)
 id 6DD3F73A1C; Wed, 19 Jun 2013 08:16:52 -0700 (PDT)
Date: Wed, 19 Jun 2013 08:16:52 -0700
From: Jeremy Chadwick <jdc@koitsu.org>
To: Dennis =?unknown-8bit?Q?K=F6gel?= <dk@neveragain.de>
Subject: Re: Weird I/O hangs (9.1R, arcsas, interrupt spikes on uhci0)
Message-ID: <20130619151652.GB72566@icarus.home.lan>
References: <C2AA9591-CBF4-4956-BABE-08BD8994FF8C@neveragain.de>
 <op.wyxg11zc8527sy@ronaldradial.versatec.local>
 <FD9290D8-1A12-4F28-816B-94EFB4516DA4@neveragain.de>
 <EA2D201C731C46CB8F7BE4972847A53B@multiplay.co.uk>
 <B199EA9B-6E1C-4B1A-A8F3-4574FF61AEC0@neveragain.de>
 <15E6A1D4AB1D43D49C1DA02EAF463126@multiplay.co.uk>
 <27EED7A0-AB0B-43B5-8B7F-B424852DBD65@neveragain.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <27EED7A0-AB0B-43B5-8B7F-B424852DBD65@neveragain.de>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-stable@freebsd.org, Steven Hartland <killing@multiplay.co.uk>,
 Ronald Klop <ronald-freebsd8@klop.yi.org>
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-stable>,
 <mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
 <mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 19 Jun 2013 15:17:45 -0000

On Wed, Jun 19, 2013 at 05:02:20PM +0200, Dennis Kögel wrote:
> On 19.06.2013 at 16:47, Steven Hartland wrote:
> > I'm not familiar with that model of the Areca but have you tried
> > with the standard OS driver or does it not support that card?
> 
> The ARC1320 (non-raid) unfortunately isn't supported by the in-tree driver.

Which model of the ARC1320 are you using (there are 2)?  I'm having
trouble understanding their chart too:

http://www.areca.us/products/sasnoneraid6g.htm

The controllers claim to support up to 128 disks, via break-out
cables, but I'm not sure about that.

You aren't using any port multipliers, are you?

> > Also when you see hangs can you access the disk directly or not
> > e.g. dd if=/dev/da0 of=/dev/null bs=1m count=10 ?
> 
> Interesting idea. The dd then hangs right until everything else resumes as well.
> 
> ^T during hang says: load: 12.39  cmd: dd 7847 [physrd] 6.36r 0.00u 0.00s 0% 1632k

Is this *while* you have immense amounts of ZFS write I/O going to
those drives (your zpool iostat was showing ~250-300MB/sec to the pool)?

It's very important to note that the stats you showed were during
writes.
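
For reference, here is how I read that ^T line.  The bracketed field is
the wait channel; as far as I know, "physrd" is what a raw-device read
sleeps on while waiting for the I/O to complete, and the 0.00u/0.00s
figures mean dd has burned no CPU during its 6.36 seconds of real time.
A minimal sketch for pulling that field out of a status line (the
function name siginfo_wchan is my own, purely for illustration):

```shell
#!/bin/sh
# Sketch: extract the wait channel (the bracketed field) from a ^T
# (SIGINFO) status line.  siginfo_wchan is a made-up helper name.
siginfo_wchan() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# The line Dennis pasted, verbatim.
line='load: 12.39  cmd: dd 7847 [physrd] 6.36r 0.00u 0.00s 0% 1632k'
siginfo_wchan "$line"    # prints: physrd
```

Since dd on /dev/da0 does not go through ZFS at all, a dd stuck in that
state points at something underneath the filesystem, which is why the
question of what else was hitting the disks at the time matters.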

What we're trying to figure out here is where the blocking (waiting) is
happening:

a) the ZFS layer
b) the storage driver layer ('arcsas', the 3rd-party unofficial driver)
c) the CAM layer
d) the GEOM layer
e) something with the disk(s)
f) something with memory I/O going on (say between the storage driver
   and ZFS, for lack of better way to phrase it)
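
A rough sketch of one place to start looking for each item in that list.
This only prints the commands rather than running them.  The pool name
"tank" and the disk "da0" are placeholders, the layer-to-command mapping
is just my own starting point, and smartctl comes from ports
(sysutils/smartmontools), not the base system:

```shell
#!/bin/sh
# Sketch: one inspection command per suspect layer.  POOL and DISK are
# placeholders; override them in the environment for the real system.
POOL=${POOL:-tank}
DISK=${DISK:-da0}

# layer_cmd is a made-up helper: prints a command worth running for the
# named layer, and returns non-zero for an unknown layer.
layer_cmd() {
    case "$1" in
        zfs)    echo "zpool iostat -v $POOL 1" ;;   # per-vdev throughput
        driver) echo "vmstat -i" ;;                 # interrupt counts/rates
        cam)    echo "camcontrol tags $DISK -v" ;;  # queued/outstanding tags
        geom)   echo "gstat -a" ;;                  # busy GEOM providers
        disk)   echo "smartctl -a /dev/$DISK" ;;    # SMART data (from ports)
        memory) echo "vmstat -z" ;;                 # kernel memory zones
        *)      return 1 ;;
    esac
}

for layer in zfs driver cam geom disk memory; do
    printf '%-7s %s\n' "$layer" "$(layer_cmd "$layer")"
done
```

Run the relevant ones in separate terminals before triggering a hang, so
the output covers the moment things stall.  While a process is stuck,
"procstat -kk <pid>" will also show its kernel stack.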

I have a very big Email written for you, but I wanted to let certain
answers to Ronald's questions come out first.

-rw-------    1 jdc       users     5576 Jun 19 06:49 dennis_kgel_response.txt

I need to re-word this and take into consideration some of the new stuff
said up to now, but I don't know if I'll have the time for this (you
should see my desktop right now, I have literally 4 IM messages to
answer and my Email box is non-stop).

The one I want to get out of the way right now is this:

Can you please try putting this in /boot/loader.conf + reboot and
see if the behaviour for you changes?

vfs.zfs.no_write_throttle="1"

Warning: this may actually make the problem worse, depending on
what the nature/root cause is.  Right now I'm of the opinion ZFS is
actually doing the Right Thing(tm) and that the issue may be in Areca's
driver, but that's speculation until I have proof.  But the write throttling
stuff added semi-recently (by the Illumos folks, this is not a FreeBSD
feature) has had some reports of problems where disabling it helped
immensely.
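
After the reboot it's worth confirming the setting actually took.  A
minimal sketch, assuming the tunable is also visible through sysctl on
your kernel (I believe it is on 9.1, but treat that as an assumption);
the helper name loader_value is made up for illustration:

```shell
#!/bin/sh
# Sketch: check the loader.conf entry, then read the live value.
TUNABLE=vfs.zfs.no_write_throttle

# Print the value set for tunable $1 in file $2 (default
# /boot/loader.conf).  Prints nothing if the tunable is missing or
# commented out.
loader_value() {
    awk -F= -v key="$1" '$1 == key { gsub(/"/, "", $2); print $2 }' \
        "${2:-/boot/loader.conf}" 2>/dev/null
}

echo "loader.conf: $(loader_value "$TUNABLE")"

# The live value only makes sense on the FreeBSD box itself.
if command -v sysctl >/dev/null 2>&1; then
    sysctl "$TUNABLE" 2>/dev/null || echo "$TUNABLE not present on this kernel"
fi
```

If loader.conf shows 1 but the sysctl still reports 0, the line was most
likely mistyped or the machine was not rebooted after the change.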

Important: 24 disks off a single controller is a lot of bandwidth.
That controller may be overwhelmed, in which case you would see
exactly this kind of behaviour as the controller is screaming "GOD HELP
ME, I'M TRYING TO DO ALL THIS STUFF AND YOU KEEP THROWING I/O AT ME".
:-)  This is also why I ask about port multiplier usage.

-- 
| Jeremy Chadwick                                   jdc@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |