From: Jeremy Chadwick
To: Dennis Kögel
Cc: freebsd-stable@freebsd.org, Steven Hartland, Ronald Klop
Date: Wed, 19 Jun 2013 08:16:52 -0700
Subject: Re: Weird I/O hangs (9.1R, arcsas, interrupt spikes on uhci0)
Message-ID: <20130619151652.GB72566@icarus.home.lan>
In-Reply-To: <27EED7A0-AB0B-43B5-8B7F-B424852DBD65@neveragain.de>
References: <15E6A1D4AB1D43D49C1DA02EAF463126@multiplay.co.uk>
 <27EED7A0-AB0B-43B5-8B7F-B424852DBD65@neveragain.de>
List-Id: Production branch of FreeBSD source code

On Wed, Jun 19, 2013 at 05:02:20PM +0200, Dennis Kögel wrote:
> On 19.06.2013 at 16:47, Steven Hartland wrote:
> > I'm not familiar with that model of the areca but have you tried
> > with the standard OS driver or does it not support that card?
>
> The ARC1320 (non-raid) unfortunately isn't supported by the in-tree
> driver.

Which model of the ARC1320 are you using? (There are two.) I'm having
trouble understanding their chart too:

http://www.areca.us/products/sasnoneraid6g.htm

The controllers claim to support up to 128 disks via break-out cables,
but I'm not sure. You aren't using any port multipliers, are you?

> > Also when you see hangs can you access the disk directly or not
> > e.g. dd if=/dev/da0 of=/dev/null bs=1m count=10 ?
>
> Interesting idea. The dd then hangs right until everything else
> resumes as well.
>
> ^T during hang says:
> load: 12.39 cmd: dd 7847 [physrd] 6.36r 0.00u 0.00s 0% 1632k

Is this *while* you have immense amounts of ZFS write I/O going to
those drives (your zpool iostat was showing ~250-300MB/sec to the
pool)? It's very important to note that the stats you showed were
during writes.

What we're trying to figure out here is where the blocking (waiting)
is happening:

a) the ZFS layer
b) the storage driver layer ('arcsas', the 3rd-party unofficial driver)
c) the CAM layer
d) the GEOM layer
e) something with the disk(s)
f) something with memory I/O going on (say between the storage driver
   and ZFS, for lack of a better way to phrase it)

I have a very big Email written for you, but I wanted to let certain
answers to Ronald's questions come out first.
-rw------- 1 jdc users 5576 Jun 19 06:49 dennis_kgel_response.txt

I need to re-word this and take into consideration some of the new
stuff said up to now, but I don't know if I'll have the time for this
(you should see my desktop right now; I have literally 4 IM messages to
answer and my Email box is non-stop).

The one I want to get out of the way right now is this:

Can you please try putting this in /boot/loader.conf + reboot and see
if the behaviour for you changes?

vfs.zfs.no_write_throttle="1"

Warning: this may actually exacerbate the problem, depending on what
the nature/root cause is.

Right now I'm of the opinion ZFS is actually doing the Right Thing(tm)
and that the issue may be in Areca's driver, but that's speculation
until I have proof. But the write throttling stuff added semi-recently
(by the Illumos folks, this is not a FreeBSD feature) has had some
reports of problems where disabling it helped immensely.

Important: 24 disks off a single controller is a lot of bandwidth.
That controller may be overwhelmed, in which case you would see exactly
this kind of behaviour as the controller is screaming "GOD HELP ME, I'M
TRYING TO DO ALL THIS STUFF AND YOU KEEP THROWING I/O AT ME". :-)

This is also why I ask about port multiplier usage.

-- 
| Jeremy Chadwick                                   jdc@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
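For anyone repeating the direct-read test from this thread, here is a
minimal sketch of how it could be run several times in a row with rough
timings, so that stalls show up as outliers. It assumes a POSIX sh; the
function name probe_reads and the default of 5 rounds are hypothetical
and not from the thread. The block size is given in bytes (1048576
rather than 1m) only so the same line works with different dd
implementations.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): time repeated raw reads
# from a single device or file so that I/O stalls stand out.
# Usage: probe_reads /dev/da0 [rounds]
probe_reads() {
    dev=$1
    rounds=${2:-5}
    i=1
    while [ "$i" -le "$rounds" ]; do
        start=$(date +%s)
        # Read 10 MiB straight from the device, without going through ZFS.
        dd if="$dev" of=/dev/null bs=1048576 count=10 2>/dev/null || return 1
        end=$(date +%s)
        # Whole-second resolution is coarse, but multi-second hangs are visible.
        echo "round $i: $((end - start))s"
        i=$((i + 1))
    done
}
```

Running "probe_reads /dev/da0 10" once while the pool is idle and once
during heavy writes gives a rough comparison; if the raw reads only
stall under write load, that points below the ZFS layer rather than at
ZFS itself, though it does not by itself separate driver, CAM, and disk.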