From owner-freebsd-bugs@FreeBSD.ORG  Tue Jun  9 09:11:59 2009
Return-Path: <owner-freebsd-bugs@FreeBSD.ORG>
Delivered-To: freebsd-bugs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4F3B1106566C
	for <freebsd-bugs@freebsd.org>; Tue,  9 Jun 2009 09:11:59 +0000 (UTC)
	(envelope-from freebsdusb@bindone.de)
Received: from mail.bindone.de (mail.bindone.de [80.190.134.51])
	by mx1.freebsd.org (Postfix) with SMTP id AF9C88FC18
	for <freebsd-bugs@freebsd.org>; Tue,  9 Jun 2009 09:11:55 +0000 (UTC)
	(envelope-from freebsdusb@bindone.de)
Received: (qmail 68972 invoked by uid 89); 9 Jun 2009 09:11:55 -0000
Received: from unknown (HELO ufo.bindone.de) (mg@bindone.de@87.152.176.85)
	by mail.bindone.de with ESMTPA; 9 Jun 2009 09:11:55 -0000
Message-ID: <4A2E2756.5020408@bindone.de>
Date: Tue, 09 Jun 2009 11:11:50 +0200
From: Michael <freebsdusb@bindone.de>
User-Agent: Thunderbird 2.0.0.17pre (X11/20090202)
MIME-Version: 1.0
To: freebsd-bugs@freebsd.org
References: <4A2E258B.9090207@bindone.de>
In-Reply-To: <4A2E258B.9090207@bindone.de>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: Re: Adaptec 5405 (aac0) hanging on high load
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Bug reports <freebsd-bugs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-bugs>,
	<mailto:freebsd-bugs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-bugs>
List-Post: <mailto:freebsd-bugs@freebsd.org>
List-Help: <mailto:freebsd-bugs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-bugs>,
	<mailto:freebsd-bugs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 09 Jun 2009 09:12:00 -0000

PR has been accepted as kern/135408:
http://www.freebsd.org/cgi/query-pr.cgi?pr=135408

Michael wrote:
> (I also filed this as a PR through the website, but I am still waiting for
> confirmation and assignment of a PR number.)
> 
> Hi folks,
> 
> I've got issues with an Adaptec 5405 (aac0) that sometimes hangs under
> high load. It reports "COMMAND xxx TIMEOUT AFTER yyy SECONDS" multiple
> times and then "Controller is no longer running". This happens on
> 7.1-RELEASE, 7.2-RC2, 7.2-RELEASE and 8-CURRENT.
> 
> The hang can be provoked by heavy load, such as a highly parallel make
> buildworld or various benchmarks (e.g. /usr/ports/benchmarks/blogbench).
> 
> I've been wondering if this is somehow related to the following article
> in the Adaptec knowledge base:
> 
> http://ask.adaptec.com/scripts/adaptec_tic.cfg/php.exe/enduser/std_adp.php?p_faqid=15357&p_created=1225366599&p_sid=NqNtKZrj&p_accessibility=0&p_redirect=&p_lva=&p_sp=cF9zcmNoPSZwX3NvcnRfYnk9JnBfZ3JpZHNvcnQ9JnBfcm93X2NudD0yNjk3LDI2OTcmcF9wcm9kcz0mcF9jYXRzPSZwX3B2PSZwX2N2PSZwX3NlYXJjaF90eXBlPWFuc3dlcnMuc2VhcmNoX25sJnBfcGFnZT0x&p_li=&p_topview=1
> 
> It states:
> "AACRAID based controllers have an underlying timeout/recovery cycle
> that is 35 seconds long.
> 
> The default in some SCSI subsystems was 60 seconds in the past, but is
> now standardized at 30 seconds which results in an interference pattern
> between the controller and the Linux SCSI subsystem."
> 
> (I copied and pasted the entire article at the end of this post.)
> 
> Since sys/dev/aac/aacvar.h sets AAC_CMD_TIMEOUT to 30 seconds, I've been
> wondering if this is somehow related. (There are also timeouts for
> immediate commands and an interval for the periodic timeout check; I'm
> not sure how they're used in aac.c and was too lazy to check.)
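
To check which value a given source tree actually uses, something like the
following could help. It is only a sketch: define_value is a hypothetical
helper name, and it assumes the macro is defined on a single line of the
form "#define NAME value", which may not hold for every release.

```shell
# Print the value of a simple one-line #define from a C header.
# Assumes the form "#define NAME value"; prints nothing if NAME is absent.
# Usage: define_value HEADER MACRO
define_value() {
    awk -v name="$2" '$1 == "#define" && $2 == name { print $3; exit }' "$1"
}

# Example, assuming the source tree lives under /usr/src:
# define_value /usr/src/sys/dev/aac/aacvar.h AAC_CMD_TIMEOUT
```

If the printed value is below the 35 seconds mentioned in the article, the
host-side timeout would expire before the controller's recovery cycle ends.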
> 
> The bottom line is that Adaptec states that their AACRAID-based
> controllers may sometimes need >35 seconds to process a command under
> normal operational circumstances, if the controller is going through an
> "error correction cycle on the SAS/SATA bus".
> 
> cheers
> Michael
> 
> 
> -- Complete Adaptec knowledge base entry --
> AACRAID based controllers have an underlying timeout/recovery cycle that
> is 35 seconds long.
> 
> The default in some SCSI subsystems was 60 seconds in the past, but is
> now standardized at 30 seconds which results in an interference pattern
> between the controller and the Linux SCSI subsystem.
> 
> The alternate workaround is for the user to adjust the timeout in SYSFS
> if it is shorter than 35 seconds.
> 
> Changing the timeout values for a Linux block device can be done via
> SYSFS. For example, if /dev/sdc , /dev/sdd and /dev/sde are the device
> LUNs on a given Linux host, then the following commands need to be issued:
> echo 45 > /sys/block/sdc/device/timeout
> echo 45 > /sys/block/sdd/device/timeout
> echo 45 > /sys/block/sde/device/timeout
> In this example the timeout is 45 seconds which should be enough.
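
The Linux workaround quoted above could be wrapped in a small loop that only
touches devices whose timeout is shorter than the wanted minimum. This is a
sketch, not something from the Adaptec article: raise_timeouts and the
SYSFS_ROOT override are hypothetical names, the latter added only so the
function can be tried against a scratch directory instead of the real /sys.

```shell
# Raise the timeout of each named block device to at least MIN seconds.
# Devices that already have a timeout of MIN or more are left untouched.
# SYSFS_ROOT defaults to /sys (the override is an assumption for testing).
# Usage: raise_timeouts MIN dev [dev ...]
raise_timeouts() {
    min=$1
    shift
    for dev in "$@"; do
        f="${SYSFS_ROOT:-/sys}/block/$dev/device/timeout"
        # Skip devices that do not exist or cannot be written to.
        [ -w "$f" ] || { echo "skip $dev: $f not writable" >&2; continue; }
        cur=$(cat "$f")
        if [ "$cur" -lt "$min" ]; then
            echo "$min" > "$f"
            echo "$dev: $cur -> $min"
        fi
    done
}

# Example (as root), matching the article's devices and value:
# raise_timeouts 45 sdc sdd sde
```

The setting is written to sysfs at runtime, so it would presumably have to be
reapplied after a reboot.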
> 
> Note: If any AACRAID based controller is going through an error
> correction cycle on the SAS/SATA bus that is delaying the completion of
> I/O beyond the Linux default timeout set for the device, this may be a
> hardware issue or a problem with the default timeout value as outlined
> above. If changing the timeout value doesn't solve the problem, then
> please follow the steps we recommend to troubleshoot "Host adapter reset
> request. SCSI hang ?" messages:
> - Check for any updated firmware for the motherboard, controller, targets
>   and enclosure on the respective manufacturer's web sites.
> - Check per-device queue depth in SYSFS to make sure it is reasonable.
> - Engage disk drive manufacturer's technical support department to check
>   through compatibility or drive class issues.
> - Engage enclosure manufacturer's technical support department to check
>   through compatibility issues.