Date:      Tue, 09 Jun 2009 11:11:50 +0200
From:      Michael <freebsdusb@bindone.de>
To:        freebsd-bugs@freebsd.org
Subject:   Re: Adaptec 5405 (aac0) hanging on high load
Message-ID:  <4A2E2756.5020408@bindone.de>
In-Reply-To: <4A2E258B.9090207@bindone.de>
References:  <4A2E258B.9090207@bindone.de>

PR has been accepted as kern/135408:
http://www.freebsd.org/cgi/query-pr.cgi?pr=135408

Michael wrote:
> (I also filed this as a PR through the website but am still waiting
> for confirmation and assignment of a PR number.)
> 
> Hi folks,
> 
> I'm having issues with an Adaptec 5405 (aac0) that sometimes hangs
> under high load, logging "COMMAND xxx TIMEOUT AFTER yyy SECONDS"
> several times and then "Controller is no longer running" (seen on
> 7.1-RELEASE, 7.2-RC2, 7.2-RELEASE and 8-CURRENT).
> 
> This can be provoked by high load, such as a highly parallel make
> buildworld or various benchmarks (e.g. /usr/ports/benchmarks/blogbench).
> 
> I've been wondering whether this is related to the following article
> in the Adaptec knowledge base:
> 
> http://ask.adaptec.com/scripts/adaptec_tic.cfg/php.exe/enduser/std_adp.php?p_faqid=15357&p_created=1225366599&p_sid=NqNtKZrj&p_accessibility=0&p_redirect=&p_lva=&p_sp=cF9zcmNoPSZwX3NvcnRfYnk9JnBfZ3JpZHNvcnQ9JnBfcm93X2NudD0yNjk3LDI2OTcmcF9wcm9kcz0mcF9jYXRzPSZwX3B2PSZwX2N2PSZwX3NlYXJjaF90eXBlPWFuc3dlcnMuc2VhcmNoX25sJnBfcGFnZT0x&p_li=&p_topview=1
> 
> It states:
> "AACRAID based controllers have an underlying timeout/recovery cycle
> that is 35 seconds long.
> 
> The default in some SCSI subsystems was 60 seconds in the past, but is
> now standardized at 30 seconds which results in an interference pattern
> between the controller and the Linux SCSI subsystem."
> 
> (I copied and pasted the entire article at the end of this post.)
> 
> Since sys/dev/aac/aacvar.h sets AAC_CMD_TIMEOUT to 30 seconds, I've
> been wondering if this is related. (There are also timeouts for
> immediate commands and an interval for the periodic timeout check; I'm
> not sure how they're used in aac.c and was too lazy to check.)
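For reference, the timeout-related constants can be listed straight from the driver header. This is a minimal sketch that assumes the FreeBSD source tree lives under /usr/src (pass another path otherwise); it only reads the header and says nothing about how aac.c uses the values. Only AAC_CMD_TIMEOUT is named in this post; any other names it prints depend on the header you point it at, and the function name aac_timeouts is made up.

```sh
#!/bin/sh
# Hypothetical helper: print "NAME VALUE" for every #define in the aac(4)
# header whose name ends in TIMEOUT or INTERVAL.
# Assumption: the source tree is under /usr/src unless a path is given.
aac_timeouts() {
    hdr=${1:-/usr/src/sys/dev/aac/aacvar.h}
    if [ ! -r "$hdr" ]; then
        echo "aac_timeouts: cannot read $hdr" >&2
        return 1
    fi
    # Fields: $1 is "#define", $2 the macro name, $3 its value.
    awk '$1 == "#define" && $2 ~ /(TIMEOUT|INTERVAL)$/ { print $2, $3 }' "$hdr"
}
```

Example: `aac_timeouts` with no argument reads the default header path; `aac_timeouts /some/other/tree/sys/dev/aac/aacvar.h` checks a different tree.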
> 
> The bottom line is that Adaptec states that their AACRAID-based
> controllers may sometimes need more than 35 seconds to process a command
> under normal operating conditions if the controller is going through an
> "error correction cycle on the SAS/SATA bus".
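To see why a 35-second recovery cycle can collide with a 30-second command timeout, here is a toy model. It is hypothetical and not the aac(4) code: a watchdog scans the outstanding commands every PERIOD seconds and flags any command older than TIMEOUT. Whether a slow command gets flagged then depends on where it was issued relative to the scan schedule. The function name and the 10-second period used in the example are arbitrary assumptions, not the driver's actual values.

```sh
#!/bin/sh
# Toy model (hypothetical, not the aac(4) implementation):
# a watchdog scans outstanding commands every PERIOD seconds and flags any
# command whose age is greater than TIMEOUT. A command is issued OFFSET
# seconds after a scan (0 <= OFFSET < PERIOD) and completes after DURATION
# seconds. Returns 0 if some scan flags it before it completes, 1 otherwise.
would_time_out() {
    timeout=$1 period=$2 duration=$3 offset=$4
    age=$((period - offset))          # command age at the first scan after issue
    while [ "$age" -lt "$duration" ]; do
        if [ "$age" -gt "$timeout" ]; then
            return 0                  # still outstanding and too old: flagged
        fi
        age=$((age + period))         # age at the next scan
    done
    return 1                          # completed before any scan flagged it
}
```

With a 30-second timeout, a 10-second period and a 35-second recovery, `would_time_out 30 10 35 6` succeeds (flagged) while `would_time_out 30 10 35 1` does not; in this model only offsets 6 to 9 trip the timeout. With a 45-second timeout no offset is flagged, which is the effect the article's workaround aims for.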
> 
> cheers
> Michael
> 
> 
> -- Complete Adaptec knowledge base entry --
> AACRAID based controllers have an underlying timeout/recovery cycle that
> is 35 seconds long.
> 
> The default in some SCSI subsystems was 60 seconds in the past, but is
> now standardized at 30 seconds which results in an interference pattern
> between the controller and the Linux SCSI subsystem.
> 
> The alternate workaround is for the user to adjust the timeout in SYSFS
> if it is shorter than 35 seconds.
> 
> Changing the timeout values for a Linux block device can be done via
> SYSFS. For example, if /dev/sdc , /dev/sdd and /dev/sde are the device
> LUNs on a given Linux host, then the following commands need to be issued:
> echo 45 > /sys/block/sdc/device/timeout
> echo 45 > /sys/block/sdd/device/timeout
> echo 45 > /sys/block/sde/device/timeout
> In this example the timeout is 45 seconds which should be enough.
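The per-device commands above can be wrapped in a small loop. This is a sketch assuming a POSIX sh on Linux: it only raises a timeout that is currently shorter than the requested value (the article says to adjust it "if it is shorter than 35 seconds"). The function name set_scsi_timeout and the SYSFS_BLOCK override, which lets it be tried against a scratch directory, are hypothetical; only the /sys/block/<dev>/device/timeout path comes from the article.

```sh
#!/bin/sh
# Sketch: raise the Linux SCSI command timeout for several devices, but only
# where the current value is lower than requested (per the Adaptec article).
# SYSFS_BLOCK is a hypothetical override; the default is the real /sys/block.
SYSFS_BLOCK=${SYSFS_BLOCK:-/sys/block}

set_scsi_timeout() {
    secs=$1
    shift
    rc=0
    for dev in "$@"; do
        f="$SYSFS_BLOCK/$dev/device/timeout"
        if [ ! -r "$f" ] || [ ! -w "$f" ]; then
            echo "set_scsi_timeout: skipping $dev ($f not accessible)" >&2
            rc=1
            continue
        fi
        read -r cur < "$f"
        if [ "$cur" -lt "$secs" ]; then
            echo "$secs" > "$f"       # same effect as the article's echo commands
        fi
    done
    return "$rc"
}
```

Run as root, `set_scsi_timeout 45 sdc sdd sde` corresponds to the three commands above. A value written to sysfs this way does not persist across reboots on its own.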
> 
> Note: If an AACRAID-based controller is going through an error
> correction cycle on the SAS/SATA bus that delays the completion of I/O
> beyond the Linux default timeout set for the device, this may be a
> hardware issue or a problem with the default timeout value as outlined
> above. If changing the timeout value doesn't solve the problem, please
> follow the steps we recommend to troubleshoot "Host adapter reset
> request. SCSI hang ?" messages:
> - Check for any updated firmware for the motherboard, controller,
>   targets and enclosure on the respective manufacturers' web sites.
> - Check the per-device queue depth in SYSFS to make sure it is
>   reasonable.
> - Engage the disk drive manufacturer's technical support department to
>   check for compatibility or drive class issues.
> - Engage the enclosure manufacturer's technical support department to
>   check for compatibility issues.
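Both the timeout and the queue depth mentioned in these steps can be read back in one pass. A sketch, assuming the Linux SCSI sysfs attributes device/timeout and device/queue_depth and that the volumes show up as sd* devices (an assumption about device naming); the function name and the SYSFS_BLOCK override are hypothetical. It only reports values and does not judge what a "reasonable" queue depth is.

```sh
#!/bin/sh
# Sketch: print the current command timeout and queue depth of every sd*
# device, using the Linux sysfs attributes device/timeout and
# device/queue_depth. SYSFS_BLOCK is a hypothetical override for testing.
SYSFS_BLOCK=${SYSFS_BLOCK:-/sys/block}

show_scsi_settings() {
    for dir in "$SYSFS_BLOCK"/sd*; do
        # An unmatched glob stays literal, so this also skips "no devices".
        [ -r "$dir/device/timeout" ] || continue
        dev=${dir##*/}
        read -r t < "$dir/device/timeout"
        q=-
        if [ -r "$dir/device/queue_depth" ]; then
            read -r q < "$dir/device/queue_depth"
        fi
        printf '%s timeout=%ss queue_depth=%s\n' "$dev" "$t" "$q"
    done
}
```

Each output line has the form `sdc timeout=45s queue_depth=64`, with `-` where no queue_depth attribute is readable.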



