Date: Tue, 09 Jun 2009 11:11:50 +0200
From: Michael <freebsdusb@bindone.de>
To: freebsd-bugs@freebsd.org
Subject: Re: Adaptec 5405 (aac0) hanging on high load
Message-ID: <4A2E2756.5020408@bindone.de>
In-Reply-To: <4A2E258B.9090207@bindone.de>
References: <4A2E258B.9090207@bindone.de>
PR has been accepted as kern/135408:
http://www.freebsd.org/cgi/query-pr.cgi?pr=135408

Michael wrote:
> (I filed this one as a PR through the website as well, but I'm waiting
> for confirmation and assignment of a PR number)
>
> Hi folks,
>
> I've got issues with an Adaptec sometimes hanging under high load,
> stating COMMAND xxx TIMEOUT AFTER yyy SECONDS multiple times and then
> "Controller is no longer running" (on 7.1-RELEASE, 7.2-RC2,
> 7.2-RELEASE, 8-CURRENT).
>
> This can be provoked by high load like a highly parallel make buildworld
> or various benchmarks (e.g. /usr/ports/benchmarks/blogbench).
>
> I've been wondering if this is somehow related to the following article
> in the Adaptec knowledge base:
>
> http://ask.adaptec.com/scripts/adaptec_tic.cfg/php.exe/enduser/std_adp.php?p_faqid=15357&p_created=1225366599&p_sid=NqNtKZrj&p_accessibility=0&p_redirect=&p_lva=&p_sp=cF9zcmNoPSZwX3NvcnRfYnk9JnBfZ3JpZHNvcnQ9JnBfcm93X2NudD0yNjk3LDI2OTcmcF9wcm9kcz0mcF9jYXRzPSZwX3B2PSZwX2N2PSZwX3NlYXJjaF90eXBlPWFuc3dlcnMuc2VhcmNoX25sJnBfcGFnZT0x&p_li=&p_topview=1
>
> It states:
> "AACRAID based controllers have an underlying timeout/recovery cycle
> that is 35 seconds long.
>
> The default in some SCSI subsystems was 60 seconds in the past, but is
> now standardized at 30 seconds which results in an interference pattern
> between the controller and the Linux SCSI subsystem."
>
> (I copied and pasted the entire article at the end of this post).
>
> Since sys/dev/aac/aacvar.h sets AAC_CMD_TIMEOUT to 30 seconds, I've been
> wondering if this is somehow related (there are also timeouts for
> immediate commands and an interval for the periodic timeout check - not
> sure how they're used in aac.c and too lazy to check).
>
> The bottom line is that Adaptec states that their AACRAID based
> controllers may sometimes need >35 seconds to process a command under
> normal operational circumstances, if the controller is going through an
> "error correction cycle on the SAS/SATA bus".
>
> cheers
> Michael
>
>
> -- Complete Adaptec knowledge base entry --
> AACRAID based controllers have an underlying timeout/recovery cycle that
> is 35 seconds long.
>
> The default in some SCSI subsystems was 60 seconds in the past, but is
> now standardized at 30 seconds which results in an interference pattern
> between the controller and the Linux SCSI subsystem.
>
> The alternate workaround is for the user to adjust the timeout in SYSFS
> if it is shorter than 35 seconds.
>
> Changing the timeout values for a Linux block device can be done via
> SYSFS. For example, if /dev/sdc, /dev/sdd and /dev/sde are the device
> LUNs on a given Linux host, then the following commands need to be issued:
>   echo 45 > /sys/block/sdc/device/timeout
>   echo 45 > /sys/block/sdd/device/timeout
>   echo 45 > /sys/block/sde/device/timeout
> In this example the timeout is 45 seconds which should be enough.
>
> Note: Any AACRAID based controller is going through an error correction
> cycle on the SAS/SATA bus that is delaying the completion of I/O beyond
> the Linux default timeout set for the device, this may be a hardware
> issue or a problem with the default timeout value as outlined above. If
> changing the timeout value doesn't solve the problem then please follow
> the steps we recommend to troubleshoot "Host adapter reset request.
> SCSI hang ?" messages:
> Check for any updated firmware for the motherboard, controller, targets
> and enclosure on the respective manufacturer's web sites.
> Check per-device queue depth in SYSFS to make sure it is reasonable.
> Engage disk drive manufacturer's technical support department to check
> through compatibility or drive class issues.
> Engage enclosure manufacturer's technical support department to check
> through compatibility issues.
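
For reference, the SYSFS workaround quoted in the knowledge base entry could be scripted roughly as below. This is only a sketch for Linux hosts, as the article describes; it does not apply to the FreeBSD aac driver. The function name raise_short_timeouts and its three arguments (sysfs root, minimum, new value) are made up for this example. The 35 and 45 second defaults are the figures from the article, and the sd* glob assumes the LUNs show up as sd devices.

```shell
# Sketch only: raise per-device SCSI command timeouts that are shorter than
# the controller's recovery cycle. The function name and its arguments are
# hypothetical; 35 and 45 are the values quoted in the Adaptec article.
raise_short_timeouts() {
    root=${1:-/sys/block}   # where the block devices are listed
    min=${2:-35}            # timeouts below this value get raised
    new=${3:-45}            # value written to devices that are too short

    for f in "$root"/sd*/device/timeout; do
        [ -f "$f" ] || continue          # glob matched nothing, or no such attribute
        cur=$(cat "$f")
        if [ "$cur" -lt "$min" ]; then
            echo "$new" > "$f"
            echo "$f: $cur -> $new"      # report what was changed
        fi
    done
}
```

Calling raise_short_timeouts with no arguments (as root) would touch every sd device under /sys/block. Devices whose timeout is already 35 seconds or longer are left alone, and the setting does not survive a reboot.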