From: Michael
Date: Tue, 09 Jun 2009 11:11:50 +0200
To: freebsd-bugs@freebsd.org
Subject: Re: Adaptec 5405 (aac0) hanging on high load

PR has been accepted as kern/135408:
http://www.freebsd.org/cgi/query-pr.cgi?pr=135408

Michael wrote:
> (I filed this one as a PR through the website as well but waiting for
> confirmation and assignment of a PR number)
>
> Hi folks,
>
> I've got issues with an Adaptec sometimes hanging under high load
> stating COMMAND xxx TIMEOUT AFTER yyy SECONDS multiple times and then
> "Controller is no longer running". (on 7.1-RELEASE, 7.2-RC2,
> 7.2-RELEASE, 8-CURRENT).
>
> This can be provoked by high load like highly parallel make buildworld
> or various benchmarks (e.g. /usr/ports/benchmarks/blogbench).
>
> I've been wondering if this is somehow related to the following article
> in the adaptec knowledge base:
>
> http://ask.adaptec.com/scripts/adaptec_tic.cfg/php.exe/enduser/std_adp.php?p_faqid=15357&p_created=1225366599&p_sid=NqNtKZrj&p_accessibility=0&p_redirect=&p_lva=&p_sp=cF9zcmNoPSZwX3NvcnRfYnk9JnBfZ3JpZHNvcnQ9JnBfcm93X2NudD0yNjk3LDI2OTcmcF9wcm9kcz0mcF9jYXRzPSZwX3B2PSZwX2N2PSZwX3NlYXJjaF90eXBlPWFuc3dlcnMuc2VhcmNoX25sJnBfcGFnZT0x&p_li=&p_topview=1
>
> It states:
> "AACRAID based controllers have an underlying timeout/recovery cycle
> that is 35 seconds long.
>
> The default in some SCSI subsystems was 60 seconds in the past, but is
> now standardized at 30 seconds which results in an interference pattern
> between the controller and the Linux SCSI subsystem."
>
> (I copy and pasted the entire article at the end of this post).
>
> Since sys/dev/aac/aacvar.h sets AAC_CMD_TIMEOUT to 30 seconds I've been
> wondering if this is somehow related (there are also timeouts for
> immediate commands and the periodic timeout-check interval - not sure
> how they're used in aac.c and too lazy to check).
>
> The bottom line is that adaptec states that their AACRAID based
> controllers may sometimes need >35 seconds to process a command under
> normal operational circumstances, if the controller is going through an
> "error correction cycle on the SAS/SATA bus".
>
> cheers
> Michael
>
>
> -- Complete Adaptec knowledge base entry --
> AACRAID based controllers have an underlying timeout/recovery cycle that
> is 35 seconds long.
>
> The default in some SCSI subsystems was 60 seconds in the past, but is
> now standardized at 30 seconds which results in an interference pattern
> between the controller and the Linux SCSI subsystem.
>
> The alternate workaround is for the user to adjust the timeout in SYSFS
> if it is shorter than 35 seconds.
>
> Changing the timeout values for a Linux block device can be done via
> SYSFS.
> For example, if /dev/sdc, /dev/sdd and /dev/sde are the device
> LUNs on a given Linux host, then the following commands need to be issued:
> echo 45 > /sys/block/sdc/device/timeout
> echo 45 > /sys/block/sdd/device/timeout
> echo 45 > /sys/block/sde/device/timeout
> In this example the timeout is 45 seconds which should be enough.
>
> Note: Any AACRAID based controller is going through an error correction
> cycle on the SAS/SATA bus that is delaying the completion of I/O beyond
> the Linux default timeout set for the device, this may be a hardware
> issue or a problem with the default timeout value as outlined above. If
> changing the timeout value doesn't solve the problem then please follow
> the steps we recommend to trouble shoot "Host adapter reset request.
> SCSI hang ?" messages:
> Check for any updated firmware for the motherboard, controller, targets
> and enclosure on the respective manufacturer's web sites.
> Check per-device queue depth in SYSFS to make sure it is reasonable.
> Engage disk drive manufacturer's technical support department to check
> through compatibility or drive class issues.
> Engage enclosure manufacturer's technical support department to check
> through compatibility issues.
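
The sysfs workaround in the quoted Adaptec article could be scripted
roughly as follows. This is only a sketch for a Linux host, not something
taken from the article or from the aac driver: the function name
raise_timeouts, the SYSFS_ROOT override and the choice to leave values of
35 seconds or more untouched are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: raise the command timeout of the named block
# devices, but only where the current value is below the controller's
# 35-second recovery cycle. SYSFS_ROOT can be pointed at a scratch
# directory for a dry run; on a real Linux host it is /sys and the
# script has to run as root.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"
MIN_TIMEOUT=35   # recovery cycle length quoted in the article
NEW_TIMEOUT=45   # value used in the article's example

raise_timeouts() {
    for dev in "$@"; do
        f="$SYSFS_ROOT/block/$dev/device/timeout"
        # Skip devices that do not exist or that we may not write to.
        if [ ! -w "$f" ]; then
            echo "skip $dev: $f not writable" >&2
            continue
        fi
        cur=$(cat "$f")
        if [ "$cur" -lt "$MIN_TIMEOUT" ]; then
            echo "$NEW_TIMEOUT" > "$f"
            echo "$dev: $cur -> $NEW_TIMEOUT"
        else
            echo "$dev: $cur (unchanged)"
        fi
    done
}

raise_timeouts "$@"
```

Saved under a name of your choosing, it would be invoked with the device
names as arguments, e.g. "sdc sdd sde". It has no effect on FreeBSD, where
the corresponding value is the AAC_CMD_TIMEOUT constant in
sys/dev/aac/aacvar.h mentioned above.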