Date: Thu, 23 Mar 2000 13:56:03 -0800
From: Alfred Perlstein
To: mw@kpnqwest.ch
Cc: freebsd-current@FreeBSD.ORG
Subject: Re: AMI MegaRAID lockup? not accepting commands.
Message-ID: <20000323135603.D21029@fw.wintelcom.net>
References: <200003231940.TAA24686@mail.kpnqwest.ch>
In-Reply-To: <200003231940.TAA24686@mail.kpnqwest.ch>; from mw@kpnqwest.ch on Thu, Mar 23, 2000 at 08:40:27PM +0100

* mw@kpnqwest.ch [000323 12:47] wrote:
> I've played around changing the spinloop to use DELAY (like the Linux model),
> but this didn't prevent the controller from either "just" locking up or
> crashing the whole machine with it. Changing various other places in a similar
> manner (like replacing the bcopy() in amr_quartz_get_work() with similar
> code as in the linux driver to wait for 0xFF to clear) didn't do the trick
> either.
>
> However, when I forced the driver to not use the full number of
> concurrent commands as returned by the firmware, I seem to finally have
> found the one change that made the difference. Looking at the linux
> code, it sets a hard limit of AMR_MAXCMD (MAX_COMMANDS in the linux code) of
> 127 (my controller, a 466, returned 254), and it says the value can be tweaked
> between 0 and 253 (not 254...). So, forcing sc->amr_maxio to AMR_MAXCMD if
> that one's smaller, in amr_query_controller(), might cause some performance
> loss, but it made the code *significantly* more stable than before. I did two
> make worlds on the raid now, and not one hiccup. Before, I wasn't even able to
> copy over the system to the raid without sending the system to reboot.
>
> Possible explanation: people who introduced debugging statements slowed down
> the feeding of new commands to the controller, so the controller didn't ever
> use up the full set of concurrent commands. The lockup happens when too many
> concurrent commands are open (now, I haven't tried setting things to 253, I
> am glad things finally work:-)).

dude, you rule!

I'm glad this looks like it's finally resolved. Can you let me know
if it survives further stress testing?

I've found the easiest way to wedge the box is to perform a 'cvs up'
(not cvsup) from a local repository over /usr/src or /usr/ports; this
would always lock up my box with amr. If you have the time and disk
space, that would be a much better stressor than just make world.

thanks,
-Alfred

>
> Hope this helps,
> Markus
> --
> KPNQwest Switzerland Ltd
> P.O. Box 9470, Zweierstrasse 35, CH-8036 Zuerich
> Tel: +41-1-298-6030, Fax: +41-1-291-4642
> Markus Wild, Manager Engineering, e-mail: markus.wild@kpnqwest.ch

> Index: amr.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/dev/amr/amr.c,v
> retrieving revision 1.8
> diff -c -r1.8 amr.c
> *** amr.c	2000/03/20 10:44:03	1.8
> --- amr.c	2000/03/23 19:20:03
> ***************
> *** 699,704 ****
> --- 702,712 ----
>       }
>       sc->amr_maxdrives = 8;
>       sc->amr_maxio = ae->ae_adapter.aa_maxio;
> +     if (sc->amr_maxio > AMR_MAXCMD) {
> +         device_printf(sc->amr_dev, "reducing maxio from %d to %d\n",
> +                       sc->amr_maxio, AMR_MAXCMD);
> +         sc->amr_maxio = AMR_MAXCMD;
> +     }
>       for (i = 0; i < ae->ae_ldrv.al_numdrives; i++) {
>           sc->amr_drive[i].al_size = ae->ae_ldrv.al_size[i];
>           sc->amr_drive[i].al_state = ae->ae_ldrv.al_state[i];
> ***************
> *** 853,859 ****
>       ac->ac_private = bp;
>       ac->ac_data = bp->b_data;
>       ac->ac_length = bp->b_bcount;
> !     if (bp->b_iocmd == BIO_READ) {
>           ac->ac_flags |= AMR_CMD_DATAIN;
>           cmd = AMR_CMD_LREAD;
>       } else {
> --- 861,868 ----
>       ac->ac_private = bp;
>       ac->ac_data = bp->b_data;
>       ac->ac_length = bp->b_bcount;
> !     /* if (bp->b_iocmd == BIO_READ) { */
> !     if (bp->b_flags & B_READ) {
>           ac->ac_flags |= AMR_CMD_DATAIN;
>           cmd = AMR_CMD_LREAD;
>       } else {
> Index: amrvar.h
> ===================================================================
> RCS file: /home/ncvs/src/sys/dev/amr/amrvar.h,v
> retrieving revision 1.2
> diff -c -r1.2 amrvar.h
> *** amrvar.h	1999/10/26 23:18:57	1.2
> --- amrvar.h	2000/03/23 19:20:04
> ***************
> *** 37,43 ****
>   #define AMR_CFG_SIG     0xa0
>   #define AMR_SIGNATURE   0x3344
>
> ! #define AMR_MAXCMD      255     /* ident = 0 not allowed */
>   #define AMR_MAXLD       40
>
>   #define AMR_BLKSIZE     512
> --- 37,44 ----
>   #define AMR_CFG_SIG     0xa0
>   #define AMR_SIGNATURE   0x3344
>
> ! /*#define AMR_MAXCMD    255*/   /* ident = 0 not allowed */
> ! #define AMR_MAXCMD      127     /* ident = 0 not allowed */
>   #define AMR_MAXLD       40
>
>   #define AMR_BLKSIZE     512

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
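
For anyone reading without the tree handy, the clamp in the first amr.c
hunk boils down to roughly the userland sketch below. It is only an
illustration, not the driver code: struct softc_sketch, sketch_set_maxio()
and SKETCH_MAXCMD are made-up names, printf() stands in for
device_printf(), and the return value is added purely so the behaviour is
easy to check.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the driver softc; only the field discussed above. */
struct softc_sketch {
    int maxio;  /* maximum number of concurrent commands the driver will issue */
};

/* Hard cap, mirroring the AMR_MAXCMD value of 127 from the amrvar.h hunk. */
#define SKETCH_MAXCMD 127

/*
 * Record the firmware-reported command limit, clamping it to the hard cap.
 * Returns 1 if the value had to be reduced, 0 if it was used unchanged.
 */
static int
sketch_set_maxio(struct softc_sketch *sc, int fw_maxio)
{
    assert(sc != NULL);

    sc->maxio = fw_maxio;
    if (sc->maxio > SKETCH_MAXCMD) {
        printf("reducing maxio from %d to %d\n", sc->maxio, SKETCH_MAXCMD);
        sc->maxio = SKETCH_MAXCMD;
        return 1;
    }
    return 0;
}
```

Calling sketch_set_maxio(&sc, 254), the value the 466 in this thread
reported, prints the "reducing maxio" message once and leaves sc.maxio at
127; a controller reporting 127 or fewer is left alone.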