From: mw@kpnqwest.ch
Subject: Re: AMI MegaRAID lockup? not accepting commands.
To: freebsd-current@freebsd.org
Date: Thu, 23 Mar 2000 20:40:27 +0100 (CET)

I've played around with changing the spin loop to use DELAY (like the Linux model), but this didn't prevent the controller from either "just" locking up or crashing the whole machine with it. Changing various other places in a similar manner (like replacing the bcopy() in amr_quartz_get_work() with code similar to the Linux driver's, which waits for 0xFF to clear) didn't do the trick either.

However, when I forced the driver not to use the full number of concurrent commands returned by the firmware, I seem to have finally found the one change that made the difference. The Linux code sets a hard limit of 127 for MAX_COMMANDS (its counterpart of AMR_MAXCMD), whereas my controller, a 466, returned 254; the Linux code also says the value can be tweaked between 0 and 253, not 254. So, forcing sc->amr_maxio down to AMR_MAXCMD when AMR_MAXCMD is smaller, in amr_query_controller(), might cause some performance loss, but it made the code *significantly* more stable than before.
I have done two make worlds on the RAID now, without a single hiccup. Before, I wasn't even able to copy the system over to the RAID without sending the machine into a reboot.

Possible explanation: people who introduced debugging statements slowed down the feeding of new commands to the controller, so the controller never used up the full set of concurrent commands. The lockup happens when too many concurrent commands are open (I haven't tried setting things to 253; I am glad things finally work :-)).

Hope this helps,
Markus

-- 
KPNQwest Switzerland Ltd
P.O. Box 9470, Zweierstrasse 35, CH-8036 Zuerich
Tel: +41-1-298-6030, Fax: +41-1-291-4642
Markus Wild, Manager Engineering, e-mail: markus.wild@kpnqwest.ch

Attachment: mydiff.short

Index: amr.c
===================================================================
RCS file: /home/ncvs/src/sys/dev/amr/amr.c,v
retrieving revision 1.8
diff -c -r1.8 amr.c
*** amr.c	2000/03/20 10:44:03	1.8
--- amr.c	2000/03/23 19:20:03
***************
*** 699,704 ****
--- 702,712 ----
      }
      sc->amr_maxdrives = 8;
      sc->amr_maxio = ae->ae_adapter.aa_maxio;
+     if (sc->amr_maxio > AMR_MAXCMD) {
+         device_printf(sc->amr_dev, "reducing maxio from %d to %d\n",
+                       sc->amr_maxio, AMR_MAXCMD);
+         sc->amr_maxio = AMR_MAXCMD;
+     }
      for (i = 0; i < ae->ae_ldrv.al_numdrives; i++) {
          sc->amr_drive[i].al_size = ae->ae_ldrv.al_size[i];
          sc->amr_drive[i].al_state = ae->ae_ldrv.al_state[i];
***************
*** 853,859 ****
      ac->ac_private = bp;
      ac->ac_data = bp->b_data;
      ac->ac_length = bp->b_bcount;
!     if (bp->b_iocmd == BIO_READ) {
          ac->ac_flags |= AMR_CMD_DATAIN;
          cmd = AMR_CMD_LREAD;
      } else {
--- 861,868 ----
      ac->ac_private = bp;
      ac->ac_data = bp->b_data;
      ac->ac_length = bp->b_bcount;
!     /* if (bp->b_iocmd == BIO_READ) { */
!     if (bp->b_flags & B_READ) {
          ac->ac_flags |= AMR_CMD_DATAIN;
          cmd = AMR_CMD_LREAD;
      } else {
Index: amrvar.h
===================================================================
RCS file: /home/ncvs/src/sys/dev/amr/amrvar.h,v
retrieving revision 1.2
diff -c -r1.2 amrvar.h
*** amrvar.h	1999/10/26 23:18:57	1.2
--- amrvar.h	2000/03/23 19:20:04
***************
*** 37,43 ****
  #define AMR_CFG_SIG 0xa0
  #define AMR_SIGNATURE 0x3344
! #define AMR_MAXCMD 255 /* ident = 0 not allowed */
  #define AMR_MAXLD 40
  #define AMR_BLKSIZE 512
--- 37,44 ----
  #define AMR_CFG_SIG 0xa0
  #define AMR_SIGNATURE 0x3344
! /*#define AMR_MAXCMD 255*/ /* ident = 0 not allowed */
! #define AMR_MAXCMD 127 /* ident = 0 not allowed */
  #define AMR_MAXLD 40
  #define AMR_BLKSIZE 512
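
For anyone who wants to check the clamping logic from the first amr.c hunk without a controller at hand, here is a minimal standalone sketch. It is not driver code: clamp_maxio() is a hypothetical helper made up for illustration, and AMR_MAXCMD is simply redefined here to the patched value of 127. It only shows that a firmware-reported value such as 254 is cut down to the limit while smaller values pass through unchanged.

```c
/* Stand-in for the patched driver constant; assumption: 127 as in the diff. */
#define AMR_MAXCMD 127

/*
 * Hypothetical helper, not part of the amr driver.
 * Returns the number of concurrent commands to allow: the value
 * reported by the firmware, capped at limit.
 */
static int
clamp_maxio(int reported, int limit)
{
    if (reported > limit)
        return (limit);
    return (reported);
}
```

With the 254 reported by the 466 mentioned above, clamp_maxio(254, AMR_MAXCMD) yields 127; a controller reporting 100 would keep its 100.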