From: "cp" <cp@olympus.net>
To: freebsd-questions@FreeBSD.ORG
Date: Sun, 18 May 2003 20:35:36 -0700
Subject: AIC7902 SCSI Timeouts and Dumps

This is partly a repost. I want to try one more time before filling out the bug report. This new system gets SCSI timeouts which cause drives to be dropped while running.
It runs rock solid when booted from the IDE drive. It works perfectly under W2K. I've tried 4.8 (April 3) and 5.0 (January). On 4.8 the problem is quite random and becomes less frequent when the controller drive speed is set down to 160/80 MHz. It is still not stable enough to go into production. On 5.0 it's quite hopeless. The panics vary with what disk data is needed at the time, but all point back to some timing issue on the controller.

I've run the drives on separate channels (A and B) and on the same channel. The only consistent clue is that ahd1 gives a 'card paused' and a Card Dump on 4.8 during boot (see below). I have tried every BIOS setting, hardware disable and SCSI utility option possible, with no luck. I can make it better but not good enough.

The system worked for the vendor, but they don't do Unix. Adaptec doesn't officially support FreeBSD, due to it being embedded. They provide drivers for Red Hat, SUSE, W2K, DOS, etc.

Unless someone just happens to know something about dumps from the ahd driver, I realize that asking anyone to evaluate the information I've collected is over the line, so I'm not including that information or asking that question. My question ends up being: how do I determine what is being worked on for state-of-the-art hardware, so that I can stop taking potshots at reinstallations, upgrades and reconfigurations? (The bug report list does not have this problem listed.)

I've put absurd hours into this system and never had such a trying experience with FreeBSD... but I've never needed any support before. The only thing I can say for sure is that the machine fails on FreeBSD and works on W2K Server. Sadly, all the people selling hardware care only about the latter.

Hardware: Supermicro 7043A-8R (X5DA8 Mbd, 2 Xeon 2.6 GHz, 2 GB, AIC7902, Super GEM 318, E7505), 2 Seagate Cheetah ST336753LC and 1 WD1201AD IDE.

Pertinent part of dmesg (this is just the Dump Card State and Card Paused that occurs at every boot):

ahd1: PCI error Interrupt
>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
ahd1: Dumping Card State at program address 0x90 Mode 0x33
Card was paused
HS_MAILBOX[0x0] INTCTL[0x0] SEQINTSTAT[0x0] SAVED_MODE[0x0]
DFFSTAT[0x30]:(CURRFIFO_0|FIFO0FREE|FIFO1FREE) SCSISIGI[0x0]:(P_DATAOUT)
SCSIPHASE[0x0] SCSIBUS[0x0] LASTPHASE[0x1]:(P_DATAOUT|P_BUSFREE)
SCSISEQ0[0x0] SCSISEQ1[0x12]:(ENAUTOATNP|ENRSELI)
SEQCTL0[0x10]:(FASTMODE) SEQINTCTL[0x80]:(INTVEC1DSL)
SEQ_FLAGS[0x0] SEQ_FLAGS2[0x0] SSTAT0[0x0] SSTAT1[0x8]:(BUSFREE)
SSTAT2[0x0] SSTAT3[0x0] PERRDIAG[0x0]
SIMODE1[0xa4]:(ENSCSIPERR|ENSCSIRST|ENSELTIMO)
LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0] LQOSTAT0[0x0]
LQOSTAT1[0x0] LQOSTAT2[0x0]

SCB Count = 16 CMDS_PENDING = 0 LASTSCB 0xffff CURRSCB 0x0 NEXTSCB 0x0
qinstart = 0 qinfifonext = 0
QINFIFO:
WAITING_TID_QUEUES:
Pending list:
Total 0
Kernel Free SCB list: 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Sequencer Complete DMA-inprog list:
Sequencer Complete list:
Sequencer DMA-Up and Complete list:

ahd1: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x0, LJSCB 0xff00
SEQIMODE[0x3f]:(ENCFG4TCMD|ENCFG4ICMD|ENCFG4TSTAT|ENCFG4ISTAT|ENCFG4DATA|ENSAVEPTRS)
SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]:(FIFOEMP|HDONE|PRELOAD_AVAIL)
SG_CACHE_SHADOW[0x2]:(LAST_SEG) SG_STATE[0x0] DFFSXFRCTL[0x0]
SOFFCNT[0x0] MDFFSTAT[0x5]:(FIFOFREE|DLZERO) SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]:(SG_CACHE_AVAIL)

ahd1: FIFO1 Free, LONGJMP == 0x8072, SCB 0x0, LJSCB 0xff00
SEQIMODE[0x3f]:(ENCFG4TCMD|ENCFG4ICMD|ENCFG4TSTAT|ENCFG4ISTAT|ENCFG4DATA|ENSAVEPTRS)
SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]:(FIFOEMP|HDONE|PRELOAD_AVAIL)
SG_CACHE_SHADOW[0x2]:(LAST_SEG) SG_STATE[0x0] DFFSXFRCTL[0x0]
SOFFCNT[0x0] MDFFSTAT[0x5]:(FIFOFREE|DLZERO) SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]:(SG_CACHE_AVAIL)

LQIN: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
ahd1: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x42
ahd1: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0
SIMODE0[0x6c]:(ENOVERRUN|ENIOERR|ENSELDI|ENSELDO)
CCSCBCTL[0x0]
ahd1: REG0 == 0xe735, SINDEX = 0x33, DINDEX = 0x0
ahd1: SCBPTR == 0x1ff, SCB_NEXT == 0xff00, SCB_NEXT2 == 0x0
CDB ff 1 0 0 0 0
STACK: 0x1 0x8 0x7 0x6 0x5 0x4 0x3 0x2e
>>>>>>>>>>>>>>>>>
ahd1: Signaled Target Abort
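
For anyone wanting to compare dumps like the one above across boots or BIOS settings, here is a minimal sketch in Python. It is not part of the ahd driver or any FreeBSD tool; the function names parse_registers and diff_registers are hypothetical, and the only assumption about the dump is the NAME[0xVALUE]:(FLAG|FLAG) layout visible in the text. Where a register appears more than once (the FIFO0 and FIFO1 sections repeat several), only the first occurrence is kept.

```python
import re

# Matches entries such as SSTAT1[0x8]:(BUSFREE) or SCSISEQ0[0x0].
# The flag list in parentheses is optional.
REGISTER_RE = re.compile(
    r"\b([A-Z][A-Z0-9_]*)\[(0x[0-9a-fA-F]+)\](?::\(([^)]*)\))?"
)


def parse_registers(dump_text):
    """Return {register_name: (value, [flags])} from a card state dump.

    Hypothetical helper: it only assumes the NAME[0x..]:(A|B) layout
    seen in the dump text and knows nothing about register semantics.
    """
    registers = {}
    for name, value, flags in REGISTER_RE.findall(dump_text):
        # Drop whitespace inside flag names so a flag broken by mail
        # line wrapping (e.g. "ENS AVEPTRS") is joined back together.
        flag_list = ["".join(f.split()) for f in flags.split("|")] if flags else []
        # Keep the first occurrence of a repeated register name.
        registers.setdefault(name, (int(value, 16), flag_list))
    return registers


def diff_registers(before, after):
    """Return {name: (before_entry, after_entry)} for registers that differ."""
    names = sorted(set(before) | set(after))
    return {
        name: (before.get(name), after.get(name))
        for name in names
        if before.get(name) != after.get(name)
    }
```

Feeding two saved dmesg excerpts through parse_registers and then diff_registers leaves only the registers whose value or flags changed between the two boots, which is easier to read than comparing the raw dumps by eye.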