From owner-freebsd-scsi Sun Nov 7 3:59:45 1999
Delivered-To: freebsd-scsi@freebsd.org
Received: from mail.powertech.no (intentia.powertech.no [195.159.0.220])
    by hub.freebsd.org (Postfix) with ESMTP id CCD2214C02
    for ; Sun, 7 Nov 1999 03:59:32 -0800 (PST)
    (envelope-from shamz@login1.powertech.no)
Received: from login1.powertech.no (IDENT:root@login1.powertech.no [195.159.0.151])
    by mail.powertech.no (8.9.3/8.8.5) with ESMTP id OAA30143;
    Sun, 7 Nov 1999 14:04:54 +0100
Received: (from shamz@localhost)
    by login1.powertech.no (8.9.3/8.9.3) id MAA21311;
    Sun, 7 Nov 1999 12:59:30 +0100
Date: Sun, 7 Nov 1999 12:59:30 +0100
From: Shaun Jurrens
To: "Justin T. Gibbs"
Cc: scsi@freebsd.org
Subject: Re: scsi bus errors
Message-ID: <19991107125930.A20165@shamz.net>
References: <19991105120951.C1083@shamz.net> <199911051943.MAA69413@narnia.plutotech.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.95.4us
In-Reply-To: <199911051943.MAA69413@narnia.plutotech.com>; from Justin T. Gibbs on Fri, Nov 05, 1999 at 12:43:02PM -0700
Sender: owner-freebsd-scsi@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Fri, Nov 05, 1999 at 12:43:02PM -0700, Justin T. Gibbs wrote:
#> In article <19991105120951.C1083@shamz.net> you wrote:
#>
#> [
#> I've reformatted your mail so it fits in 80 columns. This makes
#> it much easier to read.
#> ]

sorry, forgot to import my nexrc to this acct

#>
#> > Hi,
#> >
#> > After reading the lists and trying about everything under the sun
#> > to get the errors to abate, I am finally writing. The setup is
#> > about the same as all the others with SCB timeout errors.
#>
#> ...
#>
#> > I left out the logs because they don't seem to have been more than
#> > grounds for speculation about termination and such up until now.
#>
#> As far as I know, we've resolved all other "timeout" type errors
#> with the ahc driver.
#> This was only possible because the people
#> having the problems gave detailed information about their setup
#> and the errors that occurred. In other words, provide the output
#> from 'dmesg' for the system having problems as well as the messages
#> output by the driver when the errors occur and we'll see what we
#> can do. Leave the determination of what is valuable information
#> to the experts.
#>
#> --
#> Justin

Well then let's begin with dmesg:

Copyright (c) 1992-1999 FreeBSD Inc.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California. All rights reserved.
FreeBSD 3.3-STABLE #0: Sun Oct 24 17:54:31 CEST 1999
    root@dakota.shamz.net:/usr/src/sys/compile/DAKOTA
Timecounter "i8254" frequency 1193182 Hz
Timecounter "TSC" frequency 350797513 Hz
CPU: AMD-K6(tm) 3D processor (350.80-MHz 586-class CPU)
  Origin = "AuthenticAMD" Id = 0x580 Stepping = 0
  Features=0x8001bf
  AMD Features=0x80000800
real memory = 67108864 (65536K bytes)
avail memory = 61997056 (60544K bytes)
Preloaded elf kernel "kernel" at 0xc0312000.
Probing for devices on PCI bus 0:
chip0: rev 0x04 on pci0.0.0
chip1: rev 0x00 on pci0.1.0
chip2: rev 0x41 on pci0.7.0
chip3: rev 0x10 on pci0.7.3
ahc0: rev 0x03 int a irq 9 on pci0.8.0
ahc0: aic7870 Wide Channel A, SCSI Id=7, 16/255 SCBs
vga0: rev 0x03 int a irq 10 on pci0.9.0
rl0: rev 0x10 int a irq 11 on pci0.10.0
rl0: Ethernet address: 00:e0:7d:01:00:99
rl0: autoneg complete, link status good (half-duplex, 10Mbps)
Probing for devices on PCI bus 1:
Probing for PnP devices:
CSN 1 Vendor ID: CTL00f0 [0xf0008c0e] Serial 0xffffffff Comp ID: PNPb02f [0x2fb0d041]
This is a Vibra16X, but LDN 0 is disabled
Probing for devices on the ISA bus:
sc0 on isa
sc0: VGA color <16 virtual consoles, flags=0x0>
sio0 at 0x3f8-0x3ff irq 4 flags 0x10 on isa
sio0: type 16550A
sio1 at 0x2f8-0x2ff irq 3 flags 0x10 on isa
sio1: type 16550A
atkbdc0 at 0x60-0x6f on motherboard
atkbd0 irq 1 on isa
psm0 irq 12 on isa
psm0: model Generic PS/2 mouse, device ID 0
ppc0 at 0x378 irq 7 on isa
ppc0: Winbond chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/16 bytes threshold
ppb0: IEEE1284 device found /ECP
Probing for PnP devices on ppbus0:
ppbus0: MEDIA
lpt0: on ppbus 0
lpt0: Interrupt-driven port
ppi0: on ppbus 0
pcm0 not found
fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
fdc0: FIFO enabled, 8 bytes threshold
fd0: 1.44MB 3.5in
fd1: 1.2MB 5.25in
npx0 on motherboard
npx0: INT 16 interface
vga0 at 0x3b0-0x3df maddr 0xa0000 msize 131072 on isa
IP packet filtering initialized, divert enabled, rule-based forwarding disabled, default to accept, logging limited to 100 packets/entry by default
IP Filter: initialized. Default = pass all, Logging = enabled
Waiting 8 seconds for SCSI devices to settle
da1 at ahc0 bus 0 target 1 lun 0
da1: Fixed Direct Access SCSI-2 device
da1: 16.128MB/s transfers (8.064MHz, offset 8, 16bit), Tagged Queueing Enabled
da1: 4101MB (8399520 512 byte sectors: 255H 63S/T 522C)
da2 at ahc0 bus 0 target 9 lun 0
da2: Fixed Direct Access SCSI-2 device
da2: 16.128MB/s transfers (8.064MHz, offset 8, 16bit), Tagged Queueing Enabled
da2: 1075MB (2203480 512 byte sectors: 255H 63S/T 137C)
da0 at ahc0 bus 0 target 0 lun 0
da0: Fixed Direct Access SCSI-2 device
da0: 20.000MB/s transfers (10.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da0: 1010MB (2069860 512 byte sectors: 64H 32S/T 1010C)
cd0 at ahc0 bus 0 target 2 lun 0
cd0: Removable CD-ROM SCSI-2 device
cd0: 4.629MB/s transfers (4.629MHz, offset 8)
cd0: cd present [105372 x 2048 byte records]
cd9660: Joliet Extension
rl0: selecting MII, 10Mbps, half duplex

As you can see, I have tried lowering the transfer speed between the
controller and the Quantum drives, as suggested in the man page, but that
did not noticeably reduce the errors. I am currently trying to retrieve
the drive specs so I can check the jumpers on all drives once again, just
to be sure. I have also retrieved a new BIOS for my FIC board, after I
noticed that at least one other person has the same board as I do.

BTW, I am still not on the list. I'm working on that too, but the machine
crashed this morning, so now I have to get this hardware problem taken
care of first.

A few console errors too, just for length and completeness...

Oct 3 19:03:34 dakota /kernel: (da2:ahc0:0:9:0): Queuing a BDR SCB
Oct 3 19:03:34 dakota /kernel: (da2:ahc0:0:9:0): Bus Device Reset Message Sent
Oct 3 19:03:34 dakota /kernel: (da2:ahc0:0:9:0): no longer in timeout, status = 34b
Oct 3 19:03:34 dakota /kernel: ahc0: Bus Device Reset on A:9. 3 SCBs aborted
Oct 4 21:40:03 dakota /kernel: (da2:ahc0:0:9:0): data overrun detected in Data-In phase. Tag == 0x45.
Oct 4 21:40:03 dakota /kernel: (da2:ahc0:0:9:0): Have seen Data Phase. Length = 40960. NumSGs = 10.
Oct 4 21:40:03 dakota /kernel: sg[0] - Addr 0x2735000 : Length 4096
Oct 4 21:40:03 dakota /kernel: sg[1] - Addr 0x1d36000 : Length 4096
Oct 4 21:40:03 dakota /kernel: sg[2] - Addr 0x3d77000 : Length 4096
Oct 4 21:40:03 dakota /kernel: sg[3] - Addr 0x2878000 : Length 4096
Oct 4 21:40:03 dakota /kernel: sg[4] - Addr 0x2179000 : Length 4096
Oct 4 21:40:03 dakota /kernel: sg[5] - Addr 0x2eba000 : Length 4096
Oct 4 21:40:03 dakota /kernel: sg[6] - Addr 0x113b000 : Length 4096
Oct 4 21:40:03 dakota /kernel: sg[7] - Addr 0x243c000 : Length 4096
Oct 4 21:40:03 dakota /kernel: sg[8] - Addr 0x39fd000 : Length 4096
Oct 4 21:47:34 dakota /kernel: pid 808 (navigator-4.61.b), uid 1002: exited on signal 10

here, it obviously killed netscape, but that's not hard

Oct 8 12:56:35 dakota /kernel: (da2:ahc0:0:9:0): data overrun detected in Data-Out phase. Tag == 0x1a.
Oct 8 12:56:35 dakota /kernel: (da2:ahc0:0:9:0): Have seen Data Phase. Length = 8192. NumSGs = 2.
Oct 8 12:56:35 dakota /kernel: sg[0] - Addr 0x3075000 : Length 4096
Oct 9 21:03:04 dakota mountd[143]: mount request succeeded from 192.168.0.17 for /usr/local/public/root/midge
Oct 9 21:28:07 dakota /kernel: (da1:ahc0:0:1:0): data overrun detected in Data-Out phase. Tag == 0x21.
Oct 9 21:28:07 dakota /kernel: (da1:ahc0:0:1:0): Have seen Data Phase. Length = 8192. NumSGs = 2.
Oct 9 21:28:07 dakota /kernel: sg[0] - Addr 0x1d89000 : Length 4096
Oct 9 21:29:03 dakota /kernel: (da1:ahc0:0:1:0): SCB 0x2c - timed out while idle, LASTPHASE == 0x1, SEQADDR == 0xb
Oct 9 21:29:03 dakota /kernel: (da1:ahc0:0:1:0): Queuing a BDR SCB
Oct 9 21:29:03 dakota /kernel: (da1:ahc0:0:1:0): Bus Device Reset Message Sent
Oct 9 21:29:03 dakota /kernel: (da1:ahc0:0:1:0): no longer in timeout, status = 34b
Oct 9 21:29:03 dakota /kernel: ahc0: Bus Device Reset on A:1. 10 SCBs aborted
Oct 9 21:29:31 dakota /kernel: (da1:ahc0:0:1:0): data overrun detected in Data-Out phase. Tag == 0x21.
Oct 9 21:29:31 dakota /kernel: (da1:ahc0:0:1:0): Have seen Data Phase. Length = 8192. NumSGs = 2.
Oct 9 21:29:31 dakota /kernel: sg[0] - Addr 0xeb5000 : Length 4096
Oct 9 21:31:22 dakota /kernel: (da1:ahc0:0:1:0): data overrun detected in Data-Out phase. Tag == 0xf0.
Oct 9 21:31:22 dakota /kernel: (da1:ahc0:0:1:0): Have seen Data Phase. Length = 8192. NumSGs = 2.
Oct 9 21:31:22 dakota /kernel: sg[0] - Addr 0x2cdd000 : Length 4096
Oct 9 21:34:43 dakota /kernel: (da2:ahc0:0:9:0): data overrun detected in Data-In phase. Tag == 0x23.
Oct 9 21:34:43 dakota /kernel: (da2:ahc0:0:9:0): Have seen Data Phase. Length = 8192. NumSGs = 2.
Oct 9 21:34:43 dakota /kernel: sg[0] - Addr 0x3e86000 : Length 4096
Oct 9 21:42:59 dakota /kernel: pid 45161 (ld), uid 0: exited on signal 11 (core dumped)

this was not nice.

Oct 12 02:06:20 dakota /kernel: (da1:ahc0:0:1:0): data overrun detected in Data-In phase. Tag == 0x26.
Oct 12 02:06:20 dakota /kernel: (da1:ahc0:0:1:0): Have seen Data Phase. Length = 1024. NumSGs = 1.
Oct 12 07:43:27 dakota /kernel: (da2:ahc0:0:9:0): SCB 0xa - timed out while idle, LASTPHASE == 0x1, SEQADDR == 0xc
Oct 12 07:43:35 dakota /kernel: (da2:ahc0:0:9:0): Queuing a BDR SCB
Oct 12 07:43:35 dakota /kernel: (da2:ahc0:0:9:0): Bus Device Reset Message Sent
Oct 12 07:43:35 dakota /kernel: (da2:ahc0:0:9:0): no longer in timeout, status = 34b
Oct 12 07:43:35 dakota /kernel: ahc0: Bus Device Reset on A:9. 1 SCBs aborted
Oct 12 07:44:27 dakota /kernel: (da1:ahc0:0:1:0): SCB 0x9 - timed out while idle, LASTPHASE == 0x1, SEQADDR == 0x9
Oct 12 07:44:27 dakota /kernel: (da1:ahc0:0:1:0): Queuing a BDR SCB
Oct 12 07:44:27 dakota /kernel: (da1:ahc0:0:1:0): Bus Device Reset Message Sent
Oct 12 07:44:27 dakota /kernel: (da1:ahc0:0:1:0): no longer in timeout, status = 34b
Oct 12 07:44:27 dakota /kernel: ahc0: Bus Device Reset on A:1. 7 SCBs aborted
Oct 12 07:45:00 dakota /kernel: ahc0:A:1: no active SCB for reconnecting target - issuing BUS DEVICE RESET
Oct 12 07:45:00 dakota /kernel: SAVED_TCL == 0x10, ARG_1 == 0x6, SEQ_FLAGS == 0x40
Oct 12 07:45:00 dakota /kernel: ahc0: Bus Device Reset on A:1. 13 SCBs aborted
Oct 12 07:45:27 dakota /kernel: (da2:ahc0:0:9:0): SCB 0x11 - timed out while idle, LASTPHASE == 0x1, SEQADDR == 0x9
Oct 12 07:45:35 dakota /kernel: (da2:ahc0:0:9:0): Queuing a BDR SCB
Oct 12 07:45:35 dakota /kernel: (da2:ahc0:0:9:0): Bus Device Reset Message Sent
Oct 12 07:45:35 dakota /kernel: (da2:ahc0:0:9:0): no longer in timeout, status = 34b
Oct 12 07:45:35 dakota /kernel: ahc0: Bus Device Reset on A:9. 3 SCBs aborted
Oct 12 07:49:58 dakota /kernel: pid 27230 (cc), uid 0: exited on signal 11 (core dumped)

again, not very nice when you're compiling

Oct 18 21:31:25 dakota /kernel: (da2:ahc0:0:9:0): data overrun detected in Data-Out phase. Tag == 0x2.
Oct 18 21:31:25 dakota /kernel: (da2:ahc0:0:9:0): Have seen Data Phase. Length = 8192. NumSGs = 2.
Oct 18 21:31:25 dakota /kernel: sg[0] - Addr 0x287b000 : Length 4096
Oct 21 22:02:16 dakota /kernel: (da2:ahc0:0:9:0): data overrun detected in Data-Out phase. Tag == 0x3e.
Oct 21 22:02:16 dakota /kernel: (da2:ahc0:0:9:0): Have seen Data Phase. Length = 8192. NumSGs = 2.
Oct 21 22:02:16 dakota /kernel: sg[0] - Addr 0x2a01000 : Length 4096
Oct 22 02:08:04 dakota /kernel: ahc0:A:9: ahc_intr - referenced scb not valid during seqint 0x71 scb(36)
Oct 22 02:09:12 dakota /kernel: ahc0: WARNING no command for scb 36 (cmdcmplt)
Oct 22 02:09:12 dakota /kernel: QOUTPOS = 102
Oct 22 02:09:12 dakota /kernel: (da2:ahc0:0:9:0): SCB 0x14 - timed out while idle, LASTPHASE == 0x1, SEQADDR == 0x9
Oct 22 02:09:12 dakota /kernel: (da2:ahc0:0:9:0): Queuing a BDR SCB
Oct 22 02:09:12 dakota /kernel: (da2:ahc0:0:9:0): Bus Device Reset Message Sent
Oct 22 02:09:12 dakota /kernel: (da2:ahc0:0:9:0): no longer in timeout, status = 34b
Oct 22 02:09:12 dakota /kernel: ahc0: Bus Device Reset on A:9. 2 SCBs aborted
Oct 22 09:40:29 dakota /kernel: (da2:ahc0:0:9:0): data overrun detected in Data-Out phase. Tag == 0x6.
Oct 22 09:40:29 dakota /kernel: (da2:ahc0:0:9:0): Have seen Data Phase. Length = 8192. NumSGs = 2.

And so on, ad infinitum: an average of two per day under minimal load.

Sorry this got so long, and I hope the formatting worked.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-scsi" in the body of the message
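
The driver messages quoted in this mail can be tallied per device to see which drive is hit most often and by which kind of error. Below is a minimal sketch in Python, assuming each syslog entry sits on a single line in the format shown in the excerpts. The regular expression, the `EVENTS` table and the `tally` helper are illustrative names made up for this example and are not part of any FreeBSD tool.

```python
import re
from collections import Counter

# Matches kernel messages tagged with a CAM path such as (da2:ahc0:0:9:0).
# The pattern is an assumption based on the excerpts in this message.
LINE_RE = re.compile(
    r"^\w{3}\s+\d+ \d\d:\d\d:\d\d \S+ /kernel: "
    r"\((?P<dev>\w+):ahc\d+:\d+:\d+:\d+\): (?P<msg>.*)$"
)

# Substrings used to classify a message (hypothetical labels).
EVENTS = {
    "data overrun": "overrun",
    "timed out": "timeout",
}


def tally(lines):
    """Count overruns and timeouts per (device, event) pair."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not a per-device driver message
        for needle, label in EVENTS.items():
            if needle in m.group("msg"):
                counts[(m.group("dev"), label)] += 1
    return counts


# A few entries copied from the log above, with wrapped words rejoined.
SAMPLE = [
    "Oct 4 21:40:03 dakota /kernel: (da2:ahc0:0:9:0): data overrun detected in Data-In phase. Tag == 0x45.",
    "Oct 9 21:28:07 dakota /kernel: (da1:ahc0:0:1:0): data overrun detected in Data-Out phase. Tag == 0x21.",
    "Oct 9 21:29:03 dakota /kernel: (da1:ahc0:0:1:0): SCB 0x2c - timed out while idle, LASTPHASE == 0x1, SEQADDR == 0xb",
    "Oct 9 21:29:03 dakota /kernel: ahc0: Bus Device Reset on A:1. 10 SCBs aborted",
]

if __name__ == "__main__":
    for (dev, event), n in sorted(tally(SAMPLE).items()):
        print(dev, event, n)
```

Feeding the full /var/log/messages through `tally` instead of `SAMPLE` would give the per-drive totals. The controller-level lines (those starting with `ahc0:`) are deliberately skipped here, since they do not carry a device name in the same form.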