From owner-freebsd-bugs Fri Jan 19 02:25:58 1996
Return-Path: owner-bugs
Received: (from root@localhost) by freefall.freebsd.org (8.7.3/8.7.3) id CAA05856 for bugs-outgoing; Fri, 19 Jan 1996 02:25:58 -0800 (PST)
Received: from proxy.siemens.at (proxy.siemens.at [192.138.228.19]) by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id CAA05844 for ; Fri, 19 Jan 1996 02:25:27 -0800 (PST)
Received: from zerberus.hai.siemens.co.at (zerberus.hai.siemens-austria) by proxy.siemens.at with SMTP id AA22553 (5.67a/IDA-1.5 for ); Fri, 19 Jan 1996 11:24:30 +0100
Received: by zerberus.hai.siemens.co.at (4.1/SMI-4.1) id AA01493; Fri, 19 Jan 96 11:24:28 +0100
Date: Fri, 19 Jan 96 11:24:28 +0100
From: wirth@zerberus.hai.siemens.co.at (Helmut F. Wirth)
Message-Id: <9601191024.AA01493@zerberus.hai.siemens.co.at>
To: freebsd-bugs@freebsd.org
Subject: Bug with NCR810 driver and FreeBSD 2.1 release, please help
Cc: wirth@zerberus.hai.siemens.co.at
Sender: owner-bugs@freebsd.org
Precedence: bulk

Hello!

I think I triggered a bug in the NCR driver code: if I try to do a dump from one of the IBM disks (see below), an error in the NCR driver code shows up. Details follow.

My Hardware:
    Pentium-120, ASUS TPX4 motherboard, 32MB memory
    NCR810 controller
    Diamond Stealth 64

SCSI Bus:
    Quantum ATLAS (target 0)
    IBM OEM 1GB (target 1)
    IBM OEM 1GB (target 2)
    VIPER ARCHIVE tape (target 5)
    TOSHIBA CDROM (target 6)

Software:
    MSDOS 6.2 (Win 3.1) on target 0
    FreeBSD 2.1 release on target 0,1,2; target 0 contains /, swap and /usr

These are the FreeBSD kernel boot messages:

Jan 18 20:06:05 atlantis /kernel: FreeBSD 2.1.0-RELEASE #0: Wed Jan 17 21:39:28 1996
Jan 18 20:06:05 atlantis /kernel: hfwirth@atlantis.ping.at:/usr/src/sys/compile/ATLANTIS
Jan 18 20:06:05 atlantis /kernel: CPU: 120-MHz Pentium 735\90 or 815\100 (Pentium-class CPU)
Jan 18 20:06:05 atlantis /kernel: Origin = "GenuineIntel" Id = 0x525 Stepping=5
Jan 18 20:06:05 atlantis /kernel: Features=0x1bf
Jan 18 20:06:05 atlantis /kernel: real memory = 33554432 (32768K bytes)
Jan 18 20:06:05 atlantis /kernel: avail memory = 30900224 (30176K bytes)
Jan 18 20:06:05 atlantis /kernel: Probing for devices on the ISA bus:
Jan 18 20:06:05 atlantis /kernel: sc0 at 0x60-0x6f irq 1 on motherboard
Jan 18 20:06:05 atlantis /kernel: sc0: VGA color <16 virtual consoles, flags=0x0>
Jan 18 20:06:05 atlantis /kernel: sio0 at 0x3f8-0x3ff irq 4 on isa
Jan 18 20:06:05 atlantis /kernel: sio0: type 16550A
Jan 18 20:06:05 atlantis /kernel: sio1 at 0x2f8-0x2ff irq 3 on isa
Jan 18 20:06:05 atlantis /kernel: sio1: type 16550A
Jan 18 20:06:06 atlantis /kernel: lpt0 at 0x378-0x37f irq 7 on isa
Jan 18 20:06:06 atlantis /kernel: lpt0: Interrupt-driven port
Jan 18 20:06:06 atlantis /kernel: lp0: TCP/IP capable interface
Jan 18 20:06:06 atlantis /kernel: fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
Jan 18 20:06:06 atlantis /kernel: fdc0: NEC 72065B
Jan 18 20:06:06 atlantis /kernel: fd0: 1.2MB 5.25in
Jan 18 20:06:06 atlantis /kernel: fd1: 1.44MB 3.5in
Jan 18 20:06:06 atlantis /kernel: npx0 on motherboard
Jan 18 20:06:06 atlantis /kernel: npx0: INT 16 interface
Jan 18 20:06:06 atlantis /kernel: sb0 at 0x220 irq 5 drq 1 on isa
Jan 18 20:06:06 atlantis /kernel: sb0:
Jan 18 20:06:06 atlantis /kernel: sbxvi0 at 0x0 drq 5 on isa
Jan 18 20:06:06 atlantis /kernel: sbxvo0:
Jan 18 20:06:06 atlantis /kernel: sbmidi0 at 0x300 on isa
Jan 18 20:06:06 atlantis /kernel:
Jan 18 20:06:06 atlantis /kernel: bio_imask c0000040 tty_imask c003009a net_imask c003009a
Jan 18 20:06:06 atlantis /kernel: Probing for devices on the PCI bus:
Jan 18 20:06:06 atlantis /kernel: chip0 rev 2 on pci0:0
Jan 18 20:06:07 atlantis /kernel: chip1 rev 2 on pci0:7
Jan 18 20:06:07 atlantis /kernel: vga0 rev 0 on pci0:9
Jan 18 20:06:07 atlantis /kernel: ncr0 rev 1 int a irq 11 on pci0:12
Jan 18 20:06:07 atlantis /kernel: ncr0 waiting for scsi devices to settle
Jan 18 20:06:07 atlantis /kernel: (ncr0:0:0): "Quantum XP32150 81HB" type 0 fixed SCSI 2
Jan 18 20:06:07 atlantis /kernel: sd0(ncr0:0:0): Direct-Access
Jan 18 20:06:07 atlantis /kernel: sd0(ncr0:0:0): FAST SCSI-2 100ns (10 Mb/sec) offset 8.
Jan 18 20:06:07 atlantis /kernel: 2050MB (4199760 512 byte sectors)
Jan 18 20:06:07 atlantis /kernel: (ncr0:1:0): 200ns (5 Mb/sec) offset 8.
Jan 18 20:06:07 atlantis /kernel: (ncr0:1:0): "IBM OEM 0662S12 3 30" type 0 fixed SCSI 2
Jan 18 20:06:07 atlantis /kernel: sd1(ncr0:1:0): Direct-Access
Jan 18 20:06:07 atlantis /kernel: sd1(ncr0:1:0): FAST SCSI-2 100ns (10 Mb/sec) offset 8.
Jan 18 20:06:07 atlantis /kernel: 1003MB (2055035 512 byte sectors)
Jan 18 20:06:07 atlantis /kernel: (ncr0:2:0): "IBM DPES-31080 S31Q" type 0 fixed SCSI 2
Jan 18 20:06:07 atlantis /kernel: sd2(ncr0:2:0): Direct-Access
Jan 18 20:06:07 atlantis /kernel: sd2(ncr0:2:0): FAST SCSI-2 100ns (10 Mb/sec) offset 8.
Jan 18 20:06:07 atlantis /kernel: 1034MB (2118144 512 byte sectors)
Jan 18 20:06:07 atlantis /kernel: (ncr0:5:0): "ARCHIVE VIPER 150 21247 -011" type 1 removable SCSI 1
Jan 18 20:06:07 atlantis /kernel: st0(ncr0:5:0): Sequential-Access st0: Archive Viper 150 is a known rogue
Jan 18 20:06:07 atlantis /kernel: density code 0x0, drive empty
Jan 18 20:06:07 atlantis /kernel: (ncr0:6:0): "TOSHIBA CD-ROM XM-3501TA 2694" type 5 removable SCSI 2
Jan 18 20:06:07 atlantis /kernel: cd0(ncr0:6:0): CD-ROM
Jan 18 20:06:07 atlantis /kernel: cd0(ncr0:6:0): 250ns (4 Mb/sec) offset 8.
Jan 18 20:06:07 atlantis /kernel: cd present.[264427 x 2048 byte records]
Jan 18 20:06:05 atlantis lpd[94]: restarted

Bug description:
(send-pr is not possible yet, because my mail does not work yet)

I discovered this while trying to dump from one IBM disk to the other IBM disk, like this (details below):

    dump 0f - (diskname)|gzip -c >(file)

The disk from which dump *reads* has problems with the NCR driver. Actually there were two different errors, but I think they are related.

Target 0 (Quantum ATLAS) is mounted at / and /usr and is used for swap
Target 1 (IBM OEM, old) is mounted at /home/disk1
Target 2 (IBM OEM, new) is mounted at /home/disk2

With this setup I got the following error (I will refer to it as ERROR_1) while trying to dump from target 1 to a file on target 2:

bash# dump 0f - /home/disk1 | gzip -c > /home/disk2/disk1.dump.gz
  DUMP: Date of this level 0 dump: Fri Jan 19 01:29:56 1996
  DUMP: Date of last level 0 dump: the epoch
  DUMP: Dumping /dev/rsd1s1e (/home/disk1) to standard output
  DUMP: mapping (Pass I) [regular files]
  DUMP: mapping (Pass II) [directories]
  DUMP: estimated 469684 tape blocks.
  DUMP: slave couldn't reopen disk: Device not configured
  DUMP:
  DUMP: The ENTIRE dump is aborted.
bash#

/var/log/messages contained:

Jan 19 01:30:02 atlantis /kernel: sd1(ncr0:1:0): NOT READY asc:4,1
Jan 19 01:30:02 atlantis /kernel: sd1(ncr0:1:0): Logical unit is in process of becoming ready
Jan 19 01:30:02 atlantis /kernel: , retries:2
Jan 19 01:30:02 atlantis /kernel: sd1(ncr0:1:0): NOT READY asc:4,1
Jan 19 01:30:02 atlantis /kernel: sd1(ncr0:1:0): Logical unit is in process of becoming ready
Jan 19 01:30:02 atlantis /kernel: , retries:2
Jan 19 01:30:02 atlantis /kernel: sd1(ncr0:1:0): NOT READY asc:4,1
Jan 19 01:30:02 atlantis /kernel: sd1(ncr0:1:0): Logical unit is in process of becoming ready
Jan 19 01:30:02 atlantis /kernel: , retries:1
Jan 19 01:30:02 atlantis /kernel: sd1(ncr0:1:0): NOT READY asc:4,1
Jan 19 01:30:02 atlantis /kernel: sd1(ncr0:1:0): Logical unit is in process of becoming ready
Jan 19 01:30:02 atlantis /kernel: , retries:1
Jan 19 01:30:02 atlantis /kernel: sd1(ncr0:1:0): NOT READY asc:4,1
Jan 19 01:30:02 atlantis /kernel: sd1(ncr0:1:0): Logical unit is in process of becoming ready
Jan 19 01:30:02 atlantis /kernel: , FAILURE
Jan 19 01:30:02 atlantis /kernel: sd1(ncr0:1:0): NOT READY asc:4,1
Jan 19 01:30:02 atlantis /kernel: sd1(ncr0:1:0): Logical unit is in process of becoming ready
Jan 19 01:30:02 atlantis /kernel: , FAILURE
Jan 19 01:30:02 atlantis /kernel: sd1(ncr0:1:0): NOT READY asc:4,1
Jan 19 01:30:02 atlantis /kernel: sd1(ncr0:1:0): Logical unit is in process of becoming ready
Jan 19 01:30:02 atlantis /kernel: , retries:2
Jan 19 01:30:02 atlantis /kernel: sd1(ncr0:1:0): NOT READY asc:4,1
Jan 19 01:30:02 atlantis /kernel: sd1(ncr0:1:0): Logical unit is in process of becoming ready
Jan 19 01:30:02 atlantis /kernel: , retries:2
Jan 19 01:30:02 atlantis /kernel: sd1(ncr0:1:0): NOT READY asc:4,1
Jan 19 01:30:02 atlantis /kernel: sd1(ncr0:1:0): Logical unit is in process of becoming ready
Jan 19 01:30:02 atlantis /kernel: , retries:1
Jan 19 01:30:02 atlantis /kernel: sd1(ncr0:1:0): NOT READY asc:4,1
Jan 19 01:30:02 atlantis /kernel: sd1(ncr0:1:0): Logical unit is in process of becoming ready
Jan 19 01:30:02 atlantis /kernel: , retries:1
Jan 19 01:30:02 atlantis /kernel: sd1(ncr0:1:0): NOT READY asc:4,1
Jan 19 01:30:02 atlantis /kernel: sd1(ncr0:1:0): Logical unit is in process of becoming ready
Jan 19 01:30:02 atlantis /kernel: , FAILURE
Jan 19 01:30:02 atlantis /kernel: sd1(ncr0:1:0): NOT READY asc:4,1
Jan 19 01:30:02 atlantis /kernel: sd1(ncr0:1:0): Logical unit is in process of becoming ready
Jan 19 01:30:02 atlantis /kernel: , FAILURE

I experimented a bit and tried to dump from target 1 to a file on target 0 and from target 1 to the tape:

bash# dump 0f - /home/disk1 | gzip -c > /var/tmp/disk1.dump.gz

and

bash# dump 0f /dev/rst0 /home/disk1

Both attempts yielded exactly the same error as above.

I then tried to dump from *target 2*, and here the error was different; I will refer to it as ERROR_2.

Dumping from target 2 to a file on target 1:

bash# dump 0f - /home/disk2 | gzip -c > /home/disk1/disk2.dump.gz
  DUMP: Date of this level 0 dump: Fri Jan 19 01:32:56 1996
  DUMP: Date of last level 0 dump: the epoch
  DUMP: Dumping /dev/rsd2e (/home/disk2) to standard output
  DUMP: mapping (Pass I) [regular files]
  DUMP: mapping (Pass II) [directories]
  DUMP: estimated 595453 tape blocks.
  DUMP: dumping (Pass III) [directories]
  DUMP: dumping (Pass IV) [regular files]
^C  DUMP: Interrupt received.
  DUMP: Do you want to abort dump?: ("yes" or "no")
  DUMP: Broken pipe
  DUMP: The ENTIRE dump is aborted.

(The dump seemed to work, but I did not trust it and aborted.)

/var/log/messages contained:

Jan 19 01:17:02 atlantis /kernel: assertion "cp" failed: file "../../pci/ncr.c", line 5560
Jan 19 01:17:02 atlantis /kernel: assertion "cp" failed: file "../../pci/ncr.c", line 5560
Jan 19 01:17:02 atlantis /kernel: sd2(ncr0:2:0): COMMAND FAILED (4 28) @f0a2ce00.
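
For anyone comparing longer excerpts of /var/log/messages, the per-device kernel messages quoted above can be counted with a short script. The following is only a sketch in Python and not part of the setup described in this report: the regular expression and the name tally_scsi_messages are assumptions, derived solely from the line layout of the excerpts shown here.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpts in this report:
#   "<date> <host> /kernel: sd1(ncr0:1:0): NOT READY asc:4,1"
# Lines without a "device(ncrN:target:lun):" prefix are ignored.
SCSI_MSG = re.compile(r"/kernel: (\w+)\(ncr(\d+):(\d+):(\d+)\): (.+)$")


def tally_scsi_messages(lines):
    """Count kernel messages per (device, target, message text)."""
    counts = Counter()
    for line in lines:
        match = SCSI_MSG.search(line.rstrip())
        if match:
            device, _controller, target, _lun, text = match.groups()
            counts[(device, int(target), text)] += 1
    return counts


# A few lines copied from the excerpts above.
SAMPLE = [
    "Jan 19 01:30:02 atlantis /kernel: sd1(ncr0:1:0): NOT READY asc:4,1",
    "Jan 19 01:30:02 atlantis /kernel: sd1(ncr0:1:0): Logical unit is in process of becoming ready",
    "Jan 19 01:30:02 atlantis /kernel: , retries:2",
    "Jan 19 01:30:02 atlantis /kernel: sd1(ncr0:1:0): NOT READY asc:4,1",
    'Jan 19 01:17:02 atlantis /kernel: assertion "cp" failed: file "../../pci/ncr.c", line 5560',
    "Jan 19 01:17:02 atlantis /kernel: sd2(ncr0:2:0): COMMAND FAILED (4 28) @f0a2ce00.",
]

if __name__ == "__main__":
    for (device, target, text), n in sorted(tally_scsi_messages(SAMPLE).items()):
        print(f"{device} target {target}: {n} x {text}")
```

The continuation lines (", retries:2", ", FAILURE") and the assertion messages carry no device prefix, so this sketch skips them; counting those would need a separate pattern.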

I tried to dump target 2 to target 0 and to tape:

bash# dump 0f - /home/disk2 | gzip -c > /var/tmp/disk2.dump.gz
bash# dump 0f /dev/rst0 /home/disk2

and ERROR_2 occurred in both cases too. Dumping *from* target 0 works to all other targets (except the CDROM and the NCR810 of course :-)).

So much for the description of what happened.

Considerations and further experimenting:

1) The SCSI bus and termination:

The controller and the CDROM are the last devices on the cable and both are terminated properly. The controller supplies terminator power. There is only one cable, inside the PC, and it is under 100cm long. The machine worked with NetBSD 1.0A and MSDOS (Windows) without any problems, so I think this isn't a hardware-related problem.

2) Both IBM disks seem to trigger the problem:

The first IBM disk (target 1) is about one year old. It never had problems, except that with an early version of the NCR driver under NetBSD 0.9 I had problems with the tags. This showed up as a "disk not ready" during savecore (without a core dump) while booting. I think this is very similar to ERROR_1. The problem disappeared with the next driver version, but I had also found a workaround: disable the tags for the IBM disk.

I tried this for the two bugs, ERROR_1 and ERROR_2: with the tags disabled for all disks, both bugs disappeared. With the tags disabled only for the disk I dump *from*, the bug disappears too. So could these be buggy disks? I think not, because the second disk is about 2 weeks old and looks completely different. I think IBM handles tags in a way the driver does not like.

Playing around with ncrcontrol showed some strange things too, but I did not have the time to look into it further: ncrcontrol shows the SCSI devices, and for all three disks it lists 4 tags. The CDROM is SCSI-2 and has no tags; the tape is SCSI-1. Using

    ncrcontrol .. -t 1 -s tags=1

solved the problem for target 1, but running ncrcontrol after this still showed 4 tags for target 1??
The list entry did not change, but it seems the driver picked up the setting.

The datasheets for the disks mention jumpers to disable SCSI-ATTENTION after a SCSI bus reset. For target 1 (only) there is a jumper to disable active (target-initiated) sync negotiation. Could one of these help?

That's all I know so far. Thank you for any help and hints; completely disabling tags hurts performance, and I would like to find a better solution.

Helmut Wirth