Message-ID: <19981013134739.A26388@rucus.ru.ac.za>
Date: Tue, 13 Oct 1998 13:47:39 +0200
From: Neil Blakey-Milner
To: current@FreeBSD.ORG
Subject: Some SCSI(?) problems whilst running SMP

Hi

We're in the unfortunate position of having to run a non-SMP kernel on
our dual-processor machine, due to the following:

uname -a: (of the non-SMP kernel)
FreeBSD rucus.ru.ac.za 3.0-BETA FreeBSD 3.0-BETA #0: Mon Oct 5 04:49:12 SAT 1998 nbm@rucus.ru.ac.za:/usr/src/sys/compile/RUCUS i386

Sources are from October 4th, without softupdates, devfs, and other fun
things.

//---------------------------------------------------------------------
/usr/src/sys/i386/conf/RUCUS:

# SMP-GENERIC -- Smp machine with WD/AHx/NCR/BTx family disks
#
# For more information read the handbook part System Administration ->
# Configuring the FreeBSD Kernel -> The Configuration File.
# The handbook is available in /usr/share/doc/handbook or online as
# latest version from the FreeBSD World Wide Web server
#
#
# An exhaustive list of options and more detailed explanations of the
# device lines is present in the ./LINT configuration file. If you are
# in doubt as to the purpose or necessity of a line, check first in LINT.
#
#	$Id: SMP-GENERIC,v 1.16 1998/09/25 17:34:48 peter Exp $

machine		"i386"

# SMP does NOT support 386/486 CPUs.
#cpu		"I386_CPU"
#cpu		"I486_CPU"
cpu		"I586_CPU"
cpu		"I686_CPU"

ident		GENERIC
maxusers	256

# Create a SMP capable kernel (mandatory options):
#options	SMP			# Symmetric MultiProcessor Kernel
#options	APIC_IO			# Symmetric (APIC) I/O

options		"MAXDSIZ=(512*1048576)"	# Max allowed size of process
options		"DFLDSIZ=(256*1048576)"	# Default Max size of process

# Optional, these are the defaults:
#options	NCPU=2			# number of CPUs
#options	NBUS=4			# number of busses
#options	NAPIC=1			# number of IO APICs
#options	NINTR=24		# number of INTs

# Lets always enable the kernel debugger for SMP.
#options	DDB

# SMP shouldn't need x87 emulation, disable by default.
#options	MATH_EMULATE		#Support for x87 emulation
options		INET			#InterNETworking
options		FFS			#Berkeley Fast Filesystem
options		NFS			#Network Filesystem
#options	MSDOSFS			#MSDOS Filesystem
options		"CD9660"		#ISO 9660 Filesystem
options		PROCFS			#Process filesystem
options		"COMPAT_43"		#Compatible with BSD 4.3 [KEEP THIS!]
options		SCSI_DELAY=15000	#Be pessimistic about Joe SCSI device
options		UCONSOLE		#Allow users to grab the console
options		FAILSAFE		#Be conservative
options		USERCONFIG		#boot -c editor
options		VISUAL_USERCONFIG	#visual boot -c editor
options		INCLUDE_CONFIG_FILE
options		"MD5"
options		IPFIREWALL
options		IPFIREWALL_VERBOSE
options		QUOTA

config		kernel	root on wd0

controller	isa0
controller	eisa0
controller	pci0

controller	fdc0	at isa? port "IO_FD1" bio irq 6 drq 2 vector fdintr
disk		fd0	at fdc0 drive 0
disk		fd1	at fdc0 drive 1
# Unless you know very well what you're doing, leave ft0 at drive 2, or
# remove the line entirely if you don't need it. Trying to configure
# it on another unit might cause surprises, see PR kern/7176.
tape		ft0	at fdc0 drive 2

options		"CMD640"	# work around CMD640 chip deficiency
controller	wdc0	at isa? port "IO_WD1" bio irq 14 vector wdintr
disk		wd0	at wdc0 drive 0
disk		wd1	at wdc0 drive 1

controller	wdc1	at isa? port "IO_WD2" bio irq 15 vector wdintr
disk		wd2	at wdc1 drive 0
disk		wd3	at wdc1 drive 1

options		ATAPI		#Enable ATAPI support for IDE bus
options		ATAPI_STATIC	#Don't do it as an LKM
device		wcd0		#IDE CD-ROM

# A single entry for any of these controllers (ncr, ahb, ahc, amd) is
# sufficient for any number of installed devices.
#controller	ncr0
#controller	amd0
#controller	ahb0
controller	ahc0
#controller	isp0

options		AHC_ALLOW_MEMIO

# This controller offers a number of configuration options, too many to
# document here - see the LINT file in this directory and look up the
# dpt0 entry there for much fuller documentation on this. The options
# line following dpt0 here is also currently a *required* option for it.
# controller	dpt0
# options	DPT_MEASURE_PERFORMANCE

#controller	adv0	at isa? port ? cam irq ?
#controller	bt0	at isa? port ? cam irq ?
#controller	aha0	at isa? port ? cam irq ?
#controller	uha0	at isa? port "IO_UHA0" bio irq ? drq 5 vector uhaintr
#controller	aic0	at isa? port 0x340 bio irq 11 vector aicintr
#controller	nca0	at isa? port 0x1f88 bio irq 10 vector ncaintr
#controller	nca1	at isa? port 0x350 bio irq 5 vector ncaintr
#controller	sea0	at isa? bio irq 5 iomem 0xc8000 iosiz 0x2000 vector seaintr

controller	scbus0

device		da0
device		sa0
device		pass0
device		cd0	#Only need one of these, the code dynamically grows

device		wt0	at isa? port 0x300 bio irq 5 drq 1 vector wtintr
device		mcd0	at isa? port 0x300 bio irq 10 vector mcdintr

controller	matcd0	at isa? port 0x230 bio

device		scd0	at isa? port 0x230 bio

#options	PNP

# syscons is the default console driver, resembling an SCO console
device		sc0	at isa? port "IO_KBD" tty irq 1 vector scintr
# Enable this and PCVT_FREEBSD for pcvt vt220 compatible console driver
#device		vt0	at isa? port "IO_KBD" tty irq 1 vector pcrint
#options	XSERVER			# include code for XFree86
#options	FAT_CURSOR		# start with block cursor
# If you have a ThinkPAD, uncomment this along with the rest of the PCVT lines
#options	PCVT_SCANSET=2		# IBM keyboards are non-std

options		MAXCONS=16
#options	SC_DISABLE_REBOOT

device		npx0	at isa? port "IO_NPX" irq 13 vector npxintr

#
# Laptop support (see LINT for more options)
#
device		apm0	at isa? disable flags 0x31 # Advanced Power Management

# PCCARD (PCMCIA) support
#controller	card0
#device		pcic0	at card?
#device		pcic1	at card?

device		sio0	at isa? port "IO_COM1" flags 0x10 tty irq 4 vector siointr
device		sio1	at isa? port "IO_COM2" tty irq 3 vector siointr
device		sio2	at isa? disable port "IO_COM3" tty irq 5 vector siointr
device		sio3	at isa? disable port "IO_COM4" tty irq 9 vector siointr

device		lpt0	at isa? port? tty irq 7 vector lptintr
device		lpt1	at isa? port? tty
device		mse0	at isa? port 0x23c tty irq 5 vector mseintr

device		psm0	at isa? disable port "IO_KBD" conflicts tty irq 12 vector psmintr

# Order is important here due to intrusive probes, do *not* alphabetize
# this list of network interfaces until the probes have been fixed.
# Right now it appears that the ie0 must be probed before ep0. See
# revision 1.20 of this file.
#device		de0
#device		fxp0
#device		tl0
device		tx0
#device		vx0
#device		xl0

device		ed0	at isa? port 0x300 net irq 3 iomem 0xd8000 vector edintr
#device		ie0	at isa? port 0x300 net irq 10 iomem 0xd0000 vector ieintr
#device		ep0	at isa? port 0x300 net irq 10 vector epintr
#device		ex0	at isa? port? net irq? vector exintr
#device		fe0	at isa? port 0x300 net irq ? vector feintr
#device		le0	at isa? port 0x300 net irq 5 iomem 0xd0000 vector le_intr
#device		lnc0	at isa? port 0x280 net irq 10 drq 0 vector lncintr
#device		ze0	at isa? port 0x300 net irq 5 iomem 0xd8000 vector zeintr
#device		zp0	at isa? port 0x300 net irq 10 iomem 0xd8000 vector zpintr
#device		cs0	at isa? port 0x300 net irq ? vector csintr

pseudo-device	loop
pseudo-device	ether
pseudo-device	sl	1
#pseudo-device	ppp	1
pseudo-device	tun	4
pseudo-device	pty	256
pseudo-device	gzip		# Exec gzipped a.out's

# KTRACE enables the system-call tracing facility ktrace(2).
# This adds 4 KB bloat to your kernel, and slightly increases
# the costs of each syscall.
options		KTRACE		#kernel tracing

# This provides support for System V shared memory.
#
options		SYSVSHM
options		SYSVSEM
options		SYSVMSG

//--------------------------------------------------------------
diff RUCUS RUCUS-SMP
//--------------------------------------------------------------
23c23
< ident		GENERIC
---
> ident		SMP-GENERIC
27,28c27,28
< #options	SMP			# Symmetric MultiProcessor Kernel
< #options	APIC_IO			# Symmetric (APIC) I/O
---
> options	SMP			# Symmetric MultiProcessor Kernel
> options	APIC_IO			# Symmetric (APIC) I/O
40c40
< #options	DDB
---
> options	DDB

//---------------------------------------------------------------
/var/run/dmesg.boot:
//---------------------------------------------------------------
Copyright (c) 1992-1998 FreeBSD Inc.
Copyright (c) 1982, 1986, 1989, 1991, 1993
	The Regents of the University of California. All rights reserved.
FreeBSD 3.0-BETA #0: Mon Oct 5 04:49:12 SAT 1998
    nbm@rucus.ru.ac.za:/usr/src/sys/compile/RUCUS
Timecounter "i8254" frequency 1193182 Hz cost 2170 ns
CPU: Pentium/P54C (200.46-MHz 586-class CPU)
  Origin = "GenuineIntel" Id = 0x52c Stepping=12
  Features=0x3bf
real memory = 134217728 (131072K bytes)
avail memory = 127762432 (124768K bytes)
Probing for devices on PCI bus 0:
chip0: rev 0x03 on pci0.0.0
chip1: rev 0x01 on pci0.7.0
ide_pci0: rev 0x00 on pci0.7.1
ahc0: rev 0x00 int a irq 11 on pci0.12.0
ahc0: Using left over BIOS settings
ahc0: aic7880 Wide Channel A, SCSI Id=5, 16/255 SCBs
Probing for devices on the ISA bus:
sc0 at 0x60-0x6f irq 1 on motherboard
sc0: VGA color <16 virtual consoles, flags=0x0>
ed0 at 0x300-0x31f irq 3 on isa
ed0: address 00:00:e8:1c:7b:57, type NE2000 (16 bit)
sio0 at 0x3f8-0x3ff irq 4 flags 0x10 on isa
sio0: type 16550A
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1 not found at 0x2f8
lpt0 at 0x378-0x37f irq 7 on isa
lpt0: Interrupt-driven port
lp0: TCP/IP capable interface
lpt1 not found
mse0 not found at 0x23c
fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
fdc0: FIFO enabled, 8 bytes threshold
fd0: 1.44MB 3.5in
wdc0 at 0x1f0-0x1f7 irq 14 on isa
wdc0: unit 0 (wd0):
wd0: 3020MB (6185088 sectors), 6136 cyls, 16 heads, 63 S/T, 512 B/S
wdc1 not found at 0x170
wt0 not probed due to I/O address conflict with ed0 at 0x300
mcd0 not probed due to I/O address conflict with ed0 at 0x300
matcdc0 not found at 0x230
scd0 not found at 0x230
npx0 on motherboard
npx0: INT 16 interface
Intel Pentium F00F detected, installing workaround
IP packet filtering initialized, divert disabled, rule-based forwarding disabled, unlimited logging
Sending WDTR!
(probe2:ahc0:0:2:0): Sending SDTR!!
sa0 at ahc0 bus 0 target 1 lun 0
sa0: Removable Sequential Access SCSI2 device
sa0: 10.0MB/s transfers (10.0MHz, offset 15)
changing root device to da1s1a
da1 at ahc0 bus 0 target 6 lun 0
da1: Fixed Direct Access SCSI2 device
da1: 10.0MB/s transfers (10.0MHz, offset 15), Tagged Queueing Enabled
da1: 2063MB (4226725 512 byte sectors: 64H 32S/T 2063C)
da0 at ahc0 bus 0 target 2 lun 0
da0: Fixed Direct Access SCSI2 device
da0: 40.0MB/s transfers (20.0MHz, offset 8, 16bit)
da0: 4157MB (8515173 512 byte sectors: 64H 32S/T 4157C)
//---------------------------------------------------------------

Ok, the problem is this:

When we enable SMP support, within any time from an hour to 6 days, we
will die with SCSI errors - of late "SCB timeout handled by another
timeout" I think is the proffered explanation.

The "death" seems to occur quickly after extensive access to the disks,
but it also just dies arbitrarily, usually after the machine has been up
for a few days. It doesn't seem to be specific to any drive failing
either. (we've swapped drives around, etc)

("die" is a technical term here meaning either a reboot just after a
SCSI error pops up for a few seconds, or a hang just after a SCSI error
pops up.)

The motherboard is a GigaByte GA586DX with onboard AIC7880 SCSI
controller. The BIOS has been updated from 1.0 to 3.43 to no avail. I'm
looking for anything to do with the SCSI controller too, but nothing
seems to be out there. We have both 16bit and 8bit devices on it, and it
is terminated correctly according to both the motherboard manual and
tons of testing. (terminator on last device on each SCSI connection, and
high-bit termination on and low-bit termination off on the motherboard).

These errors have been occurring much more often recently, happening
only occasionally about a year ago, and now happening _extremely_ fast
(usually within 4 days, sometimes a whole week) if we have SMP enabled.
We've yet to have the same problem without SMP though.

I realize that it's incredibly likely to be the hardware; I was just
hoping that I'm wrong in this regard, since we're stuck with this
hardware for a few months, and we're kinda used to having huge uptimes,
CPU power, and similar things, compared to the Microsoft house that is
the Information Systems department.

Anyway, any and all help would be appreciated. (although I'll understand
if everyone ignores this for a few days whilst furious coding occurs on
the new release)

Neil
--
Neil Blakey-Milner
nbm@rucus.ru.ac.za