Date:      Mon, 23 Aug 1999 10:15:46 -0400 (EDT)
From:      Chris Tracy <ncrawler@telerama.com>
To:        freebsd-SCSI@freebsd.org
Cc:        Chris Tracy <ncrawler@telerama.com>
Subject:   weird SCSI problems...
Message-ID:  <Pine.BSF.3.96.990823095859.28594I-100000@gauntlet.telerama.com>

Hiyah..

I'm not subscribed to the freebsd-SCSI mailing list, so if anyone has any
info about this, it'd be great if you could CC me in...  Anyways... Here's
the problem:

I'm using an Intel Pentium II 350MHz based machine, with a BusLogic SCSI
card in it.  The machine used to run the 2.2.x branch, and I recently
wiped the drives and installed the 3.2-19990803-STABLE release.

For the most part, the machine runs great.  However, this is the machine
that runs Amanda for us, and one day, while I was running Amanda, I got
the following error messages right before the machine crashed...  Here's a
transcript of exactly what happened:


-----------------------
% amcheck lm
Amanda Tape Server Host Check
-----------------------------
/usr/home/holding-disk: 2178706 KB disk space available, using 2076306 KB.
NOTE: skipping tape-writable test.
Tape Telerama01 label ok.
Server check took 9.904 seconds.

Amanda Backup Client Hosts Check
--------------------------------
WARNING: kappa.webnz.net: selfcheck request timed out.  Host down?
Client check: 9 hosts checked in 29.163 seconds, 1 problem found.

(brought to you by Amanda 2.4.1p1)
% amflush -f lm
Scanning /usr/home/holding-disk...
  19990818: found non-empty Amanda directory.

Flushing dumps in 19990818,
today: 19990818
to tape drive /dev/nrsa0.
Expecting tape Telerama01 or a new tape.  (The last dumps were to tape Telerama10)
Are you sure you want to do this? y
driver: send-cmd time 0.011 to taper: START-TAPER 19990818
taper: pid 1390 executable taper version 2.4.1p1
taper: read label `Telerama01' date `19990718'
Aug 18 22:13:04 qbert /kernel: (da0:bt0:0:0:0): CCB 0xc5502f00 - timed out
Aug 18 22:13:04 qbert /kernel: (da0:bt0:0:0:0): CCB 0xc5502f00 - timed out
Aug 18 22:13:21 qbert /kernel: (da0:bt0:0:0:0): CCB 0xc5502f00 - timed out
Aug 18 22:13:21 qbert /kernel: bt0: No longer in timeout
Aug 18 22:13:21 qbert /kernel: (da0:bt0:0:0:0): CCB 0xc5502f00 - timed out
Aug 18 22:13:21 qbert /kernel: bt0: No longer in timeout
Aug 18 22:13:21 qbert /kernel: (sa0:bt0:0:2:0): WRITE(06). CDB: a 0 0 80 0 0
% Aug 18 22:13:21 qbert /kernel: (sa0:bt0:0:2:0): WRITE(06). CDB: a 0 0 80 0 0
Aug 18 22:13:21 qbert /kernel: (sa0:bt0:0:2:0): UNIT ATTENTION asc:29,0
Aug 18 22:13:21 qbert /kernel: (sa0:bt0:0:2:0): UNIT ATTENTION asc:29,0
Aug 18 22:13:21 qbert /kernel: (sa0:bt0:0:2:0): Power on, reset, or bus device reset occurred
Aug 18 22:13:21 qbert /kernel: (sa0:bt0:0:2:0): Power on, reset, or bus device reset occurred
%
% cd /usr/home/hold

^C
Cannot create dfWAA01394: Device not configured
queueup: cannot create data temp file dfWAA01394, uid=0: Device not configured
zsh: segmentation fault  su operator
qbert#
qbert# Aug 18 22:14:21 qbert /kernel: (da0:bt0:0:0:0): CCB 0xc55024c0 - timed out
Aug 18 22:14:21 qbert /kernel: (da0:bt0:0:0:0): CCB 0xc55024c0 - timed out
Aug 18 22:15:01 qbert /kernel: (da0:bt0:0:0:0): CCB 0xc55024c0 - timed out
Aug 18 22:15:01 qbert /kernel: (da0:bt0:0:0:0): CCB 0xc55024c0 - timed out
Aug 18 22:15:01 qbert /kernel: bt0: No longer in timeout
Aug 18 22:15:01 qbert /kernel: bt0: No longer in timeout
Aug 18 22:15:01 qbert /kernel: (da0:bt0:0:0:0): Invalidating pack
Aug 18 22:15:01 qbert /kernel: (da0:bt0:0:0:0): Invalidating pack
Aug 18 22:15:01 qbert last message repeated 15 times
Aug 18 22:15:01 qbert /kernel: spec_getpages: I/O read failure: (error code=6)
Aug 18 22:15:01 qbert last message repeated 15 times
Aug 18 22:15:01 qbert /kernel: spec_getpages: I/O read failure: (error code=6)
Aug 18 22:15:01 qbert /kernel: size: 4096, resid: 4096, a_count: 4096, valid: 0x0
Aug 18 22:15:01 qbert /kernel: size: 4096, resid: 4096, a_count: 4096, valid: 0x
Aug 18 22:15:01 qbert /kernel: nread: 0, reqpage: 0, pindex: 57, pcount: 1
Aug 18 22:15:01 qbert /kernel: nread: 0, reqpage: 0, pindex: 57, pcount: 1
Aug 18 22:15:01 qbert /kernel: vm_fault: pager read error, pid 1383 (csh)
Aug 18 22:15:01 qbert /kernel: vm_fault: pager read error, pid 1383 (csh)
Aug 18 22:15:01 qbert /kernel: (da0:bt0:0:0:0): Invalidating pack
Aug 18 22:15:01 qbert /kernel: (da0:bt0:0:0:0): Invalidating pack
Aug 18 22:15:01 qbert sendmail[1401]: NOQUEUE: SYSERR(root): queuename: Cannot create "qfWAA01401" in "/var/spool/mqueue" (euid=0): Device not configured
Aug 18 22:15:01 qbert sendmail[1401]: NOQUEUE: SYSERR(root): queuename: Cannot create "qfWAA01401" in "/var/spool/mqueue" (euid=0): Device not configured
Aug 18 22:15:01 qbert sendmail[1394]: WAA01394: SYSERR(operator): Cannot create dfWAA01394: Device not configured
Aug 18 22:15:01 qbert sendmail[1394]: WAA01394: SYSERR(operator): Cannot create dfWAA01394: Device not configured
Aug 18 22:15:01 qbert sendmail[1394]: WAA01394: SYSERR(operator): queueup: cannot create data temp file dfWAA01394, uid=0: Device not configured
Aug 18 22:15:01 qbert sendmail[1394]: WAA01394: SYSERR(operator): queueup: cannot create data temp file dfWAA01394, uid=0: Device not configured
Aug 18 22:15:01 qbert sendmail[1394]: WAA01394: SYSERR(operator): queueup: cannot create data temp file dfWAA01394, uid=0: Device not configured
Aug 18 22:15:11 qbert sshd[1395]: log: ROOT LOGIN as 'root' from gauntlet.telerama.com
Aug 18 22:15:11 qbert /kernel: vm_fault: pager read error, pid 1403 (zsh)
Aug 18 22:15:11 qbert /kernel: vm_fault: pager read error, pid 1403 (zsh)
Aug 18 22:15:11 qbert sshd[1395]: fatal: Local: Command terminated on signal 11.
Aug 18 22:15:11 qbert sshd[1395]: fatal: Local: Command terminated on signal 11.
Aug 18 22:15:19 qbert sshd[1404]: log: ROOT LOGIN as 'root' from gauntlet.telerama.com
Aug 18 22:15:19 qbert /kernel: vm_fault: pager read error, pid 1406 (zsh)
Aug 18 22:15:19 qbert /kernel: vm_fault: pager read error, pid 1406 (zsh)
Aug 18 22:15:19 qbert sshd[1404]: fatal: Local: Command terminated on signal 11.
Aug 18 22:15:19 qbert sshd[1404]: fatal: Local: Command terminated on signal 11.

qbert#
qbert#
qbert#
qbert# shutdown -r now
zsh: Input/output error: shutdown
Aug 18 22:16:36 qbert /kernel: spec_getpages: I/O read failure: (error code=6)
qbert# Aug 18 22:16:36 qbert /kernel: spec_getpages: I/O read failure: (error code=6)
Aug 18 22:16:36 qbert /kernel: size: 65536, resid: 65536, a_count: 65536, valid: 0x0
Aug 18 22:16:36 qbert /kernel: size: 65536, resid: 65536, a_count: 65536, valid: 0x0
Aug 18 22:16:36 qbert /kernel: nread: 0, reqpage: 0, pindex: 0, pcount: 16
Aug 18 22:16:36 qbert /kernel: nread: 0, reqpage: 0, pindex: 0, pcount: 16
Aug 18 22:16:36 qbert /kernel: spec_getpages: I/O read failure: (error code=6)
Aug 18 22:16:36 qbert /kernel: spec_getpages: I/O read failure: (error code=6)
Aug 18 22:16:36 qbert /kernel: size: 65536, resid: 65536, a_count: 65536, valid: 0x0
Aug 18 22:16:36 qbert /kernel: size: 65536, resid: 65536, a_count: 65536, valid: 0x0
Aug 18 22:16:36 qbert /kernel: nread: 0, reqpage: 0, pindex: 0, pcount: 16
Aug 18 22:16:36 qbert /kernel: nread: 0, reqpage: 0, pindex: 0, pcount: 16
--------------------------


So basically it looks like some part of our SCSI bus is failing
hardcore...
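
One way to see at a glance which devices are complaining is to tally the
kernel messages per device.  This is only a minimal Python sketch: the
`tally` helper and the `<addr>` placeholder are illustrative names, not part
of any FreeBSD tool, and it assumes each log message sits on a single line
(wrapped lines joined back together).  The sample lines are taken from the
transcript above.

```python
import re
from collections import Counter

# Matches the CAM device prefix seen in the transcript,
# e.g. "/kernel: (da0:bt0:0:0:0): CCB 0xc5502f00 - timed out".
DEVICE_MSG = re.compile(r"/kernel: \((\w+):\w+:\d+:\d+:\d+\): (.*)$")
# CCB addresses differ between messages, so mask them before counting.
HEX_ADDR = re.compile(r"0x[0-9a-fA-F]+")


def tally(lines):
    """Count kernel messages per (device, message) pair."""
    counts = Counter()
    for line in lines:
        match = DEVICE_MSG.search(line)
        if match is None:
            continue  # not a per-device kernel message (e.g. "bt0: ...")
        device, message = match.groups()
        counts[(device, HEX_ADDR.sub("<addr>", message))] += 1
    return counts


sample = [
    "Aug 18 22:13:04 qbert /kernel: (da0:bt0:0:0:0): CCB 0xc5502f00 - timed out",
    "Aug 18 22:13:21 qbert /kernel: (da0:bt0:0:0:0): CCB 0xc5502f00 - timed out",
    "Aug 18 22:13:21 qbert /kernel: bt0: No longer in timeout",
    "Aug 18 22:13:21 qbert /kernel: (sa0:bt0:0:2:0): UNIT ATTENTION asc:29,0",
    "Aug 18 22:15:01 qbert /kernel: (da0:bt0:0:0:0): Invalidating pack",
]

for (device, message), n in sorted(tally(sample).items()):
    print(f"{device}: {message} (x{n})")
```

Masking the CCB address keeps repeated timeouts on the same device in one
bucket, which makes it easier to see that da0 is where the timeouts start.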

Here are the results of the 'dmesg' command so everyone can see exactly how
this machine's hardware is configured.....


--------------------------
Copyright (c) 1992-1999 FreeBSD Inc.
Copyright (c) 1982, 1986, 1989, 1991, 1993
        The Regents of the University of California. All rights reserved.
FreeBSD 3.2-19990803-STABLE #0: Wed Aug 18 13:55:26 EDT 1999
    root@qbert.telerama.com:/usr/src/sys/compile/QBERT
Timecounter "i8254"  frequency 1193182 Hz
CPU: Pentium II (299.75-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x634  Stepping = 4

Features=0x80fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,MMX>
real memory  = 134217728 (131072K bytes)
config> di zp0
config> di ze0
config> di lnc0
config> di le0
config> di ie0
config> di fe0
config> di ep0
config> di ed0
config> di cs0
config> di wt0
config> di scd0
config> di mcd0
config> di matcdc0
config> di aha0
config> di adv0
config> q
avail memory = 126984192 (124008K bytes)
Preloaded elf kernel "kernel" at 0xc036b000.
Preloaded userconfig_script "/boot/kernel.conf" at 0xc036b09c.
Probing for devices on PCI bus 0:
chip0: <Intel 82443LX host to PCI bridge> rev 0x03 on pci0.0.0
chip1: <Intel 82443LX PCI-PCI bridge> rev 0x03 on pci0.1.0
chip2: <Intel 82371AB PCI to ISA bridge> rev 0x01 on pci0.7.0
ide_pci0: <Intel PIIX4 Bus-master IDE controller> rev 0x01 on pci0.7.1
chip3: <Intel 82371AB Power management controller> rev 0x01 on pci0.7.3
bt0: <Buslogic Multi-Master SCSI Host Adapter> rev 0x08 int a irq 11 on pci0.11.0
bt0: BT-958 FW Rev. 5.07B Ultra Wide SCSI Host Adapter, SCSI ID 7, 192 CCBs
fxp0: <Intel EtherExpress Pro 10/100B Ethernet> rev 0x05 int a irq 9 on pci0.12.0
fxp0: Ethernet address 00:a0:c9:db:03:18
Probing for devices on PCI bus 1:
Probing for PnP devices:
Probing for devices on the ISA bus:
sc0 on isa
sc0: VGA color <16 virtual consoles, flags=0x0>
atkbdc0 at 0x60-0x6f on motherboard
atkbd0 irq 1 on isa
psm0 not found
sio0 at 0x3f8-0x3ff irq 4 flags 0x10 on isa
sio0: type 16550A
sio1 at 0x2f8-0x2ff irq 3 on isa
sio1: type 16550A
fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
fdc0: FIFO enabled, 8 bytes threshold
fd0: 1.44MB 3.5in
wdc0 not found at 0x1f0
wdc1 not found at 0x170
ppc0 at 0x378 irq 7 flags 0x40 on isa
ppc0: Generic chipset (ECP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/8 bytes threshold
lpt0: <generic printer> on ppbus 0
lpt0: Interrupt-driven port
ppi0: <generic parallel i/o> on ppbus 0
plip0: <PLIP network interface> on ppbus 0
ex0 not found
bt: unit number (1) too high
bt1 not found at 0x330
vga0 at 0x3b0-0x3df maddr 0xa0000 msize 131072 on isa
npx0 on motherboard
npx0: INT 16 interface
Waiting 15 seconds for SCSI devices to settle
sa0 at bt0 bus 0 target 2 lun 0
sa0: <HP C1533A A708> Removable Sequential Access SCSI-2 device 
sa0: 10.000MB/s transfers (10.000MHz, offset 15)
da1 at bt0 bus 0 target 1 lun 0
da1: <SEAGATE ST34573W 5698> Fixed Direct Access SCSI-2 device 
da1: 20.000MB/s transfers (10.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da1: 4340MB (8888924 512 byte sectors: 255H 63S/T 553C)
da0 at bt0 bus 0 target 0 lun 0
da0: <SEAGATE ST34572W 0876> Fixed Direct Access SCSI-2 device 
da0: 20.000MB/s transfers (10.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da0: 4340MB (8888924 512 byte sectors: 255H 63S/T 553C)
changing root device to da0s1a
WARNING: / was not properly dismounted
-----


As you can see, this machine has 3 SCSI devices: targets 0 and 1 are
internal 4GB Seagate Barracudas, and target 2 is our external HP tape drive.
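
The target layout can also be read straight out of the dmesg attach lines.
A minimal Python sketch, assuming the "NAME at ADAPTER bus B target T lun L"
format shown in the dmesg output above; `scsi_targets` is a hypothetical
helper name, not an existing utility.

```python
import re

# Attach lines as they appear in the dmesg above, e.g.
# "sa0 at bt0 bus 0 target 2 lun 0".
ATTACH = re.compile(r"^(\w+) at (\w+) bus (\d+) target (\d+) lun (\d+)$")


def scsi_targets(dmesg_lines):
    """Map SCSI target ID -> (device name, adapter name)."""
    targets = {}
    for line in dmesg_lines:
        match = ATTACH.match(line.strip())
        if match:
            name, adapter, _bus, target, _lun = match.groups()
            targets[int(target)] = (name, adapter)
    return targets


dmesg = [
    "atkbdc0 at 0x60-0x6f on motherboard",
    "sa0 at bt0 bus 0 target 2 lun 0",
    "sa0: <HP C1533A A708> Removable Sequential Access SCSI-2 device",
    "da1 at bt0 bus 0 target 1 lun 0",
    "da0 at bt0 bus 0 target 0 lun 0",
]

for target, (name, adapter) in sorted(scsi_targets(dmesg).items()):
    print(f"target {target}: {name} on {adapter}")
```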

So anyways, I am pretty convinced it is either the card, one of the
SCSI devices, or maybe even a bug in the BusLogic SCSI driver (I doubt it,
but who knows..heheh).  From the error messages, it looks as though
something on the SCSI bus reset itself.  Could this be a termination
problem?  I've double-checked all of our termination, and it seems to be
OK...

If anyone has any suggestions, even on things to try, I'd appreciate it!
FYI -- this particular problem has only happened once.  The machine HAS
been working OK since this happened, but I'm convinced it could happen
again...

Like I said, I'm not subscribed to this list, so please CC me in any
response..  I will check back to the mailing list anyways even if I don't
hear anything in e-mail.  

Thanks in advance!!!
-Chris
