Date:      Mon, 14 Dec 1998 12:43:11 -0500 (EST)
From:      spork <spork@super-g.com>
To:        "Kenneth D. Merry" <ken@plutotech.com>
Cc:        freebsd-scsi@FreeBSD.ORG, freebsd-stable@FreeBSD.ORG
Subject:   Re: CAM and -stable
Message-ID:  <Pine.BSF.4.00.9812141207050.6123-100000@super-g.inch.com>
In-Reply-To: <199812141657.JAA54340@panzer.plutotech.com>

On Mon, 14 Dec 1998, Kenneth D. Merry wrote:

> spork wrote...
> 
> [ sorry for not responding to your previous message.  I thought Justin
> would respond, but evidently he never got around to it. ]

No problem, I was a little nervous, that's all...

> Generally, it's a sign that the device has gone "out to lunch", and we have
> to whap it over the head with a BDR to get it to wake up.

Are you aware of anyone else running a CMD controller?  I know wcarchive
is using the Mylex box.  I hope I've made a good choice...
 
> You also have another problem, which wasn't evident in your earlier mail.
> The tagged openings on one of your RAID partitions have gone down to 2.
> That indicates that the device keeps sending queue full until we reduce the
> number of tagged openings to the lowest possible value (2).  I would
> suggest looking in the CMD docs, and try to figure out if they say how many
> simultaneous transactions the device can handle.  Take that number, divide
> it by 2 (you've got two partitions on the device), and make that the
> maximum number of tags in a quirk entry in the transport layer.  Make the
> minimum number of tags something slightly less than that.

How and where do I set the number of tags?  The docs
(http://www.cmd.com/storage/products/docs/datasheet/crd5440.cfm) say that
the unit can queue 64 commands.  There are two hosts on this controller in
our installation.  I'm no SCSI genius; are 'tags' and 'queued commands'
the same thing?
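
If I'm reading the suggestion right, the arithmetic would go something
like the sketch below.  The 64-command figure is from the CMD datasheet
and the floor of 2 is from your note; splitting the queue evenly between
the two hosts, the function name tag_budget, and the headroom of 4
between the minimum and maximum are all my own assumptions, not anything
from the CAM code or the CMD docs.

```python
def tag_budget(queue_depth, hosts, luns_per_host, headroom=4):
    """Split a controller's command queue evenly across hosts and LUNs.

    Returns (mintags, maxtags) for one LUN as seen from one host.
    The even split and the default headroom are assumptions.
    """
    if queue_depth <= 0 or hosts <= 0 or luns_per_host <= 0:
        raise ValueError("all counts must be positive")
    # Each host/LUN pair gets an equal share of the controller's queue.
    maxtags = queue_depth // (hosts * luns_per_host)
    # Never go below 2, the lowest number of tagged openings mentioned above.
    maxtags = max(maxtags, 2)
    # "Slightly less" than the maximum, but again never below 2.
    mintags = max(maxtags - headroom, 2)
    return mintags, maxtags


# CRD-5440: 64 queued commands, two hosts, two RAID partitions per host.
mintags, maxtags = tag_budget(queue_depth=64, hosts=2, luns_per_host=2)
print(f"mintags={mintags} maxtags={maxtags}")
```

With those inputs I get a maximum of 16 and a minimum of 12 per
partition.  If the two hosts don't actually share the one queue, the
same function with hosts=1 gives 32 and 28 instead.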
 
> Generally, the system will recover all right from the 'timed out while
> idle' problem.  After we hit the device with a BDR, all the CCBs that have
> already been sent to the device are aborted, and we requeue them all.

Yep, it's only frozen up completely once.  Is your feeling that if I adjust
it to use fewer tags this should go away?  I checked the CMD website, and
there is one more firmware update, but it doesn't address any serious
issues.
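
To keep an eye on how often this happens while I test, I could tally
the timeouts and BDRs from the logs with something like the sketch
below.  The two patterns are modeled only on the console messages
quoted further down in this mail; the names TIMEOUT_RE, BDR_RE and
summarize are mine, and other drivers or CAM versions may well word
these messages differently.

```python
import re

# Patterns modeled on the ahc console messages quoted later in this mail.
TIMEOUT_RE = re.compile(
    r"\((\w+):(\w+):\d+:\d+:\d+\): SCB (0x[0-9a-f]+) - timed out")
BDR_RE = re.compile(r"(\w+): Bus Device Reset Sent\. (\d+) SCBs aborted")


def summarize(lines):
    """Count timeouts per device and SCBs aborted by BDRs per controller."""
    timeouts = {}
    aborted = {}
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            dev = m.group(1)  # e.g. "da0"
            timeouts[dev] = timeouts.get(dev, 0) + 1
            continue
        m = BDR_RE.search(line)
        if m:
            ctrl = m.group(1)  # e.g. "ahc0"
            aborted[ctrl] = aborted.get(ctrl, 0) + int(m.group(2))
    return timeouts, aborted


sample = [
    "Dec 10 18:13:15 shell /kernel: (da0:ahc0:0:0:0): SCB 0x1e - "
    "timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0",
    "Dec 10 18:13:18 shell /kernel: (da0:ahc0:0:0:0): Queuing a BDR SCB",
    "Dec 10 18:13:18 shell /kernel: ahc0: Bus Device Reset Sent. "
    "2 SCBs aborted",
]
print(summarize(sample))
```

Feeding it an open log file instead of the sample list should work the
same way, since it only iterates over lines.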

And one last thing, will there ever be a newer patchkit for CAM under
stable?  My timing is horrible; I need to put these machines in production
in the next week or so, so I can't really go to 3.x right now...  In 6
months or so, sure, but these will probably remain at 2.2 for quite some
time.  Are there major changes from what I'm running in the 3.x tree?

Thanks very much,

Charles

ps- if anyone feels the crosspost to -stable and -scsi is unneeded, feel
free to remove the less appropriate list...

> 
> > On Fri, 11 Dec 1998, spork wrote:
> > 
> > > Hi,
> > > 
> > > I'm about to put two new machines in production, and they're both "core"
> > > machines; main dns/auth/mail and a shell machine.  Currently the machines
> > > we use in this capacity are running 2.1.7.1, and they've been very stable.
> > > 
> > > Now the new machines share a RAID array hung off of a CMD CRD-5440.  I
> > > patched our usual build (980825 -stable) with the July CAM patchkit, as
> > > the existing AHC driver couldn't detect any LUNs beyond the first one.
> > > 
> > > All has been well so far, I've tried to stress the machines as much as
> > > possible by running some disk benchmarks over and over, but yesterday one
> > > locked up (console frozen) with the following messages being the last
> > > thing on the console:
> > > 
> > > Dec 10 18:13:15 shell /kernel: (da0:ahc0:0:0:0): SCB 0x1e - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
> > > Dec 10 18:13:18 shell /kernel: SEQADDR == 0xa
> > > Dec 10 18:13:18 shell /kernel: SSTAT1 == 0xb
> > > Dec 10 18:13:18 shell /kernel: (da0:ahc0:0:0:0): Queuing a BDR SCB
> > > Dec 10 18:13:18 shell /kernel: (da0:ahc0:0:0:0): Bus Device Reset Message Sent
> > > Dec 10 18:13:18 shell /kernel: (da0:ahc0:0:0:0): no longer in timeout, status = 34b
> > > Dec 10 18:13:18 shell /kernel: ahc0: Bus Device Reset Sent. 2 SCBs aborted
> > > 
> > > I had to give it a hard reset at this point.
> > > 
> > > So my questions are:  Is this a known issue?  Does it point to a possible
> > > hardware problem?  Will there be a newer cam patchkit for -stable?
> > > 
> > > I don't think it's a cabling issue, as this is the first I've seen of any
> > > anomalies with the SCSI subsystem, and the only cabling in question here
> > > is a high quality 2' external UW scsi cable from the back of this machine
> > > to the RAID array.  The other machine that uses the other host port on the
> > > RAID array remained functional during this glitch...
> > > 
> > > Any ideas?  I was very comfortable with CAM before, but now I'm a little
> > > nervous about moving this into production.  Would it be better to try and
> > > back out of the patches and use the ahc driver?  Let me know if there's
> > > any other info needed.
> > > 
> > > Following are the boot messages...
> > > 
> > > Thanks,
> > > 
> > > Charles
> > > 
> > > Dec 10 19:27:32 shell /kernel: Copyright (c) 1992-1998 FreeBSD Inc.
> > > Dec 10 19:27:32 shell /kernel: Copyright (c) 1982, 1986, 1989, 1991, 1993
> > > Dec 10 19:27:32 shell /kernel: The Regents of the University of California.  All rights reserved.
> > > Dec 10 19:27:32 shell /kernel: 
> > > Dec 10 19:27:32 shell /kernel: FreeBSD 2.2.7-19980825-SNAP #0: Thu Dec 10 12:02:45 EST 1998
> > > Dec 10 19:27:32 shell /kernel: spork@shell.inch.com:/usr/src/sys/compile/SHELL
> > > Dec 10 19:27:32 shell /kernel: CPU: Pentium II (quarter-micron) (350.80-MHz 686-class CPU)
> > > Dec 10 19:27:32 shell /kernel: Origin = "GenuineIntel"  Id = 0x651  Stepping=1
> > > Dec 10 19:27:32 shell /kernel: Features=0x183f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,<b16>,<b17>,MMX,<b24>>
> > > Dec 10 19:27:32 shell /kernel: real memory  = 268435456 (262144K bytes)
> > > Dec 10 19:27:32 shell /kernel: avail memory = 261144576 (255024K bytes)
> > > Dec 10 19:27:32 shell /kernel: Probing for devices on PCI bus 0:
> > > Dec 10 19:27:32 shell /kernel: chip0 <generic PCI bridge (vendor=8086 device=7190 subclass=0)> rev 2 on pci0:0:0
> > > Dec 10 19:27:32 shell /kernel: chip1 <generic PCI bridge (vendor=8086 device=7191 subclass=4)> rev 2 on pci0:1:0
> > > Dec 10 19:27:32 shell /kernel: chip2 <Intel 82371AB PCI-ISA bridge> rev 2 on pci0:4:0
> > > Dec 10 19:27:32 shell /kernel: chip3 <Intel 82371AB IDE interface> rev 1 on pci0:4:1
> > > Dec 10 19:27:32 shell /kernel: chip4 <Intel 82371AB USB interface> rev 1 int d irq 12 on pci0:4:2
> > > Dec 10 19:27:32 shell /kernel: chip5 <Intel 82371AB Power management controller> rev 2 on pci0:4:3
> > > Dec 10 19:27:32 shell /kernel: fxp0 <Intel EtherExpress Pro 10/100B Ethernet> rev 5 int a irq 10 on pci0:7:0
> > > Dec 10 19:27:32 shell /kernel: fxp0: Ethernet address 00:e0:18:90:36:4d
> > > Dec 10 19:27:32 shell /kernel: ahc0 <Adaptec 2940 Ultra SCSI adapter> rev 1 int a irq 12 on pci0:9:0
> > > Dec 10 19:27:32 shell /kernel: ahc0: aic7880 Wide Channel A, SCSI Id=7, 16/255 SCBs
> > > Dec 10 19:27:32 shell /kernel: fxp1 <Intel EtherExpress Pro 10/100B Ethernet> rev 5 int a irq 10 on pci0:10:0
> > > Dec 10 19:27:32 shell /kernel: fxp1: Ethernet address 00:a0:c9:e7:ac:7d
> > > Dec 10 19:27:32 shell /kernel: vga0 <VGA-compatible display device> rev 211 int a irq 11 on pci0:11:0
> > > Dec 10 19:27:32 shell /kernel: Probing for devices on PCI bus 1:
> > > Dec 10 19:27:32 shell /kernel: Probing for devices on the ISA bus:
> > > Dec 10 19:27:32 shell /kernel: sc0 at 0x60-0x6f irq 1 on motherboard
> > > Dec 10 19:27:32 shell /kernel: sc0: VGA color <16 virtual consoles, flags=0x0>
> > > Dec 10 19:27:32 shell /kernel: sio0 at 0x3f8-0x3ff irq 4 on isa
> > > Dec 10 19:27:32 shell /kernel: sio0: type 16550A
> > > Dec 10 19:27:32 shell /kernel: sio1 at 0x2f8-0x2ff irq 3 on isa
> > > Dec 10 19:27:32 shell /kernel: sio1: type 16550A
> > > Dec 10 19:27:32 shell /kernel: lpt0 at 0x378-0x37f irq 7 on isa
> > > Dec 10 19:27:32 shell /kernel: lpt0: Interrupt-driven port
> > > Dec 10 19:27:32 shell /kernel: lp0: TCP/IP capable interface
> > > Dec 10 19:27:32 shell /kernel: fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
> > > Dec 10 19:27:32 shell /kernel: fdc0: FIFO enabled, 8 bytes threshold
> > > Dec 10 19:27:32 shell /kernel: fd0: 1.44MB 3.5in
> > > Dec 10 19:27:32 shell /kernel: npx0 flags 0x1 on motherboard
> > > Dec 10 19:27:32 shell /kernel: npx0: INT 16 interface
> > > Dec 10 19:27:32 shell /kernel: IP packet filtering initialized, divert enabled, logging limited to 200 packets/entry
> > > Dec 10 19:27:32 shell /kernel: da0 at ahc0 bus 0 target 0 lun 0
> > > Dec 10 19:27:32 shell /kernel: da0: <CMD TECH CRD-5440-1 C1-5> Fixed Direct Access SCSI2 device 
> > > Dec 10 19:27:32 shell /kernel: da0: 40.0MB/s transfers (20.0MHz, offset 8, 16bit), Tagged Queueing Enabled
> > > Dec 10 19:27:32 shell /kernel: da0: 6999MB (14335872 512 byte sectors: 64H 32S/T 6999C)
> > > Dec 10 19:27:32 shell /kernel: da1 at ahc0 bus 0 target 0 lun 1
> > > Dec 10 19:27:32 shell /kernel: da1: <CMD TECH CRD-5440-1 C1-5> Fixed Direct Access SCSI2 device 
> > > Dec 10 19:27:32 shell /kernel: da1: 40.0MB/s transfers (20.0MHz, offset 8, 16bit), Tagged Queueing Enabled
> > > Dec 10 19:27:32 shell /kernel: da1: 10431MB (21362688 512 byte sectors: 64H 32S/T 10431C)
> > > Dec 10 19:27:32 shell /kernel: WARNING: / was not properly dismounted.
> > > Dec 10 19:27:32 shell /kernel: nfs server 10.0.0.1:/var/mail: not responding
> > > Dec 10 19:27:32 shell savecore: no core dump
> > > 
> > > ---
> > > Charles Sprickman
> > > spork@super-g.com
> > > --- 
> > >                      "...there's no idea that's so good you can't 
> > >                       ruin it with a few well-placed idiots." 
> > > 
> > > 
> > > To Unsubscribe: send mail to majordomo@FreeBSD.org
> > > with "unsubscribe freebsd-scsi" in the body of the message
> > > 
> > 
> 
> 
> Ken
> -- 
> Kenneth Merry
> ken@plutotech.com
> 


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message


