Date: Tue, 14 Dec 1999 05:00:03 -0800 (PST)
From: Ken Harrenstien <klh@netcom.com>
To: freebsd-bugs@FreeBSD.org
Subject: Re: kern/15447: Seagate ST32550 (Barracuda 2LP) may be a broken tagged queueing drive?
Message-ID: <199912141300.FAA09212@freefall.freebsd.org>
The following reply was made to PR kern/15447; it has been noted by GNATS.

From: Ken Harrenstien <klh@netcom.com>
To: "Kenneth D. Merry" <ken@kdm.org>
Cc: klh@netcom.com, freebsd-gnats-submit@FreeBSD.ORG
Subject: Re: kern/15447: Seagate ST32550 (Barracuda 2LP) may be a broken tagged queueing drive?
Date: Tue, 14 Dec 99 4:58:33 PST

> On Sun, Dec 12, 1999 at 06:40:28PM -0800, klh@netcom.com wrote:
> > A separate problem is causing my system to sometimes boot up with
> > tagged queueing enabled and sometimes not. I've recently been stressing
> > the disk significantly more than usual and have encountered user-level
> > I/O errors that I traced back to the enabling of tagged queueing.
> >
> > With tagged queueing off, everything always works. With it on,
> > a heavy load of seeks will cause reads and writes to start failing.
> > I was able to verify this by running a test case on two ST32550s, both
> > on line during the same kernel boot and both identical in all respects
> > except that one had tagged queueing enabled and the other didn't (the
> > randomness of this enabling is a separate problem). The drive without
> > tagging always works perfectly; the drive with tagging always fails
> > at random places during the test. I verified that it is not specific
> > to the individual drives by doing reboots until the formerly tag-enabled
> > drive booted up tag-disabled, whereupon it then performed perfectly
> > again. I also verified that the filesystems were identical by doing
> > a complete track-by-track copy of one to the other prior to testing.
> >
> > The ST32550 is not in the latest quirks table in cam_xpt.c, although
> > several other Seagates are. The Barracuda 2LP was at one time fairly
> > popular so I'm a little surprised this hasn't shown up before, but
> > who knows. Maybe most FreeBSD users have IDE drives.
>
> No, there are many, many people using Seagate drives (including me)
> successfully in FreeBSD systems.
> I think this problem is most likely
> peculiar to your particular system and/or drives.

Agreed. These are surplus drives that appear to be Sun OEM, but those
are also in wide use. If any of the hardware is to be suspected, I would
squint at the AM53C974 or, more properly, its driver. Read on.

> > >Fix:
> > Obviously the ST32550 can be added to the quirks table in cam_xpt.c.
> > I just hope this does not reflect some underlying problem with
> > tagged queueing support of Seagates in general.
>
> Nope, it reflects a problem either with your drives or your cabling and
> termination setup.

I think cabling and termination are highly unlikely to be the problem in
this case; I am familiar with the SCSI requirements and use high-quality
components, active termination, etc. In any case, it's only a fast-10 bus
and there have been no other signs of trouble. It's *only* when the kernel
thinks that tagged queueing is enabled that I start to get user-mode I/O
errors, and then only when doing a lot of long-distance seeks (i.e., when
commands would start piling up).

One of the main things I've been trying to pin down is whether this
problem is specific to the ST32550s or also happens with other drives.
Finally, after several hours of reboots far into the night with various
versions and flags, I struck paydirt and enticed the system to come up
with the Fujitsu M2952 TQ-enabled. Guess what? It behaves just like the
ST32550s: it shows the same problems with tagged queueing enabled, but
works fine otherwise.

I wondered if the problem might be a queue-full condition; the ST32550
manual says it can handle up to 64 commands, while the kernel default is
255 (implying it expects a QUEUE FULL response from the drive). So I tried
adding a quirk entry limiting the Seagate to a maxtags of 63. No luck.
Tried 32. Still no change. Now I'm using 0, which disables tagged queueing
altogether, and things are safe.
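To make the maxtags experiment above concrete, here is a rough model of how a per-drive quirk entry caps the tag count. It is a sketch under stated assumptions, not the kernel's code. The struct `tag_quirk`, the table `tag_quirks`, the function `effective_maxtags()` and the `SEAGATE`/`ST32550*` inquiry patterns are all hypothetical. The real quirk table in cam_xpt.c is more elaborate. Only the 255 default and the "0 disables tagged queueing" convention come from the report.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Kernel default tag count cited in the report. */
#define DEFAULT_MAXTAGS 255u

/*
 * Hypothetical, simplified quirk entry (not the real cam_xpt.c layout):
 * only vendor/product patterns and a tag limit are modelled here.
 */
struct tag_quirk {
	const char *vendor;   /* shell-style pattern for the INQUIRY vendor */
	const char *product;  /* shell-style pattern for the INQUIRY product */
	unsigned    maxtags;  /* 0 disables tagged queueing for the device */
};

static const struct tag_quirk tag_quirks[] = {
	/* Report's final setting; 63 and 32 were also tried without success. */
	{ "SEAGATE", "ST32550*", 0 },
};

/*
 * Return the tag limit for a device: the first matching quirk's maxtags
 * (never above the default), or the default if nothing matches.
 * Vendor/product strings are assumed to have trailing blanks stripped.
 */
static unsigned
effective_maxtags(const char *vendor, const char *product)
{
	size_t i;

	for (i = 0; i < sizeof(tag_quirks) / sizeof(tag_quirks[0]); i++) {
		const struct tag_quirk *q = &tag_quirks[i];

		if (fnmatch(q->vendor, vendor, 0) == 0 &&
		    fnmatch(q->product, product, 0) == 0)
			return q->maxtags < DEFAULT_MAXTAGS ?
			    q->maxtags : DEFAULT_MAXTAGS;
	}
	return DEFAULT_MAXTAGS;
}
```

Changing the entry's 0 to 63 or 32 models the other two limits tried in the report. Any drive without a matching entry, such as the Fujitsu, keeps the 255 default.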
> You need to supply some more information before we can make any sort of
> guess at what is going on. So, please send (and make sure you do a "group"
> reply to this mail, so it winds up in the PR database) full 'dmesg' output
> from your system, including any kernel messages that have shown up while
> doing your tests.
>
> Please don't send the output of /var/log/messages, unless it is necessary
> to show problems that happened in a previous boot. The output of dmesg(8)
> is easier to read.

Done; see my response to kern/15446.

> Also, please send a description of your cabling and termination setup.

    #1 ---- #7 ---- #0 ---- #2 ---- #3 ---- TERM
    DPES    amd0    ST32550 ST32550 M2952
    (term)

#1 is internal; #0, #2, and #3 are external.

Because of your statement that the ST32550 is known to work, and the fact
that my Fujitsu was failing in the same way, I don't think the drives are
at fault. So we're left with either the controller, FreeBSD 3.1's support
of it, or something else. The controller itself seems unlikely, since
tagged queueing is a higher-level protocol and there's no reason to suspect
either the physical bus or the link-level protocol (otherwise many more
problems would have shown up).

One more data point: I use the same kernel source base in another system
(NCR 53c895, 3 IBM drives) where all drives are TQ-enabled, and they have
never had problems despite much heavier usage.

I'm starting to think that whatever is causing the kernel to be erratic
about whether or not to use tagged queueing (cf. kern/15446) may also be
responsible for tagged queueing failing to work properly.

In any case, since the ST32550 is no longer a suspect, I suggest that this
bug (kern/15447) be closed and the above information be made a follow-up
to kern/15446.

--Ken