Date:      Sat, 15 Apr 1995 09:12:25 -0500 (CDT)
From:      Mike Pritchard <pritc003@maroon.tc.umn.edu>
To:        rlenk@xmission.com (Ron Lenk)
Cc:        questions@FreeBSD.org
Subject:   Re: SCSI timeout...
Message-ID:  <199504151412.JAA06761@mpp.com>
In-Reply-To: <199504150141.TAA04738@xmission.xmission.com> from "Ron Lenk" at Apr 14, 95 07:41:30 pm

> I had run the 2.0-920210-SNAP from the for approx. 3 weeks, at which
> point, about the first week in April, I compiled from the "current"
> sources. Everything ran fine until the first of this week when I added
> the SyQuest drive to the SCSI bus. I was able to fdisk, disklabel, and
> create a new filesystem on the drive, and things worked great under
> _light_ use. However, when I attempted to copy the contents of
> /usr/src ( 110 Mb, as we all know ) to the SyQuest mounted on /mnt, I
> got about half way through the copy when I began getting a kernel
> message that sd0 ( the 1.05 Micropolis disk ) had timed out. I got the
> message about 8 times, and then the system appeared to hang. After
> doing a hard reset, I found the root filesystem (sd0a) was damaged
> beyond repair ( by me, anyway ), and I was forced to attempt to
> reinstall everything

I just ran into the same problem a few days ago with an Adaptec 2842VL SCSI
controller and two SCSI disks, running -current.  I can reproduce the
problem at will by doing something like:

find /disk1 -print > /dev/null &
[run about 6 of the above commands in the background]
find /disk2 -print > /dev/null &
[run a couple of the above commands in the background]
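The steps above could be wrapped in a small script, roughly like this.
It is only a sketch: the function name start_finds is made up for
illustration, the counts (6 and 2) are taken loosely from "about 6" and
"a couple", and it is written for a modern POSIX sh rather than tested
on the system described here.

```shell
#!/bin/sh
# start_finds DIR COUNT  (hypothetical helper, not part of any system)
# Launch COUNT background finds that walk DIR and discard their output.
start_finds() {
    dir=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        find "$dir" -print > /dev/null 2>&1 &
        i=$((i + 1))
    done
}

# Usage, with the mount points from the message (assumed to exist):
#   start_finds /disk1 6
#   start_finds /disk2 2
#   wait
```

The usage lines are left as comments so that nothing runs against real
disks until the mount points have been checked.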

After about 30 seconds or so you will start to see timeouts.  The data
corruption you saw was probably because, after a while, the I/O that
is timing out gets completed with garbage.
E.g. the finds start printing stuff like
/usr/src/sys/AAD,kjhet2@#$098 not found
only with lots of really strange characters in the filename.

I suspect that this might be related to writes on the disk,
since I've also seen this happen if I do something like:

cd /disk1
touch a b c d e
cd /disk2
[start doing some I/O on disk2]
[after a bit you might see timeouts]

If I do a sync before doing the cd /disk2, then there isn't
a problem.  I suspect that the finds die out when sync runs
to flush the i-nodes back to disk to update the directory access times.
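The sync workaround described above might be scripted along these
lines.  This is a sketch under assumptions: touch_and_sync is a
made-up name, and whether a sync at this point avoids the timeouts in
every case is not something the observations above establish.

```shell
#!/bin/sh
# touch_and_sync DIR FILE...  (hypothetical helper)
# Create or update each FILE in DIR, then call sync so the pending
# writes are flushed before any I/O is started on another disk.
touch_and_sync() {
    dir=$1
    shift
    for name in "$@"; do
        touch "$dir/$name" || return 1
    done
    sync
}

# Usage, with the mount points from the message (assumed to exist):
#   touch_and_sync /disk1 a b c d e
#   [then start doing some I/O on /disk2]
```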

> I have looked into the obvious problems, i.e. problems with either of
> the disks, improper termination of the SCSI bus, excessive SCSI bus
> length, etc. But I'm not sure what is causing the problem. I do know
> that everything works fine under Windows NT, with native support for
> the 2842. ( although I'm not sure that this is a fair comparison )
> 
> Any advice, help, or suggestions would be appreciated.

Well, to prevent disk damage, hit the reset button when you
start to see the timeout messages.  
-- 
Mike Pritchard
pritc003@maroon.tc.umn.edu
"Go that way.  Really fast.  If something gets in your way, turn"


