Date:      Fri, 24 Apr 1998 01:05:23 -0500
From:      Patrick Hartling <mystify@friley63.res.iastate.edu>
To:        scsi@FreeBSD.ORG
Subject:   CAM == CAM Ate my Machine (and severely corrupted file systems too)
Message-ID:  <199804240605.BAA02352@friley63.res.iastate.edu>

The intention of this message is to warn people of the possibility of serious
disk corruption when using CAM + SMP + ccd.  Other factors could be involved,
but I've never had anything like this happen to me before in 2 years of
running FreeBSD.

This morning when I got back from class, I discovered that my machine had
apparently gotten hungry and had eaten itself.  It had been very stable for
10 days running an SMP kernel with the CAM patches (built April 13, 1998),
but then this happened.  Unfortunately, I don't know what caused this, but
it certainly caused me a lot of stress this morning.

My current disk configuration is three UW SCSI disks (two Quantum Vikings
and one WD Enterprise) with one Viking and the Enterprise on a BusLogic
BT-958 and the other Viking on the onboard Adaptec 2940UW.  I have a
mirrored ccd across the two Viking disks.  Besides that, I'd say that
everything concerning partitions/slices is fairly typical.  (I also have a
Jaz disk and a CD-ROM drive plugged into the BusLogic controller.)

At any rate, my /var was completely trashed.  fsck core dumped on it
repeatedly.  /usr was pretty well hosed too.  Lots of files (mostly shared
libraries) were removed by fsck.  These were easy to replace since my /usr/src
and /usr/obj partitions were fully intact.  'make install' saved the day
here--once I got ld.so and libc.so.3.1 restored.

However, the real horror story was the complete loss of my home directory.
BUT I have /home on the mirrored ccd, and the second partition in the ccd was
fully intact by some miracle.  :)  The first partition was thoroughly
trashed.  Everything that was in my base directory ended up in lost+found, so
I could have gotten it back if I had spent the time to go through each file
and directory and rename everything.  Once I found that the second partition
was fine, I tried to do:

	dd if=/dev/rda2s1e of=/dev/rda1s1e bs=64k

but it kept saying that rda1s1e was a read-only filesystem.  I could be
wrong, but that seems kind of odd.  This was after going through the
appropriate steps to split the mirrored ccd up so that I could get to each
partition individually.  Using the block device worked fine, and I was able
to get the whole ccd back in operation.
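
A minimal sketch of the copy-and-verify step described above, for anyone who
wants to rehearse it before touching real disks.  It is an illustration only,
not the exact commands used here: the two image files (good.img and bad.img)
are hypothetical stand-ins for the intact and damaged mirror halves, and the
cmp check afterwards is an added assumption rather than something from the
original recovery.

```shell
#!/bin/sh
# Rehearse "copy the intact mirror half over the damaged one" on plain files.
# good.img / bad.img are hypothetical stand-ins for the two ccd components.
set -e

workdir=$(mktemp -d)
good="$workdir/good.img"
bad="$workdir/bad.img"

# Build a 1 MB "intact" half and an equally sized, zeroed "damaged" half.
dd if=/dev/urandom of="$good" bs=64k count=16 2>/dev/null
dd if=/dev/zero    of="$bad"  bs=64k count=16 2>/dev/null

# Copy the intact half over the damaged one in 64k blocks, the same block
# size as the dd command in the message.  conv=notrunc keeps dd from
# truncating the target first, which is closer to writing onto a device.
dd if="$good" of="$bad" bs=64k conv=notrunc 2>/dev/null

# Confirm both halves are byte-for-byte identical before relying on them.
if cmp -s "$good" "$bad"; then
    result="identical"
else
    result="different"
fi
echo "mirror halves are $result"

rm -rf "$workdir"
```

On a real system the same check would have to be run against the two
component partitions while the ccd is unconfigured, so that nothing writes to
either half between the copy and the comparison.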

Since getting everything more or less back to normal, I have crashed my
machine again today by accidentally doing:

	disklabel -r sd4c

I'm still not fully used to the da device names, but now that I have
discovered that using the old sd names by mistake can be fatal to stability,
I'll remember to be more careful.  :)

So, unless someone can tell me what mistakes I've made to cause all this, I
would recommend that people be extra careful when using the current CAM code
(even though I'm really impressed with it overall).  Personally, I'm feeling
pretty edgy now, but I'll keep on using it and be sure to make frequent
backups of important data.

 -Patrick


Patrick L. Hartling			| Research Assistant, ICEMT
mystify@friley63.res.iastate.edu	| SE Lab - 1117 Black Engineering
http://www.public.iastate.edu/~oz	| http://www.icemt.iastate.edu

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-scsi" in the body of the message