Date:      Sun, 3 Mar 2002 01:10:36 -0600
From:      Bob Giesen <BobGiesen@earthlink.net>
To:        freebsd-questions@freebsd.org
Subject:   FBSD 4.4 UDMA ICRC error
Message-ID:  <E16hQ8o-0003kX-00@falcon.prod.itd.earthlink.net>

   I spent a couple of days messing around with this upgrade, so I'd 
really like to know what happened, here, but I won't lose any sleep 
over it.  The box at issue is happily running FBSD 3.2 again and I 
don't feel compelled to upgrade it, just yet...
   I have a dual-boot (w98/FBSD) system on which I've run FBSD 3.2 
for a couple of years without major incident.  After recently 
upgrading another system to 4.4 without trouble, I decided to upgrade 
this box, too.  As soon as I booted it, however, I began to get many 
errors like this one:

ad1s1f: UDMA ICRC error reading fsbn 17694848 of 8060992-8061007 
(ad1s1 bn 17694848; cn 17554 tn 6 sn 38) retrying
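
For anyone who wants to tally these messages from a log, here is a minimal Python sketch that splits one such line into its fields. The pattern is written against the single sample above and may not match every form of the message. The helper names (parse_icrc, chs_to_block) are hypothetical, and the 16-head, 63-sector geometry is an assumption that is not stated anywhere in this post; it is used only because it happens to reproduce the reported block number from the cn/tn/sn values.

```python
import re

# Sample message copied from the post, joined onto one line.
MSG = ("ad1s1f: UDMA ICRC error reading fsbn 17694848 of 8060992-8061007 "
       "(ad1s1 bn 17694848; cn 17554 tn 6 sn 38) retrying")

# Pattern built from the one sample above; other variants may differ.
PATTERN = re.compile(
    r"(?P<part>\w+): UDMA ICRC error (?P<op>\w+) "
    r"fsbn (?P<fsbn>\d+) of (?P<first>\d+)-(?P<last>\d+) "
    r"\((?P<slice>\w+) bn (?P<bn>\d+); "
    r"cn (?P<cn>\d+) tn (?P<tn>\d+) sn (?P<sn>\d+)\)"
)

NUMERIC = ("fsbn", "first", "last", "bn", "cn", "tn", "sn")


def parse_icrc(line):
    """Return the fields of one error line as a dict, or None if no match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    for key in NUMERIC:
        fields[key] = int(fields[key])
    return fields


def chs_to_block(cn, tn, sn, heads=16, sectors=63):
    """Cylinder/track/sector to block number.

    heads=16 and sectors=63 are ASSUMED (not given in the post), and sn is
    treated as zero-based.
    """
    return (cn * heads + tn) * sectors + sn


if __name__ == "__main__":
    f = parse_icrc(MSG)
    print(f["part"], f["op"], f["bn"])
    print(chs_to_block(f["cn"], f["tn"], f["sn"]) == f["bn"])
```

Under that assumed geometry, the cn/tn/sn triple and the bn value in the sample describe the same block, so the two sets of numbers in the message agree with each other.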

   I searched the FBSD archives and Googled the Net, only to find 
suggestions that I had
(A) a faulty hard drive,
(B) a faulty IDE ribbon cable, or
(C) electromagnetic interference from nearby power cables or a coil.

   Hoping for a cheap and easy fix, I first tried rerouting my ribbon 
cable, to no avail.  (I actually held it as far from everything as 
possible while firing up the box, but I still got the errors.)
   Since I have a few never-used cables, I tried a couple of these -- 
one 40-conductor and one 80-conductor.  No dice.
   Then, I wondered if it might be the power cable supplying the hard 
drive, so I swapped supply lines a couple of times.  Nothing doing...
   Fearing the worst, I still couldn't help but wonder if the upgrade 
to 4.4 somehow got hosed.  So, I backed up my /etc and /home trees 
and did a *clean* install of 4.4 -- newfs'd the drive and installed 
from scratch.
   Funny thing happened when I did the backup: I tar'd and cp'd those 
trees to my DOS drive -- and got the same errors during the backup -- 
except some of them referred to the DOS slice (ad0s7) to which I was 
saving the tar files.  Hhmmm... fat chance that both of my hard 
drives went south at the same time.  I was now pretty sure I had an 
OS problem -- be it native to the new version or something in my 
configuration.
   After installing the fresh 4.4, I still got the UDMA ICRC errors 
-- with the original GENERIC kernel as well as with a few custom ones 
I compiled.
   I got X working and the box, for the most part, seemed to behave 
normally while working in the GUI, but command-line operations would 
repeatedly spit out those pesky error messages.  Finally, with 
frustration setting in and some fear of losing even more time to a 
future failure, I gave in to the nagging question of how the box 
might behave if I reinstalled 3.2 -- so, I did.  It has been back to 
normal for about 4 days, now -- not a single error of any kind.
   Given that the UDMA ICRC errors began with the first boot of 4.4 
and continued through the last login session on that install -- and 
that they ceased immediately once I wiped 4.4 off the hard drive and 
reinstalled 3.2 -- I have little doubt that the problem lay 
somewhere in the software.  So, does anyone have a clue as to what 
might have caused this or how I might have prevented it by way of 
system configuration?
   The relevant hardware is:
Maxtor 91303D6 13-GB HDs (both ad0 (DOS) and ad1 (FBSD)) --
   (33 MB/sec UDMA mode 2)
SCE (Superpower Computer Electronics) SP - A586B mobo
   w/ Award BIOS ver A.9
AMD K6-2 500 AFX
   I have both hard drives on the primary IDE port (FBSD slave) and 
my only secondary IDE device is a DVD-ROM.  Grasping at straws, I 
tried the DVD as slave as well as master to see if it affected either 
3.2 or 4.4 in any important way; it didn't.
   My dmesg output (which I didn't save from any of the 4.4 boots) 
didn't seem to indicate anything wrong - except for those UDMA errors 
popping up as the boot progressed.  (It's probably just a coincidence, 
but I did notice that the error would first appear immediately after 
the word (linux) -- on the same line, with no preceding space -- 
presumably while the linux compatibility was being loaded by the 
kernel...)  If it's of any use, I can provide a current, 3.2, dmesg 
output (the hardware's the same); I'm leaving it out to keep this 
already-long message from getting any longer.
   Thanks, in advance...
Bob

-- 
"The ultimate measure of a man is not where he stands in moments of 
comfort and convenience, but where he stands at times of challenge 
and controversy." -- Martin Luther King, Jr. (1929 - 1968)




