Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 1 Oct 2006 00:22:14 +0200
From:      Paul-Kenji Cahier <deathwolf@gmail.com>
To:        freebsd-hackers@freebsd.org
Subject:   Re[2]: VIA Sata Problem maybe?
Message-ID:  <189665893.20061001002214@F1-Photo.com>
In-Reply-To: <7528F5A9-4B4A-4FEC-9726-70BDCF31B631@foolishgames.com>
References:  <52009783.20060929214021@F1-Photo.com> <7528F5A9-4B4A-4FEC-9726-70BDCF31B631@foolishgames.com>

next in thread | previous in thread | raw e-mail | index | archive | help
Hello, thanks for the reply.

> You mentioned softupdates.  Try disabling softupdates on the file  
> system.  I've had to do this in the past with a sata raid 1 setup or  
> the system would reboot every few days.
I tried that earlier, and it didnt help at all, i got a crash in the minutes after rebooting during fsck.

> Also verify the "speed" of  
> the drives.  I had a problem with freebsd 5.3 where it would set my  
> sata drives as udma33 which caused quite a few problems. 
How do i check the speed?(although i think it's not linked to that)

You may  
> want to check if there are any updates in stable for the driver you  
> are using.  6.2 is close to release and while there are several  
> outstanding issues, it may improve your situation.
where can i find a list of the drivers updates? I downloaded the latest HEAD kernel source, but from looking into the GENERIC configuration file, i didnt see anything for a VIA ATA chipset, or is there more than what is in the GENERIC file?


Best Regards,

Paul-Kenji Cahier


















>> Hello,

>> I am currently experiencing some technical problems with one of my  
>> remote servers running 6.1-RELEASE(#0: Sun May  7 04:32:43 UTC  
>> 2006     root@opus.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC   
>> i386) with the default kernel.
>> Configuration is the following: http://83.149.102.206/lspci.txt

>> The problem is basically the following:
>> Every few days, the server crashes. Sometimes it reboots, sometimes  
>> it doesnt. In all cases, there's not a single trace of anything in  
>> logs(not even a single word in /var/log/messages).
>> The problem usually worsens after it auto-reboots itself as it runs  
>> an fsck at the end of the boot time. That fsck runs, and most of  
>> the time it crashes again, causing a reboot or a complete freeze,  
>> then another reboot, etc. Once the system freezes, my only option
>> is(since i dont have local access to the server) to remote reboot  
>> on a rescue os that allows me to set my rescue freebsd partition as  
>> main, and get a rescue freebsd. Once in, i just fsck the file  
>> system until it is marked clean and reboot. And everything goes  
>> fine...
>> for a few days.

>> Now what i tried was, after an unwanted reboot, to monitor the logs  
>> of fsck running to see what made it crash.

>> Here is what i got from tail -f /var/log/messages(ad4s2f is the / 
>> usr partition that has most of the data):
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT  
>> I=2615093 (20 should be 0) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT  
>> I=2615094 (20 should be 0) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT  
>> I=2615095 (20 should be 0) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT  
>> I=2615096 (20 should be 0) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT  
>> I=2615097 (20 should be 0) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT  
>> I=2615098 (20 should be 0) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT  
>> I=2615099 (20 should be 0) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT  
>> I=2615100 (16 should be 0) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT  
>> I=3674361 (12 should be 0) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT  
>> I=14994399 (4 should be 0) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT  
>> I=16016109 (6624 should be 320) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT  
>> I=16016110 (12 should be 0) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT  
>> I=16016111 (4352 should be 0) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT  
>> I=16016112 (8 should be 0) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: UNREF FILE I=6   
>> OWNER=root MODE=100400
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: SIZE=151445575864  
>> MTIME=Sep 29 19:02 2006  (CLEARED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: ALLOCATED FRAGS  
>> 431136-431167 MARKED FREE
>> Sep 29 19:31:54 arcueid fsck:
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: UNEXPECTED SOFT UPDATE  
>> INCONSISTENCY; RUN fsck MANUALLY.
>> And then the server went more or less haywire and crashed(no other  
>> input in /var/log/messages though)


>> I also monitored the fsck process to see how it was running:
>> <19:15:57> root@arcueid:~$ while (true); do echo `date` `ps aux |  
>> grep fsck | grep ad4`; sleep 2; done
>> <...>
>> Fri Sep 29 19:29:36 CEST 2006 root 777 0.7 1.8 19092 18612 ?? DN  
>> 7:11PM 0:21.07 fsck_ufs: /dev/ad4s2f p2 92% (fsck_ufs)
>> Fri Sep 29 19:29:38 CEST 2006 root 777 0.6 1.8 19092 18612 ?? DN  
>> 7:11PM 0:21.09 fsck_ufs: /dev/ad4s2f p2 92% (fsck_ufs)
>> Fri Sep 29 19:29:40 CEST 2006 root 777 0.6 1.8 19092 18612 ?? RN  
>> 7:11PM 0:21.12 fsck_ufs: /dev/ad4s2f p2 92% (fsck_ufs)
>> Fri Sep 29 19:29:42 CEST 2006 root 777 0.6 1.8 19092 18612 ?? RN  
>> 7:11PM 0:21.14 fsck_ufs: /dev/ad4s2f p2 94% (fsck_ufs)
>> Fri Sep 29 19:29:44 CEST 2006 root 777 0.6 1.8 19092 18612 ?? SN  
>> 7:11PM 0:21.17 fsck_ufs: /dev/ad4s2f p2 94% (fsck_ufs)
>> Fri Sep 29 19:29:46 CEST 2006 root 777 0.6 1.8 19092 18612 ?? DN  
>> 7:11PM 0:21.19 fsck_ufs: /dev/ad4s2f p2 95% (fsck_ufs)
>> Fri Sep 29 19:29:48 CEST 2006 root 777 0.6 1.8 19092 18612 ?? DN  
>> 7:11PM 0:21.21 fsck_ufs: /dev/ad4s2f p2 95% (fsck_ufs)
>> Fri Sep 29 19:29:50 CEST 2006 root 777 0.6 1.8 19092 18612 ?? SN  
>> 7:11PM 0:21.25 fsck_ufs: /dev/ad4s2f p2 95% (fsck_ufs)
>> Fri Sep 29 19:29:52 CEST 2006 root 777 0.5 1.8 19092 18612 ?? DN  
>> 7:11PM 0:21.29 fsck_ufs: /dev/ad4s2f p2 96% (fsck_ufs)
>> Fri Sep 29 19:29:54 CEST 2006 root 777 0.8 1.8 19092 18612 ?? SN  
>> 7:11PM 0:21.35 fsck_ufs: /dev/ad4s2f p2 96% (fsck_ufs)
>> Fri Sep 29 19:29:56 CEST 2006 root 777 0.8 1.8 19092 18612 ?? SN  
>> 7:11PM 0:21.38 fsck_ufs: /dev/ad4s2f p2 96% (fsck_ufs)
>> Fri Sep 29 19:29:58 CEST 2006 root 777 0.8 1.8 19092 18612 ?? DN  
>> 7:11PM 0:21.40 fsck_ufs: /dev/ad4s2f p2 96% (fsck_ufs)
>> Fri Sep 29 19:30:00 CEST 2006 root 777 0.7 1.8 19092 18612 ?? DN  
>> 7:11PM 0:21.43 fsck_ufs: /dev/ad4s2f p2 96% (fsck_ufs)
>> Fri Sep 29 19:30:02 CEST 2006 root 777 0.7 1.8 19092 18612 ?? SN  
>> 7:11PM 0:21.47 fsck_ufs: /dev/ad4s2f p2 97% (fsck_ufs)
>> Fri Sep 29 19:30:04 CEST 2006 root 777 0.6 1.8 19092 18612 ?? SN  
>> 7:11PM 0:21.49 fsck_ufs: /dev/ad4s2f p2 97% (fsck_ufs)
>> Fri Sep 29 19:30:06 CEST 2006 root 777 0.5 1.8 19092 18612 ?? DN  
>> 7:11PM 0:21.51 fsck_ufs: /dev/ad4s2f p2 97% (fsck_ufs)
>> Fri Sep 29 19:30:08 CEST 2006 root 777 0.4 1.8 19092 18612 ?? DN  
>> 7:11PM 0:21.53 fsck_ufs: /dev/ad4s2f p2 97% (fsck_ufs)
>> Fri Sep 29 19:30:10 CEST 2006 root 777 0.6 1.8 19092 18612 ?? DN  
>> 7:11PM 0:21.57 fsck_ufs: /dev/ad4s2f p2 97% (fsck_ufs)
>> Fri Sep 29 19:30:12 CEST 2006 root 777 0.7 1.8 19092 18612 ?? SN  
>> 7:11PM 0:21.59 fsck_ufs: /dev/ad4s2f p2 98% (fsck_ufs)
>> Fri Sep 29 19:30:14 CEST 2006 root 777 0.6 1.8 19092 18612 ?? SN  
>> 7:11PM 0:21.59 fsck_ufs: /dev/ad4s2f p2 98% (fsck_ufs)
>> Fri Sep 29 19:30:16 CEST 2006 root 777 0.5 1.8 19092 18612 ?? DN  
>> 7:11PM 0:21.61 fsck_ufs: /dev/ad4s2f p2 98% (fsck_ufs)
>> Fri Sep 29 19:30:18 CEST 2006 root 777 0.4 1.8 19092 18612 ?? SN  
>> 7:11PM 0:21.62 fsck_ufs: /dev/ad4s2f p2 98% (fsck_ufs)
>> Fri Sep 29 19:30:20 CEST 2006 root 777 0.4 1.8 19092 18612 ?? SN  
>> 7:11PM 0:21.63 fsck_ufs: /dev/ad4s2f p2 99% (fsck_ufs)
>> Fri Sep 29 19:30:23 CEST 2006 root 777 0.3 1.8 19092 18612 ?? DN  
>> 7:11PM 0:21.64 fsck_ufs: /dev/ad4s2f p2 99% (fsck_ufs)
>> Fri Sep 29 19:30:25 CEST 2006 root 777 0.2 1.8 19092 18612 ?? DN  
>> 7:11PM 0:21.66 fsck_ufs: /dev/ad4s2f p2 99% (fsck_ufs)
>> Fri Sep 29 19:30:27 CEST 2006 root 777 0.1 1.8 19092 18612 ?? SN  
>> 7:11PM 0:21.68 fsck_ufs: /dev/ad4s2f p2 99% (fsck_ufs)
>> Fri Sep 29 19:30:29 CEST 2006 root 777 0.0 1.8 19092 18612 ?? SN  
>> 7:11PM 0:21.70 fsck_ufs: /dev/ad4s2f p2 99% (fsck_ufs)
>> Fri Sep 29 19:31:12 CEST 2006 root 777 0.0 1.8 19092 18724 ?? RN  
>> 7:11PM 0:22.17 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:14 CEST 2006 root 777 0.9 1.8 19092 18748 ?? DN  
>> 7:11PM 0:22.34 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:16 CEST 2006 root 777 0.9 1.8 19092 18748 ?? DN  
>> 7:11PM 0:22.35 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:18 CEST 2006 root 777 0.7 1.8 19092 18748 ?? DN  
>> 7:11PM 0:22.37 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:20 CEST 2006 root 777 0.7 1.8 19092 18748 ?? DN  
>> 7:11PM 0:22.40 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:22 CEST 2006 root 777 0.6 1.8 19092 18748 ?? DN  
>> 7:11PM 0:22.40 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:25 CEST 2006 root 777 0.5 1.8 19092 18748 ?? DN  
>> 7:11PM 0:22.42 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:27 CEST 2006 root 777 0.5 1.8 19092 18748 ?? DN  
>> 7:11PM 0:22.45 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:29 CEST 2006 root 777 0.4 1.8 19092 18748 ?? DN  
>> 7:11PM 0:22.46 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:31 CEST 2006 root 777 0.4 1.8 19092 18748 ?? DN  
>> 7:11PM 0:22.48 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:33 CEST 2006 root 777 0.4 1.8 19092 18748 ?? DN  
>> 7:11PM 0:22.50 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:35 CEST 2006 root 777 0.3 1.8 19092 18748 ?? DN  
>> 7:11PM 0:22.51 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:37 CEST 2006 root 777 0.2 1.8 19092 18748 ?? DN  
>> 7:11PM 0:22.53 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:39 CEST 2006 root 777 0.1 1.8 19092 18748 ?? DN  
>> 7:11PM 0:22.55 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:41 CEST 2006 root 777 0.2 1.8 19092 18748 ?? DN  
>> 7:11PM 0:22.57 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:43 CEST 2006 root 777 0.1 1.8 19092 18748 ?? DN  
>> 7:11PM 0:22.57 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:45 CEST 2006 root 777 0.0 1.8 19092 18748 ?? DN  
>> 7:11PM 0:22.58 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:47 CEST 2006 root 777 0.0 1.8 19092 18748 ?? DN  
>> 7:11PM 0:22.59 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:49 CEST 2006 root 777 0.0 1.8 19092 18748 ?? DN  
>> 7:11PM 0:22.60 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:51 CEST 2006 root 777 0.0 1.8 19092 18748 ?? DN  
>> 7:11PM 0:22.61 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:53 CEST 2006 root 777 0.0 1.8 19092 18748 ?? DN  
>> 7:11PM 0:22.61 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs)
>> Fri Sep 29 19:31:55 CEST 2006
>> Fri Sep 29 19:31:57 CEST 2006
>> Fri Sep 29 19:31:59 CEST 2006
>> Fri Sep 29 19:32:01 CEST 2006
>> Fri Sep 29 19:32:03 CEST 2006
>> Fri Sep 29 19:32:05 CEST 2006
>> Fri Sep 29 19:32:07 CEST 2006
>> Fri Sep 29 19:32:09 CEST 2006
>> Fri Sep 29 19:32:11 CEST 2006
>> Fri Sep 29 19:32:13 CEST 2006
>> Fri Sep 29 19:32:15 CEST 2006
>> Fri Sep 29 19:32:17 CEST 2006
>> Fri Sep 29 19:32:19 CEST 2006
>> Fri Sep 29 19:32:21 CEST 2006
>> Fri Sep 29 19:32:23 CEST 2006
>> Fri Sep 29 19:32:25 CEST 2006
>> <boom>
>> As you can see, at the same time as softupdate reported a problem,  
>> the whole system wouldnt work anymore, process wouldnt display  
>> properly anymore either.


>> RAM was extensively tested and is perfectly fine.
>> The power supply unit seems to be running really fine too, with no  
>> particular events.
>> The hard drive itself was tested extensively using both smart  
>> values and writing/reading from a linux setup and had not a single  
>> fault or alarming value.


>> Another quite surprising event is the following:
>> After a crash, i went into the rescue freebsd and as usual started  
>> an fsck. Result was:
>> arcueid-rescue# fsck -y /dev/ad4s2f
>> ** /dev/ad4s2f
>> ** Last Mounted on /usr
>> ** Phase 1 - Check Blocks and Sizes

>> ** Phase 2 - Check Pathnames
>> ** Phase 3 - Check Connectivity
>> ** Phase 4 - Check Reference Counts
>> ** Phase 5 - Check Cyl groups
>> 1362084 files, 47944480 used, 23676944 free (1752392 frags, 2740569  
>> blocks, 2.4% fragmentation)

>> ***** FILE SYSTEM MARKED CLEAN *****


>> Just for the fun i ran it a second time just after:
>> arcueid-rescue# fsck -y /dev/ad4s2f
>> ** /dev/ad4s2f
>> ** Last Mounted on /usr
>> ** Phase 1 - Check Blocks and Sizes


>> CANNOT READ BLK: 190083360
>> UNEXPECTED SOFT UPDATE INCONSISTENCY

>> CONTINUE? yes

>> THE FOLLOWING DISK SECTORS COULD NOT BE READ: 190083360, 190083361,  
>> 190083362, 190083363, 190083364, 190083365, 190083366, 190083367,  
>> 190083368, 190083369, 190083370, 190083371, 190083372, 190083373,  
>> 190083374, 190083375, 190083376, 190083377, 190083378, 190083379,
>> 190083380, 190083381, 190083382, 190083383, 190083384, 190083385,  
>> 190083386, 190083387, 190083388, 190083389, 190083390, 190083391,
>> INCORRECT BLOCK COUNT I=18375172 (640 should be 416)
>> CORRECT? yes


>> CANNOT READ BLK: 293557088
>> UNEXPECTED SOFT UPDATE INCONSISTENCY

>> CONTINUE? yes

>> THE FOLLOWING DISK SECTORS COULD NOT BE READ: 293557088, 293557089,  
>> 293557090, 293557091, 293557092, 293557093, 293557094, 293557095,  
>> 293557096, 293557097, 293557098, 293557099, 293557100, 293557101,  
>> 293557102, 293557103, 293557104, 293557105, 293557106, 293557107,
>> 293557108, 293557109, 293557110, 293557111, 293557112, 293557113,  
>> 293557114, 293557115, 293557116, 293557117, 293557118, 293557119,

>> CANNOT READ BLK: 190090784
>> UNEXPECTED SOFT UPDATE INCONSISTENCY

>> CONTINUE? yes

>> THE FOLLOWING DISK SECTORS COULD NOT BE READ: 190090784, 190090785,  
>> 190090786, 190090787, 190090788, 190090789, 190090790, 190090791,  
>> 190090792, 190090793, 190090794, 190090795, 190090796, 190090797,  
>> 190090798, 190090799, 190090800, 190090801, 190090802, 190090803,
>> 190090804, 190090805, 190090806, 190090807, 190090808, 190090809,  
>> 190090810, 190090811, 190090812, 190090813, 190090814, 190090815,
>> INCORRECT BLOCK COUNT I=18375174 (480 should be 416)
>> CORRECT? yes


>> CANNOT READ BLK: 190122624
>> UNEXPECTED SOFT UPDATE INCONSISTENCY

>> CONTINUE? yes

>> THE FOLLOWING DISK SECTORS COULD NOT BE READ: 190122624, 190122625,  
>> 190122626, 190122627, 190122628, 190122629, 190122630, 190122631,  
>> 190122632, 190122633, 190122634, 190122635, 190122636, 190122637,  
>> 190122638, 190122639, 190122640, 190122641, 190122642, 190122643,
>> 190122644, 190122645, 190122646, 190122647, 190122648, 190122649,  
>> 190122650, 190122651, 190122652, 190122653, 190122654, 190122655,
>> INCORRECT BLOCK COUNT I=18375178 (512 should be 416)
>> CORRECT? yes


>> CANNOT READ BLK: 190422528
>> UNEXPECTED SOFT UPDATE INCONSISTENCY

>> CONTINUE? yes
>> [etc]

>> And then it crashed....
>> Again i did the sata tests numerous times and the drive seems
>> perfectly healthy... So i have to think it might be driver related
>> rather than hardware related.


>> Any help would be appreciated,

>> -- 
>> Best regards,
>>  Paul-Kenji

>> _______________________________________________
>> freebsd-hackers@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
>> To unsubscribe, send any mail to "freebsd-hackers- 
>> unsubscribe@freebsd.org"


> Lucas Holt
> Luke@FoolishGames.com
> ________________________________________________________
> FoolishGames.com  (Jewel Fan Site)
> JustJournal.com (Free blogging)
> FoolishGames.net (Enemy Territory site)




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?189665893.20061001002214>