Date: Sun, 1 Oct 2006 00:22:14 +0200
From: Paul-Kenji Cahier <deathwolf@gmail.com>
To: freebsd-hackers@freebsd.org
Subject: Re[2]: VIA Sata Problem maybe?
Message-ID: <189665893.20061001002214@F1-Photo.com>
In-Reply-To: <7528F5A9-4B4A-4FEC-9726-70BDCF31B631@foolishgames.com>
References: <52009783.20060929214021@F1-Photo.com> <7528F5A9-4B4A-4FEC-9726-70BDCF31B631@foolishgames.com>

Hello,

Thanks for the reply.

> You mentioned softupdates. Try disabling softupdates on the file
> system. I've had to do this in the past with a sata raid 1 setup or
> the system would reboot every few days.

I tried that earlier and it didn't help at all; I got a crash within minutes of rebooting, during fsck.

> Also verify the "speed" of
> the drives. I had a problem with freebsd 5.3 where it would set my
> sata drives as udma33 which caused quite a few problems.

How do I check the speed? (Although I don't think the problem is linked to that.)

> You may
> want to check if there are any updates in stable for the driver you
> are using. 6.2 is close to release and while there are several
> outstanding issues, it may improve your situation.

Where can I find a list of the driver updates? I downloaded the latest HEAD kernel source, but looking through the GENERIC configuration file I didn't see anything for a VIA ATA chipset. Or is there more than what is in the GENERIC file?

Best Regards,
Paul-Kenji Cahier

>> Hello,
>> I am currently experiencing some technical problems with one of my
>> remote servers running 6.1-RELEASE(#0: Sun May 7 04:32:43 UTC
>> 2006 root@opus.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC
>> i386) with the default kernel.
>> Configuration is the following: http://83.149.102.206/lspci.txt
>> The problem is basically the following:
>> Every few days, the server crashes. Sometimes it reboots, sometimes
>> it doesnt. In all cases, there's not a single trace of anything in
>> logs(not even a single word in /var/log/messages).
>> The problem usually worsens after it auto-reboots itself as it runs
>> an fsck at the end of the boot time. That fsck runs, and most of
>> the time it crashes again, causing a reboot or a complete freeze,
>> then another reboot, etc. Once the system freezes, my only option
>> is(since i dont have local access to the server) to remote reboot
>> on a rescue os that allows me to set my rescue freebsd partition as
>> main, and get a rescue freebsd. Once in, i just fsck the file
>> system until it is marked clean and reboot. And everything goes
>> fine...
>> for a few days.
>> Now what i tried was, after an unwanted reboot, to monitor the logs
>> of fsck running to see what made it crash.
>> Here is what i got from tail -f /var/log/messages(ad4s2f is the
>> /usr partition that has most of the data):
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT
>> I=2615093 (20 should be 0) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT
>> I=2615094 (20 should be 0) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT
>> I=2615095 (20 should be 0) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT
>> I=2615096 (20 should be 0) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT
>> I=2615097 (20 should be 0) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT
>> I=2615098 (20 should be 0) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT
>> I=2615099 (20 should be 0) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT
>> I=2615100 (16 should be 0) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT
>> I=3674361 (12 should be 0) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT
>> I=14994399 (4 should be 0) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT
>> I=16016109 (6624 should be 320) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT
>> I=16016110 (12 should be 0) (CORRECTED)
>> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT
>> I=16016111 (4352 should be 0)
(CORRECTED) >> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: INCORRECT BLOCK COUNT >> I=16016112 (8 should be 0) (CORRECTED) >> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: UNREF FILE I=6 >> OWNER=root MODE=100400 >> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: SIZE=151445575864 >> MTIME=Sep 29 19:02 2006 (CLEARED) >> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: ALLOCATED FRAGS >> 431136-431167 MARKED FREE >> Sep 29 19:31:54 arcueid fsck: >> Sep 29 19:31:54 arcueid fsck: /dev/ad4s2f: UNEXPECTED SOFT UPDATE >> INCONSISTENCY; RUN fsck MANUALLY. >> And then the server went more or less haywire and crashed(no other >> input in /var/log/messages though) >> I also monitored the fsck process to see how it was running: >> <19:15:57> root@arcueid:~$ while (true); do echo `date` `ps aux | >> grep fsck | grep ad4`; sleep 2; done >> <...> >> Fri Sep 29 19:29:36 CEST 2006 root 777 0.7 1.8 19092 18612 ?? DN >> 7:11PM 0:21.07 fsck_ufs: /dev/ad4s2f p2 92% (fsck_ufs) >> Fri Sep 29 19:29:38 CEST 2006 root 777 0.6 1.8 19092 18612 ?? DN >> 7:11PM 0:21.09 fsck_ufs: /dev/ad4s2f p2 92% (fsck_ufs) >> Fri Sep 29 19:29:40 CEST 2006 root 777 0.6 1.8 19092 18612 ?? RN >> 7:11PM 0:21.12 fsck_ufs: /dev/ad4s2f p2 92% (fsck_ufs) >> Fri Sep 29 19:29:42 CEST 2006 root 777 0.6 1.8 19092 18612 ?? RN >> 7:11PM 0:21.14 fsck_ufs: /dev/ad4s2f p2 94% (fsck_ufs) >> Fri Sep 29 19:29:44 CEST 2006 root 777 0.6 1.8 19092 18612 ?? SN >> 7:11PM 0:21.17 fsck_ufs: /dev/ad4s2f p2 94% (fsck_ufs) >> Fri Sep 29 19:29:46 CEST 2006 root 777 0.6 1.8 19092 18612 ?? DN >> 7:11PM 0:21.19 fsck_ufs: /dev/ad4s2f p2 95% (fsck_ufs) >> Fri Sep 29 19:29:48 CEST 2006 root 777 0.6 1.8 19092 18612 ?? DN >> 7:11PM 0:21.21 fsck_ufs: /dev/ad4s2f p2 95% (fsck_ufs) >> Fri Sep 29 19:29:50 CEST 2006 root 777 0.6 1.8 19092 18612 ?? SN >> 7:11PM 0:21.25 fsck_ufs: /dev/ad4s2f p2 95% (fsck_ufs) >> Fri Sep 29 19:29:52 CEST 2006 root 777 0.5 1.8 19092 18612 ?? 
DN >> 7:11PM 0:21.29 fsck_ufs: /dev/ad4s2f p2 96% (fsck_ufs) >> Fri Sep 29 19:29:54 CEST 2006 root 777 0.8 1.8 19092 18612 ?? SN >> 7:11PM 0:21.35 fsck_ufs: /dev/ad4s2f p2 96% (fsck_ufs) >> Fri Sep 29 19:29:56 CEST 2006 root 777 0.8 1.8 19092 18612 ?? SN >> 7:11PM 0:21.38 fsck_ufs: /dev/ad4s2f p2 96% (fsck_ufs) >> Fri Sep 29 19:29:58 CEST 2006 root 777 0.8 1.8 19092 18612 ?? DN >> 7:11PM 0:21.40 fsck_ufs: /dev/ad4s2f p2 96% (fsck_ufs) >> Fri Sep 29 19:30:00 CEST 2006 root 777 0.7 1.8 19092 18612 ?? DN >> 7:11PM 0:21.43 fsck_ufs: /dev/ad4s2f p2 96% (fsck_ufs) >> Fri Sep 29 19:30:02 CEST 2006 root 777 0.7 1.8 19092 18612 ?? SN >> 7:11PM 0:21.47 fsck_ufs: /dev/ad4s2f p2 97% (fsck_ufs) >> Fri Sep 29 19:30:04 CEST 2006 root 777 0.6 1.8 19092 18612 ?? SN >> 7:11PM 0:21.49 fsck_ufs: /dev/ad4s2f p2 97% (fsck_ufs) >> Fri Sep 29 19:30:06 CEST 2006 root 777 0.5 1.8 19092 18612 ?? DN >> 7:11PM 0:21.51 fsck_ufs: /dev/ad4s2f p2 97% (fsck_ufs) >> Fri Sep 29 19:30:08 CEST 2006 root 777 0.4 1.8 19092 18612 ?? DN >> 7:11PM 0:21.53 fsck_ufs: /dev/ad4s2f p2 97% (fsck_ufs) >> Fri Sep 29 19:30:10 CEST 2006 root 777 0.6 1.8 19092 18612 ?? DN >> 7:11PM 0:21.57 fsck_ufs: /dev/ad4s2f p2 97% (fsck_ufs) >> Fri Sep 29 19:30:12 CEST 2006 root 777 0.7 1.8 19092 18612 ?? SN >> 7:11PM 0:21.59 fsck_ufs: /dev/ad4s2f p2 98% (fsck_ufs) >> Fri Sep 29 19:30:14 CEST 2006 root 777 0.6 1.8 19092 18612 ?? SN >> 7:11PM 0:21.59 fsck_ufs: /dev/ad4s2f p2 98% (fsck_ufs) >> Fri Sep 29 19:30:16 CEST 2006 root 777 0.5 1.8 19092 18612 ?? DN >> 7:11PM 0:21.61 fsck_ufs: /dev/ad4s2f p2 98% (fsck_ufs) >> Fri Sep 29 19:30:18 CEST 2006 root 777 0.4 1.8 19092 18612 ?? SN >> 7:11PM 0:21.62 fsck_ufs: /dev/ad4s2f p2 98% (fsck_ufs) >> Fri Sep 29 19:30:20 CEST 2006 root 777 0.4 1.8 19092 18612 ?? SN >> 7:11PM 0:21.63 fsck_ufs: /dev/ad4s2f p2 99% (fsck_ufs) >> Fri Sep 29 19:30:23 CEST 2006 root 777 0.3 1.8 19092 18612 ?? 
DN >> 7:11PM 0:21.64 fsck_ufs: /dev/ad4s2f p2 99% (fsck_ufs) >> Fri Sep 29 19:30:25 CEST 2006 root 777 0.2 1.8 19092 18612 ?? DN >> 7:11PM 0:21.66 fsck_ufs: /dev/ad4s2f p2 99% (fsck_ufs) >> Fri Sep 29 19:30:27 CEST 2006 root 777 0.1 1.8 19092 18612 ?? SN >> 7:11PM 0:21.68 fsck_ufs: /dev/ad4s2f p2 99% (fsck_ufs) >> Fri Sep 29 19:30:29 CEST 2006 root 777 0.0 1.8 19092 18612 ?? SN >> 7:11PM 0:21.70 fsck_ufs: /dev/ad4s2f p2 99% (fsck_ufs) >> Fri Sep 29 19:31:12 CEST 2006 root 777 0.0 1.8 19092 18724 ?? RN >> 7:11PM 0:22.17 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs) >> Fri Sep 29 19:31:14 CEST 2006 root 777 0.9 1.8 19092 18748 ?? DN >> 7:11PM 0:22.34 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs) >> Fri Sep 29 19:31:16 CEST 2006 root 777 0.9 1.8 19092 18748 ?? DN >> 7:11PM 0:22.35 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs) >> Fri Sep 29 19:31:18 CEST 2006 root 777 0.7 1.8 19092 18748 ?? DN >> 7:11PM 0:22.37 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs) >> Fri Sep 29 19:31:20 CEST 2006 root 777 0.7 1.8 19092 18748 ?? DN >> 7:11PM 0:22.40 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs) >> Fri Sep 29 19:31:22 CEST 2006 root 777 0.6 1.8 19092 18748 ?? DN >> 7:11PM 0:22.40 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs) >> Fri Sep 29 19:31:25 CEST 2006 root 777 0.5 1.8 19092 18748 ?? DN >> 7:11PM 0:22.42 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs) >> Fri Sep 29 19:31:27 CEST 2006 root 777 0.5 1.8 19092 18748 ?? DN >> 7:11PM 0:22.45 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs) >> Fri Sep 29 19:31:29 CEST 2006 root 777 0.4 1.8 19092 18748 ?? DN >> 7:11PM 0:22.46 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs) >> Fri Sep 29 19:31:31 CEST 2006 root 777 0.4 1.8 19092 18748 ?? DN >> 7:11PM 0:22.48 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs) >> Fri Sep 29 19:31:33 CEST 2006 root 777 0.4 1.8 19092 18748 ?? DN >> 7:11PM 0:22.50 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs) >> Fri Sep 29 19:31:35 CEST 2006 root 777 0.3 1.8 19092 18748 ?? 
DN >> 7:11PM 0:22.51 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs) >> Fri Sep 29 19:31:37 CEST 2006 root 777 0.2 1.8 19092 18748 ?? DN >> 7:11PM 0:22.53 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs) >> Fri Sep 29 19:31:39 CEST 2006 root 777 0.1 1.8 19092 18748 ?? DN >> 7:11PM 0:22.55 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs) >> Fri Sep 29 19:31:41 CEST 2006 root 777 0.2 1.8 19092 18748 ?? DN >> 7:11PM 0:22.57 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs) >> Fri Sep 29 19:31:43 CEST 2006 root 777 0.1 1.8 19092 18748 ?? DN >> 7:11PM 0:22.57 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs) >> Fri Sep 29 19:31:45 CEST 2006 root 777 0.0 1.8 19092 18748 ?? DN >> 7:11PM 0:22.58 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs) >> Fri Sep 29 19:31:47 CEST 2006 root 777 0.0 1.8 19092 18748 ?? DN >> 7:11PM 0:22.59 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs) >> Fri Sep 29 19:31:49 CEST 2006 root 777 0.0 1.8 19092 18748 ?? DN >> 7:11PM 0:22.60 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs) >> Fri Sep 29 19:31:51 CEST 2006 root 777 0.0 1.8 19092 18748 ?? DN >> 7:11PM 0:22.61 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs) >> Fri Sep 29 19:31:53 CEST 2006 root 777 0.0 1.8 19092 18748 ?? DN >> 7:11PM 0:22.61 fsck_ufs: /dev/ad4s2f p4 0% (fsck_ufs) >> Fri Sep 29 19:31:55 CEST 2006 >> Fri Sep 29 19:31:57 CEST 2006 >> Fri Sep 29 19:31:59 CEST 2006 >> Fri Sep 29 19:32:01 CEST 2006 >> Fri Sep 29 19:32:03 CEST 2006 >> Fri Sep 29 19:32:05 CEST 2006 >> Fri Sep 29 19:32:07 CEST 2006 >> Fri Sep 29 19:32:09 CEST 2006 >> Fri Sep 29 19:32:11 CEST 2006 >> Fri Sep 29 19:32:13 CEST 2006 >> Fri Sep 29 19:32:15 CEST 2006 >> Fri Sep 29 19:32:17 CEST 2006 >> Fri Sep 29 19:32:19 CEST 2006 >> Fri Sep 29 19:32:21 CEST 2006 >> Fri Sep 29 19:32:23 CEST 2006 >> Fri Sep 29 19:32:25 CEST 2006 >> <boom> >> As you can see, at the same time as softupdate reported a problem, >> the whole system wouldnt work anymore, process wouldnt display >> properly anymore either. >> RAM was extensively tested and is perfectly fine. 
>> The power supply unit seems to be running really fine too, with no
>> particular events.
>> The hard drive itself was tested extensively using both smart
>> values and writing/reading from a linux setup and had not a single
>> fault or alarming value.
>> Another quite surprising event is the following:
>> After a crash, i went into the rescue freebsd and as usual started
>> an fsck. Result was:
>> arcueid-rescue# fsck -y /dev/ad4s2f
>> ** /dev/ad4s2f
>> ** Last Mounted on /usr
>> ** Phase 1 - Check Blocks and Sizes
>> ** Phase 2 - Check Pathnames
>> ** Phase 3 - Check Connectivity
>> ** Phase 4 - Check Reference Counts
>> ** Phase 5 - Check Cyl groups
>> 1362084 files, 47944480 used, 23676944 free (1752392 frags, 2740569
>> blocks, 2.4% fragmentation)
>> ***** FILE SYSTEM MARKED CLEAN *****
>> Just for the fun i ran it a second time just after:
>> arcueid-rescue# fsck -y /dev/ad4s2f
>> ** /dev/ad4s2f
>> ** Last Mounted on /usr
>> ** Phase 1 - Check Blocks and Sizes
>> CANNOT READ BLK: 190083360
>> UNEXPECTED SOFT UPDATE INCONSISTENCY
>> CONTINUE? yes
>> THE FOLLOWING DISK SECTORS COULD NOT BE READ: 190083360, 190083361,
>> 190083362, 190083363, 190083364, 190083365, 190083366, 190083367,
>> 190083368, 190083369, 190083370, 190083371, 190083372, 190083373,
>> 190083374, 190083375, 190083376, 190083377, 190083378, 190083379,
>> 190083380, 190083381, 190083382, 190083383, 190083384, 190083385,
>> 190083386, 190083387, 190083388, 190083389, 190083390, 190083391,
>> INCORRECT BLOCK COUNT I=18375172 (640 should be 416)
>> CORRECT? yes
>> CANNOT READ BLK: 293557088
>> UNEXPECTED SOFT UPDATE INCONSISTENCY
>> CONTINUE? yes
>> THE FOLLOWING DISK SECTORS COULD NOT BE READ: 293557088, 293557089,
>> 293557090, 293557091, 293557092, 293557093, 293557094, 293557095,
>> 293557096, 293557097, 293557098, 293557099, 293557100, 293557101,
>> 293557102, 293557103, 293557104, 293557105, 293557106, 293557107,
>> 293557108, 293557109, 293557110, 293557111, 293557112, 293557113,
>> 293557114, 293557115, 293557116, 293557117, 293557118, 293557119,
>> CANNOT READ BLK: 190090784
>> UNEXPECTED SOFT UPDATE INCONSISTENCY
>> CONTINUE? yes
>> THE FOLLOWING DISK SECTORS COULD NOT BE READ: 190090784, 190090785,
>> 190090786, 190090787, 190090788, 190090789, 190090790, 190090791,
>> 190090792, 190090793, 190090794, 190090795, 190090796, 190090797,
>> 190090798, 190090799, 190090800, 190090801, 190090802, 190090803,
>> 190090804, 190090805, 190090806, 190090807, 190090808, 190090809,
>> 190090810, 190090811, 190090812, 190090813, 190090814, 190090815,
>> INCORRECT BLOCK COUNT I=18375174 (480 should be 416)
>> CORRECT? yes
>> CANNOT READ BLK: 190122624
>> UNEXPECTED SOFT UPDATE INCONSISTENCY
>> CONTINUE? yes
>> THE FOLLOWING DISK SECTORS COULD NOT BE READ: 190122624, 190122625,
>> 190122626, 190122627, 190122628, 190122629, 190122630, 190122631,
>> 190122632, 190122633, 190122634, 190122635, 190122636, 190122637,
>> 190122638, 190122639, 190122640, 190122641, 190122642, 190122643,
>> 190122644, 190122645, 190122646, 190122647, 190122648, 190122649,
>> 190122650, 190122651, 190122652, 190122653, 190122654, 190122655,
>> INCORRECT BLOCK COUNT I=18375178 (512 should be 416)
>> CORRECT? yes
>> CANNOT READ BLK: 190422528
>> UNEXPECTED SOFT UPDATE INCONSISTENCY
>> CONTINUE? yes
>> [etc]
>> And then it crashed....
>> Again i did the sata tests numerous times and the drive seems
>> perfectly healthy... So i have to think it might be driver related
>> rather than hardware related.
>> Any help would be appreciated,
>> --
>> Best regards,
>> Paul-Kenji
>> _______________________________________________
>> freebsd-hackers@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
>> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"

> Lucas Holt
> Luke@FoolishGames.com
> ________________________________________________________
> FoolishGames.com (Jewel Fan Site)
> JustJournal.com (Free blogging)
> FoolishGames.net (Enemy Territory site)
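
The sectors that the second fsck run reports as unreadable come in consecutive runs rather than as scattered single sectors. A minimal Python sketch of how such a list could be collapsed into runs is given here; the function name `contiguous_runs` is hypothetical, the sample data is only the first two sector lists from the quoted output, and the 512-byte sector size is an assumption that the message does not state.

```python
def contiguous_runs(sectors):
    """Collapse sector numbers into (first, last, count) runs of consecutive sectors."""
    runs = []
    for s in sorted(set(sectors)):
        if runs and s == runs[-1][1] + 1:
            runs[-1][1] = s          # extends the current run
        else:
            runs.append([s, s])      # starts a new run
    return [(first, last, last - first + 1) for first, last in runs]


# First two unreadable-sector lists from the quoted fsck output,
# written as ranges instead of repeating all 64 numbers.
sample = list(range(190083360, 190083392)) + list(range(293557088, 293557120))

SECTOR_BYTES = 512  # assumption: sector size is not stated in the message

for first, last, count in contiguous_runs(sample):
    kib = count * SECTOR_BYTES // 1024
    print(f"{first}-{last}: {count} sectors ({kib} KiB if sectors are {SECTOR_BYTES} bytes)")
```

Under that assumption each listed run is 32 sectors, so every failure covers one whole 16 KiB read instead of an isolated bad sector. That observation alone does not say whether the drive, the cabling, or the driver is at fault.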