Date: Mon, 12 Sep 2005 15:53:27 +0200
From: MaXX <bs139412@skynet.be>
To: freebsd-stable@freebsd.org
Subject: Re: Stress testing and TIMEOUT - WRITE_DMA
Message-ID: <200509121553.27981.bs139412@skynet.be>
In-Reply-To: <20050912120040.02A6B16A41F@hub.freebsd.org>
References: <20050912120040.02A6B16A41F@hub.freebsd.org>
On Fri, 26 Aug 2005 03:21:35 -0600 Anthony Chavez <acc@anthonychavez.org> wrote:
> My question is simply this: is the fact that I received 4 TIMEOUT
> warnings in the space of roughly 2 weeks significant cause for concern?

Hi,

You may want to have a look at PR 85603 (FS corruption and 'uncorrectable' DMA errors on ATA disks after unclean shutdown) and see if it applies to you.

Are you running a kernel built around mid-June this year? Did your machine panic before the DMA problems appeared (I think a power failure can do the trick too)?

Several of us Usenet users were experiencing this kind of problem (the thread on news://comp.unix.bsd.freebsd.misc was named "Disaster Recovery?" and started 30 Aug 05).

If you have the same problem as us, the fix is easy:
- back up your data with tar (will take a while due to the timeouts)
- fdisk + newfs
- restore your backup
- cvsup + upgrade your kernel

and that's all... I was surprised to see my PostgreSQL database come back online without a single error message; Pg really hates it when the FS is inconsistent...

In our case the problem was fixed by newfs, even though smartctl (sysutils/smartmontools) did report errors at the drive level. After newfs'ing the disk there were no more messages (but the old ones are still in the drive's log).

Hope this is relevant to your problem...

-- 
MaXX

I tested my drive as follows:

On comp.unix.bsd.freebsd.misc MaXX wrote:
> I will stress test the drive to see if it still reliable for some purpose.

I've finished some tests on the drive:

1. Filled the drive with huge files (11, 25, 30 and 10 GB), 3 simultaneous writes
   => no DMA_READ or DMA_WRITE errors; fsck OK.
2. Copied /usr/ports 18 times, with some distfiles and work folders (2 simultaneous copies, 9 times each, about 4 596 000 files)
   => no DMA_READ or DMA_WRITE errors; fsck NOT OK: a bunch of errors which seem to be only at the file system level.
3. md5 sum of the 4 596 000 files before the corrective fsck: no errors, burning hot drive.
4. Clean reboot + fsck: OK; fsck skipped checks.
5. Compared the md5 sums from before and after the reboot: OK, no missing files/folders, newsum == oldsum.

I then tried to reproduce the initial problem; no way to do it... I killed init and pulled the plug while writing or reading. No way to get those DMA_* errors back (note: the kernel was not the same as the one that failed)... I give up...

Conclusion: the disk is reliable enough to go back to work with a good backup policy (maybe in a vinum mirror to be sure). The problem seems to be bound to the kernel the machine had been running since mid-June 05.
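
For anyone who wants to repeat steps 3 and 5 of the test above (checksum everything, reboot, checksum again, compare), here is a minimal sketch in Python. It is not the procedure actually used for the test, which is not described in detail in the message; the function names build_manifest and compare_manifests are made up for this example, and the sketch assumes the file tree fits comfortably in a dictionary in memory.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of one file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each regular file under root (relative path) to its MD5 digest.

    Hypothetical helper: symlinks and special files are skipped.
    """
    manifest = {}
    for directory, _subdirs, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(directory, name)
            if os.path.islink(full_path) or not os.path.isfile(full_path):
                continue
            manifest[os.path.relpath(full_path, root)] = md5_of_file(full_path)
    return manifest


def compare_manifests(old, new):
    """Return (missing, added, changed) relative paths between two manifests."""
    old_paths, new_paths = set(old), set(new)
    missing = sorted(old_paths - new_paths)
    added = sorted(new_paths - old_paths)
    changed = sorted(p for p in old_paths & new_paths if old[p] != new[p])
    return missing, added, changed
```

The idea would be to call build_manifest on the test tree before the reboot, save the result somewhere off the disk under test, call it again afterwards, and pass both results to compare_manifests; three empty lists correspond to the "newsum == oldsum" outcome reported in step 5.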
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200509121553.27981.bs139412>