Date: Mon, 20 Oct 2008 15:12:12 -0200
From: JoaoBR <joao@matik.com.br>
To: Chuck Swiger <cswiger@mac.com>
Cc: Jeremy Chadwick <koitsu@freebsd.org>, freebsd-stable@freebsd.org
Subject: Re: constant zfs data corruption
Message-ID: <200810201512.12926.joao@matik.com.br>
In-Reply-To: <98238FC8-0FC4-4410-829F-EF2EA16A57B8@mac.com>
References: <200810171530.45570.joao@matik.com.br> <20081020132208.GA3847@icarus.home.lan> <98238FC8-0FC4-4410-829F-EF2EA16A57B8@mac.com>
On Monday 20 October 2008 14:44:50 Chuck Swiger wrote:
> Hi, all--
>
> On Oct 20, 2008, at 6:22 AM, Jeremy Chadwick wrote:
> [ ...JoaoBR wrote... ]
>
> >> well, hardware seems to be ok and not older than 6 month, also
> >> happens not only on one machine ... smartctl do not report any
> >> hw failures on disk
> >>
> >> regarding jumpering the drives to 150 you suspect a driver problem?
> >
> > It's not because of a driver problem.  There are known SATA chipsets
> > which do not properly work with SATA300 (particularly VIA and SiS
> > chipsets); they claim to support it, but data is occasionally
> > corrupted.
> > Capping the drive to SATA150 fixes this problem.
> >
> > http://en.wikipedia.org/wiki/Serial_ATA#SATA_1.5_Gbit.2Fs_and_SATA_3_Gbit.2Fs
>
> Exactly so.  Just as a general principle, if you've got sporadic data
> corruption, turning I/O and system busses down a notch and retesting
> is a useful starting point towards identifying whether the issue is
> repeatable and whether it leans towards a hardware issue or software.
> However, ZFS file checksumming supposedly is code that has been
> carefully reviewed and tested so when it logs problems that is
> supposed to be a fairly sure sign that the hardware isn't behaving
> right.

OK, I will jumper it on some machines and see if the error comes back,
even though mine are Nvidia SATA.

> > Because you didn't provide your smartctl output, I can't really tell
> > if the drives are in "good shape" or not. :-)
> >
> > Also, do you not think it's a little odd that the only data corruption
> > occurring for you are related to RRDtool?
>
> RRD tends to involve lots of small writes so it's files are going to
> be changed often compared to other things that might be running; a
> busy webserver or mailserver would involve more I/O to logfiles and
> queue/mailspool, or so I would expect, but who knows what the machine
> in question is being used for?
>

These servers are transparent proxies (squid) on top of small ISP
networks, with IPFW bandwidth control for the clients. rrdtool collects
the client traffic and some other data every 5 minutes.

Very occasionally I get data corruption on a squid_cache file, normally
2 days after the rrdtool error first appears.

--
João

This message was scanned by the e-mail system and can be considered safe.
Service provided by Datacenter Matik https://datacenter.matik.com.br
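The "jumper to SATA150 and see if the error comes back" retest discussed in this message could be made a little more repeatable with a write-then-verify loop. The following is only a rough sketch under assumptions that are not from the thread: the function names `write_test_files` and `verify_test_files` are hypothetical, the file sizes and counts are arbitrary, and a re-read may be served from cache rather than from the disk, so a clean result does not prove the hardware is fine. It is not a substitute for the checksum errors ZFS itself logs.

```python
import hashlib
import os
import tempfile


def write_test_files(directory, count=8, size=1 << 20):
    """Write `count` files of random bytes; return {path: sha256 hex digest}."""
    digests = {}
    for i in range(count):
        data = os.urandom(size)
        path = os.path.join(directory, f"verify_{i:04d}.bin")
        with open(path, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())  # ask the OS to push the data to the device
        # Digest is taken from the in-memory data, before any re-read.
        digests[path] = hashlib.sha256(data).hexdigest()
    return digests


def verify_test_files(digests):
    """Re-read every file and return the paths whose contents no longer match."""
    bad = []
    for path, expected in digests.items():
        with open(path, "rb") as fh:
            actual = hashlib.sha256(fh.read()).hexdigest()
        if actual != expected:
            bad.append(path)
    return bad


if __name__ == "__main__":
    # Demo run in a temporary directory; point `directory` at the
    # filesystem under suspicion for a real retest.
    with tempfile.TemporaryDirectory() as d:
        recorded = write_test_files(d, count=4, size=64 * 1024)
        print("mismatches:", verify_test_files(recorded))
```

Running the verify step again some days later, before and after capping the drives, would give a rough indication of whether the corruption is repeatable at each link speed.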