Date: Mon, 20 Oct 2008 15:07:30 -0200
From: JoaoBR <joao@matik.com.br>
To: Jeremy Chadwick <koitsu@freebsd.org>
Cc: freebsd-stable@freebsd.org
Subject: Re: constant zfs data corruption
Message-ID: <200810201507.30778.joao@matik.com.br>
In-Reply-To: <20081020132208.GA3847@icarus.home.lan>
References: <200810171530.45570.joao@matik.com.br> <200810200837.40451.joao@matik.com.br> <20081020132208.GA3847@icarus.home.lan>
On Monday 20 October 2008 11:22:08 you wrote:
> On Mon, Oct 20, 2008 at 08:37:40AM -0200, JoaoBR wrote:
> > On Friday 17 October 2008 15:39:59 Chuck Swiger wrote:
> > > On Oct 17, 2008, at 11:30 AM, JoaoBR wrote:
> > > > I constantly find data corruption on ZFS volumes, always from
> > > > rrdtool
> > > > this corrupt data happens on SATA disks, never seen on SCSI
> > >
> > > Presumably your SATA drives are correctly being reported by ZFS as
> > > corrupting data, and you should do something like replace cables, the
> > > drives themselves, perhaps try downgrading to SATA-150 rather than
> > > -300 if you are using the latter.  Also consider running a drive
> > > diagnostic utility from the mfgr (or smartmontools) and doing an
> > > extended self-test or destructive write surface check.
> >
> > well, hardware seems to be ok and not older than 6 months, also happens
> > not only on one machine ... smartctl does not report any hw failures on
> > disk
> >
> > regarding jumpering the drives to 150 you suspect a driver problem?
>
> It's not because of a driver problem.  There are known SATA chipsets
> which do not properly work with SATA300 (particularly VIA and SiS
> chipsets); they claim to support it, but data is occasionally corrupted.
> Capping the drive to SATA150 fixes this problem.
>
> http://en.wikipedia.org/wiki/Serial_ATA#SATA_1.5_Gbit.2Fs_and_SATA_3_Gbit.2Fs
>
> There are also known problems with Silicon Image chipsets (on Linux,
> Windows, and FreeBSD).
>
> Because you didn't provide your smartctl output, I can't really tell if
> the drives are in "good shape" or not.
> :-)
>

ok then here it comes

smartctl version 5.38 [amd64-portbld-freebsd7.0] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Hitachi Deskstar T7K500
Device Model:     Hitachi HDT725025VLA380
Serial Number:    VFL101RK0A9SDP
Firmware Version: V5DOA7EA
User Capacity:    250.058.268.160 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 1
Local Time is:    Mon Oct 20 15:07:01 2008 BRST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (4949) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  83) minutes.
SCT capabilities:              (0x003f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   099   099   016    Pre-fail  Always       -       3
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0007   117   117   024    Pre-fail  Always       -       316 (Average 322)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       36
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   020    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       800
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       36
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       69
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       69
194 Temperature_Celsius     0x0002   130   130   000    Old_age   Always       -       46 (Lifetime Min/Max 19/52)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

> Also, do you not think it's a little odd that the only data corruption
> occurring for you are related to RRDtool?

yes, this I do think is suspicious

-- 
João

This message was scanned by the e-mail system and can be considered safe.
Service provided by Datacenter Matik https://datacenter.matik.com.br
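
For readers who want to check a SMART attribute table like the one above across several machines, here is a minimal sketch in Python. It is not part of the original message. The function names, the regular expression, and the choice of "watched" attributes (5, 196, 197, 198, 199) are assumptions made for illustration; the sample rows are copied from the smartctl output in this mail. It only looks at raw values and is no substitute for an extended self-test.

```python
import re

# One attribute row as printed by smartctl -A:
# ID, name, flag, value, worst, thresh, type, updated, when_failed, raw.
# The pattern is an assumption based on the layout shown in this mail.
ATTR_LINE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]{4})\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)

# Hypothetical watch list: attributes whose non-zero raw value commonly
# points at media or link trouble (199 counts interface CRC errors).
WATCHED = (5, 196, 197, 198, 199)

SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   099   099   016    Pre-fail  Always       -       3
  3 Spin_Up_Time            0x0007   117   117   024    Pre-fail  Always       -       316 (Average 322)
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       0
"""


def parse_attributes(text):
    """Return {attribute_id: {...}} for every attribute row found in text."""
    rows = {}
    for line in text.splitlines():
        m = ATTR_LINE.match(line)
        if m:
            rows[int(m.group(1))] = {
                "name": m.group(2),
                "value": int(m.group(4)),
                "worst": int(m.group(5)),
                "thresh": int(m.group(6)),
                "raw": int(m.group(10)),
            }
    return rows


def suspicious(rows):
    """List watched attribute IDs that are present with a non-zero raw value."""
    return [i for i in WATCHED if i in rows and rows[i]["raw"] > 0]


rows = parse_attributes(SAMPLE)
print(suspicious(rows))
```

An empty result only says that none of the watched counters has moved; it does not rule out a controller or chipset problem of the kind discussed in the quoted text.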