Date: Wed, 27 Jul 2011 16:34:23 +0300
From: Andriy Gapon <avg@FreeBSD.org>
To: Steven Hartland <killing@multiplay.co.uk>
Cc: freebsd-fs@FreeBSD.org
Subject: Re: zfs process hang on pool access
Message-ID: <4E3013DF.10803@FreeBSD.org>
In-Reply-To: <0D449EC916264947AB31AA17F870EA7A@multiplay.co.uk>
References: <A14F1C768A41483C876AD77502A864D6@multiplay.co.uk> <0D449EC916264947AB31AA17F870EA7A@multiplay.co.uk>

on 27/07/2011 15:06 Steven Hartland said the following:
> I've checked the raw disk and all seems fine there, so does look like its
> some sort of zfs livelock.
>
> I'm trying to keep the machine available in case someone needs more information,
> but its a production machine so I'm going to have to reboot it in the next
> few hours.
>
> Disk tests:-
>
> dd if=/dev/da1 of=/dev/null bs=10m
> 5724+1 records in
> 5724+1 records out
> 60022480896 bytes transferred in 430.479894 secs (139431555 bytes/sec)
>
>
> smartctl -a /dev/da1

Is this the only disk associated with the troubled pool?

> smartctl 5.40 2010-10-16 r3189 [FreeBSD 8.2-RELEASE amd64] (local build)
> Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
>
> === START OF INFORMATION SECTION ===
> Model Family:     SandForce Driven SSDs
> Device Model:     Corsair CSSD-F60GB2
> Serial Number:    10446509320009990024
> Firmware Version: 1.1
> User Capacity:    60,022,480,896 bytes
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   8
> ATA Standard is:  ATA-8-ACS revision 6
> Local Time is:    Wed Jul 27 11:27:30 2011 UTC
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> General SMART Values:
> Offline data collection status:  (0x00) Offline data collection activity
>                                         was never started.
>                                         Auto Offline Data Collection: Disabled.
> Self-test execution status:      (   0) The previous self-test routine completed
>                                         without error or no self-test has ever
>                                         been run.
> Total time to complete Offline data collection: (   0) seconds.
> Offline data collection
> capabilities:                    (0x7f) SMART execute Offline immediate.
>                                         Auto Offline data collection on/off support.
>                                         Abort Offline collection upon new
>                                         command.
>                                         Offline surface scan supported.
>                                         Self-test supported.
>                                         Conveyance Self-test supported.
>                                         Selective Self-test supported.
> SMART capabilities:            (0x0003) Saves SMART data before entering
>                                         power-saving mode.
>                                         Supports SMART auto save timer.
> Error logging capability:        (0x01) Error logging supported.
>                                         General Purpose Logging supported.
> Short self-test routine recommended polling time: (   1) minutes.
> Extended self-test routine
> recommended polling time:        (  48) minutes.
> Conveyance self-test routine
> recommended polling time:        (   2) minutes.
> SCT capabilities:              (0x003d) SCT Status supported.
>                                         SCT Error Recovery Control supported.
>                                         SCT Feature Control supported.
>                                         SCT Data Table supported.
>
> SMART Attributes Data Structure revision number: 10
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000f 119   100   050    Pre-fail Always  -           0/238293224
>   5 Retired_Block_Count     0x0033 097   097   003    Pre-fail Always  -           256
>   9 Power_On_Hours_and_Msec 0x0032 100   100   000    Old_age  Always  -           5513h+00m+39.450s
>  12 Power_Cycle_Count       0x0032 100   100   000    Old_age  Always  -           2
> 171 Program_Fail_Count      0x0000 000   000   000    Old_age  Offline -           0
> 172 Erase_Fail_Count        0x0000 000   000   000    Old_age  Offline -           0
> 174 Unexpect_Power_Loss_Ct  0x0030 000   000   000    Old_age  Offline -           0
> 177 Wear_Range_Delta        0x0000 000   000   ---    Old_age  Offline -           1
> 181 Program_Fail_Count      0x0000 000   000   000    Old_age  Offline -           0
> 182 Erase_Fail_Count        0x0000 000   000   000    Old_age  Offline -           0
> 187 Reported_Uncorrect      0x0032 100   100   000    Old_age  Always  -           0
> 194 Temperature_Celsius     0x0022 022   026   000    Old_age  Always  -           22 (Min/Max 0/26)
> 195 ECC_Uncorr_Error_Count  0x001c 119   100   000    Old_age  Offline -           0/238293224
> 196 Reallocated_Event_Count 0x0033 100   100   003    Pre-fail Always  -           0
> 231 SSD_Life_Left           0x0013 057   057   010    Pre-fail Always  -           0
> 233 SandForce_Internal      0x0000 000   000   000    Old_age  Offline -           152704
> 234 SandForce_Internal      0x0000 000   000   000    Old_age  Offline -           90688
> 241 Lifetime_Writes_GiB     0x0032 000   000   000    Old_age  Always  -           90688
> 242 Lifetime_Reads_GiB      0x0032 000   000   000    Old_age  Always  -           3584
>
> Error SMART Error Log Read failed: Input/output error
> Smartctl: SMART Error Log Read Failed
> Error SMART Error Self-Test Log Read failed: Input/output error
> Smartctl: SMART Self Test Log Read Failed
> SMART Selective self-test log data structure revision number 1
>  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
>     1        0        0  Not_testing
>     2        0        0  Not_testing
>     3        0        0  Not_testing
>     4        0        0  Not_testing
>     5        0        0  Not_testing
> Selective self-test flags (0x0):
>   After scanning selected spans, do NOT read-scan remainder of disk.
> If Selective self-test is pending on power-up, resume after 0 minute delay.

-- 
Andriy Gapon
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4E3013DF.10803>