Date: Sat, 16 Feb 2013 11:55:05 +0200
From: Alexandr Krivulya <shuriku@shurik.kiev.ua>
To: freebsd-fs@freebsd.org
Subject: Re: error destroying zfs filesystem
Message-ID: <511F5779.5030805@shurik.kiev.ua>
In-Reply-To: <511E30B3.2070302@brockmann-consult.de>
References: <511E1C6B.50101@shurik.kiev.ua> <CAJDksDTnMNUo06Exivr4-tcCcGCNuRZnvP4jP7XTc-ydekKGvQ@mail.gmail.com> <511E30B3.2070302@brockmann-consult.de>
[-- Attachment #1 --]
15.02.2013 14:57, Peter Maloney wrote:
> On 2013-02-15 13:44, Alexandr Kovalenko wrote:
>> On Fri, Feb 15, 2013 at 11:30 AM, Alexandr Krivulya
>> <shuriku@shurik.kiev.ua> wrote:
>>> Hello everyone!
>>>
>>> After upgrading my zfs-only system from 8.2 to 9.1 I have many errors
>>> related to zfs in my /var/log/messages:
>>>
>>> Feb 15 13:12:44 gw kernel: metaslab_free_dva(): bad DVA
>>> 0:264842321920Solaris: WARNING: metaslab_free_dva(): bad DVA 0:338480095232
>>> Feb 15 13:12:44 gw kernel: Solaris: WARNING: metaslab_free_dva(): bad
>>> DVA 0:277633901056Solaris: WARNING:
>>> Feb 15 13:12:45 gw kernel: metaslab_free_dva(): bad DVA
>>> 0:277263710208Solaris: WARNING: metaslab_free_dva(): bad DVA
>>> 0:277633606144Solaris: WARNING: metaslab_free_dva(): bad DVA
>>> 0:278349642240Solaris: WARNING: metaslab_free_dva(): bad DVA
>>> 0:278429099008Solaris: WARNING: metaslab_free_dva(): bad DVA
>>> 0:278349926400Solaris: WARNING: metaslab_free_dva(): bad DVA
>>> 0:278245378560Solaris: WARNING: metaslab_free_dva(): bad DVA
>>> 0:256838777344Solaris: WARNING: metaslab_free_dva(): bad DVA 0:327364684800
>>> Feb 15 13:12:45 gw kernel: Solaris: WARNING: metaslab_free_dva(): bad
>>> DVA 0:312373604864
>>>
>>> root@gw:/ # zpool status -v
>>>   pool: zmirror
>>>  state: ONLINE
>>> status: One or more devices has experienced an error resulting in data
>>>         corruption. Applications may be affected.
>>> action: Restore the file in question if possible. Otherwise restore the
>>>         entire pool from backup.
>>>    see: http://illumos.org/msg/ZFS-8000-8A
>>>   scan: scrub repaired 0 in 1h39m with 1 errors on Thu Feb 14 17:48:53 2013
>>> config:
>>>
>>>         NAME            STATE     READ WRITE CKSUM
>>>         zmirror         ONLINE       0     0     2
>>>           mirror-0      ONLINE       0     0     8
>>>             gpt/disk01  ONLINE       0     0     8
>>>             gpt/disk02  ONLINE       0     0     8
>>>
>>> errors: Permanent errors have been detected in the following files:
>>>
>>>         zmirror/usr:<0x0>
>>>         <0xc8>:<0x0>
>> [dd]
>>> How can I solve this issue?
>> Make smartctl -t long /dev/<your_physical_drive_here> and then take a
>> look if there any pending sectors/errors in output of smartctl -a
>> /dev/<your_physical_drive_here> ? (for both of drives used)

All tests seem to be fine:

root@gw:/usr/home/support # smartctl -l selftest /dev/ada0
smartctl 6.0 2012-10-10 r3643 [FreeBSD 9.1-RELEASE amd64] (local build)
Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      1446         -

root@gw:/usr/home/support # smartctl -l selftest /dev/ada1
smartctl 6.0 2012-10-10 r3643 [FreeBSD 9.1-RELEASE amd64] (local build)
Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      1630

smartctl also didn't show any problems; see the attached file.

> You could also try going in /usr and "rm" or "truncate" some files
> until the "Permanent errors have been detected" list is empty. And
> this assumes you already ran a full scrub, which you must do to remove
> the files.

Now I cannot mount this filesystem to remove files:

root@gw:/usr/home/support # zfs mount zmirror/usr
cannot mount 'zmirror/usr': mountpoint or dataset is busy

The only way I see is to back up the entire pool, destroy and recreate it,
and restore from the backup.
[-- Attachment #2 --]
root@gw:/usr/home/support # smartctl -iAH /dev/ada0
smartctl 6.0 2012-10-10 r3643 [FreeBSD 9.1-RELEASE amd64] (local build)
Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital RE4 Serial ATA
Device Model:     WDC WD5003ABYX-01WERA1
Serial Number:    WD-WMAYP3251340
LU WWN Device Id: 5 0014ee 0032ad53b
Firmware Version: 01.01S02
User Capacity:    500 107 862 016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sat Feb 16 11:49:32 2013 EET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   139   139   021    Pre-fail  Always       -       4033
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       18
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       1465
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       16
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       15
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       2
194 Temperature_Celsius     0x0022   116   095   000    Old_age   Always       -       27
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

root@gw:/usr/home/support # smartctl -iAH /dev/ada1
smartctl 6.0 2012-10-10 r3643 [FreeBSD 9.1-RELEASE amd64] (local build)
Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital RE4 Serial ATA
Device Model:     WDC WD5003ABYX-01WERA1
Serial Number:    WD-WMAYP3265645
LU WWN Device Id: 5 0014ee 0032c2b14
Firmware Version: 01.01S02
User Capacity:    500 107 862 016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sat Feb 16 11:48:40 2013 EET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   142   142   021    Pre-fail  Always       -       3875
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       15
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       1649
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       13
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       12
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       2
194 Temperature_Celsius     0x0022   116   095   000    Old_age   Always       -       27
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
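
Editor's note: the check requested earlier in the thread (looking for pending sectors or errors in the smartctl attribute table) can be done mechanically. The following is a minimal Python sketch, not part of the original message. It assumes the ten-column attribute layout shown in the attachment; the function names and the WATCHED list are illustrative choices, not anything smartmontools defines. The sample rows are copied from the /dev/ada0 table.

```python
# Hypothetical helper: flag SMART attributes whose raw value is non-zero.
# WATCHED is an assumed list of counters often associated with media or
# cabling problems; adjust it for your drives.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
)


def parse_attributes(text):
    """Return {attribute_name: raw_value} from `smartctl -A` style output."""
    attrs = {}
    for line in text.splitlines():
        # Columns: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        fields = line.split(None, 9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # header, banner, or blank line
        raw = fields[9].split()[0]  # RAW_VALUE may carry trailing annotations
        if raw.isdigit():
            attrs[fields[1]] = int(raw)
    return attrs


def suspicious(attrs, watched=WATCHED):
    """Return the watched attributes that have a non-zero raw value."""
    return {name: attrs[name] for name in watched if attrs.get(name, 0) > 0}


# Rows copied from the /dev/ada0 attachment above.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    print(suspicious(parse_attributes(SAMPLE)))  # prints {}
```

An empty result matches what the poster reports: neither drive shows reallocated or pending sectors, even though the pool still lists checksum errors.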
