From owner-freebsd-questions@FreeBSD.ORG Sun Feb 27 22:09:52 2005
Delivered-To: freebsd-questions@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id E51D516A4D5
    for ; Sun, 27 Feb 2005 22:09:52 +0000 (GMT)
Received: from smtp11.wanadoo.fr (smtp11.wanadoo.fr [193.252.22.31])
    by mx1.FreeBSD.org (Postfix) with ESMTP id 3155643D53
    for ; Sun, 27 Feb 2005 22:09:52 +0000 (GMT)
    (envelope-from atkielski.anthony@wanadoo.fr)
Received: from me-wanadoo.net (localhost [127.0.0.1])
    by mwinf1107.wanadoo.fr (SMTP Server) with ESMTP id 5FAAB1C0008F
    for ; Sun, 27 Feb 2005 23:09:51 +0100 (CET)
Received: from pix.atkielski.com (ASt-Lambert-111-2-1-3.w81-50.abo.wanadoo.fr [81.50.80.3])
    by mwinf1107.wanadoo.fr (SMTP Server) with ESMTP id B35B61C00093
    for ; Sun, 27 Feb 2005 23:09:50 +0100 (CET)
X-ME-UUID: 20050227220950735.B35B61C00093@mwinf1107.wanadoo.fr
Date: Sun, 27 Feb 2005 23:09:50 +0100
From: Anthony Atkielski
X-Priority: 3 (Normal)
Message-ID: <1742881516.20050227230950@wanadoo.fr>
To: freebsd-questions@freebsd.org
References: <1561762673.20050227155330@wanadoo.fr>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Subject: Re: WRITE_DMA errors on SATA drive under 5.3-RELEASE
X-BeenThere: freebsd-questions@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: freebsd-questions@freebsd.org
List-Id: User questions
X-List-Received-Date: Sun, 27 Feb 2005 22:09:53 -0000

Mike Tancsa writes:

> Could be a bad sector on the drive, or bad cable. Hard to say. Try
> /usr/ports/sysutils/smartmontools/
>
> It can read all sorts of info off the drive and help you narrow down
> what the problem might be.

Wow! That is a very cool tool. There's even a Windows port so I can
use it on my XP machine.

The two SATA drives show no errors.
The older IDE drive (which contains the filesystem root) shows the
stuff below. There have been over 1000 read errors over the lifetime
of the disk, but the disk had some hard times back in December when it
was in my overheated old server, so that might account for part of
that. The most recent errors look like they might correlate with what
I saw today (unfortunately, I'm not sure how to interpret them):

======================================================================

smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     SAMSUNG SV4002H
Serial Number:    0413J1FR932555
Firmware Version: QP100-07
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   6
ATA Standard is:  ATA/ATAPI-6 T13 1410D revision 1
Local Time is:    Sun Feb 27 22:52:54 2005 CET

==> WARNING: May need -F samsung or -F samsung2 enabled; see manual for details.

SMART support is: Available - device has SMART capability.
SMART support is: Enabled
The SMART RETURN STATUS return value (smartmontools -H option/Directive)
 can not be retrieved with this version of ATAng, please do not rely on this value

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (1560) seconds.
Offline data collection
capabilities:                    (0x1b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (   8) minutes.

SMART Attributes Data Structure revision number: 9
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000a   100   100   000    Old_age   Always       -       1050
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       55
  5 Reallocated_Sector_Ct   0x0033   253   253   009    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   253   253   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0024   253   253   000    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       2968364
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       54
194 Temperature_Celsius     0x0022   175   145   000    Old_age   Always       -       21
197 Current_Pending_Sector  0x0033   253   253   009    Pre-fail  Always       -       0
198 Offline_Uncorrectable   0x0031   253   253   009    Pre-fail  Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x000b   100   100   051    Pre-fail  Always       -       0
201 Soft_Read_Error_Rate    0x000b   100   100   051    Pre-fail  Always       -       1

SMART Error Log Version: 1
Warning: ATA error count 22 inconsistent with error log pointer 4

ATA Error Count: 22 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
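
[Editorial sketch, not part of the smartctl output.] The register legend above can be read mechanically. The Python below is a minimal sketch under stated assumptions: the helper names are hypothetical and are not part of smartmontools, the Error register bit names are the classic ATA ones (names vary between ATA revisions), and the 49.710-day wrap is assumed to come from a 32-bit millisecond counter, which matches the figure smartctl prints. Because smartctl warns that this drive may need -F samsung, the raw bytes in the log may not follow the standard layout, so treat any decoded value as tentative.

```python
# Illustrative helpers for reading a smartctl ATA error-log entry.
# All names here are hypothetical; none come from smartmontools.

# Classic ATA Error register bit names, most significant bit first.
# (Later ATA revisions rename some of these bits.)
ERROR_BITS = ["ICRC", "UNC", "MC", "IDNF", "MCR", "ABRT", "TK0NF", "AMNF"]

# Assumption: Powered_Up_Time is a 32-bit millisecond counter.
WRAP_MS = 2**32


def decode_error_register(er):
    """Return the names of the bits set in the ER byte."""
    return [name for i, name in enumerate(ERROR_BITS) if er & (0x80 >> i)]


def lba28(sn, cl, ch, dh):
    """Combine SN, CL, CH and the low nibble of DH into a 28-bit LBA.

    Only meaningful when the LBA bit (0x40) of DH is set.
    """
    if not dh & 0x40:
        raise ValueError("DH does not select LBA addressing")
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def format_powered_up_time(ms):
    """Format milliseconds since power-on as DDd+hh:mm:SS.sss."""
    ms %= WRAP_MS  # emulate the counter wrapping
    days, rem = divmod(ms, 86_400_000)
    hours, rem = divmod(rem, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    clock = f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"
    return f"{days}d+{clock}" if days else clock


if __name__ == "__main__":
    # ER = 04, as recorded for every error in this log.
    print(decode_error_register(0x04))        # ['ABRT']
    # READ MULTIPLE with SN=7f CL=01 CH=06 DH=e0.
    print(lba28(0x7F, 0x01, 0x06, 0xE0))      # 393599
    # Last value before the counter wraps.
    print(format_powered_up_time(WRAP_MS - 1))  # 49d+17:02:47.295
    # Wrap period in days.
    print(round(WRAP_MS / 86_400_000, 3))     # 49.71
```

Read this way, ER = 04 has only the ABRT (command aborted) bit set, and a command line such as "c4 00 19 7f 01 06 e0" describes a READ MULTIPLE of 0x19 (25) sectors starting at LBA 393599, assuming the standard register layout applies to this drive.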

Error 22 occurred at disk power-on lifetime: 23324 hours (971 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 88 05 01 00 00 a0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  a1 00 05 01 00 00 a0 00  49d+16:22:20.296  IDENTIFY PACKET DEVICE
  ec 00 05 01 00 00 b0 00  49d+16:22:20.296  IDENTIFY DEVICE
  a1 00 05 01 00 00 b0 00  49d+16:22:20.296  IDENTIFY PACKET DEVICE
  c4 00 19 7f 01 06 e0 ff  49d+16:22:06.296  READ MULTIPLE
  c4 00 01 40 00 00 e0 00  49d+16:20:45.296  READ MULTIPLE

Error 21 occurred at disk power-on lifetime: 23324 hours (971 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 88 05 01 00 00 a0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  a1 00 05 01 00 00 a0 00  49d+16:20:17.296  IDENTIFY PACKET DEVICE
  ec 00 05 01 00 00 b0 00  49d+16:20:17.296  IDENTIFY DEVICE
  a1 00 05 01 00 00 b0 00  49d+16:20:17.296  IDENTIFY PACKET DEVICE
  ca 00 0c 5f 61 38 e0 ff  49d+16:20:04.296  WRITE DMA
  e7 00 00 00 00 00 e0 00  49d+16:19:33.296  FLUSH CACHE

Error 20 occurred at disk power-on lifetime: 23283 hours (970 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 88 05 01 00 00 a0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  a1 00 05 01 00 00 a0 00  49d+09:02:47.296  IDENTIFY PACKET DEVICE
  ec 00 05 01 00 00 b0 00  49d+09:02:47.296  IDENTIFY DEVICE
  a1 00 05 01 00 00 b0 00  49d+09:02:47.296  IDENTIFY PACKET DEVICE
  c4 00 1a ff cd 06 e0 ff  49d+09:02:34.296  READ MULTIPLE
  c4 00 20 df cd 06 e0 ff      07:57:42.000  READ MULTIPLE

Error 19 occurred at disk power-on lifetime: 23281 hours (970 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 88 05 01 00 00 a0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  a1 00 05 01 00 00 a0 00      07:50:43.000  IDENTIFY PACKET DEVICE
  ec 00 05 01 00 00 b0 00      07:50:43.000  IDENTIFY DEVICE
  a1 00 05 01 00 00 b0 00      07:50:43.000  IDENTIFY PACKET DEVICE
  c4 00 07 98 01 06 e0 ff      07:50:43.000  READ MULTIPLE
  e3 00 00 40 00 00 a0 00      07:50:43.000  IDLE

Error 18 occurred at disk power-on lifetime: 23272 hours (969 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 88 05 01 00 00 a0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b0 d5 01 01 4f c2 e0 00      05:59:56.000  SMART READ LOG
  b0 d1 01 01 4f c2 e0 00      05:59:56.000  SMART READ ATTRIBUTE THRESHOLDS [OBS-4]
  b0 d0 00 00 4f c2 e0 00      05:59:56.000  SMART READ DATA
  b0 da 00 00 4f c2 e0 00      05:59:56.000  SMART RETURN STATUS
  b0 da 00 00 4f c2 e0 00      05:59:56.000  SMART RETURN STATUS

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Device does not support Selective Self Tests/Logging

--
Anthony