Date: Thu, 23 Feb 2006 13:31:16 +0100 (CET)
From: Michael Reifenberger
To: pjd@FreeBSD.org
Cc: FreeBSD Stable
Subject: graid3 data corruption?!?
Message-ID: <20060223131549.V38816@fw.reifenberger.com>
List-Id: Production branch of FreeBSD source code

Hi,

I have 5 FireWire disks in one graid3 set and am running a fresh STABLE on SMP with a dual AMD64 in i386 mode.
While doing an md5 checksum of all files in the filesystem (~770GB of data), one disk died. graid3 did the right thing and disconnected the disk.

BUT: after diffing the md5sums of the files, one large file (probably the one that was being checked during the disk failure) had a different md5sum than before:

--- md5_11.log  Fri Dec  9 13:23:07 2005
+++ md5_12.log  Wed Feb 22 18:03:03 2006
@@ -4460,3 +4460,3 @@
 MD5 (Backup/totum/root_0_050211_i386.dmp.gz) = 5a3e7b03f48ea4c2cba10624edd996cf
-MD5 (Backup/totum/root_0_050715.dmp.gz) = 0e154301cbec84571d1df94bf68e3d79
+MD5 (Backup/totum/root_0_050715.dmp.gz) = 172d7c12b78f3f191c184d467e31a53c
 MD5 (RIP/.pgp/PGPMacBinaryMappings.txt) = bf1b637a3a69bcbb8d4177be46a1c3ac

BUT: running a fresh md5sum of the file now, in degraded mode, I again get the (correct) value of:

MD5 (Backup/totum/root_0_050715.dmp.gz) = 0e154301cbec84571d1df94bf68e3d79

To me this means that graid3 returned incorrect data during the disk loss. This shouldn't happen!
Any clues how this could happen? Has anyone else seen this behaviour?

BTW: dmesg showed:
...
GEOM_RAID3: Device data created (id=0).
GEOM_RAID3: Device data: provider da5s1a detected.
GEOM_RAID3: Device data: provider da4s1a detected.
GEOM_RAID3: Device data: provider da3s1a detected.
GEOM_RAID3: Device data: provider da2s1a detected.
GEOM_RAID3: Device data: provider da1s1a detected.
GEOM_RAID3: Device data: provider da1s1a activated.
GEOM_RAID3: Device data: provider da2s1a activated.
GEOM_RAID3: Device data: provider da4s1a activated.
GEOM_RAID3: Device data: provider da3s1a activated.
GEOM_RAID3: Device data: provider da5s1a activated.
GEOM_RAID3: Device data: provider raid3/data launched.
...
(da2:sbp0:0:0:0): READ(10). CDB: 28 0 9 3f 46 6f 0 0 40 0
(da2:sbp0:0:0:0): CAM Status: SCSI Status Error
(da2:sbp0:0:0:0): SCSI Status: Check Condition
(da2:sbp0:0:0:0): ABORTED COMMAND asc:0,0
(da2:sbp0:0:0:0): No additional sense information
(da2:sbp0:0:0:0): Retrying Command (per Sense Data)
(da2:sbp0:0:0:0): READ(10). CDB: 28 0 9 3f 46 6f 0 0 40 0
(da2:sbp0:0:0:0): CAM Status: SCSI Status Error
(da2:sbp0:0:0:0): SCSI Status: Check Condition
(da2:sbp0:0:0:0): MEDIUM ERROR asc:4b,0
(da2:sbp0:0:0:0): Data phase error
(da2:sbp0:0:0:0): Retrying Command (per Sense Data)
(da2:sbp0:0:0:0): READ(10). CDB: 28 0 9 3f 46 6f 0 0 40 0
(da2:sbp0:0:0:0): CAM Status: SCSI Status Error
(da2:sbp0:0:0:0): SCSI Status: Check Condition
(da2:sbp0:0:0:0): ABORTED COMMAND asc:0,0
(da2:sbp0:0:0:0): No additional sense information
(da2:sbp0:0:0:0): Retrying Command (per Sense Data)
(da2:sbp0:0:0:0): READ(10). CDB: 28 0 9 3f 46 6f 0 0 40 0
(da2:sbp0:0:0:0): CAM Status: SCSI Status Error
(da2:sbp0:0:0:0): SCSI Status: Check Condition
(da2:sbp0:0:0:0): MEDIUM ERROR asc:4b,0
(da2:sbp0:0:0:0): Data phase error
(da2:sbp0:0:0:0): Retrying Command (per Sense Data)
(da2:sbp0:0:0:0): READ(10). CDB: 28 0 9 3f 46 6f 0 0 40 0
(da2:sbp0:0:0:0): CAM Status: SCSI Status Error
(da2:sbp0:0:0:0): SCSI Status: Check Condition
(da2:sbp0:0:0:0): ABORTED COMMAND asc:0,0
(da2:sbp0:0:0:0): No additional sense information
(da2:sbp0:0:0:0): Retries Exhausted
GEOM_RAID3: Request failed. da2s1a[READ(offset=79432531968, length=32768)]
GEOM_RAID3: Device data: provider da2s1a disconnected.
GEOM_RAID3: Request failed. da2s1a[READ(offset=79432761344, length=32768)]
GEOM_RAID3: Device data: provider [unknown] disconnected.
GEOM_RAID3: Request failed. da2s1a[READ(offset=79432695808, length=32768)]
GEOM_RAID3: Device data: provider [unknown] disconnected.
GEOM_RAID3: Request failed. da2s1a[READ(offset=79432663040, length=32768)]
GEOM_RAID3: Device data: provider [unknown] disconnected.
GEOM_RAID3: Request failed. da2s1a[READ(offset=79432630272, length=32768)]
GEOM_RAID3: Device data: provider [unknown] disconnected.
GEOM_RAID3: Request failed. da2s1a[READ(offset=79432597504, length=32768)]
GEOM_RAID3: Device data: provider [unknown] disconnected.
...
(da2:sbp0:0:0:0): READ(10). CDB: 28 0 9 3f 46 80 0 0 40 0
(da2:sbp0:0:0:0): CAM Status: SCSI Status Error
(da2:sbp0:0:0:0): SCSI Status: Check Condition
(da2:sbp0:0:0:0): MEDIUM ERROR asc:4b,0
(da2:sbp0:0:0:0): Data phase error
(da2:sbp0:0:0:0): Retrying Command (per Sense Data)
(da2:sbp0:0:0:0): READ(10). CDB: 28 0 9 3f 46 80 0 0 40 0
(da2:sbp0:0:0:0): CAM Status: SCSI Status Error
(da2:sbp0:0:0:0): SCSI Status: Check Condition
(da2:sbp0:0:0:0): MEDIUM ERROR asc:4b,0
(da2:sbp0:0:0:0): Data phase error
(da2:sbp0:0:0:0): Retrying Command (per Sense Data)

The last CAM errors occurred during `dd`.

Bye/2
---
Michael Reifenberger, Business Development Manager SAP-Basis, Plaut Consulting
Comp: Michael.Reifenberger@plaut.de | Priv: Michael@Reifenberger.com
      http://www.plaut.de           |       http://www.Reifenberger.com