From: "Don Bowman" <don@SANDVINE.com>
Date: Mon, 28 Feb 2005 16:58:26 -0500
Subject: Adaptec 3210S, 4.9-STABLE, corruption when disk fails

I have a machine running:

    $ uname -a
    FreeBSD machine.phaedrus.sandvine.com 4.9-STABLE FreeBSD 4.9-STABLE #0: Fri Mar 19 10:39:07 EST 2004     user@machine.phaedrus.sandvine.com:/usr/src/sys/compile/LABDB  i386

It has an Adaptec 3210S RAID controller running a single RAID-5 array, and runs PostgreSQL 7.4.6 as its primary application.

Three times now a drive has failed, and each time there were corrupted files in the PostgreSQL cluster at the same time. The timing is too closely correlated to be a coincidence. The filesystem passed fsck when I got to the machine a couple of hours later, and otherwise seems to be OK (with one drive failed and the array running in degraded mode).
It appears that the drive failure and the PostgreSQL failure occur at exactly the same time (as monitored with Nagios, to within one hour). It would appear that bad data was returned for some file(s).

Does anyone have any suggestions?

    $ raidutil -L all
    RAIDUTIL        Version: 3.04   Date: 9/27/2000   FreeBSD CLI Configuration Utility
    Adaptec ENGINE  Version: 3.04   Date: 9/27/2000   Adaptec FreeBSD SCSI Engine

    #   b0 b1 b2  Controller  Cache  FW    NVRAM     Serial       Status
    ---------------------------------------------------------------------------
    d0  -- -- --  ADAP3210S   16MB   370F  ADPT 1.0  BF0A21700J7  Optimal

    Physical View
    Address    Type               Manufacturer/Model         Capacity  Status
    ---------------------------------------------------------------------------
    d0b0t0d0   Disk Drive (DASD)  SEAGATE ST318453LW         17501MB   Optimal
    d0b0t1d0   Disk Drive (DASD)  SEAGATE ST318453LW         17501MB   Optimal
    d0b0t2d0   Disk Drive (DASD)  IBM DNES-318350W           17501MB   Optimal
    d0b1t3d0   Disk Drive (DASD)  IBM DNES-318350W           17501MB   Optimal
    d0b1t4d0   Disk Drive (DASD)  SEAGATE ST318452LW         17501MB   Optimal
    d0b1t5d0   Disk Drive (DASD)  IBM DNES-318350W           17501MB   Optimal