From: "Don Bowman"
To: "Uwe Doering"
Cc: freebsd-stable@freebsd.org
Date: Thu, 31 Mar 2005 16:00:21 -0500
Subject: RE: Adaptec 3210S, 4.9-STABLE, corruption when disk fails
Message-ID: <2BCEB9A37A4D354AA276774EE13FB8C23A690B@mailserver.sandvine.com>

From: Uwe Doering [mailto:gemini@geminix.org]
> Don Bowman wrote:
> > From: owner-freebsd-stable@freebsd.org
> >
> >>From: Uwe Doering [mailto:gemini@geminix.org] ...
> >>
> >>>>Did you merge 1.3.2.3 as well?  This actually should have
> >>>>been one MFC
> >>
> >>Yes, merged from RELENG_4.
> >>
> >>I will post later if this happens again, but it will be quite a long
> >>time. The machine has 7 drives in it, there are only
> >>3 ones left old enough they might fail before I take it out of service
> >>(it originally had 7 1999-era IBM drives, now it has 4 2004-era
> >>seagate drives and 3 of the old IBM's.
> >>The drives have been in continuous service, so they've led a pretty
> >>good life!)
> >>
> >>Thanks for the suggestion on the cam timeout, I've set that value.
> >
> > Another drive failed and the same thing happened.
> > After the failure, the raid worked in degraded mode just fine, but many
> > files had been corrupted during the failure.
> >
> > So I would suggest that this merge did not help, and the cam timeout
> > did not help either.
> >
> > This is very frustrating, again I rebuild my postgresql install from
> > backup :(
>
> This is indeed unfortunate.  Maybe the problem is in fact located
> neither in PostgreSQL nor in FreeBSD but in the controller itself.
> Does it have the latest firmware?  The necessary files should be
> available on Adaptec's website, and you can use the 'raidutil'
> program under FreeBSD to upload the firmware to the controller.
> I have to concede, however, that I never did this under FreeBSD
> myself.  If I recall correctly I did the upload via a DOS diskette
> the last time.
>
> If this doesn't help either you could ask Adaptec's support for help.
> You need to register the controller first, if memory serves.

The latest firmware & bios is in the controller (upgraded the last
time I had problems). Tried adaptec support, controller is registered.

The problem is definitely not in postgresql. Files go missing in
directories that are having new entries added (e.g. I lost a
'PG_VERSION' file). Data within the postgresql files becomes corrupt.
Since the only application running is postgresql, and it
reads/writes/fsyncs the data, it's not unexpected that it's the one
that reaps the 'rewards' of the failure.

I have to believe this is either a bug in the controller, or a problem
in cam or asr.

--don