Date: Thu, 31 Mar 2005 16:00:21 -0500
From: "Don Bowman" <don@SANDVINE.com>
To: "Uwe Doering" <gemini@geminix.org>
Cc: freebsd-stable@freebsd.org
Subject: RE: Adaptec 3210S, 4.9-STABLE, corruption when disk fails
Message-ID: <2BCEB9A37A4D354AA276774EE13FB8C23A690B@mailserver.sandvine.com>

From: Uwe Doering [mailto:gemini@geminix.org]
> Don Bowman wrote:
> > From: owner-freebsd-stable@freebsd.org
> >
> >>From: Uwe Doering [mailto:gemini@geminix.org] ...
> >>
> >>>>Did you merge 1.3.2.3 as well? This actually should have
> >>>>been one MFC
> >>
> >>Yes, merged from RELENG_4.
> >>
> >>I will post later if this happens again, but it will be quite a
> >>long time. The machine has 7 drives in it, there are only 3 ones
> >>left old enough they might fail before I take it out of service
> >>(it originally had 7 1999-era IBM drives, now it has 4 2004-era
> >>seagate drives and 3 of the old IBM's.
> >>The drives have been in continuous service, so they've led a
> >>pretty good life!)
> >>
> >>Thanks for the suggestion on the cam timeout, I've set that value.
> >
> > Another drive failed and the same thing happened.
> > After the failure, the raid worked in degrade mode just fine, but
> > many files had been corrupted during the failure.
> >
> > So I would suggest that this merge did not help, and the cam
> > timeout did not help either.
> >
> > This is very frustrating, again I rebuild my postgresql install
> > from backup :(
>
> This is indeed unfortunate. Maybe the problem is in fact located
> neither in PostgreSQL nor in FreeBSD but in the controller itself.
> Does it have the latest firmware? The necessary files should be
> available on Adaptec's website, and you can use the 'raidutil'
> program under FreeBSD to upload the firmware to the controller.
> I have to concede, however, that I never did this under FreeBSD
> myself. If I recall correctly I did the upload via a DOS diskette
> the last time.
>
> If this doesn't help either you could ask Adaptec's support for help.
> You need to register the controller first, if memory serves.

The latest firmware & bios is in the controller (upgraded the last
time I had problems).
Tried Adaptec support; the controller is registered.

The problem is definitely not in postgresql. Files go missing in
directories that are having new entries added (e.g. I lost a
'PG_VERSION' file). Data within the postgresql files becomes corrupt.
Since the only application running is postgresql, and it
reads/writes/fsyncs the data, it's not unexpected that it's the one
that reaps the 'rewards' of the failure.

I have to believe this is either a bug in the controller, or a
problem in cam or asr.

--don