Date: Thu, 31 Mar 2005 16:00:21 -0500
From: "Don Bowman" <don@SANDVINE.com>
To: "Uwe Doering" <gemini@geminix.org>
Cc: freebsd-stable@freebsd.org
Subject: RE: Adaptec 3210S, 4.9-STABLE, corruption when disk fails
Message-ID: <2BCEB9A37A4D354AA276774EE13FB8C23A690B@mailserver.sandvine.com>
From: Uwe Doering [mailto:gemini@geminix.org]
> Don Bowman wrote:
> > From: owner-freebsd-stable@freebsd.org
> >
> >>From: Uwe Doering [mailto:gemini@geminix.org] ...
> >>
> >>>>Did you merge 1.3.2.3 as well? This actually should have
> >>>>been one MFC
> >>
> >>Yes, merged from RELENG_4.
> >>
> >>I will post later if this happens again, but it will be quite a long
> >>time. The machine has 7 drives in it, there are only
> >>3 ones left old enough they might fail before I take it out of service
> >>(it originally had 7 1999-era IBM drives, now it has 4 2004-era
> >>seagate drives and 3 of the old IBM's.
> >>The drives have been in continuous service, so they've lead a pretty
> >>good life!)
> >>
> >>Thanks for the suggestion on the cam timeout, I've set that value.
> >
> > Another drive failed and the same thing happened.
> > After the failure, the raid worked in degrade mode just fine, but many
> > files had been corrupted during the failure.
> >
> > So I would suggest that this merge did not help, and the cam timeout
> > did not help either.
> >
> > This is very frustrating, again I rebuild my postgresql install from
> > backup :(
>
> This is indeed unfortunate. Maybe the problem is in fact
> located neither in PostgreSQL nor in FreeBSD but in the
> controller itself. Does it have the latest firmware? The
> necessary files should be available on Adaptec's website, and
> you can use the 'raidutil' program under FreeBSD to upload
> the firmware to the controller. I have to concede, however,
> that I never did this under FreeBSD myself. If I recall
> correctly I did the upload via a DOS diskette the last time.
>
> If this doesn't help either you could ask Adaptec's support for help.
> You need to register the controller first, if memory serves.

The latest firmware & bios is in the controller (upgraded
the last time I had problems). Tried adaptec support, controller
is registered.

The problem is definitely not in postgresql.
Files go missing in directories that are having new entries
added (e.g. I lost a 'PG_VERSION' file). Data within
the postgresql files becomes corrupt. Since the only application
running is postgresql, and it reads/writes/fsyncs the data,
it's not unexpected that it's the one that reaps the 'rewards'
of the failure.

I have to believe this is either a bug in the controller,
or a problem in cam or asr.

--don
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?2BCEB9A37A4D354AA276774EE13FB8C23A690B>
