Date:      Thu, 31 Mar 2005 16:00:21 -0500
From:      "Don Bowman" <don@SANDVINE.com>
To:        "Uwe Doering" <gemini@geminix.org>
Cc:        freebsd-stable@freebsd.org
Subject:   RE: Adaptec 3210S, 4.9-STABLE, corruption when disk fails
Message-ID:  <2BCEB9A37A4D354AA276774EE13FB8C23A690B@mailserver.sandvine.com>

From: Uwe Doering [mailto:gemini@geminix.org]
> Don Bowman wrote:
> > From: owner-freebsd-stable@freebsd.org
> >
> >>From: Uwe Doering [mailto:gemini@geminix.org]  ...
> >>
> >>>>Did you merge 1.3.2.3 as well?  This actually should have been one MFC
> >>
> >>Yes, merged from RELENG_4.
> >>
> >>I will post later if this happens again, but it will be quite a long
> >>time. The machine has 7 drives in it, there are only
> >>3 ones left old enough they might fail before I take it out of service
> >>(it originally had 7 1999-era IBM drives, now it has 4 2004-era
> >>seagate drives and 3 of the old IBM's.
> >>The drives have been in continuous service, so they've lead a pretty
> >>good life!)
> >>
> >>Thanks for the suggestion on the cam timeout, I've set that value.
> >
> > Another drive failed and the same thing happened.
> > After the failure, the raid worked in degrade mode just fine, but many
> > files had been corrupted during the failure.
> >
> > So I would suggest that this merge did not help, and the cam timeout
> > did not help either.
> >
> > This is very frustrating, again I rebuild my postgresql install from
> > backup :(
>
> This is indeed unfortunate.  Maybe the problem is in fact located
> neither in PostgreSQL nor in FreeBSD but in the controller itself.
> Does it have the latest firmware?  The necessary files should be
> available on Adaptec's website, and you can use the 'raidutil' program
> under FreeBSD to upload the firmware to the controller.  I have to
> concede, however, that I never did this under FreeBSD myself.  If I
> recall correctly I did the upload via a DOS diskette the last time.
>
> If this doesn't help either you could ask Adaptec's support for help.
> You need to register the controller first, if memory serves.

The latest firmware and BIOS are in the controller (upgraded the
last time I had problems).

Tried Adaptec support; the controller is registered.

The problem is definitely not in PostgreSQL. Files go missing
in directories that are having new entries added (e.g. I lost
a 'PG_VERSION' file). Data within the PostgreSQL files becomes
corrupt. Since the only application running is PostgreSQL,
and it reads/writes/fsyncs the data, it's not unexpected that
it's the one that reaps the 'rewards' of the failure.

I have to believe this is either a bug in the controller,
or a problem in cam or asr.

--don


