Date:      Thu, 07 Aug 2008 21:33:29 +1000
From:      Antony Mawer <fbsd-fs@mawer.org>
To:        Jeremy Chadwick <koitsu@FreeBSD.org>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: zpool degraded - 'UNAVAIL cannot open' functioning drive
Message-ID:  <489ADD89.8070809@mawer.org>
In-Reply-To: <20080807071434.GA15465@eos.sc1.parodius.com>
References:  <6c3c36d00808062109y6ae176a0ha055129392b00542@mail.gmail.com>	<20080807044759.GA7505@eos.sc1.parodius.com>	<6c3c36d00808062212y4e9a1464i48e146e84725a36e@mail.gmail.com>	<6c3c36d00808062235v5cbb4470v990b76d569f85614@mail.gmail.com>	<20080807055841.GB9735@eos.sc1.parodius.com>	<489A9739.20707@yandex.ru> <20080807071434.GA15465@eos.sc1.parodius.com>

On 7/08/2008 5:14 PM, Jeremy Chadwick wrote:
>>> My advice at this point in time, because as of today I have officially
>>> lost faith in it: avoid ata(4) at all costs.
>> I tried to contact you some time ago, but didn't receive any
>> answers. Do you still want to resolve your problems with ATA?
> 
> Yes, I did receive your mails, but you just wanted to know "if I was
> still having problems".  I should have replied, but I did not.  That is
> my fault, and for that I apologise.
> 
> The issues aren't problems specific to me -- they are affecting a
> significant userbase, specifically folks who use servers in production
> environments.  But maybe I've misunderstood what you meant by "your
> problems" -- my apologies if I have.
...
> We still don't have an answer to the famous "DMA timeout issue", which
> continues to haunt many.  I provided a small analysis in my Wiki, but
> the technical justification is over my head -- it needs review from
> someone who is familiar with the ATA protocol.  I interpret the
> NID_NOT_FOUND error to mean FreeBSD is asking the disk to r/w to/from an
> invalid LBA.  I received one mail from a user (I forget if a mailing
> list was CC'd or not -- I need to dig up the mail) who said that in some
> cases NID_NOT_FOUND is normal.

Do you know whether, for most people, these errors go away (at least
temporarily) when the server is rebooted? Or do they persist across
reboots?

I'm going to go through our systems and check whether any of them are
logging ATA errors that don't correlate with what their SMART data is
saying. I haven't caught any so far, but that doesn't mean it isn't
happening; it's just that every case I have caught to date has turned
out to be a genuine hardware problem...

It seems a shame that parts of the FreeBSD ATA code once wound up in
Linux (the controversy where code was lifted and Soren's copyright
notice removed), while FreeBSD's own ata subsystem is now languishing
with no immediate solution in sight :-(

Does the FreeBSD Foundation have the means or the interest to find
someone experienced enough with ATA to spend some time reviewing and
testing it? Or does that sort of thing just "not happen"?

--Antony


