Date:      Thu, 7 Aug 2008 05:12:45 -0700
From:      Jeremy Chadwick <koitsu@FreeBSD.org>
To:        Antony Mawer <fbsd-fs@mawer.org>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: zpool degraded - 'UNAVAIL cannot open' functioning drive
Message-ID:  <20080807121245.GA26629@eos.sc1.parodius.com>
In-Reply-To: <489ADD89.8070809@mawer.org>
References:  <6c3c36d00808062109y6ae176a0ha055129392b00542@mail.gmail.com> <20080807044759.GA7505@eos.sc1.parodius.com> <6c3c36d00808062212y4e9a1464i48e146e84725a36e@mail.gmail.com> <6c3c36d00808062235v5cbb4470v990b76d569f85614@mail.gmail.com> <20080807055841.GB9735@eos.sc1.parodius.com> <489A9739.20707@yandex.ru> <20080807071434.GA15465@eos.sc1.parodius.com> <489ADD89.8070809@mawer.org>

On Thu, Aug 07, 2008 at 09:33:29PM +1000, Antony Mawer wrote:
> On 7/08/2008 5:14 PM, Jeremy Chadwick wrote:
>>>> My advice at this point in time, because as of today I have officially
>>>> lost faith in it: avoid ata(4) at all costs.
>>> I tried to contact you some time ago, but didn't receive any
>>> answers. Do you still want to resolve your problems with ATA?
>>
>> Yes, I did receive your mails, but you just wanted to know "if I was
>> still having problems".  I should have replied, but I did not.  That is
>> my fault, and for that I apologise.
>>
>> The issues aren't problems specific to me -- they are affecting a
>> significant userbase, specifically folks who use servers in production
>> environments.  But maybe I've misunderstood what you meant by "your
>> problems" -- my apologies if I have.
> ...
>> We still don't have an answer to the famous "DMA timeout issue", which
>> continues to haunt many.  I provided a small analysis in my Wiki, but
>> the technical justification is over my head -- it needs review from
>> someone who is familiar with the ATA protocol.  I interpret the
>> NID_NOT_FOUND error to mean FreeBSD is asking the disk to r/w to/from an
>> invalid LBA.  I received one mail from a user (I forget if a mailing
>> list was CC'd or not -- I need to dig up the mail) who said that in some
>> cases NID_NOT_FOUND is normal.
>
> Do you know if most people found these to be something that goes away,
> at least temporarily, when the server is rebooted? Or do they persist
> across reboots?

For some people, the problems are permanent.  Those people have been
referred to talk to Scott Long, who offered to help look into the issue,
specifically those who can reproduce the errors.

For others, the problems eventually disappear, or possibly disappear
after "messing around" with their systems.  Some people swapped SATA
cables, others replaced their entire motherboard; "things work fine" was
the result.  The trouble is that these were treated as solutions, yet
I'm not convinced the problem really lay with their hardware, cables,
or anything else.

For example, the FreeNAS folks found that for some users, increasing the
ATA command timeout in the code from an arbitrary 5 seconds to something
larger (10-15 seconds) fixed the problem.  I don't know why an ATA
transaction would take 10-15 seconds, but I suppose it's possible if the
disk is doing something internal.
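To make the timeout trade-off concrete, here is a minimal Python sketch (not FreeBSD code) of a host issuing one command with a fixed timeout to a hypothetical drive that must first finish some internal activity, such as the IBM ADM behaviour described next. The names `drive_command` and `issue_command`, and the `SCALE` factor that shrinks "drive seconds" so the example runs quickly, are illustrative assumptions.

```python
import concurrent.futures
import time

SCALE = 0.05  # hypothetical: 1 "drive second" = 0.05 real seconds


def drive_command(busy_seconds: float, service_seconds: float) -> str:
    """Hypothetical drive: finish internal work, then service the request."""
    time.sleep((busy_seconds + service_seconds) * SCALE)
    return "completed"


def issue_command(busy_seconds: float, service_seconds: float,
                  timeout_seconds: float) -> str:
    """Wait at most timeout_seconds for the command and report the outcome.

    A "timeout" result does not mean the drive failed: the worker still
    runs to completion (leaving the with-block waits for it); the host
    simply stopped waiting for the answer.
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(drive_command, busy_seconds, service_seconds)
        try:
            return future.result(timeout=timeout_seconds * SCALE)
        except concurrent.futures.TimeoutError:
            return "timeout"


if __name__ == "__main__":
    # A drive needing ~8 seconds: 7 s of internal work plus 1 s of I/O.
    print(issue_command(7, 1, timeout_seconds=5))   # timeout
    print(issue_command(7, 1, timeout_seconds=15))  # completed
```

Raising the limit changes which outcome the host sees, but says nothing about why the drive was busy in the first place.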

<storytime>
I have personal experience with drives doing exactly this.  Some
older (circa late 90s/early 2000) IBM ATA disks had a feature called
ADM, where, almost like clockwork, the drive would spin down and start
doing some sort of internal maintenance.  Upon receiving an ATA command,
it would abort ADM, spin up, then complete the request.  The result in
FreeBSD (4.x) was a slew of ATA timeout/DMA errors.

I eventually found this feature mentioned in the disk specification PDF,
and contacted IBM about it.  IBM confirmed the feature, confirmed it was
responsible, and stated there was no way to disable it on ATA disks.
(On SCSI, the feature defaulted to off, but could be toggled on via a
custom vendor-specific SCSI command).  I still have the mail from IBM if
anyone wants to see it.

A few months later, when IBM released their next generation version of
their ATA disks, I found the ADM feature completely gone (from the disk
and the disk specification PDF).
</storytime>

In almost every case I've looked at so far, the individuals' chipsets,
disks, and overall setup are different.  SMART statistics on the drives
show absolutely no sign of errors, or anything that indicates a hardware
failure.  Many of the users are also using AHCI, which is more reliable
than classic IDE; I am one of them, and I have seen the DMA error issue
myself.

Even in the case of temporary failures ("I saw those DMA errors once,
but they haven't returned"), it would be beneficial if users could
submit that data to me, so I can put more example cases in the Wiki.

The NID_NOT_FOUND error bothers me, because the ATA specification implies
the OS is asking for an invalid LBA, which would likely be caused by a
bug in FreeBSD.  That's why that error condition needs to be analysed
more.  I will point people to the libata FAQ on errors, too -- look
at the definition of error type IDNF (this is what FreeBSD calls
NID_NOT_FOUND):

http://ata.wiki.kernel.org/index.php/Libata_error_messages
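For anyone matching FreeBSD's messages against the libata page above, a small Python sketch that decodes an ATA Error register value into its bit names may help.  The bit layout follows the ATA/ATAPI command set as also used by libata (bit 4, mask 0x10, is IDNF, which FreeBSD prints as NID_NOT_FOUND).  The table and function names are illustrative, and the meanings of the two lowest bits changed between ATA revisions, so treat those labels loosely.

```python
# ATA Error register bits, most significant first.  Labels follow the
# libata/ATA naming; bits 0 and 1 were redefined across ATA revisions.
ATA_ERROR_BITS = [
    (0x80, "ICRC", "interface CRC error during an Ultra DMA transfer"),
    (0x40, "UNC", "uncorrectable data error"),
    (0x20, "MC", "media changed"),
    (0x10, "IDNF", "ID not found: requested address not found or out of range"),
    (0x08, "MCR", "media change requested"),
    (0x04, "ABRT", "command aborted"),
    (0x02, "NM", "no media (track 0 not found in older revisions)"),
    (0x01, "AMNF", "address mark not found (older revisions)"),
]


def decode_ata_error(value: int) -> list[str]:
    """Return the mnemonic of every bit set in an 8-bit Error register value."""
    if not 0 <= value <= 0xFF:
        raise ValueError("the ATA Error register is 8 bits wide")
    return [name for mask, name, _ in ATA_ERROR_BITS if value & mask]


def describe_ata_error(value: int) -> str:
    """Longer, human-readable form of decode_ata_error()."""
    parts = [f"{name} ({desc})" for mask, name, desc in ATA_ERROR_BITS
             if value & mask]
    return f"error=0x{value:02x}: " + ("; ".join(parts) if parts else "no bits set")


if __name__ == "__main__":
    print(decode_ata_error(0x10))  # ['IDNF']
    print(describe_ata_error(0x84))
```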

For SATA, FreeBSD does not appear to support printing SATA SError
codes, so for those of us with SATA disks, we're actually missing some
verbosity on what the errors could be caused by.
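As an illustration of the detail that goes missing, this Python sketch decodes a raw SError value along the lines of libata's reports.  The bit positions follow the SError register as defined by the SATA specification and exposed by AHCI controllers as PxSERR; the function name and label wording are illustrative, and where FreeBSD would obtain the raw value is left open.

```python
# SATA SError register (e.g. AHCI PxSERR): the low 16 bits are the ERR
# field, the high 16 bits the DIAG field.  Bit positions follow the
# SATA/AHCI definitions; the label wording is illustrative.
SERROR_BITS = {
    0: "recovered data integrity error",
    1: "recovered communications error",
    8: "transient data integrity error",
    9: "persistent communication or data integrity error",
    10: "protocol error",
    11: "internal error",
    16: "PhyRdy change",
    17: "PHY internal error",
    18: "COMWAKE detected",
    19: "10b to 8b decode error",
    20: "disparity error",
    21: "CRC error",
    22: "handshake error",
    23: "link sequence error",
    24: "transport state transition error",
    25: "unknown FIS type",
    26: "device exchanged",
}


def decode_serror(value: int) -> list[str]:
    """List the meaning of each defined bit set in a 32-bit SError value."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("SError is a 32-bit register")
    decoded = [label for bit, label in sorted(SERROR_BITS.items())
               if value & (1 << bit)]
    # Report any set bits that the table above does not define.
    undefined = value & ~sum(1 << bit for bit in SERROR_BITS)
    if undefined:
        decoded.append(f"reserved bits 0x{undefined:08x}")
    return decoded


if __name__ == "__main__":
    print(decode_serror(0x00050000))  # ['PhyRdy change', 'COMWAKE detected']
```

A value like 0x00050000 would suggest the link dropped and came back, rather than a media problem, which is exactly the kind of distinction a terse message hides.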

It would be beneficial if there was some form of sysctl to increase the
verbosity from the ATA subsystem when an error happens.  The existing
data we get back is terse, and barely useful.  I know for a fact there's
more debug information that could be output in such scenarios.  And
please do not reply with "good idea, send patches" unless you're wanting
to be chewed out.  :-)

> I'm going to do some analysis and find out whether I can find any of our  
> systems that may be experiencing ATA errors that don't correlate with  
> what their SMART data is saying. To date I haven't caught any, but  
> that's not to say they may not be happening... just that all of the ones  
> I have caught to date do appear to have been hardware-related issues...
>
> It seems a shame that, at one point, bits of FreeBSD ATA code wound up  
> in Linux (the controversy when parts were lifted but Soren's copyright  
> removed)... now FreeBSD's ata subsystem is left languishing with no  
> immediate solution in sight :-(

I had forgotten all about that piece of history -- thanks for reminding
me.

The only solutions/options that are on the horizon:

* Recommending people buy SATA controllers that utilise CAM and da(4),
  thus avoiding ata(4) entirely.  Areca makes such controllers, but
  it's impractical to ask users to buy one, since they're expensive;
  end-users should be able to use their onboard SATA controllers like
  the Intel ICH series and nVidia nForce and MCP with reliability.

* Implement a form of ATA-to-SCSI translation, similar to what Linux
  libata does; this would make ATA/SATA disks utilise CAM and da(4)
  through a translation layer.  Scott Long has told me that he is actually
  in the process of writing such, but I know Scott is also *incredibly*
  busy, so that project may take a very long time.

* If you use ATA (read: PATA) disks, disabling DMA and forcing PIO
  does apparently work.  The performance hit is quite substantial,
  however, and this isn't practical for servers.  Disabling DMA is
  not possible with SATA.

* Contact Jeff Garzik (Linux libata maintainer) and ask for help.  This
  might upset some people, especially considering the history item you
  brought up earlier.  By "help" I don't mean "Hey man, write the code",
  I mean "can you help expand on what this error means, and have you
  folks seen it on Linux?"
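To give a feel for what the translation-layer option involves, here is a heavily simplified Python sketch of one translation step: turning a SCSI READ(10) command descriptor block into an ATA read.  It is not Scott Long's work, not libata's code, and nowhere near the full SCSI/ATA Translation (SAT) standard.  The opcodes are the standard SCSI READ(10) (0x28) and ATA READ DMA (0xC8) / READ DMA EXT (0x25) values; the class and function names are hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import Optional

READ_10 = 0x28           # SCSI READ(10) operation code
ATA_READ_DMA = 0xC8      # 28-bit LBA, at most 256 sectors per command
ATA_READ_DMA_EXT = 0x25  # 48-bit LBA, at most 65536 sectors per command
LBA28_LIMIT = 1 << 28    # first sector a 28-bit command cannot address


@dataclass
class AtaCommand:
    opcode: int
    lba: int
    sectors: int


class IllegalRequest(Exception):
    """Stands in for a SCSI CHECK CONDITION / ILLEGAL REQUEST response."""


def make_read10(lba: int, blocks: int) -> bytes:
    """Build a READ(10) CDB: opcode, flags, LBA, group, length, control."""
    return (bytes([READ_10, 0]) + struct.pack(">I", lba) + b"\x00"
            + struct.pack(">H", blocks) + b"\x00")


def translate_read10(cdb: bytes, capacity: int) -> Optional[AtaCommand]:
    """Translate a READ(10) CDB for a disk with `capacity` sectors."""
    if len(cdb) != 10 or cdb[0] != READ_10:
        raise IllegalRequest("not a READ(10) CDB")
    (lba,) = struct.unpack_from(">I", cdb, 2)     # bytes 2-5, big-endian
    (blocks,) = struct.unpack_from(">H", cdb, 7)  # bytes 7-8, big-endian
    if blocks == 0:
        return None  # a zero-length READ(10) transfers nothing
    # Reject out-of-range requests here instead of letting the disk
    # answer them with an IDNF error.
    if lba + blocks > capacity:
        raise IllegalRequest("LBA out of range")
    # Use the 28-bit command only when the whole request fits in it.
    if lba + blocks <= LBA28_LIMIT and blocks <= 256:
        return AtaCommand(ATA_READ_DMA, lba, blocks)
    return AtaCommand(ATA_READ_DMA_EXT, lba, blocks)
```

One side effect of routing requests through such a layer is that range checks happen in one place: a request past the end of the disk is rejected before the drive ever sees it, so it cannot surface as an IDNF/NID_NOT_FOUND error.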

> Is there any means or interest for the FreeBSD Foundation to find  
> someone experienced enough with ATA to spend some time reviewing/testing  
> it? Or does that sort of thing just "not happen"?

That's a very good question.  I don't know how the politics of the
FreeBSD Foundation work.

But I would be more than happy to donate money (read: a couple thousand
US dollars out of my pocket) to the Foundation assuming the proceeds
went to getting someone very familiar with ATA to look at the problem,
look at the code, address existing PRs, and fix things.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
