Date: Thu, 7 Aug 2008 05:12:45 -0700
From: Jeremy Chadwick <koitsu@FreeBSD.org>
To: Antony Mawer <fbsd-fs@mawer.org>
Cc: freebsd-fs@freebsd.org
Subject: Re: zpool degraded - 'UNAVAIL cannot open' functioning drive
Message-ID: <20080807121245.GA26629@eos.sc1.parodius.com>
In-Reply-To: <489ADD89.8070809@mawer.org>
References: <6c3c36d00808062109y6ae176a0ha055129392b00542@mail.gmail.com>
    <20080807044759.GA7505@eos.sc1.parodius.com>
    <6c3c36d00808062212y4e9a1464i48e146e84725a36e@mail.gmail.com>
    <6c3c36d00808062235v5cbb4470v990b76d569f85614@mail.gmail.com>
    <20080807055841.GB9735@eos.sc1.parodius.com>
    <489A9739.20707@yandex.ru>
    <20080807071434.GA15465@eos.sc1.parodius.com>
    <489ADD89.8070809@mawer.org>

On Thu, Aug 07, 2008 at 09:33:29PM +1000, Antony Mawer wrote:
> On 7/08/2008 5:14 PM, Jeremy Chadwick wrote:
>>>> My advice at this point in time, because as of today I have officially
>>>> lost faith in it: avoid ata(4) at all costs.
>>> I tried to contact you some time ago, but didn't receive any
>>> answers. Do you still want to resolve your problems with ATA?
>>
>> Yes, I did receive your mails, but you just wanted to know "if I was
>> still having problems". I should have replied, but I did not. That is
>> my fault, and for that I apologise.
>>
>> The issues aren't problems specific to me -- they are affecting a
>> significant userbase, specifically folks who use servers in production
>> environments. But maybe I've misunderstood what you meant by "your
>> problems" -- my apologies if I have.
> ...
>> We still don't have an answer to the famous "DMA timeout issue", which
>> continues to haunt many. I provided a small analysis in my Wiki, but
>> the technical justification is over my head -- it needs review from
>> someone who is familiar with the ATA protocol. I interpret the
>> NID_NOT_FOUND error to mean FreeBSD is asking the disk to r/w to/from an
>> invalid LBA. I received one mail from a user (I forget if a mailing
>> list was CC'd or not -- I need to dig up the mail) who said that in some
>> cases NID_NOT_FOUND is normal.
>
> Do you know if most people found these are something that go away, at
> least temporarily, when the server is rebooted? Or do they persist
> across reboots?

For some people, the problems are permanent. Those people, specifically
the ones who can reproduce the errors, have been referred to Scott Long,
who offered to help look into the issue.

For others, the problems eventually disappear, or possibly disappear
after "messing around" with their systems. Some people swapped SATA
cables, others replaced their entire motherboard; "things work fine" was
the result.

The problem is that these were considered solutions, and I'm not so sure
the problem really was with their hardware, cables, or anything else.
For example, the FreeNAS folks found that for some users, increasing the
ATA command timeout in the code from an arbitrary 5 seconds to something
larger (10-15 seconds) fixed the problem. I don't know why an ATA
transaction would take 10-15 seconds, but I suppose it's possible if the
disk is doing something internal.

<storytime>
I have personal experience with drives doing exactly that. Some older
(circa late 90s/early 2000s) IBM ATA disks have a feature called ADM,
where almost like clockwork, the drive would spin down and start doing
some sort of internal maintenance. Upon receiving an ATA command, it
would abort ADM, spin up, then complete the request. The result in
FreeBSD (4.x) was a slew of ATA timeout/DMA errors.

I eventually found this feature mentioned in the disk specification PDF,
and contacted IBM about it. IBM confirmed the feature, confirmed it was
responsible, and stated there was no way to disable it on ATA disks.
(On SCSI, the feature defaulted to off, but could be toggled on via a
custom vendor-specific SCSI command). I still have the mail from IBM if
anyone wants to see it.

A few months later, when IBM released the next generation of their ATA
disks, I found the ADM feature completely gone (from the disk and the
disk specification PDF).
</storytime>

In almost every case I've looked at so far, the individuals' chipsets,
disks, and overall setup are different. SMART statistics on the drives
show absolutely no sign of errors, or anything that indicates a hardware
failure. Many of the users are using AHCI as well (myself included, and
I have seen the DMA error issue myself), which is more reliable than
classic IDE.

Even in the case of temporary failures ("I saw those DMA errors once,
but they haven't returned"), it would be beneficial if users could
submit that data to me, so I can put more example cases in the Wiki.

The NID_NOT_FOUND error bothers me, because the ATA specification
implies the OS is asking for an invalid LBA, which would likely be
caused by a bug in FreeBSD. That's why that error condition needs to be
analysed more.

I will point people to the libata FAQ on errors, too -- look at the
definition of error type IDNF (this is what FreeBSD calls
NID_NOT_FOUND):

http://ata.wiki.kernel.org/index.php/Libata_error_messages

For SATA, FreeBSD does not appear to support printing SATA SError codes,
so for those of us with SATA disks, we're actually missing some
verbosity on what the errors could be caused by.

It would be beneficial if there were some form of sysctl to increase the
verbosity from the ATA subsystem when an error happens. The existing
data we get back is terse, and barely useful. I know for a fact there's
more debug information that could be output in such scenarios. And
please do not reply with "good idea, send patches" unless you're wanting
to be chewed out. :-)

> I'm going to do some analysis and find out whether I can find any of our
> systems that may be experiencing ATA errors that don't correlate with
> what their SMART data is saying. To date I haven't caught any, but
> that's not to say they may not be happening... just that all of the ones
> I have caught to date do appear to have been hardware-related issues...
>
> It seems a shame that, at one point, bits of FreeBSD ATA code wound up
> in Linux (the controversy when parts were lifted but Soren's copyright
> removed)... now FreeBSD's ata subsystem is left languishing with no
> immediate solution in sight :-(

I had forgotten all about that piece of history -- thanks for reminding
me.

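As a rough illustration of the NID_NOT_FOUND point above, here is a
minimal sketch of how an 8-bit ATA error register value breaks down into
flags, plus the kind of range check that the "invalid LBA" reading of
IDNF implies. The bit mnemonics follow the ones used in the Linux libata
headers; the function names and the range check are hypothetical and are
not taken from the FreeBSD ata(4) code, so treat this as a sketch rather
than a description of what the driver does.

```python
# Error register bits, using libata-style mnemonics (an assumption of
# this sketch; check them against the ATA specification before relying
# on them).
ATA_ERROR_BITS = {
    0x80: "ICRC",
    0x40: "UNC",
    0x20: "MC",
    0x10: "IDNF",    # the flag FreeBSD reports as NID_NOT_FOUND
    0x08: "MCR",
    0x04: "ABRT",
    0x02: "TRK0NF",
    0x01: "AMNF",
}


def decode_ata_error(value):
    """Return the mnemonic of every bit set in an error register value."""
    if not 0 <= value <= 0xFF:
        raise ValueError("the error register is 8 bits wide")
    return [name for mask, name in ATA_ERROR_BITS.items() if value & mask]


def lba_in_range(lba, sector_count, total_sectors):
    """True if the request [lba, lba + sector_count) fits on the disk.

    Hypothetical helper: if IDNF really means "invalid LBA", a request
    failing this check is the kind that would be expected to trigger it.
    """
    return lba >= 0 and sector_count > 0 and lba + sector_count <= total_sectors


if __name__ == "__main__":
    print(decode_ata_error(0x10))       # only the IDNF bit is set
    print(lba_in_range(1000, 8, 1004))  # request runs past the last sector
```

Under that reading, an error value of 0x10 on a request that passes the
range check would be the interesting case to collect, since it suggests
something other than a plain out-of-range request.
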
The only solutions/options that are on the horizon:

* Recommending people buy SATA controllers that utilise CAM and da(4),
  thus avoiding ata(4) entirely. Areca makes such controllers, but it's
  impractical to ask users to buy one, since they're expensive;
  end-users should be able to use their onboard SATA controllers like
  the Intel ICH series and nVidia nForce and MCP with reliability.

* Implement a form of ATA-to-SCSI translation, similar to what Linux
  libata does; this would make ATA/SATA disks utilise CAM and da(4)
  through a translation layer. Scott Long has told me that he is
  actually in the process of writing such, but I know Scott is also
  *incredibly* busy, so that project may take a very long time.

* If you use ATA (read: PATA) disks, disabling DMA and forcing PIO does
  apparently work. The performance hit is quite substantial, however,
  and this isn't practical for servers. Disabling DMA is not possible
  with SATA.

* Contact Jeff Garzik (Linux libata maintainer) and ask for help. This
  might upset some people, especially considering the history item you
  brought up earlier. By "help" I don't mean "Hey man, write the code",
  I mean "can you help expand on what this error means, and have you
  folks seen it on Linux?"

> Is there any means or interest for the FreeBSD Foundation to find
> someone experienced enough with ATA to spend some time reviewing/testing
> it? Or does that sort of thing just "not happen"?

That's a very good question. I don't know how the politics of the
FreeBSD Foundation work. But I would be more than happy to donate money
(read: a couple thousand US dollars out of my pocket) to the Foundation
assuming the proceeds went to getting someone very familiar with ATA to
look at the problem, look at the code, address existing PRs, and fix
things.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |