Date: Sun, 10 Aug 2008 00:11:17 -0700
From: Jeremy Chadwick <koitsu@FreeBSD.org>
To: "Andrey V. Elsukov" <bu7cher@yandex.ru>
Cc: freebsd-fs@freebsd.org, Scott Long <scottl@FreeBSD.org>
Subject: Re: zpool degraded - 'UNAVAIL cannot open' functioning drive
Message-ID: <20080810071117.GA3857@eos.sc1.parodius.com>
In-Reply-To: <489BCA4D.3050704@yandex.ru>
References: <6c3c36d00808062109y6ae176a0ha055129392b00542@mail.gmail.com>
 <20080807044759.GA7505@eos.sc1.parodius.com>
 <6c3c36d00808062212y4e9a1464i48e146e84725a36e@mail.gmail.com>
 <6c3c36d00808062235v5cbb4470v990b76d569f85614@mail.gmail.com>
 <20080807055841.GB9735@eos.sc1.parodius.com>
 <489A9739.20707@yandex.ru>
 <20080807071434.GA15465@eos.sc1.parodius.com>
 <489ADD89.8070809@mawer.org>
 <20080807121245.GA26629@eos.sc1.parodius.com>
 <489BCA4D.3050704@yandex.ru>

On Fri, Aug 08, 2008 at 08:23:41AM +0400, Andrey V. Elsukov wrote:
> Jeremy Chadwick wrote:
>> In almost every case I've looked at so far, the individuals' chipsets,
>> disks, and overall setup are different.  SMART statistics on the drives
>> show absolutely no sign of errors, or anything that indicates a hardware
>> failure.  Many of the users are using AHCI as well (myself included, and
>> I have seen the DMA error issue myself), which is more reliable than
>> classic IDE.
>
> I have done some work on AHCI part of ATA driver and I am looking
> for testers...
> http://perforce.freebsd.org/changeList.cgi?CMD=changes&FSPC=//depot/user/butcher/src/...

These look quite good.  Regarding change 146184, do you know if this
addresses the problems documented in PR 102211, PR 108924, or what I
described in
http://lists.freebsd.org/pipermail/freebsd-stable/2008-February/040534.html ?

>> It would be beneficial if there was some form of sysctl to increase the
>> verbosity from the ATA subsystem when an error happens.  The existing
>> data we get back is terse, and barely useful.  I know for a fact there's
>> more debug information that could be output in such scenarios.  And
>> please do not reply with "good idea, send patches" unless you're wanting
>> to be chewed out.  :-)
>
> Ok, I'll try to add some verbose 'printfs' in my branch in perforce :)

That'd be great.

It appears to me, WRT FreeBSD, that error conditions do not bother to
handle SATA-related errors; everything is assumed to be ATA, so the
extra granularity SATA implements is not available on FreeBSD.

This also starts to enter the realm of why FreeBSD does not implement
support for NCQ -- is this because the ATA driver was built solely
around ATA, rather than AHCI?  Linux appears to have two different
drivers depending upon whether you're using AHCI or not.  FreeBSD's
ata(4) code seems to have everything intermixed/jumbled around, so it
looks a lot like spaghetti...  Is this the problem?

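To make "extra granularity" concrete: SATA host controllers expose an
SError register whose individual bits separate CRC, handshake, protocol
and PHY-level errors, which a classic ATA status/error byte cannot do.
Below is a minimal sketch in C of decoding such a value into readable
text.  The bit positions are taken from the SError layout in the Serial
ATA and AHCI (PxSERR) specifications; serr_bits[] and serr_decode() are
hypothetical names used only for illustration and are not part of
ata(4), and equating SError with the granularity mentioned above is an
assumption on my part.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One SError bit and a human-readable description of it. */
struct serr_bit {
    uint32_t    mask;
    const char *desc;
};

/*
 * SError bit assignments as laid out in the Serial ATA / AHCI (PxSERR)
 * specifications.  Bits 0-15 are the ERR field, bits 16-31 the DIAG field.
 */
static const struct serr_bit serr_bits[] = {
    { 1u << 0,  "recovered data integrity error" },
    { 1u << 1,  "recovered communications error" },
    { 1u << 8,  "transient data integrity error" },
    { 1u << 9,  "persistent communication or data integrity error" },
    { 1u << 10, "protocol error" },
    { 1u << 11, "internal error" },
    { 1u << 16, "PhyRdy change" },
    { 1u << 17, "Phy internal error" },
    { 1u << 18, "comm wake" },
    { 1u << 19, "10b to 8b decode error" },
    { 1u << 20, "disparity error" },
    { 1u << 21, "CRC error" },
    { 1u << 22, "handshake error" },
    { 1u << 23, "link sequence error" },
    { 1u << 24, "transport state transition error" },
    { 1u << 25, "unknown FIS type" },
    { 1u << 26, "exchanged" },
};

/*
 * Hypothetical helper (not an existing FreeBSD function): write a
 * comma-separated description of every bit set in 'serr' into 'buf',
 * truncating if 'buf' is too small.  Returns the number of known bits set.
 */
size_t
serr_decode(uint32_t serr, char *buf, size_t len)
{
    size_t i, nset = 0, used = 0;

    if (len > 0)
        buf[0] = '\0';
    for (i = 0; i < sizeof(serr_bits) / sizeof(serr_bits[0]); i++) {
        if ((serr & serr_bits[i].mask) == 0)
            continue;
        if (used < len) {
            /* snprintf always NUL-terminates when given a non-zero size. */
            int w = snprintf(buf + used, len - used, "%s%s",
                nset > 0 ? ", " : "", serr_bits[i].desc);
            if (w > 0)
                used += (size_t)w;
        }
        nset++;
    }
    return nset;
}
```

A driver's error path could pass the raw register value through
something like serr_decode() and print the result next to the existing
terse message, ideally only when a verbosity knob is turned up.
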
>>> I'm going to do some analysis and find out whether I can find any of
>>> our systems that may be experiencing ATA errors that don't correlate
>>> with what their SMART data is saying.  To date I haven't caught any,
>>> but that's not to say they may not be happening... just that all of
>>> the ones I have caught to date do appear to have been
>>> hardware-related issues...
>
> IMHO. Today we have many hardware versions and revisions and some of
> them are buggy. But other OSes (windows, linux) work with buggy
> hardware without big problems. Yes, some developers have docs and can
> make workarounds.. I think our ata driver needs new error handling
> subsystem, which can correctly handle errors.

Yep, I understand there are in fact bugs in consumer and
commercial-grade hardware/firmware.  However, FreeBSD users will want
to know if they're suffering from said bugs, or some other issue.

I'm more than willing to document both scenarios (known buggy hardware
and other bugs which are NOT the result of hardware flaws), but I
(obviously) need data and example output for this.  :-)

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |