Date: Sun, 14 Apr 2013 12:34:46 +0200
From: Radio młodych bandytów <radiomlodychbandytow@o2.pl>
To: Jeremy Chadwick <jdc@koitsu.org>
Cc: freebsd-fs@freebsd.org, support@lists.pcbsd.org
Subject: Re: A failed drive causes system to hang
Message-ID: <516A8646.4000101@o2.pl>
In-Reply-To: <20130413000731.GA84309@icarus.home.lan>
References: <mailman.11.1365681601.78138.freebsd-fs@freebsd.org> <51672164.1090908@o2.pl> <20130411212408.GA60159@icarus.home.lan> <5168821F.5020502@o2.pl> <20130412220350.GA82467@icarus.home.lan> <51688BA6.1000507@o2.pl> <20130413000731.GA84309@icarus.home.lan>

On 13/04/2013 02:07, Jeremy Chadwick wrote:
>
>
> On Sat, Apr 13, 2013 at 12:33:10AM +0200, Radio młodych bandytów wrote:
>> On 13/04/2013 00:03, Jeremy Chadwick wrote:
>>> On Fri, Apr 12, 2013 at 11:52:31PM +0200, Radio młodych bandytów wrote:
>>>> On 11/04/2013 23:24, Jeremy Chadwick wrote:
>>>>> On Thu, Apr 11, 2013 at 10:47:32PM +0200, Radio młodych bandytów wrote:
>>>>>> Seeing a ZFS thread, I decided to write about a similar problem that
>>>>>> I experience.
>>>>>> I have a failing drive in my array. I need to RMA it, but don't have
>>>>>> time and it fails rarely enough to be a yet another annoyance.
>>>>>> The failure is simple: it fails to respond.
>>>>>> When it happens, the only thing I found I can do is switch consoles.
>>>>>> Any command fails, login fails, apps hang.
>>>>>>
>>>>>> On the 1st console I see a series of messages like:
>>>>>>
>>>>>> (ada0:ahcich0:0:0:0): CAM status: Command timeout
>>>>>> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
>>>>>> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED
>>>>>>
>>>>>> I use RAIDZ1 and I'd expect that none single failure would cause the
>>>>>> system to fail...
>>>>>
>>>>> You need to provide full output from "dmesg", and you need to define
>>>>> what the word "fails" means (re: "any command fails", "login fails").
>>>> Fails = hangs. When trying to log it, I can type my user name, but
>>>> after I press enter the prompt for password never appear.
>>>> As to dmesg, tough luck. I have 2 photos on my phone and their
>>>> transcripts are all I can give until the problem reappears (which
>>>> should take up to 2 weeks). Photos are blurry and in many cases I'm
>>>> not sure what exactly is there.
>>>>
>>>> Screen1:
>>>> (ada0:ahcich0:0:0:0): FLUSHCACHE40. ACB: (ea?) 00 00 00 00 (cut?)
>>>> (ada0:ahcich0:0:0:0): CAM status: Unconditionally Re-qu (cut)
>>>> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
>>>> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 05 d3(cut)
>>>> 00
>>>> (ada0:ahcich0:0:0:0): CAM status: Command timeout
>>>> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
>>>> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 03 7b(cut)
>>>> 00
>>>> (ada0:ahcich0:0:0:0): CAM status: Unconditionally Re-qu (cut)
>>>> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
>>>> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 03 d0(cut)
>>>> 00
>>>> (ada0:ahcich0:0:0:0): CAM status: Command timeout
>>>> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
>>>>
>>>>
>>>> Screen 2:
>>>> ahcich0: Timeout on slot 29 port 0
>>>> ahcich0: (unreadable, lots of numbers, some text)
>>>> (aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: (cc?) 00 (cut)
>>>> (aprobe0:ahcich0:0:0:0): CAM status: Command timeout
>>>> (aprobe0:ahcich0:0:0:0): Error (5?), Retry was blocked
>>>> ahcich0: Timeout on slot 29 port 0
>>>> ahcich0: (unreadable, lots of numbers, some text)
>>>> (aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: (cc?) 00 (cut)
>>>> (aprobe0:ahcich0:0:0:0): CAM status: Command timeout
>>>> (aprobe0:ahcich0:0:0:0): Error (5?), Retry was blocked
>>>> ahcich0: Timeout on slot 30 port 0
>>>> ahcich0: (unreadable, lots of numbers, some text)
>>>> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 01 (cut)
>>>> (ada0:ahcich0:0:0:0): CAM status: Command timeout
>>>> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
>>>> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 01 (cut)
>>>>
>>>> Both are from the same event. In general, messages:
>>>>
>>>> (ada0:ahcich0:0:0:0): CAM status: Command timeout
>>>> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
>>>> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED.
>>>>
>>>> are the most common.
>>>>
>>>> I've waited for more than 1/2 hour once and the system didn't return
>>>> to a working state, the messages kept flowing and pretty much
>>>> nothing was working. What's interesting, I remember that it happened
>>>> to me even when I was using an installer (PC-BSD one), before the
>>>> actual installation began, so the disk stored no program data. And I
>>>> *think* there was no ZFS yet anyway.
>>>>
>>>>>
>>>>> I've already demonstrated that loss of a disk in raidz1 (or even 2 disks
>>>>> in raidz2) does not cause ""the system to fail"" on stable/9. However,
>>>>> if you lose enough members or vdevs to cause catastrophic failure, there
>>>>> may be anomalies depending on how your system is set up:
>>>>>
>>>>> http://lists.freebsd.org/pipermail/freebsd-fs/2013-March/016814.html
>>>>>
>>>>> If the pool has failmode=wait, any I/O to that pool will block (wait)
>>>>> indefinitely. This is the default.
>>>>>
>>>>> If the pool has failmode=continue, existing write I/O operations will
>>>>> fail with EIO (I/O error) (and hopefully applications/daemons will
>>>>> handle that gracefully -- if not, that's their fault) but any subsequent
>>>>> I/O (read or write) to that pool will block (wait) indefinitely.
>>>>>
>>>>> If the pool has failmode=panic, the kernel will immediately panic.
>>>>>
>>>>> If the CAM layer is what's wedged, that may be a different issue (and
>>>>> not related to ZFS). I would suggest running stable/9 as many
>>>>> improvements in this regard have been committed recently (some related
>>>>> to CAM, others related to ZFS and its new "deadman" watcher).
>>>>
>>>> Yeah, because of the installer failure, I don't think it's related to ZFS.
>>>> Even if it is, for now I won't set any ZFS properties in hope it
>>>> repeats and I can get better data.
>>>>>
>>>>> Bottom line: terse output of the problem does not help. Be verbose,
>>>>> provide all output (commands you type, everything!), as well as any
>>>>> physical actions you take.
>>>>>
>>>> Yep. In fact having little data was what made me hesitate to write
>>>> about it; since I did already, I'll do my best to get more info,
>>>> though for now I can only wait for a repetition.
>>>>
>>>>
>>>> On 12/04/2013 00:08, Quartz wrote:
>>>>>> Seeing a ZFS thread, I decided to write about a similar problem that I
>>>>>> experience.
>>>>>
>>>>> I'm assuming you're referring to my "Failed pool causes system to hang"
>>>>> thread. I wonder if there's some common issue with zfs where it locks up
>>>>> if it can't write to disks how it wants to.
>>>>>
>>>>> I'm not sure how similar your problem is to mine. What's your pool setup
>>>>> look like? Redundancy options? Are you booting from a pool? I'd be
>>>>> interested to know if you can just yank the cable to the drive and see
>>>>> if the system recovers.
>>>>>
>>>>> You seem to be worse off than me- I can still login and run at least a
>>>>> couple commands. I'm booting from a straight ufs drive though.
>>>>>
>>>>> ______________________________________
>>>>> it has a certain smooth-brained appeal
>>>>>
>>>> Like I said, I don't think it's ZFS-specific, but just in case...:
>>>> RAIDZ1, root on ZFS. I should reduce severity of a pool loss before
>>>> pulling cables, so no tests for now.
>>>
>>> Key points:
>>>
>>> 1. We now know why "commands hang" and anything I/O-related blocks
>>> (waits) for you: because your root filesystem is ZFS. If the ZFS layer
>>> is waiting on CAM, and CAM is waiting on your hardware, then those I/O
>>> requests are going to block indefinitely. So now you know the answer to
>>> why that happens.
>>>
>>> 2. I agree that the problem is not likely in ZFS, but rather either with
>>> CAM, the AHCI implementation used, or hardware (either disk or storage
>>> controller).
>>>
>>> 3. Your lack of "dmesg" is going to make this virtually impossible to
>>> solve. We really, ***really*** need that. I cannot stress this enough.
>>> This will tell us a lot of information about your system. We're also
>>> going to need to see "zpool status" output, as well as "zpool get all"
>>> and "zfs get all". "pciconf -lvbc" would also be useful.
>>>
>>> There are some known "gotchas" with certain models of hard disks or AHCI
>>> controllers (which is responsible is unknown at this time), but I don't
>>> want to start jumping to conclusions until full details can be provided
>>> first.
>>>
>>> I would recommend formatting a USB flash drive as FAT/FAT32, booting
>>> into single-user mode, then mounting the USB flash drive and issuing
>>> the above commands + writing the output to files on the flash drive,
>>> then provide those here.
>>>
>>> We really need this information.
>>>
>>> 4. Please involve the PC-BSD folks in this discussion. They need to be
>>> made aware of issues like this so they (and iXSystems, potentially) can
>>> investigate from their side.
>>>
>> OK, thanks for the info.
>> Since dmesg is so important, I'd say the best thing is to wait for
>> the problem to happen again. When it does, I'll restart the thread
>> with every information that you requested here and with a PC-BSD
>> cross-post.
>>
>> However, I just got a different hang just a while ago. This time it
>> was temporary, I don't know, I switched to console0 after ~10
>> seconds, there were 2 errors. Nothing appeared for ~1 minute, so I
>> switched back and the system was OK. Different drive, I haven't seen
>> problems with this one. And I think they used to be ahci, here's
>> ata.
>>
>> dmesg:
>>
>> fuse4bsd: version 0.3.9-pre1, FUSE ABI 7.19
>> (ada1:ata0:0:0:0): READ_DMA48. ACB: 25 00 82 46 b8 40 25 00 00 00 01 00
>> (ada1:ata0:0:0:0): CAM status: Command timeout
>> (ada1:ata0:0:0:0): Retrying command
>> vboxdrv: fAsync=0 offMin=0x53d offMax=0x52b9
>> linux: pid 17170 (npviewer.bin): syscall pipe2 not implemented
>> (ada1:ata0:0:0:0): READ_DMA48. ACB: 25 00 87 1a c7 40 1a 00 00 00 01 00
>> (ada1:ata0:0:0:0): CAM status: Command timeout
>> (ada1:ata0:0:0:0): Retrying command
>>
>> {another 150KBytes of data snipped}
>
> The above output indicates that there was a timeout when trying to issue
> a 48-bit DMA request to the disk. The disk did not respond to the
> request within 30 seconds.
>
> If you were using AHCI, we'd be able to see if the AHCI layer was
> reporting signalling problems or other anomalies that could explain the
> behaviour. With ATA, such is significantly limited. It's worse if
> you're hiding/not showing us the entire information.
>
> The classic FreeBSD ATA driver does not provide command queueing (NCQ),
> while AHCI via CAM does. The difference is that command queueing causes
> xxx_FPDMA_QUEUED CDBs to be issued to the disk.
>
> I'm going to repeat myself -- for the last time: CAN YOU PLEASE JUST
> PROVIDE "DMESG" FROM THE SYSTEM? Like after a fresh reboot? If you're
> able to provide all of the above, I don't know why you can't provide
> dmesg. It is the most important information that there is. I am sick
> and tired of stressing this point.

Sorry. I thought just the error was important.
So here you are:
dmesg.boot:
http://pastebin.com/LFXPusMX

>
> Furthermore, please stop changing ATA vs. AHCI interface drivers.
> The more you change/screw around with, the less likely people are going
> to help. CHANGE NOTHING ON THE SYSTEM. Leave it how it is. Do not
> fiddle with things or start flipping switches/changing settings/etc. to
> "try and relieve the problem". You're asking other people for help,
> which means you need to be patient and follow what we ask.

I haven't changed one bit myself. It may have been a change of defaults
in PC-BSD. I just asked them about it. Or maybe different drives use
different drivers.

>
> Thank you for the rest of the output, however. It looks like this is
> another system with an ATI-based controller (which is usually the kind
> involved in my aforementioned "gotchas"), but there still isn't enough
> information that can help. I have a gut feeling of what's about to
> come, but I need to see dmesg output before I can determine that.
>
> Furthermore, can you please provide this information with its formatting
> intact? Your Email client is screwing up "long lines" and causing
> unnecesary wrapping.
>
> The mailing list will nuke attachments, so please use pastebin or some
> similar service + provide URLs.

pciconf -lvbc:
http://pastebin.com/vvCKAWm1

zpool status:
http://pastebin.com/D3Av7x9X

zfs get all:
http://pastebin.com/4sT37VqZ

zpool get all tank1:
http://pastebin.com/HZJTJPa2

--
Twoje radio
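
Editorial sketch, not part of the original message: the collection step
requested in the thread (run the named commands and write each output to a
file on a mounted flash drive) might look roughly like the following. The
function name collect_diagnostics and the output file names are made up for
illustration, and mounting the drive is left out because the device name
differs per system. Only the commands themselves come from the thread.

```shell
#!/bin/sh
# Sketch only: save the diagnostics requested in this thread to files.
# collect_diagnostics and the *.txt names are hypothetical; the commands
# (dmesg, zpool status, zpool get all, zfs get all, pciconf -lvbc) are the
# ones named in the thread. Some systems may want a pool name appended to
# "zpool get all", as the poster did with tank1.

collect_diagnostics() {
    outdir=$1
    mkdir -p "$outdir" || return 1

    # Each input line is "file-name|command".
    while IFS='|' read -r name cmd; do
        [ -n "$name" ] || continue
        tool=${cmd%% *}
        if command -v "$tool" >/dev/null 2>&1; then
            # Keep stdout and stderr together so failures are recorded too.
            sh -c "$cmd" >"$outdir/$name.txt" 2>&1 </dev/null \
                || echo "exit status: $?" >>"$outdir/$name.txt"
        else
            echo "skipped: $tool not found" >"$outdir/$name.txt"
        fi
    done <<EOF
dmesg|dmesg
zpool-status|zpool status
zpool-get-all|zpool get all
zfs-get-all|zfs get all
pciconf|pciconf -lvbc
EOF
    return 0
}
```

With the flash drive mounted at some path (say /mnt/usb, an assumed mount
point), calling collect_diagnostics /mnt/usb leaves one text file per
command there, ready to be uploaded to a paste service.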