Date: Fri, 12 Apr 2013 17:07:31 -0700
From: Jeremy Chadwick <jdc@koitsu.org>
To: Radio młodych bandytów <radiomlodychbandytow@o2.pl>
Cc: freebsd-fs@freebsd.org
Subject: Re: A failed drive causes system to hang
Message-ID: <20130413000731.GA84309@icarus.home.lan>
In-Reply-To: <51688BA6.1000507@o2.pl>
References: <mailman.11.1365681601.78138.freebsd-fs@freebsd.org>
 <51672164.1090908@o2.pl> <20130411212408.GA60159@icarus.home.lan>
 <5168821F.5020502@o2.pl> <20130412220350.GA82467@icarus.home.lan>
 <51688BA6.1000507@o2.pl>

On Sat, Apr 13, 2013 at 12:33:10AM +0200, Radio młodych bandytów wrote:
> On 13/04/2013 00:03, Jeremy Chadwick wrote:
> >On Fri, Apr 12, 2013 at 11:52:31PM +0200, Radio młodych bandytów wrote:
> >>On 11/04/2013 23:24, Jeremy Chadwick wrote:
> >>>On Thu, Apr 11, 2013 at 10:47:32PM +0200, Radio młodych bandytów wrote:
> >>>>Seeing a ZFS thread, I decided to write about a similar problem that
> >>>>I experience.
> >>>>I have a failing drive in my array. I need to RMA it, but don't have
> >>>>time and it fails rarely enough to be a yet another annoyance.
> >>>>The failure is simple: it fails to respond.
> >>>>When it happens, the only thing I found I can do is switch consoles.
> >>>>Any command fails, login fails, apps hang.
> >>>>
> >>>>On the 1st console I see a series of messages like:
> >>>>
> >>>>(ada0:ahcich0:0:0:0): CAM status: Command timeout
> >>>>(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
> >>>>(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED
> >>>>
> >>>>I use RAIDZ1 and I'd expect that none single failure would cause the
> >>>>system to fail...
> >>>
> >>>You need to provide full output from "dmesg", and you need to define
> >>>what the word "fails" means (re: "any command fails", "login fails").
> >>Fails = hangs. When trying to log it, I can type my user name, but
> >>after I press enter the prompt for password never appear.
> >>As to dmesg, tough luck. I have 2 photos on my phone and their
> >>transcripts are all I can give until the problem reappears (which
> >>should take up to 2 weeks). Photos are blurry and in many cases I'm
> >>not sure what exactly is there.
> >>
> >>Screen1:
> >>(ada0:ahcich0:0:0:0): FLUSHCACHE40. ACB: (ea?) 00 00 00 00 (cut?)
> >>(ada0:ahcich0:0:0:0): CAM status: Unconditionally Re-qu (cut)
> >>(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
> >>(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 05 d3(cut)
> >>00
> >>(ada0:ahcich0:0:0:0): CAM status: Command timeout
> >>(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
> >>(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 03 7b(cut)
> >>00
> >>(ada0:ahcich0:0:0:0): CAM status: Unconditionally Re-qu (cut)
> >>(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
> >>(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 03 d0(cut)
> >>00
> >>(ada0:ahcich0:0:0:0): CAM status: Command timeout
> >>(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
> >>
> >>
> >>Screen 2:
> >>ahcich0: Timeout on slot 29 port 0
> >>ahcich0: (unreadable, lots of numbers, some text)
> >>(aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: (cc?) 00 (cut)
> >>(aprobe0:ahcich0:0:0:0): CAM status: Command timeout
> >>(aprobe0:ahcich0:0:0:0): Error (5?), Retry was blocked
> >>ahcich0: Timeout on slot 29 port 0
> >>ahcich0: (unreadable, lots of numbers, some text)
> >>(aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: (cc?) 00 (cut)
> >>(aprobe0:ahcich0:0:0:0): CAM status: Command timeout
> >>(aprobe0:ahcich0:0:0:0): Error (5?), Retry was blocked
> >>ahcich0: Timeout on slot 30 port 0
> >>ahcich0: (unreadable, lots of numbers, some text)
> >>(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 01 (cut)
> >>(ada0:ahcich0:0:0:0): CAM status: Command timeout
> >>(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
> >>(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 01 (cut)
> >>
> >>Both are from the same event. In general, messages:
> >>
> >>(ada0:ahcich0:0:0:0): CAM status: Command timeout
> >>(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
> >>(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED.
> >>
> >>are the most common.
> >>
> >>I've waited for more than 1/2 hour once and the system didn't return
> >>to a working state, the messages kept flowing and pretty much
> >>nothing was working. What's interesting, I remember that it happened
> >>to me even when I was using an installer (PC-BSD one), before the
> >>actual installation began, so the disk stored no program data. And I
> >>*think* there was no ZFS yet anyway.
> >>
> >>>
> >>>I've already demonstrated that loss of a disk in raidz1 (or even 2 disks
> >>>in raidz2) does not cause "the system to fail" on stable/9. However,
> >>>if you lose enough members or vdevs to cause catastrophic failure, there
> >>>may be anomalies depending on how your system is set up:
> >>>
> >>>http://lists.freebsd.org/pipermail/freebsd-fs/2013-March/016814.html
> >>>
> >>>If the pool has failmode=wait, any I/O to that pool will block (wait)
> >>>indefinitely. This is the default.
> >>>
> >>>If the pool has failmode=continue, existing write I/O operations will
> >>>fail with EIO (I/O error) (and hopefully applications/daemons will
> >>>handle that gracefully -- if not, that's their fault) but any subsequent
> >>>I/O (read or write) to that pool will block (wait) indefinitely.
> >>>
> >>>If the pool has failmode=panic, the kernel will immediately panic.
> >>>
> >>>If the CAM layer is what's wedged, that may be a different issue (and
> >>>not related to ZFS). I would suggest running stable/9 as many
> >>>improvements in this regard have been committed recently (some related
> >>>to CAM, others related to ZFS and its new "deadman" watcher).
> >>
> >>Yeah, because of the installer failure, I don't think it's related to ZFS.
> >>Even if it is, for now I won't set any ZFS properties in hope it
> >>repeats and I can get better data.
> >>>
> >>>Bottom line: terse output of the problem does not help. Be verbose,
> >>>provide all output (commands you type, everything!), as well as any
> >>>physical actions you take.
> >>>
> >>Yep. In fact having little data was what made me hesitate to write
> >>about it; since I did already, I'll do my best to get more info,
> >>though for now I can only wait for a repetition.
> >>
> >>
> >>On 12/04/2013 00:08, Quartz wrote:
> >>>>Seeing a ZFS thread, I decided to write about a similar problem that I
> >>>>experience.
> >>>
> >>>I'm assuming you're referring to my "Failed pool causes system to hang"
> >>>thread. I wonder if there's some common issue with zfs where it locks up
> >>>if it can't write to disks how it wants to.
> >>>
> >>>I'm not sure how similar your problem is to mine. What's your pool setup
> >>>look like? Redundancy options? Are you booting from a pool? I'd be
> >>>interested to know if you can just yank the cable to the drive and see
> >>>if the system recovers.
> >>>
> >>>You seem to be worse off than me- I can still login and run at least a
> >>>couple commands. I'm booting from a straight ufs drive though.
> >>>
> >>>______________________________________
> >>>it has a certain smooth-brained appeal
> >>>
> >>Like I said, I don't think it's ZFS-specific, but just in case...:
> >>RAIDZ1, root on ZFS. I should reduce severity of a pool loss before
> >>pulling cables, so no tests for now.
> >
> >Key points:
> >
> >1. We now know why "commands hang" and anything I/O-related blocks
> >(waits) for you: because your root filesystem is ZFS. If the ZFS layer
> >is waiting on CAM, and CAM is waiting on your hardware, then those I/O
> >requests are going to block indefinitely. So now you know the answer to
> >why that happens.
> >
> >2. I agree that the problem is not likely in ZFS, but rather either with
> >CAM, the AHCI implementation used, or hardware (either disk or storage
> >controller).
> >
> >3. Your lack of "dmesg" is going to make this virtually impossible to
> >solve. We really, ***really*** need that. I cannot stress this enough.
> >This will tell us a lot of information about your system. We're also
> >going to need to see "zpool status" output, as well as "zpool get all"
> >and "zfs get all". "pciconf -lvbc" would also be useful.
> >
> >There are some known "gotchas" with certain models of hard disks or AHCI
> >controllers (which is responsible is unknown at this time), but I don't
> >want to start jumping to conclusions until full details can be provided
> >first.
> >
> >I would recommend formatting a USB flash drive as FAT/FAT32, booting
> >into single-user mode, then mounting the USB flash drive and issuing
> >the above commands + writing the output to files on the flash drive,
> >then provide those here.
> >
> >We really need this information.
> >
> >4. Please involve the PC-BSD folks in this discussion. They need to be
> >made aware of issues like this so they (and iXSystems, potentially) can
> >investigate from their side.
> >
> OK, thanks for the info.
> Since dmesg is so important, I'd say the best thing is to wait for
> the problem to happen again. When it does, I'll restart the thread
> with every information that you requested here and with a PC-BSD
> cross-post.
>
> However, I just got a different hang just a while ago. This time it
> was temporary, I don't know, I switched to console0 after ~10
> seconds, there were 2 errors. Nothing appeared for ~1 minute, so I
> switched back and the system was OK. Different drive, I haven't seen
> problems with this one. And I think they used to be ahci, here's
> ata.
>
> dmesg:
>
> fuse4bsd: version 0.3.9-pre1, FUSE ABI 7.19
> (ada1:ata0:0:0:0): READ_DMA48. ACB: 25 00 82 46 b8 40 25 00 00 00 01 00
> (ada1:ata0:0:0:0): CAM status: Command timeout
> (ada1:ata0:0:0:0): Retrying command
> vboxdrv: fAsync=0 offMin=0x53d offMax=0x52b9
> linux: pid 17170 (npviewer.bin): syscall pipe2 not implemented
> (ada1:ata0:0:0:0): READ_DMA48. ACB: 25 00 87 1a c7 40 1a 00 00 00 01 00
> (ada1:ata0:0:0:0): CAM status: Command timeout
> (ada1:ata0:0:0:0): Retrying command
>
> {another 150KBytes of data snipped}

The above output indicates that there was a timeout when trying to issue
a 48-bit DMA request to the disk.
The disk did not respond to the request within 30 seconds.

If you were using AHCI, we'd be able to see if the AHCI layer was
reporting signalling problems or other anomalies that could explain the
behaviour. With ATA, that visibility is significantly limited. It's
worse if you're hiding/not showing us all of the information.

The classic FreeBSD ATA driver does not provide command queueing (NCQ),
while AHCI via CAM does. The difference is that command queueing causes
xxx_FPDMA_QUEUED CDBs to be issued to the disk.

I'm going to repeat myself -- for the last time: CAN YOU PLEASE JUST
PROVIDE "DMESG" FROM THE SYSTEM? Like after a fresh reboot? If you're
able to provide all of the above, I don't know why you can't provide
dmesg. It is the most important information that there is. I am sick
and tired of stressing this point.

Furthermore, please stop changing ATA vs. AHCI interface drivers. The
more you change/screw around with, the less likely people are going to
help. CHANGE NOTHING ON THE SYSTEM. Leave it how it is. Do not fiddle
with things or start flipping switches/changing settings/etc. to "try
and relieve the problem". You're asking other people for help, which
means you need to be patient and follow what we ask.

Thank you for the rest of the output, however. It looks like this is
another system with an ATI-based controller (which is usually the kind
involved in my aforementioned "gotchas"), but there still isn't enough
information that can help. I have a gut feeling of what's about to
come, but I need to see dmesg output before I can determine that.

Furthermore, can you please provide this information with its formatting
intact? Your email client is screwing up "long lines" and causing
unnecessary wrapping. The mailing list will nuke attachments, so please
use pastebin or some similar service + provide URLs.

> OK, so I forgot an important info that my pool doesn't have redundancy....
> Still, I don't think it's *the* problem because it happened in the
> installer and it used to happen before I RMA'd another disk (That
> was hanging the same, but more often. They are the same batch. Not
> sure about ada1, but I do have a 3rd disk from the batch. I can
> check it tomorrow). Actually I have the replacement disk laying
> around...should I connect it or better leave the system as it is and
> wait for the problem to reappear?

You're still not giving the information needed. Your reluctance and
inability to provide what's asked for is really pissing me off.

-- 
| Jeremy Chadwick                                   jdc@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Mountain View, CA, US                                            |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
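
A footnote on the CAM messages quoted in this message: the "ACB:" bytes
can be decoded by hand to see which ATA command timed out and whether it
was a queued (NCQ) one. The Python sketch below is one way to do that.
It assumes the twelve bytes are printed in the order command, features,
LBA low/mid/high, device, LBA low/mid/high (exp), features (exp), sector
count, sector count (exp); that ordering is an assumption and is not
confirmed anywhere in this thread. The name decode_acb and the opcode
table are made up for illustration and are not part of any FreeBSD tool.

```python
import re

# ATA opcodes for the commands that show up in this thread.
ATA_OPCODES = {
    0x25: "READ DMA EXT",
    0x60: "READ FPDMA QUEUED",
    0x61: "WRITE FPDMA QUEUED",
    0xEA: "FLUSH CACHE EXT",
    0xEC: "IDENTIFY DEVICE",
}
NCQ_OPCODES = {0x60, 0x61}  # the xxx_FPDMA_QUEUED commands

# Matches e.g. "(ada1:ata0:0:0:0): READ_DMA48. ACB: 25 00 ... 00"
LINE_RE = re.compile(
    r"\((?P<device>\w+):(?P<bus>\w+):[\d:]+\): "
    r"(?P<label>\w+)\. ACB: (?P<acb>(?:[0-9a-f]{2} ?)+)"
)


def decode_acb(line):
    """Decode one CAM 'ACB:' line; return None if it is not a full one."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    acb = [int(byte, 16) for byte in match.group("acb").split()]
    if len(acb) != 12:
        return None  # truncated, like the "(cut)" lines from the photos

    # ASSUMED byte order, see the note above the code.
    opcode = acb[0]
    lba = (acb[2] | acb[3] << 8 | acb[4] << 16
           | acb[6] << 24 | acb[7] << 32 | acb[8] << 40)
    ncq = opcode in NCQ_OPCODES
    if ncq:
        # NCQ commands carry the sector count in the features fields.
        sectors = acb[1] | acb[9] << 8
    else:
        sectors = acb[10] | acb[11] << 8

    return {
        "device": match.group("device"),
        "bus": match.group("bus"),
        "opcode": opcode,
        "opcode_name": ATA_OPCODES.get(opcode, "unknown"),
        "ncq": ncq,
        "lba": lba,
        "sectors": sectors,
    }
```

Under that assumed byte order, the first READ_DMA48 line in the quoted
dmesg decodes as opcode 0x25 (READ DMA EXT), not queued, one sector at
LBA 0x25b84682. Lines without a full twelve-byte ACB, such as the
"CAM status" lines, return None.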
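
For anyone following this thread who needs to gather the requested
output in one go, here is a rough Python sketch. It runs the commands
named in this thread (dmesg, zpool status, zpool get all, zfs get all,
pciconf -lvbc) and writes each one's output to its own file. The
function name collect, the file names, and the example path /mnt/usb
are hypothetical; whether Python is even available in single-user mode
on a given install is also an assumption.

```python
import subprocess
from pathlib import Path

# Output file name -> command, taken from the commands asked for in
# this thread. The file names themselves are arbitrary.
DEFAULT_COMMANDS = {
    "dmesg.txt": ["dmesg"],
    "zpool-status.txt": ["zpool", "status"],
    "zpool-get-all.txt": ["zpool", "get", "all"],
    "zfs-get-all.txt": ["zfs", "get", "all"],
    "pciconf.txt": ["pciconf", "-lvbc"],
}


def collect(outdir, commands=None, timeout=60):
    """Run each command and save stdout+stderr under outdir.

    Returns a dict mapping file name to exit status, or None when the
    command could not be run or did not finish within the timeout.
    """
    if commands is None:
        commands = DEFAULT_COMMANDS
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)

    results = {}
    for filename, argv in commands.items():
        try:
            proc = subprocess.run(
                argv, capture_output=True, text=True, timeout=timeout
            )
            body = proc.stdout + proc.stderr
            results[filename] = proc.returncode
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Record the failure in the file instead of aborting the run.
            body = "could not run {}: {}\n".format(" ".join(argv), exc)
            results[filename] = None
        (outdir / filename).write_text(body)
    return results
```

Calling collect("/mnt/usb") with the flash drive mounted there would
leave one text file per command on the drive. The output directory
should be on the flash drive and not on the pool, since writes to a
wedged pool would block just like everything else.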