From: Radio młodych bandytów
Date: Sun, 14 Apr 2013 12:10:26 +0200
To: support@lists.pcbsd.org
Cc: freebsd-fs@freebsd.org
Subject: A failed drive causes system to hang
Message-ID: <516A8092.2080002@o2.pl>

Cross-post from freebsd-fs:
http://docs.freebsd.org/cgi/getmsg.cgi?fetch=333977+0+archive/2013/freebsd-fs/20130414.freebsd-fs

I have a failing drive in my array. I need to RMA it, but I don't have time, and it fails rarely enough that it is just one more annoyance. The failure is simple: the drive stops responding. When that happens, the only thing I have found I can still do is switch consoles.
Any command hangs, logging in on other consoles hangs, and applications hang. I run PC-BSD 9.1.

On the first console I see a series of messages like:

(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED

I have seen it happen even while running an installer from a different drive, while preparing the installation (I don't remember which step).

I have partial dmesg screenshots from an older failure (21 December 2012); transcript below:

Screen 1:
(ada0:ahcich0:0:0:0): FLUSHCACHE40. ACB: (ea?) 00 00 00 00 (cut?)
(ada0:ahcich0:0:0:0): CAM status: Unconditionally Re-qu (cut)
(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 05 d3(cut) 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 03 7b(cut) 00
(ada0:ahcich0:0:0:0): CAM status: Unconditionally Re-qu (cut)
(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 03 d0(cut) 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated

Screen 2:
ahcich0: Timeout on slot 29 port 0
ahcich0: (unreadable, lots of numbers, some text)
(aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: (cc?) 00 (cut)
(aprobe0:ahcich0:0:0:0): CAM status: Command timeout
(aprobe0:ahcich0:0:0:0): Error (5?), Retry was blocked
ahcich0: Timeout on slot 29 port 0
ahcich0: (unreadable, lots of numbers, some text)
(aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: (cc?) 00 (cut)
(aprobe0:ahcich0:0:0:0): CAM status: Command timeout
(aprobe0:ahcich0:0:0:0): Error (5?), Retry was blocked
ahcich0: Timeout on slot 30 port 0
ahcich0: (unreadable, lots of numbers, some text)
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 01 (cut)
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 01 (cut)

Both screens are from the same event. In general, the messages

(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED.

are the most common.

And a recent one, though from a different drive (part of the same array):

fuse4bsd: version 0.3.9-pre1, FUSE ABI 7.19
(ada1:ata0:0:0:0): READ_DMA48. ACB: 25 00 82 46 b8 40 25 00 00 00 01 00
(ada1:ata0:0:0:0): CAM status: Command timeout
(ada1:ata0:0:0:0): Retrying command
vboxdrv: fAsync=0 offMin=0x53d offMax=0x52b9
linux: pid 17170 (npviewer.bin): syscall pipe2 not implemented
(ada1:ata0:0:0:0): READ_DMA48. ACB: 25 00 87 1a c7 40 1a 00 00 00 01 00
(ada1:ata0:0:0:0): CAM status: Command timeout
(ada1:ata0:0:0:0): Retrying command

One thing pointed out on freebsd-fs is that the driver changed from ahcich0 to ata0. I haven't changed any configuration here myself. Have you changed some defaults?

-- 
Twoje radio