From: "Ronald Klop"
To: support@lists.pcbsd.org, Radio młodych bandytów
Cc: freebsd-fs@freebsd.org
Subject: Re: A failed drive causes system to hang
Date: Sun, 14 Apr 2013 12:18:07 +0200
In-Reply-To: <516A8092.2080002@o2.pl>

On Sun, 14 Apr 2013 12:10:26 +0200, Radio młodych bandytów wrote:

> Cross-post from freebsd-fs:
> http://docs.freebsd.org/cgi/getmsg.cgi?fetch=333977+0+archive/2013/freebsd-fs/20130414.freebsd-fs
>
> I have a failing drive in my array. I need to RMA it, but don't have
> time and it fails rarely enough to be yet another annoyance.

Maybe off-topic, but you do have time to write long mails, yet not to
RMA broken disks? I hope your clients don't read this. :-)

Ronald.

> The failure is simple: it fails to respond.
> When it happens, the only thing I found I can do is switch consoles. Any
> command hangs, login on different consoles hangs, apps hang.
> I run PC-BSD 9.1.
>
> On the 1st console I see a series of messages like:
>
> (ada0:ahcich0:0:0:0): CAM status: Command timeout
> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED
>
> I've seen it happen even when running an installer from a different
> drive, while preparing the installation (I don't remember which step).
>
> I have partial dmesg screenshots from an older failure (21st of December
> 2012), transcript below:
>
> Screen 1:
> (ada0:ahcich0:0:0:0): FLUSHCACHE40. ACB: (ea?) 00 00 00 00 (cut?)
> (ada0:ahcich0:0:0:0): CAM status: Unconditionally Re-qu (cut)
> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 05 d3(cut)
> 00
> (ada0:ahcich0:0:0:0): CAM status: Command timeout
> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 03 7b(cut)
> 00
> (ada0:ahcich0:0:0:0): CAM status: Unconditionally Re-qu (cut)
> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 03 d0(cut)
> 00
> (ada0:ahcich0:0:0:0): CAM status: Command timeout
> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
>
> Screen 2:
> ahcich0: Timeout on slot 29 port 0
> ahcich0: (unreadable, lots of numbers, some text)
> (aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: (cc?) 00 (cut)
> (aprobe0:ahcich0:0:0:0): CAM status: Command timeout
> (aprobe0:ahcich0:0:0:0): Error (5?), Retry was blocked
> ahcich0: Timeout on slot 29 port 0
> ahcich0: (unreadable, lots of numbers, some text)
> (aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: (cc?) 00 (cut)
> (aprobe0:ahcich0:0:0:0): CAM status: Command timeout
> (aprobe0:ahcich0:0:0:0): Error (5?), Retry was blocked
> ahcich0: Timeout on slot 30 port 0
> ahcich0: (unreadable, lots of numbers, some text)
> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 01 (cut)
> (ada0:ahcich0:0:0:0): CAM status: Command timeout
> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 01 (cut)
>
> Both are from the same event. In general, the most common messages are:
>
> (ada0:ahcich0:0:0:0): CAM status: Command timeout
> (ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED.
>
> And a recent one, though from a different drive (part of the same
> array):
> fuse4bsd: version 0.3.9-pre1, FUSE ABI 7.19
> (ada1:ata0:0:0:0): READ_DMA48. ACB: 25 00 82 46 b8 40 25 00 00 00 01 00
> (ada1:ata0:0:0:0): CAM status: Command timeout
> (ada1:ata0:0:0:0): Retrying command
> vboxdrv: fAsync=0 offMin=0x53d offMax=0x52b9
> linux: pid 17170 (npviewer.bin): syscall pipe2 not implemented
> (ada1:ata0:0:0:0): READ_DMA48. ACB: 25 00 87 1a c7 40 1a 00 00 00 01 00
> (ada1:ata0:0:0:0): CAM status: Command timeout
> (ada1:ata0:0:0:0): Retrying command
>
> One thing pointed out on freebsd-fs is that the driver changed from
> ahcich0 to ata0. I haven't done any configuration here myself. Have you
> changed some defaults?
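Each CAM message above starts with a path of the form (periph:sim:bus:target:lun). In `(ada0:ahcich0:0:0:0)`, `ada0` is the disk and `ahcich0` the AHCI controller channel it is attached to, while `(ada1:ata0:0:0:0)` shows a disk reached through the ata(4) channel `ata0`, which is the driver change mentioned on freebsd-fs. As a minimal sketch, assuming the messages have been saved as plain text (for example from `dmesg`), the Python below groups the `CAM status:` lines by disk and controller. The regular expression and the `tally_cam_status` helper are hypothetical, written only for this illustration.

```python
import re
from collections import Counter

# Matches a CAM path prefix such as "(ada0:ahcich0:0:0:0): " followed by the
# message text. Groups: periph (e.g. ada0, aprobe0), sim (e.g. ahcich0, ata0),
# bus, target, lun, and the remaining message.
CAM_LINE = re.compile(
    r"^\((?P<periph>[a-z]+\d+):(?P<sim>[a-z]+\d+):"
    r"(?P<bus>\d+):(?P<target>\d+):(?P<lun>[0-9a-fx]+)\):\s*(?P<msg>.*)$"
)


def tally_cam_status(lines):
    """Count 'CAM status: ...' messages per (periph, sim, status) triple.

    Hypothetical helper for illustration. Lines that do not start with a
    CAM path (e.g. 'ahcich0: Timeout on slot 29 port 0') are ignored.
    """
    counts = Counter()
    for line in lines:
        m = CAM_LINE.match(line.strip())
        if not m:
            continue
        msg = m.group("msg")
        if msg.startswith("CAM status:"):
            status = msg[len("CAM status:"):].strip()
            counts[(m.group("periph"), m.group("sim"), status)] += 1
    return counts


if __name__ == "__main__":
    # A few lines copied from the transcripts above.
    sample = """\
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED
(ada1:ata0:0:0:0): READ_DMA48. ACB: 25 00 82 46 b8 40 25 00 00 00 01 00
(ada1:ata0:0:0:0): CAM status: Command timeout
(ada1:ata0:0:0:0): Retrying command
""".splitlines()
    for (periph, sim, status), n in sorted(tally_cam_status(sample).items()):
        print(f"{periph} on {sim}: {status} x{n}")
```

Run on that sample, it prints `ada0 on ahcich0: Command timeout x1` followed by `ada1 on ata0: Command timeout x1`, so timeouts on the two disks show up under different controller drivers.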