From: Dr Josef Karthauser
Date: Thu, 18 Jul 2013 08:29:14 +0100
Subject: Drive failures with ada on FreeBSD-9.1, driver bug or wiring issue?
To: "freebsd-fs@freebsd.org"
Cc: "freebsd-stable@freebsd.org"
Message-Id: <60F7BE75-5E2F-471E-A9CE-AF4CD17D96E2@karthauser.co.uk>
References: <20130716225013.1C63B23A@babel.karthauser.co.uk>

Hi there,

I'm scratching my head. I've just migrated to a Supermicro chassis and
at the same time gone from FreeBSD 9.0 to 9.1-RELEASE.

The machine in question is running a ZFS mirror configuration on two ada
devices (with an 8GB gmirror carved out for swap).

Since doing so I've been having strange dropouts on the drives; they
just disappear from the bus like so:

(ada2:ahcich2:0:0:0): removing device entry
(aprobe0:ahcich2:0:0:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
(aprobe0:ahcich2:0:0:0): CAM status: ATA Status Error
(aprobe0:ahcich2:0:0:0): ATA status: d1 (BSY DRDY SERV ERR), error: 04 (ABRT )
(aprobe0:ahcich2:0:0:0): RES: d1 04 ff ff ff ff ff ff ff ff ff
(aprobe0:ahcich2:0:0:0): Error 5, Retries exhausted
(aprobe0:ahcich2:0:0:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
(aprobe0:ahcich2:0:0:0): CAM status: ATA Status Error
(aprobe0:ahcich2:0:0:0): ATA status: d1 (BSY DRDY SERV ERR), error: 04 (ABRT )
(aprobe0:ahcich2:0:0:0): RES: d1 04 ff ff ff ff ff ff ff ff ff
(aprobe0:ahcich2:0:0:0): Error 5, Retries exhausted

At first I thought it was a failing drive. One of the drives did this,
and I limped on with a single drive for a week until I could get someone
up to the rack to plug a third drive in. We resilvered the zpool onto the
new device and ran with the failed drive still plugged in (but not
responding to a reset on the ada bus with camcontrol) for a week or so.

Then the new drive dropped out in exactly the same way, followed in
short order by the remaining original drive!!!

After rebooting the machine and observing all three drives probing and
available, I resilvered the gmirror and zpool again on the two devices
that I thought were reliable, but before the resilvering had completed
the new drive dropped out again.

I'm scratching my head now. I can't imagine that it's a wiring problem,
as the drives are all on individual SATA buses and individually cabled.

SMART isn't reporting any drive issues either... :/

So, I'm wondering: is it a driver issue with 9.1-RELEASE? If I upgrade
to 9-RELENG, should I expect that to resolve the problem? (Have there
been any ada bus issues reported since last December?)

The hardware in question is:

ahci0: port 0xf050-0xf057,0xf040-0xf043,0xf030-0xf037,0xf020-0xf023,0xf000-0xf01f mem 0xdfb02000-0xdfb027ff irq 19 at device 31.2 on pci0
ahci0: AHCI v1.30 with 6 3Gbps ports, Port Multiplier not supported
ahcich0: at channel 0 on ahci0
ahcich1: at channel 1 on ahci0
ahcich2: at channel 2 on ahci0
ahcich3: at channel 3 on ahci0
ahcich4: at channel 4 on ahci0
ahcich5: at channel 5 on ahci0

ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: ATA-8 SATA 2.x device
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada0: Previously was known as ad4
ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: ATA-8 SATA 2.x device
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada1: Previously was known as ad6
ada2 at ahcich2 bus 0 scbus2 target 0 lun 0
ada2: ATA-8 SATA 2.x device
ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada2: Previously was known as ad8

Any ideas would be greatly welcomed.

Thanks,
Joe