Message-ID: <5340B1C5.4000700@gmail.com>
Date: Sun, 06 Apr 2014 02:45:41 +0100
From: Kaya Saman
To: kpneal@pobox.com
Cc: FreeBSD Filesystems, Vusa Moyo
Subject: Re: Device Removed by Administrator in ZPOOL?
In-Reply-To: <20140406002849.GA14765@neutralgood.org>

On 04/06/2014 01:28 AM, kpneal@pobox.com wrote:
> On Sun, Apr 06, 2014 at 01:12:33AM +0100, Kaya Saman wrote:
>> Many thanks for the response!
>>
>> The server doesn't show any lights for "drive error" however, the blue
>> read LED isn't coming on, on the drive in question (as removed from ZPOOL).
>>
>> I will have a look for LSI tools in @Ports and also see if the BIOS LSI
>> hook comes up with anything.
> Have you seen any other errors in your logs? Seems like if a drive fails
> there should be some other error message reporting the errors that resulted
> in ZFS marking the drive removed. What does 'dmesg' have to say?
>
> Once ZFS has stopped using the drive (for whatever reason) I wouldn't
> expect you to see anything else happening on the drive. So the light not
> coming on doesn't really tell us anything new.
>
> Also, aren't 'green' drives the kind that spin down and then have to spin
> back up when a request comes in?
> I don't know what happens if a drive takes
> "too long" to respond because it has spun down. I have no idea how FreeBSD
> handles that, and I also don't know if ZFS adds anything to the equation.
> Hopefully someone else here will clue me/us in.
>

Ok this is really weird.... just did a reboot and now:

$ zpool status
  pool: ZPOOL_2
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Apr 6 02:43:03 2014
        1.13G scanned out of 7.77T at 22.2M/s, 101h57m to go
        227M resilvered, 0.01% done
config:

        NAME          STATE     READ WRITE CKSUM
        ZPOOL_2       ONLINE       0     0     0
          raidz2-0    ONLINE       0     0     0
            da0       ONLINE       0     0     0
            da1       ONLINE       0     0     0  (resilvering)
            da2       ONLINE       0     0     0
            da3       ONLINE       0     0     0
            da4       ONLINE       0     0     0

????

Looks like the drive might have fallen off the controller?

Am just looking at the tools for it on the LSI website but there doesn't
seem to be anything FreeBSD related.... Linux and Solaris yes but no FBSD?

Model is LSI SAS 9207-4i4e

>> On 04/06/2014 12:44 AM, Vusa Moyo wrote:
>>> This is more than likely a failed drive.
>>>
>>> Have you physically looked at the server for orange lights which may help ID the failed drive??
>>>
>>> There could also be tools to query the lsi hba.
>>>
>>> Sent from my iPad
>>>
>>>> On Apr 6, 2014, at 1:20 AM, Kaya Saman wrote:
>>>>
>>>> Hi,
>>>>
>>>> I'm running FreeBSD 10.0 x64 on a Xeon E5 based system with 8GB RAM.
>>>>
>>>>
>>>> Checking the ZPOOL status I saw one of my drives has been offlined... the exact error is this:
>>>>
>>>> # zpool status -v
>>>>   pool: ZPOOL_2
>>>>  state: DEGRADED
>>>> status: One or more devices has been removed by the administrator.
>>>>         Sufficient replicas exist for the pool to continue functioning in a
>>>>         degraded state.
>>>> action: Online the device using 'zpool online' or replace the device with
>>>>         'zpool replace'.
>>>>   scan: scrub repaired 0 in 9h3m with 0 errors on Sat Apr 5 03:46:55 2014
>>>> config:
>>>>
>>>>         NAME                        STATE     READ WRITE CKSUM
>>>>         ZPOOL_2                     DEGRADED     0     0     0
>>>>           raidz2-0                  DEGRADED     0     0     0
>>>>             da0                     ONLINE       0     0     0
>>>>             14870388343127772554    REMOVED      0     0     0  was /dev/da1
>>>>             da2                     ONLINE       0     0     0
>>>>             da3                     ONLINE       0     0     0
>>>>             da4                     ONLINE       0     0     0
>>>>
>>>>
>>>> I think this is due to a dead disk however, I'm not certain which is why I wanted to ask here as I didn't remove the drive at all..... rather than some kind of OS/ZFS error.
>>>>
>>>>
>>>> The drives are 2TB WD Green drives all connected to an LSI HBA; everything is still under warranty so no big issue there and I have external backups too so I'm not really that worried, I'm just trying to work out what's going on.
>>>>
>>>>
>>>> Are my suspicions correct or should I simply try to reboot the system and see if the drive comes back online?