From: Mikolaj Golub
To: Timothy Smith
Cc: Pawel Jakub Dawidek, freebsd-stable@freebsd.org
Date: Sun, 03 Jul 2011 18:55:06 +0300
Message-ID: <861uy7xsth.fsf@kopusha.home.net>
Subject: Re: HAST + ZFS: no action on drive failure

On Sat, 2 Jul 2011 14:43:15 -0700 Timothy Smith wrote:

TS> Hello Mikolaj,
TS>
TS> So, just to be clear, if a local drive fails in my pool, but the
TS> corresponding remote drive remains available, then hastd will both
TS> write to and read from the remote drive? That's really very cool!

Yes.

TS> I looked more closely at the hastd(8) man page. There is some
TS> indication of what you say, but not so clear:
TS>
TS> "Read operations (BIO_READ) are handled locally unless I/O error
TS> occurs or local version of the data is not up-to-date yet
TS> (synchronization is in progress)."

This is about READ operations; for WRITE we have, just above:

  Every write, delete and flush operation (BIO_WRITE, BIO_DELETE,
  BIO_FLUSH) is send to local component and synchronously replicated
  to the remote (secondary) node if it is available.

There might be things that should be improved in the documentation, but
I don't feel capable of doing this :-)

TS> Perhaps this can be modified a bit? Adding, "or the local disk is
TS> unavailable. In such a case, the I/O operation will be handled by
TS> the remote resource."
TS>
TS> It does make sense however, since HAST is based on the idea of RAID.
TS> This feature increases the redundancy of the system greatly. My boss
TS> will be very impressed, as am I!
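Right, each HAST resource is essentially a RAID1 mirror whose second
component lives on the other node. For anyone following along in the
archives, the kind of setup being discussed is a hast.conf(5) along
these lines -- a minimal sketch, with host names, resource name and
provider made up for illustration, not taken from Timothy's
configuration (the names after "on" must match the nodes' hostnames):

  resource disk0 {
          on hosta {
                  local /dev/da0
                  remote hostb
          }
          on hostb {
                  local /dev/da0
                  remote hosta
          }
  }

The ZFS pool is then built on /dev/hast/disk0 on whichever node is
primary, so it is hastd, not ZFS, that decides whether a given request
is served by the local or the remote component.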
TS> I did notice however that when the pulled drive is reinserted, I
TS> need to change the associated hast resource to init, then back to
TS> primary to allow hastd to once again use it (perhaps the same if the
TS> secondary drive fails?). Unless it will do this on its own after
TS> some time? I did not wait more than a few minutes. But this is easy
TS> enough to script, or to monitor the log and present a notification
TS> to the admin at such a time.

When you are reinserting the drive the resource should be in the init
state. Remember, some data was updated on the secondary only, so the
right sequence of operations could be:

1) Failover (switch the primary to init and the secondary to primary).

2) Fix the disk issue.

3) If this is a new drive, recreate the HAST metadata on it with the
   hastctl utility.

4) Switch the repaired resource to secondary and wait until the new
   primary connects to it and updates the metadata. After this,
   synchronization starts.

5) You can switch back to the previous primary before the
   synchronization is complete -- it will continue in the right
   direction -- but then you should expect performance degradation
   until the synchronization is complete, because READ requests will go
   to the remote node. So it might be better to wait until the
   synchronization is complete before switching back.

A rough hastctl sketch of this sequence is below.

-- 
Mikolaj Golub
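For the archives, the sequence above might look roughly like this with
the hastctl utility, assuming a resource named "disk0", hosta as the
node with the failed disk (old primary) and hostb as its peer -- the
names are made up for illustration, and this is a sketch rather than a
definitive recipe:

  # On hosta (the node with the failed disk): stop using the resource.
  hastctl role init disk0

  # On hostb: take over as primary.
  hastctl role primary disk0

  # On hosta, after the disk has been replaced: recreate the local HAST
  # metadata (only needed if it is a new drive).
  hastctl create disk0

  # On hosta: become secondary so the new primary can resynchronize it.
  hastctl role secondary disk0

  # Check the resource state and synchronization progress.
  hastctl status disk0

Once hastd reports no more dirty data for the resource, you can switch
the roles back the same way (role secondary on hostb first, then role
primary on hosta).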