From owner-svn-doc-all@FreeBSD.ORG  Wed Apr  9 13:44:06 2014
Message-Id: <201404091344.s39Di5gQ053634@svn.freebsd.org>
From: Dru Lavigne
Date: Wed, 9 Apr 2014 13:44:05 +0000 (UTC)
To: doc-committers@freebsd.org, svn-doc-all@freebsd.org, svn-doc-head@freebsd.org
Subject: svn commit: r44500 - head/en_US.ISO8859-1/books/handbook/disks

Author: dru
Date: Wed Apr  9 13:44:05 2014
New Revision: 44500

URL: http://svnweb.freebsd.org/changeset/doc/44500

Log:
  Finish editorial review of HAST chapter.
  Sponsored by:	iXsystems

Modified:
  head/en_US.ISO8859-1/books/handbook/disks/chapter.xml

Modified: head/en_US.ISO8859-1/books/handbook/disks/chapter.xml
==============================================================================
--- head/en_US.ISO8859-1/books/handbook/disks/chapter.xml	Wed Apr  9 12:40:41 2014	(r44499)
+++ head/en_US.ISO8859-1/books/handbook/disks/chapter.xml	Wed Apr  9 13:44:05 2014	(r44500)
@@ -3675,22 +3675,22 @@ Device 1K-blocks Used Av
       The goal of this example is to build a robust storage
       system which is resistant to the failure of any given
       node.
-      The scenario is that a primary node of
-      the cluster fails.  If this happens, the
-      secondary node is there to take over
+      If the primary node
+      fails, the
+      secondary node is there to take over
       seamlessly, check and mount the file system, and
       continue to work without missing a single bit of
       data.

-      To accomplish this task, another &os; feature,
-      CARP, provides for automatic failover on
-      the IP layer.  CARP (Common
-      Address Redundancy Protocol) allows multiple hosts on the
-      same network segment to share an IP address.  Set up
+      To accomplish this task, the Common
+      Address Redundancy Protocol
+      (CARP) is used to provide for automatic failover at
+      the IP layer.  CARP allows multiple hosts on the
+      same network segment to share an IP address.  Set up
       CARP on both nodes of the cluster according to the
       documentation available in
-      .  After setup, each node will
-      have its own carp0 interface with a
-      shared IP address of
+      .  In this example, each node will
+      have its own management IP address and a
+      shared IP address of
       172.16.0.254.  The primary HAST node of
       the cluster must be the master CARP
       node.

@@ -3699,7 +3699,7 @@ Device 1K-blocks Used Av
       section is now ready to be exported to the other hosts on
       the network.  This can be accomplished by exporting it
       through NFS or
-      Samba, using the shared IP
+      Samba, using the shared IP
       address 172.16.0.254.
       The only problem which remains unresolved is an
       automatic failover should the primary node fail.

@@ -3713,7 +3713,7 @@ Device 1K-blocks Used Av
       These state change events make it possible to run a
       script which will automatically handle the HAST
       failover.
-      To be able to catch state changes on the
+      To catch state changes on the
       CARP interfaces, add this configuration to
       /etc/devd.conf on each node:

@@ -3732,21 +3732,27 @@ notify 30 {
   action "/usr/local/sbin/carp-hast-switch slave";
 };

+
+      If the systems are running &os; 10 or higher,
+      replace carp0 with the name of the
+      CARP-configured interface.
+
+
       Restart &man.devd.8; on both nodes to put the new
       configuration into effect:

 &prompt.root; service devd restart

-      When the carp0 interface state
+      When the specified interface state
       changes by going up or down , the system generates a
-      notification, allowing the &man.devd.8; subsystem to run an
-      arbitrary script, in this case
-      /usr/local/sbin/carp-hast-switch.  This
-      script handles the automatic failover.  For further
-      clarification about the above &man.devd.8; configuration,
+      notification, allowing the &man.devd.8; subsystem to run the
+      specified automatic failover script,
+      /usr/local/sbin/carp-hast-switch.
+      For further
+      clarification about this configuration,
       refer to &man.devd.conf.5;.

-      An example of such a script could be:
+      Here is an example of an automated failover script:

 #!/bin/sh

@@ -3755,7 +3761,7 @@ notify 30 {
 # and Viktor Petersson <vpetersson@wireload.net>

 # The names of the HAST resources, as listed in /etc/hast.conf
-resources="test"
+resources="test"

 # delay in mounting HAST resource after becoming master
 # make your best guess

@@ -3833,13 +3839,12 @@ case "$1" in
 esac

       In a nutshell, the script takes these actions when a
-      node becomes master /
-      primary:
+      node becomes master:

-      Promotes the HAST pools to
-      primary on a given node.
+      Promotes the HAST pool to
+      primary on the other node.
@@ -3848,41 +3853,40 @@ esac

-      Mounts the pools at an appropriate place.
+      Mounts the pool.

-      When a node becomes backup /
-      secondary:
+      When a node becomes
+      secondary:

-      Unmounts the HAST pools.
+      Unmounts the HAST pool.

-      Degrades the HAST pools to
+      Degrades the HAST pool to
       secondary.

-      Keep in mind that this is just an example script which
+      This is just an example script which
       serves as a proof of concept.  It does not handle all
       the possible scenarios and can be extended or altered in any
-      way, for example, to start/stop required services.
+      way, for example, to start or stop required services.

-      For this example, a standard UFS file system was used.
+      For this example, a standard UFS file system was used.
       To reduce the time needed for recovery, a journal-enabled
-      UFS or ZFS file system can be used instead.
+      UFS or ZFS file system can be used instead.

       More detailed information with additional examples can
-      be found in the HAST Wiki
-      page.
+      be found at http://wiki.FreeBSD.org/HAST.

@@ -3893,22 +3897,21 @@ esac
       issues.  However, as with any other software product,
       there may be times when it does not work as supposed.  The
       sources of the problems may be different, but the rule of
       thumb is to
-      ensure that the time is synchronized between all nodes of the
+      ensure that the time is synchronized between the nodes of the
       cluster.

-      When troubleshooting HAST problems, the
+      When troubleshooting HAST, the
       debugging level of &man.hastd.8; should be increased by
-      starting &man.hastd.8; with -d.  This
+      starting hastd with -d.  This
       argument may be specified multiple times to further increase
-      the debugging level.  A lot of useful information may be
-      obtained this way.  Consider also using
-      -F, which starts &man.hastd.8; in the
+      the debugging level.  Consider also using
+      -F, which starts hastd in the
       foreground.
       Recovering from the Split-brain Condition

-      Split-brain is when the nodes of the
+      Split-brain occurs when the nodes of the
       cluster are unable to communicate with each other, and both
       are configured as primary.  This is a dangerous condition
       because it allows both nodes to make incompatible changes to

@@ -3916,15 +3919,15 @@ esac
       system administrator.

       The administrator must decide which node has more
-      important changes (or merge them manually) and let
+      important changes or merge them manually.  Then, let
       HAST perform full synchronization of
       the node which has the broken data.  To do this, issue these
       commands on the node which needs to be
       resynchronized:

-&prompt.root; hastctl role init <resource>
-&prompt.root; hastctl create <resource>
-&prompt.root; hastctl role secondary <resource>
+&prompt.root; hastctl role init test
+&prompt.root; hastctl create test
+&prompt.root; hastctl role secondary test
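The failover actions the diff summarizes (promote and mount when a node becomes master, unmount and demote when it becomes secondary) can be sketched as a small /bin/sh script. This is a simplified illustration, not the handbook's actual carp-hast-switch script: the resource name test matches the example above, but the /hast/test mount point and the run() dry-run stub are assumptions added here so the dispatch logic can be exercised without a real HAST cluster.

```shell
#!/bin/sh
# Simplified sketch of carp-hast-switch dispatch logic.
# The resource name "test" follows the example; the /hast/<resource>
# mount point is assumed. run() only prints each command (dry run);
# on a real node, replace the echo with direct execution.

resources="test"

run() {
    # Print the command instead of executing it.
    echo "$@"
}

become_master() {
    for res in $resources; do
        run hastctl role primary "$res"
        # Check the file system before mounting; UFS is assumed here.
        run fsck -p -y -t ufs "/dev/hast/$res"
        run mount "/dev/hast/$res" "/hast/$res"
    done
}

become_slave() {
    for res in $resources; do
        run umount -f "/hast/$res"
        run hastctl role secondary "$res"
    done
}

case "${1:-}" in
master) become_master ;;
slave)  become_slave ;;
*)      echo "usage: $0 master|slave" >&2 ;;
esac
```

On a real node, devd would invoke such a script with master or slave, as configured in the devd.conf fragment shown in the diff.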