Date: Sat, 20 Aug 2011 21:56:06 -0400 (EDT)
From: Benjamin Kaduk
To: Warren Block
Cc: freebsd-doc@freebsd.org, freebsd-gnats-submit@freebsd.org
Subject: Re: docs/159897: [patch] improve HAST section of Handbook

On Thu, 18 Aug 2011, Warren Block wrote:

> FreeBSD lightning 8.2-STABLE FreeBSD 8.2-STABLE #0: Wed Aug 17 19:31:39 MDT 2011 root@lightning:/usr/obj/usr/src/sys/LIGHTNING i386
>> Description:
> Edit and polish the HAST section of the Handbook with an eye to conciseness and clarity.

"concision" is three fewer characters :)  (though OED has conciseness as older)

>> How-To-Repeat:
>
>> Fix:
> Apply patch.
>
> Patch attached with submission follows:
>
> --- en_US.ISO8859-1/books/handbook/disks/chapter.sgml.orig	2011-08-18 15:22:56.000000000 -0600
> +++ en_US.ISO8859-1/books/handbook/disks/chapter.sgml	2011-08-18 16:35:46.000000000 -0600
> @@ -4038,7 +4038,7 @@
>
>       Synopsis
>
> -     High-availability is one of the main requirements in serious
> +     High availability is one of the main requirements in serious
>       business applications and highly-available storage is a key
>       component in such environments.  Highly Available STorage, or
>       HASTHighly Available
> @@ -4109,7 +4109,7 @@
>       drives.
>
>
> -     File system agnostic, thus allowing to use any file
> +     File system agnostic, thus allowing use of any file

I think "allowing the use" is better here.

>       system supported by &os;.
>
>
> @@ -4152,7 +4152,7 @@
>       total.
>
>
> -     Since the HAST works in
> +     Since HAST works in

"in a primary-secondary"

>       primary-secondary configuration, it allows only one of the
>       cluster nodes to be active at any given time.  The
>       primary node, also called
> @@ -4334,51 +4334,51 @@
>       available.
>
>
> -     HAST is not responsible for selecting node's role
> -     (primary or secondary).
> -     Node's role has to be configured by an administrator or other
> -     software like Heartbeat using the
> +     A HAST node's role (primary or
> +     secondary) is selected by an administrator
> +     or other
> +     software like Heartbeat using the
>       &man.hastctl.8; utility.  Move to the primary node
>       (hasta) and
> -     issue the following command:
> +     issue this command:
>
> &prompt.root; hastctl role primary test
>
> -     Similarly, run the following command on the secondary node
> +     Similarly, run this command on the secondary node
>       (hastb):
>
> &prompt.root; hastctl role secondary test
>
>
> -     It may happen that both of the nodes are not able to
> -     communicate with each other and both are configured as
> -     primary nodes; the consequence of this condition is called
> -     split-brain.  In order to troubleshoot
> +     When the nodes are unable to
> +     communicate with each other, and both are configured as
> +     primary nodes, the condition is called
> +     split-brain.  To troubleshoot
>       this situation, follow the steps described in linkend="disks-hast-sb">.
>
>
> -     It is possible to verify the result with the
> +     Verify the result with the
>       &man.hastctl.8; utility on each node:
>
> &prompt.root; hastctl status test
>
> -     The important text is the status line
> -     from its output and it should say complete
> +     The important text is the status line,
> +     which should say complete
>       on each of the nodes.  If it says degraded,
>       something went wrong.  At this point, the synchronization
>       between the nodes has already started.  The synchronization
> -     completes when the hastctl status command
> +     completes when hastctl status
>       reports 0 bytes of dirty extents.
>
>
> -     The last step is to create a filesystem on the
> +     The next step is to create a filesystem on the
>       /dev/hast/test
> -     GEOM provider and mount it.  This has to be done on the
> -     primary node (as the
> +     GEOM provider and mount it.  This must be done on the
> +     primary node, as
>       /dev/hast/test
> -     appears only on the primary node), and
> -     it can take a few minutes depending on the size of the hard
> +     appears only on the primary node.
> +     It can take a few minutes depending on the size of the hard

The pronoun "it" may be confusing, here -- I would probably just say
"Creating the filesystem".
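
For anyone following along, the whole bring-up described in this part of
the section boils down to a handful of commands.  This is only a sketch:
the resource name "test" and the hosts hasta/hastb come from the quoted
example, and the mount point /hast is an arbitrary choice of mine, not
something the section prescribes.

    # on hasta, the intended primary
    hastctl role primary test

    # on hastb, the intended secondary
    hastctl role secondary test

    # on each node, wait for "complete" status and
    # 0 bytes of dirty extents
    hastctl status test

    # on the primary only, once /dev/hast/test has appeared
    newfs -U /dev/hast/test
    mkdir /hast
    mount /dev/hast/test /hast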
>       drive:
>
> &prompt.root; newfs -U /dev/hast/test
>
> @@ -4387,9 +4387,9 @@
>
>       Once the HAST framework is configured
>       properly, the final step is to make sure that
> -     HAST is started during the system boot time
> -     automatically.  The following line should be added to the
> -     /etc/rc.conf file:
> +     HAST is started automatically during the system
> +     boot.  This line is added to
> +     /etc/rc.conf:

"This line is added" is a pretty unusual grammatical construct for what is
attempting to be conveyed.  "To do so, add this line to" I think says
things more clearly.

>
> hastd_enable="YES"
>
> @@ -4397,26 +4397,25 @@
>       Failover Configuration
>
>       The goal of this example is to build a robust storage
> -     system which is resistant from the failures of any given node.
> -     The key task here is to remedy a scenario when a
> -     primary node of the cluster fails.  Should
> -     it happen, the secondary node is there to
> +     system which is resistant to failures of any given node.

The plural is not consistent between "failures" and "node".  "resistant to
the failure of any given node" is I think the conventional way to say this
(note that the original also had the incorrect plural "failures").

> +     The scenario is that a
> +     primary node of the cluster fails.  If
> +     this happens, the secondary node is there to
>       take over seamlessly, check and mount the file system, and
>       continue to work without missing a single bit of data.
>
> -     In order to accomplish this task, it will be required to
> -     utilize another feature available under &os; which provides
> +     To accomplish this task, another &os; feature provides
>       for automatic failover on the IP layer —
> -     CARP.  CARP stands for
> -     Common Address Redundancy Protocol and allows multiple hosts
> +     CARP.  CARP (Common Address
> +     Redundancy Protocol) allows multiple hosts
>       on the same network segment to share an IP address.  Set up
>       CARP on both nodes of the cluster according
>       to the documentation available in .
> -     After completing this task, each node should have its own
> +     After setup, each node will have its own
>       carp0 interface with a shared IP
>       address 172.16.0.254.
> -     Obviously, the primary HAST node of the
> -     cluster has to be the master CARP
> +     The primary HAST node of the
> +     cluster must be the master CARP
>       node.
>
>       The HAST pool created in the previous
> @@ -4430,17 +4429,17 @@
>
>       In the event of CARP interfaces going
>       up or down, the &os; operating system generates a &man.devd.8;
> -     event, which makes it possible to watch for the state changes
> +     event, making it possible to watch for the state changes
>       on the CARP interfaces.  A state change on
>       the CARP interface is an indication that
> -     one of the nodes failed or came back online.  In such a case,
> -     it is possible to run a particular script which will
> +     one of the nodes failed or came back online.  These state change
> +     events make it possible to run a script which will
>       automatically handle the failover.

I think "handle HAST failover" would be an improvement.
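
Since the section defers the CARP setup to another chapter, it may be
worth noting what that looks like in practice.  On 8.x each node would
carry roughly the following in /etc/rc.conf -- treat this as a sketch,
since the vhid, password, and netmask are placeholder values of mine and
only the shared address 172.16.0.254 comes from the quoted text:

    cloned_interfaces="carp0"
    # master node; the backup node would add "advskew 100" so it
    # loses the CARP election while the master is alive
    ifconfig_carp0="vhid 1 pass mysecret 172.16.0.254/24"
    hastd_enable="YES"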
>
> -     To be able to catch the state changes on the
> -     CARP interfaces, the following
> -     configuration has to be added to the
> -     /etc/devd.conf file on each node:
> +     To be able to catch state changes on the
> +     CARP interfaces, add this
> +     configuration to
> +     /etc/devd.conf on each node:
>
> notify 30 {
>         match "system" "IFNET";
> @@ -4456,12 +4455,12 @@
>         action "/usr/local/sbin/carp-hast-switch slave";
> };
>
> -     To put the new configuration into effect, run the
> -     following command on both nodes:
> +     Restart &man.devd.8; on both nodes o put the new configuration

"to"

> +     into effect:
>
> &prompt.root; /etc/rc.d/devd restart
>
> -     In the event that the carp0
> +     When the carp0
>       interface goes up or down (i.e. the interface state changes),
>       the system generates a notification, allowing the &man.devd.8;
>       subsystem to run an arbitrary script, in this case
> @@ -4615,41 +4614,40 @@
>
>       General Troubleshooting Tips
>
> -     HAST should be generally working
> -     without any issues, however as with any other software
> +     HAST should generally work
> +     without issues.  However, as with any other software
>       product, there may be times when it does not work as
>       supposed.  The sources of the problems may be different, but
>       the rule of thumb is to ensure that the time is synchronized
>       between all nodes of the cluster.
>
> -     The debugging level of the &man.hastd.8; should be
> -     increased when troubleshooting HAST
> -     problems.  This can be accomplished by starting the
> +     When troubleshooting HAST problems,
> +     the debugging level of &man.hastd.8; should be increased
> +     by starting the
>       &man.hastd.8; daemon with the -d
> -     argument.  Note, that this argument may be specified
> +     argument.  Note that this argument may be specified
>       multiple times to further increase the debugging level.  A
> -     lot of useful information may be obtained this way.  It
> -     should be also considered to use -F
> -     argument, which will start the &man.hastd.8; daemon in
> +     lot of useful information may be obtained this way.  Consider
> +     also using the -F
> +     argument, which starts the &man.hastd.8; daemon in the
>       foreground.
>
>
>       Recovering from the Split-brain Condition
>
> -     The consequence of a situation when both nodes of the
> -     cluster are not able to communicate with each other and both
> -     are configured as primary nodes is called
> -     split-brain.  This is a dangerous
> +     Split-brain is when the nodes of the
> +     cluster are unable to communicate with each other, and both
> +     are configured as primary.  This is a dangerous
>       condition because it allows both nodes to make incompatible
> -     changes to the data.  This situation has to be handled by
> -     the system administrator manually.
> +     changes to the data.  This problem must be corrected
> +     manually by the system administrator.
>
> -     In order to fix this situation the administrator has to
> +     The administrator must
>       decide which node has more important changes (or merge them
> -     manually) and let the HAST perform
> +     manually) and let HAST perform
>       the full synchronization of the node which has the broken

Just "full synchronization", I think.

Thanks for spotting these grammar rough edges and putting together a patch!

-Ben Kaduk

> -     data.  To do this, issue the following commands on the node
> +     data.  To do this, issue these commands on the node
>       which needs to be resynchronized:
>
> &prompt.root; hastctl role init <resource>
>
>
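
P.S. The quoted hunk cuts off after the first command, but if I remember
the section correctly, the full recovery sequence on the node whose data
is to be thrown away is something like the following (a sketch from
memory, not the authoritative text, using the resource name "test" from
the earlier example):

    hastctl role init test
    hastctl create test
    hastctl role secondary test

After which HAST performs the full synchronization from the node that
kept the good data.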