Date: Sat, 20 Aug 2011 21:56:06 -0400 (EDT)
From: Benjamin Kaduk <kaduk@MIT.EDU>
To: Warren Block <wblock@wonkity.com>
Cc: freebsd-doc@freebsd.org, freebsd-gnats-submit@freebsd.org
Subject: Re: docs/159897: [patch] improve HAST section of Handbook
Message-ID: <alpine.GSO.1.10.1108202132270.7526@multics.mit.edu>
In-Reply-To: <201108182253.p7IMr0us086588@red.freebsd.org>
References: <201108182253.p7IMr0us086588@red.freebsd.org>
On Thu, 18 Aug 2011, Warren Block wrote:

> FreeBSD lightning 8.2-STABLE FreeBSD 8.2-STABLE #0: Wed Aug 17 19:31:39 MDT 2011 root@lightning:/usr/obj/usr/src/sys/LIGHTNING i386
>> Description:
> Edit and polish the HAST section of the Handbook with an eye to conciseness and clarity.

"concision" is three fewer characters :)  (though OED has conciseness as older)

>> How-To-Repeat:
>
>> Fix:
> Apply patch.
>
> Patch attached with submission follows:
>
> --- en_US.ISO8859-1/books/handbook/disks/chapter.sgml.orig 2011-08-18 15:22:56.000000000 -0600
> +++ en_US.ISO8859-1/books/handbook/disks/chapter.sgml 2011-08-18 16:35:46.000000000 -0600
> @@ -4038,7 +4038,7 @@
> <sect2>
> <title>Synopsis</title>
>
> - <para>High-availability is one of the main requirements in serious
> + <para>High availability is one of the main requirements in serious
> business applications and highly-available storage is a key
> component in such environments. Highly Available STorage, or
> <acronym>HAST<remark role="acronym">Highly Available
> @@ -4109,7 +4109,7 @@
> drives.</para>
> </listitem>
> <listitem>
> - <para>File system agnostic, thus allowing to use any file
> + <para>File system agnostic, thus allowing use of any file

I think "allowing the use" is better here.

> system supported by &os;.</para>
> </listitem>
> <listitem>
> @@ -4152,7 +4152,7 @@
> total.</para>
> </note>
>
> - <para>Since the <acronym>HAST</acronym> works in
> + <para>Since <acronym>HAST</acronym> works in

"in a primary-secondary"

> primary-secondary configuration, it allows only one of the
> cluster nodes to be active at any given time. The
> <literal>primary</literal> node, also called
> @@ -4334,51 +4334,51 @@
> available.</para>
> </note>
>
> - <para>HAST is not responsible for selecting node's role
> - (<literal>primary</literal> or <literal>secondary</literal>).
> - Node's role has to be configured by an administrator or other
> - software like <application>Heartbeat</application> using the
> + <para>A HAST node's role (<literal>primary</literal> or
> + <literal>secondary</literal>) is selected by an administrator
> + or other
> + software like <application>Heartbeat</application> using the
> &man.hastctl.8; utility. Move to the primary node
> (<literal><replaceable>hasta</replaceable></literal>) and
> - issue the following command:</para>
> + issue this command:</para>
>
> <screen>&prompt.root; <userinput>hastctl role primary test</userinput></screen>
>
> - <para>Similarly, run the following command on the secondary node
> + <para>Similarly, run this command on the secondary node
> (<literal><replaceable>hastb</replaceable></literal>):</para>
>
> <screen>&prompt.root; <userinput>hastctl role secondary test</userinput></screen>
>
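(Side note for anyone reading this in the archive: the resource "test" and the
hosts "hasta"/"hastb" used in these commands are defined earlier in the section
in /etc/hast.conf, which this patch does not touch.  From memory that file is
shaped roughly like the following -- the disk device and peer addresses here
are only placeholders, not the Handbook's actual values:

	resource test {
		on hasta {
			local /dev/ad6
			remote 172.16.0.2
		}
		on hastb {
			local /dev/ad6
			remote 172.16.0.1
		}
	}

The same file goes on both nodes, and hastd needs to be running on a node
before the hastctl role commands above will work there.)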
> <caution>
> - <para>It may happen that both of the nodes are not able to
> - communicate with each other and both are configured as
> - primary nodes; the consequence of this condition is called
> - <literal>split-brain</literal>. In order to troubleshoot
> + <para>When the nodes are unable to
> + communicate with each other, and both are configured as
> + primary nodes, the condition is called
> + <literal>split-brain</literal>. To troubleshoot
> this situation, follow the steps described in <xref
> linkend="disks-hast-sb">.</para>
> </caution>
>
> - <para>It is possible to verify the result with the
> + <para>Verify the result with the
> &man.hastctl.8; utility on each node:</para>
>
> <screen>&prompt.root; <userinput>hastctl status test</userinput></screen>
>
> - <para>The important text is the <literal>status</literal> line
> - from its output and it should say <literal>complete</literal>
> + <para>The important text is the <literal>status</literal> line,
> + which should say <literal>complete</literal>
> on each of the nodes. If it says <literal>degraded</literal>,
> something went wrong. At this point, the synchronization
> between the nodes has already started. The synchronization
> - completes when the <command>hastctl status</command> command
> + completes when <command>hastctl status</command>
> reports 0 bytes of <literal>dirty</literal> extents.</para>
>
>
> - <para>The last step is to create a filesystem on the
> + <para>The next step is to create a filesystem on the
> <devicename>/dev/hast/<replaceable>test</replaceable></devicename>
> - GEOM provider and mount it. This has to be done on the
> - <literal>primary</literal> node (as the
> + GEOM provider and mount it. This must be done on the
> + <literal>primary</literal> node, as
> <filename>/dev/hast/<replaceable>test</replaceable></filename>
> - appears only on the <literal>primary</literal> node), and
> - it can take a few minutes depending on the size of the hard
> + appears only on the <literal>primary</literal> node.
> + It can take a few minutes depending on the size of the hard

The pronoun "it" may be confusing, here -- I would probably just say
"Creating the filesystem".

> drive:</para>
>
> <screen>&prompt.root; <userinput>newfs -U /dev/hast/test</userinput>
> @@ -4387,9 +4387,9 @@
>
> <para>Once the <acronym>HAST</acronym> framework is configured
> properly, the final step is to make sure that
> - <acronym>HAST</acronym> is started during the system boot time
> - automatically. The following line should be added to the
> - <filename>/etc/rc.conf</filename> file:</para>
> + <acronym>HAST</acronym> is started automatically during the system
> + boot. This line is added to
> + <filename>/etc/rc.conf</filename>:</para>

"This line is added" is a pretty unusual grammatical construct for what is
attempting to be conveyed.  "To do so, add this line to" I think says things
more clearly.

>
> <programlisting>hastd_enable="YES"</programlisting>
>
> @@ -4397,26 +4397,25 @@
> <title>Failover Configuration</title>
>
> <para>The goal of this example is to build a robust storage
> - system which is resistant from the failures of any given node.
> - The key task here is to remedy a scenario when a
> - <literal>primary</literal> node of the cluster fails. Should
> - it happen, the <literal>secondary</literal> node is there to
> + system which is resistant to failures of any given node.

The plural is not consistent between "failures" and "node".  "resistant to
the failure of any given node" is I think the conventional way to say this
(note that the original also had the incorrect plural "failures").

> + The scenario is that a
> + <literal>primary</literal> node of the cluster fails. If
> + this happens, the <literal>secondary</literal> node is there to
> take over seamlessly, check and mount the file system, and
> continue to work without missing a single bit of data.</para>
>
> - <para>In order to accomplish this task, it will be required to
> - utilize another feature available under &os; which provides
> + <para>To accomplish this task, another &os; feature provides
> for automatic failover on the IP layer —
> - <acronym>CARP</acronym>. <acronym>CARP</acronym> stands for
> - Common Address Redundancy Protocol and allows multiple hosts
> + <acronym>CARP</acronym>. <acronym>CARP</acronym> (Common Address
> + Redundancy Protocol) allows multiple hosts
> on the same network segment to share an IP address. Set up
> <acronym>CARP</acronym> on both nodes of the cluster according
> to the documentation available in <xref linkend="carp">.
> - After completing this task, each node should have its own
> + After setup, each node will have its own
> <devicename>carp0</devicename> interface with a shared IP
> address <replaceable>172.16.0.254</replaceable>.
> - Obviously, the primary <acronym>HAST</acronym> node of the
> - cluster has to be the master <acronym>CARP</acronym>
> + The primary <acronym>HAST</acronym> node of the
> + cluster must be the master <acronym>CARP</acronym>
> node.</para>
>
> <para>The <acronym>HAST</acronym> pool created in the previous
> @@ -4430,17 +4429,17 @@
>
> <para>In the event of <acronym>CARP</acronym> interfaces going
> up or down, the &os; operating system generates a &man.devd.8;
> - event, which makes it possible to watch for the state changes
> + event, making it possible to watch for the state changes
> on the <acronym>CARP</acronym> interfaces. A state change on
> the <acronym>CARP</acronym> interface is an indication that
> - one of the nodes failed or came back online. In such a case,
> - it is possible to run a particular script which will
> + one of the nodes failed or came back online. These state change
> + events make it possible to run a script which will
> automatically handle the failover.</para>

I think "handle HAST failover" would be an improvement.

>
> - <para>To be able to catch the state changes on the
> - <acronym>CARP</acronym> interfaces, the following
> - configuration has to be added to the
> - <filename>/etc/devd.conf</filename> file on each node:</para>
> + <para>To be able to catch state changes on the
> + <acronym>CARP</acronym> interfaces, add this
> + configuration to
> + <filename>/etc/devd.conf</filename> on each node:</para>
>
> <programlisting>notify 30 {
> match "system" "IFNET";
> @@ -4456,12 +4455,12 @@
> action "/usr/local/sbin/carp-hast-switch slave";
> };</programlisting>
>
> - <para>To put the new configuration into effect, run the
> - following command on both nodes:</para>
> + <para>Restart &man.devd.8; on both nodes o put the new configuration

"to"

> + into effect:</para>
>
> <screen>&prompt.root; <userinput>/etc/rc.d/devd restart</userinput></screen>
>
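(The /usr/local/sbin/carp-hast-switch script that these devd.conf rules call
is listed in full a little further down in the section and is not touched by
this patch.  For readers skimming the archive, its core amounts to something
like this -- a stripped-down sketch only, assuming a single resource named
"test" mounted on /hast/test; the Handbook's version adds logging, a settle
delay, and a loop over multiple resources:

	#!/bin/sh
	# Simplified illustration of the CARP/HAST failover helper.
	resource="test"
	mountpoint="/hast/test"

	case "$1" in
	master)
		# This node just became CARP master: take over the HAST
		# resource, check the file system, and mount it.
		hastctl role primary "${resource}" || exit 1
		fsck -p -y -t ufs "/dev/hast/${resource}"
		mount "/dev/hast/${resource}" "${mountpoint}"
		;;
	slave)
		# This node lost CARP master status: unmount and hand the
		# resource back as secondary.
		umount -f "${mountpoint}"
		hastctl role secondary "${resource}"
		;;
	esac

devd invokes it with "slave" (or, in the companion rule not shown in this
hunk, "master") as the first argument.)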
> - <para>In the event that the <devicename>carp0</devicename>
> + <para>When the <devicename>carp0</devicename>
> interface goes up or down (i.e. the interface state changes),
> the system generates a notification, allowing the &man.devd.8;
> subsystem to run an arbitrary script, in this case
> @@ -4615,41 +4614,40 @@
> <sect3>
> <title>General Troubleshooting Tips</title>
>
> - <para><acronym>HAST</acronym> should be generally working
> - without any issues, however as with any other software
> + <para><acronym>HAST</acronym> should generally work
> + without issues. However, as with any other software
> product, there may be times when it does not work as
> supposed. The sources of the problems may be different, but
> the rule of thumb is to ensure that the time is synchronized
> between all nodes of the cluster.</para>
>
> - <para>The debugging level of the &man.hastd.8; should be
> - increased when troubleshooting <acronym>HAST</acronym>
> - problems. This can be accomplished by starting the
> + <para>When troubleshooting <acronym>HAST</acronym> problems,
> + the debugging level of &man.hastd.8; should be increased
> + by starting the
> &man.hastd.8; daemon with the <literal>-d</literal>
> - argument. Note, that this argument may be specified
> + argument. Note that this argument may be specified
> multiple times to further increase the debugging level. A
> - lot of useful information may be obtained this way. It
> - should be also considered to use <literal>-F</literal>
> - argument, which will start the &man.hastd.8; daemon in
> + lot of useful information may be obtained this way. Consider
> + also using the <literal>-F</literal>
> + argument, which starts the &man.hastd.8; daemon in the
> foreground.</para>
> </sect3>
>
> <sect3 id="disks-hast-sb">
> <title>Recovering from the Split-brain Condition</title>
>
> - <para>The consequence of a situation when both nodes of the
> - cluster are not able to communicate with each other and both
> - are configured as primary nodes is called
> - <literal>split-brain</literal>. This is a dangerous
> + <para><literal>Split-brain</literal> is when the nodes of the
> + cluster are unable to communicate with each other, and both
> + are configured as primary. This is a dangerous
> condition because it allows both nodes to make incompatible
> - changes to the data. This situation has to be handled by
> - the system administrator manually.</para>
> + changes to the data. This problem must be corrected
> + manually by the system administrator.</para>
>
> - <para>In order to fix this situation the administrator has to
> + <para>The administrator must
> decide which node has more important changes (or merge them
> - manually) and let the <acronym>HAST</acronym> perform
> + manually) and let <acronym>HAST</acronym> perform
> the full synchronization of the node which has the broken

Just "full synchronization", I think.

Thanks for spotting these grammar rough edges and putting together a patch!

-Ben Kaduk

> - data. To do this, issue the following commands on the node
> + data. To do this, issue these commands on the node
> which needs to be resynchronized:</para>
>
> <screen>&prompt.root; <userinput>hastctl role init <resource></userinput>
>
>
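(One last aside, since the quoted hunk trails off after the first command: as
far as I recall, the recovery sequence that subsection documents is run on the
node whose data is to be discarded and looks roughly like

	# hastctl role init test
	# hastctl create test
	# hastctl role secondary test

with "test" replaced by the actual resource name; the node then resynchronizes
its data from the current primary.)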