FreeBSD Mail Archives

Date:      Sun, 8 Sep 2019 20:08:15 +0000 (UTC)
From:      Benedict Reuschling <bcr@FreeBSD.org>
To:        doc-committers@freebsd.org, svn-doc-all@freebsd.org, svn-doc-head@freebsd.org
Subject:   svn commit: r53386 - head/en_US.ISO8859-1/books/developers-handbook/x86
Message-ID:  <201909082008.x88K8FBD016322@repo.freebsd.org>

Author: bcr
Date: Sun Sep  8 20:08:15 2019
New Revision: 53386
URL: https://svnweb.freebsd.org/changeset/doc/53386

Log:
  Mass cleanup of textproc/igor warnings including:
  - use two spaces at sentence start
  - space before content
  - wrap long line
  - start content on same line
  - straggling <tag>
  - put listing on same line
  - add blank line after <tag> on previous line
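
  The first item, two spaces at sentence start, is the convention the
  hunks in this commit apply (for example, "programs. You" becomes
  "programs.  You").  A minimal Python sketch of such a check follows.
  The function names and the regular expression are hypothetical and
  are not taken from textproc/igor.  The heuristic also misfires on
  abbreviations such as "e.g." followed by a capitalized word.

```python
import re

# Hypothetical pattern (not igor's): a sentence-ending '.', '?' or '!'
# followed by exactly one space and then an uppercase letter. The
# lookahead is not consumed, so the uppercase letter is preserved.
SINGLE_SPACE = re.compile(r"([.?!]) (?=[A-Z])")


def find_single_spaces(line: str) -> list[int]:
    """Return the offsets of punctuation followed by only one space."""
    return [m.start() for m in SINGLE_SPACE.finditer(line)]


def fix_sentence_spacing(line: str) -> str:
    """Widen each single-spaced sentence break to two spaces."""
    return SINGLE_SPACE.sub(r"\1  ", line)


if __name__ == "__main__":
    old = "once and can use it for all your programs. You can even let"
    print(find_single_spaces(old))
    print(fix_sentence_spacing(old))
```

  Lines that already use two spaces are left unchanged, because the
  lookahead then sees a second space rather than an uppercase letter.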

Modified:
  head/en_US.ISO8859-1/books/developers-handbook/x86/chapter.xml

Modified: head/en_US.ISO8859-1/books/developers-handbook/x86/chapter.xml
==============================================================================
--- head/en_US.ISO8859-1/books/developers-handbook/x86/chapter.xml	Sun Sep  8 19:40:52 2019	(r53385)
+++ head/en_US.ISO8859-1/books/developers-handbook/x86/chapter.xml	Sun Sep  8 20:08:15 2019	(r53386)
@@ -532,16 +532,16 @@ sys.err:
     <para>The library approach may seem inconvenient at first because
       it requires you to produce a separate file your code depends on.
       But it has many advantages: For one, you only need to write it
-      once and can use it for all your programs. You can even let
+      once and can use it for all your programs.  You can even let
       other assembly language programmers use it, or perhaps use one
-      written by someone else. But perhaps the greatest advantage of
+      written by someone else.  But perhaps the greatest advantage of
       the library is that your code can be ported to other systems,
       even by other programmers, by simply writing a new library
       without any changes to your code.</para>
 
     <para>If you do not like the idea of having a library, you can at
       least place all your system calls in a separate assembly
-      language file and link it with your main program. Here, again,
+      language file and link it with your main program.  Here, again,
       all porters have to do is create a new object file to link with
       your main program.</para>
   </sect2>
@@ -554,7 +554,7 @@ sys.err:
       include in your code.</para>
 
     <para>Porters of your software will simply write a new include
-      file. No library or external object file is necessary, yet your
+      file.  No library or external object file is necessary, yet your
       code is portable without any need to edit the code.</para>
 
     <note>
@@ -651,111 +651,100 @@ access.the.bsd.kernel:
 
   <para>Lines 3-5 are the data: Line 3 starts the data
     section/segment.  Line 4 contains the string "Hello, World!"
-    followed by a new line (<constant>0Ah</constant>). Line 5 creates
+    followed by a new line (<constant>0Ah</constant>).  Line 5 creates
     a constant that contains the length of the string from line 4 in
     bytes.</para>
 
-  <para> Lines 7-16 contain the code. Note that FreeBSD uses the
+  <para>Lines 7-16 contain the code.  Note that FreeBSD uses the
     <emphasis>elf</emphasis> file format for its executables, which
     requires every program to start at the point labeled
     <varname>_start</varname> (or, more precisely, the linker expects
-    that). This label has to be global.</para>
+    that).  This label has to be global.</para>
 
   <para>Lines 10-13 ask the system to write <varname>hbytes</varname>
     bytes of the <varname>hello</varname> string to
     <varname>stdout</varname>.</para>
 
   <para>Lines 15-16 ask the system to end the program with the return
-    value of <constant>0</constant>. The <function
+    value of <constant>0</constant>.  The <function
       role="syscall">SYS_exit</function> syscall never returns, so the
     code ends there.</para>
 
   <note>
     <para>If you have come to &unix; from <acronym>&ms-dos;</acronym>
       assembly language background, you may be used to writing
-      directly to the video hardware. You will never have to worry
-      about this in FreeBSD, or any other flavor of &unix;. As far as
+      directly to the video hardware.  You will never have to worry
+      about this in FreeBSD, or any other flavor of &unix;.  As far as
       you are concerned, you are writing to a file known as
-      <filename>stdout</filename>. This can be the video screen, or a
+      <filename>stdout</filename>.  This can be the video screen, or a
       <application>telnet</application> terminal, or an actual file,
-      or even the input of another program. Which one it is, is for
+      or even the input of another program.  Which one it is, is for
       the system to figure out.</para>
   </note>
 
-  <sect2 xml:id="x86-assemble-1"><title>Assembling the Code</title>
+  <sect2 xml:id="x86-assemble-1">
+    <title>Assembling the Code</title>
 
-  <para>Type the code (except the line numbers) in an editor, and save
-    it in a file named <filename>hello.asm</filename>. You need
-    <application>nasm</application> to assemble it.</para>
+    <para>Type the code (except the line numbers) in an editor, and
+      save it in a file named <filename>hello.asm</filename>.  You
+      need <application>nasm</application> to assemble it.</para>
 
-    <sect3 xml:id="x86-get-nasm"><title>Installing <application>nasm</application></title>
+    <sect3 xml:id="x86-get-nasm">
+      <title>Installing <application>nasm</application></title>
 
       <para>If you do not have <application>nasm</application>,
 	type:</para>
 
-<screen>&prompt.user; <userinput>su</userinput>
+      <screen>&prompt.user; <userinput>su</userinput>
 Password:<userinput><replaceable>your root password</replaceable></userinput>
 &prompt.root; <userinput>cd /usr/ports/devel/nasm</userinput>
 &prompt.root; <userinput>make install</userinput>
 &prompt.root; <userinput>exit</userinput>
 &prompt.user;</screen>
 
-<para>
-You may type <userinput>make install clean</userinput> instead of just
-<userinput>make install</userinput> if you do not want to keep
-<application>nasm</application> source code.
-</para>
+      <para>You may type <userinput>make install clean</userinput>
+	instead of just <userinput>make install</userinput> if you do
+	not want to keep <application>nasm</application> source
+	code.</para>
 
-<para>
-Either way, FreeBSD will automatically download
-<application>nasm</application> from the Internet,
-compile it, and install it on your system.
-</para>
+      <para>Either way, FreeBSD will automatically download
+	<application>nasm</application> from the Internet, compile it,
+	and install it on your system.</para>
 
-<note>
-<para>
-If your system is not FreeBSD, you need to get
-<application>nasm</application> from its
-<link xlink:href="https://sourceforge.net/projects/nasm">home
-page</link>. You can still use it to assemble FreeBSD code.
-</para>
-</note>
+      <note>
+	<para>If your system is not FreeBSD, you need to get
+	  <application>nasm</application> from its <link
+	    xlink:href="https://sourceforge.net/projects/nasm">home
+	    page</link>.  You can still use it to assemble FreeBSD
+	  code.</para>
+      </note>
 
-<para>
-Now you can assemble, link, and run the code:
-</para>
+      <para>Now you can assemble, link, and run the code:</para>
 
-<screen>&prompt.user; <userinput>nasm -f elf hello.asm</userinput>
+      <screen>&prompt.user; <userinput>nasm -f elf hello.asm</userinput>
 &prompt.user; <userinput>ld -s -o hello hello.o</userinput>
 &prompt.user; <userinput>./hello</userinput>
 Hello, World!
 &prompt.user;</screen>
-
-</sect3>
-
-</sect2>
-
+    </sect3>
+  </sect2>
 </sect1>
 
 <sect1 xml:id="x86-unix-filters">
-<title>Writing &unix; Filters</title>
+  <title>Writing &unix; Filters</title>
 
-<para>
-A common type of &unix; application is a filter&mdash;a program
-that reads data from the <filename>stdin</filename>, processes it
-somehow, then writes the result to <filename>stdout</filename>.
-</para>
+  <para>A common type of &unix; application is a filter&mdash;a
+    program that reads data from the <filename>stdin</filename>,
+    processes it somehow, then writes the result to
+    <filename>stdout</filename>.</para>
 
-<para>
-In this chapter, we shall develop a simple filter, and
-learn how to read from <filename>stdin</filename> and write to
-<filename>stdout</filename>. This filter will convert each byte
-of its input into a hexadecimal number followed by a
-blank space.
-</para>
+  <para>In this chapter, we shall develop a simple filter, and
+    learn how to read from <filename>stdin</filename> and write to
+    <filename>stdout</filename>.  This filter will convert each byte
+    of its input into a hexadecimal number followed by a blank
+    space.</para>
 
-<programlisting>
-%include	'system.inc'
+  <programlisting>%include	'system.inc'
 
 section	.data
 hex	db	'0123456789ABCDEF'
@@ -793,102 +782,85 @@ _start:
 
 .done:
 	push	dword 0
-	sys.exit
-</programlisting>
-<para>
-In the data section we create an array called <varname>hex</varname>.
-It contains the 16 hexadecimal digits in ascending order.
-The array is followed by a buffer which we will use for
-both input and output. The first two bytes of the buffer
-are initially set to <constant>0</constant>. This is where we will write
-the two hexadecimal digits (the first byte also is
-where we will read the input). The third byte is a
-space.
-</para>
+	sys.exit</programlisting>
 
-<para>
-The code section consists of four parts: Reading the byte,
-converting it to a hexadecimal number, writing the result,
-and eventually exiting the program.
-</para>
+      <para>In the data section we create an array called
+	<varname>hex</varname>.  It contains the 16 hexadecimal digits
+	in ascending order.  The array is followed by a buffer which
+	we will use for both input and output.  The first two bytes of
+	the buffer are initially set to <constant>0</constant>.  This
+	is where we will write the two hexadecimal digits (the first
+	byte also is where we will read the input).  The third byte is
+	a space.</para>
 
-<para>
-To read the byte, we ask the system to read one byte
-from <filename>stdin</filename>, and store it in the first byte
-of the <varname>buffer</varname>. The system returns the number
-of bytes read in <varname role="register">EAX</varname>. This will be <constant>1</constant>
-while data is coming, or <constant>0</constant>, when no more input
-data is available. Therefore, we check the value of
-<varname role="register">EAX</varname>. If it is <constant>0</constant>,
-we jump to <varname>.done</varname>, otherwise we continue.
-</para>
+      <para>The code section consists of four parts: Reading the byte,
+	converting it to a hexadecimal number, writing the result, and
+	eventually exiting the program.</para>
 
-<note>
-<para>
-For simplicity sake, we are ignoring the possibility
-of an error condition at this time.
-</para>
-</note>
+      <para>To read the byte, we ask the system to read one byte from
+	<filename>stdin</filename>, and store it in the first byte of
+	the <varname>buffer</varname>.  The system returns the number
+	of bytes read in <varname role="register">EAX</varname>.  This
+	will be <constant>1</constant> while data is coming, or
+	<constant>0</constant>, when no more input data is available.
+	Therefore, we check the value of <varname
+	  role="register">EAX</varname>.  If it is
+	<constant>0</constant>, we jump to <varname>.done</varname>,
+	otherwise we continue.</para>
 
-<para>
-The hexadecimal conversion reads the byte from the
-<varname>buffer</varname> into <varname role="register">EAX</varname>, or actually just
-<varname role="register">AL</varname>, while clearing the remaining bits of
-<varname role="register">EAX</varname> to zeros. We also copy the byte to
-<varname role="register">EDX</varname> because we need to convert the upper
-four bits (nibble) separately from the lower
-four bits. We store the result in the first two
-bytes of the buffer.
-</para>
+      <note>
+	<para>For simplicity sake, we are ignoring the possibility of
+	  an error condition at this time.</para>
+      </note>
 
-<para>
-Next, we ask the system to write the three bytes
-of the buffer, i.e., the two hexadecimal digits and
-the blank space, to <filename>stdout</filename>. We then
-jump back to the beginning of the program and
-process the next byte.
-</para>
+      <para>The hexadecimal conversion reads the byte from the
+	<varname>buffer</varname> into <varname
+	  role="register">EAX</varname>, or actually just <varname
+	  role="register">AL</varname>, while clearing the remaining
+	bits of <varname role="register">EAX</varname> to zeros.  We
+	also copy the byte to <varname role="register">EDX</varname>
+	because we need to convert the upper four bits (nibble)
+	separately from the lower four bits.  We store the result in
+	the first two bytes of the buffer.</para>
 
-<para>
-Once there is no more input left, we ask the system
-to exit our program, returning a zero, which is
-the traditional value meaning the program was
-successful.
-</para>
+      <para>Next, we ask the system to write the three bytes of the
+	buffer, i.e., the two hexadecimal digits and the blank space,
+	to <filename>stdout</filename>.  We then jump back to the
+	beginning of the program and process the next byte.</para>
 
-<para>
-Go ahead, and save the code in a file named <filename>hex.asm</filename>,
-then type the following (the <userinput>^D</userinput> means press the
-control key and type <userinput>D</userinput> while holding the
-control key down):
-</para>
+      <para>Once there is no more input left, we ask the system to
+	exit our program, returning a zero, which is the traditional
+	value meaning the program was successful.</para>
 
-<screen>&prompt.user; <userinput>nasm -f elf hex.asm</userinput>
+      <para>Go ahead, and save the code in a file named
+	<filename>hex.asm</filename>, then type the following (the
+	<userinput>^D</userinput> means press the control key and type
+	<userinput>D</userinput> while holding the control key
+	down):</para>
+
+      <screen>&prompt.user; <userinput>nasm -f elf hex.asm</userinput>
 &prompt.user; <userinput>ld -s -o hex hex.o</userinput>
 &prompt.user; <userinput>./hex</userinput>
 <userinput>Hello, World!</userinput>
 48 65 6C 6C 6F 2C 20 57 6F 72 6C 64 21 0A <userinput>Here I come!</userinput>
 48 65 72 65 20 49 20 63 6F 6D 65 21 0A <userinput>^D</userinput> &prompt.user;</screen>
 
-<note>
-<para>
-If you are migrating to &unix; from <acronym>&ms-dos;</acronym>,
-you may be wondering why each line ends with <constant>0A</constant>
-instead of <constant>0D 0A</constant>.
-This is because &unix; does not use the cr/lf convention, but
-a "new line" convention, which is <constant>0A</constant> in hexadecimal.
-</para>
-</note>
+      <note>
+	<para>If you are migrating to &unix; from
+	  <acronym>&ms-dos;</acronym>, you may be wondering why each
+	  line ends with <constant>0A</constant> instead of
+	  <constant>0D 0A</constant>.  This is because &unix; does not
+	  use the cr/lf convention, but a "new line" convention, which
+	  is <constant>0A</constant> in hexadecimal.</para>
+      </note>
 
-<para>
-Can we improve this? Well, for one, it is a bit confusing because
-once we have converted a line of text, our input no longer
-starts at the beginning of the line. We can modify it to print
-a new line instead of a space after each <constant>0A</constant>:
-</para>
+      <para>Can we improve this? Well, for one, it is a bit confusing
+	because once we have converted a line of text, our input no
+	longer starts at the beginning of the line.  We can modify it
+	to print a new line instead of a space after each
+	<constant>0A</constant>:</para>
 
-<programlisting>
-%include	'system.inc'
+      <programlisting>%include	'system.inc'
 
 section	.data
 hex	db	'0123456789ABCDEF'
@@ -935,29 +907,26 @@ _start:
 
 .done:
 	push	dword 0
-	sys.exit
-</programlisting>
-<para>
-We have stored the space in the <varname role="register">CL</varname> register. We can
-do this safely because, unlike &microsoft.windows;, &unix; system
-calls do not modify the value of any register they do not use
-to return a value in.
-</para>
+	sys.exit</programlisting>
 
-<para>
-That means we only need to set <varname role="register">CL</varname> once. We have, therefore,
-added a new label <varname>.loop</varname> and jump to it for the next byte
-instead of jumping at <varname>_start</varname>. We have also added the
-<varname>.hex</varname> label so we can either have a blank space or a
-new line as the third byte of the <varname>buffer</varname>.
-</para>
+      <para>We have stored the space in the <varname
+	  role="register">CL</varname> register.  We can do this
+	safely because, unlike &microsoft.windows;, &unix; system
+	calls do not modify the value of any register they do not use
+	to return a value in.</para>
 
-<para>
-Once you have changed <filename>hex.asm</filename> to reflect
-these changes, type:
-</para>
+      <para>That means we only need to set <varname
+	  role="register">CL</varname> once.  We have, therefore,
+	added a new label <varname>.loop</varname> and jump to it for
+	the next byte instead of jumping at <varname>_start</varname>.
+	We have also added the <varname>.hex</varname> label so we can
+	either have a blank space or a new line as the third byte of
+	the <varname>buffer</varname>.</para>
 
-<screen>&prompt.user; <userinput>nasm -f elf hex.asm</userinput>
+      <para>Once you have changed <filename>hex.asm</filename> to
+	reflect these changes, type:</para>
+
+      <screen>&prompt.user; <userinput>nasm -f elf hex.asm</userinput>
 &prompt.user; <userinput>ld -s -o hex hex.o</userinput>
 &prompt.user; <userinput>./hex</userinput>
 <userinput>Hello, World!</userinput>
@@ -966,42 +935,33 @@ these changes, type:
 48 65 72 65 20 49 20 63 6F 6D 65 21 0A
 <userinput>^D</userinput> &prompt.user;</screen>
 
-<para>
-That looks better. But this code is quite inefficient! We
-are making a system call for every single byte twice (once
-to read it, another time to write the output).
-</para>
+      <para>That looks better.  But this code is quite inefficient! We
+	are making a system call for every single byte twice (once to
+	read it, another time to write the output).</para>
+    </sect1>
 
-</sect1>
+    <sect1 xml:id="x86-buffered-io">
+      <title>Buffered Input and Output</title>
 
-<sect1 xml:id="x86-buffered-io">
-<title>Buffered Input and Output</title>
+      <para>We can improve the efficiency of our code by buffering our
+	input and output.  We create an input buffer and read a whole
+	sequence of bytes at one time.  Then we fetch them one by one
+	from the buffer.</para>
 
-<para>
-We can improve the efficiency of our code by buffering our
-input and output. We create an input buffer and read a whole
-sequence of bytes at one time. Then we fetch them one by one
-from the buffer.
-</para>
+      <para>We also create an output buffer.  We store our output in
+	it until it is full.  At that time we ask the kernel to write
+	the contents of the buffer to
+	<filename>stdout</filename>.</para>
 
-<para>
-We also create an output buffer. We store our output in it until
-it is full. At that time we ask the kernel to write the contents
-of the buffer to <filename>stdout</filename>.
-</para>
+      <para>The program ends when there is no more input.  But we
+	still need to ask the kernel to write the contents of our
+	output buffer to <filename>stdout</filename> one last time,
+	otherwise some of our output would make it to the output
+	buffer, but never be sent out.  Do not forget that, or you
+	will be wondering why some of your output is missing.</para>
 
-<para>
-The program ends when there is no more input. But we still need
-to ask the kernel to write the contents of our output buffer
-to <filename>stdout</filename> one last time, otherwise some of our output
-would make it to the output buffer, but never be sent out.
-Do not forget that, or you will be wondering why some of your
-output is missing.
-</para>
+      <programlisting>%include	'system.inc'
 
-<programlisting>
-%include	'system.inc'
-
 %define	BUFSIZE	2048
 
 section	.data
@@ -1092,39 +1052,35 @@ write:
 	add	esp, byte 12
 	sub	eax, eax
 	sub	ecx, ecx	; buffer is empty now
-	ret
-</programlisting>
-<para>
-We now have a third section in the source code, named
-<varname>.bss</varname>. This section is not included in our
-executable file, and, therefore, cannot be initialized. We use
-<function role="opcode">resb</function> instead of <function role="opcode">db</function>.
-It simply reserves the requested size of uninitialized memory
-for our use.
-</para>
+	ret</programlisting>
 
-<para>
-We take advantage of the fact that the system does not modify the
-registers: We use registers for what, otherwise, would have to be
-global variables stored in the <varname>.data</varname> section. This is
-also why the &unix; convention of passing parameters to system calls
-on the stack is superior to the Microsoft convention of passing
-them in the registers: We can keep the registers for our own use.
-</para>
+      <para>We now have a third section in the source code, named
+	<varname>.bss</varname>.  This section is not included in our
+	executable file, and, therefore, cannot be initialized.  We
+	use <function role="opcode">resb</function> instead of
+	<function role="opcode">db</function>.  It simply reserves
+	the requested size of uninitialized memory for our use.</para>
 
-<para>
-We use <varname role="register">EDI</varname> and <varname role="register">ESI</varname> as pointers to the next byte
-to be read from or written to. We use <varname role="register">EBX</varname> and
-<varname role="register">ECX</varname> to keep count of the number of bytes in the
-two buffers, so we know when to dump the output to, or read more
-input from, the system.
-</para>
+      <para>We take advantage of the fact that the system does not
+	modify the registers: We use registers for what, otherwise,
+	would have to be global variables stored in the
+	<varname>.data</varname> section.  This is also why the
+	&unix; convention of passing parameters to system calls on the
+	stack is superior to the Microsoft convention of passing them
+	in the registers: We can keep the registers for our own
+	use.</para>
 
-<para>
-Let us see how it works now:
-</para>
+      <para>We use <varname role="register">EDI</varname> and
+	<varname role="register">ESI</varname> as pointers to the next
+	byte to be read from or written to.  We use <varname
+	  role="register">EBX</varname> and <varname
+	  role="register">ECX</varname> to keep count of the number of
+	bytes in the two buffers, so we know when to dump the output
+	to, or read more input from, the system.</para>
 
-<screen>&prompt.user; <userinput>nasm -f elf hex.asm</userinput>
+      <para>Let us see how it works now:</para>
+
+      <screen>&prompt.user; <userinput>nasm -f elf hex.asm</userinput>
 &prompt.user; <userinput>ld -s -o hex hex.o</userinput>
 &prompt.user; <userinput>./hex</userinput>
 <userinput>Hello, World!</userinput>
@@ -1133,17 +1089,15 @@ Let us see how it works now:
 48 65 72 65 20 49 20 63 6F 6D 65 21 0A
 <userinput>^D</userinput> &prompt.user;</screen>
 
-<para>
-Not what you expected? The program did not print the output
-until we pressed <userinput>^D</userinput>. That is easy to fix by
-inserting three lines of code to write the output every time
-we have converted a new line to <constant>0A</constant>. I have marked
-the three lines with &gt; (do not copy the &gt; in your
-<filename>hex.asm</filename>).
-</para>
+      <para>Not what you expected? The program did not print the
+	output until we pressed <userinput>^D</userinput>.  That is
+	easy to fix by inserting three lines of code to write the
+	output every time we have converted a new line to
+	<constant>0A</constant>.  I have marked the three lines with
+	&gt; (do not copy the &gt; in your
+	<filename>hex.asm</filename>).</para>
 
-<programlisting>
-%include	'system.inc'
+      <programlisting>%include	'system.inc'
 
 %define	BUFSIZE	2048
 
@@ -1238,14 +1192,11 @@ write:
 	add	esp, byte 12
 	sub	eax, eax
 	sub	ecx, ecx	; buffer is empty now
-	ret
-</programlisting>
+	ret</programlisting>
 
-<para>
-Now, let us see how it works:
-</para>
+      <para>Now, let us see how it works:</para>
 
-<screen>&prompt.user; <userinput>nasm -f elf hex.asm</userinput>
+      <screen>&prompt.user; <userinput>nasm -f elf hex.asm</userinput>
 &prompt.user; <userinput>ld -s -o hex hex.o</userinput>
 &prompt.user; <userinput>./hex</userinput>
 <userinput>Hello, World!</userinput>
@@ -1254,265 +1205,214 @@ Now, let us see how it works:
 48 65 72 65 20 49 20 63 6F 6D 65 21 0A
 <userinput>^D</userinput> &prompt.user;</screen>
 
-<para>
-Not bad for a 644-byte executable, is it!
-</para>
+      <para>Not bad for a 644-byte executable, is it!</para>
 
-<note>
-<para>
-This approach to buffered input/output still
-contains a hidden danger. I will discuss&mdash;and
-fix&mdash;it later, when I talk about the
-<link linkend="x86-buffered-dark-side">dark
-side of buffering</link>.</para>
-</note>
+      <note>
+	<para>This approach to buffered input/output still
+	  contains a hidden danger.  I will discuss&mdash;and
+	  fix&mdash;it later, when I talk about the <link
+	    linkend="x86-buffered-dark-side">dark side of
+	    buffering</link>.</para>
+      </note>
 
-<sect2 xml:id="x86-ungetc">
-<title>How to Unread a Character</title>
+      <sect2 xml:id="x86-ungetc">
+	<title>How to Unread a Character</title>
 
-<warning><para>
-This may be a somewhat advanced topic, mostly of interest to
-programmers familiar with the theory of compilers. If you wish,
-you may <link linkend="x86-command-line">skip to the next
-section</link>, and perhaps read this later.
-</para>
-</warning>
-<para>
-While our sample program does not require it, more sophisticated
-filters often need to look ahead. In other words, they may need
-to see what the next character is (or even several characters).
-If the next character is of a certain value, it is part of the
-token currently being processed. Otherwise, it is not.
-</para>
+	<warning>
+	  <para>This may be a somewhat advanced topic, mostly of
+	    interest to programmers familiar with the theory of
+	    compilers.  If you wish, you may <link
+	      linkend="x86-command-line">skip to the next
+	      section</link>, and perhaps read this later.</para>
+	</warning>
 
-<para>
-For example, you may be parsing the input stream for a textual
-string (e.g., when implementing a language compiler): If a
-character is followed by another character, or perhaps a digit,
-it is part of the token you are processing. If it is followed by
-white space, or some other value, then it is not part of the
-current token.
-</para>
+	<para>While our sample program does not require it, more
+	  sophisticated filters often need to look ahead.  In other
+	  words, they may need to see what the next character is (or
+	  even several characters).  If the next character is of a
+	  certain value, it is part of the token currently being
+	  processed.  Otherwise, it is not.</para>
 
-<para>
-This presents an interesting problem: How to return the next
-character back to the input stream, so it can be read again
-later?
-</para>
+	<para>For example, you may be parsing the input stream for a
+	  textual string (e.g., when implementing a language
+	  compiler): If a character is followed by another character,
+	  or perhaps a digit, it is part of the token you are
+	  processing.  If it is followed by white space, or some other
+	  value, then it is not part of the current token.</para>
 
-<para>
-One possible solution is to store it in a character variable,
-then set a flag. We can modify <function>getchar</function> to check the flag,
-and if it is set, fetch the byte from that variable instead of the
-input buffer, and reset the flag. But, of course, that slows us
-down.
-</para>
+	<para>This presents an interesting problem: How to return the
+	  next character back to the input stream, so it can be read
+	  again later?</para>
 
-<para>
-The C language has an <function>ungetc()</function> function, just for that
-purpose. Is there a quick way to implement it in our code?
-I would like you to scroll back up and take a look at the
-<function>getchar</function> procedure and see if you can find a nice and
-fast solution before reading the next paragraph. Then come back
-here and see my own solution.
-</para>
+	<para>One possible solution is to store it in a character
+	  variable, then set a flag.  We can modify
+	  <function>getchar</function> to check the flag, and if it is
+	  set, fetch the byte from that variable instead of the input
+	  buffer, and reset the flag.  But, of course, that slows us
+	  down.</para>
 
-<para>
-The key to returning a character back to the stream is in how
-we are getting the characters to start with:
-</para>
+	<para>The C language has an <function>ungetc()</function>
+	  function, just for that purpose.  Is there a quick way to
+	  implement it in our code?  I would like you to scroll back
+	  up and take a look at the <function>getchar</function>
+	  procedure and see if you can find a nice and fast solution
+	  before reading the next paragraph.  Then come back here and
+	  see my own solution.</para>
 
-<para>
-First we check if the buffer is empty by testing the value
-of <varname role="register">EBX</varname>. If it is zero, we call the
-<function>read</function> procedure.
-</para>
+	<para>The key to returning a character back to the stream is
+	  in how we are getting the characters to start with:</para>
 
-<para>
-If we do have a character available, we use <function role="opcode">lodsb</function>, then
-decrease the value of <varname role="register">EBX</varname>. The <function role="opcode">lodsb</function>
-instruction is effectively identical to:
-</para>
+	<para>First we check if the buffer is empty by testing the
+	  value of <varname role="register">EBX</varname>.  If it is
+	  zero, we call the <function>read</function>
+	  procedure.</para>
 
-<programlisting>
-	mov	al, [esi]
-	inc	esi
-</programlisting>
+	<para>If we do have a character available, we use <function
+	    role="opcode">lodsb</function>, then decrease the value of
+	  <varname role="register">EBX</varname>.  The <function
+	    role="opcode">lodsb</function> instruction is effectively
+	  identical to:</para>
 
-<para>
-The byte we have fetched remains in the buffer until the next
-time <function>read</function> is called. We do not know when that happens,
-but we do know it will not happen until the next call to
-<function>getchar</function>. Hence, to "return" the last-read byte back
-to the stream, all we have to do is decrease the value of
-<varname role="register">ESI</varname> and increase the value of <varname role="register">EBX</varname>:
-</para>
+	<programlisting>mov	al, [esi]
+	inc	esi</programlisting>
 
-<programlisting>
-ungetc:
+      <para>The byte we have fetched remains in the buffer until the
+	next time <function>read</function> is called.  We do not know
+	when that happens, but we do know it will not happen until the
+	next call to <function>getchar</function>.  Hence, to "return"
+	the last-read byte back to the stream, all we have to do is
+	decrease the value of <varname role="register">ESI</varname>
+	and increase the value of <varname
+	  role="register">EBX</varname>:</para>
+
+      <programlisting>ungetc:
 	dec	esi
 	inc	ebx
-	ret
-</programlisting>
+	ret</programlisting>
 
-<para>
-But, be careful! We are perfectly safe doing this if our look-ahead
-is at most one character at a time. If we are examining more than
-one upcoming character and call <function>ungetc</function> several times
-in a row, it will work most of the time, but not all the time
-(and will be tough to debug). Why?
-</para>
+      <para>But, be careful! We are perfectly safe doing this if our
+	look-ahead is at most one character at a time.  If we are
+	examining more than one upcoming character and call
+	<function>ungetc</function> several times in a row, it will
+	work most of the time, but not all the time (and will be tough
+	to debug).  Why?</para>
 
-<para>
-Because as long as <function>getchar</function> does not have to call
-<function>read</function>, all of the pre-read bytes are still in the buffer,
-and our <function>ungetc</function> works without a glitch. But the moment
-<function>getchar</function> calls <function>read</function>,
-the contents of the buffer change.
-</para>
+      <para>Because as long as <function>getchar</function> does not
+	have to call <function>read</function>, all of the pre-read
+	bytes are still in the buffer, and our
+	<function>ungetc</function> works without a glitch.  But the
+	moment <function>getchar</function> calls
+	<function>read</function>, the contents of the buffer
+	change.</para>
 
-<para>
-We can always rely on <function>ungetc</function> working properly on the last
-character we have read with <function>getchar</function>, but not on anything
-we have read before that.
-</para>
+      <para>We can always rely on <function>ungetc</function> working
+	properly on the last character we have read with
+	<function>getchar</function>, but not on anything we have read
+	before that.</para>
 
-<para>
-If your program reads more than one byte ahead, you have at least
-two choices:
-</para>
+      <para>If your program reads more than one byte ahead, you have
+	at least two choices:</para>
 
-<para>
-If possible, modify the program so it only reads one byte ahead.
-This is the simplest solution.
-</para>
+      <para>If possible, modify the program so it only reads one byte
+	ahead.  This is the simplest solution.</para>
 
-<para>
-If that option is not available, first of all determine the maximum
-number of characters your program needs to return to the input
-stream at one time. Increase that number slightly, just to be
-sure, preferably to a multiple of 16&mdash;so it aligns nicely.
-Then modify the <varname>.bss</varname> section of your code, and create
-a small "spare" buffer right before your input buffer,
-something like this:
-</para>
+      <para>If that option is not available, first of all determine
+	the maximum number of characters your program needs to return
+	to the input stream at one time.  Increase that number
+	slightly, just to be sure, preferably to a multiple of
+	16&mdash;so it aligns nicely.  Then modify the
+	<varname>.bss</varname> section of your code, and create a
+	small "spare" buffer right before your input buffer, something
+	like this:</para>
 
-<programlisting>
-section	.bss
+      <programlisting>section	.bss
 	resb	16	; or whatever the value you came up with
 ibuffer	resb	BUFSIZE
-obuffer	resb	BUFSIZE
-</programlisting>
+obuffer	resb	BUFSIZE</programlisting>
 
-<para>
-You also need to modify your <function>ungetc</function> to pass the value
-of the byte to unget in <varname role="register">AL</varname>:
-</para>
+      <para>You also need to modify your <function>ungetc</function>
+	to pass the value of the byte to unget in <varname
+	  role="register">AL</varname>:</para>
 
-<programlisting>
-ungetc:
+      <programlisting>ungetc:
 	dec	esi
 	inc	ebx
 	mov	[esi], al
-	ret
-</programlisting>
+	ret</programlisting>
 
-<para>
-With this modification, you can call <function>ungetc</function>
-up to 17 times in a row safely (the first call will still
-be within the buffer, the remaining 16 may be either within
-the buffer or within the "spare").
-</para>
+      <para>With this modification, you can call
+	<function>ungetc</function> up to 17 times in a row safely
+	(the first call will still be within the buffer, the remaining
+	16 may be either within the buffer or within the
+	"spare").</para>
+    </sect2>
+  </sect1>
 
-</sect2>
+  <sect1 xml:id="x86-command-line">
+    <title>Command Line Arguments</title>
 
-</sect1>
+    <para>Our <application>hex</application> program will be more
+      useful if it can read the names of an input and output file from
+      its command line, i.e., if it can process the command line
+      arguments.  But... Where are they?</para>
 
-<sect1 xml:id="x86-command-line"><title>Command Line Arguments</title>
+    <para>Before a &unix; system starts a program, it <function
+	role="opcode">push</function>es some data on the stack, then
+      jumps at the <varname>_start</varname> label of the program.
+      Yes, I said jumps, not calls.  That means the data can be
+      accessed by reading <varname>[esp+offset]</varname>, or by
+      simply <function role="opcode">pop</function>ping it.</para>
 
-<para>
-Our <application>hex</application> program will be more useful if it can
-read the names of an input and output file from its command
-line, i.e., if it can process the command line arguments.
-But... Where are they?
-</para>
+    <para>The value at the top of the stack contains the number of
+      command line arguments.  It is traditionally called
+      <varname>argc</varname>, for "argument count."</para>
 
-<para>
-Before a &unix; system starts a program, it <function role="opcode">push</function>es some
-data on the stack, then jumps at the <varname>_start</varname>
-label of the program. Yes, I said jumps, not calls. That means the
-data can be accessed by reading <varname>[esp+offset]</varname>,
-or by simply <function role="opcode">pop</function>ping it.
-</para>
+    <para>Command line arguments follow next, all
+      <varname>argc</varname> of them.  These are typically referred
+      to as <varname>argv</varname>, for "argument value(s)."  That
+      is, we get <varname>argv[0]</varname>,
+      <varname>argv[1]</varname>, <varname>...</varname>,
+      <varname>argv[argc-1]</varname>.  These are not the actual
+      arguments, but pointers to arguments, i.e., memory addresses of
+      the actual arguments.  The arguments themselves are
+      NUL-terminated character strings.</para>
 
-<para>
-The value at the top of the stack contains the number of
-command line arguments. It is traditionally called
-<varname>argc</varname>, for "argument count."
-</para>
+    <para>The <varname>argv</varname> list is followed by a NULL
+      pointer, which is simply a <constant>0</constant>.  There is
+      more, but this is enough for our purposes right now.</para>
 
-<para>
-Command line arguments follow next, all <varname>argc</varname> of them.
-These are typically referred to as <varname>argv</varname>, for
-"argument value(s)." That is, we get <varname>argv[0]</varname>,
-<varname>argv[1]</varname>, <varname>...</varname>,
-<varname>argv[argc-1]</varname>. These are not the actual
-arguments, but pointers to arguments, i.e., memory addresses of
-the actual arguments. The arguments themselves are
-NUL-terminated character strings.
-</para>
+    <note>
+      <para>If you have come from the <acronym>&ms-dos;</acronym>
+	programming environment, the main difference is that each
+	argument is in a separate string.  The second difference is
+	that there is no practical limit on how many arguments there
+	can be.</para>
+    </note>
 
-<para>
-The <varname>argv</varname> list is followed by a NULL pointer,
-which is simply a <constant>0</constant>. There is more, but this is
-enough for our purposes right now.
-</para>
+    <para>Armed with this knowledge, we are almost ready for the next
+      version of <filename>hex.asm</filename>.  First, however, we
+      need to add a few lines to
+      <filename>system.inc</filename>:</para>
 
-<note>
-<para>
-If you have come from the <acronym>&ms-dos;</acronym> programming
-environment, the main difference is that each argument is in
-a separate string. The second difference is that there is no
-practical limit on how many arguments there can be.
-</para>
-</note>
+    <para>First, we need to add two new entries to our list of system
+      call numbers:</para>
 
-<para>
-Armed with this knowledge, we are almost ready for the next
-version of <filename>hex.asm</filename>. First, however, we need to
-add a few lines to <filename>system.inc</filename>:
-</para>
+    <programlisting>%define	SYS_open	5
+%define	SYS_close	6</programlisting>
 
-<para>
-First, we need to add two new entries to our list of system
-call numbers:
-</para>
+    <para>Then we add two new macros at the end of the file:</para>
 
-<programlisting>
-%define	SYS_open	5
-%define	SYS_close	6
-</programlisting>
-
-<para>
-Then we add two new macros at the end of the file:
-</para>
-
-<programlisting>
-%macro	sys.open	0
+    <programlisting>%macro	sys.open	0
 	system	SYS_open
 %endmacro
 
 %macro	sys.close	0
 	system	SYS_close
-%endmacro
-</programlisting>
+%endmacro</programlisting>
 
-<para>
-Here, then, is our modified source code:
-</para>
+    <para>Here, then, is our modified source code:</para>
 
-<programlisting>
-%include	'system.inc'
+    <programlisting>%include	'system.inc'
 
 %define	BUFSIZE	2048
 
@@ -1653,234 +1553,192 @@ write:

*** DIFF OUTPUT TRUNCATED AT 1000 LINES ***
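
The handbook text quoted in this diff describes two versions of ungetc. The first pushes a byte back by decrementing ESI and incrementing EBX. The second also stores the byte, so a pushed-back byte can land in a small "spare" area placed right before ibuffer and survive a refill. What follows is a minimal C sketch of the second version, assuming a string stands in for the read system call. The names buf_init, refill, buf_getchar, buf_ungetc, storage, source and SPARE are hypothetical. BUFSIZE is set to 4 here only so that a refill happens early; the handbook uses 2048.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUFSIZE 4   /* handbook uses 2048; tiny here so a refill happens early */
#define SPARE   16  /* the "spare" pushback area in front of ibuffer */

/* storage models the .bss layout: SPARE bytes immediately followed by ibuffer. */
static unsigned char storage[SPARE + BUFSIZE];
static unsigned char *const ibuffer = storage + SPARE;

static unsigned char *in_pos;   /* stands in for ESI */
static size_t in_count;         /* stands in for EBX */

/* Hypothetical input source replacing the read system call. */
static const char *source;
static size_t source_len;

static void buf_init(const char *s)
{
    source = s;
    source_len = strlen(s);
    in_pos = ibuffer;
    in_count = 0;
}

/* Refill ibuffer from the source; returns the number of bytes loaded. */
static size_t refill(void)
{
    size_t n = source_len < BUFSIZE ? source_len : BUFSIZE;
    memcpy(ibuffer, source, n);
    source += n;
    source_len -= n;
    in_pos = ibuffer;
    in_count = n;
    return n;
}

/* Next byte (0..255), or -1 at end of input. */
static int buf_getchar(void)
{
    if (in_count == 0 && refill() == 0)   /* buffer empty: call "read" */
        return -1;
    in_count--;                           /* dec ebx */
    return *in_pos++;                     /* lodsb: mov al,[esi] / inc esi */
}

/* Push c back; storing it lets the byte survive a refill via the spare area. */
static void buf_ungetc(int c)
{
    assert(in_pos > storage);             /* never move in front of the spare area */
    in_pos--;                             /* dec esi */
    in_count++;                           /* inc ebx */
    *in_pos = (unsigned char)c;           /* mov [esi], al */
}
```

Reading six bytes from "abcdefgh" forces one refill. Pushing back f, e, d and c then places d and c in the spare area, and all four are read back in order. The assert in buf_ungetc keeps the pointer from moving in front of the spare area. Right after a refill and one read, that leaves room for exactly SPARE + 1 = 17 pushbacks, which is the limit the text gives for a 16-byte spare.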

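The Command Line Arguments section explains what the stack holds at _start: argc, then argc pointers to NUL-terminated strings, then a NULL pointer. The C sketch below walks a simulated copy of that layout. It assumes a word-sized stack modeled as an array of uintptr_t, and advancing sp stands in for a pop. walk_start_stack is a hypothetical name. The sketch does not touch a real process stack and, like the text, stops at the NULL pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Walk a simulated _start stack: argc, argv[0..argc-1], then NULL.
   sp plays the role of ESP; each *sp++ mimics a pop.
   Returns argc and stores the argument pointers in argv_out. */
static size_t walk_start_stack(const uintptr_t *sp, const char **argv_out,
                               size_t max)
{
    size_t argc = (size_t)*sp++;                      /* pop argc */
    assert(argc <= max);
    for (size_t i = 0; i < argc; i++)
        argv_out[i] = (const char *)(const void *)*sp++;  /* pop argv[i] */
    assert(*sp == 0);                                 /* terminating NULL pointer */
    return argc;
}
```

Each argv entry is only an address. The strings themselves live elsewhere, so the caller must follow the pointer to reach the characters.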
