Date: Sun, 8 Sep 2019 20:08:15 +0000 (UTC)
From: Benedict Reuschling <bcr@FreeBSD.org>
To: doc-committers@freebsd.org, svn-doc-all@freebsd.org, svn-doc-head@freebsd.org
Subject: svn commit: r53386 - head/en_US.ISO8859-1/books/developers-handbook/x86
Message-ID: <201909082008.x88K8FBD016322@repo.freebsd.org>
Author: bcr
Date: Sun Sep 8 20:08:15 2019
New Revision: 53386
URL: https://svnweb.freebsd.org/changeset/doc/53386

Log:
  Mass cleanup of textproc/igor warnings including:
  - use two spaces at sentence start
  - space before content
  - wrap long line
  - start content on same line
  - straggling <tag>
  - put listing on same line
  - add blank line after <tag> on previous line

Modified:
  head/en_US.ISO8859-1/books/developers-handbook/x86/chapter.xml

Modified: head/en_US.ISO8859-1/books/developers-handbook/x86/chapter.xml
==============================================================================
--- head/en_US.ISO8859-1/books/developers-handbook/x86/chapter.xml Sun Sep 8 19:40:52 2019 (r53385)
+++ head/en_US.ISO8859-1/books/developers-handbook/x86/chapter.xml Sun Sep 8 20:08:15 2019 (r53386)
@@ -532,16 +532,16 @@ sys.err:
<para>The library approach may seem inconvenient at first because it requires you to produce a separate file your code depends on. But it has many advantages: For one, you only need to write it - once and can use it for all your programs. You can even let + once and can use it for all your programs. You can even let other assembly language programmers use it, or perhaps use one - written by someone else. But perhaps the greatest advantage of + written by someone else. But perhaps the greatest advantage of the library is that your code can be ported to other systems, even by other programmers, by simply writing a new library without any changes to your code.</para> <para>If you do not like the idea of having a library, you can at least place all your system calls in a separate assembly - language file and link it with your main program. Here, again, + language file and link it with your main program. Here, again, all porters have to do is create a new object file to link with your main program.</para> </sect2> @@ -554,7 +554,7 @@ sys.err: include in your code.</para> <para>Porters of your software will simply write a new include - file.
No library or external object file is necessary, yet your + file. No library or external object file is necessary, yet your code is portable without any need to edit the code.</para> <note> @@ -651,111 +651,100 @@ access.the.bsd.kernel: <para>Lines 3-5 are the data: Line 3 starts the data section/segment. Line 4 contains the string "Hello, World!" - followed by a new line (<constant>0Ah</constant>). Line 5 creates + followed by a new line (<constant>0Ah</constant>). Line 5 creates a constant that contains the length of the string from line 4 in bytes.</para> - <para> Lines 7-16 contain the code. Note that FreeBSD uses the + <para>Lines 7-16 contain the code. Note that FreeBSD uses the <emphasis>elf</emphasis> file format for its executables, which requires every program to start at the point labeled <varname>_start</varname> (or, more precisely, the linker expects - that). This label has to be global.</para> + that). This label has to be global.</para> <para>Lines 10-13 ask the system to write <varname>hbytes</varname> bytes of the <varname>hello</varname> string to <varname>stdout</varname>.</para> <para>Lines 15-16 ask the system to end the program with the return - value of <constant>0</constant>. The <function + value of <constant>0</constant>. The <function role="syscall">SYS_exit</function> syscall never returns, so the code ends there.</para> <note> <para>If you have come to &unix; from <acronym>&ms-dos;</acronym> assembly language background, you may be used to writing - directly to the video hardware. You will never have to worry - about this in FreeBSD, or any other flavor of &unix;. As far as + directly to the video hardware. You will never have to worry + about this in FreeBSD, or any other flavor of &unix;. As far as you are concerned, you are writing to a file known as - <filename>stdout</filename>. This can be the video screen, or a + <filename>stdout</filename>. 
This can be the video screen, or a <application>telnet</application> terminal, or an actual file, - or even the input of another program. Which one it is, is for + or even the input of another program. Which one it is, is for the system to figure out.</para> </note> - <sect2 xml:id="x86-assemble-1"><title>Assembling the Code</title> + <sect2 xml:id="x86-assemble-1"> + <title>Assembling the Code</title> - <para>Type the code (except the line numbers) in an editor, and save - it in a file named <filename>hello.asm</filename>. You need - <application>nasm</application> to assemble it.</para> + <para>Type the code (except the line numbers) in an editor, and + save it in a file named <filename>hello.asm</filename>. You + need <application>nasm</application> to assemble it.</para> - <sect3 xml:id="x86-get-nasm"><title>Installing <application>nasm</application></title> + <sect3 xml:id="x86-get-nasm"> + <title>Installing <application>nasm</application></title> <para>If you do not have <application>nasm</application>, type:</para> -<screen>&prompt.user; <userinput>su</userinput> + <screen>&prompt.user; <userinput>su</userinput> Password:<userinput><replaceable>your root password</replaceable></userinput> &prompt.root; <userinput>cd /usr/ports/devel/nasm</userinput> &prompt.root; <userinput>make install</userinput> &prompt.root; <userinput>exit</userinput> &prompt.user;</screen> -<para> -You may type <userinput>make install clean</userinput> instead of just -<userinput>make install</userinput> if you do not want to keep -<application>nasm</application> source code. -</para> + <para>You may type <userinput>make install clean</userinput> + instead of just <userinput>make install</userinput> if you do + not want to keep <application>nasm</application> source + code.</para> -<para> -Either way, FreeBSD will automatically download -<application>nasm</application> from the Internet, -compile it, and install it on your system. 
-</para> + <para>Either way, FreeBSD will automatically download + <application>nasm</application> from the Internet, compile it, + and install it on your system.</para> -<note> -<para> -If your system is not FreeBSD, you need to get -<application>nasm</application> from its -<link xlink:href="https://sourceforge.net/projects/nasm">home -page</link>. You can still use it to assemble FreeBSD code. -</para> -</note> + <note> + <para>If your system is not FreeBSD, you need to get + <application>nasm</application> from its <link + xlink:href="https://sourceforge.net/projects/nasm">home + page</link>. You can still use it to assemble FreeBSD + code.</para> + </note> -<para> -Now you can assemble, link, and run the code: -</para> + <para>Now you can assemble, link, and run the code:</para> -<screen>&prompt.user; <userinput>nasm -f elf hello.asm</userinput> + <screen>&prompt.user; <userinput>nasm -f elf hello.asm</userinput> &prompt.user; <userinput>ld -s -o hello hello.o</userinput> &prompt.user; <userinput>./hello</userinput> Hello, World! &prompt.user;</screen> - -</sect3> - -</sect2> - + </sect3> + </sect2> </sect1> <sect1 xml:id="x86-unix-filters"> -<title>Writing &unix; Filters</title> + <title>Writing &unix; Filters</title> -<para> -A common type of &unix; application is a filter—a program -that reads data from the <filename>stdin</filename>, processes it -somehow, then writes the result to <filename>stdout</filename>. -</para> + <para>A common type of &unix; application is a filter—a + program that reads data from the <filename>stdin</filename>, + processes it somehow, then writes the result to + <filename>stdout</filename>.</para> -<para> -In this chapter, we shall develop a simple filter, and -learn how to read from <filename>stdin</filename> and write to -<filename>stdout</filename>. This filter will convert each byte -of its input into a hexadecimal number followed by a -blank space. 
-</para> + <para>In this chapter, we shall develop a simple filter, and + learn how to read from <filename>stdin</filename> and write to + <filename>stdout</filename>. This filter will convert each byte + of its input into a hexadecimal number followed by a blank + space.</para> -<programlisting> -%include 'system.inc' + <programlisting>%include 'system.inc' section .data hex db '0123456789ABCDEF' @@ -793,102 +782,85 @@ _start: .done: push dword 0 - sys.exit -</programlisting> -<para> -In the data section we create an array called <varname>hex</varname>. -It contains the 16 hexadecimal digits in ascending order. -The array is followed by a buffer which we will use for -both input and output. The first two bytes of the buffer -are initially set to <constant>0</constant>. This is where we will write -the two hexadecimal digits (the first byte also is -where we will read the input). The third byte is a -space. -</para> + sys.exit</programlisting> -<para> -The code section consists of four parts: Reading the byte, -converting it to a hexadecimal number, writing the result, -and eventually exiting the program. -</para> + <para>In the data section we create an array called + <varname>hex</varname>. It contains the 16 hexadecimal digits + in ascending order. The array is followed by a buffer which + we will use for both input and output. The first two bytes of + the buffer are initially set to <constant>0</constant>. This + is where we will write the two hexadecimal digits (the first + byte also is where we will read the input). The third byte is + a space.</para> -<para> -To read the byte, we ask the system to read one byte -from <filename>stdin</filename>, and store it in the first byte -of the <varname>buffer</varname>. The system returns the number -of bytes read in <varname role="register">EAX</varname>. This will be <constant>1</constant> -while data is coming, or <constant>0</constant>, when no more input -data is available. 
Therefore, we check the value of -<varname role="register">EAX</varname>. If it is <constant>0</constant>, -we jump to <varname>.done</varname>, otherwise we continue. -</para> + <para>The code section consists of four parts: Reading the byte, + converting it to a hexadecimal number, writing the result, and + eventually exiting the program.</para> -<note> -<para> -For simplicity sake, we are ignoring the possibility -of an error condition at this time. -</para> -</note> + <para>To read the byte, we ask the system to read one byte from + <filename>stdin</filename>, and store it in the first byte of + the <varname>buffer</varname>. The system returns the number + of bytes read in <varname role="register">EAX</varname>. This + will be <constant>1</constant> while data is coming, or + <constant>0</constant>, when no more input data is available. + Therefore, we check the value of <varname + role="register">EAX</varname>. If it is + <constant>0</constant>, we jump to <varname>.done</varname>, + otherwise we continue.</para> -<para> -The hexadecimal conversion reads the byte from the -<varname>buffer</varname> into <varname role="register">EAX</varname>, or actually just -<varname role="register">AL</varname>, while clearing the remaining bits of -<varname role="register">EAX</varname> to zeros. We also copy the byte to -<varname role="register">EDX</varname> because we need to convert the upper -four bits (nibble) separately from the lower -four bits. We store the result in the first two -bytes of the buffer. -</para> + <note> + <para>For simplicity sake, we are ignoring the possibility of + an error condition at this time.</para> + </note> -<para> -Next, we ask the system to write the three bytes -of the buffer, i.e., the two hexadecimal digits and -the blank space, to <filename>stdout</filename>. We then -jump back to the beginning of the program and -process the next byte. 
-</para> + <para>The hexadecimal conversion reads the byte from the + <varname>buffer</varname> into <varname + role="register">EAX</varname>, or actually just <varname + role="register">AL</varname>, while clearing the remaining + bits of <varname role="register">EAX</varname> to zeros. We + also copy the byte to <varname role="register">EDX</varname> + because we need to convert the upper four bits (nibble) + separately from the lower four bits. We store the result in + the first two bytes of the buffer.</para> -<para> -Once there is no more input left, we ask the system -to exit our program, returning a zero, which is -the traditional value meaning the program was -successful. -</para> + <para>Next, we ask the system to write the three bytes of the + buffer, i.e., the two hexadecimal digits and the blank space, + to <filename>stdout</filename>. We then jump back to the + beginning of the program and process the next byte.</para> -<para> -Go ahead, and save the code in a file named <filename>hex.asm</filename>, -then type the following (the <userinput>^D</userinput> means press the -control key and type <userinput>D</userinput> while holding the -control key down): -</para> + <para>Once there is no more input left, we ask the system to + exit our program, returning a zero, which is the traditional + value meaning the program was successful.</para> -<screen>&prompt.user; <userinput>nasm -f elf hex.asm</userinput> + <para>Go ahead, and save the code in a file named + <filename>hex.asm</filename>, then type the following (the + <userinput>^D</userinput> means press the control key and type + <userinput>D</userinput> while holding the control key + down):</para> + + <screen>&prompt.user; <userinput>nasm -f elf hex.asm</userinput> &prompt.user; <userinput>ld -s -o hex hex.o</userinput> &prompt.user; <userinput>./hex</userinput> <userinput>Hello, World!</userinput> 48 65 6C 6C 6F 2C 20 57 6F 72 6C 64 21 0A <userinput>Here I come!</userinput> 48 65 72 65 20 49 20 63 6F 
6D 65 21 0A <userinput>^D</userinput> &prompt.user;</screen> -<note> -<para> -If you are migrating to &unix; from <acronym>&ms-dos;</acronym>, -you may be wondering why each line ends with <constant>0A</constant> -instead of <constant>0D 0A</constant>. -This is because &unix; does not use the cr/lf convention, but -a "new line" convention, which is <constant>0A</constant> in hexadecimal. -</para> -</note> + <note> + <para>If you are migrating to &unix; from + <acronym>&ms-dos;</acronym>, you may be wondering why each + line ends with <constant>0A</constant> instead of + <constant>0D 0A</constant>. This is because &unix; does not + use the cr/lf convention, but a "new line" convention, which + is <constant>0A</constant> in hexadecimal.</para> + </note> -<para> -Can we improve this? Well, for one, it is a bit confusing because -once we have converted a line of text, our input no longer -starts at the beginning of the line. We can modify it to print -a new line instead of a space after each <constant>0A</constant>: -</para> + <para>Can we improve this? Well, for one, it is a bit confusing + because once we have converted a line of text, our input no + longer starts at the beginning of the line. We can modify it + to print a new line instead of a space after each + <constant>0A</constant>:</para> -<programlisting> -%include 'system.inc' + <programlisting>%include 'system.inc' section .data hex db '0123456789ABCDEF' @@ -935,29 +907,26 @@ _start: .done: push dword 0 - sys.exit -</programlisting> -<para> -We have stored the space in the <varname role="register">CL</varname> register. We can -do this safely because, unlike µsoft.windows;, &unix; system -calls do not modify the value of any register they do not use -to return a value in. -</para> + sys.exit</programlisting> -<para> -That means we only need to set <varname role="register">CL</varname> once. 
We have, therefore, -added a new label <varname>.loop</varname> and jump to it for the next byte -instead of jumping at <varname>_start</varname>. We have also added the -<varname>.hex</varname> label so we can either have a blank space or a -new line as the third byte of the <varname>buffer</varname>. -</para> + <para>We have stored the space in the <varname + role="register">CL</varname> register. We can do this + safely because, unlike µsoft.windows;, &unix; system + calls do not modify the value of any register they do not use + to return a value in.</para> -<para> -Once you have changed <filename>hex.asm</filename> to reflect -these changes, type: -</para> + <para>That means we only need to set <varname + role="register">CL</varname> once. We have, therefore, + added a new label <varname>.loop</varname> and jump to it for + the next byte instead of jumping at <varname>_start</varname>. + We have also added the <varname>.hex</varname> label so we can + either have a blank space or a new line as the third byte of + the <varname>buffer</varname>.</para> -<screen>&prompt.user; <userinput>nasm -f elf hex.asm</userinput> + <para>Once you have changed <filename>hex.asm</filename> to + reflect these changes, type:</para> + + <screen>&prompt.user; <userinput>nasm -f elf hex.asm</userinput> &prompt.user; <userinput>ld -s -o hex hex.o</userinput> &prompt.user; <userinput>./hex</userinput> <userinput>Hello, World!</userinput> @@ -966,42 +935,33 @@ these changes, type: 48 65 72 65 20 49 20 63 6F 6D 65 21 0A <userinput>^D</userinput> &prompt.user;</screen> -<para> -That looks better. But this code is quite inefficient! We -are making a system call for every single byte twice (once -to read it, another time to write the output). -</para> + <para>That looks better. But this code is quite inefficient! 
We + are making a system call for every single byte twice (once to + read it, another time to write the output).</para> + </sect1> -</sect1> + <sect1 xml:id="x86-buffered-io"> + <title>Buffered Input and Output</title> -<sect1 xml:id="x86-buffered-io"> -<title>Buffered Input and Output</title> + <para>We can improve the efficiency of our code by buffering our + input and output. We create an input buffer and read a whole + sequence of bytes at one time. Then we fetch them one by one + from the buffer.</para> -<para> -We can improve the efficiency of our code by buffering our -input and output. We create an input buffer and read a whole -sequence of bytes at one time. Then we fetch them one by one -from the buffer. -</para> + <para>We also create an output buffer. We store our output in + it until it is full. At that time we ask the kernel to write + the contents of the buffer to + <filename>stdout</filename>.</para> -<para> -We also create an output buffer. We store our output in it until -it is full. At that time we ask the kernel to write the contents -of the buffer to <filename>stdout</filename>. -</para> + <para>The program ends when there is no more input. But we + still need to ask the kernel to write the contents of our + output buffer to <filename>stdout</filename> one last time, + otherwise some of our output would make it to the output + buffer, but never be sent out. Do not forget that, or you + will be wondering why some of your output is missing.</para> -<para> -The program ends when there is no more input. But we still need -to ask the kernel to write the contents of our output buffer -to <filename>stdout</filename> one last time, otherwise some of our output -would make it to the output buffer, but never be sent out. -Do not forget that, or you will be wondering why some of your -output is missing. 
-</para> + <programlisting>%include 'system.inc' -<programlisting> -%include 'system.inc' - %define BUFSIZE 2048 section .data @@ -1092,39 +1052,35 @@ write: add esp, byte 12 sub eax, eax sub ecx, ecx ; buffer is empty now - ret -</programlisting> -<para> -We now have a third section in the source code, named -<varname>.bss</varname>. This section is not included in our -executable file, and, therefore, cannot be initialized. We use -<function role="opcode">resb</function> instead of <function role="opcode">db</function>. -It simply reserves the requested size of uninitialized memory -for our use. -</para> + ret</programlisting> -<para> -We take advantage of the fact that the system does not modify the -registers: We use registers for what, otherwise, would have to be -global variables stored in the <varname>.data</varname> section. This is -also why the &unix; convention of passing parameters to system calls -on the stack is superior to the Microsoft convention of passing -them in the registers: We can keep the registers for our own use. -</para> + <para>We now have a third section in the source code, named + <varname>.bss</varname>. This section is not included in our + executable file, and, therefore, cannot be initialized. We + use <function role="opcode">resb</function> instead of + <function role="opcode">db</function>. It simply reserves + the requested size of uninitialized memory for our use.</para> -<para> -We use <varname role="register">EDI</varname> and <varname role="register">ESI</varname> as pointers to the next byte -to be read from or written to. We use <varname role="register">EBX</varname> and -<varname role="register">ECX</varname> to keep count of the number of bytes in the -two buffers, so we know when to dump the output to, or read more -input from, the system. 
-</para> + <para>We take advantage of the fact that the system does not + modify the registers: We use registers for what, otherwise, + would have to be global variables stored in the + <varname>.data</varname> section. This is also why the + &unix; convention of passing parameters to system calls on the + stack is superior to the Microsoft convention of passing them + in the registers: We can keep the registers for our own + use.</para> -<para> -Let us see how it works now: -</para> + <para>We use <varname role="register">EDI</varname> and + <varname role="register">ESI</varname> as pointers to the next + byte to be read from or written to. We use <varname + role="register">EBX</varname> and <varname + role="register">ECX</varname> to keep count of the number of + bytes in the two buffers, so we know when to dump the output + to, or read more input from, the system.</para> -<screen>&prompt.user; <userinput>nasm -f elf hex.asm</userinput> + <para>Let us see how it works now:</para> + + <screen>&prompt.user; <userinput>nasm -f elf hex.asm</userinput> &prompt.user; <userinput>ld -s -o hex hex.o</userinput> &prompt.user; <userinput>./hex</userinput> <userinput>Hello, World!</userinput> @@ -1133,17 +1089,15 @@ Let us see how it works now: 48 65 72 65 20 49 20 63 6F 6D 65 21 0A <userinput>^D</userinput> &prompt.user;</screen> -<para> -Not what you expected? The program did not print the output -until we pressed <userinput>^D</userinput>. That is easy to fix by -inserting three lines of code to write the output every time -we have converted a new line to <constant>0A</constant>. I have marked -the three lines with > (do not copy the > in your -<filename>hex.asm</filename>). -</para> + <para>Not what you expected? The program did not print the + output until we pressed <userinput>^D</userinput>. That is + easy to fix by inserting three lines of code to write the + output every time we have converted a new line to + <constant>0A</constant>. 
I have marked the three lines with + > (do not copy the > in your + <filename>hex.asm</filename>).</para> -<programlisting> -%include 'system.inc' + <programlisting>%include 'system.inc' %define BUFSIZE 2048 @@ -1238,14 +1192,11 @@ write: add esp, byte 12 sub eax, eax sub ecx, ecx ; buffer is empty now - ret -</programlisting> + ret</programlisting> -<para> -Now, let us see how it works: -</para> + <para>Now, let us see how it works:</para> -<screen>&prompt.user; <userinput>nasm -f elf hex.asm</userinput> + <screen>&prompt.user; <userinput>nasm -f elf hex.asm</userinput> &prompt.user; <userinput>ld -s -o hex hex.o</userinput> &prompt.user; <userinput>./hex</userinput> <userinput>Hello, World!</userinput> @@ -1254,265 +1205,214 @@ Now, let us see how it works: 48 65 72 65 20 49 20 63 6F 6D 65 21 0A <userinput>^D</userinput> &prompt.user;</screen> -<para> -Not bad for a 644-byte executable, is it! -</para> + <para>Not bad for a 644-byte executable, is it!</para> -<note> -<para> -This approach to buffered input/output still -contains a hidden danger. I will discuss—and -fix—it later, when I talk about the -<link linkend="x86-buffered-dark-side">dark -side of buffering</link>.</para> -</note> + <note> + <para>This approach to buffered input/output still + contains a hidden danger. I will discuss—and + fix—it later, when I talk about the <link + linkend="x86-buffered-dark-side">dark side of + buffering</link>.</para> + </note> -<sect2 xml:id="x86-ungetc"> -<title>How to Unread a Character</title> + <sect2 xml:id="x86-ungetc"> + <title>How to Unread a Character</title> -<warning><para> -This may be a somewhat advanced topic, mostly of interest to -programmers familiar with the theory of compilers. If you wish, -you may <link linkend="x86-command-line">skip to the next -section</link>, and perhaps read this later. -</para> -</warning> -<para> -While our sample program does not require it, more sophisticated -filters often need to look ahead. 
In other words, they may need -to see what the next character is (or even several characters). -If the next character is of a certain value, it is part of the -token currently being processed. Otherwise, it is not. -</para> + <warning> + <para>This may be a somewhat advanced topic, mostly of + interest to programmers familiar with the theory of + compilers. If you wish, you may <link + linkend="x86-command-line">skip to the next + section</link>, and perhaps read this later.</para> + </warning> -<para> -For example, you may be parsing the input stream for a textual -string (e.g., when implementing a language compiler): If a -character is followed by another character, or perhaps a digit, -it is part of the token you are processing. If it is followed by -white space, or some other value, then it is not part of the -current token. -</para> + <para>While our sample program does not require it, more + sophisticated filters often need to look ahead. In other + words, they may need to see what the next character is (or + even several characters). If the next character is of a + certain value, it is part of the token currently being + processed. Otherwise, it is not.</para> -<para> -This presents an interesting problem: How to return the next -character back to the input stream, so it can be read again -later? -</para> + <para>For example, you may be parsing the input stream for a + textual string (e.g., when implementing a language + compiler): If a character is followed by another character, + or perhaps a digit, it is part of the token you are + processing. If it is followed by white space, or some other + value, then it is not part of the current token.</para> -<para> -One possible solution is to store it in a character variable, -then set a flag. We can modify <function>getchar</function> to check the flag, -and if it is set, fetch the byte from that variable instead of the -input buffer, and reset the flag. But, of course, that slows us -down. 
-</para> + <para>This presents an interesting problem: How to return the + next character back to the input stream, so it can be read + again later?</para> -<para> -The C language has an <function>ungetc()</function> function, just for that -purpose. Is there a quick way to implement it in our code? -I would like you to scroll back up and take a look at the -<function>getchar</function> procedure and see if you can find a nice and -fast solution before reading the next paragraph. Then come back -here and see my own solution. -</para> + <para>One possible solution is to store it in a character + variable, then set a flag. We can modify + <function>getchar</function> to check the flag, and if it is + set, fetch the byte from that variable instead of the input + buffer, and reset the flag. But, of course, that slows us + down.</para> -<para> -The key to returning a character back to the stream is in how -we are getting the characters to start with: -</para> + <para>The C language has an <function>ungetc()</function> + function, just for that purpose. Is there a quick way to + implement it in our code? I would like you to scroll back + up and take a look at the <function>getchar</function> + procedure and see if you can find a nice and fast solution + before reading the next paragraph. Then come back here and + see my own solution.</para> -<para> -First we check if the buffer is empty by testing the value -of <varname role="register">EBX</varname>. If it is zero, we call the -<function>read</function> procedure. -</para> + <para>The key to returning a character back to the stream is + in how we are getting the characters to start with:</para> -<para> -If we do have a character available, we use <function role="opcode">lodsb</function>, then -decrease the value of <varname role="register">EBX</varname>. 
The <function role="opcode">lodsb</function> -instruction is effectively identical to: -</para> + <para>First we check if the buffer is empty by testing the + value of <varname role="register">EBX</varname>. If it is + zero, we call the <function>read</function> + procedure.</para> -<programlisting> - mov al, [esi] - inc esi -</programlisting> + <para>If we do have a character available, we use <function + role="opcode">lodsb</function>, then decrease the value of + <varname role="register">EBX</varname>. The <function + role="opcode">lodsb</function> instruction is effectively + identical to:</para> -<para> -The byte we have fetched remains in the buffer until the next -time <function>read</function> is called. We do not know when that happens, -but we do know it will not happen until the next call to -<function>getchar</function>. Hence, to "return" the last-read byte back -to the stream, all we have to do is decrease the value of -<varname role="register">ESI</varname> and increase the value of <varname role="register">EBX</varname>: -</para> + <programlisting>mov al, [esi] + inc esi</programlisting> -<programlisting> -ungetc: + <para>The byte we have fetched remains in the buffer until the + next time <function>read</function> is called. We do not know + when that happens, but we do know it will not happen until the + next call to <function>getchar</function>. Hence, to "return" + the last-read byte back to the stream, all we have to do is + decrease the value of <varname role="register">ESI</varname> + and increase the value of <varname + role="register">EBX</varname>:</para> + + <programlisting>ungetc: dec esi inc ebx - ret -</programlisting> + ret</programlisting> -<para> -But, be careful! We are perfectly safe doing this if our look-ahead -is at most one character at a time. 
If we are examining more than -one upcoming character and call <function>ungetc</function> several times -in a row, it will work most of the time, but not all the time -(and will be tough to debug). Why? -</para> + <para>But, be careful! We are perfectly safe doing this if our + look-ahead is at most one character at a time. If we are + examining more than one upcoming character and call + <function>ungetc</function> several times in a row, it will + work most of the time, but not all the time (and will be tough + to debug). Why?</para> -<para> -Because as long as <function>getchar</function> does not have to call -<function>read</function>, all of the pre-read bytes are still in the buffer, -and our <function>ungetc</function> works without a glitch. But the moment -<function>getchar</function> calls <function>read</function>, -the contents of the buffer change. -</para> + <para>Because as long as <function>getchar</function> does not + have to call <function>read</function>, all of the pre-read + bytes are still in the buffer, and our + <function>ungetc</function> works without a glitch. But the + moment <function>getchar</function> calls + <function>read</function>, the contents of the buffer + change.</para> -<para> -We can always rely on <function>ungetc</function> working properly on the last -character we have read with <function>getchar</function>, but not on anything -we have read before that. -</para> + <para>We can always rely on <function>ungetc</function> working + properly on the last character we have read with + <function>getchar</function>, but not on anything we have read + before that.</para> -<para> -If your program reads more than one byte ahead, you have at least -two choices: -</para> + <para>If your program reads more than one byte ahead, you have + at least two choices:</para> -<para> -If possible, modify the program so it only reads one byte ahead. -This is the simplest solution. 
-</para> + <para>If possible, modify the program so it only reads one byte + ahead. This is the simplest solution.</para> -<para> -If that option is not available, first of all determine the maximum -number of characters your program needs to return to the input -stream at one time. Increase that number slightly, just to be -sure, preferably to a multiple of 16—so it aligns nicely. -Then modify the <varname>.bss</varname> section of your code, and create -a small "spare" buffer right before your input buffer, -something like this: -</para> + <para>If that option is not available, first of all determine + the maximum number of characters your program needs to return + to the input stream at one time. Increase that number + slightly, just to be sure, preferably to a multiple of + 16—so it aligns nicely. Then modify the + <varname>.bss</varname> section of your code, and create a + small "spare" buffer right before your input buffer, something + like this:</para> -<programlisting> -section .bss + <programlisting>section .bss resb 16 ; or whatever the value you came up with ibuffer resb BUFSIZE -obuffer resb BUFSIZE -</programlisting> +obuffer resb BUFSIZE</programlisting> -<para> -You also need to modify your <function>ungetc</function> to pass the value -of the byte to unget in <varname role="register">AL</varname>: -</para> + <para>You also need to modify your <function>ungetc</function> + to pass the value of the byte to unget in <varname + role="register">AL</varname>:</para> -<programlisting> -ungetc: + <programlisting>ungetc: dec esi inc ebx mov [esi], al - ret -</programlisting> + ret</programlisting> -<para> -With this modification, you can call <function>ungetc</function> -up to 17 times in a row safely (the first call will still -be within the buffer, the remaining 16 may be either within -the buffer or within the "spare"). 
-</para> + <para>With this modification, you can call + <function>ungetc</function> up to 17 times in a row safely + (the first call will still be within the buffer, the remaining + 16 may be either within the buffer or within the + "spare").</para> + </sect2> + </sect1> -</sect2> + <sect1 xml:id="x86-command-line"> + <title>Command Line Arguments</title> -</sect1> + <para>Our <application>hex</application> program will be more + useful if it can read the names of an input and output file from + its command line, i.e., if it can process the command line + arguments. But... Where are they?</para> -<sect1 xml:id="x86-command-line"><title>Command Line Arguments</title> + <para>Before a &unix; system starts a program, it <function + role="opcode">push</function>es some data on the stack, then + jumps at the <varname>_start</varname> label of the program. + Yes, I said jumps, not calls. That means the data can be + accessed by reading <varname>[esp+offset]</varname>, or by + simply <function role="opcode">pop</function>ping it.</para> -<para> -Our <application>hex</application> program will be more useful if it can -read the names of an input and output file from its command -line, i.e., if it can process the command line arguments. -But... Where are they? -</para> + <para>The value at the top of the stack contains the number of + command line arguments. It is traditionally called + <varname>argc</varname>, for "argument count."</para> -<para> -Before a &unix; system starts a program, it <function role="opcode">push</function>es some -data on the stack, then jumps at the <varname>_start</varname> -label of the program. Yes, I said jumps, not calls. That means the -data can be accessed by reading <varname>[esp+offset]</varname>, -or by simply <function role="opcode">pop</function>ping it. -</para> + <para>Command line arguments follow next, all + <varname>argc</varname> of them. These are typically referred + to as <varname>argv</varname>, for "argument value(s)." 
That + is, we get <varname>argv[0]</varname>, + <varname>argv[1]</varname>, <varname>...</varname>, + <varname>argv[argc-1]</varname>. These are not the actual + arguments, but pointers to arguments, i.e., memory addresses of + the actual arguments. The arguments themselves are + NUL-terminated character strings.</para> -<para> -The value at the top of the stack contains the number of -command line arguments. It is traditionally called -<varname>argc</varname>, for "argument count." -</para> + <para>The <varname>argv</varname> list is followed by a NULL + pointer, which is simply a <constant>0</constant>. There is + more, but this is enough for our purposes right now.</para> -<para> -Command line arguments follow next, all <varname>argc</varname> of them. -These are typically referred to as <varname>argv</varname>, for -"argument value(s)." That is, we get <varname>argv[0]</varname>, -<varname>argv[1]</varname>, <varname>...</varname>, -<varname>argv[argc-1]</varname>. These are not the actual -arguments, but pointers to arguments, i.e., memory addresses of -the actual arguments. The arguments themselves are -NUL-terminated character strings. -</para> + <note> + <para>If you have come from the <acronym>&ms-dos;</acronym> + programming environment, the main difference is that each + argument is in a separate string. The second difference is + that there is no practical limit on how many arguments there + can be.</para> + </note> -<para> -The <varname>argv</varname> list is followed by a NULL pointer, -which is simply a <constant>0</constant>. There is more, but this is -enough for our purposes right now. -</para> + <para>Armed with this knowledge, we are almost ready for the next + version of <filename>hex.asm</filename>. 
First, however, we + need to add a few lines to + <filename>system.inc</filename>:</para> -<note> -<para> -If you have come from the <acronym>&ms-dos;</acronym> programming -environment, the main difference is that each argument is in -a separate string. The second difference is that there is no -practical limit on how many arguments there can be. -</para> -</note> + <para>First, we need to add two new entries to our list of system + call numbers:</para> -<para> -Armed with this knowledge, we are almost ready for the next -version of <filename>hex.asm</filename>. First, however, we need to -add a few lines to <filename>system.inc</filename>: -</para> + <programlisting>%define SYS_open 5 +%define SYS_close 6</programlisting> -<para> -First, we need to add two new entries to our list of system -call numbers: -</para> + <para>Then we add two new macros at the end of the file:</para> -<programlisting> -%define SYS_open 5 -%define SYS_close 6 -</programlisting> - -<para> -Then we add two new macros at the end of the file: -</para> - -<programlisting> -%macro sys.open 0 + <programlisting>%macro sys.open 0 system SYS_open %endmacro %macro sys.close 0 system SYS_close -%endmacro -</programlisting> +%endmacro</programlisting> -<para> -Here, then, is our modified source code: -</para> + <para>Here, then, is our modified source code:</para> -<programlisting> -%include 'system.inc' + <programlisting>%include 'system.inc' %define BUFSIZE 2048 @@ -1653,234 +1553,192 @@ write: *** DIFF OUTPUT TRUNCATED AT 1000 LINES ***