Date: Sat, 23 Jun 2018 14:55:54 +0000 (UTC)
From: Benedict Reuschling <bcr@FreeBSD.org>
To: doc-committers@freebsd.org, svn-doc-all@freebsd.org, svn-doc-head@freebsd.org
Subject: svn commit: r51904 - head/en_US.ISO8859-1/articles/linux-emulation
Message-ID: <201806231455.w5NEtstd048273@repo.freebsd.org>
Author: bcr
Date: Sat Jun 23 14:55:54 2018
New Revision: 51904
URL: https://svnweb.freebsd.org/changeset/doc/51904
Log:
  Style cleanup, purely cosmetical, no visual content changes:
  - Wrap overly long lines
  - Use two spaces after a sentence stop in a few places
Modified:
  head/en_US.ISO8859-1/articles/linux-emulation/article.xml
Modified: head/en_US.ISO8859-1/articles/linux-emulation/article.xml
==============================================================================
--- head/en_US.ISO8859-1/articles/linux-emulation/article.xml Sat Jun 23 06:57:42 2018 (r51903)
+++ head/en_US.ISO8859-1/articles/linux-emulation/article.xml Sat Jun 23 14:55:54 2018 (r51904)
@@ -3,13 +3,23 @@
 "http://www.FreeBSD.org/XML/share/xml/freebsd50.dtd">
 <!-- $FreeBSD$ -->
 <!-- The FreeBSD Documentation Project -->
-<article xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en">
- <info><title>&linux; emulation in &os;</title>
-
+<article xmlns="http://docbook.org/ns/docbook"
+ xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0"
+ xml:lang="en">
+ <info>
+ <title>&linux; emulation in &os;</title>
- <author><personname><firstname>Roman</firstname><surname>Divacky</surname></personname><affiliation>
- <address><email>rdivacky@FreeBSD.org</email></address>
- </affiliation></author>
+ <author>
+ <personname>
+ <firstname>Roman</firstname>
+ <surname>Divacky</surname>
+ </personname>
+ <affiliation>
+ <address>
+ <email>rdivacky@FreeBSD.org</email>
+ </address>
+ </affiliation>
+ </author>
 <legalnotice xml:id="trademarks" role="trademarks">
 &tm-attrib.adobe;
@@ -28,151 +38,165 @@
 <releaseinfo>$FreeBSD$</releaseinfo>
 <abstract>
- <para>This masters thesis deals with updating the &linux; emulation layer
- (the so called <firstterm>Linuxulator</firstterm>). The task was to update the layer to match
- the functionality of &linux; 2.6. As a reference implementation, the
- &linux; 2.6.16 kernel was chosen. The concept is loosely based on
- the NetBSD implementation. Most of the work was done in the summer
- of 2006 as a part of the Google Summer of Code students program.
- The focus was on bringing the <firstterm>NPTL</firstterm> (new &posix;
- thread library) support into the emulation layer, including
- <firstterm>TLS</firstterm> (thread local storage),
+ <para>This masters thesis deals with updating the &linux;
+ emulation layer (the so called
+ <firstterm>Linuxulator</firstterm>). The task was to update
+ the layer to match the functionality of &linux; 2.6. As a
+ reference implementation, the &linux; 2.6.16 kernel was
+ chosen. The concept is loosely based on the NetBSD
+ implementation. Most of the work was done in the summer of
+ 2006 as a part of the Google Summer of Code students program.
+ The focus was on bringing the <firstterm>NPTL</firstterm> (new
+ &posix; thread library) support into the emulation layer,
+ including <firstterm>TLS</firstterm> (thread local storage),
 <firstterm>futexes</firstterm> (fast user space mutexes),
 <firstterm>PID mangling</firstterm>, and some other minor
 things. Many small problems were identified and fixed in the
 process. My work was integrated into the main &os; source
- repository and will be shipped in the upcoming 7.0R release. We,
- the emulation development team, are working on making the
- &linux; 2.6 emulation the default emulation layer in &os;.</para>
+ repository and will be shipped in the upcoming 7.0R release.
+ We, the emulation development team, are working on making the
+ &linux; 2.6 emulation the default emulation layer in
+ &os;.</para>
 </abstract>
 </info>
 <sect1 xml:id="intro">
 <title>Introduction</title>
- <para>In the last few years the open source &unix; based operating systems
- started to be widely deployed on server and client machines. Among
- these operating systems I would like to point out two: &os;, for its BSD
- heritage, time proven code base and many interesting features and
- &linux; for its wide user base, enthusiastic open developer community
- and support from large companies. &os; tends to be used on server
- class machines serving heavy duty networking tasks with less usage on
- desktop class machines for ordinary users. While &linux; has the same
- usage on servers, but it is used much more by home based users. This
- leads to a situation where there are many binary only programs available
- for &linux; that lack support for &os;.</para>
+ <para>In the last few years the open source &unix; based operating
+ systems started to be widely deployed on server and client
+ machines. Among these operating systems I would like to point
+ out two: &os;, for its BSD heritage, time proven code base and
+ many interesting features and &linux; for its wide user base,
+ enthusiastic open developer community and support from large
+ companies. &os; tends to be used on server class machines
+ serving heavy duty networking tasks with less usage on desktop
+ class machines for ordinary users. While &linux; has the same
+ usage on servers, but it is used much more by home based users.
+ This leads to a situation where there are many binary only
+ programs available for &linux; that lack support for
+ &os;.</para>
- <para>Naturally, a need for the ability to run &linux; binaries on a &os;
- system arises and this is what this thesis deals with: the emulation of
- the &linux; kernel in the &os; operating system.</para>
+ <para>Naturally, a need for the ability to run &linux; binaries on
+ a &os; system arises and this is what this thesis deals with:
+ the emulation of the &linux; kernel in the &os; operating
+ system.</para>
- <para>During the Summer of 2006 Google Inc. sponsored a project which
- focused on extending the &linux; emulation layer (the so called Linuxulator)
- in &os; to include &linux; 2.6 facilities. This thesis is written as a
- part of this project.</para>
+ <para>During the Summer of 2006 Google Inc. sponsored a project
+ which focused on extending the &linux; emulation layer (the so
+ called Linuxulator) in &os; to include &linux; 2.6 facilities.
+ This thesis is written as a part of this project.</para>
 </sect1>
 <sect1 xml:id="inside">
 <title>A look inside…</title>
- <para>In this section we are going to describe every operating system in
- question. How they deal with syscalls, trapframes etc., all the low-level
- stuff. We also describe the way they understand common &unix;
- primitives like what a PID is, what a thread is, etc. In the third
- subsection we talk about how &unix; on &unix; emulation could be done
- in general.</para>
+ <para>In this section we are going to describe every operating
+ system in question. How they deal with syscalls, trapframes
+ etc., all the low-level stuff. We also describe the way they
+ understand common &unix; primitives like what a PID is, what a
+ thread is, etc. In the third subsection we talk about how
+ &unix; on &unix; emulation could be done in general.</para>
 <sect2 xml:id="what-is-unix">
 <title>What is &unix;</title>
 <para>&unix; is an operating system with a long history that has
- influenced almost every other operating system currently in use.
- Starting in the 1960s, its development continues to this day (although
- in different projects). &unix; development soon forked into two main
- ways: the BSDs and System III/V families. They mutually influenced
- themselves by growing a common &unix; standard. Among the
- contributions originated in BSD we can name virtual memory, TCP/IP
- networking, FFS, and many others. The System V branch contributed to
- SysV interprocess communication primitives, copy-on-write, etc. &unix;
- itself does not exist any more but its ideas have been used by many
- other operating systems world wide thus forming the so called &unix;-like
- operating systems. These days the most influential ones are &linux;,
- Solaris, and possibly (to some extent) &os;. There are in-company
- &unix; derivatives (AIX, HP-UX etc.), but these have been more and
- more migrated to the aforementioned systems. Let us summarize typical
- &unix; characteristics.</para>
+ influenced almost every other operating system currently in
+ use. Starting in the 1960s, its development continues to this
+ day (although in different projects). &unix; development soon
+ forked into two main ways: the BSDs and System III/V families.
+ They mutually influenced themselves by growing a common &unix;
+ standard. Among the contributions originated in BSD we can
+ name virtual memory, TCP/IP networking, FFS, and many others.
+ The System V branch contributed to SysV interprocess
+ communication primitives, copy-on-write, etc. &unix; itself
+ does not exist any more but its ideas have been used by many
+ other operating systems world wide thus forming the so called
+ &unix;-like operating systems. These days the most
+ influential ones are &linux;, Solaris, and possibly (to some
+ extent) &os;. There are in-company &unix; derivatives (AIX,
+ HP-UX etc.), but these have been more and more migrated to the
+ aforementioned systems. Let us summarize typical &unix;
+ characteristics.</para>
 </sect2>
 <sect2 xml:id="tech-details">
 <title>Technical details</title>
- <para>Every running program constitutes a process that represents a state
- of the computation. Running process is divided between kernel-space
- and user-space. Some operations can be done only from kernel space
- (dealing with hardware etc.), but the process should spend most of its
- lifetime in the user space. The kernel is where the management of the
- processes, hardware, and low-level details take place. The kernel
- provides a standard unified &unix; API to the user space. The most
- important ones are covered below.</para>
+ <para>Every running program constitutes a process that
+ represents a state of the computation. Running process is
+ divided between kernel-space and user-space. Some operations
+ can be done only from kernel space (dealing with hardware
+ etc.), but the process should spend most of its lifetime in
+ the user space. The kernel is where the management of the
+ processes, hardware, and low-level details take place. The
+ kernel provides a standard unified &unix; API to the user
+ space. The most important ones are covered below.</para>
 <sect3 xml:id="kern-proc-comm">
- <title>Communication between kernel and user space process</title>
+ <title>Communication between kernel and user space
+ process</title>
- <para>Common &unix; API defines a syscall as a way to issue commands
- from a user space process to the kernel. The most common
- implementation is either by using an interrupt or specialized
- instruction (think of
- <literal>SYSENTER</literal>/<literal>SYSCALL</literal> instructions
- for ia32). Syscalls are defined by a number. For example in &os;,
- the syscall number 85 is the &man.swapon.2; syscall and the
- syscall number 132 is &man.mkfifo.2;. Some syscalls need
- parameters, which are passed from the user-space to the kernel-space
- in various ways (implementation dependant). Syscalls are
+ <para>Common &unix; API defines a syscall as a way to issue
+ commands from a user space process to the kernel. The most
+ common implementation is either by using an interrupt or
+ specialized instruction (think of
+ <literal>SYSENTER</literal>/<literal>SYSCALL</literal>
+ instructions for ia32). Syscalls are defined by a number.
+ For example in &os;, the syscall number 85 is the
+ &man.swapon.2; syscall and the syscall number 132 is
+ &man.mkfifo.2;. Some syscalls need parameters, which are
+ passed from the user-space to the kernel-space in various
+ ways (implementation dependant). Syscalls are
 synchronous.</para>
 <para>Another possible way to communicate is by using a
- <firstterm>trap</firstterm>. Traps occur asynchronously after
- some event occurs (division by zero, page fault etc.). A trap
- can be transparent for a process (page fault) or can result in
- a reaction like sending a <firstterm>signal</firstterm>
- (division by zero).</para>
+ <firstterm>trap</firstterm>. Traps occur asynchronously
+ after some event occurs (division by zero, page fault etc.).
+ A trap can be transparent for a process (page fault) or can
+ result in a reaction like sending a
+ <firstterm>signal</firstterm> (division by zero).</para>
 </sect3>
 <sect3 xml:id="proc-proc-comm">
 <title>Communication between processes</title>
- <para>There are other APIs (System V IPC, shared memory etc.) but the
- single most important API is signal. Signals are sent by processes
- or by the kernel and received by processes. Some signals
- can be ignored or handled by a user supplied routine, some result
- in a predefined action that cannot be altered or ignored.</para>
+ <para>There are other APIs (System V IPC, shared memory etc.)
+ but the single most important API is signal. Signals are
+ sent by processes or by the kernel and received by
+ processes. Some signals can be ignored or handled by a user
+ supplied routine, some result in a predefined action that
+ cannot be altered or ignored.</para>
 </sect3>
 <sect3 xml:id="proc-mgmt">
 <title>Process management</title>
- <para>Kernel instances are processed first in the system (so called
- init). Every running process can create its identical copy using
- the &man.fork.2; syscall. Some slightly modified versions of this
- syscall were introduced but the basic semantic is the same. Every
- running process can morph into some other process using the
- &man.exec.3; syscall. Some modifications of this syscall were
- introduced but all serve the same basic purpose. Processes end
- their lives by calling the &man.exit.2; syscall. Every process is
- identified by a unique number called PID. Every process has a
- defined parent (identified by its PID).</para>
+ <para>Kernel instances are processed first in the system (so
+ called init). Every running process can create its
+ identical copy using the &man.fork.2; syscall. Some
+ slightly modified versions of this syscall were introduced
+ but the basic semantic is the same. Every running process
+ can morph into some other process using the &man.exec.3;
+ syscall. Some modifications of this syscall were introduced
+ but all serve the same basic purpose. Processes end their
+ lives by calling the &man.exit.2; syscall. Every process is
+ identified by a unique number called PID. Every process has
+ a defined parent (identified by its PID).</para>
 </sect3>
 <sect3 xml:id="thread-mgmt">
 <title>Thread management</title>
- <para>Traditional &unix; does not define any API nor implementation
- for threading, while &posix; defines its threading API but the
- implementation is undefined. Traditionally there were two ways of
- implementing threads. Handling them as separate processes (1:1
- threading) or envelope the whole thread group in one process and
- managing the threading in userspace (1:N threading). Comparing
- main features of each approach:</para>
+ <para>Traditional &unix; does not define any API nor
+ implementation for threading, while &posix; defines its
+ threading API but the implementation is undefined.
+ Traditionally there were two ways of implementing threads.
+ Handling them as separate processes (1:1 threading) or
+ envelope the whole thread group in one process and managing
+ the threading in userspace (1:N threading). Comparing main
+ features of each approach:</para>
 <para>1:1 threading</para>
@@ -199,10 +223,11 @@
 <para>+ lightweight threads</para>
 </listitem>
 <listitem>
- <para>+ scheduling can be easily altered by the user</para>
+ <para>+ scheduling can be easily altered by the
+ user</para>
 </listitem>
 <listitem>
- <para>- syscalls must be wrapped </para>
+ <para>- syscalls must be wrapped</para>
 </listitem>
 <listitem>
 <para>- cannot utilize more than one CPU</para>
@@ -214,24 +239,26 @@
 <sect2 xml:id="what-is-freebsd">
 <title>What is &os;?</title>
- <para>The &os; project is one of the oldest open source operating
- systems currently available for daily use. It is a direct descendant
- of the genuine &unix; so it could be claimed that it is a true &unix;
- although licensing issues do not permit that. The start of the project
- dates back to the early 1990's when a crew of fellow BSD users patched
- the 386BSD operating system. Based on this patchkit a new operating
- system arose named &os; for its liberal license. Another group created
- the NetBSD operating system with different goals in mind. We will
- focus on &os;.</para>
+ <para>The &os; project is one of the oldest open source
+ operating systems currently available for daily use. It is a
+ direct descendant of the genuine &unix; so it could be claimed
+ that it is a true &unix; although licensing issues do not
+ permit that. The start of the project dates back to the early
+ 1990's when a crew of fellow BSD users patched the 386BSD
+ operating system. Based on this patchkit a new operating
+ system arose named &os; for its liberal license. Another
+ group created the NetBSD operating system with different goals
+ in mind. We will focus on &os;.</para>
- <para>&os; is a modern &unix;-based operating system with all the
- features of &unix;. Preemptive multitasking, multiuser facilities,
- TCP/IP networking, memory protection, symmetric multiprocessing
- support, virtual memory with merged VM and buffer cache, they are all
- there. One of the interesting and extremely useful features is the
- ability to emulate other &unix;-like operating systems. As of
- December 2006 and 7-CURRENT development, the following
- emulation functionalities are supported:</para>
+ <para>&os; is a modern &unix;-based operating system with all
+ the features of &unix;. Preemptive multitasking, multiuser
+ facilities, TCP/IP networking, memory protection, symmetric
+ multiprocessing support, virtual memory with merged VM and
+ buffer cache, they are all there. One of the interesting and
+ extremely useful features is the ability to emulate other
+ &unix;-like operating systems. As of December 2006 and
+ 7-CURRENT development, the following emulation functionalities
+ are supported:</para>
 <itemizedlist>
 <listitem>
@@ -241,10 +268,12 @@
 <para>&os;/i386 emulation on &os;/ia64</para>
 </listitem>
 <listitem>
- <para>&linux;-emulation of &linux; operating system on &os;</para>
+ <para>&linux;-emulation of &linux; operating system on
+ &os;</para>
 </listitem>
 <listitem>
- <para>NDIS-emulation of Windows networking drivers interface</para>
+ <para>NDIS-emulation of Windows networking drivers
+ interface</para>
 </listitem>
 <listitem>
 <para>NetBSD-emulation of NetBSD operating system</para>
@@ -257,62 +286,70 @@
 </listitem>
 </itemizedlist>
- <para>Actively developed emulations are the &linux; layer and various
- &os;-on-&os; layers. Others are not supposed to work properly nor
- be usable these days.</para>
+ <para>Actively developed emulations are the &linux; layer and
+ various &os;-on-&os; layers. Others are not supposed to work
+ properly nor be usable these days.</para>
 <sect3 xml:id="freebsd-tech-details">
 <title>Technical details</title>
- <para>&os; is traditional flavor of &unix; in the sense of dividing the
- run of processes into two halves: kernel space and user space run.
- There are two types of process entry to the kernel: a syscall and a
- trap. There is only one way to return. In the subsequent sections
- we will describe the three gates to/from the kernel. The whole
- description applies to the i386 architecture as the Linuxulator
- only exists there but the concept is similar on other architectures.
- The information was taken from [1] and the source code.</para>
+ <para>&os; is traditional flavor of &unix; in the sense of
+ dividing the run of processes into two halves: kernel space
+ and user space run. There are two types of process entry to
+ the kernel: a syscall and a trap. There is only one way to
+ return. In the subsequent sections we will describe the
+ three gates to/from the kernel. The whole description
+ applies to the i386 architecture as the Linuxulator only
+ exists there but the concept is similar on other
+ architectures. The information was taken from [1] and the
+ source code.</para>
 <sect4 xml:id="freebsd-sys-entries">
 <title>System entries</title>
- <para>&os; has an abstraction called an execution class loader,
- which is a wedge into the &man.execve.2; syscall. This employs a
- structure <literal>sysentvec</literal>, which describes an
- executable ABI. It contains things like errno translation table,
- signal translation table, various functions to serve syscall needs
- (stack fixup, coredumping, etc.). Every ABI the &os; kernel wants
- to support must define this structure, as it is used later in the
- syscall processing code and at some other places. System entries
- are handled by trap handlers, where we can access both the
- kernel-space and the user-space at once.</para>
+ <para>&os; has an abstraction called an execution class
+ loader, which is a wedge into the &man.execve.2; syscall.
+ This employs a structure <literal>sysentvec</literal>,
+ which describes an executable ABI. It contains things
+ like errno translation table, signal translation table,
+ various functions to serve syscall needs (stack fixup,
+ coredumping, etc.). Every ABI the &os; kernel wants to
+ support must define this structure, as it is used later in
+ the syscall processing code and at some other places.
+ System entries are handled by trap handlers, where we can
+ access both the kernel-space and the user-space at
+ once.</para>
 </sect4>
 <sect4 xml:id="freebsd-syscalls">
 <title>Syscalls</title>
 <para>Syscalls on &os; are issued by executing interrupt
- <literal>0x80</literal> with register <varname>%eax</varname> set
- to a desired syscall number with arguments passed on the stack.</para>
+ <literal>0x80</literal> with register
+ <varname>%eax</varname> set to a desired syscall number
+ with arguments passed on the stack.</para>
- <para>When a process issues an interrupt <literal>0x80</literal>, the
- <literal>int0x80</literal> syscall trap handler is issued (defined
- in <filename>sys/i386/i386/exception.s</filename>), which prepares
- arguments (i.e. copies them on to the stack) for a
- call to a C function &man.syscall.2; (defined in
- <filename>sys/i386/i386/trap.c</filename>), which processes the
- passed in trapframe. The processing consists of preparing the
- syscall (depending on the <literal>sysvec</literal> entry),
- determining if the syscall is 32-bit or 64-bit one (changes size
- of the parameters), then the parameters are copied, including the
- syscall. Next, the actual syscall function is executed with
- processing of the return code (special cases for
- <literal>ERESTART</literal> and <literal>EJUSTRETURN</literal>
- errors). Finally an <literal>userret()</literal> is scheduled,
- switching the process back to the users-pace. The parameters to
- the actual syscall handler are passed in the form of
- <literal>struct thread *td</literal>,
- <literal>struct syscall args *</literal> arguments where the second
+ <para>When a process issues an interrupt
+ <literal>0x80</literal>, the <literal>int0x80</literal>
+ syscall trap handler is issued (defined in
+ <filename>sys/i386/i386/exception.s</filename>), which
+ prepares arguments (i.e. copies them on to the stack) for
+ a call to a C function &man.syscall.2; (defined in
+ <filename>sys/i386/i386/trap.c</filename>), which
+ processes the passed in trapframe. The processing
+ consists of preparing the syscall (depending on the
+ <literal>sysvec</literal> entry), determining if the
+ syscall is 32-bit or 64-bit one (changes size of the
+ parameters), then the parameters are copied, including the
+ syscall. Next, the actual syscall function is executed
+ with processing of the return code (special cases for
+ <literal>ERESTART</literal> and
+ <literal>EJUSTRETURN</literal> errors). Finally an
+ <literal>userret()</literal> is scheduled, switching the
+ process back to the users-pace. The parameters to the
+ actual syscall handler are passed in the form of
+ <literal>struct thread *td</literal>, <literal>struct
+ syscall args *</literal> arguments where the second
 parameter is a pointer to the copied in structure of
 parameters.</para>
 </sect4>
@@ -320,68 +357,76 @@
 <sect4 xml:id="freebsd-traps">
 <title>Traps</title>
- <para>Handling of traps in &os; is similar to the handling of
- syscalls. Whenever a trap occurs, an assembler handler is called.
- It is chosen between alltraps, alltraps with regs pushed or
- calltrap depending on the type of the trap. This handler prepares
- arguments for a call to a C function <literal>trap()</literal>
- (defined in <filename>sys/i386/i386/trap.c</filename>), which then
- processes the occurred trap. After the processing it might send a
- signal to the process and/or exit to userland using
- <literal>userret()</literal>.</para>
+ <para>Handling of traps in &os; is similar to the handling
+ of syscalls. Whenever a trap occurs, an assembler handler
+ is called. It is chosen between alltraps, alltraps with
+ regs pushed or calltrap depending on the type of the trap.
+ This handler prepares arguments for a call to a C function
+ <literal>trap()</literal> (defined in
+ <filename>sys/i386/i386/trap.c</filename>), which then
+ processes the occurred trap. After the processing it
+ might send a signal to the process and/or exit to userland
+ using <literal>userret()</literal>.</para>
 </sect4>
 <sect4 xml:id="freebsd-exits">
 <title>Exits</title>
- <para>Exits from kernel to userspace happen using the assembler
- routine <literal>doreti</literal> regardless of whether the kernel
- was entered via a trap or via a syscall. This restores the program
- status from the stack and returns to the userspace.</para>
+ <para>Exits from kernel to userspace happen using the
+ assembler routine <literal>doreti</literal> regardless of
+ whether the kernel was entered via a trap or via a
+ syscall. This restores the program status from the stack
+ and returns to the userspace.</para>
 </sect4>
 <sect4 xml:id="freebsd-unix-primitives">
 <title>&unix; primitives</title>
- <para>&os; operating system adheres to the traditional &unix; scheme,
- where every process has a unique identification number, the so
- called <firstterm>PID</firstterm> (Process ID). PID numbers are
+ <para>&os; operating system adheres to the traditional
+ &unix; scheme, where every process has a unique
+ identification number, the so called
+ <firstterm>PID</firstterm> (Process ID). PID numbers are
 allocated either linearly or randomly ranging from
- <literal>0</literal> to <literal>PID_MAX</literal>. The allocation
- of PID numbers is done using linear searching of PID space. Every
- thread in a process receives the same PID number as result of the
- &man.getpid.2; call.</para>
+ <literal>0</literal> to <literal>PID_MAX</literal>. The
+ allocation of PID numbers is done using linear searching
+ of PID space. Every thread in a process receives the same
+ PID number as result of the &man.getpid.2; call.</para>
- <para>There are currently two ways to implement threading in &os;.
- The first way is M:N threading followed by the 1:1 threading model.
- The default library used is M:N threading
- (<literal>libpthread</literal>) and you can switch at runtime to
- 1:1 threading (<literal>libthr</literal>). The plan is to switch
- to 1:1 library by default soon. Although those two libraries use
- the same kernel primitives, they are accessed through different
- API(es). The M:N library uses the <literal>kse_*</literal> family
- of syscalls while the 1:1 library uses the <literal>thr_*</literal>
- family of syscalls. Because of this, there is no general concept
- of thread ID shared between kernel and userspace. Of course, both
- threading libraries implement the pthread thread ID API. Every
- kernel thread (as described by <literal>struct thread</literal>)
- has td tid identifier but this is not directly accessible
- from userland and solely serves the kernel's needs. It is also
- used for 1:1 threading library as pthread's thread ID but handling
- of this is internal to the library and cannot be relied on.</para>
+ <para>There are currently two ways to implement threading in
+ &os;. The first way is M:N threading followed by the 1:1
+ threading model. The default library used is M:N
+ threading (<literal>libpthread</literal>) and you can
+ switch at runtime to 1:1 threading
+ (<literal>libthr</literal>). The plan is to switch to 1:1
+ library by default soon. Although those two libraries use
+ the same kernel primitives, they are accessed through
+ different API(es). The M:N library uses the
+ <literal>kse_*</literal> family of syscalls while the 1:1
+ library uses the <literal>thr_*</literal> family of
+ syscalls. Because of this, there is no general concept of
+ thread ID shared between kernel and userspace. Of course,
+ both threading libraries implement the pthread thread ID
+ API. Every kernel thread (as described by <literal>struct
+ thread</literal>) has td tid identifier but this is not
+ directly accessible from userland and solely serves the
+ kernel's needs. It is also used for 1:1 threading library
+ as pthread's thread ID but handling of this is internal to
+ the library and cannot be relied on.</para>
- <para>As stated previously there are two implementations of threading
- in &os;. The M:N library divides the work between kernel space and
- userspace. Thread is an entity that gets scheduled in the kernel
- but it can represent various number of userspace threads.
- M userspace threads get mapped to N kernel threads thus saving
- resources while keeping the ability to exploit multiprocessor
- parallelism. Further information about the implementation can be
- obtained from the man page or [1]. The 1:1 library directly maps a
- userland thread to a kernel thread thus greatly simplifying the
- scheme. None of these designs implement a fairness mechanism (such
- a mechanism was implemented but it was removed recently because it
- caused serious slowdown and made the code more difficult to deal
+ <para>As stated previously there are two implementations of
+ threading in &os;. The M:N library divides the work
+ between kernel space and userspace. Thread is an entity
+ that gets scheduled in the kernel but it can represent
+ various number of userspace threads. M userspace threads
+ get mapped to N kernel threads thus saving resources while
+ keeping the ability to exploit multiprocessor parallelism.
+ Further information about the implementation can be
+ obtained from the man page or [1]. The 1:1 library
+ directly maps a userland thread to a kernel thread thus
+ greatly simplifying the scheme. None of these designs
+ implement a fairness mechanism (such a mechanism was
+ implemented but it was removed recently because it caused
+ serious slowdown and made the code more difficult to deal
 with).</para>
 </sect4>
 </sect3>
@@ -390,64 +435,70 @@
 <sect2 xml:id="what-is-linux">
 <title>What is &linux;</title>
- <para>&linux; is a &unix;-like kernel originally developed by Linus
- Torvalds, and now being contributed to by a massive crowd of
- programmers all around the world. From its mere beginnings to today,
- with wide support from companies such as IBM or Google, &linux; is
- being associated with its fast development pace, full hardware support
- and benevolent dictator model of organization.</para>
+ <para>&linux; is a &unix;-like kernel originally developed by
+ Linus Torvalds, and now being contributed to by a massive
+ crowd of programmers all around the world. From its mere
+ beginnings to today, with wide support from companies such as
+ IBM or Google, &linux; is being associated with its fast
+ development pace, full hardware support and benevolent
+ dictator model of organization.</para>
- <para>&linux; development started in 1991 as a hobbyist project at
- University of Helsinki in Finland. Since then it has obtained all the
- features of a modern &unix;-like OS: multiprocessing, multiuser
- support, virtual memory, networking, basically everything is there.
- There are also highly advanced features like virtualization etc.</para>
+ <para>&linux; development started in 1991 as a hobbyist project
+ at University of Helsinki in Finland. Since then it has
+ obtained all the features of a modern &unix;-like OS:
+ multiprocessing, multiuser support, virtual memory,
+ networking, basically everything is there. There are also
+ highly advanced features like virtualization etc.</para>
- <para>As of 2006 &linux; seems to be the most widely used open source
- operating system with support from independent software vendors like
- Oracle, RealNetworks, Adobe, etc. Most of the commercial software
- distributed for &linux; can only be obtained in a binary form so
- recompilation for other operating systems is impossible.</para>
+ <para>As of 2006 &linux; seems to be the most widely used open
+ source operating system with support from independent software
+ vendors like Oracle, RealNetworks, Adobe, etc. Most of the
+ commercial software distributed for &linux; can only be
+ obtained in a binary form so recompilation for other operating
+ systems is impossible.</para>
 <para>Most of the &linux; development happens in a
 <application>Git</application> version control system.
- <application>Git</application> is a distributed system so there is
- no central source of the &linux; code, but some branches are considered
- prominent and official. The version number scheme implemented by
- &linux; consists of four numbers A.B.C.D. Currently development
- happens in 2.6.C.D, where C represents major version, where new
- features are added or changed while D is a minor version for bugfixes
- only.</para>
+ <application>Git</application> is a distributed system so
+ there is no central source of the &linux; code, but some
+ branches are considered prominent and official. The version
+ number scheme implemented by &linux; consists of four numbers
+ A.B.C.D. Currently development happens in 2.6.C.D, where C
+ represents major version, where new features are added or
+ changed while D is a minor version for bugfixes only.</para>
 <para>More information can be obtained from [3].</para>
 <sect3 xml:id="linux-tech-details">
 <title>Technical details</title>
- <para>&linux; follows the traditional &unix; scheme of dividing the run
- of a process in two halves: the kernel and user space. The kernel can
- be entered in two ways: via a trap or via a syscall. The return is
- handled only in one way. The further description applies to
- &linux; 2.6 on the &i386; architecture. This information was
- taken from [2].</para>
+ <para>&linux; follows the traditional &unix; scheme of
+ dividing the run of a process in two halves: the kernel and
+ user space. The kernel can be entered in two ways: via a
+ trap or via a syscall. The return is handled only in one
+ way. The further description applies to &linux; 2.6 on
+ the &i386; architecture. This information was taken from
+ [2].</para>
 <sect4 xml:id="linux-syscalls">
 <title>Syscalls</title>
 <para>Syscalls in &linux; are performed (in userspace) using
- <literal>syscallX</literal> macros where X substitutes a number
- representing the number of parameters of the given syscall. This
- macro translates to a code that loads <varname>%eax</varname>
- register with a number of the syscall and executes interrupt
- <literal>0x80</literal>. After this syscall return is called,
- which translates negative return values to positive
- <literal>errno</literal> values and sets <literal>res</literal> to
- <literal>-1</literal> in case of an error. Whenever the interrupt
- <literal>0x80</literal> is called the process enters the kernel in
- system call trap handler. This routine saves all registers on the
- stack and calls the selected syscall entry.
Note that the &linux; - calling convention expects parameters to the syscall to be passed - via registers as shown here:</para> + <literal>syscallX</literal> macros where X substitutes a + number representing the number of parameters of the given + syscall. This macro translates to a code that loads + <varname>%eax</varname> register with a number of the + syscall and executes interrupt <literal>0x80</literal>. + After this syscall return is called, which translates + negative return values to positive + <literal>errno</literal> values and sets + <literal>res</literal> to <literal>-1</literal> in case of + an error. Whenever the interrupt <literal>0x80</literal> + is called the process enters the kernel in system call + trap handler. This routine saves all registers on the + stack and calls the selected syscall entry. Note that the + &linux; calling convention expects parameters to the + syscall to be passed via registers as shown here:</para> <orderedlist> <listitem> @@ -470,53 +521,58 @@ </listitem> </orderedlist> - <para>There are some exceptions to this, where &linux; uses different - calling convention (most notably the <literal>clone</literal> - syscall).</para> + <para>There are some exceptions to this, where &linux; uses + different calling convention (most notably the + <literal>clone</literal> syscall).</para> </sect4> <sect4 xml:id="linux-traps"> <title>Traps</title> <para>The trap handlers are introduced in - <filename>arch/i386/kernel/traps.c</filename> and most of these - handlers live in <filename>arch/i386/kernel/entry.S</filename>, - where handling of the traps happens.</para> + <filename>arch/i386/kernel/traps.c</filename> and most of + these handlers live in + <filename>arch/i386/kernel/entry.S</filename>, where + handling of the traps happens.</para> </sect4> <sect4 xml:id="linux-exits"> <title>Exits</title> - <para>Return from the syscall is managed by syscall &man.exit.3;, - which checks for the process having unfinished work, then checks - 
whether we used user-supplied selectors. If this happens stack - fixing is applied and finally the registers are restored from the - stack and the process returns to the userspace.</para> + <para>Return from the syscall is managed by syscall + &man.exit.3;, which checks for the process having + unfinished work, then checks whether we used user-supplied + selectors. If this happens stack fixing is applied and + finally the registers are restored from the stack and the + process returns to the userspace.</para> </sect4> <sect4 xml:id="linux-unix-primitives"> <title>&unix; primitives</title> - <para>In the 2.6 version, the &linux; operating system redefined some - of the traditional &unix; primitives, notably PID, TID and thread. - PID is defined not to be unique for every process, so for some - processes (threads) &man.getppid.2; returns the same value. Unique - identification of process is provided by TID. This is because - <firstterm>NPTL</firstterm> (New &posix; Thread Library) defines - threads to be normal processes (so called 1:1 threading). Spawning - a new process in &linux; 2.6 happens using the - <literal>clone</literal> syscall (fork variants are reimplemented using - it). This clone syscall defines a set of flags that affect - behavior of the cloning process regarding thread implementation. - The semantic is a bit fuzzy as there is no single flag telling the - syscall to create a thread.</para> + <para>In the 2.6 version, the &linux; operating system + redefined some of the traditional &unix; primitives, + notably PID, TID and thread. PID is defined not to be + unique for every process, so for some processes (threads) + &man.getppid.2; returns the same value. Unique + identification of process is provided by TID. This is + because <firstterm>NPTL</firstterm> (New &posix; Thread + Library) defines threads to be normal processes (so called + 1:1 threading). 
Spawning a new process in + &linux; 2.6 happens using the + <literal>clone</literal> syscall (fork variants are + reimplemented using it). This clone syscall defines a set + of flags that affect behavior of the cloning process + regarding thread implementation. The semantic is a bit + fuzzy as there is no single flag telling the syscall to + create a thread.</para> <para>Implemented clone flags are:</para> <itemizedlist> <listitem> - <para><literal>CLONE_VM</literal> - processes share their memory - space</para> + <para><literal>CLONE_VM</literal> - processes share + their memory space</para> </listitem> <listitem> <para><literal>CLONE_FS</literal> - share umask, cwd and @@ -527,72 +583,78 @@ files</para> </listitem> <listitem> - <para><literal>CLONE_SIGHAND</literal> - share signal handlers - and blocked signals</para> + <para><literal>CLONE_SIGHAND</literal> - share signal + handlers and blocked signals</para> </listitem> <listitem> - <para><literal>CLONE_PARENT</literal> - share parent</para> + <para><literal>CLONE_PARENT</literal> - share + parent</para> </listitem> <listitem> - <para><literal>CLONE_THREAD</literal> - be thread (further - explanation below)</para> + <para><literal>CLONE_THREAD</literal> - be thread + (further explanation below)</para> </listitem> <listitem> - <para><literal>CLONE_NEWNS</literal> - new namespace</para> + <para><literal>CLONE_NEWNS</literal> - new + namespace</para> </listitem> <listitem> <para><literal>CLONE_SYSVSEM</literal> - share SysV undo structures</para> </listitem> <listitem> - <para><literal>CLONE_SETTLS</literal> - setup TLS at supplied - address</para> + <para><literal>CLONE_SETTLS</literal> - setup TLS at + supplied address</para> </listitem> <listitem> - <para><literal>CLONE_PARENT_SETTID</literal> - set TID in the - parent</para> + <para><literal>CLONE_PARENT_SETTID</literal> - set TID + in the parent</para> </listitem> <listitem> - <para><literal>CLONE_CHILD_CLEARTID</literal> - clear TID in the - child</para> + 
<para><literal>CLONE_CHILD_CLEARTID</literal> - clear + TID in the child</para> </listitem> <listitem> - <para><literal>CLONE_CHILD_SETTID</literal> - set TID in the - child</para> + <para><literal>CLONE_CHILD_SETTID</literal> - set TID in + the child</para> </listitem> </itemizedlist> - <para><literal>CLONE_PARENT</literal> sets the real parent to the - parent of the caller. This is useful for threads because if thread - A creates thread B we want thread B to be parented to the parent of - the whole thread group. <literal>CLONE_THREAD</literal> does - exactly the same thing as <literal>CLONE_PARENT</literal>, - <literal>CLONE_VM</literal> and <literal>CLONE_SIGHAND</literal>, - rewrites PID to be the same as PID of the caller, sets exit signal - to be none and enters the thread group. - <literal>CLONE_SETTLS</literal> sets up GDT entries for TLS - handling. The <literal>CLONE_*_*TID</literal> set of flags - sets/clears user supplied address to TID or 0.</para> + <para><literal>CLONE_PARENT</literal> sets the real parent + to the parent of the caller. This is useful for threads + because if thread A creates thread B we want thread B to + be parented to the parent of the whole thread group. + <literal>CLONE_THREAD</literal> does exactly the same + thing as <literal>CLONE_PARENT</literal>, + <literal>CLONE_VM</literal> and + <literal>CLONE_SIGHAND</literal>, rewrites PID to be the + same as PID of the caller, sets exit signal to be none and + enters the thread group. <literal>CLONE_SETTLS</literal> + sets up GDT entries for TLS handling. The + <literal>CLONE_*_*TID</literal> set of flags sets/clears + user supplied address to TID or 0.</para> - <para>As you can see the <literal>CLONE_THREAD</literal> does most - of the work and does not seem to fit the scheme very well. 
The - original intention is unclear (even for authors, according to - comments in the code) but I think originally there was one - threading flag, which was then parcelled among many other flags - but this separation was never fully finished. It is also unclear - what this partition is good for as glibc does not use that so only - hand-written use of the clone permits a programmer to access this - features.</para> + <para>As you can see the <literal>CLONE_THREAD</literal> + does most of the work and does not seem to fit the scheme + very well. The original intention is unclear (even for + authors, according to comments in the code) but I think + originally there was one threading flag, which was then + parcelled among many other flags but this separation was + never fully finished. It is also unclear what this + partition is good for as glibc does not use that so only + hand-written use of the clone permits a programmer to + access this features.</para> - <para>For non-threaded programs the PID and TID are the same. For - threaded programs the first thread PID and TID are the same and - every created thread shares the same PID and gets assigned a - unique TID (because <literal>CLONE_THREAD</literal> is passed in) - also parent is shared for all processes forming this threaded + <para>For non-threaded programs the PID and TID are the + same. 
For threaded programs the first thread PID and TID + are the same and every created thread shares the same PID + and gets assigned a unique TID (because + <literal>CLONE_THREAD</literal> is passed in) also parent + is shared for all processes forming this threaded program.</para> - <para>The code that implements &man.pthread.create.3; in NPTL defines - the clone flags like this:</para> + <para>The code that implements &man.pthread.create.3; in + NPTL defines the clone flags like this:</para> <programlisting>int clone_flags = (CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGNAL @@ -606,12 +668,13 @@ | 0);</programlisting> - <para>The <literal>CLONE_SIGNAL</literal> is defined like</para> + <para>The <literal>CLONE_SIGNAL</literal> is defined + like</para> <programlisting>#define CLONE_SIGNAL (CLONE_SIGHAND | CLONE_THREAD)</programlisting> - <para>the last 0 means no signal is sent when any of the threads - exits.</para> + <para>the last 0 means no signal is sent when any of the + threads exits.</para> </sect4> </sect3> </sect2> @@ -619,71 +682,80 @@ <sect2 xml:id="what-is-emu"> <title>What is emulation</title> - <para>According to a dictionary definition, emulation is the ability of - a program or device to imitate another program or device. This is - achieved by providing the same reaction to a given stimulus as the - emulated object. In practice, the software world mostly sees three - types of emulation - a program used to emulate a machine (QEMU, various - game console emulators etc.), software emulation of a hardware facility - (OpenGL emulators, floating point units emulation etc.) and operating - system emulation (either in kernel of the operating system or as a - userspace program).</para> + <para>According to a dictionary definition, emulation is the + ability of a program or device to imitate another program or + device. This is achieved by providing the same reaction to a + given stimulus as the emulated object. 
In practice, the + software world mostly sees three types of emulation - a + program used to emulate a machine (QEMU, various game console + emulators etc.), software emulation of a hardware facility + (OpenGL emulators, floating point units emulation etc.) and + operating system emulation (either in kernel of the operating + system or as a userspace program).</para> - <para>Emulation is usually used in a place, where using the original - component is not feasible nor possible at all. For example someone - might want to use a program developed for a different operating - system than they use. Then emulation comes in handy. Sometimes - there is no other way but to use emulation - e.g. when the hardware - device you try to use does not exist (yet/anymore) then there is no - other way but emulation. This happens often when porting an operating + <para>Emulation is usually used in a place, where using the + original component is not feasible nor possible at all. For + example someone might want to use a program developed for a + different operating system than they use. Then emulation + comes in handy. Sometimes there is no other way but to use + emulation - e.g. when the hardware device you try to use does + not exist (yet/anymore) then there is no other way but + emulation. This happens often when porting an operating system to a new (non-existent) platform. Sometimes it is just cheaper to emulate.</para> - <para>Looking from an implementation point of view, there are two main - approaches to the implementation of emulation. You can either emulate - the whole thing - accepting possible inputs of the original object, - maintaining inner state and emitting correct output based on the state - and/or input. This kind of emulation does not require any special - conditions and basically can be implemented anywhere for any - device/program. The drawback is that implementing such emulation is - quite difficult, time-consuming and error-prone. 
In some cases we can - use a simpler approach. Imagine you want to emulate a printer that - prints from left to right on a printer that prints from right to left. - It is obvious that there is no need for a complex emulation layer but - simply reversing of the printed text is sufficient. Sometimes the - emulating environment is very similar to the emulated one so just a - thin layer of some translation is necessary to provide fully working - emulation! As you can see this is much less demanding to implement, - so less time-consuming and error-prone than the previous approach. But - the necessary condition is that the two environments must be similar *** DIFF OUTPUT TRUNCATED AT 1000 LINES ***