Date: Sun, 9 Nov 1997 10:21:54 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: mike@smith.net.au (Mike Smith)
Cc: mini@d198-232.uoregon.edu, hackers@FreeBSD.ORG
Subject: Re: x86 gods; advice? Suggestions?
Message-ID: <199711091021.DAA24289@usr06.primenet.com>
In-Reply-To: <199711080954.UAA00629@word.smith.net.au> from "Mike Smith" at Nov 8, 97 08:24:16 pm
> > > This was suggested by another respondent.  I'd be very interested in
> > > knowing how I could arrange such a thing, either overloading the
> > > existing syscall callgate or making another for temporary use (I have
> > > another free descriptor that I can hijack for the purpose).
> >
> > I don't know, I've never done it myself, personally. :)
>
> Ah.  A handwave response.  8)  I'll look at it tomorrow; it was
> suggested that I buy a book - obviously a suggestion from someone that
> doesn't live in this book-desert.

I was the one who suggested a book:

        Protected Mode Software Architecture - PC System Architecture Series
        Tom Shanley
        MindShare, Inc.
        ISBN: 0-201-55447-X

Lest I be accused of handwaving-by-association yet again, here's the
process (people who are too lazy to wrap their brains around complicated
things should delete this message now instead of bitching at me like
they do when I talk about other complicated things they are too lazy to
wrap their brains around, like file systems, etc.):

o       Load all or part of the task into memory (minimally, startup
        code).

o       Create a TSS for the task.

        A TSS is an execution context for the processor at the point
        it first begins or resumes the execution of the task.  A
        special TSS segment descriptor is placed in the GDT defining
        the base address, length, and Descriptor Privilege Level for
        the TSS it points to.

        A TSS starts at the TR specified TSS base address and extends
        to the TSS limit, also from the TR.  It looks like:

             bit 0                                   bit 31
        00   Link (old TSS selector)       0000000000000000
        04   ESP0
        08   SS0                           0000000000000000
        0C   ESP1
        10   SS1                           0000000000000000
        14   ESP2
        18   SS2                           0000000000000000
        1C   CR3
        20   EIP
        24   EFLAGS
        28   EAX
        2C   ECX
        30   EDX
        34   EBX    <-- this ordering explains a bit, eh?
        38   ESP
        3C   EBP
        40   ESI
        44   EDI
        48   ES                            0000000000000000
        4C   CS                            0000000000000000
        50   SS                            0000000000000000
        54   DS                            0000000000000000
        58   FS                            0000000000000000
        5C   GS                            0000000000000000
        60   Task's LDT selector           0000000000000000
        64   X000000000000000    Base address of I/O Map (*)

        Optional additional information:

        You can follow this starting at 68 with OS specific data of
        arbitrary length (I wouldn't make it too large... the whole
        thing shouldn't exceed 64k, and should be longword aligned --
        the I/O Map Base address is a 16 bit offset).

        If you are using the Appendix H disclosures to set the virtual
        8086 mode interrupt handling by setting bit 0 (Later 486 and
        Pentium processors or better only), then a CLI, STI, INT,
        PUSHF, POPF, or IRET is not handled the same way as on the 386
        (and 486's before a rev or two after the Pentium was
        introduced).  Instead, the IF flag is shadowed in a bit
        called VIF (V=virtual), and modifying EFLAGS IF modifies this
        instead.  A bitmap that must be exactly 8 longwords long (256
        bits, one per interrupt vector) then lets you let the
        interrupt be handled in "real mode" instead of trapping to
        your Virtual Machine Monitor if the bit is set.

        The "Base address of I/O Map" field points to the first
        longword immediately following the above two areas (i.e., the
        processor uses this value and a negative offset of 32 bytes
        to access the virtual interrupt processing bitmap).

        The I/O Map must be present if your VM86 task is going to
        access I/O ports directly.  If you virtualize all accesses,
        you do not need a bitmap.  In theory, the bitmap needs to be
        8k (one bit for each of the 64k ports).  In practice, you can
        do it in longword chunks up to the TSS limit from the TR if
        you don't want to permit accesses to ports above some
        arbitrary limit.
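
The layout above can be written down as a C structure.  This is only a
sketch: the names (struct tss32, the *_pad members, iomap_offset,
iomap_bytes) are made up for illustration and are not taken from FreeBSD
or from the book, and it assumes a compiler that inserts no padding
between these naturally aligned fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit TSS, fields in the order of the table above.  The *_pad
 * members are the 16-bit zero fields next to each selector. */
struct tss32 {
    uint16_t link, link_pad;        /* 00 */
    uint32_t esp0;                  /* 04 */
    uint16_t ss0, ss0_pad;          /* 08 */
    uint32_t esp1;                  /* 0C */
    uint16_t ss1, ss1_pad;          /* 10 */
    uint32_t esp2;                  /* 14 */
    uint16_t ss2, ss2_pad;          /* 18 */
    uint32_t cr3;                   /* 1C */
    uint32_t eip;                   /* 20 */
    uint32_t eflags;                /* 24 */
    uint32_t eax, ecx, edx, ebx;    /* 28 2C 30 34 */
    uint32_t esp, ebp, esi, edi;    /* 38 3C 40 44 */
    uint16_t es, es_pad;            /* 48 */
    uint16_t cs, cs_pad;            /* 4C */
    uint16_t ss, ss_pad;            /* 50 */
    uint16_t ds, ds_pad;            /* 54 */
    uint16_t fs, fs_pad;            /* 58 */
    uint16_t gs, gs_pad;            /* 5C */
    uint16_t ldt, ldt_pad;          /* 60 */
    uint16_t debug_trap;            /* 64: bit 0 is the X flag */
    uint16_t iomap_base;            /* 66: 16 bit offset from TSS base */
};

/* Value for iomap_base: the fixed part, then the OS specific data
 * rounded up to a longword, then (optionally) the 32-byte virtual
 * interrupt bitmap that sits just below the I/O map. */
static uint16_t iomap_offset(size_t os_data_len, int has_int_bitmap)
{
    size_t off = sizeof(struct tss32) + ((os_data_len + 3) & ~(size_t)3);

    if (has_int_bitmap)
        off += 32;
    assert(off <= 0xFFFF);          /* the field is only 16 bits wide */
    return (uint16_t)off;
}

/* Bytes of I/O map needed to cover ports 0..max_port, allocated in
 * longword chunks as described above. */
static size_t iomap_bytes(uint16_t max_port)
{
    return ((size_t)max_port / 32 + 1) * 4;
}
```

With these helpers, iomap_bytes(0xFFFF) gives the full 8k map, while a
task limited to ports below 0x400 needs only 128 bytes.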
        (*) If X = 1, a debug exception occurs when switching to the
            task.

        The TSS descriptor looks like:

        Byte    bit 0                       bit 7
        0       LSB of Segment Size
        1       2nd byte of Segment Size
        2       LSB of Base Address
        3       2nd byte of Base Address
        4       3rd byte of Base Address
        5       1   B   0   X   S   D1  D2  P
        6       [   nibble  ]   U   0   0   G
        7       4th byte of Base Address

        B       1 = task is busy
        X       0 = 16 bit TSS, 1 = 32 bit TSS
        S       0 = system segment (must be zero in TSS descriptor)
        D1&D2   Descriptor Privilege Level (0 for OS, 3 for user/VM86)
        P       1 = Segment present
        nibble  upper nibble of Segment Size (20 bits total)
        U       user bit (can be used by FreeBSD, etc.)
        G       Granularity of Segment Size (0 = bytes, 1 = pages)

o       Set up a hardware timer to force you back.  PCAUDIO is a
        definite drag at this point because of interrupts.

o       Switch to the task using a far call or jump that selects the
        TSS descriptor in the GDT.  This implies you have a TSS for
        the OS, since the processor is going to copy the registers
        out to the OS's TSS.

o       The processor loads the new task's TSS and uses the CS:EIP
        from the new TSS to start fetching (and executing) code from
        the new task.

This sort of answers Tony Overfield's "3. Something else..." question:
you can get a Task Gate Descriptor in the LDT or GDT or IDT.  This means
"Something else..." can be triggered by:

o       A far call
o       A jump
o       A hardware interrupt
o       A software exception
o       An INT instruction (hello, thunking...)

You use a VMM to field the event (in the INT case) in a memory extender;
basically you need a VMM to do VM86 mode anyway.  It works by being the
general purpose exception handler code.  When a VM86 task takes an
exception, the processor pushes the EFLAGS register on the stack, but
then clears the VM bit, disabling VM86 mode.  The exception handler
looks at the EFLAGS on the stack to see if the VM bit is set; if it
isn't, it does the normal protected mode stuff.  If it is, however, the
VMM must determine the action requested by the DOS task, and figure out
how to do it itself.
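
Two of the pieces above can be sketched in C: packing the 8-byte TSS
descriptor, and the VM bit test at the top of the exception handler.
The names here (struct tss_desc, make_tss_desc, trap_came_from_vm86)
are hypothetical, chosen for illustration only; the one constant not
given in the text is the position of the VM flag, which is bit 17 of
EFLAGS.

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_VM 0x00020000u   /* VM flag: bit 17 of EFLAGS */

/* 8-byte TSS descriptor, bytes in the order of the table above. */
struct tss_desc {
    uint8_t b[8];
};

/* Build a descriptor for a present, not-busy, 32 bit TSS.
 * limit is the 20-bit Segment Size; page_granular selects G. */
static struct tss_desc make_tss_desc(uint32_t base, uint32_t limit,
                                     unsigned dpl, int page_granular)
{
    struct tss_desc d;

    assert(limit <= 0xFFFFF);       /* only 20 bits are available */
    d.b[0] = (uint8_t)(limit & 0xFF);
    d.b[1] = (uint8_t)((limit >> 8) & 0xFF);
    d.b[2] = (uint8_t)(base & 0xFF);
    d.b[3] = (uint8_t)((base >> 8) & 0xFF);
    d.b[4] = (uint8_t)((base >> 16) & 0xFF);
    /* bit 0 = 1, B = 0, bit 2 = 0, X = 1 (32 bit), S = 0, DPL, P = 1 */
    d.b[5] = (uint8_t)(0x01u | (1u << 3) | ((dpl & 3u) << 5) | (1u << 7));
    /* upper nibble of the size, U = 0, two zero bits, G */
    d.b[6] = (uint8_t)(((limit >> 16) & 0x0F) | (page_granular ? 0x80u : 0u));
    d.b[7] = (uint8_t)(base >> 24);
    return d;
}

/* The test the exception handler makes on the EFLAGS image that was
 * pushed on its stack: nonzero means the trap came from a VM86 task
 * and the VMM has to work out what the DOS task wanted. */
static int trap_came_from_vm86(uint32_t saved_eflags)
{
    return (saved_eflags & EFLAGS_VM) != 0;
}
```

For a TSS with no optional data, make_tss_desc(base, 0x67, 0, 0) gives
an access byte of 0x89.  Here 0x67 is the last valid offset of the
0x68-byte TSS: the hardware treats the size field as a limit, so it is
one less than the byte count.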
For example, an INT 21 against 0x80 could be handled as a request to a
virtual "C:" actually implemented as a subhierarchy of an FFS mounted
volume (Yet Another Reason To Allow Terry's Layering Fixes: now there
are *4* VFS consumers: system calls, the NFS server, the kld code, and
the newly defined VMM fielding INT 21 calls on behalf of a DOS process.
Now we *really* can't keep our eyes tightly closed and pretend it's only
system calls...).

> >  - a 32-bit 'we are the kernel now' context,
> >  - a 16-bit protected-mode 'let's play with the BIOS' context, and
> >  - a 16-bit vm86 'let's pretend we are a Microsoft OS' context.
>
> Too complicated, and inadequate.  There is a separate 32-bit context
> needed for the APM BIOS as well, and we are out of descriptors already.
> The vm86 support handles this differently by creating a kernel process
> at a later stage.

We want to *not* virtualize INT 21 (or more likely INT 13) in the case
where we are running a fallback driver, and *not* virtualize INT 10 in
the case where we are calling it to set video modes via BIOS, etc.
(against my better judgement: many VGA cards disable interrupts in BIOS
to get rid of "sparklies" any time you call INT 10 -- can you say
"sucks"?  I knew you could...).

Because of that, we want to have at least three TSS's: one for the OS,
one for the OS to make BIOS calls with, and one for the OS to run a VM86
for DOS under UNIX (and depending on the complexity of the VMM, maybe
even emulate the functions of the 386 to the point of running Windows
95, like some other x86 protected mode OS's can).

The APM mention above would be handled in the second context, above.

Technically, you could do this using only two GDT entries: one for the
OS (in this case FreeBSD), and one that you switched off between as many
others as you wanted, so long as they never trapped back to something
other than the OS exception handler (acting as the VMM) to do their
thing.
Practically, there are far too many things that rely on the "DOS not
busy" interrupt at INT 27 for this to work well... for example, if you
wanted to unset the bit for an INT for a network card in the OS TSS and
make the VMM switch to a VM86 and run a DOS network card driver to field
the interrupt and use real mode IPX or ODI or whatever stacks, and then
pass the data to FreeBSD via thunk.  This would leave you with two
TSS's, which, together, implemented the actual OS (instead of one).

Any card driver that ran in DOS but not in FreeBSD could, in principle,
be handled the same way.

> >    - hop into protected mode, create a vm86 task which handle
> >      loading the kernel.  (map the vm86 1M+ range to where you want the
> >      kernel to go, or do a 1:1 map of all physical memory, and then set
> >      the vm86's descriptor limit to 4G or so. Do all the loading from
> >      vm86 mode. much easier code to look at)
>
> Unfortunately, we have no tools for writing realmode code.  I was
> perhaps somewhat misleading regarding running the bootstrap in real
> mode; please look at the code for an understanding of the issues.

I agree.  We want to enter protected mode as early as we can, build a
VM86, and *stay* in protected mode ever after.

> This isn't making a lot of sense to me.  Are you implying that one
> could be in 32-bit PM and vm86 mode at the same time?

No.  You can swap them, but one has 16 bit segments, the other 32, so at
least two additional TSS's are needed ...but they could be virtual, as
stated above, so you'd have only two total real GDT entries, if you felt
you were running low...
Buy the book; it goes into much more detail than I did, including VM86
mode, task creation, the "magic registers" from Appendix H, and how to
do a virtualized INT 10 for a VM86 task to think it has video hardware.

It doesn't cover screen writes, but you can do that fairly simply by
marking the "screen memory" read-only and taking an exception when it's
written; use two timers, the first for latency so that inactivity causes
the "real" screen, probably an X window or virtual console, to be
updated, and the second, longer one for interval so that even if the
display is highly active, you mirror the changes to that point.  This
lets you do things like graphics fairly quickly, at the cost of keeping
a "diff" copy around to note deltas at any given timer firing.  (Yet
Another Reason To Listen To Terry And Put DDX In The Kernel: console
graphics for programs running under VM86...)

This is entirely more than I had intended to type using only 9 fingers
(one of my fingers is currently on strike for the next 6 weeks or
so...).


                                        Terry Lambert
                                        terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199711091021.DAA24289>