Date: Thu, 23 Nov 2000 11:03:53 -0800
From: Julian Elischer <julian@elischer.org>
To: John Baldwin <jhb@FreeBSD.org>
Cc: arch@FreeBSD.org
Subject: Re: Thread-specific data and KSEs
Message-ID: <3A1D6A19.801BBFA5@elischer.org>
References: <XFMail.001122141324.jhb@FreeBSD.org>

John Baldwin wrote:
>
> On 22-Nov-00 Terry Lambert wrote:
> >
> > The %gs register already has to be saved for WINE processes,
> > so it's taken (at least when USER_LDT is defined).  So there
> > would not be an additional context switch for it.
>
> Ok.  Since %fs is only used in the kernel and is saved/restored it might be a
> good thing to use instead.

OK, so let's do a little kernel code inspection.

First let's look at where the regs are saved. The trapframe looks as
follows (frame.h), slightly cut down ([...]). System calls are treated
as a trap, so this should be a good starting point.

/*
 * Exception/Trap Stack Frame
 */
struct trapframe {
        int     tf_fs;
        int     tf_es;
        int     tf_ds;
[...]
        int     tf_cs;
        int     tf_eflags;
        /* below only when crossing rings (e.g. user to kernel) */
        int     tf_esp;
        int     tf_ss;
};

/* Superset of trap frame, for traps from virtual-8086 mode */
struct trapframe_vm86 {
        int     tf_fs;
        int     tf_es;
        int     tf_ds;
[...]
        int     tf_cs;
        int     tf_eflags;
        /* below only when crossing rings (e.g. user to kernel) */
        int     tf_esp;
        int     tf_ss;
        /* below only when switching out of VM86 mode */
        int     tf_vm86_es;
        int     tf_vm86_ds;
        int     tf_vm86_fs;
        int     tf_vm86_gs;
};

/* Interrupt stack frame */
struct intrframe {
        int     if_vec;
        int     if_fs;
        int     if_es;
        int     if_ds;
[...]
        /* below portion defined in 386 hardware */
        int     if_eip;
        int     if_cs;
        int     if_eflags;
        /* below only when crossing rings (e.g. user to kernel) */
        int     if_esp;
        int     if_ss;
};

/* frame of clock (same as interrupt frame) */
struct clockframe {
        int     cf_vec;
        int     cf_fs;
        int     cf_es;
        int     cf_ds;
[...]
        int     cf_cs;
        int     cf_eflags;
        /* below only when crossing rings (e.g. user to kernel) */
        int     cf_esp;
        int     cf_ss;
};

So, as you see, there is space for %fs to be saved, but in general no
place for %gs (except in the VM86 case). This kinda suggests that %fs is
the way to go. (So far it appears that %gs can't be in use at the moment.)

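A small user-space model of the save/restore John describes may help here. It is only a sketch, not kernel code: cpu_model, trapframe_model, trap_enter, trap_exit and the selector value KERNEL_FS_SEL are names made up for illustration. It shows that a register with a slot in the frame (%fs) survives the kernel reloading it for its own use, while a register with no slot (%gs) only survives as long as the trap path leaves it alone.

```c
#include <assert.h>

/* Hypothetical selector the kernel loads into %fs for its per-CPU data. */
#define KERNEL_FS_SEL 0x18

/* Stand-in for the live segment registers of one CPU. */
struct cpu_model {
    int fs;
    int gs;
};

/* Cut-down frame: like the trapframe above it has tf_fs but no tf_gs. */
struct trapframe_model {
    int tf_fs;
};

/* Trap entry: save the user's %fs in the frame, then load the kernel's. */
static void trap_enter(struct cpu_model *cpu, struct trapframe_model *tf)
{
    tf->tf_fs = cpu->fs;
    cpu->fs = KERNEL_FS_SEL;
    /* %gs has no slot in the frame, so this path must not touch it. */
}

/* Trap exit: put the user's %fs back from the frame. */
static void trap_exit(struct cpu_model *cpu, const struct trapframe_model *tf)
{
    assert(cpu->fs == KERNEL_FS_SEL);   /* still running "in the kernel" */
    cpu->fs = tf->tf_fs;
}
```

Calling trap_enter() and then trap_exit() on a cpu_model leaves both fs and gs with their user values, but for different reasons: fs because it was saved and restored, gs because nothing in between wrote to it.
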
In signal.h the osigcontext looks like (showing only segment regs):

struct osigcontext {
        int     sc_onstack;     /* sigstack state to restore */
        osigset_t sc_mask;      /* signal mask to restore */
[...]
        int     sc_es;
        int     sc_ds;
        int     sc_cs;
        int     sc_ss;
[...]
        int     sc_gs;
        int     sc_fs;
        int     sc_trapno;
        int     sc_err;
};

which has places for both %gs and %fs.

Similarly the new sigcontext given to the process is:

/*
 * The sequence of the fields/registers in struct sigcontext should match
 * those in mcontext_t.
 */
struct sigcontext {
        sigset_t sc_mask;       /* signal mask to restore */
        int     sc_onstack;     /* sigstack state to restore */
        int     sc_gs;          /* machine state (struct trapframe): */
        int     sc_fs;
        int     sc_es;
        int     sc_ds;
[...]
        int     sc_cs;
        int     sc_efl;
        int     sc_esp;
        int     sc_ss;
[...]
};

Once again both %gs and %fs are supported, so signals should be able to
cope with either.

reg.h shows what /proc supports (both %fs and %gs).

proc.h includes a trapframe (see above) via machine/proc.h, so the proc
structure (and thus the KSEC eventually) holds %fs but not %gs.

In trap.c there is the following code that might have to be understood
for this to work:

void
trap(frame)
{
[...]
        if ((ISPL(frame.tf_cs) == SEL_UPL) ||
            ((frame.tf_eflags & PSL_VM) && !in_vm86call)) {
                /* user trap */
[...]
        } else {
                /* kernel trap */
[...]
                case T_SEGNPFLT:        /* segment not present fault */
                        if (in_vm86call)
                                break;

                        if (intr_nesting_level != 0)
                                break;

                        /*
                         * Invalid %fs's and %gs's can be created using
                         * procfs or PT_SETREGS or by invalidating the
                         * underlying LDT entry.  This causes a fault
                         * in kernel mode when the kernel attempts to
                         * switch contexts.  Lose the bad context
                         * (XXX) so that we can continue, and generate
                         * a signal.
                         */
                        if (frame.tf_eip == (int)cpu_switch_load_gs) {
                                curpcb->pcb_gs = 0;
                                psignal(p, SIGBUS);
                                goto out;
                        }

I notice that %fs is not touched (maybe it's fixed elsewhere), but this
suggests that %gs and %fs are being loaded, or the fault wouldn't happen.
So where is %gs being loaded from?

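The T_SEGNPFLT case boils down to one decision: if the fault hit the instruction that reloads %gs during a context switch, throw away the bad selector and signal the process. Below is a user-space model of just that decision, not the kernel code. The names pcb_model and recover_bad_gs, the pending_signal field, and the use of plain integers for instruction addresses are all assumptions made for the sketch.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the part of the pcb that matters here. */
struct pcb_model {
    int pcb_gs;           /* saved user %gs selector */
    int pending_signal;   /* hypothetical: signal to post, 0 if none */
};

/*
 * Returns true if the fault was at the %gs reload and has been recovered.
 * load_gs_eip models the address of the cpu_switch_load_gs label.
 */
static bool recover_bad_gs(uintptr_t fault_eip, uintptr_t load_gs_eip,
                           struct pcb_model *pcb)
{
    if (fault_eip != load_gs_eip)
        return false;               /* some other fault; not handled here */
    pcb->pcb_gs = 0;                /* lose the bad context */
    pcb->pending_signal = SIGBUS;   /* models psignal(p, SIGBUS) */
    return true;
}
```

If a user-settable %fs ended up being reloaded in a similar spot, a second check of the same shape would presumably be needed there too, unless it is already handled elsewhere.
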
In proc.h the proc structure includes:

        struct  mdproc p_md;    /* Any machine-dependent fields. */

which from i386/include/proc.h is:

struct mdproc {
        struct trapframe *md_regs;      /* registers on current frame */
};

which, as we see above, does not include room for %gs. However, this
appears misleading, because the structure 'pcb' in i386/include/pcb.h
does include a field for %gs. A pointer to the current pcb is part of
the per-CPU global data in globals.h. The pcb also appears in user.h and
as such is in the user structure, which is pointed to by p_addr in the
proc structure.

And lo and behold, there it is: a place to store the %gs register as
well. (Why it's not in the proc structure I don't follow.)

swtch.s seems to save it nicely with:

        movl    %gs,PCB_GS(%edx)

and I'm sure that %fs is similarly saved (it's on the stack).

So it looks like you should be able to go ahead and use those registers.
We will need to duplicate the U-area for KSEs anyhow, so assuming that,
both regs would be OK.

Interestingly, they use %fs in the kernel, but since they now have a
per-CPU 4MB range of memory (where each CPU sees different physical pages
at the same address), it would now be possible for the kernel to drop
this. At the moment they are using %fs AND mapping, so in fact they are
mapping twice.

This brings up a possibility: if they have to fiddle the page maps for
each KSE anyhow (to put the different PDE in), they could just as easily
fiddle TWO entries and give us a 4K or 4MB (take your pick) KSE-dependent
region within the user space. That would not require ANY registers. The
trick would be to put a different PTE in for each KSE in the top page
table, just above the original stack. The top page table must already be
there and loaded, because the stack is in it.

The trouble with this idea is that it would require having code to keep
the rest of the PTEs (in the other KSEs' page tables) all in sync.

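To make the register-free idea concrete, here is a toy model in C. It is a sketch under loud assumptions: kse_page, kse_window, kse_switch, kse_get, kse_set and KSE_TSD_SLOTS are invented names, and a pointer swap stands in for the kernel rewriting one PTE when it switches KSEs. The only point is that user code always reads through the same fixed "address" and sees whichever KSE's page is currently mapped there.

```c
#include <assert.h>
#include <stddef.h>

#define KSE_TSD_SLOTS 8   /* hypothetical number of per-KSE slots */

/* Contents of the one page that differs per KSE. */
struct kse_page {
    int   kse_id;
    void *tsd[KSE_TSD_SLOTS];
};

/* Models the fixed user virtual address of the per-KSE window. */
static struct kse_page *kse_window;

/* Models the kernel changing the one PTE when it switches KSEs. */
static void kse_switch(struct kse_page *next)
{
    kse_window = next;
}

/* What user code would do: no segment register, just a fixed location. */
static void *kse_get(size_t slot)
{
    assert(kse_window != NULL && slot < KSE_TSD_SLOTS);
    return kse_window->tsd[slot];
}

static void kse_set(size_t slot, void *value)
{
    assert(kse_window != NULL && slot < KSE_TSD_SLOTS);
    kse_window->tsd[slot] = value;
}
```

Storing a value in slot 0 while one kse_page is "mapped", switching to another page and storing a different value, then switching back, returns the first value again from kse_get(0). The model does not capture the cost mentioned above, which is keeping every other PTE in the per-KSE page tables identical.
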
You could make the kernel take 4MB from each address space, put the
stack (etc.) below that, and make it illegal to map new pages into that
region. That way the kernel would only have to keep track of the single
page it allocates into that space per KSE. (The VM would have to be in
on that act.. yech.)

Or alternatively, you could allow the user to access a page in the
existing PER_CPU region. (Yeah, I know it's at the top of memory, above
where the user process can usually touch, but we could set a segment up
there and allow it to get to it.) You'd get a 4K window at 0xffaxx000 or
somewhere.

> > I think that if you guys go forward with this, you should do an
> > indirect through whatever you end up using.  I realize this will
> > cost an additional 6 clock cycles, but it will let you expand
> > the list of things indefinitely, going forward, instead of having
> > to keep a register dedicated for backward compatibility, and then
> > somehow "grow a new one" when you need to do something similar to
> > this again, in the future.
>
> It will be an indirect if I have any say in it. :)  Currently we use %fs in the
> kernel to address a segment that contains per-CPU data.  I think that if we use
> a seg reg, then we should have it address a segment that contains per-KSE data.

For now I think that %fs is definitely safe, and if you use it, it could
be used as the entry into the per-KSE area I just mentioned (with,
interestingly, almost the same contents as the kernel uses).

>
> John Baldwin <jhb@FreeBSD.org> -- http://www.FreeBSD.org/~jhb/
> PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
> "Power Users Use the Power to Serve!"
>   - http://www.FreeBSD.org/
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-arch" in the body of the message

-- 
      __--_|\  Julian Elischer
     /       \ julian@elischer.org
    (   OZ    ) World tour 2000
---> X_.---._/  presently in:  Budapest
            v