Date: Thu, 30 May 2002 21:29:54 -0400 From: Jake Burkholder <jake@locore.ca> To: Julian Elischer <julian@elischer.org> Cc: FreeBSD current users <current@FreeBSD.ORG> Subject: Re: Seeking OK to commit KSE MIII-again Message-ID: <20020530212954.S62759@locore.ca> In-Reply-To: <Pine.BSF.4.21.0205300904260.23153-100000@InterJet.elischer.org>; from julian@elischer.org on Thu, May 30, 2002 at 09:20:57AM -0700 References: <XFMail.20020530105952.jhb@FreeBSD.org> <Pine.BSF.4.21.0205300904260.23153-100000@InterJet.elischer.org>
next in thread | previous in thread | raw e-mail | index | archive | help
apparently, On Thu, May 30, 2002 at 09:20:57AM -0700, Julian Elischer said words to the effect of; > > > ok, but does anyone other than john (who has commented) have any comments > about the logic and work in the change? > > I'm working on his comments but comments by others would sure be > appreciated.. > especially if they actually comment on what I'm trying to do.. > > If I can get the changes for the other architectures done, > I'd like to commit this weekend. HOPEFULLY it shouldn't > affect normal operations but of course the testing done by two people > can't hope to equal that which will be done in teh first 24 hours > once it's committed :-) > > once again: > > the diffs are at: > http://people.freebsd.org/~peter/kse.diff > and > http://people.freebsd.org/~julian/thediff > and the diffs I need for other architectures are versions of: > > sys/i386/i386/genassym.c (small) > sys/i386/i386/machdep.c (1 line) > sys/i386/i386/swtch.s (a few lines) > sys/i386/i386/trap.c (small) > sys/i386/i386/vm_machdep.c (largly new functions, we could stub them) > sys/i386/include/kse.h (new file) > sys/i386/linux/linux_machdep.c (one line) > > Largely these need to be written by someone who is intimately aquainted > with the register set of the machine in question and knows > what registers need to be saved to restore a user context correctly. > > Index: bin/ksetest/Makefile > =========================================================================== > Index: bin/ksetest/kse_asm.S > =========================================================================== > Index: bin/ksetest/kse_threads_test.c > =========================================================================== I don't know if you intended to commit this test program as well. Please do not. > Index: sys/i386/i386/trap.c > =========================================================================== > @@ -942,6 +948,23 @@ > td->td_frame = &frame; > if (td->td_ucred != p->p_ucred) > cred_update_thread(td); > + if (p->p_flag & P_KSES) { > + /* > + * If we are doing a syscall in a KSE environment, > + * note where our mailbox is. There is always the > + * possibility that we could do this lazily (in sleep()), > + * but for now do it every time. > + */ > + error = copyin((caddr_t)td->td_kse->ke_mailbox + > + offsetof(struct kse_mailbox, current_thread), > + &td->td_mailbox, sizeof(void *)); > + if (error || td->td_mailbox == NULL) { > + td->td_mailbox = NULL; /* single thread it.. */ > + td->td_flags &= ~TDF_UNBOUND; > + } else { > + td->td_flags |= TDF_UNBOUND; > + } > + } > params = (caddr_t)frame.tf_esp + sizeof(int); The places where you access user space to setup the linkage for the UTS should use fuword and suword instead of copyin and copyout, its faster and it makes the code clearer. > Index: sys/i386/i386/vm_machdep.c > =========================================================================== > --- sys/i386/i386/vm_machdep.c 2002/05/29 07:21:58 #21 > +++ sys/i386/i386/vm_machdep.c 2002/05/29 07:21:58 > @@ -283,6 +293,145 @@ > } > > void > +cpu_thread_setup(struct thread *td) > +{ > + > + td->td_pcb = > + (struct pcb *)(td->td_kstack + KSTACK_PAGES * PAGE_SIZE) - 1; > + td->td_frame = (struct trapframe *)((caddr_t)td->td_pcb - 16) - 1; > +} > + > +struct md_store { > + struct pcb mds_pcb; > + struct trapframe mds_frame; > +}; > + > +void > +cpu_save_upcall(struct thread *td, struct kse *newkse) > +{ > + > + /* Point the pcb to the top of the stack. */ > + newkse->ke_mdstorage = malloc(sizeof(struct md_store), M_TEMP, > + M_WAITOK); > + /* Note: use of M_WAITOK means it won't fail. */ > + newkse->ke_pcb = > + &(((struct md_store *)(newkse->ke_mdstorage))->mds_pcb); > + newkse->ke_frame = > + &(((struct md_store *)(newkse->ke_mdstorage))->mds_frame); > + > + /* Copy the upcall pcb. Kernel mode & fp regs are here. */ > + bcopy(td->td_pcb, newkse->ke_pcb, sizeof(struct pcb)); > + > + /* This copies most of the user mode register values. */ > + bcopy(td->td_frame, newkse->ke_frame, sizeof(struct trapframe)); > +} ke_frame, ke_pcb and ke_mdstorage should all be in a machine dependent struct mdkse, like mdproc. The fact that the storage is large enough to warrant using malloc is machine dependent, so it should not be a pointer. I would be inclined to just embed a trapframe. The pcb should not be needed at all here; all of the meaningful kernel mode register values are set below. Capturing the whole execution context at the time of the kse_new call (floating point registers, debug registers) may be expensive and I don't think is worth doing. The whole trick of a system call that returns multiple times is dubious. The fact that it works at all is machine dependent; for sparc64 it needs wierd hacks in the kernel like is done for vfork. It would be better to just register an upcall stack and entry point with the kernel, like how signals work. This would make mdkse even smaller. > +void > +cpu_set_upcall(struct thread *td, void *pcb) > +{ > + struct pcb *pcb2; > + > + td->td_flags |= TDF_UPCALLING; > + > + /* Point the pcb to the top of the stack. */ > + pcb2 = td->td_pcb; > + > + /* > + * Copy the upcall pcb. This loads kernel regs. > + * Those not loaded individually below get their default > + * values here. > + * > + * XXXKSE It might be a good idea to simply skip this as > + * the values of the other registers may be unimportant. > + * This would remove any requirement for knowing the KSE > + * at this time (see the matching comment below for > + * more analysis) (need a good safe default). > + */ > + bcopy(pcb, pcb2, sizeof(*pcb2)); > + > + /* > + * Create a new fresh stack for the new thread. > + * The -16 is so we can expand the trapframe if we go to vm86. > + * Don't forget to set this stack value into whatever supplies > + * the address for the fault handlers. > + * The contexts are filled in at the time we actually DO the > + * upcall as only then do we know which KSE we got. > + */ > + td->td_frame = (struct trapframe *)((caddr_t)pcb2 - 16) - 1; > + > + /* > + * Set registers for trampoline to user mode. Leave space for the > + * return address on stack. These are the kernel mode register values. > + */ > + pcb2->pcb_cr3 = vtophys(vmspace_pmap(td->td_proc->p_vmspace)->pm_pdir); > + pcb2->pcb_edi = 0; > + pcb2->pcb_esi = (int)fork_return; /* trampoline arg */ > + pcb2->pcb_ebp = 0; > + pcb2->pcb_esp = (int)td->td_frame - sizeof(void *); /* trampoline arg */ > + pcb2->pcb_ebx = (int)td; /* trampoline arg */ > + pcb2->pcb_eip = (int)fork_trampoline; > + /* > + * If we didn't copy the pcb, we'd need to do the following registers: > + * pcb2->pcb_dr*: cloned above. > + * pcb2->pcb_savefpu: cloned above. > + * pcb2->pcb_flags: cloned above. > + * pcb2->pcb_onfault: cloned above (always NULL here?). > + * pcb2->pcb_gs: cloned above. XXXKSE ??? > + * pcb2->pcb_ext: cleared below. > + */ > + pcb2->pcb_ext = NULL; > +} > > Index: sys/i386/include/kse.h > =========================================================================== > *** /dev/null Wed May 29 07:21:41 2002 > --- sys/i386/include/kse.h Wed May 29 07:22:12 2002 > *************** > *** 0 **** > --- 1,49 ---- > + /* > + * Copyright (C) 2002 Julian Elischer <julian@freebsd.org>. > + * All rights reserved. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * 1. Redistributions of source code must retain the above copyright > + * notice(s), this list of conditions and the following disclaimer as > + * the first lines of this file unmodified other than the possible > + * addition of one or more copyright notices. > + * 2. Redistributions in binary form must reproduce the above copyright > + * notice(s), this list of conditions and the following disclaimer in the > + * documentation and/or other materials provided with the distribution. > + * > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER(S) ``AS IS'' AND ANY > + * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED > + * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE > + * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER(S) BE LIABLE FOR ANY > + * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES > + * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR > + * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER > + * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT > + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY > + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH > + * DAMAGE. > + * > + * $FreeBSD$ > + */ > + > + #ifndef MACHINE_KSE_H > + #define MACHINE_KSE_H > + #include <machine/frame.h> > + #include <machine/ucontext.h> > + > + union kse_td_ctx { > + struct { > + int if_onstack; > + struct intrframe if_if; > + } intrfrm; > + struct { > + int tf_onstack; > + int tf_gs; > + struct trapframe tf_tf; > + } tfrm; > + mcontext_t mcontext; > + }; Please do not export trapframe and intrframe to userland like this; just use mcontext. Doing this makes it part of the kernel->userland ABI more so than it already is, debuggers use it for core dumps which is bad enough. mcontext is already there for this purpose and required to have a stable ABI. > Index: sys/kern/kern_intr.c > =========================================================================== > --- sys/kern/kern_intr.c 2002/05/29 07:21:58 #15 > +++ sys/kern/kern_intr.c 2002/05/29 07:21:58 > @@ -387,21 +386,22 @@ > */ > ithread->it_need = 1; > mtx_lock_spin(&sched_lock); > - if (p->p_stat == SWAIT) { > + if (td->td_state == TDS_IWAIT) { > CTR2(KTR_INTR, "%s: setrunqueue %d", __func__, p->p_pid); > - p->p_stat = SRUN; > - setrunqueue(td); /* XXXKSE */ > - if (do_switch && curthread->td_critnest == 1 && > - curthread->td_proc->p_stat == SRUN) { > + setrunqueue(td); > + if (do_switch && > + (curthread->td_critnest == 1) && > + (curthread->td_proc->p_state == TDS_RUNNING)) { Is this a bug? TDS_RUNNING is a thread state but its testing the process's state. Seems like this will make preemptions not happen. > Index: sys/kern/kern_synch.c > =========================================================================== > --- sys/kern/kern_synch.c 2002/05/29 07:21:58 #24 > +++ sys/kern/kern_synch.c 2002/05/29 07:21:58 > @@ -788,37 +881,41 @@ > struct proc *p = td->td_proc; > > mtx_lock_spin(&sched_lock); > - switch (p->p_stat) { > - case SZOMB: /* not a thread flag XXXKSE */ > + switch (p->p_state) { > + case PRS_ZOMBIE: > panic("setrunnable(1)"); > + default: > } > - switch (td->td_proc->p_stat) { > + switch (td->td_state) { > case 0: > - case SRUN: > - case SWAIT: > + case TDS_RUNNING: > + case TDS_IWAIT: > default: > + printf("state is %d", td->td_state); > panic("setrunnable(2)"); One of these panics was hit by a user running 5.0-DP1. I guess you missed their mail. > Index: sys/kern/kern_thread.c > =========================================================================== > + /* > + * Tear down type-stable parts of a thread (just before being discarded). > + */ > + static void > + thread_fini(void *mem, int size) > + { > + struct thread *td; > + > + KASSERT((size == sizeof(struct thread)), > + ("size mismatch: %d != %d\n", size, sizeof(struct thread))); > + > + td = (struct thread *)mem; > + pmap_dispose_thread(td); > + vm_object_deallocate(td->td_kstack_obj); > + cached_threads--; /* XXXSMP */ > + allocated_threads--; /* XXXSMP */ > + } Where is the kva space for the kernel stack freed? Before it was not freed at all, and relied on stable storage to still be around when the thread was allocated again. The fini callout is called when the memory is being freed for real and will no longer be type stable. I think this leaks KSTACK_PAGES pages of kva space per thread when the thread zone is drained. > + int > + thread_export_context(struct thread *td) > + { > + struct kse *ke; > + void *td2_mbx; > + void *addr1; > + void *addr2; > + int error; > + > + /* Export the register contents. */ > + error = cpu_export_context(td); > + > + ke = td->td_kse; > + addr1 = (caddr_t)ke->ke_mailbox > + + offsetof(struct kse_mailbox, completed_threads); > + addr2 = (caddr_t)td->td_mailbox > + + offsetof(struct thread_mailbox , next_completed); > + /* Then link it into it's KSE's list of completed threads. */ > + if (!error) > + error = copyin( addr1, &td2_mbx, sizeof(void *)); > + if (!error) > + error = copyout(&td2_mbx, addr2, sizeof(void *)); > + if (!error) > + error = copyout(&td->td_mailbox, addr1, sizeof(void *)); > + return (error); > + } Please use fuword and suword here. > Index: sys/kern/subr_trap.c > =========================================================================== > --- sys/kern/subr_trap.c 2002/05/29 07:21:58 #20 > +++ sys/kern/subr_trap.c 2002/05/29 07:21:58 > + > + /* > + * There is no more work to do and we are going to ride > + * this thead/KSE up to userland. Make sure the user's > + * pointer to the thread mailbox is cleared before we > + * re-enter the kernel next time for any reason.. > + * We might as well do it here. > + */ > + td->td_flags &= ~TDF_UPCALLING; /* Hmmmm. */ > + error = copyout(&dummy, /* NULL */ > + (caddr_t)td->td_kse->ke_mailbox + > + offsetof(struct kse_mailbox, current_thread), > + sizeof(void *)); suword(&td->td_kse->ke_mailbox->current_thread, 0); would be much clearer here. [...] Before I spend much time on the machine dependent code for sparc64 I would like to see a well defined kernel->userland interface, with long term ABI issues taken into consideration. I'm not convinced that the current kse_new scheme will work; I would much rather see just an entry point, which is where the upcalls begin executing. This is how the netbsd upcall works as far as I can tell. It is much more difficult to ensure that all the register values end up the same on each return from the system call on sparc64, due to the way that register stack works. The current test program will not work at all, because setjmp, longjmp cannot be used to switch the stack in the same way. Jake To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-current" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20020530212954.S62759>