From owner-freebsd-current  Thu May 30 19: 0:45 2002
Delivered-To: freebsd-current@freebsd.org
Received: from sccrmhc03.attbi.com (sccrmhc03.attbi.com [204.127.202.63])
	by hub.freebsd.org (Postfix) with ESMTP id E85B737B400
	for <current@FreeBSD.ORG>; Thu, 30 May 2002 19:00:11 -0700 (PDT)
Received: from InterJet.elischer.org ([12.232.206.8])
          by sccrmhc03.attbi.com
          (InterMail vM.4.01.03.27 201-229-121-127-20010626) with ESMTP
          id <20020531020010.NHBJ20219.sccrmhc03.attbi.com@InterJet.elischer.org>;
          Fri, 31 May 2002 02:00:10 +0000
Received: from localhost (localhost.elischer.org [127.0.0.1])
	by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id SAA25508;
	Thu, 30 May 2002 18:56:31 -0700 (PDT)
Date: Thu, 30 May 2002 18:56:30 -0700 (PDT)
From: Julian Elischer <julian@elischer.org>
To: Jake Burkholder <jake@locore.ca>
Cc: FreeBSD current users <current@FreeBSD.ORG>
Subject: Re: Seeking OK to commit KSE MIII-again
In-Reply-To: <20020530212954.S62759@locore.ca>
Message-ID: <Pine.BSF.4.21.0205301816320.23947-100000@InterJet.elischer.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-current.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-current>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-current>
X-Loop: FreeBSD.ORG


On Thu, 30 May 2002, Jake Burkholder wrote:

> apparently, On Thu, May 30, 2002 at 09:20:57AM -0700,
> 	Julian Elischer said words to the effect of;
> 
> > 
> > 
> 
> > Index: bin/ksetest/Makefile
> > ===========================================================================
> > Index: bin/ksetest/kse_asm.S
> > ===========================================================================
> > Index: bin/ksetest/kse_threads_test.c
> > ===========================================================================
> 
> I don't know if you intended to commit this test program as well.
> Please do not.

not planing to commit it, (except maybe under 'tools')


> 
> > Index: sys/i386/i386/trap.c
> > ===========================================================================
> > @@ -942,6 +948,23 @@
> >  	td->td_frame = &frame;
> >  	if (td->td_ucred != p->p_ucred) 
> >  		cred_update_thread(td);
> > +	if (p->p_flag & P_KSES) {
> > +		/*
> > +		 * If we are doing a syscall in a KSE environment,
> > +		 * note where our mailbox is. There is always the
> > +		 * possibility that we could do this lazily (in sleep()),
> > +		 * but for now do it every time.
> > +		 */
> > +		error = copyin((caddr_t)td->td_kse->ke_mailbox +
> > +		    offsetof(struct kse_mailbox, current_thread),
> > +		    &td->td_mailbox, sizeof(void *));
> > +		if (error || td->td_mailbox == NULL) {
> > +			td->td_mailbox = NULL;	/* single thread it.. */
> > +			td->td_flags &= ~TDF_UNBOUND;
> > +		} else {
> > +			td->td_flags |= TDF_UNBOUND;
> > +		}
> > +	}
> >  	params = (caddr_t)frame.tf_esp + sizeof(int);
> 
> The places where you access user space to setup the linkage for the
> UTS should use fuword and suword instead of copyin and copyout, its
> faster and it makes the code clearer.

Great! I've seen them mentionned several times and thought
"I should see what they are" but never done so..

exactly what I want!

> 
> > Index: sys/i386/i386/vm_machdep.c
> > ===========================================================================
> > --- sys/i386/i386/vm_machdep.c	2002/05/29 07:21:58	#21
> > +++ sys/i386/i386/vm_machdep.c	2002/05/29 07:21:58
> > @@ -283,6 +293,145 @@
> >  }
> >  
> >  void
> > +cpu_thread_setup(struct thread *td)
> > +{
> > +
> > +	td->td_pcb =
> > +	     (struct pcb *)(td->td_kstack + KSTACK_PAGES * PAGE_SIZE) - 1;
> > +	td->td_frame = (struct trapframe *)((caddr_t)td->td_pcb - 16) - 1;
> > +}
> > +
> > +struct md_store {
> > +	struct pcb mds_pcb;
> > +	struct trapframe mds_frame;
> > +};
> > +
> > +void
> > +cpu_save_upcall(struct thread *td, struct kse *newkse)
> > +{
> > +
> > +	/* Point the pcb to the top of the stack. */
> > +	newkse->ke_mdstorage = malloc(sizeof(struct md_store), M_TEMP,
> > +	    M_WAITOK);
> > +	/* Note: use of M_WAITOK means it won't fail. */
> > +	newkse->ke_pcb =
> > +	    &(((struct md_store *)(newkse->ke_mdstorage))->mds_pcb);
> > +	newkse->ke_frame =
> > +	    &(((struct md_store *)(newkse->ke_mdstorage))->mds_frame);
> > +
> > +	/* Copy the upcall pcb. Kernel mode & fp regs are here. */
> > +	bcopy(td->td_pcb, newkse->ke_pcb, sizeof(struct pcb));
> > +
> > +	/* This copies most of the user mode register values. */
> > +	bcopy(td->td_frame, newkse->ke_frame, sizeof(struct trapframe));
> > +}
> 
> ke_frame, ke_pcb and ke_mdstorage should all be in a machine dependent
> struct mdkse, like mdproc.  The fact that the storage is large enough
> to warrant using malloc is machine dependent, so it should not be a
> pointer.  I would be inclined to just embed a trapframe.

errrr... ke_mdstorage is just a pointer to the mdstorage
as are the others.. I don't want to include an md structure into 
the KSE.. it's big enough as it is.. Every process has a KSE
but only KSE-mode processes have the extra mdstorage area.

Do you feel strongly about this?

> 
> The pcb should not be needed at all here; all of the meaningful kernel
> mode register values are set below.  Capturing the whole execution
> context at the time of the kse_new call (floating point registers,
> debug registers) may be expensive and I don't think is worth doing.

Yes I started out with the PCB there but as I went I found I was needing
less and less of it. I even have a comment to that effect somewhere..
At this stage I still have it only because I wanted to make sure that
I had good defaults for anything that I wasn't sure about.. 

Also I haven't figured out what to do about FP registers
and I may want to stuff them there at some stage...
(not sure yet)

> 
> The whole trick of a system call that returns multiple times is
> dubious.  The fact that it works at all is machine dependent; for
> sparc64 it needs wierd hacks in the kernel like is done for vfork.
> It would be better to just register an upcall stack and entry point
> with the kernel, like how signals work.  This would make mdkse even
> smaller.

It's effectively the same thing..
except it allows the function to have persistent state in all it's 
local variables and arguments. Which is REALLY useful in the UTS.
As for hacks.. we have the code in vfork, no?
:-)
(actually the code actually uses fork_return() to do the returns so if
your hack is in there we get it for free.)

> 
[...]
> > +	 * This would remove any requirement for knowing the KSE
> > +	 * at this time (see the matching comment below for
> > +	 * more analysis) (need a good safe default).
> > +	 */


The comment alluded to.


[...]

> > + 
> > + union kse_td_ctx {
> > + 	struct {
> > + 		int			if_onstack;
> > + 		struct intrframe	if_if;
> > + 	} intrfrm;
> > + 	struct {
> > + 		int			tf_onstack;
> > + 		int			tf_gs;
> > + 		struct trapframe	tf_tf;
> > + 	} tfrm;
> > + 	mcontext_t	mcontext;
> > + };
> 
> Please do not export trapframe and intrframe to userland like this;
> just use mcontext.  Doing this makes it part of the kernel->userland
> ABI more so than it already is, debuggers use it for core dumps which
> is bad enough.  mcontext is already there for this purpose and required
> to have a stable ABI.

They are only exported this way because Dan Eischen and I were
experimenting with what context was needed.
We have basically decided that the  mcontext is the one we want
except that I am worried about FP regs.

intrfrm and tfrm will be deleted well before anyone else
gets to play with it..

thanks for reminding me though..


> 
> > Index: sys/kern/kern_intr.c
> > ===========================================================================
> > --- sys/kern/kern_intr.c	2002/05/29 07:21:58	#15
> > +++ sys/kern/kern_intr.c	2002/05/29 07:21:58
> > @@ -387,21 +386,22 @@
> >  	 */
> >  	ithread->it_need = 1;
> >  	mtx_lock_spin(&sched_lock);
> > -	if (p->p_stat == SWAIT) {
> > +	if (td->td_state == TDS_IWAIT) {
> >  		CTR2(KTR_INTR, "%s: setrunqueue %d", __func__, p->p_pid);
> > -		p->p_stat = SRUN;
> > -		setrunqueue(td); /* XXXKSE */
> > -		if (do_switch && curthread->td_critnest == 1 &&
> > -		    curthread->td_proc->p_stat == SRUN) {
> > +		setrunqueue(td);
> > +		if (do_switch &&
> > +		    (curthread->td_critnest == 1) &&
> > +		    (curthread->td_proc->p_state == TDS_RUNNING)) {
> 
> Is this a bug?  TDS_RUNNING is a thread state but its testing the
> process's state.  Seems like this will make preemptions not
> happen.

This IS a bug..
I'm ever so grateful that you are actually reading it :-)


> 
> > Index: sys/kern/kern_synch.c
> > ===========================================================================
> > --- sys/kern/kern_synch.c	2002/05/29 07:21:58	#24
> > +++ sys/kern/kern_synch.c	2002/05/29 07:21:58
> > @@ -788,37 +881,41 @@
> >  	struct proc *p = td->td_proc;
> >  
> >  	mtx_lock_spin(&sched_lock);
> > -	switch (p->p_stat) {
> > -	case SZOMB: /* not a thread flag XXXKSE */
> > +	switch (p->p_state) {
> > +	case PRS_ZOMBIE:
> >  		panic("setrunnable(1)");
> > +	default:
> >  	}
> > -	switch (td->td_proc->p_stat) {
> > +	switch (td->td_state) {
> >  	case 0:
> > -	case SRUN:
> > -	case SWAIT:
> > +	case TDS_RUNNING:
> > +	case TDS_IWAIT:
> >  	default:
> > +		printf("state is %d", td->td_state);
> >  		panic("setrunnable(2)");
> 
> One of these panics was hit by a user running 5.0-DP1.  I guess you missed
> their mail.

no... I didn't see it..
though I'm not exactly sure why this is relevant as I'm
not really changing the panic condition.. (unless of course they were 
running with KSE patches?) (Am I missing something?)


> 
> > Index: sys/kern/kern_thread.c
> > ===========================================================================
> > + /*
> > +  * Tear down type-stable parts of a thread (just before being discarded).
> > +  */
> > + static void
> > + thread_fini(void *mem, int size)
> > + {
> > + 	struct thread	*td;
> > + 
> > + 	KASSERT((size == sizeof(struct thread)),
> > + 	    ("size mismatch: %d != %d\n", size, sizeof(struct thread)));
> > + 
> > + 	td = (struct thread *)mem;
> > + 	pmap_dispose_thread(td);
> > + 	vm_object_deallocate(td->td_kstack_obj);
> > + 	cached_threads--;	/* XXXSMP */
> > + 	allocated_threads--;	/* XXXSMP */
> > + }
> 
> Where is the kva space for the kernel stack freed?  Before it was not
> freed at all, and relied on stable storage to still be around when the
> thread was allocated again.  The fini callout is called when the memory
> is being freed for real and will no longer be type stable.  I think this
> leaks KSTACK_PAGES pages of kva space per thread when the thread zone is
> drained.

 hmmmm
you may be right..
I'll check... yep

(this was recently changed by Jonathon mini so I think he's introduecd
this bug.. I was never freeing them before..)

Jonathon!   (   :-)     )

[...]

> 
> Please use fuword and suword here.


yep.. now I know they exist.. I was actually worried about the 'weight'
of these..

> 
> > Index: sys/kern/subr_trap.c
> > ===========================================================================
> > --- sys/kern/subr_trap.c	2002/05/29 07:21:58	#20
> > +++ sys/kern/subr_trap.c	2002/05/29 07:21:58
> > +
> > +			/*
> > +			 * There is no more work to do and we are going to ride
> > +			 * this thead/KSE up to userland. Make sure the user's
> > +			 * pointer to the thread mailbox is cleared before we
> > +			 * re-enter the kernel next time for any reason..
> > +			 * We might as well do it here.
> > +			 */
> > +			td->td_flags &= ~TDF_UPCALLING;	/* Hmmmm. */
> > +			error = copyout(&dummy, /* NULL */
> > +			    (caddr_t)td->td_kse->ke_mailbox +
> > +			    offsetof(struct kse_mailbox, current_thread),
> > +			    sizeof(void *));
> 
> suword(&td->td_kse->ke_mailbox->current_thread, 0); would be much
> clearer here.

yep!

> 
> [...]
> 
> Before I spend much time on the machine dependent code for sparc64
> I would like to see a well defined kernel->userland interface, with
> long term ABI issues taken into consideration.  I'm not convinced
> that the current kse_new scheme will work; I would much rather see
> just an entry point, which is where the upcalls begin executing.
> This is how the netbsd upcall works as far as I can tell.

But it's a lot harder to set the correct context in userland..
you need to modify the userland UTS setup every time 
there is a change in the compiler framing for example.
This way, it's automatic.. You are restoring the registers
in exactly the way you want them, where the other way, the kernel
code doing the work needs to know what compiler is running the 
userland and how it's optimising variables etc.

this is "self correcting" :-)

> 
> It is much more difficult to ensure that all the register values
> end up the same on each return from the system call on sparc64, due
> to the way that register stack works.  The current test program
> will not work at all, because setjmp, longjmp cannot be used to
> switch the stack in the same way.

The library will not be using setjmp and longjmp in this way but 
instead the setcontext() call that dan wrote for the current thread
library. If that works it should be enough.
(I'd like to investigate your comments though... can you explain
more about why it's a problem? It sure simplifies things on most
architectures I've done this on..


> 
> Jake
> 


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message