From owner-freebsd-smp Thu Nov 16 21:26: 7 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from io.yi.org (h24-69-199-88.gv.shawcable.net [24.69.199.88])
	by hub.freebsd.org (Postfix) with ESMTP id 679A837B479;
	Thu, 16 Nov 2000 21:25:57 -0800 (PST)
Received: from io.yi.org (localhost.gvcl1.bc.wave.home.com [127.0.0.1])
	by io.yi.org (Postfix) with ESMTP id 5F927BA7A;
	Thu, 16 Nov 2000 21:25:56 -0800 (PST)
X-Mailer: exmh version 2.1.1 10/15/1999
To: John Baldwin
Cc: smp@FreeBSD.ORG, cp@bsdi.com, Jake Burkholder
Subject: Re: cvs commit: src/sys/kern kern_timeout.c
In-Reply-To: Message from John Baldwin of "Thu, 16 Nov 2000 18:08:44 PST."
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="==_Exmh_15084400340"
Date: Thu, 16 Nov 2000 21:25:56 -0800
From: Jake Burkholder
Message-Id: <20001117052556.5F927BA7A@io.yi.org>
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

This is a multipart MIME message.

--==_Exmh_15084400340
Content-Type: text/plain; charset=us-ascii

> 
> On 16-Nov-00 John Baldwin wrote:
> > 
> > On 16-Nov-00 John Baldwin wrote:
> >> 
> >> On 16-Nov-00 John Baldwin wrote:
> >>>> I think we need a separate spin lock for the callout wheel, a la
> >>>> BSD/OS's callout_mtx.  Hardclock looks at the callout wheel and is now
> >>>> a fast interrupt, so it can't acquire a sleep mutex.  It's a little
> >>>> paranoid, because hardclock doesn't actually traverse any lists; it
> >>>> just checks whether the current callout bucket is empty and potentially
> >>>> schedules softclock.  Still, you could miss a very short timeout on an
> >>>> SMP system, and ticks could also get incremented in the middle of
> >>>> softclock's test of whether a callout's time has come.
> >>>>
> >>>> I have patches that do this and make softclock INTR_MPSAFE; I just need
> >>>> to test them.
> >>>
> >>> Ok.  I was about to check the BSD/OS code to see how this was done there.
> >>>
> >>>> There's actually another major problem with this.  The run queue and
> >>>> sleep queue use the same list linkage in struct proc, so it's not
> >>>> safe to release sched_lock while you're on the sleep queue.  If
> >>>> the process blocks on Giant in CURSIG, the sleep queue will get
> >>>> corrupted.  We really need to split the run queue/sleep queue
> >>>> linkage.
> >>>
> >>> Ugh, ok.  I'll do this next then.  Grrrr.
> >>
> >> Grr, wouldn't you know it, bar just died with a double fault because
> >>
> >> panic: cpu_switch has wchan
> >>
> >> Happened when I Ctrl-C'd a process. :-P
> >>
> >> *sigh*
> > 
> > I actually don't like the concept of CURSIG() forcing a context switch due
> > to needing to grab Giant.  For one thing, it breaks the nice assertion that
> > running processes never have p->p_wchan set, which is what caused my
> > machine to panic.  I'm trying a patch right now that grabs Giant in
> > msleep() before we grab sched_lock, so that the call to CURSIG() before
> > mi_switch() won't need to block.  It then releases Giant after CURSIG().
> > For the CURSIG() after mi_switch(), doing another context switch due to
> > blocking on Giant isn't a problem, so the patch doesn't touch that case.
> > (Not that there is anything one could do to work around it.)
> 
> Well, when I tried this the machine hung on the first fast interrupt handler
> it ran, so it doesn't look like this approach works either. :-/  I've tried
> splitting up p_procq into p_runq, p_sleepq, and p_mtxq (for processes blocked
> on a mutex), but while those kernels boot ok and seem to sort of run, I end
> up with hung processes.  If I (or anyone else) don't have a good solution for
> this, then I think I will back out the changes to move Giant out of
> mi_switch() tomorrow afternoon until we have a solution for this.
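[Editor's aside: a minimal sketch of the callout_mtx idea raised near the top
of the quoted thread above.  It is not the patch John and Jake mention; the
names callwheel, callwheelmask and schedule_softclock() are assumptions made
for illustration only.  The point is that hardclock(), now a fast interrupt,
may only take spin mutexes, so the callout wheel gets its own spin lock.]

        /* Spin lock protecting the callout wheel; usable from hardclock(). */
        static struct mtx callout_mtx;

        static void
        hardclock_check_callouts(void)
        {
                int need_softclock;

                /* Only peek at the current bucket; never traverse a list here. */
                mtx_enter(&callout_mtx, MTX_SPIN);
                need_softclock = !TAILQ_EMPTY(&callwheel[ticks & callwheelmask]);
                mtx_exit(&callout_mtx, MTX_SPIN);
                if (need_softclock)
                        schedule_softclock();   /* assumed hook that requests a softclock run */
        }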
Hmm.  This is unfortunate, because the change to mi_switch is absolutely
necessary.  I don't see why changing the queues would cause problems.  The
p_runq/p_mtxq split isn't really necessary, because the two are mutually
exclusive, just as the run queue and sleep queue used to be.

Looking at the use of DROP_GIANT_NOSWITCH, I think it should come after the
mtx_enter(&sched_lock, MTX_SPIN) instead of before.  The idea behind the
noswitch flag is that it lets you release a sleep mutex with interrupts
disabled; otherwise there's no point to it.  If interrupts are enabled, you're
supposed to be able to block.
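[Editor's aside: the ordering argued for above, condensed from the call sites
touched by the attached patch.  The PICKUP_GIANT()/splx() tail is assumed from
the usual shape of these call sites rather than quoted from the hunks below.]

        s = splhigh();
        mtx_enter(&sched_lock, MTX_SPIN);       /* commit to switching: spin lock held */
        DROP_GIANT_NOSWITCH();                  /* release Giant without allowing a switch */
        setrunqueue(p);
        p->p_stats->p_ru.ru_nivcsw++;
        mi_switch();
        mtx_exit(&sched_lock, MTX_SPIN);
        PICKUP_GIANT();                         /* reacquire Giant once we are running again */
        splx(s);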
Here's a patch that seems to work ok here.  It works on my UP box running X
and such; I don't notice any hung processes.  It also works on the SMP box:
I built a kernel over NFS and it seemed ok, except that I can't get a kernel
to boot using the serial console with WITNESS enabled.  It stops at the
twiddle thing before the copyright is printed and just hangs.  This also
happens without the patch, and even with a UP kernel, so I don't really know
what's going on.  Please try it and let me know if you still see the hung
processes.

I think we'll just have to settle for no longer being able to assert that a
process has no wchan in cpu_switch.  There's bound to be a lot of contention
on Giant, but eventually signals should be protected by a per-process mutex,
so a context switch at that point will be less likely.

Jake

> 
> -- 
> 
> John Baldwin -- http://www.FreeBSD.org/~jhb/
> PGP Key: http://www.Baldwin.cx/~john/pgpkey.asc
> "Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/
> 
> 
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-smp" in the body of the message

--==_Exmh_15084400340
Content-Type: text/plain; name="slpq.diff"; charset=us-ascii
Content-Description: slpq.diff
Content-Disposition: attachment; filename="slpq.diff"

Index: i386/i386/swtch.s
===================================================================
RCS file: /home/ncvs/src/sys/i386/i386/swtch.s,v
retrieving revision 1.98
diff -u -r1.98 swtch.s
--- i386/i386/swtch.s	2000/10/13 22:03:29	1.98
+++ i386/i386/swtch.s	2000/11/17 03:23:56
@@ -184,8 +184,10 @@
 	andl	$~AST_RESCHED,_astpending
 
 #ifdef DIAGNOSTIC
+#ifdef notanymore
 	cmpl	%eax,P_WCHAN(%ecx)
 	jne	badsw1
+#endif
 	cmpb	$SRUN,P_STAT(%ecx)
 	jne	badsw2
 #endif
Index: i386/i386/trap.c
===================================================================
RCS file: /home/ncvs/src/sys/i386/i386/trap.c,v
retrieving revision 1.160
diff -u -r1.160 trap.c
--- i386/i386/trap.c	2000/11/16 02:16:43	1.160
+++ i386/i386/trap.c	2000/11/17 03:20:24
@@ -192,8 +192,8 @@
 	 * our priority.
 	 */
 	s = splhigh();
-	DROP_GIANT_NOSWITCH();
 	mtx_enter(&sched_lock, MTX_SPIN);
+	DROP_GIANT_NOSWITCH();
 	setrunqueue(p);
 	p->p_stats->p_ru.ru_nivcsw++;
 	mi_switch();
Index: ia64/ia64/trap.c
===================================================================
RCS file: /home/ncvs/src/sys/ia64/ia64/trap.c,v
retrieving revision 1.6
diff -u -r1.6 trap.c
--- ia64/ia64/trap.c	2000/11/16 02:16:43	1.6
+++ ia64/ia64/trap.c	2000/11/17 04:34:19
@@ -102,8 +102,8 @@
 	 * indicated by our priority.
 	 */
 	s = splstatclock();
-	DROP_GIANT_NOSWITCH();
 	mtx_enter(&sched_lock, MTX_SPIN);
+	DROP_GIANT_NOSWITCH();
 	setrunqueue(p);
 	p->p_stats->p_ru.ru_nivcsw++;
 	mi_switch();
Index: alpha/alpha/trap.c
===================================================================
RCS file: /home/ncvs/src/sys/alpha/alpha/trap.c,v
retrieving revision 1.36
diff -u -r1.36 trap.c
--- alpha/alpha/trap.c	2000/11/16 02:16:42	1.36
+++ alpha/alpha/trap.c	2000/11/17 04:34:06
@@ -120,8 +120,8 @@
 	 * indicated by our priority.
 	 */
 	s = splstatclock();
-	DROP_GIANT_NOSWITCH();
 	mtx_enter(&sched_lock, MTX_SPIN);
+	DROP_GIANT_NOSWITCH();
 	setrunqueue(p);
 	p->p_stats->p_ru.ru_nivcsw++;
 	mi_switch();
Index: sys/proc.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/proc.h,v
retrieving revision 1.122
diff -u -r1.122 proc.h
--- sys/proc.h	2000/10/29 16:06:54	1.122
+++ sys/proc.h	2000/11/17 03:30:57
@@ -127,7 +127,8 @@
 struct ithd;
 
 struct proc {
-	TAILQ_ENTRY(proc) p_procq;	/* run/sleep queue. */
+	TAILQ_ENTRY(proc) p_procq;	/* run/mutex queue. */
+	TAILQ_ENTRY(proc) p_slpq;	/* sleep queue. */
 	LIST_ENTRY(proc) p_list;	/* List of all processes. */
 
 	/* substructures: */
Index: kern/kern_mutex.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_mutex.c,v
retrieving revision 1.16
diff -u -r1.16 kern_mutex.c
--- kern/kern_mutex.c	2000/11/16 02:16:44	1.16
+++ kern/kern_mutex.c	2000/11/17 03:22:42
@@ -718,6 +718,7 @@
 };
 
 static char *sleep_list[] = {
+	"Giant",
 	NULL
 };
 
Index: kern/kern_sig.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_sig.c,v
retrieving revision 1.91
diff -u -r1.91 kern_sig.c
--- kern/kern_sig.c	2000/11/16 02:16:44	1.91
+++ kern/kern_sig.c	2000/11/17 03:21:45
@@ -1283,8 +1283,8 @@
 			psignal(p->p_pptr, SIGCHLD);
 		do {
 			stop(p);
-			DROP_GIANT_NOSWITCH();
 			mtx_enter(&sched_lock, MTX_SPIN);
+			DROP_GIANT_NOSWITCH();
 			mi_switch();
 			mtx_exit(&sched_lock, MTX_SPIN);
 			PICKUP_GIANT();
@@ -1356,8 +1356,8 @@
 		stop(p);
 		if ((p->p_pptr->p_procsig->ps_flag & PS_NOCLDSTOP) == 0)
 			psignal(p->p_pptr, SIGCHLD);
-		DROP_GIANT_NOSWITCH();
 		mtx_enter(&sched_lock, MTX_SPIN);
+		DROP_GIANT_NOSWITCH();
 		mi_switch();
 		mtx_exit(&sched_lock, MTX_SPIN);
 		PICKUP_GIANT();
Index: kern/kern_subr.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_subr.c,v
retrieving revision 1.37
diff -u -r1.37 kern_subr.c
--- kern/kern_subr.c	2000/11/16 02:16:44	1.37
+++ kern/kern_subr.c	2000/11/17 03:21:59
@@ -377,8 +377,8 @@
 
 	p = curproc;
 	s = splhigh();
-	DROP_GIANT_NOSWITCH();
 	mtx_enter(&sched_lock, MTX_SPIN);
+	DROP_GIANT_NOSWITCH();
 	p->p_priority = p->p_usrpri;
 	setrunqueue(p);
 	p->p_stats->p_ru.ru_nivcsw++;
Index: kern/kern_synch.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_synch.c,v
retrieving revision 1.108
diff -u -r1.108 kern_synch.c
--- kern/kern_synch.c	2000/11/16 02:16:44	1.108
+++ kern/kern_synch.c	2000/11/17 03:29:16
@@ -420,9 +420,9 @@
 	if (p && KTRPOINT(p, KTR_CSW))
 		ktrcsw(p->p_tracep, 1, 0);
 #endif
-	DROP_GIANT_NOSWITCH();
 	WITNESS_SLEEP(0, mtx);
 	mtx_enter(&sched_lock, MTX_SPIN);
+	DROP_GIANT_NOSWITCH();
 
 	if (mtx != NULL) {
 		mtx_assert(mtx, MA_OWNED | MA_NOTRECURSED);
@@ -461,7 +461,7 @@
 	p->p_nativepri = p->p_priority;
 	CTR4(KTR_PROC, "msleep: proc %p (pid %d, %s), schedlock %p", p,
 		p->p_pid, p->p_comm, (void *) sched_lock.mtx_lock);
-	TAILQ_INSERT_TAIL(&slpque[LOOKUP(ident)], p, p_procq);
+	TAILQ_INSERT_TAIL(&slpque[LOOKUP(ident)], p, p_slpq);
 	if (timo)
 		thandle = timeout(endtsleep, (void *)p, timo);
 	/*
@@ -583,7 +583,7 @@
 		p->p_slptime = 0;
 		p->p_asleep.as_priority = priority;
 		p->p_asleep.as_timo = timo;
-		TAILQ_INSERT_TAIL(&slpque[LOOKUP(ident)], p, p_procq);
+		TAILQ_INSERT_TAIL(&slpque[LOOKUP(ident)], p, p_slpq);
 	}
 
 	mtx_exit(&sched_lock, MTX_SPIN);
@@ -612,9 +612,9 @@
 	int s;
 	WITNESS_SAVE_DECL(mtx);
 
-	DROP_GIANT_NOSWITCH();
 	WITNESS_SLEEP(0, mtx);
 	mtx_enter(&sched_lock, MTX_SPIN);
+	DROP_GIANT_NOSWITCH();
 	if (mtx != NULL) {
 		mtx_assert(mtx, MA_OWNED | MA_NOTRECURSED);
 		WITNESS_SAVE(mtx, mtx);
@@ -778,7 +778,7 @@
 	s = splhigh();
 	mtx_enter(&sched_lock, MTX_SPIN);
 	if (p->p_wchan) {
-		TAILQ_REMOVE(&slpque[LOOKUP(p->p_wchan)], p, p_procq);
+		TAILQ_REMOVE(&slpque[LOOKUP(p->p_wchan)], p, p_slpq);
 		p->p_wchan = 0;
 	}
 	mtx_exit(&sched_lock, MTX_SPIN);
@@ -800,9 +800,9 @@
 	mtx_enter(&sched_lock, MTX_SPIN);
 	qp = &slpque[LOOKUP(ident)];
 restart:
-	TAILQ_FOREACH(p, qp, p_procq) {
+	TAILQ_FOREACH(p, qp, p_slpq) {
 		if (p->p_wchan == ident) {
-			TAILQ_REMOVE(qp, p, p_procq);
+			TAILQ_REMOVE(qp, p, p_slpq);
 			p->p_wchan = 0;
 			if (p->p_stat == SSLEEP) {
 				/* OPTIMIZED EXPANSION OF setrunnable(p); */
@@ -846,9 +846,9 @@
 	mtx_enter(&sched_lock, MTX_SPIN);
 	qp = &slpque[LOOKUP(ident)];
 
-	TAILQ_FOREACH(p, qp, p_procq) {
+	TAILQ_FOREACH(p, qp, p_slpq) {
 		if (p->p_wchan == ident) {
-			TAILQ_REMOVE(qp, p, p_procq);
+			TAILQ_REMOVE(qp, p, p_slpq);
 			p->p_wchan = 0;
 			if (p->p_stat == SSLEEP) {
 				/* OPTIMIZED EXPANSION OF setrunnable(p); */

--==_Exmh_15084400340--

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message