From owner-freebsd-smp Thu Nov 16 21:26: 7 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from io.yi.org (h24-69-199-88.gv.shawcable.net [24.69.199.88])
	by hub.freebsd.org (Postfix) with ESMTP id 679A837B479;
	Thu, 16 Nov 2000 21:25:57 -0800 (PST)
Received: from io.yi.org (localhost.gvcl1.bc.wave.home.com [127.0.0.1])
	by io.yi.org (Postfix) with ESMTP id 5F927BA7A;
	Thu, 16 Nov 2000 21:25:56 -0800 (PST)
X-Mailer: exmh version 2.1.1 10/15/1999
To: John Baldwin
Cc: smp@FreeBSD.ORG, cp@bsdi.com, Jake Burkholder
Subject: Re: cvs commit: src/sys/kern kern_timeout.c
In-Reply-To: Message from John Baldwin of "Thu, 16 Nov 2000 18:08:44 PST."
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="==_Exmh_15084400340"
Date: Thu, 16 Nov 2000 21:25:56 -0800
From: Jake Burkholder
Message-Id: <20001117052556.5F927BA7A@io.yi.org>
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

This is a multipart MIME message.

--==_Exmh_15084400340
Content-Type: text/plain; charset=us-ascii

> 
> On 16-Nov-00 John Baldwin wrote:
> > 
> > On 16-Nov-00 John Baldwin wrote:
> >> 
> >> On 16-Nov-00 John Baldwin wrote:
> >>>> I think we need a separate spin lock for the callout wheel, a la
> >>>> BSD/OS's callout_mtx.  Hardclock looks at the callout wheel and is now
> >>>> a fast interrupt, so it can't acquire a sleep mutex.  It's a little
> >>>> paranoid, because hardclock doesn't actually traverse any lists; it
> >>>> just checks whether the current callout bucket is empty and potentially
> >>>> schedules softclock.  Still, you could miss a very short timeout on an
> >>>> SMP system, and ticks could also get incremented in the middle of
> >>>> softclock's test of whether a callout's time has come.
> >>>>
> >>>> I have patches that do this and make softclock INTR_MPSAFE; I just need
> >>>> to test them.
> >>>
> >>> Ok.  I was about to check the BSD/OS code to see how this was done there.
> >>>
> >>>> There's actually another major problem with this.  The run queue and
> >>>> sleep queue use the same list linkage in struct proc, so it's not
> >>>> safe to release sched_lock while you're on the sleep queue.  If
> >>>> the process blocks on Giant in CURSIG, the sleep queue will get
> >>>> corrupted.  We really need to split the run queue/sleep queue
> >>>> linkage.
> >>>
> >>> Ugh, ok.  I'll do this next then.  Grrrr.
> >>
> >> Grr, wouldn't you know it, bar just died with a double fault because
> >>
> >> panic: cpu_switch has wchan
> >>
> >> Happened when I Ctrl-C'd a process. :-P
> >>
> >> *sigh*
> > 
> > I actually don't like the concept of CURSIG() forcing a context switch due
> > to needing to grab Giant.  For one thing, it breaks the nice assertion that
> > running processes never have p->p_wchan set, which is what caused my
> > machine to panic.  I'm trying a patch right now that grabs Giant in
> > msleep() before we grab sched_lock, so that the call to CURSIG() before
> > mi_switch() won't need to block.  It then releases Giant after CURSIG().
> > For the CURSIG() after mi_switch(), doing another context switch due to
> > blocking on Giant isn't a problem, so the patch doesn't touch that case.
> > (Not that there is anything one could do to work around it.)
> 
> Well, when I tried this the machine hung on the first fast interrupt handler
> it ran, so it doesn't look like this approach works either. :-/  I've tried
> splitting up p_procq into p_runq, p_sleepq, and p_mtxq (for processes blocked
> on a mutex), but while those kernels boot ok and seem to sort of run, I end
> up with hung processes.  If I (or anyone else) don't have a good solution for
> this, then I think I will back out the changes to move Giant out of
> mi_switch() tomorrow afternoon until we have a solution for this.
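[Editor's aside: a minimal sketch of the callout_mtx idea raised near the top
of the quoted thread above.  It is not the patch John and Jake mention; the
names callwheel, callwheelmask and schedule_softclock() are assumptions made
for illustration only.  The point is that hardclock(), now a fast interrupt,
may only take spin mutexes, so the callout wheel gets its own spin lock.]

        /* Spin lock protecting the callout wheel; usable from hardclock(). */
        static struct mtx callout_mtx;

        static void
        hardclock_check_callouts(void)
        {
                int need_softclock;

                /* Only peek at the current bucket; never traverse a list here. */
                mtx_enter(&callout_mtx, MTX_SPIN);
                need_softclock = !TAILQ_EMPTY(&callwheel[ticks & callwheelmask]);
                mtx_exit(&callout_mtx, MTX_SPIN);
                if (need_softclock)
                        schedule_softclock();   /* assumed hook that requests a softclock run */
        }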
Hmm.  This is unfortunate, because the change to mi_switch is absolutely
necessary.  I don't see why changing the queues would cause problems.  The
p_runq/p_mtxq split isn't really necessary, because the two are mutually
exclusive, just as the run queue and sleep queue used to be.

Looking at the use of DROP_GIANT_NOSWITCH, I think it should come after the
mtx_enter(&sched_lock, MTX_SPIN) instead of before.  The idea behind the
noswitch flag is that it lets you release a sleep mutex with interrupts
disabled; otherwise there's no point to it.  If interrupts are enabled, you're
supposed to be able to block.
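[Editor's aside: the ordering argued for above, condensed from the call sites
touched by the attached patch.  The PICKUP_GIANT()/splx() tail is assumed from
the usual shape of these call sites rather than quoted from the hunks below.]

        s = splhigh();
        mtx_enter(&sched_lock, MTX_SPIN);       /* commit to switching: spin lock held */
        DROP_GIANT_NOSWITCH();                  /* release Giant without allowing a switch */
        setrunqueue(p);
        p->p_stats->p_ru.ru_nivcsw++;
        mi_switch();
        mtx_exit(&sched_lock, MTX_SPIN);
        PICKUP_GIANT();                         /* reacquire Giant once we are running again */
        splx(s);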
Here's a patch that seems to work ok here.  It works on my UP box running X
and such; I don't notice any hung processes.  It also works on the SMP box:
I built a kernel over NFS and it seemed ok, except that I can't get a kernel
to boot using the serial console with WITNESS enabled.  It stops at the
twiddle thing before the copyright is printed and just hangs.  This also
happens without the patch, and even with a UP kernel, so I don't really know
what's going on.  Please try it and let me know if you still see the hung
processes.

I think we'll just have to settle for no longer being able to assert that a
process has no wchan in cpu_switch.  There's bound to be a lot of contention
on Giant, but eventually signals should be protected by a per-process mutex,
so a context switch at that point will be less likely.

Jake

> 
> -- 
> 
> John Baldwin -- http://www.FreeBSD.org/~jhb/
> PGP Key: http://www.Baldwin.cx/~john/pgpkey.asc
> "Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/
> 
> 
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-smp" in the body of the message

--==_Exmh_15084400340
Content-Type: text/plain; name="slpq.diff"; charset=us-ascii
Content-Description: slpq.diff
Content-Disposition: attachment; filename="slpq.diff"

Index: i386/i386/swtch.s
===================================================================
RCS file: /home/ncvs/src/sys/i386/i386/swtch.s,v
retrieving revision 1.98
diff -u -r1.98 swtch.s
--- i386/i386/swtch.s	2000/10/13 22:03:29	1.98
+++ i386/i386/swtch.s	2000/11/17 03:23:56
@@ -184,8 +184,10 @@
 	andl	$~AST_RESCHED,_astpending
 
 #ifdef DIAGNOSTIC
+#ifdef notanymore
 	cmpl	%eax,P_WCHAN(%ecx)
 	jne	badsw1
+#endif
 	cmpb	$SRUN,P_STAT(%ecx)
 	jne	badsw2
 #endif
Index: i386/i386/trap.c
===================================================================
RCS file: /home/ncvs/src/sys/i386/i386/trap.c,v
retrieving revision 1.160
diff -u -r1.160 trap.c
--- i386/i386/trap.c	2000/11/16 02:16:43	1.160
+++ i386/i386/trap.c	2000/11/17 03:20:24
@@ -192,8 +192,8 @@
 	 * our priority.
 	 */
 	s = splhigh();
-	DROP_GIANT_NOSWITCH();
 	mtx_enter(&sched_lock, MTX_SPIN);
+	DROP_GIANT_NOSWITCH();
 	setrunqueue(p);
 	p->p_stats->p_ru.ru_nivcsw++;
 	mi_switch();
Index: ia64/ia64/trap.c
===================================================================
RCS file: /home/ncvs/src/sys/ia64/ia64/trap.c,v
retrieving revision 1.6
diff -u -r1.6 trap.c
--- ia64/ia64/trap.c	2000/11/16 02:16:43	1.6
+++ ia64/ia64/trap.c	2000/11/17 04:34:19
@@ -102,8 +102,8 @@
 	 * indicated by our priority.
 	 */
 	s = splstatclock();
-	DROP_GIANT_NOSWITCH();
 	mtx_enter(&sched_lock, MTX_SPIN);
+	DROP_GIANT_NOSWITCH();
 	setrunqueue(p);
 	p->p_stats->p_ru.ru_nivcsw++;
 	mi_switch();
Index: alpha/alpha/trap.c
===================================================================
RCS file: /home/ncvs/src/sys/alpha/alpha/trap.c,v
retrieving revision 1.36
diff -u -r1.36 trap.c
--- alpha/alpha/trap.c	2000/11/16 02:16:42	1.36
+++ alpha/alpha/trap.c	2000/11/17 04:34:06
@@ -120,8 +120,8 @@
 	 * indicated by our priority.
 	 */
 	s = splstatclock();
-	DROP_GIANT_NOSWITCH();
 	mtx_enter(&sched_lock, MTX_SPIN);
+	DROP_GIANT_NOSWITCH();
 	setrunqueue(p);
 	p->p_stats->p_ru.ru_nivcsw++;
 	mi_switch();
Index: sys/proc.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/proc.h,v
retrieving revision 1.122
diff -u -r1.122 proc.h
--- sys/proc.h	2000/10/29 16:06:54	1.122
+++ sys/proc.h	2000/11/17 03:30:57
@@ -127,7 +127,8 @@
 struct ithd;
 
 struct proc {
-	TAILQ_ENTRY(proc) p_procq;	/* run/sleep queue. */
+	TAILQ_ENTRY(proc) p_procq;	/* run/mutex queue. */
+	TAILQ_ENTRY(proc) p_slpq;	/* sleep queue. */
 	LIST_ENTRY(proc) p_list;	/* List of all processes. */
 
 	/* substructures: */
Index: kern/kern_mutex.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_mutex.c,v
retrieving revision 1.16
diff -u -r1.16 kern_mutex.c
--- kern/kern_mutex.c	2000/11/16 02:16:44	1.16
+++ kern/kern_mutex.c	2000/11/17 03:22:42
@@ -718,6 +718,7 @@
 };
 
 static char *sleep_list[] = {
+	"Giant",
 	NULL
 };
 
Index: kern/kern_sig.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_sig.c,v
retrieving revision 1.91
diff -u -r1.91 kern_sig.c
--- kern/kern_sig.c	2000/11/16 02:16:44	1.91
+++ kern/kern_sig.c	2000/11/17 03:21:45
@@ -1283,8 +1283,8 @@
 			psignal(p->p_pptr, SIGCHLD);
 		do {
 			stop(p);
-			DROP_GIANT_NOSWITCH();
 			mtx_enter(&sched_lock, MTX_SPIN);
+			DROP_GIANT_NOSWITCH();
 			mi_switch();
 			mtx_exit(&sched_lock, MTX_SPIN);
 			PICKUP_GIANT();
@@ -1356,8 +1356,8 @@
 		stop(p);
 		if ((p->p_pptr->p_procsig->ps_flag & PS_NOCLDSTOP) == 0)
 			psignal(p->p_pptr, SIGCHLD);
-		DROP_GIANT_NOSWITCH();
 		mtx_enter(&sched_lock, MTX_SPIN);
+		DROP_GIANT_NOSWITCH();
 		mi_switch();
 		mtx_exit(&sched_lock, MTX_SPIN);
 		PICKUP_GIANT();
Index: kern/kern_subr.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_subr.c,v
retrieving revision 1.37
diff -u -r1.37 kern_subr.c
--- kern/kern_subr.c	2000/11/16 02:16:44	1.37
+++ kern/kern_subr.c	2000/11/17 03:21:59
@@ -377,8 +377,8 @@
 
 	p = curproc;
 	s = splhigh();
-	DROP_GIANT_NOSWITCH();
 	mtx_enter(&sched_lock, MTX_SPIN);
+	DROP_GIANT_NOSWITCH();
 	p->p_priority = p->p_usrpri;
 	setrunqueue(p);
 	p->p_stats->p_ru.ru_nivcsw++;
Index: kern/kern_synch.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_synch.c,v
retrieving revision 1.108
diff -u -r1.108 kern_synch.c
--- kern/kern_synch.c	2000/11/16 02:16:44	1.108
+++ kern/kern_synch.c	2000/11/17 03:29:16
@@ -420,9 +420,9 @@
 	if (p && KTRPOINT(p, KTR_CSW))
 		ktrcsw(p->p_tracep, 1, 0);
 #endif
-	DROP_GIANT_NOSWITCH();
 	WITNESS_SLEEP(0, mtx);
 	mtx_enter(&sched_lock, MTX_SPIN);
+	DROP_GIANT_NOSWITCH();
 
 	if (mtx != NULL) {
 		mtx_assert(mtx, MA_OWNED | MA_NOTRECURSED);
@@ -461,7 +461,7 @@
 	p->p_nativepri = p->p_priority;
 	CTR4(KTR_PROC, "msleep: proc %p (pid %d, %s), schedlock %p", p,
 		p->p_pid, p->p_comm, (void *) sched_lock.mtx_lock);
-	TAILQ_INSERT_TAIL(&slpque[LOOKUP(ident)], p, p_procq);
+	TAILQ_INSERT_TAIL(&slpque[LOOKUP(ident)], p, p_slpq);
 	if (timo)
 		thandle = timeout(endtsleep, (void *)p, timo);
 	/*
@@ -583,7 +583,7 @@
 		p->p_slptime = 0;
 		p->p_asleep.as_priority = priority;
 		p->p_asleep.as_timo = timo;
-		TAILQ_INSERT_TAIL(&slpque[LOOKUP(ident)], p, p_procq);
+		TAILQ_INSERT_TAIL(&slpque[LOOKUP(ident)], p, p_slpq);
 	}
 
 	mtx_exit(&sched_lock, MTX_SPIN);
@@ -612,9 +612,9 @@
 	int s;
 	WITNESS_SAVE_DECL(mtx);
 
-	DROP_GIANT_NOSWITCH();
 	WITNESS_SLEEP(0, mtx);
 	mtx_enter(&sched_lock, MTX_SPIN);
+	DROP_GIANT_NOSWITCH();
 	if (mtx != NULL) {
 		mtx_assert(mtx, MA_OWNED | MA_NOTRECURSED);
 		WITNESS_SAVE(mtx, mtx);
@@ -778,7 +778,7 @@
 	s = splhigh();
 	mtx_enter(&sched_lock, MTX_SPIN);
 	if (p->p_wchan) {
-		TAILQ_REMOVE(&slpque[LOOKUP(p->p_wchan)], p, p_procq);
+		TAILQ_REMOVE(&slpque[LOOKUP(p->p_wchan)], p, p_slpq);
 		p->p_wchan = 0;
 	}
 	mtx_exit(&sched_lock, MTX_SPIN);
@@ -800,9 +800,9 @@
 	mtx_enter(&sched_lock, MTX_SPIN);
 	qp = &slpque[LOOKUP(ident)];
 restart:
-	TAILQ_FOREACH(p, qp, p_procq) {
+	TAILQ_FOREACH(p, qp, p_slpq) {
 		if (p->p_wchan == ident) {
-			TAILQ_REMOVE(qp, p, p_procq);
+			TAILQ_REMOVE(qp, p, p_slpq);
 			p->p_wchan = 0;
 			if (p->p_stat == SSLEEP) {
 				/* OPTIMIZED EXPANSION OF setrunnable(p); */
@@ -846,9 +846,9 @@
 	mtx_enter(&sched_lock, MTX_SPIN);
 	qp = &slpque[LOOKUP(ident)];
 
-	TAILQ_FOREACH(p, qp, p_procq) {
+	TAILQ_FOREACH(p, qp, p_slpq) {
 		if (p->p_wchan == ident) {
-			TAILQ_REMOVE(qp, p, p_procq);
+			TAILQ_REMOVE(qp, p, p_slpq);
 			p->p_wchan = 0;
 			if (p->p_stat == SSLEEP) {
 				/* OPTIMIZED EXPANSION OF setrunnable(p); */

--==_Exmh_15084400340--

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message