From: John Baldwin
To: smp@FreeBSD.org
Cc: cp@bsdi.com, jake@io.yi.org, Jake Burkholder
Date: Thu, 16 Nov 2000 18:08:44 -0800 (PST)
Organization: BSD, Inc.
Subject: Re: cvs commit: src/sys/kern kern_timeout.c

On 16-Nov-00 John Baldwin wrote:
>
> On 16-Nov-00 John Baldwin wrote:
>>
>> On 16-Nov-00 John Baldwin wrote:
>>>> I think we need a separate spin lock for the callout wheel, ala BSD/OS's
>>>> callout_mtx.  Hardclock looks at the callout wheel and is now a fast
>>>> interrupt, so it can't acquire a sleep mutex.  It's a little paranoid
>>>> because hardclock doesn't actually traverse any lists, it just checks
>>>> if the current callout bucket is empty, and potentially schedules
>>>> softclock, but you could miss a very short timeout on an SMP system.
>>>> ticks could also get incremented in the middle of softclock's test
>>>> for if the callout's time has come.
>>>>
>>>> I have patches that do this and make softclock INTR_MPSAFE, I just need
>>>> to test them.
>>>
>>> Ok.  I was about to check the BSD/OS code to see how this was done there.
>>>
>>>> There's actually another major problem with this.
>>>> The run queue and sleep queue use the same list linkage in struct proc,
>>>> so it's not safe to release sched_lock while you're on the sleep queue.
>>>> If the process blocks on Giant in CURSIG, the sleep queue will get
>>>> corrupted.  We really need to split the run queue/sleep queue linkage.
>>>
>>> Ugh, ok.  I'll do this next then.  Grrrr.
>>
>> Grr, wouldn't you know it, bar just died with a double fault because
>>
>> panic: cpu_switch has wchan
>>
>> Happened when I Ctrl-C'd a process. :-P
>>
>> *sigh*
>
> I actually don't like the concept of CURSIG() forcing a context switch due to
> needing to grab Giant.  For one thing, it breaks the nice assertion of running
> processes not having p->p_wchan != NULL that caused my machine to panic.  I'm
> trying a patch right now that grabs Giant in msleep() before we grab the
> sched_lock so that the call to CURSIG() before mi_switch() won't need to
> block.  It then releases Giant after CURSIG().  For the CURSIG() after
> mi_switch(), doing another context switch due to blocking on Giant isn't a
> problem, so it doesn't mess with it.  (Not that there is anything one could
> do to work around it.)

Well, when I tried this the machine hung on the first fast interrupt handler
it ran, so it doesn't look like this approach works either. :-/  I've tried
splitting up p_procq into p_runq, p_sleepq, and p_mtxq (for processes blocked
on a mutex), but while those kernels boot ok and seem to sort of run, I end up
with hung processes.  If I (or anyone else) don't have a good solution for
this, then I think I will back out the changes to move Giant out of
mi_switch() tomorrow afternoon until we have a solution for this.

-- 

John Baldwin -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.Baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/
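
For readers following the linkage discussion, here is a rough userland sketch of why separate per-queue linkage matters.  It is not the kernel code or any actual patch: struct proc_sketch, demo_split_linkage(), and the counting helpers are hypothetical names made up for illustration, and only the field names p_runq and p_sleepq are borrowed from the message.  It uses the TAILQ macros from <sys/queue.h> and assumes nothing about locking.  With one shared entry, inserting a sleeping process into the run queue would overwrite the pointers the sleep queue depends on; with two entries, the process can sit on both queues at once.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical, simplified stand-in for struct proc (not the kernel layout). */
struct proc_sketch {
	int	pid;
	void	*p_wchan;			/* non-NULL while asleep */
	TAILQ_ENTRY(proc_sketch) p_runq;	/* used only by the run queue */
	TAILQ_ENTRY(proc_sketch) p_sleepq;	/* used only by the sleep queue */
};

TAILQ_HEAD(procq, proc_sketch);

/* Count entries by walking the run queue linkage. */
static int
runq_length(struct procq *q)
{
	struct proc_sketch *p;
	int n = 0;

	TAILQ_FOREACH(p, q, p_runq)
		n++;
	return (n);
}

/* Count entries by walking the sleep queue linkage. */
static int
sleepq_length(struct procq *q)
{
	struct proc_sketch *p;
	int n = 0;

	TAILQ_FOREACH(p, q, p_sleepq)
		n++;
	return (n);
}

/*
 * Put two processes to sleep, then place one of them on the run queue
 * while it is still on the sleep queue.  Returns 1 if both queues stay
 * intact throughout, 0 otherwise.
 */
static int
demo_split_linkage(void)
{
	struct procq runq, sleepq;
	struct proc_sketch a = { .pid = 1 }, b = { .pid = 2 };
	int chan;	/* dummy wait channel */
	int ok;

	TAILQ_INIT(&runq);
	TAILQ_INIT(&sleepq);

	a.p_wchan = &chan;
	b.p_wchan = &chan;
	TAILQ_INSERT_TAIL(&sleepq, &a, p_sleepq);
	TAILQ_INSERT_TAIL(&sleepq, &b, p_sleepq);

	/* 'a' goes on the run queue without leaving the sleep queue. */
	TAILQ_INSERT_TAIL(&runq, &a, p_runq);
	ok = (sleepq_length(&sleepq) == 2 && runq_length(&runq) == 1);

	/* Taking 'a' off the run queue must not disturb the sleep queue. */
	TAILQ_REMOVE(&runq, &a, p_runq);
	ok = ok && (sleepq_length(&sleepq) == 2 && runq_length(&runq) == 0);
	ok = ok && (TAILQ_FIRST(&sleepq) == &a && TAILQ_NEXT(&a, p_sleepq) == &b);
	return (ok);
}
```

Calling demo_split_linkage() should return 1.  If both queues shared a single TAILQ_ENTRY, the TAILQ_INSERT_TAIL onto runq would rewrite the next/prev pointers that the sleep queue walk relies on, which is the kind of corruption described in the quoted text.  The sketch says nothing about why the split kernels ended up with hung processes.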