From owner-freebsd-smp Sun Dec 22 01:06:20 1996
Return-Path:
Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id BAA24258 for smp-outgoing; Sun, 22 Dec 1996 01:06:20 -0800 (PST)
Received: from tfs.com (tfs.com [140.145.250.1]) by freefall.freebsd.org (8.8.4/8.8.4) with SMTP id BAA24253 for ; Sun, 22 Dec 1996 01:06:19 -0800 (PST)
Received: from critter.tfs.com by tfs.com (smail3.1.28.1) with SMTP id m0vbjqt-0003wDC; Sun, 22 Dec 96 01:05 PST
Received: from critter.tfs.com (localhost.phk.dk [127.0.0.1]) by critter.tfs.com (8.8.2/8.8.2) with ESMTP id KAA24902 for ; Sun, 22 Dec 1996 10:10:20 +0100 (MET)
To: smp@freebsd.org
Subject: P6 problem idea
Reply-to: phk@freebsd.org
Date: Sun, 22 Dec 1996 10:10:19 +0100
Message-ID: <24900.851245819@critter.tfs.com>
From: Poul-Henning Kamp
Sender: owner-smp@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

John Dyson has enabled the "Global" bit on some parts of the kernel.
This makes the marked pages immune to TLB flushes. Could this be the
reason for the P6 problems?

I believe there is a bit in a control register that will enable/disable
this; it's probably in locore.s somewhere.

--
Poul-Henning Kamp | phk@FreeBSD.ORG FreeBSD Core-team.
http://www.freebsd.org/~phk | phk@login.dknet.dk Private mailbox.
whois: [PHK] | phk@ref.tfs.com TRW Financial Systems, Inc.
Future will arrive by its own means, progress not so.

From owner-freebsd-smp Sun Dec 22 09:24:34 1996
Return-Path:
Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id JAA12559 for smp-outgoing; Sun, 22 Dec 1996 09:24:34 -0800 (PST)
Received: from dyson.iquest.net (dyson.iquest.net [198.70.144.127]) by freefall.freebsd.org (8.8.4/8.8.4) with ESMTP id JAA12553; Sun, 22 Dec 1996 09:24:31 -0800 (PST)
Received: (from root@localhost) by dyson.iquest.net (8.8.2/8.6.9) id MAA07051; Sun, 22 Dec 1996 12:24:35 -0500 (EST)
From: "John S.
Dyson" Message-Id: <199612221724.MAA07051@dyson.iquest.net> Subject: Re: P6 problem idea To: phk@freebsd.org Date: Sun, 22 Dec 1996 12:24:35 -0500 (EST) Cc: smp@freebsd.org In-Reply-To: <24900.851245819@critter.tfs.com> from "Poul-Henning Kamp" at Dec 22, 96 10:10:19 am X-Mailer: ELM [version 2.4 PL24 ME8] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-smp@freebsd.org X-Loop: FreeBSD.org Precedence: bulk > > > John Dyson has enabled the "Global" bit on some parts of the kernel. > This makes the marked pages immune to TLB flushes. Could this be the > reason for the P6 Problems ? > > I belive there is a bit in a control register that will enable/disable > this, it's probably in locore.s somewhere. > If you want to test that -- all you have to do is to disable the setting of PG_G in locore and in pmap. You can also optionally turn off the capability (but you still need to disable all of the PG_G bit setting) by not setting the CR4_PGE bit in cr4 also. This *could* be the problem, because you need to do the single page updates when the PG_G bit has been set (or there are more complex ways of doing it also.) The global update will just not work... (Of course, that is much of the purpose of the PG_G bit :-)). John From owner-freebsd-smp Sun Dec 22 09:36:48 1996 Return-Path: Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id JAA13569 for smp-outgoing; Sun, 22 Dec 1996 09:36:48 -0800 (PST) Received: from uruk.org (root@faustus.dev.com [198.145.95.253]) by freefall.freebsd.org (8.8.4/8.8.4) with SMTP id JAA13557 for ; Sun, 22 Dec 1996 09:36:43 -0800 (PST) Received: from uruk.org [127.0.0.1] (erich) by uruk.org with esmtp (Exim 0.53 #1) id E0vbsml-00080n-00; Sun, 22 Dec 1996 10:38:07 -0800 To: smp@freebsd.org Subject: (long) P6 and ??? TLB shootdown ??? Date: Sun, 22 Dec 1996 10:38:07 -0800 From: Erich Boleyn Message-Id: Sender: owner-smp@freebsd.org X-Loop: FreeBSD.org Precedence: bulk Hi all. 
I spent the last few days doing debugging exercises with
FreeBSD-SMP on my P6 SMP test box. The results were interesting.

First of all, I dug around in the debugger more, always getting an
error message and stack traceback that looks like the following
(modulo some differences in the "fault virtual address" and the
"current process" stuff):

---------------------------(start DDB stuff)--------------------------------
Fatal trap 12: page fault while in kernel mode
cpunumber = 0
fault virtual address = 0xffc00034
fault code = supervisor read, page not present
instruction pointer = 0x8:0xf01d158f
stack pointer = 0x10:0xefbffe94
frame pointer = 0x10:0xefbffeb0
...
current process = 419 (cc)
interrupt mask =
kernel: type 12 trap, code=0
Stopped at _pmap_enter+0x8f: movl 0(%ecx),%ecx
db> trace
_pmap_enter(f2336a64,d000,1d34000,7,0) at _pmap_enter+0x8f
_vm_fault(f2336a00,d000,3,0,0) at _vm_fault+0xd0b
_trap_pfault(efbfffbc,1) at _trap_pfault+0xd4
_trap(27,27,0,efbfdbac,efbfdba4) at _trap+0x14b
calltrap() at calltrap+0x1a
--- trap 12, eip = 0x1048, ebp = 0xefbfdba4 ---
--- curproc = 0xf22f6e00, pid = 419 ---
---------------------------(end DDB stuff)--------------------------------

I think it is not just TLB shootdown issues, for 2 reasons:

(1) I tried using the "examine" command for the virtual address
    listed in the error, and it gave me another "page fault in
    kernel mode" error, and

(2) I implemented a wait for all other CPUs after the TLB shootdown
    messages were sent, plus placing a *long* wait afterward for
    paranoia. This gave exactly the same results.
The two above points lead me to believe that:

(a) it is not a TLB shootdown issue in the sense that simply having
    a better rendezvous procedure, to make sure the TLB shootdowns all
    happen before the sending CPU proceeds, would solve it, and

(b) it looks like it might really be a problem in the code which
    sets up the kernel pmaps which point to the user-level pmaps,
    since I'm getting consistent page faults when accessing the
    page tables of the user-level process.

I'll continue to look into it later today...

--
Erich Stefan Boleyn \_ E-mail (preferred):
Mad Genius wanna-be, CyberMuffin \__ (finger me for other stats)
Web: http://www.uruk.org/~erich/ Motto: "I'll live forever or die trying"

From owner-freebsd-smp Sun Dec 22 13:50:08 1996
Return-Path:
Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id NAA21165 for smp-outgoing; Sun, 22 Dec 1996 13:50:08 -0800 (PST)
Received: from tfs.com (tfs.com [140.145.250.1]) by freefall.freebsd.org (8.8.4/8.8.4) with SMTP id NAA21155 for ; Sun, 22 Dec 1996 13:50:05 -0800 (PST)
Received: from critter.tfs.com by tfs.com (smail3.1.28.1) with SMTP id m0vbvku-0003xZC; Sun, 22 Dec 96 13:48 PST
Received: from critter.tfs.com (localhost.phk.dk [127.0.0.1]) by critter.tfs.com (8.8.2/8.8.2) with ESMTP id WAA26011; Sun, 22 Dec 1996 22:53:02 +0100 (MET)
To: "John S. Dyson"
cc: smp@freebsd.org
Subject: Re: P6 problem idea
In-reply-to: Your message of "Sun, 22 Dec 1996 12:24:35 EST." <199612221724.MAA07051@dyson.iquest.net>
Date: Sun, 22 Dec 1996 22:53:02 +0100
Message-ID: <26009.851291582@critter.tfs.com>
From: Poul-Henning Kamp
Sender: owner-smp@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

In message <199612221724.MAA07051@dyson.iquest.net>, "John S. Dyson" writes:
>>
>>
>> John Dyson has enabled the "Global" bit on some parts of the kernel.
>> This makes the marked pages immune to TLB flushes. Could this be the
>> reason for the P6 problems?
>> >> I belive there is a bit in a control register that will enable/disable >> this, it's probably in locore.s somewhere. >> >If you want to test that -- all you have to do is to disable the >setting of PG_G in locore and in pmap. You can also optionally >turn off the capability (but you still need to disable all of the >PG_G bit setting) by not setting the CR4_PGE bit in cr4 also. > >This *could* be the problem, because you need to do the single page >updates when the PG_G bit has been set (or there are more complex ways >of doing it also.) The global update will just not work... (Of course, >that is much of the purpose of the PG_G bit :-)). Well, we might have to be more selective about what we "Globalize" then... -- Poul-Henning Kamp | phk@FreeBSD.ORG FreeBSD Core-team. http://www.freebsd.org/~phk | phk@login.dknet.dk Private mailbox. whois: [PHK] | phk@tfs.com TRW Financial Systems, Inc. Power and ignorance is a disgusting cocktail. From owner-freebsd-smp Sun Dec 22 14:18:47 1996 Return-Path: Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id OAA22238 for smp-outgoing; Sun, 22 Dec 1996 14:18:47 -0800 (PST) Received: from dyson.iquest.net (dyson.iquest.net [198.70.144.127]) by freefall.freebsd.org (8.8.4/8.8.4) with ESMTP id OAA22233 for ; Sun, 22 Dec 1996 14:18:43 -0800 (PST) Received: (from root@localhost) by dyson.iquest.net (8.8.2/8.6.9) id RAA00490; Sun, 22 Dec 1996 17:18:37 -0500 (EST) From: "John S. 
Dyson" Message-Id: <199612222218.RAA00490@dyson.iquest.net> Subject: Re: P6 problem idea To: phk@critter.tfs.com (Poul-Henning Kamp) Date: Sun, 22 Dec 1996 17:18:36 -0500 (EST) Cc: smp@freebsd.org In-Reply-To: <26009.851291582@critter.tfs.com> from "Poul-Henning Kamp" at Dec 22, 96 10:53:02 pm Reply-To: dyson@freebsd.org X-Mailer: ELM [version 2.4 PL24 ME8] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-smp@freebsd.org X-Loop: FreeBSD.org Precedence: bulk > >If you want to test that -- all you have to do is to disable the > >setting of PG_G in locore and in pmap. You can also optionally > >turn off the capability (but you still need to disable all of the > >PG_G bit setting) by not setting the CR4_PGE bit in cr4 also. > > > >This *could* be the problem, because you need to do the single page > >updates when the PG_G bit has been set (or there are more complex ways > >of doing it also.) The global update will just not work... (Of course, > >that is much of the purpose of the PG_G bit :-)). > > Well, we might have to be more selective about what we "Globalize" then... > Only in the SMP case. The kernel itself (the image) is pretty safe, and there is quite alot of the bang to be gotten just for that. I think that it is OK to tradeoff a little in performance on an SMP kernel. In the single-processor kernel, the biggest thing that we currently cannot take advantage of mapping into the kernel permanently are the UPAGES and associated kernel stack. When we make the kernel stack more mobile, we'll even be able to take advantage of the PG_G flag for that. I still think that we should also take advantage of the 4M pages, but that is still probably in the future (except for perhaps mapping in the frame-buffer.) Sure wish those 4M pages were 256K or so, we could use them much more effectively. So, after all of my rambling, it is probably safe to keep the PG_G flag being set in locore at startup time (esp. 
for the kernel .text), but it might be too complex to use in SMP case where the mappings are more dynamic. John From owner-freebsd-smp Mon Dec 23 01:19:29 1996 Return-Path: Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id BAA25781 for smp-outgoing; Mon, 23 Dec 1996 01:19:29 -0800 (PST) Received: from mail001.mediacity.com (mail001.mediacity.com [206.24.105.68]) by freefall.freebsd.org (8.8.4/8.8.4) with SMTP id BAA25776 for ; Mon, 23 Dec 1996 01:19:26 -0800 (PST) From: brian@mediacity.com Received: (qmail-queue invoked from smtpd); 23 Dec 1996 09:19:02 -0000 Received: from home001.mediacity.com (HELO mediacity.com) (qmailr@206.24.105.66) by mail001.mediacity.com with SMTP; 23 Dec 1996 09:19:02 -0000 Received: (qmail-queue invoked by uid 100); 23 Dec 1996 09:18:08 -0000 Message-ID: <19961223091808.7334.qmail@mediacity.com> Subject: Re: (long) P6 and ??? TLB shootdown ??? To: erich@uruk.org (Erich Boleyn) Date: Mon, 23 Dec 1996 01:18:08 -0800 (PST) Cc: smp@freebsd.org In-Reply-To: from Erich Boleyn at "Dec 22, 96 10:38:07 am" Reply-To: brian@mediacity.com X-Mailer: ELM [version 2.4ME+ PL22 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-smp@freebsd.org X-Loop: FreeBSD.org Precedence: bulk Erich Boleyn wrote: > Hi all. I spent the last few days doing debugging exercises with > FreeBSD-SMP on my P6 SMP test box. The results were interesting. > > First of all, I dug around in the debugger more, always getting an > error message and stack traceback that always looks like the following > (modulo some differences in the "fault virtual address" and the > "current process" stuff): I get the identical fault, at the same point, running on my P6 ASUS 2xPP200 motherboard. brian@mediacity.com > > ---------------------------(start DDB stuff)-------------------------------- > Fatal trap 12: page fault while in kernel mode ... 
> kernel: type 12 trap, code=0 > Stopped at _pmap_enter+0x8f: movl 0(%ecx),%ecx From owner-freebsd-smp Mon Dec 23 18:31:33 1996 Return-Path: Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id SAA05115 for smp-outgoing; Mon, 23 Dec 1996 18:31:33 -0800 (PST) Received: from uruk.org (root@ns.uruk.org [198.145.95.253]) by freefall.freebsd.org (8.8.4/8.8.4) with SMTP id SAA05108 for ; Mon, 23 Dec 1996 18:31:28 -0800 (PST) Received: from uruk.org [127.0.0.1] (erich) by uruk.org with esmtp (Exim 0.53 #1) id E0vcNbo-0003EK-00; Mon, 23 Dec 1996 19:32:52 -0800 To: smp@freebsd.org Subject: Eureka (maybe...) (was -> Re: P6 problem idea ) In-reply-to: Your message of "Sun, 22 Dec 1996 17:18:36 EST." <199612222218.RAA00490@dyson.iquest.net> Date: Mon, 23 Dec 1996 19:32:52 -0800 From: Erich Boleyn Message-Id: Sender: owner-smp@freebsd.org X-Loop: FreeBSD.org Precedence: bulk Hi all. After seeing the messages about the Page Global bit being set, it seemed clear that it is at least some kind of problem, and the right entries aren't being flushed from the TLB, whether it was the major problem causing the crashes or not. I tried shutting off the page global stuff, and while I don't have a difinitively long run yet, it has run through 3 full kernel compiles with no crash yet. I'll run it for the next 1 1/2 hours and see if it lives through that. If so, I think we have our main culprit (I'll also post the (small) code change which synchronizes the CPUs on TLB shootdown before letting the sender continue). All that said, I'm very surprised that this *isn't* also a serious problem on the Pentium (the Pentium has the Page Global stuff as well... I didn't look to see if it is used for the Pentium as well as the Pentium Pro). 
-- Erich Stefan Boleyn \_ E-mail (preferred): Mad Genius wanna-be, CyberMuffin \__ (finger me for other stats) Web: http://www.uruk.org/~erich/ Motto: "I'll live forever or die trying" From owner-freebsd-smp Mon Dec 23 21:20:31 1996 Return-Path: Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id VAA09335 for smp-outgoing; Mon, 23 Dec 1996 21:20:31 -0800 (PST) Received: from uruk.org (root@ns.uruk.org [198.145.95.253]) by freefall.freebsd.org (8.8.4/8.8.4) with SMTP id VAA09330 for ; Mon, 23 Dec 1996 21:20:27 -0800 (PST) Received: from uruk.org [127.0.0.1] (erich) by uruk.org with esmtp (Exim 0.53 #1) id E0vcQFJ-0003Y6-00; Mon, 23 Dec 1996 22:21:49 -0800 To: smp@freebsd.org cc: haertel@ichips.intel.com, wscott@ichips.intel.com Subject: I think we have the culprit!! (was -> Re: Eureka (maybe...) (was -> Re: P6 problem idea ) ) In-reply-to: Your message of "Mon, 23 Dec 1996 19:32:52 PST." Date: Mon, 23 Dec 1996 22:21:49 -0800 From: Erich Boleyn Message-Id: Sender: owner-smp@freebsd.org X-Loop: FreeBSD.org Precedence: bulk Erich Boleyn writes: > I tried shutting off the page global stuff, and while I don't have > a difinitively long run yet, it has run through 3 full kernel compiles > with no crash yet. I'll run it for the next 1 1/2 hours and see if > it lives through that. If so, I think we have our main culprit (I'll > also post the (small) code change which synchronizes the CPUs on TLB > shootdown before letting the sender continue). Well, after 2 hours of kernel builds, and now a few sets of 4 parallel kernel builds later, the system is still running great. I think we have our culprit... the Page Global stuff (plus adding the TLB shootdown synchronization may be helping a little with stability, but it's absence doesn't appear to be the major cause). > All that said, I'm very surprised that this *isn't* also a serious > problem on the Pentium (the Pentium has the Page Global stuff as > well... 
I didn't look to see if it is used for the Pentium as well > as the Pentium Pro). I might be confused here, but as mentioned in the above comment, I thought this was implemented in the Pentium as well. Can someone who remembers better (or has the "Appendix H" equivalent released documentation) comment? -- Erich Stefan Boleyn \_ E-mail (preferred): Mad Genius wanna-be, CyberMuffin \__ (finger me for other stats) Web: http://www.uruk.org/~erich/ Motto: "I'll live forever or die trying" From owner-freebsd-smp Mon Dec 23 22:06:44 1996 Return-Path: Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id WAA10662 for smp-outgoing; Mon, 23 Dec 1996 22:06:44 -0800 (PST) Received: from root.com (implode.root.com [198.145.90.17]) by freefall.freebsd.org (8.8.4/8.8.4) with ESMTP id WAA10657 for ; Mon, 23 Dec 1996 22:06:42 -0800 (PST) Received: from localhost (localhost [127.0.0.1]) by root.com (8.7.6/8.6.5) with SMTP id WAA27540; Mon, 23 Dec 1996 22:05:33 -0800 (PST) Message-Id: <199612240605.WAA27540@root.com> X-Authentication-Warning: implode.root.com: Host localhost [127.0.0.1] didn't use HELO protocol To: Erich Boleyn cc: smp@freebsd.org, haertel@ichips.intel.com, wscott@ichips.intel.com Subject: Re: I think we have the culprit!! (was -> Re: Eureka (maybe...) (was -> Re: P6 problem idea ) ) In-reply-to: Your message of "Mon, 23 Dec 1996 22:21:49 PST." From: David Greenman Reply-To: dg@root.com Date: Mon, 23 Dec 1996 22:05:33 -0800 Sender: owner-smp@freebsd.org X-Loop: FreeBSD.org Precedence: bulk > >Erich Boleyn writes: > >> I tried shutting off the page global stuff, and while I don't have >> a difinitively long run yet, it has run through 3 full kernel compiles >> with no crash yet. I'll run it for the next 1 1/2 hours and see if >> it lives through that. If so, I think we have our main culprit (I'll >> also post the (small) code change which synchronizes the CPUs on TLB >> shootdown before letting the sender continue). 
> >Well, after 2 hours of kernel builds, and now a few sets of 4 parallel >kernel builds later, the system is still running great. > >I think we have our culprit... the Page Global stuff (plus adding the >TLB shootdown synchronization may be helping a little with stability, but >it's absence doesn't appear to be the major cause). > >> All that said, I'm very surprised that this *isn't* also a serious >> problem on the Pentium (the Pentium has the Page Global stuff as >> well... I didn't look to see if it is used for the Pentium as well >> as the Pentium Pro). > >I might be confused here, but as mentioned in the above comment, I >thought this was implemented in the Pentium as well. Can someone >who remembers better (or has the "Appendix H" equivalent released >documentation) comment? Tge "PGE" feature doesn't appear to be present in the stepping 4 or stepping 12 chips that I have here...so if the Pentium has the feature, it must have been added only very recently. -DG David Greenman Core-team/Principal Architect, The FreeBSD Project From owner-freebsd-smp Mon Dec 23 22:38:13 1996 Return-Path: Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id WAA11403 for smp-outgoing; Mon, 23 Dec 1996 22:38:13 -0800 (PST) Received: from uruk.org (root@ns.uruk.org [198.145.95.253]) by freefall.freebsd.org (8.8.4/8.8.4) with SMTP id WAA11398 for ; Mon, 23 Dec 1996 22:38:09 -0800 (PST) Received: from uruk.org [127.0.0.1] (erich) by uruk.org with esmtp (Exim 0.53 #1) id E0vcRSI-0003hW-00; Mon, 23 Dec 1996 23:39:18 -0800 To: dg@root.com cc: smp@freebsd.org, haertel@ichips.intel.com, wscott@ichips.intel.com Subject: Re: I think we have the culprit!! (was -> Re: Eureka (maybe...) (was -> Re: P6 problem idea ) ) In-reply-to: Your message of "Mon, 23 Dec 1996 22:05:33 PST." 
<199612240605.WAA27540@root.com> Date: Mon, 23 Dec 1996 23:39:18 -0800 From: Erich Boleyn Message-Id: Sender: owner-smp@freebsd.org X-Loop: FreeBSD.org Precedence: bulk David Greenman writes: > >Erich Boleyn writes: ... > >I might be confused here, but as mentioned in the above comment, I > >thought this was implemented in the Pentium as well. Can someone > >who remembers better (or has the "Appendix H" equivalent released > >documentation) comment? > > Tge "PGE" feature doesn't appear to be present in the stepping 4 or > stepping 12 chips that I have here...so if the Pentium has the feature, > it must have been added only very recently. Well, if it wasn't in the mainstream Pentium CPUs, I don't think it would be in any of the newer ones. I think the only "new" features to the recent Pentium CPUs have been widely advertised (MMX, faster clocks...). Anyway, it seems reasonable that this is the main difference. I'll generate a cvs diff for the tree I have tomorrow morning (it conditionally compiles the Page Global stuff on not having SMP enabled, and adds the waiting of the other CPUs to the TLB shootdown... it doesn't synchronize the other CPUs *before* the page tables are changed, but that seems to be a much rarer problem, so getting this in as is is probably worthwhile). I must go to bed for now. 
-- Erich Stefan Boleyn \_ E-mail (preferred): Mad Genius wanna-be, CyberMuffin \__ (finger me for other stats) Web: http://www.uruk.org/~erich/ Motto: "I'll live forever or die trying" From owner-freebsd-smp Mon Dec 23 22:39:54 1996 Return-Path: Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id WAA11434 for smp-outgoing; Mon, 23 Dec 1996 22:39:54 -0800 (PST) Received: from spinner.DIALix.COM (root@spinner.DIALix.COM [192.203.228.67]) by freefall.freebsd.org (8.8.4/8.8.4) with ESMTP id WAA11429 for ; Mon, 23 Dec 1996 22:39:50 -0800 (PST) Received: from spinner.DIALix.COM (peter@localhost.DIALix.oz.au [127.0.0.1]) by spinner.DIALix.COM (8.8.4/8.8.4) with ESMTP id OAA19697; Tue, 24 Dec 1996 14:00:47 +0800 (WST) Message-Id: <199612240600.OAA19697@spinner.DIALix.COM> X-Mailer: exmh version 1.6.9 8/22/96 To: Erich Boleyn cc: smp@freebsd.org, haertel@ichips.intel.com, wscott@ichips.intel.com Subject: Re: I think we have the culprit!! (was -> Re: Eureka (maybe...) (was -> Re: P6 problem idea ) ) In-reply-to: Your message of "Mon, 23 Dec 1996 22:21:49 PST." Date: Tue, 24 Dec 1996 14:00:46 +0800 From: Peter Wemm Sender: owner-smp@freebsd.org X-Loop: FreeBSD.org Precedence: bulk Erich Boleyn wrote: > Erich Boleyn writes: > > I tried shutting off the page global stuff, and while I don't have > > a difinitively long run yet, it has run through 3 full kernel compiles > > with no crash yet. I'll run it for the next 1 1/2 hours and see if > > it lives through that. If so, I think we have our main culprit (I'll > > also post the (small) code change which synchronizes the CPUs on TLB > > shootdown before letting the sender continue). > > Well, after 2 hours of kernel builds, and now a few sets of 4 parallel > kernel builds later, the system is still running great. > > I think we have our culprit... 
the Page Global stuff (plus adding the
> TLB shootdown synchronization may be helping a little with stability, but
> its absence doesn't appear to be the major cause).

Hmm.. Interesting... I have a theory.

On the standard kernel when on a cpu_class >= PPro (not Pentium) we set
the PG_G bits. We also have an invltlb() function call as well as the
page level invlpg() and invl2pg() calls. (invl2pg just does two invlpg's
in a single function call to lower the function call overheads).

On the SMP kernel, all three of these functions cause a "global
invalidate" broadcast. If the initiating CPU is actually trying to modify
a PG_G page, this will screw up since the per-page invalidate gets
converted to a global invalidate on the other CPUs, and hence they don't
flush their PG_G page.

Does that sound like a plausible explanation?

If so, we need to refine the implementation of TLB shootdowns more so that
we can initiate a per-page flush as well as a global flush.. This will
require synchronisation, so if you can send your code you can save some
reinvention.. :-)

> > All that said, I'm very surprised that this *isn't* also a serious
> > problem on the Pentium (the Pentium has the Page Global stuff as
> > well... I didn't look to see if it is used for the Pentium as well
> > as the Pentium Pro).
>
> I might be confused here, but as mentioned in the above comment, I
> thought this was implemented in the Pentium as well. Can someone
> who remembers better (or has the "Appendix H" equivalent released
> documentation) comment?

Don't know about the Pentium, but we definitely don't enable it on
anything smaller than a PPro.
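
The theory above can be illustrated with a toy model. This is only a
sketch in Python, not kernel code: the ToyTLB class, its method names and
the page/frame numbers are made up for illustration. It assumes that a
"global invalidate" behaves like a %cr3 reload with CR4_PGE enabled (PG_G
entries survive), while invlpg drops the entry for one page whether or
not PG_G is set.

```python
class ToyTLB:
    """Toy model of one CPU's TLB (hypothetical, for illustration only)."""

    def __init__(self):
        self.entries = {}  # virtual page -> (physical frame, is_global)

    def fill(self, vpage, frame, is_global=False):
        self.entries[vpage] = (frame, is_global)

    def invltlb(self):
        # Models a %cr3 reload with CR4_PGE enabled: PG_G entries survive.
        self.entries = {v: e for v, e in self.entries.items() if e[1]}

    def invlpg(self, vpage):
        # Models invlpg: the entry for this page goes away even if PG_G is set.
        self.entries.pop(vpage, None)

    def lookup(self, vpage):
        entry = self.entries.get(vpage)
        return None if entry is None else entry[0]


def remap_global_page(per_page_shootdown):
    """Return what the *other* CPU still has cached after a remap."""
    initiator, other = ToyTLB(), ToyTLB()
    vpage, old_frame = 0xF0123, 0x100  # arbitrary example values
    for cpu in (initiator, other):
        cpu.fill(vpage, old_frame, is_global=True)

    # The initiator changes the page table entry and invalidates locally.
    initiator.invlpg(vpage)

    # The other CPU receives either a per-page or a "global" shootdown.
    if per_page_shootdown:
        other.invlpg(vpage)
    else:
        other.invltlb()
    return other.lookup(vpage)
```

In this model, remap_global_page(False) leaves the stale frame cached on
the other CPU, while remap_global_page(True) does not, which is the
failure mode being proposed.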
Cheers, -Peter From owner-freebsd-smp Mon Dec 23 22:53:27 1996 Return-Path: Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id WAA11840 for smp-outgoing; Mon, 23 Dec 1996 22:53:27 -0800 (PST) Received: from root.com (implode.root.com [198.145.90.17]) by freefall.freebsd.org (8.8.4/8.8.4) with ESMTP id WAA11835 for ; Mon, 23 Dec 1996 22:53:25 -0800 (PST) Received: from localhost (localhost [127.0.0.1]) by root.com (8.7.6/8.6.5) with SMTP id WAA27742; Mon, 23 Dec 1996 22:52:10 -0800 (PST) Message-Id: <199612240652.WAA27742@root.com> X-Authentication-Warning: implode.root.com: Host localhost [127.0.0.1] didn't use HELO protocol To: Peter Wemm cc: Erich Boleyn , smp@freebsd.org, haertel@ichips.intel.com, wscott@ichips.intel.com Subject: Re: I think we have the culprit!! (was -> Re: Eureka (maybe...) (was -> Re: P6 problem idea ) ) In-reply-to: Your message of "Tue, 24 Dec 1996 14:00:46 +0800." <199612240600.OAA19697@spinner.DIALix.COM> From: David Greenman Reply-To: dg@root.com Date: Mon, 23 Dec 1996 22:52:09 -0800 Sender: owner-smp@freebsd.org X-Loop: FreeBSD.org Precedence: bulk >On the standard kernel when on a cpu_class >= PPro (not Pentium) we set >the PG_G bits. Actually, it is conditional on the "PGE" chip feature...it doesn't matter if it is a P5 or P6. > We also have an invltlb() function call as well as the >page level invlpg() and invl2pg() calls. (invl2pg just does two invlpg's >in a single function call to lower the function call overheads). > >On the SMP kernel, all three of these functions cause an "global >invalidate" broadcast. If the initiating cpu is actually trying to modify >a PG_G page, this will screw up since the per-page invalidate gets >converted to a global invalidate on the other cpu's, and hence they don't >flush their PG_G page. > >Does that sound like a plausable explanation? That's what we've talking about... 
>If so, we need to refine the implementation of TLB shootdowns more so that >we can initiate a per-page flush as well as a global flush.. This will >require syncronisation, so if you can send your code you can save some >reinvention.. :-) Uh, no, I don't think that is the right solution. The single-page invalidates are going to be too expensive in the SMP case. I think a better solution would be to simply only set the PG_G flag for static mappings (kernel text, data, and bss), and NOT set it for other kernel mappings if SMP is true. This is a trivial change to pmap.c. >> I might be confused here, but as mentioned in the above comment, I >> thought this was implemented in the Pentium as well. Can someone >> who remembers better (or has the "Appendix H" equivalent released >> documentation) comment? > >Don't know about the Pentium, but we definately don't enable it on >anything smaller than a PPro. See above. -DG David Greenman Core-team/Principal Architect, The FreeBSD Project From owner-freebsd-smp Mon Dec 23 23:19:41 1996 Return-Path: Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id XAA12493 for smp-outgoing; Mon, 23 Dec 1996 23:19:41 -0800 (PST) Received: from spinner.DIALix.COM (root@spinner.DIALix.COM [192.203.228.67]) by freefall.freebsd.org (8.8.4/8.8.4) with ESMTP id XAA12484 for ; Mon, 23 Dec 1996 23:19:28 -0800 (PST) Received: from spinner.DIALix.COM (peter@localhost.DIALix.oz.au [127.0.0.1]) by spinner.DIALix.COM (8.8.4/8.8.4) with ESMTP id PAA00923; Tue, 24 Dec 1996 15:17:38 +0800 (WST) Message-Id: <199612240717.PAA00923@spinner.DIALix.COM> X-Mailer: exmh version 1.6.9 8/22/96 To: dg@root.com cc: Erich Boleyn , smp@freebsd.org, haertel@ichips.intel.com, wscott@ichips.intel.com Subject: Re: I think we have the culprit!! (was -> Re: Eureka (maybe...) (was -> Re: P6 problem idea ) ) In-reply-to: Your message of "Mon, 23 Dec 1996 22:52:09 PST." 
<199612240652.WAA27742@root.com> Date: Tue, 24 Dec 1996 15:17:37 +0800 From: Peter Wemm Sender: owner-smp@freebsd.org X-Loop: FreeBSD.org Precedence: bulk David Greenman wrote: > >If so, we need to refine the implementation of TLB shootdowns more so that > >we can initiate a per-page flush as well as a global flush.. This will > >require syncronisation, so if you can send your code you can save some > >reinvention.. :-) > > Uh, no, I don't think that is the right solution. The single-page > invalidates are going to be too expensive in the SMP case. I think a > better solution would be to simply only set the PG_G flag for static > mappings (kernel text, data, and bss), and NOT set it for other kernel > mappings if SMP is true. This is a trivial change to pmap.c. What I had in mind was simply storing up to the two addresses. When the IPI handler on the slave CPU fires, it checks the first address. If it's zero, it does a global invalidate and returns. If it's set it does an invlpg on the address. If the second one is set, it does an invlpg on that one too. The cost of doing this relative to the cost of the IPI in the first place should be pretty small since it's just a few extra instructions. How expensive are global unnecessary TLB refills? Since only one cpu can ever be sending a TLB flush at any given time (only one in the kernel proper), we have implicit locking for free. We need to wait for the target cpu's to process the invalidates anyway before we start using "stolen" pages etc so with that feature present we should not have to worry about protecting the two address pointers from reuse before all cpu's have finished using them. 
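
A rough sketch of the handler logic described above, written as a Python
model rather than kernel C. The names RecordingCPU, handle_shootdown_ipi
and broadcast_shootdown are hypothetical, and the model only records
which invalidate operations each target CPU would perform; the wait for
all targets to finish is reduced to a comment.

```python
class RecordingCPU:
    """Stand-in for a target CPU; it just logs the invalidates it is asked to do."""

    def __init__(self):
        self.ops = []

    def invltlb(self):
        self.ops.append(("invltlb",))

    def invlpg(self, addr):
        self.ops.append(("invlpg", addr))


def handle_shootdown_ipi(cpu, addr1=0, addr2=0):
    """Model of the IPI handler: zero in the first slot means 'flush everything'."""
    if addr1 == 0:
        cpu.invltlb()
        return
    cpu.invlpg(addr1)
    if addr2 != 0:
        cpu.invlpg(addr2)


def broadcast_shootdown(targets, addr1=0, addr2=0):
    """Model of the sender: publish up to two addresses, then run each handler."""
    for cpu in targets:
        handle_shootdown_ipi(cpu, addr1, addr2)
    # In the scheme described above the sender would wait here until every
    # target has processed the request before reusing the two address slots.
```

For example, broadcast_shootdown(cpus, 0xf0123000) would make every target
log a single invlpg for that address, and broadcast_shootdown(cpus) with
no address falls back to the global invalidate.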

Cheers,
-Peter

From owner-freebsd-smp Mon Dec 23 23:42:20 1996 Return-Path: Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id XAA13393 for smp-outgoing; Mon, 23 Dec 1996 23:42:20 -0800 (PST) Received: from root.com (implode.root.com [198.145.90.17]) by freefall.freebsd.org (8.8.4/8.8.4) with ESMTP id XAA13388 for ; Mon, 23 Dec 1996 23:42:17 -0800 (PST) Received: from localhost (localhost [127.0.0.1]) by root.com (8.7.6/8.6.5) with SMTP id XAA27949; Mon, 23 Dec 1996 23:40:35 -0800 (PST) Message-Id: <199612240740.XAA27949@root.com> X-Authentication-Warning: implode.root.com: Host localhost [127.0.0.1] didn't use HELO protocol To: Peter Wemm cc: Erich Boleyn , smp@freebsd.org, haertel@ichips.intel.com, wscott@ichips.intel.com Subject: Re: I think we have the culprit!! (was -> Re: Eureka (maybe...) (was -> Re: P6 problem idea ) ) In-reply-to: Your message of "Tue, 24 Dec 1996 15:17:37 +0800." <199612240717.PAA00923@spinner.DIALix.COM> From: David Greenman Reply-To: dg@root.com Date: Mon, 23 Dec 1996 23:40:35 -0800 Sender: owner-smp@freebsd.org X-Loop: FreeBSD.org Precedence: bulk

>David Greenman wrote:
>> >If so, we need to refine the implementation of TLB shootdowns more so that
>> >we can initiate a per-page flush as well as a global flush.. This will
>> >require synchronisation, so if you can send your code you can save some
>> >reinvention.. :-)
>>
>> Uh, no, I don't think that is the right solution. The single-page
>> invalidates are going to be too expensive in the SMP case. I think a
>> better solution would be to simply only set the PG_G flag for static
>> mappings (kernel text, data, and bss), and NOT set it for other kernel
>> mappings if SMP is true. This is a trivial change to pmap.c.
>
>What I had in mind was simply storing up to two addresses. When the
>IPI handler on the slave CPU fires, it checks the first address. If it's
>zero, it does a global invalidate and returns. If it's set, it does an
>invlpg on that address. If the second one is set, it does an invlpg on that
>one too. The cost of doing this relative to the cost of the IPI in the
>first place should be pretty small since it's just a few extra
>instructions. How expensive are unnecessary global TLB refills?

In the case where the single-page invalidates only happen once (not in
a loop - pmap_kremove()), this would be a win. In the other case, where
many pages are invalidated one at a time (pmap_qremove), it would be a
loss. I think that in our current code, pmap_kenter's are always paired with
pmap_kremove's, and pmap_qenter's are always paired with pmap_qremove's, but
it is probably unsafe to assume this will always be true...so I'm a
little uncomfortable with splitting the behavior (PG_G in one case, not in
the other).

-DG

David Greenman
Core-team/Principal Architect, The FreeBSD Project

From owner-freebsd-smp Tue Dec 24 01:33:08 1996 Return-Path: Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id BAA16539 for smp-outgoing; Tue, 24 Dec 1996 01:33:08 -0800 (PST) Received: from tfs.com (tfs.com [140.145.250.1]) by freefall.freebsd.org (8.8.4/8.8.4) with SMTP id BAA16534 for ; Tue, 24 Dec 1996 01:33:06 -0800 (PST) Received: from critter.tfs.com by tfs.com (smail3.1.28.1) with SMTP id m0vcTDt-0003vkC; Tue, 24 Dec 96 01:32 PST Received: from critter.tfs.com (localhost.phk.dk [127.0.0.1]) by critter.tfs.com (8.8.2/8.8.2) with ESMTP id KAA29101; Tue, 24 Dec 1996 10:37:29 +0100 (MET) To: Erich Boleyn cc: smp@freebsd.org Subject: Re: Eureka (maybe...) (was -> Re: P6 problem idea ) In-reply-to: Your message of "Mon, 23 Dec 1996 19:32:52 PST." Date: Tue, 24 Dec 1996 10:37:28 +0100 Message-ID: <29099.851420248@critter.tfs.com> From: Poul-Henning Kamp Sender: owner-smp@freebsd.org X-Loop: FreeBSD.org Precedence: bulk

In message , Erich Boleyn writes:

>I tried shutting off the page global stuff, and while I don't have
>a definitively long run yet, it has run through 3 full kernel compiles
>with no crash yet. I'll run it for the next 1 1/2 hours and see if
>it lives through that. If so, I think we have our main culprit (I'll
>also post the (small) code change which synchronizes the CPUs on TLB
>shootdown before letting the sender continue).

Cool! Once again proves that thinking about things far removed
from computers also works :-)

>All that said, I'm very surprised that this *isn't* also a serious
>problem on the Pentium (the Pentium has the Page Global stuff as
>well... I didn't look to see if it is used for the Pentium as well
>as the Pentium Pro).

Now that is an interesting thing. I wonder if there is some underlying
difference in the global bit, Hmm...

The P5 doesn't have a global bit, does it ? That would qualify for
a "difference" to me :-)

Merry Xmas!

--
Poul-Henning Kamp           | phk@FreeBSD.ORG       FreeBSD Core-team.
http://www.freebsd.org/~phk | phk@login.dknet.dk    Private mailbox.
whois: [PHK]                | phk@tfs.com           TRW Financial Systems, Inc.
Power and ignorance is a disgusting cocktail.

From owner-freebsd-smp Tue Dec 24 09:39:54 1996 Return-Path: Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id JAA27446 for smp-outgoing; Tue, 24 Dec 1996 09:39:54 -0800 (PST) Received: from uruk.org (root@ns.uruk.org [198.145.95.253]) by freefall.freebsd.org (8.8.4/8.8.4) with SMTP id JAA27440 for ; Tue, 24 Dec 1996 09:39:49 -0800 (PST) Received: from uruk.org [127.0.0.1] (erich) by uruk.org with esmtp (Exim 0.53 #1) id E0vcbmp-0004zg-00; Tue, 24 Dec 1996 10:41:11 -0800 To: smp@freebsd.org Subject: Re: I think we have the culprit!! (was -> Re: Eureka (maybe...)
(was -> Re: P6 problem idea ) ) In-reply-to: Your message of "Mon, 23 Dec 1996 23:39:18 PST." Date: Tue, 24 Dec 1996 10:41:11 -0800 From: Erich Boleyn Message-Id: Sender: owner-smp@freebsd.org X-Loop: FreeBSD.org Precedence: bulk

Erich Boleyn writes:

> I'll generate a cvs diff for the tree I have tomorrow morning (it
> conditionally compiles the Page Global stuff on not having SMP
> enabled, and adds the waiting of the other CPUs to the TLB shootdown...
> it doesn't synchronize the other CPUs *before* the page tables are
> changed, but that seems to be a much rarer problem, so getting
> this in as is is probably worthwhile). I must go to bed for now.

I figured I'd send unified diffs of what I have, which works on my
machine without crashing or corruption that I could detect, but there
was a snag...

The diff included below comments out the TLB shootdown rendezvous
because either the broadcast doesn't work quite right or one of the
CPUs has blocked interrupts in some way. In any case, somewhere along
the way it hangs. Breaking into the kernel debugger invariably showed
a bit in the "smp_invalidate_needed" variable still set.

A variant where I simply set "smp_invalidate_needed" to 1 on the sending
CPU and to 0 in the IPI handler routine (i.e. the first CPU receiving the
interrupt will release the sender) seemed to work OK, but in general I
figured it was a band-aid that might fail if all the other CPUs don't get
a message (a 4-CPU machine like mine might be OK, but a 2-CPU machine
might choke before too long).

So, this is indicative of some kind of problem with TLB invalidates not
getting sent to or received by all CPUs (or maybe my "lock" instruction
wasn't assembled correctly and the CPUs didn't do locked bus cycles???
I'm just not sure).
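
[Archive note] The rendezvous described above (sender sets one bit per other CPU, each receiver clears its own bit after flushing, sender waits for zero) can be modelled in portable C11. This is a sketch under assumptions, not the kernel code: C11 atomics stand in for the "lock ; btr" instruction, the toy_ names are hypothetical, and the CPUs are simulated by plain function calls rather than real interrupts.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* One bit per CPU that still has to acknowledge the invalidate. */
static atomic_uint toy_invalidate_needed;

/* Sender: mark every CPU except ourselves as pending, before the IPI. */
static unsigned toy_shootdown_begin(unsigned ncpus, unsigned self)
{
    assert(ncpus >= 1 && ncpus < 32 && self < ncpus);
    unsigned mask = ((1u << ncpus) - 1u) & ~(1u << self);
    atomic_store(&toy_invalidate_needed, mask);
    return mask;
}

/* Receiver: atomically clear our own bit once the local flush is done. */
static void toy_shootdown_ack(unsigned cpu)
{
    atomic_fetch_and(&toy_invalidate_needed, ~(1u << cpu));
}

/* Sender: true once every other CPU has acknowledged. */
static bool toy_shootdown_done(void)
{
    return atomic_load(&toy_invalidate_needed) == 0;
}
```

With four CPUs and CPU 0 sending, the mask starts as 0xE and only reaches zero after CPUs 1, 2 and 3 have all acknowledged. A sender that spins on this value never returns if a single acknowledgement is missing, which is consistent with the hang and the leftover bit reported above.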

----------------------(start cvs diff -u)-----------------------
Index: sys/i386/i386/locore.s
===================================================================
RCS file: /usr/cvssup/sys/i386/i386/locore.s,v
retrieving revision 1.34
diff -u -r1.34 locore.s
--- locore.s 1996/12/03 05:51:09 1.34
+++ locore.s 1996/12/24 06:30:02
@@ -725,12 +725,14 @@
 create_pagetables:
+#ifndef SMP
         testl   $CPUID_PGE, R(_cpu_feature)
         jz      1f
         movl    %cr4, %eax
         orl     $CR4_PGE, %eax
         movl    %eax, %cr4
 1:
+#endif
 /* Find end of kernel image (rounded up to a page boundary). */
         movl    $R(_end),%esi
@@ -792,9 +794,11 @@
         jne     map_read_write
 #endif
         xorl    %edx,%edx
+#ifndef SMP
         testl   $CPUID_PGE, R(_cpu_feature)
         jz      2f
         orl     $PG_G,%edx
+#endif
 2:      movl    $R(_etext),%ecx
         addl    $PAGE_MASK,%ecx
@@ -807,9 +811,11 @@
         andl    $~PAGE_MASK, %eax
 map_read_write:
         movl    $PG_RW,%edx
+#ifndef SMP
         testl   $CPUID_PGE, R(_cpu_feature)
         jz      1f
         orl     $PG_G,%edx
+#endif
 1:      movl    R(_KERNend),%ecx
         subl    %eax,%ecx
Index: sys/i386/i386/mp_machdep.c
===================================================================
RCS file: /usr/cvssup/sys/i386/i386/mp_machdep.c,v
retrieving revision 1.35
diff -u -r1.35 mp_machdep.c
--- mp_machdep.c 1996/12/12 08:43:52 1.35
+++ mp_machdep.c 1996/12/24 17:08:02
@@ -1545,9 +1545,23 @@
 /*
  * Flush the TLB on all other CPU's
  *
- * XXX: Needs to handshake and wait for completion before proceding.
+ * XXX: Needs to handshake and wait for completion before proceeding.
+ *
+ * XXX: -- Erich Boleyn
+ * The code inside the "WORKING_INVALIDATE_RENDEVOUS" defines implements a
+ * handshake waiting for all the other CPUs to complete their invalidate.
+ * "WORKING_INVALIDATE_RENDEVOUS" is undefined because the "allButSelfIPI"
+ * doesn't always seem to be received by all CPUs. This could be a
+ * problem where one of other CPUs' interrupt controllers are pending
+ * on a higher priority interrupt or some such thing... I really don't
+ * know at this point, but I figured I'd leave the code in since it
+ * works fine otherwise.
  */
+#ifdef WORKING_INVALIDATE_RENDEVOUS
+volatile unsigned smp_invalidate_needed = 0;
+#endif
+
 void
 smp_invltlb()
 {
@@ -1558,8 +1572,15 @@
                         serial_putc('A' + cpunumber());
                 }
 #endif
-                if (invldebug & 2)
+                if (invldebug & 2) {
+#ifdef WORKING_INVALIDATE_RENDEVOUS
+                        smp_invalidate_needed = ((1 << mp_ncpus) - 1) & ~(1 << cpunumber());
+#endif
                         allButSelfIPI(ICU_OFFSET+27);
+#ifdef WORKING_INVALIDATE_RENDEVOUS
+                        while (smp_invalidate_needed);
+#endif
+                }
         }
 }
@@ -1606,6 +1627,15 @@
                 __asm __volatile("movl %%cr3, %0; movl %0, %%cr3" : "=r" (temp) : : "memory");
+#ifdef WORKING_INVALIDATE_RENDEVOUS
+                /*
+                 * This is an atomic bit reset, to declare to the world that
+                 * the invalidate for this CPU has been performed.
+                 */
+                temp = cpunumber();
+                __asm __volatile("lock ; btr %0, _smp_invalidate_needed" : :
+                                 "r" (temp) : "memory");
+#endif
         }
 }
 #endif /* SMP_INVLTLB */
Index: sys/i386/i386/pmap.c
===================================================================
RCS file: /usr/cvssup/sys/i386/i386/pmap.c,v
retrieving revision 1.32
diff -u -r1.32 pmap.c
--- pmap.c 1996/12/12 08:43:53 1.32
+++ pmap.c 1996/12/24 17:15:21
@@ -157,7 +157,9 @@
 vm_offset_t virtual_end; /* VA of last avail page (end of kernel AS) */
 static boolean_t pmap_initialized = FALSE; /* Has pmap_init completed? */
 static vm_offset_t vm_first_phys;
+#ifndef SMP
 static int pgeflag; /* PG_G or-in */
+#endif
 static int nkpt;
 static vm_page_t nkpg;
@@ -354,10 +356,12 @@
         invltlb();
+#ifndef SMP
         if (cpu_feature & CPUID_PGE)
                 pgeflag = PG_G;
         else
                 pgeflag = 0;
+#endif
 }
 #if defined(SMP) || defined(APIC_IO)
@@ -647,7 +651,11 @@
         for (i = 0; i < count; i++) {
                 vm_offset_t tva = va + i * PAGE_SIZE;
+#ifndef SMP
                 unsigned npte = VM_PAGE_TO_PHYS(m[i]) | PG_RW | PG_V | pgeflag;
+#else
+                unsigned npte = VM_PAGE_TO_PHYS(m[i]) | PG_RW | PG_V;
+#endif
                 unsigned opte;
                 pte = (unsigned *)vtopte(tva);
                 opte = *pte;
@@ -690,7 +698,11 @@
         register unsigned *pte;
         unsigned npte, opte;
+#ifndef SMP
         npte = pa | PG_RW | PG_V | pgeflag;
+#else
+        npte = pa | PG_RW | PG_V;
+#endif
         pte = (unsigned *)vtopte(va);
         opte = *pte;
         *pte = npte;
@@ -1658,8 +1670,10 @@
          * Machines that don't support invlpg, also don't support
          * PG_G.
          */
+#ifndef SMP
         if (oldpte & PG_G)
                 invlpg(va);
+#endif
         pmap->pm_stats.resident_count -= 1;
         if (oldpte & PG_MANAGED) {
                 ppv = pa_to_pvh(oldpte);
@@ -2088,8 +2102,10 @@
                 newpte |= PG_W;
         if (va < UPT_MIN_ADDRESS)
                 newpte |= PG_U;
+#ifndef SMP
         if (pmap == kernel_pmap)
                 newpte |= pgeflag;
+#endif
         /*
          * if the mapping or permission bits are different, we need
----------------------(end cvs diff -u)-----------------------

--
Erich Stefan Boleyn                 \_ E-mail (preferred):
Mad Genius wanna-be, CyberMuffin      \__ (finger me for other stats)
Web: http://www.uruk.org/~erich/     Motto: "I'll live forever or die trying"

From owner-freebsd-smp Wed Dec 25 10:35:59 1996 Return-Path: Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id KAA18504 for smp-outgoing; Wed, 25 Dec 1996 10:35:59 -0800 (PST) Received: from bluenose.na.tuns.ca (bluenose.na.tuns.ca [134.190.50.156]) by freefall.freebsd.org (8.8.4/8.8.4) with ESMTP id KAA18499 for ; Wed, 25 Dec 1996 10:35:57 -0800 (PST) Received: (from smp@localhost) by bluenose.na.tuns.ca (8.7.6/8.7.3) id OAA21895 for smp@freebsd.org; Wed, 25 Dec 1996 14:45:07 -0400
(AST) From: "J.M. Chuang" Message-Id: <199612251845.OAA21895@bluenose.na.tuns.ca> Subject: No More Trap 12 (was-> I think we have the culprit!!) To: smp@freebsd.org Date: Wed, 25 Dec 1996 14:45:07 -0400 (AST) X-Mailer: ELM [version 2.4ME+ PL13 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-smp@freebsd.org X-Loop: FreeBSD.org Precedence: bulk

Hi,

After changing only one line in pmap.c as follows:
...
...
if (cpu_feature & CPUID_PGE)
        pgeflag = PG_G;   ---> pgeflag = 0;
else
        pgeflag = 0;

...
...

The current smp-kernel+APIC_IO+SMP_INVLTLB seems solid enough for
Dual Pentium Pro to compile kernel without trap 12.

Some results of `make -jx' for compiling a kernel are included below.

Any theory for it?

Jim

------------------------------------------------------------------
make -j 1
      268.66 real       110.34 user        93.47 sys
      3512  maximum resident set size
       187  average shared memory size
       275  average unshared data size
        35  average unshared stack size
    173362  page reclaims
         7  page faults
         0  swaps
        71  block input operations
      6121  block output operations
         0  messages sent
         0  messages received
         0  signals received
      5611  voluntary context switches
      7695  involuntary context switches
-------------------------------------------------
make -j 2
      203.48 real       144.54 user        75.44 sys
      3512  maximum resident set size
       335  average shared memory size
       473  average unshared data size
        57  average unshared stack size
    178505  page reclaims
        18  page faults
         0  swaps
        87  block input operations
      7322  block output operations
         0  messages sent
         0  messages received
         0  signals received
      8266  voluntary context switches
      7893  involuntary context switches
--------------------------------------------------
make -j 4
      159.90 real       160.57 user        72.32 sys
      3512  maximum resident set size
       366  average shared memory size
       508  average unshared data size
        63  average unshared stack size
    178681  page reclaims
         0  page faults
         0  swaps
        71  block input operations
      7135  block output operations
         0  messages sent
         0  messages received
         0  signals received
      8676  voluntary context switches
      9372  involuntary context switches
--------------------------------------------------
make -j 6
      155.47 real       167.11 user        72.79 sys
      3512  maximum resident set size
       358  average shared memory size
       496  average unshared data size
        63  average unshared stack size
    178810  page reclaims
         0  page faults
         0  swaps
        71  block input operations
      7065  block output operations
         0  messages sent
         0  messages received
         0  signals received
      9357  voluntary context switches
     12145  involuntary context switches
--------------------------------------------------
make -j 8
      149.20 real       166.34 user        74.14 sys
      3512  maximum resident set size
       362  average shared memory size
       500  average unshared data size
        63  average unshared stack size
    178894  page reclaims
         0  page faults
         0  swaps
        71  block input operations
      6966  block output operations
         0  messages sent
         0  messages received
         0  signals received
      9867  voluntary context switches
     10648  involuntary context switches
--------------------------------------------------
make -j 10
      153.76 real       172.27 user        73.22 sys
      3512  maximum resident set size
       362  average shared memory size
       498  average unshared data size
        63  average unshared stack size
    178867  page reclaims
        38  page faults
         0  swaps
       106  block input operations
      6842  block output operations
         0  messages sent
         0  messages received
         0  signals received
     10202  voluntary context switches
     12608  involuntary context switches
----------------------------------------------------

From owner-freebsd-smp Wed Dec 25 11:33:07 1996 Return-Path: Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id LAA19609 for smp-outgoing; Wed, 25 Dec 1996 11:33:07 -0800 (PST) Received: from root.com (implode.root.com [198.145.90.17]) by freefall.freebsd.org (8.8.4/8.8.4) with ESMTP id LAA19604 for ; Wed, 25 Dec 1996 11:33:05 -0800 (PST) Received: from localhost (localhost [127.0.0.1]) by root.com (8.7.6/8.6.5) with SMTP id LAA03897; Wed, 25 Dec 1996 11:31:46 -0800 (PST) Message-Id:
<199612251931.LAA03897@root.com> X-Authentication-Warning: implode.root.com: Host localhost [127.0.0.1] didn't use HELO protocol To: "J.M. Chuang" cc: smp@freebsd.org Subject: Re: No More Trap 12 (was-> I think we have the culprit!!) In-reply-to: Your message of "Wed, 25 Dec 1996 14:45:07 -0400." <199612251845.OAA21895@bluenose.na.tuns.ca> From: David Greenman Reply-To: dg@root.com Date: Wed, 25 Dec 1996 11:31:45 -0800 Sender: owner-smp@freebsd.org X-Loop: FreeBSD.org Precedence: bulk

>After changing only one line in pmap.c as follows:
>...
>...
>if (cpu_feature & CPUID_PGE)
>        pgeflag = PG_G;   ---> pgeflag = 0;
>else
>        pgeflag = 0;
>
>...
>...
>
>The current smp-kernel+APIC_IO+SMP_INVLTLB seems solid enough for
>Dual Pentium Pro to compile kernel without trap 12.
>
>Some results of `make -jx' for compiling a kernel are included below.
>
>Any theory for it?

Right, that's the only one that needs to be changed. The other ones in
Erich's patch didn't need to be changed (they are completely static). The
problem is caused by the page-global flag preventing TLB updates for certain
pages on the other CPUs.
-DG David Greenman Core-team/Principal Architect, The FreeBSD Project From owner-freebsd-smp Thu Dec 26 09:53:22 1996 Return-Path: Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id JAA00404 for smp-outgoing; Thu, 26 Dec 1996 09:53:22 -0800 (PST) Received: from Central.KeyWest.MPGN.COM (root@Central.TanSoft.COM [206.175.4.1]) by freefall.freebsd.org (8.8.4/8.8.4) with SMTP id JAA00399 for ; Thu, 26 Dec 1996 09:53:20 -0800 (PST) Received: from devious.Tansoft.com (Devious.TanSoft.COM [206.175.4.10]) by Central.KeyWest.MPGN.COM (8.6.9/8.6.9) with SMTP id MAA24768 for ; Thu, 26 Dec 1996 12:53:13 -0500 Message-Id: <3.0.32.19961226125312.007ca1b0@central.TanSoft.COM> X-Sender: rwm@central.TanSoft.COM X-Mailer: Windows Eudora Pro Version 3.0 (32) Date: Thu, 26 Dec 1996 12:53:13 -0500 To: freebsd-smp@freebsd.org From: Rob Miracle Subject: SMP and 2.2-BETA Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Sender: owner-smp@freebsd.org X-Loop: FreeBSD.org Precedence: bulk I installed 2.2-BETA this morning, and the SMP support seems to be lacking. Since I have been off this list for sometime, is the SMP not going to make it into 2.2? When can we expect it? If I am running 2.2-BETA, what do I need to do to get SMP working again? 
Thanks Rob From owner-freebsd-smp Thu Dec 26 12:59:18 1996 Return-Path: Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id MAA07138 for smp-outgoing; Thu, 26 Dec 1996 12:59:18 -0800 (PST) Received: from mail001.mediacity.com (mail001.mediacity.com [206.24.105.68]) by freefall.freebsd.org (8.8.4/8.8.4) with SMTP id MAA07125 for ; Thu, 26 Dec 1996 12:59:14 -0800 (PST) From: brian@mediacity.com Received: (qmail-queue invoked from smtpd); 26 Dec 1996 20:58:41 -0000 Received: from home001.mediacity.com (HELO mediacity.com) (qmailr@206.24.105.66) by mail001.mediacity.com with SMTP; 26 Dec 1996 20:58:41 -0000 Received: (qmail-queue invoked by uid 100); 26 Dec 1996 20:57:46 -0000 Message-ID: <19961226205746.18257.qmail@mediacity.com> Subject: ASUS 2xP6-200 is much happier campber To: freebsd-smp@freebsd.org Date: Thu, 26 Dec 1996 12:57:46 -0800 (PST) Reply-To: brian@mediacity.com X-Mailer: ELM [version 2.4ME+ PL22 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-smp@freebsd.org X-Loop: FreeBSD.org Precedence: bulk I applied the single line patch published on freebsd-smp yesterday, and my ASUS 2xP6-200 system is up and running. Evil trap 12's are gone. kernel builds work well. However, when I run things like 'make -j 2' I get various errors during the build, though they don't happen when run via 'make'. 

--
Brian Litzinger Powered by FreeBSD http[s]://www.mpress.com

From owner-freebsd-smp Thu Dec 26 15:20:54 1996 Return-Path: Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id PAA16052 for smp-outgoing; Thu, 26 Dec 1996 15:20:54 -0800 (PST) Received: from clem.systemsix.com (clem.systemsix.com [198.99.86.131]) by freefall.freebsd.org (8.8.4/8.8.4) with SMTP id PAA16046 for ; Thu, 26 Dec 1996 15:20:50 -0800 (PST) Received: from localhost (localhost [127.0.0.1]) by clem.systemsix.com (8.6.12/8.6.12) with SMTP id QAA06730; Thu, 26 Dec 1996 16:20:25 -0700 Message-Id: <199612262320.QAA06730@clem.systemsix.com> X-Authentication-Warning: clem.systemsix.com: Host localhost didn't use HELO protocol X-Mailer: exmh version 1.6.5 12/11/95 From: Steve Passe To: Rob Miracle cc: freebsd-smp@freebsd.org Subject: Re: SMP and 2.2-BETA In-reply-to: Your message of "Thu, 26 Dec 1996 12:53:13 EST." <3.0.32.19961226125312.007ca1b0@central.TanSoft.COM> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Thu, 26 Dec 1996 16:20:24 -0700 Sender: owner-smp@freebsd.org X-Loop: FreeBSD.org Precedence: bulk

Hi,

> I installed 2.2-BETA this morning, and the SMP support seems to be lacking.
> Since I have been off this list for sometime, is the SMP not going to make
> it into 2.2? When can we expect it? If I am running 2.2-BETA, what do I
> need to do to get SMP working again?

SMP won't be part of 2.2; it's scheduled for 3.0. You won't be able to
use SMP with 2.2 for long: SMP will be tracking 3.0-current and will become
incompatible with 2.2, if it hasn't already...
-- Steve Passe | powered by smp@csn.net | FreeBSD From owner-freebsd-smp Thu Dec 26 15:24:48 1996 Return-Path: Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id PAA16382 for smp-outgoing; Thu, 26 Dec 1996 15:24:48 -0800 (PST) Received: from clem.systemsix.com (clem.systemsix.com [198.99.86.131]) by freefall.freebsd.org (8.8.4/8.8.4) with SMTP id PAA16233 for ; Thu, 26 Dec 1996 15:23:24 -0800 (PST) Received: from localhost (localhost [127.0.0.1]) by clem.systemsix.com (8.6.12/8.6.12) with SMTP id QAA06750; Thu, 26 Dec 1996 16:21:53 -0700 Message-Id: <199612262321.QAA06750@clem.systemsix.com> X-Authentication-Warning: clem.systemsix.com: Host localhost didn't use HELO protocol X-Mailer: exmh version 1.6.5 12/11/95 From: Steve Passe To: brian@mediacity.com cc: freebsd-smp@freebsd.org Subject: Re: ASUS 2xP6-200 is much happier campber In-reply-to: Your message of "Thu, 26 Dec 1996 12:57:46 PST." <19961226205746.18257.qmail@mediacity.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Thu, 26 Dec 1996 16:21:53 -0700 Sender: owner-smp@freebsd.org X-Loop: FreeBSD.org Precedence: bulk Hi, > I applied the single line patch published on freebsd-smp yesterday, > and my ASUS 2xP6-200 system is up and running. Evil trap 12's are gone. > > kernel builds work well. > > However, when I run things like 'make -j 2' I get various > errors during the build, though they don't happen when run > via 'make'. could you be more specific about the ERRORs? 
-- Steve Passe | powered by smp@csn.net | FreeBSD From owner-freebsd-smp Thu Dec 26 16:27:40 1996 Return-Path: Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id QAA20934 for smp-outgoing; Thu, 26 Dec 1996 16:27:40 -0800 (PST) Received: from mexico.brainstorm.eu.org (root@mexico.brainstorm.fr [193.56.58.253]) by freefall.freebsd.org (8.8.4/8.8.4) with ESMTP id QAA20908 for ; Thu, 26 Dec 1996 16:27:35 -0800 (PST) Received: from brasil.brainstorm.eu.org (brasil.brainstorm.fr [193.56.58.33]) by mexico.brainstorm.eu.org (8.7.5/8.7.3) with ESMTP id BAA02690; Fri, 27 Dec 1996 01:27:21 +0100 Received: (from uucp@localhost) by brasil.brainstorm.eu.org (8.6.12/8.6.12) with UUCP id BAA15668; Fri, 27 Dec 1996 01:26:47 +0100 Received: (from roberto@localhost) by keltia.freenix.fr (8.8.4/keltia-uucp-2.9) id BAA06771; Fri, 27 Dec 1996 01:25:15 +0100 (CET) Message-ID: Date: Fri, 27 Dec 1996 01:25:15 +0100 From: roberto@keltia.freenix.fr (Ollivier Robert) To: freebsd-smp@freebsd.org Cc: rwm@MPGN.COM (Rob Miracle) Subject: Re: SMP and 2.2-BETA References: <3.0.32.19961226125312.007ca1b0@central.TanSoft.COM> X-Mailer: Mutt 0.55.04 Mime-Version: 1.0 X-Operating-System: FreeBSD 3.0-CURRENT ctm#2837 In-Reply-To: <3.0.32.19961226125312.007ca1b0@central.TanSoft.COM>; from Rob Miracle on Dec 26, 1996 12:53:13 -0500 Sender: owner-smp@freebsd.org X-Loop: FreeBSD.org Precedence: bulk According to Rob Miracle: > I installed 2.2-BETA this morning, and the SMP support seems to be lacking. > Since I have been off this list for sometime, is the SMP not going to make > it into 2.2? When can we expect it? If I am running 2.2-BETA, what do I > need to do to get SMP working again? SMP is a 3.0 target, not a 2.2 one. If you want to play SMP, you gotta be a real man^H^H^H^H^H^H^H^H^HCURRENT. :-) -- Ollivier ROBERT -=- The daemon is FREE! 
-=- roberto@keltia.freenix.fr FreeBSD keltia.freenix.fr 3.0-CURRENT #33: Sat Dec 21 12:57:17 CET 1996 From owner-freebsd-smp Fri Dec 27 07:33:31 1996 Return-Path: Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id HAA24197 for smp-outgoing; Fri, 27 Dec 1996 07:33:31 -0800 (PST) Received: from Central.KeyWest.MPGN.COM (root@Central.TanSoft.COM [206.175.4.1]) by freefall.freebsd.org (8.8.4/8.8.4) with SMTP id HAA24191 for ; Fri, 27 Dec 1996 07:33:27 -0800 (PST) Received: from devious.Tansoft.com (Devious.TanSoft.COM [206.175.4.10]) by Central.KeyWest.MPGN.COM (8.6.9/8.6.9) with SMTP id KAA20472; Fri, 27 Dec 1996 10:33:16 -0500 Message-Id: <3.0.32.19961227103319.007dc100@central.TanSoft.COM> X-Sender: rwm@central.TanSoft.COM X-Mailer: Windows Eudora Pro Version 3.0 (32) Date: Fri, 27 Dec 1996 10:33:20 -0500 To: roberto@keltia.freenix.fr (Ollivier Robert) From: Rob Miracle Subject: Re: SMP and 2.2-BETA Cc: freebsd-smp@freebsd.org Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Sender: owner-smp@freebsd.org X-Loop: FreeBSD.org Precedence: bulk At 01:25 AM 12/27/96 +0100, you wrote: >According to Rob Miracle: >> I installed 2.2-BETA this morning, and the SMP support seems to be lacking. >> Since I have been off this list for sometime, is the SMP not going to make >> it into 2.2? When can we expect it? If I am running 2.2-BETA, what do I >> need to do to get SMP working again? > >SMP is a 3.0 target, not a 2.2 one. > >If you want to play SMP, you gotta be a real man^H^H^H^H^H^H^H^H^HCURRENT. Ok, I accept that answer, but I didn't see the 3.0/Current stuff on ftp.freebsd.org. Where can I find it these days? Also, we are going to have to make a decision on our production machines. 
Right now we are running them on the 100696 Snapshot with SMP enabled and
it is running fine, but for the immediate future (since I can't get that
Snap any more), do I go with 2.2 Release for our customer machines, or do I
try to stay with the current branch so I can use my multi-processors?

Rob

From owner-freebsd-smp Fri Dec 27 15:38:49 1996 Return-Path: Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id PAA13864 for smp-outgoing; Fri, 27 Dec 1996 15:38:49 -0800 (PST) Received: from atlantis.nconnect.net (root@[206.54.227.6]) by freefall.freebsd.org (8.8.4/8.8.4) with ESMTP id PAA13859 for ; Fri, 27 Dec 1996 15:38:46 -0800 (PST) Received: from arabian.astrolab.org (dial208.nconnect.net [206.54.227.208]) by atlantis.nconnect.net (8.8.4/8.7.3) with SMTP id RAA15090 for ; Fri, 27 Dec 1996 17:36:05 -0600 (CST) Message-ID: <32C45DB5.41C67EA6@nconnect.net> Date: Fri, 27 Dec 1996 17:37:25 -0600 From: Randy DuCharme X-Mailer: Mozilla 3.01Gold (X11; I; FreeBSD 2.2-961014-SNAP i386) MIME-Version: 1.0 To: smp@freebsd.org Subject: Kernel Build Error Message Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-smp@freebsd.org X-Loop: FreeBSD.org Precedence: bulk

Greetings,

I've just cvsup'ed the SMP sources and am attempting to build my first
SMP kernel. Make depend goes ok. On make however, I get the following
message....

cc -c -x -assembler-with-cpp -DLOCORE -nostdinc -I- -I. -I../.. -I../../
include -DFAILSAFE iDCOMPAT_43 -DCD9660 -DNFS -DFFS -DINET -DKERNEL ../.
./i386/i386/locore.s
../../i386/i386/locore.s:Assembler messages:
../../i386/i386/locore.s:730: Error: bad register name ('%cr4')
../../i386/i386/locore.s:732: Error: bad register name ('%cr4')
*** Error code 1

Stop.

Would this be the same issue as is in the FAQ on ...
../../i386.i386/locore.s:705 Error: bad register name ('%cr4')

What might be the best workaround for this ???

Thanks
Randy

From owner-freebsd-smp Fri Dec 27 16:13:32 1996 Return-Path: Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id QAA17021 for smp-outgoing; Fri, 27 Dec 1996 16:13:32 -0800 (PST) Received: from clem.systemsix.com (clem.systemsix.com [198.99.86.131]) by freefall.freebsd.org (8.8.4/8.8.4) with SMTP id QAA17002 for ; Fri, 27 Dec 1996 16:13:27 -0800 (PST) Received: from localhost (localhost [127.0.0.1]) by clem.systemsix.com (8.6.12/8.6.12) with SMTP id RAA13243; Fri, 27 Dec 1996 17:12:34 -0700 Message-Id: <199612280012.RAA13243@clem.systemsix.com> X-Authentication-Warning: clem.systemsix.com: Host localhost didn't use HELO protocol X-Mailer: exmh version 1.6.5 12/11/95 From: Steve Passe To: Randy DuCharme cc: smp@freebsd.org Subject: Re: Kernel Build Error Message In-reply-to: Your message of "Fri, 27 Dec 1996 17:37:25 CST." <32C45DB5.41C67EA6@nconnect.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Fri, 27 Dec 1996 17:12:34 -0700 Sender: owner-smp@freebsd.org X-Loop: FreeBSD.org Precedence: bulk

Hi,

> I've just cvsup'ed the SMP sources and am attempting to build my first
> SMP kernel. Make depend goes ok. On make however, I get the following
> message....
>
> cc -c -x -assembler-with-cpp -DLOCORE -nostdinc -I- -I. -I../.. -I../../
> include -DFAILSAFE iDCOMPAT_43 -DCD9660 -DNFS -DFFS -DINET -DKERNEL ../.
> ./i386/i386/locore.s
> ../../i386/i386/locore.s:Assembler messages:
> ../../i386/i386/locore.s:730: Error: bad register name ('%cr4')
> ...
> What might be the best workaround for this ???

you need to bring the system up to -current. You might get away with just
updating the compiler/assembler/loader, but eventually something in current
will become mandatory...
-- Steve Passe | powered by smp@csn.net | FreeBSD From owner-freebsd-smp Fri Dec 27 19:48:02 1996 Return-Path: Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id TAA25405 for smp-outgoing; Fri, 27 Dec 1996 19:48:02 -0800 (PST) Received: from mexico.brainstorm.eu.org (root@mexico.brainstorm.fr [193.56.58.253]) by freefall.freebsd.org (8.8.4/8.8.4) with ESMTP id TAA25387 for ; Fri, 27 Dec 1996 19:47:58 -0800 (PST) Received: from brasil.brainstorm.eu.org (brasil.brainstorm.fr [193.56.58.33]) by mexico.brainstorm.eu.org (8.7.5/8.7.3) with ESMTP id EAA04909; Sat, 28 Dec 1996 04:47:50 +0100 Received: (from uucp@localhost) by brasil.brainstorm.eu.org (8.6.12/8.6.12) with UUCP id EAA26851; Sat, 28 Dec 1996 04:47:22 +0100 Received: (from roberto@localhost) by keltia.freenix.fr (8.8.4/keltia-uucp-2.9) id EAA00980; Sat, 28 Dec 1996 04:23:57 +0100 (CET) Message-ID: Date: Sat, 28 Dec 1996 04:23:57 +0100 From: roberto@keltia.freenix.fr (Ollivier Robert) To: rwm@MPGN.COM (Rob Miracle) Cc: freebsd-smp@freebsd.org Subject: Re: SMP and 2.2-BETA References: <3.0.32.19961227103319.007dc100@central.TanSoft.COM> X-Mailer: Mutt 0.55.04 Mime-Version: 1.0 X-Operating-System: FreeBSD 3.0-CURRENT ctm#2837 In-Reply-To: <3.0.32.19961227103319.007dc100@central.TanSoft.COM>; from Rob Miracle on Dec 27, 1996 10:33:20 -0500 Sender: owner-smp@freebsd.org X-Loop: FreeBSD.org Precedence: bulk According to Rob Miracle: > Ok, I accept that answer, but I didn't see the 3.0/Current stuff on > ftp.freebsd.org. Where can I find it these days? See > Also, we are going to have to make a decision on our production machines. > Right now we are running them on the 100696 Snapshot with SMP enable and > it is running fine, but for the immediate future (since I can't get that > Snap any more), do I go with 2.2 Release for our customer machines, or do I > try to stay with the current branch so I can use my multi-processors? 
If you want to keep both your processors working, you need 3.0-CURRENT+SMP. It will be merged into mainstream CURRENT someday. -- Ollivier ROBERT -=- The daemon is FREE! -=- roberto@keltia.freenix.fr FreeBSD keltia.freenix.fr 3.0-CURRENT #33: Sat Dec 21 12:57:17 CET 1996 From owner-freebsd-smp Sat Dec 28 08:41:55 1996 Return-Path: Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id IAA16938 for smp-outgoing; Sat, 28 Dec 1996 08:41:55 -0800 (PST) Received: from netwolf.NetMasters.com (netwolf.netmasters.com [199.201.245.5]) by freefall.freebsd.org (8.8.4/8.8.4) with ESMTP id IAA16931 for ; Sat, 28 Dec 1996 08:41:53 -0800 (PST) Received: from netwolf.NetMasters.com (localhost [127.0.0.1]) by netwolf.NetMasters.com (8.8.4/8.7.3) with ESMTP id LAA28588; Sat, 28 Dec 1996 11:41:51 -0500 (EST) Message-Id: <199612281641.LAA28588@netwolf.NetMasters.com> X-Mailer: exmh version 1.6.9 8/22/96 To: freebsd-smp@freebsd.org cc: louie@wa3ymh.transsys.com Subject: psignal under SMP Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Sat, 28 Dec 1996 11:41:51 -0500 From: Michael Petry Sender: owner-smp@freebsd.org X-Loop: FreeBSD.org Precedence: bulk While showing a buddy the SMP build of FreeBSD we came across an interesting bug. I opened two windows from my work back to my dual P6-180 system at home. I compiled the program main() { foo: goto foo; } I exec'd it in the second window and was showing that things were still snappy in the primary window. I then went back to the second window and tried to kill the process. No go. Hmmm, scratched head and went back to the primary and ran a few utilities to find out what was going on. Boom!! The process in the second window dies. My buddy and I kicked the issue around for a while and then it hit us. The hard cpu process had its signal posted, but never got rescheduled to see the signals because it was happy and cozy running on the second processor. 
My running of some utilities was enough to force it off
its cpu and cause it to be rescheduled and pick up its signal.

Though this is an extreme case (no syscalls, just a hard cpu loop), we thought
some more and concluded that it would wreak havoc with processes that were
computing while waiting for SIGIOs on a relatively idle system.

It looked to us like psignal.c would have to be made smarter to know not only
if a process is runnable, but also if it is running on another CPU and needs
to be IPI'd.

From owner-freebsd-smp Sat Dec 28 08:48:56 1996
Return-Path:
Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id IAA17070 for smp-outgoing; Sat, 28 Dec 1996 08:48:56 -0800 (PST)
Received: from spinner.DIALix.COM (root@spinner.DIALix.COM [192.203.228.67]) by freefall.freebsd.org (8.8.4/8.8.4) with ESMTP id IAA17065 for ; Sat, 28 Dec 1996 08:48:41 -0800 (PST)
Received: from spinner.DIALix.COM (peter@localhost.DIALix.oz.au [127.0.0.1]) by spinner.DIALix.COM (8.8.4/8.8.4) with ESMTP id AAA00835; Sun, 29 Dec 1996 00:48:11 +0800 (WST)
Message-Id: <199612281648.AAA00835@spinner.DIALix.COM>
X-Mailer: exmh version 1.6.9 8/22/96
To: Michael Petry
cc: freebsd-smp@freebsd.org, louie@wa3ymh.transsys.com
Subject: Re: psignal under SMP
In-reply-to: Your message of "Sat, 28 Dec 1996 11:41:51 EST." <199612281641.LAA28588@netwolf.NetMasters.com>
Date: Sun, 29 Dec 1996 00:48:10 +0800
From: Peter Wemm
Sender: owner-smp@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Michael Petry wrote:
>
> While showing a buddy the SMP build of FreeBSD we came across an interesting
> bug. I opened two windows from my work back to my dual P6-180 system at home.
> I compiled the program
>
> main() {
> foo:
>         goto foo;
> }
>
> I exec'd it in the second window and was showing that things were still snappy
> in the primary window. I then went back to the second window and tried to
> kill the process. No go.
> Hmmm, scratched head and went back to the primary
> and ran a few utilities to find out what was going on. Boom!! The process in
> the second window dies. My buddy and I kicked the issue around for a while
> and then it hit us. The hard cpu process had its signal posted, but never got
> rescheduled to see the signals because it was happy and cozy running on the
> second processor. My running of some utilities was enough to force it off
> its cpu and cause it to be rescheduled and pick up its signal.
>
> Though this is an extreme case (no syscalls, just a hard cpu loop), we thought
> some more and concluded that it would wreak havoc with processes that were
> computing while waiting for SIGIOs on a relatively idle system.
>
> It looked to us like psignal.c would have to be made smarter to know not only
> if a process is runnable, but also if it is running on another CPU and needs
> to be IPI'd.

Yep, this is entry #13 on the TODO list in the smp tree:

13. Send APIC->APIC irq when process is killed.
.. yeah.. this is pretty important if the process is running...

BTW, this is real panic potential. Given that cpu1 is running in a tight
loop in user mode and cpu0 sends it an "untrappable" signal, cpu0 can mark
the process data structures as "dead" and free them up since the process
obviously doesn't need to be scheduled again... If cpu1 then goes to
update the process stats.... boom!
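
The scenario above can be reproduced from user space without two terminal
windows. What follows is a minimal sketch, assuming a POSIX system; the
helper name spin_and_kill() is hypothetical and is not part of any tree
mentioned in this thread. It forks a child that spins in user mode with no
system calls (the same shape as the goto loop), sends it SIGKILL, and waits
for it. On a kernel with the behaviour described here, the waitpid() call
would presumably stall until something else pushes the child off its CPU.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a CPU-bound child, kill it, and return the
 * number of the signal that terminated it, or -1 on any error.
 */
int
spin_and_kill(void)
{
        volatile unsigned long spins = 0;
        int status;
        pid_t pid;

        pid = fork();
        if (pid < 0)
                return -1;
        if (pid == 0) {
                /* Child: hard CPU loop, never enters the kernel by itself. */
                for (;;)
                        spins++;
        }

        /* Parent: give the child time to settle onto a CPU. */
        sleep(1);

        /* Post an uncatchable signal to the running child. */
        if (kill(pid, SIGKILL) != 0)
                return -1;

        /* Blocks until the child has actually noticed the signal and died. */
        if (waitpid(pid, &status, 0) != pid)
                return -1;

        return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Calling spin_and_kill() from a small main() and timing it gives a rough
measure of how long a running process takes to see a posted signal; the
return value should be SIGKILL once the child has gone.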

Cheers,
-Peter

From owner-freebsd-smp Sat Dec 28 08:54:06 1996
Return-Path:
Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id IAA17227 for smp-outgoing; Sat, 28 Dec 1996 08:54:06 -0800 (PST)
Received: from netwolf.NetMasters.com (netwolf.netmasters.com [199.201.245.5]) by freefall.freebsd.org (8.8.4/8.8.4) with ESMTP id IAA17222 for ; Sat, 28 Dec 1996 08:54:04 -0800 (PST)
Received: from netwolf.NetMasters.com (localhost [127.0.0.1]) by netwolf.NetMasters.com (8.8.4/8.7.3) with ESMTP id LAA28624 for ; Sat, 28 Dec 1996 11:54:03 -0500 (EST)
Message-Id: <199612281654.LAA28624@netwolf.NetMasters.com>
X-Mailer: exmh version 1.6.9 8/22/96
To: freebsd-smp@freebsd.org
Subject: SMP make world
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Sat, 28 Dec 1996 11:54:02 -0500
From: Michael Petry
Sender: owner-smp@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

A bunch of the header files are dependent on opt_smp.h. With opt_smp.h being
a config generated file that is local to a particular kernel build, I was
wondering how a "make world" was going to be sorted out. I've tried building
an SMP kernel for a single P5 CPU without an APIC and it dove quickly into the
ground. If the plan is "one" kernel, I'd be glad to help by spending some
time to understand the crash.

Mike Petry

From owner-freebsd-smp Sat Dec 28 09:43:26 1996
Return-Path:
Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id JAA18694 for smp-outgoing; Sat, 28 Dec 1996 09:43:26 -0800 (PST)
Received: from clem.systemsix.com (clem.systemsix.com [198.99.86.131]) by freefall.freebsd.org (8.8.4/8.8.4) with SMTP id JAA18678 for ; Sat, 28 Dec 1996 09:43:22 -0800 (PST)
Received: from localhost (localhost [127.0.0.1]) by clem.systemsix.com (8.6.12/8.6.12) with SMTP id KAA19708; Sat, 28 Dec 1996 10:42:36 -0700
Message-Id: <199612281742.KAA19708@clem.systemsix.com>
X-Authentication-Warning: clem.systemsix.com: Host localhost didn't use HELO protocol
X-Mailer: exmh version 1.6.5 12/11/95
From: Steve Passe
To: Michael Petry
cc: freebsd-smp@freebsd.org
Subject: Re: SMP make world
In-reply-to: Your message of "Sat, 28 Dec 1996 11:54:02 EST." <199612281654.LAA28624@netwolf.NetMasters.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Sat, 28 Dec 1996 10:42:35 -0700
Sender: owner-smp@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Hi,

>
> A bunch of the header files are dependent on opt_smp.h. With opt_smp.h being
> a config generated file that is local to a particular kernel build, I was
> wondering how a "make world" was going to be sorted out. I've tried building
> an SMP kernel for a single P5 CPU without an APIC and it dove quickly into the
> ground. If the plan is "one" kernel, I'd be glad to help by spending some
> time to understand the crash.

If by "one kernel" you mean a binary kernel that runs on both SMP and non-SMP
boards, I don't see it in the future. IMHO there are just too many things
that would need to be conditionalized at run-time to be efficient.

If by "one kernel" you mean one unified set of world sources, that will
work itself out when we do the SMP/3.0-current merge.

--
Steve Passe | powered by
smp@csn.net | FreeBSD

From owner-freebsd-smp Sat Dec 28 09:50:05 1996
Return-Path:
Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id JAA18875 for smp-outgoing; Sat, 28 Dec 1996 09:50:05 -0800 (PST)
Received: from tfs.com (tfs.com [140.145.250.1]) by freefall.freebsd.org (8.8.4/8.8.4) with SMTP id JAA18870 for ; Sat, 28 Dec 1996 09:50:03 -0800 (PST)
Received: from critter.dk.tfs.com by tfs.com (smail3.1.28.1) with SMTP id m0ve2t1-0003vuC; Sat, 28 Dec 96 09:49 PST
Received: from critter.dk.tfs.com (localhost.dk.tfs.com [127.0.0.1]) by critter.dk.tfs.com (8.8.2/8.8.2) with ESMTP id SAA03718; Sat, 28 Dec 1996 18:52:39 +0100 (MET)
To: Michael Petry
cc: freebsd-smp@freebsd.org, louie@wa3ymh.transsys.com
Subject: Re: psignal under SMP
In-reply-to: Your message of "Sat, 28 Dec 1996 11:41:51 EST." <199612281641.LAA28588@netwolf.NetMasters.com>
Date: Sat, 28 Dec 1996 18:52:23 +0100
Message-ID: <3716.851795543@critter.dk.tfs.com>
From: Poul-Henning Kamp
Sender: owner-smp@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

In message <199612281641.LAA28588@netwolf.NetMasters.com>, Michael Petry writes
>The hard cpu process had its signal posted, but never got
>rescheduled to see the signals because it was happy and cozy running on the
>second processor. My running of some utilities was enough to force it off
>its cpu and cause it to be rescheduled and pick up its signal.

yup, known bogon.

>It looked to us like psignal.c would have to be made smarter to know not only
>if a process is runnable, but also if it is running on another CPU and needs
>to be IPI'd.

Exactly.

Signals are a pain in the butt for MP and heavily pipe-lined/parallel
systems, in particular synchronous signals.

--
Poul-Henning Kamp           | phk@FreeBSD.ORG      FreeBSD Core-team.
http://www.freebsd.org/~phk | phk@login.dknet.dk   Private mailbox.
whois: [PHK]                | phk@tfs.com          TRW Financial Systems, Inc.
Power and ignorance is a disgusting cocktail.

From owner-freebsd-smp Sat Dec 28 12:51:29 1996
Return-Path:
Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id MAA24564 for smp-outgoing; Sat, 28 Dec 1996 12:51:29 -0800 (PST)
Received: from atlantis.nconnect.net (root@[206.54.227.6]) by freefall.freebsd.org (8.8.4/8.8.4) with ESMTP id MAA24559 for ; Sat, 28 Dec 1996 12:51:27 -0800 (PST)
Received: from arabian.astrolab.org (dial49.nconnect.net [206.54.227.49]) by atlantis.nconnect.net (8.8.4/8.7.3) with SMTP id OAA20112 for ; Sat, 28 Dec 1996 14:48:40 -0600 (CST)
Message-ID: <32C587F8.41C67EA6@nconnect.net>
Date: Sat, 28 Dec 1996 14:50:00 -0600
From: Randy DuCharme
X-Mailer: Mozilla 3.01Gold (X11; I; FreeBSD 3.0-CURRENT i386)
MIME-Version: 1.0
To: smp@freebsd.org
Subject: starting 2nd cpu
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-smp@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Greetings,

I think I'm a bit confused as to what exactly it is I need to enable
SMP. I've gotten the SMP sources via sup (cvsup) as per
http://www.freebsd.org/~fsmp/smp.html. I've also cvsupped the entire
3.0 source tree and, while this eliminated the 'bad register name'
errors, and allowed me to build the kernel, rebooting and doing (single,
or multi-user) a sysctl -w kern.smp_active=2 returns "unknown oid
'kern.smp_active'". Please forgive me if this is a really stupid
question, but... just what tree(s) exactly do I need to grab??

Thanks
Randy

From owner-freebsd-smp Sat Dec 28 14:02:35 1996
Return-Path:
Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id OAA26562 for smp-outgoing; Sat, 28 Dec 1996 14:02:35 -0800 (PST)
Received: from clem.systemsix.com (clem.systemsix.com [198.99.86.131]) by freefall.freebsd.org (8.8.4/8.8.4) with SMTP id OAA26557 for ; Sat, 28 Dec 1996 14:02:31 -0800 (PST)
Received: from localhost (localhost [127.0.0.1]) by clem.systemsix.com (8.6.12/8.6.12) with SMTP id PAA20854; Sat, 28 Dec 1996 15:01:39 -0700
Message-Id: <199612282201.PAA20854@clem.systemsix.com>
X-Authentication-Warning: clem.systemsix.com: Host localhost didn't use HELO protocol
X-Mailer: exmh version 1.6.5 12/11/95
From: Steve Passe
To: Randy DuCharme
cc: smp@freebsd.org
Subject: Re: starting 2nd cpu
In-reply-to: Your message of "Sat, 28 Dec 1996 14:50:00 CST." <32C587F8.41C67EA6@nconnect.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Sat, 28 Dec 1996 15:01:39 -0700
Sender: owner-smp@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Hi,

> I think I'm a bit confused as to what exactly it is I need to enable
> SMP. I've gotten the SMP sources via sup (cvsup) as per
> http://www.freebsd.org/~fsmp/smp.html. I've also cvsupped the entire
> 3.0 source tree and, while this eliminated the 'bad register name'
> errors, and allowed me to build the kernel, rebooting and doing (single,
> or multi-user) a sysctl -w kern.smp_active=2 returns "unknown oid
> 'kern.smp_active'". Please forgive me if this is a really stupid
> question, but... just what tree(s) exactly do I need to grab??

I think you just failed to add "options SMP" to your kernel config file.

--
Steve Passe | powered by
smp@csn.net | FreeBSD

From owner-freebsd-smp Sat Dec 28 14:57:52 1996
Return-Path:
Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id OAA29984 for smp-outgoing; Sat, 28 Dec 1996 14:57:52 -0800 (PST)
Received: from atlantis.nconnect.net (root@[206.54.227.6]) by freefall.freebsd.org (8.8.4/8.8.4) with ESMTP id OAA29977 for ; Sat, 28 Dec 1996 14:57:49 -0800 (PST)
Received: from arabian.astrolab.org (dial211.nconnect.net [206.54.227.211]) by atlantis.nconnect.net (8.8.4/8.7.3) with SMTP id QAA22356; Sat, 28 Dec 1996 16:54:31 -0600 (CST)
Message-ID: <32C5A57C.41C67EA6@nconnect.net>
Date: Sat, 28 Dec 1996 16:55:56 -0600
From: Randy DuCharme
X-Mailer: Mozilla 3.01Gold (X11; I; FreeBSD 3.0-CURRENT i386)
MIME-Version: 1.0
To: Steve Passe
CC: smp@freebsd.org
Subject: Re: starting 2nd cpu
References: <199612282201.PAA20854@clem.systemsix.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-smp@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Steve Passe wrote:
>
> Hi,
>
> > I think I'm a bit confused as to what exactly it is I need to enable
> > SMP. I've gotten the SMP sources via sup (cvsup) as per
> > http://www.freebsd.org/~fsmp/smp.html. I've also cvsupped the entire
> > 3.0 source tree and, while this eliminated the 'bad register name'
> > errors, and allowed me to build the kernel, rebooting and doing (single,
> > or multi-user) a sysctl -w kern.smp_active=2 returns "unknown oid
> > 'kern.smp_active'". Please forgive me if this is a really stupid
> > question, but... just what tree(s) exactly do I need to grab??
>
> I think you just failed to add "options SMP" to your kernel config file.
>
> --
> Steve Passe | powered by
> smp@csn.net | FreeBSD

Hi Steve,

Here's my config file....

# $Id: ARABIAN_SMP, v 1.0.0 1996/12/26 rdd Exp $

machine         "i386"
cpu             "I586_CPU"
ident           ARABIAN_SMP
maxusers        10

options         INET                    #InterNETworking
options         FFS                     #Berkeley Fast Filesystem
options         NFS                     #Network Filesystem
options         "CD9660"                #ISO 9660 Filesystem
options         PROCFS                  #Process filesystem
options         "COMPAT_43"             #Compatible with BSD 4.3 [KEEP THIS!]
options         UCONSOLE                #Allow users to grab the console
#options        FAILSAFE                #Be conservative
options         USERCONFIG              #boot -c editor
options         VISUAL_USERCONFIG       #visual boot -c editor
options         "MAXMEM=98284"
options         "AUTO_EOI_1"
options         "AUTO_EOI_2"
options         CHILD_MAX=128
options         OPEN_MAX=128
options         SMP
options         NCPU=2

config          kernel  root on sd1

controller      isa0
controller      pci0

controller      fdc0    at isa? port "IO_FD1" bio irq 6 drq 2 vector fdintr
disk            fd0     at fdc0 drive 0
disk            fd1     at fdc0 drive 1

controller      ahc0
controller      scbus0
device          sd0
device          st0
device          cd0     #Only need one of these, the code dynamically grows

# syscons is the default console driver, resembling an SCO console
device          sc0     at isa? port "IO_KBD" tty irq 1 vector scintr
# Enable this and PCVT_FREEBSD for pcvt vt220 compatible console driver
#device         vt0     at isa? port "IO_KBD" tty irq 1 vector pcrint
#options        PCVT_FREEBSD=210        # pcvt running on FreeBSD >= 2.0.5
#options        XSERVER                 # include code for XFree86
#options        FAT_CURSOR              # start with block cursor

# Mandatory, don't remove
device          npx0    at isa? port "IO_NPX" iosiz 0x0 flags 0x0 irq 13 vector npxintr

device          sio0    at isa? port "IO_COM1" tty irq 4 vector siointr
device          sio1    at isa? port "IO_COM2" tty irq 3 vector siointr

device          lpt0    at isa? port? tty

# Order is important here due to intrusive probes, do *not* alphabetize
# this list of network interfaces until the probes have been fixed.
# Right now it appears that the ie0 must be probed before ep0. See
# revision 1.20 of this file.
device          de0

controller      snd0
device          pas0    at isa? port 0x388 irq 10 drq 6 vector pasintr
device          sb0     at isa? port 0x220 irq 7 conflicts drq 1 vector sbintr

pseudo-device   loop
pseudo-device   ether
pseudo-device   log
pseudo-device   sl      1
# ijppp uses tun instead of ppp device
#pseudo-device  ppp     1
pseudo-device   tun     1
pseudo-device   pty     16
pseudo-device   gzip            # Exec gzipped a.out's

# KTRACE enables the system-call tracing facility ktrace(2).
# This adds 4 KB bloat to your kernel, and slightly increases
# the costs of each syscall.
#options        KTRACE          #kernel tracing

It's definitely in there. Any other ideas?

Thanks
Randy

From owner-freebsd-smp Sat Dec 28 17:17:41 1996
Return-Path:
Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id RAA06977 for smp-outgoing; Sat, 28 Dec 1996 17:17:41 -0800 (PST)
Received: from mail001.mediacity.com (mail001.mediacity.com [206.24.105.68]) by freefall.freebsd.org (8.8.4/8.8.4) with SMTP id RAA06971 for ; Sat, 28 Dec 1996 17:17:38 -0800 (PST)
From: brian@mediacity.com
Received: (qmail-queue invoked from smtpd); 29 Dec 1996 01:17:01 -0000
Received: from home001.mediacity.com (HELO mediacity.com) (qmailr@206.24.105.66) by mail001.mediacity.com with SMTP; 29 Dec 1996 01:17:01 -0000
Received: (qmail-queue invoked by uid 100); 29 Dec 1996 01:15:53 -0000
Message-ID: <19961229011553.25152.qmail@mediacity.com>
Subject: A question of how much memory?
To: freebsd-smp@freebsd.org
Date: Sat, 28 Dec 1996 17:15:53 -0800 (PST)
Reply-To: brian@mediacity.com
X-Mailer: ELM [version 2.4ME+ PL22 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-smp@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

I have an ASUS 2xPP200 with 256MB. It has been my guess that
I therefore need to set MAXMEM to 128*1024, assuming each
processor has its own private 128MBs to work with. This does
indeed work.

However, my assumptions may be wrong, and memory may be shared
in some way. In which case MAXMEM may need to be 256*1024.
I've tried this and the kernel fails to boot with a Panic
message along the lines of unable to [reach/allocate?] bounce
buffer.

Basically, which way is it supposed to work?

Thanks,

--
Brian Litzinger
Powered by FreeBSD
http[s]://www.mpress.com

From owner-freebsd-smp Sat Dec 28 19:00:01 1996
Return-Path:
Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id TAA10361 for smp-outgoing; Sat, 28 Dec 1996 19:00:01 -0800 (PST)
Received: from root.com (implode.root.com [198.145.90.17]) by freefall.freebsd.org (8.8.4/8.8.4) with ESMTP id SAA10322 for ; Sat, 28 Dec 1996 18:59:58 -0800 (PST)
Received: from localhost (localhost [127.0.0.1]) by root.com (8.7.6/8.6.5) with SMTP id SAA02634; Sat, 28 Dec 1996 18:59:56 -0800 (PST)
Message-Id: <199612290259.SAA02634@root.com>
X-Authentication-Warning: implode.root.com: Host localhost [127.0.0.1] didn't use HELO protocol
To: brian@mediacity.com
cc: freebsd-smp@freebsd.org
Subject: Re: A question of how much memory?
In-reply-to: Your message of "Sat, 28 Dec 1996 17:15:53 PST." <19961229011553.25152.qmail@mediacity.com>
From: David Greenman
Reply-To: dg@root.com
Date: Sat, 28 Dec 1996 18:59:56 -0800
Sender: owner-smp@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

>I have an ASUS 2xPP200 with 256MB. It has been my guess that
>I therefore need to set MAXMEM to 128*1024, assuming each
>processor has its own private 128MBs to work with. This does
>indeed work.
>
>However, my assumptions may be wrong, and memory may be shared
>in some way. In which case MAXMEM may need to be 256*1024.
>I've tried this and the kernel fails to boot with a Panic
>message along the lines of unable to [reach/allocate?] bounce
>buffer.
>
>Basically, which way is it supposed to work?

You're correct that your assumption is wrong. All of the memory is shared
in SMP PCs. The reason the machine panics is that you have run out of
kernel virtual memory.
You need to tune more carefully the various parameters
in your kernel config file (the ones that take lots of virtual memory, like
NMBCLUSTERS). The fact that it doesn't panic with 128MB indicates that you
are right on the edge of running out, and the extra kernel data structures
that are allocated to manage 256MB are just enough to push it over.
BTW, why do you have bounce buffers configured in your kernel??

-DG

David Greenman
Core-team/Principal Architect, The FreeBSD Project