From owner-freebsd-current@FreeBSD.ORG Thu Apr 30 21:28:03 2009
Return-Path:
Delivered-To: freebsd-current@freebsd.org
Received: from [127.0.0.1] (freefall.freebsd.org [IPv6:2001:4f8:fff6::28])
	by hub.freebsd.org (Postfix) with ESMTP id 3F54D106564A;
	Thu, 30 Apr 2009 21:28:02 +0000 (UTC)
	(envelope-from jkim@FreeBSD.org)
From: Jung-uk Kim
To: pluknet
Date: Thu, 30 Apr 2009 17:27:53 -0400
User-Agent: KMail/1.6.2
References: <20090429161626.GQ1387@albert.catwhisker.org>
	<200904301656.51003.jkim@FreeBSD.org>
In-Reply-To:
MIME-Version: 1.0
Content-Disposition: inline
Content-Type: Multipart/Mixed; boundary="Boundary-00=_bfh+JouM/gBbBo9"
Message-Id: <200904301727.55099.jkim@FreeBSD.org>
Cc: Scott Ullrich , freebsd-current@freebsd.org, Andriy Gapon
Subject: Re: Panic "Fatal trap 18: integer divide fault while in kernel mode"
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 30 Apr 2009 21:28:03 -0000

--Boundary-00=_bfh+JouM/gBbBo9
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

On Thursday 30 April 2009 05:09 pm, pluknet wrote:
> 2009/5/1 Jung-uk Kim :
> > On Thursday 30 April 2009 04:25 pm, pluknet wrote:
> >> 2009/4/30 Jung-uk Kim :
> >> > On Thursday 30 April 2009 12:37 pm, pluknet wrote:
> >> >> 2009/4/30 Andriy Gapon :
> >> >> > on 30/04/2009 18:58 David Wolfskill said the following:
> >> >> >> On Thu, Apr 30, 2009 at 06:35:32PM +0300, Andriy Gapon wrote:
> >> >> >>> on 30/04/2009 18:18 David Wolfskill said the following:
> >> >> >>>> On Wed, Apr 29, 2009 at 09:16:26AM -0700, David
> >> >> >>>> Wolfskill wrote:
> >> >> >>>>> Is there anything of use I might get from DDB?
> >> >> >>>>
> >> >> >>>> I can still poke around there for a bit, if that would
> >> >> >>>> be useful.
> >> >> >>>
> >> >> >>> In general the stack trace[*] should be provided at the
> >> >> >>> very least, otherwise people have a hard time figuring out
> >> >> >>> where the problem occurred, so right people may just not
> >> >> >>> notice a report.
> >> >> >>
> >> >> >> Sorry; it happened so quickly, I wasn't at all certain
> >> >> >> there would be enough to show:
> >> >> >>
> >> >> >> db> bt
> >> >> >> Tracing pid 0 tid 100000 td 0xc0d43610
> >> >> >> cpu_topo(2,c1420d34,c081ff07,c1420d58,c0820042,...) at cpu_topo+0x43
> >> >> >> smp_topo(c0804378,2,c4145a5c,fffffff,0,...) at smp_topo+0x10b
> >> >> >> sched_setup(0,141ec00,141ec00,141e000,1425000,...) at sched_setup+0x1a
> >> >> >> mi_startup() at mi_startup+0x96
> >> >> >> begin() at begin+0x2c
> >> >> >
> >> >> > My guess is that (cpu_cores * cpu_logical) somehow equals
> >> >> > to zero.
> >> >>
> >> >> That was masked earlier by additional checks on zero,
> >> >> and now that routine moved to the separate function
> >> >> (and to separate call path from subr_smp.c:mp_start()
> >> >> which seems not to be called).
> >> >>
> >> >> > Have you by a chance saved this crash dump?
> >> >> > I think that it would be interesting to look at it in kgdb.
> >> >
> >> > Please try the attached patch.
> >> >
> >> > Jung-uk Kim
> >>
> >> The strange thing is why cpu_mp_start() is called at all in case
> >> when there is only one CPU in system. It should early return in
> >> mp_start(). (I saw two reports and both of them were UP
> >> systems).
> >
> > I don't think cpu_mp_start() is the culprit.
>
> Actually you are right. I was wrong and cpu_mp_start() is not
> called here on UP.
>
> > When SMP kernel is used
> > on UP system, scheduler still tries to probe topology although it
> > should be simply smp_topo_none() instead of calling MD
> > cpu_topo(). In fact, I had a simple band-aid in cpu_topo() in my
> > local tree to shut up annoying:
> >
> > WARNING: Non-uniform processors.
> > WARNING: Using suboptimal topology.
> >
> > messages when SMP is forced off or a core is disabled on
> > multi-core systems, etc. It wasn't critical before but it is now,
> > unfortunately.
> >
> > Jung-uk Kim
>
> I decided to go another way. Before last changes in mp_machdep.c
> cpu_topo() included
> previously that piece of code which now is in topo_probe().
>
> What if just return that part back to cpu_topo() ?
>
> David, can you try this? It works for me now at least.
>
> $ diff -urp sys/amd64/amd64/mp_machdep.c.orig sys/amd64/amd64/mp_machdep.c
> --- sys/amd64/amd64/mp_machdep.c.orig 2009-05-01 00:59:55.000000000 +0400
> +++ sys/amd64/amd64/mp_machdep.c 2009-05-01 01:00:20.000000000 +0400
> @@ -309,6 +309,8 @@ cpu_topo(void)
>  {
>  	int cg_flags;
>
> +	topo_probe();
> +
>  	/*
>  	 * Determine whether any threading flags are
>  	 * necessry.
> $ diff -urp sys/i386/i386/mp_machdep.c.orig sys/i386/i386/mp_machdep.c
> --- sys/i386/i386/mp_machdep.c.orig 2009-05-01 01:01:53.000000000 +0400
> +++ sys/i386/i386/mp_machdep.c 2009-05-01 01:01:41.000000000 +0400
> @@ -362,6 +362,8 @@ cpu_topo(void)
>  {
>  	int cg_flags;
>
> +	topo_probe();
> +
>  	/*
>  	 * Determine whether any threading flags are
>  	 * necessry.

Ah, you're right. More complete patch is attached.

Jung-uk Kim

--Boundary-00=_bfh+JouM/gBbBo9
Content-Type: text/x-diff; charset="iso-8859-1"; name="mp_machdep.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="mp_machdep.diff"

--- sys/i386/i386/mp_machdep.c (revision 191699)
+++ sys/i386/i386/mp_machdep.c (working copy)
@@ -267,6 +267,8 @@
 		else if (type == CPUID_TYPE_CORE)
 			cpu_cores = cnt;
 	}
+	if (cpu_cores == 0)
+		cpu_cores = 1;
 	if (cpu_logical == 0)
 		cpu_logical = 1;
 	cpu_cores /= cpu_logical;
@@ -345,16 +347,21 @@
 static void
 topo_probe(void)
 {
+	static int cpu_topo_probed = 0;
 
+	if (cpu_topo_probed)
+		return;
+
 	logical_cpus = logical_cpus_mask = 0;
 	if (cpu_high >= 0xb)
 		topo_probe_0xb();
 	else if (cpu_high)
 		topo_probe_0x4();
 	if (cpu_cores == 0)
-		cpu_cores = mp_ncpus;
+		cpu_cores = mp_ncpus > 0 ? mp_ncpus : 1;
 	if (cpu_logical == 0)
 		cpu_logical = 1;
+	cpu_topo_probed = 1;
 }
 
 struct cpu_group *
@@ -366,6 +373,7 @@
 	 * Determine whether any threading flags are
 	 * necessry.
 	 */
+	topo_probe();
 	if (cpu_logical > 1 && hyperthreading_cpus)
 		cg_flags = CG_FLAG_HTT;
 	else if (cpu_logical > 1)
--- sys/amd64/amd64/mp_machdep.c (revision 191699)
+++ sys/amd64/amd64/mp_machdep.c (working copy)
@@ -214,6 +214,8 @@
 		else if (type == CPUID_TYPE_CORE)
 			cpu_cores = cnt;
 	}
+	if (cpu_cores == 0)
+		cpu_cores = 1;
 	if (cpu_logical == 0)
 		cpu_logical = 1;
 	cpu_cores /= cpu_logical;
@@ -292,16 +294,21 @@
 static void
 topo_probe(void)
 {
+	static int cpu_topo_probed = 0;
 
+	if (cpu_topo_probed)
+		return;
+
 	logical_cpus = logical_cpus_mask = 0;
 	if (cpu_high >= 0xb)
 		topo_probe_0xb();
 	else if (cpu_high)
 		topo_probe_0x4();
 	if (cpu_cores == 0)
-		cpu_cores = mp_ncpus;
+		cpu_cores = mp_ncpus > 0 ? mp_ncpus : 1;
 	if (cpu_logical == 0)
 		cpu_logical = 1;
+	cpu_topo_probed = 1;
 }
 
 struct cpu_group *
@@ -313,6 +320,7 @@
 	 * Determine whether any threading flags are
 	 * necessry.
 	 */
+	topo_probe();
 	if (cpu_logical > 1 && hyperthreading_cpus)
 		cg_flags = CG_FLAG_HTT;
 	else if (cpu_logical > 1)

--Boundary-00=_bfh+JouM/gBbBo9--
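
For readers following the thread, the two ideas in the attached patch (clamp the counts to at least 1, and run the probe at most once from the consumer itself) can be tried outside the kernel. What follows is only a user-space sketch, not the kernel code: the variable names mirror the patch, but the functions topo_probe_sketch() and topology_is_uniform_sketch() and the counter probe_runs are hypothetical. It also assumes that the faulting operation in cpu_topo() is a remainder check against (cpu_cores * cpu_logical), in line with Andriy's guess above, and that CPUID reported nothing useful.

```c
#include <assert.h>

/* Stand-ins for the kernel globals named in the patch (assumed values). */
static int cpu_cores;		/* 0 until probed */
static int cpu_logical;		/* 0 until probed */
static int mp_ncpus = 1;	/* UP machine running an SMP kernel */
static int probe_runs;		/* hypothetical: counts real probe passes */

/* Probe once; later calls return immediately, as in the patch. */
static void
topo_probe_sketch(void)
{
	static int probed = 0;

	if (probed)
		return;
	probe_runs++;
	/* Assumption: CPUID gave no counts, so both are still zero. */
	if (cpu_cores == 0)
		cpu_cores = mp_ncpus > 0 ? mp_ncpus : 1;
	if (cpu_logical == 0)
		cpu_logical = 1;
	probed = 1;
}

/*
 * Hypothetical consumer: returns 1 when mp_ncpus is a multiple of
 * (cpu_cores * cpu_logical).  Calling the probe first guarantees
 * that the divisor is never zero.
 */
static int
topology_is_uniform_sketch(void)
{
	topo_probe_sketch();
	assert(cpu_cores > 0 && cpu_logical > 0);
	return (mp_ncpus % (cpu_cores * cpu_logical) == 0);
}
```

Calling topology_is_uniform_sketch() with both counts still at zero no longer divides by zero, because the probe has clamped them to 1 first. A second call skips the probe, which is the purpose of the static flag in the patch.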