Date:      Thu, 30 Apr 2009 17:27:53 -0400
From:      Jung-uk Kim <jkim@FreeBSD.org>
To:        pluknet <pluknet@gmail.com>
Cc:        Scott Ullrich <sullrich@gmail.com>, freebsd-current@freebsd.org, Andriy Gapon <avg@icyb.net.ua>
Subject:   Re: Panic "Fatal trap 18: integer divide fault while in kernel mode"
Message-ID:  <200904301727.55099.jkim@FreeBSD.org>
In-Reply-To: <a31046fc0904301409y6db1b591nb6bc4887ab8bff0f@mail.gmail.com>
References:  <20090429161626.GQ1387@albert.catwhisker.org> <200904301656.51003.jkim@FreeBSD.org> <a31046fc0904301409y6db1b591nb6bc4887ab8bff0f@mail.gmail.com>


--Boundary-00=_bfh+JouM/gBbBo9
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

On Thursday 30 April 2009 05:09 pm, pluknet wrote:
> 2009/5/1 Jung-uk Kim <jkim@freebsd.org>:
> > On Thursday 30 April 2009 04:25 pm, pluknet wrote:
> >> 2009/4/30 Jung-uk Kim <jkim@freebsd.org>:
> >> > On Thursday 30 April 2009 12:37 pm, pluknet wrote:
> >> >> 2009/4/30 Andriy Gapon <avg@icyb.net.ua>:
> >> >> > on 30/04/2009 18:58 David Wolfskill said the following:
> >> >> >> On Thu, Apr 30, 2009 at 06:35:32PM +0300, Andriy Gapon wrote:
> >> >> >>> on 30/04/2009 18:18 David Wolfskill said the following:
> >> >> >>>> On Wed, Apr 29, 2009 at 09:16:26AM -0700, David Wolfskill wrote:
> >> >> >>>>> Is there anything of use I might get from DDB?
> >> >> >>>>
> >> >> >>>> I can still poke around there for a bit, if that would
> >> >> >>>> be useful.
> >> >> >>>
> >> >> >>> In general, the stack trace[*] should be provided at the
> >> >> >>> very least; otherwise people have a hard time figuring out
> >> >> >>> where the problem occurred, and the right people may simply
> >> >> >>> not notice the report.
> >> >> >>
> >> >> >> Sorry; it happened so quickly, I wasn't at all certain
> >> >> >> there would be enough to show:
> >> >> >>
> >> >> >> db> bt
> >> >> >> Tracing pid 0 tid 100000 td 0xc0d43610
> >> >> >> cpu_topo(2,c1420d34,c081ff07,c1420d58,c0820042,...) at cpu_topo+0x43
> >> >> >> smp_topo(c0804378,2,c4145a5c,fffffff,0,...) at smp_topo+0x10b
> >> >> >> sched_setup(0,141ec00,141ec00,141e000,1425000,...) at sched_setup+0x1a
> >> >> >> mi_startup() at mi_startup+0x96
> >> >> >> begin() at begin+0x2c
> >> >> >
> >> >> > My guess is that (cpu_cores * cpu_logical) somehow equals
> >> >> > zero.
> >> >>
> >> >> That was masked earlier by additional checks for zero;
> >> >> now that routine has moved to a separate function
> >> >> (and onto a separate call path, from subr_smp.c:mp_start(),
> >> >> which seems not to be called).
> >> >>
> >> >> > Have you by any chance saved this crash dump?
> >> >> > I think that it would be interesting to look at it in kgdb.
> >> >
> >> > Please try the attached patch.
> >> >
> >> > Jung-uk Kim
> >>
> >> The strange thing is that cpu_mp_start() is called at all when
> >> there is only one CPU in the system. mp_start() should return
> >> early. (I saw two reports, and both of them were from UP
> >> systems.)
> >
> > I don't think cpu_mp_start() is the culprit.
>
> Actually you are right. I was wrong and cpu_mp_start() is not
> called here on UP.
>
> > > When an SMP kernel is used
> > > on a UP system, the scheduler still tries to probe the topology,
> > > although it should simply use smp_topo_none() instead of calling
> > > the MD cpu_topo(). In fact, I had a simple band-aid in cpu_topo()
> > > in my local tree to shut up the annoying:
> >
> >        WARNING: Non-uniform processors.
> >        WARNING: Using suboptimal topology.
> >
> > messages when SMP is forced off or a core is disabled on
> > multi-core systems, etc.  It wasn't critical before but it is
> > now,
> > unfortunately.
> >
> > Jung-uk Kim
>
> I decided to go another way. Before the last changes in
> mp_machdep.c, cpu_topo() included the piece of code that is now
> in topo_probe().
>
> What if we just put that part back into cpu_topo()?
>
> David, can you try this? It works for me now at least.
>
> $ diff -urp sys/amd64/amd64/mp_machdep.c.orig sys/amd64/amd64/mp_machdep.c
> --- sys/amd64/amd64/mp_machdep.c.orig  2009-05-01 00:59:55.000000000 +0400
> +++ sys/amd64/amd64/mp_machdep.c        2009-05-01 01:00:20.000000000 +0400
> @@ -309,6 +309,8 @@ cpu_topo(void)
>  {
>         int cg_flags;
>
> +       topo_probe();
> +
>         /*
>          * Determine whether any threading flags are
>          * necessry.
> $ diff -urp sys/i386/i386/mp_machdep.c.orig sys/i386/i386/mp_machdep.c
> --- sys/i386/i386/mp_machdep.c.orig    2009-05-01 01:01:53.000000000 +0400
> +++ sys/i386/i386/mp_machdep.c 2009-05-01 01:01:41.000000000 +0400
> @@ -362,6 +362,8 @@ cpu_topo(void)
>  {
>         int cg_flags;
>
> +       topo_probe();
> +
>         /*
>          * Determine whether any threading flags are
>          * necessry.

Ah, you're right.  A more complete patch is attached.
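
For readers following the thread, here is a minimal userland sketch of
the two ideas discussed above: probe the topology only once, and clamp
both counts so that a later division can never see zero.  It is not
the kernel code.  topo_probe_sketch(), packages_sketch() and
probe_runs are hypothetical names, the globals are plain stand-ins for
the kernel variables, and the division is only an assumed example of
the kind of expression suspected of trapping.

```c
#include <assert.h>

/* Stand-ins for the kernel globals named in the thread (assumption). */
static int cpu_cores;    /* cores per package, 0 = nothing detected */
static int cpu_logical;  /* threads per core, 0 = nothing detected */
static int mp_ncpus;     /* CPU count; may still be 0 this early */
static int probe_runs;   /* counts real probes, for illustration only */

/* Probe once, then clamp both counts so neither can stay zero. */
static void
topo_probe_sketch(void)
{
	static int probed = 0;

	if (probed)
		return;
	probe_runs++;
	/* A real probe would query CPUID here; this sketch detects nothing. */
	if (cpu_cores == 0)
		cpu_cores = mp_ncpus > 0 ? mp_ncpus : 1;
	if (cpu_logical == 0)
		cpu_logical = 1;
	probed = 1;
}

/* Hypothetical consumer: divides by the product, so it probes first. */
static int
packages_sketch(int ncpus)
{
	topo_probe_sketch();
	assert(cpu_cores * cpu_logical != 0);
	return (ncpus / (cpu_cores * cpu_logical));
}
```

Calling packages_sketch() on a "UP" setup where nothing was detected
and mp_ncpus is still 0 falls back to one core and one thread, so the
divisor is 1 instead of 0, and repeated calls do not probe again.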

Jung-uk Kim

--Boundary-00=_bfh+JouM/gBbBo9
Content-Type: text/x-diff;
  charset="iso-8859-1";
  name="mp_machdep.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename="mp_machdep.diff"

--- sys/i386/i386/mp_machdep.c	(revision 191699)
+++ sys/i386/i386/mp_machdep.c	(working copy)
@@ -267,6 +267,8 @@
 		else if (type == CPUID_TYPE_CORE)
 			cpu_cores = cnt;
 	}
+	if (cpu_cores == 0)
+		cpu_cores = 1;
 	if (cpu_logical == 0)
 		cpu_logical = 1;
 	cpu_cores /= cpu_logical;
@@ -345,16 +347,21 @@
 static void
 topo_probe(void)
 {
+	static int cpu_topo_probed = 0;
 
+	if (cpu_topo_probed)
+		return;
+
 	logical_cpus = logical_cpus_mask = 0;
 	if (cpu_high >= 0xb)
 		topo_probe_0xb();
 	else if (cpu_high)
 		topo_probe_0x4();
 	if (cpu_cores == 0)
-		cpu_cores = mp_ncpus;
+		cpu_cores = mp_ncpus > 0 ? mp_ncpus : 1;
 	if (cpu_logical == 0)
 		cpu_logical = 1;
+	cpu_topo_probed = 1;
 }
 
 struct cpu_group *
@@ -366,6 +373,7 @@
 	 * Determine whether any threading flags are
 	 * necessry.
 	 */
+	topo_probe();
 	if (cpu_logical > 1 && hyperthreading_cpus)
 		cg_flags = CG_FLAG_HTT;
 	else if (cpu_logical > 1)
--- sys/amd64/amd64/mp_machdep.c	(revision 191699)
+++ sys/amd64/amd64/mp_machdep.c	(working copy)
@@ -214,6 +214,8 @@
 		else if (type == CPUID_TYPE_CORE)
 			cpu_cores = cnt;
 	}
+	if (cpu_cores == 0)
+		cpu_cores = 1;
 	if (cpu_logical == 0)
 		cpu_logical = 1;
 	cpu_cores /= cpu_logical;
@@ -292,16 +294,21 @@
 static void
 topo_probe(void)
 {
+	static int cpu_topo_probed = 0;
 
+	if (cpu_topo_probed)
+		return;
+
 	logical_cpus = logical_cpus_mask = 0;
 	if (cpu_high >= 0xb)
 		topo_probe_0xb();
 	else if (cpu_high)
 		topo_probe_0x4();
 	if (cpu_cores == 0)
-		cpu_cores = mp_ncpus;
+		cpu_cores = mp_ncpus > 0 ? mp_ncpus : 1;
 	if (cpu_logical == 0)
 		cpu_logical = 1;
+	cpu_topo_probed = 1;
 }
 
 struct cpu_group *
@@ -313,6 +320,7 @@
 	 * Determine whether any threading flags are
 	 * necessry.
 	 */
+	topo_probe();
 	if (cpu_logical > 1 && hyperthreading_cpus)
 		cg_flags = CG_FLAG_HTT;
 	else if (cpu_logical > 1)

--Boundary-00=_bfh+JouM/gBbBo9--


