From owner-freebsd-current@FreeBSD.ORG Thu Apr 30 20:56:58 2009
From: Jung-uk Kim
To: freebsd-current@FreeBSD.org
Date: Thu, 30 Apr 2009 16:56:38 -0400
References: <20090429161626.GQ1387@albert.catwhisker.org> <200904301552.03118.jkim@FreeBSD.org>
Message-Id: <200904301656.51003.jkim@FreeBSD.org>
Cc: pluknet, Andriy Gapon, Scott Ullrich
Subject: Re: Panic "Fatal trap 18: integer divide fault while in kernel mode"
List-Id: Discussions about the use of FreeBSD-current

On Thursday 30 April 2009 04:25 pm, pluknet wrote:
> 2009/4/30 Jung-uk Kim:
> > On Thursday 30 April 2009 12:37 pm, pluknet wrote:
> >> 2009/4/30 Andriy Gapon:
> >> > on 30/04/2009 18:58 David Wolfskill said the following:
> >> >> On Thu, Apr 30, 2009 at 06:35:32PM +0300, Andriy Gapon wrote:
> >> >>> on 30/04/2009 18:18 David Wolfskill said the following:
> >> >>>> On Wed, Apr 29, 2009 at 09:16:26AM -0700, David Wolfskill wrote:
> >> >>>>> Is there anything of use I might get from DDB?
> >> >>>>
> >> >>>> I can still poke around there for a bit, if that would be
> >> >>>> useful.
> >> >>>
> >> >>> In general the stack trace[*] should be provided at the very
> >> >>> least, otherwise people have a hard time figuring out where
> >> >>> the problem occurred, so the right people may just not notice
> >> >>> a report.
> >> >>
> >> >> Sorry; it happened so quickly, I wasn't at all certain there
> >> >> would be enough to show:
> >> >>
> >> >> db> bt
> >> >> Tracing pid 0 tid 100000 td 0xc0d43610
> >> >> cpu_topo(2,c1420d34,c081ff07,c1420d58,c0820042,...) at cpu_topo+0x43
> >> >> smp_topo(c0804378,2,c4145a5c,fffffff,0,...) at smp_topo+0x10b
> >> >> sched_setup(0,141ec00,141ec00,141e000,1425000,...) at sched_setup+0x1a
> >> >> mi_startup() at mi_startup+0x96
> >> >> begin() at begin+0x2c
> >> >
> >> > My guess is that (cpu_cores * cpu_logical) somehow equals
> >> > zero.
> >>
> >> That was masked earlier by additional checks for zero,
> >> and now that routine has moved to a separate function
> >> (and to a separate call path from subr_smp.c:mp_start(),
> >> which seems not to be called).
> >>
> >> > Have you by any chance saved this crash dump?
> >> > I think that it would be interesting to look at it in kgdb.
> >
> > Please try the attached patch.
> >
> > Jung-uk Kim
>
> The strange thing is why cpu_mp_start() is called at all in the case
> when there is only one CPU in the system. It should return early in
> mp_start(). (I saw two reports and both of them were UP systems).

I don't think cpu_mp_start() is the culprit.  When an SMP kernel is
used on a UP system, the scheduler still tries to probe the topology,
although it should simply use smp_topo_none() instead of calling the
MD cpu_topo().  In fact, I had a simple band-aid in cpu_topo() in my
local tree to shut up the annoying:

WARNING: Non-uniform processors.
WARNING: Using suboptimal topology.

messages when SMP is forced off or a core is disabled on multi-core
systems, etc.  It wasn't critical before but it is now,
unfortunately.

Jung-uk Kim
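
For readers following the thread, here is a minimal user-space sketch of the
kind of guard being discussed. It is not the attached patch and not the
actual cpu_topo() code. The function name pick_topology(), the enum, and the
parameter names are all hypothetical; only the (cpu_cores * cpu_logical)
product and the two warning strings come from the messages above. The idea is
simply to fall back to a flat topology before the product is ever used as a
divisor.

```c
#include <stdio.h>

/* Hypothetical result type: TOPO_NONE stands in for smp_topo_none(). */
enum topo_kind { TOPO_NONE, TOPO_GROUPED };

/*
 * Hypothetical helper, not kernel code.  ncpus, cores and logical are
 * assumed stand-ins for the CPU count and the probed per-package
 * core and logical-CPU counts.
 */
enum topo_kind
pick_topology(int ncpus, int cores, int logical)
{
	int per_package = cores * logical;

	/*
	 * UP system, or the probe produced no usable counts: use the
	 * flat topology and never reach the modulo below with a zero
	 * divisor.
	 */
	if (ncpus <= 1 || per_package <= 0)
		return (TOPO_NONE);

	/* CPUs do not divide evenly into packages. */
	if (ncpus % per_package != 0) {
		fprintf(stderr, "WARNING: Non-uniform processors.\n");
		fprintf(stderr, "WARNING: Using suboptimal topology.\n");
		return (TOPO_NONE);
	}
	return (TOPO_GROUPED);
}
```

With this ordering, a call such as pick_topology(2, 0, 0) returns TOPO_NONE
instead of dividing by zero, and the UP case returns early without printing
the warnings.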