From: Andrew Turner
To: Robert Watson
Cc: freebsd-current@freebsd.org
Date: Sun, 26 Aug 2007 00:07:08 +1200
Subject: Re: FreeBSD on xen hvm
Message-ID: <20070826000708.15fbb5bb@hermies.int.fubar.geek.nz>
In-Reply-To: <20070824132409.W3900@fledge.watson.org>
References: <20070824181627.57bed401@hermies.int.fubar.geek.nz> <20070824132409.W3900@fledge.watson.org>

On Fri, 24 Aug 2007 13:30:17 +0100 (BST)
Robert Watson wrote:

> On Fri, 24 Aug 2007, Andrew Turner wrote:
>
> > 1) PREEMPTION Preemption causes the kernel to panic with a page
> > fault. The dmesg is available from [1].
>
> Any chance it's possible to get a core for this, or attach GDB
> somehow to the VM?

I haven't managed to get remote GDB working, and it's too early in
the boot for a core. I can get a Xen core dump, but it would need
processing to turn it into something GDB could use.

> It looks like timing in Xen may be exposing a
> race in some or another subsystem with timers, but figuring out which
> subsystem it is will be most easily done if we can inspect the
> callout information, which is most easily done with GDB since you can
> inspect the callout structure more easily. If not, then we can add
> some printfs to extract the information, I expect, or extend DDB. We
> need to find out what the function pointer in the callout structure
> is for.

I've created a patch at [1] to add "show callouts" to ddb. It prints
all the callouts in callwheel and the name of the function each one
calls. The callouts with preemption are:

loadav
in6_tmpaddrtimer
in6_rtqtimo
in_rtqtimo
in6_mtutimo
uma_timeout
nd6_slowtimo
nfsrv_timer
tcp_isn_tick
scrn_timer
roundrobin
atkbd_timeout
sleepq_timeout
sleepq_timeout
sleepq_timeout
sleepq_timeout
pffasttimo
pfslowtimo
kbdmux_kbd_intr_timo
if_slowtimo
ipport_tick
nd6_timer
lboltcb
tcp_hc_purge

Preemption does not always cause the kernel to panic. When it doesn't,
the boot stops at the mountroot> prompt and cannot mount the root file
system because no disk drives show up.

>
> > 3) INVARIANTS Invariants causes a panic from a page fault. See [2]
> > for the dmesg and backtrace.
>
> This appears to be in the start up of Audit as it creates a kernel
> thread. Possibly it's creating the thread too early, or possibly
> something else is going on. Can you try creating a kernel without
> options AUDIT and see if it works better, or if it just panics when
> the next thread is created?

It just panics when the next thread is created.
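
As background for the "show callouts" listing above, here is a rough
userland sketch of the idea: walk every bucket of a wheel of linked
lists and print the name attached to each pending entry. This is not
the patch at [1]. The struct model_callout type, its fields, and
model_show_callouts() are all hypothetical names made up for the
illustration, and the name string stands in for the symbol lookup that
ddb would do on the function pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userland stand-in for a kernel callout entry. */
struct model_callout {
	struct model_callout *next;	/* next entry in the same bucket */
	void (*func)(void *);		/* handler the entry would call */
	const char *name;		/* stands in for a symbol lookup on func */
};

/*
 * Walk every bucket of the model wheel, print one handler name per
 * pending entry, and return how many entries were seen.
 */
static int
model_show_callouts(struct model_callout *wheel[], size_t nbuckets, FILE *out)
{
	int count = 0;

	assert(wheel != NULL && out != NULL);
	for (size_t i = 0; i < nbuckets; i++) {
		const struct model_callout *c;

		for (c = wheel[i]; c != NULL; c = c->next) {
			fprintf(out, "%s\n",
			    c->name != NULL ? c->name : "(unknown)");
			count++;
		}
	}
	return (count);
}
```

Calling model_show_callouts(wheel, nbuckets, stdout) on a wheel whose
first bucket chains two entries prints the two names, one per line,
and returns 2.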

>
> It sounds like Xen may start the timer firing sooner than on plain
> hardware, or possibly at a faster rate initially, and that's causing
> things to happen in a different order, so I expect we'll either bump
> into a series of races of this sort based on different ordering of
> events, or discover the timer isn't properly being disabled or
> managed in Xen :-).

I suspect the timer isn't being managed properly. The timer in the
loader always stays at 10, and with DIAGNOSTIC I'm getting lines like:

Expensive timeout(9) function: 0xc097da70(0xc0bbaa00) -1.982636062 s

Andrew

[1] http://fubar.geek.nz/files/freebsd/ddb-callout.diff

--
Andrew Turner
http://fubar.geek.nz/blog/
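
A note on the negative duration in the DIAGNOSTIC line above. One
possible reading, assuming the kernel computes the elapsed time by
subtracting two clock samples, normalizes the result so the nanosecond
part is non-negative, and then prints seconds and nanoseconds as two
separate fields, is sketched below. That assumption has not been
checked against the 2007 source. The struct model_ts type and both
functions are hypothetical names for the illustration, and the sample
values in the usage note are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MODEL_NSEC_PER_SEC 1000000000L

/* Hypothetical stand-in for a seconds + nanoseconds timestamp. */
struct model_ts {
	long long sec;
	long nsec;	/* kept in [0, MODEL_NSEC_PER_SEC) */
};

/* Compute end - start, borrowing one second if nsec would go negative. */
static struct model_ts
model_ts_sub(struct model_ts end, struct model_ts start)
{
	struct model_ts d;

	assert(end.nsec >= 0 && end.nsec < MODEL_NSEC_PER_SEC);
	assert(start.nsec >= 0 && start.nsec < MODEL_NSEC_PER_SEC);
	d.sec = end.sec - start.sec;
	d.nsec = end.nsec - start.nsec;
	if (d.nsec < 0) {
		d.sec--;
		d.nsec += MODEL_NSEC_PER_SEC;
	}
	return (d);
}

/* Print the two fields separately, as the assumed message format does. */
static int
model_ts_format(struct model_ts d, char *buf, size_t len)
{
	return (snprintf(buf, len, "%lld.%09ld s", d.sec, d.nsec));
}
```

With start = { 10, 500000000 } and end = { 10, 482636062 }, the second
sample is about 17 ms earlier than the first. model_ts_sub() returns
sec = -1 and nsec = 982636062, and model_ts_format() renders that as
"-1.982636062 s". If the assumption holds, the line in the message
would mean the clock stepped back by roughly 0.017 s during the
callout, not by almost two seconds. Either way, a negative value fits
the suspicion that the timer is misbehaving under Xen.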