From: Marius Strobl
To: Alan Cox
Cc: alc@FreeBSD.org, Kris Kennaway, freebsd-sparc64@FreeBSD.org, John Baldwin
Date: Wed, 7 Nov 2007 22:21:34 +0100
Subject: Re: 7.0 broken on e4500
Message-ID: <20071107212134.GL36824@alchemy.franken.de>
In-Reply-To: <473019E8.3070203@cs.rice.edu>

On Tue, Nov 06, 2007 at 01:38:16AM -0600, Alan Cox wrote:
> Kris Kennaway wrote:
>
> >Marius Strobl wrote:
> >
> >>On Sun, Nov 04, 2007 at 11:19:31PM +0100, Kris Kennaway wrote:
> >>
> >>>Kris Kennaway wrote:
> >>>
> >>>>Marius Strobl wrote:
> >>>>
> >>>>>On Sat, Oct 06, 2007 at 02:22:30AM -0400, John Baldwin wrote:
> >>>>>
> >>>>>>On Wednesday 03 October 2007 09:29:44 am Marius Strobl wrote:
> >>>>>>
> >>>>>>>On Sat, Sep 29, 2007 at 09:56:45PM +0200, Kris Kennaway wrote:
> >>>>>>>
> >>>>>>>>I get this early during boot with a CVS kernel (updated from last
> >>>>>>>>December):
> >>>>>>>>
> >>>>>>>>>FreeBSD/SMP: Multiprocessor System Detected: 10 CPUs
> >>>>>>>>>panic: tsb_tte_enter: replacing valid kernel mapping
> >>>>>>>>>cpuid = 0
> >>>>>>>>>KDB: enter: panic
> >>>>>>>>>[thread pid 0 tid 0 ]
> >>>>>>>>>Stopped at kdb_enter+0x68: ta %xcc, 1
> >>>>>>>>>db> wh
> >>>>>>>>>Tracing pid 0 tid 0 td 0xc0744f80
> >>>>>>>>>panic() at panic+0x204
> >>>>>>>>>tsb_tte_enter() at tsb_tte_enter+0xdc
> >>>>>>>>>pmap_enter_locked() at pmap_enter_locked+0x2d0
> >>>>>>>>>pmap_enter() at pmap_enter+0x64
> >>>>>>>>>kmem_malloc() at kmem_malloc+0x6e0
> >>>>>>>>>page_alloc() at page_alloc+0x28
> >>>>>>>>>uma_large_malloc() at uma_large_malloc+0x44
> >>>>>>>>>malloc() at malloc+0x1b0
> >>>>>>>>>sf_buf_init() at sf_buf_init+0xf8
> >>>>>>>>>mi_startup() at mi_startup+0x18c
> >>>>>>>>>btext() at btext+0x34
> >>>>>>>>
> >>>>>>>Do you by chance load the new kernel manually via the loader
> >>>>>>>prompt, with the old kernel being <= 8MB in size and the new
> >>>>>>>one > 8MB?
> >>>>>>
> >>>>>>I get this panic on an E220R at work, but my "new" kernel is
> >>>>>>smaller.
> >>>>>>
> >>>>>If the actual panic string is "vm_phys_paddr_to_vm_page: paddr
> >>>>>is not in any segment" than that's the problem I had in mind when
> >>>>>replying to Kris but unfortunately failed to describe the right
> >>>>>way around.
> >>>>>
> >>>>>>>ll /boot/kernel/kernel* /boot/test/kernel*
> >>>>>>
> >>>>>>-r-xr-xr-x 1 root wheel 7821094 Feb 6 2007 /boot/kernel/kernel
> >>>>>>-r-xr-xr-x 1 root wheel 13902501 Feb 6 2007 /boot/kernel/kernel.symbols
> >>>>>>-r-xr-xr-x 1 root wheel 4534968 Oct 6 00:20 /boot/test/kernel
> >>>>>>-r-xr-xr-x 1 root wheel 10101980 Oct 6 00:20 /boot/test/kernel.symbols
> >>>>>>
> >>>>>>The working kernel (~7MB) is the GENERIC kernel, and the "test"
> >>>>>>kernel is the stripped down kernel for this machine. In my case I'm
> >>>>>>panicing in pmap_remove_tte() called from pmap_enter_locked(). I
> >>>>>>added some KTR traces to the pmap code to try and investigate,
> >>>>>>but I'm guessing the root problem is that the loader doesn't
> >>>>>>properly handle telling OFW about needing to change the mappings
> >>>>>>when unloading and then loading a new kernel?
> >>>>>>
> >>>>>>Hmm, it looks like currently the loader doesn't do any sort of MD
> >>>>>>callback when unloading a file, so the loader isn't going to free
> >>>>>>up the RAM it asked for from OFW for the old kernel.
> >>>>>>
> >>>>>Correct, the immediate problem (which I had a patch for somewhere)
> >>>>>is that in case the "old" kernel required more TLB slots to be used
> >>>>>than the "new" one one can't use the kernel end in order to determine
> >>>>>how many slots are used for the kernel map. As you describe the real
> >>>>>problem lies within the loader though. The funny thing is that no
> >>>>>arch except sparc64 and sun4v seems to rely on the kernel end
> >>>>>provided by the loader.
> >>>>>If no idea what's the cause of the problem Kris is seeing though.
> >>>>>
> >>>>>Marius
> >>>>>
> >>>>>
> >>>>FYI one of the e4500's is now booting again but another is still
> >>>>failing with the same panic:
> >>>>
> >>>>FreeBSD 8.0-CURRENT #44: Mon Nov 5 01:52:42 JST 2007
> >>>>    root@e4500-2.allbsd.org:/usr/src/sys/sparc64/compile/E4500_2
> >>>>real memory = 9663676416 (9216 MB)
> >>>>avail memory = 9433554944 (8996 MB)
> >>>>cpu0: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
> >>>>cpu1: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
> >>>>cpu2: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
> >>>>cpu3: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
> >>>>cpu4: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
> >>>>cpu5: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
> >>>>cpu6: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
> >>>>cpu7: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
> >>>>cpu8: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
> >>>>cpu9: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
> >>>>FreeBSD/SMP: Multiprocessor System Detected: 10 CPUs
> >>>>panic: tsb_tte_enter: replacing valid kernel mapping
> >>>>db> wh
> >>>>Tracing pid 0 tid 0 td 0xc056ad30
> >>>>panic() at panic+0x248
> >>>>tsb_tte_enter() at tsb_tte_enter+0xdc
> >>>>pmap_enter_locked() at pmap_enter_locked+0x318
> >>>>pmap_enter() at pmap_enter+0x64
> >>>>kmem_malloc() at kmem_malloc+0x644
> >>>>page_alloc() at page_alloc+0x28
> >>>>uma_large_malloc() at uma_large_malloc+0x44
> >>>>malloc() at malloc+0x1a0
> >>>>sf_buf_init() at sf_buf_init+0xe8
> >>>>mi_startup() at mi_startup+0x1e8
> >>>>btext() at btext+0x34
> >>>>
>
> Can anyone tell me more about the "vm_phys_paddr_to_vm_page: paddr
> is not in any segment" panic?
>

The relevant info should also be above; if one unloads a kernel in
the loader and loads another one which occupies fewer TLB slots than
the previous one, the excess slots aren't flushed.
The kernel, however, relies on the MODINFOMD_KERNEND provided by the
loader (i.e. the ekva supplied to pmap_bootstrap()) to calculate the
start of KVA, and that value doesn't account for the excess slots
with locked entries entered by the loader. Typical panics look like:

cpu0: Sun Microsystems UltraSparc-IIi Processor (440.16 MHz CPU)
panic: vm_phys_paddr_to_vm_page: paddr 0x1e01a000 is not in any segment
cpuid = 0
KDB: enter: panic
[thread pid 0 tid 0 ]
Stopped at kdb_enter+0x68: ta %xcc, 1
db> bt
Tracing pid 0 tid 0 td 0xc06a2780
panic() at panic+0x204
vm_phys_paddr_to_vm_page() at vm_phys_paddr_to_vm_page+0x84
pmap_remove_tte() at pmap_remove_tte+0x44
pmap_enter_locked() at pmap_enter_locked+0x1b4
pmap_enter() at pmap_enter+0x94
kmem_malloc() at kmem_malloc+0x69c
page_alloc() at page_alloc+0x28
uma_large_malloc() at uma_large_malloc+0x44
malloc() at malloc+0xc4
sf_buf_init() at sf_buf_init+0xf8
mi_startup() at mi_startup+0x18c
btext() at btext+0x34
db>

Marius
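
To illustrate the arithmetic behind the stale-slot problem described
above, here is a minimal sketch in C. It is not code from the loader or
from pmap.c. The 4MB slot size, the rounding of ekva up to a slot
boundary, and all function names (slots_needed, stale_slots,
kva_start_from_kernend, hits_stale_mapping) are assumptions made for
illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: each locked TLB slot entered by the loader maps 4MB. */
#define	SLOT_SIZE	((uint64_t)4 * 1024 * 1024)

/* Number of slots needed to cover a kernel image of `size` bytes. */
static unsigned
slots_needed(uint64_t size)
{
	return ((unsigned)((size + SLOT_SIZE - 1) / SLOT_SIZE));
}

/*
 * Slots that stay locked when a kernel of `old_size` bytes is unloaded
 * and one of `new_size` bytes is loaded without flushing anything.
 */
static unsigned
stale_slots(uint64_t old_size, uint64_t new_size)
{
	unsigned o = slots_needed(old_size);
	unsigned n = slots_needed(new_size);

	return (o > n ? o - n : 0);
}

/*
 * Hypothetical model of deriving the start of KVA from the kernel end
 * alone: the first slot boundary at or above ekva.
 */
static uint64_t
kva_start_from_kernend(uint64_t ekva)
{
	/* The mask trick below only works for power-of-two slot sizes. */
	assert((SLOT_SIZE & (SLOT_SIZE - 1)) == 0);
	return ((ekva + SLOT_SIZE - 1) & ~(SLOT_SIZE - 1));
}

/* Does `addr` fall into a range still covered by a stale locked slot? */
static int
hits_stale_mapping(uint64_t kernbase, uint64_t old_size, uint64_t new_size,
    uint64_t addr)
{
	uint64_t lo = kernbase + slots_needed(new_size) * SLOT_SIZE;
	uint64_t hi = kernbase + slots_needed(old_size) * SLOT_SIZE;

	return (addr >= lo && addr < hi);
}
```

With an illustrative kernel base of 0xc0000000, a 9MB "old" kernel
occupies three slots and a 5MB "new" one only two. The start of KVA
computed from the new kernel end is then 0xc0800000, which is exactly
where the leftover third slot still holds a locked mapping, so the first
allocation from KVA runs into it.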