Date: Wed, 7 Nov 2007 22:33:01 +0100
From: Marius Strobl
To: Kris Kennaway
Message-ID: <20071107213301.GM36824@alchemy.franken.de>
Cc: alc@FreeBSD.org, Alan Cox, freebsd-sparc64@FreeBSD.org, John Baldwin
Subject: Re: 7.0 broken on e4500

On Wed, Nov 07, 2007 at 10:26:48PM +0100, Kris Kennaway wrote:
> Marius Strobl wrote:
> >On Tue, Nov 06, 2007 at 01:38:16AM -0600, Alan Cox wrote:
> >>Kris Kennaway wrote:
> >>
> >>>Marius Strobl wrote:
> >>>
> >>>>On Sun, Nov 04, 2007 at 11:19:31PM +0100, Kris Kennaway wrote:
> >>>>
> >>>>>Kris Kennaway wrote:
> >>>>>
> >>>>>>Marius Strobl wrote:
> >>>>>>
> >>>>>>>On Sat, Oct 06, 2007 at 02:22:30AM -0400, John Baldwin wrote:
> >>>>>>>
> >>>>>>>>On Wednesday 03 October 2007 09:29:44 am Marius Strobl wrote:
> >>>>>>>>
> >>>>>>>>>On Sat, Sep 29, 2007 at 09:56:45PM +0200, Kris Kennaway wrote:
> >>>>>>>>>
> >>>>>>>>>>I get this early during boot with a CVS kernel (updated from last December):
> >>>>>>>>>>
> >>>>>>>>>>>FreeBSD/SMP: Multiprocessor System Detected: 10 CPUs
> >>>>>>>>>>>panic: tsb_tte_enter: replacing valid kernel mapping
> >>>>>>>>>>>cpuid = 0
> >>>>>>>>>>>KDB: enter: panic
> >>>>>>>>>>>[thread pid 0 tid 0 ]
> >>>>>>>>>>>Stopped at kdb_enter+0x68: ta %xcc, 1
> >>>>>>>>>>>db> wh
> >>>>>>>>>>>Tracing pid 0 tid 0 td 0xc0744f80
> >>>>>>>>>>>panic() at panic+0x204
> >>>>>>>>>>>tsb_tte_enter() at tsb_tte_enter+0xdc
> >>>>>>>>>>>pmap_enter_locked() at pmap_enter_locked+0x2d0
> >>>>>>>>>>>pmap_enter() at pmap_enter+0x64
> >>>>>>>>>>>kmem_malloc() at kmem_malloc+0x6e0
> >>>>>>>>>>>page_alloc() at page_alloc+0x28
> >>>>>>>>>>>uma_large_malloc() at uma_large_malloc+0x44
> >>>>>>>>>>>malloc() at malloc+0x1b0
> >>>>>>>>>>>sf_buf_init() at sf_buf_init+0xf8
> >>>>>>>>>>>mi_startup() at mi_startup+0x18c
> >>>>>>>>>>>btext() at btext+0x34
> >>>>>>>>>Do you by chance load the new kernel manually via the loader
> >>>>>>>>>prompt, with the old kernel being <= 8MB in size and the new
> >>>>>>>>>one > 8MB?
> >>>>>>>>I get this panic on an E220R at work, but my "new" kernel is
> >>>>>>>>smaller.
> >>>>>>>>
> >>>>>>>If the actual panic string is "vm_phys_paddr_to_vm_page: paddr
> >>>>>>>is not in any segment", then that's the problem I had in mind when
> >>>>>>>replying to Kris but unfortunately failed to describe the right
> >>>>>>>way around.
> >>>>>>>
> >>>>>>>>>ll /boot/kernel/kernel* /boot/test/kernel*
> >>>>>>>>-r-xr-xr-x 1 root wheel 7821094 Feb 6 2007 /boot/kernel/kernel
> >>>>>>>>-r-xr-xr-x 1 root wheel 13902501 Feb 6 2007 /boot/kernel/kernel.symbols
> >>>>>>>>-r-xr-xr-x 1 root wheel 4534968 Oct 6 00:20 /boot/test/kernel
> >>>>>>>>-r-xr-xr-x 1 root wheel 10101980 Oct 6 00:20 /boot/test/kernel.symbols
> >>>>>>>>
> >>>>>>>>The working kernel (~7MB) is the GENERIC kernel, and the "test"
> >>>>>>>>kernel is the stripped-down kernel for this machine. In my case I'm
> >>>>>>>>panicking in pmap_remove_tte() called from pmap_enter_locked(). I
> >>>>>>>>added some KTR traces to the pmap code to try and investigate,
> >>>>>>>>but I'm guessing the root problem is that the loader doesn't
> >>>>>>>>properly handle telling OFW about needing to change the mappings
> >>>>>>>>when unloading and then loading a new kernel?
> >>>>>>>>
> >>>>>>>>Hmm, it looks like currently the loader doesn't do any sort of MD
> >>>>>>>>callback when unloading a file, so the loader isn't going to free
> >>>>>>>>up the RAM it asked for from OFW for the old kernel.
> >>>>>>>>
> >>>>>>>Correct, the immediate problem (which I had a patch for somewhere)
> >>>>>>>is that in case the "old" kernel required more TLB slots than the
> >>>>>>>"new" one, one can't use the kernel end in order to determine
> >>>>>>>how many slots are used for the kernel map. As you describe, the real
> >>>>>>>problem lies within the loader though. The funny thing is that no
> >>>>>>>arch except sparc64 and sun4v seems to rely on the kernel end
> >>>>>>>provided by the loader.
> >>>>>>>I have no idea what's causing the problem Kris is seeing, though.
> >>>>>>>
> >>>>>>>Marius
> >>>>>>>
> >>>>>>>
> >>>>>>FYI one of the e4500's is now booting again but another is still
> >>>>>>failing with the same panic:
> >>>>>>
> >>>>>>FreeBSD 8.0-CURRENT #44: Mon Nov 5 01:52:42 JST 2007
> >>>>>>    root@e4500-2.allbsd.org:/usr/src/sys/sparc64/compile/E4500_2
> >>>>>>real memory = 9663676416 (9216 MB)
> >>>>>>avail memory = 9433554944 (8996 MB)
> >>>>>>cpu0: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
> >>>>>>cpu1: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
> >>>>>>cpu2: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
> >>>>>>cpu3: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
> >>>>>>cpu4: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
> >>>>>>cpu5: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
> >>>>>>cpu6: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
> >>>>>>cpu7: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
> >>>>>>cpu8: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
> >>>>>>cpu9: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
> >>>>>>FreeBSD/SMP: Multiprocessor System Detected: 10 CPUs
> >>>>>>panic: tsb_tte_enter: replacing valid kernel mapping
> >>>>>>db> wh
> >>>>>>Tracing pid 0 tid 0 td 0xc056ad30
> >>>>>>panic() at panic+0x248
> >>>>>>tsb_tte_enter() at tsb_tte_enter+0xdc
> >>>>>>pmap_enter_locked() at pmap_enter_locked+0x318
> >>>>>>pmap_enter() at pmap_enter+0x64
> >>>>>>kmem_malloc() at kmem_malloc+0x644
> >>>>>>page_alloc() at page_alloc+0x28
> >>>>>>uma_large_malloc() at uma_large_malloc+0x44
> >>>>>>malloc() at malloc+0x1a0
> >>>>>>sf_buf_init() at sf_buf_init+0xe8
> >>>>>>mi_startup() at mi_startup+0x1e8
> >>>>>>btext() at btext+0x34
> >>>>>>
> >>Can anyone tell me more about the "vm_phys_paddr_to_vm_page: paddr
> >>is not in any segment" panic?
> >>
> >
> >The relevant info should also be above; if one unloads a kernel
> >in the loader and loads another one which occupies fewer TLB
> >slots than the previous one, the excess slots aren't flushed.
> >The kernel in turn relies on the MODINFOMD_KERNEND provided
> >by the loader (i.e. the ekva supplied to pmap_bootstrap()) for
> >calculating the start of KVA, however, and that doesn't include
> >the excess slots with locked entries entered by the loader.
> >Typical panics look like:
> >cpu0: Sun Microsystems UltraSparc-IIi Processor (440.16 MHz CPU)
> >panic: vm_phys_paddr_to_vm_page: paddr 0x1e01a000 is not in any segment
> >cpuid = 0
> >KDB: enter: panic
> >[thread pid 0 tid 0 ]
> >Stopped at kdb_enter+0x68: ta %xcc, 1
> >db> bt
> >Tracing pid 0 tid 0 td 0xc06a2780
> >panic() at panic+0x204
> >vm_phys_paddr_to_vm_page() at vm_phys_paddr_to_vm_page+0x84
> >pmap_remove_tte() at pmap_remove_tte+0x44
> >pmap_enter_locked() at pmap_enter_locked+0x1b4
> >pmap_enter() at pmap_enter+0x94
> >kmem_malloc() at kmem_malloc+0x69c
> >page_alloc() at page_alloc+0x28
> >uma_large_malloc() at uma_large_malloc+0x44
> >malloc() at malloc+0xc4
> >sf_buf_init() at sf_buf_init+0xf8
> >mi_startup() at mi_startup+0x18c
> >btext() at btext+0x34
> >db>
> >
> >Marius
> >
>
> Well, except I'm not unloading the kernel, just letting it boot the
> default /boot/kernel/kernel.
>

Yup, as also written above, you're obviously facing another problem.
I just initially thought it might be the same one.

Marius