From: Kris Kennaway
Date: Wed, 07 Nov 2007 22:26:48 +0100
To: Marius Strobl
Cc: alc@FreeBSD.org, Alan Cox, freebsd-sparc64@FreeBSD.org, John Baldwin
Subject: Re: 7.0 broken on e4500
Message-ID: <47322D98.9090202@FreeBSD.org>
In-Reply-To: <20071107212134.GL36824@alchemy.franken.de>
List-Id: Porting FreeBSD to the Sparc

Marius Strobl wrote:
> On Tue, Nov 06, 2007 at 01:38:16AM -0600, Alan Cox wrote:
>> Kris Kennaway wrote:
>>
>>> Marius Strobl wrote:
>>>
>>>> On Sun, Nov 04, 2007 at 11:19:31PM +0100, Kris Kennaway wrote:
>>>>
>>>>> Kris Kennaway wrote:
>>>>>
>>>>>> Marius Strobl wrote:
>>>>>>
>>>>>>> On Sat, Oct 06, 2007 at 02:22:30AM -0400, John Baldwin wrote:
>>>>>>>
>>>>>>>> On Wednesday 03 October 2007 09:29:44 am
>>>>>>>> Marius Strobl wrote:
>>>>>>>>
>>>>>>>>> On Sat, Sep 29, 2007 at 09:56:45PM +0200, Kris Kennaway wrote:
>>>>>>>>>
>>>>>>>>>> I get this early during boot with a CVS kernel (updated from last December):
>>>>>>>>
>>>>>>>>>>> FreeBSD/SMP: Multiprocessor System Detected: 10 CPUs
>>>>>>>>>>> panic: tsb_tte_enter: replacing valid kernel mapping
>>>>>>>>>>> cpuid = 0
>>>>>>>>>>> KDB: enter: panic
>>>>>>>>>>> [thread pid 0 tid 0 ]
>>>>>>>>>>> Stopped at kdb_enter+0x68: ta %xcc, 1
>>>>>>>>>>> db> wh
>>>>>>>>>>> Tracing pid 0 tid 0 td 0xc0744f80
>>>>>>>>>>> panic() at panic+0x204
>>>>>>>>>>> tsb_tte_enter() at tsb_tte_enter+0xdc
>>>>>>>>>>> pmap_enter_locked() at pmap_enter_locked+0x2d0
>>>>>>>>>>> pmap_enter() at pmap_enter+0x64
>>>>>>>>>>> kmem_malloc() at kmem_malloc+0x6e0
>>>>>>>>>>> page_alloc() at page_alloc+0x28
>>>>>>>>>>> uma_large_malloc() at uma_large_malloc+0x44
>>>>>>>>>>> malloc() at malloc+0x1b0
>>>>>>>>>>> sf_buf_init() at sf_buf_init+0xf8
>>>>>>>>>>> mi_startup() at mi_startup+0x18c
>>>>>>>>>>> btext() at btext+0x34
>>>>>>>>> Do you by chance load the new kernel manually via the loader
>>>>>>>>> prompt, with the old kernel being <= 8MB in size and the new
>>>>>>>>> one > 8MB?
>>>>>>>> I get this panic on an E220R at work, but my "new" kernel is
>>>>>>>> smaller.
>>>>>>>>
>>>>>>> If the actual panic string is "vm_phys_paddr_to_vm_page: paddr
>>>>>>> is not in any segment" then that's the problem I had in mind when
>>>>>>> replying to Kris but unfortunately failed to describe the right
>>>>>>> way around.
>>>>>>>
>>>>>>>> > ll /boot/kernel/kernel* /boot/test/kernel*
>>>>>>>> -r-xr-xr-x 1 root wheel 7821094 Feb 6 2007 /boot/kernel/kernel
>>>>>>>> -r-xr-xr-x 1 root wheel 13902501 Feb 6 2007 /boot/kernel/kernel.symbols
>>>>>>>> -r-xr-xr-x 1 root wheel 4534968 Oct 6 00:20 /boot/test/kernel
>>>>>>>> -r-xr-xr-x 1 root wheel 10101980 Oct 6 00:20 /boot/test/kernel.symbols
>>>>>>>>
>>>>>>>> The working kernel (~7MB) is the GENERIC kernel, and the "test" kernel
>>>>>>>> is the stripped-down kernel for this machine. In my case I'm
>>>>>>>> panicking in pmap_remove_tte() called from pmap_enter_locked(). I
>>>>>>>> added some KTR traces to the pmap code to try and investigate,
>>>>>>>> but I'm guessing the root problem is that the loader doesn't
>>>>>>>> properly handle telling OFW about needing to change the mappings
>>>>>>>> when unloading and then loading a new kernel?
>>>>>>>>
>>>>>>>> Hmm, it looks like currently the loader doesn't do any sort of MD callback
>>>>>>>> when unloading a file, so the loader isn't going to free up the
>>>>>>>> RAM it asked for from OFW for the old kernel.
>>>>>>>>
>>>>>>> Correct, the immediate problem (which I had a patch for somewhere)
>>>>>>> is that in case the "old" kernel required more TLB slots to be used
>>>>>>> than the "new" one, one can't use the kernel end in order to determine
>>>>>>> how many slots are used for the kernel map. As you describe the real
>>>>>>> problem lies within the loader though. The funny thing is that no
>>>>>>> arch except sparc64 and sun4v seems to rely on the kernel end
>>>>>>> provided by the loader.
>>>>>>> I've no idea what's the cause of the problem Kris is seeing though.
>>>>>>>
>>>>>>> Marius
>>>>>>>
>>>>>>>
>>>>>> FYI one of the e4500's is now booting again but another is still
>>>>>> failing with the same panic:
>>>>>>
>>>>>> FreeBSD 8.0-CURRENT #44: Mon Nov 5 01:52:42 JST 2007
>>>>>> root@e4500-2.allbsd.org:/usr/src/sys/sparc64/compile/E4500_2
>>>>>> real memory = 9663676416 (9216 MB)
>>>>>> avail memory = 9433554944 (8996 MB)
>>>>>> cpu0: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
>>>>>> cpu1: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
>>>>>> cpu2: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
>>>>>> cpu3: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
>>>>>> cpu4: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
>>>>>> cpu5: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
>>>>>> cpu6: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
>>>>>> cpu7: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
>>>>>> cpu8: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
>>>>>> cpu9: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
>>>>>> FreeBSD/SMP: Multiprocessor System Detected: 10 CPUs
>>>>>> panic: tsb_tte_enter: replacing valid kernel mapping
>>>>>> db> wh
>>>>>> Tracing pid 0 tid 0 td 0xc056ad30
>>>>>> panic() at panic+0x248
>>>>>> tsb_tte_enter() at tsb_tte_enter+0xdc
>>>>>> pmap_enter_locked() at pmap_enter_locked+0x318
>>>>>> pmap_enter() at pmap_enter+0x64
>>>>>> kmem_malloc() at kmem_malloc+0x644
>>>>>> page_alloc() at page_alloc+0x28
>>>>>> uma_large_malloc() at uma_large_malloc+0x44
>>>>>> malloc() at malloc+0x1a0
>>>>>> sf_buf_init() at sf_buf_init+0xe8
>>>>>> mi_startup() at mi_startup+0x1e8
>>>>>> btext() at btext+0x34
>>>>>>
>> Can anyone tell me more about the "vm_phys_paddr_to_vm_page: paddr
>> is not in any segment" panic?
>>
>
> The relevant info should be also above; if one unloads a kernel
> in the loader and loads another one which occupies fewer TLB
> slots than the previous one, the excess slots aren't flushed.
> The kernel in turn relies on the MODINFOMD_KERNEND provided
> by the loader (i.e. the ekva supplied to pmap_bootstrap()) for
> calculating the start of KVA however, which doesn't include
> the excess slots with locked entries entered by the loader.
> Typical panics look like:
> cpu0: Sun Microsystems UltraSparc-IIi Processor (440.16 MHz CPU)
> panic: vm_phys_paddr_to_vm_page: paddr 0x1e01a000 is not in any segment
> cpuid = 0
> KDB: enter: panic
> [thread pid 0 tid 0 ]
> Stopped at kdb_enter+0x68: ta %xcc, 1
> db> bt
> Tracing pid 0 tid 0 td 0xc06a2780
> panic() at panic+0x204
> vm_phys_paddr_to_vm_page() at vm_phys_paddr_to_vm_page+0x84
> pmap_remove_tte() at pmap_remove_tte+0x44
> pmap_enter_locked() at pmap_enter_locked+0x1b4
> pmap_enter() at pmap_enter+0x94
> kmem_malloc() at kmem_malloc+0x69c
> page_alloc() at page_alloc+0x28
> uma_large_malloc() at uma_large_malloc+0x44
> malloc() at malloc+0xc4
> sf_buf_init() at sf_buf_init+0xf8
> mi_startup() at mi_startup+0x18c
> btext() at btext+0x34
> db>
>
> Marius
>

Well, except I'm not unloading the kernel, just letting it boot the
default /boot/kernel/kernel.

Kris