Date:      Wed, 07 Nov 2007 22:26:48 +0100
From:      Kris Kennaway <kris@FreeBSD.org>
To:        Marius Strobl <marius@alchemy.franken.de>
Cc:        alc@FreeBSD.org, Alan Cox <alc@cs.rice.edu>, freebsd-sparc64@FreeBSD.org, John Baldwin <jhb@FreeBSD.org>
Subject:   Re: 7.0 broken on e4500
Message-ID:  <47322D98.9090202@FreeBSD.org>
In-Reply-To: <20071107212134.GL36824@alchemy.franken.de>
References:  <46FEADFD.8020105@FreeBSD.org> <20071003132944.GA17342@alchemy.franken.de> <200710060222.31023.jhb@freebsd.org> <20071006132620.GF24840@alchemy.franken.de> <472DFC18.3080000@FreeBSD.org> <472E4573.3090708@FreeBSD.org> <20071104224618.GD36824@alchemy.franken.de> <472E54D0.8070807@FreeBSD.org> <473019E8.3070203@cs.rice.edu> <20071107212134.GL36824@alchemy.franken.de>

Marius Strobl wrote:
> On Tue, Nov 06, 2007 at 01:38:16AM -0600, Alan Cox wrote:
>> Kris Kennaway wrote:
>>
>>> Marius Strobl wrote:
>>>
>>>> On Sun, Nov 04, 2007 at 11:19:31PM +0100, Kris Kennaway wrote:
>>>>
>>>>> Kris Kennaway wrote:
>>>>>
>>>>>> Marius Strobl wrote:
>>>>>>
>>>>>>> On Sat, Oct 06, 2007 at 02:22:30AM -0400, John Baldwin wrote:
>>>>>>>
>>>>>>>> On Wednesday 03 October 2007 09:29:44 am Marius Strobl wrote:
>>>>>>>>
>>>>>>>>> On Sat, Sep 29, 2007 at 09:56:45PM +0200, Kris Kennaway wrote:
>>>>>>>>>
>>>>>>>>>> I get this early during boot with a CVS kernel (updated from
>>>>>>>>>> last December):
>>>>>>>>
>>>>>>>>>>> FreeBSD/SMP: Multiprocessor System Detected: 10 CPUs
>>>>>>>>>>> panic: tsb_tte_enter: replacing valid kernel mapping
>>>>>>>>>>> cpuid = 0
>>>>>>>>>>> KDB: enter: panic
>>>>>>>>>>> [thread pid 0 tid 0 ]
>>>>>>>>>>> Stopped at      kdb_enter+0x68: ta              %xcc, 1
>>>>>>>>>>> db> wh
>>>>>>>>>>> Tracing pid 0 tid 0 td 0xc0744f80
>>>>>>>>>>> panic() at panic+0x204
>>>>>>>>>>> tsb_tte_enter() at tsb_tte_enter+0xdc
>>>>>>>>>>> pmap_enter_locked() at pmap_enter_locked+0x2d0
>>>>>>>>>>> pmap_enter() at pmap_enter+0x64
>>>>>>>>>>> kmem_malloc() at kmem_malloc+0x6e0
>>>>>>>>>>> page_alloc() at page_alloc+0x28
>>>>>>>>>>> uma_large_malloc() at uma_large_malloc+0x44
>>>>>>>>>>> malloc() at malloc+0x1b0
>>>>>>>>>>> sf_buf_init() at sf_buf_init+0xf8
>>>>>>>>>>> mi_startup() at mi_startup+0x18c
>>>>>>>>>>> btext() at btext+0x34
>>>>>>>>> Do you by chance load the new kernel manually via the loader
>>>>>>>>> prompt, with the old kernel being <= 8MB in size and the new
>>>>>>>>> one > 8MB?
>>>>>>>> I get this panic on an E220R at work, but my "new" kernel is smaller.
>>>>>>>>
>>>>>>> If the actual panic string is "vm_phys_paddr_to_vm_page: paddr <foo>
>>>>>>> is not in any segment", then that's the problem I had in mind when
>>>>>>> replying to Kris but unfortunately failed to describe the right
>>>>>>> way around.
>>>>>>>
>>>>>>>> > ll /boot/kernel/kernel* /boot/test/kernel*
>>>>>>>> -r-xr-xr-x  1 root  wheel   7821094 Feb  6  2007 /boot/kernel/kernel
>>>>>>>> -r-xr-xr-x  1 root  wheel  13902501 Feb  6  2007 /boot/kernel/kernel.symbols
>>>>>>>> -r-xr-xr-x  1 root  wheel   4534968 Oct  6 00:20 /boot/test/kernel
>>>>>>>> -r-xr-xr-x  1 root  wheel  10101980 Oct  6 00:20 /boot/test/kernel.symbols
>>>>>>>>
>>>>>>>> The working kernel (~7MB) is the GENERIC kernel, and the "test"
>>>>>>>> kernel is the stripped down kernel for this machine.  In my case
>>>>>>>> I'm panicking in pmap_remove_tte() called from
>>>>>>>> pmap_enter_locked().  I added some KTR traces to the pmap code to
>>>>>>>> try and investigate, but I'm guessing the root problem is that the
>>>>>>>> loader doesn't properly handle telling OFW about needing to change
>>>>>>>> the mappings when unloading and then loading a new kernel?
>>>>>>>>
>>>>>>>> Hmm, it looks like currently the loader doesn't do any sort of MD
>>>>>>>> callback when unloading a file, so the loader isn't going to free
>>>>>>>> up the RAM it asked for from OFW for the old kernel.
>>>>>>>>
>>>>>>> Correct, the immediate problem (which I had a patch for somewhere)
>>>>>>> is that in case the "old" kernel required more TLB slots to be used
>>>>>>> than the "new" one, one can't use the kernel end in order to
>>>>>>> determine how many slots are used for the kernel map. As you
>>>>>>> describe, the real problem lies within the loader though. The funny
>>>>>>> thing is that no arch except sparc64 and sun4v seems to rely on the
>>>>>>> kernel end provided by the loader.
>>>>>>> I've no idea what's the cause of the problem Kris is seeing though.
>>>>>>>
>>>>>>> Marius
>>>>>>>
>>>>>>>
>>>>>> FYI one of the e4500's is now booting again but another is still 
>>>>>> failing with the same panic:
>>>>>>
>>>>>> FreeBSD 8.0-CURRENT #44: Mon Nov  5 01:52:42 JST 2007
>>>>>>   root@e4500-2.allbsd.org:/usr/src/sys/sparc64/compile/E4500_2
>>>>>> real memory  = 9663676416 (9216 MB)
>>>>>> avail memory = 9433554944 (8996 MB)
>>>>>> cpu0: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
>>>>>> cpu1: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
>>>>>> cpu2: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
>>>>>> cpu3: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
>>>>>> cpu4: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
>>>>>> cpu5: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
>>>>>> cpu6: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
>>>>>> cpu7: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
>>>>>> cpu8: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
>>>>>> cpu9: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
>>>>>> FreeBSD/SMP: Multiprocessor System Detected: 10 CPUs
>>>>>> panic: tsb_tte_enter: replacing valid kernel mapping
>>>>>> db> wh
>>>>>> Tracing pid 0 tid 0 td 0xc056ad30
>>>>>> panic() at panic+0x248
>>>>>> tsb_tte_enter() at tsb_tte_enter+0xdc
>>>>>> pmap_enter_locked() at pmap_enter_locked+0x318
>>>>>> pmap_enter() at pmap_enter+0x64
>>>>>> kmem_malloc() at kmem_malloc+0x644
>>>>>> page_alloc() at page_alloc+0x28
>>>>>> uma_large_malloc() at uma_large_malloc+0x44
>>>>>> malloc() at malloc+0x1a0
>>>>>> sf_buf_init() at sf_buf_init+0xe8
>>>>>> mi_startup() at mi_startup+0x1e8
>>>>>> btext() at btext+0x34
>>>>>>
>> Can anyone tell me more about the "vm_phys_paddr_to_vm_page: paddr <foo> 
>> is not in any segment" panic?
>>
> 
> The relevant info should also be above; if one unloads a kernel
> in the loader and loads another one which occupies fewer TLB
> slots than the previous one, the excess slots aren't flushed.
> The kernel, however, relies on the MODINFOMD_KERNEND provided
> by the loader (i.e. the ekva supplied to pmap_bootstrap()) for
> calculating the start of KVA, and that value doesn't account for
> the excess slots with locked entries entered by the loader.
> Typical panics look like:
> cpu0: Sun Microsystems UltraSparc-IIi Processor (440.16 MHz CPU)
> panic: vm_phys_paddr_to_vm_page: paddr 0x1e01a000 is not in any segment
> cpuid = 0
> KDB: enter: panic
> [thread pid 0 tid 0 ]
> Stopped at      kdb_enter+0x68: ta              %xcc, 1
> db> bt
> Tracing pid 0 tid 0 td 0xc06a2780
> panic() at panic+0x204
> vm_phys_paddr_to_vm_page() at vm_phys_paddr_to_vm_page+0x84
> pmap_remove_tte() at pmap_remove_tte+0x44
> pmap_enter_locked() at pmap_enter_locked+0x1b4
> pmap_enter() at pmap_enter+0x94
> kmem_malloc() at kmem_malloc+0x69c
> page_alloc() at page_alloc+0x28
> uma_large_malloc() at uma_large_malloc+0x44
> malloc() at malloc+0xc4
> sf_buf_init() at sf_buf_init+0xf8
> mi_startup() at mi_startup+0x18c
> btext() at btext+0x34
> db>
> 
> Marius
> 
> 

Well, except I'm not unloading the kernel, just letting it boot the 
default /boot/kernel/kernel.

Kris



