Date: Sat, 9 May 2020 19:50:10 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Andriy Gapon <avg@freebsd.org>
Cc: FreeBSD Current <freebsd-current@freebsd.org>
Subject: Re: CHANGE_PV_LIST_LOCK_TO_PHYS is not correct when !NUMA ?
Message-ID: <20200509165010.GI44519@kib.kiev.ua>
In-Reply-To: <d8876981-4775-0548-e49e-2f9171a2db10@FreeBSD.org>
References: <0d7db402-621e-cc6b-2918-2078f63e2a9b@FreeBSD.org> <20200508161500.GC44519@kib.kiev.ua> <6485ab77-a3d0-8916-9431-74e4da1e3ea7@FreeBSD.org> <20200509161325.GH44519@kib.kiev.ua> <d8876981-4775-0548-e49e-2f9171a2db10@FreeBSD.org>

On Sat, May 09, 2020 at 07:16:27PM +0300, Andriy Gapon wrote:
> On 09/05/2020 19:13, Konstantin Belousov wrote:
> > On Sat, May 09, 2020 at 06:52:24PM +0300, Andriy Gapon wrote:
> >> On 08/05/2020 19:15, Konstantin Belousov wrote:
> >>> On Fri, May 08, 2020 at 06:53:24PM +0300, Andriy Gapon wrote:
> >>>>
> >>>> I have a reproducible panic with a custom kernel without option NUMA while using
> >>>> amdgpu driver from linuxkpi-based drm:
> >>>>
> >>>> panic: address 41ec00000 beyond the last segment
> >>>>
> >>>> I did some quick debugging and the panic happens when Xorg server tries to
> >>>> access a frame buffer (or something like that). There is a page fault that gets
> >>>> satisfied by ttm with a fictitious page.
> >>>>
> >>>> The stack trace is:
> >>>> #11 0xffffffff808031a3 in panic (fmt=0xffffffff8119a998 <cnputs_mtx>
> >>>> "5\003ʀ\377\377\377\377") at /usr/devel/git/motil/sys/kern/kern_shutdown.c:839
> >>>> #12 0xffffffff80bbc552 in pmap_enter (pmap=<optimized out>, va=34504441856,
> >>>> m=<optimized out>, prot=<optimized out>, flags=<optimized out>, psind=<optimized
> >>>> out>) at /usr/devel/git/motil/sys/amd64/amd64/pmap.c:6035
> >>>> #13 0xffffffff80b288be in vm_fault_populate (fs=<optimized out>) at
> >>>> /usr/devel/git/motil/sys/vm/vm_fault.c:519
> >>>> #14 vm_fault_allocate (fs=<optimized out>) at
> >>>> /usr/devel/git/motil/sys/vm/vm_fault.c:1032
> >>>> #15 vm_fault (map=<optimized out>, vaddr=<optimized out>, fault_type=<optimized
> >>>> out>, fault_flags=<optimized out>, m_hold=<optimized out>) at
> >>>> /usr/devel/git/motil/sys/vm/vm_fault.c:1342
> >>>> #16 0xffffffff80b26e7e in vm_fault_trap (map=0xfffffe0017cd39e8,
> >>>> vaddr=<optimized out>, fault_type=<optimized out>, fault_flags=0,
> >>>> signo=0xfffffe00a810dbc4, ucode=0xfffffe00a810dbc0) at
> >>>> /usr/devel/git/motil/sys/vm/vm_fault.c:589
> >>>> #17 0xffffffff80bcf89c in trap_pfault (frame=0xfffffe00a810dc00,
> >>>> usermode=<optimized out>, signo=<optimized out>, ucode=0xffffffff80853250
> >>>> <putchar>) at /usr/devel/git/motil/sys/amd64/amd64/trap.c:821
> >>>> #18 0xffffffff80bceeec in trap (frame=0xfffffe00a810dc00) at
> >>>> /usr/devel/git/motil/sys/amd64/amd64/trap.c:34
> >>>>
> >>>>
> >>>> The line number in pmap_enter() is incorrect, I guess because of optimizations.
> >>>> The assert seems to be reached via pmap_enter -> CHANGE_PV_LIST_LOCK_TO_PHYS ->
> >>>> PHYS_TO_PV_LIST_LOCK -> pa_index().
> >>>>
> >>>> The panic is correct in that the page is fictitious and its physical address is
> >>>> beyond the end of real physical memory.
> >>>> It seems that NUMA PHYS_TO_PV_LIST_LOCK() is aware of such pages, but !NUMA one
> >>>> is not.
> >>>
> >>> I think you can remove this assert. pa_index() is always taken by
> >>> % NPV_LIST_LOCKS, because fictitious mappings are not promoted.
> >>>
> >>> Try that and commit if it works for you.
> >>
> >> I tried this change:
> >> diff --git a/sys/amd64/amd64/pmap.c b/sys/amd64/amd64/pmap.c
> >> index 4deed86a76d1a..b834b7f0388b7 100644
> >> --- a/sys/amd64/amd64/pmap.c
> >> +++ b/sys/amd64/amd64/pmap.c
> >> @@ -345,7 +345,7 @@ pmap_pku_mask_bit(pmap_t pmap)
> >> #define NPV_LIST_LOCKS MAXCPU
> >>
> >> #define PHYS_TO_PV_LIST_LOCK(pa) \
> >> - (&pv_list_locks[pa_index(pa) % NPV_LIST_LOCKS])
> >> + (&pv_list_locks[((pa) >> PDRSHIFT) % NPV_LIST_LOCKS])
> >> #endif
> >>
> >> #define CHANGE_PV_LIST_LOCK_TO_PHYS(lockp, pa) do { \
> >>
> >> It fixed the original problem, but I got a new panic.
> >> "DI already started" in pmap_remove() -> pmap_delayed_invl_start_u().
> >> I guess that !NUMA variant does not get much testing, so I'll probably just
> >> stick with the default.
> > Why didn't you just remove the KASSERT from pa_index ?
>
> Well, I thought it might be useful in the NUMA case.
> pa_index() definition is shared between both cases.

Maybe define the macro twice, once for NUMA and once for non-NUMA.
The non-NUMA case does not need the assert, because its users take the
result mod NPV_LIST_LOCKS.
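
For illustration only, here is a small user-space sketch of the point made at the end of the message: an index that is reduced modulo the number of locks always lands inside the lock array, so the bounds assert only matters for an index that is used directly. This is not the kernel code. The constant values, the 4 GB "end of memory", the use of plain ints in place of locks, and the function names pa_index_checked(), pa_index_unchecked() and phys_to_pv_list_lock() are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative values; assumptions for this model, not taken from the thread. */
#define PDRSHIFT	21	/* log2 of a 2 MB superpage */
#define NPV_LIST_LOCKS	256	/* stands in for MAXCPU */

typedef uint64_t vm_paddr_t;

/* Hypothetical end of real physical memory in this model (4 GB). */
static const vm_paddr_t model_last_pa = (vm_paddr_t)4 << 30;

/* Stand-in for the lock array; ints replace the kernel's lock objects. */
static int pv_list_locks[NPV_LIST_LOCKS];

/*
 * Checked variant: models an index that is used directly, so an address
 * past the end of memory must be rejected. assert() plays the role of
 * the KASSERT that fired in the report.
 */
static inline size_t
pa_index_checked(vm_paddr_t pa)
{
	assert(pa < model_last_pa);
	return ((size_t)(pa >> PDRSHIFT));
}

/*
 * Unchecked variant: safe only for callers that reduce the result
 * modulo a table size, as the lock lookup below does.
 */
static inline size_t
pa_index_unchecked(vm_paddr_t pa)
{
	return ((size_t)(pa >> PDRSHIFT));
}

/* The modulo keeps the result in bounds for any physical address. */
static inline int *
phys_to_pv_list_lock(vm_paddr_t pa)
{
	return (&pv_list_locks[pa_index_unchecked(pa) % NPV_LIST_LOCKS]);
}
```

With these assumed values, the address from the panic message (0x41ec00000) lies past the model's end of memory, so pa_index_checked() would trip its assert, while phys_to_pv_list_lock() still returns a valid slot of pv_list_locks.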