Date: Sat, 24 Oct 2020 15:37:35 -0400
From: Mark Johnston <markj@freebsd.org>
To: mmel@freebsd.org
Cc: bob prohaska <fbsd@www.zefox.net>, freebsd-current@freebsd.org, freebsd-arm@freebsd.org
Subject: Re: panic: non-current pmap 0xffffa00020eab8f0 on Rpi3
Message-ID: <20201024193735.GA7755@raichu>
In-Reply-To: <454e1e9f-e839-8961-2ae1-9ddd86f1cefd@freebsd.org>
References: <20201006021029.GA13260@www.zefox.net> <20201006133743.GA96285@raichu> <c8a5e1d2-0c47-e3f7-300a-f2fce55d2819@freebsd.org> <20201019203954.GC46122@raichu> <454e1e9f-e839-8961-2ae1-9ddd86f1cefd@freebsd.org>

On Fri, Oct 23, 2020 at 06:32:25PM +0200, Michal Meloun wrote:
>
>
> On 19.10.2020 22:39, Mark Johnston wrote:
> > On Fri, Oct 16, 2020 at 11:53:56AM +0200, Michal Meloun wrote:
> >>
> >>
> >> On 06.10.2020 15:37, Mark Johnston wrote:
> >>> On Mon, Oct 05, 2020 at 07:10:29PM -0700, bob prohaska wrote:
> >>>> Still seeing non-current pmap panics on the Pi3, this time a B+ running
> >>>> 13.0-CURRENT (GENERIC-MMCCAM) #0 71e02448ffb-c271826(master)
> >>>> during a -j4 buildworld. The backtrace reports
> >>>>
> >>>> panic: non-current pmap 0xffffa00020eab8f0
> >>>
> >>> Could you show the output of "show procvm" from the debugger?
> >>
> >> I see same panic too, in my case its very rare - typical scenario is
> >> rebuild of kf5 ports (~250, 2 days of full load). Any idea how to debug
> >> this?
> >> Michal
> >
> > I suspect that there is some race involving the pmap switching in
> > vmspace_exit(), but I can't see it. In the example below, presumably
> > process 22604 on CPU 0 is also exiting? Could you show the backtrace?
> >
> > It would also be useful to see the value of PCPU_GET(curpmap) at the
> > time of the panic. I'm not sure if there's a way to get that from DDB,
> > but I suspect it should be equal to &vmspace0->vm_pmap.
> Mark,
> I think that I found problem.
> The PCPU_GET() is not (and is not supposed to be) an atomic operation,
> it expects that thread is at least pinned.
> This is not true for pmap_remove_pages() - so I think that the KASSERT
> is racy and shoud be removed (or at least covered by
> sched_pin()/sched_unpin() pair).
> What do you think?

I think you're right. On amd64 curpmap is loaded using a single
instruction so the assertion happens to work properly. On arm64 we have:

   0xffff0000007ff138 <+32>:    mov     x8, x18
   0xffff0000007ff13c <+36>:    ldr     x8, [x8, #216]
   0xffff0000007ff140 <+40>:    mov     x26, x0
   0xffff0000007ff144 <+44>:    cmp     x8, x0

Though, it looks like arm64's PCPU_GET could be modified to combine the
first two instructions.
To fix it, we could perhaps change the KASSERT to verify that
pmap == vmspace_pmap(curthread->td_proc->p_vmspace). The various
implementations of pmap_remove_pages() have different flavours of the
same check, and it would be nice to unify them. Using sched_pin() would
also be fine, I think.

> > I think vmspace_exit() should issue a release fence with the cmpset and
> > an acquire fence when handling the refcnt == 1 case,
> Yep, true, fully agree.

Alan pointed out in the review that pmap_remove_pages() acquires the
pmap lock, which I missed, so I don't think the extra barriers are
necessary after all.