From owner-freebsd-current@freebsd.org Mon Oct 26 03:24:43 2020
From: Alan Cox
Reply-To: alc@freebsd.org
Date: Sun, 25 Oct 2020 22:24:30 -0500
Subject: Re: panic: non-current pmap 0xffffa00020eab8f0 on Rpi3
To: Mark Johnston
Cc: mmel@freebsd.org, bob prohaska, freebsd-current, freebsd-arm
In-Reply-To: <20201024193735.GA7755@raichu>

On Sat, Oct 24, 2020 at 2:38 PM Mark Johnston wrote:

> On Fri, Oct 23, 2020 at 06:32:25PM +0200, Michal Meloun wrote:
> >
> >
> > On 19.10.2020 22:39, Mark Johnston wrote:
> > > On Fri, Oct 16, 2020 at 11:53:56AM +0200, Michal Meloun wrote:
> > >>
> > >>
> > >> On 06.10.2020 15:37, Mark Johnston wrote:
> > >>> On Mon, Oct 05, 2020 at 07:10:29PM -0700, bob prohaska wrote:
> > >>>> Still seeing non-current pmap panics on the Pi3, this time a B+ running
> > >>>> 13.0-CURRENT (GENERIC-MMCCAM) #0 71e02448ffb-c271826(master)
> > >>>> during a -j4 buildworld. The backtrace reports
> > >>>>
> > >>>> panic: non-current pmap 0xffffa00020eab8f0
> > >>>
> > >>> Could you show the output of "show procvm" from the debugger?
> > >>
> > >> I see the same panic too, in my case it's very rare - typical scenario is
> > >> rebuild of kf5 ports (~250, 2 days of full load). Any idea how to debug
> > >> this?
> > >> Michal
> > >
> > > I suspect that there is some race involving the pmap switching in
> > > vmspace_exit(), but I can't see it. In the example below, presumably
> > > process 22604 on CPU 0 is also exiting? Could you show the backtrace?
> > > It would also be useful to see the value of PCPU_GET(curpmap) at the
> > > time of the panic. I'm not sure if there's a way to get that from DDB,
> > > but I suspect it should be equal to &vmspace0->vm_pmap.
> > Mark,
> > I think that I found the problem.
> > PCPU_GET() is not (and is not supposed to be) an atomic operation;
> > it expects that the thread is at least pinned.
> > This is not true for pmap_remove_pages() - so I think that the KASSERT
> > is racy and should be removed (or at least covered by a
> > sched_pin()/sched_unpin() pair).
> > What do you think?
>
> I think you're right. On amd64 curpmap is loaded using a single
> instruction so the assertion happens to work properly. On arm64 we
> have:
>
> 0xffff0000007ff138 <+32>: mov x8, x18
> 0xffff0000007ff13c <+36>: ldr x8, [x8, #216]
> 0xffff0000007ff140 <+40>: mov x26, x0
> 0xffff0000007ff144 <+44>: cmp x8, x0
>
> Though, it looks like arm64's PCPU_GET could be modified to combine the
> first two instructions.
>
> To fix it, we could perhaps change the KASSERT to verify that pmap ==
> vmspace_pmap(curthread->td_proc->p_vmspace). ...
>

Just delete it. It isn't useful.

> ... The various
> implementations of pmap_remove_pages() have different flavours of the
> same check and it would be nice to unify them. Using sched_pin() would
> also be fine I think.
>

The useful version exists on amd64, where we verify that the pmap is only
active on the processor performing pmap_remove_pages(). The reason is that
some implementations of pmap_remove_pages(), including amd64's and arm64's,
do not use atomic RMW operations to simultaneously clear a PTE and check the
status of the dirty bit.

> > > I think vmspace_exit() should issue a release fence with the cmpset and
> > > an acquire fence when handling the refcnt == 1 case,
> > Yep, true, fully agree.
>
> Alan pointed out in the review that pmap_remove_pages() acquires the
> pmap lock, which I missed, so I don't think the extra barriers are
> necessary after all.
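
For anyone following the thread, the race Michal describes can be illustrated outside the kernel. What follows is a minimal userspace sketch, not FreeBSD code: every sim_* name is hypothetical, the two "CPUs" are just array slots, and the forced migration between the two steps stands in for a preemption landing between the mov and the ldr in the arm64 sequence quoted above. It only models why an unpinned two-step per-CPU read can make the check fail spuriously, and why pinning avoids that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the kernel's struct pmap / struct pcpu. */
struct sim_pmap { int id; };
struct sim_pcpu { struct sim_pmap *curpmap; };

#define SIM_NCPU 2
static struct sim_pcpu sim_cpus[SIM_NCPU];
static int sim_curcpu;     /* CPU the simulated thread currently runs on */
static int sim_pin_count;  /* > 0 means the thread may not be migrated */

static void sim_sched_pin(void) { sim_pin_count++; }

static void sim_sched_unpin(void)
{
    assert(sim_pin_count > 0);
    sim_pin_count--;
}

/* Put the thread on CPU 0 with `mine` active there and CPU 1 idle. */
static void sim_reset(struct sim_pmap *mine)
{
    sim_curcpu = 0;
    sim_pin_count = 0;
    sim_cpus[0].curpmap = mine;
    sim_cpus[1].curpmap = NULL;
}

/*
 * Model a migration: the thread's pmap follows it to `newcpu`, and the old
 * CPU starts running some other address space (`incoming`). Refused while
 * the thread is pinned.
 */
static bool sim_try_migrate(int newcpu, struct sim_pmap *incoming)
{
    if (sim_pin_count > 0)
        return false;
    struct sim_pmap *mine = sim_cpus[sim_curcpu].curpmap;
    sim_cpus[sim_curcpu].curpmap = incoming;
    sim_cpus[newcpu].curpmap = mine;
    sim_curcpu = newcpu;
    return true;
}

/*
 * Two-step "pmap == curpmap" check with a worst-case migration attempt
 * injected between fetching the per-CPU base and loading curpmap from it.
 */
static bool sim_pmap_is_current(struct sim_pmap *pmap, struct sim_pmap *other,
    bool pin)
{
    if (pin)
        sim_sched_pin();
    struct sim_pcpu *base = &sim_cpus[sim_curcpu];   /* step 1: per-CPU base */
    (void)sim_try_migrate((sim_curcpu + 1) % SIM_NCPU, other);
    bool ok = (base->curpmap == pmap);               /* step 2: load, compare */
    if (pin)
        sim_sched_unpin();
    return ok;
}

/* Usage example: the same check fails unpinned and passes pinned. */
void sim_demo(void)
{
    struct sim_pmap a = { 1 }, b = { 2 };

    sim_reset(&a);
    assert(!sim_pmap_is_current(&a, &b, false)); /* stale base: spurious failure */

    sim_reset(&a);
    assert(sim_pmap_is_current(&a, &b, true));   /* pinned: migration refused */
}
```

In the unpinned case the thread's own pmap is still the active one on the CPU it ends up on; only the stale base pointer makes the comparison fail, which is the sense in which the KASSERT is racy rather than the pmap state being wrong.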