Date: Sat, 24 Oct 2020 15:37:35 -0400
From: Mark Johnston
To: mmel@freebsd.org
Cc: bob prohaska, freebsd-current@freebsd.org, freebsd-arm@freebsd.org
Subject: Re: panic: non-current pmap 0xffffa00020eab8f0 on Rpi3
Message-ID: <20201024193735.GA7755@raichu>
References: <20201006021029.GA13260@www.zefox.net> <20201006133743.GA96285@raichu> <20201019203954.GC46122@raichu> <454e1e9f-e839-8961-2ae1-9ddd86f1cefd@freebsd.org>
In-Reply-To: <454e1e9f-e839-8961-2ae1-9ddd86f1cefd@freebsd.org>
List-Id: Discussions about the use of FreeBSD-current

On Fri, Oct
23, 2020 at 06:32:25PM +0200, Michal Meloun wrote:
>
>
> On 19.10.2020 22:39, Mark Johnston wrote:
> > On Fri, Oct 16, 2020 at 11:53:56AM +0200, Michal Meloun wrote:
> >>
> >>
> >> On 06.10.2020 15:37, Mark Johnston wrote:
> >>> On Mon, Oct 05, 2020 at 07:10:29PM -0700, bob prohaska wrote:
> >>>> Still seeing non-current pmap panics on the Pi3, this time a B+ running
> >>>> 13.0-CURRENT (GENERIC-MMCCAM) #0 71e02448ffb-c271826(master)
> >>>> during a -j4 buildworld. The backtrace reports
> >>>>
> >>>> panic: non-current pmap 0xffffa00020eab8f0
> >>>
> >>> Could you show the output of "show procvm" from the debugger?
> >>
> >> I see same panic too, in my case it's very rare - typical scenario is
> >> rebuild of kf5 ports (~250, 2 days of full load). Any idea how to debug
> >> this?
> >> Michal
> >
> > I suspect that there is some race involving the pmap switching in
> > vmspace_exit(), but I can't see it. In the example below, presumably
> > process 22604 on CPU 0 is also exiting? Could you show the backtrace?
> >
> > It would also be useful to see the value of PCPU_GET(curpmap) at the
> > time of the panic. I'm not sure if there's a way to get that from DDB,
> > but I suspect it should be equal to &vmspace0->vm_pmap.
> Mark,
> I think that I found problem.
> The PCPU_GET() is not (and is not supposed to be) an atomic operation,
> it expects that thread is at least pinned.
> This is not true for pmap_remove_pages() - so I think that the KASSERT
> is racy and should be removed (or at least covered by
> sched_pin()/sched_unpin() pair).
> What do you think?

I think you're right. On amd64 curpmap is loaded using a single
instruction so the assertion happens to work properly. On arm64 we
have:

   0xffff0000007ff138 <+32>:    mov     x8, x18
   0xffff0000007ff13c <+36>:    ldr     x8, [x8, #216]
   0xffff0000007ff140 <+40>:    mov     x26, x0
   0xffff0000007ff144 <+44>:    cmp     x8, x0

Though, it looks like arm64's PCPU_GET could be modified to combine the
first two instructions.
To fix it, we could perhaps change the KASSERT to verify that
pmap == vmspace_pmap(curthread->td_proc->p_vmspace). The various
implementations of pmap_remove_pages() have different flavours of the
same check and it would be nice to unify them. Using sched_pin() would
also be fine I think.

> > I think vmspace_exit() should issue a release fence with the cmpset and
> > an acquire fence when handling the refcnt == 1 case,
> Yep, true, fully agree.

Alan pointed out in the review that pmap_remove_pages() acquires the
pmap lock, which I missed, so I don't think the extra barriers are
necessary after all.
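
The race discussed above can be modelled in user space. What follows is
only a minimal sketch, not kernel code: the struct layouts, the two-entry
cpus[] array, and the names run_on(), racy_check() and stable_check() are
all hypothetical stand-ins chosen for illustration. It models the
two-instruction per-CPU read from the arm64 disassembly, with an optional
migration between the two steps, and contrasts it with a check against
the thread's own vmspace pmap, which does not depend on which CPU the
thread is running on.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel structures; not the real layouts. */
struct pmap { int id; };
struct pcpu { struct pmap *curpmap; };
struct thread { struct pmap *vmspace_pmap; int cpu; };

static struct pmap other_pmap = { 0 }, proc_pmap = { 1 };
static struct pcpu cpus[2];

/* Model a context switch: the CPU that runs td has td's pmap active. */
static void run_on(struct thread *td, int cpu)
{
    td->cpu = cpu;
    cpus[cpu].curpmap = td->vmspace_pmap;
}

/*
 * Two-step per-CPU read. Step 1 takes the per-CPU base, step 2 loads
 * curpmap through it. If migrate is true, the thread moves to CPU 1
 * between the steps and CPU 0 switches to another address space, so
 * step 2 reads through a stale base pointer.
 */
static bool racy_check(struct thread *td, struct pmap *pmap, bool migrate)
{
    struct pcpu *base = &cpus[td->cpu];     /* step 1: mov x8, x18 */

    if (migrate) {
        run_on(td, 1);                      /* thread preempted and moved */
        cpus[0].curpmap = &other_pmap;      /* CPU 0 now runs something else */
    }
    return base->curpmap == pmap;           /* step 2: ldr, then cmp */
}

/* Migration-independent check against the thread's own vmspace pmap. */
static bool stable_check(struct thread *td, struct pmap *pmap)
{
    return td->vmspace_pmap == pmap;
}
```

With a thread whose vmspace_pmap is &proc_pmap placed on CPU 0 via
run_on(), racy_check() returns true when migrate is false and false when
migrate is true, even though the thread's address space never changed.
stable_check() returns true in both cases. In this model, pinning the
thread corresponds to never taking the migrate branch.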