From owner-freebsd-current@freebsd.org Mon Oct 12 18:13:34 2020
References: <02fa309e-9467-f741-8092-974bfc145c9a@FreeBSD.org>
 <5e4f0439-08fa-7715-7672-05793d05cc6d@FreeBSD.org>
In-Reply-To: <5e4f0439-08fa-7715-7672-05793d05cc6d@FreeBSD.org>
From: Warner Losh
Date: Mon, 12 Oct 2020 12:13:21 -0600
Subject: Re: GPF on boot with devmatch
To: Alexander Motin
Cc: Xin LI, FreeBSD Current, Warner Losh
List-Id: Discussions about the use of FreeBSD-current

On Mon, Oct 5, 2020 at 3:39 PM Alexander Motin wrote:

> On 05.10.2020 17:20, Warner Losh wrote:
> > On Mon, Oct 5, 2020 at 12:36 PM Alexander Motin wrote:
> >
> >     I can add that we've received report about identical panic on FreeBSD
> >     releng/12.2 of r365436, AKA TrueNAS 12.0-RC1:
> >     https://jira.ixsystems.com/browse/NAS-107578 .
> >     So it looks a) pretty rare (one report from thousands of early
> >     adopters and none in our lab), and b) it is in stable/12 too, not
> >     only head.
> >
> > Thanks! I'll see if I can recreate here.... But we're accessing the
> > sysctl tree from devmatch to get some information, which should always
> > be OK (the fact that it isn't suggests either a bug in some driver
> > leaving bad pointers, or some race or both)... It would be nice to know
> > which nodes they were, or to have a kernel panic I can look at...
>
> All we have now in this case is a screenshot you may see in the ticket.
> Also previously the same user on some earlier version of stable/12
> reported other very weird panics on process lock being dropped where it
> can't be in some other sysctls inside kern.proc, so if we guess those
> are related, I suspect there may be some kind of memory corruption
> happening, but have no clue where. Unfortunately we have only textdumps
> for those. So if Xin is able to reproduce it locally, it may be our
> best chance to debug it, at least this specific issue.
>

That's totally weird. Xin Li's traceback led to code I just rewrote in
current, while this one leads to code that's been there for a long time
and hasn't been MFC'd. This suggests that Xin Li's backtrace isn't to be
trusted, or there are two issues at play. Both are plausible.

I've fixed a minor signedness bug and a possible one-byte overflow that
might have happened in the code I just rewrote. But I suspect this is due
to something else related to how children are handled after we've raced.
Maybe there's something special about how USB does things, because other
buses will create the child early and the child list is stable. If USB's
discovery code is adding something and is racing with devd's walking of
the tree, that might explain it...

It would be nice if there were some way to provoke the race on a system I
could get a core from for deeper analysis....

Warner

> --
> Alexander Motin
>