From: Dustin Marquess
Date: Mon, 2 Mar 2020 20:47:27 -0600
Subject: Re: -CURRENT fatal trap cause by cxgbe module
To: Ryan Libby
Cc: FreeBSD CURRENT

On Mon, Mar 2, 2020 at 6:55 PM Ryan Libby wrote:
>
> On Sun, Mar 1, 2020 at 8:07 PM Dustin Marquess wrote:
> >
> > So I've been fighting with any current from the last month or so
> > instantly crashing when I boot it. I did notice that kernels in the
> > various snapshot images were working, however, so I was trying to
> > figure out why. At first I thought it was because I had INVARIANTS
> > and such disabled, but no, I finally figured it out.
> >
> > I've had in my /boot/loader.conf for a while now:
> >
> > if_cxgbe_load="YES"
> >
> > I guess since the stock installer kernels don't have cxgbe enabled by
> > default. I added "device cxgbe" to my kernels a while ago. Normally
> > the kernel would give some error about the module already being loaded
> > or something and just continue.
> > As of last month or so, however,
> > instead it just crashes:
> >
> > FreeBSD clang version 9.0.1 (git@github.com:llvm/llvm-project.git c1a0a213378a458fbea1a5c77b315c7dce08fd05) (based on LLVM 9.0.1)
> > WARNING: WITNESS option enabled, expect reduced performance.
> > kernel trap 12 with interrupts disabled
> >
> >
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 0; apic id = 00
> > fault virtual address = 0x8
> > fault code = supervisor read data, page not present
> > instruction pointer = 0x20:0xffffffff80622931
> > stack pointer = 0x28:0xffffffff8241c9a0
> > frame pointer = 0x28:0xffffffff8241c9e0
> > code segment = base 0x0, limit 0xfffff, type 0x1b
> >              = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags = resume, IOPL = 0
> > current process = 0 ()
> > trap number = 12
> > panic: page fault
> > cpuid = 0
> > time = 1
> >
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffffff8241c600
> > vpanic() at vpanic+0x18a/frame 0xffffffff8241c660
> > panic() at panic+0x43/frame 0xffffffff8241c6c0
> > trap_fatal() at trap_fatal+0x386/frame 0xffffffff8241c720
> > trap_pfault() at trap_pfault+0x99/frame 0xffffffff8241c7a0
> > trap() at trap+0x4e9/frame 0xffffffff8241c8d0
> > calltrap() at calltrap+0x8/frame 0xffffffff8241c8d0
> > --- trap 0xc, rip = 0xffffffff80622931, rsp = 0xffffffff8241c9a0, rbp = 0xffffffff8241c9e0 ---
> > malloc() at malloc+0x51/frame 0xffffffff8241c9e0
> > sysctl_handle_string() at sysctl_handle_string+0x12d/frame 0xffffffff8241ca20
> > sysctl_root_handler_locked() at sysctl_root_handler_locked+0xa2/frame 0xffffffff8241ca70
> > sysctl_register_oid() at sysctl_register_oid+0x54c/frame 0xffffffff8241cd80
> > sysctl_register_all() at sysctl_register_all+0x88/frame 0xffffffff8241cda0
> > mi_startup() at mi_startup+0xf2/frame 0xffffffff8241cdf0
> > btext() at btext+0x2c
> > KDB: enter: panic
> > [ thread pid 0 tid 0 ]
> > Stopped at kdb_enter+0x37: movq $0,0xa5f4a6(%rip)
> > db>
> >
> > If I take the if_cxgbe_load out, however, it boots fine.
>
> You maybe also have something defined in your /boot/loader.conf that
> causes a tunable to be set?
>
> It looks like there's just an ordering bug in kern_sysctl.c, where we
> call sysctl_register_all() with SI_SUB_KMEM, SI_ORDER_FIRST but we do
> MALLOC_DEFINE() with SI_SUB_KMEM, SI_ORDER_THIRD. If
> sysctl_register_all() is going to malloc(), it needs to run after
> malloc_init(), and it looks like populating a string tunable causes it
> to malloc().

Ah, indeed, I do! That explains why Navdeep couldn't reproduce it.

-Dustin