From owner-freebsd-stable@freebsd.org Fri Nov 20 14:53:51 2020 Return-Path: Delivered-To: freebsd-stable@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id 439A02ED001 for ; Fri, 20 Nov 2020 14:53:51 +0000 (UTC) (envelope-from kp@FreeBSD.org) Received: from smtp.freebsd.org (smtp.freebsd.org [96.47.72.83]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "smtp.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4Cd01C1Vfxz3kjN; Fri, 20 Nov 2020 14:53:51 +0000 (UTC) (envelope-from kp@FreeBSD.org) Received: from venus.codepro.be (venus.codepro.be [5.9.86.228]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mx1.codepro.be", Issuer "Let's Encrypt Authority X3" (verified OK)) (Authenticated sender: kp) by smtp.freebsd.org (Postfix) with ESMTPSA id BD1D1FB8A; Fri, 20 Nov 2020 14:53:50 +0000 (UTC) (envelope-from kp@FreeBSD.org) Received: by venus.codepro.be (Postfix, authenticated sender kp) id 65ECB4B93B; Fri, 20 Nov 2020 15:53:48 +0100 (CET) From: "Kristof Provost" To: "Peter Blok" Cc: "FreeBSD Stable" Subject: Re: Commit 367705+367706 causes a pabic Date: Fri, 20 Nov 2020 15:53:47 +0100 X-Mailer: MailMate (1.13.2r5673) Message-ID: <5489BF90-E1F6-44C2-ABD5-1C52BABA206D@FreeBSD.org> In-Reply-To: <5DA2A6D4-1FDA-48B3-A319-F26CB802E01B@bsd4all.org> References: <1753B4A3-2FFC-47A5-9D0C-DC0B71BA22E8@FreeBSD.org> <665757BF-DA06-4503-9ACD-8A4630E23FF4@bsd4all.org> <5DA2A6D4-1FDA-48B3-A319-F26CB802E01B@bsd4all.org> MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8"; format=flowed Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 20 Nov 2020 14:53:51 -0000 I still can’t reproduce that panic. Does it happen immediately after you start a vnet jail? Does it also happen with a GENERIC kernel? Regards, Kristof On 20 Nov 2020, at 14:53, Peter Blok wrote: > The panic with ipsec code in the backtrace was already very strange. I > was using IPsec, but only on one interface totally separate from the > members of the bridge as well as the bridge itself. The jails were not > doing any ipsec as well. Note that panic was a while ago and it was > after the 1st bridge epochification was done on stable-12 which was > later backed out > > Today the system is no longer using ipsec, but it is still compiled > in. I can remove it if need be for a test > > > src.conf > WITHOUT_KERBEROS=yes > WITHOUT_GSSAPI=yes > WITHOUT_SENDMAIL=true > WITHOUT_MAILWRAPPER=true > WITHOUT_DMAGENT=true > WITHOUT_GAMES=true > WITHOUT_IPFILTER=true > WITHOUT_UNBOUND=true > WITHOUT_PROFILE=true > WITHOUT_ATM=true > WITHOUT_BSNMP=true > #WITHOUT_CROSS_COMPILER=true > WITHOUT_DEBUG_FILES=true > WITHOUT_DICT=true > WITHOUT_FLOPPY=true > WITHOUT_HTML=true > WITHOUT_HYPERV=true > WITHOUT_NDIS=true > WITHOUT_NIS=true > WITHOUT_PPP=true > WITHOUT_TALK=true > WITHOUT_TESTS=true > WITHOUT_WIRELESS=true > #WITHOUT_LIB32=true > WITHOUT_LPR=true > > make.conf > KERNCONF=BHYVE > MODULES_OVERRIDE=opensolaris dtrace zfs vmm nmdm if_bridge bridgestp > if_vxlan pflog libmchain libiconv smbfs linux linux64 linux_common > linuxkpi linprocfs linsysfs ext2fs > DEFAULT_VERSIONS+=perl5=5.30 mysql=5.7 python=3.8 python3=3.8 > OPTIONS_UNSET=DOCS NLS MANPAGES > > BHYVE > cpu HAMMER > ident BHYVE > > makeoptions DEBUG=-g # Build kernel with gdb(1) debug symbols > makeoptions WITH_CTF=1 # Run ctfconvert(1) for DTrace support > > options CAMDEBUG > > options SCHED_ULE # ULE scheduler > options PREEMPTION # Enable kernel thread preemption > options INET # InterNETworking > options INET6 # IPv6 communications protocols > options IPSEC > options TCP_OFFLOAD # TCP offload > options TCP_RFC7413 # TCP FASTOPEN > options SCTP # Stream Control Transmission Protocol > options FFS # Berkeley Fast Filesystem > options SOFTUPDATES # Enable FFS soft updates support > options UFS_ACL # Support for access control lists > options UFS_DIRHASH # Improve performance on big directories > options UFS_GJOURNAL # Enable gjournal-based UFS journaling > options QUOTA # Enable disk quotas for UFS > options SUIDDIR > options NFSCL # Network Filesystem Client > options NFSD # Network Filesystem Server > options NFSLOCKD # Network Lock Manager > options MSDOSFS # MSDOS Filesystem > options CD9660 # ISO 9660 Filesystem > options FUSEFS > options NULLFS # NULL filesystem > options UNIONFS > options FDESCFS # File descriptor filesystem > options PROCFS # Process filesystem (requires PSEUDOFS) > options PSEUDOFS # Pseudo-filesystem framework > options GEOM_PART_GPT # GUID Partition Tables. > options GEOM_RAID # Soft RAID functionality. > options GEOM_LABEL # Provides labelization > options GEOM_ELI # Disk encryption. > options COMPAT_FREEBSD32 # Compatible with i386 binaries > options COMPAT_FREEBSD4 # Compatible with FreeBSD4 > options COMPAT_FREEBSD5 # Compatible with FreeBSD5 > options COMPAT_FREEBSD6 # Compatible with FreeBSD6 > options COMPAT_FREEBSD7 # Compatible with FreeBSD7 > options COMPAT_FREEBSD9 # Compatible with FreeBSD9 > options COMPAT_FREEBSD10 # Compatible with FreeBSD10 > options COMPAT_FREEBSD11 # Compatible with FreeBSD11 > options SCSI_DELAY=5000 # Delay (in ms) before probing SCSI > options KTRACE # ktrace(1) support > options STACK # stack(9) support > options SYSVSHM # SYSV-style shared memory > options SYSVMSG # SYSV-style message queues > options SYSVSEM # SYSV-style semaphores > options _KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time > extensions > options PRINTF_BUFR_SIZE=128 # Prevent printf output being > interspersed. > options KBD_INSTALL_CDEV # install a CDEV entry in /dev > options HWPMC_HOOKS # Necessary kernel hooks for hwpmc(4) > options AUDIT # Security event auditing > options CAPABILITY_MODE # Capsicum capability mode > options CAPABILITIES # Capsicum capabilities > options MAC # TrustedBSD MAC Framework > options MAC_PORTACL > options MAC_NTPD > options KDTRACE_FRAME # Ensure frames are compiled in > options KDTRACE_HOOKS # Kernel DTrace hooks > options DDB_CTF # Kernel ELF linker loads CTF data > options INCLUDE_CONFIG_FILE # Include this file in kernel > > # Debugging support. Always need this: > options KDB # Enable kernel debugger support. > options KDB_TRACE # Print a stack trace for a panic. > options KDB_UNATTENDED > > # Make an SMP-capable kernel by default > options SMP # Symmetric MultiProcessor Kernel > options EARLY_AP_STARTUP > > # CPU frequency control > device cpufreq > device cpuctl > device coretemp > > # Bus support. > device acpi > options ACPI_DMAR > device pci > options PCI_IOV # PCI SR-IOV support > > device iicbus > device iicbb > > device iic > device ic > device iicsmb > > device ichsmb > device smbus > device smb > > #device jedec_dimm > > # ATA controllers > device ahci # AHCI-compatible SATA controllers > device mvs # Marvell 88SX50XX/88SX60XX/88SX70XX/SoC SATA > > # SCSI Controllers > device mps # LSI-Logic MPT-Fusion 2 > > # ATA/SCSI peripherals > device scbus # SCSI bus (required for ATA/SCSI) > device da # Direct Access (disks) > device cd # CD > device pass # Passthrough device (direct ATA/SCSI access) > device ses # Enclosure Services (SES and SAF-TE) > device sg > > device cfiscsi > device ctl # CAM Target Layer > device iscsi > > # atkbdc0 controls both the keyboard and the PS/2 mouse > device atkbdc # AT keyboard controller > device atkbd # AT keyboard > device psm # PS/2 mouse > > device kbdmux # keyboard multiplexer > > # vt is the new video console driver > device vt > device vt_vga > device vt_efifb > > # Serial (COM) ports > device uart # Generic UART driver > > # PCI/PCI-X/PCIe Ethernet NICs that use iflib infrastructure > device iflib > device em # Intel PRO/1000 Gigabit Ethernet Family > device ix # Intel PRO/10GbE PCIE PF Ethernet > > # Network stack virtualization. > options VIMAGE > > # Pseudo devices. > device crypto > device cryptodev > device loop # Network loopback > device random # Entropy device > device padlock_rng # VIA Padlock RNG > device rdrand_rng # Intel Bull Mountain RNG > device ipmi > device smbios > device vpd > device aesni # AES-NI OpenCrypto module > device ether # Ethernet support > device lagg > device vlan # 802.1Q VLAN support > device tuntap # Packet tunnel. > device md # Memory "disks" > device gif # IPv6 and IPv4 tunneling > device firmware # firmware assist module > > device pf > #device pflog > #device pfsync > > # The `bpf' device enables the Berkeley Packet Filter. > # Be aware of the administrative consequences of enabling this! > # Note that 'bpf' is required for DHCP. > device bpf # Berkeley packet filter > > # The `epair' device implements a virtual back-to-back connected > Ethernet > # like interface pair. > device epair > > # USB support > options USB_DEBUG # enable debug msgs > device uhci # UHCI PCI->USB interface > device ohci # OHCI PCI->USB interface > device ehci # EHCI PCI->USB interface (USB 2.0) > device xhci # XHCI PCI->USB interface (USB 3.0) > device usb # USB Bus (required) > device uhid > device ukbd # Keyboard > device umass # Disks/Mass storage - Requires scbus and da > device ums > > device filemon > > device if_bridge > >> On 20 Nov 2020, at 12:53, Kristof Provost wrote: >> >> Can you share your kernel config file (and src.conf / make.conf if >> they exist)? >> >> This second panic is in the IPSec code. My current thinking is that >> your kernel config is triggering a bug that’s manifesting in >> multiple places, but not actually caused by those places. >> >> I’d like to be able to reproduce it so we can debug it. >> >> Best regards, >> Kristof >> >> On 20 Nov 2020, at 12:02, Peter Blok wrote: >>> Hi Kristof, >>> >>> This is 12-stable. With the previous bridge epochification that was >>> backed out my config had a panic too. >>> >>> I don’t have any local modifications. I did a clean rebuild after >>> removing /usr/obj/usr >>> >>> My kernel is custom - I only have zfs.ko, opensolaris.ko, vmm.ko and >>> nmdm.ko as modules. Everything else is statically linked. I have >>> removed all drivers not needed for the hardware at hand. >>> >>> My bridge is between two vlans from the same trunk and the jail >>> epair devices as well as the bhyve tap devices. >>> >>> The panic happens when the jails are starting. >>> >>> I can try to narrow it down over the weekend and make the crash dump >>> available for analysis. >>> >>> Previously I had the following crash with 363492 >>> >>> kernel trap 12 with interrupts disabled >>> >>> >>> Fatal trap 12: page fault while in kernel mode >>> cpuid = 2; apic id = 02 >>> fault virtual address = 0xffffffff00000410 >>> fault code = supervisor read data, page not present >>> instruction pointer = 0x20:0xffffffff80692326 >>> stack pointer = 0x28:0xfffffe00c06097b0 >>> frame pointer = 0x28:0xfffffe00c06097f0 >>> code segment = base 0x0, limit 0xfffff, type 0x1b >>> = DPL 0, pres 1, long 1, def32 0, gran 1 >>> processor eflags = resume, IOPL = 0 >>> current process = 2030 (ifconfig) >>> trap number = 12 >>> panic: page fault >>> cpuid = 2 >>> time = 1595683412 >>> KDB: stack backtrace: >>> #0 0xffffffff80698165 at kdb_backtrace+0x65 >>> #1 0xffffffff8064d67b at vpanic+0x17b >>> #2 0xffffffff8064d4f3 at panic+0x43 >>> #3 0xffffffff809cc311 at trap_fatal+0x391 >>> #4 0xffffffff809cc36f at trap_pfault+0x4f >>> #5 0xffffffff809cb9b6 at trap+0x286 >>> #6 0xffffffff809a5b28 at calltrap+0x8 >>> #7 0xffffffff803677fd at ck_epoch_synchronize_wait+0x8d >>> #8 0xffffffff8069213a at epoch_wait_preempt+0xaa >>> #9 0xffffffff807615b7 at ipsec_ioctl+0x3a7 >>> #10 0xffffffff8075274f at ifioctl+0x47f >>> #11 0xffffffff806b5ea7 at kern_ioctl+0x2b7 >>> #12 0xffffffff806b5b4a at sys_ioctl+0xfa >>> #13 0xffffffff809ccec7 at amd64_syscall+0x387 >>> #14 0xffffffff809a6450 at fast_syscall_common+0x101 >>> >>> >>> >>> >>>> On 20 Nov 2020, at 11:30, Kristof Provost wrote: >>>> >>>> On 20 Nov 2020, at 11:18, peter.blok@bsd4all.org >>>> wrote: >>>>> I’m afraid the last Epoch fix for bridge is not solving the >>>>> problem ( or perhaps creates a new ). >>>>> >>>> We’re talking about the stable/12 branch, right? >>>> >>>>> This seems to happen when the jail epair is added to the bridge. >>>>> >>>> There must be something more to it than that. I’ve run the bridge >>>> tests on stable/12 without issue, and this is a problem we didn’t >>>> see when the bridge epochification initially went into stable/12. >>>> >>>> Do you have a custom kernel config? Other patches? What exact >>>> commands do you run to trigger the panic? >>>> >>>>> kernel trap 12 with interrupts disabled >>>>> >>>>> >>>>> Fatal trap 12: page fault while in kernel mode >>>>> cpuid = 6; apic id = 06 >>>>> fault virtual address = 0xc10 >>>>> fault code = supervisor read data, page not present >>>>> instruction pointer = 0x20:0xffffffff80695e76 >>>>> stack pointer = 0x28:0xfffffe00bf14e6e0 >>>>> frame pointer = 0x28:0xfffffe00bf14e720 >>>>> code segment = base 0x0, limit 0xfffff, type 0x1b >>>>> = DPL 0, pres 1, long 1, def32 0, gran 1 >>>>> processor eflags = resume, IOPL = 0 >>>>> current process = 1686 (jail) >>>>> trap number = 12 >>>>> panic: page fault >>>>> cpuid = 6 >>>>> time = 1605811310 >>>>> KDB: stack backtrace: >>>>> #0 0xffffffff8069bb85 at kdb_backtrace+0x65 >>>>> #1 0xffffffff80650a4b at vpanic+0x17b >>>>> #2 0xffffffff806508c3 at panic+0x43 >>>>> #3 0xffffffff809d0351 at trap_fatal+0x391 >>>>> #4 0xffffffff809d03af at trap_pfault+0x4f >>>>> #5 0xffffffff809cf9f6 at trap+0x286 >>>>> #6 0xffffffff809a98c8 at calltrap+0x8 >>>>> #7 0xffffffff80368a8d at ck_epoch_synchronize_wait+0x8d >>>>> #8 0xffffffff80695c8a at epoch_wait_preempt+0xaa >>>>> #9 0xffffffff80757d40 at vnet_if_init+0x120 >>>>> #10 0xffffffff8078c994 at vnet_alloc+0x114 >>>>> #11 0xffffffff8061e3f7 at kern_jail_set+0x1bb7 >>>>> #12 0xffffffff80620190 at sys_jail_set+0x40 >>>>> #13 0xffffffff809d0f07 at amd64_syscall+0x387 >>>>> #14 0xffffffff809aa1ee at fast_syscall_common+0xf8 >>>> >>>> This panic is rather odd. This isn’t even the bridge code. This >>>> is during initial creation of the vnet. I don’t really see how >>>> this could even trigger panics. >>>> That panic looks as if something corrupted the net_epoch_preempt, >>>> by overwriting the epoch->e_epoch. The bridge patches only access >>>> this variable through the well-established functions and macros. I >>>> see no obvious way that they could corrupt it. >>>> >>>> Best regards, >>>> Kristof >> >> >> _______________________________________________ >> freebsd-stable@freebsd.org mailing list >> https://lists.freebsd.org/mailman/listinfo/freebsd-stable >> To unsubscribe, send any mail to >> "freebsd-stable-unsubscribe@freebsd.org"