From: "Mark Delany"
Date: Tue, 20 Apr 2021 05:32:37 +0000
To: freebsd-hackers@freebsd.org
Subject: Re: Various problems with 13.0 amd64 on vultr.com
Message-ID: <0.2.0-final-1618896757.688-0xb6a34e@qmda.emu.st>
In-Reply-To: <20210420021318.GB18217@blisses.org>
References: <0.2.0-final-1618742820.474-0x878fa2@qmda.emu.st> <20210420021318.GB18217@blisses.org>

On 19Apr21, Mason Loring Bliss allegedly wrote:

> I haven't seen a hang yet, but the test system hasn't been up much more
> than ten minutes, so I'll report back later.

I think I've isolated it to natd traffic. The system stays up reliably with
natd disabled, but with natd enabled it hangs within a couple of minutes of
inbound IPv4 traffic.

If I run with just the ipfw rule and the divert kernel module, there is no
problem: the system runs, albeit without any real IPv4 traffic working, for
obvious reasons. I can happily do anything I like over IPv6 and it runs fine.

But as soon as natd is running and there is inbound traffic, such as an ssh
session, the system mostly hangs and, according to the vultr console, it's
spinning at 100% CPU.

I say "mostly hangs" because I have now caused at least one core dump while
ostensibly reproducing the hang. Here is a snippet of crashinfo data.
Happy to provide more to anyone but it's 90K so I didn't think it appropriate
to post it here.

...
Unread portion of the kernel message buffer:
panic: sbappendaddr_locked
cpuid = 0
time = 1618895504
KDB: stack backtrace:
#0 0xffffffff80c57345 at kdb_backtrace+0x65
#1 0xffffffff80c09d21 at vpanic+0x181
#2 0xffffffff80c09b93 at panic+0x43
#3 0xffffffff80ca51e0 at sbappendaddr_locked_internal+0
#4 0xffffffff827eafd0 at divert_packet+0x1a0
#5 0xffffffff827a2c81 at ipfw_check_packet+0x2c1
#6 0xffffffff80d41f87 at pfil_run_hooks+0x97
#7 0xffffffff80db2d71 at ip_output+0xb61
#8 0xffffffff80dc94b4 at tcp_output+0x1b04
#9 0xffffffff80dcf973 at tcp_ctlinput+0x313
#10 0xffffffff80daf105 at icmp_input+0x795
#11 0xffffffff80dafc15 at ip_input+0x125
#12 0xffffffff80d3fa7b at swi_net+0x12b
#13 0xffffffff80bcae5d at ithread_loop+0x24d
#14 0xffffffff80bc7c5e at fork_exit+0x7e
#15 0xffffffff8106282e at fork_trampoline+0xe
Uptime: 11m13s
Dumping 123 out of 982 MB:..13%..26%..39%..52%..65%..78%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55      /usr/src/sys/amd64/include/pcpu_aux.h: No such file or directory.
(kgdb) #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1 doadump (textdump=) at /usr/src/sys/kern/kern_shutdown.c:399
#2 0xffffffff80c09916 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:486
#3 0xffffffff80c09d90 in vpanic (fmt=, ap=) at /usr/src/sys/kern/kern_shutdown.c:919
#4 0xffffffff80c09b93 in panic (fmt=) at /usr/src/sys/kern/kern_shutdown.c:843
#5 0xffffffff80ca51e0 in sbappendaddr_locked (sb=0xfffff800069b4c58, asa=0xfffffe00491bcd00, m0=0xfffff80006b7a000, control=0x0) at /usr/src/sys/kern/uipc_sockbuf.c:1198
#6 0xffffffff827eafd0 in divert_packet (m=0xfffff80006b7a000, incoming=) at /usr/src/sys/netinet/ip_divert.c:285
#7 0xffffffff827a2c81 in ipfw_divert (m0=0xfffffe00491bcf58, args=0xfffffe00491bcd70, tee=) at /usr/src/sys/netpfil/ipfw/ip_fw_pfil.c:525
#8 ipfw_check_packet (m0=0xfffffe00491bcf58, ifp=0xfffff8000358a000, flags=131072, ruleset=, inp=0xfffff80006f92000) at /usr/src/sys/netpfil/ipfw/ip_fw_pfil.c:283
#9 0xffffffff80d41f87 in pfil_run_hooks (head=, p=..., ifp=0xfffff8000358a000, flags=flags@entry=131072, inp=inp@entry=0xfffff80006f92000) at /usr/src/sys/net/pfil.c:187
#10 0xffffffff80db2d71 in ip_output_pfil (mp=0xfffffe00491bcf58, ifp=0xfffff8000358a000, flags=0, inp=0xfffff80006f92000, dst=0xfffff80006f921a8, fibnum=, error=) at /usr/src/sys/netinet/ip_output.c:130
#11 ip_output (m=0x0, m@entry=0xfffff80006b7a000, opt=, ro=, flags=0, imo=imo@entry=0x0, inp=) at /usr/src/sys/netinet/ip_output.c:705
#12 0xffffffff80dc94b4 in tcp_output (tp=0xfffffe008b5e1c48) at /usr/src/sys/netinet/tcp_output.c:1492
#13 0xffffffff80dcf973 in tcp_ctlinput (cmd=, cmd@entry=, sa=, sa@entry=, vip=0xfffff80006b511ac, vip@entry=) at /usr/src/sys/netinet/tcp_subr.c:2544
#14 0xffffffff80daf105 in icmp_input (mp=0xfffffe00491bd300, mp@entry=, offp=0xfffffe00491bd2fc, offp@entry=, proto=, proto@entry=) at /usr/src/sys/netinet/ip_icmp.c:571
#15 0xffffffff80dafc15 in ip_input (m=0x0) at /usr/src/sys/netinet/ip_input.c:829
#16 0xffffffff80d3fa7b in netisr_process_workstream_proto (nwsp=, proto=1) at /usr/src/sys/net/netisr.c:919
#17 swi_net (arg=) at /usr/src/sys/net/netisr.c:966
#18 0xffffffff80bcae5d in intr_event_execute_handlers (p=, ie=0xfffff8000332bc00) at /usr/src/sys/kern/kern_intr.c:1168
#19 ithread_execute_handlers (p=, ie=0xfffff8000332bc00) at /usr/src/sys/kern/kern_intr.c:1181
#20 ithread_loop (arg=arg@entry=0xfffff8000332fe00) at /usr/src/sys/kern/kern_intr.c:1269
#21 0xffffffff80bc7c5e in fork_exit (callout=0xffffffff80bcac10 , arg=0xfffff8000332fe00, frame=0xfffffe00491bd480) at /usr/src/sys/kern/kern_fork.c:1069
#22
(kgdb)
...

Happy to provide further info and run anything that folk think might help
provide more useful diagnostic info.

Oh, the interface, if it's relevant, is:

vtnet0: on virtio_pci0


Mark.
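
For anyone who wants to compare this panic against a trace of their own, here is a rough Python sketch that pulls the symbol names out of the "KDB: stack backtrace" frames quoted above. It assumes the "#N 0xADDR at symbol+offset" frame layout seen in this message. The names parse_kdb_frames and call_path are made up for illustration and are not part of any FreeBSD tool.

```python
import re

# One frame as printed after "KDB: stack backtrace:", for example
#   #4 0xffffffff827eafd0 at divert_packet+0x1a0
# The offset is assumed to be hexadecimal; a zero offset appears as a bare "0".
FRAME_RE = re.compile(r"#(\d+)\s+(0x[0-9a-f]+)\s+at\s+(\w+)\+((?:0x)?[0-9a-f]+)")


def parse_kdb_frames(text):
    """Return a list of (index, address, symbol, offset) tuples."""
    frames = []
    for match in FRAME_RE.finditer(text):
        idx, addr, sym, off = match.groups()
        frames.append((int(idx), int(addr, 16), sym, int(off, 16)))
    return frames


def call_path(frames):
    """Join symbol names from the outermost caller to the innermost frame."""
    ordered = sorted(frames, reverse=True)  # highest frame number first
    return " -> ".join(sym for _, _, sym, _ in ordered)


# A few frames copied from the panic output in this message.
SAMPLE = """
#2 0xffffffff80c09b93 at panic+0x43
#3 0xffffffff80ca51e0 at sbappendaddr_locked_internal+0
#4 0xffffffff827eafd0 at divert_packet+0x1a0
#5 0xffffffff827a2c81 at ipfw_check_packet+0x2c1
"""

if __name__ == "__main__":
    print(call_path(parse_kdb_frames(SAMPLE)))
```

Feeding it the full panic text rather than SAMPLE gives the whole path from fork_trampoline down to kdb_backtrace, which is easier to diff between two crashes than the raw addresses.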