From: Zhenlei Huang
Subject: Re: ifp gone in ip6_output() -> panic
Date: Sun, 23 Jun 2024 09:31:35 +0800
To: "Bjoern A. Zeeb"
Cc: FreeBSD Net
List-Id: Networking and TCP/IP with FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-net

> On May 24, 2024, at 6:12 AM, Bjoern A. Zeeb <bzeeb-lists@lists.zabbadoz.net> wrote:
>
> On Wed, 22 May 2024, Zhenlei Huang wrote:
>
>>> On May 22, 2024, at 12:17 PM, Bjoern A. Zeeb <bzeeb-lists@lists.zabbadoz.net> wrote:
>>>
>>> Hi,
>>>
>>> sorry, I cannot dump; this is a diskless system and netdump does not do IPv6;
>>> needless to say that would be funny in this case anyway; unfortunately
>>> I have also already re-compiled the kernel so I can only look things up approx.
>>>
>>> FreeBSD main from May 13 (f3eeeb959c9b00c89a2e1ff009c78162eb398656).
>>>
>>> I assume we lost the ifp from a destroy of a cloned interface in ip6_output()
>>> between lines 806 and 811?


>>> Kernel page fault with the following non-sleepable locks held:
>>> exclusive rw rawinp (rawinp) r = 0 (0xfffff80002a6e1a0) locked @ /usr/src/sys/netinet6/raw_ip6.c:393
>>> stack backtrace:
>>> #0 0xffffffff80bb679c at witness_debugger+0x6c
>>> #1 0xffffffff80bb7979 at witness_warn+0x3e9
>>> #2 0xffffffff81061d10 at trap_pfault+0x80
>>> #3 0xffffffff81033878 at calltrap+0x8
>>> #4 0xffffffff80d99228 at rip6_send+0x5a8
>>> #5 0xffffffff80bf570e at sosend_generic+0x5ee
>>> #6 0xffffffff80bf5c49 at sousrsend+0x79
>>> #7 0xffffffff80bfbd5c at kern_sendit+0x1bc
>>> #8 0xffffffff80bfc073 at sendit+0x1b3
>>> #9 0xffffffff80bfc1ab at sys_sendmsg+0x5b
>>> #10 0xffffffff81062638 at amd64_syscall+0x158
>>> #11 0xffffffff8103418b at fast_syscall_common+0xf8
>>> Created wlan(4) interfaces: wlan

>> Note the creation of wlan, and a following ICMP6 (ping6) packet.
>
> Yes I think it was running netif restart wlan0 in loops.
>
> [...]

>> I'm not quite sure, but it seems the `ifp` is not fully constructed. See https://cgit.freebsd.org/src/tree/sys/net/if.c#n950
>>
>> If I read the code correctly, the clone created interface is made visible via `if_link_ifnet(ifp);` , and at that time the
>> `ifp->if_afdata[AF_INET6]` is NULL and is not initialized yet by `if_attachdomain1()` which will call `in6_domifattach()`
>> to allocate the required data.
>>
>> So I guess there is a race condition. I bet this can be repeated easily.
>>
>> I have not tested this yet, and not sure if it is the right fix, but you can give it a try.

> I'll do; I haven't seen the error happening since on other test
> machines, so not sure about repeatability.
>
> I am also not entirely sure this is not a ping6 ff02::1%wlan0 while
> the ifp was destroyed by netif restart at the same time the packet was
> still on the way out?

I think that is possible. There is another report, https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=279653, but
with a different fault path.


> If it was during create, the wlan(4) interface would not be associated
> and UP at that point of if_attach_internal() and
> `ifconfig inet6 -ifdisabled` would not have been run to be able to send
> that packet in the first place?

Hmm, I'm still working through the lifecycle of ifnet. I think you're right. By default inet6 is disabled on interfaces.
It seems to be impossible to send packets via disabled interfaces.


> Otherwise the packet would have had to "survive" the clone destroy and
> clone create cycle somewhere ...?


>> diff --git a/sys/net/if.c b/sys/net/if.c
>> index c3c27fbf678f..16ee5667e7bb 100644
>> --- a/sys/net/if.c
>> +++ b/sys/net/if.c
>> @@ -947,11 +947,11 @@ if_attach_internal(struct ifnet *ifp, bool vmove)
>>  	}
>>  #endif
>>
>> -	if_link_ifnet(ifp);
>> -
>>  	if (domain_init_status >= 2)
>>  		if_attachdomain1(ifp);
>>
>> +	if_link_ifnet(ifp);
>> +
>>  	EVENTHANDLER_INVOKE(ifnet_arrival_event, ifp);
>>  	if (IS_DEFAULT_VNET(curvnet))
>>  		devctl_notify("IFNET", ifp->if_xname, "ATTACH", NULL);

> --
> Bjoern A. Zeeb                                                     r15:7


