From: Attila Nagy <nagy.attila@gmail.com>
Date: Thu, 23 Feb 2023 23:38:19 +0100
To: freebsd-net@freebsd.org
Subject: Kernel DHCP unpredictable/fails (PXE boot), userspace DHCP works just fine

Hi,

I have a bunch of netbooted machines. One set in a cluster is older (HP DL80 G9, 2x8C, Intel I350 (igb) NICs); the other set is newer (HP XL225n G10, AMD EPYC 2x16C, BCM57412 (bnxt) NICs).
All of these boot from the network, which basically works like this:
- get an IP address and options via DHCP, using the NIC's PXE stack
- fetch the loader and the kernel, and start the kernel
- do another round of DHCP from the kernel (bootp_subr.c)
- mount the root file system via NFS and let everything work as usual

My problem is that the newer machines take an unpredictable amount of time to boot, while the older ones work reliably and always boot fast.
Getting an IP address via DHCP (bootpc_call() in bootp_subr.c) either succeeds normally (within a few seconds) or takes a very long time.
Measured boot times commonly range from tens of minutes to several hours (1-6 hours).
Sometimes it just gets stuck and never gets past bootpc_call() (obtaining the DHCP lease).

What I've already tried:
- we have a redundant set of DHCP servers offering static leases (so there are two DHCPOFFERs); I tried turning one of them off, but nothing changed
- I tried disabling SMP; the effect is the same
- I tried to see whether it's a network issue. The NIC's PXE stack always gets the lease quickly, and booting FreeBSD from an ISO and running dhclient on the same interface is also fast.
  After the machines have booted, there are no network issues; they work reliably (for more than a year now, across 20+ machines, so not just for a few hours).

This issue wasn't so bad previously (only a few minutes to tens of minutes of delay), but recently it has become pretty unbearable, even leaving some machines unbootable for days...
At first I thought it might be packet loss (or, more exactly, packets from the DHCP server not being delivered to the receiving socket), either in the network or in the NIC/kernel itself, so I placed a few ad hoc printfs into bootp_subr.c and udp_usrreq.c.

After spending some time trying to understand the problem, it feels like a race condition in bootpc_call(), but I don't know the code well enough to verify that effectively.

Here are the modified bootp_subr.c and udp_usrreq.c (modified from today's stable/13 branch, which is also the kernel I run):
https://gist.githubusercontent.com/bra-fsn/128ae9a3bbc0dbdbb2f6f4b3e2c5157a/raw/a8ade8af252f618c84a46da2452d557ebc5078ac/bootp_subr.c
https://gist.github.com/bra-fsn/128ae9a3bbc0dbdbb2f6f4b3e2c5157a/raw/a8ade8af252f618c84a46da2452d557ebc5078ac/udp_usrreq.c

This is the output from the always-working DL80 (igb) machine:
https://gist.github.com/bra-fsn/128ae9a3bbc0dbdbb2f6f4b3e2c5157a/raw/a8ade8af252f618c84a46da2452d557ebc5078ac/DL80%2520igb%2520good.txt

This is the console output from a working boot of the XL225n (bnxt) machine:
https://gist.github.com/bra-fsn/128ae9a3bbc0dbdbb2f6f4b3e2c5157a/raw/a8ade8af252f618c84a46da2452d557ebc5078ac/XL225n%2520bnxt%2520good.txt
As you can see, it's much slower than the DL80 (which isn't that fast either...).

And this one is a longer output, which had not succeeded by that point (2 minutes without completing the DHCP exchange):
https://gist.github.com/bra-fsn/128ae9a3bbc0dbdbb2f6f4b3e2c5157a/raw/a8ade8af252f618c84a46da2452d557ebc5078ac/XL225n%2520bnxt%2520long.txt
For the latter, here's an excerpt from the DHCP log:
https://gist.githubusercontent.com/bra-fsn/128ae9a3bbc0dbdbb2f6f4b3e2c5157a/raw/a8ade8af252f618c84a46da2452d557ebc5078ac/dhcp_log.txt
It seems the DHCP state always gets reset to IF_DHCP_UNRESOLVED even though there are answers from the DHCP server.

Here's another, longer console log, which succeeded after spending 236 seconds in the loop:
https://gist.github.com/bra-fsn/128ae9a3bbc0dbdbb2f6f4b3e2c5157a/raw/a77f52f5e83c699b38a7c2d3acdc52d26ceeba71/XL225n%2520bnxt%2520long%2520good.txt

Any ideas about this?