Date: Thu, 16 Mar 2023 21:14:25 +0000 (UTC)
From: Yves Guérin <yvesguerin@yahoo.ca>
To: "freebsd-stable@freebsd.org" <freebsd-stable@freebsd.org>, Attila Nagy <nagy.attila@gmail.com>
Subject: Re: Kernel DHCP unpredictable/fails (PXE boot), userspace DHCP works just fine
Message-ID: <132303943.191443.1679001265318@mail.yahoo.com>
In-Reply-To: <CAM2hQG-oDRsoccg3S1LykyUF=joWbdJz=GSPOnUroDRxjZ2_iQ@mail.gmail.com>
References: <CAM2hQG-p=bfSh_nxuah9zcTBbz7HQ9pYyvOR2f6rC=CUGePKsg@mail.gmail.com> <CAM2hQG-oDRsoccg3S1LykyUF=joWbdJz=GSPOnUroDRxjZ2_iQ@mail.gmail.com>
Dear Attila,

I may be adding some noise to your thread, sorry in advance. I am just a sysadmin, but I faced the same problem with one of my old HP G7 servers: the network card was broken (malfunctioning). Sometimes it worked and sometimes it did not when I used PXE and dhcpd (it took too much time to answer the DHCP exchange, so the motherboard decided to reboot, and so on, in an infinite loop). The card works perfectly when it is set up by an OS.

Maybe these are stupid questions: did you check the network cable? (I have faced some defective cables, and they ruined my day...) In the same way, what about the hub/router attached to this server (configuration, etc.)? Did you try swapping a known-good one for a bad one (same network cable, hub/router, etc.)?

I have spent too many nights in the lab...

Regards,
Yves Guerin

On Thursday, March 16, 2023 at 16:44:49 UTC−4, Attila Nagy <nagy.attila@gmail.com> wrote:

Hi,

As this is super annoying, I'm willing to pay a $500 bounty for solving this issue (whoever is first, however I don't anticipate a big competition :) Having an invoice would be best, but I'm willing to accept individuals as well).
I can't give remote access, but can run debug builds with serial console. stable/13 branch.

I have a bunch of netbooted machines, one set in a cluster is older (HP DL80 G9, 2x8C, Intel I350 -igb- NICs), the other set is newer (HP XL225n G10, AMD EPYC 2x16C, BCM57412 -bnxt- NICs).
All of these boot from the network, which is basically:
- get IP and options with DHCP with the help of the NIC's PXE stack
- get the loader and kernel, start it
- do another round of DHCP from the kernel (bootp_subr.c)
- mount the root via NFS and let everything work as usual

The problem is that the newer machines take an indefinite time to boot. The older ones (with igb NIC) work reliably, they always boot fast.
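For reference, the third step of the sequence above (the kernel's own DHCP round in bootp_subr.c) is only compiled in when the kernel is built with the BOOTP options. The fragment below is a sketch of such a kernel config, not the one used on these machines (the actual config is not given in this thread). The option names are the ones documented in sys/conf/NOTES; the interface name bnxt0 is an assumption for illustration.

```
# Hypothetical kernel config fragment for a diskless (NFS root) kernel.
options 	NFSCL			# NFS client, required by BOOTP
options 	NFS_ROOT		# allow the root file system to be on NFS
options 	BOOTP			# obtain IP address/hostname in the kernel
options 	BOOTP_NFSROOT		# NFS-mount root using the BOOTP/DHCP info
options 	BOOTP_NFSV3		# use NFSv3 for the root mount
#options 	BOOTP_WIRED_TO=bnxt0	# assumption: restrict BOOTP to one interface
```

Restricting the kernel to a single interface with BOOTP_WIRED_TO is optional; it only narrows which NIC bootpc_call is run on and is not claimed to affect the problem described here.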
The process of getting an IP address via DHCP (bootpc_call from bootp_subr.c) either succeeds normally (in a few seconds), or takes a lot of time.
Common (measured) times to boot range from tens of minutes to a few hours (1-6).
Sometimes it just gets stuck and can't get past bootpc_call (getting the DHCP lease).

What I've already tried:
- we have a redundant set of DHCP servers which offer static leases (so there are two DHCPOFFERs), so I tried to turn off one of them, nothing has changed
- tried to disable SMP, the effect is the same
- tried to see whether it's a network issue. The NIC's PXE stack always gets the lease quickly, and booting FreeBSD from an ISO and issuing dhclient on the same interface is also fast. After the machines have booted, there are no network issues, they work reliably (for more than a year on 20+ machines, so not just a few hours)

This issue wasn't so bad previously (only a few minutes to tens of minutes of delay), but recently it got pretty unbearable, even making some machines unbootable for days...

First I thought it might be packet loss (or, more exactly, a problem with packet delivery from the DHCP server to the receiving socket), either in the network or in the NIC/kernel itself, so I placed a few random printfs into bootp_subr.c and udp_usrreq.c.

After spending some time trying to understand the problem, it feels like a race condition in bootpc_call, but I don't know the code well enough to effectively verify that.
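When adding debug printfs to follow replies from the wire to the receiving socket, it can help to log which DHCP message each packet carries (OFFER, ACK, NAK) rather than just its length. The helper below is a minimal userspace sketch of that idea, not code from bootp_subr.c or from the modified files linked in this message. The names dhcp_msg_type and dhcp_msg_type_name are hypothetical; the packet layout (236-byte fixed BOOTP header, 4-byte magic cookie, option 53 for the message type) follows RFC 2131 and RFC 2132.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BOOTP_FIXED_LEN	236	/* fixed BOOTP header, before the magic cookie */
#define DHCP_OPT_PAD	0
#define DHCP_OPT_MSGTYPE 53
#define DHCP_OPT_END	255

static const uint8_t dhcp_magic[4] = { 0x63, 0x82, 0x53, 0x63 };

/*
 * Hypothetical helper: return the DHCP message type (1..8) found in
 * option 53, 0 if the option is absent, or -1 if the packet is too
 * short, lacks the magic cookie, or has a truncated option.
 */
static int
dhcp_msg_type(const uint8_t *pkt, size_t len)
{
	size_t off = BOOTP_FIXED_LEN + sizeof(dhcp_magic);

	if (len < off ||
	    memcmp(pkt + BOOTP_FIXED_LEN, dhcp_magic, sizeof(dhcp_magic)) != 0)
		return (-1);

	while (off < len) {
		uint8_t tag = pkt[off++];
		uint8_t optlen;

		if (tag == DHCP_OPT_PAD)
			continue;		/* single-byte padding */
		if (tag == DHCP_OPT_END)
			break;			/* end of options */
		if (off >= len)
			return (-1);		/* length byte missing */
		optlen = pkt[off++];
		if (optlen > len - off)
			return (-1);		/* option runs past the packet */
		if (tag == DHCP_OPT_MSGTYPE)
			return (optlen == 1 ? pkt[off] : -1);
		off += optlen;
	}
	return (0);
}

/* Map a message type to a short name suitable for a log line. */
static const char *
dhcp_msg_type_name(int type)
{
	static const char *names[] = {
		"NONE", "DISCOVER", "OFFER", "REQUEST", "DECLINE",
		"ACK", "NAK", "RELEASE", "INFORM"
	};

	if (type < 0 || type > 8)
		return ("INVALID");
	return (names[type]);
}
```

A log line such as printf("rx %s\n", dhcp_msg_type_name(dhcp_msg_type(buf, n))) at each instrumented point would then show whether an OFFER or ACK reached that layer. The sketch does not handle option overload (options continued in the sname/file fields).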
Here are the modified bootp_subr.c and udp_usrreq.c:
https://gist.githubusercontent.com/bra-fsn/128ae9a3bbc0dbdbb2f6f4b3e2c5157a/raw/a8ade8af252f618c84a46da2452d557ebc5078ac/bootp_subr.c
https://gist.github.com/bra-fsn/128ae9a3bbc0dbdbb2f6f4b3e2c5157a/raw/a8ade8af252f618c84a46da2452d557ebc5078ac/udp_usrreq.c
(modified from the stable/13 branch from a few weeks earlier)

This is the output with the always working DL80 (igb) machine:
https://gist.github.com/bra-fsn/128ae9a3bbc0dbdbb2f6f4b3e2c5157a/raw/a8ade8af252f618c84a46da2452d557ebc5078ac/DL80%2520igb%2520good.txt

This is the console output from a working boot for the XL225n (bnxt) machine:
https://gist.github.com/bra-fsn/128ae9a3bbc0dbdbb2f6f4b3e2c5157a/raw/a8ade8af252f618c84a46da2452d557ebc5078ac/XL225n%2520bnxt%2520good.txt
As you can see, it's much slower than the DL80 (which also isn't that fast...)

And this one is a longer output, without success to that point (2 minutes without completing the DHCP flow):
https://gist.github.com/bra-fsn/128ae9a3bbc0dbdbb2f6f4b3e2c5157a/raw/a8ade8af252f618c84a46da2452d557ebc5078ac/XL225n%2520bnxt%2520long.txt

For the latter, here's an excerpt from the DHCP log:
https://gist.githubusercontent.com/bra-fsn/128ae9a3bbc0dbdbb2f6f4b3e2c5157a/raw/a8ade8af252f618c84a46da2452d557ebc5078ac/dhcp_log.txt

It seems the DHCP state always gets reset to IF_DHCP_UNRESOLVED even if there are answers from the DHCP server.

Here's another, longer console log, which succeeded after spending 236 seconds in the loop:
https://gist.github.com/bra-fsn/128ae9a3bbc0dbdbb2f6f4b3e2c5157a/raw/a77f52f5e83c699b38a7c2d3acdc52d26ceeba71/XL225n%2520bnxt%2520long%2520good.txt

Any ideas about this?