Date: Thu, 16 Mar 2023 22:26:40 +0100
From: Attila Nagy <nagy.attila@gmail.com>
To: Yves Guérin <yvesguerin@yahoo.ca>
Cc: "freebsd-stable@freebsd.org" <freebsd-stable@freebsd.org>
Subject: Re: Kernel DHCP unpredictable/fails (PXE boot), userspace DHCP works just fine
Message-ID: <CAM2hQG9XucjqM763CcCivtZufc4BYQi5BDdjmzAAMccBVEy2hA@mail.gmail.com>
In-Reply-To: <132303943.191443.1679001265318@mail.yahoo.com>
References: <CAM2hQG-p=bfSh_nxuah9zcTBbz7HQ9pYyvOR2f6rC=CUGePKsg@mail.gmail.com>
 <CAM2hQG-oDRsoccg3S1LykyUF=joWbdJz=GSPOnUroDRxjZ2_iQ@mail.gmail.com>
 <132303943.191443.1679001265318@mail.yahoo.com>
Hey,

Sure. We're talking about 30 machines, and they all behave the same way (either bad or good). I'm pretty sure it's not a cabling issue. :)

Yves Guérin <yvesguerin@yahoo.ca> wrote (on Thu, 16 Mar 2023, 22:14):
> Dear Attila,
>
> Maybe I'm adding some noise to your thread, sorry in advance; I'm just a
> sysadmin. I faced the same problem with one of my old HP G7 servers: the
> network card was faulty. Sometimes it worked and sometimes it didn't when
> I used PXE and dhcpd (it took too long to answer the DHCP request, so the
> motherboard decided to reboot, and so on, in an infinite loop). The card
> works perfectly when it's set up by an OS.
>
> Maybe it's a stupid question or two: did you check the network cable? (I've
> run into some defective cables and they ruined my day...) Likewise, what
> about the hub/router attached to this server (configuration, etc.)? Did you
> try swapping a bad one for a known-good one (same network cable,
> hub/router, etc.)?
>
> I've spent too many nights in the lab...
>
> Regards,
>
> Yves Guerin
>
>
> On Thursday, 16 March 2023 at 16:44:49 UTC−4, Attila Nagy
> <nagy.attila@gmail.com> wrote:
>
>
> Hi,
>
> As this is super annoying, I'm willing to pay a $500 bounty for solving
> this issue (to whoever is first, though I don't anticipate much competition
> :) An invoice would be best, but I'm willing to accept individuals as well).
> I can't give remote access, but I can run debug builds with a serial
> console (stable/13 branch).
>
> I have a bunch of netbooted machines in a cluster. One set is older (HP
> DL80 G9, 2x8C, Intel I350 NICs, igb driver); the other set is newer (HP
> XL225n G10, AMD EPYC 2x16C, BCM57412 NICs, bnxt driver).
> All of these boot from the network, which basically goes like this:
> - get an IP address and options via DHCP with the help of the NIC's PXE
>   stack
> - fetch the loader and kernel, and start the kernel
> - do another round of DHCP from the kernel (bootp_subr.c)
> - mount the root file system via NFS and let everything work as usual
>
> The problem is that the newer machines take an unpredictable amount of time
> to boot. The older ones (with igb NICs) work reliably and always boot fast.
> The process of getting an IP address via DHCP (bootpc_call in bootp_subr.c)
> either succeeds normally (in a few seconds) or takes a very long time.
> Measured boot times commonly range from tens of minutes to several hours
> (1-6 hours).
> Sometimes it just gets stuck and never gets past bootpc_call (obtaining the
> DHCP lease).
>
> What I've already tried:
> - We have a redundant set of DHCP servers which offer static leases (so
>   there are two DHCPOFFERs). I tried turning off one of them; nothing
>   changed.
> - I tried disabling SMP; the effect is the same.
> - I tried to see whether it's a network issue. The NIC's PXE stack always
>   gets the lease quickly, and booting FreeBSD from an ISO and running
>   dhclient on the same interface is also fast. After the machines have
>   booted, there are no network issues; they have worked reliably for more
>   than a year on 20+ machines, so not just for a few hours.
>
> This issue wasn't so bad previously (only a few minutes to tens of minutes
> of delay), but recently it has become pretty unbearable, even leaving some
> machines unbootable for days...
>
> At first I thought it might be packet loss (or, more precisely, a failure to
> deliver packets from the DHCP server to the receiving socket), either in the
> network or in the NIC/kernel itself, so I placed a few printfs here and
> there in bootp_subr.c and udp_usrreq.c.
>
> After spending some time trying to understand the problem, it feels like a
> race condition in bootpc_call, but I don't know the code well enough to
> verify that effectively.
>
> Here are the modified bootp_subr.c and udp_usrreq.c:
>
> https://gist.githubusercontent.com/bra-fsn/128ae9a3bbc0dbdbb2f6f4b3e2c5157a/raw/a8ade8af252f618c84a46da2452d557ebc5078ac/bootp_subr.c
>
> https://gist.github.com/bra-fsn/128ae9a3bbc0dbdbb2f6f4b3e2c5157a/raw/a8ade8af252f618c84a46da2452d557ebc5078ac/udp_usrreq.c
> (modified from the stable/13 branch as of a few weeks ago)
>
> This is the output from the always-working DL80 (igb) machine:
>
> https://gist.github.com/bra-fsn/128ae9a3bbc0dbdbb2f6f4b3e2c5157a/raw/a8ade8af252f618c84a46da2452d557ebc5078ac/DL80%2520igb%2520good.txt
>
> This is the console output from a successful boot of the XL225n (bnxt)
> machine:
>
> https://gist.github.com/bra-fsn/128ae9a3bbc0dbdbb2f6f4b3e2c5157a/raw/a8ade8af252f618c84a46da2452d557ebc5078ac/XL225n%2520bnxt%2520good.txt
> As you can see, it's much slower than the DL80 (which isn't that fast
> either...).
>
> And this one is a longer output that hadn't succeeded by that point (2
> minutes without completing the DHCP exchange):
> https://gist.github.com/bra-fsn/128ae9a3bbc0dbdbb2f6f4b3e2c5157a/raw/a8ade8af252f618c84a46da2452d557ebc5078ac/XL225n%2520bnxt%2520long.txt
>
> For the latter, here's an excerpt from the DHCP server log:
>
> https://gist.githubusercontent.com/bra-fsn/128ae9a3bbc0dbdbb2f6f4b3e2c5157a/raw/a8ade8af252f618c84a46da2452d557ebc5078ac/dhcp_log.txt
>
> It seems the DHCP state always gets reset to IF_DHCP_UNRESOLVED even when
> there are answers from the DHCP server.
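The observation that the state keeps falling back to IF_DHCP_UNRESOLVED even though the server answers can be pictured with the toy model below. It is a hypothetical sketch, not an analysis of bootp_subr.c: it assumes a retransmit timeout that unconditionally resets the client state, and shows that the outcome then depends only on whether the ACK arrives before or after the next timeout. The run() function, the reset_on_timeout flag and the event names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states; the names echo the IF_DHCP_* states in the report,
 * but this is a toy model, not bootp_subr.c. */
enum state { UNRESOLVED, OFFERED, RESOLVED };

/* Events as seen by the client, in arrival order. */
enum event { EV_OFFER, EV_ACK, EV_TIMEOUT };

/* Replay one event sequence and return the final state. If
 * reset_on_timeout is nonzero, every retransmit timeout unconditionally
 * drops the client back to UNRESOLVED (the assumed, suspect behaviour);
 * otherwise a timeout only triggers a resend and keeps the state. */
static enum state run(const enum event *ev, size_t n, int reset_on_timeout)
{
	enum state s = UNRESOLVED;

	assert(ev != NULL || n == 0);
	for (size_t i = 0; i < n && s != RESOLVED; i++) {
		switch (ev[i]) {
		case EV_OFFER:
			if (s == UNRESOLVED)
				s = OFFERED;     /* a REQUEST would be sent now */
			break;
		case EV_ACK:
			if (s == OFFERED)
				s = RESOLVED;    /* lease obtained */
			break;               /* an ACK seen in UNRESOLVED is dropped */
		case EV_TIMEOUT:
			if (reset_on_timeout)
				s = UNRESOLVED;  /* progress from the OFFER is lost */
			break;
		}
	}
	return s;
}
```

With reset_on_timeout set, the sequence OFFER, TIMEOUT, ACK ends in UNRESOLVED because the ACK arrives after the reset and is dropped; the same sequence without the reset ends in RESOLVED. A timing-dependent outcome like this would be consistent with boots that take seconds on one attempt and hours on another, but confirming it would require checking the actual timeout handling in bootpc_call.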
>
> Here's another, longer console log, which succeeded after spending 236
> seconds in the loop:
>
> https://gist.github.com/bra-fsn/128ae9a3bbc0dbdbb2f6f4b3e2c5157a/raw/a77f52f5e83c699b38a7c2d3acdc52d26ceeba71/XL225n%2520bnxt%2520long%2520good.txt
>
> Any ideas about this?