Date: Thu, 8 Feb 2024 15:18:02 +0000
From: Muhammad Waseem <Muhammad.Waseem@Sophos.com>
To: "freebsd-questions@freebsd.org" <freebsd-questions@freebsd.org>
Subject: Seg Fault while upgrading freebsd TCP stack to v13.1 in mbuf chain
Message-ID: <CWXP265MB4796C339D93CED1682BD8A5E9F442@CWXP265MB4796.GBRP265.PROD.OUTLOOK.COM>

We run software that operates middleboxes, and it uses the FreeBSD network stack for its TCP implementation. It has served us well, but we have not upgraded since 2013. In the past we addressed issues as we found them by applying patches from upstream, but that approach no longer works, so we decided to upgrade to 13.1. The upgrade was going fine until we ran our smoke tests and hit a crash that left a core dump.

The smoke test in question uses Linux Traffic Control to add delay and other impairments to packet transmission. The exact options are "delay 10ms reorder 10%". From the client, we use curl to send a 512 MB randomly generated file with a 45-second timeout. Even if we expect the test to fail because of behaviour changes in the upgrade, it should not result in a segfault.

We see the segmentation fault mainly on this low-memory machine and not in other environments. We are not free to run memory analysis tools in the environment where the core is reproduced.

As for the segmentation fault itself, it occurs in the mbuf chain. The crashing function varies between test runs, but it is usually one of these:

1. sbdrop_internal()
2. tcp_m_copym()
3. m_split() (very rarely)

What is worth noting, however, is that the crash always happens when accessing a member of the current mbuf, e.g. m->m_len or m->m_flags. I have traced the faulting address seen in these functions back to the socket that tcp_input_with_port() obtains from the inpcb struct. The address is, of course, inaccessible in gdb. It belongs to the mbuf chains in the so_snd socket buffer, usually the mbuf at sb_sndptr.
It is either the first mbuf itself or one further down the chain. In one or two cores the same applies to the sb_mb mbuf chain (which I assume is the main chain). From the addresses this clearly looks like a heap overflow: in one of the cores I was able to walk the sb_sndptr chain until I reached the faulting address.

The last address before the faulting one is 0x7f402a4d0700, followed by 0x9fff22eb779f. The earliest stack frame in which I can find this faulting address is tcp_input_with_port(), in the inpcb struct (inp). The obvious difference between the two addresses shows that an overflow occurred somewhere while the mbuf was being accessed or assigned. This is the most common backtrace:

1. sbdrop_internal()
2. sbdrop_locked()
3. tcp_do_segment()
4. tcp_input_with_port()
5. in_input()
6. netisr_dispatch_src()
7. ether_demux()
8. ether_input_internal()
9. ether_nh_input()
10. netisr_dispatch_src()
11. netisr_dispatch()
12. ns_net_tcp_push_frame()

I have tried to trace the faulting address back further than tcp_input_with_port(), but to no avail. I only have cores to work with, and running the test under gdb prevents the segfault from happening. I have gone through the code and, as far as I understand it, nothing in the functions above points to a heap buffer overflow.

Any help or pointers in the right direction would be greatly appreciated. If you need more information, or if there is a more appropriate mailing list, please let me know.

Thanks,
Waseem