Date:      Thu, 8 Feb 2024 15:18:02 +0000
From:      Muhammad Waseem <Muhammad.Waseem@Sophos.com>
To:        "freebsd-questions@freebsd.org" <freebsd-questions@freebsd.org>
Subject:   Seg Fault while upgrading freebsd TCP stack to v13.1 in mbuf chain
Message-ID:  <CWXP265MB4796C339D93CED1682BD8A5E9F442@CWXP265MB4796.GBRP265.PROD.OUTLOOK.COM>

We run software that operates middleboxes, and we use the FreeBSD network
stack for its TCP implementation. It has served us well, but we have not
upgraded since 2013. In the past we addressed issues as we found them by
applying patches from upstream, but that no longer works, so we decided to
upgrade to 13.1. The upgrade was going fine until we ran our smoke tests
and hit a core dump.

The smoke test in question uses Linux Traffic Control to add delay and
other impairments to packet transmission. The exact options are:
delay 10ms reorder 10%. We use curl to send a 512 MB randomly generated
file from the client with a 45-second timeout. Even if the test were
expected to fail because of changes in the upgrade, it should not result
in a segfault.

We see the segmentation fault primarily on a low-memory machine and not
in other environments. We are not at liberty to run memory analysis tools
in the environment where the core is reproduced.

As for the segmentation fault itself, it occurs in the mbuf chain. The
crash point differs between test runs, but it is usually in one of these
functions:
1. sbdrop_internal
2. tcp_m_copym
3. m_split (very rarely)

What is notable, however, is that the crash always happens on an attempt
to access a member of the current mbuf, e.g. m->m_len or m->m_flags. I
have tracked the faulty address that these functions dereference to a
socket that is obtained from the inpcb struct in tcp_input_with_port().
The address is of course inaccessible in gdb. The faulty address belongs
to the mbuf chains in the so_snd socket buffer. It is usually in the chain
at sb_sndptr, either the first mbuf itself or one further down the chain.
In one or two cores the same applies to the sb_mb mbuf chain (which I
assume is the main chain itself). From the addresses we can clearly see
it's a heap overflow: in one of the cores I was able to walk the sb_sndptr
chain until I found the faulty address.
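
For illustration, the manual walk described above can be sketched as a
small helper. This is only a sketch under stated assumptions: struct
fake_mbuf is a hypothetical stand-in that carries just the fields named in
this message (it is not the real struct mbuf layout), and the [lo, hi)
range for "plausible" mbuf addresses is something the caller would have to
supply from knowledge of where the stack's mbufs are allocated.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct mbuf: only the chain link and the
 * two fields mentioned in the message. Not the real kernel layout. */
struct fake_mbuf {
    struct fake_mbuf *m_next;
    int               m_len;
    int               m_flags;
};

/* Assumption: a usable mbuf pointer lies inside the caller-supplied
 * range [lo, hi) and is at least pointer-aligned. */
static int ptr_plausible(const void *p, uintptr_t lo, uintptr_t hi)
{
    uintptr_t a = (uintptr_t)p;
    return a >= lo && a < hi && (a % sizeof(void *)) == 0;
}

/* Walk the chain from head without dereferencing an implausible pointer.
 * Returns the number of mbufs visited before an implausible pointer was
 * found, or -1 if the chain ended at NULL or max_hops was reached.
 * *last_good is set to the last mbuf that was safely visited. */
static long first_bad_link(const struct fake_mbuf *head, uintptr_t lo,
    uintptr_t hi, size_t max_hops, const struct fake_mbuf **last_good)
{
    const struct fake_mbuf *m = head;
    long hops = 0;

    *last_good = NULL;
    while (m != NULL && (size_t)hops < max_hops) {
        if (!ptr_plausible(m, lo, hi))
            return hops;        /* m itself is the bad pointer */
        *last_good = m;
        m = m->m_next;          /* safe: m was just validated */
        hops++;
    }
    return -1;
}
```

The mbuf reported in *last_good is the one whose m_next holds the bad
value, so its contents and its neighbours in memory are the natural place
to look for whatever overwrote the link.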

The last address before the faulty one is 0x7f402a4d0700, after which
comes 0x9fff22eb779f. The first place I see this faulty address is in the
frame of tcp_input_with_port(), in the inpcb struct (inp). The difference
between the two addresses is very obvious, and it shows that an overflow
occurred somewhere while the mbuf was being accessed or assigned. This is
the most common backtrace:
1. sbdrop_internal()
2. sbdrop_locked()
3. tcp_do_segment()
4. tcp_input_with_port()
5. in_input()
6. netisr_dispatch_src()
7. ether_demux()
8. ether_input_internal()
9. ether_nh_input()
10. netisr_dispatch_src()
11. netisr_dispatch()
12. ns_net_tcp_push_frame()
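
One quick check that can be applied to the two addresses quoted above is
whether each is a valid virtual address at all. The sketch below assumes an
x86-64 process with 48-bit virtual addresses, which this message does not
state, and the function name is hypothetical. Under that assumption an
address is canonical only when bits 63..47 are all equal.

```c
#include <stdint.h>

/* Assumption: x86-64 with 48-bit virtual addresses (4-level paging).
 * An address is canonical when bits 63..47 are all 0 or all 1. */
static int is_canonical_x86_64(uint64_t addr)
{
    uint64_t top = addr >> 47;      /* the 17 bits from 63 down to 47 */
    return top == 0 || top == 0x1FFFF;
}
```

Under that assumption 0x7f402a4d0700 is canonical and 0x9fff22eb779f is
not, so the second value could not have been returned by any allocator. It
is also odd, which no aligned structure pointer would be. Both observations
are consistent with the pointer field having been overwritten with
non-pointer data, though they do not say by what.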

I have tried to trace the source of the faulty address back further than
tcp_input_with_port(), but to no avail. I only have cores to work with,
and running the test under gdb keeps the segfault from happening. I have
gone through the code and, to my limited understanding, nothing in any of
the above functions points to a heap buffer overflow. Any help, whether a
pointer in the right direction or anything else, would be greatly
appreciated. If you need more information, or if there is a more
appropriate mailing list, please let me know.

Thanks,
Waseem

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CWXP265MB4796C339D93CED1682BD8A5E9F442>