Date: Sun, 19 Mar 2017 19:46:49 -0500 From: "Mike Karels" <mike@karels.net> To: "Andrey V. Elsukov" <bu7cher@yandex.ru> Cc: freebsd-net@FreeBSD.org, "Alexander V. Chernikov" <melifaro@freebsd.org>, "Eugene Grosbein" <eugen@freebsd.org>, karels@FreeBSD.org, "George Neville-Neil" <gnn@freebsd.org> Subject: Re: LLE reference leak in the L2 cache Message-ID: <FC62E82D-4019-4FF8-8EAD-87CC4A369755@karels.net> In-Reply-To: <70D2287B-664C-48E4-9E8B-68B574BE6CE6@karels.net> References: <201703140840.v2E8ecH2040827@mail.karels.net> <3a4c5d87-d42e-5615-5d2b-2a8801376600@yandex.ru> <70D2287B-664C-48E4-9E8B-68B574BE6CE6@karels.net>
next in thread | previous in thread | raw e-mail | index | archive | help
The context has gotten messy here, so I=E2=80=99m going to break down and= =20 top-post. I started review https://reviews.freebsd.org/D10059 with a fix for the=20 reference-count leak. It changes the semantics so only routes within an in_pcb automatically=20 do L2 caching. I=E2=80=99ll put the tcp_output change for V6 in a separate review when t= his=20 one is done. Andrey, could you try your iperf test again? Thanks, Mike On 14 Mar 2017, at 21:39, Mike Karels wrote: > On 14 Mar 2017, at 3:50, Andrey V. Elsukov wrote: > >> On 14.03.2017 11:40, Mike Karels wrote: >>>> Hi All, >>> >>>> Eugene has reported about the following assertion in the ARP code: >>>> http://www.grosbein.net/freebsd/crash/arp-kassert.txt >>> >>>> After some investigation I found that L2 cache has reference leak,=20 >>>> that >>>> can lead to integer overflow and this assertion. >>>> The one of the ways to reproduce this overflow can be demonstrated=20 >>>> with >>>> simple IP forwarding, when ip_forward() is used (not=20 >>>> ip_tryforward). >>> >>>> I asked olivier@ to reproduce this leak and he got this result: >>>> http://slexy.org/view/s21ql7nA0q >>> >>>> After further investigation I found similar leak in the IPv6 TCP=20 >>>> path. >>>> Simple iperf test shows these results: >>> >>>> # dtrace -n 'fbt::in6_lltable_dump_entry:entry {printf("%d", >>>> args[1]->lle_refcnt);}' >>>> dtrace: description 'fbt::in6_lltable_dump_entry:entry ' matched 1=20 >>>> probe >>>> CPU ID FUNCTION:NAME >>>> 51 18589 in6_lltable_dump_entry:entry 55721 >>>> 51 18589 in6_lltable_dump_entry:entry 1 >>>> 51 18589 in6_lltable_dump_entry:entry 1 >>>> 51 18589 in6_lltable_dump_entry:entry 2 >>>> 38 18589 in6_lltable_dump_entry:entry 111417 >>>> 38 18589 in6_lltable_dump_entry:entry 1 >>>> 38 18589 in6_lltable_dump_entry:entry 1 >>> >>>> -- >>>> WBR, Andrey V. Elsukov >>> >>> Thanks! Could you try the following patch (compiles, but untested): >>> >>> Index: netinet/ip_input.c >>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >>> --- netinet/ip_input.c (revision 315160) >>> +++ netinet/ip_input.c (working copy) >>> @@ -60,6 +60,7 @@ >>> #include <net/if_types.h> >>> #include <net/if_var.h> >>> #include <net/if_dl.h> >>> +#include <net/if_llatbl.h> >>> #include <net/route.h> >>> #include <net/netisr.h> >>> #include <net/rss_config.h> >>> @@ -1066,6 +1067,8 @@ >>> if (error =3D=3D EMSGSIZE && ro.ro_rt) >>> mtu =3D ro.ro_rt->rt_mtu; >>> RO_RTFREE(&ro); >>> + if (ro.ro_lle) >>> + LLE_FREE(ro.ro_lle); >>> >>> if (error) >>> IPSTAT_INC(ips_cantforward); >> >> I think it would be better to set RT_LLE_CACHE flag only for=20 >> protocols >> that expect presence of L2 cache. I.e. only for the TCP and UDP and=20 >> do >> it in the corresponding protocol output routine, not in the=20 >> ip[6]_output. > > Hmm, let me think about that. TCP and UDP know nothing about L2=20 > cache, > they just use the in_pcb cache which handles it. L3 caching was=20 > removed > earlier by someone who thought of it as a layering violation, which is=20 > why > I tried to keep in the IP layer for the most part. I can probably=20 > find a > way to encapsulate it. > >> >>> Index: netinet6/ip6_forward.c >>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >>> --- netinet6/ip6_forward.c (revision 315160) >>> +++ netinet6/ip6_forward.c (working copy) >>> @@ -52,6 +52,7 @@ >>> #include <net/if.h> >>> #include <net/if_var.h> >>> #include <net/netisr.h> >>> +#include <net/if_llatbl.h> >>> #include <net/route.h> >>> #include <net/pfil.h> >>> >>> @@ -431,4 +432,6 @@ >>> out: >>> if (rt !=3D NULL) >>> RTFREE(rt); >>> + if (rin6.ro_lle) >>> + LLE_FREE(rin6.ro_lle); >>> } >> >> I don't think this chunk will help. ip6_forward() doesn't use >> ip6_output(). And IPv6 forwarding is not affected by this problem.=20 >> Look >> at the tcp_output(), it uses local route variable for IPv6 output. > > Ah, right, I obviously didn=E2=80=99t read closely enough earlier. I=E2= =80=99ve=20 > attached > a patch that should fix this, as well as adding route caching for=20 > TCP/IPv6. > >> I'm not sure, but probably SCTP also can be affected by this problem. > > Probably true. SCTP could probably benefit from L2 caching, but this=20 > also > argues for making this more transparent. > > Mike
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?FC62E82D-4019-4FF8-8EAD-87CC4A369755>