Date:      Sun, 1 Aug 2021 14:36:29 +0100
From:      "Alexander V. Chernikov" <melifaro@ipfw.ru>
To:        Andriy Gapon <avg@FreeBSD.org>
Cc:        freebsd-net <freebsd-net@FreeBSD.org>
Subject:   Re: network crash in nhop_free
Message-ID:  <869483A6-FA65-40A2-9CCC-05216588EAC8@ipfw.ru>
In-Reply-To: <70d1091d-07ec-1c76-29bc-1f2e2264b55a@FreeBSD.org>
References:  <2fbc5205-3fcc-d233-dae1-cf6ddc8d691d@FreeBSD.org> <d1e5e244-09e2-c30c-d08c-95d907c72f18@FreeBSD.org> <95F4F779-91A0-482B-B26B-6C95A60FC281@ipfw.ru> <70d1091d-07ec-1c76-29bc-1f2e2264b55a@FreeBSD.org>



> On 10 Jul 2021, at 10:07, Andriy Gapon <avg@FreeBSD.org> wrote:
>
> On 09/07/2021 00:02, Alexander V. Chernikov wrote:
>> Hi Andriy,
>> Could you by any chance provide a bit more info on the system networking configuration and the steps leading to the panic?
>> No chance for a coredump?
>> destroy_nhgrp() suggests that there was a multipath route (default?) that was deleted.
>> nhops are created with UMA_ALIGN_PTR, so I suspect there is garbage inside the nhgrp pointer.
>
> I've just reproduced the problem and got a crash dump.
> The new panic is a little bit different, but I think that it confirms your analysis.
> Also, you are right about the multipath route, although its creation was not intentional.

Should be fixed by https://cgit.freebsd.org/src/commit/?id=054948bd81bb9e4e32449cf351b62e501b8831ff .

>
> The test setup is a host with an ethernet interface and a 3g modem (for ppp).
> The default route is via the ethernet.
>
> Destination        Gateway            Flags     Netif Expire
> default            192.168.0.1        UGS        dwc0
> 8.8.8.8            192.168.0.1        UGHS       dwc0
> 127.0.0.1          link#2             UH          lo0
> 192.168.0.0/24     link#1             U          dwc0
> 192.168.0.137      link#1             UHS         lo0
>
> 192.168.0.0/24 is the LAN.
> The static route to 8.8.8.8 is for internet accessibility checking.
>
> Interesting bits of my ppp configuration:
> ----- ppp.linkup -----
> 3g:
> add! default HISADDR
> ----------------------
>
> When I bring up the ppp link I get two default routes -- which is not what I expected even when using 'add!':
> Destination        Gateway            Flags     Netif Expire
> default            192.168.0.1        UGS        dwc0
> default            10.1.1.1           UGS        tun0
> 8.8.8.8            192.168.0.1        UGHS       dwc0
> 10.1.1.1           link#4             UHS        tun0
> 10.133.147.118     link#4             UHS         lo0
> 127.0.0.1          link#2             UH          lo0
> 192.168.0.0/24     link#1             U          dwc0
> 192.168.0.137      link#1             UHS         lo0
>
> The procedure to re-create the problem is to bring up and down the ppp link twice.  That is, up -> down -> up -> down -> crash.
>
> Now, about the new crash.
> The panic message is:
> panic: refcount 0xffffa00027813318 wraparound
>
> The stack trace is approximately the same:
> panic() at panic+0x44
> _refcount_update_saturated() at _refcount_update_saturated+0x14
> nhop_free() at nhop_free+0x118
> destroy_nhgrp() at destroy_nhgrp+0x38
> epoch_call_task() at epoch_call_task+0x158
> gtaskqueue_run_locked() at gtaskqueue_run_locked+0x178
> gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0x9c
> fork_exit() at fork_exit+0x74
> fork_trampoline() at fork_trampoline+0x14
>
> From kgdb it seems like a refcount underflow (decrement from zero).
> (kgdb) p *nhg_priv
> $1 = {nhg_idx = 0, nhg_nh_count = 2 '\002', nhg_spare = "\000\000", nhg_refcount = 0, nhg_linked = 1, nh_control = 0x0, nhg_priv_next = 0x0, nhg = 0xffffa00032049e80, nhg_epoch_ctx = {data = {
>      0xffff0000005a0edc <destroy_nhgrp_epoch>, 0xffffa0000eecb148}}, nhg_nh_weights = 0xffffa00032049ed0}
> (kgdb) p nhg_priv->nhg_nh_weights[0]
> $2 = {nh = 0xffffa00027813200, weight = 0}
> (kgdb) p nhg_priv->nhg_nh_weights[1]
> $3 = {nh = 0xffffa00027813800, weight = 1}
> (kgdb) p *nhg_priv->nhg_nh_weights[0].nh
> $4 = {nh_flags = 128, nh_mtu = 1500, {gw4_sa = {sin_len = 16 '\020', sin_family = 2 '\002', sin_port = 0, sin_addr = {s_addr = 16843018}, sin_zero = "\000\000\000\000\000\000\000"}, gw6_sa = {sin6_len = 16 '\020',
>      sin6_family = 2 '\002', sin6_port = 0, sin6_flowinfo = 16843018, sin6_addr = {__u6_addr = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, sin6_scope_id = 0},
>    gw_sa = {sa_len = 16 '\020', sa_family = 2 '\002', sa_data = "\000\000\n\001\001\001\000\000\000\000\000\000\000"}, gwl_sa = {sdl_len = 16 '\020', sdl_family = 2 '\002', sdl_index = 0, sdl_type = 10 '\n',
>      sdl_nlen = 1 '\001', sdl_alen = 1 '\001', sdl_slen = 1 '\001', sdl_data = "\000\000\000\000\000\000\000"}, gw_buf = "\020\002\000\000\n\001\001\001", '\000' <repeats 19 times>}, nh_ifp = 0xffffa00027843800,
>  nh_ifa = 0xffffa0000eec4900, nh_aifp = 0xffffa00027843800, nh_pksent = 0xffff0000c2d38cd8, nh_prepend_len = 0 '\000', spare = "\000\000", spare1 = 0, nh_prepend = '\000' <repeats 47 times>, nh_priv = 0xffffa00027813300}
> (kgdb) p *nhg_priv->nhg_nh_weights[1].nh
> $5 = {nh_flags = 640, nh_mtu = 1500, {gw4_sa = {sin_len = 16 '\020', sin_family = 2 '\002', sin_port = 0, sin_addr = {s_addr = 16843018}, sin_zero = "\000\000\000\000\000\000\000"}, gw6_sa = {sin6_len = 16 '\020',
>      sin6_family = 2 '\002', sin6_port = 0, sin6_flowinfo = 16843018, sin6_addr = {__u6_addr = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, sin6_scope_id = 0},
>    gw_sa = {sa_len = 16 '\020', sa_family = 2 '\002', sa_data = "\000\000\n\001\001\001\000\000\000\000\000\000\000"}, gwl_sa = {sdl_len = 16 '\020', sdl_family = 2 '\002', sdl_index = 0, sdl_type = 10 '\n',
>      sdl_nlen = 1 '\001', sdl_alen = 1 '\001', sdl_slen = 1 '\001', sdl_data = "\000\000\000\000\000\000\000"}, gw_buf = "\020\002\000\000\n\001\001\001", '\000' <repeats 19 times>}, nh_ifp = 0xffffa00027843800,
>  nh_ifa = 0xffffa0000eec4900, nh_aifp = 0xffffa00027843800, nh_pksent = 0xffff0000c2d38430, nh_prepend_len = 0 '\000', spare = "\000\000", spare1 = 0, nh_prepend = '\000' <repeats 47 times>, nh_priv = 0xffffa00027813900}
>
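
Both nhops in the group ($4 and $5) print s_addr as the plain integer 16843018, which is hard to match against the routing table by eye. A minimal Python sketch of the conversion follows. It assumes kgdb printed the four in-memory bytes as a little-endian host integer (the machine is arm64); s_addr_to_dotted() is a hypothetical helper name, not part of any FreeBSD or gdb tooling.

```python
import socket
import struct


def s_addr_to_dotted(value, byteorder="<"):
    """Convert the integer kgdb prints for s_addr into dotted-quad form.

    Assumption: the value is the raw 4 bytes of the address read as a
    little-endian unsigned 32-bit integer ("<"). Pass ">" for a
    big-endian host.
    """
    raw = struct.pack(byteorder + "I", value)  # back to the in-memory bytes
    return socket.inet_ntoa(raw)               # network-order bytes -> text


# Value copied from sin_addr in both $4 and $5.
print(s_addr_to_dotted(16843018))
```

Under that assumption the result is 10.1.1.1, which agrees with the bytes "\n\001\001\001" in gw_buf and with the gateway of the tun0 default route in the routing table above.
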
> (kgdb) p *nhg_priv->nhg_nh_weights[0].nh->nh_priv
> $7 = {nh_family = 2 '\002', spare = 0 '\000', nh_type = 2, rt_flags = 526336, nh_idx = 0, cb_func = 0x0, nh_refcnt = 4294967295, nh_linked = 1, nh = 0xffffa00027813200, nh_control = 0xffffa00000ddf900,
>  nh_next = 0xffffa00027813900, nh_vnet = 0xffffa0000084c580, nh_epoch_ctx = {data = {0xffff0000005a2f90 <destroy_nhop_epoch>, 0x0}}}
> (kgdb) p *nhg_priv->nhg_nh_weights[1].nh->nh_priv
> $8 = {nh_family = 2 '\002', spare = 0 '\000', nh_type = 2, rt_flags = 2050, nh_idx = 11, cb_func = 0x0, nh_refcnt = 4, nh_linked = 2, nh = 0xffffa00027813800, nh_control = 0xffffa00000ddf900, nh_next = 0xffffa00027813500,
>  nh_vnet = 0xffffa0000084c580, nh_epoch_ctx = {data = {0x0, 0x0}}}
>
> nh_refcnt = 4294967295 (0xffffffff) in nhg_priv->nhg_nh_weights[0].nh->nh_priv.
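
The value 4294967295 is what a 32-bit unsigned counter holds after being decremented once from zero, which fits the "decrement from zero" reading. A minimal Python sketch of that arithmetic follows. release() is a hypothetical helper that only models the counter; it is not the kernel's refcount code. The sketch also checks where the address from the panic message lies relative to the two nh_priv pointers in the dump.

```python
MASK32 = 0xFFFFFFFF  # nh_refcnt is printed as a 32-bit unsigned value


def release(count):
    """Model one reference release on a 32-bit unsigned counter.

    Returns (new_value, underflowed). Simplified illustration only;
    it does not reproduce the kernel's saturation handling.
    """
    underflowed = count == 0
    return (count - 1) & MASK32, underflowed


new_value, underflowed = release(0)
print(new_value, hex(new_value), underflowed)

# Addresses copied from the panic message and the kgdb output.
panic_addr = 0xFFFFA00027813318  # "refcount 0xffffa00027813318 wraparound"
nh_priv_0 = 0xFFFFA00027813300   # nh_priv of nhg_nh_weights[0].nh
nh_priv_1 = 0xFFFFA00027813900   # nh_priv of nhg_nh_weights[1].nh

# Offset of the reported counter inside the first nhop's private data.
print(hex(panic_addr - nh_priv_0))
```

The reported counter address sits 0x18 bytes past the nh_priv of nhg_nh_weights[0].nh and below the nh_priv of nhg_nh_weights[1].nh, so the counter named in the panic belongs to the first nhop, the one whose nh_refcnt reads 0xffffffff.
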
>=20
>>> On 22 Jun 2021, at 11:31, Andriy Gapon <avg@FreeBSD.org> wrote:
>>>
>>>
>>> It seems that the panic message was
>>> panic: Misaligned access from kernel space!
>>>
>>> On 22/06/2021 12:54, Andriy Gapon wrote:
>>>> Not sure if I'll be able to get more out of this arm64 machine.
>>>> I was playing with ppp and switching routes between LAN and ppp when the crash happened.
>>>> The system is 2-3 weeks old 14.0-CURRENT as of c8250c5ada85fec.
>>>> Tracing pid 0 tid 100014 td 0xffffa00000c00000
>>>> db_trace_self() at db_trace_self
>>>> db_stack_trace() at db_stack_trace+0x11c
>>>> db_command() at db_command+0x244
>>>> db_command_loop() at db_command_loop+0x54
>>>> db_trap() at db_trap+0xf8
>>>> kdb_trap() at kdb_trap+0x1c4
>>>> handle_el1h_sync() at handle_el1h_sync+0x74
>>>> --- exception, esr 0xf2000000
>>>> kdb_enter() at kdb_enter+0x44
>>>> vpanic() at vpanic+0x1c4
>>>> panic() at panic+0x44
>>>> align_abort() at align_abort+0xb8
>>>> handle_el1h_sync() at handle_el1h_sync+0x74
>>>> --- exception, esr 0x96000021
>>>> nhop_free() at nhop_free+0x100
>>>> destroy_nhgrp() at destroy_nhgrp+0x38
>>>> epoch_call_task() at epoch_call_task+0x158
>>>> gtaskqueue_run_locked() at gtaskqueue_run_locked+0x178
>>>> gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0x9c
>>>> fork_exit() at fork_exit+0x74
>>>> fork_trampoline() at fork_trampoline+0x14
>>>
>>>
>>> --
>>> Andriy Gapon
>>>
>
>
> --
> Andriy Gapon
