Date:      Fri, 6 Jan 2023 18:14:06 +0800
From:      Zhenlei Huang <zlei@FreeBSD.org>
To:        James Gritton <jamie@freebsd.org>
Cc:        freebsd-jail@freebsd.org, freebsd-net <freebsd-net@freebsd.org>
Subject:   Re: Propose a new stage `vnet_shutdown` before `vnet_destroy`
Message-ID:  <C0FE4771-C552-431C-9B6C-4C6AE1BE5D48@FreeBSD.org>
In-Reply-To: <1c9dbf6d26b9525243dd6b3ffafa23cb@freebsd.org>
References:  <F64E9B0F-6C4E-4754-B829-1A5ACDB6D614@FreeBSD.org> <1c9dbf6d26b9525243dd6b3ffafa23cb@freebsd.org>



> On Dec 19, 2022, at 1:44 AM, James Gritton <jamie@freebsd.org> wrote:
>
> On 2022-12-18 00:01, Zhenlei Huang wrote:
>> I'm currently working on a route nexthop caching feature for tunneling
>> interfaces such as
>> if_gif, if_gre, if_vxlan, and potentially if_wg. I encountered a nasty
>> bug related to the VNET life cycle.
>> More precisely, I'd like to call `rib_unsubscribe()` to unsubscribe
>> from route events when the interface
>> tunnel is deleted (gif_delete_tunnel).
>> During VNET shutdown, the VNET SYSUNINITs are called and the routing
>> vnet subsystem
>> is destroyed before the interface goes down, which causes a
>> page fault. I do not want to check the
>> `vnet.vnet_shutdown` state as it looks messed up.
>> I recently reviewed the life cycle of prisons and got some inspiration.
>> When a jail / prison is submitted for destruction (by the jail_remove
>> syscall), SIGKILL is sent to
>> the prison's processes. I think that is the correct order for destroying
>> a jail / prison. To summarize, the life cycle
>> of a jail / prison is:
>> on jail create: PRISON_STATE_INVALID -> create VNET ->
>> PRISON_STATE_ALIVE -> set up network resources (ifnet, interface
>> addresses, routing, etc.) -> create / attach (network) processes
>> on jail destroy: user kills processes via jexec (1) -> mark it as
>> PRISON_STATE_DYING -> kernel sends SIGKILL to processes (2) ->
>> destroy VNET (if the prison's last pr_ref is dropped) -> dead
>> Step (2) is a cleanup by the kernel, as (1) may not have been done
>> by the user.
>> That leads to an idea about the life cycle of VNET.
>> On jail destroy, the network resources are cleaned up by
>> vnet_destroy (SYSUNINIT). The
>> ordering of the network components' SYSUNINITs is then hacky, since
>> circular dependencies between network resources are possible.
>> For example, routing table entries (nhop) hold references to ifnet,
>> and ifnet holds references to route nhops (the cache), as
>> I encountered.
>> Just like the kernel's cleanup of processes, we can introduce a new
>> stage `vnet_shutdown` that cleans up network resources.
>> When a jail / prison is going to die, after the kernel has cleaned up
>> its processes it calls `vnet_shutdown` to clean up network resources;
>> then vnet_destroy will go smoothly, as there is no longer any circular
>> network resource dependency.
>> The life cycle of a prison becomes:
>> on jail create: PRISON_STATE_INVALID -> create VNET ->
>> PRISON_STATE_ALIVE -> set up network resources (ifnet, interface
>> addresses, routing, etc.) -> create / attach (network) processes
>> on jail destroy: user kills processes via jexec (1) -> mark it as
>> PRISON_STATE_DYING -> kernel sends SIGKILL to processes (2) ->
>> vnet_shutdown cleans up network resources -> destroy VNET (if the
>> prison's last pr_ref is dropped) -> dead
>> This idea is still immature and I hope to hear more opinions on it.
>
> This is absolutely the direction things need to go.  Vnet isn't the
> only thing that can have these problems, though it's been the biggest
> offender.  There could also be cycles that involve more than one
> subsystem, which could be helped by broad application of this idea.
>
> There's a function in kern_jail.c ready for this: prison_cleanup.
> It's called in the "mark PRISON_STATE_DYING" stage of things.  That's
> before the "send SIGKILL" part of your sequence, but otherwise fits.
>

Submitted to Phabricator for review:

https://reviews.freebsd.org/D37956
https://reviews.freebsd.org/D37957
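
For anyone skimming the thread, here is a minimal user-space sketch of the
ordering discussed above. It is not the code in the reviews: every `toy_`
name is hypothetical, the structures are stand-ins for ifnet and nhop, and
the toy models the failure as objects that never get released rather than
the page fault seen in the kernel. It only illustrates why dropping the
cached nexthop in a shutdown stage lets the destroy stage finish cleanly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real kernel structures. */
struct toy_nhop;

struct toy_ifnet {
	int refs;
	int alive;
	struct toy_nhop *cached_nh;	/* tunnel's cached next hop */
};

struct toy_nhop {
	int refs;
	int alive;
	struct toy_ifnet *ifp;		/* next hop references its interface */
};

static void
toy_ifnet_rele(struct toy_ifnet *ifp)
{
	assert(ifp->refs > 0);
	if (--ifp->refs == 0)
		ifp->alive = 0;
}

static void
toy_nhop_rele(struct toy_nhop *nh)
{
	assert(nh->refs > 0);
	if (--nh->refs == 0) {
		nh->alive = 0;
		toy_ifnet_rele(nh->ifp);	/* drop the nhop -> ifnet reference */
	}
}

/* Shutdown stage: break cross-subsystem references while all are valid. */
static void
toy_vnet_shutdown(struct toy_ifnet *ifp)
{
	struct toy_nhop *nh = ifp->cached_nh;

	if (nh != NULL) {
		ifp->cached_nh = NULL;
		toy_nhop_rele(nh);
	}
}

/* Destroy stage: each subsystem drops only its own owning reference. */
static void
toy_vnet_destroy(struct toy_ifnet *ifp, struct toy_nhop *nh)
{
	toy_nhop_rele(nh);	/* routing teardown */
	toy_ifnet_rele(ifp);	/* interface teardown */
}

/* Returns how many objects are still alive after teardown. */
int
toy_jail_destroy(int with_shutdown)
{
	struct toy_ifnet ifp = { .refs = 1, .alive = 1, .cached_nh = NULL };
	struct toy_nhop nh = { .refs = 1, .alive = 1, .ifp = &ifp };

	ifp.refs++;		/* reference held by the nhop */
	ifp.cached_nh = &nh;
	nh.refs++;		/* reference held by the ifnet's cache */

	if (with_shutdown)
		toy_vnet_shutdown(&ifp);
	toy_vnet_destroy(&ifp, &nh);
	return (ifp.alive + nh.alive);
}
```

Calling `toy_jail_destroy(1)` runs the shutdown stage first and leaves
nothing alive; `toy_jail_destroy(0)` skips it and both objects stay pinned
by the cycle.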


> - Jamie

Best regards,
Zhenlei



