Date: Fri, 6 Jan 2023 18:14:06 +0800
From: Zhenlei Huang <zlei@FreeBSD.org>
To: James Gritton <jamie@freebsd.org>
Cc: freebsd-jail@freebsd.org, freebsd-net <freebsd-net@freebsd.org>
Subject: Re: Propose a new stage `vnet_shutdown` before `vnet_destroy`
Message-ID: <C0FE4771-C552-431C-9B6C-4C6AE1BE5D48@FreeBSD.org>
In-Reply-To: <1c9dbf6d26b9525243dd6b3ffafa23cb@freebsd.org>
References: <F64E9B0F-6C4E-4754-B829-1A5ACDB6D614@FreeBSD.org> <1c9dbf6d26b9525243dd6b3ffafa23cb@freebsd.org>
> On Dec 19, 2022, at 1:44 AM, James Gritton <jamie@freebsd.org> wrote:
>
> On 2022-12-18 00:01, Zhenlei Huang wrote:
>> I'm currently working on a route nexthop caching feature for tunneling
>> interfaces such as if_gif, if_gre, if_vxlan, and potentially if_wg. I
>> encountered a nasty bug related to the VNET lifecycle.
>> More precisely, I'd like to call `rib_unsubscribe()` to unsubscribe from
>> route events when the interface tunnel is deleted (gif_delete_tunnel).
>> While the VNET is shutting down, the VNET SYSUNINITs are called and the
>> routing VNET subsystem is destroyed before the interface goes down,
>> which causes a page fault. I do not want to check the
>> `vnet.vnet_shutdown` state, as that looks messy.
>>
>> I've recently been reviewing the life cycle of prisons and got some
>> inspiration. When a jail / prison is submitted for destruction (by the
>> jail_remove syscall), SIGKILL is sent to the prison's processes. I think
>> this is the correct order for destroying a jail / prison. To summarize,
>> the life cycle of a jail / prison is:
>> on jail create: PRISON_STATE_INVALID -> create VNET ->
>> PRISON_STATE_ALIVE -> set up network resources (ifnet, interface
>> addresses, routing, etc.) -> create / attach (network) processes
>> on jail destroy: jexec kill processes (1) by user -> mark it as
>> PRISON_STATE_DYING -> send SIGKILL to processes by kernel (2) ->
>> destroy VNET (if the prison's pr_ref drops to the last one) -> dead
>> Step (2) is a cleanup by the kernel, since the user may not have done (1).
>>
>> This led me to an idea about the life cycle of VNETs.
>> When a jail is destroyed, its network resources are cleaned up by
>> vnet_destroy (SYSUNINIT). The ordering of the network components'
>> SYSUNINITs then becomes hacky, because circular dependencies between
>> network resources are possible. For example, routing table entries
>> (nhops) hold references to ifnets, and an ifnet may hold a reference to
>> a (cached) route nhop, as I encountered.
>> Just as the kernel cleans up processes, we can introduce a new stage
>> `vnet_shutdown` that cleans up network resources. When a jail / prison
>> is about to die, after the kernel has cleaned up its processes, it calls
>> `vnet_shutdown` to clean up network resources; vnet_destroy then goes
>> smoothly, as there is no circular network resource dependency left.
>> The life cycle of a prison becomes:
>> on jail create: PRISON_STATE_INVALID -> create VNET ->
>> PRISON_STATE_ALIVE -> set up network resources (ifnet, interface
>> addresses, routing, etc.) -> create / attach (network) processes
>> on jail destroy: jexec kill processes (1) by user -> mark it as
>> PRISON_STATE_DYING -> send SIGKILL to processes by kernel (2) ->
>> vnet_shutdown cleans up network resources -> destroy VNET (if the
>> prison's pr_ref drops to the last one) -> dead
>> This idea is still immature and I hope to hear more opinions about it.
>
> This is absolutely the direction things need to go. Vnet isn't the
> only thing that can have these problems, though it's been the biggest
> offender. There could also be cycles that involve more than one
> subsystem, which could be helped by broad application of this idea.
>
> There's a function in kern_jail.c ready for this: prison_cleanup.
> It's called in "mark PRISON_STATE_DYING" stage of things. That's
> before the "send SIGKILL" part of your sequence, but otherwise fits.
>

Submitted to Phabricator for review:
https://reviews.freebsd.org/D37956
https://reviews.freebsd.org/D37957

> - Jamie

Best regards,
Zhenlei