From: Zhenlei Huang
Date: Sun, 18 Dec 2022 16:01:20 +0800
To: freebsd-jail@freebsd.org, freebsd-net
Subject: Propose a new stage `vnet_shutdown` before `vnet_destroy`
Hi,

I'm currently working on a route nexthop caching feature for tunneling interfaces such as if_gif, if_gre, if_vxlan, and potentially if_wg. I have run into a nasty bug related to the VNET life cycle.

More precisely, I'd like to call `rib_unsubscribe()` to unsubscribe from route events when the interface's tunnel is deleted (gif_delete_tunnel). However, during VNET shutdown the VNET SYSUNINITs run, and the per-VNET routing subsystem is destroyed before the interface goes down, which causes a page fault. I'd rather not check the `vnet.vnet_shutdown` state, as that looks messy.

I have recently been reviewing the prison life cycle and got some inspiration from it. When a jail / prison is submitted for destruction (by the jail_remove syscall), SIGKILL is sent to the prison's processes. I think this is the correct order for destroying a jail / prison. To summarize, the life cycle of a jail / prison is:

  on jail create:
    PRISON_STATE_INVALID -> create VNET -> PRISON_STATE_ALIVE
    -> set up network resources (ifnet, interface addresses, routing, etc.)
    -> create / attach (network) processes

  on jail destroy:
    user kills processes via jexec (1) -> mark as PRISON_STATE_DYING
    -> kernel sends SIGKILL to the remaining processes (2)
    -> destroy VNET (if the prison's pr_ref drops to the last one) -> dead

Step (2) is a cleanup by the kernel, since the user may not have done (1).

This leads to an idea about the VNET life cycle. On jail destroy, network resources are cleaned up by vnet_destroy (SYSUNINIT). Getting the SYSUNINIT order of the network components right is hacky, because circular dependencies between network resources are possible. For example, routing table entries (nhops) hold references to ifnets, and an ifnet can hold a reference to a route nhop (cache), which is exactly what I encountered.

Just as the kernel cleans up processes, we could introduce a new stage, `vnet_shutdown`, that cleans up network resources. When a jail / prison is about to die, after the kernel has cleaned up its processes, it calls `vnet_shutdown` to clean up network resources; vnet_destroy then goes smoothly because no circular network resource dependencies remain at that point. The life cycle of a prison becomes:

  on jail create:
    PRISON_STATE_INVALID -> create VNET -> PRISON_STATE_ALIVE
    -> set up network resources (ifnet, interface addresses, routing, etc.)
    -> create / attach (network) processes

  on jail destroy:
    user kills processes via jexec (1) -> mark as PRISON_STATE_DYING
    -> kernel sends SIGKILL to the remaining processes (2)
    -> vnet_shutdown cleans up network resources
    -> destroy VNET (if the prison's pr_ref drops to the last one) -> dead

This idea is still immature, and I'd like to hear more opinions on it.

Thanks!

Best regards,
Zhenlei
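To make the ordering problem concrete, here is a small userland C model of the two teardown orders. It is only a sketch under stated assumptions: every type and function in it (`struct model_vnet`, `model_tunnel_teardown()`, `model_vnet_shutdown()`, and so on) is hypothetical and is not a FreeBSD kernel API. `model_tunnel_teardown()` merely stands in for gif_delete_tunnel() calling rib_unsubscribe(), and `model_rib_teardown()` stands in for the routing SYSUNINIT. The model tracks only one direction of the cycle (tunnel -> routing) and counts a "fault" where the real kernel would page-fault.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of one VNET: a routing subsystem (rib) and one tunnel
 * interface that caches a nexthop and holds a route-event subscription.
 * None of these names are FreeBSD kernel APIs.
 */
struct model_rib {
	bool	alive;		/* false once the routing teardown has run */
	int	subscribers;	/* outstanding route-event subscriptions */
};

struct model_tunnel {
	bool	attached;	/* interface still exists */
	bool	subscribed;	/* holds a rib subscription */
	bool	nhop_cached;	/* holds a reference to a cached nexthop */
};

struct model_vnet {
	struct model_rib	rib;
	struct model_tunnel	tun;
	int			faults;	/* accesses to an already-destroyed rib */
};

static void
model_vnet_init(struct model_vnet *v)
{
	v->rib.alive = true;
	v->rib.subscribers = 1;
	v->tun.attached = true;
	v->tun.subscribed = true;
	v->tun.nhop_cached = true;
	v->faults = 0;
}

/* Stand-in for gif_delete_tunnel() calling rib_unsubscribe(). */
static void
model_tunnel_teardown(struct model_vnet *v)
{
	if (!v->tun.attached)
		return;			/* already released earlier */
	if (v->tun.subscribed) {
		if (!v->rib.alive)
			v->faults++;	/* the kernel would page-fault here */
		else
			v->rib.subscribers--;
		v->tun.subscribed = false;
	}
	v->tun.nhop_cached = false;	/* drop the cached nexthop reference */
	v->tun.attached = false;
}

/* Stand-in for the per-VNET routing SYSUNINIT. */
static void
model_rib_teardown(struct model_vnet *v)
{
	v->rib.alive = false;
}

/* Current order: routing may be torn down before the interface. */
static int
model_destroy_current(struct model_vnet *v)
{
	model_rib_teardown(v);
	model_tunnel_teardown(v);
	return (v->faults);
}

/* Proposed extra stage: release network resources first. */
static void
model_vnet_shutdown(struct model_vnet *v)
{
	model_tunnel_teardown(v);
}

/* Proposed order: vnet_shutdown, then the usual destroy path. */
static int
model_destroy_proposed(struct model_vnet *v)
{
	model_vnet_shutdown(v);
	/* Every subscription must be gone before routing is torn down. */
	assert(v->rib.subscribers == 0);
	model_rib_teardown(v);
	model_tunnel_teardown(v);	/* no-op: already detached */
	return (v->faults);
}
```

On a freshly initialised model, `model_destroy_current()` reports one fault while `model_destroy_proposed()` reports none. The second `model_tunnel_teardown()` call in the proposed path is a no-op, mirroring the idea that vnet_destroy would find no cross-references left to release.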