Date: Tue, 13 Dec 2022 19:03:54 +0000 (UTC)
From: "Bjoern A. Zeeb"
To: James Gritton
cc: jail@freebsd.org, "glebius@FreeBSD.org", Andrew Gallatin
Subject: Re: prison_flag() check in hot path of in_pcblookup()
List-Id: Discussion about FreeBSD jail(8)
List-Archive: https://lists.freebsd.org/archives/freebsd-jail

On Tue, 13 Dec 2022, James Gritton wrote:

Hi,

Argh, sorry Drew, I looked at the wrong one of the two checks in the
function earlier. Sorry, that's what happens when trying to be helpful
while firefighting elsewhere.

> On 2022-12-13 09:18, Andrew Gallatin wrote:
>
>> I was trying to improve the performance of in_pcblookup(), as it is a very
>> hot path for us (Netflix). One thing I noticed was the prison_flag() check
>> in in_pcblookup_hash_locked() can cause a cache miss just by deref'ing the
>> cred pointer, and it can also cause multiple misses in tables with
>> collisions by causing us to walk the entire chain even after finding a
>> perfect match.
>>
>> I'm curious why this check is needed. Can you explain it to me? It
>> originated in this commit:
>>
>> commit 413628a7e3d23a897cd959638d325395e4c9691b
>> Author: Bjoern A. Zeeb
>> Date:   Sat Nov 29 14:32:14 2008 +0000
>>
>>     MFp4:
>>       Bring in updated jail support from bz_jail branch.
>>
>>       This enhances the current jail implementation to permit multiple
>>       addresses per jail. In addtion to IPv4, IPv6 is supported as well.
>>
>> My thinking is that a jail will either use the host IP, and share its port
>> space, or it will have its own IP entirely (but I know nothing about
>> jails).

Well, having its own IP address entirely doesn't work with classic jails,
as there is only one network stack where addresses can be configured on an
interface, and that is the base system. So de facto all jail address/port
space is always shared with the host (ignoring vnet jails with their own
entire virtual network stack).

Whether the host would bind/listen is a different story. I know 15 years
ago people would bind the sshd of the base to a single IP address which was
not assigned to jails, and then the sshd inside a jail would bind to a
different (single) IP address. If one doesn't do that and the sshd inside
the jail isn't running, one would end up trying to connect to the sshd in
the base system (sshd being a popular example). Not sure if people still do
but that's still the case.

There are special cases with classic multi-IP jails in that they then
cannot have overlapping IP ranges, as otherwise it would not be
deterministic which jail would get an inbound connection on
inaddr_any:port_n if two jails were to listen on the same port_n.

Hope that helps for basic understanding.

>> In either case, a perfect 4-tuple match should be enough to
>> uniquely identify the connection.
>>
>> Even if this somehow is not the case and we have multiple connections
>> somehow sharing the same 4-tuple, how does checking the prison flag help
>> us?

That logic predates me and came from [1]. The jail_jailed_sockets_first
sysctl got removed in the review process with rwatson.

I am still trying to see where the SO_REUSEPORT comment (back then) came
from. I know I only had the first lines initially, so it must have been
added sometime during review with rwatson as well.
Sadly p4 emails were truncated to 1000 lines, so I cannot simply grep for
the change (if it is in my mail archives), nor did they carry a useful
commit message (though they would at least give a date against which to
check further private email).

My current guess is that if we have the 4-tuple in both the base and a jail
(hence the SO_REUSEPORT comment) we want the jail not getting a socket of
the base system returned, as that would mean one could "break out of
prison". But if the inp belongs to a jail we know we can simply return. So
if you find the one of the base system first you'll have to go and look
through the others.

XXX-jamie: is that all still true in hierarchical jails?

Now whether fabricating the case of having both inps on the hash is still
theoretically possible I cannot say. I haven't followed all the changes to
that code closely enough lately.

>> It would prefer the jailed connection over the non jailed, but that
>> would shadow a host connection. And if we had 2 jails sharing the same
>> 4-tuple, the first jail would win.
>>
>> I can't see how this check is doing anything useful, so I'd very much like
>> to remove this check if possible. Untested patch attached.
>
> For a complete 4-tuple, it should indeed be the case that a match would only
> ever identify a single prison. The later part of the function that examines
> wildcards definitely needs the check. I don't get the XXX comment about both
> being bound with SO_REUSEPORT, because I would only expect that to apply to
> listening, not to full connections. But I also expect Bjoern to know more
> than I do here...

/bz

[1] https://people.freebsd.org/~pjd/patches/jail_2004120901.patch

-- 
Bjoern A. Zeeb                                                     r15:7
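
For illustration only, the behaviour discussed in this thread (return at
once on a jailed exact 4-tuple match, otherwise remember the first
non-jailed match and keep walking the rest of the hash chain) can be
modelled in userland C roughly as below. This is a sketch under stated
assumptions, not the kernel code: struct conn, tuple_match(),
lookup_prefer_jailed(), the array-based chain, the visited counter and the
boolean jailed field (standing in for the prison_flag() result) are all
hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one inpcb on a hash chain. */
struct conn {
	uint32_t laddr, faddr;	/* local / foreign IPv4 address */
	uint16_t lport, fport;	/* local / foreign port */
	bool	 jailed;	/* stand-in for the prison_flag() result */
};

/* True if the entry matches the complete 4-tuple. */
static bool
tuple_match(const struct conn *c, uint32_t laddr, uint16_t lport,
    uint32_t faddr, uint16_t fport)
{
	return (c->laddr == laddr && c->lport == lport &&
	    c->faddr == faddr && c->fport == fport);
}

/*
 * Exact 4-tuple lookup that prefers a jailed entry.  A jailed match is
 * returned immediately; a non-jailed match is only remembered, and the
 * walk continues in case a jailed entry with the same 4-tuple follows.
 * If "visited" is not NULL it receives the number of entries examined.
 */
static const struct conn *
lookup_prefer_jailed(const struct conn *chain, size_t n, uint32_t laddr,
    uint16_t lport, uint32_t faddr, uint16_t fport, size_t *visited)
{
	const struct conn *fallback = NULL;
	size_t i;

	assert(chain != NULL || n == 0);
	if (visited != NULL)
		*visited = 0;
	for (i = 0; i < n; i++) {
		if (visited != NULL)
			*visited = i + 1;
		if (!tuple_match(&chain[i], laddr, lport, faddr, fport))
			continue;
		if (chain[i].jailed)
			return (&chain[i]);	/* jailed: done */
		if (fallback == NULL)
			fallback = &chain[i];	/* base system: keep looking */
	}
	return (fallback);
}
```

In this model a non-jailed match at the head of the chain still costs a
walk over every remaining entry, which is the extra work on collisions that
Drew describes; a jailed match ends the walk where it is found.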