From owner-freebsd-transport@freebsd.org Fri Dec 18 04:43:12 2015
From: "George Neville-Neil"
To: transport@freebsd.org
Subject: Fwd: Please create a herald rule for the transport group
Date: Thu, 17 Dec 2015 23:43:11 -0500
Message-ID: <5A174504-F6CF-4927-98E2-6792DC2856E9@neville-neil.com>
List-Id: Discussions of transport level network protocols in FreeBSD

Forwarded message:

> From: Ed Maste
> To: George Neville-Neil
> Cc: phabric-admin@FreeBSD.org
> Subject: Re: Please create a herald rule for the transport group
> Date: Thu, 17 Dec 2015 17:22:50 -0500
>
> On 17 December 2015 at 17:03, George Neville-Neil
> wrote:
>> Howdy,
>>
>> Can we have a herald rule to auto-add transport as a reviewer for any
>> change
>> impacting sys/netinet/tcp* or sys/netinet/udp*.
>
> https://reviews.freebsd.org/H64

From owner-freebsd-transport@freebsd.org Fri Dec 18 22:26:25 2015
From: Ryan Stone
To: freebsd-transport@freebsd.org
Subject: Extending FIBs to support multi-tenancy
Date: Fri, 18 Dec 2015 17:26:21 -0500

My employer is going through the process of extending our product to
support multi-tenant networking. The details of what our product does
aren't really relevant to the discussion -- it's enough to know that we
have a number of daemons acting as servers for various network protocols.
Multi-tenancy, as we've defined the feature, imposes the requirement that
our network services be able to communicate with clients from completely
independent networks. This has imposed the following new requirements on
us:

- different tenant networks may have different DNS servers
- they may use different AAA mechanisms (e.g. one tenant uses LDAP,
  another uses Kerberos, and a third also uses LDAP but a different
  server)
- they may use independent routing tables
- different tenant networks may use overlapping IP ranges (so we might
  see two different clients from two different tenant networks with the
  IP 192.168.0.1, for instance)
- traffic from different tenant networks is not guaranteed to be
  segregated in any way -- it might all come in the same network
  interface, without any VLAN tagging or any other encapsulation that
  might differentiate tenant networks
- we need to scale to thousands of tenant networks
- we will impose the requirement that our system can't be assigned the
  same IP address for different tenant networks

Our intention is to use the destination IP address of incoming packets to
determine which tenant network the packet came from (hence the requirement
of not allowing the same IP to be configured for different tenant
networks).

The obvious tool for meeting these requirements in FreeBSD is VIMAGE.
However, we have prototyped that approach already, and I have been told
that we found it will not scale to thousands of networks. I don't have all
of the details as to why, but the root of the problem is that any given
process can only be associated with a single vnet instance. In our current
architecture, we can't have thousands of instances of each network service
running (and I'm not sure that we really could: if we support A services,
B tenant networks and C CPU cores, we would need a minimum of A * B * C
threads to ensure that any given service on any single tenant network
could fully utilize the system's resources to process requests).

We're instead looking at using FIBs to implement the routing table
requirement. To meet our requirements, we're expecting to have to make
three important changes to how FIBs are managed:

1) Allow listening sockets to be wildcarded across FIBs
2) Make FIBs a property of an interface address, not the interface
3) Allow each thread to set a default FIB that will be applied to newly
   created sockets

1)
We don't really want to change all of our services to instantiate one
listening socket for every tenant network. Instead we're looking at
implementing (and upstreaming) a kernel extension that allows a listening
socket to be wildcarded across all FIBs (note: yesterday I described this
feature as allowing us to pick-and-choose FIBs, but people internally have
convinced me that a wildcard match would make their lives significantly
easier). When a new connection attempt to a listening socket in this mode
is accepted, the socket would not inherit its FIB from the listening
socket. Instead, it would be set based on the local IP address of the
connection.

2)
Currently, FIBs are a property of an interface (struct ifnet). We aren't
very enthusiastic about the prospect of having to create thousands of
interfaces to support thousands of network interfaces. We would instead
like to make the FIB a property of the interface address. For backwards
compatibility reasons I would still let admins set a FIB on an ifnet, but
that would instead be the default FIB for addresses that aren't explicitly
assigned one. That should maintain the current behaviour while making it
easy to push FIBs down into the address.

3)
The idea of a per-thread FIB has gotten the most pushback so far, and I
understand the objection. I'll explain the problem that we're trying to
solve with this. When a new request comes in, we may need to perform
authentication through LDAP or Kerberos. The problem is that the existing
open-source implementations that we are using manage sockets directly. We
really don't want to have to go through them and make their APIs entirely
FIB-aware -- that is far too much churn. By moving awareness of the
current FIB into the kernel, existing calls to socket() can do the right
thing transparently.

We're not entirely happy with the solution, but the "right" way to solve
the problem involves rototilling a number of libraries. Even if we could
convince the upstream projects to take patches, it's far more work than
we're willing to take on.

We're planning on doing the work ourselves, but we feel that coming up
with a solution that FreeBSD is comfortable taking into head is critical.
We really don't want to be carrying around diffs from upstream in such a
critical part of the network stack, so I am hoping that we can come to an
agreement on the best path forward for both sides.

If anybody has any comments, concerns or objections to any of this, please
pipe up now rather than after I've implemented all of this.
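
[Editorial illustration] The address-based FIB selection described in
points 1 and 2 above can be modelled in a few lines. This is a user-space
sketch in Python, not kernel code and not the proposed implementation: the
names Interface, fib_for_local_address and accept_on_wildcard_listener are
hypothetical, and the addresses and FIB numbers are made up. It assumes,
as the message requires, that a given local address is configured in at
most one tenant network.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Interface:
    """Hypothetical model of one ifnet carrying addresses for many tenants."""
    name: str
    default_fib: int = 0
    # local address -> explicitly assigned FIB (None means "use the default")
    addr_fib: Dict[str, Optional[int]] = field(default_factory=dict)

    def add_address(self, addr: str, fib: Optional[int] = None) -> None:
        self.addr_fib[addr] = fib

    def fib_for_local_address(self, addr: str) -> int:
        """Point 2: the address's own FIB wins, else the interface default."""
        fib = self.addr_fib[addr]  # raises KeyError for an unconfigured address
        return self.default_fib if fib is None else fib


def accept_on_wildcard_listener(ifp: Interface, local_addr: str,
                                peer_addr: str) -> dict:
    """Point 1: the accepted socket takes its FIB from the local address,
    not from the (FIB-wildcarded) listening socket."""
    return {
        "local": local_addr,
        "peer": peer_addr,
        "fib": ifp.fib_for_local_address(local_addr),
    }


# Illustrative setup: one interface, three local addresses, two tenant FIBs.
em0 = Interface("em0", default_fib=0)
em0.add_address("10.0.0.5")           # no explicit FIB, inherits the default
em0.add_address("10.1.0.5", fib=1)    # tenant 1
em0.add_address("10.2.0.5", fib=2)    # tenant 2

# Two clients with the same (overlapping) source IP land in different FIBs
# because they connected to different local addresses.
conn_a = accept_on_wildcard_listener(em0, "10.1.0.5", "192.168.0.1")
conn_b = accept_on_wildcard_listener(em0, "10.2.0.5", "192.168.0.1")
conn_c = accept_on_wildcard_listener(em0, "10.0.0.5", "192.168.0.1")
```

The point of the model is only that the local (destination) address is the
sole discriminator between tenants, which is why the same address cannot
be configured for two tenant networks.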

Thanks,
Ryan

From owner-freebsd-transport@freebsd.org Sat Dec 19 01:32:27 2015
From: "Jonathan T. Looney"
Sender: Jonathan Looney
To: Ryan Stone
CC: Gleb Smirnoff, "freebsd-transport@freebsd.org"
Subject: Re: Extending FIBs to support multi-tenancy
Date: Fri, 18 Dec 2015 20:32:10 -0500

On 12/18/15, 5:26 PM, "owner-freebsd-transport@freebsd.org on behalf of
Ryan Stone" wrote:

>- they may use independent routing tables
[...]
>- traffic from different tenant networks is not guaranteed to be
>segregated
>in any way -- it might all come in the same network interface, without any
>vlan tagging or any other encapsulation that might differentiate tenant
>networks

The combination of these two requirements seems slightly odd to me.
Usually, you need separate routing tables because you have separate
interfaces. When you have shared interfaces, you can usually use the same
routing table.

I think it might help to have more information about the reasoning for
these requirements, as it seems that this combination is what is leading
you towards making the FIB assignment be an address property.

>1)
>We don't really want to change all of our services to instantiate one
>listening socket for every tenant network. Instead we're looking at
>implementing (and upstreaming) a kernel extension that allows a listening
>socket to be wildcarded across all FIBs (note: yesterday I described this
>feature as allowing us to pick-and-choose FIBs, but people internally have
>convinced me that a wildcard match would make their lives significantly
>easier). When a new connection attempt to a listening socket in this mode
>is accepted, the socket would not inherit its FIB from the listening
>socket. Instead, it would be set based on the local IP address of the
>connection.

Makes sense. My employer does something similar in their stack: listen
sockets can be assigned to a particular FIB or be wildcard entries that
listen in all FIBs. We haven't noticed any scaling problems, but we
typically don't have high connection setup rates, either.

In any case, I think this makes sense.

>2)
>Currently, FIBs are a property of an interface (struct ifnet). We aren't
>very enthusiastic about the prospect of having to create thousands of
>interfaces to support thousands of network interfaces. We would instead
>like to make the FIB a property of the interface address.

I don't understand the motivation for this. It would help if you would
provide more context for the use case. (See my earlier comments.)

At minimum, before proceeding, you should connect with the folks who had
talked about wanting to make changes to ifnet. (Among other things, I
think they had considered creating separate physical interface, logical
interface, and interface address constructs.) I'm not sure what happened
to that project, but I think it is still an ongoing project. I think Gleb
(cc'd) was involved in that, so you might want to check with him.

>3)
>The idea of a per-thread FIB has gotten the most pushback so far, and I
>understand the objection. I'll explain the problem that we're trying to
>solve with this. When a new request comes in, we may need to perform
>authentication through LDAP or Kerberos. The problem is that the existing
>open-source implementations that we are using manage sockets directly. We
>really don't want to have to go through them and make their APIs entirely
>FIB-aware -- that is far too much churn. By moving awareness of the
>current FIB into the kernel, existing calls to socket() can do the right
>thing transparently.
>
>We're not entirely happy with the solution, but the "right" way to solve
>the problem involves rototilling a number of libraries. Even if we could
>convince the upstream projects to take patches, it's far more work than
>we're willing to take on.

Thanks for sharing more details on the use case. It certainly helps
clarify the reasoning.

However, I wonder if this really solves all of your problems. For example,
you talk about needing to perform LDAP or Kerberos authentication. You are
already going to need to make your application smart enough to figure out
which servers to use based on the source of the incoming request. That may
or may not require adding intelligence to your libraries to give you
enough information to identify the incoming connection.

Further, per-thread FIBs may not solve your scaling problem. You initially
stated that your objection to VNET was that you would need a minimum of
"A * B * C threads to ensure that any given service on any single tenant
network could fully utilize the system's resources to process requests".
If you assign threads to a particular FIB, then you are back in the
A * B * C scaling model that you didn't want.

However, on the other hand, if you maintain a smaller pool of threads and
continually reassign their FIB, you could hit interesting problems if any
of your libraries implement their own thread pools or event-driven
libraries (e.g. libisc2). In those cases, they may try to switch contexts
between connections as events occur. How will you ensure the thread's FIB
is always assigned correctly? It seems like this could become quite
complicated, depending on the exact situation.

Per-thread FIBs have a lot of potential concerns, ESPECIALLY when
implemented by programs or libraries that aren't expecting to work this
way. The biggest concerns I see are complexity and troubleshooting: you
need to make sure that every thread knows which FIB it is using and only
handles connections for that FIB. If you make one mistake, your connection
suddenly can go to the wrong place.

Just my 2c. Others may disagree.
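
[Editorial illustration] The hazard described above, a pooled thread whose
FIB is reassigned per request while a library runs its own threads, can be
shown with a small user-space model. This is only a sketch in Python under
stated assumptions: current_fib, use_fib and model_socket are hypothetical
stand-ins for the proposed per-thread default FIB and for socket(); no
real kernel interface is called.

```python
import threading
from contextlib import contextmanager

# Per-thread state standing in for the proposed per-thread default FIB.
_state = threading.local()


def current_fib() -> int:
    """Return this thread's default FIB (0 if it was never set)."""
    return getattr(_state, "fib", 0)


@contextmanager
def use_fib(fib: int):
    """Set the calling thread's FIB for one request, then restore it."""
    previous = current_fib()
    _state.fib = fib
    try:
        yield
    finally:
        _state.fib = previous


def model_socket() -> dict:
    """Stand-in for socket(): the new socket inherits the thread's FIB."""
    return {"fib": current_fib()}


def handle_request(tenant_fib: int) -> dict:
    """A pooled worker serving one request on behalf of one tenant."""
    with use_fib(tenant_fib):
        # e.g. an unmodified LDAP library opening its own socket
        return model_socket()


# Same thread, two tenants in a row: correct only because the FIB is
# set on entry and restored on exit every single time.
first = handle_request(7)
second = handle_request(9)
after = model_socket()

# A library that creates the socket on its own thread does not see the
# caller's FIB at all.
seen_by_library_thread = []
with use_fib(7):
    worker = threading.Thread(
        target=lambda: seen_by_library_thread.append(model_socket()["fib"])
    )
    worker.start()
    worker.join()
```

In this model the socket created on the library's own thread ends up in
the default FIB rather than the tenant's, which is the "connection goes to
the wrong place" failure described above.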

Jonathan

From owner-freebsd-transport@freebsd.org Sat Dec 19 02:07:33 2015
From: Patrick Kelsey
To: "Jonathan T. Looney"
Cc: Ryan Stone, "freebsd-transport@freebsd.org", Gleb Smirnoff
Subject: Re: Extending FIBs to support multi-tenancy
Date: Fri, 18 Dec 2015 21:07:32 -0500

On Fri, Dec 18, 2015 at 8:32 PM, Jonathan T. Looney wrote:

> On 12/18/15, 5:26 PM, "owner-freebsd-transport@freebsd.org on behalf of
> Ryan Stone" <rysto32@gmail.com> wrote:
>
> >- they may use independent routing tables
> [...]
> >- traffic from different tenant networks is not guaranteed to be
> >segregated
> >in any way -- it might all come in the same network interface, without any
> >vlan tagging or any other encapsulation that might differentiate tenant
> >networks
>
> The combination of these two requirements seems slightly odd to me.
> Usually, you need separate routing tables because you have separate
> interfaces. When you have shared interfaces, you can usually use the same
> routing table.
>
> I think it might help to have more information about the reasoning for
> these requirements, as it seems that this combination is what is leading
> you towards making the FIB assignment be an address property.
>
>
> >1)
> >We don't really want to change all of our services to instantiate one
> >listening socket for every tenant network. Instead we're looking at
> >implementing (and upstreaming) a kernel extension that allows a listening
> >socket to be wildcarded across all FIBs (note: yesterday I described this
> >feature as allowing us to pick-and-choose FIBs, but people internally have
> >convinced me that a wildcard match would make their lives significantly
> >easier). When a new connection attempt to a listening socket in this mode
> >is accepted, the socket would not inherit its FIB from the listening
> >socket. Instead, it would be set based on the local IP address of the
> >connection.
>
> Makes sense. My employer does something similar in their stack: listen
> sockets can be assigned to a particular FIB or be wildcard entries that
> listen in all FIBs. We haven't noticed any scaling problems, but we
> typically don't have high connection setup rates, either.
>
> In any case, I think this makes sense.
>

I did have an earlier concern that the worst-case wildcard search time for
an inpcb lookup might be doubled, depending on the desired properties of
the FIB wildcarding. That would only be true if the FIB number was made
part of the hash key in order to support the desired behavior (as in that
case twice as many buckets may need to be searched), but I don't see that
as necessary to achieve what's being described here. With the FIB
remaining outside the hash key, the only impact to lookup would be that if
a wildcard-FIB inpcb is encountered during a bucket walk, the remainder of
the bucket would have to be walked to rule out a match with a specific
FIB, which is a relatively small cost that would only be incurred by
applications using the wildcard-FIB feature.

>
> >2)
> >Currently, FIBs are a property of an interface (struct ifnet). We aren't
> >very enthusiastic about the prospect of having to create thousands of
> >interfaces to support thousands of network interfaces. We would instead
> >like to make the FIB a property of the interface address.
>
> I don't understand the motivation for this. It would help if you would
> provide more context for the use case. (See my earlier comments.)
>
> At minimum, before proceeding, you should connect with the folks who had
> talked about wanting to make changes to ifnet. (Among other things, I
> think they had considered creating separate physical interface, logical
> interface, and interface address constructs.) I'm not sure what happened
> to that project, but I think it is still an ongoing project. I think Gleb
> (cc'd) was involved in that, so you might want to check with him.
>
>
> >3)
> >The idea of a per-thread FIB has gotten the most pushback so far, and I
> >understand the objection. I'll explain the problem that we're trying to
> >solve with this. When a new request comes in, we may need to perform
> >authentication through LDAP or Kerberos. The problem is that the existing
> >open-source implementations that we are using manage sockets directly. We
> >really don't want to have to go through them and make their APIs entirely
> >FIB-aware -- that is far too much churn. By moving awareness of the
> >current FIB into the kernel, existing calls to socket() can do the right
> >thing transparently.
> >
> >We're not entirely happy with the solution, but the "right" way to solve
> >the problem involves rototilling a number of libraries. Even if we could
> >convince the upstream projects to take patches, it's far more work than
> >we're willing to take on.
>
> Thanks for sharing more details on the use case. It certainly helps
> clarify the reasoning.
>
> However, I wonder if this really solves all of your problems. For example,
> you talk about needing to perform LDAP or Kerberos authentication. You are
> already going to need to make your application smart enough to figure out
> which servers to use based on the source of the incoming request. That may
> or may not require adding intelligence to your libraries to give you
> enough information to identify the incoming connection.
>

I believe what Ryan is saying is that he would be using an INADDR_ANY,
FIB_ANY listen for a given service, and for any incoming connection, the
FIB would be chosen based on the local address used in that connection.
That is what drives the constraint he gave that a given service lives at a
unique IP address across all tenant networks.

>
> Further, per-thread FIBs may not solve your scaling problem. You initially
> stated that your objection to VNET was that you would need a minimum of "A
> * B * C threads to ensure that any given service on any single tenant
> network could fully utilize the system's resources to process requests".
> If you assign threads to a particular FIB, then you are back in the A * B
> * C scaling model that you didn't want.
>

I think it would be reduced to A * C threads, where A was the number of
services and C the number of CPUs - what you would drop is the B dimension
(replication of service connections across all tenant networks).

>
> However, on the other hand, if you maintain a smaller pool of threads and
> continually reassign their FIB, you could hit interesting problems if any
> of your libraries implement their own thread pools or event-driven
> libraries (e.g. libisc2). In those cases, they may try to switch contexts
> between connections as events occur. How will you ensure the thread's FIB
> is always assigned correctly? It seems like this could become quite
> complicated, depending on the exact situation.
>
> Per-thread FIBs have a lot of potential concerns, ESPECIALLY when
> implemented by programs or libraries that aren't expecting to work this
> way.
The biggest concerns I see are complexity and troubleshooting: you > need to make sure that every thread knows which FIB it is using and only > handles connections for that FIB. If you make one mistake, your connection > suddenly can go to the wrong place. > > There's an earlier message of mine that got sent off for moderation (due to source address and my subscription config) that may yet surface, in which I suggest leaving FIB selection policy in the application by using wrapper functions around the desired set of socket library calls (see ld(1) --wrap). -Patrick From owner-freebsd-transport@freebsd.org Sat Dec 19 01:25:36 2015 Return-Path: Delivered-To: freebsd-transport@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7E5D4A4B6F8 for ; Sat, 19 Dec 2015 01:25:36 +0000 (UTC) (envelope-from pkelsey@gmail.com) Received: from mail-yk0-x234.google.com (mail-yk0-x234.google.com [IPv6:2607:f8b0:4002:c07::234]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 441D31142 for ; Sat, 19 Dec 2015 01:25:36 +0000 (UTC) (envelope-from pkelsey@gmail.com) Received: by mail-yk0-x234.google.com with SMTP id p130so78206738yka.1 for ; Fri, 18 Dec 2015 17:25:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=IKJ0ID2lDOQOSTEXBeP47afk+4ee9Sm3jELhOcZ5b0w=; b=XixxoJo+bsCDJPxGbiP2vQLIWZJ+xjufqep0VDbH820aL4bo7i8yBdmEI/Qp6yKbq2 wld0HcMPzYTsITdV1I/2vajz5vzMgBjkmQfkah5pC8UKgDHN6XuJsepPYFnG04yIg6M7 UhOqV14kHlHtAb4O1d68YMGiyDPtEMU+fa6QjBtK7tybKUgJD8LYnBu2S2NrkmcQRQNO EMwWv3PWb59MdbXVUTbZH2oi4PDq3dqcbcw2IO3cqqy48V+f5y3OJ6DgiUedRUB3T3q1 /l1Ne6rangBfMfTjOAT1SbdMrhPaSxo9JBF8YZ25yl1/ArfyTbGJwXA3y5plI7sdrSrI 
etqw== MIME-Version: 1.0 X-Received: by 10.129.75.145 with SMTP id y139mr5639908ywa.32.1450488335248; Fri, 18 Dec 2015 17:25:35 -0800 (PST) Sender: pkelsey@gmail.com Received: by 10.13.211.65 with HTTP; Fri, 18 Dec 2015 17:25:35 -0800 (PST) In-Reply-To: References: Date: Fri, 18 Dec 2015 20:25:35 -0500 X-Google-Sender-Auth: WsQAH2TPF_vWjrI4tXRTdz3mpnw Message-ID: Subject: Re: Extending FIBs to support multi-tenancy From: Patrick Kelsey To: Ryan Stone Cc: freebsd-transport@freebsd.org X-Mailman-Approved-At: Sat, 19 Dec 2015 07:55:30 +0000 Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-transport@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussions of transport level network protocols in FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 19 Dec 2015 01:25:36 -0000 On Fri, Dec 18, 2015 at 5:26 PM, Ryan Stone wrote: > My employer is going through the process of extending our product to > support multi-tenant networking. The details of what our product does > aren't really relevant to the discussion -- it's enough to know that we have > a number of daemons acting as servers for various network protocols. > Multi-tenancy, as we've defined the feature, imposes the requirement that > our network services be able to communicate with clients from completely > independent networks. This has imposed the following new requirements on > us: > > - different tenant networks may have different DNS servers > - they may use different AAA mechanisms (e.g.
one tenant uses LDAP, another > uses Kerberos, and a third also uses LDAP but a different server) > - they may use independent routing tables > - different tenant networks may use overlapping IP ranges (so we might see > two different clients from two different tenant networks with the IP > 192.168.0.1, for instance) > - traffic from different tenant networks is not guaranteed to be segregated > in any way -- it might all come in the same network interface, without any > VLAN tagging or any other encapsulation that might differentiate tenant > networks > - we need to scale to thousands of tenant networks > - we will impose the requirement that our system can't be assigned the same > IP address for different tenant networks > > Our intention is to use the destination IP address of incoming packets to > determine which tenant network the packet came from (hence the requirement > of not allowing the same IP to be configured for different tenant > networks). > > The obvious tool for meeting these requirements in FreeBSD is VIMAGE. > However, we have prototyped that approach already, and I have been told > that we found this will not scale to thousands of > networks. I don't have all of the details as to why, but the root of the > problem is that any given process can only be associated with a single vnet > instance. In our current architecture, we can't have thousands of > instances of each network service running (and I'm not sure that we really > could: if we support A services, B tenant networks and C CPU cores, we > would need a minimum of A * B * C threads to ensure that any given service > on any single tenant network could fully utilize the system's resources to > process requests). > > > We're instead looking at using FIBs to implement the routing table > requirement.
To meet our requirements, we're expecting to have to make > three important changes to how FIBs are managed: > > 1) Allow listening sockets to be wildcarded across FIBs > 2) Make FIBs a property of an interface address, not the interface > 3) Allow each thread to set a default FIB that will be applied to newly > created sockets > > 1) > We don't really want to change all of our services to instantiate one > listening socket for every tenant network. Instead we're looking at > implementing (and upstreaming) a kernel extension that allows a listening > socket to be wildcarded across all FIBs (note: yesterday I described this > feature as allowing us to pick-and-choose FIBs, but people internally have > convinced me that a wildcard match would make their lives significantly > easier). When a new connection attempt to a listening socket in this mode > is accepted, the socket would not inherit its FIB from the listening > socket. Instead, it would be set based on the local IP address of the > connection. > > 2) > Currently, FIBs are a property of an interface (struct ifnet). We aren't > very enthusiastic about the prospect of having to create thousands of > interfaces to support thousands of network interfaces. We would instead > like to make the FIB a property of the interface address. For backwards > compatibility reasons I would still let admins set a FIB on an ifnet, but > instead that would be the default FIB assigned to addresses that aren't > explicitly assigned a FIB. That should maintain the current behaviour > while making it easy to push FIBs down into the address. > > 3) > The idea of a per-thread FIB has gotten the most pushback so far, and I > understand the objection. I'll explain the problem that we're trying to > solve with this. When a new request comes in, we may need to perform > authentication through LDAP or Kerberos. The problem is that the existing > open-source implementations that we are using manage sockets directly.
We > really don't want to have to go through them and make their APIs entirely > FIB-aware -- that is far too much churn. By moving awareness of the > current FIB into the kernel, existing calls to socket() can do the right > thing transparently. > > We're not entirely happy with the solution, but the "right" way to solve > the problem involves rototilling a number of libraries. Even if we could > convince the upstream projects to take patches, it's far more work than > we're willing to take on. > > Why not address (3) by creating your own wrappers of the socket library calls whose behavior you want to change (see --wrap in ld(1))? Your wrapper(s) could then implement whatever application-specific FIB selection policy you'd like, for example, checking for a particular thread local storage key whose value indicates the FIB to use, and if defined setting the FIB ahead of invoking the original socket library call.
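A minimal sketch of the wrapper idea above, for illustration only. It assumes FreeBSD's SO_SETFIB socket option and a linker that supports --wrap. The names current_fib, fib_set_current(), fib_get_current(), fib_socket() and the USE_LD_WRAP macro are hypothetical. A C11 _Thread_local variable stands in for the thread local storage key, and the FIB is applied to the new socket right after the original call returns rather than ahead of it.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical per-thread FIB; -1 means "leave the socket's FIB alone". */
static _Thread_local int current_fib = -1;

void fib_set_current(int fib) { current_fib = fib; }
int  fib_get_current(void)    { return current_fib; }

#ifdef USE_LD_WRAP
/*
 * When linked with -Wl,--wrap=socket, the linker redirects references to
 * socket() to __wrap_socket() and resolves __real_socket() to the
 * original symbol.
 */
int __real_socket(int domain, int type, int protocol);
#define ORIG_SOCKET __real_socket
#define WRAPPER     __wrap_socket
#else
/* Without --wrap, expose the same logic under a hypothetical name. */
#define ORIG_SOCKET socket
#define WRAPPER     fib_socket
#endif

int WRAPPER(int domain, int type, int protocol);

int
WRAPPER(int domain, int type, int protocol)
{
	int s = ORIG_SOCKET(domain, type, protocol);

	/* Pass failures through, and do nothing if no FIB was selected. */
	if (s < 0 || current_fib < 0)
		return (s);

#ifdef SO_SETFIB
	/* Bind the new socket to the FIB chosen by the calling thread. */
	if (setsockopt(s, SOL_SOCKET, SO_SETFIB, &current_fib,
	    sizeof(current_fib)) != 0) {
		int saved = errno;

		close(s);
		errno = saved;
		return (-1);
	}
#endif
	return (s);
}
```

A request handler would call fib_set_current() with the tenant's FIB before entering the LDAP or Kerberos code, and reset it to -1 afterwards. Note that --wrap only rewrites symbol references in the objects being linked in that step, so calls to socket() made from inside prebuilt shared libraries would not be redirected this way.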