From: Alexander V. Chernikov
To: "current@FreeBSD.org", net
Subject: CFT: Next-hop objects and scalable multipath routing
Date: Fri, 27 Mar 2020 07:48:13 +0000

I would like to introduce an implementation of scalable multipath routing.
The previous implementation (RADIX_MPATH) focused on simpler cases, such as having two default routes, and its performance degraded linearly with the number of paths.
That implementation also tightly coupled lookup algorithm details with routing details, making both hard to modify.
The proposed one allows O(1) lookup and is more cache-efficient with a large number of routes.
Furthermore, the multipath functionality builds on a number of internal changes that modernize the old routing code.

Most of the changes revolve around introducing the concept of _nexthops_. Nexthops are separate data structures containing all the information needed to forward a packet, such as gateway, interface and MTU. Nexthops are shared among routes, providing more pre-computed, cache-efficient data while requiring less memory. The multipath implementation adds _nexthop groups_, which are basically collections of nexthops and their weights, compiled into an array to allow direct nexthop selection.

A more detailed technical description is available at [1]. Any comments/suggestions are welcome!

A presentation of similar functionality in another OS: [2].
Next-hop object support was implemented in FRR in 2019 [3].

Next steps:
As these changes decouple routing code details from algorithm details and abstract the callers, it becomes much easier to introduce a number of other relevant features.
The most important proposed features are nexthop-based route installation and custom per-address-family route lookup algorithms.
The former targets improving convergence times for large-FIB boxes, while the latter may improve dataplane performance, especially for IPv6.

How to test:
- fetch the patch from https://reviews.freebsd.org/D24141
- rebuild the kernel with the ROUTE_MPATH option (already added to amd64 GENERIC)
- optionally, rebuild world to get netstat nexthop/multipath group reporting
- use route(8) to add multiple routes for the same destination, optionally specifying a weight

Example: add 2:1 load balancing for the default route:

route add -net default 192.168.53.1 -weight 100
route add -net default 192.168.53.2 -weight 200

netstat -4rnW
..
Destination        Gateway            Flags   Nhop#    Mtu    Netif Expire
default            192.168.53.1       UGS         4   1500      em0
default            192.168.53.2       UGS         5   1500      em0

netstat -4onW
Nexthop data
Idx  Type   IFA             Gateway       Flags  Use  Mtu   Netif  Addrif  Refcnt  Prepend
..
4    v4/gw  192.168.53.128  192.168.53.1  GS     0    1500  em0            2
5    v4/gw  192.168.53.128  192.168.53.2  GS     0    1500  em0            1

Nexthop groups data
MpIdx  NHIdx  Weigh  Slots  Gateway       Netif  Refcnt
1      ----   ----   ----   ----          ----   1
       4      100    1      192.168.53.1  em0
       5      200    2      192.168.53.2  em0

[1] https://reviews.freebsd.org/D24141
[2] https://linuxplumbersconf.org/event/4/contributions/434/attachments/251/436/nexthop-objects-talk.pdf
[3] https://github.com/FRRouting/frr/commit/d9f5b2f50f53d625986dbd47cd12778c9f841f0c
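
To illustrate how a nexthop group can be compiled into an array that allows direct nexthop selection, here is a minimal Python sketch. It is a model of the idea, not the kernel code. The names compile_nhop_group and select_nhop are hypothetical. It assumes that slot counts come from dividing each weight by the greatest common divisor of all weights, which matches the Slots column in the example output (weights 100 and 200 give 1 and 2 slots) but is not spelled out in this message. Picking a slot as the flow hash modulo the array size is also an assumption.

```python
from collections import Counter
from functools import reduce
from math import gcd


def compile_nhop_group(weighted_nhops):
    """Expand (nexthop_index, weight) pairs into a flat slot array.

    Assumption: each nexthop gets weight / gcd(all weights) slots.
    """
    if not weighted_nhops:
        raise ValueError("a nexthop group needs at least one nexthop")
    if any(weight <= 0 for _, weight in weighted_nhops):
        raise ValueError("weights must be positive integers")
    divisor = reduce(gcd, (weight for _, weight in weighted_nhops))
    slots = []
    for nhop_idx, weight in weighted_nhops:
        slots.extend([nhop_idx] * (weight // divisor))
    return slots


def select_nhop(slots, flow_hash):
    """Pick a nexthop with one array access; the cost does not depend
    on the number of paths (assumed selection rule: hash modulo size)."""
    return slots[flow_hash % len(slots)]


# Nexthops 4 and 5 with weights 100 and 200, as in the example output.
group = compile_nhop_group([(4, 100), (5, 200)])
print(group)  # [4, 5, 5]

# Evenly spread flow hashes end up split 1:2 between the two nexthops.
hits = Counter(select_nhop(group, h) for h in range(300))
print(sorted(hits.items()))  # [(4, 100), (5, 200)]
```

In this model the per-packet work is a single modulo and array index, and the weights only matter when the group is compiled.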