From owner-freebsd-net@FreeBSD.ORG Thu Aug 14 22:01:33 2014
Message-ID: <53ED1BEB.7000409@yandex-team.ru>
Date: Fri, 15 Aug 2014 00:28:27 +0400
From: "Alexander V. Chernikov"
To: Luigi Rizzo
Cc: "freebsd-net@freebsd.org", Luigi Rizzo, "Andrey V. Elsukov", freebsd-ipfw
Subject: Re: [CFT] new tables for ipfw
References: <53EBC687.9050503@yandex-team.ru> <53EC880B.3020903@yandex-team.ru> <53EC960A.1030603@yandex-team.ru> <53ECA302.8010100@yandex-team.ru>
In-Reply-To: <53ECA302.8010100@yandex-team.ru>

On 14.08.2014 15:52, Alexander V. Chernikov wrote:
> On 14.08.2014 15:15, Luigi Rizzo wrote:
>>
>> On Thu, Aug 14, 2014 at 12:57 PM, Alexander V. Chernikov wrote:
>>
>> On 14.08.2014 14:44, Luigi Rizzo wrote:
>>>
>>> On Thu, Aug 14, 2014 at 11:57 AM, Alexander V. Chernikov wrote:
>>>
>>> On 14.08.2014 13:23, Luigi Rizzo wrote:
>>>>
>>>> On Wed, Aug 13, 2014 at 10:11 PM, Alexander V. Chernikov wrote:
>>>>
>>>> Hello list.
>>>>
>>>> I've been hacking ipfw for a while and It seems there
>>>> is something ready to test/review in projects/ipfw branch.
>>>>
>>>> this is a fantastic piece of work, thanks for doing it and for
>>>> integrating the feedback.
>>>>
>>>> I have some detailed feedback that will send you privately,
>>>> but just a curiosity:
>>>>
>>>> ...
>>>>
>>>> Some examples (see ipfw(8) manual page for the
>>>> description):
>>>>
>>>> ...
>>>>
>>>> ipfw table mi_test create type cidr algo "cidr:hash masks=/30,/64"
>>>>
>>>> why do we need to specify mask lengths in the above ?
>>> Well, since we're hashing IP we have to know mask to cut
>>> host bits in advance.
>>> (And the real reason is that I'm too lazy to implement
>>> hierarchical matching (check /32, then /31, then /30) like
>>> how, for example,
>>>
>>> oh well for that we should use cidr:radix
>>>
>>> Research results have never shown a strong superiority of
>>> hierarchical hash tables over good radix implementations,
>>> and in those cases one usually adopts partial prefix
>>> expansion so you only have, say, masks that are a
>>> multiple of 2..8 bits so you only need a small number of
>>> hash lookups.
>> Definitely, especially for IPv6. So I was actually thinking about
>> covering some special sparse cases (e.g. someone having a bunch
>> of /32 and a bunch of /30 and that's all).
>>
>> Btw, since we're talking about "good radix implementation": what
>> license does DXR have? :)
>> Is it OK to merge it as another cidr implementation?
>>
>> "cidr" is a very ugly name, i'd rather use "addr"
> Ok, no problem with that. "addr" really sounds better.
>>
>> DXR has a bsd license and of course it is possible to use it.
>> You should ask Marko Zec for his latest version of the code
>> (and probably make sure we have one copy of the code in the source tree).
> Great!. I'll ask him :)
>>
>> Speaking of features, one thing that would be nice is the ability
>> for tables to reference the in-kernel tables (e.g. fibs, socket
>> lists, interface lists...), perhaps in readonly mode.
>> How complex do you think that would be ?

Well, the main problem is that the table handling code assumes that we
know the number of items in advance, and since we're holding locks it
won't change, so we don't need a large contiguous buffer to dump data
to. This is not the case with "external" tables, so we can't _reliably_
dump them (the same situation as with dynamic states).

Anyway, I've added cidr:kfib algo
( http://svnweb.freebsd.org/base?view=revision&revision=270001 )
and it looks funny.
Quoting commit message:

# ipfw table fib2 create algo "cidr:kfib fib=2"
# ipfw table fib2 info
+++ table(fib2), set(0) +++
kindex: 2, type: cidr, locked
valtype: number, references: 0
algorithm: cidr:kfib fib=2
items: 11, size: 288
# ipfw table fib2 list
+++ table(fib2), set(0) +++
10.0.0.0/24 0
127.0.0.1/32 0
::/96 0
::1/128 0
::ffff:0.0.0.0/96 0
2a02:978:2::/112 0
fe80::/10 0
fe80:1::/64 0
fe80:2::/64 0
fe80:3::/64 0
ff02::/16 0
# ipfw table fib2 lookup 10.0.0.5
10.0.0.0/24 0
# ipfw table fib2 lookup 2a02:978:2::11
2a02:978:2::/112 0
# ipfw table fib2 detail
+++ table(fib2), set(0) +++
kindex: 2, type: cidr, locked
valtype: number, references: 0
algorithm: cidr:kfib fib=2
items: 11, size: 288
IPv4 algorithm radix info
items: 0 itemsize: 200
IPv6 algorithm radix info
items: 0 itemsize: 200

> Implementing algo support for a particular provider like sockets/iflists
> shouldn't be hard. Most of the algorithm complexity lies in table
> modifications. Here we only have to support
> lookup and dump operations, so it is a question of providing the
> necessary bindings to existing mechanisms (via some direct binding or
> utilizing things like kernel_sysctl for dump support).
>
> It looks like the following maps well to the current table concept:
> * such tables are not created by default
> * user issues
> `ipfw table kfib create type addr algo "addr:kernel fib=0"`
> or
> `ipfw table ktcp create type flow algo "flow:kernel_tcp fib=0"`
> or
> `ipfw table kiface create type iface algo "iface:kernel"`
> * tables have special "readonly" type, flush_all requests are ignored
> * no state stored internally
>
> So generic table handling code needs to be modified to support
> read-only tables (and to make more callbacks optional).
> Additionally, we might need to proxy the "info" request into an algo
> callback (optional, "real" algorithms won't implement it) to be able
> to show the number of items (and some other info) to the user.
>
>>
>> cheers
>> luigi
>>
>
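
On the masks= question from earlier in the thread: the following is a minimal Python sketch, not the ipfw kernel code, of why a hash-based prefix table has to know its mask lengths when it is created. The class MaskedHashTable and its methods are hypothetical names made up for illustration. The only idea taken from the thread is that host bits are cut with a fixed per-family mask before hashing.

```python
import ipaddress


class MaskedHashTable:
    """Toy prefix table with one fixed mask length per address family.

    Hypothetical illustration only; names and layout are not from ipfw.
    """

    def __init__(self, v4_masklen, v6_masklen):
        # Mask lengths are fixed at creation time, like masks=/30,/64.
        self.masklen = {4: v4_masklen, 6: v6_masklen}
        self.entries = {}  # (version, masked address as int) -> value

    def _key(self, addr):
        # Cut the host bits so every address in a prefix maps to one key.
        bits = 32 if addr.version == 4 else 128
        shift = bits - self.masklen[addr.version]
        return (addr.version, (int(addr) >> shift) << shift)

    def add(self, prefix, value):
        net = ipaddress.ip_network(prefix)
        if net.prefixlen != self.masklen[net.version]:
            raise ValueError("prefix length must match the table mask")
        self.entries[self._key(net.network_address)] = value

    def lookup(self, address):
        # A single hash probe; no walk over /32, then /31, then /30.
        return self.entries.get(self._key(ipaddress.ip_address(address)))


table = MaskedHashTable(v4_masklen=30, v6_masklen=64)
table.add("10.0.0.4/30", 1)
table.add("2a02:978:2::/64", 2)
```

With mixed prefix lengths, a table like this would need one probe per distinct length, which is the hierarchical matching that the thread leaves to cidr:radix.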