From owner-freebsd-net@FreeBSD.ORG Fri Aug 15 13:25:13 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 9BFD08E0; Fri, 15 Aug 2014 13:25:13 +0000 (UTC) Received: from mail.ipfw.ru (mail.ipfw.ru [IPv6:2a01:4f8:120:6141::2]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 2C7D32D20; Fri, 15 Aug 2014 13:25:13 +0000 (UTC) Received: from [2a02:6b8:0:401:222:4dff:fe50:cd2f] (helo=ptichko.yndx.net) by mail.ipfw.ru with esmtpsa (TLSv1:DHE-RSA-AES128-SHA:128) (Exim 4.82 (FreeBSD)) (envelope-from ) id 1XIDXm-000D82-Q8; Fri, 15 Aug 2014 13:11:30 +0400 Message-ID: <53EE0A30.4020800@FreeBSD.org> Date: Fri, 15 Aug 2014 17:25:04 +0400 From: "Alexander V. Chernikov" User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 MIME-Version: 1.0 To: Dmitry Selivanov Subject: Re: ipfw named objejcts, table values and syntax change References: <53DC01DE.3000000@FreeBSD.org> <53DCA25C.1000108@FreeBSD.org> <53DF55FA.8010303@FreeBSD.org> <20140804115817.GA13814@onelab2.iet.unipi.it> <53DFE438.5050209@FreeBSD.org> <53E4BE62.4050303@rlan.ru> In-Reply-To: <53E4BE62.4050303@rlan.ru> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Cc: freebsd-ipfw , "freebsd-net@freebsd.org" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 15 Aug 2014 13:25:13 -0000 On 08.08.2014 16:11, Dmitry Selivanov wrote: > 04.08.2014 23:51, Alexander V. Chernikov пишет: >> On 04.08.2014 15:58, Luigi Rizzo wrote: >>> On Mon, Aug 04, 2014 at 01:44:26PM +0400, Alexander V. Chernikov wrote: >>>> On 02.08.2014 12:33, Alexander V. Chernikov wrote: >>>>> On 02.08.2014 10:33, Luigi Rizzo wrote: >>>>>> >>>>>> >>>>>> On Fri, Aug 1, 2014 at 11:08 PM, Alexander V. Chernikov >>>>>> > wrote: >>>>>> >>>>>> Hello all. >>>>>> >>>>>> I'm currently working on to enhance ipfw in some areas. >>>>>> The most notable (and user-visible) change is named table >>>>>> support. >>>>>> The other one is support for different lookup algorithms >>>>>> for different >>>>>> key types. >>>>>> >>>>>> For example, new ipfw permits writing this: >>>>>> >>>>>> ipfw table tb1 create type cidr >>>>>> ipfw add allow ip from table(tl1) to any >>>>>> ipfw add allow ip from any lookup dst-ip tb1 >>>>>> >>>>>> ipfw table if1 create type iface >>>>>> ipfw add skipto tablearg ip from any to any via table(if1) >>>>>> >>>>>> or even this: >>>>>> ipfw table fl1 create type flow:src-ip,proto,dst-ip,dst-port >>>>>> ipfw table fl1 add 10.0.0.5,tcp,10.0.0.6,80 4444 >>>>>> ipfw add allow ip from any to any flow table(fl1) >>>>>> >>>>>> all these changes fully preserve backward compatibility. >>>>>> (actually tables needs now to be created before use and >>>>>> their type needs >>>>>> to match with opcode used, but new ipfw(8) performs >>>>>> auto-creation >>>>>> for cidr tables). >>>>>> >>>>>> There is another thing I'm going to change and I'm not sure >>>>>> I can keep >>>>>> the same compatibility level. >>>>>> >>>>>> Table values, from one point of view, can be classified to >>>>>> the following >>>>>> types: >>>>>> >>>>>> - skipto argument >>>>>> - fwd argument (*) >>>>>> - link to another object (nat, pipe, queue) >>>>>> - plain u32 (not bound to any object) >>>>>> (divert/tee,netgraph,tag/utag,limit) >>>>>> >>>>>> There are the following reasons why I think it is necessary >>>>>> to implement >>>>>> explicit table values typing (like tables): >>>>>> - Implementing fwd tablearg for IPv6 hosts requires >>>>>> indirection table >>>>>> - Converting nat/pipe instance ids to names renders values >>>>>> unusable >>>>>> - retiring old hack with storing saved pointer of found >>>>>> object/rule >>>>>> inside rule w/o proper locking >>>>>> - making faster skipto >>>>>> >>>>>> >>>>>> ??????i don't buy the idea that you need typed arguments >>>>>> for all the cases above. Maybe the case that >>>>>> may make sense is the fwd argument (and in the future >>>>>> something else). >>>>>> We already discussed, i think, the fact that now it >>>>>> is legal to have references to non existing things >>>>>> (skipto, pipes etc.) implemented as u32. >>>>>> Removing that would break configurations. >>>>> It depends on actual implementation. This can be preserved by >>>>> auto-creating necessary objects in kernel and/or in userspace, so >>>>> we can (and should) avoid breaking in this particular way. >>>> Can you please explain your vision on values another time? >>>> As far as I understand, you're not against it in general, but the >>>> details matter: >>>> * IP address can be one of the types (it won't break much, and we can >>>> simply skip that one for MFC) >>>> * what about typing for nat/pipes ? we're not going to convert >>>> their ids >>>> to names? (or maybe you can suggest other non-disruptive way?) >>>> * everything else is type "u32" >>> >>> Correct, I am mostly concerned about the details, not on the general >>> concept. >>> >>> To summarize the discussion Alexander and I had about converting >>> identifiers from numbers to arbitrary strings (this is partly related >>> to the values stored in tables, but I think we should have a coherent >>> behaviour) >>> >>> 1. CURRENTLY ipfw uses numeric identifiers in a small range (16 bits >>> or less) >>> for rules, pipes, queues, tables, probably nat instances. >>> >>> 2. CURRENTLY, in all the above contexts, it is legal to reference a >>> non existing object (rule, pipe, table names, etc.), >>> and the kernel will do something reasonable, namely jump to the >>> next rule, drop traffic for non existing pipes, and so on. >>> >>> 3. of course we want to preserve backward compatibility both for >>> the ioctl interface, and for user configurations. >>> >>> 4. The in-kernel representation of identifiers is not visible to users, >>> so we can use a numeric representation in the kernel for >>> identifiers. >>> Strings like "12345" are converted with atoi() or the like, >>> whereas for other identifiers or numbers outside of the 2^16 range >>> the kernel manages a translation table, allocating new numeric >>> identifiers if a new string appears. >>> This permits backward compatibility for old rulesets, and does not >>> impact performance because the translation table is only >>> used during rules additions or deletion. >> Yes. However this requires either holding either (1) 2 pointers (old&new >> arrays), or (2) 65k+ index array, or (3) chained hash table. >> (1) would require additional pointers for each subsystem (and some >> additional management), >> (2) will definitely upset embedded guys and >> (3) is worse in terms of performance >>> >>> With this in mind, i think we should follow a similar approach for >>> objects stored in tables, hence >>> >>> if an u32 value was available in the past, it must be >>> available also in the new implementation. >>> >>> The issue with tables is that some convoluted configuration could >>> use the same table to reference pipes _and_ rules _and_ perhaps >>> other things represented as numbers (the former is not too strange, >>> if i have a large configuration i might place sections at rules >>> 12000, 13000, 14000... and associate pipes with the same numberic >>> identifier to each block of rules). >>> >>> Typed table values would clearly disturb backward compatibility >>> in the above configurations. However it should not be difficult >>> to accept arbitrary strings as the values stored in tables, and >>> then store multiple representations as appropriate, including: >> Well, I've thought about thas one. It may be an option, but the details >> are not so promising (below) >>> - the string representation, unconditionally >>> - for names that can be resolved by DNS, the ipv6 and ipv4 address(es) >>> associated with them. ipfw already translates hostnames in rules >>> so this is POLA >> I'm not happy what ipfw(8) is doing instead of translation. The proper >> way would be not simply using first AF_INET answer but saving ALL >> IPv4+IPv6 records inside rule (and some more tracking should be done >> afterwards, but that's totally different story). Additionally, I'm >> unsure if we really need next-hop value expressed as hostname (how can >> we deal with multiple addresses and diffrent AFs?). We may store strings >> (and I think we should do it) but I'm unsure about this particular >> option of interpreting them. >>> - for other strings, a u32 from the translation table as previously >>> indicated >>> - and for numeric values, the u32 representation (truncated if needed, >>> according to whatever is the existing behaviour) >>> - >>> If we cannot generate an u32 we will put some value (e.g. 0) >>> that hopefully will not cause confusion. >> As far as I understand, we accept some string "s" as table value inside >> the kernel, than, we have some logic that says: >> oh, dummynet pipe has the same name "s"s, oh, nat entity with name "s" >> has just been created, let's save indices. >> >> That would require additional indirection table like: >> >> index | [ skipto idx | nat idx | pipe idx | queue idx | fwd index ] >> ( so we will have 2-level indirection table for fwd if we do IPv6) >> >> We can optimize this if we use "same name -> same kidx" approach >> regardless of kernel object we're refering to. That might require some >> more memory, but that's OK from my point of view. >> >> So we end up with >> int [ skipto idx | fwd idx | obj idx ] >> >> idx "0" is special value which means the same as 2.CURRENT >> >> That looks better, but still way to complex. >> I do care about compatibility, but it's hard to improve things without >> changing. >> >> I'd like to propose the following: >> * Split values into 3 types ("ip|nexthop", "number", "object") >> * Do not insist on object existence, use value "0" to mimic 2.CURRENT >> behavior. >> * Retain full compatibility by introducing special value type "legacy" >> which matches any type and is backed by given indirection table. >> * Issue warning in ipfw(8) binary on all auto-created tables that >> auto-creation is legacy and this behavior will be dropped in next major >> release (e.g. 11.0) >> * Save this behavior in MFC but drop "legacy" tables in head after a >> month after actual MFC. >> >> That do you think? >>> >>> If we do it this way, we should be able to preserve backward >>> compatibility _and_ add features that people may need. >>> >>> cheers >>> luigi >>> > Here is my idea: tablearg should contain more than one value. I think > getting several values from one table lookup is faster than several > table lookups with one value. > Let tablearg be not just uint32, but array with different value types > inside it. There are some use cases where we might need 2-level value lookup (e.g. algo returning index for index table where actual data reside) and each data item can really be up to 64-bytes long. The problem is in actual partitioning and compatibility. > > For example I have many such rules: > allow src-ip 1.2.3.4 MAC any 11:22:33:44:55:66 recv vlan1234 dst-ip > 1.1.1.1 Sorry, what task are you solving by using given rules? > > These rules can be replaced with such construction: > allow src-ip table(1) MAC any tablearg[1] recv tablearg[2] dst-ip > tablearg[3] > > But I don't think indexing by value is a good idea. I think > index==starting byte is a better way: > allow src-ip table(1) MAC any tablearg:0 recv tablearg:6 dst-ip > tablearg:32 > where MAC's 6 bytes are from 0 to 5 in tablearg; iface string is from > 6 and till \0, but less than 26 bytes; and IPv4's 4 bytes are from 32 > to 35. > So we need to create table for it: > table 1 set MAC:0 string:6:26 ip:32 > table 1 add 1.2.3.4 11:22:33:44:55:66 vlan1234 1.1.1.1 > > String can be used both for iface and comment. > Other possible value types: > uint16 for nat, pipe, skipto and other 2-bytes actions > IPv4 4 bytes > CIDRv4 5 bytes > IPv6 16 bytes > CIDRv6 17 bytes > table_id 2 bytes - link to another table Well, it seems we have enough space to store most of these, however, problems seem to remain the same: typing and compatibility. When you're creating new table (or it is auto-created) which values types should be assumed ? All of them? What should `ipfw table X list` show as "value" field ? How should ipfw(8) treat "add 1.1.1.1 0" input? What will happen if we want to add another type field to this list? (MAC address of Infiniband MAC address, for example). > > Table value length can be set for example with loader tunable like > net.inet.ip.fw.table_value_length. > Even with default uint32 value length we can get 2 uint16 values or 4 > uint8 values, this can help in some configurations. > > This way is more complex, but much more flexible. It's like netgraph > subsystem. > I think it suites both Alexander and Luigi requests. > >