From owner-freebsd-ipfw@FreeBSD.ORG Fri Aug 15 14:19:16 2014 Return-Path: Delivered-To: freebsd-ipfw@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id AE9111DF; Fri, 15 Aug 2014 14:19:16 +0000 (UTC) Received: from mail.rlan.ru (mail.rlan.ru [213.234.25.10]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 03C372540; Fri, 15 Aug 2014 14:19:15 +0000 (UTC) Message-ID: <53EE16DE.9020209@rlan.ru> Date: Fri, 15 Aug 2014 18:19:10 +0400 From: Dmitry Selivanov User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:24.0) Gecko/20100101 Thunderbird/24.1.1 MIME-Version: 1.0 To: "Alexander V. Chernikov" Subject: Re: ipfw named objejcts, table values and syntax change References: <53DC01DE.3000000@FreeBSD.org> <53DCA25C.1000108@FreeBSD.org> <53DF55FA.8010303@FreeBSD.org> <20140804115817.GA13814@onelab2.iet.unipi.it> <53DFE438.5050209@FreeBSD.org> <53E4BE62.4050303@rlan.ru> <53EE0A30.4020800@FreeBSD.org> In-Reply-To: <53EE0A30.4020800@FreeBSD.org> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Cc: freebsd-ipfw , "freebsd-net@freebsd.org" X-BeenThere: freebsd-ipfw@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: IPFW Technical Discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 15 Aug 2014 14:19:16 -0000 15.08.2014 17:25, Alexander V. Chernikov пишет: > On 08.08.2014 16:11, Dmitry Selivanov wrote: >> 04.08.2014 23:51, Alexander V. Chernikov пишет: >>> On 04.08.2014 15:58, Luigi Rizzo wrote: >>>> On Mon, Aug 04, 2014 at 01:44:26PM +0400, Alexander V. Chernikov wrote: >>>>> On 02.08.2014 12:33, Alexander V. Chernikov wrote: >>>>>> On 02.08.2014 10:33, Luigi Rizzo wrote: >>>>>>> >>>>>>> >>>>>>> On Fri, Aug 1, 2014 at 11:08 PM, Alexander V. Chernikov >>>>>>> > wrote: >>>>>>> >>>>>>> Hello all. >>>>>>> >>>>>>> I'm currently working on to enhance ipfw in some areas. >>>>>>> The most notable (and user-visible) change is named table support. >>>>>>> The other one is support for different lookup algorithms for different >>>>>>> key types. >>>>>>> >>>>>>> For example, new ipfw permits writing this: >>>>>>> >>>>>>> ipfw table tb1 create type cidr >>>>>>> ipfw add allow ip from table(tl1) to any >>>>>>> ipfw add allow ip from any lookup dst-ip tb1 >>>>>>> >>>>>>> ipfw table if1 create type iface >>>>>>> ipfw add skipto tablearg ip from any to any via table(if1) >>>>>>> >>>>>>> or even this: >>>>>>> ipfw table fl1 create type flow:src-ip,proto,dst-ip,dst-port >>>>>>> ipfw table fl1 add 10.0.0.5,tcp,10.0.0.6,80 4444 >>>>>>> ipfw add allow ip from any to any flow table(fl1) >>>>>>> >>>>>>> all these changes fully preserve backward compatibility. >>>>>>> (actually tables needs now to be created before use and their type needs >>>>>>> to match with opcode used, but new ipfw(8) performs auto-creation >>>>>>> for cidr tables). >>>>>>> >>>>>>> There is another thing I'm going to change and I'm not sure I can keep >>>>>>> the same compatibility level. >>>>>>> >>>>>>> Table values, from one point of view, can be classified to the following >>>>>>> types: >>>>>>> >>>>>>> - skipto argument >>>>>>> - fwd argument (*) >>>>>>> - link to another object (nat, pipe, queue) >>>>>>> - plain u32 (not bound to any object) >>>>>>> (divert/tee,netgraph,tag/utag,limit) >>>>>>> >>>>>>> There are the following reasons why I think it is necessary to implement >>>>>>> explicit table values typing (like tables): >>>>>>> - Implementing fwd tablearg for IPv6 hosts requires indirection table >>>>>>> - Converting nat/pipe instance ids to names renders values unusable >>>>>>> - retiring old hack with storing saved pointer of found object/rule >>>>>>> inside rule w/o proper locking >>>>>>> - making faster skipto >>>>>>> >>>>>>> >>>>>>> ??????i don't buy the idea that you need typed arguments >>>>>>> for all the cases above. Maybe the case that >>>>>>> may make sense is the fwd argument (and in the future >>>>>>> something else). >>>>>>> We already discussed, i think, the fact that now it >>>>>>> is legal to have references to non existing things >>>>>>> (skipto, pipes etc.) implemented as u32. >>>>>>> Removing that would break configurations. >>>>>> It depends on actual implementation. This can be preserved by >>>>>> auto-creating necessary objects in kernel and/or in userspace, so >>>>>> we can (and should) avoid breaking in this particular way. >>>>> Can you please explain your vision on values another time? >>>>> As far as I understand, you're not against it in general, but the >>>>> details matter: >>>>> * IP address can be one of the types (it won't break much, and we can >>>>> simply skip that one for MFC) >>>>> * what about typing for nat/pipes ? we're not going to convert their ids >>>>> to names? (or maybe you can suggest other non-disruptive way?) >>>>> * everything else is type "u32" >>>> >>>> Correct, I am mostly concerned about the details, not on the general concept. >>>> >>>> To summarize the discussion Alexander and I had about converting >>>> identifiers from numbers to arbitrary strings (this is partly related >>>> to the values stored in tables, but I think we should have a coherent >>>> behaviour) >>>> >>>> 1. CURRENTLY ipfw uses numeric identifiers in a small range (16 bits or less) >>>> for rules, pipes, queues, tables, probably nat instances. >>>> >>>> 2. CURRENTLY, in all the above contexts, it is legal to reference a >>>> non existing object (rule, pipe, table names, etc.), >>>> and the kernel will do something reasonable, namely jump to the >>>> next rule, drop traffic for non existing pipes, and so on. >>>> >>>> 3. of course we want to preserve backward compatibility both for >>>> the ioctl interface, and for user configurations. >>>> >>>> 4. The in-kernel representation of identifiers is not visible to users, >>>> so we can use a numeric representation in the kernel for identifiers. >>>> Strings like "12345" are converted with atoi() or the like, >>>> whereas for other identifiers or numbers outside of the 2^16 range >>>> the kernel manages a translation table, allocating new numeric >>>> identifiers if a new string appears. >>>> This permits backward compatibility for old rulesets, and does not >>>> impact performance because the translation table is only >>>> used during rules additions or deletion. >>> Yes. However this requires either holding either (1) 2 pointers (old&new >>> arrays), or (2) 65k+ index array, or (3) chained hash table. >>> (1) would require additional pointers for each subsystem (and some >>> additional management), >>> (2) will definitely upset embedded guys and >>> (3) is worse in terms of performance >>>> >>>> With this in mind, i think we should follow a similar approach for >>>> objects stored in tables, hence >>>> >>>> if an u32 value was available in the past, it must be >>>> available also in the new implementation. >>>> >>>> The issue with tables is that some convoluted configuration could >>>> use the same table to reference pipes _and_ rules _and_ perhaps >>>> other things represented as numbers (the former is not too strange, >>>> if i have a large configuration i might place sections at rules >>>> 12000, 13000, 14000... and associate pipes with the same numberic >>>> identifier to each block of rules). >>>> >>>> Typed table values would clearly disturb backward compatibility >>>> in the above configurations. However it should not be difficult >>>> to accept arbitrary strings as the values stored in tables, and >>>> then store multiple representations as appropriate, including: >>> Well, I've thought about thas one. It may be an option, but the details >>> are not so promising (below) >>>> - the string representation, unconditionally >>>> - for names that can be resolved by DNS, the ipv6 and ipv4 address(es) >>>> associated with them. ipfw already translates hostnames in rules >>>> so this is POLA >>> I'm not happy what ipfw(8) is doing instead of translation. The proper >>> way would be not simply using first AF_INET answer but saving ALL >>> IPv4+IPv6 records inside rule (and some more tracking should be done >>> afterwards, but that's totally different story). Additionally, I'm >>> unsure if we really need next-hop value expressed as hostname (how can >>> we deal with multiple addresses and diffrent AFs?). We may store strings >>> (and I think we should do it) but I'm unsure about this particular >>> option of interpreting them. >>>> - for other strings, a u32 from the translation table as previously >>>> indicated >>>> - and for numeric values, the u32 representation (truncated if needed, >>>> according to whatever is the existing behaviour) >>>> - >>>> If we cannot generate an u32 we will put some value (e.g. 0) >>>> that hopefully will not cause confusion. >>> As far as I understand, we accept some string "s" as table value inside >>> the kernel, than, we have some logic that says: >>> oh, dummynet pipe has the same name "s"s, oh, nat entity with name "s" >>> has just been created, let's save indices. >>> >>> That would require additional indirection table like: >>> >>> index | [ skipto idx | nat idx | pipe idx | queue idx | fwd index ] >>> ( so we will have 2-level indirection table for fwd if we do IPv6) >>> >>> We can optimize this if we use "same name -> same kidx" approach >>> regardless of kernel object we're refering to. That might require some >>> more memory, but that's OK from my point of view. >>> >>> So we end up with >>> int [ skipto idx | fwd idx | obj idx ] >>> >>> idx "0" is special value which means the same as 2.CURRENT >>> >>> That looks better, but still way to complex. >>> I do care about compatibility, but it's hard to improve things without >>> changing. >>> >>> I'd like to propose the following: >>> * Split values into 3 types ("ip|nexthop", "number", "object") >>> * Do not insist on object existence, use value "0" to mimic 2.CURRENT >>> behavior. >>> * Retain full compatibility by introducing special value type "legacy" >>> which matches any type and is backed by given indirection table. >>> * Issue warning in ipfw(8) binary on all auto-created tables that >>> auto-creation is legacy and this behavior will be dropped in next major >>> release (e.g. 11.0) >>> * Save this behavior in MFC but drop "legacy" tables in head after a >>> month after actual MFC. >>> >>> That do you think? >>>> >>>> If we do it this way, we should be able to preserve backward >>>> compatibility _and_ add features that people may need. >>>> >>>> cheers >>>> luigi >>>> >> Here is my idea: tablearg should contain more than one value. I think getting several values from one table lookup is faster than several table lookups with one value. >> Let tablearg be not just uint32, but array with different value types inside it. > There are some use cases where we might need 2-level value lookup (e.g. algo returning index for index table where actual data reside) and each data item can > really be up to 64-bytes long. The problem is in actual partitioning and compatibility. >> >> For example I have many such rules: >> allow src-ip 1.2.3.4 MAC any 11:22:33:44:55:66 recv vlan1234 dst-ip 1.1.1.1 > Sorry, what task are you solving by using given rules? Small ISP, clients have static IP with MAC-authorization. Src iface must be checked to prevent IP-spoofing. Dst-IP sometimes is used for p2p-channels. >> >> These rules can be replaced with such construction: >> allow src-ip table(1) MAC any tablearg[1] recv tablearg[2] dst-ip tablearg[3] >> >> But I don't think indexing by value is a good idea. I think index==starting byte is a better way: >> allow src-ip table(1) MAC any tablearg:0 recv tablearg:6 dst-ip tablearg:32 >> where MAC's 6 bytes are from 0 to 5 in tablearg; iface string is from 6 and till \0, but less than 26 bytes; and IPv4's 4 bytes are from 32 to 35. > >> So we need to create table for it: >> table 1 set MAC:0 string:6:26 ip:32 >> table 1 add 1.2.3.4 11:22:33:44:55:66 vlan1234 1.1.1.1 >> >> String can be used both for iface and comment. >> Other possible value types: >> uint16 for nat, pipe, skipto and other 2-bytes actions >> IPv4 4 bytes >> CIDRv4 5 bytes >> IPv6 16 bytes >> CIDRv6 17 bytes >> table_id 2 bytes - link to another table > Well, it seems we have enough space to store most of these, however, problems seem to remain the same: typing and compatibility. > When you're creating new table (or it is auto-created) which values types should be assumed ? All of them? Default - as usually uint32. > What should `ipfw table X list` show as "value" field ? I added table "header" in this line: table 1 set MAC:0 string:6:26 ip:32 So `ipfw table X list` should show something like this: ---table(0)--- 1.2.3.4/32 11:22:33:44:55:66 vlan1234 1.1.1.1 We can also add "header" description in output (with or without additional parameter - depends on compatibility needs) like this: ---table(0)--- addr MAC iface IPv4 > How should ipfw(8) treat "add 1.1.1.1 0" input? It should look at table "header" and return error message like "Value doesn't match table header" > What will happen if we want to add another type field to this list? (MAC address of Infiniband MAC address, for example). I don't think there is a sense to mix both MAC[6] and MAC[20] values in 1 table. It is easier to create 2 tables with different "headers". For Infiniband we can add another type: MAC20 (or something like this). Or we can use "MAC"-type like string type(see above): MAC:6:25 (1st and last bytes, or 1st and length). > >> >> Table value length can be set for example with loader tunable like net.inet.ip.fw.table_value_length. >> Even with default uint32 value length we can get 2 uint16 values or 4 uint8 values, this can help in some configurations. >> >> This way is more complex, but much more flexible. It's like netgraph subsystem. >> I think it suites both Alexander and Luigi requests. >> >> >