From owner-freebsd-net@FreeBSD.ORG Tue Aug 19 16:06:56 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 7157A417; Tue, 19 Aug 2014 16:06:56 +0000 (UTC) Received: from mail.rlan.ru (mail.rlan.ru [213.234.25.10]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id A0E843FE6; Tue, 19 Aug 2014 16:06:54 +0000 (UTC) Message-ID: <53F3760E.9070206@rlan.ru> Date: Tue, 19 Aug 2014 20:06:38 +0400 From: Dmitry Selivanov User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:24.0) Gecko/20100101 Thunderbird/24.1.1 MIME-Version: 1.0 To: "Alexander V. Chernikov" Subject: Re: ipfw named objejcts, table values and syntax change References: <53DC01DE.3000000@FreeBSD.org> <53DCA25C.1000108@FreeBSD.org> <53DF55FA.8010303@FreeBSD.org> <20140804115817.GA13814@onelab2.iet.unipi.it> <53DFE438.5050209@FreeBSD.org> <53E4BE62.4050303@rlan.ru> <53EE0A30.4020800@FreeBSD.org> <53EE16DE.9020209@rlan.ru> <53EE252D.10109@FreeBSD.org> <53F3563D.6020107@FreeBSD.org> In-Reply-To: <53F3563D.6020107@FreeBSD.org> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Cc: freebsd-ipfw , "freebsd-net@freebsd.org" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Aug 2014 16:06:56 -0000 19.08.2014 17:50, Alexander V. Chernikov пишет: > On 15.08.2014 19:20, Alexander V. Chernikov wrote: >> On 15.08.2014 18:19, Dmitry Selivanov wrote: >>> 15.08.2014 17:25, Alexander V. Chernikov пишет: >>>> On 08.08.2014 16:11, Dmitry Selivanov wrote: >>>>> 04.08.2014 23:51, Alexander V. Chernikov пишет: >>>>>> On 04.08.2014 15:58, Luigi Rizzo wrote: >>>>>>> On Mon, Aug 04, 2014 at 01:44:26PM +0400, Alexander V. Chernikov wrote: >>>>>>>> On 02.08.2014 12:33, Alexander V. Chernikov wrote: >>>>>>>>> On 02.08.2014 10:33, Luigi Rizzo wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, Aug 1, 2014 at 11:08 PM, Alexander V. Chernikov >>>>>>>>>> > wrote: >>>>>>>>>> >>>>>>>>>> Hello all. >>>>>>>>>> >>>>>>>>>> I'm currently working on to enhance ipfw in some areas. >>>>>>>>>> The most notable (and user-visible) change is named table support. >>>>>>>>>> The other one is support for different lookup algorithms for different >>>>>>>>>> key types. >>>>>>>>>> >>>>>>>>>> For example, new ipfw permits writing this: >>>>>>>>>> >>>>>>>>>> ipfw table tb1 create type cidr >>>>>>>>>> ipfw add allow ip from table(tl1) to any >>>>>>>>>> ipfw add allow ip from any lookup dst-ip tb1 >>>>>>>>>> >>>>>>>>>> ipfw table if1 create type iface >>>>>>>>>> ipfw add skipto tablearg ip from any to any via table(if1) >>>>>>>>>> >>>>>>>>>> or even this: >>>>>>>>>> ipfw table fl1 create type flow:src-ip,proto,dst-ip,dst-port >>>>>>>>>> ipfw table fl1 add 10.0.0.5,tcp,10.0.0.6,80 4444 >>>>>>>>>> ipfw add allow ip from any to any flow table(fl1) >>>>>>>>>> >>>>>>>>>> all these changes fully preserve backward compatibility. >>>>>>>>>> (actually tables needs now to be created before use and their type needs >>>>>>>>>> to match with opcode used, but new ipfw(8) performs auto-creation >>>>>>>>>> for cidr tables). >>>>>>>>>> >>>>>>>>>> There is another thing I'm going to change and I'm not sure I can keep >>>>>>>>>> the same compatibility level. >>>>>>>>>> >>>>>>>>>> Table values, from one point of view, can be classified to the following >>>>>>>>>> types: >>>>>>>>>> >>>>>>>>>> - skipto argument >>>>>>>>>> - fwd argument (*) >>>>>>>>>> - link to another object (nat, pipe, queue) >>>>>>>>>> - plain u32 (not bound to any object) >>>>>>>>>> (divert/tee,netgraph,tag/utag,limit) >>>>>>>>>> >>>>>>>>>> There are the following reasons why I think it is necessary to implement >>>>>>>>>> explicit table values typing (like tables): >>>>>>>>>> - Implementing fwd tablearg for IPv6 hosts requires indirection table >>>>>>>>>> - Converting nat/pipe instance ids to names renders values unusable >>>>>>>>>> - retiring old hack with storing saved pointer of found object/rule >>>>>>>>>> inside rule w/o proper locking >>>>>>>>>> - making faster skipto >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> ??????i don't buy the idea that you need typed arguments >>>>>>>>>> for all the cases above. Maybe the case that >>>>>>>>>> may make sense is the fwd argument (and in the future >>>>>>>>>> something else). >>>>>>>>>> We already discussed, i think, the fact that now it >>>>>>>>>> is legal to have references to non existing things >>>>>>>>>> (skipto, pipes etc.) implemented as u32. >>>>>>>>>> Removing that would break configurations. >>>>>>>>> It depends on actual implementation. This can be preserved by >>>>>>>>> auto-creating necessary objects in kernel and/or in userspace, so >>>>>>>>> we can (and should) avoid breaking in this particular way. >>>>>>>> Can you please explain your vision on values another time? >>>>>>>> As far as I understand, you're not against it in general, but the >>>>>>>> details matter: >>>>>>>> * IP address can be one of the types (it won't break much, and we can >>>>>>>> simply skip that one for MFC) >>>>>>>> * what about typing for nat/pipes ? we're not going to convert their ids >>>>>>>> to names? (or maybe you can suggest other non-disruptive way?) >>>>>>>> * everything else is type "u32" >>>>>>> >>>>>>> Correct, I am mostly concerned about the details, not on the general concept. >>>>>>> >>>>>>> To summarize the discussion Alexander and I had about converting >>>>>>> identifiers from numbers to arbitrary strings (this is partly related >>>>>>> to the values stored in tables, but I think we should have a coherent >>>>>>> behaviour) >>>>>>> >>>>>>> 1. CURRENTLY ipfw uses numeric identifiers in a small range (16 bits or less) >>>>>>> for rules, pipes, queues, tables, probably nat instances. >>>>>>> >>>>>>> 2. CURRENTLY, in all the above contexts, it is legal to reference a >>>>>>> non existing object (rule, pipe, table names, etc.), >>>>>>> and the kernel will do something reasonable, namely jump to the >>>>>>> next rule, drop traffic for non existing pipes, and so on. >>>>>>> >>>>>>> 3. of course we want to preserve backward compatibility both for >>>>>>> the ioctl interface, and for user configurations. >>>>>>> >>>>>>> 4. The in-kernel representation of identifiers is not visible to users, >>>>>>> so we can use a numeric representation in the kernel for identifiers. >>>>>>> Strings like "12345" are converted with atoi() or the like, >>>>>>> whereas for other identifiers or numbers outside of the 2^16 range >>>>>>> the kernel manages a translation table, allocating new numeric >>>>>>> identifiers if a new string appears. >>>>>>> This permits backward compatibility for old rulesets, and does not >>>>>>> impact performance because the translation table is only >>>>>>> used during rules additions or deletion. >>>>>> Yes. However this requires either holding either (1) 2 pointers (old&new >>>>>> arrays), or (2) 65k+ index array, or (3) chained hash table. >>>>>> (1) would require additional pointers for each subsystem (and some >>>>>> additional management), >>>>>> (2) will definitely upset embedded guys and >>>>>> (3) is worse in terms of performance >>>>>>> >>>>>>> With this in mind, i think we should follow a similar approach for >>>>>>> objects stored in tables, hence >>>>>>> >>>>>>> if an u32 value was available in the past, it must be >>>>>>> available also in the new implementation. >>>>>>> >>>>>>> The issue with tables is that some convoluted configuration could >>>>>>> use the same table to reference pipes _and_ rules _and_ perhaps >>>>>>> other things represented as numbers (the former is not too strange, >>>>>>> if i have a large configuration i might place sections at rules >>>>>>> 12000, 13000, 14000... and associate pipes with the same numberic >>>>>>> identifier to each block of rules). >>>>>>> >>>>>>> Typed table values would clearly disturb backward compatibility >>>>>>> in the above configurations. However it should not be difficult >>>>>>> to accept arbitrary strings as the values stored in tables, and >>>>>>> then store multiple representations as appropriate, including: >>>>>> Well, I've thought about thas one. It may be an option, but the details >>>>>> are not so promising (below) >>>>>>> - the string representation, unconditionally >>>>>>> - for names that can be resolved by DNS, the ipv6 and ipv4 address(es) >>>>>>> associated with them. ipfw already translates hostnames in rules >>>>>>> so this is POLA >>>>>> I'm not happy what ipfw(8) is doing instead of translation. The proper >>>>>> way would be not simply using first AF_INET answer but saving ALL >>>>>> IPv4+IPv6 records inside rule (and some more tracking should be done >>>>>> afterwards, but that's totally different story). Additionally, I'm >>>>>> unsure if we really need next-hop value expressed as hostname (how can >>>>>> we deal with multiple addresses and diffrent AFs?). We may store strings >>>>>> (and I think we should do it) but I'm unsure about this particular >>>>>> option of interpreting them. >>>>>>> - for other strings, a u32 from the translation table as previously >>>>>>> indicated >>>>>>> - and for numeric values, the u32 representation (truncated if needed, >>>>>>> according to whatever is the existing behaviour) >>>>>>> - >>>>>>> If we cannot generate an u32 we will put some value (e.g. 0) >>>>>>> that hopefully will not cause confusion. >>>>>> As far as I understand, we accept some string "s" as table value inside >>>>>> the kernel, than, we have some logic that says: >>>>>> oh, dummynet pipe has the same name "s"s, oh, nat entity with name "s" >>>>>> has just been created, let's save indices. >>>>>> >>>>>> That would require additional indirection table like: >>>>>> >>>>>> index | [ skipto idx | nat idx | pipe idx | queue idx | fwd index ] >>>>>> ( so we will have 2-level indirection table for fwd if we do IPv6) >>>>>> >>>>>> We can optimize this if we use "same name -> same kidx" approach >>>>>> regardless of kernel object we're refering to. That might require some >>>>>> more memory, but that's OK from my point of view. >>>>>> >>>>>> So we end up with >>>>>> int [ skipto idx | fwd idx | obj idx ] >>>>>> >>>>>> idx "0" is special value which means the same as 2.CURRENT >>>>>> >>>>>> That looks better, but still way to complex. >>>>>> I do care about compatibility, but it's hard to improve things without >>>>>> changing. >>>>>> >>>>>> I'd like to propose the following: >>>>>> * Split values into 3 types ("ip|nexthop", "number", "object") >>>>>> * Do not insist on object existence, use value "0" to mimic 2.CURRENT >>>>>> behavior. >>>>>> * Retain full compatibility by introducing special value type "legacy" >>>>>> which matches any type and is backed by given indirection table. >>>>>> * Issue warning in ipfw(8) binary on all auto-created tables that >>>>>> auto-creation is legacy and this behavior will be dropped in next major >>>>>> release (e.g. 11.0) >>>>>> * Save this behavior in MFC but drop "legacy" tables in head after a >>>>>> month after actual MFC. >>>>>> >>>>>> That do you think? >>>>>>> >>>>>>> If we do it this way, we should be able to preserve backward >>>>>>> compatibility _and_ add features that people may need. >>>>>>> >>>>>>> cheers >>>>>>> luigi >>>>>>> >>>>> Here is my idea: tablearg should contain more than one value. I think getting several values from one table lookup is faster than several table lookups with one value. >>>>> Let tablearg be not just uint32, but array with different value types inside it. >>>> There are some use cases where we might need 2-level value lookup (e.g. algo returning index for index table where actual data reside) and each data item can >>>> really be up to 64-bytes long. The problem is in actual partitioning and compatibility. >>>>> >>>>> For example I have many such rules: >>>>> allow src-ip 1.2.3.4 MAC any 11:22:33:44:55:66 recv vlan1234 dst-ip 1.1.1.1 >>>> Sorry, what task are you solving by using given rules? >>> Small ISP, clients have static IP with MAC-authorization. Src iface must be checked to prevent IP-spoofing. Dst-IP sometimes is used for p2p-channels. >>>>> >>>>> These rules can be replaced with such construction: >>>>> allow src-ip table(1) MAC any tablearg[1] recv tablearg[2] dst-ip tablearg[3] >>>>> >>>>> But I don't think indexing by value is a good idea. I think index==starting byte is a better way: >>>>> allow src-ip table(1) MAC any tablearg:0 recv tablearg:6 dst-ip tablearg:32 >>>>> where MAC's 6 bytes are from 0 to 5 in tablearg; iface string is from 6 and till \0, but less than 26 bytes; and IPv4's 4 bytes are from 32 to 35. >>>> >>>>> So we need to create table for it: >>>>> table 1 set MAC:0 string:6:26 ip:32 >>>>> table 1 add 1.2.3.4 11:22:33:44:55:66 vlan1234 1.1.1.1 >>>>> >>>>> String can be used both for iface and comment. >>>>> Other possible value types: >>>>> uint16 for nat, pipe, skipto and other 2-bytes actions >>>>> IPv4 4 bytes >>>>> CIDRv4 5 bytes >>>>> IPv6 16 bytes >>>>> CIDRv6 17 bytes >>>>> table_id 2 bytes - link to another table >>>> Well, it seems we have enough space to store most of these, however, problems seem to remain the same: typing and compatibility. >>>> When you're creating new table (or it is auto-created) which values types should be assumed ? All of them? >>> Default - as usually uint32. >> I can't see "uint32" value in the list you have specified before. I'll rephrase: >> what value types (from the list above or similar) should ipfw(8) or kernel fill in case of "default" table? >> (And once again, what should we print as value) ? >> Please think about >> a) old ipfw binaries >> b) new ipfw binaries using exactly the same ruleset they are already using (with, for example, both "skipto tablearg" and "fwd tablearg " tables). At that time I meant default table "header" is "ip:0" (in my context). It would be completely compatible with old ipfw tables. > I've increased kernel<>userland 'struct tentry' value field to 64 bytes. > It looks like we were talking about a bit different things. > Let me try to explain the problem I'm stuck with: > > We may take the road you've suggested, it looks OK: > > * by default tables are created with "all-values" mask. > * ipfw(8) value treats default "ipfw table X add Y val" input where value is u32 number as input data for each type specified in all-values without returning error > * for non-default mask value data should be validated. > > e.g. if we have table with valtype="skipto,nat,pipe,ip4,ip6" and "100" as input -> it turns to "100,100,0.0.0.0,::". I don't fully understand. One "100" value for all valtypes? Then "100" can't be equal "0.0.0.0" and "::". Or you meant "100,100,0,0" as input? > If we have value with valtype="skipto,ip6" and "100" as input -> error while the valid one would be "100,2a01::1:111", for example. > > I'm unsure how should one be able to update _specific_ value (e.g. update nat id or skipto arg), but that's not the problem. Maybe new command would help, like "ipfw table X set Y newval". > > The problem arises if we start talking about using names for nat/pipe/queue ids instead of numbers. > If we have nat instances "nat1", "11" and "23", and one specifies "44" as part of value, logic starts to be complex: > > we either require nat "44" to exists (and I'm unsure if we can auto-create it *) or start doing complex stuff like tracking all those non-existing objects: > e.g. add some special record somewhere that we're wating for nat instance "44" to be created, than auto-update given value with its kernel index, > than, do something reasonable if nat "44" instance is destroyed (OK, nat instance can't be destroyed, but pipe can). > .. and we have to do the same for pipes/queues and any following kernel object. > > Or we have to require user to reference existing objects only (create explicitly before use). This one makes things easier in code, but require user to change their scripts. > It looks like there is no consensus on that point. User can destroy object after table creating. I think this way: "no object - no packet (explicitly deny)". No need to check object existence. > > * Maybe auto-creation is not so tricky and we should try to evaluate it.. > >> >> >>>> What should `ipfw table X list` show as "value" field ? >>> I added table "header" in this line: >>> table 1 set MAC:0 string:6:26 ip:32 >> I don't think that user should be able to set any offsets in userland. Exact offsets of variable of given type needs to be enforced by kernel, >> so you may fill that you want "mac" and "ip" as values for given table, but not lengths or offsets. Does your way allow to use strings (e.g. iface or comments)? >>> So `ipfw table X list` should show something like this: >>> ---table(0)--- >>> 1.2.3.4/32 11:22:33:44:55:66 vlan1234 1.1.1.1 >>> We can also add "header" description in output (with or without additional parameter - depends on compatibility needs) like this: >>> ---table(0)--- addr MAC iface IPv4 >>>> How should ipfw(8) treat "add 1.1.1.1 0" input? >>> It should look at table "header" and return error message like "Value doesn't match table header" >> >>>> What will happen if we want to add another type field to this list? (MAC address of Infiniband MAC address, for example). >>> I don't think there is a sense to mix both MAC[6] and MAC[20] values in 1 table. It is easier to create 2 tables with different "headers". >>> For Infiniband we can add another type: MAC20 (or something like this). Or we can use "MAC"-type like string type(see above): MAC:6:25 (1st and last bytes, or 1st and length). >>>> >>>>> >>>>> Table value length can be set for example with loader tunable like net.inet.ip.fw.table_value_length. >>>>> Even with default uint32 value length we can get 2 uint16 values or 4 uint8 values, this can help in some configurations. >>>>> >>>>> This way is more complex, but much more flexible. It's like netgraph subsystem. >>>>> I think it suites both Alexander and Luigi requests. >>>>> >>>>>