From owner-freebsd-ipfw@FreeBSD.ORG  Fri Aug 15 14:19:16 2014
Return-Path: <owner-freebsd-ipfw@FreeBSD.ORG>
Delivered-To: freebsd-ipfw@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id AE9111DF;
 Fri, 15 Aug 2014 14:19:16 +0000 (UTC)
Received: from mail.rlan.ru (mail.rlan.ru [213.234.25.10])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 03C372540;
 Fri, 15 Aug 2014 14:19:15 +0000 (UTC)
Message-ID: <53EE16DE.9020209@rlan.ru>
Date: Fri, 15 Aug 2014 18:19:10 +0400
From: Dmitry Selivanov <sd@rlan.ru>
User-Agent: Mozilla/5.0 (Windows NT 5.1;
 rv:24.0) Gecko/20100101 Thunderbird/24.1.1
MIME-Version: 1.0
To: "Alexander V. Chernikov" <melifaro@FreeBSD.org>
Subject: Re: ipfw named objejcts, table values and syntax change
References: <53DC01DE.3000000@FreeBSD.org>
 <CA+hQ2+gNjA0rucTYAaPYQKtEMt9GZLC6RCi+OgPVRpuDC5Ei7Q@mail.gmail.com>
 <53DCA25C.1000108@FreeBSD.org> <53DF55FA.8010303@FreeBSD.org>
 <20140804115817.GA13814@onelab2.iet.unipi.it> <53DFE438.5050209@FreeBSD.org>
 <53E4BE62.4050303@rlan.ru> <53EE0A30.4020800@FreeBSD.org>
In-Reply-To: <53EE0A30.4020800@FreeBSD.org>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Cc: freebsd-ipfw <freebsd-ipfw@freebsd.org>,
 "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
X-BeenThere: freebsd-ipfw@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: IPFW Technical Discussions <freebsd-ipfw.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-ipfw>,
 <mailto:freebsd-ipfw-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-ipfw/>
List-Post: <mailto:freebsd-ipfw@freebsd.org>
List-Help: <mailto:freebsd-ipfw-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-ipfw>,
 <mailto:freebsd-ipfw-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 15 Aug 2014 14:19:16 -0000

15.08.2014 17:25, Alexander V. Chernikov пишет:
> On 08.08.2014 16:11, Dmitry Selivanov wrote:
>> 04.08.2014 23:51, Alexander V. Chernikov пишет:
>>> On 04.08.2014 15:58, Luigi Rizzo wrote:
>>>> On Mon, Aug 04, 2014 at 01:44:26PM +0400, Alexander V. Chernikov wrote:
>>>>> On 02.08.2014 12:33, Alexander V. Chernikov wrote:
>>>>>> On 02.08.2014 10:33, Luigi Rizzo wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Aug 1, 2014 at 11:08 PM, Alexander V. Chernikov
>>>>>>> <melifaro@freebsd.org <mailto:melifaro@freebsd.org>> wrote:
>>>>>>>
>>>>>>>       Hello all.
>>>>>>>
>>>>>>>       I'm currently working on to enhance ipfw in some areas.
>>>>>>>       The most notable (and user-visible) change is named table support.
>>>>>>>       The other one is support for different lookup algorithms for different
>>>>>>>       key types.
>>>>>>>
>>>>>>>       For example, new ipfw permits writing this:
>>>>>>>
>>>>>>>       ipfw table tb1 create type cidr
>>>>>>>       ipfw add allow ip from table(tl1) to any
>>>>>>>       ipfw add allow ip from any lookup dst-ip tb1
>>>>>>>
>>>>>>>       ipfw table if1 create type iface
>>>>>>>       ipfw add skipto tablearg ip from any to any via table(if1)
>>>>>>>
>>>>>>>       or even this:
>>>>>>>       ipfw table fl1 create type flow:src-ip,proto,dst-ip,dst-port
>>>>>>>       ipfw table fl1 add 10.0.0.5,tcp,10.0.0.6,80 4444
>>>>>>>       ipfw add allow ip from any to any flow table(fl1)
>>>>>>>
>>>>>>>       all these changes fully preserve backward compatibility.
>>>>>>>       (actually tables needs now to be created before use and their type needs
>>>>>>>       to match with opcode used, but new ipfw(8) performs auto-creation
>>>>>>>       for cidr tables).
>>>>>>>
>>>>>>>       There is another thing I'm going to change and I'm not sure I can keep
>>>>>>>       the same compatibility level.
>>>>>>>
>>>>>>>       Table values, from one point of view, can be classified to the following
>>>>>>>       types:
>>>>>>>
>>>>>>>       - skipto argument
>>>>>>>       - fwd argument (*)
>>>>>>>       - link to another object (nat, pipe, queue)
>>>>>>>       - plain u32 (not bound to any object)
>>>>>>>       (divert/tee,netgraph,tag/utag,limit)
>>>>>>>
>>>>>>>       There are the following reasons why I think it is necessary to implement
>>>>>>>       explicit table values typing (like tables):
>>>>>>>       - Implementing fwd tablearg for IPv6 hosts requires indirection table
>>>>>>>       - Converting nat/pipe instance ids to names renders values unusable
>>>>>>>       - retiring old hack with storing saved pointer of found object/rule
>>>>>>>       inside rule w/o proper locking
>>>>>>>       - making faster skipto
>>>>>>>
>>>>>>>
>>>>>>> ??????i don't buy the idea that you need typed arguments
>>>>>>> for all the cases above. Maybe the case that
>>>>>>> may make sense is the fwd argument (and in the future
>>>>>>> something else).
>>>>>>> We already discussed, i think, the fact that now it
>>>>>>> is legal to have references to non existing things
>>>>>>> (skipto, pipes etc.) implemented as u32.
>>>>>>> Removing that would break configurations.
>>>>>> It depends on actual implementation. This can be preserved by
>>>>>> auto-creating necessary objects in kernel and/or in userspace, so
>>>>>> we can (and should) avoid breaking in this particular way.
>>>>> Can you please explain your vision on values another time?
>>>>> As far as I understand, you're not against it in general, but the
>>>>> details matter:
>>>>> * IP address can be one of the types (it won't break much, and we can
>>>>> simply skip that one for MFC)
>>>>> * what about typing for nat/pipes ? we're not going to convert their ids
>>>>> to names? (or maybe you can suggest other non-disruptive way?)
>>>>> * everything else is type "u32"
>>>>
>>>> Correct, I am mostly concerned about the details, not on the general concept.
>>>>
>>>> To summarize the discussion Alexander and I had about converting
>>>> identifiers from numbers to arbitrary strings (this is partly related
>>>> to the values stored in tables, but I think we should have a coherent
>>>> behaviour)
>>>>
>>>> 1. CURRENTLY ipfw uses numeric identifiers in a small range (16 bits or less)
>>>>     for rules, pipes, queues, tables, probably nat instances.
>>>>
>>>> 2. CURRENTLY, in all the above contexts, it is legal to reference a
>>>>     non existing object (rule, pipe, table names, etc.),
>>>>     and the kernel will do something reasonable, namely jump to the
>>>>     next rule, drop traffic for non existing pipes, and so on.
>>>>
>>>> 3. of course we want to preserve backward compatibility both for
>>>>     the ioctl interface, and for user configurations.
>>>>
>>>> 4. The in-kernel representation of identifiers is not visible to users,
>>>>     so we can use a numeric representation in the kernel for identifiers.
>>>>     Strings like "12345" are converted with atoi() or the like,
>>>>     whereas for other identifiers or numbers outside of the 2^16 range
>>>>     the kernel manages a translation table, allocating new numeric
>>>>     identifiers if a new string appears.
>>>>     This permits backward compatibility for old rulesets, and does not
>>>>     impact performance because the translation table is only
>>>>     used during rules additions or deletion.
>>> Yes. However this requires either holding either (1) 2 pointers (old&new
>>> arrays), or (2) 65k+ index array, or (3) chained hash table.
>>> (1) would require additional pointers for each subsystem (and some
>>> additional management),
>>> (2) will definitely upset embedded guys and
>>> (3) is worse in terms of performance
>>>>
>>>> With this in mind, i think we should follow a similar approach for
>>>> objects stored in tables, hence
>>>>
>>>>     if an u32 value was available in the past, it must be
>>>>     available also in the new implementation.
>>>>
>>>> The issue with tables is that some convoluted configuration could
>>>> use the same table to reference pipes _and_ rules _and_ perhaps
>>>> other things represented as numbers (the former is not too strange,
>>>> if i have a large configuration i might place sections at rules
>>>> 12000, 13000, 14000... and associate pipes with the same numberic
>>>> identifier to each block of rules).
>>>>
>>>> Typed table values would clearly disturb backward compatibility
>>>> in the above configurations. However it should not be difficult
>>>> to accept arbitrary strings as the values stored in tables, and
>>>> then store multiple representations as appropriate, including:
>>> Well, I've thought about thas one. It may be an option, but the details
>>> are not so promising (below)
>>>> - the string representation, unconditionally
>>>> - for names that can be resolved by DNS, the ipv6 and ipv4 address(es)
>>>>    associated with them. ipfw already translates hostnames in rules
>>>>    so this is POLA
>>> I'm not happy what ipfw(8) is doing instead of translation. The proper
>>> way would be not simply using first AF_INET answer but saving ALL
>>> IPv4+IPv6 records inside rule (and some more tracking should be done
>>> afterwards, but that's totally different story). Additionally, I'm
>>> unsure if we really need next-hop value expressed as hostname (how can
>>> we deal with multiple addresses and diffrent AFs?). We may store strings
>>> (and I think we should do it) but I'm unsure about this particular
>>> option of interpreting them.
>>>> - for other strings, a u32 from the translation table as previously
>>>>    indicated
>>>> - and for numeric values, the u32 representation (truncated if needed,
>>>>    according to whatever is the existing behaviour)
>>>> - <add other representations if needed>
>>>> If we cannot generate an u32 we will put some value (e.g. 0)
>>>> that hopefully will not cause confusion.
>>> As far as I understand, we accept some string "s" as table value inside
>>> the kernel, than, we have some logic that says:
>>> oh, dummynet pipe has the same name "s"s, oh, nat entity with name "s"
>>> has just been created, let's save indices.
>>>
>>> That would require additional indirection table like:
>>>
>>> index | [ skipto idx | nat idx | pipe idx | queue idx | fwd index ]
>>> ( so we will have 2-level indirection table for fwd if we do IPv6)
>>>
>>> We can optimize this if we use "same name -> same kidx" approach
>>> regardless of kernel object we're refering to. That might require some
>>> more memory, but that's OK from my point of view.
>>>
>>> So we end up with
>>> int [ skipto idx | fwd idx | obj idx ]
>>>
>>> idx "0" is special value which means the same as 2.CURRENT
>>>
>>> That looks better, but still way to complex.
>>> I do care about compatibility, but it's hard to improve things without
>>> changing.
>>>
>>> I'd like to propose the following:
>>> * Split values into 3 types ("ip|nexthop", "number", "object")
>>> * Do not insist on object existence, use value "0" to mimic 2.CURRENT
>>> behavior.
>>> * Retain full compatibility by introducing special value type "legacy"
>>>    which matches any type and is backed by given indirection table.
>>> * Issue warning in ipfw(8) binary on all auto-created tables that
>>> auto-creation is legacy and this behavior will be dropped in next major
>>> release (e.g. 11.0)
>>> * Save this behavior in MFC but drop "legacy" tables in head after a
>>> month after actual MFC.
>>>
>>> That do you think?
>>>>
>>>> If we do it this way, we should be able to preserve backward
>>>> compatibility _and_ add features that people may need.
>>>>
>>>> cheers
>>>> luigi
>>>>
>> Here is my idea: tablearg should contain more than one value. I think getting several values from one table lookup is faster than several table lookups with one value.
>> Let tablearg be not just uint32, but array with different value types inside it.
> There are some use cases where we might need 2-level value lookup (e.g. algo returning index for index table where actual data reside) and each data item can
> really be up to 64-bytes long. The problem is in actual partitioning and compatibility.
>>
>> For example I have many such rules:
>> allow src-ip 1.2.3.4 MAC any 11:22:33:44:55:66 recv vlan1234 dst-ip 1.1.1.1
> Sorry, what task are you solving by using given rules?
Small ISP, clients have static IP with MAC-authorization. Src iface must be checked to prevent IP-spoofing. Dst-IP sometimes is used for p2p-channels.
>>
>> These rules can be replaced with such construction:
>> allow src-ip table(1) MAC any tablearg[1] recv tablearg[2] dst-ip tablearg[3]
>>
>> But I don't think indexing by value is a good idea. I think index==starting byte is a better way:
>> allow src-ip table(1) MAC any tablearg:0 recv tablearg:6 dst-ip tablearg:32
>> where MAC's 6 bytes are from 0 to 5 in tablearg; iface string is from 6 and till \0, but less than 26 bytes; and IPv4's 4 bytes are from 32 to 35.
>
>> So we need to create table for it:
>> table 1 set MAC:0 string:6:26 ip:32
>> table 1 add 1.2.3.4 11:22:33:44:55:66 vlan1234 1.1.1.1
>>
>> String can be used both for iface and comment.
>> Other possible value types:
>> uint16 for nat, pipe, skipto and other 2-bytes actions
>> IPv4 4 bytes
>> CIDRv4 5 bytes
>> IPv6 16 bytes
>> CIDRv6 17 bytes
>> table_id 2 bytes - link to another table
> Well, it seems we have enough space to store most of these, however, problems seem to remain the same: typing and compatibility.
> When you're creating new table (or it is auto-created) which values types should be assumed ? All of them?
Default - as usually uint32.
> What should `ipfw table X list` show as "value" field ?
I added table "header" in this line:
table 1 set MAC:0 string:6:26 ip:32
So `ipfw table X list` should show something like this:
---table(0)---
1.2.3.4/32 11:22:33:44:55:66 vlan1234 1.1.1.1
We can also add "header" description in output (with or without additional parameter - depends on compatibility needs) like this:
---table(0)--- addr MAC iface IPv4
> How should ipfw(8) treat "add 1.1.1.1 0" input?
It should look at table "header" and return error message like "Value doesn't match table header"
> What will happen if we want to add another type field to this list? (MAC address of Infiniband MAC address, for example).
I don't think there is a sense to mix both MAC[6] and MAC[20] values in 1 table. It is easier to create 2 tables with different "headers".
For Infiniband we can add another type: MAC20 (or something like this). Or we can use "MAC"-type like string type(see above): MAC:6:25 (1st and last bytes, or 1st and length).
>
>>
>> Table value length can be set for example with loader tunable like net.inet.ip.fw.table_value_length.
>> Even with default uint32 value length we can get 2 uint16 values or 4 uint8 values, this can help in some configurations.
>>
>> This way is more complex, but much more flexible. It's like netgraph subsystem.
>> I think it suites both Alexander and Luigi requests.
>>
>>
>