From owner-freebsd-net@FreeBSD.ORG  Thu May 22 15:34:55 2014
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 36BE94EB;
 Thu, 22 May 2014 15:34:55 +0000 (UTC)
Received: from mail.ipfw.ru (mail.ipfw.ru [IPv6:2a01:4f8:120:6141::2])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id BDB672E10;
 Thu, 22 May 2014 15:34:54 +0000 (UTC)
Received: from [2a02:6b8:0:401:222:4dff:fe50:cd2f] (helo=ptichko.yndx.net)
 by mail.ipfw.ru with esmtpsa (TLSv1:CAMELLIA256-SHA:256)
 (Exim 4.76 (FreeBSD)) (envelope-from <melifaro@FreeBSD.org>)
 id 1WnR6m-000KZo-P0; Thu, 22 May 2014 15:24:24 +0400
Message-ID: <537E18D3.2010201@FreeBSD.org>
Date: Thu, 22 May 2014 19:33:39 +0400
From: "Alexander V. Chernikov" <melifaro@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:24.0) Gecko/20100101 Thunderbird/24.0.1
MIME-Version: 1.0
To: Luigi Rizzo <rizzo@iet.unipi.it>
Subject: Re: [CFT]: ipfw named tables / different tabletypes
References: <5379FE3C.6060501@FreeBSD.org>
 <20140521111002.GB62462@onelab2.iet.unipi.it> <537CEC12.8050404@FreeBSD.org>
 <20140521204826.GA67124@onelab2.iet.unipi.it> <537E1029.70007@FreeBSD.org>
In-Reply-To: <537E1029.70007@FreeBSD.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Luigi Rizzo <luigi@freebsd.org>, FreeBSD Net <net@freebsd.org>
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net/>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 22 May 2014 15:34:55 -0000

On 22.05.2014 18:56, Alexander V. Chernikov wrote:

It looks like we have reached some kind of consensus on table naming,
so I'm going to implement the following as the first part:

* named-only tables, no "user-visible" indexes
* Keep the same opcodes, use additional TLVs to pass names in rules
* Use explicit userland object names retrieval while listing
* Make the previous ones easily extendable for other ipfw objects
* Introduce table references and explicit typecasting (while permitting 
user to refernce non-existing tables)

* leave table atomics for one the next stages


Are you OK with this?

> On 22.05.2014 00:48, Luigi Rizzo wrote:
>> On Wed, May 21, 2014 at 10:10:26PM +0400, Alexander V. Chernikov wrote:
>>> On 21.05.2014 15:10, Luigi Rizzo wrote:
>>>> On Mon, May 19, 2014 at 04:51:08PM +0400, Alexander V. Chernikov 
>>>> wrote:
>>>>> Hello list!
>>>>>
>>>>> This patch adds ability to name tables / reference them by name.
>>>>> Additionally, it simplifies adding new table types.
>>>> Hi Alex,
>>>> at a high level, i think your changes here are in the right direction.
>>> Hello!
>>>> However i think part of the design, and also of the patch below,
>>>> is not sound/correct so i would like to wait to commit at least
>>>> until some of the problems below are resolved.
>>>>
>>>> 1. The patch as is has several issues (variable declarations in the
>>>>      middle of a block, assignment in conditionals, incorrect
>>>>      comments in cut&pasted code, missing checks on strings)
>>>>      that should be corrected.
>>> ..missing documentation and so on :)
>>> Of course, I'll fix all these (or most of these :))
>> good thanks
>>
>>>> 3. there is no explanation, here or in the code, on how the
>>>>      names are managed. I could try to figure out from the code
>>>>      but it would be the wrong way to understand things so let me
>>>>      ask, and please explain what you have in mind.
>>> Currently it is very simple non-resizable hash table with fnv hash 
>>> based
>>> on table name.
>> that is not an issue. The question is whether one needs to lookup
>> the hash table every time you have a 'table' argument (of course i
>> think one should not, and the implementation i propose below gives
>> direct access to the table without name lookups, as the internal
>> identifier is still poiniter or small integer, just one that is not 
>> exposed
>> to userspace).
>>
>>>> Let me address first the name <-> table-id thing.
>>>>
>>>> Introducing a symbolic name for tables is a great and useful feature.
>>>> However the implementation has some tricky issues:
>>>>
>>>> - do you want to retain the old numeric identifiers or not ?
>>>>     I think it is a bad idea to have two namespaces for the same 
>>>> thing,
>>>>     so if we are switching to alphanumeric strings for tables we 
>>>> should
>>>>     as well get rid of the numbers (i.e. consider them as strings 
>>>> as well).
>>>>
>>>>     I am strongly in favour of not using names as aliases for numbers.
>>>>     It would require no changes for clients issuing ipfw commands
>>>>     from a script, and would not require users to to manually handle
>>>>     the name-id translations.
>>> Well. I'd prefer not to. However, code we're discussing assumes that
>>> numeric ids
>>> are primary ones (e.g. you can't have named, but unnumbered table,
>>> you have to choose number by yourself). Switching to named-only tables
>>> can be tricky since we don't want to match them by inside rules and we
>>> don't want
>>> to loose atomicity when allocating table ids via separate cmds before
>>> adding rule.
>> i think this is solved by the implementation i proposed below.
>>
>>> It looks like we can do the following:
>>> 1) Add another IP_FW3_ADD opcode with the following layout:
>>>
>>> {
>>> rule len;
>>> unmodified rule itself;
>>> tlv1 {type=table name;len=..;id=1;"_TABLENAME11_"}
>>> tlv2 {type=table name;len=..;id=2;"_TABLENAME4_"}
>>> ..
>>> }
>>> Values inside appropriate opcodes { O_IP_XXX_LOOKUP and so on } 
>>> won't be
>>> real values
>>> but values described in attached TLVs.
>>>
>>> After validating/parsing/checking under UH lock we replace fake values
>>> with real ones (probably, newly allocated)
>>> and return or rollback atomically.
>> yes this should work, probably not even requiring a new IP_FW3_ADD 
>> opcode
>> because the extra tlv entries can appear in a block by themselves.
>>
>>> The same thing can be done for displaying ruleset, however I'd prefer
>>> client to ask for table names and caching them while displaying.
>>>
>>> Old clients will retain the ability to operate on tables but will see
>>> table ids, so nothing will change for them
>>> (except the case when new binary adds table "5" and new binary sees id
>>> which is not 5, but that shouldn't be a problem).
>> we can solve this by using 'low' numbers for the numeric tables
>> (these were limited anyways) and allocate the fake entries in
>> another range.
> Currently we have u16 space available in base opcode.
> Introducing another range will require additional opcode, additional 
> array for new tables and so on.
> I'd prefer not to do this. Since table ids are allocated by ourselves 
> we can (and should) pack them
> efficiently and 65k _real tables_ currently available is a quite good 
> value.
>
> We preserve compability for old binaries so people with old userland 
> and scripts should
> not not notice anything. We have no public userland API so the only 
> base binary is /sbin/ipfw which is either old or new.
>
>>
>>>> - The rename command worries me a lot:
>>>>
>>>>     > ipfw table <num> name XXX
>>>>     > Names (or renames) table. Not the name has to be unique.
>>>>
>>>>     because it is neither clear nor intuitive whether you want to
>>>>     1. just rename a table (possibly breaking or creating references
>>>>        for rules in the firewall), or
>>>>     2. modify the name-id translation preserving existing pointers.
>>>>
>>>>     Consider the sequence
>>>>     ipfw table 13 name bar
>>>>           ipfw add 100 count dst-ip table bar
>>>>           ipfw table 13 name foo
>>>>           ipfw table 14 name bar
>>>>           ipfw add 200 count src-port 22 dst-ip table bar
>>>>
>>>>      Approach #1 would detach rule 100 from table 13 and then 
>>>> connect to 14
>>>>      (in a non-atomic way), whereas approach #2 would make rule 100 
>>>> and 200
>>>>      point to different tables (which is confusing).
>>>>
>>>>      Now part of this problem goes away if we get rid of number 
>>>> altogether.
>>>>
>>>>      You may legitimately want to swap tables in an atomic way (we 
>>>> have something
>>>>      similar for rulesets):
>>>>     ipfw set swap number number
>>> There is some problem here:
>>> Atomic ruleset changes is a great thing, but tables have no atomic 
>>> support.
>>> We, for example, are solving this by using different table range on
>>> every new configuration.
>>> It won't be possible with named-only tables (since names usually care
>>> some semantic in contrast to numbers).
>>> The only thing I can think of is separate namespace per each set.
>>> What do you think?
>> maybe i am missing some detail but it seems reasonably easy to implement
>> the atomic swap -- and the use case is when you want to move from
>> one configuration to a new one:
>>     ipfw table foo-new flush // clear initial content
>>     ipfw table foo-new add  ... <repeat as needed>
>>     ipfw table swap foo-current foo-new // swap the content of the 
>> table objects
>>
>> so you preserve the semantic of the name very easily.
> Yes. We can easily add atomic table swap that way. However, I'm 
> talking about different use scenario:
> Atomically swap entire ruleset which has some tables depency:
>
>
> e.g. we have:
>
> "
> 100 allow ip from table(TABLE1) to me
> 200 allow ip from table(TABLE2) to (TABLE3) 80
>
> table TABLE1 1.1.1.1/32
> table TABLE1 1.0.0.0/16
>
> table TABLE2 2.2.2.2/32
>
> table TABLE3 3.3.3.3/32
> "
> and we want to _atomically_ change this to
>
> "
> 100 allow ip from table(TABLE1) to me
> +200 allow ip from table(TABLE4) to any
> 300 allow ip from table(TABLE2) to (TABLE3) 80
>
> table TABLE1 1.1.1.1/32
> -table TABLE1 1.0.0.0/16
>
> -table TABLE2 2.2.2.2/32
> +table TABLE2 77.77.77.0/24
>
> table TABLE3 3.3.3.3/32
>
> +table TABLE4 4.4.4.4/32
> "
>
>>
>>>> So here is what i would propose:
>>>> - [gradually] introduce new commands that accept strings for every 
>>>> place where
>>>>     a number was previously accepted. Internally in the firewall, 
>>>> the old
>>>>     16-bit number is interpreted as a string.
>>> Yup.
>>>> - within the kernel create a small set of functions (invoked on 
>>>> insert, list, delete)
>>>>     that do proper checks on the string and translate it to a 
>>>> pointer (or short integer,
>>>>     i.e. an index in an array) to the proper object. Done properly, 
>>>> we can reuse
>>>>     this code also for pipes, sets, nat groups and whatnot.
>>> Yup.
>>>>     When a rule references an object that does not exist, you 
>>>> create an empty
>>>>     object that a subsequent table/pipe/XXX 'create' would initialize.
>>>>     On 'destroy', no need to touch the ruleset, just clear the object.
>>> Yes. This looks better than requiring user to create table of given 
>>> type
>>> _before_
>>> referencing it.
>>>> - for renames, try to follow the approach used for sets i.e.
>>>>     ipfw table rename old-name new-name
>>>>     changes the name, preserves the object.
>>>>     Does not touch the ruleset, only the translation table
>>> Well, I'm not sure about renaming. I'd prefer permitting renaming iff
>>> table is not referenced.
>>> (and why do we need to rename tables?)
>> you are right, let's forget it. And 'ipfw set move' actually does a 
>> different thing
>> which has no use here.
>>
>>>>     ipfw table swap first-table second-table
>>>>     atomically swap the content of the two table objects
>>>>     (once again not touching the rulesets)
>> cheers
>> luigi
>>
>