From owner-freebsd-net@FreeBSD.ORG Thu Jun 5 10:24:11 2014
Message-ID: <539044E4.1020904@ipfw.ru>
Date: Thu, 05 Jun 2014 14:22:28 +0400
From: "Alexander V. Chernikov"
MIME-Version: 1.0
To: Luigi Rizzo
Subject: Re: [CFT]: ipfw named tables / different tabletypes
References: <5379FE3C.6060501@FreeBSD.org>
 <20140521111002.GB62462@onelab2.iet.unipi.it>
 <537CEC12.8050404@FreeBSD.org>
 <20140521204826.GA67124@onelab2.iet.unipi.it>
 <537E1029.70007@FreeBSD.org>
 <20140522154740.GA76448@onelab2.iet.unipi.it>
 <537E2153.1040005@FreeBSD.org>
 <20140522163812.GA77634@onelab2.iet.unipi.it>
 <538B2FE5.6070407@FreeBSD.org>
In-Reply-To: <538B2FE5.6070407@FreeBSD.org>
Content-Type: multipart/mixed; boundary="------------060502050508080706040508"
Cc: Luigi Rizzo, Bill Yuan, FreeBSD Net
List-Id: Networking and TCP/IP with FreeBSD

This is a multi-part message in MIME format.
--------------060502050508080706040508
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

On 01.06.2014 17:51, Alexander V. Chernikov wrote:
> On 22.05.2014 20:38, Luigi Rizzo wrote:
>
> Long story short, new version is ready.
> I've tried to minimize changes in this patch to ease review/commit.
>
> Changes:
> * Add namedobject set-aware api capable of searching/allocating
>   objects by their name/idx.
> * Switch tables code to use string ids for configuration tasks.
> * Change locking model: most configuration changes are protected with
>   UH lock, runtime-visible ones are protected with both locks.
> * Reduce number of arguments passed to ipfw_table_add/del by using
>   a separate structure.
> * Add internal V_fw_tables_sets tunable (set to 0) to prepare for
>   set-aware tables (requires opcodes/client support)
> * Implement typed table referencing (and tables are implicitly
>   allocated with all state like radix ptrs on reference)
> * Add "destroy" command to ipfw(8) using new IP_FW_OBJ_DEL opcode
>
> Namedobj in more detail:
> * Blackbox api providing methods to add/del/search/enumerate objects
> * Statically-sized hashes for names/indexes
> * Per-set bitmask to indicate free indexes
> * Separate methods for index alloc/delete/resize
>
>
> Basically, there should not be any user-visible changes except the
> following:
> * reducing table_max is not supported
> * changing table type via flush & add won't work if the table is
>   referenced
>
>
> I haven't removed any numbering restrictions to protect the following
> case:
> one (with old client) unintentionally references too many tables (e.g.
> 1000-1128), tries to allocate a table from the "valid" range and fails.
> Old client does not have any ability to destroy any table, so the only
> way to solve this is either module unload or reboot.
>
> I've uploaded the same patch to phabricator since it provides quite
> handy diffs:
> https://phabric.freebsd.org/D139 (no login required).

A bit cleaner version attached.
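
For anyone skimming the attachment, the "per-set bitmask to indicate free indexes" item can be sketched roughly as below. This is a standalone userland illustration, not the kernel code: the names idx_bitmap, bitmap_init, bitmap_alloc_idx, bitmap_free_idx, lowest_set_bit and the SKETCH_* constants are made up for the example, the mask is a fixed-size array rather than a resizable malloc'd one, and a portable loop stands in for ffsl(). The real entry points in the patch are ipfw_objhash_alloc_idx() and ipfw_objhash_free_idx().

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_SETS 32  /* mirrors IPFW_MAX_SETS from the patch */
#define SKETCH_BLOCKS   4   /* hypothetical: bitmask words per set */
#define BLOCK_ITEMS     (CHAR_BIT * sizeof(unsigned long))

struct idx_bitmap {
	/* One run of SKETCH_BLOCKS words per set; a set bit means "free". */
	unsigned long mask[SKETCH_MAX_SETS * SKETCH_BLOCKS];
	/* Per set: first word that may still contain a free bit. */
	uint16_t free_off[SKETCH_MAX_SETS];
};

static void
bitmap_init(struct idx_bitmap *b)
{
	assert(b != NULL);
	memset(b->mask, 0xFF, sizeof(b->mask));		/* mark all as free */
	memset(b->free_off, 0, sizeof(b->free_off));
}

/* 1-based position of the lowest set bit, 0 if none (stand-in for ffsl). */
static int
lowest_set_bit(unsigned long w)
{
	int pos;

	if (w == 0)
		return (0);
	for (pos = 1; (w & 1UL) == 0; pos++)
		w >>= 1;
	return (pos);
}

/* Allocate the lowest free index in @set, store it in @pidx. 0 on success. */
static int
bitmap_alloc_idx(struct idx_bitmap *b, uint32_t set, uint16_t *pidx)
{
	unsigned long *mask;
	int i, v;

	if (set >= SKETCH_MAX_SETS)
		return (-1);

	i = b->free_off[set];
	mask = &b->mask[set * SKETCH_BLOCKS + i];
	for (; i < SKETCH_BLOCKS; i++, mask++) {
		if ((v = lowest_set_bit(*mask)) == 0)
			continue;		/* word is fully allocated */
		*mask &= ~(1UL << (v - 1));	/* mark as busy */
		b->free_off[set] = (uint16_t)i;
		*pidx = (uint16_t)(BLOCK_ITEMS * i + v - 1);
		return (0);
	}
	return (1);				/* no free index in this set */
}

/* Return @idx to @set. 0 on success, 1 if out of range or already free. */
static int
bitmap_free_idx(struct idx_bitmap *b, uint32_t set, uint16_t idx)
{
	unsigned long *mask;
	int i, v;

	i = (int)(idx / BLOCK_ITEMS);
	v = (int)(idx % BLOCK_ITEMS);
	if (i >= SKETCH_BLOCKS || set >= SKETCH_MAX_SETS)
		return (1);

	mask = &b->mask[set * SKETCH_BLOCKS + i];
	if ((*mask & (1UL << v)) != 0)
		return (1);			/* already free */
	*mask |= 1UL << v;			/* mark as free */
	if (b->free_off[set] > i)
		b->free_off[set] = (uint16_t)i;	/* restart search earlier */
	return (0);
}
```

The free_off hint only ever moves back on free, so allocation keeps handing out the lowest available index per set, and each set has its own independent index space.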
>
>> On Thu, May 22, 2014 at 08:09:55PM +0400, Alexander V. Chernikov wrote:
>>> On 22.05.2014 19:47, Luigi Rizzo wrote:
>>>> On Thu, May 22, 2014 at 06:56:41PM +0400, Alexander V. Chernikov wrote:
>>>>> On 22.05.2014 00:48, Luigi Rizzo wrote:
>>>>>> On Wed, May 21, 2014 at 10:10:26PM +0400, Alexander V. Chernikov wrote:
>>>> ...
>>>>>> we can solve this by using 'low' numbers for the numeric tables
>>>>>> (these were limited anyways) and allocate the fake entries in
>>>>>> another range.
>>>>> Currently we have u16 space available in base opcode.
>>>> yes but the standard range for tables is much more limited:
>>>>
>>>> net.inet.ip.fw.tables_max: 128
>>>>
>>>> so one can just (say) use 32k for "old" tables and the rest
>>>> for tables with non numeric names.
>>>> Does not seem to be a problem in practice.
>>> Well, using upper 32k means that you set this default to 65k which
>>> consumes 256k of memory on 32-bit arch.
>>> Embedded people won't be very happy about this (and changing table
>>> numbers on resize would be a nightmare).
>> no no, this is an implementation detail but
>> within the kernel you can just remap the 'old' and 'new'
>> table identifiers to a single contiguous range.
>> The only thing you need to do is that when you push
>> identifiers up to userland, those with 'new' names will
>> be mapped to the 32-64k range.
>>
>> Example:
>> user first specifies tables
>>    "18, goodguys, 530, badguys" in the same rule
>> /sbin/ipfw will generate these numbers:
>>    18, 32768, 530, 32769 ; tlv {32768:goodguys, 32769:badguys}
>> The kernel will then do a lookup of those identifiers and
>>    18: internal index 1, name "18"
>>    32768: internal index 2, name "goodguys"
>>    530: internal index 3, name "530"
>>    32769: internal index 4, name "badguys"
>>
>> Then the next rule contains tables
>>    1, badguys, 18
>> /sbin/ipfw generates
>>    1, 32768, 18 ; tlv {32768:badguys} // note different from before
>> Kernel looks up the names and remaps
>>    1: internal index 5, name "1"
>>    32768: internal index 4, name "badguys"
>>    18: internal index 1, name "18"
>>
>> Finally when you do an 'ipfw show' the kernel will remap names
>> between 1 and 32768 to themselves, and other names to 32768+
>> (or some other large number, say 40k and above) so
>> as they are found. So the rules will be pushed up with
>>    18, 40000, 530, 40001
>>    1, 40001, 18
>>
>> we can discuss the other details privately
>>
>> cheers
>> luigi
>>
>>
>> 1. first, the
>>>>>> maybe i am missing some detail but it seems reasonably easy to
>>>>>> implement the atomic swap -- and the use case is when you want
>>>>>> to move from one configuration to a new one:
>>>>>>    ipfw table foo-new flush // clear initial content
>>>>>>    ipfw table foo-new add ...
>>>>>>    ipfw table swap foo-current foo-new // swap the content of
>>>>>>                                        // the table objects
>>>>>>
>>>>>> so you preserve the semantic of the name very easily.
>>>>> Yes. We can easily add atomic table swap that way. However, I'm
>>>>> talking about a different use scenario:
>>>>> Atomically swap an entire ruleset which has some table dependencies:
>>>>>
>>>>>
>>>>> e.g.
>>>>> we have:
>>>>>
>>>>> "
>>>>>  100 allow ip from table(TABLE1) to me
>>>>>  200 allow ip from table(TABLE2) to (TABLE3) 80
>>>>>
>>>>>  table TABLE1 1.1.1.1/32
>>>>>  table TABLE1 1.0.0.0/16
>>>>>
>>>>>  table TABLE2 2.2.2.2/32
>>>>>
>>>>>  table TABLE3 3.3.3.3/32
>>>>> "
>>>>> and we want to _atomically_ change this to
>>>>>
>>>>> "
>>>>>  100 allow ip from table(TABLE1) to me
>>>>> +200 allow ip from table(TABLE4) to any
>>>>>  300 allow ip from table(TABLE2) to (TABLE3) 80
>>>>>
>>>>>  table TABLE1 1.1.1.1/32
>>>>> -table TABLE1 1.0.0.0/16
>>>>>
>>>>> -table TABLE2 2.2.2.2/32
>>>>> +table TABLE2 77.77.77.0/24
>>>>>
>>>>>  table TABLE3 3.3.3.3/32
>>>>>
>>>>> +table TABLE4 4.4.4.4/32
>>>>> "
>>>> aargh, that's too much -- because between changing
>>>> one table and all tables there are infinite intermediate
>>>> points that all make sense.
>>> It depends. As I said before, we're currently solving this problem by
>>> adding new rules (to set X) referencing tables from a different range
>>> (2048 tables per ruleset) and then doing a swap.
>>> (And not being able to use named tables to store real names after
>>> implementing them is a bit discouraging).
>>>
>>>> For those cases i think the way to go could be to
>>>> insert a 'disabled' new ruleset (however complex it is,
>>>> so it covers all possible cases), and then do the set swap,
>>>> or disable/enable.
>>> We can think of per-set arrays/namespaces of tables:
>>>
>>> so "ipfw add 100 set X allow ip from table(Y) to ..." will reference
>>> table Y in set X and
>>> "ipfw table ABC list" can differ from "ipfw table ABC set 5 list".
>>>
>>> This behavior can break some users' setups so we can provide a
>>> sysctl/tunable to turn this off or on.
>>> >>>> cheers >>>> luigi >>>> > --------------060502050508080706040508 Content-Type: text/x-patch; name="D139_4.diff" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="D139_4.diff" Index: sbin/ipfw/ipfw2.c =================================================================== --- sbin/ipfw/ipfw2.c +++ sbin/ipfw/ipfw2.c @@ -4243,6 +4243,23 @@ do { table_list(xent.tbl, is_all); } while (++xent.tbl < a); + } else if (_substrcmp(*av, "destroy") == 0) { + char xbuf[sizeof(ipfw_obj_header) + sizeof(ipfw_xtable_ntlv)]; + ipfw_obj_header *oh; + ipfw_xtable_ntlv *ntlv; + + memset(xbuf, 0, sizeof(xbuf)); + oh = (ipfw_obj_header *)xbuf; + ntlv = (ipfw_xtable_ntlv *)(oh + 1); + + ntlv->head.type = IPFW_TLV_NAME; + ntlv->head.length = sizeof(*ntlv); + ntlv->idx = 1; + snprintf(ntlv->name, sizeof(ntlv->name), "%d", xent.tbl); + oh->idx = 1; + oh->objtype = IPFW_OBJTYPE_TABLE; + if (do_setcmd3(IP_FW_OBJ_DEL, xbuf, sizeof(xbuf)) != 0) + err(EX_OSERR, "setsockopt(IP_FW_OBJ_DEL)"); } else errx(EX_USAGE, "invalid table command %s", *av); } Index: sys/netinet/ip_fw.h =================================================================== --- sys/netinet/ip_fw.h +++ sys/netinet/ip_fw.h @@ -37,6 +37,11 @@ #define IPFW_DEFAULT_RULE 65535 /* + * Number of sets supported by ipfw + */ +#define IPFW_MAX_SETS 32 + +/* * Default number of ipfw tables. 
*/ #define IPFW_TABLES_MAX 65535 @@ -74,6 +79,7 @@ #define IP_FW_TABLE_XDEL 87 /* delete entry */ #define IP_FW_TABLE_XGETSIZE 88 /* get table size */ #define IP_FW_TABLE_XLIST 89 /* list table contents */ +#define IP_FW_OBJ_DEL 90 /* del table/pipe/etc */ /* * The kernel representation of ipfw rules is made of a list of @@ -632,12 +638,34 @@ } ipfw_table; typedef struct _ipfw_xtable { - ip_fw3_opheader opheader; /* eXtended tables are controlled via IP_FW3 */ + ip_fw3_opheader opheader; /* IP_FW3 opcode */ uint32_t size; /* size of entries in bytes */ uint32_t cnt; /* # of entries */ uint16_t tbl; /* table number */ uint8_t type; /* table type */ ipfw_table_xentry xent[0]; /* entries */ } ipfw_xtable; +typedef struct _ipfw_xtable_tlv { + uint16_t type; /* TLV type */ + uint16_t length; /* Total length, aligned to u32 */ +} ipfw_xtable_tlv; + +#define IPFW_TLV_NAME 1 +/* Object name TLV */ +typedef struct _ipfw_xtable_ntlv { + ipfw_xtable_tlv head; /* TLV header */ + uint16_t idx; /* Name index */ + uint16_t spare; /* unused */ + char name[64]; /* Null-terminated name */ +} ipfw_xtable_ntlv; + +typedef struct _ipfw_obj_header { + ip_fw3_opheader opheader; /* IP_FW3 opcode */ + uint32_t set; /* Set we're operating */ + uint16_t idx; /* object name index */ + uint16_t objtype; /* object type */ +} ipfw_obj_header; +#define IPFW_OBJTYPE_TABLE 1 + #endif /* _IPFW2_H */ Index: sys/netpfil/ipfw/ip_fw2.c =================================================================== --- sys/netpfil/ipfw/ip_fw2.c +++ sys/netpfil/ipfw/ip_fw2.c @@ -121,6 +121,7 @@ VNET_DEFINE(int, fw_one_pass) = 1; VNET_DEFINE(unsigned int, fw_tables_max); +VNET_DEFINE(unsigned int, fw_tables_sets) = 0; /* Don't use set-aware tables */ /* Use 128 tables by default */ static unsigned int default_fw_tables = IPFW_TABLES_DEFAULT; @@ -2719,7 +2720,6 @@ ipfw_dyn_uninit(0); /* run the callout_drain */ IPFW_WUNLOCK(chain); - ipfw_destroy_tables(chain); reap = NULL; IPFW_WLOCK(chain); for (i = 0; i < 
chain->n_rules; i++) { @@ -2731,6 +2731,7 @@ free(chain->map, M_IPFW); IPFW_WUNLOCK(chain); IPFW_UH_WUNLOCK(chain); + ipfw_destroy_tables(chain); if (reap != NULL) ipfw_reap_rules(reap); IPFW_LOCK_DESTROY(chain); Index: sys/netpfil/ipfw/ip_fw_private.h =================================================================== --- sys/netpfil/ipfw/ip_fw_private.h +++ sys/netpfil/ipfw/ip_fw_private.h @@ -212,14 +212,18 @@ VNET_DECLARE(unsigned int, fw_tables_max); #define V_fw_tables_max VNET(fw_tables_max) +VNET_DECLARE(unsigned int, fw_tables_sets); +#define V_fw_tables_sets VNET(fw_tables_sets) + +struct tables_config; + struct ip_fw_chain { struct ip_fw **map; /* array of rule ptrs to ease lookup */ uint32_t id; /* ruleset id */ int n_rules; /* number of static rules */ LIST_HEAD(nat_list, cfg_nat) nat; /* list of nat entries */ struct radix_node_head **tables; /* IPv4 tables */ struct radix_node_head **xtables; /* extended tables */ - uint8_t *tabletype; /* Array of table types */ #if defined( __linux__ ) || defined( _WIN32 ) spinlock_t rwmtx; #else @@ -229,6 +233,7 @@ uint32_t gencnt; /* NAT generation count */ struct ip_fw *reap; /* list of rules to reap */ struct ip_fw *default_rule; + struct tables_config *tblcfg; /* tables module data */ #if defined( __linux__ ) || defined( _WIN32 ) spinlock_t uh_lock; #else @@ -295,32 +300,113 @@ #define IPFW_UH_WLOCK(p) rw_wlock(&(p)->uh_lock) #define IPFW_UH_WUNLOCK(p) rw_wunlock(&(p)->uh_lock) +struct tid_info { + uint32_t set; /* table set */ + uint16_t uidx; /* table index */ + uint8_t type; /* table type */ + uint8_t spare; + void *tlvs; /* Pointer to first TLV */ + int tlen; /* Total TLV size block */ +}; + +struct obj_idx { + uint16_t uidx; /* internal index supplied by userland */ + uint16_t kidx; /* kernel object index */ + uint16_t off; /* tlv offset from rule end in 4-byte words */ + uint8_t new; /* index is newly-allocated */ + uint8_t type; /* object type within its category */ +}; + +struct rule_check_info { + 
uint16_t table_opcodes; /* count of opcodes referencing table */ + uint16_t new_tables; /* count of opcodes referencing table */ + uint32_t tableset; /* ipfw set id for table */ + void *tlvs; /* Pointer to first TLV if any */ + int tlen; /* *Total TLV size block */ + uint8_t fw3; /* opcode is new */ + struct ip_fw *krule; /* resulting rule pointer */ + struct obj_idx obuf[8]; /* table references storage */ +}; + +struct tentry_info { + void *paddr; + int plen; /* Total entry length */ + uint8_t masklen; /* mask length */ + uint8_t spare; + uint16_t flags; /* record flags */ + uint32_t value; /* value */ +}; + /* In ip_fw_sockopt.c */ int ipfw_find_rule(struct ip_fw_chain *chain, uint32_t key, uint32_t id); -int ipfw_add_rule(struct ip_fw_chain *chain, struct ip_fw *input_rule); int ipfw_ctl(struct sockopt *sopt); int ipfw_chk(struct ip_fw_args *args); void ipfw_reap_rules(struct ip_fw *head); +struct namedobj_instance; + +struct named_object { + TAILQ_ENTRY(named_object) nn_next; /* namehash */ + TAILQ_ENTRY(named_object) nv_next; /* valuehash */ + char *name; /* object name */ + uint8_t type; /* object type */ + uint8_t compat; /* Object name is number */ + uint16_t kidx; /* object kernel index */ + uint16_t uidx; /* userland idx for compat records */ + uint32_t set; /* set object belongs to */ + uint32_t refcnt; /* number of references */ +}; +TAILQ_HEAD(namedobjects_head, named_object); + +typedef void (objhash_cb_t)(struct namedobj_instance *ni, struct named_object *, + void *arg); +struct namedobj_instance *ipfw_objhash_create(uint32_t items); +void ipfw_objhash_destroy(struct namedobj_instance *); +void ipfw_objhash_bitmap_alloc(uint32_t items, void **idx, int *pblocks); +int ipfw_objhash_bitmap_merge(struct namedobj_instance *ni, + void **idx, int *blocks); +void ipfw_objhash_bitmap_free(void *idx, int blocks); +struct named_object *ipfw_objhash_lookup_name(struct namedobj_instance *ni, + uint32_t set, char *name); +struct named_object 
*ipfw_objhash_lookup_idx(struct namedobj_instance *ni, + uint32_t set, uint16_t idx); +void ipfw_objhash_add(struct namedobj_instance *ni, struct named_object *no); +void ipfw_objhash_del(struct namedobj_instance *ni, struct named_object *no); +void ipfw_objhash_foreach(struct namedobj_instance *ni, objhash_cb_t *f, + void *arg); +int ipfw_objhash_free_idx(struct namedobj_instance *ni, uint32_t set, + uint16_t idx); +int ipfw_objhash_alloc_idx(void *n, uint32_t set, uint16_t *pidx); + /* In ip_fw_table.c */ struct radix_node; int ipfw_lookup_table(struct ip_fw_chain *ch, uint16_t tbl, in_addr_t addr, uint32_t *val); int ipfw_lookup_table_extended(struct ip_fw_chain *ch, uint16_t tbl, void *paddr, uint32_t *val, int type); int ipfw_init_tables(struct ip_fw_chain *ch); +int ipfw_destroy_table(struct ip_fw_chain *ch, struct tid_info *ti, int force); void ipfw_destroy_tables(struct ip_fw_chain *ch); -int ipfw_flush_table(struct ip_fw_chain *ch, uint16_t tbl); -int ipfw_add_table_entry(struct ip_fw_chain *ch, uint16_t tbl, void *paddr, - uint8_t plen, uint8_t mlen, uint8_t type, uint32_t value); -int ipfw_del_table_entry(struct ip_fw_chain *ch, uint16_t tbl, void *paddr, - uint8_t plen, uint8_t mlen, uint8_t type); -int ipfw_count_table(struct ip_fw_chain *ch, uint32_t tbl, uint32_t *cnt); +int ipfw_flush_table(struct ip_fw_chain *ch, struct tid_info *ti); +int ipfw_add_table_entry(struct ip_fw_chain *ch, struct tid_info *ti, + struct tentry_info *tei); +int ipfw_del_table_entry(struct ip_fw_chain *ch, struct tid_info *ti, + struct tentry_info *tei); +int ipfw_count_table(struct ip_fw_chain *ch, struct tid_info *ti, + uint32_t *cnt); int ipfw_dump_table_entry(struct radix_node *rn, void *arg); -int ipfw_dump_table(struct ip_fw_chain *ch, ipfw_table *tbl); -int ipfw_count_xtable(struct ip_fw_chain *ch, uint32_t tbl, uint32_t *cnt); -int ipfw_dump_xtable(struct ip_fw_chain *ch, ipfw_xtable *tbl); +int ipfw_dump_table(struct ip_fw_chain *ch, struct tid_info *ti, + 
ipfw_table *tbl); +int ipfw_count_xtable(struct ip_fw_chain *ch, struct tid_info *ti, + uint32_t *cnt); +int ipfw_dump_xtable(struct ip_fw_chain *ch, struct tid_info *ti, + ipfw_xtable *tbl); int ipfw_resize_tables(struct ip_fw_chain *ch, unsigned int ntables); +int ipfw_rewrite_table_uidx(struct ip_fw_chain *chain, + struct rule_check_info *ci); +int ipfw_rewrite_table_kidx(struct ip_fw_chain *chain, struct ip_fw *rule); +void ipfw_unbind_table_rule(struct ip_fw_chain *chain, struct ip_fw *rule); +void ipfw_unbind_table_list(struct ip_fw_chain *chain, struct ip_fw *head); /* In ip_fw_nat.c -- XXX to be moved to ip_var.h */ Index: sys/netpfil/ipfw/ip_fw_sockopt.c =================================================================== --- sys/netpfil/ipfw/ip_fw_sockopt.c +++ sys/netpfil/ipfw/ip_fw_sockopt.c @@ -53,6 +53,7 @@ #include #include #include +#include #include #include #include @@ -67,6 +68,25 @@ #include #endif +#define NAMEDOBJ_HASH_SIZE 32 + +struct namedobj_instance { + struct namedobjects_head *names; + struct namedobjects_head *values; + uint32_t nn_size; /* names hash size */ + uint32_t nv_size; /* number hash size */ + u_long *idx_mask; /* used items bitmask */ + uint32_t max_blocks; /* number of "long" blocks in bitmask */ + uint16_t free_off[IPFW_MAX_SETS]; /* first possible free offset */ +}; +#define BLOCK_ITEMS (8 * sizeof(u_long)) /* Number of items for ffsl() */ + +static uint32_t objhash_hash_name(struct namedobj_instance *ni, uint32_t set, + char *name); +static uint32_t objhash_hash_val(struct namedobj_instance *ni, uint32_t set, + uint32_t val); + + MALLOC_DEFINE(M_IPFW, "IpFw/IpAcct", "IpFw/IpAcct chain's"); /* @@ -152,8 +172,9 @@ * XXX DO NOT USE FOR THE DEFAULT RULE. 
* Must be called without IPFW_UH held */ -int -ipfw_add_rule(struct ip_fw_chain *chain, struct ip_fw *input_rule) +static int +add_rule(struct ip_fw_chain *chain, struct ip_fw *input_rule, + struct rule_check_info *ci) { struct ip_fw *rule; int i, l, insert_before; @@ -164,19 +185,37 @@ l = RULESIZE(input_rule); rule = malloc(l, M_IPFW, M_WAITOK | M_ZERO); - /* get_map returns with IPFW_UH_WLOCK if successful */ - map = get_map(chain, 1, 0 /* not locked */); - if (map == NULL) { - free(rule, M_IPFW); - return ENOSPC; - } - bcopy(input_rule, rule, l); /* clear fields not settable from userland */ rule->x_next = NULL; rule->next_rule = NULL; IPFW_ZERO_RULE_COUNTER(rule); + /* Check if we need to do table remap */ + if (ci->table_opcodes > 0) { + ci->krule = rule; + i = ipfw_rewrite_table_uidx(chain, ci); + if (i != 0) { + /* rewrite failed, return error */ + free(rule, M_IPFW); + return (i); + } + } + + /* get_map returns with IPFW_UH_WLOCK if successful */ + map = get_map(chain, 1, 0 /* not locked */); + if (map == NULL) { + if (ci->table_opcodes > 0) { + /* We need to unbind tables */ + IPFW_UH_WLOCK(chain); + ipfw_unbind_table_rule(chain, rule); + IPFW_UH_WUNLOCK(chain); + } + + free(rule, M_IPFW); + return (ENOSPC); + } + if (V_autoinc_step < 1) V_autoinc_step = 1; else if (V_autoinc_step > 1000) @@ -421,6 +460,7 @@ rule = chain->reap; chain->reap = NULL; + ipfw_unbind_table_list(chain, rule); IPFW_UH_WUNLOCK(chain); ipfw_reap_rules(rule); if (map) @@ -517,7 +557,7 @@ * Rules are simple, so this mostly need to check rule sizes. 
*/ static int -check_ipfw_struct(struct ip_fw *rule, int size) +check_ipfw_struct(struct ip_fw *rule, int size, struct rule_check_info *ci) { int l, cmdlen = 0; int have_action=0; @@ -662,6 +702,7 @@ cmdlen != F_INSN_SIZE(ipfw_insn_u32) + 1 && cmdlen != F_INSN_SIZE(ipfw_insn_u32)) goto bad_size; + ci->table_opcodes++; break; case O_MACADDR2: if (cmdlen != F_INSN_SIZE(ipfw_insn_mac)) @@ -694,6 +735,8 @@ case O_RECV: case O_XMIT: case O_VIA: + if (((ipfw_insn_if *)cmd)->name[0] == '\1') + ci->table_opcodes++; if (cmdlen != F_INSN_SIZE(ipfw_insn_if)) goto bad_size; break; @@ -879,7 +922,7 @@ char *bp = buf; char *ep = bp + space; struct ip_fw *rule, *dst; - int l, i; + int error, i, l; time_t boot_seconds; boot_seconds = boottime.tv_sec; @@ -890,8 +933,11 @@ /* Convert rule to FreeBSd 7.2 format */ l = RULESIZE7(rule); if (bp + l + sizeof(uint32_t) <= ep) { - int error; bcopy(rule, bp, l + sizeof(uint32_t)); + error = ipfw_rewrite_table_kidx(chain, + (struct ip_fw *)bp); + if (error != 0) + return (0); error = convert_rule_to_7((struct ip_fw *) bp); if (error) return 0; /*XXX correct? */ @@ -918,6 +964,13 @@ } dst = (struct ip_fw *)bp; bcopy(rule, dst, l); + error = ipfw_rewrite_table_kidx(chain, dst); + if (error != 0) { + printf("Stop on rule %d. Fail to convert table\n", + rule->rulenum); + break; + } + /* * XXX HACK. Store the disable mask in the "next" * pointer in a wild attempt to keep the ABI the same. @@ -949,6 +1002,7 @@ uint32_t opt; char xbuf[128]; ip_fw3_opheader *op3 = NULL; + struct rule_check_info ci; error = priv_check(sopt->sopt_td, PRIV_NETINET_IPFW); if (error) @@ -1027,6 +1081,8 @@ error = sooptcopyin(sopt, rule, RULE_MAXSIZE, sizeof(struct ip_fw7) ); + memset(&ci, 0, sizeof(struct rule_check_info)); + /* * If the size of commands equals RULESIZE7 then we assume * a FreeBSD7.2 binary is talking to us (set is7=1). 
@@ -1044,15 +1100,15 @@ return error; } if (error == 0) - error = check_ipfw_struct(rule, RULESIZE(rule)); + error = check_ipfw_struct(rule, RULESIZE(rule), &ci); } else { is7 = 0; if (error == 0) - error = check_ipfw_struct(rule, sopt->sopt_valsize); + error = check_ipfw_struct(rule, sopt->sopt_valsize,&ci); } if (error == 0) { - /* locking is done within ipfw_add_rule() */ - error = ipfw_add_rule(chain, rule); + /* locking is done within add_rule() */ + error = add_rule(chain, rule, &ci); size = RULESIZE(rule); if (!error && sopt->sopt_dir == SOPT_GET) { if (is7) { @@ -1114,37 +1170,67 @@ break; /*--- TABLE manipulations are protected by the IPFW_LOCK ---*/ - case IP_FW_TABLE_ADD: + case IP_FW_OBJ_DEL: /* IP_FW3 */ { - ipfw_table_entry ent; + struct _ipfw_obj_header *oh; + struct tid_info ti; - error = sooptcopyin(sopt, &ent, - sizeof(ent), sizeof(ent)); - if (error) + if (sopt->sopt_valsize < sizeof(*oh)) { + error = EINVAL; break; - error = ipfw_add_table_entry(chain, ent.tbl, - &ent.addr, sizeof(ent.addr), ent.masklen, - IPFW_TABLE_CIDR, ent.value); - } - break; + } + + oh = (struct _ipfw_obj_header *)(op3 + 1); + switch (oh->objtype) { + case IPFW_OBJTYPE_TABLE: + memset(&ti, 0, sizeof(ti)); + ti.set = oh->set; + ti.uidx = oh->idx; + ti.tlvs = (oh + 1); + ti.tlen = sopt->sopt_valsize - sizeof(*oh); + error = ipfw_destroy_table(chain, &ti, 0); + break; + default: + error = ENOTSUP; + break; + } + break; + } + case IP_FW_TABLE_ADD: case IP_FW_TABLE_DEL: { ipfw_table_entry ent; + struct tentry_info tei; + struct tid_info ti; error = sooptcopyin(sopt, &ent, sizeof(ent), sizeof(ent)); if (error) break; - error = ipfw_del_table_entry(chain, ent.tbl, - &ent.addr, sizeof(ent.addr), ent.masklen, IPFW_TABLE_CIDR); + + memset(&tei, 0, sizeof(tei)); + tei.paddr = &ent.addr; + tei.plen = sizeof(ent.addr); + tei.masklen = ent.masklen; + tei.value = ent.value; + memset(&ti, 0, sizeof(ti)); + ti.set = RESVD_SET; + ti.uidx = ent.tbl; + ti.type = IPFW_TABLE_CIDR; + + error = 
(opt == IP_FW_TABLE_ADD) ? + ipfw_add_table_entry(chain, &ti, &tei) : + ipfw_del_table_entry(chain, &ti, &tei); } break; case IP_FW_TABLE_XADD: /* IP_FW3 */ case IP_FW_TABLE_XDEL: /* IP_FW3 */ { ipfw_table_xentry *xent = (ipfw_table_xentry *)(op3 + 1); + struct tentry_info tei; + struct tid_info ti; /* Check minimum header size */ if (IP_FW3_OPLENGTH(sopt) < offsetof(ipfw_table_xentry, k)) { @@ -1160,35 +1246,51 @@ len = xent->len - offsetof(ipfw_table_xentry, k); + memset(&tei, 0, sizeof(tei)); + tei.paddr = &xent->k; + tei.plen = len; + tei.masklen = xent->masklen; + tei.value = xent->value; + memset(&ti, 0, sizeof(ti)); + ti.set = 0; /* XXX: No way to specify set */ + ti.uidx = xent->tbl; + ti.type = xent->type; + error = (opt == IP_FW_TABLE_XADD) ? - ipfw_add_table_entry(chain, xent->tbl, &xent->k, - len, xent->masklen, xent->type, xent->value) : - ipfw_del_table_entry(chain, xent->tbl, &xent->k, - len, xent->masklen, xent->type); + ipfw_add_table_entry(chain, &ti, &tei) : + ipfw_del_table_entry(chain, &ti, &tei); } break; case IP_FW_TABLE_FLUSH: { u_int16_t tbl; + struct tid_info ti; error = sooptcopyin(sopt, &tbl, sizeof(tbl), sizeof(tbl)); if (error) break; - error = ipfw_flush_table(chain, tbl); + memset(&ti, 0, sizeof(ti)); + ti.set = 0; /* XXX: No way to specify set */ + ti.uidx = tbl; + error = ipfw_flush_table(chain, &ti); } break; case IP_FW_TABLE_GETSIZE: { u_int32_t tbl, cnt; + struct tid_info ti; if ((error = sooptcopyin(sopt, &tbl, sizeof(tbl), sizeof(tbl)))) break; + memset(&ti, 0, sizeof(ti)); + ti.set = 0; /* XXX: No way to specify set */ + ti.uidx = tbl; IPFW_RLOCK(chain); - error = ipfw_count_table(chain, tbl, &cnt); + error = ipfw_count_table(chain, &ti, &cnt); IPFW_RUNLOCK(chain); if (error) break; @@ -1199,6 +1301,7 @@ case IP_FW_TABLE_LIST: { ipfw_table *tbl; + struct tid_info ti; if (sopt->sopt_valsize < sizeof(*tbl)) { error = EINVAL; @@ -1213,8 +1316,11 @@ } tbl->size = (size - sizeof(*tbl)) / sizeof(ipfw_table_entry); + memset(&ti, 0, 
sizeof(ti)); + ti.set = 0; /* XXX: No way to specify set */ + ti.uidx = tbl->tbl; IPFW_RLOCK(chain); - error = ipfw_dump_table(chain, tbl); + error = ipfw_dump_table(chain, &ti, tbl); IPFW_RUNLOCK(chain); if (error) { free(tbl, M_TEMP); @@ -1228,16 +1334,20 @@ case IP_FW_TABLE_XGETSIZE: /* IP_FW3 */ { uint32_t *tbl; + struct tid_info ti; if (IP_FW3_OPLENGTH(sopt) < sizeof(uint32_t)) { error = EINVAL; break; } tbl = (uint32_t *)(op3 + 1); + memset(&ti, 0, sizeof(ti)); + ti.set = 0; /* XXX: No way to specify set */ + ti.uidx = *tbl; IPFW_RLOCK(chain); - error = ipfw_count_xtable(chain, *tbl, tbl); + error = ipfw_count_xtable(chain, &ti, tbl); IPFW_RUNLOCK(chain); if (error) break; @@ -1248,6 +1358,7 @@ case IP_FW_TABLE_XLIST: /* IP_FW3 */ { ipfw_xtable *tbl; + struct tid_info ti; if ((size = valsize) < sizeof(ipfw_xtable)) { error = EINVAL; @@ -1260,8 +1371,11 @@ /* Get maximum number of entries we can store */ tbl->size = (size - sizeof(ipfw_xtable)) / sizeof(ipfw_table_xentry); + memset(&ti, 0, sizeof(ti)); + ti.set = 0; /* XXX: No way to specify set */ + ti.uidx = tbl->tbl; IPFW_RLOCK(chain); - error = ipfw_dump_xtable(chain, tbl); + error = ipfw_dump_xtable(chain, &ti, tbl); IPFW_RUNLOCK(chain); if (error) { free(tbl, M_TEMP); @@ -1444,4 +1558,271 @@ return 0; } +/* + * Named object api + * + */ + +void +ipfw_objhash_bitmap_alloc(uint32_t items, void **idx, int *pblocks) +{ + size_t size; + int max_blocks; + void *idx_mask; + + items = roundup2(items, BLOCK_ITEMS); /* Align to block size */ + max_blocks = items / BLOCK_ITEMS; + size = items / 8; + idx_mask = malloc(size * IPFW_MAX_SETS, M_IPFW, M_WAITOK); + /* Mark all as free */ + memset(idx_mask, 0xFF, size * IPFW_MAX_SETS); + + *idx = idx_mask; + *pblocks = max_blocks; +} + +int +ipfw_objhash_bitmap_merge(struct namedobj_instance *ni, void **idx, int *blocks) +{ + int old_blocks, new_blocks; + u_long *old_idx, *new_idx; + int i; + + old_idx = ni->idx_mask; + old_blocks = ni->max_blocks; + new_idx = *idx; + 
new_blocks = *blocks; + + /* + * FIXME: Permit reducing total amount of tables + */ + if (old_blocks > new_blocks) + return (1); + + for (i = 0; i < IPFW_MAX_SETS; i++) { + memcpy(&new_idx[new_blocks * i], &old_idx[old_blocks * i], + old_blocks * sizeof(u_long)); + } + + ni->idx_mask = new_idx; + ni->max_blocks = new_blocks; + + /* Save old values */ + *idx = old_idx; + *blocks = old_blocks; + + return (0); +} + +void +ipfw_objhash_bitmap_free(void *idx, int blocks) +{ + + free(idx, M_IPFW); +} + +/* + * Creates named hash instance. + * Must be called without holding any locks. + * Return pointer to new instance. + */ +struct namedobj_instance * +ipfw_objhash_create(uint32_t items) +{ + struct namedobj_instance *ni; + int i; + size_t size; + + size = sizeof(struct namedobj_instance) + + sizeof(struct namedobjects_head) * NAMEDOBJ_HASH_SIZE + + sizeof(struct namedobjects_head) * NAMEDOBJ_HASH_SIZE; + + ni = malloc(size, M_IPFW, M_WAITOK | M_ZERO); + ni->nn_size = NAMEDOBJ_HASH_SIZE; + ni->nv_size = NAMEDOBJ_HASH_SIZE; + + ni->names = (struct namedobjects_head *)(ni +1); + ni->values = &ni->names[ni->nn_size]; + + for (i = 0; i < ni->nn_size; i++) + TAILQ_INIT(&ni->names[i]); + + for (i = 0; i < ni->nv_size; i++) + TAILQ_INIT(&ni->values[i]); + + /* Allocate bitmask separately due to possible resize */ + ipfw_objhash_bitmap_alloc(items, (void*)&ni->idx_mask, &ni->max_blocks); + + return (ni); +} + +void +ipfw_objhash_destroy(struct namedobj_instance *ni) +{ + + free(ni->idx_mask, M_IPFW); + free(ni, M_IPFW); +} + +static uint32_t +objhash_hash_name(struct namedobj_instance *ni, uint32_t set, char *name) +{ + uint32_t v; + + v = fnv_32_str(name, FNV1_32_INIT); + + return (v % ni->nn_size); +} + +static uint32_t +objhash_hash_val(struct namedobj_instance *ni, uint32_t set, uint32_t val) +{ + uint32_t v; + + v = val % (ni->nv_size - 1); + + return (v); +} + +struct named_object * +ipfw_objhash_lookup_name(struct namedobj_instance *ni, uint32_t set, char *name) +{ + 
struct named_object *no; + uint32_t hash; + + hash = objhash_hash_name(ni, set, name); + + TAILQ_FOREACH(no, &ni->names[hash], nn_next) { + if ((strcmp(no->name, name) == 0) && (no->set == set)) + return (no); + } + + return (NULL); +} + +struct named_object * +ipfw_objhash_lookup_idx(struct namedobj_instance *ni, uint32_t set, + uint16_t idx) +{ + struct named_object *no; + uint32_t hash; + + hash = objhash_hash_val(ni, set, idx); + + TAILQ_FOREACH(no, &ni->values[hash], nv_next) { + if ((no->kidx == idx) && (no->set == set)) + return (no); + } + + return (NULL); +} + +void +ipfw_objhash_add(struct namedobj_instance *ni, struct named_object *no) +{ + uint32_t hash; + + hash = objhash_hash_name(ni, no->set, no->name); + TAILQ_INSERT_HEAD(&ni->names[hash], no, nn_next); + + hash = objhash_hash_val(ni, no->set, no->kidx); + TAILQ_INSERT_HEAD(&ni->values[hash], no, nv_next); +} + +void +ipfw_objhash_del(struct namedobj_instance *ni, struct named_object *no) +{ + uint32_t hash; + + hash = objhash_hash_name(ni, no->set, no->name); + TAILQ_REMOVE(&ni->names[hash], no, nn_next); + + hash = objhash_hash_val(ni, no->set, no->kidx); + TAILQ_REMOVE(&ni->values[hash], no, nv_next); +} + +/* + * Runs @func for each found named object. + * It is safe to delete objects from callback + */ +void +ipfw_objhash_foreach(struct namedobj_instance *ni, objhash_cb_t *f, void *arg) +{ + struct named_object *no, *no_tmp; + int i; + + for (i = 0; i < ni->nn_size; i++) { + TAILQ_FOREACH_SAFE(no, &ni->names[i], nn_next, no_tmp) + f(ni, no, arg); + } +} + +/* + * Removes index from given set. + * Returns 0 on success. 
+ */ +int +ipfw_objhash_free_idx(struct namedobj_instance *ni, uint32_t set, uint16_t idx) +{ + u_long *mask; + int i, v; + + i = idx / BLOCK_ITEMS; + v = idx % BLOCK_ITEMS; + + if ((i >= ni->max_blocks) || set >= IPFW_MAX_SETS) + return (1); + + mask = &ni->idx_mask[set * ni->max_blocks + i]; + + if ((*mask & ((u_long)1 << v)) != 0) + return (1); + + /* Mark as free */ + *mask |= (u_long)1 << v; + + /* Update free offset */ + if (ni->free_off[set] > i) + ni->free_off[set] = i; + + return (0); +} + +/* + * Allocates a new index in the given set and stores it in @pidx. + * Returns 0 on success. + */ +int +ipfw_objhash_alloc_idx(void *n, uint32_t set, uint16_t *pidx) +{ + struct namedobj_instance *ni; + u_long *mask; + int i, off, v; + + if (set >= IPFW_MAX_SETS) + return (-1); + + ni = (struct namedobj_instance *)n; + + off = ni->free_off[set]; + mask = &ni->idx_mask[set * ni->max_blocks + off]; + + for (i = off; i < ni->max_blocks; i++, mask++) { + if ((v = ffsl(*mask)) == 0) + continue; + + /* Mark as busy */ + *mask &= ~ ((u_long)1 << (v - 1)); + + ni->free_off[set] = i; + + v = BLOCK_ITEMS * i + v - 1; + + *pidx = v; + return (0); + } + + return (1); +} + /* end of file */ Index: sys/netpfil/ipfw/ip_fw_table.c =================================================================== --- sys/netpfil/ipfw/ip_fw_table.c +++ sys/netpfil/ipfw/ip_fw_table.c @@ -100,6 +100,49 @@ u_int32_t value; }; + /* + * Table has the following `type` concepts: + * + * `type` represents the lookup key type (cidr, ifp, uid, etc.) + * `ftype` is a purely userland field that helps to format table data properly + * `atype` represents the exact lookup algorithm for the given tabletype. + * For example, we can use more efficient search schemes if we plan + * to use some specific table for storing host-routes only. 
+ * + */ +struct table_config { + struct named_object no; + uint8_t ftype; /* format table type */ + uint8_t atype; /* algorithm type */ + uint8_t linked; /* 1 if already linked */ + uint8_t spare0; + uint32_t count; /* Number of records */ + char tablename[64]; /* table name */ + void *state; /* Store some state if needed */ + void *xstate; +}; +#define TABLE_SET(set) ((V_fw_tables_sets != 0) ? (set) : 0) + +struct tables_config { + struct namedobj_instance *namehash; +}; + +static struct table_config *find_table(struct namedobj_instance *ni, + struct tid_info *ti); +static struct table_config *alloc_table_config(struct namedobj_instance *ni, + struct tid_info *ti); +static void free_table_config(struct namedobj_instance *ni, + struct table_config *tc); +static void link_table(struct ip_fw_chain *chain, struct table_config *tc); +static void unlink_table(struct ip_fw_chain *chain, struct table_config *tc); +static int alloc_table_state(void **state, void **xstate, uint8_t type); +static void free_table_state(void **state, void **xstate, uint8_t type); + + +#define CHAIN_TO_TCFG(chain) ((struct tables_config *)(chain)->tblcfg) +#define CHAIN_TO_NI(chain) (CHAIN_TO_TCFG(chain)->namehash) + + /* * The radix code expects addr and mask to be array of bytes, * with the first byte being the length of the array. 
rn_inithead @@ -136,62 +179,68 @@ #endif int -ipfw_add_table_entry(struct ip_fw_chain *ch, uint16_t tbl, void *paddr, - uint8_t plen, uint8_t mlen, uint8_t type, uint32_t value) +ipfw_add_table_entry(struct ip_fw_chain *ch, struct tid_info *ti, + struct tentry_info *tei) { - struct radix_node_head *rnh, **rnh_ptr; + struct radix_node_head *rnh; struct table_entry *ent; struct table_xentry *xent; struct radix_node *rn; in_addr_t addr; int offset; void *ent_ptr; struct sockaddr *addr_ptr, *mask_ptr; + struct table_config *tc, *tc_new; + struct namedobj_instance *ni; char c; + uint8_t mlen; + uint16_t kidx; - if (tbl >= V_fw_tables_max) + if (ti->uidx >= V_fw_tables_max) return (EINVAL); - switch (type) { + mlen = tei->masklen; + + switch (ti->type) { case IPFW_TABLE_CIDR: - if (plen == sizeof(in_addr_t)) { + if (tei->plen == sizeof(in_addr_t)) { #ifdef INET /* IPv4 case */ if (mlen > 32) return (EINVAL); ent = malloc(sizeof(*ent), M_IPFW_TBL, M_WAITOK | M_ZERO); - ent->value = value; + ent->value = tei->value; /* Set 'total' structure length */ KEY_LEN(ent->addr) = KEY_LEN_INET; KEY_LEN(ent->mask) = KEY_LEN_INET; /* Set offset of IPv4 address in bits */ offset = OFF_LEN_INET; - ent->mask.sin_addr.s_addr = htonl(mlen ? ~((1 << (32 - mlen)) - 1) : 0); - addr = *((in_addr_t *)paddr); + ent->mask.sin_addr.s_addr = + htonl(mlen ? 
~((1 << (32 - mlen)) - 1) : 0); + addr = *((in_addr_t *)tei->paddr); ent->addr.sin_addr.s_addr = addr & ent->mask.sin_addr.s_addr; /* Set pointers */ - rnh_ptr = &ch->tables[tbl]; ent_ptr = ent; addr_ptr = (struct sockaddr *)&ent->addr; mask_ptr = (struct sockaddr *)&ent->mask; #endif #ifdef INET6 - } else if (plen == sizeof(struct in6_addr)) { + } else if (tei->plen == sizeof(struct in6_addr)) { /* IPv6 case */ if (mlen > 128) return (EINVAL); xent = malloc(sizeof(*xent), M_IPFW_TBL, M_WAITOK | M_ZERO); - xent->value = value; + xent->value = tei->value; /* Set 'total' structure length */ KEY_LEN(xent->a.addr6) = KEY_LEN_INET6; KEY_LEN(xent->m.mask6) = KEY_LEN_INET6; /* Set offset of IPv6 address in bits */ offset = OFF_LEN_INET6; ipv6_writemask(&xent->m.mask6.sin6_addr, mlen); - memcpy(&xent->a.addr6.sin6_addr, paddr, sizeof(struct in6_addr)); + memcpy(&xent->a.addr6.sin6_addr, tei->paddr, + sizeof(struct in6_addr)); APPLY_MASK(&xent->a.addr6.sin6_addr, &xent->m.mask6.sin6_addr); /* Set pointers */ - rnh_ptr = &ch->xtables[tbl]; ent_ptr = xent; addr_ptr = (struct sockaddr *)&xent->a.addr6; mask_ptr = (struct sockaddr *)&xent->m.mask6; @@ -204,30 +253,30 @@ case IPFW_TABLE_INTERFACE: /* Check if string is terminated */ - c = ((char *)paddr)[IF_NAMESIZE - 1]; - ((char *)paddr)[IF_NAMESIZE - 1] = '\0'; - if (((mlen = strlen((char *)paddr)) == IF_NAMESIZE - 1) && (c != '\0')) + c = ((char *)tei->paddr)[IF_NAMESIZE - 1]; + ((char *)tei->paddr)[IF_NAMESIZE - 1] = '\0'; + mlen = strlen((char *)tei->paddr); + if ((mlen == IF_NAMESIZE - 1) && (c != '\0')) return (EINVAL); /* Include last \0 into comparison */ mlen++; xent = malloc(sizeof(*xent), M_IPFW_TBL, M_WAITOK | M_ZERO); - xent->value = value; + xent->value = tei->value; /* Set 'total' structure length */ KEY_LEN(xent->a.iface) = KEY_LEN_IFACE + mlen; KEY_LEN(xent->m.ifmask) = KEY_LEN_IFACE + mlen; /* Set offset of interface name in bits */ offset = OFF_LEN_IFACE; - memcpy(xent->a.iface.ifname, paddr, mlen); + 
memcpy(xent->a.iface.ifname, tei->paddr, mlen); /* Assume direct match */ /* TODO: Add interface pattern matching */ #if 0 memset(xent->m.ifmask.ifname, 0xFF, IF_NAMESIZE); mask_ptr = (struct sockaddr *)&xent->m.ifmask; #endif /* Set pointers */ - rnh_ptr = &ch->xtables[tbl]; ent_ptr = xent; addr_ptr = (struct sockaddr *)&xent->a.iface; mask_ptr = NULL; @@ -237,84 +286,128 @@ return (EINVAL); } - IPFW_WLOCK(ch); + IPFW_UH_WLOCK(ch); - /* Check if tabletype is valid */ - if ((ch->tabletype[tbl] != 0) && (ch->tabletype[tbl] != type)) { - IPFW_WUNLOCK(ch); - free(ent_ptr, M_IPFW_TBL); - return (EINVAL); - } + ni = CHAIN_TO_NI(ch); - /* Check if radix tree exists */ - if ((rnh = *rnh_ptr) == NULL) { - IPFW_WUNLOCK(ch); - /* Create radix for a new table */ - if (!rn_inithead((void **)&rnh, offset)) { - free(ent_ptr, M_IPFW_TBL); + tc_new = NULL; + if ((tc = find_table(ni, ti)) == NULL) { + /* Not found. We have to create a new one */ + IPFW_UH_WUNLOCK(ch); + + tc_new = alloc_table_config(ni, ti); + if (tc_new == NULL) return (ENOMEM); - } - IPFW_WLOCK(ch); - if (*rnh_ptr != NULL) { - /* Tree is already attached by other thread */ - rn_detachhead((void **)&rnh); - rnh = *rnh_ptr; - /* Check table type another time */ - if (ch->tabletype[tbl] != type) { - IPFW_WUNLOCK(ch); - free(ent_ptr, M_IPFW_TBL); + IPFW_UH_WLOCK(ch); + + /* Check if table has already been allocated by another thread */ + if ((tc = find_table(ni, ti)) != NULL) { + if (tc->no.type != ti->type) { + IPFW_UH_WUNLOCK(ch); + free_table_config(ni, tc_new); return (EINVAL); } } else { - *rnh_ptr = rnh; - /* - * Set table type. It can be set already - * (if we have IPv6-only table) but setting - * it another time does not hurt + /* + * New table. + * Set tc_new to NULL so that it is not freed afterwards. */ - ch->tabletype[tbl] = type; + tc = tc_new; + tc_new = NULL; + + /* Allocate table index. */ + if (ipfw_objhash_alloc_idx(ni, ti->set, &kidx) != 0) { + /* Index full. 
*/ + IPFW_UH_WUNLOCK(ch); + printf("Unable to allocate index for table %s." + " Consider increasing " + "net.inet.ip.fw.tables_max\n", + tc->no.name); + free_table_config(ni, tc); + return (EBUSY); + } + /* Save kidx */ + tc->no.kidx = kidx; } + } else { + /* We still have to check table type */ + if (tc->no.type != ti->type) { + IPFW_UH_WUNLOCK(ch); + return (EINVAL); + } + } + kidx = tc->no.kidx; + + /* We've got valid table in @tc. Let's add data */ + IPFW_WLOCK(ch); + + if (tc->linked == 0) { + link_table(ch, tc); + } + + /* XXX: Temporary until splitting add/del to per-type functions */ + rnh = NULL; + switch (ti->type) { + case IPFW_TABLE_CIDR: + if (tei->plen == sizeof(in_addr_t)) + rnh = ch->tables[kidx]; + else + rnh = ch->xtables[kidx]; + break; + case IPFW_TABLE_INTERFACE: + rnh = ch->xtables[kidx]; + break; } rn = rnh->rnh_addaddr(addr_ptr, mask_ptr, rnh, ent_ptr); IPFW_WUNLOCK(ch); + IPFW_UH_WUNLOCK(ch); + + if (tc_new != NULL) + free_table_config(ni, tc_new); if (rn == NULL) { free(ent_ptr, M_IPFW_TBL); return (EEXIST); } + return (0); } int -ipfw_del_table_entry(struct ip_fw_chain *ch, uint16_t tbl, void *paddr, - uint8_t plen, uint8_t mlen, uint8_t type) +ipfw_del_table_entry(struct ip_fw_chain *ch, struct tid_info *ti, + struct tentry_info *tei) { - struct radix_node_head *rnh, **rnh_ptr; + struct radix_node_head *rnh; struct table_entry *ent; in_addr_t addr; struct sockaddr_in sa, mask; struct sockaddr *sa_ptr, *mask_ptr; + struct table_config *tc; + struct namedobj_instance *ni; char c; + uint8_t mlen; + uint16_t kidx; - if (tbl >= V_fw_tables_max) + if (ti->uidx >= V_fw_tables_max) return (EINVAL); - switch (type) { + mlen = tei->masklen; + + switch (ti->type) { case IPFW_TABLE_CIDR: - if (plen == sizeof(in_addr_t)) { + if (tei->plen == sizeof(in_addr_t)) { /* Set 'total' structure length */ KEY_LEN(sa) = KEY_LEN_INET; KEY_LEN(mask) = KEY_LEN_INET; mask.sin_addr.s_addr = htonl(mlen ? 
~((1 << (32 - mlen)) - 1) : 0); - addr = *((in_addr_t *)paddr); + addr = *((in_addr_t *)tei->paddr); sa.sin_addr.s_addr = addr & mask.sin_addr.s_addr; - rnh_ptr = &ch->tables[tbl]; sa_ptr = (struct sockaddr *)&sa; mask_ptr = (struct sockaddr *)&mask; #ifdef INET6 - } else if (plen == sizeof(struct in6_addr)) { + } else if (tei->plen == sizeof(struct in6_addr)) { /* IPv6 case */ if (mlen > 128) return (EINVAL); @@ -325,9 +418,9 @@ KEY_LEN(sa6) = KEY_LEN_INET6; KEY_LEN(mask6) = KEY_LEN_INET6; ipv6_writemask(&mask6.sin6_addr, mlen); - memcpy(&sa6.sin6_addr, paddr, sizeof(struct in6_addr)); + memcpy(&sa6.sin6_addr, tei->paddr, + sizeof(struct in6_addr)); APPLY_MASK(&sa6.sin6_addr, &mask6.sin6_addr); - rnh_ptr = &ch->xtables[tbl]; sa_ptr = (struct sockaddr *)&sa6; mask_ptr = (struct sockaddr *)&mask6; #endif @@ -339,9 +432,10 @@ case IPFW_TABLE_INTERFACE: /* Check if string is terminated */ - c = ((char *)paddr)[IF_NAMESIZE - 1]; - ((char *)paddr)[IF_NAMESIZE - 1] = '\0'; - if (((mlen = strlen((char *)paddr)) == IF_NAMESIZE - 1) && (c != '\0')) + c = ((char *)tei->paddr)[IF_NAMESIZE - 1]; + ((char *)tei->paddr)[IF_NAMESIZE - 1] = '\0'; + mlen = strlen((char *)tei->paddr); + if ((mlen == IF_NAMESIZE - 1) && (c != '\0')) return (EINVAL); struct xaddr_iface ifname, ifmask; @@ -360,31 +454,49 @@ mask_ptr = (struct sockaddr *)&ifmask; #endif mask_ptr = NULL; - memcpy(ifname.ifname, paddr, mlen); + memcpy(ifname.ifname, tei->paddr, mlen); /* Set pointers */ - rnh_ptr = &ch->xtables[tbl]; sa_ptr = (struct sockaddr *)&ifname; break; default: return (EINVAL); } - IPFW_WLOCK(ch); - if ((rnh = *rnh_ptr) == NULL) { - IPFW_WUNLOCK(ch); + IPFW_UH_RLOCK(ch); + ni = CHAIN_TO_NI(ch); + if ((tc = find_table(ni, ti)) == NULL) { + IPFW_UH_RUNLOCK(ch); return (ESRCH); } - if (ch->tabletype[tbl] != type) { - IPFW_WUNLOCK(ch); + if (tc->no.type != ti->type) { + IPFW_UH_RUNLOCK(ch); return (EINVAL); } + kidx = tc->no.kidx; + + IPFW_WLOCK(ch); + + rnh = NULL; + switch (ti->type) { + case 
IPFW_TABLE_CIDR: + if (tei->plen == sizeof(in_addr_t)) + rnh = ch->tables[kidx]; + else + rnh = ch->xtables[kidx]; + break; + case IPFW_TABLE_INTERFACE: + rnh = ch->xtables[kidx]; + break; + } ent = (struct table_entry *)rnh->rnh_deladdr(sa_ptr, mask_ptr, rnh); IPFW_WUNLOCK(ch); + IPFW_UH_RUNLOCK(ch); + if (ent == NULL) return (ESRCH); @@ -405,102 +517,206 @@ return (0); } +/* + * Flushes all entries in the given table, minimizing the time the chain WLOCK is held. + * + */ int -ipfw_flush_table(struct ip_fw_chain *ch, uint16_t tbl) +ipfw_flush_table(struct ip_fw_chain *ch, struct tid_info *ti) { - struct radix_node_head *rnh, *xrnh; + struct namedobj_instance *ni; + struct table_config *tc; + void *ostate, *oxstate; + void *state, *xstate; + int error; + uint8_t type; + uint16_t kidx; - if (tbl >= V_fw_tables_max) + if (ti->uidx >= V_fw_tables_max) return (EINVAL); /* - * We free both (IPv4 and extended) radix trees and - * clear table type here to permit table to be reused - * for different type without module reload + * Stage 1: determine table type. + * Reference found table to ensure it won't disappear. + */ + IPFW_UH_WLOCK(ch); + ni = CHAIN_TO_NI(ch); + if ((tc = find_table(ni, ti)) == NULL) { + IPFW_UH_WUNLOCK(ch); + return (ESRCH); + } + type = tc->no.type; + tc->no.refcnt++; + IPFW_UH_WUNLOCK(ch); + + /* + * Stage 2: allocate new state for given type. */ + if ((error = alloc_table_state(&state, &xstate, type)) != 0) { + IPFW_UH_WLOCK(ch); + tc->no.refcnt--; + IPFW_UH_WUNLOCK(ch); + return (error); + } + /* + * Stage 3: swap old state pointers with newly-allocated ones. + * Decrease refcount. 
+ */ + IPFW_UH_WLOCK(ch); IPFW_WLOCK(ch); - /* Set IPv4 table pointer to zero */ - if ((rnh = ch->tables[tbl]) != NULL) - ch->tables[tbl] = NULL; - /* Set extended table pointer to zero */ - if ((xrnh = ch->xtables[tbl]) != NULL) - ch->xtables[tbl] = NULL; - /* Zero table type */ - ch->tabletype[tbl] = 0; + + ni = CHAIN_TO_NI(ch); + kidx = tc->no.kidx; + + ostate = ch->tables[kidx]; + ch->tables[kidx] = state; + oxstate = ch->xtables[kidx]; + ch->xtables[kidx] = xstate; + + tc->no.refcnt--; + IPFW_WUNLOCK(ch); + IPFW_UH_WUNLOCK(ch); - if (rnh != NULL) { - rnh->rnh_walktree(rnh, flush_table_entry, rnh); - rn_detachhead((void **)&rnh); + /* + * Stage 4: perform real flush of the old (swapped-out) state. + */ + free_table_state(&ostate, &oxstate, type); + + return (0); +} + +/* + * Destroys given table @ti: unlinks it and frees its state. + */ +int +ipfw_destroy_table(struct ip_fw_chain *ch, struct tid_info *ti, int force) +{ + struct namedobj_instance *ni; + struct table_config *tc; + + ti->set = TABLE_SET(ti->set); + + IPFW_UH_WLOCK(ch); + + ni = CHAIN_TO_NI(ch); + if ((tc = find_table(ni, ti)) == NULL) { + IPFW_UH_WUNLOCK(ch); + return (ESRCH); } - if (xrnh != NULL) { - xrnh->rnh_walktree(xrnh, flush_table_entry, xrnh); - rn_detachhead((void **)&xrnh); + /* Do not permit destroying used tables */ + if (tc->no.refcnt > 0 && force == 0) { + IPFW_UH_WUNLOCK(ch); + return (EBUSY); } + IPFW_WLOCK(ch); + unlink_table(ch, tc); + IPFW_WUNLOCK(ch); + + /* Free obj index */ + if (ipfw_objhash_free_idx(ni, tc->no.set, tc->no.kidx) != 0) + printf("Error unlinking kidx %d from table %s\n", + tc->no.kidx, tc->tablename); + + IPFW_UH_WUNLOCK(ch); + + free_table_config(ni, tc); + return (0); } +static void +destroy_table_locked(struct namedobj_instance *ni, struct named_object *no, + void *arg) +{ + + unlink_table((struct ip_fw_chain *)arg, (struct table_config *)no); + if (ipfw_objhash_free_idx(ni, no->set, no->kidx) != 0) + printf("Error unlinking kidx %d from table %s\n", + no->kidx, 
no->name); + free_table_config(ni, 
(struct table_config *)no); +} + void ipfw_destroy_tables(struct ip_fw_chain *ch) { - uint16_t tbl; - /* Flush all tables */ - for (tbl = 0; tbl < V_fw_tables_max; tbl++) - ipfw_flush_table(ch, tbl); + /* Remove all tables from working set */ + IPFW_UH_WLOCK(ch); + IPFW_WLOCK(ch); + ipfw_objhash_foreach(CHAIN_TO_NI(ch), destroy_table_locked, ch); + IPFW_WUNLOCK(ch); + IPFW_UH_WUNLOCK(ch); /* Free pointers itself */ free(ch->tables, M_IPFW); free(ch->xtables, M_IPFW); - free(ch->tabletype, M_IPFW); + + ipfw_objhash_destroy(CHAIN_TO_NI(ch)); + free(CHAIN_TO_TCFG(ch), M_IPFW); } int ipfw_init_tables(struct ip_fw_chain *ch) { + struct tables_config *tcfg; + /* Allocate pointers */ ch->tables = malloc(V_fw_tables_max * sizeof(void *), M_IPFW, M_WAITOK | M_ZERO); ch->xtables = malloc(V_fw_tables_max * sizeof(void *), M_IPFW, M_WAITOK | M_ZERO); - ch->tabletype = malloc(V_fw_tables_max * sizeof(uint8_t), M_IPFW, M_WAITOK | M_ZERO); + + tcfg = malloc(sizeof(struct tables_config), M_IPFW, M_WAITOK | M_ZERO); + tcfg->namehash = ipfw_objhash_create(V_fw_tables_max); + ch->tblcfg = tcfg; + return (0); } int ipfw_resize_tables(struct ip_fw_chain *ch, unsigned int ntables) { struct radix_node_head **tables, **xtables, *rnh; struct radix_node_head **tables_old, **xtables_old; - uint8_t *tabletype, *tabletype_old; unsigned int ntables_old, tbl; + struct namedobj_instance *ni; + void *new_idx; + int new_blocks; /* Check new value for validity */ if (ntables > IPFW_TABLES_MAX) ntables = IPFW_TABLES_MAX; /* Allocate new pointers */ tables = malloc(ntables * sizeof(void *), M_IPFW, M_WAITOK | M_ZERO); xtables = malloc(ntables * sizeof(void *), M_IPFW, M_WAITOK | M_ZERO); - tabletype = malloc(ntables * sizeof(uint8_t), M_IPFW, M_WAITOK | M_ZERO); + ipfw_objhash_bitmap_alloc(ntables, (void *)&new_idx, &new_blocks); IPFW_WLOCK(ch); tbl = (ntables >= V_fw_tables_max) ? 
V_fw_tables_max : ntables; + ni = CHAIN_TO_NI(ch); + + /* Temporarily restrict decreasing tables_max */ + if (ipfw_objhash_bitmap_merge(ni, &new_idx, &new_blocks) != 0) { + IPFW_WUNLOCK(ch); + free(tables, M_IPFW); + free(xtables, M_IPFW); + ipfw_objhash_bitmap_free(new_idx, new_blocks); + return (EINVAL); + } /* Copy old table pointers */ memcpy(tables, ch->tables, sizeof(void *) * tbl); memcpy(xtables, ch->xtables, sizeof(void *) * tbl); - memcpy(tabletype, ch->tabletype, sizeof(uint8_t) * tbl); /* Change pointers and number of tables */ tables_old = ch->tables; xtables_old = ch->xtables; - tabletype_old = ch->tabletype; ch->tables = tables; ch->xtables = xtables; - ch->tabletype = tabletype; ntables_old = V_fw_tables_max; V_fw_tables_max = ntables; @@ -525,7 +741,7 @@ /* Free old pointers */ free(tables_old, M_IPFW); free(xtables_old, M_IPFW); - free(tabletype_old, M_IPFW); + ipfw_objhash_bitmap_free(new_idx, new_blocks); return (0); } @@ -602,14 +818,17 @@ } int -ipfw_count_table(struct ip_fw_chain *ch, uint32_t tbl, uint32_t *cnt) +ipfw_count_table(struct ip_fw_chain *ch, struct tid_info *ti, uint32_t *cnt) { struct radix_node_head *rnh; + struct table_config *tc; - if (tbl >= V_fw_tables_max) + if (ti->uidx >= V_fw_tables_max) return (EINVAL); + if ((tc = find_table(CHAIN_TO_NI(ch), ti)) == NULL) + return (ESRCH); *cnt = 0; - if ((rnh = ch->tables[tbl]) == NULL) + if ((rnh = ch->tables[tc->no.kidx]) == NULL) return (0); rnh->rnh_walktree(rnh, count_table_entry, cnt); return (0); @@ -637,14 +856,17 @@ } int -ipfw_dump_table(struct ip_fw_chain *ch, ipfw_table *tbl) +ipfw_dump_table(struct ip_fw_chain *ch, struct tid_info *ti, ipfw_table *tbl) { struct radix_node_head *rnh; + struct table_config *tc; - if (tbl->tbl >= V_fw_tables_max) + if (ti->uidx >= V_fw_tables_max) return (EINVAL); + if ((tc = find_table(CHAIN_TO_NI(ch), ti)) == NULL) + return (ESRCH); tbl->cnt = 0; - if ((rnh = ch->tables[tbl->tbl]) == NULL) + if ((rnh = ch->tables[tc->no.kidx]) == NULL) 
return (0); rnh->rnh_walktree(rnh, dump_table_entry, tbl); return (0); @@ -660,16 +882,19 @@ } int -ipfw_count_xtable(struct ip_fw_chain *ch, uint32_t tbl, uint32_t *cnt) +ipfw_count_xtable(struct ip_fw_chain *ch, struct tid_info *ti, uint32_t *cnt) { struct radix_node_head *rnh; + struct table_config *tc; - if (tbl >= V_fw_tables_max) + if (ti->uidx >= V_fw_tables_max) return (EINVAL); *cnt = 0; - if ((rnh = ch->tables[tbl]) != NULL) + if ((tc = find_table(CHAIN_TO_NI(ch), ti)) == NULL) + return (0); /* XXX: We should return ESRCH */ + if ((rnh = ch->tables[tc->no.kidx]) != NULL) rnh->rnh_walktree(rnh, count_table_xentry, cnt); - if ((rnh = ch->xtables[tbl]) != NULL) + if ((rnh = ch->xtables[tc->no.kidx]) != NULL) rnh->rnh_walktree(rnh, count_table_xentry, cnt); /* Return zero if table is empty */ if (*cnt > 0) @@ -747,19 +972,700 @@ } int -ipfw_dump_xtable(struct ip_fw_chain *ch, ipfw_xtable *tbl) +ipfw_dump_xtable(struct ip_fw_chain *ch, struct tid_info *ti, ipfw_xtable *tbl) { struct radix_node_head *rnh; + struct table_config *tc; if (tbl->tbl >= V_fw_tables_max) return (EINVAL); tbl->cnt = 0; - tbl->type = ch->tabletype[tbl->tbl]; - if ((rnh = ch->tables[tbl->tbl]) != NULL) + + if ((tc = find_table(CHAIN_TO_NI(ch), ti)) == NULL) + return (0); /* XXX: We should return ESRCH */ + tbl->type = tc->no.type; + if ((rnh = ch->tables[tc->no.kidx]) != NULL) rnh->rnh_walktree(rnh, dump_table_xentry_base, tbl); - if ((rnh = ch->xtables[tbl->tbl]) != NULL) + if ((rnh = ch->xtables[tc->no.kidx]) != NULL) rnh->rnh_walktree(rnh, dump_table_xentry_extended, tbl); return (0); } +/* + * Tables rewriting code + * + */ + +/* + * Determine table number and lookup type for @cmd. + * Fill @tbl and @type with appropriate values. + * Returns 0 for relevant opcodes, 1 otherwise. 
+ */ +static int +classify_table_opcode(ipfw_insn *cmd, uint16_t *puidx, uint8_t *ptype) +{ + ipfw_insn_if *cmdif; + int skip; + uint16_t v; + + skip = 1; + + switch (cmd->opcode) { + case O_IP_SRC_LOOKUP: + case O_IP_DST_LOOKUP: + /* Basic IPv4/IPv6 or u32 lookups */ + *puidx = cmd->arg1; + /* Assume CIDR by default */ + *ptype = IPFW_TABLE_CIDR; + skip = 0; + + if (F_LEN(cmd) > F_INSN_SIZE(ipfw_insn_u32)) { + /* + * generic lookup. The key must be + * in 32bit big-endian format. + */ + v = ((ipfw_insn_u32 *)cmd)->d[1]; + switch (v) { + case 0: + case 1: + /* IPv4 src/dst */ + break; + case 2: + case 3: + /* src/dst port */ + //type = IPFW_TABLE_U16; + break; + case 4: + /* uid/gid */ + //type = IPFW_TABLE_U32; + case 5: + //type = IPFW_TABLE_U32; + /* jid */ + case 6: + //type = IPFW_TABLE_U16; + /* dscp */ + break; + } + } + break; + case O_XMIT: + case O_RECV: + case O_VIA: + /* Interface table, possibly */ + cmdif = (ipfw_insn_if *)cmd; + if (cmdif->name[0] != '\1') + break; + + *ptype = IPFW_TABLE_INTERFACE; + *puidx = cmdif->p.glob; + skip = 0; + break; + } + + return (skip); +} + +/* + * Sets new table value for given opcode. 
+ * Assume the same opcodes as classify_table_opcode() + */ +static void +update_table_opcode(ipfw_insn *cmd, uint16_t idx) +{ + ipfw_insn_if *cmdif; + + switch (cmd->opcode) { + case O_IP_SRC_LOOKUP: + case O_IP_DST_LOOKUP: + /* Basic IPv4/IPv6 or u32 lookups */ + cmd->arg1 = idx; + break; + case O_XMIT: + case O_RECV: + case O_VIA: + /* Interface table, possibly */ + cmdif = (ipfw_insn_if *)cmd; + cmdif->p.glob = idx; + break; + } +} + +static char * +find_name_tlv(void *tlvs, int len, uint16_t uidx) +{ + ipfw_xtable_ntlv *ntlv; + uintptr_t pa, pe; + int l; + + pa = (uintptr_t)tlvs; + pe = pa + len; + l = 0; + for (; pa < pe; pa += l) { + ntlv = (ipfw_xtable_ntlv *)pa; + l = ntlv->head.length; + if (ntlv->head.type != IPFW_TLV_NAME) + continue; + if (ntlv->idx != uidx) + continue; + + return (ntlv->name); + } + + return (NULL); +} + +static struct table_config * +find_table(struct namedobj_instance *ni, struct tid_info *ti) +{ + char *name, bname[16]; + struct named_object *no; + + if (ti->tlvs != NULL) { + name = find_name_tlv(ti->tlvs, ti->tlen, ti->uidx); + if (name == NULL) + return (NULL); + } else { + snprintf(bname, sizeof(bname), "%d", ti->uidx); + name = bname; + } + + no = ipfw_objhash_lookup_name(ni, ti->set, name); + + return ((struct table_config *)no); +} + +static int +alloc_table_state(void **state, void **xstate, uint8_t type) +{ + + switch (type) { + case IPFW_TABLE_CIDR: + if (!rn_inithead(state, OFF_LEN_INET)) + return (ENOMEM); + if (!rn_inithead(xstate, OFF_LEN_INET6)) { + rn_detachhead(state); + return (ENOMEM); + } + break; + case IPFW_TABLE_INTERFACE: + *state = NULL; + if (!rn_inithead(xstate, OFF_LEN_IFACE)) + return (ENOMEM); + break; + } + + return (0); +} + + +static struct table_config * +alloc_table_config(struct namedobj_instance *ni, struct tid_info *ti) +{ + char *name, bname[16]; + struct table_config *tc; + int error; + + if (ti->tlvs != NULL) { + name = find_name_tlv(ti->tlvs, ti->tlen, ti->uidx); + if (name == NULL) + return 
(NULL); + } else { + snprintf(bname, sizeof(bname), "%d", ti->uidx); + name = bname; + } + + tc = malloc(sizeof(struct table_config), M_IPFW, M_WAITOK | M_ZERO); + tc->no.name = tc->tablename; + tc->no.type = ti->type; + tc->no.set = ti->set; + strlcpy(tc->tablename, name, sizeof(tc->tablename)); + + if (ti->tlvs == NULL) { + tc->no.compat = 1; + tc->no.uidx = ti->uidx; + } + + /* Preallocate data structures for new tables */ + error = alloc_table_state(&tc->state, &tc->xstate, ti->type); + if (error != 0) { + free(tc, M_IPFW); + return (NULL); + } + + return (tc); +} + +static void +free_table_state(void **state, void **xstate, uint8_t type) +{ + struct radix_node_head *rnh; + + switch (type) { + case IPFW_TABLE_CIDR: + rnh = (struct radix_node_head *)(*state); + rnh->rnh_walktree(rnh, flush_table_entry, rnh); + rn_detachhead(state); + + rnh = (struct radix_node_head *)(*xstate); + rnh->rnh_walktree(rnh, flush_table_entry, rnh); + rn_detachhead(xstate); + break; + case IPFW_TABLE_INTERFACE: + rnh = (struct radix_node_head *)(*xstate); + rnh->rnh_walktree(rnh, flush_table_entry, rnh); + rn_detachhead(xstate); + break; + } +} + +static void +free_table_config(struct namedobj_instance *ni, struct table_config *tc) +{ + + if (tc->linked == 0) + free_table_state(&tc->state, &tc->xstate, tc->no.type); + + free(tc, M_IPFW); +} + +/* + * Links @tc to @chain table named instance. + * Sets appropriate type/states in @chain table info. + */ +static void +link_table(struct ip_fw_chain *chain, struct table_config *tc) +{ + struct namedobj_instance *ni; + uint16_t kidx; + + IPFW_UH_WLOCK_ASSERT(chain); + IPFW_WLOCK_ASSERT(chain); + + ni = CHAIN_TO_NI(chain); + kidx = tc->no.kidx; + + ipfw_objhash_add(ni, &tc->no); + chain->tables[kidx] = tc->state; + chain->xtables[kidx] = tc->xstate; + + tc->linked = 1; +} + +/* + * Unlinks @tc from @chain table named instance. + * Zeroes states in @chain and stores them in @tc. 
+ */ +static void +unlink_table(struct ip_fw_chain *chain, struct table_config *tc) +{ + struct namedobj_instance *ni; + uint16_t kidx; + + IPFW_UH_WLOCK_ASSERT(chain); + IPFW_WLOCK_ASSERT(chain); + + ni = CHAIN_TO_NI(chain); + kidx = tc->no.kidx; + + /* Clear state and save pointers for flush */ + ipfw_objhash_del(ni, &tc->no); + tc->state = chain->tables[kidx]; + chain->tables[kidx] = NULL; + tc->xstate = chain->xtables[kidx]; + chain->xtables[kidx] = NULL; + + tc->linked = 0; +} + +/* + * Finds named object by @uidx number. + * References the found object, or allocates a new index for a non-existing one. + * Fills in @pidx with userland/kernel indexes. + * + * Returns 0 on success. + */ +static int +bind_table(struct namedobj_instance *ni, struct rule_check_info *ci, + struct obj_idx *pidx, struct tid_info *ti) +{ + struct table_config *tc; + + tc = find_table(ni, ti); + + pidx->uidx = ti->uidx; + pidx->type = ti->type; + + if (tc == NULL) { + /* Table does not exist yet: allocate a new index */ + if (ipfw_objhash_alloc_idx(ni, ti->set, &pidx->kidx) != 0) { + printf("Unable to allocate table index in set %u." + " Consider increasing net.inet.ip.fw.tables_max\n", + ti->set); + return (EBUSY); + } + + pidx->new = 1; + ci->new_tables++; + + return (0); + } + + /* Check if table type is valid first */ + if (tc->no.type != ti->type) + return (EINVAL); + + tc->no.refcnt++; + + pidx->kidx = tc->no.kidx; + + return (0); +} + +/* + * Compatibility function for old ipfw(8) binaries. + * Rewrites table kernel indices with userland ones. + * Works for \d+ tables only (e.g. for tables converted + * from old numbered system calls). + * + * Returns 0 on success. + * Raises error on any other tables. 
+ */ +int +ipfw_rewrite_table_kidx(struct ip_fw_chain *chain, struct ip_fw *rule) +{ + int cmdlen, l; + ipfw_insn *cmd; + uint32_t set; + uint16_t kidx; + uint8_t type; + struct named_object *no; + struct namedobj_instance *ni; + + ni = CHAIN_TO_NI(chain); + + set = TABLE_SET(rule->set); + + l = rule->cmd_len; + cmd = rule->cmd; + cmdlen = 0; + for ( ; l > 0 ; l -= cmdlen, cmd += cmdlen) { + cmdlen = F_LEN(cmd); + + if (classify_table_opcode(cmd, &kidx, &type) != 0) + continue; + + if ((no = ipfw_objhash_lookup_idx(ni, set, kidx)) == NULL) + return (1); + + if (no->compat == 0) + return (2); + + update_table_opcode(cmd, no->uidx); + } + + return (0); +} + + +/* + * Checks if opcode is referencing a table of the appropriate type. + * Adds reference count for found table if true. + * Rewrites user-supplied opcode values with kernel ones. + * + * Returns 0 on success and appropriate error code otherwise. + */ +int +ipfw_rewrite_table_uidx(struct ip_fw_chain *chain, + struct rule_check_info *ci) +{ + int cmdlen, error, ftype, l; + ipfw_insn *cmd; + uint16_t uidx; + uint8_t type; + struct table_config *tc; + struct namedobj_instance *ni; + struct named_object *no, *no_n, *no_tmp; + struct obj_idx *pidx, *p, *oib; + struct namedobjects_head nh; + struct tid_info ti; + + ni = CHAIN_TO_NI(chain); + + /* + * Prepare an array for storing opcode indices. + * Use stack allocation by default. 
+ */ + if (ci->table_opcodes <= (sizeof(ci->obuf)/sizeof(ci->obuf[0]))) { + /* Stack */ + pidx = ci->obuf; + } else + pidx = malloc(ci->table_opcodes * sizeof(struct obj_idx), + M_IPFW, M_WAITOK | M_ZERO); + + oib = pidx; + error = 0; + + type = 0; + ftype = 0; + + ci->tableset = TABLE_SET(ci->krule->set); + + memset(&ti, 0, sizeof(ti)); + ti.set = ci->tableset; + ti.tlvs = ci->tlvs; + ti.tlen = ci->tlen; + + /* + * Stage 1: reference existing tables and determine number + * of tables we need to allocate + */ + IPFW_UH_WLOCK(chain); + + l = ci->krule->cmd_len; + cmd = ci->krule->cmd; + cmdlen = 0; + for ( ; l > 0 ; l -= cmdlen, cmd += cmdlen) { + cmdlen = F_LEN(cmd); + + if (classify_table_opcode(cmd, &ti.uidx, &ti.type) != 0) + continue; + + /* + * Got table opcode with necessary info. + * Try to reference existing tables and allocate + * indices for non-existing one while holding write lock. + */ + if ((error = bind_table(ni, ci, pidx, &ti)) != 0) + break; + + /* + * @pidx stores either existing ref'd table id or new one. 
+ * Move to next index + */ + + pidx++; + } + + if (error != 0) { + /* Unref everything we have already done */ + for (p = oib; p < pidx; p++) { + if (p->new != 0) { + ipfw_objhash_free_idx(ni, ci->tableset,p->kidx); + continue; + } + + /* Find & unref by existing idx */ + no = ipfw_objhash_lookup_idx(ni, ci->tableset, p->kidx); + KASSERT(no!=NULL, ("Ref'd table %d disappeared", + p->kidx)); + + no->refcnt--; + } + + IPFW_UH_WUNLOCK(chain); + + if (oib != ci->obuf) + free(oib, M_IPFW); + + return (error); + } + + IPFW_UH_WUNLOCK(chain); + + /* + * Stage 2: allocate table configs for every non-existent table + */ + + if (ci->new_tables > 0) { + /* Prepare queue to store configs */ + TAILQ_INIT(&nh); + + for (p = oib; p < pidx; p++) { + if (p->new == 0) + continue; + + /* TODO: get name from TLV */ + ti.uidx = p->uidx; + ti.type = p->type; + + tc = alloc_table_config(ni, &ti); + + if (tc == NULL) { + error = ENOMEM; + goto free; + } + + tc->no.kidx = p->kidx; + tc->no.refcnt = 1; + + /* Add to list */ + TAILQ_INSERT_TAIL(&nh, &tc->no, nn_next); + } + + /* + * Stage 2.1: Check if we're going to create 2 tables + * with the same name, but different table types. + */ + TAILQ_FOREACH(no, &nh, nn_next) { + TAILQ_FOREACH(no_tmp, &nh, nn_next) { + if (strcmp(no->name, no_tmp->name) != 0) + continue; + if (no->type != no_tmp->type) { + error = EINVAL; + goto free; + } + } + } + + /* + * Stage 3: link & reference new table configs + */ + + IPFW_UH_WLOCK(chain); + + /* + * Step 3.1: Check if some tables we need to create have been + * already created with different table type. + */ + + error = 0; + TAILQ_FOREACH_SAFE(no, &nh, nn_next, no_tmp) { + no_n = ipfw_objhash_lookup_name(ni, no->set, no->name); + if (no_n == NULL) + continue; + + if (no_n->type != no->type) { + error = EINVAL; + break; + } + + } + + if (error != 0) { + /* + * Someone has allocated table with different table type. + * We have to rollback everything. 
+			 */
+			IPFW_UH_WUNLOCK(chain);
+
+			goto free;
+		}
+
+
+		/*
+		 * Finally, attach tables and rewrite rule.
+		 * We need to set table type for each new table,
+		 * so we have to acquire main WLOCK.
+		 */
+		IPFW_WLOCK(chain);
+		TAILQ_FOREACH_SAFE(no, &nh, nn_next, no_tmp) {
+			no_n = ipfw_objhash_lookup_name(ni, no->set, no->name);
+			if (no_n != NULL) {
+				/* Increase refcount for existing table */
+				no_n->refcnt++;
+				/* Keep oib array in sync: update kidx */
+				for (p = oib; p < pidx; p++) {
+					if (p->kidx == no->kidx) {
+						p->kidx = no_n->kidx;
+						break;
+					}
+				}
+
+				continue;
+			}
+
+			/* New table. Attach to runtime hash */
+			TAILQ_REMOVE(&nh, no, nn_next);
+
+			link_table(chain, (struct table_config *)no);
+		}
+		IPFW_WUNLOCK(chain);
+
+		/* Perform rule rewrite */
+		l = ci->krule->cmd_len;
+		cmd = ci->krule->cmd;
+		cmdlen = 0;
+		pidx = oib;
+		for ( ; l > 0 ; l -= cmdlen, cmd += cmdlen) {
+			cmdlen = F_LEN(cmd);
+
+			if (classify_table_opcode(cmd, &uidx, &type) != 0)
+				continue;
+			update_table_opcode(cmd, pidx->kidx);
+			pidx++;
+		}
+
+		IPFW_UH_WUNLOCK(chain);
+	}
+
+	error = 0;
+
+	/*
+	 * Stage 4: free resources
+	 */
+free:
+	/*
+	 * Free every config still queued in nh, i.e. not linked above.
+	 * Free the entry being visited, not the last allocated tc.
+	 */
+	TAILQ_FOREACH_SAFE(no, &nh, nn_next, no_tmp)
+		free_table_config(ni, (struct table_config *)no);
+
+	if (oib != ci->obuf)
+		free(oib, M_IPFW);
+
+	return (error);
+}
+
+/*
+ * Remove references from every table used in @rule.
+ */
+void
+ipfw_unbind_table_rule(struct ip_fw_chain *chain, struct ip_fw *rule)
+{
+	int cmdlen, l;
+	ipfw_insn *cmd;
+	struct namedobj_instance *ni;
+	struct named_object *no;
+	uint32_t set;
+	uint16_t kidx;
+	uint8_t type;
+
+	ni = CHAIN_TO_NI(chain);
+
+	set = TABLE_SET(rule->set);
+
+	l = rule->cmd_len;
+	cmd = rule->cmd;
+	cmdlen = 0;
+	for ( ; l > 0 ; l -= cmdlen, cmd += cmdlen) {
+		cmdlen = F_LEN(cmd);
+
+		if (classify_table_opcode(cmd, &kidx, &type) != 0)
+			continue;
+
+		no = ipfw_objhash_lookup_idx(ni, set, kidx);
+
+		KASSERT(no != NULL, ("table id %d not found", kidx));
+		KASSERT(no->type == type, ("wrong type %d (%d) for table id %d",
+		    no->type, type, kidx));
+		KASSERT(no->refcnt > 0, ("refcount for table %d is %d",
+		    kidx, no->refcnt));
+
+		no->refcnt--;
+	}
+}
+
+
+/*
+ * Removes table bindings for every rule in rule chain @head.
+ */
+void
+ipfw_unbind_table_list(struct ip_fw_chain *chain, struct ip_fw *head)
+{
+	struct ip_fw *rule;
+
+	while ((rule = head) != NULL) {
+		head = head->x_next;
+		ipfw_unbind_table_rule(chain, rule);
+	}
+}
+
+
 /* end of file */

--------------060502050508080706040508--