Date: Mon, 19 May 2003 00:49:14 -0700 From: Luigi Rizzo <luigi@freebsd.org> To: freebsd-alpha@freebsd.org Subject: [FEEDBACK REQUIRED] patch for ipfw/ipfw2 on alpha/sparc64 Message-ID: <20030519004914.B46011@xorpc.icir.org>
next in thread | raw e-mail | index | archive | help
--tThc/1wpZn/ma/RB Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Hi, in the past there have been reports about panics on 64 bit architectures due to unaligned accesses to parts of the ipfw2 rules. The attached patch tries to fix them. I tested that it compiles, but couldn't run it on the relevant architectures for lack of hardware, and have not found yet anyone who wanted to try it. If you care and want to spend a few minutes to test the patch on a Sparc64 or Alpha box and provide feedback, I'd ask you to do the following: STEP 1: try to reproduce the panic + compile a kernel (a recent CURRENT or STABLE) WITHOUT THE PATCH (i.e. with standard sources) with the following options: options IPFIREWALL options DUMMYNET options IPFW2 # only in STABLE (on STABLE, make sure you also have rebuilt ipfw with make -DIPFW2) + [THIS SHOULD TRIGGER A PANIC] reboot with the new kernel and configure ipfw with one of the following configurations, and fire some traffic at the firewall (e.g. "ssh localhost") to try and cause a panic. If these configuration do not cause a panic and you know a better one, please post it. ipfw pipe 1 config ipfw add pipe 1 tcp from any to any 1-65535,1-65535 or ipfw pipe 1 config ipfw add pipe 1 tcp from any to any 1-65535,1-65535,1-65535 or ipfw add allow tcp from any to any 1-65535,1-65535 or ipfw add allow tcp from any to any 1-65535,1-65535,1-65535 STEP 2: test the patch + patch sys/netinet/ and sbin/ipfw/ with the attached patch, rebuild and reinstall kernel and ipfw, and try again to see if the panic does not occur anymore. The patch is for -current, but it should apply to -stable with relatively little effort (see the description below). Feedback would be really appreciated if you want to have this fixed for 5.1 cheers luigi --------------------------------------- Description and proposed patch for ipfw[2] panics on 64-bit architectures. BACKGROUND ipfw2 has been designed so that rules are made of a standard header, followed by a sequence of [micro]instructions (represented by C struct's), each one having a length of a multiple of 32 bits. Because instructions are written one after the other without padding, variables of size > 32bit which reside within each struct might end up unaligned, and the compiler cannot help fixing this. I am listing below some of the approaches that can be used to avoid the problem; because such occurences are extremely rare (there is actually only one instance, "void *pipe_ptr" within "struct _ipfw_insn_pipe"), i believe the first approach to be the easiest to implement and also the most effective in terms of memory occupation and code modifications. Solution #1: handle the critical 64-bit objects with bcopy/bcmp instead of accessing them directly. This is slightly inefficient but fortunately the only instance when it needs to be done is when passing a packet to a dummynet pipe. Also, the size of the copy/compare is known at compile time, so i suppose it might be candidate to some compiler optimizations. Solution #2: make the size of each microinstruction a multiple of 64 bits, so the alignment can be guaranteed by the compiler. This would generate quite an increase of the instruction sizes, and more importantly, would require a significant revision of the ipfw code. The good thing of this approach is that, once done, would put on the compiler the burden of providing the alignment. Solution #3: possibly introduce a padding (in the form of NOP instructions) before those instructions which need alignment. This also requires some amount of extra code to introduce the padding when necessary, and leaves the burden of providing the aligment on the programmer. DESCRIPTION OF THE ATTACHED PATCH The attached patch modifies the following files: src/sys/netinet/ip_dummynet.c src/sys/netinet/ip_fw.h src/sys/netinet/ip_fw2.c src/sbin/ipfw2.c and the changes are the following: sys/netinet/ip_dummynet.c: use bcopy to read/write the "pipe_ptr" field of "struct _ipfw_insn_pipe" on non-i386 architectures (which might be sensitive to alignment problems). This is invoked only in two places, when passing a packet to a dummynet pipe. src/sys/netinet/ip_fw.h @@ -32,11 +32,16 @@ add a comment to explain the alignment problems. @@ -211,7 +216,7 @@ change "int unit" to "int32_t unit" to save space on those architectures where int are 64 bits. @@ -220,10 +225,13 @@ add a comment explaining that "void *pipe_ptr" must be treated carefully @@ -278,7 +286,9 @@ remove the unnecessary field "set_disable" from "struct ip_fw". The set of disabled rulesets is passed up to userspace through the "next rule" pointer (unused otherwise, in that context), which is manipulated using bcopy to avoid aligmnent problems. NOTE: by doing this, we revert to the method used by ipfw2 in 4.x, so we should save the change in that branch. @@ -319,12 +329,12 @@ reorder some fields of _ipfw_dyn_rule to reduce the amount of padding. NOTE: This change is not strictly necessary, and could be omitted if we want to keep the struct unchanged (this is only true for the to 4.x branch, because the "rulenum" field from the same struct should be removed anyways, see the diff below). @@ -334,7 +344,9 @@ remove the "rulenum" field from struct _ipfw_dyn_rule, same as done for "set_disable" in "struct ip_fw". sys/netinet/ip_fw2.c @@ -2017,8 +2017,15 @@ use bcmp/bzero to manipulate pipe_ptr. As the comment says, this is done very infrequently so efficiency is not a concern. @@ -2559,7 +2566,8 @@ @@ -2571,7 +2579,8 @@ use bcopy to pass up info from the kernel to userland through an unused pointer, to avoid aligmnent problems. sbin/ipfw/ipfw2.c @@ -813,8 +813,9 @@ @@ -1454,7 +1458,9 @@ use bcopy to fetch the "set_disable" info into a u_int32_t variable @@ -1213,13 +1214,16 @@ @@ -1690,9 +1696,12 @@ use bcopy to fetch the "rulenum" info into a uint16_t variable. ------------------------------ Index: sys/netinet/ip_dummynet.c =================================================================== RCS file: /home/ncvs/src/sys/netinet/ip_dummynet.c,v retrieving revision 1.63 diff -u -r1.63 ip_dummynet.c --- sys/netinet/ip_dummynet.c 27 Mar 2003 15:00:10 -0000 1.63 +++ sys/netinet/ip_dummynet.c 11 May 2003 07:19:56 -0000 @@ -1027,7 +1027,11 @@ if (cmd->opcode == O_LOG) cmd += F_LEN(cmd); +#ifdef I386 fs = ((ipfw_insn_pipe *)cmd)->pipe_ptr; +#else + bcopy(& ((ipfw_insn_pipe *)cmd)->pipe_ptr, &fs, sizeof(fs)); +#endif if (fs != NULL) return fs; @@ -1049,7 +1053,11 @@ } /* record for the future */ #if IPFW2 +#ifdef I386 ((ipfw_insn_pipe *)cmd)->pipe_ptr = fs; +#else + bcopy(&fs, & ((ipfw_insn_pipe *)cmd)->pipe_ptr, sizeof(fs)); +#endif #else if (fs != NULL) rule->pipe_ptr = fs; Index: sys/netinet/ip_fw.h =================================================================== RCS file: /home/ncvs/src/sys/netinet/ip_fw.h,v retrieving revision 1.76 diff -u -r1.76 ip_fw.h --- sys/netinet/ip_fw.h 15 Mar 2003 01:13:00 -0000 1.76 +++ sys/netinet/ip_fw.h 11 May 2003 07:20:28 -0000 @@ -32,11 +32,16 @@ * The kernel representation of ipfw rules is made of a list of * 'instructions' (for all practical purposes equivalent to BPF * instructions), which specify which fields of the packet - * (or its metatada) should be analysed. + * (or its metadata) should be analysed. * * Each instruction is stored in a structure which begins with * "ipfw_insn", and can contain extra fields depending on the * instruction type (listed below). + * Note that the code is written so that individual instructions + * have a size which is a multiple of 32 bits. This means that, if + * such structures contain pointers or other 64-bit entities, + * (there is just one instance now) they may end up unaligned on + * 64-bit architectures, so the must be handled with care. * * "enum ipfw_opcodes" are the opcodes supported. We can have up * to 256 different opcodes. @@ -211,7 +216,7 @@ ipfw_insn o; union { struct in_addr ip; - int unit; + int32_t unit; } p; char name[IFNAMSIZ]; } ipfw_insn_if; @@ -220,10 +225,13 @@ * This is used for pipe and queue actions, which need to store * a single pointer (which can have different size on different * architectures. + * Note that, because of previous instructions, pipe_ptr might + * be unaligned in the overall structure, so it needs to be + * manipulated with care. */ typedef struct _ipfw_insn_pipe { ipfw_insn o; - void *pipe_ptr; + void *pipe_ptr; /* XXX */ } ipfw_insn_pipe; /* @@ -278,7 +286,9 @@ struct ip_fw { struct ip_fw *next; /* linked list of rules */ struct ip_fw *next_rule; /* ptr to next [skipto] rule */ +#if 0 /* passed up using 'next_rule' */ u_int32_t set_disable; /* disabled sets (for userland) */ +#endif u_int16_t act_ofs; /* offset of action in 32-bit units */ u_int16_t cmd_len; /* # of 32-bit words in cmd */ u_int16_t rulenum; /* rule number */ @@ -319,12 +329,12 @@ struct _ipfw_dyn_rule { ipfw_dyn_rule *next; /* linked list of rules. */ - struct ipfw_flow_id id; /* (masked) flow id */ struct ip_fw *rule; /* pointer to rule */ ipfw_dyn_rule *parent; /* pointer to parent rule */ - u_int32_t expire; /* expire time */ u_int64_t pcnt; /* packet match counter */ u_int64_t bcnt; /* byte match counter */ + struct ipfw_flow_id id; /* (masked) flow id */ + u_int32_t expire; /* expire time */ u_int32_t bucket; /* which bucket in hash table */ u_int32_t state; /* state of this rule (typically a * combination of TCP flags) @@ -334,7 +344,9 @@ /* to generate keepalives) */ u_int16_t dyn_type; /* rule type */ u_int16_t count; /* refcount */ +#if 0 /* passed up with 'rule' */ u_int16_t rulenum; /* rule number (for userland) */ +#endif }; /* Index: sys/netinet/ip_fw2.c =================================================================== RCS file: /home/ncvs/src/sys/netinet/ip_fw2.c,v retrieving revision 1.28 diff -u -r1.28 ip_fw2.c --- sys/netinet/ip_fw2.c 15 Mar 2003 01:13:00 -0000 1.28 +++ sys/netinet/ip_fw2.c 11 May 2003 07:21:40 -0000 @@ -2017,8 +2017,15 @@ if (cmd->o.opcode != O_PIPE && cmd->o.opcode != O_QUEUE) continue; - if (match == NULL || cmd->pipe_ptr == match) - cmd->pipe_ptr = NULL; + /* + * XXX Use bcmp/bzero to handle pipe_ptr to overcome + * possible alignment problems on 64-bit architectures. + * This code is seldom used so we do not worry too + * much about efficiency. + */ + if (match == NULL || + !bcmp(&cmd->pipe_ptr, &match, sizeof(match)) ) + bzero(&cmd->pipe_ptr, sizeof(cmd->pipe_ptr)); } } @@ -2559,7 +2566,8 @@ for (rule = layer3_chain; rule ; rule = rule->next) { int i = RULESIZE(rule); bcopy(rule, bp, i); - ((struct ip_fw *)bp)->set_disable = set_disable; + bcopy(&set_disable, &(bp->next_rule), + sizeof(set_disable)); bp = (struct ip_fw *)((char *)bp + i); } if (ipfw_dyn_v) { @@ -2571,7 +2579,8 @@ for ( p = ipfw_dyn_v[i] ; p != NULL ; p = p->next, dst++ ) { bcopy(p, dst, sizeof *p); - dst->rulenum = p->rule->rulenum; + bcopy(&(p->rule->rulenum), &(dst->rule), + sizeof(p->rule->rulenum)); /* * store a non-null value in "next". * The userland code will interpret a Index: sbin/ipfw/ipfw2.c =================================================================== RCS file: /home/ncvs/src/sbin/ipfw/ipfw2.c,v retrieving revision 1.23 diff -u -r1.23 ipfw2.c --- sbin/ipfw/ipfw2.c 15 Mar 2003 01:12:59 -0000 1.23 +++ sbin/ipfw/ipfw2.c 11 May 2003 09:00:26 -0000 @@ -813,8 +813,9 @@ int flags = 0; /* prerequisites */ ipfw_insn_log *logptr = NULL; /* set if we find an O_LOG */ int or_block = 0; /* we are in an or block */ + u_int32_t set_disable; - u_int32_t set_disable = rule->set_disable; + bcopy(rule->next_rule, &set_disable, sizeof(set_disable)); if (set_disable & (1 << rule->set)) { /* disabled */ if (!show_sets) @@ -1213,13 +1214,16 @@ { struct protoent *pe; struct in_addr a; + uint16_t rulenum; if (!do_expired) { if (!d->expire && !(d->dyn_type == O_LIMIT_PARENT)) return; } - printf("%05d %*llu %*llu (%ds)", d->rulenum, pcwidth, d->pcnt, bcwidth, + bcopy(d->rule, &rulenum, sizeof(rulenum)); + + printf("%05d %*llu %*llu (%ds)", rulenum, pcwidth, d->pcnt, bcwidth, d->bcnt, d->expire); switch (d->dyn_type) { case O_LIMIT_PARENT: @@ -1454,7 +1458,9 @@ err(EX_OSERR, "malloc"); if (getsockopt(s, IPPROTO_IP, IP_FW_GET, data, &nbytes) < 0) err(EX_OSERR, "getsockopt(IP_FW_GET)"); - set_disable = ((struct ip_fw *)data)->set_disable; + bcopy(((struct ip_fw *)data)->next_rule, + &set_disable, sizeof(set_disable)); + for (i = 0, msg = "disable" ; i < 31; i++) if ( (set_disable & (1<<i))) { @@ -1690,9 +1696,12 @@ /* already warned */ continue; for (n = 0, d = dynrules; n < ndyn; n++, d++) { - if (d->rulenum > rnum) + uint16_t rulenum; + + bcopy(d->rule, &rulenum, sizeof(rulenum)); + if (rulenum > rnum) break; - if (d->rulenum == rnum) + if (rulenum == rnum) show_dyn_ipfw(d, pcwidth, bcwidth); } } _______________________________________________ freebsd-ipfw@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-ipfw To unsubscribe, send any mail to "freebsd-ipfw-unsubscribe@freebsd.org" ----- End forwarded message ----- --tThc/1wpZn/ma/RB Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="ipfw2.diff" Index: sys/netinet/ip_dummynet.c =================================================================== RCS file: /home/ncvs/src/sys/netinet/ip_dummynet.c,v retrieving revision 1.63 diff -u -r1.63 ip_dummynet.c --- sys/netinet/ip_dummynet.c 27 Mar 2003 15:00:10 -0000 1.63 +++ sys/netinet/ip_dummynet.c 11 May 2003 07:19:56 -0000 @@ -1027,7 +1027,11 @@ if (cmd->opcode == O_LOG) cmd += F_LEN(cmd); +#ifdef I386 fs = ((ipfw_insn_pipe *)cmd)->pipe_ptr; +#else + bcopy(& ((ipfw_insn_pipe *)cmd)->pipe_ptr, &fs, sizeof(fs)); +#endif if (fs != NULL) return fs; @@ -1049,7 +1053,11 @@ } /* record for the future */ #if IPFW2 +#ifdef I386 ((ipfw_insn_pipe *)cmd)->pipe_ptr = fs; +#else + bcopy(&fs, & ((ipfw_insn_pipe *)cmd)->pipe_ptr, sizeof(fs)); +#endif #else if (fs != NULL) rule->pipe_ptr = fs; Index: sys/netinet/ip_fw.h =================================================================== RCS file: /home/ncvs/src/sys/netinet/ip_fw.h,v retrieving revision 1.76 diff -u -r1.76 ip_fw.h --- sys/netinet/ip_fw.h 15 Mar 2003 01:13:00 -0000 1.76 +++ sys/netinet/ip_fw.h 11 May 2003 07:20:28 -0000 @@ -32,11 +32,16 @@ * The kernel representation of ipfw rules is made of a list of * 'instructions' (for all practical purposes equivalent to BPF * instructions), which specify which fields of the packet - * (or its metatada) should be analysed. + * (or its metadata) should be analysed. * * Each instruction is stored in a structure which begins with * "ipfw_insn", and can contain extra fields depending on the * instruction type (listed below). + * Note that the code is written so that individual instructions + * have a size which is a multiple of 32 bits. This means that, if + * such structures contain pointers or other 64-bit entities, + * (there is just one instance now) they may end up unaligned on + * 64-bit architectures, so the must be handled with care. * * "enum ipfw_opcodes" are the opcodes supported. We can have up * to 256 different opcodes. @@ -211,7 +216,7 @@ ipfw_insn o; union { struct in_addr ip; - int unit; + int32_t unit; } p; char name[IFNAMSIZ]; } ipfw_insn_if; @@ -220,10 +225,13 @@ * This is used for pipe and queue actions, which need to store * a single pointer (which can have different size on different * architectures. + * Note that, because of previous instructions, pipe_ptr might + * be unaligned in the overall structure, so it needs to be + * manipulated with care. */ typedef struct _ipfw_insn_pipe { ipfw_insn o; - void *pipe_ptr; + void *pipe_ptr; /* XXX */ } ipfw_insn_pipe; /* @@ -278,7 +286,9 @@ struct ip_fw { struct ip_fw *next; /* linked list of rules */ struct ip_fw *next_rule; /* ptr to next [skipto] rule */ +#if 0 /* passed up using 'next_rule' */ u_int32_t set_disable; /* disabled sets (for userland) */ +#endif u_int16_t act_ofs; /* offset of action in 32-bit units */ u_int16_t cmd_len; /* # of 32-bit words in cmd */ u_int16_t rulenum; /* rule number */ @@ -319,12 +329,12 @@ struct _ipfw_dyn_rule { ipfw_dyn_rule *next; /* linked list of rules. */ - struct ipfw_flow_id id; /* (masked) flow id */ struct ip_fw *rule; /* pointer to rule */ ipfw_dyn_rule *parent; /* pointer to parent rule */ - u_int32_t expire; /* expire time */ u_int64_t pcnt; /* packet match counter */ u_int64_t bcnt; /* byte match counter */ + struct ipfw_flow_id id; /* (masked) flow id */ + u_int32_t expire; /* expire time */ u_int32_t bucket; /* which bucket in hash table */ u_int32_t state; /* state of this rule (typically a * combination of TCP flags) @@ -334,7 +344,9 @@ /* to generate keepalives) */ u_int16_t dyn_type; /* rule type */ u_int16_t count; /* refcount */ +#if 0 /* passed up with 'rule' */ u_int16_t rulenum; /* rule number (for userland) */ +#endif }; /* Index: sys/netinet/ip_fw2.c =================================================================== RCS file: /home/ncvs/src/sys/netinet/ip_fw2.c,v retrieving revision 1.28 diff -u -r1.28 ip_fw2.c --- sys/netinet/ip_fw2.c 15 Mar 2003 01:13:00 -0000 1.28 +++ sys/netinet/ip_fw2.c 11 May 2003 07:21:40 -0000 @@ -2017,8 +2017,15 @@ if (cmd->o.opcode != O_PIPE && cmd->o.opcode != O_QUEUE) continue; - if (match == NULL || cmd->pipe_ptr == match) - cmd->pipe_ptr = NULL; + /* + * XXX Use bcmp/bzero to handle pipe_ptr to overcome + * possible alignment problems on 64-bit architectures. + * This code is seldom used so we do not worry too + * much about efficiency. + */ + if (match == NULL || + !bcmp(&cmd->pipe_ptr, &match, sizeof(match)) ) + bzero(&cmd->pipe_ptr, sizeof(cmd->pipe_ptr)); } } @@ -2559,7 +2566,8 @@ for (rule = layer3_chain; rule ; rule = rule->next) { int i = RULESIZE(rule); bcopy(rule, bp, i); - ((struct ip_fw *)bp)->set_disable = set_disable; + bcopy(&set_disable, &(bp->next_rule), + sizeof(set_disable)); bp = (struct ip_fw *)((char *)bp + i); } if (ipfw_dyn_v) { @@ -2571,7 +2579,8 @@ for ( p = ipfw_dyn_v[i] ; p != NULL ; p = p->next, dst++ ) { bcopy(p, dst, sizeof *p); - dst->rulenum = p->rule->rulenum; + bcopy(&(p->rule->rulenum), &(dst->rule), + sizeof(p->rule->rulenum)); /* * store a non-null value in "next". * The userland code will interpret a Index: sbin/ipfw/ipfw2.c =================================================================== RCS file: /home/ncvs/src/sbin/ipfw/ipfw2.c,v retrieving revision 1.23 diff -u -r1.23 ipfw2.c --- sbin/ipfw/ipfw2.c 15 Mar 2003 01:12:59 -0000 1.23 +++ sbin/ipfw/ipfw2.c 11 May 2003 09:00:26 -0000 @@ -813,8 +813,9 @@ int flags = 0; /* prerequisites */ ipfw_insn_log *logptr = NULL; /* set if we find an O_LOG */ int or_block = 0; /* we are in an or block */ + u_int32_t set_disable; - u_int32_t set_disable = rule->set_disable; + bcopy(rule->next_rule, &set_disable, sizeof(set_disable)); if (set_disable & (1 << rule->set)) { /* disabled */ if (!show_sets) @@ -1213,13 +1214,16 @@ { struct protoent *pe; struct in_addr a; + uint16_t rulenum; if (!do_expired) { if (!d->expire && !(d->dyn_type == O_LIMIT_PARENT)) return; } - printf("%05d %*llu %*llu (%ds)", d->rulenum, pcwidth, d->pcnt, bcwidth, + bcopy(d->rule, &rulenum, sizeof(rulenum)); + + printf("%05d %*llu %*llu (%ds)", rulenum, pcwidth, d->pcnt, bcwidth, d->bcnt, d->expire); switch (d->dyn_type) { case O_LIMIT_PARENT: @@ -1454,7 +1458,9 @@ err(EX_OSERR, "malloc"); if (getsockopt(s, IPPROTO_IP, IP_FW_GET, data, &nbytes) < 0) err(EX_OSERR, "getsockopt(IP_FW_GET)"); - set_disable = ((struct ip_fw *)data)->set_disable; + bcopy(((struct ip_fw *)data)->next_rule, + &set_disable, sizeof(set_disable)); + for (i = 0, msg = "disable" ; i < 31; i++) if ( (set_disable & (1<<i))) { @@ -1690,9 +1696,12 @@ /* already warned */ continue; for (n = 0, d = dynrules; n < ndyn; n++, d++) { - if (d->rulenum > rnum) + uint16_t rulenum; + + bcopy(d->rule, &rulenum, sizeof(rulenum)); + if (rulenum > rnum) break; - if (d->rulenum == rnum) + if (rulenum == rnum) show_dyn_ipfw(d, pcwidth, bcwidth); } } --tThc/1wpZn/ma/RB--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20030519004914.B46011>