From owner-freebsd-net Sun Aug 27 14:38:17 2000 Delivered-To: freebsd-net@freebsd.org Received: from whistle.com (s205m131.whistle.com [207.76.205.131]) by hub.freebsd.org (Postfix) with ESMTP id 3B54237B422; Sun, 27 Aug 2000 14:38:01 -0700 (PDT) Received: (from smap@localhost) by whistle.com (8.10.0/8.10.0) id e7RLQ3T26925; Sun, 27 Aug 2000 14:26:03 -0700 (PDT) Received: from bubba.whistle.com( 207.76.205.7) by whistle.com via smap (V2.0) id xma026923; Sun, 27 Aug 2000 14:25:56 -0700 Received: (from archie@localhost) by bubba.whistle.com (8.9.3/8.9.3) id OAA66159; Sun, 27 Aug 2000 14:25:55 -0700 (PDT) (envelope-from archie) From: Archie Cobbs Message-Id: <200008272125.OAA66159@bubba.whistle.com> Subject: Proposal to clarify mbuf handling rules To: freebsd-net@freebsd.org, freebsd-current@freebsd.org Date: Sun, 27 Aug 2000 14:25:55 -0700 (PDT) X-Mailer: ELM [version 2.4ME+ PL82 (25)] MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=US-ASCII Sender: owner-freebsd-net@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org In looking at some of the problems relating to divert, bridging, etc., it's apparent that lots of code is breaking one of the rules for handling mbufs: that mbuf data can sometimes be read-only. Each mbuf may be either a normal mbuf or a cluster mbuf (if the mbuf flags contains M_EXT). Cluster mbufs point to an entire page of memory, and this page of memory may be shared by more than one cluster mbuf (see m_copypacket()). This effectively makes the mbuf data read-only, because a change to one mbuf affects all of the mbufs, not just the one you're working on. There have been (and still are) several FreeBSD bugs because of this subtlety. A test for an mbuf being "read-only" is: if ((m->m_flags & M_EXT) != 0 && MEXT_IS_REF(m)) ... So an implicit rule for handling mbufs is that they should be treated as read-only unless/until you either check that it's not, and/or pullup a new (non-cluster) mbuf that covers the data area that you're going to modify. However, many routines that take an mbuf parameter assume that the mbuf given to them is modifiable and proceed to write all over it. A few examples are: ip_input(), in_arpinput(), tcp_input(), divert_packt(), etc. In practice, this is often not a problem because the mbuf is actually modifiable (because there are no other references to it). But this is just because we've been lucky. When you throw things like bridging, dummynet, divert, and netgraph into the mix, not to mention other site-specific hacks, then these assumptions no longer hold. At the minimum these assumptions should be clearly commented, but that's not even the case right now. Routines that don't change any data, or that only do m_pullup(), M_PREPEND(), m_adj(), etc. don't have a problem. So I'd like to propose a mini-project to clarify and fix this problem. Here is the propsal: 1. All routines that take an mbuf as an argument must not assume that any mbuf in the chain is modifyable, unless expclicitly and clearly documented (in the comment at the top of the function) as doing so. 2. For routines that don't modify data, incorporate liberal use of the "const" keyword to make this clear. For example, change struct ip *ip; ip = mtod(m, struct ip *); to: const struct ip *ip; ip = mtod(m, const struct ip *); 3. For any routines that do need to modify mbuf data, but don't assume anything about the mbuf, alter those routines to do an m_pullup() when necessary to make the data are they are working on modifiable. For example: struct ip *ip; /* Pull up IP header */ if (m->m_len < sizeof(*ip) && !(m = m_pullup(m, sizeof(*ip)))) return; ip = mtod(m, struct ip *); #ifdef NEW_CODE_BEING_ADDED /* Make sure the IP header area is writable */ if ((m->m_flags & M_EXT) != 0 && MEXT_IS_REF(m)) { /* m_pullup() *always* prepends a fresh, non-cluster mbuf */ if ((m = m_pullup(m, sizeof(struct ip))) == 0) return; ip = mtod(m, struct ip *); } #endif /* Modify the header */ ip->ip_len = 123; ... The only negative is the addition of the NEW_CODE_BEING_ADDED code in the relevant places. In practice this test will usually fail, as most mbufs are modifiable, so there should be no noticable slowdown. However, robustness should improve, especially when bridging, diverting, etc. What do people think? If this is generally agreeable I'll try to work on putting together a patch set for review. -Archie ___________________________________________________________________________ Archie Cobbs * Whistle Communications, Inc. * http://www.whistle.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-net" in the body of the message