From owner-freebsd-arch Mon Aug 28 10:42:35 2000 Delivered-To: freebsd-arch@freebsd.org Received: from whistle.com (s205m131.whistle.com [207.76.205.131]) by hub.freebsd.org (Postfix) with ESMTP id 705DB37B424 for ; Mon, 28 Aug 2000 10:42:21 -0700 (PDT) Received: (from smap@localhost) by whistle.com (8.10.0/8.10.0) id e7SHgLn05258 for ; Mon, 28 Aug 2000 10:42:21 -0700 (PDT) Received: from bubba.whistle.com( 207.76.205.7) by whistle.com via smap (V2.0) id xma005256; Mon, 28 Aug 2000 10:42:01 -0700 Received: (from archie@localhost) by bubba.whistle.com (8.9.3/8.9.3) id KAA69859 for freebsd-arch@freebsd.org; Mon, 28 Aug 2000 10:42:00 -0700 (PDT) (envelope-from archie) From: Archie Cobbs Message-Id: <200008281742.KAA69859@bubba.whistle.com> Subject: Proposal to clarify mbuf handling rules (fwd) To: freebsd-arch@freebsd.org Date: Mon, 28 Aug 2000 10:42:00 -0700 (PDT) X-Mailer: ELM [version 2.4ME+ PL82 (25)] MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG David O'Brien correctly pointed out that I should have sent the email below to -arch and -net instead of -current and -net. So at the risk of repetition for some folks, here it is again for anybody on -arch but not -net or -current. Followups should go to -net I suppose. Sorry for the confusion. Thanks, -Archie > From owner-freebsd-net@FreeBSD.ORG Sun Aug 27 14:38:34 2000 > From: Archie Cobbs > Message-Id: <200008272125.OAA66159@bubba.whistle.com> > Subject: Proposal to clarify mbuf handling rules > To: freebsd-net@FreeBSD.ORG, freebsd-current@FreeBSD.ORG > > In looking at some of the problems relating to divert, bridging, > etc., it's apparent that lots of code is breaking one of the rules > for handling mbufs: that mbuf data can sometimes be read-only. > > Each mbuf may be either a normal mbuf or a cluster mbuf (if the > mbuf flags contains M_EXT). Cluster mbufs point to an entire page > of memory, and this page of memory may be shared by more than one > cluster mbuf (see m_copypacket()). This effectively makes the mbuf > data read-only, because a change to one mbuf affects all of the > mbufs, not just the one you're working on. There have been (and > still are) several FreeBSD bugs because of this subtlety. > > A test for an mbuf being "read-only" is: > > if ((m->m_flags & M_EXT) != 0 && MEXT_IS_REF(m)) ... > > So an implicit rule for handling mbufs is that they should be > treated as read-only unless/until you either check that it's not, > and/or pullup a new (non-cluster) mbuf that covers the data area > that you're going to modify. > > However, many routines that take an mbuf parameter assume that the > mbuf given to them is modifiable and proceed to write all over it. > A few examples are: ip_input(), in_arpinput(), tcp_input(), > divert_packt(), etc. > > In practice, this is often not a problem because the mbuf is actually > modifiable (because there are no other references to it). But this > is just because we've been lucky. When you throw things like bridging, > dummynet, divert, and netgraph into the mix, not to mention other > site-specific hacks, then these assumptions no longer hold. At the > minimum these assumptions should be clearly commented, but that's > not even the case right now. > > Routines that don't change any data, or that only do m_pullup(), > M_PREPEND(), m_adj(), etc. don't have a problem. > > So I'd like to propose a mini-project to clarify and fix this problem. > Here is the propsal: > > 1. All routines that take an mbuf as an argument must not assume > that any mbuf in the chain is modifyable, unless expclicitly > and clearly documented (in the comment at the top of the function) > as doing so. > > 2. For routines that don't modify data, incorporate liberal use > of the "const" keyword to make this clear. For example, change > > struct ip *ip; > ip = mtod(m, struct ip *); > > to: > > const struct ip *ip; > ip = mtod(m, const struct ip *); > > 3. For any routines that do need to modify mbuf data, but don't > assume anything about the mbuf, alter those routines to do > an m_pullup() when necessary to make the data are they are > working on modifiable. For example: > > struct ip *ip; > > /* Pull up IP header */ > if (m->m_len < sizeof(*ip) && !(m = m_pullup(m, sizeof(*ip)))) > return; > ip = mtod(m, struct ip *); > > #ifdef NEW_CODE_BEING_ADDED > /* Make sure the IP header area is writable */ > if ((m->m_flags & M_EXT) != 0 && MEXT_IS_REF(m)) { > /* m_pullup() *always* prepends a fresh, non-cluster mbuf */ > if ((m = m_pullup(m, sizeof(struct ip))) == 0) > return; > ip = mtod(m, struct ip *); > } > #endif > > /* Modify the header */ > ip->ip_len = 123; > ... > > The only negative is the addition of the NEW_CODE_BEING_ADDED code > in the relevant places. In practice this test will usually fail, > as most mbufs are modifiable, so there should be no noticable > slowdown. However, robustness should improve, especially when > bridging, diverting, etc. > > What do people think? If this is generally agreeable I'll try to > work on putting together a patch set for review. ___________________________________________________________________________ Archie Cobbs * Whistle Communications, Inc. * http://www.whistle.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message