From owner-freebsd-net Fri May 31 12: 0:39 2002 Delivered-To: freebsd-net@freebsd.org Received: from tesla.distributel.net (nat.MTL.distributel.NET [66.38.181.24]) by hub.freebsd.org (Postfix) with ESMTP id 6B60A37B401 for ; Fri, 31 May 2002 12:00:27 -0700 (PDT) Received: (from bmilekic@localhost) by tesla.distributel.net (8.11.6/8.11.6) id g4VIxcl71245; Fri, 31 May 2002 14:59:38 -0400 (EDT) (envelope-from bmilekic@unixdaemons.com) Date: Fri, 31 May 2002 14:59:38 -0400 From: Bosko Milekic To: Archie Cobbs Cc: freebsd-net@FreeBSD.ORG Subject: Re: m_split() considered harmful Message-ID: <20020531145938.A71219@unixdaemons.com> References: <200205311829.g4VITKM01684@arch20m.dellroad.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <200205311829.g4VITKM01684@arch20m.dellroad.org>; from archie@dellroad.org on Fri, May 31, 2002 at 11:29:20AM -0700 Sender: owner-freebsd-net@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org On Fri, May 31, 2002 at 11:29:20AM -0700, Archie Cobbs wrote: > Hi, > > There is an inconsistency in the kernel mbuf code around the > "m->m_ext.ext_size" field of an M_EXT mbuf. > > At first glance you might assume that this field is the total amount > of contiguous space available in the external buffer pointed to by > "m->m_ext.ext_buf". For example, look at the M_TRAILINGSPACE() and > MCLGET() macros. But alas, you know where assumptions lead... :-) > > Now look at m_split(), in particular this code: > > if (m->m_flags & M_EXT) { > n->m_flags |= M_EXT; > n->m_ext = m->m_ext; > if(!m->m_ext.ext_ref) > mclrefcnt[mtocl(m->m_ext.ext_buf)]++; > else > (*(m->m_ext.ext_ref))(m->m_ext.ext_buf, > m->m_ext.ext_size); > m->m_ext.ext_size = 0; /* For Accounting XXXXXX danger */ > n->m_data = m->m_data + len; > } else { > > Notice the 'XXXXXX' which is where the problem lies. The code is > just recklessly setting m->m_ext.ext_size to zero to avoid some > "accounting" problem. This has been there since revision 1.1/rgrimes. > > This is comletely broken and violates the semantics of the > "m->m_ext.ext_size" field implied by M_TRAILINGSPACE() et. al. Good catch. Thanks Archie! > Moreover, I've also done a search of every occurrence of "ext_size" > and everywhere else in the kernel treats this feild with the same > "external buffer size" semantics, and there is no "accounting" that > is done with this field. > > For an example of where this could cause problems, notice that > a call to sbfree() on an mbuf that had gone through an m_split() > could cause a "receive 1" panic (I've seen these occasionally over > the years). > > FYI, m_split() is used in these files: > > netgraph/ng_ppp.c > netinet6/ah_input.c > netinet6/esp_input.c > netinet6/frag6.c > netsmb/smb_rq.c > > If there are no objections, I will apply the patch below. > > Thanks, > -Archie > > __________________________________________________________________________ > Archie Cobbs * Packet Design * http://www.packetdesign.com > > --- kern/uipc_mbuf.c.orig Fri May 31 11:17:52 2002 > +++ kern/uipc_mbuf.c Fri May 31 11:27:42 2002 > @@ -1194,6 +1194,10 @@ > * Partition an mbuf chain in two pieces, returning the tail -- > * all but the first len0 bytes. In case of failure, it returns NULL and > * attempts to restore the chain to its original state. > + * > + * Note that the returned mbuf must be treated as read-only, because > + * it will end up sharing an mbuf cluster with the original mbuf if the > + * "breaking point" happens to lie within a cluster mbuf. > */ Can you please apply this to -CURRENT first? -CURRENT has the same problem. Notice that in -CURRENT we have a M_WRITABLE() macro which will actually evaluate to false for both mbufs as they will be referring to a cluster that has a ref count of > 1 so this comment will be implicitly represented in the code. Now all we have to do is actually start using the M_WRITABLE macro more often. :-) > struct mbuf * > m_split(m0, len0, wait) > @@ -1247,7 +1251,6 @@ > else > (*(m->m_ext.ext_ref))(m->m_ext.ext_buf, > m->m_ext.ext_size); > - m->m_ext.ext_size = 0; /* For Accounting XXXXXX danger */ > n->m_data = m->m_data + len; > } else { > bcopy(mtod(m, caddr_t) + len, mtod(n, caddr_t), remain); I don't remember why the ext_size here was this originally (as you mention, it was imported that way), but it certainly seems bogus and you catching it now is hopefully going to solve some really wierd and difficult-to-debug problems we've had involving mbufs over the years. Woop! -- Bosko Milekic bmilekic@unixdaemons.com bmilekic@FreeBSD.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-net" in the body of the message