Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 31 May 2002 14:59:38 -0400
From:      Bosko Milekic <bmilekic@unixdaemons.com>
To:        Archie Cobbs <archie@dellroad.org>
Cc:        freebsd-net@FreeBSD.ORG
Subject:   Re: m_split() considered harmful
Message-ID:  <20020531145938.A71219@unixdaemons.com>
In-Reply-To: <200205311829.g4VITKM01684@arch20m.dellroad.org>; from archie@dellroad.org on Fri, May 31, 2002 at 11:29:20AM -0700
References:  <200205311829.g4VITKM01684@arch20m.dellroad.org>

next in thread | previous in thread | raw e-mail | index | archive | help

On Fri, May 31, 2002 at 11:29:20AM -0700, Archie Cobbs wrote:
> Hi,
> 
> There is an inconsistency in the kernel mbuf code around the
> "m->m_ext.ext_size" field of an M_EXT mbuf.
> 
> At first glance you might assume that this field is the total amount
> of contiguous space available in the external buffer pointed to by
> "m->m_ext.ext_buf". For example, look at the M_TRAILINGSPACE() and
> MCLGET() macros. But alas, you know where assumptions lead... :-)
> 
> Now look at m_split(), in particular this code:
> 
>     if (m->m_flags & M_EXT) {
> 	n->m_flags |= M_EXT;
> 	n->m_ext = m->m_ext;
> 	if(!m->m_ext.ext_ref)
> 	    mclrefcnt[mtocl(m->m_ext.ext_buf)]++;
> 	else
> 	    (*(m->m_ext.ext_ref))(m->m_ext.ext_buf,  
> 				    m->m_ext.ext_size);
> 	m->m_ext.ext_size = 0; /* For Accounting XXXXXX danger */
> 	n->m_data = m->m_data + len;
>     } else {
> 
> Notice the 'XXXXXX' which is where the problem lies. The code is
> just recklessly setting m->m_ext.ext_size to zero to avoid some
> "accounting" problem. This has been there since revision 1.1/rgrimes.
> 
> This is comletely broken and violates the semantics of the
> "m->m_ext.ext_size" field implied by M_TRAILINGSPACE() et. al.

  Good catch.  Thanks Archie!

> Moreover, I've also done a search of every occurrence of "ext_size"
> and everywhere else in the kernel treats this feild with the same
> "external buffer size" semantics, and there is no "accounting" that
> is done with this field.
> 
> For an example of where this could cause problems, notice that
> a call to sbfree() on an mbuf that had gone through an m_split()
> could cause a "receive 1" panic (I've seen these occasionally over
> the years).
> 
> FYI, m_split() is used in these files:
> 
>     netgraph/ng_ppp.c
>     netinet6/ah_input.c
>     netinet6/esp_input.c
>     netinet6/frag6.c
>     netsmb/smb_rq.c
> 
> If there are no objections, I will apply the patch below.
>
> Thanks,
> -Archie
> 
> __________________________________________________________________________
> Archie Cobbs     *     Packet Design     *     http://www.packetdesign.com
> 
> --- kern/uipc_mbuf.c.orig	Fri May 31 11:17:52 2002
> +++ kern/uipc_mbuf.c	Fri May 31 11:27:42 2002
> @@ -1194,6 +1194,10 @@
>   * Partition an mbuf chain in two pieces, returning the tail --
>   * all but the first len0 bytes.  In case of failure, it returns NULL and
>   * attempts to restore the chain to its original state.
> + *
> + * Note that the returned mbuf must be treated as read-only, because
> + * it will end up sharing an mbuf cluster with the original mbuf if the
> + * "breaking point" happens to lie within a cluster mbuf.
>   */

  Can you please apply this to -CURRENT first?  -CURRENT has the same
  problem.  Notice that in -CURRENT we have a M_WRITABLE() macro which
  will actually evaluate to false for both mbufs as they will be
  referring to a cluster that has a ref count of > 1 so this comment
  will be implicitly represented in the code.  Now all we have to do is
  actually start using the M_WRITABLE macro more often. :-)

>  struct mbuf *
>  m_split(m0, len0, wait)
> @@ -1247,7 +1251,6 @@
>  		else
>  			(*(m->m_ext.ext_ref))(m->m_ext.ext_buf,
>  						m->m_ext.ext_size);
> -		m->m_ext.ext_size = 0; /* For Accounting XXXXXX danger */
>  		n->m_data = m->m_data + len;
>  	} else {
>  		bcopy(mtod(m, caddr_t) + len, mtod(n, caddr_t), remain);

  I don't remember why the ext_size here was this originally (as you
  mention, it was imported that way), but it certainly seems bogus and
  you catching it now is hopefully going to solve some really wierd and
  difficult-to-debug problems we've had involving mbufs over the years.

Woop!
-- 
Bosko Milekic
bmilekic@unixdaemons.com
bmilekic@FreeBSD.org


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-net" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20020531145938.A71219>