Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 29 Aug 2000 20:39:35 -0400 (EDT)
From:      Bosko Milekic <bmilekic@dsuper.net>
To:        David Malone <dwmalone@maths.tcd.ie>
Cc:        freebsd-net@FreeBSD.ORG
Subject:   Re: Proposal to clarify mbuf handling rules
Message-ID:  <Pine.BSF.4.21.0008291843410.9530-100000@jehovah.technokratis.com>
In-Reply-To: <200008291159.aa60672@salmon.maths.tcd.ie>

next in thread | previous in thread | raw e-mail | index | archive | help

On Tue, 29 Aug 2000, David Malone wrote:

> >   (a) The mbuf data should not be written to under any circumstances.
> >       even if it is a cluster mbuf and the reference count becomes 1.
> 
> >   (b) The mbuf data may be written to. This implies that if this mbuf
> >       is a cluster, then the reference count is 1.
> 
> >   (c) The mbuf data may not be written to, BUT if it is a cluster mbuf
> >       and the reference count decreases to 1, then it may then be
> >       written to.


	All of these different possibilities add quite a bit of hysteria for
  just figuring out which of the above a given mbuf falls under from within
  the mbuf subsystem alone.
  	I think that it's rather clear that if we do intend to implement some
  sort of permissions structure with respect to external buffers that we
  need to have the mbufs referring to these ext_bufs share a unique
  permissions field.
  	The solution: Change mext_refcnt union to be a mext_descr union with
  an additional field falling within like so:

  union mext_descr {
  	union	mext_descr *next_desc;
	struct {
		u_long refcnt;
		u_int perms;
	} desc;
  };

  	The cost: Additional 4 bytes per every M_EXT mbuf. (These unions are
  typically allocated en-masse from mb_map during bootup and more are added
  from mb_map on demand, assuming we have the space in mb_map, otherwise,
  ... we'll probably have to implement another tsleep mechanism).

  perms is to be modified atomically and can contain, for now:

  	M_RDWR	0x00000000
	M_RDONLY	0x00000001
	[... more to come ...?]

  (Notice that the mext_descr _only_ contains SMP-safe atomically
  modifiable and shareable settings for the ext_buf -- this is like shared
  space for M_EXT mbufs referring to the same ext_buf -- the SMP-friendly
  version of the doubly-linked list solution).

  This solution would allow on-the-fly modification of ext_bufs
  permissions.

  If this works for you guys, and if you think it's worth it, then we
  should also add an ext_flags in the m_ext structure while we're add it,
  because it will cost us nothing (m_ext is within a union containing ample
  amount of scratch space in an M_EXT mbuf) and because it will allow me to
  remove the code that assumes that if ext_free is NULL, that we're dealing
  with an mcluster (which is true for now); but the new check of the flag
  would make the code a little faster at 0 extra overhead. The idea would
  be to have that ext_flags contain one of M_CLUSTER, M_WHATEVER, M_OTHER.


> I think it would be a nice idea not to tie this to clusters, but
> to have the M_RDONLY as an option which is given at the time which
> external storage is first attached to a mbuf. This means we set
> M_RDONLY for sendfile bufs, but don't set it for mbuf's with no
> external storage, regular clusters or for jumbo ethernet storage
> (once we've checked this with Bill Paul naturally).
> 
> We will have to be careful with the M_WRITABLE flag, as it is stored
> in the mbufs rather than the externam storage, and there may be
> multiple mbufs referencing this storage. Think of the following
> situations:
> 
> 1) When we first make a mbuf we set M_WRITABLE according to !M_RDONLY.
> 
> 2) If we make a second mbuf which references this original mbuf we
> can clear M_WRITABLE in both mbufs, as we have access to both.
> 
> 3) Adding references from now on isn't a problem as M_WRITABLE is
> clear.
> 
> 4) Now, consider the situation where we are freeing the second last
> mbuf referencing this storage. We have access to the mbuf we are
> freeing, but not to the remain mbuf, in which we want to set
> M_WRITABLE.
> 
> I can't see an obvious way around this, so we may find situations
> where M_WRITABLE is not set, but the storage is writable. This would
> translate to a M_IS_WRITABLE macro like:
> 
> #define M_IS_WRITABLE(m) ((m)->m_flags & M_WRITABLE) ||		\
> 	(!((m)->m_flags & M_RDONLY) && !MEXT_IS_REF(m) &&	\
> 	    ((m)->m_flags |= M_WRITABLE)			\
> 	)
> 
> Note, this macro actually sets the M_WRITABLE flag, which is a bit
> icky - but seems reasonable given that M_WRITABLE is supposed to be
> cache for the value of !((m)->m_flags & M_RDONLY) && !MEXT_IS_REF(m).
> The alternative is to just add the M_RDONLY flag and define:
> 
> #define M_IS_WRITABLE(m) (!((m)->m_flags & M_RDONLY) && !MEXT_IS_REF(m))
> 
> I think in real terms we may find this is a cleaner solution; This
> check is not going to be a hugely expensive operation compared to
> the alternative above.
> 
> Finally, it might actually be better to have the opposit sense for
> the M_RDONLY flag (call is M_MAYBEWRITEABLE or somethign more
> catchy).  This would mean that if someone initially forgets to set
> the flag, it will be zero and the storage will not accidently be
> written to.  I think it would also give us a better chance of
> catching things we're accidently writing to with KASSERTs.
> 
> 	David.

	If we decide to go with the above proposal, there can be one macro to
  set perms, and it can be of course used to remove a bit as well. There
  can also be two "wrapper" macros that will essentially either call the
  reference count increment macro and set RDONLY if it becomes > 1 or call
  the reference count decrement macro and unset RDONLY if it becomes
  exactly 1. The wrapper macros are only to be called for those wishing
  this specific behavior. We can have a number of such macros, if we judge
  them to be useful, based on the different possibilities that you listed
  at the beginning of this Email.

  Let me know what you think.

  Regards,

  Bosko Milekic
  bmilekic@technokratis.com
   



To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-net" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?Pine.BSF.4.21.0008291843410.9530-100000>