From owner-freebsd-net Tue Aug 29 17:48:19 2000 Delivered-To: freebsd-net@freebsd.org Received: from falla.videotron.net (falla.videotron.net [205.151.222.106]) by hub.freebsd.org (Postfix) with ESMTP id B21F637B42C for ; Tue, 29 Aug 2000 17:48:10 -0700 (PDT) Received: from modemcable136.203-201-24.mtl.mc.videotron.net ([24.201.203.136]) by falla.videotron.net (Sun Internet Mail Server sims.3.5.1999.12.14.10.29.p8) with ESMTP id <0G020068KXOJFB@falla.videotron.net> for freebsd-net@FreeBSD.ORG; Tue, 29 Aug 2000 20:36:19 -0400 (EDT) Date: Tue, 29 Aug 2000 20:39:35 -0400 (EDT) From: Bosko Milekic Subject: Re: Proposal to clarify mbuf handling rules In-reply-to: <200008291159.aa60672@salmon.maths.tcd.ie> X-Sender: bmilekic@jehovah.technokratis.com To: David Malone Cc: freebsd-net@FreeBSD.ORG Message-id: MIME-version: 1.0 Content-type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-net@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Tue, 29 Aug 2000, David Malone wrote: > > (a) The mbuf data should not be written to under any circumstances. > > even if it is a cluster mbuf and the reference count becomes 1. > > > (b) The mbuf data may be written to. This implies that if this mbuf > > is a cluster, then the reference count is 1. > > > (c) The mbuf data may not be written to, BUT if it is a cluster mbuf > > and the reference count decreases to 1, then it may then be > > written to. All of these different possibilities add quite a bit of hysteria for just figuring out which of the above a given mbuf falls under from within the mbuf subsystem alone. I think that it's rather clear that if we do intend to implement some sort of permissions structure with respect to external buffers that we need to have the mbufs referring to these ext_bufs share a unique permissions field. The solution: Change mext_refcnt union to be a mext_descr union with an additional field falling within like so: union mext_descr { union mext_descr *next_desc; struct { u_long refcnt; u_int perms; } desc; }; The cost: Additional 4 bytes per every M_EXT mbuf. (These unions are typically allocated en-masse from mb_map during bootup and more are added from mb_map on demand, assuming we have the space in mb_map, otherwise, ... we'll probably have to implement another tsleep mechanism). perms is to be modified atomically and can contain, for now: M_RDWR 0x00000000 M_RDONLY 0x00000001 [... more to come ...?] (Notice that the mext_descr _only_ contains SMP-safe atomically modifiable and shareable settings for the ext_buf -- this is like shared space for M_EXT mbufs referring to the same ext_buf -- the SMP-friendly version of the doubly-linked list solution). This solution would allow on-the-fly modification of ext_bufs permissions. If this works for you guys, and if you think it's worth it, then we should also add an ext_flags in the m_ext structure while we're add it, because it will cost us nothing (m_ext is within a union containing ample amount of scratch space in an M_EXT mbuf) and because it will allow me to remove the code that assumes that if ext_free is NULL, that we're dealing with an mcluster (which is true for now); but the new check of the flag would make the code a little faster at 0 extra overhead. The idea would be to have that ext_flags contain one of M_CLUSTER, M_WHATEVER, M_OTHER. > I think it would be a nice idea not to tie this to clusters, but > to have the M_RDONLY as an option which is given at the time which > external storage is first attached to a mbuf. This means we set > M_RDONLY for sendfile bufs, but don't set it for mbuf's with no > external storage, regular clusters or for jumbo ethernet storage > (once we've checked this with Bill Paul naturally). > > We will have to be careful with the M_WRITABLE flag, as it is stored > in the mbufs rather than the externam storage, and there may be > multiple mbufs referencing this storage. Think of the following > situations: > > 1) When we first make a mbuf we set M_WRITABLE according to !M_RDONLY. > > 2) If we make a second mbuf which references this original mbuf we > can clear M_WRITABLE in both mbufs, as we have access to both. > > 3) Adding references from now on isn't a problem as M_WRITABLE is > clear. > > 4) Now, consider the situation where we are freeing the second last > mbuf referencing this storage. We have access to the mbuf we are > freeing, but not to the remain mbuf, in which we want to set > M_WRITABLE. > > I can't see an obvious way around this, so we may find situations > where M_WRITABLE is not set, but the storage is writable. This would > translate to a M_IS_WRITABLE macro like: > > #define M_IS_WRITABLE(m) ((m)->m_flags & M_WRITABLE) || \ > (!((m)->m_flags & M_RDONLY) && !MEXT_IS_REF(m) && \ > ((m)->m_flags |= M_WRITABLE) \ > ) > > Note, this macro actually sets the M_WRITABLE flag, which is a bit > icky - but seems reasonable given that M_WRITABLE is supposed to be > cache for the value of !((m)->m_flags & M_RDONLY) && !MEXT_IS_REF(m). > The alternative is to just add the M_RDONLY flag and define: > > #define M_IS_WRITABLE(m) (!((m)->m_flags & M_RDONLY) && !MEXT_IS_REF(m)) > > I think in real terms we may find this is a cleaner solution; This > check is not going to be a hugely expensive operation compared to > the alternative above. > > Finally, it might actually be better to have the opposit sense for > the M_RDONLY flag (call is M_MAYBEWRITEABLE or somethign more > catchy). This would mean that if someone initially forgets to set > the flag, it will be zero and the storage will not accidently be > written to. I think it would also give us a better chance of > catching things we're accidently writing to with KASSERTs. > > David. If we decide to go with the above proposal, there can be one macro to set perms, and it can be of course used to remove a bit as well. There can also be two "wrapper" macros that will essentially either call the reference count increment macro and set RDONLY if it becomes > 1 or call the reference count decrement macro and unset RDONLY if it becomes exactly 1. The wrapper macros are only to be called for those wishing this specific behavior. We can have a number of such macros, if we judge them to be useful, based on the different possibilities that you listed at the beginning of this Email. Let me know what you think. Regards, Bosko Milekic bmilekic@technokratis.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-net" in the body of the message