FreeBSD Mail Archives

Date:      Wed, 20 Jun 2001 13:59:39 -0400
From:      Bosko Milekic <bmilekic@technokratis.com>
To:        Terry Lambert <tlambert2@mindspring.com>
Cc:        Terry Lambert <tlambert@primenet.com>, freebsd-alpha@FreeBSD.ORG
Subject:   Re: vx, lge, nge, dc, rl, sf, sis, sk, vr, wb users please TEST
Message-ID:  <20010620135939.A34888@technokratis.com>
In-Reply-To: <3B30D941.6AE93443@mindspring.com>; from tlambert2@mindspring.com on Wed, Jun 20, 2001 at 10:11:29AM -0700
References:  <20010619191602.A28591@technokratis.com> <200106200224.TAA24251@usr05.primenet.com> <20010619232624.A29829@technokratis.com> <3B304ADF.C5131399@mindspring.com> <20010620123029.A34452@technokratis.com> <3B30D941.6AE93443@mindspring.com>


  Before I go into this [the topic seems to have diverged a little], has
anybody gotten around to testing (or has the hardware to do the testing
required?) - if some of you guys are stuck with the fact that you have the
hardware but don't run -CURRENT, please let me know - I could generate an
equivalent patch for -STABLE, with a little work.

On Wed, Jun 20, 2001 at 10:11:29AM -0700, Terry Lambert wrote:
> I can see where that would be a problem with the new
> allocator.
> 
> May I offer a suggestion?  The purpose of having the
> clusters be a fixed size, given a much larger number
> of mbufs has never really been clear to me, given that
> mbufs are allocated to act as cluster headers, to make
> things a tiny bit (but not substantially) easier, when
> it comes to freing chains, etc..

	The reason is that clusters are not used everywhere, but mbufs are
(even when the storage type is not a cluster).

> It seems to me that what you really want to do is
> allocate _different sizes_ of mbufs, and have the
> deallocator sort them out on free.

	Believe me, I've thought about this. Alfred Perlstein has been pushing
me to do something like this for a while now. There are several problems with
the suggestion, but it's something to consider in the future. I'd rather allow
mb_alloc to stabilize a little more after committing it, work at lockifying
net*/, unwind Giant, and then continue to deal with issues such as this.
 
> This could result in a substantial space savings, as
> the majority of the mbuf used for the cluster header
> is not used for anything useful.

	Yeah.

> Do you have any interest in generalizing your allocator?

	Well, not really. See, the allocator is a specialization of a general
allocator. One of the main design goals of the allocator is performance. I 
tried very hard to give the new allocator the advantages required for
scalability and the "infrastructure" required to reclaim memory while keeping
approximately the same allocation/deallocation standards of the present
allocator. One important performance advantage of the present allocator
relative to say the NetBSD or OpenBSD mbuf allocations/deallocations is that
we *never* have to free pages back to the map from m_free() (an expensive
procedure). This is precisely why I'd like to have freeing eventually
implemented from a kproc, when it can be handled only when really needed
without affecting network performance. General purpose allocations should
probably be handled differently.

> Eventually, you will probably want to do allocations of
> things other than mbufs.

	There *is* a general version of an allocator such as mb_alloc. In fact,
as is mentionned in the introductory comments in subr_mbuf.c, mb_alloc is
in part based on Alfred's "memcache" allocator. Although a little outdated,
the source is:

	http://people.freebsd.org/~alfred/memcache/

	All it needs is a little bit of cleaning up/fixing up and it's ready
to fly for general purpose allocations. Keep in mind, though, that these types
of allocators have a wastage factor that becomes significant as the size of
the objects being allocated approaches (falls to) the size of a pointer. The
reason is that the free list is implemented with an allocated pointer array
[it's done this way for very specific reasons which i'll keep out of this
Email] and if you're allocating a page worth of ptr-size objects, youre spending
a whole other page for the pointers for a one page worth freelist for these
objects.
	The whole reason I wrote mb_alloc separately from memcache altogether
is to allow for the different type of freeing to occur, and to allow for
future very mbuf-specific alloc. optimizations to occur, and to allow us to
inline the code in the mbuf allocation functions, and ... (some other less
worthy reasons) [glancing over subr_mbuf.c a few times should make these
things obvious].

> Also, the buckets should probably be permitted to be
> some multiple of the page size, in order to permit the
> allocation of odd-sized structures, and allow them to
> span page boundaries, if you went ahead with a more
> general approach.

	This is done in memcache, actually.
 
> I guess the next thing to think about after that would
> be allocation at interrupt time.  I think this can be
> done using the ziniti() approach; but you would then
> reserve the KVA space for use by the allocator at page
> fault time, instead.

	This is for the general allocator, right? Yeah, memcache can be made
to optionally reserve KVA space for interrupt-time allocations; mind you, it
would only serve as an optimization. As I previously mentionned for mb_alloc,
the KVA space needed is already reserved.

> I have some other methods of getting around faulting,
> when you have sufficient backing store.  Now that there
> are systems that have the capability of having as much
> RAM as the KVA space (e.g. the KVA space is really no
> longer sparse), there are a number of optimizations
> that become pretty obvious.

	Cool. Do share. Had my trip to Usenix this year not been cancelled
due to the fact that I'm going to see family in Yugoslavia this Saturday,
I would have gathered you and Alfred together, and listened over a few beers.

> That's not true for the Alpha, yet, but the Intel KVA
> space is definitely matured to the point where physical
> RAM equals the possible KVA + user space addressable
> memory.
> 
> NB: The windowed access to more than 4G (e.g. the AMD
> 16G processor that uses "megasegments" to access the
> extra memory) has nevery struck me as being useful,
> unless you can partition you working set data enough
> to deal with the window flipping overhead issue that
> would result, so I'm not too concerned about trying
> to support them -- they seem more useful for VMWare
> type applications, where you context switch at a very
> low level in order to run multiple instances of a
> kernel, or similar pig tricks.  The IA64 and AMD
> "Sledgehammer" will make us think about these things
> again, as soon as someone comes up with a MB/Memory
> sticks that beat the 4G limit without melting down.
> 
> -- Terry

Regards,
-- 
 Bosko Milekic
 bmilekic@technokratis.com


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-alpha" in the body of the message

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20010620135939.A34888>

Header And Logo

Peripheral Links

Site Navigation

Header And Logo

Peripheral Links

Search

Site Navigation