Date: Wed, 20 Jun 2001 13:59:39 -0400
From: Bosko Milekic <bmilekic@technokratis.com>
To: Terry Lambert <tlambert2@mindspring.com>
Cc: Terry Lambert <tlambert@primenet.com>, freebsd-alpha@FreeBSD.ORG
Subject: Re: vx, lge, nge, dc, rl, sf, sis, sk, vr, wb users please TEST
Message-ID: <20010620135939.A34888@technokratis.com>
In-Reply-To: <3B30D941.6AE93443@mindspring.com>; from tlambert2@mindspring.com on Wed, Jun 20, 2001 at 10:11:29AM -0700
References: <20010619191602.A28591@technokratis.com> <200106200224.TAA24251@usr05.primenet.com> <20010619232624.A29829@technokratis.com> <3B304ADF.C5131399@mindspring.com> <20010620123029.A34452@technokratis.com> <3B30D941.6AE93443@mindspring.com>
Before I go into this [the topic seems to have diverged a little]: has anybody gotten around to testing, or does anyone have the hardware required to do the testing? If some of you are stuck with the fact that you have the hardware but don't run -CURRENT, please let me know - I could generate an equivalent patch for -STABLE with a little work.

On Wed, Jun 20, 2001 at 10:11:29AM -0700, Terry Lambert wrote:
> I can see where that would be a problem with the new
> allocator.
>
> May I offer a suggestion?  The purpose of having the
> clusters be a fixed size, given a much larger number
> of mbufs, has never really been clear to me, given that
> mbufs are allocated to act as cluster headers, to make
> things a tiny bit (but not substantially) easier when
> it comes to freeing chains, etc..

The reason is that clusters are not used everywhere, but mbufs are (even when the storage type is not a cluster).

> It seems to me that what you really want to do is
> allocate _different sizes_ of mbufs, and have the
> deallocator sort them out on free.

Believe me, I've thought about this. Alfred Perlstein has been pushing me to do something like this for a while now. There are several problems with the suggestion, but it's something to consider in the future. I'd rather allow mb_alloc to stabilize a little more after committing it, work at lockifying net*/, unwind Giant, and then continue to deal with issues such as this.

> This could result in a substantial space savings, as
> the majority of the mbuf used for the cluster header
> is not used for anything useful.

Yeah.

> Do you have any interest in generalizing your allocator?

Well, not really. See, the allocator is a specialization of a general allocator. One of the main design goals of the allocator is performance.
I tried very hard to give the new allocator the advantages required for scalability and the "infrastructure" required to reclaim memory, while keeping approximately the same allocation/deallocation standards as the present allocator. One important performance advantage of the present allocator relative to, say, the NetBSD or OpenBSD mbuf allocators is that we *never* have to free pages back to the map from m_free() (an expensive procedure). This is precisely why I'd like to have freeing eventually implemented from a kproc, where it can be done only when really needed, without affecting network performance. General purpose allocations should probably be handled differently.

> Eventually, you will probably want to do allocations of
> things other than mbufs.

There *is* a general version of an allocator such as mb_alloc. In fact, as is mentioned in the introductory comments in subr_mbuf.c, mb_alloc is in part based on Alfred's "memcache" allocator. Although a little outdated, the source is:

	http://people.freebsd.org/~alfred/memcache/

All it needs is a little bit of cleaning up/fixing up and it's ready to fly for general purpose allocations. Keep in mind, though, that these types of allocators have a wastage factor that becomes significant as the size of the objects being allocated approaches (falls to) the size of a pointer. The reason is that the free list is implemented with an allocated pointer array [it's done this way for very specific reasons which I'll keep out of this Email], so if you're allocating a page worth of pointer-size objects, you're spending a whole other page on the pointers that make up the free list for those objects.

The whole reason I wrote mb_alloc separately from memcache is to allow for the different type of freeing to occur, to allow for future very mbuf-specific allocation optimizations, to allow us to inline the code in the mbuf allocation functions, and ...
(some other less worthy reasons) [glancing over subr_mbuf.c a few times should make these things obvious].

> Also, the buckets should probably be permitted to be
> some multiple of the page size, in order to permit the
> allocation of odd-sized structures, and allow them to
> span page boundaries, if you went ahead with a more
> general approach.

This is done in memcache, actually.

> I guess the next thing to think about after that would
> be allocation at interrupt time.  I think this can be
> done using the ziniti() approach; but you would then
> reserve the KVA space for use by the allocator at page
> fault time, instead.

This is for the general allocator, right? Yeah, memcache can be made to optionally reserve KVA space for interrupt-time allocations; mind you, it would only serve as an optimization. As I previously mentioned for mb_alloc, the KVA space needed is already reserved.

> I have some other methods of getting around faulting,
> when you have sufficient backing store.  Now that there
> are systems that have the capability of having as much
> RAM as the KVA space (e.g. the KVA space is really no
> longer sparse), there are a number of optimizations
> that become pretty obvious.

Cool. Do share. Had my trip to Usenix this year not been cancelled because I'm going to see family in Yugoslavia this Saturday, I would have gathered you and Alfred together and listened over a few beers.

> That's not true for the Alpha, yet, but the Intel KVA
> space is definitely matured to the point where physical
> RAM equals the possible KVA + user space addressable
> memory.
>
> NB: The windowed access to more than 4G (e.g.
> the AMD
> 16G processor that uses "megasegments" to access the
> extra memory) has never struck me as being useful,
> unless you can partition your working set data enough
> to deal with the window-flipping overhead issue that
> would result, so I'm not too concerned about trying
> to support them -- they seem more useful for VMWare
> type applications, where you context switch at a very
> low level in order to run multiple instances of a
> kernel, or similar pig tricks.  The IA64 and AMD
> "Sledgehammer" will make us think about these things
> again, as soon as someone comes up with a MB/memory
> sticks that beat the 4G limit without melting down.
>
> -- Terry

Regards,

-- 
Bosko Milekic
bmilekic@technokratis.com


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-alpha" in the body of the message