Date: Fri, 12 Jul 2002 11:03:45 -0700 (PDT)
From: John Polstra
To: net@freebsd.org
Cc: bmilekic@unixdaemons.com
Subject: Re: mbuf external buffer reference counters
Message-Id: <200207121803.g6CI3je9008944@vashon.polstra.com>
In-Reply-To: <20020711162026.A18717@unixdaemons.com>
Organization: Polstra & Co., Seattle, WA

In article <20020711162026.A18717@unixdaemons.com>,
Bosko Milekic wrote:
>
> Right now, in -CURRENT, there is this hack that I introduced that
> basically just allocates a ref. counter for external buffers attached
> to mbufs with malloc(9). What this means is that if you do something
> like allocate an mbuf and then a cluster, there's a malloc() call that
> is made to allocate a small (usually 4-byte) reference counter for it.
>
> That sucks,

Eeek, it sure does!

> and even -STABLE doesn't do this. I changed it this way
> a long time ago for simplicity's sake and since then I've been meaning
> to do something better here.
> The idea was, for mbuf CLUSTERS, to
> stash the counter at the end of the 2K buffer area, and to make
> MCLBYTES = 2048 - sizeof(refcount), which should be more than enough,
> theoretically, for all cluster users. This is by far the easiest
> solution (I had it implemented about 10 months ago) and it worked
> great.
>
> The purpose of this Email is to find out if anyone has concrete
> information on why this wouldn't work (if they think it wouldn't).

I've been out of town, and I realize I'm coming into this thread late
and that it has evolved a bit. But I still think it's worthwhile to
point out a very big problem with the idea of putting the reference
count at the end of each mbuf cluster: because of cache effects, it
would have disastrous consequences for performance.

Bear with me through a little bit of arithmetic. Consider a typical
PIII CPU that has a 256 kbyte 4-way set-associative L2 cache with
32-byte cache lines. 4-way means that a given address can be cached in
any of 4 different cache lines. Each group of 4 is called a set, and
each set covers 32 bytes of the address space (the cache line size).
The total number of sets is:

    256 kbytes / 32 bytes per line / 4 lines per set = 2048 sets

The cache therefore wraps around every 256 kbytes / 4-way = 64 kbytes
(2048 sets x 32 bytes) of address space. In other words, if address N
maps onto a given set, then addresses N + 64k, N + 128k, etc. all map
onto the same set.

An mbuf cluster is 2 kbytes and all mbuf clusters are well-aligned. So
the wraparound of the cache occurs every 64 kbytes / 2 kbytes per
cluster = 32 clusters. Because every reference count would sit at the
same offset (the last few bytes) of its cluster, its address modulo
64 kbytes could take only 32 different values. To put it another way,
all of the reference counts would be sharing (i.e., competing for) the
same 32 cache sets, and they would never use the remaining 2016 sets at
all. Only 1.56% of the cache (32 sets / 2048 sets) would be usable for
the reference counts.
With only those 32 sets to share, there would be a lot of cache misses
as reference count updates caused other reference counts to be flushed
from the cache. These cache effects are huge, and they are growing all
the time as CPU speeds increase while RAM speeds remain relatively
constant.

It is much better to have the reference counts laid out as they are in
-stable, i.e., one big contiguous block of counts. That way, the counts
are spread out through the entire cache and they don't compete with
each other nearly so much. That is the underlying principle of slab
allocators, by the way.

John
--
  John Polstra
  John D. Polstra & Co., Inc.                Seattle, Washington USA
  "Disappointment is a good sign of basic intelligence."  -- Chögyam Trungpa