Date: Fri, 12 Jul 2002 14:20:21 -0400
From: Bosko Milekic
To: John Polstra
Cc: net@freebsd.org
Subject: Re: mbuf external buffer reference counters
Message-ID: <20020712142021.A44645@unixdaemons.com>
References: <20020711162026.A18717@unixdaemons.com> <200207121803.g6CI3je9008944@vashon.polstra.com>
In-Reply-To: <200207121803.g6CI3je9008944@vashon.polstra.com>

On Fri, Jul 12, 2002 at 11:03:45AM -0700, John Polstra wrote:
> I've been out of town and I realize I'm coming into this thread late
> and that it has evolved a bit. But I still think it's worthwhile to
> point out a very big problem with the idea of putting the reference
> count at the end of each mbuf cluster. It would have disastrous
> consequences for performance because of cache effects. Bear with me
> through a little bit of arithmetic.
>
> Consider a typical PIII CPU that has a 256 kbyte 4-way set-associative
> L2 cache with 32-byte cache lines.
> 4-way means that each address can be cached in any of 4 different
> cache lines. Each group of 4 is called a set, and each set covers
> 32 bytes of the address space (the cache line size).
>
> The total number of sets is:
>
>     256 kbytes / 32 bytes per line / 4 lines per set = 2048 sets
>
> and as mentioned above, each set covers 32 bytes.
>
> The cache wraps around every 256 kbytes / 4-way = 64 kbytes of address
> space. In other words, if address N maps onto a given set, then
> addresses N + 64k, N + 128k, etc. all map onto the same set.
>
> An mbuf cluster is 2 kbytes and all mbuf clusters are well-aligned.
> So the wraparound of the cache occurs every 64 kbytes / 2 kbytes per
> cluster = 32 clusters. To put it another way, all of the reference
> counts would be sharing (i.e., competing for) the same 32 cache sets
> and they would never utilize the remaining 2016 sets at all. Only
> 1.56% of the cache (32 sets / 2048 sets) would be usable for the
> reference counts. This means there would be a lot of cache misses as
> reference count updates caused other reference counts to be flushed
> from the cache.
>
> These cache effects are huge, and they are growing all the time as CPU
> speeds increase while RAM speeds remain relatively constant.

I've thought about the cache issue with regard to the ref. counts
before, actually, and initially I thought exactly the same thing you
bring up here. However, there are a few things you need to remember:

1) SMP: counters are typically referenced by several different threads
   which may be running on different CPUs at any given point in time,
   so we'll probably end up having the corresponding cache lines
   invalidated back and forth anyway;

2) Using more cache lines may not be better overall: we may be doing
   write-backs of other data already there. In any case, we would
   really have to measure this;

3) By far the most important: all modifications to the ref. count are
   atomic, bus-locked ops.
   I spoke to Peter a little about this and, although I'm not 100% sure,
   we think that bus-locked fetch-inc/dec-stores need the bus anyway. If
   that's the case, then we really don't care whether or not they get
   cached, right?

> John
> --
> John Polstra
> John D. Polstra & Co., Inc.                Seattle, Washington USA
> "Disappointment is a good sign of basic intelligence."  -- Chögyam Trungpa

Thanks for the cool info and feedback.

Regards,
-- 
Bosko Milekic
bmilekic@unixdaemons.com
bmilekic@FreeBSD.org
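The set arithmetic in the quoted message can be checked with a short Python sketch. It assumes a physically indexed cache whose set index is (address / line size) mod (number of sets), uses the PIII-class L2 geometry given in the message, and places a hypothetical 4-byte counter in the last bytes of each well-aligned 2 KB cluster (the thread does not fix an exact layout). The constants and the `set_index` and `refcount_sets` helpers are illustrative names, not FreeBSD code.

```python
# Sketch: which L2 cache sets would per-cluster reference counts occupy?
# Assumptions (not FreeBSD code): physically indexed cache, set index =
# (address // line size) % number of sets, and a hypothetical 4-byte
# counter stored in the last bytes of each 2 KB cluster.

CACHE_BYTES = 256 * 1024   # PIII-class L2 size from the message
LINE_BYTES = 32            # cache line size
WAYS = 4                   # 4-way set-associative
CLUSTER_BYTES = 2 * 1024   # mbuf cluster size; clusters are 2 KB aligned
REFCNT_BYTES = 4           # hypothetical counter width

NUM_SETS = CACHE_BYTES // LINE_BYTES // WAYS     # 2048 sets
WRAP_BYTES = CACHE_BYTES // WAYS                 # 64 KB: N and N + 64k share a set
CLUSTERS_PER_WRAP = WRAP_BYTES // CLUSTER_BYTES  # 32 clusters per wrap


def set_index(addr):
    """Return the cache set that byte address `addr` maps onto."""
    return (addr // LINE_BYTES) % NUM_SETS


def refcount_sets(num_clusters, base=0):
    """Return the distinct sets hit by counters at the end of each cluster.

    `base` must be cluster-aligned to match the well-aligned clusters
    described in the message.
    """
    return {
        set_index(base + i * CLUSTER_BYTES + CLUSTER_BYTES - REFCNT_BYTES)
        for i in range(num_clusters)
    }


if __name__ == "__main__":
    used = refcount_sets(4096)  # 8 MB worth of clusters
    share = 100 * len(used) / NUM_SETS
    print(f"sets: {NUM_SETS}, wrap: {WRAP_BYTES // 1024} KB, "
          f"clusters per wrap: {CLUSTERS_PER_WRAP}")
    print(f"sets used by ref counts: {len(used)} ({share:.2f}% of the cache)")
    print(f"sets never used: {NUM_SETS - len(used)}")
```

Run as a script, it reports 2048 sets, a 64 KB wrap, 32 clusters per wrap, and counters confined to 32 sets (1.56% of the cache), leaving 2016 sets unused no matter how many clusters are allocated. It models only set placement; the SMP invalidation traffic and locked-operation costs raised in the reply are outside its scope and would still need measuring.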