Date:      Mon, 27 Jan 1997 10:08:09 +0000 (GMT)
From:      Alan Cox <alan@cymru.net>
To:        julian@whistle.com (Julian Elischer)
Cc:        proff@suburbia.net, dg@root.com, hackers@freebsd.org, alan.cox@linux.org
Subject:   Re: SLAB stuff, and applications to current net code (fwd)
Message-ID:  <199701271008.KAA26279@snowcrash.cymru.net>
In-Reply-To: <32EBDC19.794BDF32@whistle.com> from "Julian Elischer" at Jan 26, 97 02:35:05 pm

> He told me that originally he looked at using a BSD-style allocator,
> but that small allocations of mbufs etc. all hit the same cache
> lines, as they were always on powers of two (obviously),
> especially when working on the headers of large packets.
> 
> He saw a noticeable problem with cache overwrites.
> 
> I didn't get it all but probably he could tell us more..
> I've CC'd him. He seems very knowledgeable about BSD internals
> and not at all the screaming fanatic that we sometimes see in the Linux
> camp, so I really enjoyed the conversation..

I'm flattered 8)

The problem occurring was one that seems to occur with a lot of allocators.
Linux, like most other OSes, tends to do things of the form

	object=some_malloc(size_of_object)

With the older Linux allocator and all the buddy-based allocators, object
has the property that the low bits are 0, and the next few bits up are 0
with a noticeably higher probability than chance. In other words the start
of the object tends to be at
	
	something0000000000

and the 0000000000 tends to include the bits used to select an L1 cache
line.
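
To make that concrete, here is a rough C sketch. The cache geometry (32-byte
lines, 256 sets) and the helper names cache_set() and distinct_sets() are
assumptions made up for illustration, not figures for any particular CPU.
It counts how many different cache sets the first line of each object lands
in when objects sit at a fixed stride.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical L1 geometry, chosen only for illustration. */
#define LINE_SIZE 32u   /* bytes per cache line (assumed) */
#define NUM_SETS  256u  /* number of sets in the cache (assumed) */

/* Set selected by an address: drop the offset-within-line bits,
 * then keep the low bits of the line number. */
static unsigned cache_set(uintptr_t addr)
{
    return (unsigned)((addr / LINE_SIZE) % NUM_SETS);
}

/* Count the distinct sets touched by the first line of n objects
 * placed 'stride' bytes apart, starting at 'base'. */
static unsigned distinct_sets(uintptr_t base, size_t stride, unsigned n)
{
    unsigned char seen[NUM_SETS] = {0};
    unsigned count = 0;

    assert(n <= NUM_SETS);  /* more objects than sets cannot all be distinct */
    for (unsigned i = 0; i < n; i++) {
        unsigned s = cache_set(base + (uintptr_t)i * stride);
        if (!seen[s]) {
            seen[s] = 1;
            count++;
        }
    }
    return count;
}
```

With these assumed numbers, 64 objects spaced 8192 bytes apart all start in
the same set, so their most used fields fight over it. Spacing them 8224
bytes apart (one extra line each) spreads the same 64 objects over 64 sets.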


It's pretty trivial to inspect the Linux and BSD code to see that almost all
objects are of the form


	struct
	{
		most used field
		regularly used field
		buf[blah]; /* rarely accessed */
	}
	[memory allocator slop - never used]

In the end I pulled some messy tricks in Linux 2.0 to keep the cache a bit
saner by building sk_buffs (mbufs to the BSD world) with the buffer
structure at the tail of the object, i.e.


	ptr=malloc(size+sizeof(struct)+15)
	struct=ptr+size+15	/* structure goes after the buffer */
	round_down(struct,16);
	struct->data=ptr

That, combined with the fact that most Linux sk_buffs have the major headers
in the 2nd and 3rd cache lines of the buffer, gave me a performance
improvement I could benchmark.
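
The tail-placement trick can be sketched in C roughly as follows. This is an
illustration under assumptions, not the Linux 2.0 code: struct buf_hdr,
buf_alloc() and buf_free() are hypothetical names, and the sketch rounds the
buffer size up to a multiple of 16 rather than rounding a pointer down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for sk_buff; the fields are illustrative only. */
struct buf_hdr {
    unsigned char *data;   /* start of the packet buffer */
    size_t         len;    /* bytes the caller asked for */
    void          *block;  /* what malloc() returned, for free() */
};

/* Allocate one block holding the buffer first and the header at the tail,
 * so the hot header fields do not sit at the block's aligned start. */
static struct buf_hdr *buf_alloc(size_t size)
{
    if (size > SIZE_MAX - 15 - sizeof(struct buf_hdr))
        return NULL;                            /* size would overflow */

    size_t padded = (size + 15) & ~(size_t)15;  /* multiple of 16 */
    unsigned char *block = malloc(padded + sizeof(struct buf_hdr));
    if (block == NULL)
        return NULL;

    /* malloc's alignment plus a multiple-of-16 offset keeps the header
     * suitably aligned for its pointer-sized members. */
    struct buf_hdr *h = (struct buf_hdr *)(block + padded);
    h->data = block;
    h->len = size;
    h->block = block;
    return h;
}

/* Release the whole block; the header lives inside it. */
static void buf_free(struct buf_hdr *h)
{
    if (h != NULL)
        free(h->block);
}
```

A caller would write packet bytes through h->data and pass h around as the
handle. Buffers of different sizes then put their headers at different
offsets from the block start, which is the point of the exercise.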

The right answer in current literature is undoubtedly a SLAB allocator and
that is where we are going at the moment.

Alan


