Date: Mon, 27 Jan 1997 10:08:09 +0000 (GMT)
From: Alan Cox <alan@cymru.net>
To: julian@whistle.com (Julian Elischer)
Cc: proff@suburbia.net, dg@root.com, hackers@freebsd.org, alan.cox@linux.org
Subject: Re: SLAB stuff, and applications to current net code (fwd)
Message-ID: <199701271008.KAA26279@snowcrash.cymru.net>
In-Reply-To: <32EBDC19.794BDF32@whistle.com> from "Julian Elischer" at Jan 26, 97 02:35:05 pm

> He told me that originally he looked at using a BSD style allocator,
> but that small allocations of mbufs etc all hit the same cache
> lines as they were always on powers of two. (obviously)
> especially when working on the headers of large packets.
>
> he saw a noticeable problem with cache overwrites.
>
> I didn't get it all but probably he could tell us more..
> I've CC'd him. He seems very knowledgeable about BSD internals
> and not at all the screaming fanatic that we sometimes see in the linux
> camp so I really enjoyed the conversation..

I'm flattered 8)

The problem occurring was one that seems to occur with a lot of allocators.
Linux, like most other OS's, tends to do things of the form

	object=some_malloc(size_of_object)

With the older Linux allocator and all the buddy based allocators, object
has the property that the lower bits are 0, and the next few bits are 0 with
a passably higher probability. In other words the start of the object tends
to be at something0000000000 and the 0000000000 tends to include the bits
used to select an L1 cache line.

It's pretty trivial to inspect the Linux and BSD code to see that almost
all objects are of the form

	struct
	{
		most used field
		regularly used field
		buf[blah];	/* rarely accessed */
	}
	[memory allocator slop - never used]

In the end I pulled some messy tricks in Linux 2.0 to keep the cache a bit
saner by building sk_buff's (mbufs to the BSD world) with the buffer structure
at the tail of the object .. ie

	ptr=malloc(size+struct+15)
	struct=ptr-sizeof(struct)
	round_down(ptr,16);
	struct->data=ptr

That combined with the fact most Linux sk_buff's have the major headers
on the 2nd and 3rd cache line into the buffer gave me a performance
improvement I could benchmark.

The right answer in current literature is undoubtedly a SLAB allocator and
that is where we are going at the moment.

Alan
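
For readers who want the tail-placement idea in compilable form, here is a minimal C sketch of it. It is not the Linux 2.0 code. The names buf_hdr, buf_alloc, buf_free and cache_set are hypothetical, and the cache geometry (32-byte lines, 128 sets) is an assumption chosen only to make the arithmetic concrete. The data area is rounded up to a multiple of 16 bytes and the control structure is placed after it, so the hot header fields no longer sit at the power-of-two-aligned start of every allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical control structure, standing in for sk_buff / mbuf headers. */
struct buf_hdr {
    unsigned char *head;  /* start of the allocation; this is what gets freed */
    unsigned char *data;  /* start of the packet data area */
    size_t         len;   /* number of data bytes the caller asked for */
};

/* Assumed cache geometry, for illustration only. */
#define LINE_SIZE 32u
#define NUM_SETS  128u

/* Cache set an address maps to under the assumed geometry. */
static unsigned cache_set(uintptr_t addr)
{
    return (unsigned)((addr / LINE_SIZE) % NUM_SETS);
}

/*
 * Allocate a buffer with the data area first and the header at the tail.
 * The data area is rounded up to a multiple of 16 bytes so that the header
 * which follows it stays suitably aligned.
 */
static struct buf_hdr *buf_alloc(size_t size)
{
    /* Refuse sizes whose rounded total would overflow size_t. */
    if (size > SIZE_MAX - 15u - sizeof(struct buf_hdr))
        return NULL;

    size_t data_len = (size + 15u) & ~(size_t)15u;
    unsigned char *p = malloc(data_len + sizeof(struct buf_hdr));
    if (p == NULL)
        return NULL;

    struct buf_hdr *h = (struct buf_hdr *)(p + data_len);
    /* malloc alignment plus a multiple-of-16 offset keeps the header aligned. */
    assert((uintptr_t)h % _Alignof(struct buf_hdr) == 0);

    h->head = p;
    h->data = p;
    h->len  = size;
    return h;
}

/* Release a buffer obtained from buf_alloc(). */
static void buf_free(struct buf_hdr *h)
{
    if (h != NULL)
        free(h->head);  /* the header lives inside the same allocation */
}
```

Under the assumed geometry, two headers placed at the start of 4096-byte-aligned blocks would both land in set 0, whereas a header placed 112 bytes into such a block (a 100-byte request rounded up to 112) lands in set 3. Real allocators and caches will differ, so treat the numbers as an illustration of the effect rather than a measurement.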
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199701271008.KAA26279>