Date: Mon, 22 May 2017 21:11:27 +0200 From: Jens Stimpfle <jfs@jstimpfle.de> To: freebsd-hackers@freebsd.org Subject: Re: Improved Red-black tree implementation Message-ID: <20170522191127.GB27949@jstimpfle.de> In-Reply-To: <59227456.9020805@embedded-brains.de> References: <20170514050220.GA768@jstimpfle.de> <591EA74A.3080500@embedded-brains.de> <20170520165303.GA22452@jstimpfle.de> <59227456.9020805@embedded-brains.de>
next in thread | previous in thread | raw e-mail | index | archive | help
On Mon, May 22, 2017 at 07:17:10AM +0200, Sebastian Huber wrote: > >Considering memory savings, have you considered 2-pointer trees (no > >parent pointer)? They need a temporary search stack for insertion and > >deletion, but not for iteration ("threaded trees"). > > The RB and Eternally Confuzzled variants don't use a parent pointer as far > as I remember. I don't know "RB", but a few more words since both do not do too well in your bench, performance-wise. I know that Eternally Confuzzled uses a possibly easier to implement ("top-down") algorithm which is expected to be slower. A 2-ptr tree with a conventional implementation need not be much slower than a 3-ptr, or could in fact be faster due to memory savings. Here are measurements including an unpolished 2-ptr tree. Amd X2-250 (x86-64): ==================== SIL 3-pointer Red-black tree: 0.699s 0.646s 0.635s SIL 2-pointer Red-black tree: 0.856s 0.856s 0.856s STL std::set: 0.750s 0.751s 0.750s BSD RB tree: 0.804s 0.832s 0.828s Raspberry Pi 2 (ARM ???) ======================== SIL 3-pointer Red-black tree: 3.844s 3.841s 3.839s SIL 2-pointer Red-black tree: 4.002s 4.231s 3.954s STL std::set: 3.742s 3.786s 3.776s BSD RB tree: 4.112s 4.107s 4.105s That particular benchmark generates a random permutation of [0,2^16) in a flat array of C ints. It adds 2^15 random elements from that array to an empty tree and then makes another 2^15 operations, randomly chosen to be add, delete, or find (in 128-bursts) on random elements from the array. The variance in the measurements accounts for the randomly chosen array elements. There is no variance if I chose neighbouring elements or when the bench data is small enough to fit in the L1 cache (it's also about 100x faster). The measurements do not account for the performance disadvantage of 2-ptr that is having to "find", i.e. create a search stack, an (intrusively linked) element that you already hold in your hands, if you want to delete it or use it as a hint to add a similar element. This is to my knowledge the only case where maybe 2x slowdown can be expected. Jens
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20170522191127.GB27949>