Date: Sat, 26 Nov 2016 10:26:50 +0000 From: Tristan Verniquet <tris_vern@hotmail.com> To: freebsd hackers <freebsd-hackers@freebsd.org> Subject: qsort switching to insertsort Message-ID: <ME1PR01MB0546A92343D712D8439B299C83880@ME1PR01MB0546.ausprd01.prod.outlook.com>
next in thread | raw e-mail | index | archive | help
We recently hit an issue with qsort where it was taking 2 minutes to sort 5= 00k entries. Changing to mergesort was comparatively instantaneous (which i= s what was expected). This was trusted data and shouldn't have been failing= due to pivot choice. After investigation, we discovered we'd been caught by a quirk of FreeBSD's= qsort where it will switch to insertsort (even for the whole array) if a p= ass matches a very simplistic heuristic of doing no swaps. This has previously been written up about in the following links: http://calmerthanyouare.org/2013/05/31/qsort-shootout.html and http://calmerthanyouare.org/2014/06/11/algorithmic-complexity-attacks-and-l= ibc-qsort.html The articles above focus on the security aspect of quicksort, and that all = implementations are vulnerable, however what seems to be missed in the Free= BSD case is how simple it is to hit this (really quite bad) worst case scen= ario with everyday data, as we did. This discovery is enough for us to not = want to use this version of qsort. FreeBSD does provide heapsort and mergesort as alternatives, but not merges= ort_r(). Also, as a developer, having these options gives the impression th= at choosing qsort will give me a quicksort-like implementation with the tra= de-offs that come with that, but it is not obvious that I might be given in= sertsort. The easiest way forward for us is probably to comment out the offending cod= e. But I haven't been able to find much discussion on it. I'm not sure how wel= l known the quirk is. I'm not sure of the rationality for it in the first p= lace (obviously a speedup, but whether it was considered alongside the down= falls), or what other peoples opinions are. So I thought I'd ask. I've done up a simple test case that is similar to our original usage that = shows why we were likely to always trigger an insertsort. It has 2 sort fie= lds in a 1->N relationship with the second fields sorted within the first, = but the topmost fields slightly out of order. I imagine there are lots of o= ther "almost sorted" scenarios that could end up being quite bad. http://tverniquet.com/whirlpool/test_qsort/test_qsort.c Regards, Tristan
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?ME1PR01MB0546A92343D712D8439B299C83880>