Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 26 Nov 2016 10:26:50 +0000
From:      Tristan Verniquet <tris_vern@hotmail.com>
To:        freebsd hackers <freebsd-hackers@freebsd.org>
Subject:   qsort switching to insertsort
Message-ID:  <ME1PR01MB0546A92343D712D8439B299C83880@ME1PR01MB0546.ausprd01.prod.outlook.com>

next in thread | raw e-mail | index | archive | help
We recently hit an issue with qsort where it was taking 2 minutes to sort 5=
00k entries. Changing to mergesort was comparatively instantaneous (which i=
s what was expected). This was trusted data and shouldn't have been failing=
 due to pivot choice.

After investigation, we discovered we'd been caught by a quirk of FreeBSD's=
 qsort where it will switch to insertsort (even for the whole array) if a p=
ass matches a very simplistic heuristic of doing no swaps.

This has previously been written up about in the following links:

http://calmerthanyouare.org/2013/05/31/qsort-shootout.html

and

http://calmerthanyouare.org/2014/06/11/algorithmic-complexity-attacks-and-l=
ibc-qsort.html


The articles above focus on the security aspect of quicksort, and that all =
implementations are vulnerable, however what seems to be missed in the Free=
BSD case is how simple it is to hit this (really quite bad) worst case scen=
ario with everyday data, as we did. This discovery is enough for us to not =
want to use this version of qsort.

FreeBSD does provide heapsort and mergesort as alternatives, but not merges=
ort_r(). Also, as a developer, having these options gives the impression th=
at choosing qsort will give me a quicksort-like implementation with the tra=
de-offs that come with that, but it is not obvious that I might be given in=
sertsort.

The easiest way forward for us is probably to comment out the offending cod=
e.

But I haven't been able to find much discussion on it. I'm not sure how wel=
l known the quirk is. I'm not sure of the rationality for it in the first p=
lace (obviously a speedup, but whether it was considered alongside the down=
falls), or what other peoples opinions are. So I thought I'd ask.

I've done up a simple test case that is similar to our original usage that =
shows why we were likely to always trigger an insertsort. It has 2 sort fie=
lds in a 1->N relationship with the second fields sorted within the first, =
but the topmost fields slightly out of order. I imagine there are lots of o=
ther "almost sorted" scenarios that could end up being quite bad.

http://tverniquet.com/whirlpool/test_qsort/test_qsort.c


Regards,

Tristan



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?ME1PR01MB0546A92343D712D8439B299C83880>