From owner-freebsd-hackers@freebsd.org Sun Nov 27 21:33:43 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0AF7AC59299 for ; Sun, 27 Nov 2016 21:33:43 +0000 (UTC) (envelope-from m.e.sanliturk@gmail.com) Received: from mail-yw0-x22c.google.com (mail-yw0-x22c.google.com [IPv6:2607:f8b0:4002:c05::22c]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id B856D2492; Sun, 27 Nov 2016 21:33:42 +0000 (UTC) (envelope-from m.e.sanliturk@gmail.com) Received: by mail-yw0-x22c.google.com with SMTP id i145so101146515ywg.2; Sun, 27 Nov 2016 13:33:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=zGmkeISk8EfnG3h3aYUNUXFMTbtkVyvp04luX+8bCmM=; b=iUwTzILveMRqL5C2KTcrKnnnbq6QbNVyCZ1c20KQ98FU9kQ+nCo8j7MOORoiDvrrwY iQ45NVqUu2AB7Wx4bx0VOAiuc546RCrY2eGtSK3h8WsR6EN53GY7+MIGywnjuwkaoSEF nq4s24FkWXTmStUt6Pgm2LFb9IAIuA70enFwTz0OlY0mNKTYvAz81ilgTt5ZNo2yg0Dc z2wEQ41MBTwupWwT2pHw/b/QmN/5II2+L3rl6Sir1HBqyCuVvVHvhcJTdvRJYMQF9zah i8byVxBAdM53/F32fqqd3oCK3u5PH672kXzpUnEsZFzeKFKcRzWcaeEslIdWwDSYeMm9 QgRg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=zGmkeISk8EfnG3h3aYUNUXFMTbtkVyvp04luX+8bCmM=; b=XyF7DNKRMez++hdl8E0S1jPV5tDXUHqjRauNueQhc8l3WcRp5idXJXZyuPoc/RtTxR OXLL5VcZoYla9D5a2scpDLbzixCPIfslA5rucqV3FJ7w0lvGRfWxYnSAPGnNUMvfoqyz 2yxyzfwT0U5wxV3x1kNH7fqW2YFjKVynEOXYRTV9xd0DdMr9xEsZ2Z3PlDvIJNP9GIDV zZ49295/hfNcIYNdrxg40wxIQASvvubWfWnERR5EY5VpNRUuc4XazoMltE99pSDnPTBV 7fzcbNEX3oAs/JzzKVjqXo95CfDj80MhvaiTHtUs5qpaT/Y3Ez4e35jGr3Q/yAadx2VH 86Yg== X-Gm-Message-State: AKaTC02mahAVO6/28MFZrkdnikZVGYDboDx6Zc61fIQGsgRpLRggaKpc6XcsWQ6bj4DMwvqKtVQXkhEPqzyODg== X-Received: by 10.13.202.196 with SMTP id m187mr22127468ywd.11.1480282421607; Sun, 27 Nov 2016 13:33:41 -0800 (PST) MIME-Version: 1.0 Received: by 10.37.170.67 with HTTP; Sun, 27 Nov 2016 13:33:41 -0800 (PST) In-Reply-To: <8b0bd8e7-a77d-85d8-18e7-e1e33fed78f5@freebsd.org> References: <0bfb49b0-5d24-2766-6982-b4e49b0d5e81@selasky.org> <8b0bd8e7-a77d-85d8-18e7-e1e33fed78f5@freebsd.org> From: Mehmet Erol Sanliturk Date: Sun, 27 Nov 2016 13:33:41 -0800 Message-ID: Subject: Re: qsort switching to insertsort To: Alfred Perlstein Cc: Tristan Verniquet , =?UTF-8?Q?Arne_Dag_Fidjest=C3=B8l?= , Hans Petter Selasky , freebsd hackers Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.23 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 27 Nov 2016 21:33:43 -0000 On Sat, Nov 26, 2016 at 10:40 PM, Alfred Perlstein wrote: > A couple notes: > > 1) It's interesting to me that it's based on a simple heuristic as oppose= d > to also checking the depth of the recursion within the qsort itself. > > Aim is not to enter into quick sort at the beginning . Number of sorted pairs can be counted on ( input or generation ) of data to sort . Based on this count , the sorting method may be selected as either quick or an alternative sort which will be able to sort the data deterministically . It is possible to implement a robust quick sort if ( time and/or resources ) are sufficient to use . When there is not sufficient resources , a simple method is better than a more optimized ( which will not be attainable at present state ) . > 2) Wondering if upon detection of this situation it might even be faster > to perform some randomization (shuffle) of the data and then retry. > > Instead of ( entering into quick sort , fail , and , search a way to recover ) , I would prefer to use another method will will produce a desired result by using more time . Sometimes , it is very difficult to generate a "universal" algorithm that will be able to cure "all" difficulties . > 3) Wondering if there could be a way to return an error and indicate the > data is unsorted if the corner case is hit allowing the caller to make a > choice at that point. Obviously the API would need to be changed. > > Problem is lying in the complexity of writing a very robust quick sort algorithm . There are many ( at least parts in ) books , papers on these subject . There is also a cost of obtaining / reading / utilizing such resources . When the available resource is only a "data structures" book , the cheapest solution may be the above described method to sort the data without entering into error generation and recover from it . > I haven't thought too hard on this. > > One thing to keep in mind: Using qsort in a codepath has "deadlines" or > other deterministic needs is not going to work out well. It's better to > use a sort with a known complexity with an upper bound than something tha= t > can be defeated by a pathological input data set. > > -Alfred > > > On 11/26/16 8:17 PM, Tristan Verniquet wrote: > >> From: Mehmet Erol Sanliturk >> Sent: Sunday, 27 November 2016 5:51 AM >> To: Arne Dag Fidjest=C3=B8l >> Cc: Tristan Verniquet; Hans Petter Selasky; freebsd hackers >> Subject: Re: qsort switching to insertsort >> >> >> >> On Sat, Nov 26, 2016 at 9:14 AM, Arne Dag Fidjest=C3=B8l >> wrote: >> >> On 26 Nov 2016, at 15:35, Mehmet Erol Sanliturk >>> wrote: >>> >>> In quick sort , it is necessary to check worst case and switch to anoth= er >>> sort method if it is detected . >>> When this is not done , end result is stack overflow , etc . , but not >>> success . >>> Therefore , it is not possible to eliminate an alternative sort method = . >>> >> I you sort the smaller partition recursively first, and then sort the >> larger partition either by tail recursion or iteration, you will only >> consume O(log n) of stack, so stack overflow needn=E2=80=99t be an issue= , >> regardless of the input. >> >> -adf >> >> >> >> Important problem is caused by almost sorted values . Myself , I am >> counting the sorted elements first , if they exceed a large percentage o= f >> number of all elements , then I am switching to an alternative sort , >> otherwise a quick sort is used . This is for a simple application . >> >> In an operating system , more complex algorithms may be more useful . >> >> >> Mehmet Erol Sanliturk >> >> --- >> >> It can still trigger with completely unsorted data in the top and bottom >> half, as long as it chooses the middle value for the pivot. The main rea= son >> nearly sorted data is vulnerable is that it is more likely to match thes= e >> conditions, and more likely to happen in real life situations. >> >> But this is why i don't really consider it an "edge" case, there would b= e >> a whole class of situations (like the one we had) where it would be very >> likely to trigger with very bad side effects. >> >> Maybe, does anyone continue to use FreeBSD qsort while being aware of >> this implementation detail? Under what conditions/assurances? >> >> Does anyone use FreeBSDs qsort because of this feature? >> >> Tristan >> ________________________________ >> From: Mehmet Erol Sanliturk >> Sent: Sunday, 27 November 2016 5:51:31 AM >> To: Arne Dag Fidjest=C3=B8l >> Cc: Tristan Verniquet; Hans Petter Selasky; freebsd hackers >> Subject: Re: qsort switching to insertsort >> >> >> >> On Sat, Nov 26, 2016 at 9:14 AM, Arne Dag Fidjest=C3=B8l > > wrote: >> >> On 26 Nov 2016, at 15:35, Mehmet Erol Sanliturk >> > wrote: >>> >>> In quick sort , it is necessary to check worst case and switch to anoth= er >>> sort method if it is detected . >>> When this is not done , end result is stack overflow , etc . , but not >>> success . >>> Therefore , it is not possible to eliminate an alternative sort method = . >>> >> I you sort the smaller partition recursively first, and then sort the >> larger partition either by tail recursion or iteration, you will only >> consume O(log n) of stack, so stack overflow needn=E2=80=99t be an issue= , >> regardless of the input. >> >> -adf >> >> >> >> Important problem is caused by almost sorted values . Myself , I am >> counting the sorted elements first , if they exceed a large percentage o= f >> number of all elements , then I am switching to an alternative sort , >> otherwise a quick sort is used . This is for a simple application . >> >> In an operating system , more complex algorithms may be more useful . >> >> >> Mehmet Erol Sanliturk >> >> >> >> _______________________________________________ >> >> Mehmet Erol Sanliturk