From nobody Thu Sep 8 10:50:28 2022 X-Original-To: freebsd-current@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4MNbWF32xBz4bp8X; Thu, 8 Sep 2022 10:50:33 +0000 (UTC) (envelope-from hps@selasky.org) Received: from mail.turbocat.net (turbocat.net [88.99.82.50]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4MNbWD2Stgz4065; Thu, 8 Sep 2022 10:50:32 +0000 (UTC) (envelope-from hps@selasky.org) Received: from [10.36.2.165] (unknown [178.232.223.95]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by mail.turbocat.net (Postfix) with ESMTPSA id 6361726050E; Thu, 8 Sep 2022 12:50:31 +0200 (CEST) Message-ID: Date: Thu, 8 Sep 2022 12:50:28 +0200 List-Id: Discussions about the use of FreeBSD-current List-Archive: https://lists.freebsd.org/archives/freebsd-current List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-current@freebsd.org MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:91.0) Gecko/20100101 Thunderbird/91.13.0 Content-Language: en-US To: FreeBSD Current , "freebsd-arch@freebsd.org" From: Hans Petter Selasky Subject: [RFC] Proposal adding new sorting algorithm, bsort() to libc Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Rspamd-Queue-Id: 4MNbWD2Stgz4065 X-Spamd-Bar: --- Authentication-Results: mx1.freebsd.org; dkim=none; dmarc=none; spf=pass (mx1.freebsd.org: domain of hps@selasky.org designates 88.99.82.50 as permitted sender) smtp.mailfrom=hps@selasky.org X-Spamd-Result: default: False [-3.30 / 15.00]; NEURAL_HAM_LONG(-1.00)[-1.000]; NEURAL_HAM_MEDIUM(-1.00)[-1.000]; NEURAL_HAM_SHORT(-1.00)[-1.000]; R_SPF_ALLOW(-0.20)[+a:mail.turbocat.net:c]; MIME_GOOD(-0.10)[text/plain]; RCPT_COUNT_TWO(0.00)[2]; ASN(0.00)[asn:24940, ipnet:88.99.0.0/16, country:DE]; R_DKIM_NA(0.00)[]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+]; MLMMJ_DEST(0.00)[freebsd-arch@FreeBSD.org,freebsd-current@freebsd.org]; ARC_NA(0.00)[]; MID_RHS_MATCH_FROM(0.00)[]; DMARC_NA(0.00)[selasky.org]; RCVD_VIA_SMTP_AUTH(0.00)[]; TO_DN_EQ_ADDR_SOME(0.00)[]; RCVD_COUNT_TWO(0.00)[2]; FROM_HAS_DN(0.00)[]; TO_MATCH_ENVRCPT_ALL(0.00)[]; TO_DN_SOME(0.00)[]; RCVD_TLS_ALL(0.00)[] X-ThisMailContainsUnwantedMimeParts: N See: https://reviews.freebsd.org/D36493 Looking through base I see qsort() being used in places it shouldn't be used. For example in fts_open(). If for example you fill a directory with 64k simply numerical file names in the wrong order and ask fts_open() to sort these ascending for example, qsort() will end stuck for a long-long time. So either switch to mergesort, or if malloc() is unacceptable, use something like bsort() which I've implemented in the above review as a drop-in replacement for qsort(). The advantage with bsort() is that in can be CPU accelerated, due to fixed comparison patterns. Quick sort is not always a quick sorting algorithm. Quick means simple, and not clever this time. For the qsort's bad pattern, sorting 4096 entries 1024 times in a row took: qsort: 15 seconds bsort: 230 milliseconds (non-CPU accelerated) mergesort: 30 milliseconds The problem with qsort() is that as the array size grows, the time consumption just gets worse and worse for the bad-patterns. Sorry there is no nice and fancy paper yet about this. --HPS