From nobody Wed Apr 19 20:17:57 2023
X-Original-To: dev-commits-src-all@mlmmj.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
	by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4Q1sY413qJz46Pv6
	for <dev-commits-src-all@mlmmj.nyi.freebsd.org>; Wed, 19 Apr 2023 20:18:00 +0000 (UTC)
	(envelope-from jrtc27@jrtc27.com)
Received: from mail-wm1-f43.google.com (mail-wm1-f43.google.com [209.85.128.43])
	(using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits)
	 key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256
	 client-signature RSA-PSS (2048 bits) client-digest SHA256)
	(Client CN "smtp.gmail.com", Issuer "GTS CA 1D4" (verified OK))
	by mx1.freebsd.org (Postfix) with ESMTPS id 4Q1sY36Xylz3xhR
	for <dev-commits-src-all@freebsd.org>; Wed, 19 Apr 2023 20:17:59 +0000 (UTC)
	(envelope-from jrtc27@jrtc27.com)
Authentication-Results: mx1.freebsd.org;
	none
Received: by mail-wm1-f43.google.com with SMTP id n43-20020a05600c502b00b003f17466a9c1so2080533wmr.2
        for <dev-commits-src-all@freebsd.org>; Wed, 19 Apr 2023 13:17:59 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20221208; t=1681935478; x=1684527478;
        h=to:references:message-id:content-transfer-encoding:cc:date
         :in-reply-to:from:subject:mime-version:x-gm-message-state:from:to:cc
         :subject:date:message-id:reply-to;
        bh=rwUMbvtlVqEwBFoH8OQ9M6yG9iPsfzCkBhQ+lfoGb4k=;
        b=ERsatxRca40dz3ZY0wddvBA8ZVUQknuhVH2FYQzwrdwnitEmGHpqtvN5uFCNFrOEnp
         AjV3mLZec9Rl6zWe5YgBIGoOMU07p5N7e/6jaKZHQUHNPfVTIj/AxC/ndhy9yF6jloJv
         +wd5ur39hIltFVlpaD2ktwFtvZn3flwzvoFLIDVxUTJ7ocHVSwUYwyZq0TVxUYm5nF9c
         W7CCgcLz0/I6NBJ0TxaQlsEnoZ6jsXXgPZm1hnnuDQ2OgLz8mAXLAmpchvwhNLFyBXqN
         PYU2U9i5p3xOjR/0zR0qTEISsDh6DDnvZZZ1Ek9F+t+PMdUGkbrAt/e8bPbLX9FU5IOa
         tGYg==
X-Gm-Message-State: AAQBX9d318M5wtf/HQT3bNs34nxnG6M4lluH29mpge3249g7TT62CBvE
	ptiaVtQhzsbqKOjlTqhAr4uah0tv+UvFuQCkyQxhrg==
X-Google-Smtp-Source: AKy350awf187czuAFzk/xqSPRSZhDJfdQ9w71xEExB/KQg2ktpR8Fd9NvLE18BBKjqEgAAYJ+lDAww==
X-Received: by 2002:a05:600c:538f:b0:3f1:7510:62e8 with SMTP id hg15-20020a05600c538f00b003f1751062e8mr2717152wmb.3.1681935478371;
        Wed, 19 Apr 2023 13:17:58 -0700 (PDT)
Received: from smtpclient.apple ([131.111.5.246])
        by smtp.gmail.com with ESMTPSA id j22-20020a5d6e56000000b002fbb285b01fsm36218wrz.25.2023.04.19.13.17.57
        (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
        Wed, 19 Apr 2023 13:17:57 -0700 (PDT)
Content-Type: text/plain;
	charset=utf-8
List-Id: Commit messages for all branches of the src repository <dev-commits-src-all.freebsd.org>
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all
List-Help: <mailto:dev-commits-src-all+help@freebsd.org>
List-Post: <mailto:dev-commits-src-all@freebsd.org>
List-Subscribe: <mailto:dev-commits-src-all+subscribe@freebsd.org>
List-Unsubscribe: <mailto:dev-commits-src-all+unsubscribe@freebsd.org>
Sender: owner-dev-commits-src-all@freebsd.org
X-BeenThere: dev-commits-src-all@freebsd.org
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3696.120.41.1.1\))
Subject: Re: git: 8dcf3a82c54c - main - libc: Implement bsort(3) a bitonic
 type of sorting algorithm.
From: Jessica Clarke <jrtc27@freebsd.org>
In-Reply-To: <e2ea5c89-aa48-e257-b9d5-7fe783bf216e@freebsd.org>
Date: Wed, 19 Apr 2023 21:17:57 +0100
Cc: Brooks Davis <brooks@freebsd.org>,
 src-committers <src-committers@freebsd.org>,
 dev-commits-src-all@freebsd.org,
 dev-commits-src-main@freebsd.org
Content-Transfer-Encoding: quoted-printable
Message-Id: <2840BE79-CC25-427A-A5E9-476A38E749E3@freebsd.org>
References: <202304191206.33JC6Qcp062380@gitrepo.freebsd.org>
 <ZEAM3rO9r5e97BHE@spindle.one-eyed-alien.net>
 <a073fd36-9aa9-e988-0cc5-86af305fb654@freebsd.org>
 <3B5734C4-8630-4CD5-BA8D-DE33899161F1@freebsd.org>
 <e2ea5c89-aa48-e257-b9d5-7fe783bf216e@freebsd.org>
To: Hans Petter Selasky <hselasky@freebsd.org>
X-Mailer: Apple Mail (2.3696.120.41.1.1)
X-Rspamd-Queue-Id: 4Q1sY36Xylz3xhR
X-Spamd-Bar: ----
X-Spamd-Result: default: False [-4.00 / 15.00];
	REPLY(-4.00)[];
	ASN(0.00)[asn:15169, ipnet:209.85.128.0/17, country:US]
X-Rspamd-Pre-Result: action=no action;
	module=replies;
	Message is reply to one we originated
X-ThisMailContainsUnwantedMimeParts: N

On 19 Apr 2023, at 20:27, Hans Petter Selasky <hselasky@freebsd.org> =
wrote:
>=20
> On 4/19/23 19:51, Jessica Clarke wrote:
>> On 19 Apr 2023, at 18:24, Hans Petter Selasky <hselasky@freebsd.org> =
wrote:
>>>=20
>>> On 4/19/23 17:46, Brooks Davis wrote:
>>>> On Wed, Apr 19, 2023 at 12:06:26PM +0000, Hans Petter Selasky =
wrote:
>>>>> The branch main has been updated by hselasky:
>>>>>=20
>>>>> URL: =
https://cgit.FreeBSD.org/src/commit/?id=3D8dcf3a82c54cb216df3213a013047907=
636a01da
>>>>>=20
>>>>> commit 8dcf3a82c54cb216df3213a013047907636a01da
>>>>> Author:     Hans Petter Selasky <hselasky@FreeBSD.org>
>>>>> AuthorDate: 2022-09-08 10:16:43 +0000
>>>>> Commit:     Hans Petter Selasky <hselasky@FreeBSD.org>
>>>>> CommitDate: 2023-04-19 12:04:22 +0000
>>>>>=20
>>>>>     libc: Implement bsort(3) a bitonic type of sorting algorithm.
>>>>>          The bsort(3) algorithm works by swapping objects, =
similarly to qsort(3),
>>>>>     and does not require any significant amount of additional =
memory.
>>>>>          The bsort(3) algorithm doesn't suffer from the processing =
time issues
>>>>>     known the plague the qsort(3) family of algorithms, and is =
bounded by
>>>>>     a complexity of O(log2(N) * log2(N) * N), where N is the =
number of
>>>>>     elements in the sorting array. The additional complexity =
compared to
>>>>>     mergesort(3) is a fair tradeoff in situations where no memory =
may
>>>>>     be allocated.
>>>>>          The bsort(3) APIs are identical to those of qsort(3), =
allowing for
>>>>>     easy drop-in and testing.
>>>>>          The design of the bsort(3) algorithm allows for future =
parallell CPU
>>>>>     execution when sorting arrays. The current version of the =
bsort(3)
>>>>>     algorithm is single threaded. This is possible because fixed =
areas
>>>>>     of the sorting data is compared at a time, and can easily be =
divided
>>>>>     among different CPU's to sort large arrays faster.
>>>>>          Reviewed by:    gbe@, delphij@, pauamma_gundo.com =
(manpages)
>>>>>     Sponsored by:   NVIDIA Networking
>>>>>     Differential Revision:  https://reviews.freebsd.org/D36493
>>>> Why was this needed?  I'm not really a fan to adding new, =
non-standard
>>>> sorts without a clear use case.  I understand it has some specific
>>>> advantages vs qsort, but are we going to use it?  Doing so will =
make our
>>>> code less portable so we almost certainly can't s/qsort/bsort/.
>>>=20
>>> Hi Brooks,
>>>=20
>>> My long term plan is to get bsort() to replace qsort(), but because =
the two algorithms have different characteristics, then some people may =
complain it is good for me, but not for you. I want there to be an =
option besides from qsort(), which does not use malloc() as an integral =
part of sorting. And is faster than O(N*N) sorting (still the worst case =
for qsort in FreeBSD).
>> Why not do an adaptive qsort instead like other standard libraries?
>> Also, nothing actually says qsort doesn=E2=80=99t allocate memory; in =
fact,
>> glibc=E2=80=99s own documentation states that one should not assume =
it is
>> in-place and doesn=E2=80=99t allocate.
>=20
> Hi Jessica,
>=20
>>> qsort() is frequently used to do all kinds of sorting, and some =
pointed out that qsort() can technically be any kind of sorting =
algorithm, but looking around it is not.
>> Because there are variants that are guaranteed n log n? Which is =
better
>> than your n log^2 n.
>=20
> Only so-called mergesort() algorithms can do sorting in O(N log N) =
time, from what I know. And it needs to allocate working memory, when =
the sorting arrays get large. =46rom past experience, malloc() is a =
problem in the fast path. If only mergesort() could pre-allocate that =
memory, or have a pointer to pass working memory. What is the point of =
allocating and freeing memory over and over again, in certain =
applications, doing frequent sorting. The API is broken.

pdqsort is n log n time, in-place and doesn=E2=80=99t allocate, and is =
used,
for example, for Rust=E2=80=99s standard sort_unstable.

>>> When I build applications of my own, I always use mergesort(), which =
depend on malloc(). Having a dependency on a certain memory allocator to =
get performance is not good.
>>>=20
>>> I want to distinguish from qsort() by calling it bsort(). If people =
use bsort() they know what they get cross-platform.
>> No they don=E2=80=99t, bsort can mean multiple things.
>=20
> In the C-code standard domain, can you give examples of different =
meanings of bsort() which are established? I'm not aware of any such =
existing usage.

The b could be bitonic, bubble, binary insertion or something else?
Searching the literature turns up
https://dl.acm.org/doi/pdf/10.1145/3341.3348 as an algorithm described
in the April 1985 Communications of the ACM.

>> If you want a specific
>> algorithm, put it in your program, but please don=E2=80=99t, just use =
a
>> standardised function like qsort.
>=20
> Sorting is a quite generic thing.
>=20
> Likewise with <sys/queue.h> we support four different ways to make =
lists. And then after epoch(9) was introduced came another three ways, =
basically replicas of <sys/queue.h> with different properties (See: =
sys/contrib/ck/include/ck_queue.h)
>=20
> Personally I think the same about mergesort(), qsort() and bsort(). =
They are so different and unique ways to sort that we should have all of =
them in libc. They each serve a purpose.

Except they=E2=80=99re not, because all of what bsort provides can be =
provided
by qsort. All people care about are being able to quickly sort their
data and whether it=E2=80=99s a stable sort. Occasionally people care =
about
allocating, but not often. We have multiple list implementations in
sys/queue.h because they support different sets of operations. Arguably
SLIST and LIST should be subsumed by STAILQ and TAILQ respectively,
though.

Meanwhile the CK variants exist because they=E2=80=99re lockless data
structures, which the sys/queue.h ones are not (and shouldn=E2=80=99t =
be).

>>> If people use qsort() the output is random basically. It helps very =
little my application works on FreeBSD, but not Linux, for example.
>> No, the output is sorted, which is far from random. And if you need a
>> stable sort you should ask for one.
>=20
> Let me rephrase: Random with regards to execution time.
>=20
> mergesort() : not for realtime applications (syscalls can sleep when =
getting working memory)
> qsort() : usually fast, but not always
> bsort() : Worst case is "N * log2 N" compared to "N * N" (qsort), and =
no syscalls are involved.

Modified qsort() : Wort case is =E2=80=9CN * log N=E2=80=9D, and no =
syscalls are involved.

Please just adapt qsort to not have degenerate behaviour; it will
improve our implementation of a standard C function and render your
bsort() entirely obsolete.

>>> In FreeBSD qsort() is typically used for sorting files up to 16K =
entries per directory. Even if qsort() can explode in time, probably =
it's not that important. But when using qsort() for sorting millions of =
mathematical expressions, for example, doing number analysis, this is =
unacceptable.
>>>=20
>>> I think "C.A.R. Hoare's quicksort" technique is designed for single =
CPU systemsf only. Even if the best-case average is "N*log2(N)", that =
amount of processing cannot be split by multiple CPUs. The algorithm is =
serial as such.
>>>=20
>>> The bsort() algorithm is much more NCPU friendly, because it can =
split the work into fixed size smaller and independent work loads. =
Otherwise the work load doubles every time you merge two sorted lists. =
And when the number of lists to merge is fewer than the number of CPUs =
available, your system runs out of guts basically.
>> I highly doubt you want a libc sort routine to start spawning =
threads.
>=20
> Right. I'm thinking more like Grand Dispatch Central, though not that =
common in FreeBSD, we still have a wiki page:
>=20
> https://wiki.freebsd.org/GrandCentralDispatch
>=20
> bsort() itself doesn't have to create threads to be SMP'ed, but can =
provide the basic functions for SMP sorting.
>=20
> When sorting records, the clever way to do it, is to make a list of =
pointers to those records, and sort based on that, swapping pointers =
first, then records in the end.
>=20
> Technically you could have 4 CPUs go in a tight spin on different =
parts of the bsort() algorithm, and by use of existing atomic compare =
and set pointer operations, you could make a fully deterministic sorting =
algorithm, even if the different CPUs clobber on the same memory =
locations.

That sounds like a great way to get a big game of cache line ping pong.

> This is something I want to explore later.
>=20
> Neither mergesort() nor qsort() can do that.

You can still parallelise them, there=E2=80=99s just a serial step to =
merge or
partition respectively.

>> That also seems extremely contradictory to your claim about wanting =
one
>> that doesn=E2=80=99t allocate memory, but creating threads is going =
to do much
>> more than just that, and would scream POLA violation to me.
>=20
> See my explanation above. TAILQ_INSERT_TAIL() can also be used in =
multithreading environments, without having to link to pthreads.

Only under a lock. At which point multithreading isn't relevant.

Please revert this all. FreeBSD=E2=80=99s libc isn=E2=80=99t a place for =
your pet
projects. New public interfaces in something as core as libc should
have good justification, and I still do not see that here.

>>>> I also note that the swap code is pointlessly slow for size > 256 =
and
>>>> should almost certainly use aliasing with matching words like =
memcpy
>>>> implementations do.  Doing so would make it easier to port this =
code to
>>>> CHERI where that is required.
>>>=20
>>> I totally agree about the swap code being pointless. And I tried to =
look where is memswap(), but it was not there. This kind of swapping is =
done many places, and we could benefit from having a compiler supported =
memswap() function. What do you think?
>> We agree: https://github.com/CTSRD-CHERI/cheribsd/issues/207. But the
>> world doesn=E2=80=99t have that in C.
>=20
> Yes, that's a good idea! I'll try to follow that lead.
>=20
>> We don=E2=80=99t need new non-standard sorting routines in libc.
>=20
> We don't need English tea either ;-)

I don=E2=80=99t know, nor do I care, what tea has to do with sorting =
routines.

Jess