From nobody Sun Nov 30 01:45:45 2025
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-branches@FreeBSD.org
Cc: Strahinja Stanišić
From: Robert Clausecker
Subject: git: 8a02704131b8 - stable/15 - libc: scalar strrchr() in RISC-V assembly
List-Id: Commit messages for all branches of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all
Sender: owner-dev-commits-src-all@FreeBSD.org
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Git-Committer: fuz
X-Git-Repository: src
X-Git-Refname: refs/heads/stable/15
X-Git-Reftype: branch
X-Git-Commit: 8a02704131b84826f4a327097361199d9762a471
Auto-Submitted: auto-generated
Date: Sun, 30 Nov 2025 01:45:45 +0000
Message-Id:
 <692ba1c9.29cae.10676243@gitrepo.freebsd.org>

The branch stable/15 has been updated by fuz:

URL: https://cgit.FreeBSD.org/src/commit/?id=8a02704131b84826f4a327097361199d9762a471

commit 8a02704131b84826f4a327097361199d9762a471
Author:     Strahinja Stanišić
AuthorDate: 2024-10-24 16:18:07 +0000
Commit:     Robert Clausecker
CommitDate: 2025-11-30 00:43:05 +0000

    libc: scalar strrchr() in RISC-V assembly

    Implements strrchr in RISC-V assembly, leading to the following
    improvements (performance measured on SiFive HF105-001)

    os: FreeBSD
    arch: riscv
            │ strrchr_baseline │           strrchr_scalar            │
            │      sec/op      │   sec/op     vs base                │
    Short          837.2µ ± 1%   574.6µ ± 1%  -31.37% (p=0.000 n=20+21)
    Mid            639.7µ ± 0%   269.7µ ± 0%  -57.84% (p=0.000 n=20+21)
    Long           589.1µ ± 0%   176.7µ ± 0%  -70.01% (p=0.000 n=20+21)
    geomean        680.8µ        301.4µ       -55.73%

            │ strrchr_baseline │           strrchr_scalar            │
            │      MiB/s       │   MiB/s      vs base                │
    Short          149.3 ± 1%    217.6 ± 1%   +45.71% (p=0.000 n=20+21)
    Mid            195.4 ± 0%    463.6 ± 0%  +137.22% (p=0.000 n=20+21)
    Long           212.2 ± 0%    707.4 ± 0%  +233.40% (p=0.000 n=20+21)
    geomean        183.6         414.7       +125.88%

    MFC after:      1 month
    MFC to:         stable/15
    Approved by:    mhorne, markj (mentor)
    Sponsored by:   Google LLC (GSoC 2024)
    Differential Revision:  https://reviews.freebsd.org/D47275

    (cherry picked from commit df21a004be237a1dccd03c7b47254625eea62fa9)
---
 lib/libc/riscv/string/Makefile.inc |   2 +
 lib/libc/riscv/string/strrchr.S    | 124 +++++++++++++++++++++++++++++++++++++
 2 files changed, 126 insertions(+)

diff --git a/lib/libc/riscv/string/Makefile.inc b/lib/libc/riscv/string/Makefile.inc
new file mode 100644
index 000000000000..a9cf8bf52481
--- /dev/null
+++ b/lib/libc/riscv/string/Makefile.inc
@@ -0,0 +1,2 @@
+MDSRCS+= \
+	strrchr.S
diff --git a/lib/libc/riscv/string/strrchr.S b/lib/libc/riscv/string/strrchr.S
new file mode 100644
index 000000000000..51f34ca21fac
--- /dev/null
+++ b/lib/libc/riscv/string/strrchr.S
@@ -0,0 +1,124 @@
+/*-
+ * SPDX-License-Identifier: BSD-2-Clause
+ *
+ * Copyright (c) 2024 Strahinja Stanisic
+ */
+
+#include
+
+/*
+ * a0 - const char *s
+ * a1 - int c
+ */
+ENTRY(strrchr)
+	/*
+	 * a0 - const char *ptr_align
+	 * a1 - temporary
+	 * a2 - temporary
+	 * a3 - temporary
+	 * a4 - temporary
+	 * a5 - const char[8] cccccccc
+	 * a6 - const uint64_t *save_align
+	 * a7 - const uint64_t save_iter
+	 * t0 - const uintr64_t REP8_0X01
+	 * t1 - const uintr64_t REP8_0X80
+	 */
+
+	/*
+	 * save_align = 0
+	 * save_iter = 0xFFFFFFFFFFFFFF00
+	 * REP8_0X01 = 0x0101010101010101
+	 * cccccccc = (char)c * REP8_0X01
+	 * REP8_0X80 = (REP8_0X80 << 7) << ((str % 8) * 8)
+	 * ptr_align = str - str % 8
+	 */
+	li t0, 0x01010101
+	li a6, 0
+	slli a2, a0, 3
+	slli t1, t0, 32
+	li a7, 0xFFFFFFFFFFFFFF00
+	or t0, t0, t1
+	andi a1, a1, 0xFF
+	slli t1, t0, 7
+	andi a0, a0, ~0b111
+	mul a5, a1, t0
+	sll t1, t1, a2
+
+.Lloop:	/* do { */
+	ld a1, 0(a0)	/* a1 -> data = *ptr_align */
+	not a3, a1	/* a3 -> nhz = ~data */
+	xor a2, a1, a5	/* a2 -> iter = data ^ cccccccc */
+	sub a1, a1, t0	/* a1 -> hz = data - REP8_0X01 */
+	not a4, a2	/* a4 -> nhc = ~iter */
+	and a1, a1, a3	/* hz = hz & nhz */
+	sub a3, a2, t0	/* a3 -> hc = iter - REP8_0X01 */
+	and a1, a1, t1	/* hz = hz & REP8_0X80 */
+	and a3, a3, a4	/* hc = hc & nhc */
+	addi a4, a1, -1	/* a4 -> mask_end = hz - 1 */
+	and a3, a3, t1	/* hc = hc & REP8_0X80 */
+	xor a4, a4, a1	/* mask_end = mask_end ^ hz */
+	addi a0, a0, 8	/* ptr_align = ptr_align + 8 */
+	and a3, a3, a4	/* hc = hc & mask_end */
+	slli t1, t0, 7	/* REP8_0X80 = REP8_0X01 << 7 */
+	not a4, a4	/* mask_end = ~mask_end */
+
+	beqz a3, .Lskip_save	/* if(!hc) goto skip_save */
+	or a2, a2, a4	/* iter = iter | mask_end */
+	addi a6, a0, -8	/* save_align = ptr_align - 8 */
+	mv a7, a2	/* save_iter = iter */
+
+.Lskip_save:
+	beqz a1, .Lloop	/* } while(!hz) */
+
+.Lfind_char:
+	/*
+	 * a1 -> iter = save_iter
+	 * a2 -> mask_iter = 0xFF00000000000000
+	 * a3 -> match_off = 7
+	 */
+	li a2, 0xFF
+	mv a1, a7
+	slli a2, a2, 56
+	li a3, 7
+
+	and a0, a1, a2
+	srli a2, a2, 8
+	beqz a0, .Lret
+
+	addi a3, a3, -1
+	and a0, a1, a2
+	srli a2, a2, 8
+	beqz a0, .Lret
+
+	addi a3, a3, -1
+	and a0, a1, a2
+	srli a2, a2, 8
+	beqz a0, .Lret
+
+	addi a3, a3, -1
+	and a0, a1, a2
+	srli a2, a2, 8
+	beqz a0, .Lret
+
+	addi a3, a3, -1
+	and a0, a1, a2
+	srli a2, a2, 8
+	beqz a0, .Lret
+
+	addi a3, a3, -1
+	and a0, a1, a2
+	srli a2, a2, 8
+	beqz a0, .Lret
+
+	addi a3, a3, -1
+	and a0, a1, a2
+	srli a2, a2, 8
+	beqz a0, .Lret
+
+	addi a3, a3, -1
+
+.Lret:
+	/* return save_align + match_offset */
+	add a0, a6, a3
+	ret
+END(strrchr)
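
For readers who do not read RISC-V assembly, the word-at-a-time idea
behind the loop above can be sketched in C. This is an illustration
only, not the committed code. It makes several assumptions that the
assembly does not: the name strrchr_words is hypothetical, the string
must start on an 8-byte boundary and live in a buffer whose size is a
multiple of 8 (the assembly instead aligns the pointer down and masks
off the leading bytes), the word is assembled byte by byte so the
sketch does not depend on host endianness, and an exact zero-byte test
replaces the subtract-based one used in the assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REP8_0X01 UINT64_C(0x0101010101010101)
#define REP8_0X7F UINT64_C(0x7F7F7F7F7F7F7F7F)

/* Load 8 bytes so that byte i occupies bits 8*i .. 8*i+7. */
static uint64_t load_le64(const char *p)
{
	uint64_t w = 0;

	for (int i = 0; i < 8; i++)
		w |= (uint64_t)(unsigned char)p[i] << (8 * i);
	return w;
}

/* 0x80 in every byte of x that is zero, 0x00 in every other byte. */
static uint64_t zero_bytes(uint64_t x)
{
	return ~(((x & REP8_0X7F) + REP8_0X7F) | x | REP8_0X7F);
}

/*
 * Hypothetical helper, not a libc function.
 * Assumes s is 8-byte aligned and whole 8-byte words are readable up to
 * and including the word that holds the terminating NUL.
 */
const char *strrchr_words(const char *s, int c)
{
	const uint64_t cccccccc = (uint64_t)(unsigned char)c * REP8_0X01;
	const char *last = NULL;

	assert(((uintptr_t)s & 7) == 0);

	for (;; s += 8) {
		uint64_t data = load_le64(s);
		uint64_t hz = zero_bytes(data);			/* NUL bytes */
		uint64_t hc = zero_bytes(data ^ cccccccc);	/* bytes == c */
		/* All bits up to and including the first NUL byte's flag. */
		uint64_t mask_end = (hz - 1) ^ hz;

		hc &= mask_end;		/* ignore matches past the NUL */

		/* Remember the highest matching byte in this word, if any. */
		for (int i = 7; i >= 0; i--) {
			if (hc & (UINT64_C(0x80) << (8 * i))) {
				last = s + i;
				break;
			}
		}
		if (hz != 0)
			return last;
	}
}
```

As in the assembly, a match is only recorded when it lies at or before
the first NUL in the word, and the most recent match is carried across
iterations until the terminator is seen. Searching for '\0' returns a
pointer to the terminator because mask_end includes the NUL byte itself.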