Date: Thu, 28 Dec 2023 17:20:18 GMT
Message-Id: <202312281720.3BSHKI4L082497@gitrepo.freebsd.org>
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-branches@FreeBSD.org
From: Robert Clausecker
Subject: git: 9a6a587e672b - stable/14 - lib/libc/amd64/string: add timingsafe_memcmp() assembly implementation
List-Id: Commit messages for all branches of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all
Sender: owner-dev-commits-src-all@freebsd.org
X-BeenThere: dev-commits-src-all@freebsd.org
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Git-Committer: fuz
X-Git-Repository: src
X-Git-Refname: refs/heads/stable/14
X-Git-Reftype: branch
X-Git-Commit: 9a6a587e672baaed3470f6cf4a27a0d1166ca372
Auto-Submitted: auto-generated

The branch stable/14 has been updated by fuz:

URL: https://cgit.FreeBSD.org/src/commit/?id=9a6a587e672baaed3470f6cf4a27a0d1166ca372

commit 9a6a587e672baaed3470f6cf4a27a0d1166ca372
Author:     Robert Clausecker
AuthorDate: 2023-10-15 19:25:53 +0000
Commit:     Robert Clausecker
CommitDate: 2023-12-28 17:02:41 +0000

    lib/libc/amd64/string: add timingsafe_memcmp() assembly implementation

    Conceptually very similar to timingsafe_bcmp(), but with comparison
    logic inspired by Elijah Stone's fancy memcmp. A baseline (SSE)
    implementation was omitted this time as I was not able to get it to
    perform adequately. Best I got was 8% over the scalar version for long
    inputs, but slower for short inputs.

    Sponsored by:   The FreeBSD Foundation
    Approved by:    security (cperciva)
    Inspired by:    https://github.com/moon-chilled/fancy-memcmp
    Differential Revision:  https://reviews.freebsd.org/D41696

    (cherry picked from commit 5048c1b85506c5e0f441ee7dd98dd8d96d0a4a24)
---
 lib/libc/amd64/string/Makefile.inc        |   4 +-
 lib/libc/amd64/string/timingsafe_memcmp.S | 145 ++++++++++++++++++++++++++++++
 2 files changed, 147 insertions(+), 2 deletions(-)

diff --git a/lib/libc/amd64/string/Makefile.inc b/lib/libc/amd64/string/Makefile.inc
index fc420de0450e..09bf7c8f251e 100644
--- a/lib/libc/amd64/string/Makefile.inc
+++ b/lib/libc/amd64/string/Makefile.inc
@@ -1,4 +1,3 @@
-
 MDSRCS+= \
 	amd64_archlevel.c \
 	bcmp.S \
@@ -16,4 +15,5 @@ MDSRCS+= \
 	strlen.S \
 	strnlen.c \
 	strspn.S \
-	timingsafe_bcmp.S
+	timingsafe_bcmp.S \
+	timingsafe_memcmp.S
diff --git a/lib/libc/amd64/string/timingsafe_memcmp.S b/lib/libc/amd64/string/timingsafe_memcmp.S
new file mode 100644
index 000000000000..3f1eccdbd640
--- /dev/null
+++ b/lib/libc/amd64/string/timingsafe_memcmp.S
@@ -0,0 +1,145 @@
+/*-
+ * Copyright (c) 2023 The FreeBSD Foundation
+ *
+ * This software was developed by Robert Clausecker
+ * under sponsorship from the FreeBSD Foundation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ''AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE
+ */
+
+#include
+
+#define ALIGN_TEXT	.p2align 4,0x90	/* 16-byte alignment, nop filled */
+
+/* int timingsafe_memcmp(const void *rdi, const void *rsi, size_t rdx) */
+ENTRY(timingsafe_memcmp)
+	cmp	$16, %rdx		# at least 17 bytes to process?
+	ja	.Lgt16
+
+	cmp	$8, %edx		# at least 9 bytes to process?
+	ja	.L0916
+
+	cmp	$4, %edx		# at least 5 bytes to process?
+	ja	.L0508
+
+	cmp	$2, %edx		# at least 3 bytes to process?
+	ja	.L0304
+
+	test	%edx, %edx		# buffer empty?
+	jnz	.L0102
+
+	xor	%eax, %eax		# empty buffer always matches
+	ret
+
+.L0102:	movzbl	-1(%rdi, %rdx, 1), %eax	# load 1--2 bytes from first buffer
+	movzbl	-1(%rsi, %rdx, 1), %ecx
+	mov	(%rdi), %ah		# in big endian
+	mov	(%rsi), %ch
+	sub	%ecx, %eax
+	ret
+
+.L0304:	movzwl	-2(%rdi, %rdx, 1), %ecx
+	movzwl	-2(%rsi, %rdx, 1), %edx
+	movzwl	(%rdi), %eax
+	movzwl	(%rsi), %esi
+	bswap	%ecx			# convert to big endian
+	bswap	%edx			# dito for edx, (e)ax, and (e)si
+	rol	$8, %ax			# ROLW is used here so the upper two
+	rol	$8, %si			# bytes stay clear, allowing us to
+	sub	%edx, %ecx		# save a SBB compared to .L0508
+	sbb	%esi, %eax
+	or	%eax, %ecx		# nonzero if not equal
+	setnz	%al
+	ret
+
+.L0508:	mov	-4(%rdi, %rdx, 1), %ecx
+	mov	-4(%rsi, %rdx, 1), %edx
+	mov	(%rdi), %edi
+	mov	(%rsi), %esi
+	bswap	%ecx			# compare in big endian
+	bswap	%edx
+	bswap	%edi
+	bswap	%esi
+	sub	%edx, %ecx
+	sbb	%esi, %edi
+	sbb	%eax, %eax		# -1 if less, 0 if greater or equal
+	or	%edi, %ecx		# nonzero if not equal
+	setnz	%al			# negative if <, 0 if =, 1 if >
+	ret
+
+.L0916:	mov	-8(%rdi, %rdx, 1), %rcx
+	mov	-8(%rsi, %rdx, 1), %rdx
+	mov	(%rdi), %rdi
+	mov	(%rsi), %rsi
+	bswap	%rcx			# compare in big endian
+	bswap	%rdx
+	bswap	%rdi
+	bswap	%rsi
+	sub	%rdx, %rcx
+	sbb	%rsi, %rdi
+	sbb	%eax, %eax		# -1 if less, 0 if greater or equal
+	or	%rdi, %rcx		# nonzero if not equal
+	setnz	%al			# negative if <, 0 if =, 1 if >
+	ret
+
+	/* compare 17+ bytes */
+.Lgt16:	mov	(%rdi), %r8		# process first 16 bytes
+	mov	(%rsi), %r9
+	mov	$32, %ecx
+	cmp	%r8, %r9		# mismatch in head?
+	cmove	8(%rdi), %r8		# if not, try second pair
+	cmove	8(%rsi), %r9
+	cmp	%rdx, %rcx
+	jae	.Ltail
+
+	/* main loop processing 16 bytes per iteration */
+	ALIGN_TEXT
+0:	mov	-16(%rdi, %rcx, 1), %r10
+	mov	-16(%rsi, %rcx, 1), %r11
+	cmp	%r10, %r11		# mismatch in first pair?
+	cmove	-8(%rdi, %rcx, 1), %r10	# if not, try second pair
+	cmove	-8(%rsi, %rcx, 1), %r11
+	cmp	%r8, %r9		# was there a mismatch previously?
+	cmove	%r10, %r8		# apply new pair if there was not
+	cmove	%r11, %r9
+	add	$16, %rcx
+	cmp	%rdx, %rcx
+	jb	0b
+
+.Ltail:	mov	-8(%rdi, %rdx, 1), %r10
+	mov	-8(%rsi, %rdx, 1), %r11
+	cmp	%r8, %r9
+	cmove	-16(%rdi, %rdx, 1), %r8
+	cmove	-16(%rsi, %rdx, 1), %r9
+	bswap	%r10			# compare in big endian
+	bswap	%r11
+	bswap	%r8
+	bswap	%r9
+	sub	%r11, %r10
+	sbb	%r9, %r8
+	sbb	%eax, %eax		# -1 if less, 0 if greater or equal
+	or	%r10, %r8		# nonzero if not equal
+	setnz	%al			# negative if <, 0 if =, 1 if >
+	ret
+END(timingsafe_memcmp)
+
+	.section .note.GNU-stack,"",%progbits