Date:      Tue, 26 Sep 2017 23:12:33 +0000 (UTC)
From:      Conrad Meyer <cem@FreeBSD.org>
To:        src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   svn commit: r324037 - in head: share/man/man4 sys/conf sys/crypto/aesni sys/modules/aesni tests/sys/opencrypto
Message-ID:  <201709262312.v8QNCXvB005324@repo.freebsd.org>

Author: cem
Date: Tue Sep 26 23:12:32 2017
New Revision: 324037
URL: https://svnweb.freebsd.org/changeset/base/324037

Log:
  aesni(4): Add support for x86 SHA intrinsics
  
  Some x86 class CPUs have accelerated intrinsics for SHA1 and SHA256.
  Provide this functionality on CPUs that support it.
  
  This implements CRYPTO_SHA1, CRYPTO_SHA1_HMAC, and CRYPTO_SHA2_256_HMAC.
  
  Correctness: The cryptotest.py suite in tests/sys/opencrypto has been
  enhanced to verify SHA1 and SHA256 HMAC using standard NIST test vectors.
  The test passes on this driver.  Additionally, jhb's cryptocheck tool has
  been used to compare various random inputs against OpenSSL.  This test also
  passes.
  
  Rough performance averages on AMD Ryzen 1950X (4kB buffer):
  aesni:      SHA1: ~8300 Mb/s    SHA256: ~8000 Mb/s
  cryptosoft: SHA1: ~1800 Mb/s    SHA256: ~1800 Mb/s
  
  So ~4.4-4.6x speedup depending on algorithm choice.  This is consistent with
  the results the Linux folks saw for 4kB buffers.
  
  The driver borrows SHA update code from sys/crypto sha1 and sha256.  The
  intrinsic step function comes from Intel under a 3-clause BSDL.[0]  The
  intel_sha_extensions_sha<foo>_intrinsic.c files were renamed and lightly
  modified (added const, resolved a warning or two; included the sha_sse
  header to declare the functions).
  
  [0]: https://software.intel.com/en-us/articles/intel-sha-extensions-implementations
  
  Reviewed by:	jhb
  Sponsored by:	Dell EMC Isilon
  Differential Revision:	https://reviews.freebsd.org/D12452

Added:
  head/sys/crypto/aesni/intel_sha1.c   (contents, props changed)
  head/sys/crypto/aesni/intel_sha256.c   (contents, props changed)
  head/sys/crypto/aesni/sha_sse.h   (contents, props changed)
Modified:
  head/share/man/man4/aesni.4
  head/sys/conf/files.amd64
  head/sys/conf/files.i386
  head/sys/crypto/aesni/aesni.c
  head/sys/crypto/aesni/aesni.h
  head/sys/modules/aesni/Makefile
  head/tests/sys/opencrypto/cryptotest.py

Modified: head/share/man/man4/aesni.4
==============================================================================
--- head/share/man/man4/aesni.4	Tue Sep 26 22:32:08 2017	(r324036)
+++ head/share/man/man4/aesni.4	Tue Sep 26 23:12:32 2017	(r324037)
@@ -24,12 +24,12 @@
 .\"
 .\" $FreeBSD$
 .\"
-.Dd December 14, 2015
+.Dd September 26, 2017
 .Dt AESNI 4
 .Os
 .Sh NAME
 .Nm aesni
-.Nd "driver for the AES accelerator on Intel CPUs"
+.Nd "driver for the AES and SHA accelerator on x86 CPUs"
 .Sh SYNOPSIS
 To compile this driver into the kernel,
 place the following lines in your
@@ -47,8 +47,8 @@ module at boot time, place the following line in
 aesni_load="YES"
 .Ed
 .Sh DESCRIPTION
-Starting with some models of Core i5/i7, Intel processors implement
-a new set of instructions called AESNI.
+Starting with Intel Westmere and AMD Bulldozer, some x86 processors implement a
+new set of instructions called AESNI.
 The set of six instructions accelerates the calculation of the key
 schedule for key lengths of 128, 192, and 256 of the Advanced
 Encryption Standard (AES) symmetric cipher, and provides a hardware
@@ -56,13 +56,24 @@ implementation of the regular and the last encryption 
 rounds.
 .Pp
 The processor capability is reported as AESNI in the Features2 line at boot.
+.Pp
+Starting with the Intel Goldmont and AMD Ryzen microarchitectures, some x86
+processors implement a new set of SHA instructions.
+The set of seven instructions accelerates the calculation of SHA1 and SHA256
+hashes.
+.Pp
+The processor capability is reported as SHA in the Structured Extended Features
+line at boot.
+.Pp
 The
 .Nm
-driver does not attach on systems that lack the required CPU capability.
+driver does not attach on systems that lack both CPU capabilities.
+On systems that support only one of AESNI or SHA extensions, the driver will
+attach and support that one function.
 .Pp
 The
 .Nm
-driver registers itself to accelerate AES operations for
+driver registers itself to accelerate AES and SHA operations for
 .Xr crypto 4 .
 Besides speed, the advantage of using the
 .Nm
@@ -83,13 +94,18 @@ The
 .Nm
 driver first appeared in
 .Fx 9.0 .
+SHA support was added in
+.Fx 12.0 .
 .Sh AUTHORS
 .An -nosplit
 The
 .Nm
 driver was written by
-.An Konstantin Belousov Aq Mt kib@FreeBSD.org .
+.An Konstantin Belousov Aq Mt kib@FreeBSD.org
+and
+.An Conrad Meyer Aq Mt cem@FreeBSD.org .
 The key schedule calculation code was adopted from the sample provided
 by Intel and used in the analogous
 .Ox
 driver.
+The hash step intrinsics implementations were supplied by Intel.

Modified: head/sys/conf/files.amd64
==============================================================================
--- head/sys/conf/files.amd64	Tue Sep 26 22:32:08 2017	(r324036)
+++ head/sys/conf/files.amd64	Tue Sep 26 23:12:32 2017	(r324037)
@@ -182,6 +182,16 @@ aesni_wrap.o			optional aesni				\
 crypto/blowfish/bf_enc.c	optional	crypto | ipsec | ipsec_support
 crypto/des/des_enc.c		optional	crypto | ipsec | \
 	ipsec_support | netsmb
+intel_sha1.o			optional	aesni			\
+	dependency	"$S/crypto/aesni/intel_sha1.c"			\
+	compile-with	"${CC} -c ${CFLAGS:C/^-O2$/-O3/:N-nostdinc} ${WERROR} ${PROF} -mmmx -msse -msse4 -msha ${.IMPSRC}" \
+	no-implicit-rule						\
+	clean		"intel_sha1.o"
+intel_sha256.o			optional	aesni			\
+	dependency	"$S/crypto/aesni/intel_sha256.c"		\
+	compile-with	"${CC} -c ${CFLAGS:C/^-O2$/-O3/:N-nostdinc} ${WERROR} ${PROF} -mmmx -msse -msse4 -msha ${.IMPSRC}" \
+	no-implicit-rule						\
+	clean		"intel_sha256.o"
 crypto/via/padlock.c		optional	padlock
 crypto/via/padlock_cipher.c	optional	padlock
 crypto/via/padlock_hash.c	optional	padlock

Modified: head/sys/conf/files.i386
==============================================================================
--- head/sys/conf/files.i386	Tue Sep 26 22:32:08 2017	(r324036)
+++ head/sys/conf/files.i386	Tue Sep 26 23:12:32 2017	(r324037)
@@ -132,6 +132,16 @@ aesni_wrap.o			optional aesni				\
 	no-implicit-rule						\
 	clean		"aesni_wrap.o"
 crypto/des/arch/i386/des_enc.S	optional crypto | ipsec | ipsec_support | netsmb
+intel_sha1.o			optional	aesni			\
+	dependency	"$S/crypto/aesni/intel_sha1.c"			\
+	compile-with	"${CC} -c ${CFLAGS:C/^-O2$/-O3/:N-nostdinc} ${WERROR} ${PROF} -mmmx -msse -msse4 -msha ${.IMPSRC}" \
+	no-implicit-rule						\
+	clean		"intel_sha1.o"
+intel_sha256.o			optional	aesni			\
+	dependency	"$S/crypto/aesni/intel_sha256.c"		\
+	compile-with	"${CC} -c ${CFLAGS:C/^-O2$/-O3/:N-nostdinc} ${WERROR} ${PROF} -mmmx -msse -msse4 -msha ${.IMPSRC}" \
+	no-implicit-rule						\
+	clean		"intel_sha256.o"
 crypto/via/padlock.c		optional padlock
 crypto/via/padlock_cipher.c	optional padlock
 crypto/via/padlock_hash.c	optional padlock

Modified: head/sys/crypto/aesni/aesni.c
==============================================================================
--- head/sys/crypto/aesni/aesni.c	Tue Sep 26 22:32:08 2017	(r324036)
+++ head/sys/crypto/aesni/aesni.c	Tue Sep 26 23:12:32 2017	(r324037)
@@ -2,6 +2,7 @@
  * Copyright (c) 2005-2008 Pawel Jakub Dawidek <pjd@FreeBSD.org>
  * Copyright (c) 2010 Konstantin Belousov <kib@FreeBSD.org>
  * Copyright (c) 2014 The FreeBSD Foundation
+ * Copyright (c) 2017 Conrad Meyer <cem@FreeBSD.org>
  * All rights reserved.
  *
  * Portions of this software were developed by John-Mark Gurney
@@ -46,10 +47,24 @@ __FBSDID("$FreeBSD$");
 #include <sys/uio.h>
 #include <sys/mbuf.h>
 #include <sys/smp.h>
+
 #include <crypto/aesni/aesni.h>
-#include <cryptodev_if.h>
+#include <crypto/aesni/sha_sse.h>
+#include <crypto/sha1.h>
+#include <crypto/sha2/sha256.h>
+
+#include <opencrypto/cryptodev.h>
 #include <opencrypto/gmac.h>
+#include <cryptodev_if.h>
 
+#include <machine/md_var.h>
+#include <machine/specialreg.h>
+#if defined(__i386__)
+#include <machine/npx.h>
+#elif defined(__amd64__)
+#include <machine/fpu.h>
+#endif
+
 static struct mtx_padalign *ctx_mtx;
 static struct fpu_kern_ctx **ctx_fpu;
 
@@ -57,6 +72,8 @@ struct aesni_softc {
 	int	dieing;
 	int32_t cid;
 	uint32_t sid;
+	bool	has_aes;
+	bool	has_sha;
 	TAILQ_HEAD(aesni_sessions_head, aesni_session) sessions;
 	struct rwlock lock;
 };
@@ -79,9 +96,13 @@ static int aesni_freesession(device_t, uint64_t tid);
 static void aesni_freesession_locked(struct aesni_softc *sc,
     struct aesni_session *ses);
 static int aesni_cipher_setup(struct aesni_session *ses,
-    struct cryptoini *encini);
+    struct cryptoini *encini, struct cryptoini *authini);
 static int aesni_cipher_process(struct aesni_session *ses,
     struct cryptodesc *enccrd, struct cryptodesc *authcrd, struct cryptop *crp);
+static int aesni_cipher_crypt(struct aesni_session *ses,
+    struct cryptodesc *enccrd, struct cryptodesc *authcrd, struct cryptop *crp);
+static int aesni_cipher_mac(struct aesni_session *ses, struct cryptodesc *crd,
+    struct cryptop *crp);
 
 MALLOC_DEFINE(M_AESNI, "aesni_data", "AESNI Data");
 
@@ -95,21 +116,33 @@ aesni_identify(driver_t *drv, device_t parent)
 		panic("aesni: could not attach");
 }
 
+static void
+detect_cpu_features(bool *has_aes, bool *has_sha)
+{
+
+	*has_aes = ((cpu_feature2 & CPUID2_AESNI) != 0 &&
+	    (cpu_feature2 & CPUID2_SSE41) != 0);
+	*has_sha = ((cpu_stdext_feature & CPUID_STDEXT_SHA) != 0 &&
+	    (cpu_feature2 & CPUID2_SSSE3) != 0);
+}
+
 static int
 aesni_probe(device_t dev)
 {
+	bool has_aes, has_sha;
 
-	if ((cpu_feature2 & CPUID2_AESNI) == 0) {
-		device_printf(dev, "No AESNI support.\n");
+	detect_cpu_features(&has_aes, &has_sha);
+	if (!has_aes && !has_sha) {
+		device_printf(dev, "No AES or SHA support.\n");
 		return (EINVAL);
-	}
+	} else if (has_aes && has_sha)
+		device_set_desc(dev,
+		    "AES-CBC,AES-XTS,AES-GCM,AES-ICM,SHA1,SHA256");
+	else if (has_aes)
+		device_set_desc(dev, "AES-CBC,AES-XTS,AES-GCM,AES-ICM");
+	else
+		device_set_desc(dev, "SHA1,SHA256");
 
-	if ((cpu_feature2 & CPUID2_SSE41) == 0) {
-		device_printf(dev, "No SSE4.1 support.\n");
-		return (EINVAL);
-	}
-
-	device_set_desc_copy(dev, "AES-CBC,AES-XTS,AES-GCM,AES-ICM");
 	return (0);
 }
 
@@ -161,13 +194,22 @@ aesni_attach(device_t dev)
 	}
 
 	rw_init(&sc->lock, "aesni_lock");
-	crypto_register(sc->cid, CRYPTO_AES_CBC, 0, 0);
-	crypto_register(sc->cid, CRYPTO_AES_ICM, 0, 0);
-	crypto_register(sc->cid, CRYPTO_AES_NIST_GCM_16, 0, 0);
-	crypto_register(sc->cid, CRYPTO_AES_128_NIST_GMAC, 0, 0);
-	crypto_register(sc->cid, CRYPTO_AES_192_NIST_GMAC, 0, 0);
-	crypto_register(sc->cid, CRYPTO_AES_256_NIST_GMAC, 0, 0);
-	crypto_register(sc->cid, CRYPTO_AES_XTS, 0, 0);
+
+	detect_cpu_features(&sc->has_aes, &sc->has_sha);
+	if (sc->has_aes) {
+		crypto_register(sc->cid, CRYPTO_AES_CBC, 0, 0);
+		crypto_register(sc->cid, CRYPTO_AES_ICM, 0, 0);
+		crypto_register(sc->cid, CRYPTO_AES_NIST_GCM_16, 0, 0);
+		crypto_register(sc->cid, CRYPTO_AES_128_NIST_GMAC, 0, 0);
+		crypto_register(sc->cid, CRYPTO_AES_192_NIST_GMAC, 0, 0);
+		crypto_register(sc->cid, CRYPTO_AES_256_NIST_GMAC, 0, 0);
+		crypto_register(sc->cid, CRYPTO_AES_XTS, 0, 0);
+	}
+	if (sc->has_sha) {
+		crypto_register(sc->cid, CRYPTO_SHA1, 0, 0);
+		crypto_register(sc->cid, CRYPTO_SHA1_HMAC, 0, 0);
+		crypto_register(sc->cid, CRYPTO_SHA2_256_HMAC, 0, 0);
+	}
 	return (0);
 }
 
@@ -208,7 +250,8 @@ aesni_newsession(device_t dev, uint32_t *sidp, struct 
 {
 	struct aesni_softc *sc;
 	struct aesni_session *ses;
-	struct cryptoini *encini;
+	struct cryptoini *encini, *authini;
+	bool gcm_hash, gcm;
 	int error;
 
 	if (sidp == NULL || cri == NULL) {
@@ -221,13 +264,20 @@ aesni_newsession(device_t dev, uint32_t *sidp, struct 
 		return (EINVAL);
 
 	ses = NULL;
+	authini = NULL;
 	encini = NULL;
+	gcm = false;
+	gcm_hash = false;
 	for (; cri != NULL; cri = cri->cri_next) {
 		switch (cri->cri_alg) {
+		case CRYPTO_AES_NIST_GCM_16:
+			gcm = true;
+			/* FALLTHROUGH */
 		case CRYPTO_AES_CBC:
 		case CRYPTO_AES_ICM:
 		case CRYPTO_AES_XTS:
-		case CRYPTO_AES_NIST_GCM_16:
+			if (!sc->has_aes)
+				goto unhandled;
 			if (encini != NULL) {
 				CRYPTDEB("encini already set");
 				return (EINVAL);
@@ -241,16 +291,35 @@ aesni_newsession(device_t dev, uint32_t *sidp, struct 
 			 * nothing to do here, maybe in the future cache some
 			 * values for GHASH
 			 */
+			gcm_hash = true;
 			break;
+		case CRYPTO_SHA1:
+		case CRYPTO_SHA1_HMAC:
+		case CRYPTO_SHA2_256_HMAC:
+			if (!sc->has_sha)
+				goto unhandled;
+			if (authini != NULL) {
+				CRYPTDEB("authini already set");
+				return (EINVAL);
+			}
+			authini = cri;
+			break;
 		default:
+unhandled:
 			CRYPTDEB("unhandled algorithm");
 			return (EINVAL);
 		}
 	}
-	if (encini == NULL) {
+	if (encini == NULL && authini == NULL) {
 		CRYPTDEB("no cipher");
 		return (EINVAL);
 	}
+	/*
+	 * GMAC algorithms are only supported with simultaneous GCM.  Likewise
+	 * GCM is not supported without GMAC.
+	 */
+	if (gcm_hash != gcm)
+		return (EINVAL);
 
 	rw_wlock(&sc->lock);
 	if (sc->dieing) {
@@ -275,9 +344,13 @@ aesni_newsession(device_t dev, uint32_t *sidp, struct 
 	ses->used = 1;
 	TAILQ_INSERT_TAIL(&sc->sessions, ses, next);
 	rw_wunlock(&sc->lock);
-	ses->algo = encini->cri_alg;
 
-	error = aesni_cipher_setup(ses, encini);
+	if (encini != NULL)
+		ses->algo = encini->cri_alg;
+	if (authini != NULL)
+		ses->auth_algo = authini->cri_alg;
+
+	error = aesni_cipher_setup(ses, encini, authini);
 	if (error != 0) {
 		CRYPTDEB("setup failed");
 		rw_wlock(&sc->lock);
@@ -299,7 +372,7 @@ aesni_freesession_locked(struct aesni_softc *sc, struc
 
 	sid = ses->id;
 	TAILQ_REMOVE(&sc->sessions, ses, next);
-	*ses = (struct aesni_session){};
+	explicit_bzero(ses, sizeof(*ses));
 	ses->id = sid;
 	TAILQ_INSERT_HEAD(&sc->sessions, ses, next);
 }
@@ -351,6 +424,9 @@ aesni_process(device_t dev, struct cryptop *crp, int h
 
 	for (crd = crp->crp_desc; crd != NULL; crd = crd->crd_next) {
 		switch (crd->crd_alg) {
+		case CRYPTO_AES_NIST_GCM_16:
+			needauth = 1;
+			/* FALLTHROUGH */
 		case CRYPTO_AES_CBC:
 		case CRYPTO_AES_ICM:
 		case CRYPTO_AES_XTS:
@@ -361,24 +437,17 @@ aesni_process(device_t dev, struct cryptop *crp, int h
 			enccrd = crd;
 			break;
 
-		case CRYPTO_AES_NIST_GCM_16:
-			if (enccrd != NULL) {
-				error = EINVAL;
-				goto out;
-			}
-			enccrd = crd;
-			needauth = 1;
-			break;
-
 		case CRYPTO_AES_128_NIST_GMAC:
 		case CRYPTO_AES_192_NIST_GMAC:
 		case CRYPTO_AES_256_NIST_GMAC:
+		case CRYPTO_SHA1:
+		case CRYPTO_SHA1_HMAC:
+		case CRYPTO_SHA2_256_HMAC:
 			if (authcrd != NULL) {
 				error = EINVAL;
 				goto out;
 			}
 			authcrd = crd;
-			needauth = 1;
 			break;
 
 		default:
@@ -387,14 +456,16 @@ aesni_process(device_t dev, struct cryptop *crp, int h
 		}
 	}
 
-	if (enccrd == NULL || (needauth && authcrd == NULL)) {
+	if ((enccrd == NULL && authcrd == NULL) ||
+	    (needauth && authcrd == NULL)) {
 		error = EINVAL;
 		goto out;
 	}
 
 	/* CBC & XTS can only handle full blocks for now */
-	if ((enccrd->crd_alg == CRYPTO_AES_CBC || enccrd->crd_alg ==
-	    CRYPTO_AES_XTS) && (enccrd->crd_len % AES_BLOCK_LEN) != 0) {
+	if (enccrd != NULL && (enccrd->crd_alg == CRYPTO_AES_CBC ||
+	    enccrd->crd_alg == CRYPTO_AES_XTS) &&
+	    (enccrd->crd_len % AES_BLOCK_LEN) != 0) {
 		error = EINVAL;
 		goto out;
 	}
@@ -420,9 +491,9 @@ out:
 	return (error);
 }
 
-uint8_t *
+static uint8_t *
 aesni_cipher_alloc(struct cryptodesc *enccrd, struct cryptop *crp,
-    int *allocated)
+    bool *allocated)
 {
 	struct mbuf *m;
 	struct uio *uio;
@@ -442,18 +513,18 @@ aesni_cipher_alloc(struct cryptodesc *enccrd, struct c
 		addr = (uint8_t *)iov->iov_base;
 	} else
 		addr = (uint8_t *)crp->crp_buf;
-	*allocated = 0;
+	*allocated = false;
 	addr += enccrd->crd_skip;
 	return (addr);
 
 alloc:
 	addr = malloc(enccrd->crd_len, M_AESNI, M_NOWAIT);
 	if (addr != NULL) {
-		*allocated = 1;
+		*allocated = true;
 		crypto_copydata(crp->crp_flags, crp->crp_buf, enccrd->crd_skip,
 		    enccrd->crd_len, addr);
 	} else
-		*allocated = 0;
+		*allocated = false;
 	return (addr);
 }
 
@@ -482,13 +553,28 @@ MODULE_VERSION(aesni, 1);
 MODULE_DEPEND(aesni, crypto, 1, 1, 1);
 
 static int
-aesni_cipher_setup(struct aesni_session *ses, struct cryptoini *encini)
+aesni_cipher_setup(struct aesni_session *ses, struct cryptoini *encini,
+    struct cryptoini *authini)
 {
 	struct fpu_kern_ctx *ctx;
-	int error;
-	int kt, ctxidx;
+	int kt, ctxidx, keylen, error;
 
-	kt = is_fpu_kern_thread(0);
+	switch (ses->auth_algo) {
+	case CRYPTO_SHA1:
+	case CRYPTO_SHA1_HMAC:
+	case CRYPTO_SHA2_256_HMAC:
+		if (authini->cri_klen % 8 != 0)
+			return (EINVAL);
+		keylen = authini->cri_klen / 8;
+		if (keylen > sizeof(ses->hmac_key))
+			return (EINVAL);
+		if (ses->auth_algo == CRYPTO_SHA1 && keylen > 0)
+			return (EINVAL);
+		memcpy(ses->hmac_key, authini->cri_key, keylen);
+		ses->mlen = authini->cri_mlen;
+	}
+
+	kt = is_fpu_kern_thread(0) || (encini == NULL);
 	if (!kt) {
 		ACQUIRE_CTX(ctxidx, ctx);
 		error = fpu_kern_enter(curthread, ctx,
@@ -497,8 +583,10 @@ aesni_cipher_setup(struct aesni_session *ses, struct c
 			goto out;
 	}
 
-	error = aesni_cipher_setup_common(ses, encini->cri_key,
-	    encini->cri_klen);
+	error = 0;
+	if (encini != NULL)
+		error = aesni_cipher_setup_common(ses, encini->cri_key,
+		    encini->cri_klen);
 
 	if (!kt) {
 		fpu_kern_leave(curthread, ctx);
@@ -508,52 +596,198 @@ out:
 	return (error);
 }
 
+static int
+intel_sha1_update(void *vctx, const void *vdata, u_int datalen)
+{
+	struct sha1_ctxt *ctx = vctx;
+	const char *data = vdata;
+	size_t gaplen;
+	size_t gapstart;
+	size_t off;
+	size_t copysiz;
+	u_int blocks;
+
+	off = 0;
+	/* Do any aligned blocks without redundant copying. */
+	if (datalen >= 64 && ctx->count % 64 == 0) {
+		blocks = datalen / 64;
+		ctx->c.b64[0] += blocks * 64 * 8;
+		intel_sha1_step(ctx->h.b32, data + off, blocks);
+		off += blocks * 64;
+	}
+
+	while (off < datalen) {
+		gapstart = ctx->count % 64;
+		gaplen = 64 - gapstart;
+
+		copysiz = (gaplen < datalen - off) ? gaplen : datalen - off;
+		bcopy(&data[off], &ctx->m.b8[gapstart], copysiz);
+		ctx->count += copysiz;
+		ctx->count %= 64;
+		ctx->c.b64[0] += copysiz * 8;
+		if (ctx->count % 64 == 0)
+			intel_sha1_step(ctx->h.b32, (void *)ctx->m.b8, 1);
+		off += copysiz;
+	}
+	return (0);
+}
+
+static void
+SHA1_Finalize_fn(void *digest, void *ctx)
+{
+	sha1_result(ctx, digest);
+}
+
+static int
+intel_sha256_update(void *vctx, const void *vdata, u_int len)
+{
+	SHA256_CTX *ctx = vctx;
+	uint64_t bitlen;
+	uint32_t r;
+	u_int blocks;
+	const unsigned char *src = vdata;
+
+	/* Number of bytes left in the buffer from previous updates */
+	r = (ctx->count >> 3) & 0x3f;
+
+	/* Convert the length into a number of bits */
+	bitlen = len << 3;
+
+	/* Update number of bits */
+	ctx->count += bitlen;
+
+	/* Handle the case where we don't need to perform any transforms */
+	if (len < 64 - r) {
+		memcpy(&ctx->buf[r], src, len);
+		return (0);
+	}
+
+	/* Finish the current block */
+	memcpy(&ctx->buf[r], src, 64 - r);
+	intel_sha256_step(ctx->state, ctx->buf, 1);
+	src += 64 - r;
+	len -= 64 - r;
+
+	/* Perform complete blocks */
+	if (len >= 64) {
+		blocks = len / 64;
+		intel_sha256_step(ctx->state, src, blocks);
+		src += blocks * 64;
+		len -= blocks * 64;
+	}
+
+	/* Copy left over data into buffer */
+	memcpy(ctx->buf, src, len);
+	return (0);
+}
+
+static void
+SHA256_Finalize_fn(void *digest, void *ctx)
+{
+	SHA256_Final(digest, ctx);
+}
+
 /*
- * authcrd contains the associated date.
+ * Compute the HASH( (key ^ xorbyte) || buf )
  */
+static void
+hmac_internal(void *ctx, uint32_t *res,
+	int (*update)(void *, const void *, u_int),
+	void (*finalize)(void *, void *), uint8_t *key, uint8_t xorbyte,
+	const void *buf, size_t off, size_t buflen, int crpflags)
+{
+	size_t i;
+
+	for (i = 0; i < 64; i++)
+		key[i] ^= xorbyte;
+	update(ctx, key, 64);
+	for (i = 0; i < 64; i++)
+		key[i] ^= xorbyte;
+
+	crypto_apply(crpflags, __DECONST(void *, buf), off, buflen,
+	    __DECONST(int (*)(void *, void *, u_int), update), ctx);
+	finalize(res, ctx);
+}
+
 static int
 aesni_cipher_process(struct aesni_session *ses, struct cryptodesc *enccrd,
     struct cryptodesc *authcrd, struct cryptop *crp)
 {
 	struct fpu_kern_ctx *ctx;
-	uint8_t iv[AES_BLOCK_LEN];
-	uint8_t tag[GMAC_DIGEST_LEN];
-	uint8_t *buf, *authbuf;
-	int error, allocated, authallocated;
-	int ivlen, encflag;
-	int kt, ctxidx;
+	int error, ctxidx;
+	bool kt;
 
-	encflag = (enccrd->crd_flags & CRD_F_ENCRYPT) == CRD_F_ENCRYPT;
+	if (enccrd != NULL) {
+		if ((enccrd->crd_alg == CRYPTO_AES_ICM ||
+		    enccrd->crd_alg == CRYPTO_AES_NIST_GCM_16) &&
+		    (enccrd->crd_flags & CRD_F_IV_EXPLICIT) == 0)
+			return (EINVAL);
+	}
 
-	if ((enccrd->crd_alg == CRYPTO_AES_ICM ||
-	    enccrd->crd_alg == CRYPTO_AES_NIST_GCM_16) &&
-	    (enccrd->crd_flags & CRD_F_IV_EXPLICIT) == 0)
-		return (EINVAL);
+	error = 0;
+	kt = is_fpu_kern_thread(0);
+	if (!kt) {
+		ACQUIRE_CTX(ctxidx, ctx);
+		error = fpu_kern_enter(curthread, ctx,
+		    FPU_KERN_NORMAL | FPU_KERN_KTHR);
+		if (error != 0)
+			goto out2;
+	}
 
+	/* Do work */
+	if (enccrd != NULL && authcrd != NULL) {
+		/* Perform the first operation */
+		if (crp->crp_desc == enccrd)
+			error = aesni_cipher_crypt(ses, enccrd, authcrd, crp);
+		else
+			error = aesni_cipher_mac(ses, authcrd, crp);
+		if (error != 0)
+			goto out;
+		/* Perform the second operation */
+		if (crp->crp_desc == enccrd)
+			error = aesni_cipher_mac(ses, authcrd, crp);
+		else
+			error = aesni_cipher_crypt(ses, enccrd, authcrd, crp);
+	} else if (enccrd != NULL)
+		error = aesni_cipher_crypt(ses, enccrd, authcrd, crp);
+	else
+		error = aesni_cipher_mac(ses, authcrd, crp);
+
+	if (error != 0)
+		goto out;
+
+out:
+	if (!kt) {
+		fpu_kern_leave(curthread, ctx);
+out2:
+		RELEASE_CTX(ctxidx, ctx);
+	}
+	return (error);
+}
+
+static int
+aesni_cipher_crypt(struct aesni_session *ses, struct cryptodesc *enccrd,
+	struct cryptodesc *authcrd, struct cryptop *crp)
+{
+	uint8_t iv[AES_BLOCK_LEN], tag[GMAC_DIGEST_LEN], *buf, *authbuf;
+	int error, ivlen;
+	bool encflag, allocated, authallocated;
+
 	buf = aesni_cipher_alloc(enccrd, crp, &allocated);
 	if (buf == NULL)
 		return (ENOMEM);
 
-	error = 0;
-	authbuf = NULL;
-	authallocated = 0;
-	if (authcrd != NULL) {
+	authallocated = false;
+	if (ses->algo == CRYPTO_AES_NIST_GCM_16 && authcrd != NULL) {
 		authbuf = aesni_cipher_alloc(authcrd, crp, &authallocated);
 		if (authbuf == NULL) {
 			error = ENOMEM;
-			goto out1;
+			goto out;
 		}
 	}
 
-	kt = is_fpu_kern_thread(0);
-	if (!kt) {
-		ACQUIRE_CTX(ctxidx, ctx);
-		error = fpu_kern_enter(curthread, ctx,
-		    FPU_KERN_NORMAL|FPU_KERN_KTHR);
-		if (error != 0)
-			goto out2;
-	}
-
+	error = 0;
+	encflag = (enccrd->crd_flags & CRD_F_ENCRYPT) == CRD_F_ENCRYPT;
 	if ((enccrd->crd_flags & CRD_F_KEY_EXPLICIT) != 0) {
 		error = aesni_cipher_setup_common(ses, enccrd->crd_key,
 		    enccrd->crd_klen);
@@ -561,7 +795,6 @@ aesni_cipher_process(struct aesni_session *ses, struct
 			goto out;
 	}
 
-	/* XXX - validate that enccrd and authcrd have/use same key? */
 	switch (enccrd->crd_alg) {
 	case CRYPTO_AES_CBC:
 	case CRYPTO_AES_ICM:
@@ -593,13 +826,6 @@ aesni_cipher_process(struct aesni_session *ses, struct
 			    enccrd->crd_inject, ivlen, iv);
 	}
 
-	if (authcrd != NULL && !encflag)
-		crypto_copydata(crp->crp_flags, crp->crp_buf,
-		    authcrd->crd_inject, GMAC_DIGEST_LEN, tag);
-	else
-		bzero(tag, sizeof tag);
-
-	/* Do work */
 	switch (ses->algo) {
 	case CRYPTO_AES_CBC:
 		if (encflag)
@@ -625,11 +851,21 @@ aesni_cipher_process(struct aesni_session *ses, struct
 			    iv);
 		break;
 	case CRYPTO_AES_NIST_GCM_16:
-		if (encflag)
+		if (authcrd != NULL && !encflag)
+			crypto_copydata(crp->crp_flags, crp->crp_buf,
+			    authcrd->crd_inject, GMAC_DIGEST_LEN, tag);
+		else
+			bzero(tag, sizeof tag);
+
+		if (encflag) {
 			AES_GCM_encrypt(buf, buf, authbuf, iv, tag,
 			    enccrd->crd_len, authcrd->crd_len, ivlen,
 			    ses->enc_schedule, ses->rounds);
-		else {
+
+			if (authcrd != NULL)
+				crypto_copyback(crp->crp_flags, crp->crp_buf,
+				    authcrd->crd_inject, GMAC_DIGEST_LEN, tag);
+		} else {
 			if (!AES_GCM_decrypt(buf, buf, authbuf, iv, tag,
 			    enccrd->crd_len, authcrd->crd_len, ivlen,
 			    ses->enc_schedule, ses->rounds))
@@ -638,28 +874,78 @@ aesni_cipher_process(struct aesni_session *ses, struct
 		break;
 	}
 
-	if (allocated)
-		crypto_copyback(crp->crp_flags, crp->crp_buf, enccrd->crd_skip,
-		    enccrd->crd_len, buf);
-
-	if (!error && authcrd != NULL) {
-		crypto_copyback(crp->crp_flags, crp->crp_buf,
-		    authcrd->crd_inject, GMAC_DIGEST_LEN, tag);
-	}
-
 out:
-	if (!kt) {
-		fpu_kern_leave(curthread, ctx);
-out2:
-		RELEASE_CTX(ctxidx, ctx);
-	}
-
-out1:
 	if (allocated) {
-		bzero(buf, enccrd->crd_len);
+		explicit_bzero(buf, enccrd->crd_len);
 		free(buf, M_AESNI);
 	}
-	if (authallocated)
+	if (authallocated) {
+		explicit_bzero(authbuf, authcrd->crd_len);
 		free(authbuf, M_AESNI);
+	}
 	return (error);
+}
+
+static int
+aesni_cipher_mac(struct aesni_session *ses, struct cryptodesc *crd,
+    struct cryptop *crp)
+{
+	union {
+		struct SHA256Context sha2 __aligned(16);
+		struct sha1_ctxt sha1 __aligned(16);
+	} sctx;
+	uint32_t res[SHA2_256_HASH_LEN / sizeof(uint32_t)];
+	int hashlen;
+
+	if (crd->crd_flags != 0)
+		return (EINVAL);
+
+	switch (ses->auth_algo) {
+	case CRYPTO_SHA1_HMAC:
+		hashlen = SHA1_HASH_LEN;
+		/* Inner hash: (K ^ IPAD) || data */
+		sha1_init(&sctx.sha1);
+		hmac_internal(&sctx.sha1, res, intel_sha1_update,
+		    SHA1_Finalize_fn, ses->hmac_key, 0x36, crp->crp_buf,
+		    crd->crd_skip, crd->crd_len, crp->crp_flags);
+		/* Outer hash: (K ^ OPAD) || inner hash */
+		sha1_init(&sctx.sha1);
+		hmac_internal(&sctx.sha1, res, intel_sha1_update,
+		    SHA1_Finalize_fn, ses->hmac_key, 0x5C, res, 0, hashlen, 0);
+		break;
+	case CRYPTO_SHA1:
+		hashlen = SHA1_HASH_LEN;
+		sha1_init(&sctx.sha1);
+		crypto_apply(crp->crp_flags, crp->crp_buf, crd->crd_skip,
+		    crd->crd_len, __DECONST(int (*)(void *, void *, u_int),
+		    intel_sha1_update), &sctx.sha1);
+		sha1_result(&sctx.sha1, (void *)res);
+		break;
+	case CRYPTO_SHA2_256_HMAC:
+		hashlen = SHA2_256_HASH_LEN;
+		/* Inner hash: (K ^ IPAD) || data */
+		SHA256_Init(&sctx.sha2);
+		hmac_internal(&sctx.sha2, res, intel_sha256_update,
+		    SHA256_Finalize_fn, ses->hmac_key, 0x36, crp->crp_buf,
+		    crd->crd_skip, crd->crd_len, crp->crp_flags);
+		/* Outer hash: (K ^ OPAD) || inner hash */
+		SHA256_Init(&sctx.sha2);
+		hmac_internal(&sctx.sha2, res, intel_sha256_update,
+		    SHA256_Finalize_fn, ses->hmac_key, 0x5C, res, 0, hashlen,
+		    0);
+		break;
+	default:
+		/*
+		 * AES-GMAC authentication is verified while processing the
+		 * enccrd
+		 */
+		return (0);
+	}
+
+	if (ses->mlen != 0 && ses->mlen < hashlen)
+		hashlen = ses->mlen;
+
+	crypto_copyback(crp->crp_flags, crp->crp_buf, crd->crd_inject, hashlen,
+	    (void *)res);
+	return (0);
 }

Modified: head/sys/crypto/aesni/aesni.h
==============================================================================
--- head/sys/crypto/aesni/aesni.h	Tue Sep 26 22:32:08 2017	(r324036)
+++ head/sys/crypto/aesni/aesni.h	Tue Sep 26 23:12:32 2017	(r324037)
@@ -56,12 +56,16 @@ struct aesni_session {
 	uint8_t enc_schedule[AES_SCHED_LEN] __aligned(16);
 	uint8_t dec_schedule[AES_SCHED_LEN] __aligned(16);
 	uint8_t xts_schedule[AES_SCHED_LEN] __aligned(16);
+	/* Same as the SHA256 Blocksize. */
+	uint8_t hmac_key[SHA1_HMAC_BLOCK_LEN] __aligned(16);
 	int algo;
 	int rounds;
 	/* uint8_t *ses_ictx; */
 	/* uint8_t *ses_octx; */
 	/* int ses_mlen; */
 	int used;
+	int auth_algo;
+	int mlen;
 	uint32_t id;
 	TAILQ_ENTRY(aesni_session) next;
 };
@@ -111,7 +115,5 @@ int AES_GCM_decrypt(const unsigned char *in, unsigned 
 
 int aesni_cipher_setup_common(struct aesni_session *ses, const uint8_t *key,
     int keylen);
-uint8_t *aesni_cipher_alloc(struct cryptodesc *enccrd, struct cryptop *crp,
-    int *allocated);
 
 #endif /* _AESNI_H_ */

Added: head/sys/crypto/aesni/intel_sha1.c
==============================================================================
--- /dev/null	00:00:00 1970	(empty, because file is newly added)
+++ head/sys/crypto/aesni/intel_sha1.c	Tue Sep 26 23:12:32 2017	(r324037)
@@ -0,0 +1,261 @@
+/*******************************************************************************
+* Copyright (c) 2013, Intel Corporation 
+* 
+* All rights reserved. 
+* 
+* Redistribution and use in source and binary forms, with or without
+* modification, are permitted provided that the following conditions are
+* met: 
+* 
+* * Redistributions of source code must retain the above copyright
+*   notice, this list of conditions and the following disclaimer.  
+* 
+* * Redistributions in binary form must reproduce the above copyright
+*   notice, this list of conditions and the following disclaimer in the
+*   documentation and/or other materials provided with the
+*   distribution. 
+* 
+* * Neither the name of the Intel Corporation nor the names of its
+*   contributors may be used to endorse or promote products derived from
+*   this software without specific prior written permission. 
+* 
+* 
+* THIS SOFTWARE IS PROVIDED BY INTEL CORPORATION ""AS IS"" AND ANY
+* EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+* PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL INTEL CORPORATION OR
+* CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+* EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+* PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+* PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+* LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+* NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+* SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+********************************************************************************
+*
+* Intel SHA Extensions optimized implementation of a SHA-1 update function 
+* 
+* The function takes a pointer to the current hash values, a pointer to the 
+* input data, and a number of 64 byte blocks to process.  Once all blocks have 
+* been processed, the digest pointer is  updated with the resulting hash value.
+* The function only processes complete blocks, there is no functionality to 
+* store partial blocks.  All message padding and hash value initialization must
+* be done outside the update function.  
+* 
+* The indented lines in the loop are instructions related to rounds processing.
+* The non-indented lines are instructions related to the message schedule.
+* 
+* Author: Sean Gulley <sean.m.gulley@intel.com>
+* Date:   July 2013
+*
+********************************************************************************
+*
+* Example complier command line:
+* icc intel_sha_extensions_sha1_intrinsic.c
+* gcc -msha -msse4 intel_sha_extensions_sha1_intrinsic.c
+*
+*******************************************************************************/
+#include <sys/cdefs.h>
+__FBSDID("$FreeBSD$");
+
+#include <sys/types.h>
+#include <immintrin.h>
+
+#include <crypto/aesni/sha_sse.h>
+
+void intel_sha1_step(uint32_t *digest, const char *data, uint32_t num_blks) {
+   __m128i abcd, e0, e1;
+   __m128i abcd_save, e_save;
+   __m128i msg0, msg1, msg2, msg3;
+   __m128i shuf_mask, e_mask;
+
+#if 0
+   e_mask    = _mm_set_epi64x(0xFFFFFFFF00000000ull, 0x0000000000000000ull);
+#else
+   (void)e_mask;
+   e0        = _mm_set_epi64x(0, 0);
+#endif
+   shuf_mask = _mm_set_epi64x(0x0001020304050607ull, 0x08090a0b0c0d0e0full);
+
+   // Load initial hash values
+   abcd      = _mm_loadu_si128((__m128i*) digest);
+   e0        = _mm_insert_epi32(e0, *(digest+4), 3);
+   abcd      = _mm_shuffle_epi32(abcd, 0x1B);
+#if 0
+   e0        = _mm_and_si128(e0, e_mask);
+#endif
+
+   while (num_blks > 0) {
+      // Save hash values for addition after rounds
+      abcd_save = abcd;
+      e_save    = e0;
+
+      // Rounds 0-3
+      msg0 = _mm_loadu_si128((const __m128i*) data);
+      msg0 = _mm_shuffle_epi8(msg0, shuf_mask);
+         e0   = _mm_add_epi32(e0, msg0);
+         e1   = abcd;
+         abcd = _mm_sha1rnds4_epu32(abcd, e0, 0);
+
+      // Rounds 4-7
+      msg1 = _mm_loadu_si128((const __m128i*) (data+16));
+      msg1 = _mm_shuffle_epi8(msg1, shuf_mask);
+         e1   = _mm_sha1nexte_epu32(e1, msg1);
+         e0   = abcd;
+         abcd = _mm_sha1rnds4_epu32(abcd, e1, 0);
+      msg0 = _mm_sha1msg1_epu32(msg0, msg1);
+
+      // Rounds 8-11
+      msg2 = _mm_loadu_si128((const __m128i*) (data+32));
+      msg2 = _mm_shuffle_epi8(msg2, shuf_mask);
+         e0   = _mm_sha1nexte_epu32(e0, msg2);
+         e1   = abcd;
+         abcd = _mm_sha1rnds4_epu32(abcd, e0, 0);
+      msg1 = _mm_sha1msg1_epu32(msg1, msg2);
+      msg0 = _mm_xor_si128(msg0, msg2);
+
+      // Rounds 12-15
+      msg3 = _mm_loadu_si128((const __m128i*) (data+48));
+      msg3 = _mm_shuffle_epi8(msg3, shuf_mask);
+         e1   = _mm_sha1nexte_epu32(e1, msg3);
+         e0   = abcd;
+      msg0 = _mm_sha1msg2_epu32(msg0, msg3);
+         abcd = _mm_sha1rnds4_epu32(abcd, e1, 0);
+      msg2 = _mm_sha1msg1_epu32(msg2, msg3);

*** DIFF OUTPUT TRUNCATED AT 1000 LINES ***


