From owner-svn-src-head@freebsd.org Tue Sep 26 23:12:34 2017 Return-Path: Delivered-To: svn-src-head@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A4059E24FC7; Tue, 26 Sep 2017 23:12:34 +0000 (UTC) (envelope-from cem@FreeBSD.org) Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 66CE37F2A4; Tue, 26 Sep 2017 23:12:34 +0000 (UTC) (envelope-from cem@FreeBSD.org) Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id v8QNCXBN005331; Tue, 26 Sep 2017 23:12:33 GMT (envelope-from cem@FreeBSD.org) Received: (from cem@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id v8QNCXvB005324; Tue, 26 Sep 2017 23:12:33 GMT (envelope-from cem@FreeBSD.org) Message-Id: <201709262312.v8QNCXvB005324@repo.freebsd.org> X-Authentication-Warning: repo.freebsd.org: cem set sender to cem@FreeBSD.org using -f From: Conrad Meyer Date: Tue, 26 Sep 2017 23:12:33 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: svn commit: r324037 - in head: share/man/man4 sys/conf sys/crypto/aesni sys/modules/aesni tests/sys/opencrypto X-SVN-Group: head X-SVN-Commit-Author: cem X-SVN-Commit-Paths: in head: share/man/man4 sys/conf sys/crypto/aesni sys/modules/aesni tests/sys/opencrypto X-SVN-Commit-Revision: 324037 X-SVN-Commit-Repository: base MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 26 Sep 2017 23:12:34 -0000 Author: cem Date: Tue Sep 26 23:12:32 2017 New Revision: 324037 URL: https://svnweb.freebsd.org/changeset/base/324037 Log: aesni(4): Add support for x86 SHA intrinsics Some x86 class CPUs have accelerated intrinsics for SHA1 and SHA256. Provide this functionality on CPUs that support it. This implements CRYPTO_SHA1, CRYPTO_SHA1_HMAC, and CRYPTO_SHA2_256_HMAC. Correctness: The cryptotest.py suite in tests/sys/opencrypto has been enhanced to verify SHA1 and SHA256 HMAC using standard NIST test vectors. The test passes on this driver. Additionally, jhb's cryptocheck tool has been used to compare various random inputs against OpenSSL. This test also passes. Rough performance averages on AMD Ryzen 1950X (4kB buffer): aesni: SHA1: ~8300 Mb/s SHA256: ~8000 Mb/s cryptosoft: ~1800 Mb/s SHA256: ~1800 Mb/s So ~4.4-4.6x speedup depending on algorithm choice. This is consistent with the results the Linux folks saw for 4kB buffers. The driver borrows SHA update code from sys/crypto sha1 and sha256. The intrinsic step function comes from Intel under a 3-clause BSDL.[0] The intel_sha_extensions_sha_intrinsic.c files were renamed and lightly modified (added const, resolved a warning or two; included the sha_sse header to declare the functions). [0]: https://software.intel.com/en-us/articles/intel-sha-extensions-implementations Reviewed by: jhb Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D12452 Added: head/sys/crypto/aesni/intel_sha1.c (contents, props changed) head/sys/crypto/aesni/intel_sha256.c (contents, props changed) head/sys/crypto/aesni/sha_sse.h (contents, props changed) Modified: head/share/man/man4/aesni.4 head/sys/conf/files.amd64 head/sys/conf/files.i386 head/sys/crypto/aesni/aesni.c head/sys/crypto/aesni/aesni.h head/sys/modules/aesni/Makefile head/tests/sys/opencrypto/cryptotest.py Modified: head/share/man/man4/aesni.4 ============================================================================== --- head/share/man/man4/aesni.4 Tue Sep 26 22:32:08 2017 (r324036) +++ head/share/man/man4/aesni.4 Tue Sep 26 23:12:32 2017 (r324037) @@ -24,12 +24,12 @@ .\" .\" $FreeBSD$ .\" -.Dd December 14, 2015 +.Dd September 26, 2017 .Dt AESNI 4 .Os .Sh NAME .Nm aesni -.Nd "driver for the AES accelerator on Intel CPUs" +.Nd "driver for the AES and SHA accelerator on x86 CPUs" .Sh SYNOPSIS To compile this driver into the kernel, place the following lines in your @@ -47,8 +47,8 @@ module at boot time, place the following line in aesni_load="YES" .Ed .Sh DESCRIPTION -Starting with some models of Core i5/i7, Intel processors implement -a new set of instructions called AESNI. +Starting with Intel Westmere and AMD Bulldozer, some x86 processors implement a +new set of instructions called AESNI. The set of six instructions accelerates the calculation of the key schedule for key lengths of 128, 192, and 256 of the Advanced Encryption Standard (AES) symmetric cipher, and provides a hardware @@ -56,13 +56,24 @@ implementation of the regular and the last encryption rounds. .Pp The processor capability is reported as AESNI in the Features2 line at boot. +.Pp +Starting with the Intel Goldmont and AMD Ryzen microarchitectures, some x86 +processors implement a new set of SHA instructions. +The set of seven instructions accelerates the calculation of SHA1 and SHA256 +hashes. +.Pp +The processor capability is reported as SHA in the Structured Extended Features +line at boot. +.Pp The .Nm -driver does not attach on systems that lack the required CPU capability. +driver does not attach on systems that lack both CPU capabilities. +On systems that support only one of AESNI or SHA extensions, the driver will +attach and support that one function. .Pp The .Nm -driver registers itself to accelerate AES operations for +driver registers itself to accelerate AES and SHA operations for .Xr crypto 4 . Besides speed, the advantage of using the .Nm @@ -83,13 +94,18 @@ The .Nm driver first appeared in .Fx 9.0 . +SHA support was added in +.Fx 12.0 . .Sh AUTHORS .An -nosplit The .Nm driver was written by -.An Konstantin Belousov Aq Mt kib@FreeBSD.org . +.An Konstantin Belousov Aq Mt kib@FreeBSD.org +and +.An Conrad Meyer Aq Mt cem@FreeBSD.org . The key schedule calculation code was adopted from the sample provided by Intel and used in the analogous .Ox driver. +The hash step intrinsics implementations were supplied by Intel. Modified: head/sys/conf/files.amd64 ============================================================================== --- head/sys/conf/files.amd64 Tue Sep 26 22:32:08 2017 (r324036) +++ head/sys/conf/files.amd64 Tue Sep 26 23:12:32 2017 (r324037) @@ -182,6 +182,16 @@ aesni_wrap.o optional aesni \ crypto/blowfish/bf_enc.c optional crypto | ipsec | ipsec_support crypto/des/des_enc.c optional crypto | ipsec | \ ipsec_support | netsmb +intel_sha1.o optional aesni \ + dependency "$S/crypto/aesni/intel_sha1.c" \ + compile-with "${CC} -c ${CFLAGS:C/^-O2$/-O3/:N-nostdinc} ${WERROR} ${PROF} -mmmx -msse -msse4 -msha ${.IMPSRC}" \ + no-implicit-rule \ + clean "intel_sha1.o" +intel_sha256.o optional aesni \ + dependency "$S/crypto/aesni/intel_sha256.c" \ + compile-with "${CC} -c ${CFLAGS:C/^-O2$/-O3/:N-nostdinc} ${WERROR} ${PROF} -mmmx -msse -msse4 -msha ${.IMPSRC}" \ + no-implicit-rule \ + clean "intel_sha256.o" crypto/via/padlock.c optional padlock crypto/via/padlock_cipher.c optional padlock crypto/via/padlock_hash.c optional padlock Modified: head/sys/conf/files.i386 ============================================================================== --- head/sys/conf/files.i386 Tue Sep 26 22:32:08 2017 (r324036) +++ head/sys/conf/files.i386 Tue Sep 26 23:12:32 2017 (r324037) @@ -132,6 +132,16 @@ aesni_wrap.o optional aesni \ no-implicit-rule \ clean "aesni_wrap.o" crypto/des/arch/i386/des_enc.S optional crypto | ipsec | ipsec_support | netsmb +intel_sha1.o optional aesni \ + dependency "$S/crypto/aesni/intel_sha1.c" \ + compile-with "${CC} -c ${CFLAGS:C/^-O2$/-O3/:N-nostdinc} ${WERROR} ${PROF} -mmmx -msse -msse4 -msha ${.IMPSRC}" \ + no-implicit-rule \ + clean "intel_sha1.o" +intel_sha256.o optional aesni \ + dependency "$S/crypto/aesni/intel_sha256.c" \ + compile-with "${CC} -c ${CFLAGS:C/^-O2$/-O3/:N-nostdinc} ${WERROR} ${PROF} -mmmx -msse -msse4 -msha ${.IMPSRC}" \ + no-implicit-rule \ + clean "intel_sha256.o" crypto/via/padlock.c optional padlock crypto/via/padlock_cipher.c optional padlock crypto/via/padlock_hash.c optional padlock Modified: head/sys/crypto/aesni/aesni.c ============================================================================== --- head/sys/crypto/aesni/aesni.c Tue Sep 26 22:32:08 2017 (r324036) +++ head/sys/crypto/aesni/aesni.c Tue Sep 26 23:12:32 2017 (r324037) @@ -2,6 +2,7 @@ * Copyright (c) 2005-2008 Pawel Jakub Dawidek * Copyright (c) 2010 Konstantin Belousov * Copyright (c) 2014 The FreeBSD Foundation + * Copyright (c) 2017 Conrad Meyer * All rights reserved. * * Portions of this software were developed by John-Mark Gurney @@ -46,10 +47,24 @@ __FBSDID("$FreeBSD$"); #include #include #include + #include -#include +#include +#include +#include + +#include #include +#include +#include +#include +#if defined(__i386__) +#include +#elif defined(__amd64__) +#include +#endif + static struct mtx_padalign *ctx_mtx; static struct fpu_kern_ctx **ctx_fpu; @@ -57,6 +72,8 @@ struct aesni_softc { int dieing; int32_t cid; uint32_t sid; + bool has_aes; + bool has_sha; TAILQ_HEAD(aesni_sessions_head, aesni_session) sessions; struct rwlock lock; }; @@ -79,9 +96,13 @@ static int aesni_freesession(device_t, uint64_t tid); static void aesni_freesession_locked(struct aesni_softc *sc, struct aesni_session *ses); static int aesni_cipher_setup(struct aesni_session *ses, - struct cryptoini *encini); + struct cryptoini *encini, struct cryptoini *authini); static int aesni_cipher_process(struct aesni_session *ses, struct cryptodesc *enccrd, struct cryptodesc *authcrd, struct cryptop *crp); +static int aesni_cipher_crypt(struct aesni_session *ses, + struct cryptodesc *enccrd, struct cryptodesc *authcrd, struct cryptop *crp); +static int aesni_cipher_mac(struct aesni_session *ses, struct cryptodesc *crd, + struct cryptop *crp); MALLOC_DEFINE(M_AESNI, "aesni_data", "AESNI Data"); @@ -95,21 +116,33 @@ aesni_identify(driver_t *drv, device_t parent) panic("aesni: could not attach"); } +static void +detect_cpu_features(bool *has_aes, bool *has_sha) +{ + + *has_aes = ((cpu_feature2 & CPUID2_AESNI) != 0 && + (cpu_feature2 & CPUID2_SSE41) != 0); + *has_sha = ((cpu_stdext_feature & CPUID_STDEXT_SHA) != 0 && + (cpu_feature2 & CPUID2_SSSE3) != 0); +} + static int aesni_probe(device_t dev) { + bool has_aes, has_sha; - if ((cpu_feature2 & CPUID2_AESNI) == 0) { - device_printf(dev, "No AESNI support.\n"); + detect_cpu_features(&has_aes, &has_sha); + if (!has_aes && !has_sha) { + device_printf(dev, "No AES or SHA support.\n"); return (EINVAL); - } + } else if (has_aes && has_sha) + device_set_desc(dev, + "AES-CBC,AES-XTS,AES-GCM,AES-ICM,SHA1,SHA256"); + else if (has_aes) + device_set_desc(dev, "AES-CBC,AES-XTS,AES-GCM,AES-ICM"); + else + device_set_desc(dev, "SHA1,SHA256"); - if ((cpu_feature2 & CPUID2_SSE41) == 0) { - device_printf(dev, "No SSE4.1 support.\n"); - return (EINVAL); - } - - device_set_desc_copy(dev, "AES-CBC,AES-XTS,AES-GCM,AES-ICM"); return (0); } @@ -161,13 +194,22 @@ aesni_attach(device_t dev) } rw_init(&sc->lock, "aesni_lock"); - crypto_register(sc->cid, CRYPTO_AES_CBC, 0, 0); - crypto_register(sc->cid, CRYPTO_AES_ICM, 0, 0); - crypto_register(sc->cid, CRYPTO_AES_NIST_GCM_16, 0, 0); - crypto_register(sc->cid, CRYPTO_AES_128_NIST_GMAC, 0, 0); - crypto_register(sc->cid, CRYPTO_AES_192_NIST_GMAC, 0, 0); - crypto_register(sc->cid, CRYPTO_AES_256_NIST_GMAC, 0, 0); - crypto_register(sc->cid, CRYPTO_AES_XTS, 0, 0); + + detect_cpu_features(&sc->has_aes, &sc->has_sha); + if (sc->has_aes) { + crypto_register(sc->cid, CRYPTO_AES_CBC, 0, 0); + crypto_register(sc->cid, CRYPTO_AES_ICM, 0, 0); + crypto_register(sc->cid, CRYPTO_AES_NIST_GCM_16, 0, 0); + crypto_register(sc->cid, CRYPTO_AES_128_NIST_GMAC, 0, 0); + crypto_register(sc->cid, CRYPTO_AES_192_NIST_GMAC, 0, 0); + crypto_register(sc->cid, CRYPTO_AES_256_NIST_GMAC, 0, 0); + crypto_register(sc->cid, CRYPTO_AES_XTS, 0, 0); + } + if (sc->has_sha) { + crypto_register(sc->cid, CRYPTO_SHA1, 0, 0); + crypto_register(sc->cid, CRYPTO_SHA1_HMAC, 0, 0); + crypto_register(sc->cid, CRYPTO_SHA2_256_HMAC, 0, 0); + } return (0); } @@ -208,7 +250,8 @@ aesni_newsession(device_t dev, uint32_t *sidp, struct { struct aesni_softc *sc; struct aesni_session *ses; - struct cryptoini *encini; + struct cryptoini *encini, *authini; + bool gcm_hash, gcm; int error; if (sidp == NULL || cri == NULL) { @@ -221,13 +264,20 @@ aesni_newsession(device_t dev, uint32_t *sidp, struct return (EINVAL); ses = NULL; + authini = NULL; encini = NULL; + gcm = false; + gcm_hash = false; for (; cri != NULL; cri = cri->cri_next) { switch (cri->cri_alg) { + case CRYPTO_AES_NIST_GCM_16: + gcm = true; + /* FALLTHROUGH */ case CRYPTO_AES_CBC: case CRYPTO_AES_ICM: case CRYPTO_AES_XTS: - case CRYPTO_AES_NIST_GCM_16: + if (!sc->has_aes) + goto unhandled; if (encini != NULL) { CRYPTDEB("encini already set"); return (EINVAL); @@ -241,16 +291,35 @@ aesni_newsession(device_t dev, uint32_t *sidp, struct * nothing to do here, maybe in the future cache some * values for GHASH */ + gcm_hash = true; break; + case CRYPTO_SHA1: + case CRYPTO_SHA1_HMAC: + case CRYPTO_SHA2_256_HMAC: + if (!sc->has_sha) + goto unhandled; + if (authini != NULL) { + CRYPTDEB("authini already set"); + return (EINVAL); + } + authini = cri; + break; default: +unhandled: CRYPTDEB("unhandled algorithm"); return (EINVAL); } } - if (encini == NULL) { + if (encini == NULL && authini == NULL) { CRYPTDEB("no cipher"); return (EINVAL); } + /* + * GMAC algorithms are only supported with simultaneous GCM. Likewise + * GCM is not supported without GMAC. + */ + if (gcm_hash != gcm) + return (EINVAL); rw_wlock(&sc->lock); if (sc->dieing) { @@ -275,9 +344,13 @@ aesni_newsession(device_t dev, uint32_t *sidp, struct ses->used = 1; TAILQ_INSERT_TAIL(&sc->sessions, ses, next); rw_wunlock(&sc->lock); - ses->algo = encini->cri_alg; - error = aesni_cipher_setup(ses, encini); + if (encini != NULL) + ses->algo = encini->cri_alg; + if (authini != NULL) + ses->auth_algo = authini->cri_alg; + + error = aesni_cipher_setup(ses, encini, authini); if (error != 0) { CRYPTDEB("setup failed"); rw_wlock(&sc->lock); @@ -299,7 +372,7 @@ aesni_freesession_locked(struct aesni_softc *sc, struc sid = ses->id; TAILQ_REMOVE(&sc->sessions, ses, next); - *ses = (struct aesni_session){}; + explicit_bzero(ses, sizeof(*ses)); ses->id = sid; TAILQ_INSERT_HEAD(&sc->sessions, ses, next); } @@ -351,6 +424,9 @@ aesni_process(device_t dev, struct cryptop *crp, int h for (crd = crp->crp_desc; crd != NULL; crd = crd->crd_next) { switch (crd->crd_alg) { + case CRYPTO_AES_NIST_GCM_16: + needauth = 1; + /* FALLTHROUGH */ case CRYPTO_AES_CBC: case CRYPTO_AES_ICM: case CRYPTO_AES_XTS: @@ -361,24 +437,17 @@ aesni_process(device_t dev, struct cryptop *crp, int h enccrd = crd; break; - case CRYPTO_AES_NIST_GCM_16: - if (enccrd != NULL) { - error = EINVAL; - goto out; - } - enccrd = crd; - needauth = 1; - break; - case CRYPTO_AES_128_NIST_GMAC: case CRYPTO_AES_192_NIST_GMAC: case CRYPTO_AES_256_NIST_GMAC: + case CRYPTO_SHA1: + case CRYPTO_SHA1_HMAC: + case CRYPTO_SHA2_256_HMAC: if (authcrd != NULL) { error = EINVAL; goto out; } authcrd = crd; - needauth = 1; break; default: @@ -387,14 +456,16 @@ aesni_process(device_t dev, struct cryptop *crp, int h } } - if (enccrd == NULL || (needauth && authcrd == NULL)) { + if ((enccrd == NULL && authcrd == NULL) || + (needauth && authcrd == NULL)) { error = EINVAL; goto out; } /* CBC & XTS can only handle full blocks for now */ - if ((enccrd->crd_alg == CRYPTO_AES_CBC || enccrd->crd_alg == - CRYPTO_AES_XTS) && (enccrd->crd_len % AES_BLOCK_LEN) != 0) { + if (enccrd != NULL && (enccrd->crd_alg == CRYPTO_AES_CBC || + enccrd->crd_alg == CRYPTO_AES_XTS) && + (enccrd->crd_len % AES_BLOCK_LEN) != 0) { error = EINVAL; goto out; } @@ -420,9 +491,9 @@ out: return (error); } -uint8_t * +static uint8_t * aesni_cipher_alloc(struct cryptodesc *enccrd, struct cryptop *crp, - int *allocated) + bool *allocated) { struct mbuf *m; struct uio *uio; @@ -442,18 +513,18 @@ aesni_cipher_alloc(struct cryptodesc *enccrd, struct c addr = (uint8_t *)iov->iov_base; } else addr = (uint8_t *)crp->crp_buf; - *allocated = 0; + *allocated = false; addr += enccrd->crd_skip; return (addr); alloc: addr = malloc(enccrd->crd_len, M_AESNI, M_NOWAIT); if (addr != NULL) { - *allocated = 1; + *allocated = true; crypto_copydata(crp->crp_flags, crp->crp_buf, enccrd->crd_skip, enccrd->crd_len, addr); } else - *allocated = 0; + *allocated = false; return (addr); } @@ -482,13 +553,28 @@ MODULE_VERSION(aesni, 1); MODULE_DEPEND(aesni, crypto, 1, 1, 1); static int -aesni_cipher_setup(struct aesni_session *ses, struct cryptoini *encini) +aesni_cipher_setup(struct aesni_session *ses, struct cryptoini *encini, + struct cryptoini *authini) { struct fpu_kern_ctx *ctx; - int error; - int kt, ctxidx; + int kt, ctxidx, keylen, error; - kt = is_fpu_kern_thread(0); + switch (ses->auth_algo) { + case CRYPTO_SHA1: + case CRYPTO_SHA1_HMAC: + case CRYPTO_SHA2_256_HMAC: + if (authini->cri_klen % 8 != 0) + return (EINVAL); + keylen = authini->cri_klen / 8; + if (keylen > sizeof(ses->hmac_key)) + return (EINVAL); + if (ses->auth_algo == CRYPTO_SHA1 && keylen > 0) + return (EINVAL); + memcpy(ses->hmac_key, authini->cri_key, keylen); + ses->mlen = authini->cri_mlen; + } + + kt = is_fpu_kern_thread(0) || (encini == NULL); if (!kt) { ACQUIRE_CTX(ctxidx, ctx); error = fpu_kern_enter(curthread, ctx, @@ -497,8 +583,10 @@ aesni_cipher_setup(struct aesni_session *ses, struct c goto out; } - error = aesni_cipher_setup_common(ses, encini->cri_key, - encini->cri_klen); + error = 0; + if (encini != NULL) + error = aesni_cipher_setup_common(ses, encini->cri_key, + encini->cri_klen); if (!kt) { fpu_kern_leave(curthread, ctx); @@ -508,52 +596,198 @@ out: return (error); } +static int +intel_sha1_update(void *vctx, const void *vdata, u_int datalen) +{ + struct sha1_ctxt *ctx = vctx; + const char *data = vdata; + size_t gaplen; + size_t gapstart; + size_t off; + size_t copysiz; + u_int blocks; + + off = 0; + /* Do any aligned blocks without redundant copying. */ + if (datalen >= 64 && ctx->count % 64 == 0) { + blocks = datalen / 64; + ctx->c.b64[0] += blocks * 64 * 8; + intel_sha1_step(ctx->h.b32, data + off, blocks); + off += blocks * 64; + } + + while (off < datalen) { + gapstart = ctx->count % 64; + gaplen = 64 - gapstart; + + copysiz = (gaplen < datalen - off) ? gaplen : datalen - off; + bcopy(&data[off], &ctx->m.b8[gapstart], copysiz); + ctx->count += copysiz; + ctx->count %= 64; + ctx->c.b64[0] += copysiz * 8; + if (ctx->count % 64 == 0) + intel_sha1_step(ctx->h.b32, (void *)ctx->m.b8, 1); + off += copysiz; + } + return (0); +} + +static void +SHA1_Finalize_fn(void *digest, void *ctx) +{ + sha1_result(ctx, digest); +} + +static int +intel_sha256_update(void *vctx, const void *vdata, u_int len) +{ + SHA256_CTX *ctx = vctx; + uint64_t bitlen; + uint32_t r; + u_int blocks; + const unsigned char *src = vdata; + + /* Number of bytes left in the buffer from previous updates */ + r = (ctx->count >> 3) & 0x3f; + + /* Convert the length into a number of bits */ + bitlen = len << 3; + + /* Update number of bits */ + ctx->count += bitlen; + + /* Handle the case where we don't need to perform any transforms */ + if (len < 64 - r) { + memcpy(&ctx->buf[r], src, len); + return (0); + } + + /* Finish the current block */ + memcpy(&ctx->buf[r], src, 64 - r); + intel_sha256_step(ctx->state, ctx->buf, 1); + src += 64 - r; + len -= 64 - r; + + /* Perform complete blocks */ + if (len >= 64) { + blocks = len / 64; + intel_sha256_step(ctx->state, src, blocks); + src += blocks * 64; + len -= blocks * 64; + } + + /* Copy left over data into buffer */ + memcpy(ctx->buf, src, len); + return (0); +} + +static void +SHA256_Finalize_fn(void *digest, void *ctx) +{ + SHA256_Final(digest, ctx); +} + /* - * authcrd contains the associated date. + * Compute the HASH( (key ^ xorbyte) || buf ) */ +static void +hmac_internal(void *ctx, uint32_t *res, + int (*update)(void *, const void *, u_int), + void (*finalize)(void *, void *), uint8_t *key, uint8_t xorbyte, + const void *buf, size_t off, size_t buflen, int crpflags) +{ + size_t i; + + for (i = 0; i < 64; i++) + key[i] ^= xorbyte; + update(ctx, key, 64); + for (i = 0; i < 64; i++) + key[i] ^= xorbyte; + + crypto_apply(crpflags, __DECONST(void *, buf), off, buflen, + __DECONST(int (*)(void *, void *, u_int), update), ctx); + finalize(res, ctx); +} + static int aesni_cipher_process(struct aesni_session *ses, struct cryptodesc *enccrd, struct cryptodesc *authcrd, struct cryptop *crp) { struct fpu_kern_ctx *ctx; - uint8_t iv[AES_BLOCK_LEN]; - uint8_t tag[GMAC_DIGEST_LEN]; - uint8_t *buf, *authbuf; - int error, allocated, authallocated; - int ivlen, encflag; - int kt, ctxidx; + int error, ctxidx; + bool kt; - encflag = (enccrd->crd_flags & CRD_F_ENCRYPT) == CRD_F_ENCRYPT; + if (enccrd != NULL) { + if ((enccrd->crd_alg == CRYPTO_AES_ICM || + enccrd->crd_alg == CRYPTO_AES_NIST_GCM_16) && + (enccrd->crd_flags & CRD_F_IV_EXPLICIT) == 0) + return (EINVAL); + } - if ((enccrd->crd_alg == CRYPTO_AES_ICM || - enccrd->crd_alg == CRYPTO_AES_NIST_GCM_16) && - (enccrd->crd_flags & CRD_F_IV_EXPLICIT) == 0) - return (EINVAL); + error = 0; + kt = is_fpu_kern_thread(0); + if (!kt) { + ACQUIRE_CTX(ctxidx, ctx); + error = fpu_kern_enter(curthread, ctx, + FPU_KERN_NORMAL | FPU_KERN_KTHR); + if (error != 0) + goto out2; + } + /* Do work */ + if (enccrd != NULL && authcrd != NULL) { + /* Perform the first operation */ + if (crp->crp_desc == enccrd) + error = aesni_cipher_crypt(ses, enccrd, authcrd, crp); + else + error = aesni_cipher_mac(ses, authcrd, crp); + if (error != 0) + goto out; + /* Perform the second operation */ + if (crp->crp_desc == enccrd) + error = aesni_cipher_mac(ses, authcrd, crp); + else + error = aesni_cipher_crypt(ses, enccrd, authcrd, crp); + } else if (enccrd != NULL) + error = aesni_cipher_crypt(ses, enccrd, authcrd, crp); + else + error = aesni_cipher_mac(ses, authcrd, crp); + + if (error != 0) + goto out; + +out: + if (!kt) { + fpu_kern_leave(curthread, ctx); +out2: + RELEASE_CTX(ctxidx, ctx); + } + return (error); +} + +static int +aesni_cipher_crypt(struct aesni_session *ses, struct cryptodesc *enccrd, + struct cryptodesc *authcrd, struct cryptop *crp) +{ + uint8_t iv[AES_BLOCK_LEN], tag[GMAC_DIGEST_LEN], *buf, *authbuf; + int error, ivlen; + bool encflag, allocated, authallocated; + buf = aesni_cipher_alloc(enccrd, crp, &allocated); if (buf == NULL) return (ENOMEM); - error = 0; - authbuf = NULL; - authallocated = 0; - if (authcrd != NULL) { + authallocated = false; + if (ses->algo == CRYPTO_AES_NIST_GCM_16 && authcrd != NULL) { authbuf = aesni_cipher_alloc(authcrd, crp, &authallocated); if (authbuf == NULL) { error = ENOMEM; - goto out1; + goto out; } } - kt = is_fpu_kern_thread(0); - if (!kt) { - ACQUIRE_CTX(ctxidx, ctx); - error = fpu_kern_enter(curthread, ctx, - FPU_KERN_NORMAL|FPU_KERN_KTHR); - if (error != 0) - goto out2; - } - + error = 0; + encflag = (enccrd->crd_flags & CRD_F_ENCRYPT) == CRD_F_ENCRYPT; if ((enccrd->crd_flags & CRD_F_KEY_EXPLICIT) != 0) { error = aesni_cipher_setup_common(ses, enccrd->crd_key, enccrd->crd_klen); @@ -561,7 +795,6 @@ aesni_cipher_process(struct aesni_session *ses, struct goto out; } - /* XXX - validate that enccrd and authcrd have/use same key? */ switch (enccrd->crd_alg) { case CRYPTO_AES_CBC: case CRYPTO_AES_ICM: @@ -593,13 +826,6 @@ aesni_cipher_process(struct aesni_session *ses, struct enccrd->crd_inject, ivlen, iv); } - if (authcrd != NULL && !encflag) - crypto_copydata(crp->crp_flags, crp->crp_buf, - authcrd->crd_inject, GMAC_DIGEST_LEN, tag); - else - bzero(tag, sizeof tag); - - /* Do work */ switch (ses->algo) { case CRYPTO_AES_CBC: if (encflag) @@ -625,11 +851,21 @@ aesni_cipher_process(struct aesni_session *ses, struct iv); break; case CRYPTO_AES_NIST_GCM_16: - if (encflag) + if (authcrd != NULL && !encflag) + crypto_copydata(crp->crp_flags, crp->crp_buf, + authcrd->crd_inject, GMAC_DIGEST_LEN, tag); + else + bzero(tag, sizeof tag); + + if (encflag) { AES_GCM_encrypt(buf, buf, authbuf, iv, tag, enccrd->crd_len, authcrd->crd_len, ivlen, ses->enc_schedule, ses->rounds); - else { + + if (authcrd != NULL) + crypto_copyback(crp->crp_flags, crp->crp_buf, + authcrd->crd_inject, GMAC_DIGEST_LEN, tag); + } else { if (!AES_GCM_decrypt(buf, buf, authbuf, iv, tag, enccrd->crd_len, authcrd->crd_len, ivlen, ses->enc_schedule, ses->rounds)) @@ -638,28 +874,78 @@ aesni_cipher_process(struct aesni_session *ses, struct break; } - if (allocated) - crypto_copyback(crp->crp_flags, crp->crp_buf, enccrd->crd_skip, - enccrd->crd_len, buf); - - if (!error && authcrd != NULL) { - crypto_copyback(crp->crp_flags, crp->crp_buf, - authcrd->crd_inject, GMAC_DIGEST_LEN, tag); - } - out: - if (!kt) { - fpu_kern_leave(curthread, ctx); -out2: - RELEASE_CTX(ctxidx, ctx); - } - -out1: if (allocated) { - bzero(buf, enccrd->crd_len); + explicit_bzero(buf, enccrd->crd_len); free(buf, M_AESNI); } - if (authallocated) + if (authallocated) { + explicit_bzero(authbuf, authcrd->crd_len); free(authbuf, M_AESNI); + } return (error); +} + +static int +aesni_cipher_mac(struct aesni_session *ses, struct cryptodesc *crd, + struct cryptop *crp) +{ + union { + struct SHA256Context sha2 __aligned(16); + struct sha1_ctxt sha1 __aligned(16); + } sctx; + uint32_t res[SHA2_256_HASH_LEN / sizeof(uint32_t)]; + int hashlen; + + if (crd->crd_flags != 0) + return (EINVAL); + + switch (ses->auth_algo) { + case CRYPTO_SHA1_HMAC: + hashlen = SHA1_HASH_LEN; + /* Inner hash: (K ^ IPAD) || data */ + sha1_init(&sctx.sha1); + hmac_internal(&sctx.sha1, res, intel_sha1_update, + SHA1_Finalize_fn, ses->hmac_key, 0x36, crp->crp_buf, + crd->crd_skip, crd->crd_len, crp->crp_flags); + /* Outer hash: (K ^ OPAD) || inner hash */ + sha1_init(&sctx.sha1); + hmac_internal(&sctx.sha1, res, intel_sha1_update, + SHA1_Finalize_fn, ses->hmac_key, 0x5C, res, 0, hashlen, 0); + break; + case CRYPTO_SHA1: + hashlen = SHA1_HASH_LEN; + sha1_init(&sctx.sha1); + crypto_apply(crp->crp_flags, crp->crp_buf, crd->crd_skip, + crd->crd_len, __DECONST(int (*)(void *, void *, u_int), + intel_sha1_update), &sctx.sha1); + sha1_result(&sctx.sha1, (void *)res); + break; + case CRYPTO_SHA2_256_HMAC: + hashlen = SHA2_256_HASH_LEN; + /* Inner hash: (K ^ IPAD) || data */ + SHA256_Init(&sctx.sha2); + hmac_internal(&sctx.sha2, res, intel_sha256_update, + SHA256_Finalize_fn, ses->hmac_key, 0x36, crp->crp_buf, + crd->crd_skip, crd->crd_len, crp->crp_flags); + /* Outer hash: (K ^ OPAD) || inner hash */ + SHA256_Init(&sctx.sha2); + hmac_internal(&sctx.sha2, res, intel_sha256_update, + SHA256_Finalize_fn, ses->hmac_key, 0x5C, res, 0, hashlen, + 0); + break; + default: + /* + * AES-GMAC authentication is verified while processing the + * enccrd + */ + return (0); + } + + if (ses->mlen != 0 && ses->mlen < hashlen) + hashlen = ses->mlen; + + crypto_copyback(crp->crp_flags, crp->crp_buf, crd->crd_inject, hashlen, + (void *)res); + return (0); } Modified: head/sys/crypto/aesni/aesni.h ============================================================================== --- head/sys/crypto/aesni/aesni.h Tue Sep 26 22:32:08 2017 (r324036) +++ head/sys/crypto/aesni/aesni.h Tue Sep 26 23:12:32 2017 (r324037) @@ -56,12 +56,16 @@ struct aesni_session { uint8_t enc_schedule[AES_SCHED_LEN] __aligned(16); uint8_t dec_schedule[AES_SCHED_LEN] __aligned(16); uint8_t xts_schedule[AES_SCHED_LEN] __aligned(16); + /* Same as the SHA256 Blocksize. */ + uint8_t hmac_key[SHA1_HMAC_BLOCK_LEN] __aligned(16); int algo; int rounds; /* uint8_t *ses_ictx; */ /* uint8_t *ses_octx; */ /* int ses_mlen; */ int used; + int auth_algo; + int mlen; uint32_t id; TAILQ_ENTRY(aesni_session) next; }; @@ -111,7 +115,5 @@ int AES_GCM_decrypt(const unsigned char *in, unsigned int aesni_cipher_setup_common(struct aesni_session *ses, const uint8_t *key, int keylen); -uint8_t *aesni_cipher_alloc(struct cryptodesc *enccrd, struct cryptop *crp, - int *allocated); #endif /* _AESNI_H_ */ Added: head/sys/crypto/aesni/intel_sha1.c ============================================================================== --- /dev/null 00:00:00 1970 (empty, because file is newly added) +++ head/sys/crypto/aesni/intel_sha1.c Tue Sep 26 23:12:32 2017 (r324037) @@ -0,0 +1,261 @@ +/******************************************************************************* +* Copyright (c) 2013, Intel Corporation +* +* All rights reserved. +* +* Redistribution and use in source and binary forms, with or without +* modification, are permitted provided that the following conditions are +* met: +* +* * Redistributions of source code must retain the above copyright +* notice, this list of conditions and the following disclaimer. +* +* * Redistributions in binary form must reproduce the above copyright +* notice, this list of conditions and the following disclaimer in the +* documentation and/or other materials provided with the +* distribution. +* +* * Neither the name of the Intel Corporation nor the names of its +* contributors may be used to endorse or promote products derived from +* this software without specific prior written permission. +* +* +* THIS SOFTWARE IS PROVIDED BY INTEL CORPORATION ""AS IS"" AND ANY +* EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR +* PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL INTEL CORPORATION OR +* CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, +* EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, +* PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR +* PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF +* LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING +* NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS +* SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +******************************************************************************** +* +* Intel SHA Extensions optimized implementation of a SHA-1 update function +* +* The function takes a pointer to the current hash values, a pointer to the +* input data, and a number of 64 byte blocks to process. Once all blocks have +* been processed, the digest pointer is updated with the resulting hash value. +* The function only processes complete blocks, there is no functionality to +* store partial blocks. All message padding and hash value initialization must +* be done outside the update function. +* +* The indented lines in the loop are instructions related to rounds processing. +* The non-indented lines are instructions related to the message schedule. +* +* Author: Sean Gulley +* Date: July 2013 +* +******************************************************************************** +* +* Example complier command line: +* icc intel_sha_extensions_sha1_intrinsic.c +* gcc -msha -msse4 intel_sha_extensions_sha1_intrinsic.c +* +*******************************************************************************/ +#include +__FBSDID("$FreeBSD$"); + +#include +#include + +#include + +void intel_sha1_step(uint32_t *digest, const char *data, uint32_t num_blks) { + __m128i abcd, e0, e1; + __m128i abcd_save, e_save; + __m128i msg0, msg1, msg2, msg3; + __m128i shuf_mask, e_mask; + +#if 0 + e_mask = _mm_set_epi64x(0xFFFFFFFF00000000ull, 0x0000000000000000ull); +#else + (void)e_mask; + e0 = _mm_set_epi64x(0, 0); +#endif + shuf_mask = _mm_set_epi64x(0x0001020304050607ull, 0x08090a0b0c0d0e0full); + + // Load initial hash values + abcd = _mm_loadu_si128((__m128i*) digest); + e0 = _mm_insert_epi32(e0, *(digest+4), 3); + abcd = _mm_shuffle_epi32(abcd, 0x1B); +#if 0 + e0 = _mm_and_si128(e0, e_mask); +#endif + + while (num_blks > 0) { + // Save hash values for addition after rounds + abcd_save = abcd; + e_save = e0; + + // Rounds 0-3 + msg0 = _mm_loadu_si128((const __m128i*) data); + msg0 = _mm_shuffle_epi8(msg0, shuf_mask); + e0 = _mm_add_epi32(e0, msg0); + e1 = abcd; + abcd = _mm_sha1rnds4_epu32(abcd, e0, 0); + + // Rounds 4-7 + msg1 = _mm_loadu_si128((const __m128i*) (data+16)); + msg1 = _mm_shuffle_epi8(msg1, shuf_mask); + e1 = _mm_sha1nexte_epu32(e1, msg1); + e0 = abcd; + abcd = _mm_sha1rnds4_epu32(abcd, e1, 0); + msg0 = _mm_sha1msg1_epu32(msg0, msg1); + + // Rounds 8-11 + msg2 = _mm_loadu_si128((const __m128i*) (data+32)); + msg2 = _mm_shuffle_epi8(msg2, shuf_mask); + e0 = _mm_sha1nexte_epu32(e0, msg2); + e1 = abcd; + abcd = _mm_sha1rnds4_epu32(abcd, e0, 0); + msg1 = _mm_sha1msg1_epu32(msg1, msg2); + msg0 = _mm_xor_si128(msg0, msg2); + + // Rounds 12-15 + msg3 = _mm_loadu_si128((const __m128i*) (data+48)); + msg3 = _mm_shuffle_epi8(msg3, shuf_mask); + e1 = _mm_sha1nexte_epu32(e1, msg3); + e0 = abcd; + msg0 = _mm_sha1msg2_epu32(msg0, msg3); + abcd = _mm_sha1rnds4_epu32(abcd, e1, 0); + msg2 = _mm_sha1msg1_epu32(msg2, msg3); *** DIFF OUTPUT TRUNCATED AT 1000 LINES ***