From: Alexey Ivanov
To: rozhuk.im@gmail.com, John-Mark Gurney
Cc: freebsd-hackers@freebsd.org, freebsd-geom@freebsd.org
Subject: Re: ChaCha8/12/20 and GEOM ELI tests
Date: Mon, 12 Jan 2015 19:40:13 -0800
Message-Id: <7A712B22-1151-4A80-970A-36C0C2A63653@gmail.com>
In-Reply-To: <20150112233411.GP1949@funkthat.com>
References: <54b33bfa.e31b980a.3e5d.ffffc823@mx.google.com> <20150112072249.GM1949@funkthat.com> <54b43144.2d08980a.437b.0f8f@mx.google.com> <20150112233411.GP1949@funkthat.com>
List-Id: Technical Discussions relating to FreeBSD

Just curious: why does a stream cipher use a mode of operation (e.g. XTS)?

> On Jan 12, 2015, at 3:34 PM, John-Mark Gurney wrote:
>
> rozhuk.im@gmail.com wrote this message on Mon, Jan 12, 2015 at 23:40 +0300:
>>>> ChaCha patch:
>>>>
>>>> http://netlab.linkpc.net/download/software/FreeBSD/patches/chacha.patch
>>>
>>> What's the difference between CHACHA and XCHACHA?
>>
>> Same as between SALSA and XSALSA.
>>
>> XChaCha20 uses a 256-bit key as well as the first 128 bits of the nonce in
>> order to compute a subkey.
>> This subkey, as well as the remaining 64 bits of the nonce, are the
>> parameters of the ChaCha20 function used to actually generate the stream.
>>
>> But with XChaCha20's longer nonce, it is safe to generate nonces using
>> randombytes_buf() for every message encrypted with the same key without
>> having to worry about a collision.
>>
>> More details: http://cr.yp.to/snuffle/xsalsa-20081128.pdf
>
> Ahh, thanks..
>
>>> Also, where are the man page diffs? They might have explained the
>>> difference between the two, and explained why two versions of chacha
>>> are needed...
>>
>> No man page diffs.
>
> You need to document the new defines in crypto(9), and document the
> various parameters in crypto(7)... Yes, not all modes are documented
> in crypto(7), but going forward, at a minimum we need to document new
> additions...
>
> I'll admit I didn't document the other algorithms as I'm not as familiar
> w/ those as the ones that I worked on...
>
>> Man pages do not explain the difference between AES-CBC and AES-XTS...
>
> True, but CBC and XTS (which includes a reference to the standard) are
> a lot more searchable/common knowledge than xchacha.. google thinks you
> mean chacha, and xchacha just turns up a bunch of people on various
> networks... Not until you search on xchacha crypto do you get a relevant
> page... Also, wikipedia doesn't have an entry for xchacha, nor does
> the chacha (cipher) page list it... So, when documenting xchacha in
> crypto(7), include a link to the description/standard...
>
>>> Is there a reason you decided to write your own ChaCha implementation
>>> instead of using one of the standard ones? Did you run performance
>>> tests between your implementation and others?
>>
>> Reference ChaCha and reference (FreeBSD) XTS (4k sector):
>> ChaCha8-XTS-256 = 199518722 bytes/sec
>> ChaCha12-XTS-256 = 179029849 bytes/sec
>> ChaCha20-XTS-256 = 149447317 bytes/sec
>> XChaCha8-XTS-256 = 195675728 bytes/sec
>> XChaCha12-XTS-256 = 175790196 bytes/sec
>> XChaCha20-XTS-256 = 147939263 bytes/sec
>
> So, you're seeing a 33%-50% improvement, good to hear...
>
> Also, do you publish this implementation somewhere? If so, it'd be
> helpful to include a url to where up to date versions can be obtained...
> If you don't plan on publishing/maintaining it outside of FreeBSD, then
> we need to unifdef out the Windows parts of it for our tree...
>
>> This is the reference version adapted for use in /dev/crypto.
>> chacha_block_unaligneg() - the reference version of data block processing.
>> Macros are used for readability.
>> chacha_block_aligned() - the same, but it works on aligned data.
>
> Please use the macro __NO_STRICT_ALIGNMENT to decide if special work
> is necessary to handle the alignment...
>
> What is the CHACHA_X64 macro for? If that is to detect LP64 platforms,
> please use the macro __LP64__ to decide this... Have you done
> performance evaluations on 32bit arches to make sure double rounds aren't
> a benefit there too?
>
> Use the byteorder(9) macros to encode/decode integers instead of rolling
> your own (U8TO32_LITTLE and U32TO8_LITTLE)... Turns out compilers aren't
> good at optimizing this type of code, and platforms may have assembly
> optimized versions for these...
>
>> To increase speed, data is processed 4/8 bytes at a time instead of one
>> byte at a time. The data in the context is 8-byte aligned.
>> To increase security, all data, including temporaries, is kept in a context
>> that is filled with zeros on completion of the work.
>
> Please use the function explicit_bzero that is available for all of
> these instead of creating your own..
>
>>>> HW: Core Duo E8500, 8Gb DDR2-800.
>>>> dd if=/dev/zero of=/dev/md0 bs=1m
>>>> 2148489421 bytes/sec
>>>>
>>>>
>>>> # sector = 512b
>>>> 3DES-CBC-192 = 20773120 bytes/sec
>>>> AES-CBC-128 = 85276853 bytes/sec
>>>> AES-CBC-256 = 68893016 bytes/sec
>>>> AES-XTS-128 = 68194868 bytes/sec
>>>> AES-XTS-256 = 56611573 bytes/sec
>>>> Blowfish-CBC-128 = 11169657 bytes/sec
>>>> Blowfish-CBC-256 = 11185891 bytes/sec
>>>> Camellia-CBC-128 = 78077243 bytes/sec
>>>> Camellia-CBC-256 = 65732219 bytes/sec
>>>> ChaCha8-XTS-256 = 258042765 bytes/sec
>>>> ChaCha12-XTS-256 = 223616967 bytes/sec
>>>> ChaCha20-XTS-256 = 176005366 bytes/sec
>>>> XChaCha8-XTS-256 = 228292624 bytes/sec
>>>> XChaCha12-XTS-256 = 195577624 bytes/sec
>>>> XChaCha20-XTS-256 = 152247267 bytes/sec
>>>> XChaCha20-XTS-128 = 152717737 bytes/sec ! 128 bit key have same speed
>>>> as 256
>>>>
>>>>
>>>> # sector = 4kb
>>>> 3DES-CBC-192 = 22018189 bytes/sec
>>>> AES-CBC-128 = 104097143 bytes/sec
>>>> AES-CBC-256 = 81983833 bytes/sec
>>>> AES-XTS-128 = 78559346 bytes/sec
>>>> AES-XTS-256 = 66047200 bytes/sec
>>>> Blowfish-CBC-128 = 38635464 bytes/sec
>>>> Blowfish-CBC-256 = 38810555 bytes/sec
>>>> Camellia-CBC-128 = 92814510 bytes/sec
>>>> Camellia-CBC-256 = 75949489 bytes/sec
>>>> ChaCha8-XTS-256 = 337336982 bytes/sec
>>>> ChaCha12-XTS-256 = 284740187 bytes/sec
>>>> ChaCha20-XTS-256 = 217326865 bytes/sec
>>>> XChaCha8-XTS-256 = 328424551 bytes/sec
>>>> XChaCha12-XTS-256 = 278579692 bytes/sec
>>>> XChaCha20-XTS-256 = 211660225 bytes/sec
>>>>
>>>> Optimized AES-XTS - speed like AES-CBC:
>>>> AES-XTS-128 = 102841051 bytes/sec
>>>> AES-XTS-256 = 80813644 bytes/sec
>>>
>>> Is this from a different patch or what? Can you talk more about this?
>>
>> No patch at this moment.
>> After optimization ChaCha-XTS I applied these optimizations to the AES-XTS
>> and get this result.
>> All changes were in aes_xts_reinit() and aes_xts_crypt(); I just slightly
>> changed the structure aes_xts_ctx.
>>
>> aes_xts_ctx:
>> u_int8_t tweak[] -> u_int64_t tweak[]
>>
>> aes_xts_reinit -> same as chacha_xts_reinit()
>>
>> aes_xts_crypt -> same as chacha_xts_crypt():
>> block[] - temp buf removed;
>> xor 1 byte -> xor 8 bytes at once;
>> tweak[i] << 1: rotl 1 bit: 1 byte -> 8 bytes;
>> unroll loops;
>
> Ahh, I thought I had done some similar optimizations, but I only did
> them to the aesni version of the routines... You should use the macro
> above to decide if things are aligned or not...
>
>>
>> Final:
>>
>> struct aes_xts_ctx {
>>     rijndael_ctx key1;
>>     rijndael_ctx key2;
>>     uint64_t tweak[(AES_XTS_BLOCKSIZE / sizeof(uint64_t))];
>> };
>>
>> void
>> aes_xts_reinit(caddr_t key, u_int8_t *iv)
>> {
>>     struct aes_xts_ctx *ctx = (struct aes_xts_ctx *)key;
>>
>>     /*
>>      * Prepare tweak as E_k2(IV). IV is specified as LE representation
>>      * of a 64-bit block number which we allow to be passed in directly.
>>      */
>>     if (ALIGNED_POINTER(iv, uint64_t)) {
>>         ctx->tweak[0] = (*((uint64_t*)(void*)iv));
>>     } else {
>>         bcopy(iv, ctx->tweak, sizeof(uint64_t));
>>     }
>>     /* Convert to LE. */
>>     ctx->tweak[0] = htole64(ctx->tweak[0]);
>
> Hmm... this line bothers me.. I'll need to spend more time reading up
> to decide if it is buggy or not... Is ctx->tweak in host order? or LE
> order? I believe it's supposed to be LE order, as it gets passed
> directly to _encrypt.. I'm also not sure if the original code is BE
> clean, which is part of my problem...
>
>>     /* Last 64 bits of IV are always zero */
>>     ctx->tweak[1] = 0;
>>
>>     rijndael_encrypt(&ctx->key2, (uint8_t*)ctx->tweak,
>>         (uint8_t*)ctx->tweak);
>> }
>>
>> static void
>> aes_xts_crypt(struct aes_xts_ctx *ctx, u_int8_t *data, u_int do_encrypt)
>> {
>>     size_t i;
>>     uint64_t crr, tm;
>>
>>     if (ALIGNED_POINTER(data, uint64_t)) {
>>         ((uint64_t*)(void*)data)[0] ^= ctx->tweak[0];
>>         ((uint64_t*)(void*)data)[1] ^= ctx->tweak[1];
>>     } else {
>>         for (i = 0; i < AES_XTS_BLOCKSIZE; i ++)
>>             data[i] ^= ((uint8_t*)ctx->tweak)[i];
>>     }
>>
>>     if (do_encrypt)
>>         rijndael_encrypt(&ctx->key1, data, data);
>>     else
>>         rijndael_decrypt(&ctx->key1, data, data);
>>
>>     if (ALIGNED_POINTER(data, uint64_t)) {
>>         ((uint64_t*)(void*)data)[0] ^= ctx->tweak[0];
>>         ((uint64_t*)(void*)data)[1] ^= ctx->tweak[1];
>>     } else {
>>         for (i = 0; i < AES_XTS_BLOCKSIZE; i ++)
>>             data[i] ^= ((uint8_t*)ctx->tweak)[i];
>>     }
>>
>>     /* Exponentiate tweak */
>>     crr = (ctx->tweak[0] >> ((sizeof(uint64_t) * 8) - 1));
>>     ctx->tweak[0] = (ctx->tweak[0] << 1);
>>
>>     tm = ctx->tweak[1];
>>     ctx->tweak[1] = ((tm << 1) | crr);
>>     crr = (tm >> ((sizeof(uint64_t) * 8) - 1));
>>
>>     if (crr)
>>         ctx->tweak[0] ^= 0x87; /* GF(2^128) generator polynomial. */
>
> Please use the AES_XTS_ALPHA define instead of hardcoding the value..
>
> Thanks.
>
> --
> John-Mark Gurney                              Voice: +1 415 225 5579
>
> "All that I will do, has been done, All that I have, has not."
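
On the byte-order question raised about the quoted tweak update, a small
standalone sketch may help. It is not taken from the patch. XTS_BLOCKSIZE,
XTS_ALPHA, load64_le(), store64_le() and the xts_mul_alpha_*() functions are
names made up for illustration: XTS_ALPHA stands in for AES_XTS_ALPHA, and the
load/store helpers stand in for le64dec()/le64enc() from byteorder(9). The
sketch multiplies the 16-byte tweak by alpha in GF(2^128) in two ways, byte by
byte and as two 64-bit words that are loaded explicitly as little-endian, so
the word-wise result should not depend on host byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define XTS_BLOCKSIZE 16
#define XTS_ALPHA     0x87  /* hypothetical name; same value as in the thread */

/* Portable little-endian 64-bit load (stand-in for le64dec()). */
static uint64_t
load64_le(const uint8_t *p)
{
    uint64_t v = 0;
    int i;

    for (i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return (v);
}

/* Portable little-endian 64-bit store (stand-in for le64enc()). */
static void
store64_le(uint8_t *p, uint64_t v)
{
    int i;

    for (i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Byte-wise reference: shift the 128-bit tweak left by one bit. */
void
xts_mul_alpha_bytes(uint8_t t[XTS_BLOCKSIZE])
{
    unsigned carry_in = 0, carry_out;
    int i;

    for (i = 0; i < XTS_BLOCKSIZE; i++) {
        carry_out = t[i] >> 7;
        t[i] = (uint8_t)((t[i] << 1) | carry_in);
        carry_in = carry_out;
    }
    if (carry_in)
        t[0] ^= XTS_ALPHA;
}

/* Word-wise variant: same operation on two explicitly LE-decoded words. */
void
xts_mul_alpha_words(uint8_t t[XTS_BLOCKSIZE])
{
    uint64_t lo = load64_le(t);
    uint64_t hi = load64_le(t + 8);
    uint64_t carry = hi >> 63;

    hi = (hi << 1) | (lo >> 63);
    lo = (lo << 1) ^ (carry * (uint64_t)XTS_ALPHA);
    store64_le(t, lo);
    store64_le(t + 8, hi);
}

/* Returns 1 if both variants agree over 1000 successive doublings. */
int
xts_mul_alpha_agree(void)
{
    uint8_t a[XTS_BLOCKSIZE], b[XTS_BLOCKSIZE];
    int i;

    for (i = 0; i < XTS_BLOCKSIZE; i++)
        a[i] = b[i] = (uint8_t)(i * 37 + 1);
    for (i = 0; i < 1000; i++) {
        xts_mul_alpha_bytes(a);
        xts_mul_alpha_words(b);
        if (memcmp(a, b, XTS_BLOCKSIZE) != 0)
            return (0);
    }
    return (1);
}
```

If the context keeps the tweak as uint64_t words in host order, as the quoted
code appears to, the conversion to and from little-endian only has to happen
where the tweak is handed to the block cipher or XORed into the data as bytes.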
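
The subkey step described in the quoted text (256-bit key plus the first 128
bits of the nonce) is commonly called HChaCha20. Below is a minimal sketch of
that step, assuming the construction mirrors HSalsa20 from the XSalsa20 paper
linked in the thread, with the ChaCha quarter-round substituted. It is not
taken from chacha.patch, which may differ, and hchacha20(), load32_le(),
store32_le(), ROTL32() and QR() are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Portable little-endian 32-bit load (stand-in for le32dec()). */
static uint32_t
load32_le(const uint8_t *p)
{
    return ((uint32_t)p[0] | ((uint32_t)p[1] << 8) |
        ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24));
}

/* Portable little-endian 32-bit store (stand-in for le32enc()). */
static void
store32_le(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

#define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))

/* ChaCha quarter-round on four state words. */
#define QR(a, b, c, d) do {                     \
    a += b; d ^= a; d = ROTL32(d, 16);          \
    c += d; b ^= c; b = ROTL32(b, 12);          \
    a += b; d ^= a; d = ROTL32(d, 8);           \
    c += d; b ^= c; b = ROTL32(b, 7);           \
} while (0)

/*
 * Derive a 32-byte subkey from a 32-byte key and the first 16 bytes of
 * the extended nonce: 20 rounds, no feed-forward, output words 0-3 and
 * 12-15 of the final state.
 */
void
hchacha20(uint8_t subkey[32], const uint8_t key[32], const uint8_t nonce16[16])
{
    uint32_t x[16];
    int i;

    /* "expand 32-byte k" constants. */
    x[0] = 0x61707865; x[1] = 0x3320646e;
    x[2] = 0x79622d32; x[3] = 0x6b206574;
    for (i = 0; i < 8; i++)
        x[4 + i] = load32_le(key + 4 * i);
    for (i = 0; i < 4; i++)
        x[12 + i] = load32_le(nonce16 + 4 * i);

    for (i = 0; i < 10; i++) {
        /* Column round. */
        QR(x[0], x[4], x[8],  x[12]);
        QR(x[1], x[5], x[9],  x[13]);
        QR(x[2], x[6], x[10], x[14]);
        QR(x[3], x[7], x[11], x[15]);
        /* Diagonal round. */
        QR(x[0], x[5], x[10], x[15]);
        QR(x[1], x[6], x[11], x[12]);
        QR(x[2], x[7], x[8],  x[13]);
        QR(x[3], x[4], x[9],  x[14]);
    }

    for (i = 0; i < 4; i++) {
        store32_le(subkey + 4 * i, x[i]);
        store32_le(subkey + 16 + 4 * i, x[12 + i]);
    }
}
```

Under this assumption, the subkey and the remaining 64 bits of the nonce would
then be passed to plain ChaCha20 to generate the stream, as the quoted
description says.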