Date: Mon, 12 Jan 2015 19:40:13 -0800
From: Alexey Ivanov <savetherbtz@gmail.com>
To: rozhuk.im@gmail.com, John-Mark Gurney <jmg@funkthat.com>
Cc: freebsd-hackers@freebsd.org, freebsd-geom@freebsd.org
Subject: Re: ChaCha8/12/20 and GEOM ELI tests
Message-ID: <7A712B22-1151-4A80-970A-36C0C2A63653@gmail.com>
In-Reply-To: <20150112233411.GP1949@funkthat.com>
References: <54b33bfa.e31b980a.3e5d.ffffc823@mx.google.com> <20150112072249.GM1949@funkthat.com> <54b43144.2d08980a.437b.0f8f@mx.google.com> <20150112233411.GP1949@funkthat.com>
[-- Attachment #1 --]
Just curious: why does a stream cipher use a mode of operation (e.g. XTS)?

> On Jan 12, 2015, at 3:34 PM, John-Mark Gurney <jmg@funkthat.com> wrote:
>
> rozhuk.im@gmail.com wrote this message on Mon, Jan 12, 2015 at 23:40 +0300:
>>>> ChaCha patch:
>>>>
>>> http://netlab.linkpc.net/download/software/FreeBSD/patches/chacha.patch
>>>
>>> What's the difference between CHACHA and XCHACHA?
>>
>> Same as between SALSA and XSALSA.
>>
>> XChaCha20 uses a 256-bit key as well as the first 128 bits of the nonce in
>> order to compute a subkey. This subkey, as well as the remaining 64 bits of
>> the nonce, are the parameters of the ChaCha20 function used to actually
>> generate the stream.
>>
>> But with XChaCha20's longer nonce, it is safe to generate nonces using
>> randombytes_buf() for every message encrypted with the same key without
>> having to worry about a collision.
>>
>> More details: http://cr.yp.to/snuffle/xsalsa-20081128.pdf
>
> Ahh, thanks..
>
>>> Also, where are the man page diffs?  They might have explained the
>>> difference between the two, and explained why two versions of chacha
>>> are needed...
>>
>> No man page diffs.
>
> You need to document the new defines in crypto(9), and document the
> various parameters in crypto(7)...  Yes, not all modes are documented
> in crypto(7), but going forward, at a minimum we need to document new
> additions...
>
> I'll admit I didn't document the other algorithms as I'm not as familiar
> w/ those as the ones that I worked on...
>
>> Man pages do not explain the difference between AES-CBC and AES-XTS...
>
> True, but CBC and XTS (which includes a reference to the standard) are
> a lot more searchable/common knowledge than xchacha..  google thinks you
> mean chacha, and xchacha just turns up a bunch of people on various
> networks...  Not until you search on xchacha crypto do you get a relevant
> page...  Also, wikipedia doesn't have an entry for xchacha, nor does
> the chacha (cipher) page list it...
> So, when documenting xchacha in
> crypto(7), include a link to the description/standard...
>
>>> Is there a reason you decided to write your own ChaCha implementation
>>> instead of using one of the standard ones?  Did you run performance
>>> tests between your implementation and others?
>>
>> Reference ChaCha and reference (FreeBSD) XTS (4k sector):
>> ChaCha8-XTS-256 = 199518722 bytes/sec
>> ChaCha12-XTS-256 = 179029849 bytes/sec
>> ChaCha20-XTS-256 = 149447317 bytes/sec
>> XChaCha8-XTS-256 = 195675728 bytes/sec
>> XChaCha12-XTS-256 = 175790196 bytes/sec
>> XChaCha20-XTS-256 = 147939263 bytes/sec
>
> So, you're seeing a 33%-50% improvement, good to hear...
>
> Also, do you publish this implementation somewhere?  If so, it'd be
> helpful to include a url to where up to date versions can be obtained...
> If you don't plan on publishing/maintaining it outside of FreeBSD, then
> we need to unifdef out the Windows parts of it for our tree...
>
>> This is the reference version adapted for use in /dev/crypto.
>> chacha_block_unaligneg() - processes a data block as the reference version does.
>> Macros are used for readability.
>> chacha_block_aligned() - the same, but works on aligned data.
>
> Please use the macro __NO_STRICT_ALIGNMENT to decide if special work
> is necessary to handle the alignment...
>
> What is the CHACHA_X64 macro for?  If that is to detect LP64 platforms,
> please use the macro __LP64__ to decide this...  Have you done
> performance evaluations on 32bit arches to make sure double rounds aren't
> a benefit there too?
>
> Use the byteorder(9) macros to encode/decode integers instead of rolling
> your own (U8TO32_LITTLE and U32TO8_LITTLE)...  Turns out compilers aren't
> good at optimizing this type of code, and platforms may have assembly
> optimized versions for these...
>
>> To increase speed, data is processed 4 or 8 bytes at a time instead of
>> one byte at a time.
>> The data in the context is 8-byte aligned.
>> To increase security, all data, including temporaries, is kept in a context
>> that is filled with zeros when the work is complete.
>
> Please use the function explicit_bzero that is available for all of
> these instead of creating your own..
>
>>>> HW: Core Duo E8500, 8Gb DDR2-800.
>>>> dd if=/dev/zero of=/dev/md0 bs=1m
>>>> 2148489421 bytes/sec
>>>>
>>>>
>>>> # sector = 512b
>>>> 3DES-CBC-192 = 20773120 bytes/sec
>>>> AES-CBC-128 = 85276853 bytes/sec
>>>> AES-CBC-256 = 68893016 bytes/sec
>>>> AES-XTS-128 = 68194868 bytes/sec
>>>> AES-XTS-256 = 56611573 bytes/sec
>>>> Blowfish-CBC-128 = 11169657 bytes/sec
>>>> Blowfish-CBC-256 = 11185891 bytes/sec
>>>> Camellia-CBC-128 = 78077243 bytes/sec
>>>> Camellia-CBC-256 = 65732219 bytes/sec
>>>> ChaCha8-XTS-256 = 258042765 bytes/sec
>>>> ChaCha12-XTS-256 = 223616967 bytes/sec
>>>> ChaCha20-XTS-256 = 176005366 bytes/sec
>>>> XChaCha8-XTS-256 = 228292624 bytes/sec
>>>> XChaCha12-XTS-256 = 195577624 bytes/sec
>>>> XChaCha20-XTS-256 = 152247267 bytes/sec
>>>> XChaCha20-XTS-128 = 152717737 bytes/sec ! 128 bit key has the same speed
>>>> as 256
>>>>
>>>>
>>>> # sector = 4kb
>>>> 3DES-CBC-192 = 22018189 bytes/sec
>>>> AES-CBC-128 = 104097143 bytes/sec
>>>> AES-CBC-256 = 81983833 bytes/sec
>>>> AES-XTS-128 = 78559346 bytes/sec
>>>> AES-XTS-256 = 66047200 bytes/sec
>>>> Blowfish-CBC-128 = 38635464 bytes/sec
>>>> Blowfish-CBC-256 = 38810555 bytes/sec
>>>> Camellia-CBC-128 = 92814510 bytes/sec
>>>> Camellia-CBC-256 = 75949489 bytes/sec
>>>> ChaCha8-XTS-256 = 337336982 bytes/sec
>>>> ChaCha12-XTS-256 = 284740187 bytes/sec
>>>> ChaCha20-XTS-256 = 217326865 bytes/sec
>>>> XChaCha8-XTS-256 = 328424551 bytes/sec
>>>> XChaCha12-XTS-256 = 278579692 bytes/sec
>>>> XChaCha20-XTS-256 = 211660225 bytes/sec
>>>>
>>>> Optimized AES-XTS - speed like AES-CBC:
>>>> AES-XTS-128 = 102841051 bytes/sec
>>>> AES-XTS-256 = 80813644 bytes/sec
>>>
>>> Is this from a different patch or what?  Can you talk more about this?
>>
>> No patch at this moment.
>> After optimizing ChaCha-XTS, I applied the same optimizations to AES-XTS
>> and got this result.
>> All changes were in aes_xts_reinit() and aes_xts_crypt(), plus a slight
>> change to the structure aes_xts_ctx.
>>
>> aes_xts_ctx:
>> u_int8_t tweak[] -> u_int64_t tweak[]
>>
>> aes_xts_reinit -> same as chacha_xts_reinit()
>>
>> aes_xts_crypt -> same as chacha_xts_crypt():
>> block[] - temp buf removed;
>> xor 1 byte -> xor 8 bytes at once;
>> tweak[i] << 1: rotl 1 bit: 1 byte -> 8 bytes;
>> unroll loops;
>
> Ahh, I thought I had done some similar optimizations, but I only did
> them to the aesni version of the routines...  You should use the macro
> above to decide if things are aligned or not...
>
>>
>> Final:
>>
>> struct aes_xts_ctx {
>>     rijndael_ctx key1;
>>     rijndael_ctx key2;
>>     uint64_t tweak[(AES_XTS_BLOCKSIZE / sizeof(uint64_t))];
>> };
>>
>> void
>> aes_xts_reinit(caddr_t key, u_int8_t *iv)
>> {
>>     struct aes_xts_ctx *ctx = (struct aes_xts_ctx *)key;
>>
>>     /*
>>      * Prepare tweak as E_k2(IV). IV is specified as LE representation
>>      * of a 64-bit block number which we allow to be passed in directly.
>>      */
>>     if (ALIGNED_POINTER(iv, uint64_t)) {
>>         ctx->tweak[0] = (*((uint64_t*)(void*)iv));
>>     } else {
>>         bcopy(iv, ctx->tweak, sizeof(uint64_t));
>>     }
>>     /* Convert to LE. */
>>     ctx->tweak[0] = htole64(ctx->tweak[0]);
>
> Hmm... this line bothers me..  I'll need to spend more time reading up
> to decide if it is buggy or not...  Is ctx->tweak in host order? or LE
> order?  I believe it's supposed to be LE order, as it gets passed
> directly to _encrypt..  I'm also not sure if the original code is BE
> clean, which is part of my problem...
>
>>     /* Last 64 bits of IV are always zero */
>>     ctx->tweak[1] = 0;
>>
>>     rijndael_encrypt(&ctx->key2, (uint8_t*)ctx->tweak,
>>         (uint8_t*)ctx->tweak);
>> }
>>
>> static void
>> aes_xts_crypt(struct aes_xts_ctx *ctx, u_int8_t *data, u_int do_encrypt)
>> {
>>     size_t i;
>>     uint64_t crr, tm;
>>
>>     if (ALIGNED_POINTER(data, uint64_t)) {
>>         ((uint64_t*)(void*)data)[0] ^= ctx->tweak[0];
>>         ((uint64_t*)(void*)data)[1] ^= ctx->tweak[1];
>>     } else {
>>         for (i = 0; i < AES_XTS_BLOCKSIZE; i ++)
>>             data[i] ^= ((uint8_t*)ctx->tweak)[i];
>>     }
>>
>>     if (do_encrypt)
>>         rijndael_encrypt(&ctx->key1, data, data);
>>     else
>>         rijndael_decrypt(&ctx->key1, data, data);
>>
>>     if (ALIGNED_POINTER(data, uint64_t)) {
>>         ((uint64_t*)(void*)data)[0] ^= ctx->tweak[0];
>>         ((uint64_t*)(void*)data)[1] ^= ctx->tweak[1];
>>     } else {
>>         for (i = 0; i < AES_XTS_BLOCKSIZE; i ++)
>>             data[i] ^= ((uint8_t*)ctx->tweak)[i];
>>     }
>>
>>     /* Exponentiate tweak */
>>     crr = (ctx->tweak[0] >> ((sizeof(uint64_t) * 8) - 1));
>>     ctx->tweak[0] = (ctx->tweak[0] << 1);
>>
>>     tm = ctx->tweak[1];
>>     ctx->tweak[1] = ((tm << 1) | crr);
>>     crr = (tm >> ((sizeof(uint64_t) * 8) - 1));
>>
>>     if (crr)
>>         ctx->tweak[0] ^= 0x87; /* GF(2^128) generator polynomial. */
>
> Please use the AES_XTS_ALPHA define instead of hardcoding the value..
>
> Thanks.
>
> --
> John-Mark Gurney    Voice: +1 415 225 5579
>
> "All that I will do, has been done, All that I have, has not."
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"

[-- Attachment #2 --]
-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - https://gpgtools.org

iQIcBAEBCgAGBQJUtJOjAAoJECvXQw+IBr2aY+8P/iuzKXskTgHrRJUYnLcL6B9M
JXD/KGCX38n5jEt7wzkv5dlvihnXvYaHxdvA0xQ7ehCEWdBwV4w/lwWnMcl10a1n
SIbzyDtk5diYsbHBKLEQE3uuWXG1dcC08+LS3J4QYz0oYJzdJkVe/8Ci3FSCNhGX
tt2RZJjMikVZcMU9/4TD51zvKbJfWaZOiS6Z/BTU/gWmPx0+HzelbudR8zrs6w3+
0ow8PZE39qaj+RIxHjUhQyHGXRMnGW2ebrX/7nanVTO2j6Hxxip1Kqfc3Aa3wSIx
S2NrL2VCA+vOfAcHqeAFOjAPrnasYivR3Rjw1aJ8u7m7wwn2ZVTSfGgykR+rvuIp
wNWCb7N+487yLTxVH4+xso8hUnxEAJ/rkVQaS44JR3Bm0hGUkDaPZ5obp+7Szu3S
BJAqHLkKn5NqHyXENfKdZQEFYHEot9m9H1gNWXqWSmk/0sed7bC1CjD3LQ3MRCQk
tRjr6REATviqRT/DRKwQ7ldX1GUe3WN6t2ozA4xbxM/H7IGdKkztmZ9p4urnhIgp
3B6NhWzhX7bVkHZbEu/dq8WC8ZQMF+PlfcOTyDb8wl8Dfb9va/+vriV6zOosKOMU
tzAbK/kDgSE/m2Aum1xYlCC1NxW02VfHrEVYGP2YHfA1i9a1fa+yqkR3gMYZjpHE
qLjhXxVTMebG60ru9R84
=IQih
-----END PGP SIGNATURE-----
