Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 27 Aug 2013 19:27:28 -0700
From:      John-Mark Gurney <jmg@funkthat.com>
To:        Ollivier Robert <roberto@keltia.freenix.fr>
Cc:        dinoex@freebsd.org, freebsd-security@freebsd.org, freebsd-ports@freebsd.org
Subject:   Re: security/openssl speed issues
Message-ID:  <20130828022728.GR29777@funkthat.com>
In-Reply-To: <20130827153205.GA48196@roberto02-aw.eurocontrol.fr>
References:  <20130827153205.GA48196@roberto02-aw.eurocontrol.fr>

next in thread | previous in thread | raw e-mail | index | archive | help
Ollivier Robert wrote this message on Tue, Aug 27, 2013 at 17:32 +0200:
> As I got a new machine with the AES-NI crypto extensions, I'm getting interested with it and as you may have seen, I've already merged into stable/9 two changesets for AES-NI support in GELI & cryptodev.
> 
> Now, I'm trying to measure the impact of said AES extentions, I tumbled on a very weird difference in behaviour between our base system openssl and the one in ports.
> 
> /usr/bin/openssl:
> OpenSSL 0.9.8y 5 Feb 2013
> 
> /usr/local/bin/openssl:
> OpenSSL 1.0.1e 11 Feb 2013
> 
> The one is base is not supposed to have cryptodev (and aesni) support at all as it was added apparently in 1.0.1.  Fine.
> 
> 1. Trying to run both on a machine without the AES-NI extensions, I should have similar results in running speed tests but:
> 
> 1181 [17:18] roberto@centre:/usr/ports> /usr/bin/openssl speed aes-256-cbc

This is not a very good way to run the tests...  It turns out that if
you run it this way, it will NOT go through the EVP OpenSSL system, and
instead call a slow AES function directly, which will not be the one that
gets used in real applications...

To use the EVP system, and see what performance OpenSSH and others will
get, use openssl speed -evp aes-256-cbc

Now if you have cryptodev+aesni loaded and my aesni patches applied,
you'll see something like this:
$ openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 928863 aes-256-cbc's in 0.20s
Doing aes-256-cbc for 3s on 64 size blocks: 880075 aes-256-cbc's in 0.28s
Doing aes-256-cbc for 3s on 256 size blocks: 775018 aes-256-cbc's in 0.20s
Doing aes-256-cbc for 3s on 1024 size blocks: 490425 aes-256-cbc's in 0.09s
Doing aes-256-cbc for 3s on 8192 size blocks: 102189 aes-256-cbc's in 0.05s
OpenSSL 1.0.1e-freebsd 11 Feb 2013
built on: Tue Aug 27 11:52:46 PDT 2013
options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) 
compiler: cc -O
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc      76092.46k   200265.96k  1015831.59k  5843725.96k 17858822.14k

Man, 17GB/sec!   Impressive!  Except not... notice the times above.. it
only took .05s to do it..  That's because by default, openssl speed only
computes user time, not real time and since it's now properly using
cryptodev, not much cpu time is spent in the process...  The
undocumented -elapsed option to the rescue:
$ openssl speed -elapsed -evp aes-256-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 962245 aes-256-cbc's in 3.01s
Doing aes-256-cbc for 3s on 64 size blocks: 918001 aes-256-cbc's in 3.02s
Doing aes-256-cbc for 3s on 256 size blocks: 799186 aes-256-cbc's in 3.05s
Doing aes-256-cbc for 3s on 1024 size blocks: 496954 aes-256-cbc's in 3.02s
Doing aes-256-cbc for 3s on 8192 size blocks: 102473 aes-256-cbc's in 3.00s
OpenSSL 1.0.1e-freebsd 11 Feb 2013
built on: Tue Aug 27 11:52:46 PDT 2013
options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) 
compiler: cc -O
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc       5118.64k    19432.21k    67148.02k   168312.03k   279819.61k

This gives a bit more resonable results...

And if you specify -decrypt also, you'll see it's faster because
w/ my aesni patches, it pipelines cbc decrypt:
$ openssl speed -elapsed -decrypt -evp aes-256-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 941128 aes-256-cbc's in 3.02s
Doing aes-256-cbc for 3s on 64 size blocks: 950875 aes-256-cbc's in 3.03s
Doing aes-256-cbc for 3s on 256 size blocks: 922503 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 750362 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 208038 aes-256-cbc's in 3.03s
OpenSSL 1.0.1e-freebsd 11 Feb 2013
built on: Tue Aug 27 11:52:46 PDT 2013
options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) 
compiler: cc -O
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc       4993.34k    20076.21k    78720.26k   256123.56k   562225.91k

So, the good news is that cryptodev does properly work for OpenSSL...

And even better news (if you unload cryptodev), OpenSSL does work with
AESNI:
$ openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 56809161 aes-256-cbc's in 3.02s
Doing aes-256-cbc for 3s on 64 size blocks: 15626726 aes-256-cbc's in 3.01s
Doing aes-256-cbc for 3s on 256 size blocks: 4289939 aes-256-cbc's in 3.04s
Doing aes-256-cbc for 3s on 1024 size blocks: 1090334 aes-256-cbc's in 3.02s
Doing aes-256-cbc for 3s on 8192 size blocks: 136649 aes-256-cbc's in 2.99s
OpenSSL 1.0.1e-freebsd 11 Feb 2013
built on: Tue Aug 27 11:52:46 PDT 2013
options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) 
compiler: cc -O
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc     300633.49k   332504.26k   361369.46k   370239.01k   374117.13k

and decrypt is even faster:
$ openssl speed -decrypt -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 61585473 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 60938441 aes-256-cbc's in 3.02s
Doing aes-256-cbc for 3s on 256 size blocks: 34632152 aes-256-cbc's in 3.01s
Doing aes-256-cbc for 3s on 1024 size blocks: 9779213 aes-256-cbc's in 3.03s
Doing aes-256-cbc for 3s on 8192 size blocks: 1320658 aes-256-cbc's in 3.05s
OpenSSL 1.0.1e-freebsd 11 Feb 2013
built on: Tue Aug 27 11:52:46 PDT 2013
options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) 
compiler: cc -O
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc     328455.86k  1289942.40k  2947600.93k  3303559.29k  3550795.60k

Just for people wondering what CPU:
CPU: AMD A10-5700 APU with Radeon(tm) HD Graphics    (3393.89-MHz K8-class CPU)

I guess now we need to figure out how to teach OpenSSL to use AES-NI
natively even when /dev/crypto is available...

but at least we did solve the (non-)issue of bad OpenSSL performance...

I will submit a patch to OpenSSL to not make the documentation of the
-elapsed option dependent on defines...

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20130828022728.GR29777>