Date:      Thu, 28 Jan 2021 14:33:14 -0800
From:      Neel Chauhan <nc@freebsd.org>
To:        Mark Johnston <markj@freebsd.org>
Cc:        Rick Macklem <rmacklem@uoguelph.ca>, Ronald Klop <ronald-lists@klop.ws>, freebsd-current@freebsd.org
Subject:   Re: Can In-Kernel TLS (kTLS) work with any OpenSSL Application?
Message-ID:  <4bc84eff95bbd6afbbafc13ce5bf32db@freebsd.org>
In-Reply-To: <YBG5RJqqeDTOzlOr@raichu>
References:  <bd56c9d3711738d65a074d73c04addd2@freebsd.org> <op.0xoawf2bkndu52@joepie> <YQXPR0101MB0968D75B9A846C4F91461A7DDDBF0@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <YBG5RJqqeDTOzlOr@raichu>

Hi Mark,

Thank you so much for your response describing how QAT encryption works.

I learned that my server (HPE ProLiant ML110 Gen10) does not have QAT, 
mainly because the chipset (Intel C621) doesn't enable it.

For reference, my firewall box (Intel D-1518-based HPE ProLiant EC200a) 
probably does, but I'm not going to use it for Tor.

Tor uses 512-byte packets (a.k.a. "cells"), so even if I had QAT it
may not work well, not to mention that Tor is single-threaded.

I think I'll stick with kTLS with AES-NI when 13.0-RELEASE is out. In the
worst case, I'll buy an AMD Ryzen-based PC and move my Tor servers to it
(assuming the latest Ryzen beats a Skylake Xeon Scalable in single-thread
performance).

-Neel

On 2021-01-27 11:04, Mark Johnston wrote:
> On Sat, Jan 23, 2021 at 03:25:59PM +0000, Rick Macklem wrote:
>> Ronald Klop wrote:
>> >On Wed, 20 Jan 2021 21:21:15 +0100, Neel Chauhan <nc@freebsd.org> wrote:
>> >
>> >> Hi freebsd-current@,
>> >>
>> >> I know that In-Kernel TLS was merged into the FreeBSD HEAD tree a while
>> >> back.
>> >>
>> >> With 13.0-RELEASE around the corner, I'm thinking about upgrading my
>> >> home server, that is, if it can accelerate any SSL application.
>> >>
>> >> I'm asking because I have a home server on a symmetrical Gigabit
>> >> connection (Google Fiber/Webpass), and that server runs a Tor relay. If
>> >> you're interested in how Tor works, the EFF has a writeup:
>> >> https://www.eff.org/pages/what-tor-relay
>> >>
>> >> But the main point for you all is: Tor relays more or less deal with
>> >> 1000s of TLS connections going into and out of the server.
>> >>
>> >> Would In-Kernel TLS help with an application like Tor (or even load
>> >> balancers/TLS termination), or is it more for things like web servers
>> >> sending static files via sendfile() (e.g. the CDN used by Netflix)?
>> >>
>> >> My server could also work with Intel's QuickAssist (since it has an
>> >> Intel Xeon "Scalable" CPU). Would QuickAssist SSL be more helpful here?
>> There is now qat(4), which KTLS should be able to use, but I do
>> not think it has been tested for this. I also have no idea
>> whether it can be used effectively for userland encryption.
> 
> KTLS requires support for separate output buffers and AAD buffers, which
> I hadn't implemented in the committed driver.  I have a working patch
> which adds that, so when that's committed qat(4) could in principle be
> used with KTLS.  So far I only tested with /dev/crypto and a couple of
> debug sysctls used to toggle between the different cryptop buffer
> layouts, not with KTLS proper.
> 
> qat(4) can be used by userspace via cryptodev(4).  This comes with a
> fair bit of overhead since it involves a round-trip through the kernel
> and some extra copying.  AFAIK we don't have any framework for exposing
> crypto devices directly to userspace, akin to DPDK's polling mode
> drivers or netmap.
> 
> I've seen a few questions about the comparative (dis)advantages of QAT
> and AES-NI so I'll sidetrack a bit and try to characterize qat(4)'s
> performance here based on some microbenchmarking I did this week.  This
> was all done in the kernel and so might need some qualification if
> you're interested in using qat(4) from userspace.  Numbers below are
> gleaned from an Atom C3558 at 2.2GHz with an integrated QAT device.  I
> mostly tested AES-CBC-256 and AES-GCM-256.
> 
> The high-level tradeoffs are:
> - qat(4) introduces a lot of latency.  For a single synchronous
>   operation it can take between 2x and 100x more time than aesni(4) to
>   complete.  aesni takes 1000-2000 cycles to handle a request plus
>   3-5 cycles per byte depending on the algorithm.  qat takes at least
>   ~150,000 cycles between calling crypto_dispatch() and the cryptop
>   completion callback, plus 5-8 cycles per byte.  qat dispatch itself is
>   quite cheap, typically 1000-2000 cycles depending on the size of the
>   buffer.  Handling a completion interrupt involves a context switch to
>   the driver ithread but this is also a small cost relative to the
>   entire operation.  So, for anything where latency is crucial QAT is
>   probably not a great bet.
> - qat can save a correspondingly large number of CPU cycles.  It takes
>   qat roughly twice as long as aesni to complete encryption of a 32KB
>   buffer using AES-CBC-256 (more with GCM), but with qat the CPU is idle
>   much of the time.  Dispatching the request to firmware takes less than
>   1% of the total time elapsed between request dispatch and completion,
>   even with small buffers.  OTOH with really small buffers aesni can
>   complete a request in the time that it takes qat just to dispatch the
>   request to the device, so at best qat will give comparable throughput
>   and CPU usage and worse latency.
> - qat can handle multiple requests in parallel.  This can improve
>   throughput dramatically if the producer can keep qat busy.
>   Empirically, the maximum throughput improvement is a function of the
>   request size.  For example, counting the number of cycles required to
>   encrypt 100,000 buffers using AES-GCM-256:
> 
>   max # in flight       1        16       64        128
> 
>   aesni, 16B           206M     n/a      n/a        n/a
>   aesni, 4KB          1.52B     n/a      n/a        n/a
>   aesni, 32KB         10.8B     n/a      n/a        n/a
>   qat,   16B          17.1B   1.11B     219M       184M
>   qat,   4KB          20.9B   1.68B     710M       694M
>   qat,   32KB         38.2B   8.37B    4.25B      4.23B
> 
>   As a side note, OpenCrypto supports async dispatch for software crypto
>   drivers, in which crypto_dispatch() hands work off to other threads.
>   This is enabled by net.inet.ipsec.async_crypto, for example.  Of
>   course, the maximum parallelism is limited by the number of CPUs in
>   the system, but this can improve throughput significantly as well if
>   you're willing to spend the corresponding CPU cycles.
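
The per-request figures quoted above can be turned into a rough cost
model. The sketch below is only an illustration: it assumes the midpoints
of the quoted ranges (the constant names are made up here, and the real
numbers depend on the algorithm and hardware), and it models a single
synchronous request with nothing else in flight.

```python
# Rough single-request cycle model from the figures quoted above.
# All constants are assumed midpoints of the quoted ranges, not measurements.
AESNI_FIXED = 1500       # quoted: 1000-2000 cycles per request
AESNI_PER_BYTE = 4.0     # quoted: 3-5 cycles per byte
QAT_FIXED = 150_000      # quoted: at least ~150,000 cycles dispatch-to-callback
QAT_PER_BYTE = 6.5       # quoted: 5-8 cycles per byte
CLOCK_HZ = 2.2e9         # Atom C3558 clock mentioned in the thread


def aesni_cycles(nbytes):
    """Estimated cycles for one synchronous aesni request."""
    return AESNI_FIXED + AESNI_PER_BYTE * nbytes


def qat_cycles(nbytes):
    """Estimated cycles between dispatch and completion for one qat request."""
    return QAT_FIXED + QAT_PER_BYTE * nbytes


def latency_ratio(nbytes):
    """How many times longer a single qat request takes than aesni."""
    return qat_cycles(nbytes) / aesni_cycles(nbytes)


if __name__ == "__main__":
    for size in (16, 4096, 32768):
        usec = qat_cycles(size) / CLOCK_HZ * 1e6
        print(f"{size:>6} B: qat/aesni latency ratio {latency_ratio(size):6.1f}, "
              f"qat latency ~{usec:.0f} us")
```

With these assumed midpoints the ratio falls inside the quoted "between
2x and 100x" range: close to the upper end for 16-byte requests and
between 2x and 3x for 32KB requests.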
> 
> To summarize, QAT can be beneficial when some or all of the following
> apply:
> 1. You have large requests.  qat can give comparable throughput for
>    small requests if the producer can exploit parallelism in qat, though
>    OpenCrypto's backpressure mechanism is really primitive (arguably
>    non-existent) and performance will tank if things get to a point
>    where qat can't keep up.
> 2. You're able to dispatch requests in parallel.  But see point 1.
> 3. CPU cycles are precious and the extra latency is tolerable.
> 3b. aesni doesn't implement some transform that you care about, but qat
>     does.  Some (most?) Xeons don't implement the SHA extensions for
>     instance.  I don't have a sense for how the plain cryptosoft driver
>     performs relative to aesni though.
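
To make points 1 and 2 concrete, the AES-GCM-256 table above can be
converted into speedups and throughput. This is a sketch only: it copies
the totals from the table and assumes they are elapsed cycles at the
2.2GHz clock mentioned for the test machine, which the thread does not
state explicitly. The function and variable names are made up for this
example.

```python
# Totals (cycles to encrypt 100,000 buffers) copied from the table above.
BUFFERS = 100_000
CLOCK_HZ = 2.2e9  # assumption: totals are elapsed cycles at this clock

AESNI_TOTAL = {16: 206e6, 4096: 1.52e9, 32768: 10.8e9}
QAT_TOTAL = {
    16:    {1: 17.1e9, 16: 1.11e9, 64: 219e6,  128: 184e6},
    4096:  {1: 20.9e9, 16: 1.68e9, 64: 710e6,  128: 694e6},
    32768: {1: 38.2e9, 16: 8.37e9, 64: 4.25e9, 128: 4.23e9},
}


def qat_speedup(size, in_flight):
    """aesni total cycles divided by qat total cycles (>1 means qat wins)."""
    return AESNI_TOTAL[size] / QAT_TOTAL[size][in_flight]


def throughput_mb_s(total_cycles, size):
    """Payload throughput in MB/s implied by a table entry."""
    seconds = total_cycles / CLOCK_HZ
    return BUFFERS * size / seconds / 1e6


if __name__ == "__main__":
    for size, by_depth in QAT_TOTAL.items():
        for depth in by_depth:
            print(f"{size:>6} B, {depth:>3} in flight: "
                  f"qat speedup over aesni {qat_speedup(size, depth):5.2f}x, "
                  f"{throughput_mb_s(by_depth[depth], size):8.1f} MB/s")
```

Under that assumption, qat only overtakes aesni once many requests are in
flight, and the gain is larger for 4KB and 32KB buffers than for 16-byte
ones, which matches the summary above.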




