Date: Thu, 28 Jan 2021 14:33:14 -0800
From: Neel Chauhan <nc@freebsd.org>
To: Mark Johnston <markj@freebsd.org>
Cc: Rick Macklem <rmacklem@uoguelph.ca>, Ronald Klop <ronald-lists@klop.ws>, freebsd-current@freebsd.org
Subject: Re: Can In-Kernel TLS (kTLS) work with any OpenSSL Application?
Message-ID: <4bc84eff95bbd6afbbafc13ce5bf32db@freebsd.org>
In-Reply-To: <YBG5RJqqeDTOzlOr@raichu>
References: <bd56c9d3711738d65a074d73c04addd2@freebsd.org> <op.0xoawf2bkndu52@joepie> <YQXPR0101MB0968D75B9A846C4F91461A7DDDBF0@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <YBG5RJqqeDTOzlOr@raichu>
Hi Mark,

Thank you so much for your response describing how QAT encryption works.

I learned that my server (HPE ProLiant ML110 Gen10) does not have QAT,
mainly because its chipset (Intel C621) doesn't enable it. For
reference, my firewall box (an Intel D-1518-based HPE ProLiant EC200a)
probably does, but I'm not going to use it for Tor.

Tor uses 512-byte packets (a.k.a. "cells"), so even if I had QAT it may
not work well: per your numbers, small requests with little parallelism
are exactly where qat loses to aesni, and Tor is single-threaded.

I think I'll stick with kTLS with AES-NI when 13.0-RELEASE is out. Worst
case, I'll buy an AMD Ryzen-based PC and offload my Tor servers to it
(assuming the latest Ryzen beats Skylake Xeon Scalable in single-thread
performance).

-Neel

On 2021-01-27 11:04, Mark Johnston wrote:
> On Sat, Jan 23, 2021 at 03:25:59PM +0000, Rick Macklem wrote:
>> Ronald Klop wrote:
>> >On Wed, 20 Jan 2021 21:21:15 +0100, Neel Chauhan <nc@freebsd.org> wrote:
>> >
>> >> Hi freebsd-current@,
>> >>
>> >> I know that In-Kernel TLS was merged into the FreeBSD HEAD tree a
>> >> while back.
>> >>
>> >> With 13.0-RELEASE around the corner, I'm thinking about upgrading my
>> >> home server, provided I can accelerate any SSL application.
>> >>
>> >> I'm asking because I have a home server on a symmetrical Gigabit
>> >> connection (Google Fiber/Webpass), and that server runs a Tor relay.
>> >> If you're interested in how Tor works, the EFF has a writeup:
>> >> https://www.eff.org/pages/what-tor-relay
>> >>
>> >> But the main point for you all is: more or less, Tor relays deal
>> >> with thousands of TLS connections going into and out of the server.
>> >>
>> >> Would In-Kernel TLS help with an application like Tor (or even load
>> >> balancers/TLS termination), or is it more for things like web
>> >> servers sending static files via sendfile() (e.g. the CDN used by
>> >> Netflix)?
>> >>
>> >> My server could also work with Intel's QuickAssist (since it has an
>> >> Intel Xeon "Scalable" CPU). Would QuickAssist SSL be more helpful
>> >> here?
>> There is now qat(4), which KTLS should be able to use, but I do
>> not think it has been tested for this. I also have no idea
>> whether it can be used effectively for userland encryption.
>
> KTLS requires support for separate output buffers and AAD buffers,
> which I hadn't implemented in the committed driver. I have a working
> patch which adds that, so once that's committed qat(4) could in
> principle be used with KTLS. So far I have only tested with /dev/crypto
> and a couple of debug sysctls used to toggle between the different
> cryptop buffer layouts, not with KTLS proper.
>
> qat(4) can be used by userspace via cryptodev(4). This comes with a
> fair bit of overhead since it involves a round-trip through the kernel
> and some extra copying. AFAIK we don't have any framework for exposing
> crypto devices directly to userspace, akin to DPDK's polling mode
> drivers or netmap.
>
> I've seen a few questions about the comparative (dis)advantages of QAT
> and AES-NI, so I'll sidetrack a bit and try to characterize qat(4)'s
> performance here based on some microbenchmarking I did this week. This
> was all done in the kernel and so might need some qualification if
> you're interested in using qat(4) from userspace. Numbers below are
> gleaned from an Atom C3558 at 2.2GHz with an integrated QAT device. I
> mostly tested AES-CBC-256 and AES-GCM-256.
>
> The high-level tradeoffs are:
> - qat(4) introduces a lot of latency. For a single synchronous
>   operation it can take between 2x and 100x more time than aesni(4) to
>   complete. aesni takes 1000-2000 cycles to handle a request plus 3-5
>   cycles per byte, depending on the algorithm. qat takes at least
>   ~150,000 cycles between calling crypto_dispatch() and the cryptop
>   completion callback, plus 5-8 cycles per byte.
>   qat dispatch itself is quite cheap, typically 1000-2000 cycles
>   depending on the size of the buffer. Handling a completion interrupt
>   involves a context switch to the driver ithread, but this is also a
>   small cost relative to the entire operation. So, for anything where
>   latency is crucial, QAT is probably not a great bet.
> - qat can save a correspondingly large number of CPU cycles. It takes
>   qat roughly twice as long as aesni to complete encryption of a 32KB
>   buffer using AES-CBC-256 (more with GCM), but with qat the CPU is
>   idle much of the time. Dispatching the request to firmware takes
>   less than 1% of the total time elapsed between request dispatch and
>   completion, even with small buffers. OTOH, with really small buffers
>   aesni can complete a request in the time that it takes qat just to
>   dispatch the request to the device, so at best qat will give
>   comparable throughput and CPU usage, and worse latency.
> - qat can handle multiple requests in parallel. This can improve
>   throughput dramatically if the producer can keep qat busy.
>   Empirically, the maximum throughput improvement is a function of the
>   request size. For example, counting the number of cycles required to
>   encrypt 100,000 buffers using AES-GCM-256 (row labels are buffer
>   sizes in bytes; cells are total cycles, M = 10^6, B = 10^9):
>
>   max # in flight         1       16       64      128
>
>   aesni, 16B           206M      n/a      n/a      n/a
>   aesni, 4KB          1.52B      n/a      n/a      n/a
>   aesni, 32KB         10.8B      n/a      n/a      n/a
>   qat, 16B            17.1B    1.11B     219M     184M
>   qat, 4KB            20.9B    1.68B     710M     694M
>   qat, 32KB           38.2B    8.37B    4.25B    4.23B
>
> As a side note, OpenCrypto supports async dispatch for software crypto
> drivers, in which crypto_dispatch() hands work off to other threads.
> This is enabled by net.inet.ipsec.async_crypto, for example. Of
> course, the maximum parallelism is limited by the number of CPUs in
> the system, but this can improve throughput significantly as well if
> you're willing to spend the corresponding CPU cycles.
>
> To summarize, QAT can be beneficial when some or all of the following
> apply:
> 1. You have large requests. qat can give comparable throughput for
>    small requests if the producer can exploit parallelism in qat,
>    though OpenCrypto's backpressure mechanism is really primitive
>    (arguably non-existent) and performance will tank if things get to
>    a point where qat can't keep up.
> 2. You're able to dispatch requests in parallel. But see point 1.
> 3. CPU cycles are precious and the extra latency is tolerable.
> 3b. aesni doesn't implement some transform that you care about, but
>     qat does. Some (most?) Xeons don't implement the SHA extensions,
>     for instance. I don't have a sense for how the plain cryptosoft
>     driver performs relative to aesni, though.