From owner-freebsd-arch@freebsd.org  Thu Jan 11 00:19:04 2018
Return-Path: <owner-freebsd-arch@freebsd.org>
Delivered-To: freebsd-arch@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 758EDE7FCD1
 for <freebsd-arch@mailman.ysv.freebsd.org>;
 Thu, 11 Jan 2018 00:19:04 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3])
 by mx1.freebsd.org (Postfix) with ESMTP id 621EB74547
 for <freebsd-arch@freebsd.org>; Thu, 11 Jan 2018 00:19:04 +0000 (UTC)
 (envelope-from jhb@freebsd.org)
Received: by mailman.ysv.freebsd.org (Postfix)
 id 5E3BAE7FCD0; Thu, 11 Jan 2018 00:19:04 +0000 (UTC)
Delivered-To: arch@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5DD24E7FCCF
 for <arch@mailman.ysv.freebsd.org>; Thu, 11 Jan 2018 00:19:04 +0000 (UTC)
 (envelope-from jhb@freebsd.org)
Received: from mail.baldwin.cx (bigwig.baldwin.cx [96.47.65.170])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 355F874546
 for <arch@freebsd.org>; Thu, 11 Jan 2018 00:19:03 +0000 (UTC)
 (envelope-from jhb@freebsd.org)
Received: from ralph.baldwin.cx (astound-66-234-199-215.ca.astound.net
 [66.234.199.215])
 by mail.baldwin.cx (Postfix) with ESMTPSA id 8A0C610A7DB
 for <arch@freebsd.org>; Wed, 10 Jan 2018 19:19:01 -0500 (EST)
From: John Baldwin <jhb@freebsd.org>
To: arch@freebsd.org
Subject: Ranting about OCF / crypto(9)
Date: Wed, 10 Jan 2018 16:18:52 -0800
Message-ID: <3790717.UIxaijsHl3@ralph.baldwin.cx>
User-Agent: KMail/4.14.10 (FreeBSD/11.1-STABLE; KDE/4.14.30; amd64; ; )
MIME-Version: 1.0
Content-Transfer-Encoding: 7Bit
Content-Type: text/plain; charset="us-ascii"
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.4.3
 (mail.baldwin.cx); Wed, 10 Jan 2018 19:19:01 -0500 (EST)
X-Virus-Scanned: clamav-milter 0.99.2 at mail.baldwin.cx
X-Virus-Status: Clean
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.25
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 11 Jan 2018 00:19:04 -0000

While working on hooking the ccr(4) driver into our in-kernel crypto
framework (along with some out-of-tree patches to extend OpenSSL's
/dev/crypto engine to support AES-CTR/XTS/GCM and some further changes to
do zero-copy), I've run into several bumps / oddities in OCF.  I'm probably
going to miss several of them, but here's at least a start of a list of
things.  In some cases I have some suggestions on improvements.

I will try to start with more broad / higher-level items first before
diving into minutiae:

- OCF is over flexible and overly broad.  Rather than supporting
  arbitrary stacking of transforms (and arbitrary depths), I think we
  should probably aim to support more specific workloads and support
  them well.  To my mind the classes of things we should support are
  probably:

  - Simple block cipher requests.
  - Simple "hash a buffer" requests. (Both HMAC and non-HMAC)
  - IPSec-style requests (combined auth and encryption using
    "encrypt-then-mac" with an optional AAD region before the
    ciphertext).  Note that geli requests fall into this type.
  - TLS-style requests (using TLS's different methods of
    combining auth and encryption methods when those are
    separate)
  - Simple compression / decompression requests.  While this isn't
    "crypto", per se, I do think it is probably still simpler to
    manage this via OCF than a completely separate interface.

  In terms of algorithms, I suspect there are some older algorithms
  we could drop.  Modern hardware doesn't offload DES for example.
  Both ccr(4) and aesni(4) only support AES for encryption.  We
  do need to keep algorithms required for IPSec in the kernel, but
  we could probably drop some others?

- To better support OpenSSL's engine, the /dev/crypto hash interface
  should not require monotonic buffers, but support requests for
  large buffers that span multiple requests (so you can do something
  akin to the 'Init' / 'Update' (N times) / 'Final' model existing
  software hashing APIs use).  In particular, the bigger win for
  hashing in hardware is when you can offload the hashing of a large
  thing rather than small requests.

- To better support OpenSSL's engine, the /dev/crypto hash interface
  should support "plain" hash algorithms such as SHA* without an
  HMAC.  By default OpenSSL's engine interface does the HMAC-specific
  bits (generating pads, etc.) in software and only defers to the
  engine for the raw hash (e.g. if you use the HMAC() function from
  libcrypto it will only ask the engine interface for a raw hash,
  not for an HMAC hash).

- To better support OpenSSL's engine, the /dev/crypto cipher
  interface should also support non-monolithic buffers.  The existing
  engine does this now by copying the last block of the output data
  out as a saved IV to use for a subsequent request, but it might be
  nicer to be more formal here and return the IV to userland for
  non-"final" cipher requests.

- The interface between the crypto layer and backend drivers should
  _not_ use integer session IDs.  This is rediculously dumb and
  inefficient.  All the drivers have silly algorithms to try to manage
  growable arrays that can be indexed by the returned session ID.
  Instead, drivers should be able to return a 'void *' cookie when
  creating a session and get that cookie pointer as an argument to
  the 'process' and 'freesession' callbacks.  Imagine if vnodes used
  an i-node number rather than 'v_data' and you'd have the model OCF
  uses.  I don't mind if we have a kind of generic 'session' structure
  that we export to drivers and pass in the callbacks and the drivers
  get to use a 'foo_data' member of.

- The interface to describe crypto requests needs to move away from
  arbitrary linked lists of descriptors.  We should just have a
  single "session" structure that assumes you have one cipher and
  one auth with a "mode" member to indicate the particular direction
  / combination.  Likewise, the description of a request needs to
  have a similar assumption.  The structures used by the /dev/crypto
  ioctl's are a bit closer to what I think we should use compared to
  the linked-list thing we have now.  Related is that we should be
  able to get rid of having the three separate "algorithms" for GCM
  hashes.  For AES-GCM one would just say they are using AES-GCM
  and both the hash/tag and ciphertext would be valid inputs / outputs
  with a single key.

- To support non-monolithic buffers from the OpenSSL engine, crypto
  requests to drivers also have to support non-monolithic buffers.
  This means having a notion of a buffer that may be at the start,
  middle, or end of a larger transformation (e.g. for hash only the
  start gets the IPAD, only the end gets the OPAD and returns a
  valid hash, etc., whereas for ciphers any non-end requests would
  return the IV to use for the next request).

  For drivers that have buffer size limits, it would be nice to expose
  those limits in the driver capabilities and depend on the upper layer
  to "split" requests such as happens now for disk drivers.

- For hashing algorithms we should support a "verify" mode in addition
  to the current "compute" mode.  The verify mode would accept a block
  of data to hash along with an expected mac and return a success
  / failure rather than an computed hash value.  AES-GCM already works
  this way for decryption, but this would extend that mode for other
  hash algorithms (e.g. AES-CBC+SHA2-256-HMAC).  Existing crypto
  co-processors (e.g. ccr(4)) already support these types of requests.

  Related is that we need to fix IPSec to treat EBADMSG errors from
  descryption as auth failure rather than encryption failure (right
  now AES-GCM auth failures are reported incorrectly in netstat -s
  due to this).

- Sessions for a combined cipher + hash should also be tied to a
  specific way of combining the algorithms.  Right now you can
  create a session for AES-CBC with a SHA hash and the driver has no
  way to know if you are going to do encrypt-then-mac or one of the
  other variants.  We should include this in the session (so a given
  session can only be used for one type which is normally true anyway),
  and drivers can then only claim to support combinations they
  support.

- The CRD_F_IV_PRESENT flag should be removed and replaced with
  a CRD_F_IV_INJECT flag which means "inject the IV".  Right now
  the _lack_ of CRD_F_IV_PRESENT for encryption (but not decryption!)
  requests means "inject the IV".  It would be clearer to just have
  a flag that is only set when you want the driver to take the
  action.

- Speaking of IV handling, drivers have to do some extra handling for
  IVs including possibly generating them.  I think the idea is that
  some co-processors might support generating IVs, but most of the
  drivers I've looked at just end up duplicating the same block of
  code to call arc4rand() for encryption requests without
  CRD_F_IV_EXPLICIT.  I don't believe Linux tries to support this and
  instead always supplies an IV to the driver.  I'd rather we do that
  and only depend on a flag to indicate where the IV is (crd_iv vs
  in the buffer).

- The API for copying data to/from crypto buffers is a bit obtuse and
  limiting.  Rather than accepting the crypto operation ('crp') as
  a parameter to describe the crypto buffer, the crypto_copyback()
  and crypto_copydata() functions accept various members of that
  function explicitly (e.g. crp_flags and crp_buf).  However, in my
  experiments with zero-copy AES-GCM via /dev/crypto and OpenSSL it
  was convenient to store the AAD in a KVA buffer in the 'crp' and
  the payload to transform in an array of VM pages.  However, for
  this model 'crp_buf' is useless.  I ended up adding a wrapper API
  'crypto_copyto' and 'crypto_copyfrom' which accept a 'crp' directly.
  Linux's API actually passes something akin to sglist as the
  description of the buffers in a crypto request.

- We need to not treat accelerated software (e.g. AES-NI) as a
  hardware interface.  Right now OCF's model of priorities when
  trying to choose a backend driver for a session only has two
  "levels" software vs hardware and aesni(4) (and the ARMv8 variant)
  are lumped into the hardware bucket so that they have precedence
  over the "dumb" software implementation.  However, the accelerated
  software algorithms do need some of the same support features of
  the "dumb" software implementation (such as being scheduled on a
  thread pool to use CPU cycles) that are not needed by other "hardware"
  engines.  OCF needs to understand this distinction.

- Somewhat related, we should try to use accelerated software when
  possible (e.g. AES-CBC with SHA) doesn't use AES-NI unless the
  CPU supports accelerated SHA.  Ideally for this case we'd still
  use AES-NI for the AES portion along with the software SHA
  implementation (and we'd do it one pass over the data rather than
  two when possible).

- Sometimes a crypto driver might need to defer certain requests to
  software (e.g. ccr(4) has to do this for some GCM requests).  In
  addition, there are some other cases when we might want requests
  from a single session to be sent to different backends (e.g. you
  may want to use accelerated software for requests below a certain
  size, and a crypto engine for larger requests.  You might also want
  to take NUMA into account when choosing which backend crypto engine
  to dispatch a request to.)  To that end, I think we want to have the
  ability for a single OCF session to support multiple backend
  sessions.

  One use case is that if I as a driver can't handle a request I'd like
  to be able to fail it with a special error code and have the crypto
  later fall back to software for me (and to use accelerated software if
  possible).  Right now ccr(4) duplicates the "dumb" software for GCM
  requests it can't handle explicitly.

  Another use case might be failover if a hardware engine experiences
  a hardware failure.  In theory it should be possible to fail over
  to a different driver at that point including resubmitting pending
  requests that weren't completed, and it should be possible (I think)
  to manage this in the crypto framework rather than in consumers like
  IPSec and GELI.

  Load distribution among backends might be another case to consider
  (e.g. GELI or ZFS encryption once that lands) if you have long-
  running sessions that spawn lots of self-contained requests.

  Note that if we want to spawn additional backend sessions on the fly
  (e.g. only create a software fallback session on demand if a driver
  fails a request with the "use software" magic error code), we will
  have to keep per-session state such as keys around.  We probably
  already do that now, but this would definitely require doing that.

One concern with some of these changes is that there are several drivers
in the tree for older hardware that I'm not sure is really used anymore.
That is an impediment to making changes to the crypto <-> driver interface
if we can't find folks willing to at least test changes to those drivers
if not maintain them.

This is all I could think of today.  What do other folks think?

-- 
John Baldwin