Date: Tue, 13 Mar 2018 23:05:51 +0000 (UTC) From: John Baldwin <jhb@FreeBSD.org> To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: svn commit: r330884 - in head/sys: dev/cxgbe dev/cxgbe/firmware dev/cxgbe/tom modules/cxgbe/tom Message-ID: <201803132305.w2DN5pnc010768@repo.freebsd.org>
Author: jhb
Date: Tue Mar 13 23:05:51 2018
New Revision: 330884
URL: https://svnweb.freebsd.org/changeset/base/330884

Log:
  Support for TLS offload of TOE connections on T6 adapters.

  The TOE engine in Chelsio T6 adapters supports offloading of TLS
  encryption and TCP segmentation for offloaded connections.  Sockets
  using TLS are required to use a set of custom socket options to upload
  RX and TX keys to the NIC and to enable RX processing.  Currently
  these socket options are implemented as TCP options in the vendor
  specific range.  A patched OpenSSL library will be made available in a
  port / package for use with the TLS TOE support.

  TOE sockets can either offload both transmit and reception of TLS
  records or just transmit.  TLS offload (both RX and TX) is enabled by
  setting the dev.t6nex.<x>.tls sysctl to 1 and requires TOE to be
  enabled on the relevant interface.  Transmit offload can be used on
  any "normal" or TLS TOE socket by using the custom socket option to
  program a transmit key.  This permits most TOE sockets to
  transparently offload TLS when applications use a patched SSL library
  (e.g. using LD_LIBRARY_PATH to request use of a patched OpenSSL
  library).  Receive offload can only be used with TOE sockets using the
  TLS mode.  The dev.t6nex.0.toe.tls_rx_ports sysctl can be set to a
  list of TCP port numbers.  Any connection with either a local or
  remote port number in that list will be created as a TLS socket rather
  than a plain TOE socket.  Note that although this sysctl accepts an
  arbitrary list of port numbers, the sysctl(8) tool is only able to set
  sysctl nodes to a single value.  A TLS socket will hang without
  receiving data if used by an application that is not using a patched
  SSL library.  Thus, the tls_rx_ports node should be used with care.
  For a server mostly concerned with offloading TLS transmit, this node
  is not needed as plain TOE sockets will fall back to software crypto
  when using an unpatched SSL library.
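  Because sysctl(8) can only write a single value, a list of several
  ports would have to be written by a small program.  The following is a
  minimal sketch, not part of this commit, assuming the node accepts a
  plain array of int as the handler in t4_main.c suggests.  The helper
  names check_tls_rx_ports() and set_tls_rx_ports() and the constant
  TLS_RX_PORT_MAX are hypothetical; only sysctlbyname(3) is a real
  interface.  The validation mirrors the handler: ports must be in
  1..IPPORT_MAX, and a single -1 clears the list.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Hypothetical constant mirroring the IPPORT_MAX bound used by the handler. */
#define TLS_RX_PORT_MAX	65535

/*
 * Apply the same checks as sysctl_tls_rx_ports(): every port must be in
 * 1..IPPORT_MAX, except that a lone -1 is accepted and clears the list.
 * Rejecting an empty list is a choice made for this sketch.
 */
int
check_tls_rx_ports(const int *ports, size_t n)
{
	size_t i;

	assert(ports != NULL);
	if (n == 0)
		return (EINVAL);
	if (n == 1 && ports[0] == -1)
		return (0);
	for (i = 0; i < n; i++) {
		if (ports[i] < 1 || ports[i] > TLS_RX_PORT_MAX)
			return (EINVAL);
	}
	return (0);
}

/*
 * Write the whole list in one call.  Returns 0 or an errno value.
 * On systems without sysctlbyname(3) this only validates the input.
 */
int
set_tls_rx_ports(const char *node, const int *ports, size_t n)
{
	int rc;

	rc = check_tls_rx_ports(ports, n);
	if (rc != 0)
		return (rc);
#ifdef __FreeBSD__
	if (sysctlbyname(node, NULL, NULL, ports, n * sizeof(ports[0])) != 0)
		return (errno);
	return (0);
#else
	(void)node;
	return (ENOSYS);
#endif
}
```

  A caller might pass "dev.t6nex.0.toe.tls_rx_ports" with an array such
  as { 443, 8443 }.  Given the warning above about sockets hanging with
  an unpatched SSL library, the list should only name ports whose
  applications are known to use the patched library.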

  New per-interface statistics nodes are added giving counts of TLS
  packets and payload bytes (payload bytes do not include TLS headers or
  authentication tags/MACs) offloaded via the TOE engine, e.g.:

  dev.cc.0.stats.rx_tls_octets: 149
  dev.cc.0.stats.rx_tls_records: 13
  dev.cc.0.stats.tx_tls_octets: 26501823
  dev.cc.0.stats.tx_tls_records: 1620

  TLS transmit work requests are constructed by a new variant of
  t4_push_frames() called t4_push_tls_records() in tom/t4_tls.c.

  TLS transmit work requests require a buffer containing IVs.  If the
  IVs are too large to fit into the work request, a separate buffer is
  allocated when constructing a work request.  This buffer is associated
  with the transmit descriptor and freed when the descriptor is ACKed by
  the adapter.

  Received TLS frames use two new CPL messages.  The first message is a
  CPL_TLS_DATA containing the decrypted payload of a single TLS record.
  The handler places the mbuf containing the received payload on an
  mbufq in the TOE pcb.  The second message is a CPL_RX_TLS_CMP message
  which includes a copy of the TLS header and indicates if there were
  any errors.  The handler for this message places the TLS header into
  the socket buffer followed by the saved mbuf with the payload data.
  Both of these handlers are contained in tom/t4_tls.c.

  A few routines were exposed from t4_cpl_io.c for use by t4_tls.c
  including send_rx_credits(), a new send_rx_modulate(), and
  t4_close_conn().

  TLS keys for both transmit and receive are stored in onboard memory in
  the NIC in the "TLS keys" memory region.

  In some cases a TLS socket can hang with pending data available in the
  NIC that is not delivered to the host.  As a workaround, TLS sockets
  are more aggressive about sending CPL_RX_DATA_ACK messages anytime
  that any data is read from a TLS socket.  In addition, a fallback
  timer will periodically send CPL_RX_DATA_ACK messages to the NIC for
  connections that are still in the handshake phase.
  Once the connection has finished the handshake and programmed RX keys
  via the socket option, the timer is stopped.

  A new function select_ulp_mode() is used to determine what sub-mode a
  given TOE socket should use (plain TOE, DDP, or TLS).  The existing
  set_tcpddp_ulp_mode() function has been renamed to set_ulp_mode() and
  handles initialization of TLS-specific state when necessary in
  addition to DDP-specific state.

  Since TLS sockets do not receive individual TCP segments but always
  receive full TLS records, they can receive more data than is available
  in the current window (e.g. if a 16k TLS record is received but the
  socket buffer is itself 16k).  To cope with this, just drop the window
  to 0 when this happens, but track the overage and "eat" the overage as
  it is read from the socket buffer not opening the window (or adding
  rx_credits) for the overage bytes.

  Reviewed by:	np (earlier version)
  Sponsored by:	Chelsio Communications
  Differential Revision:	https://reviews.freebsd.org/D14529

Added:
  head/sys/dev/cxgbe/tom/t4_tls.c   (contents, props changed)
  head/sys/dev/cxgbe/tom/t4_tls.h   (contents, props changed)
Modified:
  head/sys/dev/cxgbe/adapter.h
  head/sys/dev/cxgbe/firmware/t6fw_cfg.txt
  head/sys/dev/cxgbe/offload.h
  head/sys/dev/cxgbe/t4_main.c
  head/sys/dev/cxgbe/tom/t4_connect.c
  head/sys/dev/cxgbe/tom/t4_cpl_io.c
  head/sys/dev/cxgbe/tom/t4_listen.c
  head/sys/dev/cxgbe/tom/t4_tom.c
  head/sys/dev/cxgbe/tom/t4_tom.h
  head/sys/modules/cxgbe/tom/Makefile

Modified: head/sys/dev/cxgbe/adapter.h
==============================================================================
--- head/sys/dev/cxgbe/adapter.h	Tue Mar 13 22:54:29 2018	(r330883)
+++ head/sys/dev/cxgbe/adapter.h	Tue Mar 13 23:05:51 2018	(r330884)
@@ -297,6 +297,10 @@ struct port_info {
 	struct port_stats stats;
 	u_int tnl_cong_drops;
 	u_int tx_parse_error;
+	u_long tx_tls_records;
+	u_long tx_tls_octets;
+	u_long rx_tls_records;
+	u_long rx_tls_octets;
 	struct callout tick;
 };

Modified: head/sys/dev/cxgbe/firmware/t6fw_cfg.txt
============================================================================== --- head/sys/dev/cxgbe/firmware/t6fw_cfg.txt Tue Mar 13 22:54:29 2018 (r330883) +++ head/sys/dev/cxgbe/firmware/t6fw_cfg.txt Tue Mar 13 23:05:51 2018 (r330884) @@ -163,10 +163,12 @@ nserver = 512 nhpfilter = 0 nhash = 16384 - protocol = ofld, rddp, rdmac, iscsi_initiator_pdu, iscsi_target_pdu, iscsi_t10dif, crypto_lookaside + protocol = ofld, rddp, rdmac, iscsi_initiator_pdu, iscsi_target_pdu, iscsi_t10dif, tlskeys, crypto_lookaside tp_l2t = 4096 tp_ddp = 2 tp_ddp_iscsi = 2 + tp_tls_key = 3 + tp_tls_mxrxsize = 17408 # 16384 + 1024, governs max rx data, pm max xfer len, rx coalesce sizes tp_stag = 2 tp_pbl = 5 tp_rq = 7 @@ -273,7 +275,7 @@ [fini] version = 0x1 - checksum = 0x7191019f + checksum = 0x9e8952d2 # # $FreeBSD$ # Modified: head/sys/dev/cxgbe/offload.h ============================================================================== --- head/sys/dev/cxgbe/offload.h Tue Mar 13 22:54:29 2018 (r330883) +++ head/sys/dev/cxgbe/offload.h Tue Mar 13 23:05:51 2018 (r330884) @@ -151,6 +151,9 @@ struct tom_tunables { int sndbuf; int ddp; int rx_coalesce; + int tls; + int *tls_rx_ports; + int num_tls_rx_ports; int tx_align; int tx_zcopy; }; Modified: head/sys/dev/cxgbe/t4_main.c ============================================================================== --- head/sys/dev/cxgbe/t4_main.c Tue Mar 13 22:54:29 2018 (r330883) +++ head/sys/dev/cxgbe/t4_main.c Tue Mar 13 23:05:51 2018 (r330884) @@ -591,6 +591,7 @@ static int sysctl_wcwr_stats(SYSCTL_HANDLER_ARGS); static int sysctl_tc_params(SYSCTL_HANDLER_ARGS); #endif #ifdef TCP_OFFLOAD +static int sysctl_tls_rx_ports(SYSCTL_HANDLER_ARGS); static int sysctl_tp_tick(SYSCTL_HANDLER_ARGS); static int sysctl_tp_dack_timer(SYSCTL_HANDLER_ARGS); static int sysctl_tp_timer(SYSCTL_HANDLER_ARGS); @@ -1390,6 +1391,7 @@ t4_detach_common(device_t dev) free(sc->sge.iqmap, M_CXGBE); free(sc->sge.eqmap, M_CXGBE); free(sc->tids.ftid_tab, M_CXGBE); + 
free(sc->tt.tls_rx_ports, M_CXGBE); t4_destroy_dma_tag(sc); if (mtx_initialized(&sc->sc_lock)) { sx_xlock(&t4_list_lock); @@ -5433,6 +5435,14 @@ t4_sysctls(struct adapter *sc) SYSCTL_ADD_INT(ctx, children, OID_AUTO, "rx_coalesce", CTLFLAG_RW, &sc->tt.rx_coalesce, 0, "receive coalescing"); + sc->tt.tls = 0; + SYSCTL_ADD_INT(ctx, children, OID_AUTO, "tls", CTLFLAG_RW, + &sc->tt.tls, 0, "Inline TLS allowed"); + + SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "tls_rx_ports", + CTLTYPE_INT | CTLFLAG_RW, sc, 0, sysctl_tls_rx_ports, + "I", "TCP ports that use inline TLS+TOE RX"); + sc->tt.tx_align = 1; SYSCTL_ADD_INT(ctx, children, OID_AUTO, "tx_align", CTLFLAG_RW, &sc->tt.tx_align, 0, "chop and align payload"); @@ -5836,6 +5846,19 @@ cxgbe_sysctls(struct port_info *pi) "# of buffer-group 3 truncated packets"); #undef SYSCTL_ADD_T4_PORTSTAT + + SYSCTL_ADD_ULONG(ctx, children, OID_AUTO, "tx_tls_records", + CTLFLAG_RD, &pi->tx_tls_records, + "# of TLS records transmitted"); + SYSCTL_ADD_ULONG(ctx, children, OID_AUTO, "tx_tls_octets", + CTLFLAG_RD, &pi->tx_tls_octets, + "# of payload octets in transmitted TLS records"); + SYSCTL_ADD_ULONG(ctx, children, OID_AUTO, "rx_tls_records", + CTLFLAG_RD, &pi->rx_tls_records, + "# of TLS records received"); + SYSCTL_ADD_ULONG(ctx, children, OID_AUTO, "rx_tls_octets", + CTLFLAG_RD, &pi->rx_tls_octets, + "# of payload octets in received TLS records"); } static int @@ -8257,6 +8280,68 @@ done: #endif #ifdef TCP_OFFLOAD +static int +sysctl_tls_rx_ports(SYSCTL_HANDLER_ARGS) +{ + struct adapter *sc = arg1; + int *old_ports, *new_ports; + int i, new_count, rc; + + if (req->newptr == NULL && req->oldptr == NULL) + return (SYSCTL_OUT(req, NULL, imax(sc->tt.num_tls_rx_ports, 1) * + sizeof(sc->tt.tls_rx_ports[0]))); + + rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK, "t4tlsrx"); + if (rc) + return (rc); + + if (sc->tt.num_tls_rx_ports == 0) { + i = -1; + rc = SYSCTL_OUT(req, &i, sizeof(i)); + } else + rc = SYSCTL_OUT(req, sc->tt.tls_rx_ports, 
+ sc->tt.num_tls_rx_ports * sizeof(sc->tt.tls_rx_ports[0])); + if (rc == 0 && req->newptr != NULL) { + new_count = req->newlen / sizeof(new_ports[0]); + new_ports = malloc(new_count * sizeof(new_ports[0]), M_CXGBE, + M_WAITOK); + rc = SYSCTL_IN(req, new_ports, new_count * + sizeof(new_ports[0])); + if (rc) + goto err; + + /* Allow setting to a single '-1' to clear the list. */ + if (new_count == 1 && new_ports[0] == -1) { + ADAPTER_LOCK(sc); + old_ports = sc->tt.tls_rx_ports; + sc->tt.tls_rx_ports = NULL; + sc->tt.num_tls_rx_ports = 0; + ADAPTER_UNLOCK(sc); + free(old_ports, M_CXGBE); + } else { + for (i = 0; i < new_count; i++) { + if (new_ports[i] < 1 || + new_ports[i] > IPPORT_MAX) { + rc = EINVAL; + goto err; + } + } + + ADAPTER_LOCK(sc); + old_ports = sc->tt.tls_rx_ports; + sc->tt.tls_rx_ports = new_ports; + sc->tt.num_tls_rx_ports = new_count; + ADAPTER_UNLOCK(sc); + free(old_ports, M_CXGBE); + new_ports = NULL; + } + err: + free(new_ports, M_CXGBE); + } + end_synchronized_op(sc, 0); + return (rc); +} + static void unit_conv(char *buf, size_t len, u_int val, u_int factor) { Modified: head/sys/dev/cxgbe/tom/t4_connect.c ============================================================================== --- head/sys/dev/cxgbe/tom/t4_connect.c Tue Mar 13 22:54:29 2018 (r330883) +++ head/sys/dev/cxgbe/tom/t4_connect.c Tue Mar 13 23:05:51 2018 (r330884) @@ -142,6 +142,10 @@ do_act_establish(struct sge_iq *iq, const struct rss_h } make_established(toep, cpl->snd_isn, cpl->rcv_isn, cpl->tcp_opt); + + if (toep->ulp_mode == ULP_MODE_TLS) + tls_establish(toep); + done: INP_WUNLOCK(inp); CURVNET_RESTORE(); @@ -268,6 +272,11 @@ calc_opt2a(struct socket *so, struct toepcb *toep) if (toep->ulp_mode == ULP_MODE_TCPDDP) opt2 |= F_RX_FC_VALID | F_RX_FC_DDP; #endif + if (toep->ulp_mode == ULP_MODE_TLS) { + opt2 |= F_RX_FC_VALID; + opt2 &= ~V_RX_COALESCE(M_RX_COALESCE); + opt2 |= F_RX_FC_DISABLE; + } return (htobe32(opt2)); } @@ -378,10 +387,7 @@ t4_connect(struct toedev *tod, 
struct socket *so, stru DONT_OFFLOAD_ACTIVE_OPEN(ENOMEM); toep->vnet = so->so_vnet; - if (sc->tt.ddp && (so->so_options & SO_NO_DDP) == 0) - set_tcpddp_ulp_mode(toep); - else - toep->ulp_mode = ULP_MODE_NONE; + set_ulp_mode(toep, select_ulp_mode(so, sc)); SOCKBUF_LOCK(&so->so_rcv); /* opt0 rcv_bufsiz initially, assumes its normal meaning later */ toep->rx_credits = min(select_rcv_wnd(so) >> 10, M_RCV_BUFSIZ); Modified: head/sys/dev/cxgbe/tom/t4_cpl_io.c ============================================================================== --- head/sys/dev/cxgbe/tom/t4_cpl_io.c Tue Mar 13 22:54:29 2018 (r330883) +++ head/sys/dev/cxgbe/tom/t4_cpl_io.c Tue Mar 13 23:05:51 2018 (r330884) @@ -73,9 +73,6 @@ __FBSDID("$FreeBSD$"); #include "tom/t4_tom_l2t.h" #include "tom/t4_tom.h" -#define IS_AIOTX_MBUF(m) \ - ((m)->m_flags & M_EXT && (m)->m_ext.ext_flags & EXT_FLAG_AIOTX) - static void t4_aiotx_cancel(struct kaiocb *job); static void t4_aiotx_queue_toep(struct toepcb *toep); @@ -106,7 +103,7 @@ send_flowc_wr(struct toepcb *toep, struct flowc_tx_par { struct wrqe *wr; struct fw_flowc_wr *flowc; - unsigned int nparams = ftxp ? 
8 : 6, flowclen; + unsigned int nparams, flowclen, paramidx; struct vi_info *vi = toep->vi; struct port_info *pi = vi->pi; struct adapter *sc = pi->adapter; @@ -116,6 +113,15 @@ send_flowc_wr(struct toepcb *toep, struct flowc_tx_par KASSERT(!(toep->flags & TPF_FLOWC_WR_SENT), ("%s: flowc for tid %u sent already", __func__, toep->tid)); + if (ftxp != NULL) + nparams = 8; + else + nparams = 6; + if (toep->ulp_mode == ULP_MODE_TLS) + nparams++; + if (toep->tls.fcplenmax != 0) + nparams++; + flowclen = sizeof(*flowc) + nparams * sizeof(struct fw_flowc_mnemval); wr = alloc_wrqe(roundup2(flowclen, 16), toep->ofld_txq); @@ -131,39 +137,45 @@ send_flowc_wr(struct toepcb *toep, struct flowc_tx_par flowc->flowid_len16 = htonl(V_FW_WR_LEN16(howmany(flowclen, 16)) | V_FW_WR_FLOWID(toep->tid)); - flowc->mnemval[0].mnemonic = FW_FLOWC_MNEM_PFNVFN; - flowc->mnemval[0].val = htobe32(pfvf); - flowc->mnemval[1].mnemonic = FW_FLOWC_MNEM_CH; - flowc->mnemval[1].val = htobe32(pi->tx_chan); - flowc->mnemval[2].mnemonic = FW_FLOWC_MNEM_PORT; - flowc->mnemval[2].val = htobe32(pi->tx_chan); - flowc->mnemval[3].mnemonic = FW_FLOWC_MNEM_IQID; - flowc->mnemval[3].val = htobe32(toep->ofld_rxq->iq.abs_id); +#define FLOWC_PARAM(__m, __v) \ + do { \ + flowc->mnemval[paramidx].mnemonic = FW_FLOWC_MNEM_##__m; \ + flowc->mnemval[paramidx].val = htobe32(__v); \ + paramidx++; \ + } while (0) + + paramidx = 0; + + FLOWC_PARAM(PFNVFN, pfvf); + FLOWC_PARAM(CH, pi->tx_chan); + FLOWC_PARAM(PORT, pi->tx_chan); + FLOWC_PARAM(IQID, toep->ofld_rxq->iq.abs_id); if (ftxp) { uint32_t sndbuf = min(ftxp->snd_space, sc->tt.sndbuf); - flowc->mnemval[4].mnemonic = FW_FLOWC_MNEM_SNDNXT; - flowc->mnemval[4].val = htobe32(ftxp->snd_nxt); - flowc->mnemval[5].mnemonic = FW_FLOWC_MNEM_RCVNXT; - flowc->mnemval[5].val = htobe32(ftxp->rcv_nxt); - flowc->mnemval[6].mnemonic = FW_FLOWC_MNEM_SNDBUF; - flowc->mnemval[6].val = htobe32(sndbuf); - flowc->mnemval[7].mnemonic = FW_FLOWC_MNEM_MSS; - flowc->mnemval[7].val = 
htobe32(ftxp->mss); + FLOWC_PARAM(SNDNXT, ftxp->snd_nxt); + FLOWC_PARAM(RCVNXT, ftxp->rcv_nxt); + FLOWC_PARAM(SNDBUF, sndbuf); + FLOWC_PARAM(MSS, ftxp->mss); CTR6(KTR_CXGBE, "%s: tid %u, mss %u, sndbuf %u, snd_nxt 0x%x, rcv_nxt 0x%x", __func__, toep->tid, ftxp->mss, sndbuf, ftxp->snd_nxt, ftxp->rcv_nxt); } else { - flowc->mnemval[4].mnemonic = FW_FLOWC_MNEM_SNDBUF; - flowc->mnemval[4].val = htobe32(512); - flowc->mnemval[5].mnemonic = FW_FLOWC_MNEM_MSS; - flowc->mnemval[5].val = htobe32(512); + FLOWC_PARAM(SNDBUF, 512); + FLOWC_PARAM(MSS, 512); CTR2(KTR_CXGBE, "%s: tid %u", __func__, toep->tid); } + if (toep->ulp_mode == ULP_MODE_TLS) + FLOWC_PARAM(ULP_MODE, toep->ulp_mode); + if (toep->tls.fcplenmax != 0) + FLOWC_PARAM(TXDATAPLEN_MAX, toep->tls.fcplenmax); +#undef FLOWC_PARAM + KASSERT(paramidx == nparams, ("nparams mismatch")); + txsd->tx_credits = howmany(flowclen, 16); txsd->plen = 0; KASSERT(toep->tx_credits >= txsd->tx_credits && toep->txsd_avail > 0, @@ -421,7 +433,7 @@ make_established(struct toepcb *toep, uint32_t snd_isn soisconnected(so); } -static int +int send_rx_credits(struct adapter *sc, struct toepcb *toep, int credits) { struct wrqe *wr; @@ -443,6 +455,23 @@ send_rx_credits(struct adapter *sc, struct toepcb *toe } void +send_rx_modulate(struct adapter *sc, struct toepcb *toep) +{ + struct wrqe *wr; + struct cpl_rx_data_ack *req; + + wr = alloc_wrqe(sizeof(*req), toep->ctrlq); + if (wr == NULL) + return; + req = wrtod(wr); + + INIT_TP_WR_MIT_CPL(req, CPL_RX_DATA_ACK, toep->tid); + req->credit_dack = htobe32(F_RX_MODULATE_RX); + + t4_wrq_tx(sc, wr); +} + +void t4_rcvd_locked(struct toedev *tod, struct tcpcb *tp) { struct adapter *sc = tod->tod_softc; @@ -459,8 +488,18 @@ t4_rcvd_locked(struct toedev *tod, struct tcpcb *tp) ("%s: sb %p has more data (%d) than last time (%d).", __func__, sb, sbused(sb), toep->sb_cc)); - toep->rx_credits += toep->sb_cc - sbused(sb); + credits = toep->sb_cc - sbused(sb); toep->sb_cc = sbused(sb); + if (toep->ulp_mode == 
ULP_MODE_TLS) { + if (toep->tls.rcv_over >= credits) { + toep->tls.rcv_over -= credits; + credits = 0; + } else { + credits -= toep->tls.rcv_over; + toep->tls.rcv_over = 0; + } + } + toep->rx_credits += credits; if (toep->rx_credits > 0 && (tp->rcv_wnd <= 32 * 1024 || toep->rx_credits >= 64 * 1024 || @@ -471,7 +510,8 @@ t4_rcvd_locked(struct toedev *tod, struct tcpcb *tp) toep->rx_credits -= credits; tp->rcv_wnd += credits; tp->rcv_adv += credits; - } + } else if (toep->flags & TPF_FORCE_CREDITS) + send_rx_modulate(sc, toep); } void @@ -489,8 +529,8 @@ t4_rcvd(struct toedev *tod, struct tcpcb *tp) /* * Close a connection by sending a CPL_CLOSE_CON_REQ message. */ -static int -close_conn(struct adapter *sc, struct toepcb *toep) +int +t4_close_conn(struct adapter *sc, struct toepcb *toep) { struct wrqe *wr; struct cpl_close_con_req *req; @@ -691,6 +731,7 @@ t4_push_frames(struct adapter *sc, struct toepcb *toep KASSERT(toep->ulp_mode == ULP_MODE_NONE || toep->ulp_mode == ULP_MODE_TCPDDP || + toep->ulp_mode == ULP_MODE_TLS || toep->ulp_mode == ULP_MODE_RDMA, ("%s: ulp_mode %u for toep %p", __func__, toep->ulp_mode, toep)); @@ -905,7 +946,7 @@ t4_push_frames(struct adapter *sc, struct toepcb *toep /* Send a FIN if requested, but only if there's no more data to send */ if (m == NULL && toep->flags & TPF_SEND_FIN) - close_conn(sc, toep); + t4_close_conn(sc, toep); } static inline void @@ -1097,7 +1138,7 @@ t4_push_pdus(struct adapter *sc, struct toepcb *toep, /* Send a FIN if requested, but only if there are no more PDUs to send */ if (mbufq_first(pduq) == NULL && toep->flags & TPF_SEND_FIN) - close_conn(sc, toep); + t4_close_conn(sc, toep); } int @@ -1116,6 +1157,8 @@ t4_tod_output(struct toedev *tod, struct tcpcb *tp) if (toep->ulp_mode == ULP_MODE_ISCSI) t4_push_pdus(sc, toep, 0); + else if (tls_tx_key(toep)) + t4_push_tls_records(sc, toep, 0); else t4_push_frames(sc, toep, 0); @@ -1140,6 +1183,8 @@ t4_send_fin(struct toedev *tod, struct tcpcb *tp) if (tp->t_state >= 
TCPS_ESTABLISHED) { if (toep->ulp_mode == ULP_MODE_ISCSI) t4_push_pdus(sc, toep, 0); + else if (tls_tx_key(toep)) + t4_push_tls_records(sc, toep, 0); else t4_push_frames(sc, toep, 0); } @@ -1772,6 +1817,10 @@ do_fw4_ack(struct sge_iq *iq, const struct rss_header credits -= txsd->tx_credits; toep->tx_credits += txsd->tx_credits; plen += txsd->plen; + if (txsd->iv_buffer) { + free(txsd->iv_buffer, M_CXGBE); + txsd->iv_buffer = NULL; + } txsd++; toep->txsd_avail++; KASSERT(toep->txsd_avail <= toep->txsd_total, @@ -1797,6 +1846,8 @@ do_fw4_ack(struct sge_iq *iq, const struct rss_header CURVNET_SET(toep->vnet); if (toep->ulp_mode == ULP_MODE_ISCSI) t4_push_pdus(sc, toep, plen); + else if (tls_tx_key(toep)) + t4_push_tls_records(sc, toep, plen); else t4_push_frames(sc, toep, plen); CURVNET_RESTORE(); @@ -1826,6 +1877,12 @@ do_fw4_ack(struct sge_iq *iq, const struct rss_header tid, plen); #endif sbdrop_locked(sb, plen); + if (tls_tx_key(toep)) { + struct tls_ofld_info *tls_ofld = &toep->tls; + + MPASS(tls_ofld->sb_off >= plen); + tls_ofld->sb_off -= plen; + } if (!TAILQ_EMPTY(&toep->aiotx_jobq)) t4_aiotx_queue_toep(toep); sowwakeup_locked(so); /* unlocks so_snd */ @@ -2298,6 +2355,9 @@ t4_aio_queue_aiotx(struct socket *so, struct kaiocb *j return (EOPNOTSUPP); if (!sc->tt.tx_zcopy) + return (EOPNOTSUPP); + + if (is_tls_offload(toep) || tls_tx_key(toep)) return (EOPNOTSUPP); SOCKBUF_LOCK(&so->so_snd); Modified: head/sys/dev/cxgbe/tom/t4_listen.c ============================================================================== --- head/sys/dev/cxgbe/tom/t4_listen.c Tue Mar 13 22:54:29 2018 (r330883) +++ head/sys/dev/cxgbe/tom/t4_listen.c Tue Mar 13 23:05:51 2018 (r330884) @@ -1056,6 +1056,11 @@ calc_opt2p(struct adapter *sc, struct port_info *pi, i if (ulp_mode == ULP_MODE_TCPDDP) opt2 |= F_RX_FC_VALID | F_RX_FC_DDP; #endif + if (ulp_mode == ULP_MODE_TLS) { + opt2 |= F_RX_FC_VALID; + opt2 &= ~V_RX_COALESCE(M_RX_COALESCE); + opt2 |= F_RX_FC_DISABLE; + } return htobe32(opt2); } 
@@ -1347,11 +1352,15 @@ found: INIT_TP_WR_MIT_CPL(rpl5, CPL_PASS_ACCEPT_RPL, tid); } - if (sc->tt.ddp && (so->so_options & SO_NO_DDP) == 0) { - ulp_mode = ULP_MODE_TCPDDP; + ulp_mode = select_ulp_mode(so, sc); + switch (ulp_mode) { + case ULP_MODE_TCPDDP: synqe->flags |= TPF_SYNQE_TCPDDP; - } else - ulp_mode = ULP_MODE_NONE; + break; + case ULP_MODE_TLS: + synqe->flags |= TPF_SYNQE_TLS; + break; + } rpl->opt0 = calc_opt0(so, vi, e, mtu_idx, rscale, rx_credits, ulp_mode); rpl->opt2 = calc_opt2p(sc, pi, rxqid, &cpl->tcpopt, &th, ulp_mode); @@ -1407,8 +1416,8 @@ found: REJECT_PASS_ACCEPT(); } - CTR5(KTR_CXGBE, "%s: stid %u, tid %u, lctx %p, synqe %p, SYNACK", - __func__, stid, tid, lctx, synqe); + CTR6(KTR_CXGBE, "%s: stid %u, tid %u, lctx %p, synqe %p, SYNACK mode %d", + __func__, stid, tid, lctx, synqe, ulp_mode); INP_WLOCK(inp); synqe->flags |= TPF_SYNQE_HAS_L2TE; @@ -1557,9 +1566,11 @@ reset: toep->tid = tid; toep->l2te = &sc->l2t->l2tab[synqe->l2e_idx]; if (synqe->flags & TPF_SYNQE_TCPDDP) - set_tcpddp_ulp_mode(toep); + set_ulp_mode(toep, ULP_MODE_TCPDDP); + else if (synqe->flags & TPF_SYNQE_TLS) + set_ulp_mode(toep, ULP_MODE_TLS); else - toep->ulp_mode = ULP_MODE_NONE; + set_ulp_mode(toep, ULP_MODE_NONE); /* opt0 rcv_bufsiz initially, assumes its normal meaning later */ toep->rx_credits = synqe->rcv_bufsize; Added: head/sys/dev/cxgbe/tom/t4_tls.c ============================================================================== --- /dev/null 00:00:00 1970 (empty, because file is newly added) +++ head/sys/dev/cxgbe/tom/t4_tls.c Tue Mar 13 23:05:51 2018 (r330884) @@ -0,0 +1,1642 @@ +/*- + * SPDX-License-Identifier: BSD-2-Clause-FreeBSD + * + * Copyright (c) 2017-2018 Chelsio Communications, Inc. + * All rights reserved. + * Written by: John Baldwin <jhb@FreeBSD.org> + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. 
Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include "opt_inet.h" + +#include <sys/cdefs.h> +__FBSDID("$FreeBSD$"); + +#include <sys/param.h> +#include <sys/sglist.h> +#include <sys/socket.h> +#include <sys/socketvar.h> +#include <sys/systm.h> +#include <netinet/in.h> +#include <netinet/in_pcb.h> +#include <netinet/tcp_var.h> +#include <netinet/toecore.h> + +#ifdef TCP_OFFLOAD +#include "common/common.h" +#include "common/t4_tcb.h" +#include "tom/t4_tom_l2t.h" +#include "tom/t4_tom.h" + +/* + * The TCP sequence number of a CPL_TLS_DATA mbuf is saved here while + * the mbuf is in the ulp_pdu_reclaimq. + */ +#define tls_tcp_seq PH_loc.thirtytwo[0] + +/* + * Handshake lock used for the handshake timer. Having a global lock + * is perhaps not ideal, but it avoids having to use callout_drain() + * in tls_uninit_toep() which can't block. 
Also, the timer shouldn't + * actually fire for most connections. + */ +static struct mtx tls_handshake_lock; + +static void +t4_set_tls_tcb_field(struct toepcb *toep, uint16_t word, uint64_t mask, + uint64_t val) +{ + struct adapter *sc = td_adapter(toep->td); + + t4_set_tcb_field(sc, toep->ctrlq, toep->tid, word, mask, val, 0, 0, + toep->ofld_rxq->iq.abs_id); +} + +/* TLS and DTLS common routines */ +int +tls_tx_key(struct toepcb *toep) +{ + struct tls_ofld_info *tls_ofld = &toep->tls; + + return (tls_ofld->tx_key_addr >= 0); +} + +int +tls_rx_key(struct toepcb *toep) +{ + struct tls_ofld_info *tls_ofld = &toep->tls; + + return (tls_ofld->rx_key_addr >= 0); +} + +static int +key_size(struct toepcb *toep) +{ + struct tls_ofld_info *tls_ofld = &toep->tls; + + return ((tls_ofld->key_location == TLS_SFO_WR_CONTEXTLOC_IMMEDIATE) ? + tls_ofld->k_ctx.tx_key_info_size : KEY_IN_DDR_SIZE); +} + +/* Set TLS Key-Id in TCB */ +static void +t4_set_tls_keyid(struct toepcb *toep, unsigned int key_id) +{ + + t4_set_tls_tcb_field(toep, W_TCB_RX_TLS_KEY_TAG, + V_TCB_RX_TLS_KEY_TAG(M_TCB_RX_TLS_BUF_TAG), + V_TCB_RX_TLS_KEY_TAG(key_id)); +} + +/* Clear TF_RX_QUIESCE to re-enable receive. */ +static void +t4_clear_rx_quiesce(struct toepcb *toep) +{ + + t4_set_tls_tcb_field(toep, W_TCB_T_FLAGS, V_TF_RX_QUIESCE(1), 0); +} + +static void +tls_clr_ofld_mode(struct toepcb *toep) +{ + + tls_stop_handshake_timer(toep); + + /* Operate in PDU extraction mode only. 
*/ + t4_set_tls_tcb_field(toep, W_TCB_ULP_RAW, + V_TCB_ULP_RAW(M_TCB_ULP_RAW), + V_TCB_ULP_RAW(V_TF_TLS_ENABLE(1))); + t4_clear_rx_quiesce(toep); +} + +static void +tls_clr_quiesce(struct toepcb *toep) +{ + + tls_stop_handshake_timer(toep); + t4_clear_rx_quiesce(toep); +} + +/* + * Calculate the TLS data expansion size + */ +static int +tls_expansion_size(struct toepcb *toep, int data_len, int full_pdus_only, + unsigned short *pdus_per_ulp) +{ + struct tls_ofld_info *tls_ofld = &toep->tls; + struct tls_scmd *scmd = &tls_ofld->scmd0; + int expn_size = 0, frag_count = 0, pad_per_pdu = 0, + pad_last_pdu = 0, last_frag_size = 0, max_frag_size = 0; + int exp_per_pdu = 0; + int hdr_len = TLS_HEADER_LENGTH; + + do { + max_frag_size = tls_ofld->k_ctx.frag_size; + if (G_SCMD_CIPH_MODE(scmd->seqno_numivs) == + SCMD_CIPH_MODE_AES_GCM) { + frag_count = (data_len / max_frag_size); + exp_per_pdu = GCM_TAG_SIZE + AEAD_EXPLICIT_DATA_SIZE + + hdr_len; + expn_size = frag_count * exp_per_pdu; + if (full_pdus_only) { + *pdus_per_ulp = data_len / (exp_per_pdu + + max_frag_size); + if (*pdus_per_ulp > 32) + *pdus_per_ulp = 32; + else if(!*pdus_per_ulp) + *pdus_per_ulp = 1; + expn_size = (*pdus_per_ulp) * exp_per_pdu; + break; + } + if ((last_frag_size = data_len % max_frag_size) > 0) { + frag_count += 1; + expn_size += exp_per_pdu; + } + break; + } else if (G_SCMD_CIPH_MODE(scmd->seqno_numivs) != + SCMD_CIPH_MODE_NOP) { + /* Calculate the number of fragments we can make */ + frag_count = (data_len / max_frag_size); + if (frag_count > 0) { + pad_per_pdu = (((howmany((max_frag_size + + tls_ofld->mac_length), + CIPHER_BLOCK_SIZE)) * + CIPHER_BLOCK_SIZE) - + (max_frag_size + + tls_ofld->mac_length)); + if (!pad_per_pdu) + pad_per_pdu = CIPHER_BLOCK_SIZE; + exp_per_pdu = pad_per_pdu + + tls_ofld->mac_length + + hdr_len + CIPHER_BLOCK_SIZE; + expn_size = frag_count * exp_per_pdu; + } + if (full_pdus_only) { + *pdus_per_ulp = data_len / (exp_per_pdu + + max_frag_size); + if (*pdus_per_ulp > 
32) + *pdus_per_ulp = 32; + else if (!*pdus_per_ulp) + *pdus_per_ulp = 1; + expn_size = (*pdus_per_ulp) * exp_per_pdu; + break; + } + /* Consider the last fragment */ + if ((last_frag_size = data_len % max_frag_size) > 0) { + pad_last_pdu = (((howmany((last_frag_size + + tls_ofld->mac_length), + CIPHER_BLOCK_SIZE)) * + CIPHER_BLOCK_SIZE) - + (last_frag_size + + tls_ofld->mac_length)); + if (!pad_last_pdu) + pad_last_pdu = CIPHER_BLOCK_SIZE; + expn_size += (pad_last_pdu + + tls_ofld->mac_length + hdr_len + + CIPHER_BLOCK_SIZE); + } + } + } while (0); + + return (expn_size); +} + +/* Copy Key to WR */ +static void +tls_copy_tx_key(struct toepcb *toep, void *dst) +{ + struct tls_ofld_info *tls_ofld = &toep->tls; + struct ulptx_sc_memrd *sc_memrd; + struct ulptx_idata *sc; + + if (tls_ofld->k_ctx.tx_key_info_size <= 0) + return; + + if (tls_ofld->key_location == TLS_SFO_WR_CONTEXTLOC_DDR) { + sc = dst; + sc->cmd_more = htobe32(V_ULPTX_CMD(ULP_TX_SC_NOOP)); + sc->len = htobe32(0); + sc_memrd = (struct ulptx_sc_memrd *)(sc + 1); + sc_memrd->cmd_to_len = htobe32(V_ULPTX_CMD(ULP_TX_SC_MEMRD) | + V_ULP_TX_SC_MORE(1) | + V_ULPTX_LEN16(tls_ofld->k_ctx.tx_key_info_size >> 4)); + sc_memrd->addr = htobe32(tls_ofld->tx_key_addr >> 5); + } else if (tls_ofld->key_location == TLS_SFO_WR_CONTEXTLOC_IMMEDIATE) { + memcpy(dst, &tls_ofld->k_ctx.tx, + tls_ofld->k_ctx.tx_key_info_size); + } +} + +/* TLS/DTLS content type for CPL SFO */ +static inline unsigned char +tls_content_type(unsigned char content_type) +{ + /* + * XXX: Shouldn't this map CONTENT_TYPE_APP_DATA to DATA and + * default to "CUSTOM" for all other types including + * heartbeat? 
+ */ + switch (content_type) { + case CONTENT_TYPE_CCS: + return CPL_TX_TLS_SFO_TYPE_CCS; + case CONTENT_TYPE_ALERT: + return CPL_TX_TLS_SFO_TYPE_ALERT; + case CONTENT_TYPE_HANDSHAKE: + return CPL_TX_TLS_SFO_TYPE_HANDSHAKE; + case CONTENT_TYPE_HEARTBEAT: + return CPL_TX_TLS_SFO_TYPE_HEARTBEAT; + } + return CPL_TX_TLS_SFO_TYPE_DATA; +} + +static unsigned char +get_cipher_key_size(unsigned int ck_size) +{ + switch (ck_size) { + case AES_NOP: /* NOP */ + return 15; + case AES_128: /* AES128 */ + return CH_CK_SIZE_128; + case AES_192: /* AES192 */ + return CH_CK_SIZE_192; + case AES_256: /* AES256 */ + return CH_CK_SIZE_256; + default: + return CH_CK_SIZE_256; + } +} + +static unsigned char +get_mac_key_size(unsigned int mk_size) +{ + switch (mk_size) { + case SHA_NOP: /* NOP */ + return CH_MK_SIZE_128; + case SHA_GHASH: /* GHASH */ + case SHA_512: /* SHA512 */ + return CH_MK_SIZE_512; + case SHA_224: /* SHA2-224 */ + return CH_MK_SIZE_192; + case SHA_256: /* SHA2-256*/ + return CH_MK_SIZE_256; + case SHA_384: /* SHA384 */ + return CH_MK_SIZE_512; + case SHA1: /* SHA1 */ + default: + return CH_MK_SIZE_160; + } +} + +static unsigned int +get_proto_ver(int proto_ver) +{ + switch (proto_ver) { + case TLS1_2_VERSION: + return TLS_1_2_VERSION; + case TLS1_1_VERSION: + return TLS_1_1_VERSION; + case DTLS1_2_VERSION: + return DTLS_1_2_VERSION; + default: + return TLS_VERSION_MAX; + } +} + +static void +tls_rxkey_flit1(struct tls_keyctx *kwr, struct tls_key_context *kctx) +{ + + if (kctx->state.enc_mode == CH_EVP_CIPH_GCM_MODE) { + kwr->u.rxhdr.ivinsert_to_authinsrt = + htobe64(V_TLS_KEYCTX_TX_WR_IVINSERT(6ULL) | + V_TLS_KEYCTX_TX_WR_AADSTRTOFST(1ULL) | + V_TLS_KEYCTX_TX_WR_AADSTOPOFST(5ULL) | + V_TLS_KEYCTX_TX_WR_AUTHSRTOFST(14ULL) | + V_TLS_KEYCTX_TX_WR_AUTHSTOPOFST(16ULL) | + V_TLS_KEYCTX_TX_WR_CIPHERSRTOFST(14ULL) | + V_TLS_KEYCTX_TX_WR_CIPHERSTOPOFST(0ULL) | + V_TLS_KEYCTX_TX_WR_AUTHINSRT(16ULL)); + kwr->u.rxhdr.ivpresent_to_rxmk_size &= + 
+		    ~(V_TLS_KEYCTX_TX_WR_RXOPAD_PRESENT(1));
+		kwr->u.rxhdr.authmode_to_rxvalid &=
+		    ~(V_TLS_KEYCTX_TX_WR_CIPHAUTHSEQCTRL(1));
+	} else {
+		kwr->u.rxhdr.ivinsert_to_authinsrt =
+		    htobe64(V_TLS_KEYCTX_TX_WR_IVINSERT(6ULL) |
+		    V_TLS_KEYCTX_TX_WR_AADSTRTOFST(1ULL) |
+		    V_TLS_KEYCTX_TX_WR_AADSTOPOFST(5ULL) |
+		    V_TLS_KEYCTX_TX_WR_AUTHSRTOFST(22ULL) |
+		    V_TLS_KEYCTX_TX_WR_AUTHSTOPOFST(0ULL) |
+		    V_TLS_KEYCTX_TX_WR_CIPHERSRTOFST(22ULL) |
+		    V_TLS_KEYCTX_TX_WR_CIPHERSTOPOFST(0ULL) |
+		    V_TLS_KEYCTX_TX_WR_AUTHINSRT(0ULL));
+	}
+}
+
+/* Rx key */
+static void
+prepare_rxkey_wr(struct tls_keyctx *kwr, struct tls_key_context *kctx)
+{
+	unsigned int ck_size = kctx->cipher_secret_size;
+	unsigned int mk_size = kctx->mac_secret_size;
+	int proto_ver = kctx->proto_ver;
+
+	kwr->u.rxhdr.flitcnt_hmacctrl =
+	    ((kctx->tx_key_info_size >> 4) << 3) | kctx->hmac_ctrl;
+
+	kwr->u.rxhdr.protover_ciphmode =
+	    V_TLS_KEYCTX_TX_WR_PROTOVER(get_proto_ver(proto_ver)) |
+	    V_TLS_KEYCTX_TX_WR_CIPHMODE(kctx->state.enc_mode);
+
+	kwr->u.rxhdr.authmode_to_rxvalid =
+	    V_TLS_KEYCTX_TX_WR_AUTHMODE(kctx->state.auth_mode) |
+	    V_TLS_KEYCTX_TX_WR_CIPHAUTHSEQCTRL(1) |
+	    V_TLS_KEYCTX_TX_WR_SEQNUMCTRL(3) |
+	    V_TLS_KEYCTX_TX_WR_RXVALID(1);
+
+	kwr->u.rxhdr.ivpresent_to_rxmk_size =
+	    V_TLS_KEYCTX_TX_WR_IVPRESENT(0) |
+	    V_TLS_KEYCTX_TX_WR_RXOPAD_PRESENT(1) |
+	    V_TLS_KEYCTX_TX_WR_RXCK_SIZE(get_cipher_key_size(ck_size)) |
+	    V_TLS_KEYCTX_TX_WR_RXMK_SIZE(get_mac_key_size(mk_size));
+
+	tls_rxkey_flit1(kwr, kctx);
+
+	/* No key reversal for GCM */
+	if (kctx->state.enc_mode != CH_EVP_CIPH_GCM_MODE) {
+		t4_aes_getdeckey(kwr->keys.edkey, kctx->rx.key,
+		    (kctx->cipher_secret_size << 3));
+		memcpy(kwr->keys.edkey + kctx->cipher_secret_size,
+		    kctx->rx.key + kctx->cipher_secret_size,
+		    (IPAD_SIZE + OPAD_SIZE));
+	} else {
+		memcpy(kwr->keys.edkey, kctx->rx.key,
+		    (kctx->tx_key_info_size - SALT_SIZE));
+		memcpy(kwr->u.rxhdr.rxsalt, kctx->rx.salt, SALT_SIZE);
+	}
+}
+
+/* Tx key */
+static void
+prepare_txkey_wr(struct tls_keyctx
*kwr, struct tls_key_context *kctx)
+{
+	unsigned int ck_size = kctx->cipher_secret_size;
+	unsigned int mk_size = kctx->mac_secret_size;
+
+	kwr->u.txhdr.ctxlen =
+	    (kctx->tx_key_info_size >> 4);
+	kwr->u.txhdr.dualck_to_txvalid =
+	    V_TLS_KEYCTX_TX_WR_TXOPAD_PRESENT(1) |
+	    V_TLS_KEYCTX_TX_WR_SALT_PRESENT(1) |
+	    V_TLS_KEYCTX_TX_WR_TXCK_SIZE(get_cipher_key_size(ck_size)) |
+	    V_TLS_KEYCTX_TX_WR_TXMK_SIZE(get_mac_key_size(mk_size)) |
+	    V_TLS_KEYCTX_TX_WR_TXVALID(1);
+
+	memcpy(kwr->keys.edkey, kctx->tx.key, HDR_KCTX_SIZE);
+	if (kctx->state.enc_mode == CH_EVP_CIPH_GCM_MODE) {
+		memcpy(kwr->u.txhdr.txsalt, kctx->tx.salt, SALT_SIZE);
+		kwr->u.txhdr.dualck_to_txvalid &=
+		    ~(V_TLS_KEYCTX_TX_WR_TXOPAD_PRESENT(1));
+	}
+	kwr->u.txhdr.dualck_to_txvalid = htons(kwr->u.txhdr.dualck_to_txvalid);
+}
+
+/* TLS Key memory management */
+int
+tls_init_kmap(struct adapter *sc, struct tom_data *td)
+{
+
+	td->key_map = vmem_create("T4TLS key map", sc->vres.key.start,
+	    sc->vres.key.size, 8, 0, M_FIRSTFIT | M_NOWAIT);
+	if (td->key_map == NULL)
+		return (ENOMEM);
+	return (0);
+}
+
+void
+tls_free_kmap(struct tom_data *td)
+{
+
+	if (td->key_map != NULL)
+		vmem_destroy(td->key_map);
+}
+
+static int
+get_new_keyid(struct toepcb *toep, struct tls_key_context *k_ctx)
+{
+	struct tom_data *td = toep->td;
+	vmem_addr_t addr;
+
+	if (vmem_alloc(td->key_map, TLS_KEY_CONTEXT_SZ, M_NOWAIT | M_FIRSTFIT,
+	    &addr) != 0)
+		return (-1);
+
+	return (addr);
+}
+
+static void
+free_keyid(struct toepcb *toep, int keyid)
+{
+	struct tom_data *td = toep->td;
+
+	vmem_free(td->key_map, keyid, TLS_KEY_CONTEXT_SZ);
+}
+

*** DIFF OUTPUT TRUNCATED AT 1000 LINES ***
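
Editorial illustration (not part of the commit): the "Consider the last fragment" arithmetic in the non-GCM branch of the expansion-size calculation can be tried in isolation with the userland sketch below. The helper names last_frag_expansion() and howmany_int() are hypothetical, CIPHER_BLOCK_SIZE is assumed to be the 16-byte AES block size, and the reading of the trailing CIPHER_BLOCK_SIZE term as the explicit per-record IV is an assumption rather than something the diff states.

```c
/* Assumed to match the driver's constant: the AES block size in bytes. */
#define CIPHER_BLOCK_SIZE 16

/* Number of y-sized units needed to hold x, like howmany() in <sys/param.h>. */
static int
howmany_int(int x, int y)
{
	return ((x + y - 1) / y);
}

/*
 * Hypothetical helper mirroring the last-fragment step of the diff:
 * bytes added on the wire to a trailing partial record when a block
 * cipher with a separate MAC is in use.
 */
static int
last_frag_expansion(int last_frag_size, int mac_length, int hdr_len)
{
	int payload, pad;

	/* No partial fragment, so nothing extra to account for. */
	if (last_frag_size <= 0)
		return (0);

	/* Padding rounds payload + MAC up to a whole cipher block. */
	payload = last_frag_size + mac_length;
	pad = howmany_int(payload, CIPHER_BLOCK_SIZE) * CIPHER_BLOCK_SIZE -
	    payload;

	/* An already aligned record still carries one full block of padding. */
	if (pad == 0)
		pad = CIPHER_BLOCK_SIZE;

	/* Final block-sized term: presumably the explicit IV (assumption). */
	return (pad + mac_length + hdr_len + CIPHER_BLOCK_SIZE);
}
```

For example, with a 20-byte MAC and a 5-byte record header (both illustrative values), last_frag_expansion(100, 20, 5) pads 120 bytes up to 128 and yields 8 + 20 + 5 + 16 = 49 bytes of expansion, while last_frag_expansion(108, 20, 5) is already block aligned and yields 16 + 20 + 5 + 16 = 57.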