From owner-svn-src-stable@freebsd.org Thu Dec 13 10:13:31 2018 Return-Path: Delivered-To: svn-src-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B3A5113156FB; Thu, 13 Dec 2018 10:13:30 +0000 (UTC) (envelope-from vmaffione@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) server-signature RSA-PSS (4096 bits) client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.nyi.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 60A26809BA; Thu, 13 Dec 2018 10:13:30 +0000 (UTC) (envelope-from vmaffione@FreeBSD.org) Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 38DF31E591; Thu, 13 Dec 2018 10:13:30 +0000 (UTC) (envelope-from vmaffione@FreeBSD.org) Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id wBDADUDZ093142; Thu, 13 Dec 2018 10:13:30 GMT (envelope-from vmaffione@FreeBSD.org) Received: (from vmaffione@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id wBDADTjo093138; Thu, 13 Dec 2018 10:13:29 GMT (envelope-from vmaffione@FreeBSD.org) Message-Id: <201812131013.wBDADTjo093138@repo.freebsd.org> X-Authentication-Warning: repo.freebsd.org: vmaffione set sender to vmaffione@FreeBSD.org using -f From: Vincenzo Maffione Date: Thu, 13 Dec 2018 10:13:29 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-11@freebsd.org Subject: svn commit: r342033 - in stable/11/sys: conf dev/netmap modules/netmap net X-SVN-Group: stable-11 X-SVN-Commit-Author: vmaffione X-SVN-Commit-Paths: in stable/11/sys: conf dev/netmap modules/netmap net X-SVN-Commit-Revision: 342033 X-SVN-Commit-Repository: base MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Rspamd-Queue-Id: 60A26809BA X-Spamd-Bar: - Authentication-Results: mx1.freebsd.org X-Spamd-Result: default: False [-1.38 / 15.00]; local_wl_from(0.00)[FreeBSD.org]; NEURAL_HAM_MEDIUM(-0.83)[-0.828,0]; NEURAL_HAM_SHORT(-0.55)[-0.553,0]; ASN(0.00)[asn:11403, ipnet:2610:1c1:1::/48, country:US] X-BeenThere: svn-src-stable@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: SVN commit messages for all the -stable branches of the src tree List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 13 Dec 2018 10:13:31 -0000 Author: vmaffione Date: Thu Dec 13 10:13:29 2018 New Revision: 342033 URL: https://svnweb.freebsd.org/changeset/base/342033 Log: MFC r341516, r341589 netmap: align codebase to the current upstream (760279cfb2730a585) Changelist: - Replace netmap passthrough host support with a more general mechanism to call TXSYNC/RXSYNC from an in-kernel event-loop. No kernel threads are used to use this feature: the application is required to spawn a thread (or a process) and issue a SYNC_KLOOP_START (NIOCCTRL) command in the thread body. The kernel loop is executed by the ioctl implementation, which returns to userspace only when a different thread calls SYNC_KLOOP_STOP or the netmap file descriptor is closed. - Update the if_ptnet driver to cope with the new data structures, and prune all the obsolete ptnetmap code. - Add support for "null" netmap ports, useful to allocate netmap_if, netmap_ring and netmap buffers to be used by specialized applications (e.g. hypervisors). TXSYNC/RXSYNC on these ports have no effect. - Various fixes and code refactoring. Sponsored by: Sunny Valley Networks Differential Revision: https://reviews.freebsd.org/D18015 Added: stable/11/sys/dev/netmap/netmap_kloop.c - copied unchanged from r341815, stable/12/sys/dev/netmap/netmap_kloop.c stable/11/sys/dev/netmap/netmap_null.c - copied unchanged from r341815, stable/12/sys/dev/netmap/netmap_null.c Modified: stable/11/sys/conf/files stable/11/sys/dev/netmap/if_ixl_netmap.h stable/11/sys/dev/netmap/if_vtnet_netmap.h stable/11/sys/dev/netmap/netmap.c stable/11/sys/dev/netmap/netmap_bdg.c stable/11/sys/dev/netmap/netmap_bdg.h stable/11/sys/dev/netmap/netmap_freebsd.c stable/11/sys/dev/netmap/netmap_generic.c stable/11/sys/dev/netmap/netmap_kern.h stable/11/sys/dev/netmap/netmap_legacy.c stable/11/sys/dev/netmap/netmap_mem2.c stable/11/sys/dev/netmap/netmap_mem2.h stable/11/sys/dev/netmap/netmap_pipe.c stable/11/sys/dev/netmap/netmap_vale.c stable/11/sys/modules/netmap/Makefile stable/11/sys/net/netmap.h stable/11/sys/net/netmap_user.h stable/11/sys/net/netmap_virt.h Directory Properties: stable/11/ (props changed) Modified: stable/11/sys/conf/files ============================================================================== --- stable/11/sys/conf/files Thu Dec 13 09:40:06 2018 (r342032) +++ stable/11/sys/conf/files Thu Dec 13 10:13:29 2018 (r342033) @@ -2469,17 +2469,19 @@ dev/ncr/ncr.c optional ncr pci dev/ncv/ncr53c500.c optional ncv dev/ncv/ncr53c500_pccard.c optional ncv pccard dev/netmap/netmap.c optional netmap +dev/netmap/netmap_bdg.c optional netmap dev/netmap/netmap_freebsd.c optional netmap dev/netmap/netmap_generic.c optional netmap +dev/netmap/netmap_kloop.c optional netmap +dev/netmap/netmap_legacy.c optional netmap dev/netmap/netmap_mbq.c optional netmap dev/netmap/netmap_mem2.c optional netmap dev/netmap/netmap_monitor.c optional netmap +dev/netmap/netmap_null.c optional netmap dev/netmap/netmap_offloadings.c optional netmap dev/netmap/netmap_pipe.c optional netmap dev/netmap/netmap_pt.c optional netmap dev/netmap/netmap_vale.c optional netmap -dev/netmap/netmap_legacy.c optional netmap -dev/netmap/netmap_bdg.c optional netmap # compile-with "${NORMAL_C} -Wconversion -Wextra" dev/nfsmb/nfsmb.c optional nfsmb pci dev/nge/if_nge.c optional nge Modified: stable/11/sys/dev/netmap/if_ixl_netmap.h ============================================================================== --- stable/11/sys/dev/netmap/if_ixl_netmap.h Thu Dec 13 09:40:06 2018 (r342032) +++ stable/11/sys/dev/netmap/if_ixl_netmap.h Thu Dec 13 10:13:29 2018 (r342033) @@ -130,7 +130,7 @@ ixl_netmap_attach(struct ixl_vsi *vsi) na.ifp = vsi->ifp; na.na_flags = NAF_BDG_MAYSLEEP; // XXX check that queues is set. - nm_prinf("queues is %p\n", vsi->queues); + nm_prinf("queues is %p", vsi->queues); if (vsi->queues) { na.num_tx_desc = vsi->queues[0].num_desc; na.num_rx_desc = vsi->queues[0].num_desc; Modified: stable/11/sys/dev/netmap/if_vtnet_netmap.h ============================================================================== --- stable/11/sys/dev/netmap/if_vtnet_netmap.h Thu Dec 13 09:40:06 2018 (r342032) +++ stable/11/sys/dev/netmap/if_vtnet_netmap.h Thu Dec 13 10:13:29 2018 (r342033) @@ -79,7 +79,7 @@ vtnet_free_used(struct virtqueue *vq, int netmap_bufs, } if (deq) - nm_prinf("%d sgs dequeued from %s-%d (netmap=%d)\n", + nm_prinf("%d sgs dequeued from %s-%d (netmap=%d)", deq, nm_txrx2str(t), idx, netmap_bufs); } @@ -230,7 +230,7 @@ vtnet_netmap_txsync(struct netmap_kring *kring, int fl /*writeable=*/0); if (unlikely(err)) { if (err != ENOSPC) - nm_prerr("virtqueue_enqueue(%s) failed: %d\n", + nm_prerr("virtqueue_enqueue(%s) failed: %d", kring->name, err); break; } @@ -251,7 +251,7 @@ vtnet_netmap_txsync(struct netmap_kring *kring, int fl if (token == NULL) break; if (unlikely(token != (void *)txq)) - nm_prerr("BUG: TX token mismatch\n"); + nm_prerr("BUG: TX token mismatch"); else n++; } @@ -307,7 +307,7 @@ vtnet_netmap_kring_refill(struct netmap_kring *kring, /*readable=*/0, /*writeable=*/sg.sg_nseg); if (unlikely(err)) { if (err != ENOSPC) - nm_prerr("virtqueue_enqueue(%s) failed: %d\n", + nm_prerr("virtqueue_enqueue(%s) failed: %d", kring->name, err); break; } @@ -391,7 +391,7 @@ vtnet_netmap_rxsync(struct netmap_kring *kring, int fl break; } if (unlikely(token != (void *)rxq)) { - nm_prerr("BUG: RX token mismatch\n"); + nm_prerr("BUG: RX token mismatch"); } else { /* Skip the virtio-net header. */ len -= sc->vtnet_hdr_size; @@ -533,7 +533,7 @@ vtnet_netmap_attach(struct vtnet_softc *sc) netmap_attach(&na); - nm_prinf("vtnet attached txq=%d, txd=%d rxq=%d, rxd=%d\n", + nm_prinf("vtnet attached txq=%d, txd=%d rxq=%d, rxd=%d", na.num_tx_rings, na.num_tx_desc, na.num_tx_rings, na.num_rx_desc); } Modified: stable/11/sys/dev/netmap/netmap.c ============================================================================== --- stable/11/sys/dev/netmap/netmap.c Thu Dec 13 09:40:06 2018 (r342032) +++ stable/11/sys/dev/netmap/netmap.c Thu Dec 13 10:13:29 2018 (r342033) @@ -478,6 +478,9 @@ ports attached to the switch) /* user-controlled variables */ int netmap_verbose; +#ifdef CONFIG_NETMAP_DEBUG +int netmap_debug; +#endif /* CONFIG_NETMAP_DEBUG */ static int netmap_no_timestamp; /* don't timestamp on rxsync */ int netmap_mitigate = 1; @@ -526,9 +529,6 @@ int netmap_generic_hwcsum = 0; /* Non-zero if ptnet devices are allowed to use virtio-net headers. */ int ptnet_vnet_hdr = 1; -/* 0 if ptnetmap should not use worker threads for TX processing */ -int ptnetmap_tx_workers = 1; - /* * SYSCTL calls are grouped between SYSBEGIN and SYSEND to be emulated * in some other operating systems @@ -539,6 +539,10 @@ SYSCTL_DECL(_dev_netmap); SYSCTL_NODE(_dev, OID_AUTO, netmap, CTLFLAG_RW, 0, "Netmap args"); SYSCTL_INT(_dev_netmap, OID_AUTO, verbose, CTLFLAG_RW, &netmap_verbose, 0, "Verbose mode"); +#ifdef CONFIG_NETMAP_DEBUG +SYSCTL_INT(_dev_netmap, OID_AUTO, debug, + CTLFLAG_RW, &netmap_debug, 0, "Debug messages"); +#endif /* CONFIG_NETMAP_DEBUG */ SYSCTL_INT(_dev_netmap, OID_AUTO, no_timestamp, CTLFLAG_RW, &netmap_no_timestamp, 0, "no_timestamp"); SYSCTL_INT(_dev_netmap, OID_AUTO, no_pendintr, CTLFLAG_RW, &netmap_no_pendintr, @@ -570,8 +574,6 @@ SYSCTL_INT(_dev_netmap, OID_AUTO, generic_txqdisc, CTL #endif SYSCTL_INT(_dev_netmap, OID_AUTO, ptnet_vnet_hdr, CTLFLAG_RW, &ptnet_vnet_hdr, 0, "Allow ptnet devices to use virtio-net headers"); -SYSCTL_INT(_dev_netmap, OID_AUTO, ptnetmap_tx_workers, CTLFLAG_RW, - &ptnetmap_tx_workers, 0, "Use worker threads for pnetmap TX processing"); SYSEND; @@ -693,7 +695,7 @@ nm_bound_var(u_int *v, u_int dflt, u_int lo, u_int hi, op = "Clamp"; } if (op && msg) - nm_prinf("%s %s to %d (was %d)\n", op, msg, *v, oldv); + nm_prinf("%s %s to %d (was %d)", op, msg, *v, oldv); return *v; } @@ -777,13 +779,14 @@ netmap_update_config(struct netmap_adapter *na) na->num_rx_rings = info.num_rx_rings; na->num_rx_desc = info.num_rx_descs; na->rx_buf_maxsize = info.rx_buf_maxsize; - D("configuration changed for %s: txring %d x %d, " - "rxring %d x %d, rxbufsz %d", - na->name, na->num_tx_rings, na->num_tx_desc, - na->num_rx_rings, na->num_rx_desc, na->rx_buf_maxsize); + if (netmap_verbose) + nm_prinf("configuration changed for %s: txring %d x %d, " + "rxring %d x %d, rxbufsz %d", + na->name, na->num_tx_rings, na->num_tx_desc, + na->num_rx_rings, na->num_rx_desc, na->rx_buf_maxsize); return 0; } - D("WARNING: configuration changed for %s while active: " + nm_prerr("WARNING: configuration changed for %s while active: " "txring %d x %d, rxring %d x %d, rxbufsz %d", na->name, info.num_tx_rings, info.num_tx_descs, info.num_rx_rings, info.num_rx_descs, @@ -829,7 +832,8 @@ netmap_krings_create(struct netmap_adapter *na, u_int enum txrx t; if (na->tx_rings != NULL) { - D("warning: krings were already created"); + if (netmap_debug & NM_DEBUG_ON) + nm_prerr("warning: krings were already created"); return 0; } @@ -843,7 +847,7 @@ netmap_krings_create(struct netmap_adapter *na, u_int na->tx_rings = nm_os_malloc((size_t)len); if (na->tx_rings == NULL) { - D("Cannot allocate krings"); + nm_prerr("Cannot allocate krings"); return ENOMEM; } na->rx_rings = na->tx_rings + n[NR_TX]; @@ -911,7 +915,8 @@ netmap_krings_delete(struct netmap_adapter *na) enum txrx t; if (na->tx_rings == NULL) { - D("warning: krings were already deleted"); + if (netmap_debug & NM_DEBUG_ON) + nm_prerr("warning: krings were already deleted"); return; } @@ -1013,11 +1018,11 @@ netmap_do_unregif(struct netmap_priv_d *priv) * happens if the close() occurs while a concurrent * syscall is running. */ - if (netmap_verbose) - D("deleting last instance for %s", na->name); + if (netmap_debug & NM_DEBUG_ON) + nm_prinf("deleting last instance for %s", na->name); if (nm_netmap_on(na)) { - D("BUG: netmap on while going to delete the krings"); + nm_prerr("BUG: netmap on while going to delete the krings"); } na->nm_krings_delete(na); @@ -1034,14 +1039,6 @@ netmap_do_unregif(struct netmap_priv_d *priv) priv->np_nifp = NULL; } -/* call with NMG_LOCK held */ -static __inline int -nm_si_user(struct netmap_priv_d *priv, enum txrx t) -{ - return (priv->np_na != NULL && - (priv->np_qlast[t] - priv->np_qfirst[t] > 1)); -} - struct netmap_priv_d* netmap_priv_new(void) { @@ -1137,8 +1134,8 @@ netmap_send_up(struct ifnet *dst, struct mbq *q) /* Send packets up, outside the lock; head/prev machinery * is only useful for Windows. */ while ((m = mbq_dequeue(q)) != NULL) { - if (netmap_verbose & NM_VERB_HOST) - D("sending up pkt %p size %d", m, MBUF_LEN(m)); + if (netmap_debug & NM_DEBUG_HOST) + nm_prinf("sending up pkt %p size %d", m, MBUF_LEN(m)); prev = nm_os_send_up(dst, m, prev); if (head == NULL) head = prev; @@ -1333,8 +1330,8 @@ netmap_rxsync_from_host(struct netmap_kring *kring, in m_copydata(m, 0, len, NMB(na, slot)); ND("nm %d len %d", nm_i, len); - if (netmap_verbose) - D("%s", nm_dump_buf(NMB(na, slot),len, 128, NULL)); + if (netmap_debug & NM_DEBUG_HOST) + nm_prinf("%s", nm_dump_buf(NMB(na, slot),len, 128, NULL)); slot->len = len; slot->flags = 0; @@ -1501,7 +1498,7 @@ netmap_get_na(struct nmreq_header *hdr, if (req->nr_mode == NR_REG_PIPE_MASTER || req->nr_mode == NR_REG_PIPE_SLAVE) { /* Do not accept deprecated pipe modes. */ - D("Deprecated pipe nr_mode, use xx{yy or xx}yy syntax"); + nm_prerr("Deprecated pipe nr_mode, use xx{yy or xx}yy syntax"); return EINVAL; } @@ -1528,9 +1525,7 @@ netmap_get_na(struct nmreq_header *hdr, * 0 !NULL type matches and na created/found * !0 !NULL impossible */ - - /* try to see if this is a ptnetmap port */ - error = netmap_get_pt_host_na(hdr, na, nmd, create); + error = netmap_get_null_na(hdr, na, nmd, create); if (error || *na != NULL) goto out; @@ -1740,7 +1735,7 @@ nm_rxsync_prologue(struct netmap_kring *kring, struct /* * Error routine called when txsync/rxsync detects an error. - * Can't do much more than resetting head =cur = hwcur, tail = hwtail + * Can't do much more than resetting head = cur = hwcur, tail = hwtail * Return 1 on reinit. * * This routine is only called by the upper half of the kernel. @@ -1811,12 +1806,6 @@ netmap_interp_ringid(struct netmap_priv_d *priv, uint3 enum txrx t; u_int j; - if ((nr_flags & NR_PTNETMAP_HOST) && ((nr_mode != NR_REG_ALL_NIC) || - nr_flags & (NR_RX_RINGS_ONLY|NR_TX_RINGS_ONLY))) { - D("Error: only NR_REG_ALL_NIC supported with netmap passthrough"); - return EINVAL; - } - for_rx_tx(t) { if (nr_flags & excluded_direction[t]) { priv->np_qfirst[t] = priv->np_qlast[t] = 0; @@ -1824,6 +1813,7 @@ netmap_interp_ringid(struct netmap_priv_d *priv, uint3 } switch (nr_mode) { case NR_REG_ALL_NIC: + case NR_REG_NULL: priv->np_qfirst[t] = 0; priv->np_qlast[t] = nma_get_nrings(na, t); ND("ALL/PIPE: %s %d %d", nm_txrx2str(t), @@ -1832,7 +1822,7 @@ netmap_interp_ringid(struct netmap_priv_d *priv, uint3 case NR_REG_SW: case NR_REG_NIC_SW: if (!(na->na_flags & NAF_HOST_RINGS)) { - D("host rings not supported"); + nm_prerr("host rings not supported"); return EINVAL; } priv->np_qfirst[t] = (nr_mode == NR_REG_SW ? @@ -1845,7 +1835,7 @@ netmap_interp_ringid(struct netmap_priv_d *priv, uint3 case NR_REG_ONE_NIC: if (nr_ringid >= na->num_tx_rings && nr_ringid >= na->num_rx_rings) { - D("invalid ring id %d", nr_ringid); + nm_prerr("invalid ring id %d", nr_ringid); return EINVAL; } /* if not enough rings, use the first one */ @@ -1858,11 +1848,11 @@ netmap_interp_ringid(struct netmap_priv_d *priv, uint3 priv->np_qfirst[t], priv->np_qlast[t]); break; default: - D("invalid regif type %d", nr_mode); + nm_prerr("invalid regif type %d", nr_mode); return EINVAL; } } - priv->np_flags = nr_flags | nr_mode; // TODO + priv->np_flags = nr_flags; /* Allow transparent forwarding mode in the host --> nic * direction only if all the TX hw rings have been opened. */ @@ -1872,7 +1862,7 @@ netmap_interp_ringid(struct netmap_priv_d *priv, uint3 } if (netmap_verbose) { - D("%s: tx [%d,%d) rx [%d,%d) id %d", + nm_prinf("%s: tx [%d,%d) rx [%d,%d) id %d", na->name, priv->np_qfirst[NR_TX], priv->np_qlast[NR_TX], @@ -1928,6 +1918,7 @@ netmap_unset_ringid(struct netmap_priv_d *priv) } priv->np_flags = 0; priv->np_txpoll = 0; + priv->np_kloop_state = 0; } @@ -1944,8 +1935,8 @@ netmap_krings_get(struct netmap_priv_d *priv) int excl = (priv->np_flags & NR_EXCLUSIVE); enum txrx t; - if (netmap_verbose) - D("%s: grabbing tx [%d, %d) rx [%d, %d)", + if (netmap_debug & NM_DEBUG_ON) + nm_prinf("%s: grabbing tx [%d, %d) rx [%d, %d)", na->name, priv->np_qfirst[NR_TX], priv->np_qlast[NR_TX], @@ -2022,6 +2013,110 @@ nm_priv_rx_enabled(struct netmap_priv_d *priv) return (priv->np_qfirst[NR_RX] != priv->np_qlast[NR_RX]); } +/* Validate the CSB entries for both directions (atok and ktoa). + * To be called under NMG_LOCK(). */ +static int +netmap_csb_validate(struct netmap_priv_d *priv, struct nmreq_opt_csb *csbo) +{ + struct nm_csb_atok *csb_atok_base = + (struct nm_csb_atok *)(uintptr_t)csbo->csb_atok; + struct nm_csb_ktoa *csb_ktoa_base = + (struct nm_csb_ktoa *)(uintptr_t)csbo->csb_ktoa; + enum txrx t; + int num_rings[NR_TXRX], tot_rings; + size_t entry_size[2]; + void *csb_start[2]; + int i; + + if (priv->np_kloop_state & NM_SYNC_KLOOP_RUNNING) { + nm_prerr("Cannot update CSB while kloop is running"); + return EBUSY; + } + + tot_rings = 0; + for_rx_tx(t) { + num_rings[t] = priv->np_qlast[t] - priv->np_qfirst[t]; + tot_rings += num_rings[t]; + } + if (tot_rings <= 0) + return 0; + + if (!(priv->np_flags & NR_EXCLUSIVE)) { + nm_prerr("CSB mode requires NR_EXCLUSIVE"); + return EINVAL; + } + + entry_size[0] = sizeof(*csb_atok_base); + entry_size[1] = sizeof(*csb_ktoa_base); + csb_start[0] = (void *)csb_atok_base; + csb_start[1] = (void *)csb_ktoa_base; + + for (i = 0; i < 2; i++) { + /* On Linux we could use access_ok() to simplify + * the validation. However, the advantage of + * this approach is that it works also on + * FreeBSD. */ + size_t csb_size = tot_rings * entry_size[i]; + void *tmp; + int err; + + if ((uintptr_t)csb_start[i] & (entry_size[i]-1)) { + nm_prerr("Unaligned CSB address"); + return EINVAL; + } + + tmp = nm_os_malloc(csb_size); + if (!tmp) + return ENOMEM; + if (i == 0) { + /* Application --> kernel direction. */ + err = copyin(csb_start[i], tmp, csb_size); + } else { + /* Kernel --> application direction. */ + memset(tmp, 0, csb_size); + err = copyout(tmp, csb_start[i], csb_size); + } + nm_os_free(tmp); + if (err) { + nm_prerr("Invalid CSB address"); + return err; + } + } + + priv->np_csb_atok_base = csb_atok_base; + priv->np_csb_ktoa_base = csb_ktoa_base; + + /* Initialize the CSB. */ + for_rx_tx(t) { + for (i = 0; i < num_rings[t]; i++) { + struct netmap_kring *kring = + NMR(priv->np_na, t)[i + priv->np_qfirst[t]]; + struct nm_csb_atok *csb_atok = csb_atok_base + i; + struct nm_csb_ktoa *csb_ktoa = csb_ktoa_base + i; + + if (t == NR_RX) { + csb_atok += num_rings[NR_TX]; + csb_ktoa += num_rings[NR_TX]; + } + + CSB_WRITE(csb_atok, head, kring->rhead); + CSB_WRITE(csb_atok, cur, kring->rcur); + CSB_WRITE(csb_atok, appl_need_kick, 1); + CSB_WRITE(csb_atok, sync_flags, 1); + CSB_WRITE(csb_ktoa, hwcur, kring->nr_hwcur); + CSB_WRITE(csb_ktoa, hwtail, kring->nr_hwtail); + CSB_WRITE(csb_ktoa, kern_need_kick, 1); + + nm_prinf("csb_init for kring %s: head %u, cur %u, " + "hwcur %u, hwtail %u", kring->name, + kring->rhead, kring->rcur, kring->nr_hwcur, + kring->nr_hwtail); + } + } + + return 0; +} + /* * possibly move the interface to netmap-mode. * If success it returns a pointer to netmap_if, otherwise NULL. @@ -2138,7 +2233,7 @@ netmap_do_regif(struct netmap_priv_d *priv, struct net na->name, mtu, na->rx_buf_maxsize, nbs); if (na->rx_buf_maxsize == 0) { - D("%s: error: rx_buf_maxsize == 0", na->name); + nm_prerr("%s: error: rx_buf_maxsize == 0", na->name); error = EIO; goto err_drop_mem; } @@ -2150,7 +2245,7 @@ netmap_do_regif(struct netmap_priv_d *priv, struct net * cannot be used in this case. */ if (nbs < mtu) { nm_prerr("error: netmap buf size (%u) " - "< device MTU (%u)\n", nbs, mtu); + "< device MTU (%u)", nbs, mtu); error = EINVAL; goto err_drop_mem; } @@ -2163,14 +2258,14 @@ netmap_do_regif(struct netmap_priv_d *priv, struct net if (!(na->na_flags & NAF_MOREFRAG)) { nm_prerr("error: large MTU (%d) needed " "but %s does not support " - "NS_MOREFRAG\n", mtu, + "NS_MOREFRAG", mtu, na->ifp->if_xname); error = EINVAL; goto err_drop_mem; } else if (nbs < na->rx_buf_maxsize) { nm_prerr("error: using NS_MOREFRAG on " "%s requires netmap buf size " - ">= %u\n", na->ifp->if_xname, + ">= %u", na->ifp->if_xname, na->rx_buf_maxsize); error = EINVAL; goto err_drop_mem; @@ -2178,7 +2273,7 @@ netmap_do_regif(struct netmap_priv_d *priv, struct net nm_prinf("info: netmap application on " "%s needs to support " "NS_MOREFRAG " - "(MTU=%u,netmap_buf_size=%u)\n", + "(MTU=%u,netmap_buf_size=%u)", na->ifp->if_xname, mtu, nbs); } } @@ -2308,7 +2403,6 @@ netmap_ioctl(struct netmap_priv_d *priv, u_long cmd, c struct ifnet *ifp = NULL; int error = 0; u_int i, qfirst, qlast; - struct netmap_if *nifp; struct netmap_kring **krings; int sync_flags; enum txrx t; @@ -2317,14 +2411,10 @@ netmap_ioctl(struct netmap_priv_d *priv, u_long cmd, c case NIOCCTRL: { struct nmreq_header *hdr = (struct nmreq_header *)data; - if (hdr->nr_version != NETMAP_API) { - D("API mismatch for reqtype %d: got %d need %d", - hdr->nr_version, - hdr->nr_version, NETMAP_API); - hdr->nr_version = NETMAP_API; - } if (hdr->nr_version < NETMAP_MIN_API || hdr->nr_version > NETMAP_MAX_API) { + nm_prerr("API mismatch: got %d need %d", + hdr->nr_version, NETMAP_API); return EINVAL; } @@ -2346,13 +2436,13 @@ netmap_ioctl(struct netmap_priv_d *priv, u_long cmd, c case NETMAP_REQ_REGISTER: { struct nmreq_register *req = (struct nmreq_register *)(uintptr_t)hdr->nr_body; + struct netmap_if *nifp; + /* Protect access to priv from concurrent requests. */ NMG_LOCK(); do { - u_int memflags; -#ifdef WITH_EXTMEM struct nmreq_option *opt; -#endif /* WITH_EXTMEM */ + u_int memflags; if (priv->np_nifp != NULL) { /* thread already registered */ error = EBUSY; @@ -2383,6 +2473,10 @@ netmap_ioctl(struct netmap_priv_d *priv, u_long cmd, c /* find the allocator and get a reference */ nmd = netmap_mem_find(req->nr_mem_id); if (nmd == NULL) { + if (netmap_verbose) { + nm_prerr("%s: failed to find mem_id %u", + hdr->nr_name, req->nr_mem_id); + } error = EINVAL; break; } @@ -2398,6 +2492,8 @@ netmap_ioctl(struct netmap_priv_d *priv, u_long cmd, c } if (na->virt_hdr_len && !(req->nr_flags & NR_ACCEPT_VNET_HDR)) { + nm_prerr("virt_hdr_len=%d, but application does " + "not accept it", na->virt_hdr_len); error = EIO; break; } @@ -2407,6 +2503,23 @@ netmap_ioctl(struct netmap_priv_d *priv, u_long cmd, c if (error) { /* reg. failed, release priv and ref */ break; } + + opt = nmreq_findoption((struct nmreq_option *)(uintptr_t)hdr->nr_options, + NETMAP_REQ_OPT_CSB); + if (opt != NULL) { + struct nmreq_opt_csb *csbo = + (struct nmreq_opt_csb *)opt; + error = nmreq_checkduplicate(opt); + if (!error) { + error = netmap_csb_validate(priv, csbo); + } + opt->nro_status = error; + if (error) { + netmap_do_unregif(priv); + break; + } + } + nifp = priv->np_nifp; priv->np_td = td; /* for debugging purposes */ @@ -2431,12 +2544,12 @@ netmap_ioctl(struct netmap_priv_d *priv, u_long cmd, c if (req->nr_extra_bufs) { if (netmap_verbose) - D("requested %d extra buffers", + nm_prinf("requested %d extra buffers", req->nr_extra_bufs); req->nr_extra_bufs = netmap_extra_alloc(na, &nifp->ni_bufs_head, req->nr_extra_bufs); if (netmap_verbose) - D("got %d extra buffers", req->nr_extra_bufs); + nm_prinf("got %d extra buffers", req->nr_extra_bufs); } req->nr_offset = netmap_mem_if_offset(na->nm_mem, nifp); @@ -2474,6 +2587,7 @@ netmap_ioctl(struct netmap_priv_d *priv, u_long cmd, c * so that we can call netmap_get_na(). */ struct nmreq_register regreq; bzero(®req, sizeof(regreq)); + regreq.nr_mode = NR_REG_ALL_NIC; regreq.nr_tx_slots = req->nr_tx_slots; regreq.nr_rx_slots = req->nr_rx_slots; regreq.nr_tx_rings = req->nr_tx_rings; @@ -2495,6 +2609,10 @@ netmap_ioctl(struct netmap_priv_d *priv, u_long cmd, c } else { nmd = netmap_mem_find(req->nr_mem_id ? req->nr_mem_id : 1); if (nmd == NULL) { + if (netmap_verbose) + nm_prerr("%s: failed to find mem_id %u", + hdr->nr_name, + req->nr_mem_id ? req->nr_mem_id : 1); error = EINVAL; break; } @@ -2506,8 +2624,6 @@ netmap_ioctl(struct netmap_priv_d *priv, u_long cmd, c break; if (na == NULL) /* only memory info */ break; - req->nr_offset = 0; - req->nr_rx_slots = req->nr_tx_slots = 0; netmap_update_config(na); req->nr_rx_rings = na->num_rx_rings; req->nr_tx_rings = na->num_tx_rings; @@ -2520,17 +2636,17 @@ netmap_ioctl(struct netmap_priv_d *priv, u_long cmd, c } #ifdef WITH_VALE case NETMAP_REQ_VALE_ATTACH: { - error = nm_bdg_ctl_attach(hdr, NULL /* userspace request */); + error = netmap_vale_attach(hdr, NULL /* userspace request */); break; } case NETMAP_REQ_VALE_DETACH: { - error = nm_bdg_ctl_detach(hdr, NULL /* userspace request */); + error = netmap_vale_detach(hdr, NULL /* userspace request */); break; } case NETMAP_REQ_VALE_LIST: { - error = netmap_bdg_list(hdr); + error = netmap_vale_list(hdr); break; } @@ -2541,12 +2657,16 @@ netmap_ioctl(struct netmap_priv_d *priv, u_long cmd, c * so that we can call netmap_get_bdg_na(). */ struct nmreq_register regreq; bzero(®req, sizeof(regreq)); + regreq.nr_mode = NR_REG_ALL_NIC; + /* For now we only support virtio-net headers, and only for * VALE ports, but this may change in future. Valid lengths * for the virtio-net header are 0 (no header), 10 and 12. */ if (req->nr_hdr_len != 0 && req->nr_hdr_len != sizeof(struct nm_vnet_hdr) && req->nr_hdr_len != 12) { + if (netmap_verbose) + nm_prerr("invalid hdr_len %u", req->nr_hdr_len); error = EINVAL; break; } @@ -2563,7 +2683,8 @@ netmap_ioctl(struct netmap_priv_d *priv, u_long cmd, c if (na->virt_hdr_len) { vpna->mfs = NETMAP_BUF_SIZE(na); } - D("Using vnet_hdr_len %d for %p", na->virt_hdr_len, na); + if (netmap_verbose) + nm_prinf("Using vnet_hdr_len %d for %p", na->virt_hdr_len, na); netmap_adapter_put(na); } else if (!na) { error = ENXIO; @@ -2582,6 +2703,7 @@ netmap_ioctl(struct netmap_priv_d *priv, u_long cmd, c struct ifnet *ifp; bzero(®req, sizeof(regreq)); + regreq.nr_mode = NR_REG_ALL_NIC; NMG_LOCK(); hdr->nr_reqtype = NETMAP_REQ_REGISTER; hdr->nr_body = (uintptr_t)®req; @@ -2613,22 +2735,80 @@ netmap_ioctl(struct netmap_priv_d *priv, u_long cmd, c } #endif /* WITH_VALE */ case NETMAP_REQ_POOLS_INFO_GET: { + /* Get information from the memory allocator used for + * hdr->nr_name. */ struct nmreq_pools_info *req = (struct nmreq_pools_info *)(uintptr_t)hdr->nr_body; - /* Get information from the memory allocator. This - * netmap device must already be bound to a port. - * Note that hdr->nr_name is ignored. */ NMG_LOCK(); - if (priv->np_na && priv->np_na->nm_mem) { - struct netmap_mem_d *nmd = priv->np_na->nm_mem; + do { + /* Build a nmreq_register out of the nmreq_pools_info, + * so that we can call netmap_get_na(). */ + struct nmreq_register regreq; + bzero(®req, sizeof(regreq)); + regreq.nr_mem_id = req->nr_mem_id; + regreq.nr_mode = NR_REG_ALL_NIC; + + hdr->nr_reqtype = NETMAP_REQ_REGISTER; + hdr->nr_body = (uintptr_t)®req; + error = netmap_get_na(hdr, &na, &ifp, NULL, 1 /* create */); + hdr->nr_reqtype = NETMAP_REQ_POOLS_INFO_GET; /* reset type */ + hdr->nr_body = (uintptr_t)req; /* reset nr_body */ + if (error) { + na = NULL; + ifp = NULL; + break; + } + nmd = na->nm_mem; /* grab the memory allocator */ + if (nmd == NULL) { + error = EINVAL; + break; + } + + /* Finalize the memory allocator, get the pools + * information and release the allocator. */ + error = netmap_mem_finalize(nmd, na); + if (error) { + break; + } error = netmap_mem_pools_info_get(req, nmd); - } else { + netmap_mem_drop(na); + } while (0); + netmap_unget_na(na, ifp); + NMG_UNLOCK(); + break; + } + + case NETMAP_REQ_CSB_ENABLE: { + struct nmreq_option *opt; + + opt = nmreq_findoption((struct nmreq_option *)(uintptr_t)hdr->nr_options, + NETMAP_REQ_OPT_CSB); + if (opt == NULL) { error = EINVAL; + } else { + struct nmreq_opt_csb *csbo = + (struct nmreq_opt_csb *)opt; + error = nmreq_checkduplicate(opt); + if (!error) { + NMG_LOCK(); + error = netmap_csb_validate(priv, csbo); + NMG_UNLOCK(); + } + opt->nro_status = error; } - NMG_UNLOCK(); break; } + case NETMAP_REQ_SYNC_KLOOP_START: { + error = netmap_sync_kloop(priv, hdr); + break; + } + + case NETMAP_REQ_SYNC_KLOOP_STOP: { + error = netmap_sync_kloop_stop(priv); + break; + } + default: { error = EINVAL; break; @@ -2642,22 +2822,20 @@ netmap_ioctl(struct netmap_priv_d *priv, u_long cmd, c case NIOCTXSYNC: case NIOCRXSYNC: { - nifp = priv->np_nifp; - - if (nifp == NULL) { + if (unlikely(priv->np_nifp == NULL)) { error = ENXIO; break; } mb(); /* make sure following reads are not from cache */ - na = priv->np_na; /* we have a reference */ - - if (na == NULL) { - D("Internal error: nifp != NULL && na == NULL"); - error = ENXIO; + if (unlikely(priv->np_csb_atok_base)) { + nm_prerr("Invalid sync in CSB mode"); + error = EBUSY; break; } + na = priv->np_na; /* we have a reference */ + mbq_init(&q); t = (cmd == NIOCTXSYNC ? NR_TX : NR_RX); krings = NMR(na, t); @@ -2675,8 +2853,8 @@ netmap_ioctl(struct netmap_priv_d *priv, u_long cmd, c } if (cmd == NIOCTXSYNC) { - if (netmap_verbose & NM_VERB_TXSYNC) - D("pre txsync ring %d cur %d hwcur %d", + if (netmap_debug & NM_DEBUG_TXSYNC) + nm_prinf("pre txsync ring %d cur %d hwcur %d", i, ring->cur, kring->nr_hwcur); if (nm_txsync_prologue(kring, ring) >= kring->nkr_num_slots) { @@ -2684,8 +2862,8 @@ netmap_ioctl(struct netmap_priv_d *priv, u_long cmd, c } else if (kring->nm_sync(kring, sync_flags | NAF_FORCE_RECLAIM) == 0) { nm_sync_finalize(kring); } - if (netmap_verbose & NM_VERB_TXSYNC) - D("post txsync ring %d cur %d hwcur %d", + if (netmap_debug & NM_DEBUG_TXSYNC) + nm_prinf("post txsync ring %d cur %d hwcur %d", i, ring->cur, kring->nr_hwcur); } else { @@ -2740,18 +2918,22 @@ nmreq_size_by_type(uint16_t nr_reqtype) case NETMAP_REQ_VALE_NEWIF: return sizeof(struct nmreq_vale_newif); case NETMAP_REQ_VALE_DELIF: + case NETMAP_REQ_SYNC_KLOOP_STOP: + case NETMAP_REQ_CSB_ENABLE: return 0; case NETMAP_REQ_VALE_POLLING_ENABLE: case NETMAP_REQ_VALE_POLLING_DISABLE: return sizeof(struct nmreq_vale_polling); case NETMAP_REQ_POOLS_INFO_GET: return sizeof(struct nmreq_pools_info); + case NETMAP_REQ_SYNC_KLOOP_START: + return sizeof(struct nmreq_sync_kloop_start); } return 0; } static size_t -nmreq_opt_size_by_type(uint16_t nro_reqtype) +nmreq_opt_size_by_type(uint32_t nro_reqtype, uint64_t nro_size) { size_t rv = sizeof(struct nmreq_option); #ifdef NETMAP_REQ_OPT_DEBUG @@ -2764,6 +2946,13 @@ nmreq_opt_size_by_type(uint16_t nro_reqtype) rv = sizeof(struct nmreq_opt_extmem); break; #endif /* WITH_EXTMEM */ + case NETMAP_REQ_OPT_SYNC_KLOOP_EVENTFDS: + if (nro_size >= rv) + rv = nro_size; + break; + case NETMAP_REQ_OPT_CSB: + rv = sizeof(struct nmreq_opt_csb); + break; } /* subtract the common header */ return rv - sizeof(struct nmreq_option); @@ -2779,8 +2968,11 @@ nmreq_copyin(struct nmreq_header *hdr, int nr_body_is_ struct nmreq_option buf; uint64_t *ptrs; - if (hdr->nr_reserved) + if (hdr->nr_reserved) { + if (netmap_verbose) + nm_prerr("nr_reserved must be zero"); return EINVAL; + } if (!nr_body_is_user) return 0; @@ -2797,6 +2989,8 @@ nmreq_copyin(struct nmreq_header *hdr, int nr_body_is_ (!rqsz && hdr->nr_body != (uintptr_t)NULL)) { /* Request body expected, but not found; or * request body found but unexpected. */ + if (netmap_verbose) + nm_prerr("nr_body expected but not found, or vice versa"); error = EINVAL; goto out_err; } @@ -2810,7 +3004,7 @@ nmreq_copyin(struct nmreq_header *hdr, int nr_body_is_ if (error) goto out_err; optsz += sizeof(*src); - optsz += nmreq_opt_size_by_type(buf.nro_reqtype); + optsz += nmreq_opt_size_by_type(buf.nro_reqtype, buf.nro_size); if (rqsz + optsz > NETMAP_REQ_MAXSIZE) { error = EMSGSIZE; goto out_err; @@ -2864,7 +3058,8 @@ nmreq_copyin(struct nmreq_header *hdr, int nr_body_is_ p = (char *)(opt + 1); /* copy the option body */ - optsz = nmreq_opt_size_by_type(opt->nro_reqtype); + optsz = nmreq_opt_size_by_type(opt->nro_reqtype, + opt->nro_size); if (optsz) { /* the option body follows the option header */ error = copyin(src + 1, p, optsz); @@ -2938,7 +3133,8 @@ nmreq_copyout(struct nmreq_header *hdr, int rerror) /* copy the option body only if there was no error */ if (!rerror && !src->nro_status) { - optsz = nmreq_opt_size_by_type(src->nro_reqtype); + optsz = nmreq_opt_size_by_type(src->nro_reqtype, + src->nro_size); if (optsz) { error = copyout(src + 1, dst + 1, optsz); if (error) { @@ -3016,7 +3212,8 @@ netmap_poll(struct netmap_priv_d *priv, int events, NM struct netmap_adapter *na; struct netmap_kring *kring; struct netmap_ring *ring; - u_int i, check_all_tx, check_all_rx, want[NR_TXRX], revents = 0; + u_int i, want[NR_TXRX], revents = 0; + NM_SELINFO_T *si[NR_TXRX]; #define want_tx want[NR_TX] #define want_rx want[NR_RX] struct mbq q; /* packets from RX hw queues to host stack */ @@ -3039,27 +3236,31 @@ netmap_poll(struct netmap_priv_d *priv, int events, NM mbq_init(&q); - if (priv->np_nifp == NULL) { - D("No if registered"); + if (unlikely(priv->np_nifp == NULL)) { return POLLERR; } mb(); /* make sure following reads are not from cache */ na = priv->np_na; - if (!nm_netmap_on(na)) + if (unlikely(!nm_netmap_on(na))) return POLLERR; - if (netmap_verbose & 0x8000) - D("device %s events 0x%x", na->name, events); + if (unlikely(priv->np_csb_atok_base)) { + nm_prerr("Invalid poll in CSB mode"); + return POLLERR; + } + + if (netmap_debug & NM_DEBUG_ON) + nm_prinf("device %s events 0x%x", na->name, events); want_tx = events & (POLLOUT | POLLWRNORM); want_rx = events & (POLLIN | POLLRDNORM); /* - * check_all_{tx|rx} are set if the card has more than one queue AND - * the file descriptor is bound to all of them. If so, we sleep on - * the "global" selinfo, otherwise we sleep on individual selinfo - * (FreeBSD only allows two selinfo's per file descriptor). + * If the card has more than one queue AND the file descriptor is + * bound to all of them, we sleep on the "global" selinfo, otherwise + * we sleep on individual selinfo (FreeBSD only allows two selinfo's + * per file descriptor). * The interrupt routine in the driver wake one or the other * (or both) depending on which clients are active. * @@ -3068,8 +3269,10 @@ netmap_poll(struct netmap_priv_d *priv, int events, NM * there are pending packets to send. The latter can be disabled * passing NETMAP_NO_TX_POLL in the NIOCREG call. */ - check_all_tx = nm_si_user(priv, NR_TX); - check_all_rx = nm_si_user(priv, NR_RX); + si[NR_RX] = nm_si_user(priv, NR_RX) ? &na->si[NR_RX] : + &na->rx_rings[priv->np_qfirst[NR_RX]]->si; + si[NR_TX] = nm_si_user(priv, NR_TX) ? &na->si[NR_TX] : + &na->tx_rings[priv->np_qfirst[NR_TX]]->si; #ifdef __FreeBSD__ /* @@ -3106,10 +3309,8 @@ netmap_poll(struct netmap_priv_d *priv, int events, NM #ifdef linux /* The selrecord must be unconditional on linux. */ - nm_os_selrecord(sr, check_all_tx ? - &na->si[NR_TX] : &na->tx_rings[priv->np_qfirst[NR_TX]]->si); - nm_os_selrecord(sr, check_all_rx ? - &na->si[NR_RX] : &na->rx_rings[priv->np_qfirst[NR_RX]]->si); + nm_os_selrecord(sr, si[NR_RX]); + nm_os_selrecord(sr, si[NR_TX]); #endif /* linux */ /* @@ -3174,8 +3375,7 @@ flush_tx: send_down = 0; *** DIFF OUTPUT TRUNCATED AT 1000 LINES ***