Date: Fri, 2 Dec 2016 14:22:47 +0000
From: Steven Crangle <Steven@stream-technologies.com>
To: Vincenzo Maffione <v.maffione@gmail.com>
Cc: FreeBSD Net <freebsd-net@freebsd.org>
Subject: Re: Looking for some help with netmap/bhyve
Message-ID: <DB5PR07MB16857B8F7A2B4783640074979B8E0@DB5PR07MB1685.eurprd07.prod.outlook.com>
In-Reply-To: <CA+_eA9jfQc2B18fktgSt_JB4cRqsVLKA_11eWPfsbAKBMiJmUA@mail.gmail.com>
References: <DB5PR07MB1685DAA40193595950A464FB9BB40@DB5PR07MB1685.eurprd07.prod.outlook.com>
 <CA+_eA9gggyo_ncSZDribOr+sWoFELbPKEeLdaZ8AZwgHAYjcRA@mail.gmail.com>
 <DB5PR07MB168582129D05D52878D3DCE59B8D0@DB5PR07MB1685.eurprd07.prod.outlook.com>
 <CA+_eA9jW_O_a5uRBAA9XSPspnwATrGhXM3NYusyXxemftn3uZw@mail.gmail.com>
 <DB5PR07MB1685470E0EB0B93A48E7AA229B8D0@DB5PR07MB1685.eurprd07.prod.outlook.com>
 <CA+_eA9jfQc2B18fktgSt_JB4cRqsVLKA_11eWPfsbAKBMiJmUA@mail.gmail.com>
[-- Attachment #1 --]

Hi Vincenzo,

Figured I would reply to give you some more testing feedback! Ptnetmap is successfully working for us now on 11.0-RELEASE (stock src.txz from FreeBSD.org) with the help of your patch! There were a few files missing from the patch though, so I've attached a revised one which should work, in case it's of use to you (it might be a bit messy).

I've attached a txt file with some testing results too, but in summary:

1) netmap with no ptnet or vale: between 65-70 Kpps.
2) netmap with vale but no ptnet: between 121-160 Kpps.
3) ptnetmap with vale: seems to vary quite a lot, between 7-15 Mpps.

All the tests were performed using pkt-gen with 60-byte packets. It's worth noting that we're currently running the tests on a very old, tired Sun Microsystems server (a Sun Fire X2200 M2 with a quad-core processor). We're in the process of testing out our patch file on a clean 11.0 source tree on a much faster box, so we can run more detailed testing there using iperf etc. if it's useful to you!

I have also attached a few kernel traces. I tried to turn debug symbols on, but I'm not sure it worked 100%; I will try to fix that for the next install so that I can provide better crash dumps if we run into any bugs! The crashes attached only ever occurred in setup number 2 above.

Thanks again for your help!

Kind Regards
Steven

Steven Crangle
Systems Developer | Stream Technologies | Glasgow, UK
+44 (0)844 800 8520 | www.stream-technologies.com

________________________________
From: Vincenzo Maffione <v.maffione@gmail.com>
Sent: 29 November 2016 18:42:04
To: Steven Crangle
Cc: FreeBSD Net
Subject: Re: Looking for some help with netmap/bhyve

Hi Steven,

2016-11-29 16:44 GMT+01:00 Steven Crangle <Steven@stream-technologies.com>:

Hi Vincenzo,

No problem! We've decided that we will try testing with 11.0 too, as it would help us to have our test system locked to a specific version, and it can hopefully help you with testing too. We'll compile netmap from source as advised, but I was wondering what steps we would take in order to port the changes to bhyve/ptnetmap from HEAD into 11.0?

I think you can get the patch containing my modifications to HEAD for bhyve+vmm.ko+libvmmapi by running the following command in my freebsd git repository:

$ git diff 9e10acc0303003d8733a546bca60f242b7b0aa64 ptnet-head lib/libvmmapi/ sys/amd64/ sys/modules/vmm/ usr.sbin/bhyve

and applying that patch in your 11.0 FreeBSD tree.
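For example, the whole sequence could look like this (the patch file name and the /usr/src location are just illustrative):

$ git clone https://github.com/vmaffione/freebsd
$ cd freebsd
$ git diff 9e10acc0303003d8733a546bca60f242b7b0aa64 ptnet-head \
      lib/libvmmapi/ sys/amd64/ sys/modules/vmm/ usr.sbin/bhyve > /tmp/ptnet-bhyve.patch
$ cd /usr/src    # your 11.0 source tree
$ patch -p1 < /tmp/ptnet-bhyve.patch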
I've currently only tested with pkt-gen, as our bhyve instances are pretty minimal right now. Below are a few samples of the speeds seen in each of the different configurations:

which configurations?

Using netmap on top of standard tap devices with no vale:

vm1:
197.259100 main_thread [2325] 30.977 Kpps (32.435 Kpkts 15.569 Mbps in 1047062 usec) 1.93 avg_batch 826 min_space
203.306126 main_thread [2325] 32.149 Kpps (32.411 Kpkts 15.557 Mbps in 1008156 usec) 1.88 avg_batch 866 min_space
204.313055 main_thread [2325] 33.095 Kpps (33.324 Kpkts 15.996 Mbps in 1006929 usec) 1.98 avg_batch 911 min_space

vm2:
245.397418 main_thread [2325] 31.422 Kpps (33.262 Kpkts 15.966 Mbps in 1058559 usec) 313.79 avg_batch 99999 min_space
246.429810 main_thread [2325] 31.254 Kpps (32.266 Kpkts 15.488 Mbps in 1032392 usec) 319.47 avg_batch 99999 min_space
251.621436 main_thread [2325] 31.606 Kpps (33.329 Kpkts 15.998 Mbps in 1054531 usec) 314.42 avg_batch 99999 min_space

Be aware that you typically want to use netmap both in the VM and in the host (to use VALE and ptnet). Here you are probably using netmap just in the VM. Performance is however extremely low: when using netmap in both guest and host, with ptnet, you should be able to get at least 10-30 Mpps from vm1 to vm2.

After then trying to switch to the netmap-in-userspace/netmap-in-kernel-space setup with ptnetmap, I realised I made a silly mistake and built from HEAD instead of the ptnet-head branch... haha. So I'm currently doing a quick rebuild on the correct branch and will report back with the results from pkt-gen using just vale, and also with the new ptnetmap changes!

Ok, also consider that you can use the patch I indicated above against HEAD.

Cheers,
Vincenzo

Kind Regards

Steven Crangle
Systems Developer | Stream Technologies | Glasgow, UK
+44 (0)844 800 8520 | www.stream-technologies.com

________________________________
From: Vincenzo Maffione <v.maffione@gmail.com>
Sent: 29 November 2016 11:52:18
To: Steven Crangle
Cc: FreeBSD Net
Subject: Re: Looking for some help with netmap/bhyve

Hi Steven,

Thanks for testing this in HEAD; good to know that it works OK! I think there is not going to be much difference between HEAD and 11.0, because in the end the bhyve code didn't really change (for the parts I touched), and you need to use the latest netmap version (github) anyway, irrespective of whether you are testing on 10.3, 11.0, HEAD, etc. In other words, you would be testing approximately the same code. Anyway, if you happen to test 11.0, please let me know.

What kind of tests did you perform? Netmap applications (e.g. pkt-gen), or standard TCP/IP tools like netperf/iperf? Could you please share the performance numbers you got for configurations (A) and (B) (or just the one you tried)? I think it is useful to compare your results with the ones I collected here (see the performance evaluation section):

https://wiki.freebsd.org/DevSummit/201609?action=AttachFile&do=view&target=20160923-freebsd-summit-ptnet.pdf

Regarding the crashes, thanks for reporting them. It would be nice if you could describe in more detail how to trigger them (with some higher likelihood); also, compiling the kernel with debug symbols should help in understanding which part of the code is involved in the crash.
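On FreeBSD that boils down to something like the following; this is just the usual procedure sketched from memory, so adapt paths and the config name to your setup:

# kernel config: GENERIC already carries the DEBUG line, a custom config may not
makeoptions     DEBUG=-g        # build kernel with gdb(1) debug symbols
options         KDB             # kernel debugger support
options         DDB             # in-kernel interactive debugger

# /etc/rc.conf: make sure panics are dumped to the swap device
dumpdev="AUTO"

# after the panic and reboot, the dump ends up in /var/crash;
# open it together with the symbol file, e.g.:
kgdb /usr/lib/debug/boot/kernel/kernel.debug /var/crash/vmcore.0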
Cheers,
Vincenzo

2016-11-29 12:26 GMT+01:00 Steven Crangle <Steven@stream-technologies.com>:

Hi Vincenzo!

Thank you so much for your help! With your instructions I've managed to get a set of bhyves up and running with netmap working properly within them! I chose to go with a recent version of HEAD instead of 10.3, and I will also try to build a version on 11.0 if I get the chance!

I will make sure to ask any further questions on the github page for netmap, but while I'm emailing, I've attached a text file containing a few kernel page fault errors I've run into. They don't happen repeatedly; they seem to just happen randomly if I try to shut the process down and possibly catch netmap in an incorrect state! Figured they might be useful to you. If I figure out what the error is myself, I will try to help out!

And thanks again for your help!

Kind Regards

Steven Crangle
Systems Developer | Stream Technologies | Glasgow, UK
+44 (0)844 800 8520 | www.stream-technologies.com

________________________________
From: Vincenzo Maffione <v.maffione@gmail.com>
Sent: 23 November 2016 08:33:11
To: Steven Crangle
Cc: FreeBSD Net
Subject: Re: Looking for some help with netmap/bhyve

2016-11-22 11:31 GMT+01:00 Steven Crangle <Steven@stream-technologies.com>:

Hi,

I've recently been trying to boot up several bhyves so that I can test netmap communication between instances. The problem is, no matter what configuration I try, the guest VM running in bhyve completely hangs and becomes unusable as soon as a packet hits the netmap interface. When testing with pkt-gen, the TX side successfully starts sending packets, but the RX side will reliably freeze, with the only option being to kill the bhyve process. The bhyve commands used for this test were:

bhyve -c 1 -s 0,hostbridge -s 1,lpc -s 2,virtio-blk,/dev/zvol/zroot/viper1vol -s 3,virtio-net,tap0,mac=00:01:23:45:67:83 -s 4,virtio-net,tap4 -l com1,/dev/nmdm0A -A -H -P -m 6g viper1 &

bhyve -c 1 -s 0,hostbridge -s 1,lpc -s 2,virtio-blk,/dev/zvol/zroot/viper2vol -s 3,virtio-net,tap1,mac=00:01:23:45:67:84 -s 4,virtio-net,tap5 -l com1,/dev/nmdm1A -A -H -P -m 6g viper2

For this test, both the host OS and the guest OS were FreeBSD-11.0-p3.

After failing to get this solution working, I pulled down the source from the following URL and installed it on the host box:

https://svnweb.freebsd.org/socsvn/soc2016/vincenzo/head/

I then ran the following commands to try to bring up the machines using the ptnetmap interface (the guest still running 11.0-p3):

bhyve -c 1 -s 0,hostbridge -s 1,lpc -s 1:1,virtio-blk,/dev/zvol/zroot/viper1vol -s 2:0,virtio-net,tap0,mac=00:01:23:45:67:83 -s 2:1,ptnetmap-memdev -s 2:2,ptnet,vale0:0 -l com1,/dev/nmdm0A -A -H -P -m 6g viper1 &

bhyve -c 1 -s 0,hostbridge -s 1,lpc -s 1:1,virtio-blk,/dev/zvol/zroot/viper2vol -s 2:0,virtio-net,tap1,mac=00:01:23:45:67:84 -s 2:1,ptnetmap-memdev -s 2:2,ptnet,vale0:1 -l com1,/dev/nmdm1A -A -H -P -m 6g viper2

With the above commands the VMs fail to boot, with the following message:

ptnet_init: failed to get ptnetmap

Output in /var/log/messages seems to just show the ptnetmap driver allocating one RX/TX ring for each VM while bringing the device up; the device then goes down and the above error is seen in the console. Is there something I'm doing wrong with regards to running netmap or ptnetmap within a bhyve?
Any pointers in the right direction would be much appreciated!

Kind Regards
Steven

Hi Steven,

The code you are looking at is the final code released by my GSoC 2016 project at the end of August 2016. However, I've kept working on it since then, so that code is not updated anymore. My modification basically involves two subsystems: netmap and bhyve. The updates to netmap are already available in HEAD. The updates to bhyve are not yet upstream, as we are in the process of reviewing them with the bhyve maintainers. So you need two sources to get the latest code:

1) https://github.com/luigirizzo/netmap for the latest netmap code, which you can also compile as an external kernel module.

2) https://github.com/vmaffione/freebsd for the updates to bhyve (and vmm.ko). You can use this as the host system. There are two branches here: ptnet-10.3 and ptnet-head. My original code was developed under FreeBSD 10.3, so a first possibility is to try this as the host system (using the ptnet-10.3 branch). The other branch (ptnet-head) contains a port of the work to a recent version of HEAD. I would be very glad if you could also test the code on FreeBSD 11.

We support two combinations of bhyve networking with netmap:

(A) virtio-net + netmap: that is something like "-s 2:0,virtio-net,vale0:0" in the bhyve command line.

(B) ptnet + ptnetmap: that is something like "-s 2:1,ptnetmap-memdev -s 2:2,ptnet,vale0:0" in the bhyve command line.

So you may want to try (A) first (netmap backend in user-space, slower) and then (B) (netmap backend in kernel-space, faster); see the example command lines below.

Sorry about the confusion over the code repositories; I'll also try to update the wiki page (and/or the GSoC svn repo) to reflect these updates. It's perfectly OK for me to discuss these issues here on the mailing list, but for more detailed/low-level discussion and support about problems you run into, feel free to open GitHub issues at https://github.com/luigirizzo/netmap.
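As a concrete example of (B), reusing the disk/console options from your own command lines (everything except the ptnet bits is illustrative), the two VMs would be attached to the same VALE switch with something like:

bhyve -c 1 -s 0,hostbridge -s 1,lpc -s 1:1,virtio-blk,/dev/zvol/zroot/viper1vol -s 2:1,ptnetmap-memdev -s 2:2,ptnet,vale0:0 -l com1,/dev/nmdm0A -A -H -P -m 6g viper1 &
bhyve -c 1 -s 0,hostbridge -s 1,lpc -s 1:1,virtio-blk,/dev/zvol/zroot/viper2vol -s 2:1,ptnetmap-memdev -s 2:2,ptnet,vale0:1 -l com1,/dev/nmdm1A -A -H -P -m 6g viper2 &

and the throughput measured with pkt-gen inside the guests, on whatever name the guest assigns to the ptnet interface (e.g. ptnet0):

# on the transmitting VM
pkt-gen -f tx -i netmap:ptnet0 -l 60
# on the receiving VM
pkt-gen -f rx -i netmap:ptnet0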
Cheers,
Vincenzo

Steven Crangle
Systems Developer | Stream Technologies | Glasgow, UK
+44 (0)844 800 8520 | www.stream-technologies.com

_______________________________________________
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"

--
Vincenzo Maffione

[-- Attachment #2 --]

diff -u -r -N usr/src/.arcconfig /usr/src/.arcconfig
--- usr/src/.arcconfig	2016-09-29 00:26:36.000000000 +0100
+++ /usr/src/.arcconfig	1970-01-01 01:00:00.000000000 +0100
@@ -1,5 +0,0 @@
-{
-  "repository.callsign" : "S",
-  "phabricator.uri" : "https://reviews.freebsd.org/",
-  "history.immutable" : true
-}
diff -u -r -N usr/src/.arclint /usr/src/.arclint
--- usr/src/.arclint	2016-09-29 00:26:36.000000000 +0100
+++ /usr/src/.arclint	1970-01-01 01:00:00.000000000 +0100
@@ -1,25 +0,0 @@
-{
-  "exclude": "(contrib|crypto)",
-  "linters": {
-    "python": {
-      "type": "pep8",
-      "include": "(\\.py$)"
-    },
-    "spelling": {
-      "type": "spelling"
-    },
-    "chmod": {
-      "type": "chmod"
-    },
-    "merge-conflict": {
-      "type": "merge-conflict"
-    },
-    "filename": {
-      "type": "filename"
-    },
-    "json": {
-      "type": "json",
-      "include": "(\\.arclint|\\.json$)"
-    }
-  }
-}
diff -u -r -N usr/src/Oops.rej /usr/src/Oops.rej
--- usr/src/Oops.rej	1970-01-01 01:00:00.000000000 +0100
+++ /usr/src/Oops.rej	2016-11-30 10:56:10.434032000 +0000
@@ -0,0 +1,292 @@
+@@ -37,6 +37,7 @@
+ #include <net/ethernet.h>
+ #include <netinet/in.h>
+ #include <netinet/tcp.h>
++#include <net/if.h>	/* IFNAMSIZ */
+ 
+ #include <errno.h>
+ #include <fcntl.h>
+@@ -55,6 +56,8 @@
+ #include "bhyverun.h"
+ #include "pci_emul.h"
+ #include "mevent.h"
++#include "net_utils.h"	/* MAC address generation */
++#include "net_backends.h"
+ 
+ /* Hardware/register definitions XXX: move some to common code. */
+ #define E82545_VENDOR_ID_INTEL		0x8086
+@@ -234,11 +237,10 @@
+ struct e82545_softc {
+ 	struct pci_devinst *esc_pi;
+ 	struct vmctx	*esc_ctx;
+-	struct mevent	*esc_mevp;
+ 	struct mevent	*esc_mevpitr;
+ 	pthread_mutex_t	esc_mtx;
+ 	struct ether_addr esc_mac;
+-	int		esc_tapfd;
++	struct net_backend *esc_be;
+ 
+ 	/* General */
+ 	uint32_t	esc_CTRL;	/* x0000 device ctl */
+@@ -344,7 +346,7 @@
+ static void	e82545_reset(struct e82545_softc *sc, int dev);
+ static void	e82545_rx_enable(struct e82545_softc *sc);
+ static void	e82545_rx_disable(struct e82545_softc *sc);
+-static void	e82545_tap_callback(int fd, enum ev_type type, void *param);
++static void	e82545_rx_callback(int fd, enum ev_type type, void *param);
+ static void	e82545_tx_start(struct e82545_softc *sc);
+ static void	e82545_tx_enable(struct e82545_softc *sc);
+ static void	e82545_tx_disable(struct e82545_softc *sc);
+@@ -813,11 +815,9 @@
+ 		return (256); /* Forbidden value. */
+ 	}
+ 
+-static uint8_t dummybuf[2048];
+-
+ /* XXX one packet at a time until this is debugged */
+ static void
+-e82545_tap_callback(int fd, enum ev_type type, void *param)
++e82545_rx_callback(int fd, enum ev_type type, void *param)
+ {
+ 	struct e82545_softc *sc = param;
+ 	struct e1000_rx_desc *rxd;
+@@ -832,7 +832,7 @@
+ 	if (!sc->esc_rx_enabled || sc->esc_rx_loopback) {
+ 		DPRINTF("rx disabled (!%d || %d) -- packet(s) dropped\r\n",
+ 		    sc->esc_rx_enabled, sc->esc_rx_loopback);
+-		while (read(sc->esc_tapfd, dummybuf, sizeof(dummybuf)) > 0) {
++		while (netbe_rx_discard(sc->esc_be) > 0) {
+ 		}
+ 		goto done1;
+ 	}
+@@ -845,7 +845,7 @@
+ 	if (left < maxpktdesc) {
+ 		DPRINTF("rx overflow (%d < %d) -- packet(s) dropped\r\n",
+ 		    left, maxpktdesc);
+-		while (read(sc->esc_tapfd, dummybuf, sizeof(dummybuf)) > 0) {
++		while (netbe_rx_discard(sc->esc_be) > 0) {
+ 		}
+ 		goto done1;
+ 	}
+@@ -862,9 +862,9 @@
+ 		    rxd->buffer_addr, bufsz);
+ 		vec[i].iov_len = bufsz;
+ 	}
+-	len = readv(sc->esc_tapfd, vec, maxpktdesc);
++	len = netbe_recv(sc->esc_be, vec, maxpktdesc);
+ 	if (len <= 0) {
+-		DPRINTF("tap: readv() returned %d\n", len);
++		DPRINTF("be: recv() returned %d\n", len);
+ 		goto done;
+ 	}
+ 
+@@ -1036,13 +1036,10 @@
+ }
+ 
+ static void
+-e82545_transmit_backend(struct e82545_softc *sc, struct iovec *iov, int iovcnt)
++e82545_transmit_backend(struct e82545_softc *sc, struct iovec *iov, int iovcnt,
++    uint32_t len)
+ {
+-
+-	if (sc->esc_tapfd == -1)
+-		return;
+-
+-	(void) writev(sc->esc_tapfd, iov, iovcnt);
++	netbe_send(sc->esc_be, iov, iovcnt, len, 0);
+ }
+ 
+ static void
+@@ -1078,7 +1075,7 @@
+ 
+ 	ckinfo[0].ck_valid = ckinfo[1].ck_valid = 0;
+ 	iovcnt = 0;
+-	tlen = 0;
++	tlen = 0;	/* total length */
+ 	ntype = 0;
+ 	tso = 0;
+ 	ohead = head;
+@@ -1203,6 +1200,7 @@
+ 		hdrlen = ETHER_ADDR_LEN*2;
+ 		vlen = ETHER_VLAN_ENCAP_LEN;
+ 	}
++	tlen += vlen;
+ 	if (!tso) {
+ 		/* Estimate required writable space for checksums. */
+ 		if (ckinfo[0].ck_valid)
+@@ -1268,7 +1266,7 @@
+ 			e82545_transmit_checksum(iov, iovcnt, &ckinfo[0]);
+ 		if (ckinfo[1].ck_valid)
+ 			e82545_transmit_checksum(iov, iovcnt, &ckinfo[1]);
+-		e82545_transmit_backend(sc, iov, iovcnt);
++		e82545_transmit_backend(sc, iov, iovcnt, tlen);
+ 		goto done;
+ 	}
+ 
+@@ -1292,13 +1290,14 @@
+ 		/* Construct IOVs for the segment. */
+ 		/* Include whole original header. */
+ 		tiov[0].iov_base = hdr;
+-		tiov[0].iov_len = hdrlen;
++		tiov[0].iov_len = tlen = hdrlen;
+ 		tiovcnt = 1;
+ 		/* Include respective part of payload IOV. */
+ 		for (nleft = now; pv < iovcnt && nleft > 0; nleft -= nnow) {
+ 			nnow = MIN(nleft, iov[pv].iov_len - pvoff);
+ 			tiov[tiovcnt].iov_base = iov[pv].iov_base + pvoff;
+ 			tiov[tiovcnt++].iov_len = nnow;
++			tlen += nnow;
+ 			if (pvoff + nnow == iov[pv].iov_len) {
+ 				pv++;
+ 				pvoff = 0;
+@@ -1351,7 +1350,7 @@
+ 			e82545_carry(tcpsum);
+ 			e82545_transmit_checksum(tiov, tiovcnt, &ckinfo[1]);
+ 		}
+-		e82545_transmit_backend(sc, tiov, tiovcnt);
++		e82545_transmit_backend(sc, tiov, tiovcnt, tlen);
+ 	}
+ 
+ done:
+@@ -2198,80 +2197,17 @@
+ 	sc->esc_TXDCTL = 0;
+ }
+ 
+-static void
+-e82545_open_tap(struct e82545_softc *sc, char *opts)
+-{
+-	char tbuf[80];
+-
+-	if (opts == NULL) {
+-		sc->esc_tapfd = -1;
+-		return;
+-	}
+-
+-	strcpy(tbuf, "/dev/");
+-	strlcat(tbuf, opts, sizeof(tbuf));
+-
+-	sc->esc_tapfd = open(tbuf, O_RDWR);
+-	if (sc->esc_tapfd == -1) {
+-		DPRINTF("unable to open tap device %s\n", opts);
+-		exit(1);
+-	}
+-
+-	/*
+-	 * Set non-blocking and register for read
+-	 * notifications with the event loop
+-	 */
+-	int opt = 1;
+-	if (ioctl(sc->esc_tapfd, FIONBIO, &opt) < 0) {
+-		WPRINTF("tap device O_NONBLOCK failed: %d\n", errno);
+-		close(sc->esc_tapfd);
+-		sc->esc_tapfd = -1;
+-	}
+-
+-	sc->esc_mevp = mevent_add(sc->esc_tapfd,
+-	    EVF_READ,
+-	    e82545_tap_callback,
+-	    sc);
+-	if (sc->esc_mevp == NULL) {
+-		DPRINTF("Could not register mevent %d\n", EVF_READ);
+-		close(sc->esc_tapfd);
+-		sc->esc_tapfd = -1;
+-	}
+-}
+-
+-static int
+-e82545_parsemac(char *mac_str, uint8_t *mac_addr)
+-{
+-	struct ether_addr *ea;
+-	char *tmpstr;
+-	char zero_addr[ETHER_ADDR_LEN] = { 0, 0, 0, 0, 0, 0 };
+-
+-	tmpstr = strsep(&mac_str,"=");
+-	if ((mac_str != NULL) && (!strcmp(tmpstr,"mac"))) {
+-		ea = ether_aton(mac_str);
+-		if (ea == NULL || ETHER_IS_MULTICAST(ea->octet) ||
+-		    memcmp(ea->octet, zero_addr, ETHER_ADDR_LEN) == 0) {
+-			fprintf(stderr, "Invalid MAC %s\n", mac_str);
+-			return (1);
+-		} else
+-			memcpy(mac_addr, ea->octet, ETHER_ADDR_LEN);
+-	}
+-	return (0);
+-}
+-
+ static int
+ e82545_init(struct vmctx *ctx, struct pci_devinst *pi, char *opts)
+ {
+-	DPRINTF("Loading with options: %s\r\n", opts);
+-
+-	MD5_CTX mdctx;
+-	unsigned char digest[16];
+ 	char nstr[80];
+ 	struct e82545_softc *sc;
+ 	char *devname;
+ 	char *vtopts;
+ 	int mac_provided;
+ 
++	DPRINTF("Loading with options: %s\r\n", opts);
++
+ 	/* Setup our softc */
+ 	sc = calloc(sizeof(*sc), 1);
+ 
+@@ -2309,11 +2245,10 @@
+ 	    E82545_BAR_IO_LEN);
+ 
+ 	/*
+-	 * Attempt to open the tap device and read the MAC address
+-	 * if specified.  Copied from virtio-net, slightly modified.
++	 * Attempt to open the backend device and read the MAC address
++	 * if specified.  Copied from virtio-net, slightly modified.
+ 	 */
+ 	mac_provided = 0;
+-	sc->esc_tapfd = -1;
+ 	if (opts != NULL) {
+ 		int err;
+ 
+@@ -2321,7 +2256,7 @@
+ 		(void) strsep(&vtopts, ",");
+ 
+ 		if (vtopts != NULL) {
+-			err = e82545_parsemac(vtopts, sc->esc_mac.octet);
++			err = net_parsemac(vtopts, sc->esc_mac.octet);
+ 			if (err != 0) {
+ 				free(devname);
+ 				return (err);
+@@ -2329,9 +2264,11 @@
+ 			mac_provided = 1;
+ 		}
+ 
+-		if (strncmp(devname, "tap", 3) == 0 ||
+-		    strncmp(devname, "vmnet", 5) == 0)
+-			e82545_open_tap(sc, devname);
++		sc->esc_be = netbe_init(devname, e82545_rx_callback, sc);
++		if (!sc->esc_be) {
++			WPRINTF("net backend '%s' initialization failed\n",
++			    devname);
++		}
+ 
+ 		free(devname);
+ 	}
+@@ -2341,19 +2278,7 @@
+ 	 * followed by an MD5 of the PCI slot/func number and dev name
+ 	 */
+ 	if (!mac_provided) {
+-		snprintf(nstr, sizeof(nstr), "%d-%d-%s", pi->pi_slot,
+-		    pi->pi_func, vmname);
+-
+-		MD5Init(&mdctx);
+-		MD5Update(&mdctx, nstr, strlen(nstr));
+-		MD5Final(digest, &mdctx);
+-
+-		sc->esc_mac.octet[0] = 0x00;
+-		sc->esc_mac.octet[1] = 0xa0;
+-		sc->esc_mac.octet[2] = 0x98;
+-		sc->esc_mac.octet[3] = digest[0];
+-		sc->esc_mac.octet[4] = digest[1];
+-		sc->esc_mac.octet[5] = digest[2];
++		net_genmac(pi, sc->esc_mac.octet);
+ 	}
+ 
+ 	/* H/w initiated reset */
diff -u -r -N usr/src/etc/src.conf /usr/src/etc/src.conf
--- usr/src/etc/src.conf	1970-01-01 01:00:00.000000000 +0100
+++ /usr/src/etc/src.conf	2016-12-01 10:06:18.011463000 +0000
@@ -0,0 +1,5 @@
+WITH_CLANG= yes
+WITH_CLANG_EXTRAS= yes		# Build additional clang and llvm tools, such as bugpoint.
+WITH_CLANG_IS_CC= yes		# Useful for some buildworld errors
+# WITHOUT_GCC= yes
+
diff -u -r -N usr/src/lib/libvmmapi/vmmapi.c /usr/src/lib/libvmmapi/vmmapi.c
--- usr/src/lib/libvmmapi/vmmapi.c	2016-09-29 00:26:02.000000000 +0100
+++ /usr/src/lib/libvmmapi/vmmapi.c	2016-11-30 10:56:05.781616000 +0000
@@ -882,6 +882,55 @@
 	return (ioctl(ctx->fd, VM_MAP_PPTDEV_MMIO, &pptmmio));
 }
 
+/*
+ * Export the file descriptor associated with this VM, useful for external
+ * programs (e.g. to issue ioctl()).
+ */
+int
+vm_get_fd(struct vmctx *ctx)
+{
+	return (ctx->fd);
+}
+
+/*
+ * Map a user-space buffer into the VM at a given physical address.
+ * To be used for devices that expose internal memory.
+ */
+int
+vm_map_user_buf(struct vmctx *ctx, vm_paddr_t gpa, size_t len, void *host_buf)
+{
+	struct vm_user_buf user_buf;
+
+	bzero(&user_buf, sizeof(user_buf));
+	user_buf.gpa = gpa;
+	user_buf.len = len;
+	user_buf.addr = host_buf;
+
+	return (ioctl(ctx->fd, VM_MAP_USER_BUF, &user_buf));
+}
+
+/*
+ * Register handler for guest I/O accesses on a given I/O port, optionally
+ * filtering on the data.  QEMU/KVM implement a similar functionality.
+ */
+int
+vm_io_reg_handler(struct vmctx *ctx, uint16_t port, uint16_t in,
+    uint32_t mask_data, uint32_t data,
+    enum vm_io_regh_type type, void *arg)
+{
+	struct vm_io_reg_handler ioregh;
+
+	bzero(&ioregh, sizeof(ioregh));
+	ioregh.port = port;
+	ioregh.in = in;
+	ioregh.mask_data = mask_data;
+	ioregh.data = data;
+	ioregh.type = type;
+	ioregh.arg = arg;
+
+	return (ioctl(ctx->fd, VM_IO_REG_HANDLER, &ioregh));
+}
+
 int
 vm_setup_pptdev_msi(struct vmctx *ctx, int vcpu, int bus, int slot, int func,
     uint64_t addr, uint64_t msg, int numvec)
diff -u -r -N usr/src/lib/libvmmapi/vmmapi.h /usr/src/lib/libvmmapi/vmmapi.h
--- usr/src/lib/libvmmapi/vmmapi.h	2016-09-29 00:26:02.000000000 +0100
+++ /usr/src/lib/libvmmapi/vmmapi.h	2016-11-30 10:56:05.783036000 +0000
@@ -162,6 +162,11 @@
 int	vm_get_intinfo(struct vmctx *ctx, int vcpu, uint64_t *i1, uint64_t *i2);
 int	vm_set_intinfo(struct vmctx *ctx, int vcpu, uint64_t exit_intinfo);
 
+/* The next three functions are documented in vmmapi.c */
+int	vm_get_fd(struct vmctx *ctx);
+int	vm_map_user_buf(struct vmctx *ctx, vm_paddr_t gpa, size_t len, void *host_buf);
+int	vm_io_reg_handler(struct vmctx *ctx, uint16_t port, uint16_t in,
+	uint32_t mask_data, uint32_t data, enum vm_io_regh_type type, void *arg);
 /*
  * Return a pointer to the statistics buffer.  Note that this is not MT-safe.
  */
diff -u -r -N usr/src/sys/amd64/conf/GENERIC /usr/src/sys/amd64/conf/GENERIC
--- usr/src/sys/amd64/conf/GENERIC	2016-09-29 00:24:54.000000000 +0100
+++ /usr/src/sys/amd64/conf/GENERIC	2016-11-30 10:58:02.591933000 +0000
@@ -353,7 +353,7 @@
 device		vmx			# VMware VMXNET3 Ethernet
 
 # Netmap provides direct access to TX/RX rings on supported NICs
-device		netmap			# netmap(4) support
+#device	netmap			# netmap(4) support
 
 # The crypto framework is required by IPSEC
 device		crypto			# Required by IPSEC
diff -u -r -N usr/src/sys/amd64/include/vmm.h /usr/src/sys/amd64/include/vmm.h
--- usr/src/sys/amd64/include/vmm.h	2016-09-29 00:24:54.000000000 +0100
+++ /usr/src/sys/amd64/include/vmm.h	2016-11-30 10:56:05.784999000 +0000
@@ -183,6 +183,7 @@
 int vm_alloc_memseg(struct vm *vm, int ident, size_t len, bool sysmem);
 void vm_free_memseg(struct vm *vm, int ident);
 int vm_map_mmio(struct vm *vm, vm_paddr_t gpa, size_t len, vm_paddr_t hpa);
+int vm_map_usermem(struct vm *vm, vm_paddr_t gpa, size_t len, void *buf, struct thread *td);
 int vm_unmap_mmio(struct vm *vm, vm_paddr_t gpa, size_t len);
 int vm_assign_pptdev(struct vm *vm, int bus, int slot, int func);
 int vm_unassign_pptdev(struct vm *vm, int bus, int slot, int func);
@@ -321,6 +322,7 @@
 struct vatpic *vm_atpic(struct vm *vm);
 struct vatpit *vm_atpit(struct vm *vm);
 struct vpmtmr *vm_pmtmr(struct vm *vm);
 struct vrtc *vm_rtc(struct vm *vm);
+struct ioregh *vm_ioregh(struct vm *vm);
 /*
  * Inject exception 'vector' into the guest vcpu. This function returns 0 on
@@ -417,7 +419,14 @@
 	EDGE_TRIGGER,
 	LEVEL_TRIGGER
 };
-
+
+/* Operations supported on VM_IO_REG_HANDLER ioctl. */
+enum vm_io_regh_type {
+	VM_IO_REGH_DELETE,
+	VM_IO_REGH_KWEVENTS,	/* kernel wait events */
+	VM_IO_REGH_MAX
+};
+
 /*
  * The 'access' field has the format specified in Table 21-2 of the Intel
  * Architecture Manual vol 3b.
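(The vmm_dev.c half of the plumbing is not visible in this excerpt, but from the declaration above it only needs a small dispatch case: something along the lines of the sketch below inside the vmmdev_ioctl() switch, using the vm_user_buf argument structure defined in the vmm_dev.h hunk further down. This is hand-written from the declarations, not copied from the patch.)

	case VM_MAP_USER_BUF: {
		struct vm_user_buf *userbuf;

		userbuf = (struct vm_user_buf *)data;
		/*
		 * 'td' is the ioctl'ing thread; vm_map_usermem() needs it
		 * to resolve userbuf->addr in the caller's address space.
		 */
		error = vm_map_usermem(sc->vm, userbuf->gpa, userbuf->len,
		    userbuf->addr, td);
		break;
	}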
diff -u -r -N usr/src/sys/amd64/include/vmm_dev.h /usr/src/sys/amd64/include/vmm_dev.h
--- usr/src/sys/amd64/include/vmm_dev.h	2016-09-29 00:24:54.000000000 +0100
+++ /usr/src/sys/amd64/include/vmm_dev.h	2016-11-30 10:56:05.786583000 +0000
@@ -123,6 +123,23 @@
 	size_t len;
 };
 
+/* Argument for VM_MAP_USER_BUF ioctl in vmmapi.c */
+struct vm_user_buf {
+	vm_paddr_t	gpa;
+	void		*addr;
+	size_t		len;
+};
+
+/* Argument for VM_IO_REG_HANDLER ioctl in vmmapi.c */
+struct vm_io_reg_handler {
+	uint16_t		port;		/* I/O address */
+	uint16_t		in;		/* 0 out, 1 in */
+	uint32_t		mask_data;	/* 0 means match anything */
+	uint32_t		data;		/* data to match */
+	enum vm_io_regh_type	type;		/* handler type */
+	void			*arg;		/* handler argument */
+};
+
 struct vm_pptdev_msi {
 	int		vcpu;
 	int		bus;
@@ -286,6 +303,10 @@
 	IOCNUM_RTC_WRITE = 101,
 	IOCNUM_RTC_SETTIME = 102,
 	IOCNUM_RTC_GETTIME = 103,
+
+	/* host mmap and IO handler */
+	IOCNUM_MAP_USER_BUF = 104,
+	IOCNUM_IO_REG_HANDLER = 105,
 };
 
 #define	VM_RUN \
@@ -344,6 +365,10 @@
 	_IOW('v', IOCNUM_UNBIND_PPTDEV, struct vm_pptdev)
 #define	VM_MAP_PPTDEV_MMIO \
 	_IOW('v', IOCNUM_MAP_PPTDEV_MMIO, struct vm_pptdev_mmio)
+#define	VM_MAP_USER_BUF \
+	_IOW('v', IOCNUM_MAP_USER_BUF, struct vm_user_buf)
+#define	VM_IO_REG_HANDLER \
+	_IOW('v', IOCNUM_IO_REG_HANDLER, struct vm_io_reg_handler)
 #define	VM_PPTDEV_MSI \
 	_IOW('v', IOCNUM_PPTDEV_MSI, struct vm_pptdev_msi)
 #define	VM_PPTDEV_MSIX \
	_IOW('v', IOCNUM_PPTDEV_MSIX, struct vm_pptdev_msix)
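(On the userspace side, the matching libvmmapi wrappers are thin ioctl(2) shims in the same style as the vmmapi.c additions at the top of this patch. Roughly, as a sketch rather than the exact bodies from the patched vmmapi.c:)

int
vm_map_user_buf(struct vmctx *ctx, vm_paddr_t gpa, size_t len, void *host_buf)
{
	struct vm_user_buf userbuf;

	bzero(&userbuf, sizeof(userbuf));
	userbuf.gpa = gpa;
	userbuf.len = len;
	userbuf.addr = host_buf;
	return (ioctl(vm_get_fd(ctx), VM_MAP_USER_BUF, &userbuf));
}

int
vm_io_reg_handler(struct vmctx *ctx, uint16_t port, uint16_t in,
    uint32_t mask_data, uint32_t data, enum vm_io_regh_type type, void *arg)
{
	struct vm_io_reg_handler ioregh;

	bzero(&ioregh, sizeof(ioregh));
	ioregh.port = port;
	ioregh.in = in;
	ioregh.mask_data = mask_data;
	ioregh.data = data;
	ioregh.type = type;
	ioregh.arg = arg;
	return (ioctl(vm_get_fd(ctx), VM_IO_REG_HANDLER, &ioregh));
}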
diff -u -r -N usr/src/sys/amd64/vmm/vmm.c /usr/src/sys/amd64/vmm/vmm.c
--- usr/src/sys/amd64/vmm/vmm.c	2016-09-29 00:24:54.000000000 +0100
+++ /usr/src/sys/amd64/vmm/vmm.c	2016-11-30 10:56:05.790373000 +0000
@@ -66,6 +66,7 @@
 #include "vmm_ktr.h"
 #include "vmm_host.h"
 #include "vmm_mem.h"
+#include "vmm_usermem.h"
 #include "vmm_util.h"
 #include "vatpic.h"
 #include "vatpit.h"
@@ -148,6 +149,7 @@
 	struct vatpit	*vatpit;		/* (i) virtual atpit */
 	struct vpmtmr	*vpmtmr;		/* (i) virtual ACPI PM timer */
 	struct vrtc	*vrtc;			/* (o) virtual RTC */
+	struct ioregh	*ioregh;		/* () I/O reg handler */
 	volatile cpuset_t active_cpus;		/* (i) active vcpus */
 	int		suspend;		/* (i) stop VM execution */
 	volatile cpuset_t suspended_cpus;	/* (i) suspended vcpus */
@@ -419,6 +421,7 @@
 	vm->vpmtmr = vpmtmr_init(vm);
 	if (create)
 		vm->vrtc = vrtc_init(vm);
+	vm->ioregh = ioregh_init(vm);
 
 	CPU_ZERO(&vm->active_cpus);
 
@@ -475,11 +478,13 @@
 		vrtc_cleanup(vm->vrtc);
 	else
 		vrtc_reset(vm->vrtc);
+	ioregh_cleanup(vm->ioregh);
 	vpmtmr_cleanup(vm->vpmtmr);
 	vatpit_cleanup(vm->vatpit);
 	vhpet_cleanup(vm->vhpet);
 	vatpic_cleanup(vm->vatpic);
 	vioapic_cleanup(vm->vioapic);
+	vmm_usermem_cleanup(vm->vmspace);
 
 	for (i = 0; i < VM_MAXCPU; i++)
 		vcpu_cleanup(vm, i, destroy);
@@ -552,6 +557,18 @@
 	return (0);
 }
 
+/* Handler function for VM_MAP_USER_BUF ioctl. */
+int
+vm_map_usermem(struct vm *vm, vm_paddr_t gpa, size_t len, void *buf, struct thread *td)
+{
+	vm_object_t obj;
+
+	if ((obj = vmm_usermem_alloc(vm->vmspace, gpa, len, buf, td)) == NULL)
+		return (ENOMEM);
+
+	return (0);
+}
+
 int
 vm_unmap_mmio(struct vm *vm, vm_paddr_t gpa, size_t len)
 {
@@ -588,6 +605,9 @@
 	if (ppt_is_mmio(vm, gpa))
 		return (true);		/* 'gpa' is pci passthru mmio */
 
+	if (usermem_mapped(vm->vmspace, gpa))
+		return (true);		/* 'gpa' is user-space buffer mapped */
+
 	return (false);
 }
 
@@ -2457,6 +2477,12 @@
 	return (vm->vrtc);
 }
 
+struct ioregh *
+vm_ioregh(struct vm *vm)
+{
+	return (vm->ioregh);
+}
+
 enum vm_reg_name
 vm_segment_name(int seg)
 {
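(vmm_usermem.c itself was one of the files missing from the original patch. For reference, the bookkeeping that the vmm.c hunks above rely on amounts to remembering which [gpa, gpa+len) ranges of a guest vmspace were populated from user buffers, conceptually like the sketch below; struct usermem_mapping and MAX_USERMEMS are invented names for this sketch, not the actual file.)

#define MAX_USERMEMS	64

struct usermem_mapping {
	struct vmspace	*vmspace;	/* guest address space the range lives in */
	vm_paddr_t	gpa;		/* start of the mapped range */
	size_t		len;		/* length; 0 means the slot is free */
};

static struct usermem_mapping usermems[MAX_USERMEMS];

bool
usermem_mapped(struct vmspace *vmspace, vm_paddr_t gpa)
{
	struct usermem_mapping *um;
	int i;

	/* Linear scan: return true if 'gpa' falls in any recorded range. */
	for (i = 0; i < MAX_USERMEMS; i++) {
		um = &usermems[i];
		if (um->len != 0 && um->vmspace == vmspace &&
		    um->gpa <= gpa && gpa < um->gpa + um->len)
			return (true);
	}
	return (false);
}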
+ */ + if (map) + iommu_invalidate_tlb(host_domain); + else + iommu_invalidate_tlb(vm->iommu); +} + +#define vm_iommu_unmap(vm) vm_iommu_modify((vm), FALSE) +#define vm_iommu_map(vm) vm_iommu_modify((vm), TRUE) + +int +vm_unassign_pptdev(struct vm *vm, int bus, int slot, int func) +{ + int error; + + error = ppt_unassign_device(vm, bus, slot, func); + if (error) + return (error); + + if (ppt_assigned_devices(vm) == 0) + vm_iommu_unmap(vm); + + return (0); +} + +int +vm_assign_pptdev(struct vm *vm, int bus, int slot, int func) +{ + int error; + vm_paddr_t maxaddr; + + /* Set up the IOMMU to do the 'gpa' to 'hpa' translation */ + if (ppt_assigned_devices(vm) == 0) { + KASSERT(vm->iommu == NULL, + ("vm_assign_pptdev: iommu must be NULL")); + maxaddr = sysmem_maxaddr(vm); + vm->iommu = iommu_create_domain(maxaddr); + vm_iommu_map(vm); + } + + error = ppt_assign_device(vm, bus, slot, func); + return (error); +} + +void * +vm_gpa_hold(struct vm *vm, int vcpuid, vm_paddr_t gpa, size_t len, int reqprot, + void **cookie) +{ + int i, count, pageoff; + struct mem_map *mm; + vm_page_t m; +#ifdef INVARIANTS + /* + * All vcpus are frozen by ioctls that modify the memory map + * (e.g. VM_MMAP_MEMSEG). Therefore 'vm->memmap[]' stability is + * guaranteed if at least one vcpu is in the VCPU_FROZEN state. + */ + int state; + KASSERT(vcpuid >= -1 || vcpuid < VM_MAXCPU, ("%s: invalid vcpuid %d", + __func__, vcpuid)); + for (i = 0; i < VM_MAXCPU; i++) { + if (vcpuid != -1 && vcpuid != i) + continue; + state = vcpu_get_state(vm, i, NULL); + KASSERT(state == VCPU_FROZEN, ("%s: invalid vcpu state %d", + __func__, state)); + } +#endif + pageoff = gpa & PAGE_MASK; + if (len > PAGE_SIZE - pageoff) + panic("vm_gpa_hold: invalid gpa/len: 0x%016lx/%lu", gpa, len); + + count = 0; + for (i = 0; i < VM_MAX_MEMMAPS; i++) { + mm = &vm->mem_maps[i]; + if (sysmem_mapping(vm, mm) && gpa >= mm->gpa && + gpa < mm->gpa + mm->len) { + count = vm_fault_quick_hold_pages(&vm->vmspace->vm_map, + trunc_page(gpa), PAGE_SIZE, reqprot, &m, 1); + break; + } + } + + if (count == 1) { + *cookie = m; + return ((void *)(PHYS_TO_DMAP(VM_PAGE_TO_PHYS(m)) + pageoff)); + } else { + *cookie = NULL; + return (NULL); + } +} + +void +vm_gpa_release(void *cookie) +{ + vm_page_t m = cookie; + + vm_page_lock(m); + vm_page_unhold(m); + vm_page_unlock(m); +} + +int +vm_get_register(struct vm *vm, int vcpu, int reg, uint64_t *retval) +{ + + if (vcpu < 0 || vcpu >= VM_MAXCPU) + return (EINVAL); + + if (reg >= VM_REG_LAST) + return (EINVAL); + + return (VMGETREG(vm->cookie, vcpu, reg, retval)); +} + +int +vm_set_register(struct vm *vm, int vcpuid, int reg, uint64_t val) +{ + struct vcpu *vcpu; + int error; + + if (vcpuid < 0 || vcpuid >= VM_MAXCPU) + return (EINVAL); + + if (reg >= VM_REG_LAST) + return (EINVAL); + + error = VMSETREG(vm->cookie, vcpuid, reg, val); + if (error || reg != VM_REG_GUEST_RIP) + return (error); + + /* Set 'nextrip' to match the value of %rip */ + VCPU_CTR1(vm, vcpuid, "Setting nextrip to %#lx", val); + vcpu = &vm->vcpu[vcpuid]; + vcpu->nextrip = val; + return (0); +} + +static boolean_t +is_descriptor_table(int reg) +{ + + switch (reg) { + case VM_REG_GUEST_IDTR: + case VM_REG_GUEST_GDTR: + return (TRUE); + default: + return (FALSE); + } +} + +static boolean_t +is_segment_register(int reg) +{ + + switch (reg) { + case VM_REG_GUEST_ES: + case VM_REG_GUEST_CS: + case VM_REG_GUEST_SS: + case VM_REG_GUEST_DS: + case VM_REG_GUEST_FS: + case VM_REG_GUEST_GS: + case VM_REG_GUEST_TR: + case VM_REG_GUEST_LDTR: + return (TRUE); + default: + 
return (FALSE); + } +} + +int +vm_get_seg_desc(struct vm *vm, int vcpu, int reg, + struct seg_desc *desc) +{ + + if (vcpu < 0 || vcpu >= VM_MAXCPU) + return (EINVAL); + + if (!is_segment_register(reg) && !is_descriptor_table(reg)) + return (EINVAL); + + return (VMGETDESC(vm->cookie, vcpu, reg, desc)); +} + +int +vm_set_seg_desc(struct vm *vm, int vcpu, int reg, + struct seg_desc *desc) +{ + if (vcpu < 0 || vcpu >= VM_MAXCPU) + return (EINVAL); + + if (!is_segment_register(reg) && !is_descriptor_table(reg)) + return (EINVAL); + + return (VMSETDESC(vm->cookie, vcpu, reg, desc)); +} + +static void +restore_guest_fpustate(struct vcpu *vcpu) +{ + + /* flush host state to the pcb */ + fpuexit(curthread); + + /* restore guest FPU state */ + fpu_stop_emulating(); + fpurestore(vcpu->guestfpu); + + /* restore guest XCR0 if XSAVE is enabled in the host */ + if (rcr4() & CR4_XSAVE) + load_xcr(0, vcpu->guest_xcr0); + + /* + * The FPU is now "dirty" with the guest's state so turn on emulation + * to trap any access to the FPU by the host. + */ + fpu_start_emulating(); +} + +static void +save_guest_fpustate(struct vcpu *vcpu) +{ + + if ((rcr0() & CR0_TS) == 0) + panic("fpu emulation not enabled in host!"); + + /* save guest XCR0 and restore host XCR0 */ + if (rcr4() & CR4_XSAVE) { + vcpu->guest_xcr0 = rxcr(0); + load_xcr(0, vmm_get_host_xcr0()); + } + + /* save guest FPU state */ + fpu_stop_emulating(); + fpusave(vcpu->guestfpu); + fpu_start_emulating(); +} + +static VMM_STAT(VCPU_IDLE_TICKS, "number of ticks vcpu was idle"); + +static int +vcpu_set_state_locked(struct vm *vm, int vcpuid, enum vcpu_state newstate, + bool from_idle) +{ + struct vcpu *vcpu; + int error; + + vcpu = &vm->vcpu[vcpuid]; + vcpu_assert_locked(vcpu); + + /* + * State transitions from the vmmdev_ioctl() must always begin from + * the VCPU_IDLE state. This guarantees that there is only a single + * ioctl() operating on a vcpu at any point. 
+ */ + if (from_idle) { + while (vcpu->state != VCPU_IDLE) { + vcpu->reqidle = 1; + vcpu_notify_event_locked(vcpu, false); + VCPU_CTR1(vm, vcpuid, "vcpu state change from %s to " + "idle requested", vcpu_state2str(vcpu->state)); + msleep_spin(&vcpu->state, &vcpu->mtx, "vmstat", hz); + } + } else { + KASSERT(vcpu->state != VCPU_IDLE, ("invalid transition from " + "vcpu idle state")); + } + + if (vcpu->state == VCPU_RUNNING) { + KASSERT(vcpu->hostcpu == curcpu, ("curcpu %d and hostcpu %d " + "mismatch for running vcpu", curcpu, vcpu->hostcpu)); + } else { + KASSERT(vcpu->hostcpu == NOCPU, ("Invalid hostcpu %d for a " + "vcpu that is not running", vcpu->hostcpu)); + } + + /* + * The following state transitions are allowed: + * IDLE -> FROZEN -> IDLE + * FROZEN -> RUNNING -> FROZEN + * FROZEN -> SLEEPING -> FROZEN + */ + switch (vcpu->state) { + case VCPU_IDLE: + case VCPU_RUNNING: + case VCPU_SLEEPING: + error = (newstate != VCPU_FROZEN); + break; + case VCPU_FROZEN: + error = (newstate == VCPU_FROZEN); + break; + default: + error = 1; + break; + } + + if (error) + return (EBUSY); + + VCPU_CTR2(vm, vcpuid, "vcpu state changed from %s to %s", + vcpu_state2str(vcpu->state), vcpu_state2str(newstate)); + + vcpu->state = newstate; + if (newstate == VCPU_RUNNING) + vcpu->hostcpu = curcpu; + else + vcpu->hostcpu = NOCPU; + + if (newstate == VCPU_IDLE) + wakeup(&vcpu->state); + + return (0); +} + +static void +vcpu_require_state(struct vm *vm, int vcpuid, enum vcpu_state newstate) +{ + int error; + + if ((error = vcpu_set_state(vm, vcpuid, newstate, false)) != 0) + panic("Error %d setting state to %d\n", error, newstate); +} + +static void +vcpu_require_state_locked(struct vm *vm, int vcpuid, enum vcpu_state newstate) +{ + int error; + + if ((error = vcpu_set_state_locked(vm, vcpuid, newstate, false)) != 0) + panic("Error %d setting state to %d", error, newstate); +} + +static void +vm_set_rendezvous_func(struct vm *vm, vm_rendezvous_func_t func) +{ + + KASSERT(mtx_owned(&vm->rendezvous_mtx), ("rendezvous_mtx not locked")); + + /* + * Update 'rendezvous_func' and execute a write memory barrier to + * ensure that it is visible across all host cpus. This is not needed + * for correctness but it does ensure that all the vcpus will notice + * that the rendezvous is requested immediately. 
+ */ + vm->rendezvous_func = func; + wmb(); +} + +#define RENDEZVOUS_CTR0(vm, vcpuid, fmt) \ + do { \ + if (vcpuid >= 0) \ + VCPU_CTR0(vm, vcpuid, fmt); \ + else \ + VM_CTR0(vm, fmt); \ + } while (0) + +static void +vm_handle_rendezvous(struct vm *vm, int vcpuid) +{ + + KASSERT(vcpuid == -1 || (vcpuid >= 0 && vcpuid < VM_MAXCPU), + ("vm_handle_rendezvous: invalid vcpuid %d", vcpuid)); + + mtx_lock(&vm->rendezvous_mtx); + while (vm->rendezvous_func != NULL) { + /* 'rendezvous_req_cpus' must be a subset of 'active_cpus' */ + CPU_AND(&vm->rendezvous_req_cpus, &vm->active_cpus); + + if (vcpuid != -1 && + CPU_ISSET(vcpuid, &vm->rendezvous_req_cpus) && + !CPU_ISSET(vcpuid, &vm->rendezvous_done_cpus)) { + VCPU_CTR0(vm, vcpuid, "Calling rendezvous func"); + (*vm->rendezvous_func)(vm, vcpuid, vm->rendezvous_arg); + CPU_SET(vcpuid, &vm->rendezvous_done_cpus); + } + if (CPU_CMP(&vm->rendezvous_req_cpus, + &vm->rendezvous_done_cpus) == 0) { + VCPU_CTR0(vm, vcpuid, "Rendezvous completed"); + vm_set_rendezvous_func(vm, NULL); + wakeup(&vm->rendezvous_func); + break; + } + RENDEZVOUS_CTR0(vm, vcpuid, "Wait for rendezvous completion"); + mtx_sleep(&vm->rendezvous_func, &vm->rendezvous_mtx, 0, + "vmrndv", 0); + } + mtx_unlock(&vm->rendezvous_mtx); +} + +/* + * Emulate a guest 'hlt' by sleeping until the vcpu is ready to run. + */ +static int +vm_handle_hlt(struct vm *vm, int vcpuid, bool intr_disabled, bool *retu) +{ + struct vcpu *vcpu; + const char *wmesg; + int t, vcpu_halted, vm_halted; + + KASSERT(!CPU_ISSET(vcpuid, &vm->halted_cpus), ("vcpu already halted")); + + vcpu = &vm->vcpu[vcpuid]; + vcpu_halted = 0; + vm_halted = 0; + + vcpu_lock(vcpu); + while (1) { + /* + * Do a final check for pending NMI or interrupts before + * really putting this thread to sleep. Also check for + * software events that would cause this vcpu to wakeup. + * + * These interrupts/events could have happened after the + * vcpu returned from VMRUN() and before it acquired the + * vcpu lock above. + */ + if (vm->rendezvous_func != NULL || vm->suspend || vcpu->reqidle) + break; + if (vm_nmi_pending(vm, vcpuid)) + break; + if (!intr_disabled) { + if (vm_extint_pending(vm, vcpuid) || + vlapic_pending_intr(vcpu->vlapic, NULL)) { + break; + } + } + + /* Don't go to sleep if the vcpu thread needs to yield */ + if (vcpu_should_yield(vm, vcpuid)) + break; + + /* + * Some Linux guests implement "halt" by having all vcpus + * execute HLT with interrupts disabled. 'halted_cpus' keeps + * track of the vcpus that have entered this state. When all + * vcpus enter the halted state the virtual machine is halted. + */ + if (intr_disabled) { + wmesg = "vmhalt"; + VCPU_CTR0(vm, vcpuid, "Halted"); + if (!vcpu_halted && halt_detection_enabled) { + vcpu_halted = 1; + CPU_SET_ATOMIC(vcpuid, &vm->halted_cpus); + } + if (CPU_CMP(&vm->halted_cpus, &vm->active_cpus) == 0) { + vm_halted = 1; + break; + } + } else { + wmesg = "vmidle"; + } + + t = ticks; + vcpu_require_state_locked(vm, vcpuid, VCPU_SLEEPING); + /* + * XXX msleep_spin() cannot be interrupted by signals so + * wake up periodically to check pending signals. 
+ */ + msleep_spin(vcpu, &vcpu->mtx, wmesg, hz); + vcpu_require_state_locked(vm, vcpuid, VCPU_FROZEN); + vmm_stat_incr(vm, vcpuid, VCPU_IDLE_TICKS, ticks - t); + } + + if (vcpu_halted) + CPU_CLR_ATOMIC(vcpuid, &vm->halted_cpus); + + vcpu_unlock(vcpu); + + if (vm_halted) + vm_suspend(vm, VM_SUSPEND_HALT); + + return (0); +} + +static int +vm_handle_paging(struct vm *vm, int vcpuid, bool *retu) +{ + int rv, ftype; + struct vm_map *map; + struct vcpu *vcpu; + struct vm_exit *vme; + + vcpu = &vm->vcpu[vcpuid]; + vme = &vcpu->exitinfo; + + KASSERT(vme->inst_length == 0, ("%s: invalid inst_length %d", + __func__, vme->inst_length)); + + ftype = vme->u.paging.fault_type; + KASSERT(ftype == VM_PROT_READ || + ftype == VM_PROT_WRITE || ftype == VM_PROT_EXECUTE, + ("vm_handle_paging: invalid fault_type %d", ftype)); + + if (ftype == VM_PROT_READ || ftype == VM_PROT_WRITE) { + rv = pmap_emulate_accessed_dirty(vmspace_pmap(vm->vmspace), + vme->u.paging.gpa, ftype); + if (rv == 0) { + VCPU_CTR2(vm, vcpuid, "%s bit emulation for gpa %#lx", + ftype == VM_PROT_READ ? "accessed" : "dirty", + vme->u.paging.gpa); + goto done; + } + } + + map = &vm->vmspace->vm_map; + rv = vm_fault(map, vme->u.paging.gpa, ftype, VM_FAULT_NORMAL); + + VCPU_CTR3(vm, vcpuid, "vm_handle_paging rv = %d, gpa = %#lx, " + "ftype = %d", rv, vme->u.paging.gpa, ftype); + + if (rv != KERN_SUCCESS) + return (EFAULT); +done: + return (0); +} + +static int +vm_handle_inst_emul(struct vm *vm, int vcpuid, bool *retu) +{ + struct vie *vie; + struct vcpu *vcpu; + struct vm_exit *vme; + uint64_t gla, gpa, cs_base; + struct vm_guest_paging *paging; + mem_region_read_t mread; + mem_region_write_t mwrite; + enum vm_cpu_mode cpu_mode; + int cs_d, error, fault; + + vcpu = &vm->vcpu[vcpuid]; + vme = &vcpu->exitinfo; + + KASSERT(vme->inst_length == 0, ("%s: invalid inst_length %d", + __func__, vme->inst_length)); + + gla = vme->u.inst_emul.gla; + gpa = vme->u.inst_emul.gpa; + cs_base = vme->u.inst_emul.cs_base; + cs_d = vme->u.inst_emul.cs_d; + vie = &vme->u.inst_emul.vie; + paging = &vme->u.inst_emul.paging; + cpu_mode = paging->cpu_mode; + + VCPU_CTR1(vm, vcpuid, "inst_emul fault accessing gpa %#lx", gpa); + + /* Fetch, decode and emulate the faulting instruction */ + if (vie->num_valid == 0) { + error = vmm_fetch_instruction(vm, vcpuid, paging, vme->rip + + cs_base, VIE_INST_SIZE, vie, &fault); + } else { + /* + * The instruction bytes have already been copied into 'vie' + */ + error = fault = 0; + } + if (error || fault) + return (error); + + if (vmm_decode_instruction(vm, vcpuid, gla, cpu_mode, cs_d, vie) != 0) { + VCPU_CTR1(vm, vcpuid, "Error decoding instruction at %#lx", + vme->rip + cs_base); + *retu = true; /* dump instruction bytes in userspace */ + return (0); + } + + /* + * Update 'nextrip' based on the length of the emulated instruction. 
+ */ + vme->inst_length = vie->num_processed; + vcpu->nextrip += vie->num_processed; + VCPU_CTR1(vm, vcpuid, "nextrip updated to %#lx after instruction " + "decoding", vcpu->nextrip); + + /* return to userland unless this is an in-kernel emulated device */ + if (gpa >= DEFAULT_APIC_BASE && gpa < DEFAULT_APIC_BASE + PAGE_SIZE) { + mread = lapic_mmio_read; + mwrite = lapic_mmio_write; + } else if (gpa >= VIOAPIC_BASE && gpa < VIOAPIC_BASE + VIOAPIC_SIZE) { + mread = vioapic_mmio_read; + mwrite = vioapic_mmio_write; + } else if (gpa >= VHPET_BASE && gpa < VHPET_BASE + VHPET_SIZE) { + mread = vhpet_mmio_read; + mwrite = vhpet_mmio_write; + } else { + *retu = true; + return (0); + } + + error = vmm_emulate_instruction(vm, vcpuid, gpa, vie, paging, + mread, mwrite, retu); + + return (error); +} + +static int +vm_handle_suspend(struct vm *vm, int vcpuid, bool *retu) +{ + int i, done; + struct vcpu *vcpu; + + done = 0; + vcpu = &vm->vcpu[vcpuid]; + + CPU_SET_ATOMIC(vcpuid, &vm->suspended_cpus); + + /* + * Wait until all 'active_cpus' have suspended themselves. + * + * Since a VM may be suspended at any time including when one or + * more vcpus are doing a rendezvous we need to call the rendezvous + * handler while we are waiting to prevent a deadlock. + */ + vcpu_lock(vcpu); + while (1) { + if (CPU_CMP(&vm->suspended_cpus, &vm->active_cpus) == 0) { + VCPU_CTR0(vm, vcpuid, "All vcpus suspended"); + break; + } + + if (vm->rendezvous_func == NULL) { + VCPU_CTR0(vm, vcpuid, "Sleeping during suspend"); + vcpu_require_state_locked(vm, vcpuid, VCPU_SLEEPING); + msleep_spin(vcpu, &vcpu->mtx, "vmsusp", hz); + vcpu_require_state_locked(vm, vcpuid, VCPU_FROZEN); + } else { + VCPU_CTR0(vm, vcpuid, "Rendezvous during suspend"); + vcpu_unlock(vcpu); + vm_handle_rendezvous(vm, vcpuid); + vcpu_lock(vcpu); + } + } + vcpu_unlock(vcpu); + + /* + * Wakeup the other sleeping vcpus and return to userspace. + */ + for (i = 0; i < VM_MAXCPU; i++) { + if (CPU_ISSET(i, &vm->suspended_cpus)) { + vcpu_notify_event(vm, i, false); + } + } + + *retu = true; + return (0); +} + +static int +vm_handle_reqidle(struct vm *vm, int vcpuid, bool *retu) +{ + struct vcpu *vcpu = &vm->vcpu[vcpuid]; + + vcpu_lock(vcpu); + KASSERT(vcpu->reqidle, ("invalid vcpu reqidle %d", vcpu->reqidle)); + vcpu->reqidle = 0; + vcpu_unlock(vcpu); + *retu = true; + return (0); +} + +int +vm_suspend(struct vm *vm, enum vm_suspend_how how) +{ + int i; + + if (how <= VM_SUSPEND_NONE || how >= VM_SUSPEND_LAST) + return (EINVAL); + + if (atomic_cmpset_int(&vm->suspend, 0, how) == 0) { + VM_CTR2(vm, "virtual machine already suspended %d/%d", + vm->suspend, how); + return (EALREADY); + } + + VM_CTR1(vm, "virtual machine successfully suspended %d", how); + + /* + * Notify all active vcpus that they are now suspended. 
+ */ + for (i = 0; i < VM_MAXCPU; i++) { + if (CPU_ISSET(i, &vm->active_cpus)) + vcpu_notify_event(vm, i, false); + } + + return (0); +} + +void +vm_exit_suspended(struct vm *vm, int vcpuid, uint64_t rip) +{ + struct vm_exit *vmexit; + + KASSERT(vm->suspend > VM_SUSPEND_NONE && vm->suspend < VM_SUSPEND_LAST, + ("vm_exit_suspended: invalid suspend type %d", vm->suspend)); + + vmexit = vm_exitinfo(vm, vcpuid); + vmexit->rip = rip; + vmexit->inst_length = 0; + vmexit->exitcode = VM_EXITCODE_SUSPENDED; + vmexit->u.suspended.how = vm->suspend; +} + +void +vm_exit_rendezvous(struct vm *vm, int vcpuid, uint64_t rip) +{ + struct vm_exit *vmexit; + + KASSERT(vm->rendezvous_func != NULL, ("rendezvous not in progress")); + + vmexit = vm_exitinfo(vm, vcpuid); + vmexit->rip = rip; + vmexit->inst_length = 0; + vmexit->exitcode = VM_EXITCODE_RENDEZVOUS; + vmm_stat_incr(vm, vcpuid, VMEXIT_RENDEZVOUS, 1); +} + +void +vm_exit_reqidle(struct vm *vm, int vcpuid, uint64_t rip) +{ + struct vm_exit *vmexit; + + vmexit = vm_exitinfo(vm, vcpuid); + vmexit->rip = rip; + vmexit->inst_length = 0; + vmexit->exitcode = VM_EXITCODE_REQIDLE; + vmm_stat_incr(vm, vcpuid, VMEXIT_REQIDLE, 1); +} + +void +vm_exit_astpending(struct vm *vm, int vcpuid, uint64_t rip) +{ + struct vm_exit *vmexit; + + vmexit = vm_exitinfo(vm, vcpuid); + vmexit->rip = rip; + vmexit->inst_length = 0; + vmexit->exitcode = VM_EXITCODE_BOGUS; + vmm_stat_incr(vm, vcpuid, VMEXIT_ASTPENDING, 1); +} + +int +vm_run(struct vm *vm, struct vm_run *vmrun) +{ + struct vm_eventinfo evinfo; + int error, vcpuid; + struct vcpu *vcpu; + struct pcb *pcb; + uint64_t tscval; + struct vm_exit *vme; + bool retu, intr_disabled; + pmap_t pmap; + + vcpuid = vmrun->cpuid; + + if (vcpuid < 0 || vcpuid >= VM_MAXCPU) + return (EINVAL); + + if (!CPU_ISSET(vcpuid, &vm->active_cpus)) + return (EINVAL); + + if (CPU_ISSET(vcpuid, &vm->suspended_cpus)) + return (EINVAL); + + pmap = vmspace_pmap(vm->vmspace); + vcpu = &vm->vcpu[vcpuid]; + vme = &vcpu->exitinfo; + evinfo.rptr = &vm->rendezvous_func; + evinfo.sptr = &vm->suspend; + evinfo.iptr = &vcpu->reqidle; +restart: + critical_enter(); + + KASSERT(!CPU_ISSET(curcpu, &pmap->pm_active), + ("vm_run: absurd pm_active")); + + tscval = rdtsc(); + + pcb = PCPU_GET(curpcb); + set_pcb_flags(pcb, PCB_FULL_IRET); + + restore_guest_fpustate(vcpu); + + vcpu_require_state(vm, vcpuid, VCPU_RUNNING); + error = VMRUN(vm->cookie, vcpuid, vcpu->nextrip, pmap, &evinfo); + vcpu_require_state(vm, vcpuid, VCPU_FROZEN); + + save_guest_fpustate(vcpu); + + vmm_stat_incr(vm, vcpuid, VCPU_TOTAL_RUNTIME, rdtsc() - tscval); + + critical_exit(); + + if (error == 0) { + retu = false; + vcpu->nextrip = vme->rip + vme->inst_length; + switch (vme->exitcode) { + case VM_EXITCODE_REQIDLE: + error = vm_handle_reqidle(vm, vcpuid, &retu); + break; + case VM_EXITCODE_SUSPENDED: + error = vm_handle_suspend(vm, vcpuid, &retu); + break; + case VM_EXITCODE_IOAPIC_EOI: + vioapic_process_eoi(vm, vcpuid, + vme->u.ioapic_eoi.vector); + break; + case VM_EXITCODE_RENDEZVOUS: + vm_handle_rendezvous(vm, vcpuid); + error = 0; + break; + case VM_EXITCODE_HLT: + intr_disabled = ((vme->u.hlt.rflags & PSL_I) == 0); + error = vm_handle_hlt(vm, vcpuid, intr_disabled, &retu); + break; + case VM_EXITCODE_PAGING: + error = vm_handle_paging(vm, vcpuid, &retu); + break; + case VM_EXITCODE_INST_EMUL: + error = vm_handle_inst_emul(vm, vcpuid, &retu); + break; + case VM_EXITCODE_INOUT: + case VM_EXITCODE_INOUT_STR: + error = vm_handle_inout(vm, vcpuid, vme, &retu); + break; + case 
VM_EXITCODE_MONITOR: + case VM_EXITCODE_MWAIT: + vm_inject_ud(vm, vcpuid); + break; + default: + retu = true; /* handled in userland */ + break; + } + } + + if (error == 0 && retu == false) + goto restart; + + VCPU_CTR2(vm, vcpuid, "retu %d/%d", error, vme->exitcode); + + /* copy the exit information */ + bcopy(vme, &vmrun->vm_exit, sizeof(struct vm_exit)); + return (error); +} + +int +vm_restart_instruction(void *arg, int vcpuid) +{ + struct vm *vm; + struct vcpu *vcpu; + enum vcpu_state state; + uint64_t rip; + int error; + + vm = arg; + if (vcpuid < 0 || vcpuid >= VM_MAXCPU) + return (EINVAL); + + vcpu = &vm->vcpu[vcpuid]; + state = vcpu_get_state(vm, vcpuid, NULL); + if (state == VCPU_RUNNING) { + /* + * When a vcpu is "running" the next instruction is determined + * by adding 'rip' and 'inst_length' in the vcpu's 'exitinfo'. + * Thus setting 'inst_length' to zero will cause the current + * instruction to be restarted. + */ + vcpu->exitinfo.inst_length = 0; + VCPU_CTR1(vm, vcpuid, "restarting instruction at %#lx by " + "setting inst_length to zero", vcpu->exitinfo.rip); + } else if (state == VCPU_FROZEN) { + /* + * When a vcpu is "frozen" it is outside the critical section + * around VMRUN() and 'nextrip' points to the next instruction. + * Thus instruction restart is achieved by setting 'nextrip' + * to the vcpu's %rip. + */ + error = vm_get_register(vm, vcpuid, VM_REG_GUEST_RIP, &rip); + KASSERT(!error, ("%s: error %d getting rip", __func__, error)); + VCPU_CTR2(vm, vcpuid, "restarting instruction by updating " + "nextrip from %#lx to %#lx", vcpu->nextrip, rip); + vcpu->nextrip = rip; + } else { + panic("%s: invalid state %d", __func__, state); + } + return (0); +} + +int +vm_exit_intinfo(struct vm *vm, int vcpuid, uint64_t info) +{ + struct vcpu *vcpu; + int type, vector; + + if (vcpuid < 0 || vcpuid >= VM_MAXCPU) + return (EINVAL); + + vcpu = &vm->vcpu[vcpuid]; + + if (info & VM_INTINFO_VALID) { + type = info & VM_INTINFO_TYPE; + vector = info & 0xff; + if (type == VM_INTINFO_NMI && vector != IDT_NMI) + return (EINVAL); + if (type == VM_INTINFO_HWEXCEPTION && vector >= 32) + return (EINVAL); + if (info & VM_INTINFO_RSVD) + return (EINVAL); + } else { + info = 0; + } + VCPU_CTR2(vm, vcpuid, "%s: info1(%#lx)", __func__, info); + vcpu->exitintinfo = info; + return (0); +} + +enum exc_class { + EXC_BENIGN, + EXC_CONTRIBUTORY, + EXC_PAGEFAULT +}; + +#define IDT_VE 20 /* Virtualization Exception (Intel specific) */ + +static enum exc_class +exception_class(uint64_t info) +{ + int type, vector; + + KASSERT(info & VM_INTINFO_VALID, ("intinfo must be valid: %#lx", info)); + type = info & VM_INTINFO_TYPE; + vector = info & 0xff; + + /* Table 6-4, "Interrupt and Exception Classes", Intel SDM, Vol 3 */ + switch (type) { + case VM_INTINFO_HWINTR: + case VM_INTINFO_SWINTR: + case VM_INTINFO_NMI: + return (EXC_BENIGN); + default: + /* + * Hardware exception. + * + * SVM and VT-x use identical type values to represent NMI, + * hardware interrupt and software interrupt. + * + * SVM uses type '3' for all exceptions. VT-x uses type '3' + * for exceptions except #BP and #OF. #BP and #OF use a type + * value of '5' or '6'. Therefore we don't check for explicit + * values of 'type' to classify 'intinfo' into a hardware + * exception. 
+ */ + break; + } + + switch (vector) { + case IDT_PF: + case IDT_VE: + return (EXC_PAGEFAULT); + case IDT_DE: + case IDT_TS: + case IDT_NP: + case IDT_SS: + case IDT_GP: + return (EXC_CONTRIBUTORY); + default: + return (EXC_BENIGN); + } +} + +static int +nested_fault(struct vm *vm, int vcpuid, uint64_t info1, uint64_t info2, + uint64_t *retinfo) +{ + enum exc_class exc1, exc2; + int type1, vector1; + + KASSERT(info1 & VM_INTINFO_VALID, ("info1 %#lx is not valid", info1)); + KASSERT(info2 & VM_INTINFO_VALID, ("info2 %#lx is not valid", info2)); + + /* + * If an exception occurs while attempting to call the double-fault + * handler the processor enters shutdown mode (aka triple fault). + */ + type1 = info1 & VM_INTINFO_TYPE; + vector1 = info1 & 0xff; + if (type1 == VM_INTINFO_HWEXCEPTION && vector1 == IDT_DF) { + VCPU_CTR2(vm, vcpuid, "triple fault: info1(%#lx), info2(%#lx)", + info1, info2); + vm_suspend(vm, VM_SUSPEND_TRIPLEFAULT); + *retinfo = 0; + return (0); + } + + /* + * Table 6-5 "Conditions for Generating a Double Fault", Intel SDM, Vol3 + */ + exc1 = exception_class(info1); + exc2 = exception_class(info2); + if ((exc1 == EXC_CONTRIBUTORY && exc2 == EXC_CONTRIBUTORY) || + (exc1 == EXC_PAGEFAULT && exc2 != EXC_BENIGN)) { + /* Convert nested fault into a double fault. */ + *retinfo = IDT_DF; + *retinfo |= VM_INTINFO_VALID | VM_INTINFO_HWEXCEPTION; + *retinfo |= VM_INTINFO_DEL_ERRCODE; + } else { + /* Handle exceptions serially */ + *retinfo = info2; + } + return (1); +} + +static uint64_t +vcpu_exception_intinfo(struct vcpu *vcpu) +{ + uint64_t info = 0; + + if (vcpu->exception_pending) { + info = vcpu->exc_vector & 0xff; + info |= VM_INTINFO_VALID | VM_INTINFO_HWEXCEPTION; + if (vcpu->exc_errcode_valid) { + info |= VM_INTINFO_DEL_ERRCODE; + info |= (uint64_t)vcpu->exc_errcode << 32; + } + } + return (info); +} + +int +vm_entry_intinfo(struct vm *vm, int vcpuid, uint64_t *retinfo) +{ + struct vcpu *vcpu; + uint64_t info1, info2; + int valid; + + KASSERT(vcpuid >= 0 && vcpuid < VM_MAXCPU, ("invalid vcpu %d", vcpuid)); + + vcpu = &vm->vcpu[vcpuid]; + + info1 = vcpu->exitintinfo; + vcpu->exitintinfo = 0; + + info2 = 0; + if (vcpu->exception_pending) { + info2 = vcpu_exception_intinfo(vcpu); + vcpu->exception_pending = 0; + VCPU_CTR2(vm, vcpuid, "Exception %d delivered: %#lx", + vcpu->exc_vector, info2); + } + + if ((info1 & VM_INTINFO_VALID) && (info2 & VM_INTINFO_VALID)) { + valid = nested_fault(vm, vcpuid, info1, info2, retinfo); + } else if (info1 & VM_INTINFO_VALID) { + *retinfo = info1; + valid = 1; + } else if (info2 & VM_INTINFO_VALID) { + *retinfo = info2; + valid = 1; + } else { + valid = 0; + } + + if (valid) { + VCPU_CTR4(vm, vcpuid, "%s: info1(%#lx), info2(%#lx), " + "retinfo(%#lx)", __func__, info1, info2, *retinfo); + } + + return (valid); +} + +int +vm_get_intinfo(struct vm *vm, int vcpuid, uint64_t *info1, uint64_t *info2) +{ + struct vcpu *vcpu; + + if (vcpuid < 0 || vcpuid >= VM_MAXCPU) + return (EINVAL); + + vcpu = &vm->vcpu[vcpuid]; + *info1 = vcpu->exitintinfo; + *info2 = vcpu_exception_intinfo(vcpu); + return (0); +} + +int +vm_inject_exception(struct vm *vm, int vcpuid, int vector, int errcode_valid, + uint32_t errcode, int restart_instruction) +{ + struct vcpu *vcpu; + uint64_t regval; + int error; + + if (vcpuid < 0 || vcpuid >= VM_MAXCPU) + return (EINVAL); + + if (vector < 0 || vector >= 32) + return (EINVAL); + + /* + * A double fault exception should never be injected directly into + * the guest. 
It is a derived exception that results from specific + * combinations of nested faults. + */ + if (vector == IDT_DF) + return (EINVAL); + + vcpu = &vm->vcpu[vcpuid]; + + if (vcpu->exception_pending) { + VCPU_CTR2(vm, vcpuid, "Unable to inject exception %d due to " + "pending exception %d", vector, vcpu->exc_vector); + return (EBUSY); + } + + if (errcode_valid) { + /* + * Exceptions don't deliver an error code in real mode. + */ + error = vm_get_register(vm, vcpuid, VM_REG_GUEST_CR0, ®val); + KASSERT(!error, ("%s: error %d getting CR0", __func__, error)); + if (!(regval & CR0_PE)) + errcode_valid = 0; + } + + /* + * From section 26.6.1 "Interruptibility State" in Intel SDM: + * + * Event blocking by "STI" or "MOV SS" is cleared after guest executes + * one instruction or incurs an exception. + */ + error = vm_set_register(vm, vcpuid, VM_REG_GUEST_INTR_SHADOW, 0); + KASSERT(error == 0, ("%s: error %d clearing interrupt shadow", + __func__, error)); + + if (restart_instruction) + vm_restart_instruction(vm, vcpuid); + + vcpu->exception_pending = 1; + vcpu->exc_vector = vector; + vcpu->exc_errcode = errcode; + vcpu->exc_errcode_valid = errcode_valid; + VCPU_CTR1(vm, vcpuid, "Exception %d pending", vector); + return (0); +} + +void +vm_inject_fault(void *vmarg, int vcpuid, int vector, int errcode_valid, + int errcode) +{ + struct vm *vm; + int error, restart_instruction; + + vm = vmarg; + restart_instruction = 1; + + error = vm_inject_exception(vm, vcpuid, vector, errcode_valid, + errcode, restart_instruction); + KASSERT(error == 0, ("vm_inject_exception error %d", error)); +} + +void +vm_inject_pf(void *vmarg, int vcpuid, int error_code, uint64_t cr2) +{ + struct vm *vm; + int error; + + vm = vmarg; + VCPU_CTR2(vm, vcpuid, "Injecting page fault: error_code %#x, cr2 %#lx", + error_code, cr2); + + error = vm_set_register(vm, vcpuid, VM_REG_GUEST_CR2, cr2); + KASSERT(error == 0, ("vm_set_register(cr2) error %d", error)); + + vm_inject_fault(vm, vcpuid, IDT_PF, 1, error_code); +} + +static VMM_STAT(VCPU_NMI_COUNT, "number of NMIs delivered to vcpu"); + +int +vm_inject_nmi(struct vm *vm, int vcpuid) +{ + struct vcpu *vcpu; + + if (vcpuid < 0 || vcpuid >= VM_MAXCPU) + return (EINVAL); + + vcpu = &vm->vcpu[vcpuid]; + + vcpu->nmi_pending = 1; + vcpu_notify_event(vm, vcpuid, false); + return (0); +} + +int +vm_nmi_pending(struct vm *vm, int vcpuid) +{ + struct vcpu *vcpu; + + if (vcpuid < 0 || vcpuid >= VM_MAXCPU) + panic("vm_nmi_pending: invalid vcpuid %d", vcpuid); + + vcpu = &vm->vcpu[vcpuid]; + + return (vcpu->nmi_pending); +} + +void +vm_nmi_clear(struct vm *vm, int vcpuid) +{ + struct vcpu *vcpu; + + if (vcpuid < 0 || vcpuid >= VM_MAXCPU) + panic("vm_nmi_pending: invalid vcpuid %d", vcpuid); + + vcpu = &vm->vcpu[vcpuid]; + + if (vcpu->nmi_pending == 0) + panic("vm_nmi_clear: inconsistent nmi_pending state"); + + vcpu->nmi_pending = 0; + vmm_stat_incr(vm, vcpuid, VCPU_NMI_COUNT, 1); +} + +static VMM_STAT(VCPU_EXTINT_COUNT, "number of ExtINTs delivered to vcpu"); + +int +vm_inject_extint(struct vm *vm, int vcpuid) +{ + struct vcpu *vcpu; + + if (vcpuid < 0 || vcpuid >= VM_MAXCPU) + return (EINVAL); + + vcpu = &vm->vcpu[vcpuid]; + + vcpu->extint_pending = 1; + vcpu_notify_event(vm, vcpuid, false); + return (0); +} + +int +vm_extint_pending(struct vm *vm, int vcpuid) +{ + struct vcpu *vcpu; + + if (vcpuid < 0 || vcpuid >= VM_MAXCPU) + panic("vm_extint_pending: invalid vcpuid %d", vcpuid); + + vcpu = &vm->vcpu[vcpuid]; + + return (vcpu->extint_pending); +} + +void +vm_extint_clear(struct vm *vm, int 
vcpuid) +{ + struct vcpu *vcpu; + + if (vcpuid < 0 || vcpuid >= VM_MAXCPU) + panic("vm_extint_pending: invalid vcpuid %d", vcpuid); + + vcpu = &vm->vcpu[vcpuid]; + + if (vcpu->extint_pending == 0) + panic("vm_extint_clear: inconsistent extint_pending state"); + + vcpu->extint_pending = 0; + vmm_stat_incr(vm, vcpuid, VCPU_EXTINT_COUNT, 1); +} + +int +vm_get_capability(struct vm *vm, int vcpu, int type, int *retval) +{ + if (vcpu < 0 || vcpu >= VM_MAXCPU) + return (EINVAL); + + if (type < 0 || type >= VM_CAP_MAX) + return (EINVAL); + + return (VMGETCAP(vm->cookie, vcpu, type, retval)); +} + +int +vm_set_capability(struct vm *vm, int vcpu, int type, int val) +{ + if (vcpu < 0 || vcpu >= VM_MAXCPU) + return (EINVAL); + + if (type < 0 || type >= VM_CAP_MAX) + return (EINVAL); + + return (VMSETCAP(vm->cookie, vcpu, type, val)); +} + +struct vlapic * +vm_lapic(struct vm *vm, int cpu) +{ + return (vm->vcpu[cpu].vlapic); +} + +struct vioapic * +vm_ioapic(struct vm *vm) +{ + + return (vm->vioapic); +} + +struct vhpet * +vm_hpet(struct vm *vm) +{ + + return (vm->vhpet); +} + +boolean_t +vmm_is_pptdev(int bus, int slot, int func) +{ + int found, i, n; + int b, s, f; + char *val, *cp, *cp2; + + /* + * XXX + * The length of an environment variable is limited to 128 bytes which + * puts an upper limit on the number of passthru devices that may be + * specified using a single environment variable. + * + * Work around this by scanning multiple environment variable + * names instead of a single one - yuck! + */ + const char *names[] = { "pptdevs", "pptdevs2", "pptdevs3", NULL }; + + /* set pptdevs="1/2/3 4/5/6 7/8/9 10/11/12" */ + found = 0; + for (i = 0; names[i] != NULL && !found; i++) { + cp = val = kern_getenv(names[i]); + while (cp != NULL && *cp != '\0') { + if ((cp2 = strchr(cp, ' ')) != NULL) + *cp2 = '\0'; + + n = sscanf(cp, "%d/%d/%d", &b, &s, &f); + if (n == 3 && bus == b && slot == s && func == f) { + found = 1; + break; + } + + if (cp2 != NULL) + *cp2++ = ' '; + + cp = cp2; + } + freeenv(val); + } + return (found); +} + +void * +vm_iommu_domain(struct vm *vm) +{ + + return (vm->iommu); +} + +int +vcpu_set_state(struct vm *vm, int vcpuid, enum vcpu_state newstate, + bool from_idle) +{ + int error; + struct vcpu *vcpu; + + if (vcpuid < 0 || vcpuid >= VM_MAXCPU) + panic("vm_set_run_state: invalid vcpuid %d", vcpuid); + + vcpu = &vm->vcpu[vcpuid]; + + vcpu_lock(vcpu); + error = vcpu_set_state_locked(vm, vcpuid, newstate, from_idle); + vcpu_unlock(vcpu); + + return (error); +} + +enum vcpu_state +vcpu_get_state(struct vm *vm, int vcpuid, int *hostcpu) +{ + struct vcpu *vcpu; + enum vcpu_state state; + + if (vcpuid < 0 || vcpuid >= VM_MAXCPU) + panic("vm_get_run_state: invalid vcpuid %d", vcpuid); + + vcpu = &vm->vcpu[vcpuid]; + + vcpu_lock(vcpu); + state = vcpu->state; + if (hostcpu != NULL) + *hostcpu = vcpu->hostcpu; + vcpu_unlock(vcpu); + + return (state); +} + +int +vm_activate_cpu(struct vm *vm, int vcpuid) +{ + + if (vcpuid < 0 || vcpuid >= VM_MAXCPU) + return (EINVAL); + + if (CPU_ISSET(vcpuid, &vm->active_cpus)) + return (EBUSY); + + VCPU_CTR0(vm, vcpuid, "activated"); + CPU_SET_ATOMIC(vcpuid, &vm->active_cpus); + return (0); +} + +cpuset_t +vm_active_cpus(struct vm *vm) +{ + + return (vm->active_cpus); +} + +cpuset_t +vm_suspended_cpus(struct vm *vm) +{ + + return (vm->suspended_cpus); +} + +void * +vcpu_stats(struct vm *vm, int vcpuid) +{ + + return (vm->vcpu[vcpuid].stats); +} + +int +vm_get_x2apic_state(struct vm *vm, int vcpuid, enum x2apic_state *state) +{ + if (vcpuid < 0 || vcpuid >= 
VM_MAXCPU) + return (EINVAL); + + *state = vm->vcpu[vcpuid].x2apic_state; + + return (0); +} + +int +vm_set_x2apic_state(struct vm *vm, int vcpuid, enum x2apic_state state) +{ + if (vcpuid < 0 || vcpuid >= VM_MAXCPU) + return (EINVAL); + + if (state >= X2APIC_STATE_LAST) + return (EINVAL); + + vm->vcpu[vcpuid].x2apic_state = state; + + vlapic_set_x2apic_state(vm, vcpuid, state); + + return (0); +} + +/* + * This function is called to ensure that a vcpu "sees" a pending event + * as soon as possible: + * - If the vcpu thread is sleeping then it is woken up. + * - If the vcpu is running on a different host_cpu then an IPI will be directed + * to the host_cpu to cause the vcpu to trap into the hypervisor. + */ +static void +vcpu_notify_event_locked(struct vcpu *vcpu, bool lapic_intr) +{ + int hostcpu; + + hostcpu = vcpu->hostcpu; + if (vcpu->state == VCPU_RUNNING) { + KASSERT(hostcpu != NOCPU, ("vcpu running on invalid hostcpu")); + if (hostcpu != curcpu) { + if (lapic_intr) { + vlapic_post_intr(vcpu->vlapic, hostcpu, + vmm_ipinum); + } else { + ipi_cpu(hostcpu, vmm_ipinum); + } + } else { + /* + * If the 'vcpu' is running on 'curcpu' then it must + * be sending a notification to itself (e.g. SELF_IPI). + * The pending event will be picked up when the vcpu + * transitions back to guest context. + */ + } + } else { + KASSERT(hostcpu == NOCPU, ("vcpu state %d not consistent " + "with hostcpu %d", vcpu->state, hostcpu)); + if (vcpu->state == VCPU_SLEEPING) + wakeup_one(vcpu); + } +} + +void +vcpu_notify_event(struct vm *vm, int vcpuid, bool lapic_intr) +{ + struct vcpu *vcpu = &vm->vcpu[vcpuid]; + + vcpu_lock(vcpu); + vcpu_notify_event_locked(vcpu, lapic_intr); + vcpu_unlock(vcpu); +} + +struct vmspace * +vm_get_vmspace(struct vm *vm) +{ + + return (vm->vmspace); +} + +int +vm_apicid2vcpuid(struct vm *vm, int apicid) +{ + /* + * XXX apic id is assumed to be numerically identical to vcpu id + */ + return (apicid); +} + +void +vm_smp_rendezvous(struct vm *vm, int vcpuid, cpuset_t dest, + vm_rendezvous_func_t func, void *arg) +{ + int i; + + /* + * Enforce that this function is called without any locks + */ + WITNESS_WARN(WARN_PANIC, NULL, "vm_smp_rendezvous"); + KASSERT(vcpuid == -1 || (vcpuid >= 0 && vcpuid < VM_MAXCPU), + ("vm_smp_rendezvous: invalid vcpuid %d", vcpuid)); + +restart: + mtx_lock(&vm->rendezvous_mtx); + if (vm->rendezvous_func != NULL) { + /* + * If a rendezvous is already in progress then we need to + * call the rendezvous handler in case this 'vcpuid' is one + * of the targets of the rendezvous. + */ + RENDEZVOUS_CTR0(vm, vcpuid, "Rendezvous already in progress"); + mtx_unlock(&vm->rendezvous_mtx); + vm_handle_rendezvous(vm, vcpuid); + goto restart; + } + KASSERT(vm->rendezvous_func == NULL, ("vm_smp_rendezvous: previous " + "rendezvous is still in progress")); + + RENDEZVOUS_CTR0(vm, vcpuid, "Initiating rendezvous"); + vm->rendezvous_req_cpus = dest; + CPU_ZERO(&vm->rendezvous_done_cpus); + vm->rendezvous_arg = arg; + vm_set_rendezvous_func(vm, func); + mtx_unlock(&vm->rendezvous_mtx); + + /* + * Wake up any sleeping vcpus and trigger a VM-exit in any running + * vcpus so they handle the rendezvous as soon as possible. 
+ */ + for (i = 0; i < VM_MAXCPU; i++) { + if (CPU_ISSET(i, &dest)) + vcpu_notify_event(vm, i, false); + } + + vm_handle_rendezvous(vm, vcpuid); +} + +struct vatpic * +vm_atpic(struct vm *vm) +{ + return (vm->vatpic); +} + +struct vatpit * +vm_atpit(struct vm *vm) +{ + return (vm->vatpit); +} + +struct vpmtmr * +vm_pmtmr(struct vm *vm) +{ + + return (vm->vpmtmr); +} + +struct vrtc * +vm_rtc(struct vm *vm) +{ + + return (vm->vrtc); +} + +enum vm_reg_name +vm_segment_name(int seg) +{ + static enum vm_reg_name seg_names[] = { + VM_REG_GUEST_ES, + VM_REG_GUEST_CS, + VM_REG_GUEST_SS, + VM_REG_GUEST_DS, + VM_REG_GUEST_FS, + VM_REG_GUEST_GS + }; + + KASSERT(seg >= 0 && seg < nitems(seg_names), + ("%s: invalid segment encoding %d", __func__, seg)); + return (seg_names[seg]); +} + +void +vm_copy_teardown(struct vm *vm, int vcpuid, struct vm_copyinfo *copyinfo, + int num_copyinfo) +{ + int idx; + + for (idx = 0; idx < num_copyinfo; idx++) { + if (copyinfo[idx].cookie != NULL) + vm_gpa_release(copyinfo[idx].cookie); + } + bzero(copyinfo, num_copyinfo * sizeof(struct vm_copyinfo)); +} + +int +vm_copy_setup(struct vm *vm, int vcpuid, struct vm_guest_paging *paging, + uint64_t gla, size_t len, int prot, struct vm_copyinfo *copyinfo, + int num_copyinfo, int *fault) +{ + int error, idx, nused; + size_t n, off, remaining; + void *hva, *cookie; + uint64_t gpa; + + bzero(copyinfo, sizeof(struct vm_copyinfo) * num_copyinfo); + + nused = 0; + remaining = len; + while (remaining > 0) { + KASSERT(nused < num_copyinfo, ("insufficient vm_copyinfo")); + error = vm_gla2gpa(vm, vcpuid, paging, gla, prot, &gpa, fault); + if (error || *fault) + return (error); + off = gpa & PAGE_MASK; + n = min(remaining, PAGE_SIZE - off); + copyinfo[nused].gpa = gpa; + copyinfo[nused].len = n; + remaining -= n; + gla += n; + nused++; + } + + for (idx = 0; idx < nused; idx++) { + hva = vm_gpa_hold(vm, vcpuid, copyinfo[idx].gpa, + copyinfo[idx].len, prot, &cookie); + if (hva == NULL) + break; + copyinfo[idx].hva = hva; + copyinfo[idx].cookie = cookie; + } + + if (idx != nused) { + vm_copy_teardown(vm, vcpuid, copyinfo, num_copyinfo); + return (EFAULT); + } else { + *fault = 0; + return (0); + } +} + +void +vm_copyin(struct vm *vm, int vcpuid, struct vm_copyinfo *copyinfo, void *kaddr, + size_t len) +{ + char *dst; + int idx; + + dst = kaddr; + idx = 0; + while (len > 0) { + bcopy(copyinfo[idx].hva, dst, copyinfo[idx].len); + len -= copyinfo[idx].len; + dst += copyinfo[idx].len; + idx++; + } +} + +void +vm_copyout(struct vm *vm, int vcpuid, const void *kaddr, + struct vm_copyinfo *copyinfo, size_t len) +{ + const char *src; + int idx; + + src = kaddr; + idx = 0; + while (len > 0) { + bcopy(src, copyinfo[idx].hva, copyinfo[idx].len); + len -= copyinfo[idx].len; + src += copyinfo[idx].len; + idx++; + } +} + +/* + * Return the amount of in-use and wired memory for the VM. 
Since + * these are global stats, only return the values with for vCPU 0 + */ +VMM_STAT_DECLARE(VMM_MEM_RESIDENT); +VMM_STAT_DECLARE(VMM_MEM_WIRED); + +static void +vm_get_rescnt(struct vm *vm, int vcpu, struct vmm_stat_type *stat) +{ + + if (vcpu == 0) { + vmm_stat_set(vm, vcpu, VMM_MEM_RESIDENT, + PAGE_SIZE * vmspace_resident_count(vm->vmspace)); + } +} + +static void +vm_get_wiredcnt(struct vm *vm, int vcpu, struct vmm_stat_type *stat) +{ + + if (vcpu == 0) { + vmm_stat_set(vm, vcpu, VMM_MEM_WIRED, + PAGE_SIZE * pmap_wired_count(vmspace_pmap(vm->vmspace))); + } +} + +VMM_STAT_FUNC(VMM_MEM_RESIDENT, "Resident memory", vm_get_rescnt); +VMM_STAT_FUNC(VMM_MEM_WIRED, "Wired memory", vm_get_wiredcnt); diff -u -r -N usr/src/sys/amd64/vmm/vmm_dev.c /usr/src/sys/amd64/vmm/vmm_dev.c --- usr/src/sys/amd64/vmm/vmm_dev.c 2016-09-29 00:24:54.000000000 +0100 +++ /usr/src/sys/amd64/vmm/vmm_dev.c 2016-11-30 10:56:05.792745000 +0000 @@ -55,6 +55,7 @@ #include "vmm_lapic.h" #include "vmm_stat.h" #include "vmm_mem.h" +#include "vmm_ioport.h" #include "io/ppt.h" #include "io/vatpic.h" #include "io/vioapic.h" @@ -300,6 +301,8 @@ struct vm_pptdev_mmio *pptmmio; struct vm_pptdev_msi *pptmsi; struct vm_pptdev_msix *pptmsix; + struct vm_user_buf *usermem; + struct vm_io_reg_handler *ioregh; struct vm_nmi *vmnmi; struct vm_stats *vmstats; struct vm_stat_desc *statdesc; @@ -358,6 +361,7 @@ case VM_UNBIND_PPTDEV: case VM_ALLOC_MEMSEG: case VM_MMAP_MEMSEG: + case VM_MAP_USER_BUF: case VM_REINIT: /* * ioctls that operate on the entire virtual machine must @@ -433,6 +437,16 @@ pptmmio->func, pptmmio->gpa, pptmmio->len, pptmmio->hpa); break; + case VM_MAP_USER_BUF: + usermem = (struct vm_user_buf *)data; + error = vm_map_usermem(sc->vm, usermem->gpa, usermem->len, + usermem->addr, td); + break; + case VM_IO_REG_HANDLER: + ioregh = (struct vm_io_reg_handler *)data; + error = vmm_ioport_reg_handler(sc->vm, ioregh->port, ioregh->in, ioregh->mask_data, + ioregh->data, ioregh->type, ioregh->arg); + break; case VM_BIND_PPTDEV: pptdev = (struct vm_pptdev *)data; error = vm_assign_pptdev(sc->vm, pptdev->bus, pptdev->slot, diff -u -r -N usr/src/sys/amd64/vmm/vmm_dev.c.orig /usr/src/sys/amd64/vmm/vmm_dev.c.orig --- usr/src/sys/amd64/vmm/vmm_dev.c.orig 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/amd64/vmm/vmm_dev.c.orig 2016-11-30 10:52:58.459016000 +0000 @@ -0,0 +1,983 @@ +/*- + * Copyright (c) 2011 NetApp, Inc. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. 
IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD: releng/11.0/sys/amd64/vmm/vmm_dev.c 285218 2015-07-06 19:41:43Z neel $ + */ + +#include <sys/cdefs.h> +__FBSDID("$FreeBSD: releng/11.0/sys/amd64/vmm/vmm_dev.c 285218 2015-07-06 19:41:43Z neel $"); + +#include <sys/param.h> +#include <sys/kernel.h> +#include <sys/queue.h> +#include <sys/lock.h> +#include <sys/mutex.h> +#include <sys/malloc.h> +#include <sys/conf.h> +#include <sys/sysctl.h> +#include <sys/libkern.h> +#include <sys/ioccom.h> +#include <sys/mman.h> +#include <sys/uio.h> + +#include <vm/vm.h> +#include <vm/pmap.h> +#include <vm/vm_map.h> +#include <vm/vm_object.h> + +#include <machine/vmparam.h> +#include <machine/vmm.h> +#include <machine/vmm_instruction_emul.h> +#include <machine/vmm_dev.h> + +#include "vmm_lapic.h" +#include "vmm_stat.h" +#include "vmm_mem.h" +#include "io/ppt.h" +#include "io/vatpic.h" +#include "io/vioapic.h" +#include "io/vhpet.h" +#include "io/vrtc.h" + +struct devmem_softc { + int segid; + char *name; + struct cdev *cdev; + struct vmmdev_softc *sc; + SLIST_ENTRY(devmem_softc) link; +}; + +struct vmmdev_softc { + struct vm *vm; /* vm instance cookie */ + struct cdev *cdev; + SLIST_ENTRY(vmmdev_softc) link; + SLIST_HEAD(, devmem_softc) devmem; + int flags; +}; +#define VSC_LINKED 0x01 + +static SLIST_HEAD(, vmmdev_softc) head; + +static struct mtx vmmdev_mtx; + +static MALLOC_DEFINE(M_VMMDEV, "vmmdev", "vmmdev"); + +SYSCTL_DECL(_hw_vmm); + +static int devmem_create_cdev(const char *vmname, int id, char *devmem); +static void devmem_destroy(void *arg); + +static int +vcpu_lock_one(struct vmmdev_softc *sc, int vcpu) +{ + int error; + + if (vcpu < 0 || vcpu >= VM_MAXCPU) + return (EINVAL); + + error = vcpu_set_state(sc->vm, vcpu, VCPU_FROZEN, true); + return (error); +} + +static void +vcpu_unlock_one(struct vmmdev_softc *sc, int vcpu) +{ + enum vcpu_state state; + + state = vcpu_get_state(sc->vm, vcpu, NULL); + if (state != VCPU_FROZEN) { + panic("vcpu %s(%d) has invalid state %d", vm_name(sc->vm), + vcpu, state); + } + + vcpu_set_state(sc->vm, vcpu, VCPU_IDLE, false); +} + +static int +vcpu_lock_all(struct vmmdev_softc *sc) +{ + int error, vcpu; + + for (vcpu = 0; vcpu < VM_MAXCPU; vcpu++) { + error = vcpu_lock_one(sc, vcpu); + if (error) + break; + } + + if (error) { + while (--vcpu >= 0) + vcpu_unlock_one(sc, vcpu); + } + + return (error); +} + +static void +vcpu_unlock_all(struct vmmdev_softc *sc) +{ + int vcpu; + + for (vcpu = 0; vcpu < VM_MAXCPU; vcpu++) + vcpu_unlock_one(sc, vcpu); +} + +static struct vmmdev_softc * +vmmdev_lookup(const char *name) +{ + struct vmmdev_softc *sc; + +#ifdef notyet /* XXX kernel is not compiled with invariants */ + mtx_assert(&vmmdev_mtx, MA_OWNED); +#endif + + SLIST_FOREACH(sc, &head, link) { + if (strcmp(name, vm_name(sc->vm)) == 0) + break; + } + + return (sc); +} + +static struct vmmdev_softc * +vmmdev_lookup2(struct cdev *cdev) +{ + + return (cdev->si_drv1); +} + +static int +vmmdev_rw(struct cdev *cdev, struct uio *uio, int flags) +{ + int error, off, c, prot; + vm_paddr_t 
gpa; + void *hpa, *cookie; + struct vmmdev_softc *sc; + + sc = vmmdev_lookup2(cdev); + if (sc == NULL) + return (ENXIO); + + /* + * Get a read lock on the guest memory map by freezing any vcpu. + */ + error = vcpu_lock_one(sc, VM_MAXCPU - 1); + if (error) + return (error); + + prot = (uio->uio_rw == UIO_WRITE ? VM_PROT_WRITE : VM_PROT_READ); + while (uio->uio_resid > 0 && error == 0) { + gpa = uio->uio_offset; + off = gpa & PAGE_MASK; + c = min(uio->uio_resid, PAGE_SIZE - off); + + /* + * The VM has a hole in its physical memory map. If we want to + * use 'dd' to inspect memory beyond the hole we need to + * provide bogus data for memory that lies in the hole. + * + * Since this device does not support lseek(2), dd(1) will + * read(2) blocks of data to simulate the lseek(2). + */ + hpa = vm_gpa_hold(sc->vm, VM_MAXCPU - 1, gpa, c, prot, &cookie); + if (hpa == NULL) { + if (uio->uio_rw == UIO_READ) + error = uiomove(__DECONST(void *, zero_region), + c, uio); + else + error = EFAULT; + } else { + error = uiomove(hpa, c, uio); + vm_gpa_release(cookie); + } + } + vcpu_unlock_one(sc, VM_MAXCPU - 1); + return (error); +} + +CTASSERT(sizeof(((struct vm_memseg *)0)->name) >= SPECNAMELEN + 1); + +static int +get_memseg(struct vmmdev_softc *sc, struct vm_memseg *mseg) +{ + struct devmem_softc *dsc; + int error; + bool sysmem; + + error = vm_get_memseg(sc->vm, mseg->segid, &mseg->len, &sysmem, NULL); + if (error || mseg->len == 0) + return (error); + + if (!sysmem) { + SLIST_FOREACH(dsc, &sc->devmem, link) { + if (dsc->segid == mseg->segid) + break; + } + KASSERT(dsc != NULL, ("%s: devmem segment %d not found", + __func__, mseg->segid)); + error = copystr(dsc->name, mseg->name, SPECNAMELEN + 1, NULL); + } else { + bzero(mseg->name, sizeof(mseg->name)); + } + + return (error); +} + +static int +alloc_memseg(struct vmmdev_softc *sc, struct vm_memseg *mseg) +{ + char *name; + int error; + bool sysmem; + + error = 0; + name = NULL; + sysmem = true; + + if (VM_MEMSEG_NAME(mseg)) { + sysmem = false; + name = malloc(SPECNAMELEN + 1, M_VMMDEV, M_WAITOK); + error = copystr(VM_MEMSEG_NAME(mseg), name, SPECNAMELEN + 1, 0); + if (error) + goto done; + } + + error = vm_alloc_memseg(sc->vm, mseg->segid, mseg->len, sysmem); + if (error) + goto done; + + if (VM_MEMSEG_NAME(mseg)) { + error = devmem_create_cdev(vm_name(sc->vm), mseg->segid, name); + if (error) + vm_free_memseg(sc->vm, mseg->segid); + else + name = NULL; /* freed when 'cdev' is destroyed */ + } +done: + free(name, M_VMMDEV); + return (error); +} + +static int +vmmdev_ioctl(struct cdev *cdev, u_long cmd, caddr_t data, int fflag, + struct thread *td) +{ + int error, vcpu, state_changed, size; + cpuset_t *cpuset; + struct vmmdev_softc *sc; + struct vm_register *vmreg; + struct vm_seg_desc *vmsegdesc; + struct vm_run *vmrun; + struct vm_exception *vmexc; + struct vm_lapic_irq *vmirq; + struct vm_lapic_msi *vmmsi; + struct vm_ioapic_irq *ioapic_irq; + struct vm_isa_irq *isa_irq; + struct vm_isa_irq_trigger *isa_irq_trigger; + struct vm_capability *vmcap; + struct vm_pptdev *pptdev; + struct vm_pptdev_mmio *pptmmio; + struct vm_pptdev_msi *pptmsi; + struct vm_pptdev_msix *pptmsix; + struct vm_nmi *vmnmi; + struct vm_stats *vmstats; + struct vm_stat_desc *statdesc; + struct vm_x2apic *x2apic; + struct vm_gpa_pte *gpapte; + struct vm_suspend *vmsuspend; + struct vm_gla2gpa *gg; + struct vm_activate_cpu *vac; + struct vm_cpuset *vm_cpuset; + struct vm_intinfo *vmii; + struct vm_rtc_time *rtctime; + struct vm_rtc_data *rtcdata; + struct vm_memmap *mm; + + sc = 
vmmdev_lookup2(cdev); + if (sc == NULL) + return (ENXIO); + + error = 0; + vcpu = -1; + state_changed = 0; + + /* + * Some VMM ioctls can operate only on vcpus that are not running. + */ + switch (cmd) { + case VM_RUN: + case VM_GET_REGISTER: + case VM_SET_REGISTER: + case VM_GET_SEGMENT_DESCRIPTOR: + case VM_SET_SEGMENT_DESCRIPTOR: + case VM_INJECT_EXCEPTION: + case VM_GET_CAPABILITY: + case VM_SET_CAPABILITY: + case VM_PPTDEV_MSI: + case VM_PPTDEV_MSIX: + case VM_SET_X2APIC_STATE: + case VM_GLA2GPA: + case VM_ACTIVATE_CPU: + case VM_SET_INTINFO: + case VM_GET_INTINFO: + case VM_RESTART_INSTRUCTION: + /* + * XXX fragile, handle with care + * Assumes that the first field of the ioctl data is the vcpu. + */ + vcpu = *(int *)data; + error = vcpu_lock_one(sc, vcpu); + if (error) + goto done; + state_changed = 1; + break; + + case VM_MAP_PPTDEV_MMIO: + case VM_BIND_PPTDEV: + case VM_UNBIND_PPTDEV: + case VM_ALLOC_MEMSEG: + case VM_MMAP_MEMSEG: + case VM_REINIT: + /* + * ioctls that operate on the entire virtual machine must + * prevent all vcpus from running. + */ + error = vcpu_lock_all(sc); + if (error) + goto done; + state_changed = 2; + break; + + case VM_GET_MEMSEG: + case VM_MMAP_GETNEXT: + /* + * Lock a vcpu to make sure that the memory map cannot be + * modified while it is being inspected. + */ + vcpu = VM_MAXCPU - 1; + error = vcpu_lock_one(sc, vcpu); + if (error) + goto done; + state_changed = 1; + break; + + default: + break; + } + + switch(cmd) { + case VM_RUN: + vmrun = (struct vm_run *)data; + error = vm_run(sc->vm, vmrun); + break; + case VM_SUSPEND: + vmsuspend = (struct vm_suspend *)data; + error = vm_suspend(sc->vm, vmsuspend->how); + break; + case VM_REINIT: + error = vm_reinit(sc->vm); + break; + case VM_STAT_DESC: { + statdesc = (struct vm_stat_desc *)data; + error = vmm_stat_desc_copy(statdesc->index, + statdesc->desc, sizeof(statdesc->desc)); + break; + } + case VM_STATS: { + CTASSERT(MAX_VM_STATS >= MAX_VMM_STAT_ELEMS); + vmstats = (struct vm_stats *)data; + getmicrotime(&vmstats->tv); + error = vmm_stat_copy(sc->vm, vmstats->cpuid, + &vmstats->num_entries, vmstats->statbuf); + break; + } + case VM_PPTDEV_MSI: + pptmsi = (struct vm_pptdev_msi *)data; + error = ppt_setup_msi(sc->vm, pptmsi->vcpu, + pptmsi->bus, pptmsi->slot, pptmsi->func, + pptmsi->addr, pptmsi->msg, + pptmsi->numvec); + break; + case VM_PPTDEV_MSIX: + pptmsix = (struct vm_pptdev_msix *)data; + error = ppt_setup_msix(sc->vm, pptmsix->vcpu, + pptmsix->bus, pptmsix->slot, + pptmsix->func, pptmsix->idx, + pptmsix->addr, pptmsix->msg, + pptmsix->vector_control); + break; + case VM_MAP_PPTDEV_MMIO: + pptmmio = (struct vm_pptdev_mmio *)data; + error = ppt_map_mmio(sc->vm, pptmmio->bus, pptmmio->slot, + pptmmio->func, pptmmio->gpa, pptmmio->len, + pptmmio->hpa); + break; + case VM_BIND_PPTDEV: + pptdev = (struct vm_pptdev *)data; + error = vm_assign_pptdev(sc->vm, pptdev->bus, pptdev->slot, + pptdev->func); + break; + case VM_UNBIND_PPTDEV: + pptdev = (struct vm_pptdev *)data; + error = vm_unassign_pptdev(sc->vm, pptdev->bus, pptdev->slot, + pptdev->func); + break; + case VM_INJECT_EXCEPTION: + vmexc = (struct vm_exception *)data; + error = vm_inject_exception(sc->vm, vmexc->cpuid, + vmexc->vector, vmexc->error_code_valid, vmexc->error_code, + vmexc->restart_instruction); + break; + case VM_INJECT_NMI: + vmnmi = (struct vm_nmi *)data; + error = vm_inject_nmi(sc->vm, vmnmi->cpuid); + break; + case VM_LAPIC_IRQ: + vmirq = (struct vm_lapic_irq *)data; + error = lapic_intr_edge(sc->vm, vmirq->cpuid, 
vmirq->vector); + break; + case VM_LAPIC_LOCAL_IRQ: + vmirq = (struct vm_lapic_irq *)data; + error = lapic_set_local_intr(sc->vm, vmirq->cpuid, + vmirq->vector); + break; + case VM_LAPIC_MSI: + vmmsi = (struct vm_lapic_msi *)data; + error = lapic_intr_msi(sc->vm, vmmsi->addr, vmmsi->msg); + break; + case VM_IOAPIC_ASSERT_IRQ: + ioapic_irq = (struct vm_ioapic_irq *)data; + error = vioapic_assert_irq(sc->vm, ioapic_irq->irq); + break; + case VM_IOAPIC_DEASSERT_IRQ: + ioapic_irq = (struct vm_ioapic_irq *)data; + error = vioapic_deassert_irq(sc->vm, ioapic_irq->irq); + break; + case VM_IOAPIC_PULSE_IRQ: + ioapic_irq = (struct vm_ioapic_irq *)data; + error = vioapic_pulse_irq(sc->vm, ioapic_irq->irq); + break; + case VM_IOAPIC_PINCOUNT: + *(int *)data = vioapic_pincount(sc->vm); + break; + case VM_ISA_ASSERT_IRQ: + isa_irq = (struct vm_isa_irq *)data; + error = vatpic_assert_irq(sc->vm, isa_irq->atpic_irq); + if (error == 0 && isa_irq->ioapic_irq != -1) + error = vioapic_assert_irq(sc->vm, + isa_irq->ioapic_irq); + break; + case VM_ISA_DEASSERT_IRQ: + isa_irq = (struct vm_isa_irq *)data; + error = vatpic_deassert_irq(sc->vm, isa_irq->atpic_irq); + if (error == 0 && isa_irq->ioapic_irq != -1) + error = vioapic_deassert_irq(sc->vm, + isa_irq->ioapic_irq); + break; + case VM_ISA_PULSE_IRQ: + isa_irq = (struct vm_isa_irq *)data; + error = vatpic_pulse_irq(sc->vm, isa_irq->atpic_irq); + if (error == 0 && isa_irq->ioapic_irq != -1) + error = vioapic_pulse_irq(sc->vm, isa_irq->ioapic_irq); + break; + case VM_ISA_SET_IRQ_TRIGGER: + isa_irq_trigger = (struct vm_isa_irq_trigger *)data; + error = vatpic_set_irq_trigger(sc->vm, + isa_irq_trigger->atpic_irq, isa_irq_trigger->trigger); + break; + case VM_MMAP_GETNEXT: + mm = (struct vm_memmap *)data; + error = vm_mmap_getnext(sc->vm, &mm->gpa, &mm->segid, + &mm->segoff, &mm->len, &mm->prot, &mm->flags); + break; + case VM_MMAP_MEMSEG: + mm = (struct vm_memmap *)data; + error = vm_mmap_memseg(sc->vm, mm->gpa, mm->segid, mm->segoff, + mm->len, mm->prot, mm->flags); + break; + case VM_ALLOC_MEMSEG: + error = alloc_memseg(sc, (struct vm_memseg *)data); + break; + case VM_GET_MEMSEG: + error = get_memseg(sc, (struct vm_memseg *)data); + break; + case VM_GET_REGISTER: + vmreg = (struct vm_register *)data; + error = vm_get_register(sc->vm, vmreg->cpuid, vmreg->regnum, + &vmreg->regval); + break; + case VM_SET_REGISTER: + vmreg = (struct vm_register *)data; + error = vm_set_register(sc->vm, vmreg->cpuid, vmreg->regnum, + vmreg->regval); + break; + case VM_SET_SEGMENT_DESCRIPTOR: + vmsegdesc = (struct vm_seg_desc *)data; + error = vm_set_seg_desc(sc->vm, vmsegdesc->cpuid, + vmsegdesc->regnum, + &vmsegdesc->desc); + break; + case VM_GET_SEGMENT_DESCRIPTOR: + vmsegdesc = (struct vm_seg_desc *)data; + error = vm_get_seg_desc(sc->vm, vmsegdesc->cpuid, + vmsegdesc->regnum, + &vmsegdesc->desc); + break; + case VM_GET_CAPABILITY: + vmcap = (struct vm_capability *)data; + error = vm_get_capability(sc->vm, vmcap->cpuid, + vmcap->captype, + &vmcap->capval); + break; + case VM_SET_CAPABILITY: + vmcap = (struct vm_capability *)data; + error = vm_set_capability(sc->vm, vmcap->cpuid, + vmcap->captype, + vmcap->capval); + break; + case VM_SET_X2APIC_STATE: + x2apic = (struct vm_x2apic *)data; + error = vm_set_x2apic_state(sc->vm, + x2apic->cpuid, x2apic->state); + break; + case VM_GET_X2APIC_STATE: + x2apic = (struct vm_x2apic *)data; + error = vm_get_x2apic_state(sc->vm, + x2apic->cpuid, &x2apic->state); + break; + case VM_GET_GPA_PMAP: + gpapte = (struct vm_gpa_pte *)data; + 
pmap_get_mapping(vmspace_pmap(vm_get_vmspace(sc->vm)), + gpapte->gpa, gpapte->pte, &gpapte->ptenum); + error = 0; + break; + case VM_GET_HPET_CAPABILITIES: + error = vhpet_getcap((struct vm_hpet_cap *)data); + break; + case VM_GLA2GPA: { + CTASSERT(PROT_READ == VM_PROT_READ); + CTASSERT(PROT_WRITE == VM_PROT_WRITE); + CTASSERT(PROT_EXEC == VM_PROT_EXECUTE); + gg = (struct vm_gla2gpa *)data; + error = vm_gla2gpa(sc->vm, gg->vcpuid, &gg->paging, gg->gla, + gg->prot, &gg->gpa, &gg->fault); + KASSERT(error == 0 || error == EFAULT, + ("%s: vm_gla2gpa unknown error %d", __func__, error)); + break; + } + case VM_ACTIVATE_CPU: + vac = (struct vm_activate_cpu *)data; + error = vm_activate_cpu(sc->vm, vac->vcpuid); + break; + case VM_GET_CPUS: + error = 0; + vm_cpuset = (struct vm_cpuset *)data; + size = vm_cpuset->cpusetsize; + if (size < sizeof(cpuset_t) || size > CPU_MAXSIZE / NBBY) { + error = ERANGE; + break; + } + cpuset = malloc(size, M_TEMP, M_WAITOK | M_ZERO); + if (vm_cpuset->which == VM_ACTIVE_CPUS) + *cpuset = vm_active_cpus(sc->vm); + else if (vm_cpuset->which == VM_SUSPENDED_CPUS) + *cpuset = vm_suspended_cpus(sc->vm); + else + error = EINVAL; + if (error == 0) + error = copyout(cpuset, vm_cpuset->cpus, size); + free(cpuset, M_TEMP); + break; + case VM_SET_INTINFO: + vmii = (struct vm_intinfo *)data; + error = vm_exit_intinfo(sc->vm, vmii->vcpuid, vmii->info1); + break; + case VM_GET_INTINFO: + vmii = (struct vm_intinfo *)data; + error = vm_get_intinfo(sc->vm, vmii->vcpuid, &vmii->info1, + &vmii->info2); + break; + case VM_RTC_WRITE: + rtcdata = (struct vm_rtc_data *)data; + error = vrtc_nvram_write(sc->vm, rtcdata->offset, + rtcdata->value); + break; + case VM_RTC_READ: + rtcdata = (struct vm_rtc_data *)data; + error = vrtc_nvram_read(sc->vm, rtcdata->offset, + &rtcdata->value); + break; + case VM_RTC_SETTIME: + rtctime = (struct vm_rtc_time *)data; + error = vrtc_set_time(sc->vm, rtctime->secs); + break; + case VM_RTC_GETTIME: + error = 0; + rtctime = (struct vm_rtc_time *)data; + rtctime->secs = vrtc_get_time(sc->vm); + break; + case VM_RESTART_INSTRUCTION: + error = vm_restart_instruction(sc->vm, vcpu); + break; + default: + error = ENOTTY; + break; + } + + if (state_changed == 1) + vcpu_unlock_one(sc, vcpu); + else if (state_changed == 2) + vcpu_unlock_all(sc); + +done: + /* Make sure that no handler returns a bogus value like ERESTART */ + KASSERT(error >= 0, ("vmmdev_ioctl: invalid error return %d", error)); + return (error); +} + +static int +vmmdev_mmap_single(struct cdev *cdev, vm_ooffset_t *offset, vm_size_t mapsize, + struct vm_object **objp, int nprot) +{ + struct vmmdev_softc *sc; + vm_paddr_t gpa; + size_t len; + vm_ooffset_t segoff, first, last; + int error, found, segid; + bool sysmem; + + first = *offset; + last = first + mapsize; + if ((nprot & PROT_EXEC) || first < 0 || first >= last) + return (EINVAL); + + sc = vmmdev_lookup2(cdev); + if (sc == NULL) { + /* virtual machine is in the process of being created */ + return (EINVAL); + } + + /* + * Get a read lock on the guest memory map by freezing any vcpu. 
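+ * Freezing a single vcpu is sufficient: the ioctls that change the memory
+ * map (VM_ALLOC_MEMSEG, VM_MMAP_MEMSEG, VM_REINIT) have to freeze all
+ * vcpus first, so no such change can start while this vcpu is held frozen.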
+ */ + error = vcpu_lock_one(sc, VM_MAXCPU - 1); + if (error) + return (error); + + gpa = 0; + found = 0; + while (!found) { + error = vm_mmap_getnext(sc->vm, &gpa, &segid, &segoff, &len, + NULL, NULL); + if (error) + break; + + if (first >= gpa && last <= gpa + len) + found = 1; + else + gpa += len; + } + + if (found) { + error = vm_get_memseg(sc->vm, segid, &len, &sysmem, objp); + KASSERT(error == 0 && *objp != NULL, + ("%s: invalid memory segment %d", __func__, segid)); + if (sysmem) { + vm_object_reference(*objp); + *offset = segoff + (first - gpa); + } else { + error = EINVAL; + } + } + vcpu_unlock_one(sc, VM_MAXCPU - 1); + return (error); +} + +static void +vmmdev_destroy(void *arg) +{ + struct vmmdev_softc *sc = arg; + struct devmem_softc *dsc; + int error; + + error = vcpu_lock_all(sc); + KASSERT(error == 0, ("%s: error %d freezing vcpus", __func__, error)); + + while ((dsc = SLIST_FIRST(&sc->devmem)) != NULL) { + KASSERT(dsc->cdev == NULL, ("%s: devmem not free", __func__)); + SLIST_REMOVE_HEAD(&sc->devmem, link); + free(dsc->name, M_VMMDEV); + free(dsc, M_VMMDEV); + } + + if (sc->cdev != NULL) + destroy_dev(sc->cdev); + + if (sc->vm != NULL) + vm_destroy(sc->vm); + + if ((sc->flags & VSC_LINKED) != 0) { + mtx_lock(&vmmdev_mtx); + SLIST_REMOVE(&head, sc, vmmdev_softc, link); + mtx_unlock(&vmmdev_mtx); + } + + free(sc, M_VMMDEV); +} + +static int +sysctl_vmm_destroy(SYSCTL_HANDLER_ARGS) +{ + int error; + char buf[VM_MAX_NAMELEN]; + struct devmem_softc *dsc; + struct vmmdev_softc *sc; + struct cdev *cdev; + + strlcpy(buf, "beavis", sizeof(buf)); + error = sysctl_handle_string(oidp, buf, sizeof(buf), req); + if (error != 0 || req->newptr == NULL) + return (error); + + mtx_lock(&vmmdev_mtx); + sc = vmmdev_lookup(buf); + if (sc == NULL || sc->cdev == NULL) { + mtx_unlock(&vmmdev_mtx); + return (EINVAL); + } + + /* + * The 'cdev' will be destroyed asynchronously when 'si_threadcount' + * goes down to 0 so we should not do it again in the callback. + * + * Setting 'sc->cdev' to NULL is also used to indicate that the VM + * is scheduled for destruction. + */ + cdev = sc->cdev; + sc->cdev = NULL; + mtx_unlock(&vmmdev_mtx); + + /* + * Schedule all cdevs to be destroyed: + * + * - any new operations on the 'cdev' will return an error (ENXIO). + * + * - when the 'si_threadcount' dwindles down to zero the 'cdev' will + * be destroyed and the callback will be invoked in a taskqueue + * context. 
+ * + * - the 'devmem' cdevs are destroyed before the virtual machine 'cdev' + */ + SLIST_FOREACH(dsc, &sc->devmem, link) { + KASSERT(dsc->cdev != NULL, ("devmem cdev already destroyed")); + destroy_dev_sched_cb(dsc->cdev, devmem_destroy, dsc); + } + destroy_dev_sched_cb(cdev, vmmdev_destroy, sc); + return (0); +} +SYSCTL_PROC(_hw_vmm, OID_AUTO, destroy, CTLTYPE_STRING | CTLFLAG_RW, + NULL, 0, sysctl_vmm_destroy, "A", NULL); + +static struct cdevsw vmmdevsw = { + .d_name = "vmmdev", + .d_version = D_VERSION, + .d_ioctl = vmmdev_ioctl, + .d_mmap_single = vmmdev_mmap_single, + .d_read = vmmdev_rw, + .d_write = vmmdev_rw, +}; + +static int +sysctl_vmm_create(SYSCTL_HANDLER_ARGS) +{ + int error; + struct vm *vm; + struct cdev *cdev; + struct vmmdev_softc *sc, *sc2; + char buf[VM_MAX_NAMELEN]; + + strlcpy(buf, "beavis", sizeof(buf)); + error = sysctl_handle_string(oidp, buf, sizeof(buf), req); + if (error != 0 || req->newptr == NULL) + return (error); + + mtx_lock(&vmmdev_mtx); + sc = vmmdev_lookup(buf); + mtx_unlock(&vmmdev_mtx); + if (sc != NULL) + return (EEXIST); + + error = vm_create(buf, &vm); + if (error != 0) + return (error); + + sc = malloc(sizeof(struct vmmdev_softc), M_VMMDEV, M_WAITOK | M_ZERO); + sc->vm = vm; + SLIST_INIT(&sc->devmem); + + /* + * Lookup the name again just in case somebody sneaked in when we + * dropped the lock. + */ + mtx_lock(&vmmdev_mtx); + sc2 = vmmdev_lookup(buf); + if (sc2 == NULL) { + SLIST_INSERT_HEAD(&head, sc, link); + sc->flags |= VSC_LINKED; + } + mtx_unlock(&vmmdev_mtx); + + if (sc2 != NULL) { + vmmdev_destroy(sc); + return (EEXIST); + } + + error = make_dev_p(MAKEDEV_CHECKNAME, &cdev, &vmmdevsw, NULL, + UID_ROOT, GID_WHEEL, 0600, "vmm/%s", buf); + if (error != 0) { + vmmdev_destroy(sc); + return (error); + } + + mtx_lock(&vmmdev_mtx); + sc->cdev = cdev; + sc->cdev->si_drv1 = sc; + mtx_unlock(&vmmdev_mtx); + + return (0); +} +SYSCTL_PROC(_hw_vmm, OID_AUTO, create, CTLTYPE_STRING | CTLFLAG_RW, + NULL, 0, sysctl_vmm_create, "A", NULL); + +void +vmmdev_init(void) +{ + mtx_init(&vmmdev_mtx, "vmm device mutex", NULL, MTX_DEF); +} + +int +vmmdev_cleanup(void) +{ + int error; + + if (SLIST_EMPTY(&head)) + error = 0; + else + error = EBUSY; + + return (error); +} + +static int +devmem_mmap_single(struct cdev *cdev, vm_ooffset_t *offset, vm_size_t len, + struct vm_object **objp, int nprot) +{ + struct devmem_softc *dsc; + vm_ooffset_t first, last; + size_t seglen; + int error; + bool sysmem; + + dsc = cdev->si_drv1; + if (dsc == NULL) { + /* 'cdev' has been created but is not ready for use */ + return (ENXIO); + } + + first = *offset; + last = *offset + len; + if ((nprot & PROT_EXEC) || first < 0 || first >= last) + return (EINVAL); + + error = vcpu_lock_one(dsc->sc, VM_MAXCPU - 1); + if (error) + return (error); + + error = vm_get_memseg(dsc->sc->vm, dsc->segid, &seglen, &sysmem, objp); + KASSERT(error == 0 && !sysmem && *objp != NULL, + ("%s: invalid devmem segment %d", __func__, dsc->segid)); + + vcpu_unlock_one(dsc->sc, VM_MAXCPU - 1); + + if (seglen >= last) { + vm_object_reference(*objp); + return (0); + } else { + return (EINVAL); + } +} + +static struct cdevsw devmemsw = { + .d_name = "devmem", + .d_version = D_VERSION, + .d_mmap_single = devmem_mmap_single, +}; + +static int +devmem_create_cdev(const char *vmname, int segid, char *devname) +{ + struct devmem_softc *dsc; + struct vmmdev_softc *sc; + struct cdev *cdev; + int error; + + error = make_dev_p(MAKEDEV_CHECKNAME, &cdev, &devmemsw, NULL, + UID_ROOT, GID_WHEEL, 0600, "vmm.io/%s.%s", vmname, 
devname);
+	if (error)
+		return (error);
+
+	dsc = malloc(sizeof(struct devmem_softc), M_VMMDEV, M_WAITOK | M_ZERO);
+
+	mtx_lock(&vmmdev_mtx);
+	sc = vmmdev_lookup(vmname);
+	KASSERT(sc != NULL, ("%s: vm %s softc not found", __func__, vmname));
+	if (sc->cdev == NULL) {
+		/* virtual machine is being created or destroyed */
+		mtx_unlock(&vmmdev_mtx);
+		free(dsc, M_VMMDEV);
+		destroy_dev_sched_cb(cdev, NULL, 0);
+		return (ENODEV);
+	}
+
+	dsc->segid = segid;
+	dsc->name = devname;
+	dsc->cdev = cdev;
+	dsc->sc = sc;
+	SLIST_INSERT_HEAD(&sc->devmem, dsc, link);
+	mtx_unlock(&vmmdev_mtx);
+
+	/* The 'cdev' is ready for use after 'si_drv1' is initialized */
+	cdev->si_drv1 = dsc;
+	return (0);
+}
+
+static void
+devmem_destroy(void *arg)
+{
+	struct devmem_softc *dsc = arg;
+
+	KASSERT(dsc->cdev, ("%s: devmem cdev already destroyed", __func__));
+	dsc->cdev = NULL;
+	dsc->sc = NULL;
+}
diff -u -r -N usr/src/sys/amd64/vmm/vmm_ioport.c /usr/src/sys/amd64/vmm/vmm_ioport.c
--- usr/src/sys/amd64/vmm/vmm_ioport.c	2016-09-29 00:24:54.000000000 +0100
+++ /usr/src/sys/amd64/vmm/vmm_ioport.c	2016-11-30 10:56:05.794563000 +0000
@@ -97,31 +97,274 @@
 }
 #endif	/* KTR */
 
+#ifdef VMM_IOPORT_REG_HANDLER
+#include <sys/kernel.h>
+#include <sys/param.h>
+#include <sys/lock.h>
+#include <sys/sx.h>
+#include <sys/malloc.h>
+#include <sys/systm.h>
+
+static MALLOC_DEFINE(M_IOREGH, "ioregh", "bhyve ioport reg handlers");
+
+#define IOPORT_MAX_REG_HANDLER	16
+
+/*
+ * ioport_reg_handler functions allow us to catch VM writes/reads
+ * on specific I/O addresses and send a notification.
+ *
+ * When the VM writes or reads a specific value at an I/O address, if the
+ * address and the value match the info stored during the handler
+ * registration, then we send a notification (multiple notification types
+ * are possible, but for now only the VM_IO_REGH_KWEVENTS handler is
+ * implemented).
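+ * For example, a ptnetmap backend can register the guest's "kick" I/O
+ * register with VM_IO_REGH_KWEVENTS, so that a guest kick wakes up the
+ * in-kernel worker directly, with no exit to userspace.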
+ */
+
+typedef int (*ioport_reg_handler_func_t)(struct vm *vm,
+	struct ioport_reg_handler *regh, uint32_t *val);
+
+struct ioport_reg_handler {
+	uint16_t port;			/* I/O address */
+	uint16_t in;			/* 0 out, 1 in */
+	uint32_t mask_data;		/* 0 means match anything */
+	uint32_t data;			/* data to match */
+	ioport_reg_handler_func_t handler;	/* handler pointer */
+	void *handler_arg;		/* handler argument */
+};
+
+struct ioregh {
+	struct sx lock;
+	/* TODO: use a hash table */
+	struct ioport_reg_handler handlers[IOPORT_MAX_REG_HANDLER];
+};
+
+/* ----- I/O reg handlers ----- */
+
+/*
+ * VM_IO_REGH_KWEVENTS handler
+ *
+ * wakeup() on a specified address that uniquely identifies the event
+ */
+static int
+vmm_ioport_reg_wakeup(struct vm *vm, struct ioport_reg_handler *regh, uint32_t *val)
+{
+	wakeup(regh->handler_arg);
+	return (0);
+}
+
+/*
+ * TODO:
+ * - VM_IO_REGH_CONDSIGNAL: pthread_cond_signal
+ * - VM_IO_REGH_WRITEFD: write on fd
+ * - VM_IO_REGH_IOCTL: ioctl on fd
+ */
+
+/* call with ioregh->lock held */
+static struct ioport_reg_handler *
+vmm_ioport_find_handler(struct ioregh *ioregh, uint16_t port, uint16_t in,
+	uint32_t mask_data, uint32_t data)
+{
+	struct ioport_reg_handler *regh;
+	uint32_t mask;
+	int i;
+
+	regh = ioregh->handlers;
+	for (i = 0; i < IOPORT_MAX_REG_HANDLER; i++) {
+		if (regh[i].handler != NULL) {
+			mask = regh[i].mask_data & mask_data;
+			if ((regh[i].port == port) && (regh[i].in == in)
+			    && ((mask & regh[i].data) == (mask & data))) {
+				return (&regh[i]);
+			}
+		}
+	}
+
+	return (NULL);
+}
+
+/* call with ioregh->lock held */
+static struct ioport_reg_handler *
+vmm_ioport_empty_handler(struct ioregh *ioregh)
+{
+	struct ioport_reg_handler *regh;
+	int i;
+
+	regh = ioregh->handlers;
+	for (i = 0; i < IOPORT_MAX_REG_HANDLER; i++) {
+		if (regh[i].handler == NULL) {
+			return (&regh[i]);
+		}
+	}
+
+	return (NULL);
+}
+
+static int
+vmm_ioport_add_handler(struct vm *vm, uint16_t port, uint16_t in,
+	uint32_t mask_data, uint32_t data, ioport_reg_handler_func_t handler,
+	void *handler_arg)
+{
+	struct ioport_reg_handler *regh;
+	struct ioregh *ioregh;
+	int ret = 0;
+
+	ioregh = vm_ioregh(vm);
+
+	sx_xlock(&ioregh->lock);
+
+	regh = vmm_ioport_find_handler(ioregh, port, in, mask_data, data);
+	if (regh != NULL) {
+		printf("%s: handler for port %d in %d mask_data %u data %u "
+		    "already registered\n",
+		    __FUNCTION__, port, in, mask_data, data);
+		ret = EEXIST;
+		goto err;
+	}
+
+	regh = vmm_ioport_empty_handler(ioregh);
+	if (regh == NULL) {
+		printf("%s: empty reg_handler slot not found\n", __FUNCTION__);
+		ret = ENOMEM;
+		goto err;
+	}
+
+	regh->port = port;
+	regh->in = in;
+	regh->mask_data = mask_data;
+	regh->data = data;
+	regh->handler = handler;
+	regh->handler_arg = handler_arg;
+
+err:
+	sx_xunlock(&ioregh->lock);
+	return (ret);
+}
+
+static int
+vmm_ioport_del_handler(struct vm *vm, uint16_t port, uint16_t in,
+	uint32_t mask_data, uint32_t data)
+{
+	struct ioport_reg_handler *regh;
+	struct ioregh *ioregh;
+	int ret = 0;
+
+	ioregh = vm_ioregh(vm);
+
+	sx_xlock(&ioregh->lock);
+
+	regh = vmm_ioport_find_handler(ioregh, port, in, mask_data, data);
+
+	if (regh == NULL) {
+		ret = EINVAL;
+		goto err;
+	}
+
+	bzero(regh, sizeof(struct ioport_reg_handler));
+err:
+	sx_xunlock(&ioregh->lock);
+	return (ret);
+}
+
+/*
+ * Register or delete a new I/O event handler.
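+ *
+ * 'port' and 'in' select the I/O address and direction (0 out, 1 in),
+ * 'mask_data'/'data' select which accessed values match (mask_data == 0
+ * matches any value), and 'arg' is the wakeup(9) channel used by
+ * VM_IO_REGH_KWEVENTS.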
+ */
+int
+vmm_ioport_reg_handler(struct vm *vm, uint16_t port, uint16_t in,
+	uint32_t mask_data, uint32_t data, enum vm_io_regh_type type, void *arg)
+{
+	int ret = 0;
+
+	switch (type) {
+	case VM_IO_REGH_DELETE:
+		ret = vmm_ioport_del_handler(vm, port, in, mask_data, data);
+		break;
+	case VM_IO_REGH_KWEVENTS:
+		ret = vmm_ioport_add_handler(vm, port, in, mask_data, data,
+			vmm_ioport_reg_wakeup, arg);
+		break;
+	default:
+		printf("%s: unknown reg_handler type\n", __FUNCTION__);
+		ret = EINVAL;
+		break;
+	}
+
+	return (ret);
+}
+
+/*
+ * Invoke a handler, if the data matches.
+ */
+static int
+invoke_reg_handler(struct vm *vm, int vcpuid, struct vm_exit *vmexit,
+	uint32_t *val, int *error)
+{
+	struct ioport_reg_handler *regh;
+	struct ioregh *ioregh;
+	uint32_t mask_data;
+
+	mask_data = vie_size2mask(vmexit->u.inout.bytes);
+	ioregh = vm_ioregh(vm);
+
+	sx_slock(&ioregh->lock);
+	regh = vmm_ioport_find_handler(ioregh, vmexit->u.inout.port,
+		vmexit->u.inout.in, mask_data, vmexit->u.inout.eax);
+	if (regh != NULL) {
+		*error = (*(regh->handler))(vm, regh, val);
+	}
+	sx_sunlock(&ioregh->lock);
+	return (regh != NULL);
+}
+
+struct ioregh *
+ioregh_init(struct vm *vm)
+{
+	struct ioregh *ioregh;
+
+	ioregh = malloc(sizeof(struct ioregh), M_IOREGH, M_WAITOK | M_ZERO);
+	sx_init(&ioregh->lock, "ioregh lock");
+
+	return (ioregh);
+}
+
+void
+ioregh_cleanup(struct ioregh *ioregh)
+{
+	sx_destroy(&ioregh->lock);
+	free(ioregh, M_IOREGH);
+}
+#else /* !VMM_IOPORT_REG_HANDLER */
+#define invoke_reg_handler(_1, _2, _3, _4, _5) (0)
+#endif /* VMM_IOPORT_REG_HANDLER */
+
 static int
 emulate_inout_port(struct vm *vm, int vcpuid, struct vm_exit *vmexit,
     bool *retu)
 {
 	ioport_handler_func_t handler;
 	uint32_t mask, val;
-	int error;
+	int regh = 0, error = 0;
 
 	/*
 	 * If there is no handler for the I/O port then punt to userspace.
 	 */
-	if (vmexit->u.inout.port >= MAX_IOPORTS ||
-	    (handler = ioport_handler[vmexit->u.inout.port]) == NULL) {
+	if ((vmexit->u.inout.port >= MAX_IOPORTS ||
+	    (handler = ioport_handler[vmexit->u.inout.port]) == NULL) &&
+	    (regh = invoke_reg_handler(vm, vcpuid, vmexit, &val, &error)) == 0) {
 		*retu = true;
 		return (0);
 	}
 
-	mask = vie_size2mask(vmexit->u.inout.bytes);
+	if (!regh) {
+		mask = vie_size2mask(vmexit->u.inout.bytes);
+
+		if (!vmexit->u.inout.in) {
+			val = vmexit->u.inout.eax & mask;
+		}
 
-	if (!vmexit->u.inout.in) {
-		val = vmexit->u.inout.eax & mask;
+		error = (*handler)(vm, vcpuid, vmexit->u.inout.in,
+		    vmexit->u.inout.port, vmexit->u.inout.bytes, &val);
 	}
 
-	error = (*handler)(vm, vcpuid, vmexit->u.inout.in,
-	    vmexit->u.inout.port, vmexit->u.inout.bytes, &val);
 	if (error) {
 		/*
 		 * The value returned by this function is also the return value
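As a minimal usage sketch (not part of the patch: PTN_KICK_PORT, the event argument and ptn_wait_kick() are illustrative assumptions), an in-kernel consumer could pair vmm_ioport_reg_handler() with a tsleep(9) on the same wait channel:

/*
 * Hypothetical consumer: wait for the guest to write to a "kick" port.
 * PTN_KICK_PORT and ptn_wait_kick() are made-up names for illustration.
 */
#define PTN_KICK_PORT	0x100

static int
ptn_wait_kick(struct vm *vm, void *event)
{
	int error;

	/* in == 0 matches guest OUT accesses; mask_data == 0 matches any value. */
	error = vmm_ioport_reg_handler(vm, PTN_KICK_PORT, 0, 0, 0,
	    VM_IO_REGH_KWEVENTS, event);
	if (error)
		return (error);

	/*
	 * A matching guest write now ends in wakeup(event).  Sleep with a
	 * timeout so that a kick arriving before the tsleep() is not lost
	 * forever.
	 */
	tsleep(event, 0, "ptnkick", hz);

	/* Drop the handler again, matching the same port/in/mask/data tuple. */
	return (vmm_ioport_reg_handler(vm, PTN_KICK_PORT, 0, 0, 0,
	    VM_IO_REGH_DELETE, NULL));
}

The fixed array of IOPORT_MAX_REG_HANDLER (16) slots searched linearly keeps the VM-exit path simple; the TODO above notes a hash table as a possible refinement if the handler count ever grows.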
diff -u -r -N usr/src/sys/amd64/vmm/vmm_ioport.h /usr/src/sys/amd64/vmm/vmm_ioport.h
--- usr/src/sys/amd64/vmm/vmm_ioport.h	2016-09-29 00:24:54.000000000 +0100
+++ /usr/src/sys/amd64/vmm/vmm_ioport.h	2016-11-30 10:56:05.795940000 +0000
@@ -29,6 +29,22 @@
 #ifndef	_VMM_IOPORT_H_
 #define	_VMM_IOPORT_H_
 
+#define VMM_IOPORT_REG_HANDLER
+#ifdef VMM_IOPORT_REG_HANDLER
+struct ioport_reg_handler;
+struct ioregh;
+
+struct ioregh *ioregh_init(struct vm *vm);
+void ioregh_cleanup(struct ioregh *ioregh);
+
+int vmm_ioport_reg_handler(struct vm *vm, uint16_t port, uint16_t in,
+	uint32_t mask_data, uint32_t data, enum vm_io_regh_type type, void *arg);
+#else /* !VMM_IOPORT_REG_HANDLER */
+#define ioregh_init(_1) (NULL)
+#define ioregh_cleanup(_1)
+#define vmm_ioport_reg_handler(_1, _2, _3, _4, _5, _6, _7) (EINVAL)
+#endif /* VMM_IOPORT_REG_HANDLER */
+
 typedef int (*ioport_handler_func_t)(struct vm *vm, int vcpuid,
     bool in, int port, int bytes, uint32_t *val);
diff -u -r -N usr/src/sys/amd64/vmm/vmm_usermem.c /usr/src/sys/amd64/vmm/vmm_usermem.c
--- usr/src/sys/amd64/vmm/vmm_usermem.c	1970-01-01 01:00:00.000000000 +0100
+++ /usr/src/sys/amd64/vmm/vmm_usermem.c	2016-12-01 14:42:38.410596000 +0000
@@ -0,0 +1,186 @@
+/*
+ * Copyright (C) 2015 Stefano Garzarella (stefano.garzarella@gmail.com)
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * $FreeBSD$
+ */
+
+#include <sys/cdefs.h>
+__FBSDID("$FreeBSD$");
+
+#include <sys/param.h>
+#include <sys/systm.h>
+#include <sys/malloc.h>
+#include <sys/sglist.h>
+#include <sys/lock.h>
+#include <sys/rwlock.h>
+#include <sys/proc.h>
+
+#include <vm/vm.h>
+#include <vm/vm_param.h>
+#include <vm/pmap.h>
+#include <vm/vm_map.h>
+#include <vm/vm_object.h>
+#include <vm/vm_page.h>
+#include <vm/vm_pager.h>
+
+#include <machine/md_var.h>
+
+#include "vmm_mem.h"
+#include "vmm_usermem.h"
+
+/*
+ * usermem functions allow us to map a host userspace buffer (e.g. from
+ * bhyve) into the guest VM.
+ *
+ * This feature is used to implement ptnetmap on bhyve, mapping the netmap
+ * memory (returned by mmap() in the bhyve userspace application) into the
+ * guest VM.
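+ *
+ * The flow is: bhyve mmap()s the netmap memory in its own address space
+ * and passes the resulting user address down to vmm.ko;
+ * vmm_usermem_alloc() then looks up the backing vm_object with
+ * vm_map_lookup() and installs it in the guest physical address space
+ * with vm_map_find().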
+ */
+
+/* TODO: we could create a dynamic list of usermems */
+#define MAX_USERMEMS	64
+
+static struct usermem {
+	struct vmspace *vmspace;	/* guest address space */
+	vm_paddr_t gpa;			/* guest physical address */
+	size_t len;
+} usermems[MAX_USERMEMS];
+
+static int
+vmm_usermem_add(struct vmspace *vmspace, vm_paddr_t gpa, size_t len)
+{
+	int i;
+
+	for (i = 0; i < MAX_USERMEMS; i++) {
+		if (usermems[i].len == 0) {
+			usermems[i].vmspace = vmspace;
+			usermems[i].gpa = gpa;
+			usermems[i].len = len;
+			break;
+		}
+	}
+
+	if (i == MAX_USERMEMS) {
+		printf("vmm_usermem_add: empty usermem slot not found\n");
+		return (ENOMEM);
+	}
+
+	return 0;
+}
+
+static int
+vmm_usermem_del(struct vmspace *vmspace, vm_paddr_t gpa, size_t len)
+{
+	int i;
+
+	for (i = 0; i < MAX_USERMEMS; i++) {
+		if (usermems[i].vmspace == vmspace && usermems[i].gpa == gpa
+		    && usermems[i].len == len) {
+			bzero(&usermems[i], sizeof(struct usermem));
+			return 1;
+		}
+	}
+
+	return 0;
+}
+
+boolean_t
+usermem_mapped(struct vmspace *vmspace, vm_paddr_t gpa)
+{
+	int i;
+
+	for (i = 0; i < MAX_USERMEMS; i++) {
+		if (usermems[i].vmspace != vmspace || usermems[i].len == 0)
+			continue;
+		if (gpa >= usermems[i].gpa &&
+		    gpa < usermems[i].gpa + usermems[i].len)
+			return (TRUE);
+	}
+	return (FALSE);
+}
+
+vm_object_t
+vmm_usermem_alloc(struct vmspace *vmspace, vm_paddr_t gpa, size_t len,
+	void *buf, struct thread *td)
+{
+	int error;
+	vm_object_t obj;
+	vm_map_t map;
+	vm_map_entry_t entry;
+	vm_pindex_t index;
+	vm_prot_t prot;
+	boolean_t wired;
+
+	map = &td->td_proc->p_vmspace->vm_map;
+	/* look up the vm_object that describes the user address */
+	error = vm_map_lookup(&map, (unsigned long)buf, VM_PROT_RW, &entry,
+	    &obj, &index, &prot, &wired);
+
+	/* map the vm_object into the vmspace */
+	if (obj != NULL) {
+		error = vm_map_find(&vmspace->vm_map, obj, index, &gpa, len, 0,
+		    VMFS_NO_SPACE, VM_PROT_RW, VM_PROT_RW, 0);
+		if (error != KERN_SUCCESS) {
+			vm_object_deallocate(obj);
+			obj = NULL;
+		}
+	}
+	vm_map_lookup_done(map, entry);
+
+	/* acquire a reference to the vm_object */
+	if (obj != NULL) {
+		vm_object_reference(obj);
+		vmm_usermem_add(vmspace, gpa, len);
+	}
+
+	return (obj);
+}
+
+void
+vmm_usermem_free(struct vmspace *vmspace, vm_paddr_t gpa, size_t len)
+{
+	int ret;
+
+	ret = vmm_usermem_del(vmspace, gpa, len);
+	if (ret) {
+		/* TODO: check the return value of vm_map_remove? */
+		vm_map_remove(&vmspace->vm_map, gpa, gpa + len);
+		/* TODO: should we call vm_object_deallocate? */
+	}
+}
+
+void
+vmm_usermem_cleanup(struct vmspace *vmspace)
+{
+	int i;
+
+	for (i = 0; i < MAX_USERMEMS; i++) {
+		if (usermems[i].vmspace == vmspace) {
+			/* TODO: same as above */
+			vm_map_remove(&vmspace->vm_map, usermems[i].gpa,
+			    usermems[i].gpa + usermems[i].len);
+			bzero(&usermems[i], sizeof(struct usermem));
+		}
+	}
+}
diff -u -r -N usr/src/sys/amd64/vmm/vmm_usermem.h /usr/src/sys/amd64/vmm/vmm_usermem.h
--- usr/src/sys/amd64/vmm/vmm_usermem.h	1970-01-01 01:00:00.000000000 +0100
+++ /usr/src/sys/amd64/vmm/vmm_usermem.h	2016-11-30 10:56:05.804241000 +0000
@@ -0,0 +1,40 @@
+/*
+ * Copyright (C) 2015 Stefano Garzarella (stefano.garzarella@gmail.com)
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * $FreeBSD$
+ */
+
+#ifndef _VMM_USERMEM_H_
+#define _VMM_USERMEM_H_
+
+struct vm;
+
+struct vm_object *vmm_usermem_alloc(struct vmspace *, vm_paddr_t gpa,
+	size_t len, void *buf, struct thread *td);
+void vmm_usermem_free(struct vmspace *, vm_paddr_t gpa, size_t len);
+void vmm_usermem_cleanup(struct vmspace *);
+boolean_t usermem_mapped(struct vmspace *, vm_paddr_t gpa);
+
+#endif
diff -u -r -N usr/src/sys/dev/netmap/if_em_netmap.h /usr/src/sys/dev/netmap/if_em_netmap.h
--- usr/src/sys/dev/netmap/if_em_netmap.h	2016-09-29 00:24:47.000000000 +0100
+++ /usr/src/sys/dev/netmap/if_em_netmap.h	2016-11-23 16:57:57.841822000 +0000
@@ -24,7 +24,7 @@
  */
 
 /*
- * $FreeBSD: releng/11.0/sys/dev/netmap/if_em_netmap.h 293331 2016-01-07 16:42:48Z sbruno $
+ * $FreeBSD: head/sys/dev/netmap/if_em_netmap.h 238985 2012-08-02 11:59:43Z luigi $
  *
  * netmap support for: em.
* diff -u -r -N usr/src/sys/dev/netmap/if_igb_netmap.h /usr/src/sys/dev/netmap/if_igb_netmap.h --- usr/src/sys/dev/netmap/if_igb_netmap.h 2016-09-29 00:24:47.000000000 +0100 +++ /usr/src/sys/dev/netmap/if_igb_netmap.h 2016-11-23 16:57:57.842542000 +0000 @@ -24,7 +24,7 @@ */ /* - * $FreeBSD: releng/11.0/sys/dev/netmap/if_igb_netmap.h 285349 2015-07-10 05:51:36Z luigi $ + * $FreeBSD: head/sys/dev/netmap/if_igb_netmap.h 256200 2013-10-09 17:32:52Z jfv $ * * Netmap support for igb, partly contributed by Ahmed Kooli * For details on netmap support please see ixgbe_netmap.h diff -u -r -N usr/src/sys/dev/netmap/if_ixl_netmap.h /usr/src/sys/dev/netmap/if_ixl_netmap.h --- usr/src/sys/dev/netmap/if_ixl_netmap.h 2016-09-29 00:24:47.000000000 +0100 +++ /usr/src/sys/dev/netmap/if_ixl_netmap.h 2016-11-23 16:57:57.842979000 +0000 @@ -24,7 +24,7 @@ */ /* - * $FreeBSD: releng/11.0/sys/dev/netmap/if_ixl_netmap.h 285349 2015-07-10 05:51:36Z luigi $ + * $FreeBSD: head/sys/dev/netmap/if_ixl_netmap.h 279232 2015-02-24 06:20:50Z luigi $ * * netmap support for: ixl * @@ -59,7 +59,7 @@ /* * device-specific sysctl variables: * - * ixl_crcstrip: 0: keep CRC in rx frames (default), 1: strip it. + * ixl_crcstrip: 0: NIC keeps CRC in rx frames, 1: NIC strips it (default). * During regular operations the CRC is stripped, but on some * hardware reception of frames not multiple of 64 is slower, * so using crcstrip=0 helps in benchmarks. @@ -71,10 +71,9 @@ /* * The xl driver by default strips CRCs and we do not override it. */ -int ixl_rx_miss, ixl_rx_miss_bufs, ixl_crcstrip = 1; #if 0 SYSCTL_INT(_dev_netmap, OID_AUTO, ixl_crcstrip, - CTLFLAG_RW, &ixl_crcstrip, 1, "strip CRC on rx frames"); + CTLFLAG_RW, &ixl_crcstrip, 1, "NIC strips CRC on rx frames"); #endif SYSCTL_INT(_dev_netmap, OID_AUTO, ixl_rx_miss, CTLFLAG_RW, &ixl_rx_miss, 0, "potentially missed rx intr"); diff -u -r -N usr/src/sys/dev/netmap/if_lem_netmap.h /usr/src/sys/dev/netmap/if_lem_netmap.h --- usr/src/sys/dev/netmap/if_lem_netmap.h 2016-09-29 00:24:47.000000000 +0100 +++ /usr/src/sys/dev/netmap/if_lem_netmap.h 2016-11-23 16:57:57.843327000 +0000 @@ -25,7 +25,7 @@ /* - * $FreeBSD: releng/11.0/sys/dev/netmap/if_lem_netmap.h 285349 2015-07-10 05:51:36Z luigi $ + * $FreeBSD: head/sys/dev/netmap/if_lem_netmap.h 271849 2014-09-19 03:51:26Z glebius $ * * netmap support for: lem * @@ -35,12 +35,8 @@ #include <net/netmap.h> #include <sys/selinfo.h> -#include <vm/vm.h> -#include <vm/pmap.h> /* vtophys ? */ #include <dev/netmap/netmap_kern.h> -extern int netmap_adaptive_io; - /* * Register/unregister. We are already under netmap lock. */ @@ -81,6 +77,22 @@ } +static void +lem_netmap_intr(struct netmap_adapter *na, int onoff) +{ + struct ifnet *ifp = na->ifp; + struct adapter *adapter = ifp->if_softc; + + EM_CORE_LOCK(adapter); + if (onoff) { + lem_enable_intr(adapter); + } else { + lem_disable_intr(adapter); + } + EM_CORE_UNLOCK(adapter); +} + + /* * Reconcile kernel and user view of the transmit ring. 
*/ @@ -99,10 +111,6 @@ /* device-specific */ struct adapter *adapter = ifp->if_softc; -#ifdef NIC_PARAVIRT - struct paravirt_csb *csb = adapter->csb; - uint64_t *csbd = (uint64_t *)(csb + 1); -#endif /* NIC_PARAVIRT */ bus_dmamap_sync(adapter->txdma.dma_tag, adapter->txdma.dma_map, BUS_DMASYNC_POSTREAD); @@ -113,19 +121,6 @@ nm_i = kring->nr_hwcur; if (nm_i != head) { /* we have new packets to send */ -#ifdef NIC_PARAVIRT - int do_kick = 0; - uint64_t t = 0; // timestamp - int n = head - nm_i; - if (n < 0) - n += lim + 1; - if (csb) { - t = rdtsc(); /* last timestamp */ - csbd[16] += t - csbd[0]; /* total Wg */ - csbd[17] += n; /* Wg count */ - csbd[0] = t; - } -#endif /* NIC_PARAVIRT */ nic_i = netmap_idx_k2n(kring, nm_i); while (nm_i != head) { struct netmap_slot *slot = &ring->slot[nm_i]; @@ -166,38 +161,8 @@ bus_dmamap_sync(adapter->txdma.dma_tag, adapter->txdma.dma_map, BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE); -#ifdef NIC_PARAVIRT - /* set unconditionally, then also kick if needed */ - if (csb) { - t = rdtsc(); - if (csb->host_need_txkick == 2) { - /* can compute an update of delta */ - int64_t delta = t - csbd[3]; - if (delta < 0) - delta = -delta; - if (csbd[8] == 0 || delta < csbd[8]) { - csbd[8] = delta; - csbd[9]++; - } - csbd[10]++; - } - csb->guest_tdt = nic_i; - csbd[18] += t - csbd[0]; // total wp - csbd[19] += n; - } - if (!csb || !csb->guest_csb_on || (csb->host_need_txkick & 1)) - do_kick = 1; - if (do_kick) -#endif /* NIC_PARAVIRT */ /* (re)start the tx unit up to slot nic_i (excluded) */ E1000_WRITE_REG(&adapter->hw, E1000_TDT(0), nic_i); -#ifdef NIC_PARAVIRT - if (do_kick) { - uint64_t t1 = rdtsc(); - csbd[20] += t1 - t; // total Np - csbd[21]++; - } -#endif /* NIC_PARAVIRT */ } /* @@ -206,93 +171,6 @@ if (ticks != kring->last_reclaim || flags & NAF_FORCE_RECLAIM || nm_kr_txempty(kring)) { kring->last_reclaim = ticks; /* record completed transmissions using TDH */ -#ifdef NIC_PARAVIRT - /* host updates tdh unconditionally, and we have - * no side effects on reads, so we can read from there - * instead of exiting. - */ - if (csb) { - static int drain = 0, nodrain=0, good = 0, bad = 0, fail = 0; - u_int x = adapter->next_tx_to_clean; - csbd[19]++; // XXX count reclaims - nic_i = csb->host_tdh; - if (csb->guest_csb_on) { - if (nic_i == x) { - bad++; - csbd[24]++; // failed reclaims - /* no progress, request kick and retry */ - csb->guest_need_txkick = 1; - mb(); // XXX barrier - nic_i = csb->host_tdh; - } else { - good++; - } - if (nic_i != x) { - csb->guest_need_txkick = 2; - if (nic_i == csb->guest_tdt) - drain++; - else - nodrain++; -#if 1 - if (netmap_adaptive_io) { - /* new mechanism: last half ring (or so) - * released one slot at a time. - * This effectively makes the system spin. - * - * Take next_to_clean + 1 as a reference. - * tdh must be ahead or equal - * On entry, the logical order is - * x < tdh = nic_i - * We first push tdh up to avoid wraps. - * The limit is tdh-ll (half ring). 
- * if tdh-256 < x we report x; - * else we report tdh-256 - */ - u_int tdh = nic_i; - u_int ll = csbd[15]; - u_int delta = lim/8; - if (netmap_adaptive_io == 2 || ll > delta) - csbd[15] = ll = delta; - else if (netmap_adaptive_io == 1 && ll > 1) { - csbd[15]--; - } - - if (nic_i >= kring->nkr_num_slots) { - RD(5, "bad nic_i %d on input", nic_i); - } - x = nm_next(x, lim); - if (tdh < x) - tdh += lim + 1; - if (tdh <= x + ll) { - nic_i = x; - csbd[25]++; //report n + 1; - } else { - tdh = nic_i; - if (tdh < ll) - tdh += lim + 1; - nic_i = tdh - ll; - csbd[26]++; // report tdh - ll - } - } -#endif - } else { - /* we stop, count whether we are idle or not */ - int bh_active = csb->host_need_txkick & 2 ? 4 : 0; - csbd[27+ csb->host_need_txkick]++; - if (netmap_adaptive_io == 1) { - if (bh_active && csbd[15] > 1) - csbd[15]--; - else if (!bh_active && csbd[15] < lim/2) - csbd[15]++; - } - bad--; - fail++; - } - } - RD(1, "drain %d nodrain %d good %d retry %d fail %d", - drain, nodrain, good, bad, fail); - } else -#endif /* !NIC_PARAVIRT */ nic_i = E1000_READ_REG(&adapter->hw, E1000_TDH(0)); if (nic_i >= kring->nkr_num_slots) { /* XXX can it happen ? */ D("TDH wrap %d", nic_i); @@ -324,21 +202,10 @@ /* device-specific */ struct adapter *adapter = ifp->if_softc; -#ifdef NIC_PARAVIRT - struct paravirt_csb *csb = adapter->csb; - uint32_t csb_mode = csb && csb->guest_csb_on; - uint32_t do_host_rxkick = 0; -#endif /* NIC_PARAVIRT */ if (head > lim) return netmap_ring_reinit(kring); -#ifdef NIC_PARAVIRT - if (csb_mode) { - force_update = 1; - csb->guest_need_rxkick = 0; - } -#endif /* NIC_PARAVIRT */ /* XXX check sync modes */ bus_dmamap_sync(adapter->rxdma.dma_tag, adapter->rxdma.dma_map, BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE); @@ -357,23 +224,6 @@ uint32_t staterr = le32toh(curr->status); int len; -#ifdef NIC_PARAVIRT - if (csb_mode) { - if ((staterr & E1000_RXD_STAT_DD) == 0) { - /* don't bother to retry if more than 1 pkt */ - if (n > 1) - break; - csb->guest_need_rxkick = 1; - wmb(); - staterr = le32toh(curr->status); - if ((staterr & E1000_RXD_STAT_DD) == 0) { - break; - } else { /* we are good */ - csb->guest_need_rxkick = 0; - } - } - } else -#endif /* NIC_PARAVIRT */ if ((staterr & E1000_RXD_STAT_DD) == 0) break; len = le16toh(curr->length) - 4; // CRC @@ -390,18 +240,6 @@ nic_i = nm_next(nic_i, lim); } if (n) { /* update the state variables */ -#ifdef NIC_PARAVIRT - if (csb_mode) { - if (n > 1) { - /* leave one spare buffer so we avoid rxkicks */ - nm_i = nm_prev(nm_i, lim); - nic_i = nm_prev(nic_i, lim); - n--; - } else { - csb->guest_need_rxkick = 1; - } - } -#endif /* NIC_PARAVIRT */ ND("%d new packets at nic %d nm %d tail %d", n, adapter->next_rx_desc_to_check, @@ -440,10 +278,6 @@ curr->status = 0; bus_dmamap_sync(adapter->rxtag, rxbuf->map, BUS_DMASYNC_PREREAD); -#ifdef NIC_PARAVIRT - if (csb_mode && csb->host_rxkick_at == nic_i) - do_host_rxkick = 1; -#endif /* NIC_PARAVIRT */ nm_i = nm_next(nm_i, lim); nic_i = nm_next(nic_i, lim); } @@ -455,12 +289,6 @@ * so move nic_i back by one unit */ nic_i = nm_prev(nic_i, lim); -#ifdef NIC_PARAVIRT - /* set unconditionally, then also kick if needed */ - if (csb) - csb->guest_rdt = nic_i; - if (!csb_mode || do_host_rxkick) -#endif /* NIC_PARAVIRT */ E1000_WRITE_REG(&adapter->hw, E1000_RDT(0), nic_i); } @@ -486,6 +314,7 @@ na.nm_rxsync = lem_netmap_rxsync; na.nm_register = lem_netmap_reg; na.num_tx_rings = na.num_rx_rings = 1; + na.nm_intr = lem_netmap_intr; netmap_attach(&na); } diff -u -r -N usr/src/sys/dev/netmap/if_ptnet.c 
/usr/src/sys/dev/netmap/if_ptnet.c --- usr/src/sys/dev/netmap/if_ptnet.c 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/dev/netmap/if_ptnet.c 2016-11-23 16:57:57.844628000 +0000 @@ -0,0 +1,2276 @@ +/*- + * Copyright (c) 2016, Vincenzo Maffione + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice unmodified, this list of conditions, and the following + * disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + * + * $FreeBSD$ + */ + +/* Driver for ptnet paravirtualized network device. */ + +#include <sys/cdefs.h> + +#include <sys/types.h> +#include <sys/param.h> +#include <sys/systm.h> +#include <sys/kernel.h> +#include <sys/sockio.h> +#include <sys/mbuf.h> +#include <sys/malloc.h> +#include <sys/module.h> +#include <sys/socket.h> +#include <sys/sysctl.h> +#include <sys/lock.h> +#include <sys/mutex.h> +#include <sys/taskqueue.h> +#include <sys/smp.h> +#include <sys/time.h> +#include <machine/smp.h> + +#include <vm/uma.h> +#include <vm/vm.h> +#include <vm/pmap.h> + +#include <net/ethernet.h> +#include <net/if.h> +#include <net/if_var.h> +#include <net/if_arp.h> +#include <net/if_dl.h> +#include <net/if_types.h> +#include <net/if_media.h> +#include <net/if_vlan_var.h> +#include <net/bpf.h> + +#include <netinet/in_systm.h> +#include <netinet/in.h> +#include <netinet/ip.h> +#include <netinet/ip6.h> +#include <netinet6/ip6_var.h> +#include <netinet/udp.h> +#include <netinet/tcp.h> +#include <netinet/sctp.h> + +#include <machine/bus.h> +#include <machine/resource.h> +#include <sys/bus.h> +#include <sys/rman.h> + +#include <dev/pci/pcivar.h> +#include <dev/pci/pcireg.h> + +#include "opt_inet.h" +#include "opt_inet6.h" + +#include <sys/selinfo.h> +#include <net/netmap.h> +#include <dev/netmap/netmap_kern.h> +#include <net/netmap_virt.h> +#include <dev/netmap/netmap_mem2.h> +#include <dev/virtio/network/virtio_net.h> + +#ifndef PTNET_CSB_ALLOC +#error "No support for on-device CSB" +#endif + +#ifndef INET +#error "INET not defined, cannot support offloadings" +#endif + +#if __FreeBSD_version >= 1100000 +static uint64_t ptnet_get_counter(if_t, ift_counter); +#else +typedef struct ifnet *if_t; +#define if_getsoftc(_ifp) (_ifp)->if_softc +#endif + +//#define PTNETMAP_STATS +//#define DEBUG +#ifdef DEBUG +#define DBG(x) x +#else /* !DEBUG */ +#define DBG(x) +#endif /* !DEBUG */ + +extern int ptnet_vnet_hdr; /* Tunable parameter */ + +struct ptnet_softc; + +struct ptnet_queue_stats { + 
uint64_t packets; /* if_[io]packets */ + uint64_t bytes; /* if_[io]bytes */ + uint64_t errors; /* if_[io]errors */ + uint64_t iqdrops; /* if_iqdrops */ + uint64_t mcasts; /* if_[io]mcasts */ +#ifdef PTNETMAP_STATS + uint64_t intrs; + uint64_t kicks; +#endif /* PTNETMAP_STATS */ +}; + +struct ptnet_queue { + struct ptnet_softc *sc; + struct resource *irq; + void *cookie; + int kring_id; + struct ptnet_ring *ptring; + unsigned int kick; + struct mtx lock; + struct buf_ring *bufring; /* for TX queues */ + struct ptnet_queue_stats stats; +#ifdef PTNETMAP_STATS + struct ptnet_queue_stats last_stats; +#endif /* PTNETMAP_STATS */ + struct taskqueue *taskq; + struct task task; + char lock_name[16]; +}; + +#define PTNET_Q_LOCK(_pq) mtx_lock(&(_pq)->lock) +#define PTNET_Q_TRYLOCK(_pq) mtx_trylock(&(_pq)->lock) +#define PTNET_Q_UNLOCK(_pq) mtx_unlock(&(_pq)->lock) + +struct ptnet_softc { + device_t dev; + if_t ifp; + struct ifmedia media; + struct mtx lock; + char lock_name[16]; + char hwaddr[ETHER_ADDR_LEN]; + + /* Mirror of PTFEAT register. */ + uint32_t ptfeatures; + unsigned int vnet_hdr_len; + + /* PCI BARs support. */ + struct resource *iomem; + struct resource *msix_mem; + + unsigned int num_rings; + unsigned int num_tx_rings; + struct ptnet_queue *queues; + struct ptnet_queue *rxqueues; + struct ptnet_csb *csb; + + unsigned int min_tx_space; + + struct netmap_pt_guest_adapter *ptna; + + struct callout tick; +#ifdef PTNETMAP_STATS + struct timeval last_ts; +#endif /* PTNETMAP_STATS */ +}; + +#define PTNET_CORE_LOCK(_sc) mtx_lock(&(_sc)->lock) +#define PTNET_CORE_UNLOCK(_sc) mtx_unlock(&(_sc)->lock) + +static int ptnet_probe(device_t); +static int ptnet_attach(device_t); +static int ptnet_detach(device_t); +static int ptnet_suspend(device_t); +static int ptnet_resume(device_t); +static int ptnet_shutdown(device_t); + +static void ptnet_init(void *opaque); +static int ptnet_ioctl(if_t ifp, u_long cmd, caddr_t data); +static int ptnet_init_locked(struct ptnet_softc *sc); +static int ptnet_stop(struct ptnet_softc *sc); +static int ptnet_transmit(if_t ifp, struct mbuf *m); +static int ptnet_drain_transmit_queue(struct ptnet_queue *pq, + unsigned int budget, + bool may_resched); +static void ptnet_qflush(if_t ifp); +static void ptnet_tx_task(void *context, int pending); + +static int ptnet_media_change(if_t ifp); +static void ptnet_media_status(if_t ifp, struct ifmediareq *ifmr); +#ifdef PTNETMAP_STATS +static void ptnet_tick(void *opaque); +#endif + +static int ptnet_irqs_init(struct ptnet_softc *sc); +static void ptnet_irqs_fini(struct ptnet_softc *sc); + +static uint32_t ptnet_nm_ptctl(if_t ifp, uint32_t cmd); +static int ptnet_nm_config(struct netmap_adapter *na, unsigned *txr, + unsigned *txd, unsigned *rxr, unsigned *rxd); +static void ptnet_update_vnet_hdr(struct ptnet_softc *sc); +static int ptnet_nm_register(struct netmap_adapter *na, int onoff); +static int ptnet_nm_txsync(struct netmap_kring *kring, int flags); +static int ptnet_nm_rxsync(struct netmap_kring *kring, int flags); + +static void ptnet_tx_intr(void *opaque); +static void ptnet_rx_intr(void *opaque); + +static unsigned ptnet_rx_discard(struct netmap_kring *kring, + unsigned int head); +static int ptnet_rx_eof(struct ptnet_queue *pq, unsigned int budget, + bool may_resched); +static void ptnet_rx_task(void *context, int pending); + +#ifdef DEVICE_POLLING +static poll_handler_t ptnet_poll; +#endif + +static device_method_t ptnet_methods[] = { + DEVMETHOD(device_probe, ptnet_probe), + DEVMETHOD(device_attach, ptnet_attach), + 
DEVMETHOD(device_detach, ptnet_detach), + DEVMETHOD(device_suspend, ptnet_suspend), + DEVMETHOD(device_resume, ptnet_resume), + DEVMETHOD(device_shutdown, ptnet_shutdown), + DEVMETHOD_END +}; + +static driver_t ptnet_driver = { + "ptnet", + ptnet_methods, + sizeof(struct ptnet_softc) +}; + +/* We use (SI_ORDER_MIDDLE+2) here, see DEV_MODULE_ORDERED() invocation. */ +static devclass_t ptnet_devclass; +DRIVER_MODULE_ORDERED(ptnet, pci, ptnet_driver, ptnet_devclass, + NULL, NULL, SI_ORDER_MIDDLE + 2); + +static int +ptnet_probe(device_t dev) +{ + if (pci_get_vendor(dev) != PTNETMAP_PCI_VENDOR_ID || + pci_get_device(dev) != PTNETMAP_PCI_NETIF_ID) { + return (ENXIO); + } + + device_set_desc(dev, "ptnet network adapter"); + + return (BUS_PROBE_DEFAULT); +} + +static inline void ptnet_kick(struct ptnet_queue *pq) +{ +#ifdef PTNETMAP_STATS + pq->stats.kicks ++; +#endif /* PTNETMAP_STATS */ + bus_write_4(pq->sc->iomem, pq->kick, 0); +} + +#define PTNET_BUF_RING_SIZE 4096 +#define PTNET_RX_BUDGET 512 +#define PTNET_RX_BATCH 1 +#define PTNET_TX_BUDGET 512 +#define PTNET_TX_BATCH 64 +#define PTNET_HDR_SIZE sizeof(struct virtio_net_hdr_mrg_rxbuf) +#define PTNET_MAX_PKT_SIZE 65536 + +#define PTNET_CSUM_OFFLOAD (CSUM_TCP | CSUM_UDP | CSUM_SCTP) +#define PTNET_CSUM_OFFLOAD_IPV6 (CSUM_TCP_IPV6 | CSUM_UDP_IPV6 |\ + CSUM_SCTP_IPV6) +#define PTNET_ALL_OFFLOAD (CSUM_TSO | PTNET_CSUM_OFFLOAD |\ + PTNET_CSUM_OFFLOAD_IPV6) + +static int +ptnet_attach(device_t dev) +{ + uint32_t ptfeatures = 0; + unsigned int num_rx_rings, num_tx_rings; + struct netmap_adapter na_arg; + unsigned int nifp_offset; + struct ptnet_softc *sc; + if_t ifp; + uint32_t macreg; + int err, rid; + int i; + + sc = device_get_softc(dev); + sc->dev = dev; + + /* Setup PCI resources. */ + pci_enable_busmaster(dev); + + rid = PCIR_BAR(PTNETMAP_IO_PCI_BAR); + sc->iomem = bus_alloc_resource_any(dev, SYS_RES_IOPORT, &rid, + RF_ACTIVE); + if (sc->iomem == NULL) { + device_printf(dev, "Failed to map I/O BAR\n"); + return (ENXIO); + } + + /* Negotiate features with the hypervisor. */ + if (ptnet_vnet_hdr) { + ptfeatures |= PTNETMAP_F_VNET_HDR; + } + bus_write_4(sc->iomem, PTNET_IO_PTFEAT, ptfeatures); /* wanted */ + ptfeatures = bus_read_4(sc->iomem, PTNET_IO_PTFEAT); /* acked */ + sc->ptfeatures = ptfeatures; + + /* Allocate CSB and carry out CSB allocation protocol (CSBBAH first, + * then CSBBAL). */ + sc->csb = malloc(sizeof(struct ptnet_csb), M_DEVBUF, + M_NOWAIT | M_ZERO); + if (sc->csb == NULL) { + device_printf(dev, "Failed to allocate CSB\n"); + err = ENOMEM; + goto err_path; + } + + { + /* + * We use uint64_t rather than vm_paddr_t since we + * need 64 bit addresses even on 32 bit platforms. + */ + uint64_t paddr = vtophys(sc->csb); + + bus_write_4(sc->iomem, PTNET_IO_CSBBAH, + (paddr >> 32) & 0xffffffff); + bus_write_4(sc->iomem, PTNET_IO_CSBBAL, paddr & 0xffffffff); + } + + num_tx_rings = bus_read_4(sc->iomem, PTNET_IO_NUM_TX_RINGS); + num_rx_rings = bus_read_4(sc->iomem, PTNET_IO_NUM_RX_RINGS); + sc->num_rings = num_tx_rings + num_rx_rings; + sc->num_tx_rings = num_tx_rings; + + /* Allocate and initialize per-queue data structures. 
*/ + sc->queues = malloc(sizeof(struct ptnet_queue) * sc->num_rings, + M_DEVBUF, M_NOWAIT | M_ZERO); + if (sc->queues == NULL) { + err = ENOMEM; + goto err_path; + } + sc->rxqueues = sc->queues + num_tx_rings; + + for (i = 0; i < sc->num_rings; i++) { + struct ptnet_queue *pq = sc->queues + i; + + pq->sc = sc; + pq->kring_id = i; + pq->kick = PTNET_IO_KICK_BASE + 4 * i; + pq->ptring = sc->csb->rings + i; + snprintf(pq->lock_name, sizeof(pq->lock_name), "%s-%d", + device_get_nameunit(dev), i); + mtx_init(&pq->lock, pq->lock_name, NULL, MTX_DEF); + if (i >= num_tx_rings) { + /* RX queue: fix kring_id. */ + pq->kring_id -= num_tx_rings; + } else { + /* TX queue: allocate buf_ring. */ + pq->bufring = buf_ring_alloc(PTNET_BUF_RING_SIZE, + M_DEVBUF, M_NOWAIT, &pq->lock); + if (pq->bufring == NULL) { + err = ENOMEM; + goto err_path; + } + } + } + + sc->min_tx_space = 64; /* Safe initial value. */ + + err = ptnet_irqs_init(sc); + if (err) { + goto err_path; + } + + /* Setup Ethernet interface. */ + sc->ifp = ifp = if_alloc(IFT_ETHER); + if (ifp == NULL) { + device_printf(dev, "Failed to allocate ifnet\n"); + err = ENOMEM; + goto err_path; + } + + if_initname(ifp, device_get_name(dev), device_get_unit(dev)); + ifp->if_baudrate = IF_Gbps(10); + ifp->if_softc = sc; + ifp->if_flags = IFF_BROADCAST | IFF_MULTICAST | IFF_SIMPLEX; + ifp->if_init = ptnet_init; + ifp->if_ioctl = ptnet_ioctl; +#if __FreeBSD_version >= 1100000 + ifp->if_get_counter = ptnet_get_counter; +#endif + ifp->if_transmit = ptnet_transmit; + ifp->if_qflush = ptnet_qflush; + + ifmedia_init(&sc->media, IFM_IMASK, ptnet_media_change, + ptnet_media_status); + ifmedia_add(&sc->media, IFM_ETHER | IFM_10G_T | IFM_FDX, 0, NULL); + ifmedia_set(&sc->media, IFM_ETHER | IFM_10G_T | IFM_FDX); + + macreg = bus_read_4(sc->iomem, PTNET_IO_MAC_HI); + sc->hwaddr[0] = (macreg >> 8) & 0xff; + sc->hwaddr[1] = macreg & 0xff; + macreg = bus_read_4(sc->iomem, PTNET_IO_MAC_LO); + sc->hwaddr[2] = (macreg >> 24) & 0xff; + sc->hwaddr[3] = (macreg >> 16) & 0xff; + sc->hwaddr[4] = (macreg >> 8) & 0xff; + sc->hwaddr[5] = macreg & 0xff; + + ether_ifattach(ifp, sc->hwaddr); + + ifp->if_hdrlen = sizeof(struct ether_vlan_header); + ifp->if_capabilities |= IFCAP_JUMBO_MTU | IFCAP_VLAN_MTU; + + if (sc->ptfeatures & PTNETMAP_F_VNET_HDR) { + /* Similarly to what the vtnet driver does, we can emulate + * VLAN offloadings by inserting and removing the 802.1Q + * header during transmit and receive. We are then able + * to do checksum offloading of VLAN frames. */ + ifp->if_capabilities |= IFCAP_HWCSUM | IFCAP_HWCSUM_IPV6 + | IFCAP_VLAN_HWCSUM + | IFCAP_TSO | IFCAP_LRO + | IFCAP_VLAN_HWTSO + | IFCAP_VLAN_HWTAGGING; + } + + ifp->if_capenable = ifp->if_capabilities; +#ifdef DEVICE_POLLING + /* Don't enable polling by default. */ + ifp->if_capabilities |= IFCAP_POLLING; +#endif + snprintf(sc->lock_name, sizeof(sc->lock_name), + "%s", device_get_nameunit(dev)); + mtx_init(&sc->lock, sc->lock_name, "ptnet core lock", MTX_DEF); + callout_init_mtx(&sc->tick, &sc->lock, 0); + + /* Prepare a netmap_adapter struct instance to do netmap_attach(). 
*/ + nifp_offset = bus_read_4(sc->iomem, PTNET_IO_NIFP_OFS); + memset(&na_arg, 0, sizeof(na_arg)); + na_arg.ifp = ifp; + na_arg.num_tx_desc = bus_read_4(sc->iomem, PTNET_IO_NUM_TX_SLOTS); + na_arg.num_rx_desc = bus_read_4(sc->iomem, PTNET_IO_NUM_RX_SLOTS); + na_arg.num_tx_rings = num_tx_rings; + na_arg.num_rx_rings = num_rx_rings; + na_arg.nm_config = ptnet_nm_config; + na_arg.nm_krings_create = ptnet_nm_krings_create; + na_arg.nm_krings_delete = ptnet_nm_krings_delete; + na_arg.nm_dtor = ptnet_nm_dtor; + na_arg.nm_register = ptnet_nm_register; + na_arg.nm_txsync = ptnet_nm_txsync; + na_arg.nm_rxsync = ptnet_nm_rxsync; + + netmap_pt_guest_attach(&na_arg, sc->csb, nifp_offset, + bus_read_4(sc->iomem, PTNET_IO_HOSTMEMID)); + + /* Now a netmap adapter for this ifp has been allocated, and it + * can be accessed through NA(ifp). We also have to initialize the CSB + * pointer. */ + sc->ptna = (struct netmap_pt_guest_adapter *)NA(ifp); + + /* If virtio-net header was negotiated, set the virt_hdr_len field in + * the netmap adapter, to inform users that this netmap adapter requires + * the application to deal with the headers. */ + ptnet_update_vnet_hdr(sc); + + device_printf(dev, "%s() completed\n", __func__); + + return (0); + +err_path: + ptnet_detach(dev); + return err; +} + +static int +ptnet_detach(device_t dev) +{ + struct ptnet_softc *sc = device_get_softc(dev); + int i; + +#ifdef DEVICE_POLLING + if (sc->ifp->if_capenable & IFCAP_POLLING) { + ether_poll_deregister(sc->ifp); + } +#endif + callout_drain(&sc->tick); + + if (sc->queues) { + /* Drain taskqueues before calling if_detach. */ + for (i = 0; i < sc->num_rings; i++) { + struct ptnet_queue *pq = sc->queues + i; + + if (pq->taskq) { + taskqueue_drain(pq->taskq, &pq->task); + } + } + } + + if (sc->ifp) { + ether_ifdetach(sc->ifp); + + /* Uninitialize netmap adapters for this device. */ + netmap_detach(sc->ifp); + + ifmedia_removeall(&sc->media); + if_free(sc->ifp); + sc->ifp = NULL; + } + + ptnet_irqs_fini(sc); + + if (sc->csb) { + bus_write_4(sc->iomem, PTNET_IO_CSBBAH, 0); + bus_write_4(sc->iomem, PTNET_IO_CSBBAL, 0); + free(sc->csb, M_DEVBUF); + sc->csb = NULL; + } + + if (sc->queues) { + for (i = 0; i < sc->num_rings; i++) { + struct ptnet_queue *pq = sc->queues + i; + + if (mtx_initialized(&pq->lock)) { + mtx_destroy(&pq->lock); + } + if (pq->bufring != NULL) { + buf_ring_free(pq->bufring, M_DEVBUF); + } + } + free(sc->queues, M_DEVBUF); + sc->queues = NULL; + } + + if (sc->iomem) { + bus_release_resource(dev, SYS_RES_IOPORT, + PCIR_BAR(PTNETMAP_IO_PCI_BAR), sc->iomem); + sc->iomem = NULL; + } + + mtx_destroy(&sc->lock); + + device_printf(dev, "%s() completed\n", __func__); + + return (0); +} + +static int +ptnet_suspend(device_t dev) +{ + struct ptnet_softc *sc; + + sc = device_get_softc(dev); + (void)sc; + + return (0); +} + +static int +ptnet_resume(device_t dev) +{ + struct ptnet_softc *sc; + + sc = device_get_softc(dev); + (void)sc; + + return (0); +} + +static int +ptnet_shutdown(device_t dev) +{ + /* + * Suspend already does all of what we need to + * do here; we just never expect to be resumed. 
+ */ + return (ptnet_suspend(dev)); +} + +static int +ptnet_irqs_init(struct ptnet_softc *sc) +{ + int rid = PCIR_BAR(PTNETMAP_MSIX_PCI_BAR); + int nvecs = sc->num_rings; + device_t dev = sc->dev; + int err = ENOSPC; + int cpu_cur; + int i; + + if (pci_find_cap(dev, PCIY_MSIX, NULL) != 0) { + device_printf(dev, "Could not find MSI-X capability\n"); + return (ENXIO); + } + + sc->msix_mem = bus_alloc_resource_any(dev, SYS_RES_MEMORY, + &rid, RF_ACTIVE); + if (sc->msix_mem == NULL) { + device_printf(dev, "Failed to allocate MSIX PCI BAR\n"); + return (ENXIO); + } + + if (pci_msix_count(dev) < nvecs) { + device_printf(dev, "Not enough MSI-X vectors\n"); + goto err_path; + } + + err = pci_alloc_msix(dev, &nvecs); + if (err) { + device_printf(dev, "Failed to allocate MSI-X vectors\n"); + goto err_path; + } + + for (i = 0; i < nvecs; i++) { + struct ptnet_queue *pq = sc->queues + i; + + rid = i + 1; + pq->irq = bus_alloc_resource_any(dev, SYS_RES_IRQ, &rid, + RF_ACTIVE); + if (pq->irq == NULL) { + device_printf(dev, "Failed to allocate interrupt " + "for queue #%d\n", i); + err = ENOSPC; + goto err_path; + } + } + + cpu_cur = CPU_FIRST(); + for (i = 0; i < nvecs; i++) { + struct ptnet_queue *pq = sc->queues + i; + void (*handler)(void *) = ptnet_tx_intr; + + if (i >= sc->num_tx_rings) { + handler = ptnet_rx_intr; + } + err = bus_setup_intr(dev, pq->irq, INTR_TYPE_NET | INTR_MPSAFE, + NULL /* intr_filter */, handler, + pq, &pq->cookie); + if (err) { + device_printf(dev, "Failed to register intr handler " + "for queue #%d\n", i); + goto err_path; + } + + bus_describe_intr(dev, pq->irq, pq->cookie, "q%d", i); +#if 0 + bus_bind_intr(sc->dev, pq->irq, cpu_cur); +#endif + cpu_cur = CPU_NEXT(cpu_cur); + } + + device_printf(dev, "Allocated %d MSI-X vectors\n", nvecs); + + cpu_cur = CPU_FIRST(); + for (i = 0; i < nvecs; i++) { + struct ptnet_queue *pq = sc->queues + i; + static void (*handler)(void *context, int pending); + + handler = (i < sc->num_tx_rings) ? ptnet_tx_task : ptnet_rx_task; + + TASK_INIT(&pq->task, 0, handler, pq); + pq->taskq = taskqueue_create_fast("ptnet_queue", M_NOWAIT, + taskqueue_thread_enqueue, &pq->taskq); + taskqueue_start_threads(&pq->taskq, 1, PI_NET, "%s-pq-%d", + device_get_nameunit(sc->dev), cpu_cur); + cpu_cur = CPU_NEXT(cpu_cur); + } + + return 0; +err_path: + ptnet_irqs_fini(sc); + return err; +} + +static void +ptnet_irqs_fini(struct ptnet_softc *sc) +{ + device_t dev = sc->dev; + int i; + + for (i = 0; i < sc->num_rings; i++) { + struct ptnet_queue *pq = sc->queues + i; + + if (pq->taskq) { + taskqueue_free(pq->taskq); + pq->taskq = NULL; + } + + if (pq->cookie) { + bus_teardown_intr(dev, pq->irq, pq->cookie); + pq->cookie = NULL; + } + + if (pq->irq) { + bus_release_resource(dev, SYS_RES_IRQ, i + 1, pq->irq); + pq->irq = NULL; + } + } + + if (sc->msix_mem) { + pci_release_msi(dev); + + bus_release_resource(dev, SYS_RES_MEMORY, + PCIR_BAR(PTNETMAP_MSIX_PCI_BAR), + sc->msix_mem); + sc->msix_mem = NULL; + } +} + +static void +ptnet_init(void *opaque) +{ + struct ptnet_softc *sc = opaque; + + PTNET_CORE_LOCK(sc); + ptnet_init_locked(sc); + PTNET_CORE_UNLOCK(sc); +} + +static int +ptnet_ioctl(if_t ifp, u_long cmd, caddr_t data) +{ + struct ptnet_softc *sc = if_getsoftc(ifp); + device_t dev = sc->dev; + struct ifreq *ifr = (struct ifreq *)data; + int mask, err = 0; + + switch (cmd) { + case SIOCSIFFLAGS: + device_printf(dev, "SIOCSIFFLAGS %x\n", ifp->if_flags); + PTNET_CORE_LOCK(sc); + if (ifp->if_flags & IFF_UP) { + /* Network stack wants the iff to be up. 
*/ + err = ptnet_init_locked(sc); + } else { + /* Network stack wants the iff to be down. */ + err = ptnet_stop(sc); + } + /* We don't need to do anything to support IFF_PROMISC, + * since that is managed by the backend port. */ + PTNET_CORE_UNLOCK(sc); + break; + + case SIOCSIFCAP: + device_printf(dev, "SIOCSIFCAP %x %x\n", + ifr->ifr_reqcap, ifp->if_capenable); + mask = ifr->ifr_reqcap ^ ifp->if_capenable; +#ifdef DEVICE_POLLING + if (mask & IFCAP_POLLING) { + struct ptnet_queue *pq; + int i; + + if (ifr->ifr_reqcap & IFCAP_POLLING) { + err = ether_poll_register(ptnet_poll, ifp); + if (err) { + break; + } + /* Stop queues and sync with taskqueues. */ + ifp->if_drv_flags &= ~IFF_DRV_RUNNING; + for (i = 0; i < sc->num_rings; i++) { + pq = sc->queues + i; + /* Make sure the worker sees the + * IFF_DRV_RUNNING down. */ + PTNET_Q_LOCK(pq); + pq->ptring->guest_need_kick = 0; + PTNET_Q_UNLOCK(pq); + /* Wait for rescheduling to finish. */ + if (pq->taskq) { + taskqueue_drain(pq->taskq, + &pq->task); + } + } + ifp->if_drv_flags |= IFF_DRV_RUNNING; + } else { + err = ether_poll_deregister(ifp); + for (i = 0; i < sc->num_rings; i++) { + pq = sc->queues + i; + PTNET_Q_LOCK(pq); + pq->ptring->guest_need_kick = 1; + PTNET_Q_UNLOCK(pq); + } + } + } +#endif /* DEVICE_POLLING */ + ifp->if_capenable = ifr->ifr_reqcap; + break; + + case SIOCSIFMTU: + /* We support any reasonable MTU. */ + if (ifr->ifr_mtu < ETHERMIN || + ifr->ifr_mtu > PTNET_MAX_PKT_SIZE) { + err = EINVAL; + } else { + PTNET_CORE_LOCK(sc); + ifp->if_mtu = ifr->ifr_mtu; + PTNET_CORE_UNLOCK(sc); + } + break; + + case SIOCSIFMEDIA: + case SIOCGIFMEDIA: + err = ifmedia_ioctl(ifp, ifr, &sc->media, cmd); + break; + + default: + err = ether_ioctl(ifp, cmd, data); + break; + } + + return err; +} + +static int +ptnet_init_locked(struct ptnet_softc *sc) +{ + if_t ifp = sc->ifp; + struct netmap_adapter *na_dr = &sc->ptna->dr.up; + struct netmap_adapter *na_nm = &sc->ptna->hwup.up; + unsigned int nm_buf_size; + int ret; + + if (ifp->if_drv_flags & IFF_DRV_RUNNING) { + return 0; /* nothing to do */ + } + + device_printf(sc->dev, "%s\n", __func__); + + /* Translate offload capabilities according to if_capenable. */ + ifp->if_hwassist = 0; + if (ifp->if_capenable & IFCAP_TXCSUM) + ifp->if_hwassist |= PTNET_CSUM_OFFLOAD; + if (ifp->if_capenable & IFCAP_TXCSUM_IPV6) + ifp->if_hwassist |= PTNET_CSUM_OFFLOAD_IPV6; + if (ifp->if_capenable & IFCAP_TSO4) + ifp->if_hwassist |= CSUM_IP_TSO; + if (ifp->if_capenable & IFCAP_TSO6) + ifp->if_hwassist |= CSUM_IP6_TSO; + + /* + * Prepare the interface for netmap mode access.
+ */ + netmap_update_config(na_dr); + + ret = netmap_mem_finalize(na_dr->nm_mem, na_dr); + if (ret) { + device_printf(sc->dev, "netmap_mem_finalize() failed\n"); + return ret; + } + + if (sc->ptna->backend_regifs == 0) { + ret = ptnet_nm_krings_create(na_nm); + if (ret) { + device_printf(sc->dev, "ptnet_nm_krings_create() " + "failed\n"); + goto err_mem_finalize; + } + + ret = netmap_mem_rings_create(na_dr); + if (ret) { + device_printf(sc->dev, "netmap_mem_rings_create() " + "failed\n"); + goto err_rings_create; + } + + ret = netmap_mem_get_lut(na_dr->nm_mem, &na_dr->na_lut); + if (ret) { + device_printf(sc->dev, "netmap_mem_get_lut() " + "failed\n"); + goto err_get_lut; + } + } + + ret = ptnet_nm_register(na_dr, 1 /* on */); + if (ret) { + goto err_register; + } + + nm_buf_size = NETMAP_BUF_SIZE(na_dr); + + KASSERT(nm_buf_size > 0, ("Invalid netmap buffer size")); + sc->min_tx_space = PTNET_MAX_PKT_SIZE / nm_buf_size + 2; + device_printf(sc->dev, "%s: min_tx_space = %u\n", __func__, + sc->min_tx_space); +#ifdef PTNETMAP_STATS + callout_reset(&sc->tick, hz, ptnet_tick, sc); +#endif + + ifp->if_drv_flags |= IFF_DRV_RUNNING; + + return 0; + +err_register: + memset(&na_dr->na_lut, 0, sizeof(na_dr->na_lut)); +err_get_lut: + netmap_mem_rings_delete(na_dr); +err_rings_create: + ptnet_nm_krings_delete(na_nm); +err_mem_finalize: + netmap_mem_deref(na_dr->nm_mem, na_dr); + + return ret; +} + +/* To be called under core lock. */ +static int +ptnet_stop(struct ptnet_softc *sc) +{ + if_t ifp = sc->ifp; + struct netmap_adapter *na_dr = &sc->ptna->dr.up; + struct netmap_adapter *na_nm = &sc->ptna->hwup.up; + int i; + + device_printf(sc->dev, "%s\n", __func__); + + if (!(ifp->if_drv_flags & IFF_DRV_RUNNING)) { + return 0; /* nothing to do */ + } + + /* Clear the driver-ready flag, and synchronize with all the queues, + * so that after this loop we are sure nobody is working anymore with + * the device. This scheme is taken from the vtnet driver. */ + ifp->if_drv_flags &= ~IFF_DRV_RUNNING; + callout_stop(&sc->tick); + for (i = 0; i < sc->num_rings; i++) { + PTNET_Q_LOCK(sc->queues + i); + PTNET_Q_UNLOCK(sc->queues + i); + } + + ptnet_nm_register(na_dr, 0 /* off */); + + if (sc->ptna->backend_regifs == 0) { + netmap_mem_rings_delete(na_dr); + ptnet_nm_krings_delete(na_nm); + } + netmap_mem_deref(na_dr->nm_mem, na_dr); + + return 0; +} + +static void +ptnet_qflush(if_t ifp) +{ + struct ptnet_softc *sc = if_getsoftc(ifp); + int i; + + /* Flush all the bufrings and do the interface flush. */ + for (i = 0; i < sc->num_rings; i++) { + struct ptnet_queue *pq = sc->queues + i; + struct mbuf *m; + + PTNET_Q_LOCK(pq); + if (pq->bufring) { + while ((m = buf_ring_dequeue_sc(pq->bufring))) { + m_freem(m); + } + } + PTNET_Q_UNLOCK(pq); + } + + if_qflush(ifp); +} + +static int +ptnet_media_change(if_t ifp) +{ + struct ptnet_softc *sc = if_getsoftc(ifp); + struct ifmedia *ifm = &sc->media; + + if (IFM_TYPE(ifm->ifm_media) != IFM_ETHER) { + return EINVAL; + } + + return 0; +} + +#if __FreeBSD_version >= 1100000 +static uint64_t +ptnet_get_counter(if_t ifp, ift_counter cnt) +{ + struct ptnet_softc *sc = if_getsoftc(ifp); + struct ptnet_queue_stats stats[2]; + int i; + + /* Accumulate statistics over the queues. */ + memset(stats, 0, sizeof(stats)); + for (i = 0; i < sc->num_rings; i++) { + struct ptnet_queue *pq = sc->queues + i; + int idx = (i < sc->num_tx_rings) ? 
0 : 1; + + stats[idx].packets += pq->stats.packets; + stats[idx].bytes += pq->stats.bytes; + stats[idx].errors += pq->stats.errors; + stats[idx].iqdrops += pq->stats.iqdrops; + stats[idx].mcasts += pq->stats.mcasts; + } + + switch (cnt) { + case IFCOUNTER_IPACKETS: + return (stats[1].packets); + case IFCOUNTER_IQDROPS: + return (stats[1].iqdrops); + case IFCOUNTER_IERRORS: + return (stats[1].errors); + case IFCOUNTER_OPACKETS: + return (stats[0].packets); + case IFCOUNTER_OBYTES: + return (stats[0].bytes); + case IFCOUNTER_OMCASTS: + return (stats[0].mcasts); + default: + return (if_get_counter_default(ifp, cnt)); + } +} +#endif + + +#ifdef PTNETMAP_STATS +/* Called under core lock. */ +static void +ptnet_tick(void *opaque) +{ + struct ptnet_softc *sc = opaque; + int i; + + for (i = 0; i < sc->num_rings; i++) { + struct ptnet_queue *pq = sc->queues + i; + struct ptnet_queue_stats cur = pq->stats; + struct timeval now; + unsigned int delta; + + microtime(&now); + delta = now.tv_usec - sc->last_ts.tv_usec + + (now.tv_sec - sc->last_ts.tv_sec) * 1000000; + delta /= 1000; /* in milliseconds */ + + if (delta == 0) + continue; + + device_printf(sc->dev, "#%d[%u ms]:pkts %lu, kicks %lu, " + "intr %lu\n", i, delta, + (cur.packets - pq->last_stats.packets), + (cur.kicks - pq->last_stats.kicks), + (cur.intrs - pq->last_stats.intrs)); + pq->last_stats = cur; + } + microtime(&sc->last_ts); + callout_schedule(&sc->tick, hz); +} +#endif /* PTNETMAP_STATS */ + +static void +ptnet_media_status(if_t ifp, struct ifmediareq *ifmr) +{ + /* We are always active, as the backend netmap port is + * always open in netmap mode. */ + ifmr->ifm_status = IFM_AVALID | IFM_ACTIVE; + ifmr->ifm_active = IFM_ETHER | IFM_10G_T | IFM_FDX; +} + +static uint32_t +ptnet_nm_ptctl(if_t ifp, uint32_t cmd) +{ + struct ptnet_softc *sc = if_getsoftc(ifp); + /* + * Write a command and read back error status, + * with zero meaning success. + */ + bus_write_4(sc->iomem, PTNET_IO_PTCTL, cmd); + return bus_read_4(sc->iomem, PTNET_IO_PTCTL); +} + +static int +ptnet_nm_config(struct netmap_adapter *na, unsigned *txr, unsigned *txd, + unsigned *rxr, unsigned *rxd) +{ + struct ptnet_softc *sc = if_getsoftc(na->ifp); + + *txr = bus_read_4(sc->iomem, PTNET_IO_NUM_TX_RINGS); + *rxr = bus_read_4(sc->iomem, PTNET_IO_NUM_RX_RINGS); + *txd = bus_read_4(sc->iomem, PTNET_IO_NUM_TX_SLOTS); + *rxd = bus_read_4(sc->iomem, PTNET_IO_NUM_RX_SLOTS); + + device_printf(sc->dev, "txr %u, rxr %u, txd %u, rxd %u\n", + *txr, *rxr, *txd, *rxd); + + return 0; +} + +static void +ptnet_sync_from_csb(struct ptnet_softc *sc, struct netmap_adapter *na) +{ + int i; + + /* Sync krings from the host, reading from + * CSB. 
*/ + for (i = 0; i < sc->num_rings; i++) { + struct ptnet_ring *ptring = sc->queues[i].ptring; + struct netmap_kring *kring; + + if (i < na->num_tx_rings) { + kring = na->tx_rings + i; + } else { + kring = na->rx_rings + i - na->num_tx_rings; + } + kring->rhead = kring->ring->head = ptring->head; + kring->rcur = kring->ring->cur = ptring->cur; + kring->nr_hwcur = ptring->hwcur; + kring->nr_hwtail = kring->rtail = + kring->ring->tail = ptring->hwtail; + + ND("%d,%d: csb {hc %u h %u c %u ht %u}", t, i, + ptring->hwcur, ptring->head, ptring->cur, + ptring->hwtail); + ND("%d,%d: kring {hc %u rh %u rc %u h %u c %u ht %u rt %u t %u}", + t, i, kring->nr_hwcur, kring->rhead, kring->rcur, + kring->ring->head, kring->ring->cur, kring->nr_hwtail, + kring->rtail, kring->ring->tail); + } +} + +static void +ptnet_update_vnet_hdr(struct ptnet_softc *sc) +{ + unsigned int wanted_hdr_len = ptnet_vnet_hdr ? PTNET_HDR_SIZE : 0; + + bus_write_4(sc->iomem, PTNET_IO_VNET_HDR_LEN, wanted_hdr_len); + sc->vnet_hdr_len = bus_read_4(sc->iomem, PTNET_IO_VNET_HDR_LEN); + sc->ptna->hwup.up.virt_hdr_len = sc->vnet_hdr_len; +} + +static int +ptnet_nm_register(struct netmap_adapter *na, int onoff) +{ + /* device-specific */ + if_t ifp = na->ifp; + struct ptnet_softc *sc = if_getsoftc(ifp); + int native = (na == &sc->ptna->hwup.up); + struct ptnet_queue *pq; + enum txrx t; + int ret = 0; + int i; + + if (!onoff) { + sc->ptna->backend_regifs--; + } + + /* If this is the last netmap client, guest interrupt enable flags may + * be in arbitrary state. Since these flags are going to be used also + * by the netdevice driver, we have to make sure to start with + * notifications enabled. Also, schedule NAPI to flush pending packets + * in the RX rings, since we will not receive further interrupts + * until these will be processed. */ + if (native && !onoff && na->active_fds == 0) { + D("Exit netmap mode, re-enable interrupts"); + for (i = 0; i < sc->num_rings; i++) { + pq = sc->queues + i; + pq->ptring->guest_need_kick = 1; + } + } + + if (onoff) { + if (sc->ptna->backend_regifs == 0) { + /* Initialize notification enable fields in the CSB. */ + for (i = 0; i < sc->num_rings; i++) { + pq = sc->queues + i; + pq->ptring->host_need_kick = 1; + pq->ptring->guest_need_kick = + (!(ifp->if_capenable & IFCAP_POLLING) + && i >= sc->num_tx_rings); + } + + /* Set the virtio-net header length. */ + ptnet_update_vnet_hdr(sc); + + /* Make sure the host adapter passed through is ready + * for txsync/rxsync. */ + ret = ptnet_nm_ptctl(ifp, PTNETMAP_PTCTL_CREATE); + if (ret) { + return ret; + } + } + + /* Sync from CSB must be done after REGIF PTCTL. Skip this + * step only if this is a netmap client and it is not the + * first one. 
*/ + if ((!native && sc->ptna->backend_regifs == 0) || + (native && na->active_fds == 0)) { + ptnet_sync_from_csb(sc, na); + } + + /* If not native, don't call nm_set_native_flags, since we don't want + * to replace the if_transmit method, nor set NAF_NETMAP_ON. */ + if (native) { + for_rx_tx(t) { + for (i = 0; i <= nma_get_nrings(na, t); i++) { + struct netmap_kring *kring = &NMR(na, t)[i]; + + if (nm_kring_pending_on(kring)) { + kring->nr_mode = NKR_NETMAP_ON; + } + } + } + nm_set_native_flags(na); + } + + } else { + if (native) { + nm_clear_native_flags(na); + for_rx_tx(t) { + for (i = 0; i <= nma_get_nrings(na, t); i++) { + struct netmap_kring *kring = &NMR(na, t)[i]; + + if (nm_kring_pending_off(kring)) { + kring->nr_mode = NKR_NETMAP_OFF; + } + } + } + } + + /* Sync from CSB must be done before UNREGIF PTCTL, on the last + * netmap client. */ + if (native && na->active_fds == 0) { + ptnet_sync_from_csb(sc, na); + } + + if (sc->ptna->backend_regifs == 0) { + ret = ptnet_nm_ptctl(ifp, PTNETMAP_PTCTL_DELETE); + } + } + + if (onoff) { + sc->ptna->backend_regifs++; + } + + return ret; +} + +static int +ptnet_nm_txsync(struct netmap_kring *kring, int flags) +{ + struct ptnet_softc *sc = if_getsoftc(kring->na->ifp); + struct ptnet_queue *pq = sc->queues + kring->ring_id; + bool notify; + + notify = netmap_pt_guest_txsync(pq->ptring, kring, flags); + if (notify) { + ptnet_kick(pq); + } + + return 0; +} + +static int +ptnet_nm_rxsync(struct netmap_kring *kring, int flags) +{ + struct ptnet_softc *sc = if_getsoftc(kring->na->ifp); + struct ptnet_queue *pq = sc->rxqueues + kring->ring_id; + bool notify; + + notify = netmap_pt_guest_rxsync(pq->ptring, kring, flags); + if (notify) { + ptnet_kick(pq); + } + + return 0; +} + +static void +ptnet_tx_intr(void *opaque) +{ + struct ptnet_queue *pq = opaque; + struct ptnet_softc *sc = pq->sc; + + DBG(device_printf(sc->dev, "Tx interrupt #%d\n", pq->kring_id)); +#ifdef PTNETMAP_STATS + pq->stats.intrs ++; +#endif /* PTNETMAP_STATS */ + + if (netmap_tx_irq(sc->ifp, pq->kring_id) != NM_IRQ_PASS) { + return; + } + + /* Schedule the taskqueue to process the pending transmit requests. + * However, vtnet, if_em and if_igb just call ptnet_transmit() here, + * at least when using MSI-X interrupts. The if_em driver, instead, + * schedules the taskqueue when using legacy interrupts. */ + taskqueue_enqueue(pq->taskq, &pq->task); +} + +static void +ptnet_rx_intr(void *opaque) +{ + struct ptnet_queue *pq = opaque; + struct ptnet_softc *sc = pq->sc; + unsigned int unused; + + DBG(device_printf(sc->dev, "Rx interrupt #%d\n", pq->kring_id)); +#ifdef PTNETMAP_STATS + pq->stats.intrs ++; +#endif /* PTNETMAP_STATS */ + + if (netmap_rx_irq(sc->ifp, pq->kring_id, &unused) != NM_IRQ_PASS) { + return; + } + + /* As in the vtnet, if_igb and if_em drivers when using MSI-X interrupts, + * receive-side processing is executed directly in the interrupt + * service routine. Alternatively, we may schedule the taskqueue. */ + ptnet_rx_eof(pq, PTNET_RX_BUDGET, true); +} + +/* The following offloadings-related functions are taken from the vtnet + * driver, but the same functionality is required for the ptnet driver. + * As a temporary solution, I copied this code from vtnet and I started + * to generalize it (taking away driver-specific statistic accounting), + * making as few modifications as possible. + * In the future we need to share these functions between vtnet and ptnet.
+ */ +static int +ptnet_tx_offload_ctx(struct mbuf *m, int *etype, int *proto, int *start) +{ + struct ether_vlan_header *evh; + int offset; + + evh = mtod(m, struct ether_vlan_header *); + if (evh->evl_encap_proto == htons(ETHERTYPE_VLAN)) { + /* BMV: We should handle nested VLAN tags too. */ + *etype = ntohs(evh->evl_proto); + offset = sizeof(struct ether_vlan_header); + } else { + *etype = ntohs(evh->evl_encap_proto); + offset = sizeof(struct ether_header); + } + + switch (*etype) { +#if defined(INET) + case ETHERTYPE_IP: { + struct ip *ip, iphdr; + if (__predict_false(m->m_len < offset + sizeof(struct ip))) { + m_copydata(m, offset, sizeof(struct ip), + (caddr_t) &iphdr); + ip = &iphdr; + } else + ip = (struct ip *)(m->m_data + offset); + *proto = ip->ip_p; + *start = offset + (ip->ip_hl << 2); + break; + } +#endif +#if defined(INET6) + case ETHERTYPE_IPV6: + *proto = -1; + *start = ip6_lasthdr(m, offset, IPPROTO_IPV6, proto); + /* Assert the network stack sent us a valid packet. */ + KASSERT(*start > offset, + ("%s: mbuf %p start %d offset %d proto %d", __func__, m, + *start, offset, *proto)); + break; +#endif + default: + /* Here we should increment the tx_csum_bad_ethtype counter. */ + return (EINVAL); + } + + return (0); +} + +static int +ptnet_tx_offload_tso(if_t ifp, struct mbuf *m, int eth_type, + int offset, bool allow_ecn, struct virtio_net_hdr *hdr) +{ + static struct timeval lastecn; + static int curecn; + struct tcphdr *tcp, tcphdr; + + if (__predict_false(m->m_len < offset + sizeof(struct tcphdr))) { + m_copydata(m, offset, sizeof(struct tcphdr), (caddr_t) &tcphdr); + tcp = &tcphdr; + } else + tcp = (struct tcphdr *)(m->m_data + offset); + + hdr->hdr_len = offset + (tcp->th_off << 2); + hdr->gso_size = m->m_pkthdr.tso_segsz; + hdr->gso_type = eth_type == ETHERTYPE_IP ? VIRTIO_NET_HDR_GSO_TCPV4 : + VIRTIO_NET_HDR_GSO_TCPV6; + + if (tcp->th_flags & TH_CWR) { + /* + * Drop if VIRTIO_NET_F_HOST_ECN was not negotiated. In FreeBSD, + * ECN support is not on a per-interface basis, but globally via + * the net.inet.tcp.ecn.enable sysctl knob. The default is off. + */ + if (!allow_ecn) { + if (ppsratecheck(&lastecn, &curecn, 1)) + if_printf(ifp, + "TSO with ECN not negotiated with host\n"); + return (ENOTSUP); + } + hdr->gso_type |= VIRTIO_NET_HDR_GSO_ECN; + } + + /* Here we should increment tx_tso counter. */ + + return (0); +} + +static struct mbuf * +ptnet_tx_offload(if_t ifp, struct mbuf *m, bool allow_ecn, + struct virtio_net_hdr *hdr) +{ + int flags, etype, csum_start, proto, error; + + flags = m->m_pkthdr.csum_flags; + + error = ptnet_tx_offload_ctx(m, &etype, &proto, &csum_start); + if (error) + goto drop; + + if ((etype == ETHERTYPE_IP && flags & PTNET_CSUM_OFFLOAD) || + (etype == ETHERTYPE_IPV6 && flags & PTNET_CSUM_OFFLOAD_IPV6)) { + /* + * We could compare the IP protocol vs the CSUM_ flag too, + * but that really should not be necessary. + */ + hdr->flags |= VIRTIO_NET_HDR_F_NEEDS_CSUM; + hdr->csum_start = csum_start; + hdr->csum_offset = m->m_pkthdr.csum_data; + /* Here we should increment the tx_csum counter. */ + } + + if (flags & CSUM_TSO) { + if (__predict_false(proto != IPPROTO_TCP)) { + /* Likely failed to correctly parse the mbuf. + * Here we should increment the tx_tso_not_tcp + * counter. 
*/ + goto drop; + } + + KASSERT(hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM, + ("%s: mbuf %p TSO without checksum offload %#x", + __func__, m, flags)); + + error = ptnet_tx_offload_tso(ifp, m, etype, csum_start, + allow_ecn, hdr); + if (error) + goto drop; + } + + return (m); + +drop: + m_freem(m); + return (NULL); +} + +static void +ptnet_vlan_tag_remove(struct mbuf *m) +{ + struct ether_vlan_header *evh; + + evh = mtod(m, struct ether_vlan_header *); + m->m_pkthdr.ether_vtag = ntohs(evh->evl_tag); + m->m_flags |= M_VLANTAG; + + /* Strip the 802.1Q header. */ + bcopy((char *) evh, (char *) evh + ETHER_VLAN_ENCAP_LEN, + ETHER_HDR_LEN - ETHER_TYPE_LEN); + m_adj(m, ETHER_VLAN_ENCAP_LEN); +} + +/* + * Use the checksum offset in the VirtIO header to set the + * correct CSUM_* flags. + */ +static int +ptnet_rx_csum_by_offset(struct mbuf *m, uint16_t eth_type, int ip_start, + struct virtio_net_hdr *hdr) +{ +#if defined(INET) || defined(INET6) + int offset = hdr->csum_start + hdr->csum_offset; +#endif + + /* Only do a basic sanity check on the offset. */ + switch (eth_type) { +#if defined(INET) + case ETHERTYPE_IP: + if (__predict_false(offset < ip_start + sizeof(struct ip))) + return (1); + break; +#endif +#if defined(INET6) + case ETHERTYPE_IPV6: + if (__predict_false(offset < ip_start + sizeof(struct ip6_hdr))) + return (1); + break; +#endif + default: + /* Here we should increment the rx_csum_bad_ethtype counter. */ + return (1); + } + + /* + * Use the offset to determine the appropriate CSUM_* flags. This is + * a bit dirty, but we can get by with it since the checksum offsets + * happen to be different. We assume the host does not do IPv4 + * header checksum offloading. + */ + switch (hdr->csum_offset) { + case offsetof(struct udphdr, uh_sum): + case offsetof(struct tcphdr, th_sum): + m->m_pkthdr.csum_flags |= CSUM_DATA_VALID | CSUM_PSEUDO_HDR; + m->m_pkthdr.csum_data = 0xFFFF; + break; + case offsetof(struct sctphdr, checksum): + m->m_pkthdr.csum_flags |= CSUM_SCTP_VALID; + break; + default: + /* Here we should increment the rx_csum_bad_offset counter. */ + return (1); + } + + return (0); +} + +static int +ptnet_rx_csum_by_parse(struct mbuf *m, uint16_t eth_type, int ip_start, + struct virtio_net_hdr *hdr) +{ + int offset, proto; + + switch (eth_type) { +#if defined(INET) + case ETHERTYPE_IP: { + struct ip *ip; + if (__predict_false(m->m_len < ip_start + sizeof(struct ip))) + return (1); + ip = (struct ip *)(m->m_data + ip_start); + proto = ip->ip_p; + offset = ip_start + (ip->ip_hl << 2); + break; + } +#endif +#if defined(INET6) + case ETHERTYPE_IPV6: + if (__predict_false(m->m_len < ip_start + + sizeof(struct ip6_hdr))) + return (1); + offset = ip6_lasthdr(m, ip_start, IPPROTO_IPV6, &proto); + if (__predict_false(offset < 0)) + return (1); + break; +#endif + default: + /* Here we should increment the rx_csum_bad_ethtype counter.
*/ + return (1); + } + + switch (proto) { + case IPPROTO_TCP: + if (__predict_false(m->m_len < offset + sizeof(struct tcphdr))) + return (1); + m->m_pkthdr.csum_flags |= CSUM_DATA_VALID | CSUM_PSEUDO_HDR; + m->m_pkthdr.csum_data = 0xFFFF; + break; + case IPPROTO_UDP: + if (__predict_false(m->m_len < offset + sizeof(struct udphdr))) + return (1); + m->m_pkthdr.csum_flags |= CSUM_DATA_VALID | CSUM_PSEUDO_HDR; + m->m_pkthdr.csum_data = 0xFFFF; + break; + case IPPROTO_SCTP: + if (__predict_false(m->m_len < offset + sizeof(struct sctphdr))) + return (1); + m->m_pkthdr.csum_flags |= CSUM_SCTP_VALID; + break; + default: + /* + * For the remaining protocols, FreeBSD does not support + * checksum offloading, so the checksum will be recomputed. + */ +#if 0 + if_printf(ifp, "cksum offload of unsupported " + "protocol eth_type=%#x proto=%d csum_start=%d " + "csum_offset=%d\n", __func__, eth_type, proto, + hdr->csum_start, hdr->csum_offset); +#endif + break; + } + + return (0); +} + +/* + * Set the appropriate CSUM_* flags. Unfortunately, the information + * provided is not directly useful to us. The VirtIO header gives the + * offset of the checksum, which is all Linux needs, but this is not + * how FreeBSD does things. We are forced to peek inside the packet + * a bit. + * + * It would be nice if VirtIO gave us the L4 protocol or if FreeBSD + * could accept the offsets and let the stack figure it out. + */ +static int +ptnet_rx_csum(struct mbuf *m, struct virtio_net_hdr *hdr) +{ + struct ether_header *eh; + struct ether_vlan_header *evh; + uint16_t eth_type; + int offset, error; + + eh = mtod(m, struct ether_header *); + eth_type = ntohs(eh->ether_type); + if (eth_type == ETHERTYPE_VLAN) { + /* BMV: We should handle nested VLAN tags too. */ + evh = mtod(m, struct ether_vlan_header *); + eth_type = ntohs(evh->evl_proto); + offset = sizeof(struct ether_vlan_header); + } else + offset = sizeof(struct ether_header); + + if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) + error = ptnet_rx_csum_by_offset(m, eth_type, offset, hdr); + else + error = ptnet_rx_csum_by_parse(m, eth_type, offset, hdr); + + return (error); +} +/* End of offloading-related functions to be shared with vtnet. */ + +static inline void +ptnet_sync_tail(struct ptnet_ring *ptring, struct netmap_kring *kring) +{ + struct netmap_ring *ring = kring->ring; + + /* Update hwcur and hwtail as known by the host. */ + ptnetmap_guest_read_kring_csb(ptring, kring); + + /* nm_sync_finalize */ + ring->tail = kring->rtail = kring->nr_hwtail; +} + +static void +ptnet_ring_update(struct ptnet_queue *pq, struct netmap_kring *kring, + unsigned int head, unsigned int sync_flags) +{ + struct netmap_ring *ring = kring->ring; + struct ptnet_ring *ptring = pq->ptring; + + /* Some packets have been pushed to the netmap ring. We have + * to tell the host to process the new packets, updating cur + * and head in the CSB. */ + ring->head = ring->cur = head; + + /* Mimic nm_txsync_prologue/nm_rxsync_prologue. */ + kring->rcur = kring->rhead = head; + + ptnetmap_guest_write_kring_csb(ptring, kring->rcur, kring->rhead); + + /* Kick the host if needed. */ + if (NM_ACCESS_ONCE(ptring->host_need_kick)) { + ptring->sync_flags = sync_flags; + ptnet_kick(pq); + } +} + +#define PTNET_TX_NOSPACE(_h, _k, _min) \ + ((((_h) < (_k)->rtail) ? 0 : (_k)->nkr_num_slots) + \ + (_k)->rtail - (_h)) < (_min) + +/* This function may be called by the network stack, or + * by the taskqueue thread.
*/ +static int +ptnet_drain_transmit_queue(struct ptnet_queue *pq, unsigned int budget, + bool may_resched) +{ + struct ptnet_softc *sc = pq->sc; + bool have_vnet_hdr = sc->vnet_hdr_len; + struct netmap_adapter *na = &sc->ptna->dr.up; + if_t ifp = sc->ifp; + unsigned int batch_count = 0; + struct ptnet_ring *ptring; + struct netmap_kring *kring; + struct netmap_ring *ring; + struct netmap_slot *slot; + unsigned int count = 0; + unsigned int minspace; + unsigned int head; + unsigned int lim; + struct mbuf *mhead; + struct mbuf *mf; + int nmbuf_bytes; + uint8_t *nmbuf; + + if (!PTNET_Q_TRYLOCK(pq)) { + /* We failed to acquire the lock; schedule the taskqueue. */ + RD(1, "Deferring TX work"); + if (may_resched) { + taskqueue_enqueue(pq->taskq, &pq->task); + } + + return 0; + } + + if (unlikely(!(ifp->if_drv_flags & IFF_DRV_RUNNING))) { + PTNET_Q_UNLOCK(pq); + RD(1, "Interface is down"); + return ENETDOWN; + } + + ptring = pq->ptring; + kring = na->tx_rings + pq->kring_id; + ring = kring->ring; + lim = kring->nkr_num_slots - 1; + head = ring->head; + minspace = sc->min_tx_space; + + while (count < budget) { + if (PTNET_TX_NOSPACE(head, kring, minspace)) { + /* We ran out of slots; let's see if the host has + * freed up some by reading hwcur and hwtail from + * the CSB. */ + ptnet_sync_tail(ptring, kring); + + if (PTNET_TX_NOSPACE(head, kring, minspace)) { + /* Still no slots available. Reactivate the + * interrupts so that we can be notified + * when some free slots are made available by + * the host. */ + ptring->guest_need_kick = 1; + + /* Double-check. */ + ptnet_sync_tail(ptring, kring); + if (likely(PTNET_TX_NOSPACE(head, kring, + minspace))) { + break; + } + + RD(1, "Found more slots on double-check"); + /* More slots were freed before reactivating + * the interrupts. */ + ptring->guest_need_kick = 0; + } + } + + mhead = drbr_peek(ifp, pq->bufring); + if (!mhead) { + break; + } + + /* Initialize transmission state variables. */ + slot = ring->slot + head; + nmbuf = NMB(na, slot); + nmbuf_bytes = 0; + + /* If needed, prepare the virtio-net header at the beginning + * of the first slot. */ + if (have_vnet_hdr) { + struct virtio_net_hdr *vh = + (struct virtio_net_hdr *)nmbuf; + + /* For performance, we could replace this memset() with + * two 8-bytes-wide writes. */ + memset(nmbuf, 0, PTNET_HDR_SIZE); + if (mhead->m_pkthdr.csum_flags & PTNET_ALL_OFFLOAD) { + mhead = ptnet_tx_offload(ifp, mhead, false, + vh); + if (unlikely(!mhead)) { + /* Packet dropped because errors + * occurred while preparing the vnet + * header. Let's go ahead with the next + * packet.
*/ + pq->stats.errors ++; + drbr_advance(ifp, pq->bufring); + continue; + } + } + ND(1, "%s: [csum_flags %lX] vnet hdr: flags %x " + "csum_start %u csum_ofs %u hdr_len = %u " + "gso_size %u gso_type %x", __func__, + mhead->m_pkthdr.csum_flags, vh->flags, + vh->csum_start, vh->csum_offset, vh->hdr_len, + vh->gso_size, vh->gso_type); + + nmbuf += PTNET_HDR_SIZE; + nmbuf_bytes += PTNET_HDR_SIZE; + } + + for (mf = mhead; mf; mf = mf->m_next) { + uint8_t *mdata = mf->m_data; + int mlen = mf->m_len; + + for (;;) { + int copy = NETMAP_BUF_SIZE(na) - nmbuf_bytes; + + if (mlen < copy) { + copy = mlen; + } + memcpy(nmbuf, mdata, copy); + + mdata += copy; + mlen -= copy; + nmbuf += copy; + nmbuf_bytes += copy; + + if (!mlen) { + break; + } + + slot->len = nmbuf_bytes; + slot->flags = NS_MOREFRAG; + + head = nm_next(head, lim); + KASSERT(head != ring->tail, + ("Unexpectedly run out of TX space")); + slot = ring->slot + head; + nmbuf = NMB(na, slot); + nmbuf_bytes = 0; + } + } + + /* Complete last slot and update head. */ + slot->len = nmbuf_bytes; + slot->flags = 0; + head = nm_next(head, lim); + + /* Consume the packet just processed. */ + drbr_advance(ifp, pq->bufring); + + /* Copy the packet to listeners. */ + ETHER_BPF_MTAP(ifp, mhead); + + pq->stats.packets ++; + pq->stats.bytes += mhead->m_pkthdr.len; + if (mhead->m_flags & M_MCAST) { + pq->stats.mcasts ++; + } + + m_freem(mhead); + + count ++; + if (++batch_count == PTNET_TX_BATCH) { + ptnet_ring_update(pq, kring, head, NAF_FORCE_RECLAIM); + batch_count = 0; + } + } + + if (batch_count) { + ptnet_ring_update(pq, kring, head, NAF_FORCE_RECLAIM); + } + + if (count >= budget && may_resched) { + DBG(RD(1, "out of budget: resched, %d mbufs pending\n", + drbr_inuse(ifp, pq->bufring))); + taskqueue_enqueue(pq->taskq, &pq->task); + } + + PTNET_Q_UNLOCK(pq); + + return count; +} + +static int +ptnet_transmit(if_t ifp, struct mbuf *m) +{ + struct ptnet_softc *sc = if_getsoftc(ifp); + struct ptnet_queue *pq; + unsigned int queue_idx; + int err; + + DBG(device_printf(sc->dev, "transmit %p\n", m)); + + /* Insert 802.1Q header if needed. */ + if (m->m_flags & M_VLANTAG) { + m = ether_vlanencap(m, m->m_pkthdr.ether_vtag); + if (m == NULL) { + return ENOBUFS; + } + m->m_flags &= ~M_VLANTAG; + } + + /* Get the flow-id if available. */ + queue_idx = (M_HASHTYPE_GET(m) != M_HASHTYPE_NONE) ? + m->m_pkthdr.flowid : curcpu; + + if (unlikely(queue_idx >= sc->num_tx_rings)) { + queue_idx %= sc->num_tx_rings; + } + + pq = sc->queues + queue_idx; + + err = drbr_enqueue(ifp, pq->bufring, m); + if (err) { + /* ENOBUFS when the bufring is full */ + RD(1, "%s: drbr_enqueue() failed %d\n", + __func__, err); + pq->stats.errors ++; + return err; + } + + if (ifp->if_capenable & IFCAP_POLLING) { + /* If polling is on, the transmit queues will be + * drained by the poller. */ + return 0; + } + + err = ptnet_drain_transmit_queue(pq, PTNET_TX_BUDGET, true); + + return (err < 0) ? 
err : 0; +} + +static unsigned int +ptnet_rx_discard(struct netmap_kring *kring, unsigned int head) +{ + struct netmap_ring *ring = kring->ring; + struct netmap_slot *slot = ring->slot + head; + + for (;;) { + head = nm_next(head, kring->nkr_num_slots - 1); + if (!(slot->flags & NS_MOREFRAG) || head == ring->tail) { + break; + } + slot = ring->slot + head; + } + + return head; +} + +static inline struct mbuf * +ptnet_rx_slot(struct mbuf *mtail, uint8_t *nmbuf, unsigned int nmbuf_len) +{ + uint8_t *mdata = mtod(mtail, uint8_t *) + mtail->m_len; + + do { + unsigned int copy; + + if (mtail->m_len == MCLBYTES) { + struct mbuf *mf; + + mf = m_getcl(M_NOWAIT, MT_DATA, 0); + if (unlikely(!mf)) { + return NULL; + } + + mtail->m_next = mf; + mtail = mf; + mdata = mtod(mtail, uint8_t *); + mtail->m_len = 0; + } + + copy = MCLBYTES - mtail->m_len; + if (nmbuf_len < copy) { + copy = nmbuf_len; + } + + memcpy(mdata, nmbuf, copy); + + nmbuf += copy; + nmbuf_len -= copy; + mdata += copy; + mtail->m_len += copy; + } while (nmbuf_len); + + return mtail; +} + +static int +ptnet_rx_eof(struct ptnet_queue *pq, unsigned int budget, bool may_resched) +{ + struct ptnet_softc *sc = pq->sc; + bool have_vnet_hdr = sc->vnet_hdr_len; + struct ptnet_ring *ptring = pq->ptring; + struct netmap_adapter *na = &sc->ptna->dr.up; + struct netmap_kring *kring = na->rx_rings + pq->kring_id; + struct netmap_ring *ring = kring->ring; + unsigned int const lim = kring->nkr_num_slots - 1; + unsigned int head = ring->head; + unsigned int batch_count = 0; + if_t ifp = sc->ifp; + unsigned int count = 0; + + PTNET_Q_LOCK(pq); + + if (unlikely(!(ifp->if_drv_flags & IFF_DRV_RUNNING))) { + goto unlock; + } + + kring->nr_kflags &= ~NKR_PENDINTR; + + while (count < budget) { + unsigned int prev_head = head; + struct mbuf *mhead, *mtail; + struct virtio_net_hdr *vh; + struct netmap_slot *slot; + unsigned int nmbuf_len; + uint8_t *nmbuf; +host_sync: + if (head == ring->tail) { + /* We ran out of slots; let's see if the host has + * added some, by reading hwcur and hwtail from + * the CSB. */ + ptnet_sync_tail(ptring, kring); + + if (head == ring->tail) { + /* Still no slots available. Reactivate + * interrupts as they were disabled by the + * host thread right before issuing the + * last interrupt. */ + ptring->guest_need_kick = 1; + + /* Double-check. */ + ptnet_sync_tail(ptring, kring); + if (likely(head == ring->tail)) { + break; + } + ptring->guest_need_kick = 0; + } + } + + /* Initialize ring state variables, possibly grabbing the + * virtio-net header. */ + slot = ring->slot + head; + nmbuf = NMB(na, slot); + nmbuf_len = slot->len; + + vh = (struct virtio_net_hdr *)nmbuf; + if (have_vnet_hdr) { + if (unlikely(nmbuf_len < PTNET_HDR_SIZE)) { + /* There is no good reason why the host should + * put the header in multiple netmap slots. + * If this is the case, discard. */ + RD(1, "Fragmented vnet-hdr: dropping"); + head = ptnet_rx_discard(kring, head); + pq->stats.iqdrops ++; + goto skip; + } + ND(1, "%s: vnet hdr: flags %x csum_start %u " + "csum_ofs %u hdr_len = %u gso_size %u " + "gso_type %x", __func__, vh->flags, + vh->csum_start, vh->csum_offset, vh->hdr_len, + vh->gso_size, vh->gso_type); + nmbuf += PTNET_HDR_SIZE; + nmbuf_len -= PTNET_HDR_SIZE; + } + + /* Allocate the head of a new mbuf chain. + * We use m_getcl() to allocate an mbuf with standard cluster + * size (MCLBYTES). In the future we could use m_getjcl() + * to choose different sizes.
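+ * A hypothetical sketch of that alternative, using the stock
+ * m_getjcl(9) allocator with 9KB jumbo clusters (not what the
+ * code below does):
+ *
+ *   mhead = mtail = m_getjcl(M_NOWAIT, MT_DATA, M_PKTHDR,
+ *       MJUM9BYTES);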
*/ + mhead = mtail = m_getcl(M_NOWAIT, MT_DATA, M_PKTHDR); + if (unlikely(mhead == NULL)) { + device_printf(sc->dev, "%s: failed to allocate mbuf " + "head\n", __func__); + pq->stats.errors ++; + break; + } + + /* Initialize the mbuf state variables. */ + mhead->m_pkthdr.len = nmbuf_len; + mtail->m_len = 0; + + /* Scan all the netmap slots containing the current packet. */ + for (;;) { + DBG(device_printf(sc->dev, "%s: h %u t %u rcv frag " + "len %u, flags %u\n", __func__, + head, ring->tail, slot->len, + slot->flags)); + + mtail = ptnet_rx_slot(mtail, nmbuf, nmbuf_len); + if (unlikely(!mtail)) { + /* Ouch. We ran out of memory while processing + * a packet. We have to restore the previous + * head position, free the mbuf chain, and + * schedule the taskqueue to give the packet + * another chance. */ + device_printf(sc->dev, "%s: failed to allocate" + " mbuf frag, reset head %u --> %u\n", + __func__, head, prev_head); + head = prev_head; + m_freem(mhead); + pq->stats.errors ++; + if (may_resched) { + taskqueue_enqueue(pq->taskq, + &pq->task); + } + goto escape; + } + + /* We have to increment head irrespective of the + * NS_MOREFRAG being set or not. */ + head = nm_next(head, lim); + + if (!(slot->flags & NS_MOREFRAG)) { + break; + } + + if (unlikely(head == ring->tail)) { + /* The very last slot prepared by the host has + * the NS_MOREFRAG set. Drop it and continue + * the outer cycle (to do the double-check). */ + RD(1, "Incomplete packet: dropping"); + m_freem(mhead); + pq->stats.iqdrops ++; + goto host_sync; + } + + slot = ring->slot + head; + nmbuf = NMB(na, slot); + nmbuf_len = slot->len; + mhead->m_pkthdr.len += nmbuf_len; + } + + mhead->m_pkthdr.rcvif = ifp; + mhead->m_pkthdr.csum_flags = 0; + + /* Store the queue idx in the packet header. */ + mhead->m_pkthdr.flowid = pq->kring_id; + M_HASHTYPE_SET(mhead, M_HASHTYPE_OPAQUE); + + if (ifp->if_capenable & IFCAP_VLAN_HWTAGGING) { + struct ether_header *eh; + + eh = mtod(mhead, struct ether_header *); + if (eh->ether_type == htons(ETHERTYPE_VLAN)) { + ptnet_vlan_tag_remove(mhead); + /* + * With the 802.1Q header removed, update the + * checksum starting location accordingly. + */ + if (vh->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) + vh->csum_start -= ETHER_VLAN_ENCAP_LEN; + } + } + + if (have_vnet_hdr && (vh->flags & (VIRTIO_NET_HDR_F_NEEDS_CSUM + | VIRTIO_NET_HDR_F_DATA_VALID))) { + if (unlikely(ptnet_rx_csum(mhead, vh))) { + m_freem(mhead); + RD(1, "Csum offload error: dropping"); + pq->stats.iqdrops ++; + goto skip; + } + } + + pq->stats.packets ++; + pq->stats.bytes += mhead->m_pkthdr.len; + + PTNET_Q_UNLOCK(pq); + (*ifp->if_input)(ifp, mhead); + PTNET_Q_LOCK(pq); + + if (unlikely(!(ifp->if_drv_flags & IFF_DRV_RUNNING))) { + /* The interface has gone down while we didn't + * have the lock. Stop any processing and exit. */ + goto unlock; + } +skip: + count ++; + if (++batch_count == PTNET_RX_BATCH) { + /* Some packets have been pushed to the network stack. + * We need to update the CSB to tell the host about the new + * ring->cur and ring->head (RX buffer refill). */ + ptnet_ring_update(pq, kring, head, NAF_FORCE_READ); + batch_count = 0; + } + } +escape: + if (batch_count) { + ptnet_ring_update(pq, kring, head, NAF_FORCE_READ); + + } + + if (count >= budget && may_resched) { + /* If we ran out of budget or the double-check found new + * slots to process, schedule the taskqueue. 
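+ * Rescheduling here, instead of looping, keeps each task run
+ * bounded by the budget, so one busy queue cannot monopolize
+ * the taskqueue thread.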
*/ + DBG(RD(1, "out of budget: resched h %u t %u\n", + head, ring->tail)); + taskqueue_enqueue(pq->taskq, &pq->task); + } +unlock: + PTNET_Q_UNLOCK(pq); + + return count; +} + +static void +ptnet_rx_task(void *context, int pending) +{ + struct ptnet_queue *pq = context; + + DBG(RD(1, "%s: pq #%u\n", __func__, pq->kring_id)); + ptnet_rx_eof(pq, PTNET_RX_BUDGET, true); +} + +static void +ptnet_tx_task(void *context, int pending) +{ + struct ptnet_queue *pq = context; + + DBG(RD(1, "%s: pq #%u\n", __func__, pq->kring_id)); + ptnet_drain_transmit_queue(pq, PTNET_TX_BUDGET, true); +} + +#ifdef DEVICE_POLLING +/* We don't need to handle POLL_AND_CHECK_STATUS and + * POLL_ONLY differently, since we don't have an Interrupt Status Register. */ +static int +ptnet_poll(if_t ifp, enum poll_cmd cmd, int budget) +{ + struct ptnet_softc *sc = if_getsoftc(ifp); + unsigned int queue_budget; + unsigned int count = 0; + bool borrow = false; + int i; + + KASSERT(sc->num_rings > 0, ("Found no queues while polling ptnet")); + queue_budget = MAX(budget / sc->num_rings, 1); + RD(1, "Per-queue budget is %d", queue_budget); + + while (budget) { + unsigned int rcnt = 0; + + for (i = 0; i < sc->num_rings; i++) { + struct ptnet_queue *pq = sc->queues + i; + + if (borrow) { + queue_budget = MIN(queue_budget, budget); + if (queue_budget == 0) { + break; + } + } + + if (i < sc->num_tx_rings) { + rcnt += ptnet_drain_transmit_queue(pq, + queue_budget, false); + } else { + rcnt += ptnet_rx_eof(pq, queue_budget, + false); + } + } + + if (!rcnt) { + /* A scan of the queues gave no result, so we can + * stop here. */ + break; + } + + if (rcnt > budget) { + /* This may happen when initial budget < sc->num_rings, + * since one packet budget is given to each queue + * anyway. Just pretend we didn't eat "so much". */ + rcnt = budget; + } + count += rcnt; + budget -= rcnt; + borrow = true; + } + + + return count; +} +#endif /* DEVICE_POLLING */ diff -u -r -N usr/src/sys/dev/netmap/if_re_netmap.h /usr/src/sys/dev/netmap/if_re_netmap.h --- usr/src/sys/dev/netmap/if_re_netmap.h 2016-09-29 00:24:47.000000000 +0100 +++ /usr/src/sys/dev/netmap/if_re_netmap.h 2016-11-23 16:57:57.845156000 +0000 @@ -24,7 +24,7 @@ */ /* - * $FreeBSD: releng/11.0/sys/dev/netmap/if_re_netmap.h 285349 2015-07-10 05:51:36Z luigi $ + * $FreeBSD: head/sys/dev/netmap/if_re_netmap.h 234225 2012-04-13 15:33:12Z luigi $ * * netmap support for: re * diff -u -r -N usr/src/sys/dev/netmap/if_vtnet_netmap.h /usr/src/sys/dev/netmap/if_vtnet_netmap.h --- usr/src/sys/dev/netmap/if_vtnet_netmap.h 2016-09-29 00:24:47.000000000 +0100 +++ /usr/src/sys/dev/netmap/if_vtnet_netmap.h 2016-11-23 16:57:57.845543000 +0000 @@ -24,7 +24,7 @@ */ /* - * $FreeBSD: releng/11.0/sys/dev/netmap/if_vtnet_netmap.h 285349 2015-07-10 05:51:36Z luigi $ + * $FreeBSD: head/sys/dev/netmap/if_vtnet_netmap.h 270097 2014-08-17 10:25:27Z luigi $ */ #include <net/netmap.h> @@ -127,7 +127,7 @@ * First part: process new packets to send. */ rmb(); - + nm_i = kring->nr_hwcur; if (nm_i != head) { /* we have new packets to send */ struct sglist *sg = txq->vtntx_sg; @@ -182,7 +182,7 @@ virtqueue_enable_intr(vq); // like postpone with 0 } - + /* Free used slots. We only consider our own used buffers, recognized * by the token we passed to virtqueue_add_outbuf.
*/ diff -u -r -N usr/src/sys/dev/netmap/ixgbe_netmap.h /usr/src/sys/dev/netmap/ixgbe_netmap.h --- usr/src/sys/dev/netmap/ixgbe_netmap.h 2016-09-29 00:24:47.000000000 +0100 +++ /usr/src/sys/dev/netmap/ixgbe_netmap.h 2016-11-23 16:57:57.846057000 +0000 @@ -24,7 +24,7 @@ */ /* - * $FreeBSD: releng/11.0/sys/dev/netmap/ixgbe_netmap.h 285592 2015-07-15 01:02:01Z pkelsey $ + * $FreeBSD: head/sys/dev/netmap/ixgbe_netmap.h 244514 2012-12-20 22:26:03Z luigi $ * * netmap support for: ixgbe (both ix and ixv) * @@ -53,7 +53,7 @@ /* * device-specific sysctl variables: * - * ix_crcstrip: 0: keep CRC in rx frames (default), 1: strip it. + * ix_crcstrip: 0: NIC keeps CRC in rx frames (default), 1: NIC strips it. * During regular operations the CRC is stripped, but on some * hardware reception of frames not multiple of 64 is slower, * so using crcstrip=0 helps in benchmarks. @@ -65,7 +65,7 @@ static int ix_rx_miss, ix_rx_miss_bufs; int ix_crcstrip; SYSCTL_INT(_dev_netmap, OID_AUTO, ix_crcstrip, - CTLFLAG_RW, &ix_crcstrip, 0, "strip CRC on rx frames"); + CTLFLAG_RW, &ix_crcstrip, 0, "NIC strips CRC on rx frames"); SYSCTL_INT(_dev_netmap, OID_AUTO, ix_rx_miss, CTLFLAG_RW, &ix_rx_miss, 0, "potentially missed rx intr"); SYSCTL_INT(_dev_netmap, OID_AUTO, ix_rx_miss_bufs, @@ -109,6 +109,20 @@ IXGBE_WRITE_REG(hw, IXGBE_RDRXCTL, rxc); } +static void +ixgbe_netmap_intr(struct netmap_adapter *na, int onoff) +{ + struct ifnet *ifp = na->ifp; + struct adapter *adapter = ifp->if_softc; + + IXGBE_CORE_LOCK(adapter); + if (onoff) { + ixgbe_enable_intr(adapter); // XXX maybe ixgbe_stop ? + } else { + ixgbe_disable_intr(adapter); // XXX maybe ixgbe_stop ? + } + IXGBE_CORE_UNLOCK(adapter); +} /* * Register/unregister. We are already under netmap lock. @@ -311,7 +325,7 @@ * good way. */ nic_i = IXGBE_READ_REG(&adapter->hw, IXGBE_IS_VF(adapter) ? - IXGBE_VFTDH(kring->ring_id) : IXGBE_TDH(kring->ring_id)); + IXGBE_VFTDH(kring->ring_id) : IXGBE_TDH(kring->ring_id)); if (nic_i >= kring->nkr_num_slots) { /* XXX can it happen ? */ D("TDH wrap %d", nic_i); nic_i -= kring->nkr_num_slots; @@ -486,6 +500,7 @@ na.nm_rxsync = ixgbe_netmap_rxsync; na.nm_register = ixgbe_netmap_reg; na.num_tx_rings = na.num_rx_rings = adapter->num_queues; + na.nm_intr = ixgbe_netmap_intr; netmap_attach(&na); } diff -u -r -N usr/src/sys/dev/netmap/netmap.c /usr/src/sys/dev/netmap/netmap.c --- usr/src/sys/dev/netmap/netmap.c 2016-09-29 00:24:47.000000000 +0100 +++ /usr/src/sys/dev/netmap/netmap.c 2016-11-23 16:57:57.847975000 +0000 @@ -1,5 +1,9 @@ /* - * Copyright (C) 2011-2014 Matteo Landi, Luigi Rizzo. All rights reserved. + * Copyright (C) 2011-2014 Matteo Landi + * Copyright (C) 2011-2016 Luigi Rizzo + * Copyright (C) 2011-2016 Giuseppe Lettieri + * Copyright (C) 2011-2016 Vincenzo Maffione + * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions @@ -25,7 +29,7 @@ /* - * $FreeBSD: releng/11.0/sys/dev/netmap/netmap.c 300050 2016-05-17 12:52:31Z eadler $ + * $FreeBSD$ * * This module supports memory mapped access to network devices, * see netmap(4). @@ -133,13 +137,12 @@ * > select()able file descriptor on which events are reported. * * Internally, we allocate a netmap_priv_d structure, that will be - * initialized on ioctl(NIOCREGIF). + * initialized on ioctl(NIOCREGIF). There is one netmap_priv_d + * structure for each open(). * * os-specific: - * FreeBSD: netmap_open (netmap_freebsd.c). The priv is - * per-thread. 
- * linux: linux_netmap_open (netmap_linux.c). The priv is - * per-open. + * FreeBSD: see netmap_open() (netmap_freebsd.c) + * linux: see linux_netmap_open() (netmap_linux.c) * * > 2. on each descriptor, the process issues an ioctl() to identify * > the interface that should report events to the file descriptor. @@ -299,18 +302,17 @@ * netmap_transmit() * na->nm_notify == netmap_notify() * 2) ioctl(NIOCRXSYNC)/netmap_poll() in process context - * kring->nm_sync() == netmap_rxsync_from_host_compat + * kring->nm_sync() == netmap_rxsync_from_host * netmap_rxsync_from_host(na, NULL, NULL) * - tx to host stack * ioctl(NIOCTXSYNC)/netmap_poll() in process context - * kring->nm_sync() == netmap_txsync_to_host_compat + * kring->nm_sync() == netmap_txsync_to_host * netmap_txsync_to_host(na) - * NM_SEND_UP() - * FreeBSD: na->if_input() == ?? XXX + * nm_os_send_up() + * FreeBSD: na->if_input() == ether_input() * linux: netif_rx() with NM_MAGIC_PRIORITY_RX * * - * * -= SYSTEM DEVICE WITH GENERIC SUPPORT =- * * na == NA(ifp) == generic_netmap_adapter created in generic_netmap_attach() @@ -319,10 +321,11 @@ * concurrently: * 1) ioctl(NIOCTXSYNC)/netmap_poll() in process context * kring->nm_sync() == generic_netmap_txsync() - * linux: dev_queue_xmit() with NM_MAGIC_PRIORITY_TX - * generic_ndo_start_xmit() - * orig. dev. start_xmit - * FreeBSD: na->if_transmit() == orig. dev if_transmit + * nm_os_generic_xmit_frame() + * linux: dev_queue_xmit() with NM_MAGIC_PRIORITY_TX + * ifp->ndo_start_xmit == generic_ndo_start_xmit() + * gna->save_start_xmit == orig. dev. start_xmit + * FreeBSD: na->if_transmit() == orig. dev if_transmit * 2) generic_mbuf_destructor() * na->nm_notify() == netmap_notify() * - rx from netmap userspace: @@ -333,24 +336,15 @@ * generic_rx_handler() * mbq_safe_enqueue() * na->nm_notify() == netmap_notify() - * - rx from host stack: - * concurrently: + * - rx from host stack + * FreeBSD: same as native + * Linux: same as native except: * 1) host stack - * linux: generic_ndo_start_xmit() - * netmap_transmit() - * FreeBSD: ifp->if_input() == netmap_transmit - * both: - * na->nm_notify() == netmap_notify() - * 2) ioctl(NIOCRXSYNC)/netmap_poll() in process context - * kring->nm_sync() == netmap_rxsync_from_host_compat - * netmap_rxsync_from_host(na, NULL, NULL) - * - tx to host stack: - * ioctl(NIOCTXSYNC)/netmap_poll() in process context - * kring->nm_sync() == netmap_txsync_to_host_compat - * netmap_txsync_to_host(na) - * NM_SEND_UP() - * FreeBSD: na->if_input() == ??? 
XXX - * linux: netif_rx() with NM_MAGIC_PRIORITY_RX + * dev_queue_xmit() without NM_MAGIC_PRIORITY_TX + * ifp->ndo_start_xmit == generic_ndo_start_xmit() + * netmap_transmit() + * na->nm_notify() == netmap_notify() + * - tx to host stack (same as native): * * * -= VALE =- @@ -371,7 +365,7 @@ * from host stack: * netmap_transmit() * na->nm_notify() == netmap_bwrap_intr_notify(ring_nr == host ring) - * kring->nm_sync() == netmap_rxsync_from_host_compat() + * kring->nm_sync() == netmap_rxsync_from_host() * netmap_vp_txsync() * * - system device with generic support: @@ -384,7 +378,7 @@ * from host stack: * netmap_transmit() * na->nm_notify() == netmap_bwrap_intr_notify(ring_nr == host ring) - * kring->nm_sync() == netmap_rxsync_from_host_compat() + * kring->nm_sync() == netmap_rxsync_from_host() * netmap_vp_txsync() * * (all cases) --> nm_bdg_flush() @@ -407,7 +401,7 @@ * netmap_vp_rxsync() * to host stack: * netmap_vp_rxsync() - * kring->nm_sync() == netmap_txsync_to_host_compat + * kring->nm_sync() == netmap_txsync_to_host * netmap_vp_rxsync_locked() * * - system device with generic adapter: @@ -418,7 +412,7 @@ * netmap_vp_rxsync() * to host stack: * netmap_vp_rxsync() - * kring->nm_sync() == netmap_txsync_to_host_compat + * kring->nm_sync() == netmap_txsync_to_host * netmap_vp_rxsync() * */ @@ -455,29 +449,19 @@ #include <sys/refcount.h> -/* reduce conditional code */ -// linux API, use for the knlist in FreeBSD -/* use a private mutex for the knlist */ -#define init_waitqueue_head(x) do { \ - struct mtx *m = &(x)->m; \ - mtx_init(m, "nm_kn_lock", NULL, MTX_DEF); \ - knlist_init_mtx(&(x)->si.si_note, m); \ - } while (0) - -#define OS_selrecord(a, b) selrecord(a, &((b)->si)) -#define OS_selwakeup(a, b) freebsd_selwakeup(a, b) - #elif defined(linux) #include "bsd_glue.h" - - #elif defined(__APPLE__) #warning OSX support is only partial #include "osx_glue.h" +#elif defined (_WIN32) + +#include "win_glue.h" + #else #error Unsupported platform @@ -492,47 +476,69 @@ #include <dev/netmap/netmap_mem2.h> -MALLOC_DEFINE(M_NETMAP, "netmap", "Network memory map"); - /* user-controlled variables */ int netmap_verbose; static int netmap_no_timestamp; /* don't timestamp on rxsync */ - -SYSCTL_NODE(_dev, OID_AUTO, netmap, CTLFLAG_RW, 0, "Netmap args"); -SYSCTL_INT(_dev_netmap, OID_AUTO, verbose, - CTLFLAG_RW, &netmap_verbose, 0, "Verbose mode"); -SYSCTL_INT(_dev_netmap, OID_AUTO, no_timestamp, - CTLFLAG_RW, &netmap_no_timestamp, 0, "no_timestamp"); int netmap_mitigate = 1; -SYSCTL_INT(_dev_netmap, OID_AUTO, mitigate, CTLFLAG_RW, &netmap_mitigate, 0, ""); int netmap_no_pendintr = 1; -SYSCTL_INT(_dev_netmap, OID_AUTO, no_pendintr, - CTLFLAG_RW, &netmap_no_pendintr, 0, "Always look for new received packets."); int netmap_txsync_retry = 2; -SYSCTL_INT(_dev_netmap, OID_AUTO, txsync_retry, CTLFLAG_RW, - &netmap_txsync_retry, 0 , "Number of txsync loops in bridge's flush."); - -int netmap_adaptive_io = 0; -SYSCTL_INT(_dev_netmap, OID_AUTO, adaptive_io, CTLFLAG_RW, - &netmap_adaptive_io, 0 , "Adaptive I/O on paravirt"); - int netmap_flags = 0; /* debug flags */ -int netmap_fwd = 0; /* force transparent mode */ +static int netmap_fwd = 0; /* force transparent mode */ /* * netmap_admode selects the netmap mode to use. 
* Invalid values are reset to NETMAP_ADMODE_BEST */ -enum { NETMAP_ADMODE_BEST = 0, /* use native, fallback to generic */ +enum { NETMAP_ADMODE_BEST = 0, /* use native, fallback to generic */ NETMAP_ADMODE_NATIVE, /* either native or none */ NETMAP_ADMODE_GENERIC, /* force generic */ NETMAP_ADMODE_LAST }; static int netmap_admode = NETMAP_ADMODE_BEST; -int netmap_generic_mit = 100*1000; /* Generic mitigation interval in nanoseconds. */ -int netmap_generic_ringsize = 1024; /* Generic ringsize. */ -int netmap_generic_rings = 1; /* number of queues in generic. */ +/* netmap_generic_mit controls mitigation of RX notifications for + * the generic netmap adapter. The value is a time interval in + * nanoseconds. */ +int netmap_generic_mit = 100*1000; + +/* We use by default netmap-aware qdiscs with generic netmap adapters, + * even if there can be a little performance hit with hardware NICs. + * However, using the qdisc is the safer approach, for two reasons: + * 1) it prevents non-fifo qdiscs to break the TX notification + * scheme, which is based on mbuf destructors when txqdisc is + * not used. + * 2) it makes it possible to transmit over software devices that + * change skb->dev, like bridge, veth, ... + * + * Anyway users looking for the best performance should + * use native adapters. + */ +int netmap_generic_txqdisc = 1; + +/* Default number of slots and queues for generic adapters. */ +int netmap_generic_ringsize = 1024; +int netmap_generic_rings = 1; + +/* Non-zero if ptnet devices are allowed to use virtio-net headers. */ +int ptnet_vnet_hdr = 1; + +/* + * SYSCTL calls are grouped between SYSBEGIN and SYSEND to be emulated + * in some other operating systems + */ +SYSBEGIN(main_init); + +SYSCTL_DECL(_dev_netmap); +SYSCTL_NODE(_dev, OID_AUTO, netmap, CTLFLAG_RW, 0, "Netmap args"); +SYSCTL_INT(_dev_netmap, OID_AUTO, verbose, + CTLFLAG_RW, &netmap_verbose, 0, "Verbose mode"); +SYSCTL_INT(_dev_netmap, OID_AUTO, no_timestamp, + CTLFLAG_RW, &netmap_no_timestamp, 0, "no_timestamp"); +SYSCTL_INT(_dev_netmap, OID_AUTO, mitigate, CTLFLAG_RW, &netmap_mitigate, 0, ""); +SYSCTL_INT(_dev_netmap, OID_AUTO, no_pendintr, + CTLFLAG_RW, &netmap_no_pendintr, 0, "Always look for new received packets."); +SYSCTL_INT(_dev_netmap, OID_AUTO, txsync_retry, CTLFLAG_RW, + &netmap_txsync_retry, 0 , "Number of txsync loops in bridge's flush."); SYSCTL_INT(_dev_netmap, OID_AUTO, flags, CTLFLAG_RW, &netmap_flags, 0 , ""); SYSCTL_INT(_dev_netmap, OID_AUTO, fwd, CTLFLAG_RW, &netmap_fwd, 0 , ""); @@ -540,19 +546,24 @@ SYSCTL_INT(_dev_netmap, OID_AUTO, generic_mit, CTLFLAG_RW, &netmap_generic_mit, 0 , ""); SYSCTL_INT(_dev_netmap, OID_AUTO, generic_ringsize, CTLFLAG_RW, &netmap_generic_ringsize, 0 , ""); SYSCTL_INT(_dev_netmap, OID_AUTO, generic_rings, CTLFLAG_RW, &netmap_generic_rings, 0 , ""); +SYSCTL_INT(_dev_netmap, OID_AUTO, generic_txqdisc, CTLFLAG_RW, &netmap_generic_txqdisc, 0 , ""); +SYSCTL_INT(_dev_netmap, OID_AUTO, ptnet_vnet_hdr, CTLFLAG_RW, &ptnet_vnet_hdr, 0 , ""); + +SYSEND; NMG_LOCK_T netmap_global_lock; -int netmap_use_count = 0; /* number of active netmap instances */ /* * mark the ring as stopped, and run through the locks * to make sure other users get to see it. 
+ * stopped must be either NM_KR_STOPPED (for unbounded stop) + * or NM_KR_LOCKED (brief stop for mutual exclusion purposes) */ static void -netmap_disable_ring(struct netmap_kring *kr) +netmap_disable_ring(struct netmap_kring *kr, int stopped) { - kr->nkr_stopped = 1; - nm_kr_get(kr); + nm_kr_stop(kr, stopped); + // XXX check if nm_kr_stop is sufficient mtx_lock(&kr->q_lock); mtx_unlock(&kr->q_lock); nm_kr_put(kr); @@ -563,7 +574,7 @@ netmap_set_ring(struct netmap_adapter *na, u_int ring_id, enum txrx t, int stopped) { if (stopped) - netmap_disable_ring(NMR(na, t) + ring_id); + netmap_disable_ring(NMR(na, t) + ring_id, stopped); else NMR(na, t)[ring_id].nkr_stopped = 0; } @@ -590,13 +601,14 @@ * Convenience function used in drivers. Waits for current txsync()s/rxsync()s * to finish and prevents any new one from starting. Call this before turning * netmap mode off, or before removing the hardware rings (e.g., on module - * onload). As a rule of thumb for linux drivers, this should be placed near - * each napi_disable(). + * unload). */ void netmap_disable_all_rings(struct ifnet *ifp) { - netmap_set_all_rings(NA(ifp), 1 /* stopped */); + if (NM_NA_VALID(ifp)) { + netmap_set_all_rings(NA(ifp), NM_KR_STOPPED); + } } /* @@ -607,9 +619,34 @@ void netmap_enable_all_rings(struct ifnet *ifp) { - netmap_set_all_rings(NA(ifp), 0 /* enabled */); + if (NM_NA_VALID(ifp)) { + netmap_set_all_rings(NA(ifp), 0 /* enabled */); + } +} + +void +netmap_make_zombie(struct ifnet *ifp) +{ + if (NM_NA_VALID(ifp)) { + struct netmap_adapter *na = NA(ifp); + netmap_set_all_rings(na, NM_KR_LOCKED); + na->na_flags |= NAF_ZOMBIE; + netmap_set_all_rings(na, 0); + } } +void +netmap_undo_zombie(struct ifnet *ifp) +{ + if (NM_NA_VALID(ifp)) { + struct netmap_adapter *na = NA(ifp); + if (na->na_flags & NAF_ZOMBIE) { + netmap_set_all_rings(na, NM_KR_LOCKED); + na->na_flags &= ~NAF_ZOMBIE; + netmap_set_all_rings(na, 0); + } + } +} /* * generic bound_checking function @@ -727,28 +764,9 @@ return 1; } -static void netmap_txsync_to_host(struct netmap_adapter *na); -static int netmap_rxsync_from_host(struct netmap_adapter *na, struct thread *td, void *pwait); - -/* kring->nm_sync callback for the host tx ring */ -static int -netmap_txsync_to_host_compat(struct netmap_kring *kring, int flags) -{ - (void)flags; /* unused */ - netmap_txsync_to_host(kring->na); - return 0; -} - -/* kring->nm_sync callback for the host rx ring */ -static int -netmap_rxsync_from_host_compat(struct netmap_kring *kring, int flags) -{ - (void)flags; /* unused */ - netmap_rxsync_from_host(kring->na, NULL, NULL); - return 0; -} - - +/* nm_sync callbacks for the host rings */ +static int netmap_txsync_to_host(struct netmap_kring *kring, int flags); +static int netmap_rxsync_from_host(struct netmap_kring *kring, int flags); /* create the krings array and initialize the fields common to all adapters. * The array layout is this: @@ -789,7 +807,7 @@ len = (n[NR_TX] + n[NR_RX]) * sizeof(struct netmap_kring) + tailroom; - na->tx_rings = malloc((size_t)len, M_DEVBUF, M_NOWAIT | M_ZERO); + na->tx_rings = nm_os_malloc((size_t)len); if (na->tx_rings == NULL) { D("Cannot allocate krings"); return ENOMEM; } @@ -809,12 +827,14 @@ kring->ring_id = i; kring->tx = t; kring->nkr_num_slots = ndesc; + kring->nr_mode = NKR_NETMAP_OFF; + kring->nr_pending_mode = NKR_NETMAP_OFF; if (i < nma_get_nrings(na, t)) { kring->nm_sync = (t == NR_TX ? na->nm_txsync : na->nm_rxsync); - } else if (i == na->num_tx_rings) { + } else { kring->nm_sync = (t == NR_TX ?
- netmap_txsync_to_host_compat : - netmap_rxsync_from_host_compat); + netmap_txsync_to_host: + netmap_rxsync_from_host); } kring->nm_notify = na->nm_notify; kring->rhead = kring->rcur = kring->nr_hwcur = 0; @@ -822,14 +842,14 @@ * IMPORTANT: Always keep one slot empty. */ kring->rtail = kring->nr_hwtail = (t == NR_TX ? ndesc - 1 : 0); - snprintf(kring->name, sizeof(kring->name) - 1, "%s %s%d", na->name, + snprintf(kring->name, sizeof(kring->name) - 1, "%s %s%d", na->name, nm_txrx2str(t), i); ND("ktx %s h %d c %d t %d", kring->name, kring->rhead, kring->rcur, kring->rtail); mtx_init(&kring->q_lock, (t == NR_TX ? "nm_txq_lock" : "nm_rxq_lock"), NULL, MTX_DEF); - init_waitqueue_head(&kring->si); + nm_os_selinfo_init(&kring->si); } - init_waitqueue_head(&na->si[t]); + nm_os_selinfo_init(&na->si[t]); } na->tailroom = na->rx_rings + n[NR_RX]; @@ -838,19 +858,6 @@ } -#ifdef __FreeBSD__ -static void -netmap_knlist_destroy(NM_SELINFO_T *si) -{ - /* XXX kqueue(9) needed; these will mirror knlist_init. */ - knlist_delete(&si->si.si_note, curthread, 0 /* not locked */ ); - knlist_destroy(&si->si.si_note); - /* now we don't need the mutex anymore */ - mtx_destroy(&si->m); -} -#endif /* __FreeBSD__ */ - - /* undo the actions performed by netmap_krings_create */ /* call with NMG_LOCK held */ void @@ -860,14 +867,14 @@ enum txrx t; for_rx_tx(t) - netmap_knlist_destroy(&na->si[t]); + nm_os_selinfo_uninit(&na->si[t]); /* we rely on the krings layout described above */ for ( ; kring != na->tailroom; kring++) { mtx_destroy(&kring->q_lock); - netmap_knlist_destroy(&kring->si); + nm_os_selinfo_uninit(&kring->si); } - free(na->tx_rings, M_DEVBUF); + nm_os_free(na->tx_rings); na->tx_rings = na->rx_rings = na->tailroom = NULL; } @@ -878,14 +885,14 @@ * them first. */ /* call with NMG_LOCK held */ -static void +void netmap_hw_krings_delete(struct netmap_adapter *na) { struct mbq *q = &na->rx_rings[na->num_rx_rings].rx_queue; ND("destroy sw mbq with len %d", mbq_len(q)); mbq_purge(q); - mbq_safe_destroy(q); + mbq_safe_fini(q); netmap_krings_delete(na); } @@ -898,29 +905,38 @@ */ /* call with NMG_LOCK held */ static void netmap_unset_ringid(struct netmap_priv_d *); -static void netmap_rel_exclusive(struct netmap_priv_d *); -static void +static void netmap_krings_put(struct netmap_priv_d *); +void netmap_do_unregif(struct netmap_priv_d *priv) { struct netmap_adapter *na = priv->np_na; NMG_LOCK_ASSERT(); na->active_fds--; - /* release exclusive use if it was requested on regif */ - netmap_rel_exclusive(priv); - if (na->active_fds <= 0) { /* last instance */ - - if (netmap_verbose) - D("deleting last instance for %s", na->name); + /* unset nr_pending_mode and possibly release exclusive mode */ + netmap_krings_put(priv); #ifdef WITH_MONITOR + /* XXX check whether we have to do something with monitor + * when rings change nr_mode. */ + if (na->active_fds <= 0) { /* walk through all the rings and tell any monitor * that the port is going to exit netmap mode */ netmap_monitor_stop(na); + } #endif + + if (na->active_fds <= 0 || nm_kring_pending(priv)) { + na->nm_register(na, 0); + } + + /* delete rings and buffers that are no longer needed */ + netmap_mem_rings_delete(na); + + if (na->active_fds <= 0) { /* last instance */ /* - * (TO CHECK) This function is only called + * (TO CHECK) We enter here * when the last reference to this file descriptor goes * away. This means we cannot have any pending poll() * or interrupt routine operating on the structure. 
@@ -933,16 +949,16 @@ * happens if the close() occurs while a concurrent * syscall is running. */ - na->nm_register(na, 0); /* off, clear flags */ - /* Wake up any sleeping threads. netmap_poll will - * then return POLLERR - * XXX The wake up now must happen during *_down(), when - * we order all activities to stop. -gl - */ - /* delete rings and buffers */ - netmap_mem_rings_delete(na); + if (netmap_verbose) + D("deleting last instance for %s", na->name); + + if (nm_netmap_on(na)) { + D("BUG: netmap on while going to delete the krings"); + } + na->nm_krings_delete(na); } + /* possibily decrement counter of tx_si/rx_si users */ netmap_unset_ringid(priv); /* delete the nifp */ @@ -962,6 +978,19 @@ (priv->np_qlast[t] - priv->np_qfirst[t] > 1)); } +struct netmap_priv_d* +netmap_priv_new(void) +{ + struct netmap_priv_d *priv; + + priv = nm_os_malloc(sizeof(struct netmap_priv_d)); + if (priv == NULL) + return NULL; + priv->np_refs = 1; + nm_os_get_module(); + return priv; +} + /* * Destructor of the netmap_priv_d, called when the fd is closed * Action: undo all the things done by NIOCREGIF, @@ -971,22 +1000,22 @@ * */ /* call with NMG_LOCK held */ -int -netmap_dtor_locked(struct netmap_priv_d *priv) +void +netmap_priv_delete(struct netmap_priv_d *priv) { struct netmap_adapter *na = priv->np_na; /* number of active references to this fd */ if (--priv->np_refs > 0) { - return 0; - } - netmap_use_count--; - if (!na) { - return 1; //XXX is it correct? + return; } - netmap_do_unregif(priv); - netmap_adapter_put(na); - return 1; + nm_os_put_module(); + if (na) { + netmap_do_unregif(priv); + } + netmap_unget_na(na, priv->np_ifp); + bzero(priv, sizeof(*priv)); /* for safety */ + nm_os_free(priv); } @@ -995,15 +1024,10 @@ netmap_dtor(void *data) { struct netmap_priv_d *priv = data; - int last_instance; NMG_LOCK(); - last_instance = netmap_dtor_locked(priv); + netmap_priv_delete(priv); NMG_UNLOCK(); - if (last_instance) { - bzero(priv, sizeof(*priv)); /* for safety */ - free(priv, M_DEVBUF); - } } @@ -1036,14 +1060,19 @@ netmap_send_up(struct ifnet *dst, struct mbq *q) { struct mbuf *m; + struct mbuf *head = NULL, *prev = NULL; /* send packets up, outside the lock */ while ((m = mbq_dequeue(q)) != NULL) { if (netmap_verbose & NM_VERB_HOST) D("sending up pkt %p size %d", m, MBUF_LEN(m)); - NM_SEND_UP(dst, m); - } - mbq_destroy(q); + prev = nm_os_send_up(dst, m, prev); + if (head == NULL) + head = prev; + } + if (head) + nm_os_send_up(dst, NULL, head); + mbq_fini(q); } @@ -1081,6 +1110,27 @@ } } +static inline int +_nm_may_forward(struct netmap_kring *kring) +{ + return ((netmap_fwd || kring->ring->flags & NR_FORWARD) && + kring->na->na_flags & NAF_HOST_RINGS && + kring->tx == NR_RX); +} + +static inline int +nm_may_forward_up(struct netmap_kring *kring) +{ + return _nm_may_forward(kring) && + kring->ring_id != kring->na->num_rx_rings; +} + +static inline int +nm_may_forward_down(struct netmap_kring *kring) +{ + return _nm_may_forward(kring) && + kring->ring_id == kring->na->num_rx_rings; +} /* * Send to the NIC rings packets marked NS_FORWARD between @@ -1107,7 +1157,7 @@ for (; rxcur != head && !nm_ring_empty(rdst); rxcur = nm_next(rxcur, src_lim) ) { struct netmap_slot *src, *dst, tmp; - u_int dst_cur = rdst->cur; + u_int dst_head = rdst->head; src = &rxslot[rxcur]; if ((src->flags & NS_FORWARD) == 0 && !netmap_fwd) @@ -1115,7 +1165,7 @@ sent++; - dst = &rdst->slot[dst_cur]; + dst = &rdst->slot[dst_head]; tmp = *src; @@ -1126,7 +1176,7 @@ dst->len = tmp.len; dst->flags = NS_BUF_CHANGED; - rdst->cur = 
nm_next(dst_cur, dst_lim); + rdst->head = rdst->cur = nm_next(dst_head, dst_lim); } /* if (sent) XXX txsync ? */ } @@ -1140,10 +1190,10 @@ * can be among multiple user threads erroneously calling * this routine concurrently. */ -static void -netmap_txsync_to_host(struct netmap_adapter *na) +static int +netmap_txsync_to_host(struct netmap_kring *kring, int flags) { - struct netmap_kring *kring = &na->tx_rings[na->num_tx_rings]; + struct netmap_adapter *na = kring->na; u_int const lim = kring->nkr_num_slots - 1; u_int const head = kring->rhead; struct mbq q; @@ -1162,6 +1212,7 @@ kring->nr_hwtail -= lim + 1; netmap_send_up(na->ifp, &q); + return 0; } @@ -1171,17 +1222,15 @@ * We protect access to the kring using kring->rx_queue.lock * * This routine also does the selrecord if called from the poll handler - * (we know because td != NULL). + * (we know because sr != NULL). * - * NOTE: on linux, selrecord() is defined as a macro and uses pwait - * as an additional hidden argument. * returns the number of packets delivered to tx queues in * transparent mode, or a negative value if error */ static int -netmap_rxsync_from_host(struct netmap_adapter *na, struct thread *td, void *pwait) +netmap_rxsync_from_host(struct netmap_kring *kring, int flags) { - struct netmap_kring *kring = &na->rx_rings[na->num_rx_rings]; + struct netmap_adapter *na = kring->na; struct netmap_ring *ring = kring->ring; u_int nm_i, n; u_int const lim = kring->nkr_num_slots - 1; @@ -1189,9 +1238,6 @@ int ret = 0; struct mbq *q = &kring->rx_queue, fq; - (void)pwait; /* disable unused warnings */ - (void)td; - mbq_init(&fq); /* fq holds packets to be freed */ mbq_lock(q); @@ -1226,19 +1272,20 @@ */ nm_i = kring->nr_hwcur; if (nm_i != head) { /* something was released */ - if (netmap_fwd || kring->ring->flags & NR_FORWARD) + if (nm_may_forward_down(kring)) { ret = netmap_sw_to_nic(na); + if (ret > 0) { + kring->nr_kflags |= NR_FORWARD; + ret = 0; + } + } kring->nr_hwcur = head; } - /* access copies of cur,tail in the kring */ - if (kring->rcur == kring->rtail && td) /* no bufs available */ - OS_selrecord(td, &kring->si); - mbq_unlock(q); mbq_purge(&fq); - mbq_destroy(&fq); + mbq_fini(&fq); return ret; } @@ -1267,17 +1314,14 @@ * 0 NETMAP_ADMODE_GENERIC GENERIC GENERIC * */ - +static void netmap_hw_dtor(struct netmap_adapter *); /* needed by NM_IS_NATIVE() */ int netmap_get_hw_na(struct ifnet *ifp, struct netmap_adapter **na) { /* generic support */ int i = netmap_admode; /* Take a snapshot. */ struct netmap_adapter *prev_na; -#ifdef WITH_GENERIC - struct netmap_generic_adapter *gna; int error = 0; -#endif *na = NULL; /* default */ @@ -1285,7 +1329,7 @@ if (i < NETMAP_ADMODE_BEST || i >= NETMAP_ADMODE_LAST) i = netmap_admode = NETMAP_ADMODE_BEST; - if (NETMAP_CAPABLE(ifp)) { + if (NM_NA_VALID(ifp)) { prev_na = NA(ifp); /* If an adapter already exists, return it if * there are active file descriptors or if @@ -1310,10 +1354,9 @@ /* If there isn't native support and netmap is not allowed * to use generic adapters, we cannot satisfy the request. */ - if (!NETMAP_CAPABLE(ifp) && i == NETMAP_ADMODE_NATIVE) + if (!NM_IS_NATIVE(ifp) && i == NETMAP_ADMODE_NATIVE) return EOPNOTSUPP; -#ifdef WITH_GENERIC /* Otherwise, create a generic adapter and return it, * saving the previously used netmap adapter, if any. * @@ -1328,25 +1371,12 @@ * the branches above. This ensures that we never override * a generic adapter with another generic adapter. 
*/ - prev_na = NA(ifp); error = generic_netmap_attach(ifp); if (error) return error; *na = NA(ifp); - gna = (struct netmap_generic_adapter*)NA(ifp); - gna->prev = prev_na; /* save old na */ - if (prev_na != NULL) { - ifunit_ref(ifp->if_xname); - // XXX add a refcount ? - netmap_adapter_get(prev_na); - } - ND("Created generic NA %p (prev %p)", gna, gna->prev); - return 0; -#else /* !WITH_GENERIC */ - return EOPNOTSUPP; -#endif } @@ -1364,21 +1394,22 @@ * could not be allocated. * If successful, hold a reference to the netmap adapter. * - * No reference is kept on the real interface, which may then - * disappear at any time. + * If the interface specified by nmr is a system one, also keep + * a reference to it and return a valid *ifp. */ int -netmap_get_na(struct nmreq *nmr, struct netmap_adapter **na, int create) +netmap_get_na(struct nmreq *nmr, struct netmap_adapter **na, + struct ifnet **ifp, int create) { - struct ifnet *ifp = NULL; int error = 0; struct netmap_adapter *ret = NULL; *na = NULL; /* default return value */ + *ifp = NULL; NMG_LOCK_ASSERT(); - /* we cascade through all possible types of netmap adapter. + /* We cascade through all possible types of netmap adapter. * All netmap_get_*_na() functions return an error and an na, * with the following combinations: * @@ -1389,6 +1420,11 @@ * !0 !NULL impossible */ + /* try to see if this is a ptnetmap port */ + error = netmap_get_pt_host_na(nmr, na, create); + if (error || *na != NULL) + return error; + /* try to see if this is a monitor port */ error = netmap_get_monitor_na(nmr, na, create); if (error || *na != NULL) @@ -1413,12 +1449,12 @@ * This may still be a tap, a veth/epair, or even a * persistent VALE port. */ - ifp = ifunit_ref(nmr->nr_name); - if (ifp == NULL) { + *ifp = ifunit_ref(nmr->nr_name); + if (*ifp == NULL) { return ENXIO; } - error = netmap_get_hw_na(ifp, &ret); + error = netmap_get_hw_na(*ifp, &ret); if (error) goto out; @@ -1426,15 +1462,42 @@ netmap_adapter_get(ret); out: - if (error && ret != NULL) - netmap_adapter_put(ret); - - if (ifp) - if_rele(ifp); /* allow live unloading of drivers modules */ + if (error) { + if (ret) + netmap_adapter_put(ret); + if (*ifp) { + if_rele(*ifp); + *ifp = NULL; + } + } return error; } +/* undo netmap_get_na() */ +void +netmap_unget_na(struct netmap_adapter *na, struct ifnet *ifp) +{ + if (ifp) + if_rele(ifp); + if (na) + netmap_adapter_put(na); +} + + +#define NM_FAIL_ON(t) do { \ + if (unlikely(t)) { \ + RD(5, "%s: fail '" #t "' " \ + "h %d c %d t %d " \ + "rh %d rc %d rt %d " \ + "hc %d ht %d", \ + kring->name, \ + head, cur, ring->tail, \ + kring->rhead, kring->rcur, kring->rtail, \ + kring->nr_hwcur, kring->nr_hwtail); \ + return kring->nkr_num_slots; \ + } \ +} while (0) /* * validate parameters on entry for *_txsync() @@ -1449,11 +1512,9 @@ * * hwcur, rhead, rtail and hwtail are reliable */ -static u_int -nm_txsync_prologue(struct netmap_kring *kring) +u_int +nm_txsync_prologue(struct netmap_kring *kring, struct netmap_ring *ring) { -#define NM_ASSERT(t) if (t) { D("fail " #t); goto error; } - struct netmap_ring *ring = kring->ring; u_int head = ring->head; /* read only once */ u_int cur = ring->cur; /* read only once */ u_int n = kring->nkr_num_slots; @@ -1463,54 +1524,44 @@ kring->nr_hwcur, kring->nr_hwtail, ring->head, ring->cur, ring->tail); #if 1 /* kernel sanity checks; but we can trust the kring. 
*/ - if (kring->nr_hwcur >= n || kring->rhead >= n || - kring->rtail >= n || kring->nr_hwtail >= n) - goto error; + NM_FAIL_ON(kring->nr_hwcur >= n || kring->rhead >= n || + kring->rtail >= n || kring->nr_hwtail >= n); #endif /* kernel sanity checks */ /* - * user sanity checks. We only use 'cur', - * A, B, ... are possible positions for cur: + * user sanity checks. We only use head, + * A, B, ... are possible positions for head: * - * 0 A cur B tail C n-1 - * 0 D tail E cur F n-1 + * 0 A rhead B rtail C n-1 + * 0 D rtail E rhead F n-1 * * B, F, D are valid. A, C, E are wrong */ if (kring->rtail >= kring->rhead) { /* want rhead <= head <= rtail */ - NM_ASSERT(head < kring->rhead || head > kring->rtail); + NM_FAIL_ON(head < kring->rhead || head > kring->rtail); /* and also head <= cur <= rtail */ - NM_ASSERT(cur < head || cur > kring->rtail); + NM_FAIL_ON(cur < head || cur > kring->rtail); } else { /* here rtail < rhead */ /* we need head outside rtail .. rhead */ - NM_ASSERT(head > kring->rtail && head < kring->rhead); + NM_FAIL_ON(head > kring->rtail && head < kring->rhead); /* two cases now: head <= rtail or head >= rhead */ if (head <= kring->rtail) { /* want head <= cur <= rtail */ - NM_ASSERT(cur < head || cur > kring->rtail); + NM_FAIL_ON(cur < head || cur > kring->rtail); } else { /* head >= rhead */ /* cur must be outside rtail..head */ - NM_ASSERT(cur > kring->rtail && cur < head); + NM_FAIL_ON(cur > kring->rtail && cur < head); } } if (ring->tail != kring->rtail) { - RD(5, "tail overwritten was %d need %d", + RD(5, "%s tail overwritten was %d need %d", kring->name, ring->tail, kring->rtail); ring->tail = kring->rtail; } kring->rhead = head; kring->rcur = cur; return head; - -error: - RD(5, "%s kring error: head %d cur %d tail %d rhead %d rcur %d rtail %d hwcur %d hwtail %d", - kring->name, - head, cur, ring->tail, - kring->rhead, kring->rcur, kring->rtail, - kring->nr_hwcur, kring->nr_hwtail); - return n; -#undef NM_ASSERT } @@ -1525,10 +1576,9 @@ * hwcur and hwtail are reliable. 
* */ -static u_int -nm_rxsync_prologue(struct netmap_kring *kring) +u_int +nm_rxsync_prologue(struct netmap_kring *kring, struct netmap_ring *ring) { - struct netmap_ring *ring = kring->ring; uint32_t const n = kring->nkr_num_slots; uint32_t head, cur; @@ -1546,30 +1596,24 @@ cur = kring->rcur = ring->cur; /* read only once */ head = kring->rhead = ring->head; /* read only once */ #if 1 /* kernel sanity checks */ - if (kring->nr_hwcur >= n || kring->nr_hwtail >= n) - goto error; + NM_FAIL_ON(kring->nr_hwcur >= n || kring->nr_hwtail >= n); #endif /* kernel sanity checks */ /* user sanity checks */ if (kring->nr_hwtail >= kring->nr_hwcur) { /* want hwcur <= rhead <= hwtail */ - if (head < kring->nr_hwcur || head > kring->nr_hwtail) - goto error; + NM_FAIL_ON(head < kring->nr_hwcur || head > kring->nr_hwtail); /* and also rhead <= rcur <= hwtail */ - if (cur < head || cur > kring->nr_hwtail) - goto error; + NM_FAIL_ON(cur < head || cur > kring->nr_hwtail); } else { /* we need rhead outside hwtail..hwcur */ - if (head < kring->nr_hwcur && head > kring->nr_hwtail) - goto error; + NM_FAIL_ON(head < kring->nr_hwcur && head > kring->nr_hwtail); /* two cases now: head <= hwtail or head >= hwcur */ if (head <= kring->nr_hwtail) { /* want head <= cur <= hwtail */ - if (cur < head || cur > kring->nr_hwtail) - goto error; + NM_FAIL_ON(cur < head || cur > kring->nr_hwtail); } else { /* cur must be outside hwtail..head */ - if (cur < head && cur > kring->nr_hwtail) - goto error; + NM_FAIL_ON(cur < head && cur > kring->nr_hwtail); } } if (ring->tail != kring->rtail) { @@ -1579,13 +1623,6 @@ ring->tail = kring->rtail; } return head; - -error: - RD(5, "kring error: hwcur %d rcur %d hwtail %d head %d cur %d tail %d", - kring->nr_hwcur, - kring->rcur, kring->nr_hwtail, - kring->rhead, kring->rcur, ring->tail); - return n; } @@ -1659,6 +1696,7 @@ struct netmap_adapter *na = priv->np_na; u_int j, i = ringid & NETMAP_RING_MASK; u_int reg = flags & NR_REG_MASK; + int excluded_direction[] = { NR_TX_RINGS_ONLY, NR_RX_RINGS_ONLY }; enum txrx t; if (reg == NR_REG_DEFAULT) { @@ -1672,48 +1710,58 @@ } D("deprecated API, old ringid 0x%x -> ringid %x reg %d", ringid, i, reg); } - switch (reg) { - case NR_REG_ALL_NIC: - case NR_REG_PIPE_MASTER: - case NR_REG_PIPE_SLAVE: - for_rx_tx(t) { + + if ((flags & NR_PTNETMAP_HOST) && (reg != NR_REG_ALL_NIC || + flags & (NR_RX_RINGS_ONLY|NR_TX_RINGS_ONLY))) { + D("Error: only NR_REG_ALL_NIC supported with netmap passthrough"); + return EINVAL; + } + + for_rx_tx(t) { + if (flags & excluded_direction[t]) { + priv->np_qfirst[t] = priv->np_qlast[t] = 0; + continue; + } + switch (reg) { + case NR_REG_ALL_NIC: + case NR_REG_PIPE_MASTER: + case NR_REG_PIPE_SLAVE: priv->np_qfirst[t] = 0; priv->np_qlast[t] = nma_get_nrings(na, t); - } - ND("%s %d %d", "ALL/PIPE", - priv->np_qfirst[NR_RX], priv->np_qlast[NR_RX]); - break; - case NR_REG_SW: - case NR_REG_NIC_SW: - if (!(na->na_flags & NAF_HOST_RINGS)) { - D("host rings not supported"); - return EINVAL; - } - for_rx_tx(t) { + ND("ALL/PIPE: %s %d %d", nm_txrx2str(t), + priv->np_qfirst[t], priv->np_qlast[t]); + break; + case NR_REG_SW: + case NR_REG_NIC_SW: + if (!(na->na_flags & NAF_HOST_RINGS)) { + D("host rings not supported"); + return EINVAL; + } priv->np_qfirst[t] = (reg == NR_REG_SW ? nma_get_nrings(na, t) : 0); priv->np_qlast[t] = nma_get_nrings(na, t) + 1; - } - ND("%s %d %d", reg == NR_REG_SW ? 
"SW" : "NIC+SW", - priv->np_qfirst[NR_RX], priv->np_qlast[NR_RX]); - break; - case NR_REG_ONE_NIC: - if (i >= na->num_tx_rings && i >= na->num_rx_rings) { - D("invalid ring id %d", i); - return EINVAL; - } - for_rx_tx(t) { + ND("%s: %s %d %d", reg == NR_REG_SW ? "SW" : "NIC+SW", + nm_txrx2str(t), + priv->np_qfirst[t], priv->np_qlast[t]); + break; + case NR_REG_ONE_NIC: + if (i >= na->num_tx_rings && i >= na->num_rx_rings) { + D("invalid ring id %d", i); + return EINVAL; + } /* if not enough rings, use the first one */ j = i; if (j >= nma_get_nrings(na, t)) j = 0; priv->np_qfirst[t] = j; priv->np_qlast[t] = j + 1; + ND("ONE_NIC: %s %d %d", nm_txrx2str(t), + priv->np_qfirst[t], priv->np_qlast[t]); + break; + default: + D("invalid regif type %d", reg); + return EINVAL; } - break; - default: - D("invalid regif type %d", reg); - return EINVAL; } priv->np_flags = (flags & ~NR_REG_MASK) | reg; @@ -1776,11 +1824,12 @@ } -/* check that the rings we want to bind are not exclusively owned by a previous - * bind. If exclusive ownership has been requested, we also mark the rings. +/* Set the nr_pending_mode for the requested rings. + * If requested, also try to get exclusive access to the rings, provided + * the rings we want to bind are not exclusively owned by a previous bind. */ static int -netmap_get_exclusive(struct netmap_priv_d *priv) +netmap_krings_get(struct netmap_priv_d *priv) { struct netmap_adapter *na = priv->np_na; u_int i; @@ -1811,16 +1860,16 @@ } } - /* second round: increment usage cound and possibly - * mark as exclusive + /* second round: increment usage count (possibly marking them + * as exclusive) and set the nr_pending_mode */ - for_rx_tx(t) { for (i = priv->np_qfirst[t]; i < priv->np_qlast[t]; i++) { kring = &NMR(na, t)[i]; kring->users++; if (excl) kring->nr_kflags |= NKR_EXCLUSIVE; + kring->nr_pending_mode = NKR_NETMAP_ON; } } @@ -1828,9 +1877,11 @@ } -/* undo netmap_get_ownership() */ +/* Undo netmap_krings_get(). This is done by clearing the exclusive mode + * if was asked on regif, and unset the nr_pending_mode if we are the + * last users of the involved rings. */ static void -netmap_rel_exclusive(struct netmap_priv_d *priv) +netmap_krings_put(struct netmap_priv_d *priv) { struct netmap_adapter *na = priv->np_na; u_int i; @@ -1852,6 +1903,8 @@ if (excl) kring->nr_kflags &= ~NKR_EXCLUSIVE; kring->users--; + if (kring->users == 0) + kring->nr_pending_mode = NKR_NETMAP_OFF; } } } @@ -1899,9 +1952,8 @@ * (put the adapter in netmap mode) * * This may be one of the following: - * (XXX these should be either all *_register or all *_reg 2014-03-15) * - * * netmap_hw_register (hw ports) + * * netmap_hw_reg (hw ports) * checks that the ifp is still there, then calls * the hardware specific callback; * @@ -1919,7 +1971,7 @@ * intercept the sync callbacks of the monitored * rings * - * * netmap_bwrap_register (bwraps) + * * netmap_bwrap_reg (bwraps) * cross-link the bwrap and hwna rings, * forward the request to the hwna, override * the hwna notify callback (to get the frames @@ -1948,7 +2000,7 @@ if (na->active_fds == 0) { /* * If this is the first registration of the adapter, - * also create the netmap rings and their in-kernel view, + * create the in-kernel view of the netmap rings, * the netmap krings. 
*/ @@ -1960,39 +2012,48 @@ if (error) goto err_drop_mem; - /* create all missing netmap rings */ - error = netmap_mem_rings_create(na); - if (error) - goto err_del_krings; } - /* now the kring must exist and we can check whether some - * previous bind has exclusive ownership on them + /* now the krings must exist and we can check whether some + * previous bind has exclusive ownership on them, and set + * nr_pending_mode */ - error = netmap_get_exclusive(priv); + error = netmap_krings_get(priv); if (error) - goto err_del_rings; + goto err_del_krings; + + /* create all needed missing netmap rings */ + error = netmap_mem_rings_create(na); + if (error) + goto err_rel_excl; /* in all cases, create a new netmap if */ nifp = netmap_mem_if_new(na); if (nifp == NULL) { error = ENOMEM; - goto err_rel_excl; + goto err_del_rings; } - na->active_fds++; - if (!nm_netmap_on(na)) { - /* Netmap not active, set the card in netmap mode - * and make it use the shared buffers. - */ + if (na->active_fds == 0) { /* cache the allocator info in the na */ - netmap_mem_get_lut(na->nm_mem, &na->na_lut); - ND("%p->na_lut == %p", na, na->na_lut.lut); - error = na->nm_register(na, 1); /* mode on */ - if (error) + error = netmap_mem_get_lut(na->nm_mem, &na->na_lut); + if (error) goto err_del_if; + ND("lut %p bufs %u size %u", na->na_lut.lut, na->na_lut.objtotal, + na->na_lut.objsize); + } + + if (nm_kring_pending(priv)) { + /* Some kring is switching mode, tell the adapter to + * react on this. */ + error = na->nm_register(na, 1); + if (error) + goto err_put_lut; } + /* Commit the reference. */ + na->active_fds++; + /* * advertise that the interface is ready by setting np_nifp. * The barrier is needed because readers (poll, *SYNC and mmap) @@ -2003,15 +2064,15 @@ return 0; +err_put_lut: + if (na->active_fds == 0) + memset(&na->na_lut, 0, sizeof(na->na_lut)); err_del_if: - memset(&na->na_lut, 0, sizeof(na->na_lut)); - na->active_fds--; netmap_mem_if_delete(na, nifp); err_rel_excl: - netmap_rel_exclusive(priv); + netmap_krings_put(priv); err_del_rings: - if (na->active_fds == 0) - netmap_mem_rings_delete(na); + netmap_mem_rings_delete(na); err_del_krings: if (na->active_fds == 0) na->nm_krings_delete(na); @@ -2024,40 +2085,45 @@ /* - * update kring and ring at the end of txsync. + * update kring and ring at the end of rxsync/txsync. */ static inline void -nm_txsync_finalize(struct netmap_kring *kring) +nm_sync_finalize(struct netmap_kring *kring) { - /* update ring tail to what the kernel knows */ + /* + * Update ring tail to what the kernel knows + * After txsync: head/rhead/hwcur might be behind cur/rcur + * if no carrier. 
+ */ kring->ring->tail = kring->rtail = kring->nr_hwtail; - /* note, head/rhead/hwcur might be behind cur/rcur - * if no carrier - */ ND(5, "%s now hwcur %d hwtail %d head %d cur %d tail %d", kring->name, kring->nr_hwcur, kring->nr_hwtail, kring->rhead, kring->rcur, kring->rtail); } - -/* - * update kring and ring at the end of rxsync - */ -static inline void -nm_rxsync_finalize(struct netmap_kring *kring) +static int +nm_override_mem(struct netmap_adapter *na, nm_memid_t id) { - /* tell userspace that there might be new packets */ - //struct netmap_ring *ring = kring->ring; - ND("head %d cur %d tail %d -> %d", ring->head, ring->cur, ring->tail, - kring->nr_hwtail); - kring->ring->tail = kring->rtail = kring->nr_hwtail; - /* make a copy of the state for next round */ - kring->rhead = kring->ring->head; - kring->rcur = kring->ring->cur; -} + struct netmap_mem_d *nmd; + + if (id == 0 || netmap_mem_get_id(na->nm_mem) == id) + return 0; + + if (na->na_flags & NAF_MEM_OWNER) + return EINVAL; + if (na->active_fds > 0) + return EBUSY; + nmd = netmap_mem_find(id); + if (nmd == NULL) + return ENOENT; + + netmap_mem_put(na->nm_mem); + na->nm_mem = nmd; + return 0; +} /* * ioctl(2) support for the "netmap" device. @@ -2072,21 +2138,17 @@ * Return 0 on success, errno otherwise. */ int -netmap_ioctl(struct cdev *dev, u_long cmd, caddr_t data, - int fflag, struct thread *td) +netmap_ioctl(struct netmap_priv_d *priv, u_long cmd, caddr_t data, struct thread *td) { - struct netmap_priv_d *priv = NULL; struct nmreq *nmr = (struct nmreq *) data; struct netmap_adapter *na = NULL; - int error; + struct ifnet *ifp = NULL; + int error = 0; u_int i, qfirst, qlast; struct netmap_if *nifp; struct netmap_kring *krings; enum txrx t; - (void)dev; /* UNUSED */ - (void)fflag; /* UNUSED */ - if (cmd == NIOCGINFO || cmd == NIOCREGIF) { /* truncate name */ nmr->nr_name[sizeof(nmr->nr_name) - 1] = '\0'; @@ -2101,15 +2163,6 @@ return EINVAL; } } - CURVNET_SET(TD_TO_VNET(td)); - - error = devfs_get_cdevpriv((void **)&priv); - if (error) { - CURVNET_RESTORE(); - /* XXX ENOENT should be impossible, since the priv - * is now created in the open */ - return (error == ENOENT ? ENXIO : error); - } switch (cmd) { case NIOCGINFO: /* return capabilities etc */ @@ -2125,13 +2178,23 @@ u_int memflags; if (nmr->nr_name[0] != '\0') { + /* get a refcount */ - error = netmap_get_na(nmr, &na, 1 /* create */); - if (error) + error = netmap_get_na(nmr, &na, &ifp, 1 /* create */); + if (error) { + na = NULL; + ifp = NULL; break; + } nmd = na->nm_mem; /* get memory allocator */ } + error = nm_override_mem(na, nmr->nr_arg2); + if (error) { + netmap_unget_na(na, ifp); + break; + } + error = netmap_mem_get_info(nmd, &nmr->nr_memsize, &memflags, &nmr->nr_arg2); if (error) @@ -2145,20 +2208,47 @@ nmr->nr_tx_rings = na->num_tx_rings; nmr->nr_rx_slots = na->num_rx_desc; nmr->nr_tx_slots = na->num_tx_desc; - netmap_adapter_put(na); } while (0); + netmap_unget_na(na, ifp); NMG_UNLOCK(); break; case NIOCREGIF: - /* possibly attach/detach NIC and VALE switch */ + /* + * If nmr->nr_cmd is not zero, this NIOCREGIF is not really + * a regif operation, but a different one, specified by the + * value of nmr->nr_cmd. 
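+ * A hypothetical userspace sketch for one of these subcommands
+ * (struct nmreq and NIOCREGIF are real; the snippet is only
+ * illustrative):
+ *
+ *   struct nmreq req;
+ *   bzero(&req, sizeof(req));
+ *   strlcpy(req.nr_name, "em0", sizeof(req.nr_name));
+ *   req.nr_version = NETMAP_API;
+ *   req.nr_cmd = NETMAP_VNET_HDR_GET;
+ *   ioctl(fd, NIOCREGIF, &req);
+ *
+ * after which the vnet header length is found in req.nr_arg1.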
+ */ i = nmr->nr_cmd; if (i == NETMAP_BDG_ATTACH || i == NETMAP_BDG_DETACH || i == NETMAP_BDG_VNET_HDR || i == NETMAP_BDG_NEWIF - || i == NETMAP_BDG_DELIF) { + || i == NETMAP_BDG_DELIF + || i == NETMAP_BDG_POLLING_ON + || i == NETMAP_BDG_POLLING_OFF) { + /* possibly attach/detach NIC and VALE switch */ error = netmap_bdg_ctl(nmr, NULL); break; + } else if (i == NETMAP_PT_HOST_CREATE || i == NETMAP_PT_HOST_DELETE) { + /* forward the command to the ptnetmap subsystem */ + error = ptnetmap_ctl(nmr, priv->np_na); + break; + } else if (i == NETMAP_VNET_HDR_GET) { + /* get vnet-header length for this netmap port */ + struct ifnet *ifp; + + NMG_LOCK(); + error = netmap_get_na(nmr, &na, &ifp, 0); + if (na && !error) { + nmr->nr_arg1 = na->virt_hdr_len; + } + netmap_unget_na(na, ifp); + NMG_UNLOCK(); + break; + } else if (i == NETMAP_POOLS_INFO_GET) { + /* get information from the memory allocator */ + error = netmap_mem_pools_info_get(nmr, priv->np_na); + break; } else if (i != 0) { D("nr_cmd must be 0 not %d", i); error = EINVAL; @@ -2169,23 +2259,38 @@ NMG_LOCK(); do { u_int memflags; + struct ifnet *ifp; if (priv->np_nifp != NULL) { /* thread already registered */ error = EBUSY; break; } /* find the interface and a reference */ - error = netmap_get_na(nmr, &na, 1 /* create */); /* keep reference */ + error = netmap_get_na(nmr, &na, &ifp, + 1 /* create */); /* keep reference */ if (error) break; if (NETMAP_OWNED_BY_KERN(na)) { - netmap_adapter_put(na); + netmap_unget_na(na, ifp); error = EBUSY; break; } + + if (na->virt_hdr_len && !(nmr->nr_flags & NR_ACCEPT_VNET_HDR)) { + netmap_unget_na(na, ifp); + error = EIO; + break; + } + + error = nm_override_mem(na, nmr->nr_arg2); + if (error) { + netmap_unget_na(na, ifp); + break; + } + error = netmap_do_regif(priv, na, nmr->nr_ringid, nmr->nr_flags); if (error) { /* reg. failed, release priv and ref */ - netmap_adapter_put(na); + netmap_unget_na(na, ifp); break; } nifp = priv->np_nifp; @@ -2200,7 +2305,7 @@ &nmr->nr_arg2); if (error) { netmap_do_unregif(priv); - netmap_adapter_put(na); + netmap_unget_na(na, ifp); break; } if (memflags & NETMAP_MEM_PRIVATE) { @@ -2212,12 +2317,17 @@ } if (nmr->nr_arg3) { - D("requested %d extra buffers", nmr->nr_arg3); + if (netmap_verbose) + D("requested %d extra buffers", nmr->nr_arg3); nmr->nr_arg3 = netmap_extra_alloc(na, &nifp->ni_bufs_head, nmr->nr_arg3); - D("got %d extra buffers", nmr->nr_arg3); + if (netmap_verbose) + D("got %d extra buffers", nmr->nr_arg3); } nmr->nr_offset = netmap_mem_if_offset(na->nm_mem, nifp); + + /* store ifp reference so that priv destructor may release it */ + priv->np_ifp = ifp; } while (0); NMG_UNLOCK(); break; @@ -2240,11 +2350,6 @@ break; } - if (!nm_netmap_on(na)) { - error = ENXIO; - break; - } - t = (cmd == NIOCTXSYNC ? NR_TX : NR_RX); krings = NMR(na, t); qfirst = priv->np_qfirst[t]; @@ -2252,31 +2357,34 @@ for (i = qfirst; i < qlast; i++) { struct netmap_kring *kring = krings + i; - if (nm_kr_tryget(kring)) { - error = EBUSY; - goto out; + struct netmap_ring *ring = kring->ring; + + if (unlikely(nm_kr_tryget(kring, 1, &error))) { + error = (error ? 
EIO : 0); + continue; } + if (cmd == NIOCTXSYNC) { if (netmap_verbose & NM_VERB_TXSYNC) D("pre txsync ring %d cur %d hwcur %d", - i, kring->ring->cur, + i, ring->cur, kring->nr_hwcur); - if (nm_txsync_prologue(kring) >= kring->nkr_num_slots) { + if (nm_txsync_prologue(kring, ring) >= kring->nkr_num_slots) { netmap_ring_reinit(kring); } else if (kring->nm_sync(kring, NAF_FORCE_RECLAIM) == 0) { - nm_txsync_finalize(kring); + nm_sync_finalize(kring); } if (netmap_verbose & NM_VERB_TXSYNC) D("post txsync ring %d cur %d hwcur %d", - i, kring->ring->cur, + i, ring->cur, kring->nr_hwcur); } else { - if (nm_rxsync_prologue(kring) >= kring->nkr_num_slots) { + if (nm_rxsync_prologue(kring, ring) >= kring->nkr_num_slots) { netmap_ring_reinit(kring); } else if (kring->nm_sync(kring, NAF_FORCE_READ) == 0) { - nm_rxsync_finalize(kring); + nm_sync_finalize(kring); } - microtime(&na->rx_rings[i].ring->ts); + microtime(&ring->ts); } nm_kr_put(kring); } @@ -2323,9 +2431,7 @@ error = EOPNOTSUPP; #endif /* linux */ } -out: - CURVNET_RESTORE(); return (error); } @@ -2345,17 +2451,15 @@ * hidden argument. */ int -netmap_poll(struct cdev *dev, int events, struct thread *td) +netmap_poll(struct netmap_priv_d *priv, int events, NM_SELRECORD_T *sr) { - struct netmap_priv_d *priv = NULL; struct netmap_adapter *na; struct netmap_kring *kring; + struct netmap_ring *ring; u_int i, check_all_tx, check_all_rx, want[NR_TXRX], revents = 0; #define want_tx want[NR_TX] #define want_rx want[NR_RX] struct mbq q; /* packets from hw queues to host stack */ - void *pwait = dev; /* linux compatibility */ - int is_kevent = 0; enum txrx t; /* @@ -2365,23 +2469,13 @@ */ int retry_tx = 1, retry_rx = 1; - (void)pwait; - mbq_init(&q); - - /* - * XXX kevent has curthread->tp_fop == NULL, - * so devfs_get_cdevpriv() fails. We circumvent this by passing - * priv as the first argument, which is also useful to avoid - * the selrecord() which are not necessary in that case. + /* transparent mode: send_down is 1 if we have found some + * packets to forward during the rx scan and we have not + * sent them down to the nic yet */ - if (devfs_get_cdevpriv((void **)&priv) != 0) { - is_kevent = 1; - if (netmap_verbose) - D("called from kevent"); - priv = (struct netmap_priv_d *)dev; - } - if (priv == NULL) - return POLLERR; + int send_down = 0; + + mbq_init(&q); if (priv->np_nifp == NULL) { D("No if registered"); @@ -2399,7 +2493,6 @@ want_tx = events & (POLLOUT | POLLWRNORM); want_rx = events & (POLLIN | POLLRDNORM); - /* * check_all_{tx|rx} are set if the card has more than one queue AND * the file descriptor is bound to all of them. If so, we sleep on @@ -2421,6 +2514,32 @@ * slots available. If this fails, then lock and call the sync * routines. 
*/ +#if 1 /* new code- call rx if any of the ring needs to release or read buffers */ + if (want_tx) { + t = NR_TX; + for (i = priv->np_qfirst[t]; want[t] && i < priv->np_qlast[t]; i++) { + kring = &NMR(na, t)[i]; + /* XXX compare ring->cur and kring->tail */ + if (!nm_ring_empty(kring->ring)) { + revents |= want[t]; + want[t] = 0; /* also breaks the loop */ + } + } + } + if (want_rx) { + want_rx = 0; /* look for a reason to run the handlers */ + t = NR_RX; + for (i = priv->np_qfirst[t]; i < priv->np_qlast[t]; i++) { + kring = &NMR(na, t)[i]; + if (kring->ring->cur == kring->ring->tail /* try fetch new buffers */ + || kring->rhead != kring->ring->head /* release buffers */) { + want_rx = 1; + } + } + if (!want_rx) + revents |= events & (POLLIN | POLLRDNORM); /* we have data */ + } +#else /* old code */ for_rx_tx(t) { for (i = priv->np_qfirst[t]; want[t] && i < priv->np_qlast[t]; i++) { kring = &NMR(na, t)[i]; @@ -2431,6 +2550,7 @@ } } } +#endif /* old code */ /* * If we want to push packets out (priv->np_txpoll) or @@ -2447,32 +2567,26 @@ * used to skip rings with no pending transmissions. */ flush_tx: - for (i = priv->np_qfirst[NR_TX]; i < priv->np_qlast[NR_RX]; i++) { + for (i = priv->np_qfirst[NR_TX]; i < priv->np_qlast[NR_TX]; i++) { int found = 0; kring = &na->tx_rings[i]; - if (!want_tx && kring->ring->cur == kring->nr_hwcur) + ring = kring->ring; + + if (!send_down && !want_tx && ring->cur == kring->nr_hwcur) continue; - /* only one thread does txsync */ - if (nm_kr_tryget(kring)) { - /* either busy or stopped - * XXX if the ring is stopped, sleeping would - * be better. In current code, however, we only - * stop the rings for brief intervals (2014-03-14) - */ - if (netmap_verbose) - RD(2, "%p lost race on txring %d, ok", - priv, i); + + if (nm_kr_tryget(kring, 1, &revents)) continue; - } - if (nm_txsync_prologue(kring) >= kring->nkr_num_slots) { + + if (nm_txsync_prologue(kring, ring) >= kring->nkr_num_slots) { netmap_ring_reinit(kring); revents |= POLLERR; } else { if (kring->nm_sync(kring, 0)) revents |= POLLERR; else - nm_txsync_finalize(kring); + nm_sync_finalize(kring); } /* @@ -2489,8 +2603,10 @@ kring->nm_notify(kring, 0); } } - if (want_tx && retry_tx && !is_kevent) { - OS_selrecord(td, check_all_tx ? + /* if there were any packet to forward we must have handled them by now */ + send_down = 0; + if (want_tx && retry_tx && sr) { + nm_os_selrecord(sr, check_all_tx ? &na->si[NR_TX] : &na->tx_rings[priv->np_qfirst[NR_TX]].si); retry_tx = 0; goto flush_tx; @@ -2502,22 +2618,18 @@ * Do it on all rings because otherwise we starve. */ if (want_rx) { - int send_down = 0; /* transparent mode */ /* two rounds here for race avoidance */ do_retry_rx: for (i = priv->np_qfirst[NR_RX]; i < priv->np_qlast[NR_RX]; i++) { int found = 0; kring = &na->rx_rings[i]; + ring = kring->ring; - if (nm_kr_tryget(kring)) { - if (netmap_verbose) - RD(2, "%p lost race on rxring %d, ok", - priv, i); + if (unlikely(nm_kr_tryget(kring, 1, &revents))) continue; - } - if (nm_rxsync_prologue(kring) >= kring->nkr_num_slots) { + if (nm_rxsync_prologue(kring, ring) >= kring->nkr_num_slots) { netmap_ring_reinit(kring); revents |= POLLERR; } @@ -2526,22 +2638,22 @@ /* * transparent mode support: collect packets * from the rxring(s). 
- * XXX NR_FORWARD should only be read on - * physical or NIC ports */ - if (netmap_fwd ||kring->ring->flags & NR_FORWARD) { + if (nm_may_forward_up(kring)) { ND(10, "forwarding some buffers up %d to %d", - kring->nr_hwcur, kring->ring->cur); + kring->nr_hwcur, ring->cur); netmap_grab_packets(kring, &q, netmap_fwd); } + kring->nr_kflags &= ~NR_FORWARD; if (kring->nm_sync(kring, 0)) revents |= POLLERR; else - nm_rxsync_finalize(kring); + nm_sync_finalize(kring); + send_down |= (kring->nr_kflags & NR_FORWARD); /* host ring only */ if (netmap_no_timestamp == 0 || - kring->ring->flags & NR_TIMESTAMP) { - microtime(&kring->ring->ts); + ring->flags & NR_TIMESTAMP) { + microtime(&ring->ts); } found = kring->rcur != kring->rtail; nm_kr_put(kring); @@ -2552,22 +2664,10 @@ } } - /* transparent mode XXX only during first pass ? */ - if (na->na_flags & NAF_HOST_RINGS) { - kring = &na->rx_rings[na->num_rx_rings]; - if (check_all_rx - && (netmap_fwd || kring->ring->flags & NR_FORWARD)) { - /* XXX fix to use kring fields */ - if (nm_ring_empty(kring->ring)) - send_down = netmap_rxsync_from_host(na, td, dev); - if (!nm_ring_empty(kring->ring)) - revents |= want_rx; - } - } - - if (retry_rx && !is_kevent) - OS_selrecord(td, check_all_rx ? + if (retry_rx && sr) { + nm_os_selrecord(sr, check_all_rx ? &na->si[NR_RX] : &na->rx_rings[priv->np_qfirst[NR_RX]].si); + } if (send_down > 0 || retry_rx) { retry_rx = 0; if (send_down) @@ -2582,15 +2682,14 @@ * kring->nr_hwcur and ring->head * are passed to the other endpoint. * - * In this mode we also scan the sw rxring, which in - * turn passes packets up. - * - * XXX Transparent mode at the moment requires to bind all + * Transparent mode requires to bind all * rings to a single file descriptor. */ - if (q.head && na->ifp != NULL) + if (q.head && !nm_kr_tryget(&na->tx_rings[na->num_tx_rings], 1, &revents)) { netmap_send_up(na->ifp, &q); + nm_kr_put(&na->tx_rings[na->num_tx_rings]); + } return (revents); #undef want_tx @@ -2600,8 +2699,6 @@ /*-------------------- driver support routines -------------------*/ -static int netmap_hw_krings_create(struct netmap_adapter *); - /* default notify callback */ static int netmap_notify(struct netmap_kring *kring, int flags) @@ -2609,51 +2706,51 @@ struct netmap_adapter *na = kring->na; enum txrx t = kring->tx; - OS_selwakeup(&kring->si, PI_NET); + nm_os_selwakeup(&kring->si); /* optimization: avoid a wake up on the global * queue if nobody has registered for more * than one ring */ if (na->si_users[t] > 0) - OS_selwakeup(&na->si[t], PI_NET); + nm_os_selwakeup(&na->si[t]); - return 0; + return NM_IRQ_COMPLETED; } +#if 0 +static int +netmap_notify(struct netmap_adapter *na, u_int n_ring, +enum txrx tx, int flags) +{ + if (tx == NR_TX) { + KeSetEvent(notes->TX_EVENT, 0, FALSE); + } + else + { + KeSetEvent(notes->RX_EVENT, 0, FALSE); + } + return 0; +} +#endif /* called by all routines that create netmap_adapters. - * Attach na to the ifp (if any) and provide defaults - * for optional callbacks. Defaults assume that we - * are creating an hardware netmap_adapter. + * provide some defaults and get a reference to the + * memory allocator */ int netmap_attach_common(struct netmap_adapter *na) { - struct ifnet *ifp = na->ifp; - if (na->num_tx_rings == 0 || na->num_rx_rings == 0) { D("%s: invalid rings tx %d rx %d", na->name, na->num_tx_rings, na->num_rx_rings); return EINVAL; } - /* ifp is NULL for virtual adapters (bwrap, non-persistent VALE ports, - * pipes, monitors). 
For bwrap we actually have a non-null ifp for - * use by the external modules, but that is set after this - * function has been called. - * XXX this is ugly, maybe split this function in two (2014-03-14) - */ - if (ifp != NULL) { - WNA(ifp) = na; - /* the following is only needed for na that use the host port. - * XXX do we have something similar for linux ? - */ #ifdef __FreeBSD__ - na->if_input = ifp->if_input; /* for netmap_send_up */ -#endif /* __FreeBSD__ */ - - NETMAP_SET_CAPABLE(ifp); + if (na->na_flags & NAF_HOST_RINGS && na->ifp) { + na->if_input = na->ifp->if_input; /* for netmap_send_up */ } +#endif /* __FreeBSD__ */ if (na->nm_krings_create == NULL) { /* we assume that we have been called by a driver, * since other port types all provide their own @@ -2666,10 +2763,10 @@ na->nm_notify = netmap_notify; na->active_fds = 0; - if (na->nm_mem == NULL) + if (na->nm_mem == NULL) { /* use the global allocator */ - na->nm_mem = &nm_mem; - netmap_mem_get(na->nm_mem); + na->nm_mem = netmap_mem_get(&nm_mem); + } #ifdef WITH_VALE if (na->nm_bdg_attach == NULL) /* no special nm_bdg_attach callback. On VALE @@ -2677,6 +2774,7 @@ */ na->nm_bdg_attach = netmap_bwrap_attach; #endif + return 0; } @@ -2685,9 +2783,6 @@ void netmap_detach_common(struct netmap_adapter *na) { - if (na->ifp != NULL) - WNA(na->ifp) = NULL; /* XXX do we need this? */ - if (na->tx_rings) { /* XXX should not happen */ D("freeing leftover tx_rings"); na->nm_krings_delete(na); @@ -2696,34 +2791,55 @@ if (na->nm_mem) netmap_mem_put(na->nm_mem); bzero(na, sizeof(*na)); - free(na, M_DEVBUF); + nm_os_free(na); } -/* Wrapper for the register callback provided hardware drivers. - * na->ifp == NULL means the driver module has been +/* Wrapper for the register callback provided netmap-enabled + * hardware drivers. + * nm_iszombie(na) means that the driver module has been * unloaded, so we cannot call into it. - * Note that module unloading, in our patched linux drivers, - * happens under NMG_LOCK and after having stopped all the - * nic rings (see netmap_detach). This provides sufficient - * protection for the other driver-provied callbacks - * (i.e., nm_config and nm_*xsync), that therefore don't need - * to wrapped. + * nm_os_ifnet_lock() must guarantee mutual exclusion with + * module unloading. */ static int -netmap_hw_register(struct netmap_adapter *na, int onoff) +netmap_hw_reg(struct netmap_adapter *na, int onoff) { struct netmap_hw_adapter *hwna = (struct netmap_hw_adapter*)na; + int error = 0; + + nm_os_ifnet_lock(); + + if (nm_iszombie(na)) { + if (onoff) { + error = ENXIO; + } else if (na != NULL) { + na->na_flags &= ~NAF_NETMAP_ON; + } + goto out; + } - if (na->ifp == NULL) - return onoff ? ENXIO : 0; + error = hwna->nm_hw_register(na, onoff); + +out: + nm_os_ifnet_unlock(); - return hwna->nm_hw_register(na, onoff); + return error; +} + +static void +netmap_hw_dtor(struct netmap_adapter *na) +{ + if (nm_iszombie(na) || na->ifp == NULL) + return; + + WNA(na->ifp) = NULL; } /* - * Initialize a ``netmap_adapter`` object created by driver on attach. + * Allocate a ``netmap_adapter`` object, and initialize it from the + * 'arg' passed by the driver on attach. * We allocate a block of memory with room for a struct netmap_adapter * plus two sets of N+2 struct netmap_kring (where N is the number * of hardware rings): @@ -2732,29 +2848,31 @@ * kring N+1 is only used for the selinfo for all queues. // XXX still true ? * Return 0 on success, ENOMEM otherwise. 
*/ -int -netmap_attach(struct netmap_adapter *arg) +static int +_netmap_attach(struct netmap_adapter *arg, size_t size) { struct netmap_hw_adapter *hwna = NULL; - // XXX when is arg == NULL ? - struct ifnet *ifp = arg ? arg->ifp : NULL; + struct ifnet *ifp = NULL; - if (arg == NULL || ifp == NULL) + if (arg == NULL || arg->ifp == NULL) goto fail; - hwna = malloc(sizeof(*hwna), M_DEVBUF, M_NOWAIT | M_ZERO); + ifp = arg->ifp; + hwna = nm_os_malloc(size); if (hwna == NULL) goto fail; hwna->up = *arg; hwna->up.na_flags |= NAF_HOST_RINGS | NAF_NATIVE; strncpy(hwna->up.name, ifp->if_xname, sizeof(hwna->up.name)); hwna->nm_hw_register = hwna->up.nm_register; - hwna->up.nm_register = netmap_hw_register; + hwna->up.nm_register = netmap_hw_reg; if (netmap_attach_common(&hwna->up)) { - free(hwna, M_DEVBUF); + nm_os_free(hwna); goto fail; } netmap_adapter_get(&hwna->up); + NM_ATTACH_NA(ifp, &hwna->up); + #ifdef linux if (ifp->netdev_ops) { /* prepare a clone of the netdev ops */ @@ -2762,7 +2880,7 @@ hwna->nm_ndo.ndo_start_xmit = ifp->netdev_ops; #else hwna->nm_ndo = *ifp->netdev_ops; -#endif +#endif /* NETMAP_LINUX_HAVE_NETDEV_OPS */ } hwna->nm_ndo.ndo_start_xmit = linux_netmap_start_xmit; if (ifp->ethtool_ops) { @@ -2771,11 +2889,14 @@ hwna->nm_eto.set_ringparam = linux_netmap_set_ringparam; #ifdef NETMAP_LINUX_HAVE_SET_CHANNELS hwna->nm_eto.set_channels = linux_netmap_set_channels; -#endif +#endif /* NETMAP_LINUX_HAVE_SET_CHANNELS */ if (arg->nm_config == NULL) { hwna->up.nm_config = netmap_linux_config; } #endif /* linux */ + if (arg->nm_dtor == NULL) { + hwna->up.nm_dtor = netmap_hw_dtor; + } if_printf(ifp, "netmap queues/slots: TX %d/%d, RX %d/%d\n", hwna->up.num_tx_rings, hwna->up.num_tx_desc, @@ -2784,12 +2905,54 @@ fail: D("fail, arg %p ifp %p na %p", arg, ifp, hwna); - if (ifp) - netmap_detach(ifp); return (hwna ? EINVAL : ENOMEM); } +int +netmap_attach(struct netmap_adapter *arg) +{ + return _netmap_attach(arg, sizeof(struct netmap_hw_adapter)); +} + + +#ifdef WITH_PTNETMAP_GUEST +int +netmap_pt_guest_attach(struct netmap_adapter *arg, void *csb, + unsigned int nifp_offset, unsigned int memid) +{ + struct netmap_pt_guest_adapter *ptna; + struct ifnet *ifp = arg ? arg->ifp : NULL; + int error; + + /* get allocator */ + arg->nm_mem = netmap_mem_pt_guest_new(ifp, nifp_offset, memid); + if (arg->nm_mem == NULL) + return ENOMEM; + arg->na_flags |= NAF_MEM_OWNER; + error = _netmap_attach(arg, sizeof(struct netmap_pt_guest_adapter)); + if (error) + return error; + + /* get the netmap_pt_guest_adapter */ + ptna = (struct netmap_pt_guest_adapter *) NA(ifp); + ptna->csb = csb; + + /* Initialize a separate pass-through netmap adapter that is going to + * be used by the ptnet driver only, and so never exposed to netmap + * applications. We only need a subset of the available fields. */ + memset(&ptna->dr, 0, sizeof(ptna->dr)); + ptna->dr.up.ifp = ifp; + ptna->dr.up.nm_mem = netmap_mem_get(ptna->hwup.up.nm_mem); + ptna->dr.up.nm_config = ptna->hwup.up.nm_config; + + ptna->backend_regifs = 0; + + return 0; +} +#endif /* WITH_PTNETMAP_GUEST */ + + void NM_DBG(netmap_adapter_get)(struct netmap_adapter *na) { @@ -2841,28 +3004,29 @@ netmap_detach(struct ifnet *ifp) { struct netmap_adapter *na = NA(ifp); - int skip; if (!na) return; - skip = 0; NMG_LOCK(); - netmap_disable_all_rings(ifp); - na->ifp = NULL; - na->na_flags &= ~NAF_NETMAP_ON; + netmap_set_all_rings(na, NM_KR_LOCKED); + na->na_flags |= NAF_ZOMBIE; /* * if the netmap adapter is not native, somebody * changed it, so we can not release it here. 
- * The NULL na->ifp will notify the new owner that + * The NAF_ZOMBIE flag will notify the new owner that * the driver is gone. */ if (na->na_flags & NAF_NATIVE) { - skip = netmap_adapter_put(na); + netmap_adapter_put(na); } - /* give them a chance to notice */ - if (skip == 0) - netmap_enable_all_rings(ifp); + /* give active users a chance to notice that NAF_ZOMBIE has been + * turned on, so that they can stop and return an error to userspace. + * Note that this becomes a NOP if there are no active users and, + * therefore, the put() above has deleted the na, since now NA(ifp) is + * NULL. + */ + netmap_enable_all_rings(ifp); NMG_UNLOCK(); } @@ -2883,9 +3047,10 @@ netmap_transmit(struct ifnet *ifp, struct mbuf *m) { struct netmap_adapter *na = NA(ifp); - struct netmap_kring *kring; + struct netmap_kring *kring, *tx_kring; u_int len = MBUF_LEN(m); u_int error = ENOBUFS; + unsigned int txr; struct mbq *q; int space; @@ -2900,6 +3065,16 @@ goto done; } + txr = MBUF_TXQ(m); + if (txr >= na->num_tx_rings) { + txr %= na->num_tx_rings; + } + tx_kring = &NMR(na, NR_TX)[txr]; + + if (tx_kring->nr_mode == NKR_NETMAP_OFF) { + return MBUF_TRANSMIT(na, ifp, m); + } + q = &kring->rx_queue; // XXX reconsider long packets if we handle fragments @@ -2909,6 +3084,11 @@ goto done; } + if (nm_os_mbuf_has_offld(m)) { + RD(1, "%s drop mbuf requiring offloadings", na->name); + goto done; + } + /* protect against rxsync_from_host(), netmap_sw_to_nic() * and maybe other instances of netmap_transmit (the latter * not possible on Linux). @@ -2951,6 +3131,8 @@ * netmap_reset() is called by the driver routines when reinitializing * a ring. The driver is in charge of locking to protect the kring. * If native netmap mode is not set just return NULL. + * If native netmap mode is set, in particular, we have to set nr_mode to + * NKR_NETMAP_ON. */ struct netmap_slot * netmap_reset(struct netmap_adapter *na, enum txrx tx, u_int n, @@ -2975,13 +3157,26 @@ if (tx == NR_TX) { if (n >= na->num_tx_rings) return NULL; + kring = na->tx_rings + n; + + if (kring->nr_pending_mode == NKR_NETMAP_OFF) { + kring->nr_mode = NKR_NETMAP_OFF; + return NULL; + } + // XXX check whether we should use hwcur or rcur new_hwofs = kring->nr_hwcur - new_cur; } else { if (n >= na->num_rx_rings) return NULL; kring = na->rx_rings + n; + + if (kring->nr_pending_mode == NKR_NETMAP_OFF) { + kring->nr_mode = NKR_NETMAP_OFF; + return NULL; + } + new_hwofs = kring->nr_hwtail - new_cur; } lim = kring->nkr_num_slots - 1; @@ -3018,6 +3213,7 @@ * We do the wakeup here, but the ring is not yet reconfigured. * However, we are under lock so there are no races. */ + kring->nr_mode = NKR_NETMAP_ON; kring->nm_notify(kring, 0); return kring->ring->slot; } @@ -3037,10 +3233,9 @@ * - for a nic connected to a switch, call the proper forwarding routine * (see netmap_bwrap_intr_notify) */ -void -netmap_common_irq(struct ifnet *ifp, u_int q, u_int *work_done) +int +netmap_common_irq(struct netmap_adapter *na, u_int q, u_int *work_done) { - struct netmap_adapter *na = NA(ifp); struct netmap_kring *kring; enum txrx t = (work_done ? NR_RX : NR_TX); @@ -3051,15 +3246,20 @@ } if (q >= nma_get_nrings(na, t)) - return; // not a physical queue + return NM_IRQ_PASS; // not a physical queue kring = NMR(na, t) + q; + if (kring->nr_mode == NKR_NETMAP_OFF) { + return NM_IRQ_PASS; + } + if (t == NR_RX) { kring->nr_kflags |= NKR_PENDINTR; // XXX atomic ? 
*work_done = 1; /* do not fire napi again */ } - kring->nm_notify(kring, 0); + + return kring->nm_notify(kring, 0); } @@ -3067,17 +3267,17 @@ * Default functions to handle rx/tx interrupts from a physical device. * "work_done" is non-null on the RX path, NULL for the TX path. * - * If the card is not in netmap mode, simply return 0, + * If the card is not in netmap mode, simply return NM_IRQ_PASS, * so that the caller proceeds with regular processing. - * Otherwise call netmap_common_irq() and return 1. + * Otherwise call netmap_common_irq(). * * If the card is connected to a netmap file descriptor, * do a selwakeup on the individual queue, plus one on the global one * if needed (multiqueue card _and_ there are multiqueue listeners), - * and return 1. + * and return NR_IRQ_COMPLETED. * * Finally, if called on rx from an interface connected to a switch, - * calls the proper forwarding routine, and return 1. + * calls the proper forwarding routine. */ int netmap_rx_irq(struct ifnet *ifp, u_int q, u_int *work_done) @@ -3091,15 +3291,14 @@ * nm_native_on() here. */ if (!nm_netmap_on(na)) - return 0; + return NM_IRQ_PASS; if (na->na_flags & NAF_SKIP_INTR) { ND("use regular interrupt"); - return 0; + return NM_IRQ_PASS; } - netmap_common_irq(ifp, q, work_done); - return 1; + return netmap_common_irq(na, q, work_done); } @@ -3120,9 +3319,11 @@ void netmap_fini(void) { - netmap_uninit_bridges(); if (netmap_dev) destroy_dev(netmap_dev); + /* we assume that there are no longer netmap users */ + nm_os_ifnet_fini(); + netmap_uninit_bridges(); netmap_mem_fini(); NMG_LOCK_DESTROY(); printf("netmap: unloaded module.\n"); @@ -3155,9 +3356,13 @@ goto fail; #ifdef __FreeBSD__ - nm_vi_init_index(); + nm_os_vi_init_index(); #endif + error = nm_os_ifnet_init(); + if (error) + goto fail; + printf("netmap: loaded module\n"); return (0); fail: diff -u -r -N usr/src/sys/dev/netmap/netmap_freebsd.c /usr/src/sys/dev/netmap/netmap_freebsd.c --- usr/src/sys/dev/netmap/netmap_freebsd.c 2016-09-29 00:24:47.000000000 +0100 +++ /usr/src/sys/dev/netmap/netmap_freebsd.c 2016-11-23 16:57:57.848710000 +0000 @@ -23,18 +23,19 @@ * SUCH DAMAGE. 
*/ -/* $FreeBSD: releng/11.0/sys/dev/netmap/netmap_freebsd.c 285699 2015-07-19 18:07:25Z luigi $ */ +/* $FreeBSD: head/sys/dev/netmap/netmap_freebsd.c 307706 2016-10-21 06:32:45Z sephe $ */ #include "opt_inet.h" #include "opt_inet6.h" -#include <sys/types.h> +#include <sys/param.h> #include <sys/module.h> #include <sys/errno.h> -#include <sys/param.h> /* defines used in kernel.h */ +#include <sys/jail.h> #include <sys/poll.h> /* POLLIN, POLLOUT */ #include <sys/kernel.h> /* types used in module initialization */ -#include <sys/conf.h> /* DEV_MODULE */ +#include <sys/conf.h> /* DEV_MODULE_ORDERED */ #include <sys/endian.h> +#include <sys/syscallsubr.h> /* kern_ioctl() */ #include <sys/rwlock.h> @@ -50,6 +51,11 @@ #include <sys/malloc.h> #include <sys/socket.h> /* sockaddrs */ #include <sys/selinfo.h> +#include <sys/kthread.h> /* kthread_add() */ +#include <sys/proc.h> /* PROC_LOCK() */ +#include <sys/unistd.h> /* RFNOWAIT */ +#include <sys/sched.h> /* sched_bind() */ +#include <sys/smp.h> /* mp_maxid */ #include <net/if.h> #include <net/if_var.h> #include <net/if_types.h> /* IFT_ETHER */ @@ -61,13 +67,112 @@ #include <net/netmap.h> #include <dev/netmap/netmap_kern.h> +#include <net/netmap_virt.h> #include <dev/netmap/netmap_mem2.h> /* ======================== FREEBSD-SPECIFIC ROUTINES ================== */ +void nm_os_selinfo_init(NM_SELINFO_T *si) { + struct mtx *m = &si->m; + mtx_init(m, "nm_kn_lock", NULL, MTX_DEF); + knlist_init_mtx(&si->si.si_note, m); +} + +void +nm_os_selinfo_uninit(NM_SELINFO_T *si) +{ + /* XXX kqueue(9) needed; these will mirror knlist_init. */ + knlist_delete(&si->si.si_note, curthread, 0 /* not locked */ ); + knlist_destroy(&si->si.si_note); + /* now we don't need the mutex anymore */ + mtx_destroy(&si->m); +} + +void * +nm_os_malloc(size_t size) +{ + return malloc(size, M_DEVBUF, M_NOWAIT | M_ZERO); +} + +void * +nm_os_realloc(void *addr, size_t new_size, size_t old_size __unused) +{ + return realloc(addr, new_size, M_DEVBUF, M_NOWAIT | M_ZERO); +} + +void +nm_os_free(void *addr) +{ + free(addr, M_DEVBUF); +} + +void +nm_os_ifnet_lock(void) +{ + IFNET_WLOCK(); +} + +void +nm_os_ifnet_unlock(void) +{ + IFNET_WUNLOCK(); +} + +static int netmap_use_count = 0; + +void +nm_os_get_module(void) +{ + netmap_use_count++; +} + +void +nm_os_put_module(void) +{ + netmap_use_count--; +} + +static void +netmap_ifnet_arrival_handler(void *arg __unused, struct ifnet *ifp) +{ + netmap_undo_zombie(ifp); +} + +static void +netmap_ifnet_departure_handler(void *arg __unused, struct ifnet *ifp) +{ + netmap_make_zombie(ifp); +} + +static eventhandler_tag nm_ifnet_ah_tag; +static eventhandler_tag nm_ifnet_dh_tag; + +int +nm_os_ifnet_init(void) +{ + nm_ifnet_ah_tag = + EVENTHANDLER_REGISTER(ifnet_arrival_event, + netmap_ifnet_arrival_handler, + NULL, EVENTHANDLER_PRI_ANY); + nm_ifnet_dh_tag = + EVENTHANDLER_REGISTER(ifnet_departure_event, + netmap_ifnet_departure_handler, + NULL, EVENTHANDLER_PRI_ANY); + return 0; +} + +void +nm_os_ifnet_fini(void) +{ + EVENTHANDLER_DEREGISTER(ifnet_arrival_event, + nm_ifnet_ah_tag); + EVENTHANDLER_DEREGISTER(ifnet_departure_event, + nm_ifnet_dh_tag); +} + rawsum_t -nm_csum_raw(uint8_t *data, size_t len, rawsum_t cur_sum) +nm_os_csum_raw(uint8_t *data, size_t len, rawsum_t cur_sum) { /* TODO XXX please use the FreeBSD implementation for this. */ uint16_t *words = (uint16_t *)data; @@ -87,7 +192,7 @@ * return value is in network byte order. 
*/ uint16_t -nm_csum_fold(rawsum_t cur_sum) +nm_os_csum_fold(rawsum_t cur_sum) { /* TODO XXX please use the FreeBSD implementation for this. */ while (cur_sum >> 16) @@ -96,17 +201,17 @@ return htobe16((~cur_sum) & 0xFFFF); } -uint16_t nm_csum_ipv4(struct nm_iphdr *iph) +uint16_t nm_os_csum_ipv4(struct nm_iphdr *iph) { #if 0 return in_cksum_hdr((void *)iph); #else - return nm_csum_fold(nm_csum_raw((uint8_t*)iph, sizeof(struct nm_iphdr), 0)); + return nm_os_csum_fold(nm_os_csum_raw((uint8_t*)iph, sizeof(struct nm_iphdr), 0)); #endif } void -nm_csum_tcpudp_ipv4(struct nm_iphdr *iph, void *data, +nm_os_csum_tcpudp_ipv4(struct nm_iphdr *iph, void *data, size_t datalen, uint16_t *check) { #ifdef INET @@ -118,7 +223,7 @@ /* Compute the checksum on TCP/UDP header + payload * (includes the pseudo-header). */ - *check = nm_csum_fold(nm_csum_raw(data, datalen, 0)); + *check = nm_os_csum_fold(nm_os_csum_raw(data, datalen, 0)); #else static int notsupported = 0; if (!notsupported) { @@ -129,12 +234,12 @@ } void -nm_csum_tcpudp_ipv6(struct nm_ipv6hdr *ip6h, void *data, +nm_os_csum_tcpudp_ipv6(struct nm_ipv6hdr *ip6h, void *data, size_t datalen, uint16_t *check) { #ifdef INET6 *check = in6_cksum_pseudo((void*)ip6h, datalen, ip6h->nexthdr, 0); - *check = nm_csum_fold(nm_csum_raw(data, datalen, 0)); + *check = nm_os_csum_fold(nm_os_csum_raw(data, datalen, 0)); #else static int notsupported = 0; if (!notsupported) { @@ -144,13 +249,41 @@ #endif } +/* on FreeBSD we send up one packet at a time */ +void * +nm_os_send_up(struct ifnet *ifp, struct mbuf *m, struct mbuf *prev) +{ + + NA(ifp)->if_input(ifp, m); + return NULL; +} + +int +nm_os_mbuf_has_offld(struct mbuf *m) +{ + return m->m_pkthdr.csum_flags & (CSUM_TCP | CSUM_UDP | CSUM_SCTP | + CSUM_TCP_IPV6 | CSUM_UDP_IPV6 | + CSUM_SCTP_IPV6 | CSUM_TSO); +} + +static void +freebsd_generic_rx_handler(struct ifnet *ifp, struct mbuf *m) +{ + struct netmap_generic_adapter *gna = + (struct netmap_generic_adapter *)NA(ifp); + int stolen = generic_rx_handler(ifp, m); + + if (!stolen) { + gna->save_if_input(ifp, m); + } +} /* * Intercept the rx routine in the standard device driver. * Second argument is non-zero to intercept, 0 to restore */ int -netmap_catch_rx(struct netmap_generic_adapter *gna, int intercept) +nm_os_catch_rx(struct netmap_generic_adapter *gna, int intercept) { struct netmap_adapter *na = &gna->up.up; struct ifnet *ifp = na->ifp; @@ -161,7 +294,7 @@ return EINVAL; /* already set */ } gna->save_if_input = ifp->if_input; - ifp->if_input = generic_rx_handler; + ifp->if_input = freebsd_generic_rx_handler; } else { if (!gna->save_if_input){ D("cannot restore"); @@ -181,18 +314,20 @@ * Second argument is non-zero to intercept, 0 to restore. * On freebsd we just intercept if_transmit. 
*/ -void -netmap_catch_tx(struct netmap_generic_adapter *gna, int enable) +int +nm_os_catch_tx(struct netmap_generic_adapter *gna, int intercept) { struct netmap_adapter *na = &gna->up.up; struct ifnet *ifp = netmap_generic_getifp(gna); - if (enable) { + if (intercept) { na->if_transmit = ifp->if_transmit; ifp->if_transmit = netmap_transmit; } else { ifp->if_transmit = na->if_transmit; } + + return 0; } @@ -213,40 +348,44 @@ * */ int -generic_xmit_frame(struct ifnet *ifp, struct mbuf *m, - void *addr, u_int len, u_int ring_nr) +nm_os_generic_xmit_frame(struct nm_os_gen_arg *a) { int ret; + u_int len = a->len; + struct ifnet *ifp = a->ifp; + struct mbuf *m = a->m; +#if __FreeBSD_version < 1100000 /* - * The mbuf should be a cluster from our special pool, - * so we do not need to do an m_copyback but just copy - * (and eventually, just reference the netmap buffer) + * Old FreeBSD versions. The mbuf has a cluster attached, + * we need to copy from the cluster to the netmap buffer. */ - - if (GET_MBUF_REFCNT(m) != 1) { - D("invalid refcnt %d for %p", - GET_MBUF_REFCNT(m), m); + if (MBUF_REFCNT(m) != 1) { + D("invalid refcnt %d for %p", MBUF_REFCNT(m), m); panic("in generic_xmit_frame"); } - // XXX the ext_size check is unnecessary if we link the netmap buf if (m->m_ext.ext_size < len) { RD(5, "size %d < len %d", m->m_ext.ext_size, len); len = m->m_ext.ext_size; } - if (0) { /* XXX seems to have negligible benefits */ - m->m_ext.ext_buf = m->m_data = addr; - } else { - bcopy(addr, m->m_data, len); - } + bcopy(a->addr, m->m_data, len); +#else /* __FreeBSD_version >= 1100000 */ + /* New FreeBSD versions. Link the external storage to + * the netmap buffer, so that no copy is necessary. */ + m->m_ext.ext_buf = m->m_data = a->addr; + m->m_ext.ext_size = len; +#endif /* __FreeBSD_version >= 1100000 */ + m->m_len = m->m_pkthdr.len = len; - // inc refcount. All ours, we could skip the atomic - atomic_fetchadd_int(PNT_MBUF_REFCNT(m), 1); + + /* mbuf refcnt is not contended, no need to use atomic + * (a memory barrier is enough). */ + SET_MBUF_REFCNT(m, 2); M_HASHTYPE_SET(m, M_HASHTYPE_OPAQUE); - m->m_pkthdr.flowid = ring_nr; + m->m_pkthdr.flowid = a->ring_nr; m->m_pkthdr.rcvif = ifp; /* used for tx notification */ ret = NA(ifp)->if_transmit(ifp, m); - return ret; + return ret ? -1 : 0; } @@ -263,7 +402,7 @@ * way to extract the info from the ifp */ int -generic_find_num_desc(struct ifnet *ifp, unsigned int *tx, unsigned int *rx) +nm_os_generic_find_num_desc(struct ifnet *ifp, unsigned int *tx, unsigned int *rx) { D("called, in tx %d rx %d", *tx, *rx); return 0; @@ -271,16 +410,23 @@ void -generic_find_num_queues(struct ifnet *ifp, u_int *txq, u_int *rxq) +nm_os_generic_find_num_queues(struct ifnet *ifp, u_int *txq, u_int *rxq) { D("called, in txq %d rxq %d", *txq, *rxq); *txq = netmap_generic_rings; *rxq = netmap_generic_rings; } +void +nm_os_generic_set_features(struct netmap_generic_adapter *gna) +{ + + gna->rxsg = 1; /* Supported through m_copydata. */ + gna->txqdisc = 0; /* Not supported. 
*/ +} void -netmap_mitigation_init(struct nm_generic_mit *mit, int idx, struct netmap_adapter *na) +nm_os_mitigation_init(struct nm_generic_mit *mit, int idx, struct netmap_adapter *na) { ND("called"); mit->mit_pending = 0; @@ -290,21 +436,21 @@ void -netmap_mitigation_start(struct nm_generic_mit *mit) +nm_os_mitigation_start(struct nm_generic_mit *mit) { ND("called"); } void -netmap_mitigation_restart(struct nm_generic_mit *mit) +nm_os_mitigation_restart(struct nm_generic_mit *mit) { ND("called"); } int -netmap_mitigation_active(struct nm_generic_mit *mit) +nm_os_mitigation_active(struct nm_generic_mit *mit) { ND("called"); return 0; @@ -312,7 +458,7 @@ void -netmap_mitigation_cleanup(struct nm_generic_mit *mit) +nm_os_mitigation_cleanup(struct nm_generic_mit *mit) { ND("called"); } @@ -342,7 +488,7 @@ } nm_vi_indices; void -nm_vi_init_index(void) +nm_os_vi_init_index(void) { int i; for (i = 0; i < NM_VI_MAX; i++) @@ -398,7 +544,7 @@ * increment this refcount on if_attach(). */ int -nm_vi_persist(const char *name, struct ifnet **ret) +nm_os_vi_persist(const char *name, struct ifnet **ret) { struct ifnet *ifp; u_short macaddr_hi; @@ -438,15 +584,215 @@ *ret = ifp; return 0; } + /* unregister from the system and drop the final refcount */ void -nm_vi_detach(struct ifnet *ifp) +nm_os_vi_detach(struct ifnet *ifp) { nm_vi_free_index(((char *)IF_LLADDR(ifp))[5]); ether_ifdetach(ifp); if_free(ifp); } +/* ======================== PTNETMAP SUPPORT ========================== */ + +#ifdef WITH_PTNETMAP_GUEST +#include <sys/bus.h> +#include <sys/rman.h> +#include <machine/bus.h> /* bus_dmamap_* */ +#include <machine/resource.h> +#include <dev/pci/pcivar.h> +#include <dev/pci/pcireg.h> +/* + * ptnetmap memory device (memdev) for a FreeBSD guest, + * used to expose host netmap memory to the guest through a PCI BAR. + */ + +/* + * ptnetmap memdev private data structure + */ +struct ptnetmap_memdev { + device_t dev; + struct resource *pci_io; + struct resource *pci_mem; + struct netmap_mem_d *nm_mem; +}; + +static int ptn_memdev_probe(device_t); +static int ptn_memdev_attach(device_t); +static int ptn_memdev_detach(device_t); +static int ptn_memdev_shutdown(device_t); + +static device_method_t ptn_memdev_methods[] = { + DEVMETHOD(device_probe, ptn_memdev_probe), + DEVMETHOD(device_attach, ptn_memdev_attach), + DEVMETHOD(device_detach, ptn_memdev_detach), + DEVMETHOD(device_shutdown, ptn_memdev_shutdown), + DEVMETHOD_END +}; + +static driver_t ptn_memdev_driver = { + PTNETMAP_MEMDEV_NAME, + ptn_memdev_methods, + sizeof(struct ptnetmap_memdev), +}; + +/* We use (SI_ORDER_MIDDLE+1) here, see DEV_MODULE_ORDERED() invocation + * below. */ +static devclass_t ptnetmap_devclass; +DRIVER_MODULE_ORDERED(ptn_memdev, pci, ptn_memdev_driver, ptnetmap_devclass, + NULL, NULL, SI_ORDER_MIDDLE + 1); + +/* + * Map host netmap memory through PCI-BAR in the guest OS, + * returning physical (nm_paddr) and virtual (nm_addr) addresses + * of the netmap memory mapped in the guest.
+ */ +int +nm_os_pt_memdev_iomap(struct ptnetmap_memdev *ptn_dev, vm_paddr_t *nm_paddr, + void **nm_addr, uint64_t *mem_size) +{ + int rid; + + D("ptn_memdev_driver iomap"); + + rid = PCIR_BAR(PTNETMAP_MEM_PCI_BAR); + *mem_size = bus_read_4(ptn_dev->pci_io, PTNET_MDEV_IO_MEMSIZE_HI); + *mem_size = bus_read_4(ptn_dev->pci_io, PTNET_MDEV_IO_MEMSIZE_LO) | + (*mem_size << 32); + + /* map memory allocator */ + ptn_dev->pci_mem = bus_alloc_resource(ptn_dev->dev, SYS_RES_MEMORY, + &rid, 0, ~0, *mem_size, RF_ACTIVE); + if (ptn_dev->pci_mem == NULL) { + *nm_paddr = 0; + *nm_addr = 0; + return ENOMEM; + } + + *nm_paddr = rman_get_start(ptn_dev->pci_mem); + *nm_addr = rman_get_virtual(ptn_dev->pci_mem); + + D("=== BAR %d start %lx len %lx mem_size %lx ===", + PTNETMAP_MEM_PCI_BAR, + (unsigned long)(*nm_paddr), + (unsigned long)rman_get_size(ptn_dev->pci_mem), + (unsigned long)*mem_size); + return (0); +} + +uint32_t +nm_os_pt_memdev_ioread(struct ptnetmap_memdev *ptn_dev, unsigned int reg) +{ + return bus_read_4(ptn_dev->pci_io, reg); +} + +/* Unmap host netmap memory. */ +void +nm_os_pt_memdev_iounmap(struct ptnetmap_memdev *ptn_dev) +{ + D("ptn_memdev_driver iounmap"); + + if (ptn_dev->pci_mem) { + bus_release_resource(ptn_dev->dev, SYS_RES_MEMORY, + PCIR_BAR(PTNETMAP_MEM_PCI_BAR), ptn_dev->pci_mem); + ptn_dev->pci_mem = NULL; + } +} + +/* Device identification routine, return BUS_PROBE_DEFAULT on success, + * positive on failure */ +static int +ptn_memdev_probe(device_t dev) +{ + char desc[256]; + + if (pci_get_vendor(dev) != PTNETMAP_PCI_VENDOR_ID) + return (ENXIO); + if (pci_get_device(dev) != PTNETMAP_PCI_DEVICE_ID) + return (ENXIO); + + snprintf(desc, sizeof(desc), "%s PCI adapter", + PTNETMAP_MEMDEV_NAME); + device_set_desc_copy(dev, desc); + + return (BUS_PROBE_DEFAULT); +} + +/* Device initialization routine. */ +static int +ptn_memdev_attach(device_t dev) +{ + struct ptnetmap_memdev *ptn_dev; + int rid; + uint16_t mem_id; + + D("ptn_memdev_driver attach"); + + ptn_dev = device_get_softc(dev); + ptn_dev->dev = dev; + + pci_enable_busmaster(dev); + + rid = PCIR_BAR(PTNETMAP_IO_PCI_BAR); + ptn_dev->pci_io = bus_alloc_resource_any(dev, SYS_RES_IOPORT, &rid, + RF_ACTIVE); + if (ptn_dev->pci_io == NULL) { + device_printf(dev, "cannot map I/O space\n"); + return (ENXIO); + } + + mem_id = bus_read_4(ptn_dev->pci_io, PTNET_MDEV_IO_MEMID); + + /* create guest allocator */ + ptn_dev->nm_mem = netmap_mem_pt_guest_attach(ptn_dev, mem_id); + if (ptn_dev->nm_mem == NULL) { + ptn_memdev_detach(dev); + return (ENOMEM); + } + netmap_mem_get(ptn_dev->nm_mem); + + D("ptn_memdev_driver probe OK - host_mem_id: %d", mem_id); + + return (0); +} + +/* Device removal routine. 
*/ +static int +ptn_memdev_detach(device_t dev) +{ + struct ptnetmap_memdev *ptn_dev; + + D("ptn_memdev_driver detach"); + ptn_dev = device_get_softc(dev); + + if (ptn_dev->nm_mem) { + netmap_mem_put(ptn_dev->nm_mem); + ptn_dev->nm_mem = NULL; + } + if (ptn_dev->pci_mem) { + bus_release_resource(dev, SYS_RES_MEMORY, + PCIR_BAR(PTNETMAP_MEM_PCI_BAR), ptn_dev->pci_mem); + ptn_dev->pci_mem = NULL; + } + if (ptn_dev->pci_io) { + bus_release_resource(dev, SYS_RES_IOPORT, + PCIR_BAR(PTNETMAP_IO_PCI_BAR), ptn_dev->pci_io); + ptn_dev->pci_io = NULL; + } + + return (0); +} + +static int +ptn_memdev_shutdown(device_t dev) +{ + D("ptn_memdev_driver shutdown"); + return bus_generic_shutdown(dev); +} + +#endif /* WITH_PTNETMAP_GUEST */ + /* * In order to track whether pages are still mapped, we hook into * the standard cdev_pager and intercept the constructor and @@ -606,7 +952,7 @@ * the device (/dev/netmap) so we cannot do anything useful. * To track close() on individual file descriptors we pass netmap_dtor() to * devfs_set_cdevpriv() on open(). The FreeBSD kernel will call the destructor - * when the last fd pointing to the device is closed. + * when the last fd pointing to the device is closed. * * Note that FreeBSD does not even munmap() on close() so we also have * to track mmap() ourselves, and postpone the call to @@ -634,26 +980,251 @@ (void)devtype; (void)td; - priv = malloc(sizeof(struct netmap_priv_d), M_DEVBUF, - M_NOWAIT | M_ZERO); - if (priv == NULL) - return ENOMEM; - priv->np_refs = 1; + NMG_LOCK(); + priv = netmap_priv_new(); + if (priv == NULL) { + error = ENOMEM; + goto out; + } error = devfs_set_cdevpriv(priv, netmap_dtor); if (error) { - free(priv, M_DEVBUF); - } else { - NMG_LOCK(); - netmap_use_count++; - NMG_UNLOCK(); + netmap_priv_delete(priv); } +out: + NMG_UNLOCK(); return error; } +/******************** kthread wrapper ****************/ +#include <sys/sysproto.h> +u_int +nm_os_ncpus(void) +{ + return mp_maxid + 1; +} + +struct nm_kthread_ctx { + struct thread *user_td; /* thread user-space (kthread creator) to send ioctl */ + struct ptnetmap_cfgentry_bhyve cfg; + + /* worker function and parameter */ + nm_kthread_worker_fn_t worker_fn; + void *worker_private; + + struct nm_kthread *nmk; + + /* integer to manage multiple worker contexts (e.g., RX or TX on ptnetmap) */ + long type; +}; + +struct nm_kthread { + struct thread *worker; + struct mtx worker_lock; + uint64_t scheduled; /* pending wake_up request */ + struct nm_kthread_ctx worker_ctx; + int run; /* used to stop kthread */ + int attach_user; /* kthread attached to user_process */ + int affinity; +}; + +void inline +nm_os_kthread_wakeup_worker(struct nm_kthread *nmk) +{ + /* + * There may be a race between FE and BE, + * which call both this function, and worker kthread, + * that reads nmk->scheduled. + * + * For us it is not important the counter value, + * but simply that it has changed since the last + * time the kthread saw it. 
+ */ + mtx_lock(&nmk->worker_lock); + nmk->scheduled++; + if (nmk->worker_ctx.cfg.wchan) { + wakeup((void *)nmk->worker_ctx.cfg.wchan); + } + mtx_unlock(&nmk->worker_lock); +} + +void inline +nm_os_kthread_send_irq(struct nm_kthread *nmk) +{ + struct nm_kthread_ctx *ctx = &nmk->worker_ctx; + int err; + + if (ctx->user_td && ctx->cfg.ioctl_fd > 0) { + err = kern_ioctl(ctx->user_td, ctx->cfg.ioctl_fd, ctx->cfg.ioctl_cmd, + (caddr_t)&ctx->cfg.ioctl_data); + if (err) { + D("kern_ioctl error: %d ioctl parameters: fd %d com %lu data %p", + err, ctx->cfg.ioctl_fd, (unsigned long)ctx->cfg.ioctl_cmd, + &ctx->cfg.ioctl_data); + } + } +} + +static void +nm_kthread_worker(void *data) +{ + struct nm_kthread *nmk = data; + struct nm_kthread_ctx *ctx = &nmk->worker_ctx; + uint64_t old_scheduled = nmk->scheduled; + + if (nmk->affinity >= 0) { + thread_lock(curthread); + sched_bind(curthread, nmk->affinity); + thread_unlock(curthread); + } + + while (nmk->run) { + /* + * check if the parent process dies + * (when kthread is attached to user process) + */ + if (ctx->user_td) { + PROC_LOCK(curproc); + thread_suspend_check(0); + PROC_UNLOCK(curproc); + } else { + kthread_suspend_check(); + } + + /* + * if wchan is not defined, we don't have notification + * mechanism and we continually execute worker_fn() + */ + if (!ctx->cfg.wchan) { + ctx->worker_fn(ctx->worker_private); /* worker body */ + } else { + /* checks if there is a pending notification */ + mtx_lock(&nmk->worker_lock); + if (likely(nmk->scheduled != old_scheduled)) { + old_scheduled = nmk->scheduled; + mtx_unlock(&nmk->worker_lock); + + ctx->worker_fn(ctx->worker_private); /* worker body */ + + continue; + } else if (nmk->run) { + /* wait on event with one second timeout */ + msleep_spin((void *)ctx->cfg.wchan, &nmk->worker_lock, + "nmk_ev", hz); + nmk->scheduled++; + } + mtx_unlock(&nmk->worker_lock); + } + } + + kthread_exit(); +} + +void +nm_os_kthread_set_affinity(struct nm_kthread *nmk, int affinity) +{ + nmk->affinity = affinity; +} + +struct nm_kthread * +nm_os_kthread_create(struct nm_kthread_cfg *cfg, unsigned int cfgtype, + void *opaque) +{ + struct nm_kthread *nmk = NULL; + + if (cfgtype != PTNETMAP_CFGTYPE_BHYVE) { + D("Unsupported cfgtype %u", cfgtype); + return NULL; + } + + nmk = malloc(sizeof(*nmk), M_DEVBUF, M_NOWAIT | M_ZERO); + if (!nmk) + return NULL; + + mtx_init(&nmk->worker_lock, "nm_kthread lock", NULL, MTX_SPIN); + nmk->worker_ctx.worker_fn = cfg->worker_fn; + nmk->worker_ctx.worker_private = cfg->worker_private; + nmk->worker_ctx.type = cfg->type; + nmk->affinity = -1; + + /* attach kthread to user process (ptnetmap) */ + nmk->attach_user = cfg->attach_user; + + /* store kick/interrupt configuration */ + if (opaque) { + nmk->worker_ctx.cfg = *((struct ptnetmap_cfgentry_bhyve *)opaque); + } + + return nmk; +} + +int +nm_os_kthread_start(struct nm_kthread *nmk) +{ + struct proc *p = NULL; + int error = 0; + + if (nmk->worker) { + return EBUSY; + } + + /* check if we want to attach kthread to user process */ + if (nmk->attach_user) { + nmk->worker_ctx.user_td = curthread; + p = curthread->td_proc; + } + + /* enable kthread main loop */ + nmk->run = 1; + /* create kthread */ + if((error = kthread_add(nm_kthread_worker, nmk, p, + &nmk->worker, RFNOWAIT /* to be checked */, 0, "nm-kthread-%ld", + nmk->worker_ctx.type))) { + goto err; + } + + D("nm_kthread started td %p", nmk->worker); + + return 0; +err: + D("nm_kthread start failed err %d", error); + nmk->worker = NULL; + return error; +} + +void +nm_os_kthread_stop(struct 
nm_kthread *nmk) +{ + if (!nmk->worker) { + return; + } + /* tell to kthread to exit from main loop */ + nmk->run = 0; + + /* wake up kthread if it sleeps */ + kthread_resume(nmk->worker); + nm_os_kthread_wakeup_worker(nmk); + + nmk->worker = NULL; +} + +void +nm_os_kthread_delete(struct nm_kthread *nmk) +{ + if (!nmk) + return; + if (nmk->worker) { + nm_os_kthread_stop(nmk); + } + + memset(&nmk->worker_ctx.cfg, 0, sizeof(nmk->worker_ctx.cfg)); + + free(nmk, M_DEVBUF); +} + /******************** kqueue support ****************/ /* - * The OS_selwakeup also needs to issue a KNOTE_UNLOCKED. + * nm_os_selwakeup also needs to issue a KNOTE_UNLOCKED. * We use a non-zero argument to distinguish the call from the one * in kevent_scan() which instead also needs to run netmap_poll(). * The knote uses a global mutex for the time being. We might @@ -672,17 +1243,23 @@ void -freebsd_selwakeup(struct nm_selinfo *si, int pri) +nm_os_selwakeup(struct nm_selinfo *si) { if (netmap_verbose) D("on knote %p", &si->si.si_note); - selwakeuppri(&si->si, pri); + selwakeuppri(&si->si, PI_NET); /* use a non-zero hint to tell the notification from the * call done in kqueue_scan() which uses 0 */ KNOTE_UNLOCKED(&si->si.si_note, 0x100 /* notification */); } +void +nm_os_selrecord(struct thread *td, struct nm_selinfo *si) +{ + selrecord(td, &si->si); +} + static void netmap_knrdetach(struct knote *kn) { @@ -728,7 +1305,7 @@ RD(5, "curthread changed %p %p", curthread, priv->np_td); return 1; } else { - revents = netmap_poll((void *)priv, events, curthread); + revents = netmap_poll(priv, events, NULL); return (events & revents) ? 1 : 0; } } @@ -801,13 +1378,47 @@ return 0; } +static int +freebsd_netmap_poll(struct cdev *cdevi __unused, int events, struct thread *td) +{ + struct netmap_priv_d *priv; + if (devfs_get_cdevpriv((void **)&priv)) { + return POLLERR; + } + return netmap_poll(priv, events, td); +} + +static int +freebsd_netmap_ioctl(struct cdev *dev __unused, u_long cmd, caddr_t data, + int ffla __unused, struct thread *td) +{ + int error; + struct netmap_priv_d *priv; + + CURVNET_SET(TD_TO_VNET(td)); + error = devfs_get_cdevpriv((void **)&priv); + if (error) { + /* XXX ENOENT should be impossible, since the priv + * is now created in the open */ + if (error == ENOENT) + error = ENXIO; + goto out; + } + error = netmap_ioctl(priv, cmd, data, td); +out: + CURVNET_RESTORE(); + + return error; +} + +extern struct cdevsw netmap_cdevsw; /* XXX used in netmap.c, should go elsewhere */ struct cdevsw netmap_cdevsw = { .d_version = D_VERSION, .d_name = "netmap", .d_open = netmap_open, .d_mmap_single = netmap_mmap_single, - .d_ioctl = netmap_ioctl, - .d_poll = netmap_poll, + .d_ioctl = freebsd_netmap_ioctl, + .d_poll = freebsd_netmap_poll, .d_kqfilter = netmap_kqfilter, .d_close = netmap_close, }; @@ -852,6 +1463,24 @@ return (error); } - +#ifdef DEV_MODULE_ORDERED +/* + * The netmap module contains three drivers: (i) the netmap character device + * driver; (ii) the ptnetmap memdev PCI device driver, (iii) the ptnet PCI + * device driver. The attach() routines of both (ii) and (iii) need the + * lock of the global allocator, and such lock is initialized in netmap_init(), + * which is part of (i). + * Therefore, we make sure that (i) is loaded before (ii) and (iii), using + * the 'order' parameter of driver declaration macros. For (i), we specify + * SI_ORDER_MIDDLE, while higher orders are used with the DRIVER_MODULE_ORDERED + * macros for (ii) and (iii). 
+ */ +DEV_MODULE_ORDERED(netmap, netmap_loader, NULL, SI_ORDER_MIDDLE); +#else /* !DEV_MODULE_ORDERED */ DEV_MODULE(netmap, netmap_loader, NULL); +#endif /* DEV_MODULE_ORDERED */ +MODULE_DEPEND(netmap, pci, 1, 1, 1); MODULE_VERSION(netmap, 1); +/* reduce conditional code */ +// linux API, use for the knlist in FreeBSD +/* use a private mutex for the knlist */ diff -u -r -N usr/src/sys/dev/netmap/netmap_generic.c /usr/src/sys/dev/netmap/netmap_generic.c --- usr/src/sys/dev/netmap/netmap_generic.c 2016-09-29 00:24:47.000000000 +0100 +++ /usr/src/sys/dev/netmap/netmap_generic.c 2016-11-23 16:57:57.849427000 +0000 @@ -1,5 +1,7 @@ /* - * Copyright (C) 2013-2014 Universita` di Pisa. All rights reserved. + * Copyright (C) 2013-2016 Vincenzo Maffione + * Copyright (C) 2013-2016 Luigi Rizzo + * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions @@ -63,7 +65,7 @@ #ifdef __FreeBSD__ #include <sys/cdefs.h> /* prerequisite */ -__FBSDID("$FreeBSD: releng/11.0/sys/dev/netmap/netmap_generic.c 298955 2016-05-03 03:41:25Z pfg $"); +__FBSDID("$FreeBSD: head/sys/dev/netmap/netmap_generic.c 274353 2014-11-10 20:19:58Z luigi $"); #include <sys/types.h> #include <sys/errno.h> @@ -83,25 +85,25 @@ #define rtnl_lock() ND("rtnl_lock called") #define rtnl_unlock() ND("rtnl_unlock called") -#define MBUF_TXQ(m) ((m)->m_pkthdr.flowid) #define MBUF_RXQ(m) ((m)->m_pkthdr.flowid) #define smp_mb() /* * FreeBSD mbuf allocator/deallocator in emulation mode: + */ +#if __FreeBSD_version < 1100000 + +/* + * For older versions of FreeBSD: * * We allocate EXT_PACKET mbuf+clusters, but need to set M_NOFREE * so that the destructor, if invoked, will not free the packet. - * In principle we should set the destructor only on demand, + * In principle we should set the destructor only on demand, * but since there might be a race we better do it on allocation. * As a consequence, we also need to set the destructor or we * would leak buffers. */ -/* - * mbuf wrappers - */ - /* mbuf destructor, also need to change the type to EXT_EXTREF, * add an M_NOFREE flag, and then clear the flag and * chain into uma_zfree(zone_pack, mf) @@ -112,35 +114,93 @@ (m)->m_ext.ext_type = EXT_EXTREF; \ } while (0) -static void -netmap_default_mbuf_destructor(struct mbuf *m) +static int +void_mbuf_dtor(struct mbuf *m, void *arg1, void *arg2) { /* restore original mbuf */ m->m_ext.ext_buf = m->m_data = m->m_ext.ext_arg1; m->m_ext.ext_arg1 = NULL; m->m_ext.ext_type = EXT_PACKET; m->m_ext.ext_free = NULL; - if (GET_MBUF_REFCNT(m) == 0) + if (MBUF_REFCNT(m) == 0) SET_MBUF_REFCNT(m, 1); uma_zfree(zone_pack, m); + + return 0; } static inline struct mbuf * -netmap_get_mbuf(int len) +nm_os_get_mbuf(struct ifnet *ifp, int len) { struct mbuf *m; + + (void)ifp; m = m_getcl(M_NOWAIT, MT_DATA, M_PKTHDR); if (m) { - m->m_flags |= M_NOFREE; /* XXXNP: Almost certainly incorrect. */ + /* m_getcl() (mb_ctor_mbuf) has an assert that checks that + * M_NOFREE flag is not specified as third argument, + * so we have to set M_NOFREE after m_getcl(). 
*/ + m->m_flags |= M_NOFREE; m->m_ext.ext_arg1 = m->m_ext.ext_buf; // XXX save - m->m_ext.ext_free = (void *)netmap_default_mbuf_destructor; + m->m_ext.ext_free = (void *)void_mbuf_dtor; m->m_ext.ext_type = EXT_EXTREF; - ND(5, "create m %p refcnt %d", m, GET_MBUF_REFCNT(m)); + ND(5, "create m %p refcnt %d", m, MBUF_REFCNT(m)); } return m; } +#else /* __FreeBSD_version >= 1100000 */ + +/* + * Newer versions of FreeBSD, using a straightforward scheme. + * + * We allocate mbufs with m_gethdr(), since the mbuf header is needed + * by the driver. We also attach a customly-provided external storage, + * which in this case is a netmap buffer. When calling m_extadd(), however + * we pass a NULL address, since the real address (and length) will be + * filled in by nm_os_generic_xmit_frame() right before calling + * if_transmit(). + * + * The dtor function does nothing, however we need it since mb_free_ext() + * has a KASSERT(), checking that the mbuf dtor function is not NULL. + */ + +#define SET_MBUF_DESTRUCTOR(m, fn) do { \ + (m)->m_ext.ext_free = (void *)fn; \ +} while (0) + +static void void_mbuf_dtor(struct mbuf *m, void *arg1, void *arg2) { } + +static inline struct mbuf * +nm_os_get_mbuf(struct ifnet *ifp, int len) +{ + struct mbuf *m; + + (void)ifp; + (void)len; + + m = m_gethdr(M_NOWAIT, MT_DATA); + if (m == NULL) { + return m; + } + + m_extadd(m, NULL /* buf */, 0 /* size */, void_mbuf_dtor, + NULL, NULL, 0, EXT_NET_DRV); + + return m; +} + +#endif /* __FreeBSD_version >= 1100000 */ + +#elif defined _WIN32 +#include "win_glue.h" + +#define rtnl_lock() ND("rtnl_lock called") +#define rtnl_unlock() ND("rtnl_unlock called") +#define MBUF_TXQ(m) 0//((m)->m_pkthdr.flowid) +#define MBUF_RXQ(m) 0//((m)->m_pkthdr.flowid) +#define smp_mb() //XXX: to be correctly defined #else /* linux */ @@ -150,7 +210,12 @@ #include <linux/ethtool.h> /* struct ethtool_ops, get_ringparam */ #include <linux/hrtimer.h> -//#define REG_RESET +static inline struct mbuf * +nm_os_get_mbuf(struct ifnet *ifp, int len) +{ + return alloc_skb(ifp->needed_headroom + len + + ifp->needed_tailroom, GFP_ATOMIC); +} #endif /* linux */ @@ -161,8 +226,21 @@ #include <dev/netmap/netmap_mem2.h> +#define for_each_kring_n(_i, _k, _karr, _n) \ + for (_k=_karr, _i = 0; _i < _n; (_k)++, (_i)++) + +#define for_each_tx_kring(_i, _k, _na) \ + for_each_kring_n(_i, _k, (_na)->tx_rings, (_na)->num_tx_rings) +#define for_each_tx_kring_h(_i, _k, _na) \ + for_each_kring_n(_i, _k, (_na)->tx_rings, (_na)->num_tx_rings + 1) + +#define for_each_rx_kring(_i, _k, _na) \ + for_each_kring_n(_i, _k, (_na)->rx_rings, (_na)->num_rx_rings) +#define for_each_rx_kring_h(_i, _k, _na) \ + for_each_kring_n(_i, _k, (_na)->rx_rings, (_na)->num_rx_rings + 1) + -/* ======================== usage stats =========================== */ +/* ======================== PERFORMANCE STATISTICS =========================== */ #ifdef RATE_GENERIC #define IFRATE(x) x @@ -170,6 +248,8 @@ unsigned long txpkt; unsigned long txsync; unsigned long txirq; + unsigned long txrepl; + unsigned long txdrop; unsigned long rxpkt; unsigned long rxirq; unsigned long rxsync; @@ -194,6 +274,8 @@ RATE_PRINTK(txpkt); RATE_PRINTK(txsync); RATE_PRINTK(txirq); + RATE_PRINTK(txrepl); + RATE_PRINTK(txdrop); RATE_PRINTK(rxpkt); RATE_PRINTK(rxsync); RATE_PRINTK(rxirq); @@ -230,94 +312,220 @@ * the poller threads. Differently from netmap_rx_irq(), we check * only NAF_NETMAP_ON instead of NAF_NATIVE_ON to enable the irq. 
*/ -static void -netmap_generic_irq(struct ifnet *ifp, u_int q, u_int *work_done) +void +netmap_generic_irq(struct netmap_adapter *na, u_int q, u_int *work_done) { - struct netmap_adapter *na = NA(ifp); if (unlikely(!nm_netmap_on(na))) return; - netmap_common_irq(ifp, q, work_done); + netmap_common_irq(na, q, work_done); +#ifdef RATE_GENERIC + if (work_done) + rate_ctx.new.rxirq++; + else + rate_ctx.new.txirq++; +#endif /* RATE_GENERIC */ } +static int +generic_netmap_unregister(struct netmap_adapter *na) +{ + struct netmap_generic_adapter *gna = (struct netmap_generic_adapter *)na; + struct netmap_kring *kring = NULL; + int i, r; + + if (na->active_fds == 0) { + D("Generic adapter %p goes off", na); + rtnl_lock(); + + na->na_flags &= ~NAF_NETMAP_ON; + + /* Release packet steering control. */ + nm_os_catch_tx(gna, 0); + + /* Stop intercepting packets on the RX path. */ + nm_os_catch_rx(gna, 0); + + rtnl_unlock(); + } + + for_each_rx_kring_h(r, kring, na) { + if (nm_kring_pending_off(kring)) { + D("RX ring %d of generic adapter %p goes off", r, na); + kring->nr_mode = NKR_NETMAP_OFF; + } + } + for_each_tx_kring_h(r, kring, na) { + if (nm_kring_pending_off(kring)) { + kring->nr_mode = NKR_NETMAP_OFF; + D("TX ring %d of generic adapter %p goes off", r, na); + } + } + + for_each_rx_kring(r, kring, na) { + /* Free the mbufs still pending in the RX queues, + * that did not end up into the corresponding netmap + * RX rings. */ + mbq_safe_purge(&kring->rx_queue); + nm_os_mitigation_cleanup(&gna->mit[r]); + } + + /* Decrement reference counter for the mbufs in the + * TX pools. These mbufs can be still pending in drivers, + * (e.g. this happens with virtio-net driver, which + * does lazy reclaiming of transmitted mbufs). */ + for_each_tx_kring(r, kring, na) { + /* We must remove the destructor on the TX event, + * because the destructor invokes netmap code, and + * the netmap module may disappear before the + * TX event is consumed. */ + mtx_lock_spin(&kring->tx_event_lock); + if (kring->tx_event) { + SET_MBUF_DESTRUCTOR(kring->tx_event, NULL); + } + kring->tx_event = NULL; + mtx_unlock_spin(&kring->tx_event_lock); + } + + if (na->active_fds == 0) { + nm_os_free(gna->mit); + + for_each_rx_kring(r, kring, na) { + mbq_safe_fini(&kring->rx_queue); + } + + for_each_tx_kring(r, kring, na) { + mtx_destroy(&kring->tx_event_lock); + if (kring->tx_pool == NULL) { + continue; + } + + for (i=0; i<na->num_tx_desc; i++) { + if (kring->tx_pool[i]) { + m_freem(kring->tx_pool[i]); + } + } + nm_os_free(kring->tx_pool); + kring->tx_pool = NULL; + } + +#ifdef RATE_GENERIC + if (--rate_ctx.refcount == 0) { + D("del_timer()"); + del_timer(&rate_ctx.timer); + } +#endif + } + + return 0; +} /* Enable/disable netmap mode for a generic network interface. */ static int generic_netmap_register(struct netmap_adapter *na, int enable) { struct netmap_generic_adapter *gna = (struct netmap_generic_adapter *)na; - struct mbuf *m; + struct netmap_kring *kring = NULL; int error; int i, r; - if (!na) + if (!na) { return EINVAL; + } -#ifdef REG_RESET - error = ifp->netdev_ops->ndo_stop(ifp); - if (error) { - return error; - } -#endif /* REG_RESET */ - - if (enable) { /* Enable netmap mode. */ - /* Init the mitigation support on all the rx queues. */ - gna->mit = malloc(na->num_rx_rings * sizeof(struct nm_generic_mit), - M_DEVBUF, M_NOWAIT | M_ZERO); + if (!enable) { + /* This is actually an unregif. 
/* Enable/disable netmap mode for a generic network interface. */ static int generic_netmap_register(struct netmap_adapter *na, int enable) { struct netmap_generic_adapter *gna = (struct netmap_generic_adapter *)na; - struct mbuf *m; + struct netmap_kring *kring = NULL; int error; int i, r; - if (!na) + if (!na) { return EINVAL; + } -#ifdef REG_RESET - error = ifp->netdev_ops->ndo_stop(ifp); - if (error) { - return error; - } -#endif /* REG_RESET */ - - if (enable) { /* Enable netmap mode. */ - /* Init the mitigation support on all the rx queues. */ - gna->mit = malloc(na->num_rx_rings * sizeof(struct nm_generic_mit), - M_DEVBUF, M_NOWAIT | M_ZERO); + if (!enable) { + /* This is actually an unregif. */ + return generic_netmap_unregister(na); + } + + if (na->active_fds == 0) { + D("Generic adapter %p goes on", na); + /* Do all memory allocations when (na->active_fds == 0) to + * simplify error management. */ + + /* Allocate memory for mitigation support on all the rx queues. */ + gna->mit = nm_os_malloc(na->num_rx_rings * sizeof(struct nm_generic_mit)); if (!gna->mit) { D("mitigation allocation failed"); error = ENOMEM; goto out; } - for (r=0; r<na->num_rx_rings; r++) - netmap_mitigation_init(&gna->mit[r], r, na); - /* Initialize the rx queue, as generic_rx_handler() can - * be called as soon as netmap_catch_rx() returns. - */ - for (r=0; r<na->num_rx_rings; r++) { - mbq_safe_init(&na->rx_rings[r].rx_queue); + for_each_rx_kring(r, kring, na) { + /* Init mitigation support. */ + nm_os_mitigation_init(&gna->mit[r], r, na); + + /* Initialize the rx queue, as generic_rx_handler() can + * be called as soon as nm_os_catch_rx() returns. + */ + mbq_safe_init(&kring->rx_queue); } /* - * Preallocate packet buffers for the tx rings. + * Prepare mbuf pools (parallel to the tx rings) for packet + * transmission. Don't preallocate the mbufs here; it's simpler + * to leave this task to txsync. */ - for (r=0; r<na->num_tx_rings; r++) - na->tx_rings[r].tx_pool = NULL; - for (r=0; r<na->num_tx_rings; r++) { - na->tx_rings[r].tx_pool = malloc(na->num_tx_desc * sizeof(struct mbuf *), - M_DEVBUF, M_NOWAIT | M_ZERO); - if (!na->tx_rings[r].tx_pool) { + for_each_tx_kring(r, kring, na) { + kring->tx_pool = NULL; + } + for_each_tx_kring(r, kring, na) { + kring->tx_pool = + nm_os_malloc(na->num_tx_desc * sizeof(struct mbuf *)); + if (!kring->tx_pool) { D("tx_pool allocation failed"); error = ENOMEM; goto free_tx_pools; } - for (i=0; i<na->num_tx_desc; i++) - na->tx_rings[r].tx_pool[i] = NULL; - for (i=0; i<na->num_tx_desc; i++) { - m = netmap_get_mbuf(NETMAP_BUF_SIZE(na)); - if (!m) { - D("tx_pool[%d] allocation failed", i); - error = ENOMEM; - goto free_tx_pools; - } - na->tx_rings[r].tx_pool[i] = m; - } + mtx_init(&kring->tx_event_lock, "tx_event_lock", + NULL, MTX_SPIN); } + } + + for_each_rx_kring_h(r, kring, na) { + if (nm_kring_pending_on(kring)) { + D("RX ring %d of generic adapter %p goes on", r, na); + kring->nr_mode = NKR_NETMAP_ON; + } + + } + for_each_tx_kring_h(r, kring, na) { + if (nm_kring_pending_on(kring)) { + D("TX ring %d of generic adapter %p goes on", r, na); + kring->nr_mode = NKR_NETMAP_ON; + } + } + + for_each_tx_kring(r, kring, na) { + /* Initialize tx_pool and tx_event. */ + for (i=0; i<na->num_tx_desc; i++) { + kring->tx_pool[i] = NULL; + } + + kring->tx_event = NULL; + } + + if (na->active_fds == 0) { rtnl_lock(); + + /* Prepare to intercept incoming traffic. */ - error = netmap_catch_rx(gna, 1); + error = nm_os_catch_rx(gna, 1); if (error) { - D("netdev_rx_handler_register() failed (%d)", error); + D("nm_os_catch_rx(1) failed (%d)", error); goto register_handler; } - na->na_flags |= NAF_NETMAP_ON; /* Make netmap control the packet steering. */ - netmap_catch_tx(gna, 1); + error = nm_os_catch_tx(gna, 1); + if (error) { + D("nm_os_catch_tx(1) failed (%d)", error); + goto catch_rx; + } rtnl_unlock(); + na->na_flags |= NAF_NETMAP_ON; + #ifdef RATE_GENERIC if (rate_ctx.refcount == 0) { D("setup_timer()"); @@ -329,75 +537,28 @@ } rate_ctx.refcount++; #endif /* RATE */ - - } else if (na->tx_rings[0].tx_pool) { - /* Disable netmap mode. We enter here only if the previous - generic_netmap_register(na, 1) was successful.
- If it was not, na->tx_rings[0].tx_pool was set to NULL by the - error handling code below. */ - rtnl_lock(); - - na->na_flags &= ~NAF_NETMAP_ON; - - /* Release packet steering control. */ - netmap_catch_tx(gna, 0); - - /* Do not intercept packets on the rx path. */ - netmap_catch_rx(gna, 0); - - rtnl_unlock(); - - /* Free the mbufs going to the netmap rings */ - for (r=0; r<na->num_rx_rings; r++) { - mbq_safe_purge(&na->rx_rings[r].rx_queue); - mbq_safe_destroy(&na->rx_rings[r].rx_queue); - } - - for (r=0; r<na->num_rx_rings; r++) - netmap_mitigation_cleanup(&gna->mit[r]); - free(gna->mit, M_DEVBUF); - - for (r=0; r<na->num_tx_rings; r++) { - for (i=0; i<na->num_tx_desc; i++) { - m_freem(na->tx_rings[r].tx_pool[i]); - } - free(na->tx_rings[r].tx_pool, M_DEVBUF); - } - -#ifdef RATE_GENERIC - if (--rate_ctx.refcount == 0) { - D("del_timer()"); - del_timer(&rate_ctx.timer); - } -#endif } -#ifdef REG_RESET - error = ifp->netdev_ops->ndo_open(ifp); - if (error) { - goto free_tx_pools; - } -#endif - return 0; + /* Here (na->active_fds == 0) holds. */ +catch_rx: + nm_os_catch_rx(gna, 0); register_handler: rtnl_unlock(); free_tx_pools: - for (r=0; r<na->num_tx_rings; r++) { - if (na->tx_rings[r].tx_pool == NULL) + for_each_tx_kring(r, kring, na) { + mtx_destroy(&kring->tx_event_lock); + if (kring->tx_pool == NULL) { continue; - for (i=0; i<na->num_tx_desc; i++) - if (na->tx_rings[r].tx_pool[i]) - m_freem(na->tx_rings[r].tx_pool[i]); - free(na->tx_rings[r].tx_pool, M_DEVBUF); - na->tx_rings[r].tx_pool = NULL; - } - for (r=0; r<na->num_rx_rings; r++) { - netmap_mitigation_cleanup(&gna->mit[r]); - mbq_safe_destroy(&na->rx_rings[r].rx_queue); + } + nm_os_free(kring->tx_pool); + kring->tx_pool = NULL; } - free(gna->mit, M_DEVBUF); + for_each_rx_kring(r, kring, na) { + mbq_safe_fini(&kring->rx_queue); + } + nm_os_free(gna->mit); out: return error; @@ -411,24 +572,67 @@ static void generic_mbuf_destructor(struct mbuf *m) { - netmap_generic_irq(MBUF_IFP(m), MBUF_TXQ(m), NULL); + struct netmap_adapter *na = NA(GEN_TX_MBUF_IFP(m)); + struct netmap_kring *kring; + unsigned int r = MBUF_TXQ(m); + unsigned int r_orig = r; + + if (unlikely(!nm_netmap_on(na) || r >= na->num_tx_rings)) { + D("Error: no netmap adapter on device %p", + GEN_TX_MBUF_IFP(m)); + return; + } + + /* + * First, clear the event mbuf. + * In principle, the event 'm' should match the one stored + * on ring 'r'. However, we check it explicitly to stay + * safe against lower layers (qdisc, driver, etc.) changing + * MBUF_TXQ(m) under our feet. If the match is not found + * on 'r', we try to see if it belongs to some other ring. + */ + for (;;) { + bool match = false; + + kring = &na->tx_rings[r]; + mtx_lock_spin(&kring->tx_event_lock); + if (kring->tx_event == m) { + kring->tx_event = NULL; + match = true; + } + mtx_unlock_spin(&kring->tx_event_lock); + + if (match) { + if (r != r_orig) { + RD(1, "event %p migrated: ring %u --> %u", + m, r_orig, r); + } + break; + } + + if (++r == na->num_tx_rings) r = 0; + + if (r == r_orig) { + RD(1, "Cannot match event %p", m); + return; + } + } + + /* Second, wake up clients. They will reclaim the event through + * txsync.
*/ + netmap_generic_irq(na, r, NULL); #ifdef __FreeBSD__ - if (netmap_verbose) - RD(5, "Tx irq (%p) queue %d index %d" , m, MBUF_TXQ(m), (int)(uintptr_t)m->m_ext.ext_arg1); - netmap_default_mbuf_destructor(m); -#endif /* __FreeBSD__ */ - IFRATE(rate_ctx.new.txirq++); + void_mbuf_dtor(m, NULL, NULL); +#endif } -extern int netmap_adaptive_io; - /* Record completed transmissions and update hwtail. * * The oldest tx buffer not yet completed is at nr_hwtail + 1, * nr_hwcur is the first unsent buffer. */ static u_int -generic_netmap_tx_clean(struct netmap_kring *kring) +generic_netmap_tx_clean(struct netmap_kring *kring, int txqdisc) { u_int const lim = kring->nkr_num_slots - 1; u_int nm_i = nm_next(kring->nr_hwtail, lim); @@ -436,39 +640,52 @@ u_int n = 0; struct mbuf **tx_pool = kring->tx_pool; + ND("hwcur = %d, hwtail = %d", kring->nr_hwcur, kring->nr_hwtail); + while (nm_i != hwcur) { /* buffers not completed */ struct mbuf *m = tx_pool[nm_i]; - if (unlikely(m == NULL)) { - /* this is done, try to replenish the entry */ - tx_pool[nm_i] = m = netmap_get_mbuf(NETMAP_BUF_SIZE(kring->na)); + if (txqdisc) { + if (m == NULL) { + /* Nothing to do, this is going + * to be replenished. */ + RD(3, "Is this happening?"); + + } else if (MBUF_QUEUED(m)) { + break; /* Not dequeued yet. */ + + } else if (MBUF_REFCNT(m) != 1) { + /* This mbuf has been dequeued but is still busy + * (refcount is 2). + * Leave it to the driver and replenish. */ + m_freem(m); + tx_pool[nm_i] = NULL; + } + + } else { if (unlikely(m == NULL)) { - D("mbuf allocation failed, XXX error"); - // XXX how do we proceed ? break ? - return -ENOMEM; + int event_consumed; + + /* This slot was used to place an event. */ + mtx_lock_spin(&kring->tx_event_lock); + event_consumed = (kring->tx_event == NULL); + mtx_unlock_spin(&kring->tx_event_lock); + if (!event_consumed) { + /* The event has not been consumed yet, + * still busy in the driver. */ + break; + } + /* The event has been consumed, we can go + * ahead. */ + + } else if (MBUF_REFCNT(m) != 1) { + /* This mbuf is still busy: its refcnt is 2. */ + break; + } } + n++; nm_i = nm_next(nm_i, lim); } kring->nr_hwtail = nm_prev(nm_i, lim); ND("tx completed [%d] -> hwtail %d", n, kring->nr_hwtail); @@ -476,23 +693,17 @@ return n; } - -/* - * We have pending packets in the driver between nr_hwtail +1 and hwcur. - * Compute a position in the middle, to be used to generate - * a notification. - */ +/* Compute a slot index midway between inf and sup. */ static inline u_int -generic_tx_event_middle(struct netmap_kring *kring, u_int hwcur) +ring_middle(u_int inf, u_int sup, u_int lim) { - u_int n = kring->nkr_num_slots; - u_int ntc = nm_next(kring->nr_hwtail, n-1); + u_int n = lim + 1; u_int e; - if (hwcur >= ntc) { - e = (hwcur + ntc) / 2; + if (sup >= inf) { + e = (sup + inf) / 2; } else { /* wrap around */ - e = (hwcur + n + ntc) / 2; + e = (sup + n + inf) / 2; if (e >= n) { e -= n; } @@ -506,35 +717,59 @@ return e; }
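
/*
 * Worked example for ring_middle() (illustrative, not part of the
 * patch): on a 1024-slot ring (lim = 1023), with inf = 1000 and
 * sup = 100 the pending span wraps around, so the wrap branch gives
 *
 *   e = (100 + 1024 + 1000) / 2 = 1062, folded to 1062 - 1024 = 38,
 *
 * i.e. 62 slots past inf on a 124-slot span, as expected.
 */
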
-/* - * We have pending packets in the driver between nr_hwtail+1 and hwcur. - * Schedule a notification approximately in the middle of the two. - * There is a race but this is only called within txsync which does - * a double check. - */ static void generic_set_tx_event(struct netmap_kring *kring, u_int hwcur) { + u_int lim = kring->nkr_num_slots - 1; struct mbuf *m; u_int e; + u_int ntc = nm_next(kring->nr_hwtail, lim); /* next to clean */ - if (nm_next(kring->nr_hwtail, kring->nkr_num_slots -1) == hwcur) { + if (ntc == hwcur) { return; /* all buffers are free */ } - e = generic_tx_event_middle(kring, hwcur); + + /* + * We have pending packets in the driver between hwtail+1 + * and hwcur, and we have to choose one of these slots to + * generate a notification. + * There is a race but this is only called within txsync which + * does a double check. + */ +#if 0 + /* Choose a slot in the middle, so that we don't risk ending + * up in a situation where the client continuously wakes up, + * fills one or a few TX slots, and goes to sleep again. */ + e = ring_middle(ntc, hwcur, lim); +#else + /* Choose the first pending slot, to be safe against driver + * reordering mbuf transmissions. */ + e = ntc; +#endif m = kring->tx_pool[e]; - ND(5, "Request Event at %d mbuf %p refcnt %d", e, m, m ? GET_MBUF_REFCNT(m) : -2 ); if (m == NULL) { - /* This can happen if there is already an event on the netmap - slot 'e': There is nothing to do. */ + /* An event is already in place. */ return; } - kring->tx_pool[e] = NULL; + + mtx_lock_spin(&kring->tx_event_lock); + if (kring->tx_event) { + /* An event is already in place. */ + mtx_unlock_spin(&kring->tx_event_lock); + return; + } + SET_MBUF_DESTRUCTOR(m, generic_mbuf_destructor); + kring->tx_event = m; + mtx_unlock_spin(&kring->tx_event_lock); + + kring->tx_pool[e] = NULL; + + ND(5, "Request Event at %d mbuf %p refcnt %d", e, m, m ? MBUF_REFCNT(m) : -2 ); - // XXX wmb() ? - /* Decrement the refcount an free it if we have the last one. */ + /* Decrement the refcount. This will free it if we lose the race + * with the driver. */ m_freem(m); smp_mb(); }
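
/*
 * The event handshake sketched end to end (condensed from
 * generic_set_tx_event(), generic_mbuf_destructor() and the
 * tx_clean pass; illustrative only):
 *
 * 1. txsync runs out of completed slots and calls
 *    generic_set_tx_event(), which arms generic_mbuf_destructor()
 *    on one in-flight mbuf and drops the pool reference
 *    (refcount 2 -> 1).
 * 2. The driver eventually frees that mbuf; the last reference
 *    goes away and mb_free_ext() runs the destructor.
 * 3. The destructor clears kring->tx_event under tx_event_lock and
 *    calls netmap_generic_irq() to wake up the client.
 * 4. The client re-enters txsync; generic_netmap_tx_clean() sees
 *    the consumed event (tx_event == NULL) and advances nr_hwtail.
 */
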
@@ -551,6 +786,7 @@ generic_netmap_txsync(struct netmap_kring *kring, int flags) { struct netmap_adapter *na = kring->na; + struct netmap_generic_adapter *gna = (struct netmap_generic_adapter *)na; struct ifnet *ifp = na->ifp; struct netmap_ring *ring = kring->ring; u_int nm_i; /* index into the netmap ring */ // j @@ -560,8 +796,6 @@ IFRATE(rate_ctx.new.txsync++); - // TODO: handle the case of mbuf allocation failure - rmb(); /* @@ -569,72 +803,121 @@ */ nm_i = kring->nr_hwcur; if (nm_i != head) { /* we have new packets to send */ + struct nm_os_gen_arg a; + u_int event = -1; + + if (gna->txqdisc && nm_kr_txempty(kring)) { + /* In txqdisc mode, we ask for a delayed notification, + * but only when cur == hwtail, which means that the + * client is going to block. */ + event = ring_middle(nm_i, head, lim); + ND(3, "Place txqdisc event (hwcur=%u,event=%u," + "head=%u,hwtail=%u)", nm_i, event, head, + kring->nr_hwtail); + } + + a.ifp = ifp; + a.ring_nr = ring_nr; + a.head = a.tail = NULL; + while (nm_i != head) { struct netmap_slot *slot = &ring->slot[nm_i]; u_int len = slot->len; void *addr = NMB(na, slot); - /* device-specific */ struct mbuf *m; int tx_ret; NM_CHECK_ADDR_LEN(na, addr, len); - /* Tale a mbuf from the tx pool and copy in the user packet. */ + /* Take an mbuf from the tx pool (replenishing the pool + * entry if necessary) and copy in the user packet. */ m = kring->tx_pool[nm_i]; - if (unlikely(!m)) { - RD(5, "This should never happen"); - kring->tx_pool[nm_i] = m = netmap_get_mbuf(NETMAP_BUF_SIZE(na)); - if (unlikely(m == NULL)) { - D("mbuf allocation failed"); + if (unlikely(m == NULL)) { + kring->tx_pool[nm_i] = m = + nm_os_get_mbuf(ifp, NETMAP_BUF_SIZE(na)); + if (m == NULL) { + RD(2, "Failed to replenish mbuf"); + /* Here we could schedule a timer which + * retries to replenish after a while, + * and notifies the client when it + * manages to replenish some slots. In + * any case we break early to avoid + * crashes. */ break; } + IFRATE(rate_ctx.new.txrepl++); } - /* XXX we should ask notifications when NS_REPORT is set, - * or roughly every half frame. We can optimize this - * by lazily requesting notifications only when a - * transmission fails. Probably the best way is to - * break on failures and set notifications when - * ring->cur == ring->tail || nm_i != cur + + a.m = m; + a.addr = addr; + a.len = len; + a.qevent = (nm_i == event); + /* When not in txqdisc mode, we should ask for + * notifications when NS_REPORT is set, or roughly + * every half ring. To optimize this, we set a + * notification event when the client runs out of + * TX ring space, or when transmission fails. In + * the latter case we also break early. */ - tx_ret = generic_xmit_frame(ifp, m, addr, len, ring_nr); + tx_ret = nm_os_generic_xmit_frame(&a); if (unlikely(tx_ret)) { - ND(5, "start_xmit failed: err %d [nm_i %u, head %u, hwtail %u]", - tx_ret, nm_i, head, kring->nr_hwtail); - /* - * No room for this mbuf in the device driver. - * Request a notification FOR A PREVIOUS MBUF, - * then call generic_netmap_tx_clean(kring) to do the - * double check and see if we can free more buffers. - * If there is space continue, else break; - * NOTE: the double check is necessary if the problem - * occurs in the txsync call after selrecord(). - * Also, we need some way to tell the caller that not - * all buffers were queued onto the device (this was - * not a problem with native netmap driver where space - * is preallocated). The bridge has a similar problem - * and we solve it there by dropping the excess packets. - */ - generic_set_tx_event(kring, nm_i); - if (generic_netmap_tx_clean(kring)) { /* space now available */ - continue; - } else { - break; + if (!gna->txqdisc) { + /* + * No room for this mbuf in the device driver. + * Request a notification FOR A PREVIOUS MBUF, + * then call generic_netmap_tx_clean(kring) to do the + * double check and see if we can free more buffers. + * If there is space continue, else break; + * NOTE: the double check is necessary if the problem + * occurs in the txsync call after selrecord(). + * Also, we need some way to tell the caller that not + * all buffers were queued onto the device (this was + * not a problem with native netmap driver where space + * is preallocated). The bridge has a similar problem + * and we solve it there by dropping the excess packets. + */ + generic_set_tx_event(kring, nm_i); + if (generic_netmap_tx_clean(kring, gna->txqdisc)) { + /* space now available */ + continue; + } else { + break; + } } + + /* In txqdisc mode, the netmap-aware qdisc + * queue has the same length as the number of + * netmap slots (N). Since tail is advanced + * only when packets are dequeued, qdisc + * queue overrun cannot happen, so + * nm_os_generic_xmit_frame() did not fail + * because of that.
+ * However, packets can be dropped because + * carrier is off, or because our qdisc is + * being deactivated, or possibly for other + * reasons. In these cases, we just let the + * packet be dropped. */ + IFRATE(rate_ctx.new.txdrop++); } + slot->flags &= ~(NS_REPORT | NS_BUF_CHANGED); nm_i = nm_next(nm_i, lim); - IFRATE(rate_ctx.new.txpkt ++); + IFRATE(rate_ctx.new.txpkt++); } - - /* Update hwcur to the next slot to transmit. */ - kring->nr_hwcur = nm_i; /* not head, we could break early */ + if (a.head != NULL) { + a.addr = NULL; + nm_os_generic_xmit_frame(&a); + } + /* Update hwcur to the next slot to transmit. Here nm_i + * is not necessarily head, we could break early. */ + kring->nr_hwcur = nm_i; } /* * Second, reclaim completed buffers */ - if (flags & NAF_FORCE_RECLAIM || nm_kr_txempty(kring)) { + if (!gna->txqdisc && (flags & NAF_FORCE_RECLAIM || nm_kr_txempty(kring))) { /* No more available slots? Set a notification event * on a netmap slot that will be cleaned in the future. * No doublecheck is performed, since txsync() will be @@ -642,58 +925,74 @@ */ generic_set_tx_event(kring, nm_i); } - ND("tx #%d, hwtail = %d", n, kring->nr_hwtail); - generic_netmap_tx_clean(kring); + generic_netmap_tx_clean(kring, gna->txqdisc); return 0; } /* - * This handler is registered (through netmap_catch_rx()) + * This handler is registered (through nm_os_catch_rx()) * within the attached network interface * in the RX subsystem, so that every mbuf passed up by * the driver can be stolen to the network stack. * Stolen packets are put in a queue where the * generic_netmap_rxsync() callback can extract them. + * Returns 1 if the packet was stolen, 0 otherwise. */ -void +int generic_rx_handler(struct ifnet *ifp, struct mbuf *m) { struct netmap_adapter *na = NA(ifp); struct netmap_generic_adapter *gna = (struct netmap_generic_adapter *)na; + struct netmap_kring *kring; u_int work_done; - u_int rr = MBUF_RXQ(m); // receive ring number + u_int r = MBUF_RXQ(m); /* receive ring number */ + + if (r >= na->num_rx_rings) { + r = r % na->num_rx_rings; + } + + kring = &na->rx_rings[r]; - if (rr >= na->num_rx_rings) { - rr = rr % na->num_rx_rings; // XXX expensive... + if (kring->nr_mode == NKR_NETMAP_OFF) { + /* We must not intercept this mbuf. */ + return 0; } /* limit the size of the queue */ - if (unlikely(mbq_len(&na->rx_rings[rr].rx_queue) > 1024)) { + if (unlikely(!gna->rxsg && MBUF_LEN(m) > NETMAP_BUF_SIZE(na))) { + /* This may happen when GRO/LRO features are enabled for + * the NIC driver while the generic adapter does not + * support RX scatter-gather. */ + RD(2, "Warning: driver pushed up big packet " + "(size=%d)", (int)MBUF_LEN(m)); + m_freem(m); + } else if (unlikely(mbq_len(&kring->rx_queue) > 1024)) { m_freem(m); } else { - mbq_safe_enqueue(&na->rx_rings[rr].rx_queue, m); + mbq_safe_enqueue(&kring->rx_queue, m); } if (netmap_generic_mit < 32768) { /* no rx mitigation, pass notification up */ - netmap_generic_irq(na->ifp, rr, &work_done); - IFRATE(rate_ctx.new.rxirq++); + netmap_generic_irq(na, r, &work_done); } else { /* same as send combining, filter notification if there is a * pending timer, otherwise pass it up and start a timer. */ - if (likely(netmap_mitigation_active(&gna->mit[rr]))) { + if (likely(nm_os_mitigation_active(&gna->mit[r]))) { /* Record that there is some pending work.
*/ - gna->mit[rr].mit_pending = 1; + gna->mit[r].mit_pending = 1; } else { - netmap_generic_irq(na->ifp, rr, &work_done); - IFRATE(rate_ctx.new.rxirq++); - netmap_mitigation_start(&gna->mit[rr]); + netmap_generic_irq(na, r, &work_done); + nm_os_mitigation_start(&gna->mit[r]); } } + + /* We have intercepted the mbuf. */ + return 1; } /* @@ -713,54 +1012,23 @@ u_int const head = kring->rhead; int force_update = (flags & NAF_FORCE_READ) || kring->nr_kflags & NKR_PENDINTR; + /* Adapter-specific variables. */ + uint16_t slot_flags = kring->nkr_slot_flags; + u_int nm_buf_len = NETMAP_BUF_SIZE(na); + struct mbq tmpq; + struct mbuf *m; + int avail; /* in bytes */ + int mlen; + int copy; + if (head > lim) return netmap_ring_reinit(kring); - /* - * First part: import newly received packets. - */ - if (netmap_no_pendintr || force_update) { - /* extract buffers from the rx queue, stop at most one - * slot before nr_hwcur (stop_i) - */ - uint16_t slot_flags = kring->nkr_slot_flags; - u_int stop_i = nm_prev(kring->nr_hwcur, lim); - - nm_i = kring->nr_hwtail; /* first empty slot in the receive ring */ - for (n = 0; nm_i != stop_i; n++) { - int len; - void *addr = NMB(na, &ring->slot[nm_i]); - struct mbuf *m; - - /* we only check the address here on generic rx rings */ - if (addr == NETMAP_BUF_BASE(na)) { /* Bad buffer */ - return netmap_ring_reinit(kring); - } - /* - * Call the locked version of the function. - * XXX Ideally we could grab a batch of mbufs at once - * and save some locking overhead. - */ - m = mbq_safe_dequeue(&kring->rx_queue); - if (!m) /* no more data */ - break; - len = MBUF_LEN(m); - m_copydata(m, 0, len, addr); - ring->slot[nm_i].len = len; - ring->slot[nm_i].flags = slot_flags; - m_freem(m); - nm_i = nm_next(nm_i, lim); - } - if (n) { - kring->nr_hwtail = nm_i; - IFRATE(rate_ctx.new.rxpkt += n); - } - kring->nr_kflags &= ~NKR_PENDINTR; - } + IFRATE(rate_ctx.new.rxsync++); - // XXX should we invert the order ? /* - * Second part: skip past packets that userspace has released. + * First part: skip past packets that userspace has released. + * This can possibly make room for the second part. */ nm_i = kring->nr_hwcur; if (nm_i != head) { @@ -773,7 +1041,106 @@ } kring->nr_hwcur = head; } - IFRATE(rate_ctx.new.rxsync++); + + /* + * Second part: import newly received packets. + */ + if (!netmap_no_pendintr && !force_update) { + return 0; + } + + nm_i = kring->nr_hwtail; /* First empty slot in the receive ring. */ + + /* Compute the available space (in bytes) in this netmap ring. + * The first slot that is not considered is the one before + * nr_hwcur. */ + + avail = nm_prev(kring->nr_hwcur, lim) - nm_i; + if (avail < 0) + avail += lim + 1; + avail *= nm_buf_len; + + /* First pass: While holding the lock on the RX mbuf queue, + * extract as many mbufs as fit in the available space, + * and put them in a temporary queue. + * To avoid performing a per-mbuf division (mlen / nm_buf_len) + * to update avail, we do the update in a while loop that we + * also use to set the RX slots, but without performing the copy. */ + mbq_init(&tmpq); + mbq_lock(&kring->rx_queue); + for (n = 0;; n++) { + m = mbq_peek(&kring->rx_queue); + if (!m) { + /* No more packets from the driver. */ + break; + } + + mlen = MBUF_LEN(m); + if (mlen > avail) { + /* No more space in the ring.
*/ + break; + } + + mbq_dequeue(&kring->rx_queue); + + while (mlen) { + copy = nm_buf_len; + if (mlen < copy) { + copy = mlen; + } + mlen -= copy; + avail -= nm_buf_len; + + ring->slot[nm_i].len = copy; + ring->slot[nm_i].flags = slot_flags | (mlen ? NS_MOREFRAG : 0); + nm_i = nm_next(nm_i, lim); + } + + mbq_enqueue(&tmpq, m); + } + mbq_unlock(&kring->rx_queue); + + /* Second pass: Drain the temporary queue, going over the used RX slots, + * and perform the copy out of the RX queue lock. */ + nm_i = kring->nr_hwtail; + + for (;;) { + void *nmaddr; + int ofs = 0; + int morefrag; + + m = mbq_dequeue(&tmpq); + if (!m) { + break; + } + + do { + nmaddr = NMB(na, &ring->slot[nm_i]); + /* We only check the address here on generic rx rings. */ + if (nmaddr == NETMAP_BUF_BASE(na)) { /* Bad buffer */ + m_freem(m); + mbq_purge(&tmpq); + mbq_fini(&tmpq); + return netmap_ring_reinit(kring); + } + + copy = ring->slot[nm_i].len; + m_copydata(m, ofs, copy, nmaddr); + ofs += copy; + morefrag = ring->slot[nm_i].flags & NS_MOREFRAG; + nm_i = nm_next(nm_i, lim); + } while (morefrag); + + m_freem(m); + } + + mbq_fini(&tmpq); + + if (n) { + kring->nr_hwtail = nm_i; + IFRATE(rate_ctx.new.rxpkt += n); + } + kring->nr_kflags &= ~NKR_PENDINTR; return 0; } @@ -787,9 +1154,8 @@ if (prev_na != NULL) { D("Released generic NA %p", gna); - if_rele(ifp); netmap_adapter_put(prev_na); - if (na->ifp == NULL) { + if (nm_iszombie(na)) { /* * The driver has been removed without releasing * the reference so we need to do it here. @@ -797,9 +1163,13 @@ netmap_adapter_put(prev_na); } } - WNA(ifp) = prev_na; - D("Restored native NA %p", prev_na); + NM_ATTACH_NA(ifp, prev_na); + /* + * netmap_detach_common(), which is called after this function, + * overrides WNA(ifp) if na->ifp is not NULL. + */ na->ifp = NULL; + D("Restored native NA %p", prev_na); } /* @@ -823,14 +1193,14 @@ num_tx_desc = num_rx_desc = netmap_generic_ringsize; /* starting point */ - generic_find_num_desc(ifp, &num_tx_desc, &num_rx_desc); /* ignore errors */ + nm_os_generic_find_num_desc(ifp, &num_tx_desc, &num_rx_desc); /* ignore errors */ ND("Netmap ring size: TX = %d, RX = %d", num_tx_desc, num_rx_desc); if (num_tx_desc == 0 || num_rx_desc == 0) { D("Device has no hw slots (tx %u, rx %u)", num_tx_desc, num_rx_desc); return EINVAL; } - gna = malloc(sizeof(*gna), M_DEVBUF, M_NOWAIT | M_ZERO); + gna = nm_os_malloc(sizeof(*gna)); if (gna == NULL) { D("no memory on attach, give up"); return ENOMEM; } @@ -855,12 +1225,23 @@ ND("[GNA] num_rx_queues(%d), real_num_rx_queues(%d)", ifp->num_rx_queues, ifp->real_num_rx_queues); - generic_find_num_queues(ifp, &na->num_tx_rings, &na->num_rx_rings); + nm_os_generic_find_num_queues(ifp, &na->num_tx_rings, &na->num_rx_rings); retval = netmap_attach_common(na); if (retval) { - free(gna, M_DEVBUF); + nm_os_free(gna); + return retval; } + gna->prev = NA(ifp); /* save old na */ + if (gna->prev != NULL) { + netmap_adapter_get(gna->prev); + } + NM_ATTACH_NA(ifp, na); + + nm_os_generic_set_features(gna); + + D("Created generic NA %p (prev %p)", gna, gna->prev); + return retval; } diff -u -r -N usr/src/sys/dev/netmap/netmap_kern.h /usr/src/sys/dev/netmap/netmap_kern.h --- usr/src/sys/dev/netmap/netmap_kern.h 2016-09-29 00:24:47.000000000 +0100 +++ /usr/src/sys/dev/netmap/netmap_kern.h 2016-12-01 09:51:28.714314000 +0000 @@ -1,6 +1,7 @@ /* - * Copyright (C) 2011-2014 Matteo Landi, Luigi Rizzo. All rights reserved. - * Copyright (C) 2013-2014 Universita` di Pisa. All rights reserved.
+ * Copyright (C) 2011-2014 Matteo Landi, Luigi Rizzo + * Copyright (C) 2013-2016 Universita` di Pisa + * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions @@ -25,7 +26,7 @@ */ /* - * $FreeBSD: releng/11.0/sys/dev/netmap/netmap_kern.h 285699 2015-07-19 18:07:25Z luigi $ + * $FreeBSD: head/sys/dev/netmap/netmap_kern.h 238985 2012-08-02 11:59:43Z luigi $ * * The header contains the definitions of constants and function * prototypes used only in kernelspace. @@ -48,23 +49,38 @@ #if defined(CONFIG_NETMAP_GENERIC) #define WITH_GENERIC #endif -#if defined(CONFIG_NETMAP_V1000) -#define WITH_V1000 +#if defined(CONFIG_NETMAP_PTNETMAP_GUEST) +#define WITH_PTNETMAP_GUEST +#endif +#if defined(CONFIG_NETMAP_PTNETMAP_HOST) +#define WITH_PTNETMAP_HOST +#endif +#if defined(CONFIG_NETMAP_SINK) +#define WITH_SINK #endif -#else /* not linux */ +#elif defined (_WIN32) +#define WITH_VALE // comment out to disable VALE support +#define WITH_PIPES +#define WITH_MONITOR +#define WITH_GENERIC +#else /* neither linux nor windows */ #define WITH_VALE // comment out to disable VALE support #define WITH_PIPES #define WITH_MONITOR #define WITH_GENERIC +#define WITH_PTNETMAP_HOST /* ptnetmap host support */ +#define WITH_PTNETMAP_GUEST /* ptnetmap guest support */ #endif #if defined(__FreeBSD__) +#include <sys/selinfo.h> #define likely(x) __builtin_expect((long)!!(x), 1L) #define unlikely(x) __builtin_expect((long)!!(x), 0L) +#define __user #define NM_LOCK_T struct mtx /* low level spinlock, used to protect queues */ @@ -76,9 +92,11 @@ #define NM_MTX_ASSERT(m) sx_assert(&(m), SA_XLOCKED) #define NM_SELINFO_T struct nm_selinfo +#define NM_SELRECORD_T struct thread #define MBUF_LEN(m) ((m)->m_pkthdr.len) -#define MBUF_IFP(m) ((m)->m_pkthdr.rcvif) -#define NM_SEND_UP(ifp, m) ((NA(ifp))->if_input)(ifp, m) +#define MBUF_TXQ(m) ((m)->m_pkthdr.flowid) +#define MBUF_TRANSMIT(na, ifp, m) ((na)->if_transmit(ifp, m)) +#define GEN_TX_MBUF_IFP(m) ((m)->m_pkthdr.rcvif) #define NM_ATOMIC_T volatile int // XXX ? /* atomic operations */ @@ -97,23 +115,20 @@ #endif #if __FreeBSD_version >= 1100027 -#define GET_MBUF_REFCNT(m) ((m)->m_ext.ext_cnt ? *((m)->m_ext.ext_cnt) : -1) -#define SET_MBUF_REFCNT(m, x) *((m)->m_ext.ext_cnt) = x -#define PNT_MBUF_REFCNT(m) ((m)->m_ext.ext_cnt) +#define MBUF_REFCNT(m) ((m)->m_ext.ext_count) +#define SET_MBUF_REFCNT(m, x) (m)->m_ext.ext_count = x #else -#define GET_MBUF_REFCNT(m) ((m)->m_ext.ref_cnt ? *((m)->m_ext.ref_cnt) : -1) +#define MBUF_REFCNT(m) ((m)->m_ext.ref_cnt ? *((m)->m_ext.ref_cnt) : -1) #define SET_MBUF_REFCNT(m, x) *((m)->m_ext.ref_cnt) = x -#define PNT_MBUF_REFCNT(m) ((m)->m_ext.ref_cnt) #endif -MALLOC_DECLARE(M_NETMAP); +#define MBUF_QUEUED(m) 1 struct nm_selinfo { struct selinfo si; struct mtx m; }; -void freebsd_selwakeup(struct nm_selinfo *si, int pri); // XXX linux struct, not used in FreeBSD struct net_device_ops { @@ -130,12 +145,16 @@ #define NM_LOCK_T safe_spinlock_t // see bsd_glue.h #define NM_SELINFO_T wait_queue_head_t #define MBUF_LEN(m) ((m)->len) -#define MBUF_IFP(m) ((m)->dev) -#define NM_SEND_UP(ifp, m) \ - do { \ - m->priority = NM_MAGIC_PRIORITY_RX; \ - netif_rx(m); \ - } while (0) +#define MBUF_TRANSMIT(na, ifp, m) \ + ({ \ + /* Avoid infinite recursion with generic. */ \ + m->priority = NM_MAGIC_PRIORITY_TX; \ + (((struct net_device_ops *)(na)->if_transmit)->ndo_start_xmit(m, ifp)); \ + 0; \ + }) + +/* See explanation in nm_os_generic_xmit_frame. 
*/ +#define GEN_TX_MBUF_IFP(m) ((struct ifnet *)skb_shinfo(m)->destructor_arg) #define NM_ATOMIC_T volatile long unsigned int @@ -158,7 +177,51 @@ #define NM_LOCK_T IOLock * #define NM_SELINFO_T struct selinfo #define MBUF_LEN(m) ((m)->m_pkthdr.len) -#define NM_SEND_UP(ifp, m) ((ifp)->if_input)(ifp, m) + +#elif defined (_WIN32) +#include "../../../WINDOWS/win_glue.h" + +#define NM_SELRECORD_T IO_STACK_LOCATION +#define NM_SELINFO_T win_SELINFO // see win_glue.h +#define NM_LOCK_T win_spinlock_t // see win_glue.h +#define NM_MTX_T KGUARDED_MUTEX /* OS-specific mutex (sleepable) */ + +#define NM_MTX_INIT(m) KeInitializeGuardedMutex(&m); +#define NM_MTX_DESTROY(m) do { (void)(m); } while (0) +#define NM_MTX_LOCK(m) KeAcquireGuardedMutex(&(m)) +#define NM_MTX_UNLOCK(m) KeReleaseGuardedMutex(&(m)) +#define NM_MTX_ASSERT(m) assert(&m.Count>0) + +//These linknames are for the NDIS driver +#define NETMAP_NDIS_LINKNAME_STRING L"\\DosDevices\\NMAPNDIS" +#define NETMAP_NDIS_NTDEVICE_STRING L"\\Device\\NMAPNDIS" + +//Definition of internal driver-to-driver ioctl codes +#define NETMAP_KERNEL_XCHANGE_POINTERS _IO('i', 180) +#define NETMAP_KERNEL_SEND_SHUTDOWN_SIGNAL _IO_direct('i', 195) + +//Empty data structures are not permitted by MSVC compiler +//XXX_ale, try to solve this problem struct net_device_ops{ + char data[1]; +}; +typedef struct ethtool_ops{ + char data[1]; +}; +typedef struct hrtimer{ + KTIMER timer; + BOOLEAN active; + KDPC deferred_proc; +}; + +/* MSVC does not have likely/unlikely support */ +#ifdef _MSC_VER +#define likely(x) (x) +#define unlikely(x) (x) +#else +#define likely(x) __builtin_expect((long)!!(x), 1L) +#define unlikely(x) __builtin_expect((long)!!(x), 0L) +#endif //_MSC_VER #else @@ -166,6 +229,13 @@ #endif /* end - platform-specific code */ +#ifndef _WIN32 /* support for emulated sysctl */ +#define SYSBEGIN(x) +#define SYSEND +#endif /* _WIN32 */ + +#define NM_ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x)) + #define NMG_LOCK_T NM_MTX_T #define NMG_LOCK_INIT() NM_MTX_INIT(netmap_global_lock) #define NMG_LOCK_DESTROY() NM_MTX_DESTROY(netmap_global_lock) @@ -200,8 +270,41 @@ struct nm_bridge; struct netmap_priv_d; +/* os-specific NM_SELINFO_T initialization/destruction functions */ +void nm_os_selinfo_init(NM_SELINFO_T *); +void nm_os_selinfo_uninit(NM_SELINFO_T *); + const char *nm_dump_buf(char *p, int len, int lim, char *dst); +void nm_os_selwakeup(NM_SELINFO_T *si); +void nm_os_selrecord(NM_SELRECORD_T *sr, NM_SELINFO_T *si); + +int nm_os_ifnet_init(void); +void nm_os_ifnet_fini(void); +void nm_os_ifnet_lock(void); +void nm_os_ifnet_unlock(void); + +void nm_os_get_module(void); +void nm_os_put_module(void); + +void netmap_make_zombie(struct ifnet *); +void netmap_undo_zombie(struct ifnet *); + +/* os independent alloc/realloc/free */ +void *nm_os_malloc(size_t); +void *nm_os_realloc(void *, size_t new_size, size_t old_size); +void nm_os_free(void *); + +/* passes a packet up to the host stack. + * If the packet is sent (or dropped) immediately it returns NULL, + * otherwise it links the packet to prev and returns m. + * In this case, a final call with m=NULL and prev != NULL will send up + * the entire chain to the host stack. + */ +void *nm_os_send_up(struct ifnet *, struct mbuf *m, struct mbuf *prev); + +int nm_os_mbuf_has_offld(struct mbuf *m); + #include "netmap_mbq.h" extern NMG_LOCK_T netmap_global_lock;
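
/*
 * Sketch of the nm_os_send_up() chaining convention documented above
 * (hypothetical helper, not part of the patch; the batch array is an
 * assumed caller-provided set of intercepted mbufs):
 */
static inline void
example_send_up_chain(struct ifnet *ifp, struct mbuf **batch, int n)
{
	struct mbuf *chain = NULL;
	int i;

	for (i = 0; i < n; i++) {
		/* Returns NULL if batch[i] was sent or dropped right
		 * away; otherwise it is linked into the chain. */
		chain = nm_os_send_up(ifp, batch[i], chain);
	}
	if (chain != NULL) {
		/* Final call: push the whole chain to the host stack. */
		nm_os_send_up(ifp, NULL, chain);
	}
}
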
@@ -298,6 +401,19 @@ uint32_t nr_kflags; /* private driver flags */ #define NKR_PENDINTR 0x1 // Pending interrupt. #define NKR_EXCLUSIVE 0x2 /* exclusive binding */ +#define NKR_FORWARD 0x4 /* (host ring only) there are + packets to forward + */ +#define NKR_NEEDRING 0x8 /* ring needed even if users==0 + * (used internally by pipes and + * by ptnetmap host ports) + */ + + uint32_t nr_mode; + uint32_t nr_pending_mode; +#define NKR_NETMAP_OFF 0x0 +#define NKR_NETMAP_ON 0x1 + uint32_t nkr_num_slots; /* @@ -343,13 +459,14 @@ * store incoming mbufs in a queue that is drained by * a rxsync. */ - struct mbuf **tx_pool; - // u_int nr_ntc; /* Emulation of a next-to-clean RX ring pointer. */ - struct mbq rx_queue; /* intercepted rx mbufs. */ + struct mbuf **tx_pool; + struct mbuf *tx_event; /* TX event used as a notification */ + NM_LOCK_T tx_event_lock; /* protects the tx_event mbuf */ + struct mbq rx_queue; /* intercepted rx mbufs. */ uint32_t users; /* existing bindings for this ring */ - uint32_t ring_id; /* debugging */ + uint32_t ring_id; /* kring identifier */ enum txrx tx; /* kind of ring (tx or rx) */ char name[64]; /* diagnostic */ @@ -371,9 +488,6 @@ struct netmap_kring *pipe; /* if this is a pipe ring, * pointer to the other end */ - struct netmap_ring *save_ring; /* pointer to hidden rings * (see netmap_pipe.c for details) */ #endif /* WITH_PIPES */ #ifdef WITH_VALE @@ -394,10 +508,30 @@ int (*mon_notify)(struct netmap_kring *kring, int flags); uint32_t mon_tail; /* last seen slot on rx */ - uint32_t mon_pos; /* index of this ring in the monitored ring array */ + uint32_t mon_pos[NR_TXRX]; /* index of this ring in the monitored ring array */ +#endif +} +#ifdef _WIN32 +__declspec(align(64)); +#else +__attribute__((__aligned__(64))); #endif -} __attribute__((__aligned__(64))); +/* return 1 iff the kring needs to be turned on */ +static inline int +nm_kring_pending_on(struct netmap_kring *kring) +{ + return kring->nr_pending_mode == NKR_NETMAP_ON && + kring->nr_mode == NKR_NETMAP_OFF; +} + +/* return 1 iff the kring needs to be turned off */ +static inline int +nm_kring_pending_off(struct netmap_kring *kring) +{ + return kring->nr_pending_mode == NKR_NETMAP_OFF && + kring->nr_mode == NKR_NETMAP_ON; +} /* return the next index, with wraparound */ static inline uint32_t @@ -513,6 +647,8 @@ */ #define NAF_HOST_RINGS 64 /* the adapter supports the host rings */ #define NAF_FORCE_NATIVE 128 /* the adapter is always NATIVE */ +#define NAF_PTNETMAP_HOST 256 /* the adapter supports ptnetmap in the host */ +#define NAF_ZOMBIE (1U<<30) /* the nic driver has been unloaded */ #define NAF_BUSY (1U<<31) /* the adapter is used internally and * cannot be registered from userspace */ @@ -591,10 +727,14 @@ * For hw devices this is typically a selwakeup(), * but for NIC/host ports attached to a switch (or vice-versa) * we also need to invoke the 'txsync' code downstream. + * This callback pointer is actually used only to initialize + * kring->nm_notify. + * Return values are the same as for netmap_rx_irq(). */ void (*nm_dtor)(struct netmap_adapter *); int (*nm_register)(struct netmap_adapter *, int onoff); + void (*nm_intr)(struct netmap_adapter *, int onoff); int (*nm_txsync)(struct netmap_kring *kring, int flags); int (*nm_rxsync)(struct netmap_kring *kring, int flags); @@ -639,14 +779,14 @@ /* memory allocator (opaque) * We also cache a pointer to the lut_entry for translating - * buffer addresses, and the total number of buffers. + * buffer addresses, the total number of buffers and the buffer size.
*/ struct netmap_mem_d *nm_mem; struct netmap_lut na_lut; /* additional information attached to this adapter * by other netmap subsystems. Currently used by - * bwrap and LINUX/v1000. + * bwrap, LINUX/v1000 and ptnetmap */ void *na_private; @@ -655,6 +795,9 @@ int na_next_pipe; /* next free slot in the array */ int na_max_pipes; /* size of the array */ + /* Offset of ethernet header for each packet. */ + u_int virt_hdr_len; + char name[64]; }; @@ -720,8 +863,6 @@ struct nm_bridge *na_bdg; int retry; - /* Offset of ethernet header for each packet. */ - u_int virt_hdr_len; /* Maximum Frame Size, used in bdg_mismatch_datapath() */ u_int mfs; /* Last source MAC on this port */ @@ -766,6 +907,13 @@ #ifdef linux netdev_tx_t (*save_start_xmit)(struct mbuf *, struct ifnet *); #endif + /* Is the adapter able to use multiple RX slots to scatter + * each packet pushed up by the driver? */ + int rxsg; + + /* Is the transmission path controlled by a netmap-aware + * device queue (i.e. qdisc on linux)? */ + int txqdisc; }; #endif /* WITH_GENERIC */ @@ -776,7 +924,7 @@ } #ifdef WITH_VALE - +struct nm_bdg_polling_state; /* * Bridge wrapper for non VALE ports attached to a VALE switch. * @@ -826,9 +974,6 @@ struct netmap_vp_adapter host; /* for host rings */ struct netmap_adapter *hwna; /* the underlying device */ - /* backup of the hwna memory allocator */ - struct netmap_mem_d *save_nmd; - /* * When we attach a physical interface to the bridge, we * allow the controlling process to terminate, so we need @@ -837,10 +982,10 @@ * are attached to a bridge. */ struct netmap_priv_d *na_kpriv; + struct nm_bdg_polling_state *na_polling_state; }; int netmap_bwrap_attach(const char *name, struct netmap_adapter *); - #endif /* WITH_VALE */ #ifdef WITH_PIPES @@ -875,56 +1020,122 @@ return space; } +/* return slots reserved to tx clients */ +#define nm_kr_txspace(_k) nm_kr_rxspace(_k) -/* True if no space in the tx ring. only valid after txsync_prologue */ + +/* True if no space in the tx ring, only valid after txsync_prologue */ static inline int nm_kr_txempty(struct netmap_kring *kring) { return kring->rcur == kring->nr_hwtail; } +/* True if no more completed slots in the rx ring, only valid after + * rxsync_prologue */ +#define nm_kr_rxempty(_k) nm_kr_txempty(_k) /* * protect against multiple threads using the same ring. - * also check that the ring has not been stopped. - * We only care for 0 or !=0 as a return code. + * also check that the ring has not been stopped or locked */ -#define NM_KR_BUSY 1 -#define NM_KR_STOPPED 2 +#define NM_KR_BUSY 1 /* some other thread is syncing the ring */ +#define NM_KR_STOPPED 2 /* unbounded stop (ifconfig down or driver unload) */ +#define NM_KR_LOCKED 3 /* bounded, brief stop for mutual exclusion */ +/* release the previously acquired right to use the *sync() methods of the ring */ static __inline void nm_kr_put(struct netmap_kring *kr) { NM_ATOMIC_CLEAR(&kr->nr_busy); } -static __inline int nm_kr_tryget(struct netmap_kring *kr) +/* true if the ifp that backed the adapter has disappeared (e.g., the + * driver has been unloaded) + */ +static inline int nm_iszombie(struct netmap_adapter *na); + +/* try to obtain exclusive right to issue the *sync() operations on the ring. + * The right is obtained and must be later relinquished via nm_kr_put() if and + * only if nm_kr_tryget() returns 0. 
+ * If can_sleep is 1 there are only two other possible outcomes: + * - the function returns NM_KR_BUSY + * - the function returns NM_KR_STOPPED and sets the POLLERR bit in *perr + * (if non-null) + * In both cases the caller will typically skip the ring, possibly collecting + * errors along the way. + * If the calling context does not allow sleeping, the caller must pass 0 in can_sleep. + * In the latter case, the function may also return NM_KR_LOCKED and leave *perr + * untouched: ideally, the caller should try again at a later time. + */ +static __inline int nm_kr_tryget(struct netmap_kring *kr, int can_sleep, int *perr) { + int busy = 1, stopped; /* check a first time without taking the lock * to avoid starvation for nm_kr_get() */ - if (unlikely(kr->nkr_stopped)) { - ND("ring %p stopped (%d)", kr, kr->nkr_stopped); - return NM_KR_STOPPED; +retry: + stopped = kr->nkr_stopped; + if (unlikely(stopped)) { + goto stop; + } + busy = NM_ATOMIC_TEST_AND_SET(&kr->nr_busy); + /* we should not return NM_KR_BUSY if the ring was + * actually stopped, so check another time after + * the barrier provided by the atomic operation + */ + stopped = kr->nkr_stopped; + if (unlikely(stopped)) { + goto stop; + } + + if (unlikely(nm_iszombie(kr->na))) { + stopped = NM_KR_STOPPED; + goto stop; } - if (unlikely(NM_ATOMIC_TEST_AND_SET(&kr->nr_busy))) - return NM_KR_BUSY; - /* check a second time with lock held */ - if (unlikely(kr->nkr_stopped)) { - ND("ring %p stopped (%d)", kr, kr->nkr_stopped); + + return unlikely(busy) ? NM_KR_BUSY : 0; + +stop: + if (!busy) nm_kr_put(kr); - return NM_KR_STOPPED; + if (stopped == NM_KR_STOPPED) { +/* if POLLERR is defined we want to use it to simplify netmap_poll(). + * Otherwise, any non-zero value will do. + */ +#ifdef POLLERR +#define NM_POLLERR POLLERR +#else +#define NM_POLLERR 1 +#endif /* POLLERR */ + if (perr) + *perr |= NM_POLLERR; +#undef NM_POLLERR + } else if (can_sleep) { + tsleep(kr, 0, "NM_KR_TRYGET", 4); + goto retry; } - return 0; + return stopped; } -static __inline void nm_kr_get(struct netmap_kring *kr) +/* put the ring in the 'stopped' state and wait for the current user (if any) to + * notice. stopped must be either NM_KR_STOPPED or NM_KR_LOCKED + */ +static __inline void nm_kr_stop(struct netmap_kring *kr, int stopped) { + kr->nkr_stopped = stopped; while (NM_ATOMIC_TEST_AND_SET(&kr->nr_busy)) tsleep(kr, 0, "NM_KR_GET", 4); } +/* restart a ring after a stop */ +static __inline void nm_kr_start(struct netmap_kring *kr) +{ + kr->nkr_stopped = 0; + nm_kr_put(kr); +} + /* * The following functions are used by individual drivers to @@ -952,10 +1163,26 @@ enum txrx tx, u_int n, u_int new_cur); int netmap_ring_reinit(struct netmap_kring *); +/* Return codes for netmap_*x_irq. */ +enum { + /* Driver should do normal interrupt processing, e.g. because + * the interface is not in netmap mode. */ + NM_IRQ_PASS = 0, + /* Port is in netmap mode, and the interrupt work has been + * completed. The driver does not have to notify netmap + * again before the next interrupt. */ + NM_IRQ_COMPLETED = -1, + /* Port is in netmap mode, but the interrupt work has not been + * completed. The driver has to make sure netmap will be + * notified again soon, even if no more interrupts come (e.g. + * on Linux the driver should not call napi_complete()). 
*/ + NM_IRQ_RESCHED = -2, +}; + /* default functions to handle rx/tx interrupts */ int netmap_rx_irq(struct ifnet *, u_int, u_int *); #define netmap_tx_irq(_n, _q) netmap_rx_irq(_n, _q, NULL) -void netmap_common_irq(struct ifnet *, u_int, u_int *work_done); +int netmap_common_irq(struct netmap_adapter *, u_int, u_int *work_done); #ifdef WITH_VALE @@ -985,35 +1212,74 @@ return nm_netmap_on(na) && (na->na_flags & NAF_NATIVE); } +static inline int +nm_iszombie(struct netmap_adapter *na) +{ + return na == NULL || (na->na_flags & NAF_ZOMBIE); +} + +static inline void +nm_update_hostrings_mode(struct netmap_adapter *na) +{ + /* Process nr_mode and nr_pending_mode for host rings. */ + na->tx_rings[na->num_tx_rings].nr_mode = + na->tx_rings[na->num_tx_rings].nr_pending_mode; + na->rx_rings[na->num_rx_rings].nr_mode = + na->rx_rings[na->num_rx_rings].nr_pending_mode; +} + /* set/clear native flags and if_transmit/netdev_ops */ static inline void nm_set_native_flags(struct netmap_adapter *na) { struct ifnet *ifp = na->ifp; + /* We do the setup for intercepting packets only if we are the + * first user of this adapter. */ + if (na->active_fds > 0) { + return; + } + na->na_flags |= NAF_NETMAP_ON; #ifdef IFCAP_NETMAP /* or FreeBSD ? */ ifp->if_capenable |= IFCAP_NETMAP; #endif -#ifdef __FreeBSD__ +#if defined (__FreeBSD__) na->if_transmit = ifp->if_transmit; ifp->if_transmit = netmap_transmit; +#elif defined (_WIN32) + (void)ifp; /* prevent a warning */ + //XXX_ale can we just comment those? + //na->if_transmit = ifp->if_transmit; + //ifp->if_transmit = netmap_transmit; #else na->if_transmit = (void *)ifp->netdev_ops; ifp->netdev_ops = &((struct netmap_hw_adapter *)na)->nm_ndo; ((struct netmap_hw_adapter *)na)->save_ethtool = ifp->ethtool_ops; ifp->ethtool_ops = &((struct netmap_hw_adapter*)na)->nm_eto; #endif + nm_update_hostrings_mode(na); } - static inline void nm_clear_native_flags(struct netmap_adapter *na) { struct ifnet *ifp = na->ifp; -#ifdef __FreeBSD__ + /* We undo the setup for intercepting packets only if we are the + * last user of this adapter. */ + if (na->active_fds > 0) { + return; + } + + nm_update_hostrings_mode(na); + +#if defined(__FreeBSD__) ifp->if_transmit = na->if_transmit; +#elif defined(_WIN32) + (void)ifp; /* prevent a warning */ + //XXX_ale can we just comment those? + //ifp->if_transmit = na->if_transmit; #else ifp->netdev_ops = (void *)na->if_transmit; ifp->ethtool_ops = ((struct netmap_hw_adapter*)na)->save_ethtool; @@ -1024,6 +1290,28 @@ #endif } +/* + * nm_*sync_prologue() functions are used in ioctl/poll and ptnetmap + * kthreads. + * We need the netmap_ring* parameter, because in ptnetmap it is decoupled + * from host kring. + * The user-space ring pointers (head/cur/tail) are shared through + * CSB between host and guest. + */ + +/* + * validates parameters in the ring/kring, returns a value for head + * If any error, returns ring_size to force a reinit. + */ +uint32_t nm_txsync_prologue(struct netmap_kring *, struct netmap_ring *); + + +/* + * validates parameters in the ring/kring, returns a value for head + * If any error, returns ring_size to force a reinit.
+ */ +uint32_t nm_rxsync_prologue(struct netmap_kring *, struct netmap_ring *); + /* check/fix address and len in tx rings */ #if 1 /* debug version */ @@ -1079,6 +1367,9 @@ */ void netmap_krings_delete(struct netmap_adapter *na); +int netmap_hw_krings_create(struct netmap_adapter *na); +void netmap_hw_krings_delete(struct netmap_adapter *na); + /* set the stopped/enabled status of ring * When stopping, they also wait for all current activity on the ring to * terminate. The status change is then notified using the na nm_notify @@ -1087,16 +1378,18 @@ void netmap_set_ring(struct netmap_adapter *, u_int ring_id, enum txrx, int stopped); /* set the stopped/enabled status of all rings of the adapter. */ void netmap_set_all_rings(struct netmap_adapter *, int stopped); -/* convenience wrappers for netmap_set_all_rings, used in drivers */ +/* convenience wrappers for netmap_set_all_rings */ void netmap_disable_all_rings(struct ifnet *); void netmap_enable_all_rings(struct ifnet *); int netmap_do_regif(struct netmap_priv_d *priv, struct netmap_adapter *na, uint16_t ringid, uint32_t flags); - +void netmap_do_unregif(struct netmap_priv_d *priv); u_int nm_bound_var(u_int *v, u_int dflt, u_int lo, u_int hi, const char *msg); -int netmap_get_na(struct nmreq *nmr, struct netmap_adapter **na, int create); +int netmap_get_na(struct nmreq *nmr, struct netmap_adapter **na, + struct ifnet **ifp, int create); +void netmap_unget_na(struct netmap_adapter *na, struct ifnet *ifp); int netmap_get_hw_na(struct ifnet *ifp, struct netmap_adapter **na); @@ -1123,12 +1416,11 @@ u_int netmap_bdg_learning(struct nm_bdg_fwd *ft, uint8_t *dst_ring, struct netmap_vp_adapter *); +#define NM_BRIDGES 8 /* number of bridges */ #define NM_BDG_MAXPORTS 254 /* up to 254 */ #define NM_BDG_BROADCAST NM_BDG_MAXPORTS #define NM_BDG_NOPORT (NM_BDG_MAXPORTS+1) -#define NM_NAME "vale" /* prefix for bridge port name */ - /* these are redefined in case of no VALE support */ int netmap_get_bdg_na(struct nmreq *nmr, struct netmap_adapter **na, int create); struct nm_bridge *netmap_init_bridges2(u_int); @@ -1180,14 +1472,13 @@ #endif /* Various prototypes */ -int netmap_poll(struct cdev *dev, int events, struct thread *td); +int netmap_poll(struct netmap_priv_d *, int events, NM_SELRECORD_T *td); int netmap_init(void); void netmap_fini(void); int netmap_get_memory(struct netmap_priv_d* p); void netmap_dtor(void *data); -int netmap_dtor_locked(struct netmap_priv_d *priv); -int netmap_ioctl(struct cdev *dev, u_long cmd, caddr_t data, int fflag, struct thread *td); +int netmap_ioctl(struct netmap_priv_d *priv, u_long cmd, caddr_t data, struct thread *); /* netmap_adapter creation/destruction */ @@ -1227,11 +1518,11 @@ /* * module variables */ -#define NETMAP_BUF_BASE(na) ((na)->na_lut.lut[0].vaddr) -#define NETMAP_BUF_SIZE(na) ((na)->na_lut.objsize) -extern int netmap_mitigate; // XXX not really used +#define NETMAP_BUF_BASE(_na) ((_na)->na_lut.lut[0].vaddr) +#define NETMAP_BUF_SIZE(_na) ((_na)->na_lut.objsize) extern int netmap_no_pendintr; -extern int netmap_verbose; // XXX debugging +extern int netmap_mitigate; +extern int netmap_verbose; /* for debugging */ enum { /* verbose flags */ NM_VERB_ON = 1, /* generic verbose */ NM_VERB_HOST = 0x2, /* verbose host stack */ @@ -1244,10 +1535,11 @@ }; extern int netmap_txsync_retry; +extern int netmap_flags; extern int netmap_generic_mit; extern int netmap_generic_ringsize; extern int netmap_generic_rings; -extern int netmap_use_count; +extern int netmap_generic_txqdisc; /* * NA returns a pointer to 
the struct netmap adapter from the ifp, @@ -1256,37 +1548,27 @@ #define NA(_ifp) ((struct netmap_adapter *)WNA(_ifp)) /* - * Macros to determine if an interface is netmap capable or netmap enabled. - * See the magic field in struct netmap_adapter. - */ -#ifdef __FreeBSD__ -/* - * on FreeBSD just use if_capabilities and if_capenable. - */ -#define NETMAP_CAPABLE(ifp) (NA(ifp) && \ - (ifp)->if_capabilities & IFCAP_NETMAP ) - -#define NETMAP_SET_CAPABLE(ifp) \ - (ifp)->if_capabilities |= IFCAP_NETMAP - -#else /* linux */ - -/* - * on linux: - * we check if NA(ifp) is set and its first element has a related + * On old versions of FreeBSD, NA(ifp) is a pspare. On linux we + * overload another pointer in the netdev. + * + * We check if NA(ifp) is set and its first element has a related * magic value. The capenable is within the struct netmap_adapter. */ #define NETMAP_MAGIC 0x52697a7a -#define NETMAP_CAPABLE(ifp) (NA(ifp) && \ +#define NM_NA_VALID(ifp) (NA(ifp) && \ ((uint32_t)(uintptr_t)NA(ifp) ^ NA(ifp)->magic) == NETMAP_MAGIC ) -#define NETMAP_SET_CAPABLE(ifp) \ - NA(ifp)->magic = ((uint32_t)(uintptr_t)NA(ifp)) ^ NETMAP_MAGIC +#define NM_ATTACH_NA(ifp, na) do { \ + WNA(ifp) = na; \ + if (NA(ifp)) \ + NA(ifp)->magic = \ + ((uint32_t)(uintptr_t)NA(ifp)) ^ NETMAP_MAGIC; \ +} while(0) -#endif /* linux */ +#define NM_IS_NATIVE(ifp) (NM_NA_VALID(ifp) && NA(ifp)->nm_dtor == netmap_hw_dtor) -#ifdef __FreeBSD__ +#if defined(__FreeBSD__) /* Assigns the device IOMMU domain to an allocator. * Returns -ENOMEM in case the domain is different */ @@ -1330,6 +1612,8 @@ } } +#elif defined(_WIN32) + #else /* linux */ int nm_iommu_group_id(bus_dma_tag_t dev); @@ -1340,8 +1624,8 @@ bus_dma_tag_t tag, bus_dmamap_t map, void *buf) { if (0 && map) { - *map = dma_map_single(na->pdev, buf, na->na_lut.objsize, - DMA_BIDIRECTIONAL); + *map = dma_map_single(na->pdev, buf, NETMAP_BUF_SIZE(na), + DMA_BIDIRECTIONAL); } } @@ -1349,11 +1633,11 @@ netmap_unload_map(struct netmap_adapter *na, bus_dma_tag_t tag, bus_dmamap_t map) { - u_int sz = na->na_lut.objsize; + u_int sz = NETMAP_BUF_SIZE(na); if (*map) { dma_unmap_single(na->pdev, *map, sz, - DMA_BIDIRECTIONAL); + DMA_BIDIRECTIONAL); } } @@ -1361,7 +1645,7 @@ netmap_reload_map(struct netmap_adapter *na, bus_dma_tag_t tag, bus_dmamap_t map, void *buf) { - u_int sz = na->na_lut.objsize; + u_int sz = NETMAP_BUF_SIZE(na); if (*map) { dma_unmap_single(na->pdev, *map, sz, @@ -1472,7 +1756,11 @@ struct lut_entry *lut = na->na_lut.lut; void *ret = (i >= na->na_lut.objtotal) ? lut[0].vaddr : lut[i].vaddr; +#ifndef _WIN32 *pp = (i >= na->na_lut.objtotal) ? lut[0].paddr : lut[i].paddr; +#else + *pp = (i >= na->na_lut.objtotal) ? (uint64_t)lut[0].paddr.QuadPart : (uint64_t)lut[i].paddr.QuadPart; +#endif return ret; } @@ -1496,8 +1784,9 @@ struct netmap_if * volatile np_nifp; /* netmap if descriptor. */ struct netmap_adapter *np_na; + struct ifnet *np_ifp; uint32_t np_flags; /* from the ioctl */ - u_int np_qfirst[NR_TXRX], + u_int np_qfirst[NR_TXRX], np_qlast[NR_TXRX]; /* range of tx/rx rings to scan */ uint16_t np_txpoll; /* XXX and also np_rxpoll ? 
*/ @@ -1511,6 +1800,26 @@ struct thread *np_td; /* kqueue, just debugging */ }; +struct netmap_priv_d *netmap_priv_new(void); +void netmap_priv_delete(struct netmap_priv_d *); + +static inline int nm_kring_pending(struct netmap_priv_d *np) +{ + struct netmap_adapter *na = np->np_na; + enum txrx t; + int i; + + for_rx_tx(t) { + for (i = np->np_qfirst[t]; i < np->np_qlast[t]; i++) { + struct netmap_kring *kring = &NMR(na, t)[i]; + if (kring->nr_mode != kring->nr_pending_mode) { + return 1; + } + } + } + return 0; +} + #ifdef WITH_MONITOR struct netmap_monitor_adapter { @@ -1529,13 +1838,36 @@ * native netmap support. */ int generic_netmap_attach(struct ifnet *ifp); +int generic_rx_handler(struct ifnet *ifp, struct mbuf *m); + +int nm_os_catch_rx(struct netmap_generic_adapter *gna, int intercept); +int nm_os_catch_tx(struct netmap_generic_adapter *gna, int intercept); + +/* + * the generic transmit routine is passed a structure to optionally + * build a queue of descriptors, in an OS-specific way. + * The payload is at addr, if non-null, and the routine should send or queue + * the packet, returning 0 if successful, 1 on failure. + * + * At the end, if head is non-null, there will be an additional call + * to the function with addr = NULL; this should tell the OS-specific + * routine to send the queue and free any resources. Failure is ignored. + */ +struct nm_os_gen_arg { + struct ifnet *ifp; + void *m; /* os-specific mbuf-like object */ + void *head, *tail; /* tailq, if the OS-specific routine needs to build one */ + void *addr; /* payload of current packet */ + u_int len; /* packet length */ + u_int ring_nr; /* ring number */ + u_int qevent; /* in txqdisc mode, place an event on this mbuf */ +}; + +int nm_os_generic_xmit_frame(struct nm_os_gen_arg *); +int nm_os_generic_find_num_desc(struct ifnet *ifp, u_int *tx, u_int *rx); +void nm_os_generic_find_num_queues(struct ifnet *ifp, u_int *txq, u_int *rxq); +void nm_os_generic_set_features(struct netmap_generic_adapter *gna); -int netmap_catch_rx(struct netmap_generic_adapter *na, int intercept); -void generic_rx_handler(struct ifnet *ifp, struct mbuf *m);; -void netmap_catch_tx(struct netmap_generic_adapter *na, int enable); -int generic_xmit_frame(struct ifnet *ifp, struct mbuf *m, void *addr, u_int len, u_int ring_nr); -int generic_find_num_desc(struct ifnet *ifp, u_int *tx, u_int *rx); -void generic_find_num_queues(struct ifnet *ifp, u_int *txq, u_int *rxq); static inline struct ifnet* netmap_generic_getifp(struct netmap_generic_adapter *gna) { if (gna->prev) return gna->prev->ifp; @@ -1545,6 +1877,8 @@ return gna->up.up.ifp; } +void netmap_generic_irq(struct netmap_adapter *na, u_int q, u_int *work_done);
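
/*
 * A condensed sketch (hypothetical helper, not part of the patch) of
 * the nm_os_generic_xmit_frame() calling convention described above:
 */
static inline void
example_xmit_one(struct ifnet *ifp, void *m, void *payload, u_int len)
{
	struct nm_os_gen_arg a;

	a.ifp = ifp;
	a.ring_nr = 0;
	a.head = a.tail = NULL;

	a.m = m;		/* OS-specific mbuf-like object */
	a.addr = payload;	/* payload of this packet */
	a.len = len;
	a.qevent = 0;
	(void)nm_os_generic_xmit_frame(&a);	/* send or queue it */

	if (a.head != NULL) {
		a.addr = NULL;	/* final call: flush the queue */
		(void)nm_os_generic_xmit_frame(&a);
	}
}
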
*/ -void netmap_mitigation_init(struct nm_generic_mit *mit, int idx, +void nm_os_mitigation_init(struct nm_generic_mit *mit, int idx, struct netmap_adapter *na); -void netmap_mitigation_start(struct nm_generic_mit *mit); -void netmap_mitigation_restart(struct nm_generic_mit *mit); -int netmap_mitigation_active(struct nm_generic_mit *mit); -void netmap_mitigation_cleanup(struct nm_generic_mit *mit); +void nm_os_mitigation_start(struct nm_generic_mit *mit); +void nm_os_mitigation_restart(struct nm_generic_mit *mit); +int nm_os_mitigation_active(struct nm_generic_mit *mit); +void nm_os_mitigation_cleanup(struct nm_generic_mit *mit); +#else /* !WITH_GENERIC */ +#define generic_netmap_attach(ifp) (EOPNOTSUPP) #endif /* WITH_GENERIC */ - - /* Shared declarations for the VALE switch. */ /* @@ -1655,22 +1989,110 @@ */ #define rawsum_t uint32_t -rawsum_t nm_csum_raw(uint8_t *data, size_t len, rawsum_t cur_sum); -uint16_t nm_csum_ipv4(struct nm_iphdr *iph); -void nm_csum_tcpudp_ipv4(struct nm_iphdr *iph, void *data, +rawsum_t nm_os_csum_raw(uint8_t *data, size_t len, rawsum_t cur_sum); +uint16_t nm_os_csum_ipv4(struct nm_iphdr *iph); +void nm_os_csum_tcpudp_ipv4(struct nm_iphdr *iph, void *data, size_t datalen, uint16_t *check); -void nm_csum_tcpudp_ipv6(struct nm_ipv6hdr *ip6h, void *data, +void nm_os_csum_tcpudp_ipv6(struct nm_ipv6hdr *ip6h, void *data, size_t datalen, uint16_t *check); -uint16_t nm_csum_fold(rawsum_t cur_sum); +uint16_t nm_os_csum_fold(rawsum_t cur_sum); void bdg_mismatch_datapath(struct netmap_vp_adapter *na, struct netmap_vp_adapter *dst_na, - struct nm_bdg_fwd *ft_p, struct netmap_ring *ring, + const struct nm_bdg_fwd *ft_p, + struct netmap_ring *dst_ring, u_int *j, u_int lim, u_int *howmany); /* persistent virtual port routines */ -int nm_vi_persist(const char *, struct ifnet **); -void nm_vi_detach(struct ifnet *); -void nm_vi_init_index(void); +int nm_os_vi_persist(const char *, struct ifnet **); +void nm_os_vi_detach(struct ifnet *); +void nm_os_vi_init_index(void); + +/* + * kernel thread routines + */ +struct nm_kthread; /* OS-specific kthread - opaque */ +typedef void (*nm_kthread_worker_fn_t)(void *data); + +/* kthread configuration */ +struct nm_kthread_cfg { + long type; /* kthread type/identifier */ + nm_kthread_worker_fn_t worker_fn; /* worker function */ + void *worker_private;/* worker parameter */ + int attach_user; /* attach kthread to user process */ +}; +/* kthread configuration */ +struct nm_kthread *nm_os_kthread_create(struct nm_kthread_cfg *cfg, + unsigned int cfgtype, + void *opaque); +int nm_os_kthread_start(struct nm_kthread *); +void nm_os_kthread_stop(struct nm_kthread *); +void nm_os_kthread_delete(struct nm_kthread *); +void nm_os_kthread_wakeup_worker(struct nm_kthread *nmk); +void nm_os_kthread_send_irq(struct nm_kthread *); +void nm_os_kthread_set_affinity(struct nm_kthread *, int); +u_int nm_os_ncpus(void); + +#ifdef WITH_PTNETMAP_HOST +/* + * netmap adapter for host ptnetmap ports + */ +struct netmap_pt_host_adapter { + struct netmap_adapter up; + + struct netmap_adapter *parent; + int (*parent_nm_notify)(struct netmap_kring *kring, int flags); + void *ptns; +}; +/* ptnetmap HOST routines */ +int netmap_get_pt_host_na(struct nmreq *nmr, struct netmap_adapter **na, int create); +int ptnetmap_ctl(struct nmreq *nmr, struct netmap_adapter *na); +static inline int +nm_ptnetmap_host_on(struct netmap_adapter *na) +{ + return na && na->na_flags & NAF_PTNETMAP_HOST; +} +#else /* !WITH_PTNETMAP_HOST */ +#define netmap_get_pt_host_na(nmr, _2, _3) \ + 
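
/*
 * The nm_os_csum_* helpers above follow the usual Internet-checksum
 * split: accumulate a 32-bit raw sum over the data, then fold carries
 * down to 16 bits and complement.  A hedged userspace sketch -- the
 * byte-order handling in the real, per-OS helpers may differ, and the
 * demo_* names are hypothetical.
 */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t rawsum_t;

/* accumulate 16-bit words into a 32-bit sum (no folding yet) */
static rawsum_t
demo_csum_raw(const uint8_t *data, size_t len, rawsum_t cur)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		cur += (uint32_t)(data[i] << 8 | data[i + 1]);
	if (len & 1)			/* odd trailing byte, zero-padded */
		cur += (uint32_t)(data[len - 1] << 8);
	return cur;
}

/* fold carries until the sum fits 16 bits, then complement */
static uint16_t
demo_csum_fold(rawsum_t cur)
{
	while (cur >> 16)
		cur = (cur & 0xffff) + (cur >> 16);
	return (uint16_t)~cur;
}

int
main(void)
{
	uint8_t hdr[4] = { 0x45, 0x00, 0x00, 0x54 };

	printf("csum=0x%04x\n",
	    demo_csum_fold(demo_csum_raw(hdr, sizeof(hdr), 0)));
	return 0;
}
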
((nmr)->nr_flags & (NR_PTNETMAP_HOST) ? EOPNOTSUPP : 0) +#define ptnetmap_ctl(_1, _2) EINVAL +#define nm_ptnetmap_host_on(_1) EINVAL +#endif /* !WITH_PTNETMAP_HOST */ + +#ifdef WITH_PTNETMAP_GUEST +/* ptnetmap GUEST routines */ + +/* + * netmap adapter for guest ptnetmap ports + */ +struct netmap_pt_guest_adapter { + /* The netmap adapter to be used by netmap applications. + * This field must be the first, to allow upcast. */ + struct netmap_hw_adapter hwup; + + /* The netmap adapter to be used by the driver. */ + struct netmap_hw_adapter dr; + + void *csb; + + /* Reference counter to track users of backend netmap port: the + * network stack and netmap clients. + * Used to decide when we need (de)allocate krings/rings and + * start (stop) ptnetmap kthreads. */ + int backend_regifs; + +}; + +int netmap_pt_guest_attach(struct netmap_adapter *na, void *csb, + unsigned int nifp_offset, unsigned int memid); +struct ptnet_ring; +bool netmap_pt_guest_txsync(struct ptnet_ring *ptring, struct netmap_kring *kring, + int flags); +bool netmap_pt_guest_rxsync(struct ptnet_ring *ptring, struct netmap_kring *kring, + int flags); +int ptnet_nm_krings_create(struct netmap_adapter *na); +void ptnet_nm_krings_delete(struct netmap_adapter *na); +void ptnet_nm_dtor(struct netmap_adapter *na); +#endif /* WITH_PTNETMAP_GUEST */ #endif /* _NET_NETMAP_KERN_H_ */ diff -u -r -N usr/src/sys/dev/netmap/netmap_mbq.c /usr/src/sys/dev/netmap/netmap_mbq.c --- usr/src/sys/dev/netmap/netmap_mbq.c 2016-09-29 00:24:47.000000000 +0100 +++ /usr/src/sys/dev/netmap/netmap_mbq.c 2016-11-23 16:57:57.850945000 +0000 @@ -1,5 +1,6 @@ /* - * Copyright (C) 2013-2014 Vincenzo Maffione. All rights reserved. + * Copyright (C) 2013-2014 Vincenzo Maffione + * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions @@ -24,12 +25,14 @@ */ /* - * $FreeBSD: releng/11.0/sys/dev/netmap/netmap_mbq.c 267177 2014-06-06 18:02:32Z luigi $ + * $FreeBSD$ */ #ifdef linux #include "bsd_glue.h" +#elif defined (_WIN32) +#include "win_glue.h" #else /* __FreeBSD__ */ #include <sys/param.h> #include <sys/lock.h> @@ -152,12 +155,12 @@ } -void mbq_safe_destroy(struct mbq *q) +void mbq_safe_fini(struct mbq *q) { mtx_destroy(&q->lock); } -void mbq_destroy(struct mbq *q) +void mbq_fini(struct mbq *q) { } diff -u -r -N usr/src/sys/dev/netmap/netmap_mbq.h /usr/src/sys/dev/netmap/netmap_mbq.h --- usr/src/sys/dev/netmap/netmap_mbq.h 2016-09-29 00:24:47.000000000 +0100 +++ /usr/src/sys/dev/netmap/netmap_mbq.h 2016-11-23 16:57:57.851245000 +0000 @@ -1,5 +1,6 @@ /* - * Copyright (C) 2013-2014 Vincenzo Maffione. All rights reserved. + * Copyright (C) 2013-2014 Vincenzo Maffione + * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions @@ -24,7 +25,7 @@ */ /* - * $FreeBSD: releng/11.0/sys/dev/netmap/netmap_mbq.h 270063 2014-08-16 15:00:01Z luigi $ + * $FreeBSD$ */ @@ -40,6 +41,8 @@ /* XXX probably rely on a previous definition of SPINLOCK_T */ #ifdef linux #define SPINLOCK_T safe_spinlock_t +#elif defined (_WIN32) +#define SPINLOCK_T win_spinlock_t #else #define SPINLOCK_T struct mtx #endif @@ -52,16 +55,21 @@ SPINLOCK_T lock; }; -/* XXX "destroy" does not match "init" as a name. - * We should also clarify whether init can be used while +/* We should clarify whether init can be used while * holding a lock, and whether mbq_safe_destroy() is a NOP. 
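
/*
 * "This field must be the first, to allow upcast" (hwup above) relies
 * on the C guarantee that a struct and its first member share an
 * address.  A minimal illustration with generic names; not part of
 * the patch.
 */
#include <assert.h>
#include <stdio.h>

struct base { int kind; };

struct derived {
	struct base up;		/* must be first: &d == (struct base *)&d */
	int extra;
};

int
main(void)
{
	struct derived d = { { 1 }, 42 };
	struct base *b = &d.up;

	/* code holding a struct base * can be handed a derived object
	 * and cast back safely, because both share one address */
	struct derived *back = (struct derived *)b;

	assert(back == &d);
	printf("kind=%d extra=%d\n", back->up.kind, back->extra);
	return 0;
}
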
*/ void mbq_init(struct mbq *q); -void mbq_destroy(struct mbq *q); +void mbq_fini(struct mbq *q); void mbq_enqueue(struct mbq *q, struct mbuf *m); struct mbuf *mbq_dequeue(struct mbq *q); void mbq_purge(struct mbq *q); +static inline struct mbuf * +mbq_peek(struct mbq *q) +{ + return q->head ? q->head : NULL; +} + static inline void mbq_lock(struct mbq *q) { @@ -76,7 +84,7 @@ void mbq_safe_init(struct mbq *q); -void mbq_safe_destroy(struct mbq *q); +void mbq_safe_fini(struct mbq *q); void mbq_safe_enqueue(struct mbq *q, struct mbuf *m); struct mbuf *mbq_safe_dequeue(struct mbq *q); void mbq_safe_purge(struct mbq *q); diff -u -r -N usr/src/sys/dev/netmap/netmap_mem2.c /usr/src/sys/dev/netmap/netmap_mem2.c --- usr/src/sys/dev/netmap/netmap_mem2.c 2016-09-29 00:24:47.000000000 +0100 +++ /usr/src/sys/dev/netmap/netmap_mem2.c 2016-11-23 16:57:57.852144000 +0000 @@ -1,5 +1,8 @@ /* - * Copyright (C) 2012-2014 Matteo Landi, Luigi Rizzo, Giuseppe Lettieri. All rights reserved. + * Copyright (C) 2012-2014 Matteo Landi + * Copyright (C) 2012-2016 Luigi Rizzo + * Copyright (C) 2012-2016 Giuseppe Lettieri + * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions @@ -33,10 +36,11 @@ #ifdef __FreeBSD__ #include <sys/cdefs.h> /* prerequisite */ -__FBSDID("$FreeBSD: releng/11.0/sys/dev/netmap/netmap_mem2.c 285349 2015-07-10 05:51:36Z luigi $"); +__FBSDID("$FreeBSD: head/sys/dev/netmap/netmap.c 241723 2012-10-19 09:41:45Z glebius $"); #include <sys/types.h> #include <sys/malloc.h> +#include <sys/kernel.h> /* MALLOC_DEFINE */ #include <sys/proc.h> #include <vm/vm.h> /* vtophys */ #include <vm/pmap.h> /* vtophys */ @@ -48,13 +52,26 @@ #include <net/vnet.h> #include <machine/bus.h> /* bus_dmamap_* */ +/* M_NETMAP only used in here */ +MALLOC_DECLARE(M_NETMAP); +MALLOC_DEFINE(M_NETMAP, "netmap", "Network memory map"); + #endif /* __FreeBSD__ */ +#ifdef _WIN32 +#include <win_glue.h> +#endif + #include <net/netmap.h> #include <dev/netmap/netmap_kern.h> +#include <net/netmap_virt.h> #include "netmap_mem2.h" -#define NETMAP_BUF_MAX_NUM 20*4096*2 /* large machine */ +#ifdef _WIN32_USE_SMALL_GENERIC_DEVICES_MEMORY +#define NETMAP_BUF_MAX_NUM 8*4096 /* if too big takes too much time to allocate */ +#else +#define NETMAP_BUF_MAX_NUM 20*4096*2 /* large machine */ +#endif #define NETMAP_POOL_MAX_NAMSZ 32 @@ -70,6 +87,9 @@ struct netmap_obj_params { u_int size; u_int num; + + u_int last_size; + u_int last_num; }; struct netmap_obj_pool { @@ -111,7 +131,7 @@ struct netmap_mem_ops { - void (*nmd_get_lut)(struct netmap_mem_d *, struct netmap_lut*); + int (*nmd_get_lut)(struct netmap_mem_d *, struct netmap_lut*); int (*nmd_get_info)(struct netmap_mem_d *, u_int *size, u_int *memflags, uint16_t *id); @@ -128,14 +148,13 @@ void (*nmd_rings_delete)(struct netmap_adapter *); }; -typedef uint16_t nm_memid_t; - struct netmap_mem_d { NMA_LOCK_T nm_mtx; /* protect the allocator */ u_int nm_totalsize; /* shorthand */ u_int flags; #define NETMAP_MEM_FINALIZED 0x1 /* preallocation done */ +#define NETMAP_MEM_HIDDEN 0x8 /* beeing prepared */ int lasterr; /* last error for curr config */ int active; /* active users */ int refcount; @@ -149,8 +168,16 @@ struct netmap_mem_d *prev, *next; struct netmap_mem_ops *ops; + + struct netmap_obj_params params[NETMAP_POOLS_NR]; + +#define NM_MEM_NAMESZ 16 + char name[NM_MEM_NAMESZ]; }; +/* + * XXX need to fix the case of t0 == void + */ #define NMD_DEFCB(t0, name) \ t0 \ netmap_mem_##name(struct 
netmap_mem_d *nmd) \ @@ -186,7 +213,7 @@ return na->nm_mem->ops->nmd_##name(na, a1); \ } -NMD_DEFCB1(void, get_lut, struct netmap_lut *); +NMD_DEFCB1(int, get_lut, struct netmap_lut *); NMD_DEFCB3(int, get_info, u_int *, u_int *, uint16_t *); NMD_DEFCB1(vm_paddr_t, ofstophys, vm_ooffset_t); static int netmap_mem_config(struct netmap_mem_d *); @@ -202,6 +229,13 @@ static int netmap_mem_map(struct netmap_obj_pool *, struct netmap_adapter *); static int netmap_mem_unmap(struct netmap_obj_pool *, struct netmap_adapter *); static int nm_mem_assign_group(struct netmap_mem_d *, struct device *); +static void nm_mem_release_id(struct netmap_mem_d *); + +nm_memid_t +netmap_mem_get_id(struct netmap_mem_d *nmd) +{ + return nmd->nm_id; +} #define NMA_LOCK_INIT(n) NM_MTX_INIT((n)->nm_mtx) #define NMA_LOCK_DESTROY(n) NM_MTX_DESTROY((n)->nm_mtx) @@ -215,29 +249,30 @@ #define NM_DBG_REFC(nmd, func, line) #endif -#ifdef NM_DEBUG_MEM_PUTGET -void __netmap_mem_get(struct netmap_mem_d *nmd, const char *func, int line) -#else -void netmap_mem_get(struct netmap_mem_d *nmd) -#endif +/* circular list of all existing allocators */ +static struct netmap_mem_d *netmap_last_mem_d = &nm_mem; +NM_MTX_T nm_mem_list_lock; + +struct netmap_mem_d * +netmap_mem_get(struct netmap_mem_d *nmd) { - NMA_LOCK(nmd); + NM_MTX_LOCK(nm_mem_list_lock); nmd->refcount++; NM_DBG_REFC(nmd, func, line); - NMA_UNLOCK(nmd); + NM_MTX_UNLOCK(nm_mem_list_lock); + return nmd; } -#ifdef NM_DEBUG_MEM_PUTGET -void __netmap_mem_put(struct netmap_mem_d *nmd, const char *func, int line) -#else -void netmap_mem_put(struct netmap_mem_d *nmd) -#endif +void +netmap_mem_put(struct netmap_mem_d *nmd) { int last; - NMA_LOCK(nmd); + NM_MTX_LOCK(nm_mem_list_lock); last = (--nmd->refcount == 0); + if (last) + nm_mem_release_id(nmd); NM_DBG_REFC(nmd, func, line); - NMA_UNLOCK(nmd); + NM_MTX_UNLOCK(nm_mem_list_lock); if (last) netmap_mem_delete(nmd); } @@ -248,7 +283,9 @@ if (nm_mem_assign_group(nmd, na->pdev) < 0) { return ENOMEM; } else { - nmd->ops->nmd_finalize(nmd); + NMA_LOCK(nmd); + nmd->lasterr = nmd->ops->nmd_finalize(nmd); + NMA_UNLOCK(nmd); } if (!nmd->lasterr && na->pdev) @@ -262,21 +299,72 @@ { NMA_LOCK(nmd); netmap_mem_unmap(&nmd->pools[NETMAP_BUF_POOL], na); + if (nmd->active == 1) { + u_int i; + + /* + * Reset the allocator when it falls out of use so that any + * pool resources leaked by unclean application exits are + * reclaimed. + */ + for (i = 0; i < NETMAP_POOLS_NR; i++) { + struct netmap_obj_pool *p; + u_int j; + + p = &nmd->pools[i]; + p->objfree = p->objtotal; + /* + * Reproduce the net effect of the M_ZERO malloc() + * and marking of free entries in the bitmap that + * occur in finalize_obj_allocator() + */ + memset(p->bitmap, + '\0', + sizeof(uint32_t) * ((p->objtotal + 31) / 32)); + + /* + * Set all the bits in the bitmap that have + * corresponding buffers to 1 to indicate they are + * free. + */ + for (j = 0; j < p->objtotal; j++) { + if (p->lut[j].vaddr != NULL) { + p->bitmap[ (j>>5) ] |= ( 1 << (j & 31) ); + } + } + } + + /* + * Per netmap_mem_finalize_all(), + * buffers 0 and 1 are reserved + */ + nmd->pools[NETMAP_BUF_POOL].objfree -= 2; + if (nmd->pools[NETMAP_BUF_POOL].bitmap) { + /* XXX This check is a workaround that prevents a + * NULL pointer crash which currently happens only + * with ptnetmap guests. + * Removed shared-info --> is the bug still there? 
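
/*
 * The bitmap arithmetic used in the reset path above, as a runnable
 * sketch: one bit per object, (objtotal + 31) / 32 words of 32 bits,
 * object j living at word j >> 5 under mask 1 << (j & 31); the reset
 * path then re-reserves buffers 0 and 1 (the bitmap[0] = ~3 just
 * below).  Not part of the patch.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define OBJTOTAL 100

int
main(void)
{
	/* one bit per object, rounded up to whole 32-bit words */
	uint32_t bitmap[(OBJTOTAL + 31) / 32];
	unsigned j;

	memset(bitmap, 0, sizeof(bitmap));
	for (j = 0; j < OBJTOTAL; j++)	/* mark every object free */
		bitmap[j >> 5] |= 1u << (j & 31);

	bitmap[0] &= ~3u;	/* reserve buffers 0 and 1 */

	printf("obj 0 free? %d\n", !!(bitmap[0 >> 5] & (1u << (0 & 31))));
	printf("obj 37 free? %d\n", !!(bitmap[37 >> 5] & (1u << (37 & 31))));
	return 0;
}
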
*/ + nmd->pools[NETMAP_BUF_POOL].bitmap[0] = ~3; + } + } + nmd->ops->nmd_deref(nmd); + NMA_UNLOCK(nmd); - return nmd->ops->nmd_deref(nmd); } /* accessor functions */ -static void +static int netmap_mem2_get_lut(struct netmap_mem_d *nmd, struct netmap_lut *lut) { lut->lut = nmd->pools[NETMAP_BUF_POOL].lut; lut->objtotal = nmd->pools[NETMAP_BUF_POOL].objtotal; lut->objsize = nmd->pools[NETMAP_BUF_POOL]._objsize; + + return 0; } -struct netmap_obj_params netmap_params[NETMAP_POOLS_NR] = { +static struct netmap_obj_params netmap_params[NETMAP_POOLS_NR] = { [NETMAP_IF_POOL] = { .size = 1024, .num = 100, @@ -291,10 +379,10 @@ }, }; -struct netmap_obj_params netmap_min_priv_params[NETMAP_POOLS_NR] = { +static struct netmap_obj_params netmap_min_priv_params[NETMAP_POOLS_NR] = { [NETMAP_IF_POOL] = { .size = 1024, - .num = 1, + .num = 2, }, [NETMAP_RING_POOL] = { .size = 5*PAGE_SIZE, @@ -338,21 +426,37 @@ }, }, + .params = { + [NETMAP_IF_POOL] = { + .size = 1024, + .num = 100, + }, + [NETMAP_RING_POOL] = { + .size = 9*PAGE_SIZE, + .num = 200, + }, + [NETMAP_BUF_POOL] = { + .size = 2048, + .num = NETMAP_BUF_MAX_NUM, + }, + }, + .nm_id = 1, .nm_grp = -1, .prev = &nm_mem, .next = &nm_mem, - .ops = &netmap_mem_global_ops -}; + .ops = &netmap_mem_global_ops, + .name = "1" +}; -struct netmap_mem_d *netmap_last_mem_d = &nm_mem; /* blueprint for the private memory allocators */ extern struct netmap_mem_ops netmap_mem_private_ops; /* forward */ -const struct netmap_mem_d nm_blueprint = { +/* XXX clang is not happy about using name as a print format */ +static const struct netmap_mem_d nm_blueprint = { .pools = { [NETMAP_IF_POOL] = { .name = "%s_if", @@ -377,9 +481,11 @@ }, }, + .nm_grp = -1, + .flags = NETMAP_MEM_PRIVATE, - .ops = &netmap_mem_private_ops + .ops = &netmap_mem_global_ops, }; /* memory allocator related sysctls */ @@ -388,6 +494,7 @@ #define DECLARE_SYSCTLS(id, name) \ + SYSBEGIN(mem2_ ## name); \ SYSCTL_INT(_dev_netmap, OID_AUTO, name##_size, \ CTLFLAG_RW, &netmap_params[id].size, 0, "Requested size of netmap " STRINGIFY(name) "s"); \ SYSCTL_INT(_dev_netmap, OID_AUTO, name##_curr_size, \ @@ -401,22 +508,22 @@ "Default size of private netmap " STRINGIFY(name) "s"); \ SYSCTL_INT(_dev_netmap, OID_AUTO, priv_##name##_num, \ CTLFLAG_RW, &netmap_min_priv_params[id].num, 0, \ - "Default number of private netmap " STRINGIFY(name) "s") + "Default number of private netmap " STRINGIFY(name) "s"); \ + SYSEND SYSCTL_DECL(_dev_netmap); DECLARE_SYSCTLS(NETMAP_IF_POOL, if); DECLARE_SYSCTLS(NETMAP_RING_POOL, ring); DECLARE_SYSCTLS(NETMAP_BUF_POOL, buf); +/* call with nm_mem_list_lock held */ static int -nm_mem_assign_id(struct netmap_mem_d *nmd) +nm_mem_assign_id_locked(struct netmap_mem_d *nmd) { nm_memid_t id; struct netmap_mem_d *scan = netmap_last_mem_d; int error = ENOMEM; - NMA_LOCK(&nm_mem); - do { /* we rely on unsigned wrap around */ id = scan->nm_id + 1; @@ -430,20 +537,32 @@ scan->prev->next = nmd; scan->prev = nmd; netmap_last_mem_d = nmd; + nmd->refcount = 1; error = 0; break; } } while (scan != netmap_last_mem_d); - NMA_UNLOCK(&nm_mem); return error; } +/* call with nm_mem_list_lock *not* held */ +static int +nm_mem_assign_id(struct netmap_mem_d *nmd) +{ + int ret; + + NM_MTX_LOCK(nm_mem_list_lock); + ret = nm_mem_assign_id_locked(nmd); + NM_MTX_UNLOCK(nm_mem_list_lock); + + return ret; +} + +/* call with nm_mem_list_lock held */ static void nm_mem_release_id(struct netmap_mem_d *nmd) { - NMA_LOCK(&nm_mem); - nmd->prev->next = nmd->next; nmd->next->prev = nmd->prev; @@ -451,8 +570,25 @@ 
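
/*
 * Simplified model (not part of the patch) of the id walk in
 * nm_mem_assign_id_locked() above: allocators form a circular,
 * id-ordered list; the candidate id is always "current + 1" and is
 * free when it differs from the next node's id, with unsigned
 * wrap-around ("we rely on unsigned wrap around") handling the
 * 0xffff -> 0 transition for free.  Types and names reduced for the
 * demo; the real code also takes nm_mem_list_lock.
 */
#include <stdint.h>
#include <stdio.h>

typedef uint16_t memid_t;

struct node { memid_t id; struct node *prev, *next; };

static int
assign_id(struct node **last, struct node *n)
{
	struct node *scan = *last;

	do {
		memid_t id = scan->id + 1;	/* may wrap to 0 */
		if (id != scan->next->id) {	/* gap found: insert */
			n->id = id;
			n->prev = scan;
			n->next = scan->next;
			scan->next->prev = n;
			scan->next = n;
			*last = n;
			return 0;
		}
		scan = scan->next;
	} while (scan != *last);
	return -1;			/* every id in use */
}

int
main(void)
{
	struct node a = { 1, &a, &a };	/* one-element circular list */
	struct node b, c;
	struct node *last = &a;

	assign_id(&last, &b);
	assign_id(&last, &c);
	printf("ids: %u %u %u\n", a.id, b.id, c.id);	/* 1 2 3 */
	return 0;
}
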
netmap_last_mem_d = nmd->prev; nmd->prev = nmd->next = NULL; +} + +struct netmap_mem_d * +netmap_mem_find(nm_memid_t id) +{ + struct netmap_mem_d *nmd; - NMA_UNLOCK(&nm_mem); + NM_MTX_LOCK(nm_mem_list_lock); + nmd = netmap_last_mem_d; + do { + if (!(nmd->flags & NETMAP_MEM_HIDDEN) && nmd->nm_id == id) { + nmd->refcount++; + NM_MTX_UNLOCK(nm_mem_list_lock); + return nmd; + } + nmd = nmd->next; + } while (nmd != netmap_last_mem_d); + NM_MTX_UNLOCK(nm_mem_list_lock); + return NULL; } static int @@ -494,8 +630,13 @@ if (offset >= p[i].memtotal) continue; // now lookup the cluster's address +#ifndef _WIN32 pa = vtophys(p[i].lut[offset / p[i]._objsize].vaddr) + offset % p[i]._objsize; +#else + pa = vtophys(p[i].lut[offset / p[i]._objsize].vaddr); + pa.QuadPart += offset % p[i]._objsize; +#endif NMA_UNLOCK(nmd); return pa; } @@ -508,7 +649,110 @@ + p[NETMAP_RING_POOL].memtotal + p[NETMAP_BUF_POOL].memtotal); NMA_UNLOCK(nmd); +#ifndef _WIN32 return 0; // XXX bad address +#else + vm_paddr_t res; + res.QuadPart = 0; + return res; +#endif +} + +#ifdef _WIN32 + +/* + * win32_build_virtual_memory_for_userspace + * + * This function get all the object making part of the pools and maps + * a contiguous virtual memory space for the userspace + * It works this way + * 1 - allocate a Memory Descriptor List wide as the sum + * of the memory needed for the pools + * 2 - cycle all the objects in every pool and for every object do + * + * 2a - cycle all the objects in every pool, get the list + * of the physical address descriptors + * 2b - calculate the offset in the array of pages desciptor in the + * main MDL + * 2c - copy the descriptors of the object in the main MDL + * + * 3 - return the resulting MDL that needs to be mapped in userland + * + * In this way we will have an MDL that describes all the memory for the + * objects in a single object +*/ + +PMDL +win32_build_user_vm_map(struct netmap_mem_d* nmd) +{ + int i, j; + u_int memsize, memflags, ofs = 0; + PMDL mainMdl, tempMdl; + + if (netmap_mem_get_info(nmd, &memsize, &memflags, NULL)) { + D("memory not finalised yet"); + return NULL; + } + + mainMdl = IoAllocateMdl(NULL, memsize, FALSE, FALSE, NULL); + if (mainMdl == NULL) { + D("failed to allocate mdl"); + return NULL; + } + + NMA_LOCK(nmd); + for (i = 0; i < NETMAP_POOLS_NR; i++) { + struct netmap_obj_pool *p = &nmd->pools[i]; + int clsz = p->_clustsize; + int clobjs = p->_clustentries; /* objects per cluster */ + int mdl_len = sizeof(PFN_NUMBER) * BYTES_TO_PAGES(clsz); + PPFN_NUMBER pSrc, pDst; + + /* each pool has a different cluster size so we need to reallocate */ + tempMdl = IoAllocateMdl(p->lut[0].vaddr, clsz, FALSE, FALSE, NULL); + if (tempMdl == NULL) { + NMA_UNLOCK(nmd); + D("fail to allocate tempMdl"); + IoFreeMdl(mainMdl); + return NULL; + } + pSrc = MmGetMdlPfnArray(tempMdl); + /* create one entry per cluster, the lut[] has one entry per object */ + for (j = 0; j < p->numclusters; j++, ofs += clsz) { + pDst = &MmGetMdlPfnArray(mainMdl)[BYTES_TO_PAGES(ofs)]; + MmInitializeMdl(tempMdl, p->lut[j*clobjs].vaddr, clsz); + MmBuildMdlForNonPagedPool(tempMdl); /* compute physical page addresses */ + RtlCopyMemory(pDst, pSrc, mdl_len); /* copy the page descriptors */ + mainMdl->MdlFlags = tempMdl->MdlFlags; /* XXX what is in here ? */ + } + IoFreeMdl(tempMdl); + } + NMA_UNLOCK(nmd); + return mainMdl; +} + +#endif /* _WIN32 */ + +/* + * helper function for OS-specific mmap routines (currently only windows). + * Given an nmd and a pool index, returns the cluster size and number of clusters. 
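
/*
 * The offset-to-address walk of netmap_mem2_ofstophys() above, in
 * miniature: the pools sit back to back, so whole pools are skipped
 * until the remainder falls inside one (the real code then indexes
 * that pool's lut[] by offset / objsize and adds offset % objsize).
 * The sizes below mirror the global allocator defaults; not part of
 * the patch.
 */
#include <stdio.h>

struct pool { const char *name; unsigned long memtotal; };

int
main(void)
{
	struct pool p[3] = {
		{ "if",   1024UL * 100 },		/* 102400 */
		{ "ring", 9UL * 4096 * 200 },		/* 7372800 */
		{ "buf",  2048UL * 20 * 4096 * 2 },	/* 335544320 */
	};
	unsigned long off = 200000;	/* offset into the arena */
	unsigned long o = off;
	int i;

	for (i = 0; i < 3; i++) {
		if (o < p[i].memtotal) {
			printf("offset %lu -> pool %s, pool offset %lu\n",
			    off, p[i].name, o);
			return 0;
		}
		o -= p[i].memtotal;	/* skip this pool */
	}
	printf("offset %lu out of range\n", off);
	return 1;
}
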
+ * Returns 0 if memory is finalised and the pool is valid, otherwise 1. + * It should be called under NMA_LOCK(nmd) otherwise the underlying info can change. + */ + +int +netmap_mem2_get_pool_info(struct netmap_mem_d* nmd, u_int pool, u_int *clustsize, u_int *numclusters) +{ + if (!nmd || !clustsize || !numclusters || pool >= NETMAP_POOLS_NR) + return 1; /* invalid arguments */ + // NMA_LOCK_ASSERT(nmd); + if (!(nmd->flags & NETMAP_MEM_FINALIZED)) { + *clustsize = *numclusters = 0; + return 1; /* not ready yet */ + } + *clustsize = nmd->pools[pool]._clustsize; + *numclusters = nmd->pools[pool].numclusters; + return 0; /* success */ } static int @@ -578,12 +822,6 @@ ((n)->pools[NETMAP_IF_POOL].memtotal + \ netmap_obj_offset(&(n)->pools[NETMAP_RING_POOL], (v))) -#define netmap_buf_offset(n, v) \ - ((n)->pools[NETMAP_IF_POOL].memtotal + \ - (n)->pools[NETMAP_RING_POOL].memtotal + \ - netmap_obj_offset(&(n)->pools[NETMAP_BUF_POOL], (v))) - - static ssize_t netmap_mem2_if_offset(struct netmap_mem_d *nmd, const void *addr) { @@ -602,7 +840,7 @@ netmap_obj_malloc(struct netmap_obj_pool *p, u_int len, uint32_t *start, uint32_t *index) { uint32_t i = 0; /* index in the bitmap */ - uint32_t mask, j; /* slot counter */ + uint32_t mask, j = 0; /* slot counter */ void *vaddr = NULL; if (len > p->_objsize) { @@ -636,7 +874,7 @@ if (index) *index = i * 32 + j; } - ND("%s allocator: allocated object @ [%d][%d]: vaddr %p", i, j, vaddr); + ND("%s allocator: allocated object @ [%d][%d]: vaddr %p",p->name, i, j, vaddr); if (start) *start = i; @@ -733,7 +971,7 @@ *head = cur; /* restore */ break; } - RD(5, "allocate buffer %d -> %d", *head, cur); + ND(5, "allocate buffer %d -> %d", *head, cur); *p = cur; /* link to previous head */ } @@ -750,7 +988,7 @@ struct netmap_obj_pool *p = &nmd->pools[NETMAP_BUF_POOL]; uint32_t i, cur, *buf; - D("freeing the extra list"); + ND("freeing the extra list"); for (i = 0; head >=2 && head < p->objtotal; i++) { cur = head; buf = lut[head].vaddr; @@ -761,7 +999,8 @@ } if (head != 0) D("breaking with head %d", head); - D("freed %d buffers", i); + if (netmap_verbose) + D("freed %d buffers", i); } @@ -842,11 +1081,10 @@ if (p == NULL) return; if (p->bitmap) - free(p->bitmap, M_NETMAP); + nm_os_free(p->bitmap); p->bitmap = NULL; if (p->lut) { u_int i; - size_t sz = p->_clustsize; /* * Free each cluster allocated in @@ -856,13 +1094,13 @@ */ for (i = 0; i < p->objtotal; i += p->_clustentries) { if (p->lut[i].vaddr) - contigfree(p->lut[i].vaddr, sz, M_NETMAP); + contigfree(p->lut[i].vaddr, p->_clustsize, M_NETMAP); } bzero(p->lut, sizeof(struct lut_entry) * p->objtotal); #ifdef linux vfree(p->lut); #else - free(p->lut, M_NETMAP); + nm_os_free(p->lut); #endif } p->lut = NULL; @@ -973,6 +1211,18 @@ return 0; } +static struct lut_entry * +nm_alloc_lut(u_int nobj) +{ + size_t n = sizeof(struct lut_entry) * nobj; + struct lut_entry *lut; +#ifdef linux + lut = vmalloc(n); +#else + lut = nm_os_malloc(n); +#endif + return lut; +} /* call with NMA_LOCK held */ static int @@ -985,20 +1235,15 @@ p->numclusters = p->_numclusters; p->objtotal = p->_objtotal; - n = sizeof(struct lut_entry) * p->objtotal; -#ifdef linux - p->lut = vmalloc(n); -#else - p->lut = malloc(n, M_NETMAP, M_NOWAIT | M_ZERO); -#endif + p->lut = nm_alloc_lut(p->objtotal); if (p->lut == NULL) { - D("Unable to create lookup table (%d bytes) for '%s'", (int)n, p->name); + D("Unable to create lookup table for '%s'", p->name); goto clean; } /* Allocate the bitmap */ n = (p->objtotal + 31) / 32; - p->bitmap = 
malloc(sizeof(uint32_t) * n, M_NETMAP, M_NOWAIT | M_ZERO); + p->bitmap = nm_os_malloc(sizeof(uint32_t) * n); if (p->bitmap == NULL) { D("Unable to create bitmap (%d entries) for allocator '%s'", (int)n, p->name); @@ -1015,6 +1260,13 @@ int lim = i + p->_clustentries; char *clust; + /* + * XXX Note, we only need contigmalloc() for buffers attached + * to native interfaces. In all other cases (nifp, netmap rings + * and even buffers for VALE ports or emulated interfaces) we + * can live with standard malloc, because the hardware will not + * access the pages directly. + */ clust = contigmalloc(n, M_NETMAP, M_NOWAIT | M_ZERO, (size_t)0, -1UL, PAGE_SIZE, 0); if (clust == NULL) { @@ -1075,16 +1327,18 @@ /* call with lock held */ static int -netmap_memory_config_changed(struct netmap_mem_d *nmd) +netmap_mem_params_changed(struct netmap_obj_params* p) { - int i; + int i, rv = 0; for (i = 0; i < NETMAP_POOLS_NR; i++) { - if (nmd->pools[i].r_objsize != netmap_params[i].size || - nmd->pools[i].r_objtotal != netmap_params[i].num) - return 1; + if (p[i].last_size != p[i].size || p[i].last_num != p[i].num) { + p[i].last_size = p[i].size; + p[i].last_num = p[i].num; + rv = 1; + } } - return 0; + return rv; } static void @@ -1105,13 +1359,18 @@ { int i, lim = p->_objtotal; - if (na->pdev == NULL) + if (na == NULL || na->pdev == NULL) return 0; -#ifdef __FreeBSD__ +#if defined(__FreeBSD__) (void)i; (void)lim; D("unsupported on FreeBSD"); + +#elif defined(_WIN32) + (void)i; + (void)lim; + D("unsupported on Windows"); //XXX_ale, really? #else /* linux */ for (i = 2; i < lim; i++) { netmap_unload_map(na, (bus_dma_tag_t) na->pdev, &p->lut[i].paddr); @@ -1124,8 +1383,10 @@ static int netmap_mem_map(struct netmap_obj_pool *p, struct netmap_adapter *na) { -#ifdef __FreeBSD__ +#if defined(__FreeBSD__) D("unsupported on FreeBSD"); +#elif defined(_WIN32) + D("unsupported on Windows"); //XXX_ale, really? 
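
/*
 * Runnable model of netmap_mem_params_changed() above: each pool now
 * remembers the last size/num it was configured with (the new
 * last_size/last_num fields), so a sysctl tweak is reported exactly
 * once and then absorbed.  Not part of the patch.
 */
#include <stdio.h>

struct obj_params { unsigned size, num, last_size, last_num; };

static int
params_changed(struct obj_params *p, int n)
{
	int i, rv = 0;

	for (i = 0; i < n; i++) {
		if (p[i].last_size != p[i].size ||
		    p[i].last_num != p[i].num) {
			p[i].last_size = p[i].size;	/* absorb change */
			p[i].last_num = p[i].num;
			rv = 1;
		}
	}
	return rv;
}

int
main(void)
{
	struct obj_params p[1] = { { 2048, 100, 2048, 100 } };

	printf("changed? %d\n", params_changed(p, 1));	/* 0 */
	p[0].num = 200;					/* sysctl tweak */
	printf("changed? %d\n", params_changed(p, 1));	/* 1 */
	printf("changed? %d\n", params_changed(p, 1));	/* 0 again */
	return 0;
}
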
#else /* linux */ int i, lim = p->_objtotal; @@ -1176,69 +1437,16 @@ return nmd->lasterr; } - - -static void -netmap_mem_private_delete(struct netmap_mem_d *nmd) -{ - if (nmd == NULL) - return; - if (netmap_verbose) - D("deleting %p", nmd); - if (nmd->active > 0) - D("bug: deleting mem allocator with active=%d!", nmd->active); - nm_mem_release_id(nmd); - if (netmap_verbose) - D("done deleting %p", nmd); - NMA_LOCK_DESTROY(nmd); - free(nmd, M_DEVBUF); -} - -static int -netmap_mem_private_config(struct netmap_mem_d *nmd) -{ - /* nothing to do, we are configured on creation - * and configuration never changes thereafter - */ - return 0; -} - -static int -netmap_mem_private_finalize(struct netmap_mem_d *nmd) -{ - int err; - NMA_LOCK(nmd); - nmd->active++; - err = netmap_mem_finalize_all(nmd); - NMA_UNLOCK(nmd); - return err; - -} - -static void -netmap_mem_private_deref(struct netmap_mem_d *nmd) -{ - NMA_LOCK(nmd); - if (--nmd->active <= 0) - netmap_mem_reset_all(nmd); - NMA_UNLOCK(nmd); -} - - /* * allocator for private memory */ -struct netmap_mem_d * -netmap_mem_private_new(const char *name, u_int txr, u_int txd, - u_int rxr, u_int rxd, u_int extra_bufs, u_int npipes, int *perr) +static struct netmap_mem_d * +_netmap_mem_private_new(struct netmap_obj_params *p, int *perr) { struct netmap_mem_d *d = NULL; - struct netmap_obj_params p[NETMAP_POOLS_NR]; - int i, err; - u_int v, maxd; + int i, err = 0; - d = malloc(sizeof(struct netmap_mem_d), - M_DEVBUF, M_NOWAIT | M_ZERO); + d = nm_os_malloc(sizeof(struct netmap_mem_d)); if (d == NULL) { err = ENOMEM; goto error; @@ -1249,7 +1457,41 @@ err = nm_mem_assign_id(d); if (err) goto error; + snprintf(d->name, NM_MEM_NAMESZ, "%d", d->nm_id); + for (i = 0; i < NETMAP_POOLS_NR; i++) { + snprintf(d->pools[i].name, NETMAP_POOL_MAX_NAMSZ, + nm_blueprint.pools[i].name, + d->name); + d->params[i].num = p[i].num; + d->params[i].size = p[i].size; + } + + NMA_LOCK_INIT(d); + + err = netmap_mem_config(d); + if (err) + goto error; + + d->flags &= ~NETMAP_MEM_FINALIZED; + + return d; + +error: + netmap_mem_delete(d); + if (perr) + *perr = err; + return NULL; +} + +struct netmap_mem_d * +netmap_mem_private_new(u_int txr, u_int txd, u_int rxr, u_int rxd, + u_int extra_bufs, u_int npipes, int *perr) +{ + struct netmap_mem_d *d = NULL; + struct netmap_obj_params p[NETMAP_POOLS_NR]; + int i, err = 0; + u_int v, maxd; /* account for the fake host rings */ txr++; rxr++; @@ -1295,23 +1537,13 @@ p[NETMAP_BUF_POOL].num, p[NETMAP_BUF_POOL].size); - for (i = 0; i < NETMAP_POOLS_NR; i++) { - snprintf(d->pools[i].name, NETMAP_POOL_MAX_NAMSZ, - nm_blueprint.pools[i].name, - name); - err = netmap_config_obj_allocator(&d->pools[i], - p[i].num, p[i].size); - if (err) - goto error; - } - - d->flags &= ~NETMAP_MEM_FINALIZED; - - NMA_LOCK_INIT(d); + d = _netmap_mem_private_new(p, perr); + if (d == NULL) + goto error; return d; error: - netmap_mem_private_delete(d); + netmap_mem_delete(d); if (perr) *perr = err; return NULL; @@ -1320,7 +1552,7 @@ /* call with lock held */ static int -netmap_mem_global_config(struct netmap_mem_d *nmd) +netmap_mem2_config(struct netmap_mem_d *nmd) { int i; @@ -1328,7 +1560,7 @@ /* already in use, we cannot change the configuration */ goto out; - if (!netmap_memory_config_changed(nmd)) + if (!netmap_mem_params_changed(nmd->params)) goto out; ND("reconfiguring"); @@ -1343,7 +1575,7 @@ for (i = 0; i < NETMAP_POOLS_NR; i++) { nmd->lasterr = netmap_config_obj_allocator(&nmd->pools[i], - netmap_params[i].num, netmap_params[i].size); + nmd->params[i].num, 
nmd->params[i].size); if (nmd->lasterr) goto out; } @@ -1354,13 +1586,13 @@ } static int -netmap_mem_global_finalize(struct netmap_mem_d *nmd) +netmap_mem2_finalize(struct netmap_mem_d *nmd) { int err; - + /* update configuration if changed */ - if (netmap_mem_global_config(nmd)) - goto out; + if (netmap_mem2_config(nmd)) + goto out1; nmd->active++; @@ -1378,6 +1610,7 @@ out: if (nmd->lasterr) nmd->active--; +out1: err = nmd->lasterr; return err; @@ -1385,20 +1618,23 @@ } static void -netmap_mem_global_delete(struct netmap_mem_d *nmd) +netmap_mem2_delete(struct netmap_mem_d *nmd) { int i; for (i = 0; i < NETMAP_POOLS_NR; i++) { - netmap_destroy_obj_allocator(&nm_mem.pools[i]); + netmap_destroy_obj_allocator(&nmd->pools[i]); } - NMA_LOCK_DESTROY(&nm_mem); + NMA_LOCK_DESTROY(nmd); + if (nmd != &nm_mem) + nm_os_free(nmd); } int netmap_mem_init(void) { + NM_MTX_INIT(nm_mem_list_lock); NMA_LOCK_INIT(&nm_mem); netmap_mem_get(&nm_mem); return (0); @@ -1417,13 +1653,17 @@ for_rx_tx(t) { u_int i; - for (i = 0; i < netmap_real_rings(na, t); i++) { + for (i = 0; i < nma_get_nrings(na, t) + 1; i++) { struct netmap_kring *kring = &NMR(na, t)[i]; struct netmap_ring *ring = kring->ring; - if (ring == NULL) + if (ring == NULL || kring->users > 0 || (kring->nr_kflags & NKR_NEEDRING)) { + ND("skipping ring %s (ring %p, users %d)", + kring->name, ring, kring->users); continue; - netmap_free_bufs(na->nm_mem, ring->slot, kring->nkr_num_slots); + } + if (i != nma_get_nrings(na, t) || na->na_flags & NAF_HOST_RINGS) + netmap_free_bufs(na->nm_mem, ring->slot, kring->nkr_num_slots); netmap_ring_free(na->nm_mem, ring); kring->ring = NULL; } @@ -1452,9 +1692,10 @@ struct netmap_ring *ring = kring->ring; u_int len, ndesc; - if (ring) { - ND("%s already created", kring->name); - continue; /* already created by somebody else */ + if (ring || (!kring->users && !(kring->nr_kflags & NKR_NEEDRING))) { + /* uneeded, or already created by somebody else */ + ND("skipping ring %s", kring->name); + continue; } ndesc = kring->nkr_num_slots; len = sizeof(struct netmap_ring) + @@ -1569,10 +1810,22 @@ */ base = netmap_if_offset(na->nm_mem, nifp); for (i = 0; i < n[NR_TX]; i++) { + if (na->tx_rings[i].ring == NULL) { + // XXX maybe use the offset of an error ring, + // like we do for buffers? + *(ssize_t *)(uintptr_t)&nifp->ring_ofs[i] = 0; + continue; + } *(ssize_t *)(uintptr_t)&nifp->ring_ofs[i] = netmap_ring_offset(na->nm_mem, na->tx_rings[i].ring) - base; } for (i = 0; i < n[NR_RX]; i++) { + if (na->rx_rings[i].ring == NULL) { + // XXX maybe use the offset of an error ring, + // like we do for buffers? 
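
/*
 * The ring_ofs[] values written above are offsets relative to the
 * netmap_if itself (hence the "- base"), so userspace can locate a
 * ring with nothing but the mapped nifp pointer.  A toy arena layout
 * with made-up offsets; not part of the patch.
 */
#include <stdio.h>

int
main(void)
{
	char arena[65536];		/* stands in for the mmap()ed region */
	long nifp_ofs = 1024;		/* where the netmap_if lives */
	long ring_ofs_abs = 40960;	/* where one ring lives */

	/* kernel side: store the offset relative to the nifp */
	long rel = ring_ofs_abs - nifp_ofs;

	/* user side: nifp pointer + stored offset = ring pointer */
	char *nifp = arena + nifp_ofs;
	char *ring = nifp + rel;

	printf("ring recovered at arena offset %ld (expected %ld)\n",
	    (long)(ring - arena), ring_ofs_abs);
	return 0;
}
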
+ *(ssize_t *)(uintptr_t)&nifp->ring_ofs[i+n[NR_TX]] = 0; + continue; + } *(ssize_t *)(uintptr_t)&nifp->ring_ofs[i+n[NR_TX]] = netmap_ring_offset(na->nm_mem, na->rx_rings[i].ring) - base; } @@ -1597,7 +1850,7 @@ } static void -netmap_mem_global_deref(struct netmap_mem_d *nmd) +netmap_mem2_deref(struct netmap_mem_d *nmd) { nmd->active--; @@ -1612,27 +1865,551 @@ .nmd_get_lut = netmap_mem2_get_lut, .nmd_get_info = netmap_mem2_get_info, .nmd_ofstophys = netmap_mem2_ofstophys, - .nmd_config = netmap_mem_global_config, - .nmd_finalize = netmap_mem_global_finalize, - .nmd_deref = netmap_mem_global_deref, - .nmd_delete = netmap_mem_global_delete, + .nmd_config = netmap_mem2_config, + .nmd_finalize = netmap_mem2_finalize, + .nmd_deref = netmap_mem2_deref, + .nmd_delete = netmap_mem2_delete, .nmd_if_offset = netmap_mem2_if_offset, .nmd_if_new = netmap_mem2_if_new, .nmd_if_delete = netmap_mem2_if_delete, .nmd_rings_create = netmap_mem2_rings_create, .nmd_rings_delete = netmap_mem2_rings_delete }; -struct netmap_mem_ops netmap_mem_private_ops = { - .nmd_get_lut = netmap_mem2_get_lut, - .nmd_get_info = netmap_mem2_get_info, - .nmd_ofstophys = netmap_mem2_ofstophys, - .nmd_config = netmap_mem_private_config, - .nmd_finalize = netmap_mem_private_finalize, - .nmd_deref = netmap_mem_private_deref, - .nmd_if_offset = netmap_mem2_if_offset, - .nmd_delete = netmap_mem_private_delete, - .nmd_if_new = netmap_mem2_if_new, - .nmd_if_delete = netmap_mem2_if_delete, - .nmd_rings_create = netmap_mem2_rings_create, - .nmd_rings_delete = netmap_mem2_rings_delete + +int +netmap_mem_pools_info_get(struct nmreq *nmr, struct netmap_adapter *na) +{ + uintptr_t *pp = (uintptr_t *)&nmr->nr_arg1; + struct netmap_pools_info *upi = (struct netmap_pools_info *)(*pp); + struct netmap_mem_d *nmd = na->nm_mem; + struct netmap_pools_info pi; + unsigned int memsize; + uint16_t memid; + int ret; + + if (!nmd) { + return -1; + } + + ret = netmap_mem_get_info(nmd, &memsize, NULL, &memid); + if (ret) { + return ret; + } + + pi.memsize = memsize; + pi.memid = memid; + pi.if_pool_offset = 0; + pi.if_pool_objtotal = nmd->pools[NETMAP_IF_POOL].objtotal; + pi.if_pool_objsize = nmd->pools[NETMAP_IF_POOL]._objsize; + + pi.ring_pool_offset = nmd->pools[NETMAP_IF_POOL].memtotal; + pi.ring_pool_objtotal = nmd->pools[NETMAP_RING_POOL].objtotal; + pi.ring_pool_objsize = nmd->pools[NETMAP_RING_POOL]._objsize; + + pi.buf_pool_offset = nmd->pools[NETMAP_IF_POOL].memtotal + + nmd->pools[NETMAP_RING_POOL].memtotal; + pi.buf_pool_objtotal = nmd->pools[NETMAP_BUF_POOL].objtotal; + pi.buf_pool_objsize = nmd->pools[NETMAP_BUF_POOL]._objsize; + + ret = copyout(&pi, upi, sizeof(pi)); + if (ret) { + return ret; + } + + return 0; +} + +#ifdef WITH_PTNETMAP_GUEST +struct mem_pt_if { + struct mem_pt_if *next; + struct ifnet *ifp; + unsigned int nifp_offset; +}; + +/* Netmap allocator for ptnetmap guests. */ +struct netmap_mem_ptg { + struct netmap_mem_d up; + + vm_paddr_t nm_paddr; /* physical address in the guest */ + void *nm_addr; /* virtual address in the guest */ + struct netmap_lut buf_lut; /* lookup table for BUF pool in the guest */ + nm_memid_t host_mem_id; /* allocator identifier in the host */ + struct ptnetmap_memdev *ptn_dev;/* ptnetmap memdev */ + struct mem_pt_if *pt_ifs; /* list of interfaces in passthrough */ +}; + +/* Link a passthrough interface to a passthrough netmap allocator. 
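
/*
 * netmap_mem_pools_info_get() above reports each pool's offset as the
 * running total of the pools before it, because the three pools are
 * laid out back to back in one mappable arena.  Arithmetic sketch
 * using the default global-allocator sizes; not part of the patch.
 */
#include <stdio.h>

int
main(void)
{
	unsigned long if_total = 1024UL * 100;
	unsigned long ring_total = 9UL * 4096 * 200;
	unsigned long buf_total = 2048UL * 20 * 4096 * 2;

	unsigned long if_ofs = 0;			/* first pool */
	unsigned long ring_ofs = if_total;		/* after if pool */
	unsigned long buf_ofs = if_total + ring_total;	/* after both */

	printf("if@%lu ring@%lu buf@%lu total=%lu\n",
	    if_ofs, ring_ofs, buf_ofs, buf_ofs + buf_total);
	return 0;
}
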
*/ +static int +netmap_mem_pt_guest_ifp_add(struct netmap_mem_d *nmd, struct ifnet *ifp, + unsigned int nifp_offset) +{ + struct netmap_mem_ptg *ptnmd = (struct netmap_mem_ptg *)nmd; + struct mem_pt_if *ptif = nm_os_malloc(sizeof(*ptif)); + + if (!ptif) { + return ENOMEM; + } + + NMA_LOCK(nmd); + + ptif->ifp = ifp; + ptif->nifp_offset = nifp_offset; + + if (ptnmd->pt_ifs) { + ptif->next = ptnmd->pt_ifs; + } + ptnmd->pt_ifs = ptif; + + NMA_UNLOCK(nmd); + + D("added (ifp=%p,nifp_offset=%u)", ptif->ifp, ptif->nifp_offset); + + return 0; +} + +/* Called with NMA_LOCK(nmd) held. */ +static struct mem_pt_if * +netmap_mem_pt_guest_ifp_lookup(struct netmap_mem_d *nmd, struct ifnet *ifp) +{ + struct netmap_mem_ptg *ptnmd = (struct netmap_mem_ptg *)nmd; + struct mem_pt_if *curr; + + for (curr = ptnmd->pt_ifs; curr; curr = curr->next) { + if (curr->ifp == ifp) { + return curr; + } + } + + return NULL; +} + +/* Unlink a passthrough interface from a passthrough netmap allocator. */ +int +netmap_mem_pt_guest_ifp_del(struct netmap_mem_d *nmd, struct ifnet *ifp) +{ + struct netmap_mem_ptg *ptnmd = (struct netmap_mem_ptg *)nmd; + struct mem_pt_if *prev = NULL; + struct mem_pt_if *curr; + int ret = -1; + + NMA_LOCK(nmd); + + for (curr = ptnmd->pt_ifs; curr; curr = curr->next) { + if (curr->ifp == ifp) { + if (prev) { + prev->next = curr->next; + } else { + ptnmd->pt_ifs = curr->next; + } + D("removed (ifp=%p,nifp_offset=%u)", + curr->ifp, curr->nifp_offset); + nm_os_free(curr); + ret = 0; + break; + } + prev = curr; + } + + NMA_UNLOCK(nmd); + + return ret; +} + +static int +netmap_mem_pt_guest_get_lut(struct netmap_mem_d *nmd, struct netmap_lut *lut) +{ + struct netmap_mem_ptg *ptnmd = (struct netmap_mem_ptg *)nmd; + + if (!(nmd->flags & NETMAP_MEM_FINALIZED)) { + return EINVAL; + } + + *lut = ptnmd->buf_lut; + return 0; +} + +static int +netmap_mem_pt_guest_get_info(struct netmap_mem_d *nmd, u_int *size, + u_int *memflags, uint16_t *id) +{ + int error = 0; + + NMA_LOCK(nmd); + + error = nmd->ops->nmd_config(nmd); + if (error) + goto out; + + if (size) + *size = nmd->nm_totalsize; + if (memflags) + *memflags = nmd->flags; + if (id) + *id = nmd->nm_id; + +out: + NMA_UNLOCK(nmd); + + return error; +} + +static vm_paddr_t +netmap_mem_pt_guest_ofstophys(struct netmap_mem_d *nmd, vm_ooffset_t off) +{ + struct netmap_mem_ptg *ptnmd = (struct netmap_mem_ptg *)nmd; + vm_paddr_t paddr; + /* if the offset is valid, just return csb->base_addr + off */ + paddr = (vm_paddr_t)(ptnmd->nm_paddr + off); + ND("off %lx padr %lx", off, (unsigned long)paddr); + return paddr; +} + +static int +netmap_mem_pt_guest_config(struct netmap_mem_d *nmd) +{ + /* nothing to do, we are configured on creation + * and configuration never changes thereafter + */ + return 0; +} + +static int +netmap_mem_pt_guest_finalize(struct netmap_mem_d *nmd) +{ + struct netmap_mem_ptg *ptnmd = (struct netmap_mem_ptg *)nmd; + uint64_t mem_size; + uint32_t bufsize; + uint32_t nbuffers; + uint32_t poolofs; + vm_paddr_t paddr; + char *vaddr; + int i; + int error = 0; + + nmd->active++; + + if (nmd->flags & NETMAP_MEM_FINALIZED) + goto out; + + if (ptnmd->ptn_dev == NULL) { + D("ptnetmap memdev not attached"); + error = ENOMEM; + goto err; + } + /* Map memory through ptnetmap-memdev BAR. */ + error = nm_os_pt_memdev_iomap(ptnmd->ptn_dev, &ptnmd->nm_paddr, + &ptnmd->nm_addr, &mem_size); + if (error) + goto err; + + /* Initialize the lut using the information contained in the + * ptnetmap memory device. 
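
/*
 * Shape of the lut fill that follows: the guest sees one physically
 * contiguous buffer pool through the PCI BAR, so both addresses are
 * simply base + i * bufsize.  Userspace sketch with a malloc()ed
 * block standing in for the mapped BAR and a made-up physical base;
 * not part of the patch.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct lut_entry { void *vaddr; uint64_t paddr; };

int
main(void)
{
	uint32_t bufsize = 2048, nbuffers = 4;
	char *vaddr = malloc((size_t)bufsize * nbuffers);
	uint64_t paddr = 0xfe000000;	/* hypothetical guest-physical base */
	struct lut_entry *lut = calloc(nbuffers, sizeof(*lut));
	uint32_t i;

	for (i = 0; i < nbuffers; i++) {
		lut[i].vaddr = vaddr + (size_t)i * bufsize;
		lut[i].paddr = paddr + (uint64_t)i * bufsize;
	}
	for (i = 0; i < nbuffers; i++)
		printf("buf %u: va=%p pa=0x%llx\n", i, lut[i].vaddr,
		    (unsigned long long)lut[i].paddr);
	free(vaddr);
	free(lut);
	return 0;
}
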
*/ + bufsize = nm_os_pt_memdev_ioread(ptnmd->ptn_dev, + PTNET_MDEV_IO_BUF_POOL_OBJSZ); + nbuffers = nm_os_pt_memdev_ioread(ptnmd->ptn_dev, + PTNET_MDEV_IO_BUF_POOL_OBJNUM); + + /* allocate the lut */ + if (ptnmd->buf_lut.lut == NULL) { + D("allocating lut"); + ptnmd->buf_lut.lut = nm_alloc_lut(nbuffers); + if (ptnmd->buf_lut.lut == NULL) { + D("lut allocation failed"); + return ENOMEM; + } + } + + /* we have physically contiguous memory mapped through PCI BAR */ + poolofs = nm_os_pt_memdev_ioread(ptnmd->ptn_dev, + PTNET_MDEV_IO_BUF_POOL_OFS); + vaddr = (char *)(ptnmd->nm_addr) + poolofs; + paddr = ptnmd->nm_paddr + poolofs; + + for (i = 0; i < nbuffers; i++) { + ptnmd->buf_lut.lut[i].vaddr = vaddr; + ptnmd->buf_lut.lut[i].paddr = paddr; + vaddr += bufsize; + paddr += bufsize; + } + + ptnmd->buf_lut.objtotal = nbuffers; + ptnmd->buf_lut.objsize = bufsize; + nmd->nm_totalsize = (unsigned int)mem_size; + + nmd->flags |= NETMAP_MEM_FINALIZED; +out: + return 0; +err: + nmd->active--; + return error; +} + +static void +netmap_mem_pt_guest_deref(struct netmap_mem_d *nmd) +{ + struct netmap_mem_ptg *ptnmd = (struct netmap_mem_ptg *)nmd; + + nmd->active--; + if (nmd->active <= 0 && + (nmd->flags & NETMAP_MEM_FINALIZED)) { + nmd->flags &= ~NETMAP_MEM_FINALIZED; + /* unmap ptnetmap-memdev memory */ + if (ptnmd->ptn_dev) { + nm_os_pt_memdev_iounmap(ptnmd->ptn_dev); + } + ptnmd->nm_addr = 0; + ptnmd->nm_paddr = 0; + } +} + +static ssize_t +netmap_mem_pt_guest_if_offset(struct netmap_mem_d *nmd, const void *vaddr) +{ + struct netmap_mem_ptg *ptnmd = (struct netmap_mem_ptg *)nmd; + + return (const char *)(vaddr) - (char *)(ptnmd->nm_addr); +} + +static void +netmap_mem_pt_guest_delete(struct netmap_mem_d *nmd) +{ + if (nmd == NULL) + return; + if (netmap_verbose) + D("deleting %p", nmd); + if (nmd->active > 0) + D("bug: deleting mem allocator with active=%d!", nmd->active); + if (netmap_verbose) + D("done deleting %p", nmd); + NMA_LOCK_DESTROY(nmd); + nm_os_free(nmd); +} + +static struct netmap_if * +netmap_mem_pt_guest_if_new(struct netmap_adapter *na) +{ + struct netmap_mem_ptg *ptnmd = (struct netmap_mem_ptg *)na->nm_mem; + struct mem_pt_if *ptif; + struct netmap_if *nifp = NULL; + + NMA_LOCK(na->nm_mem); + + ptif = netmap_mem_pt_guest_ifp_lookup(na->nm_mem, na->ifp); + if (ptif == NULL) { + D("Error: interface %p is not in passthrough", na->ifp); + goto out; + } + + nifp = (struct netmap_if *)((char *)(ptnmd->nm_addr) + + ptif->nifp_offset); + NMA_UNLOCK(na->nm_mem); +out: + return nifp; +} + +static void +netmap_mem_pt_guest_if_delete(struct netmap_adapter *na, struct netmap_if *nifp) +{ + struct mem_pt_if *ptif; + + NMA_LOCK(na->nm_mem); + ptif = netmap_mem_pt_guest_ifp_lookup(na->nm_mem, na->ifp); + if (ptif == NULL) { + D("Error: interface %p is not in passthrough", na->ifp); + } + NMA_UNLOCK(na->nm_mem); +} + +static int +netmap_mem_pt_guest_rings_create(struct netmap_adapter *na) +{ + struct netmap_mem_ptg *ptnmd = (struct netmap_mem_ptg *)na->nm_mem; + struct mem_pt_if *ptif; + struct netmap_if *nifp; + int i, error = -1; + + NMA_LOCK(na->nm_mem); + + ptif = netmap_mem_pt_guest_ifp_lookup(na->nm_mem, na->ifp); + if (ptif == NULL) { + D("Error: interface %p is not in passthrough", na->ifp); + goto out; + } + + + /* point each kring to the corresponding backend ring */ + nifp = (struct netmap_if *)((char *)ptnmd->nm_addr + ptif->nifp_offset); + for (i = 0; i <= na->num_tx_rings; i++) { + struct netmap_kring *kring = na->tx_rings + i; + if (kring->ring) + continue; + kring->ring = (struct 
netmap_ring *) + ((char *)nifp + nifp->ring_ofs[i]); + } + for (i = 0; i <= na->num_rx_rings; i++) { + struct netmap_kring *kring = na->rx_rings + i; + if (kring->ring) + continue; + kring->ring = (struct netmap_ring *) + ((char *)nifp + + nifp->ring_ofs[i + na->num_tx_rings + 1]); + } + + error = 0; +out: + NMA_UNLOCK(na->nm_mem); + + return error; +} + +static void +netmap_mem_pt_guest_rings_delete(struct netmap_adapter *na) +{ + /* TODO: remove?? */ +#if 0 + struct netmap_mem_ptg *ptnmd = (struct netmap_mem_ptg *)na->nm_mem; + struct mem_pt_if *ptif = netmap_mem_pt_guest_ifp_lookup(na->nm_mem, + na->ifp); +#endif +} + +static struct netmap_mem_ops netmap_mem_pt_guest_ops = { + .nmd_get_lut = netmap_mem_pt_guest_get_lut, + .nmd_get_info = netmap_mem_pt_guest_get_info, + .nmd_ofstophys = netmap_mem_pt_guest_ofstophys, + .nmd_config = netmap_mem_pt_guest_config, + .nmd_finalize = netmap_mem_pt_guest_finalize, + .nmd_deref = netmap_mem_pt_guest_deref, + .nmd_if_offset = netmap_mem_pt_guest_if_offset, + .nmd_delete = netmap_mem_pt_guest_delete, + .nmd_if_new = netmap_mem_pt_guest_if_new, + .nmd_if_delete = netmap_mem_pt_guest_if_delete, + .nmd_rings_create = netmap_mem_pt_guest_rings_create, + .nmd_rings_delete = netmap_mem_pt_guest_rings_delete }; + +/* Called with nm_mem_list_lock held. */ +static struct netmap_mem_d * +netmap_mem_pt_guest_find_memid(nm_memid_t mem_id) +{ + struct netmap_mem_d *mem = NULL; + struct netmap_mem_d *scan = netmap_last_mem_d; + + do { + /* find ptnetmap allocator through host ID */ + if (scan->ops->nmd_deref == netmap_mem_pt_guest_deref && + ((struct netmap_mem_ptg *)(scan))->host_mem_id == mem_id) { + mem = scan; + mem->refcount++; + break; + } + scan = scan->next; + } while (scan != netmap_last_mem_d); + + return mem; +} + +/* Called with nm_mem_list_lock held. */ +static struct netmap_mem_d * +netmap_mem_pt_guest_create(nm_memid_t mem_id) +{ + struct netmap_mem_ptg *ptnmd; + int err = 0; + + ptnmd = nm_os_malloc(sizeof(struct netmap_mem_ptg)); + if (ptnmd == NULL) { + err = ENOMEM; + goto error; + } + + ptnmd->up.ops = &netmap_mem_pt_guest_ops; + ptnmd->host_mem_id = mem_id; + ptnmd->pt_ifs = NULL; + + /* Assign new id in the guest (We have the lock) */ + err = nm_mem_assign_id_locked(&ptnmd->up); + if (err) + goto error; + + ptnmd->up.flags &= ~NETMAP_MEM_FINALIZED; + ptnmd->up.flags |= NETMAP_MEM_IO; + + NMA_LOCK_INIT(&ptnmd->up); + + snprintf(ptnmd->up.name, NM_MEM_NAMESZ, "%d", ptnmd->up.nm_id); + + + return &ptnmd->up; +error: + netmap_mem_pt_guest_delete(&ptnmd->up); + return NULL; +} + +/* + * find host id in guest allocators and create guest allocator + * if it is not there + */ +static struct netmap_mem_d * +netmap_mem_pt_guest_get(nm_memid_t mem_id) +{ + struct netmap_mem_d *nmd; + + NM_MTX_LOCK(nm_mem_list_lock); + nmd = netmap_mem_pt_guest_find_memid(mem_id); + if (nmd == NULL) { + nmd = netmap_mem_pt_guest_create(mem_id); + } + NM_MTX_UNLOCK(nm_mem_list_lock); + + return nmd; +} + +/* + * The guest allocator can be created by ptnetmap_memdev (during the device + * attach) or by ptnetmap device (ptnet), during the netmap_attach. + * + * The order is not important (we have different order in LINUX and FreeBSD). + * The first one, creates the device, and the second one simply attaches it. 
+ */ + +/* Called when ptnetmap_memdev is attaching, to attach a new allocator in + * the guest */ +struct netmap_mem_d * +netmap_mem_pt_guest_attach(struct ptnetmap_memdev *ptn_dev, nm_memid_t mem_id) +{ + struct netmap_mem_d *nmd; + struct netmap_mem_ptg *ptnmd; + + nmd = netmap_mem_pt_guest_get(mem_id); + + /* assign this device to the guest allocator */ + if (nmd) { + ptnmd = (struct netmap_mem_ptg *)nmd; + ptnmd->ptn_dev = ptn_dev; + } + + return nmd; +} + +/* Called when ptnet device is attaching */ +struct netmap_mem_d * +netmap_mem_pt_guest_new(struct ifnet *ifp, + unsigned int nifp_offset, + unsigned int memid) +{ + struct netmap_mem_d *nmd; + + if (ifp == NULL) { + return NULL; + } + + nmd = netmap_mem_pt_guest_get((nm_memid_t)memid); + + if (nmd) { + netmap_mem_pt_guest_ifp_add(nmd, ifp, nifp_offset); + } + + return nmd; +} + +#endif /* WITH_PTNETMAP_GUEST */ diff -u -r -N usr/src/sys/dev/netmap/netmap_mem2.h /usr/src/sys/dev/netmap/netmap_mem2.h --- usr/src/sys/dev/netmap/netmap_mem2.h 2016-09-29 00:24:47.000000000 +0100 +++ /usr/src/sys/dev/netmap/netmap_mem2.h 2016-11-23 16:57:57.852518000 +0000 @@ -1,5 +1,8 @@ /* - * Copyright (C) 2012-2014 Matteo Landi, Luigi Rizzo, Giuseppe Lettieri. All rights reserved. + * Copyright (C) 2012-2014 Matteo Landi + * Copyright (C) 2012-2016 Luigi Rizzo + * Copyright (C) 2012-2016 Giuseppe Lettieri + * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions @@ -24,7 +27,7 @@ */ /* - * $FreeBSD: releng/11.0/sys/dev/netmap/netmap_mem2.h 285349 2015-07-10 05:51:36Z luigi $ + * $FreeBSD: head/sys/dev/netmap/netmap_mem2.c 234290 2012-04-14 16:44:18Z luigi $ * * (New) memory allocator for netmap */ @@ -116,9 +119,14 @@ */ extern struct netmap_mem_d nm_mem; +typedef uint16_t nm_memid_t; -void netmap_mem_get_lut(struct netmap_mem_d *, struct netmap_lut *); +int netmap_mem_get_lut(struct netmap_mem_d *, struct netmap_lut *); +nm_memid_t netmap_mem_get_id(struct netmap_mem_d *); vm_paddr_t netmap_mem_ofstophys(struct netmap_mem_d *, vm_ooffset_t); +#ifdef _WIN32 +PMDL win32_build_user_vm_map(struct netmap_mem_d* nmd); +#endif int netmap_mem_finalize(struct netmap_mem_d *, struct netmap_adapter *); int netmap_mem_init(void); void netmap_mem_fini(void); @@ -127,35 +135,27 @@ int netmap_mem_rings_create(struct netmap_adapter *); void netmap_mem_rings_delete(struct netmap_adapter *); void netmap_mem_deref(struct netmap_mem_d *, struct netmap_adapter *); +int netmap_mem2_get_pool_info(struct netmap_mem_d *, u_int, u_int *, u_int *); int netmap_mem_get_info(struct netmap_mem_d *, u_int *size, u_int *memflags, uint16_t *id); ssize_t netmap_mem_if_offset(struct netmap_mem_d *, const void *vaddr); -struct netmap_mem_d* netmap_mem_private_new(const char *name, - u_int txr, u_int txd, u_int rxr, u_int rxd, u_int extra_bufs, u_int npipes, - int* error); +struct netmap_mem_d* netmap_mem_private_new( u_int txr, u_int txd, u_int rxr, u_int rxd, + u_int extra_bufs, u_int npipes, int* error); void netmap_mem_delete(struct netmap_mem_d *); -//#define NM_DEBUG_MEM_PUTGET 1 - -#ifdef NM_DEBUG_MEM_PUTGET - -#define netmap_mem_get(nmd) \ - do { \ - __netmap_mem_get(nmd, __FUNCTION__, __LINE__); \ - } while (0) - -#define netmap_mem_put(nmd) \ - do { \ - __netmap_mem_put(nmd, __FUNCTION__, __LINE__); \ - } while (0) - -void __netmap_mem_get(struct netmap_mem_d *, const char *, int); -void __netmap_mem_put(struct netmap_mem_d *, const char *, int); -#else /* 
!NM_DEBUG_MEM_PUTGET */ - -void netmap_mem_get(struct netmap_mem_d *); +struct netmap_mem_d* netmap_mem_get(struct netmap_mem_d *); void netmap_mem_put(struct netmap_mem_d *); +struct netmap_mem_d* netmap_mem_find(nm_memid_t); + +#ifdef WITH_PTNETMAP_GUEST +struct netmap_mem_d* netmap_mem_pt_guest_new(struct ifnet *, + unsigned int nifp_offset, + unsigned int memid); +struct ptnetmap_memdev; +struct netmap_mem_d* netmap_mem_pt_guest_attach(struct ptnetmap_memdev *, uint16_t); +int netmap_mem_pt_guest_ifp_del(struct netmap_mem_d *, struct ifnet *); +#endif /* WITH_PTNETMAP_GUEST */ -#endif /* !NM_DEBUG_PUTGET */ +int netmap_mem_pools_info_get(struct nmreq *, struct netmap_adapter *); #define NETMAP_MEM_PRIVATE 0x2 /* allocator uses private address space */ #define NETMAP_MEM_IO 0x4 /* the underlying memory is mmapped I/O */ diff -u -r -N usr/src/sys/dev/netmap/netmap_monitor.c /usr/src/sys/dev/netmap/netmap_monitor.c --- usr/src/sys/dev/netmap/netmap_monitor.c 2016-09-29 00:24:47.000000000 +0100 +++ /usr/src/sys/dev/netmap/netmap_monitor.c 2016-12-01 09:51:28.715138000 +0000 @@ -1,5 +1,6 @@ /* - * Copyright (C) 2014 Giuseppe Lettieri. All rights reserved. + * Copyright (C) 2014-2016 Giuseppe Lettieri + * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions @@ -24,7 +25,7 @@ */ /* - * $FreeBSD: releng/11.0/sys/dev/netmap/netmap_monitor.c 285696 2015-07-19 18:04:51Z luigi $ + * $FreeBSD: head/sys/dev/netmap/netmap_zmon.c 270063 2014-08-16 15:00:01Z luigi $ * * Monitors * @@ -101,6 +102,8 @@ #warning OSX support is only partial #include "osx_glue.h" +#elif defined(_WIN32) +#include "win_glue.h" #else #error Unsupported platform @@ -145,19 +148,23 @@ netmap_monitor_rxsync(struct netmap_kring *kring, int flags) { ND("%s %x", kring->name, flags); - kring->nr_hwcur = kring->rcur; + kring->nr_hwcur = kring->rhead; mb(); return 0; } /* nm_krings_create callbacks for monitors. - * We could use the default netmap_hw_krings_zmon, but - * we don't need the mbq. 
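
/*
 * nm_monitor_alloc() below now passes the old array length to
 * nm_os_realloc().  A plausible reason -- an assumption, not confirmed
 * by this hunk -- is that a portable realloc replacement needs old_len
 * to know how much to copy and/or how much of the grown tail to zero.
 * A userspace equivalent that zeroes the new tail (assumes all-zero
 * bytes read back as NULL pointers, true on common platforms):
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void *
demo_realloc(void *p, size_t new_len, size_t old_len)
{
	void *q = realloc(p, new_len);

	if (q != NULL && new_len > old_len)
		memset((char *)q + old_len, 0, new_len - old_len);
	return q;
}

int
main(void)
{
	size_t old_len = 2 * sizeof(void *);
	size_t len = 5 * sizeof(void *);
	void **arr = demo_realloc(NULL, old_len, 0);

	arr = demo_realloc(arr, len, old_len);	/* grow, tail zeroed */
	printf("new slots zeroed: %s\n",
	    (arr[2] == NULL && arr[4] == NULL) ? "yes" : "no");
	free(arr);
	return 0;
}
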
*/ static int netmap_monitor_krings_create(struct netmap_adapter *na) { - return netmap_krings_create(na, 0); + int error = netmap_krings_create(na, 0); + if (error) + return error; + /* override the host rings callbacks */ + na->tx_rings[na->num_tx_rings].nm_sync = netmap_monitor_txsync; + na->rx_rings[na->num_rx_rings].nm_sync = netmap_monitor_rxsync; + return 0; } /* nm_krings_delete callback for monitors */ @@ -178,15 +185,16 @@ static int nm_monitor_alloc(struct netmap_kring *kring, u_int n) { - size_t len; + size_t old_len, len; struct netmap_kring **nm; if (n <= kring->max_monitors) /* we already have more entries that requested */ return 0; + old_len = sizeof(struct netmap_kring *)*kring->max_monitors; len = sizeof(struct netmap_kring *) * n; - nm = realloc(kring->monitors, len, M_DEVBUF, M_NOWAIT | M_ZERO); + nm = nm_os_realloc(kring->monitors, len, old_len); if (nm == NULL) return ENOMEM; @@ -205,7 +213,7 @@ D("freeing not empty monitor array for %s (%d dangling monitors)!", kring->name, kring->n_monitors); } - free(kring->monitors, M_DEVBUF); + nm_os_free(kring->monitors); kring->monitors = NULL; kring->max_monitors = 0; kring->n_monitors = 0; @@ -229,20 +237,20 @@ static int netmap_monitor_add(struct netmap_kring *mkring, struct netmap_kring *kring, int zcopy) { - int error = 0; + int error = NM_IRQ_COMPLETED; /* sinchronize with concurrently running nm_sync()s */ - nm_kr_get(kring); + nm_kr_stop(kring, NM_KR_LOCKED); /* make sure the monitor array exists and is big enough */ error = nm_monitor_alloc(kring, kring->n_monitors + 1); if (error) goto out; kring->monitors[kring->n_monitors] = mkring; - mkring->mon_pos = kring->n_monitors; + mkring->mon_pos[kring->tx] = kring->n_monitors; kring->n_monitors++; if (kring->n_monitors == 1) { /* this is the first monitor, intercept callbacks */ - D("%s: intercept callbacks on %s", mkring->name, kring->name); + ND("%s: intercept callbacks on %s", mkring->name, kring->name); kring->mon_sync = kring->nm_sync; /* zcopy monitors do not override nm_notify(), but * we save the original one regardless, so that @@ -265,7 +273,7 @@ } out: - nm_kr_put(kring); + nm_kr_start(kring); return error; } @@ -276,28 +284,30 @@ static void netmap_monitor_del(struct netmap_kring *mkring, struct netmap_kring *kring) { + uint32_t mon_pos; /* sinchronize with concurrently running nm_sync()s */ - nm_kr_get(kring); + nm_kr_stop(kring, NM_KR_LOCKED); kring->n_monitors--; - if (mkring->mon_pos != kring->n_monitors) { - kring->monitors[mkring->mon_pos] = kring->monitors[kring->n_monitors]; - kring->monitors[mkring->mon_pos]->mon_pos = mkring->mon_pos; + mon_pos = mkring->mon_pos[kring->tx]; + if (mon_pos != kring->n_monitors) { + kring->monitors[mon_pos] = kring->monitors[kring->n_monitors]; + kring->monitors[mon_pos]->mon_pos[kring->tx] = mon_pos; } kring->monitors[kring->n_monitors] = NULL; if (kring->n_monitors == 0) { /* this was the last monitor, restore callbacks and delete monitor array */ - D("%s: restoring sync on %s: %p", mkring->name, kring->name, kring->mon_sync); + ND("%s: restoring sync on %s: %p", mkring->name, kring->name, kring->mon_sync); kring->nm_sync = kring->mon_sync; kring->mon_sync = NULL; if (kring->tx == NR_RX) { - D("%s: restoring notify on %s: %p", + ND("%s: restoring notify on %s: %p", mkring->name, kring->name, kring->mon_notify); kring->nm_notify = kring->mon_notify; kring->mon_notify = NULL; } nm_monitor_dealloc(kring); } - nm_kr_put(kring); + nm_kr_start(kring); } @@ -316,7 +326,7 @@ for_rx_tx(t) { u_int i; - for (i = 0; i < 
nma_get_nrings(na, t); i++) { + for (i = 0; i < nma_get_nrings(na, t) + 1; i++) { struct netmap_kring *kring = &NMR(na, t)[i]; u_int j; @@ -326,8 +336,10 @@ struct netmap_monitor_adapter *mna = (struct netmap_monitor_adapter *)mkring->na; /* forget about this adapter */ - netmap_adapter_put(mna->priv.np_na); - mna->priv.np_na = NULL; + if (mna->priv.np_na != NULL) { + netmap_adapter_put(mna->priv.np_na); + mna->priv.np_na = NULL; + } } } } @@ -346,7 +358,7 @@ struct netmap_adapter *pna = priv->np_na; struct netmap_kring *kring, *mkring; int i; - enum txrx t; + enum txrx t, s; ND("%p: onoff %d", na, onoff); if (onoff) { @@ -356,27 +368,48 @@ return ENXIO; } for_rx_tx(t) { - if (mna->flags & nm_txrx2flag(t)) { - for (i = priv->np_qfirst[t]; i < priv->np_qlast[t]; i++) { - kring = &NMR(pna, t)[i]; - mkring = &na->rx_rings[i]; - netmap_monitor_add(mkring, kring, zmon); + for (i = 0; i < nma_get_nrings(na, t) + 1; i++) { + mkring = &NMR(na, t)[i]; + if (!nm_kring_pending_on(mkring)) + continue; + mkring->nr_mode = NKR_NETMAP_ON; + if (t == NR_TX) + continue; + for_rx_tx(s) { + if (i > nma_get_nrings(pna, s)) + continue; + if (mna->flags & nm_txrx2flag(s)) { + kring = &NMR(pna, s)[i]; + netmap_monitor_add(mkring, kring, zmon); + } } } } na->na_flags |= NAF_NETMAP_ON; } else { - if (pna == NULL) { - D("%s: parent left netmap mode, nothing to restore", na->name); - return 0; - } - na->na_flags &= ~NAF_NETMAP_ON; + if (na->active_fds == 0) + na->na_flags &= ~NAF_NETMAP_ON; for_rx_tx(t) { - if (mna->flags & nm_txrx2flag(t)) { - for (i = priv->np_qfirst[t]; i < priv->np_qlast[t]; i++) { - kring = &NMR(pna, t)[i]; - mkring = &na->rx_rings[i]; - netmap_monitor_del(mkring, kring); + for (i = 0; i < nma_get_nrings(na, t) + 1; i++) { + mkring = &NMR(na, t)[i]; + if (!nm_kring_pending_off(mkring)) + continue; + mkring->nr_mode = NKR_NETMAP_OFF; + if (t == NR_TX) + continue; + /* we cannot access the parent krings if the parent + * has left netmap mode. 
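
/*
 * The O(1) removal in netmap_monitor_del() above: move the last array
 * entry into the vacated slot and fix its recorded position, exactly
 * what the mon_pos bookkeeping does.  Standalone demo; not part of
 * the patch.
 */
#include <stdio.h>

struct mon { unsigned pos; const char *name; };

#define NMON 3

int
main(void)
{
	struct mon a = { 0, "a" }, b = { 1, "b" }, c = { 2, "c" };
	struct mon *arr[NMON] = { &a, &b, &c };
	unsigned n = NMON;
	unsigned pos = a.pos;	/* remove "a" (slot 0) */
	unsigned i;

	n--;
	if (pos != n) {
		arr[pos] = arr[n];	/* last entry fills the hole */
		arr[pos]->pos = pos;	/* keep its position current */
	}
	arr[n] = NULL;

	for (i = 0; i < n; i++)
		printf("slot %u: %s (pos %u)\n", i, arr[i]->name,
		    arr[i]->pos);
	return 0;
}
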
This is signaled by a NULL + * pna pointer + */ + if (pna == NULL) + continue; + for_rx_tx(s) { + if (i > nma_get_nrings(pna, s)) + continue; + if (mna->flags & nm_txrx2flag(s)) { + kring = &NMR(pna, s)[i]; + netmap_monitor_del(mkring, kring); + } } } } @@ -386,7 +419,7 @@ /* **************************************************************** - * functions specific for zero-copy monitors + * functions specific for zero-copy monitors **************************************************************** */ @@ -414,11 +447,11 @@ /* get the relased slots (rel_slots) */ if (tx == NR_TX) { - beg = kring->nr_hwtail; + beg = kring->nr_hwtail + 1; error = kring->mon_sync(kring, flags); if (error) return error; - end = kring->nr_hwtail; + end = kring->nr_hwtail + 1; } else { /* NR_RX */ beg = kring->nr_hwcur; end = kring->rhead; @@ -453,10 +486,10 @@ /* swap min(free_slots, rel_slots) slots */ if (free_slots < rel_slots) { beg += (rel_slots - free_slots); - if (beg >= kring->nkr_num_slots) - beg -= kring->nkr_num_slots; rel_slots = free_slots; } + if (unlikely(beg >= kring->nkr_num_slots)) + beg -= kring->nkr_num_slots; sent = rel_slots; for ( ; rel_slots; rel_slots--) { @@ -534,7 +567,7 @@ /* **************************************************************** - * functions specific for copy monitors + * functions specific for copy monitors **************************************************************** */ @@ -652,17 +685,27 @@ static int netmap_monitor_parent_notify(struct netmap_kring *kring, int flags) { + int (*notify)(struct netmap_kring*, int); ND(5, "%s %x", kring->name, flags); /* ?xsync callbacks have tryget called by their callers * (NIOCREGIF and poll()), but here we have to call it * by ourself */ - if (nm_kr_tryget(kring)) - goto out; - netmap_monitor_parent_rxsync(kring, NAF_FORCE_READ); + if (nm_kr_tryget(kring, 0, NULL)) { + /* in all cases, just skip the sync */ + return NM_IRQ_COMPLETED; + } + if (kring->n_monitors > 0) { + netmap_monitor_parent_rxsync(kring, NAF_FORCE_READ); + notify = kring->mon_notify; + } else { + /* we are no longer monitoring this ring, so both + * mon_sync and mon_notify are NULL + */ + notify = kring->nm_notify; + } nm_kr_put(kring); -out: - return kring->mon_notify(kring, flags); + return notify(kring, flags); } @@ -691,20 +734,27 @@ struct nmreq pnmr; struct netmap_adapter *pna; /* parent adapter */ struct netmap_monitor_adapter *mna; + struct ifnet *ifp = NULL; int i, error; enum txrx t; int zcopy = (nmr->nr_flags & NR_ZCOPY_MON); char monsuff[10] = ""; if ((nmr->nr_flags & (NR_MONITOR_TX | NR_MONITOR_RX)) == 0) { + if (nmr->nr_flags & NR_ZCOPY_MON) { + /* the flag makes no sense unless you are + * creating a monitor + */ + return EINVAL; + } ND("not a monitor"); return 0; } /* this is a request for a monitor adapter */ - D("flags %x", nmr->nr_flags); + ND("flags %x", nmr->nr_flags); - mna = malloc(sizeof(*mna), M_DEVBUF, M_NOWAIT | M_ZERO); + mna = nm_os_malloc(sizeof(*mna)); if (mna == NULL) { D("memory error"); return ENOMEM; @@ -716,13 +766,14 @@ * except other monitors. 
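(A note for anyone exercising the monitor code from userspace: the lookup that follows strips NR_MONITOR_TX/NR_MONITOR_RX -- and, with this patch, NR_ZCOPY_MON -- before resolving the parent port, and nm_open() sets those same flags from suffix characters on the port name. A minimal sketch, assuming the /r, /t and /z suffixes that nm_open() parses for rx/tx/zero-copy monitors, with a made-up interface name:

#define NETMAP_WITH_LIBS
#include <net/netmap_user.h>
#include <err.h>

int
main(void)
{
	/* "/zr" requests a zero-copy RX monitor of em0; "/r" would ask
	 * for a copy monitor, "/t" or "/rt" would capture TX as well. */
	struct nm_desc *mon = nm_open("netmap:em0/zr", NULL, 0, NULL);

	if (mon == NULL)
		err(1, "nm_open");
	/* ... poll() on NETMAP_FD(mon) and walk the RX rings as usual ... */
	nm_close(mon);
	return 0;
}
)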
*/ memcpy(&pnmr, nmr, sizeof(pnmr)); - pnmr.nr_flags &= ~(NR_MONITOR_TX | NR_MONITOR_RX); - error = netmap_get_na(&pnmr, &pna, create); + pnmr.nr_flags &= ~(NR_MONITOR_TX | NR_MONITOR_RX | NR_ZCOPY_MON); + error = netmap_get_na(&pnmr, &pna, &ifp, create); if (error) { D("parent lookup failed: %d", error); + nm_os_free(mna); return error; } - D("found parent: %s", pna->name); + ND("found parent: %s", pna->name); if (!nm_netmap_on(pna)) { /* parent not in netmap mode */ @@ -770,8 +821,10 @@ /* to have zero copy, we need to use the same memory allocator * as the monitored port */ - mna->up.nm_mem = pna->nm_mem; + mna->up.nm_mem = netmap_mem_get(pna->nm_mem); mna->up.na_lut = pna->na_lut; + /* and the allocator cannot be changed */ + mna->up.na_flags |= NAF_MEM_OWNER; } else { /* normal monitors are incompatible with zero copy ones */ for_rx_tx(t) { @@ -794,7 +847,7 @@ } /* the monitor supports the host rings iff the parent does */ - mna->up.na_flags = (pna->na_flags & NAF_HOST_RINGS); + mna->up.na_flags |= (pna->na_flags & NAF_HOST_RINGS); /* a do-nothing txsync: monitors cannot be used to inject packets */ mna->up.nm_txsync = netmap_monitor_txsync; mna->up.nm_rxsync = netmap_monitor_rxsync; @@ -829,20 +882,18 @@ *na = &mna->up; netmap_adapter_get(*na); - /* write the configuration back */ - nmr->nr_tx_rings = mna->up.num_tx_rings; - nmr->nr_rx_rings = mna->up.num_rx_rings; - nmr->nr_tx_slots = mna->up.num_tx_desc; - nmr->nr_rx_slots = mna->up.num_rx_desc; - /* keep the reference to the parent */ - D("monitor ok"); + ND("monitor ok"); + + /* drop the reference to the ifp, if any */ + if (ifp) + if_rele(ifp); return 0; put_out: - netmap_adapter_put(pna); - free(mna, M_DEVBUF); + netmap_unget_na(pna, ifp); + nm_os_free(mna); return error; } diff -u -r -N usr/src/sys/dev/netmap/netmap_offloadings.c /usr/src/sys/dev/netmap/netmap_offloadings.c --- usr/src/sys/dev/netmap/netmap_offloadings.c 2016-09-29 00:24:47.000000000 +0100 +++ /usr/src/sys/dev/netmap/netmap_offloadings.c 2016-11-23 16:57:57.853513000 +0000 @@ -1,5 +1,6 @@ /* - * Copyright (C) 2014 Vincenzo Maffione. All rights reserved. + * Copyright (C) 2014-2015 Vincenzo Maffione + * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions @@ -23,7 +24,7 @@ * SUCH DAMAGE. */ -/* $FreeBSD: releng/11.0/sys/dev/netmap/netmap_offloadings.c 298955 2016-05-03 03:41:25Z pfg $ */ +/* $FreeBSD: head/sys/dev/netmap/netmap_offloadings.c 261909 2014-02-15 04:53:04Z luigi $ */ #if defined(__FreeBSD__) #include <sys/cdefs.h> /* prerequisite */ @@ -31,9 +32,9 @@ #include <sys/types.h> #include <sys/errno.h> #include <sys/param.h> /* defines used in kernel.h */ -#include <sys/malloc.h> /* types used in module initialization */ #include <sys/kernel.h> /* types used in module initialization */ #include <sys/sockio.h> +#include <sys/malloc.h> #include <sys/socketvar.h> /* struct socket */ #include <sys/socket.h> /* sockaddrs */ #include <net/if.h> @@ -64,21 +65,21 @@ /* This routine is called by bdg_mismatch_datapath() when it finishes * accumulating bytes for a segment, in order to fix some fields in the * segment headers (which still contain the same content as the header - * of the original GSO packet). 'buf' points to the beginning (e.g. - * the ethernet header) of the segment, and 'len' is its length. + * of the original GSO packet). 'pkt' points to the beginning of the IP + * header of the segment, while 'len' is the length of the IP packet. 
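(A worked example makes the new length convention concrete: for a TCP/IPv4 segment with a 20-byte IP header, a 20-byte TCP header and 1448 data bytes, gso_fix_segment() now receives len = 20 + 20 + 1448 = 1488 and stores exactly that in the IPv4 "Total Length" field, while the corresponding IPv6 segment (40-byte header, so len = 1508) gets payload_len = 1508 - 40 = 1468, i.e. TCP header plus data. Under the old convention both values were derived from a buffer that still included the 14-byte Ethernet header, which is where the len-14 terms removed below came from.)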
*/ -static void gso_fix_segment(uint8_t *buf, size_t len, u_int idx, - u_int segmented_bytes, u_int last_segment, - u_int tcp, u_int iphlen) +static void +gso_fix_segment(uint8_t *pkt, size_t len, u_int ipv4, u_int iphlen, u_int tcp, + u_int idx, u_int segmented_bytes, u_int last_segment) { - struct nm_iphdr *iph = (struct nm_iphdr *)(buf + 14); - struct nm_ipv6hdr *ip6h = (struct nm_ipv6hdr *)(buf + 14); + struct nm_iphdr *iph = (struct nm_iphdr *)(pkt); + struct nm_ipv6hdr *ip6h = (struct nm_ipv6hdr *)(pkt); uint16_t *check = NULL; uint8_t *check_data = NULL; - if (iphlen == 20) { + if (ipv4) { /* Set the IPv4 "Total Length" field. */ - iph->tot_len = htobe16(len-14); + iph->tot_len = htobe16(len); ND("ip total length %u", be16toh(ip->tot_len)); /* Set the IPv4 "Identification" field. */ @@ -87,15 +88,15 @@ /* Compute and insert the IPv4 header checksum. */ iph->check = 0; - iph->check = nm_csum_ipv4(iph); + iph->check = nm_os_csum_ipv4(iph); ND("IP csum %x", be16toh(iph->check)); - } else {/* if (iphlen == 40) */ + } else { /* Set the IPv6 "Payload Len" field. */ - ip6h->payload_len = htobe16(len-14-iphlen); + ip6h->payload_len = htobe16(len-iphlen); } if (tcp) { - struct nm_tcphdr *tcph = (struct nm_tcphdr *)(buf + 14 + iphlen); + struct nm_tcphdr *tcph = (struct nm_tcphdr *)(pkt + iphlen); /* Set the TCP sequence number. */ tcph->seq = htobe32(be32toh(tcph->seq) + segmented_bytes); @@ -110,10 +111,10 @@ check = &tcph->check; check_data = (uint8_t *)tcph; } else { /* UDP */ - struct nm_udphdr *udph = (struct nm_udphdr *)(buf + 14 + iphlen); + struct nm_udphdr *udph = (struct nm_udphdr *)(pkt + iphlen); /* Set the UDP 'Length' field. */ - udph->len = htobe16(len-14-iphlen); + udph->len = htobe16(len-iphlen); check = &udph->check; check_data = (uint8_t *)udph; @@ -121,48 +122,80 @@ /* Compute and insert TCP/UDP checksum. */ *check = 0; - if (iphlen == 20) - nm_csum_tcpudp_ipv4(iph, check_data, len-14-iphlen, check); + if (ipv4) + nm_os_csum_tcpudp_ipv4(iph, check_data, len-iphlen, check); else - nm_csum_tcpudp_ipv6(ip6h, check_data, len-14-iphlen, check); + nm_os_csum_tcpudp_ipv6(ip6h, check_data, len-iphlen, check); ND("TCP/UDP csum %x", be16toh(*check)); } +static int +vnet_hdr_is_bad(struct nm_vnet_hdr *vh) +{ + uint8_t gso_type = vh->gso_type & ~VIRTIO_NET_HDR_GSO_ECN; + + return ( + (gso_type != VIRTIO_NET_HDR_GSO_NONE && + gso_type != VIRTIO_NET_HDR_GSO_TCPV4 && + gso_type != VIRTIO_NET_HDR_GSO_UDP && + gso_type != VIRTIO_NET_HDR_GSO_TCPV6) + || + (vh->flags & ~(VIRTIO_NET_HDR_F_NEEDS_CSUM + | VIRTIO_NET_HDR_F_DATA_VALID)) + ); +} /* The VALE mismatch datapath implementation. */ -void bdg_mismatch_datapath(struct netmap_vp_adapter *na, - struct netmap_vp_adapter *dst_na, - struct nm_bdg_fwd *ft_p, struct netmap_ring *ring, - u_int *j, u_int lim, u_int *howmany) +void +bdg_mismatch_datapath(struct netmap_vp_adapter *na, + struct netmap_vp_adapter *dst_na, + const struct nm_bdg_fwd *ft_p, + struct netmap_ring *dst_ring, + u_int *j, u_int lim, u_int *howmany) { - struct netmap_slot *slot = NULL; + struct netmap_slot *dst_slot = NULL; struct nm_vnet_hdr *vh = NULL; - /* Number of source slots to process. */ - u_int frags = ft_p->ft_frags; - struct nm_bdg_fwd *ft_end = ft_p + frags; + const struct nm_bdg_fwd *ft_end = ft_p + ft_p->ft_frags; /* Source and destination pointers. */ uint8_t *dst, *src; size_t src_len, dst_len; + /* Indices and counters for the destination ring. 
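(For readers without the virtio specification at hand, the header that the new vnet_hdr_is_bad() above validates follows the standard legacy virtio-net layout, which netmap's nm_vnet_hdr mirrors. Roughly:

struct virtio_net_hdr {		/* sketch, per the virtio spec */
	uint8_t  flags;		/* VIRTIO_NET_HDR_F_NEEDS_CSUM, _DATA_VALID */
	uint8_t  gso_type;	/* GSO_NONE/TCPV4/UDP/TCPV6, optionally |ECN */
	uint16_t hdr_len;	/* length of the headers to replicate */
	uint16_t gso_size;	/* MSS to use when segmenting */
	uint16_t csum_start;	/* offset where checksummed data begins */
	uint16_t csum_offset;	/* where to store the folded checksum */
};

Anything with an unrecognized gso_type, or with flag bits outside the two known ones, is now dropped instead of being processed.)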
*/ u_int j_start = *j; + u_int j_cur = j_start; u_int dst_slots = 0; - /* If the source port uses the offloadings, while destination doesn't, - * we grab the source virtio-net header and do the offloadings here. - */ - if (na->virt_hdr_len && !dst_na->virt_hdr_len) { - vh = (struct nm_vnet_hdr *)ft_p->ft_buf; + if (unlikely(ft_p == ft_end)) { + RD(3, "No source slots to process"); + return; } /* Init source and dest pointers. */ src = ft_p->ft_buf; src_len = ft_p->ft_len; - slot = &ring->slot[*j]; - dst = NMB(&dst_na->up, slot); + dst_slot = &dst_ring->slot[j_cur]; + dst = NMB(&dst_na->up, dst_slot); dst_len = src_len; + /* If the source port uses the offloadings, while destination doesn't, + * we grab the source virtio-net header and do the offloadings here. + */ + if (na->up.virt_hdr_len && !dst_na->up.virt_hdr_len) { + vh = (struct nm_vnet_hdr *)src; + /* Initial sanity check on the source virtio-net header. If + * something seems wrong, just drop the packet. */ + if (src_len < na->up.virt_hdr_len) { + RD(3, "Short src vnet header, dropping"); + return; + } + if (vnet_hdr_is_bad(vh)) { + RD(3, "Bad src vnet header, dropping"); + return; + } + } + /* We are processing the first input slot and there is a mismatch * between source and destination virt_hdr_len (SHL and DHL). * When the a client is using virtio-net headers, the header length @@ -185,14 +218,14 @@ * 12 | 0 | doesn't exist * 12 | 10 | copied from the first 10 bytes of source header */ - bzero(dst, dst_na->virt_hdr_len); - if (na->virt_hdr_len && dst_na->virt_hdr_len) + bzero(dst, dst_na->up.virt_hdr_len); + if (na->up.virt_hdr_len && dst_na->up.virt_hdr_len) memcpy(dst, src, sizeof(struct nm_vnet_hdr)); /* Skip the virtio-net headers. */ - src += na->virt_hdr_len; - src_len -= na->virt_hdr_len; - dst += dst_na->virt_hdr_len; - dst_len = dst_na->virt_hdr_len + src_len; + src += na->up.virt_hdr_len; + src_len -= na->up.virt_hdr_len; + dst += dst_na->up.virt_hdr_len; + dst_len = dst_na->up.virt_hdr_len + src_len; /* Here it could be dst_len == 0 (which implies src_len == 0), * so we avoid passing a zero length fragment. @@ -214,16 +247,27 @@ u_int gso_idx = 0; /* Payload data bytes segmented so far (e.g. TCP data bytes). */ u_int segmented_bytes = 0; + /* Is this an IPv4 or IPv6 GSO packet? */ + u_int ipv4 = 0; /* Length of the IP header (20 if IPv4, 40 if IPv6). */ u_int iphlen = 0; + /* Length of the Ethernet header (18 if 802.1q, otherwise 14). */ + u_int ethhlen = 14; /* Is this a TCP or an UDP GSO packet? */ u_int tcp = ((vh->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) == VIRTIO_NET_HDR_GSO_UDP) ? 0 : 1; /* Segment the GSO packet contained into the input slots (frags). */ - while (ft_p != ft_end) { + for (;;) { size_t copy; + if (dst_slots >= *howmany) { + /* We still have work to do, but we've run out of + * dst slots, so we have to drop the packet. */ + RD(3, "Not enough slots, dropping GSO packet"); + return; + } + /* Grab the GSO header if we don't have it. */ if (!gso_hdr) { uint16_t ethertype; @@ -231,28 +275,75 @@ gso_hdr = src; /* Look at the 'Ethertype' field to see if this packet - * is IPv4 or IPv6. - */ - ethertype = be16toh(*((uint16_t *)(gso_hdr + 12))); - if (ethertype == 0x0800) - iphlen = 20; - else /* if (ethertype == 0x86DD) */ - iphlen = 40; + * is IPv4 or IPv6, taking into account VLAN + * encapsulation. 
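(To see how the ethhlen walk introduced just below behaves: an untagged frame reads the ethertype at offset ethhlen - 2 = 12 and stops; an 802.1q frame finds TPID 0x8100 there, bumps ethhlen from 14 to 18 and re-reads at offset 16; a frame with two 0x8100 tags takes one more lap and ends with ethhlen = 22. Note that a QinQ outer tag of 0x88a8 is not matched by the loop and falls through to the "unsupported ethertype" drop, and that the src_len check on each lap keeps the reads inside the fragment.)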
*/ + for (;;) { + if (src_len < ethhlen) { + RD(3, "Short GSO fragment [eth], dropping"); + return; + } + ethertype = be16toh(*((uint16_t *) + (gso_hdr + ethhlen - 2))); + if (ethertype != 0x8100) /* not 802.1q */ + break; + ethhlen += 4; + } + switch (ethertype) { + case 0x0800: /* IPv4 */ + { + struct nm_iphdr *iph = (struct nm_iphdr *) + (gso_hdr + ethhlen); + + if (src_len < ethhlen + 20) { + RD(3, "Short GSO fragment " + "[IPv4], dropping"); + return; + } + ipv4 = 1; + iphlen = 4 * (iph->version_ihl & 0x0F); + break; + } + case 0x86DD: /* IPv6 */ + ipv4 = 0; + iphlen = 40; + break; + default: + RD(3, "Unsupported ethertype, " + "dropping GSO packet"); + return; + } ND(3, "type=%04x", ethertype); + if (src_len < ethhlen + iphlen) { + RD(3, "Short GSO fragment [IP], dropping"); + return; + } + /* Compute gso_hdr_len. For TCP we need to read the * content of the 'Data Offset' field. */ if (tcp) { - struct nm_tcphdr *tcph = - (struct nm_tcphdr *)&gso_hdr[14+iphlen]; + struct nm_tcphdr *tcph = (struct nm_tcphdr *) + (gso_hdr + ethhlen + iphlen); + + if (src_len < ethhlen + iphlen + 20) { + RD(3, "Short GSO fragment " + "[TCP], dropping"); + return; + } + gso_hdr_len = ethhlen + iphlen + + 4 * (tcph->doff >> 4); + } else { + gso_hdr_len = ethhlen + iphlen + 8; /* UDP */ + } - gso_hdr_len = 14 + iphlen + 4*(tcph->doff >> 4); - } else - gso_hdr_len = 14 + iphlen + 8; /* UDP */ + if (src_len < gso_hdr_len) { + RD(3, "Short GSO fragment [TCP/UDP], dropping"); + return; + } ND(3, "gso_hdr_len %u gso_mtu %d", gso_hdr_len, - dst_na->mfs); + dst_na->mfs); /* Advance source pointers. */ src += gso_hdr_len; @@ -263,7 +354,6 @@ break; src = ft_p->ft_buf; src_len = ft_p->ft_len; - continue; } } @@ -289,25 +379,24 @@ /* After raw segmentation, we must fix some header * fields and compute checksums, in a protocol dependent * way. */ - gso_fix_segment(dst, gso_bytes, gso_idx, - segmented_bytes, - src_len == 0 && ft_p + 1 == ft_end, - tcp, iphlen); + gso_fix_segment(dst + ethhlen, gso_bytes - ethhlen, + ipv4, iphlen, tcp, + gso_idx, segmented_bytes, + src_len == 0 && ft_p + 1 == ft_end); ND("frame %u completed with %d bytes", gso_idx, (int)gso_bytes); - slot->len = gso_bytes; - slot->flags = 0; - segmented_bytes += gso_bytes - gso_hdr_len; - + dst_slot->len = gso_bytes; + dst_slot->flags = 0; dst_slots++; - - /* Next destination slot. */ - *j = nm_next(*j, lim); - slot = &ring->slot[*j]; - dst = NMB(&dst_na->up, slot); + segmented_bytes += gso_bytes - gso_hdr_len; gso_bytes = 0; gso_idx++; + + /* Next destination slot. */ + j_cur = nm_next(j_cur, lim); + dst_slot = &dst_ring->slot[j_cur]; + dst = NMB(&dst_na->up, dst_slot); } /* Next input slot. */ @@ -342,10 +431,10 @@ /* Init/update the packet checksum if needed. */ if (vh && (vh->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM)) { if (!dst_slots) - csum = nm_csum_raw(src + vh->csum_start, + csum = nm_os_csum_raw(src + vh->csum_start, src_len - vh->csum_start, 0); else - csum = nm_csum_raw(src, src_len, csum); + csum = nm_os_csum_raw(src, src_len, csum); } /* Round to a multiple of 64 */ @@ -359,44 +448,43 @@ } else { memcpy(dst, src, (int)src_len); } - slot->len = dst_len; - + dst_slot->len = dst_len; dst_slots++; /* Next destination slot. */ - *j = nm_next(*j, lim); - slot = &ring->slot[*j]; - dst = NMB(&dst_na->up, slot); + j_cur = nm_next(j_cur, lim); + dst_slot = &dst_ring->slot[j_cur]; + dst = NMB(&dst_na->up, dst_slot); /* Next source slot. */ ft_p++; src = ft_p->ft_buf; dst_len = src_len = ft_p->ft_len; - } /* Finalize (fold) the checksum if needed. 
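(The checksum handling above accumulates a raw 32-bit sum across the fragments with nm_os_csum_raw() and folds it exactly once here. For reference, the fold is the usual RFC 1071 ones-complement reduction; a minimal sketch of what an nm_os_csum_fold()-style helper does -- an illustration, not the actual netmap implementation:

#include <stdint.h>

static uint16_t
csum_fold(uint32_t sum)
{
	/* Fold any carries back into the low 16 bits, then complement. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
)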
*/ if (check && vh && (vh->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM)) { - *check = nm_csum_fold(csum); + *check = nm_os_csum_fold(csum); } ND(3, "using %u dst_slots", dst_slots); - /* A second pass on the desitations slots to set the slot flags, + /* A second pass on the destination slots to set the slot flags, * using the right number of destination slots. */ - while (j_start != *j) { - slot = &ring->slot[j_start]; - slot->flags = (dst_slots << 8)| NS_MOREFRAG; + while (j_start != j_cur) { + dst_slot = &dst_ring->slot[j_start]; + dst_slot->flags = (dst_slots << 8)| NS_MOREFRAG; j_start = nm_next(j_start, lim); } /* Clear NS_MOREFRAG flag on last entry. */ - slot->flags = (dst_slots << 8); + dst_slot->flags = (dst_slots << 8); } - /* Update howmany. */ + /* Update howmany and j. This is to commit the use of + * those slots in the destination ring. */ if (unlikely(dst_slots > *howmany)) { - dst_slots = *howmany; - D("Slot allocation error: Should never happen"); + D("Slot allocation error: This is a bug"); } + *j = j_cur; *howmany -= dst_slots; } diff -u -r -N usr/src/sys/dev/netmap/netmap_pipe.c /usr/src/sys/dev/netmap/netmap_pipe.c --- usr/src/sys/dev/netmap/netmap_pipe.c 2016-09-29 00:24:47.000000000 +0100 +++ /usr/src/sys/dev/netmap/netmap_pipe.c 2016-11-23 16:57:57.853967000 +0000 @@ -1,5 +1,6 @@ /* - * Copyright (C) 2014 Giuseppe Lettieri. All rights reserved. + * Copyright (C) 2014-2016 Giuseppe Lettieri + * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions @@ -23,7 +24,7 @@ * SUCH DAMAGE. */ -/* $FreeBSD: releng/11.0/sys/dev/netmap/netmap_pipe.c 285697 2015-07-19 18:05:49Z luigi $ */ +/* $FreeBSD: head/sys/dev/netmap/netmap_pipe.c 261909 2014-02-15 04:53:04Z luigi $ */ #if defined(__FreeBSD__) #include <sys/cdefs.h> /* prerequisite */ @@ -54,6 +55,9 @@ #warning OSX support is only partial #include "osx_glue.h" +#elif defined(_WIN32) +#include "win_glue.h" + #else #error Unsupported platform @@ -72,26 +76,29 @@ #define NM_PIPE_MAXSLOTS 4096 -int netmap_default_pipes = 0; /* ignored, kept for compatibility */ +static int netmap_default_pipes = 0; /* ignored, kept for compatibility */ +SYSBEGIN(vars_pipes); SYSCTL_DECL(_dev_netmap); SYSCTL_INT(_dev_netmap, OID_AUTO, default_pipes, CTLFLAG_RW, &netmap_default_pipes, 0 , ""); +SYSEND; /* allocate the pipe array in the parent adapter */ static int nm_pipe_alloc(struct netmap_adapter *na, u_int npipes) { - size_t len; + size_t old_len, len; struct netmap_pipe_adapter **npa; if (npipes <= na->na_max_pipes) /* we already have more entries that requested */ return 0; - + if (npipes < na->na_next_pipe || npipes > NM_MAXPIPES) return EINVAL; + old_len = sizeof(struct netmap_pipe_adapter *)*na->na_max_pipes; len = sizeof(struct netmap_pipe_adapter *) * npipes; - npa = realloc(na->na_pipes, len, M_DEVBUF, M_NOWAIT | M_ZERO); + npa = nm_os_realloc(na->na_pipes, len, old_len); if (npa == NULL) return ENOMEM; @@ -110,7 +117,7 @@ D("freeing not empty pipe array for %s (%d dangling pipes)!", na->name, na->na_next_pipe); } - free(na->na_pipes, M_DEVBUF); + nm_os_free(na->na_pipes); na->na_pipes = NULL; na->na_max_pipes = 0; na->na_next_pipe = 0; @@ -199,7 +206,7 @@ } while (limit-- > 0) { - struct netmap_slot *rs = &rxkring->save_ring->slot[j]; + struct netmap_slot *rs = &rxkring->ring->slot[j]; struct netmap_slot *ts = &txkring->ring->slot[k]; struct netmap_slot tmp; @@ -295,7 +302,7 @@ * usr1 --> e1 --> e2 * * and we are e2. 
e1 is certainly registered and our - * krings already exist, but they may be hidden. + * krings already exist. Nothing to do. */ static int netmap_pipe_krings_create(struct netmap_adapter *na) @@ -310,65 +317,28 @@ int i; /* case 1) above */ - ND("%p: case 1, create everything", na); + D("%p: case 1, create both ends", na); error = netmap_krings_create(na, 0); if (error) goto err; - /* we also create all the rings, since we need to - * update the save_ring pointers. - * netmap_mem_rings_create (called by our caller) - * will not create the rings again - */ - - error = netmap_mem_rings_create(na); - if (error) - goto del_krings1; - - /* update our hidden ring pointers */ - for_rx_tx(t) { - for (i = 0; i < nma_get_nrings(na, t) + 1; i++) - NMR(na, t)[i].save_ring = NMR(na, t)[i].ring; - } - - /* now, create krings and rings of the other end */ + /* create the krings of the other end */ error = netmap_krings_create(ona, 0); if (error) - goto del_rings1; - - error = netmap_mem_rings_create(ona); - if (error) - goto del_krings2; - - for_rx_tx(t) { - for (i = 0; i < nma_get_nrings(ona, t) + 1; i++) - NMR(ona, t)[i].save_ring = NMR(ona, t)[i].ring; - } + goto del_krings1; /* cross link the krings */ for_rx_tx(t) { - enum txrx r= nm_txrx_swap(t); /* swap NR_TX <-> NR_RX */ + enum txrx r = nm_txrx_swap(t); /* swap NR_TX <-> NR_RX */ for (i = 0; i < nma_get_nrings(na, t); i++) { NMR(na, t)[i].pipe = NMR(&pna->peer->up, r) + i; NMR(&pna->peer->up, r)[i].pipe = NMR(na, t) + i; } } - } else { - int i; - /* case 2) above */ - /* recover the hidden rings */ - ND("%p: case 2, hidden rings", na); - for_rx_tx(t) { - for (i = 0; i < nma_get_nrings(na, t) + 1; i++) - NMR(na, t)[i].ring = NMR(na, t)[i].save_ring; - } + } return 0; -del_krings2: - netmap_krings_delete(ona); -del_rings1: - netmap_mem_rings_delete(na); del_krings1: netmap_krings_delete(na); err: @@ -383,7 +353,8 @@ * * usr1 --> e1 --> e2 * - * and we are e1. Nothing special to do. + * and we are e1. Create the needed rings of the + * other end. 
* * 1.b) state is * @@ -412,14 +383,65 @@ { struct netmap_pipe_adapter *pna = (struct netmap_pipe_adapter *)na; + struct netmap_adapter *ona = &pna->peer->up; + int i, error = 0; enum txrx t; ND("%p: onoff %d", na, onoff); if (onoff) { - na->na_flags |= NAF_NETMAP_ON; + for_rx_tx(t) { + for (i = 0; i < nma_get_nrings(na, t) + 1; i++) { + struct netmap_kring *kring = &NMR(na, t)[i]; + + if (nm_kring_pending_on(kring)) { + /* mark the partner ring as needed */ + kring->pipe->nr_kflags |= NKR_NEEDRING; + } + } + } + + /* create all missing needed rings on the other end */ + error = netmap_mem_rings_create(ona); + if (error) + return error; + + /* In case of no error we put our rings in netmap mode */ + for_rx_tx(t) { + for (i = 0; i < nma_get_nrings(na, t) + 1; i++) { + struct netmap_kring *kring = &NMR(na, t)[i]; + + if (nm_kring_pending_on(kring)) { + kring->nr_mode = NKR_NETMAP_ON; + } + } + } + if (na->active_fds == 0) + na->na_flags |= NAF_NETMAP_ON; } else { - na->na_flags &= ~NAF_NETMAP_ON; + if (na->active_fds == 0) + na->na_flags &= ~NAF_NETMAP_ON; + for_rx_tx(t) { + for (i = 0; i < nma_get_nrings(na, t) + 1; i++) { + struct netmap_kring *kring = &NMR(na, t)[i]; + + if (nm_kring_pending_off(kring)) { + kring->nr_mode = NKR_NETMAP_OFF; + /* mark the peer ring as no longer needed by us + * (it may still be kept if sombody else is using it) + */ + kring->pipe->nr_kflags &= ~NKR_NEEDRING; + } + } + } + /* delete all the peer rings that are no longer needed */ + netmap_mem_rings_delete(ona); + } + + if (na->active_fds) { + D("active_fds %d", na->active_fds); + return 0; } + if (pna->peer_ref) { ND("%p: case 1.a or 2.a, nothing to do", na); return 0; @@ -429,18 +451,11 @@ pna->peer->peer_ref = 0; netmap_adapter_put(na); } else { - int i; ND("%p: case 2.b, grab peer", na); netmap_adapter_get(na); pna->peer->peer_ref = 1; - /* hide our rings from netmap_mem_rings_delete */ - for_rx_tx(t) { - for (i = 0; i < nma_get_nrings(na, t) + 1; i++) { - NMR(na, t)[i].ring = NULL; - } - } } - return 0; + return error; } /* netmap_pipe_krings_delete. @@ -470,8 +485,6 @@ struct netmap_pipe_adapter *pna = (struct netmap_pipe_adapter *)na; struct netmap_adapter *ona; /* na of the other end */ - int i; - enum txrx t; if (!pna->peer_ref) { ND("%p: case 2, kept alive by peer", na); @@ -480,18 +493,12 @@ /* case 1) above */ ND("%p: case 1, deleting everyhing", na); netmap_krings_delete(na); /* also zeroes tx_rings etc. */ - /* restore the ring to be deleted on the peer */ ona = &pna->peer->up; if (ona->tx_rings == NULL) { /* already deleted, we must be on an * cleanup-after-error path */ return; } - for_rx_tx(t) { - for (i = 0; i < nma_get_nrings(ona, t) + 1; i++) - NMR(ona, t)[i].ring = NMR(ona, t)[i].save_ring; - } - netmap_mem_rings_delete(ona); netmap_krings_delete(ona); } @@ -519,6 +526,7 @@ struct nmreq pnmr; struct netmap_adapter *pna; /* parent adapter */ struct netmap_pipe_adapter *mna, *sna, *req; + struct ifnet *ifp = NULL; u_int pipe_id; int role = nmr->nr_flags & NR_REG_MASK; int error; @@ -536,7 +544,7 @@ memcpy(&pnmr.nr_name, nmr->nr_name, IFNAMSIZ); /* pass to parent the requested number of pipes */ pnmr.nr_arg1 = nmr->nr_arg1; - error = netmap_get_na(&pnmr, &pna, create); + error = netmap_get_na(&pnmr, &pna, &ifp, create); if (error) { ND("parent lookup failed: %d", error); return error; @@ -576,7 +584,7 @@ * The endpoint we were asked for holds a reference to * the other one. 
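(Since the master/slave dance below is easy to get lost in: each endpoint of a netmap pipe keeps a reference on its peer (peer_ref), and from userspace the two ends are selected with the '{' (master) and '}' (slave) suffixes on the parent port name. A minimal sketch, with made-up port and pipe ids:

#define NETMAP_WITH_LIBS
#include <net/netmap_user.h>
#include <err.h>

int
main(void)
{
	/* "{1" opens the master end of pipe 1 on vale0:p, "}1" the slave. */
	struct nm_desc *m = nm_open("vale0:p{1", NULL, 0, NULL);
	struct nm_desc *s = nm_open("vale0:p}1", NULL, 0, NULL);

	if (m == NULL || s == NULL)
		err(1, "nm_open");
	/* Slots pushed on m's TX ring appear on s's RX ring and back. */
	nm_close(s);
	nm_close(m);
	return 0;
}
)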
*/ - mna = malloc(sizeof(*mna), M_DEVBUF, M_NOWAIT | M_ZERO); + mna = nm_os_malloc(sizeof(*mna)); if (mna == NULL) { error = ENOMEM; goto put_out; @@ -593,7 +601,8 @@ mna->up.nm_dtor = netmap_pipe_dtor; mna->up.nm_krings_create = netmap_pipe_krings_create; mna->up.nm_krings_delete = netmap_pipe_krings_delete; - mna->up.nm_mem = pna->nm_mem; + mna->up.nm_mem = netmap_mem_get(pna->nm_mem); + mna->up.na_flags |= NAF_MEM_OWNER; mna->up.na_lut = pna->na_lut; mna->up.num_tx_rings = 1; @@ -613,13 +622,14 @@ goto free_mna; /* create the slave */ - sna = malloc(sizeof(*mna), M_DEVBUF, M_NOWAIT | M_ZERO); + sna = nm_os_malloc(sizeof(*mna)); if (sna == NULL) { error = ENOMEM; goto unregister_mna; } /* most fields are the same, copy from master and then fix */ *sna = *mna; + sna->up.nm_mem = netmap_mem_get(mna->up.nm_mem); snprintf(sna->up.name, sizeof(sna->up.name), "%s}%d", pna->name, pipe_id); sna->role = NR_REG_PIPE_SLAVE; error = netmap_attach_common(&sna->up); @@ -652,26 +662,25 @@ *na = &req->up; netmap_adapter_get(*na); - /* write the configuration back */ - nmr->nr_tx_rings = req->up.num_tx_rings; - nmr->nr_rx_rings = req->up.num_rx_rings; - nmr->nr_tx_slots = req->up.num_tx_desc; - nmr->nr_rx_slots = req->up.num_rx_desc; - /* keep the reference to the parent. * It will be released by the req destructor */ + /* drop the ifp reference, if any */ + if (ifp) { + if_rele(ifp); + } + return 0; free_sna: - free(sna, M_DEVBUF); + nm_os_free(sna); unregister_mna: netmap_pipe_remove(pna, mna); free_mna: - free(mna, M_DEVBUF); + nm_os_free(mna); put_out: - netmap_adapter_put(pna); + netmap_unget_na(pna, ifp); return error; } diff -u -r -N usr/src/sys/dev/netmap/netmap_pt.c /usr/src/sys/dev/netmap/netmap_pt.c --- usr/src/sys/dev/netmap/netmap_pt.c 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/dev/netmap/netmap_pt.c 2016-11-23 16:57:57.854667000 +0000 @@ -0,0 +1,1452 @@ +/* + * Copyright (C) 2015 Stefano Garzarella + * Copyright (C) 2016 Vincenzo Maffione + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
+ * + * $FreeBSD$ + */ + +/* + * common headers + */ +#if defined(__FreeBSD__) +#include <sys/cdefs.h> +#include <sys/param.h> +#include <sys/kernel.h> +#include <sys/types.h> +#include <sys/selinfo.h> +#include <sys/socket.h> +#include <net/if.h> +#include <net/if_var.h> +#include <machine/bus.h> + +//#define usleep_range(_1, _2) +#define usleep_range(_1, _2) \ + pause_sbt("ptnetmap-sleep", SBT_1US * _1, SBT_1US * 1, C_ABSOLUTE) + +#elif defined(linux) +#include <bsd_glue.h> +#endif + +#include <net/netmap.h> +#include <dev/netmap/netmap_kern.h> +#include <net/netmap_virt.h> +#include <dev/netmap/netmap_mem2.h> + +#ifdef WITH_PTNETMAP_HOST + +/* RX cycle without receive any packets */ +#define PTN_RX_DRY_CYCLES_MAX 10 + +/* Limit Batch TX to half ring. + * Currently disabled, since it does not manage NS_MOREFRAG, which + * results in random drops in the VALE txsync. */ +//#define PTN_TX_BATCH_LIM(_n) ((_n >> 1)) + +//#define BUSY_WAIT + +#define NETMAP_PT_DEBUG /* Enables communication debugging. */ +#ifdef NETMAP_PT_DEBUG +#define DBG(x) x +#else +#define DBG(x) +#endif + + +#undef RATE +//#define RATE /* Enables communication statistics. */ +#ifdef RATE +#define IFRATE(x) x +struct rate_batch_stats { + unsigned long sync; + unsigned long sync_dry; + unsigned long pkt; +}; + +struct rate_stats { + unsigned long gtxk; /* Guest --> Host Tx kicks. */ + unsigned long grxk; /* Guest --> Host Rx kicks. */ + unsigned long htxk; /* Host --> Guest Tx kicks. */ + unsigned long hrxk; /* Host --> Guest Rx Kicks. */ + unsigned long btxwu; /* Backend Tx wake-up. */ + unsigned long brxwu; /* Backend Rx wake-up. */ + struct rate_batch_stats txbs; + struct rate_batch_stats rxbs; +}; + +struct rate_context { + struct timer_list timer; + struct rate_stats new; + struct rate_stats old; +}; + +#define RATE_PERIOD 2 +static void +rate_callback(unsigned long arg) +{ + struct rate_context * ctx = (struct rate_context *)arg; + struct rate_stats cur = ctx->new; + struct rate_batch_stats *txbs = &cur.txbs; + struct rate_batch_stats *rxbs = &cur.rxbs; + struct rate_batch_stats *txbs_old = &ctx->old.txbs; + struct rate_batch_stats *rxbs_old = &ctx->old.rxbs; + uint64_t tx_batch, rx_batch; + unsigned long txpkts, rxpkts; + unsigned long gtxk, grxk; + int r; + + txpkts = txbs->pkt - txbs_old->pkt; + rxpkts = rxbs->pkt - rxbs_old->pkt; + + tx_batch = ((txbs->sync - txbs_old->sync) > 0) ? + txpkts / (txbs->sync - txbs_old->sync): 0; + rx_batch = ((rxbs->sync - rxbs_old->sync) > 0) ? + rxpkts / (rxbs->sync - rxbs_old->sync): 0; + + /* Fix-up gtxk and grxk estimates. 
*/ + gtxk = (cur.gtxk - ctx->old.gtxk) - (cur.btxwu - ctx->old.btxwu); + grxk = (cur.grxk - ctx->old.grxk) - (cur.brxwu - ctx->old.brxwu); + + printk("txpkts = %lu Hz\n", txpkts/RATE_PERIOD); + printk("gtxk = %lu Hz\n", gtxk/RATE_PERIOD); + printk("htxk = %lu Hz\n", (cur.htxk - ctx->old.htxk)/RATE_PERIOD); + printk("btxw = %lu Hz\n", (cur.btxwu - ctx->old.btxwu)/RATE_PERIOD); + printk("rxpkts = %lu Hz\n", rxpkts/RATE_PERIOD); + printk("grxk = %lu Hz\n", grxk/RATE_PERIOD); + printk("hrxk = %lu Hz\n", (cur.hrxk - ctx->old.hrxk)/RATE_PERIOD); + printk("brxw = %lu Hz\n", (cur.brxwu - ctx->old.brxwu)/RATE_PERIOD); + printk("txbatch = %llu avg\n", tx_batch); + printk("rxbatch = %llu avg\n", rx_batch); + printk("\n"); + + ctx->old = cur; + r = mod_timer(&ctx->timer, jiffies + + msecs_to_jiffies(RATE_PERIOD * 1000)); + if (unlikely(r)) + D("[ptnetmap] Error: mod_timer()\n"); +} + +static void +rate_batch_stats_update(struct rate_batch_stats *bf, uint32_t pre_tail, + uint32_t act_tail, uint32_t num_slots) +{ + int n = (int)act_tail - pre_tail; + + if (n) { + if (n < 0) + n += num_slots; + + bf->sync++; + bf->pkt += n; + } else { + bf->sync_dry++; + } +} + +#else /* !RATE */ +#define IFRATE(x) +#endif /* RATE */ + +struct ptnetmap_state { + /* Kthreads. */ + struct nm_kthread **kthreads; + + /* Shared memory with the guest (TX/RX) */ + struct ptnet_ring __user *ptrings; + + bool stopped; + + /* Netmap adapter wrapping the backend. */ + struct netmap_pt_host_adapter *pth_na; + + IFRATE(struct rate_context rate_ctx;) +}; + +static inline void +ptnetmap_kring_dump(const char *title, const struct netmap_kring *kring) +{ + RD(1, "%s - name: %s hwcur: %d hwtail: %d rhead: %d rcur: %d \ + rtail: %d head: %d cur: %d tail: %d", + title, kring->name, kring->nr_hwcur, + kring->nr_hwtail, kring->rhead, kring->rcur, kring->rtail, + kring->ring->head, kring->ring->cur, kring->ring->tail); +} + +/* + * TX functions to set/get and to handle host/guest kick. + */ + + +/* Enable or disable guest --> host kicks. */ +static inline void +ptring_kick_enable(struct ptnet_ring __user *ptring, uint32_t val) +{ + CSB_WRITE(ptring, host_need_kick, val); +} + +/* Are guest interrupt enabled or disabled? */ +static inline uint32_t +ptring_intr_enabled(struct ptnet_ring __user *ptring) +{ + uint32_t v; + + CSB_READ(ptring, guest_need_kick, v); + + return v; +} + +/* Enable or disable guest interrupts. */ +static inline void +ptring_intr_enable(struct ptnet_ring __user *ptring, uint32_t val) +{ + CSB_WRITE(ptring, guest_need_kick, val); +} + +/* Handle TX events: from the guest or from the backend */ +static void +ptnetmap_tx_handler(void *data) +{ + struct netmap_kring *kring = data; + struct netmap_pt_host_adapter *pth_na = + (struct netmap_pt_host_adapter *)kring->na->na_private; + struct ptnetmap_state *ptns = pth_na->ptns; + struct ptnet_ring __user *ptring; + struct netmap_ring shadow_ring; /* shadow copy of the netmap_ring */ + bool more_txspace = false; + struct nm_kthread *kth; + uint32_t num_slots; + int batch; + IFRATE(uint32_t pre_tail); + + if (unlikely(!ptns)) { + D("ERROR ptnetmap state is NULL"); + return; + } + + if (unlikely(ptns->stopped)) { + RD(1, "backend netmap is being stopped"); + return; + } + + if (unlikely(nm_kr_tryget(kring, 1, NULL))) { + D("ERROR nm_kr_tryget()"); + return; + } + + /* This is a guess, to be fixed in the rate callback. */ + IFRATE(ptns->rate_ctx.new.gtxk++); + + /* Get TX ptring pointer from the CSB. 
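(The CSB is laid out as an array of ptnet_ring entries, one per host kring, with all the TX rings first: the TX handler indexes it with kring->ring_id just below, while the RX handler further down adds num_tx_rings, matching ptnetmap_kring(). As a small illustration of the indexing, with types taken from this patch:

/* Which CSB entry serves a given kring; illustration only. */
static struct ptnet_ring __user *
ptring_of(struct ptnetmap_state *ptns, struct netmap_kring *kring)
{
	u_int base = (kring->tx == NR_TX) ? 0 :
	    ptns->pth_na->up.num_tx_rings;

	return ptns->ptrings + base + kring->ring_id;
}
)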
*/ + ptring = ptns->ptrings + kring->ring_id; + kth = ptns->kthreads[kring->ring_id]; + + num_slots = kring->nkr_num_slots; + shadow_ring.head = kring->rhead; + shadow_ring.cur = kring->rcur; + + /* Disable guest --> host notifications. */ + ptring_kick_enable(ptring, 0); + /* Copy the guest kring pointers from the CSB */ + ptnetmap_host_read_kring_csb(ptring, &shadow_ring, num_slots); + + for (;;) { + /* If guest moves ahead too fast, let's cut the move so + * that we don't exceed our batch limit. */ + batch = shadow_ring.head - kring->nr_hwcur; + if (batch < 0) + batch += num_slots; + +#ifdef PTN_TX_BATCH_LIM + if (batch > PTN_TX_BATCH_LIM(num_slots)) { + uint32_t head_lim = kring->nr_hwcur + PTN_TX_BATCH_LIM(num_slots); + + if (head_lim >= num_slots) + head_lim -= num_slots; + ND(1, "batch: %d head: %d head_lim: %d", batch, shadow_ring.head, + head_lim); + shadow_ring.head = head_lim; + batch = PTN_TX_BATCH_LIM(num_slots); + } +#endif /* PTN_TX_BATCH_LIM */ + + if (nm_kr_txspace(kring) <= (num_slots >> 1)) { + shadow_ring.flags |= NAF_FORCE_RECLAIM; + } + + /* Netmap prologue */ + shadow_ring.tail = kring->rtail; + if (unlikely(nm_txsync_prologue(kring, &shadow_ring) >= num_slots)) { + /* Reinit ring and enable notifications. */ + netmap_ring_reinit(kring); + ptring_kick_enable(ptring, 1); + break; + } + + if (unlikely(netmap_verbose & NM_VERB_TXSYNC)) { + ptnetmap_kring_dump("pre txsync", kring); + } + + IFRATE(pre_tail = kring->rtail); + if (unlikely(kring->nm_sync(kring, shadow_ring.flags))) { + /* Reenable notifications. */ + ptring_kick_enable(ptring, 1); + D("ERROR txsync()"); + break; + } + + /* + * Finalize + * Copy host hwcur and hwtail into the CSB for the guest sync(), and + * do the nm_sync_finalize. + */ + ptnetmap_host_write_kring_csb(ptring, kring->nr_hwcur, + kring->nr_hwtail); + if (kring->rtail != kring->nr_hwtail) { + /* Some more room available in the parent adapter. */ + kring->rtail = kring->nr_hwtail; + more_txspace = true; + } + + IFRATE(rate_batch_stats_update(&ptns->rate_ctx.new.txbs, pre_tail, + kring->rtail, num_slots)); + + if (unlikely(netmap_verbose & NM_VERB_TXSYNC)) { + ptnetmap_kring_dump("post txsync", kring); + } + +#ifndef BUSY_WAIT + /* Interrupt the guest if needed. */ + if (more_txspace && ptring_intr_enabled(ptring)) { + /* Disable guest kick to avoid sending unnecessary kicks */ + ptring_intr_enable(ptring, 0); + nm_os_kthread_send_irq(kth); + IFRATE(ptns->rate_ctx.new.htxk++); + more_txspace = false; + } +#endif + /* Read CSB to see if there is more work to do. */ + ptnetmap_host_read_kring_csb(ptring, &shadow_ring, num_slots); +#ifndef BUSY_WAIT + if (shadow_ring.head == kring->rhead) { + /* + * No more packets to transmit. We enable notifications and + * go to sleep, waiting for a kick from the guest when new + * new slots are ready for transmission. + */ + usleep_range(1,1); + /* Reenable notifications. */ + ptring_kick_enable(ptring, 1); + /* Doublecheck. */ + ptnetmap_host_read_kring_csb(ptring, &shadow_ring, num_slots); + if (shadow_ring.head != kring->rhead) { + /* We won the race condition, there are more packets to + * transmit. Disable notifications and do another cycle */ + ptring_kick_enable(ptring, 0); + continue; + } + break; + } + + if (nm_kr_txempty(kring)) { + /* No more available TX slots. We stop waiting for a notification + * from the backend (netmap_tx_irq). 
*/ + ND(1, "TX ring"); + break; + } +#endif + if (unlikely(ptns->stopped)) { + D("backend netmap is being stopped"); + break; + } + } + + nm_kr_put(kring); + + if (more_txspace && ptring_intr_enabled(ptring)) { + ptring_intr_enable(ptring, 0); + nm_os_kthread_send_irq(kth); + IFRATE(ptns->rate_ctx.new.htxk++); + } +} + +/* + * We need RX kicks from the guest when (tail == head-1), where we wait + * for the guest to refill. + */ +#ifndef BUSY_WAIT +static inline int +ptnetmap_norxslots(struct netmap_kring *kring, uint32_t g_head) +{ + return (NM_ACCESS_ONCE(kring->nr_hwtail) == nm_prev(g_head, + kring->nkr_num_slots - 1)); +} +#endif /* !BUSY_WAIT */ + +/* Handle RX events: from the guest or from the backend */ +static void +ptnetmap_rx_handler(void *data) +{ + struct netmap_kring *kring = data; + struct netmap_pt_host_adapter *pth_na = + (struct netmap_pt_host_adapter *)kring->na->na_private; + struct ptnetmap_state *ptns = pth_na->ptns; + struct ptnet_ring __user *ptring; + struct netmap_ring shadow_ring; /* shadow copy of the netmap_ring */ + struct nm_kthread *kth; + uint32_t num_slots; + int dry_cycles = 0; + bool some_recvd = false; + IFRATE(uint32_t pre_tail); + + if (unlikely(!ptns || !ptns->pth_na)) { + D("ERROR ptnetmap state %p, ptnetmap host adapter %p", ptns, + ptns ? ptns->pth_na : NULL); + return; + } + + if (unlikely(ptns->stopped)) { + RD(1, "backend netmap is being stopped"); + return; + } + + if (unlikely(nm_kr_tryget(kring, 1, NULL))) { + D("ERROR nm_kr_tryget()"); + return; + } + + /* This is a guess, to be fixed in the rate callback. */ + IFRATE(ptns->rate_ctx.new.grxk++); + + /* Get RX ptring pointer from the CSB. */ + ptring = ptns->ptrings + (pth_na->up.num_tx_rings + kring->ring_id); + kth = ptns->kthreads[pth_na->up.num_tx_rings + kring->ring_id]; + + num_slots = kring->nkr_num_slots; + shadow_ring.head = kring->rhead; + shadow_ring.cur = kring->rcur; + + /* Disable notifications. */ + ptring_kick_enable(ptring, 0); + /* Copy the guest kring pointers from the CSB */ + ptnetmap_host_read_kring_csb(ptring, &shadow_ring, num_slots); + + for (;;) { + uint32_t hwtail; + + /* Netmap prologue */ + shadow_ring.tail = kring->rtail; + if (unlikely(nm_rxsync_prologue(kring, &shadow_ring) >= num_slots)) { + /* Reinit ring and enable notifications. */ + netmap_ring_reinit(kring); + ptring_kick_enable(ptring, 1); + break; + } + + if (unlikely(netmap_verbose & NM_VERB_RXSYNC)) { + ptnetmap_kring_dump("pre rxsync", kring); + } + + IFRATE(pre_tail = kring->rtail); + if (unlikely(kring->nm_sync(kring, shadow_ring.flags))) { + /* Reenable notifications. */ + ptring_kick_enable(ptring, 1); + D("ERROR rxsync()"); + break; + } + /* + * Finalize + * Copy host hwcur and hwtail into the CSB for the guest sync() + */ + hwtail = NM_ACCESS_ONCE(kring->nr_hwtail); + ptnetmap_host_write_kring_csb(ptring, kring->nr_hwcur, hwtail); + if (kring->rtail != hwtail) { + kring->rtail = hwtail; + some_recvd = true; + dry_cycles = 0; + } else { + dry_cycles++; + } + + IFRATE(rate_batch_stats_update(&ptns->rate_ctx.new.rxbs, pre_tail, + kring->rtail, num_slots)); + + if (unlikely(netmap_verbose & NM_VERB_RXSYNC)) { + ptnetmap_kring_dump("post rxsync", kring); + } + +#ifndef BUSY_WAIT + /* Interrupt the guest if needed. 
*/ + if (some_recvd && ptring_intr_enabled(ptring)) { + /* Disable guest kick to avoid sending unnecessary kicks */ + ptring_intr_enable(ptring, 0); + nm_os_kthread_send_irq(kth); + IFRATE(ptns->rate_ctx.new.hrxk++); + some_recvd = false; + } +#endif + /* Read CSB to see if there is more work to do. */ + ptnetmap_host_read_kring_csb(ptring, &shadow_ring, num_slots); +#ifndef BUSY_WAIT + if (ptnetmap_norxslots(kring, shadow_ring.head)) { + /* + * No more slots available for reception. We enable notification and + * go to sleep, waiting for a kick from the guest when new receive + * slots are available. + */ + usleep_range(1,1); + /* Reenable notifications. */ + ptring_kick_enable(ptring, 1); + /* Doublecheck. */ + ptnetmap_host_read_kring_csb(ptring, &shadow_ring, num_slots); + if (!ptnetmap_norxslots(kring, shadow_ring.head)) { + /* We won the race condition, more slots are available. Disable + * notifications and do another cycle. */ + ptring_kick_enable(ptring, 0); + continue; + } + break; + } + + hwtail = NM_ACCESS_ONCE(kring->nr_hwtail); + if (unlikely(hwtail == kring->rhead || + dry_cycles >= PTN_RX_DRY_CYCLES_MAX)) { + /* No more packets to be read from the backend. We stop and + * wait for a notification from the backend (netmap_rx_irq). */ + ND(1, "nr_hwtail: %d rhead: %d dry_cycles: %d", + hwtail, kring->rhead, dry_cycles); + break; + } +#endif + if (unlikely(ptns->stopped)) { + D("backend netmap is being stopped"); + break; + } + } + + nm_kr_put(kring); + + /* Interrupt the guest if needed. */ + if (some_recvd && ptring_intr_enabled(ptring)) { + ptring_intr_enable(ptring, 0); + nm_os_kthread_send_irq(kth); + IFRATE(ptns->rate_ctx.new.hrxk++); + } +} + +#ifdef NETMAP_PT_DEBUG +static void +ptnetmap_print_configuration(struct ptnetmap_cfg *cfg) +{ + int k; + + D("ptnetmap configuration:"); + D(" CSB ptrings @%p, num_rings=%u, cfgtype %08x", cfg->ptrings, + cfg->num_rings, cfg->cfgtype); + for (k = 0; k < cfg->num_rings; k++) { + switch (cfg->cfgtype) { + case PTNETMAP_CFGTYPE_QEMU: { + struct ptnetmap_cfgentry_qemu *e = + (struct ptnetmap_cfgentry_qemu *)(cfg+1) + k; + D(" ring #%d: ioeventfd=%lu, irqfd=%lu", k, + (unsigned long)e->ioeventfd, + (unsigned long)e->irqfd); + break; + } + + case PTNETMAP_CFGTYPE_BHYVE: + { + struct ptnetmap_cfgentry_bhyve *e = + (struct ptnetmap_cfgentry_bhyve *)(cfg+1) + k; + D(" ring #%d: wchan=%lu, ioctl_fd=%lu, " + "ioctl_cmd=%lu, msix_msg_data=%lu, msix_addr=%lu", + k, (unsigned long)e->wchan, + (unsigned long)e->ioctl_fd, + (unsigned long)e->ioctl_cmd, + (unsigned long)e->ioctl_data.msg_data, + (unsigned long)e->ioctl_data.addr); + break; + } + } + } + +} +#endif /* NETMAP_PT_DEBUG */ + +/* Copy actual state of the host ring into the CSB for the guest init */ +static int +ptnetmap_kring_snapshot(struct netmap_kring *kring, struct ptnet_ring __user *ptring) +{ + if(CSB_WRITE(ptring, head, kring->rhead)) + goto err; + if(CSB_WRITE(ptring, cur, kring->rcur)) + goto err; + + if(CSB_WRITE(ptring, hwcur, kring->nr_hwcur)) + goto err; + if(CSB_WRITE(ptring, hwtail, NM_ACCESS_ONCE(kring->nr_hwtail))) + goto err; + + DBG(ptnetmap_kring_dump("ptnetmap_kring_snapshot", kring);) + + return 0; +err: + return EFAULT; +} + +static struct netmap_kring * +ptnetmap_kring(struct netmap_pt_host_adapter *pth_na, int k) +{ + if (k < pth_na->up.num_tx_rings) { + return pth_na->up.tx_rings + k; + } + return pth_na->up.rx_rings + k - pth_na->up.num_tx_rings; +} + +static int +ptnetmap_krings_snapshot(struct netmap_pt_host_adapter *pth_na) +{ + struct ptnetmap_state *ptns 
= pth_na->ptns; + struct netmap_kring *kring; + unsigned int num_rings; + int err = 0, k; + + num_rings = pth_na->up.num_tx_rings + + pth_na->up.num_rx_rings; + + for (k = 0; k < num_rings; k++) { + kring = ptnetmap_kring(pth_na, k); + err |= ptnetmap_kring_snapshot(kring, ptns->ptrings + k); + } + + return err; +} + +/* + * Functions to create, start and stop the kthreads + */ + +static int +ptnetmap_create_kthreads(struct netmap_pt_host_adapter *pth_na, + struct ptnetmap_cfg *cfg) +{ + struct ptnetmap_state *ptns = pth_na->ptns; + struct nm_kthread_cfg nmk_cfg; + unsigned int num_rings; + uint8_t *cfg_entries = (uint8_t *)(cfg + 1); + int k; + + num_rings = pth_na->up.num_tx_rings + + pth_na->up.num_rx_rings; + + for (k = 0; k < num_rings; k++) { + nmk_cfg.attach_user = 1; /* attach kthread to user process */ + nmk_cfg.worker_private = ptnetmap_kring(pth_na, k); + nmk_cfg.type = k; + if (k < pth_na->up.num_tx_rings) { + nmk_cfg.worker_fn = ptnetmap_tx_handler; + } else { + nmk_cfg.worker_fn = ptnetmap_rx_handler; + } + + ptns->kthreads[k] = nm_os_kthread_create(&nmk_cfg, + cfg->cfgtype, cfg_entries + k * cfg->entry_size); + if (ptns->kthreads[k] == NULL) { + goto err; + } + } + + return 0; +err: + for (k = 0; k < num_rings; k++) { + if (ptns->kthreads[k]) { + nm_os_kthread_delete(ptns->kthreads[k]); + ptns->kthreads[k] = NULL; + } + } + return EFAULT; +} + +static int +ptnetmap_start_kthreads(struct netmap_pt_host_adapter *pth_na) +{ + struct ptnetmap_state *ptns = pth_na->ptns; + int num_rings; + int error; + int k; + + if (!ptns) { + D("BUG ptns is NULL"); + return EFAULT; + } + + ptns->stopped = false; + + num_rings = ptns->pth_na->up.num_tx_rings + + ptns->pth_na->up.num_rx_rings; + for (k = 0; k < num_rings; k++) { + //nm_os_kthread_set_affinity(ptns->kthreads[k], xxx); + error = nm_os_kthread_start(ptns->kthreads[k]); + if (error) { + return error; + } + } + + return 0; +} + +static void +ptnetmap_stop_kthreads(struct netmap_pt_host_adapter *pth_na) +{ + struct ptnetmap_state *ptns = pth_na->ptns; + int num_rings; + int k; + + if (!ptns) { + /* Nothing to do. */ + return; + } + + ptns->stopped = true; + + num_rings = ptns->pth_na->up.num_tx_rings + + ptns->pth_na->up.num_rx_rings; + for (k = 0; k < num_rings; k++) { + nm_os_kthread_stop(ptns->kthreads[k]); + } +} + +static struct ptnetmap_cfg * +ptnetmap_read_cfg(struct nmreq *nmr) +{ + uintptr_t *nmr_ptncfg = (uintptr_t *)&nmr->nr_arg1; + struct ptnetmap_cfg *cfg; + struct ptnetmap_cfg tmp; + size_t cfglen; + + if (copyin((const void *)*nmr_ptncfg, &tmp, sizeof(tmp))) { + D("Partial copyin() failed"); + return NULL; + } + + cfglen = sizeof(tmp) + tmp.num_rings * tmp.entry_size; + cfg = nm_os_malloc(cfglen); + if (!cfg) { + return NULL; + } + + if (copyin((const void *)*nmr_ptncfg, cfg, cfglen)) { + D("Full copyin() failed"); + nm_os_free(cfg); + return NULL; + } + + return cfg; +} + +static int nm_unused_notify(struct netmap_kring *, int); +static int nm_pt_host_notify(struct netmap_kring *, int); + +/* Create ptnetmap state and switch parent adapter to ptnetmap mode. */ +static int +ptnetmap_create(struct netmap_pt_host_adapter *pth_na, + struct ptnetmap_cfg *cfg) +{ + struct ptnetmap_state *ptns; + unsigned int num_rings; + int ret, i; + + /* Check if ptnetmap state is already there. 
*/ + if (pth_na->ptns) { + D("ERROR adapter %p already in ptnetmap mode", pth_na->parent); + return EINVAL; + } + + num_rings = pth_na->up.num_tx_rings + pth_na->up.num_rx_rings; + + if (num_rings != cfg->num_rings) { + D("ERROR configuration mismatch, expected %u rings, found %u", + num_rings, cfg->num_rings); + return EINVAL; + } + + ptns = nm_os_malloc(sizeof(*ptns) + num_rings * sizeof(*ptns->kthreads)); + if (!ptns) { + return ENOMEM; + } + + ptns->kthreads = (struct nm_kthread **)(ptns + 1); + ptns->stopped = true; + + /* Cross-link data structures. */ + pth_na->ptns = ptns; + ptns->pth_na = pth_na; + + /* Store the CSB address provided by the hypervisor. */ + ptns->ptrings = cfg->ptrings; + + DBG(ptnetmap_print_configuration(cfg)); + + /* Create kthreads */ + if ((ret = ptnetmap_create_kthreads(pth_na, cfg))) { + D("ERROR ptnetmap_create_kthreads()"); + goto err; + } + /* Copy krings state into the CSB for the guest initialization */ + if ((ret = ptnetmap_krings_snapshot(pth_na))) { + D("ERROR ptnetmap_krings_snapshot()"); + goto err; + } + + /* Overwrite parent nm_notify krings callback. */ + pth_na->parent->na_private = pth_na; + pth_na->parent_nm_notify = pth_na->parent->nm_notify; + pth_na->parent->nm_notify = nm_unused_notify; + + for (i = 0; i < pth_na->parent->num_rx_rings; i++) { + pth_na->up.rx_rings[i].save_notify = + pth_na->up.rx_rings[i].nm_notify; + pth_na->up.rx_rings[i].nm_notify = nm_pt_host_notify; + } + for (i = 0; i < pth_na->parent->num_tx_rings; i++) { + pth_na->up.tx_rings[i].save_notify = + pth_na->up.tx_rings[i].nm_notify; + pth_na->up.tx_rings[i].nm_notify = nm_pt_host_notify; + } + +#ifdef RATE + memset(&ptns->rate_ctx, 0, sizeof(ptns->rate_ctx)); + setup_timer(&ptns->rate_ctx.timer, &rate_callback, + (unsigned long)&ptns->rate_ctx); + if (mod_timer(&ptns->rate_ctx.timer, jiffies + msecs_to_jiffies(1500))) + D("[ptn] Error: mod_timer()\n"); +#endif + + DBG(D("[%s] ptnetmap configuration DONE", pth_na->up.name)); + + return 0; + +err: + pth_na->ptns = NULL; + nm_os_free(ptns); + return ret; +} + +/* Switch parent adapter back to normal mode and destroy + * ptnetmap state. */ +static void +ptnetmap_delete(struct netmap_pt_host_adapter *pth_na) +{ + struct ptnetmap_state *ptns = pth_na->ptns; + int num_rings; + int i; + + if (!ptns) { + /* Nothing to do. */ + return; + } + + /* Restore parent adapter callbacks. */ + pth_na->parent->nm_notify = pth_na->parent_nm_notify; + pth_na->parent->na_private = NULL; + + for (i = 0; i < pth_na->parent->num_rx_rings; i++) { + pth_na->up.rx_rings[i].nm_notify = + pth_na->up.rx_rings[i].save_notify; + pth_na->up.rx_rings[i].save_notify = NULL; + } + for (i = 0; i < pth_na->parent->num_tx_rings; i++) { + pth_na->up.tx_rings[i].nm_notify = + pth_na->up.tx_rings[i].save_notify; + pth_na->up.tx_rings[i].save_notify = NULL; + } + + /* Delete kthreads. */ + num_rings = ptns->pth_na->up.num_tx_rings + + ptns->pth_na->up.num_rx_rings; + for (i = 0; i < num_rings; i++) { + nm_os_kthread_delete(ptns->kthreads[i]); + ptns->kthreads[i] = NULL; + } + + IFRATE(del_timer(&ptns->rate_ctx.timer)); + + nm_os_free(ptns); + + pth_na->ptns = NULL; + + DBG(D("[%s] ptnetmap deleted", pth_na->up.name)); +} + +/* + * Called by netmap_ioctl(). + * Operation is indicated in nmr->nr_cmd. + * + * Called without NMG_LOCK. 
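(For context on how a hypervisor reaches this entry point: after a first NIOCREGIF with NR_PTNETMAP_HOST has attached the port, it builds a ptnetmap_cfg -- the fixed header followed by num_rings entries of entry_size bytes each -- and passes the user pointer where ptnetmap_read_cfg() above expects it, overlaid on nr_arg1. A bhyve-flavoured sketch, schematic only and assuming the register ioctl is the one carrying nr_cmd here, as it does for the VALE control commands; structure and constant names are from this patch (net/netmap_virt.h):

	struct ptnetmap_cfg *cfg;
	struct nmreq nmr;
	size_t len = sizeof(*cfg) +
	    num_rings * sizeof(struct ptnetmap_cfgentry_bhyve);

	cfg = calloc(1, len);
	cfg->cfgtype = PTNETMAP_CFGTYPE_BHYVE;
	cfg->entry_size = sizeof(struct ptnetmap_cfgentry_bhyve);
	cfg->num_rings = num_rings;
	cfg->ptrings = csb;		/* the guest-shared CSB mapping */
	/* ... fill wchan/ioctl_fd/ioctl_cmd/msix fields per ring ... */

	memset(&nmr, 0, sizeof(nmr));
	strlcpy(nmr.nr_name, "vale0:0", sizeof(nmr.nr_name));
	nmr.nr_version = NETMAP_API;
	nmr.nr_cmd = NETMAP_PT_HOST_CREATE;
	*(uintptr_t *)&nmr.nr_arg1 = (uintptr_t)cfg;
	if (ioctl(nmfd, NIOCREGIF, &nmr) == -1)
		err(1, "NETMAP_PT_HOST_CREATE");
)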
+ */ +int +ptnetmap_ctl(struct nmreq *nmr, struct netmap_adapter *na) +{ + struct netmap_pt_host_adapter *pth_na; + struct ptnetmap_cfg *cfg; + char *name; + int cmd, error = 0; + + name = nmr->nr_name; + cmd = nmr->nr_cmd; + + DBG(D("name: %s", name)); + + if (!nm_ptnetmap_host_on(na)) { + D("ERROR Netmap adapter %p is not a ptnetmap host adapter", na); + error = ENXIO; + goto done; + } + pth_na = (struct netmap_pt_host_adapter *)na; + + NMG_LOCK(); + switch (cmd) { + case NETMAP_PT_HOST_CREATE: + /* Read hypervisor configuration from userspace. */ + cfg = ptnetmap_read_cfg(nmr); + if (!cfg) + break; + /* Create ptnetmap state (kthreads, ...) and switch parent + * adapter to ptnetmap mode. */ + error = ptnetmap_create(pth_na, cfg); + nm_os_free(cfg); + if (error) + break; + /* Start kthreads. */ + error = ptnetmap_start_kthreads(pth_na); + if (error) + ptnetmap_delete(pth_na); + break; + + case NETMAP_PT_HOST_DELETE: + /* Stop kthreads. */ + ptnetmap_stop_kthreads(pth_na); + /* Switch parent adapter back to normal mode and destroy + * ptnetmap state (kthreads, ...). */ + ptnetmap_delete(pth_na); + break; + + default: + D("ERROR invalid cmd (nmr->nr_cmd) (0x%x)", cmd); + error = EINVAL; + break; + } + NMG_UNLOCK(); + +done: + return error; +} + +/* nm_notify callbacks for ptnetmap */ +static int +nm_pt_host_notify(struct netmap_kring *kring, int flags) +{ + struct netmap_adapter *na = kring->na; + struct netmap_pt_host_adapter *pth_na = + (struct netmap_pt_host_adapter *)na->na_private; + struct ptnetmap_state *ptns; + int k; + + /* First check that the passthrough port is not being destroyed. */ + if (unlikely(!pth_na)) { + return NM_IRQ_COMPLETED; + } + + ptns = pth_na->ptns; + if (unlikely(!ptns || ptns->stopped)) { + return NM_IRQ_COMPLETED; + } + + k = kring->ring_id; + + /* Notify kthreads (wake up if needed) */ + if (kring->tx == NR_TX) { + ND(1, "TX backend irq"); + IFRATE(ptns->rate_ctx.new.btxwu++); + } else { + k += pth_na->up.num_tx_rings; + ND(1, "RX backend irq"); + IFRATE(ptns->rate_ctx.new.brxwu++); + } + nm_os_kthread_wakeup_worker(ptns->kthreads[k]); + + return NM_IRQ_COMPLETED; +} + +static int +nm_unused_notify(struct netmap_kring *kring, int flags) +{ + D("BUG this should never be called"); + return ENXIO; +} + +/* nm_config callback for bwrap */ +static int +nm_pt_host_config(struct netmap_adapter *na, u_int *txr, u_int *txd, + u_int *rxr, u_int *rxd) +{ + struct netmap_pt_host_adapter *pth_na = + (struct netmap_pt_host_adapter *)na; + struct netmap_adapter *parent = pth_na->parent; + int error; + + //XXX: maybe calling parent->nm_config is better + + /* forward the request */ + error = netmap_update_config(parent); + + *rxr = na->num_rx_rings = parent->num_rx_rings; + *txr = na->num_tx_rings = parent->num_tx_rings; + *txd = na->num_tx_desc = parent->num_tx_desc; + *rxd = na->num_rx_desc = parent->num_rx_desc; + + DBG(D("rxr: %d txr: %d txd: %d rxd: %d", *rxr, *txr, *txd, *rxd)); + + return error; +} + +/* nm_krings_create callback for ptnetmap */ +static int +nm_pt_host_krings_create(struct netmap_adapter *na) +{ + struct netmap_pt_host_adapter *pth_na = + (struct netmap_pt_host_adapter *)na; + struct netmap_adapter *parent = pth_na->parent; + enum txrx t; + int error; + + DBG(D("%s", pth_na->up.name)); + + /* create the parent krings */ + error = parent->nm_krings_create(parent); + if (error) { + return error; + } + + /* A ptnetmap host adapter points the very same krings + * as its parent adapter. These pointer are used in the + * TX/RX worker functions. 
*/ + na->tx_rings = parent->tx_rings; + na->rx_rings = parent->rx_rings; + na->tailroom = parent->tailroom; + + for_rx_tx(t) { + struct netmap_kring *kring; + + /* Parent's kring_create function will initialize + * its own na->si. We have to init our na->si here. */ + nm_os_selinfo_init(&na->si[t]); + + /* Force the mem_rings_create() method to create the + * host rings independently on what the regif asked for: + * these rings are needed by the guest ptnetmap adapter + * anyway. */ + kring = &NMR(na, t)[nma_get_nrings(na, t)]; + kring->nr_kflags |= NKR_NEEDRING; + } + + return 0; +} + +/* nm_krings_delete callback for ptnetmap */ +static void +nm_pt_host_krings_delete(struct netmap_adapter *na) +{ + struct netmap_pt_host_adapter *pth_na = + (struct netmap_pt_host_adapter *)na; + struct netmap_adapter *parent = pth_na->parent; + + DBG(D("%s", pth_na->up.name)); + + parent->nm_krings_delete(parent); + + na->tx_rings = na->rx_rings = na->tailroom = NULL; +} + +/* nm_register callback */ +static int +nm_pt_host_register(struct netmap_adapter *na, int onoff) +{ + struct netmap_pt_host_adapter *pth_na = + (struct netmap_pt_host_adapter *)na; + struct netmap_adapter *parent = pth_na->parent; + int error; + DBG(D("%s onoff %d", pth_na->up.name, onoff)); + + if (onoff) { + /* netmap_do_regif has been called on the ptnetmap na. + * We need to pass the information about the + * memory allocator to the parent before + * putting it in netmap mode + */ + parent->na_lut = na->na_lut; + } + + /* forward the request to the parent */ + error = parent->nm_register(parent, onoff); + if (error) + return error; + + + if (onoff) { + na->na_flags |= NAF_NETMAP_ON | NAF_PTNETMAP_HOST; + } else { + ptnetmap_delete(pth_na); + na->na_flags &= ~(NAF_NETMAP_ON | NAF_PTNETMAP_HOST); + } + + return 0; +} + +/* nm_dtor callback */ +static void +nm_pt_host_dtor(struct netmap_adapter *na) +{ + struct netmap_pt_host_adapter *pth_na = + (struct netmap_pt_host_adapter *)na; + struct netmap_adapter *parent = pth_na->parent; + + DBG(D("%s", pth_na->up.name)); + + /* The equivalent of NETMAP_PT_HOST_DELETE if the hypervisor + * didn't do it. */ + ptnetmap_stop_kthreads(pth_na); + ptnetmap_delete(pth_na); + + parent->na_flags &= ~NAF_BUSY; + + netmap_adapter_put(pth_na->parent); + pth_na->parent = NULL; +} + +/* check if nmr is a request for a ptnetmap adapter that we can satisfy */ +int +netmap_get_pt_host_na(struct nmreq *nmr, struct netmap_adapter **na, int create) +{ + struct nmreq parent_nmr; + struct netmap_adapter *parent; /* target adapter */ + struct netmap_pt_host_adapter *pth_na; + struct ifnet *ifp = NULL; + int error; + + /* Check if it is a request for a ptnetmap adapter */ + if ((nmr->nr_flags & (NR_PTNETMAP_HOST)) == 0) { + return 0; + } + + D("Requesting a ptnetmap host adapter"); + + pth_na = nm_os_malloc(sizeof(*pth_na)); + if (pth_na == NULL) { + D("ERROR malloc"); + return ENOMEM; + } + + /* first, try to find the adapter that we want to passthrough + * We use the same nmr, after we have turned off the ptnetmap flag. + * In this way we can potentially passthrough everything netmap understands. 
+	 */
+	memcpy(&parent_nmr, nmr, sizeof(parent_nmr));
+	parent_nmr.nr_flags &= ~(NR_PTNETMAP_HOST);
+	error = netmap_get_na(&parent_nmr, &parent, &ifp, create);
+	if (error) {
+		D("parent lookup failed: %d", error);
+		goto put_out_noputparent;
+	}
+	DBG(D("found parent: %s", parent->name));
+
+	/* make sure the interface is not already in use */
+	if (NETMAP_OWNED_BY_ANY(parent)) {
+		D("NIC %s busy, cannot ptnetmap", parent->name);
+		error = EBUSY;
+		goto put_out;
+	}
+
+	pth_na->parent = parent;
+
+	/* Follow netmap_attach()-like operations for the host
+	 * ptnetmap adapter. */
+
+	//XXX pth_na->up.na_flags = parent->na_flags;
+	pth_na->up.num_rx_rings = parent->num_rx_rings;
+	pth_na->up.num_tx_rings = parent->num_tx_rings;
+	pth_na->up.num_tx_desc = parent->num_tx_desc;
+	pth_na->up.num_rx_desc = parent->num_rx_desc;
+
+	pth_na->up.nm_dtor = nm_pt_host_dtor;
+	pth_na->up.nm_register = nm_pt_host_register;
+
+	/* Reuse parent's adapter txsync and rxsync methods. */
+	pth_na->up.nm_txsync = parent->nm_txsync;
+	pth_na->up.nm_rxsync = parent->nm_rxsync;
+
+	pth_na->up.nm_krings_create = nm_pt_host_krings_create;
+	pth_na->up.nm_krings_delete = nm_pt_host_krings_delete;
+	pth_na->up.nm_config = nm_pt_host_config;
+
+	/* Set the notify method only for convenience: it will never
+	 * be used, since - differently from the default krings_create - our
+	 * ptnetmap krings_create callback inits kring->nm_notify
+	 * directly. */
+	pth_na->up.nm_notify = nm_unused_notify;
+
+	pth_na->up.nm_mem = netmap_mem_get(parent->nm_mem);
+
+	pth_na->up.na_flags |= NAF_HOST_RINGS;
+
+	error = netmap_attach_common(&pth_na->up);
+	if (error) {
+		D("ERROR netmap_attach_common()");
+		goto put_out;
+	}
+
+	*na = &pth_na->up;
+	netmap_adapter_get(*na);
+
+	/* set parent busy, because attached for ptnetmap */
+	parent->na_flags |= NAF_BUSY;
+
+	strncpy(pth_na->up.name, parent->name, sizeof(pth_na->up.name));
+	strcat(pth_na->up.name, "-PTN");
+
+	DBG(D("%s ptnetmap request DONE", pth_na->up.name));
+
+	/* drop the reference to the ifp, if any */
+	if (ifp)
+		if_rele(ifp);
+
+	return 0;
+
+put_out:
+	netmap_adapter_put(parent);
+	if (ifp)
+		if_rele(ifp);
+put_out_noputparent:
+	nm_os_free(pth_na);
+	return error;
+}
+#endif /* WITH_PTNETMAP_HOST */
+
+#ifdef WITH_PTNETMAP_GUEST
+/*
+ * Guest ptnetmap txsync()/rxsync() routines, used in ptnet device drivers.
+ * These routines are reused across the different operating systems supported
+ * by netmap.
+ */
+
+/*
+ * Reconcile host and guest views of the transmit ring.
+ *
+ * The guest user wants to transmit packets up to the one before ring->head,
+ * and the guest kernel knows tx_ring->hwcur is the first packet not yet sent
+ * by the host kernel.
+ *
+ * We push out as many packets as possible, and possibly
+ * reclaim buffers from previously completed transmissions.
+ *
+ * Notifications from the host are enabled only if the guest user would
+ * block (no space in the ring).
+ */
+bool
+netmap_pt_guest_txsync(struct ptnet_ring *ptring, struct netmap_kring *kring,
+		       int flags)
+{
+	bool notify = false;
+
+	/* Disable notifications */
+	ptring->guest_need_kick = 0;
+
+	/*
+	 * First part: tell the host (updating the CSB) to process the new
+	 * packets.
+	 */
+	kring->nr_hwcur = ptring->hwcur;
+	ptnetmap_guest_write_kring_csb(ptring, kring->rcur, kring->rhead);
+
+	/* Ask for a kick from the guest to the host if needed.
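+	 * (The kick is a write to a ptnet device register, i.e. a VM exit,
+	 * so it is only issued when host_need_kick shows that the host
+	 * kthread is asleep rather than already processing this ring.)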
+	 */
+	if ((kring->rhead != kring->nr_hwcur &&
+	     NM_ACCESS_ONCE(ptring->host_need_kick)) ||
+	    (flags & NAF_FORCE_RECLAIM)) {
+		ptring->sync_flags = flags;
+		notify = true;
+	}
+
+	/*
+	 * Second part: reclaim buffers for completed transmissions.
+	 */
+	if (nm_kr_txempty(kring) || (flags & NAF_FORCE_RECLAIM)) {
+		ptnetmap_guest_read_kring_csb(ptring, kring);
+	}
+
+	/*
+	 * No more room in the ring for new transmissions. The user thread will
+	 * go to sleep and we need to be notified by the host when more free
+	 * space is available.
+	 */
+	if (nm_kr_txempty(kring)) {
+		/* Reenable notifications. */
+		ptring->guest_need_kick = 1;
+		/* Double check */
+		ptnetmap_guest_read_kring_csb(ptring, kring);
+		/* If there is new free space, disable notifications */
+		if (unlikely(!nm_kr_txempty(kring))) {
+			ptring->guest_need_kick = 0;
+		}
+	}
+
+	ND(1, "TX - CSB: head:%u cur:%u hwtail:%u - KRING: head:%u cur:%u tail: %u",
+	   ptring->head, ptring->cur, ptring->hwtail,
+	   kring->rhead, kring->rcur, kring->nr_hwtail);
+
+	return notify;
+}
+
+/*
+ * Reconcile host and guest views of the receive ring.
+ *
+ * Update hwcur/hwtail from the host (reading from the CSB).
+ *
+ * If the guest user has released buffers up to the one before ring->head, we
+ * also give them to the host.
+ *
+ * Notifications from the host are enabled only if the guest user would
+ * block (no more completed slots in the ring).
+ */
+bool
+netmap_pt_guest_rxsync(struct ptnet_ring *ptring, struct netmap_kring *kring,
+		       int flags)
+{
+	bool notify = false;
+
+	/* Disable notifications */
+	ptring->guest_need_kick = 0;
+
+	/*
+	 * First part: import newly received packets, by updating the kring
+	 * hwtail to the hwtail known from the host (read from the CSB).
+	 * This also updates the kring hwcur.
+	 */
+	ptnetmap_guest_read_kring_csb(ptring, kring);
+	kring->nr_kflags &= ~NKR_PENDINTR;
+
+	/*
+	 * Second part: tell the host about the slots that the guest user has
+	 * released, by updating cur and head in the CSB.
+	 */
+	if (kring->rhead != kring->nr_hwcur) {
+		ptnetmap_guest_write_kring_csb(ptring, kring->rcur,
+					       kring->rhead);
+		/* Ask for a kick from the guest to the host if needed. */
+		if (NM_ACCESS_ONCE(ptring->host_need_kick)) {
+			ptring->sync_flags = flags;
+			notify = true;
+		}
+	}
+
+	/*
+	 * No more completed RX slots. The user thread will go to sleep and
+	 * we need to be notified by the host when more RX slots have been
+	 * completed.
+	 */
+	if (nm_kr_rxempty(kring)) {
+		/* Reenable notifications. */
+		ptring->guest_need_kick = 1;
+		/* Double check */
+		ptnetmap_guest_read_kring_csb(ptring, kring);
+		/* If there are new slots, disable notifications. */
+		if (!nm_kr_rxempty(kring)) {
+			ptring->guest_need_kick = 0;
+		}
+	}
+
+	ND(1, "RX - CSB: head:%u cur:%u hwtail:%u - KRING: head:%u cur:%u",
+	   ptring->head, ptring->cur, ptring->hwtail,
+	   kring->rhead, kring->rcur);
+
+	return notify;
+}
+
+/*
+ * Callbacks for ptnet drivers: nm_krings_create, nm_krings_delete, nm_dtor.
+ */
+int
+ptnet_nm_krings_create(struct netmap_adapter *na)
+{
+	struct netmap_pt_guest_adapter *ptna =
+		(struct netmap_pt_guest_adapter *)na; /* Upcast. */
+	struct netmap_adapter *na_nm = &ptna->hwup.up;
+	struct netmap_adapter *na_dr = &ptna->dr.up;
+	int ret;
+
+	if (ptna->backend_regifs) {
+		return 0;
+	}
+
+	/* Create krings on the public netmap adapter. */
+	ret = netmap_hw_krings_create(na_nm);
+	if (ret) {
+		return ret;
+	}
+
+	/* Copy krings into the netmap adapter private to the driver.
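+	 * (Both adapters must see the same rings: hwup serves netmap
+	 * applications running in the guest, while dr carries the guest
+	 * network stack traffic through the same ptnet backend.)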
*/ + na_dr->tx_rings = na_nm->tx_rings; + na_dr->rx_rings = na_nm->rx_rings; + + return 0; +} + +void +ptnet_nm_krings_delete(struct netmap_adapter *na) +{ + struct netmap_pt_guest_adapter *ptna = + (struct netmap_pt_guest_adapter *)na; /* Upcast. */ + struct netmap_adapter *na_nm = &ptna->hwup.up; + struct netmap_adapter *na_dr = &ptna->dr.up; + + if (ptna->backend_regifs) { + return; + } + + na_dr->tx_rings = NULL; + na_dr->rx_rings = NULL; + + netmap_hw_krings_delete(na_nm); +} + +void +ptnet_nm_dtor(struct netmap_adapter *na) +{ + struct netmap_pt_guest_adapter *ptna = + (struct netmap_pt_guest_adapter *)na; + + netmap_mem_put(ptna->dr.up.nm_mem); // XXX is this needed? + memset(&ptna->dr, 0, sizeof(ptna->dr)); + netmap_mem_pt_guest_ifp_del(na->nm_mem, na->ifp); +} + +#endif /* WITH_PTNETMAP_GUEST */ diff -u -r -N usr/src/sys/dev/netmap/netmap_vale.c /usr/src/sys/dev/netmap/netmap_vale.c --- usr/src/sys/dev/netmap/netmap_vale.c 2016-09-29 00:24:47.000000000 +0100 +++ /usr/src/sys/dev/netmap/netmap_vale.c 2016-11-23 16:57:57.856042000 +0000 @@ -1,5 +1,6 @@ /* - * Copyright (C) 2013-2014 Universita` di Pisa. All rights reserved. + * Copyright (C) 2013-2016 Universita` di Pisa + * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions @@ -57,7 +58,7 @@ #if defined(__FreeBSD__) #include <sys/cdefs.h> /* prerequisite */ -__FBSDID("$FreeBSD: releng/11.0/sys/dev/netmap/netmap_vale.c 285698 2015-07-19 18:06:30Z luigi $"); +__FBSDID("$FreeBSD: head/sys/dev/netmap/netmap.c 257176 2013-10-26 17:58:36Z glebius $"); #include <sys/types.h> #include <sys/errno.h> @@ -101,6 +102,9 @@ #warning OSX support is only partial #include "osx_glue.h" +#elif defined(_WIN32) +#include "win_glue.h" + #else #error Unsupported platform @@ -119,7 +123,7 @@ /* * system parameters (most of them in netmap_kern.h) - * NM_NAME prefix for switch port names, default "vale" + * NM_BDG_NAME prefix for switch port names, default "vale" * NM_BDG_MAXPORTS number of ports * NM_BRIDGES max number of switches in the system. * XXX should become a sysctl or tunable @@ -144,7 +148,6 @@ #define NM_BDG_BATCH_MAX (NM_BDG_BATCH + NM_MULTISEG) /* NM_FT_NULL terminates a list of slots in the ft */ #define NM_FT_NULL NM_BDG_BATCH_MAX -#define NM_BRIDGES 8 /* number of bridges */ /* @@ -152,14 +155,15 @@ * used in the bridge. The actual value may be larger as the * last packet in the block may overflow the size. */ -int bridge_batch = NM_BDG_BATCH; /* bridge batch size */ +static int bridge_batch = NM_BDG_BATCH; /* bridge batch size */ +SYSBEGIN(vars_vale); SYSCTL_DECL(_dev_netmap); SYSCTL_INT(_dev_netmap, OID_AUTO, bridge_batch, CTLFLAG_RW, &bridge_batch, 0 , ""); - +SYSEND; static int netmap_vp_create(struct nmreq *, struct ifnet *, struct netmap_vp_adapter **); static int netmap_vp_reg(struct netmap_adapter *na, int onoff); -static int netmap_bwrap_register(struct netmap_adapter *, int onoff); +static int netmap_bwrap_reg(struct netmap_adapter *, int onoff); /* * For each output interface, nm_bdg_q is used to construct a list. @@ -213,7 +217,7 @@ * forward this packet. ring_nr is the source ring index, and the * function may overwrite this value to forward this packet to a * different ring index. - * This function must be set by netmap_bdgctl(). + * This function must be set by netmap_bdg_ctl(). */ struct netmap_bdg_ops bdg_ops; @@ -244,7 +248,7 @@ * Right now we have a static array and deletions are protected * by an exclusive lock. 
*/ -struct nm_bridge *nm_bridges; +static struct nm_bridge *nm_bridges; #endif /* !CONFIG_NET_NS */ @@ -278,6 +282,45 @@ } +static int +nm_is_id_char(const char c) +{ + return (c >= 'a' && c <= 'z') || + (c >= 'A' && c <= 'Z') || + (c >= '0' && c <= '9') || + (c == '_'); +} + +/* Validate the name of a VALE bridge port and return the + * position of the ":" character. */ +static int +nm_vale_name_validate(const char *name) +{ + int colon_pos = -1; + int i; + + if (!name || strlen(name) < strlen(NM_BDG_NAME)) { + return -1; + } + + for (i = 0; name[i]; i++) { + if (name[i] == ':') { + if (colon_pos != -1) { + return -1; + } + colon_pos = i; + } else if (!nm_is_id_char(name[i])) { + return -1; + } + } + + if (i >= IFNAMSIZ) { + return -1; + } + + return colon_pos; +} + /* * locate a bridge among the existing ones. * MUST BE CALLED WITH NMG_LOCK() @@ -288,7 +331,7 @@ static struct nm_bridge * nm_find_bridge(const char *name, int create) { - int i, l, namelen; + int i, namelen; struct nm_bridge *b = NULL, *bridges; u_int num_bridges; @@ -296,21 +339,11 @@ netmap_bns_getbridges(&bridges, &num_bridges); - namelen = strlen(NM_NAME); /* base length */ - l = name ? strlen(name) : 0; /* actual length */ - if (l < namelen) { + namelen = nm_vale_name_validate(name); + if (namelen < 0) { D("invalid bridge name %s", name ? name : NULL); return NULL; } - for (i = namelen + 1; i < l; i++) { - if (name[i] == ':') { - namelen = i; - break; - } - } - if (namelen >= IFNAMSIZ) - namelen = IFNAMSIZ; - ND("--- prefix is '%.*s' ---", namelen, name); /* lookup the name, remember empty slot if there is one */ for (i = 0; i < num_bridges; i++) { @@ -360,7 +393,7 @@ kring = na->tx_rings; for (i = 0; i < nrings; i++) { if (kring[i].nkr_ft) { - free(kring[i].nkr_ft, M_DEVBUF); + nm_os_free(kring[i].nkr_ft); kring[i].nkr_ft = NULL; /* protect from freeing twice */ } } @@ -390,7 +423,7 @@ struct nm_bdg_q *dstq; int j; - ft = malloc(l, M_DEVBUF, M_NOWAIT | M_ZERO); + ft = nm_os_malloc(l); if (!ft) { nm_free_bdgfwd(na); return ENOMEM; @@ -479,6 +512,7 @@ struct netmap_vp_adapter *vpna = (struct netmap_vp_adapter *)na; struct nm_bridge *b = vpna->na_bdg; + (void)nmr; // XXX merge ? if (attach) return 0; /* nothing to do */ if (b) { @@ -518,7 +552,7 @@ return ENXIO; NMG_LOCK(); /* make sure this is actually a VALE port */ - if (!NETMAP_CAPABLE(ifp) || NA(ifp)->nm_register != netmap_vp_reg) { + if (!NM_NA_VALID(ifp) || NA(ifp)->nm_register != netmap_vp_reg) { error = EINVAL; goto err; } @@ -535,7 +569,7 @@ */ if_rele(ifp); netmap_detach(ifp); - nm_vi_detach(ifp); + nm_os_vi_detach(ifp); return 0; err: @@ -544,6 +578,16 @@ return error; } +static int +nm_update_info(struct nmreq *nmr, struct netmap_adapter *na) +{ + nmr->nr_rx_rings = na->num_rx_rings; + nmr->nr_tx_rings = na->num_tx_rings; + nmr->nr_rx_slots = na->num_rx_desc; + nmr->nr_tx_slots = na->num_tx_desc; + return netmap_mem_get_info(na->nm_mem, &nmr->nr_memsize, NULL, &nmr->nr_arg2); +} + /* * Create a virtual interface registered to the system. * The interface will be attached to a bridge later. 
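/*
 * As a quick illustration of the path above: a minimal userspace sketch
 * that creates a persistent port through the stock nmreq API (essentially
 * what vale-ctl -n does); "myport"-style names are placeholders and, as
 * checked above, must not carry the VALE prefix. Error handling trimmed.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <net/netmap.h>

static int
create_persistent_port(const char *name)
{
	struct nmreq nmr;
	int fd, ret;

	fd = open("/dev/netmap", O_RDWR);
	if (fd < 0)
		return -1;
	memset(&nmr, 0, sizeof(nmr));
	nmr.nr_version = NETMAP_API;
	nmr.nr_cmd = NETMAP_BDG_NEWIF;	/* served by nm_vi_create() */
	strncpy(nmr.nr_name, name, sizeof(nmr.nr_name) - 1);
	ret = ioctl(fd, NIOCREGIF, &nmr);
	if (ret == 0)	/* nm_update_info() filled in the allocator info */
		printf("created %s, memsize %u, nr_arg2 %u\n",
		    nmr.nr_name, nmr.nr_memsize, nmr.nr_arg2);
	close(fd);
	return ret;
}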
@@ -556,14 +600,22 @@ int error; /* don't include VALE prefix */ - if (!strncmp(nmr->nr_name, NM_NAME, strlen(NM_NAME))) + if (!strncmp(nmr->nr_name, NM_BDG_NAME, strlen(NM_BDG_NAME))) return EINVAL; ifp = ifunit_ref(nmr->nr_name); if (ifp) { /* already exist, cannot create new one */ + error = EEXIST; + NMG_LOCK(); + if (NM_NA_VALID(ifp)) { + int update_err = nm_update_info(nmr, NA(ifp)); + if (update_err) + error = update_err; + } + NMG_UNLOCK(); if_rele(ifp); - return EEXIST; + return error; } - error = nm_vi_persist(nmr->nr_name, &ifp); + error = nm_os_vi_persist(nmr->nr_name, &ifp); if (error) return error; @@ -572,15 +624,29 @@ error = netmap_vp_create(nmr, ifp, &vpna); if (error) { D("error %d", error); - nm_vi_detach(ifp); - return error; + goto err_1; } /* persist-specific routines */ vpna->up.nm_bdg_ctl = netmap_vp_bdg_ctl; netmap_adapter_get(&vpna->up); + NM_ATTACH_NA(ifp, &vpna->up); + /* return the updated info */ + error = nm_update_info(nmr, &vpna->up); + if (error) { + goto err_2; + } + D("returning nr_arg2 %d", nmr->nr_arg2); NMG_UNLOCK(); D("created %s", ifp->if_xname); return 0; + +err_2: + netmap_detach(ifp); +err_1: + NMG_UNLOCK(); + nm_os_vi_detach(ifp); + + return error; } /* Try to get a reference to a netmap adapter attached to a VALE switch. @@ -608,7 +674,7 @@ /* first try to see if this is a bridge port. */ NMG_LOCK_ASSERT(); - if (strncmp(nr_name, NM_NAME, sizeof(NM_NAME) - 1)) { + if (strncmp(nr_name, NM_BDG_NAME, sizeof(NM_BDG_NAME) - 1)) { return 0; /* no error, but no VALE prefix */ } @@ -674,7 +740,7 @@ error = netmap_vp_create(nmr, NULL, &vpna); if (error) { D("error %d", error); - free(ifp, M_DEVBUF); + nm_os_free(ifp); return error; } /* shortcut - we can skip get_hw_na(), @@ -693,7 +759,6 @@ goto out; vpna = hw->na_vp; hostna = hw->na_hostvp; - if_rele(ifp); if (nmr->nr_arg1 != NETMAP_BDG_HOST) hostna = NULL; } @@ -768,6 +833,11 @@ return error; } +static inline int +nm_is_bwrap(struct netmap_adapter *na) +{ + return na->nm_register == netmap_bwrap_reg; +} /* process NETMAP_BDG_DETACH */ static int @@ -785,8 +855,13 @@ if (na == NULL) { /* VALE prefix missing */ error = EINVAL; goto unlock_exit; + } else if (nm_is_bwrap(na) && + ((struct netmap_bwrap_adapter *)na)->na_polling_state) { + /* Don't detach a NIC with polling */ + error = EBUSY; + netmap_adapter_put(na); + goto unlock_exit; } - if (na->nm_bdg_ctl) { /* remove the port from bridge. 
The bwrap
	 * also needs to put the hwna in normal mode
@@ -801,6 +876,266 @@
 }
 
+struct nm_bdg_polling_state;
+struct nm_bdg_kthread {
+	struct nm_kthread *nmk;
+	u_int qfirst;
+	u_int qlast;
+	struct nm_bdg_polling_state *bps;
+};
+
+struct nm_bdg_polling_state {
+	bool configured;
+	bool stopped;
+	struct netmap_bwrap_adapter *bna;
+	u_int reg;
+	u_int qfirst;
+	u_int qlast;
+	u_int cpu_from;
+	u_int ncpus;
+	struct nm_bdg_kthread *kthreads;
+};
+
+static void
+netmap_bwrap_polling(void *data)
+{
+	struct nm_bdg_kthread *nbk = data;
+	struct netmap_bwrap_adapter *bna;
+	u_int qfirst, qlast, i;
+	struct netmap_kring *kring0, *kring;
+
+	if (!nbk)
+		return;
+	qfirst = nbk->qfirst;
+	qlast = nbk->qlast;
+	bna = nbk->bps->bna;
+	kring0 = NMR(bna->hwna, NR_RX);
+
+	for (i = qfirst; i < qlast; i++) {
+		kring = kring0 + i;
+		kring->nm_notify(kring, 0);
+	}
+}
+
+static int
+nm_bdg_create_kthreads(struct nm_bdg_polling_state *bps)
+{
+	struct nm_kthread_cfg kcfg;
+	int i, j;
+
+	bps->kthreads = nm_os_malloc(sizeof(struct nm_bdg_kthread) * bps->ncpus);
+	if (bps->kthreads == NULL)
+		return ENOMEM;
+
+	bzero(&kcfg, sizeof(kcfg));
+	kcfg.worker_fn = netmap_bwrap_polling;
+	for (i = 0; i < bps->ncpus; i++) {
+		struct nm_bdg_kthread *t = bps->kthreads + i;
+		int all = (bps->ncpus == 1 && bps->reg == NR_REG_ALL_NIC);
+		int affinity = bps->cpu_from + i;
+
+		t->bps = bps;
+		t->qfirst = all ? bps->qfirst /* must be 0 */ : affinity;
+		t->qlast = all ? bps->qlast : t->qfirst + 1;
+		D("kthread %d a:%u qf:%u ql:%u", i, affinity, t->qfirst,
+			t->qlast);
+
+		kcfg.type = i;
+		kcfg.worker_private = t;
+		t->nmk = nm_os_kthread_create(&kcfg, 0, NULL);
+		if (t->nmk == NULL) {
+			goto cleanup;
+		}
+		nm_os_kthread_set_affinity(t->nmk, affinity);
+	}
+	return 0;
+
+cleanup:
+	for (j = 0; j < i; j++) {
+		struct nm_bdg_kthread *t = bps->kthreads + j;
+		nm_os_kthread_delete(t->nmk);
+	}
+	nm_os_free(bps->kthreads);
+	return EFAULT;
+}
+
+/* a version of ptnetmap_start_kthreads() */
+static int
+nm_bdg_polling_start_kthreads(struct nm_bdg_polling_state *bps)
+{
+	int error, i, j;
+
+	if (!bps) {
+		D("polling is not configured");
+		return EFAULT;
+	}
+	bps->stopped = false;
+
+	for (i = 0; i < bps->ncpus; i++) {
+		struct nm_bdg_kthread *t = bps->kthreads + i;
+		error = nm_os_kthread_start(t->nmk);
+		if (error) {
+			D("error in nm_kthread_start()");
+			goto cleanup;
+		}
+	}
+	return 0;
+
+cleanup:
+	for (j = 0; j < i; j++) {
+		struct nm_bdg_kthread *t = bps->kthreads + j;
+		nm_os_kthread_stop(t->nmk);
+	}
+	bps->stopped = true;
+	return error;
+}
+
+static void
+nm_bdg_polling_stop_delete_kthreads(struct nm_bdg_polling_state *bps)
+{
+	int i;
+
+	if (!bps)
+		return;
+
+	for (i = 0; i < bps->ncpus; i++) {
+		struct nm_bdg_kthread *t = bps->kthreads + i;
+		nm_os_kthread_stop(t->nmk);
+		nm_os_kthread_delete(t->nmk);
+	}
+	bps->stopped = true;
+}
+
+static int
+get_polling_cfg(struct nmreq *nmr, struct netmap_adapter *na,
+	struct nm_bdg_polling_state *bps)
+{
+	int req_cpus, avail_cpus, core_from;
+	u_int reg, i, qfirst, qlast;
+
+	avail_cpus = nm_os_ncpus();
+	req_cpus = nmr->nr_arg1;
+
+	if (req_cpus == 0) {
+		D("req_cpus must be > 0");
+		return EINVAL;
+	} else if (req_cpus >= avail_cpus) {
+		D("for safety, we need at least one core left in the system");
+		return EINVAL;
+	}
+	reg = nmr->nr_flags & NR_REG_MASK;
+	i = nmr->nr_ringid & NETMAP_RING_MASK;
+	/*
+	 * ONE_NIC: dedicate one core to one ring. If multiple cores
+	 * are specified, consecutive rings are also polled.
+ * For example, if ringid=2 and 2 cores are given, + * ring 2 and 3 are polled by core 2 and 3, respectively. + * ALL_NIC: poll all the rings using a core specified by ringid. + * the number of cores must be 1. + */ + if (reg == NR_REG_ONE_NIC) { + if (i + req_cpus > nma_get_nrings(na, NR_RX)) { + D("only %d rings exist (ring %u-%u is given)", + nma_get_nrings(na, NR_RX), i, i+req_cpus); + return EINVAL; + } + qfirst = i; + qlast = qfirst + req_cpus; + core_from = qfirst; + } else if (reg == NR_REG_ALL_NIC) { + if (req_cpus != 1) { + D("ncpus must be 1 not %d for REG_ALL_NIC", req_cpus); + return EINVAL; + } + qfirst = 0; + qlast = nma_get_nrings(na, NR_RX); + core_from = i; + } else { + D("reg must be ALL_NIC or ONE_NIC"); + return EINVAL; + } + + bps->reg = reg; + bps->qfirst = qfirst; + bps->qlast = qlast; + bps->cpu_from = core_from; + bps->ncpus = req_cpus; + D("%s qfirst %u qlast %u cpu_from %u ncpus %u", + reg == NR_REG_ALL_NIC ? "REG_ALL_NIC" : "REG_ONE_NIC", + qfirst, qlast, core_from, req_cpus); + return 0; +} + +static int +nm_bdg_ctl_polling_start(struct nmreq *nmr, struct netmap_adapter *na) +{ + struct nm_bdg_polling_state *bps; + struct netmap_bwrap_adapter *bna; + int error; + + bna = (struct netmap_bwrap_adapter *)na; + if (bna->na_polling_state) { + D("ERROR adapter already in polling mode"); + return EFAULT; + } + + bps = nm_os_malloc(sizeof(*bps)); + if (!bps) + return ENOMEM; + bps->configured = false; + bps->stopped = true; + + if (get_polling_cfg(nmr, na, bps)) { + nm_os_free(bps); + return EINVAL; + } + + if (nm_bdg_create_kthreads(bps)) { + nm_os_free(bps); + return EFAULT; + } + + bps->configured = true; + bna->na_polling_state = bps; + bps->bna = bna; + + /* disable interrupt if possible */ + if (bna->hwna->nm_intr) + bna->hwna->nm_intr(bna->hwna, 0); + /* start kthread now */ + error = nm_bdg_polling_start_kthreads(bps); + if (error) { + D("ERROR nm_bdg_polling_start_kthread()"); + nm_os_free(bps->kthreads); + nm_os_free(bps); + bna->na_polling_state = NULL; + if (bna->hwna->nm_intr) + bna->hwna->nm_intr(bna->hwna, 1); + } + return error; +} + +static int +nm_bdg_ctl_polling_stop(struct nmreq *nmr, struct netmap_adapter *na) +{ + struct netmap_bwrap_adapter *bna = (struct netmap_bwrap_adapter *)na; + struct nm_bdg_polling_state *bps; + + if (!bna->na_polling_state) { + D("ERROR adapter is not in polling mode"); + return EFAULT; + } + bps = bna->na_polling_state; + nm_bdg_polling_stop_delete_kthreads(bna->na_polling_state); + bps->configured = false; + nm_os_free(bps); + bna->na_polling_state = NULL; + /* reenable interrupt */ + if (bna->hwna->nm_intr) + bna->hwna->nm_intr(bna->hwna, 1); + return 0; +} /* Called by either user's context (netmap_ioctl()) * or external kernel modules (e.g., Openvswitch). 
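/*
 * For reference, a sketch of how userspace could request polling mode,
 * modelled on what vale-ctl does (treat the exact flag plumbing as an
 * assumption): NR_REG_ALL_NIC with nr_arg1 == 1 asks get_polling_cfg()
 * for one dedicated core serving all RX rings of the attached port.
 */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <net/netmap.h>

static int
set_polling(const char *port, int on)	/* e.g. port = "valeA:em0" */
{
	struct nmreq nmr;
	int fd, ret;

	fd = open("/dev/netmap", O_RDWR);
	if (fd < 0)
		return -1;
	memset(&nmr, 0, sizeof(nmr));
	nmr.nr_version = NETMAP_API;
	nmr.nr_cmd = on ? NETMAP_BDG_POLLING_ON : NETMAP_BDG_POLLING_OFF;
	nmr.nr_flags = NR_REG_ALL_NIC;
	nmr.nr_arg1 = 1;	/* number of polling cores */
	strncpy(nmr.nr_name, port, sizeof(nmr.nr_name) - 1);
	ret = ioctl(fd, NIOCREGIF, &nmr);
	close(fd);
	return ret;
}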
@@ -843,7 +1178,7 @@ case NETMAP_BDG_LIST: /* this is used to enumerate bridges and ports */ if (namelen) { /* look up indexes of bridge and port */ - if (strncmp(name, NM_NAME, strlen(NM_NAME))) { + if (strncmp(name, NM_BDG_NAME, strlen(NM_BDG_NAME))) { error = EINVAL; break; } @@ -855,7 +1190,9 @@ break; } - error = ENOENT; + error = 0; + nmr->nr_arg1 = b - bridges; /* bridge index */ + nmr->nr_arg2 = NM_BDG_NOPORT; for (j = 0; j < b->bdg_active_ports; j++) { i = b->bdg_port_index[j]; vpna = b->bdg_ports[i]; @@ -867,10 +1204,7 @@ * virtual port and a NIC, respectively */ if (!strcmp(vpna->up.name, name)) { - /* bridge index */ - nmr->nr_arg1 = b - bridges; nmr->nr_arg2 = i; /* port index */ - error = 0; break; } } @@ -937,10 +1271,34 @@ error = netmap_get_bdg_na(nmr, &na, 0); if (na && !error) { vpna = (struct netmap_vp_adapter *)na; - vpna->virt_hdr_len = nmr->nr_arg1; - if (vpna->virt_hdr_len) + na->virt_hdr_len = nmr->nr_arg1; + if (na->virt_hdr_len) { vpna->mfs = NETMAP_BUF_SIZE(na); - D("Using vnet_hdr_len %d for %p", vpna->virt_hdr_len, vpna); + } + D("Using vnet_hdr_len %d for %p", na->virt_hdr_len, na); + netmap_adapter_put(na); + } else if (!na) { + error = ENXIO; + } + NMG_UNLOCK(); + break; + + case NETMAP_BDG_POLLING_ON: + case NETMAP_BDG_POLLING_OFF: + NMG_LOCK(); + error = netmap_get_bdg_na(nmr, &na, 0); + if (na && !error) { + if (!nm_is_bwrap(na)) { + error = EOPNOTSUPP; + } else if (cmd == NETMAP_BDG_POLLING_ON) { + error = nm_bdg_ctl_polling_start(nmr, na); + if (!error) + netmap_adapter_get(na); + } else { + error = nm_bdg_ctl_polling_stop(nmr, na); + if (!error) + netmap_adapter_put(na); + } netmap_adapter_put(na); } NMG_UNLOCK(); @@ -1097,10 +1455,12 @@ ft_i = nm_bdg_flush(ft, ft_i, na, ring_nr); } if (frags > 1) { - D("truncate incomplete fragment at %d (%d frags)", ft_i, frags); - // ft_i > 0, ft[ft_i-1].flags has NS_MOREFRAG - ft[ft_i - 1].ft_frags &= ~NS_MOREFRAG; - ft[ft_i - frags].ft_frags = frags - 1; + /* Here ft_i > 0, ft[ft_i-1].flags has NS_MOREFRAG, and we + * have to fix frags count. 
*/ + frags--; + ft[ft_i - 1].ft_flags &= ~NS_MOREFRAG; + ft[ft_i - frags].ft_frags = frags; + D("Truncate incomplete fragment at %d (%d frags)", ft_i, frags); } if (ft_i) ft_i = nm_bdg_flush(ft, ft_i, na, ring_nr); @@ -1157,6 +1517,8 @@ { struct netmap_vp_adapter *vpna = (struct netmap_vp_adapter*)na; + enum txrx t; + int i; /* persistent ports may be put in netmap mode * before being attached to a bridge @@ -1164,12 +1526,30 @@ if (vpna->na_bdg) BDG_WLOCK(vpna->na_bdg); if (onoff) { - na->na_flags |= NAF_NETMAP_ON; + for_rx_tx(t) { + for (i = 0; i < nma_get_nrings(na, t) + 1; i++) { + struct netmap_kring *kring = &NMR(na, t)[i]; + + if (nm_kring_pending_on(kring)) + kring->nr_mode = NKR_NETMAP_ON; + } + } + if (na->active_fds == 0) + na->na_flags |= NAF_NETMAP_ON; /* XXX on FreeBSD, persistent VALE ports should also * toggle IFCAP_NETMAP in na->ifp (2014-03-16) */ } else { - na->na_flags &= ~NAF_NETMAP_ON; + if (na->active_fds == 0) + na->na_flags &= ~NAF_NETMAP_ON; + for_rx_tx(t) { + for (i = 0; i < nma_get_nrings(na, t) + 1; i++) { + struct netmap_kring *kring = &NMR(na, t)[i]; + + if (nm_kring_pending_off(kring)) + kring->nr_mode = NKR_NETMAP_OFF; + } + } } if (vpna->na_bdg) BDG_WUNLOCK(vpna->na_bdg); @@ -1193,13 +1573,14 @@ uint32_t sh, dh; u_int dst, mysrc = na->bdg_port; uint64_t smac, dmac; + uint8_t indbuf[12]; /* safety check, unfortunately we have many cases */ - if (buf_len >= 14 + na->virt_hdr_len) { + if (buf_len >= 14 + na->up.virt_hdr_len) { /* virthdr + mac_hdr in the same slot */ - buf += na->virt_hdr_len; - buf_len -= na->virt_hdr_len; - } else if (buf_len == na->virt_hdr_len && ft->ft_flags & NS_MOREFRAG) { + buf += na->up.virt_hdr_len; + buf_len -= na->up.virt_hdr_len; + } else if (buf_len == na->up.virt_hdr_len && ft->ft_flags & NS_MOREFRAG) { /* only header in first fragment */ ft++; buf = ft->ft_buf; @@ -1208,6 +1589,14 @@ RD(5, "invalid buf format, length %d", buf_len); return NM_BDG_NOPORT; } + + if (ft->ft_flags & NS_INDIRECT) { + if (copyin(buf, indbuf, sizeof(indbuf))) { + return NM_BDG_NOPORT; + } + buf = indbuf; + } + dmac = le64toh(*(uint64_t *)(buf)) & 0xffffffffffff; smac = le64toh(*(uint64_t *)(buf + 4)); smac >>= 16; @@ -1321,7 +1710,7 @@ struct nm_bdg_q *dst_ents, *brddst; uint16_t num_dsts = 0, *dsts; struct nm_bridge *b = na->na_bdg; - u_int i, j, me = na->bdg_port; + u_int i, me = na->bdg_port; /* * The work area (pointed by ft) is followed by an array of @@ -1341,7 +1730,7 @@ ND("slot %d frags %d", i, ft[i].ft_frags); /* Drop the packet if the virtio-net header is not into the first fragment nor at the very beginning of the second. */ - if (unlikely(na->virt_hdr_len > ft[i].ft_len)) + if (unlikely(na->up.virt_hdr_len > ft[i].ft_len)) continue; dst_port = b->bdg_ops.lookup(&ft[i], &dst_ring, na); if (netmap_verbose > 255) @@ -1382,6 +1771,7 @@ */ brddst = dst_ents + NM_BDG_BROADCAST * NM_BDG_MAXRINGS; if (brddst->bq_head != NM_FT_NULL) { + u_int j; for (j = 0; likely(j < b->bdg_active_ports); j++) { uint16_t d_i; i = b->bdg_port_index[j]; @@ -1441,8 +1831,9 @@ */ needed = d->bq_len + brddst->bq_len; - if (unlikely(dst_na->virt_hdr_len != na->virt_hdr_len)) { - RD(3, "virt_hdr_mismatch, src %d dst %d", na->virt_hdr_len, dst_na->virt_hdr_len); + if (unlikely(dst_na->up.virt_hdr_len != na->up.virt_hdr_len)) { + RD(3, "virt_hdr_mismatch, src %d dst %d", na->up.virt_hdr_len, + dst_na->up.virt_hdr_len); /* There is a virtio-net header/offloadings mismatch between * source and destination. 
The slower mismatch datapath will * be used to cope with all the mismatches. @@ -1768,10 +2159,10 @@ { struct netmap_vp_adapter *vpna; struct netmap_adapter *na; - int error; + int error = 0; u_int npipes = 0; - vpna = malloc(sizeof(*vpna), M_DEVBUF, M_NOWAIT | M_ZERO); + vpna = nm_os_malloc(sizeof(*vpna)); if (vpna == NULL) return ENOMEM; @@ -1803,7 +2194,6 @@ nm_bound_var(&nmr->nr_arg3, 0, 0, 128*NM_BDG_MAXSLOTS, NULL); na->num_rx_desc = nmr->nr_rx_slots; - vpna->virt_hdr_len = 0; vpna->mfs = 1514; vpna->last_smac = ~0llu; /*if (vpna->mfs > netmap_buf_size) TODO netmap_buf_size is zero?? @@ -1823,7 +2213,10 @@ na->nm_krings_create = netmap_vp_krings_create; na->nm_krings_delete = netmap_vp_krings_delete; na->nm_dtor = netmap_vp_dtor; - na->nm_mem = netmap_mem_private_new(na->name, + D("nr_arg2 %d", nmr->nr_arg2); + na->nm_mem = (nmr->nr_arg2 > 0) ? + netmap_mem_find(nmr->nr_arg2): + netmap_mem_private_new( na->num_tx_rings, na->num_tx_desc, na->num_rx_rings, na->num_rx_desc, nmr->nr_arg3, npipes, &error); @@ -1839,8 +2232,8 @@ err: if (na->nm_mem != NULL) - netmap_mem_delete(na->nm_mem); - free(vpna, M_DEVBUF); + netmap_mem_put(na->nm_mem); + nm_os_free(vpna); return error; } @@ -1880,19 +2273,19 @@ { struct netmap_bwrap_adapter *bna = (struct netmap_bwrap_adapter*)na; struct netmap_adapter *hwna = bna->hwna; + struct nm_bridge *b = bna->up.na_bdg, + *bh = bna->host.na_bdg; + + netmap_mem_put(bna->host.up.nm_mem); + + if (b) { + netmap_bdg_detach_common(b, bna->up.bdg_port, + (bh ? bna->host.bdg_port : -1)); + } ND("na %p", na); - /* drop reference to hwna->ifp. - * If we don't do this, netmap_detach_common(na) - * will think it has set NA(na->ifp) to NULL - */ na->ifp = NULL; - /* for safety, also drop the possible reference - * in the hostna - */ bna->host.up.ifp = NULL; - - hwna->nm_mem = bna->save_nmd; hwna->na_private = NULL; hwna->na_vp = hwna->na_hostvp = NULL; hwna->na_flags &= ~NAF_BUSY; @@ -1916,7 +2309,8 @@ * (part as a receive ring, part as a transmit ring). * * callback that overwrites the hwna notify callback. - * Packets come from the outside or from the host stack and are put on an hwna rx ring. + * Packets come from the outside or from the host stack and are put on an + * hwna rx ring. * The bridge wrapper then sends the packets through the bridge. */ static int @@ -1927,19 +2321,18 @@ struct netmap_kring *bkring; struct netmap_vp_adapter *vpna = &bna->up; u_int ring_nr = kring->ring_id; - int error = 0; + int ret = NM_IRQ_COMPLETED; + int error; if (netmap_verbose) D("%s %s 0x%x", na->name, kring->name, flags); - if (!nm_netmap_on(na)) - return 0; - bkring = &vpna->up.tx_rings[ring_nr]; /* make sure the ring is not disabled */ - if (nm_kr_tryget(kring)) - return 0; + if (nm_kr_tryget(kring, 0 /* can't sleep */, NULL)) { + return EIO; + } if (netmap_verbose) D("%s head %d cur %d tail %d", na->name, @@ -1951,9 +2344,10 @@ error = kring->nm_sync(kring, 0); if (error) goto put_out; - if (kring->nr_hwcur == kring->nr_hwtail && netmap_verbose) { - D("how strange, interrupt with no packets on %s", - na->name); + if (kring->nr_hwcur == kring->nr_hwtail) { + if (netmap_verbose) + D("how strange, interrupt with no packets on %s", + na->name); goto put_out; } @@ -1970,28 +2364,32 @@ /* another call to actually release the buffers */ error = kring->nm_sync(kring, 0); + /* The second rxsync may have further advanced hwtail. If this happens, + * return NM_IRQ_RESCHED, otherwise just return NM_IRQ_COMPLETED. 
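+	 * (NM_IRQ_RESCHED makes the caller run this handler again, so the
+	 * slots that arrived between the two syncs are eventually forwarded
+	 * to the bridge as well.)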
*/ + if (kring->rcur != kring->nr_hwtail) { + ret = NM_IRQ_RESCHED; + } put_out: nm_kr_put(kring); - return error; + + return error ? error : ret; } /* nm_register callback for bwrap */ static int -netmap_bwrap_register(struct netmap_adapter *na, int onoff) +netmap_bwrap_reg(struct netmap_adapter *na, int onoff) { struct netmap_bwrap_adapter *bna = (struct netmap_bwrap_adapter *)na; struct netmap_adapter *hwna = bna->hwna; struct netmap_vp_adapter *hostna = &bna->host; - int error; + int error, i; enum txrx t; ND("%s %s", na->name, onoff ? "on" : "off"); if (onoff) { - int i; - /* netmap_do_regif has been called on the bwrap na. * We need to pass the information about the * memory allocator down to the hwna before @@ -2010,16 +2408,32 @@ /* cross-link the netmap rings * The original number of rings comes from hwna, * rx rings on one side equals tx rings on the other. - * We need to do this now, after the initialization - * of the kring->ring pointers */ for_rx_tx(t) { - enum txrx r= nm_txrx_swap(t); /* swap NR_TX <-> NR_RX */ - for (i = 0; i < nma_get_nrings(na, r) + 1; i++) { - NMR(hwna, t)[i].nkr_num_slots = NMR(na, r)[i].nkr_num_slots; - NMR(hwna, t)[i].ring = NMR(na, r)[i].ring; + enum txrx r = nm_txrx_swap(t); /* swap NR_TX <-> NR_RX */ + for (i = 0; i < nma_get_nrings(hwna, r) + 1; i++) { + NMR(hwna, r)[i].ring = NMR(na, t)[i].ring; } } + + if (na->na_flags & NAF_HOST_RINGS) { + struct netmap_adapter *hna = &hostna->up; + /* the hostna rings are the host rings of the bwrap. + * The corresponding krings must point back to the + * hostna + */ + hna->tx_rings = &na->tx_rings[na->num_tx_rings]; + hna->tx_rings[0].na = hna; + hna->rx_rings = &na->rx_rings[na->num_rx_rings]; + hna->rx_rings[0].na = hna; + } + } + + /* pass down the pending ring state information */ + for_rx_tx(t) { + for (i = 0; i < nma_get_nrings(na, t) + 1; i++) + NMR(hwna, t)[i].nr_pending_mode = + NMR(na, t)[i].nr_pending_mode; } /* forward the request to the hwna */ @@ -2027,6 +2441,13 @@ if (error) return error; + /* copy up the current ring state information */ + for_rx_tx(t) { + for (i = 0; i < nma_get_nrings(na, t) + 1; i++) + NMR(na, t)[i].nr_mode = + NMR(hwna, t)[i].nr_mode; + } + /* impersonate a netmap_vp_adapter */ netmap_vp_reg(na, onoff); if (hostna->na_bdg) @@ -2046,8 +2467,14 @@ /* also intercept the host ring notify */ hwna->rx_rings[i].nm_notify = netmap_bwrap_intr_notify; } + if (na->active_fds == 0) + na->na_flags |= NAF_NETMAP_ON; } else { u_int i; + + if (na->active_fds == 0) + na->na_flags &= ~NAF_NETMAP_ON; + /* reset all notify callbacks (including host ring) */ for (i = 0; i <= hwna->num_rx_rings; i++) { hwna->rx_rings[i].nm_notify = hwna->rx_rings[i].save_notify; @@ -2089,8 +2516,8 @@ struct netmap_bwrap_adapter *bna = (struct netmap_bwrap_adapter *)na; struct netmap_adapter *hwna = bna->hwna; - struct netmap_adapter *hostna = &bna->host.up; - int error; + int i, error = 0; + enum txrx t; ND("%s", na->name); @@ -2102,26 +2529,23 @@ /* also create the hwna krings */ error = hwna->nm_krings_create(hwna); if (error) { - netmap_vp_krings_delete(na); - return error; + goto err_del_vp_rings; } - /* the connection between the bwrap krings and the hwna krings - * will be perfomed later, in the nm_register callback, since - * now the kring->ring pointers have not been initialized yet - */ - if (na->na_flags & NAF_HOST_RINGS) { - /* the hostna rings are the host rings of the bwrap. 
- * The corresponding krings must point back to the - * hostna - */ - hostna->tx_rings = &na->tx_rings[na->num_tx_rings]; - hostna->tx_rings[0].na = hostna; - hostna->rx_rings = &na->rx_rings[na->num_rx_rings]; - hostna->rx_rings[0].na = hostna; + /* get each ring slot number from the corresponding hwna ring */ + for_rx_tx(t) { + enum txrx r = nm_txrx_swap(t); /* swap NR_TX <-> NR_RX */ + for (i = 0; i < nma_get_nrings(hwna, r) + 1; i++) { + NMR(na, t)[i].nkr_num_slots = NMR(hwna, r)[i].nkr_num_slots; + } } return 0; + +err_del_vp_rings: + netmap_vp_krings_delete(na); + + return error; } @@ -2149,19 +2573,18 @@ u_int ring_n = kring->ring_id; u_int lim = kring->nkr_num_slots - 1; struct netmap_kring *hw_kring; - int error = 0; + int error; - ND("%s: na %s hwna %s", + ND("%s: na %s hwna %s", (kring ? kring->name : "NULL!"), (na ? na->name : "NULL!"), (hwna ? hwna->name : "NULL!")); hw_kring = &hwna->tx_rings[ring_n]; - if (nm_kr_tryget(hw_kring)) - return 0; + if (nm_kr_tryget(hw_kring, 0, NULL)) { + return ENXIO; + } - if (!nm_netmap_on(hwna)) - return 0; /* first step: simulate a user wakeup on the rx ring */ netmap_vp_rxsync(kring, flags); ND("%s[%d] PRE rx(c%3d t%3d l%3d) ring(h%3d c%3d t%3d) tx(c%3d ht%3d t%3d)", @@ -2175,7 +2598,7 @@ hw_kring->rhead = hw_kring->rcur = kring->nr_hwtail; error = hw_kring->nm_sync(hw_kring, flags); if (error) - goto out; + goto put_out; /* third step: now we are back the rx ring */ /* claim ownership on all hw owned bufs */ @@ -2188,9 +2611,10 @@ kring->nr_hwcur, kring->nr_hwtail, kring->nkr_hwlease, ring->head, ring->cur, ring->tail, hw_kring->nr_hwcur, hw_kring->nr_hwtail, hw_kring->rtail); -out: +put_out: nm_kr_put(hw_kring); - return error; + + return error ? error : NM_IRQ_COMPLETED; } @@ -2217,44 +2641,23 @@ /* nothing to do */ return 0; } - npriv = malloc(sizeof(*npriv), M_DEVBUF, M_NOWAIT|M_ZERO); + npriv = netmap_priv_new(); if (npriv == NULL) return ENOMEM; - error = netmap_do_regif(npriv, na, nmr->nr_ringid, nmr->nr_flags); + npriv->np_ifp = na->ifp; /* let the priv destructor release the ref */ + error = netmap_do_regif(npriv, na, 0, NR_REG_NIC_SW); if (error) { - bzero(npriv, sizeof(*npriv)); - free(npriv, M_DEVBUF); + netmap_priv_delete(npriv); return error; } bna->na_kpriv = npriv; na->na_flags |= NAF_BUSY; } else { - int last_instance; - if (na->active_fds == 0) /* not registered */ return EINVAL; - last_instance = netmap_dtor_locked(bna->na_kpriv); - if (!last_instance) { - D("--- error, trying to detach an entry with active mmaps"); - error = EINVAL; - } else { - struct nm_bridge *b = bna->up.na_bdg, - *bh = bna->host.na_bdg; - npriv = bna->na_kpriv; - bna->na_kpriv = NULL; - D("deleting priv"); - - bzero(npriv, sizeof(*npriv)); - free(npriv, M_DEVBUF); - if (b) { - /* XXX the bwrap dtor should take care - * of this (2014-06-16) - */ - netmap_bdg_detach_common(b, bna->up.bdg_port, - (bh ? 
bna->host.bdg_port : -1)); - } - na->na_flags &= ~NAF_BUSY; - } + netmap_priv_delete(bna->na_kpriv); + bna->na_kpriv = NULL; + na->na_flags &= ~NAF_BUSY; } return error; @@ -2276,12 +2679,14 @@ return EBUSY; } - bna = malloc(sizeof(*bna), M_DEVBUF, M_NOWAIT | M_ZERO); + bna = nm_os_malloc(sizeof(*bna)); if (bna == NULL) { return ENOMEM; } na = &bna->up.up; + /* make bwrap ifp point to the real ifp */ + na->ifp = hwna->ifp; na->na_private = bna; strncpy(na->name, nr_name, sizeof(na->name)); /* fill the ring data for the bwrap adapter with rx/tx meanings @@ -2294,7 +2699,7 @@ nma_set_ndesc(na, t, nma_get_ndesc(hwna, r)); } na->nm_dtor = netmap_bwrap_dtor; - na->nm_register = netmap_bwrap_register; + na->nm_register = netmap_bwrap_reg; // na->nm_txsync = netmap_bwrap_txsync; // na->nm_rxsync = netmap_bwrap_rxsync; na->nm_config = netmap_bwrap_config; @@ -2303,13 +2708,8 @@ na->nm_notify = netmap_bwrap_notify; na->nm_bdg_ctl = netmap_bwrap_bdg_ctl; na->pdev = hwna->pdev; - na->nm_mem = netmap_mem_private_new(na->name, - na->num_tx_rings, na->num_tx_desc, - na->num_rx_rings, na->num_rx_desc, - 0, 0, &error); - na->na_flags |= NAF_MEM_OWNER; - if (na->nm_mem == NULL) - goto err_put; + na->nm_mem = netmap_mem_get(hwna->nm_mem); + na->virt_hdr_len = hwna->virt_hdr_len; bna->up.retry = 1; /* XXX maybe this should depend on the hwna */ bna->hwna = hwna; @@ -2332,7 +2732,7 @@ // hostna->nm_txsync = netmap_bwrap_host_txsync; // hostna->nm_rxsync = netmap_bwrap_host_rxsync; hostna->nm_notify = netmap_bwrap_notify; - hostna->nm_mem = na->nm_mem; + hostna->nm_mem = netmap_mem_get(na->nm_mem); hostna->na_private = bna; hostna->na_vp = &bna->up; na->na_hostvp = hwna->na_hostvp = @@ -2349,27 +2749,13 @@ if (error) { goto err_free; } - /* make bwrap ifp point to the real ifp - * NOTE: netmap_attach_common() interprets a non-NULL na->ifp - * as a request to make the ifp point to the na. 
Since we
- * do not want to change the na already pointed to by hwna->ifp,
- * the following assignment has to be delayed until now
- */
-	na->ifp = hwna->ifp;
 	hwna->na_flags |= NAF_BUSY;
-	/* make hwna point to the allocator we are actually using,
-	 * so that monitors will be able to find it
-	 */
-	bna->save_nmd = hwna->nm_mem;
-	hwna->nm_mem = na->nm_mem;
 	return 0;
 
 err_free:
-	netmap_mem_delete(na->nm_mem);
-err_put:
 	hwna->na_vp = hwna->na_hostvp = NULL;
 	netmap_adapter_put(hwna);
-	free(bna, M_DEVBUF);
+	nm_os_free(bna);
 	return error;
 }
 
@@ -2380,8 +2766,7 @@
 	int i;
 	struct nm_bridge *b;
 
-	b = malloc(sizeof(struct nm_bridge) * n, M_DEVBUF,
-	    M_NOWAIT | M_ZERO);
+	b = nm_os_malloc(sizeof(struct nm_bridge) * n);
 	if (b == NULL)
 		return NULL;
 	for (i = 0; i < n; i++)
@@ -2399,7 +2784,7 @@
 
 	for (i = 0; i < n; i++)
 		BDG_RWDESTROY(&b[i]);
-	free(b, M_DEVBUF);
+	nm_os_free(b);
 }
 
 int
/usr/home/steven/netmap/sys/modules/netmap/../../dev/netmap/netmap_offloadings.c \ + /usr/src/sys/sys/cdefs.h /usr/src/sys/sys/types.h machine/endian.h \ + x86/endian.h /usr/src/sys/sys/_types.h machine/_types.h x86/_types.h \ + machine/_limits.h x86/_limits.h /usr/src/sys/sys/_pthreadtypes.h \ + /usr/src/sys/sys/_stdint.h /usr/src/sys/sys/select.h \ + /usr/src/sys/sys/_sigset.h /usr/src/sys/sys/_timeval.h \ + /usr/src/sys/sys/timespec.h /usr/src/sys/sys/_timespec.h \ + /usr/src/sys/sys/errno.h /usr/src/sys/sys/param.h \ + /usr/src/sys/sys/_null.h /usr/src/sys/sys/syslimits.h \ + /usr/src/sys/sys/time.h /usr/src/sys/sys/priority.h machine/param.h \ + machine/_align.h x86/_align.h /usr/src/sys/sys/kernel.h \ + /usr/src/sys/sys/linker_set.h /usr/src/sys/sys/queue.h \ + /usr/src/sys/sys/sockio.h /usr/src/sys/sys/ioccom.h \ + /usr/src/sys/sys/malloc.h /usr/src/sys/sys/_lock.h \ + /usr/src/sys/sys/_mutex.h /usr/src/sys/sys/socketvar.h \ + /usr/src/sys/sys/selinfo.h /usr/src/sys/sys/event.h \ + /usr/src/sys/sys/osd.h /usr/src/sys/sys/_sx.h \ + /usr/src/sys/sys/sockbuf.h /usr/src/sys/sys/_task.h \ + /usr/src/sys/sys/sockstate.h /usr/src/sys/sys/caprights.h \ + /usr/src/sys/sys/sockopt.h /usr/src/sys/sys/socket.h \ + /usr/src/sys/sys/_iovec.h /usr/src/sys/sys/_sockaddr_storage.h \ + /usr/src/sys/net/if.h /usr/src/sys/net/if_var.h \ + /usr/src/sys/sys/mbuf.h /usr/src/sys/sys/systm.h machine/atomic.h \ + machine/cpufunc.h /usr/src/sys/sys/callout.h \ + /usr/src/sys/sys/_callout.h /usr/src/sys/sys/stdint.h \ + machine/_stdint.h x86/_stdint.h /usr/src/sys/sys/libkern.h \ + /usr/src/sys/vm/uma.h /usr/src/sys/sys/sdt.h \ + /usr/src/sys/sys/buf_ring.h machine/cpu.h machine/psl.h x86/psl.h \ + machine/frame.h x86/frame.h machine/segments.h x86/segments.h \ + /usr/src/sys/net/vnet.h /usr/src/sys/sys/counter.h machine/counter.h \ + /usr/src/sys/sys/pcpu.h /usr/src/sys/sys/_cpuset.h \ + /usr/src/sys/sys/_bitset.h /usr/src/sys/sys/_rmlock.h \ + /usr/src/sys/sys/vmmeter.h /usr/src/sys/sys/resource.h machine/pcpu.h \ + /usr/src/sys/sys/lock.h /usr/src/sys/sys/ktr_class.h \ + /usr/src/sys/sys/mutex.h /usr/src/sys/sys/lock_profile.h \ + /usr/src/sys/sys/lockstat.h /usr/src/sys/sys/rwlock.h \ + /usr/src/sys/sys/_rwlock.h /usr/src/sys/sys/sx.h \ + /usr/src/sys/net/altq/if_altq.h /usr/src/sys/net/ifq.h machine/bus.h \ + x86/bus.h machine/_bus.h machine/bus_dma.h /usr/src/sys/sys/bus_dma.h \ + /usr/src/sys/sys/_bus_dma.h /usr/src/sys/sys/endian.h \ + /usr/home/steven/netmap/sys/modules/netmap/../../net/netmap.h \ + /usr/home/steven/netmap/sys/modules/netmap/../../dev/netmap/netmap_kern.h \ + /usr/home/steven/netmap/sys/modules/netmap/../../dev/netmap/netmap_mbq.h diff -u -r -N usr/src/sys/modules/netmap/.depend.netmap_pipe.o /usr/src/sys/modules/netmap/.depend.netmap_pipe.o --- usr/src/sys/modules/netmap/.depend.netmap_pipe.o 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/.depend.netmap_pipe.o 2016-11-23 17:05:40.232032000 +0000 @@ -0,0 +1,43 @@ +netmap_pipe.o: \ + /usr/home/steven/netmap/sys/modules/netmap/../../dev/netmap/netmap_pipe.c \ + /usr/src/sys/sys/cdefs.h /usr/src/sys/sys/types.h machine/endian.h \ + x86/endian.h /usr/src/sys/sys/_types.h machine/_types.h x86/_types.h \ + machine/_limits.h x86/_limits.h /usr/src/sys/sys/_pthreadtypes.h \ + /usr/src/sys/sys/_stdint.h /usr/src/sys/sys/select.h \ + /usr/src/sys/sys/_sigset.h /usr/src/sys/sys/_timeval.h \ + /usr/src/sys/sys/timespec.h /usr/src/sys/sys/_timespec.h \ + /usr/src/sys/sys/errno.h /usr/src/sys/sys/param.h \ + 
/usr/src/sys/sys/_null.h /usr/src/sys/sys/syslimits.h \ + /usr/src/sys/sys/time.h /usr/src/sys/sys/priority.h machine/param.h \ + machine/_align.h x86/_align.h /usr/src/sys/sys/kernel.h \ + /usr/src/sys/sys/linker_set.h /usr/src/sys/sys/queue.h \ + /usr/src/sys/sys/malloc.h /usr/src/sys/sys/_lock.h \ + /usr/src/sys/sys/_mutex.h /usr/src/sys/sys/poll.h \ + /usr/src/sys/sys/lock.h /usr/src/sys/sys/ktr_class.h \ + /usr/src/sys/sys/rwlock.h /usr/src/sys/sys/_rwlock.h \ + /usr/src/sys/sys/lock_profile.h /usr/src/sys/sys/lockstat.h \ + /usr/src/sys/sys/sdt.h /usr/src/sys/sys/pcpu.h \ + /usr/src/sys/sys/_cpuset.h /usr/src/sys/sys/_bitset.h \ + /usr/src/sys/sys/_sx.h /usr/src/sys/sys/_rmlock.h \ + /usr/src/sys/sys/vmmeter.h /usr/src/sys/sys/resource.h machine/pcpu.h \ + machine/atomic.h /usr/src/sys/sys/selinfo.h /usr/src/sys/sys/event.h \ + /usr/src/sys/sys/sysctl.h /usr/src/sys/sys/socket.h \ + /usr/src/sys/sys/_iovec.h /usr/src/sys/sys/_sockaddr_storage.h \ + /usr/src/sys/net/if.h /usr/src/sys/net/if_var.h \ + /usr/src/sys/sys/mbuf.h /usr/src/sys/sys/systm.h machine/cpufunc.h \ + /usr/src/sys/sys/callout.h /usr/src/sys/sys/_callout.h \ + /usr/src/sys/sys/stdint.h machine/_stdint.h x86/_stdint.h \ + /usr/src/sys/sys/libkern.h /usr/src/sys/vm/uma.h \ + /usr/src/sys/sys/buf_ring.h machine/cpu.h machine/psl.h x86/psl.h \ + machine/frame.h x86/frame.h machine/segments.h x86/segments.h \ + /usr/src/sys/net/vnet.h /usr/src/sys/sys/counter.h machine/counter.h \ + /usr/src/sys/sys/mutex.h /usr/src/sys/sys/sx.h \ + /usr/src/sys/sys/_task.h /usr/src/sys/net/altq/if_altq.h \ + /usr/src/sys/net/ifq.h machine/bus.h x86/bus.h machine/_bus.h \ + machine/bus_dma.h /usr/src/sys/sys/bus_dma.h \ + /usr/src/sys/sys/_bus_dma.h /usr/src/sys/sys/refcount.h \ + /usr/src/sys/sys/limits.h \ + /usr/home/steven/netmap/sys/modules/netmap/../../net/netmap.h \ + /usr/home/steven/netmap/sys/modules/netmap/../../dev/netmap/netmap_kern.h \ + /usr/home/steven/netmap/sys/modules/netmap/../../dev/netmap/netmap_mbq.h \ + /usr/home/steven/netmap/sys/modules/netmap/../../dev/netmap/netmap_mem2.h diff -u -r -N usr/src/sys/modules/netmap/.depend.netmap_pt.o /usr/src/sys/modules/netmap/.depend.netmap_pt.o --- usr/src/sys/modules/netmap/.depend.netmap_pt.o 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/.depend.netmap_pt.o 2016-11-23 17:05:41.322506000 +0000 @@ -0,0 +1,42 @@ +netmap_pt.o: \ + /usr/home/steven/netmap/sys/modules/netmap/../../dev/netmap/netmap_pt.c \ + /usr/src/sys/sys/cdefs.h /usr/src/sys/sys/param.h \ + /usr/src/sys/sys/_null.h /usr/src/sys/sys/types.h machine/endian.h \ + x86/endian.h /usr/src/sys/sys/_types.h machine/_types.h x86/_types.h \ + machine/_limits.h x86/_limits.h /usr/src/sys/sys/_pthreadtypes.h \ + /usr/src/sys/sys/_stdint.h /usr/src/sys/sys/select.h \ + /usr/src/sys/sys/_sigset.h /usr/src/sys/sys/_timeval.h \ + /usr/src/sys/sys/timespec.h /usr/src/sys/sys/_timespec.h \ + /usr/src/sys/sys/syslimits.h /usr/src/sys/sys/errno.h \ + /usr/src/sys/sys/time.h /usr/src/sys/sys/priority.h machine/param.h \ + machine/_align.h x86/_align.h /usr/src/sys/sys/kernel.h \ + /usr/src/sys/sys/linker_set.h /usr/src/sys/sys/queue.h \ + /usr/src/sys/sys/selinfo.h /usr/src/sys/sys/event.h \ + /usr/src/sys/sys/socket.h /usr/src/sys/sys/_iovec.h \ + /usr/src/sys/sys/_sockaddr_storage.h /usr/src/sys/net/if.h \ + /usr/src/sys/net/if_var.h /usr/src/sys/sys/mbuf.h \ + /usr/src/sys/sys/systm.h machine/atomic.h machine/cpufunc.h \ + /usr/src/sys/sys/callout.h /usr/src/sys/sys/_callout.h \ + /usr/src/sys/sys/stdint.h 
machine/_stdint.h x86/_stdint.h \ + /usr/src/sys/sys/libkern.h /usr/src/sys/vm/uma.h \ + /usr/src/sys/sys/malloc.h /usr/src/sys/sys/_lock.h \ + /usr/src/sys/sys/_mutex.h /usr/src/sys/sys/sdt.h \ + /usr/src/sys/sys/buf_ring.h machine/cpu.h machine/psl.h x86/psl.h \ + machine/frame.h x86/frame.h machine/segments.h x86/segments.h \ + /usr/src/sys/net/vnet.h /usr/src/sys/sys/counter.h machine/counter.h \ + /usr/src/sys/sys/pcpu.h /usr/src/sys/sys/_cpuset.h \ + /usr/src/sys/sys/_bitset.h /usr/src/sys/sys/_sx.h \ + /usr/src/sys/sys/_rmlock.h /usr/src/sys/sys/vmmeter.h \ + /usr/src/sys/sys/resource.h machine/pcpu.h /usr/src/sys/sys/lock.h \ + /usr/src/sys/sys/ktr_class.h /usr/src/sys/sys/mutex.h \ + /usr/src/sys/sys/lock_profile.h /usr/src/sys/sys/lockstat.h \ + /usr/src/sys/sys/rwlock.h /usr/src/sys/sys/_rwlock.h \ + /usr/src/sys/sys/sx.h /usr/src/sys/sys/_task.h \ + /usr/src/sys/net/altq/if_altq.h /usr/src/sys/net/ifq.h machine/bus.h \ + x86/bus.h machine/_bus.h machine/bus_dma.h /usr/src/sys/sys/bus_dma.h \ + /usr/src/sys/sys/_bus_dma.h \ + /usr/home/steven/netmap/sys/modules/netmap/../../net/netmap.h \ + /usr/home/steven/netmap/sys/modules/netmap/../../dev/netmap/netmap_kern.h \ + /usr/home/steven/netmap/sys/modules/netmap/../../dev/netmap/netmap_mbq.h \ + /usr/home/steven/netmap/sys/modules/netmap/../../net/netmap_virt.h \ + /usr/home/steven/netmap/sys/modules/netmap/../../dev/netmap/netmap_mem2.h diff -u -r -N usr/src/sys/modules/netmap/.depend.netmap_vale.o /usr/src/sys/modules/netmap/.depend.netmap_vale.o --- usr/src/sys/modules/netmap/.depend.netmap_vale.o 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/.depend.netmap_vale.o 2016-11-23 17:05:38.804529000 +0000 @@ -0,0 +1,50 @@ +netmap_vale.o: \ + /usr/home/steven/netmap/sys/modules/netmap/../../dev/netmap/netmap_vale.c \ + /usr/src/sys/sys/cdefs.h /usr/src/sys/sys/types.h machine/endian.h \ + x86/endian.h /usr/src/sys/sys/_types.h machine/_types.h x86/_types.h \ + machine/_limits.h x86/_limits.h /usr/src/sys/sys/_pthreadtypes.h \ + /usr/src/sys/sys/_stdint.h /usr/src/sys/sys/select.h \ + /usr/src/sys/sys/_sigset.h /usr/src/sys/sys/_timeval.h \ + /usr/src/sys/sys/timespec.h /usr/src/sys/sys/_timespec.h \ + /usr/src/sys/sys/errno.h /usr/src/sys/sys/param.h \ + /usr/src/sys/sys/_null.h /usr/src/sys/sys/syslimits.h \ + /usr/src/sys/sys/time.h /usr/src/sys/sys/priority.h machine/param.h \ + machine/_align.h x86/_align.h /usr/src/sys/sys/kernel.h \ + /usr/src/sys/sys/linker_set.h /usr/src/sys/sys/queue.h \ + /usr/src/sys/sys/conf.h /usr/src/sys/sys/eventhandler.h \ + /usr/src/sys/sys/lock.h /usr/src/sys/sys/_lock.h \ + /usr/src/sys/sys/ktr_class.h /usr/src/sys/sys/ktr.h \ + /usr/src/sys/sys/_cpuset.h /usr/src/sys/sys/_bitset.h \ + /usr/src/sys/sys/mutex.h /usr/src/sys/sys/_mutex.h \ + /usr/src/sys/sys/pcpu.h /usr/src/sys/sys/_sx.h \ + /usr/src/sys/sys/_rmlock.h /usr/src/sys/sys/vmmeter.h \ + /usr/src/sys/sys/resource.h machine/pcpu.h \ + /usr/src/sys/sys/lock_profile.h /usr/src/sys/sys/lockstat.h \ + /usr/src/sys/sys/sdt.h machine/atomic.h machine/cpufunc.h \ + /usr/src/sys/sys/sockio.h /usr/src/sys/sys/ioccom.h \ + /usr/src/sys/sys/socketvar.h /usr/src/sys/sys/selinfo.h \ + /usr/src/sys/sys/event.h /usr/src/sys/sys/osd.h \ + /usr/src/sys/sys/sockbuf.h /usr/src/sys/sys/_task.h \ + /usr/src/sys/sys/sockstate.h /usr/src/sys/sys/caprights.h \ + /usr/src/sys/sys/sockopt.h /usr/src/sys/sys/malloc.h \ + /usr/src/sys/sys/poll.h /usr/src/sys/sys/rwlock.h \ + /usr/src/sys/sys/_rwlock.h /usr/src/sys/sys/socket.h \ + 
diff -u -r -N usr/src/sys/modules/netmap/.gitignore /usr/src/sys/modules/netmap/.gitignore
--- usr/src/sys/modules/netmap/.gitignore	1970-01-01 01:00:00.000000000 +0100
+++ /usr/src/sys/modules/netmap/.gitignore	2016-11-23 16:57:57.856639000 +0000
@@ -0,0 +1,3 @@
+*
+!Makefile
+!.gitignore
diff -u -r -N usr/src/sys/modules/netmap/Makefile /usr/src/sys/modules/netmap/Makefile
--- usr/src/sys/modules/netmap/Makefile	2016-09-29 00:24:53.000000000 +0100
+++ /usr/src/sys/modules/netmap/Makefile	2016-11-23 16:57:57.856934000 +0000
@@ -1,13 +1,16 @@
-# $FreeBSD: releng/11.0/sys/modules/netmap/Makefile 272108 2014-09-25 14:25:38Z luigi $
+# $FreeBSD$
 #
 # Compile netmap as a module, useful if you want a netmap bridge
 # or loadable drivers.
+.include <bsd.own.mk> # FreeBSD 10 and earlier
+# .include "${SYSDIR}/conf/kern.opts.mk"
+
 
 .PATH: ${.CURDIR}/../../dev/netmap
 .PATH.h: ${.CURDIR}/../../net
-CFLAGS += -I${.CURDIR}/../../
+CFLAGS += -I${.CURDIR}/../../ -D INET
 KMOD = netmap
-SRCS = device_if.h bus_if.h opt_netmap.h
+SRCS = device_if.h bus_if.h pci_if.h opt_netmap.h
 SRCS += netmap.c netmap.h netmap_kern.h
 SRCS += netmap_mem2.c netmap_mem2.h
 SRCS += netmap_generic.c
@@ -17,5 +20,8 @@
 SRCS += netmap_offloadings.c
 SRCS += netmap_pipe.c
 SRCS += netmap_monitor.c
+SRCS += netmap_pt.c
+SRCS += if_ptnet.c
+SRCS += opt_inet.h opt_inet6.h
 
 .include <bsd.kmod.mk>
diff -u -r -N usr/src/sys/modules/netmap/bus_if.h /usr/src/sys/modules/netmap/bus_if.h
--- usr/src/sys/modules/netmap/bus_if.h	1970-01-01 01:00:00.000000000 +0100
+++ /usr/src/sys/modules/netmap/bus_if.h	2016-11-23 17:05:33.857275000 +0000
@@ -0,0 +1,1040 @@
+/*
+ * This file is produced automatically.
+ * Do not modify anything in here by hand.
+ *
+ * Created from source file
+ * /usr/src/sys/kern/bus_if.m
+ * with
+ * makeobjops.awk
+ *
+ * See the source file for legal information
+ */
+
+/**
+ * @defgroup BUS bus - KObj methods for drivers of devices with children
+ * @brief A set of methods required by device drivers that support
+ * child devices.
+ * @{ + */ + +#ifndef _bus_if_h_ +#define _bus_if_h_ + +/** @brief Unique descriptor for the BUS_PRINT_CHILD() method */ +extern struct kobjop_desc bus_print_child_desc; +/** @brief A function implementing the BUS_PRINT_CHILD() method */ +typedef int bus_print_child_t(device_t _dev, device_t _child); +/** + * @brief Print a description of a child device + * + * This is called from system code which prints out a description of a + * device. It should describe the attachment that the child has with + * the parent. For instance the TurboLaser bus prints which node the + * device is attached to. See bus_generic_print_child() for more + * information. + * + * @param _dev the device whose child is being printed + * @param _child the child device to describe + * + * @returns the number of characters output. + */ + +static __inline int BUS_PRINT_CHILD(device_t _dev, device_t _child) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)_dev)->ops,bus_print_child); + return ((bus_print_child_t *) _m)(_dev, _child); +} + +/** @brief Unique descriptor for the BUS_PROBE_NOMATCH() method */ +extern struct kobjop_desc bus_probe_nomatch_desc; +/** @brief A function implementing the BUS_PROBE_NOMATCH() method */ +typedef void bus_probe_nomatch_t(device_t _dev, device_t _child); +/** + * @brief Print a notification about an unprobed child device. + * + * Called for each child device that did not succeed in probing for a + * driver. + * + * @param _dev the device whose child was being probed + * @param _child the child device which failed to probe + */ + +static __inline void BUS_PROBE_NOMATCH(device_t _dev, device_t _child) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)_dev)->ops,bus_probe_nomatch); + ((bus_probe_nomatch_t *) _m)(_dev, _child); +} + +/** @brief Unique descriptor for the BUS_READ_IVAR() method */ +extern struct kobjop_desc bus_read_ivar_desc; +/** @brief A function implementing the BUS_READ_IVAR() method */ +typedef int bus_read_ivar_t(device_t _dev, device_t _child, int _index, + uintptr_t *_result); +/** + * @brief Read the value of a bus-specific attribute of a device + * + * This method, along with BUS_WRITE_IVAR() manages a bus-specific set + * of instance variables of a child device. The intention is that + * each different type of bus defines a set of appropriate instance + * variables (such as ports and irqs for ISA bus etc.) + * + * This information could be given to the child device as a struct but + * that makes it hard for a bus to add or remove variables without + * forcing an edit and recompile for all drivers which may not be + * possible for vendor supplied binary drivers. + * + * This method copies the value of an instance variable to the + * location specified by @p *_result. 
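+ *
+ * For example, a child driver could query its parent directly (an
+ * editor's sketch, not part of the generated file; ACME_IVAR_PORT is a
+ * hypothetical ivar name):
+ * @code
+ * uintptr_t port;
+ *
+ * if (BUS_READ_IVAR(device_get_parent(dev), dev, ACME_IVAR_PORT, &port) != 0)
+ *         return (ENXIO);    // parent does not supply this ivar
+ * @endcode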
+ * + * @param _dev the device whose child was being examined + * @param _child the child device whose instance variable is + * being read + * @param _index the instance variable to read + * @param _result a location to receive the instance variable + * value + * + * @retval 0 success + * @retval ENOENT no such instance variable is supported by @p + * _dev + */ + +static __inline int BUS_READ_IVAR(device_t _dev, device_t _child, int _index, + uintptr_t *_result) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)_dev)->ops,bus_read_ivar); + return ((bus_read_ivar_t *) _m)(_dev, _child, _index, _result); +} + +/** @brief Unique descriptor for the BUS_WRITE_IVAR() method */ +extern struct kobjop_desc bus_write_ivar_desc; +/** @brief A function implementing the BUS_WRITE_IVAR() method */ +typedef int bus_write_ivar_t(device_t _dev, device_t _child, int _indx, + uintptr_t _value); +/** + * @brief Write the value of a bus-specific attribute of a device + * + * This method sets the value of an instance variable to @p _value. + * + * @param _dev the device whose child was being updated + * @param _child the child device whose instance variable is + * being written + * @param _index the instance variable to write + * @param _value the value to write to that instance variable + * + * @retval 0 success + * @retval ENOENT no such instance variable is supported by @p + * _dev + * @retval EINVAL the instance variable was recognised but + * contains a read-only value + */ + +static __inline int BUS_WRITE_IVAR(device_t _dev, device_t _child, int _indx, + uintptr_t _value) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)_dev)->ops,bus_write_ivar); + return ((bus_write_ivar_t *) _m)(_dev, _child, _indx, _value); +} + +/** @brief Unique descriptor for the BUS_CHILD_DELETED() method */ +extern struct kobjop_desc bus_child_deleted_desc; +/** @brief A function implementing the BUS_CHILD_DELETED() method */ +typedef void bus_child_deleted_t(device_t _dev, device_t _child); +/** + * @brief Notify a bus that a child was deleted + * + * Called at the beginning of device_delete_child() to allow the parent + * to teardown any bus-specific state for the child. + * + * @param _dev the device whose child is being deleted + * @param _child the child device which is being deleted + */ + +static __inline void BUS_CHILD_DELETED(device_t _dev, device_t _child) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)_dev)->ops,bus_child_deleted); + ((bus_child_deleted_t *) _m)(_dev, _child); +} + +/** @brief Unique descriptor for the BUS_CHILD_DETACHED() method */ +extern struct kobjop_desc bus_child_detached_desc; +/** @brief A function implementing the BUS_CHILD_DETACHED() method */ +typedef void bus_child_detached_t(device_t _dev, device_t _child); +/** + * @brief Notify a bus that a child was detached + * + * Called after the child's DEVICE_DETACH() method to allow the parent + * to reclaim any resources allocated on behalf of the child. 
+ * @param _dev the device whose child changed state
+ * @param _child the child device which changed state
+ */
+
+static __inline void BUS_CHILD_DETACHED(device_t _dev, device_t _child)
+{
+ kobjop_t _m;
+ KOBJOPLOOKUP(((kobj_t)_dev)->ops,bus_child_detached);
+ ((bus_child_detached_t *) _m)(_dev, _child);
+}
+
+/** @brief Unique descriptor for the BUS_DRIVER_ADDED() method */
+extern struct kobjop_desc bus_driver_added_desc;
+/** @brief A function implementing the BUS_DRIVER_ADDED() method */
+typedef void bus_driver_added_t(device_t _dev, driver_t *_driver);
+/**
+ * @brief Notify a bus that a new driver was added
+ *
+ * Called when a new driver is added to the devclass which owns this
+ * bus. The generic implementation of this method attempts to probe and
+ * attach any un-matched children of the bus.
+ *
+ * @param _dev the device whose devclass had a new driver
+ * added to it
+ * @param _driver the new driver which was added
+ */
+
+static __inline void BUS_DRIVER_ADDED(device_t _dev, driver_t *_driver)
+{
+ kobjop_t _m;
+ KOBJOPLOOKUP(((kobj_t)_dev)->ops,bus_driver_added);
+ ((bus_driver_added_t *) _m)(_dev, _driver);
+}
+
+/** @brief Unique descriptor for the BUS_ADD_CHILD() method */
+extern struct kobjop_desc bus_add_child_desc;
+/** @brief A function implementing the BUS_ADD_CHILD() method */
+typedef device_t bus_add_child_t(device_t _dev, u_int _order, const char *_name,
+ int _unit);
+/**
+ * @brief Create a new child device
+ *
+ * For busses which use drivers supporting DEVICE_IDENTIFY() to
+ * enumerate their devices, this method is used to create new
+ * device instances. The new device will be added after the last
+ * existing child with the same order. Implementations of bus_add_child
+ * call device_add_child_ordered to add the child and often add
+ * a suitable ivar to the device specific to that bus.
+ *
+ * @param _dev the bus device which will be the parent of the
+ * new child device
+ * @param _order a value which is used to partially sort the
+ * children of @p _dev - devices created using
+ * lower values of @p _order appear first in @p
+ * _dev's list of children
+ * @param _name devclass name for new device or @c NULL if not
+ * specified
+ * @param _unit unit number for new device or @c -1 if not
+ * specified
+ */
+
+static __inline device_t BUS_ADD_CHILD(device_t _dev, u_int _order,
+ const char *_name, int _unit)
+{
+ kobjop_t _m;
+ KOBJOPLOOKUP(((kobj_t)_dev)->ops,bus_add_child);
+ return ((bus_add_child_t *) _m)(_dev, _order, _name, _unit);
+}
+
+/** @brief Unique descriptor for the BUS_RESCAN() method */
+extern struct kobjop_desc bus_rescan_desc;
+/** @brief A function implementing the BUS_RESCAN() method */
+typedef int bus_rescan_t(device_t _dev);
+/**
+ * @brief Rescan the bus
+ *
+ * This method is called by a parent bridge or devctl to trigger a bus
+ * rescan. The rescan should delete devices no longer present and
+ * enumerate devices that have newly arrived.
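+ *
+ * For instance, from userland (an editor's illustration of the devctl(8)
+ * path mentioned above; the bus name is an example):
+ * @code
+ * devctl rescan pci0
+ * @endcode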
+ * + * @param _dev the bus device + */ + +static __inline int BUS_RESCAN(device_t _dev) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)_dev)->ops,bus_rescan); + return ((bus_rescan_t *) _m)(_dev); +} + +/** @brief Unique descriptor for the BUS_ALLOC_RESOURCE() method */ +extern struct kobjop_desc bus_alloc_resource_desc; +/** @brief A function implementing the BUS_ALLOC_RESOURCE() method */ +typedef struct resource * bus_alloc_resource_t(device_t _dev, device_t _child, + int _type, int *_rid, + rman_res_t _start, + rman_res_t _end, + rman_res_t _count, u_int _flags); +/** + * @brief Allocate a system resource + * + * This method is called by child devices of a bus to allocate resources. + * The types are defined in <machine/resource.h>; the meaning of the + * resource-ID field varies from bus to bus (but @p *rid == 0 is always + * valid if the resource type is). If a resource was allocated and the + * caller did not use the RF_ACTIVE to specify that it should be + * activated immediately, the caller is responsible for calling + * BUS_ACTIVATE_RESOURCE() when it actually uses the resource. + * + * @param _dev the parent device of @p _child + * @param _child the device which is requesting an allocation + * @param _type the type of resource to allocate + * @param _rid a pointer to the resource identifier + * @param _start hint at the start of the resource range - pass + * @c 0 for any start address + * @param _end hint at the end of the resource range - pass + * @c ~0 for any end address + * @param _count hint at the size of range required - pass @c 1 + * for any size + * @param _flags any extra flags to control the resource + * allocation - see @c RF_XXX flags in + * <sys/rman.h> for details + * + * @returns the resource which was allocated or @c NULL if no + * resource could be allocated + */ + +static __inline struct resource * BUS_ALLOC_RESOURCE(device_t _dev, + device_t _child, int _type, + int *_rid, + rman_res_t _start, + rman_res_t _end, + rman_res_t _count, + u_int _flags) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)_dev)->ops,bus_alloc_resource); + return ((bus_alloc_resource_t *) _m)(_dev, _child, _type, _rid, _start, _end, _count, _flags); +} + +/** @brief Unique descriptor for the BUS_ACTIVATE_RESOURCE() method */ +extern struct kobjop_desc bus_activate_resource_desc; +/** @brief A function implementing the BUS_ACTIVATE_RESOURCE() method */ +typedef int bus_activate_resource_t(device_t _dev, device_t _child, int _type, + int _rid, struct resource *_r); +/** + * @brief Activate a resource + * + * Activate a resource previously allocated with + * BUS_ALLOC_RESOURCE(). This may enable decoding of this resource in a + * device for instance. It will also establish a mapping for the resource + * unless RF_UNMAPPED was set when allocating the resource. 
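+ *
+ * Most drivers combine allocation and activation through the
+ * bus_alloc_resource_any() wrapper (an editor's sketch; the choice of
+ * PCIR_BAR(0) is illustrative only):
+ * @code
+ * int rid = PCIR_BAR(0);    // illustrative BAR
+ * struct resource *res;
+ *
+ * res = bus_alloc_resource_any(dev, SYS_RES_MEMORY, &rid, RF_ACTIVE);
+ * if (res == NULL)
+ *         return (ENXIO);
+ * @endcode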
+ * + * @param _dev the parent device of @p _child + * @param _child the device which allocated the resource + * @param _type the type of resource + * @param _rid the resource identifier + * @param _r the resource to activate + */ + +static __inline int BUS_ACTIVATE_RESOURCE(device_t _dev, device_t _child, + int _type, int _rid, + struct resource *_r) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)_dev)->ops,bus_activate_resource); + return ((bus_activate_resource_t *) _m)(_dev, _child, _type, _rid, _r); +} + +/** @brief Unique descriptor for the BUS_MAP_RESOURCE() method */ +extern struct kobjop_desc bus_map_resource_desc; +/** @brief A function implementing the BUS_MAP_RESOURCE() method */ +typedef int bus_map_resource_t(device_t _dev, device_t _child, int _type, + struct resource *_r, + struct resource_map_request *_args, + struct resource_map *_map); +/** + * @brief Map a resource + * + * Allocate a mapping for a range of an active resource. The mapping + * is described by a struct resource_map object. This may for instance + * map a memory region into the kernel's virtual address space. + * + * @param _dev the parent device of @p _child + * @param _child the device which allocated the resource + * @param _type the type of resource + * @param _r the resource to map + * @param _args optional attributes of the mapping + * @param _map the mapping + */ + +static __inline int BUS_MAP_RESOURCE(device_t _dev, device_t _child, int _type, + struct resource *_r, + struct resource_map_request *_args, + struct resource_map *_map) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)_dev)->ops,bus_map_resource); + return ((bus_map_resource_t *) _m)(_dev, _child, _type, _r, _args, _map); +} + +/** @brief Unique descriptor for the BUS_UNMAP_RESOURCE() method */ +extern struct kobjop_desc bus_unmap_resource_desc; +/** @brief A function implementing the BUS_UNMAP_RESOURCE() method */ +typedef int bus_unmap_resource_t(device_t _dev, device_t _child, int _type, + struct resource *_r, + struct resource_map *_map); +/** + * @brief Unmap a resource + * + * Release a mapping previously allocated with + * BUS_MAP_RESOURCE(). This may for instance unmap a memory region + * from the kernel's virtual address space. + * + * @param _dev the parent device of @p _child + * @param _child the device which allocated the resource + * @param _type the type of resource + * @param _r the resource + * @param _map the mapping to release + */ + +static __inline int BUS_UNMAP_RESOURCE(device_t _dev, device_t _child, + int _type, struct resource *_r, + struct resource_map *_map) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)_dev)->ops,bus_unmap_resource); + return ((bus_unmap_resource_t *) _m)(_dev, _child, _type, _r, _map); +} + +/** @brief Unique descriptor for the BUS_DEACTIVATE_RESOURCE() method */ +extern struct kobjop_desc bus_deactivate_resource_desc; +/** @brief A function implementing the BUS_DEACTIVATE_RESOURCE() method */ +typedef int bus_deactivate_resource_t(device_t _dev, device_t _child, int _type, + int _rid, struct resource *_r); +/** + * @brief Deactivate a resource + * + * Deactivate a resource previously allocated with + * BUS_ALLOC_RESOURCE(). 
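+ *
+ * Typically reached through the bus_deactivate_resource() wrapper
+ * (an editor's sketch; rid and res come from the earlier allocation):
+ * @code
+ * bus_deactivate_resource(dev, SYS_RES_MEMORY, rid, res);
+ * @endcode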
+ *
+ * @param _dev the parent device of @p _child
+ * @param _child the device which allocated the resource
+ * @param _type the type of resource
+ * @param _rid the resource identifier
+ * @param _r the resource to deactivate
+ */
+
+static __inline int BUS_DEACTIVATE_RESOURCE(device_t _dev, device_t _child,
+ int _type, int _rid,
+ struct resource *_r)
+{
+ kobjop_t _m;
+ KOBJOPLOOKUP(((kobj_t)_dev)->ops,bus_deactivate_resource);
+ return ((bus_deactivate_resource_t *) _m)(_dev, _child, _type, _rid, _r);
+}
+
+/** @brief Unique descriptor for the BUS_ADJUST_RESOURCE() method */
+extern struct kobjop_desc bus_adjust_resource_desc;
+/** @brief A function implementing the BUS_ADJUST_RESOURCE() method */
+typedef int bus_adjust_resource_t(device_t _dev, device_t _child, int _type,
+ struct resource *_res, rman_res_t _start,
+ rman_res_t _end);
+/**
+ * @brief Adjust a resource
+ *
+ * Adjust the start and/or end of a resource allocated by
+ * BUS_ALLOC_RESOURCE. At least part of the new address range must overlap
+ * with the existing address range. If successful, the resource's range
+ * will be adjusted to [start, end] on return.
+ *
+ * @param _dev the parent device of @p _child
+ * @param _child the device which allocated the resource
+ * @param _type the type of resource
+ * @param _res the resource to adjust
+ * @param _start the new starting address of the resource range
+ * @param _end the new ending address of the resource range
+ */
+
+static __inline int BUS_ADJUST_RESOURCE(device_t _dev, device_t _child,
+ int _type, struct resource *_res,
+ rman_res_t _start, rman_res_t _end)
+{
+ kobjop_t _m;
+ KOBJOPLOOKUP(((kobj_t)_dev)->ops,bus_adjust_resource);
+ return ((bus_adjust_resource_t *) _m)(_dev, _child, _type, _res, _start, _end);
+}
+
+/** @brief Unique descriptor for the BUS_RELEASE_RESOURCE() method */
+extern struct kobjop_desc bus_release_resource_desc;
+/** @brief A function implementing the BUS_RELEASE_RESOURCE() method */
+typedef int bus_release_resource_t(device_t _dev, device_t _child, int _type,
+ int _rid, struct resource *_res);
+/**
+ * @brief Release a resource
+ *
+ * Free a resource allocated by BUS_ALLOC_RESOURCE(). The @p _rid
+ * value must be the same as the one returned by BUS_ALLOC_RESOURCE()
+ * (which is not necessarily the same as the one the client passed).
+ *
+ * @param _dev the parent device of @p _child
+ * @param _child the device which allocated the resource
+ * @param _type the type of resource
+ * @param _rid the resource identifier
+ * @param _r the resource to release
+ */
+
+static __inline int BUS_RELEASE_RESOURCE(device_t _dev, device_t _child,
+ int _type, int _rid,
+ struct resource *_res)
+{
+ kobjop_t _m;
+ KOBJOPLOOKUP(((kobj_t)_dev)->ops,bus_release_resource);
+ return ((bus_release_resource_t *) _m)(_dev, _child, _type, _rid, _res);
+}
+
+/** @brief Unique descriptor for the BUS_SETUP_INTR() method */
+extern struct kobjop_desc bus_setup_intr_desc;
+/** @brief A function implementing the BUS_SETUP_INTR() method */
+typedef int bus_setup_intr_t(device_t _dev, device_t _child,
+ struct resource *_irq, int _flags,
+ driver_filter_t *_filter, driver_intr_t *_intr,
+ void *_arg, void **_cookiep);
+/**
+ * @brief Install an interrupt handler
+ *
+ * This method is used to associate an interrupt handler function with
+ * an irq resource. When the interrupt triggers, the function @p _intr
+ * will be called with the value of @p _arg as its single
+ * argument.
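+ *
+ * A typical network driver registers its handler through the
+ * bus_setup_intr() wrapper (an editor's sketch; the "foo" softc and
+ * handler names are hypothetical):
+ * @code
+ * error = bus_setup_intr(dev, sc->irq_res, INTR_TYPE_NET | INTR_MPSAFE,
+ *     NULL, foo_intr, sc, &sc->intr_cookie);
+ * @endcode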
The value returned in @p *_cookiep is used to cancel the + * interrupt handler - the caller should save this value to use in a + * future call to BUS_TEARDOWN_INTR(). + * + * @param _dev the parent device of @p _child + * @param _child the device which allocated the resource + * @param _irq the resource representing the interrupt + * @param _flags a set of bits from enum intr_type specifying + * the class of interrupt + * @param _intr the function to call when the interrupt + * triggers + * @param _arg a value to use as the single argument in calls + * to @p _intr + * @param _cookiep a pointer to a location to receive a cookie + * value that may be used to remove the interrupt + * handler + */ + +static __inline int BUS_SETUP_INTR(device_t _dev, device_t _child, + struct resource *_irq, int _flags, + driver_filter_t *_filter, + driver_intr_t *_intr, void *_arg, + void **_cookiep) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)_dev)->ops,bus_setup_intr); + return ((bus_setup_intr_t *) _m)(_dev, _child, _irq, _flags, _filter, _intr, _arg, _cookiep); +} + +/** @brief Unique descriptor for the BUS_TEARDOWN_INTR() method */ +extern struct kobjop_desc bus_teardown_intr_desc; +/** @brief A function implementing the BUS_TEARDOWN_INTR() method */ +typedef int bus_teardown_intr_t(device_t _dev, device_t _child, + struct resource *_irq, void *_cookie); +/** + * @brief Uninstall an interrupt handler + * + * This method is used to disassociate an interrupt handler function + * with an irq resource. The value of @p _cookie must be the value + * returned from a previous call to BUS_SETUP_INTR(). + * + * @param _dev the parent device of @p _child + * @param _child the device which allocated the resource + * @param _irq the resource representing the interrupt + * @param _cookie the cookie value returned when the interrupt + * was originally registered + */ + +static __inline int BUS_TEARDOWN_INTR(device_t _dev, device_t _child, + struct resource *_irq, void *_cookie) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)_dev)->ops,bus_teardown_intr); + return ((bus_teardown_intr_t *) _m)(_dev, _child, _irq, _cookie); +} + +/** @brief Unique descriptor for the BUS_SET_RESOURCE() method */ +extern struct kobjop_desc bus_set_resource_desc; +/** @brief A function implementing the BUS_SET_RESOURCE() method */ +typedef int bus_set_resource_t(device_t _dev, device_t _child, int _type, + int _rid, rman_res_t _start, rman_res_t _count); +/** + * @brief Define a resource which can be allocated with + * BUS_ALLOC_RESOURCE(). + * + * This method is used by some busses (typically ISA) to allow a + * driver to describe a resource range that it would like to + * allocate. The resource defined by @p _type and @p _rid is defined + * to start at @p _start and to include @p _count indices in its + * range. 
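+ *
+ * For example, via the bus_set_resource() wrapper (an editor's sketch
+ * with illustrative ISA-style values):
+ * @code
+ * bus_set_resource(child, SYS_RES_IOPORT, 0, 0x300, 8);
+ * @endcode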
+ * + * @param _dev the parent device of @p _child + * @param _child the device which owns the resource + * @param _type the type of resource + * @param _rid the resource identifier + * @param _start the start of the resource range + * @param _count the size of the resource range + */ + +static __inline int BUS_SET_RESOURCE(device_t _dev, device_t _child, int _type, + int _rid, rman_res_t _start, + rman_res_t _count) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)_dev)->ops,bus_set_resource); + return ((bus_set_resource_t *) _m)(_dev, _child, _type, _rid, _start, _count); +} + +/** @brief Unique descriptor for the BUS_GET_RESOURCE() method */ +extern struct kobjop_desc bus_get_resource_desc; +/** @brief A function implementing the BUS_GET_RESOURCE() method */ +typedef int bus_get_resource_t(device_t _dev, device_t _child, int _type, + int _rid, rman_res_t *_startp, + rman_res_t *_countp); +/** + * @brief Describe a resource + * + * This method allows a driver to examine the range used for a given + * resource without actually allocating it. + * + * @param _dev the parent device of @p _child + * @param _child the device which owns the resource + * @param _type the type of resource + * @param _rid the resource identifier + * @param _start the address of a location to receive the start + * index of the resource range + * @param _count the address of a location to receive the size + * of the resource range + */ + +static __inline int BUS_GET_RESOURCE(device_t _dev, device_t _child, int _type, + int _rid, rman_res_t *_startp, + rman_res_t *_countp) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)_dev)->ops,bus_get_resource); + return ((bus_get_resource_t *) _m)(_dev, _child, _type, _rid, _startp, _countp); +} + +/** @brief Unique descriptor for the BUS_DELETE_RESOURCE() method */ +extern struct kobjop_desc bus_delete_resource_desc; +/** @brief A function implementing the BUS_DELETE_RESOURCE() method */ +typedef void bus_delete_resource_t(device_t _dev, device_t _child, int _type, + int _rid); +/** + * @brief Delete a resource. + * + * Use this to delete a resource (possibly one previously added with + * BUS_SET_RESOURCE()). + * + * @param _dev the parent device of @p _child + * @param _child the device which owns the resource + * @param _type the type of resource + * @param _rid the resource identifier + */ + +static __inline void BUS_DELETE_RESOURCE(device_t _dev, device_t _child, + int _type, int _rid) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)_dev)->ops,bus_delete_resource); + ((bus_delete_resource_t *) _m)(_dev, _child, _type, _rid); +} + +/** @brief Unique descriptor for the BUS_GET_RESOURCE_LIST() method */ +extern struct kobjop_desc bus_get_resource_list_desc; +/** @brief A function implementing the BUS_GET_RESOURCE_LIST() method */ +typedef struct resource_list * bus_get_resource_list_t(device_t _dev, + device_t _child); +/** + * @brief Return a struct resource_list. + * + * Used by drivers which use bus_generic_rl_alloc_resource() etc. to + * implement their resource handling. It should return the resource + * list of the given child device. 
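+ *
+ * A bus keeping the list in its per-child ivars might implement it as
+ * follows (an editor's sketch; the "foo" names are hypothetical):
+ * @code
+ * static struct resource_list *
+ * foo_get_resource_list(device_t bus, device_t child)
+ * {
+ *         struct foo_devinfo *dinfo = device_get_ivars(child);
+ *
+ *         return (&dinfo->resources);    // per-child resource list
+ * }
+ * @endcode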
+ *
+ * @param _dev the parent device of @p _child
+ * @param _child the device which owns the resource list
+ */
+
+static __inline struct resource_list * BUS_GET_RESOURCE_LIST(device_t _dev,
+ device_t _child)
+{
+ kobjop_t _m;
+ KOBJOPLOOKUP(((kobj_t)_dev)->ops,bus_get_resource_list);
+ return ((bus_get_resource_list_t *) _m)(_dev, _child);
+}
+
+/** @brief Unique descriptor for the BUS_CHILD_PRESENT() method */
+extern struct kobjop_desc bus_child_present_desc;
+/** @brief A function implementing the BUS_CHILD_PRESENT() method */
+typedef int bus_child_present_t(device_t _dev, device_t _child);
+/**
+ * @brief Is the hardware described by @p _child still attached to the
+ * system?
+ *
+ * This method should return 0 if the device is not present. It
+ * should return -1 if it is present. Any errors encountered in making
+ * this determination should be returned as a normal errno value. Client
+ * drivers are to assume that the device is present, even if there is
+ * an error determining if it is there. Busses are to try to avoid
+ * returning errors, but newcard will return an error if the device
+ * fails to implement this method.
+ *
+ * @param _dev the parent device of @p _child
+ * @param _child the device which is being examined
+ */
+
+static __inline int BUS_CHILD_PRESENT(device_t _dev, device_t _child)
+{
+ kobjop_t _m;
+ KOBJOPLOOKUP(((kobj_t)_dev)->ops,bus_child_present);
+ return ((bus_child_present_t *) _m)(_dev, _child);
+}
+
+/** @brief Unique descriptor for the BUS_CHILD_PNPINFO_STR() method */
+extern struct kobjop_desc bus_child_pnpinfo_str_desc;
+/** @brief A function implementing the BUS_CHILD_PNPINFO_STR() method */
+typedef int bus_child_pnpinfo_str_t(device_t _dev, device_t _child, char *_buf,
+ size_t _buflen);
+/**
+ * @brief Returns the pnp info for this device.
+ *
+ * Return it as a string. If the storage is insufficient for the
+ * string, then return EOVERFLOW.
+ *
+ * The string must be formatted as a space-separated list of
+ * name=value pairs. Names may only contain alphanumeric characters,
+ * underscores ('_') and hyphens ('-'). Values can contain any
+ * non-whitespace characters.
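+ * A PCI-style bus, for instance, might report something like the
+ * following (an editor's illustration; the exact fields vary by bus):
+ * @code
+ * vendor=0x1af4 device=0x1000 subvendor=0x1af4 class=0x020000
+ * @endcode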
Values containing whitespace can be + * quoted with double quotes ('"'). Double quotes and backslashes in + * quoted values can be escaped with backslashes ('\'). + * + * @param _dev the parent device of @p _child + * @param _child the device which is being examined + * @param _buf the address of a buffer to receive the location + * string + * @param _buflen the size of the buffer pointed to by @p _buf + */ + +static __inline int BUS_CHILD_LOCATION_STR(device_t _dev, device_t _child, + char *_buf, size_t _buflen) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)_dev)->ops,bus_child_location_str); + return ((bus_child_location_str_t *) _m)(_dev, _child, _buf, _buflen); +} + +/** @brief Unique descriptor for the BUS_BIND_INTR() method */ +extern struct kobjop_desc bus_bind_intr_desc; +/** @brief A function implementing the BUS_BIND_INTR() method */ +typedef int bus_bind_intr_t(device_t _dev, device_t _child, + struct resource *_irq, int _cpu); +/** + * @brief Allow drivers to request that an interrupt be bound to a specific + * CPU. + * + * @param _dev the parent device of @p _child + * @param _child the device which allocated the resource + * @param _irq the resource representing the interrupt + * @param _cpu the CPU to bind the interrupt to + */ + +static __inline int BUS_BIND_INTR(device_t _dev, device_t _child, + struct resource *_irq, int _cpu) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)_dev)->ops,bus_bind_intr); + return ((bus_bind_intr_t *) _m)(_dev, _child, _irq, _cpu); +} + +/** @brief Unique descriptor for the BUS_CONFIG_INTR() method */ +extern struct kobjop_desc bus_config_intr_desc; +/** @brief A function implementing the BUS_CONFIG_INTR() method */ +typedef int bus_config_intr_t(device_t _dev, int _irq, enum intr_trigger _trig, + enum intr_polarity _pol); +/** + * @brief Allow (bus) drivers to specify the trigger mode and polarity + * of the specified interrupt. + * + * @param _dev the bus device + * @param _irq the interrupt number to modify + * @param _trig the trigger mode required + * @param _pol the interrupt polarity required + */ + +static __inline int BUS_CONFIG_INTR(device_t _dev, int _irq, + enum intr_trigger _trig, + enum intr_polarity _pol) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)_dev)->ops,bus_config_intr); + return ((bus_config_intr_t *) _m)(_dev, _irq, _trig, _pol); +} + +/** @brief Unique descriptor for the BUS_DESCRIBE_INTR() method */ +extern struct kobjop_desc bus_describe_intr_desc; +/** @brief A function implementing the BUS_DESCRIBE_INTR() method */ +typedef int bus_describe_intr_t(device_t _dev, device_t _child, + struct resource *_irq, void *_cookie, + const char *_descr); +/** + * @brief Allow drivers to associate a description with an active + * interrupt handler. 
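+ *
+ * For example, through the bus_describe_intr() wrapper, labelling a
+ * per-queue interrupt (an editor's sketch; the softc fields are
+ * hypothetical):
+ * @code
+ * bus_describe_intr(dev, sc->irq_res, sc->intr_cookie, "que%d", qid);
+ * @endcode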
+ * + * @param _dev the parent device of @p _child + * @param _child the device which allocated the resource + * @param _irq the resource representing the interrupt + * @param _cookie the cookie value returned when the interrupt + * was originally registered + * @param _descr the description to associate with the interrupt + */ + +static __inline int BUS_DESCRIBE_INTR(device_t _dev, device_t _child, + struct resource *_irq, void *_cookie, + const char *_descr) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)_dev)->ops,bus_describe_intr); + return ((bus_describe_intr_t *) _m)(_dev, _child, _irq, _cookie, _descr); +} + +/** @brief Unique descriptor for the BUS_HINTED_CHILD() method */ +extern struct kobjop_desc bus_hinted_child_desc; +/** @brief A function implementing the BUS_HINTED_CHILD() method */ +typedef void bus_hinted_child_t(device_t _dev, const char *_dname, int _dunit); +/** + * @brief Notify a (bus) driver about a child that the hints mechanism + * believes it has discovered. + * + * The bus is responsible for then adding the child in the right order + * and discovering other things about the child. The bus driver is + * free to ignore this hint, to do special things, etc. It is all up + * to the bus driver to interpret. + * + * This method is only called in response to the parent bus asking for + * hinted devices to be enumerated. + * + * @param _dev the bus device + * @param _dname the name of the device w/o unit numbers + * @param _dunit the unit number of the device + */ + +static __inline void BUS_HINTED_CHILD(device_t _dev, const char *_dname, + int _dunit) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)_dev)->ops,bus_hinted_child); + ((bus_hinted_child_t *) _m)(_dev, _dname, _dunit); +} + +/** @brief Unique descriptor for the BUS_GET_DMA_TAG() method */ +extern struct kobjop_desc bus_get_dma_tag_desc; +/** @brief A function implementing the BUS_GET_DMA_TAG() method */ +typedef bus_dma_tag_t bus_get_dma_tag_t(device_t _dev, device_t _child); +/** + * @brief Returns bus_dma_tag_t for use w/ devices on the bus. + * + * @param _dev the parent device of @p _child + * @param _child the device to which the tag will belong + */ + +static __inline bus_dma_tag_t BUS_GET_DMA_TAG(device_t _dev, device_t _child) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)_dev)->ops,bus_get_dma_tag); + return ((bus_get_dma_tag_t *) _m)(_dev, _child); +} + +/** @brief Unique descriptor for the BUS_GET_BUS_TAG() method */ +extern struct kobjop_desc bus_get_bus_tag_desc; +/** @brief A function implementing the BUS_GET_BUS_TAG() method */ +typedef bus_space_tag_t bus_get_bus_tag_t(device_t _dev, device_t _child); +/** + * @brief Returns bus_space_tag_t for use w/ devices on the bus. + * + * @param _dev the parent device of @p _child + * @param _child the device to which the tag will belong + */ + +static __inline bus_space_tag_t BUS_GET_BUS_TAG(device_t _dev, device_t _child) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)_dev)->ops,bus_get_bus_tag); + return ((bus_get_bus_tag_t *) _m)(_dev, _child); +} + +/** @brief Unique descriptor for the BUS_HINT_DEVICE_UNIT() method */ +extern struct kobjop_desc bus_hint_device_unit_desc; +/** @brief A function implementing the BUS_HINT_DEVICE_UNIT() method */ +typedef void bus_hint_device_unit_t(device_t _dev, device_t _child, + const char *_name, int *_unitp); +/** + * @brief Allow the bus to determine the unit number of a device. 
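+ *
+ * The wiring typically comes from /boot/device.hints entries such as
+ * the following (an editor's illustration, assuming the classic uart
+ * wiring hints):
+ * @code
+ * hint.uart.0.at="isa"
+ * hint.uart.0.port="0x3F8"
+ * @endcode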
+ * + * @param _dev the parent device of @p _child + * @param _child the device whose unit is to be wired + * @param _name the name of the device's new devclass + * @param _unitp a pointer to the device's new unit value + */ + +static __inline void BUS_HINT_DEVICE_UNIT(device_t _dev, device_t _child, + const char *_name, int *_unitp) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)_dev)->ops,bus_hint_device_unit); + ((bus_hint_device_unit_t *) _m)(_dev, _child, _name, _unitp); +} + +/** @brief Unique descriptor for the BUS_NEW_PASS() method */ +extern struct kobjop_desc bus_new_pass_desc; +/** @brief A function implementing the BUS_NEW_PASS() method */ +typedef void bus_new_pass_t(device_t _dev); +/** + * @brief Notify a bus that the bus pass level has been changed + * + * @param _dev the bus device + */ + +static __inline void BUS_NEW_PASS(device_t _dev) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)_dev)->ops,bus_new_pass); + ((bus_new_pass_t *) _m)(_dev); +} + +/** @brief Unique descriptor for the BUS_REMAP_INTR() method */ +extern struct kobjop_desc bus_remap_intr_desc; +/** @brief A function implementing the BUS_REMAP_INTR() method */ +typedef int bus_remap_intr_t(device_t _dev, device_t _child, u_int _irq); +/** + * @brief Notify a bus that specified child's IRQ should be remapped. + * + * @param _dev the bus device + * @param _child the child device + * @param _irq the irq number + */ + +static __inline int BUS_REMAP_INTR(device_t _dev, device_t _child, u_int _irq) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)_dev)->ops,bus_remap_intr); + return ((bus_remap_intr_t *) _m)(_dev, _child, _irq); +} + +/** @brief Unique descriptor for the BUS_SUSPEND_CHILD() method */ +extern struct kobjop_desc bus_suspend_child_desc; +/** @brief A function implementing the BUS_SUSPEND_CHILD() method */ +typedef int bus_suspend_child_t(device_t _dev, device_t _child); +/** + * @brief Suspend a given child + * + * @param _dev the parent device of @p _child + * @param _child the device to suspend + */ + +static __inline int BUS_SUSPEND_CHILD(device_t _dev, device_t _child) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)_dev)->ops,bus_suspend_child); + return ((bus_suspend_child_t *) _m)(_dev, _child); +} + +/** @brief Unique descriptor for the BUS_RESUME_CHILD() method */ +extern struct kobjop_desc bus_resume_child_desc; +/** @brief A function implementing the BUS_RESUME_CHILD() method */ +typedef int bus_resume_child_t(device_t _dev, device_t _child); +/** + * @brief Resume a given child + * + * @param _dev the parent device of @p _child + * @param _child the device to resume + */ + +static __inline int BUS_RESUME_CHILD(device_t _dev, device_t _child) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)_dev)->ops,bus_resume_child); + return ((bus_resume_child_t *) _m)(_dev, _child); +} + +/** @brief Unique descriptor for the BUS_GET_DOMAIN() method */ +extern struct kobjop_desc bus_get_domain_desc; +/** @brief A function implementing the BUS_GET_DOMAIN() method */ +typedef int bus_get_domain_t(device_t _dev, device_t _child, int *_domain); +/** + * @brief Get the VM domain handle for the given bus and child. 
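+ *
+ * Drivers normally use the bus_get_domain() wrapper and fall back to a
+ * default when no NUMA information is available (an editor's sketch):
+ * @code
+ * int domain;
+ *
+ * if (bus_get_domain(dev, &domain) != 0)
+ *         domain = 0;    // no NUMA information available
+ * @endcode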
+ * + * @param _dev the bus device + * @param _child the child device + * @param _domain a pointer to the bus's domain handle identifier + */ + +static __inline int BUS_GET_DOMAIN(device_t _dev, device_t _child, int *_domain) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)_dev)->ops,bus_get_domain); + return ((bus_get_domain_t *) _m)(_dev, _child, _domain); +} + +/** @brief Unique descriptor for the BUS_GET_CPUS() method */ +extern struct kobjop_desc bus_get_cpus_desc; +/** @brief A function implementing the BUS_GET_CPUS() method */ +typedef int bus_get_cpus_t(device_t _dev, device_t _child, enum cpu_sets _op, + size_t _setsize, cpuset_t *_cpuset); +/** + * @brief Request a set of CPUs + * + * @param _dev the bus device + * @param _child the child device + * @param _op type of CPUs to request + * @param _setsize the size of the set passed in _cpuset + * @param _cpuset a pointer to a cpuset to receive the requested + * set of CPUs + */ + +static __inline int BUS_GET_CPUS(device_t _dev, device_t _child, + enum cpu_sets _op, size_t _setsize, + cpuset_t *_cpuset) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)_dev)->ops,bus_get_cpus); + return ((bus_get_cpus_t *) _m)(_dev, _child, _op, _setsize, _cpuset); +} + +#endif /* _bus_if_h_ */ diff -u -r -N usr/src/sys/modules/netmap/device_if.h /usr/src/sys/modules/netmap/device_if.h --- usr/src/sys/modules/netmap/device_if.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/device_if.h 2016-11-23 17:05:33.813056000 +0000 @@ -0,0 +1,371 @@ +/* + * This file is produced automatically. + * Do not modify anything in here by hand. + * + * Created from source file + * /usr/src/sys/kern/device_if.m + * with + * makeobjops.awk + * + * See the source file for legal information + */ + +/** + * @defgroup DEVICE device - KObj methods for all device drivers + * @brief A basic set of methods required for all device drivers. + * + * The device interface is used to match devices to drivers during + * autoconfiguration and provides methods to allow drivers to handle + * system-wide events such as suspend, resume or shutdown. + * @{ + */ + +#ifndef _device_if_h_ +#define _device_if_h_ + +/** @brief Unique descriptor for the DEVICE_PROBE() method */ +extern struct kobjop_desc device_probe_desc; +/** @brief A function implementing the DEVICE_PROBE() method */ +typedef int device_probe_t(device_t dev); +/** + * @brief Probe to see if a device matches a driver. + * + * Users should not call this method directly. Normally, this + * is called via device_probe_and_attach() to select a driver + * calling the DEVICE_PROBE() of all candidate drivers and attach + * the winning driver (if any) to the device. + * + * This function is used to match devices to device drivers. + * Typically, the driver will examine the device to see if + * it is suitable for this driver. This might include checking + * the values of various device instance variables or reading + * hardware registers. + * + * In some cases, there may be more than one driver available + * which can be used for a device (for instance there might + * be a generic driver which works for a set of many types of + * device and a more specific driver which works for a subset + * of devices). Because of this, a driver should not assume + * that it will be the driver that attaches to the device even + * if it returns a success status from DEVICE_PROBE(). In particular, + * a driver must free any resources which it allocated during + * the probe before returning. 
The return value of DEVICE_PROBE() + * is used to elect which driver is used - the driver which returns + * the largest non-error value wins the election and attaches to + * the device. Common non-error values are described in the + * DEVICE_PROBE(9) manual page. + * + * If a driver matches the hardware, it should set the device + * description string using device_set_desc() or + * device_set_desc_copy(). This string is used to generate an + * informative message when DEVICE_ATTACH() is called. + * + * As a special case, if a driver returns zero, the driver election + * is cut short and that driver will attach to the device + * immediately. This should rarely be used. + * + * For example, a probe method for a PCI device driver might look + * like this: + * + * @code + * int + * foo_probe(device_t dev) + * { + * if (pci_get_vendor(dev) == FOOVENDOR && + * pci_get_device(dev) == FOODEVICE) { + * device_set_desc(dev, "Foo device"); + * return (BUS_PROBE_DEFAULT); + * } + * return (ENXIO); + * } + * @endcode + * + * To include this method in a device driver, use a line like this + * in the driver's method list: + * + * @code + * KOBJMETHOD(device_probe, foo_probe) + * @endcode + * + * @param dev the device to probe + * + * @retval 0 if this is the only possible driver for this + * device + * @retval negative if the driver can match this device - the + * least negative value is used to select the + * driver + * @retval ENXIO if the driver does not match the device + * @retval positive if some kind of error was detected during + * the probe, a regular unix error code should + * be returned to indicate the type of error + * @see DEVICE_ATTACH(), pci_get_vendor(), pci_get_device() + */ + +static __inline int DEVICE_PROBE(device_t dev) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)dev)->ops,device_probe); + return ((device_probe_t *) _m)(dev); +} + +/** @brief Unique descriptor for the DEVICE_IDENTIFY() method */ +extern struct kobjop_desc device_identify_desc; +/** @brief A function implementing the DEVICE_IDENTIFY() method */ +typedef void device_identify_t(driver_t *driver, device_t parent); +/** + * @brief Allow a device driver to detect devices not otherwise enumerated. + * + * The DEVICE_IDENTIFY() method is used by some drivers (e.g. the ISA + * bus driver) to help populate the bus device with a useful set of + * child devices, normally by calling the BUS_ADD_CHILD() method of + * the parent device. For instance, the ISA bus driver uses several + * special drivers, including the isahint driver and the pnp driver to + * create child devices based on configuration hints and PnP bus + * probes respectively. + * + * Many bus drivers which support true plug-and-play do not need to + * use this method at all since child devices can be discovered + * automatically without help from child drivers. 
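 *
 * As an illustration only (hypothetical "foo" driver, in the spirit of
 * the probe example above), an identify method often just adds a child
 * device when none exists yet:
 *
 * @code
 * static void
 * foo_identify(driver_t *driver, device_t parent)
 * {
 * 	if (device_find_child(parent, "foo", -1) == NULL)
 * 		BUS_ADD_CHILD(parent, 0, "foo", -1);
 * }
 * @endcode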
+ * + * To include this method in a device driver, use a line like this + * in the driver's method list: + * + * @code + * KOBJMETHOD(device_identify, foo_identify) + * @endcode + * + * @param driver the driver whose identify method is being called + * @param parent the parent device to use when adding new children + */ + +static __inline void DEVICE_IDENTIFY(driver_t *driver, device_t parent) +{ + kobjop_t _m; + KOBJOPLOOKUP(driver->ops,device_identify); + ((device_identify_t *) _m)(driver, parent); +} + +/** @brief Unique descriptor for the DEVICE_ATTACH() method */ +extern struct kobjop_desc device_attach_desc; +/** @brief A function implementing the DEVICE_ATTACH() method */ +typedef int device_attach_t(device_t dev); +/** + * @brief Attach a device to a device driver + * + * Normally only called via device_probe_and_attach(), this is called + * when a driver has succeeded in probing against a device. + * This method should initialise the hardware and allocate other + * system resources (e.g. devfs entries) as required. + * + * To include this method in a device driver, use a line like this + * in the driver's method list: + * + * @code + * KOBJMETHOD(device_attach, foo_attach) + * @endcode + * + * @param dev the device to probe + * + * @retval 0 success + * @retval non-zero if some kind of error was detected during + * the attach, a regular unix error code should + * be returned to indicate the type of error + * @see DEVICE_PROBE() + */ + +static __inline int DEVICE_ATTACH(device_t dev) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)dev)->ops,device_attach); + return ((device_attach_t *) _m)(dev); +} + +/** @brief Unique descriptor for the DEVICE_DETACH() method */ +extern struct kobjop_desc device_detach_desc; +/** @brief A function implementing the DEVICE_DETACH() method */ +typedef int device_detach_t(device_t dev); +/** + * @brief Detach a driver from a device. + * + * This can be called if the user is replacing the + * driver software or if a device is about to be physically removed + * from the system (e.g. for removable hardware such as USB or PCCARD). + * + * To include this method in a device driver, use a line like this + * in the driver's method list: + * + * @code + * KOBJMETHOD(device_detach, foo_detach) + * @endcode + * + * @param dev the device to detach + * + * @retval 0 success + * @retval non-zero the detach could not be performed, e.g. if the + * driver does not support detaching. + * + * @see DEVICE_ATTACH() + */ + +static __inline int DEVICE_DETACH(device_t dev) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)dev)->ops,device_detach); + return ((device_detach_t *) _m)(dev); +} + +/** @brief Unique descriptor for the DEVICE_SHUTDOWN() method */ +extern struct kobjop_desc device_shutdown_desc; +/** @brief A function implementing the DEVICE_SHUTDOWN() method */ +typedef int device_shutdown_t(device_t dev); +/** + * @brief Called during system shutdown. + * + * This method allows drivers to detect when the system is being shut down. + * Some drivers need to use this to place their hardware in a consistent + * state before rebooting the computer. 
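 *
 * For illustration (hypothetical "foo" driver): a shutdown method
 * typically just halts the hardware, where foo_stop() stands in for
 * whatever disables DMA and interrupts on the real device:
 *
 * @code
 * static int
 * foo_shutdown(device_t dev)
 * {
 * 	struct foo_softc *sc = device_get_softc(dev);
 *
 * 	foo_stop(sc);
 * 	return (0);
 * }
 * @endcode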
+ * + * To include this method in a device driver, use a line like this + * in the driver's method list: + * + * @code + * KOBJMETHOD(device_shutdown, foo_shutdown) + * @endcode + */ + +static __inline int DEVICE_SHUTDOWN(device_t dev) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)dev)->ops,device_shutdown); + return ((device_shutdown_t *) _m)(dev); +} + +/** @brief Unique descriptor for the DEVICE_SUSPEND() method */ +extern struct kobjop_desc device_suspend_desc; +/** @brief A function implementing the DEVICE_SUSPEND() method */ +typedef int device_suspend_t(device_t dev); +/** + * @brief This is called by the power-management subsystem when a + * suspend has been requested by the user or by some automatic + * mechanism. + * + * This gives drivers a chance to veto the suspend or save their + * configuration before power is removed. + * + * To include this method in a device driver, use a line like this in + * the driver's method list: + * + * @code + * KOBJMETHOD(device_suspend, foo_suspend) + * @endcode + * + * @param dev the device being suspended + * + * @retval 0 success + * @retval non-zero an error occurred while attempting to prepare the + * device for suspension + * + * @see DEVICE_RESUME() + */ + +static __inline int DEVICE_SUSPEND(device_t dev) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)dev)->ops,device_suspend); + return ((device_suspend_t *) _m)(dev); +} + +/** @brief Unique descriptor for the DEVICE_RESUME() method */ +extern struct kobjop_desc device_resume_desc; +/** @brief A function implementing the DEVICE_RESUME() method */ +typedef int device_resume_t(device_t dev); +/** + * @brief This is called when the system resumes after a suspend. + * + * To include this method in a device driver, use a line like this + * in the driver's method list: + * + * @code + * KOBJMETHOD(device_resume, foo_resume) + * @endcode + * + * @param dev the device being resumed + * + * @retval 0 success + * @retval non-zero an error occurred while attempting to restore the + * device from suspension + * + * @see DEVICE_SUSPEND() + */ + +static __inline int DEVICE_RESUME(device_t dev) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)dev)->ops,device_resume); + return ((device_resume_t *) _m)(dev); +} + +/** @brief Unique descriptor for the DEVICE_QUIESCE() method */ +extern struct kobjop_desc device_quiesce_desc; +/** @brief A function implementing the DEVICE_QUIESCE() method */ +typedef int device_quiesce_t(device_t dev); +/** + * @brief This is called when the driver is asked to quiesce itself. + * + * The driver should arrange for the orderly shutdown of this device. + * All further access to the device should be curtailed. Soon there + * will be a request to detach, but there won't necessarily be one. 
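 *
 * Sketch (hypothetical "foo" driver, where sc->foo_busy is an invented
 * state flag): a quiesce method commonly refuses while the device is
 * still in use and otherwise stops accepting new work:
 *
 * @code
 * static int
 * foo_quiesce(device_t dev)
 * {
 * 	struct foo_softc *sc = device_get_softc(dev);
 *
 * 	if (sc->foo_busy)
 * 		return (EBUSY);
 * 	return (0);
 * }
 * @endcode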
+ * + * To include this method in a device driver, use a line like this + * in the driver's method list: + * + * @code + * KOBJMETHOD(device_quiesce, foo_quiesce) + * @endcode + * + * @param dev the device being quiesced + * + * @retval 0 success + * @retval non-zero an error occurred while attempting to quiesce the + * device + * + * @see DEVICE_DETACH() + */ + +static __inline int DEVICE_QUIESCE(device_t dev) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)dev)->ops,device_quiesce); + return ((device_quiesce_t *) _m)(dev); +} + +/** @brief Unique descriptor for the DEVICE_REGISTER() method */ +extern struct kobjop_desc device_register_desc; +/** @brief A function implementing the DEVICE_REGISTER() method */ +typedef void * device_register_t(device_t dev); +/** + * @brief This is called when the driver is asked to register handlers. + * + * + * To include this method in a device driver, use a line like this + * in the driver's method list: + * + * @code + * KOBJMETHOD(device_register, foo_register) + * @endcode + * + * @param dev the device for which handlers are being registered + * + * @retval NULL method not implemented + * @retval non-NULL a pointer to implementation specific static driver state + * + */ + +static __inline void * DEVICE_REGISTER(device_t dev) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)dev)->ops,device_register); + return ((device_register_t *) _m)(dev); +} + +#endif /* _device_if_h_ */ Files usr/src/sys/modules/netmap/if_ptnet.o and /usr/src/sys/modules/netmap/if_ptnet.o differ diff -u -r -N usr/src/sys/modules/netmap/machine/_align.h /usr/src/sys/modules/netmap/machine/_align.h --- usr/src/sys/modules/netmap/machine/_align.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/_align.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,6 @@ +/*- + * This file is in the public domain. + */ +/* $FreeBSD: releng/11.0/sys/amd64/include/_align.h 215856 2010-11-26 10:59:20Z tijl $ */ + +#include <x86/_align.h> diff -u -r -N usr/src/sys/modules/netmap/machine/_bus.h /usr/src/sys/modules/netmap/machine/_bus.h --- usr/src/sys/modules/netmap/machine/_bus.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/_bus.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,46 @@ +/*- + * Copyright (c) 2005 M. Warner Losh. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions, and the following disclaimer, + * without modification, immediately at the beginning of the file. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR + * ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD: releng/11.0/sys/amd64/include/_bus.h 145253 2005-04-18 21:45:34Z imp $ + */ + +#ifndef AMD64_INCLUDE__BUS_H +#define AMD64_INCLUDE__BUS_H + +/* + * Bus address and size types + */ +typedef uint64_t bus_addr_t; +typedef uint64_t bus_size_t; + +/* + * Access methods for bus resources and address space. + */ +typedef uint64_t bus_space_tag_t; +typedef uint64_t bus_space_handle_t; + +#endif /* AMD64_INCLUDE__BUS_H */ diff -u -r -N usr/src/sys/modules/netmap/machine/_inttypes.h /usr/src/sys/modules/netmap/machine/_inttypes.h --- usr/src/sys/modules/netmap/machine/_inttypes.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/_inttypes.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,6 @@ +/*- + * This file is in the public domain. + */ +/* $FreeBSD: releng/11.0/sys/amd64/include/_inttypes.h 217157 2011-01-08 18:09:48Z tijl $ */ + +#include <x86/_inttypes.h> diff -u -r -N usr/src/sys/modules/netmap/machine/_limits.h /usr/src/sys/modules/netmap/machine/_limits.h --- usr/src/sys/modules/netmap/machine/_limits.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/_limits.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,6 @@ +/*- + * This file is in the public domain. + */ +/* $FreeBSD: releng/11.0/sys/amd64/include/_limits.h 232262 2012-02-28 18:24:28Z tijl $ */ + +#include <x86/_limits.h> diff -u -r -N usr/src/sys/modules/netmap/machine/_stdint.h /usr/src/sys/modules/netmap/machine/_stdint.h --- usr/src/sys/modules/netmap/machine/_stdint.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/_stdint.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,6 @@ +/*- + * This file is in the public domain. + */ +/* $FreeBSD: releng/11.0/sys/amd64/include/_stdint.h 232264 2012-02-28 18:38:33Z tijl $ */ + +#include <x86/_stdint.h> diff -u -r -N usr/src/sys/modules/netmap/machine/_types.h /usr/src/sys/modules/netmap/machine/_types.h --- usr/src/sys/modules/netmap/machine/_types.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/_types.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,6 @@ +/*- + * This file is in the public domain. + */ +/* $FreeBSD: releng/11.0/sys/amd64/include/_types.h 232261 2012-02-28 18:15:28Z tijl $ */ + +#include <x86/_types.h> diff -u -r -N usr/src/sys/modules/netmap/machine/acpica_machdep.h /usr/src/sys/modules/netmap/machine/acpica_machdep.h --- usr/src/sys/modules/netmap/machine/acpica_machdep.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/acpica_machdep.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,6 @@ +/*- + * This file is in the public domain. 
+ */ +/* $FreeBSD: releng/11.0/sys/amd64/include/acpica_machdep.h 254305 2013-08-13 22:05:10Z jkim $ */ + +#include <x86/acpica_machdep.h> diff -u -r -N usr/src/sys/modules/netmap/machine/apm_bios.h /usr/src/sys/modules/netmap/machine/apm_bios.h --- usr/src/sys/modules/netmap/machine/apm_bios.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/apm_bios.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,6 @@ +/*- + * This file is in the public domain. + */ +/* $FreeBSD: releng/11.0/sys/amd64/include/apm_bios.h 215140 2010-11-11 19:36:21Z jkim $ */ + +#include <x86/apm_bios.h> diff -u -r -N usr/src/sys/modules/netmap/machine/asm.h /usr/src/sys/modules/netmap/machine/asm.h --- usr/src/sys/modules/netmap/machine/asm.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/asm.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,99 @@ +/*- + * Copyright (c) 1990 The Regents of the University of California. + * All rights reserved. + * + * This code is derived from software contributed to Berkeley by + * William Jolitz. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 4. Neither the name of the University nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * from: @(#)DEFS.h 5.1 (Berkeley) 4/23/90 + * $FreeBSD: releng/11.0/sys/amd64/include/asm.h 275004 2014-11-25 03:50:31Z emaste $ + */ + +#ifndef _MACHINE_ASM_H_ +#define _MACHINE_ASM_H_ + +#include <sys/cdefs.h> + +#ifdef PIC +#define PIC_PLT(x) x@PLT +#define PIC_GOT(x) x@GOTPCREL(%rip) +#else +#define PIC_PLT(x) x +#endif + +/* + * CNAME and HIDENAME manage the relationship between symbol names in C + * and the equivalent assembly language names. CNAME is given a name as + * it would be used in a C program. It expands to the equivalent assembly + * language name. HIDENAME is given an assembly-language name, and expands + * to a possibly-modified form that will be invisible to C programs. 
+ */ +#define CNAME(csym) csym +#define HIDENAME(asmsym) .asmsym + +#define _START_ENTRY .text; .p2align 4,0x90 + +#define _ENTRY(x) _START_ENTRY; \ + .globl CNAME(x); .type CNAME(x),@function; CNAME(x): + +#ifdef PROF +#define ALTENTRY(x) _ENTRY(x); \ + pushq %rbp; movq %rsp,%rbp; \ + call PIC_PLT(HIDENAME(mcount)); \ + popq %rbp; \ + jmp 9f +#define ENTRY(x) _ENTRY(x); \ + pushq %rbp; movq %rsp,%rbp; \ + call PIC_PLT(HIDENAME(mcount)); \ + popq %rbp; \ + 9: +#else +#define ALTENTRY(x) _ENTRY(x) +#define ENTRY(x) _ENTRY(x) +#endif + +#define END(x) .size x, . - x +/* + * WEAK_REFERENCE(): create a weak reference alias from sym. + * The macro is not a general asm macro that takes arbitrary names, + * but one that takes only C names. It does the non-null name + * translation inside the macro. + */ +#define WEAK_REFERENCE(sym, alias) \ + .weak CNAME(alias); \ + .equ CNAME(alias),CNAME(sym) + +#define RCSID(x) .text; .asciz x + +#undef __FBSDID +#if !defined(lint) && !defined(STRIP_FBSDID) +#define __FBSDID(s) .ident s +#else +#define __FBSDID(s) /* nothing */ +#endif /* not lint and not STRIP_FBSDID */ + +#endif /* !_MACHINE_ASM_H_ */ diff -u -r -N usr/src/sys/modules/netmap/machine/asmacros.h /usr/src/sys/modules/netmap/machine/asmacros.h --- usr/src/sys/modules/netmap/machine/asmacros.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/asmacros.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,240 @@ +/*- + * Copyright (c) 1993 The Regents of the University of California. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 4. Neither the name of the University nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD: releng/11.0/sys/amd64/include/asmacros.h 274489 2014-11-13 22:11:44Z scottl $ + */ + +#ifndef _MACHINE_ASMACROS_H_ +#define _MACHINE_ASMACROS_H_ + +#include <sys/cdefs.h> + +/* XXX too much duplication in various asm*.h's. */ + +/* + * CNAME is used to manage the relationship between symbol names in C + * and the equivalent assembly language names. CNAME is given a name as + * it would be used in a C program. It expands to the equivalent assembly + * language name. 
+ */ +#define CNAME(csym) csym + +#define ALIGN_DATA .p2align 3 /* 8 byte alignment, zero filled */ +#ifdef GPROF +#define ALIGN_TEXT .p2align 4,0x90 /* 16-byte alignment, nop filled */ +#else +#define ALIGN_TEXT .p2align 4,0x90 /* 16-byte alignment, nop filled */ +#endif +#define SUPERALIGN_TEXT .p2align 4,0x90 /* 16-byte alignment, nop filled */ + +#define GEN_ENTRY(name) ALIGN_TEXT; .globl CNAME(name); \ + .type CNAME(name),@function; CNAME(name): +#define NON_GPROF_ENTRY(name) GEN_ENTRY(name) +#define NON_GPROF_RET .byte 0xc3 /* opcode for `ret' */ + +#define END(name) .size name, . - name + +#ifdef GPROF +/* + * __mcount is like [.]mcount except that doesn't require its caller to set + * up a frame pointer. It must be called before pushing anything onto the + * stack. gcc should eventually generate code to call __mcount in most + * cases. This would make -pg in combination with -fomit-frame-pointer + * useful. gcc has a configuration variable PROFILE_BEFORE_PROLOGUE to + * allow profiling before setting up the frame pointer, but this is + * inadequate for good handling of special cases, e.g., -fpic works best + * with profiling after the prologue. + * + * [.]mexitcount is a new function to support non-statistical profiling if an + * accurate clock is available. For C sources, calls to it are generated + * by the FreeBSD extension `-mprofiler-epilogue' to gcc. It is best to + * call [.]mexitcount at the end of a function like the MEXITCOUNT macro does, + * but gcc currently generates calls to it at the start of the epilogue to + * avoid problems with -fpic. + * + * [.]mcount and __mcount may clobber the call-used registers and %ef. + * [.]mexitcount may clobber %ecx and %ef. + * + * Cross-jumping makes non-statistical profiling timing more complicated. + * It is handled in many cases by calling [.]mexitcount before jumping. It + * is handled for conditional jumps using CROSSJUMP() and CROSSJUMP_LABEL(). + * It is handled for some fault-handling jumps by not sharing the exit + * routine. + * + * ALTENTRY() must be before a corresponding ENTRY() so that it can jump to + * the main entry point. Note that alt entries are counted twice. They + * have to be counted as ordinary entries for gprof to get the call times + * right for the ordinary entries. + * + * High local labels are used in macros to avoid clashes with local labels + * in functions. + * + * Ordinary `ret' is used instead of a macro `RET' because there are a lot + * of `ret's. 0xc3 is the opcode for `ret' (`#define ret ... ret' can't + * be used because this file is sometimes preprocessed in traditional mode). + * `ret' clobbers eflags but this doesn't matter. + */ +#define ALTENTRY(name) GEN_ENTRY(name) ; MCOUNT ; MEXITCOUNT ; jmp 9f +#define CROSSJUMP(jtrue, label, jfalse) \ + jfalse 8f; MEXITCOUNT; jmp __CONCAT(to,label); 8: +#define CROSSJUMPTARGET(label) \ + ALIGN_TEXT; __CONCAT(to,label): ; MCOUNT; jmp label +#define ENTRY(name) GEN_ENTRY(name) ; 9: ; MCOUNT +#define FAKE_MCOUNT(caller) pushq caller ; call __mcount ; popq %rcx +#define MCOUNT call __mcount +#define MCOUNT_LABEL(name) GEN_ENTRY(name) ; nop ; ALIGN_TEXT +#ifdef GUPROF +#define MEXITCOUNT call .mexitcount +#define ret MEXITCOUNT ; NON_GPROF_RET +#else +#define MEXITCOUNT +#endif + +#else /* !GPROF */ +/* + * ALTENTRY() has to align because it is before a corresponding ENTRY(). + * ENTRY() has to align to because there may be no ALTENTRY() before it. + * If there is a previous ALTENTRY() then the alignment code for ENTRY() + * is empty. 
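 *
 * For illustration (hypothetical symbol name), these entry macros are
 * used in assembler sources as:
 *
 *	ENTRY(foo_nop)
 *		ret
 *	END(foo_nop)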
+ */ +#define ALTENTRY(name) GEN_ENTRY(name) +#define CROSSJUMP(jtrue, label, jfalse) jtrue label +#define CROSSJUMPTARGET(label) +#define ENTRY(name) GEN_ENTRY(name) +#define FAKE_MCOUNT(caller) +#define MCOUNT +#define MCOUNT_LABEL(name) +#define MEXITCOUNT +#endif /* GPROF */ + +/* + * Convenience for adding frame pointers to hand-coded ASM. Useful for + * DTrace, HWPMC, and KDB. + */ +#define PUSH_FRAME_POINTER \ + pushq %rbp ; \ + movq %rsp, %rbp ; +#define POP_FRAME_POINTER \ + popq %rbp + +#ifdef LOCORE +/* + * Convenience macro for declaring interrupt entry points. + */ +#define IDTVEC(name) ALIGN_TEXT; .globl __CONCAT(X,name); \ + .type __CONCAT(X,name),@function; __CONCAT(X,name): + +/* + * Macros to create and destroy a trap frame. + */ +#define PUSH_FRAME \ + subq $TF_RIP,%rsp ; /* skip dummy tf_err and tf_trapno */ \ + testb $SEL_RPL_MASK,TF_CS(%rsp) ; /* come from kernel? */ \ + jz 1f ; /* Yes, dont swapgs again */ \ + swapgs ; \ +1: movq %rdi,TF_RDI(%rsp) ; \ + movq %rsi,TF_RSI(%rsp) ; \ + movq %rdx,TF_RDX(%rsp) ; \ + movq %rcx,TF_RCX(%rsp) ; \ + movq %r8,TF_R8(%rsp) ; \ + movq %r9,TF_R9(%rsp) ; \ + movq %rax,TF_RAX(%rsp) ; \ + movq %rbx,TF_RBX(%rsp) ; \ + movq %rbp,TF_RBP(%rsp) ; \ + movq %r10,TF_R10(%rsp) ; \ + movq %r11,TF_R11(%rsp) ; \ + movq %r12,TF_R12(%rsp) ; \ + movq %r13,TF_R13(%rsp) ; \ + movq %r14,TF_R14(%rsp) ; \ + movq %r15,TF_R15(%rsp) ; \ + movw %fs,TF_FS(%rsp) ; \ + movw %gs,TF_GS(%rsp) ; \ + movw %es,TF_ES(%rsp) ; \ + movw %ds,TF_DS(%rsp) ; \ + movl $TF_HASSEGS,TF_FLAGS(%rsp) ; \ + cld + +#define POP_FRAME \ + movq TF_RDI(%rsp),%rdi ; \ + movq TF_RSI(%rsp),%rsi ; \ + movq TF_RDX(%rsp),%rdx ; \ + movq TF_RCX(%rsp),%rcx ; \ + movq TF_R8(%rsp),%r8 ; \ + movq TF_R9(%rsp),%r9 ; \ + movq TF_RAX(%rsp),%rax ; \ + movq TF_RBX(%rsp),%rbx ; \ + movq TF_RBP(%rsp),%rbp ; \ + movq TF_R10(%rsp),%r10 ; \ + movq TF_R11(%rsp),%r11 ; \ + movq TF_R12(%rsp),%r12 ; \ + movq TF_R13(%rsp),%r13 ; \ + movq TF_R14(%rsp),%r14 ; \ + movq TF_R15(%rsp),%r15 ; \ + testb $SEL_RPL_MASK,TF_CS(%rsp) ; /* come from kernel? */ \ + jz 1f ; /* keep kernel GS.base */ \ + cli ; \ + swapgs ; \ +1: addq $TF_RIP,%rsp /* skip over tf_err, tf_trapno */ + +/* + * Access per-CPU data. + */ +#define PCPU(member) %gs:PC_ ## member +#define PCPU_ADDR(member, reg) \ + movq %gs:PC_PRVSPACE, reg ; \ + addq $PC_ ## member, reg + +#endif /* LOCORE */ + +#ifdef __STDC__ +#define ELFNOTE(name, type, desctype, descdata...) \ +.pushsection .note.name ; \ + .align 4 ; \ + .long 2f - 1f /* namesz */ ; \ + .long 4f - 3f /* descsz */ ; \ + .long type ; \ +1:.asciz #name ; \ +2:.align 4 ; \ +3:desctype descdata ; \ +4:.align 4 ; \ +.popsection +#else /* !__STDC__, i.e. -traditional */ +#define ELFNOTE(name, type, desctype, descdata) \ +.pushsection .note.name ; \ + .align 4 ; \ + .long 2f - 1f /* namesz */ ; \ + .long 4f - 3f /* descsz */ ; \ + .long type ; \ +1:.asciz "name" ; \ +2:.align 4 ; \ +3:desctype descdata ; \ +4:.align 4 ; \ +.popsection +#endif /* __STDC__ */ + +#endif /* !_MACHINE_ASMACROS_H_ */ diff -u -r -N usr/src/sys/modules/netmap/machine/atomic.h /usr/src/sys/modules/netmap/machine/atomic.h --- usr/src/sys/modules/netmap/machine/atomic.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/atomic.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,625 @@ +/*- + * Copyright (c) 1998 Doug Rabson + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. 
Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD: releng/11.0/sys/amd64/include/atomic.h 299912 2016-05-16 07:19:33Z sephe $ + */ +#ifndef _MACHINE_ATOMIC_H_ +#define _MACHINE_ATOMIC_H_ + +#ifndef _SYS_CDEFS_H_ +#error this file needs sys/cdefs.h as a prerequisite +#endif + +/* + * To express interprocessor (as opposed to processor and device) memory + * ordering constraints, use the atomic_*() functions with acquire and release + * semantics rather than the *mb() functions. An architecture's memory + * ordering (or memory consistency) model governs the order in which a + * program's accesses to different locations may be performed by an + * implementation of that architecture. In general, for memory regions + * defined as writeback cacheable, the memory ordering implemented by amd64 + * processors preserves the program ordering of a load followed by a load, a + * load followed by a store, and a store followed by a store. Only a store + * followed by a load to a different memory location may be reordered. + * Therefore, except for special cases, like non-temporal memory accesses or + * memory regions defined as write combining, the memory ordering effects + * provided by the sfence instruction in the wmb() function and the lfence + * instruction in the rmb() function are redundant. In contrast, the + * atomic_*() functions with acquire and release semantics do not perform + * redundant instructions for ordinary cases of interprocessor memory + * ordering on any architecture. + */ +#define mb() __asm __volatile("mfence;" : : : "memory") +#define wmb() __asm __volatile("sfence;" : : : "memory") +#define rmb() __asm __volatile("lfence;" : : : "memory") + +/* + * Various simple operations on memory, each of which is atomic in the + * presence of interrupts and multiple processors. 
+ * + * atomic_set_char(P, V) (*(u_char *)(P) |= (V)) + * atomic_clear_char(P, V) (*(u_char *)(P) &= ~(V)) + * atomic_add_char(P, V) (*(u_char *)(P) += (V)) + * atomic_subtract_char(P, V) (*(u_char *)(P) -= (V)) + * + * atomic_set_short(P, V) (*(u_short *)(P) |= (V)) + * atomic_clear_short(P, V) (*(u_short *)(P) &= ~(V)) + * atomic_add_short(P, V) (*(u_short *)(P) += (V)) + * atomic_subtract_short(P, V) (*(u_short *)(P) -= (V)) + * + * atomic_set_int(P, V) (*(u_int *)(P) |= (V)) + * atomic_clear_int(P, V) (*(u_int *)(P) &= ~(V)) + * atomic_add_int(P, V) (*(u_int *)(P) += (V)) + * atomic_subtract_int(P, V) (*(u_int *)(P) -= (V)) + * atomic_swap_int(P, V) (return (*(u_int *)(P)); *(u_int *)(P) = (V);) + * atomic_readandclear_int(P) (return (*(u_int *)(P)); *(u_int *)(P) = 0;) + * + * atomic_set_long(P, V) (*(u_long *)(P) |= (V)) + * atomic_clear_long(P, V) (*(u_long *)(P) &= ~(V)) + * atomic_add_long(P, V) (*(u_long *)(P) += (V)) + * atomic_subtract_long(P, V) (*(u_long *)(P) -= (V)) + * atomic_swap_long(P, V) (return (*(u_long *)(P)); *(u_long *)(P) = (V);) + * atomic_readandclear_long(P) (return (*(u_long *)(P)); *(u_long *)(P) = 0;) + */ + +/* + * The above functions are expanded inline in the statically-linked + * kernel. Lock prefixes are generated if an SMP kernel is being + * built. + * + * Kernel modules call real functions which are built into the kernel. + * This allows kernel modules to be portable between UP and SMP systems. + */ +#if defined(KLD_MODULE) || !defined(__GNUCLIKE_ASM) +#define ATOMIC_ASM(NAME, TYPE, OP, CONS, V) \ +void atomic_##NAME##_##TYPE(volatile u_##TYPE *p, u_##TYPE v); \ +void atomic_##NAME##_barr_##TYPE(volatile u_##TYPE *p, u_##TYPE v) + +int atomic_cmpset_int(volatile u_int *dst, u_int expect, u_int src); +int atomic_cmpset_long(volatile u_long *dst, u_long expect, u_long src); +u_int atomic_fetchadd_int(volatile u_int *p, u_int v); +u_long atomic_fetchadd_long(volatile u_long *p, u_long v); +int atomic_testandset_int(volatile u_int *p, u_int v); +int atomic_testandset_long(volatile u_long *p, u_int v); +int atomic_testandclear_int(volatile u_int *p, u_int v); +int atomic_testandclear_long(volatile u_long *p, u_int v); +void atomic_thread_fence_acq(void); +void atomic_thread_fence_acq_rel(void); +void atomic_thread_fence_rel(void); +void atomic_thread_fence_seq_cst(void); + +#define ATOMIC_LOAD(TYPE) \ +u_##TYPE atomic_load_acq_##TYPE(volatile u_##TYPE *p) +#define ATOMIC_STORE(TYPE) \ +void atomic_store_rel_##TYPE(volatile u_##TYPE *p, u_##TYPE v) + +#else /* !KLD_MODULE && __GNUCLIKE_ASM */ + +/* + * For userland, always use lock prefixes so that the binaries will run + * on both SMP and !SMP systems. + */ +#if defined(SMP) || !defined(_KERNEL) +#define MPLOCKED "lock ; " +#else +#define MPLOCKED +#endif + +/* + * The assembly is volatilized to avoid code chunk removal by the compiler. + * GCC aggressively reorders operations and memory clobbering is necessary + * in order to avoid that for memory barriers. 
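 *
 * Illustration only (not a definition from this header): the cmpset
 * primitives below are typically used in retry loops, e.g. a
 * hypothetical spin-acquire on a u_int lock word, with cpu_spinwait()
 * being the pause hint from <machine/cpu.h>:
 *
 *	while (atomic_cmpset_acq_int(&lock_word, 0, 1) == 0)
 *		cpu_spinwait();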
+ */ +#define ATOMIC_ASM(NAME, TYPE, OP, CONS, V) \ +static __inline void \ +atomic_##NAME##_##TYPE(volatile u_##TYPE *p, u_##TYPE v)\ +{ \ + __asm __volatile(MPLOCKED OP \ + : "+m" (*p) \ + : CONS (V) \ + : "cc"); \ +} \ + \ +static __inline void \ +atomic_##NAME##_barr_##TYPE(volatile u_##TYPE *p, u_##TYPE v)\ +{ \ + __asm __volatile(MPLOCKED OP \ + : "+m" (*p) \ + : CONS (V) \ + : "memory", "cc"); \ +} \ +struct __hack + +/* + * Atomic compare and set, used by the mutex functions + * + * if (*dst == expect) *dst = src (all 32 bit words) + * + * Returns 0 on failure, non-zero on success + */ + +static __inline int +atomic_cmpset_int(volatile u_int *dst, u_int expect, u_int src) +{ + u_char res; + + __asm __volatile( + " " MPLOCKED " " + " cmpxchgl %3,%1 ; " + " sete %0 ; " + "# atomic_cmpset_int" + : "=q" (res), /* 0 */ + "+m" (*dst), /* 1 */ + "+a" (expect) /* 2 */ + : "r" (src) /* 3 */ + : "memory", "cc"); + return (res); +} + +static __inline int +atomic_cmpset_long(volatile u_long *dst, u_long expect, u_long src) +{ + u_char res; + + __asm __volatile( + " " MPLOCKED " " + " cmpxchgq %3,%1 ; " + " sete %0 ; " + "# atomic_cmpset_long" + : "=q" (res), /* 0 */ + "+m" (*dst), /* 1 */ + "+a" (expect) /* 2 */ + : "r" (src) /* 3 */ + : "memory", "cc"); + return (res); +} + +/* + * Atomically add the value of v to the integer pointed to by p and return + * the previous value of *p. + */ +static __inline u_int +atomic_fetchadd_int(volatile u_int *p, u_int v) +{ + + __asm __volatile( + " " MPLOCKED " " + " xaddl %0,%1 ; " + "# atomic_fetchadd_int" + : "+r" (v), /* 0 */ + "+m" (*p) /* 1 */ + : : "cc"); + return (v); +} + +/* + * Atomically add the value of v to the long integer pointed to by p and return + * the previous value of *p. + */ +static __inline u_long +atomic_fetchadd_long(volatile u_long *p, u_long v) +{ + + __asm __volatile( + " " MPLOCKED " " + " xaddq %0,%1 ; " + "# atomic_fetchadd_long" + : "+r" (v), /* 0 */ + "+m" (*p) /* 1 */ + : : "cc"); + return (v); +} + +static __inline int +atomic_testandset_int(volatile u_int *p, u_int v) +{ + u_char res; + + __asm __volatile( + " " MPLOCKED " " + " btsl %2,%1 ; " + " setc %0 ; " + "# atomic_testandset_int" + : "=q" (res), /* 0 */ + "+m" (*p) /* 1 */ + : "Ir" (v & 0x1f) /* 2 */ + : "cc"); + return (res); +} + +static __inline int +atomic_testandset_long(volatile u_long *p, u_int v) +{ + u_char res; + + __asm __volatile( + " " MPLOCKED " " + " btsq %2,%1 ; " + " setc %0 ; " + "# atomic_testandset_long" + : "=q" (res), /* 0 */ + "+m" (*p) /* 1 */ + : "Jr" ((u_long)(v & 0x3f)) /* 2 */ + : "cc"); + return (res); +} + +static __inline int +atomic_testandclear_int(volatile u_int *p, u_int v) +{ + u_char res; + + __asm __volatile( + " " MPLOCKED " " + " btrl %2,%1 ; " + " setc %0 ; " + "# atomic_testandclear_int" + : "=q" (res), /* 0 */ + "+m" (*p) /* 1 */ + : "Ir" (v & 0x1f) /* 2 */ + : "cc"); + return (res); +} + +static __inline int +atomic_testandclear_long(volatile u_long *p, u_int v) +{ + u_char res; + + __asm __volatile( + " " MPLOCKED " " + " btrq %2,%1 ; " + " setc %0 ; " + "# atomic_testandclear_long" + : "=q" (res), /* 0 */ + "+m" (*p) /* 1 */ + : "Jr" ((u_long)(v & 0x3f)) /* 2 */ + : "cc"); + return (res); +} + +/* + * We assume that a = b will do atomic loads and stores. Due to the + * IA32 memory model, a simple store guarantees release semantics. + * + * However, a load may pass a store if they are performed on distinct + * addresses, so we need a Store/Load barrier for sequentially + * consistent fences in SMP kernels. 
We use "lock addl $0,mem" for a + * Store/Load barrier, as recommended by the AMD Software Optimization + * Guide, and not mfence. To avoid false data dependencies, we use a + * special address for "mem". In the kernel, we use a private per-cpu + * cache line. In user space, we use a word in the stack's red zone + * (-8(%rsp)). + * + * For UP kernels, however, the memory of the single processor is + * always consistent, so we only need to stop the compiler from + * reordering accesses in a way that violates the semantics of acquire + * and release. + */ + +#if defined(_KERNEL) + +/* + * OFFSETOF_MONITORBUF == __pcpu_offset(pc_monitorbuf). + * + * The open-coded number is used instead of the symbolic expression to + * avoid a dependency on sys/pcpu.h in machine/atomic.h consumers. + * An assertion in amd64/vm_machdep.c ensures that the value is correct. + */ +#define OFFSETOF_MONITORBUF 0x180 + +#if defined(SMP) +static __inline void +__storeload_barrier(void) +{ + + __asm __volatile("lock; addl $0,%%gs:%0" + : "+m" (*(u_int *)OFFSETOF_MONITORBUF) : : "memory", "cc"); +} +#else /* _KERNEL && UP */ +static __inline void +__storeload_barrier(void) +{ + + __compiler_membar(); +} +#endif /* SMP */ +#else /* !_KERNEL */ +static __inline void +__storeload_barrier(void) +{ + + __asm __volatile("lock; addl $0,-8(%%rsp)" : : : "memory", "cc"); +} +#endif /* _KERNEL*/ + +#define ATOMIC_LOAD(TYPE) \ +static __inline u_##TYPE \ +atomic_load_acq_##TYPE(volatile u_##TYPE *p) \ +{ \ + u_##TYPE res; \ + \ + res = *p; \ + __compiler_membar(); \ + return (res); \ +} \ +struct __hack + +#define ATOMIC_STORE(TYPE) \ +static __inline void \ +atomic_store_rel_##TYPE(volatile u_##TYPE *p, u_##TYPE v) \ +{ \ + \ + __compiler_membar(); \ + *p = v; \ +} \ +struct __hack + +static __inline void +atomic_thread_fence_acq(void) +{ + + __compiler_membar(); +} + +static __inline void +atomic_thread_fence_rel(void) +{ + + __compiler_membar(); +} + +static __inline void +atomic_thread_fence_acq_rel(void) +{ + + __compiler_membar(); +} + +static __inline void +atomic_thread_fence_seq_cst(void) +{ + + __storeload_barrier(); +} + +#endif /* KLD_MODULE || !__GNUCLIKE_ASM */ + +ATOMIC_ASM(set, char, "orb %b1,%0", "iq", v); +ATOMIC_ASM(clear, char, "andb %b1,%0", "iq", ~v); +ATOMIC_ASM(add, char, "addb %b1,%0", "iq", v); +ATOMIC_ASM(subtract, char, "subb %b1,%0", "iq", v); + +ATOMIC_ASM(set, short, "orw %w1,%0", "ir", v); +ATOMIC_ASM(clear, short, "andw %w1,%0", "ir", ~v); +ATOMIC_ASM(add, short, "addw %w1,%0", "ir", v); +ATOMIC_ASM(subtract, short, "subw %w1,%0", "ir", v); + +ATOMIC_ASM(set, int, "orl %1,%0", "ir", v); +ATOMIC_ASM(clear, int, "andl %1,%0", "ir", ~v); +ATOMIC_ASM(add, int, "addl %1,%0", "ir", v); +ATOMIC_ASM(subtract, int, "subl %1,%0", "ir", v); + +ATOMIC_ASM(set, long, "orq %1,%0", "ir", v); +ATOMIC_ASM(clear, long, "andq %1,%0", "ir", ~v); +ATOMIC_ASM(add, long, "addq %1,%0", "ir", v); +ATOMIC_ASM(subtract, long, "subq %1,%0", "ir", v); + +#define ATOMIC_LOADSTORE(TYPE) \ + ATOMIC_LOAD(TYPE); \ + ATOMIC_STORE(TYPE) + +ATOMIC_LOADSTORE(char); +ATOMIC_LOADSTORE(short); +ATOMIC_LOADSTORE(int); +ATOMIC_LOADSTORE(long); + +#undef ATOMIC_ASM +#undef ATOMIC_LOAD +#undef ATOMIC_STORE +#undef ATOMIC_LOADSTORE +#ifndef WANT_FUNCTIONS + +/* Read the current value and store a new value in the destination. 
*/ +#ifdef __GNUCLIKE_ASM + +static __inline u_int +atomic_swap_int(volatile u_int *p, u_int v) +{ + + __asm __volatile( + " xchgl %1,%0 ; " + "# atomic_swap_int" + : "+r" (v), /* 0 */ + "+m" (*p)); /* 1 */ + return (v); +} + +static __inline u_long +atomic_swap_long(volatile u_long *p, u_long v) +{ + + __asm __volatile( + " xchgq %1,%0 ; " + "# atomic_swap_long" + : "+r" (v), /* 0 */ + "+m" (*p)); /* 1 */ + return (v); +} + +#else /* !__GNUCLIKE_ASM */ + +u_int atomic_swap_int(volatile u_int *p, u_int v); +u_long atomic_swap_long(volatile u_long *p, u_long v); + +#endif /* __GNUCLIKE_ASM */ + +#define atomic_set_acq_char atomic_set_barr_char +#define atomic_set_rel_char atomic_set_barr_char +#define atomic_clear_acq_char atomic_clear_barr_char +#define atomic_clear_rel_char atomic_clear_barr_char +#define atomic_add_acq_char atomic_add_barr_char +#define atomic_add_rel_char atomic_add_barr_char +#define atomic_subtract_acq_char atomic_subtract_barr_char +#define atomic_subtract_rel_char atomic_subtract_barr_char + +#define atomic_set_acq_short atomic_set_barr_short +#define atomic_set_rel_short atomic_set_barr_short +#define atomic_clear_acq_short atomic_clear_barr_short +#define atomic_clear_rel_short atomic_clear_barr_short +#define atomic_add_acq_short atomic_add_barr_short +#define atomic_add_rel_short atomic_add_barr_short +#define atomic_subtract_acq_short atomic_subtract_barr_short +#define atomic_subtract_rel_short atomic_subtract_barr_short + +#define atomic_set_acq_int atomic_set_barr_int +#define atomic_set_rel_int atomic_set_barr_int +#define atomic_clear_acq_int atomic_clear_barr_int +#define atomic_clear_rel_int atomic_clear_barr_int +#define atomic_add_acq_int atomic_add_barr_int +#define atomic_add_rel_int atomic_add_barr_int +#define atomic_subtract_acq_int atomic_subtract_barr_int +#define atomic_subtract_rel_int atomic_subtract_barr_int +#define atomic_cmpset_acq_int atomic_cmpset_int +#define atomic_cmpset_rel_int atomic_cmpset_int + +#define atomic_set_acq_long atomic_set_barr_long +#define atomic_set_rel_long atomic_set_barr_long +#define atomic_clear_acq_long atomic_clear_barr_long +#define atomic_clear_rel_long atomic_clear_barr_long +#define atomic_add_acq_long atomic_add_barr_long +#define atomic_add_rel_long atomic_add_barr_long +#define atomic_subtract_acq_long atomic_subtract_barr_long +#define atomic_subtract_rel_long atomic_subtract_barr_long +#define atomic_cmpset_acq_long atomic_cmpset_long +#define atomic_cmpset_rel_long atomic_cmpset_long + +#define atomic_readandclear_int(p) atomic_swap_int(p, 0) +#define atomic_readandclear_long(p) atomic_swap_long(p, 0) + +/* Operations on 8-bit bytes. */ +#define atomic_set_8 atomic_set_char +#define atomic_set_acq_8 atomic_set_acq_char +#define atomic_set_rel_8 atomic_set_rel_char +#define atomic_clear_8 atomic_clear_char +#define atomic_clear_acq_8 atomic_clear_acq_char +#define atomic_clear_rel_8 atomic_clear_rel_char +#define atomic_add_8 atomic_add_char +#define atomic_add_acq_8 atomic_add_acq_char +#define atomic_add_rel_8 atomic_add_rel_char +#define atomic_subtract_8 atomic_subtract_char +#define atomic_subtract_acq_8 atomic_subtract_acq_char +#define atomic_subtract_rel_8 atomic_subtract_rel_char +#define atomic_load_acq_8 atomic_load_acq_char +#define atomic_store_rel_8 atomic_store_rel_char + +/* Operations on 16-bit words. 
*/ +#define atomic_set_16 atomic_set_short +#define atomic_set_acq_16 atomic_set_acq_short +#define atomic_set_rel_16 atomic_set_rel_short +#define atomic_clear_16 atomic_clear_short +#define atomic_clear_acq_16 atomic_clear_acq_short +#define atomic_clear_rel_16 atomic_clear_rel_short +#define atomic_add_16 atomic_add_short +#define atomic_add_acq_16 atomic_add_acq_short +#define atomic_add_rel_16 atomic_add_rel_short +#define atomic_subtract_16 atomic_subtract_short +#define atomic_subtract_acq_16 atomic_subtract_acq_short +#define atomic_subtract_rel_16 atomic_subtract_rel_short +#define atomic_load_acq_16 atomic_load_acq_short +#define atomic_store_rel_16 atomic_store_rel_short + +/* Operations on 32-bit double words. */ +#define atomic_set_32 atomic_set_int +#define atomic_set_acq_32 atomic_set_acq_int +#define atomic_set_rel_32 atomic_set_rel_int +#define atomic_clear_32 atomic_clear_int +#define atomic_clear_acq_32 atomic_clear_acq_int +#define atomic_clear_rel_32 atomic_clear_rel_int +#define atomic_add_32 atomic_add_int +#define atomic_add_acq_32 atomic_add_acq_int +#define atomic_add_rel_32 atomic_add_rel_int +#define atomic_subtract_32 atomic_subtract_int +#define atomic_subtract_acq_32 atomic_subtract_acq_int +#define atomic_subtract_rel_32 atomic_subtract_rel_int +#define atomic_load_acq_32 atomic_load_acq_int +#define atomic_store_rel_32 atomic_store_rel_int +#define atomic_cmpset_32 atomic_cmpset_int +#define atomic_cmpset_acq_32 atomic_cmpset_acq_int +#define atomic_cmpset_rel_32 atomic_cmpset_rel_int +#define atomic_swap_32 atomic_swap_int +#define atomic_readandclear_32 atomic_readandclear_int +#define atomic_fetchadd_32 atomic_fetchadd_int +#define atomic_testandset_32 atomic_testandset_int +#define atomic_testandclear_32 atomic_testandclear_int + +/* Operations on 64-bit quad words. */ +#define atomic_set_64 atomic_set_long +#define atomic_set_acq_64 atomic_set_acq_long +#define atomic_set_rel_64 atomic_set_rel_long +#define atomic_clear_64 atomic_clear_long +#define atomic_clear_acq_64 atomic_clear_acq_long +#define atomic_clear_rel_64 atomic_clear_rel_long +#define atomic_add_64 atomic_add_long +#define atomic_add_acq_64 atomic_add_acq_long +#define atomic_add_rel_64 atomic_add_rel_long +#define atomic_subtract_64 atomic_subtract_long +#define atomic_subtract_acq_64 atomic_subtract_acq_long +#define atomic_subtract_rel_64 atomic_subtract_rel_long +#define atomic_load_acq_64 atomic_load_acq_long +#define atomic_store_rel_64 atomic_store_rel_long +#define atomic_cmpset_64 atomic_cmpset_long +#define atomic_cmpset_acq_64 atomic_cmpset_acq_long +#define atomic_cmpset_rel_64 atomic_cmpset_rel_long +#define atomic_swap_64 atomic_swap_long +#define atomic_readandclear_64 atomic_readandclear_long +#define atomic_fetchadd_64 atomic_fetchadd_long +#define atomic_testandset_64 atomic_testandset_long +#define atomic_testandclear_64 atomic_testandclear_long + +/* Operations on pointers. 
*/ +#define atomic_set_ptr atomic_set_long +#define atomic_set_acq_ptr atomic_set_acq_long +#define atomic_set_rel_ptr atomic_set_rel_long +#define atomic_clear_ptr atomic_clear_long +#define atomic_clear_acq_ptr atomic_clear_acq_long +#define atomic_clear_rel_ptr atomic_clear_rel_long +#define atomic_add_ptr atomic_add_long +#define atomic_add_acq_ptr atomic_add_acq_long +#define atomic_add_rel_ptr atomic_add_rel_long +#define atomic_subtract_ptr atomic_subtract_long +#define atomic_subtract_acq_ptr atomic_subtract_acq_long +#define atomic_subtract_rel_ptr atomic_subtract_rel_long +#define atomic_load_acq_ptr atomic_load_acq_long +#define atomic_store_rel_ptr atomic_store_rel_long +#define atomic_cmpset_ptr atomic_cmpset_long +#define atomic_cmpset_acq_ptr atomic_cmpset_acq_long +#define atomic_cmpset_rel_ptr atomic_cmpset_rel_long +#define atomic_swap_ptr atomic_swap_long +#define atomic_readandclear_ptr atomic_readandclear_long + +#endif /* !WANT_FUNCTIONS */ + +#endif /* !_MACHINE_ATOMIC_H_ */ diff -u -r -N usr/src/sys/modules/netmap/machine/bus.h /usr/src/sys/modules/netmap/machine/bus.h --- usr/src/sys/modules/netmap/machine/bus.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/bus.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,6 @@ +/*- + * This file is in the public domain. + */ +/* $FreeBSD: releng/11.0/sys/amd64/include/bus.h 244191 2012-12-13 21:27:20Z jimharris $ */ + +#include <x86/bus.h> diff -u -r -N usr/src/sys/modules/netmap/machine/bus_dma.h /usr/src/sys/modules/netmap/machine/bus_dma.h --- usr/src/sys/modules/netmap/machine/bus_dma.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/bus_dma.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,34 @@ +/*- + * Copyright (c) 2005 Scott Long + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
+ * + * $FreeBSD: releng/11.0/sys/amd64/include/bus_dma.h 148275 2005-07-22 04:03:25Z obrien $ + */ + +#ifndef _AMD64_BUS_DMA_H_ +#define _AMD64_BUS_DMA_H_ + +#include <sys/bus_dma.h> + +#endif /* _AMD64_BUS_DMA_H_ */ diff -u -r -N usr/src/sys/modules/netmap/machine/clock.h /usr/src/sys/modules/netmap/machine/clock.h --- usr/src/sys/modules/netmap/machine/clock.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/clock.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,45 @@ +/*- + * Kernel interface to machine-dependent clock driver. + * Garrett Wollman, September 1994. + * This file is in the public domain. + * + * $FreeBSD: releng/11.0/sys/amd64/include/clock.h 263008 2014-03-11 10:20:42Z royger $ + */ + +#ifndef _MACHINE_CLOCK_H_ +#define _MACHINE_CLOCK_H_ + +#ifdef _KERNEL +/* + * i386 to clock driver interface. + * XXX large parts of the driver and its interface are misplaced. + */ +extern int clkintr_pending; +extern u_int i8254_freq; +extern int i8254_max_count; +extern uint64_t tsc_freq; +extern int tsc_is_invariant; +extern int tsc_perf_stat; +#ifdef SMP +extern int smp_tsc; +#endif + +void i8254_init(void); +void i8254_delay(int); +void clock_init(void); + +/* + * Driver to clock driver interface. + */ + +void startrtclock(void); +void init_TSC(void); + +#define HAS_TIMER_SPKR 1 +int timer_spkr_acquire(void); +int timer_spkr_release(void); +void timer_spkr_setfreq(int freq); + +#endif /* _KERNEL */ + +#endif /* !_MACHINE_CLOCK_H_ */ diff -u -r -N usr/src/sys/modules/netmap/machine/counter.h /usr/src/sys/modules/netmap/machine/counter.h --- usr/src/sys/modules/netmap/machine/counter.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/counter.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,89 @@ +/*- + * Copyright (c) 2012 Konstantin Belousov <kib@FreeBSD.org> + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
+ * + * $FreeBSD: releng/11.0/sys/amd64/include/counter.h 302372 2016-07-06 14:09:49Z nwhitehorn $ + */ + +#ifndef __MACHINE_COUNTER_H__ +#define __MACHINE_COUNTER_H__ + +#include <sys/pcpu.h> + +extern struct pcpu __pcpu[1]; + +#define counter_enter() do {} while (0) +#define counter_exit() do {} while (0) + +#ifdef IN_SUBR_COUNTER_C +static inline uint64_t +counter_u64_read_one(uint64_t *p, int cpu) +{ + + return (*(uint64_t *)((char *)p + sizeof(struct pcpu) * cpu)); +} + +static inline uint64_t +counter_u64_fetch_inline(uint64_t *p) +{ + uint64_t r; + int i; + + r = 0; + CPU_FOREACH(i) + r += counter_u64_read_one((uint64_t *)p, i); + + return (r); +} + +static void +counter_u64_zero_one_cpu(void *arg) +{ + + *((uint64_t *)((char *)arg + sizeof(struct pcpu) * + PCPU_GET(cpuid))) = 0; +} + +static inline void +counter_u64_zero_inline(counter_u64_t c) +{ + + smp_rendezvous(smp_no_rendevous_barrier, counter_u64_zero_one_cpu, + smp_no_rendevous_barrier, c); +} +#endif + +#define counter_u64_add_protected(c, i) counter_u64_add(c, i) + +static inline void +counter_u64_add(counter_u64_t c, int64_t inc) +{ + + __asm __volatile("addq\t%1,%%gs:(%0)" + : + : "r" ((char *)c - (char *)&__pcpu[0]), "ri" (inc) + : "memory", "cc"); +} + +#endif /* ! __MACHINE_COUNTER_H__ */ diff -u -r -N usr/src/sys/modules/netmap/machine/cpu.h /usr/src/sys/modules/netmap/machine/cpu.h --- usr/src/sys/modules/netmap/machine/cpu.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/cpu.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,93 @@ +/*- + * Copyright (c) 1990 The Regents of the University of California. + * All rights reserved. + * + * This code is derived from software contributed to Berkeley by + * William Jolitz. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 4. Neither the name of the University nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * from: @(#)cpu.h 5.4 (Berkeley) 5/9/91 + * $FreeBSD: releng/11.0/sys/amd64/include/cpu.h 267526 2014-06-16 08:43:03Z royger $ + */ + +#ifndef _MACHINE_CPU_H_ +#define _MACHINE_CPU_H_ + +/* + * Definitions unique to i386 cpu support. 
*/
+#include <machine/psl.h>
+#include <machine/frame.h>
+#include <machine/segments.h>
+
+#define cpu_exec(p) /* nothing */
+#define cpu_swapin(p) /* nothing */
+#define cpu_getstack(td) ((td)->td_frame->tf_rsp)
+#define cpu_setstack(td, ap) ((td)->td_frame->tf_rsp = (ap))
+#define cpu_spinwait() ia32_pause()
+
+#define TRAPF_USERMODE(framep) \
+ (ISPL((framep)->tf_cs) == SEL_UPL)
+#define TRAPF_PC(framep) ((framep)->tf_rip)
+
+#ifdef _KERNEL
+/*
+ * Struct containing pointers to CPU management functions whose
+ * implementation is run time selectable. Selection can be made,
+ * for example, based on detection of a particular CPU variant or
+ * hypervisor environment.
+ */
+struct cpu_ops {
+ void (*cpu_init)(void);
+ void (*cpu_resume)(void);
+};
+
+extern struct cpu_ops cpu_ops;
+extern char btext[];
+extern char etext[];
+
+/* Resume hook for VMM. */
+extern void (*vmm_resume_p)(void);
+
+void cpu_halt(void);
+void cpu_reset(void);
+void fork_trampoline(void);
+void swi_vm(void *);
+
+/*
+ * Return contents of in-cpu fast counter as a sort of "bogo-time"
+ * for random-harvesting purposes.
+ */
+static __inline u_int64_t
+get_cyclecount(void)
+{
+
+ return (rdtsc());
+}
+
+#endif
+
+#endif /* !_MACHINE_CPU_H_ */
diff -u -r -N usr/src/sys/modules/netmap/machine/cpufunc.h /usr/src/sys/modules/netmap/machine/cpufunc.h
--- usr/src/sys/modules/netmap/machine/cpufunc.h 1970-01-01 01:00:00.000000000 +0100
+++ /usr/src/sys/modules/netmap/machine/cpufunc.h 2016-09-29 00:24:54.000000000 +0100
@@ -0,0 +1,871 @@
+/*-
+ * Copyright (c) 2003 Peter Wemm.
+ * Copyright (c) 1993 The Regents of the University of California.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * 4. Neither the name of the University nor the names of its contributors
+ * may be used to endorse or promote products derived from this software
+ * without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * $FreeBSD: releng/11.0/sys/amd64/include/cpufunc.h 291688 2015-12-03 11:14:14Z kib $
+ */
+
+/*
+ * Functions to provide access to special i386 instructions.
+ * This is included in sys/systm.h, and that file should be
+ * used in preference to this.
+ */ + +#ifndef _MACHINE_CPUFUNC_H_ +#define _MACHINE_CPUFUNC_H_ + +#ifndef _SYS_CDEFS_H_ +#error this file needs sys/cdefs.h as a prerequisite +#endif + +struct region_descriptor; + +#define readb(va) (*(volatile uint8_t *) (va)) +#define readw(va) (*(volatile uint16_t *) (va)) +#define readl(va) (*(volatile uint32_t *) (va)) +#define readq(va) (*(volatile uint64_t *) (va)) + +#define writeb(va, d) (*(volatile uint8_t *) (va) = (d)) +#define writew(va, d) (*(volatile uint16_t *) (va) = (d)) +#define writel(va, d) (*(volatile uint32_t *) (va) = (d)) +#define writeq(va, d) (*(volatile uint64_t *) (va) = (d)) + +#if defined(__GNUCLIKE_ASM) && defined(__CC_SUPPORTS___INLINE) + +static __inline void +breakpoint(void) +{ + __asm __volatile("int $3"); +} + +static __inline u_int +bsfl(u_int mask) +{ + u_int result; + + __asm __volatile("bsfl %1,%0" : "=r" (result) : "rm" (mask)); + return (result); +} + +static __inline u_long +bsfq(u_long mask) +{ + u_long result; + + __asm __volatile("bsfq %1,%0" : "=r" (result) : "rm" (mask)); + return (result); +} + +static __inline u_int +bsrl(u_int mask) +{ + u_int result; + + __asm __volatile("bsrl %1,%0" : "=r" (result) : "rm" (mask)); + return (result); +} + +static __inline u_long +bsrq(u_long mask) +{ + u_long result; + + __asm __volatile("bsrq %1,%0" : "=r" (result) : "rm" (mask)); + return (result); +} + +static __inline void +clflush(u_long addr) +{ + + __asm __volatile("clflush %0" : : "m" (*(char *)addr)); +} + +static __inline void +clflushopt(u_long addr) +{ + + __asm __volatile(".byte 0x66;clflush %0" : : "m" (*(char *)addr)); +} + +static __inline void +clts(void) +{ + + __asm __volatile("clts"); +} + +static __inline void +disable_intr(void) +{ + __asm __volatile("cli" : : : "memory"); +} + +static __inline void +do_cpuid(u_int ax, u_int *p) +{ + __asm __volatile("cpuid" + : "=a" (p[0]), "=b" (p[1]), "=c" (p[2]), "=d" (p[3]) + : "0" (ax)); +} + +static __inline void +cpuid_count(u_int ax, u_int cx, u_int *p) +{ + __asm __volatile("cpuid" + : "=a" (p[0]), "=b" (p[1]), "=c" (p[2]), "=d" (p[3]) + : "0" (ax), "c" (cx)); +} + +static __inline void +enable_intr(void) +{ + __asm __volatile("sti"); +} + +#ifdef _KERNEL + +#define HAVE_INLINE_FFS +#define ffs(x) __builtin_ffs(x) + +#define HAVE_INLINE_FFSL + +static __inline int +ffsl(long mask) +{ + return (mask == 0 ? mask : (int)bsfq((u_long)mask) + 1); +} + +#define HAVE_INLINE_FFSLL + +static __inline int +ffsll(long long mask) +{ + return (ffsl((long)mask)); +} + +#define HAVE_INLINE_FLS + +static __inline int +fls(int mask) +{ + return (mask == 0 ? mask : (int)bsrl((u_int)mask) + 1); +} + +#define HAVE_INLINE_FLSL + +static __inline int +flsl(long mask) +{ + return (mask == 0 ? 
mask : (int)bsrq((u_long)mask) + 1); +} + +#define HAVE_INLINE_FLSLL + +static __inline int +flsll(long long mask) +{ + return (flsl((long)mask)); +} + +#endif /* _KERNEL */ + +static __inline void +halt(void) +{ + __asm __volatile("hlt"); +} + +static __inline u_char +inb(u_int port) +{ + u_char data; + + __asm __volatile("inb %w1, %0" : "=a" (data) : "Nd" (port)); + return (data); +} + +static __inline u_int +inl(u_int port) +{ + u_int data; + + __asm __volatile("inl %w1, %0" : "=a" (data) : "Nd" (port)); + return (data); +} + +static __inline void +insb(u_int port, void *addr, size_t count) +{ + __asm __volatile("cld; rep; insb" + : "+D" (addr), "+c" (count) + : "d" (port) + : "memory"); +} + +static __inline void +insw(u_int port, void *addr, size_t count) +{ + __asm __volatile("cld; rep; insw" + : "+D" (addr), "+c" (count) + : "d" (port) + : "memory"); +} + +static __inline void +insl(u_int port, void *addr, size_t count) +{ + __asm __volatile("cld; rep; insl" + : "+D" (addr), "+c" (count) + : "d" (port) + : "memory"); +} + +static __inline void +invd(void) +{ + __asm __volatile("invd"); +} + +static __inline u_short +inw(u_int port) +{ + u_short data; + + __asm __volatile("inw %w1, %0" : "=a" (data) : "Nd" (port)); + return (data); +} + +static __inline void +outb(u_int port, u_char data) +{ + __asm __volatile("outb %0, %w1" : : "a" (data), "Nd" (port)); +} + +static __inline void +outl(u_int port, u_int data) +{ + __asm __volatile("outl %0, %w1" : : "a" (data), "Nd" (port)); +} + +static __inline void +outsb(u_int port, const void *addr, size_t count) +{ + __asm __volatile("cld; rep; outsb" + : "+S" (addr), "+c" (count) + : "d" (port)); +} + +static __inline void +outsw(u_int port, const void *addr, size_t count) +{ + __asm __volatile("cld; rep; outsw" + : "+S" (addr), "+c" (count) + : "d" (port)); +} + +static __inline void +outsl(u_int port, const void *addr, size_t count) +{ + __asm __volatile("cld; rep; outsl" + : "+S" (addr), "+c" (count) + : "d" (port)); +} + +static __inline void +outw(u_int port, u_short data) +{ + __asm __volatile("outw %0, %w1" : : "a" (data), "Nd" (port)); +} + +static __inline u_long +popcntq(u_long mask) +{ + u_long result; + + __asm __volatile("popcntq %1,%0" : "=r" (result) : "rm" (mask)); + return (result); +} + +static __inline void +lfence(void) +{ + + __asm __volatile("lfence" : : : "memory"); +} + +static __inline void +mfence(void) +{ + + __asm __volatile("mfence" : : : "memory"); +} + +static __inline void +ia32_pause(void) +{ + __asm __volatile("pause"); +} + +static __inline u_long +read_rflags(void) +{ + u_long rf; + + __asm __volatile("pushfq; popq %0" : "=r" (rf)); + return (rf); +} + +static __inline uint64_t +rdmsr(u_int msr) +{ + uint32_t low, high; + + __asm __volatile("rdmsr" : "=a" (low), "=d" (high) : "c" (msr)); + return (low | ((uint64_t)high << 32)); +} + +static __inline uint32_t +rdmsr32(u_int msr) +{ + uint32_t low; + + __asm __volatile("rdmsr" : "=a" (low) : "c" (msr) : "rdx"); + return (low); +} + +static __inline uint64_t +rdpmc(u_int pmc) +{ + uint32_t low, high; + + __asm __volatile("rdpmc" : "=a" (low), "=d" (high) : "c" (pmc)); + return (low | ((uint64_t)high << 32)); +} + +static __inline uint64_t +rdtsc(void) +{ + uint32_t low, high; + + __asm __volatile("rdtsc" : "=a" (low), "=d" (high)); + return (low | ((uint64_t)high << 32)); +} + +static __inline uint32_t +rdtsc32(void) +{ + uint32_t rv; + + __asm __volatile("rdtsc" : "=a" (rv) : : "edx"); + return (rv); +} + +static __inline void +wbinvd(void) +{ + __asm 
__volatile("wbinvd"); +} + +static __inline void +write_rflags(u_long rf) +{ + __asm __volatile("pushq %0; popfq" : : "r" (rf)); +} + +static __inline void +wrmsr(u_int msr, uint64_t newval) +{ + uint32_t low, high; + + low = newval; + high = newval >> 32; + __asm __volatile("wrmsr" : : "a" (low), "d" (high), "c" (msr)); +} + +static __inline void +load_cr0(u_long data) +{ + + __asm __volatile("movq %0,%%cr0" : : "r" (data)); +} + +static __inline u_long +rcr0(void) +{ + u_long data; + + __asm __volatile("movq %%cr0,%0" : "=r" (data)); + return (data); +} + +static __inline u_long +rcr2(void) +{ + u_long data; + + __asm __volatile("movq %%cr2,%0" : "=r" (data)); + return (data); +} + +static __inline void +load_cr3(u_long data) +{ + + __asm __volatile("movq %0,%%cr3" : : "r" (data) : "memory"); +} + +static __inline u_long +rcr3(void) +{ + u_long data; + + __asm __volatile("movq %%cr3,%0" : "=r" (data)); + return (data); +} + +static __inline void +load_cr4(u_long data) +{ + __asm __volatile("movq %0,%%cr4" : : "r" (data)); +} + +static __inline u_long +rcr4(void) +{ + u_long data; + + __asm __volatile("movq %%cr4,%0" : "=r" (data)); + return (data); +} + +static __inline u_long +rxcr(u_int reg) +{ + u_int low, high; + + __asm __volatile("xgetbv" : "=a" (low), "=d" (high) : "c" (reg)); + return (low | ((uint64_t)high << 32)); +} + +static __inline void +load_xcr(u_int reg, u_long val) +{ + u_int low, high; + + low = val; + high = val >> 32; + __asm __volatile("xsetbv" : : "c" (reg), "a" (low), "d" (high)); +} + +/* + * Global TLB flush (except for thise for pages marked PG_G) + */ +static __inline void +invltlb(void) +{ + + load_cr3(rcr3()); +} + +#ifndef CR4_PGE +#define CR4_PGE 0x00000080 /* Page global enable */ +#endif + +/* + * Perform the guaranteed invalidation of all TLB entries. This + * includes the global entries, and entries in all PCIDs, not only the + * current context. The function works both on non-PCID CPUs and CPUs + * with the PCID turned off or on. See IA-32 SDM Vol. 3a 4.10.4.1 + * Operations that Invalidate TLBs and Paging-Structure Caches. + */ +static __inline void +invltlb_glob(void) +{ + uint64_t cr4; + + cr4 = rcr4(); + load_cr4(cr4 & ~CR4_PGE); + /* + * Although preemption at this point could be detrimental to + * performance, it would not lead to an error. PG_G is simply + * ignored if CR4.PGE is clear. Moreover, in case this block + * is re-entered, the load_cr4() either above or below will + * modify CR4.PGE flushing the TLB. + */ + load_cr4(cr4 | CR4_PGE); +} + +/* + * TLB flush for an individual page (even if it has PG_G). + * Only works on 486+ CPUs (i386 does not have PG_G). 
+ */ +static __inline void +invlpg(u_long addr) +{ + + __asm __volatile("invlpg %0" : : "m" (*(char *)addr) : "memory"); +} + +#define INVPCID_ADDR 0 +#define INVPCID_CTX 1 +#define INVPCID_CTXGLOB 2 +#define INVPCID_ALLCTX 3 + +struct invpcid_descr { + uint64_t pcid:12 __packed; + uint64_t pad:52 __packed; + uint64_t addr; +} __packed; + +static __inline void +invpcid(struct invpcid_descr *d, int type) +{ + + __asm __volatile("invpcid (%0),%1" + : : "r" (d), "r" ((u_long)type) : "memory"); +} + +static __inline u_short +rfs(void) +{ + u_short sel; + __asm __volatile("movw %%fs,%0" : "=rm" (sel)); + return (sel); +} + +static __inline u_short +rgs(void) +{ + u_short sel; + __asm __volatile("movw %%gs,%0" : "=rm" (sel)); + return (sel); +} + +static __inline u_short +rss(void) +{ + u_short sel; + __asm __volatile("movw %%ss,%0" : "=rm" (sel)); + return (sel); +} + +static __inline void +load_ds(u_short sel) +{ + __asm __volatile("movw %0,%%ds" : : "rm" (sel)); +} + +static __inline void +load_es(u_short sel) +{ + __asm __volatile("movw %0,%%es" : : "rm" (sel)); +} + +static __inline void +cpu_monitor(const void *addr, u_long extensions, u_int hints) +{ + + __asm __volatile("monitor" + : : "a" (addr), "c" (extensions), "d" (hints)); +} + +static __inline void +cpu_mwait(u_long extensions, u_int hints) +{ + + __asm __volatile("mwait" : : "a" (hints), "c" (extensions)); +} + +#ifdef _KERNEL +/* This is defined in <machine/specialreg.h> but is too painful to get to */ +#ifndef MSR_FSBASE +#define MSR_FSBASE 0xc0000100 +#endif +static __inline void +load_fs(u_short sel) +{ + /* Preserve the fsbase value across the selector load */ + __asm __volatile("rdmsr; movw %0,%%fs; wrmsr" + : : "rm" (sel), "c" (MSR_FSBASE) : "eax", "edx"); +} + +#ifndef MSR_GSBASE +#define MSR_GSBASE 0xc0000101 +#endif +static __inline void +load_gs(u_short sel) +{ + /* + * Preserve the gsbase value across the selector load. + * Note that we have to disable interrupts because the gsbase + * being trashed happens to be the kernel gsbase at the time. 
+ */ + __asm __volatile("pushfq; cli; rdmsr; movw %0,%%gs; wrmsr; popfq" + : : "rm" (sel), "c" (MSR_GSBASE) : "eax", "edx"); +} +#else +/* Usable by userland */ +static __inline void +load_fs(u_short sel) +{ + __asm __volatile("movw %0,%%fs" : : "rm" (sel)); +} + +static __inline void +load_gs(u_short sel) +{ + __asm __volatile("movw %0,%%gs" : : "rm" (sel)); +} +#endif + +static __inline void +lidt(struct region_descriptor *addr) +{ + __asm __volatile("lidt (%0)" : : "r" (addr)); +} + +static __inline void +lldt(u_short sel) +{ + __asm __volatile("lldt %0" : : "r" (sel)); +} + +static __inline void +ltr(u_short sel) +{ + __asm __volatile("ltr %0" : : "r" (sel)); +} + +static __inline uint64_t +rdr0(void) +{ + uint64_t data; + __asm __volatile("movq %%dr0,%0" : "=r" (data)); + return (data); +} + +static __inline void +load_dr0(uint64_t dr0) +{ + __asm __volatile("movq %0,%%dr0" : : "r" (dr0)); +} + +static __inline uint64_t +rdr1(void) +{ + uint64_t data; + __asm __volatile("movq %%dr1,%0" : "=r" (data)); + return (data); +} + +static __inline void +load_dr1(uint64_t dr1) +{ + __asm __volatile("movq %0,%%dr1" : : "r" (dr1)); +} + +static __inline uint64_t +rdr2(void) +{ + uint64_t data; + __asm __volatile("movq %%dr2,%0" : "=r" (data)); + return (data); +} + +static __inline void +load_dr2(uint64_t dr2) +{ + __asm __volatile("movq %0,%%dr2" : : "r" (dr2)); +} + +static __inline uint64_t +rdr3(void) +{ + uint64_t data; + __asm __volatile("movq %%dr3,%0" : "=r" (data)); + return (data); +} + +static __inline void +load_dr3(uint64_t dr3) +{ + __asm __volatile("movq %0,%%dr3" : : "r" (dr3)); +} + +static __inline uint64_t +rdr4(void) +{ + uint64_t data; + __asm __volatile("movq %%dr4,%0" : "=r" (data)); + return (data); +} + +static __inline void +load_dr4(uint64_t dr4) +{ + __asm __volatile("movq %0,%%dr4" : : "r" (dr4)); +} + +static __inline uint64_t +rdr5(void) +{ + uint64_t data; + __asm __volatile("movq %%dr5,%0" : "=r" (data)); + return (data); +} + +static __inline void +load_dr5(uint64_t dr5) +{ + __asm __volatile("movq %0,%%dr5" : : "r" (dr5)); +} + +static __inline uint64_t +rdr6(void) +{ + uint64_t data; + __asm __volatile("movq %%dr6,%0" : "=r" (data)); + return (data); +} + +static __inline void +load_dr6(uint64_t dr6) +{ + __asm __volatile("movq %0,%%dr6" : : "r" (dr6)); +} + +static __inline uint64_t +rdr7(void) +{ + uint64_t data; + __asm __volatile("movq %%dr7,%0" : "=r" (data)); + return (data); +} + +static __inline void +load_dr7(uint64_t dr7) +{ + __asm __volatile("movq %0,%%dr7" : : "r" (dr7)); +} + +static __inline register_t +intr_disable(void) +{ + register_t rflags; + + rflags = read_rflags(); + disable_intr(); + return (rflags); +} + +static __inline void +intr_restore(register_t rflags) +{ + write_rflags(rflags); +} + +#else /* !(__GNUCLIKE_ASM && __CC_SUPPORTS___INLINE) */ + +int breakpoint(void); +u_int bsfl(u_int mask); +u_int bsrl(u_int mask); +void clflush(u_long addr); +void clts(void); +void cpuid_count(u_int ax, u_int cx, u_int *p); +void disable_intr(void); +void do_cpuid(u_int ax, u_int *p); +void enable_intr(void); +void halt(void); +void ia32_pause(void); +u_char inb(u_int port); +u_int inl(u_int port); +void insb(u_int port, void *addr, size_t count); +void insl(u_int port, void *addr, size_t count); +void insw(u_int port, void *addr, size_t count); +register_t intr_disable(void); +void intr_restore(register_t rf); +void invd(void); +void invlpg(u_int addr); +void invltlb(void); +u_short inw(u_int port); +void lidt(struct region_descriptor *addr); 
+void lldt(u_short sel); +void load_cr0(u_long cr0); +void load_cr3(u_long cr3); +void load_cr4(u_long cr4); +void load_dr0(uint64_t dr0); +void load_dr1(uint64_t dr1); +void load_dr2(uint64_t dr2); +void load_dr3(uint64_t dr3); +void load_dr4(uint64_t dr4); +void load_dr5(uint64_t dr5); +void load_dr6(uint64_t dr6); +void load_dr7(uint64_t dr7); +void load_fs(u_short sel); +void load_gs(u_short sel); +void ltr(u_short sel); +void outb(u_int port, u_char data); +void outl(u_int port, u_int data); +void outsb(u_int port, const void *addr, size_t count); +void outsl(u_int port, const void *addr, size_t count); +void outsw(u_int port, const void *addr, size_t count); +void outw(u_int port, u_short data); +u_long rcr0(void); +u_long rcr2(void); +u_long rcr3(void); +u_long rcr4(void); +uint64_t rdmsr(u_int msr); +uint32_t rdmsr32(u_int msr); +uint64_t rdpmc(u_int pmc); +uint64_t rdr0(void); +uint64_t rdr1(void); +uint64_t rdr2(void); +uint64_t rdr3(void); +uint64_t rdr4(void); +uint64_t rdr5(void); +uint64_t rdr6(void); +uint64_t rdr7(void); +uint64_t rdtsc(void); +u_long read_rflags(void); +u_int rfs(void); +u_int rgs(void); +void wbinvd(void); +void write_rflags(u_int rf); +void wrmsr(u_int msr, uint64_t newval); + +#endif /* __GNUCLIKE_ASM && __CC_SUPPORTS___INLINE */ + +void reset_dbregs(void); + +#ifdef _KERNEL +int rdmsr_safe(u_int msr, uint64_t *val); +int wrmsr_safe(u_int msr, uint64_t newval); +#endif + +#endif /* !_MACHINE_CPUFUNC_H_ */ diff -u -r -N usr/src/sys/modules/netmap/machine/cputypes.h /usr/src/sys/modules/netmap/machine/cputypes.h --- usr/src/sys/modules/netmap/machine/cputypes.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/cputypes.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,48 @@ +/*- + * Copyright (c) 1993 Christopher G. Demetriou + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 3. The name of the author may not be used to endorse or promote products + * derived from this software without specific prior written permission + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + * + * $FreeBSD: releng/11.0/sys/amd64/include/cputypes.h 292668 2015-12-23 21:41:42Z jhb $ + */ + +#ifndef _MACHINE_CPUTYPES_H_ +#define _MACHINE_CPUTYPES_H_ + +#include <x86/cputypes.h> + +/* + * Classes of processor. 
+ */
+#define CPUCLASS_X86 0 /* X86 */
+#define CPUCLASS_K8 1 /* K8 AMD64 class */
+
+/*
+ * Kinds of processor.
+ */
+#define CPU_X86 0 /* Intel */
+#define CPU_CLAWHAMMER 1 /* AMD Clawhammer */
+#define CPU_SLEDGEHAMMER 2 /* AMD Sledgehammer */
+
+#endif /* !_MACHINE_CPUTYPES_H_ */
diff -u -r -N usr/src/sys/modules/netmap/machine/db_machdep.h /usr/src/sys/modules/netmap/machine/db_machdep.h
--- usr/src/sys/modules/netmap/machine/db_machdep.h 1970-01-01 01:00:00.000000000 +0100
+++ /usr/src/sys/modules/netmap/machine/db_machdep.h 2016-09-29 00:24:54.000000000 +0100
@@ -0,0 +1,94 @@
+/*-
+ * Mach Operating System
+ * Copyright (c) 1991,1990 Carnegie Mellon University
+ * All Rights Reserved.
+ *
+ * Permission to use, copy, modify and distribute this software and its
+ * documentation is hereby granted, provided that both the copyright
+ * notice and this permission notice appear in all copies of the
+ * software, derivative works or modified versions, and any portions
+ * thereof, and that both notices appear in supporting documentation.
+ *
+ * CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
+ * CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND FOR
+ * ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
+ *
+ * Carnegie Mellon requests users of this software to return to
+ *
+ * Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU
+ * School of Computer Science
+ * Carnegie Mellon University
+ * Pittsburgh PA 15213-3890
+ *
+ * any improvements or extensions that they make and grant Carnegie Mellon
+ * the rights to redistribute these changes.
+ *
+ * $FreeBSD: releng/11.0/sys/amd64/include/db_machdep.h 139731 2005-01-05 20:17:21Z imp $
+ */
+
+#ifndef _MACHINE_DB_MACHDEP_H_
+#define _MACHINE_DB_MACHDEP_H_
+
+#include <machine/frame.h>
+#include <machine/trap.h>
+
+typedef vm_offset_t db_addr_t; /* address - unsigned */
+typedef long db_expr_t; /* expression - signed */
+
+#define PC_REGS() ((db_addr_t)kdb_thrctx->pcb_rip)
+
+#define BKPT_INST 0xcc /* breakpoint instruction */
+#define BKPT_SIZE (1) /* size of breakpoint inst */
+#define BKPT_SET(inst) (BKPT_INST)
+
+#define BKPT_SKIP \
+do { \
+ kdb_frame->tf_rip += 1; \
+ kdb_thrctx->pcb_rip += 1; \
+} while(0)
+
+#define FIXUP_PC_AFTER_BREAK \
+do { \
+ kdb_frame->tf_rip -= 1; \
+ kdb_thrctx->pcb_rip -= 1; \
+} while(0);
+
+#define db_clear_single_step kdb_cpu_clear_singlestep
+#define db_set_single_step kdb_cpu_set_singlestep
+
+#define IS_BREAKPOINT_TRAP(type, code) ((type) == T_BPTFLT)
+/*
+ * Watchpoints are not supported. The debug exception type is in %dr6
+ * and not yet in the args to this macro.
+ */
+#define IS_WATCHPOINT_TRAP(type, code) 0
+
+#define I_CALL 0xe8
+#define I_CALLI 0xff
+#define I_RET 0xc3
+#define I_IRET 0xcf
+
+#define inst_trap_return(ins) (((ins)&0xff) == I_IRET)
+#define inst_return(ins) (((ins)&0xff) == I_RET)
+#define inst_call(ins) (((ins)&0xff) == I_CALL || \
+ (((ins)&0xff) == I_CALLI && \
+ ((ins)&0x3800) == 0x1000))
+#define inst_load(ins) 0
+#define inst_store(ins) 0
+
+/*
+ * There are no interesting addresses below _kstack = 0xefbfe000. There
+ * are small absolute values for GUPROF, but we don't want to see them.
+ * Treat "negative" addresses below _kstack as non-small to allow for
+ * future reductions of _kstack and to avoid sign extension problems.
+ *
+ * There is one interesting symbol above -db_maxoff = 0xffff0000,
+ * namely _APTD = 0xfffff000. Accepting this would mess up the
+ * printing of small negative offsets.
The next largest symbol is + * _APTmap = 0xffc00000. Accepting this is OK (unless db_maxoff is + * set to >= 0x400000 - (max stack offset)). + */ +#define DB_SMALL_VALUE_MAX 0x7fffffff +#define DB_SMALL_VALUE_MIN (-0x400001) + +#endif /* !_MACHINE_DB_MACHDEP_H_ */ diff -u -r -N usr/src/sys/modules/netmap/machine/dump.h /usr/src/sys/modules/netmap/machine/dump.h --- usr/src/sys/modules/netmap/machine/dump.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/dump.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,6 @@ +/*- + * This file is in the public domain. + */ +/* $FreeBSD: releng/11.0/sys/amd64/include/dump.h 276772 2015-01-07 01:01:39Z markj $ */ + +#include <x86/dump.h> diff -u -r -N usr/src/sys/modules/netmap/machine/elf.h /usr/src/sys/modules/netmap/machine/elf.h --- usr/src/sys/modules/netmap/machine/elf.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/elf.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,6 @@ +/*- + * This file is in the public domain. + */ +/* $FreeBSD: releng/11.0/sys/amd64/include/elf.h 247047 2013-02-20 17:39:52Z kib $ */ + +#include <x86/elf.h> diff -u -r -N usr/src/sys/modules/netmap/machine/endian.h /usr/src/sys/modules/netmap/machine/endian.h --- usr/src/sys/modules/netmap/machine/endian.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/endian.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,6 @@ +/*- + * This file is in the public domain. + */ +/* $FreeBSD: releng/11.0/sys/amd64/include/endian.h 232266 2012-02-28 19:39:54Z tijl $ */ + +#include <x86/endian.h> diff -u -r -N usr/src/sys/modules/netmap/machine/exec.h /usr/src/sys/modules/netmap/machine/exec.h --- usr/src/sys/modules/netmap/machine/exec.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/exec.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,38 @@ +/*- + * Copyright (c) 1992, 1993 + * The Regents of the University of California. All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 4. Neither the name of the University nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
+ * + * @(#)exec.h 8.1 (Berkeley) 6/11/93 + * $FreeBSD: releng/11.0/sys/amd64/include/exec.h 142107 2005-02-19 21:16:48Z ru $ + */ + +#ifndef _MACHINE_EXEC_H_ +#define _MACHINE_EXEC_H_ + +#define __LDPGSZ 4096 + +#endif /* !_MACHINE_EXEC_H_ */ diff -u -r -N usr/src/sys/modules/netmap/machine/fdt.h /usr/src/sys/modules/netmap/machine/fdt.h --- usr/src/sys/modules/netmap/machine/fdt.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/fdt.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,6 @@ +/*- + * This file is in the public domain. + */ +/* $FreeBSD: releng/11.0/sys/amd64/include/fdt.h 250840 2013-05-21 03:05:49Z marcel $ */ + +#include <x86/fdt.h> diff -u -r -N usr/src/sys/modules/netmap/machine/float.h /usr/src/sys/modules/netmap/machine/float.h --- usr/src/sys/modules/netmap/machine/float.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/float.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,6 @@ +/*- + * This file is in the public domain. + */ +/* $FreeBSD: releng/11.0/sys/amd64/include/float.h 232491 2012-03-04 14:00:32Z tijl $ */ + +#include <x86/float.h> diff -u -r -N usr/src/sys/modules/netmap/machine/floatingpoint.h /usr/src/sys/modules/netmap/machine/floatingpoint.h --- usr/src/sys/modules/netmap/machine/floatingpoint.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/floatingpoint.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,43 @@ +/*- + * Copyright (c) 1993 Andrew Moore, Talke Studio + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 3. All advertising materials mentioning features or use of this software + * must display the following acknowledgement: + * This product includes software developed by the University of + * California, Berkeley and its contributors. + * 4. Neither the name of the University nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
+ * + * from: @(#) floatingpoint.h 1.0 (Berkeley) 9/23/93 + * $FreeBSD: releng/11.0/sys/amd64/include/floatingpoint.h 144544 2005-04-02 17:31:42Z netchild $ + */ + +#ifndef _FLOATINGPOINT_H_ +#define _FLOATINGPOINT_H_ + +#include <sys/cdefs.h> +#include <machine/ieeefp.h> + +#endif /* !_FLOATINGPOINT_H_ */ diff -u -r -N usr/src/sys/modules/netmap/machine/fpu.h /usr/src/sys/modules/netmap/machine/fpu.h --- usr/src/sys/modules/netmap/machine/fpu.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/fpu.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,92 @@ +/*- + * Copyright (c) 1990 The Regents of the University of California. + * All rights reserved. + * + * This code is derived from software contributed to Berkeley by + * William Jolitz. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 4. Neither the name of the University nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * from: @(#)npx.h 5.3 (Berkeley) 1/18/91 + * $FreeBSD: releng/11.0/sys/amd64/include/fpu.h 271192 2014-09-06 15:23:28Z jhb $ + */ + +/* + * Floating Point Data Structures and Constants + * W. 
Jolitz 1/90 + */ + +#ifndef _MACHINE_FPU_H_ +#define _MACHINE_FPU_H_ + +#include <x86/fpu.h> + +#ifdef _KERNEL + +struct fpu_kern_ctx; + +#define PCB_USER_FPU(pcb) (((pcb)->pcb_flags & PCB_KERNFPU) == 0) + +#define XSAVE_AREA_ALIGN 64 + +void fpudna(void); +void fpudrop(void); +void fpuexit(struct thread *td); +int fpuformat(void); +int fpugetregs(struct thread *td); +void fpuinit(void); +void fpurestore(void *addr); +void fpuresume(void *addr); +void fpusave(void *addr); +int fpusetregs(struct thread *td, struct savefpu *addr, + char *xfpustate, size_t xfpustate_size); +int fpusetxstate(struct thread *td, char *xfpustate, + size_t xfpustate_size); +void fpususpend(void *addr); +int fputrap_sse(void); +int fputrap_x87(void); +void fpuuserinited(struct thread *td); +struct fpu_kern_ctx *fpu_kern_alloc_ctx(u_int flags); +void fpu_kern_free_ctx(struct fpu_kern_ctx *ctx); +int fpu_kern_enter(struct thread *td, struct fpu_kern_ctx *ctx, + u_int flags); +int fpu_kern_leave(struct thread *td, struct fpu_kern_ctx *ctx); +int fpu_kern_thread(u_int flags); +int is_fpu_kern_thread(u_int flags); + +struct savefpu *fpu_save_area_alloc(void); +void fpu_save_area_free(struct savefpu *fsa); +void fpu_save_area_reset(struct savefpu *fsa); + +/* + * Flags for fpu_kern_alloc_ctx(), fpu_kern_enter() and fpu_kern_thread(). + */ +#define FPU_KERN_NORMAL 0x0000 +#define FPU_KERN_NOWAIT 0x0001 +#define FPU_KERN_KTHR 0x0002 + +#endif + +#endif /* !_MACHINE_FPU_H_ */ diff -u -r -N usr/src/sys/modules/netmap/machine/frame.h /usr/src/sys/modules/netmap/machine/frame.h --- usr/src/sys/modules/netmap/machine/frame.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/frame.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,6 @@ +/*- + * This file is in the public domain. + */ +/* $FreeBSD: releng/11.0/sys/amd64/include/frame.h 247047 2013-02-20 17:39:52Z kib $ */ + +#include <x86/frame.h> diff -u -r -N usr/src/sys/modules/netmap/machine/gdb_machdep.h /usr/src/sys/modules/netmap/machine/gdb_machdep.h --- usr/src/sys/modules/netmap/machine/gdb_machdep.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/gdb_machdep.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,52 @@ +/*- + * Copyright (c) 2004 Marcel Moolenaar + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + * + * $FreeBSD: releng/11.0/sys/amd64/include/gdb_machdep.h 166520 2007-02-05 21:48:32Z jhb $ + */ + +#ifndef _MACHINE_GDB_MACHDEP_H_ +#define _MACHINE_GDB_MACHDEP_H_ + +#define GDB_BUFSZ (GDB_NREGS * 16) +#define GDB_NREGS 56 +#define GDB_REG_PC 16 + +static __inline size_t +gdb_cpu_regsz(int regnum) +{ + return ((regnum > 16 && regnum < 24) ? 4 : 8); +} + +static __inline int +gdb_cpu_query(void) +{ + return (0); +} + +void *gdb_cpu_getreg(int, size_t *); +void gdb_cpu_setreg(int, void *); +int gdb_cpu_signal(int, int); + +#endif /* !_MACHINE_GDB_MACHDEP_H_ */ diff -u -r -N usr/src/sys/modules/netmap/machine/ieeefp.h /usr/src/sys/modules/netmap/machine/ieeefp.h --- usr/src/sys/modules/netmap/machine/ieeefp.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/ieeefp.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,308 @@ +/*- + * Copyright (c) 2003 Peter Wemm. + * Copyright (c) 1990 Andrew Moore, Talke Studio + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 3. All advertising materials mentioning features or use of this software + * must display the following acknowledgement: + * This product includes software developed by the University of + * California, Berkeley and its contributors. + * 4. Neither the name of the University nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * from: @(#) ieeefp.h 1.0 (Berkeley) 9/23/93 + * $FreeBSD: releng/11.0/sys/amd64/include/ieeefp.h 226607 2011-10-21 06:41:46Z das $ + */ + +#ifndef _MACHINE_IEEEFP_H_ +#define _MACHINE_IEEEFP_H_ + +/* + * Deprecated historical FPU control interface + * + * IEEE floating point type, constant and function definitions. 
+ * XXX: {FP,SSE}*FLD and {FP,SSE}*OFF are undocumented pollution. + */ + +#ifndef _SYS_CDEFS_H_ +#error this file needs sys/cdefs.h as a prerequisite +#endif + +/* + * Rounding modes. + */ +typedef enum { + FP_RN=0, /* round to nearest */ + FP_RM, /* round down towards minus infinity */ + FP_RP, /* round up towards plus infinity */ + FP_RZ /* truncate */ +} fp_rnd_t; + +/* + * Precision (i.e., rounding precision) modes. + */ +typedef enum { + FP_PS=0, /* 24 bit (single-precision) */ + FP_PRS, /* reserved */ + FP_PD, /* 53 bit (double-precision) */ + FP_PE /* 64 bit (extended-precision) */ +} fp_prec_t; + +#define fp_except_t int + +/* + * Exception bit masks. + */ +#define FP_X_INV 0x01 /* invalid operation */ +#define FP_X_DNML 0x02 /* denormal */ +#define FP_X_DZ 0x04 /* zero divide */ +#define FP_X_OFL 0x08 /* overflow */ +#define FP_X_UFL 0x10 /* underflow */ +#define FP_X_IMP 0x20 /* (im)precision */ +#define FP_X_STK 0x40 /* stack fault */ + +/* + * FPU control word bit-field masks. + */ +#define FP_MSKS_FLD 0x3f /* exception masks field */ +#define FP_PRC_FLD 0x300 /* precision control field */ +#define FP_RND_FLD 0xc00 /* rounding control field */ + +/* + * FPU status word bit-field masks. + */ +#define FP_STKY_FLD 0x3f /* sticky flags field */ + +/* + * SSE mxcsr register bit-field masks. + */ +#define SSE_STKY_FLD 0x3f /* exception flags */ +#define SSE_DAZ_FLD 0x40 /* Denormals are zero */ +#define SSE_MSKS_FLD 0x1f80 /* exception masks field */ +#define SSE_RND_FLD 0x6000 /* rounding control */ +#define SSE_FZ_FLD 0x8000 /* flush to zero on underflow */ + +/* + * FPU control word bit-field offsets (shift counts). + */ +#define FP_MSKS_OFF 0 /* exception masks offset */ +#define FP_PRC_OFF 8 /* precision control offset */ +#define FP_RND_OFF 10 /* rounding control offset */ + +/* + * FPU status word bit-field offsets (shift counts). + */ +#define FP_STKY_OFF 0 /* sticky flags offset */ + +/* + * SSE mxcsr register bit-field offsets (shift counts). + */ +#define SSE_STKY_OFF 0 /* exception flags offset */ +#define SSE_DAZ_OFF 6 /* DAZ exception mask offset */ +#define SSE_MSKS_OFF 7 /* other exception masks offset */ +#define SSE_RND_OFF 13 /* rounding control offset */ +#define SSE_FZ_OFF 15 /* flush to zero offset */ + +#ifdef __GNUCLIKE_ASM + +#define __fldcw(addr) __asm __volatile("fldcw %0" : : "m" (*(addr))) +#define __fldenv(addr) __asm __volatile("fldenv %0" : : "m" (*(addr))) +#define __fnstcw(addr) __asm __volatile("fnstcw %0" : "=m" (*(addr))) +#define __fnstenv(addr) __asm __volatile("fnstenv %0" : "=m" (*(addr))) +#define __fnstsw(addr) __asm __volatile("fnstsw %0" : "=m" (*(addr))) +#define __ldmxcsr(addr) __asm __volatile("ldmxcsr %0" : : "m" (*(addr))) +#define __stmxcsr(addr) __asm __volatile("stmxcsr %0" : "=m" (*(addr))) + +/* + * Load the control word. Be careful not to trap if there is a currently + * unmasked exception (ones that will become freshly unmasked are not a + * problem). This case must be handled by a save/restore of the + * environment or even of the full x87 state. Accessing the environment + * is very inefficient, so only do it when necessary. 
+ */ +static __inline void +__fnldcw(unsigned short _cw, unsigned short _newcw) +{ + struct { + unsigned _cw; + unsigned _other[6]; + } _env; + unsigned short _sw; + + if ((_cw & FP_MSKS_FLD) != FP_MSKS_FLD) { + __fnstsw(&_sw); + if (((_sw & ~_cw) & FP_STKY_FLD) != 0) { + __fnstenv(&_env); + _env._cw = _newcw; + __fldenv(&_env); + return; + } + } + __fldcw(&_newcw); +} + +/* + * General notes about conflicting SSE vs FP status bits. + * This code assumes that software will not fiddle with the control + * bits of the SSE and x87 in such a way to get them out of sync and + * still expect this to work. Break this at your peril. + * Because I based this on the i386 port, the x87 state is used for + * the fpget*() functions, and is shadowed into the SSE state for + * the fpset*() functions. For dual source fpget*() functions, I + * merge the two together. I think. + */ + +static __inline fp_rnd_t +__fpgetround(void) +{ + unsigned short _cw; + + __fnstcw(&_cw); + return ((fp_rnd_t)((_cw & FP_RND_FLD) >> FP_RND_OFF)); +} + +static __inline fp_rnd_t +__fpsetround(fp_rnd_t _m) +{ + fp_rnd_t _p; + unsigned _mxcsr; + unsigned short _cw, _newcw; + + __fnstcw(&_cw); + _p = (fp_rnd_t)((_cw & FP_RND_FLD) >> FP_RND_OFF); + _newcw = _cw & ~FP_RND_FLD; + _newcw |= (_m << FP_RND_OFF) & FP_RND_FLD; + __fnldcw(_cw, _newcw); + __stmxcsr(&_mxcsr); + _mxcsr &= ~SSE_RND_FLD; + _mxcsr |= (_m << SSE_RND_OFF) & SSE_RND_FLD; + __ldmxcsr(&_mxcsr); + return (_p); +} + +/* + * Get or set the rounding precision for x87 arithmetic operations. + * There is no equivalent SSE mode or control. + */ + +static __inline fp_prec_t +__fpgetprec(void) +{ + unsigned short _cw; + + __fnstcw(&_cw); + return ((fp_prec_t)((_cw & FP_PRC_FLD) >> FP_PRC_OFF)); +} + +static __inline fp_prec_t +__fpsetprec(fp_prec_t _m) +{ + fp_prec_t _p; + unsigned short _cw, _newcw; + + __fnstcw(&_cw); + _p = (fp_prec_t)((_cw & FP_PRC_FLD) >> FP_PRC_OFF); + _newcw = _cw & ~FP_PRC_FLD; + _newcw |= (_m << FP_PRC_OFF) & FP_PRC_FLD; + __fnldcw(_cw, _newcw); + return (_p); +} + +/* + * Get or set the exception mask. + * Note that the x87 mask bits are inverted by the API -- a mask bit of 1 + * means disable for x87 and SSE, but for fp*mask() it means enable. + */ + +static __inline fp_except_t +__fpgetmask(void) +{ + unsigned short _cw; + + __fnstcw(&_cw); + return ((~_cw & FP_MSKS_FLD) >> FP_MSKS_OFF); +} + +static __inline fp_except_t +__fpsetmask(fp_except_t _m) +{ + fp_except_t _p; + unsigned _mxcsr; + unsigned short _cw, _newcw; + + __fnstcw(&_cw); + _p = (~_cw & FP_MSKS_FLD) >> FP_MSKS_OFF; + _newcw = _cw & ~FP_MSKS_FLD; + _newcw |= (~_m << FP_MSKS_OFF) & FP_MSKS_FLD; + __fnldcw(_cw, _newcw); + __stmxcsr(&_mxcsr); + /* XXX should we clear non-ieee SSE_DAZ_FLD and SSE_FZ_FLD ? 
*/ + _mxcsr &= ~SSE_MSKS_FLD; + _mxcsr |= (~_m << SSE_MSKS_OFF) & SSE_MSKS_FLD; + __ldmxcsr(&_mxcsr); + return (_p); +} + +static __inline fp_except_t +__fpgetsticky(void) +{ + unsigned _ex, _mxcsr; + unsigned short _sw; + + __fnstsw(&_sw); + _ex = (_sw & FP_STKY_FLD) >> FP_STKY_OFF; + __stmxcsr(&_mxcsr); + _ex |= (_mxcsr & SSE_STKY_FLD) >> SSE_STKY_OFF; + return ((fp_except_t)_ex); +} + +#endif /* __GNUCLIKE_ASM */ + +#if !defined(__IEEEFP_NOINLINES__) && defined(__GNUCLIKE_ASM) + +#define fpgetmask() __fpgetmask() +#define fpgetprec() __fpgetprec() +#define fpgetround() __fpgetround() +#define fpgetsticky() __fpgetsticky() +#define fpsetmask(m) __fpsetmask(m) +#define fpsetprec(m) __fpsetprec(m) +#define fpsetround(m) __fpsetround(m) + +#else /* !(!__IEEEFP_NOINLINES__ && __GNUCLIKE_ASM) */ + +/* Augment the userland declarations. */ +__BEGIN_DECLS +extern fp_rnd_t fpgetround(void); +extern fp_rnd_t fpsetround(fp_rnd_t); +extern fp_except_t fpgetmask(void); +extern fp_except_t fpsetmask(fp_except_t); +extern fp_except_t fpgetsticky(void); +extern fp_except_t fpsetsticky(fp_except_t); +fp_prec_t fpgetprec(void); +fp_prec_t fpsetprec(fp_prec_t); +__END_DECLS + +#endif /* !__IEEEFP_NOINLINES__ && __GNUCLIKE_ASM */ + +#endif /* !_MACHINE_IEEEFP_H_ */ diff -u -r -N usr/src/sys/modules/netmap/machine/in_cksum.h /usr/src/sys/modules/netmap/machine/in_cksum.h --- usr/src/sys/modules/netmap/machine/in_cksum.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/in_cksum.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,84 @@ +/*- + * Copyright (c) 1990 The Regents of the University of California. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 3. Neither the name of the University nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
+ *
+ * from tahoe: in_cksum.c 1.2 86/01/05
+ * from: @(#)in_cksum.c 1.3 (Berkeley) 1/19/91
+ * from: Id: in_cksum.c,v 1.8 1995/12/03 18:35:19 bde Exp
+ * $FreeBSD: releng/11.0/sys/amd64/include/in_cksum.h 286336 2015-08-05 19:05:12Z emaste $
+ */
+
+#ifndef _MACHINE_IN_CKSUM_H_
+#define _MACHINE_IN_CKSUM_H_ 1
+
+#ifndef _SYS_CDEFS_H_
+#error this file needs sys/cdefs.h as a prerequisite
+#endif
+
+#include <sys/cdefs.h>
+
+#define in_cksum(m, len) in_cksum_skip(m, len, 0)
+
+#if defined(IPVERSION) && (IPVERSION == 4)
+/*
+ * It is useful to have an Internet checksum routine which is inlineable
+ * and optimized specifically for the task of computing IP header checksums
+ * in the normal case (where there are no options and the header length is
+ * therefore always exactly five 32-bit words).
+ */
+#ifdef __CC_SUPPORTS___INLINE
+
+static __inline void
+in_cksum_update(struct ip *ip)
+{
+ int __tmpsum;
+ __tmpsum = (int)ntohs(ip->ip_sum) + 256;
+ ip->ip_sum = htons(__tmpsum + (__tmpsum >> 16));
+}
+
+#else
+
+#define in_cksum_update(ip) \
+ do { \
+ int __tmpsum; \
+ __tmpsum = (int)ntohs(ip->ip_sum) + 256; \
+ ip->ip_sum = htons(__tmpsum + (__tmpsum >> 16)); \
+ } while(0)
+
+#endif
+#endif
+
+#ifdef _KERNEL
+#if defined(IPVERSION) && (IPVERSION == 4)
+u_int in_cksum_hdr(const struct ip *ip);
+#endif
+u_short in_addword(u_short sum, u_short b);
+u_short in_pseudo(u_int sum, u_int b, u_int c);
+u_short in_cksum_skip(struct mbuf *m, int len, int skip);
+#endif
+
+#endif /* _MACHINE_IN_CKSUM_H_ */
diff -u -r -N usr/src/sys/modules/netmap/machine/intr_machdep.h /usr/src/sys/modules/netmap/machine/intr_machdep.h
--- usr/src/sys/modules/netmap/machine/intr_machdep.h 1970-01-01 01:00:00.000000000 +0100
+++ /usr/src/sys/modules/netmap/machine/intr_machdep.h 2016-09-29 00:24:54.000000000 +0100
@@ -0,0 +1,194 @@
+/*-
+ * Copyright (c) 2003 John Baldwin <jhb@FreeBSD.org>
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * $FreeBSD: releng/11.0/sys/amd64/include/intr_machdep.h 302895 2016-07-15 09:44:48Z royger $
+ */
+
+#ifndef __MACHINE_INTR_MACHDEP_H__
+#define __MACHINE_INTR_MACHDEP_H__
+
+#ifdef _KERNEL
+
+/*
+ * The maximum number of I/O interrupts we allow. This number is rather
+ * arbitrary as it is just the maximum IRQ resource value.
The interrupt + * source for a given IRQ maps that I/O interrupt to device interrupt + * source whether it be a pin on an interrupt controller or an MSI interrupt. + * The 16 ISA IRQs are assigned fixed IDT vectors, but all other device + * interrupts allocate IDT vectors on demand. Currently we have 191 IDT + * vectors available for device interrupts. On many systems with I/O APICs, + * a lot of the IRQs are not used, so this number can be much larger than + * 191 and still be safe since only interrupt sources in actual use will + * allocate IDT vectors. + * + * The first 255 IRQs (0 - 254) are reserved for ISA IRQs and PCI intline IRQs. + * IRQ values from 256 to 767 are used by MSI. When running under the Xen + * Hypervisor, IRQ values from 768 to 4863 are available for binding to + * event channel events. We leave 255 unused to avoid confusion since 255 is + * used in PCI to indicate an invalid IRQ. + */ +#define NUM_MSI_INTS 512 +#define FIRST_MSI_INT 256 +#ifdef XENHVM +#include <xen/xen-os.h> +#include <xen/interface/event_channel.h> +#define NUM_EVTCHN_INTS NR_EVENT_CHANNELS +#define FIRST_EVTCHN_INT \ + (FIRST_MSI_INT + NUM_MSI_INTS) +#define LAST_EVTCHN_INT \ + (FIRST_EVTCHN_INT + NUM_EVTCHN_INTS - 1) +#else +#define NUM_EVTCHN_INTS 0 +#endif +#define NUM_IO_INTS (FIRST_MSI_INT + NUM_MSI_INTS + NUM_EVTCHN_INTS) + +/* + * Default base address for MSI messages on x86 platforms. + */ +#define MSI_INTEL_ADDR_BASE 0xfee00000 + +/* + * - 1 ??? dummy counter. + * - 2 counters for each I/O interrupt. + * - 1 counter for each CPU for lapic timer. + * - 8 counters for each CPU for IPI counters for SMP. + */ +#ifdef SMP +#define INTRCNT_COUNT (1 + NUM_IO_INTS * 2 + (1 + 8) * MAXCPU) +#else +#define INTRCNT_COUNT (1 + NUM_IO_INTS * 2 + 1) +#endif + +#ifndef LOCORE + +typedef void inthand_t(void); + +#define IDTVEC(name) __CONCAT(X,name) + +struct intsrc; + +/* + * Methods that a PIC provides to mask/unmask a given interrupt source, + * "turn on" the interrupt on the CPU side by setting up an IDT entry, and + * return the vector associated with this source. + */ +struct pic { + void (*pic_enable_source)(struct intsrc *); + void (*pic_disable_source)(struct intsrc *, int); + void (*pic_eoi_source)(struct intsrc *); + void (*pic_enable_intr)(struct intsrc *); + void (*pic_disable_intr)(struct intsrc *); + int (*pic_vector)(struct intsrc *); + int (*pic_source_pending)(struct intsrc *); + void (*pic_suspend)(struct pic *); + void (*pic_resume)(struct pic *, bool suspend_cancelled); + int (*pic_config_intr)(struct intsrc *, enum intr_trigger, + enum intr_polarity); + int (*pic_assign_cpu)(struct intsrc *, u_int apic_id); + void (*pic_reprogram_pin)(struct intsrc *); + TAILQ_ENTRY(pic) pics; +}; + +/* Flags for pic_disable_source() */ +enum { + PIC_EOI, + PIC_NO_EOI, +}; + +/* + * An interrupt source. The upper-layer code uses the PIC methods to + * control a given source. The lower-layer PIC drivers can store additional + * private data in a given interrupt source such as an interrupt pin number + * or an I/O APIC pointer. + */ +struct intsrc { + struct pic *is_pic; + struct intr_event *is_event; + u_long *is_count; + u_long *is_straycount; + u_int is_index; + u_int is_handlers; +}; + +struct trapframe; + +/* + * The following data structure holds per-cpu data, and is placed just + * above the top of the space used for the NMI stack. 
+ */
+struct nmi_pcpu {
+ register_t np_pcpu;
+ register_t __padding; /* pad to 16 bytes */
+};
+
+#ifdef SMP
+extern cpuset_t intr_cpus;
+#endif
+extern struct mtx icu_lock;
+extern int elcr_found;
+
+extern int msix_disable_migration;
+
+#ifndef DEV_ATPIC
+void atpic_reset(void);
+#endif
+/* XXX: The elcr_* prototypes probably belong somewhere else. */
+int elcr_probe(void);
+enum intr_trigger elcr_read_trigger(u_int irq);
+void elcr_resume(void);
+void elcr_write_trigger(u_int irq, enum intr_trigger trigger);
+#ifdef SMP
+void intr_add_cpu(u_int cpu);
+#endif
+int intr_add_handler(const char *name, int vector, driver_filter_t filter,
+ driver_intr_t handler, void *arg, enum intr_type flags,
+ void **cookiep);
+#ifdef SMP
+int intr_bind(u_int vector, u_char cpu);
+#endif
+int intr_config_intr(int vector, enum intr_trigger trig,
+ enum intr_polarity pol);
+int intr_describe(u_int vector, void *ih, const char *descr);
+void intr_execute_handlers(struct intsrc *isrc, struct trapframe *frame);
+u_int intr_next_cpu(void);
+struct intsrc *intr_lookup_source(int vector);
+int intr_register_pic(struct pic *pic);
+int intr_register_source(struct intsrc *isrc);
+int intr_remove_handler(void *cookie);
+void intr_resume(bool suspend_cancelled);
+void intr_suspend(void);
+void intr_reprogram(void);
+void intrcnt_add(const char *name, u_long **countp);
+void nexus_add_irq(u_long irq);
+int msi_alloc(device_t dev, int count, int maxcount, int *irqs);
+void msi_init(void);
+int msi_map(int irq, uint64_t *addr, uint32_t *data);
+int msi_release(int *irqs, int count);
+int msix_alloc(device_t dev, int *irq);
+int msix_release(int irq);
+
+#endif /* !LOCORE */
+#endif /* _KERNEL */
+#endif /* !__MACHINE_INTR_MACHDEP_H__ */
diff -u -r -N usr/src/sys/modules/netmap/machine/iodev.h /usr/src/sys/modules/netmap/machine/iodev.h
--- usr/src/sys/modules/netmap/machine/iodev.h 1970-01-01 01:00:00.000000000 +0100
+++ /usr/src/sys/modules/netmap/machine/iodev.h 2016-09-29 00:24:54.000000000 +0100
@@ -0,0 +1,46 @@
+/*-
+ * Copyright (c) 2004 Mark R V Murray
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer
+ * in this position and unchanged.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHORS ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+ * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * $FreeBSD: releng/11.0/sys/amd64/include/iodev.h 207329 2010-04-28 15:38:01Z attilio $
+ */
+#ifndef _MACHINE_IODEV_H_
+#define _MACHINE_IODEV_H_
+
+#ifdef _KERNEL
+#include <machine/cpufunc.h>
+
+#define iodev_read_1 inb
+#define iodev_read_2 inw
+#define iodev_read_4 inl
+#define iodev_write_1 outb
+#define iodev_write_2 outw
+#define iodev_write_4 outl
+
+int iodev_open(struct thread *td);
+int iodev_close(struct thread *td);
+int iodev_ioctl(u_long cmd, caddr_t data);
+
+#endif /* _KERNEL */
+#endif /* _MACHINE_IODEV_H_ */
diff -u -r -N usr/src/sys/modules/netmap/machine/kdb.h /usr/src/sys/modules/netmap/machine/kdb.h
--- usr/src/sys/modules/netmap/machine/kdb.h 1970-01-01 01:00:00.000000000 +0100
+++ /usr/src/sys/modules/netmap/machine/kdb.h 2016-09-29 00:24:54.000000000 +0100
@@ -0,0 +1,59 @@
+/*-
+ * Copyright (c) 2004 Marcel Moolenaar
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+ * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * $FreeBSD: releng/11.0/sys/amd64/include/kdb.h 170473 2007-06-09 21:55:17Z marcel $
+ */
+
+#ifndef _MACHINE_KDB_H_
+#define _MACHINE_KDB_H_
+
+#include <machine/frame.h>
+#include <machine/psl.h>
+
+#define KDB_STOPPEDPCB(pc) &stoppcbs[pc->pc_cpuid]
+
+static __inline void
+kdb_cpu_clear_singlestep(void)
+{
+ kdb_frame->tf_rflags &= ~PSL_T;
+}
+
+static __inline void
+kdb_cpu_set_singlestep(void)
+{
+ kdb_frame->tf_rflags |= PSL_T;
+}
+
+static __inline void
+kdb_cpu_sync_icache(unsigned char *addr, size_t size)
+{
+}
+
+static __inline void
+kdb_cpu_trap(int type, int code)
+{
+}
+
+#endif /* _MACHINE_KDB_H_ */
diff -u -r -N usr/src/sys/modules/netmap/machine/limits.h /usr/src/sys/modules/netmap/machine/limits.h
--- usr/src/sys/modules/netmap/machine/limits.h 1970-01-01 01:00:00.000000000 +0100
+++ /usr/src/sys/modules/netmap/machine/limits.h 2016-09-29 00:24:54.000000000 +0100
@@ -0,0 +1,44 @@
+/*-
+ * Copyright (c) 1988, 1993
+ * The Regents of the University of California. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * 4. Neither the name of the University nor the names of its contributors
+ * may be used to endorse or promote products derived from this software
+ * without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * @(#)limits.h 8.3 (Berkeley) 1/4/94
+ * $FreeBSD: releng/11.0/sys/amd64/include/limits.h 143063 2005-03-02 21:33:29Z joerg $
+ */
+
+#ifndef _MACHINE_LIMITS_H_
+#define _MACHINE_LIMITS_H_
+
+#include <sys/cdefs.h>
+
+#ifdef __CC_SUPPORTS_WARNING
+#warning "machine/limits.h is deprecated. Include sys/limits.h instead."
+#endif
+
+#include <sys/limits.h>
+
+#endif /* !_MACHINE_LIMITS_H_ */
diff -u -r -N usr/src/sys/modules/netmap/machine/md_var.h /usr/src/sys/modules/netmap/machine/md_var.h
--- usr/src/sys/modules/netmap/machine/md_var.h 1970-01-01 01:00:00.000000000 +0100
+++ /usr/src/sys/modules/netmap/machine/md_var.h 2016-09-29 00:24:54.000000000 +0100
@@ -0,0 +1,63 @@
+/*-
+ * Copyright (c) 1995 Bruce D. Evans.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of the author nor the names of contributors
+ * may be used to endorse or promote products derived from this software
+ * without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * $FreeBSD: releng/11.0/sys/amd64/include/md_var.h 297399 2016-03-29 19:56:48Z kib $
+ */
+
+#ifndef _MACHINE_MD_VAR_H_
+#define _MACHINE_MD_VAR_H_
+
+#include <x86/x86_var.h>
+
+extern uint64_t *vm_page_dump;
+
+struct savefpu;
+
+void amd64_db_resume_dbreg(void);
+void amd64_syscall(struct thread *td, int traced);
+void doreti_iret(void) __asm(__STRING(doreti_iret));
+void doreti_iret_fault(void) __asm(__STRING(doreti_iret_fault));
+void ld_ds(void) __asm(__STRING(ld_ds));
+void ld_es(void) __asm(__STRING(ld_es));
+void ld_fs(void) __asm(__STRING(ld_fs));
+void ld_gs(void) __asm(__STRING(ld_gs));
+void ld_fsbase(void) __asm(__STRING(ld_fsbase));
+void ld_gsbase(void) __asm(__STRING(ld_gsbase));
+void ds_load_fault(void) __asm(__STRING(ds_load_fault));
+void es_load_fault(void) __asm(__STRING(es_load_fault));
+void fs_load_fault(void) __asm(__STRING(fs_load_fault));
+void gs_load_fault(void) __asm(__STRING(gs_load_fault));
+void fsbase_load_fault(void) __asm(__STRING(fsbase_load_fault));
+void gsbase_load_fault(void) __asm(__STRING(gsbase_load_fault));
+void fpstate_drop(struct thread *td);
+void pagezero(void *addr);
+void setidt(int idx, alias_for_inthand_t *func, int typ, int dpl, int ist);
+struct savefpu *get_pcb_user_save_td(struct thread *td);
+struct savefpu *get_pcb_user_save_pcb(struct pcb *pcb);
+
+#endif /* !_MACHINE_MD_VAR_H_ */
diff -u -r -N usr/src/sys/modules/netmap/machine/memdev.h /usr/src/sys/modules/netmap/machine/memdev.h
--- usr/src/sys/modules/netmap/machine/memdev.h 1970-01-01 01:00:00.000000000 +0100
+++ /usr/src/sys/modules/netmap/machine/memdev.h 2016-09-29 00:24:54.000000000 +0100
@@ -0,0 +1,40 @@
+/*-
+ * Copyright (c) 2004 Mark R V Murray
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer
+ * in this position and unchanged.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHORS ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+ * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * $FreeBSD: releng/11.0/sys/amd64/include/memdev.h 217515 2011-01-17 22:58:28Z jkim $
+ */
+
+#ifndef _MACHINE_MEMDEV_H_
+#define _MACHINE_MEMDEV_H_
+
+#define CDEV_MINOR_MEM 0
+#define CDEV_MINOR_KMEM 1
+
+d_open_t memopen;
+d_read_t memrw;
+d_ioctl_t memioctl;
+d_mmap_t memmmap;
+
+#endif /* _MACHINE_MEMDEV_H_ */
diff -u -r -N usr/src/sys/modules/netmap/machine/metadata.h /usr/src/sys/modules/netmap/machine/metadata.h
--- usr/src/sys/modules/netmap/machine/metadata.h 1970-01-01 01:00:00.000000000 +0100
+++ /usr/src/sys/modules/netmap/machine/metadata.h 2016-09-29 00:24:54.000000000 +0100
@@ -0,0 +1,6 @@
+/*-
+ * This file is in the public domain.
+ */
+/* $FreeBSD: releng/11.0/sys/amd64/include/metadata.h 293343 2016-01-07 19:47:26Z emaste $ */
+
+#include <x86/metadata.h>
diff -u -r -N usr/src/sys/modules/netmap/machine/minidump.h /usr/src/sys/modules/netmap/machine/minidump.h
--- usr/src/sys/modules/netmap/machine/minidump.h 1970-01-01 01:00:00.000000000 +0100
+++ /usr/src/sys/modules/netmap/machine/minidump.h 2016-09-29 00:24:54.000000000 +0100
@@ -0,0 +1,46 @@
+/*-
+ * Copyright (c) 2006 Peter Wemm
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+ * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * $FreeBSD: releng/11.0/sys/amd64/include/minidump.h 215133 2010-11-11 18:35:28Z avg $
+ */
+
+#ifndef _MACHINE_MINIDUMP_H_
+#define _MACHINE_MINIDUMP_H_ 1
+
+#define MINIDUMP_MAGIC "minidump FreeBSD/amd64"
+#define MINIDUMP_VERSION 2
+
+struct minidumphdr {
+ char magic[24];
+ uint32_t version;
+ uint32_t msgbufsize;
+ uint32_t bitmapsize;
+ uint32_t pmapsize;
+ uint64_t kernbase;
+ uint64_t dmapbase;
+ uint64_t dmapend;
+};
+
+#endif /* _MACHINE_MINIDUMP_H_ */
diff -u -r -N usr/src/sys/modules/netmap/machine/mp_watchdog.h /usr/src/sys/modules/netmap/machine/mp_watchdog.h
--- usr/src/sys/modules/netmap/machine/mp_watchdog.h 1970-01-01 01:00:00.000000000 +0100
+++ /usr/src/sys/modules/netmap/machine/mp_watchdog.h 2016-09-29 00:24:54.000000000 +0100
@@ -0,0 +1,34 @@
+/*-
+ * Copyright (c) 2004 Robert N. M. Watson
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * $FreeBSD: releng/11.0/sys/amd64/include/mp_watchdog.h 133759 2004-08-15 18:02:09Z rwatson $
+ */
+
+#ifndef _MACHINE_MP_WATCHDOG_H_
+#define _MACHINE_MP_WATCHDOG_H_
+
+void ap_watchdog(u_int cpuid);
+
+#endif /* !_MACHINE_MP_WATCHDOG_H_ */
diff -u -r -N usr/src/sys/modules/netmap/machine/nexusvar.h /usr/src/sys/modules/netmap/machine/nexusvar.h
--- usr/src/sys/modules/netmap/machine/nexusvar.h 1970-01-01 01:00:00.000000000 +0100
+++ /usr/src/sys/modules/netmap/machine/nexusvar.h 2016-09-29 00:24:54.000000000 +0100
@@ -0,0 +1,45 @@
+/*-
+ * Copyright 1998 Massachusetts Institute of Technology
+ *
+ * Permission to use, copy, modify, and distribute this software and
+ * its documentation for any purpose and without fee is hereby
+ * granted, provided that both the above copyright notice and this
+ * permission notice appear in all copies, that both the above
+ * copyright notice and this permission notice appear in all
+ * supporting documentation, and that the name of M.I.T. not be used
+ * in advertising or publicity pertaining to distribution of the
+ * software without specific, written prior permission. M.I.T. makes
+ * no representations about the suitability of this software for any
+ * purpose. It is provided "as is" without express or implied
+ * warranty.
+ *
+ * THIS SOFTWARE IS PROVIDED BY M.I.T. ``AS IS''. M.I.T. DISCLAIMS
+ * ALL EXPRESS OR IMPLIED WARRANTIES WITH REGARD TO THIS SOFTWARE,
+ * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT
+ * SHALL M.I.T. BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+ * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+ * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+ * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * $FreeBSD: releng/11.0/sys/amd64/include/nexusvar.h 177157 2008-03-13 20:39:04Z jhb $
+ */
+
+#ifndef _MACHINE_NEXUSVAR_H_
+#define _MACHINE_NEXUSVAR_H_
+
+struct nexus_device {
+ struct resource_list nx_resources;
+};
+
+DECLARE_CLASS(nexus_driver);
+
+extern struct rman irq_rman, drq_rman, port_rman, mem_rman;
+
+void nexus_init_resources(void);
+
+#endif /* !_MACHINE_NEXUSVAR_H_ */
diff -u -r -N usr/src/sys/modules/netmap/machine/npx.h /usr/src/sys/modules/netmap/machine/npx.h
--- usr/src/sys/modules/netmap/machine/npx.h 1970-01-01 01:00:00.000000000 +0100
+++ /usr/src/sys/modules/netmap/machine/npx.h 2016-09-29 00:24:54.000000000 +0100
@@ -0,0 +1,6 @@
+/*-
+ * This file is in the public domain.
+ */
+/* $FreeBSD: releng/11.0/sys/amd64/include/npx.h 233044 2012-03-16 20:24:30Z tijl $ */
+
+#include <x86/fpu.h>
diff -u -r -N usr/src/sys/modules/netmap/machine/ofw_machdep.h /usr/src/sys/modules/netmap/machine/ofw_machdep.h
--- usr/src/sys/modules/netmap/machine/ofw_machdep.h 1970-01-01 01:00:00.000000000 +0100
+++ /usr/src/sys/modules/netmap/machine/ofw_machdep.h 2016-09-29 00:24:54.000000000 +0100
@@ -0,0 +1,6 @@
+/*-
+ * This file is in the public domain.
+ */
+/* $FreeBSD: releng/11.0/sys/amd64/include/ofw_machdep.h 250840 2013-05-21 03:05:49Z marcel $ */
+
+#include <x86/ofw_machdep.h>
diff -u -r -N usr/src/sys/modules/netmap/machine/param.h /usr/src/sys/modules/netmap/machine/param.h
--- usr/src/sys/modules/netmap/machine/param.h 1970-01-01 01:00:00.000000000 +0100
+++ /usr/src/sys/modules/netmap/machine/param.h 2016-09-29 00:24:54.000000000 +0100
@@ -0,0 +1,155 @@
+/*-
+ * Copyright (c) 2002 David E. O'Brien. All rights reserved.
+ * Copyright (c) 1992, 1993
+ * The Regents of the University of California. All rights reserved.
+ *
+ * This code is derived from software contributed to Berkeley by
+ * the Systems Programming Group of the University of Utah Computer
+ * Science Department and Ralph Campbell.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * 3. All advertising materials mentioning features or use of this software
+ * must display the following acknowledgement:
+ * This product includes software developed by the University of
+ * California, Berkeley and its contributors.
+ * 4. Neither the name of the University nor the names of its contributors
+ * may be used to endorse or promote products derived from this software
+ * without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * @(#)param.h 8.1 (Berkeley) 6/10/93
+ * $FreeBSD: releng/11.0/sys/amd64/include/param.h 297877 2016-04-12 21:23:44Z jhb $
+ */
+
+
+#ifndef _AMD64_INCLUDE_PARAM_H_
+#define _AMD64_INCLUDE_PARAM_H_
+
+#include <machine/_align.h>
+
+/*
+ * Machine dependent constants for AMD64.
+ */
+
+
+#define __HAVE_ACPI
+#define __PCI_REROUTE_INTERRUPT
+
+#ifndef MACHINE
+#define MACHINE "amd64"
+#endif
+#ifndef MACHINE_ARCH
+#define MACHINE_ARCH "amd64"
+#endif
+#ifndef MACHINE_ARCH32
+#define MACHINE_ARCH32 "i386"
+#endif
+
+#if defined(SMP) || defined(KLD_MODULE)
+#ifndef MAXCPU
+#define MAXCPU 256
+#endif
+#else
+#define MAXCPU 1
+#endif
+
+#ifndef MAXMEMDOM
+#define MAXMEMDOM 8
+#endif
+
+#define ALIGNBYTES _ALIGNBYTES
+#define ALIGN(p) _ALIGN(p)
+/*
+ * ALIGNED_POINTER is a boolean macro that checks whether an address
+ * is valid to fetch data elements of type t from on this architecture.
+ * This does not reflect the optimal alignment, just the possibility
+ * (within reasonable limits).
+ */
+#define ALIGNED_POINTER(p, t) 1
+
+/*
+ * CACHE_LINE_SIZE is the compile-time maximum cache line size for an
+ * architecture. It should be used with appropriate caution.
+ */
+#define CACHE_LINE_SHIFT 7
+#define CACHE_LINE_SIZE (1 << CACHE_LINE_SHIFT)
+
+/* Size of the level 1 page table units */
+#define NPTEPG (PAGE_SIZE/(sizeof (pt_entry_t)))
+#define NPTEPGSHIFT 9 /* LOG2(NPTEPG) */
+#define PAGE_SHIFT 12 /* LOG2(PAGE_SIZE) */
+#define PAGE_SIZE (1<<PAGE_SHIFT) /* bytes/page */
+#define PAGE_MASK (PAGE_SIZE-1)
+/* Size of the level 2 page directory units */
+#define NPDEPG (PAGE_SIZE/(sizeof (pd_entry_t)))
+#define NPDEPGSHIFT 9 /* LOG2(NPDEPG) */
+#define PDRSHIFT 21 /* LOG2(NBPDR) */
+#define NBPDR (1<<PDRSHIFT) /* bytes/page dir */
+#define PDRMASK (NBPDR-1)
+/* Size of the level 3 page directory pointer table units */
+#define NPDPEPG (PAGE_SIZE/(sizeof (pdp_entry_t)))
+#define NPDPEPGSHIFT 9 /* LOG2(NPDPEPG) */
+#define PDPSHIFT 30 /* LOG2(NBPDP) */
+#define NBPDP (1<<PDPSHIFT) /* bytes/page dir ptr table */
+#define PDPMASK (NBPDP-1)
+/* Size of the level 4 page-map level-4 table units */
+#define NPML4EPG (PAGE_SIZE/(sizeof (pml4_entry_t)))
+#define NPML4EPGSHIFT 9 /* LOG2(NPML4EPG) */
+#define PML4SHIFT 39 /* LOG2(NBPML4) */
+#define NBPML4 (1UL<<PML4SHIFT)/* bytes/page map lev4 table */
+#define PML4MASK (NBPML4-1)
+
+#define MAXPAGESIZES 3 /* maximum number of supported page sizes */
+
+#define IOPAGES 2 /* pages of i/o permission bitmap */
+/*
+ * I/O permission bitmap has a bit for each I/O port plus an additional
+ * byte at the end with all bits set. See section "I/O Permission Bit Map"
+ * in the Intel SDM for more details.
+ */
+#define IOPERM_BITMAP_SIZE (IOPAGES * PAGE_SIZE + 1)
+
+#ifndef KSTACK_PAGES
+#define KSTACK_PAGES 4 /* pages of kstack (with pcb) */
+#endif
+#define KSTACK_GUARD_PAGES 1 /* pages of kstack guard; 0 disables */
+
+/*
+ * Mach derived conversion macros
+ */
+#define round_page(x) ((((unsigned long)(x)) + PAGE_MASK) & ~(PAGE_MASK))
+#define trunc_page(x) ((unsigned long)(x) & ~(PAGE_MASK))
+#define trunc_2mpage(x) ((unsigned long)(x) & ~PDRMASK)
+#define round_2mpage(x) ((((unsigned long)(x)) + PDRMASK) & ~PDRMASK)
+#define trunc_1gpage(x) ((unsigned long)(x) & ~PDPMASK)
+
+#define atop(x) ((unsigned long)(x) >> PAGE_SHIFT)
+#define ptoa(x) ((unsigned long)(x) << PAGE_SHIFT)
+
+#define amd64_btop(x) ((unsigned long)(x) >> PAGE_SHIFT)
+#define amd64_ptob(x) ((unsigned long)(x) << PAGE_SHIFT)
+
+#define pgtok(x) ((unsigned long)(x) * (PAGE_SIZE / 1024))
+
+#define INKERNEL(va) (((va) >= DMAP_MIN_ADDRESS && (va) < DMAP_MAX_ADDRESS) \
+ || ((va) >= VM_MIN_KERNEL_ADDRESS && (va) < VM_MAX_KERNEL_ADDRESS))
+
+#endif /* !_AMD64_INCLUDE_PARAM_H_ */
diff -u -r -N usr/src/sys/modules/netmap/machine/pc/bios.h /usr/src/sys/modules/netmap/machine/pc/bios.h
--- usr/src/sys/modules/netmap/machine/pc/bios.h 1970-01-01 01:00:00.000000000 +0100
+++ /usr/src/sys/modules/netmap/machine/pc/bios.h 2016-09-29 00:24:54.000000000 +0100
@@ -0,0 +1,121 @@
+/*-
+ * Copyright (c) 1997 Michael Smith
+ * Copyright (c) 1998 Jonathan Lemon
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * $FreeBSD: releng/11.0/sys/amd64/include/pc/bios.h 270828 2014-08-29 21:25:47Z jhb $
+ */
+
+#ifndef _MACHINE_PC_BIOS_H_
+#define _MACHINE_PC_BIOS_H_
+
+/*
+ * Int 15:E820 'SMAP' structure
+ */
+#define SMAP_SIG 0x534D4150 /* 'SMAP' */
+
+#define SMAP_TYPE_MEMORY 1
+#define SMAP_TYPE_RESERVED 2
+#define SMAP_TYPE_ACPI_RECLAIM 3
+#define SMAP_TYPE_ACPI_NVS 4
+#define SMAP_TYPE_ACPI_ERROR 5
+
+#define SMAP_XATTR_ENABLED 0x00000001
+#define SMAP_XATTR_NON_VOLATILE 0x00000002
+#define SMAP_XATTR_MASK (SMAP_XATTR_ENABLED | SMAP_XATTR_NON_VOLATILE)
+
+struct bios_smap {
+ u_int64_t base;
+ u_int64_t length;
+ u_int32_t type;
+} __packed;
+
+/* Structure extended to include extended attribute field in ACPI 3.0. */
+struct bios_smap_xattr {
+ u_int64_t base;
+ u_int64_t length;
+ u_int32_t type;
+ u_int32_t xattr;
+} __packed;
+
+/*
+ * System Management BIOS
+ */
+#define SMBIOS_START 0xf0000
+#define SMBIOS_STEP 0x10
+#define SMBIOS_OFF 0
+#define SMBIOS_LEN 4
+#define SMBIOS_SIG "_SM_"
+
+struct smbios_eps {
+ uint8_t anchor_string[4]; /* '_SM_' */
+ uint8_t checksum;
+ uint8_t length;
+ uint8_t major_version;
+ uint8_t minor_version;
+ uint16_t maximum_structure_size;
+ uint8_t entry_point_revision;
+ uint8_t formatted_area[5];
+ uint8_t intermediate_anchor_string[5]; /* '_DMI_' */
+ uint8_t intermediate_checksum;
+ uint16_t structure_table_length;
+ uint32_t structure_table_address;
+ uint16_t number_structures;
+ uint8_t BCD_revision;
+};
+
+struct smbios_structure_header {
+ uint8_t type;
+ uint8_t length;
+ uint16_t handle;
+};
+
+#ifdef _KERNEL
+#define BIOS_PADDRTOVADDR(x) ((x) + KERNBASE)
+#define BIOS_VADDRTOPADDR(x) ((x) - KERNBASE)
+
+struct bios_oem_signature {
+ char * anchor; /* search anchor string in BIOS memory */
+ size_t offset; /* offset from anchor (may be negative) */
+ size_t totlen; /* total length of BIOS string to copy */
+} __packed;
+
+struct bios_oem_range {
+ u_int from; /* shouldn't be below 0xe0000 */
+ u_int to; /* shouldn't be above 0xfffff */
+} __packed;
+
+struct bios_oem {
+ struct bios_oem_range range;
+ struct bios_oem_signature signature[];
+} __packed;
+
+int bios_oem_strings(struct bios_oem *oem, u_char *buffer, size_t maxlen);
+uint32_t bios_sigsearch(uint32_t start, u_char *sig, int siglen, int paralen,
+ int sigofs);
+void bios_add_smap_entries(struct bios_smap *smapbase, u_int32_t smapsize,
+ vm_paddr_t *physmap, int *physmap_idx);
+#endif
+
+#endif /* _MACHINE_PC_BIOS_H_ */
diff -u -r -N usr/src/sys/modules/netmap/machine/pc/display.h /usr/src/sys/modules/netmap/machine/pc/display.h
--- usr/src/sys/modules/netmap/machine/pc/display.h 1970-01-01 01:00:00.000000000 +0100
+++ /usr/src/sys/modules/netmap/machine/pc/display.h 2016-09-29 00:24:54.000000000 +0100
@@ -0,0 +1,45 @@
+/*
+ * IBM PC display definitions
+ *
+ * $FreeBSD: releng/11.0/sys/amd64/include/pc/display.h 139730 2005-01-05 20:11:13Z imp $
+ */
+
+/* Color attributes for foreground text */
+
+#define FG_BLACK 0
+#define FG_BLUE 1
+#define FG_GREEN 2
+#define FG_CYAN 3
+#define FG_RED 4
+#define FG_MAGENTA 5
+#define FG_BROWN 6
+#define FG_LIGHTGREY 7
+#define FG_DARKGREY 8
+#define FG_LIGHTBLUE 9
+#define FG_LIGHTGREEN 10
+#define FG_LIGHTCYAN 11
+#define FG_LIGHTRED 12
+#define FG_LIGHTMAGENTA 13
+#define FG_YELLOW 14
+#define FG_WHITE 15
+#define FG_BLINK 0x80
+
+/* Color attributes for text background */
+
+#define BG_BLACK 0x00
+#define BG_BLUE 0x10
+#define BG_GREEN 0x20
+#define BG_CYAN 0x30
+#define BG_RED 0x40
+#define BG_MAGENTA 0x50
+#define BG_BROWN 0x60
+#define BG_LIGHTGREY 0x70
+
+/* Monochrome attributes for foreground text */
+
+#define FG_UNDERLINE 0x01
+#define FG_INTENSE 0x08
+
+/* Monochrome attributes for text background */
+
+#define BG_INTENSE 0x10
diff -u -r -N usr/src/sys/modules/netmap/machine/pcb.h /usr/src/sys/modules/netmap/machine/pcb.h
--- usr/src/sys/modules/netmap/machine/pcb.h 1970-01-01 01:00:00.000000000 +0100
+++ /usr/src/sys/modules/netmap/machine/pcb.h 2016-09-29 00:24:54.000000000 +0100
@@ -0,0 +1,157 @@
+/*-
+ * Copyright (c) 2003 Peter Wemm.
+ * Copyright (c) 1990 The Regents of the University of California.
+ * All rights reserved.
+ *
+ * This code is derived from software contributed to Berkeley by
+ * William Jolitz.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * 4. Neither the name of the University nor the names of its contributors
+ * may be used to endorse or promote products derived from this software
+ * without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * from: @(#)pcb.h 5.10 (Berkeley) 5/12/91
+ * $FreeBSD: releng/11.0/sys/amd64/include/pcb.h 290728 2015-11-12 22:00:59Z jhb $
+ */
+
+#ifndef _AMD64_PCB_H_
+#define _AMD64_PCB_H_
+
+/*
+ * AMD64 process control block
+ */
+#include <machine/fpu.h>
+#include <machine/segments.h>
+
+#ifdef __amd64__
+/*
+ * NB: The fields marked with (*) are used by kernel debuggers. Their
+ * ABI should be preserved.
+ */
+struct pcb {
+ register_t pcb_r15; /* (*) */
+ register_t pcb_r14; /* (*) */
+ register_t pcb_r13; /* (*) */
+ register_t pcb_r12; /* (*) */
+ register_t pcb_rbp; /* (*) */
+ register_t pcb_rsp; /* (*) */
+ register_t pcb_rbx; /* (*) */
+ register_t pcb_rip; /* (*) */
+ register_t pcb_fsbase;
+ register_t pcb_gsbase;
+ register_t pcb_kgsbase;
+ register_t pcb_cr0;
+ register_t pcb_cr2;
+ register_t pcb_cr3;
+ register_t pcb_cr4;
+ register_t pcb_dr0;
+ register_t pcb_dr1;
+ register_t pcb_dr2;
+ register_t pcb_dr3;
+ register_t pcb_dr6;
+ register_t pcb_dr7;
+
+ struct region_descriptor pcb_gdt;
+ struct region_descriptor pcb_idt;
+ struct region_descriptor pcb_ldt;
+ uint16_t pcb_tr;
+
+ u_int pcb_flags;
+#define PCB_FULL_IRET 0x01 /* full iret is required */
+#define PCB_DBREGS 0x02 /* process using debug registers */
+#define PCB_KERNFPU 0x04 /* kernel uses fpu */
+#define PCB_FPUINITDONE 0x08 /* fpu state is initialized */
+#define PCB_USERFPUINITDONE 0x10 /* fpu user state is initialized */
+#define PCB_32BIT 0x40 /* process has 32 bit context (segs etc) */
+
+ uint16_t pcb_initial_fpucw;
+
+ /* copyin/out fault recovery */
+ caddr_t pcb_onfault;
+
+ uint64_t pcb_pad0;
+
+ /* local tss, with i/o bitmap; NULL for common */
+ struct amd64tss *pcb_tssp;
+
+ /* model specific registers */
+ register_t pcb_efer;
+ register_t pcb_star;
+ register_t pcb_lstar;
+ register_t pcb_cstar;
+ register_t pcb_sfmask;
+
+ struct savefpu *pcb_save;
+
+ uint64_t pcb_pad[5];
+};
+
+/* Per-CPU state saved during suspend and resume. */
+struct susppcb {
+ struct pcb sp_pcb;
+
+ /* fpu context for suspend/resume */
+ void *sp_fpususpend;
+};
+#endif
+
+#ifdef _KERNEL
+struct trapframe;
+
+/*
+ * The pcb_flags is only modified by current thread, or by other threads
+ * when current thread is stopped. However, current thread may change it
+ * from the interrupt context in cpu_switch(), or in the trap handler.
+ * When we read-modify-write pcb_flags from C sources, compiler may generate
+ * code that is not atomic regarding the interrupt handler. If a trap or
+ * interrupt happens and any flag is modified from the handler, it can be
+ * clobbered with the cached value later. Therefore, we implement setting
+ * and clearing flags with single-instruction functions, which do not race
+ * with possible modification of the flags from the trap or interrupt context,
+ * because traps and interrupts are executed only on instruction boundary.
+ */
+static __inline void
+set_pcb_flags(struct pcb *pcb, const u_int flags)
+{
+
+ __asm __volatile("orl %1,%0"
+ : "=m" (pcb->pcb_flags) : "ir" (flags), "m" (pcb->pcb_flags)
+ : "cc");
+}
+
+static __inline void
+clear_pcb_flags(struct pcb *pcb, const u_int flags)
+{
+
+ __asm __volatile("andl %1,%0"
+ : "=m" (pcb->pcb_flags) : "ir" (~flags), "m" (pcb->pcb_flags)
+ : "cc");
+}
+
+void makectx(struct trapframe *, struct pcb *);
+int savectx(struct pcb *) __returns_twice;
+void resumectx(struct pcb *);
+
+#endif
+
+#endif /* _AMD64_PCB_H_ */
diff -u -r -N usr/src/sys/modules/netmap/machine/pci_cfgreg.h /usr/src/sys/modules/netmap/machine/pci_cfgreg.h
--- usr/src/sys/modules/netmap/machine/pci_cfgreg.h 1970-01-01 01:00:00.000000000 +0100
+++ /usr/src/sys/modules/netmap/machine/pci_cfgreg.h 2016-09-29 00:24:54.000000000 +0100
@@ -0,0 +1,6 @@
+/*-
+ * This file is in the public domain.
+ */
+/* $FreeBSD: releng/11.0/sys/amd64/include/pci_cfgreg.h 223440 2011-06-22 21:04:13Z jhb $ */
+
+#include <x86/pci_cfgreg.h>
diff -u -r -N usr/src/sys/modules/netmap/machine/pcpu.h /usr/src/sys/modules/netmap/machine/pcpu.h
--- usr/src/sys/modules/netmap/machine/pcpu.h 1970-01-01 01:00:00.000000000 +0100
+++ /usr/src/sys/modules/netmap/machine/pcpu.h 2016-09-29 00:24:54.000000000 +0100
@@ -0,0 +1,251 @@
+/*-
+ * Copyright (c) Peter Wemm <peter@netplex.com.au>
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * $FreeBSD: releng/11.0/sys/amd64/include/pcpu.h 282684 2015-05-09 19:11:01Z kib $
+ */
+
+#ifndef _MACHINE_PCPU_H_
+#define _MACHINE_PCPU_H_
+
+#ifndef _SYS_CDEFS_H_
+#error "sys/cdefs.h is a prerequisite for this file"
+#endif
+
+/*
+ * The SMP parts are setup in pmap.c and locore.s for the BSP, and
+ * mp_machdep.c sets up the data for the AP's to "see" when they awake.
+ * The reason for doing it via a struct is so that an array of pointers
+ * to each CPU's data can be set up for things like "check curproc on all
+ * other processors"
+ */
+#define PCPU_MD_FIELDS \
+ char pc_monitorbuf[128] __aligned(128); /* cache line */ \
+ struct pcpu *pc_prvspace; /* Self-reference */ \
+ struct pmap *pc_curpmap; \
+ struct amd64tss *pc_tssp; /* TSS segment active on CPU */ \
+ struct amd64tss *pc_commontssp;/* Common TSS for the CPU */ \
+ register_t pc_rsp0; \
+ register_t pc_scratch_rsp; /* User %rsp in syscall */ \
+ u_int pc_apic_id; \
+ u_int pc_acpi_id; /* ACPI CPU id */ \
+ /* Pointer to the CPU %fs descriptor */ \
+ struct user_segment_descriptor *pc_fs32p; \
+ /* Pointer to the CPU %gs descriptor */ \
+ struct user_segment_descriptor *pc_gs32p; \
+ /* Pointer to the CPU LDT descriptor */ \
+ struct system_segment_descriptor *pc_ldt; \
+ /* Pointer to the CPU TSS descriptor */ \
+ struct system_segment_descriptor *pc_tss; \
+ uint64_t pc_pm_save_cnt; \
+ u_int pc_cmci_mask; /* MCx banks for CMCI */ \
+ uint64_t pc_dbreg[16]; /* ddb debugging regs */ \
+ int pc_dbreg_cmd; /* ddb debugging reg cmd */ \
+ u_int pc_vcpu_id; /* Xen vCPU ID */ \
+ uint32_t pc_pcid_next; \
+ uint32_t pc_pcid_gen; \
+ char __pad[149] /* be divisor of PAGE_SIZE \
+ after cache alignment */
+
+#define PC_DBREG_CMD_NONE 0
+#define PC_DBREG_CMD_LOAD 1
+
+#ifdef _KERNEL
+
+#ifdef lint
+
+extern struct pcpu *pcpup;
+
+#define PCPU_GET(member) (pcpup->pc_ ## member)
+#define PCPU_ADD(member, val) (pcpup->pc_ ## member += (val))
+#define PCPU_INC(member) PCPU_ADD(member, 1)
+#define PCPU_PTR(member) (&pcpup->pc_ ## member)
+#define PCPU_SET(member, val) (pcpup->pc_ ## member = (val))
+
+#elif defined(__GNUCLIKE_ASM) && defined(__GNUCLIKE___TYPEOF)
+
+/*
+ * Evaluates to the byte offset of the per-cpu variable name.
+ */
+#define __pcpu_offset(name) \
+ __offsetof(struct pcpu, name)
+
+/*
+ * Evaluates to the type of the per-cpu variable name.
+ */
+#define __pcpu_type(name) \
+ __typeof(((struct pcpu *)0)->name)
+
+/*
+ * Evaluates to the address of the per-cpu variable name.
+ */
+#define __PCPU_PTR(name) __extension__ ({ \
+ __pcpu_type(name) *__p; \
+ \
+ __asm __volatile("movq %%gs:%1,%0; addq %2,%0" \
+ : "=r" (__p) \
+ : "m" (*(struct pcpu *)(__pcpu_offset(pc_prvspace))), \
+ "i" (__pcpu_offset(name))); \
+ \
+ __p; \
+})
+
+/*
+ * Evaluates to the value of the per-cpu variable name.
+ */
+#define __PCPU_GET(name) __extension__ ({ \
+ __pcpu_type(name) __res; \
+ struct __s { \
+ u_char __b[MIN(sizeof(__pcpu_type(name)), 8)]; \
+ } __s; \
+ \
+ if (sizeof(__res) == 1 || sizeof(__res) == 2 || \
+ sizeof(__res) == 4 || sizeof(__res) == 8) { \
+ __asm __volatile("mov %%gs:%1,%0" \
+ : "=r" (__s) \
+ : "m" (*(struct __s *)(__pcpu_offset(name)))); \
+ *(struct __s *)(void *)&__res = __s; \
+ } else { \
+ __res = *__PCPU_PTR(name); \
+ } \
+ __res; \
+})
+
+/*
+ * Adds the value to the per-cpu counter name. The implementation
+ * must be atomic with respect to interrupts.
+ */
+#define __PCPU_ADD(name, val) do { \
+ __pcpu_type(name) __val; \
+ struct __s { \
+ u_char __b[MIN(sizeof(__pcpu_type(name)), 8)]; \
+ } __s; \
+ \
+ __val = (val); \
+ if (sizeof(__val) == 1 || sizeof(__val) == 2 || \
+ sizeof(__val) == 4 || sizeof(__val) == 8) { \
+ __s = *(struct __s *)(void *)&__val; \
+ __asm __volatile("add %1,%%gs:%0" \
+ : "=m" (*(struct __s *)(__pcpu_offset(name))) \
+ : "r" (__s)); \
+ } else \
+ *__PCPU_PTR(name) += __val; \
+} while (0)
+
+/*
+ * Increments the value of the per-cpu counter name. The implementation
+ * must be atomic with respect to interrupts.
+ */
+#define __PCPU_INC(name) do { \
+ CTASSERT(sizeof(__pcpu_type(name)) == 1 || \
+ sizeof(__pcpu_type(name)) == 2 || \
+ sizeof(__pcpu_type(name)) == 4 || \
+ sizeof(__pcpu_type(name)) == 8); \
+ if (sizeof(__pcpu_type(name)) == 1) { \
+ __asm __volatile("incb %%gs:%0" \
+ : "=m" (*(__pcpu_type(name) *)(__pcpu_offset(name)))\
+ : "m" (*(__pcpu_type(name) *)(__pcpu_offset(name))));\
+ } else if (sizeof(__pcpu_type(name)) == 2) { \
+ __asm __volatile("incw %%gs:%0" \
+ : "=m" (*(__pcpu_type(name) *)(__pcpu_offset(name)))\
+ : "m" (*(__pcpu_type(name) *)(__pcpu_offset(name))));\
+ } else if (sizeof(__pcpu_type(name)) == 4) { \
+ __asm __volatile("incl %%gs:%0" \
+ : "=m" (*(__pcpu_type(name) *)(__pcpu_offset(name)))\
+ : "m" (*(__pcpu_type(name) *)(__pcpu_offset(name))));\
+ } else if (sizeof(__pcpu_type(name)) == 8) { \
+ __asm __volatile("incq %%gs:%0" \
+ : "=m" (*(__pcpu_type(name) *)(__pcpu_offset(name)))\
+ : "m" (*(__pcpu_type(name) *)(__pcpu_offset(name))));\
+ } \
+} while (0)
+
+/*
+ * Sets the value of the per-cpu variable name to value val.
+ */
+#define __PCPU_SET(name, val) { \
+ __pcpu_type(name) __val; \
+ struct __s { \
+ u_char __b[MIN(sizeof(__pcpu_type(name)), 8)]; \
+ } __s; \
+ \
+ __val = (val); \
+ if (sizeof(__val) == 1 || sizeof(__val) == 2 || \
+ sizeof(__val) == 4 || sizeof(__val) == 8) { \
+ __s = *(struct __s *)(void *)&__val; \
+ __asm __volatile("mov %1,%%gs:%0" \
+ : "=m" (*(struct __s *)(__pcpu_offset(name))) \
+ : "r" (__s)); \
+ } else { \
+ *__PCPU_PTR(name) = __val; \
+ } \
+}
+
+#define PCPU_GET(member) __PCPU_GET(pc_ ## member)
+#define PCPU_ADD(member, val) __PCPU_ADD(pc_ ## member, val)
+#define PCPU_INC(member) __PCPU_INC(pc_ ## member)
+#define PCPU_PTR(member) __PCPU_PTR(pc_ ## member)
+#define PCPU_SET(member, val) __PCPU_SET(pc_ ## member, val)
+
+#define OFFSETOF_CURTHREAD 0
+#ifdef __clang__
+#pragma clang diagnostic push
+#pragma clang diagnostic ignored "-Wnull-dereference"
+#endif
+static __inline __pure2 struct thread *
+__curthread(void)
+{
+ struct thread *td;
+
+ __asm("movq %%gs:%1,%0" : "=r" (td)
+ : "m" (*(char *)OFFSETOF_CURTHREAD));
+ return (td);
+}
+#ifdef __clang__
+#pragma clang diagnostic pop
+#endif
+#define curthread (__curthread())
+
+#define OFFSETOF_CURPCB 32
+static __inline __pure2 struct pcb *
+__curpcb(void)
+{
+ struct pcb *pcb;
+
+ __asm("movq %%gs:%1,%0" : "=r" (pcb) : "m" (*(char *)OFFSETOF_CURPCB));
+ return (pcb);
+}
+#define curpcb (__curpcb())
+
+#define IS_BSP() (PCPU_GET(cpuid) == 0)
+
+#else /* !lint || defined(__GNUCLIKE_ASM) && defined(__GNUCLIKE___TYPEOF) */
+
+#error "this file needs to be ported to your compiler"
+
+#endif /* lint, etc. */
+
+#endif /* _KERNEL */
+
+#endif /* !_MACHINE_PCPU_H_ */
diff -u -r -N usr/src/sys/modules/netmap/machine/pmap.h /usr/src/sys/modules/netmap/machine/pmap.h
--- usr/src/sys/modules/netmap/machine/pmap.h 1970-01-01 01:00:00.000000000 +0100
+++ /usr/src/sys/modules/netmap/machine/pmap.h 2016-09-29 00:24:54.000000000 +0100
@@ -0,0 +1,421 @@
+/*-
+ * Copyright (c) 2003 Peter Wemm.
+ * Copyright (c) 1991 Regents of the University of California.
+ * All rights reserved.
+ *
+ * This code is derived from software contributed to Berkeley by
+ * the Systems Programming Group of the University of Utah Computer
+ * Science Department and William Jolitz of UUNET Technologies Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * 4. Neither the name of the University nor the names of its contributors
+ * may be used to endorse or promote products derived from this software
+ * without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * Derived from hp300 version by Mike Hibler, this version by William
+ * Jolitz uses a recursive map [a pde points to the page directory] to
+ * map the page tables using the pagetables themselves. This is done to
+ * reduce the impact on kernel virtual memory for lots of sparse address
+ * space, and to reduce the cost of memory to each process.
+ *
+ * from: hp300: @(#)pmap.h 7.2 (Berkeley) 12/16/90
+ * from: @(#)pmap.h 7.4 (Berkeley) 5/12/91
+ * $FreeBSD: releng/11.0/sys/amd64/include/pmap.h 299350 2016-05-10 09:58:51Z kib $
+ */
+
+#ifndef _MACHINE_PMAP_H_
+#define _MACHINE_PMAP_H_
+
+/*
+ * Page-directory and page-table entries follow this format, with a few
+ * of the fields not present here and there, depending on a lot of things.
+ */
+ /* ---- Intel Nomenclature ---- */
+#define X86_PG_V 0x001 /* P Valid */
+#define X86_PG_RW 0x002 /* R/W Read/Write */
+#define X86_PG_U 0x004 /* U/S User/Supervisor */
+#define X86_PG_NC_PWT 0x008 /* PWT Write through */
+#define X86_PG_NC_PCD 0x010 /* PCD Cache disable */
+#define X86_PG_A 0x020 /* A Accessed */
+#define X86_PG_M 0x040 /* D Dirty */
+#define X86_PG_PS 0x080 /* PS Page size (0=4k,1=2M) */
+#define X86_PG_PTE_PAT 0x080 /* PAT PAT index */
+#define X86_PG_G 0x100 /* G Global */
+#define X86_PG_AVAIL1 0x200 /* / Available for system */
+#define X86_PG_AVAIL2 0x400 /* < programmers use */
+#define X86_PG_AVAIL3 0x800 /* \ */
+#define X86_PG_PDE_PAT 0x1000 /* PAT PAT index */
+#define X86_PG_NX (1ul<<63) /* No-execute */
+#define X86_PG_AVAIL(x) (1ul << (x))
+
+/* Page level cache control fields used to determine the PAT type */
+#define X86_PG_PDE_CACHE (X86_PG_PDE_PAT | X86_PG_NC_PWT | X86_PG_NC_PCD)
+#define X86_PG_PTE_CACHE (X86_PG_PTE_PAT | X86_PG_NC_PWT | X86_PG_NC_PCD)
+
+/*
+ * Intel extended page table (EPT) bit definitions.
+ */
+#define EPT_PG_READ 0x001 /* R Read */
+#define EPT_PG_WRITE 0x002 /* W Write */
+#define EPT_PG_EXECUTE 0x004 /* X Execute */
+#define EPT_PG_IGNORE_PAT 0x040 /* IPAT Ignore PAT */
+#define EPT_PG_PS 0x080 /* PS Page size */
+#define EPT_PG_A 0x100 /* A Accessed */
+#define EPT_PG_M 0x200 /* D Dirty */
+#define EPT_PG_MEMORY_TYPE(x) ((x) << 3) /* MT Memory Type */
+
+/*
+ * Define the PG_xx macros in terms of the bits on x86 PTEs.
+ */
+#define PG_V X86_PG_V
+#define PG_RW X86_PG_RW
+#define PG_U X86_PG_U
+#define PG_NC_PWT X86_PG_NC_PWT
+#define PG_NC_PCD X86_PG_NC_PCD
+#define PG_A X86_PG_A
+#define PG_M X86_PG_M
+#define PG_PS X86_PG_PS
+#define PG_PTE_PAT X86_PG_PTE_PAT
+#define PG_G X86_PG_G
+#define PG_AVAIL1 X86_PG_AVAIL1
+#define PG_AVAIL2 X86_PG_AVAIL2
+#define PG_AVAIL3 X86_PG_AVAIL3
+#define PG_PDE_PAT X86_PG_PDE_PAT
+#define PG_NX X86_PG_NX
+#define PG_PDE_CACHE X86_PG_PDE_CACHE
+#define PG_PTE_CACHE X86_PG_PTE_CACHE
+
+/* Our various interpretations of the above */
+#define PG_W X86_PG_AVAIL3 /* "Wired" pseudoflag */
+#define PG_MANAGED X86_PG_AVAIL2
+#define EPT_PG_EMUL_V X86_PG_AVAIL(52)
+#define EPT_PG_EMUL_RW X86_PG_AVAIL(53)
+#define PG_FRAME (0x000ffffffffff000ul)
+#define PG_PS_FRAME (0x000fffffffe00000ul)
+
+/*
+ * Promotion to a 2MB (PDE) page mapping requires that the corresponding 4KB
+ * (PTE) page mappings have identical settings for the following fields:
+ */
+#define PG_PTE_PROMOTE (PG_NX | PG_MANAGED | PG_W | PG_G | PG_PTE_CACHE | \
+ PG_M | PG_A | PG_U | PG_RW | PG_V)
+
+/*
+ * Page Protection Exception bits
+ */
+
+#define PGEX_P 0x01 /* Protection violation vs. not present */
+#define PGEX_W 0x02 /* during a Write cycle */
+#define PGEX_U 0x04 /* access from User mode (UPL) */
+#define PGEX_RSV 0x08 /* reserved PTE field is non-zero */
+#define PGEX_I 0x10 /* during an instruction fetch */
+
+/*
+ * undef the PG_xx macros that define bits in the regular x86 PTEs that
+ * have a different position in nested PTEs. This is done when compiling
+ * code that needs to be aware of the differences between regular x86 and
+ * nested PTEs.
+ *
+ * The appropriate bitmask will be calculated at runtime based on the pmap
+ * type.
+ */
+#ifdef AMD64_NPT_AWARE
+#undef PG_AVAIL1 /* X86_PG_AVAIL1 aliases with EPT_PG_M */
+#undef PG_G
+#undef PG_A
+#undef PG_M
+#undef PG_PDE_PAT
+#undef PG_PDE_CACHE
+#undef PG_PTE_PAT
+#undef PG_PTE_CACHE
+#undef PG_RW
+#undef PG_V
+#endif
+
+/*
+ * Pte related macros. This is complicated by having to deal with
+ * the sign extension of the 48th bit.
+ */
+#define KVADDR(l4, l3, l2, l1) ( \
+ ((unsigned long)-1 << 47) | \
+ ((unsigned long)(l4) << PML4SHIFT) | \
+ ((unsigned long)(l3) << PDPSHIFT) | \
+ ((unsigned long)(l2) << PDRSHIFT) | \
+ ((unsigned long)(l1) << PAGE_SHIFT))
+
+#define UVADDR(l4, l3, l2, l1) ( \
+ ((unsigned long)(l4) << PML4SHIFT) | \
+ ((unsigned long)(l3) << PDPSHIFT) | \
+ ((unsigned long)(l2) << PDRSHIFT) | \
+ ((unsigned long)(l1) << PAGE_SHIFT))
+
+/*
+ * Number of kernel PML4 slots. Can be anywhere from 1 to 64 or so,
+ * but setting it larger than NDMPML4E makes no sense.
+ *
+ * Each slot provides .5 TB of kernel virtual space.
+ */
+#define NKPML4E 4
+
+#define NUPML4E (NPML4EPG/2) /* number of userland PML4 pages */
+#define NUPDPE (NUPML4E*NPDPEPG)/* number of userland PDP pages */
+#define NUPDE (NUPDPE*NPDEPG) /* number of userland PD entries */
+
+/*
+ * NDMPML4E is the maximum number of PML4 entries that will be
+ * used to implement the direct map. It must be a power of two,
+ * and should generally exceed NKPML4E. The maximum possible
+ * value is 64; using 128 will make the direct map intrude into
+ * the recursive page table map.
+ */
+#define NDMPML4E 8
+
+/*
+ * These values control the layout of virtual memory. The starting address
+ * of the direct map, which is controlled by DMPML4I, must be a multiple of
+ * its size. (See the PHYS_TO_DMAP() and DMAP_TO_PHYS() macros.)
+ *
+ * Note: KPML4I is the index of the (single) level 4 page that maps
+ * the KVA that holds KERNBASE, while KPML4BASE is the index of the
+ * first level 4 page that maps VM_MIN_KERNEL_ADDRESS. If NKPML4E
+ * is 1, these are the same, otherwise KPML4BASE < KPML4I and extra
+ * level 4 PDEs are needed to map from VM_MIN_KERNEL_ADDRESS up to
+ * KERNBASE.
+ *
+ * (KPML4I combines with KPDPI to choose where KERNBASE starts.
+ * Or, in other words, KPML4I provides bits 39..47 of KERNBASE,
+ * and KPDPI provides bits 30..38.)
+ */
+#define PML4PML4I (NPML4EPG/2) /* Index of recursive pml4 mapping */
+
+#define KPML4BASE (NPML4EPG-NKPML4E) /* KVM at highest addresses */
+#define DMPML4I rounddown(KPML4BASE-NDMPML4E, NDMPML4E) /* Below KVM */
+
+#define KPML4I (NPML4EPG-1)
+#define KPDPI (NPDPEPG-2) /* kernbase at -2GB */
+
+/*
+ * XXX doesn't really belong here I guess...
+ */
+#define ISA_HOLE_START 0xa0000
+#define ISA_HOLE_LENGTH (0x100000-ISA_HOLE_START)
+
+#define PMAP_PCID_NONE 0xffffffff
+#define PMAP_PCID_KERN 0
+#define PMAP_PCID_OVERMAX 0x1000
+
+#ifndef LOCORE
+
+#include <sys/queue.h>
+#include <sys/_cpuset.h>
+#include <sys/_lock.h>
+#include <sys/_mutex.h>
+
+#include <vm/_vm_radix.h>
+
+typedef u_int64_t pd_entry_t;
+typedef u_int64_t pt_entry_t;
+typedef u_int64_t pdp_entry_t;
+typedef u_int64_t pml4_entry_t;
+
+/*
+ * Address of current address space page table maps and directories.
+ */
+#ifdef _KERNEL
+#define addr_PTmap (KVADDR(PML4PML4I, 0, 0, 0))
+#define addr_PDmap (KVADDR(PML4PML4I, PML4PML4I, 0, 0))
+#define addr_PDPmap (KVADDR(PML4PML4I, PML4PML4I, PML4PML4I, 0))
+#define addr_PML4map (KVADDR(PML4PML4I, PML4PML4I, PML4PML4I, PML4PML4I))
+#define addr_PML4pml4e (addr_PML4map + (PML4PML4I * sizeof(pml4_entry_t)))
+#define PTmap ((pt_entry_t *)(addr_PTmap))
+#define PDmap ((pd_entry_t *)(addr_PDmap))
+#define PDPmap ((pd_entry_t *)(addr_PDPmap))
+#define PML4map ((pd_entry_t *)(addr_PML4map))
+#define PML4pml4e ((pd_entry_t *)(addr_PML4pml4e))
+
+extern int nkpt; /* Initial number of kernel page tables */
+extern u_int64_t KPDPphys; /* physical address of kernel level 3 */
+extern u_int64_t KPML4phys; /* physical address of kernel level 4 */
+
+/*
+ * virtual address to page table entry and
+ * to physical address.
+ * Note: these work recursively, thus vtopte of a pte will give
+ * the corresponding pde that in turn maps it.
+ */
+pt_entry_t *vtopte(vm_offset_t);
+#define vtophys(va) pmap_kextract(((vm_offset_t) (va)))
+
+#define pte_load_store(ptep, pte) atomic_swap_long(ptep, pte)
+#define pte_load_clear(ptep) atomic_swap_long(ptep, 0)
+#define pte_store(ptep, pte) do { \
+ *(u_long *)(ptep) = (u_long)(pte); \
+} while (0)
+#define pte_clear(ptep) pte_store(ptep, 0)
+
+#define pde_store(pdep, pde) pte_store(pdep, pde)
+
+extern pt_entry_t pg_nx;
+
+#endif /* _KERNEL */
+
+/*
+ * Pmap stuff
+ */
+struct pv_entry;
+struct pv_chunk;
+
+/*
+ * Locks
+ * (p) PV list lock
+ */
+struct md_page {
+ TAILQ_HEAD(, pv_entry) pv_list; /* (p) */
+ int pv_gen; /* (p) */
+ int pat_mode;
+};
+
+enum pmap_type {
+ PT_X86, /* regular x86 page tables */
+ PT_EPT, /* Intel's nested page tables */
+ PT_RVI, /* AMD's nested page tables */
+};
+
+struct pmap_pcids {
+ uint32_t pm_pcid;
+ uint32_t pm_gen;
+};
+
+/*
+ * The kernel virtual address (KVA) of the level 4 page table page is always
+ * within the direct map (DMAP) region.
+ */ +struct pmap { + struct mtx pm_mtx; + pml4_entry_t *pm_pml4; /* KVA of level 4 page table */ + uint64_t pm_cr3; + TAILQ_HEAD(,pv_chunk) pm_pvchunk; /* list of mappings in pmap */ + cpuset_t pm_active; /* active on cpus */ + enum pmap_type pm_type; /* regular or nested tables */ + struct pmap_statistics pm_stats; /* pmap statistics */ + struct vm_radix pm_root; /* spare page table pages */ + long pm_eptgen; /* EPT pmap generation id */ + int pm_flags; + struct pmap_pcids pm_pcids[MAXCPU]; +}; + +/* flags */ +#define PMAP_NESTED_IPIMASK 0xff +#define PMAP_PDE_SUPERPAGE (1 << 8) /* supports 2MB superpages */ +#define PMAP_EMULATE_AD_BITS (1 << 9) /* needs A/D bits emulation */ +#define PMAP_SUPPORTS_EXEC_ONLY (1 << 10) /* execute only mappings ok */ + +typedef struct pmap *pmap_t; + +#ifdef _KERNEL +extern struct pmap kernel_pmap_store; +#define kernel_pmap (&kernel_pmap_store) + +#define PMAP_LOCK(pmap) mtx_lock(&(pmap)->pm_mtx) +#define PMAP_LOCK_ASSERT(pmap, type) \ + mtx_assert(&(pmap)->pm_mtx, (type)) +#define PMAP_LOCK_DESTROY(pmap) mtx_destroy(&(pmap)->pm_mtx) +#define PMAP_LOCK_INIT(pmap) mtx_init(&(pmap)->pm_mtx, "pmap", \ + NULL, MTX_DEF | MTX_DUPOK) +#define PMAP_LOCKED(pmap) mtx_owned(&(pmap)->pm_mtx) +#define PMAP_MTX(pmap) (&(pmap)->pm_mtx) +#define PMAP_TRYLOCK(pmap) mtx_trylock(&(pmap)->pm_mtx) +#define PMAP_UNLOCK(pmap) mtx_unlock(&(pmap)->pm_mtx) + +int pmap_pinit_type(pmap_t pmap, enum pmap_type pm_type, int flags); +int pmap_emulate_accessed_dirty(pmap_t pmap, vm_offset_t va, int ftype); +#endif + +/* + * For each vm_page_t, there is a list of all currently valid virtual + * mappings of that page. An entry is a pv_entry_t, the list is pv_list. + */ +typedef struct pv_entry { + vm_offset_t pv_va; /* virtual address for mapping */ + TAILQ_ENTRY(pv_entry) pv_next; +} *pv_entry_t; + +/* + * pv_entries are allocated in chunks per-process. This avoids the + * need to track per-pmap assignments. 
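The chunk sizing that follows is worth a quick check. Using the _NPCM = 3 and _NPCPV = 168 values defined just below, and assuming 8-byte pointers and a 24-byte pv_entry (a vm_offset_t plus a two-pointer TAILQ_ENTRY), a pv_chunk works out to exactly one 4 KB page, with the 192-bit pc_map bitmap covering all 168 slots:

    #include <stdint.h>

    /* Userland stand-ins for the kernel types (assumptions, not part of
     * the hunk): TAILQ_ENTRY is two pointers, vm_offset_t is 64 bits. */
    struct tq_entry { void *tqe_next, *tqe_prev; };

    struct pv_entry {
            uint64_t pv_va;
            struct tq_entry pv_next;
    };                                      /* 24 bytes */

    #define _NPCM  3
    #define _NPCPV 168

    struct pv_chunk {
            void *pc_pmap;
            struct tq_entry pc_list;
            uint64_t pc_map[_NPCM];         /* 192 bits, 1 = free */
            struct tq_entry pc_lru;
            struct pv_entry pc_pventry[_NPCPV];
    };

    _Static_assert(_NPCPV <= _NPCM * 64, "bitmap covers every slot");
    _Static_assert(sizeof(struct pv_chunk) == 4096, "exactly one page");

    int main(void) { return 0; }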
+ */ +#define _NPCM 3 +#define _NPCPV 168 +struct pv_chunk { + pmap_t pc_pmap; + TAILQ_ENTRY(pv_chunk) pc_list; + uint64_t pc_map[_NPCM]; /* bitmap; 1 = free */ + TAILQ_ENTRY(pv_chunk) pc_lru; + struct pv_entry pc_pventry[_NPCPV]; +}; + +#ifdef _KERNEL + +extern caddr_t CADDR1; +extern pt_entry_t *CMAP1; +extern vm_paddr_t phys_avail[]; +extern vm_paddr_t dump_avail[]; +extern vm_offset_t virtual_avail; +extern vm_offset_t virtual_end; +extern vm_paddr_t dmaplimit; + +#define pmap_page_get_memattr(m) ((vm_memattr_t)(m)->md.pat_mode) +#define pmap_page_is_write_mapped(m) (((m)->aflags & PGA_WRITEABLE) != 0) +#define pmap_unmapbios(va, sz) pmap_unmapdev((va), (sz)) + +struct thread; + +void pmap_activate_sw(struct thread *); +void pmap_bootstrap(vm_paddr_t *); +int pmap_change_attr(vm_offset_t, vm_size_t, int); +void pmap_demote_DMAP(vm_paddr_t base, vm_size_t len, boolean_t invalidate); +void pmap_init_pat(void); +void pmap_kenter(vm_offset_t va, vm_paddr_t pa); +void *pmap_kenter_temporary(vm_paddr_t pa, int i); +vm_paddr_t pmap_kextract(vm_offset_t); +void pmap_kremove(vm_offset_t); +void *pmap_mapbios(vm_paddr_t, vm_size_t); +void *pmap_mapdev(vm_paddr_t, vm_size_t); +void *pmap_mapdev_attr(vm_paddr_t, vm_size_t, int); +boolean_t pmap_page_is_mapped(vm_page_t m); +void pmap_page_set_memattr(vm_page_t m, vm_memattr_t ma); +void pmap_unmapdev(vm_offset_t, vm_size_t); +void pmap_invalidate_page(pmap_t, vm_offset_t); +void pmap_invalidate_range(pmap_t, vm_offset_t, vm_offset_t); +void pmap_invalidate_all(pmap_t); +void pmap_invalidate_cache(void); +void pmap_invalidate_cache_pages(vm_page_t *pages, int count); +void pmap_invalidate_cache_range(vm_offset_t sva, vm_offset_t eva, + boolean_t force); +void pmap_get_mapping(pmap_t pmap, vm_offset_t va, uint64_t *ptr, int *num); +boolean_t pmap_map_io_transient(vm_page_t *, vm_offset_t *, int, boolean_t); +void pmap_unmap_io_transient(vm_page_t *, vm_offset_t *, int, boolean_t); +#endif /* _KERNEL */ + +#endif /* !LOCORE */ + +#endif /* !_MACHINE_PMAP_H_ */ diff -u -r -N usr/src/sys/modules/netmap/machine/pmc_mdep.h /usr/src/sys/modules/netmap/machine/pmc_mdep.h --- usr/src/sys/modules/netmap/machine/pmc_mdep.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/pmc_mdep.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,141 @@ +/*- + * Copyright (c) 2003-2008 Joseph Koshy + * Copyright (c) 2007 The FreeBSD Foundation + * All rights reserved. + * + * Portions of this software were developed by A. Joseph Koshy under + * sponsorship from the FreeBSD Foundation and Google, Inc. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD: releng/11.0/sys/amd64/include/pmc_mdep.h 285041 2015-07-02 14:37:21Z kib $ + */ + +/* Machine dependent interfaces */ + +#ifndef _MACHINE_PMC_MDEP_H +#define _MACHINE_PMC_MDEP_H 1 + +#ifdef _KERNEL +struct pmc_mdep; +#endif + +#include <dev/hwpmc/hwpmc_amd.h> +#include <dev/hwpmc/hwpmc_core.h> +#include <dev/hwpmc/hwpmc_piv.h> +#include <dev/hwpmc/hwpmc_tsc.h> +#include <dev/hwpmc/hwpmc_uncore.h> + +/* + * Intel processors implementing V2 and later of the Intel performance + * measurement architecture have PMCs of the following classes: TSC, + * IAF, IAP, UCF and UCP. + */ +#define PMC_MDEP_CLASS_INDEX_TSC 1 +#define PMC_MDEP_CLASS_INDEX_K8 2 +#define PMC_MDEP_CLASS_INDEX_P4 2 +#define PMC_MDEP_CLASS_INDEX_IAP 2 +#define PMC_MDEP_CLASS_INDEX_IAF 3 +#define PMC_MDEP_CLASS_INDEX_UCP 4 +#define PMC_MDEP_CLASS_INDEX_UCF 5 + +/* + * On the amd64 platform we support the following PMCs. + * + * TSC The timestamp counter + * K8 AMD Athlon64 and Opteron PMCs in 64 bit mode. + * PIV Intel P4/HTT and P4/EMT64 + * IAP Intel Core/Core2/Atom CPUs in 64 bits mode. + * IAF Intel fixed-function PMCs in Core2 and later CPUs. + * UCP Intel Uncore programmable PMCs. + * UCF Intel Uncore fixed-function PMCs. + */ + +union pmc_md_op_pmcallocate { + struct pmc_md_amd_op_pmcallocate pm_amd; + struct pmc_md_iaf_op_pmcallocate pm_iaf; + struct pmc_md_iap_op_pmcallocate pm_iap; + struct pmc_md_ucf_op_pmcallocate pm_ucf; + struct pmc_md_ucp_op_pmcallocate pm_ucp; + struct pmc_md_p4_op_pmcallocate pm_p4; + uint64_t __pad[4]; +}; + +/* Logging */ +#define PMCLOG_READADDR PMCLOG_READ64 +#define PMCLOG_EMITADDR PMCLOG_EMIT64 + +#ifdef _KERNEL + +union pmc_md_pmc { + struct pmc_md_amd_pmc pm_amd; + struct pmc_md_iaf_pmc pm_iaf; + struct pmc_md_iap_pmc pm_iap; + struct pmc_md_ucf_pmc pm_ucf; + struct pmc_md_ucp_pmc pm_ucp; + struct pmc_md_p4_pmc pm_p4; +}; + +#define PMC_TRAPFRAME_TO_PC(TF) ((TF)->tf_rip) +#define PMC_TRAPFRAME_TO_FP(TF) ((TF)->tf_rbp) +#define PMC_TRAPFRAME_TO_USER_SP(TF) ((TF)->tf_rsp) +#define PMC_TRAPFRAME_TO_KERNEL_SP(TF) ((TF)->tf_rsp) + +#define PMC_AT_FUNCTION_PROLOGUE_PUSH_BP(I) \ + (((I) & 0xffffffff) == 0xe5894855) /* pushq %rbp; movq %rsp,%rbp */ +#define PMC_AT_FUNCTION_PROLOGUE_MOV_SP_BP(I) \ + (((I) & 0x00ffffff) == 0x00e58948) /* movq %rsp,%rbp */ +#define PMC_AT_FUNCTION_EPILOGUE_RET(I) \ + (((I) & 0xFF) == 0xC3) /* ret */ + +#define PMC_IN_TRAP_HANDLER(PC) \ + ((PC) >= (uintptr_t) start_exceptions && \ + (PC) < (uintptr_t) end_exceptions) + +#define PMC_IN_KERNEL_STACK(S,START,END) \ + ((S) >= (START) && (S) < (END)) +#define PMC_IN_KERNEL(va) INKERNEL(va) + +#define PMC_IN_USERSPACE(va) ((va) <= VM_MAXUSER_ADDRESS) + +/* Build a fake kernel trapframe from current instruction pointer. 
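The prologue/epilogue patterns above are just little-endian loads of instruction bytes; 0xe5894855, for instance, is the standard "pushq %rbp; movq %rsp,%rbp" entry sequence read as a 32-bit integer. A minimal demonstration (the bytes are hard-coded rather than read from a live function, since compilers may omit frame pointers):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int
    main(void)
    {
            /* 55            pushq %rbp
             * 48 89 e5      movq  %rsp,%rbp */
            const uint8_t prologue[4] = { 0x55, 0x48, 0x89, 0xe5 };
            uint32_t insn;

            memcpy(&insn, prologue, sizeof(insn));  /* little-endian load */
            printf("%#x %s\n", insn,
                insn == 0xe5894855 ? "matches PUSH_BP" : "no match");
            return (0);
    }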
*/ +#define PMC_FAKE_TRAPFRAME(TF) \ + do { \ + (TF)->tf_cs = 0; (TF)->tf_rflags = 0; \ + __asm __volatile("movq %%rbp,%0" : "=r" ((TF)->tf_rbp)); \ + __asm __volatile("movq %%rsp,%0" : "=r" ((TF)->tf_rsp)); \ + __asm __volatile("call 1f \n\t1: pop %0" : "=r"((TF)->tf_rip)); \ + } while (0) + +/* + * Prototypes + */ + +void start_exceptions(void), end_exceptions(void); + +struct pmc_mdep *pmc_amd_initialize(void); +void pmc_amd_finalize(struct pmc_mdep *_md); +struct pmc_mdep *pmc_intel_initialize(void); +void pmc_intel_finalize(struct pmc_mdep *_md); + +#endif /* _KERNEL */ +#endif /* _MACHINE_PMC_MDEP_H */ diff -u -r -N usr/src/sys/modules/netmap/machine/ppireg.h /usr/src/sys/modules/netmap/machine/ppireg.h --- usr/src/sys/modules/netmap/machine/ppireg.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/ppireg.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,49 @@ +/*- + * Copyright (C) 2005 TAKAHASHI Yoshihiro. All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD: releng/11.0/sys/amd64/include/ppireg.h 146211 2005-05-14 09:10:02Z nyan $ + */ + +#ifndef _MACHINE_PPIREG_H_ +#define _MACHINE_PPIREG_H_ + +#ifdef _KERNEL + +#define IO_PPI 0x61 /* Programmable Peripheral Interface */ + +/* + * PPI speaker control values + */ + +#define PIT_ENABLETMR2 0x01 /* Enable timer/counter 2 */ +#define PIT_SPKRDATA 0x02 /* Direct to speaker */ + +#define PIT_SPKR (PIT_ENABLETMR2 | PIT_SPKRDATA) + +#define ppi_spkr_on() outb(IO_PPI, inb(IO_PPI) | PIT_SPKR) +#define ppi_spkr_off() outb(IO_PPI, inb(IO_PPI) & ~PIT_SPKR) + +#endif /* _KERNEL */ + +#endif /* _MACHINE_PPIREG_H_ */ diff -u -r -N usr/src/sys/modules/netmap/machine/proc.h /usr/src/sys/modules/netmap/machine/proc.h --- usr/src/sys/modules/netmap/machine/proc.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/proc.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,104 @@ +/*- + * Copyright (c) 1991 Regents of the University of California. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. 
Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 4. Neither the name of the University nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * from: @(#)proc.h 7.1 (Berkeley) 5/15/91 + * $FreeBSD: releng/11.0/sys/amd64/include/proc.h 299788 2016-05-14 23:35:11Z kib $ + */ + +#ifndef _MACHINE_PROC_H_ +#define _MACHINE_PROC_H_ + +#include <sys/queue.h> +#include <machine/segments.h> + +/* + * List of locks + * k - only accessed by curthread + * pp - pmap.c:invl_gen_mtx + */ + +struct proc_ldt { + caddr_t ldt_base; + int ldt_refcnt; +}; + +struct pmap_invl_gen { + u_long gen; /* (k) */ + LIST_ENTRY(pmap_invl_gen) link; /* (pp) */ +}; + +/* + * Machine-dependent part of the proc structure for AMD64. + */ +struct mdthread { + int md_spinlock_count; /* (k) */ + register_t md_saved_flags; /* (k) */ + register_t md_spurflt_addr; /* (k) Spurious page fault address. */ + struct pmap_invl_gen md_invl_gen; +}; + +struct mdproc { + struct proc_ldt *md_ldt; /* (t) per-process ldt */ + struct system_segment_descriptor md_ldt_sd; +}; + +#define KINFO_PROC_SIZE 1088 +#define KINFO_PROC32_SIZE 768 + +#ifdef _KERNEL + +/* Get the current kernel thread stack usage. 
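The GET_STACK_USAGE() macro that follows relies on the stack growing down: bytes used are the distance from the top of the kernel stack to the address of a local variable. A userland sketch of the same arithmetic (build with -O0 so the frames are kept; stack_top here only approximates the true stack base):

    #include <stdint.h>
    #include <stdio.h>

    static uintptr_t stack_top;

    static void
    nested(int depth)
    {
            volatile int marker;

            if (depth == 0) {
                    printf("approx stack used: %zu bytes\n",
                        (size_t)(stack_top - (uintptr_t)&marker));
                    return;
            }
            nested(depth - 1);
            (void)marker;           /* keep this frame alive */
    }

    int
    main(void)
    {
            volatile int marker;

            stack_top = (uintptr_t)&marker;
            nested(100);
            return (0);
    }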
*/ +#define GET_STACK_USAGE(total, used) do { \ + struct thread *td = curthread; \ + (total) = td->td_kstack_pages * PAGE_SIZE; \ + (used) = (char *)td->td_kstack + \ + td->td_kstack_pages * PAGE_SIZE - \ + (char *)&td; \ +} while (0) + +void set_user_ldt(struct mdproc *); +struct proc_ldt *user_ldt_alloc(struct proc *, int); +void user_ldt_free(struct thread *); +void user_ldt_deref(struct proc_ldt *); +struct sysarch_args; +int sysarch_ldt(struct thread *td, struct sysarch_args *uap, int uap_space); +int amd64_set_ldt_data(struct thread *td, int start, int num, + struct user_segment_descriptor *descs); + +extern struct mtx dt_lock; +extern int max_ldt_segment; + +struct syscall_args { + u_int code; + struct sysent *callp; + register_t args[8]; + int narg; +}; +#endif /* _KERNEL */ + +#endif /* !_MACHINE_PROC_H_ */ diff -u -r -N usr/src/sys/modules/netmap/machine/profile.h /usr/src/sys/modules/netmap/machine/profile.h --- usr/src/sys/modules/netmap/machine/profile.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/profile.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,201 @@ +/*- + * Copyright (c) 1992, 1993 + * The Regents of the University of California. All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 4. Neither the name of the University nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * @(#)profile.h 8.1 (Berkeley) 6/11/93 + * $FreeBSD: releng/11.0/sys/amd64/include/profile.h 214346 2010-10-25 15:28:03Z jhb $ + */ + +#ifndef _MACHINE_PROFILE_H_ +#define _MACHINE_PROFILE_H_ + +#ifndef _SYS_CDEFS_H_ +#error this file needs sys/cdefs.h as a prerequisite +#endif + +#ifdef _KERNEL + +/* + * Config generates something to tell the compiler to align functions on 16 + * byte boundaries. A strict alignment is good for keeping the tables small. + */ +#define FUNCTION_ALIGNMENT 16 + +/* + * The kernel uses assembler stubs instead of unportable inlines. + * This is mainly to save a little time when profiling is not enabled, + * which is the usual case for the kernel. 
+ */ +#define _MCOUNT_DECL void mcount +#define MCOUNT + +#ifdef GUPROF +#define MCOUNT_DECL(s) +#define MCOUNT_ENTER(s) +#define MCOUNT_EXIT(s) +#ifdef __GNUCLIKE_ASM +#define MCOUNT_OVERHEAD(label) \ + __asm __volatile("pushq %0; call __mcount; popq %%rcx" \ + : \ + : "i" (label) \ + : "ax", "dx", "cx", "di", "si", "r8", "r9", "memory") +#define MEXITCOUNT_OVERHEAD() \ + __asm __volatile("call .mexitcount; 1:" \ + : : \ + : "ax", "dx", "cx", "di", "si", "r8", "r9", "memory") +#define MEXITCOUNT_OVERHEAD_GETLABEL(labelp) \ + __asm __volatile("movq $1b,%0" : "=rm" (labelp)) +#elif defined(lint) +#define MCOUNT_OVERHEAD(label) +#define MEXITCOUNT_OVERHEAD() +#define MEXITCOUNT_OVERHEAD_GETLABEL() +#else +#error this file needs to be ported to your compiler +#endif /* !__GNUCLIKE_ASM */ +#else /* !GUPROF */ +#define MCOUNT_DECL(s) register_t s; +#ifdef SMP +extern int mcount_lock; +#define MCOUNT_ENTER(s) { s = intr_disable(); \ + while (!atomic_cmpset_acq_int(&mcount_lock, 0, 1)) \ + /* nothing */ ; } +#define MCOUNT_EXIT(s) { atomic_store_rel_int(&mcount_lock, 0); \ + intr_restore(s); } +#else +#define MCOUNT_ENTER(s) { s = intr_disable(); } +#define MCOUNT_EXIT(s) (intr_restore(s)) +#endif +#endif /* GUPROF */ + +void bintr(void); +void btrap(void); +void eintr(void); +void user(void); + +#define MCOUNT_FROMPC_USER(pc) \ + ((pc < (uintfptr_t)VM_MAXUSER_ADDRESS) ? (uintfptr_t)user : pc) + +#define MCOUNT_FROMPC_INTR(pc) \ + ((pc >= (uintfptr_t)btrap && pc < (uintfptr_t)eintr) ? \ + ((pc >= (uintfptr_t)bintr) ? (uintfptr_t)bintr : \ + (uintfptr_t)btrap) : ~0UL) + +#else /* !_KERNEL */ + +#define FUNCTION_ALIGNMENT 4 + +#define _MCOUNT_DECL \ +static void _mcount(uintfptr_t frompc, uintfptr_t selfpc) __used; \ +static void _mcount + +#ifdef __GNUCLIKE_ASM +#define MCOUNT __asm(" \n\ + .text \n\ + .p2align 4,0x90 \n\ + .globl .mcount \n\ + .type .mcount,@function \n\ +.mcount: \n\ + pushq %rdi \n\ + pushq %rsi \n\ + pushq %rdx \n\ + pushq %rcx \n\ + pushq %r8 \n\ + pushq %r9 \n\ + pushq %rax \n\ + movq 8(%rbp),%rdi \n\ + movq 7*8(%rsp),%rsi \n\ + call _mcount \n\ + popq %rax \n\ + popq %r9 \n\ + popq %r8 \n\ + popq %rcx \n\ + popq %rdx \n\ + popq %rsi \n\ + popq %rdi \n\ + ret \n\ + .size .mcount, . - .mcount"); +#if 0 +/* + * We could use this, except it doesn't preserve the registers that were + * being passed with arguments to the function that we were inserted + * into. I've left it here as documentation of what the code above is + * supposed to do. + */ +#define MCOUNT \ +void \ +mcount() \ +{ \ + uintfptr_t selfpc, frompc; \ + /* \ + * Find the return address for mcount, \ + * and the return address for mcount's caller. \ + * \ + * selfpc = pc pushed by call to mcount \ + */ \ + __asm("movq 8(%%rbp),%0" : "=r" (selfpc)); \ + /* \ + * frompc = pc pushed by call to mcount's caller. \ + * The caller's stack frame has already been built, so %rbp is \ + * the caller's frame pointer. The caller's raddr is in the \ + * caller's frame following the caller's caller's frame pointer.\ + */ \ + __asm("movq (%%rbp),%0" : "=r" (frompc)); \ + frompc = ((uintfptr_t *)frompc)[1]; \ + _mcount(frompc, selfpc); \ +} +#endif +#else /* !__GNUCLIKE_ASM */ +#define MCOUNT +#endif /* __GNUCLIKE_ASM */ + +typedef u_long uintfptr_t; + +#endif /* _KERNEL */ + +/* + * An unsigned integral type that can hold non-negative difference between + * function pointers. 
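The mcount machinery above recovers two program counters: selfpc (where mcount will return, i.e. the profiled function) and frompc (that function's own caller). A hedged userland sketch of the same pair using compiler builtins (GCC/Clang only; build with -fno-omit-frame-pointer, since __builtin_return_address(1) walks frame pointers):

    #include <stdio.h>

    static void __attribute__((noinline))
    fake_mcount(void)
    {
            /* selfpc: PC in the function that called us (the "profiled"
             * function); frompc: PC in that function's caller. */
            void *selfpc = __builtin_return_address(0);
            void *frompc = __builtin_return_address(1);

            printf("selfpc %p frompc %p\n", selfpc, frompc);
    }

    static void __attribute__((noinline))
    profiled(void)
    {
            fake_mcount();          /* what a -pg prologue call does */
    }

    int
    main(void)
    {
            profiled();
            return (0);
    }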
+ */ +typedef u_long fptrdiff_t; + +#ifdef _KERNEL + +void mcount(uintfptr_t frompc, uintfptr_t selfpc); + +#else /* !_KERNEL */ + +#include <sys/cdefs.h> + +__BEGIN_DECLS +#ifdef __GNUCLIKE_ASM +void mcount(void) __asm(".mcount"); +#endif +__END_DECLS + +#endif /* _KERNEL */ + +#endif /* !_MACHINE_PROFILE_H_ */ diff -u -r -N usr/src/sys/modules/netmap/machine/psl.h /usr/src/sys/modules/netmap/machine/psl.h --- usr/src/sys/modules/netmap/machine/psl.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/psl.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,6 @@ +/*- + * This file is in the public domain. + */ +/* $FreeBSD: releng/11.0/sys/amd64/include/psl.h 233204 2012-03-19 21:29:57Z tijl $ */ + +#include <x86/psl.h> diff -u -r -N usr/src/sys/modules/netmap/machine/ptrace.h /usr/src/sys/modules/netmap/machine/ptrace.h --- usr/src/sys/modules/netmap/machine/ptrace.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/ptrace.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,6 @@ +/*- + * This file is in the public domain. + */ +/* $FreeBSD: releng/11.0/sys/amd64/include/ptrace.h 232520 2012-03-04 20:24:28Z tijl $ */ + +#include <x86/ptrace.h> diff -u -r -N usr/src/sys/modules/netmap/machine/pvclock.h /usr/src/sys/modules/netmap/machine/pvclock.h --- usr/src/sys/modules/netmap/machine/pvclock.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/pvclock.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,6 @@ +/*- + * This file is in the public domain. + */ +/* $FreeBSD: releng/11.0/sys/amd64/include/pvclock.h 278183 2015-02-04 08:26:43Z bryanv $ */ + +#include <x86/pvclock.h> diff -u -r -N usr/src/sys/modules/netmap/machine/reg.h /usr/src/sys/modules/netmap/machine/reg.h --- usr/src/sys/modules/netmap/machine/reg.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/reg.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,6 @@ +/*- + * This file is in the public domain. + */ +/* $FreeBSD: releng/11.0/sys/amd64/include/reg.h 233124 2012-03-18 19:06:38Z tijl $ */ + +#include <x86/reg.h> diff -u -r -N usr/src/sys/modules/netmap/machine/reloc.h /usr/src/sys/modules/netmap/machine/reloc.h --- usr/src/sys/modules/netmap/machine/reloc.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/reloc.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,49 @@ +/*- + * Copyright (c) 1992, 1993 + * The Regents of the University of California. All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 4. Neither the name of the University nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * @(#)reloc.h 8.1 (Berkeley) 6/10/93 + * $FreeBSD: releng/11.0/sys/amd64/include/reloc.h 127914 2004-04-05 21:29:41Z imp $ + */ + +#ifndef _I386_MACHINE_RELOC_H_ +#define _I386_MACHINE_RELOC_H_ + +/* Relocation format. */ +struct relocation_info { + int r_address; /* offset in text or data segment */ + unsigned int r_symbolnum : 24, /* ordinal number of add symbol */ + r_pcrel : 1, /* 1 if value should be pc-relative */ + r_length : 2, /* log base 2 of value's width */ + r_extern : 1, /* 1 if need to add symbol to value */ + r_baserel : 1, /* linkage table relative */ + r_jmptable : 1, /* relocate to jump table */ + r_relative : 1, /* load address relative */ + r_copy : 1; /* run time copy */ +}; + +#endif diff -u -r -N usr/src/sys/modules/netmap/machine/resource.h /usr/src/sys/modules/netmap/machine/resource.h --- usr/src/sys/modules/netmap/machine/resource.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/resource.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,47 @@ +/* $FreeBSD: releng/11.0/sys/amd64/include/resource.h 261790 2014-02-12 04:30:37Z jhb $ */ +/*- + * Copyright 1998 Massachusetts Institute of Technology + * + * Permission to use, copy, modify, and distribute this software and + * its documentation for any purpose and without fee is hereby + * granted, provided that both the above copyright notice and this + * permission notice appear in all copies, that both the above + * copyright notice and this permission notice appear in all + * supporting documentation, and that the name of M.I.T. not be used + * in advertising or publicity pertaining to distribution of the + * software without specific, written prior permission. M.I.T. makes + * no representations about the suitability of this software for any + * purpose. It is provided "as is" without express or implied + * warranty. + * + * THIS SOFTWARE IS PROVIDED BY M.I.T. ``AS IS''. M.I.T. DISCLAIMS + * ALL EXPRESS OR IMPLIED WARRANTIES WITH REGARD TO THIS SOFTWARE, + * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF + * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT + * SHALL M.I.T. BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT + * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#ifndef _MACHINE_RESOURCE_H_ +#define _MACHINE_RESOURCE_H_ 1 + +/* + * Definitions of resource types for Intel Architecture machines + * with support for legacy ISA devices and drivers. 
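The SYS_RES_* types defined just below are what FreeBSD drivers hand to the bus resource allocator. A hypothetical attach fragment (foo_attach and the rid are illustrative; this is kernel context, so it is a sketch rather than a standalone program):

    #include <sys/param.h>
    #include <sys/bus.h>
    #include <sys/rman.h>
    #include <machine/resource.h>

    static int
    foo_attach(device_t dev)
    {
            struct resource *irq;
            int rid = 0;

            /* Ask the parent bus for this device's interrupt line. */
            irq = bus_alloc_resource_any(dev, SYS_RES_IRQ, &rid, RF_ACTIVE);
            if (irq == NULL)
                    return (ENXIO);

            /* ... set up the handler; the matching release on detach: */
            bus_release_resource(dev, SYS_RES_IRQ, rid, irq);
            return (0);
    }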
+ */ + +#define SYS_RES_IRQ 1 /* interrupt lines */ +#define SYS_RES_DRQ 2 /* isa dma lines */ +#define SYS_RES_MEMORY 3 /* i/o memory */ +#define SYS_RES_IOPORT 4 /* i/o ports */ +#ifdef NEW_PCIB +#define PCI_RES_BUS 5 /* PCI bus numbers */ +#endif + +#endif /* !_MACHINE_RESOURCE_H_ */ diff -u -r -N usr/src/sys/modules/netmap/machine/runq.h /usr/src/sys/modules/netmap/machine/runq.h --- usr/src/sys/modules/netmap/machine/runq.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/runq.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,46 @@ +/*- + * Copyright (c) 2001 Jake Burkholder <jake@FreeBSD.org> + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD: releng/11.0/sys/amd64/include/runq.h 139731 2005-01-05 20:17:21Z imp $ + */ + +#ifndef _MACHINE_RUNQ_H_ +#define _MACHINE_RUNQ_H_ + +#define RQB_LEN (1) /* Number of priority status words. */ +#define RQB_L2BPW (6) /* Log2(sizeof(rqb_word_t) * NBBY)). */ +#define RQB_BPW (1<<RQB_L2BPW) /* Bits in an rqb_word_t. */ + +#define RQB_BIT(pri) (1ul << ((pri) & (RQB_BPW - 1))) +#define RQB_WORD(pri) ((pri) >> RQB_L2BPW) + +#define RQB_FFS(word) (bsfq(word)) + +/* + * Type of run queue status word. + */ +typedef u_int64_t rqb_word_t; + +#endif diff -u -r -N usr/src/sys/modules/netmap/machine/segments.h /usr/src/sys/modules/netmap/machine/segments.h --- usr/src/sys/modules/netmap/machine/segments.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/segments.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,106 @@ +/*- + * Copyright (c) 1989, 1990 William F. Jolitz + * Copyright (c) 1990 The Regents of the University of California. + * All rights reserved. + * + * This code is derived from software contributed to Berkeley by + * William Jolitz. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. 
Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 4. Neither the name of the University nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * from: @(#)segments.h 7.1 (Berkeley) 5/9/91 + * $FreeBSD: releng/11.0/sys/amd64/include/segments.h 258660 2013-11-26 19:38:42Z kib $ + */ + +#ifndef _MACHINE_SEGMENTS_H_ +#define _MACHINE_SEGMENTS_H_ + +/* + * AMD64 Segmentation Data Structures and definitions + */ + +#include <x86/segments.h> + +/* + * System segment descriptors (128 bit wide) + */ +struct system_segment_descriptor { + u_int64_t sd_lolimit:16; /* segment extent (lsb) */ + u_int64_t sd_lobase:24; /* segment base address (lsb) */ + u_int64_t sd_type:5; /* segment type */ + u_int64_t sd_dpl:2; /* segment descriptor priority level */ + u_int64_t sd_p:1; /* segment descriptor present */ + u_int64_t sd_hilimit:4; /* segment extent (msb) */ + u_int64_t sd_xx0:3; /* unused */ + u_int64_t sd_gran:1; /* limit granularity (byte/page units)*/ + u_int64_t sd_hibase:40 __packed;/* segment base address (msb) */ + u_int64_t sd_xx1:8; + u_int64_t sd_mbz:5; /* MUST be zero */ + u_int64_t sd_xx2:19; +} __packed; + +/* + * Software definitions are in this convenient format, + * which are translated into inconvenient segment descriptors + * when needed to be used by the 386 hardware + */ + +struct soft_segment_descriptor { + unsigned long ssd_base; /* segment base address */ + unsigned long ssd_limit; /* segment extent */ + unsigned long ssd_type:5; /* segment type */ + unsigned long ssd_dpl:2; /* segment descriptor priority level */ + unsigned long ssd_p:1; /* segment descriptor present */ + unsigned long ssd_long:1; /* long mode (for %cs) */ + unsigned long ssd_def32:1; /* default 32 vs 16 bit size */ + unsigned long ssd_gran:1; /* limit granularity (byte/page units)*/ +} __packed; + +/* + * region descriptors, used to load gdt/idt tables before segments yet exist. 
+ */ +struct region_descriptor { + uint64_t rd_limit:16; /* segment extent */ + uint64_t rd_base:64 __packed; /* base address */ +} __packed; + +#ifdef _KERNEL +extern struct user_segment_descriptor gdt[]; +extern struct soft_segment_descriptor gdt_segs[]; +extern struct gate_descriptor *idt; +extern struct region_descriptor r_gdt, r_idt; + +void lgdt(struct region_descriptor *rdp); +void sdtossd(struct user_segment_descriptor *sdp, + struct soft_segment_descriptor *ssdp); +void ssdtosd(struct soft_segment_descriptor *ssdp, + struct user_segment_descriptor *sdp); +void ssdtosyssd(struct soft_segment_descriptor *ssdp, + struct system_segment_descriptor *sdp); +void update_gdt_gsbase(struct thread *td, uint32_t base); +void update_gdt_fsbase(struct thread *td, uint32_t base); +#endif /* _KERNEL */ + +#endif /* !_MACHINE_SEGMENTS_H_ */ diff -u -r -N usr/src/sys/modules/netmap/machine/setjmp.h /usr/src/sys/modules/netmap/machine/setjmp.h --- usr/src/sys/modules/netmap/machine/setjmp.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/setjmp.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,6 @@ +/*- + * This file is in the public domain. + */ +/* $FreeBSD: releng/11.0/sys/amd64/include/setjmp.h 232275 2012-02-28 22:17:52Z tijl $ */ + +#include <x86/setjmp.h> diff -u -r -N usr/src/sys/modules/netmap/machine/sf_buf.h /usr/src/sys/modules/netmap/machine/sf_buf.h --- usr/src/sys/modules/netmap/machine/sf_buf.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/sf_buf.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,51 @@ +/*- + * Copyright (c) 2003, 2005 Alan L. Cox <alc@cs.rice.edu> + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD: releng/11.0/sys/amd64/include/sf_buf.h 269577 2014-08-05 09:44:10Z glebius $ + */ + +#ifndef _MACHINE_SF_BUF_H_ +#define _MACHINE_SF_BUF_H_ + +/* + * On this machine, the only purpose for which sf_buf is used is to implement + * an opaque pointer required by the machine-independent parts of the kernel. + * That pointer references the vm_page that is "mapped" by the sf_buf. The + * actual mapping is provided by the direct virtual-to-physical mapping. 
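The "direct virtual-to-physical mapping" that sf_buf_kva() below leans on is pure arithmetic: a physical address offset into the DMAP region (0xfffff80000000000 on a stock amd64 layout, matching the DMPML4I math earlier). A toy illustration with that base assumed rather than derived:

    #include <stdint.h>
    #include <stdio.h>

    /* Assumed stock amd64 value; the kernel derives it from DMPML4I. */
    #define DMAP_MIN_ADDRESS 0xfffff80000000000UL
    #define PHYS_TO_DMAP(pa) (DMAP_MIN_ADDRESS | (uint64_t)(pa))

    int
    main(void)
    {
            uint64_t pa = 0x1234000;        /* some physical page address */

            printf("pa %#lx -> kva %#lx\n", pa, PHYS_TO_DMAP(pa));
            return (0);
    }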
+ */ +static inline vm_offset_t +sf_buf_kva(struct sf_buf *sf) +{ + + return (PHYS_TO_DMAP(VM_PAGE_TO_PHYS((vm_page_t)sf))); +} + +static inline vm_page_t +sf_buf_page(struct sf_buf *sf) +{ + + return ((vm_page_t)sf); +} +#endif /* !_MACHINE_SF_BUF_H_ */ diff -u -r -N usr/src/sys/modules/netmap/machine/sigframe.h /usr/src/sys/modules/netmap/machine/sigframe.h --- usr/src/sys/modules/netmap/machine/sigframe.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/sigframe.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,6 @@ +/*- + * This file is in the public domain. + */ +/* $FreeBSD: releng/11.0/sys/amd64/include/sigframe.h 247047 2013-02-20 17:39:52Z kib $ */ + +#include <x86/sigframe.h> diff -u -r -N usr/src/sys/modules/netmap/machine/signal.h /usr/src/sys/modules/netmap/machine/signal.h --- usr/src/sys/modules/netmap/machine/signal.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/signal.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,6 @@ +/*- + * This file is in the public domain. + */ +/* $FreeBSD: releng/11.0/sys/amd64/include/signal.h 247047 2013-02-20 17:39:52Z kib $ */ + +#include <x86/signal.h> diff -u -r -N usr/src/sys/modules/netmap/machine/smp.h /usr/src/sys/modules/netmap/machine/smp.h --- usr/src/sys/modules/netmap/machine/smp.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/smp.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,46 @@ +/*- + * ---------------------------------------------------------------------------- + * "THE BEER-WARE LICENSE" (Revision 42): + * <phk@FreeBSD.org> wrote this file. As long as you retain this notice you + * can do whatever you want with this stuff. If we meet some day, and you think + * this stuff is worth it, you can buy me a beer in return. Poul-Henning Kamp + * ---------------------------------------------------------------------------- + * + * $FreeBSD: releng/11.0/sys/amd64/include/smp.h 291949 2015-12-07 17:41:20Z kib $ + * + */ + +#ifndef _MACHINE_SMP_H_ +#define _MACHINE_SMP_H_ + +#ifdef _KERNEL + +#ifdef SMP + +#ifndef LOCORE + +#include <x86/x86_smp.h> + +extern int pmap_pcid_enabled; +extern int invpcid_works; + +/* global symbols in mpboot.S */ +extern char mptramp_start[]; +extern char mptramp_end[]; +extern u_int32_t mptramp_pagetables; + +/* IPI handlers */ +inthand_t + IDTVEC(invltlb_pcid), /* TLB shootdowns - global, pcid */ + IDTVEC(invltlb_invpcid),/* TLB shootdowns - global, invpcid */ + IDTVEC(justreturn); /* interrupt CPU with minimum overhead */ + +void invltlb_pcid_handler(void); +void invltlb_invpcid_handler(void); +int native_start_all_aps(void); + +#endif /* !LOCORE */ +#endif /* SMP */ + +#endif /* _KERNEL */ +#endif /* _MACHINE_SMP_H_ */ diff -u -r -N usr/src/sys/modules/netmap/machine/specialreg.h /usr/src/sys/modules/netmap/machine/specialreg.h --- usr/src/sys/modules/netmap/machine/specialreg.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/specialreg.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,6 @@ +/*- + * This file is in the public domain. + */ +/* $FreeBSD: releng/11.0/sys/amd64/include/specialreg.h 233207 2012-03-19 21:34:11Z tijl $ */ + +#include <x86/specialreg.h> diff -u -r -N usr/src/sys/modules/netmap/machine/stack.h /usr/src/sys/modules/netmap/machine/stack.h --- usr/src/sys/modules/netmap/machine/stack.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/stack.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,6 @@ +/* + * This file is in the public domain. 
+ */ +/* $FreeBSD: releng/11.0/sys/amd64/include/stack.h 287643 2015-09-11 03:24:07Z markj $ */ + +#include <x86/stack.h> diff -u -r -N usr/src/sys/modules/netmap/machine/stdarg.h /usr/src/sys/modules/netmap/machine/stdarg.h --- usr/src/sys/modules/netmap/machine/stdarg.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/stdarg.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,6 @@ +/*- + * This file is in the public domain. + */ +/* $FreeBSD: releng/11.0/sys/amd64/include/stdarg.h 232276 2012-02-28 22:30:58Z tijl $ */ + +#include <x86/stdarg.h> diff -u -r -N usr/src/sys/modules/netmap/machine/sysarch.h /usr/src/sys/modules/netmap/machine/sysarch.h --- usr/src/sys/modules/netmap/machine/sysarch.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/sysarch.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,6 @@ +/*- + * This file is in the public domain. + */ +/* $FreeBSD: releng/11.0/sys/amd64/include/sysarch.h 233209 2012-03-19 21:57:31Z tijl $ */ + +#include <x86/sysarch.h> diff -u -r -N usr/src/sys/modules/netmap/machine/timerreg.h /usr/src/sys/modules/netmap/machine/timerreg.h --- usr/src/sys/modules/netmap/machine/timerreg.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/timerreg.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,54 @@ +/*- + * Copyright (C) 2005 TAKAHASHI Yoshihiro. All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD: releng/11.0/sys/amd64/include/timerreg.h 177642 2008-03-26 20:09:21Z phk $ + */ + +/* + * The outputs of the three timers are connected as follows: + * + * timer 0 -> irq 0 + * timer 1 -> dma chan 0 (for dram refresh) + * timer 2 -> speaker (via keyboard controller) + * + * Timer 0 is used to call hardclock. + * Timer 2 is used to generate console beeps. 
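Timer 2 plus the PPI bits from ppireg.h above are all a console beep takes: program counter 2 as a square-wave generator, then gate it onto the speaker. A kernel-context sketch (outb/inb come from <machine/cpufunc.h>; the 0xb6 mode byte and the 1193182 Hz input clock are assumptions taken from i8253reg.h, not from this hunk):

    #include <machine/cpufunc.h>

    #define PIT_HZ  1193182                 /* assumed i8254 input clock */

    static void
    beep(int hz)
    {
            int div = PIT_HZ / hz;

            outb(0x43, 0xb6);               /* TIMER_MODE: cntr2, sqwave, 16bit */
            outb(0x42, div & 0xff);         /* TIMER_CNTR2: divisor, low byte */
            outb(0x42, (div >> 8) & 0xff);  /*              divisor, high byte */
            outb(0x61, inb(0x61) | 0x03);   /* ppi_spkr_on(): PIT_SPKR bits */
    }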
+ */ + +#ifndef _MACHINE_TIMERREG_H_ +#define _MACHINE_TIMERREG_H_ + +#ifdef _KERNEL + +#include <dev/ic/i8253reg.h> + +#define IO_TIMER1 0x40 /* 8253 Timer #1 */ +#define TIMER_CNTR0 (IO_TIMER1 + TIMER_REG_CNTR0) +#define TIMER_CNTR1 (IO_TIMER1 + TIMER_REG_CNTR1) +#define TIMER_CNTR2 (IO_TIMER1 + TIMER_REG_CNTR2) +#define TIMER_MODE (IO_TIMER1 + TIMER_REG_MODE) + +#endif /* _KERNEL */ + +#endif /* _MACHINE_TIMERREG_H_ */ diff -u -r -N usr/src/sys/modules/netmap/machine/trap.h /usr/src/sys/modules/netmap/machine/trap.h --- usr/src/sys/modules/netmap/machine/trap.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/trap.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,6 @@ +/*- + * This file is in the public domain. + */ +/* $FreeBSD: releng/11.0/sys/amd64/include/trap.h 232492 2012-03-04 14:12:57Z tijl $ */ + +#include <x86/trap.h> diff -u -r -N usr/src/sys/modules/netmap/machine/tss.h /usr/src/sys/modules/netmap/machine/tss.h --- usr/src/sys/modules/netmap/machine/tss.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/tss.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,70 @@ +/*- + * Copyright (c) 1990 The Regents of the University of California. + * All rights reserved. + * + * This code is derived from software contributed to Berkeley by + * William Jolitz. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 4. Neither the name of the University nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * from: @(#)tss.h 5.4 (Berkeley) 1/18/91 + * $FreeBSD: releng/11.0/sys/amd64/include/tss.h 145120 2005-04-15 18:39:31Z peter $ + */ + +#ifndef _MACHINE_TSS_H_ +#define _MACHINE_TSS_H_ 1 + +/* + * amd64 Context Data Type + * + * The alignment is pretty messed up here due to reuse of the original 32 bit + * fields. It might be worth trying to set the tss on a +4 byte offset to + * make the 64 bit fields aligned in practice. 
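That alignment caveat can be checked directly: the leading 32-bit reserved word leaves every 64-bit field in the struct that follows at offset 4 mod 8, and the packed hardware TSS is exactly 0x68 bytes. A userland restatement, with __attribute__((packed)) standing in for the kernel's per-field __packed:

    #include <stddef.h>
    #include <stdint.h>

    struct amd64tss {
            uint32_t tss_rsvd0;
            uint64_t tss_rsp0, tss_rsp1, tss_rsp2;
            uint32_t tss_rsvd1, tss_rsvd2;
            uint64_t tss_ist1, tss_ist2, tss_ist3, tss_ist4;
            uint64_t tss_ist5, tss_ist6, tss_ist7;
            uint32_t tss_rsvd3, tss_rsvd4;
            uint16_t tss_rsvd5, tss_iobase;
    } __attribute__((packed));

    _Static_assert(sizeof(struct amd64tss) == 0x68, "hardware TSS size");
    _Static_assert(offsetof(struct amd64tss, tss_rsp0) == 4,
        "rsp0 is only 4-byte aligned");

    int main(void) { return 0; }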
+ */ +struct amd64tss { + u_int32_t tss_rsvd0; + u_int64_t tss_rsp0 __packed; /* kernel stack pointer ring 0 */ + u_int64_t tss_rsp1 __packed; /* kernel stack pointer ring 1 */ + u_int64_t tss_rsp2 __packed; /* kernel stack pointer ring 2 */ + u_int32_t tss_rsvd1; + u_int32_t tss_rsvd2; + u_int64_t tss_ist1 __packed; /* Interrupt stack table 1 */ + u_int64_t tss_ist2 __packed; /* Interrupt stack table 2 */ + u_int64_t tss_ist3 __packed; /* Interrupt stack table 3 */ + u_int64_t tss_ist4 __packed; /* Interrupt stack table 4 */ + u_int64_t tss_ist5 __packed; /* Interrupt stack table 5 */ + u_int64_t tss_ist6 __packed; /* Interrupt stack table 6 */ + u_int64_t tss_ist7 __packed; /* Interrupt stack table 7 */ + u_int32_t tss_rsvd3; + u_int32_t tss_rsvd4; + u_int16_t tss_rsvd5; + u_int16_t tss_iobase; /* io bitmap offset */ +}; + +#ifdef _KERNEL +extern struct amd64tss common_tss[]; +#endif + +#endif /* _MACHINE_TSS_H_ */ diff -u -r -N usr/src/sys/modules/netmap/machine/ucontext.h /usr/src/sys/modules/netmap/machine/ucontext.h --- usr/src/sys/modules/netmap/machine/ucontext.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/ucontext.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,6 @@ +/*- + * This file is in the public domain. + */ +/* $FreeBSD: releng/11.0/sys/amd64/include/ucontext.h 247047 2013-02-20 17:39:52Z kib $ */ + +#include <x86/ucontext.h> diff -u -r -N usr/src/sys/modules/netmap/machine/varargs.h /usr/src/sys/modules/netmap/machine/varargs.h --- usr/src/sys/modules/netmap/machine/varargs.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/varargs.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,89 @@ +/*- + * Copyright (c) 2002 David E. O'Brien. All rights reserved. + * Copyright (c) 1990, 1993 + * The Regents of the University of California. All rights reserved. + * (c) UNIX System Laboratories, Inc. + * All or some portions of this file are derived from material licensed + * to the University of California by American Telephone and Telegraph + * Co. or Unix System Laboratories, Inc. and are reproduced herein with + * the permission of UNIX System Laboratories, Inc. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 3. All advertising materials mentioning features or use of this software + * must display the following acknowledgement: + * This product includes software developed by the University of + * California, Berkeley and its contributors. + * 4. Neither the name of the University nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * @(#)varargs.h 8.2 (Berkeley) 3/22/94 + * $FreeBSD: releng/11.0/sys/amd64/include/varargs.h 143434 2005-03-11 22:16:09Z peter $ + */ + +#ifndef _MACHINE_VARARGS_H_ +#define _MACHINE_VARARGS_H_ + +#ifndef _SYS_CDEFS_H_ +#error this file needs sys/cdefs.h as a prerequisite +#endif + +#ifdef __GNUCLIKE_BUILTIN_VARARGS + +#include <sys/_types.h> + +#ifndef _VA_LIST_DECLARED +#define _VA_LIST_DECLARED +typedef __va_list va_list; +#endif + +typedef int __builtin_va_alist_t __attribute__((__mode__(__word__))); + +#define va_alist __builtin_va_alist +#define va_dcl __builtin_va_alist_t __builtin_va_alist; ... +#define va_start(ap) __builtin_varargs_start(ap) +#define va_arg(ap, type) __builtin_va_arg((ap), type) +#define va_end(ap) __builtin_va_end(ap) + +#else /* !__GNUCLIKE_BUILTIN_VARARGS */ + +typedef char *va_list; + +#define __va_size(type) \ + (((sizeof(type) + sizeof(int) - 1) / sizeof(int)) * sizeof(int)) + +#if defined(__GNUCLIKE_BUILTIN_VAALIST) +#define va_alist __builtin_va_alist +#endif +#define va_dcl int va_alist; ... + +#define va_start(ap) \ + ((ap) = (va_list)&va_alist) + +#define va_arg(ap, type) \ + (*(type *)((ap) += __va_size(type), (ap) - __va_size(type))) + +#define va_end(ap) + +#endif /* __GNUCLIKE_BUILTIN_VARARGS */ + +#endif /* !_MACHINE_VARARGS_H_ */ diff -u -r -N usr/src/sys/modules/netmap/machine/vdso.h /usr/src/sys/modules/netmap/machine/vdso.h --- usr/src/sys/modules/netmap/machine/vdso.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/vdso.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,6 @@ +/*- + * This file is in the public domain. + */ +/* $FreeBSD: releng/11.0/sys/amd64/include/vdso.h 237433 2012-06-22 07:06:40Z kib $ */ + +#include <x86/vdso.h> diff -u -r -N usr/src/sys/modules/netmap/machine/vm.h /usr/src/sys/modules/netmap/machine/vm.h --- usr/src/sys/modules/netmap/machine/vm.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/vm.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,45 @@ +/*- + * Copyright (c) 2009 Hudson River Trading LLC + * Written by: John H. Baldwin <jhb@FreeBSD.org> + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD: releng/11.0/sys/amd64/include/vm.h 281887 2015-04-23 14:22:20Z jhb $ + */ + +#ifndef _MACHINE_VM_H_ +#define _MACHINE_VM_H_ + +#include <machine/specialreg.h> + +/* Memory attributes. */ +#define VM_MEMATTR_UNCACHEABLE ((vm_memattr_t)PAT_UNCACHEABLE) +#define VM_MEMATTR_WRITE_COMBINING ((vm_memattr_t)PAT_WRITE_COMBINING) +#define VM_MEMATTR_WRITE_THROUGH ((vm_memattr_t)PAT_WRITE_THROUGH) +#define VM_MEMATTR_WRITE_PROTECTED ((vm_memattr_t)PAT_WRITE_PROTECTED) +#define VM_MEMATTR_WRITE_BACK ((vm_memattr_t)PAT_WRITE_BACK) +#define VM_MEMATTR_WEAK_UNCACHEABLE ((vm_memattr_t)PAT_UNCACHED) + +#define VM_MEMATTR_DEFAULT VM_MEMATTR_WRITE_BACK + +#endif /* !_MACHINE_VM_H_ */ diff -u -r -N usr/src/sys/modules/netmap/machine/vmm.h /usr/src/sys/modules/netmap/machine/vmm.h --- usr/src/sys/modules/netmap/machine/vmm.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/vmm.h 2016-11-30 10:56:05.784999000 +0000 @@ -0,0 +1,684 @@ +/*- + * Copyright (c) 2011 NetApp, Inc. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD: releng/11.0/sys/amd64/include/vmm.h 299010 2016-05-03 22:13:04Z pfg $ + */ + +#ifndef _VMM_H_ +#define _VMM_H_ + +#include <x86/segments.h> + +enum vm_suspend_how { + VM_SUSPEND_NONE, + VM_SUSPEND_RESET, + VM_SUSPEND_POWEROFF, + VM_SUSPEND_HALT, + VM_SUSPEND_TRIPLEFAULT, + VM_SUSPEND_LAST +}; + +/* + * Identifiers for architecturally defined registers. 
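These register identifiers are the same ones bhyve's userland passes through libvmmapi. A hedged sketch of reading a guest RIP from a running VM ("myvm" is a placeholder name; link with -lvmmapi on a host with vmm.ko loaded):

    #include <sys/types.h>
    #include <machine/vmm.h>

    #include <stdio.h>
    #include <vmmapi.h>

    int
    main(void)
    {
            struct vmctx *ctx;
            uint64_t rip;

            ctx = vm_open("myvm");          /* placeholder VM name */
            if (ctx == NULL) {
                    perror("vm_open");
                    return (1);
            }
            if (vm_get_register(ctx, 0 /* vcpu */, VM_REG_GUEST_RIP, &rip) == 0)
                    printf("vcpu0 rip = %#lx\n", rip);
            return (0);
    }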
+ */ +enum vm_reg_name { + VM_REG_GUEST_RAX, + VM_REG_GUEST_RBX, + VM_REG_GUEST_RCX, + VM_REG_GUEST_RDX, + VM_REG_GUEST_RSI, + VM_REG_GUEST_RDI, + VM_REG_GUEST_RBP, + VM_REG_GUEST_R8, + VM_REG_GUEST_R9, + VM_REG_GUEST_R10, + VM_REG_GUEST_R11, + VM_REG_GUEST_R12, + VM_REG_GUEST_R13, + VM_REG_GUEST_R14, + VM_REG_GUEST_R15, + VM_REG_GUEST_CR0, + VM_REG_GUEST_CR3, + VM_REG_GUEST_CR4, + VM_REG_GUEST_DR7, + VM_REG_GUEST_RSP, + VM_REG_GUEST_RIP, + VM_REG_GUEST_RFLAGS, + VM_REG_GUEST_ES, + VM_REG_GUEST_CS, + VM_REG_GUEST_SS, + VM_REG_GUEST_DS, + VM_REG_GUEST_FS, + VM_REG_GUEST_GS, + VM_REG_GUEST_LDTR, + VM_REG_GUEST_TR, + VM_REG_GUEST_IDTR, + VM_REG_GUEST_GDTR, + VM_REG_GUEST_EFER, + VM_REG_GUEST_CR2, + VM_REG_GUEST_PDPTE0, + VM_REG_GUEST_PDPTE1, + VM_REG_GUEST_PDPTE2, + VM_REG_GUEST_PDPTE3, + VM_REG_GUEST_INTR_SHADOW, + VM_REG_LAST +}; + +enum x2apic_state { + X2APIC_DISABLED, + X2APIC_ENABLED, + X2APIC_STATE_LAST +}; + +#define VM_INTINFO_VECTOR(info) ((info) & 0xff) +#define VM_INTINFO_DEL_ERRCODE 0x800 +#define VM_INTINFO_RSVD 0x7ffff000 +#define VM_INTINFO_VALID 0x80000000 +#define VM_INTINFO_TYPE 0x700 +#define VM_INTINFO_HWINTR (0 << 8) +#define VM_INTINFO_NMI (2 << 8) +#define VM_INTINFO_HWEXCEPTION (3 << 8) +#define VM_INTINFO_SWINTR (4 << 8) + +#ifdef _KERNEL + +#define VM_MAX_NAMELEN 32 + +struct vm; +struct vm_exception; +struct seg_desc; +struct vm_exit; +struct vm_run; +struct vhpet; +struct vioapic; +struct vlapic; +struct vmspace; +struct vm_object; +struct vm_guest_paging; +struct pmap; + +struct vm_eventinfo { + void *rptr; /* rendezvous cookie */ + int *sptr; /* suspend cookie */ + int *iptr; /* reqidle cookie */ +}; + +typedef int (*vmm_init_func_t)(int ipinum); +typedef int (*vmm_cleanup_func_t)(void); +typedef void (*vmm_resume_func_t)(void); +typedef void * (*vmi_init_func_t)(struct vm *vm, struct pmap *pmap); +typedef int (*vmi_run_func_t)(void *vmi, int vcpu, register_t rip, + struct pmap *pmap, struct vm_eventinfo *info); +typedef void (*vmi_cleanup_func_t)(void *vmi); +typedef int (*vmi_get_register_t)(void *vmi, int vcpu, int num, + uint64_t *retval); +typedef int (*vmi_set_register_t)(void *vmi, int vcpu, int num, + uint64_t val); +typedef int (*vmi_get_desc_t)(void *vmi, int vcpu, int num, + struct seg_desc *desc); +typedef int (*vmi_set_desc_t)(void *vmi, int vcpu, int num, + struct seg_desc *desc); +typedef int (*vmi_get_cap_t)(void *vmi, int vcpu, int num, int *retval); +typedef int (*vmi_set_cap_t)(void *vmi, int vcpu, int num, int val); +typedef struct vmspace * (*vmi_vmspace_alloc)(vm_offset_t min, vm_offset_t max); +typedef void (*vmi_vmspace_free)(struct vmspace *vmspace); +typedef struct vlapic * (*vmi_vlapic_init)(void *vmi, int vcpu); +typedef void (*vmi_vlapic_cleanup)(void *vmi, struct vlapic *vlapic); + +struct vmm_ops { + vmm_init_func_t init; /* module wide initialization */ + vmm_cleanup_func_t cleanup; + vmm_resume_func_t resume; + + vmi_init_func_t vminit; /* vm-specific initialization */ + vmi_run_func_t vmrun; + vmi_cleanup_func_t vmcleanup; + vmi_get_register_t vmgetreg; + vmi_set_register_t vmsetreg; + vmi_get_desc_t vmgetdesc; + vmi_set_desc_t vmsetdesc; + vmi_get_cap_t vmgetcap; + vmi_set_cap_t vmsetcap; + vmi_vmspace_alloc vmspace_alloc; + vmi_vmspace_free vmspace_free; + vmi_vlapic_init vlapic_init; + vmi_vlapic_cleanup vlapic_cleanup; +}; + +extern struct vmm_ops vmm_ops_intel; +extern struct vmm_ops vmm_ops_amd; + +int vm_create(const char *name, struct vm **retvm); +void vm_destroy(struct vm *vm); +int vm_reinit(struct vm *vm); +const 
char *vm_name(struct vm *vm); + +/* + * APIs that modify the guest memory map require all vcpus to be frozen. + */ +int vm_mmap_memseg(struct vm *vm, vm_paddr_t gpa, int segid, vm_ooffset_t off, + size_t len, int prot, int flags); +int vm_alloc_memseg(struct vm *vm, int ident, size_t len, bool sysmem); +void vm_free_memseg(struct vm *vm, int ident); +int vm_map_mmio(struct vm *vm, vm_paddr_t gpa, size_t len, vm_paddr_t hpa); +int vm_map_usermem(struct vm *vm, vm_paddr_t gpa, size_t len, void *buf, struct thread *td); +int vm_unmap_mmio(struct vm *vm, vm_paddr_t gpa, size_t len); +int vm_assign_pptdev(struct vm *vm, int bus, int slot, int func); +int vm_unassign_pptdev(struct vm *vm, int bus, int slot, int func); + +/* + * APIs that inspect the guest memory map require only a *single* vcpu to + * be frozen. This acts like a read lock on the guest memory map since any + * modification requires *all* vcpus to be frozen. + */ +int vm_mmap_getnext(struct vm *vm, vm_paddr_t *gpa, int *segid, + vm_ooffset_t *segoff, size_t *len, int *prot, int *flags); +int vm_get_memseg(struct vm *vm, int ident, size_t *len, bool *sysmem, + struct vm_object **objptr); +void *vm_gpa_hold(struct vm *, int vcpuid, vm_paddr_t gpa, size_t len, + int prot, void **cookie); +void vm_gpa_release(void *cookie); +bool vm_mem_allocated(struct vm *vm, int vcpuid, vm_paddr_t gpa); + +int vm_get_register(struct vm *vm, int vcpu, int reg, uint64_t *retval); +int vm_set_register(struct vm *vm, int vcpu, int reg, uint64_t val); +int vm_get_seg_desc(struct vm *vm, int vcpu, int reg, + struct seg_desc *ret_desc); +int vm_set_seg_desc(struct vm *vm, int vcpu, int reg, + struct seg_desc *desc); +int vm_run(struct vm *vm, struct vm_run *vmrun); +int vm_suspend(struct vm *vm, enum vm_suspend_how how); +int vm_inject_nmi(struct vm *vm, int vcpu); +int vm_nmi_pending(struct vm *vm, int vcpuid); +void vm_nmi_clear(struct vm *vm, int vcpuid); +int vm_inject_extint(struct vm *vm, int vcpu); +int vm_extint_pending(struct vm *vm, int vcpuid); +void vm_extint_clear(struct vm *vm, int vcpuid); +struct vlapic *vm_lapic(struct vm *vm, int cpu); +struct vioapic *vm_ioapic(struct vm *vm); +struct vhpet *vm_hpet(struct vm *vm); +int vm_get_capability(struct vm *vm, int vcpu, int type, int *val); +int vm_set_capability(struct vm *vm, int vcpu, int type, int val); +int vm_get_x2apic_state(struct vm *vm, int vcpu, enum x2apic_state *state); +int vm_set_x2apic_state(struct vm *vm, int vcpu, enum x2apic_state state); +int vm_apicid2vcpuid(struct vm *vm, int apicid); +int vm_activate_cpu(struct vm *vm, int vcpu); +struct vm_exit *vm_exitinfo(struct vm *vm, int vcpuid); +void vm_exit_suspended(struct vm *vm, int vcpuid, uint64_t rip); +void vm_exit_rendezvous(struct vm *vm, int vcpuid, uint64_t rip); +void vm_exit_astpending(struct vm *vm, int vcpuid, uint64_t rip); +void vm_exit_reqidle(struct vm *vm, int vcpuid, uint64_t rip); + +#ifdef _SYS__CPUSET_H_ +/* + * Rendezvous all vcpus specified in 'dest' and execute 'func(arg)'. + * The rendezvous 'func(arg)' is not allowed to do anything that will + * cause the thread to be put to sleep. + * + * If the rendezvous is being initiated from a vcpu context then the + * 'vcpuid' must refer to that vcpu, otherwise it should be set to -1. + * + * The caller cannot hold any locks when initiating the rendezvous. + * + * The implementation of this API may cause vcpus other than those specified + * by 'dest' to be stalled. 
The caller should not rely on any vcpus making + * forward progress when the rendezvous is in progress. + */ +typedef void (*vm_rendezvous_func_t)(struct vm *vm, int vcpuid, void *arg); +void vm_smp_rendezvous(struct vm *vm, int vcpuid, cpuset_t dest, + vm_rendezvous_func_t func, void *arg); +cpuset_t vm_active_cpus(struct vm *vm); +cpuset_t vm_suspended_cpus(struct vm *vm); +#endif /* _SYS__CPUSET_H_ */ + +static __inline int +vcpu_rendezvous_pending(struct vm_eventinfo *info) +{ + + return (*((uintptr_t *)(info->rptr)) != 0); +} + +static __inline int +vcpu_suspended(struct vm_eventinfo *info) +{ + + return (*info->sptr); +} + +static __inline int +vcpu_reqidle(struct vm_eventinfo *info) +{ + + return (*info->iptr); +} + +/* + * Return 1 if device indicated by bus/slot/func is supposed to be a + * pci passthrough device. + * + * Return 0 otherwise. + */ +int vmm_is_pptdev(int bus, int slot, int func); + +void *vm_iommu_domain(struct vm *vm); + +enum vcpu_state { + VCPU_IDLE, + VCPU_FROZEN, + VCPU_RUNNING, + VCPU_SLEEPING, +}; + +int vcpu_set_state(struct vm *vm, int vcpu, enum vcpu_state state, + bool from_idle); +enum vcpu_state vcpu_get_state(struct vm *vm, int vcpu, int *hostcpu); + +static int __inline +vcpu_is_running(struct vm *vm, int vcpu, int *hostcpu) +{ + return (vcpu_get_state(vm, vcpu, hostcpu) == VCPU_RUNNING); +} + +#ifdef _SYS_PROC_H_ +static int __inline +vcpu_should_yield(struct vm *vm, int vcpu) +{ + + if (curthread->td_flags & (TDF_ASTPENDING | TDF_NEEDRESCHED)) + return (1); + else if (curthread->td_owepreempt) + return (1); + else + return (0); +} +#endif + +void *vcpu_stats(struct vm *vm, int vcpu); +void vcpu_notify_event(struct vm *vm, int vcpuid, bool lapic_intr); +struct vmspace *vm_get_vmspace(struct vm *vm); +struct vatpic *vm_atpic(struct vm *vm); +struct vatpit *vm_atpit(struct vm *vm); +struct vpmtmr *vm_pmtmr(struct vm *vm); +struct vrtc *vm_rtc(struct vm *vm); +struct ioregh *vm_ioregh(struct vm *vm); + +/* + * Inject exception 'vector' into the guest vcpu. This function returns 0 on + * success and non-zero on failure. + * + * Wrapper functions like 'vm_inject_gp()' should be preferred to calling + * this function directly because they enforce the trap-like or fault-like + * behavior of an exception. + * + * This function should only be called in the context of the thread that is + * executing this vcpu. + */ +int vm_inject_exception(struct vm *vm, int vcpuid, int vector, int err_valid, + uint32_t errcode, int restart_instruction); + +/* + * This function is called after a VM-exit that occurred during exception or + * interrupt delivery through the IDT. The format of 'intinfo' is described + * in Figure 15-1, "EXITINTINFO for All Intercepts", APM, Vol 2. + * + * If a VM-exit handler completes the event delivery successfully then it + * should call vm_exit_intinfo() to extinguish the pending event. For e.g., + * if the task switch emulation is triggered via a task gate then it should + * call this function with 'intinfo=0' to indicate that the external event + * is not pending anymore. + * + * Return value is 0 on success and non-zero on failure. + */ +int vm_exit_intinfo(struct vm *vm, int vcpuid, uint64_t intinfo); + +/* + * This function is called before every VM-entry to retrieve a pending + * event that should be injected into the guest. This function combines + * nested events into a double or triple fault. + * + * Returns 0 if there are no events that need to be injected into the guest + * and non-zero otherwise. 
+ */ +int vm_entry_intinfo(struct vm *vm, int vcpuid, uint64_t *info); + +int vm_get_intinfo(struct vm *vm, int vcpuid, uint64_t *info1, uint64_t *info2); + +enum vm_reg_name vm_segment_name(int seg_encoding); + +struct vm_copyinfo { + uint64_t gpa; + size_t len; + void *hva; + void *cookie; +}; + +/* + * Set up 'copyinfo[]' to copy to/from guest linear address space starting + * at 'gla' and 'len' bytes long. The 'prot' should be set to PROT_READ for + * a copyin or PROT_WRITE for a copyout. + * + * retval is_fault Interpretation + * 0 0 Success + * 0 1 An exception was injected into the guest + * EFAULT N/A Unrecoverable error + * + * The 'copyinfo[]' can be passed to 'vm_copyin()' or 'vm_copyout()' only if + * the return value is 0. The 'copyinfo[]' resources should be freed by calling + * 'vm_copy_teardown()' after the copy is done. + */ +int vm_copy_setup(struct vm *vm, int vcpuid, struct vm_guest_paging *paging, + uint64_t gla, size_t len, int prot, struct vm_copyinfo *copyinfo, + int num_copyinfo, int *is_fault); +void vm_copy_teardown(struct vm *vm, int vcpuid, struct vm_copyinfo *copyinfo, + int num_copyinfo); +void vm_copyin(struct vm *vm, int vcpuid, struct vm_copyinfo *copyinfo, + void *kaddr, size_t len); +void vm_copyout(struct vm *vm, int vcpuid, const void *kaddr, + struct vm_copyinfo *copyinfo, size_t len); + +int vcpu_trace_exceptions(struct vm *vm, int vcpuid); +#endif /* KERNEL */ + +#define VM_MAXCPU 16 /* maximum virtual cpus */ + +/* + * Identifiers for optional vmm capabilities + */ +enum vm_cap_type { + VM_CAP_HALT_EXIT, + VM_CAP_MTRAP_EXIT, + VM_CAP_PAUSE_EXIT, + VM_CAP_UNRESTRICTED_GUEST, + VM_CAP_ENABLE_INVPCID, + VM_CAP_MAX +}; + +enum vm_intr_trigger { + EDGE_TRIGGER, + LEVEL_TRIGGER +}; + +/* Operations supported on VM_IO_REG_HANDLER ioctl. */ +enum vm_io_regh_type { + VM_IO_REGH_DELETE, + VM_IO_REGH_KWEVENTS, /* kernel wait events */ + VM_IO_REGH_MAX +}; + +/* + * The 'access' field has the format specified in Table 21-2 of the Intel + * Architecture Manual vol 3b. + * + * XXX The contents of the 'access' field are architecturally defined except + * bit 16 - Segment Unusable. + */ +struct seg_desc { + uint64_t base; + uint32_t limit; + uint32_t access; +}; +#define SEG_DESC_TYPE(access) ((access) & 0x001f) +#define SEG_DESC_DPL(access) (((access) >> 5) & 0x3) +#define SEG_DESC_PRESENT(access) (((access) & 0x0080) ? 1 : 0) +#define SEG_DESC_DEF32(access) (((access) & 0x4000) ? 1 : 0) +#define SEG_DESC_GRANULARITY(access) (((access) & 0x8000) ? 1 : 0) +#define SEG_DESC_UNUSABLE(access) (((access) & 0x10000) ? 1 : 0) + +enum vm_cpu_mode { + CPU_MODE_REAL, + CPU_MODE_PROTECTED, + CPU_MODE_COMPATIBILITY, /* IA-32E mode (CS.L = 0) */ + CPU_MODE_64BIT, /* IA-32E mode (CS.L = 1) */ +}; + +enum vm_paging_mode { + PAGING_MODE_FLAT, + PAGING_MODE_32, + PAGING_MODE_PAE, + PAGING_MODE_64, +}; + +struct vm_guest_paging { + uint64_t cr3; + int cpl; + enum vm_cpu_mode cpu_mode; + enum vm_paging_mode paging_mode; +}; + +/* + * The data structures 'vie' and 'vie_op' are meant to be opaque to the + * consumers of instruction decoding. The only reason why their contents + * need to be exposed is because they are part of the 'vm_exit' structure. + */ +struct vie_op { + uint8_t op_byte; /* actual opcode byte */ + uint8_t op_type; /* type of operation (e.g. 
MOV) */ + uint16_t op_flags; +}; + +#define VIE_INST_SIZE 15 +struct vie { + uint8_t inst[VIE_INST_SIZE]; /* instruction bytes */ + uint8_t num_valid; /* size of the instruction */ + uint8_t num_processed; + + uint8_t addrsize:4, opsize:4; /* address and operand sizes */ + uint8_t rex_w:1, /* REX prefix */ + rex_r:1, + rex_x:1, + rex_b:1, + rex_present:1, + repz_present:1, /* REP/REPE/REPZ prefix */ + repnz_present:1, /* REPNE/REPNZ prefix */ + opsize_override:1, /* Operand size override */ + addrsize_override:1, /* Address size override */ + segment_override:1; /* Segment override */ + + uint8_t mod:2, /* ModRM byte */ + reg:4, + rm:4; + + uint8_t ss:2, /* SIB byte */ + index:4, + base:4; + + uint8_t disp_bytes; + uint8_t imm_bytes; + + uint8_t scale; + int base_register; /* VM_REG_GUEST_xyz */ + int index_register; /* VM_REG_GUEST_xyz */ + int segment_register; /* VM_REG_GUEST_xyz */ + + int64_t displacement; /* optional addr displacement */ + int64_t immediate; /* optional immediate operand */ + + uint8_t decoded; /* set to 1 if successfully decoded */ + + struct vie_op op; /* opcode description */ +}; + +enum vm_exitcode { + VM_EXITCODE_INOUT, + VM_EXITCODE_VMX, + VM_EXITCODE_BOGUS, + VM_EXITCODE_RDMSR, + VM_EXITCODE_WRMSR, + VM_EXITCODE_HLT, + VM_EXITCODE_MTRAP, + VM_EXITCODE_PAUSE, + VM_EXITCODE_PAGING, + VM_EXITCODE_INST_EMUL, + VM_EXITCODE_SPINUP_AP, + VM_EXITCODE_DEPRECATED1, /* used to be SPINDOWN_CPU */ + VM_EXITCODE_RENDEZVOUS, + VM_EXITCODE_IOAPIC_EOI, + VM_EXITCODE_SUSPENDED, + VM_EXITCODE_INOUT_STR, + VM_EXITCODE_TASK_SWITCH, + VM_EXITCODE_MONITOR, + VM_EXITCODE_MWAIT, + VM_EXITCODE_SVM, + VM_EXITCODE_REQIDLE, + VM_EXITCODE_MAX +}; + +struct vm_inout { + uint16_t bytes:3; /* 1 or 2 or 4 */ + uint16_t in:1; + uint16_t string:1; + uint16_t rep:1; + uint16_t port; + uint32_t eax; /* valid for out */ +}; + +struct vm_inout_str { + struct vm_inout inout; /* must be the first element */ + struct vm_guest_paging paging; + uint64_t rflags; + uint64_t cr0; + uint64_t index; + uint64_t count; /* rep=1 (%rcx), rep=0 (1) */ + int addrsize; + enum vm_reg_name seg_name; + struct seg_desc seg_desc; +}; + +enum task_switch_reason { + TSR_CALL, + TSR_IRET, + TSR_JMP, + TSR_IDT_GATE, /* task gate in IDT */ +}; + +struct vm_task_switch { + uint16_t tsssel; /* new TSS selector */ + int ext; /* task switch due to external event */ + uint32_t errcode; + int errcode_valid; /* push 'errcode' on the new stack */ + enum task_switch_reason reason; + struct vm_guest_paging paging; +}; + +struct vm_exit { + enum vm_exitcode exitcode; + int inst_length; /* 0 means unknown */ + uint64_t rip; + union { + struct vm_inout inout; + struct vm_inout_str inout_str; + struct { + uint64_t gpa; + int fault_type; + } paging; + struct { + uint64_t gpa; + uint64_t gla; + uint64_t cs_base; + int cs_d; /* CS.D */ + struct vm_guest_paging paging; + struct vie vie; + } inst_emul; + /* + * VMX specific payload. Used when there is no "better" + * exitcode to represent the VM-exit. + */ + struct { + int status; /* vmx inst status */ + /* + * 'exit_reason' and 'exit_qualification' are valid + * only if 'status' is zero. + */ + uint32_t exit_reason; + uint64_t exit_qualification; + /* + * 'inst_error' and 'inst_type' are valid + * only if 'status' is non-zero. + */ + int inst_type; + int inst_error; + } vmx; + /* + * SVM specific payload. 
+ */ + struct { + uint64_t exitcode; + uint64_t exitinfo1; + uint64_t exitinfo2; + } svm; + struct { + uint32_t code; /* ecx value */ + uint64_t wval; + } msr; + struct { + int vcpu; + uint64_t rip; + } spinup_ap; + struct { + uint64_t rflags; + } hlt; + struct { + int vector; + } ioapic_eoi; + struct { + enum vm_suspend_how how; + } suspended; + struct vm_task_switch task_switch; + } u; +}; + +/* APIs to inject faults into the guest */ +void vm_inject_fault(void *vm, int vcpuid, int vector, int errcode_valid, + int errcode); + +static __inline void +vm_inject_ud(void *vm, int vcpuid) +{ + vm_inject_fault(vm, vcpuid, IDT_UD, 0, 0); +} + +static __inline void +vm_inject_gp(void *vm, int vcpuid) +{ + vm_inject_fault(vm, vcpuid, IDT_GP, 1, 0); +} + +static __inline void +vm_inject_ac(void *vm, int vcpuid, int errcode) +{ + vm_inject_fault(vm, vcpuid, IDT_AC, 1, errcode); +} + +static __inline void +vm_inject_ss(void *vm, int vcpuid, int errcode) +{ + vm_inject_fault(vm, vcpuid, IDT_SS, 1, errcode); +} + +void vm_inject_pf(void *vm, int vcpuid, int error_code, uint64_t cr2); + +int vm_restart_instruction(void *vm, int vcpuid); + +#endif /* _VMM_H_ */
diff -u -r -N usr/src/sys/modules/netmap/machine/vmm_dev.h /usr/src/sys/modules/netmap/machine/vmm_dev.h --- usr/src/sys/modules/netmap/machine/vmm_dev.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/vmm_dev.h 2016-11-30 10:56:05.786583000 +0000 @@ -0,0 +1,410 @@ +/*- + * Copyright (c) 2011 NetApp, Inc. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD: releng/11.0/sys/amd64/include/vmm_dev.h 298094 2016-04-16 03:44:50Z gjb $ + */ + +#ifndef _VMM_DEV_H_ +#define _VMM_DEV_H_ + +#ifdef _KERNEL +void vmmdev_init(void); +int vmmdev_cleanup(void); +#endif + +struct vm_memmap { + vm_paddr_t gpa; + int segid; /* memory segment */ + vm_ooffset_t segoff; /* offset into memory segment */ + size_t len; /* mmap length */ + int prot; /* RWX */ + int flags; +}; +#define VM_MEMMAP_F_WIRED 0x01 +#define VM_MEMMAP_F_IOMMU 0x02 + +#define VM_MEMSEG_NAME(m) ((m)->name[0] != '\0' ?
(m)->name : NULL) +struct vm_memseg { + int segid; + size_t len; + char name[SPECNAMELEN + 1]; +}; + +struct vm_register { + int cpuid; + int regnum; /* enum vm_reg_name */ + uint64_t regval; +}; + +struct vm_seg_desc { /* data or code segment */ + int cpuid; + int regnum; /* enum vm_reg_name */ + struct seg_desc desc; +}; + +struct vm_run { + int cpuid; + struct vm_exit vm_exit; +}; + +struct vm_exception { + int cpuid; + int vector; + uint32_t error_code; + int error_code_valid; + int restart_instruction; +}; + +struct vm_lapic_msi { + uint64_t msg; + uint64_t addr; +}; + +struct vm_lapic_irq { + int cpuid; + int vector; +}; + +struct vm_ioapic_irq { + int irq; +}; + +struct vm_isa_irq { + int atpic_irq; + int ioapic_irq; +}; + +struct vm_isa_irq_trigger { + int atpic_irq; + enum vm_intr_trigger trigger; +}; + +struct vm_capability { + int cpuid; + enum vm_cap_type captype; + int capval; + int allcpus; +}; + +struct vm_pptdev { + int bus; + int slot; + int func; +}; + +struct vm_pptdev_mmio { + int bus; + int slot; + int func; + vm_paddr_t gpa; + vm_paddr_t hpa; + size_t len; +}; + +/* Argument for VM_MAP_USER_BUF ioctl in vmmapi.c */ +struct vm_user_buf { + vm_paddr_t gpa; + void *addr; + size_t len; +}; + +/* Argument for VM_IO_REG_HANDLER ioctl in vmmapi.c */ +struct vm_io_reg_handler { + uint16_t port; /* I/O address */ + uint16_t in; /* 0 out, 1 in */ + uint32_t mask_data; /* 0 means match anything */ + uint32_t data; /* data to match */ + enum vm_io_regh_type type; /* handler type */ + void *arg; /* handler argument */ +}; + +struct vm_pptdev_msi { + int vcpu; + int bus; + int slot; + int func; + int numvec; /* 0 means disabled */ + uint64_t msg; + uint64_t addr; +}; + +struct vm_pptdev_msix { + int vcpu; + int bus; + int slot; + int func; + int idx; + uint64_t msg; + uint32_t vector_control; + uint64_t addr; +}; + +struct vm_nmi { + int cpuid; +}; + +#define MAX_VM_STATS 64 +struct vm_stats { + int cpuid; /* in */ + int num_entries; /* out */ + struct timeval tv; + uint64_t statbuf[MAX_VM_STATS]; +}; + +struct vm_stat_desc { + int index; /* in */ + char desc[128]; /* out */ +}; + +struct vm_x2apic { + int cpuid; + enum x2apic_state state; +}; + +struct vm_gpa_pte { + uint64_t gpa; /* in */ + uint64_t pte[4]; /* out */ + int ptenum; +}; + +struct vm_hpet_cap { + uint32_t capabilities; /* lower 32 bits of HPET capabilities */ +}; + +struct vm_suspend { + enum vm_suspend_how how; +}; + +struct vm_gla2gpa { + int vcpuid; /* inputs */ + int prot; /* PROT_READ or PROT_WRITE */ + uint64_t gla; + struct vm_guest_paging paging; + int fault; /* outputs */ + uint64_t gpa; +}; + +struct vm_activate_cpu { + int vcpuid; +}; + +struct vm_cpuset { + int which; + int cpusetsize; + cpuset_t *cpus; +}; +#define VM_ACTIVE_CPUS 0 +#define VM_SUSPENDED_CPUS 1 + +struct vm_intinfo { + int vcpuid; + uint64_t info1; + uint64_t info2; +}; + +struct vm_rtc_time { + time_t secs; +}; + +struct vm_rtc_data { + int offset; + uint8_t value; +}; + +enum { + /* general routines */ + IOCNUM_ABIVERS = 0, + IOCNUM_RUN = 1, + IOCNUM_SET_CAPABILITY = 2, + IOCNUM_GET_CAPABILITY = 3, + IOCNUM_SUSPEND = 4, + IOCNUM_REINIT = 5, + + /* memory apis */ + IOCNUM_MAP_MEMORY = 10, /* deprecated */ + IOCNUM_GET_MEMORY_SEG = 11, /* deprecated */ + IOCNUM_GET_GPA_PMAP = 12, + IOCNUM_GLA2GPA = 13, + IOCNUM_ALLOC_MEMSEG = 14, + IOCNUM_GET_MEMSEG = 15, + IOCNUM_MMAP_MEMSEG = 16, + IOCNUM_MMAP_GETNEXT = 17, + + /* register/state accessors */ + IOCNUM_SET_REGISTER = 20, + IOCNUM_GET_REGISTER = 21, + IOCNUM_SET_SEGMENT_DESCRIPTOR = 22, + 
IOCNUM_GET_SEGMENT_DESCRIPTOR = 23, + + /* interrupt injection */ + IOCNUM_GET_INTINFO = 28, + IOCNUM_SET_INTINFO = 29, + IOCNUM_INJECT_EXCEPTION = 30, + IOCNUM_LAPIC_IRQ = 31, + IOCNUM_INJECT_NMI = 32, + IOCNUM_IOAPIC_ASSERT_IRQ = 33, + IOCNUM_IOAPIC_DEASSERT_IRQ = 34, + IOCNUM_IOAPIC_PULSE_IRQ = 35, + IOCNUM_LAPIC_MSI = 36, + IOCNUM_LAPIC_LOCAL_IRQ = 37, + IOCNUM_IOAPIC_PINCOUNT = 38, + IOCNUM_RESTART_INSTRUCTION = 39, + + /* PCI pass-thru */ + IOCNUM_BIND_PPTDEV = 40, + IOCNUM_UNBIND_PPTDEV = 41, + IOCNUM_MAP_PPTDEV_MMIO = 42, + IOCNUM_PPTDEV_MSI = 43, + IOCNUM_PPTDEV_MSIX = 44, + + /* statistics */ + IOCNUM_VM_STATS = 50, + IOCNUM_VM_STAT_DESC = 51, + + /* kernel device state */ + IOCNUM_SET_X2APIC_STATE = 60, + IOCNUM_GET_X2APIC_STATE = 61, + IOCNUM_GET_HPET_CAPABILITIES = 62, + + /* legacy interrupt injection */ + IOCNUM_ISA_ASSERT_IRQ = 80, + IOCNUM_ISA_DEASSERT_IRQ = 81, + IOCNUM_ISA_PULSE_IRQ = 82, + IOCNUM_ISA_SET_IRQ_TRIGGER = 83, + + /* vm_cpuset */ + IOCNUM_ACTIVATE_CPU = 90, + IOCNUM_GET_CPUSET = 91, + + /* RTC */ + IOCNUM_RTC_READ = 100, + IOCNUM_RTC_WRITE = 101, + IOCNUM_RTC_SETTIME = 102, + IOCNUM_RTC_GETTIME = 103, + + /* host mmap and IO handler */ + IOCNUM_MAP_USER_BUF = 104, + IOCNUM_IO_REG_HANDLER = 105, +}; + +#define VM_RUN \ + _IOWR('v', IOCNUM_RUN, struct vm_run) +#define VM_SUSPEND \ + _IOW('v', IOCNUM_SUSPEND, struct vm_suspend) +#define VM_REINIT \ + _IO('v', IOCNUM_REINIT) +#define VM_ALLOC_MEMSEG \ + _IOW('v', IOCNUM_ALLOC_MEMSEG, struct vm_memseg) +#define VM_GET_MEMSEG \ + _IOWR('v', IOCNUM_GET_MEMSEG, struct vm_memseg) +#define VM_MMAP_MEMSEG \ + _IOW('v', IOCNUM_MMAP_MEMSEG, struct vm_memmap) +#define VM_MMAP_GETNEXT \ + _IOWR('v', IOCNUM_MMAP_GETNEXT, struct vm_memmap) +#define VM_SET_REGISTER \ + _IOW('v', IOCNUM_SET_REGISTER, struct vm_register) +#define VM_GET_REGISTER \ + _IOWR('v', IOCNUM_GET_REGISTER, struct vm_register) +#define VM_SET_SEGMENT_DESCRIPTOR \ + _IOW('v', IOCNUM_SET_SEGMENT_DESCRIPTOR, struct vm_seg_desc) +#define VM_GET_SEGMENT_DESCRIPTOR \ + _IOWR('v', IOCNUM_GET_SEGMENT_DESCRIPTOR, struct vm_seg_desc) +#define VM_INJECT_EXCEPTION \ + _IOW('v', IOCNUM_INJECT_EXCEPTION, struct vm_exception) +#define VM_LAPIC_IRQ \ + _IOW('v', IOCNUM_LAPIC_IRQ, struct vm_lapic_irq) +#define VM_LAPIC_LOCAL_IRQ \ + _IOW('v', IOCNUM_LAPIC_LOCAL_IRQ, struct vm_lapic_irq) +#define VM_LAPIC_MSI \ + _IOW('v', IOCNUM_LAPIC_MSI, struct vm_lapic_msi) +#define VM_IOAPIC_ASSERT_IRQ \ + _IOW('v', IOCNUM_IOAPIC_ASSERT_IRQ, struct vm_ioapic_irq) +#define VM_IOAPIC_DEASSERT_IRQ \ + _IOW('v', IOCNUM_IOAPIC_DEASSERT_IRQ, struct vm_ioapic_irq) +#define VM_IOAPIC_PULSE_IRQ \ + _IOW('v', IOCNUM_IOAPIC_PULSE_IRQ, struct vm_ioapic_irq) +#define VM_IOAPIC_PINCOUNT \ + _IOR('v', IOCNUM_IOAPIC_PINCOUNT, int) +#define VM_ISA_ASSERT_IRQ \ + _IOW('v', IOCNUM_ISA_ASSERT_IRQ, struct vm_isa_irq) +#define VM_ISA_DEASSERT_IRQ \ + _IOW('v', IOCNUM_ISA_DEASSERT_IRQ, struct vm_isa_irq) +#define VM_ISA_PULSE_IRQ \ + _IOW('v', IOCNUM_ISA_PULSE_IRQ, struct vm_isa_irq) +#define VM_ISA_SET_IRQ_TRIGGER \ + _IOW('v', IOCNUM_ISA_SET_IRQ_TRIGGER, struct vm_isa_irq_trigger) +#define VM_SET_CAPABILITY \ + _IOW('v', IOCNUM_SET_CAPABILITY, struct vm_capability) +#define VM_GET_CAPABILITY \ + _IOWR('v', IOCNUM_GET_CAPABILITY, struct vm_capability) +#define VM_BIND_PPTDEV \ + _IOW('v', IOCNUM_BIND_PPTDEV, struct vm_pptdev) +#define VM_UNBIND_PPTDEV \ + _IOW('v', IOCNUM_UNBIND_PPTDEV, struct vm_pptdev) +#define VM_MAP_PPTDEV_MMIO \ + _IOW('v', IOCNUM_MAP_PPTDEV_MMIO, struct vm_pptdev_mmio) +#define 
VM_MAP_USER_BUF \ + _IOW('v', IOCNUM_MAP_USER_BUF, struct vm_user_buf) +#define VM_IO_REG_HANDLER \ + _IOW('v', IOCNUM_IO_REG_HANDLER, struct vm_io_reg_handler) +#define VM_PPTDEV_MSI \ + _IOW('v', IOCNUM_PPTDEV_MSI, struct vm_pptdev_msi) +#define VM_PPTDEV_MSIX \ + _IOW('v', IOCNUM_PPTDEV_MSIX, struct vm_pptdev_msix) +#define VM_INJECT_NMI \ + _IOW('v', IOCNUM_INJECT_NMI, struct vm_nmi) +#define VM_STATS \ + _IOWR('v', IOCNUM_VM_STATS, struct vm_stats) +#define VM_STAT_DESC \ + _IOWR('v', IOCNUM_VM_STAT_DESC, struct vm_stat_desc) +#define VM_SET_X2APIC_STATE \ + _IOW('v', IOCNUM_SET_X2APIC_STATE, struct vm_x2apic) +#define VM_GET_X2APIC_STATE \ + _IOWR('v', IOCNUM_GET_X2APIC_STATE, struct vm_x2apic) +#define VM_GET_HPET_CAPABILITIES \ + _IOR('v', IOCNUM_GET_HPET_CAPABILITIES, struct vm_hpet_cap) +#define VM_GET_GPA_PMAP \ + _IOWR('v', IOCNUM_GET_GPA_PMAP, struct vm_gpa_pte) +#define VM_GLA2GPA \ + _IOWR('v', IOCNUM_GLA2GPA, struct vm_gla2gpa) +#define VM_ACTIVATE_CPU \ + _IOW('v', IOCNUM_ACTIVATE_CPU, struct vm_activate_cpu) +#define VM_GET_CPUS \ + _IOW('v', IOCNUM_GET_CPUSET, struct vm_cpuset) +#define VM_SET_INTINFO \ + _IOW('v', IOCNUM_SET_INTINFO, struct vm_intinfo) +#define VM_GET_INTINFO \ + _IOWR('v', IOCNUM_GET_INTINFO, struct vm_intinfo) +#define VM_RTC_WRITE \ + _IOW('v', IOCNUM_RTC_WRITE, struct vm_rtc_data) +#define VM_RTC_READ \ + _IOWR('v', IOCNUM_RTC_READ, struct vm_rtc_data) +#define VM_RTC_SETTIME \ + _IOW('v', IOCNUM_RTC_SETTIME, struct vm_rtc_time) +#define VM_RTC_GETTIME \ + _IOR('v', IOCNUM_RTC_GETTIME, struct vm_rtc_time) +#define VM_RESTART_INSTRUCTION \ + _IOW('v', IOCNUM_RESTART_INSTRUCTION, int) +#endif
diff -u -r -N usr/src/sys/modules/netmap/machine/vmm_instruction_emul.h /usr/src/sys/modules/netmap/machine/vmm_instruction_emul.h --- usr/src/sys/modules/netmap/machine/vmm_instruction_emul.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/vmm_instruction_emul.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,116 @@ +/*- + * Copyright (c) 2012 NetApp, Inc. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD: releng/11.0/sys/amd64/include/vmm_instruction_emul.h 298094 2016-04-16 03:44:50Z gjb $ + */ + +#ifndef _VMM_INSTRUCTION_EMUL_H_ +#define _VMM_INSTRUCTION_EMUL_H_ + +#include <sys/mman.h> + +/* + * Callback functions to read and write memory regions.
+ */ +typedef int (*mem_region_read_t)(void *vm, int cpuid, uint64_t gpa, + uint64_t *rval, int rsize, void *arg); + +typedef int (*mem_region_write_t)(void *vm, int cpuid, uint64_t gpa, + uint64_t wval, int wsize, void *arg); + +/* + * Emulate the decoded 'vie' instruction. + * + * The callbacks 'mrr' and 'mrw' emulate reads and writes to the memory region + * containing 'gpa'. 'mrarg' is an opaque argument that is passed into the + * callback functions. + * + * 'void *vm' should be 'struct vm *' when called from kernel context and + * 'struct vmctx *' when called from user context. + * s + */ +int vmm_emulate_instruction(void *vm, int cpuid, uint64_t gpa, struct vie *vie, + struct vm_guest_paging *paging, mem_region_read_t mrr, + mem_region_write_t mrw, void *mrarg); + +int vie_update_register(void *vm, int vcpuid, enum vm_reg_name reg, + uint64_t val, int size); + +/* + * Returns 1 if an alignment check exception should be injected and 0 otherwise. + */ +int vie_alignment_check(int cpl, int operand_size, uint64_t cr0, + uint64_t rflags, uint64_t gla); + +/* Returns 1 if the 'gla' is not canonical and 0 otherwise. */ +int vie_canonical_check(enum vm_cpu_mode cpu_mode, uint64_t gla); + +uint64_t vie_size2mask(int size); + +int vie_calculate_gla(enum vm_cpu_mode cpu_mode, enum vm_reg_name seg, + struct seg_desc *desc, uint64_t off, int length, int addrsize, int prot, + uint64_t *gla); + +#ifdef _KERNEL +/* + * APIs to fetch and decode the instruction from nested page fault handler. + * + * 'vie' must be initialized before calling 'vmm_fetch_instruction()' + */ +int vmm_fetch_instruction(struct vm *vm, int cpuid, + struct vm_guest_paging *guest_paging, + uint64_t rip, int inst_length, struct vie *vie, + int *is_fault); + +/* + * Translate the guest linear address 'gla' to a guest physical address. + * + * retval is_fault Interpretation + * 0 0 'gpa' contains result of the translation + * 0 1 An exception was injected into the guest + * EFAULT N/A An unrecoverable hypervisor error occurred + */ +int vm_gla2gpa(struct vm *vm, int vcpuid, struct vm_guest_paging *paging, + uint64_t gla, int prot, uint64_t *gpa, int *is_fault); + +void vie_init(struct vie *vie, const char *inst_bytes, int inst_length); + +/* + * Decode the instruction fetched into 'vie' so it can be emulated. + * + * 'gla' is the guest linear address provided by the hardware assist + * that caused the nested page table fault. It is used to verify that + * the software instruction decoding is in agreement with the hardware. + * + * Some hardware assists do not provide the 'gla' to the hypervisor. + * To skip the 'gla' verification for this or any other reason pass + * in VIE_INVALID_GLA instead. + */ +#define VIE_INVALID_GLA (1UL << 63) /* a non-canonical address */ +int vmm_decode_instruction(struct vm *vm, int cpuid, uint64_t gla, + enum vm_cpu_mode cpu_mode, int csd, struct vie *vie); +#endif /* _KERNEL */ + +#endif /* _VMM_INSTRUCTION_EMUL_H_ */ diff -u -r -N usr/src/sys/modules/netmap/machine/vmparam.h /usr/src/sys/modules/netmap/machine/vmparam.h --- usr/src/sys/modules/netmap/machine/vmparam.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/vmparam.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,227 @@ +/*- + * Copyright (c) 1990 The Regents of the University of California. + * All rights reserved. + * Copyright (c) 1994 John S. Dyson + * All rights reserved. + * Copyright (c) 2003 Peter Wemm + * All rights reserved. 
+ * + * This code is derived from software contributed to Berkeley by + * William Jolitz. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 3. All advertising materials mentioning features or use of this software + * must display the following acknowledgement: + * This product includes software developed by the University of + * California, Berkeley and its contributors. + * 4. Neither the name of the University nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * from: @(#)vmparam.h 5.9 (Berkeley) 5/12/91 + * $FreeBSD: releng/11.0/sys/amd64/include/vmparam.h 284147 2015-06-08 04:59:32Z alc $ + */ + + +#ifndef _MACHINE_VMPARAM_H_ +#define _MACHINE_VMPARAM_H_ 1 + +/* + * Machine dependent constants for AMD64. + */ + +/* + * Virtual memory related constants, all in bytes + */ +#define MAXTSIZ (128UL*1024*1024) /* max text size */ +#ifndef DFLDSIZ +#define DFLDSIZ (32768UL*1024*1024) /* initial data size limit */ +#endif +#ifndef MAXDSIZ +#define MAXDSIZ (32768UL*1024*1024) /* max data size */ +#endif +#ifndef DFLSSIZ +#define DFLSSIZ (8UL*1024*1024) /* initial stack size limit */ +#endif +#ifndef MAXSSIZ +#define MAXSSIZ (512UL*1024*1024) /* max stack size */ +#endif +#ifndef SGROWSIZ +#define SGROWSIZ (128UL*1024) /* amount to grow stack */ +#endif + +/* + * We provide a machine specific single page allocator through the use + * of the direct mapped segment. This uses 2MB pages for reduced + * TLB pressure. + */ +#define UMA_MD_SMALL_ALLOC + +/* + * The physical address space is densely populated. + */ +#define VM_PHYSSEG_DENSE + +/* + * The number of PHYSSEG entries must be one greater than the number + * of phys_avail entries because the phys_avail entry that spans the + * largest physical address that is accessible by ISA DMA is split + * into two PHYSSEG entries. + */ +#define VM_PHYSSEG_MAX 63 + +/* + * Create two free page pools: VM_FREEPOOL_DEFAULT is the default pool + * from which physical pages are allocated and VM_FREEPOOL_DIRECT is + * the pool from which physical pages for page tables and small UMA + * objects are allocated. 
+ */ +#define VM_NFREEPOOL 2 +#define VM_FREEPOOL_DEFAULT 0 +#define VM_FREEPOOL_DIRECT 1 + +/* + * Create up to three free page lists: VM_FREELIST_DMA32 is for physical pages + * that have physical addresses below 4G but are not accessible by ISA DMA, + * and VM_FREELIST_ISADMA is for physical pages that are accessible by ISA + * DMA. + */ +#define VM_NFREELIST 3 +#define VM_FREELIST_DEFAULT 0 +#define VM_FREELIST_DMA32 1 +#define VM_FREELIST_ISADMA 2 + +/* + * Create the DMA32 free list only if the number of physical pages above + * physical address 4G is at least 16M, which amounts to 64GB of physical + * memory. + */ +#define VM_DMA32_NPAGES_THRESHOLD 16777216 + +/* + * An allocation size of 16MB is supported in order to optimize the + * use of the direct map by UMA. Specifically, a cache line contains + * at most 8 PDEs, collectively mapping 16MB of physical memory. By + * reducing the number of distinct 16MB "pages" that are used by UMA, + * the physical memory allocator reduces the likelihood of both 2MB + * page TLB misses and cache misses caused by 2MB page TLB misses. + */ +#define VM_NFREEORDER 13 + +/* + * Enable superpage reservations: 1 level. + */ +#ifndef VM_NRESERVLEVEL +#define VM_NRESERVLEVEL 1 +#endif + +/* + * Level 0 reservations consist of 512 pages. + */ +#ifndef VM_LEVEL_0_ORDER +#define VM_LEVEL_0_ORDER 9 +#endif + +#ifdef SMP +#define PA_LOCK_COUNT 256 +#endif + +/* + * Virtual addresses of things. Derived from the page directory and + * page table indexes from pmap.h for precision. + * + * 0x0000000000000000 - 0x00007fffffffffff user map + * 0x0000800000000000 - 0xffff7fffffffffff does not exist (hole) + * 0xffff800000000000 - 0xffff804020100fff recursive page table (512GB slot) + * 0xffff804020101000 - 0xfffff7ffffffffff unused + * 0xfffff80000000000 - 0xfffffbffffffffff 4TB direct map + * 0xfffffc0000000000 - 0xfffffdffffffffff unused + * 0xfffffe0000000000 - 0xffffffffffffffff 2TB kernel map + * + * Within the kernel map: + * + * 0xffffffff80000000 KERNBASE + */ + +#define VM_MIN_KERNEL_ADDRESS KVADDR(KPML4BASE, 0, 0, 0) +#define VM_MAX_KERNEL_ADDRESS KVADDR(KPML4BASE + NKPML4E - 1, \ + NPDPEPG-1, NPDEPG-1, NPTEPG-1) + +#define DMAP_MIN_ADDRESS KVADDR(DMPML4I, 0, 0, 0) +#define DMAP_MAX_ADDRESS KVADDR(DMPML4I + NDMPML4E, 0, 0, 0) + +#define KERNBASE KVADDR(KPML4I, KPDPI, 0, 0) + +#define UPT_MAX_ADDRESS KVADDR(PML4PML4I, PML4PML4I, PML4PML4I, PML4PML4I) +#define UPT_MIN_ADDRESS KVADDR(PML4PML4I, 0, 0, 0) + +#define VM_MAXUSER_ADDRESS UVADDR(NUPML4E, 0, 0, 0) + +#define SHAREDPAGE (VM_MAXUSER_ADDRESS - PAGE_SIZE) +#define USRSTACK SHAREDPAGE + +#define VM_MAX_ADDRESS UPT_MAX_ADDRESS +#define VM_MIN_ADDRESS (0) + +/* + * XXX Allowing dmaplimit == 0 is a temporary workaround for vt(4) efifb's + * early use of PHYS_TO_DMAP before the mapping is actually setup. This works + * because the result is not actually accessed until later, but the early + * vt fb startup needs to be reworked. + */ +#define PHYS_TO_DMAP(x) ({ \ + KASSERT(dmaplimit == 0 || (x) < dmaplimit, \ + ("physical address %#jx not covered by the DMAP", \ + (uintmax_t)x)); \ + (x) | DMAP_MIN_ADDRESS; }) + +#define DMAP_TO_PHYS(x) ({ \ + KASSERT((x) < (DMAP_MIN_ADDRESS + dmaplimit) && \ + (x) >= DMAP_MIN_ADDRESS, \ + ("virtual address %#jx not covered by the DMAP", \ + (uintmax_t)x)); \ + (x) & ~DMAP_MIN_ADDRESS; }) + +/* + * How many physical pages per kmem arena virtual page. 
+ */ +#ifndef VM_KMEM_SIZE_SCALE +#define VM_KMEM_SIZE_SCALE (1) +#endif + +/* + * Optional ceiling (in bytes) on the size of the kmem arena: 60% of the + * kernel map. + */ +#ifndef VM_KMEM_SIZE_MAX +#define VM_KMEM_SIZE_MAX ((VM_MAX_KERNEL_ADDRESS - \ + VM_MIN_KERNEL_ADDRESS + 1) * 3 / 5) +#endif + +/* initial pagein size of beginning of executable file */ +#ifndef VM_INITIAL_PAGEIN +#define VM_INITIAL_PAGEIN 16 +#endif + +#define ZERO_REGION_SIZE (2 * 1024 * 1024) /* 2MB */ + +#endif /* _MACHINE_VMPARAM_H_ */ diff -u -r -N usr/src/sys/modules/netmap/machine/xen/hypercall.h /usr/src/sys/modules/netmap/machine/xen/hypercall.h --- usr/src/sys/modules/netmap/machine/xen/hypercall.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/xen/hypercall.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,432 @@ +/****************************************************************************** + * hypercall.h + * + * FreeBSD-specific hypervisor handling. + * + * Copyright (c) 2002-2004, K A Fraser + * + * 64-bit updates: + * Benjamin Liu <benjamin.liu@intel.com> + * Jun Nakajima <jun.nakajima@intel.com> + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License version 2 + * as published by the Free Software Foundation; or, when distributed + * separately from the Linux kernel or incorporated into other + * software packages, subject to the following license: + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this source file (the "Software"), to deal in the Software without + * restriction, including without limitation the rights to use, copy, modify, + * merge, publish, distribute, sublicense, and/or sell copies of the Software, + * and to permit persons to whom the Software is furnished to do so, subject to + * the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. 
+ * + * $FreeBSD: releng/11.0/sys/amd64/include/xen/hypercall.h 289033 2015-10-08 16:39:43Z royger $ + */ + +#ifndef __MACHINE_XEN_HYPERCALL_H__ +#define __MACHINE_XEN_HYPERCALL_H__ + +#include <sys/systm.h> + +#ifndef __XEN_HYPERVISOR_H__ +# error "please don't include this file directly" +#endif + +extern char *hypercall_page; + +#define __STR(x) #x +#define STR(x) __STR(x) +#define ENOXENSYS 38 +#define CONFIG_XEN_COMPAT 0x030002 +#define __must_check + +#define HYPERCALL_STR(name) \ + "call hypercall_page + ("STR(__HYPERVISOR_##name)" * 32)" + +#define _hypercall0(type, name) \ +({ \ + type __res; \ + __asm__ volatile ( \ + HYPERCALL_STR(name) \ + : "=a" (__res) \ + : \ + : "memory" ); \ + __res; \ +}) + +#define _hypercall1(type, name, a1) \ +({ \ + type __res; \ + long __ign1; \ + __asm__ volatile ( \ + HYPERCALL_STR(name) \ + : "=a" (__res), "=D" (__ign1) \ + : "1" ((long)(a1)) \ + : "memory" ); \ + __res; \ +}) + +#define _hypercall2(type, name, a1, a2) \ +({ \ + type __res; \ + long __ign1, __ign2; \ + __asm__ volatile ( \ + HYPERCALL_STR(name) \ + : "=a" (__res), "=D" (__ign1), "=S" (__ign2) \ + : "1" ((long)(a1)), "2" ((long)(a2)) \ + : "memory" ); \ + __res; \ +}) + +#define _hypercall3(type, name, a1, a2, a3) \ +({ \ + type __res; \ + long __ign1, __ign2, __ign3; \ + __asm__ volatile ( \ + HYPERCALL_STR(name) \ + : "=a" (__res), "=D" (__ign1), "=S" (__ign2), \ + "=d" (__ign3) \ + : "1" ((long)(a1)), "2" ((long)(a2)), \ + "3" ((long)(a3)) \ + : "memory" ); \ + __res; \ +}) + +#define _hypercall4(type, name, a1, a2, a3, a4) \ +({ \ + type __res; \ + long __ign1, __ign2, __ign3; \ + register long __arg4 __asm__("r10") = (long)(a4); \ + __asm__ volatile ( \ + HYPERCALL_STR(name) \ + : "=a" (__res), "=D" (__ign1), "=S" (__ign2), \ + "=d" (__ign3), "+r" (__arg4) \ + : "1" ((long)(a1)), "2" ((long)(a2)), \ + "3" ((long)(a3)) \ + : "memory" ); \ + __res; \ +}) + +#define _hypercall5(type, name, a1, a2, a3, a4, a5) \ +({ \ + type __res; \ + long __ign1, __ign2, __ign3; \ + register long __arg4 __asm__("r10") = (long)(a4); \ + register long __arg5 __asm__("r8") = (long)(a5); \ + __asm__ volatile ( \ + HYPERCALL_STR(name) \ + : "=a" (__res), "=D" (__ign1), "=S" (__ign2), \ + "=d" (__ign3), "+r" (__arg4), "+r" (__arg5) \ + : "1" ((long)(a1)), "2" ((long)(a2)), \ + "3" ((long)(a3)) \ + : "memory" ); \ + __res; \ +}) + +static inline int +privcmd_hypercall(long op, long a1, long a2, long a3, long a4, long a5) +{ + int __res; + long __ign1, __ign2, __ign3; + register long __arg4 __asm__("r10") = (long)(a4); + register long __arg5 __asm__("r8") = (long)(a5); + long __call = (long)&hypercall_page + (op * 32); + + __asm__ volatile ( + "call *%[call]" + : "=a" (__res), "=D" (__ign1), "=S" (__ign2), + "=d" (__ign3), "+r" (__arg4), "+r" (__arg5) + : "1" ((long)(a1)), "2" ((long)(a2)), + "3" ((long)(a3)), [call] "a" (__call) + : "memory" ); + + return (__res); +} + +static inline int __must_check +HYPERVISOR_set_trap_table( + const trap_info_t *table) +{ + return _hypercall1(int, set_trap_table, table); +} + +static inline int __must_check +HYPERVISOR_mmu_update( + mmu_update_t *req, unsigned int count, unsigned int *success_count, + domid_t domid) +{ + return _hypercall4(int, mmu_update, req, count, success_count, domid); +} + +static inline int __must_check +HYPERVISOR_mmuext_op( + struct mmuext_op *op, unsigned int count, unsigned int *success_count, + domid_t domid) +{ + return _hypercall4(int, mmuext_op, op, count, success_count, domid); +} + +static inline int __must_check 
+HYPERVISOR_set_gdt( + unsigned long *frame_list, unsigned int entries) +{ + return _hypercall2(int, set_gdt, frame_list, entries); +} + +static inline int __must_check +HYPERVISOR_stack_switch( + unsigned long ss, unsigned long esp) +{ + return _hypercall2(int, stack_switch, ss, esp); +} + +static inline int __must_check +HYPERVISOR_set_callbacks( + unsigned long event_address, unsigned long failsafe_address, + unsigned long syscall_address) +{ + return _hypercall3(int, set_callbacks, + event_address, failsafe_address, syscall_address); +} + +static inline int +HYPERVISOR_fpu_taskswitch( + int set) +{ + return _hypercall1(int, fpu_taskswitch, set); +} + +static inline int __must_check +HYPERVISOR_sched_op_compat( + int cmd, unsigned long arg) +{ + return _hypercall2(int, sched_op_compat, cmd, arg); +} + +static inline int __must_check +HYPERVISOR_sched_op( + int cmd, void *arg) +{ + return _hypercall2(int, sched_op, cmd, arg); +} + +static inline long __must_check +HYPERVISOR_set_timer_op( + uint64_t timeout) +{ + return _hypercall1(long, set_timer_op, timeout); +} + +static inline int __must_check +HYPERVISOR_platform_op( + struct xen_platform_op *platform_op) +{ + platform_op->interface_version = XENPF_INTERFACE_VERSION; + return _hypercall1(int, platform_op, platform_op); +} + +static inline int __must_check +HYPERVISOR_set_debugreg( + unsigned int reg, unsigned long value) +{ + return _hypercall2(int, set_debugreg, reg, value); +} + +static inline unsigned long __must_check +HYPERVISOR_get_debugreg( + unsigned int reg) +{ + return _hypercall1(unsigned long, get_debugreg, reg); +} + +static inline int __must_check +HYPERVISOR_update_descriptor( + unsigned long ma, unsigned long word) +{ + return _hypercall2(int, update_descriptor, ma, word); +} + +static inline int __must_check +HYPERVISOR_memory_op( + unsigned int cmd, void *arg) +{ + return _hypercall2(int, memory_op, cmd, arg); +} + +static inline int __must_check +HYPERVISOR_multicall( + multicall_entry_t *call_list, unsigned int nr_calls) +{ + return _hypercall2(int, multicall, call_list, nr_calls); +} + +static inline int __must_check +HYPERVISOR_update_va_mapping( + unsigned long va, uint64_t new_val, unsigned long flags) +{ + return _hypercall3(int, update_va_mapping, va, new_val, flags); +} + +static inline int __must_check +HYPERVISOR_event_channel_op( + int cmd, void *arg) +{ + int rc = _hypercall2(int, event_channel_op, cmd, arg); + +#if CONFIG_XEN_COMPAT <= 0x030002 + if (__predict_false(rc == -ENOXENSYS)) { + struct evtchn_op op; + op.cmd = cmd; + memcpy(&op.u, arg, sizeof(op.u)); + rc = _hypercall1(int, event_channel_op_compat, &op); + memcpy(arg, &op.u, sizeof(op.u)); + } +#endif + + return rc; +} + +static inline int __must_check +HYPERVISOR_xen_version( + int cmd, void *arg) +{ + return _hypercall2(int, xen_version, cmd, arg); +} + +static inline int __must_check +HYPERVISOR_console_io( + int cmd, unsigned int count, const char *str) +{ + return _hypercall3(int, console_io, cmd, count, str); +} + +static inline int __must_check +HYPERVISOR_physdev_op( + int cmd, void *arg) +{ + int rc = _hypercall2(int, physdev_op, cmd, arg); + +#if CONFIG_XEN_COMPAT <= 0x030002 + if (__predict_false(rc == -ENOXENSYS)) { + struct physdev_op op; + op.cmd = cmd; + memcpy(&op.u, arg, sizeof(op.u)); + rc = _hypercall1(int, physdev_op_compat, &op); + memcpy(arg, &op.u, sizeof(op.u)); + } +#endif + + return rc; +} + +static inline int __must_check +HYPERVISOR_grant_table_op( + unsigned int cmd, void *uop, unsigned int count) +{ + return 
_hypercall3(int, grant_table_op, cmd, uop, count); +} + +static inline int __must_check +HYPERVISOR_update_va_mapping_otherdomain( + unsigned long va, uint64_t new_val, unsigned long flags, domid_t domid) +{ + return _hypercall4(int, update_va_mapping_otherdomain, va, + new_val, flags, domid); +} + +static inline int __must_check +HYPERVISOR_vm_assist( + unsigned int cmd, unsigned int type) +{ + return _hypercall2(int, vm_assist, cmd, type); +} + +static inline int __must_check +HYPERVISOR_vcpu_op( + int cmd, unsigned int vcpuid, void *extra_args) +{ + return _hypercall3(int, vcpu_op, cmd, vcpuid, extra_args); +} + +static inline int __must_check +HYPERVISOR_set_segment_base( + int reg, unsigned long value) +{ + return _hypercall2(int, set_segment_base, reg, value); +} + +static inline int __must_check +HYPERVISOR_suspend( + unsigned long srec) +{ + struct sched_shutdown sched_shutdown = { + .reason = SHUTDOWN_suspend + }; + + int rc = _hypercall3(int, sched_op, SCHEDOP_shutdown, + &sched_shutdown, srec); + +#if CONFIG_XEN_COMPAT <= 0x030002 + if (rc == -ENOXENSYS) + rc = _hypercall3(int, sched_op_compat, SCHEDOP_shutdown, + SHUTDOWN_suspend, srec); +#endif + + return rc; +} + +#if CONFIG_XEN_COMPAT <= 0x030002 +static inline int +HYPERVISOR_nmi_op( + unsigned long op, void *arg) +{ + return _hypercall2(int, nmi_op, op, arg); +} +#endif + +#ifndef CONFIG_XEN +static inline unsigned long __must_check +HYPERVISOR_hvm_op( + int op, void *arg) +{ + return _hypercall2(unsigned long, hvm_op, op, arg); +} +#endif + +static inline int __must_check +HYPERVISOR_callback_op( + int cmd, const void *arg) +{ + return _hypercall2(int, callback_op, cmd, arg); +} + +static inline int __must_check +HYPERVISOR_xenoprof_op( + int op, void *arg) +{ + return _hypercall2(int, xenoprof_op, op, arg); +} + +static inline int __must_check +HYPERVISOR_kexec_op( + unsigned long op, void *args) +{ + return _hypercall2(int, kexec_op, op, args); +} + +#undef __must_check + +#endif /* __MACHINE_XEN_HYPERCALL_H__ */ diff -u -r -N usr/src/sys/modules/netmap/machine/xen/synch_bitops.h /usr/src/sys/modules/netmap/machine/xen/synch_bitops.h --- usr/src/sys/modules/netmap/machine/xen/synch_bitops.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/xen/synch_bitops.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,129 @@ +#ifndef __XEN_SYNCH_BITOPS_H__ +#define __XEN_SYNCH_BITOPS_H__ + +/* + * Copyright 1992, Linus Torvalds. + * Heavily modified to provide guaranteed strong synchronisation + * when communicating with Xen or other guest OSes running on other CPUs. 
+ */ + + +#define ADDR (*(volatile long *) addr) + +static __inline__ void synch_set_bit(int nr, volatile void * addr) +{ + __asm__ __volatile__ ( + "lock btsl %1,%0" + : "=m" (ADDR) : "Ir" (nr) : "memory" ); +} + +static __inline__ void synch_clear_bit(int nr, volatile void * addr) +{ + __asm__ __volatile__ ( + "lock btrl %1,%0" + : "=m" (ADDR) : "Ir" (nr) : "memory" ); +} + +static __inline__ void synch_change_bit(int nr, volatile void * addr) +{ + __asm__ __volatile__ ( + "lock btcl %1,%0" + : "=m" (ADDR) : "Ir" (nr) : "memory" ); +} + +static __inline__ int synch_test_and_set_bit(int nr, volatile void * addr) +{ + int oldbit; + __asm__ __volatile__ ( + "lock btsl %2,%1\n\tsbbl %0,%0" + : "=r" (oldbit), "=m" (ADDR) : "Ir" (nr) : "memory"); + return oldbit; +} + +static __inline__ int synch_test_and_clear_bit(int nr, volatile void * addr) +{ + int oldbit; + __asm__ __volatile__ ( + "lock btrl %2,%1\n\tsbbl %0,%0" + : "=r" (oldbit), "=m" (ADDR) : "Ir" (nr) : "memory"); + return oldbit; +} + +static __inline__ int synch_test_and_change_bit(int nr, volatile void * addr) +{ + int oldbit; + + __asm__ __volatile__ ( + "lock btcl %2,%1\n\tsbbl %0,%0" + : "=r" (oldbit), "=m" (ADDR) : "Ir" (nr) : "memory"); + return oldbit; +} + +struct __synch_xchg_dummy { unsigned long a[100]; }; +#define __synch_xg(x) ((volatile struct __synch_xchg_dummy *)(x)) + +#define synch_cmpxchg(ptr, old, new) \ +((__typeof__(*(ptr)))__synch_cmpxchg((ptr),\ + (unsigned long)(old), \ + (unsigned long)(new), \ + sizeof(*(ptr)))) + +static inline unsigned long __synch_cmpxchg(volatile void *ptr, + unsigned long old, + unsigned long new, int size) +{ + unsigned long prev; + switch (size) { + case 1: + __asm__ __volatile__("lock; cmpxchgb %b1,%2" + : "=a"(prev) + : "q"(new), "m"(*__synch_xg(ptr)), + "0"(old) + : "memory"); + return prev; + case 2: + __asm__ __volatile__("lock; cmpxchgw %w1,%2" + : "=a"(prev) + : "q"(new), "m"(*__synch_xg(ptr)), + "0"(old) + : "memory"); + return prev; + case 4: + __asm__ __volatile__("lock; cmpxchgl %k1,%2" + : "=a"(prev) + : "q"(new), "m"(*__synch_xg(ptr)), + "0"(old) + : "memory"); + return prev; + case 8: + __asm__ __volatile__("lock; cmpxchgq %1,%2" + : "=a"(prev) + : "q"(new), "m"(*__synch_xg(ptr)), + "0"(old) + : "memory"); + return prev; + } + return old; +} + +static __inline__ int synch_const_test_bit(int nr, const volatile void * addr) +{ + return ((1UL << (nr & 31)) & + (((const volatile unsigned int *) addr)[nr >> 5])) != 0; +} + +static __inline__ int synch_var_test_bit(int nr, volatile void * addr) +{ + int oldbit; + __asm__ __volatile__ ( + "btl %2,%1\n\tsbbl %0,%0" + : "=r" (oldbit) : "m" (ADDR), "Ir" (nr) ); + return oldbit; +} + +#define synch_test_bit(nr,addr) \ +(__builtin_constant_p(nr) ? \ + synch_const_test_bit((nr),(addr)) : \ + synch_var_test_bit((nr),(addr))) + +#endif /* __XEN_SYNCH_BITOPS_H__ */ diff -u -r -N usr/src/sys/modules/netmap/machine/xen/xen-os.h /usr/src/sys/modules/netmap/machine/xen/xen-os.h --- usr/src/sys/modules/netmap/machine/xen/xen-os.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/machine/xen/xen-os.h 2016-09-29 00:24:54.000000000 +0100 @@ -0,0 +1,6 @@ +/*- + * This file is in the public domain. 
+ */ +/* $FreeBSD: releng/11.0/sys/amd64/include/xen/xen-os.h 289685 2015-10-21 10:04:35Z royger $ */ + +#include <x86/xen/xen-os.h> diff -u -r -N usr/src/sys/modules/netmap/opt_inet.h /usr/src/sys/modules/netmap/opt_inet.h --- usr/src/sys/modules/netmap/opt_inet.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/opt_inet.h 2016-11-23 17:05:33.907590000 +0000 @@ -0,0 +1,2 @@ +#define INET 1 +#define TCP_OFFLOAD 1 diff -u -r -N usr/src/sys/modules/netmap/opt_inet6.h /usr/src/sys/modules/netmap/opt_inet6.h --- usr/src/sys/modules/netmap/opt_inet6.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/opt_inet6.h 2016-11-23 17:05:33.909816000 +0000 @@ -0,0 +1 @@ +#define INET6 1 diff -u -r -N usr/src/sys/modules/netmap/pci_if.h /usr/src/sys/modules/netmap/pci_if.h --- usr/src/sys/modules/netmap/pci_if.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/pci_if.h 2016-11-23 17:05:33.897191000 +0000 @@ -0,0 +1,426 @@ +/* + * This file is produced automatically. + * Do not modify anything in here by hand. 
+ * + * Created from source file + * /usr/src/sys/dev/pci/pci_if.m + * with + * makeobjops.awk + * + * See the source file for legal information + */ + + +#ifndef _pci_if_h_ +#define _pci_if_h_ + + +struct nvlist; + +enum pci_id_type { + PCI_ID_RID, + PCI_ID_MSI, +}; + +/** @brief Unique descriptor for the PCI_READ_CONFIG() method */ +extern struct kobjop_desc pci_read_config_desc; +/** @brief A function implementing the PCI_READ_CONFIG() method */ +typedef u_int32_t pci_read_config_t(device_t dev, device_t child, int reg, + int width); + +static __inline u_int32_t PCI_READ_CONFIG(device_t dev, device_t child, int reg, + int width) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)dev)->ops,pci_read_config); + return ((pci_read_config_t *) _m)(dev, child, reg, width); +} + +/** @brief Unique descriptor for the PCI_WRITE_CONFIG() method */ +extern struct kobjop_desc pci_write_config_desc; +/** @brief A function implementing the PCI_WRITE_CONFIG() method */ +typedef void pci_write_config_t(device_t dev, device_t child, int reg, + u_int32_t val, int width); + +static __inline void PCI_WRITE_CONFIG(device_t dev, device_t child, int reg, + u_int32_t val, int width) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)dev)->ops,pci_write_config); + ((pci_write_config_t *) _m)(dev, child, reg, val, width); +} + +/** @brief Unique descriptor for the PCI_GET_POWERSTATE() method */ +extern struct kobjop_desc pci_get_powerstate_desc; +/** @brief A function implementing the PCI_GET_POWERSTATE() method */ +typedef int pci_get_powerstate_t(device_t dev, device_t child); + +static __inline int PCI_GET_POWERSTATE(device_t dev, device_t child) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)dev)->ops,pci_get_powerstate); + return ((pci_get_powerstate_t *) _m)(dev, child); +} + +/** @brief Unique descriptor for the PCI_SET_POWERSTATE() method */ +extern struct kobjop_desc pci_set_powerstate_desc; +/** @brief A function implementing the PCI_SET_POWERSTATE() method */ +typedef int pci_set_powerstate_t(device_t dev, device_t child, int state); + +static __inline int PCI_SET_POWERSTATE(device_t dev, device_t child, int state) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)dev)->ops,pci_set_powerstate); + return ((pci_set_powerstate_t *) _m)(dev, child, state); +} + +/** @brief Unique descriptor for the PCI_GET_VPD_IDENT() method */ +extern struct kobjop_desc pci_get_vpd_ident_desc; +/** @brief A function implementing the PCI_GET_VPD_IDENT() method */ +typedef int pci_get_vpd_ident_t(device_t dev, device_t child, + const char **identptr); + +static __inline int PCI_GET_VPD_IDENT(device_t dev, device_t child, + const char **identptr) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)dev)->ops,pci_get_vpd_ident); + return ((pci_get_vpd_ident_t *) _m)(dev, child, identptr); +} + +/** @brief Unique descriptor for the PCI_GET_VPD_READONLY() method */ +extern struct kobjop_desc pci_get_vpd_readonly_desc; +/** @brief A function implementing the PCI_GET_VPD_READONLY() method */ +typedef int pci_get_vpd_readonly_t(device_t dev, device_t child, const char *kw, + const char **vptr); + +static __inline int PCI_GET_VPD_READONLY(device_t dev, device_t child, + const char *kw, const char **vptr) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)dev)->ops,pci_get_vpd_readonly); + return ((pci_get_vpd_readonly_t *) _m)(dev, child, kw, vptr); +} + +/** @brief Unique descriptor for the PCI_ENABLE_BUSMASTER() method */ +extern struct kobjop_desc pci_enable_busmaster_desc; +/** @brief A function implementing the PCI_ENABLE_BUSMASTER() method */ +typedef int 
pci_enable_busmaster_t(device_t dev, device_t child); + +static __inline int PCI_ENABLE_BUSMASTER(device_t dev, device_t child) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)dev)->ops,pci_enable_busmaster); + return ((pci_enable_busmaster_t *) _m)(dev, child); +} + +/** @brief Unique descriptor for the PCI_DISABLE_BUSMASTER() method */ +extern struct kobjop_desc pci_disable_busmaster_desc; +/** @brief A function implementing the PCI_DISABLE_BUSMASTER() method */ +typedef int pci_disable_busmaster_t(device_t dev, device_t child); + +static __inline int PCI_DISABLE_BUSMASTER(device_t dev, device_t child) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)dev)->ops,pci_disable_busmaster); + return ((pci_disable_busmaster_t *) _m)(dev, child); +} + +/** @brief Unique descriptor for the PCI_ENABLE_IO() method */ +extern struct kobjop_desc pci_enable_io_desc; +/** @brief A function implementing the PCI_ENABLE_IO() method */ +typedef int pci_enable_io_t(device_t dev, device_t child, int space); + +static __inline int PCI_ENABLE_IO(device_t dev, device_t child, int space) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)dev)->ops,pci_enable_io); + return ((pci_enable_io_t *) _m)(dev, child, space); +} + +/** @brief Unique descriptor for the PCI_DISABLE_IO() method */ +extern struct kobjop_desc pci_disable_io_desc; +/** @brief A function implementing the PCI_DISABLE_IO() method */ +typedef int pci_disable_io_t(device_t dev, device_t child, int space); + +static __inline int PCI_DISABLE_IO(device_t dev, device_t child, int space) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)dev)->ops,pci_disable_io); + return ((pci_disable_io_t *) _m)(dev, child, space); +} + +/** @brief Unique descriptor for the PCI_ASSIGN_INTERRUPT() method */ +extern struct kobjop_desc pci_assign_interrupt_desc; +/** @brief A function implementing the PCI_ASSIGN_INTERRUPT() method */ +typedef int pci_assign_interrupt_t(device_t dev, device_t child); + +static __inline int PCI_ASSIGN_INTERRUPT(device_t dev, device_t child) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)dev)->ops,pci_assign_interrupt); + return ((pci_assign_interrupt_t *) _m)(dev, child); +} + +/** @brief Unique descriptor for the PCI_FIND_CAP() method */ +extern struct kobjop_desc pci_find_cap_desc; +/** @brief A function implementing the PCI_FIND_CAP() method */ +typedef int pci_find_cap_t(device_t dev, device_t child, int capability, + int *capreg); + +static __inline int PCI_FIND_CAP(device_t dev, device_t child, int capability, + int *capreg) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)dev)->ops,pci_find_cap); + return ((pci_find_cap_t *) _m)(dev, child, capability, capreg); +} + +/** @brief Unique descriptor for the PCI_FIND_EXTCAP() method */ +extern struct kobjop_desc pci_find_extcap_desc; +/** @brief A function implementing the PCI_FIND_EXTCAP() method */ +typedef int pci_find_extcap_t(device_t dev, device_t child, int capability, + int *capreg); + +static __inline int PCI_FIND_EXTCAP(device_t dev, device_t child, + int capability, int *capreg) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)dev)->ops,pci_find_extcap); + return ((pci_find_extcap_t *) _m)(dev, child, capability, capreg); +} + +/** @brief Unique descriptor for the PCI_FIND_HTCAP() method */ +extern struct kobjop_desc pci_find_htcap_desc; +/** @brief A function implementing the PCI_FIND_HTCAP() method */ +typedef int pci_find_htcap_t(device_t dev, device_t child, int capability, + int *capreg); + +static __inline int PCI_FIND_HTCAP(device_t dev, device_t child, int capability, + int *capreg) +{ + kobjop_t _m; + 
KOBJOPLOOKUP(((kobj_t)dev)->ops,pci_find_htcap); + return ((pci_find_htcap_t *) _m)(dev, child, capability, capreg); +} + +/** @brief Unique descriptor for the PCI_ALLOC_MSI() method */ +extern struct kobjop_desc pci_alloc_msi_desc; +/** @brief A function implementing the PCI_ALLOC_MSI() method */ +typedef int pci_alloc_msi_t(device_t dev, device_t child, int *count); + +static __inline int PCI_ALLOC_MSI(device_t dev, device_t child, int *count) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)dev)->ops,pci_alloc_msi); + return ((pci_alloc_msi_t *) _m)(dev, child, count); +} + +/** @brief Unique descriptor for the PCI_ALLOC_MSIX() method */ +extern struct kobjop_desc pci_alloc_msix_desc; +/** @brief A function implementing the PCI_ALLOC_MSIX() method */ +typedef int pci_alloc_msix_t(device_t dev, device_t child, int *count); + +static __inline int PCI_ALLOC_MSIX(device_t dev, device_t child, int *count) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)dev)->ops,pci_alloc_msix); + return ((pci_alloc_msix_t *) _m)(dev, child, count); +} + +/** @brief Unique descriptor for the PCI_ENABLE_MSI() method */ +extern struct kobjop_desc pci_enable_msi_desc; +/** @brief A function implementing the PCI_ENABLE_MSI() method */ +typedef void pci_enable_msi_t(device_t dev, device_t child, uint64_t address, + uint16_t data); + +static __inline void PCI_ENABLE_MSI(device_t dev, device_t child, + uint64_t address, uint16_t data) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)dev)->ops,pci_enable_msi); + ((pci_enable_msi_t *) _m)(dev, child, address, data); +} + +/** @brief Unique descriptor for the PCI_ENABLE_MSIX() method */ +extern struct kobjop_desc pci_enable_msix_desc; +/** @brief A function implementing the PCI_ENABLE_MSIX() method */ +typedef void pci_enable_msix_t(device_t dev, device_t child, u_int index, + uint64_t address, uint32_t data); + +static __inline void PCI_ENABLE_MSIX(device_t dev, device_t child, u_int index, + uint64_t address, uint32_t data) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)dev)->ops,pci_enable_msix); + ((pci_enable_msix_t *) _m)(dev, child, index, address, data); +} + +/** @brief Unique descriptor for the PCI_DISABLE_MSI() method */ +extern struct kobjop_desc pci_disable_msi_desc; +/** @brief A function implementing the PCI_DISABLE_MSI() method */ +typedef void pci_disable_msi_t(device_t dev, device_t child); + +static __inline void PCI_DISABLE_MSI(device_t dev, device_t child) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)dev)->ops,pci_disable_msi); + ((pci_disable_msi_t *) _m)(dev, child); +} + +/** @brief Unique descriptor for the PCI_REMAP_MSIX() method */ +extern struct kobjop_desc pci_remap_msix_desc; +/** @brief A function implementing the PCI_REMAP_MSIX() method */ +typedef int pci_remap_msix_t(device_t dev, device_t child, int count, + const u_int *vectors); + +static __inline int PCI_REMAP_MSIX(device_t dev, device_t child, int count, + const u_int *vectors) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)dev)->ops,pci_remap_msix); + return ((pci_remap_msix_t *) _m)(dev, child, count, vectors); +} + +/** @brief Unique descriptor for the PCI_RELEASE_MSI() method */ +extern struct kobjop_desc pci_release_msi_desc; +/** @brief A function implementing the PCI_RELEASE_MSI() method */ +typedef int pci_release_msi_t(device_t dev, device_t child); + +static __inline int PCI_RELEASE_MSI(device_t dev, device_t child) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)dev)->ops,pci_release_msi); + return ((pci_release_msi_t *) _m)(dev, child); +} + +/** @brief Unique descriptor for the PCI_MSI_COUNT() method */ 
+extern struct kobjop_desc pci_msi_count_desc; +/** @brief A function implementing the PCI_MSI_COUNT() method */ +typedef int pci_msi_count_t(device_t dev, device_t child); + +static __inline int PCI_MSI_COUNT(device_t dev, device_t child) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)dev)->ops,pci_msi_count); + return ((pci_msi_count_t *) _m)(dev, child); +} + +/** @brief Unique descriptor for the PCI_MSIX_COUNT() method */ +extern struct kobjop_desc pci_msix_count_desc; +/** @brief A function implementing the PCI_MSIX_COUNT() method */ +typedef int pci_msix_count_t(device_t dev, device_t child); + +static __inline int PCI_MSIX_COUNT(device_t dev, device_t child) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)dev)->ops,pci_msix_count); + return ((pci_msix_count_t *) _m)(dev, child); +} + +/** @brief Unique descriptor for the PCI_MSIX_PBA_BAR() method */ +extern struct kobjop_desc pci_msix_pba_bar_desc; +/** @brief A function implementing the PCI_MSIX_PBA_BAR() method */ +typedef int pci_msix_pba_bar_t(device_t dev, device_t child); + +static __inline int PCI_MSIX_PBA_BAR(device_t dev, device_t child) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)dev)->ops,pci_msix_pba_bar); + return ((pci_msix_pba_bar_t *) _m)(dev, child); +} + +/** @brief Unique descriptor for the PCI_MSIX_TABLE_BAR() method */ +extern struct kobjop_desc pci_msix_table_bar_desc; +/** @brief A function implementing the PCI_MSIX_TABLE_BAR() method */ +typedef int pci_msix_table_bar_t(device_t dev, device_t child); + +static __inline int PCI_MSIX_TABLE_BAR(device_t dev, device_t child) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)dev)->ops,pci_msix_table_bar); + return ((pci_msix_table_bar_t *) _m)(dev, child); +} + +/** @brief Unique descriptor for the PCI_GET_ID() method */ +extern struct kobjop_desc pci_get_id_desc; +/** @brief A function implementing the PCI_GET_ID() method */ +typedef int pci_get_id_t(device_t dev, device_t child, enum pci_id_type type, + uintptr_t *id); + +static __inline int PCI_GET_ID(device_t dev, device_t child, + enum pci_id_type type, uintptr_t *id) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)dev)->ops,pci_get_id); + return ((pci_get_id_t *) _m)(dev, child, type, id); +} + +/** @brief Unique descriptor for the PCI_ALLOC_DEVINFO() method */ +extern struct kobjop_desc pci_alloc_devinfo_desc; +/** @brief A function implementing the PCI_ALLOC_DEVINFO() method */ +typedef struct pci_devinfo * pci_alloc_devinfo_t(device_t dev); + +static __inline struct pci_devinfo * PCI_ALLOC_DEVINFO(device_t dev) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)dev)->ops,pci_alloc_devinfo); + return ((pci_alloc_devinfo_t *) _m)(dev); +} + +/** @brief Unique descriptor for the PCI_CHILD_ADDED() method */ +extern struct kobjop_desc pci_child_added_desc; +/** @brief A function implementing the PCI_CHILD_ADDED() method */ +typedef void pci_child_added_t(device_t dev, device_t child); + +static __inline void PCI_CHILD_ADDED(device_t dev, device_t child) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)dev)->ops,pci_child_added); + ((pci_child_added_t *) _m)(dev, child); +} + +/** @brief Unique descriptor for the PCI_IOV_ATTACH() method */ +extern struct kobjop_desc pci_iov_attach_desc; +/** @brief A function implementing the PCI_IOV_ATTACH() method */ +typedef int pci_iov_attach_t(device_t dev, device_t child, + struct nvlist *pf_schema, struct nvlist *vf_schema, + const char *name); + +static __inline int PCI_IOV_ATTACH(device_t dev, device_t child, + struct nvlist *pf_schema, + struct nvlist *vf_schema, const char *name) +{ + kobjop_t _m; + 
KOBJOPLOOKUP(((kobj_t)dev)->ops,pci_iov_attach); + return ((pci_iov_attach_t *) _m)(dev, child, pf_schema, vf_schema, name); +} + +/** @brief Unique descriptor for the PCI_IOV_DETACH() method */ +extern struct kobjop_desc pci_iov_detach_desc; +/** @brief A function implementing the PCI_IOV_DETACH() method */ +typedef int pci_iov_detach_t(device_t dev, device_t child); + +static __inline int PCI_IOV_DETACH(device_t dev, device_t child) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)dev)->ops,pci_iov_detach); + return ((pci_iov_detach_t *) _m)(dev, child); +} + +/** @brief Unique descriptor for the PCI_CREATE_IOV_CHILD() method */ +extern struct kobjop_desc pci_create_iov_child_desc; +/** @brief A function implementing the PCI_CREATE_IOV_CHILD() method */ +typedef device_t pci_create_iov_child_t(device_t bus, device_t pf, uint16_t rid, + uint16_t vid, uint16_t did); + +static __inline device_t PCI_CREATE_IOV_CHILD(device_t bus, device_t pf, + uint16_t rid, uint16_t vid, + uint16_t did) +{ + kobjop_t _m; + KOBJOPLOOKUP(((kobj_t)bus)->ops,pci_create_iov_child); + return ((pci_create_iov_child_t *) _m)(bus, pf, rid, vid, did); +} + +#endif /* _pci_if_h_ */ diff -u -r -N usr/src/sys/modules/netmap/x86/_align.h /usr/src/sys/modules/netmap/x86/_align.h --- usr/src/sys/modules/netmap/x86/_align.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/_align.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,52 @@ +/*- + * Copyright (c) 2001 David E. O'Brien + * Copyright (c) 1990 The Regents of the University of California. + * All rights reserved. + * + * This code is derived from software contributed to Berkeley by + * William Jolitz. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 3. All advertising materials mentioning features or use of this software + * must display the following acknowledgement: + * This product includes software developed by the University of + * California, Berkeley and its contributors. + * 4. Neither the name of the University nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
+ * + * from: @(#)param.h 5.8 (Berkeley) 6/28/91 + * $FreeBSD: releng/11.0/sys/x86/include/_align.h 301037 2016-05-31 13:31:19Z ed $ + */ + +#ifndef _X86_INCLUDE__ALIGN_H_ +#define _X86_INCLUDE__ALIGN_H_ + +/* + * Round p (pointer or byte index) up to a correctly-aligned value + * for all data types (int, long, ...). The result is unsigned int + * and must be cast to any desired pointer type. + */ +#define _ALIGNBYTES (sizeof(__register_t) - 1) +#define _ALIGN(p) (((__uintptr_t)(p) + _ALIGNBYTES) & ~_ALIGNBYTES) + +#endif /* !_X86_INCLUDE__ALIGN_H_ */ diff -u -r -N usr/src/sys/modules/netmap/x86/_inttypes.h /usr/src/sys/modules/netmap/x86/_inttypes.h --- usr/src/sys/modules/netmap/x86/_inttypes.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/_inttypes.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,221 @@ +/*- + * Copyright (c) 2001 The NetBSD Foundation, Inc. + * All rights reserved. + * + * This code is derived from software contributed to The NetBSD Foundation + * by Klaus Klein. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS + * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR + * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS + * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. + * + * From: $NetBSD: int_fmtio.h,v 1.2 2001/04/26 16:25:21 kleink Exp $ + * $FreeBSD: releng/11.0/sys/x86/include/_inttypes.h 217157 2011-01-08 18:09:48Z tijl $ + */ + +#ifndef _MACHINE_INTTYPES_H_ +#define _MACHINE_INTTYPES_H_ + +/* + * Macros for format specifiers. + */ + +#ifdef __LP64__ +#define __PRI64 "l" +#define __PRIptr "l" +#else +#define __PRI64 "ll" +#define __PRIptr +#endif + +/* fprintf(3) macros for signed integers. 
*/ + +#define PRId8 "d" /* int8_t */ +#define PRId16 "d" /* int16_t */ +#define PRId32 "d" /* int32_t */ +#define PRId64 __PRI64"d" /* int64_t */ +#define PRIdLEAST8 "d" /* int_least8_t */ +#define PRIdLEAST16 "d" /* int_least16_t */ +#define PRIdLEAST32 "d" /* int_least32_t */ +#define PRIdLEAST64 __PRI64"d" /* int_least64_t */ +#define PRIdFAST8 "d" /* int_fast8_t */ +#define PRIdFAST16 "d" /* int_fast16_t */ +#define PRIdFAST32 "d" /* int_fast32_t */ +#define PRIdFAST64 __PRI64"d" /* int_fast64_t */ +#define PRIdMAX "jd" /* intmax_t */ +#define PRIdPTR __PRIptr"d" /* intptr_t */ + +#define PRIi8 "i" /* int8_t */ +#define PRIi16 "i" /* int16_t */ +#define PRIi32 "i" /* int32_t */ +#define PRIi64 __PRI64"i" /* int64_t */ +#define PRIiLEAST8 "i" /* int_least8_t */ +#define PRIiLEAST16 "i" /* int_least16_t */ +#define PRIiLEAST32 "i" /* int_least32_t */ +#define PRIiLEAST64 __PRI64"i" /* int_least64_t */ +#define PRIiFAST8 "i" /* int_fast8_t */ +#define PRIiFAST16 "i" /* int_fast16_t */ +#define PRIiFAST32 "i" /* int_fast32_t */ +#define PRIiFAST64 __PRI64"i" /* int_fast64_t */ +#define PRIiMAX "ji" /* intmax_t */ +#define PRIiPTR __PRIptr"i" /* intptr_t */ + +/* fprintf(3) macros for unsigned integers. */ + +#define PRIo8 "o" /* uint8_t */ +#define PRIo16 "o" /* uint16_t */ +#define PRIo32 "o" /* uint32_t */ +#define PRIo64 __PRI64"o" /* uint64_t */ +#define PRIoLEAST8 "o" /* uint_least8_t */ +#define PRIoLEAST16 "o" /* uint_least16_t */ +#define PRIoLEAST32 "o" /* uint_least32_t */ +#define PRIoLEAST64 __PRI64"o" /* uint_least64_t */ +#define PRIoFAST8 "o" /* uint_fast8_t */ +#define PRIoFAST16 "o" /* uint_fast16_t */ +#define PRIoFAST32 "o" /* uint_fast32_t */ +#define PRIoFAST64 __PRI64"o" /* uint_fast64_t */ +#define PRIoMAX "jo" /* uintmax_t */ +#define PRIoPTR __PRIptr"o" /* uintptr_t */ + +#define PRIu8 "u" /* uint8_t */ +#define PRIu16 "u" /* uint16_t */ +#define PRIu32 "u" /* uint32_t */ +#define PRIu64 __PRI64"u" /* uint64_t */ +#define PRIuLEAST8 "u" /* uint_least8_t */ +#define PRIuLEAST16 "u" /* uint_least16_t */ +#define PRIuLEAST32 "u" /* uint_least32_t */ +#define PRIuLEAST64 __PRI64"u" /* uint_least64_t */ +#define PRIuFAST8 "u" /* uint_fast8_t */ +#define PRIuFAST16 "u" /* uint_fast16_t */ +#define PRIuFAST32 "u" /* uint_fast32_t */ +#define PRIuFAST64 __PRI64"u" /* uint_fast64_t */ +#define PRIuMAX "ju" /* uintmax_t */ +#define PRIuPTR __PRIptr"u" /* uintptr_t */ + +#define PRIx8 "x" /* uint8_t */ +#define PRIx16 "x" /* uint16_t */ +#define PRIx32 "x" /* uint32_t */ +#define PRIx64 __PRI64"x" /* uint64_t */ +#define PRIxLEAST8 "x" /* uint_least8_t */ +#define PRIxLEAST16 "x" /* uint_least16_t */ +#define PRIxLEAST32 "x" /* uint_least32_t */ +#define PRIxLEAST64 __PRI64"x" /* uint_least64_t */ +#define PRIxFAST8 "x" /* uint_fast8_t */ +#define PRIxFAST16 "x" /* uint_fast16_t */ +#define PRIxFAST32 "x" /* uint_fast32_t */ +#define PRIxFAST64 __PRI64"x" /* uint_fast64_t */ +#define PRIxMAX "jx" /* uintmax_t */ +#define PRIxPTR __PRIptr"x" /* uintptr_t */ + +#define PRIX8 "X" /* uint8_t */ +#define PRIX16 "X" /* uint16_t */ +#define PRIX32 "X" /* uint32_t */ +#define PRIX64 __PRI64"X" /* uint64_t */ +#define PRIXLEAST8 "X" /* uint_least8_t */ +#define PRIXLEAST16 "X" /* uint_least16_t */ +#define PRIXLEAST32 "X" /* uint_least32_t */ +#define PRIXLEAST64 __PRI64"X" /* uint_least64_t */ +#define PRIXFAST8 "X" /* uint_fast8_t */ +#define PRIXFAST16 "X" /* uint_fast16_t */ +#define PRIXFAST32 "X" /* uint_fast32_t */ +#define PRIXFAST64 __PRI64"X" /* uint_fast64_t */ +#define 
PRIXMAX "jX" /* uintmax_t */ +#define PRIXPTR __PRIptr"X" /* uintptr_t */ + +/* fscanf(3) macros for signed integers. */ + +#define SCNd8 "hhd" /* int8_t */ +#define SCNd16 "hd" /* int16_t */ +#define SCNd32 "d" /* int32_t */ +#define SCNd64 __PRI64"d" /* int64_t */ +#define SCNdLEAST8 "hhd" /* int_least8_t */ +#define SCNdLEAST16 "hd" /* int_least16_t */ +#define SCNdLEAST32 "d" /* int_least32_t */ +#define SCNdLEAST64 __PRI64"d" /* int_least64_t */ +#define SCNdFAST8 "d" /* int_fast8_t */ +#define SCNdFAST16 "d" /* int_fast16_t */ +#define SCNdFAST32 "d" /* int_fast32_t */ +#define SCNdFAST64 __PRI64"d" /* int_fast64_t */ +#define SCNdMAX "jd" /* intmax_t */ +#define SCNdPTR __PRIptr"d" /* intptr_t */ + +#define SCNi8 "hhi" /* int8_t */ +#define SCNi16 "hi" /* int16_t */ +#define SCNi32 "i" /* int32_t */ +#define SCNi64 __PRI64"i" /* int64_t */ +#define SCNiLEAST8 "hhi" /* int_least8_t */ +#define SCNiLEAST16 "hi" /* int_least16_t */ +#define SCNiLEAST32 "i" /* int_least32_t */ +#define SCNiLEAST64 __PRI64"i" /* int_least64_t */ +#define SCNiFAST8 "i" /* int_fast8_t */ +#define SCNiFAST16 "i" /* int_fast16_t */ +#define SCNiFAST32 "i" /* int_fast32_t */ +#define SCNiFAST64 __PRI64"i" /* int_fast64_t */ +#define SCNiMAX "ji" /* intmax_t */ +#define SCNiPTR __PRIptr"i" /* intptr_t */ + +/* fscanf(3) macros for unsigned integers. */ + +#define SCNo8 "hho" /* uint8_t */ +#define SCNo16 "ho" /* uint16_t */ +#define SCNo32 "o" /* uint32_t */ +#define SCNo64 __PRI64"o" /* uint64_t */ +#define SCNoLEAST8 "hho" /* uint_least8_t */ +#define SCNoLEAST16 "ho" /* uint_least16_t */ +#define SCNoLEAST32 "o" /* uint_least32_t */ +#define SCNoLEAST64 __PRI64"o" /* uint_least64_t */ +#define SCNoFAST8 "o" /* uint_fast8_t */ +#define SCNoFAST16 "o" /* uint_fast16_t */ +#define SCNoFAST32 "o" /* uint_fast32_t */ +#define SCNoFAST64 __PRI64"o" /* uint_fast64_t */ +#define SCNoMAX "jo" /* uintmax_t */ +#define SCNoPTR __PRIptr"o" /* uintptr_t */ + +#define SCNu8 "hhu" /* uint8_t */ +#define SCNu16 "hu" /* uint16_t */ +#define SCNu32 "u" /* uint32_t */ +#define SCNu64 __PRI64"u" /* uint64_t */ +#define SCNuLEAST8 "hhu" /* uint_least8_t */ +#define SCNuLEAST16 "hu" /* uint_least16_t */ +#define SCNuLEAST32 "u" /* uint_least32_t */ +#define SCNuLEAST64 __PRI64"u" /* uint_least64_t */ +#define SCNuFAST8 "u" /* uint_fast8_t */ +#define SCNuFAST16 "u" /* uint_fast16_t */ +#define SCNuFAST32 "u" /* uint_fast32_t */ +#define SCNuFAST64 __PRI64"u" /* uint_fast64_t */ +#define SCNuMAX "ju" /* uintmax_t */ +#define SCNuPTR __PRIptr"u" /* uintptr_t */ + +#define SCNx8 "hhx" /* uint8_t */ +#define SCNx16 "hx" /* uint16_t */ +#define SCNx32 "x" /* uint32_t */ +#define SCNx64 __PRI64"x" /* uint64_t */ +#define SCNxLEAST8 "hhx" /* uint_least8_t */ +#define SCNxLEAST16 "hx" /* uint_least16_t */ +#define SCNxLEAST32 "x" /* uint_least32_t */ +#define SCNxLEAST64 __PRI64"x" /* uint_least64_t */ +#define SCNxFAST8 "x" /* uint_fast8_t */ +#define SCNxFAST16 "x" /* uint_fast16_t */ +#define SCNxFAST32 "x" /* uint_fast32_t */ +#define SCNxFAST64 __PRI64"x" /* uint_fast64_t */ +#define SCNxMAX "jx" /* uintmax_t */ +#define SCNxPTR __PRIptr"x" /* uintptr_t */ + +#endif /* !_MACHINE_INTTYPES_H_ */ diff -u -r -N usr/src/sys/modules/netmap/x86/_limits.h /usr/src/sys/modules/netmap/x86/_limits.h --- usr/src/sys/modules/netmap/x86/_limits.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/_limits.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,101 @@ +/*- + * Copyright (c) 1988, 1993 + * The Regents of the 
University of California. All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 4. Neither the name of the University nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * @(#)limits.h 8.3 (Berkeley) 1/4/94 + * $FreeBSD: releng/11.0/sys/x86/include/_limits.h 235939 2012-05-24 21:44:46Z obrien $ + */ + +#ifndef _MACHINE__LIMITS_H_ +#define _MACHINE__LIMITS_H_ + +/* + * According to ANSI (section 2.2.4.2), the values below must be usable by + * #if preprocessing directives. Additionally, the expression must have the + * same type as would an expression that is an object of the corresponding + * type converted according to the integral promotions. The subtraction for + * INT_MIN, etc., is so the value is not unsigned; e.g., 0x80000000 is an + * unsigned int for 32-bit two's complement ANSI compilers (section 3.1.3.2). 
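These PRI*/SCN* macros are what let machine-independent code print and scan fixed-width types portably: __PRI64 expands to "l" on LP64 and "ll" on 32-bit targets, so the correct length modifier is chosen at compile time. A minimal userland sketch of their use (illustrative only, not part of the patch itself):

#include <inttypes.h>
#include <stdio.h>

int
main(void)
{
        uint64_t bytes = UINT64_C(1) << 40;     /* one TiB */
        uint32_t crc;

        /* PRIu64 supplies "lu" or "llu" as appropriate for the target. */
        printf("bytes = %" PRIu64 "\n", bytes);

        /* The SCN macros do the same job for the scanf(3) family. */
        if (sscanf("deadbeef", "%" SCNx32, &crc) == 1)
                printf("crc = 0x%08" PRIx32 "\n", crc);
        return (0);
}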
+ */ + +#define __CHAR_BIT 8 /* number of bits in a char */ + +#define __SCHAR_MAX 0x7f /* max value for a signed char */ +#define __SCHAR_MIN (-0x7f - 1) /* min value for a signed char */ + +#define __UCHAR_MAX 0xff /* max value for an unsigned char */ + +#define __USHRT_MAX 0xffff /* max value for an unsigned short */ +#define __SHRT_MAX 0x7fff /* max value for a short */ +#define __SHRT_MIN (-0x7fff - 1) /* min value for a short */ + +#define __UINT_MAX 0xffffffff /* max value for an unsigned int */ +#define __INT_MAX 0x7fffffff /* max value for an int */ +#define __INT_MIN (-0x7fffffff - 1) /* min value for an int */ + +#ifdef __LP64__ +#define __ULONG_MAX 0xffffffffffffffff /* max for an unsigned long */ +#define __LONG_MAX 0x7fffffffffffffff /* max for a long */ +#define __LONG_MIN (-0x7fffffffffffffff - 1) /* min for a long */ +#else +#define __ULONG_MAX 0xffffffffUL +#define __LONG_MAX 0x7fffffffL +#define __LONG_MIN (-0x7fffffffL - 1) +#endif + + /* max value for an unsigned long long */ +#define __ULLONG_MAX 0xffffffffffffffffULL +#define __LLONG_MAX 0x7fffffffffffffffLL /* max value for a long long */ +#define __LLONG_MIN (-0x7fffffffffffffffLL - 1) /* min for a long long */ + +#ifdef __LP64__ +#define __SSIZE_MAX __LONG_MAX /* max value for a ssize_t */ +#define __SIZE_T_MAX __ULONG_MAX /* max value for a size_t */ +#define __OFF_MAX __LONG_MAX /* max value for an off_t */ +#define __OFF_MIN __LONG_MIN /* min value for an off_t */ +/* Quads and longs are the same on the amd64. Ensure they stay in sync. */ +#define __UQUAD_MAX __ULONG_MAX /* max value for a uquad_t */ +#define __QUAD_MAX __LONG_MAX /* max value for a quad_t */ +#define __QUAD_MIN __LONG_MIN /* min value for a quad_t */ +#define __LONG_BIT 64 +#else +#define __SSIZE_MAX __INT_MAX +#define __SIZE_T_MAX __UINT_MAX +#define __OFF_MAX __LLONG_MAX +#define __OFF_MIN __LLONG_MIN +#define __UQUAD_MAX __ULLONG_MAX +#define __QUAD_MAX __LLONG_MAX +#define __QUAD_MIN __LLONG_MIN +#define __LONG_BIT 32 +#endif + +#define __WORD_BIT 32 + +/* Minimum signal stack size. */ +#define __MINSIGSTKSZ (512 * 4) + +#endif /* !_MACHINE__LIMITS_H_ */ diff -u -r -N usr/src/sys/modules/netmap/x86/_stdint.h /usr/src/sys/modules/netmap/x86/_stdint.h --- usr/src/sys/modules/netmap/x86/_stdint.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/_stdint.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,191 @@ +/*- + * Copyright (c) 2001, 2002 Mike Barcroft <mike@FreeBSD.org> + * Copyright (c) 2001 The NetBSD Foundation, Inc. + * All rights reserved. + * + * This code is derived from software contributed to The NetBSD Foundation + * by Klaus Klein. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 3. All advertising materials mentioning features or use of this software + * must display the following acknowledgement: + * This product includes software developed by the NetBSD + * Foundation, Inc. and its contributors. + * 4. 
Neither the name of The NetBSD Foundation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS + * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR + * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS + * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. + * + * $FreeBSD: releng/11.0/sys/x86/include/_stdint.h 301030 2016-05-31 08:38:24Z ed $ + */ + +#ifndef _MACHINE__STDINT_H_ +#define _MACHINE__STDINT_H_ + +#include <machine/_limits.h> + +#if !defined(__cplusplus) || defined(__STDC_CONSTANT_MACROS) + +#define INT8_C(c) (c) +#define INT16_C(c) (c) +#define INT32_C(c) (c) + +#define UINT8_C(c) (c) +#define UINT16_C(c) (c) +#define UINT32_C(c) (c ## U) + +#ifdef __LP64__ +#define INT64_C(c) (c ## L) +#define UINT64_C(c) (c ## UL) +#else +#define INT64_C(c) (c ## LL) +#define UINT64_C(c) (c ## ULL) +#endif + +#define INTMAX_C(c) INT64_C(c) +#define UINTMAX_C(c) UINT64_C(c) + +#endif /* !defined(__cplusplus) || defined(__STDC_CONSTANT_MACROS) */ + +#if !defined(__cplusplus) || defined(__STDC_LIMIT_MACROS) + +/* + * ISO/IEC 9899:1999 + * 7.18.2.1 Limits of exact-width integer types + */ +#define INT8_MIN (-0x7f-1) +#define INT16_MIN (-0x7fff-1) +#define INT32_MIN (-0x7fffffff-1) + +#define INT8_MAX 0x7f +#define INT16_MAX 0x7fff +#define INT32_MAX 0x7fffffff + +#define UINT8_MAX 0xff +#define UINT16_MAX 0xffff +#define UINT32_MAX 0xffffffffU + +#ifdef __LP64__ +#define INT64_MIN (-0x7fffffffffffffff-1) +#define INT64_MAX 0x7fffffffffffffff +#define UINT64_MAX 0xffffffffffffffff +#else +#define INT64_MIN (-0x7fffffffffffffffLL-1) +#define INT64_MAX 0x7fffffffffffffffLL +#define UINT64_MAX 0xffffffffffffffffULL +#endif + +/* + * ISO/IEC 9899:1999 + * 7.18.2.2 Limits of minimum-width integer types + */ +/* Minimum values of minimum-width signed integer types. */ +#define INT_LEAST8_MIN INT8_MIN +#define INT_LEAST16_MIN INT16_MIN +#define INT_LEAST32_MIN INT32_MIN +#define INT_LEAST64_MIN INT64_MIN + +/* Maximum values of minimum-width signed integer types. */ +#define INT_LEAST8_MAX INT8_MAX +#define INT_LEAST16_MAX INT16_MAX +#define INT_LEAST32_MAX INT32_MAX +#define INT_LEAST64_MAX INT64_MAX + +/* Maximum values of minimum-width unsigned integer types. */ +#define UINT_LEAST8_MAX UINT8_MAX +#define UINT_LEAST16_MAX UINT16_MAX +#define UINT_LEAST32_MAX UINT32_MAX +#define UINT_LEAST64_MAX UINT64_MAX + +/* + * ISO/IEC 9899:1999 + * 7.18.2.3 Limits of fastest minimum-width integer types + */ +/* Minimum values of fastest minimum-width signed integer types. */ +#define INT_FAST8_MIN INT32_MIN +#define INT_FAST16_MIN INT32_MIN +#define INT_FAST32_MIN INT32_MIN +#define INT_FAST64_MIN INT64_MIN + +/* Maximum values of fastest minimum-width signed integer types. 
*/ +#define INT_FAST8_MAX INT32_MAX +#define INT_FAST16_MAX INT32_MAX +#define INT_FAST32_MAX INT32_MAX +#define INT_FAST64_MAX INT64_MAX + +/* Maximum values of fastest minimum-width unsigned integer types. */ +#define UINT_FAST8_MAX UINT32_MAX +#define UINT_FAST16_MAX UINT32_MAX +#define UINT_FAST32_MAX UINT32_MAX +#define UINT_FAST64_MAX UINT64_MAX + +/* + * ISO/IEC 9899:1999 + * 7.18.2.4 Limits of integer types capable of holding object pointers + */ +#ifdef __LP64__ +#define INTPTR_MIN INT64_MIN +#define INTPTR_MAX INT64_MAX +#define UINTPTR_MAX UINT64_MAX +#else +#define INTPTR_MIN INT32_MIN +#define INTPTR_MAX INT32_MAX +#define UINTPTR_MAX UINT32_MAX +#endif + +/* + * ISO/IEC 9899:1999 + * 7.18.2.5 Limits of greatest-width integer types + */ +#define INTMAX_MIN INT64_MIN +#define INTMAX_MAX INT64_MAX +#define UINTMAX_MAX UINT64_MAX + +/* + * ISO/IEC 9899:1999 + * 7.18.3 Limits of other integer types + */ +#ifdef __LP64__ +/* Limits of ptrdiff_t. */ +#define PTRDIFF_MIN INT64_MIN +#define PTRDIFF_MAX INT64_MAX + +/* Limits of sig_atomic_t. */ +#define SIG_ATOMIC_MIN __LONG_MIN +#define SIG_ATOMIC_MAX __LONG_MAX + +/* Limit of size_t. */ +#define SIZE_MAX UINT64_MAX +#else +#define PTRDIFF_MIN INT32_MIN +#define PTRDIFF_MAX INT32_MAX +#define SIG_ATOMIC_MIN INT32_MIN +#define SIG_ATOMIC_MAX INT32_MAX +#define SIZE_MAX UINT32_MAX +#endif + +/* Limits of wint_t. */ +#define WINT_MIN INT32_MIN +#define WINT_MAX INT32_MAX + +#endif /* !defined(__cplusplus) || defined(__STDC_LIMIT_MACROS) */ + +#endif /* !_MACHINE__STDINT_H_ */ diff -u -r -N usr/src/sys/modules/netmap/x86/_types.h /usr/src/sys/modules/netmap/x86/_types.h --- usr/src/sys/modules/netmap/x86/_types.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/_types.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,174 @@ +/*- + * Copyright (c) 2002 Mike Barcroft <mike@FreeBSD.org> + * Copyright (c) 1990, 1993 + * The Regents of the University of California. All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 3. All advertising materials mentioning features or use of this software + * must display the following acknowledgement: + * This product includes software developed by the University of + * California, Berkeley and its contributors. + * 4. Neither the name of the University nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. 
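One subtlety in the guards above: C code always sees these limits, but pre-C++11 C++ only gets them when __STDC_LIMIT_MACROS and __STDC_CONSTANT_MACROS are defined first. A short sketch (illustrative only, not part of the patch) of the constant macros and of the "usable in #if directives" property that the _limits.h comment earlier calls out:

#define __STDC_CONSTANT_MACROS  /* pre-C++11 C++ needs these two */
#define __STDC_LIMIT_MACROS
#include <stdint.h>

/* The limits are plain literals, so the preprocessor can test them. */
#if INTPTR_MAX == INT64_MAX
#define POINTER_BITS    64
#else
#define POINTER_BITS    32
#endif

/* INT64_C() appends the right suffix: L on LP64, LL otherwise. */
static const int64_t ns_per_day = INT64_C(86400) * 1000000000;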
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * From: @(#)ansi.h 8.2 (Berkeley) 1/4/94 + * From: @(#)types.h 8.3 (Berkeley) 1/5/94 + * $FreeBSD: releng/11.0/sys/x86/include/_types.h 301029 2016-05-31 08:36:39Z ed $ + */ + +#ifndef _MACHINE__TYPES_H_ +#define _MACHINE__TYPES_H_ + +#ifndef _SYS_CDEFS_H_ +#error this file needs sys/cdefs.h as a prerequisite +#endif + +#include <machine/_limits.h> + +#define __NO_STRICT_ALIGNMENT + +/* + * Basic types upon which most other types are built. + */ +typedef signed char __int8_t; +typedef unsigned char __uint8_t; +typedef short __int16_t; +typedef unsigned short __uint16_t; +typedef int __int32_t; +typedef unsigned int __uint32_t; +#ifdef __LP64__ +typedef long __int64_t; +typedef unsigned long __uint64_t; +#else +#ifndef lint +__extension__ +#endif +/* LONGLONG */ +typedef long long __int64_t; +#ifndef lint +__extension__ +#endif +/* LONGLONG */ +typedef unsigned long long __uint64_t; +#endif + +/* + * Standard type definitions. + */ +#ifdef __LP64__ +typedef __int32_t __clock_t; /* clock()... */ +typedef __int64_t __critical_t; +typedef double __double_t; +typedef float __float_t; +typedef __int64_t __intfptr_t; +typedef __int64_t __intptr_t; +#else +typedef unsigned long __clock_t; +typedef __int32_t __critical_t; +typedef long double __double_t; +typedef long double __float_t; +typedef __int32_t __intfptr_t; +typedef __int32_t __intptr_t; +#endif +typedef __int64_t __intmax_t; +typedef __int32_t __int_fast8_t; +typedef __int32_t __int_fast16_t; +typedef __int32_t __int_fast32_t; +typedef __int64_t __int_fast64_t; +typedef __int8_t __int_least8_t; +typedef __int16_t __int_least16_t; +typedef __int32_t __int_least32_t; +typedef __int64_t __int_least64_t; +#ifdef __LP64__ +typedef __int64_t __ptrdiff_t; /* ptr1 - ptr2 */ +typedef __int64_t __register_t; +typedef __int64_t __segsz_t; /* segment size (in pages) */ +typedef __uint64_t __size_t; /* sizeof() */ +typedef __int64_t __ssize_t; /* byte count or error */ +typedef __int64_t __time_t; /* time()... 
*/ +typedef __uint64_t __uintfptr_t; +typedef __uint64_t __uintptr_t; +#else +typedef __int32_t __ptrdiff_t; +typedef __int32_t __register_t; +typedef __int32_t __segsz_t; +typedef __uint32_t __size_t; +typedef __int32_t __ssize_t; +typedef __int32_t __time_t; +typedef __uint32_t __uintfptr_t; +typedef __uint32_t __uintptr_t; +#endif +typedef __uint64_t __uintmax_t; +typedef __uint32_t __uint_fast8_t; +typedef __uint32_t __uint_fast16_t; +typedef __uint32_t __uint_fast32_t; +typedef __uint64_t __uint_fast64_t; +typedef __uint8_t __uint_least8_t; +typedef __uint16_t __uint_least16_t; +typedef __uint32_t __uint_least32_t; +typedef __uint64_t __uint_least64_t; +#ifdef __LP64__ +typedef __uint64_t __u_register_t; +typedef __uint64_t __vm_offset_t; +typedef __uint64_t __vm_paddr_t; +typedef __uint64_t __vm_size_t; +#else +typedef __uint32_t __u_register_t; +typedef __uint32_t __vm_offset_t; +#ifdef PAE +typedef __uint64_t __vm_paddr_t; +#else +typedef __uint32_t __vm_paddr_t; +#endif +typedef __uint32_t __vm_size_t; +#endif +typedef __int64_t __vm_ooffset_t; +typedef __uint64_t __vm_pindex_t; +typedef int ___wchar_t; + +#define __WCHAR_MIN __INT_MIN /* min value for a wchar_t */ +#define __WCHAR_MAX __INT_MAX /* max value for a wchar_t */ + +/* + * Unusual type definitions. + */ +#ifdef __GNUCLIKE_BUILTIN_VARARGS +typedef __builtin_va_list __va_list; /* internally known to gcc */ +#else +#ifdef __LP64__ +struct __s_va_list { + __uint32_t _pad1[2]; /* gp_offset, fp_offset */ + __uint64_t _pad2[2]; /* overflow_arg_area, reg_save_area */ +}; +typedef struct __s_va_list __va_list; +#else +typedef char * __va_list; +#endif +#endif +#if defined(__GNUC_VA_LIST_COMPATIBILITY) && !defined(__GNUC_VA_LIST) \ + && !defined(__NO_GNUC_VA_LIST) +#define __GNUC_VA_LIST +typedef __va_list __gnuc_va_list; /* compatibility w/GNU headers*/ +#endif + +#endif /* !_MACHINE__TYPES_H_ */ diff -u -r -N usr/src/sys/modules/netmap/x86/acpica_machdep.h /usr/src/sys/modules/netmap/x86/acpica_machdep.h --- usr/src/sys/modules/netmap/x86/acpica_machdep.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/acpica_machdep.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,88 @@ +/*- + * Copyright (c) 2002 Mitsuru IWASAKI + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. 
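The typedefs above form the machine-dependent layer that <sys/types.h> and <stdint.h> are built on, with __LP64__ selecting the 64-bit variants. A compile-time sanity check of the widths they promise (illustrative only, not part of the patch):

#include <stdint.h>
#include <stddef.h>

_Static_assert(sizeof(int64_t) == 8, "__int64_t must be 8 bytes");
_Static_assert(sizeof(intptr_t) == sizeof(void *),
    "__intptr_t must be able to hold a pointer");
#ifdef __LP64__
_Static_assert(sizeof(size_t) == 8, "__size_t is 64-bit on LP64");
#else
_Static_assert(sizeof(size_t) == 4, "__size_t is 32-bit on ILP32");
#endif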
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD: releng/11.0/sys/x86/include/acpica_machdep.h 298094 2016-04-16 03:44:50Z gjb $ + */ + +/****************************************************************************** + * + * Name: acpica_machdep.h - arch-specific defines, etc. + * $Revision$ + * + *****************************************************************************/ + +#ifndef __ACPICA_MACHDEP_H__ +#define __ACPICA_MACHDEP_H__ + +#ifdef _KERNEL +/* + * Calling conventions: + * + * ACPI_SYSTEM_XFACE - Interfaces to host OS (handlers, threads) + * ACPI_EXTERNAL_XFACE - External ACPI interfaces + * ACPI_INTERNAL_XFACE - Internal ACPI interfaces + * ACPI_INTERNAL_VAR_XFACE - Internal variable-parameter list interfaces + */ +#define ACPI_SYSTEM_XFACE +#define ACPI_EXTERNAL_XFACE +#define ACPI_INTERNAL_XFACE +#define ACPI_INTERNAL_VAR_XFACE + +/* Asm macros */ + +#define ACPI_ASM_MACROS +#define BREAKPOINT3 +#define ACPI_DISABLE_IRQS() disable_intr() +#define ACPI_ENABLE_IRQS() enable_intr() + +#define ACPI_FLUSH_CPU_CACHE() wbinvd() + +/* Section 5.2.10.1: global lock acquire/release functions */ +int acpi_acquire_global_lock(volatile uint32_t *); +int acpi_release_global_lock(volatile uint32_t *); +#define ACPI_ACQUIRE_GLOBAL_LOCK(GLptr, Acq) do { \ + (Acq) = acpi_acquire_global_lock(&((GLptr)->GlobalLock)); \ +} while (0) +#define ACPI_RELEASE_GLOBAL_LOCK(GLptr, Acq) do { \ + (Acq) = acpi_release_global_lock(&((GLptr)->GlobalLock)); \ +} while (0) + +enum intr_trigger; +enum intr_polarity; + +void acpi_SetDefaultIntrModel(int model); +void acpi_cpu_c1(void); +void acpi_cpu_idle_mwait(uint32_t mwait_hint); +void *acpi_map_table(vm_paddr_t pa, const char *sig); +void acpi_unmap_table(void *table); +vm_paddr_t acpi_find_table(const char *sig); +void madt_parse_interrupt_values(void *entry, + enum intr_trigger *trig, enum intr_polarity *pol); + +extern int madt_found_sci_override; + +#endif /* _KERNEL */ + +#endif /* __ACPICA_MACHDEP_H__ */ diff -u -r -N usr/src/sys/modules/netmap/x86/apicreg.h /usr/src/sys/modules/netmap/x86/apicreg.h --- usr/src/sys/modules/netmap/x86/apicreg.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/apicreg.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,516 @@ +/*- + * Copyright (c) 1996, by Peter Wemm and Steve Passe + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. The name of the developer may NOT be used to endorse or promote products + * derived from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. 
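For context, ACPI_ACQUIRE_GLOBAL_LOCK and ACPI_RELEASE_GLOBAL_LOCK above wrap the cmpxchg dance on the GlobalLock dword that lives in the FACS, which the firmware and the OS share. Roughly how a caller consumes them (a sketch under the assumption that ACPICA's ACPI_TABLE_FACS type is in scope; not part of the patch):

static int
try_global_lock(ACPI_TABLE_FACS *facs)
{
        int acquired;

        ACPI_ACQUIRE_GLOBAL_LOCK(facs, acquired);
        if (!acquired) {
                /*
                 * The firmware owns the lock and the pending bit is
                 * now set; the caller must sleep until the BIOS
                 * releases it, then retry.
                 */
                return (0);
        }
        return (1);     /* held; pair with ACPI_RELEASE_GLOBAL_LOCK */
}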
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD: releng/11.0/sys/x86/include/apicreg.h 297347 2016-03-28 09:43:40Z kib $ + */ + +#ifndef _X86_APICREG_H_ +#define _X86_APICREG_H_ + +/* + * Local && I/O APIC definitions. + */ + +/* + * Pentium P54C+ Built-in APIC + * (Advanced programmable Interrupt Controller) + * + * Base Address of Built-in APIC in memory location + * is 0xfee00000. + * + * Map of APIC Registers: + * + * Offset (hex) Description Read/Write state + * 000 Reserved + * 010 Reserved + * 020 ID Local APIC ID R/W + * 030 VER Local APIC Version R + * 040 Reserved + * 050 Reserved + * 060 Reserved + * 070 Reserved + * 080 Task Priority Register R/W + * 090 Arbitration Priority Register R + * 0A0 Processor Priority Register R + * 0B0 EOI Register W + * 0C0 RRR Remote read R + * 0D0 Logical Destination R/W + * 0E0 Destination Format Register 0..27 R; 28..31 R/W + * 0F0 SVR Spurious Interrupt Vector Reg. 0..3 R; 4..9 R/W + * 100 ISR 000-031 R + * 110 ISR 032-063 R + * 120 ISR 064-095 R + * 130 ISR 095-128 R + * 140 ISR 128-159 R + * 150 ISR 160-191 R + * 160 ISR 192-223 R + * 170 ISR 224-255 R + * 180 TMR 000-031 R + * 190 TMR 032-063 R + * 1A0 TMR 064-095 R + * 1B0 TMR 095-128 R + * 1C0 TMR 128-159 R + * 1D0 TMR 160-191 R + * 1E0 TMR 192-223 R + * 1F0 TMR 224-255 R + * 200 IRR 000-031 R + * 210 IRR 032-063 R + * 220 IRR 064-095 R + * 230 IRR 095-128 R + * 240 IRR 128-159 R + * 250 IRR 160-191 R + * 260 IRR 192-223 R + * 270 IRR 224-255 R + * 280 Error Status Register R + * 290 Reserved + * 2A0 Reserved + * 2B0 Reserved + * 2C0 Reserved + * 2D0 Reserved + * 2E0 Reserved + * 2F0 Local Vector Table (CMCI) R/W + * 300 ICR_LOW Interrupt Command Reg. (0-31) R/W + * 310 ICR_HI Interrupt Command Reg. (32-63) R/W + * 320 Local Vector Table (Timer) R/W + * 330 Local Vector Table (Thermal) R/W (PIV+) + * 340 Local Vector Table (Performance) R/W (P6+) + * 350 LVT1 Local Vector Table (LINT0) R/W + * 360 LVT2 Local Vector Table (LINT1) R/W + * 370 LVT3 Local Vector Table (ERROR) R/W + * 380 Initial Count Reg. for Timer R/W + * 390 Current Count of Timer R + * 3A0 Reserved + * 3B0 Reserved + * 3C0 Reserved + * 3D0 Reserved + * 3E0 Timer Divide Configuration Reg. R/W + * 3F0 Reserved + */ + + +/****************************************************************************** + * global defines, etc. 
+ */ + + +/****************************************************************************** + * LOCAL APIC structure + */ + +#ifndef LOCORE +#include <sys/types.h> + +#define PAD3 int : 32; int : 32; int : 32 +#define PAD4 int : 32; int : 32; int : 32; int : 32 + +struct LAPIC { + /* reserved */ PAD4; + /* reserved */ PAD4; + u_int32_t id; PAD3; + u_int32_t version; PAD3; + /* reserved */ PAD4; + /* reserved */ PAD4; + /* reserved */ PAD4; + /* reserved */ PAD4; + u_int32_t tpr; PAD3; + u_int32_t apr; PAD3; + u_int32_t ppr; PAD3; + u_int32_t eoi; PAD3; + /* reserved */ PAD4; + u_int32_t ldr; PAD3; + u_int32_t dfr; PAD3; + u_int32_t svr; PAD3; + u_int32_t isr0; PAD3; + u_int32_t isr1; PAD3; + u_int32_t isr2; PAD3; + u_int32_t isr3; PAD3; + u_int32_t isr4; PAD3; + u_int32_t isr5; PAD3; + u_int32_t isr6; PAD3; + u_int32_t isr7; PAD3; + u_int32_t tmr0; PAD3; + u_int32_t tmr1; PAD3; + u_int32_t tmr2; PAD3; + u_int32_t tmr3; PAD3; + u_int32_t tmr4; PAD3; + u_int32_t tmr5; PAD3; + u_int32_t tmr6; PAD3; + u_int32_t tmr7; PAD3; + u_int32_t irr0; PAD3; + u_int32_t irr1; PAD3; + u_int32_t irr2; PAD3; + u_int32_t irr3; PAD3; + u_int32_t irr4; PAD3; + u_int32_t irr5; PAD3; + u_int32_t irr6; PAD3; + u_int32_t irr7; PAD3; + u_int32_t esr; PAD3; + /* reserved */ PAD4; + /* reserved */ PAD4; + /* reserved */ PAD4; + /* reserved */ PAD4; + /* reserved */ PAD4; + /* reserved */ PAD4; + u_int32_t lvt_cmci; PAD3; + u_int32_t icr_lo; PAD3; + u_int32_t icr_hi; PAD3; + u_int32_t lvt_timer; PAD3; + u_int32_t lvt_thermal; PAD3; + u_int32_t lvt_pcint; PAD3; + u_int32_t lvt_lint0; PAD3; + u_int32_t lvt_lint1; PAD3; + u_int32_t lvt_error; PAD3; + u_int32_t icr_timer; PAD3; + u_int32_t ccr_timer; PAD3; + /* reserved */ PAD4; + /* reserved */ PAD4; + /* reserved */ PAD4; + /* reserved */ PAD4; + u_int32_t dcr_timer; PAD3; + /* reserved */ PAD4; +}; + +typedef struct LAPIC lapic_t; + +enum LAPIC_REGISTERS { + LAPIC_ID = 0x2, + LAPIC_VERSION = 0x3, + LAPIC_TPR = 0x8, + LAPIC_APR = 0x9, + LAPIC_PPR = 0xa, + LAPIC_EOI = 0xb, + LAPIC_LDR = 0xd, + LAPIC_DFR = 0xe, /* Not in x2APIC */ + LAPIC_SVR = 0xf, + LAPIC_ISR0 = 0x10, + LAPIC_ISR1 = 0x11, + LAPIC_ISR2 = 0x12, + LAPIC_ISR3 = 0x13, + LAPIC_ISR4 = 0x14, + LAPIC_ISR5 = 0x15, + LAPIC_ISR6 = 0x16, + LAPIC_ISR7 = 0x17, + LAPIC_TMR0 = 0x18, + LAPIC_TMR1 = 0x19, + LAPIC_TMR2 = 0x1a, + LAPIC_TMR3 = 0x1b, + LAPIC_TMR4 = 0x1c, + LAPIC_TMR5 = 0x1d, + LAPIC_TMR6 = 0x1e, + LAPIC_TMR7 = 0x1f, + LAPIC_IRR0 = 0x20, + LAPIC_IRR1 = 0x21, + LAPIC_IRR2 = 0x22, + LAPIC_IRR3 = 0x23, + LAPIC_IRR4 = 0x24, + LAPIC_IRR5 = 0x25, + LAPIC_IRR6 = 0x26, + LAPIC_IRR7 = 0x27, + LAPIC_ESR = 0x28, + LAPIC_LVT_CMCI = 0x2f, + LAPIC_ICR_LO = 0x30, + LAPIC_ICR_HI = 0x31, /* Not in x2APIC */ + LAPIC_LVT_TIMER = 0x32, + LAPIC_LVT_THERMAL = 0x33, + LAPIC_LVT_PCINT = 0x34, + LAPIC_LVT_LINT0 = 0x35, + LAPIC_LVT_LINT1 = 0x36, + LAPIC_LVT_ERROR = 0x37, + LAPIC_ICR_TIMER = 0x38, + LAPIC_CCR_TIMER = 0x39, + LAPIC_DCR_TIMER = 0x3e, + LAPIC_SELF_IPI = 0x3f, /* Only in x2APIC */ +}; + +/* + * The LAPIC_SELF_IPI register only exists in x2APIC mode. The + * formula below is applicable only to reserve the memory region, + * i.e. for xAPIC mode, where LAPIC_SELF_IPI finely serves as the + * address past end of the region. 
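The enum above is indexed so that one register table serves both access modes: in xAPIC mode the MMIO offset is the index times 0x10 (LAPIC_MEM_MUL, defined just below), while in x2APIC mode the same index is simply added to the architectural MSR base 0x800. A small sketch (illustrative only; X2APIC_MSR_BASE is my name for the architectural range, it is not defined in this header):

#define X2APIC_MSR_BASE 0x800   /* architectural x2APIC MSR range */

static inline uint32_t
lapic_reg_mmio_offset(enum LAPIC_REGISTERS reg)
{
        return (reg * 0x10);    /* LAPIC_ID -> 0x20, LAPIC_EOI -> 0xb0 */
}

static inline uint32_t
lapic_reg_msr(enum LAPIC_REGISTERS reg)
{
        return (X2APIC_MSR_BASE + reg); /* LAPIC_ID -> MSR 0x802 */
}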
+ */ +#define LAPIC_MEM_REGION (LAPIC_SELF_IPI * 0x10) + +#define LAPIC_MEM_MUL 0x10 + +/****************************************************************************** + * I/O APIC structure + */ + +struct IOAPIC { + u_int32_t ioregsel; PAD3; + u_int32_t iowin; PAD3; +}; + +typedef struct IOAPIC ioapic_t; + +#undef PAD4 +#undef PAD3 + +#endif /* !LOCORE */ + + +/****************************************************************************** + * various code 'logical' values + */ + +/****************************************************************************** + * LOCAL APIC defines + */ + +/* default physical locations of LOCAL (CPU) APICs */ +#define DEFAULT_APIC_BASE 0xfee00000 + +/* constants relating to APIC ID registers */ +#define APIC_ID_MASK 0xff000000 +#define APIC_ID_SHIFT 24 +#define APIC_ID_CLUSTER 0xf0 +#define APIC_ID_CLUSTER_ID 0x0f +#define APIC_MAX_CLUSTER 0xe +#define APIC_MAX_INTRACLUSTER_ID 3 +#define APIC_ID_CLUSTER_SHIFT 4 + +/* fields in VER */ +#define APIC_VER_VERSION 0x000000ff +#define APIC_VER_MAXLVT 0x00ff0000 +#define MAXLVTSHIFT 16 +#define APIC_VER_EOI_SUPPRESSION 0x01000000 + +/* fields in LDR */ +#define APIC_LDR_RESERVED 0x00ffffff + +/* fields in DFR */ +#define APIC_DFR_RESERVED 0x0fffffff +#define APIC_DFR_MODEL_MASK 0xf0000000 +#define APIC_DFR_MODEL_FLAT 0xf0000000 +#define APIC_DFR_MODEL_CLUSTER 0x00000000 + +/* fields in SVR */ +#define APIC_SVR_VECTOR 0x000000ff +#define APIC_SVR_VEC_PROG 0x000000f0 +#define APIC_SVR_VEC_FIX 0x0000000f +#define APIC_SVR_ENABLE 0x00000100 +# define APIC_SVR_SWDIS 0x00000000 +# define APIC_SVR_SWEN 0x00000100 +#define APIC_SVR_FOCUS 0x00000200 +# define APIC_SVR_FEN 0x00000000 +# define APIC_SVR_FDIS 0x00000200 +#define APIC_SVR_EOI_SUPPRESSION 0x00001000 + +/* fields in TPR */ +#define APIC_TPR_PRIO 0x000000ff +# define APIC_TPR_INT 0x000000f0 +# define APIC_TPR_SUB 0x0000000f + +/* fields in ESR */ +#define APIC_ESR_SEND_CS_ERROR 0x00000001 +#define APIC_ESR_RECEIVE_CS_ERROR 0x00000002 +#define APIC_ESR_SEND_ACCEPT 0x00000004 +#define APIC_ESR_RECEIVE_ACCEPT 0x00000008 +#define APIC_ESR_SEND_ILLEGAL_VECTOR 0x00000020 +#define APIC_ESR_RECEIVE_ILLEGAL_VECTOR 0x00000040 +#define APIC_ESR_ILLEGAL_REGISTER 0x00000080 + +/* fields in ICR_LOW */ +#define APIC_VECTOR_MASK 0x000000ff + +#define APIC_DELMODE_MASK 0x00000700 +# define APIC_DELMODE_FIXED 0x00000000 +# define APIC_DELMODE_LOWPRIO 0x00000100 +# define APIC_DELMODE_SMI 0x00000200 +# define APIC_DELMODE_RR 0x00000300 +# define APIC_DELMODE_NMI 0x00000400 +# define APIC_DELMODE_INIT 0x00000500 +# define APIC_DELMODE_STARTUP 0x00000600 +# define APIC_DELMODE_RESV 0x00000700 + +#define APIC_DESTMODE_MASK 0x00000800 +# define APIC_DESTMODE_PHY 0x00000000 +# define APIC_DESTMODE_LOG 0x00000800 + +#define APIC_DELSTAT_MASK 0x00001000 +# define APIC_DELSTAT_IDLE 0x00000000 +# define APIC_DELSTAT_PEND 0x00001000 + +#define APIC_RESV1_MASK 0x00002000 + +#define APIC_LEVEL_MASK 0x00004000 +# define APIC_LEVEL_DEASSERT 0x00000000 +# define APIC_LEVEL_ASSERT 0x00004000 + +#define APIC_TRIGMOD_MASK 0x00008000 +# define APIC_TRIGMOD_EDGE 0x00000000 +# define APIC_TRIGMOD_LEVEL 0x00008000 + +#define APIC_RRSTAT_MASK 0x00030000 +# define APIC_RRSTAT_INVALID 0x00000000 +# define APIC_RRSTAT_INPROG 0x00010000 +# define APIC_RRSTAT_VALID 0x00020000 +# define APIC_RRSTAT_RESV 0x00030000 + +#define APIC_DEST_MASK 0x000c0000 +# define APIC_DEST_DESTFLD 0x00000000 +# define APIC_DEST_SELF 0x00040000 +# define APIC_DEST_ALLISELF 0x00080000 +# define APIC_DEST_ALLESELF 0x000c0000 + 
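Putting the ICR_LOW field masks above together: a fixed-vector IPI to one physical destination is two register writes, ICR_HI first, then ICR_LO, since the low write triggers the send. A sketch only, not part of the patch; lapic_mmio_write() is an assumed helper that stores a 32-bit value at the given offset from the LAPIC base:

static void
send_fixed_ipi(uint32_t apic_id, uint32_t vector)
{
        uint32_t icrlo;

        icrlo = APIC_DELMODE_FIXED | APIC_DESTMODE_PHY |
            APIC_LEVEL_ASSERT | APIC_TRIGMOD_EDGE |
            (vector & APIC_VECTOR_MASK);
        /* Destination APIC ID sits in ICR_HI bits 24..31 (xAPIC mode). */
        lapic_mmio_write(LAPIC_ICR_HI * LAPIC_MEM_MUL,
            apic_id << APIC_ID_SHIFT);
        lapic_mmio_write(LAPIC_ICR_LO * LAPIC_MEM_MUL, icrlo);
}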
+#define APIC_RESV2_MASK 0xfff00000 + +#define APIC_ICRLO_RESV_MASK (APIC_RESV1_MASK | APIC_RESV2_MASK) + +/* fields in LVT1/2 */ +#define APIC_LVT_VECTOR 0x000000ff +#define APIC_LVT_DM 0x00000700 +# define APIC_LVT_DM_FIXED 0x00000000 +# define APIC_LVT_DM_SMI 0x00000200 +# define APIC_LVT_DM_NMI 0x00000400 +# define APIC_LVT_DM_INIT 0x00000500 +# define APIC_LVT_DM_EXTINT 0x00000700 +#define APIC_LVT_DS 0x00001000 +#define APIC_LVT_IIPP 0x00002000 +#define APIC_LVT_IIPP_INTALO 0x00002000 +#define APIC_LVT_IIPP_INTAHI 0x00000000 +#define APIC_LVT_RIRR 0x00004000 +#define APIC_LVT_TM 0x00008000 +#define APIC_LVT_M 0x00010000 + + +/* fields in LVT Timer */ +#define APIC_LVTT_VECTOR 0x000000ff +#define APIC_LVTT_DS 0x00001000 +#define APIC_LVTT_M 0x00010000 +#define APIC_LVTT_TM 0x00060000 +# define APIC_LVTT_TM_ONE_SHOT 0x00000000 +# define APIC_LVTT_TM_PERIODIC 0x00020000 +# define APIC_LVTT_TM_TSCDLT 0x00040000 +# define APIC_LVTT_TM_RSRV 0x00060000 + +/* APIC timer current count */ +#define APIC_TIMER_MAX_COUNT 0xffffffff + +/* fields in TDCR */ +#define APIC_TDCR_2 0x00 +#define APIC_TDCR_4 0x01 +#define APIC_TDCR_8 0x02 +#define APIC_TDCR_16 0x03 +#define APIC_TDCR_32 0x08 +#define APIC_TDCR_64 0x09 +#define APIC_TDCR_128 0x0a +#define APIC_TDCR_1 0x0b + +/* LVT table indices */ +#define APIC_LVT_LINT0 0 +#define APIC_LVT_LINT1 1 +#define APIC_LVT_TIMER 2 +#define APIC_LVT_ERROR 3 +#define APIC_LVT_PMC 4 +#define APIC_LVT_THERMAL 5 +#define APIC_LVT_CMCI 6 +#define APIC_LVT_MAX APIC_LVT_CMCI + +/****************************************************************************** + * I/O APIC defines + */ + +/* default physical locations of an IO APIC */ +#define DEFAULT_IO_APIC_BASE 0xfec00000 + +/* window register offset */ +#define IOAPIC_WINDOW 0x10 +#define IOAPIC_EOIR 0x40 + +/* indexes into IO APIC */ +#define IOAPIC_ID 0x00 +#define IOAPIC_VER 0x01 +#define IOAPIC_ARB 0x02 +#define IOAPIC_REDTBL 0x10 +#define IOAPIC_REDTBL0 IOAPIC_REDTBL +#define IOAPIC_REDTBL1 (IOAPIC_REDTBL+0x02) +#define IOAPIC_REDTBL2 (IOAPIC_REDTBL+0x04) +#define IOAPIC_REDTBL3 (IOAPIC_REDTBL+0x06) +#define IOAPIC_REDTBL4 (IOAPIC_REDTBL+0x08) +#define IOAPIC_REDTBL5 (IOAPIC_REDTBL+0x0a) +#define IOAPIC_REDTBL6 (IOAPIC_REDTBL+0x0c) +#define IOAPIC_REDTBL7 (IOAPIC_REDTBL+0x0e) +#define IOAPIC_REDTBL8 (IOAPIC_REDTBL+0x10) +#define IOAPIC_REDTBL9 (IOAPIC_REDTBL+0x12) +#define IOAPIC_REDTBL10 (IOAPIC_REDTBL+0x14) +#define IOAPIC_REDTBL11 (IOAPIC_REDTBL+0x16) +#define IOAPIC_REDTBL12 (IOAPIC_REDTBL+0x18) +#define IOAPIC_REDTBL13 (IOAPIC_REDTBL+0x1a) +#define IOAPIC_REDTBL14 (IOAPIC_REDTBL+0x1c) +#define IOAPIC_REDTBL15 (IOAPIC_REDTBL+0x1e) +#define IOAPIC_REDTBL16 (IOAPIC_REDTBL+0x20) +#define IOAPIC_REDTBL17 (IOAPIC_REDTBL+0x22) +#define IOAPIC_REDTBL18 (IOAPIC_REDTBL+0x24) +#define IOAPIC_REDTBL19 (IOAPIC_REDTBL+0x26) +#define IOAPIC_REDTBL20 (IOAPIC_REDTBL+0x28) +#define IOAPIC_REDTBL21 (IOAPIC_REDTBL+0x2a) +#define IOAPIC_REDTBL22 (IOAPIC_REDTBL+0x2c) +#define IOAPIC_REDTBL23 (IOAPIC_REDTBL+0x2e) + +/* fields in VER */ +#define IOART_VER_VERSION 0x000000ff +#define IOART_VER_MAXREDIR 0x00ff0000 +#define MAXREDIRSHIFT 16 + +/* + * fields in the IO APIC's redirection table entries + */ +#define IOART_DEST APIC_ID_MASK /* broadcast addr: all APICs */ + +#define IOART_RESV 0x00fe0000 /* reserved */ + +#define IOART_INTMASK 0x00010000 /* R/W: INTerrupt mask */ +# define IOART_INTMCLR 0x00000000 /* clear, allow INTs */ +# define IOART_INTMSET 0x00010000 /* set, inhibit INTs */ + +#define IOART_TRGRMOD 0x00008000 /* R/W: 
trigger mode */ +# define IOART_TRGREDG 0x00000000 /* edge */ +# define IOART_TRGRLVL 0x00008000 /* level */ + +#define IOART_REM_IRR 0x00004000 /* RO: remote IRR */ + +#define IOART_INTPOL 0x00002000 /* R/W: INT input pin polarity */ +# define IOART_INTAHI 0x00000000 /* active high */ +# define IOART_INTALO 0x00002000 /* active low */ + +#define IOART_DELIVS 0x00001000 /* RO: delivery status */ + +#define IOART_DESTMOD 0x00000800 /* R/W: destination mode */ +# define IOART_DESTPHY 0x00000000 /* physical */ +# define IOART_DESTLOG 0x00000800 /* logical */ + +#define IOART_DELMOD 0x00000700 /* R/W: delivery mode */ +# define IOART_DELFIXED 0x00000000 /* fixed */ +# define IOART_DELLOPRI 0x00000100 /* lowest priority */ +# define IOART_DELSMI 0x00000200 /* System Management INT */ +# define IOART_DELRSV1 0x00000300 /* reserved */ +# define IOART_DELNMI 0x00000400 /* NMI signal */ +# define IOART_DELINIT 0x00000500 /* INIT signal */ +# define IOART_DELRSV2 0x00000600 /* reserved */ +# define IOART_DELEXINT 0x00000700 /* External INTerrupt */ + +#define IOART_INTVEC 0x000000ff /* R/W: INTerrupt vector field */ + +#endif /* _X86_APICREG_H_ */ diff -u -r -N usr/src/sys/modules/netmap/x86/apicvar.h /usr/src/sys/modules/netmap/x86/apicvar.h --- usr/src/sys/modules/netmap/x86/apicvar.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/apicvar.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,467 @@ +/*- + * Copyright (c) 2003 John Baldwin <jhb@FreeBSD.org> + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD: releng/11.0/sys/x86/include/apicvar.h 301015 2016-05-31 04:47:53Z sephe $ + */ + +#ifndef _X86_APICVAR_H_ +#define _X86_APICVAR_H_ + +/* + * Local && I/O APIC variable definitions. 
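The I/O APIC above is programmed indirectly: software writes a register index into ioregsel, then moves data through iowin, 16 bytes further up (the PAD3 fields give the struct exactly that layout). Combining the redirection-table fields for a masked, level-triggered, active-low pin looks roughly like this (sketch only, not part of the patch):

static void
ioapic_program_pin(volatile ioapic_t *io, u_int pin, u_int vector)
{
        uint32_t lo;

        lo = IOART_INTMSET | IOART_TRGRLVL | IOART_INTALO |
            IOART_DESTPHY | IOART_DELFIXED | (vector & IOART_INTVEC);
        io->ioregsel = IOAPIC_REDTBL + 2 * pin; /* low dword of entry */
        io->iowin = lo;
}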
+ */ + +/* + * Layout of local APIC interrupt vectors: + * + * 0xff (255) +-------------+ + * | | 15 (Spurious / IPIs / Local Interrupts) + * 0xf0 (240) +-------------+ + * | | 14 (I/O Interrupts / Timer) + * 0xe0 (224) +-------------+ + * | | 13 (I/O Interrupts) + * 0xd0 (208) +-------------+ + * | | 12 (I/O Interrupts) + * 0xc0 (192) +-------------+ + * | | 11 (I/O Interrupts) + * 0xb0 (176) +-------------+ + * | | 10 (I/O Interrupts) + * 0xa0 (160) +-------------+ + * | | 9 (I/O Interrupts) + * 0x90 (144) +-------------+ + * | | 8 (I/O Interrupts / System Calls) + * 0x80 (128) +-------------+ + * | | 7 (I/O Interrupts) + * 0x70 (112) +-------------+ + * | | 6 (I/O Interrupts) + * 0x60 (96) +-------------+ + * | | 5 (I/O Interrupts) + * 0x50 (80) +-------------+ + * | | 4 (I/O Interrupts) + * 0x40 (64) +-------------+ + * | | 3 (I/O Interrupts) + * 0x30 (48) +-------------+ + * | | 2 (ATPIC Interrupts) + * 0x20 (32) +-------------+ + * | | 1 (Exceptions, traps, faults, etc.) + * 0x10 (16) +-------------+ + * | | 0 (Exceptions, traps, faults, etc.) + * 0x00 (0) +-------------+ + * + * Note: 0x80 needs to be handled specially and not allocated to an + * I/O device! + */ + +#define MAX_APIC_ID 0xfe +#define APIC_ID_ALL 0xff + +/* I/O Interrupts are used for external devices such as ISA, PCI, etc. */ +#define APIC_IO_INTS (IDT_IO_INTS + 16) +#define APIC_NUM_IOINTS 191 + +/* The timer interrupt is used for clock handling and drives hardclock, etc. */ +#define APIC_TIMER_INT (APIC_IO_INTS + APIC_NUM_IOINTS) + +/* + ********************* !!! WARNING !!! ****************************** + * Each local apic has an interrupt receive fifo that is two entries deep + * for each interrupt priority class (higher 4 bits of interrupt vector). + * Once the fifo is full the APIC can no longer receive interrupts for this + * class and sending IPIs from other CPUs will be blocked. + * To avoid deadlocks there should be no more than two IPI interrupts + * pending at the same time. + * Currently this is guaranteed by dividing the IPIs in two groups that have + * each at most one IPI interrupt pending. The first group is protected by the + * smp_ipi_mtx and waits for the completion of the IPI (Only one IPI user + * at a time) The second group uses a single interrupt and a bitmap to avoid + * redundant IPI interrupts. + */ + +/* Interrupts for local APIC LVT entries other than the timer. */ +#define APIC_LOCAL_INTS 240 +#define APIC_ERROR_INT APIC_LOCAL_INTS +#define APIC_THERMAL_INT (APIC_LOCAL_INTS + 1) +#define APIC_CMC_INT (APIC_LOCAL_INTS + 2) +#define APIC_IPI_INTS (APIC_LOCAL_INTS + 3) + +#define IPI_RENDEZVOUS (APIC_IPI_INTS) /* Inter-CPU rendezvous. */ +#define IPI_INVLTLB (APIC_IPI_INTS + 1) /* TLB Shootdown IPIs */ +#define IPI_INVLPG (APIC_IPI_INTS + 2) +#define IPI_INVLRNG (APIC_IPI_INTS + 3) +#define IPI_INVLCACHE (APIC_IPI_INTS + 4) +/* Vector to handle bitmap based IPIs */ +#define IPI_BITMAP_VECTOR (APIC_IPI_INTS + 5) + +/* IPIs handled by IPI_BITMAP_VECTOR */ +#define IPI_AST 0 /* Generate software trap. */ +#define IPI_PREEMPT 1 +#define IPI_HARDCLOCK 2 +#define IPI_BITMAP_LAST IPI_HARDCLOCK +#define IPI_IS_BITMAPED(x) ((x) <= IPI_BITMAP_LAST) + +#define IPI_STOP (APIC_IPI_INTS + 6) /* Stop CPU until restarted. */ +#define IPI_SUSPEND (APIC_IPI_INTS + 7) /* Suspend CPU until restarted. */ +#ifdef __i386__ +#define IPI_LAZYPMAP (APIC_IPI_INTS + 8) /* Lazy pmap release. 
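The fifo warning above is the reason for the bitmap scheme: only the top four bits of a vector select its priority class, and each class can hold just two in-flight interrupts per local APIC. The second group therefore multiplexes AST, preempt, and hardclock onto the single IPI_BITMAP_VECTOR. A rough sender-side sketch (illustrative only; the per-CPU bitmap variable here is an assumption, FreeBSD keeps the real state in its pcpu area):

static volatile u_int ipi_pending[MAXCPU];      /* assumed per-CPU state */

static void
ipi_bitmap_send(int cpu, u_int apic_id, u_int ipi)
{
        /* Redundant requests collapse into a bit that is already set. */
        if (atomic_testandset_int(&ipi_pending[cpu], ipi) == 0)
                lapic_ipi_vectored(IPI_BITMAP_VECTOR, apic_id);
}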
*/ +#define IPI_DYN_FIRST (APIC_IPI_INTS + 9) +#else +#define IPI_DYN_FIRST (APIC_IPI_INTS + 8) +#endif +#define IPI_DYN_LAST (253) /* IPIs allocated at runtime */ + +/* + * IPI_STOP_HARD does not need to occupy a slot in the IPI vector space since + * it is delivered using an NMI anyway. + */ +#define IPI_NMI_FIRST 254 +#define IPI_TRACE 254 /* Interrupt for tracing. */ +#define IPI_STOP_HARD 255 /* Stop CPU with a NMI. */ + +/* + * The spurious interrupt can share the priority class with the IPIs since + * it is not a normal interrupt. (Does not use the APIC's interrupt fifo) + */ +#define APIC_SPURIOUS_INT 255 + +#ifndef LOCORE + +#define APIC_IPI_DEST_SELF -1 +#define APIC_IPI_DEST_ALL -2 +#define APIC_IPI_DEST_OTHERS -3 + +#define APIC_BUS_UNKNOWN -1 +#define APIC_BUS_ISA 0 +#define APIC_BUS_EISA 1 +#define APIC_BUS_PCI 2 +#define APIC_BUS_MAX APIC_BUS_PCI + +#define IRQ_EXTINT (NUM_IO_INTS + 1) +#define IRQ_NMI (NUM_IO_INTS + 2) +#define IRQ_SMI (NUM_IO_INTS + 3) +#define IRQ_DISABLED (NUM_IO_INTS + 4) + +/* + * An APIC enumerator is a pseudo bus driver that enumerates APICs, including + * CPUs and I/O APICs. + */ +struct apic_enumerator { + const char *apic_name; + int (*apic_probe)(void); + int (*apic_probe_cpus)(void); + int (*apic_setup_local)(void); + int (*apic_setup_io)(void); + SLIST_ENTRY(apic_enumerator) apic_next; +}; + +inthand_t + IDTVEC(apic_isr1), IDTVEC(apic_isr2), IDTVEC(apic_isr3), + IDTVEC(apic_isr4), IDTVEC(apic_isr5), IDTVEC(apic_isr6), + IDTVEC(apic_isr7), IDTVEC(cmcint), IDTVEC(errorint), + IDTVEC(spuriousint), IDTVEC(timerint); + +extern vm_paddr_t lapic_paddr; +extern int apic_cpuids[]; + +void apic_register_enumerator(struct apic_enumerator *enumerator); +void *ioapic_create(vm_paddr_t addr, int32_t apic_id, int intbase); +int ioapic_disable_pin(void *cookie, u_int pin); +int ioapic_get_vector(void *cookie, u_int pin); +void ioapic_register(void *cookie); +int ioapic_remap_vector(void *cookie, u_int pin, int vector); +int ioapic_set_bus(void *cookie, u_int pin, int bus_type); +int ioapic_set_extint(void *cookie, u_int pin); +int ioapic_set_nmi(void *cookie, u_int pin); +int ioapic_set_polarity(void *cookie, u_int pin, enum intr_polarity pol); +int ioapic_set_triggermode(void *cookie, u_int pin, + enum intr_trigger trigger); +int ioapic_set_smi(void *cookie, u_int pin); + +/* + * Struct containing pointers to APIC functions whose + * implementation is run time selectable.
+ */ +struct apic_ops { + void (*create)(u_int, int); + void (*init)(vm_paddr_t); + void (*xapic_mode)(void); + void (*setup)(int); + void (*dump)(const char *); + void (*disable)(void); + void (*eoi)(void); + int (*id)(void); + int (*intr_pending)(u_int); + void (*set_logical_id)(u_int, u_int, u_int); + u_int (*cpuid)(u_int); + + /* Vectors */ + u_int (*alloc_vector)(u_int, u_int); + u_int (*alloc_vectors)(u_int, u_int *, u_int, u_int); + void (*enable_vector)(u_int, u_int); + void (*disable_vector)(u_int, u_int); + void (*free_vector)(u_int, u_int, u_int); + + + /* PMC */ + int (*enable_pmc)(void); + void (*disable_pmc)(void); + void (*reenable_pmc)(void); + + /* CMC */ + void (*enable_cmc)(void); + + /* IPI */ + void (*ipi_raw)(register_t, u_int); + void (*ipi_vectored)(u_int, int); + int (*ipi_wait)(int); + int (*ipi_alloc)(inthand_t *ipifunc); + void (*ipi_free)(int vector); + + /* LVT */ + int (*set_lvt_mask)(u_int, u_int, u_char); + int (*set_lvt_mode)(u_int, u_int, u_int32_t); + int (*set_lvt_polarity)(u_int, u_int, enum intr_polarity); + int (*set_lvt_triggermode)(u_int, u_int, enum intr_trigger); +}; + +extern struct apic_ops apic_ops; + +static inline void +lapic_create(u_int apic_id, int boot_cpu) +{ + + apic_ops.create(apic_id, boot_cpu); +} + +static inline void +lapic_init(vm_paddr_t addr) +{ + + apic_ops.init(addr); +} + +static inline void +lapic_xapic_mode(void) +{ + + apic_ops.xapic_mode(); +} + +static inline void +lapic_setup(int boot) +{ + + apic_ops.setup(boot); +} + +static inline void +lapic_dump(const char *str) +{ + + apic_ops.dump(str); +} + +static inline void +lapic_disable(void) +{ + + apic_ops.disable(); +} + +static inline void +lapic_eoi(void) +{ + + apic_ops.eoi(); +} + +static inline int +lapic_id(void) +{ + + return (apic_ops.id()); +} + +static inline int +lapic_intr_pending(u_int vector) +{ + + return (apic_ops.intr_pending(vector)); +} + +/* XXX: UNUSED */ +static inline void +lapic_set_logical_id(u_int apic_id, u_int cluster, u_int cluster_id) +{ + + apic_ops.set_logical_id(apic_id, cluster, cluster_id); +} + +static inline u_int +apic_cpuid(u_int apic_id) +{ + + return (apic_ops.cpuid(apic_id)); +} + +static inline u_int +apic_alloc_vector(u_int apic_id, u_int irq) +{ + + return (apic_ops.alloc_vector(apic_id, irq)); +} + +static inline u_int +apic_alloc_vectors(u_int apic_id, u_int *irqs, u_int count, u_int align) +{ + + return (apic_ops.alloc_vectors(apic_id, irqs, count, align)); +} + +static inline void +apic_enable_vector(u_int apic_id, u_int vector) +{ + + apic_ops.enable_vector(apic_id, vector); +} + +static inline void +apic_disable_vector(u_int apic_id, u_int vector) +{ + + apic_ops.disable_vector(apic_id, vector); +} + +static inline void +apic_free_vector(u_int apic_id, u_int vector, u_int irq) +{ + + apic_ops.free_vector(apic_id, vector, irq); +} + +static inline int +lapic_enable_pmc(void) +{ + + return (apic_ops.enable_pmc()); +} + +static inline void +lapic_disable_pmc(void) +{ + + apic_ops.disable_pmc(); +} + +static inline void +lapic_reenable_pmc(void) +{ + + apic_ops.reenable_pmc(); +} + +static inline void +lapic_enable_cmc(void) +{ + + apic_ops.enable_cmc(); +} + +static inline void +lapic_ipi_raw(register_t icrlo, u_int dest) +{ + + apic_ops.ipi_raw(icrlo, dest); +} + +static inline void +lapic_ipi_vectored(u_int vector, int dest) +{ + + apic_ops.ipi_vectored(vector, dest); +} + +static inline int +lapic_ipi_wait(int delay) +{ + + return (apic_ops.ipi_wait(delay)); +} + +static inline int +lapic_ipi_alloc(inthand_t *ipifunc) 
+{ + + return (apic_ops.ipi_alloc(ipifunc)); +} + +static inline void +lapic_ipi_free(int vector) +{ + + return (apic_ops.ipi_free(vector)); +} + +static inline int +lapic_set_lvt_mask(u_int apic_id, u_int lvt, u_char masked) +{ + + return (apic_ops.set_lvt_mask(apic_id, lvt, masked)); +} + +static inline int +lapic_set_lvt_mode(u_int apic_id, u_int lvt, u_int32_t mode) +{ + + return (apic_ops.set_lvt_mode(apic_id, lvt, mode)); +} + +static inline int +lapic_set_lvt_polarity(u_int apic_id, u_int lvt, enum intr_polarity pol) +{ + + return (apic_ops.set_lvt_polarity(apic_id, lvt, pol)); +} + +static inline int +lapic_set_lvt_triggermode(u_int apic_id, u_int lvt, enum intr_trigger trigger) +{ + + return (apic_ops.set_lvt_triggermode(apic_id, lvt, trigger)); +} + +void lapic_handle_cmc(void); +void lapic_handle_error(void); +void lapic_handle_intr(int vector, struct trapframe *frame); +void lapic_handle_timer(struct trapframe *frame); + +extern int x2apic_mode; +extern int lapic_eoi_suppression; + +#ifdef _SYS_SYSCTL_H_ +SYSCTL_DECL(_hw_apic); +#endif + +#endif /* !LOCORE */ +#endif /* _X86_APICVAR_H_ */ diff -u -r -N usr/src/sys/modules/netmap/x86/apm_bios.h /usr/src/sys/modules/netmap/x86/apm_bios.h --- usr/src/sys/modules/netmap/x86/apm_bios.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/apm_bios.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,264 @@ +/*- + * APM (Advanced Power Management) BIOS Device Driver + * + * Copyright (c) 1994-1995 by HOSOKAWA, Tatsumi <hosokawa@mt.cs.keio.ac.jp> + * + * This software may be used, modified, copied, and distributed, in + * both source and binary form provided that the above copyright and + * these terms are retained. Under no circumstances is the author + * responsible for the proper functioning of this software, nor does + * the author assume any responsibility for damages incurred with its + * use. 
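Worth noting about the apicvar.h wrappers above: every lapic_*() call in the tree compiles down to one indirect call through the global apic_ops table, which is what lets the kernel select xAPIC or x2APIC register access at boot without touching any caller. One way a backend could fill a member (a sketch of the pattern, not necessarily how FreeBSD structures it; wrmsr() is the standard cpufunc.h helper):

static void
x2apic_eoi(void)
{
        /* In x2APIC mode EOI is an MSR write, not an MMIO store. */
        wrmsr(0x800 + LAPIC_EOI, 0);    /* MSR 0x80b */
}

With apic_ops.eoi pointed at a function like this during boot, lapic_eoi() everywhere dispatches to the right implementation.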
+ * + * Aug, 1994 Implemented on FreeBSD 1.1.5.1R (Toshiba AVS001WD) + * + * $FreeBSD: releng/11.0/sys/x86/include/apm_bios.h 215140 2010-11-11 19:36:21Z jkim $ + */ + +#ifndef _X86_APM_BIOS_H_ +#define _X86_APM_BIOS_H_ + +#ifndef _KERNEL +#include <sys/types.h> +#endif +#include <sys/ioccom.h> + +/* BIOS id */ +#define APM_BIOS 0x53 +#define APM_INT 0x15 + +/* APM flags */ +#define APM_16BIT_SUPPORT 0x01 +#define APM_32BIT_SUPPORT 0x02 +#define APM_CPUIDLE_SLOW 0x04 +#define APM_DISABLED 0x08 +#define APM_DISENGAGED 0x10 + +/* APM initializer physical address */ +#define APM_OURADDR 0x00080000 + +/* APM functions */ +#define APM_INSTCHECK 0x00 +#define APM_REALCONNECT 0x01 +#define APM_PROT16CONNECT 0x02 +#define APM_PROT32CONNECT 0x03 +#define APM_DISCONNECT 0x04 +#define APM_CPUIDLE 0x05 +#define APM_CPUBUSY 0x06 +#define APM_SETPWSTATE 0x07 +#define APM_ENABLEDISABLEPM 0x08 +#define APM_RESTOREDEFAULT 0x09 +#define APM_GETPWSTATUS 0x0a +#define APM_GETPMEVENT 0x0b +#define APM_GETPWSTATE 0x0c +#define APM_ENABLEDISABLEDPM 0x0d +#define APM_DRVVERSION 0x0e +#define APM_ENGAGEDISENGAGEPM 0x0f +#define APM_GETCAPABILITIES 0x10 +#define APM_RESUMETIMER 0x11 +#define APM_RESUMEONRING 0x12 +#define APM_TIMERREQUESTS 0x13 +#define APM_OEMFUNC 0x80 + +/* error code */ +#define APME_OK 0x00 +#define APME_PMDISABLED 0x01 +#define APME_REALESTABLISHED 0x02 +#define APME_NOTCONNECTED 0x03 +#define APME_PROT16ESTABLISHED 0x05 +#define APME_PROT16NOTSUPPORTED 0x06 +#define APME_PROT32ESTABLISHED 0x07 +#define APME_PROT32NOTDUPPORTED 0x08 +#define APME_UNKNOWNDEVICEID 0x09 +#define APME_OUTOFRANGE 0x0a +#define APME_NOTENGAGED 0x0b +#define APME_CANTENTERSTATE 0x60 +#define APME_NOPMEVENT 0x80 +#define APME_NOAPMPRESENT 0x86 + + +/* device code */ +#define PMDV_APMBIOS 0x0000 +#define PMDV_ALLDEV 0x0001 +#define PMDV_DISP0 0x0100 +#define PMDV_DISP1 0x0101 +#define PMDV_DISPALL 0x01ff +#define PMDV_2NDSTORAGE0 0x0200 +#define PMDV_2NDSTORAGE1 0x0201 +#define PMDV_2NDSTORAGE2 0x0202 +#define PMDV_2NDSTORAGE3 0x0203 +#define PMDV_PARALLEL0 0x0300 +#define PMDV_PARALLEL1 0x0301 +#define PMDV_SERIAL0 0x0400 +#define PMDV_SERIAL1 0x0401 +#define PMDV_SERIAL2 0x0402 +#define PMDV_SERIAL3 0x0403 +#define PMDV_SERIAL4 0x0404 +#define PMDV_SERIAL5 0x0405 +#define PMDV_SERIAL6 0x0406 +#define PMDV_SERIAL7 0x0407 +#define PMDV_NET0 0x0500 +#define PMDV_NET1 0x0501 +#define PMDV_NET2 0x0502 +#define PMDV_NET3 0x0503 +#define PMDV_PCMCIA0 0x0600 +#define PMDV_PCMCIA1 0x0601 +#define PMDV_PCMCIA2 0x0602 +#define PMDV_PCMCIA3 0x0603 +/* 0x0700 - 0x7fff Reserved */ +#define PMDV_BATT_BASE 0x8000 +#define PMDV_BATT0 0x8001 +#define PMDV_BATT1 0x8002 +#define PMDV_BATT_ALL 0x80ff +/* 0x8100 - 0xdfff Reserved */ +/* 0xe000 - 0xefff OEM-defined power device IDs */ +/* 0xf000 - 0xffff Reserved */ + +/* Power state */ +#define PMST_APMENABLED 0x0000 +#define PMST_STANDBY 0x0001 +#define PMST_SUSPEND 0x0002 +#define PMST_OFF 0x0003 +#define PMST_LASTREQNOTIFY 0x0004 +#define PMST_LASTREQREJECT 0x0005 +/* 0x0006 - 0x001f Reserved system states */ +/* 0x0020 - 0x003f OEM-defined system states */ +/* 0x0040 - 0x007f OEM-defined device states */ +/* 0x0080 - 0xffff Reserved device states */ + +#if !defined(ASSEMBLER) && !defined(INITIALIZER) + +/* C definitions */ +struct apmhook { + struct apmhook *ah_next; + int (*ah_fun)(void *ah_arg); + void *ah_arg; + const char *ah_name; + int ah_order; +}; +#define APM_HOOK_NONE (-1) +#define APM_HOOK_SUSPEND 0 +#define APM_HOOK_RESUME 1 +#define NAPM_HOOK 2 + +#ifdef _KERNEL + +void 
apm_suspend(int state); +struct apmhook *apm_hook_establish (int apmh, struct apmhook *); +void apm_hook_disestablish (int apmh, struct apmhook *); +void apm_cpu_idle(void); +void apm_cpu_busy(void); + +#endif + +#endif /* !ASSEMBLER && !INITIALIZER */ + +#define APM_MIN_ORDER 0x00 +#define APM_MID_ORDER 0x80 +#define APM_MAX_ORDER 0xff + +/* power management event code */ +#define PMEV_NOEVENT 0x0000 +#define PMEV_STANDBYREQ 0x0001 +#define PMEV_SUSPENDREQ 0x0002 +#define PMEV_NORMRESUME 0x0003 +#define PMEV_CRITRESUME 0x0004 +#define PMEV_BATTERYLOW 0x0005 +#define PMEV_POWERSTATECHANGE 0x0006 +#define PMEV_UPDATETIME 0x0007 +#define PMEV_CRITSUSPEND 0x0008 +#define PMEV_USERSTANDBYREQ 0x0009 +#define PMEV_USERSUSPENDREQ 0x000a +#define PMEV_STANDBYRESUME 0x000b +#define PMEV_CAPABILITIESCHANGE 0x000c +/* 0x000d - 0x00ff Reserved system events */ +/* 0x0100 - 0x01ff Reserved device events */ +/* 0x0200 - 0x02ff OEM-defined APM events */ +/* 0x0300 - 0xffff Reserved */ +#define PMEV_DEFAULT 0xffffffff /* used for customization */ + +#if !defined(ASSEMBLER) && !defined(INITIALIZER) + +/* + * Old apm_info structure, returned by the APMIO_GETINFO_OLD ioctl. This + * is for backward compatibility with old executables. + */ +typedef struct apm_info_old { + u_int ai_major; /* APM major version */ + u_int ai_minor; /* APM minor version */ + u_int ai_acline; /* AC line status */ + u_int ai_batt_stat; /* Battery status */ + u_int ai_batt_life; /* Remaining battery life */ + u_int ai_status; /* Status of APM support (enabled/disabled) */ +} *apm_info_old_t; + +/* + * Structure returned by the APMIO_GETINFO ioctl. + * + * In the comments below, the parenthesized numbers indicate the minimum + * value of ai_infoversion for which each field is valid. + */ +typedef struct apm_info { + u_int ai_infoversion; /* Indicates which fields are valid */ + u_int ai_major; /* APM major version (0) */ + u_int ai_minor; /* APM minor version (0) */ + u_int ai_acline; /* AC line status (0) */ + u_int ai_batt_stat; /* Battery status (0) */ + u_int ai_batt_life; /* Remaining battery life in percent (0) */ + int ai_batt_time; /* Remaining battery time in seconds (0) */ + u_int ai_status; /* True if enabled (0) */ + u_int ai_batteries; /* Number of batteries (1) */ + u_int ai_capabilities;/* APM Capabilities (1) */ + u_int ai_spare[6]; /* For future expansion */ +} *apm_info_t; + +/* Battery flag */ +#define APM_BATT_HIGH 0x01 +#define APM_BATT_LOW 0x02 +#define APM_BATT_CRITICAL 0x04 +#define APM_BATT_CHARGING 0x08 +#define APM_BATT_NOT_PRESENT 0x10 +#define APM_BATT_NO_SYSTEM 0x80 + +typedef struct apm_pwstatus { + u_int ap_device; /* Device code of battery */ + u_int ap_acline; /* AC line status (0) */ + u_int ap_batt_stat; /* Battery status (0) */ + u_int ap_batt_flag; /* Battery flag (0) */ + u_int ap_batt_life; /* Remaining battery life in percent (0) */ + int ap_batt_time; /* Remaining battery time in seconds (0) */ +} *apm_pwstatus_t; + +struct apm_bios_arg { + uint32_t eax; + uint32_t ebx; + uint32_t ecx; + uint32_t edx; + uint32_t esi; + uint32_t edi; +}; + +struct apm_event_info { + u_int type; + u_int index; + u_int spare[8]; +}; + +#define APMIO_SUSPEND _IO('P', 1) +#define APMIO_GETINFO_OLD _IOR('P', 2, struct apm_info_old) +#define APMIO_ENABLE _IO('P', 5) +#define APMIO_DISABLE _IO('P', 6) +#define APMIO_HALTCPU _IO('P', 7) +#define APMIO_NOTHALTCPU _IO('P', 8) +#define APMIO_DISPLAY _IOW('P', 9, int) +#define APMIO_BIOS _IOWR('P', 10, struct apm_bios_arg) +#define APMIO_GETINFO _IOR('P', 11, struct 
apm_info) +#define APMIO_STANDBY _IO('P', 12) +#define APMIO_GETPWSTATUS _IOWR('P', 13, struct apm_pwstatus) +/* for /dev/apmctl */ +#define APMIO_NEXTEVENT _IOR('A', 100, struct apm_event_info) +#define APMIO_REJECTLASTREQ _IO('P', 101) + +#endif /* !ASSEMBLER && !INITIALIZER */ + +#endif /* !_X86_APM_BIOS_H_ */ diff -u -r -N usr/src/sys/modules/netmap/x86/bus.h /usr/src/sys/modules/netmap/x86/bus.h --- usr/src/sys/modules/netmap/x86/bus.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/bus.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,1104 @@ +/*- + * Copyright (c) KATO Takenori, 1999. + * + * All rights reserved. Unpublished rights reserved under the copyright + * laws of Japan. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer as + * the first lines of this file unmodified. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 3. The name of the author may not be used to endorse or promote products + * derived from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + * + * $FreeBSD: releng/11.0/sys/x86/include/bus.h 286667 2015-08-12 15:26:32Z marcel $ + */ + +/* $NetBSD: bus.h,v 1.12 1997/10/01 08:25:15 fvdl Exp $ */ + +/*- + * Copyright (c) 1996, 1997 The NetBSD Foundation, Inc. + * All rights reserved. + * + * This code is derived from software contributed to The NetBSD Foundation + * by Jason R. Thorpe of the Numerical Aerospace Simulation Facility, + * NASA Ames Research Center. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS + * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR + * PURPOSE ARE DISCLAIMED. 
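The APM ioctls defined above are consumed from userland through /dev/apm; apm(8) is essentially a thin client of APMIO_GETINFO. A minimal example (illustrative only, not part of the patch):

#include <machine/apm_bios.h>
#include <sys/ioctl.h>
#include <fcntl.h>
#include <stdio.h>

int
main(void)
{
        struct apm_info info;
        int fd;

        if ((fd = open("/dev/apm", O_RDONLY)) < 0)
                return (1);
        if (ioctl(fd, APMIO_GETINFO, &info) == 0)
                printf("AC line %u, battery %u%%\n",
                    info.ai_acline, info.ai_batt_life);
        return (0);
}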
IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS + * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. + */ + +/*- + * Copyright (c) 1996 Charles M. Hannum. All rights reserved. + * Copyright (c) 1996 Christopher G. Demetriou. All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 3. All advertising materials mentioning features or use of this software + * must display the following acknowledgement: + * This product includes software developed by Christopher G. Demetriou + * for the NetBSD Project. + * 4. The name of the author may not be used to endorse or promote products + * derived from this software without specific prior written permission + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef _X86_BUS_H_ +#define _X86_BUS_H_ + +#include <machine/_bus.h> +#include <machine/cpufunc.h> + +#ifndef __GNUCLIKE_ASM +# ifndef lint +# error "no assembler code for your compiler" +# endif +#endif + +/* + * Values for the x86 bus space tag, not to be used directly by MI code. + */ +#define X86_BUS_SPACE_IO 0 /* space is i/o space */ +#define X86_BUS_SPACE_MEM 1 /* space is mem space */ + +#define BUS_SPACE_MAXSIZE_24BIT 0xFFFFFF +#define BUS_SPACE_MAXSIZE_32BIT 0xFFFFFFFF +#define BUS_SPACE_MAXSIZE 0xFFFFFFFF +#define BUS_SPACE_MAXADDR_24BIT 0xFFFFFF +#define BUS_SPACE_MAXADDR_32BIT 0xFFFFFFFF +#if defined(__amd64__) || defined(PAE) +#define BUS_SPACE_MAXADDR 0xFFFFFFFFFFFFFFFFULL +#else +#define BUS_SPACE_MAXADDR 0xFFFFFFFF +#endif + +#define BUS_SPACE_INVALID_DATA (~0) +#define BUS_SPACE_UNRESTRICTED (~0) + +/* + * Map a region of device bus space into CPU virtual address space. + */ + +int bus_space_map(bus_space_tag_t tag, bus_addr_t addr, bus_size_t size, + int flags, bus_space_handle_t *bshp); + +/* + * Unmap a region of device bus space. 
+ */ + +void bus_space_unmap(bus_space_tag_t tag, bus_space_handle_t bsh, + bus_size_t size); + +/* + * Get a new handle for a subregion of an already-mapped area of bus space. + */ + +static __inline int bus_space_subregion(bus_space_tag_t t, + bus_space_handle_t bsh, + bus_size_t offset, bus_size_t size, + bus_space_handle_t *nbshp); + +static __inline int +bus_space_subregion(bus_space_tag_t t __unused, bus_space_handle_t bsh, + bus_size_t offset, bus_size_t size __unused, + bus_space_handle_t *nbshp) +{ + + *nbshp = bsh + offset; + return (0); +} + +/* + * Allocate a region of memory that is accessible to devices in bus space. + */ + +int bus_space_alloc(bus_space_tag_t t, bus_addr_t rstart, + bus_addr_t rend, bus_size_t size, bus_size_t align, + bus_size_t boundary, int flags, bus_addr_t *addrp, + bus_space_handle_t *bshp); + +/* + * Free a region of bus space accessible memory. + */ + +static __inline void bus_space_free(bus_space_tag_t t, bus_space_handle_t bsh, + bus_size_t size); + +static __inline void +bus_space_free(bus_space_tag_t t __unused, bus_space_handle_t bsh __unused, + bus_size_t size __unused) +{ +} + + +/* + * Read a 1, 2, 4, or 8 byte quantity from bus space + * described by tag/handle/offset. + */ +static __inline u_int8_t bus_space_read_1(bus_space_tag_t tag, + bus_space_handle_t handle, + bus_size_t offset); + +static __inline u_int16_t bus_space_read_2(bus_space_tag_t tag, + bus_space_handle_t handle, + bus_size_t offset); + +static __inline u_int32_t bus_space_read_4(bus_space_tag_t tag, + bus_space_handle_t handle, + bus_size_t offset); + +#ifdef __amd64__ +static __inline uint64_t bus_space_read_8(bus_space_tag_t tag, + bus_space_handle_t handle, + bus_size_t offset); +#endif + +static __inline u_int8_t +bus_space_read_1(bus_space_tag_t tag, bus_space_handle_t handle, + bus_size_t offset) +{ + + if (tag == X86_BUS_SPACE_IO) + return (inb(handle + offset)); + return (*(volatile u_int8_t *)(handle + offset)); +} + +static __inline u_int16_t +bus_space_read_2(bus_space_tag_t tag, bus_space_handle_t handle, + bus_size_t offset) +{ + + if (tag == X86_BUS_SPACE_IO) + return (inw(handle + offset)); + return (*(volatile u_int16_t *)(handle + offset)); +} + +static __inline u_int32_t +bus_space_read_4(bus_space_tag_t tag, bus_space_handle_t handle, + bus_size_t offset) +{ + + if (tag == X86_BUS_SPACE_IO) + return (inl(handle + offset)); + return (*(volatile u_int32_t *)(handle + offset)); +} + +#ifdef __amd64__ +static __inline uint64_t +bus_space_read_8(bus_space_tag_t tag, bus_space_handle_t handle, + bus_size_t offset) +{ + + if (tag == X86_BUS_SPACE_IO) /* No 8 byte IO space access on x86 */ + return (BUS_SPACE_INVALID_DATA); + return (*(volatile uint64_t *)(handle + offset)); +} +#endif + +/* + * Read `count' 1, 2, 4, or 8 byte quantities from bus space + * described by tag/handle/offset and copy into buffer provided. 
+ */ +static __inline void bus_space_read_multi_1(bus_space_tag_t tag, + bus_space_handle_t bsh, + bus_size_t offset, u_int8_t *addr, + size_t count); + +static __inline void bus_space_read_multi_2(bus_space_tag_t tag, + bus_space_handle_t bsh, + bus_size_t offset, u_int16_t *addr, + size_t count); + +static __inline void bus_space_read_multi_4(bus_space_tag_t tag, + bus_space_handle_t bsh, + bus_size_t offset, u_int32_t *addr, + size_t count); + +static __inline void +bus_space_read_multi_1(bus_space_tag_t tag, bus_space_handle_t bsh, + bus_size_t offset, u_int8_t *addr, size_t count) +{ + + if (tag == X86_BUS_SPACE_IO) + insb(bsh + offset, addr, count); + else { +#ifdef __GNUCLIKE_ASM + __asm __volatile(" \n\ + cld \n\ + 1: movb (%2),%%al \n\ + stosb \n\ + loop 1b" : + "=D" (addr), "=c" (count) : + "r" (bsh + offset), "0" (addr), "1" (count) : + "%eax", "memory"); +#endif + } +} + +static __inline void +bus_space_read_multi_2(bus_space_tag_t tag, bus_space_handle_t bsh, + bus_size_t offset, u_int16_t *addr, size_t count) +{ + + if (tag == X86_BUS_SPACE_IO) + insw(bsh + offset, addr, count); + else { +#ifdef __GNUCLIKE_ASM + __asm __volatile(" \n\ + cld \n\ + 1: movw (%2),%%ax \n\ + stosw \n\ + loop 1b" : + "=D" (addr), "=c" (count) : + "r" (bsh + offset), "0" (addr), "1" (count) : + "%eax", "memory"); +#endif + } +} + +static __inline void +bus_space_read_multi_4(bus_space_tag_t tag, bus_space_handle_t bsh, + bus_size_t offset, u_int32_t *addr, size_t count) +{ + + if (tag == X86_BUS_SPACE_IO) + insl(bsh + offset, addr, count); + else { +#ifdef __GNUCLIKE_ASM + __asm __volatile(" \n\ + cld \n\ + 1: movl (%2),%%eax \n\ + stosl \n\ + loop 1b" : + "=D" (addr), "=c" (count) : + "r" (bsh + offset), "0" (addr), "1" (count) : + "%eax", "memory"); +#endif + } +} + +#if 0 /* Cause a link error for bus_space_read_multi_8 */ +#define bus_space_read_multi_8 !!! bus_space_read_multi_8 unimplemented !!! +#endif + +/* + * Read `count' 1, 2, 4, or 8 byte quantities from bus space + * described by tag/handle and starting at `offset' and copy into + * buffer provided. 
+ */ +static __inline void bus_space_read_region_1(bus_space_tag_t tag, + bus_space_handle_t bsh, + bus_size_t offset, u_int8_t *addr, + size_t count); + +static __inline void bus_space_read_region_2(bus_space_tag_t tag, + bus_space_handle_t bsh, + bus_size_t offset, u_int16_t *addr, + size_t count); + +static __inline void bus_space_read_region_4(bus_space_tag_t tag, + bus_space_handle_t bsh, + bus_size_t offset, u_int32_t *addr, + size_t count); + + +static __inline void +bus_space_read_region_1(bus_space_tag_t tag, bus_space_handle_t bsh, + bus_size_t offset, u_int8_t *addr, size_t count) +{ + + if (tag == X86_BUS_SPACE_IO) { + int _port_ = bsh + offset; +#ifdef __GNUCLIKE_ASM + __asm __volatile(" \n\ + cld \n\ + 1: inb %w2,%%al \n\ + stosb \n\ + incl %2 \n\ + loop 1b" : + "=D" (addr), "=c" (count), "=d" (_port_) : + "0" (addr), "1" (count), "2" (_port_) : + "%eax", "memory", "cc"); +#endif + } else { + bus_space_handle_t _port_ = bsh + offset; +#ifdef __GNUCLIKE_ASM + __asm __volatile(" \n\ + cld \n\ + repne \n\ + movsb" : + "=D" (addr), "=c" (count), "=S" (_port_) : + "0" (addr), "1" (count), "2" (_port_) : + "memory", "cc"); +#endif + } +} + +static __inline void +bus_space_read_region_2(bus_space_tag_t tag, bus_space_handle_t bsh, + bus_size_t offset, u_int16_t *addr, size_t count) +{ + + if (tag == X86_BUS_SPACE_IO) { + int _port_ = bsh + offset; +#ifdef __GNUCLIKE_ASM + __asm __volatile(" \n\ + cld \n\ + 1: inw %w2,%%ax \n\ + stosw \n\ + addl $2,%2 \n\ + loop 1b" : + "=D" (addr), "=c" (count), "=d" (_port_) : + "0" (addr), "1" (count), "2" (_port_) : + "%eax", "memory", "cc"); +#endif + } else { + bus_space_handle_t _port_ = bsh + offset; +#ifdef __GNUCLIKE_ASM + __asm __volatile(" \n\ + cld \n\ + repne \n\ + movsw" : + "=D" (addr), "=c" (count), "=S" (_port_) : + "0" (addr), "1" (count), "2" (_port_) : + "memory", "cc"); +#endif + } +} + +static __inline void +bus_space_read_region_4(bus_space_tag_t tag, bus_space_handle_t bsh, + bus_size_t offset, u_int32_t *addr, size_t count) +{ + + if (tag == X86_BUS_SPACE_IO) { + int _port_ = bsh + offset; +#ifdef __GNUCLIKE_ASM + __asm __volatile(" \n\ + cld \n\ + 1: inl %w2,%%eax \n\ + stosl \n\ + addl $4,%2 \n\ + loop 1b" : + "=D" (addr), "=c" (count), "=d" (_port_) : + "0" (addr), "1" (count), "2" (_port_) : + "%eax", "memory", "cc"); +#endif + } else { + bus_space_handle_t _port_ = bsh + offset; +#ifdef __GNUCLIKE_ASM + __asm __volatile(" \n\ + cld \n\ + repne \n\ + movsl" : + "=D" (addr), "=c" (count), "=S" (_port_) : + "0" (addr), "1" (count), "2" (_port_) : + "memory", "cc"); +#endif + } +} + +#if 0 /* Cause a link error for bus_space_read_region_8 */ +#define bus_space_read_region_8 !!! bus_space_read_region_8 unimplemented !!! +#endif + +/* + * Write the 1, 2, 4, or 8 byte value `value' to bus space + * described by tag/handle/offset. 
+ */ + +static __inline void bus_space_write_1(bus_space_tag_t tag, + bus_space_handle_t bsh, + bus_size_t offset, u_int8_t value); + +static __inline void bus_space_write_2(bus_space_tag_t tag, + bus_space_handle_t bsh, + bus_size_t offset, u_int16_t value); + +static __inline void bus_space_write_4(bus_space_tag_t tag, + bus_space_handle_t bsh, + bus_size_t offset, u_int32_t value); + +#ifdef __amd64__ +static __inline void bus_space_write_8(bus_space_tag_t tag, + bus_space_handle_t bsh, + bus_size_t offset, uint64_t value); +#endif + +static __inline void +bus_space_write_1(bus_space_tag_t tag, bus_space_handle_t bsh, + bus_size_t offset, u_int8_t value) +{ + + if (tag == X86_BUS_SPACE_IO) + outb(bsh + offset, value); + else + *(volatile u_int8_t *)(bsh + offset) = value; +} + +static __inline void +bus_space_write_2(bus_space_tag_t tag, bus_space_handle_t bsh, + bus_size_t offset, u_int16_t value) +{ + + if (tag == X86_BUS_SPACE_IO) + outw(bsh + offset, value); + else + *(volatile u_int16_t *)(bsh + offset) = value; +} + +static __inline void +bus_space_write_4(bus_space_tag_t tag, bus_space_handle_t bsh, + bus_size_t offset, u_int32_t value) +{ + + if (tag == X86_BUS_SPACE_IO) + outl(bsh + offset, value); + else + *(volatile u_int32_t *)(bsh + offset) = value; +} + +#ifdef __amd64__ +static __inline void +bus_space_write_8(bus_space_tag_t tag, bus_space_handle_t bsh, + bus_size_t offset, uint64_t value) +{ + + if (tag == X86_BUS_SPACE_IO) /* No 8 byte IO space access on x86 */ + return; + else + *(volatile uint64_t *)(bsh + offset) = value; +} +#endif + +/* + * Write `count' 1, 2, 4, or 8 byte quantities from the buffer + * provided to bus space described by tag/handle/offset. + */ + +static __inline void bus_space_write_multi_1(bus_space_tag_t tag, + bus_space_handle_t bsh, + bus_size_t offset, + const u_int8_t *addr, + size_t count); +static __inline void bus_space_write_multi_2(bus_space_tag_t tag, + bus_space_handle_t bsh, + bus_size_t offset, + const u_int16_t *addr, + size_t count); + +static __inline void bus_space_write_multi_4(bus_space_tag_t tag, + bus_space_handle_t bsh, + bus_size_t offset, + const u_int32_t *addr, + size_t count); + +static __inline void +bus_space_write_multi_1(bus_space_tag_t tag, bus_space_handle_t bsh, + bus_size_t offset, const u_int8_t *addr, size_t count) +{ + + if (tag == X86_BUS_SPACE_IO) + outsb(bsh + offset, addr, count); + else { +#ifdef __GNUCLIKE_ASM + __asm __volatile(" \n\ + cld \n\ + 1: lodsb \n\ + movb %%al,(%2) \n\ + loop 1b" : + "=S" (addr), "=c" (count) : + "r" (bsh + offset), "0" (addr), "1" (count) : + "%eax", "memory", "cc"); +#endif + } +} + +static __inline void +bus_space_write_multi_2(bus_space_tag_t tag, bus_space_handle_t bsh, + bus_size_t offset, const u_int16_t *addr, size_t count) +{ + + if (tag == X86_BUS_SPACE_IO) + outsw(bsh + offset, addr, count); + else { +#ifdef __GNUCLIKE_ASM + __asm __volatile(" \n\ + cld \n\ + 1: lodsw \n\ + movw %%ax,(%2) \n\ + loop 1b" : + "=S" (addr), "=c" (count) : + "r" (bsh + offset), "0" (addr), "1" (count) : + "%eax", "memory", "cc"); +#endif + } +} + +static __inline void +bus_space_write_multi_4(bus_space_tag_t tag, bus_space_handle_t bsh, + bus_size_t offset, const u_int32_t *addr, size_t count) +{ + + if (tag == X86_BUS_SPACE_IO) + outsl(bsh + offset, addr, count); + else { +#ifdef __GNUCLIKE_ASM + __asm __volatile(" \n\ + cld \n\ + 1: lodsl \n\ + movl %%eax,(%2) \n\ + loop 1b" : + "=S" (addr), "=c" (count) : + "r" (bsh + offset), "0" (addr), "1" (count) : + "%eax", "memory", "cc"); 
+#endif + } +} + +#if 0 /* Cause a link error for bus_space_write_multi_8 */ +#define bus_space_write_multi_8(t, h, o, a, c) \ + !!! bus_space_write_multi_8 unimplemented !!! +#endif + +/* + * Write `count' 1, 2, 4, or 8 byte quantities from the buffer provided + * to bus space described by tag/handle starting at `offset'. + */ + +static __inline void bus_space_write_region_1(bus_space_tag_t tag, + bus_space_handle_t bsh, + bus_size_t offset, + const u_int8_t *addr, + size_t count); +static __inline void bus_space_write_region_2(bus_space_tag_t tag, + bus_space_handle_t bsh, + bus_size_t offset, + const u_int16_t *addr, + size_t count); +static __inline void bus_space_write_region_4(bus_space_tag_t tag, + bus_space_handle_t bsh, + bus_size_t offset, + const u_int32_t *addr, + size_t count); + +static __inline void +bus_space_write_region_1(bus_space_tag_t tag, bus_space_handle_t bsh, + bus_size_t offset, const u_int8_t *addr, size_t count) +{ + + if (tag == X86_BUS_SPACE_IO) { + int _port_ = bsh + offset; +#ifdef __GNUCLIKE_ASM + __asm __volatile(" \n\ + cld \n\ + 1: lodsb \n\ + outb %%al,%w0 \n\ + incl %0 \n\ + loop 1b" : + "=d" (_port_), "=S" (addr), "=c" (count) : + "0" (_port_), "1" (addr), "2" (count) : + "%eax", "memory", "cc"); +#endif + } else { + bus_space_handle_t _port_ = bsh + offset; +#ifdef __GNUCLIKE_ASM + __asm __volatile(" \n\ + cld \n\ + repne \n\ + movsb" : + "=D" (_port_), "=S" (addr), "=c" (count) : + "0" (_port_), "1" (addr), "2" (count) : + "memory", "cc"); +#endif + } +} + +static __inline void +bus_space_write_region_2(bus_space_tag_t tag, bus_space_handle_t bsh, + bus_size_t offset, const u_int16_t *addr, size_t count) +{ + + if (tag == X86_BUS_SPACE_IO) { + int _port_ = bsh + offset; +#ifdef __GNUCLIKE_ASM + __asm __volatile(" \n\ + cld \n\ + 1: lodsw \n\ + outw %%ax,%w0 \n\ + addl $2,%0 \n\ + loop 1b" : + "=d" (_port_), "=S" (addr), "=c" (count) : + "0" (_port_), "1" (addr), "2" (count) : + "%eax", "memory", "cc"); +#endif + } else { + bus_space_handle_t _port_ = bsh + offset; +#ifdef __GNUCLIKE_ASM + __asm __volatile(" \n\ + cld \n\ + repne \n\ + movsw" : + "=D" (_port_), "=S" (addr), "=c" (count) : + "0" (_port_), "1" (addr), "2" (count) : + "memory", "cc"); +#endif + } +} + +static __inline void +bus_space_write_region_4(bus_space_tag_t tag, bus_space_handle_t bsh, + bus_size_t offset, const u_int32_t *addr, size_t count) +{ + + if (tag == X86_BUS_SPACE_IO) { + int _port_ = bsh + offset; +#ifdef __GNUCLIKE_ASM + __asm __volatile(" \n\ + cld \n\ + 1: lodsl \n\ + outl %%eax,%w0 \n\ + addl $4,%0 \n\ + loop 1b" : + "=d" (_port_), "=S" (addr), "=c" (count) : + "0" (_port_), "1" (addr), "2" (count) : + "%eax", "memory", "cc"); +#endif + } else { + bus_space_handle_t _port_ = bsh + offset; +#ifdef __GNUCLIKE_ASM + __asm __volatile(" \n\ + cld \n\ + repne \n\ + movsl" : + "=D" (_port_), "=S" (addr), "=c" (count) : + "0" (_port_), "1" (addr), "2" (count) : + "memory", "cc"); +#endif + } +} + +#if 0 /* Cause a link error for bus_space_write_region_8 */ +#define bus_space_write_region_8 \ + !!! bus_space_write_region_8 unimplemented !!! +#endif + +/* + * Write the 1, 2, 4, or 8 byte value `val' to bus space described + * by tag/handle/offset `count' times. 
+ */ + +static __inline void bus_space_set_multi_1(bus_space_tag_t tag, + bus_space_handle_t bsh, + bus_size_t offset, + u_int8_t value, size_t count); +static __inline void bus_space_set_multi_2(bus_space_tag_t tag, + bus_space_handle_t bsh, + bus_size_t offset, + u_int16_t value, size_t count); +static __inline void bus_space_set_multi_4(bus_space_tag_t tag, + bus_space_handle_t bsh, + bus_size_t offset, + u_int32_t value, size_t count); + +static __inline void +bus_space_set_multi_1(bus_space_tag_t tag, bus_space_handle_t bsh, + bus_size_t offset, u_int8_t value, size_t count) +{ + bus_space_handle_t addr = bsh + offset; + + if (tag == X86_BUS_SPACE_IO) + while (count--) + outb(addr, value); + else + while (count--) + *(volatile u_int8_t *)(addr) = value; +} + +static __inline void +bus_space_set_multi_2(bus_space_tag_t tag, bus_space_handle_t bsh, + bus_size_t offset, u_int16_t value, size_t count) +{ + bus_space_handle_t addr = bsh + offset; + + if (tag == X86_BUS_SPACE_IO) + while (count--) + outw(addr, value); + else + while (count--) + *(volatile u_int16_t *)(addr) = value; +} + +static __inline void +bus_space_set_multi_4(bus_space_tag_t tag, bus_space_handle_t bsh, + bus_size_t offset, u_int32_t value, size_t count) +{ + bus_space_handle_t addr = bsh + offset; + + if (tag == X86_BUS_SPACE_IO) + while (count--) + outl(addr, value); + else + while (count--) + *(volatile u_int32_t *)(addr) = value; +} + +#if 0 /* Cause a link error for bus_space_set_multi_8 */ +#define bus_space_set_multi_8 !!! bus_space_set_multi_8 unimplemented !!! +#endif + +/* + * Write `count' 1, 2, 4, or 8 byte value `val' to bus space described + * by tag/handle starting at `offset'. + */ + +static __inline void bus_space_set_region_1(bus_space_tag_t tag, + bus_space_handle_t bsh, + bus_size_t offset, u_int8_t value, + size_t count); +static __inline void bus_space_set_region_2(bus_space_tag_t tag, + bus_space_handle_t bsh, + bus_size_t offset, u_int16_t value, + size_t count); +static __inline void bus_space_set_region_4(bus_space_tag_t tag, + bus_space_handle_t bsh, + bus_size_t offset, u_int32_t value, + size_t count); + +static __inline void +bus_space_set_region_1(bus_space_tag_t tag, bus_space_handle_t bsh, + bus_size_t offset, u_int8_t value, size_t count) +{ + bus_space_handle_t addr = bsh + offset; + + if (tag == X86_BUS_SPACE_IO) + for (; count != 0; count--, addr++) + outb(addr, value); + else + for (; count != 0; count--, addr++) + *(volatile u_int8_t *)(addr) = value; +} + +static __inline void +bus_space_set_region_2(bus_space_tag_t tag, bus_space_handle_t bsh, + bus_size_t offset, u_int16_t value, size_t count) +{ + bus_space_handle_t addr = bsh + offset; + + if (tag == X86_BUS_SPACE_IO) + for (; count != 0; count--, addr += 2) + outw(addr, value); + else + for (; count != 0; count--, addr += 2) + *(volatile u_int16_t *)(addr) = value; +} + +static __inline void +bus_space_set_region_4(bus_space_tag_t tag, bus_space_handle_t bsh, + bus_size_t offset, u_int32_t value, size_t count) +{ + bus_space_handle_t addr = bsh + offset; + + if (tag == X86_BUS_SPACE_IO) + for (; count != 0; count--, addr += 4) + outl(addr, value); + else + for (; count != 0; count--, addr += 4) + *(volatile u_int32_t *)(addr) = value; +} + +#if 0 /* Cause a link error for bus_space_set_region_8 */ +#define bus_space_set_region_8 !!! bus_space_set_region_8 unimplemented !!! +#endif + +/* + * Copy `count' 1, 2, 4, or 8 byte values from bus space starting + * at tag/bsh1/off1 to bus space starting at tag/bsh2/off2. 
+ */ + +static __inline void bus_space_copy_region_1(bus_space_tag_t tag, + bus_space_handle_t bsh1, + bus_size_t off1, + bus_space_handle_t bsh2, + bus_size_t off2, size_t count); + +static __inline void bus_space_copy_region_2(bus_space_tag_t tag, + bus_space_handle_t bsh1, + bus_size_t off1, + bus_space_handle_t bsh2, + bus_size_t off2, size_t count); + +static __inline void bus_space_copy_region_4(bus_space_tag_t tag, + bus_space_handle_t bsh1, + bus_size_t off1, + bus_space_handle_t bsh2, + bus_size_t off2, size_t count); + +static __inline void +bus_space_copy_region_1(bus_space_tag_t tag, bus_space_handle_t bsh1, + bus_size_t off1, bus_space_handle_t bsh2, + bus_size_t off2, size_t count) +{ + bus_space_handle_t addr1 = bsh1 + off1; + bus_space_handle_t addr2 = bsh2 + off2; + + if (tag == X86_BUS_SPACE_IO) { + if (addr1 >= addr2) { + /* src after dest: copy forward */ + for (; count != 0; count--, addr1++, addr2++) + outb(addr2, inb(addr1)); + } else { + /* dest after src: copy backwards */ + for (addr1 += (count - 1), addr2 += (count - 1); + count != 0; count--, addr1--, addr2--) + outb(addr2, inb(addr1)); + } + } else { + if (addr1 >= addr2) { + /* src after dest: copy forward */ + for (; count != 0; count--, addr1++, addr2++) + *(volatile u_int8_t *)(addr2) = + *(volatile u_int8_t *)(addr1); + } else { + /* dest after src: copy backwards */ + for (addr1 += (count - 1), addr2 += (count - 1); + count != 0; count--, addr1--, addr2--) + *(volatile u_int8_t *)(addr2) = + *(volatile u_int8_t *)(addr1); + } + } +} + +static __inline void +bus_space_copy_region_2(bus_space_tag_t tag, bus_space_handle_t bsh1, + bus_size_t off1, bus_space_handle_t bsh2, + bus_size_t off2, size_t count) +{ + bus_space_handle_t addr1 = bsh1 + off1; + bus_space_handle_t addr2 = bsh2 + off2; + + if (tag == X86_BUS_SPACE_IO) { + if (addr1 >= addr2) { + /* src after dest: copy forward */ + for (; count != 0; count--, addr1 += 2, addr2 += 2) + outw(addr2, inw(addr1)); + } else { + /* dest after src: copy backwards */ + for (addr1 += 2 * (count - 1), addr2 += 2 * (count - 1); + count != 0; count--, addr1 -= 2, addr2 -= 2) + outw(addr2, inw(addr1)); + } + } else { + if (addr1 >= addr2) { + /* src after dest: copy forward */ + for (; count != 0; count--, addr1 += 2, addr2 += 2) + *(volatile u_int16_t *)(addr2) = + *(volatile u_int16_t *)(addr1); + } else { + /* dest after src: copy backwards */ + for (addr1 += 2 * (count - 1), addr2 += 2 * (count - 1); + count != 0; count--, addr1 -= 2, addr2 -= 2) + *(volatile u_int16_t *)(addr2) = + *(volatile u_int16_t *)(addr1); + } + } +} + +static __inline void +bus_space_copy_region_4(bus_space_tag_t tag, bus_space_handle_t bsh1, + bus_size_t off1, bus_space_handle_t bsh2, + bus_size_t off2, size_t count) +{ + bus_space_handle_t addr1 = bsh1 + off1; + bus_space_handle_t addr2 = bsh2 + off2; + + if (tag == X86_BUS_SPACE_IO) { + if (addr1 >= addr2) { + /* src after dest: copy forward */ + for (; count != 0; count--, addr1 += 4, addr2 += 4) + outl(addr2, inl(addr1)); + } else { + /* dest after src: copy backwards */ + for (addr1 += 4 * (count - 1), addr2 += 4 * (count - 1); + count != 0; count--, addr1 -= 4, addr2 -= 4) + outl(addr2, inl(addr1)); + } + } else { + if (addr1 >= addr2) { + /* src after dest: copy forward */ + for (; count != 0; count--, addr1 += 4, addr2 += 4) + *(volatile u_int32_t *)(addr2) = + *(volatile u_int32_t *)(addr1); + } else { + /* dest after src: copy backwards */ + for (addr1 += 4 * (count - 1), addr2 += 4 * (count - 1); + count != 0; count--, addr1 
-= 4, addr2 -= 4) + *(volatile u_int32_t *)(addr2) = + *(volatile u_int32_t *)(addr1); + } + } +} + +#if 0 /* Cause a link error for bus_space_copy_8 */ +#define bus_space_copy_region_8 !!! bus_space_copy_region_8 unimplemented !!! +#endif + +/* + * Bus read/write barrier methods. + * + * void bus_space_barrier(bus_space_tag_t tag, bus_space_handle_t bsh, + * bus_size_t offset, bus_size_t len, int flags); + * + * + * Note that BUS_SPACE_BARRIER_WRITE doesn't do anything other than + * prevent reordering by the compiler; all Intel x86 processors currently + * retire operations outside the CPU in program order. + */ +#define BUS_SPACE_BARRIER_READ 0x01 /* force read barrier */ +#define BUS_SPACE_BARRIER_WRITE 0x02 /* force write barrier */ + +static __inline void +bus_space_barrier(bus_space_tag_t tag __unused, bus_space_handle_t bsh __unused, + bus_size_t offset __unused, bus_size_t len __unused, int flags) +{ +#ifdef __GNUCLIKE_ASM + if (flags & BUS_SPACE_BARRIER_READ) +#ifdef __amd64__ + __asm __volatile("lock; addl $0,0(%%rsp)" : : : "memory"); +#else + __asm __volatile("lock; addl $0,0(%%esp)" : : : "memory"); +#endif + else + __compiler_membar(); +#endif +} + +#ifdef BUS_SPACE_NO_LEGACY +#undef inb +#undef outb +#define inb(a) compiler_error +#define inw(a) compiler_error +#define inl(a) compiler_error +#define outb(a, b) compiler_error +#define outw(a, b) compiler_error +#define outl(a, b) compiler_error +#endif + +#include <machine/bus_dma.h> + +/* + * Stream accesses are the same as normal accesses on x86; there are no + * supported bus systems with an endianess different from the host one. + */ +#define bus_space_read_stream_1(t, h, o) bus_space_read_1((t), (h), (o)) +#define bus_space_read_stream_2(t, h, o) bus_space_read_2((t), (h), (o)) +#define bus_space_read_stream_4(t, h, o) bus_space_read_4((t), (h), (o)) + +#define bus_space_read_multi_stream_1(t, h, o, a, c) \ + bus_space_read_multi_1((t), (h), (o), (a), (c)) +#define bus_space_read_multi_stream_2(t, h, o, a, c) \ + bus_space_read_multi_2((t), (h), (o), (a), (c)) +#define bus_space_read_multi_stream_4(t, h, o, a, c) \ + bus_space_read_multi_4((t), (h), (o), (a), (c)) + +#define bus_space_write_stream_1(t, h, o, v) \ + bus_space_write_1((t), (h), (o), (v)) +#define bus_space_write_stream_2(t, h, o, v) \ + bus_space_write_2((t), (h), (o), (v)) +#define bus_space_write_stream_4(t, h, o, v) \ + bus_space_write_4((t), (h), (o), (v)) + +#define bus_space_write_multi_stream_1(t, h, o, a, c) \ + bus_space_write_multi_1((t), (h), (o), (a), (c)) +#define bus_space_write_multi_stream_2(t, h, o, a, c) \ + bus_space_write_multi_2((t), (h), (o), (a), (c)) +#define bus_space_write_multi_stream_4(t, h, o, a, c) \ + bus_space_write_multi_4((t), (h), (o), (a), (c)) + +#define bus_space_set_multi_stream_1(t, h, o, v, c) \ + bus_space_set_multi_1((t), (h), (o), (v), (c)) +#define bus_space_set_multi_stream_2(t, h, o, v, c) \ + bus_space_set_multi_2((t), (h), (o), (v), (c)) +#define bus_space_set_multi_stream_4(t, h, o, v, c) \ + bus_space_set_multi_4((t), (h), (o), (v), (c)) + +#define bus_space_read_region_stream_1(t, h, o, a, c) \ + bus_space_read_region_1((t), (h), (o), (a), (c)) +#define bus_space_read_region_stream_2(t, h, o, a, c) \ + bus_space_read_region_2((t), (h), (o), (a), (c)) +#define bus_space_read_region_stream_4(t, h, o, a, c) \ + bus_space_read_region_4((t), (h), (o), (a), (c)) + +#define bus_space_write_region_stream_1(t, h, o, a, c) \ + bus_space_write_region_1((t), (h), (o), (a), (c)) +#define 
bus_space_write_region_stream_2(t, h, o, a, c) \
+	bus_space_write_region_2((t), (h), (o), (a), (c))
+#define bus_space_write_region_stream_4(t, h, o, a, c) \
+	bus_space_write_region_4((t), (h), (o), (a), (c))
+
+#define bus_space_set_region_stream_1(t, h, o, v, c) \
+	bus_space_set_region_1((t), (h), (o), (v), (c))
+#define bus_space_set_region_stream_2(t, h, o, v, c) \
+	bus_space_set_region_2((t), (h), (o), (v), (c))
+#define bus_space_set_region_stream_4(t, h, o, v, c) \
+	bus_space_set_region_4((t), (h), (o), (v), (c))
+
+#define bus_space_copy_region_stream_1(t, h1, o1, h2, o2, c) \
+	bus_space_copy_region_1((t), (h1), (o1), (h2), (o2), (c))
+#define bus_space_copy_region_stream_2(t, h1, o1, h2, o2, c) \
+	bus_space_copy_region_2((t), (h1), (o1), (h2), (o2), (c))
+#define bus_space_copy_region_stream_4(t, h1, o1, h2, o2, c) \
+	bus_space_copy_region_4((t), (h1), (o1), (h2), (o2), (c))
+
+#endif /* _X86_BUS_H_ */
diff -u -r -N usr/src/sys/modules/netmap/x86/busdma_impl.h /usr/src/sys/modules/netmap/x86/busdma_impl.h
--- usr/src/sys/modules/netmap/x86/busdma_impl.h	1970-01-01 01:00:00.000000000 +0100
+++ /usr/src/sys/modules/netmap/x86/busdma_impl.h	2016-09-29 00:24:55.000000000 +0100
@@ -0,0 +1,96 @@
+/*-
+ * Copyright (c) 2013 The FreeBSD Foundation
+ * All rights reserved.
+ *
+ * This software was developed by Konstantin Belousov <kib@FreeBSD.org>
+ * under sponsorship from the FreeBSD Foundation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * $FreeBSD: releng/11.0/sys/x86/include/busdma_impl.h 257308 2013-10-29 07:25:54Z kib $
+ */
+
+#ifndef __X86_BUSDMA_IMPL_H
+#define __X86_BUSDMA_IMPL_H
+
+struct bus_dma_tag_common {
+	struct bus_dma_impl *impl;
+	struct bus_dma_tag_common *parent;
+	bus_size_t alignment;
+	bus_addr_t boundary;
+	bus_addr_t lowaddr;
+	bus_addr_t highaddr;
+	bus_dma_filter_t *filter;
+	void *filterarg;
+	bus_size_t maxsize;
+	u_int nsegments;
+	bus_size_t maxsegsz;
+	int flags;
+	bus_dma_lock_t *lockfunc;
+	void *lockfuncarg;
+	int ref_count;
+};
+
+struct bus_dma_impl {
+	int (*tag_create)(bus_dma_tag_t parent,
+	    bus_size_t alignment, bus_addr_t boundary, bus_addr_t lowaddr,
+	    bus_addr_t highaddr, bus_dma_filter_t *filter,
+	    void *filterarg, bus_size_t maxsize, int nsegments,
+	    bus_size_t maxsegsz, int flags, bus_dma_lock_t *lockfunc,
+	    void *lockfuncarg, bus_dma_tag_t *dmat);
+	int (*tag_destroy)(bus_dma_tag_t dmat);
+	int (*map_create)(bus_dma_tag_t dmat, int flags, bus_dmamap_t *mapp);
+	int (*map_destroy)(bus_dma_tag_t dmat, bus_dmamap_t map);
+	int (*mem_alloc)(bus_dma_tag_t dmat, void** vaddr, int flags,
+	    bus_dmamap_t *mapp);
+	void (*mem_free)(bus_dma_tag_t dmat, void *vaddr, bus_dmamap_t map);
+	int (*load_ma)(bus_dma_tag_t dmat, bus_dmamap_t map,
+	    struct vm_page **ma, bus_size_t tlen, int ma_offs, int flags,
+	    bus_dma_segment_t *segs, int *segp);
+	int (*load_phys)(bus_dma_tag_t dmat, bus_dmamap_t map,
+	    vm_paddr_t buf, bus_size_t buflen, int flags,
+	    bus_dma_segment_t *segs, int *segp);
+	int (*load_buffer)(bus_dma_tag_t dmat, bus_dmamap_t map,
+	    void *buf, bus_size_t buflen, pmap_t pmap, int flags,
+	    bus_dma_segment_t *segs, int *segp);
+	void (*map_waitok)(bus_dma_tag_t dmat, bus_dmamap_t map,
+	    struct memdesc *mem, bus_dmamap_callback_t *callback,
+	    void *callback_arg);
+	bus_dma_segment_t *(*map_complete)(bus_dma_tag_t dmat, bus_dmamap_t map,
+	    bus_dma_segment_t *segs, int nsegs, int error);
+	void (*map_unload)(bus_dma_tag_t dmat, bus_dmamap_t map);
+	void (*map_sync)(bus_dma_tag_t dmat, bus_dmamap_t map,
+	    bus_dmasync_op_t op);
+};
+
+void bus_dma_dflt_lock(void *arg, bus_dma_lock_op_t op);
+int bus_dma_run_filter(struct bus_dma_tag_common *dmat, bus_addr_t paddr);
+int common_bus_dma_tag_create(struct bus_dma_tag_common *parent,
+    bus_size_t alignment,
+    bus_addr_t boundary, bus_addr_t lowaddr, bus_addr_t highaddr,
+    bus_dma_filter_t *filter, void *filterarg, bus_size_t maxsize,
+    int nsegments, bus_size_t maxsegsz, int flags, bus_dma_lock_t *lockfunc,
+    void *lockfuncarg, size_t sz, void **dmat);
+
+extern struct bus_dma_impl bus_dma_bounce_impl;
+
+#endif
diff -u -r -N usr/src/sys/modules/netmap/x86/cputypes.h /usr/src/sys/modules/netmap/x86/cputypes.h
--- usr/src/sys/modules/netmap/x86/cputypes.h	1970-01-01 01:00:00.000000000 +0100
+++ /usr/src/sys/modules/netmap/x86/cputypes.h	2016-09-29 00:24:55.000000000 +0100
@@ -0,0 +1,54 @@
+/*-
+ * Copyright (c) 1993 Christopher G. Demetriou
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3.
The name of the author may not be used to endorse or promote products + * derived from this software without specific prior written permission + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + * + * $FreeBSD: releng/11.0/sys/x86/include/cputypes.h 292668 2015-12-23 21:41:42Z jhb $ + */ + +#ifndef _X86_CPUTYPES_H_ +#define _X86_CPUTYPES_H_ + +/* + * Vendors of processor. + */ +#define CPU_VENDOR_NSC 0x100b /* NSC */ +#define CPU_VENDOR_IBM 0x1014 /* IBM */ +#define CPU_VENDOR_AMD 0x1022 /* AMD */ +#define CPU_VENDOR_SIS 0x1039 /* SiS */ +#define CPU_VENDOR_UMC 0x1060 /* UMC */ +#define CPU_VENDOR_NEXGEN 0x1074 /* Nexgen */ +#define CPU_VENDOR_CYRIX 0x1078 /* Cyrix */ +#define CPU_VENDOR_IDT 0x111d /* Centaur/IDT/VIA */ +#define CPU_VENDOR_TRANSMETA 0x1279 /* Transmeta */ +#define CPU_VENDOR_INTEL 0x8086 /* Intel */ +#define CPU_VENDOR_RISE 0xdead2bad /* Rise */ +#define CPU_VENDOR_CENTAUR CPU_VENDOR_IDT + +#ifndef LOCORE +extern int cpu; +extern int cpu_class; +#endif + +#endif /* !_X86_CPUTYPES_H_ */ diff -u -r -N usr/src/sys/modules/netmap/x86/dump.h /usr/src/sys/modules/netmap/x86/dump.h --- usr/src/sys/modules/netmap/x86/dump.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/dump.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,87 @@ +/*- + * Copyright (c) 2014 EMC Corp. + * Author: Conrad Meyer <conrad.meyer@isilon.com> + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
+ * + * $FreeBSD: releng/11.0/sys/x86/include/dump.h 276772 2015-01-07 01:01:39Z markj $ + */ + +#ifndef _MACHINE_DUMP_H_ +#define _MACHINE_DUMP_H_ + +#ifdef __amd64__ +#define KERNELDUMP_ARCH_VERSION KERNELDUMP_AMD64_VERSION +#define EM_VALUE EM_X86_64 +#else +#define KERNELDUMP_ARCH_VERSION KERNELDUMP_I386_VERSION +#define EM_VALUE EM_386 +#endif + +/* 20 phys_avail entry pairs correspond to 10 pa's */ +#define DUMPSYS_MD_PA_NPAIRS 10 +#define DUMPSYS_NUM_AUX_HDRS 0 + +static inline void +dumpsys_pa_init(void) +{ + + dumpsys_gen_pa_init(); +} + +static inline struct dump_pa * +dumpsys_pa_next(struct dump_pa *p) +{ + + return (dumpsys_gen_pa_next(p)); +} + +static inline void +dumpsys_wbinv_all(void) +{ + + dumpsys_gen_wbinv_all(); +} + +static inline void +dumpsys_unmap_chunk(vm_paddr_t pa, size_t s, void *va) +{ + + dumpsys_gen_unmap_chunk(pa, s, va); +} + +static inline int +dumpsys_write_aux_headers(struct dumperinfo *di) +{ + + return (dumpsys_gen_write_aux_headers(di)); +} + +static inline int +dumpsys(struct dumperinfo *di) +{ + + return (dumpsys_generic(di)); +} + +#endif /* !_MACHINE_DUMP_H_ */ diff -u -r -N usr/src/sys/modules/netmap/x86/elf.h /usr/src/sys/modules/netmap/x86/elf.h --- usr/src/sys/modules/netmap/x86/elf.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/elf.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,215 @@ +/*- + * Copyright (c) 1996-1997 John D. Polstra. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD: releng/11.0/sys/x86/include/elf.h 247047 2013-02-20 17:39:52Z kib $ + */ + +#ifndef _MACHINE_ELF_H_ +#define _MACHINE_ELF_H_ 1 + +#if defined(__i386__) || defined(_MACHINE_ELF_WANT_32BIT) + +/* + * ELF definitions for the i386 architecture. + */ + +#include <sys/elf32.h> /* Definitions common to all 32 bit architectures. */ +#if defined(__ELF_WORD_SIZE) && __ELF_WORD_SIZE == 64 +#include <sys/elf64.h> /* Definitions common to all 64 bit architectures. */ +#endif + +#ifndef __ELF_WORD_SIZE +#define __ELF_WORD_SIZE 32 /* Used by <sys/elf_generic.h> */ +#endif + +#include <sys/elf_generic.h> + +#define ELF_ARCH EM_386 + +#define ELF_MACHINE_OK(x) ((x) == EM_386 || (x) == EM_486) + +/* + * Auxiliary vector entries for passing information to the interpreter. 
+ * + * The i386 supplement to the SVR4 ABI specification names this "auxv_t", + * but POSIX lays claim to all symbols ending with "_t". + */ + +typedef struct { /* Auxiliary vector entry on initial stack */ + int a_type; /* Entry type. */ + union { + long a_val; /* Integer value. */ + void *a_ptr; /* Address. */ + void (*a_fcn)(void); /* Function pointer (not used). */ + } a_un; +} Elf32_Auxinfo; + +#if __ELF_WORD_SIZE == 64 +/* Fake for amd64 loader support */ +typedef struct { + int fake; +} Elf64_Auxinfo; +#endif + +__ElfType(Auxinfo); + +/* Values for a_type. */ +#define AT_NULL 0 /* Terminates the vector. */ +#define AT_IGNORE 1 /* Ignored entry. */ +#define AT_EXECFD 2 /* File descriptor of program to load. */ +#define AT_PHDR 3 /* Program header of program already loaded. */ +#define AT_PHENT 4 /* Size of each program header entry. */ +#define AT_PHNUM 5 /* Number of program header entries. */ +#define AT_PAGESZ 6 /* Page size in bytes. */ +#define AT_BASE 7 /* Interpreter's base address. */ +#define AT_FLAGS 8 /* Flags (unused for i386). */ +#define AT_ENTRY 9 /* Where interpreter should transfer control. */ +#define AT_NOTELF 10 /* Program is not ELF ?? */ +#define AT_UID 11 /* Real uid. */ +#define AT_EUID 12 /* Effective uid. */ +#define AT_GID 13 /* Real gid. */ +#define AT_EGID 14 /* Effective gid. */ +#define AT_EXECPATH 15 /* Path to the executable. */ +#define AT_CANARY 16 /* Canary for SSP. */ +#define AT_CANARYLEN 17 /* Length of the canary. */ +#define AT_OSRELDATE 18 /* OSRELDATE. */ +#define AT_NCPUS 19 /* Number of CPUs. */ +#define AT_PAGESIZES 20 /* Pagesizes. */ +#define AT_PAGESIZESLEN 21 /* Number of pagesizes. */ +#define AT_TIMEKEEP 22 /* Pointer to timehands. */ +#define AT_STACKPROT 23 /* Initial stack protection. */ + +#define AT_COUNT 24 /* Count of defined aux entry types. */ + +/* + * Relocation types. + */ + +#define R_386_COUNT 38 /* Count of defined relocation types. */ + +/* Define "machine" characteristics */ +#define ELF_TARG_CLASS ELFCLASS32 +#define ELF_TARG_DATA ELFDATA2LSB +#define ELF_TARG_MACH EM_386 +#define ELF_TARG_VER 1 + +#define ET_DYN_LOAD_ADDR 0x01001000 + +#elif defined(__amd64__) + +/* + * ELF definitions for the AMD64 architecture. + */ + +#ifndef __ELF_WORD_SIZE +#define __ELF_WORD_SIZE 64 /* Used by <sys/elf_generic.h> */ +#endif +#include <sys/elf32.h> /* Definitions common to all 32 bit architectures. */ +#include <sys/elf64.h> /* Definitions common to all 64 bit architectures. */ +#include <sys/elf_generic.h> + +#define ELF_ARCH EM_X86_64 +#define ELF_ARCH32 EM_386 + +#define ELF_MACHINE_OK(x) ((x) == EM_X86_64) + +/* + * Auxiliary vector entries for passing information to the interpreter. + * + * The i386 supplement to the SVR4 ABI specification names this "auxv_t", + * but POSIX lays claim to all symbols ending with "_t". + */ +typedef struct { /* Auxiliary vector entry on initial stack */ + int a_type; /* Entry type. */ + union { + int a_val; /* Integer value. */ + } a_un; +} Elf32_Auxinfo; + + +typedef struct { /* Auxiliary vector entry on initial stack */ + long a_type; /* Entry type. */ + union { + long a_val; /* Integer value. */ + void *a_ptr; /* Address. */ + void (*a_fcn)(void); /* Function pointer (not used). */ + } a_un; +} Elf64_Auxinfo; + +__ElfType(Auxinfo); + +/* Values for a_type. */ +#define AT_NULL 0 /* Terminates the vector. */ +#define AT_IGNORE 1 /* Ignored entry. */ +#define AT_EXECFD 2 /* File descriptor of program to load. */ +#define AT_PHDR 3 /* Program header of program already loaded. 
*/ +#define AT_PHENT 4 /* Size of each program header entry. */ +#define AT_PHNUM 5 /* Number of program header entries. */ +#define AT_PAGESZ 6 /* Page size in bytes. */ +#define AT_BASE 7 /* Interpreter's base address. */ +#define AT_FLAGS 8 /* Flags (unused for i386). */ +#define AT_ENTRY 9 /* Where interpreter should transfer control. */ +#define AT_NOTELF 10 /* Program is not ELF ?? */ +#define AT_UID 11 /* Real uid. */ +#define AT_EUID 12 /* Effective uid. */ +#define AT_GID 13 /* Real gid. */ +#define AT_EGID 14 /* Effective gid. */ +#define AT_EXECPATH 15 /* Path to the executable. */ +#define AT_CANARY 16 /* Canary for SSP */ +#define AT_CANARYLEN 17 /* Length of the canary. */ +#define AT_OSRELDATE 18 /* OSRELDATE. */ +#define AT_NCPUS 19 /* Number of CPUs. */ +#define AT_PAGESIZES 20 /* Pagesizes. */ +#define AT_PAGESIZESLEN 21 /* Number of pagesizes. */ +#define AT_TIMEKEEP 22 /* Pointer to timehands. */ +#define AT_STACKPROT 23 /* Initial stack protection. */ + +#define AT_COUNT 24 /* Count of defined aux entry types. */ + +/* + * Relocation types. + */ + +#define R_X86_64_COUNT 24 /* Count of defined relocation types. */ + +/* Define "machine" characteristics */ +#if __ELF_WORD_SIZE == 32 +#define ELF_TARG_CLASS ELFCLASS32 +#else +#define ELF_TARG_CLASS ELFCLASS64 +#endif +#define ELF_TARG_DATA ELFDATA2LSB +#define ELF_TARG_MACH EM_X86_64 +#define ELF_TARG_VER 1 + +#if __ELF_WORD_SIZE == 32 +#define ET_DYN_LOAD_ADDR 0x01001000 +#else +#define ET_DYN_LOAD_ADDR 0x01021000 +#endif + +#endif /* __i386__, __amd64__ */ + +#endif /* !_MACHINE_ELF_H_ */ diff -u -r -N usr/src/sys/modules/netmap/x86/endian.h /usr/src/sys/modules/netmap/x86/endian.h --- usr/src/sys/modules/netmap/x86/endian.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/endian.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,131 @@ +/*- + * Copyright (c) 1987, 1991 Regents of the University of California. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 4. Neither the name of the University nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
+ * + * @(#)endian.h 7.8 (Berkeley) 4/3/91 + * $FreeBSD: releng/11.0/sys/x86/include/endian.h 233684 2012-03-29 23:31:48Z dim $ + */ + +#ifndef _MACHINE_ENDIAN_H_ +#define _MACHINE_ENDIAN_H_ + +#include <sys/cdefs.h> +#include <sys/_types.h> + +/* + * Define the order of 32-bit words in 64-bit words. + */ +#define _QUAD_HIGHWORD 1 +#define _QUAD_LOWWORD 0 + +/* + * Definitions for byte order, according to byte significance from low + * address to high. + */ +#define _LITTLE_ENDIAN 1234 /* LSB first: i386, vax */ +#define _BIG_ENDIAN 4321 /* MSB first: 68000, ibm, net */ +#define _PDP_ENDIAN 3412 /* LSB first in word, MSW first in long */ + +#define _BYTE_ORDER _LITTLE_ENDIAN + +/* + * Deprecated variants that don't have enough underscores to be useful in more + * strict namespaces. + */ +#if __BSD_VISIBLE +#define LITTLE_ENDIAN _LITTLE_ENDIAN +#define BIG_ENDIAN _BIG_ENDIAN +#define PDP_ENDIAN _PDP_ENDIAN +#define BYTE_ORDER _BYTE_ORDER +#endif + +#define __bswap16_gen(x) (__uint16_t)((x) << 8 | (x) >> 8) +#define __bswap32_gen(x) \ + (((__uint32_t)__bswap16((x) & 0xffff) << 16) | __bswap16((x) >> 16)) +#define __bswap64_gen(x) \ + (((__uint64_t)__bswap32((x) & 0xffffffff) << 32) | __bswap32((x) >> 32)) + +#ifdef __GNUCLIKE_BUILTIN_CONSTANT_P +#define __bswap16(x) \ + ((__uint16_t)(__builtin_constant_p(x) ? \ + __bswap16_gen((__uint16_t)(x)) : __bswap16_var(x))) +#define __bswap32(x) \ + (__builtin_constant_p(x) ? \ + __bswap32_gen((__uint32_t)(x)) : __bswap32_var(x)) +#define __bswap64(x) \ + (__builtin_constant_p(x) ? \ + __bswap64_gen((__uint64_t)(x)) : __bswap64_var(x)) +#else +/* XXX these are broken for use in static initializers. */ +#define __bswap16(x) __bswap16_var(x) +#define __bswap32(x) __bswap32_var(x) +#define __bswap64(x) __bswap64_var(x) +#endif + +/* These are defined as functions to avoid multiple evaluation of x. */ + +static __inline __uint16_t +__bswap16_var(__uint16_t _x) +{ + + return (__bswap16_gen(_x)); +} + +static __inline __uint32_t +__bswap32_var(__uint32_t _x) +{ + +#ifdef __GNUCLIKE_ASM + __asm("bswap %0" : "+r" (_x)); + return (_x); +#else + return (__bswap32_gen(_x)); +#endif +} + +static __inline __uint64_t +__bswap64_var(__uint64_t _x) +{ + +#if defined(__amd64__) && defined(__GNUCLIKE_ASM) + __asm("bswap %0" : "+r" (_x)); + return (_x); +#else + /* + * It is important for the optimizations that the following is not + * really generic, but expands to 2 __bswap32_var()'s. + */ + return (__bswap64_gen(_x)); +#endif +} + +#define __htonl(x) __bswap32(x) +#define __htons(x) __bswap16(x) +#define __ntohl(x) __bswap32(x) +#define __ntohs(x) __bswap16(x) + +#endif /* !_MACHINE_ENDIAN_H_ */ diff -u -r -N usr/src/sys/modules/netmap/x86/fdt.h /usr/src/sys/modules/netmap/x86/fdt.h --- usr/src/sys/modules/netmap/x86/fdt.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/fdt.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,36 @@ +/*- + * Copyright (c) 2013 Juniper Networks, Inc. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. 
+ * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD: releng/11.0/sys/x86/include/fdt.h 260327 2014-01-05 18:46:58Z nwhitehorn $ + */ + +#ifndef _MACHINE_FDT_H_ +#define _MACHINE_FDT_H_ + +__BEGIN_DECLS +int x86_init_fdt(void); +__END_DECLS + +#endif /* _MACHINE_FDT_H_ */ diff -u -r -N usr/src/sys/modules/netmap/x86/float.h /usr/src/sys/modules/netmap/x86/float.h --- usr/src/sys/modules/netmap/x86/float.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/float.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,98 @@ +/*- + * Copyright (c) 1989 Regents of the University of California. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 3. Neither the name of the University nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
+ *
+ * from: @(#)float.h	7.1 (Berkeley) 5/8/90
+ * $FreeBSD: releng/11.0/sys/x86/include/float.h 286327 2015-08-05 17:05:35Z emaste $
+ */
+
+#ifndef _MACHINE_FLOAT_H_
+#define _MACHINE_FLOAT_H_ 1
+
+#include <sys/cdefs.h>
+
+__BEGIN_DECLS
+extern int __flt_rounds(void);
+__END_DECLS
+
+#define FLT_RADIX 2 /* b */
+#define FLT_ROUNDS __flt_rounds()
+#if __ISO_C_VISIBLE >= 1999
+#ifdef __LP64__
+#define FLT_EVAL_METHOD 0 /* no promotions */
+#else
+#define FLT_EVAL_METHOD (-1) /* i387 semantics are...interesting */
+#endif
+#define DECIMAL_DIG 21 /* max precision in decimal digits */
+#endif
+
+#define FLT_MANT_DIG 24 /* p */
+#define FLT_EPSILON 1.19209290E-07F /* b**(1-p) */
+#define FLT_DIG 6 /* floor((p-1)*log10(b))+(b == 10) */
+#define FLT_MIN_EXP (-125) /* emin */
+#define FLT_MIN 1.17549435E-38F /* b**(emin-1) */
+#define FLT_MIN_10_EXP (-37) /* ceil(log10(b**(emin-1))) */
+#define FLT_MAX_EXP 128 /* emax */
+#define FLT_MAX 3.40282347E+38F /* (1-b**(-p))*b**emax */
+#define FLT_MAX_10_EXP 38 /* floor(log10((1-b**(-p))*b**emax)) */
+#if __ISO_C_VISIBLE >= 2011
+#define FLT_TRUE_MIN 1.40129846E-45F /* b**(emin-p) */
+#define FLT_DECIMAL_DIG 9 /* ceil(1+p*log10(b)) */
+#define FLT_HAS_SUBNORM 1
+#endif /* __ISO_C_VISIBLE >= 2011 */
+
+#define DBL_MANT_DIG 53
+#define DBL_EPSILON 2.2204460492503131E-16
+#define DBL_DIG 15
+#define DBL_MIN_EXP (-1021)
+#define DBL_MIN 2.2250738585072014E-308
+#define DBL_MIN_10_EXP (-307)
+#define DBL_MAX_EXP 1024
+#define DBL_MAX 1.7976931348623157E+308
+#define DBL_MAX_10_EXP 308
+#if __ISO_C_VISIBLE >= 2011
+#define DBL_TRUE_MIN 4.9406564584124654E-324
+#define DBL_DECIMAL_DIG 17
+#define DBL_HAS_SUBNORM 1
+#endif /* __ISO_C_VISIBLE >= 2011 */
+
+#define LDBL_MANT_DIG 64
+#define LDBL_EPSILON 1.0842021724855044340E-19L
+#define LDBL_DIG 18
+#define LDBL_MIN_EXP (-16381)
+#define LDBL_MIN 3.3621031431120935063E-4932L
+#define LDBL_MIN_10_EXP (-4931)
+#define LDBL_MAX_EXP 16384
+#define LDBL_MAX 1.1897314953572317650E+4932L
+#define LDBL_MAX_10_EXP 4932
+#if __ISO_C_VISIBLE >= 2011
+#define LDBL_TRUE_MIN 3.6451995318824746025E-4951L
+#define LDBL_DECIMAL_DIG 21
+#define LDBL_HAS_SUBNORM 1
+#endif /* __ISO_C_VISIBLE >= 2011 */
+
+#endif /* _MACHINE_FLOAT_H_ */
diff -u -r -N usr/src/sys/modules/netmap/x86/fpu.h /usr/src/sys/modules/netmap/x86/fpu.h
--- usr/src/sys/modules/netmap/x86/fpu.h	1970-01-01 01:00:00.000000000 +0100
+++ /usr/src/sys/modules/netmap/x86/fpu.h	2016-09-29 00:24:55.000000000 +0100
@@ -0,0 +1,217 @@
+/*-
+ * Copyright (c) 1990 The Regents of the University of California.
+ * All rights reserved.
+ *
+ * This code is derived from software contributed to Berkeley by
+ * William Jolitz.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 4. Neither the name of the University nor the names of its contributors
+ *    may be used to endorse or promote products derived from this software
+ *    without specific prior written permission.
+ * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * from: @(#)npx.h 5.3 (Berkeley) 1/18/91 + * $FreeBSD: releng/11.0/sys/x86/include/fpu.h 274817 2014-11-21 20:53:17Z jhb $ + */ + +/* + * Floating Point Data Structures and Constants + * W. Jolitz 1/90 + */ + +#ifndef _X86_FPU_H_ +#define _X86_FPU_H_ + +/* Environment information of floating point unit. */ +struct env87 { + int32_t en_cw; /* control word (16bits) */ + int32_t en_sw; /* status word (16bits) */ + int32_t en_tw; /* tag word (16bits) */ + int32_t en_fip; /* fp instruction pointer */ + uint16_t en_fcs; /* fp code segment selector */ + uint16_t en_opcode; /* opcode last executed (11 bits) */ + int32_t en_foo; /* fp operand offset */ + int32_t en_fos; /* fp operand segment selector */ +}; + +/* Contents of each x87 floating point accumulator. */ +struct fpacc87 { + uint8_t fp_bytes[10]; +}; + +/* Floating point context. (i386 fnsave/frstor) */ +struct save87 { + struct env87 sv_env; /* floating point control/status */ + struct fpacc87 sv_ac[8]; /* accumulator contents, 0-7 */ + uint8_t sv_pad0[4]; /* saved status word (now unused) */ + uint8_t sv_pad[64]; +}; + +/* Contents of each SSE extended accumulator. */ +struct xmmacc { + uint8_t xmm_bytes[16]; +}; + +/* Contents of the upper 16 bytes of each AVX extended accumulator. */ +struct ymmacc { + uint8_t ymm_bytes[16]; +}; + +/* Rename structs below depending on machine architecture. */ +#ifdef __i386__ +#define __envxmm32 envxmm +#else +#define __envxmm32 envxmm32 +#define __envxmm64 envxmm +#endif + +struct __envxmm32 { + uint16_t en_cw; /* control word (16bits) */ + uint16_t en_sw; /* status word (16bits) */ + uint16_t en_tw; /* tag word (16bits) */ + uint16_t en_opcode; /* opcode last executed (11 bits) */ + uint32_t en_fip; /* fp instruction pointer */ + uint16_t en_fcs; /* fp code segment selector */ + uint16_t en_pad0; /* padding */ + uint32_t en_foo; /* fp operand offset */ + uint16_t en_fos; /* fp operand segment selector */ + uint16_t en_pad1; /* padding */ + uint32_t en_mxcsr; /* SSE control/status register */ + uint32_t en_mxcsr_mask; /* valid bits in mxcsr */ +}; + +struct __envxmm64 { + uint16_t en_cw; /* control word (16bits) */ + uint16_t en_sw; /* status word (16bits) */ + uint8_t en_tw; /* tag word (8bits) */ + uint8_t en_zero; + uint16_t en_opcode; /* opcode last executed (11 bits ) */ + uint64_t en_rip; /* fp instruction pointer */ + uint64_t en_rdp; /* fp operand pointer */ + uint32_t en_mxcsr; /* SSE control/status register */ + uint32_t en_mxcsr_mask; /* valid bits in mxcsr */ +}; + +/* Floating point context. (i386 fxsave/fxrstor) */ +struct savexmm { + struct __envxmm32 sv_env; + struct { + struct fpacc87 fp_acc; + uint8_t fp_pad[6]; /* padding */ + } sv_fp[8]; + struct xmmacc sv_xmm[8]; + uint8_t sv_pad[224]; +} __aligned(16); + +#ifdef __i386__ +union savefpu { + struct save87 sv_87; + struct savexmm sv_xmm; +}; +#else +/* Floating point context. (amd64 fxsave/fxrstor) */ +struct savefpu { + struct __envxmm64 sv_env; + struct { + struct fpacc87 fp_acc; + uint8_t fp_pad[6]; /* padding */ + } sv_fp[8]; + struct xmmacc sv_xmm[16]; + uint8_t sv_pad[96]; +} __aligned(16); +#endif + +struct xstate_hdr { + uint64_t xstate_bv; + uint64_t xstate_xcomp_bv; + uint8_t xstate_rsrv0[8]; + uint8_t xstate_rsrv[40]; +}; +#define XSTATE_XCOMP_BV_COMPACT (1ULL << 63) + +struct savexmm_xstate { + struct xstate_hdr sx_hd; + struct ymmacc sx_ymm[16]; +}; + +struct savexmm_ymm { + struct __envxmm32 sv_env; + struct { + struct fpacc87 fp_acc; + int8_t fp_pad[6]; /* padding */ + } sv_fp[8]; + struct xmmacc sv_xmm[16]; + uint8_t sv_pad[96]; + struct savexmm_xstate sv_xstate; +} __aligned(64); + +struct savefpu_xstate { + struct xstate_hdr sx_hd; + struct ymmacc sx_ymm[16]; +}; + +struct savefpu_ymm { + struct __envxmm64 sv_env; + struct { + struct fpacc87 fp_acc; + int8_t fp_pad[6]; /* padding */ + } sv_fp[8]; + struct xmmacc sv_xmm[16]; + uint8_t sv_pad[96]; + struct savefpu_xstate sv_xstate; +} __aligned(64); + +#undef __envxmm32 +#undef __envxmm64 + +/* + * The hardware default control word for i387's and later coprocessors is + * 0x37F, giving: + * + * round to nearest + * 64-bit precision + * all exceptions masked. + * + * FreeBSD/i386 uses 53 bit precision for things like fadd/fsub/fsqrt etc + * because of the difference between memory and fpu register stack arguments. + * If its using an intermediate fpu register, it has 80/64 bits to work + * with. If it uses memory, it has 64/53 bits to work with. However, + * gcc is aware of this and goes to a fair bit of trouble to make the + * best use of it. + * + * This is mostly academic for AMD64, because the ABI prefers the use + * SSE2 based math. For FreeBSD/amd64, we go with the default settings. + */ +#define __INITIAL_FPUCW__ 0x037F +#define __INITIAL_FPUCW_I386__ 0x127F +#define __INITIAL_NPXCW__ __INITIAL_FPUCW_I386__ +#define __INITIAL_MXCSR__ 0x1F80 +#define __INITIAL_MXCSR_MASK__ 0xFFBF + +/* + * The current value of %xcr0 is saved in the sv_pad[] field of the FPU + * state in the NT_X86_XSTATE note in core dumps. This offset is chosen + * to match the offset used by NT_X86_XSTATE in other systems. + */ +#define X86_XSTATE_XCR0_OFFSET 464 + +#endif /* !_X86_FPU_H_ */
diff -u -r -N usr/src/sys/modules/netmap/x86/frame.h /usr/src/sys/modules/netmap/x86/frame.h --- usr/src/sys/modules/netmap/x86/frame.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/frame.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,148 @@ +/*- + * Copyright (c) 2003 Peter Wemm. + * Copyright (c) 1990 The Regents of the University of California. + * All rights reserved. + * + * This code is derived from software contributed to Berkeley by + * William Jolitz. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 4. Neither the name of the University nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * from: @(#)frame.h 5.2 (Berkeley) 1/18/91 + * $FreeBSD: releng/11.0/sys/x86/include/frame.h 247047 2013-02-20 17:39:52Z kib $ + */ + +#ifndef _MACHINE_FRAME_H_ +#define _MACHINE_FRAME_H_ 1 + +/* + * System stack frames. + */ + +#ifdef __i386__ +/* + * Exception/Trap Stack Frame + */ + +struct trapframe { + int tf_fs; + int tf_es; + int tf_ds; + int tf_edi; + int tf_esi; + int tf_ebp; + int tf_isp; + int tf_ebx; + int tf_edx; + int tf_ecx; + int tf_eax; + int tf_trapno; + /* below portion defined in 386 hardware */ + int tf_err; + int tf_eip; + int tf_cs; + int tf_eflags; + /* below only when crossing rings (e.g. user to kernel) */ + int tf_esp; + int tf_ss; +}; + +/* Superset of trap frame, for traps from virtual-8086 mode */ + +struct trapframe_vm86 { + int tf_fs; + int tf_es; + int tf_ds; + int tf_edi; + int tf_esi; + int tf_ebp; + int tf_isp; + int tf_ebx; + int tf_edx; + int tf_ecx; + int tf_eax; + int tf_trapno; + /* below portion defined in 386 hardware */ + int tf_err; + int tf_eip; + int tf_cs; + int tf_eflags; + /* below only when crossing rings (e.g. user to kernel) */ + int tf_esp; + int tf_ss; + /* below only when switching out of VM86 mode */ + int tf_vm86_es; + int tf_vm86_ds; + int tf_vm86_fs; + int tf_vm86_gs; +}; +#endif /* __i386__ */ + +#ifdef __amd64__ +/* + * Exception/Trap Stack Frame + * + * The ordering of this is specifically so that we can take first 6 + * syscall arguments directly from the beginning of the frame.
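+ * (Illustrative note, not part of the original header: because
+ * tf_rdi..tf_r9 below are the first six members and line up with the
+ * SysV AMD64 argument registers, a handler can index them as an
+ * array, e.g., assuming a populated frame tf and 0 <= n <= 5:
+ *
+ *	register_t arg = (&tf->tf_rdi)[n];
+ *
+ * which is essentially how the kernel's syscall argument fetch uses
+ * this layout.)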
+ */ + +struct trapframe { + register_t tf_rdi; + register_t tf_rsi; + register_t tf_rdx; + register_t tf_rcx; + register_t tf_r8; + register_t tf_r9; + register_t tf_rax; + register_t tf_rbx; + register_t tf_rbp; + register_t tf_r10; + register_t tf_r11; + register_t tf_r12; + register_t tf_r13; + register_t tf_r14; + register_t tf_r15; + uint32_t tf_trapno; + uint16_t tf_fs; + uint16_t tf_gs; + register_t tf_addr; + uint32_t tf_flags; + uint16_t tf_es; + uint16_t tf_ds; + /* below portion defined in hardware */ + register_t tf_err; + register_t tf_rip; + register_t tf_cs; + register_t tf_rflags; + register_t tf_rsp; + register_t tf_ss; +}; + +#define TF_HASSEGS 0x1 +#define TF_HASBASES 0x2 +#define TF_HASFPXSTATE 0x4 +#endif /* __amd64__ */ + +#endif /* _MACHINE_FRAME_H_ */ diff -u -r -N usr/src/sys/modules/netmap/x86/init.h /usr/src/sys/modules/netmap/x86/init.h --- usr/src/sys/modules/netmap/x86/init.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/init.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,58 @@ +/*- + * Copyright (c) 2013 Roger Pau Monné <roger.pau@citrix.com> + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD: releng/11.0/sys/x86/include/init.h 272310 2014-09-30 16:46:45Z royger $ + */ + +#ifndef __X86_INIT_H__ +#define __X86_INIT_H__ +/* + * Struct containing pointers to init functions whose + * implementation is run time selectable. Selection can be made, + * for example, based on detection of a BIOS variant or + * hypervisor environment.
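+ * (Illustrative note, not part of the original header: a platform
+ * would swap in its own table once at early boot and call through it
+ * afterwards, roughly:
+ *
+ *	if (detected_hypervisor)	(hypothetical predicate)
+ *		init_ops = hypervisor_init_ops;
+ *	init_ops.early_delay(10);
+ *
+ * where hypervisor_init_ops is a hypothetical alternative table; the
+ * real selection lives in the platform startup code.)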
+ */ +struct init_ops { + caddr_t (*parse_preload_data)(u_int64_t); + void (*early_clock_source_init)(void); + void (*early_delay)(int); + void (*parse_memmap)(caddr_t, vm_paddr_t *, int *); + u_int (*mp_bootaddress)(u_int); + int (*start_all_aps)(void); + void (*msi_init)(void); +}; + +extern struct init_ops init_ops; + +/* Knob to disable acpi_cpu devices */ +extern bool acpi_cpu_disabled; + +/* Knob to disable acpi_hpet device */ +extern bool acpi_hpet_disabled; + +/* Knob to disable acpi_timer device */ +extern bool acpi_timer_disabled; + +#endif /* __X86_INIT_H__ */ diff -u -r -N usr/src/sys/modules/netmap/x86/legacyvar.h /usr/src/sys/modules/netmap/x86/legacyvar.h --- usr/src/sys/modules/netmap/x86/legacyvar.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/legacyvar.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,71 @@ +/*- + * Copyright (c) 2000 Peter Wemm <peter@FreeBSD.org> + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
+ * + * $FreeBSD: releng/11.0/sys/x86/include/legacyvar.h 294883 2016-01-27 02:23:54Z jhibbits $ + */ + +#ifndef _X86_LEGACYVAR_H_ +#define _X86_LEGACYVAR_H_ + +enum legacy_device_ivars { + LEGACY_IVAR_PCIDOMAIN, + LEGACY_IVAR_PCIBUS, + LEGACY_IVAR_PCISLOT, + LEGACY_IVAR_PCIFUNC +}; + +#define LEGACY_ACCESSOR(var, ivar, type) \ + __BUS_ACCESSOR(legacy, var, LEGACY, ivar, type) + +LEGACY_ACCESSOR(pcidomain, PCIDOMAIN, uint32_t) +LEGACY_ACCESSOR(pcibus, PCIBUS, uint32_t) +LEGACY_ACCESSOR(pcislot, PCISLOT, int) +LEGACY_ACCESSOR(pcifunc, PCIFUNC, int) + +#undef LEGACY_ACCESSOR + +int legacy_pcib_maxslots(device_t dev); +uint32_t legacy_pcib_read_config(device_t dev, u_int bus, u_int slot, + u_int func, u_int reg, int bytes); +int legacy_pcib_read_ivar(device_t dev, device_t child, int which, + uintptr_t *result); +void legacy_pcib_write_config(device_t dev, u_int bus, u_int slot, + u_int func, u_int reg, uint32_t data, int bytes); +int legacy_pcib_write_ivar(device_t dev, device_t child, int which, + uintptr_t value); +struct resource *legacy_pcib_alloc_resource(device_t dev, device_t child, + int type, int *rid, rman_res_t start, rman_res_t end, rman_res_t count, + u_int flags); +int legacy_pcib_adjust_resource(device_t dev, device_t child, int type, + struct resource *r, rman_res_t start, rman_res_t end); +int legacy_pcib_release_resource(device_t dev, device_t child, int type, + int rid, struct resource *r); +int legacy_pcib_alloc_msi(device_t pcib, device_t dev, int count, + int maxcount, int *irqs); +int legacy_pcib_alloc_msix(device_t pcib, device_t dev, int *irq); +int legacy_pcib_map_msi(device_t pcib, device_t dev, int irq, + uint64_t *addr, uint32_t *data); + +#endif /* !_X86_LEGACYVAR_H_ */ diff -u -r -N usr/src/sys/modules/netmap/x86/mca.h /usr/src/sys/modules/netmap/x86/mca.h --- usr/src/sys/modules/netmap/x86/mca.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/mca.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,56 @@ +/*- + * Copyright (c) 2009 Hudson River Trading LLC + * Written by: John H. Baldwin <jhb@FreeBSD.org> + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
+ * + * $FreeBSD: releng/11.0/sys/x86/include/mca.h 281887 2015-04-23 14:22:20Z jhb $ + */ + +#ifndef __X86_MCA_H__ +#define __X86_MCA_H__ + +struct mca_record { + uint64_t mr_status; + uint64_t mr_addr; + uint64_t mr_misc; + uint64_t mr_tsc; + int mr_apic_id; + int mr_bank; + uint64_t mr_mcg_cap; + uint64_t mr_mcg_status; + int mr_cpu_id; + int mr_cpu_vendor_id; + int mr_cpu; +}; + +#ifdef _KERNEL + +void cmc_intr(void); +void mca_init(void); +void mca_intr(void); +void mca_resume(void); + +#endif + +#endif /* !__X86_MCA_H__ */ diff -u -r -N usr/src/sys/modules/netmap/x86/metadata.h /usr/src/sys/modules/netmap/x86/metadata.h --- usr/src/sys/modules/netmap/x86/metadata.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/metadata.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,57 @@ +/*- + * Copyright (c) 2003 Peter Wemm <peter@FreeBSD.org> + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD: releng/11.0/sys/x86/include/metadata.h 293343 2016-01-07 19:47:26Z emaste $ + */ + +#ifndef _MACHINE_METADATA_H_ +#define _MACHINE_METADATA_H_ + +#define MODINFOMD_SMAP 0x1001 +#define MODINFOMD_SMAP_XATTR 0x1002 +#define MODINFOMD_DTBP 0x1003 +#define MODINFOMD_EFI_MAP 0x1004 +#define MODINFOMD_EFI_FB 0x1005 +#define MODINFOMD_MODULEP 0x1006 + +struct efi_map_header { + uint64_t memory_size; + uint64_t descriptor_size; + uint32_t descriptor_version; +}; + +struct efi_fb { + uint64_t fb_addr; + uint64_t fb_size; + uint32_t fb_height; + uint32_t fb_width; + uint32_t fb_stride; + uint32_t fb_mask_red; + uint32_t fb_mask_green; + uint32_t fb_mask_blue; + uint32_t fb_mask_reserved; +}; + +#endif /* !_MACHINE_METADATA_H_ */ diff -u -r -N usr/src/sys/modules/netmap/x86/mptable.h /usr/src/sys/modules/netmap/x86/mptable.h --- usr/src/sys/modules/netmap/x86/mptable.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/mptable.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,204 @@ +/*- + * Copyright (c) 1996, by Steve Passe + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. 
Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. The name of the developer may NOT be used to endorse or promote products + * derived from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD: releng/11.0/sys/x86/include/mptable.h 259228 2013-12-11 21:19:04Z jhb $ + */ + +#ifndef __MACHINE_MPTABLE_H__ +#define __MACHINE_MPTABLE_H__ + +enum busTypes { + NOBUS = 0, + CBUS = 1, + CBUSII = 2, + EISA = 3, + ISA = 6, + MCA = 9, + PCI = 13, + XPRESS = 18, + MAX_BUSTYPE = 18, + UNKNOWN_BUSTYPE = 0xff +}; + +/* MP Floating Pointer Structure */ +typedef struct MPFPS { + uint8_t signature[4]; + uint32_t pap; + uint8_t length; + uint8_t spec_rev; + uint8_t checksum; + uint8_t config_type; + uint8_t mpfb2; + uint8_t mpfb3; + uint8_t mpfb4; + uint8_t mpfb5; +} __packed *mpfps_t; + +#define MPFB2_IMCR_PRESENT 0x80 +#define MPFB2_MUL_CLK_SRCS 0x40 + +/* MP Configuration Table Header */ +typedef struct MPCTH { + uint8_t signature[4]; + uint16_t base_table_length; + uint8_t spec_rev; + uint8_t checksum; + uint8_t oem_id[8]; + uint8_t product_id[12]; + uint32_t oem_table_pointer; + uint16_t oem_table_size; + uint16_t entry_count; + uint32_t apic_address; + uint16_t extended_table_length; + uint8_t extended_table_checksum; + uint8_t reserved; +} __packed *mpcth_t; + +/* Base table entries */ + +#define MPCT_ENTRY_PROCESSOR 0 +#define MPCT_ENTRY_BUS 1 +#define MPCT_ENTRY_IOAPIC 2 +#define MPCT_ENTRY_INT 3 +#define MPCT_ENTRY_LOCAL_INT 4 + +typedef struct PROCENTRY { + uint8_t type; + uint8_t apic_id; + uint8_t apic_version; + uint8_t cpu_flags; + uint32_t cpu_signature; + uint32_t feature_flags; + uint32_t reserved1; + uint32_t reserved2; +} __packed *proc_entry_ptr; + +#define PROCENTRY_FLAG_EN 0x01 +#define PROCENTRY_FLAG_BP 0x02 + +typedef struct BUSENTRY { + uint8_t type; + uint8_t bus_id; + uint8_t bus_type[6]; +} __packed *bus_entry_ptr; + +typedef struct IOAPICENTRY { + uint8_t type; + uint8_t apic_id; + uint8_t apic_version; + uint8_t apic_flags; + uint32_t apic_address; +} __packed *io_apic_entry_ptr; + +#define IOAPICENTRY_FLAG_EN 0x01 + +typedef struct INTENTRY { + uint8_t type; + uint8_t int_type; + uint16_t int_flags; + uint8_t src_bus_id; + uint8_t src_bus_irq; + uint8_t dst_apic_id; + uint8_t dst_apic_int; +} __packed *int_entry_ptr; + +#define INTENTRY_TYPE_INT 0 +#define INTENTRY_TYPE_NMI 1 +#define INTENTRY_TYPE_SMI 2 +#define INTENTRY_TYPE_EXTINT 3 + +#define INTENTRY_FLAGS_POLARITY 0x3 +#define INTENTRY_FLAGS_POLARITY_CONFORM 0x0 +#define INTENTRY_FLAGS_POLARITY_ACTIVEHI 0x1 +#define INTENTRY_FLAGS_POLARITY_ACTIVELO 0x3 +#define INTENTRY_FLAGS_TRIGGER 0xc +#define INTENTRY_FLAGS_TRIGGER_CONFORM 0x0 +#define INTENTRY_FLAGS_TRIGGER_EDGE 0x4 +#define INTENTRY_FLAGS_TRIGGER_LEVEL 0xc + +/* Extended table entries */ + +typedef struct EXTENTRY { + uint8_t type; + uint8_t length; +} __packed *ext_entry_ptr; + +#define MPCT_EXTENTRY_SAS 0x80 +#define MPCT_EXTENTRY_BHD 0x81 +#define MPCT_EXTENTRY_CBASM 0x82 + +typedef struct SASENTRY { + uint8_t type; + uint8_t length; + uint8_t bus_id; + uint8_t address_type; + uint64_t address_base; + uint64_t address_length; +} __packed *sas_entry_ptr; + +#define SASENTRY_TYPE_IO 0 +#define SASENTRY_TYPE_MEMORY 1 +#define SASENTRY_TYPE_PREFETCH 2 + +typedef struct BHDENTRY { + uint8_t type; + uint8_t length; + uint8_t bus_id; + uint8_t bus_info; + uint8_t parent_bus; + uint8_t reserved[3]; +} __packed *bhd_entry_ptr; + +#define BHDENTRY_INFO_SUBTRACTIVE_DECODE 0x1 + +typedef struct CBASMENTRY { + uint8_t type; + uint8_t length; + uint8_t bus_id; + uint8_t address_mod; + uint32_t predefined_range; +} __packed *cbasm_entry_ptr; + +#define CBASMENTRY_ADDRESS_MOD_ADD 0x0 +#define CBASMENTRY_ADDRESS_MOD_SUBTRACT 0x1 + +#define CBASMENTRY_RANGE_ISA_IO 0 +#define CBASMENTRY_RANGE_VGA_IO 1 + +#ifdef _KERNEL +struct mptable_hostb_softc { +#ifdef NEW_PCIB + struct pcib_host_resources sc_host_res; + int sc_decodes_vga_io; + int sc_decodes_isa_io; +#endif +}; + +#ifdef NEW_PCIB +void mptable_pci_host_res_init(device_t pcib); +#endif +int mptable_pci_probe_table(int bus); +int mptable_pci_route_interrupt(device_t pcib, device_t dev, int pin); +#endif +#endif /* !__MACHINE_MPTABLE_H__ */
diff -u -r -N usr/src/sys/modules/netmap/x86/ofw_machdep.h /usr/src/sys/modules/netmap/x86/ofw_machdep.h --- usr/src/sys/modules/netmap/x86/ofw_machdep.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/ofw_machdep.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,42 @@ +/*- + * Copyright (c) 2013 Juniper Networks, Inc. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE.
+ * + * $FreeBSD: releng/11.0/sys/x86/include/ofw_machdep.h 287260 2015-08-28 15:41:09Z imp $ + */ + +#ifndef _MACHINE_OFW_MACHDEP_H_ +#define _MACHINE_OFW_MACHDEP_H_ + +#include <machine/bus.h> +#include <vm/vm.h> + +typedef uint32_t cell_t; + +struct mem_region { + vm_offset_t mr_start; + vm_size_t mr_size; +}; + +#endif /* _MACHINE_OFW_MACHDEP_H_ */ diff -u -r -N usr/src/sys/modules/netmap/x86/pci_cfgreg.h /usr/src/sys/modules/netmap/x86/pci_cfgreg.h --- usr/src/sys/modules/netmap/x86/pci_cfgreg.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/pci_cfgreg.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,60 @@ +/*- + * Copyright (c) 1997, Stefan Esser <se@freebsd.org> + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice unmodified, this list of conditions, and the following + * disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + * + * $FreeBSD: releng/11.0/sys/x86/include/pci_cfgreg.h 294883 2016-01-27 02:23:54Z jhibbits $ + * + */ + +#ifndef __X86_PCI_CFGREG_H__ +#define __X86_PCI_CFGREG_H__ + +#define CONF1_ADDR_PORT 0x0cf8 +#define CONF1_DATA_PORT 0x0cfc + +#define CONF1_ENABLE 0x80000000ul +#define CONF1_ENABLE_CHK 0x80000000ul +#define CONF1_ENABLE_MSK 0x7f000000ul +#define CONF1_ENABLE_CHK1 0xff000001ul +#define CONF1_ENABLE_MSK1 0x80000001ul +#define CONF1_ENABLE_RES1 0x80000000ul + +#define CONF2_ENABLE_PORT 0x0cf8 +#define CONF2_FORWARD_PORT 0x0cfa + +#define CONF2_ENABLE_CHK 0x0e +#define CONF2_ENABLE_RES 0x0e + +rman_res_t hostb_alloc_start(int type, rman_res_t start, rman_res_t end, rman_res_t count); +int pcie_cfgregopen(uint64_t base, uint8_t minbus, uint8_t maxbus); +int pci_cfgregopen(void); +u_int32_t pci_cfgregread(int bus, int slot, int func, int reg, int bytes); +void pci_cfgregwrite(int bus, int slot, int func, int reg, u_int32_t data, int bytes); +#ifdef __HAVE_PIR +void pci_pir_open(void); +int pci_pir_probe(int bus, int require_parse); +int pci_pir_route_interrupt(int bus, int device, int func, int pin); +#endif + +#endif /* !__X86_PCI_CFGREG_H__ */ diff -u -r -N usr/src/sys/modules/netmap/x86/psl.h /usr/src/sys/modules/netmap/x86/psl.h --- usr/src/sys/modules/netmap/x86/psl.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/psl.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,92 @@ +/*- + * Copyright (c) 1990 The Regents of the University of California. + * All rights reserved. 
+ * + * This code is derived from software contributed to Berkeley by + * William Jolitz. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 4. Neither the name of the University nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * from: @(#)psl.h 5.2 (Berkeley) 1/18/91 + * $FreeBSD: releng/11.0/sys/x86/include/psl.h 258135 2013-11-14 15:37:20Z emaste $ + */ + +#ifndef _MACHINE_PSL_H_ +#define _MACHINE_PSL_H_ + +/* + * 386 processor status longword. + */ +#define PSL_C 0x00000001 /* carry bit */ +#define PSL_PF 0x00000004 /* parity bit */ +#define PSL_AF 0x00000010 /* bcd carry bit */ +#define PSL_Z 0x00000040 /* zero bit */ +#define PSL_N 0x00000080 /* negative bit */ +#define PSL_T 0x00000100 /* trace enable bit */ +#define PSL_I 0x00000200 /* interrupt enable bit */ +#define PSL_D 0x00000400 /* string instruction direction bit */ +#define PSL_V 0x00000800 /* overflow bit */ +#define PSL_IOPL 0x00003000 /* i/o privilege level */ +#define PSL_NT 0x00004000 /* nested task bit */ +#define PSL_RF 0x00010000 /* resume flag bit */ +#define PSL_VM 0x00020000 /* virtual 8086 mode bit */ +#define PSL_AC 0x00040000 /* alignment checking */ +#define PSL_VIF 0x00080000 /* virtual interrupt enable */ +#define PSL_VIP 0x00100000 /* virtual interrupt pending */ +#define PSL_ID 0x00200000 /* identification bit */ + +/* + * The i486 manual says that we are not supposed to change reserved flags, + * but this is too much trouble since the reserved flags depend on the cpu + * and setting them to their historical values works in practice. + */ +#define PSL_RESERVED_DEFAULT 0x00000002 + +/* + * Initial flags for kernel and user mode. The kernel later inherits + * PSL_I and some other flags from user mode. + */ +#define PSL_KERNEL PSL_RESERVED_DEFAULT +#define PSL_USER (PSL_RESERVED_DEFAULT | PSL_I) + +/* + * Bits that can be changed in user mode on 486's. We allow these bits + * to be changed using ptrace(), sigreturn() and procfs. Setting PS_NT + * is undesirable but it may as well be allowed since users can inflict + * it on the kernel directly. Changes to PSL_AC are silently ignored on + * 386's. + * + * Users are allowed to change the privileged flag PSL_RF. The cpu sets PSL_RF + * in tf_eflags for faults. Debuggers should sometimes set it there too. + * tf_eflags is kept in the signal context during signal handling and there is + * no other place to remember it, so the PSL_RF bit may be corrupted by the + * signal handler without us knowing. Corruption of the PSL_RF bit at worst + * causes one more or one less debugger trap, so allowing it is fairly + * harmless. + */ +#define PSL_USERCHANGE (PSL_C | PSL_PF | PSL_AF | PSL_Z | PSL_N | PSL_T \ + | PSL_D | PSL_V | PSL_NT | PSL_RF | PSL_AC | PSL_ID) + +#endif /* !_MACHINE_PSL_H_ */
diff -u -r -N usr/src/sys/modules/netmap/x86/ptrace.h /usr/src/sys/modules/netmap/x86/ptrace.h --- usr/src/sys/modules/netmap/x86/ptrace.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/ptrace.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,65 @@ +/*- + * Copyright (c) 1992, 1993 + * The Regents of the University of California. All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 4. Neither the name of the University nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * @(#)ptrace.h 8.1 (Berkeley) 6/11/93 + * $FreeBSD: releng/11.0/sys/x86/include/ptrace.h 284919 2015-06-29 07:07:24Z kib $ + */ + +#ifndef _MACHINE_PTRACE_H_ +#define _MACHINE_PTRACE_H_ + +#define __HAVE_PTRACE_MACHDEP + +/* + * On amd64 (PT_FIRSTMACH + 0) and (PT_FIRSTMACH + 1) are old values for + * PT_GETXSTATE_OLD and PT_SETXSTATE_OLD. They should not be (re)used. + */ + +#ifdef __i386__ +#define PT_GETXMMREGS (PT_FIRSTMACH + 0) +#define PT_SETXMMREGS (PT_FIRSTMACH + 1) +#endif +#ifdef _KERNEL +#define PT_GETXSTATE_OLD (PT_FIRSTMACH + 2) +#define PT_SETXSTATE_OLD (PT_FIRSTMACH + 3) +#endif +#define PT_GETXSTATE_INFO (PT_FIRSTMACH + 4) +#define PT_GETXSTATE (PT_FIRSTMACH + 5) +#define PT_SETXSTATE (PT_FIRSTMACH + 6) +#define PT_GETFSBASE (PT_FIRSTMACH + 7) +#define PT_SETFSBASE (PT_FIRSTMACH + 8) +#define PT_GETGSBASE (PT_FIRSTMACH + 9) +#define PT_SETGSBASE (PT_FIRSTMACH + 10) + +/* Argument structure for PT_GETXSTATE_INFO. */ +struct ptrace_xstate_info { + uint64_t xsave_mask; + uint32_t xsave_len; +}; + +#endif
diff -u -r -N usr/src/sys/modules/netmap/x86/pvclock.h /usr/src/sys/modules/netmap/x86/pvclock.h --- usr/src/sys/modules/netmap/x86/pvclock.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/pvclock.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,59 @@ +/*- + * Copyright (c) 2014, Bryan Venteicher <bryanv@FreeBSD.org> + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD: releng/11.0/sys/x86/include/pvclock.h 278184 2015-02-04 08:33:04Z bryanv $ + */ + +#ifndef X86_PVCLOCK +#define X86_PVCLOCK + +struct pvclock_vcpu_time_info { + uint32_t version; + uint32_t pad0; + uint64_t tsc_timestamp; + uint64_t system_time; + uint32_t tsc_to_system_mul; + int8_t tsc_shift; + uint8_t flags; + uint8_t pad[2]; +}; + +#define PVCLOCK_FLAG_TSC_STABLE 0x01 +#define PVCLOCK_FLAG_GUEST_PASUED 0x02 + +struct pvclock_wall_clock { + uint32_t version; + uint32_t sec; + uint32_t nsec; +}; + +void pvclock_resume(void); +uint64_t pvclock_get_last_cycles(void); +uint64_t pvclock_tsc_freq(struct pvclock_vcpu_time_info *ti); +uint64_t pvclock_get_timecount(struct pvclock_vcpu_time_info *ti); +void pvclock_get_wallclock(struct pvclock_wall_clock *wc, + struct timespec *ts); + +#endif
diff -u -r -N usr/src/sys/modules/netmap/x86/reg.h /usr/src/sys/modules/netmap/x86/reg.h --- usr/src/sys/modules/netmap/x86/reg.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/reg.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,257 @@ +/*- + * Copyright (c) 2003 Peter Wemm. + * Copyright (c) 1990 The Regents of the University of California. + * All rights reserved. + * + * This code is derived from software contributed to Berkeley by + * William Jolitz. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2.
Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 4. Neither the name of the University nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * from: @(#)reg.h 5.5 (Berkeley) 1/18/91 + * $FreeBSD: releng/11.0/sys/x86/include/reg.h 281266 2015-04-08 16:30:45Z jhb $ + */ + +#ifndef _MACHINE_REG_H_ +#define _MACHINE_REG_H_ + +#include <machine/_types.h> + +#ifdef __i386__ +/* + * Indices for registers in `struct trapframe' and `struct regs'. + * + * This interface is deprecated. In the kernel, it is only used in FPU + * emulators to convert from register numbers encoded in instructions to + * register values. Everything else just accesses the relevant struct + * members. In userland, debuggers tend to abuse this interface since + * they don't understand that `struct regs' is a struct. I hope they have + * stopped accessing the registers in the trap frame via PT_{READ,WRITE}_U + * and we can stop supporting the user area soon. + */ +#define tFS (0) +#define tES (1) +#define tDS (2) +#define tEDI (3) +#define tESI (4) +#define tEBP (5) +#define tISP (6) +#define tEBX (7) +#define tEDX (8) +#define tECX (9) +#define tEAX (10) +#define tERR (12) +#define tEIP (13) +#define tCS (14) +#define tEFLAGS (15) +#define tESP (16) +#define tSS (17) + +/* + * Indices for registers in `struct regs' only. + * + * Some registers live in the pcb and are only in an "array" with the + * other registers in application interfaces that copy all the registers + * to or from a `struct regs'. + */ +#define tGS (18) +#endif /* __i386__ */ + +/* Rename the structs below depending on the machine architecture. */ +#ifdef __i386__ +#define __reg32 reg +#define __fpreg32 fpreg +#define __dbreg32 dbreg +#else +#define __reg32 reg32 +#define __reg64 reg +#define __fpreg32 fpreg32 +#define __fpreg64 fpreg +#define __dbreg32 dbreg32 +#define __dbreg64 dbreg +#define __HAVE_REG32 +#endif + +/* + * Register set accessible via /proc/$pid/regs and PT_{SET,GET}REGS. 
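+ * (Illustrative note, not part of the original header: from userland
+ * a debugger fills one of these via ptrace(2), e.g. for an amd64
+ * target already under trace control:
+ *
+ *	struct reg r;
+ *	if (ptrace(PT_GETREGS, pid, (caddr_t)&r, 0) == -1)
+ *		err(1, "PT_GETREGS");
+ *	printf("rip=%#lx rsp=%#lx\n", (u_long)r.r_rip, (u_long)r.r_rsp);
+ *
+ * assuming <sys/types.h>, <sys/ptrace.h>, <machine/reg.h>, <err.h>
+ * and <stdio.h> are included.)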
+ */ +struct __reg32 { + __uint32_t r_fs; + __uint32_t r_es; + __uint32_t r_ds; + __uint32_t r_edi; + __uint32_t r_esi; + __uint32_t r_ebp; + __uint32_t r_isp; + __uint32_t r_ebx; + __uint32_t r_edx; + __uint32_t r_ecx; + __uint32_t r_eax; + __uint32_t r_trapno; + __uint32_t r_err; + __uint32_t r_eip; + __uint32_t r_cs; + __uint32_t r_eflags; + __uint32_t r_esp; + __uint32_t r_ss; + __uint32_t r_gs; +}; + +struct __reg64 { + __int64_t r_r15; + __int64_t r_r14; + __int64_t r_r13; + __int64_t r_r12; + __int64_t r_r11; + __int64_t r_r10; + __int64_t r_r9; + __int64_t r_r8; + __int64_t r_rdi; + __int64_t r_rsi; + __int64_t r_rbp; + __int64_t r_rbx; + __int64_t r_rdx; + __int64_t r_rcx; + __int64_t r_rax; + __uint32_t r_trapno; + __uint16_t r_fs; + __uint16_t r_gs; + __uint32_t r_err; + __uint16_t r_es; + __uint16_t r_ds; + __int64_t r_rip; + __int64_t r_cs; + __int64_t r_rflags; + __int64_t r_rsp; + __int64_t r_ss; +}; + +/* + * Register set accessible via /proc/$pid/fpregs. + * + * XXX should get struct from fpu.h. Here we give a slightly + * simplified struct. This may be too much detail. Perhaps + * an array of unsigned longs is best. + */ +struct __fpreg32 { + __uint32_t fpr_env[7]; + __uint8_t fpr_acc[8][10]; + __uint32_t fpr_ex_sw; + __uint8_t fpr_pad[64]; +}; + +struct __fpreg64 { + __uint64_t fpr_env[4]; + __uint8_t fpr_acc[8][16]; + __uint8_t fpr_xacc[16][16]; + __uint64_t fpr_spare[12]; +}; + +/* + * Register set accessible via PT_GETXMMREGS (i386). + */ +struct xmmreg { + /* + * XXX should get struct from npx.h. Here we give a slightly + * simplified struct. This may be too much detail. Perhaps + * an array of unsigned longs is best. + */ + __uint32_t xmm_env[8]; + __uint8_t xmm_acc[8][16]; + __uint8_t xmm_reg[8][16]; + __uint8_t xmm_pad[224]; +}; + +/* + * Register set accessible via /proc/$pid/dbregs. + */ +struct __dbreg32 { + __uint32_t dr[8]; /* debug registers */ + /* Index 0-3: debug address registers */ + /* Index 4-5: reserved */ + /* Index 6: debug status */ + /* Index 7: debug control */ +}; + +struct __dbreg64 { + __uint64_t dr[16]; /* debug registers */ + /* Index 0-3: debug address registers */ + /* Index 4-5: reserved */ + /* Index 6: debug status */ + /* Index 7: debug control */ + /* Index 8-15: reserved */ +}; + +#define DBREG_DR7_LOCAL_ENABLE 0x01 +#define DBREG_DR7_GLOBAL_ENABLE 0x02 +#define DBREG_DR7_LEN_1 0x00 /* 1 byte length */ +#define DBREG_DR7_LEN_2 0x01 +#define DBREG_DR7_LEN_4 0x03 +#define DBREG_DR7_LEN_8 0x02 +#define DBREG_DR7_EXEC 0x00 /* break on execute */ +#define DBREG_DR7_WRONLY 0x01 /* break on write */ +#define DBREG_DR7_RDWR 0x03 /* break on read or write */ +#define DBREG_DR7_MASK(i) \ + ((__u_register_t)(0xf) << ((i) * 4 + 16) | 0x3 << (i) * 2) +#define DBREG_DR7_SET(i, len, access, enable) \ + ((__u_register_t)((len) << 2 | (access)) << ((i) * 4 + 16) | \ + (enable) << (i) * 2) +#define DBREG_DR7_GD 0x2000 +#define DBREG_DR7_ENABLED(d, i) (((d) & 0x3 << (i) * 2) != 0) +#define DBREG_DR7_ACCESS(d, i) ((d) >> ((i) * 4 + 16) & 0x3) +#define DBREG_DR7_LEN(d, i) ((d) >> ((i) * 4 + 18) & 0x3) + +#define DBREG_DRX(d,x) ((d)->dr[(x)]) /* reference dr0 - dr7 by + register number */ + +#undef __reg32 +#undef __reg64 +#undef __fpreg32 +#undef __fpreg64 +#undef __dbreg32 +#undef __dbreg64 + +#ifdef _KERNEL +/* + * XXX these interfaces are MI, so they should be declared in a MI place. 
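+ * (Illustrative note, not part of the original header: the
+ * DBREG_DR7_SET() macro above packs the per-slot enable/access/length
+ * bits, so arming a 4-byte read/write watchpoint in slot 0 from a
+ * debugger would look roughly like:
+ *
+ *	struct dbreg d;
+ *	ptrace(PT_GETDBREGS, pid, (caddr_t)&d, 0);
+ *	d.dr[0] = (__uint64_t)addr;
+ *	d.dr[7] |= DBREG_DR7_SET(0, DBREG_DR7_LEN_4, DBREG_DR7_RDWR,
+ *	    DBREG_DR7_LOCAL_ENABLE);
+ *	ptrace(PT_SETDBREGS, pid, (caddr_t)&d, 0);
+ *
+ * untested sketch; error handling omitted.)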
+ */ +int fill_regs(struct thread *, struct reg *); +int fill_frame_regs(struct trapframe *, struct reg *); +int set_regs(struct thread *, struct reg *); +int fill_fpregs(struct thread *, struct fpreg *); +int set_fpregs(struct thread *, struct fpreg *); +int fill_dbregs(struct thread *, struct dbreg *); +int set_dbregs(struct thread *, struct dbreg *); +#ifdef COMPAT_FREEBSD32 +int fill_regs32(struct thread *, struct reg32 *); +int set_regs32(struct thread *, struct reg32 *); +int fill_fpregs32(struct thread *, struct fpreg32 *); +int set_fpregs32(struct thread *, struct fpreg32 *); +int fill_dbregs32(struct thread *, struct dbreg32 *); +int set_dbregs32(struct thread *, struct dbreg32 *); +#endif +#endif + +#endif /* !_MACHINE_REG_H_ */ diff -u -r -N usr/src/sys/modules/netmap/x86/segments.h /usr/src/sys/modules/netmap/x86/segments.h --- usr/src/sys/modules/netmap/x86/segments.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/segments.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,279 @@ +/*- + * Copyright (c) 1989, 1990 William F. Jolitz + * Copyright (c) 1990 The Regents of the University of California. + * All rights reserved. + * + * This code is derived from software contributed to Berkeley by + * William Jolitz. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 4. Neither the name of the University nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
+ * + * from: @(#)segments.h 7.1 (Berkeley) 5/9/91 + * $FreeBSD: releng/11.0/sys/x86/include/segments.h 282274 2015-04-30 15:48:48Z jhb $ + */ + +#ifndef _X86_SEGMENTS_H_ +#define _X86_SEGMENTS_H_ + +/* + * X86 Segmentation Data Structures and definitions + */ + +/* + * Selectors + */ +#define SEL_RPL_MASK 3 /* requester priv level */ +#define ISPL(s) ((s)&3) /* priority level of a selector */ +#define SEL_KPL 0 /* kernel priority level */ +#define SEL_UPL 3 /* user priority level */ +#define ISLDT(s) ((s)&SEL_LDT) /* is it local or global */ +#define SEL_LDT 4 /* local descriptor table */ +#define IDXSEL(s) (((s)>>3) & 0x1fff) /* index of selector */ +#define LSEL(s,r) (((s)<<3) | SEL_LDT | r) /* a local selector */ +#define GSEL(s,r) (((s)<<3) | r) /* a global selector */ + +/* + * User segment descriptors (%cs, %ds etc for i386 apps. 64 bit wide) + * For long-mode apps, %cs only has the conforming bit in sd_type, the sd_dpl, + * sd_p, sd_l and sd_def32 which must be zero). %ds only has sd_p. + */ +struct segment_descriptor { + unsigned sd_lolimit:16; /* segment extent (lsb) */ + unsigned sd_lobase:24; /* segment base address (lsb) */ + unsigned sd_type:5; /* segment type */ + unsigned sd_dpl:2; /* segment descriptor priority level */ + unsigned sd_p:1; /* segment descriptor present */ + unsigned sd_hilimit:4; /* segment extent (msb) */ + unsigned sd_xx:2; /* unused */ + unsigned sd_def32:1; /* default 32 vs 16 bit size */ + unsigned sd_gran:1; /* limit granularity (byte/page units)*/ + unsigned sd_hibase:8; /* segment base address (msb) */ +} __packed; + +struct user_segment_descriptor { + unsigned sd_lolimit:16; /* segment extent (lsb) */ + unsigned sd_lobase:24; /* segment base address (lsb) */ + unsigned sd_type:5; /* segment type */ + unsigned sd_dpl:2; /* segment descriptor priority level */ + unsigned sd_p:1; /* segment descriptor present */ + unsigned sd_hilimit:4; /* segment extent (msb) */ + unsigned sd_xx:1; /* unused */ + unsigned sd_long:1; /* long mode (cs only) */ + unsigned sd_def32:1; /* default 32 vs 16 bit size */ + unsigned sd_gran:1; /* limit granularity (byte/page units)*/ + unsigned sd_hibase:8; /* segment base address (msb) */ +} __packed; + +#define USD_GETBASE(sd) (((sd)->sd_lobase) | (sd)->sd_hibase << 24) +#define USD_SETBASE(sd, b) (sd)->sd_lobase = (b); \ + (sd)->sd_hibase = ((b) >> 24); +#define USD_GETLIMIT(sd) (((sd)->sd_lolimit) | (sd)->sd_hilimit << 16) +#define USD_SETLIMIT(sd, l) (sd)->sd_lolimit = (l); \ + (sd)->sd_hilimit = ((l) >> 16); + +#ifdef __i386__ +/* + * Gate descriptors (e.g. indirect descriptors) + */ +struct gate_descriptor { + unsigned gd_looffset:16; /* gate offset (lsb) */ + unsigned gd_selector:16; /* gate segment selector */ + unsigned gd_stkcpy:5; /* number of stack wds to cpy */ + unsigned gd_xx:3; /* unused */ + unsigned gd_type:5; /* segment type */ + unsigned gd_dpl:2; /* segment descriptor priority level */ + unsigned gd_p:1; /* segment descriptor present */ + unsigned gd_hioffset:16; /* gate offset (msb) */ +} __packed; + +/* + * Generic descriptor + */ +union descriptor { + struct segment_descriptor sd; + struct gate_descriptor gd; +}; +#else +/* + * Gate descriptors (e.g. indirect descriptors, trap, interrupt etc. 128 bit) + * Only interrupt and trap gates have gd_ist. 
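+ * (Illustrative note, not part of the original header: the 64-bit
+ * handler address is split across gd_looffset/gd_hioffset below, so
+ * filling an interrupt gate looks roughly like:
+ *
+ *	gate.gd_looffset = (uintptr_t)handler & 0xffff;
+ *	gate.gd_hioffset = (uintptr_t)handler >> 16;
+ *	gate.gd_selector = GSEL(GCODE_SEL, SEL_KPL);
+ *	gate.gd_type = SDT_SYSIGT;
+ *	gate.gd_dpl = SEL_KPL;
+ *	gate.gd_p = 1;
+ *	gate.gd_ist = 0;
+ *
+ * cf. setidt() in the kernel proper, which does the real thing.)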
+ */ +struct gate_descriptor { + uint64_t gd_looffset:16; /* gate offset (lsb) */ + uint64_t gd_selector:16; /* gate segment selector */ + uint64_t gd_ist:3; /* IST table index */ + uint64_t gd_xx:5; /* unused */ + uint64_t gd_type:5; /* segment type */ + uint64_t gd_dpl:2; /* segment descriptor priority level */ + uint64_t gd_p:1; /* segment descriptor present */ + uint64_t gd_hioffset:48; /* gate offset (msb) */ + uint64_t sd_xx1:32; +} __packed; + +/* + * Generic descriptor + */ +union descriptor { + struct user_segment_descriptor sd; + struct gate_descriptor gd; +}; +#endif + + /* system segments and gate types */ +#define SDT_SYSNULL 0 /* system null */ +#define SDT_SYS286TSS 1 /* system 286 TSS available */ +#define SDT_SYSLDT 2 /* system local descriptor table */ +#define SDT_SYS286BSY 3 /* system 286 TSS busy */ +#define SDT_SYS286CGT 4 /* system 286 call gate */ +#define SDT_SYSTASKGT 5 /* system task gate */ +#define SDT_SYS286IGT 6 /* system 286 interrupt gate */ +#define SDT_SYS286TGT 7 /* system 286 trap gate */ +#define SDT_SYSNULL2 8 /* system null again */ +#define SDT_SYS386TSS 9 /* system 386 TSS available */ +#define SDT_SYSTSS 9 /* system available 64 bit TSS */ +#define SDT_SYSNULL3 10 /* system null again */ +#define SDT_SYS386BSY 11 /* system 386 TSS busy */ +#define SDT_SYSBSY 11 /* system busy 64 bit TSS */ +#define SDT_SYS386CGT 12 /* system 386 call gate */ +#define SDT_SYSCGT 12 /* system 64 bit call gate */ +#define SDT_SYSNULL4 13 /* system null again */ +#define SDT_SYS386IGT 14 /* system 386 interrupt gate */ +#define SDT_SYSIGT 14 /* system 64 bit interrupt gate */ +#define SDT_SYS386TGT 15 /* system 386 trap gate */ +#define SDT_SYSTGT 15 /* system 64 bit trap gate */ + + /* memory segment types */ +#define SDT_MEMRO 16 /* memory read only */ +#define SDT_MEMROA 17 /* memory read only accessed */ +#define SDT_MEMRW 18 /* memory read write */ +#define SDT_MEMRWA 19 /* memory read write accessed */ +#define SDT_MEMROD 20 /* memory read only expand dwn limit */ +#define SDT_MEMRODA 21 /* memory read only expand dwn limit accessed */ +#define SDT_MEMRWD 22 /* memory read write expand dwn limit */ +#define SDT_MEMRWDA 23 /* memory read write expand dwn limit accessed*/ +#define SDT_MEME 24 /* memory execute only */ +#define SDT_MEMEA 25 /* memory execute only accessed */ +#define SDT_MEMER 26 /* memory execute read */ +#define SDT_MEMERA 27 /* memory execute read accessed */ +#define SDT_MEMEC 28 /* memory execute only conforming */ +#define SDT_MEMEAC 29 /* memory execute only accessed conforming */ +#define SDT_MEMERC 30 /* memory execute read conforming */ +#define SDT_MEMERAC 31 /* memory execute read accessed conforming */ + +/* + * Size of IDT table + */ +#define NIDT 256 /* 32 reserved, 0x80 syscall, most are h/w */ +#define NRSVIDT 32 /* reserved entries for cpu exceptions */ + +/* + * Entries in the Interrupt Descriptor Table (IDT) + */ +#define IDT_DE 0 /* #DE: Divide Error */ +#define IDT_DB 1 /* #DB: Debug */ +#define IDT_NMI 2 /* Nonmaskable External Interrupt */ +#define IDT_BP 3 /* #BP: Breakpoint */ +#define IDT_OF 4 /* #OF: Overflow */ +#define IDT_BR 5 /* #BR: Bound Range Exceeded */ +#define IDT_UD 6 /* #UD: Undefined/Invalid Opcode */ +#define IDT_NM 7 /* #NM: No Math Coprocessor */ +#define IDT_DF 8 /* #DF: Double Fault */ +#define IDT_FPUGP 9 /* Coprocessor Segment Overrun */ +#define IDT_TS 10 /* #TS: Invalid TSS */ +#define IDT_NP 11 /* #NP: Segment Not Present */ +#define IDT_SS 12 /* #SS: Stack Segment Fault */ +#define IDT_GP 13 /* #GP: General Protection Fault */ +#define IDT_PF 14 /* #PF: Page Fault */ +#define IDT_MF 16 /* #MF: FPU Floating-Point Error */ +#define IDT_AC 17 /* #AC: Alignment Check */ +#define IDT_MC 18 /* #MC: Machine Check */ +#define IDT_XF 19 /* #XF: SIMD Floating-Point Exception */ +#define IDT_IO_INTS NRSVIDT /* Base of IDT entries for I/O interrupts. */ +#define IDT_SYSCALL 0x80 /* System Call Interrupt Vector */ +#define IDT_DTRACE_RET 0x92 /* DTrace pid provider Interrupt Vector */ +#define IDT_EVTCHN 0x93 /* Xen HVM Event Channel Interrupt Vector */ + +#if defined(__i386__) +/* + * Entries in the Global Descriptor Table (GDT) + * Note that each 4 entries share a single 32 byte L1 cache line. + * Some of the fast syscall instructions require a specific order here. + */ +#define GNULL_SEL 0 /* Null Descriptor */ +#define GPRIV_SEL 1 /* SMP Per-Processor Private Data */ +#define GUFS_SEL 2 /* User %fs Descriptor (order critical: 1) */ +#define GUGS_SEL 3 /* User %gs Descriptor (order critical: 2) */ +#define GCODE_SEL 4 /* Kernel Code Descriptor (order critical: 1) */ +#define GDATA_SEL 5 /* Kernel Data Descriptor (order critical: 2) */ +#define GUCODE_SEL 6 /* User Code Descriptor (order critical: 3) */ +#define GUDATA_SEL 7 /* User Data Descriptor (order critical: 4) */ +#define GBIOSLOWMEM_SEL 8 /* BIOS low memory access (must be entry 8) */ +#define GPROC0_SEL 9 /* Task state process slot zero and up */ +#define GLDT_SEL 10 /* Default User LDT */ +#define GUSERLDT_SEL 11 /* User LDT */ +#define GPANIC_SEL 12 /* Task state to consider panic from */ +#define GBIOSCODE32_SEL 13 /* BIOS interface (32bit Code) */ +#define GBIOSCODE16_SEL 14 /* BIOS interface (16bit Code) */ +#define GBIOSDATA_SEL 15 /* BIOS interface (Data) */ +#define GBIOSUTIL_SEL 16 /* BIOS interface (Utility) */ +#define GBIOSARGS_SEL 17 /* BIOS interface (Arguments) */ +#define GNDIS_SEL 18 /* For the NDIS layer */ +#define NGDT 19 + +/* + * Entries in the Local Descriptor Table (LDT) + */ +#define LSYS5CALLS_SEL 0 /* forced by intel BCS */ +#define LSYS5SIGR_SEL 1 +#define L43BSDCALLS_SEL 2 /* notyet */ +#define LUCODE_SEL 3 +#define LSOL26CALLS_SEL 4 /* Solaris >= 2.6 system call gate */ +#define LUDATA_SEL 5 +/* separate stack, es,fs,gs sels ? */ +/* #define LPOSIXCALLS_SEL 5*/ /* notyet */ +#define LBSDICALLS_SEL 16 /* BSDI system call gate */ +#define NLDT (LBSDICALLS_SEL + 1) + +#else /* !__i386__ */ +/* + * Entries in the Global Descriptor Table (GDT) + */ +#define GNULL_SEL 0 /* Null Descriptor */ +#define GNULL2_SEL 1 /* Null Descriptor */ +#define GUFS32_SEL 2 /* User 32 bit %fs Descriptor */ +#define GUGS32_SEL 3 /* User 32 bit %gs Descriptor */ +#define GCODE_SEL 4 /* Kernel Code Descriptor */ +#define GDATA_SEL 5 /* Kernel Data Descriptor */ +#define GUCODE32_SEL 6 /* User 32 bit code Descriptor */ +#define GUDATA_SEL 7 /* User 32/64 bit Data Descriptor */ +#define GUCODE_SEL 8 /* User 64 bit Code Descriptor */ +#define GPROC0_SEL 9 /* TSS for entering kernel etc */ +/* slot 10 is second half of GPROC0_SEL */ +#define GUSERLDT_SEL 11 /* LDT */ +/* slot 12 is second half of GUSERLDT_SEL */ +#define NGDT 13 +#endif /* __i386__ */ + +#endif /* !_X86_SEGMENTS_H_ */
diff -u -r -N usr/src/sys/modules/netmap/x86/setjmp.h /usr/src/sys/modules/netmap/x86/setjmp.h --- usr/src/sys/modules/netmap/x86/setjmp.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/setjmp.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,50 @@ +/*- + * Copyright (c) 1998 John Birrell <jb@cimlogic.com.au>.
+ * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 3. Neither the name of the author nor the names of any co-contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY JOHN BIRRELL AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD: releng/11.0/sys/x86/include/setjmp.h 232275 2012-02-28 22:17:52Z tijl $ + */ + +#ifndef _MACHINE_SETJMP_H_ +#define _MACHINE_SETJMP_H_ + +#include <sys/cdefs.h> + +#define _JBLEN 12 /* Size of the jmp_buf on AMD64. */ + +/* + * jmp_buf and sigjmp_buf are encapsulated in different structs to force + * compile-time diagnostics for mismatches. The structs are the same + * internally to avoid some run-time errors for mismatches. + */ +#if __BSD_VISIBLE || __POSIX_VISIBLE || __XSI_VISIBLE +typedef struct _sigjmp_buf { long _sjb[_JBLEN]; } sigjmp_buf[1]; +#endif + +typedef struct _jmp_buf { long _jb[_JBLEN]; } jmp_buf[1]; + +#endif /* !_MACHINE_SETJMP_H_ */ diff -u -r -N usr/src/sys/modules/netmap/x86/sigframe.h /usr/src/sys/modules/netmap/x86/sigframe.h --- usr/src/sys/modules/netmap/x86/sigframe.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/sigframe.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,72 @@ +/*- + * Copyright (c) 1999 Marcel Moolenaar + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer + * in this position and unchanged. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 3. The name of the author may not be used to endorse or promote products + * derived from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + * + * $FreeBSD: releng/11.0/sys/x86/include/sigframe.h 247047 2013-02-20 17:39:52Z kib $ + */ + +#ifndef _X86_SIGFRAME_H_ +#define _X86_SIGFRAME_H_ + +/* + * Signal frames, arguments passed to application signal handlers. + */ + +#ifdef __i386__ +struct sigframe { + /* + * The first four members may be used by applications. + * + * NOTE: The 4th argument is undocumented, ill commented + * on and seems to be somewhat BSD "standard". Handlers + * installed with sigvec may be using it. + */ + register_t sf_signum; + register_t sf_siginfo; /* code or pointer to sf_si */ + register_t sf_ucontext; /* points to sf_uc */ + register_t sf_addr; /* undocumented 4th arg */ + + union { + __siginfohandler_t *sf_action; + __sighandler_t *sf_handler; + } sf_ahu; + ucontext_t sf_uc; /* = *sf_ucontext */ + siginfo_t sf_si; /* = *sf_siginfo (SA_SIGINFO case) */ +}; +#endif /* __i386__ */ + +#ifdef __amd64__ +struct sigframe { + union { + __siginfohandler_t *sf_action; + __sighandler_t *sf_handler; + } sf_ahu; + ucontext_t sf_uc; /* = *sf_ucontext */ + siginfo_t sf_si; /* = *sf_siginfo (SA_SIGINFO case) */ +}; +#endif /* __amd64__ */ + +#endif /* _X86_SIGFRAME_H_ */ diff -u -r -N usr/src/sys/modules/netmap/x86/signal.h /usr/src/sys/modules/netmap/x86/signal.h --- usr/src/sys/modules/netmap/x86/signal.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/signal.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,167 @@ +/*- + * Copyright (c) 1986, 1989, 1991, 1993 + * The Regents of the University of California. All rights reserved. + * Copyright (c) 2003 Peter Wemm. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 4. Neither the name of the University nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * @(#)signal.h 8.1 (Berkeley) 6/11/93 + * $FreeBSD: releng/11.0/sys/x86/include/signal.h 247047 2013-02-20 17:39:52Z kib $ + */ + +#ifndef _X86_SIGNAL_H +#define _X86_SIGNAL_H 1 + +/* + * Machine-dependent signal definitions + */ + +#include <sys/cdefs.h> +#include <sys/_sigset.h> + +#if __BSD_VISIBLE +#include <machine/trap.h> /* codes for SIGILL, SIGFPE */ +#endif + +#ifdef __i386__ +typedef int sig_atomic_t; + +#if __BSD_VISIBLE +struct sigcontext { + struct __sigset sc_mask; /* signal mask to restore */ + int sc_onstack; /* sigstack state to restore */ + int sc_gs; /* machine state (struct trapframe) */ + int sc_fs; + int sc_es; + int sc_ds; + int sc_edi; + int sc_esi; + int sc_ebp; + int sc_isp; + int sc_ebx; + int sc_edx; + int sc_ecx; + int sc_eax; + int sc_trapno; + int sc_err; + int sc_eip; + int sc_cs; + int sc_efl; + int sc_esp; + int sc_ss; + int sc_len; /* sizeof(mcontext_t) */ + /* + * See <machine/ucontext.h> and <machine/npx.h> for + * the following fields. + */ + int sc_fpformat; + int sc_ownedfp; + int sc_flags; + int sc_fpstate[128] __aligned(16); + + int sc_fsbase; + int sc_gsbase; + + int sc_xfpustate; + int sc_xfpustate_len; + + int sc_spare2[4]; +}; + +#define sc_sp sc_esp +#define sc_fp sc_ebp +#define sc_pc sc_eip +#define sc_ps sc_efl +#define sc_eflags sc_efl + +#endif /* __BSD_VISIBLE */ +#endif /* __i386__ */ + +#ifdef __amd64__ +typedef long sig_atomic_t; + +#if __BSD_VISIBLE +/* + * Information pushed on stack when a signal is delivered. + * This is used by the kernel to restore state following + * execution of the signal handler. It is also made available + * to the handler to allow it to restore state properly if + * a non-standard exit is performed. + * + * The sequence of the fields/registers after sc_mask in struct + * sigcontext must match those in mcontext_t and struct trapframe. + */ +struct sigcontext { + struct __sigset sc_mask; /* signal mask to restore */ + long sc_onstack; /* sigstack state to restore */ + long sc_rdi; /* machine state (struct trapframe) */ + long sc_rsi; + long sc_rdx; + long sc_rcx; + long sc_r8; + long sc_r9; + long sc_rax; + long sc_rbx; + long sc_rbp; + long sc_r10; + long sc_r11; + long sc_r12; + long sc_r13; + long sc_r14; + long sc_r15; + int sc_trapno; + short sc_fs; + short sc_gs; + long sc_addr; + int sc_flags; + short sc_es; + short sc_ds; + long sc_err; + long sc_rip; + long sc_cs; + long sc_rflags; + long sc_rsp; + long sc_ss; + long sc_len; /* sizeof(mcontext_t) */ + /* + * See <machine/ucontext.h> and <machine/fpu.h> for the following + * fields. 
+ */ + long sc_fpformat; + long sc_ownedfp; + long sc_fpstate[64] __aligned(16); + + long sc_fsbase; + long sc_gsbase; + + long sc_xfpustate; + long sc_xfpustate_len; + + long sc_spare[4]; +}; +#endif /* __BSD_VISIBLE */ +#endif /* __amd64__ */ + +#endif diff -u -r -N usr/src/sys/modules/netmap/x86/specialreg.h /usr/src/sys/modules/netmap/x86/specialreg.h --- usr/src/sys/modules/netmap/x86/specialreg.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/specialreg.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,879 @@ +/*- + * Copyright (c) 1991 The Regents of the University of California. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 4. Neither the name of the University nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * from: @(#)specialreg.h 7.1 (Berkeley) 5/9/91 + * $FreeBSD: releng/11.0/sys/x86/include/specialreg.h 298101 2016-04-16 06:07:13Z kib $ + */ + +#ifndef _MACHINE_SPECIALREG_H_ +#define _MACHINE_SPECIALREG_H_ + +/* + * Bits in 386 special registers: + */ +#define CR0_PE 0x00000001 /* Protected mode Enable */ +#define CR0_MP 0x00000002 /* "Math" (fpu) Present */ +#define CR0_EM 0x00000004 /* EMulate FPU instructions. 
(trap ESC only) */ +#define CR0_TS 0x00000008 /* Task Switched (if MP, trap ESC and WAIT) */ +#define CR0_PG 0x80000000 /* PaGing enable */ + +/* + * Bits in 486 special registers: + */ +#define CR0_NE 0x00000020 /* Numeric Error enable (EX16 vs IRQ13) */ +#define CR0_WP 0x00010000 /* Write Protect (honor page protect in + all modes) */ +#define CR0_AM 0x00040000 /* Alignment Mask (set to enable AC flag) */ +#define CR0_NW 0x20000000 /* Not Write-through */ +#define CR0_CD 0x40000000 /* Cache Disable */ + +#define CR3_PCID_SAVE 0x8000000000000000 +#define CR3_PCID_MASK 0xfff + +/* + * Bits in PPro special registers + */ +#define CR4_VME 0x00000001 /* Virtual 8086 mode extensions */ +#define CR4_PVI 0x00000002 /* Protected-mode virtual interrupts */ +#define CR4_TSD 0x00000004 /* Time stamp disable */ +#define CR4_DE 0x00000008 /* Debugging extensions */ +#define CR4_PSE 0x00000010 /* Page size extensions */ +#define CR4_PAE 0x00000020 /* Physical address extension */ +#define CR4_MCE 0x00000040 /* Machine check enable */ +#define CR4_PGE 0x00000080 /* Page global enable */ +#define CR4_PCE 0x00000100 /* Performance monitoring counter enable */ +#define CR4_FXSR 0x00000200 /* Fast FPU save/restore used by OS */ +#define CR4_XMM 0x00000400 /* enable SIMD/MMX2 to use except 16 */ +#define CR4_VMXE 0x00002000 /* enable VMX operation (Intel-specific) */ +#define CR4_FSGSBASE 0x00010000 /* Enable FS/GS BASE accessing instructions */ +#define CR4_PCIDE 0x00020000 /* Enable Context ID */ +#define CR4_XSAVE 0x00040000 /* XSETBV/XGETBV */ +#define CR4_SMEP 0x00100000 /* Supervisor-Mode Execution Prevention */ + +/* + * Bits in AMD64 special registers. EFER is 64 bits wide. + */ +#define EFER_SCE 0x000000001 /* System Call Extensions (R/W) */ +#define EFER_LME 0x000000100 /* Long mode enable (R/W) */ +#define EFER_LMA 0x000000400 /* Long mode active (R) */ +#define EFER_NXE 0x000000800 /* PTE No-Execute bit enable (R/W) */ +#define EFER_SVM 0x000001000 /* SVM enable bit for AMD, reserved for Intel */ +#define EFER_LMSLE 0x000002000 /* Long Mode Segment Limit Enable */ +#define EFER_FFXSR 0x000004000 /* Fast FXSAVE/FSRSTOR */ +#define EFER_TCE 0x000008000 /* Translation Cache Extension */ + +/* + * Intel Extended Features registers + */ +#define XCR0 0 /* XFEATURE_ENABLED_MASK register */ + +#define XFEATURE_ENABLED_X87 0x00000001 +#define XFEATURE_ENABLED_SSE 0x00000002 +#define XFEATURE_ENABLED_YMM_HI128 0x00000004 +#define XFEATURE_ENABLED_AVX XFEATURE_ENABLED_YMM_HI128 +#define XFEATURE_ENABLED_BNDREGS 0x00000008 +#define XFEATURE_ENABLED_BNDCSR 0x00000010 +#define XFEATURE_ENABLED_OPMASK 0x00000020 +#define XFEATURE_ENABLED_ZMM_HI256 0x00000040 +#define XFEATURE_ENABLED_HI16_ZMM 0x00000080 + +#define XFEATURE_AVX \ + (XFEATURE_ENABLED_X87 | XFEATURE_ENABLED_SSE | XFEATURE_ENABLED_AVX) +#define XFEATURE_AVX512 \ + (XFEATURE_ENABLED_OPMASK | XFEATURE_ENABLED_ZMM_HI256 | \ + XFEATURE_ENABLED_HI16_ZMM) +#define XFEATURE_MPX \ + (XFEATURE_ENABLED_BNDREGS | XFEATURE_ENABLED_BNDCSR) + +/* + * CPUID instruction features register + */ +#define CPUID_FPU 0x00000001 +#define CPUID_VME 0x00000002 +#define CPUID_DE 0x00000004 +#define CPUID_PSE 0x00000008 +#define CPUID_TSC 0x00000010 +#define CPUID_MSR 0x00000020 +#define CPUID_PAE 0x00000040 +#define CPUID_MCE 0x00000080 +#define CPUID_CX8 0x00000100 +#define CPUID_APIC 0x00000200 +#define CPUID_B10 0x00000400 +#define CPUID_SEP 0x00000800 +#define CPUID_MTRR 0x00001000 +#define CPUID_PGE 0x00002000 +#define CPUID_MCA 0x00004000 +#define CPUID_CMOV 
0x00008000 +#define CPUID_PAT 0x00010000 +#define CPUID_PSE36 0x00020000 +#define CPUID_PSN 0x00040000 +#define CPUID_CLFSH 0x00080000 +#define CPUID_B20 0x00100000 +#define CPUID_DS 0x00200000 +#define CPUID_ACPI 0x00400000 +#define CPUID_MMX 0x00800000 +#define CPUID_FXSR 0x01000000 +#define CPUID_SSE 0x02000000 +#define CPUID_XMM 0x02000000 +#define CPUID_SSE2 0x04000000 +#define CPUID_SS 0x08000000 +#define CPUID_HTT 0x10000000 +#define CPUID_TM 0x20000000 +#define CPUID_IA64 0x40000000 +#define CPUID_PBE 0x80000000 + +#define CPUID2_SSE3 0x00000001 +#define CPUID2_PCLMULQDQ 0x00000002 +#define CPUID2_DTES64 0x00000004 +#define CPUID2_MON 0x00000008 +#define CPUID2_DS_CPL 0x00000010 +#define CPUID2_VMX 0x00000020 +#define CPUID2_SMX 0x00000040 +#define CPUID2_EST 0x00000080 +#define CPUID2_TM2 0x00000100 +#define CPUID2_SSSE3 0x00000200 +#define CPUID2_CNXTID 0x00000400 +#define CPUID2_SDBG 0x00000800 +#define CPUID2_FMA 0x00001000 +#define CPUID2_CX16 0x00002000 +#define CPUID2_XTPR 0x00004000 +#define CPUID2_PDCM 0x00008000 +#define CPUID2_PCID 0x00020000 +#define CPUID2_DCA 0x00040000 +#define CPUID2_SSE41 0x00080000 +#define CPUID2_SSE42 0x00100000 +#define CPUID2_X2APIC 0x00200000 +#define CPUID2_MOVBE 0x00400000 +#define CPUID2_POPCNT 0x00800000 +#define CPUID2_TSCDLT 0x01000000 +#define CPUID2_AESNI 0x02000000 +#define CPUID2_XSAVE 0x04000000 +#define CPUID2_OSXSAVE 0x08000000 +#define CPUID2_AVX 0x10000000 +#define CPUID2_F16C 0x20000000 +#define CPUID2_RDRAND 0x40000000 +#define CPUID2_HV 0x80000000 + +/* + * Important bits in the Thermal and Power Management flags + * CPUID.6 EAX and ECX. + */ +#define CPUTPM1_SENSOR 0x00000001 +#define CPUTPM1_TURBO 0x00000002 +#define CPUTPM1_ARAT 0x00000004 +#define CPUTPM2_EFFREQ 0x00000001 + +/* + * Important bits in the AMD extended cpuid flags + */ +#define AMDID_SYSCALL 0x00000800 +#define AMDID_MP 0x00080000 +#define AMDID_NX 0x00100000 +#define AMDID_EXT_MMX 0x00400000 +#define AMDID_FFXSR 0x02000000 +#define AMDID_PAGE1GB 0x04000000 +#define AMDID_RDTSCP 0x08000000 +#define AMDID_LM 0x20000000 +#define AMDID_EXT_3DNOW 0x40000000 +#define AMDID_3DNOW 0x80000000 + +#define AMDID2_LAHF 0x00000001 +#define AMDID2_CMP 0x00000002 +#define AMDID2_SVM 0x00000004 +#define AMDID2_EXT_APIC 0x00000008 +#define AMDID2_CR8 0x00000010 +#define AMDID2_ABM 0x00000020 +#define AMDID2_SSE4A 0x00000040 +#define AMDID2_MAS 0x00000080 +#define AMDID2_PREFETCH 0x00000100 +#define AMDID2_OSVW 0x00000200 +#define AMDID2_IBS 0x00000400 +#define AMDID2_XOP 0x00000800 +#define AMDID2_SKINIT 0x00001000 +#define AMDID2_WDT 0x00002000 +#define AMDID2_LWP 0x00008000 +#define AMDID2_FMA4 0x00010000 +#define AMDID2_TCE 0x00020000 +#define AMDID2_NODE_ID 0x00080000 +#define AMDID2_TBM 0x00200000 +#define AMDID2_TOPOLOGY 0x00400000 +#define AMDID2_PCXC 0x00800000 +#define AMDID2_PNXC 0x01000000 +#define AMDID2_DBE 0x04000000 +#define AMDID2_PTSC 0x08000000 +#define AMDID2_PTSCEL2I 0x10000000 + +/* + * CPUID instruction 1 eax info + */ +#define CPUID_STEPPING 0x0000000f +#define CPUID_MODEL 0x000000f0 +#define CPUID_FAMILY 0x00000f00 +#define CPUID_EXT_MODEL 0x000f0000 +#define CPUID_EXT_FAMILY 0x0ff00000 +#ifdef __i386__ +#define CPUID_TO_MODEL(id) \ + ((((id) & CPUID_MODEL) >> 4) | \ + ((((id) & CPUID_FAMILY) >= 0x600) ? \ + (((id) & CPUID_EXT_MODEL) >> 12) : 0)) +#define CPUID_TO_FAMILY(id) \ + ((((id) & CPUID_FAMILY) >> 8) + \ + ((((id) & CPUID_FAMILY) == 0xf00) ? 
\ + (((id) & CPUID_EXT_FAMILY) >> 20) : 0)) +#else +#define CPUID_TO_MODEL(id) \ + ((((id) & CPUID_MODEL) >> 4) | \ + (((id) & CPUID_EXT_MODEL) >> 12)) +#define CPUID_TO_FAMILY(id) \ + ((((id) & CPUID_FAMILY) >> 8) + \ + (((id) & CPUID_EXT_FAMILY) >> 20)) +#endif + +/* + * CPUID instruction 1 ebx info + */ +#define CPUID_BRAND_INDEX 0x000000ff +#define CPUID_CLFUSH_SIZE 0x0000ff00 +#define CPUID_HTT_CORES 0x00ff0000 +#define CPUID_LOCAL_APIC_ID 0xff000000 + +/* + * CPUID instruction 5 info + */ +#define CPUID5_MON_MIN_SIZE 0x0000ffff /* eax */ +#define CPUID5_MON_MAX_SIZE 0x0000ffff /* ebx */ +#define CPUID5_MON_MWAIT_EXT 0x00000001 /* ecx */ +#define CPUID5_MWAIT_INTRBREAK 0x00000002 /* ecx */ + +/* + * MWAIT cpu power states. Lower 4 bits are sub-states. + */ +#define MWAIT_C0 0xf0 +#define MWAIT_C1 0x00 +#define MWAIT_C2 0x10 +#define MWAIT_C3 0x20 +#define MWAIT_C4 0x30 + +/* + * MWAIT extensions. + */ +/* Interrupt breaks MWAIT even when masked. */ +#define MWAIT_INTRBREAK 0x00000001 + +/* + * CPUID instruction 6 ecx info + */ +#define CPUID_PERF_STAT 0x00000001 +#define CPUID_PERF_BIAS 0x00000008 + +/* + * CPUID instruction 0xb ebx info. + */ +#define CPUID_TYPE_INVAL 0 +#define CPUID_TYPE_SMT 1 +#define CPUID_TYPE_CORE 2 + +/* + * CPUID instruction 0xd Processor Extended State Enumeration Sub-leaf 1 + */ +#define CPUID_EXTSTATE_XSAVEOPT 0x00000001 +#define CPUID_EXTSTATE_XSAVEC 0x00000002 +#define CPUID_EXTSTATE_XINUSE 0x00000004 +#define CPUID_EXTSTATE_XSAVES 0x00000008 + +/* + * AMD extended function 8000_0007h edx info + */ +#define AMDPM_TS 0x00000001 +#define AMDPM_FID 0x00000002 +#define AMDPM_VID 0x00000004 +#define AMDPM_TTP 0x00000008 +#define AMDPM_TM 0x00000010 +#define AMDPM_STC 0x00000020 +#define AMDPM_100MHZ_STEPS 0x00000040 +#define AMDPM_HW_PSTATE 0x00000080 +#define AMDPM_TSC_INVARIANT 0x00000100 +#define AMDPM_CPB 0x00000200 + +/* + * AMD extended function 8000_0008h ecx info + */ +#define AMDID_CMP_CORES 0x000000ff +#define AMDID_COREID_SIZE 0x0000f000 +#define AMDID_COREID_SIZE_SHIFT 12 + +/* + * CPUID instruction 7 Structured Extended Features, leaf 0 ebx info + */ +#define CPUID_STDEXT_FSGSBASE 0x00000001 +#define CPUID_STDEXT_TSC_ADJUST 0x00000002 +#define CPUID_STDEXT_SGX 0x00000004 +#define CPUID_STDEXT_BMI1 0x00000008 +#define CPUID_STDEXT_HLE 0x00000010 +#define CPUID_STDEXT_AVX2 0x00000020 +#define CPUID_STDEXT_FDP_EXC 0x00000040 +#define CPUID_STDEXT_SMEP 0x00000080 +#define CPUID_STDEXT_BMI2 0x00000100 +#define CPUID_STDEXT_ERMS 0x00000200 +#define CPUID_STDEXT_INVPCID 0x00000400 +#define CPUID_STDEXT_RTM 0x00000800 +#define CPUID_STDEXT_PQM 0x00001000 +#define CPUID_STDEXT_NFPUSG 0x00002000 +#define CPUID_STDEXT_MPX 0x00004000 +#define CPUID_STDEXT_PQE 0x00008000 +#define CPUID_STDEXT_AVX512F 0x00010000 +#define CPUID_STDEXT_AVX512DQ 0x00020000 +#define CPUID_STDEXT_RDSEED 0x00040000 +#define CPUID_STDEXT_ADX 0x00080000 +#define CPUID_STDEXT_SMAP 0x00100000 +#define CPUID_STDEXT_AVX512IFMA 0x00200000 +#define CPUID_STDEXT_PCOMMIT 0x00400000 +#define CPUID_STDEXT_CLFLUSHOPT 0x00800000 +#define CPUID_STDEXT_CLWB 0x01000000 +#define CPUID_STDEXT_PROCTRACE 0x02000000 +#define CPUID_STDEXT_AVX512PF 0x04000000 +#define CPUID_STDEXT_AVX512ER 0x08000000 +#define CPUID_STDEXT_AVX512CD 0x10000000 +#define CPUID_STDEXT_SHA 0x20000000 +#define CPUID_STDEXT_AVX512BW 0x40000000 + +/* + * CPUID instruction 7 Structured Extended Features, leaf 0 ecx info + */ +#define CPUID_STDEXT2_PREFETCHWT1 0x00000001 +#define CPUID_STDEXT2_UMIP 0x00000004 +#define 
CPUID_STDEXT2_PKU 0x00000008 +#define CPUID_STDEXT2_OSPKE 0x00000010 +#define CPUID_STDEXT2_RDPID 0x00400000 +#define CPUID_STDEXT2_SGXLC 0x40000000 + +/* + * CPUID manufacturers identifiers + */ +#define AMD_VENDOR_ID "AuthenticAMD" +#define CENTAUR_VENDOR_ID "CentaurHauls" +#define CYRIX_VENDOR_ID "CyrixInstead" +#define INTEL_VENDOR_ID "GenuineIntel" +#define NEXGEN_VENDOR_ID "NexGenDriven" +#define NSC_VENDOR_ID "Geode by NSC" +#define RISE_VENDOR_ID "RiseRiseRise" +#define SIS_VENDOR_ID "SiS SiS SiS " +#define TRANSMETA_VENDOR_ID "GenuineTMx86" +#define UMC_VENDOR_ID "UMC UMC UMC " + +/* + * Model-specific registers for the i386 family + */ +#define MSR_P5_MC_ADDR 0x000 +#define MSR_P5_MC_TYPE 0x001 +#define MSR_TSC 0x010 +#define MSR_P5_CESR 0x011 +#define MSR_P5_CTR0 0x012 +#define MSR_P5_CTR1 0x013 +#define MSR_IA32_PLATFORM_ID 0x017 +#define MSR_APICBASE 0x01b +#define MSR_EBL_CR_POWERON 0x02a +#define MSR_TEST_CTL 0x033 +#define MSR_IA32_FEATURE_CONTROL 0x03a +#define MSR_BIOS_UPDT_TRIG 0x079 +#define MSR_BBL_CR_D0 0x088 +#define MSR_BBL_CR_D1 0x089 +#define MSR_BBL_CR_D2 0x08a +#define MSR_BIOS_SIGN 0x08b +#define MSR_PERFCTR0 0x0c1 +#define MSR_PERFCTR1 0x0c2 +#define MSR_PLATFORM_INFO 0x0ce +#define MSR_MPERF 0x0e7 +#define MSR_APERF 0x0e8 +#define MSR_IA32_EXT_CONFIG 0x0ee /* Undocumented. Core Solo/Duo only */ +#define MSR_MTRRcap 0x0fe +#define MSR_BBL_CR_ADDR 0x116 +#define MSR_BBL_CR_DECC 0x118 +#define MSR_BBL_CR_CTL 0x119 +#define MSR_BBL_CR_TRIG 0x11a +#define MSR_BBL_CR_BUSY 0x11b +#define MSR_BBL_CR_CTL3 0x11e +#define MSR_SYSENTER_CS_MSR 0x174 +#define MSR_SYSENTER_ESP_MSR 0x175 +#define MSR_SYSENTER_EIP_MSR 0x176 +#define MSR_MCG_CAP 0x179 +#define MSR_MCG_STATUS 0x17a +#define MSR_MCG_CTL 0x17b +#define MSR_EVNTSEL0 0x186 +#define MSR_EVNTSEL1 0x187 +#define MSR_THERM_CONTROL 0x19a +#define MSR_THERM_INTERRUPT 0x19b +#define MSR_THERM_STATUS 0x19c +#define MSR_IA32_MISC_ENABLE 0x1a0 +#define MSR_IA32_TEMPERATURE_TARGET 0x1a2 +#define MSR_TURBO_RATIO_LIMIT 0x1ad +#define MSR_TURBO_RATIO_LIMIT1 0x1ae +#define MSR_DEBUGCTLMSR 0x1d9 +#define MSR_LASTBRANCHFROMIP 0x1db +#define MSR_LASTBRANCHTOIP 0x1dc +#define MSR_LASTINTFROMIP 0x1dd +#define MSR_LASTINTTOIP 0x1de +#define MSR_ROB_CR_BKUPTMPDR6 0x1e0 +#define MSR_MTRRVarBase 0x200 +#define MSR_MTRR64kBase 0x250 +#define MSR_MTRR16kBase 0x258 +#define MSR_MTRR4kBase 0x268 +#define MSR_PAT 0x277 +#define MSR_MC0_CTL2 0x280 +#define MSR_MTRRdefType 0x2ff +#define MSR_MC0_CTL 0x400 +#define MSR_MC0_STATUS 0x401 +#define MSR_MC0_ADDR 0x402 +#define MSR_MC0_MISC 0x403 +#define MSR_MC1_CTL 0x404 +#define MSR_MC1_STATUS 0x405 +#define MSR_MC1_ADDR 0x406 +#define MSR_MC1_MISC 0x407 +#define MSR_MC2_CTL 0x408 +#define MSR_MC2_STATUS 0x409 +#define MSR_MC2_ADDR 0x40a +#define MSR_MC2_MISC 0x40b +#define MSR_MC3_CTL 0x40c +#define MSR_MC3_STATUS 0x40d +#define MSR_MC3_ADDR 0x40e +#define MSR_MC3_MISC 0x40f +#define MSR_MC4_CTL 0x410 +#define MSR_MC4_STATUS 0x411 +#define MSR_MC4_ADDR 0x412 +#define MSR_MC4_MISC 0x413 +#define MSR_RAPL_POWER_UNIT 0x606 +#define MSR_PKG_ENERGY_STATUS 0x611 +#define MSR_DRAM_ENERGY_STATUS 0x619 +#define MSR_PP0_ENERGY_STATUS 0x639 +#define MSR_PP1_ENERGY_STATUS 0x641 +#define MSR_TSC_DEADLINE 0x6e0 /* Writes are not serializing */ + +/* + * VMX MSRs + */ +#define MSR_VMX_BASIC 0x480 +#define MSR_VMX_PINBASED_CTLS 0x481 +#define MSR_VMX_PROCBASED_CTLS 0x482 +#define MSR_VMX_EXIT_CTLS 0x483 +#define MSR_VMX_ENTRY_CTLS 0x484 +#define MSR_VMX_CR0_FIXED0 0x486 +#define MSR_VMX_CR0_FIXED1 0x487 +#define 
MSR_VMX_CR4_FIXED0 0x488 +#define MSR_VMX_CR4_FIXED1 0x489 +#define MSR_VMX_PROCBASED_CTLS2 0x48b +#define MSR_VMX_EPT_VPID_CAP 0x48c +#define MSR_VMX_TRUE_PINBASED_CTLS 0x48d +#define MSR_VMX_TRUE_PROCBASED_CTLS 0x48e +#define MSR_VMX_TRUE_EXIT_CTLS 0x48f +#define MSR_VMX_TRUE_ENTRY_CTLS 0x490 + +/* + * X2APIC MSRs. + * Writes are not serializing. + */ +#define MSR_APIC_000 0x800 +#define MSR_APIC_ID 0x802 +#define MSR_APIC_VERSION 0x803 +#define MSR_APIC_TPR 0x808 +#define MSR_APIC_EOI 0x80b +#define MSR_APIC_LDR 0x80d +#define MSR_APIC_SVR 0x80f +#define MSR_APIC_ISR0 0x810 +#define MSR_APIC_ISR1 0x811 +#define MSR_APIC_ISR2 0x812 +#define MSR_APIC_ISR3 0x813 +#define MSR_APIC_ISR4 0x814 +#define MSR_APIC_ISR5 0x815 +#define MSR_APIC_ISR6 0x816 +#define MSR_APIC_ISR7 0x817 +#define MSR_APIC_TMR0 0x818 +#define MSR_APIC_IRR0 0x820 +#define MSR_APIC_ESR 0x828 +#define MSR_APIC_LVT_CMCI 0x82F +#define MSR_APIC_ICR 0x830 +#define MSR_APIC_LVT_TIMER 0x832 +#define MSR_APIC_LVT_THERMAL 0x833 +#define MSR_APIC_LVT_PCINT 0x834 +#define MSR_APIC_LVT_LINT0 0x835 +#define MSR_APIC_LVT_LINT1 0x836 +#define MSR_APIC_LVT_ERROR 0x837 +#define MSR_APIC_ICR_TIMER 0x838 +#define MSR_APIC_CCR_TIMER 0x839 +#define MSR_APIC_DCR_TIMER 0x83e +#define MSR_APIC_SELF_IPI 0x83f + +#define MSR_IA32_XSS 0xda0 + +/* + * Constants related to MSR's. + */ +#define APICBASE_RESERVED 0x000002ff +#define APICBASE_BSP 0x00000100 +#define APICBASE_X2APIC 0x00000400 +#define APICBASE_ENABLED 0x00000800 +#define APICBASE_ADDRESS 0xfffff000 + +/* MSR_IA32_FEATURE_CONTROL related */ +#define IA32_FEATURE_CONTROL_LOCK 0x01 /* lock bit */ +#define IA32_FEATURE_CONTROL_SMX_EN 0x02 /* enable VMX inside SMX */ +#define IA32_FEATURE_CONTROL_VMX_EN 0x04 /* enable VMX outside SMX */ + +/* MSR IA32_MISC_ENABLE */ +#define IA32_MISC_EN_FASTSTR 0x0000000000000001ULL +#define IA32_MISC_EN_ATCCE 0x0000000000000008ULL +#define IA32_MISC_EN_PERFMON 0x0000000000000080ULL +#define IA32_MISC_EN_PEBSU 0x0000000000001000ULL +#define IA32_MISC_EN_ESSTE 0x0000000000010000ULL +#define IA32_MISC_EN_MONE 0x0000000000040000ULL +#define IA32_MISC_EN_LIMCPUID 0x0000000000400000ULL +#define IA32_MISC_EN_xTPRD 0x0000000000800000ULL +#define IA32_MISC_EN_XDD 0x0000000400000000ULL + +/* + * PAT modes. + */ +#define PAT_UNCACHEABLE 0x00 +#define PAT_WRITE_COMBINING 0x01 +#define PAT_WRITE_THROUGH 0x04 +#define PAT_WRITE_PROTECTED 0x05 +#define PAT_WRITE_BACK 0x06 +#define PAT_UNCACHED 0x07 +#define PAT_VALUE(i, m) ((long long)(m) << (8 * (i))) +#define PAT_MASK(i) PAT_VALUE(i, 0xff) + +/* + * Constants related to MTRRs + */ +#define MTRR_UNCACHEABLE 0x00 +#define MTRR_WRITE_COMBINING 0x01 +#define MTRR_WRITE_THROUGH 0x04 +#define MTRR_WRITE_PROTECTED 0x05 +#define MTRR_WRITE_BACK 0x06 +#define MTRR_N64K 8 /* numbers of fixed-size entries */ +#define MTRR_N16K 16 +#define MTRR_N4K 64 +#define MTRR_CAP_WC 0x0000000000000400 +#define MTRR_CAP_FIXED 0x0000000000000100 +#define MTRR_CAP_VCNT 0x00000000000000ff +#define MTRR_DEF_ENABLE 0x0000000000000800 +#define MTRR_DEF_FIXED_ENABLE 0x0000000000000400 +#define MTRR_DEF_TYPE 0x00000000000000ff +#define MTRR_PHYSBASE_PHYSBASE 0x000ffffffffff000 +#define MTRR_PHYSBASE_TYPE 0x00000000000000ff +#define MTRR_PHYSMASK_PHYSMASK 0x000ffffffffff000 +#define MTRR_PHYSMASK_VALID 0x0000000000000800 + +/* + * Cyrix configuration registers, accessible as IO ports. 
+ */ +#define CCR0 0xc0 /* Configuration control register 0 */ +#define CCR0_NC0 0x01 /* First 64K of each 1M memory region is + non-cacheable */ +#define CCR0_NC1 0x02 /* 640K-1M region is non-cacheable */ +#define CCR0_A20M 0x04 /* Enables A20M# input pin */ +#define CCR0_KEN 0x08 /* Enables KEN# input pin */ +#define CCR0_FLUSH 0x10 /* Enables FLUSH# input pin */ +#define CCR0_BARB 0x20 /* Flushes internal cache when entering hold + state */ +#define CCR0_CO 0x40 /* Cache org: 1=direct mapped, 0=2x set + assoc */ +#define CCR0_SUSPEND 0x80 /* Enables SUSP# and SUSPA# pins */ + +#define CCR1 0xc1 /* Configuration control register 1 */ +#define CCR1_RPL 0x01 /* Enables RPLSET and RPLVAL# pins */ +#define CCR1_SMI 0x02 /* Enables SMM pins */ +#define CCR1_SMAC 0x04 /* System management memory access */ +#define CCR1_MMAC 0x08 /* Main memory access */ +#define CCR1_NO_LOCK 0x10 /* Negate LOCK# */ +#define CCR1_SM3 0x80 /* SMM address space address region 3 */ + +#define CCR2 0xc2 +#define CCR2_WB 0x02 /* Enables WB cache interface pins */ +#define CCR2_SADS 0x02 /* Slow ADS */ +#define CCR2_LOCK_NW 0x04 /* LOCK NW Bit */ +#define CCR2_SUSP_HLT 0x08 /* Suspend on HALT */ +#define CCR2_WT1 0x10 /* WT region 1 */ +#define CCR2_WPR1 0x10 /* Write-protect region 1 */ +#define CCR2_BARB 0x20 /* Flushes write-back cache when entering + hold state. */ +#define CCR2_BWRT 0x40 /* Enables burst write cycles */ +#define CCR2_USE_SUSP 0x80 /* Enables suspend pins */ + +#define CCR3 0xc3 +#define CCR3_SMILOCK 0x01 /* SMM register lock */ +#define CCR3_NMI 0x02 /* Enables NMI during SMM */ +#define CCR3_LINBRST 0x04 /* Linear address burst cycles */ +#define CCR3_SMMMODE 0x08 /* SMM Mode */ +#define CCR3_MAPEN0 0x10 /* Enables Map0 */ +#define CCR3_MAPEN1 0x20 /* Enables Map1 */ +#define CCR3_MAPEN2 0x40 /* Enables Map2 */ +#define CCR3_MAPEN3 0x80 /* Enables Map3 */ + +#define CCR4 0xe8 +#define CCR4_IOMASK 0x07 +#define CCR4_MEM 0x08 /* Enables memory bypassing */ +#define CCR4_DTE 0x10 /* Enables directory table entry cache */ +#define CCR4_FASTFPE 0x20 /* Fast FPU exception */ +#define CCR4_CPUID 0x80 /* Enables CPUID instruction */ + +#define CCR5 0xe9 +#define CCR5_WT_ALLOC 0x01 /* Write-through allocate */ +#define CCR5_SLOP 0x02 /* LOOP instruction slowed down */ +#define CCR5_LBR1 0x10 /* Local bus region 1 */ +#define CCR5_ARREN 0x20 /* Enables ARR region */ + +#define CCR6 0xea + +#define CCR7 0xeb + +/* Performance Control Register (5x86 only). */ +#define PCR0 0x20 +#define PCR0_RSTK 0x01 /* Enables return stack */ +#define PCR0_BTB 0x02 /* Enables branch target buffer */ +#define PCR0_LOOP 0x04 /* Enables loop */ +#define PCR0_AIS 0x08 /* Enables all instructions stalled to + serialize pipe. */ +#define PCR0_MLR 0x10 /* Enables reordering of misaligned loads */ +#define PCR0_BTBRT 0x40 /* Enables BTB test register. */ +#define PCR0_LSSER 0x80 /* Disable reorder */ + +/* Device Identification Registers */ +#define DIR0 0xfe +#define DIR1 0xff + +/* + * Machine Check register constants. 
+ */ +#define MCG_CAP_COUNT 0x000000ff +#define MCG_CAP_CTL_P 0x00000100 +#define MCG_CAP_EXT_P 0x00000200 +#define MCG_CAP_CMCI_P 0x00000400 +#define MCG_CAP_TES_P 0x00000800 +#define MCG_CAP_EXT_CNT 0x00ff0000 +#define MCG_CAP_SER_P 0x01000000 +#define MCG_STATUS_RIPV 0x00000001 +#define MCG_STATUS_EIPV 0x00000002 +#define MCG_STATUS_MCIP 0x00000004 +#define MCG_CTL_ENABLE 0xffffffffffffffff +#define MCG_CTL_DISABLE 0x0000000000000000 +#define MSR_MC_CTL(x) (MSR_MC0_CTL + (x) * 4) +#define MSR_MC_STATUS(x) (MSR_MC0_STATUS + (x) * 4) +#define MSR_MC_ADDR(x) (MSR_MC0_ADDR + (x) * 4) +#define MSR_MC_MISC(x) (MSR_MC0_MISC + (x) * 4) +#define MSR_MC_CTL2(x) (MSR_MC0_CTL2 + (x)) /* If MCG_CAP_CMCI_P */ +#define MC_STATUS_MCA_ERROR 0x000000000000ffff +#define MC_STATUS_MODEL_ERROR 0x00000000ffff0000 +#define MC_STATUS_OTHER_INFO 0x01ffffff00000000 +#define MC_STATUS_COR_COUNT 0x001fffc000000000 /* If MCG_CAP_CMCI_P */ +#define MC_STATUS_TES_STATUS 0x0060000000000000 /* If MCG_CAP_TES_P */ +#define MC_STATUS_AR 0x0080000000000000 /* If MCG_CAP_TES_P */ +#define MC_STATUS_S 0x0100000000000000 /* If MCG_CAP_TES_P */ +#define MC_STATUS_PCC 0x0200000000000000 +#define MC_STATUS_ADDRV 0x0400000000000000 +#define MC_STATUS_MISCV 0x0800000000000000 +#define MC_STATUS_EN 0x1000000000000000 +#define MC_STATUS_UC 0x2000000000000000 +#define MC_STATUS_OVER 0x4000000000000000 +#define MC_STATUS_VAL 0x8000000000000000 +#define MC_MISC_RA_LSB 0x000000000000003f /* If MCG_CAP_SER_P */ +#define MC_MISC_ADDRESS_MODE 0x00000000000001c0 /* If MCG_CAP_SER_P */ +#define MC_CTL2_THRESHOLD 0x0000000000007fff +#define MC_CTL2_CMCI_EN 0x0000000040000000 + +/* + * The following four 3-byte registers control the non-cacheable regions. + * These registers must be written as three separate bytes. + * + * NCRx+0: A31-A24 of starting address + * NCRx+1: A23-A16 of starting address + * NCRx+2: A15-A12 of starting address | NCR_SIZE_xx. + * + * The non-cacheable region's starting address must be aligned to the + * size indicated by the NCR_SIZE_xx field. + */ +#define NCR1 0xc4 +#define NCR2 0xc7 +#define NCR3 0xca +#define NCR4 0xcd + +#define NCR_SIZE_0K 0 +#define NCR_SIZE_4K 1 +#define NCR_SIZE_8K 2 +#define NCR_SIZE_16K 3 +#define NCR_SIZE_32K 4 +#define NCR_SIZE_64K 5 +#define NCR_SIZE_128K 6 +#define NCR_SIZE_256K 7 +#define NCR_SIZE_512K 8 +#define NCR_SIZE_1M 9 +#define NCR_SIZE_2M 10 +#define NCR_SIZE_4M 11 +#define NCR_SIZE_8M 12 +#define NCR_SIZE_16M 13 +#define NCR_SIZE_32M 14 +#define NCR_SIZE_4G 15 + +/* + * The address region registers are used to specify the location and + * size for the eight address regions. + * + * ARRx + 0: A31-A24 of start address + * ARRx + 1: A23-A16 of start address + * ARRx + 2: A15-A12 of start address | ARR_SIZE_xx + */ +#define ARR0 0xc4 +#define ARR1 0xc7 +#define ARR2 0xca +#define ARR3 0xcd +#define ARR4 0xd0 +#define ARR5 0xd3 +#define ARR6 0xd6 +#define ARR7 0xd9 + +#define ARR_SIZE_0K 0 +#define ARR_SIZE_4K 1 +#define ARR_SIZE_8K 2 +#define ARR_SIZE_16K 3 +#define ARR_SIZE_32K 4 +#define ARR_SIZE_64K 5 +#define ARR_SIZE_128K 6 +#define ARR_SIZE_256K 7 +#define ARR_SIZE_512K 8 +#define ARR_SIZE_1M 9 +#define ARR_SIZE_2M 10 +#define ARR_SIZE_4M 11 +#define ARR_SIZE_8M 12 +#define ARR_SIZE_16M 13 +#define ARR_SIZE_32M 14 +#define ARR_SIZE_4G 15 + +/* + * The region control registers specify the attributes associated with + * the ARRx address regions. 
+ */ +#define RCR0 0xdc +#define RCR1 0xdd +#define RCR2 0xde +#define RCR3 0xdf +#define RCR4 0xe0 +#define RCR5 0xe1 +#define RCR6 0xe2 +#define RCR7 0xe3 + +#define RCR_RCD 0x01 /* Disables caching for ARRx (x = 0-6). */ +#define RCR_RCE 0x01 /* Enables caching for ARR7. */ +#define RCR_WWO 0x02 /* Weak write ordering. */ +#define RCR_WL 0x04 /* Weak locking. */ +#define RCR_WG 0x08 /* Write gathering. */ +#define RCR_WT 0x10 /* Write-through. */ +#define RCR_NLB 0x20 /* LBA# pin is not asserted. */ + +/* AMD Write Allocate Top-Of-Memory and Control Register */ +#define AMD_WT_ALLOC_TME 0x40000 /* top-of-memory enable */ +#define AMD_WT_ALLOC_PRE 0x20000 /* programmable range enable */ +#define AMD_WT_ALLOC_FRE 0x10000 /* fixed (A0000-FFFFF) range enable */ + +/* AMD64 MSR's */ +#define MSR_EFER 0xc0000080 /* extended features */ +#define MSR_STAR 0xc0000081 /* legacy mode SYSCALL target/cs/ss */ +#define MSR_LSTAR 0xc0000082 /* long mode SYSCALL target rip */ +#define MSR_CSTAR 0xc0000083 /* compat mode SYSCALL target rip */ +#define MSR_SF_MASK 0xc0000084 /* syscall flags mask */ +#define MSR_FSBASE 0xc0000100 /* base address of the %fs "segment" */ +#define MSR_GSBASE 0xc0000101 /* base address of the %gs "segment" */ +#define MSR_KGSBASE 0xc0000102 /* base address of the kernel %gs */ +#define MSR_PERFEVSEL0 0xc0010000 +#define MSR_PERFEVSEL1 0xc0010001 +#define MSR_PERFEVSEL2 0xc0010002 +#define MSR_PERFEVSEL3 0xc0010003 +#define MSR_K7_PERFCTR0 0xc0010004 +#define MSR_K7_PERFCTR1 0xc0010005 +#define MSR_K7_PERFCTR2 0xc0010006 +#define MSR_K7_PERFCTR3 0xc0010007 +#define MSR_SYSCFG 0xc0010010 +#define MSR_HWCR 0xc0010015 +#define MSR_IORRBASE0 0xc0010016 +#define MSR_IORRMASK0 0xc0010017 +#define MSR_IORRBASE1 0xc0010018 +#define MSR_IORRMASK1 0xc0010019 +#define MSR_TOP_MEM 0xc001001a /* boundary for ram below 4G */ +#define MSR_TOP_MEM2 0xc001001d /* boundary for ram above 4G */ +#define MSR_NB_CFG1 0xc001001f /* NB configuration 1 */ +#define MSR_P_STATE_LIMIT 0xc0010061 /* P-state Current Limit Register */ +#define MSR_P_STATE_CONTROL 0xc0010062 /* P-state Control Register */ +#define MSR_P_STATE_STATUS 0xc0010063 /* P-state Status Register */ +#define MSR_P_STATE_CONFIG(n) (0xc0010064 + (n)) /* P-state Config */ +#define MSR_SMM_ADDR 0xc0010112 /* SMM TSEG base address */ +#define MSR_SMM_MASK 0xc0010113 /* SMM TSEG address mask */ +#define MSR_EXTFEATURES 0xc0011005 /* Extended CPUID Features override */ +#define MSR_IC_CFG 0xc0011021 /* Instruction Cache Configuration */ +#define MSR_K8_UCODE_UPDATE 0xc0010020 /* update microcode */ +#define MSR_MC0_CTL_MASK 0xc0010044 +#define MSR_VM_CR 0xc0010114 /* SVM: feature control */ +#define MSR_VM_HSAVE_PA 0xc0010117 /* SVM: host save area address */ + +/* MSR_VM_CR related */ +#define VM_CR_SVMDIS 0x10 /* SVM: disabled by BIOS */ + +/* VIA ACE crypto featureset: for via_feature_rng */ +#define VIA_HAS_RNG 1 /* cpu has RNG */ + +/* VIA ACE crypto featureset: for via_feature_xcrypt */ +#define VIA_HAS_AES 1 /* cpu has AES */ +#define VIA_HAS_SHA 2 /* cpu has SHA1 & SHA256 */ +#define VIA_HAS_MM 4 /* cpu has RSA instructions */ +#define VIA_HAS_AESCTR 8 /* cpu has AES-CTR instructions */ + +/* Centaur Extended Feature flags */ +#define VIA_CPUID_HAS_RNG 0x000004 +#define VIA_CPUID_DO_RNG 0x000008 +#define VIA_CPUID_HAS_ACE 0x000040 +#define VIA_CPUID_DO_ACE 0x000080 +#define VIA_CPUID_HAS_ACE2 0x000100 +#define VIA_CPUID_DO_ACE2 0x000200 +#define VIA_CPUID_HAS_PHE 0x000400 +#define VIA_CPUID_DO_PHE 0x000800 +#define VIA_CPUID_HAS_PMM 
0x001000 +#define VIA_CPUID_DO_PMM 0x002000 + +/* VIA ACE xcrypt-* instruction context control options */ +#define VIA_CRYPT_CWLO_ROUND_M 0x0000000f +#define VIA_CRYPT_CWLO_ALG_M 0x00000070 +#define VIA_CRYPT_CWLO_ALG_AES 0x00000000 +#define VIA_CRYPT_CWLO_KEYGEN_M 0x00000080 +#define VIA_CRYPT_CWLO_KEYGEN_HW 0x00000000 +#define VIA_CRYPT_CWLO_KEYGEN_SW 0x00000080 +#define VIA_CRYPT_CWLO_NORMAL 0x00000000 +#define VIA_CRYPT_CWLO_INTERMEDIATE 0x00000100 +#define VIA_CRYPT_CWLO_ENCRYPT 0x00000000 +#define VIA_CRYPT_CWLO_DECRYPT 0x00000200 +#define VIA_CRYPT_CWLO_KEY128 0x0000000a /* 128bit, 10 rds */ +#define VIA_CRYPT_CWLO_KEY192 0x0000040c /* 192bit, 12 rds */ +#define VIA_CRYPT_CWLO_KEY256 0x0000080e /* 256bit, 15 rds */ + +#endif /* !_MACHINE_SPECIALREG_H_ */ diff -u -r -N usr/src/sys/modules/netmap/x86/stack.h /usr/src/sys/modules/netmap/x86/stack.h --- usr/src/sys/modules/netmap/x86/stack.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/stack.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,61 @@ +/*- + * Mach Operating System + * Copyright (c) 1991,1990 Carnegie Mellon University + * All Rights Reserved. + * + * Permission to use, copy, modify and distribute this software and its + * documentation is hereby granted, provided that both the copyright + * notice and this permission notice appear in all copies of the + * software, derivative works or modified versions, and any portions + * thereof, and that both notices appear in supporting documentation. + * + * CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS + * CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND FOR + * ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE. + * + * Carnegie Mellon requests users of this software to return to + * + * Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU + * School of Computer Science + * Carnegie Mellon University + * Pittsburgh PA 15213-3890 + * + * any improvements or extensions that they make and grant Carnegie the + * rights to redistribute these changes. + * + * $FreeBSD: releng/11.0/sys/x86/include/stack.h 287645 2015-09-11 03:54:37Z markj $ + */ + +#ifndef _X86_STACK_H +#define _X86_STACK_H + +/* + * Stack trace. + */ + +#ifdef __i386__ +struct i386_frame { + struct i386_frame *f_frame; + u_int f_retaddr; + u_int f_arg0; +}; +#endif + +#ifdef __amd64__ +struct amd64_frame { + struct amd64_frame *f_frame; + u_long f_retaddr; +}; + +struct i386_frame { + uint32_t f_frame; + uint32_t f_retaddr; + uint32_t f_arg0; +}; +#endif /* __amd64__ */ + +#ifdef _KERNEL +int stack_nmi_handler(struct trapframe *); +#endif + +#endif /* !_X86_STACK_H */ diff -u -r -N usr/src/sys/modules/netmap/x86/stdarg.h /usr/src/sys/modules/netmap/x86/stdarg.h --- usr/src/sys/modules/netmap/x86/stdarg.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/stdarg.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,77 @@ +/*- + * Copyright (c) 2002 David E. O'Brien. All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 3. 
Neither the name of the University nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD: releng/11.0/sys/x86/include/stdarg.h 256105 2013-10-07 10:01:23Z phk $ + */ + +#ifndef _MACHINE_STDARG_H_ +#define _MACHINE_STDARG_H_ + +#include <sys/cdefs.h> +#include <sys/_types.h> + +#ifndef _VA_LIST_DECLARED +#define _VA_LIST_DECLARED +typedef __va_list va_list; +#endif + +#ifdef __GNUCLIKE_BUILTIN_STDARG + +#define va_start(ap, last) \ + __builtin_va_start((ap), (last)) + +#define va_arg(ap, type) \ + __builtin_va_arg((ap), type) + +#define __va_copy(dest, src) \ + __builtin_va_copy((dest), (src)) + +#if __ISO_C_VISIBLE >= 1999 +#define va_copy(dest, src) \ + __va_copy(dest, src) +#endif + +#define va_end(ap) \ + __builtin_va_end(ap) + +#elif defined(lint) +/* Provide a fake implementation for lint's benefit */ +#define __va_size(type) \ + (((sizeof(type) + sizeof(long) - 1) / sizeof(long)) * sizeof(long)) +#define va_start(ap, last) \ + ((ap) = (va_list)&(last) + __va_size(last)) +#define va_copy(dst, src) \ + ((dst) = (src)) +#define va_arg(ap, type) \ + (*(type *)((ap) += __va_size(type), (ap) - __va_size(type))) +#define va_end(ap) + +#else +#error this file needs to be ported to your compiler +#endif + +#endif /* !_MACHINE_STDARG_H_ */ diff -u -r -N usr/src/sys/modules/netmap/x86/sysarch.h /usr/src/sys/modules/netmap/x86/sysarch.h --- usr/src/sys/modules/netmap/x86/sysarch.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/sysarch.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,138 @@ +/*- + * Copyright (c) 1993 The Regents of the University of California. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 4. Neither the name of the University nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD: releng/11.0/sys/x86/include/sysarch.h 233209 2012-03-19 21:57:31Z tijl $ + */ + +/* + * Architecture specific syscalls (X86) + */ +#ifndef _MACHINE_SYSARCH_H_ +#define _MACHINE_SYSARCH_H_ + +#include <sys/cdefs.h> + +#define I386_GET_LDT 0 +#define I386_SET_LDT 1 +#define LDT_AUTO_ALLOC 0xffffffff + /* I386_IOPL */ +#define I386_GET_IOPERM 3 +#define I386_SET_IOPERM 4 + /* xxxxx */ +#define I386_VM86 6 /* XXX Not implementable on amd64 */ +#define I386_GET_FSBASE 7 +#define I386_SET_FSBASE 8 +#define I386_GET_GSBASE 9 +#define I386_SET_GSBASE 10 +#define I386_GET_XFPUSTATE 11 + +/* Leave space for 0-127 to avoid translating syscalls */ +#define AMD64_GET_FSBASE 128 +#define AMD64_SET_FSBASE 129 +#define AMD64_GET_GSBASE 130 +#define AMD64_SET_GSBASE 131 +#define AMD64_GET_XFPUSTATE 132 + +struct i386_ioperm_args { + unsigned int start; + unsigned int length; + int enable; +}; + +#ifdef __i386__ +struct i386_ldt_args { + unsigned int start; + union descriptor *descs; + unsigned int num; +}; + +struct i386_vm86_args { + int sub_op; /* sub-operation to perform */ + char *sub_args; /* args */ +}; + +struct i386_get_xfpustate { + void *addr; + int len; +}; +#else +struct i386_ldt_args { + unsigned int start; + struct user_segment_descriptor *descs __packed; + unsigned int num; +}; + +struct i386_get_xfpustate { + unsigned int addr; + int len; +}; + +struct amd64_get_xfpustate { + void *addr; + int len; +}; +#endif + +#ifndef _KERNEL +union descriptor; +struct dbreg; + +__BEGIN_DECLS +int i386_get_ldt(int, union descriptor *, int); +int i386_set_ldt(int, union descriptor *, int); +int i386_get_ioperm(unsigned int, unsigned int *, int *); +int i386_set_ioperm(unsigned int, unsigned int, int); +int i386_vm86(int, void *); +int i386_get_fsbase(void **); +int i386_get_gsbase(void **); +int i386_set_fsbase(void *); +int i386_set_gsbase(void *); +int i386_set_watch(int, unsigned int, int, int, struct dbreg *); +int i386_clr_watch(int, struct dbreg *); +int amd64_get_fsbase(void **); +int amd64_get_gsbase(void **); +int amd64_set_fsbase(void *); +int amd64_set_gsbase(void *); +int sysarch(int, void *); +__END_DECLS +#else +struct thread; +union descriptor; + +int i386_get_ldt(struct thread *, struct i386_ldt_args *); +int i386_set_ldt(struct thread *, struct i386_ldt_args *, union descriptor *); +int i386_get_ioperm(struct thread *, struct i386_ioperm_args *); +int i386_set_ioperm(struct thread *, struct i386_ioperm_args *); +int amd64_get_ldt(struct thread *, struct i386_ldt_args *); +int amd64_set_ldt(struct thread *, struct i386_ldt_args *, + struct user_segment_descriptor *); +int amd64_get_ioperm(struct thread *, struct i386_ioperm_args *); +int amd64_set_ioperm(struct thread *, struct i386_ioperm_args *); +#endif + +#endif /* !_MACHINE_SYSARCH_H_ */ diff -u -r -N usr/src/sys/modules/netmap/x86/trap.h /usr/src/sys/modules/netmap/x86/trap.h --- usr/src/sys/modules/netmap/x86/trap.h 1970-01-01 01:00:00.000000000 +0100 +++ 
/usr/src/sys/modules/netmap/x86/trap.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,94 @@ +/*- + * Copyright (c) 1990 The Regents of the University of California. + * All rights reserved. + * + * This code is derived from software contributed to Berkeley by + * William Jolitz. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 4. Neither the name of the University nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * from: @(#)trap.h 5.4 (Berkeley) 5/9/91 + * $FreeBSD: releng/11.0/sys/x86/include/trap.h 257417 2013-10-31 02:35:00Z markj $ + */ + +#ifndef _MACHINE_TRAP_H_ +#define _MACHINE_TRAP_H_ + +/* + * Trap type values + * also known in trap.c for name strings + */ + +#define T_PRIVINFLT 1 /* privileged instruction */ +#define T_BPTFLT 3 /* breakpoint instruction */ +#define T_ARITHTRAP 6 /* arithmetic trap */ +#define T_PROTFLT 9 /* protection fault */ +#define T_TRCTRAP 10 /* debug exception (sic) */ +#define T_PAGEFLT 12 /* page fault */ +#define T_ALIGNFLT 14 /* alignment fault */ + +#define T_DIVIDE 18 /* integer divide fault */ +#define T_NMI 19 /* non-maskable trap */ +#define T_OFLOW 20 /* overflow trap */ +#define T_BOUND 21 /* bound instruction fault */ +#define T_DNA 22 /* device not available fault */ +#define T_DOUBLEFLT 23 /* double fault */ +#define T_FPOPFLT 24 /* fp coprocessor operand fetch fault */ +#define T_TSSFLT 25 /* invalid tss fault */ +#define T_SEGNPFLT 26 /* segment not present fault */ +#define T_STKFLT 27 /* stack fault */ +#define T_MCHK 28 /* machine check trap */ +#define T_XMMFLT 29 /* SIMD floating-point exception */ +#define T_RESERVED 30 /* reserved (unknown) */ +#define T_DTRACE_RET 32 /* DTrace pid return */ + +/* XXX most of the following codes aren't used, but could be. 
*/ + +/* definitions for <sys/signal.h> */ +#define ILL_RESAD_FAULT T_RESADFLT +#define ILL_PRIVIN_FAULT T_PRIVINFLT +#define ILL_RESOP_FAULT T_RESOPFLT +#define ILL_ALIGN_FAULT T_ALIGNFLT +#define ILL_FPOP_FAULT T_FPOPFLT /* coprocessor operand fault */ + +/* old FreeBSD macros, deprecated */ +#define FPE_INTOVF_TRAP 0x1 /* integer overflow */ +#define FPE_INTDIV_TRAP 0x2 /* integer divide by zero */ +#define FPE_FLTDIV_TRAP 0x3 /* floating/decimal divide by zero */ +#define FPE_FLTOVF_TRAP 0x4 /* floating overflow */ +#define FPE_FLTUND_TRAP 0x5 /* floating underflow */ +#define FPE_FPU_NP_TRAP 0x6 /* floating point unit not present */ +#define FPE_SUBRNG_TRAP 0x7 /* subrange out of bounds */ + +/* codes for SIGBUS */ +#define BUS_PAGE_FAULT T_PAGEFLT /* page fault protection base */ +#define BUS_SEGNP_FAULT T_SEGNPFLT /* segment not present */ +#define BUS_STK_FAULT T_STKFLT /* stack segment */ +#define BUS_SEGM_FAULT T_RESERVED /* segment protection base */ + +/* Trap's coming from user mode */ +#define T_USER 0x100 + +#endif /* !_MACHINE_TRAP_H_ */ diff -u -r -N usr/src/sys/modules/netmap/x86/ucontext.h /usr/src/sys/modules/netmap/x86/ucontext.h --- usr/src/sys/modules/netmap/x86/ucontext.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/ucontext.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,170 @@ +/*- + * Copyright (c) 2003 Peter Wemm + * Copyright (c) 1999 Marcel Moolenaar + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer + * in this position and unchanged. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 3. The name of the author may not be used to endorse or promote products + * derived from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + * + * $FreeBSD: releng/11.0/sys/x86/include/ucontext.h 295561 2016-02-12 07:38:19Z kib $ + */ + +#ifndef _X86_UCONTEXT_H_ +#define _X86_UCONTEXT_H_ + +#ifdef __i386__ +/* Keep _MC_* values similar to amd64 */ +#define _MC_HASSEGS 0x1 +#define _MC_HASBASES 0x2 +#define _MC_HASFPXSTATE 0x4 +#define _MC_FLAG_MASK (_MC_HASSEGS | _MC_HASBASES | _MC_HASFPXSTATE) + +typedef struct __mcontext { + /* + * The definition of mcontext_t must match the layout of + * struct sigcontext after the sc_mask member. This is so + * that we can support sigcontext and ucontext_t at the same + * time. + */ + __register_t mc_onstack; /* XXX - sigcontext compat. 
*/ + __register_t mc_gs; /* machine state (struct trapframe) */ + __register_t mc_fs; + __register_t mc_es; + __register_t mc_ds; + __register_t mc_edi; + __register_t mc_esi; + __register_t mc_ebp; + __register_t mc_isp; + __register_t mc_ebx; + __register_t mc_edx; + __register_t mc_ecx; + __register_t mc_eax; + __register_t mc_trapno; + __register_t mc_err; + __register_t mc_eip; + __register_t mc_cs; + __register_t mc_eflags; + __register_t mc_esp; + __register_t mc_ss; + + int mc_len; /* sizeof(mcontext_t) */ +#define _MC_FPFMT_NODEV 0x10000 /* device not present or configured */ +#define _MC_FPFMT_387 0x10001 +#define _MC_FPFMT_XMM 0x10002 + int mc_fpformat; +#define _MC_FPOWNED_NONE 0x20000 /* FP state not used */ +#define _MC_FPOWNED_FPU 0x20001 /* FP state came from FPU */ +#define _MC_FPOWNED_PCB 0x20002 /* FP state came from PCB */ + int mc_ownedfp; + __register_t mc_flags; + /* + * See <machine/npx.h> for the internals of mc_fpstate[]. + */ + int mc_fpstate[128] __aligned(16); + + __register_t mc_fsbase; + __register_t mc_gsbase; + + __register_t mc_xfpustate; + __register_t mc_xfpustate_len; + + int mc_spare2[4]; +} mcontext_t; +#endif /* __i386__ */ + +#ifdef __amd64__ +/* + * mc_trapno bits. Shall be in sync with TF_XXX. + */ +#define _MC_HASSEGS 0x1 +#define _MC_HASBASES 0x2 +#define _MC_HASFPXSTATE 0x4 +#define _MC_FLAG_MASK (_MC_HASSEGS | _MC_HASBASES | _MC_HASFPXSTATE) + +typedef struct __mcontext { + /* + * The definition of mcontext_t must match the layout of + * struct sigcontext after the sc_mask member. This is so + * that we can support sigcontext and ucontext_t at the same + * time. + */ + __register_t mc_onstack; /* XXX - sigcontext compat. */ + __register_t mc_rdi; /* machine state (struct trapframe) */ + __register_t mc_rsi; + __register_t mc_rdx; + __register_t mc_rcx; + __register_t mc_r8; + __register_t mc_r9; + __register_t mc_rax; + __register_t mc_rbx; + __register_t mc_rbp; + __register_t mc_r10; + __register_t mc_r11; + __register_t mc_r12; + __register_t mc_r13; + __register_t mc_r14; + __register_t mc_r15; + __uint32_t mc_trapno; + __uint16_t mc_fs; + __uint16_t mc_gs; + __register_t mc_addr; + __uint32_t mc_flags; + __uint16_t mc_es; + __uint16_t mc_ds; + __register_t mc_err; + __register_t mc_rip; + __register_t mc_cs; + __register_t mc_rflags; + __register_t mc_rsp; + __register_t mc_ss; + + long mc_len; /* sizeof(mcontext_t) */ + +#define _MC_FPFMT_NODEV 0x10000 /* device not present or configured */ +#define _MC_FPFMT_XMM 0x10002 + long mc_fpformat; +#define _MC_FPOWNED_NONE 0x20000 /* FP state not used */ +#define _MC_FPOWNED_FPU 0x20001 /* FP state came from FPU */ +#define _MC_FPOWNED_PCB 0x20002 /* FP state came from PCB */ + long mc_ownedfp; + /* + * See <machine/fpu.h> for the internals of mc_fpstate[]. + */ + long mc_fpstate[64] __aligned(16); + + __register_t mc_fsbase; + __register_t mc_gsbase; + + __register_t mc_xfpustate; + __register_t mc_xfpustate_len; + + long mc_spare[4]; +} mcontext_t; +#endif /* __amd64__ */ + +#ifdef __LINT__ +typedef struct __mcontext { +} mcontext_t; +#endif /* __LINT__ */ + +#endif /* !_X86_UCONTEXT_H_ */ diff -u -r -N usr/src/sys/modules/netmap/x86/vdso.h /usr/src/sys/modules/netmap/x86/vdso.h --- usr/src/sys/modules/netmap/x86/vdso.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/vdso.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,42 @@ +/*- + * Copyright 2012 Konstantin Belousov <kib@FreeBSD.ORG>. + * All rights reserved. 
+ * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + * + * $FreeBSD: releng/11.0/sys/x86/include/vdso.h 237433 2012-06-22 07:06:40Z kib $ + */ + +#ifndef _X86_VDSO_H +#define _X86_VDSO_H + +#define VDSO_TIMEHANDS_MD \ + uint32_t th_x86_shift; \ + uint32_t th_res[7]; + +#ifdef _KERNEL +#ifdef COMPAT_FREEBSD32 + +#define VDSO_TIMEHANDS_MD32 VDSO_TIMEHANDS_MD + +#endif +#endif +#endif diff -u -r -N usr/src/sys/modules/netmap/x86/vmware.h /usr/src/sys/modules/netmap/x86/vmware.h --- usr/src/sys/modules/netmap/x86/vmware.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/vmware.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,52 @@ +/*- + * Copyright (c) 2011-2014 Jung-uk Kim <jkim@FreeBSD.org> + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
+ * + * $FreeBSD: releng/11.0/sys/x86/include/vmware.h 278749 2015-02-14 09:00:12Z kib $ + */ + +#ifndef _X86_VMWARE_H_ +#define _X86_VMWARE_H_ + +#define VMW_HVMAGIC 0x564d5868 +#define VMW_HVPORT 0x5658 + +#define VMW_HVCMD_GETVERSION 10 +#define VMW_HVCMD_GETHZ 45 +#define VMW_HVCMD_GETVCPU_INFO 68 + +#define VMW_VCPUINFO_LEGACY_X2APIC (1 << 3) +#define VMW_VCPUINFO_VCPU_RESERVED (1 << 31) + +static __inline void +vmware_hvcall(u_int cmd, u_int *p) +{ + + __asm __volatile("inl %w3, %0" + : "=a" (p[0]), "=b" (p[1]), "=c" (p[2]), "=d" (p[3]) + : "0" (VMW_HVMAGIC), "1" (UINT_MAX), "2" (cmd), "3" (VMW_HVPORT) + : "memory"); +} + +#endif /* !_X86_VMWARE_H_ */ diff -u -r -N usr/src/sys/modules/netmap/x86/x86_smp.h /usr/src/sys/modules/netmap/x86/x86_smp.h --- usr/src/sys/modules/netmap/x86/x86_smp.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/x86_smp.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,103 @@ +/*- + * ---------------------------------------------------------------------------- + * "THE BEER-WARE LICENSE" (Revision 42): + * <phk@FreeBSD.org> wrote this file. As long as you retain this notice you + * can do whatever you want with this stuff. If we meet some day, and you think + * this stuff is worth it, you can buy me a beer in return. Poul-Henning Kamp + * ---------------------------------------------------------------------------- + * + * $FreeBSD: releng/11.0/sys/x86/include/x86_smp.h 291949 2015-12-07 17:41:20Z kib $ + * + */ + +#ifndef _X86_X86_SMP_H_ +#define _X86_X86_SMP_H_ + +#include <sys/bus.h> +#include <machine/frame.h> +#include <machine/intr_machdep.h> +#include <x86/apicvar.h> +#include <machine/pcb.h> + +struct pmap; + +/* global data in mp_x86.c */ +extern int mp_naps; +extern int boot_cpu_id; +extern struct pcb stoppcbs[]; +extern int cpu_apic_ids[]; +extern int bootAP; +extern void *dpcpu; +extern char *bootSTK; +extern void *bootstacks[]; +extern volatile u_int cpu_ipi_pending[]; +extern volatile int aps_ready; +extern struct mtx ap_boot_mtx; +extern int cpu_logical; +extern int cpu_cores; +extern volatile int smp_tlb_wait; +extern struct pmap *smp_tlb_pmap; +extern u_int xhits_gbl[]; +extern u_int xhits_pg[]; +extern u_int xhits_rng[]; +extern u_int ipi_global; +extern u_int ipi_page; +extern u_int ipi_range; +extern u_int ipi_range_size; + +struct cpu_info { + int cpu_present:1; + int cpu_bsp:1; + int cpu_disabled:1; + int cpu_hyperthread:1; +}; +extern struct cpu_info cpu_info[]; + +#ifdef COUNT_IPIS +extern u_long *ipi_invltlb_counts[MAXCPU]; +extern u_long *ipi_invlrng_counts[MAXCPU]; +extern u_long *ipi_invlpg_counts[MAXCPU]; +extern u_long *ipi_invlcache_counts[MAXCPU]; +extern u_long *ipi_rendezvous_counts[MAXCPU]; +#endif + +/* IPI handlers */ +inthand_t + IDTVEC(invltlb), /* TLB shootdowns - global */ + IDTVEC(invlpg), /* TLB shootdowns - 1 page */ + IDTVEC(invlrng), /* TLB shootdowns - page range */ + IDTVEC(invlcache), /* Write back and invalidate cache */ + IDTVEC(ipi_intr_bitmap_handler), /* Bitmap based IPIs */ + IDTVEC(cpustop), /* CPU stops & waits to be restarted */ + IDTVEC(cpususpend), /* CPU suspends & waits to be resumed */ + IDTVEC(rendezvous); /* handle CPU rendezvous */ + +/* functions in x86_mp.c */ +void assign_cpu_ids(void); +void cpu_add(u_int apic_id, char boot_cpu); +void cpustop_handler(void); +void cpususpend_handler(void); +void init_secondary_tail(void); +void invltlb_handler(void); +void invlpg_handler(void); +void invlrng_handler(void); +void invlcache_handler(void); +void init_secondary(void); 
+void ipi_startup(int apic_id, int vector); +void ipi_all_but_self(u_int ipi); +void ipi_bitmap_handler(struct trapframe frame); +void ipi_cpu(int cpu, u_int ipi); +int ipi_nmi_handler(void); +void ipi_selected(cpuset_t cpus, u_int ipi); +u_int mp_bootaddress(u_int); +void set_interrupt_apic_ids(void); +void smp_cache_flush(void); +void smp_masked_invlpg(cpuset_t mask, vm_offset_t addr); +void smp_masked_invlpg_range(cpuset_t mask, vm_offset_t startva, + vm_offset_t endva); +void smp_masked_invltlb(cpuset_t mask, struct pmap *pmap); +void mem_range_AP_init(void); +void topo_probe(void); +void ipi_send_cpu(int cpu, u_int ipi); + +#endif diff -u -r -N usr/src/sys/modules/netmap/x86/x86_var.h /usr/src/sys/modules/netmap/x86/x86_var.h --- usr/src/sys/modules/netmap/x86/x86_var.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/x86_var.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,117 @@ +/*- + * Copyright (c) 1995 Bruce D. Evans. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 3. Neither the name of the author nor the names of contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD: releng/11.0/sys/x86/include/x86_var.h 297857 2016-04-12 13:30:39Z avg $ + */ + +#ifndef _X86_X86_VAR_H_ +#define _X86_X86_VAR_H_ + +/* + * Miscellaneous machine-dependent declarations. 
+ */ + +extern long Maxmem; +extern u_int basemem; +extern int busdma_swi_pending; +extern u_int cpu_exthigh; +extern u_int cpu_feature; +extern u_int cpu_feature2; +extern u_int amd_feature; +extern u_int amd_feature2; +extern u_int amd_pminfo; +extern u_int via_feature_rng; +extern u_int via_feature_xcrypt; +extern u_int cpu_clflush_line_size; +extern u_int cpu_stdext_feature; +extern u_int cpu_stdext_feature2; +extern u_int cpu_fxsr; +extern u_int cpu_high; +extern u_int cpu_id; +extern u_int cpu_max_ext_state_size; +extern u_int cpu_mxcsr_mask; +extern u_int cpu_procinfo; +extern u_int cpu_procinfo2; +extern char cpu_vendor[]; +extern u_int cpu_vendor_id; +extern u_int cpu_mon_mwait_flags; +extern u_int cpu_mon_min_size; +extern u_int cpu_mon_max_size; +extern u_int cpu_maxphyaddr; +extern char ctx_switch_xsave[]; +extern u_int hv_high; +extern char hv_vendor[]; +extern char kstack[]; +extern char sigcode[]; +extern int szsigcode; +extern int vm_page_dump_size; +extern int workaround_erratum383; +extern int _udatasel; +extern int _ucodesel; +extern int _ucode32sel; +extern int _ufssel; +extern int _ugssel; +extern int use_xsave; +extern uint64_t xsave_mask; + +struct pcb; +struct thread; +struct reg; +struct fpreg; +struct dbreg; +struct dumperinfo; + +/* + * The interface type of the interrupt handler entry point cannot be + * expressed in C. Use simplest non-variadic function type as an + * approximation. + */ +typedef void alias_for_inthand_t(void); + +void *alloc_fpusave(int flags); +void busdma_swi(void); +bool cpu_mwait_usable(void); +void cpu_probe_amdc1e(void); +void cpu_setregs(void); +void dump_add_page(vm_paddr_t); +void dump_drop_page(vm_paddr_t); +void identify_cpu(void); +void initializecpu(void); +void initializecpucache(void); +bool fix_cpuid(void); +void fillw(int /*u_short*/ pat, void *base, size_t cnt); +int is_physical_memory(vm_paddr_t addr); +int isa_nmi(int cd); +void panicifcpuunsupported(void); +void pagecopy(void *from, void *to); +void printcpuinfo(void); +int user_dbreg_trap(void); +int minidumpsys(struct dumperinfo *); +struct pcb *get_pcb_td(struct thread *td); + +#endif diff -u -r -N usr/src/sys/modules/netmap/x86/xen/xen-os.h /usr/src/sys/modules/netmap/x86/xen/xen-os.h --- usr/src/sys/modules/netmap/x86/xen/xen-os.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/netmap/x86/xen/xen-os.h 2016-09-29 00:24:55.000000000 +0100 @@ -0,0 +1,38 @@ +/***************************************************************************** + * x86/xen/xen-os.h + * + * Random collection of macros and definition + * + * Copyright (c) 2003, 2004 Keir Fraser (on behalf of the Xen team) + * All rights reserved. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * $FreeBSD: releng/11.0/sys/x86/include/xen/xen-os.h 289686 2015-10-21 10:44:07Z royger $ + */ + +#ifndef _MACHINE_X86_XEN_XEN_OS_H_ +#define _MACHINE_X86_XEN_XEN_OS_H_ + +/* Everything below this point is not included by assembler (.S) files. */ +#ifndef __ASSEMBLY__ + +#endif /* !__ASSEMBLY__ */ + +#endif /* _MACHINE_X86_XEN_XEN_OS_H_ */ diff -u -r -N usr/src/sys/modules/vmm/Makefile /usr/src/sys/modules/vmm/Makefile --- usr/src/sys/modules/vmm/Makefile 2016-09-29 00:24:52.000000000 +0100 +++ /usr/src/sys/modules/vmm/Makefile 2016-11-30 10:56:05.805760000 +0000 @@ -21,6 +21,7 @@ vmm_ioport.c \ vmm_lapic.c \ vmm_mem.c \ + vmm_usermem.c \ vmm_stat.c \ vmm_util.c \ x86.c diff -u -r -N usr/src/sys/modules/vmm/Makefile.orig /usr/src/sys/modules/vmm/Makefile.orig --- usr/src/sys/modules/vmm/Makefile.orig 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/modules/vmm/Makefile.orig 2016-11-30 10:52:59.670208000 +0000 @@ -0,0 +1,79 @@ +# $FreeBSD: releng/11.0/sys/modules/vmm/Makefile 279971 2015-03-14 02:32:08Z neel $ + +KMOD= vmm + +SRCS= opt_acpi.h opt_ddb.h device_if.h bus_if.h pci_if.h +SRCS+= vmx_assym.h svm_assym.h +DPSRCS= vmx_genassym.c svm_genassym.c + +CFLAGS+= -DVMM_KEEP_STATS -DSMP +CFLAGS+= -I${.CURDIR}/../../amd64/vmm +CFLAGS+= -I${.CURDIR}/../../amd64/vmm/io +CFLAGS+= -I${.CURDIR}/../../amd64/vmm/intel +CFLAGS+= -I${.CURDIR}/../../amd64/vmm/amd + +# generic vmm support +.PATH: ${.CURDIR}/../../amd64/vmm +SRCS+= vmm.c \ + vmm_dev.c \ + vmm_host.c \ + vmm_instruction_emul.c \ + vmm_ioport.c \ + vmm_lapic.c \ + vmm_mem.c \ + vmm_stat.c \ + vmm_util.c \ + x86.c + +.PATH: ${.CURDIR}/../../amd64/vmm/io +SRCS+= iommu.c \ + ppt.c \ + vatpic.c \ + vatpit.c \ + vhpet.c \ + vioapic.c \ + vlapic.c \ + vpmtmr.c \ + vrtc.c + +# intel-specific files +.PATH: ${.CURDIR}/../../amd64/vmm/intel +SRCS+= ept.c \ + vmcs.c \ + vmx_msr.c \ + vmx_support.S \ + vmx.c \ + vtd.c + +# amd-specific files +.PATH: ${.CURDIR}/../../amd64/vmm/amd +SRCS+= vmcb.c \ + svm.c \ + svm_support.S \ + npt.c \ + amdv.c \ + svm_msr.c + +CLEANFILES= vmx_assym.h vmx_genassym.o svm_assym.h svm_genassym.o + +vmx_assym.h: vmx_genassym.o + sh ${SYSDIR}/kern/genassym.sh vmx_genassym.o > ${.TARGET} + +svm_assym.h: svm_genassym.o + sh ${SYSDIR}/kern/genassym.sh svm_genassym.o > ${.TARGET} + +vmx_support.o: + ${CC} -c -x assembler-with-cpp -DLOCORE ${CFLAGS} \ + ${.IMPSRC} -o ${.TARGET} + +svm_support.o: + ${CC} -c -x assembler-with-cpp -DLOCORE ${CFLAGS} \ + ${.IMPSRC} -o ${.TARGET} + +vmx_genassym.o: + ${CC} -c ${CFLAGS:N-fno-common} ${.IMPSRC} + +svm_genassym.o: + ${CC} -c ${CFLAGS:N-fno-common} ${.IMPSRC} + +.include <bsd.kmod.mk> diff -u -r -N usr/src/sys/net/netmap.h /usr/src/sys/net/netmap.h --- usr/src/sys/net/netmap.h 2016-09-29 00:24:41.000000000 +0100 +++ /usr/src/sys/net/netmap.h 2016-11-23 16:57:57.857788000 +0000 @@ -25,7 +25,7 @@ */ /* - * $FreeBSD: releng/11.0/sys/net/netmap.h 285349 2015-07-10 05:51:36Z luigi $ + * $FreeBSD: head/sys/net/netmap.h 251139 2013-05-30 14:07:14Z luigi $ * * Definitions of constants and the structures used by the netmap * framework, for the part visible to both kernel and userspace. 
@@ -137,6 +137,26 @@
 * netmap:foo-k the k-th NIC ring pair
 * netmap:foo{k PIPE ring pair k, master side
 * netmap:foo}k PIPE ring pair k, slave side
+ *
+ * Some notes about host rings:
+ *
+ * + The RX host ring is used to store those packets that the host network
+ *   stack is trying to transmit through a NIC queue, but only if that queue
+ *   is currently in netmap mode. Netmap will not intercept host stack mbufs
+ *   designated to NIC queues that are not in netmap mode. As a consequence,
+ *   registering a netmap port with netmap:foo^ is not enough to intercept
+ *   mbufs in the RX host ring; the netmap port should be registered with
+ *   netmap:foo*, or another registration should be done to open at least a
+ *   NIC TX queue in netmap mode.
+ *
+ * + Netmap is not currently able to deal with intercepted transmit mbufs which
+ *   require offloads like TSO, UFO, checksumming, etc. It is the
+ *   responsibility of the user to disable those offloads (e.g. using
+ *   ifconfig on FreeBSD or ethtool -K on Linux) for an interface that is being
+ *   used in netmap mode. If the offloads are not disabled, GSO and/or
+ *   unchecksummed packets may be dropped immediately or end up in the host RX
+ *   ring, and will be dropped as soon as the packet reaches another netmap
+ *   adapter.
 */

/*
@@ -277,7 +297,11 @@
	struct timeval	ts;		/* (k) time of last *sync() */

	/* opaque room for a mutex or similar object */
-	uint8_t		sem[128] __attribute__((__aligned__(NM_CACHE_ALIGN)));
+#if !defined(_WIN32) || defined(__CYGWIN__)
+	uint8_t	__attribute__((__aligned__(NM_CACHE_ALIGN))) sem[128];
+#else
+	uint8_t	__declspec(align(NM_CACHE_ALIGN)) sem[128];
+#endif

	/* the slots follow. This struct has variable size */
	struct netmap_slot slot[0];	/* array of slots. */
@@ -496,6 +520,12 @@
 #define NETMAP_BDG_OFFSET	NETMAP_BDG_VNET_HDR	/* deprecated alias */
 #define NETMAP_BDG_NEWIF	6	/* create a virtual port */
 #define NETMAP_BDG_DELIF	7	/* destroy a virtual port */
+#define NETMAP_PT_HOST_CREATE	8	/* create ptnetmap kthreads */
+#define NETMAP_PT_HOST_DELETE	9	/* delete ptnetmap kthreads */
+#define NETMAP_BDG_POLLING_ON	10	/* start polling kthread */
+#define NETMAP_BDG_POLLING_OFF	11	/* delete polling kthread */
+#define NETMAP_VNET_HDR_GET	12	/* get the port virtio-net-hdr length */
+#define NETMAP_POOLS_INFO_GET	13	/* get memory allocator pools info */

	uint16_t	nr_arg1;	/* reserve extra rings in NIOCREGIF */
 #define NETMAP_BDG_HOST		1	/* attach the host stack on ATTACH */
@@ -521,7 +551,61 @@
 #define NR_ZCOPY_MON	0x400
 /* request exclusive access to the selected rings */
 #define NR_EXCLUSIVE	0x800
+/* request ptnetmap host support */
+#define NR_PASSTHROUGH_HOST	NR_PTNETMAP_HOST	/* deprecated */
+#define NR_PTNETMAP_HOST	0x1000
+#define NR_RX_RINGS_ONLY	0x2000
+#define NR_TX_RINGS_ONLY	0x4000
+/* Applications set this flag if they are able to deal with virtio-net headers,
+ * that is, send/receive frames that start with a virtio-net header.
+ * If not set, NIOCREGIF will fail with netmap ports that require applications
+ * to use those headers. If the flag is set, the application can use the
+ * NETMAP_VNET_HDR_GET command to figure out the header length. */
+#define NR_ACCEPT_VNET_HDR	0x8000
+
+#define NM_BDG_NAME	"vale"	/* prefix for bridge port name */
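
A quick illustration of the host-ring behaviour described in the comment
above: to also capture mbufs coming from the host stack, the port must be
opened with the '*' suffix (or at least one NIC TX queue must be put in
netmap mode). A minimal sketch using the nm_open()/nm_close() helpers from
netmap_user.h; the interface name em0 is only an example:

    #define NETMAP_WITH_LIBS
    #include <net/netmap_user.h>
    #include <err.h>

    int
    main(void)
    {
        /* '*' binds the NIC rings plus the host rings, so host-stack
         * mbufs are intercepted as explained above. */
        struct nm_desc *d = nm_open("netmap:em0*", NULL, 0, NULL);

        if (d == NULL)
            err(1, "nm_open");
        /* ... poll() on d->fd and process the rings here ... */
        nm_close(d);
        return (0);
    }
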
+
+/*
+ * Windows does not have _IOWR(). _IO(), _IOW() and _IOR() are defined
+ * in ws2def.h but not sure if they are in the form we need.
+ * XXX so we redefine them
+ * in a convenient way to use for DeviceIoControl signatures
+ */
+#ifdef _WIN32
+#undef _IO // ws2def.h
+#define _WIN_NM_IOCTL_TYPE 40000
+#define _IO(_c, _n) CTL_CODE(_WIN_NM_IOCTL_TYPE, ((_n) + 0x800) , \
+		METHOD_BUFFERED, FILE_ANY_ACCESS )
+#define _IO_direct(_c, _n) CTL_CODE(_WIN_NM_IOCTL_TYPE, ((_n) + 0x800) , \
+		METHOD_OUT_DIRECT, FILE_ANY_ACCESS )
+
+#define _IOWR(_c, _n, _s) _IO(_c, _n)
+
+/* We have some internal sysctls in addition to the externally visible ones */
+#define NETMAP_MMAP _IO_direct('i', 160)	// note METHOD_OUT_DIRECT
+#define NETMAP_POLL _IO('i', 162)
+
+/* and also two setsockopt for sysctl emulation */
+#define NETMAP_SETSOCKOPT _IO('i', 140)
+#define NETMAP_GETSOCKOPT _IO('i', 141)
+
+
+//These linknames are for the Netmap Core Driver
+#define NETMAP_NT_DEVICE_NAME	L"\\Device\\NETMAP"
+#define NETMAP_DOS_DEVICE_NAME	L"\\DosDevices\\netmap"
+
+//Definition of a structure used to pass a virtual address within an IOCTL
+typedef struct _MEMORY_ENTRY {
+	PVOID pUsermodeVirtualAddress;
+} MEMORY_ENTRY, *PMEMORY_ENTRY;
+
+typedef struct _POLL_REQUEST_DATA {
+	int events;
+	int timeout;
+	int revents;
+} POLL_REQUEST_DATA;
+#endif /* _WIN32 */

 /*
  * FreeBSD uses the size value embedded in the _IOWR to determine
diff -u -r -N usr/src/sys/net/netmap_user.h /usr/src/sys/net/netmap_user.h
--- usr/src/sys/net/netmap_user.h	2016-09-29 00:24:41.000000000 +0100
+++ /usr/src/sys/net/netmap_user.h	2016-11-23 16:57:57.858351000 +0000
@@ -1,5 +1,6 @@
 /*
- * Copyright (C) 2011-2014 Universita` di Pisa. All rights reserved.
+ * Copyright (C) 2011-2016 Universita` di Pisa
+ * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
@@ -25,7 +26,7 @@
  */

 /*
- * $FreeBSD: releng/11.0/sys/net/netmap_user.h 285349 2015-07-10 05:51:36Z luigi $
+ * $FreeBSD$
  *
  * Functions and macros to manipulate netmap structures and packets
  * in userspace. See netmap(4) for more information.
@@ -65,9 +66,31 @@
 #ifndef _NET_NETMAP_USER_H_
 #define _NET_NETMAP_USER_H_

+#define NETMAP_DEVICE_NAME "/dev/netmap"
+
+#ifdef __CYGWIN__
+/*
+ * we can compile userspace apps with either cygwin or msvc,
+ * and we use _WIN32 to identify windows specific code
+ */
+#ifndef _WIN32
+#define _WIN32
+#endif /* _WIN32 */
+
+#endif /* __CYGWIN__ */
+
+#ifdef _WIN32
+#undef NETMAP_DEVICE_NAME
+#define NETMAP_DEVICE_NAME "/proc/sys/DosDevices/Global/netmap"
+#include <windows.h>
+#include <WinDef.h>
+#include <sys/cygwin.h>
+#endif /* _WIN32 */
+
 #include <stdint.h>
 #include <sys/socket.h>	/* apple needs sockaddr */
 #include <net/if.h>	/* IFNAMSIZ */
+#include <ctype.h>

 #ifndef likely
 #define likely(x)	__builtin_expect(!!(x), 1)
@@ -172,17 +195,23 @@
 } while (0)
 #endif

-struct nm_pkthdr { /* same as pcap_pkthdr */
+struct nm_pkthdr { /* first part is the same as pcap_pkthdr */
	struct timeval	ts;
	uint32_t	caplen;
	uint32_t	len;
+
+	uint64_t flags;	/* NM_MORE_PKTS etc */
+#define NM_MORE_PKTS	1
+	struct nm_desc *d;
+	struct netmap_slot *slot;
+	uint8_t *buf;
 };

 struct nm_stat { /* same as pcap_stat */
	u_int	ps_recv;
	u_int	ps_drop;
	u_int	ps_ifdrop;
-#ifdef WIN32
+#ifdef WIN32 /* XXX or _WIN32 ?
*/ u_int bs_capt; #endif /* WIN32 */ }; @@ -284,12 +313,14 @@ * -NN bind individual NIC ring pair * {NN bind master side of pipe NN * }NN bind slave side of pipe NN - * a suffix starting with + and the following flags, + * a suffix starting with / and the following flags, * in any order: * x exclusive access * z zero copy monitor * t monitor tx side * r monitor rx side + * R bind only RX ring(s) + * T bind only TX ring(s) * * req provides the initial values of nmreq before parsing ifname. * Remember that the ifname parsing will override the ring @@ -329,6 +360,13 @@ static int nm_close(struct nm_desc *); /* + * nm_mmap() do mmap or inherit from parent if the nr_arg2 + * (memory block) matches. + */ + +static int nm_mmap(struct nm_desc *, const struct nm_desc *); + +/* * nm_inject() is the same as pcap_inject() * nm_dispatch() is the same as pcap_dispatch() * nm_nextpkt() is the same as pcap_next() @@ -338,13 +376,247 @@ static int nm_dispatch(struct nm_desc *, int, nm_cb_t, u_char *); static u_char *nm_nextpkt(struct nm_desc *, struct nm_pkthdr *); +#ifdef _WIN32 + +intptr_t _get_osfhandle(int); /* defined in io.h in windows */ + +/* + * In windows we do not have yet native poll support, so we keep track + * of file descriptors associated to netmap ports to emulate poll on + * them and fall back on regular poll on other file descriptors. + */ +struct win_netmap_fd_list { + struct win_netmap_fd_list *next; + int win_netmap_fd; + HANDLE win_netmap_handle; +}; + +/* + * list head containing all the netmap opened fd and their + * windows HANDLE counterparts + */ +static struct win_netmap_fd_list *win_netmap_fd_list_head; + +static void +win_insert_fd_record(int fd) +{ + struct win_netmap_fd_list *curr; + + for (curr = win_netmap_fd_list_head; curr; curr = curr->next) { + if (fd == curr->win_netmap_fd) { + return; + } + } + curr = calloc(1, sizeof(*curr)); + curr->next = win_netmap_fd_list_head; + curr->win_netmap_fd = fd; + curr->win_netmap_handle = IntToPtr(_get_osfhandle(fd)); + win_netmap_fd_list_head = curr; +} + +void +win_remove_fd_record(int fd) +{ + struct win_netmap_fd_list *curr = win_netmap_fd_list_head; + struct win_netmap_fd_list *prev = NULL; + for (; curr ; prev = curr, curr = curr->next) { + if (fd != curr->win_netmap_fd) + continue; + /* found the entry */ + if (prev == NULL) { /* we are freeing the first entry */ + win_netmap_fd_list_head = curr->next; + } else { + prev->next = curr->next; + } + free(curr); + break; + } +} + + +HANDLE +win_get_netmap_handle(int fd) +{ + struct win_netmap_fd_list *curr; + + for (curr = win_netmap_fd_list_head; curr; curr = curr->next) { + if (fd == curr->win_netmap_fd) { + return curr->win_netmap_handle; + } + } + return NULL; +} + +/* + * we need to wrap ioctl and mmap, at least for the netmap file descriptors + */ + +/* + * use this function only from netmap_user.h internal functions + * same as ioctl, returns 0 on success and -1 on error + */ +static int +win_nm_ioctl_internal(HANDLE h, int32_t ctlCode, void *arg) +{ + DWORD bReturn = 0, szIn, szOut; + BOOL ioctlReturnStatus; + void *inParam = arg, *outParam = arg; + + switch (ctlCode) { + case NETMAP_POLL: + szIn = sizeof(POLL_REQUEST_DATA); + szOut = sizeof(POLL_REQUEST_DATA); + break; + case NETMAP_MMAP: + szIn = 0; + szOut = sizeof(void*); + inParam = NULL; /* nothing on input */ + break; + case NIOCTXSYNC: + case NIOCRXSYNC: + szIn = 0; + szOut = 0; + break; + case NIOCREGIF: + szIn = sizeof(struct nmreq); + szOut = sizeof(struct nmreq); + break; + case NIOCCONFIG: + D("unsupported 
NIOCCONFIG!"); + return -1; + + default: /* a regular ioctl */ + D("invalid ioctl %x on netmap fd", ctlCode); + return -1; + } + + ioctlReturnStatus = DeviceIoControl(h, + ctlCode, inParam, szIn, + outParam, szOut, + &bReturn, NULL); + // XXX note windows returns 0 on error or async call, 1 on success + // we could call GetLastError() to figure out what happened + return ioctlReturnStatus ? 0 : -1; +} + +/* + * this function is what must be called from user-space programs + * same as ioctl, returns 0 on success and -1 on error + */ +static int +win_nm_ioctl(int fd, int32_t ctlCode, void *arg) +{ + HANDLE h = win_get_netmap_handle(fd); + + if (h == NULL) { + return ioctl(fd, ctlCode, arg); + } else { + return win_nm_ioctl_internal(h, ctlCode, arg); + } +} + +#define ioctl win_nm_ioctl /* from now on, within this file ... */ + +/* + * We cannot use the native mmap on windows + * The only parameter used is "fd", the other ones are just declared to + * make this signature comparable to the FreeBSD/Linux one + */ +static void * +win32_mmap_emulated(void *addr, size_t length, int prot, int flags, int fd, int32_t offset) +{ + HANDLE h = win_get_netmap_handle(fd); + + if (h == NULL) { + return mmap(addr, length, prot, flags, fd, offset); + } else { + MEMORY_ENTRY ret; + + return win_nm_ioctl_internal(h, NETMAP_MMAP, &ret) ? + NULL : ret.pUsermodeVirtualAddress; + } +} + +#define mmap win32_mmap_emulated + +#include <sys/poll.h> /* XXX needed to use the structure pollfd */ + +static int +win_nm_poll(struct pollfd *fds, int nfds, int timeout) +{ + HANDLE h; + + if (nfds != 1 || fds == NULL || (h = win_get_netmap_handle(fds->fd)) == NULL) {; + return poll(fds, nfds, timeout); + } else { + POLL_REQUEST_DATA prd; + + prd.timeout = timeout; + prd.events = fds->events; + + win_nm_ioctl_internal(h, NETMAP_POLL, &prd); + if ((prd.revents == POLLERR) || (prd.revents == STATUS_TIMEOUT)) { + return -1; + } + return 1; + } +} + +#define poll win_nm_poll + +static int +win_nm_open(char* pathname, int flags) +{ + + if (strcmp(pathname, NETMAP_DEVICE_NAME) == 0) { + int fd = open(NETMAP_DEVICE_NAME, O_RDWR); + if (fd < 0) { + return -1; + } + + win_insert_fd_record(fd); + return fd; + } else { + return open(pathname, flags); + } +} + +#define open win_nm_open + +static int +win_nm_close(int fd) +{ + if (fd != -1) { + close(fd); + if (win_get_netmap_handle(fd) != NULL) { + win_remove_fd_record(fd); + } + } + return 0; +} + +#define close win_nm_close + +#endif /* _WIN32 */ + +static int +nm_is_identifier(const char *s, const char *e) +{ + for (; s != e; s++) { + if (!isalnum(*s) && *s != '_') { + return 0; + } + } + + return 1; +} /* * Try to open, return descriptor if successful, NULL otherwise. * An invalid netmap name will return errno = 0; * You can pass a pointer to a pre-filled nm_desc to add special * parameters. Flags is used as follows - * NM_OPEN_NO_MMAP use the memory from arg, only + * NM_OPEN_NO_MMAP use the memory from arg, only XXX avoid mmap * if the nr_arg2 (memory block) matches. 
* NM_OPEN_ARG1 use req.nr_arg1 from arg * NM_OPEN_ARG2 use req.nr_arg2 from arg @@ -359,20 +631,49 @@ u_int namelen; uint32_t nr_ringid = 0, nr_flags, nr_reg; const char *port = NULL; + const char *vpname = NULL; #define MAXERRMSG 80 char errmsg[MAXERRMSG] = ""; - enum { P_START, P_RNGSFXOK, P_GETNUM, P_FLAGS, P_FLAGSOK } p_state; + enum { P_START, P_RNGSFXOK, P_GETNUM, P_FLAGS, P_FLAGSOK, P_MEMID } p_state; + int is_vale; long num; + uint16_t nr_arg2 = 0; - if (strncmp(ifname, "netmap:", 7) && strncmp(ifname, "vale", 4)) { + if (strncmp(ifname, "netmap:", 7) && + strncmp(ifname, NM_BDG_NAME, strlen(NM_BDG_NAME))) { errno = 0; /* name not recognised, not an error */ return NULL; } - if (ifname[0] == 'n') + + is_vale = (ifname[0] == 'v'); + if (is_vale) { + port = index(ifname, ':'); + if (port == NULL) { + snprintf(errmsg, MAXERRMSG, + "missing ':' in vale name"); + goto fail; + } + + if (!nm_is_identifier(ifname + 4, port)) { + snprintf(errmsg, MAXERRMSG, "invalid bridge name"); + goto fail; + } + + vpname = ++port; + } else { ifname += 7; + port = ifname; + } + /* scan for a separator */ - for (port = ifname; *port && !index("-*^{}/", *port); port++) + for (; *port && !index("-*^{}/@", *port); port++) ; + + if (is_vale && !nm_is_identifier(vpname, port)) { + snprintf(errmsg, MAXERRMSG, "invalid bridge port name"); + goto fail; + } + namelen = port - ifname; if (namelen >= sizeof(d->req.nr_name)) { snprintf(errmsg, MAXERRMSG, "name too long"); @@ -407,6 +708,9 @@ case '/': /* start of flags */ p_state = P_FLAGS; break; + case '@': /* start of memid */ + p_state = P_MEMID; + break; default: snprintf(errmsg, MAXERRMSG, "unknown modifier: '%c'", *port); goto fail; @@ -418,6 +722,9 @@ case '/': p_state = P_FLAGS; break; + case '@': + p_state = P_MEMID; + break; default: snprintf(errmsg, MAXERRMSG, "unexpected character: '%c'", *port); goto fail; @@ -436,6 +743,11 @@ break; case P_FLAGS: case P_FLAGSOK: + if (*port == '@') { + port++; + p_state = P_MEMID; + break; + } switch (*port) { case 'x': nr_flags |= NR_EXCLUSIVE; @@ -449,6 +761,12 @@ case 'r': nr_flags |= NR_MONITOR_RX; break; + case 'R': + nr_flags |= NR_RX_RINGS_ONLY; + break; + case 'T': + nr_flags |= NR_TX_RINGS_ONLY; + break; default: snprintf(errmsg, MAXERRMSG, "unrecognized flag: '%c'", *port); goto fail; @@ -456,12 +774,30 @@ port++; p_state = P_FLAGSOK; break; + case P_MEMID: + if (nr_arg2 != 0) { + snprintf(errmsg, MAXERRMSG, "double setting of memid"); + goto fail; + } + num = strtol(port, (char **)&port, 10); + if (num <= 0) { + snprintf(errmsg, MAXERRMSG, "invalid memid %ld, must be >0", num); + goto fail; + } + nr_arg2 = num; + p_state = P_RNGSFXOK; + break; } } if (p_state != P_START && p_state != P_RNGSFXOK && p_state != P_FLAGSOK) { snprintf(errmsg, MAXERRMSG, "unexpected end of port name"); goto fail; } + if ((nr_flags & NR_ZCOPY_MON) && + !(nr_flags & (NR_MONITOR_TX|NR_MONITOR_RX))) { + snprintf(errmsg, MAXERRMSG, "'z' used but neither 'r', nor 't' found"); + goto fail; + } ND("flags: %s %s %s %s", (nr_flags & NR_EXCLUSIVE) ? "EXCLUSIVE" : "", (nr_flags & NR_ZCOPY_MON) ? 
"ZCOPY_MON" : "", @@ -474,7 +810,7 @@ return NULL; } d->self = d; /* set this early so nm_close() works */ - d->fd = open("/dev/netmap", O_RDWR); + d->fd = open(NETMAP_DEVICE_NAME, O_RDWR); if (d->fd < 0) { snprintf(errmsg, MAXERRMSG, "cannot open /dev/netmap: %s", strerror(errno)); goto fail; @@ -487,7 +823,9 @@ /* these fields are overridden by ifname and flags processing */ d->req.nr_ringid |= nr_ringid; - d->req.nr_flags = nr_flags; + d->req.nr_flags |= nr_flags; + if (nr_arg2) + d->req.nr_arg2 = nr_arg2; memcpy(d->req.nr_name, ifname, namelen); d->req.nr_name[namelen] = '\0'; /* optionally import info from parent */ @@ -529,31 +867,10 @@ goto fail; } - if (IS_NETMAP_DESC(parent) && parent->mem && - parent->req.nr_arg2 == d->req.nr_arg2) { - /* do not mmap, inherit from parent */ - d->memsize = parent->memsize; - d->mem = parent->mem; - } else { - /* XXX TODO: check if memsize is too large (or there is overflow) */ - d->memsize = d->req.nr_memsize; - d->mem = mmap(0, d->memsize, PROT_WRITE | PROT_READ, MAP_SHARED, - d->fd, 0); - if (d->mem == MAP_FAILED) { - snprintf(errmsg, MAXERRMSG, "mmap failed: %s", strerror(errno)); - goto fail; - } - d->done_mmap = 1; - } - { - struct netmap_if *nifp = NETMAP_IF(d->mem, d->req.nr_offset); - struct netmap_ring *r = NETMAP_RXRING(nifp, ); - - *(struct netmap_if **)(uintptr_t)&(d->nifp) = nifp; - *(struct netmap_ring **)(uintptr_t)&d->some_ring = r; - *(void **)(uintptr_t)&d->buf_start = NETMAP_BUF(r, 0); - *(void **)(uintptr_t)&d->buf_end = - (char *)d->mem + d->memsize; + /* if parent is defined, do nm_mmap() even if NM_OPEN_NO_MMAP is set */ + if ((!(new_flags & NM_OPEN_NO_MMAP) || parent) && nm_mmap(d, parent)) { + snprintf(errmsg, MAXERRMSG, "mmap failed: %s", strerror(errno)); + goto fail; } nr_reg = d->req.nr_flags & NR_REG_MASK; @@ -626,14 +943,54 @@ return EINVAL; if (d->done_mmap && d->mem) munmap(d->mem, d->memsize); - if (d->fd != -1) + if (d->fd != -1) { close(d->fd); + } + bzero(d, sizeof(*d)); free(d); return 0; } +static int +nm_mmap(struct nm_desc *d, const struct nm_desc *parent) +{ + //XXX TODO: check if mmap is already done + + if (IS_NETMAP_DESC(parent) && parent->mem && + parent->req.nr_arg2 == d->req.nr_arg2) { + /* do not mmap, inherit from parent */ + D("do not mmap, inherit from parent"); + d->memsize = parent->memsize; + d->mem = parent->mem; + } else { + /* XXX TODO: check if memsize is too large (or there is overflow) */ + d->memsize = d->req.nr_memsize; + d->mem = mmap(0, d->memsize, PROT_WRITE | PROT_READ, MAP_SHARED, + d->fd, 0); + if (d->mem == MAP_FAILED) { + goto fail; + } + d->done_mmap = 1; + } + { + struct netmap_if *nifp = NETMAP_IF(d->mem, d->req.nr_offset); + struct netmap_ring *r = NETMAP_RXRING(nifp, ); + + *(struct netmap_if **)(uintptr_t)&(d->nifp) = nifp; + *(struct netmap_ring **)(uintptr_t)&d->some_ring = r; + *(void **)(uintptr_t)&d->buf_start = NETMAP_BUF(r, 0); + *(void **)(uintptr_t)&d->buf_end = + (char *)d->mem + d->memsize; + } + + return 0; + +fail: + return EINVAL; +} + /* * Same prototype as pcap_inject(), only need to cast. 
*/ @@ -674,6 +1031,9 @@ { int n = d->last_rx_ring - d->first_rx_ring + 1; int c, got = 0, ri = d->cur_rx_ring; + d->hdr.buf = NULL; + d->hdr.flags = NM_MORE_PKTS; + d->hdr.d = d; if (cnt == 0) cnt = -1; @@ -690,17 +1050,24 @@ ri = d->first_rx_ring; ring = NETMAP_RXRING(d->nifp, ri); for ( ; !nm_ring_empty(ring) && cnt != got; got++) { - u_int i = ring->cur; - u_int idx = ring->slot[i].buf_idx; - u_char *buf = (u_char *)NETMAP_BUF(ring, idx); - + u_int idx, i; + if (d->hdr.buf) { /* from previous round */ + cb(arg, &d->hdr, d->hdr.buf); + } + i = ring->cur; + idx = ring->slot[i].buf_idx; + d->hdr.slot = &ring->slot[i]; + d->hdr.buf = (u_char *)NETMAP_BUF(ring, idx); // __builtin_prefetch(buf); d->hdr.len = d->hdr.caplen = ring->slot[i].len; d->hdr.ts = ring->ts; - cb(arg, &d->hdr, buf); ring->head = ring->cur = nm_ring_next(ring, i); } } + if (d->hdr.buf) { /* from previous round */ + d->hdr.flags = 0; + cb(arg, &d->hdr, d->hdr.buf); + } d->cur_rx_ring = ri; return got; } diff -u -r -N usr/src/sys/net/netmap_virt.h /usr/src/sys/net/netmap_virt.h --- usr/src/sys/net/netmap_virt.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/sys/net/netmap_virt.h 2016-11-23 16:57:57.858924000 +0000 @@ -0,0 +1,305 @@ +/* + * Copyright (C) 2013-2016 Luigi Rizzo + * Copyright (C) 2013-2016 Giuseppe Lettieri + * Copyright (C) 2013-2016 Vincenzo Maffione + * Copyright (C) 2015 Stefano Garzarella + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD$ + */ + +#ifndef NETMAP_VIRT_H +#define NETMAP_VIRT_H + +/* + * ptnetmap_memdev: device used to expose memory into the guest VM + * + * These macros are used in the hypervisor frontend (QEMU, bhyve) and in the + * guest device driver. + */ + +/* PCI identifiers and PCI BARs for the ptnetmap memdev + * and ptnetmap network interface. 
*/
+#define PTNETMAP_MEMDEV_NAME	"ptnetmap-memdev"
+#define PTNETMAP_PCI_VENDOR_ID	0x1b36	/* QEMU virtual devices */
+#define PTNETMAP_PCI_DEVICE_ID	0x000c	/* memory device */
+#define PTNETMAP_PCI_NETIF_ID	0x000d	/* ptnet network interface */
+#define PTNETMAP_IO_PCI_BAR	0
+#define PTNETMAP_MEM_PCI_BAR	1
+#define PTNETMAP_MSIX_PCI_BAR	2
+
+/* Registers for the ptnetmap memdev */
+#define PTNET_MDEV_IO_MEMSIZE_LO	0	/* netmap memory size (low) */
+#define PTNET_MDEV_IO_MEMSIZE_HI	4	/* netmap_memory_size (high) */
+#define PTNET_MDEV_IO_MEMID		8	/* memory allocator ID in the host */
+#define PTNET_MDEV_IO_IF_POOL_OFS	64
+#define PTNET_MDEV_IO_IF_POOL_OBJNUM	68
+#define PTNET_MDEV_IO_IF_POOL_OBJSZ	72
+#define PTNET_MDEV_IO_RING_POOL_OFS	76
+#define PTNET_MDEV_IO_RING_POOL_OBJNUM	80
+#define PTNET_MDEV_IO_RING_POOL_OBJSZ	84
+#define PTNET_MDEV_IO_BUF_POOL_OFS	88
+#define PTNET_MDEV_IO_BUF_POOL_OBJNUM	92
+#define PTNET_MDEV_IO_BUF_POOL_OBJSZ	96
+#define PTNET_MDEV_IO_END		100
+
+/*
+ * ptnetmap configuration
+ *
+ * The ptnet kthreads (running in host kernel-space) need to be configured
+ * in order to know how to intercept guest kicks (I/O register writes) and
+ * how to inject MSI-X interrupts to the guest. The configuration may vary
+ * depending on the hypervisor. Currently, we support QEMU/KVM on Linux
+ * and bhyve on FreeBSD.
+ * The configuration is passed by the hypervisor to the host netmap module
+ * by means of an ioctl() with nr_cmd=NETMAP_PT_HOST_CREATE, and it is
+ * specified by the ptnetmap_cfg struct. This struct contains a header
+ * with general information and an array of entries whose size depends
+ * on the hypervisor. The NETMAP_PT_HOST_CREATE command is issued every
+ * time the kthreads are started.
+ */
+struct ptnetmap_cfg {
+#define PTNETMAP_CFGTYPE_QEMU	0x1
+#define PTNETMAP_CFGTYPE_BHYVE	0x2
+	uint16_t cfgtype;	/* how to interpret the cfg entries */
+	uint16_t entry_size;	/* size of a config entry */
+	uint32_t num_rings;	/* number of config entries */
+	void *ptrings;		/* ptrings inside CSB */
+	/* Configuration entries are allocated right after the struct. */
+};
+
+/* Configuration of a ptnetmap ring for QEMU. */
+struct ptnetmap_cfgentry_qemu {
+	uint32_t ioeventfd;	/* to intercept guest register access */
+	uint32_t irqfd;		/* to inject guest interrupts */
+};
+
+/* Configuration of a ptnetmap ring for bhyve. */
+struct ptnetmap_cfgentry_bhyve {
+	uint64_t wchan;		/* tsleep() parameter, to wake up kthread */
+	uint32_t ioctl_fd;	/* ioctl fd */
+	/* ioctl parameters to send irq */
+	uint32_t ioctl_cmd;
+	/* vmm.ko MSIX parameters for IOCTL */
+	struct {
+		uint64_t msg_data;
+		uint64_t addr;
+	} ioctl_data;
+};
+
+/*
+ * Structure filled-in by the kernel when asked for allocator info
+ * through NETMAP_POOLS_INFO_GET. Used by hypervisors supporting
+ * ptnetmap.
+ */
+struct netmap_pools_info {
+	uint64_t memsize;	/* same as nmr->nr_memsize */
+	uint32_t memid;		/* same as nmr->nr_arg2 */
+	uint32_t if_pool_offset;
+	uint32_t if_pool_objtotal;
+	uint32_t if_pool_objsize;
+	uint32_t ring_pool_offset;
+	uint32_t ring_pool_objtotal;
+	uint32_t ring_pool_objsize;
+	uint32_t buf_pool_offset;
+	uint32_t buf_pool_objtotal;
+	uint32_t buf_pool_objsize;
+};
+
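
A hypervisor typically fills this structure via the NETMAP_POOLS_INFO_GET
command defined earlier in netmap.h. A hedged sketch of the call sequence,
assuming the command travels in nr_cmd of a NIOCREGIF request with the buffer
pointer stashed through the nmreq_pointer_put() helper defined just below
(nmd_fd is an already-open /dev/netmap descriptor; error handling omitted):

    struct netmap_pools_info pi;
    struct nmreq req;

    memset(&req, 0, sizeof(req));
    strlcpy(req.nr_name, "vale0:1", sizeof(req.nr_name)); /* example port */
    req.nr_version = NETMAP_API;
    req.nr_cmd = NETMAP_POOLS_INFO_GET;
    nmreq_pointer_put(&req, &pi);
    if (ioctl(nmd_fd, NIOCREGIF, &req) == 0)
        printf("memid %u memsize %ju\n", pi.memid, (uintmax_t)pi.memsize);
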
+/*
+ * Pass a pointer to a userspace buffer to be passed to kernelspace for write
+ * or read. Used by NETMAP_PT_HOST_CREATE and NETMAP_POOLS_INFO_GET.
+ */
+static inline void
+nmreq_pointer_put(struct nmreq *nmr, void *userptr)
+{
+	uintptr_t *pp = (uintptr_t *)&nmr->nr_arg1;
+	*pp = (uintptr_t)userptr;
+}
+
+/* ptnetmap features */
+#define PTNETMAP_F_VNET_HDR	1
+
+/* I/O registers for the ptnet device. */
+#define PTNET_IO_PTFEAT		0
+#define PTNET_IO_PTCTL		4
+#define PTNET_IO_MAC_LO		8
+#define PTNET_IO_MAC_HI		12
+#define PTNET_IO_CSBBAH		16
+#define PTNET_IO_CSBBAL		20
+#define PTNET_IO_NIFP_OFS	24
+#define PTNET_IO_NUM_TX_RINGS	28
+#define PTNET_IO_NUM_RX_RINGS	32
+#define PTNET_IO_NUM_TX_SLOTS	36
+#define PTNET_IO_NUM_RX_SLOTS	40
+#define PTNET_IO_VNET_HDR_LEN	44
+#define PTNET_IO_HOSTMEMID	48
+#define PTNET_IO_END		52
+#define PTNET_IO_KICK_BASE	128
+#define PTNET_IO_MASK		0xff
+
+/* ptnetmap control commands (values for PTCTL register) */
+#define PTNETMAP_PTCTL_CREATE	1
+#define PTNETMAP_PTCTL_DELETE	2
+
+/* If defined, CSB is allocated by the guest, not by the host. */
+#define PTNET_CSB_ALLOC
+
+/* ptnetmap ring fields shared between guest and host */
+struct ptnet_ring {
+	/* XXX revise the layout to minimize cache bounces. */
+	uint32_t head;		  /* GW+ HR+ the head of the guest netmap_ring */
+	uint32_t cur;		  /* GW+ HR+ the cur of the guest netmap_ring */
+	uint32_t guest_need_kick; /* GW+ HR+ host-->guest notification enable */
+	uint32_t sync_flags;	  /* GW+ HR+ the flags of the guest [tx|rx]sync() */
+	uint32_t hwcur;		  /* GR+ HW+ the hwcur of the host netmap_kring */
+	uint32_t hwtail;	  /* GR+ HW+ the hwtail of the host netmap_kring */
+	uint32_t host_need_kick;  /* GR+ HW+ guest-->host notification enable */
+	char pad[4];
+};
+
+/* CSB for the ptnet device. */
+struct ptnet_csb {
+#define NETMAP_VIRT_CSB_SIZE 4096
+	struct ptnet_ring rings[NETMAP_VIRT_CSB_SIZE/sizeof(struct ptnet_ring)];
+};
+
+#ifdef WITH_PTNETMAP_GUEST
+
+/* ptnetmap_memdev routines used to talk with ptnetmap_memdev device driver */
+struct ptnetmap_memdev;
+int nm_os_pt_memdev_iomap(struct ptnetmap_memdev *, vm_paddr_t *, void **,
+			  uint64_t *);
+void nm_os_pt_memdev_iounmap(struct ptnetmap_memdev *);
+uint32_t nm_os_pt_memdev_ioread(struct ptnetmap_memdev *, unsigned int);
+
+/* Guest driver: Write kring pointers (cur, head) to the CSB.
+ * This routine is coupled with ptnetmap_host_read_kring_csb(). */
+static inline void
+ptnetmap_guest_write_kring_csb(struct ptnet_ring *ptr, uint32_t cur,
+			       uint32_t head)
+{
+	/*
+	 * We need to write cur and head to the CSB but we cannot do it atomically.
+	 * There is no way we can prevent the host from reading the updated value
+	 * of one of the two and the old value of the other. However, if we make
+	 * sure that the host never reads a value of head more recent than the
+	 * value of cur we are safe. We can allow the host to read a value of cur
+	 * more recent than the value of head, since in the netmap ring cur can be
+	 * ahead of head, and cur cannot wrap around head because it must be behind
+	 * tail. Inverting the order of the writes below could instead lead the
+	 * host to think head went ahead of cur, which would cause the sync
+	 * prologue to fail.
+	 *
+	 * The following memory barrier scheme is used to make this happen:
+	 *
+	 *          Guest                Host
+	 *
+	 *          STORE(cur)           LOAD(head)
+	 *          mb() <-----------> mb()
+	 *          STORE(head)          LOAD(cur)
+	 */
+	ptr->cur = cur;
+	mb();
+	ptr->head = head;
+}
+
+/* Guest driver: Read kring pointers (hwcur, hwtail) from the CSB.
+ * This routine is coupled with ptnetmap_host_write_kring_csb().
*/ +static inline void +ptnetmap_guest_read_kring_csb(struct ptnet_ring *ptr, struct netmap_kring *kring) +{ + /* + * We place a memory barrier to make sure that the update of hwtail never + * overtakes the update of hwcur. + * (see explanation in ptnetmap_host_write_kring_csb). + */ + kring->nr_hwtail = ptr->hwtail; + mb(); + kring->nr_hwcur = ptr->hwcur; +} + +#endif /* WITH_PTNETMAP_GUEST */ + +#ifdef WITH_PTNETMAP_HOST +/* + * ptnetmap kernel thread routines + * */ + +/* Functions to read and write CSB fields in the host */ +#if defined (linux) +#define CSB_READ(csb, field, r) (get_user(r, &csb->field)) +#define CSB_WRITE(csb, field, v) (put_user(v, &csb->field)) +#else /* ! linux */ +#define CSB_READ(csb, field, r) (r = fuword32(&csb->field)) +#define CSB_WRITE(csb, field, v) (suword32(&csb->field, v)) +#endif /* ! linux */ + +/* Host netmap: Write kring pointers (hwcur, hwtail) to the CSB. + * This routine is coupled with ptnetmap_guest_read_kring_csb(). */ +static inline void +ptnetmap_host_write_kring_csb(struct ptnet_ring __user *ptr, uint32_t hwcur, + uint32_t hwtail) +{ + /* + * The same scheme used in ptnetmap_guest_write_kring_csb() applies here. + * We allow the guest to read a value of hwcur more recent than the value + * of hwtail, since this would anyway result in a consistent view of the + * ring state (and hwcur can never wraparound hwtail, since hwcur must be + * behind head). + * + * The following memory barrier scheme is used to make this happen: + * + * Guest Host + * + * STORE(hwcur) LOAD(hwtail) + * mb() <-------------> mb() + * STORE(hwtail) LOAD(hwcur) + */ + CSB_WRITE(ptr, hwcur, hwcur); + mb(); + CSB_WRITE(ptr, hwtail, hwtail); +} + +/* Host netmap: Read kring pointers (head, cur, sync_flags) from the CSB. + * This routine is coupled with ptnetmap_guest_write_kring_csb(). */ +static inline void +ptnetmap_host_read_kring_csb(struct ptnet_ring __user *ptr, + struct netmap_ring *shadow_ring, + uint32_t num_slots) +{ + /* + * We place a memory barrier to make sure that the update of head never + * overtakes the update of cur. + * (see explanation in ptnetmap_guest_write_kring_csb). 
+ */ + CSB_READ(ptr, head, shadow_ring->head); + mb(); + CSB_READ(ptr, cur, shadow_ring->cur); + CSB_READ(ptr, sync_flags, shadow_ring->flags); +} + +#endif /* WITH_PTNETMAP_HOST */ + +#endif /* NETMAP_VIRT_H */ diff -u -r -N usr/src/usr/src/.arcconfig /usr/src/usr/src/.arcconfig --- usr/src/usr/src/.arcconfig 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/usr/src/.arcconfig 2016-09-29 00:26:36.000000000 +0100 @@ -0,0 +1,5 @@ +{ + "repository.callsign" : "S", + "phabricator.uri" : "https://reviews.freebsd.org/", + "history.immutable" : true +} diff -u -r -N usr/src/usr/src/.arclint /usr/src/usr/src/.arclint --- usr/src/usr/src/.arclint 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/usr/src/.arclint 2016-09-29 00:26:36.000000000 +0100 @@ -0,0 +1,25 @@ +{ + "exclude": "(contrib|crypto)", + "linters": { + "python": { + "type": "pep8", + "include": "(\\.py$)" + }, + "spelling": { + "type": "spelling" + }, + "chmod": { + "type": "chmod" + }, + "merge-conflict": { + "type": "merge-conflict" + }, + "filename": { + "type": "filename" + }, + "json": { + "type": "json", + "include": "(\\.arclint|\\.json$)" + } + } +} diff -u -r -N usr/src/usr.sbin/bhyve/Makefile /usr/src/usr.sbin/bhyve/Makefile --- usr/src/usr.sbin/bhyve/Makefile 2016-09-29 00:25:07.000000000 +0100 +++ /usr/src/usr.sbin/bhyve/Makefile 2016-11-30 10:56:05.807250000 +0000 @@ -27,6 +27,8 @@ mem.c \ mevent.c \ mptbl.c \ + net_backends.c \ + net_utils.c \ pci_ahci.c \ pci_emul.c \ pci_fbuf.c \ @@ -34,6 +36,8 @@ pci_irq.c \ pci_lpc.c \ pci_passthru.c \ + pci_ptnetmap_memdev.c \ + pci_ptnetmap_netif.c \ pci_virtio_block.c \ pci_virtio_net.c \ pci_virtio_rnd.c \ @@ -62,6 +66,8 @@ LIBADD= vmmapi md pthread z CFLAGS+= -I${BHYVE_SYSDIR}/sys/dev/usb/controller +CFLAGS+= -I${BHYVE_SYSDIR}/sys/ +CFLAGS+= -DWITH_NETMAP WARNS?= 2 diff -u -r -N usr/src/usr.sbin/bhyve/Makefile.orig /usr/src/usr.sbin/bhyve/Makefile.orig --- usr/src/usr.sbin/bhyve/Makefile.orig 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/usr.sbin/bhyve/Makefile.orig 2016-11-30 10:52:59.883164000 +0000 @@ -0,0 +1,68 @@ +# +# $FreeBSD: releng/11.0/usr.sbin/bhyve/Makefile 302332 2016-07-04 03:19:06Z grehan $ +# + +PROG= bhyve +PACKAGE= bhyve + +DEBUG_FLAGS= -g -O0 + +MAN= bhyve.8 + +BHYVE_SYSDIR?=${SRCTOP} + +SRCS= \ + atkbdc.c \ + acpi.c \ + bhyvegc.c \ + bhyverun.c \ + block_if.c \ + bootrom.c \ + console.c \ + consport.c \ + dbgport.c \ + fwctl.c \ + inout.c \ + ioapic.c \ + mem.c \ + mevent.c \ + mptbl.c \ + pci_ahci.c \ + pci_emul.c \ + pci_fbuf.c \ + pci_hostbridge.c \ + pci_irq.c \ + pci_lpc.c \ + pci_passthru.c \ + pci_virtio_block.c \ + pci_virtio_net.c \ + pci_virtio_rnd.c \ + pci_uart.c \ + pci_xhci.c \ + pm.c \ + post.c \ + ps2kbd.c \ + ps2mouse.c \ + rfb.c \ + rtc.c \ + smbiostbl.c \ + sockstream.c \ + task_switch.c \ + uart_emul.c \ + usb_emul.c \ + usb_mouse.c \ + virtio.c \ + vga.c \ + xmsr.c \ + spinup_ap.c + +.PATH: ${BHYVE_SYSDIR}/sys/amd64/vmm +SRCS+= vmm_instruction_emul.c + +LIBADD= vmmapi md pthread z + +CFLAGS+= -I${BHYVE_SYSDIR}/sys/dev/usb/controller + +WARNS?= 2 + +.include <bsd.prog.mk> diff -u -r -N usr/src/usr.sbin/bhyve/net_backends.c /usr/src/usr.sbin/bhyve/net_backends.c --- usr/src/usr.sbin/bhyve/net_backends.c 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/usr.sbin/bhyve/net_backends.c 2016-12-01 11:13:54.812679000 +0000 @@ -0,0 +1,1082 @@ +/*- + * Copyright (c) 2014-2016 Vincenzo Maffione + * All rights reserved. 
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS
+ * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY,
+ * OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT
+ * OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
+ * OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
+ * EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/*
+ * This file implements multiple network backends (null, tap, netmap, ...),
+ * to be used by network frontends such as virtio-net and ptnet.
+ * The API to access the backend (e.g. send/receive packets, negotiate
+ * features) is exported by net_backends.h.
+ */
+
+#include <sys/cdefs.h>
+#include <sys/uio.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <sys/types.h>	/* u_short etc */
+#include <net/if.h>
+
+#include <errno.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <string.h>
+#include <unistd.h>
+#include <assert.h>
+#include <pthread.h>
+#include <pthread_np.h>
+#include <poll.h>
+#include <assert.h>
+
+#include "mevent.h"
+#include "net_backends.h"
+
+#include <sys/linker_set.h>
+
+/*
+ * Each network backend registers a set of function pointers that are
+ * used to implement the net backends API.
+ * This might need to be exposed if we implement backends in separate files.
+ */
+struct net_backend {
+	const char *name;	/* name of the backend */
+	/*
+	 * The init and cleanup functions are used internally;
+	 * virtio-net should never use them.
+	 */
+	int (*init)(struct net_backend *be, const char *devname,
+	    net_backend_cb_t cb, void *param);
+	void (*cleanup)(struct net_backend *be);
+
+
+	/*
+	 * Called to serve a guest transmit request. The scatter-gather
+	 * vector provided by the caller has 'iovcnt' elements and contains
+	 * the packet to send. 'len' is the length of the whole packet in bytes.
+	 */
+	int (*send)(struct net_backend *be, struct iovec *iov,
+	    int iovcnt, uint32_t len, int more);
+
+	/*
+	 * Called to serve a guest receive request. When the function
+	 * returns a positive value, the scatter-gather vector
+	 * provided by the caller (having 'iovcnt' elements in it) will
+	 * contain a chunk of the received packet. The 'more' flag will
+	 * be set if the returned chunk is not the last one for the current
+	 * packet, and cleared otherwise. The function returns the chunk size
+	 * in bytes, or 0 if the backend doesn't have a new packet to
+	 * receive.
+	 * Note that it may be necessary to call this callback many
+	 * times to receive a single packet, depending on how big the
+	 * buffers you provide are.
+ */ + int (*recv)(struct net_backend *be, struct iovec *iov, int iovcnt); + + /* + * Ask the backend for the virtio-net features it is able to + * support. Possible features are TSO, UFO and checksum offloading + * in both rx and tx direction and for both IPv4 and IPv6. + */ + uint64_t (*get_cap)(struct net_backend *be); + + /* + * Tell the backend to enable/disable the specified virtio-net + * features (capabilities). + */ + int (*set_cap)(struct net_backend *be, uint64_t features, + unsigned int vnet_hdr_len); + + struct pci_vtnet_softc *sc; + int fd; + unsigned int be_vnet_hdr_len; + unsigned int fe_vnet_hdr_len; + void *priv; /* Pointer to backend-specific data. */ +}; + +SET_DECLARE(net_backend_s, struct net_backend); + +#define VNET_HDR_LEN sizeof(struct virtio_net_rxhdr) + +#define WPRINTF(params) printf params + +/* the null backend */ +static int +netbe_null_init(struct net_backend *be, const char *devname, + net_backend_cb_t cb, void *param) +{ + (void)devname; (void)cb; (void)param; + be->fd = -1; + return 0; +} + +static void +netbe_null_cleanup(struct net_backend *be) +{ + (void)be; +} + +static uint64_t +netbe_null_get_cap(struct net_backend *be) +{ + (void)be; + return 0; +} + +static int +netbe_null_set_cap(struct net_backend *be, uint64_t features, + unsigned vnet_hdr_len) +{ + (void)be; (void)features; (void)vnet_hdr_len; + return 0; +} + +static int +netbe_null_send(struct net_backend *be, struct iovec *iov, + int iovcnt, uint32_t len, int more) +{ + (void)be; (void)iov; (void)iovcnt; (void)len; (void)more; + return 0; /* pretend we send */ +} + +static int +netbe_null_recv(struct net_backend *be, struct iovec *iov, int iovcnt) +{ + (void)be; (void)iov; (void)iovcnt; + fprintf(stderr, "netbe_null_recv called ?\n"); + return -1; /* never called, i believe */ +} + +static struct net_backend n_be = { + .name = "null", + .init = netbe_null_init, + .cleanup = netbe_null_cleanup, + .send = netbe_null_send, + .recv = netbe_null_recv, + .get_cap = netbe_null_get_cap, + .set_cap = netbe_null_set_cap, +}; + +DATA_SET(net_backend_s, n_be); + + +/* the tap backend */ + +struct tap_priv { + struct mevent *mevp; +}; + +static void +tap_cleanup(struct net_backend *be) +{ + struct tap_priv *priv = be->priv; + + if (be->priv) { + mevent_delete(priv->mevp); + free(be->priv); + be->priv = NULL; + } + if (be->fd != -1) { + close(be->fd); + be->fd = -1; + } +} + +static int +tap_init(struct net_backend *be, const char *devname, + net_backend_cb_t cb, void *param) +{ + char tbuf[80]; + int fd; + int opt = 1; + struct tap_priv *priv; + + if (cb == NULL) { + WPRINTF(("TAP backend requires non-NULL callback\n")); + return -1; + } + + priv = calloc(1, sizeof(struct tap_priv)); + if (priv == NULL) { + WPRINTF(("tap_priv alloc failed\n")); + return -1; + } + + strcpy(tbuf, "/dev/"); + strlcat(tbuf, devname, sizeof(tbuf)); + + fd = open(tbuf, O_RDWR); + if (fd == -1) { + WPRINTF(("open of tap device %s failed\n", tbuf)); + goto error; + } + + /* + * Set non-blocking and register for read + * notifications with the event loop + */ + if (ioctl(fd, FIONBIO, &opt) < 0) { + WPRINTF(("tap device O_NONBLOCK failed\n")); + goto error; + } + + priv->mevp = mevent_add(fd, EVF_READ, cb, param); + if (priv->mevp == NULL) { + WPRINTF(("Could not register event\n")); + goto error; + } + + be->fd = fd; + be->priv = priv; + + return 0; + +error: + tap_cleanup(be); + return -1; +} + +/* + * Called to send a buffer chain out to the tap device + */ +static int +tap_send(struct net_backend *be, struct iovec *iov, int 
+    int more)
+{
+	static char pad[60]; /* all zero bytes */
+
+	(void)more;
+	/*
+	 * If the length is < 60, pad out to that and add the
+	 * extra zero'd segment to the iov. It is guaranteed that
+	 * there is always an extra iov available by the caller.
+	 */
+	if (len < 60) {
+		iov[iovcnt].iov_base = pad;
+		iov[iovcnt].iov_len = (size_t)(60 - len);
+		iovcnt++;
+	}
+
+	return (int)writev(be->fd, iov, iovcnt);
+}
+
+static int
+tap_recv(struct net_backend *be, struct iovec *iov, int iovcnt)
+{
+	int ret;
+
+	/* Should never be called without a valid tap fd */
+	assert(be->fd != -1);
+
+	ret = (int)readv(be->fd, iov, iovcnt);
+
+	if (ret < 0 && errno == EWOULDBLOCK) {
+		return 0;
+	}
+
+	return ret;
+}
+
+static uint64_t
+tap_get_cap(struct net_backend *be)
+{
+	(void)be;
+	return 0; /* nothing extra */
+}
+
+static int
+tap_set_cap(struct net_backend *be, uint64_t features,
+    unsigned vnet_hdr_len)
+{
+	(void)be;
+	return (features || vnet_hdr_len) ? -1 : 0;
+}
+
+static struct net_backend tap_backend = {
+	.name = "tap|vmmnet",
+	.init = tap_init,
+	.cleanup = tap_cleanup,
+	.send = tap_send,
+	.recv = tap_recv,
+	.get_cap = tap_get_cap,
+	.set_cap = tap_set_cap,
+};
+
+DATA_SET(net_backend_s, tap_backend);
+
+#ifdef WITH_NETMAP
+
+/*
+ * The netmap backend
+ */
+
+/* The virtio-net features supported by netmap. */
+#define NETMAP_FEATURES (VIRTIO_NET_F_CSUM | VIRTIO_NET_F_HOST_TSO4 | \
+		VIRTIO_NET_F_HOST_TSO6 | VIRTIO_NET_F_HOST_UFO | \
+		VIRTIO_NET_F_GUEST_CSUM | VIRTIO_NET_F_GUEST_TSO4 | \
+		VIRTIO_NET_F_GUEST_TSO6 | VIRTIO_NET_F_GUEST_UFO)
+
+#define NETMAP_POLLMASK (POLLIN | POLLRDNORM | POLLRDBAND)
+
+struct netmap_priv {
+	char ifname[IFNAMSIZ];
+	struct nm_desc *nmd;
+	uint16_t memid;
+	struct netmap_ring *rx;
+	struct netmap_ring *tx;
+	pthread_t evloop_tid;
+	net_backend_cb_t cb;
+	void *cb_param;
+
+	struct ptnetmap_state ptnetmap;
+};
+
+static void *
+netmap_evloop_thread(void *param)
+{
+	struct net_backend *be = param;
+	struct netmap_priv *priv = be->priv;
+	struct pollfd pfd;
+	int ret;
+
+	for (;;) {
+		pfd.fd = be->fd;
+		pfd.events = NETMAP_POLLMASK;
+		ret = poll(&pfd, 1, INFTIM);
+		if (ret == -1 && errno != EINTR) {
+			WPRINTF(("netmap poll failed, %d\n", errno));
+		} else if (ret == 1 && (pfd.revents & NETMAP_POLLMASK)) {
+			priv->cb(pfd.fd, EVF_READ, priv->cb_param);
+		}
+	}
+
+	return NULL;
+}
+
+static void
+nmreq_init(struct nmreq *req, char *ifname)
+{
+	memset(req, 0, sizeof(*req));
+	strncpy(req->nr_name, ifname, sizeof(req->nr_name));
+	req->nr_version = NETMAP_API;
+}
+
+static int
+netmap_set_vnet_hdr_len(struct net_backend *be, int vnet_hdr_len)
+{
+	int err;
+	struct nmreq req;
+	struct netmap_priv *priv = be->priv;
+
+	nmreq_init(&req, priv->ifname);
+	req.nr_cmd = NETMAP_BDG_VNET_HDR;
+	req.nr_arg1 = vnet_hdr_len;
+	err = ioctl(be->fd, NIOCREGIF, &req);
+	if (err) {
+		WPRINTF(("Unable to set vnet header length %d\n",
+		    vnet_hdr_len));
+		return err;
+	}
+
+	be->be_vnet_hdr_len = vnet_hdr_len;
+
+	return 0;
+}
+
+static int
+netmap_has_vnet_hdr_len(struct net_backend *be, unsigned vnet_hdr_len)
+{
+	int prev_hdr_len = be->be_vnet_hdr_len;
+	int ret;
+
+	if (vnet_hdr_len == prev_hdr_len) {
+		return 1;
+	}
+
+	ret = netmap_set_vnet_hdr_len(be, vnet_hdr_len);
+	if (ret) {
+		return 0;
+	}
+
+	netmap_set_vnet_hdr_len(be, prev_hdr_len);
+
+	return 1;
+}
+
+static uint64_t
+netmap_get_cap(struct net_backend *be)
+{
+	return netmap_has_vnet_hdr_len(be, VNET_HDR_LEN) ?
+	    NETMAP_FEATURES : 0;
+}
+
+static int
+netmap_set_cap(struct net_backend *be, uint64_t features,
+    unsigned vnet_hdr_len)
+{
+	return netmap_set_vnet_hdr_len(be, vnet_hdr_len);
+}
+
+/* Store and return the features we agreed upon. */
+uint32_t
+ptnetmap_ack_features(struct ptnetmap_state *ptn, uint32_t wanted_features)
+{
+	ptn->acked_features = ptn->features & wanted_features;
+
+	return ptn->acked_features;
+}
+
+struct ptnetmap_state *
+get_ptnetmap(struct net_backend *be)
+{
+	struct netmap_priv *priv = be ? be->priv : NULL;
+	struct netmap_pools_info pi;
+	struct nmreq req;
+	int err;
+
+	/* Check that this is a ptnetmap backend. */
+	if (!be || be->set_cap != netmap_set_cap ||
+	    !(priv->nmd->req.nr_flags & NR_PTNETMAP_HOST)) {
+		return NULL;
+	}
+
+	nmreq_init(&req, priv->ifname);
+	req.nr_cmd = NETMAP_POOLS_INFO_GET;
+	nmreq_pointer_put(&req, &pi);
+	err = ioctl(priv->nmd->fd, NIOCREGIF, &req);
+	if (err) {
+		return NULL;
+	}
+
+	err = ptn_memdev_attach(priv->nmd->mem, &pi);
+	if (err) {
+		return NULL;
+	}
+
+	return &priv->ptnetmap;
+}
+
+int
+ptnetmap_get_netmap_if(struct ptnetmap_state *ptn, struct netmap_if_info *nif)
+{
+	struct netmap_priv *priv = ptn->netmap_priv;
+
+	memset(nif, 0, sizeof(*nif));
+	if (priv->nmd == NULL) {
+		return EINVAL;
+	}
+
+	nif->nifp_offset = priv->nmd->req.nr_offset;
+	nif->num_tx_rings = priv->nmd->req.nr_tx_rings;
+	nif->num_rx_rings = priv->nmd->req.nr_rx_rings;
+	nif->num_tx_slots = priv->nmd->req.nr_tx_slots;
+	nif->num_rx_slots = priv->nmd->req.nr_rx_slots;
+
+	return 0;
+}
+
+int
+ptnetmap_get_hostmemid(struct ptnetmap_state *ptn)
+{
+	struct netmap_priv *priv = ptn->netmap_priv;
+
+	if (priv->nmd == NULL) {
+		return EINVAL;
+	}
+
+	return priv->memid;
+}
+
+int
+ptnetmap_create(struct ptnetmap_state *ptn, struct ptnetmap_cfg *cfg)
+{
+	struct netmap_priv *priv = ptn->netmap_priv;
+	struct nmreq req;
+	int err;
+
+	if (ptn->running) {
+		return 0;
+	}
+
+	/* XXX We should stop the netmap evloop here. */
+
+	/* Ask netmap to create kthreads for this interface. */
+	nmreq_init(&req, priv->ifname);
+	nmreq_pointer_put(&req, cfg);
+	req.nr_cmd = NETMAP_PT_HOST_CREATE;
+	err = ioctl(priv->nmd->fd, NIOCREGIF, &req);
+	if (err) {
+		fprintf(stderr, "%s: Unable to create ptnetmap kthreads on "
+		    "%s [errno=%d]\n", __func__, priv->ifname, errno);
+		return err;
+	}
+
+	ptn->running = 1;
+
+	return 0;
+}
+
+int
+ptnetmap_delete(struct ptnetmap_state *ptn)
+{
+	struct netmap_priv *priv = ptn->netmap_priv;
+	struct nmreq req;
+	int err;
+
+	if (!ptn->running) {
+		return 0;
+	}
+
+	/* Ask netmap to delete kthreads for this interface. */
+	nmreq_init(&req, priv->ifname);
+	req.nr_cmd = NETMAP_PT_HOST_DELETE;
+	err = ioctl(priv->nmd->fd, NIOCREGIF, &req);
+	if (err) {
+		fprintf(stderr, "%s: Unable to delete ptnetmap kthreads on "
+		    "%s [errno=%d]\n", __func__, priv->ifname, errno);
+		return err;
+	}
+
+	ptn->running = 0;
+
+	return 0;
+}
+
+static int
+netmap_init(struct net_backend *be, const char *devname,
+    net_backend_cb_t cb, void *param)
+{
+	const char *ndname = "/dev/netmap";
+	struct netmap_priv *priv = NULL;
+	struct nmreq req;
+	int ptnetmap = (cb == NULL);
+
+	priv = calloc(1, sizeof(struct netmap_priv));
+	if (priv == NULL) {
+		WPRINTF(("Unable to alloc netmap private data\n"));
+		return -1;
+	}
+
+	strncpy(priv->ifname, devname, sizeof(priv->ifname));
+	priv->ifname[sizeof(priv->ifname) - 1] = '\0';
+
+	memset(&req, 0, sizeof(req));
+	req.nr_flags = ptnetmap ? NR_PTNETMAP_HOST : 0;
+
+	priv->nmd = nm_open(priv->ifname, &req, NETMAP_NO_TX_POLL, NULL);
+	if (priv->nmd == NULL) {
+		WPRINTF(("Unable to nm_open(): device '%s', "
+		    "interface '%s', errno (%s)\n",
+		    ndname, devname, strerror(errno)));
+		free(priv);
+		return -1;
+	}
+
+	priv->memid = priv->nmd->req.nr_arg2;
+	priv->tx = NETMAP_TXRING(priv->nmd->nifp, 0);
+	priv->rx = NETMAP_RXRING(priv->nmd->nifp, 0);
+	priv->cb = cb;
+	priv->cb_param = param;
+	be->fd = priv->nmd->fd;
+	be->priv = priv;
+
+	priv->ptnetmap.netmap_priv = priv;
+	priv->ptnetmap.features = 0;
+	priv->ptnetmap.acked_features = 0;
+	priv->ptnetmap.running = 0;
+	if (ptnetmap) {
+		if (netmap_has_vnet_hdr_len(be, VNET_HDR_LEN)) {
+			priv->ptnetmap.features |= PTNETMAP_F_VNET_HDR;
+		}
+	} else {
+		char tname[40];
+
+		/* Create a thread for netmap poll. */
+		pthread_create(&priv->evloop_tid, NULL, netmap_evloop_thread, (void *)be);
+		snprintf(tname, sizeof(tname), "netmap-evloop-%p", priv);
+		pthread_set_name_np(priv->evloop_tid, tname);
+	}
+
+	return 0;
+}
+
+static void
+netmap_cleanup(struct net_backend *be)
+{
+	struct netmap_priv *priv = be->priv;
+
+	if (be->priv) {
+		if (priv->ptnetmap.running) {
+			ptnetmap_delete(&priv->ptnetmap);
+		}
+		nm_close(priv->nmd);
+		free(be->priv);
+		be->priv = NULL;
+	}
+	be->fd = -1;
+}
+
+/* A fast copy routine only for multiples of 64 bytes, non-overlapping. */
+static inline void
+pkt_copy(const void *_src, void *_dst, int l)
+{
+	const uint64_t *src = _src;
+	uint64_t *dst = _dst;
+	if (l >= 1024) {
+		bcopy(src, dst, l);
+		return;
+	}
+	for (; l > 0; l -= 64) {
+		*dst++ = *src++;
+		*dst++ = *src++;
+		*dst++ = *src++;
+		*dst++ = *src++;
+		*dst++ = *src++;
+		*dst++ = *src++;
+		*dst++ = *src++;
+		*dst++ = *src++;
+	}
+}
+
+static int
+netmap_send(struct net_backend *be, struct iovec *iov,
+    int iovcnt, uint32_t size, int more)
+{
+	struct netmap_priv *priv = be->priv;
+	struct netmap_ring *ring;
+	int nm_buf_size;
+	int nm_buf_len;
+	uint32_t head;
+	void *nm_buf;
+	int j;
+
+	if (iovcnt <= 0 || size <= 0) {
+		D("Wrong iov: iovcnt %d size %d", iovcnt, size);
+		return 0;
+	}
+
+	ring = priv->tx;
+	head = ring->head;
+	if (head == ring->tail) {
+		RD(1, "No space, drop %d bytes", size);
+		goto txsync;
+	}
+	nm_buf = NETMAP_BUF(ring, ring->slot[head].buf_idx);
+	nm_buf_size = ring->nr_buf_size;
+	nm_buf_len = 0;
+
+	for (j = 0; j < iovcnt; j++) {
+		int iov_frag_size = iov[j].iov_len;
+		void *iov_frag_buf = iov[j].iov_base;
+
+		/* Split each iovec fragment across multiple netmap
+		 * slots, if necessary. */
+		for (;;) {
+			int copylen;
+
+			copylen = iov_frag_size < nm_buf_size ? iov_frag_size : nm_buf_size;
+			pkt_copy(iov_frag_buf, nm_buf, copylen);
+
+			iov_frag_buf += copylen;
+			iov_frag_size -= copylen;
+			nm_buf += copylen;
+			nm_buf_size -= copylen;
+			nm_buf_len += copylen;
+
+			if (iov_frag_size == 0) {
+				break;
+			}
+
+			ring->slot[head].len = nm_buf_len;
+			ring->slot[head].flags = NS_MOREFRAG;
+			head = nm_ring_next(ring, head);
+			if (head == ring->tail) {
+				/* We ran out of netmap slots while
+				 * splitting the iovec fragments. */
+				RD(1, "No space, drop %d bytes", size);
+				goto txsync;
+			}
+			nm_buf = NETMAP_BUF(ring, ring->slot[head].buf_idx);
+			nm_buf_size = ring->nr_buf_size;
+			nm_buf_len = 0;
+		}
+	}
+
+	/* Complete the last slot, which must not have NS_MOREFRAG set. */
+	ring->slot[head].len = nm_buf_len;
+	ring->slot[head].flags = 0;
+	head = nm_ring_next(ring, head);
+
+	/* Now update ring->head and ring->cur. */
+	ring->head = ring->cur = head;
+
+	if (more) {	/* XXX && nm_ring_space(ring) > 64 */
+		return 0;
+	}
+txsync:
+	ioctl(be->fd, NIOCTXSYNC, NULL);
+
+	return 0;
+}
+
+static int
+netmap_recv(struct net_backend *be, struct iovec *iov, int iovcnt)
+{
+	struct netmap_priv *priv = be->priv;
+	struct netmap_slot *slot = NULL;
+	struct netmap_ring *ring;
+	void *iov_frag_buf;
+	int iov_frag_size;
+	int totlen = 0;
+	uint32_t head;
+
+	assert(iovcnt);
+
+	ring = priv->rx;
+	head = ring->head;
+	iov_frag_buf = iov->iov_base;
+	iov_frag_size = iov->iov_len;
+
+	do {
+		int nm_buf_len;
+		void *nm_buf;
+
+		if (head == ring->tail) {
+			return 0;
+		}
+
+		slot = ring->slot + head;
+		nm_buf = NETMAP_BUF(ring, slot->buf_idx);
+		nm_buf_len = slot->len;
+
+		for (;;) {
+			int copylen = nm_buf_len < iov_frag_size ? nm_buf_len : iov_frag_size;
+
+			pkt_copy(nm_buf, iov_frag_buf, copylen);
+			nm_buf += copylen;
+			nm_buf_len -= copylen;
+			iov_frag_buf += copylen;
+			iov_frag_size -= copylen;
+			totlen += copylen;
+
+			if (nm_buf_len == 0) {
+				break;
+			}
+
+			iov++;
+			iovcnt--;
+			if (iovcnt == 0) {
+				/* No space to receive. */
+				D("Short iov, drop %d bytes", totlen);
+				return -ENOSPC;
+			}
+			iov_frag_buf = iov->iov_base;
+			iov_frag_size = iov->iov_len;
+		}
+
+		head = nm_ring_next(ring, head);
+
+	} while (slot->flags & NS_MOREFRAG);
+
+	/* Release slots to netmap. */
+	ring->head = ring->cur = head;
+
+	return totlen;
+}
+
+static struct net_backend netmap_backend = {
+	.name = "netmap|vale",
+	.init = netmap_init,
+	.cleanup = netmap_cleanup,
+	.send = netmap_send,
+	.recv = netmap_recv,
+	.get_cap = netmap_get_cap,
+	.set_cap = netmap_set_cap,
+};
+
+DATA_SET(net_backend_s, netmap_backend);
+
+#endif /* WITH_NETMAP */
+
+/*
+ * make sure a backend is properly initialized
+ */
+static void
+netbe_fix(struct net_backend *be)
+{
+	if (be == NULL)
+		return;
+	if (be->name == NULL) {
+		fprintf(stderr, "missing name for %p\n", be);
+		be->name = "unnamed netbe";
+	}
+	if (be->init == NULL) {
+		fprintf(stderr, "missing init for %p %s\n", be, be->name);
+		be->init = netbe_null_init;
+	}
+	if (be->cleanup == NULL) {
+		fprintf(stderr, "missing cleanup for %p %s\n", be, be->name);
+		be->cleanup = netbe_null_cleanup;
+	}
+	if (be->send == NULL) {
+		fprintf(stderr, "missing send for %p %s\n", be, be->name);
+		be->send = netbe_null_send;
+	}
+	if (be->recv == NULL) {
+		fprintf(stderr, "missing recv for %p %s\n", be, be->name);
+		be->recv = netbe_null_recv;
+	}
+	if (be->get_cap == NULL) {
+		fprintf(stderr, "missing get_cap for %p %s\n",
+		    be, be->name);
+		be->get_cap = netbe_null_get_cap;
+	}
+	if (be->set_cap == NULL) {
+		fprintf(stderr, "missing set_cap for %p %s\n",
+		    be, be->name);
+		be->set_cap = netbe_null_set_cap;
+	}
+}
+
+/*
+ * keys is a set of prefixes separated by '|';
+ * return a pointer into keys if the leftmost part of name matches
+ * one of the prefixes, NULL otherwise.
+ */
+static const char *
+netbe_name_match(const char *keys, const char *name)
+{
+	const char *n = name, *good = keys;
+	char c;
+
+	if (!keys || !name)
+		return NULL;
+	while ( (c = *keys++) ) {
+		if (c == '|') {	/* reached the separator */
+			if (good)
+				break;
+			/* prepare for new round */
+			n = name;
+			good = keys;
+		} else if (good && c != *n++) {
+			good = NULL; /* drop till next keyword */
+		}
+	}
+	return good;
+}
+
+/*
+ * Initialize a backend and attach to the frontend.
+ * This is called during frontend initialization.
+ * devname is the backend-name as supplied on the command line,
+ * e.g. -s 2:0,frontend-name,backend-name[,other-args];
+ * cb is the receive callback supplied by the frontend,
+ * and it is invoked in the event loop when a receive
+ * event is generated in the hypervisor;
+ * param is a pointer to the frontend, and normally used as
+ * the argument for the callback.
+ */
+struct net_backend *
+netbe_init(const char *devname, net_backend_cb_t cb, void *param)
+{
+	struct net_backend **pbe, *be, *tbe = NULL;
+	int err;
+
+	/*
+	 * Find the network backend depending on the user-provided
+	 * device name. net_backend_s is built using a linker set.
+	 */
+	SET_FOREACH(pbe, net_backend_s) {
+		if (netbe_name_match((*pbe)->name, devname)) {
+			tbe = *pbe;
+			break;
+		}
+	}
+	if (tbe == NULL)
+		return NULL; /* or null backend ? */
+	be = calloc(1, sizeof(*be));
+	*be = *tbe;	/* copy the template */
+	netbe_fix(be);	/* make sure we have all fields */
+	be->fd = -1;
+	be->priv = NULL;
+	be->sc = param;
+	be->be_vnet_hdr_len = 0;
+	be->fe_vnet_hdr_len = 0;
+
+	/* initialize the backend */
+	err = be->init(be, devname, cb, param);
+	if (err) {
+		free(be);
+		be = NULL;
+	}
+	return be;
+}
+
+void
+netbe_cleanup(struct net_backend *be)
+{
+	if (be == NULL)
+		return;
+	be->cleanup(be);
+	free(be);
+}
+
+uint64_t
+netbe_get_cap(struct net_backend *be)
+{
+	if (be == NULL)
+		return 0;
+	return be->get_cap(be);
+}
+
+int
+netbe_set_cap(struct net_backend *be, uint64_t features,
+    unsigned vnet_hdr_len)
+{
+	int ret;
+
+	if (be == NULL)
+		return 0;
+
+	/* There are only three valid lengths. */
+	if (vnet_hdr_len && vnet_hdr_len != VNET_HDR_LEN
+	    && vnet_hdr_len != (VNET_HDR_LEN - sizeof(uint16_t)))
+		return -1;
+
+	be->fe_vnet_hdr_len = vnet_hdr_len;
+
+	ret = be->set_cap(be, features, vnet_hdr_len);
+	assert(be->be_vnet_hdr_len == 0 ||
+	    be->be_vnet_hdr_len == be->fe_vnet_hdr_len);
+
+	return ret;
+}
+
+static __inline struct iovec *
+iov_trim(struct iovec *iov, int *iovcnt, unsigned int tlen)
+{
+	struct iovec *riov;
+
+	/* XXX short-cut: assume first segment is >= tlen */
+	assert(iov[0].iov_len >= tlen);
+
+	iov[0].iov_len -= tlen;
+	if (iov[0].iov_len == 0) {
+		assert(*iovcnt > 1);
+		*iovcnt -= 1;
+		riov = &iov[1];
+	} else {
+		iov[0].iov_base = (void *)((uintptr_t)iov[0].iov_base + tlen);
+		riov = &iov[0];
+	}
+
+	return (riov);
+}
+
+void
+netbe_send(struct net_backend *be, struct iovec *iov, int iovcnt, uint32_t len,
+    int more)
+{
+	if (be == NULL)
+		return;
+#if 0
+	int i;
+	D("sending iovcnt %d len %d iovec %p", iovcnt, len, iov);
+	for (i=0; i < iovcnt; i++)
+		D(" %3d: %4d %p", i, (int)iov[i].iov_len, iov[i].iov_base);
+#endif
+	if (be->be_vnet_hdr_len != be->fe_vnet_hdr_len) {
+		/* Here we are sure be->be_vnet_hdr_len is 0. */
+		iov = iov_trim(iov, &iovcnt, be->fe_vnet_hdr_len);
+	}
+
+	be->send(be, iov, iovcnt, len, more);
+}
+
+/*
+ * Can return -1 in case of errors.
+ */
+int
+netbe_recv(struct net_backend *be, struct iovec *iov, int iovcnt)
+{
+	unsigned int hlen = 0; /* length of prepended virtio-net header */
+	int ret;
+
+	if (be == NULL)
+		return -1;
+
+	if (be->be_vnet_hdr_len != be->fe_vnet_hdr_len) {
+		struct virtio_net_rxhdr *vh;
+
+		/* Here we are sure be->be_vnet_hdr_len is 0. */
+		hlen = be->fe_vnet_hdr_len;
+		/*
+		 * Get a pointer to the rx header, and use the
+		 * data immediately following it for the packet buffer.
+		 */
+		vh = iov[0].iov_base;
+		iov = iov_trim(iov, &iovcnt, hlen);
+
+		/*
+		 * Here we are sure be->be_vnet_hdr_len is 0, so the
+		 * backend will not fill in the rx header itself.
+		 * The only valid field in the rx packet header is the
+		 * number of buffers if merged rx bufs were negotiated.
+		 */
+		memset(vh, 0, hlen);
+
+		if (hlen == VNET_HDR_LEN) {
+			vh->vrh_bufs = 1;
+		}
+	}
+
+	ret = be->recv(be, iov, iovcnt);
+	if (ret > 0) {
+		ret += hlen;
+	}
+
+	return ret;
+}
+
+/*
+ * Read a packet from the backend and discard it.
+ * Returns the size of the discarded packet or zero if no packet was available.
+ * A negative error code is returned in case of a read error.
+ */
+int
+netbe_rx_discard(struct net_backend *be)
+{
+	/*
+	 * MP note: the dummybuf is only used to discard frames,
+	 * so there is no need for it to be per-vtnet or locked.
+	 * We only make it large enough for a TSO-sized segment.
+	 */
+	static uint8_t dummybuf[65536+64];
+	struct iovec iov;
+
+	iov.iov_base = dummybuf;
+	iov.iov_len = sizeof(dummybuf);
+
+	return netbe_recv(be, &iov, 1);
+}
+
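For anyone reading the patch without applying it, here is roughly how a frontend is meant to drive the net_backends API above. This is only an illustrative sketch and not part of the patch: my_softc, rx_callback and my_frontend_init are made-up names, and the callback signature is inferred from how the backends invoke it (priv->cb(pfd.fd, EVF_READ, priv->cb_param)) and from what mevent_add() expects.

#include <sys/uio.h>
#include <stdint.h>

#include "mevent.h"		/* enum ev_type, EVF_READ */
#include "net_backends.h"	/* netbe_*() API from this patch */

/* Hypothetical frontend state; only the backend pointer matters here. */
struct my_softc {
	struct net_backend *be;
};

/*
 * Receive callback, run from the event loop (or the netmap evloop
 * thread) whenever the backend has packets pending.
 */
static void
rx_callback(int fd, enum ev_type type, void *param)
{
	struct my_softc *sc = param;
	static uint8_t buf[4096];
	struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
	int len;

	(void)fd; (void)type;
	/* netbe_recv() returns 0 once there is nothing left to read. */
	while ((len = netbe_recv(sc->be, &iov, 1)) > 0) {
		/* ... hand 'len' bytes from buf to the guest ... */
	}
}

static int
my_frontend_init(struct my_softc *sc, const char *backend_name)
{
	uint64_t cap;

	/* backend_name ("tap0", "vale0:1", ...) selects the backend
	 * template by prefix, via netbe_name_match(). */
	sc->be = netbe_init(backend_name, rx_callback, sc);
	if (sc->be == NULL)
		return -1;

	/* Offload negotiation: ask what the backend can do, then
	 * enable that set, with no virtio-net header prepended. */
	cap = netbe_get_cap(sc->be);
	return netbe_set_cap(sc->be, cap, 0);
}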
+ */ +struct net_backend { + const char *name; /* name of the backend */ + /* + * The init and cleanup functions are used internally, + * virtio-net should never use it. + */ + int (*init)(struct net_backend *be, const char *devname, + net_backend_cb_t cb, void *param); + void (*cleanup)(struct net_backend *be); + + + /* + * Called to serve a guest transmit request. The scatter-gather + * vector provided by the caller has 'iovcnt' elements and contains + * the packet to send. 'len' is the length of whole packet in bytes. + */ + int (*send)(struct net_backend *be, struct iovec *iov, + int iovcnt, uint32_t len, int more); + + /* + * Called to serve guest receive request. When the function + * returns a positive value, the scatter-gather vector + * provided by the caller (having 'iovcnt' elements in it) will + * contain a chunk of the received packet. The 'more' flag will + * be set if the returned chunk was the last one for the current + * packet, and 0 otherwise. The function returns the chunk size + * in bytes, or 0 if the backend doesn't have a new packet to + * receive. + * Note that it may be necessary to call this callback many + * times to receive a single packet, depending of how big is + * buffers you provide. + */ + int (*recv)(struct net_backend *be, struct iovec *iov, int iovcnt); + + /* + * Ask the backend for the virtio-net features it is able to + * support. Possible features are TSO, UFO and checksum offloading + * in both rx and tx direction and for both IPv4 and IPv6. + */ + uint64_t (*get_cap)(struct net_backend *be); + + /* + * Tell the backend to enable/disable the specified virtio-net + * features (capabilities). + */ + int (*set_cap)(struct net_backend *be, uint64_t features, + unsigned int vnet_hdr_len); + + struct pci_vtnet_softc *sc; + int fd; + unsigned int be_vnet_hdr_len; + unsigned int fe_vnet_hdr_len; + void *priv; /* Pointer to backend-specific data. 
*/ +}; + +SET_DECLARE(net_backend_s, struct net_backend); + +#define VNET_HDR_LEN sizeof(struct virtio_net_rxhdr) + +#define WPRINTF(params) printf params + +/* the null backend */ +static int +netbe_null_init(struct net_backend *be, const char *devname, + net_backend_cb_t cb, void *param) +{ + (void)devname; (void)cb; (void)param; + be->fd = -1; + return 0; +} + +static void +netbe_null_cleanup(struct net_backend *be) +{ + (void)be; +} + +static uint64_t +netbe_null_get_cap(struct net_backend *be) +{ + (void)be; + return 0; +} + +static int +netbe_null_set_cap(struct net_backend *be, uint64_t features, + unsigned vnet_hdr_len) +{ + (void)be; (void)features; (void)vnet_hdr_len; + return 0; +} + +static int +netbe_null_send(struct net_backend *be, struct iovec *iov, + int iovcnt, uint32_t len, int more) +{ + (void)be; (void)iov; (void)iovcnt; (void)len; (void)more; + return 0; /* pretend we send */ +} + +static int +netbe_null_recv(struct net_backend *be, struct iovec *iov, int iovcnt) +{ + (void)be; (void)iov; (void)iovcnt; + fprintf(stderr, "netbe_null_recv called ?\n"); + return -1; /* never called, i believe */ +} + +static struct net_backend n_be = { + .name = "null", + .init = netbe_null_init, + .cleanup = netbe_null_cleanup, + .send = netbe_null_send, + .recv = netbe_null_recv, + .get_cap = netbe_null_get_cap, + .set_cap = netbe_null_set_cap, +}; + +DATA_SET(net_backend_s, n_be); + + +/* the tap backend */ + +struct tap_priv { + struct mevent *mevp; +}; + +static void +tap_cleanup(struct net_backend *be) +{ + struct tap_priv *priv = be->priv; + + if (be->priv) { + mevent_delete(priv->mevp); + free(be->priv); + be->priv = NULL; + } + if (be->fd != -1) { + close(be->fd); + be->fd = -1; + } +} + +static int +tap_init(struct net_backend *be, const char *devname, + net_backend_cb_t cb, void *param) +{ + char tbuf[80]; + int fd; + int opt = 1; + struct tap_priv *priv; + + if (cb == NULL) { + WPRINTF(("TAP backend requires non-NULL callback\n")); + return -1; + } + + priv = calloc(1, sizeof(struct tap_priv)); + if (priv == NULL) { + WPRINTF(("tap_priv alloc failed\n")); + return -1; + } + + strcpy(tbuf, "/dev/"); + strlcat(tbuf, devname, sizeof(tbuf)); + + fd = open(tbuf, O_RDWR); + if (fd == -1) { + WPRINTF(("open of tap device %s failed\n", tbuf)); + goto error; + } + + /* + * Set non-blocking and register for read + * notifications with the event loop + */ + if (ioctl(fd, FIONBIO, &opt) < 0) { + WPRINTF(("tap device O_NONBLOCK failed\n")); + goto error; + } + + priv->mevp = mevent_add(fd, EVF_READ, cb, param); + if (priv->mevp == NULL) { + WPRINTF(("Could not register event\n")); + goto error; + } + + be->fd = fd; + be->priv = priv; + + return 0; + +error: + tap_cleanup(be); + return -1; +} + +/* + * Called to send a buffer chain out to the tap device + */ +static int +tap_send(struct net_backend *be, struct iovec *iov, int iovcnt, uint32_t len, + int more) +{ + static char pad[60]; /* all zero bytes */ + + (void)more; + /* + * If the length is < 60, pad out to that and add the + * extra zero'd segment to the iov. It is guaranteed that + * there is always an extra iov available by the caller. 
+ */ + if (len < 60) { + iov[iovcnt].iov_base = pad; + iov[iovcnt].iov_len = (size_t)(60 - len); + iovcnt++; + } + + return (int)writev(be->fd, iov, iovcnt); +} + +static int +tap_recv(struct net_backend *be, struct iovec *iov, int iovcnt) +{ + int ret; + + /* Should never be called without a valid tap fd */ + assert(be->fd != -1); + + ret = (int)readv(be->fd, iov, iovcnt); + + if (ret < 0 && errno == EWOULDBLOCK) { + return 0; + } + + return ret; +} + +static uint64_t +tap_get_cap(struct net_backend *be) +{ + (void)be; + return 0; // nothing extra +} + +static int +tap_set_cap(struct net_backend *be, uint64_t features, + unsigned vnet_hdr_len) +{ + (void)be; + return (features || vnet_hdr_len) ? -1 : 0; +} + +static struct net_backend tap_backend = { + .name = "tap|vmmnet", + .init = tap_init, + .cleanup = tap_cleanup, + .send = tap_send, + .recv = tap_recv, + .get_cap = tap_get_cap, + .set_cap = tap_set_cap, +}; + +DATA_SET(net_backend_s, tap_backend); + +#ifdef WITH_NETMAP + +/* + * The netmap backend + */ + +/* The virtio-net features supported by netmap. */ +#define NETMAP_FEATURES (VIRTIO_NET_F_CSUM | VIRTIO_NET_F_HOST_TSO4 | \ + VIRTIO_NET_F_HOST_TSO6 | VIRTIO_NET_F_HOST_UFO | \ + VIRTIO_NET_F_GUEST_CSUM | VIRTIO_NET_F_GUEST_TSO4 | \ + VIRTIO_NET_F_GUEST_TSO6 | VIRTIO_NET_F_GUEST_UFO) + +#define NETMAP_POLLMASK (POLLIN | POLLRDNORM | POLLRDBAND) + +struct netmap_priv { + char ifname[IFNAMSIZ]; + struct nm_desc *nmd; + uint16_t memid; + struct netmap_ring *rx; + struct netmap_ring *tx; + pthread_t evloop_tid; + net_backend_cb_t cb; + void *cb_param; + + struct ptnetmap_state ptnetmap; +}; + +static void * +netmap_evloop_thread(void *param) +{ + struct net_backend *be = param; + struct netmap_priv *priv = be->priv; + struct pollfd pfd; + int ret; + + for (;;) { + pfd.fd = be->fd; + pfd.events = NETMAP_POLLMASK; + ret = poll(&pfd, 1, INFTIM); + if (ret == -1 && errno != EINTR) { + WPRINTF(("netmap poll failed, %d\n", errno)); + } else if (ret == 1 && (pfd.revents & NETMAP_POLLMASK)) { + priv->cb(pfd.fd, EVF_READ, priv->cb_param); + } + } + + return NULL; +} + +static void +nmreq_init(struct nmreq *req, char *ifname) +{ + memset(req, 0, sizeof(*req)); + strncpy(req->nr_name, ifname, sizeof(req->nr_name)); + req->nr_version = NETMAP_API; +} + +static int +netmap_set_vnet_hdr_len(struct net_backend *be, int vnet_hdr_len) +{ + int err; + struct nmreq req; + struct netmap_priv *priv = be->priv; + + nmreq_init(&req, priv->ifname); + req.nr_cmd = NETMAP_BDG_VNET_HDR; + req.nr_arg1 = vnet_hdr_len; + err = ioctl(be->fd, NIOCREGIF, &req); + if (err) { + WPRINTF(("Unable to set vnet header length %d\n", + vnet_hdr_len)); + return err; + } + + be->be_vnet_hdr_len = vnet_hdr_len; + + return 0; +} + +static int +netmap_has_vnet_hdr_len(struct net_backend *be, unsigned vnet_hdr_len) +{ + int prev_hdr_len = be->be_vnet_hdr_len; + int ret; + + if (vnet_hdr_len == prev_hdr_len) { + return 1; + } + + ret = netmap_set_vnet_hdr_len(be, vnet_hdr_len); + if (ret) { + return 0; + } + + netmap_set_vnet_hdr_len(be, prev_hdr_len); + + return 1; +} + +static uint64_t +netmap_get_cap(struct net_backend *be) +{ + return netmap_has_vnet_hdr_len(be, VNET_HDR_LEN) ? + NETMAP_FEATURES : 0; +} + +static int +netmap_set_cap(struct net_backend *be, uint64_t features, + unsigned vnet_hdr_len) +{ + return netmap_set_vnet_hdr_len(be, vnet_hdr_len); +} + +/* Store and return the features we agreed upon. 
*/ +uint32_t +ptnetmap_ack_features(struct ptnetmap_state *ptn, uint32_t wanted_features) +{ + ptn->acked_features = ptn->features & wanted_features; + + return ptn->acked_features; +} + +struct ptnetmap_state * +get_ptnetmap(struct net_backend *be) +{ + struct netmap_priv *priv = be ? be->priv : NULL; + struct netmap_pools_info pi; + struct nmreq req; + int err; + + /* Check that this is a ptnetmap backend. */ + if (!be || be->set_cap != netmap_set_cap || + !(priv->nmd->req.nr_flags & NR_PTNETMAP_HOST)) { + return NULL; + } + + nmreq_init(&req, priv->ifname); + req.nr_cmd = NETMAP_POOLS_INFO_GET; + nmreq_pointer_put(&req, &pi); + err = ioctl(priv->nmd->fd, NIOCREGIF, &req); + if (err) { + return NULL; + } + + err = ptn_memdev_attach(priv->nmd->mem, &pi); + if (err) { + return NULL; + } + + return &priv->ptnetmap; +} + +int +ptnetmap_get_netmap_if(struct ptnetmap_state *ptn, struct netmap_if_info *nif) +{ + struct netmap_priv *priv = ptn->netmap_priv; + + memset(nif, 0, sizeof(*nif)); + if (priv->nmd == NULL) { + return EINVAL; + } + + nif->nifp_offset = priv->nmd->req.nr_offset; + nif->num_tx_rings = priv->nmd->req.nr_tx_rings; + nif->num_rx_rings = priv->nmd->req.nr_rx_rings; + nif->num_tx_slots = priv->nmd->req.nr_tx_slots; + nif->num_rx_slots = priv->nmd->req.nr_rx_slots; + + return 0; +} + +int +ptnetmap_get_hostmemid(struct ptnetmap_state *ptn) +{ + struct netmap_priv *priv = ptn->netmap_priv; + + if (priv->nmd == NULL) { + return EINVAL; + } + + return priv->memid; +} + +int +ptnetmap_create(struct ptnetmap_state *ptn, struct ptnetmap_cfg *cfg) +{ + struct netmap_priv *priv = ptn->netmap_priv; + struct nmreq req; + int err; + + if (ptn->running) { + return 0; + } + + /* XXX We should stop the netmap evloop here. */ + + /* Ask netmap to create kthreads for this interface. */ + nmreq_init(&req, priv->ifname); + nmreq_pointer_put(&req, cfg); + req.nr_cmd = NETMAP_PT_HOST_CREATE; + err = ioctl(priv->nmd->fd, NIOCREGIF, &req); + if (err) { + fprintf(stderr, "%s: Unable to create ptnetmap kthreads on " + "%s [errno=%d]", __func__, priv->ifname, errno); + return err; + } + + ptn->running = 1; + + return 0; +} + +int +ptnetmap_delete(struct ptnetmap_state *ptn) +{ + struct netmap_priv *priv = ptn->netmap_priv; + struct nmreq req; + int err; + + if (!ptn->running) { + return 0; + } + + /* Ask netmap to delete kthreads for this interface. */ + nmreq_init(&req, priv->ifname); + req.nr_cmd = NETMAP_PT_HOST_DELETE; + err = ioctl(priv->nmd->fd, NIOCREGIF, &req); + if (err) { + fprintf(stderr, "%s: Unable to create ptnetmap kthreads on " + "%s [errno=%d]", __func__, priv->ifname, errno); + return err; + } + + ptn->running = 0; + + return 0; +} + +static int +netmap_init(struct net_backend *be, const char *devname, + net_backend_cb_t cb, void *param) +{ + const char *ndname = "/dev/netmap"; + struct netmap_priv *priv = NULL; + struct nmreq req; + int ptnetmap = (cb == NULL); + + priv = calloc(1, sizeof(struct netmap_priv)); + if (priv == NULL) { + WPRINTF(("Unable alloc netmap private data\n")); + return -1; + } + + strncpy(priv->ifname, devname, sizeof(priv->ifname)); + priv->ifname[sizeof(priv->ifname) - 1] = '\0'; + + memset(&req, 0, sizeof(req)); + req.nr_flags = ptnetmap ? 
NR_PTNETMAP_HOST : 0; + + priv->nmd = nm_open(priv->ifname, &req, NETMAP_NO_TX_POLL, NULL); + if (priv->nmd == NULL) { + WPRINTF(("Unable to nm_open(): device '%s', " + "interface '%s', errno (%s)\n", + ndname, devname, strerror(errno))); + free(priv); + return -1; + } + + priv->memid = priv->nmd->req.nr_arg2; + priv->tx = NETMAP_TXRING(priv->nmd->nifp, 0); + priv->rx = NETMAP_RXRING(priv->nmd->nifp, 0); + priv->cb = cb; + priv->cb_param = param; + be->fd = priv->nmd->fd; + be->priv = priv; + + priv->ptnetmap.netmap_priv = priv; + priv->ptnetmap.features = 0; + priv->ptnetmap.acked_features = 0; + priv->ptnetmap.running = 0; + if (ptnetmap) { + if (netmap_has_vnet_hdr_len(be, VNET_HDR_LEN)) { + priv->ptnetmap.features |= PTNETMAP_F_VNET_HDR; + } + } else { + char tname[40]; + + /* Create a thread for netmap poll. */ + pthread_create(&priv->evloop_tid, NULL, netmap_evloop_thread, (void *)be); + snprintf(tname, sizeof(tname), "netmap-evloop-%p", priv); + pthread_set_name_np(priv->evloop_tid, tname); + } + + return 0; +} + +static void +netmap_cleanup(struct net_backend *be) +{ + struct netmap_priv *priv = be->priv; + + if (be->priv) { + if (priv->ptnetmap.running) { + ptnetmap_delete(&priv->ptnetmap); + } + nm_close(priv->nmd); + free(be->priv); + be->priv = NULL; + } + be->fd = -1; +} + +/* A fast copy routine only for multiples of 64 bytes, non overlapped. */ +static inline void +pkt_copy(const void *_src, void *_dst, int l) +{ + const uint64_t *src = _src; + uint64_t *dst = _dst; + if (l >= 1024) { + bcopy(src, dst, l); + return; + } + for (; l > 0; l -= 64) { + *dst++ = *src++; + *dst++ = *src++; + *dst++ = *src++; + *dst++ = *src++; + *dst++ = *src++; + *dst++ = *src++; + *dst++ = *src++; + *dst++ = *src++; + } +} + +static int +netmap_send(struct net_backend *be, struct iovec *iov, + int iovcnt, uint32_t size, int more) +{ + struct netmap_priv *priv = be->priv; + struct netmap_ring *ring; + int nm_buf_size; + int nm_buf_len; + uint32_t head; + void *nm_buf; + int j; + + if (iovcnt <= 0 || size <= 0) { + D("Wrong iov: iovcnt %d size %d", iovcnt, size); + return 0; + } + + ring = priv->tx; + head = ring->head; + if (head == ring->tail) { + RD(1, "No space, drop %d bytes", size); + goto txsync; + } + nm_buf = NETMAP_BUF(ring, ring->slot[head].buf_idx); + nm_buf_size = ring->nr_buf_size; + nm_buf_len = 0; + + for (j = 0; j < iovcnt; j++) { + int iov_frag_size = iov[j].iov_len; + void *iov_frag_buf = iov[j].iov_base; + + /* Split each iovec fragment over more netmap slots, if + necessary. */ + for (;;) { + int copylen; + + copylen = iov_frag_size < nm_buf_size ? iov_frag_size : nm_buf_size; + pkt_copy(iov_frag_buf, nm_buf, copylen); + + iov_frag_buf += copylen; + iov_frag_size -= copylen; + nm_buf += copylen; + nm_buf_size -= copylen; + nm_buf_len += copylen; + + if (iov_frag_size == 0) { + break; + } + + ring->slot[head].len = nm_buf_len; + ring->slot[head].flags = NS_MOREFRAG; + head = nm_ring_next(ring, head); + if (head == ring->tail) { + /* We ran out of netmap slots while + * splitting the iovec fragments. */ + RD(1, "No space, drop %d bytes", size); + goto txsync; + } + nm_buf = NETMAP_BUF(ring, ring->slot[head].buf_idx); + nm_buf_size = ring->nr_buf_size; + nm_buf_len = 0; + } + } + + /* Complete the last slot, which must not have NS_MOREFRAG set. */ + ring->slot[head].len = nm_buf_len; + ring->slot[head].flags = 0; + head = nm_ring_next(ring, head); + + /* Now update ring->head and ring->cur. 
*/ + ring->head = ring->cur = head; + + if (more) {// && nm_ring_space(ring) > 64 + return 0; + } +txsync: + ioctl(be->fd, NIOCTXSYNC, NULL); + + return 0; +} + +static int +netmap_recv(struct net_backend *be, struct iovec *iov, int iovcnt) +{ + struct netmap_priv *priv = be->priv; + struct netmap_slot *slot = NULL; + struct netmap_ring *ring; + void *iov_frag_buf; + int iov_frag_size; + int totlen = 0; + uint32_t head; + + assert(iovcnt); + + ring = priv->rx; + head = ring->head; + iov_frag_buf = iov->iov_base; + iov_frag_size = iov->iov_len; + + do { + int nm_buf_len; + void *nm_buf; + + if (head == ring->tail) { + return 0; + } + + slot = ring->slot + head; + nm_buf = NETMAP_BUF(ring, slot->buf_idx); + nm_buf_len = slot->len; + + for (;;) { + int copylen = nm_buf_len < iov_frag_size ? nm_buf_len : iov_frag_size; + + pkt_copy(nm_buf, iov_frag_buf, copylen); + nm_buf += copylen; + nm_buf_len -= copylen; + iov_frag_buf += copylen; + iov_frag_size -= copylen; + totlen += copylen; + + if (nm_buf_len == 0) { + break; + } + + iov++; + iovcnt--; + if (iovcnt == 0) { + /* No space to receive. */ + D("Short iov, drop %d bytes", totlen); + return -ENOSPC; + } + iov_frag_buf = iov->iov_base; + iov_frag_size = iov->iov_len; + } + + head = nm_ring_next(ring, head); + + } while (slot->flags & NS_MOREFRAG); + + /* Release slots to netmap. */ + ring->head = ring->cur = head; + + return totlen; +} + +static struct net_backend netmap_backend = { + .name = "netmap|vale", + .init = netmap_init, + .cleanup = netmap_cleanup, + .send = netmap_send, + .recv = netmap_recv, + .get_cap = netmap_get_cap, + .set_cap = netmap_set_cap, +}; + +DATA_SET(net_backend_s, netmap_backend); + +#endif /* WITH_NETMAP */ + +/* + * make sure a backend is properly initialized + */ +static void +netbe_fix(struct net_backend *be) +{ + if (be == NULL) + return; + if (be->name == NULL) { + fprintf(stderr, "missing name for %p\n", be); + be->name = "unnamed netbe"; + } + if (be->init == NULL) { + fprintf(stderr, "missing init for %p %s\n", be, be->name); + be->init = netbe_null_init; + } + if (be->cleanup == NULL) { + fprintf(stderr, "missing cleanup for %p %s\n", be, be->name); + be->cleanup = netbe_null_cleanup; + } + if (be->send == NULL) { + fprintf(stderr, "missing send for %p %s\n", be, be->name); + be->send = netbe_null_send; + } + if (be->recv == NULL) { + fprintf(stderr, "missing recv for %p %s\n", be, be->name); + be->recv = netbe_null_recv; + } + if (be->get_cap == NULL) { + fprintf(stderr, "missing get_cap for %p %s\n", + be, be->name); + be->get_cap = netbe_null_get_cap; + } + if (be->set_cap == NULL) { + fprintf(stderr, "missing set_cap for %p %s\n", + be, be->name); + be->set_cap = netbe_null_set_cap; + } +} + +/* + * keys is a set of prefixes separated by '|', + * return 1 if the leftmost part of name matches one prefix. + */ +static const char * +netbe_name_match(const char *keys, const char *name) +{ + const char *n = name, *good = keys; + char c; + + if (!keys || !name) + return NULL; + while ( (c = *keys++) ) { + if (c == '|') { /* reached the separator */ + if (good) + break; + /* prepare for new round */ + n = name; + good = keys; + } else if (good && c != *n++) { + good = NULL; /* drop till next keyword */ + } + } + return good; +} + +/* + * Initialize a backend and attach to the frontend. + * This is called during frontend initialization. + * devname is the backend-name as supplied on the command line, + * e.g. 
-s 2:0,frontend-name,backend-name[,other-args] + * cb is the receive callback supplied by the frontend, + * and it is invoked in the event loop when a receive + * event is generated in the hypervisor, + * param is a pointer to the frontend, and normally used as + * the argument for the callback. + */ +struct net_backend * +netbe_init(const char *devname, net_backend_cb_t cb, void *param) +{ + struct net_backend **pbe, *be, *tbe = NULL; + int err; + + /* + * Find the network backend depending on the user-provided + * device name. net_backend_s is built using a linker set. + */ + SET_FOREACH(pbe, net_backend_s) { + if (netbe_name_match((*pbe)->name, devname)) { + tbe = *pbe; + break; + } + } + if (tbe == NULL) + return NULL; /* or null backend ? */ + be = calloc(1, sizeof(*be)); + *be = *tbe; /* copy the template */ + netbe_fix(be); /* make sure we have all fields */ + be->fd = -1; + be->priv = NULL; + be->sc = param; + be->be_vnet_hdr_len = 0; + be->fe_vnet_hdr_len = 0; + + /* initialize the backend */ + err = be->init(be, devname, cb, param); + if (err) { + free(be); + be = NULL; + } + return be; +} + +void +netbe_cleanup(struct net_backend *be) +{ + if (be == NULL) + return; + be->cleanup(be); + free(be); +} + +uint64_t +netbe_get_cap(struct net_backend *be) +{ + if (be == NULL) + return 0; + return be->get_cap(be); +} + +int +netbe_set_cap(struct net_backend *be, uint64_t features, + unsigned vnet_hdr_len) +{ + int ret; + + if (be == NULL) + return 0; + + /* There are only three valid lengths. */ + if (vnet_hdr_len && vnet_hdr_len != VNET_HDR_LEN + && vnet_hdr_len != (VNET_HDR_LEN - sizeof(uint16_t))) + return -1; + + be->fe_vnet_hdr_len = vnet_hdr_len; + + ret = be->set_cap(be, features, vnet_hdr_len); + assert(be->be_vnet_hdr_len == 0 || + be->be_vnet_hdr_len == be->fe_vnet_hdr_len); + + return ret; +} + +static __inline struct iovec * +iov_trim(struct iovec *iov, int *iovcnt, unsigned int tlen) +{ + struct iovec *riov; + + /* XXX short-cut: assume first segment is >= tlen */ + assert(iov[0].iov_len >= tlen); + + iov[0].iov_len -= tlen; + if (iov[0].iov_len == 0) { + assert(*iovcnt > 1); + *iovcnt -= 1; + riov = &iov[1]; + } else { + iov[0].iov_base = (void *)((uintptr_t)iov[0].iov_base + tlen); + riov = &iov[0]; + } + + return (riov); +} + +void +netbe_send(struct net_backend *be, struct iovec *iov, int iovcnt, uint32_t len, + int more) +{ + if (be == NULL) + return; +#if 0 + int i; + D("sending iovcnt %d len %d iovec %p", iovcnt, len, iov); + for (i=0; i < iovcnt; i++) + D(" %3d: %4d %p", i, (int)iov[i].iov_len, iov[i].iov_base); +#endif + if (be->be_vnet_hdr_len != be->fe_vnet_hdr_len) { + /* Here we are sure be->be_vnet_hdr_len is 0. */ + iov = iov_trim(iov, &iovcnt, be->fe_vnet_hdr_len); + } + + be->send(be, iov, iovcnt, len, more); +} + +/* + * can return -1 in case of errors + */ +int +netbe_recv(struct net_backend *be, struct iovec *iov, int iovcnt) +{ + unsigned int hlen = 0; /* length of prepended virtio-net header */ + int ret; + + if (be == NULL) + return -1; + + if (be->be_vnet_hdr_len != be->fe_vnet_hdr_len) { + struct virtio_net_rxhdr *vh; + + /* Here we are sure be->be_vnet_hdr_len is 0. */ + hlen = be->fe_vnet_hdr_len; + /* + * Get a pointer to the rx header, and use the + * data immediately following it for the packet buffer. + */ + vh = iov[0].iov_base; + iov = iov_trim(iov, &iovcnt, hlen); + + /* + * Here we are sure be->fe_vnet_hdr_len is 0. + * The only valid field in the rx packet header is the + * number of buffers if merged rx bufs were negotiated. 
+ */ + memset(vh, 0, hlen); + + if (hlen == VNET_HDR_LEN) { + vh->vrh_bufs = 1; + } + } + + ret = be->recv(be, iov, iovcnt); + if (ret > 0) { + ret += hlen; + } + + return ret; +} + +/* + * Read a packet from the backend and discard it. + * Returns the size of the discarded packet or zero if no packet was available. + * A negative error code is returned in case of read error. + */ +int +netbe_rx_discard(struct net_backend *be) +{ + /* + * MP note: the dummybuf is only used to discard frames, + * so there is no need for it to be per-vtnet or locked. + * We only make it large enough for TSO-sized segment. + */ + static uint8_t dummybuf[65536+64]; + struct iovec iov; + + iov.iov_base = dummybuf; + iov.iov_len = sizeof(dummybuf); + + return netbe_recv(be, &iov, 1); +} + +/*- + * Copyright (c) 2014-2016 Vincenzo Maffione + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR + * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS + * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, + * OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT + * OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, + * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE + * OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, + * EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +/* + * This file implements multiple network backends (null, tap, netmap, ...), + * to be used by network frontends such as virtio-net and ptnet. + * The API to access the backend (e.g. send/receive packets, negotiate + * features) is exported by net_backends.h. + */ + +#include <sys/cdefs.h> +#include <sys/uio.h> +#include <sys/ioctl.h> +#include <sys/mman.h> +#include <sys/types.h> /* u_short etc */ +#include <net/if.h> + +#include <errno.h> +#include <fcntl.h> +#include <stdio.h> +#include <stdlib.h> +#include <stdint.h> +#include <string.h> +#include <unistd.h> +#include <assert.h> +#include <pthread.h> +#include <pthread_np.h> +#include <poll.h> +#include <assert.h> + +#include "mevent.h" +#include "net_backends.h" + +#include <sys/linker_set.h> + +/* + * Each network backend registers a set of function pointers that are + * used to implement the net backends API. + * This might need to be exposed if we implement backends in separate files. + */ +struct net_backend { + const char *name; /* name of the backend */ + /* + * The init and cleanup functions are used internally, + * virtio-net should never use it. + */ + int (*init)(struct net_backend *be, const char *devname, + net_backend_cb_t cb, void *param); + void (*cleanup)(struct net_backend *be); + + + /* + * Called to serve a guest transmit request. 
+ */ + memset(vh, 0, hlen); + + if (hlen == VNET_HDR_LEN) { + vh->vrh_bufs = 1; + } + } + + ret = be->recv(be, iov, iovcnt); + if (ret > 0) { + ret += hlen; + } + + return ret; +} + +/* + * Read a packet from the backend and discard it. + * Returns the size of the discarded packet or zero if no packet was available. + * A negative error code is returned in case of read error. + */ +int +netbe_rx_discard(struct net_backend *be) +{ + /* + * MP note: the dummybuf is only used to discard frames, + * so there is no need for it to be per-vtnet or locked. + * We only make it large enough for TSO-sized segment. + */ + static uint8_t dummybuf[65536+64]; + struct iovec iov; + + iov.iov_base = dummybuf; + iov.iov_len = sizeof(dummybuf); + + return netbe_recv(be, &iov, 1); +} + diff -u -r -N usr/src/usr.sbin/bhyve/net_backends.h /usr/src/usr.sbin/bhyve/net_backends.h --- usr/src/usr.sbin/bhyve/net_backends.h 1970-01-01 01:00:00.000000000 +0100 +++ /usr/src/usr.sbin/bhyve/net_backends.h 2016-11-30 10:56:05.841958000 +0000 @@ -0,0 +1,432 @@ +/*- + * Copyright (c) 2014 Vincenzo Maffione <v.maffione@gmail.com> + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR + * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS + * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, + * OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT + * OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, + * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE + * OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, + * EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef __NET_BACKENDS_H__ +#define __NET_BACKENDS_H__ + +#include <stdint.h> + +#ifdef WITH_NETMAP +#include <net/netmap.h> +#include <net/netmap_virt.h> +#define NETMAP_WITH_LIBS +#include <net/netmap_user.h> +#if (NETMAP_API < 11) +#error "Netmap API version must be >= 11" +#endif +#endif /* WITH_NETMAP */ + +#include "mevent.h" + +extern int netmap_ioctl_counter; + +typedef void (*net_backend_cb_t)(int, enum ev_type, void *param); + +/* Interface between virtio-net and the network backend. */ +struct net_backend; + +struct net_backend *netbe_init(const char *devname, + net_backend_cb_t cb, void *param); +void netbe_cleanup(struct net_backend *be); +uint64_t netbe_get_cap(struct net_backend *be); +int netbe_set_cap(struct net_backend *be, uint64_t cap, + unsigned vnet_hdr_len); +void netbe_send(struct net_backend *be, struct iovec *iov, + int iovcnt, uint32_t len, int more); +int netbe_recv(struct net_backend *be, struct iovec *iov, int iovcnt); +int netbe_rx_discard(struct net_backend *be); + + +/* + * Network device capabilities taken from VirtIO standard. 
+
+/*
+ * Network device capabilities taken from the VirtIO standard.
+ * Despite the name, these capabilities can be used by different frontends
+ * (virtio-net, ptnet) and supported by different backends (netmap, tap, ...).
+ */
+#define VIRTIO_NET_F_CSUM	(1 << 0)  /* host handles partial cksum */
+#define VIRTIO_NET_F_GUEST_CSUM	(1 << 1)  /* guest handles partial cksum */
+#define VIRTIO_NET_F_MAC	(1 << 5)  /* host supplies MAC */
+#define VIRTIO_NET_F_GSO_DEPREC	(1 << 6)  /* deprecated: host handles GSO */
+#define VIRTIO_NET_F_GUEST_TSO4	(1 << 7)  /* guest can rcv TSOv4 */
+#define VIRTIO_NET_F_GUEST_TSO6	(1 << 8)  /* guest can rcv TSOv6 */
+#define VIRTIO_NET_F_GUEST_ECN	(1 << 9)  /* guest can rcv TSO with ECN */
+#define VIRTIO_NET_F_GUEST_UFO	(1 << 10) /* guest can rcv UFO */
+#define VIRTIO_NET_F_HOST_TSO4	(1 << 11) /* host can rcv TSOv4 */
+#define VIRTIO_NET_F_HOST_TSO6	(1 << 12) /* host can rcv TSOv6 */
+#define VIRTIO_NET_F_HOST_ECN	(1 << 13) /* host can rcv TSO with ECN */
+#define VIRTIO_NET_F_HOST_UFO	(1 << 14) /* host can rcv UFO */
+#define VIRTIO_NET_F_MRG_RXBUF	(1 << 15) /* host can merge RX buffers */
+#define VIRTIO_NET_F_STATUS	(1 << 16) /* config status field available */
+#define VIRTIO_NET_F_CTRL_VQ	(1 << 17) /* control channel available */
+#define VIRTIO_NET_F_CTRL_RX	(1 << 18) /* control channel RX mode support */
+#define VIRTIO_NET_F_CTRL_VLAN	(1 << 19) /* control channel VLAN filtering */
+#define VIRTIO_NET_F_GUEST_ANNOUNCE \
+	(1 << 21) /* guest can send gratuitous pkts */
+
+/*
+ * Fixed network header size
+ */
+struct virtio_net_rxhdr {
+	uint8_t		vrh_flags;
+	uint8_t		vrh_gso_type;
+	uint16_t	vrh_hdr_len;
+	uint16_t	vrh_gso_size;
+	uint16_t	vrh_csum_start;
+	uint16_t	vrh_csum_offset;
+	uint16_t	vrh_bufs;
+} __packed;
+
+/*
+ * ptnetmap definitions
+ */
+struct ptnetmap_state {
+	void		*netmap_priv;
+
+	/* True if ptnetmap kthreads are running. */
+	int		running;
+
+	/* Feature acknowledgement support. */
+	unsigned long	features;
+	unsigned long	acked_features;
+
+	/* Info about netmap memory. */
+	uint32_t	memsize;
+	void		*mem;
+};
+
+#ifdef WITH_NETMAP
+/* Used to get read-only info. */
+struct netmap_if_info {
+	uint32_t	nifp_offset;
+	uint16_t	num_tx_rings;
+	uint16_t	num_rx_rings;
+	uint16_t	num_tx_slots;
+	uint16_t	num_rx_slots;
+};
+
+int ptn_memdev_attach(void *mem_ptr, struct netmap_pools_info *);
+int ptnetmap_get_netmap_if(struct ptnetmap_state *ptn,
+	struct netmap_if_info *nif);
+struct ptnetmap_state *get_ptnetmap(struct net_backend *be);
+uint32_t ptnetmap_ack_features(struct ptnetmap_state *ptn,
+	uint32_t wanted_features);
+int ptnetmap_get_hostmemid(struct ptnetmap_state *ptn);
+int ptnetmap_create(struct ptnetmap_state *ptn, struct ptnetmap_cfg *cfg);
+int ptnetmap_delete(struct ptnetmap_state *ptn);
+#endif /* WITH_NETMAP */
+
+#include "pci_emul.h"
+int net_parsemac(char *mac_str, uint8_t *mac_addr);
+void net_genmac(struct pci_devinst *pi, uint8_t *macaddr);
+
+#endif /* __NET_BACKENDS_H__ */
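For illustration (hypothetical code, not part of the patch), the capability flags and header lengths above combine like this during feature negotiation: the virtio-net header is 12 bytes when VIRTIO_NET_F_MRG_RXBUF is negotiated and 10 bytes otherwise, matching the three lengths netbe_set_cap() accepts.

uint64_t cap, want;
unsigned int hdrlen;

cap = netbe_get_cap(sc->nbe);
want = cap & (VIRTIO_NET_F_CSUM | VIRTIO_NET_F_HOST_TSO4 |
    VIRTIO_NET_F_MRG_RXBUF);
/* 12-byte header with merged rx buffers, 10 bytes without. */
hdrlen = (want & VIRTIO_NET_F_MRG_RXBUF) ?
    sizeof(struct virtio_net_rxhdr) :
    sizeof(struct virtio_net_rxhdr) - sizeof(uint16_t);
if (netbe_set_cap(sc->nbe, want, want ? hdrlen : 0) != 0)
	netbe_set_cap(sc->nbe, 0, 0);	/* fall back: no offloads */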
diff -u -r -N usr/src/usr.sbin/bhyve/net_utils.c /usr/src/usr.sbin/bhyve/net_utils.c
--- usr/src/usr.sbin/bhyve/net_utils.c	1970-01-01 01:00:00.000000000 +0100
+++ /usr/src/usr.sbin/bhyve/net_utils.c	2016-12-01 13:18:51.719036000 +0000
@@ -0,0 +1,86 @@
+/*-
+ * Copyright (c) 2011 NetApp, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS
+ * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY,
+ * OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT
+ * OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
+ * OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
+ * EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include "net_utils.h"
+#include "bhyverun.h"
+#include <md5.h>
+#include <net/ethernet.h>
+#include <string.h>
+#include <stdio.h>
+#include <errno.h>
+
+/*
+ * Some utility functions used by net front-ends. Originally they
+ * lived in pci_virtio_net.c.
+ */
+
+int
+net_parsemac(char *mac_str, uint8_t *mac_addr)
+{
+	struct ether_addr *ea;
+	char *tmpstr;
+	char zero_addr[ETHER_ADDR_LEN] = { 0, 0, 0, 0, 0, 0 };
+
+	tmpstr = strsep(&mac_str, "=");
+
+	if ((mac_str != NULL) && (!strcmp(tmpstr, "mac"))) {
+		ea = ether_aton(mac_str);
+
+		if (ea == NULL || ETHER_IS_MULTICAST(ea->octet) ||
+		    memcmp(ea->octet, zero_addr, ETHER_ADDR_LEN) == 0) {
+			fprintf(stderr, "Invalid MAC %s\n", mac_str);
+			return (EINVAL);
+		} else
+			memcpy(mac_addr, ea->octet, ETHER_ADDR_LEN);
+	}
+
+	return (0);
+}
+
+void
+net_genmac(struct pci_devinst *pi, uint8_t *macaddr)
+{
+	/*
+	 * The default MAC address is the standard NetApp OUI of 00-a0-98,
+	 * followed by an MD5 of the PCI slot/func numbers and the VM name.
+	 */
+	MD5_CTX mdctx;
+	unsigned char digest[16];
+	char nstr[80];
+
+	snprintf(nstr, sizeof(nstr), "%d-%d-%s", pi->pi_slot,
+	    pi->pi_func, vmname);
+
+	MD5Init(&mdctx);
+	MD5Update(&mdctx, nstr, (unsigned int)strlen(nstr));
+	MD5Final(digest, &mdctx);
+
+	macaddr[0] = 0x00;
+	macaddr[1] = 0xa0;
+	macaddr[2] = 0x98;
+	macaddr[3] = digest[0];
+	macaddr[4] = digest[1];
+	macaddr[5] = digest[2];
+}
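A usage sketch for the two helpers (hypothetical code, not part of the patch): generate the deterministic default MAC first, then let an explicit "mac=..." option override it. Note that net_parsemac() leaves the buffer untouched when no "mac=" key is present, so the default survives.

uint8_t macaddr[ETHER_ADDR_LEN];

net_genmac(pi, macaddr);	/* deterministic default: 00:a0:98:xx:xx:xx */
if (opts != NULL && net_parsemac(opts, macaddr) != 0)
	return (1);		/* "mac=" was present but invalid */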
diff -u -r -N usr/src/usr.sbin/bhyve/net_utils.h /usr/src/usr.sbin/bhyve/net_utils.h
--- usr/src/usr.sbin/bhyve/net_utils.h	1970-01-01 01:00:00.000000000 +0100
+++ /usr/src/usr.sbin/bhyve/net_utils.h	2016-11-30 10:56:05.847722000 +0000
@@ -0,0 +1,34 @@
+/*-
+ * Copyright (c) 2011 NetApp, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS
+ * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY,
+ * OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT
+ * OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
+ * OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
+ * EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#ifndef _NET_UTILS_H_
+#define _NET_UTILS_H_
+
+#include <stdint.h>
+#include "pci_emul.h"
+
+void net_genmac(struct pci_devinst *pi, uint8_t *macaddr);
+int net_parsemac(char *mac_str, uint8_t *mac_addr);
+#endif /* _NET_UTILS_H_ */
diff -u -r -N usr/src/usr.sbin/bhyve/pci_ptnetmap_memdev.c /usr/src/usr.sbin/bhyve/pci_ptnetmap_memdev.c
--- usr/src/usr.sbin/bhyve/pci_ptnetmap_memdev.c	1970-01-01 01:00:00.000000000 +0100
+++ /usr/src/usr.sbin/bhyve/pci_ptnetmap_memdev.c	2016-11-30 10:56:10.444085000 +0000
@@ -0,0 +1,341 @@
+/*
+ * Copyright (C) 2015 Stefano Garzarella (stefano.garzarella@gmail.com)
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#ifdef WITH_NETMAP
+
+#include <sys/cdefs.h>
+__FBSDID("$FreeBSD$");
+
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>
+
+#include <net/if.h>	/* IFNAMSIZ */
+#include <net/netmap.h>
+#include <net/netmap_virt.h>
+
+#include <machine/vmm.h>
+#include <vmmapi.h>
+
+#include "bhyverun.h"
+#include "pci_emul.h"
+
+/*
+ * ptnetmap memdev PCI device
+ *
+ * This device is used to map a netmap memory allocator into the guest VM
+ * through a PCI BAR. The same allocator can be shared between multiple
+ * ptnetmap ports in the guest.
+ *
+ * Each netmap allocator has a unique ID assigned by the netmap host module.
+ *
+ * The implementation here is based on the QEMU/KVM one.
+ */
+struct ptn_memdev_softc {
+	struct pci_devinst *pi;		/* PCI device instance */
+
+	void *mem_ptr;			/* netmap shared memory */
+	struct netmap_pools_info info;
+
+	TAILQ_ENTRY(ptn_memdev_softc) next;
+};
+static TAILQ_HEAD(, ptn_memdev_softc) ptn_memdevs = TAILQ_HEAD_INITIALIZER(ptn_memdevs);
+
+/*
+ * A ptn_memdev_softc can be created either by pe_init or by the ptnetmap
+ * backend, depending on the order of initialization.
+ */
+static struct ptn_memdev_softc *
+ptn_memdev_create(void)
+{
+	struct ptn_memdev_softc *sc;
+
+	sc = calloc(1, sizeof(struct ptn_memdev_softc));
+	if (sc != NULL) {
+		TAILQ_INSERT_TAIL(&ptn_memdevs, sc, next);
+	}
+
+	return sc;
+}
+
+static void
+ptn_memdev_delete(struct ptn_memdev_softc *sc)
+{
+	TAILQ_REMOVE(&ptn_memdevs, sc, next);
+
+	free(sc);
+}
+
+/*
+ * Find ptn_memdev through memid (netmap memory allocator ID)
+ */
+static struct ptn_memdev_softc *
+ptn_memdev_find_memid(uint32_t mem_id)
+{
+	struct ptn_memdev_softc *sc;
+
+	TAILQ_FOREACH(sc, &ptn_memdevs, next) {
+		if (sc->mem_ptr != NULL && mem_id == sc->info.memid) {
+			return sc;
+		}
+	}
+
+	return NULL;
+}
+
+/*
+ * Find a ptn_memdev with no netmap memory attached (by the ptnetmap backend)
+ */
+static struct ptn_memdev_softc *
+ptn_memdev_find_empty_mem(void)
+{
+	struct ptn_memdev_softc *sc;
+
+	TAILQ_FOREACH(sc, &ptn_memdevs, next) {
+		if (sc->mem_ptr == NULL) {
+			return sc;
+		}
+	}
+
+	return NULL;
+}
+
+/*
+ * Find a ptn_memdev with no PCI device instance (created by pe_init)
+ */
+static struct ptn_memdev_softc *
+ptn_memdev_find_empty_pi(void)
+{
+	struct ptn_memdev_softc *sc;
+
+	TAILQ_FOREACH(sc, &ptn_memdevs, next) {
+		if (sc->pi == NULL) {
+			return sc;
+		}
+	}
+
+	return NULL;
+}
+
+/*
+ * Handle read on ptnetmap-memdev register
+ */
+static uint64_t
+ptn_pci_read(struct vmctx *ctx, int vcpu, struct pci_devinst *pi,
+	     int baridx, uint64_t offset, int size)
+{
+	struct ptn_memdev_softc *sc = pi->pi_arg;
+
+	if (sc == NULL)
+		return 0;
+
+	if (baridx == PTNETMAP_IO_PCI_BAR) {
+		switch (offset) {
+		case PTNET_MDEV_IO_MEMSIZE_LO:
+			return sc->info.memsize & 0xffffffff;
+		case PTNET_MDEV_IO_MEMSIZE_HI:
+			return sc->info.memsize >> 32;
+		case PTNET_MDEV_IO_MEMID:
+			return sc->info.memid;
+		case PTNET_MDEV_IO_IF_POOL_OFS:
+			return sc->info.if_pool_offset;
+		case PTNET_MDEV_IO_IF_POOL_OBJNUM:
+			return sc->info.if_pool_objtotal;
+		case PTNET_MDEV_IO_IF_POOL_OBJSZ:
+			return sc->info.if_pool_objsize;
+		case PTNET_MDEV_IO_RING_POOL_OFS:
+			return sc->info.ring_pool_offset;
+		case PTNET_MDEV_IO_RING_POOL_OBJNUM:
+			return sc->info.ring_pool_objtotal;
+		case PTNET_MDEV_IO_RING_POOL_OBJSZ:
+			return sc->info.ring_pool_objsize;
+		case PTNET_MDEV_IO_BUF_POOL_OFS:
+			return sc->info.buf_pool_offset;
+		case PTNET_MDEV_IO_BUF_POOL_OBJNUM:
+			return sc->info.buf_pool_objtotal;
+		case PTNET_MDEV_IO_BUF_POOL_OBJSZ:
+			return sc->info.buf_pool_objsize;
+		}
+	}
+
+	printf("%s: Unexpected register read [bar %u, offset %lx size %d]\n",
+	    __func__, baridx, offset, size);
+
+	return 0;
+}
+
+/*
+ * Handle write on ptnetmap-memdev register (unused for now)
+ */
+static void
+ptn_pci_write(struct vmctx *ctx, int vcpu, struct pci_devinst *pi,
+	      int baridx, uint64_t offset, int size, uint64_t value)
+{
+	struct ptn_memdev_softc *sc = pi->pi_arg;
+
+	if (sc == NULL)
+		return;
+
+	printf("%s: Unexpected register write [bar %u, offset %lx size %d "
+	    "value %lx]\n", __func__, baridx, offset, size, value);
+}
+
+/*
+ * Configure the ptnetmap-memdev PCI BARs. PCI BARs can only be created
+ * when the PCI device is created and the netmap memory is attached.
+ */
+static int
+ptn_memdev_configure_bars(struct ptn_memdev_softc *sc)
+{
+	int ret;
+
+	if (sc->pi == NULL || sc->mem_ptr == NULL)
+		return 0;
+
+	/* Allocate a BAR for an I/O region. */
+	ret = pci_emul_alloc_bar(sc->pi, PTNETMAP_IO_PCI_BAR, PCIBAR_IO,
+	    PTNET_MDEV_IO_END);
+	if (ret) {
+		printf("ptnetmap_memdev: iobar allocation error %d\n", ret);
+		return ret;
+	}
+
+	/* Allocate a BAR for a memory region. */
+	ret = pci_emul_alloc_bar(sc->pi, PTNETMAP_MEM_PCI_BAR, PCIBAR_MEM32,
+	    sc->info.memsize);
+	if (ret) {
+		printf("ptnetmap_memdev: membar allocation error %d\n", ret);
+		return ret;
+	}
+
+	/* Map netmap memory on the memory BAR. */
+	ret = vm_map_user_buf(sc->pi->pi_vmctx,
+	    sc->pi->pi_bar[PTNETMAP_MEM_PCI_BAR].addr,
+	    sc->info.memsize, sc->mem_ptr);
+	if (ret) {
+		printf("ptnetmap_memdev: membar map error %d\n", ret);
+		return ret;
+	}
+
+	return 0;
+}
+
+/*
+ * PCI device initialization
+ */
+static int
+ptn_memdev_init(struct vmctx *ctx, struct pci_devinst *pi, char *opts)
+{
+	struct ptn_memdev_softc *sc;
+	int ret;
+
+	sc = ptn_memdev_find_empty_pi();
+	if (sc == NULL) {
+		sc = ptn_memdev_create();
+		if (sc == NULL) {
+			printf("ptnetmap_memdev: calloc error\n");
+			return (ENOMEM);
+		}
+	}
+
+	/* Link our softc in the pci_devinst. */
+	pi->pi_arg = sc;
+	sc->pi = pi;
+
+	/* Initialize PCI configuration space. */
+	pci_set_cfgdata16(pi, PCIR_VENDOR, PTNETMAP_PCI_VENDOR_ID);
+	pci_set_cfgdata16(pi, PCIR_DEVICE, PTNETMAP_PCI_DEVICE_ID);
+	pci_set_cfgdata8(pi, PCIR_CLASS, PCIC_NETWORK);
+	pci_set_cfgdata16(pi, PCIR_SUBDEV_0, 1);
+	pci_set_cfgdata16(pi, PCIR_SUBVEND_0, PTNETMAP_PCI_VENDOR_ID);
+
+	/* Configure PCI-BARs. */
+	ret = ptn_memdev_configure_bars(sc);
+	if (ret) {
+		printf("ptnetmap_memdev: configure error\n");
+		goto err;
+	}
+
+	return 0;
+err:
+	ptn_memdev_delete(sc);
+	pi->pi_arg = NULL;
+	return ret;
+}
+
+/*
+ * Used by the ptnetmap backend to attach the netmap memory allocator to the
+ * ptnetmap-memdev (shared with the guest VM through a PCI BAR).
+ */
+int
+ptn_memdev_attach(void *mem_ptr, struct netmap_pools_info *info)
+{
+	struct ptn_memdev_softc *sc;
+	int ret;
+
+	/* If a device with the same mem_id is already attached, we are done. */
+	if (ptn_memdev_find_memid(info->memid)) {
+		printf("ptnetmap_memdev: already attached\n");
+		return 0;
+	}
+
+	sc = ptn_memdev_find_empty_mem();
+	if (sc == NULL) {
+		sc = ptn_memdev_create();
+		if (sc == NULL) {
+			printf("ptnetmap_memdev: calloc error\n");
+			return (ENOMEM);
+		}
+	}
+
+	sc->mem_ptr = mem_ptr;
+	sc->info = *info;
+
+	/* Configure device PCI-BARs. */
+	ret = ptn_memdev_configure_bars(sc);
+	if (ret) {
+		printf("ptnetmap_memdev: configure error\n");
+		goto err;
+	}
+
+	return 0;
+err:
+	if (sc->pi != NULL)
+		sc->pi->pi_arg = NULL;
+	ptn_memdev_delete(sc);
+	return ret;
+}
+
+struct pci_devemu pci_de_ptnetmap = {
+	.pe_emu = PTNETMAP_MEMDEV_NAME,
+	.pe_init = ptn_memdev_init,
+	.pe_barwrite = ptn_pci_write,
+	.pe_barread = ptn_pci_read
+};
+PCI_EMUL_SET(pci_de_ptnetmap);
+
+#endif /* WITH_NETMAP */
diff -u -r -N usr/src/usr.sbin/bhyve/pci_ptnetmap_netif.c /usr/src/usr.sbin/bhyve/pci_ptnetmap_netif.c
--- usr/src/usr.sbin/bhyve/pci_ptnetmap_netif.c	1970-01-01 01:00:00.000000000 +0100
+++ /usr/src/usr.sbin/bhyve/pci_ptnetmap_netif.c	2016-11-30 10:56:10.455824000 +0000
@@ -0,0 +1,411 @@
+/*
+ * Copyright (C) 2016 Vincenzo Maffione
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2.
Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +/* + * This file contains the emulation of the ptnet network frontend, to be used + * with netmap backend. + */ + +#ifdef WITH_NETMAP + +#include <sys/cdefs.h> +__FBSDID("$FreeBSD$"); + +#include <errno.h> +#include <stdio.h> +#include <stdlib.h> +#include <stdint.h> +#include <string.h> + +#include <net/if.h> /* IFNAMSIZ */ +#include <net/netmap.h> +#include <net/netmap_virt.h> + +#include <sys/ioctl.h> +#include <sys/param.h> +#include <sys/_cpuset.h> +#include <machine/vmm.h> +#include <machine/vmm_dev.h> /* VM_LAPIC_MSI */ +#include <vmmapi.h> + +#include "bhyverun.h" +#include "pci_emul.h" +#include "net_utils.h" +#include "net_backends.h" + +#ifndef PTNET_CSB_ALLOC +#error "Hypervisor-allocated CSB not supported" +#endif + + +struct ptnet_softc { + struct pci_devinst *pi; + + struct net_backend *be; + struct ptnetmap_state *ptbe; + + unsigned int num_rings; + uint32_t ioregs[PTNET_IO_END >> 2]; + void *csb; +}; + +static int +ptnet_get_netmap_if(struct ptnet_softc *sc) +{ + unsigned int num_rings; + struct netmap_if_info nif; + int ret; + + ret = ptnetmap_get_netmap_if(sc->ptbe, &nif); + if (ret) { + return ret; + } + + sc->ioregs[PTNET_IO_NIFP_OFS >> 2] = nif.nifp_offset; + sc->ioregs[PTNET_IO_NUM_TX_RINGS >> 2] = nif.num_tx_rings; + sc->ioregs[PTNET_IO_NUM_RX_RINGS >> 2] = nif.num_rx_rings; + sc->ioregs[PTNET_IO_NUM_TX_SLOTS >> 2] = nif.num_tx_slots; + sc->ioregs[PTNET_IO_NUM_RX_SLOTS >> 2] = nif.num_rx_slots; + + num_rings = sc->ioregs[PTNET_IO_NUM_TX_RINGS >> 2] + + sc->ioregs[PTNET_IO_NUM_RX_RINGS >> 2]; + if (sc->num_rings && num_rings && sc->num_rings != num_rings) { + fprintf(stderr, "Number of rings changed: not supported\n"); + return EINVAL; + } + sc->num_rings = num_rings; + + return 0; +} + +static int +ptnet_ptctl_create(struct ptnet_softc *sc) +{ + struct ptnetmap_cfgentry_bhyve *cfgentry; + struct pci_devinst *pi = sc->pi; + struct vmctx *vmctx = pi->pi_vmctx; + struct ptnetmap_cfg *cfg; + unsigned int kick_addr; + int ret; + int i; + + if (sc->csb == NULL) { + fprintf(stderr, "%s: Unexpected NULL CSB", __func__); + return -1; + } + + cfg = calloc(1, sizeof(*cfg) + sc->num_rings * sizeof(*cfgentry)); + + cfg->cfgtype = PTNETMAP_CFGTYPE_BHYVE; + cfg->entry_size = sizeof(*cfgentry); + cfg->num_rings = sc->num_rings; + cfg->ptrings = sc->csb; + + kick_addr = pi->pi_bar[PTNETMAP_IO_PCI_BAR].addr + PTNET_IO_KICK_BASE; + cfgentry = (struct ptnetmap_cfgentry_bhyve *)(cfg + 1); + + for (i = 0; i < sc->num_rings; i++, kick_addr += 4, cfgentry++) { + struct msix_table_entry *mte; + uint64_t 
cookie = sc->ioregs[PTNET_IO_MAC_LO >> 2] + 4*i; + + cfgentry->ioctl_fd = vm_get_fd(vmctx); + cfgentry->ioctl_cmd = VM_LAPIC_MSI; + mte = &pi->pi_msix.table[i]; + cfgentry->ioctl_data.addr = mte->addr; + cfgentry->ioctl_data.msg_data = mte->msg_data; + + fprintf(stderr, "%s: vector %u, addr %lu, data %u, " + "kick_addr %u, cookie: %p\n", + __func__, i, mte->addr, mte->msg_data, kick_addr, + (void*)cookie); + + ret = vm_io_reg_handler(vmctx, kick_addr /* ioaddr */, + 0 /* in */, 0 /* mask_data */, + 0 /* data */, VM_IO_REGH_KWEVENTS, + (void*)cookie /* cookie */); + if (ret) { + fprintf(stderr, "%s: vm_io_reg_handler %d\n", + __func__, ret); + } + cfgentry->wchan = (uint64_t) cookie; + } + + ret = ptnetmap_create(sc->ptbe, cfg); + free(cfg); + + return ret; +} + +static int +ptnet_ptctl_delete(struct ptnet_softc *sc) +{ + struct pci_devinst *pi = sc->pi; + struct vmctx *vmctx = pi->pi_vmctx; + unsigned int kick_addr; + int i; + + kick_addr = pi->pi_bar[PTNETMAP_IO_PCI_BAR].addr + PTNET_IO_KICK_BASE; + + for (i = 0; i < sc->num_rings; i++, kick_addr += 4) { + vm_io_reg_handler(vmctx, kick_addr, 0, 0, 0, + VM_IO_REGH_DELETE, 0); + } + + return ptnetmap_delete(sc->ptbe); +} + +static void +ptnet_ptctl(struct ptnet_softc *sc, uint64_t cmd) +{ + int ret = EINVAL; + + switch (cmd) { + case PTNETMAP_PTCTL_CREATE: + /* React to a REGIF in the guest. */ + ret = ptnet_ptctl_create(sc); + break; + + case PTNETMAP_PTCTL_DELETE: + /* React to an UNREGIF in the guest. */ + ret = ptnet_ptctl_delete(sc); + break; + } + + sc->ioregs[PTNET_IO_PTCTL >> 2] = ret; +} + +static void +ptnet_csb_mapping(struct ptnet_softc *sc) +{ + uint64_t base = ((uint64_t)sc->ioregs[PTNET_IO_CSBBAH >> 2] << 32) | + sc->ioregs[PTNET_IO_CSBBAL >> 2]; + uint64_t len = 4096; + + sc->csb = NULL; + if (base) { + sc->csb = paddr_guest2host(sc->pi->pi_vmctx, base, len); + } +} + +static void +ptnet_bar_write(struct vmctx *ctx, int vcpu, struct pci_devinst *pi, + int baridx, uint64_t offset, int size, uint64_t value) +{ + struct ptnet_softc *sc = pi->pi_arg; + unsigned int index; + + /* Redirect to MSI-X emulation code. 
 */
+	if (baridx == pci_msix_table_bar(pi) ||
+	    baridx == pci_msix_pba_bar(pi)) {
+		pci_emul_msix_twrite(pi, offset, size, value);
+		return;
+	}
+
+	if (sc == NULL)
+		return;
+
+	offset &= PTNET_IO_MASK;
+	index = offset >> 2;
+
+	if (baridx != PTNETMAP_IO_PCI_BAR || offset >= PTNET_IO_END) {
+		fprintf(stderr, "%s: Unexpected register write [bar %u, "
+		    "offset %lx size %d value %lx]\n", __func__, baridx,
+		    offset, size, value);
+		return;
+	}
+
+	switch (offset) {
+	case PTNET_IO_PTFEAT:
+		value = ptnetmap_ack_features(sc->ptbe, value);
+		sc->ioregs[index] = value;
+		break;
+
+	case PTNET_IO_PTCTL:
+		ptnet_ptctl(sc, value);
+		break;
+
+	case PTNET_IO_CSBBAH:
+		sc->ioregs[index] = value;
+		break;
+
+	case PTNET_IO_CSBBAL:
+		sc->ioregs[index] = value;
+		ptnet_csb_mapping(sc);
+		break;
+
+	case PTNET_IO_VNET_HDR_LEN:
+		if (netbe_set_cap(sc->be, netbe_get_cap(sc->be),
+		    value) == 0) {
+			sc->ioregs[index] = value;
+		}
+		break;
+	}
+}
+
+static uint64_t
+ptnet_bar_read(struct vmctx *ctx, int vcpu, struct pci_devinst *pi,
+	       int baridx, uint64_t offset, int size)
+{
+	struct ptnet_softc *sc = pi->pi_arg;
+	uint64_t index = offset >> 2;
+
+	if (baridx == pci_msix_table_bar(pi) ||
+	    baridx == pci_msix_pba_bar(pi)) {
+		return pci_emul_msix_tread(pi, offset, size);
+	}
+
+	if (sc == NULL)
+		return 0;
+
+	offset &= PTNET_IO_MASK;
+
+	if (baridx != PTNETMAP_IO_PCI_BAR || offset >= PTNET_IO_END) {
+		fprintf(stderr, "%s: Unexpected register read [bar %u, "
+		    "offset %lx size %d]\n", __func__, baridx, offset,
+		    size);
+		return 0;
+	}
+
+	switch (offset) {
+	case PTNET_IO_NIFP_OFS:
+	case PTNET_IO_NUM_TX_RINGS:
+	case PTNET_IO_NUM_RX_RINGS:
+	case PTNET_IO_NUM_TX_SLOTS:
+	case PTNET_IO_NUM_RX_SLOTS:
+		/* Fill in device registers with information about
+		 * nifp_offset, num_*x_rings, and num_*x_slots. */
+		ptnet_get_netmap_if(sc);
+		break;
+
+	case PTNET_IO_HOSTMEMID:
+		sc->ioregs[index] = ptnetmap_get_hostmemid(sc->ptbe);
+		break;
+	}
+
+	return sc->ioregs[index];
+}
+
+/* PCI device initialization. */
+static int
+ptnet_init(struct vmctx *ctx, struct pci_devinst *pi, char *opts)
+{
+	struct ptnet_softc *sc;
+	char *ptopts, *devname;
+	uint8_t macaddr[6];
+	int mac_provided = 0;
+	int ret;
+
+	sc = calloc(1, sizeof(*sc));
+	if (sc == NULL) {
+		fprintf(stderr, "%s: out of memory\n", __func__);
+		return -1;
+	}
+
+	/* Link our softc in the pci_devinst. */
+	pi->pi_arg = sc;
+	sc->pi = pi;
+
+	/* Parse command line options. */
+	if (opts == NULL) {
+		fprintf(stderr, "%s: No backend specified\n", __func__);
+		return -1;
+	}
+
+	devname = ptopts = strdup(opts);
+	(void) strsep(&ptopts, ",");
+
+	if (ptopts != NULL) {
+		ret = net_parsemac(ptopts, macaddr);
+		if (ret != 0) {
+			free(devname);
+			return ret;
+		}
+		mac_provided = 1;
+	}
+
+	if (!mac_provided) {
+		net_genmac(pi, macaddr);
+	}
+
+	/* Initialize the backend. A NULL callback is used here to ask
+	 * the netmap backend to use ptnetmap. */
+	sc->be = netbe_init(devname, NULL, sc);
+	if (!sc->be) {
+		fprintf(stderr, "net backend initialization failed\n");
+		return -1;
+	}
+
+	free(devname);
+
+	sc->ptbe = get_ptnetmap(sc->be);
+	if (!sc->ptbe) {
+		fprintf(stderr, "%s: failed to get ptnetmap\n", __func__);
+		return -1;
+	}
+
+	/* Initialize PCI configuration space. */
+	pci_set_cfgdata16(pi, PCIR_VENDOR, PTNETMAP_PCI_VENDOR_ID);
+	pci_set_cfgdata16(pi, PCIR_DEVICE, PTNETMAP_PCI_NETIF_ID);
+	pci_set_cfgdata8(pi, PCIR_CLASS, PCIC_NETWORK);
+	pci_set_cfgdata8(pi, PCIR_SUBCLASS, PCIS_NETWORK_ETHERNET);
+	pci_set_cfgdata16(pi, PCIR_SUBDEV_0, 1);
+	pci_set_cfgdata16(pi, PCIR_SUBVEND_0, PTNETMAP_PCI_VENDOR_ID);
+
+	/* Allocate a BAR for an I/O region. */
+	ret = pci_emul_alloc_bar(pi, PTNETMAP_IO_PCI_BAR, PCIBAR_IO,
+	    PTNET_IO_MASK + 1);
+	if (ret) {
+		fprintf(stderr, "%s: failed to allocate BAR [%d]\n",
+		    __func__, ret);
+		return ret;
+	}
+
+	/* Initialize registers and data structures. */
+	memset(sc->ioregs, 0, sizeof(sc->ioregs));
+	sc->csb = NULL;
+	sc->ioregs[PTNET_IO_MAC_HI >> 2] = (macaddr[0] << 8) | macaddr[1];
+	sc->ioregs[PTNET_IO_MAC_LO >> 2] = (macaddr[2] << 24) |
+					   (macaddr[3] << 16) |
+					   (macaddr[4] << 8) | macaddr[5];
+
+	sc->num_rings = 0;
+	ptnet_get_netmap_if(sc);
+
+	/* Allocate a BAR for MSI-X vectors. */
+	pci_emul_add_msixcap(pi, sc->num_rings, PTNETMAP_MSIX_PCI_BAR);
+
+	return 0;
+}
+
+struct pci_devemu pci_de_ptnet = {
+	.pe_emu = "ptnet",
+	.pe_init = ptnet_init,
+	.pe_barwrite = ptnet_bar_write,
+	.pe_barread = ptnet_bar_read,
+};
+PCI_EMUL_SET(pci_de_ptnet);
+
+#endif /* WITH_NETMAP */
diff -u -r -N usr/src/usr.sbin/bhyve/pci_virtio_net.c /usr/src/usr.sbin/bhyve/pci_virtio_net.c
--- usr/src/usr.sbin/bhyve/pci_virtio_net.c	2016-09-29 00:25:07.000000000 +0100
+++ /usr/src/usr.sbin/bhyve/pci_virtio_net.c	2016-11-30 10:56:10.459343000 +0000
@@ -26,6 +26,22 @@
  * $FreeBSD: releng/11.0/usr.sbin/bhyve/pci_virtio_net.c 296829 2016-03-14 08:48:16Z gnn $
  */
+/*
+ * This file contains the emulation of the virtio-net network frontend.
+ * Network backends are in net_backends.c.
+ *
+ * The frontend is selected using the pe_emu field of the descriptor.
+ * Upon a match, the pe_init function is invoked, which initializes
+ * the emulated PCI device, attaches to the backend, and calls virtio
+ * initialization functions.
+ *
+ * PCI register reads/writes are handled through generic PCI methods.
+ *
+ * virtio TX is handled by a dedicated thread, pci_vtnet_tx_thread().
+ * virtio RX is handled by the backend (often with some helper thread),
+ * which in turn calls a frontend callback, pci_vtnet_rx_callback().
+ */
+
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD: releng/11.0/usr.sbin/bhyve/pci_virtio_net.c 296829 2016-03-14 08:48:16Z gnn $");
 
@@ -36,10 +52,7 @@
 #include <sys/ioctl.h>
 #include <machine/atomic.h>
 #include <net/ethernet.h>
-#ifndef NETMAP_WITH_LIBS
-#define NETMAP_WITH_LIBS
-#endif
-#include <net/netmap_user.h>
+#include <net/if.h>	/* IFNAMSIZ */
 
 #include <errno.h>
 #include <fcntl.h>
@@ -50,7 +63,6 @@
 #include <strings.h>
 #include <unistd.h>
 #include <assert.h>
-#include <md5.h>
 #include <pthread.h>
 #include <pthread_np.h>
 
@@ -58,36 +70,16 @@
 #include "pci_emul.h"
 #include "mevent.h"
 #include "virtio.h"
+#include "net_utils.h"		/* MAC address generation */
+#include "net_backends.h"	/* VirtIO capabilities */
 
 #define VTNET_RINGSZ	1024
 
 #define VTNET_MAXSEGS	256
 
-/*
- * Host capabilities.  Note that we only offer a few of these.
- */ -#define VIRTIO_NET_F_CSUM (1 << 0) /* host handles partial cksum */ -#define VIRTIO_NET_F_GUEST_CSUM (1 << 1) /* guest handles partial cksum */ -#define VIRTIO_NET_F_MAC (1 << 5) /* host supplies MAC */ -#define VIRTIO_NET_F_GSO_DEPREC (1 << 6) /* deprecated: host handles GSO */ -#define VIRTIO_NET_F_GUEST_TSO4 (1 << 7) /* guest can rcv TSOv4 */ -#define VIRTIO_NET_F_GUEST_TSO6 (1 << 8) /* guest can rcv TSOv6 */ -#define VIRTIO_NET_F_GUEST_ECN (1 << 9) /* guest can rcv TSO with ECN */ -#define VIRTIO_NET_F_GUEST_UFO (1 << 10) /* guest can rcv UFO */ -#define VIRTIO_NET_F_HOST_TSO4 (1 << 11) /* host can rcv TSOv4 */ -#define VIRTIO_NET_F_HOST_TSO6 (1 << 12) /* host can rcv TSOv6 */ -#define VIRTIO_NET_F_HOST_ECN (1 << 13) /* host can rcv TSO with ECN */ -#define VIRTIO_NET_F_HOST_UFO (1 << 14) /* host can rcv UFO */ -#define VIRTIO_NET_F_MRG_RXBUF (1 << 15) /* host can merge RX buffers */ -#define VIRTIO_NET_F_STATUS (1 << 16) /* config status field available */ -#define VIRTIO_NET_F_CTRL_VQ (1 << 17) /* control channel available */ -#define VIRTIO_NET_F_CTRL_RX (1 << 18) /* control channel RX mode support */ -#define VIRTIO_NET_F_CTRL_VLAN (1 << 19) /* control channel VLAN filtering */ -#define VIRTIO_NET_F_GUEST_ANNOUNCE \ - (1 << 21) /* guest can send gratuitous pkts */ - +/* Our capabilities: we don't support VIRTIO_NET_F_MRG_RXBUF at the moment. */ #define VTNET_S_HOSTCAPS \ - ( VIRTIO_NET_F_MAC | VIRTIO_NET_F_MRG_RXBUF | VIRTIO_NET_F_STATUS | \ + ( VIRTIO_NET_F_MAC | VIRTIO_NET_F_STATUS | \ VIRTIO_F_NOTIFY_ON_EMPTY | VIRTIO_RING_F_INDIRECT_DESC) /* @@ -96,6 +88,7 @@ struct virtio_net_config { uint8_t mac[6]; uint16_t status; + uint16_t max_virtqueue_pairs; } __packed; /* @@ -108,19 +101,6 @@ #define VTNET_MAXQ 3 /* - * Fixed network header size - */ -struct virtio_net_rxhdr { - uint8_t vrh_flags; - uint8_t vrh_gso_type; - uint16_t vrh_hdr_len; - uint16_t vrh_gso_size; - uint16_t vrh_csum_start; - uint16_t vrh_csum_offset; - uint16_t vrh_bufs; -} __packed; - -/* * Debug printf */ static int pci_vtnet_debug; @@ -134,31 +114,24 @@ struct virtio_softc vsc_vs; struct vqueue_info vsc_queues[VTNET_MAXQ - 1]; pthread_mutex_t vsc_mtx; - struct mevent *vsc_mevp; - int vsc_tapfd; - struct nm_desc *vsc_nmd; + struct net_backend *vsc_be; int vsc_rx_ready; volatile int resetting; /* set and checked outside lock */ uint64_t vsc_features; /* negotiated features */ - struct virtio_net_config vsc_config; - pthread_mutex_t rx_mtx; - int rx_in_progress; - int rx_vhdrlen; + unsigned int rx_vhdrlen; int rx_merge; /* merged rx bufs in use */ pthread_t tx_tid; pthread_mutex_t tx_mtx; pthread_cond_t tx_cond; int tx_in_progress; + struct virtio_net_config vsc_config; - void (*pci_vtnet_rx)(struct pci_vtnet_softc *sc); - void (*pci_vtnet_tx)(struct pci_vtnet_softc *sc, struct iovec *iov, - int iovcnt, int len); }; static void pci_vtnet_reset(void *); @@ -181,6 +154,7 @@ /* * If the transmit thread is active then stall until it is done. + * Only used once in pci_vtnet_reset() */ static void pci_vtnet_txwait(struct pci_vtnet_softc *sc) @@ -197,20 +171,18 @@ /* * If the receive thread is active then stall until it is done. + * It is enough to lock and unlock the RX mutex. 
+ * Only used once in pci_vtnet_reset() */ static void pci_vtnet_rxwait(struct pci_vtnet_softc *sc) { pthread_mutex_lock(&sc->rx_mtx); - while (sc->rx_in_progress) { - pthread_mutex_unlock(&sc->rx_mtx); - usleep(10000); - pthread_mutex_lock(&sc->rx_mtx); - } pthread_mutex_unlock(&sc->rx_mtx); } +/* handler for virtio_reset */ static void pci_vtnet_reset(void *vsc) { @@ -237,360 +209,80 @@ sc->resetting = 0; } -/* - * Called to send a buffer chain out to the tap device - */ -static void -pci_vtnet_tap_tx(struct pci_vtnet_softc *sc, struct iovec *iov, int iovcnt, - int len) -{ - static char pad[60]; /* all zero bytes */ - - if (sc->vsc_tapfd == -1) - return; - - /* - * If the length is < 60, pad out to that and add the - * extra zero'd segment to the iov. It is guaranteed that - * there is always an extra iov available by the caller. - */ - if (len < 60) { - iov[iovcnt].iov_base = pad; - iov[iovcnt].iov_len = 60 - len; - iovcnt++; - } - (void) writev(sc->vsc_tapfd, iov, iovcnt); -} - -/* - * Called when there is read activity on the tap file descriptor. - * Each buffer posted by the guest is assumed to be able to contain - * an entire ethernet frame + rx header. - * MP note: the dummybuf is only used for discarding frames, so there - * is no need for it to be per-vtnet or locked. - */ -static uint8_t dummybuf[2048]; - -static __inline struct iovec * -rx_iov_trim(struct iovec *iov, int *niov, int tlen) -{ - struct iovec *riov; - - /* XXX short-cut: assume first segment is >= tlen */ - assert(iov[0].iov_len >= tlen); - - iov[0].iov_len -= tlen; - if (iov[0].iov_len == 0) { - assert(*niov > 1); - *niov -= 1; - riov = &iov[1]; - } else { - iov[0].iov_base = (void *)((uintptr_t)iov[0].iov_base + tlen); - riov = &iov[0]; - } - - return (riov); -} - static void -pci_vtnet_tap_rx(struct pci_vtnet_softc *sc) +pci_vtnet_rx(struct pci_vtnet_softc *sc) { - struct iovec iov[VTNET_MAXSEGS], *riov; + struct iovec iov[VTNET_MAXSEGS + 1]; struct vqueue_info *vq; - void *vrx; int len, n; uint16_t idx; - /* - * Should never be called without a valid tap fd - */ - assert(sc->vsc_tapfd != -1); - - /* - * But, will be called when the rx ring hasn't yet - * been set up or the guest is resetting the device. - */ if (!sc->vsc_rx_ready || sc->resetting) { /* - * Drop the packet and try later. + * The rx ring has not yet been set up or the guest is + * resetting the device. Drop the packet and try later. */ - (void) read(sc->vsc_tapfd, dummybuf, sizeof(dummybuf)); + netbe_rx_discard(sc->vsc_be); return; } - /* - * Check for available rx buffers - */ vq = &sc->vsc_queues[VTNET_RXQ]; if (!vq_has_descs(vq)) { /* - * Drop the packet and try later. Interrupt on - * empty, if that's negotiated. + * No available rx buffers. Drop the packet and try later. + * Interrupt on empty, if that's negotiated. */ - (void) read(sc->vsc_tapfd, dummybuf, sizeof(dummybuf)); + netbe_rx_discard(sc->vsc_be); vq_endchains(vq, 1); return; } do { - /* - * Get descriptor chain. - */ + /* Get descriptor chain into iov */ n = vq_getchain(vq, &idx, iov, VTNET_MAXSEGS, NULL); assert(n >= 1 && n <= VTNET_MAXSEGS); - /* - * Get a pointer to the rx header, and use the - * data immediately following it for the packet buffer. - */ - vrx = iov[0].iov_base; - riov = rx_iov_trim(iov, &n, sc->rx_vhdrlen); - - len = readv(sc->vsc_tapfd, riov, n); - - if (len < 0 && errno == EWOULDBLOCK) { - /* - * No more packets, but still some avail ring - * entries. Interrupt if needed/appropriate. 
- */ - vq_retchain(vq); - vq_endchains(vq, 0); - return; - } - - /* - * The only valid field in the rx packet header is the - * number of buffers if merged rx bufs were negotiated. - */ - memset(vrx, 0, sc->rx_vhdrlen); - - if (sc->rx_merge) { - struct virtio_net_rxhdr *vrxh; - - vrxh = vrx; - vrxh->vrh_bufs = 1; - } - - /* - * Release this chain and handle more chains. - */ - vq_relchain(vq, idx, len + sc->rx_vhdrlen); - } while (vq_has_descs(vq)); - - /* Interrupt if needed, including for NOTIFY_ON_EMPTY. */ - vq_endchains(vq, 1); -} - -static __inline int -pci_vtnet_netmap_writev(struct nm_desc *nmd, struct iovec *iov, int iovcnt) -{ - int r, i; - int len = 0; - - for (r = nmd->cur_tx_ring; ; ) { - struct netmap_ring *ring = NETMAP_TXRING(nmd->nifp, r); - uint32_t cur, idx; - char *buf; - - if (nm_ring_empty(ring)) { - r++; - if (r > nmd->last_tx_ring) - r = nmd->first_tx_ring; - if (r == nmd->cur_tx_ring) - break; - continue; - } - cur = ring->cur; - idx = ring->slot[cur].buf_idx; - buf = NETMAP_BUF(ring, idx); + len = netbe_recv(sc->vsc_be, iov, n); - for (i = 0; i < iovcnt; i++) { - if (len + iov[i].iov_len > 2048) - break; - memcpy(&buf[len], iov[i].iov_base, iov[i].iov_len); - len += iov[i].iov_len; - } - ring->slot[cur].len = len; - ring->head = ring->cur = nm_ring_next(ring, cur); - nmd->cur_tx_ring = r; - ioctl(nmd->fd, NIOCTXSYNC, NULL); - break; - } - - return (len); -} - -static __inline int -pci_vtnet_netmap_readv(struct nm_desc *nmd, struct iovec *iov, int iovcnt) -{ - int len = 0; - int i = 0; - int r; - - for (r = nmd->cur_rx_ring; ; ) { - struct netmap_ring *ring = NETMAP_RXRING(nmd->nifp, r); - uint32_t cur, idx; - char *buf; - size_t left; - - if (nm_ring_empty(ring)) { - r++; - if (r > nmd->last_rx_ring) - r = nmd->first_rx_ring; - if (r == nmd->cur_rx_ring) - break; - continue; + if (len < 0) { + break; } - cur = ring->cur; - idx = ring->slot[cur].buf_idx; - buf = NETMAP_BUF(ring, idx); - left = ring->slot[cur].len; - - for (i = 0; i < iovcnt && left > 0; i++) { - if (iov[i].iov_len > left) - iov[i].iov_len = left; - memcpy(iov[i].iov_base, &buf[len], iov[i].iov_len); - len += iov[i].iov_len; - left -= iov[i].iov_len; - } - ring->head = ring->cur = nm_ring_next(ring, cur); - nmd->cur_rx_ring = r; - ioctl(nmd->fd, NIOCRXSYNC, NULL); - break; - } - for (; i < iovcnt; i++) - iov[i].iov_len = 0; - - return (len); -} - -/* - * Called to send a buffer chain out to the vale port - */ -static void -pci_vtnet_netmap_tx(struct pci_vtnet_softc *sc, struct iovec *iov, int iovcnt, - int len) -{ - static char pad[60]; /* all zero bytes */ - - if (sc->vsc_nmd == NULL) - return; - - /* - * If the length is < 60, pad out to that and add the - * extra zero'd segment to the iov. It is guaranteed that - * there is always an extra iov available by the caller. - */ - if (len < 60) { - iov[iovcnt].iov_base = pad; - iov[iovcnt].iov_len = 60 - len; - iovcnt++; - } - (void) pci_vtnet_netmap_writev(sc->vsc_nmd, iov, iovcnt); -} - -static void -pci_vtnet_netmap_rx(struct pci_vtnet_softc *sc) -{ - struct iovec iov[VTNET_MAXSEGS], *riov; - struct vqueue_info *vq; - void *vrx; - int len, n; - uint16_t idx; - - /* - * Should never be called without a valid netmap descriptor - */ - assert(sc->vsc_nmd != NULL); - - /* - * But, will be called when the rx ring hasn't yet - * been set up or the guest is resetting the device. - */ - if (!sc->vsc_rx_ready || sc->resetting) { - /* - * Drop the packet and try later. 
- */ - (void) nm_nextpkt(sc->vsc_nmd, (void *)dummybuf); - return; - } - - /* - * Check for available rx buffers - */ - vq = &sc->vsc_queues[VTNET_RXQ]; - if (!vq_has_descs(vq)) { - /* - * Drop the packet and try later. Interrupt on - * empty, if that's negotiated. - */ - (void) nm_nextpkt(sc->vsc_nmd, (void *)dummybuf); - vq_endchains(vq, 1); - return; - } - - do { - /* - * Get descriptor chain. - */ - n = vq_getchain(vq, &idx, iov, VTNET_MAXSEGS, NULL); - assert(n >= 1 && n <= VTNET_MAXSEGS); - - /* - * Get a pointer to the rx header, and use the - * data immediately following it for the packet buffer. - */ - vrx = iov[0].iov_base; - riov = rx_iov_trim(iov, &n, sc->rx_vhdrlen); - - len = pci_vtnet_netmap_readv(sc->vsc_nmd, riov, n); if (len == 0) { /* * No more packets, but still some avail ring * entries. Interrupt if needed/appropriate. */ - vq_retchain(vq); + vq_retchain(vq); /* return the slot to the vq */ vq_endchains(vq, 0); return; } - /* - * The only valid field in the rx packet header is the - * number of buffers if merged rx bufs were negotiated. - */ - memset(vrx, 0, sc->rx_vhdrlen); - - if (sc->rx_merge) { - struct virtio_net_rxhdr *vrxh; - - vrxh = vrx; - vrxh->vrh_bufs = 1; - } - - /* - * Release this chain and handle more chains. - */ - vq_relchain(vq, idx, len + sc->rx_vhdrlen); + /* Publish the info to the guest */ + vq_relchain(vq, idx, (uint32_t)len); } while (vq_has_descs(vq)); /* Interrupt if needed, including for NOTIFY_ON_EMPTY. */ vq_endchains(vq, 1); } +/* + * Called when there is read activity on the tap file descriptor. + * Each buffer posted by the guest is assumed to be able to contain + * an entire ethernet frame + rx header. + */ static void pci_vtnet_rx_callback(int fd, enum ev_type type, void *param) { struct pci_vtnet_softc *sc = param; + (void)fd; (void)type; pthread_mutex_lock(&sc->rx_mtx); - sc->rx_in_progress = 1; - sc->pci_vtnet_rx(sc); - sc->rx_in_progress = 0; + pci_vtnet_rx(sc); pthread_mutex_unlock(&sc->rx_mtx); - } +/* callback when writing to the PCI register */ static void pci_vtnet_ping_rxq(void *vsc, struct vqueue_info *vq) { @@ -605,35 +297,33 @@ } } +/* TX processing (guest to host), called in the tx thread */ static void pci_vtnet_proctx(struct pci_vtnet_softc *sc, struct vqueue_info *vq) { struct iovec iov[VTNET_MAXSEGS + 1]; int i, n; - int plen, tlen; + uint32_t len; uint16_t idx; /* - * Obtain chain of descriptors. The first one is - * really the header descriptor, so we need to sum - * up two lengths: packet length and transfer length. + * Obtain chain of descriptors. The first descriptor also + * contains the virtio-net header. 
*/ n = vq_getchain(vq, &idx, iov, VTNET_MAXSEGS, NULL); assert(n >= 1 && n <= VTNET_MAXSEGS); - plen = 0; - tlen = iov[0].iov_len; - for (i = 1; i < n; i++) { - plen += iov[i].iov_len; - tlen += iov[i].iov_len; + len = 0; + for (i = 0; i < n; i++) { + len += iov[i].iov_len; } - DPRINTF(("virtio: packet send, %d bytes, %d segs\n\r", plen, n)); - sc->pci_vtnet_tx(sc, &iov[1], n - 1, plen); + netbe_send(sc->vsc_be, iov, n, len, 0 /* more */); - /* chain is processed, release it and set tlen */ - vq_relchain(vq, idx, tlen); + /* chain is processed, release it and set len */ + vq_relchain(vq, idx, len); } +/* callback when writing to the PCI register */ static void pci_vtnet_ping_txq(void *vsc, struct vqueue_info *vq) { @@ -663,6 +353,14 @@ struct vqueue_info *vq; int error; + { + struct pci_devinst *pi = sc->vsc_vs.vs_pi; + char tname[MAXCOMLEN + 1]; + snprintf(tname, sizeof(tname), "vtnet-%d:%d tx", pi->pi_slot, + pi->pi_func); + pthread_set_name_np(pthread_self(), tname); + } + vq = &sc->vsc_queues[VTNET_TXQ]; /* @@ -717,109 +415,27 @@ #endif static int -pci_vtnet_parsemac(char *mac_str, uint8_t *mac_addr) -{ - struct ether_addr *ea; - char *tmpstr; - char zero_addr[ETHER_ADDR_LEN] = { 0, 0, 0, 0, 0, 0 }; - - tmpstr = strsep(&mac_str,"="); - - if ((mac_str != NULL) && (!strcmp(tmpstr,"mac"))) { - ea = ether_aton(mac_str); - - if (ea == NULL || ETHER_IS_MULTICAST(ea->octet) || - memcmp(ea->octet, zero_addr, ETHER_ADDR_LEN) == 0) { - fprintf(stderr, "Invalid MAC %s\n", mac_str); - return (EINVAL); - } else - memcpy(mac_addr, ea->octet, ETHER_ADDR_LEN); - } - - return (0); -} - -static void -pci_vtnet_tap_setup(struct pci_vtnet_softc *sc, char *devname) -{ - char tbuf[80]; - - strcpy(tbuf, "/dev/"); - strlcat(tbuf, devname, sizeof(tbuf)); - - sc->pci_vtnet_rx = pci_vtnet_tap_rx; - sc->pci_vtnet_tx = pci_vtnet_tap_tx; - - sc->vsc_tapfd = open(tbuf, O_RDWR); - if (sc->vsc_tapfd == -1) { - WPRINTF(("open of tap device %s failed\n", tbuf)); - return; - } - - /* - * Set non-blocking and register for read - * notifications with the event loop - */ - int opt = 1; - if (ioctl(sc->vsc_tapfd, FIONBIO, &opt) < 0) { - WPRINTF(("tap device O_NONBLOCK failed\n")); - close(sc->vsc_tapfd); - sc->vsc_tapfd = -1; - } - - sc->vsc_mevp = mevent_add(sc->vsc_tapfd, - EVF_READ, - pci_vtnet_rx_callback, - sc); - if (sc->vsc_mevp == NULL) { - WPRINTF(("Could not register event\n")); - close(sc->vsc_tapfd); - sc->vsc_tapfd = -1; - } -} - -static void -pci_vtnet_netmap_setup(struct pci_vtnet_softc *sc, char *ifname) -{ - sc->pci_vtnet_rx = pci_vtnet_netmap_rx; - sc->pci_vtnet_tx = pci_vtnet_netmap_tx; - - sc->vsc_nmd = nm_open(ifname, NULL, 0, 0); - if (sc->vsc_nmd == NULL) { - WPRINTF(("open of netmap device %s failed\n", ifname)); - return; - } - - sc->vsc_mevp = mevent_add(sc->vsc_nmd->fd, - EVF_READ, - pci_vtnet_rx_callback, - sc); - if (sc->vsc_mevp == NULL) { - WPRINTF(("Could not register event\n")); - nm_close(sc->vsc_nmd); - sc->vsc_nmd = NULL; - } -} - -static int pci_vtnet_init(struct vmctx *ctx, struct pci_devinst *pi, char *opts) { - MD5_CTX mdctx; - unsigned char digest[16]; - char nstr[80]; - char tname[MAXCOMLEN + 1]; struct pci_vtnet_softc *sc; char *devname; char *vtopts; int mac_provided; + struct virtio_consts *vc; - sc = calloc(1, sizeof(struct pci_vtnet_softc)); + /* + * Allocate data structures for further virtio initializations. + * sc also contains a copy of the vtnet_vi_consts, + * because the capabilities change depending on + * the backend. 
+ */ + sc = calloc(1, sizeof(struct pci_vtnet_softc) + + sizeof(struct virtio_consts)); + vc = (struct virtio_consts *)(sc + 1); + memcpy(vc, &vtnet_vi_consts, sizeof(*vc)); pthread_mutex_init(&sc->vsc_mtx, NULL); - vi_softc_linkup(&sc->vsc_vs, &vtnet_vi_consts, sc, pi, sc->vsc_queues); - sc->vsc_vs.vs_mtx = &sc->vsc_mtx; - sc->vsc_queues[VTNET_RXQ].vq_qsize = VTNET_RINGSZ; sc->vsc_queues[VTNET_RXQ].vq_notify = pci_vtnet_ping_rxq; sc->vsc_queues[VTNET_TXQ].vq_qsize = VTNET_RINGSZ; @@ -830,12 +446,10 @@ #endif /* - * Attempt to open the tap device and read the MAC address + * Attempt to open the backend device and read the MAC address * if specified */ mac_provided = 0; - sc->vsc_tapfd = -1; - sc->vsc_nmd = NULL; if (opts != NULL) { int err; @@ -843,7 +457,7 @@ (void) strsep(&vtopts, ","); if (vtopts != NULL) { - err = pci_vtnet_parsemac(vtopts, sc->vsc_config.mac); + err = net_parsemac(vtopts, sc->vsc_config.mac); if (err != 0) { free(devname); return (err); @@ -851,33 +465,18 @@ mac_provided = 1; } - if (strncmp(devname, "vale", 4) == 0) - pci_vtnet_netmap_setup(sc, devname); - if (strncmp(devname, "tap", 3) == 0 || - strncmp(devname, "vmnet", 5) == 0) - pci_vtnet_tap_setup(sc, devname); + sc->vsc_be = netbe_init(devname, pci_vtnet_rx_callback, sc); + if (!sc->vsc_be) { + WPRINTF(("net backend initialization failed\n")); + } else { + vc->vc_hv_caps |= netbe_get_cap(sc->vsc_be); + } free(devname); } - /* - * The default MAC address is the standard NetApp OUI of 00-a0-98, - * followed by an MD5 of the PCI slot/func number and dev name - */ if (!mac_provided) { - snprintf(nstr, sizeof(nstr), "%d-%d-%s", pi->pi_slot, - pi->pi_func, vmname); - - MD5Init(&mdctx); - MD5Update(&mdctx, nstr, strlen(nstr)); - MD5Final(digest, &mdctx); - - sc->vsc_config.mac[0] = 0x00; - sc->vsc_config.mac[1] = 0xa0; - sc->vsc_config.mac[2] = 0x98; - sc->vsc_config.mac[3] = digest[0]; - sc->vsc_config.mac[4] = digest[1]; - sc->vsc_config.mac[5] = digest[2]; + net_genmac(pi, sc->vsc_config.mac); } /* initialize config space */ @@ -887,22 +486,23 @@ pci_set_cfgdata16(pi, PCIR_SUBDEV_0, VIRTIO_TYPE_NET); pci_set_cfgdata16(pi, PCIR_SUBVEND_0, VIRTIO_VENDOR); - /* Link is up if we managed to open tap device or vale port. */ - sc->vsc_config.status = (opts == NULL || sc->vsc_tapfd >= 0 || - sc->vsc_nmd != NULL); + /* Link is up if we managed to open backend device. 
 */
+	sc->vsc_config.status = (opts == NULL || sc->vsc_be);
+
+	vi_softc_linkup(&sc->vsc_vs, vc, sc, pi, sc->vsc_queues);
+	sc->vsc_vs.vs_mtx = &sc->vsc_mtx;
+
 	/* use BAR 1 to map MSI-X table and PBA, if we're using MSI-X */
 	if (vi_intr_init(&sc->vsc_vs, 1, fbsdrun_virtio_msix()))
 		return (1);
 
 	/* use BAR 0 to map config regs in IO space */
-	vi_set_io_bar(&sc->vsc_vs, 0);
+	vi_set_io_bar(&sc->vsc_vs, 0); /* calls into virtio */
 
 	sc->resetting = 0;
 
 	sc->rx_merge = 1;
 	sc->rx_vhdrlen = sizeof(struct virtio_net_rxhdr);
-	sc->rx_in_progress = 0;
 	pthread_mutex_init(&sc->rx_mtx, NULL);
 
 	/*
@@ -914,9 +514,6 @@
 	pthread_mutex_init(&sc->tx_mtx, NULL);
 	pthread_cond_init(&sc->tx_cond, NULL);
 	pthread_create(&sc->tx_tid, NULL, pci_vtnet_tx_thread, (void *)sc);
-	snprintf(tname, sizeof(tname), "vtnet-%d:%d tx", pi->pi_slot,
-	    pi->pi_func);
-	pthread_set_name_np(sc->tx_tid, tname);
 
 	return (0);
 }
@@ -927,8 +524,8 @@
 	struct pci_vtnet_softc *sc = vsc;
 	void *ptr;
 
-	if (offset < 6) {
-		assert(offset + size <= 6);
+	if (offset < (int)sizeof(sc->vsc_config.mac)) {
+		assert(offset + size <= (int)sizeof(sc->vsc_config.mac));
 		/*
 		 * The driver is allowed to change the MAC address
 		 */
@@ -960,14 +557,17 @@
 
 	sc->vsc_features = negotiated_features;
 
-	if (!(sc->vsc_features & VIRTIO_NET_F_MRG_RXBUF)) {
+	if (!(negotiated_features & VIRTIO_NET_F_MRG_RXBUF)) {
 		sc->rx_merge = 0;
 		/* non-merge rx header is 2 bytes shorter */
 		sc->rx_vhdrlen -= 2;
 	}
+
+	/* Tell the backend to enable some capabilities it has advertised. */
+	netbe_set_cap(sc->vsc_be, negotiated_features, sc->rx_vhdrlen);
 }
 
-struct pci_devemu pci_de_vnet = {
+static struct pci_devemu pci_de_vnet = {
 	.pe_emu = "virtio-net",
 	.pe_init = pci_vtnet_init,
 	.pe_barwrite = vi_pci_write,
+ * + * $FreeBSD: releng/11.0/usr.sbin/bhyve/pci_virtio_net.c 296829 2016-03-14 08:48:16Z gnn $ + */ + +#include <sys/cdefs.h> +__FBSDID("$FreeBSD: releng/11.0/usr.sbin/bhyve/pci_virtio_net.c 296829 2016-03-14 08:48:16Z gnn $"); + +#include <sys/param.h> +#include <sys/linker_set.h> +#include <sys/select.h> +#include <sys/uio.h> +#include <sys/ioctl.h> +#include <machine/atomic.h> +#include <net/ethernet.h> +#ifndef NETMAP_WITH_LIBS +#define NETMAP_WITH_LIBS +#endif +#include <net/netmap_user.h> + +#include <errno.h> +#include <fcntl.h> +#include <stdio.h> +#include <stdlib.h> +#include <stdint.h> +#include <string.h> +#include <strings.h> +#include <unistd.h> +#include <assert.h> +#include <md5.h> +#include <pthread.h> +#include <pthread_np.h> + +#include "bhyverun.h" +#include "pci_emul.h" +#include "mevent.h" +#include "virtio.h" + +#define VTNET_RINGSZ 1024 + +#define VTNET_MAXSEGS 256 + +/* + * Host capabilities. Note that we only offer a few of these. + */ +#define VIRTIO_NET_F_CSUM (1 << 0) /* host handles partial cksum */ +#define VIRTIO_NET_F_GUEST_CSUM (1 << 1) /* guest handles partial cksum */ +#define VIRTIO_NET_F_MAC (1 << 5) /* host supplies MAC */ +#define VIRTIO_NET_F_GSO_DEPREC (1 << 6) /* deprecated: host handles GSO */ +#define VIRTIO_NET_F_GUEST_TSO4 (1 << 7) /* guest can rcv TSOv4 */ +#define VIRTIO_NET_F_GUEST_TSO6 (1 << 8) /* guest can rcv TSOv6 */ +#define VIRTIO_NET_F_GUEST_ECN (1 << 9) /* guest can rcv TSO with ECN */ +#define VIRTIO_NET_F_GUEST_UFO (1 << 10) /* guest can rcv UFO */ +#define VIRTIO_NET_F_HOST_TSO4 (1 << 11) /* host can rcv TSOv4 */ +#define VIRTIO_NET_F_HOST_TSO6 (1 << 12) /* host can rcv TSOv6 */ +#define VIRTIO_NET_F_HOST_ECN (1 << 13) /* host can rcv TSO with ECN */ +#define VIRTIO_NET_F_HOST_UFO (1 << 14) /* host can rcv UFO */ +#define VIRTIO_NET_F_MRG_RXBUF (1 << 15) /* host can merge RX buffers */ +#define VIRTIO_NET_F_STATUS (1 << 16) /* config status field available */ +#define VIRTIO_NET_F_CTRL_VQ (1 << 17) /* control channel available */ +#define VIRTIO_NET_F_CTRL_RX (1 << 18) /* control channel RX mode support */ +#define VIRTIO_NET_F_CTRL_VLAN (1 << 19) /* control channel VLAN filtering */ +#define VIRTIO_NET_F_GUEST_ANNOUNCE \ + (1 << 21) /* guest can send gratuitous pkts */ + +#define VTNET_S_HOSTCAPS \ + ( VIRTIO_NET_F_MAC | VIRTIO_NET_F_MRG_RXBUF | VIRTIO_NET_F_STATUS | \ + VIRTIO_F_NOTIFY_ON_EMPTY | VIRTIO_RING_F_INDIRECT_DESC) + +/* + * PCI config-space "registers" + */ +struct virtio_net_config { + uint8_t mac[6]; + uint16_t status; +} __packed; + +/* + * Queue definitions. 
+ */ +#define VTNET_RXQ 0 +#define VTNET_TXQ 1 +#define VTNET_CTLQ 2 /* NB: not yet supported */ + +#define VTNET_MAXQ 3 + +/* + * Fixed network header size + */ +struct virtio_net_rxhdr { + uint8_t vrh_flags; + uint8_t vrh_gso_type; + uint16_t vrh_hdr_len; + uint16_t vrh_gso_size; + uint16_t vrh_csum_start; + uint16_t vrh_csum_offset; + uint16_t vrh_bufs; +} __packed; + +/* + * Debug printf + */ +static int pci_vtnet_debug; +#define DPRINTF(params) if (pci_vtnet_debug) printf params +#define WPRINTF(params) printf params + +/* + * Per-device softc + */ +struct pci_vtnet_softc { + struct virtio_softc vsc_vs; + struct vqueue_info vsc_queues[VTNET_MAXQ - 1]; + pthread_mutex_t vsc_mtx; + struct mevent *vsc_mevp; + + int vsc_tapfd; + struct nm_desc *vsc_nmd; + + int vsc_rx_ready; + volatile int resetting; /* set and checked outside lock */ + + uint64_t vsc_features; /* negotiated features */ + + struct virtio_net_config vsc_config; + + pthread_mutex_t rx_mtx; + int rx_in_progress; + int rx_vhdrlen; + int rx_merge; /* merged rx bufs in use */ + + pthread_t tx_tid; + pthread_mutex_t tx_mtx; + pthread_cond_t tx_cond; + int tx_in_progress; + + void (*pci_vtnet_rx)(struct pci_vtnet_softc *sc); + void (*pci_vtnet_tx)(struct pci_vtnet_softc *sc, struct iovec *iov, + int iovcnt, int len); +}; + +static void pci_vtnet_reset(void *); +/* static void pci_vtnet_notify(void *, struct vqueue_info *); */ +static int pci_vtnet_cfgread(void *, int, int, uint32_t *); +static int pci_vtnet_cfgwrite(void *, int, int, uint32_t); +static void pci_vtnet_neg_features(void *, uint64_t); + +static struct virtio_consts vtnet_vi_consts = { + "vtnet", /* our name */ + VTNET_MAXQ - 1, /* we currently support 2 virtqueues */ + sizeof(struct virtio_net_config), /* config reg size */ + pci_vtnet_reset, /* reset */ + NULL, /* device-wide qnotify -- not used */ + pci_vtnet_cfgread, /* read PCI config */ + pci_vtnet_cfgwrite, /* write PCI config */ + pci_vtnet_neg_features, /* apply negotiated features */ + VTNET_S_HOSTCAPS, /* our capabilities */ +}; + +/* + * If the transmit thread is active then stall until it is done. + */ +static void +pci_vtnet_txwait(struct pci_vtnet_softc *sc) +{ + + pthread_mutex_lock(&sc->tx_mtx); + while (sc->tx_in_progress) { + pthread_mutex_unlock(&sc->tx_mtx); + usleep(10000); + pthread_mutex_lock(&sc->tx_mtx); + } + pthread_mutex_unlock(&sc->tx_mtx); +} + +/* + * If the receive thread is active then stall until it is done. + */ +static void +pci_vtnet_rxwait(struct pci_vtnet_softc *sc) +{ + + pthread_mutex_lock(&sc->rx_mtx); + while (sc->rx_in_progress) { + pthread_mutex_unlock(&sc->rx_mtx); + usleep(10000); + pthread_mutex_lock(&sc->rx_mtx); + } + pthread_mutex_unlock(&sc->rx_mtx); +} + +static void +pci_vtnet_reset(void *vsc) +{ + struct pci_vtnet_softc *sc = vsc; + + DPRINTF(("vtnet: device reset requested !\n")); + + sc->resetting = 1; + + /* + * Wait for the transmit and receive threads to finish their + * processing. 
+ */ + pci_vtnet_txwait(sc); + pci_vtnet_rxwait(sc); + + sc->vsc_rx_ready = 0; + sc->rx_merge = 1; + sc->rx_vhdrlen = sizeof(struct virtio_net_rxhdr); + + /* now reset rings, MSI-X vectors, and negotiated capabilities */ + vi_reset_dev(&sc->vsc_vs); + + sc->resetting = 0; +} + +/* + * Called to send a buffer chain out to the tap device + */ +static void +pci_vtnet_tap_tx(struct pci_vtnet_softc *sc, struct iovec *iov, int iovcnt, + int len) +{ + static char pad[60]; /* all zero bytes */ + + if (sc->vsc_tapfd == -1) + return; + + /* + * If the length is < 60, pad out to that and add the + * extra zero'd segment to the iov. It is guaranteed that + * there is always an extra iov available by the caller. + */ + if (len < 60) { + iov[iovcnt].iov_base = pad; + iov[iovcnt].iov_len = 60 - len; + iovcnt++; + } + (void) writev(sc->vsc_tapfd, iov, iovcnt); +} + +/* + * Called when there is read activity on the tap file descriptor. + * Each buffer posted by the guest is assumed to be able to contain + * an entire ethernet frame + rx header. + * MP note: the dummybuf is only used for discarding frames, so there + * is no need for it to be per-vtnet or locked. + */ +static uint8_t dummybuf[2048]; + +static __inline struct iovec * +rx_iov_trim(struct iovec *iov, int *niov, int tlen) +{ + struct iovec *riov; + + /* XXX short-cut: assume first segment is >= tlen */ + assert(iov[0].iov_len >= tlen); + + iov[0].iov_len -= tlen; + if (iov[0].iov_len == 0) { + assert(*niov > 1); + *niov -= 1; + riov = &iov[1]; + } else { + iov[0].iov_base = (void *)((uintptr_t)iov[0].iov_base + tlen); + riov = &iov[0]; + } + + return (riov); +} + +static void +pci_vtnet_tap_rx(struct pci_vtnet_softc *sc) +{ + struct iovec iov[VTNET_MAXSEGS], *riov; + struct vqueue_info *vq; + void *vrx; + int len, n; + uint16_t idx; + + /* + * Should never be called without a valid tap fd + */ + assert(sc->vsc_tapfd != -1); + + /* + * But, will be called when the rx ring hasn't yet + * been set up or the guest is resetting the device. + */ + if (!sc->vsc_rx_ready || sc->resetting) { + /* + * Drop the packet and try later. + */ + (void) read(sc->vsc_tapfd, dummybuf, sizeof(dummybuf)); + return; + } + + /* + * Check for available rx buffers + */ + vq = &sc->vsc_queues[VTNET_RXQ]; + if (!vq_has_descs(vq)) { + /* + * Drop the packet and try later. Interrupt on + * empty, if that's negotiated. + */ + (void) read(sc->vsc_tapfd, dummybuf, sizeof(dummybuf)); + vq_endchains(vq, 1); + return; + } + + do { + /* + * Get descriptor chain. + */ + n = vq_getchain(vq, &idx, iov, VTNET_MAXSEGS, NULL); + assert(n >= 1 && n <= VTNET_MAXSEGS); + + /* + * Get a pointer to the rx header, and use the + * data immediately following it for the packet buffer. + */ + vrx = iov[0].iov_base; + riov = rx_iov_trim(iov, &n, sc->rx_vhdrlen); + + len = readv(sc->vsc_tapfd, riov, n); + + if (len < 0 && errno == EWOULDBLOCK) { + /* + * No more packets, but still some avail ring + * entries. Interrupt if needed/appropriate. + */ + vq_retchain(vq); + vq_endchains(vq, 0); + return; + } + + /* + * The only valid field in the rx packet header is the + * number of buffers if merged rx bufs were negotiated. + */ + memset(vrx, 0, sc->rx_vhdrlen); + + if (sc->rx_merge) { + struct virtio_net_rxhdr *vrxh; + + vrxh = vrx; + vrxh->vrh_bufs = 1; + } + + /* + * Release this chain and handle more chains. + */ + vq_relchain(vq, idx, len + sc->rx_vhdrlen); + } while (vq_has_descs(vq)); + + /* Interrupt if needed, including for NOTIFY_ON_EMPTY. 
+    vq_endchains(vq, 1);
+}
+
+static __inline int
+pci_vtnet_netmap_writev(struct nm_desc *nmd, struct iovec *iov, int iovcnt)
+{
+    int r, i;
+    int len = 0;
+
+    for (r = nmd->cur_tx_ring; ; ) {
+        struct netmap_ring *ring = NETMAP_TXRING(nmd->nifp, r);
+        uint32_t cur, idx;
+        char *buf;
+
+        if (nm_ring_empty(ring)) {
+            r++;
+            if (r > nmd->last_tx_ring)
+                r = nmd->first_tx_ring;
+            if (r == nmd->cur_tx_ring)
+                break;
+            continue;
+        }
+        cur = ring->cur;
+        idx = ring->slot[cur].buf_idx;
+        buf = NETMAP_BUF(ring, idx);
+
+        for (i = 0; i < iovcnt; i++) {
+            if (len + iov[i].iov_len > 2048)
+                break;
+            memcpy(&buf[len], iov[i].iov_base, iov[i].iov_len);
+            len += iov[i].iov_len;
+        }
+        ring->slot[cur].len = len;
+        ring->head = ring->cur = nm_ring_next(ring, cur);
+        nmd->cur_tx_ring = r;
+        ioctl(nmd->fd, NIOCTXSYNC, NULL);
+        break;
+    }
+
+    return (len);
+}
+
+static __inline int
+pci_vtnet_netmap_readv(struct nm_desc *nmd, struct iovec *iov, int iovcnt)
+{
+    int len = 0;
+    int i = 0;
+    int r;
+
+    for (r = nmd->cur_rx_ring; ; ) {
+        struct netmap_ring *ring = NETMAP_RXRING(nmd->nifp, r);
+        uint32_t cur, idx;
+        char *buf;
+        size_t left;
+
+        if (nm_ring_empty(ring)) {
+            r++;
+            if (r > nmd->last_rx_ring)
+                r = nmd->first_rx_ring;
+            if (r == nmd->cur_rx_ring)
+                break;
+            continue;
+        }
+        cur = ring->cur;
+        idx = ring->slot[cur].buf_idx;
+        buf = NETMAP_BUF(ring, idx);
+        left = ring->slot[cur].len;
+
+        for (i = 0; i < iovcnt && left > 0; i++) {
+            if (iov[i].iov_len > left)
+                iov[i].iov_len = left;
+            memcpy(iov[i].iov_base, &buf[len], iov[i].iov_len);
+            len += iov[i].iov_len;
+            left -= iov[i].iov_len;
+        }
+        ring->head = ring->cur = nm_ring_next(ring, cur);
+        nmd->cur_rx_ring = r;
+        ioctl(nmd->fd, NIOCRXSYNC, NULL);
+        break;
+    }
+    for (; i < iovcnt; i++)
+        iov[i].iov_len = 0;
+
+    return (len);
+}
+
+/*
+ * Called to send a buffer chain out to the vale port
+ */
+static void
+pci_vtnet_netmap_tx(struct pci_vtnet_softc *sc, struct iovec *iov, int iovcnt,
+    int len)
+{
+    static char pad[60]; /* all zero bytes */
+
+    if (sc->vsc_nmd == NULL)
+        return;
+
+    /*
+     * If the length is < 60, pad out to that and add the
+     * extra zero'd segment to the iov. It is guaranteed that
+     * there is always an extra iov available by the caller.
+     */
+    if (len < 60) {
+        iov[iovcnt].iov_base = pad;
+        iov[iovcnt].iov_len = 60 - len;
+        iovcnt++;
+    }
+    (void) pci_vtnet_netmap_writev(sc->vsc_nmd, iov, iovcnt);
+}
+
+static void
+pci_vtnet_netmap_rx(struct pci_vtnet_softc *sc)
+{
+    struct iovec iov[VTNET_MAXSEGS], *riov;
+    struct vqueue_info *vq;
+    void *vrx;
+    int len, n;
+    uint16_t idx;
+
+    /*
+     * Should never be called without a valid netmap descriptor
+     */
+    assert(sc->vsc_nmd != NULL);
+
+    /*
+     * But, will be called when the rx ring hasn't yet
+     * been set up or the guest is resetting the device.
+     */
+    if (!sc->vsc_rx_ready || sc->resetting) {
+        /*
+         * Drop the packet and try later.
+         */
+        (void) nm_nextpkt(sc->vsc_nmd, (void *)dummybuf);
+        return;
+    }
+
+    /*
+     * Check for available rx buffers
+     */
+    vq = &sc->vsc_queues[VTNET_RXQ];
+    if (!vq_has_descs(vq)) {
+        /*
+         * Drop the packet and try later. Interrupt on
+         * empty, if that's negotiated.
+         */
+        (void) nm_nextpkt(sc->vsc_nmd, (void *)dummybuf);
+        vq_endchains(vq, 1);
+        return;
+    }
+
+    do {
+        /*
+         * Get descriptor chain.
+         */
+        n = vq_getchain(vq, &idx, iov, VTNET_MAXSEGS, NULL);
+        assert(n >= 1 && n <= VTNET_MAXSEGS);
+
+        /*
+         * Get a pointer to the rx header, and use the
+         * data immediately following it for the packet buffer.
+         */
+        vrx = iov[0].iov_base;
+        riov = rx_iov_trim(iov, &n, sc->rx_vhdrlen);
+
+        len = pci_vtnet_netmap_readv(sc->vsc_nmd, riov, n);
+
+        if (len == 0) {
+            /*
+             * No more packets, but still some avail ring
+             * entries. Interrupt if needed/appropriate.
+             */
+            vq_retchain(vq);
+            vq_endchains(vq, 0);
+            return;
+        }
+
+        /*
+         * The only valid field in the rx packet header is the
+         * number of buffers if merged rx bufs were negotiated.
+         */
+        memset(vrx, 0, sc->rx_vhdrlen);
+
+        if (sc->rx_merge) {
+            struct virtio_net_rxhdr *vrxh;
+
+            vrxh = vrx;
+            vrxh->vrh_bufs = 1;
+        }
+
+        /*
+         * Release this chain and handle more chains.
+         */
+        vq_relchain(vq, idx, len + sc->rx_vhdrlen);
+    } while (vq_has_descs(vq));
+
+    /* Interrupt if needed, including for NOTIFY_ON_EMPTY. */
+    vq_endchains(vq, 1);
+}
+
+static void
+pci_vtnet_rx_callback(int fd, enum ev_type type, void *param)
+{
+    struct pci_vtnet_softc *sc = param;
+
+    pthread_mutex_lock(&sc->rx_mtx);
+    sc->rx_in_progress = 1;
+    sc->pci_vtnet_rx(sc);
+    sc->rx_in_progress = 0;
+    pthread_mutex_unlock(&sc->rx_mtx);
+
+}
+
+static void
+pci_vtnet_ping_rxq(void *vsc, struct vqueue_info *vq)
+{
+    struct pci_vtnet_softc *sc = vsc;
+
+    /*
+     * A qnotify means that the rx process can now begin
+     */
+    if (sc->vsc_rx_ready == 0) {
+        sc->vsc_rx_ready = 1;
+        vq->vq_used->vu_flags |= VRING_USED_F_NO_NOTIFY;
+    }
+}
+
+static void
+pci_vtnet_proctx(struct pci_vtnet_softc *sc, struct vqueue_info *vq)
+{
+    struct iovec iov[VTNET_MAXSEGS + 1];
+    int i, n;
+    int plen, tlen;
+    uint16_t idx;
+
+    /*
+     * Obtain chain of descriptors. The first one is
+     * really the header descriptor, so we need to sum
+     * up two lengths: packet length and transfer length.
+     */
+    n = vq_getchain(vq, &idx, iov, VTNET_MAXSEGS, NULL);
+    assert(n >= 1 && n <= VTNET_MAXSEGS);
+    plen = 0;
+    tlen = iov[0].iov_len;
+    for (i = 1; i < n; i++) {
+        plen += iov[i].iov_len;
+        tlen += iov[i].iov_len;
+    }
+
+    DPRINTF(("virtio: packet send, %d bytes, %d segs\n\r", plen, n));
+    sc->pci_vtnet_tx(sc, &iov[1], n - 1, plen);
+
+    /* chain is processed, release it and set tlen */
+    vq_relchain(vq, idx, tlen);
+}
+
+static void
+pci_vtnet_ping_txq(void *vsc, struct vqueue_info *vq)
+{
+    struct pci_vtnet_softc *sc = vsc;
+
+    /*
+     * Any ring entries to process?
+     */
+    if (!vq_has_descs(vq))
+        return;
+
+    /* Signal the tx thread for processing */
+    pthread_mutex_lock(&sc->tx_mtx);
+    vq->vq_used->vu_flags |= VRING_USED_F_NO_NOTIFY;
+    if (sc->tx_in_progress == 0)
+        pthread_cond_signal(&sc->tx_cond);
+    pthread_mutex_unlock(&sc->tx_mtx);
+}
+
+/*
+ * Thread which will handle processing of TX desc
+ */
+static void *
+pci_vtnet_tx_thread(void *param)
+{
+    struct pci_vtnet_softc *sc = param;
+    struct vqueue_info *vq;
+    int error;
+
+    vq = &sc->vsc_queues[VTNET_TXQ];
+
+    /*
+     * Let us wait till the tx queue pointers get initialised &
+     * first tx signaled
+     */
+    pthread_mutex_lock(&sc->tx_mtx);
+    error = pthread_cond_wait(&sc->tx_cond, &sc->tx_mtx);
+    assert(error == 0);
+
+    for (;;) {
+        /* note - tx mutex is locked here */
+        while (sc->resetting || !vq_has_descs(vq)) {
+            vq->vq_used->vu_flags &= ~VRING_USED_F_NO_NOTIFY;
+            mb();
+            if (!sc->resetting && vq_has_descs(vq))
+                break;
+
+            sc->tx_in_progress = 0;
+            error = pthread_cond_wait(&sc->tx_cond, &sc->tx_mtx);
+            assert(error == 0);
+        }
+        vq->vq_used->vu_flags |= VRING_USED_F_NO_NOTIFY;
+        sc->tx_in_progress = 1;
+        pthread_mutex_unlock(&sc->tx_mtx);
+
+        do {
+            /*
+             * Run through entries, placing them into
+             * iovecs and sending when an end-of-packet
+             * is found
+             */
+            pci_vtnet_proctx(sc, vq);
+        } while (vq_has_descs(vq));
+
+        /*
+         * Generate an interrupt if needed.
+         */
+        vq_endchains(vq, 1);
+
+        pthread_mutex_lock(&sc->tx_mtx);
+    }
+}
+
+#ifdef notyet
+static void
+pci_vtnet_ping_ctlq(void *vsc, struct vqueue_info *vq)
+{
+
+    DPRINTF(("vtnet: control qnotify!\n\r"));
+}
+#endif
+
+static int
+pci_vtnet_parsemac(char *mac_str, uint8_t *mac_addr)
+{
+    struct ether_addr *ea;
+    char *tmpstr;
+    char zero_addr[ETHER_ADDR_LEN] = { 0, 0, 0, 0, 0, 0 };
+
+    tmpstr = strsep(&mac_str,"=");
+
+    if ((mac_str != NULL) && (!strcmp(tmpstr,"mac"))) {
+        ea = ether_aton(mac_str);
+
+        if (ea == NULL || ETHER_IS_MULTICAST(ea->octet) ||
+            memcmp(ea->octet, zero_addr, ETHER_ADDR_LEN) == 0) {
+            fprintf(stderr, "Invalid MAC %s\n", mac_str);
+            return (EINVAL);
+        } else
+            memcpy(mac_addr, ea->octet, ETHER_ADDR_LEN);
+    }
+
+    return (0);
+}
+
+static void
+pci_vtnet_tap_setup(struct pci_vtnet_softc *sc, char *devname)
+{
+    char tbuf[80];
+
+    strcpy(tbuf, "/dev/");
+    strlcat(tbuf, devname, sizeof(tbuf));
+
+    sc->pci_vtnet_rx = pci_vtnet_tap_rx;
+    sc->pci_vtnet_tx = pci_vtnet_tap_tx;
+
+    sc->vsc_tapfd = open(tbuf, O_RDWR);
+    if (sc->vsc_tapfd == -1) {
+        WPRINTF(("open of tap device %s failed\n", tbuf));
+        return;
+    }
+
+    /*
+     * Set non-blocking and register for read
+     * notifications with the event loop
+     */
+    int opt = 1;
+    if (ioctl(sc->vsc_tapfd, FIONBIO, &opt) < 0) {
+        WPRINTF(("tap device O_NONBLOCK failed\n"));
+        close(sc->vsc_tapfd);
+        sc->vsc_tapfd = -1;
+    }
+
+    sc->vsc_mevp = mevent_add(sc->vsc_tapfd,
+        EVF_READ,
+        pci_vtnet_rx_callback,
+        sc);
+    if (sc->vsc_mevp == NULL) {
+        WPRINTF(("Could not register event\n"));
+        close(sc->vsc_tapfd);
+        sc->vsc_tapfd = -1;
+    }
+}
+
+static void
+pci_vtnet_netmap_setup(struct pci_vtnet_softc *sc, char *ifname)
+{
+    sc->pci_vtnet_rx = pci_vtnet_netmap_rx;
+    sc->pci_vtnet_tx = pci_vtnet_netmap_tx;
+
+    sc->vsc_nmd = nm_open(ifname, NULL, 0, 0);
+    if (sc->vsc_nmd == NULL) {
+        WPRINTF(("open of netmap device %s failed\n", ifname));
+        return;
+    }
+
+    sc->vsc_mevp = mevent_add(sc->vsc_nmd->fd,
+        EVF_READ,
+        pci_vtnet_rx_callback,
+        sc);
+    if (sc->vsc_mevp == NULL) {
+        WPRINTF(("Could not register event\n"));
+        nm_close(sc->vsc_nmd);
+        sc->vsc_nmd = NULL;
+    }
+}
+
+static int
+pci_vtnet_init(struct vmctx *ctx, struct pci_devinst *pi, char *opts)
+{
+    MD5_CTX mdctx;
+    unsigned char digest[16];
+    char nstr[80];
+    char tname[MAXCOMLEN + 1];
+    struct pci_vtnet_softc *sc;
+    char *devname;
+    char *vtopts;
+    int mac_provided;
+
+    sc = calloc(1, sizeof(struct pci_vtnet_softc));
+
+    pthread_mutex_init(&sc->vsc_mtx, NULL);
+
+    vi_softc_linkup(&sc->vsc_vs, &vtnet_vi_consts, sc, pi, sc->vsc_queues);
+    sc->vsc_vs.vs_mtx = &sc->vsc_mtx;
+
+    sc->vsc_queues[VTNET_RXQ].vq_qsize = VTNET_RINGSZ;
+    sc->vsc_queues[VTNET_RXQ].vq_notify = pci_vtnet_ping_rxq;
+    sc->vsc_queues[VTNET_TXQ].vq_qsize = VTNET_RINGSZ;
+    sc->vsc_queues[VTNET_TXQ].vq_notify = pci_vtnet_ping_txq;
+#ifdef notyet
+    sc->vsc_queues[VTNET_CTLQ].vq_qsize = VTNET_RINGSZ;
+    sc->vsc_queues[VTNET_CTLQ].vq_notify = pci_vtnet_ping_ctlq;
+#endif
+
+    /*
+     * Attempt to open the tap device and read the MAC address
+     * if specified
+     */
+    mac_provided = 0;
+    sc->vsc_tapfd = -1;
+    sc->vsc_nmd = NULL;
+    if (opts != NULL) {
+        int err;
+
+        devname = vtopts = strdup(opts);
+        (void) strsep(&vtopts, ",");
+
+        if (vtopts != NULL) {
+            err = pci_vtnet_parsemac(vtopts, sc->vsc_config.mac);
+            if (err != 0) {
+                free(devname);
+                return (err);
+            }
+            mac_provided = 1;
+        }
+
+        if (strncmp(devname, "vale", 4) == 0)
+            pci_vtnet_netmap_setup(sc, devname);
+        if (strncmp(devname, "tap", 3) == 0 ||
+            strncmp(devname, "vmnet", 5) == 0)
+            pci_vtnet_tap_setup(sc, devname);
+
+        free(devname);
+    }
+
+    /*
+     * The default MAC address is the standard NetApp OUI of 00-a0-98,
+     * followed by an MD5 of the PCI slot/func number and dev name
+     */
+    if (!mac_provided) {
+        snprintf(nstr, sizeof(nstr), "%d-%d-%s", pi->pi_slot,
+            pi->pi_func, vmname);
+
+        MD5Init(&mdctx);
+        MD5Update(&mdctx, nstr, strlen(nstr));
+        MD5Final(digest, &mdctx);
+
+        sc->vsc_config.mac[0] = 0x00;
+        sc->vsc_config.mac[1] = 0xa0;
+        sc->vsc_config.mac[2] = 0x98;
+        sc->vsc_config.mac[3] = digest[0];
+        sc->vsc_config.mac[4] = digest[1];
+        sc->vsc_config.mac[5] = digest[2];
+    }
+
+    /* initialize config space */
+    pci_set_cfgdata16(pi, PCIR_DEVICE, VIRTIO_DEV_NET);
+    pci_set_cfgdata16(pi, PCIR_VENDOR, VIRTIO_VENDOR);
+    pci_set_cfgdata8(pi, PCIR_CLASS, PCIC_NETWORK);
+    pci_set_cfgdata16(pi, PCIR_SUBDEV_0, VIRTIO_TYPE_NET);
+    pci_set_cfgdata16(pi, PCIR_SUBVEND_0, VIRTIO_VENDOR);
+
+    /* Link is up if we managed to open tap device or vale port. */
+    sc->vsc_config.status = (opts == NULL || sc->vsc_tapfd >= 0 ||
+        sc->vsc_nmd != NULL);
+
+    /* use BAR 1 to map MSI-X table and PBA, if we're using MSI-X */
+    if (vi_intr_init(&sc->vsc_vs, 1, fbsdrun_virtio_msix()))
+        return (1);
+
+    /* use BAR 0 to map config regs in IO space */
+    vi_set_io_bar(&sc->vsc_vs, 0);
+
+    sc->resetting = 0;
+
+    sc->rx_merge = 1;
+    sc->rx_vhdrlen = sizeof(struct virtio_net_rxhdr);
+    sc->rx_in_progress = 0;
+    pthread_mutex_init(&sc->rx_mtx, NULL);
+
+    /*
+     * Initialize tx semaphore & spawn TX processing thread.
+     * As of now, only one thread for TX desc processing is
+     * spawned.
+     */
+    sc->tx_in_progress = 0;
+    pthread_mutex_init(&sc->tx_mtx, NULL);
+    pthread_cond_init(&sc->tx_cond, NULL);
+    pthread_create(&sc->tx_tid, NULL, pci_vtnet_tx_thread, (void *)sc);
+    snprintf(tname, sizeof(tname), "vtnet-%d:%d tx", pi->pi_slot,
+        pi->pi_func);
+    pthread_set_name_np(sc->tx_tid, tname);
+
+    return (0);
+}
+
+static int
+pci_vtnet_cfgwrite(void *vsc, int offset, int size, uint32_t value)
+{
+    struct pci_vtnet_softc *sc = vsc;
+    void *ptr;
+
+    if (offset < 6) {
+        assert(offset + size <= 6);
+        /*
+         * The driver is allowed to change the MAC address
+         */
+        ptr = &sc->vsc_config.mac[offset];
+        memcpy(ptr, &value, size);
+    } else {
+        /* silently ignore other writes */
+        DPRINTF(("vtnet: write to readonly reg %d\n\r", offset));
+    }
+
+    return (0);
+}
+
+static int
+pci_vtnet_cfgread(void *vsc, int offset, int size, uint32_t *retval)
+{
+    struct pci_vtnet_softc *sc = vsc;
+    void *ptr;
+
+    ptr = (uint8_t *)&sc->vsc_config + offset;
+    memcpy(retval, ptr, size);
+    return (0);
+}
+
+static void
+pci_vtnet_neg_features(void *vsc, uint64_t negotiated_features)
+{
+    struct pci_vtnet_softc *sc = vsc;
+
+    sc->vsc_features = negotiated_features;
+
+    if (!(sc->vsc_features & VIRTIO_NET_F_MRG_RXBUF)) {
+        sc->rx_merge = 0;
+        /* non-merge rx header is 2 bytes shorter */
+        sc->rx_vhdrlen -= 2;
+    }
+}
+
+struct pci_devemu pci_de_vnet = {
+    .pe_emu =      "virtio-net",
+    .pe_init =     pci_vtnet_init,
+    .pe_barwrite = vi_pci_write,
+    .pe_barread =  vi_pci_read
+};
+PCI_EMUL_SET(pci_de_vnet);

[-- Attachment #3 --]

TAP:
375.748718 main_thread [2325] 67.522 Kpps (67.859 Kpkts 32.572 Mbps in 1004986 usec) 253.21 avg_batch 99999 min_space
376.751736 main_thread [2325] 66.484 Kpps (66.685 Kpkts 32.009 Mbps in 1003017 usec) 248.82 avg_batch 99999 min_space
377.761533 main_thread [2325] 65.043 Kpps (65.680 Kpkts 31.526 Mbps in 1009797 usec) 252.62 avg_batch 99999 min_space
378.766738 main_thread [2325] 65.329 Kpps (65.669 Kpkts 31.521 Mbps in 1005206 usec) 257.53 avg_batch 99999 min_space
379.780398 main_thread [2325] 68.006 Kpps (68.935 Kpkts 33.089 Mbps in 1013660 usec) 253.44 avg_batch 99999 min_space
380.785733 main_thread [2325] 64.262 Kpps (64.605 Kpkts 31.010 Mbps in 1005335 usec) 251.38 avg_batch 99999 min_space
381.792360 main_thread [2325] 67.290 Kpps (67.736 Kpkts 32.513 Mbps in

NETMAP with Vale VM1 (TX):
052.122745 main_thread [2325] 121.617 Kpps (121.735 Kpkts 58.433 Mbps in 1000973 usec) 209.89 avg_batch 0 min_space
053.234788 main_thread [2325] 144.178 Kpps (160.332 Kpkts 76.959 Mbps in 1112043 usec) 214.63 avg_batch 99999 min_space
054.239751 main_thread [2325] 139.072 Kpps (139.762 Kpkts 67.086 Mbps in 1004962 usec) 215.68 avg_batch 99999 min_space
055.249794 main_thread [2325] 152.888 Kpps (154.424 Kpkts 74.124 Mbps in 1010044 usec) 210.67 avg_batch 99999 min_space
056.260799 main_thread [2325] 142.566 Kpps (144.135 Kpkts 69.185 Mbps in 1011005 usec) 214.81 avg_batch 99999 min_space
057.265225 main_thread [2325] 143.575 Kpps (144.210 Kpkts 69.221 Mbps in 1004426 usec) 215.56 avg_batch 99999 min_space
058.273468 main_thread [2325] 154.912 Kpps (156.189 Kpkts 74.971 Mbps in 1008242 usec) 209.93 avg_batch 99999 min_space
059.278795 main_thread [2325] 141.722 Kpps (142.477 Kpkts 68.389 Mbps in 1005328 usec) 217.85 avg_batch 99999 min_space
060.340699 main_thread [2325] 145.871 Kpps (154.901 Kpkts 74.352 Mbps in 1061904 usec) 215.14 avg_batch 99999 min_space
061.345748 main_thread [2325] 144.221 Kpps (144.949 Kpkts 69.576 Mbps in 1005048 usec) 209.16 avg_batch 99999 min_space

NETMAP with Vale VM2 (RX):
054.968574 main_thread [2325] 49.952 Kpps (54.362 Kpkts 26.094 Mbps in 1088290 usec) 74.67 avg_batch 1 min_space
056.017426 main_thread [2325] 64.290 Kpps (67.431 Kpkts 32.367 Mbps in 1048852 usec) 132.22 avg_batch 1 min_space
057.021036 main_thread [2325] 93.405 Kpps (93.742 Kpkts 44.996 Mbps in 1003610 usec) 68.63 avg_batch 1 min_space
058.032650 main_thread [2325] 81.058 Kpps (81.999 Kpkts 39.360 Mbps in 1011614 usec) 80.39 avg_batch 1 min_space
059.059176 main_thread [2325] 85.816 Kpps (88.092 Kpkts 42.284 Mbps in 1026526 usec) 73.84 avg_batch 1 min_space
060.078563 main_thread [2325] 66.959 Kpps (68.096 Kpkts 32.686 Mbps in 1016985 usec) 512.00 avg_batch 1 min_space
061.088738 main_thread [2325] 79.911 Kpps (80.916 Kpkts 38.840 Mbps in 1012576 usec) 137.15 avg_batch 1 min_space
062.104505 main_thread [2325] 90.814 Kpps (92.007 Kpkts 44.163 Mbps in 1013138 usec) 73.20 avg_batch 1 min_space
063.115433 main_thread [2325] 86.233 Kpps (87.402 Kpkts 41.953 Mbps in 1013558 usec) 94.28 avg_batch 1 min_space
064.134497 main_thread [2325] 86.351 Kpps (87.997 Kpkts 42.239 Mbps in 1019062 usec) 413.13 avg_batch 1 min_space
065.148983 main_thread [2325] 69.483 Kpps (70.390 Kpkts 33.787 Mbps in 1013057 usec) 186.22 avg_batch 1 min_space
066.172555 main_thread [2325] 91.347 Kpps (93.573 Kpkts 44.915 Mbps in 1024365 usec) 125.60 avg_batch 1 min_space
067.190699 main_thread [2325] 79.280 Kpps (80.769 Kpkts 38.769 Mbps in 1018782 usec) 114.57 avg_batch 1 min_space
068.209058 main_thread [2325] 107.100 Kpps (109.066 Kpkts 52.352 Mbps in 1018358 usec) 33.24 avg_batch 1 min_space

PTNET (VM1 TX):
920.707900 main_thread [2325] 7.216 Mpps (7.238 Mpkts 4.169 Gbps in 1003043 usec) 511.50 avg_batch 99999 min_space
921.709890 main_thread [2325] 7.114 Mpps (7.128 Mpkts 4.106 Gbps in 1001989 usec) 511.50 avg_batch 99999 min_space
922.712865 main_thread [2325] 7.277 Mpps (7.299 Mpkts 4.204 Gbps in 1002975 usec) 511.50 avg_batch 99999 min_space
923.715783 main_thread [2325] 5.980 Mpps (5.997 Mpkts 3.454 Gbps in 1002918 usec) 511.50 avg_batch 99999 min_space
924.717926 main_thread [2325] 7.257 Mpps (7.273 Mpkts 4.189 Gbps in 1002143 usec) 511.50 avg_batch 99999 min_space
926.738572 main_thread [2325] 7.365 Mpps (14.883 Mpkts 8.572 Gbps in 2020646 usec) 511.50 avg_batch 99999 min_space
927.739321 main_thread [2325] 6.196 Mpps (6.200 Mpkts 3.571 Gbps in 1000749 usec) 511.50 avg_batch 99999 min_space

PTNET (VM2 RX):
927.761042 main_thread [2325] 6.163 Mpps (6.196 Mpkts 3.569 Gbps in 1005388 usec) 511.46 avg_batch 1 min_space
928.763410 main_thread [2325] 7.249 Mpps (7.266 Mpkts 4.185 Gbps in 1002367 usec) 511.50 avg_batch 1 min_space
930.441251 main_thread [2325] 7.455 Mpps (12.508 Mpkts 7.205 Gbps in 1677842 usec) 511.50 avg_batch 1 min_space
932.298463 main_thread [2325] 7.151 Mpps (13.280 Mpkts 7.649 Gbps in 1857212 usec) 511.50 avg_batch 1 min_space
933.299217 main_thread [2325] 6.957 Mpps (6.963 Mpkts 4.010 Gbps in 1000754 usec) 511.50 avg_batch 1 min_space
934.302244 main_thread [2325] 7.496 Mpps (7.519 Mpkts 4.331 Gbps in 1003027 usec) 511.50 avg

[-- Attachment #4 --]

957.285624 [ 442] generic_netmap_register   Generic adapter 0xfffff80012a5cc00 goes on
957.287257 [ 487] generic_netmap_register   RX ring 0 of generic adapter 0xfffff80012a5cc00 goes on
957.288979 [ 494] generic_netmap_register   TX ring 0 of generic adapter 0xfffff80012a5cc00 goes on
957.584285 main [2770] mapped 334980KB at 0x801600000
Sending on netmap:vtnet1: 1 queues, 1 threads and 1 cpus.
10.0.0.1 -> 10.1.0.1 (00:00:00:00:00:00 -> ff:ff:ff:ff:ff:ff)
957.671132 main [2867] Sending 512 packets every 0.000000000 s
957.684112 start_threads [2235] Wait 2 secs for phy reset
959.737703 start_threads [2237] Ready...
959.738424 main [2880] failed to install ^C handler: Invalid argument
959.739435 sender_body [1436] start, fd 3 main_fd 3

Fatal trap 18: integer divide fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer     = 0x20:0xffffffff809056b5
stack pointer           = 0x28:0xfffffe01bfd77630
frame pointer           = 0x28:0xfffffe01bfd77680
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 2480 (pkt-gen)
trap number             = 18
panic: integer divide fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff80b088c7 at kdb_backtrace+0x67
#1 0xffffffff80abdc32 at vpanic+0x182
#2 0xffffffff80abdaa3 at panic+0x43
#3 0xffffffff80f84d31 at trap_fatal+0x351
#4 0xffffffff80f849c8 at trap+0x768
#5 0xffffffff80f677d1 at calltrap+0x8
#6 0xffffffff826296b8 at nm_os_generic_xmit_frame+0x48
#7 0xffffffff826233bf at generic_netmap_txsync+0x29f
#8 0xffffffff8261d4ff at netmap_poll+0x50f
#9 0xffffffff8262a672 at freebsd_netmap_poll+0x32
#10 0xffffffff8096ad90 at devfs_poll_f+0x70
#11 0xffffffff80b27c20 at kern_poll+0x650
#12 0xffffffff80b275c1 at sys_poll+0x61
#13 0xffffffff80f8568e at amd64_syscall+0x4ce
#14 0xffffffff80f67abb at Xfast_syscall+0xfb
Uptime: 20m40s

786.243355 [ 442] generic_netmap_register   Generic adapter 0xfffff80012ae2000 goes on
786.244891 [ 487] generic_netmap_register   RX ring 0 of generic adapter 0xfffff80012ae2000 goes on
786.246508 [ 494] generic_netmap_register   TX ring 0 of generic adapter 0xfffff80012ae2000 goes on
786.270206 main [2770] mapped 334980KB at 0x801600000
Sending on netmap:vtnet1: 1 queues, 1 threads and 1 cpus.
10.0.0.1 -> 10.1.0.1 (00:00:00:00:00:00 -> ff:ff:ff:ff:ff:ff)
786.345136 main [2867] Sending 512 packets every 0.000000000 s
786.363395 start_threads [2235] Wait 2 secs for phy reset
788.426614 start_threads [2237] Ready...
788.427454 sender_body [1436] start, fd 3 main_fd 3

Fatal trap 18: integer divide fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer     = 0x20:0xffffffff809056b5
stack pointer           = 0x28:0xfffffe01bfd59630
frame pointer           = 0x28:0xfffffe01bfd59680
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 1142 (pkt-gen)
trap number             = 18
panic: integer divide fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff80b088c7 at kdb_backtrace+0x67
#1 0xffffffff80abdc32 at vpanic+0x182
#2 0xffffffff80abdaa3 at panic+0x43
#3 0xffffffff80f84d31 at trap_fatal+0x351
#4 0xffffffff80f849c8 at trap+0x768
#5 0xffffffff80f677d1 at calltrap+0x8
#6 0xffffffff826296b8 at nm_os_generic_xmit_frame+0x48
#7 0xffffffff826233bf at generic_netmap_txsync+0x29f
#8 0xffffffff8261d4ff at netmap_poll+0x50f
#9 0xffffffff8262a672 at freebsd_netmap_poll+0x32
#10 0xffffffff8096ad90 at devfs_poll_f+0x70
#11 0xffffffff80b27c20 at kern_poll+0x650
#12 0xffffffff80b275c1 at sys_poll+0x61
#13 0xffffffff80f8568e at amd64_syscall+0x4ce
#14 0xffffffff80f67abb at Xfast_syscall+0xfb
Uptime: 5m47s
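
For reference, the pkt-gen runs behind the numbers above were along these lines (a rough sketch rather than our exact command lines; the interface name matches the "Sending on netmap:vtnet1" lines in the logs, and -l 60 should match the packet rates shown):

    # VM1, transmit side (vtnet1 is the guest interface attached to the backend under test)
    $ pkt-gen -i netmap:vtnet1 -f tx -l 60

    # VM2, receive side
    $ pkt-gen -i netmap:vtnet1 -f rx

The same invocation should apply to all three setups; only the backend behind the guest interface (tap, vale, or ptnetmap) changes.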
