From owner-svn-src-head@FreeBSD.ORG Mon Oct 20 14:42:44 2014
Message-Id: <201410201442.s9KEggqt096167@svn.freebsd.org>
From: Bryan Venteicher
Date: Mon, 20 Oct 2014 14:42:42 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: svn commit: r273331 - in head: sbin/ifconfig share/man/man4 sys/conf sys/modules sys/modules/if_vxlan sys/net sys/sys
List-Id: SVN commit messages for the src tree for head/-current

Author: bryanv
Date: Mon Oct 20 14:42:42 2014
New Revision: 273331
URL: https://svnweb.freebsd.org/changeset/base/273331

Log:
  Add vxlan interface

  vxlan creates a virtual LAN by encapsulating the inner
  Ethernet frame in a UDP packet. This implementation is based on RFC7348.

  Currently, the IPv6 support is not fully compliant with the specification:
  we should be able to receive UDPv6 packets with a zero checksum, but we
  need to support RFC6935 first. Patches for this should come soon.

  Encapsulation protocols such as vxlan emphasize the need for the FreeBSD
  network stack to support batching, GRO, and GSO. Each frame has to make
  two trips through the network stack, and each frame will be at most MTU
  sized. Performance suffers accordingly.

  Some latest generation NICs have begun to support vxlan HW offloads that
  we should also take advantage of. VIMAGE support should also be added soon.

  Differential Revision: https://reviews.freebsd.org/D384
  Reviewed by: gnn
  Relnotes: yes

Added:
  head/sbin/ifconfig/ifvxlan.c (contents, props changed)
  head/share/man/man4/vxlan.4 (contents, props changed)
  head/sys/modules/if_vxlan/
  head/sys/modules/if_vxlan/Makefile (contents, props changed)
  head/sys/net/if_vxlan.c (contents, props changed)
  head/sys/net/if_vxlan.h (contents, props changed)
Modified:
  head/sbin/ifconfig/Makefile
  head/sbin/ifconfig/ifconfig.8
  head/share/man/man4/Makefile
  head/sys/conf/NOTES
  head/sys/conf/files
  head/sys/modules/Makefile
  head/sys/sys/priv.h

Modified: head/sbin/ifconfig/Makefile
==============================================================================
--- head/sbin/ifconfig/Makefile Mon Oct 20 14:25:23 2014 (r273330)
+++ head/sbin/ifconfig/Makefile Mon Oct 20 14:42:42 2014 (r273331)
@@ -30,6 +30,7 @@ SRCS+= ifmac.c # MAC support
 SRCS+= ifmedia.c # SIOC[GS]IFMEDIA support
 SRCS+= iffib.c # non-default FIB support
 SRCS+= ifvlan.c # SIOC[GS]ETVLAN support
+SRCS+= ifvxlan.c # VXLAN support
 SRCS+= ifgre.c # GRE keys etc
 SRCS+= ifgif.c # GIF reversed header workaround

Modified: head/sbin/ifconfig/ifconfig.8
==============================================================================
--- head/sbin/ifconfig/ifconfig.8 Mon Oct 20 14:25:23 2014 (r273330)
+++
head/sbin/ifconfig/ifconfig.8 Mon Oct 20 14:42:42 2014 (r273331) @@ -28,7 +28,7 @@ .\" From: @(#)ifconfig.8 8.3 (Berkeley) 1/5/94 .\" $FreeBSD$ .\" -.Dd October 1, 2014 +.Dd October 20, 2014 .Dt IFCONFIG 8 .Os .Sh NAME @@ -2541,6 +2541,76 @@ argument is useless and hence deprecated .El .Pp The following parameters are used to configure +.Xr vxlan 4 +interfaces. +.Bl -tag -width indent +.It Cm vni Ar identifier +This value is a 24-bit VXLAN Network Identifier (VNI) that identifies the +virtual network segment membership of the interface. +.It Cm local Ar address +The source address used in the encapsulating IPv4/IPv6 header. +The address should already be assigned to an existing interface. +When the interface is configured in unicast mode, the listening socket +is bound to this address. +.It Cm remote Ar address +The interface can be configured in a unicast, or point-to-point, mode +to create a tunnel between two hosts. +This is the IP address of the remote end of the tunnel. +.It Cm group Ar address +The interface can be configured in a multicast mode +to create a virtual network of hosts. +This is the IP multicast group address the interface will join. +.It Cm localport Ar port +The port number the interface will listen on. +The default port number is 4789. +.It Cm remoteport Ar port +The destination port number used in the encapsulating IPv4/IPv6 header. +The remote host should be listening on this port. +The default port number is 4789. +Note some other implementations, such as Linux, +do not default to the IANA assigned port, +but instead listen on port 8472. +.It Cm portrange Ar low high +The range of source ports used in the encapsulating IPv4/IPv6 header. +The port selected within the range is based on a hash of the inner frame. +A range is useful to provide entropy within the outer IP header +for more effective load balancing. 
+The default range is between the
+.Xr sysctl 8
+variables
+.Va net.inet.ip.portrange.first
+and
+.Va net.inet.ip.portrange.last .
+.It Cm timeout Ar timeout
+The maximum time, in seconds, before an entry in the forwarding table
+is pruned.
+The default is 1200 seconds (20 minutes).
+.It Cm maxaddr Ar max
+The maximum number of entries in the forwarding table.
+The default is 2000.
+.It Cm vxlandev Ar dev
+When the interface is configured in multicast mode, the
+.Cm dev
+interface is used to transmit IP multicast packets.
+.It Cm ttl Ar ttl
+The TTL used in the encapsulating IPv4/IPv6 header.
+The default is 64.
+.It Cm learn
+The source IP address and inner source Ethernet MAC address of
+received packets are used to dynamically populate the forwarding table.
+When in multicast mode, an entry in the forwarding table allows the
+interface to send the frame directly to the remote host instead of
+broadcasting the frame to the multicast group.
+This is the default.
+.It Fl learn
+The forwarding table is not populated by received packets.
+.It Cm flush
+Delete all dynamically-learned addresses from the forwarding table.
+.It Cm flushall
+Delete all addresses, including static addresses, from the forwarding table.
+.El
+.Pp
+The following parameters are used to configure
 .Xr carp 4
 protocol on an interface:
 .Bl -tag -width indent
@@ -2745,6 +2815,7 @@ tried to alter an interface's configurat
 .Xr pfsync 4 ,
 .Xr polling 4 ,
 .Xr vlan 4 ,
+.Xr vxlan 4 ,
 .Xr devd.conf 5 ,
 .\" .Xr eon 5 ,
 .Xr devd 8 ,

Added: head/sbin/ifconfig/ifvxlan.c
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
+++ head/sbin/ifconfig/ifvxlan.c Mon Oct 20 14:42:42 2014 (r273331)
@@ -0,0 +1,648 @@
+/*-
+ * Copyright (c) 2014, Bryan Venteicher
+ * All rights reserved.
+ * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice unmodified, this list of conditions, and the following + * disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */ + +#include +__FBSDID("$FreeBSD$"); + +#include +#include +#include +#include + +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include + +#include "ifconfig.h" + +static struct ifvxlanparam params = { + .vxlp_vni = VXLAN_VNI_MAX, +}; + +static int +get_val(const char *cp, u_long *valp) +{ + char *endptr; + u_long val; + + errno = 0; + val = strtoul(cp, &endptr, 0); + if (cp[0] == '\0' || endptr[0] != '\0' || errno == ERANGE) + return (-1); + + *valp = val; + return (0); +} + +static int +do_cmd(int sock, u_long op, void *arg, size_t argsize, int set) +{ + struct ifdrv ifd; + + bzero(&ifd, sizeof(ifd)); + + strlcpy(ifd.ifd_name, ifr.ifr_name, sizeof(ifd.ifd_name)); + ifd.ifd_cmd = op; + ifd.ifd_len = argsize; + ifd.ifd_data = arg; + + return (ioctl(sock, set ? SIOCSDRVSPEC : SIOCGDRVSPEC, &ifd)); +} + +static int +vxlan_exists(int sock) +{ + struct ifvxlancfg cfg; + + bzero(&cfg, sizeof(cfg)); + + return (do_cmd(sock, VXLAN_CMD_GET_CONFIG, &cfg, sizeof(cfg), 0) != -1); +} + +static void +vxlan_status(int s) +{ + struct ifvxlancfg cfg; + char src[NI_MAXHOST], dst[NI_MAXHOST]; + char srcport[NI_MAXSERV], dstport[NI_MAXSERV]; + struct sockaddr *lsa, *rsa; + int vni, mc, ipv6; + + bzero(&cfg, sizeof(cfg)); + + if (do_cmd(s, VXLAN_CMD_GET_CONFIG, &cfg, sizeof(cfg), 0) < 0) + return; + + vni = cfg.vxlc_vni; + lsa = &cfg.vxlc_local_sa.sa; + rsa = &cfg.vxlc_remote_sa.sa; + ipv6 = rsa->sa_family == AF_INET6; + + /* Just report nothing if the network identity isn't set yet. 
*/ + if (vni >= VXLAN_VNI_MAX) + return; + + if (getnameinfo(lsa, lsa->sa_len, src, sizeof(src), + srcport, sizeof(srcport), NI_NUMERICHOST | NI_NUMERICSERV) != 0) + src[0] = srcport[0] = '\0'; + if (getnameinfo(rsa, rsa->sa_len, dst, sizeof(dst), + dstport, sizeof(dstport), NI_NUMERICHOST | NI_NUMERICSERV) != 0) + dst[0] = dstport[0] = '\0'; + + if (!ipv6) { + struct sockaddr_in *sin = (struct sockaddr_in *)rsa; + mc = IN_MULTICAST(ntohl(sin->sin_addr.s_addr)); + } else { + struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)rsa; + mc = IN6_IS_ADDR_MULTICAST(&sin6->sin6_addr); + } + + printf("\tvxlan vni %d", vni); + printf(" local %s%s%s:%s", ipv6 ? "[" : "", src, ipv6 ? "]" : "", + srcport); + printf(" %s %s%s%s:%s", mc ? "group" : "remote", ipv6 ? "[" : "", + dst, ipv6 ? "]" : "", dstport); + + if (verbose) { + printf("\n\t\tconfig: "); + printf("%slearning portrange %d-%d ttl %d", + cfg.vxlc_learn ? "" : "no", cfg.vxlc_port_min, + cfg.vxlc_port_max, cfg.vxlc_ttl); + printf("\n\t\tftable: "); + printf("cnt %d max %d timeout %d", + cfg.vxlc_ftable_cnt, cfg.vxlc_ftable_max, + cfg.vxlc_ftable_timeout); + } + + putchar('\n'); +} + +#define _LOCAL_ADDR46 \ + (VXLAN_PARAM_WITH_LOCAL_ADDR4 | VXLAN_PARAM_WITH_LOCAL_ADDR6) +#define _REMOTE_ADDR46 \ + (VXLAN_PARAM_WITH_REMOTE_ADDR4 | VXLAN_PARAM_WITH_REMOTE_ADDR6) + +static void +vxlan_check_params(void) +{ + + if ((params.vxlp_with & _LOCAL_ADDR46) == _LOCAL_ADDR46) + errx(1, "cannot specify both local IPv4 and IPv6 addresses"); + if ((params.vxlp_with & _REMOTE_ADDR46) == _REMOTE_ADDR46) + errx(1, "cannot specify both remote IPv4 and IPv6 addresses"); + if ((params.vxlp_with & VXLAN_PARAM_WITH_LOCAL_ADDR4 && + params.vxlp_with & VXLAN_PARAM_WITH_REMOTE_ADDR6) || + (params.vxlp_with & VXLAN_PARAM_WITH_LOCAL_ADDR6 && + params.vxlp_with & VXLAN_PARAM_WITH_REMOTE_ADDR4)) + errx(1, "cannot mix IPv4 and IPv6 addresses"); +} + +#undef _LOCAL_ADDR46 +#undef _REMOTE_ADDR46 + +static void +vxlan_cb(int s, void *arg) +{ + +} + 
+static void
+vxlan_create(int s, struct ifreq *ifr)
+{
+
+	vxlan_check_params();
+
+	ifr->ifr_data = (caddr_t) &params;
+	if (ioctl(s, SIOCIFCREATE2, ifr) < 0)
+		err(1, "SIOCIFCREATE2");
+}
+
+static
+DECL_CMD_FUNC(setvxlan_vni, arg, d)
+{
+	struct ifvxlancmd cmd;
+	u_long val;
+
+	if (get_val(arg, &val) < 0 || val >= VXLAN_VNI_MAX)
+		errx(1, "invalid network identifier: %s", arg);
+
+	if (!vxlan_exists(s)) {
+		params.vxlp_with |= VXLAN_PARAM_WITH_VNI;
+		params.vxlp_vni = val;
+		return;
+	}
+
+	bzero(&cmd, sizeof(cmd));
+	cmd.vxlcmd_vni = val;
+
+	if (do_cmd(s, VXLAN_CMD_SET_VNI, &cmd, sizeof(cmd), 1) < 0)
+		err(1, "VXLAN_CMD_SET_VNI");
+}
+
+static
+DECL_CMD_FUNC(setvxlan_local, addr, d)
+{
+	struct ifvxlancmd cmd;
+	struct addrinfo *ai;
+	struct sockaddr *sa;
+	int error;
+
+	bzero(&cmd, sizeof(cmd));
+
+	if ((error = getaddrinfo(addr, NULL, NULL, &ai)) != 0)
+		errx(1, "error in parsing local address string: %s",
+		    gai_strerror(error));
+
+	sa = ai->ai_addr;
+
+	switch (ai->ai_family) {
+#ifdef INET
+	case AF_INET: {
+		struct in_addr addr = ((struct sockaddr_in *) sa)->sin_addr;
+
+		if (IN_MULTICAST(ntohl(addr.s_addr)))
+			errx(1, "local address cannot be multicast");
+
+		cmd.vxlcmd_sa.in4.sin_family = AF_INET;
+		cmd.vxlcmd_sa.in4.sin_addr = addr;
+		break;
+	}
+#endif
+#ifdef INET6
+	case AF_INET6: {
+		struct in6_addr *addr = &((struct sockaddr_in6 *)sa)->sin6_addr;
+
+		if (IN6_IS_ADDR_MULTICAST(addr))
+			errx(1, "local address cannot be multicast");
+
+		cmd.vxlcmd_sa.in6.sin6_family = AF_INET6;
+		cmd.vxlcmd_sa.in6.sin6_addr = *addr;
+		break;
+	}
+#endif
+	default:
+		errx(1, "local address %s not supported", addr);
+	}
+
+	freeaddrinfo(ai);
+
+	if (!vxlan_exists(s)) {
+		if (cmd.vxlcmd_sa.sa.sa_family == AF_INET) {
+			params.vxlp_with |= VXLAN_PARAM_WITH_LOCAL_ADDR4;
+			params.vxlp_local_in4 = cmd.vxlcmd_sa.in4.sin_addr;
+		} else {
+			params.vxlp_with |= VXLAN_PARAM_WITH_LOCAL_ADDR6;
+			params.vxlp_local_in6 = cmd.vxlcmd_sa.in6.sin6_addr;
+		}
+		return;
+	}
+
+	if
(do_cmd(s, VXLAN_CMD_SET_LOCAL_ADDR, &cmd, sizeof(cmd), 1) < 0) + err(1, "VXLAN_CMD_SET_LOCAL_ADDR"); +} + +static +DECL_CMD_FUNC(setvxlan_remote, addr, d) +{ + struct ifvxlancmd cmd; + struct addrinfo *ai; + struct sockaddr *sa; + int error; + + bzero(&cmd, sizeof(cmd)); + + if ((error = getaddrinfo(addr, NULL, NULL, &ai)) != 0) + errx(1, "error in parsing remote address string: %s", + gai_strerror(error)); + + sa = ai->ai_addr; + + switch (ai->ai_family) { +#ifdef INET + case AF_INET: { + struct in_addr addr = ((struct sockaddr_in *)sa)->sin_addr; + + if (IN_MULTICAST(ntohl(addr.s_addr))) + errx(1, "remote address cannot be multicast"); + + cmd.vxlcmd_sa.in4.sin_family = AF_INET; + cmd.vxlcmd_sa.in4.sin_addr = addr; + break; + } +#endif +#ifdef INET6 + case AF_INET6: { + struct in6_addr *addr = &((struct sockaddr_in6 *)sa)->sin6_addr; + + if (IN6_IS_ADDR_MULTICAST(addr)) + errx(1, "remote address cannot be multicast"); + + cmd.vxlcmd_sa.in6.sin6_family = AF_INET6; + cmd.vxlcmd_sa.in6.sin6_addr = *addr; + break; + } +#endif + default: + errx(1, "remote address %s not supported", addr); + } + + freeaddrinfo(ai); + + if (!vxlan_exists(s)) { + if (cmd.vxlcmd_sa.sa.sa_family == AF_INET) { + params.vxlp_with |= VXLAN_PARAM_WITH_REMOTE_ADDR4; + params.vxlp_remote_in4 = cmd.vxlcmd_sa.in4.sin_addr; + } else { + params.vxlp_with |= VXLAN_PARAM_WITH_REMOTE_ADDR6; + params.vxlp_remote_in6 = cmd.vxlcmd_sa.in6.sin6_addr; + } + return; + } + + if (do_cmd(s, VXLAN_CMD_SET_REMOTE_ADDR, &cmd, sizeof(cmd), 1) < 0) + err(1, "VXLAN_CMD_SET_REMOTE_ADDR"); +} + +static +DECL_CMD_FUNC(setvxlan_group, addr, d) +{ + struct ifvxlancmd cmd; + struct addrinfo *ai; + struct sockaddr *sa; + int error; + + bzero(&cmd, sizeof(cmd)); + + if ((error = getaddrinfo(addr, NULL, NULL, &ai)) != 0) + errx(1, "error in parsing group address string: %s", + gai_strerror(error)); + + sa = ai->ai_addr; + + switch (ai->ai_family) { +#ifdef INET + case AF_INET: { + struct in_addr addr = ((struct sockaddr_in 
*)sa)->sin_addr; + + if (!IN_MULTICAST(ntohl(addr.s_addr))) + errx(1, "group address must be multicast"); + + cmd.vxlcmd_sa.in4.sin_family = AF_INET; + cmd.vxlcmd_sa.in4.sin_addr = addr; + break; + } +#endif +#ifdef INET6 + case AF_INET6: { + struct in6_addr *addr = &((struct sockaddr_in6 *)sa)->sin6_addr; + + if (!IN6_IS_ADDR_MULTICAST(addr)) + errx(1, "group address must be multicast"); + + cmd.vxlcmd_sa.in6.sin6_family = AF_INET6; + cmd.vxlcmd_sa.in6.sin6_addr = *addr; + break; + } +#endif + default: + errx(1, "group address %s not supported", addr); + } + + freeaddrinfo(ai); + + if (!vxlan_exists(s)) { + if (cmd.vxlcmd_sa.sa.sa_family == AF_INET) { + params.vxlp_with |= VXLAN_PARAM_WITH_REMOTE_ADDR4; + params.vxlp_remote_in4 = cmd.vxlcmd_sa.in4.sin_addr; + } else { + params.vxlp_with |= VXLAN_PARAM_WITH_REMOTE_ADDR6; + params.vxlp_remote_in6 = cmd.vxlcmd_sa.in6.sin6_addr; + } + return; + } + + if (do_cmd(s, VXLAN_CMD_SET_REMOTE_ADDR, &cmd, sizeof(cmd), 1) < 0) + err(1, "VXLAN_CMD_SET_REMOTE_ADDR"); +} + +static +DECL_CMD_FUNC(setvxlan_local_port, arg, d) +{ + struct ifvxlancmd cmd; + u_long val; + + if (get_val(arg, &val) < 0 || val >= UINT16_MAX) + errx(1, "invalid local port: %s", arg); + + if (!vxlan_exists(s)) { + params.vxlp_with |= VXLAN_PARAM_WITH_LOCAL_PORT; + params.vxlp_local_port = val; + return; + } + + bzero(&cmd, sizeof(cmd)); + cmd.vxlcmd_port = val; + + if (do_cmd(s, VXLAN_CMD_SET_LOCAL_PORT, &cmd, sizeof(cmd), 1) < 0) + err(1, "VXLAN_CMD_SET_LOCAL_PORT"); +} + +static +DECL_CMD_FUNC(setvxlan_remote_port, arg, d) +{ + struct ifvxlancmd cmd; + u_long val; + + if (get_val(arg, &val) < 0 || val >= UINT16_MAX) + errx(1, "invalid remote port: %s", arg); + + if (!vxlan_exists(s)) { + params.vxlp_with |= VXLAN_PARAM_WITH_REMOTE_PORT; + params.vxlp_remote_port = val; + return; + } + + bzero(&cmd, sizeof(cmd)); + cmd.vxlcmd_port = val; + + if (do_cmd(s, VXLAN_CMD_SET_REMOTE_PORT, &cmd, sizeof(cmd), 1) < 0) + err(1, "VXLAN_CMD_SET_REMOTE_PORT"); +} + 
+static +DECL_CMD_FUNC2(setvxlan_port_range, arg1, arg2) +{ + struct ifvxlancmd cmd; + u_long min, max; + + if (get_val(arg1, &min) < 0 || min >= UINT16_MAX) + errx(1, "invalid port range minimum: %s", arg1); + if (get_val(arg2, &max) < 0 || max >= UINT16_MAX) + errx(1, "invalid port range maximum: %s", arg2); + if (max < min) + errx(1, "invalid port range"); + + if (!vxlan_exists(s)) { + params.vxlp_with |= VXLAN_PARAM_WITH_PORT_RANGE; + params.vxlp_min_port = min; + params.vxlp_max_port = max; + return; + } + + bzero(&cmd, sizeof(cmd)); + cmd.vxlcmd_port_min = min; + cmd.vxlcmd_port_max = max; + + if (do_cmd(s, VXLAN_CMD_SET_PORT_RANGE, &cmd, sizeof(cmd), 1) < 0) + err(1, "VXLAN_CMD_SET_PORT_RANGE"); +} + +static +DECL_CMD_FUNC(setvxlan_timeout, arg, d) +{ + struct ifvxlancmd cmd; + u_long val; + + if (get_val(arg, &val) < 0 || (val & ~0xFFFFFFFF) != 0) + errx(1, "invalid timeout value: %s", arg); + + if (!vxlan_exists(s)) { + params.vxlp_with |= VXLAN_PARAM_WITH_FTABLE_TIMEOUT; + params.vxlp_ftable_timeout = val & 0xFFFFFFFF; + return; + } + + bzero(&cmd, sizeof(cmd)); + cmd.vxlcmd_ftable_timeout = val & 0xFFFFFFFF; + + if (do_cmd(s, VXLAN_CMD_SET_FTABLE_TIMEOUT, &cmd, sizeof(cmd), 1) < 0) + err(1, "VXLAN_CMD_SET_FTABLE_TIMEOUT"); +} + +static +DECL_CMD_FUNC(setvxlan_maxaddr, arg, d) +{ + struct ifvxlancmd cmd; + u_long val; + + if (get_val(arg, &val) < 0 || (val & ~0xFFFFFFFF) != 0) + errx(1, "invalid maxaddr value: %s", arg); + + if (!vxlan_exists(s)) { + params.vxlp_with |= VXLAN_PARAM_WITH_FTABLE_MAX; + params.vxlp_ftable_max = val & 0xFFFFFFFF; + return; + } + + bzero(&cmd, sizeof(cmd)); + cmd.vxlcmd_ftable_max = val & 0xFFFFFFFF; + + if (do_cmd(s, VXLAN_CMD_SET_FTABLE_MAX, &cmd, sizeof(cmd), 1) < 0) + err(1, "VXLAN_CMD_SET_FTABLE_MAX"); +} + +static +DECL_CMD_FUNC(setvxlan_dev, arg, d) +{ + struct ifvxlancmd cmd; + + if (!vxlan_exists(s)) { + params.vxlp_with |= VXLAN_PARAM_WITH_MULTICAST_IF; + strlcpy(params.vxlp_mc_ifname, arg, + 
sizeof(params.vxlp_mc_ifname)); + return; + } + + bzero(&cmd, sizeof(cmd)); + strlcpy(cmd.vxlcmd_ifname, arg, sizeof(cmd.vxlcmd_ifname)); + + if (do_cmd(s, VXLAN_CMD_SET_MULTICAST_IF, &cmd, sizeof(cmd), 1) < 0) + err(1, "VXLAN_CMD_SET_MULTICAST_IF"); +} + +static +DECL_CMD_FUNC(setvxlan_ttl, arg, d) +{ + struct ifvxlancmd cmd; + u_long val; + + if (get_val(arg, &val) < 0 || val > 256) + errx(1, "invalid TTL value: %s", arg); + + if (!vxlan_exists(s)) { + params.vxlp_with |= VXLAN_PARAM_WITH_TTL; + params.vxlp_ttl = val; + return; + } + + bzero(&cmd, sizeof(cmd)); + cmd.vxlcmd_ttl = val; + + if (do_cmd(s, VXLAN_CMD_SET_TTL, &cmd, sizeof(cmd), 1) < 0) + err(1, "VXLAN_CMD_SET_TTL"); +} + +static +DECL_CMD_FUNC(setvxlan_learn, arg, d) +{ + struct ifvxlancmd cmd; + + if (!vxlan_exists(s)) { + params.vxlp_with |= VXLAN_PARAM_WITH_LEARN; + params.vxlp_learn = d; + return; + } + + bzero(&cmd, sizeof(cmd)); + if (d != 0) + cmd.vxlcmd_flags |= VXLAN_CMD_FLAG_LEARN; + + if (do_cmd(s, VXLAN_CMD_SET_LEARN, &cmd, sizeof(cmd), 1) < 0) + err(1, "VXLAN_CMD_SET_LEARN"); +} + +static void +setvxlan_flush(const char *val, int d, int s, const struct afswtch *afp) +{ + struct ifvxlancmd cmd; + + bzero(&cmd, sizeof(cmd)); + if (d != 0) + cmd.vxlcmd_flags |= VXLAN_CMD_FLAG_FLUSH_ALL; + + if (do_cmd(s, VXLAN_CMD_FLUSH, &cmd, sizeof(cmd), 1) < 0) + err(1, "VXLAN_CMD_FLUSH"); +} + +static struct cmd vxlan_cmds[] = { + + DEF_CLONE_CMD_ARG("vni", setvxlan_vni), + DEF_CLONE_CMD_ARG("local", setvxlan_local), + DEF_CLONE_CMD_ARG("remote", setvxlan_remote), + DEF_CLONE_CMD_ARG("group", setvxlan_group), + DEF_CLONE_CMD_ARG("localport", setvxlan_local_port), + DEF_CLONE_CMD_ARG("remoteport", setvxlan_remote_port), + DEF_CLONE_CMD_ARG2("portrange", setvxlan_port_range), + DEF_CLONE_CMD_ARG("timeout", setvxlan_timeout), + DEF_CLONE_CMD_ARG("maxaddr", setvxlan_maxaddr), + DEF_CLONE_CMD_ARG("vxlandev", setvxlan_dev), + DEF_CLONE_CMD_ARG("ttl", setvxlan_ttl), + DEF_CLONE_CMD("learn", 1, setvxlan_learn), 
+ DEF_CLONE_CMD("-learn", 0, setvxlan_learn), + + DEF_CMD_ARG("vni", setvxlan_vni), + DEF_CMD_ARG("local", setvxlan_local), + DEF_CMD_ARG("remote", setvxlan_remote), + DEF_CMD_ARG("group", setvxlan_group), + DEF_CMD_ARG("localport", setvxlan_local_port), + DEF_CMD_ARG("remoteport", setvxlan_remote_port), + DEF_CMD_ARG2("portrange", setvxlan_port_range), + DEF_CMD_ARG("timeout", setvxlan_timeout), + DEF_CMD_ARG("maxaddr", setvxlan_maxaddr), + DEF_CMD_ARG("vxlandev", setvxlan_dev), + DEF_CMD_ARG("ttl", setvxlan_ttl), + DEF_CMD("learn", 1, setvxlan_learn), + DEF_CMD("-learn", 0, setvxlan_learn), + + DEF_CMD("flush", 0, setvxlan_flush), + DEF_CMD("flushall", 1, setvxlan_flush), +}; + +static struct afswtch af_vxlan = { + .af_name = "af_vxlan", + .af_af = AF_UNSPEC, + .af_other_status = vxlan_status, +}; + +static __constructor void +vxlan_ctor(void) +{ +#define N(a) (sizeof(a) / sizeof(a[0])) + size_t i; + + for (i = 0; i < N(vxlan_cmds); i++) + cmd_register(&vxlan_cmds[i]); + af_register(&af_vxlan); + callback_register(vxlan_cb, NULL); + clone_setdefcallback("vxlan", vxlan_create); +#undef N +} Modified: head/share/man/man4/Makefile ============================================================================== --- head/share/man/man4/Makefile Mon Oct 20 14:25:23 2014 (r273330) +++ head/share/man/man4/Makefile Mon Oct 20 14:42:42 2014 (r273331) @@ -567,6 +567,7 @@ MAN= aac.4 \ ${_virtio_scsi.4} \ vkbd.4 \ vlan.4 \ + vxlan.4 \ ${_vmx.4} \ vpo.4 \ vr.4 \ @@ -743,6 +744,7 @@ MLINKS+=urndis.4 if_urndis.4 MLINKS+=${_urtw.4} ${_if_urtw.4} MLINKS+=vge.4 if_vge.4 MLINKS+=vlan.4 if_vlan.4 +MLINKS+=vxlan.4 if_vxlan.4 MLINKS+=${_vmx.4} ${_if_vmx.4} MLINKS+=vpo.4 imm.4 MLINKS+=vr.4 if_vr.4 Added: head/share/man/man4/vxlan.4 ============================================================================== --- /dev/null 00:00:00 1970 (empty, because file is newly added) +++ head/share/man/man4/vxlan.4 Mon Oct 20 14:42:42 2014 (r273331) @@ -0,0 +1,235 @@ +.\" Copyright (c) 2014 Bryan 
Venteicher +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.Dd October 20, 2014 +.Dt VXLAN 4 +.Os +.Sh NAME +.Nm vxlan +.Nd "Virtual eXtensible LAN interface" +.Sh SYNOPSIS +To compile this driver into the kernel, +place the following line in your +kernel configuration file: +.Bd -ragged -offset indent +.Cd "device vxlan" +.Ed +.Pp +Alternatively, to load the driver as a +module at boot time, place the following line in +.Xr loader.conf 5 : +.Bd -literal -offset indent +if_vxlan_load="YES" +.Ed +.Sh DESCRIPTION +The +.Nm +driver creates a virtual tunnel endpoint in a +.Nm +segment. 
+A
+.Nm
+segment is a virtual Layer 2 (Ethernet) network that is overlaid
+in a Layer 3 (IP/UDP) network.
+.Nm
+is analogous to
+.Xr vlan 4
+but is designed to be better suited for large, multiple tenant
+data center environments.
+.Pp
+Each
+.Nm
+interface is created at runtime using interface cloning.
+This is most easily done with the
+.Xr ifconfig 8
+.Cm create
+command or using the
+.Va cloned_interfaces
+variable in
+.Xr rc.conf 5 .
+The interface may be removed with the
+.Xr ifconfig 8
+.Cm destroy
+command.
+.Pp
+The
+.Nm
+driver creates a pseudo Ethernet network interface
+that supports the usual network
+.Xr ioctl 2 Ns s
+and thus can be used with
+.Xr ifconfig 8
+like any other Ethernet interface.
+The
+.Nm
+interface encapsulates the Ethernet frame
+by prepending IP/UDP and
+.Nm
+headers.
+Thus, the encapsulated (inner) frame is able to be transmitted
+over a routed, Layer 3 network to the remote host.
+.Pp
+The
+.Nm
+interface may be configured in either unicast or multicast mode.
+When in unicast mode,
+the interface creates a tunnel to a single remote host,
+and all traffic is transmitted to that host.
+When in multicast mode,
+the interface joins an IP multicast group,
+and receives packets sent to the group address,
+and transmits packets to either the multicast group address,
+or directly to the remote host if there is an appropriate
+forwarding table entry.
+.Pp
+When the
+.Nm
+interface is brought up, a
+.Xr udp 4
+.Xr socket 9
+is created based on the configuration,
+such as the local address for unicast mode or
+the group address for multicast mode,
+and the listening (local) port number.
+Since multiple
+.Nm
+interfaces may be created that either
+use the same local address
+or join the same group address,
+and use the same port,
+the driver may share a socket among multiple interfaces.
+However, each interface within a socket must belong to
+a unique
+.Nm
+segment.
+The analogous
+.Xr vlan 4
+configuration would be a physical interface configured as
+the parent device for multiple VLAN interfaces, each with
+a unique VLAN tag.
+Each
+.Nm
+segment is identified by a 24-bit value in the
+.Nm
+header called the
+.Dq VXLAN Network Identifier ,
+or VNI.
+.Pp
+When configured with the
+.Xr ifconfig 8
+.Cm learn
+parameter, the interface dynamically creates forwarding table entries
+from received packets.
+An entry in the forwarding table maps the inner source MAC address
+to the outer remote IP address.
+During transmit, the interface attempts to look up an entry for
+the encapsulated destination MAC address.
+If an entry is found, the IP address in the entry is used to directly
+transmit the encapsulated frame to the destination.
+Otherwise, when configured in multicast mode,
+the interface must flood the frame to all hosts in the group.
+The maximum number of entries in the table is configurable with the
+.Xr ifconfig 8
+.Cm maxaddr
+command.
+Stale entries in the table are periodically pruned.
+The timeout is configurable with the
+.Xr ifconfig 8
+.Cm timeout
+command.
+The table may be viewed with the
+.Xr sysctl 8
+.Cm net.link.vxlan.N.ftable.dump
+command.
+.Sh MTU
+Since the
+.Nm
+interface encapsulates the Ethernet frame with an IP, UDP, and
+.Nm
+header, the resulting frame may be larger than the MTU of the
+physical network.
+The
+.Nm
+specification recommends the physical network MTU be configured
+to use jumbo frames to accommodate the encapsulated frame size.
+Alternatively, the
+.Xr ifconfig 8
+.Cm mtu
+command may be used to reduce the MTU size on the
+.Nm
+interface to allow the encapsulated frame to fit in the
+current MTU of the physical network.
+.Sh EXAMPLES
+Create a
+.Nm
+interface in unicast mode
+with the
+.Cm local
+tunnel address of 192.168.100.1,
+and the
+.Cm remote
+tunnel address of 192.168.100.2.
+.Bd -literal -offset indent
+ifconfig vxlan create vni 108 local 192.168.100.1 remote 192.168.100.2
+.Ed
+.Pp
+Create a
+.Nm
+interface in multicast mode,
+with the
+.Cm local
+address of 192.168.10.95,
+and the
+.Cm group
+address of 224.0.2.6.
+The em0 interface will be used to transmit multicast packets.
+.Bd -literal -offset indent
+ifconfig vxlan create vni 42 local 192.168.10.95 group 224.0.2.6 vxlandev em0
+.Ed
+.Pp
+Once created, the
+.Nm
+interface can be configured with
+.Xr ifconfig 8 .
+.Sh SEE ALSO
+.Xr ifconfig 8 ,
+.Xr inet 4 ,
+.Xr inet6 4 ,
+.Xr sysctl 8 ,
+.Xr vlan 4
*** DIFF OUTPUT TRUNCATED AT 1000 LINES ***
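
The 24-bit VNI and the MTU note in vxlan.4 can be illustrated with a minimal C sketch of the 8-byte VXLAN header described in RFC 7348 and the overhead arithmetic that follows from it. This is not taken from if_vxlan.c, whose diff is truncated above. The names vxlan_hdr_encode, vxlan_hdr_vni, vxlan_inner_mtu and the macros are hypothetical, and the MTU calculation assumes an untagged inner Ethernet frame and an outer IP header without options or extension headers.

```c
#include <stdint.h>

/* Header layout per RFC 7348: flags(8) reserved(24) VNI(24) reserved(8). */
#define VXLAN_HDR_LEN	8
#define VXLAN_FLAG_VNI	0x08		/* "I" flag: the VNI field is valid */
#define VXLAN_VNI_LIMIT	(1u << 24)	/* the VNI is a 24-bit value */

/*
 * Write the VXLAN header for `vni` into `buf`.
 * Returns 0 on success, or -1 if the VNI does not fit in 24 bits.
 */
static int
vxlan_hdr_encode(uint8_t buf[VXLAN_HDR_LEN], uint32_t vni)
{

	if (vni >= VXLAN_VNI_LIMIT)
		return (-1);

	buf[0] = VXLAN_FLAG_VNI;		/* flags */
	buf[1] = buf[2] = buf[3] = 0;		/* reserved */
	buf[4] = (vni >> 16) & 0xff;		/* VNI, network byte order */
	buf[5] = (vni >> 8) & 0xff;
	buf[6] = vni & 0xff;
	buf[7] = 0;				/* reserved */
	return (0);
}

/* Extract the VNI from an encoded VXLAN header. */
static uint32_t
vxlan_hdr_vni(const uint8_t buf[VXLAN_HDR_LEN])
{

	return (((uint32_t)buf[4] << 16) | ((uint32_t)buf[5] << 8) | buf[6]);
}

/*
 * Largest inner (vxlan interface) MTU that still fits in one packet on
 * the physical network: subtract the outer IP header (20 bytes for IPv4,
 * 40 for IPv6), the UDP header (8), the VXLAN header (8), and the inner
 * Ethernet header (14).
 */
static int
vxlan_inner_mtu(int underlay_mtu, int ipv6)
{
	int overhead = (ipv6 ? 40 : 20) + 8 + VXLAN_HDR_LEN + 14;

	return (underlay_mtu - overhead);
}
```

Under these assumptions, a 1500-byte physical MTU leaves 1450 bytes for the inner frame's payload over IPv4 and 1430 over IPv6, which is the kind of value one might pass to the ifconfig mtu command mentioned in the MTU section.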