From owner-freebsd-net@FreeBSD.ORG Sun Dec 2 00:25:55 2012 Return-Path: Delivered-To: freebsd-net@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 74B67757; Sun, 2 Dec 2012 00:25:55 +0000 (UTC) (envelope-from rmacklem@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 5917C8FC08; Sun, 2 Dec 2012 00:25:55 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id qB20Ptbv006658; Sun, 2 Dec 2012 00:25:55 GMT (envelope-from rmacklem@freefall.freebsd.org) Received: (from rmacklem@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id qB20Ps1L006654; Sun, 2 Dec 2012 00:25:54 GMT (envelope-from rmacklem) Date: Sun, 2 Dec 2012 00:25:54 GMT Message-Id: <201212020025.qB20Ps1L006654@freefall.freebsd.org> To: jas@cse.yorku.ca, rmacklem@FreeBSD.org, freebsd-net@FreeBSD.org From: rmacklem@FreeBSD.org Subject: Re: kern/173479: [nfs] chown and chgrp operations fail between FreeBSD 9.1RC3 NFSv4 server and RH63 NFSv4 client X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 02 Dec 2012 00:25:55 -0000 Synopsis: [nfs] chown and chgrp operations fail between FreeBSD 9.1RC3 NFSv4 server and RH63 NFSv4 client State-Changed-From-To: open->closed State-Changed-By: rmacklem State-Changed-When: Sun Dec 2 00:20:25 UTC 2012 State-Changed-Why: This bug is caused by Linux 3.3 or greater kernels defaulting to using numeric uids/gids in the owner and owner_group strings. Support for this is defined in an internet draft that has not yet been published as an RFC. 
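To make the failure concrete: an NFSv4 owner or owner_group attribute has traditionally been a string of the form user@domain. The Linux clients described above may instead send a bare decimal id such as "1001". The sketch below shows one way a server-side parser might tell the two forms apart. It is a hypothetical helper (classify_owner() and its enum are invented for illustration), not the FreeBSD server code. It assumes that an all-digit string is a numeric id and that ids must fit in 32 bits.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical classification of an NFSv4 owner/owner_group string. */
enum owner_kind { OWNER_INVALID, OWNER_NUMERIC, OWNER_NAMED };

static enum owner_kind
classify_owner(const char *s, unsigned long *id)
{
	const char *p, *at;
	char *end;
	unsigned long v;

	if (s == NULL || *s == '\0')
		return (OWNER_INVALID);

	/* Numeric form (assumption): the whole string is decimal digits. */
	for (p = s; *p != '\0'; p++)
		if (!isdigit((unsigned char)*p))
			break;
	if (*p == '\0') {
		errno = 0;
		v = strtoul(s, &end, 10);
		/* Reject overflow and ids that do not fit in 32 bits. */
		if (errno != 0 || *end != '\0' || v > 0xffffffffUL)
			return (OWNER_INVALID);
		if (id != NULL)
			*id = v;
		return (OWNER_NUMERIC);
	}

	/* Named form: non-empty user part, '@', non-empty domain part. */
	at = strchr(s, '@');
	if (at == NULL || at == s || at[1] == '\0')
		return (OWNER_INVALID);
	return (OWNER_NAMED);
}
```

With this classification, "1001" is treated as a uid/gid while "user@example.com" still goes through the usual name-to-id mapping.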
To switch the Linux client to the old behaviour you may:
- create /etc/modprobe.d (if it does not already exist)
- put a file in there called nfs.conf containing the following line:
  options nfs nfs4_disable_idmapping=N

Support for this new behaviour was added to head as r240720 and has been MFC'd to stable/8 and stable/9.

http://www.freebsd.org/cgi/query-pr.cgi?pr=173479

From owner-freebsd-net@FreeBSD.ORG Sun Dec 2 00:28:37 2012 Return-Path: Delivered-To: freebsd-net@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id C0852765; Sun, 2 Dec 2012 00:28:37 +0000 (UTC) (envelope-from rmacklem@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id A64838FC08; Sun, 2 Dec 2012 00:28:37 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id qB20Sbw6006813; Sun, 2 Dec 2012 00:28:37 GMT (envelope-from rmacklem@freefall.freebsd.org) Received: (from rmacklem@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id qB20Sbmf006809; Sun, 2 Dec 2012 00:28:37 GMT (envelope-from rmacklem) Date: Sun, 2 Dec 2012 00:28:37 GMT Message-Id: <201212020028.qB20Sbmf006809@freefall.freebsd.org> To: jas@cse.yorku.ca, rmacklem@FreeBSD.org, freebsd-net@FreeBSD.org From: rmacklem@FreeBSD.org Subject: Re: kern/173481: [NFS] RH63 NFSv4 client does not reconnect to FreeBSD 9.1RC3 NFSv4 server after server is rebooted X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 02 Dec 2012 00:28:37 -0000 Synopsis: [NFS] RH63 NFSv4 client does not reconnect to FreeBSD 9.1RC3 NFSv4 server after server is rebooted State-Changed-From-To: open->closed State-Changed-By: rmacklem State-Changed-When: Sun Dec 2 00:26:10 UTC 2012
State-Changed-Why: Upon further investigation, Jason determined that the Linux client generated no network traffic to the server for some time, but then did reconnect and recover. The slow reconnect seems to be a Linux client issue. He recommended closing the PR. http://www.freebsd.org/cgi/query-pr.cgi?pr=173481 From owner-freebsd-net@FreeBSD.ORG Sun Dec 2 00:48:43 2012 Return-Path: Delivered-To: freebsd-net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 4C04EE7E; Sun, 2 Dec 2012 00:48:43 +0000 (UTC) (envelope-from melifaro@FreeBSD.org) Received: from mail.ipfw.ru (unknown [IPv6:2a01:4f8:120:6141::2]) by mx1.freebsd.org (Postfix) with ESMTP id 162FA8FC0C; Sun, 2 Dec 2012 00:48:42 +0000 (UTC) Received: from secured.by.ipfw.ru ([95.143.220.47] helo=ws.su29.net) by mail.ipfw.ru with esmtpsa (TLSv1:CAMELLIA256-SHA:256) (Exim 4.76 (FreeBSD)) (envelope-from ) id 1Texmz-0008qu-Ll; Sun, 02 Dec 2012 04:52:10 +0400 Message-ID: <50BAA552.1010707@FreeBSD.org> Date: Sun, 02 Dec 2012 04:48:18 +0400 From: "Alexander V. 
Chernikov" User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:9.0) Gecko/20120121 Thunderbird/9.0 MIME-Version: 1.0 To: Hiroki Sato Subject: [CFT] Virtual BPF interfaces (was: CFR: ipfw0 pseudo-interface clonable) References: <4F96D11B.2060007@FreeBSD.org> <20120425.020518.406495893112283552.hrs@allbsd.org> <4F96E71B.9020405@FreeBSD.org> <20120427.084414.1142593201575277510.hrs@allbsd.org> <4FD4AD29.3040204@FreeBSD.org> In-Reply-To: <4FD4AD29.3040204@FreeBSD.org> Content-Type: multipart/mixed; boundary="------------080701070400070005040601" Cc: freebsd-ipfw@FreeBSD.org, delphij@freebsd.org, "freebsd-net@freebsd.org" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 02 Dec 2012 00:48:43 -0000
This is a multi-part message in MIME format.
--------------080701070400070005040601
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

On 10.06.2012 18:20, Alexander V. Chernikov wrote:
> On 27.04.2012 03:44, Hiroki Sato wrote:
>> "Alexander V. Chernikov" wrote
>> in<4F96E71B.9020405@FreeBSD.org>:
>>
>> me> On 24.04.2012 21:05, Hiroki Sato wrote:
>
> Proof-of-concept patch attached. Hopefully, the libpcap code is easily extendable.

New version attached:

* BPF code can now use "virtual" interfaces that have no real ifnet
* New bpfattach3() / bpfdetach3() routines were added to attach and detach virtual interfaces
* A new BIOCGIFLIST ioctl lets userland retrieve the list of available virtual interfaces
* A FreeBSD-specific version of platform_finddevs is added to the libpcap code (new file)

There are some rough edges (conditional code in pcap-bpf.c, lack of documentation, maybe some style issues), but generally it seems to work and, from my point of view, does not interfere much with the contrib/ code.

The ipfw log device was converted to use the new bpf(4) API; see the attached patch.
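The BIOCGIFLIST reply is a packed buffer of variable-length records. In the attached patch, bpf_getiflist() writes each record as a struct bpf_ifreply header followed by a NUL-terminated description, with each record's length rounded up to 4 bytes, and freebsd_platform_finddevs() walks the buffer by adding ifr_len. The sketch below mirrors that layout in plain userland C so the arithmetic can be checked without the kernel. It uses a local copy of the proposed struct (uint32_t in place of u_int). The helpers ifreply_len(), ifreply_pack() and ifreply_count() are hypothetical and are not part of the patch or of any installed header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BIFNAMSIZ	16

/* Local mirror of struct bpf_ifreply as proposed in the patch. */
struct ifreply {
	uint32_t	ifr_len;		/* total record length */
	uint32_t	ifr_spare[3];		/* spare data */
	char		ifr_name[BIFNAMSIZ + 1];	/* interface name */
	char		ifr_descr[];		/* NUL-terminated description */
};

#define DESCR_OFF	offsetof(struct ifreply, ifr_descr)

/* Same arithmetic as bpf_getiflist(): header + description + NUL, 4-aligned. */
static size_t
ifreply_len(const char *descr)
{
	size_t len = DESCR_OFF + strlen(descr) + 1;

	return ((len + 3) & ~(size_t)3);	/* roundup2(len, 4) */
}

/* Append one record at buf[off]; return the new offset, or 0 if it won't fit. */
static size_t
ifreply_pack(char *buf, size_t buflen, size_t off, const char *name,
    const char *descr)
{
	struct ifreply hdr;
	size_t len = ifreply_len(descr);

	assert(strlen(name) <= BIFNAMSIZ);
	if (off + len > buflen)
		return (0);
	memset(&hdr, 0, sizeof(hdr));
	hdr.ifr_len = (uint32_t)len;
	memcpy(hdr.ifr_name, name, strlen(name));
	memset(buf + off, 0, len);			/* zero the padding */
	memcpy(buf + off, &hdr, DESCR_OFF);
	memcpy(buf + off + DESCR_OFF, descr, strlen(descr) + 1);
	return (off + len);
}

/*
 * Walk `total` bytes the way freebsd_platform_finddevs() does and count the
 * records. Unlike the patch, stop on a zero or oversized ifr_len so that a
 * malformed buffer cannot cause an endless loop or an overrun.
 */
static int
ifreply_count(const char *buf, size_t total, char *first_name)
{
	size_t off = 0;
	int n = 0;

	while (off + DESCR_OFF <= total) {
		struct ifreply hdr;

		memcpy(&hdr, buf + off, DESCR_OFF);	/* avoid unaligned reads */
		if (hdr.ifr_len == 0 || off + hdr.ifr_len > total)
			break;
		if (n == 0 && first_name != NULL)
			memcpy(first_name, hdr.ifr_name, BIFNAMSIZ + 1);
		n++;
		off += hdr.ifr_len;
	}
	return (n);
}
```

For "ipfw log interface" (18 characters), the header takes 33 bytes on common ABIs, so one record is 33 + 18 + 1 = 52 bytes, which is already a multiple of 4.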
Small example:

4:17 [0] zfscurr0# tcpdump -D
1.em0
2.em1
3.lo0
4:17 [0] zfscurr0# kldload ipfw
4:17 [0] zfscurr0# ifconfig -l
em0 em1 lo0
4:17 [0] zfscurr0# tcpdump -D
1.em0
2.ipfw0 (ipfw log interface)
3.em1
4.lo0
4:40 [1] zfscurr0# ipfw add 100 count log logamount 0 ip from any to any
00100 count log ip from any to any
4:40 [0] zfscurr0# tcpdump -i ipfw0 -lns0
tcpdump: WARNING: SIOCGIFADDR: ipfw0: Device not configured
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ipfw0, link-type EN10MB (Ethernet), capture size 65535 bytes
04:41:27.233653 IP 10.0.0.92.22 > 10.0.0.5.59076: Flags [P.], seq 2783103749:2783103941, ack 3836123088, win 1040, options [nop,nop,TS val 1668094903 ecr 564715671], length 192
04:41:27.233678 IP 10.0.0.5.59076 > 10.0.0.92.22: Flags [.], ack 0, win 1039, options [nop,nop,TS val 564715680 ecr 1668094903], length 0

Btw, do we still need the warning about the lack of an IPv4 address?

>
> Unfortunately, there are problems with this approach, too.
>
> pcap_findalldevs() uses a method external to BPF (possibly NET_RT_IFLIST),
> so programs that rely on that function to populate some kind of combo box
> (like Wireshark) with all possible choices won't let the user select
> such an interface.
>
> Additionally, tcpdump assumes that the passed interface name is real and
> warns us that SIOCGIFADDR returns an error.
> > >> >> -- Hiroki > --------------080701070400070005040601 Content-Type: text/plain; name="bpf_virtual.diff" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="bpf_virtual.diff" Index: lib/libpcap/Makefile =================================================================== --- lib/libpcap/Makefile (revision 243778) +++ lib/libpcap/Makefile (working copy) @@ -6,7 +6,7 @@ SHLIBDIR?= /lib .include LIB= pcap -SRCS= grammar.y tokdefs.h version.h pcap-bpf.c \ +SRCS= grammar.y tokdefs.h version.h pcap-bpf.c pcap-freebsd.c \ pcap.c pcap-common.c inet.c fad-getad.c gencode.c optimize.c nametoaddr.c \ etherent.c savefile.c bpf_filter.c bpf_image.c bpf_dump.c \ scanner.l sf-pcap.c sf-pcap-ng.c version.c Index: sys/net/bpf.c =================================================================== --- sys/net/bpf.c (revision 243778) +++ sys/net/bpf.c (working copy) @@ -151,6 +151,7 @@ static void bpf_detachd_locked(struct bpf_d *); static void bpf_freed(struct bpf_d *); static int bpf_movein(struct uio *, int, struct ifnet *, struct mbuf **, struct sockaddr *, int *, struct bpf_insn *); +static int bpf_getiflist(struct bpf_d *, struct bpf_iflist *); static int bpf_setif(struct bpf_d *, struct ifreq *); static void bpf_timed_out(void *); static __inline void @@ -654,7 +655,7 @@ bpf_attachd(struct bpf_d *d, struct bpf_if *bp) CTR3(KTR_NET, "%s: bpf_attach called by pid %d, adding to %s list", __func__, d->bd_pid, d->bd_writer ? 
"writer" : "active"); - if (op_w == 0) + if ((op_w == 0) && (!BPF_IS_VIRTUAL(bp))) EVENTHANDLER_INVOKE(bpf_track, bp->bif_ifp, bp->bif_dlt, 1); } @@ -696,7 +697,8 @@ bpf_upgraded(struct bpf_d *d) CTR2(KTR_NET, "%s: upgrade required by pid %d", __func__, d->bd_pid); - EVENTHANDLER_INVOKE(bpf_track, bp->bif_ifp, bp->bif_dlt, 1); + if (!BPF_IS_VIRTUAL(bp)) + EVENTHANDLER_INVOKE(bpf_track, bp->bif_ifp, bp->bif_dlt, 1); } /* @@ -743,6 +745,10 @@ bpf_detachd_locked(struct bpf_d *d) bpf_bpfd_cnt--; + /* Nothing to do for fake interfaces */ + if (BPF_IS_VIRTUAL(bp)) + return; + /* Call event handler iff d is attached */ if (error == 0) EVENTHANDLER_INVOKE(bpf_track, ifp, bp->bif_dlt, 0); @@ -1037,7 +1043,11 @@ bpfwrite(struct cdev *dev, struct uio *uio, int io return (ENXIO); } - ifp = d->bd_bif->bif_ifp; + /* XXX: Writing to fake interfaces is not supported */ + if ((ifp = d->bd_bif->bif_ifp) == NULL) { + d->bd_wdcount++; + return (ENXIO); + } if ((ifp->if_flags & IFF_UP) == 0) { d->bd_wdcount++; @@ -1266,10 +1276,17 @@ bpfioctl(struct cdev *dev, u_long cmd, caddr_t add { struct ifnet *ifp; - if (d->bd_bif == NULL) + /* + * Lock d since other thread can do reatach in + * other thread causing d->bd_bif to be set to NULL + */ + BPFD_LOCK(d); + if ((d->bd_bif == NULL) || (BPF_IS_VIRTUAL(d->bd_bif))) { error = EINVAL; - else { + BPFD_UNLOCK(d); + } else { ifp = d->bd_bif->bif_ifp; + BPFD_UNLOCK(d); error = (*ifp->if_ioctl)(ifp, cmd, addr); } break; @@ -1325,6 +1342,13 @@ bpfioctl(struct cdev *dev, u_long cmd, caddr_t add error = EINVAL; break; } + + if (BPF_IS_VIRTUAL(d->bd_bif)) { + /* Silently ignore fake interfaces */ + error = 0; + break; + } + if (d->bd_promisc == 0) { error = ifpromisc(d->bd_bif->bif_ifp, 1); if (error == 0) @@ -1390,6 +1414,12 @@ bpfioctl(struct cdev *dev, u_long cmd, caddr_t add BPF_UNLOCK(); break; + case BIOCGIFLIST: + BPF_LOCK(); + error = bpf_getiflist(d, (struct bpf_iflist *)addr); + BPF_UNLOCK(); + break; + /* * Get interface name. 
*/ @@ -1401,7 +1431,8 @@ bpfioctl(struct cdev *dev, u_long cmd, caddr_t add struct ifnet *const ifp = d->bd_bif->bif_ifp; struct ifreq *const ifr = (struct ifreq *)addr; - strlcpy(ifr->ifr_name, ifp->if_xname, + strlcpy(ifr->ifr_name, BPF_IS_VIRTUAL(d->bd_bif) ? + d->bd_bif->ifname : ifp->if_xname, sizeof(ifr->ifr_name)); } BPF_UNLOCK(); @@ -1701,6 +1732,7 @@ bpfioctl(struct cdev *dev, u_long cmd, caddr_t add break; } CURVNET_RESTORE(); + return (error); } @@ -1834,6 +1866,55 @@ bpf_setf(struct bpf_d *d, struct bpf_program *fp, } /* + * Get a list of available virtual interfaces + */ +static int +bpf_getiflist(struct bpf_d *d, struct bpf_iflist *ifl) +{ + int len, tot_len, error; + struct bpf_if *bp; + struct bpf_ifreply ifr; + char *buffer; + + BPF_LOCK_ASSERT(); + + tot_len = 0; + error = 0; + buffer = ifl->ifl_list; + LIST_FOREACH(bp, &bpf_iflist, bif_next) { + if (!BPF_IS_VIRTUAL(bp)) + continue; + + /* Count total length */ + len = offsetof(struct bpf_ifreply, ifr_descr) + + strlen(bp->ifdescr) + 1; + /* Align on 4-byte boundary */ + len = roundup2(len, 4); + + if (buffer != NULL) { + if (tot_len + len >= ifl->ifl_len) + return (ENOMEM); + + /* Fill in interface record */ + memset(&ifr, 0, sizeof(ifr)); + ifr.ifr_len = len; + strlcpy(ifr.ifr_name, bp->ifname, IFNAMSIZ + 1); + + copyout(&ifr, buffer, sizeof(ifr)); + /* Write interface description */ + error = copyout(bp->ifdescr, + buffer + offsetof(struct bpf_ifreply, ifr_descr), + strlen(bp->ifdescr) + 1); + + buffer += len; + } + tot_len += len; + } + ifl->ifl_len = tot_len; + return (error); +} + +/* * Detach a file from its current interface (if attached at all) and attach * to the interface indicated by the name stored in ifr. * Return an errno or 0. 
@@ -1847,10 +1928,19 @@ bpf_setif(struct bpf_d *d, struct ifreq *ifr) BPF_LOCK_ASSERT(); theywant = ifunit(ifr->ifr_name); - if (theywant == NULL || theywant->if_bpf == NULL) - return (ENXIO); + if (theywant == NULL || theywant->if_bpf == NULL) { + /* Check for fake interface existence */ + LIST_FOREACH(bp, &bpf_iflist, bif_next) { + if (!BPF_IS_VIRTUAL(bp)) + continue; + if (!strcmp(bp->ifname, ifr->ifr_name)) + break; + } - bp = theywant->if_bpf; + if (bp == NULL) + return (ENXIO); + } else + bp = theywant->if_bpf; /* Check if interface is not being detached from BPF */ BPFIF_RLOCK(bp); @@ -2075,7 +2165,8 @@ bpf_tap(struct bpf_if *bp, u_char *pkt, u_int pktl if (gottime < bpf_ts_quality(d->bd_tstamp)) gottime = bpf_gettime(&bt, d->bd_tstamp, NULL); #ifdef MAC - if (mac_bpfdesc_check_receive(d, bp->bif_ifp) == 0) + if (BPF_IS_VIRTUAL(bp) || + (mac_bpfdesc_check_receive(d, bp->bif_ifp) == 0)) #endif catchpacket(d, pkt, pktlen, slen, bpf_append_bytes, &bt); @@ -2085,6 +2176,7 @@ bpf_tap(struct bpf_if *bp, u_char *pkt, u_int pktl BPFIF_RUNLOCK(bp); } +/* Note i CAN be NULL */ #define BPF_CHECK_DIRECTION(d, r, i) \ (((d)->bd_direction == BPF_D_IN && (r) != (i)) || \ ((d)->bd_direction == BPF_D_OUT && (r) == (i))) @@ -2134,7 +2226,8 @@ bpf_mtap(struct bpf_if *bp, struct mbuf *m) if (gottime < bpf_ts_quality(d->bd_tstamp)) gottime = bpf_gettime(&bt, d->bd_tstamp, m); #ifdef MAC - if (mac_bpfdesc_check_receive(d, bp->bif_ifp) == 0) + if ((BPF_IS_VIRTUAL(bp)) || + (mac_bpfdesc_check_receive(d, bp->bif_ifp) == 0)) #endif catchpacket(d, (u_char *)m, pktlen, slen, bpf_append_mbuf, &bt); @@ -2190,7 +2283,8 @@ bpf_mtap2(struct bpf_if *bp, void *data, u_int dle if (gottime < bpf_ts_quality(d->bd_tstamp)) gottime = bpf_gettime(&bt, d->bd_tstamp, m); #ifdef MAC - if (mac_bpfdesc_check_receive(d, bp->bif_ifp) == 0) + if ((BPF_IS_VIRTUAL(bp)) || + (mac_bpfdesc_check_receive(d, bp->bif_ifp) == 0)) #endif catchpacket(d, (u_char *)&mb, pktlen, slen, bpf_append_mbuf, &bt); @@ -2484,6
+2578,45 @@ bpfattach2(struct ifnet *ifp, u_int dlt, u_int hdr } /* + * Attach fake interface to bpf. ifname is interface name to be attached, + * dlt is the link layer type, and hdrlen is the fixed size of the link header + * (variable length headers are not yet supported). + */ +void +bpfattach3(char *ifname, char *ifdescr, u_int dlt, u_int hdrlen, struct bpf_if **driverp) +{ + struct bpf_if *bp; + int len; + + len = strlen(ifdescr) + 1; + + /* Assume bpf_if to be properly aligned */ + bp = malloc(sizeof(*bp) + len, M_BPF, M_NOWAIT | M_ZERO); + if (bp == NULL) + panic("bpfattach"); + + LIST_INIT(&bp->bif_dlist); + LIST_INIT(&bp->bif_wlist); + strlcpy(bp->ifname, ifname, IFNAMSIZ + 1); + bp->ifdescr = (char *)(bp + 1); + strlcpy(bp->ifdescr, ifdescr, len); + bp->bif_dlt = dlt; + rw_init(&bp->bif_lock, "bpf interface lock"); + KASSERT(*driverp == NULL, ("bpfattach3: driverp already initialized")); + *driverp = bp; + + BPF_LOCK(); + LIST_INSERT_HEAD(&bpf_iflist, bp, bif_next); + BPF_UNLOCK(); + + bp->bif_hdrlen = hdrlen; + + if (bootverbose) + printf("%s: bpf attached\n", bp->ifname); +} + + +/* * Detach bpf from an interface. This involves detaching each descriptor * associated with the interface. Notify each descriptor as it's detached * so that any sleepers wake up and get ENXIO. @@ -2546,6 +2679,54 @@ bpfdetach(struct ifnet *ifp) } /* + * Detach bpf from the fake interface. This involves detaching each descriptor + * associated with the interface. Notify each descriptor as it's detached + * so that any sleepers wake up and get ENXIO. + */ +void +bpfdetach3(char *ifname) +{ + struct bpf_if *bp; + struct bpf_d *d; + + BPF_LOCK(); + /* Find all bpf_if struct's which reference ifp and detach them.
*/ + LIST_FOREACH(bp, &bpf_iflist, bif_next) { + if (!BPF_IS_VIRTUAL(bp)) + continue; + if (!strcmp(bp->ifname, ifname)) + break; + } + + if (bp != NULL) + LIST_REMOVE(bp, bif_next); + + BPF_UNLOCK(); + + if (bp != NULL) { + while ((d = LIST_FIRST(&bp->bif_dlist)) != NULL) { + bpf_detachd_locked(d); + BPFD_LOCK(d); + bpf_wakeup(d); + BPFD_UNLOCK(d); + } + /* Free writer-only descriptors */ + while ((d = LIST_FIRST(&bp->bif_wlist)) != NULL) { + bpf_detachd_locked(d); + BPFD_LOCK(d); + bpf_wakeup(d); + BPFD_UNLOCK(d); + } + + /* + * Since this interface is fake we can free our + * structure immediately. + */ + rw_destroy(&bp->bif_lock); + free(bp, M_BPF); + } +} +/* * Interface departure handler. * Note departure event does not guarantee interface is going down. */ @@ -2594,6 +2775,9 @@ bpf_getdltlist(struct bpf_d *d, struct bpf_dltlist LIST_FOREACH(bp, &bpf_iflist, bif_next) { if (bp->bif_ifp != ifp) continue; + /* Compare fake interfaces by name */ + if ((ifp == NULL) && (strcmp(d->bd_bif->ifname, bp->ifname))) + continue; if (bfl->bfl_list != NULL) { if (n >= bfl->bfl_len) return (ENOMEM); @@ -2623,7 +2807,13 @@ bpf_setdlt(struct bpf_d *d, u_int dlt) ifp = d->bd_bif->bif_ifp; LIST_FOREACH(bp, &bpf_iflist, bif_next) { - if (bp->bif_ifp == ifp && bp->bif_dlt == dlt) + if (bp->bif_ifp != ifp) + continue; + + if ((ifp == NULL) && strcmp(d->bd_bif->ifname, bp->ifname)) + continue; + + if (bp->bif_dlt == dlt) break; } @@ -2718,8 +2908,10 @@ bpfstats_fill_xbpf(struct xbpf_d *d, struct bpf_d d->bd_hlen = bd->bd_hlen; d->bd_bufsize = bd->bd_bufsize; d->bd_pid = bd->bd_pid; - strlcpy(d->bd_ifname, - bd->bd_bif->bif_ifp->if_xname, IFNAMSIZ); + if (!BPF_IS_VIRTUAL(bd->bd_bif)) + strlcpy(d->bd_ifname, bd->bd_bif->bif_ifp->if_xname, IFNAMSIZ); + else + strlcpy(d->bd_ifname, bd->bd_bif->ifname, IFNAMSIZ); d->bd_locked = bd->bd_locked; d->bd_wcount = bd->bd_wcount; d->bd_wdcount = bd->bd_wdcount; Index: sys/net/bpf.h 
=================================================================== --- sys/net/bpf.h (revision 243778) +++ sys/net/bpf.h (working copy) @@ -147,6 +147,7 @@ struct bpf_zbuf { #define BIOCSETFNR _IOW('B', 130, struct bpf_program) #define BIOCGTSTAMP _IOR('B', 131, u_int) #define BIOCSTSTAMP _IOW('B', 132, u_int) +#define BIOCGIFLIST _IOWR('B', 133, struct bpf_iflist) /* Obsolete */ #define BIOCGSEESENT BIOCGDIRECTION @@ -1224,6 +1225,25 @@ struct bpf_dltlist { u_int *bfl_list; /* array of DLTs */ }; +#define BIFNAMSIZ 16 +#if !defined(_KERNEL) || defined(BPF_INTERNAL) +/* + * Structure to retrieve virtual BPF interfaces. + */ +struct bpf_iflist { + u_int ifl_len; /* total memory size */ + u_int ifl_ver; /* version (set to 0) */ + char *ifl_list; /* array of interfaces */ +}; + +struct bpf_ifreply { + u_int ifr_len; /* Total record length */ + u_int ifr_spare[3]; /* Spare data */ + char ifr_name[BIFNAMSIZ + 1]; /* Interface name */ + char ifr_descr[0]; /* Interface description (variable) */ +}; +#endif + #ifdef _KERNEL #ifdef MALLOC_DECLARE MALLOC_DECLARE(M_BPF); @@ -1262,6 +1282,8 @@ struct bpf_if { struct rwlock bif_lock; /* interface lock */ LIST_HEAD(, bpf_d) bif_wlist; /* writer-only list */ int flags; /* Interface flags */ + char ifname[IFNAMSIZ + 1]; /* Virtual interface name */ + char *ifdescr; /* Virtual interface description */ #endif }; @@ -1272,7 +1294,9 @@ void bpf_mtap(struct bpf_if *, struct mbuf *); void bpf_mtap2(struct bpf_if *, void *, u_int, struct mbuf *); void bpfattach(struct ifnet *, u_int, u_int); void bpfattach2(struct ifnet *, u_int, u_int, struct bpf_if **); +void bpfattach3(char *, char *, u_int, u_int, struct bpf_if **); void bpfdetach(struct ifnet *); +void bpfdetach3(char *); void bpfilterattach(int); u_int bpf_filter(const struct bpf_insn *, u_char *, u_int, u_int); Index: sys/net/bpfdesc.h =================================================================== --- sys/net/bpfdesc.h (revision 243778) +++ sys/net/bpfdesc.h (working copy) @@
-102,6 +102,8 @@ struct bpf_d { u_char bd_compat32; /* 32-bit stream on LP64 system */ }; +#define BPF_IS_VIRTUAL(x) ((x)->bif_ifp == NULL) + /* Values for bd_state */ #define BPF_IDLE 0 /* no select in progress */ #define BPF_WAITING 1 /* waiting for read timeout in select */ Index: sys/netpfil/ipfw/ip_fw_log.c =================================================================== --- sys/netpfil/ipfw/ip_fw_log.c (revision 243778) +++ sys/netpfil/ipfw/ip_fw_log.c (working copy) @@ -93,142 +93,31 @@ ipfw_log_bpf(int onoff) { } #else /* !WITHOUT_BPF */ -static struct ifnet *log_if; /* hook to attach to bpf */ -static struct rwlock log_if_lock; -#define LOGIF_LOCK_INIT(x) rw_init(&log_if_lock, "ipfw log_if lock") -#define LOGIF_LOCK_DESTROY(x) rw_destroy(&log_if_lock) -#define LOGIF_RLOCK(x) rw_rlock(&log_if_lock) -#define LOGIF_RUNLOCK(x) rw_runlock(&log_if_lock) -#define LOGIF_WLOCK(x) rw_wlock(&log_if_lock) -#define LOGIF_WUNLOCK(x) rw_wunlock(&log_if_lock) +static struct bpf_if *log_bpfif = NULL; /* hook to attach to bpf */ +#define BPF_IFNAME "ipfw0" +#define IPFW_MTAP(_if_bpf,_data,_dlen,_m) do { \ + if (bpf_peers_present(_if_bpf)) { \ + M_ASSERTVALID(_m); \ + bpf_mtap2((_if_bpf),(_data),(_dlen),(_m)); \ + } \ +} while (0) -static const char ipfwname[] = "ipfw"; - -/* we use this dummy function for all ifnet callbacks */ -static int -log_dummy(struct ifnet *ifp, u_long cmd, caddr_t addr) -{ - return EINVAL; -} - -static int -ipfw_log_output(struct ifnet *ifp, struct mbuf *m, - struct sockaddr *dst, struct route *ro) -{ - if (m != NULL) - FREE_PKT(m); - return EINVAL; -} - -static void -ipfw_log_start(struct ifnet* ifp) -{ - panic("ipfw_log_start() must not be called"); -} - static const u_char ipfwbroadcastaddr[6] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff }; -static int -ipfw_log_clone_match(struct if_clone *ifc, const char *name) -{ - - return (strncmp(name, ipfwname, sizeof(ipfwname) - 1) == 0); -} - -static int -ipfw_log_clone_create(struct if_clone *ifc, char 
*name, size_t len, - caddr_t params) -{ - int error; - int unit; - struct ifnet *ifp; - - error = ifc_name2unit(name, &unit); - if (error) - return (error); - - error = ifc_alloc_unit(ifc, &unit); - if (error) - return (error); - - ifp = if_alloc(IFT_PFLOG); - if (ifp == NULL) { - ifc_free_unit(ifc, unit); - return (ENOSPC); - } - ifp->if_dname = ipfwname; - ifp->if_dunit = unit; - snprintf(ifp->if_xname, IFNAMSIZ, "%s%d", ipfwname, unit); - strlcpy(name, ifp->if_xname, len); - ifp->if_mtu = 65536; - ifp->if_flags = IFF_UP | IFF_SIMPLEX | IFF_MULTICAST; - ifp->if_init = (void *)log_dummy; - ifp->if_ioctl = log_dummy; - ifp->if_start = ipfw_log_start; - ifp->if_output = ipfw_log_output; - ifp->if_addrlen = 6; - ifp->if_hdrlen = 14; - ifp->if_broadcastaddr = ipfwbroadcastaddr; - ifp->if_baudrate = IF_Mbps(10); - - LOGIF_WLOCK(); - if (log_if == NULL) - log_if = ifp; - else { - LOGIF_WUNLOCK(); - if_free(ifp); - ifc_free_unit(ifc, unit); - return (EEXIST); - } - LOGIF_WUNLOCK(); - if_attach(ifp); - bpfattach(ifp, DLT_EN10MB, 14); - - return (0); -} - -static int -ipfw_log_clone_destroy(struct if_clone *ifc, struct ifnet *ifp) -{ - int unit; - - if (ifp == NULL) - return (0); - - LOGIF_WLOCK(); - if (log_if != NULL && ifp == log_if) - log_if = NULL; - else { - LOGIF_WUNLOCK(); - return (EINVAL); - } - LOGIF_WUNLOCK(); - - unit = ifp->if_dunit; - bpfdetach(ifp); - if_detach(ifp); - if_free(ifp); - ifc_free_unit(ifc, unit); - - return (0); -} - -static struct if_clone *ipfw_log_cloner; - void ipfw_log_bpf(int onoff) { - if (onoff) { - LOGIF_LOCK_INIT(); - ipfw_log_cloner = if_clone_advanced(ipfwname, 0, - ipfw_log_clone_match, ipfw_log_clone_create, - ipfw_log_clone_destroy); - } else { - if_clone_detach(ipfw_log_cloner); - LOGIF_LOCK_DESTROY(); - } + if (onoff) { + if (log_bpfif) + return; + bpfattach3(BPF_IFNAME, "ipfw log interface", DLT_EN10MB, 14, &log_bpfif); + } else { + if (log_bpfif != NULL) + bpfdetach3(BPF_IFNAME); + log_bpfif = NULL; + } } #endif /* 
!WITHOUT_BPF */ @@ -247,20 +136,18 @@ ipfw_log(struct ip_fw *f, u_int hlen, struct ip_fw if (V_fw_verbose == 0) { #ifndef WITHOUT_BPF - LOGIF_RLOCK(); - if (log_if == NULL || log_if->if_bpf == NULL) { - LOGIF_RUNLOCK(); + if (log_bpfif == NULL) return; - } if (args->eh) /* layer2, use orig hdr */ - BPF_MTAP2(log_if, args->eh, ETHER_HDR_LEN, m); + IPFW_MTAP(log_bpfif, args->eh, ETHER_HDR_LEN, m); else - /* Add fake header. Later we will store + /* + * Add fake header. Later we will store * more info in the header. */ - BPF_MTAP2(log_if, "DDDDDDSSSSSS\x08\x00", ETHER_HDR_LEN, m); - LOGIF_RUNLOCK(); + IPFW_MTAP(log_bpfif, "DDDDDDSSSSSS\x08\x00", + ETHER_HDR_LEN, m); #endif /* !WITHOUT_BPF */ return; } Index: contrib/libpcap/pcap-bpf.c =================================================================== --- contrib/libpcap/pcap-bpf.c (revision 243778) +++ contrib/libpcap/pcap-bpf.c (working copy) @@ -132,6 +132,8 @@ static int bpf_load(char *errbuf); #include "pcap-snf.h" #endif /* HAVE_SNF_API */ +#include "pcap-freebsd.h" + #ifdef HAVE_OS_PROTO_H #include "os-proto.h" #endif @@ -2311,6 +2313,8 @@ pcap_platform_finddevs(pcap_if_t **alldevsp, char if (snf_platform_finddevs(alldevsp, errbuf) < 0) return (-1); #endif /* HAVE_SNF_API */ + if (freebsd_platform_finddevs(alldevsp, errbuf) < 0) + return (-1); return (0); } --- /dev/null 2012-12-02 04:22:01.000000000 +0400 +++ contrib/libpcap/pcap-freebsd.h 2012-12-02 02:50:44.251624161 +0400 @@ -0,0 +1 @@ +int freebsd_platform_finddevs(pcap_if_t **devlistp, char *errbuf); --- /dev/null 2012-12-02 04:22:01.000000000 +0400 +++ contrib/libpcap/pcap-freebsd.c 2012-12-02 04:22:11.404710869 +0400 @@ -0,0 +1,138 @@ +/* + * pcap-freebsd.c: Packet capture advanced interface to the FreeBSD kernel + * + * License: BSD + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. 
Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 3. The name of the author may not be used to endorse or promote products + * derived from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, + * BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; + * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED + * AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
+ */ + +#ifdef HAVE_CONFIG_H +#include "config.h" +#endif + +#include +#include +#include + +#include +#include + +#include +#include +#include +#include +#include +#include +#include + +#include "pcap-int.h" + +int +freebsd_platform_finddevs(pcap_if_t **alldevsp, char *errbuf) +{ + int ret; + + struct bpf_iflist ifl; + struct bpf_ifreply *ifr; + char *device = "/dev/bpf"; + int fd, i, len, res; + caddr_t databuf; + + if ((fd = open(device, O_RDWR)) == -1) { + snprintf(errbuf, PCAP_ERRBUF_SIZE, + "(cannot open device) %s: %s", + device, pcap_strerror(errno)); + + return (-1); + } + + res = 0; + + for (i = 0; i < 10; i++) { + /* Get size */ + memset(&ifl, 0, sizeof(ifl)); + + if (ioctl(fd, BIOCGIFLIST, (caddr_t)&ifl) != 0) { + snprintf(errbuf, PCAP_ERRBUF_SIZE, + "(cannot get interface list length): %s", + pcap_strerror(errno)); + + close(fd); + return (-1); + } + + /* Allocate requested length */ + len = ifl.ifl_len + 1024; + databuf = calloc(1, len); + + /* Try to read data */ + ifl.ifl_list = databuf; + ifl.ifl_len = len; + + if (ioctl(fd, BIOCGIFLIST, (caddr_t)&ifl) != 0) { + if (errno == ENOMEM) { + /* + * Probably new interface added. + * Let's try another time. 
+ */ + free(databuf); + databuf = NULL; + ifl.ifl_len = 0; + continue; + } + + snprintf(errbuf, PCAP_ERRBUF_SIZE, + "(cannot read interface list): %s", + pcap_strerror(errno)); + + close(fd); + return (-1); + } + + res = 1; + break; + } + + close(fd); + + if (res == 0) { + snprintf(errbuf, PCAP_ERRBUF_SIZE, + "(error reading interface list): retries exceeded"); + return (-1); + } + + /* Okay, let's parse */ + for (len = 0; len < ifl.ifl_len; ) { + ifr = (struct bpf_ifreply *)&databuf[len]; + + if (pcap_add_if(alldevsp, ifr->ifr_name, 0, + ifr->ifr_descr, errbuf) < 0) + return (-1); + + len += ifr->ifr_len; + } + + return (0); +} + --------------080701070400070005040601-- From owner-freebsd-net@FreeBSD.ORG Sun Dec 2 03:38:26 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 9DCD2364 for ; Sun, 2 Dec 2012 03:38:26 +0000 (UTC) (envelope-from moonlightakkiy@yahoo.ca) Received: from nm14-vm0.bullet.mail.bf1.yahoo.com (nm14-vm0.bullet.mail.bf1.yahoo.com [98.139.213.164]) by mx1.freebsd.org (Postfix) with ESMTP id 01B7C8FC0C for ; Sun, 2 Dec 2012 03:38:25 +0000 (UTC) Received: from [98.139.215.141] by nm14.bullet.mail.bf1.yahoo.com with NNFMP; 02 Dec 2012 03:38:24 -0000 Received: from [98.139.213.12] by tm12.bullet.mail.bf1.yahoo.com with NNFMP; 02 Dec 2012 03:38:24 -0000 Received: from [127.0.0.1] by smtp112.mail.bf1.yahoo.com with NNFMP; 02 Dec 2012 03:38:24 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.ca; s=s1024; t=1354419504; bh=gDgQb+NYGEbZG5K2h4Hfjk2kihIV/REgvVGw7DKkxic=; h=X-Yahoo-Newman-Id:X-Yahoo-Newman-Property:X-YMail-OSG:X-Yahoo-SMTP:Received:Received:MIME-Version:Received:Received:In-Reply-To:References:Date:Message-ID:Subject:From:To:Cc:Content-Type; b=NtV4e6RI5Htm0wV4/08YsYFB+KeD1IEp9WH61uqhjRKGW1BUriPp6w5ScJi/33XQW1A33OmXcLLRCAg6EMo/BEbZLhYBrbe7FBYePRycljFdrOEN2IVfO9J8j0lir5b15Tg5mPPXm3iqqeOwh7efXxzWseMMD9zm6FUyzoQBy78= 
X-Yahoo-Newman-Id: 713105.73332.bm@smtp112.mail.bf1.yahoo.com X-Yahoo-Newman-Property: ymail-3 X-YMail-OSG: ezmBgt8VM1kDHqNXfrVqnXF5lJw_JZXFwbqQszOidZgXG8k w0BDIIdHfRZaHHw3vccIm44_yQg50qbTMr79llAQuOT2O3yeabWTZEuonxYo dK58sAw3HQLU2iks5Q9KYiZsF0iltq1IW48fNJT5.TslSAloWwlt5CojYzFI lyxtfdFEkr.QWgTDH0UoTbfIws9q7343c8MO.OmeOvPkhcOoFa3Zb87Z6nfb TOnoXucSoMCFswFXyjztQwNH2L7MvkwQeUMC2Uj4zPn6H_WJqhfefUUZPVi4 QR9ha.A.phZkuBpXh09YaBCMla3OIjP2WJBrnnrmW9hY21TY7WO3bwJcYhiX KDu.OnS4mAHyvxtYiNoyl2qZD85vKgEkl_kL_.uNBl6Hr2P4.Oplw42l3uhP fXZPdqEfSID5olHOe361SnT8H8G0bWNJqAkIpMtZAjuvM.K2nJMRAfEdjfPm Afx.PmQaRgZOXAYr4vLBuFs0n5.ZZHpC.nsGnfWqBsMck8MU4F_Lu6_bm93c Z6VTj5PYNI20U0BVC5HHlh_DRb85rRrM53cUd87X.Y6aQoEit_DB8soxamZ7 eyhb64ykgz5H90iQLN5FQ7oWqaYTO6P3oumy1miedW5ZqjFEB2Jyt3klRDg- - X-Yahoo-SMTP: Xr6qjFWswBAEmd20sAvB4Q3keqXvXsIH9TjJ Received: from mail-vc0-f182.google.com (moonlightakkiy@209.85.220.182 with plain) by smtp112.mail.bf1.yahoo.com with SMTP; 01 Dec 2012 19:38:24 -0800 PST Received: by mail-vc0-f182.google.com with SMTP id fo14so1112501vcb.13 for ; Sat, 01 Dec 2012 19:38:24 -0800 (PST) MIME-Version: 1.0 Received: by 10.58.12.231 with SMTP id b7mr5401465vec.31.1354419504218; Sat, 01 Dec 2012 19:38:24 -0800 (PST) Received: by 10.58.182.72 with HTTP; Sat, 1 Dec 2012 19:38:23 -0800 (PST) In-Reply-To: References: Date: Sat, 1 Dec 2012 20:38:23 -0700 Message-ID: Subject: Re: Ralink RT2860 Driver Code From: PseudoCylon To: Ramanujan Seshadri Content-Type: text/plain; charset=ISO-8859-1 Cc: freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 02 Dec 2012 03:38:26 -0000 On Sat, Dec 1, 2012 at 3:08 PM, Ramanujan Seshadri wrote: > Hello, > Thanks for the explanation. 
In fact when I saw the code I also thought the > same, but when I tried > to print out the transmitted A-MPDUs I found something different. > > For example, > the counter AggSize15Count should print numbers in multiples > of 15, but sometimes it doesn't. > So, my understanding is that the MPDUs are written into these registers, > and then an A-MPDU is formed only when there > are enough MPDUs. For example, AggSize15Count sometimes shows the > counter as 54, so > there would be only (54/15) == 3 A-MPDUs (remainder 9). > > But then I am not sure what will happen to the remaining 9 MPDUs. Does > the register wait for 6 more MPDUs > so that it can aggregate 15 MPDUs to form 1 A-MPDU, or does it write to a > different register like > AggSize9Count, where these 9 MPDUs can get aggregated into an A-MPDU? > > Can you please explain? Maybe retransmitted packets were counted multiple times. If you need to know exactly what is going on, you have to figure it out yourself, e.g. by reading the BA (Block Ack) packets or checking what the other end is receiving. Unfortunately, this is what you need to do when you are writing a driver without proper documentation. AK > > -Ram > > On Thu, Nov 29, 2012 at 3:35 AM, PseudoCylon > wrote: >> >> On Wed, Nov 28, 2012 at 9:35 PM, Ramanujan Seshadri >> wrote: >> > Hello, >> > >> > Thanks for the reply. I just have one more question. >> > >> > In the code that updates the transmitted A-MPDU counter (Function Name: >> > NICUpdateRawCounters), I saw these lines >> > >> > pRalinkCounters->TransmittedAMPDUCount.u.LowPart += >> > TxAggCnt0.field.AggSize1Count; >> > pRalinkCounters->TransmittedAMPDUCount.u.LowPart += >> > (TxAggCnt0.field.AggSize2Count >> 1); >> > pRalinkCounters->TransmittedAMPDUCount.u.LowPart += >> > (TxAggCnt0.field.AggSize3Count /3); >> > . >> > . >> > . >> > .
>> > pRalinkCounters->TransmittedAMPDUCount.u.LowPart += >> > (TxAggCnt0.field.AggSize15Count/ 15); >> > pRalinkCounters->TransmittedAMPDUCount.u.LowPart += >> > (TxAggCnt0.field.AggSize16Count >> 4); >> > >> > Can you please explain why the 'i'th counter is being divided >> > by >> > i, for example why TxAggCnt0.field.AggSize15Count is being divided by 15? >> >> [NB] For people who haven't seen Ralink's code, the above code is >> theirs. >> >> I guess I didn't explain well. Those counters show the number of MPDU >> packets, i.e. AggSize15Count == 30 means 30 MPDUs or 2 (30/15) A-MPDU >> packets. (Because I don't have any datasheet, that's how I interpret >> Ralink's code.) >> >> > >> > Also, if these were little-endian counters, then I could not understand >> > the >> > reason why the four counters "TxAggCnt0.field.AggSize2Count, >> > TxAggCnt0.field.AggSize4Count, TxAggCnt0.field.AggSize8Count >> > and TxAggCnt0.field.AggSize16Count" are shifted right by some bits, >> > which >> > means that they are multiplying them (since they are little-endian >> > registers), >> > and why they are dividing the others. >> >> RTMP_IO_READ32() does byte swapping. The values should be saved into >> AggSizeNCount in the host's byte order. So, right shifting means dividing >> regardless of the byte order. >> >>1 == /2 >> ... >> >>4 == /16 >> They are being nice to CPUs, I think. >> >> >> AK >> >> > >> > Thanks for the help. >> > >> > -ram >> > >> > >> > On Tue, Nov 27, 2012 at 6:07 PM, PseudoCylon >> > wrote: >> >> >> >> On Tue, Nov 27, 2012 at 1:23 PM, Ramanujan Seshadri >> >> wrote: >> >> > I want to know how many MPDUs are aggregated in each A-MPDU >> >> > transmission. >> >> >> >> You could use the following statistics counters >> >> RT2860_TX_AGG_CNT0 to 7 >> >> >> >> >> >> https://gitorious.org/run/run/blobs/11n_rc3/dev/usb/wlan/if_runreg.h#line186 >> >> Each 32-bit little-endian read-on-clear register contains 2 16-bit >> >> counters (total 16 16-bit counters).
>> >> counter at offset 0x1720 MPDU count 1 >> >> counter at offset 0x1722 MPDU count 2 >> >> ... >> >> counter at offset 0x173c MPDU count 15 >> >> counter at offset 0x173e MPDU count >= 16 >> >> >> >> These regs are identical on RT2800 and RT2700 (pci/usb). >> >> >> >> Example (see #if 0 part) >> >> >> >> https://gitorious.org/run/run/blobs/11n_rc3/dev/usb/wlan/if_run.c#line2493 >> >> >> >> You can only find out statistical numbers (total Tx counts over the past X >> >> seconds). You cannot find out the MPDU count of a particular packet, i.e. >> >> an aggregated packet just Tx'd, unless you read the counters on each >> >> Tx. >> >> >> >> >> >> AK >> >> >> >> > >> >> > -ram >> >> > >> >> > >> >> > On Tue, Nov 27, 2012 at 2:11 PM, PseudoCylon >> >> > >> >> > wrote: >> >> >> >> >> >> > ------------------------------ >> >> >> > >> >> >> > Message: 12 >> >> >> > Date: Tue, 27 Nov 2012 04:33:37 -0500 >> >> >> > From: Ramanujan Seshadri >> >> >> > To: freebsd-net@freebsd.org >> >> >> > Subject: Ralink RT2860 Driver Code >> >> >> > Message-ID: >> >> >> > >> >> >> > >> >> >> > >> >> >> > Content-Type: text/plain; charset=ISO-8859-1 >> >> >> > >> >> >> > Hello, >> >> >> > How can I get the number of MPDUs aggregated in each A-MPDU in the >> >> >> > Ralink >> >> >> > driver code for RT2860? I saw the existing Ralink counters and >> >> >> > tried >> >> >> > to >> >> >> > get some info, but it was not very useful. >> >> >> > Any help is greatly appreciated. >> >> >> > >> >> >> >> >> >> https://gitorious.org/run/run/trees/11n_rc3/dev/usb/wlan >> >> >> >> >> >> What info are you trying to get?
>> >> >> >> >> >> >> >> >> AK >> >> >> >> >> >> > Thanks >> >> >> > ram >> >> >> > >> >> >> > >> >> >> > ------------------------------ >> >> >> > >> >> >> > _______________________________________________ >> >> >> > freebsd-net@freebsd.org mailing list >> >> >> > http://lists.freebsd.org/mailman/listinfo/freebsd-net >> >> >> > To unsubscribe, send any mail to >> >> >> > "freebsd-net-unsubscribe@freebsd.org" >> >> >> > >> >> >> > End of freebsd-net Digest, Vol 504, Issue 2 >> >> >> > ******************************************* >> >> > >> >> > >> > >> > > > From owner-freebsd-net@FreeBSD.ORG Sun Dec 2 03:47:47 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 9BC32441 for ; Sun, 2 Dec 2012 03:47:47 +0000 (UTC) (envelope-from moonlightakkiy@yahoo.ca) Received: from nm27-vm0.bullet.mail.bf1.yahoo.com (nm27-vm0.bullet.mail.bf1.yahoo.com [98.139.213.139]) by mx1.freebsd.org (Postfix) with ESMTP id 1FC578FC12 for ; Sun, 2 Dec 2012 03:47:47 +0000 (UTC) Received: from [98.139.212.146] by nm27.bullet.mail.bf1.yahoo.com with NNFMP; 02 Dec 2012 03:47:46 -0000 Received: from [98.139.213.2] by tm3.bullet.mail.bf1.yahoo.com with NNFMP; 02 Dec 2012 03:47:46 -0000 Received: from [127.0.0.1] by smtp102.mail.bf1.yahoo.com with NNFMP; 02 Dec 2012 03:47:46 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.ca; s=s1024; t=1354420066; bh=iP/bG+7ePlaH/SvhNV5fJrfTDDKgEwa+zdj+rgfrTr8=; h=X-Yahoo-Newman-Id:X-Yahoo-Newman-Property:X-YMail-OSG:X-Yahoo-SMTP:Received:Received:MIME-Version:Received:Received:In-Reply-To:References:Date:Message-ID:Subject:From:To:Content-Type; b=G+hwW83llhbXHQrkt02IGpVCXCERVyXrMHHZU/hABulEBMcdKYRcuUBR7WkYKrqo368Cj8JGhDpowqaLfK57f6Q0CCGARa7eJ4n7cQEUsJ9M9lIoNGbknv8aXEPD//z2DsofYn7Z9bJg8DTa0H1Xob4twfLLHytIIUO7DApkGHk= X-Yahoo-Newman-Id: 630941.51747.bm@smtp102.mail.bf1.yahoo.com X-Yahoo-Newman-Property: ymail-3 X-YMail-OSG: 
dwI.zrgVM1nVRoO5SHQ3rdelXoFlf4psDFkkZfAOBJcrJSx MFvjJnXW0cNEPSLKwbP2sd8rcagmzj_E0m31sXrRDpyjEgxgbxICFD.TZmlf TC_sEZDiReH2vayCY33pAp0VBFeuvLDAdJ5paSWRRF3VZVfd1g24kde8p9MN XMgUIPpJBVJgFdFurjg4PGd1JHeg.5Fd8MP1XcmELSa0syLKMcfRXy8xx3pQ sRP88sKtIFj6o3rAmz_jV_skz1AUydhg5AplzFKjFI5FJLiPfxw5znLJJrSP nJ4KKtA0jcWcO5gNQ5kaNRIpqc.D7tGwDnPLr_tg7Z8P9U13m1Iw9cchTtzC frToNACk5kvBHculMGWEirTxPjBNB3GqY1fxdS5CH.wRLVRCq9FKYzkLVIsE uC6uFT13HdR2CC143b_4cjvBu9JDCSYZabCGkHP.dPN7cQTZTfMt7xvbvnEi jfMuedhK_H0Ixr8Euoy6pqRuNzo4dTBKPcvzl1gFVRfU34uNiGS47pxa23zG 2Mh8HsxiqI066Mo5ErXktPncfjaSYGMqRt2UmRXvgePrmC2ANRsG28A-- X-Yahoo-SMTP: Xr6qjFWswBAEmd20sAvB4Q3keqXvXsIH9TjJ Received: from mail-vb0-f54.google.com (moonlightakkiy@209.85.212.54 with plain) by smtp102.mail.bf1.yahoo.com with SMTP; 01 Dec 2012 19:47:46 -0800 PST Received: by mail-vb0-f54.google.com with SMTP id l1so911690vba.13 for ; Sat, 01 Dec 2012 19:47:43 -0800 (PST) MIME-Version: 1.0 Received: by 10.220.151.72 with SMTP id b8mr5180193vcw.38.1354420063383; Sat, 01 Dec 2012 19:47:43 -0800 (PST) Received: by 10.58.182.72 with HTTP; Sat, 1 Dec 2012 19:47:43 -0800 (PST) In-Reply-To: References: Date: Sat, 1 Dec 2012 20:47:43 -0700 Message-ID: Subject: Re: freebsd-net Digest, Vol 504, Issue 7 From: PseudoCylon To: freebsd-net@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 02 Dec 2012 03:47:47 -0000 > ------------------------------ > > Message: 12 > Date: Sat, 1 Dec 2012 17:12:53 -0500 > From: Ramanujan Seshadri > To: freebsd-net@freebsd.org > Subject: MCS selected for each transmission in Ralink RT2860 > Message-ID: > > Content-Type: text/plain; charset=ISO-8859-1 > > Hello all, > i wanted to know if i can get the MCS (bit rate ) used to send each > packet in a ralink rt2860 wireless NIC. 
> I saw the Ralink code and learned that they have a rate-adaptation > algorithm to select the best > rate (when the HT_MCS parameter = 33). But I wanted to know if I can get > the details of the bit rate used > to send each packet. This is the actual MCS used: http://fxr.watson.org/fxr/source/dev/ral/rt2860.c#L1118 > > Thanks > ram > > From owner-freebsd-net@FreeBSD.ORG Sun Dec 2 14:25:30 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 8EE3E98C for ; Sun, 2 Dec 2012 14:25:30 +0000 (UTC) (envelope-from to.my.trociny@gmail.com) Received: from mail-ee0-f54.google.com (mail-ee0-f54.google.com [74.125.83.54]) by mx1.freebsd.org (Postfix) with ESMTP id 19CD38FC0C for ; Sun, 2 Dec 2012 14:25:29 +0000 (UTC) Received: by mail-ee0-f54.google.com with SMTP id c13so1375389eek.13 for ; Sun, 02 Dec 2012 06:25:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:subject:message-id:mime-version:content-type :content-disposition:user-agent; bh=i+4l6owSa/UKNxUdrhKcWTzEy90bee3D3Vvr1nhCX9M=; b=F256IzNcsYSEZS3OXuGCaiaqE+bydGBwESz6MiYtTDY/QC6ZSFXI0I3FLLxb5/2sdu gN+idF8zjBzuZhjauiXkxN9JanlKeUquEMAhHvceBJmOkJxt5jYiCTquFDMy0igBiSlt E+KDgyij953iyoVUx78ikX+NXmKMg2W+cZtJRi6zfY757dsBua1lHTJjzz+eqg2+kS08 p5XFqvCmaeg/vezIjZsvAb+y3f7G3Ch7DZJehLJRhca/HEoG3CwerH0iGjZBGeq02P9Y v4w4ZG22Sis6LVLxpwDCV4m4eaT4PkBkzhvwZux+bQuoQ1Mc/2wqs7HA84YU/RYgmeZQ NXSQ== Received: by 10.14.176.66 with SMTP id a42mr26060180eem.34.1354458329068; Sun, 02 Dec 2012 06:25:29 -0800 (PST) Received: from localhost ([178.150.115.244]) by mx.google.com with ESMTPS id w3sm24784183eel.17.2012.12.02.06.25.27 (version=TLSv1/SSLv3 cipher=OTHER); Sun, 02 Dec 2012 06:25:27 -0800 (PST) Sender: Mikolaj Golub Date: Sun, 2 Dec 2012 16:25:25 +0200 From: Mikolaj Golub To: freebsd-net@freebsd.org Subject: lagg with wireless iface: iieee80211_waitfor_parent is called
with a non-sleepable lock held Message-ID: <20121202142524.GA8207@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 02 Dec 2012 14:25:30 -0000 Hi, On my laptop I have lagg set up in failover mode between wired and wireless interfaces, as described in the Handbook: http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/network-aggregation.html#networking-lagg-wired-and-wireless At startup I have been observing WITNESS warnings like the one below: taskqueue_drain with the following non-sleepable locks held: exclusive rw if_lagg rwlock (if_lagg rwlock) r = 0 (0xfffffe000aa9d408) locked @ /home/golub/freebsd/base/head/sys/modules/if_lagg/../../net/if_lagg.c:1065 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b kdb_backtrace() at kdb_backtrace+0x39 witness_warn() at witness_warn+0x4b2 taskqueue_drain() at taskqueue_drain+0x3a ieee80211_waitfor_parent() at ieee80211_waitfor_parent+0x28 ieee80211_ioctl() at ieee80211_ioctl+0x3e9 if_setflag() at if_setflag+0xc0 ifpromisc() at ifpromisc+0x2c lagg_ioctl() at lagg_ioctl+0x7d5 if_setflag() at if_setflag+0xc0 ifpromisc() at ifpromisc+0x2c bridge_ioctl_add() at bridge_ioctl_add+0x454 bridge_ioctl() at bridge_ioctl+0x268 in_control() at in_control+0x219 ifioctl() at ifioctl+0x1896 kern_ioctl() at kern_ioctl+0x1b0 sys_ioctl() at sys_ioctl+0x11f amd64_syscall() at amd64_syscall+0x282 Xfast_syscall() at Xfast_syscall+0xfb --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x8011815ca, rsp = 0x7fffffffd3f8, rbp = 0x7fffffffd4a0 --- and eventually the panic "Sleeping thread owns a non-sleepable lock" in lagg_input, when a packet arrives while ifconfig is running.
The lagg driver takes the if_lagg rwlock before calling setflag, which ends up calling ieee80211_ioctl and ieee80211_waitfor_parent (which waits for all deferred parent interface tasks to complete). Does anybody see a way to solve this? -- Mikolaj Golub From owner-freebsd-net@FreeBSD.ORG Sun Dec 2 19:48:18 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 1F9A817D for ; Sun, 2 Dec 2012 19:48:18 +0000 (UTC) (envelope-from Choupani@gmail.com) Received: from sam.nabble.com (sam.nabble.com [216.139.236.26]) by mx1.freebsd.org (Postfix) with ESMTP id ECBF58FC15 for ; Sun, 2 Dec 2012 19:48:17 +0000 (UTC) Received: from [192.168.236.26] (helo=sam.nabble.com) by sam.nabble.com with esmtp (Exim 4.72) (envelope-from ) id 1TfFWS-0001x9-AK for freebsd-net@freebsd.org; Sun, 02 Dec 2012 11:48:16 -0800 Date: Sun, 2 Dec 2012 11:48:16 -0800 (PST) From: Choupani To: freebsd-net@freebsd.org Message-ID: <1354477696312-5766007.post@n5.nabble.com> Subject: protect common resources in kernel MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 02 Dec 2012 19:48:18 -0000 Dear all, I'm working on the kernel in FreeBSD 9. I need to protect a common resource (for example, a queue). There are 4 points that access (read/write) this common resource, as below: 1. ether_input() – hardware interrupt 2. ip_input() & ip_output() – software interrupt 3. dev_ioctl() – local I/O control in our own kernel module 4. another kernel thread Which of these is appropriate for this purpose: 1. kernel mutex (MTX_DEF) 2. kernel mutex (MTX_SPIN) 3. kernel shared/exclusive lock 4.
kernel reader/writer lock -- View this message in context: http://freebsd.1045724.n5.nabble.com/protect-common-resources-in-kernel-tp5766007.html Sent from the freebsd-net mailing list archive at Nabble.com. From owner-freebsd-net@FreeBSD.ORG Sun Dec 2 22:47:52 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id AB02E61C for ; Sun, 2 Dec 2012 22:47:52 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 7926F8FC14 for ; Sun, 2 Dec 2012 22:47:52 +0000 (UTC) Received: from fledge.watson.org (fledge.watson.org [65.122.17.41]) by cyrus.watson.org (Postfix) with ESMTPS id 2B9F946B2A; Sun, 2 Dec 2012 17:47:52 -0500 (EST) Date: Sun, 2 Dec 2012 22:47:51 +0000 (GMT) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Choupani Subject: Re: protect common resources in kernel In-Reply-To: <1354477696312-5766007.post@n5.nabble.com> Message-ID: References: <1354477696312-5766007.post@n5.nabble.com> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="621616949-1806727502-1354488472=:18806" Cc: freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 02 Dec 2012 22:47:52 -0000 This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. --621616949-1806727502-1354488472=:18806 Content-Type: TEXT/PLAIN; charset=utf-8; format=flowed Content-Transfer-Encoding: 8BIT On Sun, 2 Dec 2012, Choupani wrote: > I'm working on kernel in FreeBSD-9. I need to protect a > common resource (for example a queue).
> There are 4 points that access (read/write) this common resource, as below: > 1. ether_input() – hardware interrupt > 2. ip_input() & ip_output() – software interrupt > 3. dev_ioctl() – local I/O control in our own kernel module > 4. another kernel thread > > Which of these is appropriate for this purpose: > > 1. kernel mutex (MTX_DEF) > 2. kernel mutex (MTX_SPIN) > 3. kernel shared/exclusive lock > 4. kernel reader/writer lock Hi Choupani: Assuming you are not accessing the resource from a low-level interrupt handler ("filter") or within the scheduler, your best bets are (1) or (4), depending on whether you think you will benefit from read-locking as opposed to just write-locking. (2) should be avoided unless in the low-level interrupt/scheduler context, as it incurs additional overhead (disabling interrupts, etc.), and (3) can't be used in contexts where unbounded sleeping isn't allowed (e.g., from ithreads, within most parts of the lower network stack). Robert --621616949-1806727502-1354488472=:18806-- From owner-freebsd-net@FreeBSD.ORG Sun Dec 2 23:31:05 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 5C8E7C9 for ; Sun, 2 Dec 2012 23:31:05 +0000 (UTC) (envelope-from fineuropa06@hotmail.it) Received: from smtpdg7.aruba.it (smtpdg225.aruba.it [62.149.158.225]) by mx1.freebsd.org (Postfix) with ESMTP id A03788FC13 for ; Sun, 2 Dec 2012 23:31:04 +0000 (UTC) Received: from eliot.com ([67.205.103.205]) by smtpcmd03.ad.aruba.it with bizsmtp id WnVp1k00D4RuJtv01nVr1L; Mon, 03 Dec 2012 00:29:52 +0100 From: "GIOBBE" Subject: December: new incentives for businesses and families now available To: "freebsd-net" Content-Type: text/plain; charset=iso-8859-1 MIME-Version: 1.0 Date: Mon, 3 Dec 2012 00:29:51 +0100 X-Mailer-MsgId:
IB202VDM7zIuXUA09UEpHLlFSXUE6OjpAkLmdsY3NwbW5fPm03c3JqbW0waSxhbW8s6LS0tPGRnbGNzcG1uXy4vPm1zckFqbW1pLGE2bWs6LS0tPGRnbGNzcG1uXy4wP9m1zcmptbWksYW1rOi0tLTxkZ2xjc3Btbl8uMT5tc3JqbW1pLGFBbWs6LS0tPDVkZ2xjc3Btbl8+Zm1ya19naixncjotLS08ZGdsY3NwbW5fLi8+Zm1ya19naixncjot5LS08ZGdsY3NwbW5fLjA+Zm1ya19naixncjotLS08ZGdsY3NwbW5fLjE+Zm1ya19naixncjotLS08ZGdsY3NwbW5fLjI+ZkFtcmtfZ2osZzVyOi0tLTxkZ2xjc3Btbl8uMz5mbXJrX2dqLGdyOi0tLTxkZ2xjc3Btbl8uND5mbXJrX2dqLGdyOi0tLTxkZ2xjc3Btbl8uNT5mbXJrX2dqLGdyOi0tLTxkZ2xjc3Btbl8uNj5mbXJrX2dqLGdyPDw08IwdAM0NENkI/Mys2MEI/ Message-Id: <20121202233105.5C8E7C9@hub.freebsd.org> X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 02 Dec 2012 23:31:05 -0000 Finally, the incentives for businesses and families are here. You can find all the news by pasting " agevolazioni_italia_soluzioni " into Google. To stop receiving further messages, follow the procedure and click unsubscribe. I hope this is useful.
Ciao From owner-freebsd-net@FreeBSD.ORG Mon Dec 3 08:11:43 2012 Return-Path: Delivered-To: freebsd-net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 707C02E6; Mon, 3 Dec 2012 08:11:43 +0000 (UTC) (envelope-from glebius@FreeBSD.org) Received: from cell.glebius.int.ru (glebius.int.ru [81.19.69.10]) by mx1.freebsd.org (Postfix) with ESMTP id DC12E8FC15; Mon, 3 Dec 2012 08:11:42 +0000 (UTC) Received: from cell.glebius.int.ru (localhost [127.0.0.1]) by cell.glebius.int.ru (8.14.5/8.14.5) with ESMTP id qB38BZHS040405; Mon, 3 Dec 2012 12:11:35 +0400 (MSK) (envelope-from glebius@FreeBSD.org) Received: (from glebius@localhost) by cell.glebius.int.ru (8.14.5/8.14.5/Submit) id qB38BYgH040404; Mon, 3 Dec 2012 12:11:34 +0400 (MSK) (envelope-from glebius@FreeBSD.org) X-Authentication-Warning: cell.glebius.int.ru: glebius set sender to glebius@FreeBSD.org using -f Date: Mon, 3 Dec 2012 12:11:34 +0400 From: Gleb Smirnoff To: "Alexander V. Chernikov" Subject: Re: [CFT] Virtual BPF interfaces (was: CFR: ipfw0 pseudo-interface clonable) Message-ID: <20121203081134.GO14202@glebius.int.ru> References: <4F96D11B.2060007@FreeBSD.org> <20120425.020518.406495893112283552.hrs@allbsd.org> <4F96E71B.9020405@FreeBSD.org> <20120427.084414.1142593201575277510.hrs@allbsd.org> <4FD4AD29.3040204@FreeBSD.org> <50BAA552.1010707@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=koi8-r Content-Disposition: inline In-Reply-To: <50BAA552.1010707@FreeBSD.org> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-ipfw@FreeBSD.org, Hiroki Sato , delphij@FreeBSD.org, "freebsd-net@freebsd.org" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 03 Dec 2012 08:11:43 -0000 On Sun, Dec 02, 2012 at 04:48:18AM +0400, Alexander V. 
Chernikov wrote: A> On 10.06.2012 18:20, Alexander V. Chernikov wrote: A> > On 27.04.2012 03:44, Hiroki Sato wrote: A> >> "Alexander V. Chernikov" wrote A> >> in<4F96E71B.9020405@FreeBSD.org>: A> >> A> >> me> On 24.04.2012 21:05, Hiroki Sato wrote: A> > A> > Proof-of-concept patch attached. A> A> Hopefully, the libpcap code is easily extendable. A> New version attached: A> * BPF code is now able to use 'virtual' interfaces without a real ifnet A> * New bpfattach3() / bpfdetach3() routines were added to attach virtual A> ifaces A> * A new BIOCGIFLIST ioctl is added to permit userland to retrieve A> available virtual interfaces A> * A FreeBSD-specific 'platform_finddevs' version is added to the libpcap code A> (new file) A> A> There are some rough edges (conditional code in pcap-bpf.c, lack of A> documentation, maybe some style issues), but generally it seems to work A> and does not interfere with contrib/ code much (from my point of view). A> A> The ipfw log device was converted to use the new bpf(4) API, see attached patch. Nice proof of concept, Alexander! What prevents us from unifying all bpf providers to be "virtual" in current terms? I think if we finish the divorce between ifnet and bpf, the code would get simpler, and you can proceed further with reducing locking overhead. -- Totus tuus, Glebius.
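On the userland side of this proposal, the libpcap fragment earlier in this archive (whose final loop casts &databuf[len] to struct bpf_ifreply and advances by ifr_len) appears to treat the BIOCGIFLIST reply as a packed run of variable-length records. The C sketch below shows that kind of walk in isolation, under stated assumptions: the record layout (struct demo_ifrec_hdr) and every demo_* name are hypothetical stand-ins, not the patch's struct bpf_ifreply, whose definition is not quoted here. The quoted loop trusts ifr_len as given. This sketch additionally rejects a zero, short, or overlong length, so a malformed reply cannot make the walk spin forever or read past the end of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical record header, for illustration only (NOT the patch's
 * struct bpf_ifreply). Each record is this header followed by a
 * NUL-terminated description; rec_len covers header plus description.
 */
struct demo_ifrec_hdr {
	uint32_t rec_len;		/* total record length in bytes */
	char	 rec_name[16];		/* NUL-terminated interface name */
};

/* Called once per well-formed record; a negative return aborts the walk. */
typedef int (*demo_rec_cb)(const char *name, const char *descr, void *arg);

/*
 * Walk buf[0..buflen) record by record, advancing by rec_len the way
 * the quoted loop advances by ifr_len. Returns the number of records
 * visited, or -1 on a malformed record or callback failure.
 */
static int
demo_walk_records(const unsigned char *buf, size_t buflen,
    demo_rec_cb cb, void *arg)
{
	size_t off = 0;
	int count = 0;

	while (off < buflen) {
		struct demo_ifrec_hdr hdr;
		size_t remain = buflen - off;
		const char *descr;

		if (remain < sizeof(hdr))
			return (-1);		/* truncated header */
		/* memcpy avoids unaligned access into the packed buffer. */
		memcpy(&hdr, buf + off, sizeof(hdr));
		/* A zero/short length would loop forever; a long one overruns. */
		if (hdr.rec_len <= sizeof(hdr) || hdr.rec_len > remain)
			return (-1);
		descr = (const char *)(buf + off + sizeof(hdr));
		if (memchr(hdr.rec_name, '\0', sizeof(hdr.rec_name)) == NULL ||
		    memchr(descr, '\0', hdr.rec_len - sizeof(hdr)) == NULL)
			return (-1);		/* unterminated string */
		if (cb(hdr.rec_name, descr, arg) < 0)
			return (-1);
		count++;
		off += hdr.rec_len;
	}
	return (count);
}

/* Append one record at offset off of a buffer of size cap (demo helper). */
static size_t
demo_put_record(unsigned char *buf, size_t cap, size_t off,
    const char *name, const char *descr)
{
	struct demo_ifrec_hdr hdr;
	size_t dlen = strlen(descr) + 1;

	memset(&hdr, 0, sizeof(hdr));
	strncpy(hdr.rec_name, name, sizeof(hdr.rec_name) - 1);
	hdr.rec_len = (uint32_t)(sizeof(hdr) + dlen);
	assert(off + hdr.rec_len <= cap);	/* caller must size buf */
	memcpy(buf + off, &hdr, sizeof(hdr));
	memcpy(buf + off + sizeof(hdr), descr, dlen);
	return (off + hdr.rec_len);
}

/* Example callback: print each interface and its description. */
static int
demo_print_rec(const char *name, const char *descr, void *arg)
{
	(void)arg;
	printf("%s: %s\n", name, descr);
	return (0);
}
```

For example, two records built with demo_put_record() and walked with demo_print_rec() print one line per interface, while zeroing the second record's length makes demo_walk_records() return -1 instead of looping.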
From owner-freebsd-net@FreeBSD.ORG Mon Dec 3 11:06:48 2012 Return-Path: Delivered-To: freebsd-net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 73D0DDB for ; Mon, 3 Dec 2012 11:06:48 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 566A28FC1D for ; Mon, 3 Dec 2012 11:06:48 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id qB3B6mZC027628 for ; Mon, 3 Dec 2012 11:06:48 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id qB3B6leN027626 for freebsd-net@FreeBSD.org; Mon, 3 Dec 2012 11:06:47 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 3 Dec 2012 11:06:47 GMT Message-Id: <201212031106.qB3B6leN027626@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-net@FreeBSD.org Subject: Current problem reports assigned to freebsd-net@FreeBSD.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 03 Dec 2012 11:06:48 -0000 Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. 
Description -------------------------------------------------------------------------------- o kern/173475 net [tun] tun(4) stays opened by PID after process is term o kern/173201 net [ixgbe] [patch] Missing / broken ixgbe sysctl's and tu o kern/173137 net [em] em(4) unable to run at gigabit with 9.1-RC2 o kern/173002 net [patch] data type size problem in if_spppsubr.c o kern/172985 net [patch] [ip6] lltable leak when adding and removing IP o kern/172895 net [ixgb] [ixgbe] do not properly determine link-state o kern/172683 net [ip6] Duplicate IPv6 Link Local Addresses o kern/172675 net [netinet] [patch] sysctl_tcp_hc_list (net.inet.tcp.hos o kern/172113 net [panic] [e1000] [patch] 9.1-RC1/amd64 panices in igb(4 o kern/171840 net [ip6] IPv6 packets transmitting only on queue 0 o kern/171838 net [oce] [patch] Possible lock reversal and duplicate loc o kern/171739 net [bce] [panic] bce related kernel panic o kern/171728 net [arp] arp issue o kern/171711 net [dummynet] [panic] Kernel panic in dummynet o kern/171697 net [ip6] [ndp] panic when changing routes o kern/171532 net [ndis] ndis(4) driver includes 'pccard'-specific code, o kern/171531 net [ndis] undocumented dependency for ndis(4) o kern/171524 net [ipmi] ipmi driver crashes kernel by reboot or shutdow s kern/171508 net [epair] [request] Add the ability to name epair device o kern/171228 net [re] [patch] if_re - eeprom write issues o kern/170701 net [ppp] killl ppp or reboot with active ppp connection c o kern/170267 net [ixgbe] IXGBE_LE32_TO_CPUS is probably an unintentiona o kern/170081 net [fxp] pf/nat/jails not working if checksum offloading o kern/169898 net ifconfig(8) fails to set MTU on multiple interfaces. 
o kern/169676 net [bge] [hang] system hangs, fully or partially after re o kern/169664 net [bgp] Wrongful replacement of interface connected net o kern/169620 net [ng] [pf] ng_l2tp incoming packet bypass pf firewall o kern/169459 net [ppp] umodem/ppp/3g stopped working after update from o kern/169438 net [ipsec] ipv4-in-ipv6 tunnel mode IPsec does not work p kern/168294 net [ixgbe] [patch] ixgbe driver compiled in kernel has no o kern/168246 net [em] Multiple em(4) not working with qemu o kern/168245 net [arp] [regression] Permanent ARP entry not deleted on o kern/168244 net [arp] [regression] Unable to manually remove permanent o kern/168183 net [bce] bce driver hang system o kern/167947 net [setfib] [patch] arpresolve checks only the default FI o kern/167603 net [ip] IP fragment reassembly's broken: file transfer ov o kern/167500 net [em] [panic] Kernel panics in em driver o kern/167325 net [netinet] [patch] sosend sometimes return EINVAL with o kern/167202 net [igmp]: Sending multiple IGMP packets crashes kernel o kern/167059 net [tcp] [panic] System does panic in in_pcbbind() and ha o kern/166940 net [ipfilter] [panic] Double fault in kern 8.2 o kern/166462 net [gre] gre(4) when using a tunnel source address from c o kern/166372 net [patch] ipfilter drops UDP packets with zero checksum o kern/166285 net [arp] FreeBSD v8.1 REL p8 arp: unknown hardware addres o kern/166255 net [net] [patch] It should be possible to disable "promis o kern/165963 net [panic] [ipf] ipfilter/nat NULL pointer deference o kern/165903 net mbuf leak o kern/165643 net [net] [patch] Missing vnet restores in net/if_ethersub o kern/165622 net [ndis][panic][patch] Unregistered use of FPU in kernel s kern/165562 net [request] add support for Intel i350 in FreeBSD 7.4 o kern/165526 net [bxe] UDP packets checksum calculation whithin if_bxe o kern/165488 net [ppp] [panic] Fatal trap 12 jails and ppp , kernel wit o kern/165305 net [ip6] [request] Feature parity between IP_TOS and IPV6 o 
kern/165296 net [vlan] [patch] Fix EVL_APPLY_VLID, update EVL_APPLY_PR
o kern/165181 net [igb] igb freezes after about 2 weeks of uptime
o kern/165174 net [patch] [tap] allow tap(4) to keep its address on clos
o kern/165152 net [ip6] Does not work through the issue of ipv6 addresse
o kern/164495 net [igb] connect double head igb to switch cause system t
o kern/164490 net [pfil] Incorrect IP checksum on pfil pass from ip_outp
o kern/164475 net [gre] gre misses RUNNING flag after a reboot
o kern/164265 net [netinet] [patch] tcp_lro_rx computes wrong checksum i
o kern/163903 net [igb] "igb0:tx(0)","bpf interface lock" v2.2.5 9-STABL
o kern/163481 net freebsd do not add itself to ping route packet
o kern/162927 net [tun] Modem-PPP error ppp[1538]: tun0: Phase: Clearing
o kern/162926 net [ipfilter] Infinite loop in ipfilter with fragmented I
o kern/162558 net [dummynet] [panic] seldom dummynet panics
o kern/162153 net [em] intel em driver 7.2.4 don't compile
o kern/162110 net [igb] [panic] RELENG_9 panics on boot in IGB driver -
o kern/162028 net [ixgbe] [patch] misplaced #endif in ixgbe.c
o kern/161277 net [em] [patch] BMC cannot receive IPMI traffic after loa
o kern/160873 net [igb] igb(4) from HEAD fails to build on 7-STABLE
o kern/160750 net Intel PRO/1000 connection breaks under load until rebo
o kern/160693 net [gif] [em] Multicast packet are not passed from GIF0 t
o kern/160293 net [ieee80211] ppanic] kernel panic during network setup
o kern/160206 net [gif] gifX stops working after a while (IPv6 tunnel)
o kern/159817 net [udp] write UDPv4: No buffer space available (code=55)
o kern/159629 net [ipsec] [panic] kernel panic with IPsec in transport m
o kern/159621 net [tcp] [panic] panic: soabort: so_count
o kern/159603 net [netinet] [patch] in_ifscrubprefix() - network route c
o kern/159601 net [netinet] [patch] in_scrubprefix() - loopback route re
o kern/159294 net [em] em watchdog timeouts
o kern/159203 net [wpi] Intel 3945ABG Wireless LAN not support IBSS
o kern/158930 net [bpf] BPF element leak in ifp->bpf_if->bif_dlist
o kern/158726 net [ip6] [patch] ICMPv6 Router Announcement flooding limi
o kern/158694 net [ix] [lagg] ix0 is not working within lagg(4)
o kern/158665 net [ip6] [panic] kernel pagefault in in6_setscope()
o kern/158635 net [em] TSO breaks BPF packet captures with em driver
f kern/157802 net [dummynet] [panic] kernel panic in dummynet
o kern/157785 net amd64 + jail + ipfw + natd = very slow outbound traffi
o kern/157418 net [em] em driver lockup during boot on Supermicro X9SCM-
o kern/157410 net [ip6] IPv6 Router Advertisements Cause Excessive CPU U
o kern/157287 net [re] [panic] INVARIANTS panic (Memory modified after f
o kern/157209 net [ip6] [patch] locking error in rip6_input() (sys/netin
o kern/157200 net [network.subr] [patch] stf(4) can not communicate betw
o kern/157182 net [lagg] lagg interface not working together with epair
o kern/156877 net [dummynet] [panic] dummynet move_pkt() null ptr derefe
o kern/156667 net [em] em0 fails to init on CURRENT after March 17
o kern/156408 net [vlan] Routing failure when using VLANs vs. Physical e
o kern/156328 net [icmp]: host can ping other subnet but no have IP from
o kern/156317 net [ip6] Wrong order of IPv6 NS DAD/MLD Report
o kern/156283 net [ip6] [patch] nd6_ns_input - rtalloc_mpath does not re
o kern/156279 net [if_bridge][divert][ipfw] unable to correctly re-injec
o kern/156226 net [lagg]: failover does not announce the failover to swi
o kern/156030 net [ip6] [panic] Crash in nd6_dad_start() due to null ptr
o kern/155772 net ifconfig(8): ioctl (SIOCAIFADDR): File exists on direc
o kern/155680 net [multicast] problems with multicast
s kern/155642 net [request] Add driver for Realtek RTL8191SE/RTL8192SE W
o kern/155597 net [panic] Kernel panics with "sbdrop" message
o kern/155420 net [vlan] adding vlan break existent vlan
o kern/155177 net [route] [panic] Panic when inject routes in kernel
p kern/155030 net [igb] igb(4) DEVICE_POLLING does not work with carp(4)
o kern/155010 net [msk] ntfs-3g via iscsi using msk driver cause kernel
o kern/154943 net [gif] ifconfig gifX create on existing gifX clears IP
s kern/154851 net [request]: Port brcm80211 driver from Linux to FreeBSD
o kern/154850 net [netgraph] [patch] ng_ether fails to name nodes when t
o kern/154679 net [em] Fatal trap 12: "em1 taskq" only at startup (8.1-R
o kern/154600 net [tcp] [panic] Random kernel panics on tcp_output
o kern/154557 net [tcp] Freeze tcp-session of the clients, if in the gat
o kern/154443 net [if_bridge] Kernel module bridgestp.ko missing after u
o kern/154286 net [netgraph] [panic] 8.2-PRERELEASE panic in netgraph
o kern/154255 net [nfs] NFS not responding
o kern/154214 net [stf] [panic] Panic when creating stf interface
o kern/154185 net race condition in mb_dupcl
o kern/154169 net [multicast] [ip6] Node Information Query multicast add
o kern/154134 net [ip6] stuck kernel state in LISTEN on ipv6 daemon whic
o kern/154091 net [netgraph] [panic] netgraph, unaligned mbuf?
o conf/154062 net [vlan] [patch] change to way of auto-generatation of v
o kern/153937 net [ral] ralink panics the system (amd64 freeBSDD 8.X) wh
o kern/153936 net [ixgbe] [patch] MPRC workaround incorrectly applied to
o kern/153816 net [ixgbe] ixgbe doesn't work properly with the Intel 10g
o kern/153772 net [ixgbe] [patch] sysctls reference wrong XON/XOFF varia
o kern/153497 net [netgraph] netgraph panic due to race conditions
o kern/153454 net [patch] [wlan] [urtw] Support ad-hoc and hostap modes
o kern/153308 net [em] em interface use 100% cpu
o kern/153244 net [em] em(4) fails to send UDP to port 0xffff
o kern/152893 net [netgraph] [panic] 8.2-PRERELEASE panic in netgraph
o kern/152853 net [em] tftpd (and likely other udp traffic) fails over e
o kern/152828 net [em] poor performance on 8.1, 8.2-PRE
o kern/152569 net [net]: Multiple ppp connections and routing table prob
o kern/152235 net [arp] Permanent local ARP entries are not properly upd
o kern/152141 net [vlan] [patch] encapsulate vlan in ng_ether before out
o kern/152036 net [libc] getifaddrs(3) returns truncated sockaddrs for n
o kern/151690 net [ep] network connectivity won't work until dhclient is
o kern/151681 net [nfs] NFS mount via IPv6 leads to hang on client with
o kern/151593 net [igb] [panic] Kernel panic when bringing up igb networ
o kern/150920 net [ixgbe][igb] Panic when packets are dropped with heade
o kern/150557 net [igb] igb0: Watchdog timeout -- resetting
o kern/150251 net [patch] [ixgbe] Late cable insertion broken
o kern/150249 net [ixgbe] Media type detection broken
o bin/150224 net ppp(8) does not reassign static IP after kill -KILL co
f kern/149969 net [wlan] [ral] ralink rt2661 fails to maintain connectio
o kern/149937 net [ipfilter] [patch] kernel panic in ipfilter IP fragmen
o kern/149643 net [rum] device not sending proper beacon frames in ap mo
o kern/149609 net [panic] reboot after adding second default route
o kern/149117 net [inet] [patch] in_pcbbind: redundant test
o kern/149086 net [multicast] Generic multicast join failure in 8.1
o kern/148018 net [flowtable] flowtable crashes on ia64
o kern/147912 net [boot] FreeBSD 8 Beta won't boot on Thinkpad i1300 11
o kern/147894 net [ipsec] IPv6-in-IPv4 does not work inside an ESP-only
o kern/147155 net [ip6] setfb not work with ipv6
o kern/146845 net [libc] close(2) returns error 54 (connection reset by
f kern/146792 net [flowtable] flowcleaner 100% cpu's core load
o kern/146719 net [pf] [panic] PF or dumynet kernel panic
o kern/146534 net [icmp6] wrong source address in echo reply
o kern/146427 net [mwl] Additional virtual access points don't work on m
f kern/146394 net [vlan] IP source address for outgoing connections
o bin/146377 net [ppp] [tun] Interface doesn't clear addresses when PPP
o kern/146358 net [vlan] wrong destination MAC address
o kern/146165 net [wlan] [panic] Setting bssid in adhoc mode causes pani
o kern/146082 net [ng_l2tp] a false invaliant check was performed in ng_
o kern/146037 net [panic] mpd + CoA = kernel panic
o kern/145825 net [panic] panic: soabort: so_count
o kern/145728 net [lagg] Stops working lagg between two servers.
p kern/145600 net TCP/ECN behaves different to CE/CWR than ns2 reference
f kern/144917 net [flowtable] [panic] flowtable crashes system [regressi
o kern/144882 net MacBookPro =>4.1 does not connect to BSD in hostap wit
o kern/144874 net [if_bridge] [patch] if_bridge frees mbuf after pfil ho
o conf/144700 net [rc.d] async dhclient breaks stuff for too many people
o kern/144616 net [nat] [panic] ip_nat panic FreeBSD 7.2
f kern/144315 net [ipfw] [panic] freebsd 8-stable reboot after add ipfw
o kern/144231 net bind/connect/sendto too strict about sockaddr length
o kern/143846 net [gif] bringing gif3 tunnel down causes gif0 tunnel to
s kern/143673 net [stf] [request] there should be a way to support multi
s kern/143666 net [ip6] [request] PMTU black hole detection not implemen
o kern/143622 net [pfil] [patch] unlock pfil lock while calling firewall
o kern/143593 net [ipsec] When using IPSec, tcpdump doesn't show outgoin
o kern/143591 net [ral] RT2561C-based DLink card (DWL-510) fails to work
o kern/143208 net [ipsec] [gif] IPSec over gif interface not working
o kern/143034 net [panic] system reboots itself in tcp code [regression]
o kern/142877 net [hang] network-related repeatable 8.0-STABLE hard hang
o kern/142774 net Problem with outgoing connections on interface with mu
o kern/142772 net [libc] lla_lookup: new lle malloc failed
f kern/142518 net [em] [lagg] Problem on 8.0-STABLE with em and lagg
o kern/142018 net [iwi] [patch] Possibly wrong interpretation of beacon-
o kern/141861 net [wi] data garbled with WEP and wi(4) with Prism 2.5
f kern/141741 net Etherlink III NIC won't work after upgrade to FBSD 8,
o kern/140742 net rum(4) Two asus-WL167G adapters cannot talk to each ot
o kern/140682 net [netgraph] [panic] random panic in netgraph
f kern/140634 net [vlan] destroying if_lagg interface with if_vlan membe
o kern/140619 net [ifnet] [patch] refine obsolete if_var.h comments desc
o kern/140346 net [wlan] High bandwidth use causes loss of wlan connecti
o kern/140142 net [ip6] [panic] FreeBSD 7.2-amd64 panic w/IPv6
o kern/140066 net [bwi] install report for 8.0 RC 2 (multiple problems)
o kern/139565 net [ipfilter] ipfilter ioctl SIOCDELST broken
o kern/139387 net [ipsec] Wrong lenth of PF_KEY messages in promiscuous
o bin/139346 net [patch] arp(8) add option to remove static entries lis
o kern/139268 net [if_bridge] [patch] allow if_bridge to forward just VL
p kern/139204 net [arp] DHCP server replies rejected, ARP entry lost bef
o kern/139117 net [lagg] + wlan boot timing (EBUSY)
o kern/139058 net [ipfilter] mbuf cluster leak on FreeBSD 7.2
o kern/138850 net [dummynet] dummynet doesn't work correctly on a bridge
o kern/138782 net [panic] sbflush_internal: cc 0 || mb 0xffffff004127b00
o kern/138688 net [rum] possibly broken on 8 Beta 4 amd64: able to wpa a
o kern/138678 net [lo] FreeBSD does not assign linklocal address to loop
o kern/138407 net [gre] gre(4) interface does not come up after reboot
o kern/138332 net [tun] [lor] ifconfig tun0 destroy causes LOR if_adata/
o kern/138266 net [panic] kernel panic when udp benchmark test used as r
o kern/138177 net [ipfilter] FreeBSD crashing repeatedly in ip_nat.c:257
f kern/138029 net [bpf] [panic] periodically kernel panic and reboot
o kern/137881 net [netgraph] [panic] ng_pppoe fatal trap 12
p bin/137841 net [patch] wpa_supplicant(8) cannot verify SHA256 signed
p kern/137776 net [rum] panic in rum(4) driver on 8.0-BETA2
o bin/137641 net ifconfig(8): various problems with "vlan_device.vlan_i
o kern/137392 net [ip] [panic] crash in ip_nat.c line 2577
o kern/137372 net [ral] FreeBSD doesn't support wireless interface from
o kern/137089 net [lagg] lagg falsely triggers IPv6 duplicate address de
o bin/136994 net [patch] ifconfig(8) print carp mac address
o kern/136911 net [netgraph] [panic] system panic on kldload ng_bpf.ko t
o kern/136618 net [pf][stf] panic on cloning interface without unit numb
o kern/135502 net [periodic] Warning message raised by rtfree function i
o kern/134583 net [hang] Machine with jail freezes after random amount o
o kern/134531 net [route] [panic] kernel crash related to routes/zebra
o kern/134157 net [dummynet] dummynet loads cpu for 100% and make a syst
o kern/133969 net [dummynet] [panic] Fatal trap 12: page fault while in
o kern/133968 net [dummynet] [panic] dummynet kernel panic
o kern/133736 net [udp] ip_id not protected ...
o kern/133595 net [panic] Kernel Panic at pcpu.h:195
o kern/133572 net [ppp] [hang] incoming PPTP connection hangs the system
o kern/133490 net [bpf] [panic] 'kmem_map too small' panic on Dell r900
o kern/133235 net [netinet] [patch] Process SIOCDLIFADDR command incorre
f kern/133213 net arp and sshd errors on 7.1-PRERELEASE
o kern/133060 net [ipsec] [pfsync] [panic] Kernel panic with ipsec + pfs
o kern/132889 net [ndis] [panic] NDIS kernel crash on load BCM4321 AGN d
o conf/132851 net [patch] rc.conf(5): allow to setfib(1) for service run
o kern/132734 net [ifmib] [panic] panic in net/if_mib.c
o kern/132705 net [libwrap] [patch] libwrap - infinite loop if hosts.all
o kern/132672 net [ndis] [panic] ndis with rt2860.sys causes kernel pani
o kern/132554 net [ipl] There is no ippool start script/ipfilter magic t
o kern/132354 net [nat] Getting some packages to ipnat(8) causes crash
o kern/132277 net [crypto] [ipsec] poor performance using cryptodevice f
o kern/131781 net [ndis] ndis keeps dropping the link
o kern/131776 net [wi] driver fails to init
o kern/131753 net [altq] [panic] kernel panic in hfsc_dequeue
o kern/131601 net [ipfilter] [panic] 7-STABLE panic in nat_finalise (tcp
o bin/131567 net [socket] [patch] Update for regression/sockets/unix_cm
o bin/131365 net route(8): route add changes interpretation of network
f kern/130820 net [ndis] wpa_supplicant(8) returns 'no space on device'
o kern/130628 net [nfs] NFS / rpc.lockd deadlock on 7.1-R
o conf/130555 net [rc.d] [patch] No good way to set ipfilter variables a
o kern/130525 net [ndis] [panic] 64 bit ar5008 ndisgen-erated driver cau
o kern/130311 net [wlan_xauth] [panic] hostapd restart causing kernel pa
o kern/130109 net [ipfw] Can not set fib for packets originated from loc
f kern/130059 net [panic] Leaking 50k mbufs/hour
f kern/129719 net [nfs] [panic] Panic during shutdown, tcp_ctloutput: in
o kern/129517 net [ipsec] [panic] double fault / stack overflow
f kern/129508 net [carp] [panic] Kernel panic with EtherIP (may be relat
o kern/129219 net [ppp] Kernel panic when using kernel mode ppp
o kern/129197 net [panic] 7.0 IP stack related panic
o bin/128954 net ifconfig(8) deletes valid routes
o bin/128602 net [an] wpa_supplicant(8) crashes with an(4)
o kern/128448 net [nfs] 6.4-RC1 Boot Fails if NFS Hostname cannot be res
o bin/128295 net [patch] ifconfig(8) does not print TOE4 or TOE6 capabi
o bin/128001 net wpa_supplicant(8), wlan(4), and wi(4) issues
o kern/127826 net [iwi] iwi0 driver has reduced performance and connecti
o kern/127815 net [gif] [patch] if_gif does not set vlan attributes from
o kern/127724 net [rtalloc] rtfree: 0xc5a8f870 has 1 refs
f bin/127719 net [arp] arp: Segmentation fault (core dumped)
f kern/127528 net [icmp]: icmp socket receives icmp replies not owned by
p kern/127360 net [socket] TOE socket options missing from sosetopt()
o bin/127192 net routed(8) removes the secondary alias IP of interface
f kern/127145 net [wi]: prism (wi) driver crash at bigger traffic
o kern/126895 net [patch] [ral] Add antenna selection (marked as TBD)
o kern/126874 net [vlan]: Zebra problem if ifconfig vlanX destroy
o kern/126695 net rtfree messages and network disruption upon use of if_
o kern/126339 net [ipw] ipw driver drops the connection
o kern/126075 net [inet] [patch] internet control accesses beyond end of
o bin/125922 net [patch] Deadlock in arp(8)
o kern/125920 net [arp] Kernel Routing Table loses Ethernet Link status
o kern/125845 net [netinet] [patch] tcp_lro_rx() should make use of hard
o kern/125258 net [socket] socket's SO_REUSEADDR option does not work
o kern/125239 net [gre] kernel crash when using gre
o kern/124341 net [ral] promiscuous mode for wireless device ral0 looses
o kern/124225 net [ndis] [patch] ndis network driver sometimes loses net
o kern/124160 net [libc] connect(2) function loops indefinitely
o kern/124021 net [ip6] [panic] page fault in nd6_output()
o kern/123968 net [rum] [panic] rum driver causes kernel panic with WPA.
o kern/123892 net [tap] [patch] No buffer space available
o kern/123890 net [ppp] [panic] crash & reboot on work with PPP low-spee
o kern/123858 net [stf] [patch] stf not usable behind a NAT
o kern/123796 net [ipf] FreeBSD 6.1+VPN+ipnat+ipf: port mapping does not
o kern/123758 net [panic] panic while restarting net/freenet6
o bin/123633 net ifconfig(8) doesn't set inet and ether address in one
o kern/123559 net [iwi] iwi periodically disassociates/associates [regre
o bin/123465 net [ip6] route(8): route add -inet6 -interfac
o kern/123463 net [ipsec] [panic] repeatable crash related to ipsec-tool
o conf/123330 net [nsswitch.conf] Enabling samba wins in nsswitch.conf c
o kern/123160 net [ip] Panic and reboot at sysctl kern.polling.enable=0
o kern/122989 net [swi] [panic] 6.3 kernel panic in swi1: net
o kern/122954 net [lagg] IPv6 EUI64 incorrectly chosen for lagg devices
f kern/122780 net [lagg] tcpdump on lagg interface during high pps wedge
o kern/122685 net It is not visible passing packets in tcpdump(1)
o kern/122319 net [wi] imposible to enable ad-hoc demo mode with Orinoco
o kern/122290 net [netgraph] [panic] Netgraph related "kmem_map too smal
o kern/122252 net [ipmi] [bge] IPMI problem with BCM5704 (does not work
o kern/122033 net [ral] [lor] Lock order reversal in ral0 at bootup ieee
o bin/121895 net [patch] rtsol(8)/rtsold(8) doesn't handle managed netw
s kern/121774 net [swi] [panic] 6.3 kernel panic in swi1: net
o kern/121555 net [panic] Fatal trap 12: current process = 12 (swi1: net
o kern/121443 net [gif] [lor] icmp6_input/nd6_lookup
o kern/121437 net [vlan] Routing to layer-2 address does not work on VLA
o bin/121359 net [patch] [security] ppp(8): fix local stack overflow in
o kern/121257 net [tcp] TSO + natd -> slow outgoing tcp traffic
o kern/121181 net [panic] Fatal trap 3: breakpoint instruction fault whi
o kern/120966 net [rum] kernel panic with if_rum and WPA encryption
o kern/120566 net [request]: ifconfig(8) make order of arguments more fr
o kern/120304 net [netgraph] [patch] netgraph source assumes 32-bit time
o kern/120266 net [udp] [panic] gnugk causes kernel panic when closing U
o bin/120060 net routed(8) deletes link-level routes in the presence of
o kern/119945 net [rum] [panic] rum device in hostap mode, cause kernel
o kern/119791 net [nfs] UDP NFS mount of aliased IP addresses from a Sol
o kern/119617 net [nfs] nfs error on wpa network when reseting/shutdown
f kern/119516 net [ip6] [panic] _mtx_lock_sleep: recursed on non-recursi
o kern/119432 net [arp] route add -host -iface causes arp e
o kern/119225 net [wi] 7.0-RC1 no carrier with Prism 2.5 wifi card [regr
o kern/118727 net [netgraph] [patch] [request] add new ng_pf module
o kern/117423 net [vlan] Duplicate IP on different interfaces
o bin/117339 net [patch] route(8): loading routing management commands
o bin/116643 net [patch] [request] fstat(1): add INET/INET6 socket deta
o kern/116185 net [iwi] if_iwi driver leads system to reboot
o kern/115239 net [ipnat] panic with 'kmem_map too small' using ipnat
o kern/115019 net [netgraph] ng_ether upper hook packet flow stops on ad
o kern/115002 net [wi] if_wi timeout. failed allocation (busy bit). ifco
o kern/114915 net [patch] [pcn] pcn (sys/pci/if_pcn.c) ethernet driver f
o kern/113432 net [ucom] WARNING: attempt to net_add_domain(netgraph) af
o kern/112722 net [ipsec] [udp] IP v4 udp fragmented packet reject
o kern/112686 net [patm] patm driver freezes System (FreeBSD 6.2-p4) i38
o bin/112557 net [patch] ppp(8) lock file should not use symlink name
o kern/112528 net [nfs] NFS over TCP under load hangs with "impossible p
o kern/111537 net [inet6] [patch] ip6_input() treats mbuf cluster wrong
o kern/111457 net [ral] ral(4) freeze
o kern/110284 net [if_ethersubr] Invalid Assumption in SIOCSIFADDR in et
o kern/110249 net [kernel] [regression] [patch] setsockopt() error regre
o kern/109470 net [wi] Orinoco Classic Gold PC Card Can't Channel Hop
o bin/108895 net pppd(8): PPPoE dead connections on 6.2 [regression]
o kern/107944 net [wi] [patch] Forget to unlock mutex-locks
o conf/107035 net [patch] bridge(8): bridge interface given in rc.conf n
o kern/106444 net [netgraph] [panic] Kernel Panic on Binding to an ip to
o kern/106316 net [dummynet] dummynet with multipass ipfw drops packets
o kern/105945 net Address can disappear from network interface
s kern/105943 net Network stack may modify read-only mbuf chain copies
o bin/105925 net problems with ifconfig(8) and vlan(4) [regression]
o kern/104851 net [inet6] [patch] On link routes not configured when usi
o kern/104751 net [netgraph] kernel panic, when getting info about my tr
o kern/103191 net Unpredictable reboot
o kern/103135 net [ipsec] ipsec with ipfw divert (not NAT) encodes a pac
o kern/102540 net [netgraph] [patch] supporting vlan(4) by ng_fec(4)
o conf/102502 net [netgraph] [patch] ifconfig name does't rename netgrap
o kern/102035 net [plip] plip networking disables parallel port printing
o kern/101948 net [ipf] [panic] Kernel Panic Trap No 12 Page Fault - cau
o kern/100709 net [libc] getaddrinfo(3) should return TTL info
o kern/100519 net [netisr] suggestion to fix suboptimal network polling
o kern/98978 net [ipf] [patch] ipfilter drops OOW packets under 6.1-Rel
o kern/98597 net [inet6] Bug in FreeBSD 6.1 IPv6 link-local DAD procedu
o bin/98218 net wpa_supplicant(8) blacklist not working
o kern/97306 net [netgraph] NG_L2TP locks after connection with failed
o conf/97014 net [gif] gifconfig_gif? in rc.conf does not recognize IPv
f kern/96268 net [socket] TCP socket performance drops by 3000% if pack
o kern/95519 net [ral] ral0 could not map mbuf
o kern/95288 net [pppd] [tty] [panic] if_ppp panic in sys/kern/tty_subr
o kern/95277 net [netinet] [patch] IP Encapsulation mask_match() return
o kern/95267 net packet drops periodically appear
f kern/93378 net [tcp] Slow data transfer in Postfix and Cyrus IMAP (wo
o kern/93019 net [ppp] ppp and tunX problems: no traffic after restarti
o kern/92880 net [libc] [patch] almost rewritten inet_network(3) functi
s kern/92279 net [dc] Core faults everytime I reboot, possible NIC issu
o kern/91859 net [ndis] if_ndis does not work with Asus WL-138
s kern/91777 net [ipf] [patch] wrong behaviour with skip rule inside an
o kern/91364 net [ral] [wep] WF-511 RT2500 Card PCI and WEP
o kern/91311 net [aue] aue interface hanging
o kern/87521 net [ipf] [panic] using ipfilter "auth" keyword leads to k
o kern/87421 net [netgraph] [panic]: ng_ether + ng_eiface + if_bridge
o kern/86871 net [tcp] [patch] allocation logic for PCBs in TIME_WAIT s
o kern/86427 net [lor] Deadlock with FASTIPSEC and nat
o kern/86103 net [ipf] Illegal NAT Traversal in IPFilter
o kern/85780 net 'panic: bogus refcnt 0' in routing/ipv6
o bin/85445 net ifconfig(8): deprecated keyword to ifconfig inoperativ
p kern/85320 net [gre] [patch] possible depletion of kernel stack in ip
o bin/82975 net route change does not parse classfull network as given
o kern/82881 net [netgraph] [panic] ng_fec(4) causes kernel panic after
o kern/82468 net Using 64MB tcp send/recv buffers, trafficflow stops, i
o bin/82185 net [patch] ndp(8) can delete the incorrect entry
o kern/81095 net IPsec connection stops working if associated network i
o kern/78968 net FreeBSD freezes on mbufs exhaustion (network interface
o kern/78090 net [ipf] ipf filtering on bridged packets doesn't work if
o kern/77341 net [ip6] problems with IPV6 implementation
s kern/77195 net [ipf] [patch] ipfilter ioctl SIOCGNATL does not match
o kern/75873 net Usability problem with non-RFC-compliant IP spoof prot
s kern/75407 net [an] an(4): no carrier after short time
a kern/71474 net [route] route lookup does not skip interfaces marked d
o kern/71469 net default route to internet magically disappears with mu
o kern/70904 net [ipf] ipfilter ipnat problem with h323 proxy support
o kern/68889 net [panic] m_copym, length > size of mbuf chain
o kern/66225 net [netgraph] [patch] extend ng_eiface(4) control message
o kern/65616 net IPSEC can't detunnel GRE packets after real ESP encryp
s kern/60293 net [patch] FreeBSD arp poison patch
a kern/56233 net IPsec tunnel (ESP) over IPv6: MTU computation is wrong
s bin/41647 net ifconfig(8) doesn't accept lladdr along with inet addr
o kern/39937 net ipstealth issue
a kern/38554 net [patch] changing interface ipaddress doesn't seem to w
o kern/34665 net [ipf] [hang] ipfilter rcmd proxy "hangs".
o kern/31940 net ip queue length too short for >500kpps
o kern/31647 net [libc] socket calls can return undocumented EINVAL
o kern/30186 net [libc] getaddrinfo(3) does not handle incorrect servna
o kern/27474 net [ipf] [ppp] Interactive use of user PPP and ipfilter c
f kern/24959 net [patch] proper TCP_NOPUSH/TCP_CORK compatibility
o conf/23063 net [arp] [patch] for static ARP tables in rc.network
o kern/21998 net [socket] [patch] ident only for outgoing connections
o kern/5877 net [socket] sb_cc counts control data as well as data dat

428 problems total.
From owner-freebsd-net@FreeBSD.ORG Mon Dec 3 12:21:48 2012 Return-Path: Delivered-To: freebsd-net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id ABDB230F; Mon, 3 Dec 2012 12:21:48 +0000 (UTC) (envelope-from melifaro@FreeBSD.org) Received: from mail.ipfw.ru (unknown [IPv6:2a01:4f8:120:6141::2]) by mx1.freebsd.org (Postfix) with ESMTP id 39C778FC12; Mon, 3 Dec 2012 12:21:48 +0000 (UTC) Received: from [2a02:6b8:0:401:222:4dff:fe50:cd2f] (helo=dhcp170-36-red.yandex.net) by mail.ipfw.ru with esmtpsa (TLSv1:CAMELLIA256-SHA:256) (Exim 4.76 (FreeBSD)) (envelope-from ) id 1TfV5I-00053l-89; Mon, 03 Dec 2012 16:25:16 +0400 Message-ID: <50BC989E.3080303@FreeBSD.org> Date: Mon, 03 Dec 2012 16:18:38 +0400 From: "Alexander V. Chernikov" User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:13.0) Gecko/20120627 Thunderbird/13.0.1 MIME-Version: 1.0 To: Gleb Smirnoff Subject: Re: [CFT] Virtual BPF interfaces References: <4F96D11B.2060007@FreeBSD.org> <20120425.020518.406495893112283552.hrs@allbsd.org> <4F96E71B.9020405@FreeBSD.org> <20120427.084414.1142593201575277510.hrs@allbsd.org> <4FD4AD29.3040204@FreeBSD.org> <50BAA552.1010707@FreeBSD.org> <20121203081134.GO14202@glebius.int.ru> In-Reply-To: <20121203081134.GO14202@glebius.int.ru> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-ipfw@FreeBSD.org, Hiroki Sato , delphij@FreeBSD.org, "freebsd-net@freebsd.org" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 03 Dec 2012 12:21:48 -0000 On 03.12.2012 12:11, Gleb Smirnoff wrote: > On Sun, Dec 02, 2012 at 04:48:18AM +0400, Alexander V. Chernikov wrote: > A> On 10.06.2012 18:20, Alexander V. Chernikov wrote: > A> > On 27.04.2012 03:44, Hiroki Sato wrote: > A> >> "Alexander V. 
Chernikov" wrote > A> >> in<4F96E71B.9020405@FreeBSD.org>: > A> >> > A> >> me> On 24.04.2012 21:05, Hiroki Sato wrote: > A> > > A> > Proof-of-concept patch attached. > A> > A> Hopefully, libcap code is easily extendable. > A> New version attached: > A> * BPF code is now able to use 'virtual' interfaces without real ifnet > A> * New bpfattach3() / bpfdetach3() routines were added to attach virtual > A> ifaces > A> * New BIOCGIFLIST ioctl is added to permit userland to retrieve > A> available virtual interfaces > A> * freebsd-specific 'platform_finddevs' version is added to libpcap code > A> (new file) > A> > A> There are some rough edges (conditional code in pcap-bpf.c, lack of > A> documentation, maybe some style issues), but generally it seems to work > A> and does not interfere with contrib/ code much (from my point of view). > A> > A> ipfw log device was converted to use new bpf(4) api, see attached patch. > > Nice proof of concept, Alexander! > > What does prevent us from unifing all bpf providers to be "virtual" in > current terms? I think if we finish divorce between ifnet and bpf, the code > would get simplier and you can proceed further with reducing locking > overhead. We have to jump from ifnet to the list of per-ifnet BPF consumers somehow, so I'm not sure if we can do much more here. BPF itself doesn't require much from parent ifnet. What I really want to do next is the following: 1) Make BPF_PEERS_PRESENT(ifp) to be (ifp->if_bpf != NULL). This saves some processing time and permits 'bpf_if' to be be totally opaque without any hacks. 2) Set if_bpf pointer IFF there are some consumers (and set it back to NULL when all consumers are detached). This should work well for 'main' BPF DLT, but single (currently, 802.11) interface can hold more than one DLTs. Probably we can save dst pointer passed to bpfattach2() to given bpf_if structure, and set this value instead of ->if_bpf. 
This, however, can lead to hard-to-find problems, since bpfattach[2] is usually not called by driver directly. > -- WBR, Alexander From owner-freebsd-net@FreeBSD.ORG Mon Dec 3 15:36:57 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 087A86F0 for ; Mon, 3 Dec 2012 15:36:57 +0000 (UTC) (envelope-from freebsd-net@m.gmane.org) Received: from plane.gmane.org (plane.gmane.org [80.91.229.3]) by mx1.freebsd.org (Postfix) with ESMTP id A76FB8FC15 for ; Mon, 3 Dec 2012 15:36:56 +0000 (UTC) Received: from list by plane.gmane.org with local (Exim 4.69) (envelope-from ) id 1TfY4u-0000mO-Sl for freebsd-net@freebsd.org; Mon, 03 Dec 2012 16:37:04 +0100 Received: from broadband-77-37-234-86.nationalcablenetworks.ru ([77.37.234.86]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Mon, 03 Dec 2012 16:37:04 +0100 Received: from vadim_nuclight by broadband-77-37-234-86.nationalcablenetworks.ru with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Mon, 03 Dec 2012 16:37:04 +0100 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-net@freebsd.org From: Vadim Goncharov Subject: Re: [CFT] Virtual BPF interfaces Date: Mon, 3 Dec 2012 15:36:33 +0000 (UTC) Organization: Nuclear Lightning @ Moscow, Home Lines: 63 Message-ID: References: <4F96D11B.2060007@FreeBSD.org> <20120425.020518.406495893112283552.hrs@allbsd.org> <4F96E71B.9020405@FreeBSD.org> <20120427.084414.1142593201575277510.hrs@allbsd.org> <4FD4AD29.3040204@FreeBSD.org> <50BAA552.1010707@FreeBSD.org> <20121203081134.GO14202@glebius.int.ru> <50BC989E.3080303@FreeBSD.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Complaints-To: usenet@ger.gmane.org X-Gmane-NNTP-Posting-Host: broadband-77-37-234-86.nationalcablenetworks.ru X-Comment-To: Alexander V. 
Chernikov User-Agent: slrn/0.9.9p1 (FreeBSD) Cc: freebsd-ipfw@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: vadim_nuclight@mail.ru List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 03 Dec 2012 15:36:57 -0000 Hi Alexander V. Chernikov! On Mon, 03 Dec 2012 16:18:38 +0400; Alexander V. Chernikov wrote about 'Re: [CFT] Virtual BPF interfaces': > On 03.12.2012 12:11, Gleb Smirnoff wrote: >> On Sun, Dec 02, 2012 at 04:48:18AM +0400, Alexander V. Chernikov wrote: >> A> On 10.06.2012 18:20, Alexander V. Chernikov wrote: >> A>> On 27.04.2012 03:44, Hiroki Sato wrote: >> A>>> "Alexander V. Chernikov" wrote >> A>>> in<4F96E71B.9020405@FreeBSD.org>: >> A>>> >> A>>> me> On 24.04.2012 21:05, Hiroki Sato wrote: >> A>> >> A>> Proof-of-concept patch attached. >> A> >> A> Hopefully, libcap code is easily extendable. >> A> New version attached: >> A> * BPF code is now able to use 'virtual' interfaces without real ifnet >> A> * New bpfattach3() / bpfdetach3() routines were added to attach virtual >> A> ifaces >> A> * New BIOCGIFLIST ioctl is added to permit userland to retrieve >> A> available virtual interfaces >> A> * freebsd-specific 'platform_finddevs' version is added to libpcap code >> A> (new file) >> A> >> A> There are some rough edges (conditional code in pcap-bpf.c, lack of >> A> documentation, maybe some style issues), but generally it seems to work >> A> and does not interfere with contrib/ code much (from my point of view). >> A> >> A> ipfw log device was converted to use new bpf(4) api, see attached patch. >> >> Nice proof of concept, Alexander! >> >> What does prevent us from unifing all bpf providers to be "virtual" in >> current terms? I think if we finish divorce between ifnet and bpf, the code >> would get simplier and you can proceed further with reducing locking >> overhead. 
> We have to jump from ifnet to the list of per-ifnet BPF consumers > somehow, so I'm not sure if we can do much more here. BPF itself doesn't > require much from parent ifnet. > What I really want to do next is the following: > 1) Make BPF_PEERS_PRESENT(ifp) to be (ifp->if_bpf != NULL). This saves > some processing time and permits 'bpf_if' to be be totally opaque > without any hacks. > 2) Set if_bpf pointer IFF there are some consumers (and set it back to > NULL when all consumers are detached). This should work well for 'main' > BPF DLT, but single (currently, 802.11) interface can hold more than one > DLTs. Probably we can save dst pointer passed to bpfattach2() to given There probably will be more of them when we will support tcpdump -i iggroupnam as admin can decide to move to one group interfaces with defferent DLTs. > bpf_if structure, and set this value instead of ->if_bpf. > This, however, can lead to hard-to-find problems, since bpfattach[2] is > usually not called by driver directly. -- WBR, Vadim Goncharov. 
ICQ#166852181 mailto:vadim_nuclight@mail.ru [Anti-Greenpeace][Sober FreeBSD zealot][http://nuclight.livejournal.com] From owner-freebsd-net@FreeBSD.ORG Mon Dec 3 16:15:34 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 7290B399 for ; Mon, 3 Dec 2012 16:15:34 +0000 (UTC) (envelope-from keith.arner@gmail.com) Received: from mail-ee0-f54.google.com (mail-ee0-f54.google.com [74.125.83.54]) by mx1.freebsd.org (Postfix) with ESMTP id 0429E8FC14 for ; Mon, 3 Dec 2012 16:15:33 +0000 (UTC) Received: by mail-ee0-f54.google.com with SMTP id c13so2065218eek.13 for ; Mon, 03 Dec 2012 08:15:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:date:x-google-sender-auth:message-id:subject :from:to:content-type; bh=EmVW8Why0F+/C0Bq8S3HSCIg+0ECaNUG15YxnZ28rxM=; b=YNEheMW7L8z//47eHhKytnH+aj0J+8XtIJ72PA2BsYC2DEd3L7tfoTtl4pe2Vz9ZnF qIe9ZGP+eHeOmWENkqGmBzookx/LNiG4SDG+ne+qyjrDfoyTquWnjGux9RLiB3gxDZ/8 wLbyYDgurmj/bLgqQFfrPuQt39G0DxNPMzHsliLigPok7l+jRTTy/XD9gE9Ti+dyxgH7 rUIpHT+PbhURvH+uQ6Pag25iGY6zgynnCWrej0rRQ2TjXl4C1Cdf+W5hpgvVKb+7k3k4 DFYpOQ2q2qTOx1vNJtadnqcjavoVd56NutsloKSjScSP5pJROi+W0F5y67tlHT/G22W5 cOsA== MIME-Version: 1.0 Received: by 10.14.218.69 with SMTP id j45mr37814254eep.35.1354551332826; Mon, 03 Dec 2012 08:15:32 -0800 (PST) Sender: keith.arner@gmail.com Received: by 10.14.48.1 with HTTP; Mon, 3 Dec 2012 08:15:32 -0800 (PST) Date: Mon, 3 Dec 2012 11:15:32 -0500 X-Google-Sender-Auth: 86FYCYWSKuVss6tQFoQnOPMLK7Q Message-ID: Subject: Re: Problems with ephemeral port selection From: Keith Arner To: freebsd-net@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 03 Dec 2012 
16:15:34 -0000 > Date: Sat, 01 Dec 2012 09:28:05 +0100 > From: Andre Oppermann > > On 30.11.2012 15:09, Keith Arner wrote: >> I've noticed some issues with ephemeral port number selection from >> tcp_connect(), > > this is an excellent analysis. Could you please file it as a problem > report too and post the PR-number here so we can better track it? Done. PR-number is: kern/174087 > From: Fernando Gont > Subject: Re: Problems with ephemeral port selection > > Please take a look at the discussion on how to "steal" incoming > connections in Section 3.1 of RFC 6056. Fair point. I added your comment to kern/174087 when I filed it. The points made in RFC 6056 actually answer a few outstanding questions I had about why in_pcbbind_setup() behaves the way it does. In particular, I previously couldn't figure out why it was giving special consideration to unconnected sockets. With that in mind, I believe the criteria for check_suitable_port() (as described by RFC 6056) should be*: A candidate ephemeral port is suitable if and only if: 1) There is no other existing local socket with the same 5-tuple. 2) There is no local socket using the same local port number, and with either a wildcard fport or wildcard faddr. I had previously suggested using in_pcblookup_hash() as a check_suitable_port() function. That would suffice for criterion #1, but would fall short for criterion #2. It looks like we need yet another pcb lookup function. Keith * Yes, I realize that my terminology freely mixes the abstract concepts in the RFC with the concrete language of the FreeBSD implementation. -- "A problem well put is half solved."
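Keith's two criteria can be written as one predicate. The sketch below is a hypothetical user-space model, not FreeBSD code: `struct pcb_entry`, the `WILDCARD` value of 0 and the exact signature of `check_suitable_port()` are assumptions made for illustration (the function name is Keith's, the parameters are not), and the protocol is taken to be TCP so that four fields identify a connection. A real implementation would use the pcb hash tables rather than a linear scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an inpcb. Addresses and ports are
 * in host byte order; 0 means "wildcard" (unconnected or listening). */
struct pcb_entry {
    uint32_t laddr;
    uint16_t lport;
    uint32_t faddr;
    uint16_t fport;
};

#define WILDCARD 0

/* Criterion 1: an existing socket already uses exactly this 5-tuple. */
static bool same_tuple(const struct pcb_entry *p, uint32_t laddr,
                       uint16_t lport, uint32_t faddr, uint16_t fport)
{
    return p->laddr == laddr && p->lport == lport &&
           p->faddr == faddr && p->fport == fport;
}

/* Criterion 2: a socket on the same local port with a wildcard foreign
 * address or foreign port (e.g. a listener) would be shadowed. */
static bool wildcard_on_port(const struct pcb_entry *p, uint16_t lport)
{
    return p->lport == lport &&
           (p->faddr == WILDCARD || p->fport == WILDCARD);
}

/* Returns true if lport is a suitable ephemeral port for the connection
 * laddr:lport -> faddr:fport, given the n existing entries in tbl. */
bool check_suitable_port(const struct pcb_entry *tbl, size_t n,
                         uint32_t laddr, uint16_t lport,
                         uint32_t faddr, uint16_t fport)
{
    assert(tbl != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (same_tuple(&tbl[i], laddr, lport, faddr, fport))
            return false;
        if (wildcard_on_port(&tbl[i], lport))
            return false;
    }
    return true;
}
```

With this model, a second connection from the same local port to a different foreign address is allowed, while any listener on that port makes the port unsuitable.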
From owner-freebsd-net@FreeBSD.ORG Mon Dec 3 17:34:16 2012 Return-Path: Delivered-To: freebsd-net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id EA2FFF0C for ; Mon, 3 Dec 2012 17:34:15 +0000 (UTC) (envelope-from andre@freebsd.org) Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2]) by mx1.freebsd.org (Postfix) with ESMTP id 3E2578FC1E for ; Mon, 3 Dec 2012 17:34:14 +0000 (UTC) Received: (qmail 92577 invoked from network); 3 Dec 2012 19:04:45 -0000 Received: from c00l3r.networx.ch (HELO [127.0.0.1]) ([62.48.2.2]) (envelope-sender ) by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP for ; 3 Dec 2012 19:04:45 -0000 Message-ID: <50BCE294.4070409@freebsd.org> Date: Mon, 03 Dec 2012 18:34:12 +0100 From: Andre Oppermann User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:16.0) Gecko/20121026 Thunderbird/16.0.2 MIME-Version: 1.0 To: "Alexander V. Chernikov" Subject: Re: [CFT] Virtual BPF interfaces References: <4F96D11B.2060007@FreeBSD.org> <20120425.020518.406495893112283552.hrs@allbsd.org> <4F96E71B.9020405@FreeBSD.org> <20120427.084414.1142593201575277510.hrs@allbsd.org> <4FD4AD29.3040204@FreeBSD.org> <50BAA552.1010707@FreeBSD.org> <20121203081134.GO14202@glebius.int.ru> <50BC989E.3080303@FreeBSD.org> In-Reply-To: <50BC989E.3080303@FreeBSD.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-ipfw@FreeBSD.org, delphij@FreeBSD.org, Hiroki Sato , "freebsd-net@freebsd.org" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 03 Dec 2012 17:34:16 -0000 On 03.12.2012 13:18, Alexander V. Chernikov wrote: > On 03.12.2012 12:11, Gleb Smirnoff wrote: >> On Sun, Dec 02, 2012 at 04:48:18AM +0400, Alexander V. Chernikov wrote: >> A> On 10.06.2012 18:20, Alexander V. 
Chernikov wrote: >> A> > On 27.04.2012 03:44, Hiroki Sato wrote: >> A> >> "Alexander V. Chernikov" wrote >> A> >> in<4F96E71B.9020405@FreeBSD.org>: >> A> >> >> A> >> me> On 24.04.2012 21:05, Hiroki Sato wrote: >> A> > >> A> > Proof-of-concept patch attached. >> A> >> A> Hopefully, libpcap code is easily extendable. >> A> New version attached: >> A> * BPF code is now able to use 'virtual' interfaces without real ifnet >> A> * New bpfattach3() / bpfdetach3() routines were added to attach virtual >> A> ifaces >> A> * New BIOCGIFLIST ioctl is added to permit userland to retrieve >> A> available virtual interfaces >> A> * freebsd-specific 'platform_finddevs' version is added to libpcap code >> A> (new file) >> A> >> A> There are some rough edges (conditional code in pcap-bpf.c, lack of >> A> documentation, maybe some style issues), but generally it seems to work >> A> and does not interfere with contrib/ code much (from my point of view). >> A> >> A> ipfw log device was converted to use new bpf(4) api, see attached patch. >> >> Nice proof of concept, Alexander! >> >> What does prevent us from unifying all bpf providers to be "virtual" in >> current terms? I think if we finish divorce between ifnet and bpf, the code >> would get simpler and you can proceed further with reducing locking >> overhead. > > We have to jump from ifnet to the list of per-ifnet BPF consumers somehow, so I'm not sure if we can > do much more here. BPF itself doesn't require much from parent ifnet. > > What I really want to do next is the following: > > 1) Make BPF_PEERS_PRESENT(ifp) to be (ifp->if_bpf != NULL). This saves some processing time and > permits 'bpf_if' to be totally opaque without any hacks. You have to be a bit careful with locking, or rather not locking. When the consumer is not doing any lock operations, it may not (immediately) pick up that the pointer was changed on another CPU.
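Andre's caveat can be illustrated with C11 atomics. In this hypothetical model (`struct ifnet_model`, `struct bpf_if_model`, `peers_present()`, `bpf_publish()` and `bpf_unpublish()` are invented names, not the real BPF API), the packet path does a single acquire load and takes no lock. The release/acquire pair only guarantees that a reader which sees the new pointer also sees the fields initialized before it was published. It does not make the change visible immediately, and it does not stop a reader from using a `bpf_if` that is being torn down, so detach would still need some form of deferred freeing. In-kernel code would presumably use the atomic(9) primitives such as `atomic_load_acq_ptr()` and `atomic_store_rel_ptr()` instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct bpf_if and struct ifnet. */
struct bpf_if_model {
    int ndescriptors;  /* number of attached consumers */
};

struct ifnet_model {
    /* NULL whenever no BPF consumer is attached. */
    _Atomic(struct bpf_if_model *) if_bpf;
};

/* Reader side (packet path): one acquire load, no lock. A reader that
 * sees the pointer also sees what the writer initialized before
 * publishing it, but it may briefly observe a stale value. */
static inline bool peers_present(struct ifnet_model *ifp)
{
    return atomic_load_explicit(&ifp->if_bpf, memory_order_acquire) != NULL;
}

/* Writer side (attach/detach), assumed to be serialized by a lock
 * that is not shown here. */
void bpf_publish(struct ifnet_model *ifp, struct bpf_if_model *bp)
{
    assert(bp->ndescriptors > 0);  /* only publish when consumers exist */
    atomic_store_explicit(&ifp->if_bpf, bp, memory_order_release);
}

void bpf_unpublish(struct ifnet_model *ifp)
{
    /* Readers already inside the tap may still hold the old pointer,
     * so the bpf_if must not be freed until they are known to be done. */
    atomic_store_explicit(&ifp->if_bpf, NULL, memory_order_release);
}
```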
> 2) Set if_bpf pointer IFF there are some consumers (and set it back to NULL when all consumers are > detached). This should work well for 'main' BPF DLT, but single (currently, 802.11) interface can > hold more than one DLTs. Probably we can save dst pointer passed to bpfattach2() to given bpf_if > structure, and set this value instead of ->if_bpf. > This, however, can lead to hard-to-find problems, since bpfattach[2] is usually not called by driver > directly. Separately from the above, BPF on the output side may be optimized by passing the mbuf to it not from drv*_start() but from drv*_txeof(). There may be a delay of a few microseconds, but an mbuf (chain) copy is saved in the transmit path. As an additional benefit, only those packets that were actually transmitted are presented to bpf. -- Andre From owner-freebsd-net@FreeBSD.ORG Tue Dec 4 01:31:42 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id AE569D5C for ; Tue, 4 Dec 2012 01:31:42 +0000 (UTC) (envelope-from prvs=1685a61a7f=evendas@krazer.com.br) Received: from krazer.com.br (usaimport.com.br [74.208.147.131]) by mx1.freebsd.org (Postfix) with ESMTP id 26AC78FC26 for ; Tue, 4 Dec 2012 01:31:35 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=simple; d=krazer.com.br; s=MDaemon; t=1354583459; x=1355188259; q=dns/txt; h=DomainKey-Signature: Received:From:To:Subject:Date:MIME-Version:Content-Type: Message-ID; bh=/AGF6JRzB/J7iTsWt8UkF/2jJ0/KNRujXWNnXTa6qHA=; b=G p90aWi6jWdzIKxXTaBrwTZEAnx2MHKnn269Yto+K0ksdyPmChUqq0JJ9hckrPHVE 3fqbjEH6MmjEQPbuCNe8QxllJvD5mVUs0hEvhmtd/7Y7hO33GXfjhbQyj0cHqPwh +Upbz/ihPUjJf/e1kPi9YAasUKJcZfzS1gCjqMLP88= DomainKey-Signature: a=rsa-sha1; s=MDaemon; d=krazer.com.br; c=simple; q=dns; h=from:message-id; b=GZXyIIwtB+n/RLrcRxT88fMg0TRyw12rdPESYhDUkCpDbl0dd6UKSQnxkp/N 8CONNejRMW+xT6VzsrFkbNFhYjYVZNQDvLcby0Zjzp6jeM/GQyNO1V4Ev c8GDl2sfpdYjWnB36opLpzY0brBHOLnzHm5cz2e/Yiz6O0uv3lxzfA=;
X-MDAV-Processed: allearth.com.br, Mon, 03 Dec 2012 23:10:59 -0200 Received: from krazer by allearth.com.br (MDaemon PRO v11.0.0) with ESMTP id md50003170654.msg for ; Mon, 03 Dec 2012 23:10:58 -0200 X-Spam-Processed: allearth.com.br, Mon, 03 Dec 2012 23:10:58 -0200 (not processed: message from trusted or authenticated source) X-Authenticated-Sender: evendas@krazer.com.br X-MDRemoteIP: 74.208.167.75 X-Return-Path: prvs=1685a61a7f=evendas@krazer.com.br X-Envelope-From: evendas@krazer.com.br X-MDaemon-Deliver-To: freebsd-net@freebsd.org From: "Vendas Krazer Technologies" To: Subject: =?utf-8?B?Tm92YSBDUEUgS3JhemVyIFNreSBTdGF0aW9uIDVHSHo=?= =?utf-8?B?IE4gLSBDUEUgQW50ZW5hIEludGVncmFkYSBkZSAxOGRCaQ==?= =?utf-8?B?IGUgQ29tIFNhw61kYSBwYXJhIEFudGVuYSBFeHRlcm5h?= Date: Mon, 03 Dec 2012 22:08:16 -0200 MIME-Version: 1.0 Content-Type: multipart/related; boundary="----=45652905_3502_4801_0078_850943129657" Message-ID: X-Mailer: Clientes Krazer X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 04 Dec 2012 01:31:42 -0000 This is a multi-part message in MIME format. ------=45652905_3502_4801_0078_850943129657 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Launch: Krazer Sky Station 5GHz N CPE. You, our customers, asked Krazer to build a new CPE in a more stylish, smaller form factor with new features, specifically easy access to the reset button, burn-out protection and the much-desired EXTERNAL ANTENNA OUTPUT!!!
R$ 179.90 Integrated 18dBi 60° antenna. Two network ports, LAN and WAN. Real 630mW PA and ultra-high-gain LNA. Passive PoE with dual protection, 12 to 24V. 12V switching power supply, full range 110 to 220V. Exclusive external antenna output. Anatel certification 0269-11-5280. Quick and simple installation. User-friendly software in Portuguese! PPPoE WISP client support! Bandwidth control! Excellent reception! Long range! Test it on your network and compare it with the competition: much more signal than UBNT, much more data, nearly 90Mbps of continuous TCP/IP throughput! Network latency of 1 to 5 ms under full load! Contact us: Val Campos // Carla Maria // Eder Roberto Email / MSN: vendas@allearth.com.br Sales / Customer Service (19) 3256-5557 (19) 3245-0708 www.krazer.com.br Sending unauthorized email is a crime; don't be the villain of the story! Email is protected under fiscal and federal confidentiality. Brazilian Federal Law.
------=45652905_3502_4801_0078_850943129657-- From owner-freebsd-net@FreeBSD.ORG Tue Dec 4 18:43:03 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 87770496 for ; Tue, 4 Dec 2012 18:43:03 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id 474488FC1D for ; Tue, 4 Dec 2012 18:43:03 +0000 (UTC) Received: from pakbsde14.localnet (unknown [38.105.238.108]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 77BE5B9BC; Tue, 4 Dec 2012 13:43:02 -0500 (EST) From: John Baldwin To: Barney Cordoba Subject: Re: Latency issues with buf_ring Date: Tue, 4 Dec 2012 11:08:17 -0500 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p22; KDE/4.5.5; amd64; ; ) References: <1353259441.19423.YahooMailClassic@web121605.mail.ne1.yahoo.com> In-Reply-To: <1353259441.19423.YahooMailClassic@web121605.mail.ne1.yahoo.com> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201212041108.17645.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Tue, 04 Dec 2012 13:43:02 -0500 (EST) Cc: freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 04 Dec 2012 18:43:03 -0000 On Sunday, November 18, 2012 12:24:01 pm Barney Cordoba wrote: > > --- On Thu, 1/19/12, John Baldwin wrote: > > > From: John Baldwin > > Subject: Latency issues with buf_ring > > To: net@freebsd.org > > Cc: "Ed Maste" , "Navdeep Parhar" > > Date: Thursday, January 19, 2012, 11:41 AM > > The current buf_ring usage in various > > NIC drivers has a race that can > > result in 
really high packet latencies in some cases. Specifically, the common pattern in an if_transmit routine is to use a try-lock on the queue and, if that fails, enqueue the packet in the buf_ring and return. The race, of course, is that the thread holding the lock might have just finished checking the buf_ring and found it empty, and be in the process of releasing the lock when the original thread fails the try-lock. If this happens, then the packet queued by the first thread will be stalled until another thread tries to transmit packets for that queue. Some drivers attempt to handle this race (igb(4) schedules a task to kick the transmit queue if the try-lock fails) and others don't (cxgb(4) doesn't handle it at all). At work this race was triggered very often after upgrading from 7 to 8 with bursty traffic and caused numerous problems, so it is not a rare occurrence and needs to be addressed. > > > > (Note: all patches mentioned are against 8.) > > > > The first hack I tried was to simply always lock the queue after the drbr enqueue if the try-lock failed, and then drain the queue if needed (www.freebsd.org/~jhb/patches/try_fail.patch). While this fixed my latency problems, it would seem that it breaks other workloads that the drbr design is trying to optimize. > > > > After further hacking, what I came up with was a variant of drbr_enqueue() that would atomically set a 'pending' flag during the enqueue operation. The first thread to fail the try-lock sets this flag (it is told that it set the flag by a new return value, EINPROGRESS, from the enqueue call). The pending thread then explicitly clears the flag once it acquires the queue lock.
This should prevent multiple threads from stacking up on the queue lock, so that if multiple threads are dumping packets into the ring concurrently, all but two (the one currently draining the queue and the one waiting for the lock) can continue without blocking. One downside of this approach, though, is that each driver has to be changed to make an explicit call to clear the pending flag after grabbing the queue lock if the try-lock fails. This is what I am currently running in production (www.freebsd.org/~jhb/patches/try_fail3.patch). > > > > However, this still results in a lot of duplicated code in each driver that wants to support multiq. Several folks have expressed a desire to move in a direction where the stack has explicit knowledge of transmit queues, allowing us to hoist some of this duplicated code out of the drivers and up into the calling layer. After discussing this a bit with Navdeep (np@), the approach I am looking at is to alter the buf_ring code flow a bit to more closely model the older code flow with IFQ and if_start methods. That is, have the if_transmit methods always enqueue each packet that arrives to the buf_ring and then call an if_start-like method that drains a specific transmit queue. This approach simplifies a fair bit of driver code and means we can potentially move the enqueue, etc. bits up into the calling layer and instead have drivers provide the per-transmit-queue start routine as the direct function pointer to the upper layers, a la if_start. > > > > However, we would still need a way to close the latency race. I've attempted to do that by inverting my previous 'thread pending' flag. Instead, I make the buf_ring store a 'busy' flag. This flag is managed by the single-consumer buf_ring dequeue method (that drbr_dequeue() uses).
It is set to true when a packet is removed from the queue while there are more packets pending. Conversely, if there are no other pending packets, then it is set to false. The assumption is that once a thread starts draining the queue, it will not stop until the queue is empty (or, if it has to stop for some other reason such as the transmit ring being full, the driver will restart draining the queue until it is empty, e.g. after it receives a transmit completion interrupt). Now when the if_transmit routine enqueues the packet, it will get either a real error, 0 if the packet was enqueued and the queue was idle, or EINPROGRESS if the packet was enqueued and the queue was busy. For the EINPROGRESS case the if_transmit routine just returns success. For the 0 case it does a blocking lock on the queue lock and calls the queue's start routine (note that this means that the busy flag is similar to the old OACTIVE interface flag). This does mean that in some cases one thread may be sending what was the last packet in the buf_ring while holding the lock when another thread blocks, and the first thread will then see the new packet when it loops back around, so the second thread is wasting its time spinning; but in the common case I believe it will give the same parallelism as the current code. OTOH, there is nothing to prevent multiple threads from "stacking up" in the new approach. At least the try_fail3 patch ensured that only one thread at a time would ever potentially block on the queue lock. > > > > Another approach might be to replace the 'busy' flag with the 'thread pending' flag from try_fail3.patch, but to clear the 'thread pending' flag any time the dequeue method is called rather than using an explicit 'clear pending' method.
(Hadn't thought of that until writing this e-mail.) That would perhaps prevent multiple threads from waiting on the queue lock. > > > > Note that the 'busy' approach (or the modification I mentioned above) does rely on the assumption I stated above, i.e. once a driver starts draining a queue, it will drain it until empty unless it hits an "error" condition (link went down, transmit ring full, etc.). If it hits an "error" condition, the driver is responsible for restarting transmit when the condition clears. I believe our drivers already work this way now. > > > > The 'busy' patch is at http://www.freebsd.org/~jhb/patches/drbr.patch > > > > -- > > John Baldwin > > Q1: Has this been corrected? No. I have yet to be able to raise a meaningful discussion about possible solutions to this. > Q2: Are there any case studies or benchmarks for buf_ring, or is it just blindly being used because someone claimed it was better and offered it for free? One of the points of locking is to avoid race conditions, so the fact that you have races in a supposed lock-less scheme seems more than just ironic. The buf_ring author claims it has benefits in high pps workloads. I am not aware of any benchmarks, etc.
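The 'busy' flag protocol described in the quoted mail can be modelled in a few dozen lines. This is a hypothetical sketch, not the code in drbr.patch: the ring is a small array guarded by a spin lock (the real buf_ring is lock-free), the queue lock is also a spin lock built on `atomic_flag`, packets are plain ints, and all `txq_*` names are invented. It shows the three enqueue results (ENOBUFS, 0, EINPROGRESS) and how the single-consumer dequeue maintains the flag; like the original proposal, it assumes that a drainer keeps going until the ring is empty.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

#define RING_SIZE 8 /* hypothetical; tiny for illustration */

/* Minimal spin lock on atomic_flag, standing in for a kernel mutex. */
typedef struct { atomic_flag f; } spin_t;

static void spin_lock(spin_t *s)
{
    while (atomic_flag_test_and_set_explicit(&s->f, memory_order_acquire))
        ; /* spin */
}

static void spin_unlock(spin_t *s)
{
    atomic_flag_clear_explicit(&s->f, memory_order_release);
}

struct txq {
    spin_t ring_lock;     /* models the atomicity buf_ring provides */
    spin_t queue_lock;    /* the per-queue driver lock */
    int slots[RING_SIZE]; /* packets are plain ints in this model */
    unsigned head, count; /* protected by ring_lock */
    bool busy;            /* protected by ring_lock */
    unsigned sent;        /* protected by queue_lock */
};

void txq_init(struct txq *q)
{
    atomic_flag_clear(&q->ring_lock.f);
    atomic_flag_clear(&q->queue_lock.f);
    q->head = q->count = q->sent = 0;
    q->busy = false;
}

/* ENOBUFS: ring full. 0: queued and no drainer is active, so the caller
 * must drain. EINPROGRESS: queued and an active drainer will send it. */
int txq_enqueue(struct txq *q, int pkt)
{
    int error;

    spin_lock(&q->ring_lock);
    if (q->count == RING_SIZE) {
        error = ENOBUFS;
    } else {
        q->slots[(q->head + q->count) % RING_SIZE] = pkt;
        q->count++;
        error = q->busy ? EINPROGRESS : 0;
    }
    spin_unlock(&q->ring_lock);
    return error;
}

/* Single-consumer dequeue: busy becomes true iff packets remain after
 * this removal, and false once the ring is seen empty. */
static bool txq_dequeue(struct txq *q, int *pkt)
{
    bool got = false;

    spin_lock(&q->ring_lock);
    if (q->count > 0) {
        *pkt = q->slots[q->head];
        q->head = (q->head + 1) % RING_SIZE;
        q->count--;
        got = true;
    }
    q->busy = q->count > 0;
    spin_unlock(&q->ring_lock);
    return got;
}

/* Per-queue start routine; caller holds queue_lock. Drains until empty,
 * which is the assumption the busy flag depends on. */
void txq_start(struct txq *q)
{
    int pkt;

    while (txq_dequeue(q, &pkt))
        q->sent++; /* a real driver would hand pkt to the hardware here */
}

int txq_transmit(struct txq *q, int pkt)
{
    int error = txq_enqueue(q, pkt);

    if (error == EINPROGRESS)
        return 0; /* the active drainer will send it */
    if (error != 0)
        return error;
    spin_lock(&q->queue_lock); /* blocking lock, not a try-lock */
    txq_start(q);
    spin_unlock(&q->queue_lock);
    return 0;
}
```

In this model, callers that get EINPROGRESS never touch the queue lock, while every caller that gets 0 takes it unconditionally; that is where, as the mail notes, several threads can still stack up.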
-- John Baldwin From owner-freebsd-net@FreeBSD.ORG Tue Dec 4 19:34:37 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 7CCC42E4; Tue, 4 Dec 2012 19:34:37 +0000 (UTC) (envelope-from adrian.chadd@gmail.com) Received: from mail-wi0-f180.google.com (mail-wi0-f180.google.com [209.85.212.180]) by mx1.freebsd.org (Postfix) with ESMTP id D9EA98FC0C; Tue, 4 Dec 2012 19:34:36 +0000 (UTC) Received: by mail-wi0-f180.google.com with SMTP id hj13so854359wib.13 for ; Tue, 04 Dec 2012 11:34:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=nOwx4cI4NlCjeGnwjUHbdLlEYhtNXtuMfHL3cRMCfUM=; b=I8j4+NOl/OdkArwgmSZP1baSk+2pfcWYBRVLGMBLTHNpdQbG5wP3RZX70fG4O9fwHd kbM7pbs1hqtaoEyzjgnwOCe79BdBA/oEHXyriR0lpEhv/+5ADx5UZ18cVFRwnr2FIhze qXC0pNL+t+k+P6istLvFVCzGEIAVTRCgOnkqGl5un5rwnB65GKiGH57G/VMKmYCfzwZ7 96T2cAyYvB3GGUisc+t2oU5DWEx/yMaN0MCBzcQuSkzRrOuRozIg38u2Pv5YdDbHtgmc uCl5Cqh9rfM75OLwKd8Wuf4ZSXAW4qkcVtyDZjVt6BVe+nZ+tveLPJj85FQlLjgRKkdo I3mQ== MIME-Version: 1.0 Received: by 10.216.85.211 with SMTP id u61mr5672480wee.212.1354649675658; Tue, 04 Dec 2012 11:34:35 -0800 (PST) Sender: adrian.chadd@gmail.com Received: by 10.217.57.9 with HTTP; Tue, 4 Dec 2012 11:34:35 -0800 (PST) In-Reply-To: <201212041108.17645.jhb@freebsd.org> References: <1353259441.19423.YahooMailClassic@web121605.mail.ne1.yahoo.com> <201212041108.17645.jhb@freebsd.org> Date: Tue, 4 Dec 2012 11:34:35 -0800 X-Google-Sender-Auth: ouURajnVNcCsSzqsreuwjeeKdNU Message-ID: Subject: Re: Latency issues with buf_ring From: Adrian Chadd To: John Baldwin Content-Type: text/plain; charset=ISO-8859-1 Cc: Barney Cordoba , freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 04 Dec 2012 19:34:37 -0000 .. and it's important to note that buf_ring itself doesn't have the race condition; it's the general driver implementation that's racy. I have the same races in ath(4) with the watchdog programming. Exactly the same issue. Adrian From owner-freebsd-net@FreeBSD.ORG Tue Dec 4 20:02:32 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id A515EDF0 for ; Tue, 4 Dec 2012 20:02:32 +0000 (UTC) (envelope-from oppermann@networx.ch) Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2]) by mx1.freebsd.org (Postfix) with ESMTP id 0D2578FC08 for ; Tue, 4 Dec 2012 20:02:31 +0000 (UTC) Received: (qmail 5917 invoked from network); 4 Dec 2012 21:32:44 -0000 Received: from c00l3r.networx.ch (HELO [127.0.0.1]) ([62.48.2.2]) (envelope-sender ) by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP for ; 4 Dec 2012 21:32:44 -0000 Message-ID: <50BE56C8.1030804@networx.ch> Date: Tue, 04 Dec 2012 21:02:16 +0100 From: Andre Oppermann User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:16.0) Gecko/20121026 Thunderbird/16.0.2 MIME-Version: 1.0 To: Adrian Chadd Subject: Re: Latency issues with buf_ring References: <1353259441.19423.YahooMailClassic@web121605.mail.ne1.yahoo.com> <201212041108.17645.jhb@freebsd.org> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Barney Cordoba , John Baldwin , freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 04 Dec 2012 20:02:32 -0000 On 04.12.2012 20:34, Adrian Chadd wrote: > .. 
and it's important to note that buf_ring itself doesn't have the > race condition; it's the general driver implementation that's racy. > > I have the same races in ath(4) with the watchdog programming. Exactly > the same issue. Our IF_* stack/driver boundary handoff isn't up to the task anymore. Also the interactions are either poorly defined or poorly understood in many places. I've had a few chats with yongari@ and am experimenting with a modernized interface in my branch. The reason I stumbled across it is that I'm extending the hardware offload feature set and found out that the stack and the drivers (and the drivers among themselves) are not really in sync with regard to behavior. For most if not all ethernet drivers from 100Mbit/s up, the TX DMA rings are so large that buffering at the IFQ level doesn't make sense anymore and only adds latency. So the driver could simply put everything directly into the TX DMA ring and not even try to soft-queue. If the TX DMA ring is full, ENOBUFS is returned instead of filling yet another queue. However, there are ALTQ interactions and other mechanisms which have to be considered too, making it a bit more involved. I'm coming up with a draft and some benchmark results for an updated stack/driver boundary in the next few weeks before xmas.
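Andre's suggestion of dropping the software queue can be sketched as a transmit routine that writes straight into the descriptor ring and reports ENOBUFS when it is full, with the completion handler reclaiming slots. Everything here is hypothetical (`struct tx_ring`, `ring_transmit()`, `ring_txeof()`, a 4-entry ring, ints instead of mbufs) and ALTQ is ignored. The `tapped` counter marks where a BPF tap would go if it were done at completion time, as Andre suggested earlier in the thread, so that only packets that were actually transmitted are seen.

```c
#include <assert.h>
#include <errno.h>

#define TX_RING_SIZE 4 /* hypothetical; must be a power of two here */

struct tx_ring {
    int desc[TX_RING_SIZE]; /* stand-in for DMA descriptors (mbufs) */
    unsigned prod;          /* free-running: slots filled by the driver */
    unsigned cons;          /* free-running: slots completed by the NIC */
    unsigned tapped;        /* packets handed to the tap on completion */
};

/* Hypothetical if_transmit with no software queue in front of the ring. */
int ring_transmit(struct tx_ring *r, int pkt)
{
    if (r->prod - r->cons == TX_RING_SIZE)
        return ENOBUFS; /* full: report it rather than queue elsewhere */
    r->desc[r->prod % TX_RING_SIZE] = pkt;
    r->prod++;
    return 0;
}

/* Hypothetical txeof handler: reclaim ndone completed descriptors. */
void ring_txeof(struct tx_ring *r, unsigned ndone)
{
    assert(ndone <= r->prod - r->cons);
    while (ndone-- > 0) {
        /* A BPF tap of r->desc[r->cons % TX_RING_SIZE] would go here,
         * so only packets that were actually sent are seen. */
        r->tapped++;
        r->cons++;
    }
}
```

Free-running unsigned `prod`/`cons` counters keep the full/empty test to one subtraction; this relies on `TX_RING_SIZE` being a power of two so that the modulo stays consistent when the counters wrap.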
-- Andre From owner-freebsd-net@FreeBSD.ORG Tue Dec 4 21:31:26 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id F26E2819; Tue, 4 Dec 2012 21:31:25 +0000 (UTC) (envelope-from adrian.chadd@gmail.com) Received: from mail-we0-f182.google.com (mail-we0-f182.google.com [74.125.82.182]) by mx1.freebsd.org (Postfix) with ESMTP id 4FAC78FC19; Tue, 4 Dec 2012 21:31:24 +0000 (UTC) Received: by mail-we0-f182.google.com with SMTP id u54so2209471wey.13 for ; Tue, 04 Dec 2012 13:31:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=A+iA2cUXAhMGGI46BKigGPjZEnEKIEBPalVUClTApi4=; b=Uezhxn3hNNm7kZtALNJj2VS45tb18yKP4llDpfMt3OcwZCyQmmb9i+98lxs1Mk7aU8 /DrJmTi41DQFeA3dLhCwG7W+JrIT1/RjSmDbu2O+ewj5asBgjd02hWKrM9yXBD0UzfQ+ 9aYxr34ul5F9pjPN3v8/Tq7sccBwfRnKuIptOP6dBHnXwOX6xqcNz1to9TXMqbWDNc/S GxT+rdm6O0QZnZ0fCsEluXq02HjHoZ8wb0Qca5V2rf+c/XcNhZGABdsfU7zpCiuQV1OQ lmqKR4palTm2FupfA0yPjgm5KkzK21ib09nG7q2P2zFcSEAPVBTbCzcBulvLsqwjPgab G6MA== MIME-Version: 1.0 Received: by 10.216.139.140 with SMTP id c12mr5872057wej.46.1354656683288; Tue, 04 Dec 2012 13:31:23 -0800 (PST) Sender: adrian.chadd@gmail.com Received: by 10.217.57.9 with HTTP; Tue, 4 Dec 2012 13:31:23 -0800 (PST) In-Reply-To: <50BE56C8.1030804@networx.ch> References: <1353259441.19423.YahooMailClassic@web121605.mail.ne1.yahoo.com> <201212041108.17645.jhb@freebsd.org> <50BE56C8.1030804@networx.ch> Date: Tue, 4 Dec 2012 13:31:23 -0800 X-Google-Sender-Auth: CK7HGl4-msBdEVpC_rzmRv2l7J8 Message-ID: Subject: Re: Latency issues with buf_ring From: Adrian Chadd To: Andre Oppermann Content-Type: text/plain; charset=ISO-8859-1 Cc: Barney Cordoba , John Baldwin , freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: 
Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 04 Dec 2012 21:31:26 -0000 On 4 December 2012 12:02, Andre Oppermann wrote: > Our IF_* stack/driver boundary handoff isn't up to the task anymore. Right. Well, the current handoff is really "here's a packet, go do stuff!" and the legacy if_start() method is just plain broken for SMP, preemption and direct dispatch. Things are also very special in the net80211 world, with the stack layer having to get its grubby fingers into things. I'm sure that the other examples of layered protocols (e.g. doing MPLS, or even just straight PPPoE-style tunneling) have the same issues. Anything with sequence numbers and encryption being done by some other layer is going to have the same issue, unless it's all enforced via some other queue and a single thread handling the network stack "stuff". I bet direct-dispatch netgraph will have similar issues too, if it ever comes into existence. :-) > Also the interactions are either poorly defined or understood in many > places. I've had a few chats with yongari@ and am experimenting with > a modernized interface in my branch. > > The reason I stumbled across it was because I'm extending the hardware > offload feature set and found out that the stack and the drivers (and > the drivers among themself) are not really in sync with regards to behavior. > > For most if not all ethernet drivers from 100Mbit/s the TX DMA rings > are so large that buffering at the IFQ level doesn't make sense anymore > and only adds latency. So it could simply directly put everything into > the TX DMA and not even try to soft-queue. If the TX DMA ring is full > ENOBUFS is returned instead of filling yet another queue. However there > are ALTQ interactions and other mechanisms which have to be considered > too making it a bit more involved. net80211 has slightly different problems.
We have requirements for per-node, per-TID/per-AC state (not just for QOS, but separate sequence numbers, different state machine handling for things like aggregation and (later) U-APSD handling, etc) so we do need to direct frames into different queues and then correctly serialise that mess. > I'm coming up with a draft and some benchmark results for an updated > stack/driver boundary in the next weeks before xmas. Ok. Please don't rush into it though; I'd like time to think about it after NY (as I may actually _have_ a holiday this xmas!) and I'd like to try and rope in people from non-ethernet-packet-pushing backgrounds to comment. They may have much stricter and/or stranger requirements when it comes to how the network layer passes, serialises and pushes packets to other layers. Thanks, Adrian From owner-freebsd-net@FreeBSD.ORG Wed Dec 5 03:31:39 2012 Return-Path: Delivered-To: freebsd-net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id F3616D4F; Wed, 5 Dec 2012 03:31:38 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from fallbackmx06.syd.optusnet.com.au (fallbackmx06.syd.optusnet.com.au [211.29.132.8]) by mx1.freebsd.org (Postfix) with ESMTP id 2824A8FC08; Wed, 5 Dec 2012 03:31:37 +0000 (UTC) Received: from mail05.syd.optusnet.com.au (mail05.syd.optusnet.com.au [211.29.132.186]) by fallbackmx06.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id qB53VUG7026821; Wed, 5 Dec 2012 14:31:31 +1100 Received: from c122-106-175-26.carlnfd1.nsw.optusnet.com.au (c122-106-175-26.carlnfd1.nsw.optusnet.com.au [122.106.175.26]) by mail05.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id qB53VHcG016927 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 5 Dec 2012 14:31:21 +1100 Date: Wed, 5 Dec 2012 14:31:17 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Andre Oppermann Subject: Re: Latency issues with buf_ring In-Reply-To: <50BE56C8.1030804@networx.ch> 
Message-ID: <20121205112511.Q932@besplex.bde.org> References: <1353259441.19423.YahooMailClassic@web121605.mail.ne1.yahoo.com> <201212041108.17645.jhb@freebsd.org> <50BE56C8.1030804@networx.ch> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=c8fz2mBl c=1 sm=1 a=ie5KVN3-GTQA:10 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=SOXvLa97LiYA:10 a=gr-qOqZ8CvggYKoQBowA:9 a=CjuIK1q_8ugA:10 a=bxQHXO5Py4tHmhUgaywp5w==:117 Cc: Barney Cordoba , Adrian Chadd , John Baldwin , freebsd-net@FreeBSD.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 05 Dec 2012 03:31:39 -0000 On Tue, 4 Dec 2012, Andre Oppermann wrote: > For most if not all ethernet drivers from 100Mbit/s the TX DMA rings > are so large that buffering at the IFQ level doesn't make sense anymore > and only adds latency. I found sort of the opposite for bge at 1Gbps. Most or all bge NICs have a tx ring size of 512. The ifq length is the tx ring size minus 1 (511). I needed to expand this to imax(2 * tick / 4, 10000) to maximize pps. This does bad things to latency and worse things to caching (512 buffers might fit in the L2 cache, but 10000 buffers bust any reasonable cache as they are cycled through), but I only tried to optimize tx pps. > So it could simply directly put everything into > the TX DMA and not even try to soft-queue. If the TX DMA ring is full > ENOBUFS is returned instead of filling yet another queue. That could work, but upper layers currently don't understand ENOBUFS at all, so it would work poorly now. Also, 512 entries is not many, so even if upper layers understood ENOBUFS, it is not easy for them to _always_ respond fast enough to keep the tx active, unless there are upstream buffers with many more than 512 entries.
There needs to be enough buffering somewhere so that the tx ring can be replenished almost instantly from the buffer, to handle the worst-case latency for the threads generating new (unbuffered) packets. At the line rate of ~1.5 Mpps for 1 Gbps, the maximum latency that can be covered by 512 entries is only 340 usec. > However there > are ALTQ interactions and other mechanisms which have to be considered > too making it a bit more involved. I didn't try to handle ALTQ or even optimize for TCP. More details: to maximize pps, the main detail is to ensure that the tx ring never becomes empty. The tx then transmits as fast as possible. This requires some watermark processing, but FreeBSD has almost none for tx rings. The following normally happens for packet generators like ttcp and netsend: - loop calling send() or sendto() until the tx ring (and also any upstream buffers) fill up. Then ENOBUFS is returned. - watermark processing is broken in the user API at this point. There is no way for the application to wait for the ENOBUFS condition to go away (select() and poll() don't work). Applications use poor workarounds: - old (~1989) ttcp sleeps for 18 msec when send() returns ENOBUFS. This was barely good enough for 1 Mbps ethernet (line rate ~1500 pps is 27 per 18 msec, so IFQ_MAXLEN = 50 combined with just a 1-entry tx ring provides a safety factor of about 2). Expansion of the tx ring size to 512 makes this work with 10 Mbps ethernet too. Expansion of the ifq to 511 gives another factor of 2. After losing the safety factor of 2, we can now handle 40 Mbps ethernet, and are only a factor of 25 short for 1 Gbps. My hardware can't do line rate for small packets -- it can only do 640 kpps. Thus ttcp is only a factor of 11 short of supporting the hardware at 1 Gbps. This assumes that sleeps of 18 msec are actually possible, which they aren't with HZ = 100 giving a granularity of 10 msec so that sleep(18 msec) actually sleeps for an average of 23 msec.
-current uses the bad default of HZ = 1000. With that, sleep(18 msec) would average 18.5 msec. Of course, ttcp should sleep for more like 1 msec if that is possible. Then the average sleep is 1.5 msec. ttcp can keep up with the hardware with that, and is only slightly behind the hardware with the worst-case sleep of 2 msec (512+511 packets generated every 2 msec is 511.5 kpps). I normally use old ttcp, except I modify it to sleep for 1 msec instead of 18 in one version, and in another version I remove the sleep so that it busy-waits in a loop that calls send(), which almost always returns ENOBUFS. The latter wastes a lot of CPU, but is almost good enough for throughput testing. - newer ttcp tries to program the sleep time in microseconds. This doesn't really work, since the sleep granularity is normally at least a millisecond, and even if it could be as short as the 340 microseconds needed by bge with no ifq (see above, and better divide the 340 by 2), this is quite short and would take almost as much CPU as busy-waiting. I consider HZ = 1000 to be another form of polling/busy-waiting and don't use it except for testing. - netrate/netsend also uses a programmed sleep time. This doesn't really work, as above. netsend also tries to limit its rate based on sleeping. This is even further from working, since limiting the rate accurately needs even finer-grained sleeps than keeping up with the maximum rate. Watermark processing at the kernel level is not quite as broken. It is mostly non-existent, but partly works, sort of accidentally. The difference is that there is now a tx "eof" or "completion" interrupt which signals that the condition corresponding to ENOBUFS has gone away, so that the kernel doesn't have to poll for this. This is not really an "eof" interrupt (unless bge is programmed insanely, to interrupt only after the tx ring is completely empty). It acts as a primitive watermark.
bge can be programmed to interrupt after having sent every N packets (strictly, after every N buffer descriptors, but for small packets these are the same). When there are more than N packets to start, say M, this acts as a watermark at M-N packets. bge is normally misprogrammed with N = 10. At the line rate of 1.5 Mpps, this asks for an interrupt rate of 150 kHz, which is far too high and is usually unreachable, so reaching the line rate is impossible due to the CPU load from the interrupts. I use N = 384 or 256 so that the interrupt rate is not the dominant limit. However, N = 10 is better for latency and works under light loads. It also reduces the amount of buffering needed. The ifq works more as an accidental watermark than as a buffer. It is the same size as the tx ring (actually 1 smaller for bogus reasons), so it is not really useful as a buffer. However, with no explicit watermarking, any separate buffer like the ifq provides a sort of watermark at the boundary between the buffers. The usefulness of this would be most obvious if the tx "eof" interrupt were actually for eof (perhaps that is what it was originally). Then on the eof interrupt, there is no time at all to generate new packets, and the time when the tx is idle can be minimized by keeping pre-generated packets handy where they can be copied to the tx ring at tx "eof" interrupt time. A buffer of about the same size as the tx ring (or maybe 1/4 of that size) is enough for this. OTOH, with bge misprogrammed to interrupt after every 10 tx packets, the ifq is useless for its watermark purposes. The watermark is effectively in the tx ring, and very strangely placed there at 10 below the top (ring full). Normally tx watermarks are placed near the bottom (ring empty). They must not be placed too near the bottom, else there would not be enough time to replenish the ring between the time when the "eof" (really, the "watermark") interrupt is received and when the tx runs dry.
They should not be placed too near the top like they are in -current's bge, else the point of having a large tx ring is defeated and there are too many interrupts. However, when they are placed near the top, latency requirements are reduced. I recently worked on buffering for sio and noticed similar problems with tx watermarks. Don't laugh -- serial i/o 1 character at a time at 3.686400 Mbps has much the same timing requirements as ethernet i/o 1 packet at a time at 1 Gbps. Each serial character takes ~2.7 usec and each minimal ethernet packet takes ~0.67 usec. With tx "ring" sizes of 128 and 512 respectively, the times for a full ring to drain to empty are 347 usec for serial i/o and 341 usec for ethernet i/o. Strangely, tx is harder than rx because: - perfection is possible and easier to measure for tx. It consists of just keeping at least 1 entry in the tx ring at all times. Latency must be kept below ~340 usec to have any chance of this. This is not so easy to achieve under _all_ loads. - for rx, you have an external source generating the packets, so you don't have to worry about latency affecting the generators. - the need for watermark processing is better known for rx, since it obviously doesn't work to generate the rx "eof" interrupt near the top. The serial timing was actually harder to satisfy, because I worked on it on a 366 MHz CPU while I worked on bge on a 2 GHz CPU, and even the 2 GHz CPU couldn't keep up with line rate (so from full to empty takes 800 usec). It turned out that the best position for the tx low watermark is about 1/4 or 1/2 from the bottom for both sio and bge. It must be fairly high, else the latency requirements are not met. In the middle is a good general position. Although it apparently "wastes" half of the ring to make the latency requirements easier to meet (without very system-dependent tuning), the efficiency lost from this is reasonably small.
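The low-watermark trade-off described above can be illustrated with a toy model in C. This is a sketch under stated assumptions, not bge(4), ifq or buf_ring code: `struct txring`, `tx_complete()`, `tx_refill()`, `simulate()` and the other helpers are hypothetical names; the NIC is assumed to complete exactly one descriptor per packet time; the driver's refill latency (interrupt plus scheduling delay) is expressed in packet times; and a minimal frame is assumed to occupy 672 bits on the wire (64-byte frame, 8-byte preamble, 12-byte inter-frame gap). Under that assumption 1 Gbps is about 1.49 Mpps, a 512-entry ring drains in about 344 usec (the ~340 usec above, which uses the rounded 1.5 Mpps), and interrupting every 10 descriptors is about 149 kHz.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed wire size of a minimal ethernet frame: 64-byte frame plus
 * 8-byte preamble/SFD plus 12-byte inter-frame gap = 84 bytes = 672 bits. */
#define MIN_FRAME_BITS 672.0

/* Packets per second at line rate for minimal frames. */
static double line_rate_pps(double bits_per_sec)
{
    return bits_per_sec / MIN_FRAME_BITS;
}

/* Microseconds for a full ring of 'entries' descriptors to drain to empty. */
static double ring_drain_usec(unsigned entries, double pps)
{
    return entries / pps * 1e6;
}

/* Completion interrupts per second if the NIC interrupts every n descriptors. */
static double irq_rate_hz(double pps, unsigned n)
{
    return pps / n;
}

/* Hypothetical tx ring model: only occupancy is tracked. */
struct txring {
    unsigned size;   /* descriptors in the ring */
    unsigned used;   /* descriptors still owned by the NIC */
    unsigned lowat;  /* low watermark: refill when used is at or below this */
};

/* The NIC completes n descriptors. Returns true when occupancy is at or
 * below the low watermark, i.e. when a completion interrupt would fire. */
static bool tx_complete(struct txring *r, unsigned n)
{
    r->used = (n > r->used) ? 0 : r->used - n;
    return r->used <= r->lowat;
}

/* Move as many packets as fit from a software queue holding *swq packets. */
static unsigned tx_refill(struct txring *r, unsigned *swq)
{
    unsigned room = r->size - r->used;
    unsigned n = (*swq < room) ? *swq : room;

    r->used += n;
    *swq -= n;
    assert(r->used <= r->size);
    return n;
}

/* Send 'total' packets, one descriptor per packet time. After the watermark
 * interrupt, the refill runs only after 'latency_pkts' more packet times.
 * Returns the packet times the ring sat empty while packets were queued;
 * *irqs receives the number of watermark interrupts taken. */
static unsigned simulate(unsigned size, unsigned lowat, unsigned latency_pkts,
                         unsigned total, unsigned *irqs)
{
    struct txring r = { size, 0, lowat };
    unsigned swq = total, idle = 0, wait = 0;
    bool pending = false;

    assert(size > 0 && lowat < size);
    *irqs = 0;
    tx_refill(&r, &swq);                 /* initial fill */
    while (r.used > 0 || swq > 0) {
        if (r.used == 0)
            idle++;                      /* tx ran dry: line-rate time lost */
        else if (tx_complete(&r, 1) && !pending) {
            pending = true;              /* watermark interrupt */
            wait = latency_pkts;
            (*irqs)++;
        }
        if (pending) {
            if (wait == 0) {
                tx_refill(&r, &swq);
                pending = false;
            } else {
                wait--;
            }
        }
    }
    return idle;
}
```

With a 512-entry ring and a refill latency of 200 packet times, a watermark in the middle (256) never lets the ring run dry; a watermark 10 below the top also avoids underruns but takes roughly twice as many interrupts; a watermark of 16 leaves the tx idle for about (latency - watermark) packet times per refill cycle. The model ignores the ifq and any upstream buffering.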
Bruce From owner-freebsd-net@FreeBSD.ORG Wed Dec 5 03:57:55 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 66EF01F9; Wed, 5 Dec 2012 03:57:55 +0000 (UTC) (envelope-from fodillemlinkarim@gmail.com) Received: from mail-ie0-f179.google.com (mail-ie0-f179.google.com [209.85.223.179]) by mx1.freebsd.org (Postfix) with ESMTP id EE27F8FC0C; Wed, 5 Dec 2012 03:57:54 +0000 (UTC) Received: by mail-ie0-f179.google.com with SMTP id k14so7086016iea.10 for ; Tue, 04 Dec 2012 19:57:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=hkHoL6ot2rl5hN8jxnUoDl46qp9wOoFGrA83diK/XQ4=; b=A3jGiym1yazWI2tkDSYmEAzW9U5uMCESQYRDuqWe2iZI9KgjdFAKVcuiGubHm53uy/ dIi8oIhl/9nA72e/h0EDPu0zpeKsbNltev0wkrGIzW5Z2WT2j3YueK+GWuCLNWNVH+JB LdYGuJhYk6VuZnqMaPwBbWEG4yh9P+K5UaFyKj54YyCXgC0f3Jkdo4GyRcb9PJO1lzpF ge+WLXKHnoYabs7H/HmchahvziTSCabuCXVFMOcgkA+sdPybjtudn0nSrkNmmptyqY8J d1gYnava0T9+WN2ZTJ2bmbu8JjQSkdyuoW3j8pG9tP0baXtuHMG8eH6rc2W8rPaNCn8t +F0w== Received: by 10.50.33.173 with SMTP id s13mr582385igi.23.1354679874026; Tue, 04 Dec 2012 19:57:54 -0800 (PST) Received: from [10.0.0.130] ([24.225.136.71]) by mx.google.com with ESMTPS id uj11sm11568434igb.15.2012.12.04.19.57.51 (version=SSLv3 cipher=OTHER); Tue, 04 Dec 2012 19:57:52 -0800 (PST) Message-ID: <50BEC63B.6020801@gmail.com> Date: Tue, 04 Dec 2012 22:57:47 -0500 From: Karim Fodil-Lemelin User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/17.0 Thunderbird/17.0 MIME-Version: 1.0 To: Andre Oppermann Subject: Re: Latency issues with buf_ring References: <1353259441.19423.YahooMailClassic@web121605.mail.ne1.yahoo.com> <201212041108.17645.jhb@freebsd.org> <50BE56C8.1030804@networx.ch> In-Reply-To: <50BE56C8.1030804@networx.ch> Content-Type: text/plain; 
charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Barney Cordoba , Adrian Chadd , John Baldwin , freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 05 Dec 2012 03:57:55 -0000 Hi, On 04/12/2012 3:02 PM, Andre Oppermann wrote: > On 04.12.2012 20:34, Adrian Chadd wrote: >> .. and it's important to note that buf_ring itself doesn't have the >> race condition; it's the general driver implementation that's racy. >> >> I have the same races in ath(4) with the watchdog programming. Exactly >> the same issue. > > Our IF_* stack/driver boundary handoff isn't up to the task anymore. > > Also the interactions are either poorly defined or understood in many > places. I've had a few chats with yongari@ and am experimenting with > a modernized interface in my branch. > > The reason I stumbled across it was because I'm extending the hardware > offload feature set and found out that the stack and the drivers (and > the drivers among themself) are not really in sync with regards to > behavior. > > For most if not all ethernet drivers from 100Mbit/s the TX DMA rings > are so large that buffering at the IFQ level doesn't make sense anymore > and only adds latency. So it could simply directly put everything into > the TX DMA and not even try to soft-queue. If the TX DMA ring is full > ENOBUFS is returned instead of filling yet another queue. However there > are ALTQ interactions and other mechanisms which have to be considered > too making it a bit more involved. I've also been running into this 'internalization' of drbr for quite some time now. I have been toying with some ideas around a multi-queue-capable ALTQ. Like IFQ_*, the whole class_queue_t code in ALTQ could use some freshening up.
One avenue I am looking into is using drbr queues (and their associated TX lock) as the back-end queue implementation for ALTQ. ALTQ(9) has a concept of driver-managed queues, and the approach tries to keep the same paradigm but adapt it for buf_ring. In that context, it doesn't feel natural to me that drbr logic is handled so low inside the device drivers; this makes system-level modifications to ALTQ unnecessarily driver-dependent. ALTQ also uses very coarse-grained locking (the IFQ_LOCK for everything), which doesn't make much sense in an SMP/multiqueue system, but that's another story. > > I'm coming up with a draft and some benchmark results for an updated > stack/driver boundary in the next weeks before xmas. > Sounds great, can't wait to read it while drinking that eggnog :) From owner-freebsd-net@FreeBSD.ORG Wed Dec 5 03:58:09 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 0DF2F286; Wed, 5 Dec 2012 03:58:09 +0000 (UTC) (envelope-from fodillemlinkarim@gmail.com) Received: from mail-ie0-f182.google.com (mail-ie0-f182.google.com [209.85.223.182]) by mx1.freebsd.org (Postfix) with ESMTP id 96C398FC12; Wed, 5 Dec 2012 03:58:08 +0000 (UTC) Received: by mail-ie0-f182.google.com with SMTP id s9so8977056iec.13 for ; Tue, 04 Dec 2012 19:58:07 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=Az1vPZAQ9TMin+ClxwSn7WMwLgYwBzDn6RN1spKRA9M=; b=hzvzRdBngMd2IpmZtYfCAeFmeharw/RZn/LQ6D9s6l2VQgodn9PyEApeC4Vvw/DqqM AbWYR3XgpuTVRa/78Kf8Sn6WGNn9ylb2Xxgb8LmILvJj72yde2d9oD1BB8ZFW3FnlsVi ho6xzMQp7Qe/SWN6EJtIFREd4WwL07uGUAqi0GdQYRFQNuh6SkvdhBqHGWie47nwUYM+ d+uKkGVmDfcUyonDDSg5hyA7Kwr8kcE7YyTBjV+riJExY97vB+sa710VqfNxM4MKWhGJ F7rR4RuVSo+tglENGZ4yWRrdQ9qSDGTBU0J/vlC5ibq3UQsgQ1z2NJFHR+Xp3rUgJ5HQ nNCQ==
Received: by 10.50.150.144 with SMTP id ui16mr503107igb.68.1354679887885; Tue, 04 Dec 2012 19:58:07 -0800 (PST) Received: from [10.0.0.130] ([24.225.136.71]) by mx.google.com with ESMTPS id uj11sm11568274igb.15.2012.12.04.19.58.06 (version=SSLv3 cipher=OTHER); Tue, 04 Dec 2012 19:58:07 -0800 (PST) Message-ID: <50BEC64B.7010906@gmail.com> Date: Tue, 04 Dec 2012 22:58:03 -0500 From: Karim Fodil-Lemelin User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/17.0 Thunderbird/17.0 MIME-Version: 1.0 To: Andre Oppermann Subject: Re: Latency issues with buf_ring References: <1353259441.19423.YahooMailClassic@web121605.mail.ne1.yahoo.com> <201212041108.17645.jhb@freebsd.org> <50BE56C8.1030804@networx.ch> In-Reply-To: <50BE56C8.1030804@networx.ch> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Barney Cordoba , Adrian Chadd , John Baldwin , freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 05 Dec 2012 03:58:09 -0000 Hi, On 04/12/2012 3:02 PM, Andre Oppermann wrote: > On 04.12.2012 20:34, Adrian Chadd wrote: >> .. and it's important to note that buf_ring itself doesn't have the >> race condition; it's the general driver implementation that's racy. >> >> I have the same races in ath(4) with the watchdog programming. Exactly >> the same issue. > > Our IF_* stack/driver boundary handoff isn't up to the task anymore. > > Also the interactions are either poorly defined or understood in many > places. I've had a few chats with yongari@ and am experimenting with > a modernized interface in my branch. > > The reason I stumbled across it was because I'm extending the hardware > offload feature set and found out that the stack and the drivers (and > the drivers among themself) are not really in sync with regards to > behavior. 
> > For most if not all ethernet drivers from 100Mbit/s the TX DMA rings > are so large that buffering at the IFQ level doesn't make sense anymore > and only adds latency. So it could simply directly put everything into > the TX DMA and not even try to soft-queue. If the TX DMA ring is full > ENOBUFS is returned instead of filling yet another queue. However there > are ALTQ interactions and other mechanisms which have to be considered > too making it a bit more involved. I've also been running into this 'internalization' of drbr for quite some time now. I have been toying with some ideas around a multi-queue-capable ALTQ. Like IFQ_*, the whole class_queue_t code in ALTQ could use some freshening up. One avenue I am looking into is using drbr queues (and their associated TX lock) as the back-end queue implementation for ALTQ. ALTQ(9) has a concept of driver-managed queues, and the approach tries to keep the same paradigm but adapt it for buf_ring. In that context, it doesn't feel natural to me that drbr logic is handled so low inside the device drivers; this makes system-level modifications to ALTQ unnecessarily driver-dependent. ALTQ also uses very coarse-grained locking (the IFQ_LOCK for everything), which doesn't make much sense in an SMP/multiqueue system, but that's another story. > > I'm coming up with a draft and some benchmark results for an updated > stack/driver boundary in the next weeks before xmas.
> Sounds great, can't wait to read it while drinking eggnog :) From owner-freebsd-net@FreeBSD.ORG Wed Dec 5 12:32:59 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 291CA85A for ; Wed, 5 Dec 2012 12:32:59 +0000 (UTC) (envelope-from barney_cordoba@yahoo.com) Received: from nm47-vm1.bullet.mail.ne1.yahoo.com (nm47-vm1.bullet.mail.ne1.yahoo.com [98.138.121.97]) by mx1.freebsd.org (Postfix) with ESMTP id BF8FF8FC14 for ; Wed, 5 Dec 2012 12:32:58 +0000 (UTC) Received: from [98.138.90.49] by nm47.bullet.mail.ne1.yahoo.com with NNFMP; 05 Dec 2012 12:32:58 -0000 Received: from [98.138.89.168] by tm2.bullet.mail.ne1.yahoo.com with NNFMP; 05 Dec 2012 12:32:58 -0000 Received: from [127.0.0.1] by omp1024.mail.ne1.yahoo.com with NNFMP; 05 Dec 2012 12:32:58 -0000 X-Yahoo-Newman-Property: ymail-3 X-Yahoo-Newman-Id: 65999.77670.bm@omp1024.mail.ne1.yahoo.com Received: (qmail 33185 invoked by uid 60001); 5 Dec 2012 12:32:57 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; t=1354710777; bh=mXB4TdkKnov/lNyNyy9JU4dr7hc96bRxKTGK/B/wMD4=; h=X-YMail-OSG:Received:X-Rocket-MIMEInfo:X-Mailer:Message-ID:Date:From:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding; b=LuAjGzb6PF/nJ4jD1u1aAw++FWUG3JnT8r9tyM6wyCfjroaEjwF8mIF+RCYztZXmm/zUJTTYVKa0YBJ1pcMOvruqAODNZaRwNixkfDUEQXktLEW0Hn3SX9CQHFBTfb0leESkuZq61p68Z+x9SiZnA3Ojo/TfEEUQnufuCkYPG3Y= DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=X-YMail-OSG:Received:X-Rocket-MIMEInfo:X-Mailer:Message-ID:Date:From:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding; b=NLvs1yrSntjcOnc7fMr6oLYb+TJuKlMRvVckTEVPqKvuKLm+Xa4qMqUFLq7ois+Q/MXGTciOfWhrJT8lFIH7Ft0MS/5YY2t+Gk2TCySomm+nN9WKKd/k/juQSrnIMvTDVL2cMEmLIeEjFgyNC3/nA8TIHOqH1/kgqx9q9Ql1PBw=; X-YMail-OSG: Y58jqwoVM1n2ffkYFcd15Ta_8CZLzIirawoghO3vYwamjAG 
_FJDX.IY36WbxCSgBVnE9YhYEgYPiN35dXIkczK06bdiHPm7dXoq0XJKL.7M 3n91A_kR0UWB01D.Ty9vJEkLWskDS.fC84UU9wJyeq805xBSY7AA.PxqL772 nzXVeedQhTGmJvdJ.ZzDbkZxRRN7ivebLt50yga08AHIQB6uDVvGJk..Wz6E YyuFjY0OzBC2qrAJahwS9wINb5_fAkftKwb4GiOXZOW4la7T7kmOUSlxfU4g QylibeFFzQITUpqsQmmsA1jP0.WAKLoQjae2yOYOgAdErPj5hRJycFiA3FrY jYiTGKvI7JUIMQ6lQPNRBwJnY4x.v2Gar59iDP8CWDy0jj0pTN39tqndHBoo xh7w1zM9Qd5_T2CgR8P.R0Gdf2NvFPowomNoviu_J5BubQSEbHw17idnFYc4 tXMT9uba76d3FrU3polVY7bh6CZeQY5B0BDug084qB5GIfEkKiYjfFTJlg8N 8_bVpxE6v7snIniJ2cnPHduqlqq.AcQ-- Received: from [174.48.128.27] by web121602.mail.ne1.yahoo.com via HTTP; Wed, 05 Dec 2012 04:32:57 PST X-Rocket-MIMEInfo: 001.001, CgotLS0gT24gVHVlLCAxMi80LzEyLCBCcnVjZSBFdmFucyA8YnJkZUBvcHR1c25ldC5jb20uYXU.IHdyb3RlOgoKPiBGcm9tOiBCcnVjZSBFdmFucyA8YnJkZUBvcHR1c25ldC5jb20uYXU.Cj4gU3ViamVjdDogUmU6IExhdGVuY3kgaXNzdWVzIHdpdGggYnVmX3JpbmcKPiBUbzogIkFuZHJlIE9wcGVybWFubiIgPG9wcGVybWFubkBuZXR3b3J4LmNoPgo.IENjOiAiQWRyaWFuIENoYWRkIiA8YWRyaWFuQEZyZWVCU0Qub3JnPiwgIkJhcm5leSBDb3Jkb2JhIiA8YmFybmV5X2NvcmRvYmFAeWFob28uY29tPiwgIkoBMAEBAQE- X-Mailer: YahooMailClassic/15.1.1 YahooMailWebService/0.8.128.478 Message-ID: <1354710777.97879.YahooMailClassic@web121602.mail.ne1.yahoo.com> Date: Wed, 5 Dec 2012 04:32:57 -0800 (PST) From: Barney Cordoba Subject: Re: Latency issues with buf_ring To: Andre Oppermann , Bruce Evans In-Reply-To: <20121205112511.Q932@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable Cc: freebsd-net@FreeBSD.org, Adrian Chadd , John Baldwin X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 05 Dec 2012 12:32:59 -0000
--- On Tue, 12/4/12, Bruce Evans wrote: > From: Bruce Evans > Subject: Re: Latency issues with buf_ring > To: "Andre Oppermann" > Cc: "Adrian Chadd" , "Barney Cordoba" , "John Baldwin" , freebsd-net@FreeBSD.org > Date: Tuesday, December 4, 2012, 10:31 PM > On Tue, 4 Dec 2012, Andre Oppermann wrote: > > > For most if not all ethernet drivers from 100Mbit/s the TX DMA rings > > are so large that buffering at the IFQ level doesn't make sense anymore > > and only adds latency. > > I found sort of the opposite for bge at 1Gbps. Most or all bge NICs have a tx ring size of 512. > [...] > > Bruce I'm sure that Bill Paul is a nice man, but referencing drivers that were written from a template and never properly load tested doesn't really illustrate anything. All of his drivers are functional but optimized for nothing. BC
From owner-freebsd-net@FreeBSD.ORG Wed Dec 5 13:01:13 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 32AB4955 for ; Wed, 5 Dec 2012 13:01:13 +0000 (UTC) (envelope-from barney_cordoba@yahoo.com) Received: from nm59-vm3.bullet.mail.ne1.yahoo.com (nm59-vm3.bullet.mail.ne1.yahoo.com [98.138.121.127]) by mx1.freebsd.org (Postfix) with ESMTP id 56EF58FC17 for ; Wed, 5 Dec 2012 13:01:12 +0000 (UTC) Received: from [98.138.226.176] by nm59.bullet.mail.ne1.yahoo.com with NNFMP; 05 Dec 2012 12:58:17 -0000 Received: from [98.138.87.6] by tm11.bullet.mail.ne1.yahoo.com with NNFMP; 05 Dec 2012 12:58:17 -0000 Received: from [127.0.0.1] by omp1006.mail.ne1.yahoo.com with NNFMP; 05 Dec 2012 12:58:17 -0000 X-Yahoo-Newman-Property: ymail-3 X-Yahoo-Newman-Id: 728604.25428.bm@omp1006.mail.ne1.yahoo.com Received: (qmail 69087
invoked by uid 60001); 5 Dec 2012 12:58:17 -0000 Received: from [174.48.128.27] by web121606.mail.ne1.yahoo.com via HTTP; Wed, 05 Dec 2012 04:58:17 PST X-Mailer: YahooMailClassic/15.1.1 YahooMailWebService/0.8.128.478 Message-ID: <1354712297.65896.YahooMailClassic@web121606.mail.ne1.yahoo.com> Date: Wed, 5 Dec 2012 04:58:17 -0800 (PST) From: Barney Cordoba Subject: Re: Latency issues with buf_ring To: Andre Oppermann , Adrian Chadd In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable Cc: freebsd-net@freebsd.org, John Baldwin X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 05 Dec 2012 13:01:13 -0000 --- On Tue, 12/4/12, Adrian Chadd wrote: > From: Adrian Chadd > Subject: Re: Latency issues with buf_ring > To: "Andre Oppermann" > Cc: "Barney Cordoba" , "John Baldwin" , freebsd-net@freebsd.org > Date: Tuesday, December 4, 2012, 4:31 PM > On 4 December 2012 12:02, Andre Oppermann wrote: > > > Our IF_* stack/driver boundary handoff isn't up to the task anymore. > > Right. well, the current hand off is really "here's a packet, go do stuff!"
and the legacy if_start() method is just plain broken for SMP, preemption and direct dispatch. > > Things are also very special in the net80211 world, with the stack layer having to get its grubby fingers into things. > > I'm sure that the other examples of layered protocols (eg doing MPLS, or even just straight PPPoE style tunneling) has the same issues. Anything with sequence numbers and encryption being done by some other layer is going to have the same issue, unless it's all enforced via some other queue and a single thread handling the network stack "stuff". > > I bet direct-dispatch netgraph will have similar issues too, if it ever comes into existence. :-) > > > Also the interactions are either poorly defined or understood in many places. I've had a few chats with yongari@ and am experimenting with a modernized interface in my branch. > > > > The reason I stumbled across it was because I'm extending the hardware offload feature set and found out that the stack and the drivers (and the drivers among themselves) are not really in sync with regards to behavior. > > > > For most if not all ethernet drivers from 100Mbit/s the TX DMA rings are so large that buffering at the IFQ level doesn't make sense anymore and only adds latency. So it could simply directly put everything into the TX DMA and not even try to soft-queue. If the TX DMA ring is full ENOBUFS is returned instead of filling yet another queue. However there are ALTQ interactions and other mechanisms which have to be considered too making it a bit more involved. > > net80211 has slightly different problems.
We have requirements for per-node, per-TID/per-AC state (not just for QOS, but separate sequence numbers, different state machine handling for things like aggregation and (later) U-APSD handling, etc) so we do need to direct frames into different queues and then correctly serialise that mess. > > > I'm coming up with a draft and some benchmark results for an updated stack/driver boundary in the next weeks before xmas. > > Ok. Please don't rush into it though; I'd like time to think about it after NY (as I may actually _have_ a holiday this xmas!) and I'd like to try and rope in people from non-ethernet-packet-pushing backgrounds to comment. They may have much stricter and/or stranger requirements when it comes to how the network layer passes, serialises and pushes packets to other layers. > > Thanks, > > > Adrian Something I'd like to see is a general modularization of function, which will make all of the other stuff much easier. A big issue with multipurpose OSes is that they tend to be bloated with stuff that almost nobody uses. 99.9% of people are running either bridge/filters or straight TCP/IP, and there is a different design goal for a single nic web server and a router or firewall. By modularization, I mean making the "pieces" threadable. The requirements for threading vary by application, but the ability to control it can make a world of difference in performance. Having a dedicated transmit thread may make no sense on a web server, on a dual core system or with a single queue adapter, but other times it might.
Instead of having one big honking routine that does everything, modularizing it not only cleans up the code, but also makes the system more flexible without making it a mess. The design for the 99% should not be hindered by the need to support stuff like ALTQ. The hooks for ALTQ should be possible, but the locking and queuing only required for such outliers should be separable. I'd also like to see a unification of all of the projects. Is it really necessary to have 34 checks for different "ideas" in if_ethersubr.c? As a developer I know that you always want to work on the next new thing, but sometimes you need to stop, think, and clean up your code. The cleaner code opens up new possibilities, and results in a better overall product. BC From owner-freebsd-net@FreeBSD.ORG Wed Dec 5 14:00:18 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 9D651D8 for ; Wed, 5 Dec 2012 14:00:18 +0000 (UTC) (envelope-from andre@freebsd.org) Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2]) by mx1.freebsd.org (Postfix) with ESMTP id 0F30C8FC14 for ; Wed, 5 Dec 2012 14:00:17 +0000 (UTC) Received: (qmail 10187 invoked from network); 5 Dec 2012 15:30:27 -0000 Received: from c00l3r.networx.ch (HELO [127.0.0.1]) ([62.48.2.2]) (envelope-sender ) by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP for ; 5 Dec 2012 15:30:27 -0000 Message-ID: <50BF536C.3060909@freebsd.org> Date: Wed, 05 Dec 2012 15:00:12 +0100 From: Andre Oppermann User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:16.0) Gecko/20121026 Thunderbird/16.0.2 MIME-Version: 1.0 To: Barney Cordoba Subject: Re: Latency issues with buf_ring References: <1354712297.65896.YahooMailClassic@web121606.mail.ne1.yahoo.com> In-Reply-To: <1354712297.65896.YahooMailClassic@web121606.mail.ne1.yahoo.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit Cc: freebsd-net@freebsd.org, Adrian Chadd , John Baldwin X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 05 Dec 2012 14:00:18 -0000 On 05.12.2012 13:58, Barney Cordoba wrote: > > > --- On Tue, 12/4/12, Adrian Chadd wrote: > >> From: Adrian Chadd >> Subject: Re: Latency issues with buf_ring >> To: "Andre Oppermann" >> Cc: "Barney Cordoba" , "John Baldwin" , freebsd-net@freebsd.org >> Date: Tuesday, December 4, 2012, 4:31 PM >> On 4 December 2012 12:02, Andre >> Oppermann >> wrote: >> >>> Our IF_* stack/driver boundary handoff isn't up to the >> task anymore. >> >> Right. well, the current hand off is really "here's a >> packet, go do >> stuff!" and the legacy if_start() method is just plain >> broken for SMP, >> preemption and direct dispatch. >> >> Things are also very special in the net80211 world, with the >> stack >> layer having to get its grubby fingers into things. >> >> I'm sure that the other examples of layered protocols (eg >> doing MPLS, >> or even just straight PPPoE style tunneling) has the same >> issues. >> Anything with sequence numbers and encryption being done by >> some other >> layer is going to have the same issue, unless it's all >> enforced via >> some other queue and a single thread handling the network >> stack >> "stuff". >> >> I bet direct-dispatch netgraph will have similar issues too, >> if it >> ever comes into existence. :-) >> >>> Also the interactions are either poorly defined or >> understood in many >>> places. I've had a few chats with yongari@ and am >> experimenting with >>> a modernized interface in my branch. 
>>> >>> The reason I stumbled across it was because I'm >> extending the hardware >>> offload feature set and found out that the stack and >> the drivers (and >>> the drivers among themselves) are not really in sync with >> regards to behavior. >>> >>> For most if not all ethernet drivers from 100Mbit/s the >> TX DMA rings >>> are so large that buffering at the IFQ level doesn't >> make sense anymore >>> and only adds latency. So it could simply >> directly put everything into >>> the TX DMA and not even try to soft-queue. If the >> TX DMA ring is full >>> ENOBUFS is returned instead of filling yet another >> queue. However there >>> are ALTQ interactions and other mechanisms which have >> to be considered >>> too making it a bit more involved. >> >> net80211 has slightly different problems. We have >> requirements for >> per-node, per-TID/per-AC state (not just for QOS, but >> separate >> sequence numbers, different state machine handling for >> things like >> aggregation and (later) U-APSD handling, etc) so we do need >> to direct >> frames into different queues and then correctly serialise >> that mess. >> >>> I'm coming up with a draft and some benchmark results >> for an updated >>> stack/driver boundary in the next weeks before xmas. >> >> Ok. Please don't rush into it though; I'd like time to think >> about it >> after NY (as I may actually _have_ a holiday this xmas!) and >> I'd like >> to try and rope in people from non-ethernet-packet-pushing >> backgrounds >> to comment. >> They may have much stricter and/or stranger requirements >> when it comes >> to how the network layer passes, serialises and pushes >> packets to >> other layers. >> >> Thanks, >> >> >> Adrian > > Something I'd like to see is a general modularization of function, > which will make all of the other stuff much easier. A big issue with > multipurpose OSes is that they tend to be bloated with stuff that almost > nobody uses.
99.9% of people are running either bridge/filters or straight > TCP/IP, and there is a different design goal for a single nic web server > and a router or firewall. > > By modularization, I mean making the "pieces" threadable. The requirements > for threading vary by application, but the ability to control it can > make a world of difference in performance. Having a dedicated transmit > thread may make no sense on a web server, on a dual core system or > with a single queue adapter, but other times it might. Instead of having > one big honking routine that does everything, modularizing it not only > cleans up the code, but also makes the system more flexible without > making it a mess. > > The design for the 99% should not be hindered by the need to support > stuff like ALTQ. The hooks for ALTQ should be possible, but the locking > and queuing only required for such outliers should be separable. > > I'd also like to see a unification of all of the projects. Is it really > necessary to have 34 checks for different "ideas" in if_ethersubr.c? > > As a developer I know that you always want to work on the next new thing, > but sometimes you need to stop, think, and clean up your code. The cleaner > code opens up new possibilities, and results in a better overall product. I hear you.
-- Andre From owner-freebsd-net@FreeBSD.ORG Thu Dec 6 06:39:13 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 8493512B for ; Thu, 6 Dec 2012 06:39:13 +0000 (UTC) (envelope-from kevlo@kevlo.org) Received: from ns.kevlo.org (kevlo.org [220.128.136.52]) by mx1.freebsd.org (Postfix) with ESMTP id 058B78FC08 for ; Thu, 6 Dec 2012 06:39:12 +0000 (UTC) Received: from srg.kevlo.org (git.kevlo.org [220.128.136.52]) by ns.kevlo.org (8.14.5/8.14.5) with ESMTP id qB66d15M051618 for ; Thu, 6 Dec 2012 14:39:01 +0800 (CST) (envelope-from kevlo@kevlo.org) Message-ID: <50C03D8F.3090106@kevlo.org> Date: Thu, 06 Dec 2012 14:39:11 +0800 From: Kevin Lo User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/17.0 Thunderbird/17.0 MIME-Version: 1.0 To: freebsd-net@freebsd.org Subject: Review request: fix return value of socket(2) on no family found Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 06 Dec 2012 06:39:13 -0000 Hi, Here's the patch mostly from NetBSD to make socket(2) return EAFNOSUPPORT rather than EPROTONOSUPPORT if the family cannot be found. http://people.freebsd.org/~kevlo/patch-socket The man page documents the behavior specified in POSIX.1-2008: http://pubs.opengroup.org/onlinepubs/9699919799/functions/socket.html For reference, Linux, NetBSD, and OS X return EAFNOSUPPORT for this. 
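Kevin's description can be checked from userland with a small probe. This is a sketch, not part of the patch: UNKNOWN_FAMILY (12345) is an arbitrary value assumed not to be a registered address family, and probe_socket_errno() and describe_family_error() are hypothetical helper names. Per the message above, a system following POSIX.1-2008 should report EAFNOSUPPORT, while an unpatched FreeBSD kernel is expected to report EPROTONOSUPPORT.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Arbitrary family number, assumed not to be registered on the host. */
#define UNKNOWN_FAMILY 12345

/* Returns 0 if socket() unexpectedly succeeds, otherwise the errno it set. */
static int probe_socket_errno(int family)
{
    int fd = socket(family, SOCK_STREAM, 0);
    int err;

    if (fd >= 0) {
        close(fd);
        return 0;
    }
    err = errno;
    assert(err != 0);   /* socket() failed, so errno must be set */
    return err;
}

/* Short human-readable label for the errno values discussed in the thread. */
static const char *describe_family_error(int err)
{
    switch (err) {
    case 0:
        return "socket() succeeded";
    case EAFNOSUPPORT:
        return "EAFNOSUPPORT (POSIX.1-2008 behavior)";
    case EPROTONOSUPPORT:
        return "EPROTONOSUPPORT (pre-patch FreeBSD behavior)";
    default:
        return strerror(err);
    }
}
```

Printing describe_family_error(probe_socket_errno(UNKNOWN_FAMILY)) shows which convention the running kernel follows; sandboxed environments may also report unrelated errors such as EPERM.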
Kevin From owner-freebsd-net@FreeBSD.ORG Thu Dec 6 09:13:46 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 8F3E8145; Thu, 6 Dec 2012 09:13:46 +0000 (UTC) (envelope-from ermal.luci@gmail.com) Received: from mail-qa0-f47.google.com (mail-qa0-f47.google.com [209.85.216.47]) by mx1.freebsd.org (Postfix) with ESMTP id 2E2538FC15; Thu, 6 Dec 2012 09:13:45 +0000 (UTC) Received: by mail-qa0-f47.google.com with SMTP id a19so546446qad.13 for ; Thu, 06 Dec 2012 01:13:45 -0800 (PST) MIME-Version: 1.0 Received: by 10.229.201.160 with SMTP id fa32mr356975qcb.16.1354785225309; Thu, 06 Dec 2012 01:13:45 -0800 (PST) Sender: ermal.luci@gmail.com Received: by 10.49.121.163 with HTTP; Thu, 6 Dec 2012 01:13:45 -0800 (PST) Date: Thu, 6 Dec 2012 10:13:45 +0100 X-Google-Sender-Auth: 3kSIhFh3XTCevcjAOClVIc3Gmco Message-ID: Subject: ipfw(4) dynamic states/rules and its callout From: Ermal Luçi To: freebsd-net , freebsd-ipfw@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 06 Dec 2012 09:13:46 -0000 Hello, I was looking at
ipfw dynamic code for dynamic states/rules and see that it unconditionally schedules a callout even if there is no work to do. Wouldn't it be best to reschedule it when there is something to do, to avoid having a useless callout/event run every time on the system? Is there any complication I am missing? Regards, Ermal From owner-freebsd-net@FreeBSD.ORG Thu Dec 6 09:35:17 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 477CEB59; Thu, 6 Dec 2012 09:35:17 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 167C38FC0C; Thu, 6 Dec 2012 09:35:17 +0000 (UTC) Received: from fledge.watson.org (fledge.watson.org [65.122.17.41]) by cyrus.watson.org (Postfix) with ESMTPS id A2A6F46B20; Thu, 6 Dec 2012 04:35:16 -0500 (EST) Date: Thu, 6 Dec 2012 09:35:16 +0000 (GMT) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: John Baldwin Subject: Re: Latency issues with buf_ring In-Reply-To: <201212041108.17645.jhb@freebsd.org> Message-ID: References: <1353259441.19423.YahooMailClassic@web121605.mail.ne1.yahoo.com> <201212041108.17645.jhb@freebsd.org> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Barney Cordoba , freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 06 Dec 2012 09:35:17 -0000 On Tue, 4 Dec 2012, John Baldwin wrote: >> Q2: Are there any case studies or benchmarks for buf_ring, or it is just >> blindly being used because someone claimed it was better and offered it for >> free?
One of the points of locking is to avoid race conditions, so the > > fact that you have races in a supposed lock-less scheme seems more than just > ironic. > > The buf_ring author claims it has benefits in high pps workloads. I am not > aware of any benchmarks, etc. ... joining this conversation a bit late -- still about two years behind on net@ :-) ... There are several places where having a good buf_ring primitive should offer significant benefits over blocking locks around queues: - ifnet transmit enqueue path, whether owned by the general stack (ifqueue) or the driver (as is often the case with if_transmit). - netisr queues used in deferred input dispatch, including loopback. - A future lockless hand-off of inbound TCP segments from the ithread/netisr to an already running user thread a la Van Jacobson's proposal to the Linux community (now implemented), which would significantly reduce contention on inpcb locks in many workloads. I've measured significant lock contention in all those places in the past, and I believe buf_ring was intended to address at least the first case. This isn't the same as having benchmarks showing that the current code is "better", but the right primitive used in the right way should almost certainly help all of those cases substantially. I know that when Philip Paeps was working with the Solarflare driver, switching to lockless dispatch in the outbound path made a significant difference. One thing we do need to make sure is handled well is bounds on queue length, since we don't want infinitely long queues when a backlog begins to form -- there's no reason this can't be done, although the specifics depend on what one wants to accomplish and how. I would like to see us making use of lockless queue primitives in these kinds of scenarios, motivated by benchmarking, and ideally addressing architectures with weaker memory consistency properly. 
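The two properties Robert asks of a lockless queue, a hard bound on length and correct behavior on weakly ordered memory, can be illustrated with a deliberately simplified single-producer/single-consumer ring using C11 atomics. This is not FreeBSD's buf_ring(9), which supports multiple producers via compare-and-swap; struct spsc_ring, the ring_* functions, and RING_SIZE are illustrative assumptions. The acquire/release pairs are what make the slot contents visible across CPUs without a lock.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

#define RING_SIZE 8u   /* illustrative; must be a power of two */

struct spsc_ring {
    _Atomic size_t head;      /* next slot the producer writes */
    _Atomic size_t tail;      /* next slot the consumer reads */
    void *slots[RING_SIZE];
};

static void ring_init(struct spsc_ring *r)
{
    assert((RING_SIZE & (RING_SIZE - 1)) == 0);
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer only. Returns 0, or ENOBUFS when the bound is reached. */
static int ring_enqueue(struct spsc_ring *r, void *item)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    /* Acquire: the consumer's read of a slot happens before we reuse it. */
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    assert(item != NULL);     /* NULL is reserved to mean "empty" */
    if (head - tail == RING_SIZE)
        return ENOBUFS;
    r->slots[head & (RING_SIZE - 1)] = item;
    /* Release: the slot write is visible before the new head. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 0;
}

/* Consumer only. Returns NULL when the ring is empty. */
static void *ring_dequeue(struct spsc_ring *r)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    /* Acquire: pairs with the producer's release store of head. */
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    void *item;

    if (tail == head)
        return NULL;
    item = r->slots[tail & (RING_SIZE - 1)];
    /* Release: our read of the slot completes before the producer reuses it. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return item;
}
```

The indices grow without wrapping at RING_SIZE, so head - tail (unsigned arithmetic) is always the current occupancy and "full" and "empty" stay distinguishable without sacrificing a slot.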
We should definitely minimise the number of different implementations of those primitives as much as possible, since (as with locks themselves) they are very hard to get right, and debugging problems with them can be quite problematic. Robert From owner-freebsd-net@FreeBSD.ORG Thu Dec 6 09:39:48 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 5FFE5C4C; Thu, 6 Dec 2012 09:39:48 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 2DDE98FC0C; Thu, 6 Dec 2012 09:39:48 +0000 (UTC) Received: from fledge.watson.org (fledge.watson.org [65.122.17.41]) by cyrus.watson.org (Postfix) with ESMTPS id CCA9346B1A; Thu, 6 Dec 2012 04:39:47 -0500 (EST) Date: Thu, 6 Dec 2012 09:39:47 +0000 (GMT) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Andre Oppermann Subject: Re: Latency issues with buf_ring In-Reply-To: <50BE56C8.1030804@networx.ch> Message-ID: References: <1353259441.19423.YahooMailClassic@web121605.mail.ne1.yahoo.com> <201212041108.17645.jhb@freebsd.org> <50BE56C8.1030804@networx.ch> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Barney Cordoba , Adrian Chadd , John Baldwin , freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 06 Dec 2012 09:39:48 -0000 On Tue, 4 Dec 2012, Andre Oppermann wrote: > For most if not all ethernet drivers from 100Mbit/s the TX DMA rings are so > large that buffering at the IFQ level doesn't make sense anymore and only > adds latency. So it could simply directly put everything into the TX DMA > and not even try to soft-queue. 
If the TX DMA ring is full ENOBUFS is > returned instead of filling yet another queue. However there are ALTQ > interactions and other mechanisms which have to be considered too making it > a bit more involved. I asserted for many years that software-side queueing would be subsumed by increasingly large DMA descriptor rings for the majority of devices and configurations. However, this turns out not to have happened in a number of scenarios, and so I've revised my conclusions there. I think we will continue to need to support transmit-side buffering, ideally in the form of a set of "libraries" that device drivers can use to avoid code replication and integrate queue management features fairly transparently. I'm a bit worried by the level of copy-and-paste between 10gbps device drivers right now -- for 10/100/1000 drivers, the network stack contains the majority of the code, and the responsibility of the device driver is to advertise hardware features and manage interactions with rings, interrupts, etc. On the 10gbps side, we see lots of code replication, especially in queue management, and it suggests to me (as discussed for several years in a row at BSDCan and elsewhere) that it's time to do a bit of revisiting of ifnet, pull more code back into the central stack and out of device drivers, etc. That doesn't necessarily mean changing notions of ownership of event models, rather, centralising code in libraries rather than all over the place. This is something to do with some care, of course.
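Andre's proposal quoted in this message (write straight into the TX DMA ring and return ENOBUFS when it is full, with no intermediate soft queue), expressed as the kind of shared "library" helper Robert describes, might look roughly like the sketch below. Every name here (struct txq_ops, txq_transmit(), the toy ring) is hypothetical and does not correspond to the ifnet(9) API; the ALTQ interactions Andre mentions are ignored.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-driver hooks that a shared transmit helper would call. */
struct txq_ops {
    size_t (*free_descs)(void *sc);                          /* free descriptors */
    void   (*post)(void *sc, const void *pkt, size_t nsegs); /* hand to DMA */
};

/*
 * Shared helper: place the packet directly in the TX DMA ring or fail with
 * ENOBUFS. There is no software queue; the caller decides whether to drop,
 * retry, or apply backpressure.
 */
static int txq_transmit(const struct txq_ops *ops, void *sc,
                        const void *pkt, size_t nsegs)
{
    assert(nsegs > 0);
    if (ops->free_descs(sc) < nsegs)
        return ENOBUFS;
    ops->post(sc, pkt, nsegs);
    return 0;
}

/* A toy "driver" with a fixed-size descriptor ring, used to exercise the helper. */
struct toy_ring {
    size_t size;   /* total descriptors */
    size_t used;   /* descriptors currently owned by the "hardware" */
};

static size_t toy_free(void *sc)
{
    struct toy_ring *r = sc;
    return r->size - r->used;
}

static void toy_post(void *sc, const void *pkt, size_t nsegs)
{
    struct toy_ring *r = sc;
    (void)pkt;     /* a real driver would build DMA descriptors here */
    r->used += nsegs;
}

static const struct txq_ops toy_ops = { toy_free, toy_post };
```

With a 4-descriptor toy ring, a 3-segment packet is accepted, a following 2-segment packet gets ENOBUFS, and a 1-segment packet still fits. The driver-specific part shrinks to the two hooks, which is the kind of code centralisation Robert argues for.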
Robert From owner-freebsd-net@FreeBSD.ORG Thu Dec 6 11:56:09 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id E6104FBF; Thu, 6 Dec 2012 11:56:09 +0000 (UTC) (envelope-from melifaro@FreeBSD.org) Received: from mail.ipfw.ru (unknown [IPv6:2a01:4f8:120:6141::2]) by mx1.freebsd.org (Postfix) with ESMTP id 785468FC13; Thu, 6 Dec 2012 11:56:09 +0000 (UTC) Received: from v6.mpls.in ([2a02:978:2::5] helo=ws.su29.net) by mail.ipfw.ru with esmtpsa (TLSv1:CAMELLIA256-SHA:256) (Exim 4.76 (FreeBSD)) (envelope-from ) id 1Tga77-0003RI-Th; Thu, 06 Dec 2012 15:59:37 +0400 Message-ID: <50C087D2.6020607@FreeBSD.org> Date: Thu, 06 Dec 2012 15:56:02 +0400 From: "Alexander V. Chernikov" User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:9.0) Gecko/20120121 Thunderbird/9.0 MIME-Version: 1.0 To: Ermal Luçi Subject: Re: ipfw(4) dynamic states/rules and its callout References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit Cc: freebsd-net , freebsd-ipfw@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 06 Dec 2012 11:56:10 -0000 On 06.12.2012 13:13, Ermal Luçi wrote: > Hello, > > I was looking at ipfw dynamic code for dynamic states/rules and see that it > unconditionally schedules a callout even if there is no work to do. > > Wouldn't it be best to reschedule it when there is something to do, to avoid > having a useless > callout/event run every time on the system? > > Is there any complication I am missing? I thought about the same (and possibly not allocating dynamic hash at all if we have no dynamic rules) while rewriting dynamic code.
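Ermal's suggestion (keep the expiry callout scheduled only while there is work) can be modelled with a simple state count. This is a hypothetical userland model, not ipfw code: struct dyn_expiry, dyn_state_add(), dyn_tick() and timer_arm() are illustrative names, locking is ignored, and the model deliberately sidesteps the hard part Alexander identifies, namely reliably knowing whether any dynamic rules or states exist after bulk rule deletions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the dynamic-state expiry timer. */
struct dyn_expiry {
    size_t   nstates;   /* live dynamic states */
    bool     armed;     /* is the periodic callout scheduled? */
    unsigned arms;      /* number of (re)schedules, for observation */
};

/* Stand-in for scheduling the callout. */
static void timer_arm(struct dyn_expiry *d)
{
    d->armed = true;
    d->arms++;
}

/* Creating a state schedules the callout only if it is not already pending. */
static void dyn_state_add(struct dyn_expiry *d)
{
    d->nstates++;
    if (!d->armed)
        timer_arm(d);
}

/* Callout handler: expire `expired` states, reschedule only if work remains. */
static void dyn_tick(struct dyn_expiry *d, size_t expired)
{
    assert(d->armed);
    assert(expired <= d->nstates);
    d->armed = false;
    d->nstates -= expired;
    if (d->nstates > 0)
        timer_arm(d);
}
```

Once the last state expires the callout stays idle until dyn_state_add() runs again, so an idle system stops paying for periodic wakeups.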
The main "problem" is to reliably determine if we have dynamic rules in our ruleset. Rule checking can probably be done by adding an additional argument to check_ipfw_struct(); however, the rest can be a bit more complicated since we can delete more than one rule (or a set with a bunch of rules) at once. > > Regards, > Ermal > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > From owner-freebsd-net@FreeBSD.ORG Thu Dec 6 16:48:54 2012 Return-Path: Delivered-To: freebsd-net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id F003027B for ; Thu, 6 Dec 2012 16:48:54 +0000 (UTC) (envelope-from glebius@FreeBSD.org) Received: from cell.glebius.int.ru (glebius.int.ru [81.19.69.10]) by mx1.freebsd.org (Postfix) with ESMTP id 6F16F8FC12 for ; Thu, 6 Dec 2012 16:48:53 +0000 (UTC) Received: from cell.glebius.int.ru (localhost [127.0.0.1]) by cell.glebius.int.ru (8.14.5/8.14.5) with ESMTP id qB6Gmp0J061126; Thu, 6 Dec 2012 20:48:51 +0400 (MSK) (envelope-from glebius@FreeBSD.org) Received: (from glebius@localhost) by cell.glebius.int.ru (8.14.5/8.14.5/Submit) id qB6Gmn4g061125; Thu, 6 Dec 2012 20:48:49 +0400 (MSK) (envelope-from glebius@FreeBSD.org) X-Authentication-Warning: cell.glebius.int.ru: glebius set sender to glebius@FreeBSD.org using -f Date: Thu, 6 Dec 2012 20:48:49 +0400 From: Gleb Smirnoff To: Kevin Lo Subject: Re: Review request: fix return value of socket(2) on no family found Message-ID: <20121206164849.GE48639@FreeBSD.org> References: <50C03D8F.3090106@kevlo.org> MIME-Version: 1.0 Content-Type: text/plain; charset=koi8-r Content-Disposition: inline In-Reply-To: <50C03D8F.3090106@kevlo.org> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-net@FreeBSD.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list
List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 06 Dec 2012 16:48:55 -0000 Kevin, On Thu, Dec 06, 2012 at 02:39:11PM +0800, Kevin Lo wrote: K> Here's the patch mostly from NetBSD to make socket(2) return EAFNOSUPPORT K> rather than EPROTONOSUPPORT if the family cannot be found. K> K> http://people.freebsd.org/~kevlo/patch-socket K> K> The man page documents the behavior specified in POSIX.1-2008: K> K> http://pubs.opengroup.org/onlinepubs/9699919799/functions/socket.html K> K> For reference, Linux, NetBSD, and OS X return EAFNOSUPPORT for this. IMO, the proposed change is correct. I'd suggest only a couple of things: - Please commit the addition of the pffinddomain() function and its documentation separately from the socket() return value change. - Maybe it is worth having a comment with a reference to POSIX in the code in uipc_socket.c that selects the appropriate error value. -- Totus tuus, Glebius.
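Gleb's second suggestion, a comment citing POSIX at the point where the error is chosen, might look like the following simplified model. It is not the code in uipc_socket.c: the domain and protocol tables, select_socreate_error(), and the numeric family/type/protocol values are hypothetical stand-ins. The linear search plays the role of the domain lookup that the patch factors out into pffinddomain(), whose real signature is not shown in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified protocol-switch tables. */
struct proto_entry {
    int type;       /* socket type, e.g. a stream type */
    int protocol;   /* protocol number within the family */
};

struct domain_entry {
    int family;
    const struct proto_entry *protos;
    size_t nprotos;
};

/* Returns 0 if (family, type, protocol) is supported, else an errno value. */
static int select_socreate_error(const struct domain_entry *doms, size_t ndoms,
                                 int family, int type, int protocol)
{
    const struct domain_entry *dom = NULL;

    assert(doms != NULL || ndoms == 0);
    for (size_t i = 0; i < ndoms; i++) {
        if (doms[i].family == family) {
            dom = &doms[i];
            break;
        }
    }

    /*
     * POSIX.1-2008 socket(): EAFNOSUPPORT when the implementation does not
     * support the specified address family.
     */
    if (dom == NULL)
        return EAFNOSUPPORT;

    for (size_t i = 0; i < dom->nprotos; i++) {
        if (dom->protos[i].type == type &&
            (protocol == 0 || dom->protos[i].protocol == protocol))
            return 0;
    }
    /* Known family, no matching protocol (type mismatches folded in here). */
    return EPROTONOSUPPORT;
}
```

Keeping the two lookups separate is what makes the distinction possible: an unknown family and an unknown protocol within a known family report different errors.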
From owner-freebsd-net@FreeBSD.ORG Thu Dec 6 18:02:10 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id BBAD56E3 for ; Thu, 6 Dec 2012 18:02:10 +0000 (UTC) (envelope-from barney_cordoba@yahoo.com) Received: from nm25-vm3.bullet.mail.ne1.yahoo.com (nm25-vm3.bullet.mail.ne1.yahoo.com [98.138.91.155]) by mx1.freebsd.org (Postfix) with ESMTP id 66C758FC17 for ; Thu, 6 Dec 2012 18:02:10 +0000 (UTC) Received: from [98.138.90.49] by nm25.bullet.mail.ne1.yahoo.com with NNFMP; 06 Dec 2012 18:02:04 -0000 Received: from [98.138.87.6] by tm2.bullet.mail.ne1.yahoo.com with NNFMP; 06 Dec 2012 18:02:04 -0000 Received: from [127.0.0.1] by omp1006.mail.ne1.yahoo.com with NNFMP; 06 Dec 2012 18:02:04 -0000 X-Yahoo-Newman-Property: ymail-3 X-Yahoo-Newman-Id: 132418.83920.bm@omp1006.mail.ne1.yahoo.com Received: (qmail 81634 invoked by uid 60001); 6 Dec 2012 18:02:04 -0000 Received: from [174.48.128.27] by web121601.mail.ne1.yahoo.com via HTTP; Thu, 06 Dec 2012 10:02:03 PST X-Mailer: YahooMailClassic/15.1.1 YahooMailWebService/0.8.128.478 Message-ID: <1354816923.71234.YahooMailClassic@web121601.mail.ne1.yahoo.com> Date: Thu, 6 Dec 2012 10:02:03 -0800 (PST) From: Barney Cordoba Subject: Re: Latency issues with buf_ring To: Robert Watson In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable Cc: freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 06 Dec 2012 18:02:10 -0000 --- On Thu, 12/6/12, Robert Watson wrote: > From: Robert Watson > Subject: Re: Latency issues with buf_ring > To: "Andre Oppermann" > Cc: "Barney Cordoba" , "Adrian Chadd" , "John Baldwin" , freebsd-net@freebsd.org > Date: Thursday, December 6, 2012, 4:39 AM > On Tue, 4 Dec 2012, Andre Oppermann wrote: >
=0A> > For most if not all ethernet drivers from 100Mbit/s the= =0A> TX DMA rings are so large that buffering at the IFQ level=0A> doesn't = make sense anymore and only adds latency.=A0 So=0A> it could simply directl= y put everything into the TX DMA and=0A> not even try to soft-queue.=A0 If = the TX DMA ring is full=0A> ENOBUFS is returned instead of filling yet anot= her=0A> queue.=A0 However there are ALTQ interactions and other=0A> mechani= sms which have to be considered too making it a bit=0A> more involved.=0A> = =0A> I asserted for many years that software-side queueing would=0A> be sub= sumed by increasingly large DMA descriptor rings for=0A> the majority of de= vices and configurations.=A0 However,=0A> this turns out not to have happen= ed in a number of=0A> scenarios, and so I've revised my conclusions there.= =A0 I=0A> think we will continue to need to support transmit-side=0A> buffe= ring, ideally in the form of a set of "libraries" that=0A> device drivers c= an use to avoid code replication and=0A> integrate queue management feature= s fairly transparently.=0A> =0A> I'm a bit worried by the level of copy-and= -paste between=0A> 10gbps device drivers right now -- for 10/100/1000 drive= rs,=0A> the network stack contains the majority of the code, and the=0A> re= sponsibility of the device driver is to advertise hardware=0A> features and= manage interactions with rings, interrupts,=0A> etc.=A0 On the 10gbps side= , we see lots of code=0A> replication, especially in queue management, and = it suggests=0A> to me (as discussed for several years in a row at BSDCan an= d=0A> elsehwere) that it's time to do a bit of revisiting of=0A> ifnet, pul= l more code back into the central stack and out of=0A> device drivers, etc.= =A0 That doesn't necessarily mean=0A> changing notions of ownership of even= t models, rather,=0A> centralising code in libraries rather than all over t= he=0A> place.=A0 This is something to do with some care, of=0A> course.=0A>= =0A> 
Robert=0A=0A=0AMore troubling than that is the notion that the same c= ode that's suitable=0Afor 10/100Gb/s should be used in a 10Gb/s environment= . 10Gb/s requires a=0Acompletely different way of thinking.=0A=0ABC From owner-freebsd-net@FreeBSD.ORG Thu Dec 6 18:31:30 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id E1F78335; Thu, 6 Dec 2012 18:31:30 +0000 (UTC) (envelope-from adrian.chadd@gmail.com) Received: from mail-we0-f182.google.com (mail-we0-f182.google.com [74.125.82.182]) by mx1.freebsd.org (Postfix) with ESMTP id 430358FC12; Thu, 6 Dec 2012 18:31:29 +0000 (UTC) Received: by mail-we0-f182.google.com with SMTP id u54so3355301wey.13 for ; Thu, 06 Dec 2012 10:31:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=9N8tU9+r7+p/H1S/e9w8DvFsTOOWRV6viq8jbxsjCDE=; b=0l196Iqtljhs/jWu4mVSe6F/BfZnbP54cNR5WqCTHWcZlqISmSJuu3QJldnxpz938L V9Ip+eWmj+1e3yDifidFkNnu6voSswpn2tp4hg5LKsR+NkLsRRYOKoc7aLy+YwQm8VyU dtu6rGEkeUd0uLDI8g4Kdi22JkvzKIfg+KzEeaKCdtnhfKFVLc+y1wJDSS3N6ZqsxG77 hIormynxB9DJhNUdtboy0S7p055mBvUd65NoV5gOsf/u3VwviLHLzniPB29DP/VAS0Sy KM77tvRvnFr6zncNI6LgDYz2FuizNUBMrhWfwhXMTWd5D72pmjQvSSJNvG4yK2U48Scc 729w== MIME-Version: 1.0 Received: by 10.180.104.69 with SMTP id gc5mr10477681wib.13.1354818689241; Thu, 06 Dec 2012 10:31:29 -0800 (PST) Sender: adrian.chadd@gmail.com Received: by 10.217.57.9 with HTTP; Thu, 6 Dec 2012 10:31:29 -0800 (PST) In-Reply-To: <1354816923.71234.YahooMailClassic@web121601.mail.ne1.yahoo.com> References: <1354816923.71234.YahooMailClassic@web121601.mail.ne1.yahoo.com> Date: Thu, 6 Dec 2012 10:31:29 -0800 X-Google-Sender-Auth: A_YlupdxiL_661PLRwV17rqfvQY Message-ID: Subject: Re: Latency issues with buf_ring From: Adrian Chadd To: Barney Cordoba Content-Type: text/plain; 
charset=ISO-8859-1 Cc: freebsd-net@freebsd.org, Robert Watson X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 06 Dec 2012 18:31:31 -0000 There've been plenty of discussions about "better" ways of doing this networking stuff. Barney, are you able to make it to any of the developer summits? adrian From owner-freebsd-net@FreeBSD.ORG Fri Dec 7 02:32:11 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 088B82B6; Fri, 7 Dec 2012 02:32:11 +0000 (UTC) (envelope-from mike@karels.net) Received: from mail.karels.net (unknown [IPv6:2001:470:c004:1::5]) by mx1.freebsd.org (Postfix) with ESMTP id BE7338FC0C; Fri, 7 Dec 2012 02:32:10 +0000 (UTC) Received: from mail.karels.net (localhost [127.0.0.1]) by mail.karels.net (8.14.5/8.14.5) with ESMTP id qB72W5ji039704; Thu, 6 Dec 2012 20:32:07 -0600 (CST) (envelope-from mike@karels.net) Message-Id: <201212070232.qB72W5ji039704@mail.karels.net> To: Gleb Smirnoff From: Mike Karels Subject: Re: Review request: fix return value of socket(2) on no family found In-reply-to: Your message of Thu, 06 Dec 2012 20:48:49 +0400. <20121206164849.GE48639@FreeBSD.org> Date: Thu, 06 Dec 2012 20:32:05 -0600 Cc: freebsd-net@freebsd.org, Kevin Lo X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: mike@karels.net List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 07 Dec 2012 02:32:11 -0000 > On Thu, Dec 06, 2012 at 02:39:11PM +0800, Kevin Lo wrote: > K> Here's the patch mostly from NetBSD to make socket(2) return EAFNOSUPPORT > K> rather than EPROTONOSUPPORT if the family cannot be found. 
> K> > K> http://people.freebsd.org/~kevlo/patch-socket > K> > K> The man page documents the behavior specified in POSIX.1-2008: > K> > K> http://pubs.opengroup.org/onlinepubs/9699919799/functions/socket.html > K> > K> For reference, Linux, NetBSD, and OS X return EAFNOSUPPORT for this. > IMO, the proposed change is correct. I'd have to disagree. EAFNOSUPPORT means "Address family not supported by protocol family". However, the socket syscall does not take an address family parameter. It takes a protocol family, a socket type, and an optional protocol. EPFNOSUPPORT would be the correct error if the protocol family is not supported. I don't remember if I missed this when POSIX was being balloted, or if my objection was unsuccessful. That said, I will say that consistency across systems and with the standard is a useful thing, so I'll reluctantly agree with the change to the errno. However, the proposed text for socket(2) doesn't make sense: +The address family (domain) is not supported or the +specified domain is not supported by this protocol family. The domain is the protocol family. This could reasonably say just "The protocol family (domain) is not supported." It might further say "This specific error value may not be accurate, but is specified by POSIX.1-2008." Mike From owner-freebsd-net@FreeBSD.ORG Fri Dec 7 12:27:50 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx2.freebsd.org (mx2.freebsd.org [69.147.83.53]) by hub.freebsd.org (Postfix) with ESMTP id B9FAB329; Fri, 7 Dec 2012 12:27:50 +0000 (UTC) (envelope-from ae@FreeBSD.org) Received: from butcher-nb.yandex.net (hub.freebsd.org [IPv6:2001:1900:2254:206c::16:88]) by mx2.freebsd.org (Postfix) with ESMTP id 7CCE33B4C04; Fri, 7 Dec 2012 12:27:39 +0000 (UTC) Message-ID: <50C1E09A.5050301@FreeBSD.org> Date: Fri, 07 Dec 2012 16:27:06 +0400 From: "Andrey V. 
Elsukov" User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/17.0 Thunderbird/17.0 MIME-Version: 1.0 To: freebsd-net@freebsd.org, freebsd-ipfw Subject: [RFC] IPv6 ifaddr hash X-Enigmail-Version: 1.4.6 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: melifaro@FreeBSD.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 07 Dec 2012 12:27:50 -0000 Hi All, We have discovered that ipfw(4) performs very poorly with our rules. One of the biggest problems is rules with the O_IP6_XXX_ME opcodes. They check whether a packet's addresses match locally configured IPv6 addresses. For IPv4 we have the in_ifaddr hash for quickly looking up an address, but there is nothing similar for IPv6. So I have implemented a first patch based on the IPv4 code, but there are several questions I want to discuss. The patch is here: http://people.freebsd.org/~ae/in6_ifaddrhash.diff 1. The hash size. I made it the same as for IPv4, but I think 512 buckets is too many. 2. Which hash function is better to use? 3. Hashing all 128 bits of the address seems like overkill. -- WBR, Andrey V.
Elsukov From owner-freebsd-net@FreeBSD.ORG Fri Dec 7 20:49:20 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 5A47744E for ; Fri, 7 Dec 2012 20:49:20 +0000 (UTC) (envelope-from yanegomi@gmail.com) Received: from mail-lb0-f182.google.com (mail-lb0-f182.google.com [209.85.217.182]) by mx1.freebsd.org (Postfix) with ESMTP id CAD558FC13 for ; Fri, 7 Dec 2012 20:49:19 +0000 (UTC) Received: by mail-lb0-f182.google.com with SMTP id go10so929794lbb.13 for ; Fri, 07 Dec 2012 12:49:18 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=VM5IVFLdPcmF4olnribjjlibJA6ulb6YPihOOnrDBhI=; b=S1a2j1L9vP+kK8zCy66pm5ANMIm620YgStnN/DjpHDVSX2lXMvPXTF1XEjRMv3EkZo x6U5MSilE02OKfwX/s4aFSVo0qxZIhUx/V7cLSsZ44i+FLYP8NHEgTwsmnGb5OmUaL5v JR09coX4Jlo6Y59ifUc2azlyLNbsx+xonyx4fZhUTLTDpKxG/Uk4UYDGRXNb86gu97D9 WX+hwU2DnJhCiDWml/u/oFn4HBsaazciYlMv9EuFXSha3KYPO4ISB5YpA7W29k2hxZup 2MEJUzwcb2yjTforPlpH3ksUxgzhZa4cZXNvPAeqV7dgfsPha8X06F2y0qwk6jYiE3aI S9ZQ== MIME-Version: 1.0 Received: by 10.152.45.229 with SMTP id q5mr6566784lam.34.1354913358674; Fri, 07 Dec 2012 12:49:18 -0800 (PST) Received: by 10.112.99.70 with HTTP; Fri, 7 Dec 2012 12:49:18 -0800 (PST) Date: Fri, 7 Dec 2012 12:49:18 -0800 Message-ID: Subject: Can't create lagg interfaces on recent HEAD (2012.12.05 based sources) From: Garrett Cooper To: freebsd-net@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 07 Dec 2012 20:49:20 -0000 I can't seem to create a lagg'ed interface on HEAD with ixgbe (it's failing when creating a cloned interface), whereas creating it on 9.1-STABLE built from a couple weeks 
ago just worked. Ideas? Thanks, -Garrett # cat /root/ISI-GENERIC include GENERIC ident ISI-GENERIC makeoptions MODULES_OVERRIDE="bxe cxgb cxgbe em igb ixgbe qlxgb" nodevice bxe nodevice cxgb nodevice cxgbe nodevice em nodevice igb nodevice ixgbe nodevice qlxgb options OFED options SDP options IPOIB_CM device ipoib device mlx4ib device mlxen device mthca # uname -a FreeBSD wf158.west.isilon.com 10.0-CURRENT FreeBSD 10.0-CURRENT #0: Thu Dec 6 23:41:57 PST 2012 root@wf158.west.isilon.com:/usr/obj/usr/src/sys/ISI-GENERIC amd64 # service netif restart Stopping Network: lo0 ix0 ix1. lo0: flags=8048 metric 0 mtu 16384 options=600003 nd6 options=21 ix0: flags=8802 metric 0 mtu 9000 options=407bb ether 00:1b:21:88:51:c4 inet6 fe80::21b:21ff:fe88:51c4%ix0 prefixlen 64 scopeid 0x2 nd6 options=29 media: Ethernet autoselect (10Gbase-SR ) status: active ix1: flags=8802 metric 0 mtu 9000 options=407bb ether 00:1b:21:88:51:c5 inet6 fe80::21b:21ff:fe88:51c5%ix1 prefixlen 64 scopeid 0x3 nd6 options=29 media: Ethernet autoselect (10Gbase-SR ) status: active ifconfig: SIOCIFCREATE2: Invalid argument ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Starting Network: lo0 ix0 ix1. 
lo0: flags=8049 metric 0 mtu 16384 options=600003 inet6 ::1 prefixlen 128 inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1 inet 127.0.0.1 netmask 0xff000000 nd6 options=21 ix0: flags=8843 metric 0 mtu 9000 options=407bb ether 00:1b:21:88:51:c4 inet6 fe80::21b:21ff:fe88:51c4%ix0 prefixlen 64 scopeid 0x2 nd6 options=29 media: Ethernet autoselect (10Gbase-SR ) status: active ix1: flags=8843 metric 0 mtu 9000 options=407bb ether 00:1b:21:88:51:c5 inet6 fe80::21b:21ff:fe88:51c5%ix1 prefixlen 64 scopeid 0x3 nd6 options=29 media: Ethernet autoselect (10Gbase-SR ) status: active # cat /etc/rc.conf hostname="wf158.west.isilon.com" ifconfig_em0="DHCP" sshd_enable="YES" ntpd_enable="YES" # Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable dumpdev="NO" nfs_client_enable="YES" rpc_lockd_enable="YES" rpc_statd_enable="YES" rpcbind_enable="YES" tahi_interface="em1" # #ipv6_default_interface="$tahi_interface" #ipv6_network_interfaces="$tahi_interface" # eval "ifconfig_${tahi_interface}_ipv6='inet6 up -accept_rtadv -auto_linklocal -nud'" devfs_system_ruleset="tahi_bpf" kld_list="ixgbe" ifconfig_ix0="up mtu 9000" ifconfig_ix1="up mtu 9000" cloned_interfaces="lagg0" ifconfig_lagg0="laggproto lacp laggport ix0 laggport ix1 lagghash l3 7.7.6.42 netmask 255.255.255.0" From owner-freebsd-net@FreeBSD.ORG Sat Dec 8 09:49:37 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id AD12B96B; Sat, 8 Dec 2012 09:49:37 +0000 (UTC) (envelope-from melifaro@FreeBSD.org) Received: from mail.ipfw.ru (unknown [IPv6:2a01:4f8:120:6141::2]) by mx1.freebsd.org (Postfix) with ESMTP id 3CC928FC15; Sat, 8 Dec 2012 09:49:37 +0000 (UTC) Received: from v6.mpls.in ([2a02:978:2::5] helo=ws.su29.net) by mail.ipfw.ru with esmtpsa (TLSv1:CAMELLIA256-SHA:256) (Exim 4.76 (FreeBSD)) (envelope-from ) id 1ThH5l-0009WS-SS; Sat, 08 Dec 2012 13:53:05 +0400 Message-ID: <50C30D21.6070804@FreeBSD.org> Date: Sat, 08 
Dec 2012 13:49:21 +0400 From: "Alexander V. Chernikov" User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:9.0) Gecko/20120121 Thunderbird/9.0 MIME-Version: 1.0 To: "Andrey V. Elsukov" Subject: Re: [RFC] IPv6 ifaddr hash References: <50C1E09A.5050301@FreeBSD.org> In-Reply-To: <50C1E09A.5050301@FreeBSD.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-net@freebsd.org, freebsd-ipfw X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 08 Dec 2012 09:49:37 -0000 On 07.12.2012 16:27, Andrey V. Elsukov wrote: > Hi All, > > We have discovered that ipfw(4) shows very low performance results with > our rules. One of the biggest problems is rules with O_IP6_XXX_ME > opcode. They checks match or not match packet's addresses with locally > configured IPv6 addresses. > > For IPv4 we have an in_ifaddr hash for the quick search an address, but > not for the IPv6. So, I have implemented the first patch based on the > code for the IPv4, but there are several questions I want to discuss. > > The patch is here: > http://people.freebsd.org/~ae/in6_ifaddrhash.diff > > 1. The hash size. I made it the same what IPv4 has. But I think 512 > buckets is too many. Although an IPv6 configuration can have up to twice as many addresses as the equivalent IPv4 one (because of link-local addresses), 512 is really too much; maybe 64 or 128 would be better for the common case? > > 2. What hash function is better to use? We have at least three hash functions in the kernel that I know of: the ng_netflow one, flowtable, and the one in ipfw. Can you provide some benchmarks and measurements of hashing effectiveness on real-world data for those? > > 3. Using the whole 128 bit of address to hash seems like overkill. There are people who use IPv6 address space just like plain IPv4, e.g.: XX:YY:ZZ::1, XX:YY:ZZ::2, ...
::n, or even XX:YY:ZZ::A.B.C.D, so hashing the upper 64 bits can lead to collisions. Hashing the lower 64 bits is more promising, but there can be other use cases, too. IMHO we can just test the performance of the hashing functions, see how much they differ, and decide whether it is worth discussing further. There is another problem: link-local addresses. They are all the same (or fall into a small number of distinct groups), so one (or more) buckets will always be filled by them. This can result in * some searches for global addresses being much slower * IPv6 code accepting a packet destined to the link-local address of another interface ( RFC 4291 sec 2.5.6 ) We can work around the first problem by adding global unicast addresses at the list head and link-local addresses at the list tail, but this still leaves us with the second one. One possible solution is to add the interface index as another parameter to the hash function, and use it IFF the address is site-local. > From owner-freebsd-net@FreeBSD.ORG Sat Dec 8 14:18:35 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id F2F5613B for ; Sat, 8 Dec 2012 14:18:34 +0000 (UTC) (envelope-from barney_cordoba@yahoo.com) Received: from nm36-vm3.bullet.mail.ne1.yahoo.com (nm36-vm3.bullet.mail.ne1.yahoo.com [98.138.229.115]) by mx1.freebsd.org (Postfix) with ESMTP id 745AA8FC08 for ; Sat, 8 Dec 2012 14:18:34 +0000 (UTC) Received: from [98.138.90.52] by nm36.bullet.mail.ne1.yahoo.com with NNFMP; 08 Dec 2012 14:16:21 -0000 Received: from [98.138.226.167] by tm5.bullet.mail.ne1.yahoo.com with NNFMP; 08 Dec 2012 14:16:21 -0000 Received: from [127.0.0.1] by omp1068.mail.ne1.yahoo.com with NNFMP; 08 Dec 2012 14:16:21 -0000 X-Yahoo-Newman-Property: ymail-3 X-Yahoo-Newman-Id: 902537.3254.bm@omp1068.mail.ne1.yahoo.com Received: (qmail 39576 invoked by uid 60001); 8 Dec 2012 14:16:21 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; t=1354976181;
bh=cCcKVvzQe2jx/nQq3/qUi34qiBYrnA9hs/cr8i7KDU0=; h=X-YMail-OSG:Received:X-Rocket-MIMEInfo:X-Mailer:Message-ID:Date:From:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type; b=H/BJKoZMNMlgTlzyj9Xb0TzIrrAbsnvRbd62ugBbhJNw6zizr/Te1wt4mMQ2FWU4MKp7kzV417AjvzJfaVE/VRyHeNoPEejcttNorT7T9CT7vXOFCOEDHQz3iqr6wlzDwSnN1mK5OSHxCKpCUdYg/h+Av1QvZBwM1jhgglcZ6KU= DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=X-YMail-OSG:Received:X-Rocket-MIMEInfo:X-Mailer:Message-ID:Date:From:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type; b=Mkov8DmvpED72KKv7CqWmLtSmq7nZwjxPVvMqyUwXJmwTylD4lCDB6SlKtuMs4X1o3jkU1PaXvNxqIA3wfqkpLhYmEqan9J/eH6ohQsKPLmjNcQKUP+zDU3ztRPTuZLrv4awYm5tZobRXeeSQmSO1nLlPx8vKRA7YmfZyv6Vmow=; X-YMail-OSG: 6gSNFVwVM1nudH.8ncrzN5BQuH4BCsZ2O9T1RcHiadbpkBl W4IvR51hfzgwIYgtxadGLQ1p6cv2BYfMP.DY.VItBjyi48bmrTt4jXfNpZY6 i7qzlV0r3YwAOI0XvffkLPuliEMV.BeWurqqu1sBm0df3P4opGSyo26JPr2y guLLUCfeL9Pv_hmPT1JHfazP59JKa7IPZ8T5GE.AZTz.4XYgp9A5Jm55E0Ok 8JIQggz7T.JWavkv9muNP3p6YUDRxUcTt9KYfX.Pu2hkrpI0LS7Zf_TvW_5g Nd.IEHFXfwuJAsZamrNpa0KbdxPV460WqZ0NtA1EvOh8nXSMw1rkJ4KskCfU nv_cPcfbDzTrUM1CyDzJR.z8lGwFXlliAEqEJDB3wBhnTmiAEGrmwvC9uNxf tN_KtVY_F7p_gpX1kPm.uzjXd9KAXwnVTiACZZ2QZAkAYT1BhXoNleg3Tifk iwFzaU6d8Om2blx9LfuMkAQdkuqFPPn8wXkZbsvKBHGOXvfvyYOkTgkhYGq_ p60wWQ07pxdi9SK2nZW1Os..6YFm.Ng-- Received: from [174.48.128.27] by web121603.mail.ne1.yahoo.com via HTTP; Sat, 08 Dec 2012 06:16:21 PST X-Rocket-MIMEInfo: 001.001, CgotLS0gT24gVGh1LCAxMi82LzEyLCBBZHJpYW4gQ2hhZGQgPGFkcmlhbkBmcmVlYnNkLm9yZz4gd3JvdGU6Cgo.IEZyb206IEFkcmlhbiBDaGFkZCA8YWRyaWFuQGZyZWVic2Qub3JnPgo.IFN1YmplY3Q6IFJlOiBMYXRlbmN5IGlzc3VlcyB3aXRoIGJ1Zl9yaW5nCj4gVG86ICJCYXJuZXkgQ29yZG9iYSIgPGJhcm5leV9jb3Jkb2JhQHlhaG9vLmNvbT4KPiBDYzogZnJlZWJzZC1uZXRAZnJlZWJzZC5vcmcsICJSb2JlcnQgV2F0c29uIiA8cndhdHNvbkBmcmVlYnNkLm9yZz4KPiBEYXRlOiBUaHVyc2RheSwgRGUBMAEBAQE- X-Mailer: YahooMailClassic/15.1.1 YahooMailWebService/0.8.128.478 Message-ID: <1354976181.39549.YahooMailClassic@web121603.mail.ne1.yahoo.com> Date: Sat, 8 Dec 2012 
06:16:21 -0800 (PST) From: Barney Cordoba Subject: Re: Latency issues with buf_ring To: Adrian Chadd In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: freebsd-net@freebsd.org, Robert Watson X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 08 Dec 2012 14:18:35 -0000 --- On Thu, 12/6/12, Adrian Chadd wrote: > From: Adrian Chadd > Subject: Re: Latency issues with buf_ring > To: "Barney Cordoba" > Cc: freebsd-net@freebsd.org, "Robert Watson" > Date: Thursday, December 6, 2012, 1:31 PM > There've been plenty of discussions > about "better" ways of doing this > networking stuff. > > Barney, are you able to make it to any of the developer > summits? > Perhaps the "summits" are part of the problem? The goal should be to get the best ideas, not the best ideas of those with the time, resources, and desire to attend a summit. Lists are the best summit. You can get ideas from people who may not be allowed by their contract obligations to attend such a summit.
BC From owner-freebsd-net@FreeBSD.ORG Sat Dec 8 16:43:25 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 9460AA45 for ; Sat, 8 Dec 2012 16:43:25 +0000 (UTC) (envelope-from adrian.chadd@gmail.com) Received: from mail-we0-f182.google.com (mail-we0-f182.google.com [74.125.82.182]) by mx1.freebsd.org (Postfix) with ESMTP id 2083F8FC08 for ; Sat, 8 Dec 2012 16:43:24 +0000 (UTC) Received: by mail-we0-f182.google.com with SMTP id u54so783095wey.13 for ; Sat, 08 Dec 2012 08:43:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=fyUosl2T8pcnHmgZG/J8MvjZ5RpISZZrKw7iS0o1WYc=; b=ahvxIf4SFtfs77ZMFCGVVoYCVVLHLtoLjQS1Am34VXTHv9aje/MlGwdQ8azeIWsoEl 7ki6JsksMP0eWKOQGPonJEA9Z+fogwUdCfJJKdE/WcbXKe5JoL/LIyN1IyTymEU8s6Eh LZ1IXY+1WJct2HuURYzMH4iDlCas/m1pM9IhNoTvUysXmX5GKwHBspHgZFXaTQDMkIVc QVONrNPcLSE0WoLUqNrC7nFdoef7hoARQAgU3DLkLzi6V2XQnVr8aaRv7Q0bUNr5fOgH YdY57HOAl1F6wpAUrdvg0F8XJgXnz5Ivt/9UydKNc0UlLQAV8VfMPmd/PcXhno2PtyVQ TyXA== MIME-Version: 1.0 Received: by 10.216.85.211 with SMTP id u61mr3622786wee.212.1354985003781; Sat, 08 Dec 2012 08:43:23 -0800 (PST) Sender: adrian.chadd@gmail.com Received: by 10.217.57.9 with HTTP; Sat, 8 Dec 2012 08:43:23 -0800 (PST) In-Reply-To: <1354976181.39549.YahooMailClassic@web121603.mail.ne1.yahoo.com> References: <1354976181.39549.YahooMailClassic@web121603.mail.ne1.yahoo.com> Date: Sat, 8 Dec 2012 08:43:23 -0800 X-Google-Sender-Auth: bXq710l51IzsOcc5e4aDu_14LPs Message-ID: Subject: Re: Latency issues with buf_ring From: Adrian Chadd To: Barney Cordoba Content-Type: text/plain; charset=ISO-8859-1 Cc: FreeBSD Net X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: 
List-Help: List-Subscribe: , X-List-Received-Date: Sat, 08 Dec 2012 16:43:25 -0000 The summits are just another tool for collaboration. Plenty of discussion and coding is done via list interaction. The problem isn't how the collaboration is done. The problem is having people to design and code things up. :-) Otherwise talk is just that - talk. So, someone come up with a few examples of how to better implement the network device producer/consumer model and get back to me/us about it. Adrian
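Adrian's closing request asks for examples of how the network device producer/consumer model could be implemented. The following is a minimal sketch in portable C11, assuming the model Andre describes earlier in the thread: the transmit path puts packets straight into a fixed-size ring and gets ENOBUFS back when the ring is full, with no intermediate soft queue. All names (`struct tx_ring`, `tx_ring_init`, `tx_ring_enqueue`, `tx_ring_dequeue`, `TX_RING_SIZE`) are hypothetical; this is not the buf_ring(9) API or any driver's code, `void *` stands in for an mbuf, and the sketch assumes exactly one producer and one consumer.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ring size; must be a power of two so masking works. */
#define TX_RING_SIZE 8u

struct tx_ring {
    void *slots[TX_RING_SIZE];  /* packet pointers (stand-in for mbufs) */
    _Atomic uint32_t prod;      /* free-running count of enqueued packets */
    _Atomic uint32_t cons;      /* free-running count of dequeued packets */
};

static void tx_ring_init(struct tx_ring *r)
{
    atomic_init(&r->prod, 0);
    atomic_init(&r->cons, 0);
}

/*
 * Producer side (transmit path): place the packet directly in the ring,
 * or return ENOBUFS when it is full instead of soft-queueing it.
 * Safe for exactly one producer thread.
 */
static int tx_ring_enqueue(struct tx_ring *r, void *pkt)
{
    uint32_t prod = atomic_load_explicit(&r->prod, memory_order_relaxed);
    uint32_t cons = atomic_load_explicit(&r->cons, memory_order_acquire);

    /* Unsigned subtraction stays correct when the counters wrap. */
    if (prod - cons == TX_RING_SIZE)
        return ENOBUFS;
    r->slots[prod & (TX_RING_SIZE - 1)] = pkt;
    /* Release: the slot write is visible before the new index. */
    atomic_store_explicit(&r->prod, prod + 1, memory_order_release);
    return 0;
}

/*
 * Consumer side (stand-in for TX completion): remove the oldest packet,
 * or return NULL if the ring is empty. Safe for exactly one consumer.
 */
static void *tx_ring_dequeue(struct tx_ring *r)
{
    uint32_t cons = atomic_load_explicit(&r->cons, memory_order_relaxed);
    uint32_t prod = atomic_load_explicit(&r->prod, memory_order_acquire);
    void *pkt;

    if (cons == prod)
        return NULL;
    pkt = r->slots[cons & (TX_RING_SIZE - 1)];
    /* Release: the slot read completes before the producer may reuse it. */
    atomic_store_explicit(&r->cons, cons + 1, memory_order_release);
    return pkt;
}
```

Returning ENOBUFS pushes back-pressure onto the caller rather than hiding it in another queue, which is the latency argument Andre makes. The ALTQ interactions he mentions, and the multiple-producer case that buf_ring(9) handles with compare-and-swap, are deliberately left out of this sketch.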