Date: Tue, 8 Jul 2008 04:16:42 +1000 (EST)
From: Bruce Evans
To: Andre Oppermann
Cc: Ingo Flaschberger, FreeBSD Net, Bart Van Kerckhove, Paul
Subject: Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
In-Reply-To: <48724238.2020103@freebsd.org>
Message-ID: <20080708034304.R21502@delplex.bde.org>
On Mon, 7 Jul 2008, Andre Oppermann wrote:

> Bruce Evans wrote:
>> So it seems that the major overheads are not near the driver (as I already
>> knew), and upper layers are responsible for most of the cache misses.
>> The packet header is accessed even in monitor mode, so I think most of
>> the cache misses in upper layers are not related to the packet header.
>> Maybe they are due mainly to perfect non-locality for mbufs.
>
> Monitor mode doesn't access the payload packet header.  It only looks
> at the mbuf (which has a structure called mbuf packet header).  The mbuf
> header is hot in the cache because the driver just touched it and filled
> in the information.  The packet content (the payload) is cold and just
> arrived via DMA in DRAM.

Why does it use ntohs() then? :-).  From if_ethersubr.c:

% static void
% ether_input(struct ifnet *ifp, struct mbuf *m)
% {
% 	struct ether_header *eh;
% 	u_short etype;
%
% 	if ((ifp->if_flags & IFF_UP) == 0) {
% 		m_freem(m);
% 		return;
% 	}
% #ifdef DIAGNOSTIC
% 	if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0) {
% 		if_printf(ifp, "discard frame at !IFF_DRV_RUNNING\n");
% 		m_freem(m);
% 		return;
% 	}
% #endif
% 	/*
% 	 * Do consistency checks to verify assumptions
% 	 * made by code past this point.
% 	 */
% 	if ((m->m_flags & M_PKTHDR) == 0) {
% 		if_printf(ifp, "discard frame w/o packet header\n");
% 		ifp->if_ierrors++;
% 		m_freem(m);
% 		return;
% 	}
% 	if (m->m_len < ETHER_HDR_LEN) {
% 		/* XXX maybe should pullup? */
% 		if_printf(ifp, "discard frame w/o leading ethernet "
% 				"header (len %u pkt len %u)\n",
% 				m->m_len, m->m_pkthdr.len);
% 		ifp->if_ierrors++;
% 		m_freem(m);
% 		return;
% 	}
% 	eh = mtod(m, struct ether_header *);

Point outside of mbuf header.

% 	etype = ntohs(eh->ether_type);

First access outside of mbuf header.
But this seems to be bogus and might be fixed by compiler optimization,
since etype is not used until after the monitor mode returns.  This may
have been broken by debugging cruft -- in 5.2, etype is used immediately
after here in a printf about discarding oversize frames.  The compiler
might also pessimize things by reordering code.

% 	if (m->m_pkthdr.rcvif == NULL) {
% 		if_printf(ifp, "discard frame w/o interface pointer\n");
% 		ifp->if_ierrors++;
% 		m_freem(m);
% 		return;
% 	}
% #ifdef DIAGNOSTIC
% 	if (m->m_pkthdr.rcvif != ifp) {
% 		if_printf(ifp, "Warning, frame marked as received on %s\n",
% 			m->m_pkthdr.rcvif->if_xname);
% 	}
% #endif
%
% 	if (ETHER_IS_MULTICAST(eh->ether_dhost)) {
% 		if (ETHER_IS_BROADCAST(eh->ether_dhost))
% 			m->m_flags |= M_BCAST;
% 		else
% 			m->m_flags |= M_MCAST;
% 		ifp->if_imcasts++;
% 	}

Another dereference of eh (2 unless optimizable and optimized).  Here the
result is actually used early, but I think you don't care enough about
maintaining if_imcasts to do this.

%
% #ifdef MAC
% 	/*
% 	 * Tag the mbuf with an appropriate MAC label before any other
% 	 * consumers can get to it.
% 	 */
% 	mac_ifnet_create_mbuf(ifp, m);
% #endif
%
% 	/*
% 	 * Give bpf a chance at the packet.
% 	 */
% 	ETHER_BPF_MTAP(ifp, m);

I think this can access the whole packet, but usually doesn't.

%
% 	/*
% 	 * If the CRC is still on the packet, trim it off. We do this once
% 	 * and once only in case we are re-entered. Nothing else on the
% 	 * Ethernet receive path expects to see the FCS.
% 	 */
% 	if (m->m_flags & M_HASFCS) {
% 		m_adj(m, -ETHER_CRC_LEN);
% 		m->m_flags &= ~M_HASFCS;
% 	}
%
% 	ifp->if_ibytes += m->m_pkthdr.len;
%
% 	/* Allow monitor mode to claim this frame, after stats are updated. */
% 	if (ifp->if_flags & IFF_MONITOR) {
% 		m_freem(m);
% 		return;
% 	}

Finally return in monitor mode.  I don't see any stats update before here
except for the stray if_imcasts one.
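To make the ordering concern concrete, here is a minimal sketch, not the
actual if_ethersubr.c code, of an input routine that touches only mbuf
metadata before the monitor-mode return and reads the Ethernet header
afterwards.  The sk_* types, SK_IFF_MONITOR and the payload_reads counter
are hypothetical stand-ins invented for illustration; ntohs() is taken
from the POSIX <arpa/inet.h>.  In the real ether_input() the
ETHER_IS_MULTICAST() test and the bpf tap also sit before the return, so
deferring etype alone would probably not be enough.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <arpa/inet.h>	/* ntohs(), htons() */

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
#define SK_IFF_MONITOR	0x1

struct sk_ether_header {
	uint8_t		ether_dhost[6];
	uint8_t		ether_shost[6];
	uint16_t	ether_type;	/* network byte order */
};

struct sk_mbuf {
	int	 m_len;		/* metadata: warm, just written by the driver */
	void	*m_data;	/* frame contents: cold, just arrived via DMA */
};

struct sk_ifnet {
	int		if_flags;
	unsigned long	if_ibytes;
	unsigned long	payload_reads;	/* instrumentation for this sketch only */
};

/*
 * Returns the host-order ethertype, or -1 if monitor mode claimed the
 * frame before its contents were read.
 */
static int
sk_ether_input(struct sk_ifnet *ifp, struct sk_mbuf *m)
{
	const struct sk_ether_header *eh;

	assert(ifp != NULL && m != NULL);

	/* Only mbuf metadata is touched up to here. */
	ifp->if_ibytes += (unsigned long)m->m_len;
	if (ifp->if_flags & SK_IFF_MONITOR)
		return (-1);

	/* First dereference of the frame contents, after the early return. */
	eh = (const struct sk_ether_header *)m->m_data;
	ifp->payload_reads++;
	return (ntohs(eh->ether_type));
}
```

With this ordering a frame claimed by monitor mode never causes a load
from the frame data, so whether the compiler sinks the ntohs() no longer
matters.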
BTW, stats behave strangely in monitor mode:
- netstat -I 1 works except:
  - the byte counts are 0 every second second (the next second counts
    the previous 2), while the packet counts are updated every second
  - one system started showing bge0 stats for all interfaces.  Perhaps
    unrelated.
- systat -ip shows all counts 0.

I think this is due to stats maintained by the driver working but other
stats not.  The mixture seems strange at user level.

Bruce