From owner-freebsd-arch@FreeBSD.ORG Sun Jan 19 23:23:25 2014
Date: Sun, 19 Jan 2014 15:23:23 -0800
Sender: adrian.chadd@gmail.com
From: Adrian Chadd
To: FreeBSD Net, "freebsd-arch@freebsd.org"
Subject: Re: [rfc] set inp_flowid on initial TCP connection
List-Id: Discussion related to FreeBSD architecture
Ok, I've committed this to -HEAD.

Thanks,

-a

On 16 January 2014 12:28, Adrian Chadd wrote:
> Hi,
>
> This patch sets the inp_flowid on incoming connections. Without this,
> the initial connection has no flowid, so things like the per-CPU TCP
> callwheel stuff would map to a different CPU on the initial incoming
> setup.
>
> -a
>
> Index: sys/netinet/tcp_syncache.c
> ===================================================================
> --- sys/netinet/tcp_syncache.c	(revision 260499)
> +++ sys/netinet/tcp_syncache.c	(working copy)
> @@ -722,6 +722,16 @@
>  #endif
>
>  	/*
> +	 * If there's an mbuf and it has a flowid, then let's initialise the
> +	 * inp with that particular flowid.
> +	 */
> +	if (m != NULL && m->m_flags & M_FLOWID) {
> +		inp->inp_flags |= INP_HW_FLOWID;
> +		inp->inp_flags &= ~INP_SW_FLOWID;
> +		inp->inp_flowid = m->m_pkthdr.flowid;
> +	}
> +
> +	/*
>  	 * Install in the reservation hash table for now, but don't yet
>  	 * install a connection group since the full 4-tuple isn't yet
>  	 * configured.
From owner-freebsd-arch@FreeBSD.ORG Wed Jan 22 11:32:46 2014
Message-ID: <52DFAC31.6030905@FreeBSD.org>
Date: Wed, 22 Jan 2014 13:32:01 +0200
From: Andriy Gapon <avg@FreeBSD.org>
To: freebsd-hackers@FreeBSD.org, freebsd-arch@FreeBSD.org
Subject: core dump vs kern.ipc.shm_use_phys

It seems that if kern.ipc.shm_use_phys is enabled, then shared memory
regions are not included in a core dump: each_writable_segment() in
sys/kern/imgact_elf.c skips OBJT_PHYS objects.
--
Andriy Gapon

From owner-freebsd-arch@FreeBSD.ORG Fri Jan 24 07:36:24 2014
Message-ID: <52E21721.5010309@yandex-team.ru>
Date: Fri, 24 Jan 2014 11:32:49 +0400
From: "Alexander V. Chernikov"
To: "net@freebsd.org"
Cc: arch@freebsd.org, hackers@freebsd.org, "Andrey V. Elsukov"
Subject: "slow path" in network code || IPv6 panic on interface removal

Hello guys!

Typically we're mostly interested in making the "fast" paths in our
code run faster.  However, it seems it is time to take care of code
which is either called rarely or is quite complex in terms of relative
code size and/or locking.  Some good examples from the current
codebase:

* L3->L2 mapping, like ARP handling: while doing arpresolve we discover
  there is no valid entry, so we start doing complex locking and
  request preparation/sending in the same piece of code.  This washes
  out both i/d caches and makes the sending process _more_
  unpredictable.  Here we could queue the given mbuf for delayed
  processing and return quickly.

* ip_fastfwd() handling corner cases.  This is already optimized in
  terms of splitting the "fast" and "slow" code paths for all cases.
* ipfw(4) (and probably other pfil consumers) generating/sending
  various icmp/icmp6 packets for an inbound mbuf.

What exactly is proposed:

- Another netisr queue for handling the different types of packets
- Meta-information stored in an mbuf_tag attached to the packet
- An ifnet departure handler taking care of packets queued from/to a
  killed ifnet
- An API to register/unregister/dispatch a given type of traffic

The real problem solved by this approach (traced by ae@):

We're using per-LLE IPv6 timers for various purposes.  Most of them
require LLE modifications, so the timer function starts with the LLE
write lock held.  Some timer events require us to send neighbor
solicitation messages, which involves a) source address selection
(requiring the LLE lock to be held) and b) calling ip6_output(), which
requires the LLE lock NOT to be held.  It is solved exactly as in the
IPv4 ARP handling code: the timer function drops the write lock before
calling nd6_ns_output().

Dropping/reacquiring the lock is error-prone.  For example, the
following scenario is possible: we're calling if_detach(ifp) (thread
T1) while nd6_llinfo_timer runs (thread T2).  Then the following can
happen:

#1 T2 releases the LLE lock and runs nd6_ns_output().
#2 T1 proceeds with detaching: in6_ifdetach() -> in6_purgeaddr() ->
   nd6_rem_ifa_lle() -> in6_lltable_prefix_free(), which removes all
   LLEs for the given prefix, acquiring each LLE write lock.  "Our"
   LLE is not destroyed since it is refcounted by
   nd6_llinfo_settimer_locked().
#3 T2 proceeds with nd6_ns_output(), selecting a source address (which
   involves acquiring the LLE read lock).
#4 T1 finishes detaching the interface addresses and sets ifp->if_addr
   to NULL.
#5 T2 calls nd6_ifptomac(), which reads the interface MAC from the
   now-NULL ifp->if_addr.
#6 The user inspects the core generated by the previous step.

Using the new API, we can avoid #6 by making the following code
changes:

* The LLE timer does not drop/reacquire the LLE lock.
* We require nd6_ns_output() callers to lock the LLE if one is
  provided.
* nd6_ns_output() uses the "slow" path instead of sending the mbuf to
  ip6_output() immediately if the LLE is not NULL.

What do you think?

--------------050804080408080705010802
Content-Type: text/x-patch; name="dly_fin2.diff"
Content-Disposition: attachment; filename="dly_fin2.diff"

Index: sys/conf/files
===================================================================
--- sys/conf/files	(revision 260983)
+++ sys/conf/files	(working copy)
@@ -3044,6 +3044,7 @@
 net/bpf_filter.c		optional bpf | netgraph_bpf
 net/bpf_zerocopy.c		optional bpf
 net/bridgestp.c			optional bridge | if_bridge
 net/flowtable.c			optional flowtable inet | flowtable inet6
+net/delayed_dispatch.c		standard
 net/ieee8023ad_lacp.c		optional lagg
 net/if.c			standard
 net/if_arcsubr.c		optional arcnet
Index: sys/net/netisr.c
===================================================================
--- sys/net/netisr.c	(revision 260983)
+++ sys/net/netisr.c	(working copy)
@@ -555,6 +555,81 @@ netisr_setqlimit(const struct netisr_handler *nhp,
 }
 
 /*
+ * Scan the workqueue and delete the mbufs selected by the scan handler.
+ */
+static int
+netisr_scan_workqueue(struct netisr_work *npwp, netisr_scan_t *scan_f,
+    void *data)
+{
+	struct mbuf *m, *m_prev;
+	int deleted;
+
+	deleted = 0;
+	m_prev = NULL;
+	m = npwp->nw_head;
+	while (m != NULL) {
+		if (scan_f(m, data) == 0) {
+			m_prev = m;
+			m = m->m_nextpkt;
+			continue;
+		}
+
+		/* Handler requested item deletion */
+		if (m_prev == NULL)
+			npwp->nw_head = m->m_nextpkt;
+		else
+			m_prev->m_nextpkt = m->m_nextpkt;
+
+		if (m->m_nextpkt == NULL)
+			npwp->nw_tail = m_prev;
+
+		npwp->nw_len--;
+		m_freem(m);
+		deleted++;
+
+		if (m_prev == NULL)
+			m = npwp->nw_head;
+		else
+			m = m_prev->m_nextpkt;
+	}
+
+	return (deleted);
+}
+
+int
+netisr_scan(unsigned int proto, netisr_scan_t *scan_f, void *data)
+{
+#ifdef NETISR_LOCKING
+	struct rm_priotracker tracker;
+#endif
+	struct netisr_proto *np;
+	struct netisr_work *npwp;
+	unsigned int i;
+	int deleted;
+
+#ifdef NETISR_LOCKING
+	NETISR_RLOCK(&tracker);
+#endif
+
+	deleted = 0;
+
+	KASSERT(scan_f != NULL, ("%s: scan function is NULL", __func__));
+
+	np = &netisr_proto[proto];
+
+	CPU_FOREACH(i) {
+		npwp = &(DPCPU_ID_PTR(i, nws))->nws_work[proto];
+		deleted += netisr_scan_workqueue(npwp, scan_f, data);
+	}
+
+#ifdef NETISR_LOCKING
+	NETISR_RUNLOCK(&tracker);
+#endif
+
+	return (deleted);
+}
+
+/*
  * Drain all packets currently held in a particular protocol work queue.
  */
 static void
Index: sys/net/netisr.h
===================================================================
--- sys/net/netisr.h	(revision 260983)
+++ sys/net/netisr.h	(working copy)
@@ -61,6 +61,7 @@
 #define	NETISR_IPV6	10
 #define	NETISR_NATM	11
 #define	NETISR_EPAIR	12	/* if_epair(4) */
+#define	NETISR_SLOWPATH	13	/* delayed dispatch */
 
 /*
  * Protocol ordering and affinity policy constants.  See the detailed
@@ -178,6 +179,7 @@ struct sysctl_netisr_work {
  */
 struct mbuf;
 typedef void netisr_handler_t(struct mbuf *m);
+typedef int netisr_scan_t(struct mbuf *m, void *);
 typedef struct mbuf *netisr_m2cpuid_t(struct mbuf *m, uintptr_t source,
     u_int *cpuid);
 typedef struct mbuf *netisr_m2flow_t(struct mbuf *m, uintptr_t source);
@@ -212,6 +214,7 @@ void netisr_getqlimit(const struct netisr_handler
 void	netisr_register(const struct netisr_handler *nhp);
 int	netisr_setqlimit(const struct netisr_handler *nhp, u_int qlimit);
 void	netisr_unregister(const struct netisr_handler *nhp);
+int	netisr_scan(u_int proto, netisr_scan_t *, void *);
 
 /*
  * Process a packet destined for a protocol, and attempt direct dispatch.
Index: sys/netinet6/nd6.c
===================================================================
--- sys/netinet6/nd6.c	(revision 260983)
+++ sys/netinet6/nd6.c	(working copy)
@@ -153,6 +153,8 @@ nd6_init(void)
 	callout_init(&V_nd6_slowtimo_ch, 0);
 	callout_reset(&V_nd6_slowtimo_ch, ND6_SLOWTIMER_INTERVAL * hz,
 	    nd6_slowtimo, curvnet);
+
+	nd6_nbr_init();
 }
 
 #ifdef VIMAGE
@@ -160,6 +162,7 @@ void
 nd6_destroy()
 {
 
+	nd6_nbr_destroy();
 	callout_drain(&V_nd6_slowtimo_ch);
 	callout_drain(&V_nd6_timer_ch);
 }
@@ -500,9 +503,7 @@ nd6_llinfo_timer(void *arg)
 		if (ln->la_asked < V_nd6_mmaxtries) {
 			ln->la_asked++;
 			nd6_llinfo_settimer_locked(ln,
 			    (long)ndi->retrans * hz / 1000);
-			LLE_WUNLOCK(ln);
 			nd6_ns_output(ifp, NULL, dst, ln, 0);
-			LLE_WLOCK(ln);
 		} else {
 			struct mbuf *m = ln->la_hold;
 			if (m) {
@@ -547,9 +548,7 @@ nd6_llinfo_timer(void *arg)
 			ln->la_asked = 1;
 			ln->ln_state = ND6_LLINFO_PROBE;
 			nd6_llinfo_settimer_locked(ln,
 			    (long)ndi->retrans * hz / 1000);
-			LLE_WUNLOCK(ln);
 			nd6_ns_output(ifp, dst, dst, ln, 0);
-			LLE_WLOCK(ln);
 		} else {
 			ln->ln_state = ND6_LLINFO_STALE; /* XXX */
 			nd6_llinfo_settimer_locked(ln, (long)V_nd6_gctimer * hz);
@@ -559,9 +558,7 @@ nd6_llinfo_timer(void *arg)
 		if (ln->la_asked < V_nd6_umaxtries) {
 			ln->la_asked++;
 			nd6_llinfo_settimer_locked(ln,
 			    (long)ndi->retrans * hz / 1000);
-			LLE_WUNLOCK(ln);
 			nd6_ns_output(ifp, dst, dst, ln, 0);
-			LLE_WLOCK(ln);
 		} else {
 			EVENTHANDLER_INVOKE(lle_event, ln, LLENTRY_EXPIRED);
 			(void)nd6_free(ln, 0);
Index: sys/netinet6/nd6.h
===================================================================
--- sys/netinet6/nd6.h	(revision 260983)
+++ sys/netinet6/nd6.h	(working copy)
@@ -421,6 +421,8 @@
 int nd6_storelladdr(struct ifnet *, struct mbuf *, const struct sockaddr *,
     u_char *, struct llentry **);
 
 /* nd6_nbr.c */
+void nd6_nbr_init(void);
+void nd6_nbr_destroy(void);
 void nd6_na_input(struct mbuf *, int, int);
 void nd6_na_output(struct ifnet *, const struct in6_addr *,
     const struct in6_addr *, u_long, int, struct sockaddr *);
Index: sys/netinet6/nd6_nbr.c
===================================================================
--- sys/netinet6/nd6_nbr.c	(revision 260983)
+++ sys/netinet6/nd6_nbr.c	(working copy)
@@ -74,6 +74,7 @@ __FBSDID("$FreeBSD$");
 #include
 #include
 #include
+#include
 
 #define SDL(s) ((struct sockaddr_dl *)s)
 
@@ -87,12 +88,37 @@ static void nd6_dad_ns_input(struct ifaddr *);
 static void nd6_dad_na_input(struct ifaddr *);
 static void nd6_na_output_fib(struct ifnet *, const struct in6_addr *,
     const struct in6_addr *, u_long, int, struct sockaddr *, u_int);
+static int nd6_ns_output2(struct mbuf *, int, uintptr_t, struct ifnet *);
 
 VNET_DEFINE(int, dad_ignore_ns) = 0;	/* ignore NS in DAD - specwise incorrect */
 VNET_DEFINE(int, dad_maxtry) = 15;	/* max # of *tries* to transmit DAD packet */
 #define	V_dad_ignore_ns			VNET(dad_ignore_ns)
 #define	V_dad_maxtry			VNET(dad_maxtry)
 
+static struct dly_dispatcher dly_d = {
+	.name = "nd6_ns",
+	.dly_dispatch = nd6_ns_output2,
+};
+
+static int nd6_dlyid;
+
+void
+nd6_nbr_init()
+{
+
+	if (IS_DEFAULT_VNET(curvnet))
+		nd6_dlyid = dly_register(&dly_d);
+}
+
+void
+nd6_nbr_destroy()
+{
+
+	if (IS_DEFAULT_VNET(curvnet))
+		dly_unregister(nd6_dlyid);
+}
+
+
 /*
  * Input a Neighbor Solicitation Message.
  *
@@ -366,11 +392,34 @@ nd6_ns_input(struct mbuf *m, int off, int icmp6len
 	m_freem(m);
 }
 
+static int
+nd6_ns_output2(struct mbuf *m, int dad, uintptr_t _data, struct ifnet *ifp)
+{
+	struct ip6_moptions im6o;
+
+	if (m->m_flags & M_MCAST) {
+		im6o.im6o_multicast_ifp = ifp;
+		im6o.im6o_multicast_hlim = 255;
+		im6o.im6o_multicast_loop = 0;
+	}
+
+	/* Zero the ingress interface so as not to fool PFIL consumers */
+	m->m_pkthdr.rcvif = NULL;
+
+	ip6_output(m, NULL, NULL, dad ? IPV6_UNSPECSRC : 0, &im6o, NULL, NULL);
+	icmp6_ifstat_inc(ifp, ifs6_out_msg);
+	icmp6_ifstat_inc(ifp, ifs6_out_neighborsolicit);
+	ICMP6STAT_INC(icp6s_outhist[ND_NEIGHBOR_SOLICIT]);
+
+	return (0);
+}
+
 /*
  * Output a Neighbor Solicitation Message. Caller specifies:
  *	- ICMP6 header source IP6 address
  *	- ND6 header target IP6 address
  *	- ND6 header source datalink address
+ *	Note: the llentry has to be locked if specified.
  *
  * Based on RFC 2461
  * Based on RFC 2462 (duplicate address detection)
@@ -386,11 +435,9 @@ nd6_ns_output(struct ifnet *ifp, const struct in6_
 	struct m_tag *mtag;
 	struct ip6_hdr *ip6;
 	struct nd_neighbor_solicit *nd_ns;
-	struct ip6_moptions im6o;
 	int icmp6len;
 	int maxlen;
 	caddr_t mac;
-	struct route_in6 ro;
 
 	if (IN6_IS_ADDR_MULTICAST(taddr6))
 		return;
@@ -413,13 +460,8 @@ nd6_ns_output(struct ifnet *ifp, const struct in6_
 	if (m == NULL)
 		return;
 
-	bzero(&ro, sizeof(ro));
-
 	if (daddr6 == NULL || IN6_IS_ADDR_MULTICAST(daddr6)) {
 		m->m_flags |= M_MCAST;
-		im6o.im6o_multicast_ifp = ifp;
-		im6o.im6o_multicast_hlim = 255;
-		im6o.im6o_multicast_loop = 0;
 	}
 
 	icmp6len = sizeof(*nd_ns);
@@ -468,7 +510,6 @@ nd6_ns_output(struct ifnet *ifp, const struct in6_
 		hsrc = NULL;
 		if (ln != NULL) {
-			LLE_RLOCK(ln);
 			if (ln->la_hold != NULL) {
 				struct ip6_hdr *hip6;	/* hold ip6 */
 
@@ -483,7 +524,6 @@ nd6_ns_output(struct ifnet *ifp, const struct in6_
 					hsrc = &hip6->ip6_src;
 				}
 			}
-			LLE_RUNLOCK(ln);
 		}
 		if (hsrc && (ifa = (struct ifaddr *)in6ifa_ifpwithaddr(ifp,
 		    hsrc)) != NULL) {
@@ -502,7 +542,7 @@ nd6_ns_output(struct ifnet *ifp, const struct in6_
 			oifp = ifp;
 			error = in6_selectsrc(&dst_sa, NULL,
-			    NULL, &ro, NULL, &oifp, &src_in);
+			    NULL, NULL, NULL, &oifp, &src_in);
 			if (error) {
 				char ip6buf[INET6_ADDRSTRLEN];
 				nd6log((LOG_DEBUG,
@@ -572,20 +612,16 @@ nd6_ns_output(struct ifnet *ifp, const struct in6_
 		m_tag_prepend(m, mtag);
 	}
 
-	ip6_output(m, NULL, &ro, dad ? IPV6_UNSPECSRC : 0, &im6o, NULL, NULL);
-	icmp6_ifstat_inc(ifp, ifs6_out_msg);
-	icmp6_ifstat_inc(ifp, ifs6_out_neighborsolicit);
-	ICMP6STAT_INC(icp6s_outhist[ND_NEIGHBOR_SOLICIT]);
+	if (ln == NULL)
+		nd6_ns_output2(m, dad, 0, ifp);
+	else {
+		m->m_pkthdr.rcvif = ifp;	/* Save VNET */
+		dly_queue(nd6_dlyid, m, dad, 0, ifp);
+	}
 
-	/* We don't cache this route. */
-	RO_RTFREE(&ro);
-
 	return;
 
   bad:
-	if (ro.ro_rt) {
-		RTFREE(ro.ro_rt);
-	}
 	m_freem(m);
 	return;
 }
Index: sys/sys/mbuf.h
===================================================================
--- sys/sys/mbuf.h	(revision 260983)
+++ sys/sys/mbuf.h	(working copy)
@@ -1022,6 +1022,7 @@ struct mbuf	*m_unshare(struct mbuf *, int);
 #define	PACKET_TAG_CARP				28 /* CARP info */
 #define	PACKET_TAG_IPSEC_NAT_T_PORTS		29 /* two uint16_t */
 #define	PACKET_TAG_ND_OUTGOING			30 /* ND outgoing */
+#define	PACKET_TAG_DISPATCH_INFO		31 /* Netisr slow dispatch */
 
 /* Specific cookies and tags. */
--- /dev/null	2014-01-24 00:33:00.000000000 +0400
+++ sys/net/delayed_dispatch.c	2014-01-24 00:17:05.573964680 +0400
@@ -0,0 +1,364 @@
+/*-
+ * Copyright (c) 2014 Alexander V. Chernikov
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#include
+__FBSDID("$FreeBSD: head/sys/net/delayed_dispatch.c$");
+
+/*
+ * Delayed dispatch is a so-called "slowpath" packet path which permits
+ * you to enqueue mbufs requiring complex dispatch (and/or possibly
+ * complex locking) into a separate netisr queue instead of trying to
+ * deal with them in the "fast" code path.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+
+#include
+#include
+
+struct dly_info {
+	struct dly_dispatcher	*index;
+	int			alloc;
+	int			count;
+	struct rmlock		lock;
+};
+#define	DLY_ALLOC_ITEMS	16
+
+static struct dly_info dly;
+
+#define	DLY_LOCK_INIT()	rm_init(&dly.lock, "dly_lock")
+#define	DLY_RLOCK()	rm_rlock(&dly.lock, &tracker)
+#define	DLY_RUNLOCK()	rm_runlock(&dly.lock, &tracker)
+#define	DLY_WLOCK()	rm_wlock(&dly.lock)
+#define	DLY_WUNLOCK()	rm_wunlock(&dly.lock)
+#define	DLY_READER	struct rm_priotracker tracker
+
+static eventhandler_tag ifdetach_tag;
+
+/*
+ * Adds an mbuf to the slowpath queue.  Additional information
+ * is stored in a PACKET_TAG_DISPATCH_INFO mbuf tag.
+ * Returns 0 if successful, an error code otherwise.
+ */
+int
+dly_queue(int dtype, struct mbuf *m, int dsubtype, uintptr_t data,
+    struct ifnet *ifp)
+{
+	struct dly_item *item;
+	struct m_tag *dtag;
+	DLY_READER;
+
+	/* Ensure we're not going to loop the packet */
+	if ((dtag = m_tag_find(m, PACKET_TAG_DISPATCH_INFO, NULL)) != NULL) {
+		printf("tag found: %p\n", dtag);
+		return (EINVAL);
+	}
+
+	DLY_RLOCK();
+	if (dtype < 0 || dtype >= dly.alloc || dly.index[dtype].name == NULL) {
+		DLY_RUNLOCK();
+		printf("invalid dtype: 0..%d..%d\n", dtype, dly.alloc);
+		return (EINVAL);
+	}
+	DLY_RUNLOCK();
+
+	VNET_ASSERT(m->m_pkthdr.rcvif != NULL,
+	    ("%s:%d rcvif == NULL: m=%p", __func__, __LINE__, m));
+
+	/*
+	 * Do not allocate a tag for basic IPv4/IPv6 output.
+	 */
+	if (dtype != 0) {
+		dtag = m_tag_get(PACKET_TAG_DISPATCH_INFO,
+		    sizeof(struct dly_item), M_NOWAIT);
+		if (dtag == NULL)
+			return (ENOBUFS);
+
+		item = (struct dly_item *)(dtag + 1);
+
+		item->type = dtype;
+		item->subtype = dsubtype;
+		item->data = data;
+		item->ifp = ifp;
+
+		m_tag_prepend(m, dtag);
+	}
+
+	netisr_queue(NETISR_SLOWPATH, m);
+
+	return (0);
+}
+
+/*
+ * Adds an mbuf to the slowpath queue.  A user-provided buffer
+ * of size @size is stored inside a PACKET_TAG_DISPATCH_INFO
+ * mbuf tag.  The buffer needs to embed a properly filled
+ * dly_item structure at its beginning.  Such buffers need to be
+ * dispatched by a dly_pdispatch() handler.
+ *
+ * Returns 0 if successful, an error code otherwise.
+ */
+int
+dly_pqueue(int dtype, struct mbuf *m, struct dly_item *item, size_t size)
+{
+	struct m_tag *dtag;
+	DLY_READER;
+
+	/* Ensure we're not going to loop the packet */
+	if ((dtag = m_tag_find(m, PACKET_TAG_DISPATCH_INFO, NULL)) != NULL)
+		return (EINVAL);
+
+	DLY_RLOCK();
+	if (dtype < 0 || dtype >= dly.alloc || dly.index[dtype].name == NULL) {
+		DLY_RUNLOCK();
+		return (EINVAL);
+	}
+	DLY_RUNLOCK();
+
+	VNET_ASSERT(m->m_pkthdr.rcvif != NULL,
+	    ("%s:%d rcvif == NULL: m=%p", __func__, __LINE__, m));
+
+	dtag = m_tag_get(PACKET_TAG_DISPATCH_INFO, size, M_NOWAIT);
+	if (dtag == NULL)
+		return (ENOBUFS);
+
+	memcpy(dtag + 1, item, size);
+	m_tag_prepend(m, dtag);
+	netisr_queue(NETISR_SLOWPATH, m);
+
+	return (0);
+}
+
+/*
+ * Base netisr handler for the slowpath.
+ */
+static void
+dly_dispatch_item(struct mbuf *m)
+{
+	struct m_tag *dtag;
+	struct dly_item *item;
+	int dtype;
+	struct dly_dispatcher *dld;
+	DLY_READER;
+
+	item = NULL;
+	dtype = 0;
+
+	if ((dtag = m_tag_find(m, PACKET_TAG_DISPATCH_INFO, NULL)) != NULL) {
+		item = (struct dly_item *)(dtag + 1);
+		dtype = item->type;
+	}
+
+	DLY_RLOCK();
+	if (dtype < 0 || dtype >= dly.alloc || dly.index[dtype].name == NULL) {
+		DLY_RUNLOCK();
+		m_freem(m);	/* netisr handlers must consume the mbuf */
+		return;
+	}
+
+	dld = &dly.index[dtype];
+
+	if (dld->dly_dispatch != NULL)
+		dld->dly_dispatch(m, item->subtype, item->data, item->ifp);
+	else
+		dld->dly_pdispatch(m, item);
+
+	DLY_RUNLOCK();
+}
+
+
+/*
+ * Checks whether a queued item was received on, or is going to be
+ * transmitted via, the interface being destroyed.
+ */
+static int
+dly_scan_ifp(struct mbuf *m, void *_data)
+{
+	struct m_tag *dtag;
+	struct dly_item *item;
+	struct ifnet *difp;
+
+	difp = (struct ifnet *)_data;
+
+	if (m->m_pkthdr.rcvif == difp)
+		return (1);
+
+	if ((dtag = m_tag_find(m, PACKET_TAG_DISPATCH_INFO, NULL)) != NULL) {
+		item = (struct dly_item *)(dtag + 1);
+		if (item->ifp == difp)
+			return (1);
+	}
+
+	return (0);
+}
+
+/*
+ * Registers a new slowpath handler.
+ * Returns a handler id to use in the dly_queue() or
+ * dly_pqueue() functions.
+ */
+int
+dly_register(struct dly_dispatcher *dld)
+{
+	int i, alloc;
+	struct dly_dispatcher *dd, *tmp;
+
+again:
+	DLY_WLOCK();
+
+	if (dly.count < dly.alloc) {
+		i = dly.count++;
+		dly.index[i] = *dld;
+		DLY_WUNLOCK();
+		return (i);
+	}
+
+	alloc = dly.alloc + DLY_ALLOC_ITEMS;
+
+	DLY_WUNLOCK();
+
+	/* No spare room, need to grow the index */
+	dd = malloc(sizeof(struct dly_dispatcher) * alloc, M_TEMP,
+	    M_ZERO | M_WAITOK);
+
+	DLY_WLOCK();
+	if (dly.alloc >= alloc) {
+		/* Lost the race, try again */
+		DLY_WUNLOCK();
+		free(dd, M_TEMP);
+		goto again;
+	}
+
+	/* Copy the old index into the new, larger one */
+	memcpy(dd, dly.index, sizeof(struct dly_dispatcher) * dly.alloc);
+	tmp = dly.index;
+	dly.index = dd;
+	dly.alloc = alloc;
+	i = dly.count++;
+	dly.index[i] = *dld;
+	DLY_WUNLOCK();
+
+	free(tmp, M_TEMP);
+
+	return (i);
+}
+
+/*
+ * Checks if the given netisr queue item is of the type which
+ * is being unregistered.
+ */
+static int
+dly_scan_unregistered(struct mbuf *m, void *_data)
+{
+	struct m_tag *dtag;
+	struct dly_item *item;
+	int i;
+
+	i = *(int *)_data;
+
+	if ((dtag = m_tag_find(m, PACKET_TAG_DISPATCH_INFO, NULL)) != NULL) {
+		item = (struct dly_item *)(dtag + 1);
+		if (item->type == i)
+			return (1);
+	}
+
+	return (0);
+}
+
+/*
+ * Unregisters a slowpath handler previously registered by
+ * dly_register().  The caller needs to ensure that no new items of the
+ * given type can be queued prior to calling this function.
+ */
+void
+dly_unregister(int dtype)
+{
+
+	netisr_scan(NETISR_SLOWPATH, dly_scan_unregistered, &dtype);
+
+	DLY_WLOCK();
+	if (dtype < 0 || dtype >= dly.alloc || dly.index[dtype].name == NULL) {
+		DLY_WUNLOCK();
+		return;
+	}
+
+	KASSERT(dly.index[dtype].name != NULL,
+	    ("%s: unregistering non-existent protocol %d", __func__, dtype));
+
+	memset(&dly.index[dtype], 0, sizeof(struct dly_dispatcher));
+	DLY_WUNLOCK();
+}
+
+
+static void
+dly_ifdetach(void *arg __unused, struct ifnet *ifp)
+{
+
+	netisr_scan(NETISR_SLOWPATH, dly_scan_ifp, ifp);
+}
+
+static struct netisr_handler dly_nh = {
+	.nh_name = "slow",
+	.nh_handler = dly_dispatch_item,
+	.nh_proto = NETISR_SLOWPATH,
+	.nh_policy = NETISR_POLICY_SOURCE,
+};
+
+static void
+dly_init(__unused void *arg)
+{
+
+	memset(&dly, 0, sizeof(dly));
+	dly.index = malloc(sizeof(struct dly_dispatcher) * DLY_ALLOC_ITEMS,
+	    M_TEMP, M_ZERO | M_WAITOK);
+	dly.alloc = DLY_ALLOC_ITEMS;
+	dly.count = 1;
+
+	DLY_LOCK_INIT();
+
+	netisr_register(&dly_nh);
+	ifdetach_tag = EVENTHANDLER_REGISTER(ifnet_departure_event,
+	    dly_ifdetach, NULL, EVENTHANDLER_PRI_ANY);
+}
+
+/* Exactly after netisr */
+SYSINIT(dly_init, SI_SUB_SOFTINTR, SI_ORDER_SECOND, dly_init, NULL);
--- /dev/null	2014-01-24 00:33:00.000000000 +0400
+++ sys/net/delayed_dispatch.h	2014-01-23 23:54:50.166594749 +0400
@@ -0,0 +1,57 @@
+/*-
+ * Copyright (c) 2014 Alexander V. Chernikov
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * $FreeBSD: head/sys/net/netisr.h 222249 2011-05-24 12:34:19Z rwatson $
+ */
+
+#ifndef	_NET_DELAYED_DISPATCH_H_
+#define	_NET_DELAYED_DISPATCH_H_
+
+struct dly_item {
+	int		type;
+	int		subtype;
+	struct ifnet	*ifp;
+	uintptr_t	data;
+};
+
+typedef int dly_dispatch_t(struct mbuf *, int, uintptr_t, struct ifnet *);
+typedef int dly_pdispatch_t(struct mbuf *, struct dly_item *);
+typedef int dly_free_t(struct mbuf *, int, uintptr_t, struct ifnet *);
+
+struct dly_dispatcher {
+	const char	*name;
+	dly_dispatch_t	*dly_dispatch;
+	dly_pdispatch_t	*dly_pdispatch;
+	dly_free_t	*dly_free;
+};
+
+
+int	dly_register(struct dly_dispatcher *);
+void	dly_unregister(int);
+int	dly_queue(int, struct mbuf *, int, uintptr_t, struct ifnet *);
+int	dly_pqueue(int, struct mbuf *, struct dly_item *, size_t);
+
+#endif

--------------050804080408080705010802--