Message-ID: <48727C37.9080001@elischer.org>
Date: Mon, 07 Jul 2008 13:27:35 -0700
From: Julian Elischer
To: Alfred Perlstein
Cc: Robert Watson, src-committers@FreeBSD.org, Andre Oppermann,
    cvs-all@FreeBSD.org, cvs-src@FreeBSD.org
Subject: Re: cvs commit: src/sys/netinet udp_usrreq.c
References: <200807071057.m67Av9WD014167@repoman.freebsd.org>
    <20080707121042.W63144@fledge.watson.org>
    <48720552.9000605@freebsd.org>
    <20080707200418.GE95574@elvis.mu.org>
In-Reply-To: <20080707200418.GE95574@elvis.mu.org>

Alfred Perlstein wrote:
> * Andre Oppermann [080707 05:01] wrote:
>> Robert Watson wrote:
>>> On Mon, 7 Jul 2008, Robert Watson wrote:
>>>
>>>> rwatson 2008-07-07 10:56:55 UTC
>>>>
>>>> FreeBSD src repository
>>>>
>>>> Modified files:
>>>> sys/netinet udp_usrreq.c
>>>> Log:
>>>> SVN rev 180344 on 2008-07-07 10:56:55Z by rwatson
>>>>
>>>> First step towards parallel transmit in UDP: if neither a specific
>>>> source or a specific destination address is requested as part of a send
>>>> on a UDP socket, read lock the inpcb rather than write lock it. This
>>>> will allow fully parallel transmit down to the IP layer when sending
>>>> simultaneously from multiple threads on a connected UDP socket.
>>>>
>>>> Parallel transmit for more complex cases, such as when sendto(2) is
>>>> invoked with an address and there's already a local binding, will
>>>> follow.
>>> This change doesn't help the particularly interesting applications, such
>>> as named, etc, as they usually call sendto() with an address rather than
>>> connect() the UDP socket, but upcoming changes should address that.
>>> Once you get to the IP layer, the routing code shows up as a massive
>>> source of contention, and it would be great if someone wanted to work on
>>> improving concurrency for routing lookups. Re-introducing the route
>>> cache for inpcbs would also help the connect() case, but not the
>>> sendto() case, but is still a good idea as it would help TCP a *lot*.
>>> Once you get below the IP layer, contention on device driver transmit
>>> locks appears to be the next major locking-related performance issue.
>>> The UDP changes I'm in the throes of merging have led to significant
>>> performance improvements for UDP applications, such as named and
>>> memcached, and hopefully can be MFC'd for 7.1 or 7.2.
>> Caching the route in the inpcb has a number of problems:
>>
>> - any routing table change has to walk all inpcb's to invalidate
>>   and remove outdated and invalid references.
>>
>> - adding host routes again just bloats the table again and makes
>>   lookups more expensive.
>>
>> - host routes (cloned) do not change when the underlying route is
>>   adjusted and packets are still routed to the old gateway (for
>>   example new default route).
>>
>> - We have a tangled mess of cross-pointers and dependencies again
>>   precluding optimizations to the routing table and code itself.
>
> Can't you address #1, #3 and #4 by copying the entry and using
> a generation count? When a route change happens, then just
> bump the generation count, the copy will be invalidated and then
> next time it's attempted to be used, it will be thrown out.
>
> Can't comment on the rest of this as I'm not that familiar...
>
>> A different path to a reduced routing overhead may be the following:
>>
>> - move ARP out of the routing table into its own per-AF and interface
>>   structure and optimized for fast perfect match lookups; This removes
>>   a lot of bloat and dependencies from the routing table.
>>

the arp-v2 branch in p4 does this. needs more eyes.

>> - prohibit any direct references to specific routes (pointers) in the
>>   routing table; Lookups take the ifp/nexthop and unlock the table
>>   w/o any further references;
>>
>> - The per-route locks can be removed and a per-AF global optimized table
>>   lock can be introduced.
>>
>> - A clear separation between route lookup and modify (add/remove) should
>>   be made; With this change differentiated locking strategies can be
>>   used (rwlocks and/or the routing table can be replicated per-cpu).
>>
>> - Make a distinction between host and router mode to allow for different
>>   optimizations (rmlock for hosts and rwlocks for routers for example).
>>
>> Our current routing code has its fingers still in too many things. Once
>> it can be untangled way more optimization and simplification is possible.
>
> That sounds cool.
>
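
For illustration only: the generation-count idea Alfred raises, combined with Andre's point that lookups should copy out the ifp/nexthop and keep no pointers into the table, might look roughly like the userland C sketch below. Every name here (struct route_table, struct cached_route, route_table_change, route_lookup, route_cache_get) is hypothetical and not taken from the FreeBSD tree, the "table" is reduced to a single default gateway, and locking is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; NOT the FreeBSD kernel structures. */
struct route_table {
	uint64_t generation;    /* bumped on every routing table change */
	uint32_t default_gw;    /* toy table: one default gateway only */
	int      default_ifidx; /* toy stand-in for the outgoing ifp */
};

/* Per-socket cache: a copy of the lookup result, no pointer into the table. */
struct cached_route {
	bool     valid;
	uint64_t generation;    /* table generation when the copy was taken */
	uint32_t nexthop;
	int      ifidx;
};

/* A table change only bumps the counter; it never walks the cached copies. */
static void
route_table_change(struct route_table *rt, uint32_t gw, int ifidx)
{
	rt->default_gw = gw;
	rt->default_ifidx = ifidx;
	rt->generation++;
}

/* Full lookup: copy the result out by value and record the generation. */
static void
route_lookup(const struct route_table *rt, struct cached_route *cr)
{
	cr->nexthop = rt->default_gw;
	cr->ifidx = rt->default_ifidx;
	cr->generation = rt->generation;
	cr->valid = true;
}

/*
 * Use the cached copy if its generation still matches; otherwise throw it
 * out and redo the lookup. Returns true when the cached copy was still
 * fresh, false when it had to be refreshed.
 */
static bool
route_cache_get(const struct route_table *rt, struct cached_route *cr)
{
	if (cr->valid && cr->generation == rt->generation)
		return (true);
	route_lookup(rt, cr);
	return (false);
}
```

In this sketch a stale copy costs one extra lookup on its next use, and a route change costs a single increment no matter how many sockets hold copies. It does not address the host-route bloat point, and a real version would still need to read the generation under whatever table lock is chosen.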