From owner-cvs-all@FreeBSD.ORG  Mon Jul  7 20:42:27 2008
Return-Path: <owner-cvs-all@FreeBSD.ORG>
Delivered-To: cvs-all@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0E8231065678
	for <cvs-all@FreeBSD.org>; Mon,  7 Jul 2008 20:42:27 +0000 (UTC)
	(envelope-from julian@elischer.org)
Received: from outQ.internet-mail-service.net (outq.internet-mail-service.net
	[216.240.47.240])
	by mx1.freebsd.org (Postfix) with ESMTP id E654B8FC29
	for <cvs-all@FreeBSD.org>; Mon,  7 Jul 2008 20:42:26 +0000 (UTC)
	(envelope-from julian@elischer.org)
Received: from idiom.com (mx0.idiom.com [216.240.32.160])
	by out.internet-mail-service.net (Postfix) with ESMTP id 82A7B23F9;
	Mon,  7 Jul 2008 13:28:10 -0700 (PDT)
Received: from julian-mac.elischer.org (localhost [127.0.0.1])
	by idiom.com (Postfix) with ESMTP id DC4AA2D6022;
	Mon,  7 Jul 2008 13:27:35 -0700 (PDT)
Message-ID: <48727C37.9080001@elischer.org>
Date: Mon, 07 Jul 2008 13:27:35 -0700
From: Julian Elischer <julian@elischer.org>
User-Agent: Thunderbird 2.0.0.14 (Macintosh/20080421)
MIME-Version: 1.0
To: Alfred Perlstein <alfred@freebsd.org>
References: <200807071057.m67Av9WD014167@repoman.freebsd.org>
	<20080707121042.W63144@fledge.watson.org>
	<48720552.9000605@freebsd.org>
	<20080707200418.GE95574@elvis.mu.org>
In-Reply-To: <20080707200418.GE95574@elvis.mu.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Robert Watson <rwatson@FreeBSD.org>, src-committers@FreeBSD.org,
	Andre Oppermann <andre@freebsd.org>, cvs-all@FreeBSD.org,
	cvs-src@FreeBSD.org
Subject: Re: cvs commit: src/sys/netinet udp_usrreq.c
X-BeenThere: cvs-all@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: CVS commit messages for the entire tree <cvs-all.freebsd.org>
X-List-Received-Date: Mon, 07 Jul 2008 20:42:27 -0000

Alfred Perlstein wrote:
> * Andre Oppermann <andre@freebsd.org> [080707 05:01] wrote:
>> Robert Watson wrote:
>>> On Mon, 7 Jul 2008, Robert Watson wrote:
>>>
>>>> rwatson     2008-07-07 10:56:55 UTC
>>>>
>>>> FreeBSD src repository
>>>>
>>>> Modified files:
>>>>   sys/netinet          udp_usrreq.c
>>>> Log:
>>>> SVN rev 180344 on 2008-07-07 10:56:55Z by rwatson
>>>>
>>>> First step towards parallel transmit in UDP: if neither a specific
>>>> source nor a specific destination address is requested as part of a send
>>>> on a UDP socket, read lock the inpcb rather than write lock it.  This
>>>> will allow fully parallel transmit down to the IP layer when sending
>>>> simultaneously from multiple threads on a connected UDP socket.
>>>>
>>>> Parallel transmit for more complex cases, such as when sendto(2) is
>>>> invoked with an address and there's already a local binding, will
>>>> follow.
>>> This change doesn't help the particularly interesting applications, such
>>> as named, as they usually call sendto() with an address rather than
>>> connect() the UDP socket, but upcoming changes should address that.
>>> Once you get to the IP layer, the routing code shows up as a massive
>>> source of contention, and it would be great if someone wanted to work on
>>> improving concurrency for routing lookups.  Re-introducing the route
>>> cache for inpcbs would also help the connect() case (though not the
>>> sendto() case), and is still a good idea because it would help TCP a
>>> *lot*.  Once you get below the IP layer, contention on device driver
>>> transmit locks appears to be the next major locking-related performance
>>> issue.  The UDP changes I'm in the throes of merging have led to
>>> significant performance improvements for UDP applications, such as named
>>> and memcached, and hopefully can be MFC'd for 7.1 or 7.2.
>> Caching the route in the inpcb has a number of problems:
>>
>>  - any routing table change has to walk all inpcbs to invalidate
>>    and remove outdated and invalid references.
>>
>>  - adding host routes again just bloats the table and makes
>>    lookups more expensive.
>>
>>  - cloned host routes do not change when the underlying route is
>>    adjusted, so packets are still routed to the old gateway (for
>>    example, after a new default route is installed).
>>
>>  - we again have a tangled mess of cross-pointers and dependencies,
>>    precluding optimizations to the routing table and code itself.
> 
> Can't you address #1, #3 and #4 by copying the entry and using
> a generation count?  When a route change happens, just bump the
> generation count; the copy is then invalidated, and the next time
> someone tries to use it, it will be thrown out.
> 
> Can't comment on the rest of this as I'm not that familiar...
> 
>> A different path to reduced routing overhead may be the following:
>>
>>  - move ARP out of the routing table into its own per-AF and per-interface
>>    structure, optimized for fast perfect-match lookups; this removes
>>    a lot of bloat and dependencies from the routing table.
>>

The arp-v2 branch in p4 (Perforce) does this.
It needs more eyes.
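A minimal sketch of a stand-alone, exact-match neighbor table for one interface and one address family (IPv4), kept entirely outside the routing table. `struct arp_table`, `arp_insert()`, `arp_lookup()` and the fixed 64-slot open-addressing layout are assumptions for illustration, not the design of the arp-v2 branch; there is no locking, expiry or deletion here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARP_BITS 6			/* hypothetical: 2^6 = 64 slots */
#define ARP_SLOTS (1u << ARP_BITS)

struct arp_entry {
	uint32_t ip;		/* IPv4 address: the exact-match key */
	uint8_t mac[6];		/* link-layer address */
	bool used;
};

/* One table per interface and address family, separate from routing. */
struct arp_table {
	int ifindex;
	struct arp_entry slot[ARP_SLOTS];
};

/* Multiplicative hash; the top ARP_BITS bits of the product pick a slot. */
static unsigned
arp_hash(uint32_t ip)
{
	return (uint32_t)(ip * 2654435761u) >> (32 - ARP_BITS);
}

/* Insert or update with linear probing; no deletion in this sketch. */
static bool
arp_insert(struct arp_table *t, uint32_t ip, const uint8_t mac[6])
{
	assert(t != NULL && mac != NULL);
	for (unsigned i = 0, h = arp_hash(ip); i < ARP_SLOTS; i++) {
		struct arp_entry *e = &t->slot[(h + i) & (ARP_SLOTS - 1)];
		if (!e->used || e->ip == ip) {
			e->ip = ip;
			memcpy(e->mac, mac, sizeof(e->mac));
			e->used = true;
			return true;
		}
	}
	return false;	/* table full */
}

/* Exact-match lookup: returns the MAC, or NULL if the address is unknown. */
static const uint8_t *
arp_lookup(const struct arp_table *t, uint32_t ip)
{
	for (unsigned i = 0, h = arp_hash(ip); i < ARP_SLOTS; i++) {
		const struct arp_entry *e = &t->slot[(h + i) & (ARP_SLOTS - 1)];
		if (!e->used)
			return NULL;	/* reached an empty slot: not present */
		if (e->ip == ip)
			return e->mac;
	}
	return NULL;
}
```

Because the key is a full address rather than a prefix, a hash probe replaces the longest-prefix walk, which is the kind of perfect-match optimization the proposal refers to.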

>>  - prohibit any direct references (pointers) to specific routes in the
>>    routing table; lookups copy out the ifp/nexthop and unlock the table
>>    without holding any further references;
>>
>>  - the per-route locks can be removed and replaced with a single
>>    optimized per-AF table lock;
>>
>>  - make a clear separation between route lookup and modification
>>    (add/remove); with that separation, different locking strategies can
>>    be used (rwlocks, and/or a routing table replicated per CPU);
>>
>>  - distinguish between host and router mode to allow different
>>    optimizations (for example, rmlocks for hosts and rwlocks for routers).
>>
>> Our current routing code still has its fingers in too many things.  Once
>> it is untangled, far more optimization and simplification become possible.
> 
> That sounds cool.
>