From: Andre Oppermann
Date: Thu, 29 Nov 2007 09:53:48 +0100
To: Robert Watson
Cc: Kip Macy, Perforce Change Reviews, Kip Macy
Subject: Re: PERFORCE change 129544 for review
Message-ID: <474E7E1C.3030907@freebsd.org>
In-Reply-To: <20071129075148.X7555@fledge.watson.org>
References: <200711260527.lAQ5RNSw090238@repoman.freebsd.org> <20071126115044.J65286@fledge.watson.org> <20071129075148.X7555@fledge.watson.org>
List-Id: p4 projects tree changes

Robert Watson wrote:
>
> On Wed, 28 Nov 2007, Kip Macy wrote:
>
>> I agree that making it toe specific is somewhat misleading.
>> I actually think I'll be able to fix my code so that it can cope with
>> data being added to the end of a cluster or mbuf that has already been
>> transmitted. If so, I won't need to pull this along when I bring TOE
>> support into CVS.
>>
>> Thanks for the feedback.
>
> I'm not as familiar with the transmit side of the socket buffer side --
> at least not anymore -- but on the receive side we make certain strong
> guarantees about not replacing existing mbufs and clusters, especially
> at the head of the socket buffer queue. I think the requirement for
> that in 7/8 may have changed because of the rewritten soreceive() code,
> but it used to be that soreceive() expected the value of sb_mb never to
> go from one non-NULL value to another non-NULL value as long as the
> sb_sx lock (or its predecessor) was held, even though sb_mtx had been
> released. This was so that the mbuf could be left in the socket buffer
> during copyout() and related receive activities, so that if there was a
> short read, error, etc, mbufs weren't being re-inserted at the head of
> the queue. That type of invariant has historically been undocumented,
> but it could be that similar invariants exist in the compaction code for
> transmit and can be documented, enforced, and possibly even relied upon.
> :-)

On the TX side we don't append data *into* existing mbufs, in order to
protect ongoing DMA transfers. Appends instead extend the mbuf chain
(m_next), so a number of small writes will consume one mbuf each.

On the RX side we compress the mbufs at the tail (sbcompress) to
prevent external exhaustion attacks.

I've written a special version of soreceive_stream that pulls as many
mbufs from the head of the socket buffer as the user has specified
space for in iovecs. Those mbufs are removed from the queue. The lock
was then dropped and the copyout performed on the whole chain in one
go. This gave significant speedups at high receive speeds.
However, this came at the expense of a fatal race condition when
copyout failed and the socket went away: the resulting prepend would
then crash horribly.

I haven't studied the new socket locks and locking model in detail
yet. Perhaps this can now be implemented in a safe way.

--
Andre
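
The difference between the TX and RX append policies described above can be illustrated with a small user-space sketch. This is not the kernel's sbappend or sbcompress code: `struct toy_mbuf`, `TOY_MLEN`, and all `toy_*` functions are hypothetical names invented for this illustration, and the real routines handle clusters, record boundaries, and locking that are omitted here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MLEN 64	/* hypothetical per-mbuf capacity, not the kernel's MLEN */

/* Toy stand-in for struct mbuf: fixed storage plus a chain pointer. */
struct toy_mbuf {
	struct toy_mbuf *m_next;
	size_t m_len;
	char m_data[TOY_MLEN];
};

/* Allocate one toy mbuf holding len bytes (len must fit in one buffer). */
static struct toy_mbuf *
toy_get(const void *data, size_t len)
{
	struct toy_mbuf *m;

	assert(len <= TOY_MLEN);
	m = calloc(1, sizeof(*m));
	if (m == NULL)
		return (NULL);
	memcpy(m->m_data, data, len);
	m->m_len = len;
	return (m);
}

/* Return the last mbuf in the chain, or NULL for an empty chain. */
static struct toy_mbuf *
toy_last(struct toy_mbuf *m)
{
	while (m != NULL && m->m_next != NULL)
		m = m->m_next;
	return (m);
}

/*
 * TX-style append: always link a fresh mbuf via m_next and never write
 * into an existing one, since its storage might be under DMA.
 */
static int
toy_append_tx(struct toy_mbuf **head, const void *data, size_t len)
{
	struct toy_mbuf *m, *tail;

	m = toy_get(data, len);
	if (m == NULL)
		return (-1);
	tail = toy_last(*head);
	if (tail == NULL)
		*head = m;
	else
		tail->m_next = m;
	return (0);
}

/*
 * RX-style append: copy into the free space of the tail mbuf when the
 * data fits, so many tiny segments cannot pin one mbuf each.
 */
static int
toy_append_rx(struct toy_mbuf **head, const void *data, size_t len)
{
	struct toy_mbuf *tail;

	assert(len <= TOY_MLEN);
	tail = toy_last(*head);
	if (tail != NULL && tail->m_len + len <= TOY_MLEN) {
		memcpy(tail->m_data + tail->m_len, data, len);
		tail->m_len += len;
		return (0);
	}
	return (toy_append_tx(head, data, len));
}

/* Count the mbufs in a chain. */
static size_t
toy_count(const struct toy_mbuf *m)
{
	size_t n = 0;

	for (; m != NULL; m = m->m_next)
		n++;
	return (n);
}
```

Appending four 2-byte writes with `toy_append_tx` leaves a chain of four mbufs, while the same writes through `toy_append_rx` end up as eight bytes in a single mbuf. That is the trade-off in the message: the TX side spends memory to keep already-queued buffers untouched, and the RX side spends a copy to bound memory use.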