From owner-svn-src-head@FreeBSD.ORG  Thu Oct 18 23:39:42 2012
Sender: Navdeep Parhar <nparhar@gmail.com>
Message-ID: <5080933A.9040404@FreeBSD.org>
Date: Thu, 18 Oct 2012 16:39:38 -0700
From: Navdeep Parhar <np@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:16.0) Gecko/20121012 Thunderbird/16.0.1
MIME-Version: 1.0
To: Andre Oppermann <andre@freebsd.org>
Subject: Re: svn commit: r241703 - head/sys/kern
References: <201210182022.q9IKMHFa016360@svn.freebsd.org>
 <50806A10.4070703@freebsd.org> <50806F6F.60109@FreeBSD.org>
 <50807CBD.8080703@freebsd.org>
In-Reply-To: <50807CBD.8080703@freebsd.org>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org,
 src-committers@freebsd.org

On 10/18/12 15:03, Andre Oppermann wrote:
> On 18.10.2012 23:06, Navdeep Parhar wrote:
>> Hello Andre,
>>
>> A couple of things if you're poking around in this area...
>
> I didn't really mean to dive too deep into COW socket writes.
>
>> On 10/18/12 13:44, Andre Oppermann wrote:
>>> On 18.10.2012 22:22, Andre Oppermann wrote:
>>>> Author: andre
>>>> Date: Thu Oct 18 20:22:17 2012
>>>> New Revision: 241703
>>>> URL: http://svn.freebsd.org/changeset/base/241703
>>>>
>>>> Log:
>>>>    Remove double-wrapping of #ifdef ZERO_COPY_SOCKETS within
>>>>    zero copy specialized sosend_copyin() helper function.
>>>
>>> Note that I'm not saying zero copy should be used or is even
>>> more performant than the optimized m_uiotombuf() function.
>>
>> Some time back I played around with a modified m_uiotombuf() that was
>> aware of the mbuf_jumbo_16K zone (instead of limiting itself to 4K
>> mbufs).  In some cases it performed better than the stock
>> m_uiotombuf. I suspect this change would also help drivers that are
>> unable to deal with long gather lists when doing TSO.  But my testing
>> wasn't rigorous enough (I was merely playing around), and the drivers
>> I work with can mostly cope with whatever the kernel throws at them.
>> So nothing came out of it.
>
> The jumbo 16K zone is special in that the memory is actually allocated
> by contigmalloc to get physically contiguous RAM. After some uptime and
> heavy use this may become difficult to obtain. Also contigmalloc has to
> hunt for it which may cause quite a bit of overhead.
>
> 4K mbufs, actually PAGE_SIZE mbufs, are very easily obtainable and fast.
>
> To be honest I'm not really happy about > PAGE_SIZE mbufs.  They were
> introduced at a time when DMA engines were more limited and couldn't
> do S/G DMA on receive.
>
> So performance with > PAGE_SIZE mbufs may be a little bit better but
> when you approach memory fragmentation after some heavy system usage
> it sucks up to the point where it fails most of the time.  PAGE_SIZE
> mbufs always perform the same with very little deviation.
>
> In an ideal scenario I'd like to see 9K and 16K mbufs go away and
> have the RX DMA ring stitch a packet up out of PAGE_SIZE mbufs.

Sure, when the backend allocator gets called it's easier for it to find 
a single page than multiple contiguous pages.  But if the system's 
workload keeps the 16K zone warm then the zone allocator doesn't have to 
reach out to the backend allocator all the time.  The large clusters do 
have their advantages.  I guess cluster consumers that prefer 16K but 
are willing to fall back to PAGE_SIZE when the larger zone is depleted 
will do well no matter what the memory situation is.
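
To make the fallback idea concrete, here is a minimal userspace sketch.
It is not kernel code and not what any driver does today: the sketch_*
names, the toy "zone" with a fixed buffer budget, and the two size
constants are all made up for illustration.  In the kernel the
equivalent would presumably be built on m_getjcl() with MJUM16BYTES
and MJUMPAGESIZE, but that is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for PAGE_SIZE and the 16K cluster size. */
#define SKETCH_PAGE_SIZE	4096u
#define SKETCH_JUMBO16		(16u * 1024u)

/* Toy zone: hands out at most 'avail' buffers of 'bufsize' bytes. */
struct sketch_zone {
	size_t		bufsize;
	unsigned int	avail;
};

/* A cluster remembers its size and the zone it came from. */
struct sketch_cluster {
	void			*buf;
	size_t			size;
	struct sketch_zone	*zone;
};

/* Take one buffer from the zone; NULL means the zone is depleted. */
void *
sketch_zone_alloc(struct sketch_zone *z)
{
	void *p;

	if (z->avail == 0)
		return (NULL);
	p = malloc(z->bufsize);
	if (p != NULL)
		z->avail--;
	return (p);
}

/*
 * Prefer a 16K cluster and fall back to a page-sized one when the
 * larger zone has nothing to offer.  Returns 0 on success and -1 if
 * both zones are depleted.
 */
int
sketch_cluster_get(struct sketch_zone *z16, struct sketch_zone *zpage,
    struct sketch_cluster *cl)
{
	struct sketch_zone *order[2] = { z16, zpage };
	int i;

	for (i = 0; i < 2; i++) {
		cl->buf = sketch_zone_alloc(order[i]);
		if (cl->buf != NULL) {
			cl->size = order[i]->bufsize;
			cl->zone = order[i];
			return (0);
		}
	}
	cl->size = 0;
	cl->zone = NULL;
	return (-1);
}

/* Return the buffer to the zone it was taken from. */
void
sketch_cluster_free(struct sketch_cluster *cl)
{
	assert(cl->buf != NULL && cl->zone != NULL);
	free(cl->buf);
	cl->zone->avail++;
	cl->buf = NULL;
	cl->size = 0;
	cl->zone = NULL;
}
```

The consumer has to look at cl->size after every allocation, since
it may get either size; that is the price of the fallback.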

Regards,
Navdeep

>
>>> Actually there may be some real bit-rot to zero copy sockets.
>>> I've just started looking into it.
>>
>> I have a cxgbe(4)-specific true zero-copy implementation.  The rx side
>> is in head, the tx side works only for blocking sockets (the "easy"
>> case) and I haven't checked it in anywhere.  Take a look at
>> t4_soreceive_ddp() and m_mbuftouio_ddp() in sys/dev/cxgbe/t4_ddp.c.
>> They're mostly identical to the kernel routines they're based on
>> (read: copy-pasted from).  You may find them of some interest if
>> you're working in this area and are thinking of adding zero-copy hooks
>> to the socket implementation.
>
> I'm going to have a look at it and think about how to generically support
> DDP either way with our socket buffer layout.
>
> Actually that may end up as the golden path. Do away with > PAGE_SIZE
> mbufs, sink page flipping COW (incorrectly named ZERO_COPY) and use
> DDP for those who need utmost performance (as I said only COW aware
> applications gain a bit of speed, unaware may end up much worse).
>