Date: Mon, 9 May 2016 21:57:19 +0300
From: Slawa Olhovchenkov
To: John Baldwin
Cc: src-committers@freebsd.org, svn-src-head@freebsd.org, svn-src-all@freebsd.org
Subject: Re: svn commit: r299210 - in head/sys/dev/cxgbe: . tom
Message-ID: <20160509185719.GG1447@zxy.spb.ru>
References: <201605070033.u470XZCs075568@repo.freebsd.org> <3138889.ZBJ52FyIMB@ralph.baldwin.cx> <20160507134451.GA39874@zxy.spb.ru> <3833131.rOKpC7i1Gu@ralph.baldwin.cx>
In-Reply-To: <3833131.rOKpC7i1Gu@ralph.baldwin.cx>

On Mon, May 09, 2016 at 10:49:30AM -0700, John Baldwin wrote:

> On Saturday, May 07, 2016 04:44:51 PM Slawa Olhovchenkov wrote:
> > On Fri, May 06, 2016 at 05:52:15PM -0700, John Baldwin
> > wrote:
> >
> > > On Saturday, May 07, 2016 12:33:35 AM John Baldwin wrote:
> > > > Author: jhb
> > > > Date: Sat May 7 00:33:35 2016
> > > > New Revision: 299210
> > > > URL: https://svnweb.freebsd.org/changeset/base/299210
> > > >
> > > > Log:
> > > >   Use DDP to implement zerocopy TCP receive with aio_read().
> > > >
> > > >   Chelsio's TCP offload engine supports direct DMA of received TCP payload
> > > >   into wired user buffers. This feature is known as Direct-Data Placement.
> > > >   However, to scale well the adapter needs to prepare buffers for DDP
> > > >   before data arrives. aio_read() is more amenable to this requirement than
> > > >   read() as applications often call read() only after data is available in
> > > >   the socket buffer.
> > > >
> > > >   When DDP is enabled, TOE sockets use the recently added pru_aio_queue
> > > >   protocol hook to claim aio_read(2) requests instead of letting them use
> > > >   the default AIO socket logic. The DDP feature supports scheduling DMA
> > > >   to two buffers at a time so that the second buffer is ready for use
> > > >   after the first buffer is filled. The aio/DDP code optimizes the case
> > > >   of an application ping-ponging between two buffers (similar to the
> > > >   zero-copy bpf(4) code) by keeping the two most recently used AIO buffers
> > > >   wired. If a buffer is reused, the aio/DDP code is able to reuse the
> > > >   vm_page_t array as well as page pod mappings (a kind of MMU mapping the
> > > >   Chelsio NIC uses to describe user buffers). The generation of the
> > > >   vmspace of the calling process is used in conjunction with the user
> > > >   buffer's address and length to determine if a user buffer matches a
> > > >   previously used buffer. If an application queues a buffer for AIO that
> > > >   does not match a previously used buffer then the least recently used
> > > >   buffer is unwired before the new buffer is wired. This ensures that no
> > > >   more than two user buffers per socket are ever wired.
> > > >
> > > >   Note that this feature is best suited to applications sending a steady
> > > >   stream of data vs short bursts of traffic.
> > > >
> > > >   Discussed with: np
> > > >   Relnotes: yes
> > > >   Sponsored by: Chelsio Communications
> > >
> > > The primary tool I used for evaluating performance was netperf's TCP stream
> > > test. It is a best case for this (constant stream of traffic), but that is
> > > also the intended use case for this feature.
> > >
> > > Using 2 64K buffers in a ping-pong via aio_read() to receive a 40Gbps stream
> > > used about two full CPUs (~190% CPU usage) on a single-package
> > > Intel E5-1620 v3 @ 3.50GHz with the stock TCP stack. Enabling TOE brings the
> > > usage down to about 110% CPU. With DDP, the usage is around 30% of a single
> > > CPU. With two 1MB buffers the stock and TOE numbers are about the same,
> > > but the DDP usage is about 5% of a single CPU.
> > >
> > > Note that these numbers are with aio_read(). read() fares a bit better (180%
> > > for stock and 70% for TOE). Before the AIO rework, trying to use aio_read()
> > > with two buffers in a ping-pong used twice as much CPU as bare read(), but
> > > aio_read() in general is now fairly comparable to read() at least in terms of
> > > CPU overhead.
> >
> > Can this be an improvement for nfsclient, etc.?
>
> The NFS client is implemented in the kernel (and doesn't use the AIO
> interfaces), so that would be a bit trickier to manage. OTOH, this could be
> useful for something like rsync if that had an option to use aio_read().

Would it be possible to add a general API for use inside the kernel by
nfsclient, nfsd, the iSCSI initiator/target, etc.? Ideally it would be used
automatically. As I understand it, AIO is required in userland so that the
buffers are pre-allocated and pinned (please correct me if I am wrong).
Isn't that already true for all in-kernel operations?
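
For reference, the buffer-reuse rule from the quoted commit log (match on
vmspace generation plus address and length, keep the two most recently used
buffers wired, unwire the least recently used one on a miss) can be sketched
roughly as below. This is only an illustration of that rule in plain C, not
the driver's code: struct buf_key, struct buf_cache and cache_hold() are
hypothetical names, and the wire/unwire steps are reduced to counters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key for a user buffer; these names are not from the driver. */
struct buf_key {
	unsigned  vm_gen;	/* generation of the caller's vmspace */
	uintptr_t addr;		/* user virtual address of the buffer */
	size_t    len;		/* buffer length in bytes */
};

struct buf_slot {
	struct buf_key key;
	int      wired;		/* nonzero while the slot holds a wired buffer */
	unsigned last_use;	/* logical timestamp of the most recent use */
};

struct buf_cache {
	struct buf_slot slot[2];	/* at most two wired buffers per socket */
	unsigned clock;			/* incremented on every request */
	unsigned wire_ops;		/* stand-in for wiring a buffer */
	unsigned unwire_ops;		/* stand-in for unwiring a buffer */
};

static int
key_equal(const struct buf_key *a, const struct buf_key *b)
{
	return (a->vm_gen == b->vm_gen && a->addr == b->addr &&
	    a->len == b->len);
}

/*
 * Account for one queued buffer.  Returns 1 if it matched a cached
 * (still wired) buffer and 0 if it had to be wired afresh.
 */
static int
cache_hold(struct buf_cache *c, struct buf_key k)
{
	int i, victim = 0;

	assert(c != NULL);
	c->clock++;

	/* Hit: same vmspace generation, address and length as before. */
	for (i = 0; i < 2; i++) {
		if (c->slot[i].wired && key_equal(&c->slot[i].key, &k)) {
			c->slot[i].last_use = c->clock;
			return (1);
		}
	}

	/* Miss: take an empty slot, else the least recently used one. */
	for (i = 0; i < 2; i++) {
		if (!c->slot[i].wired) {
			victim = i;
			break;
		}
		if (c->slot[i].last_use < c->slot[victim].last_use)
			victim = i;
	}

	/* The old buffer is unwired before the new one is wired. */
	if (c->slot[victim].wired)
		c->unwire_ops++;
	c->slot[victim].key = k;
	c->slot[victim].wired = 1;
	c->slot[victim].last_use = c->clock;
	c->wire_ops++;
	return (0);
}
```

Starting from a zero-initialized struct buf_cache, an application that
alternates between the same two buffers misses once per buffer and hits on
every later request, so nothing is wired or unwired after the first two
calls. A third distinct buffer, or the same address after the vmspace
generation changes, replaces the least recently used slot.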