From owner-freebsd-arch@FreeBSD.ORG  Tue May 27 00:57:23 2003
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 50B5537B401
	for <arch@freebsd.org>; Tue, 27 May 2003 00:57:23 -0700 (PDT)
Received: from park.rambler.ru (park.rambler.ru [81.19.64.101])
	by mx1.FreeBSD.org (Postfix) with ESMTP id D65C643F75
	for <arch@freebsd.org>; Tue, 27 May 2003 00:57:21 -0700 (PDT)
	(envelope-from is@rambler-co.ru)
Received: from is.park.rambler.ru (is.park.rambler.ru [81.19.64.102])
	by park.rambler.ru (8.12.6/8.12.6) with ESMTP id h4R7vKmF012670;
	Tue, 27 May 2003 11:57:20 +0400 (MSD)
Date: Tue, 27 May 2003 11:57:20 +0400 (MSD)
From: Igor Sysoev <is@rambler-co.ru>
X-Sender: is@is
To: Peter Jeremy <peterjeremy@optushome.com.au>
In-Reply-To: <20030526201740.GA22178@cirb503493.alcatel.com.au>
Message-ID: <Pine.BSF.4.21.0305271126470.46491-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: arch@freebsd.org
Subject: Re: sendfile(2) SF_NOPUSH flag proposal
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
X-List-Received-Date: Tue, 27 May 2003 07:57:23 -0000

On Tue, 27 May 2003, Peter Jeremy wrote:

> On Mon, May 26, 2003 at 09:41:50PM +0400, Igor Sysoev wrote:
> >sendfile(2) now has two drawbacks:
> [IP frames are not always full]
> ...
> >When I turn TCP_NOPUSH on just before sendfile(), it sends the header
> >and the first part of the file in one 1460-byte packet.
> >Besides, it sends the file pages in full 1460-byte Ethernet packets.
> >When sendfile() completes or returns EAGAIN (I use non-blocking sockets),
> >I turn TCP_NOPUSH off and the remaining part of the file is flushed to the client.
> >Without turning it off, the remaining part is delayed for 5 seconds.
> ...
> >So here is a proposal.  We can introduce a sendfile(2) flag, e.g. SF_NOPUSH,
> >that turns TF_NOPUSH on before sending and turns it off just
> >before returning. It saves two syscalls on each sendfile() call,
> >and it's especially useful with non-blocking sockets, which can cause many
> >sendfile() calls.
> 
> I'm less certain of the benefits of this - particularly in the non-
> blocking case.  As I understand your proposal, your patch would turn
> off TF_NOPUSH just before returning EAGAIN.  At this point, the TCP
> send buffer is full, so packets should start being sent immediately.
> The last data in the send buffer may not comprise a complete frame, so
> it should not be sent, but left queued to be merged with the next
> sendfile(2).  Once SO_SNDLOWAT bytes are available in the send buffer,
> the socket will become writable, allowing a further sendfile(2) call.
> As long as SO_SNDLOWAT is at least one frame smaller than SO_SNDBUF,
> there should not be any send delay caused by TF_NOPUSH being set.
> 
> I believe TF_NOPUSH should be set at the beginning of a transaction
> (or when the socket is opened) and cleared at the end of a transaction
> (or implicitly by close()ing the socket).
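
The framing effect described in the quoted messages can be illustrated with a deliberately simplified userland model; it is not kernel code. Each append to the send buffer is followed by an output pass that sends full MSS segments and sends a shorter tail only while NOPUSH is off. The model assumes the connection is always idle, so it ignores Nagle's algorithm, the congestion and receive windows, and timers; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MSS 1460		/* payload of a full Ethernet TCP segment */

/* Toy send buffer: only a byte count and the NOPUSH flag are modelled. */
struct model_conn {
	size_t	queued;		/* bytes in the send buffer, not yet sent */
	int	nopush;		/* models TF_NOPUSH */
	size_t	segs;		/* segments sent so far */
	size_t	full;		/* of which carried a full MSS */
};

/* One output pass: send full segments; send the tail only without NOPUSH. */
static void
model_output(struct model_conn *c)
{
	while (c->queued >= MSS) {
		c->queued -= MSS;
		c->segs++;
		c->full++;
	}
	if (c->queued > 0 && !c->nopush) {
		c->queued = 0;
		c->segs++;
	}
}

/* Append len bytes (a header or a file page), then run an output pass. */
static void
model_append(struct model_conn *c, size_t len)
{
	c->queued += len;
	model_output(c);
}

/* Clear NOPUSH and flush the held tail, as turning TCP_NOPUSH off does. */
static void
model_push(struct model_conn *c)
{
	c->nopush = 0;
	model_output(c);
	assert(c->queued == 0);
}
```

With an assumed 200-byte header followed by three 4096-byte file pages, the model sends 10 segments (6 full) with NOPUSH off, but 8 full segments with NOPUSH on, holding the last 808 bytes until model_push() flushes them as a ninth segment.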

I thought about it more and I agree with you. TF_NOPUSH should be turned on
at the start of a transaction and turned off at the end of a transaction.
So I think there should be two flags:

SF_NOPUSH - turns TF_NOPUSH on before sending. It's cheap:

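    s = splnet();                   /* block network interrupts */
    inp = sotoinpcb(so);            /* so must be a TCP socket */
    if (inp != NULL) {
        tp = intotcpcb(inp);
        tp->t_flags |= TF_NOPUSH;   /* hold partial segments */
    }
    splx(s);                        /* restore previous priority */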


SF_PUSH - turns TF_NOPUSH off after the sending has completed.
If sendfile() returns EAGAIN, TF_NOPUSH is left untouched.
It's cheap too, especially if the send buffer already holds enough data
to fill one MSS, because then no explicit tcp_output() call is made:

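    s = splnet();                   /* block network interrupts */
    inp = sotoinpcb(so);            /* so must be a TCP socket */
    if (inp != NULL) {
        tp = intotcpcb(inp);
        tp->t_flags &= ~TF_NOPUSH;  /* allow partial segments again */

        /*
         * Less than one MSS is queued: call tcp_output() now
         * so the tail is not left waiting.
         */
        if (so->so_snd.sb_cc < tp->t_maxseg) {
            error = tcp_output(tp);
        }
    }
    splx(s);                        /* restore previous priority */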

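The semantics of the two proposed flags can be exercised in a userland model. SF_NOPUSH and SF_PUSH are only proposed here, so the flag values, model_sendfile(), and struct model_sock are hypothetical; the send buffer is reduced to a byte count with a fixed capacity, a short write stands in for EAGAIN, and the explicit tcp_output() call is only counted. The model follows the two kernel fragments above: SF_NOPUSH sets NOPUSH before queuing, and SF_PUSH clears it only when all data was queued, forcing an output pass when less than one MSS is buffered.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag values; the flags are only proposed. */
#define SF_NOPUSH	0x1
#define SF_PUSH		0x2

/* Toy socket: just the state the two kernel fragments touch. */
struct model_sock {
	size_t	sb_cc;		/* bytes queued (models so->so_snd.sb_cc) */
	size_t	sb_hiwat;	/* send buffer capacity */
	size_t	maxseg;		/* models tp->t_maxseg */
	int	nopush;		/* models TF_NOPUSH */
	int	outputs;	/* explicit tcp_output() calls made by SF_PUSH */
};

/*
 * Queue up to len bytes and return how many were queued.  A short write
 * sets errno to EAGAIN, as a non-blocking sendfile() would report.
 */
static size_t
model_sendfile(struct model_sock *so, size_t len, int flags)
{
	size_t room, n;

	assert(so->sb_cc <= so->sb_hiwat);
	room = so->sb_hiwat - so->sb_cc;
	n = len < room ? len : room;

	if (flags & SF_NOPUSH)
		so->nopush = 1;		/* set before any data is queued */

	so->sb_cc += n;

	if (n < len) {
		errno = EAGAIN;		/* SF_PUSH ignored: NOPUSH stays set */
		return (n);
	}
	if (flags & SF_PUSH) {
		so->nopush = 0;
		/* Mirrors the sb_cc < t_maxseg check in the kernel fragment. */
		if (so->sb_cc < so->maxseg)
			so->outputs++;
	}
	return (n);
}
```

A transaction would pass SF_NOPUSH on its first call and SF_PUSH on its last; a transaction that fits in a single call could presumably pass both. If the last call hits EAGAIN, NOPUSH stays set, and the application repeats the call with SF_PUSH once the socket is writable again.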

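For comparison, the per-call userland sequence from the quoted message (TCP_NOPUSH on, sendfile(), TCP_NOPUSH off, also after EAGAIN) costs two extra setsockopt() calls per sendfile() call. A minimal sketch, assuming FreeBSD's sendfile(2) signature and the TCP_NOPUSH socket option; sendfile_nopush() is a hypothetical helper, and on other systems it just fails with ENOSYS.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

#if defined(__FreeBSD__)
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#endif

/*
 * Hypothetical helper: send nbytes of file fd from offset over TCP socket s
 * with TCP_NOPUSH set for the duration of the call.  hdtr is a
 * struct sf_hdtr * (or NULL).  Returns 0, or -1 with errno set (EAGAIN on a
 * full non-blocking socket); *sent receives the number of bytes sent.
 */
int
sendfile_nopush(int fd, int s, off_t offset, size_t nbytes, void *hdtr,
    off_t *sent)
{
	assert(sent != NULL);
	*sent = 0;
#if defined(__FreeBSD__)
	int on = 1, off = 0, rc, saved;

	/* Extra syscall 1: hold partial frames while data is queued. */
	if (setsockopt(s, IPPROTO_TCP, TCP_NOPUSH, &on, sizeof(on)) == -1)
		return (-1);

	/* The header (via hdtr) and file data go out in full MSS frames. */
	rc = sendfile(fd, s, offset, nbytes, hdtr, sent, 0);
	saved = errno;

	/*
	 * Extra syscall 2: clear TCP_NOPUSH whether sendfile() completed or
	 * returned EAGAIN, so the last partial frame is not delayed.
	 */
	(void)setsockopt(s, IPPROTO_TCP, TCP_NOPUSH, &off, sizeof(off));
	errno = saved;
	return (rc);
#else
	(void)fd; (void)s; (void)offset; (void)nbytes; (void)hdtr;
	errno = ENOSYS;		/* TCP_NOPUSH and this sendfile(2) are BSD-specific */
	return (-1);
#endif
}
```

The proposed SF_NOPUSH and SF_PUSH flags would fold both setsockopt() calls into sendfile(2) itself.
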
Igor Sysoev
http://sysoev.ru/en/