From owner-freebsd-net Tue Dec 15 11:23:37 1998
Message-Id: <19981215110600.D652@apple.com>
Date: Tue, 15 Dec 1998 11:06:00 -0800
From: "Justin C. Walker"
To: freebsd-net@FreeBSD.ORG
Subject: Re: MLEN < write length < MINCLSIZE "bug"
Reply-To: justin@apple.com
References: <199812151555.PAA07456@netrinsics.com>
In-Reply-To: ; from Marc Slemko on Tue, Dec 15, 1998 at 08:30:12AM -0800

If I can horn in on the discussion, we have seen the problems you're
debating, particularly when using request/response interactions over TCP.
After beating our heads against a few walls, we came to the following
conclusion: this is due (as has been observed before in this thread) to
the separation between the job that the socket layer is doing and the
job that the TCP (or, generally, transport protocol) layer is doing.

One way to smooth over the bump in the road is to provide a hint to the
lower layer, so we have a socket state bit (SS_MORETOCOME) that we turn
on (in so_state) just before calling the protocol send routine
(PRU_SEND, ...), and turn off after the call returns.  The hint is
turned on only if resid is positive, i.e. only if sosend() still has
user data left to hand down.

In tcp_output(), in the 'if (len) { ... }' block (following the comment
on silly window avoidance), we check for the bit:

    if (len) {
        /* A full-sized segment always goes out. */
        if (len == tp->t_maxseg)
            goto send;
        /* Send a short segment only if no more data has been promised. */
        if (!(so->so_state & SS_MORETOCOME)) {
            if ((idle || tp->t_flags & TF_NODELAY) &&
                len + off >= so->so_snd.sb_cc)
                goto send;
        }
        /* Forced output. */
        if (tp->t_force)
            goto send;
        /* At least half of the largest window the peer has offered. */
        if (len >= tp->max_sndwnd / 2)
            goto send;
        /* Retransmitting. */
        if (SEQ_LT(tp->snd_nxt, tp->snd_max))
            goto send;
    }

Essentially, if there's more to come, we hold off sending; and we only
believe there's more to come if the user has committed to it (in the
form of a write request).

This seems to smooth out (some of) the bumps caused by the user
buffer/mbuf/cluster size differences and the request/response effects on
the TCP state machines.

Regards,

Justin

On Tue, Dec 15, 1998 at 08:30:12AM -0800, Marc Slemko wrote:
> (-stable removed from the cc list, since this isn't particular to stable
> in any way)
>
> On Tue, 15 Dec 1998, Michael Robinson wrote:
>
> > Bill Fenner writes:
> > >You misunderstand.  The fix is to accumulate mbufs in a chain until either
> > >a) The protocol gets all of the data that it wanted, or
> > >b) All of the data that the user has provided has been copied into mbufs.
> > >
> > >(b) is what sosend() used to do.
> > >The URL referenced (the one with
> > >"vanj88" in it) describes why sosend() was changed to use only a single
> > >mbuf at a time, but this performance problem was not envisioned at
> > >the time.
> >
> > Ok, I misunderstood.  But I still disagree it's a bug.  Or, more precisely,
> > it would be a bug if the socket API and the TCP protocol were seen as one
> > inseparable entity, which is not the case.
>
> No, it really is a bug.
>
> It is inherently broken to write multiple packets for one write() when the
> size of the write is far less than the MTU (well, the "effective MTU")
> unless you have extreme extenuating circumstances.
>
> It may not be a bug covered by any spec, but for people trying to write
> useful network apps it shoots them in the head.  It is still a bug.
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-net" in the body of the message

--
Justin C. Walker, Curmudgeon-At-Large  *
Institute for General Semantics        |
Manager, CoreOS Networking             |   Men are from Earth.
Apple Computer, Inc.                   |   Women are from Earth.
2 Infinite Loop                        |      Deal with it.
Cupertino, CA 95014                    |
*---------------------------------------*------------------------------------*
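
The message shows only the tcp_output() half of the SS_MORETOCOME hint. What follows is a minimal user-space sketch of the socket-layer half, so the interaction can be seen end to end. It is not the kernel code: mock_socket, mock_pru_send(), mock_sosend(), CHUNK, MAXSEG and the numeric value given to SS_MORETOCOME are all hypothetical stand-ins, and the mock transport assumes an idle connection, so it always sends a short tail unless the hint is set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit value; the real one would live in the kernel headers. */
#define SS_MORETOCOME 0x4000
#define CHUNK  100    /* stand-in for the bytes copied per protocol call */
#define MAXSEG 1460   /* stand-in for tp->t_maxseg */

struct mock_socket {
    int    so_state;  /* socket state bits, as in the message */
    size_t unsent;    /* bytes queued but not yet sent */
    int    segments;  /* segments emitted so far */
};

/* Mock protocol send routine: queue len bytes, then decide what to send. */
static void mock_pru_send(struct mock_socket *so, size_t len)
{
    so->unsent += len;
    /* Full-sized segments go out regardless of the hint. */
    while (so->unsent >= MAXSEG) {
        so->unsent -= MAXSEG;
        so->segments++;
    }
    /* A short tail goes out only if no more data has been promised. */
    if (so->unsent > 0 && !(so->so_state & SS_MORETOCOME)) {
        so->unsent = 0;
        so->segments++;
    }
}

/* Mock sosend(): hand resid bytes down one CHUNK at a time.
 * Returns the number of segments this write produced. */
static int mock_sosend(struct mock_socket *so, size_t resid, int use_hint)
{
    int before = so->segments;

    while (resid > 0) {
        size_t n = resid < CHUNK ? resid : CHUNK;
        resid -= n;
        /* Turn the hint on only if user data remains after this chunk. */
        if (use_hint && resid > 0)
            so->so_state |= SS_MORETOCOME;
        mock_pru_send(so, n);
        /* Turn it off again once the protocol call returns. */
        so->so_state &= ~SS_MORETOCOME;
    }
    return so->segments - before;
}
```

With these stand-in sizes, mock_sosend(&so, 150, 0) on a zero-initialized socket emits two short segments, while mock_sosend(&so, 150, 1) holds the first 100 bytes and emits a single 150-byte segment, which is the smoothing effect described in the message.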