From owner-freebsd-stable Mon Nov 16 10:04:06 1998
Return-Path:
Received: (from majordom@localhost)
        by hub.freebsd.org (8.8.8/8.8.8) id KAA09331
        for freebsd-stable-outgoing; Mon, 16 Nov 1998 10:04:06 -0800 (PST)
        (envelope-from owner-freebsd-stable@FreeBSD.ORG)
Received: from alive.znep.com (207-178-54-226.go2net.com [207.178.54.226])
        by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id KAA09256
        for ; Mon, 16 Nov 1998 10:03:58 -0800 (PST)
        (envelope-from marcs@znep.com)
Received: from localhost (marcs@localhost)
        by alive.znep.com (8.9.1/8.9.1) with ESMTP id JAA07995;
        Mon, 16 Nov 1998 09:59:37 -0800 (PST)
        (envelope-from marcs@znep.com)
Date: Mon, 16 Nov 1998 09:59:37 -0800 (PST)
From: Marc Slemko
To: Michael Robinson
cc: freebsd-stable@FreeBSD.ORG
Subject: Re: writev() to tcp
In-Reply-To: <199811161720.BAA26218@public.bta.net.cn>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-stable@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Tue, 17 Nov 1998, Michael Robinson wrote:

> I'm working with ORBit under FreeBSD 2.2.7.  The ORBit IIOP driver makes
> extensive use of writev.
>
> I've noticed this very wierd thing: with a total writev buffer of 160
> bytes, the tcp socket is sending the first 100 bytes (on a segment boundary),
> waiting for a tcp ack on the socket, and then sending the remaining 60 bytes.
> Here's a tcpdump:
>
> 01:11:33.789875 localhost.2358 > localhost.2359: P 1441:1541(100) ack 891 win 57344 (DF)
> 01:11:33.970016 localhost.2359 > localhost.2358: . ack 1541 win 57344 (DF)
> 01:11:33.970223 localhost.2358 > localhost.2359: P 1541:1601(60) ack 891 win 57344 (DF)
>
> Obviously, a gratuitous 200ms delay in the middle of every transaction is
> not exactly what you want in your CORBA library.  But the bigger question
> in my mind is, why is the tcp socket flushing its buffer in the middle of
> the writev, instead of at the end (or when the buffer gets full, whichever
> comes first)?
This doesn't really have anything to do with writev() in particular.
There is a bug in the TCP code where a packet bigger than a single mbuf
(MLEN == 108 bytes) but not big enough for an mbuf cluster
(MINCLSIZE == 204 bytes) ends up being put into two mbufs, and those two
mbufs then go out on the wire as two separate segments.  There should be
some messages about it in the archives.  There are various fixes, but
none has been applied yet.

Disabling Nagle can work around some of the bad interactions between
Nagle and delayed ACK in this case, but it isn't a great solution in
general.
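
For reference, the "disabling Nagle" workaround mentioned above is normally done per socket with the standard TCP_NODELAY option.  What follows is only a minimal sketch: the helper names disable_nagle() and nagle_disabled() are made up for illustration and do not come from ORBit or from the FreeBSD sources, and this does nothing about the underlying mbuf bug.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/*
 * Hypothetical helper (name is an assumption, not from the thread):
 * turn off the Nagle algorithm on a TCP socket so small segments are
 * sent without waiting for outstanding data to be acknowledged.
 * Returns 0 on success, -1 on error with errno set by setsockopt().
 */
int
disable_nagle(int fd)
{
	int on = 1;

	return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}

/*
 * Hypothetical helper: report whether TCP_NODELAY is currently set.
 * Returns 1 if Nagle is disabled, 0 if it is still enabled, and -1 if
 * getsockopt() fails.
 */
int
nagle_disabled(int fd)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) < 0)
		return -1;
	return val != 0;
}
```

An application would call disable_nagle() on the socket returned by socket() or accept() before it starts writing.  As noted above, this only hides the delay on that socket and sends more small packets, so it is a workaround rather than a fix.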