From owner-freebsd-hackers Fri Nov 15 12:19:02 1996
Return-Path: owner-hackers
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.5/8.7.3) id MAA13298
          for hackers-outgoing; Fri, 15 Nov 1996 12:19:02 -0800 (PST)
Received: from brasil.moneng.mei.com (brasil.moneng.mei.com [151.186.109.160])
          by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id MAA13292
          for ; Fri, 15 Nov 1996 12:19:00 -0800 (PST)
Received: (from jgreco@localhost)
          by brasil.moneng.mei.com (8.7.Beta.1/8.7.Beta.1) id OAA28769;
          Fri, 15 Nov 1996 14:14:47 -0600
From: Joe Greco
Message-Id: <199611152014.OAA28769@brasil.moneng.mei.com>
Subject: Re: Sockets question...
To: terry@lambert.org (Terry Lambert)
Date: Fri, 15 Nov 1996 14:14:47 -0600 (CST)
Cc: jgreco@brasil.moneng.mei.com, terry@lambert.org, jdp@polstra.com,
    scrappy@ki.net, hackers@FreeBSD.org
In-Reply-To: <199611151748.KAA26388@phaeton.artisoft.com>
    from "Terry Lambert" at Nov 15, 96 10:48:35 am
X-Mailer: ELM [version 2.4 PL24]
Content-Type: text
Sender: owner-hackers@FreeBSD.org
X-Loop: FreeBSD.org
Precedence: bulk

> > > The problem that is supposedly being addressed by looking at the bytes
> > > written is knowing that the data will be available as a unit to the
> > > reader.
> >
> > Wrong, Terry.
> >
> > The problem that is supposedly being addressed (and as the person who
> > wrote the advice, I am telling you indisputably that this is what is
> > being addressed) is that sometimes, people will forget that they are
> > writing to a particular type of socket (such as {,non}blocking) and
> > will inadvertently forget to check to see if all the data was written.
>
> Non-blocking sockets for reliable stream protocols like TCP/IP are
> a stupid idea.

I am glad that you can make a broad, sweeping generalization such as that.

I guess if I had a lot of faith in the portability of threads, I might agree
that that model is not too useful.
However, until you can show me a way to write a select() based server that
writes data to a client and provides guarantees against blocking on a
write() call, I will ask you to kindly avoid making such stupid statements
about network programming paradigms.

The whole world is NOT broken (although many parts of it are).

> If I wanted datagrams, I would pick a protocol like UDP.

If I wanted unreliable packet delivery, I would pick a protocol like UDP.

If I wanted reliable stream oriented data delivery, I would pick a protocol
like TCP.

If I wanted reliable stream oriented data delivery, while still being able
to provide guarantees of availability of the server process, which is closer
to being able to provide it?

> > In the case of a nonblocking socket, the test is mandatory.
>
> What is this, a bizzarre method of dealing with source quench on TCP?

No. It is a method of testing to see how much data actually made it into
the pipe.

If I want to do a write(fd, buf, 1048576) on a socket connected via a 9600
baud SLIP link, I might expect the system call to take around 1092 seconds.
If I have a server process dealing with two such sockets, response time will
be butt slow if the server is currently writing to the other socket... it
has to wait for the write to complete because write(2) has to finish sending
the entire 1048576 bytes.

So a clever software author does not do this. He has 1048576 bytes of
(different, even) data that he wants to write "simultaneously" to two
sockets. He wants to do the equivalent of Sun's

    aiowrite(fd1, buf1, 1048576, 0, SEEK_CUR, NULL);
    aiowrite(fd2, buf2, 1048576, 0, SEEK_CUR, NULL);

Well how the hell do you do THAT if you are busy blocked in a write call?

Well, you use non-blocking I/O... and you take advantage of the fact that
the OS is capable of buffering some data on your behalf.

Let's say you have "buf1" and "buf2" to write to "fd1" and "fd2", and
"len1" and "len2" for the size of the corresponding buffers.
You write code to do the following:

    rval = write(fd1, buf1, len1);  /* Wrote 2K of data */
    len1 -= rval;                   /* 1046528 bytes remain */
    buf1 += rval;                   /* Move forward 2K in buffer */
    rval = write(fd2, buf2, len2);  /* Wrote 3K of data */
    len2 -= rval;                   /* 1045504 bytes remain */
    buf2 += rval;                   /* Move forward 3K in buffer */
    rval = write(fd1, buf1, len1);  /* Wrote 1K of data */
    len1 -= rval;                   /* 1045504 bytes remain */
    buf1 += rval;                   /* Move forward 1K in buffer */
    rval = write(fd2, buf2, len2);  /* Wrote 1K of data */
    len2 -= rval;                   /* 1044480 bytes remain */
    buf2 += rval;                   /* Move forward 1K in buffer */

You can trivially do this with a moderately complex select() mechanism, so
that the outbound buffers for both sockets are kept filled.

A little hard to do without nonblocking sockets. Very useful. I don't
think that this is a "stupid idea" at all.

> What is the point of a non-blocking write if this is what happens?

I will leave that as your homework for tonight.

> I assume you would use this for "frigging huge writes which you expect
> to exceed the available buffer space"? This is potentially useful
> for lazy programmers, and for directed finite state automatons. On

Please tell that to FreeBSD's FTP server, which uses a single (blocking)
write to perform delivery of data.

Why should an application developer have to know or care what the available
buffer space is? Please tell me where in write(2) and read(2) it says I
must worry about this. It doesn't.

> Otherwise, you have just un-formatted your transport contents. 8-(.
> >
> > In the case of an indeterminate socket, the test is also mandatory -
> > precisely BECAUSE you don't know.
>
> Indeterminate sockets are evil. They are on the order of not knowing
> your lock state when entering into a function that's going to need
> the lock held.

I suppose you have never written a library function.

I suppose you do not subscribe to the philosophy that you should be liberal
in what you accept (in this case, assume that you may need to deal with
either type of socket).
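
For readers following along, here is a minimal sketch of the select()
mechanism described above. It is an illustration under stated assumptions,
not code from this thread: struct outq and the helpers set_nonblocking(),
outq_push() and send_both() are hypothetical names, both descriptors are
assumed to be below FD_SETSIZE, and SIGPIPE handling is left out.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one outbound stream (illustrative names). */
struct outq {
    int fd;           /* descriptor, expected to be non-blocking */
    const char *buf;  /* next unsent byte */
    size_t len;       /* bytes still unsent */
};

/* Put fd into non-blocking mode. Returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hand the kernel as much as it will take right now and ALWAYS check
 * how much was accepted. Returns 0 on success (possibly with nothing
 * written), -1 on a real error. */
static int outq_push(struct outq *q)
{
    ssize_t rval = write(q->fd, q->buf, q->len);
    if (rval < 0) {
        if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
            return 0;           /* no progress this time, try later */
        return -1;
    }
    q->buf += rval;             /* move forward in the buffer */
    q->len -= (size_t)rval;     /* this many bytes remain */
    return 0;
}

/* Keep both outbound buffers filled until everything has been sent.
 * Assumes both descriptors are smaller than FD_SETSIZE. */
static int send_both(struct outq *a, struct outq *b)
{
    assert(a != NULL && b != NULL);
    while (a->len > 0 || b->len > 0) {
        fd_set wfds;
        int maxfd = -1;

        FD_ZERO(&wfds);
        if (a->len > 0) {
            FD_SET(a->fd, &wfds);
            maxfd = a->fd;
        }
        if (b->len > 0) {
            FD_SET(b->fd, &wfds);
            if (b->fd > maxfd)
                maxfd = b->fd;
        }
        /* Sleep until at least one socket can accept more data. */
        if (select(maxfd + 1, NULL, &wfds, NULL, NULL) < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (a->len > 0 && FD_ISSET(a->fd, &wfds) && outq_push(a) < 0)
            return -1;
        if (b->len > 0 && FD_ISSET(b->fd, &wfds) && outq_push(b) < 0)
            return -1;
    }
    return 0;
}
```

A caller would mark both descriptors non-blocking with set_nonblocking(),
fill in one struct outq per socket, and call send_both(). A real server
would more likely fold this into its main select() loop rather than loop
until both buffers drain.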
I wonder if anyone has ever rewritten one of your programs, and made a
fundamental change that silently broke one of your programs because an
underlying concept was changed.

Any software author who writes code and does not perform reasonable sanity
checks on the return value, particularly for something as important as the
read and write system calls, is hanging a big sign around their neck saying
"Kick Me I Code Worth Shit".

> > I am not too sure that statement is true... but then I am a paranoid
> > programmer, so I always define an xread() function guaranteed to do what
> > I mean. Still, that bothers me...
>
> It bothers me too... I am used to formatting my IPC data streams. I
> either use fixed length data units so that the receiver can post a
> fixed size read, or I use a fix length data unit, and guarantee write
> ordering by maintaining state. I do this in order to send a fixed
> length header to indicate that I'm writing a variable length packet,
> so the receiver can then issue a blocking read for the right size.

I have never seen that work as expected with a large data size.

> > > Instead of making a non-blocking read for which "it's OK if no data
> > > is available", use select() and only call a blocking read if the select
> > > is true.
> >
> > And I think this is what I was looking for...
> >
> > If you have a blocking read, select() returns true on a FD because one
> > byte is available, and you try a read(1000), will it block?
>
> Yes. It will block pending all 1000 bytes being available.

Wrong.

> > I am reasonably certain that it will not - it will return the one byte.
> >
> > Ahhh. Yes.
> >
> > % man 2 read
> > (SunOS version)
> >
> > Upon successful completion, read() and readv() return the
> > number of bytes actually read and placed in the buffer. The
> >
> > Sun Release 4.1 Last change: 21 January 1990 1
> >
> > READ(2V) SYSTEM CALLS READ(2V)
> >
> > system guarantees to read the number of bytes requested if
> > the descriptor references a normal file which has that many
> > bytes left before the EOF (end of file), but in no other
> > case.
> >
> > Key words, "but in no other case"..
>
> That isn't the same as "and guarantees to return random numbers in all
> other cases". 8-).

Sigh.

> The question, I suppose, is whether pending packet data less than the
> packet length from the transmitter is fragged.
>
> To answer that, I refer you to socket(2).
>
> Specifically, you are describing the difference between SOCK_STREAM
> and SOCK_SEQPACKET... the short reads you are defining are for the
> SOCK_SEQPACKET case. From the man page:
>
> The communications protocols used to implement a SOCK_STREAM insure that
> data is not lost or duplicated. If a piece of data for which the peer
> protocol has buffer space cannot be successfully transmitted within a
> reasonable length of time, then the connection is considered broken and
> calls will indicate an error with -1 returns and with ETIMEDOUT as the
> specific code in the global variable errno. The protocols optionally keep
> sockets ``warm'' by forcing transmissions roughly every minute in the
> absence of other activity. An error is then indicated if no response can
> be elicited on an otherwise idle connection for a extended period (e.g. 5
> minutes). A SIGPIPE signal is raised if a process sends on a broken
> stream; this causes naive processes, which do not handle the signal, to
> exit.
>
> If you want to get technical, according to this description, if you are
> using a SOCK_STREAM, then a read on a blocking socket will act like a
> recv(2) or recvfrom(2) with flags MSG_WAITALL by default.

HUH? Where the hell do you get THAT from?

Sigh.

Terry, take the weekend off. You are obviously a few cards short of a full
deck :-)

... JG
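
On the read() question argued above, a small sketch may help. It shows a
loop in the spirit of the xread() idea quoted earlier, whose actual
implementation was never posted: read_full() is a hypothetical name, and
the descriptor is assumed to be a blocking one. The loop exists because a
single read() on a stream socket can return fewer bytes than requested.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical xread()-style helper: keep calling read() on a blocking
 * descriptor until len bytes have arrived, EOF is hit, or an error occurs.
 * Returns the number of bytes read (less than len only at EOF), or -1
 * on error with errno set. */
static ssize_t read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t got = 0;

    assert(buf != NULL || len == 0);
    while (got < len) {
        ssize_t rval = read(fd, p + got, len - got);
        if (rval == 0)
            break;              /* EOF: the peer closed the stream */
        if (rval < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, retry */
            return -1;
        }
        got += (size_t)rval;    /* short read: ask for the rest */
    }
    return (ssize_t)got;
}
```

A receiver that expects a fixed-length header followed by a variable-length
body could call read_full() once for the header and once for the body,
instead of relying on one read() call to deliver a whole unit.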