From owner-freebsd-questions Fri Nov 15 15:25:25 1996
Return-Path: owner-questions
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.5/8.7.3) id PAA26107
          for questions-outgoing; Fri, 15 Nov 1996 15:25:25 -0800 (PST)
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.5/8.7.3) id PAA25872
          for freebsd-hackers-digest-outgoing; Fri, 15 Nov 1996 15:23:08 -0800 (PST)
Date: Fri, 15 Nov 1996 15:23:08 -0800 (PST)
Message-Id: <199611152323.PAA25872@freefall.freebsd.org>
From: owner-hackers-digest
To: freebsd-hackers-digest@FreeBSD.ORG
Subject: hackers-digest V1 #1646
Reply-To: hackers
Sender: owner-questions@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

hackers-digest          Friday, 15 November 1996      Volume 01 : Number 1646

In this issue:

        Re: Sockets question...
        Re: Sockets question...
        Re: Sockets question...
        Re: earlier "holographic shell" in 2.2-ALPHA install procedure
        Re: Sockets question...
        Re: Sockets question...
        Re: Sockets question...
        Re: Sockets question...
        Re: Sockets question...
        Re: Sockets question...
        Re: Sockets question...
        Re: Sockets question...

----------------------------------------------------------------------

From: Bill Fenner
Date: Fri, 15 Nov 1996 14:02:30 PST
Subject: Re: Sockets question...

In message <199611151748.KAA26388@phaeton.artisoft.com> Terry wrote:
>[long quote from socket man page]
>
>If you want to get technical, according to this description, if you are
>using a SOCK_STREAM, then a read on a blocking socket will act like a
>recv(2) or recvfrom(2) with flags MSG_WAITALL by default.

That's a wild thing to get from that description. What I can get from
that description is:

- data is not lost or duplicated.
- the connection is broken if data cannot be transmitted.
- the connection can optionally send keepalives in the absence of data.
- an error is indicated if the keepalive fails.
- SIGPIPE means you wrote on a closed socket.

What part talks about how a blocking read works?

>Maybe you should be using SOCK_SEQPACKET instead of SOCK_STREAM?

There is no mapping from SOCK_SEQPACKET to an IP protocol. Maybe there
will be if the IETF standardizes SFRP, but there is not today.

  Bill

------------------------------

From: Terry Lambert
Date: Fri, 15 Nov 1996 14:55:30 -0700 (MST)
Subject: Re: Sockets question...

> If I want to do a
>
> write(fd, buf, 1048576)
>
> on a socket connected via a 9600 baud SLIP link, I might expect the system
> call to take around 1092 seconds. If I have a server process dealing with
> two such sockets, response time will be butt slow if the server is
> currently writing to the other socket... it has to wait for the write to
> complete because write(2) has to finish sending the entire 1048576 bytes.

Actually, write will return when the data has been copied into the
local transmit buffers, not when it has actually been sent. It's
only when you run out of local transmit buffers that the write blocks.

And well it should: something needs to tell the server process to
quit making calls which the kernel is unable to satisfy. Halting
the server process based on resource unavailability does this.

> So a clever software author does not do this. He has 1048576 bytes of
> (different, even) data that he wants to write "simultaneously" to two
> sockets. He wants to do the equivalent of Sun's
>
> aiowrite(fd1, buf1, 1048576, SEEK_CUR, 0, NULL);
> aiowrite(fd2, buf2, 1048576, SEEK_CUR, 0, NULL);

Yes. This is *exactly* what he wants to do.

> Well how the hell do you do THAT if you are busy blocked in a write call?

He uses a native aiowrite().

Or he wants to call a write from a thread dedicated to that client,
which may block the thread, but not the process, and therefore not
other writes.

The underlying implementation may use non-blocking I/O, or it may use
an OS implementation of aiowrite (like Sun's SunOS 4.3 LWP user space
threads library provided). It doesn't matter. That's the point of
using threads.
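
For scale, the 1092-second estimate quoted in this message, and the
point that write() blocks only once the local transmit buffers fill,
can be put into numbers. This sketch is not from the thread. It assumes
10 bits on the wire per byte (8 data bits plus start and stop bits) and
ignores SLIP framing; the helper names link_seconds() and
send_buffer_bytes() are hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Seconds needed to push `bytes` through a serial link running at
 * `baud` bits per second, ASSUMING 10 bits on the wire per byte.
 */
double link_seconds(double bytes, double baud)
{
    return bytes / (baud / 10.0);
}

/*
 * Size in bytes of the kernel send buffer behind `fd`, or -1 on error.
 * A blocking write() can hand off roughly this much before it waits.
 */
int send_buffer_bytes(int fd)
{
    int size = 0;
    socklen_t optlen = sizeof size;

    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &size, &optlen) < 0)
        return -1;
    return size;
}
```

Under these assumptions link_seconds(1048576, 9600) comes to a little
over 1092 seconds, which matches the quoted figure. Whatever
send_buffer_bytes() reports is the portion of that megabyte a blocking
write() can queue without waiting for the link.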

> Well, you use non-blocking I/O... and you take advantage of the fact that
> the OS is capable of buffering some data on your behalf.
>
> Let's say you have "buf1" and "buf2" to write to "fd1" and "fd2", and "len1"
> and "len2" for the size of the corresponding buf's.
>
> You write code to do the following:
>
> rval = write(fd1, buf1, len1)    # Wrote 2K of data
> len1 -= rval;                    # 1046528 bytes remain
> buf1 += rval;                    # Move forward 2K in buffer

[ ... ]

> You can trivially do this with a moderately complex select() mechanism,
> so that the outbound buffers for both sockets are kept filled.

This is exactly the finite state automaton I was talking about
having to move into user space code in order to use the interface.

It makes things more complex for the user space programmer.

> A little hard to do without nonblocking sockets. Very useful. I don't
> think that this is a "stupid idea" at all.

Maybe not compared to being unable to do it at all... but BSD is not
limited this way. We have threads.

> > What is the point of a non-blocking write if this is what happens?
>
> I will leave that as your homework for tonite.

Answer: for writes in a multiple client server.

Extra credit: the failure case that originated this discussion was
concerned with a client using read.

> Please tell that to FreeBSD's FTP server, which uses a single (blocking)
> write to perform delivery of data.
>
> Why should an application developer have to know or care what the available
> buffer space is? Please tell me where in write(2) and read(2) it says I
> must worry about this.
>
> It doesn't.

Exactly my point on a socket read not returning until it completes.

> > Indeterminate sockets are evil. They are on the order of not knowing
> > your lock state when entering into a function that's going to need
> > the lock held.
>
> I suppose you have never written a library function.
>
> I suppose you do not subscribe to the philosophy that you should be
> liberal in what you accept (in this case, assume that you may need to
> deal with either type of socket).

If I wrote a library function which operated on a non-user-opaque
object like a socket set up by the user, then it would function for
all potential valid states in which that object could be at the time
of the call. For potential invalid states, I would trap the ones
which I could identify from subfunction returns, and state that the
behaviour for other invalid states was "undefined" in the documentation
which I published with the library (ie: optimise for the success case).

More likely, I would encapsulate the object using an opaque data
type, and I would expect the users who wish to consume my interface
to obtain an object of that type, operate on the object with my
functions, and release the object when done. In other words, I
would employ standard data encapsulation techniques.

> I wonder if anyone has ever rewritten one of your programs, and made
> a fundamental change that silently broke one of your programs because
> an underlying concept was changed.

Unlikely. I document my assumptions.

> Any software author who writes code and does not perform reasonable
> sanity checks on the return value, particularly for something as important
> as the read and write system calls, is hanging a big sign around their
> neck saying "Kick Me I Code Worth Shit".

On the other hand, "do not test for an error condition which you can
not handle".

If as part of my rundown in a program, I go to close a file, and the
close fails, what should I do about it? Not exit? Give me a break...

> > It bothers me too... I am used to formatting my IPC data streams. I
> > either use fixed length data units so that the receiver can post a
> > fixed size read, or I use a fixed length data unit, and guarantee write
> > ordering by maintaining state.
> > I do this in order to send a fixed
> > length header to indicate that I'm writing a variable length packet,
> > so the receiver can then issue a blocking read for the right size.
>
> I have never seen that work as expected with a large data size.

I have never seen *any* IPC transport work (reliably) with large data
sizes... depending on your definition of large. To deal with this,
you can only encapsulate the transport and handle them, or don't use
large data sizes in the first place.

                                        Terry Lambert
                                        terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

------------------------------

From: Terry Lambert
Date: Fri, 15 Nov 1996 15:04:32 -0700 (MST)
Subject: Re: Sockets question...

> Ummm... and the problem is...?
>
> As far as I am aware, byte oriented data can be written to unaligned
> addresses on any UNIX architecture that I have seen.

It's not efficient. It may take multiple bus cycles each time you
break for more data making it not-as-fast-as-it-could-be. It is ugly
and inelegant. It offends our aesthetics.

> xread is explicitly called with what is clearly a byte oriented buffer.

You could make the same arguments about bcopy, but we've optimized
it for alignment boundaries anyway.

> If you are possibly worried about something such as the atomicity of
> reads (potentially valid in a threaded environment, or one using shared
> memory), I agree that there may be some concern. Since it is not clear
> to _me_ that such atomicity of access would be valid under the same
> circumstances even with read(), I would probably code around the
> situation anyways.
>
> Is there some other problem that I am missing? I've done this sort of
> things for several years now...

Non-shared memory, but using a mmap'ed region as the destination buffer.
In this case, I want to validate the target address range once and
copy direct out of the frag buffers into the user buffer.

I can make similar arguments about writes of a mmap'ed region.

I believe that an FTP server sending files to a client would qualify,
after the initial descriptor header is sent, and the only thing left
to send is the file data.

Further, if the kernel detected this happening, it could asynchronously
complete, and delay the unmapping until the transmit was complete,
turning almost the entire transaction around in kernel space.

The same thing goes for whole file downloads (ie: .EXE, .DLL, etc.)
for DOS clients of a UNIX server.

                                        Terry Lambert
                                        terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

------------------------------

From: "Jordan K. Hubbard"
Date: Fri, 15 Nov 1996 14:19:37 -0800
Subject: Re: earlier "holographic shell" in 2.2-ALPHA install procedure

> I would like the ability to launch the "emergency
> holographic shell" earlier in the install process,

So would I. :-)

Unfortunately, if you do that before the chroot is done then it's
impossible to unmount the floppy and use the drive again for fixit or
floppy installation. The EHS is started just as soon as it's possible
for me to start it, I'm afraid.

                                        Jordan

------------------------------

From: Terry Lambert
Date: Fri, 15 Nov 1996 15:10:53 -0700 (MST)
Subject: Re: Sockets question...

> >But on a blocking socket, it doesn't make sense to have to issue multiple
> >system calls to read chunks of a whole message when you aren't going to
> >do anything with it until all the reads have been satisfied?
>
> The CSRG apparently felt otherwise.

They didn't have threads or aioread.

                                        Terry Lambert
                                        terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

------------------------------

From: Terry Lambert
Date: Fri, 15 Nov 1996 15:09:53 -0700 (MST)
Subject: Re: Sockets question...

> No, Karl is doing this:
>
> 1) The *writer* is writing records of variable size with a prefix to
>    indicate how many byte(s) follow.
>
> 2) The writer does this ASSUMING that all of the records will get
>    delivered to the reader.
>
> 3) When the writer is done, he writes a "no more records follow"
>    flag record.
>
> 4) All of those writes return with no errors.
>
> 5) The READER gets about 2700 of the records (out of 8500!) and NEVER
>    SEES ANY MORE DATA. It hangs in read()!
>
> This does NOT happen with the 2.6.3 development kit and libraries. It
> RELIABLY happens with -current.

Is the data in #1 getting to the wire?

Who is losing the data, the writer or the reader?

If the reader, is it because of a buffer overflow?

If so, is the reader acking for packets it does not aggregate into the
process's read buffer, or is the writer pretending he got ack's?

What if the reader is 2.6.3 and the writer is -current?

What if the situation is reversed?

We need to localize the problem to the client or the server (if possible),
and then localize the problem further to the kernel interface at which
it is occurring.

                                        Terry Lambert
                                        terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

------------------------------

From: Karl Denninger
Date: Fri, 15 Nov 1996 16:24:14 -0600 (CST)
Subject: Re: Sockets question...

> > No, Karl is doing this:
> >
> > 1) The *writer* is writing records of variable size with a prefix to
> >    indicate how many byte(s) follow.
> >
> > 2) The writer does this ASSUMING that all of the records will get
> >    delivered to the reader.
> >
> > 3) When the writer is done, he writes a "no more records follow"
> >    flag record.
> >
> > 4) All of those writes return with no errors.
> >
> > 5) The READER gets about 2700 of the records (out of 8500!) and NEVER
> >    SEES ANY MORE DATA. It hangs in read()!
> >
> > This does NOT happen with the 2.6.3 development kit and libraries.
> > It RELIABLY happens with -current.
>
> Is the data in #1 getting to the wire?

It happens with the local host on both sides (ie: connect back to the local
hostname, in which case the wire isn't involved).

> Who is losing the data, the writer or the reader?

The writer; the reader never gets the data.

> If the reader, is it because of a buffer overflow?

The reader never sees it, and it's NOT in the mbuf clusters (netstat -an shows
nothing outstanding and the socket in a connected state for both sides).

> If so, is the reader acking for packets it does not aggregate into the
> process's read buffer, or is the writer pretending he got ack's?

See above; the writer never gets ACKs back (he only expects one at the end
of the stream, and since the reader never sees the end record he never
sends the ACK).

> What if the reader is 2.6.3 and the writer is -current?

You're dead. The writer is the one which is important; the reader is not.

> What if the situation is reversed?

See above.

> We need to localize the problem to the client or the server (if possible),
> and then localize the problem further to the kernel interface at which
> it is occurring.
>
> Terry Lambert
> terry@lambert.org

It's on the writing end. Leaving all else alone and recompiling the writer
with 2.7.x breaks, 2.6.3 works.

--
--
Karl Denninger (karl@MCS.Net)| MCSNet - The Finest Internet Connectivity
http://www.mcs.net/~karl     | T1's from $600 monthly to FULL DS-3 Service
                             | 33 Analog Prefixes, 13 ISDN, Web servers $75/mo
Voice: [+1 312 803-MCS1 x219]| Email to "info@mcs.net" WWW: http://www.mcs.net/
Fax:   [+1 312 248-9865]     | 2 FULL DS-3 Internet links; 400Mbps B/W Internal

------------------------------

From: "John S. Dyson"
Date: Fri, 15 Nov 1996 17:46:12 -0500 (EST)
Subject: Re: Sockets question...

>
> No, Karl is doing this:
>
> 1) The *writer* is writing records of variable size with a prefix to
>    indicate how many byte(s) follow.
>
> 2) The writer does this ASSUMING that all of the records will get
>    delivered to the reader.
>
> 3) When the writer is done, he writes a "no more records follow"
>    flag record.
>
> 4) All of those writes return with no errors.
>
> 5) The READER gets about 2700 of the records (out of 8500!) and NEVER
>    SEES ANY MORE DATA. It hangs in read()!
>
> This does NOT happen with the 2.6.3 development kit and libraries. It
> RELIABLY happens with -current.
>

I guess that Karl is the authority on what Karl is doing :-). This
helps put some guessing to rest...

Levity aside, Karl, what is the process state for the process that
is hanging in read()? This might give the networking people a hint as
to where the missing/broken spl or lock is occurring...

John

------------------------------

From: "John S. Dyson"
Date: Fri, 15 Nov 1996 17:46:58 -0500 (EST)
Subject: Re: Sockets question...

>
> It's on the writing end. Leaving all else alone and recompiling the writer
> with 2.7.x breaks, 2.6.3 works.
>

Ohhh.... cancel my last request for info about read()!!!!

John

------------------------------

From: Terry Lambert
Date: Fri, 15 Nov 1996 15:48:20 -0700 (MST)
Subject: Re: Sockets question...

> > > This does NOT happen with the 2.6.3 development kit and libraries. It
> > > RELIABLY happens with -current.

Ugh. This line was singularly unclear; "2.6.3 vs. -current".

> It's on the writing end. Leaving all else alone and recompiling the writer
> with 2.7.x breaks, 2.6.3 works.

So it's the compiler change that bit you.

What optimization flags, etc., are you using?

I would suggest turning off all optimization, and see if that fixes it.

o       If it does, isolate the offending code, file-by-file, by turning
        on optimization one file at a time. You may get bit more than
        once, so you should iterate this process once you find one
        file, and back it out. This process will, if you go for half
        the remaining code at a time, take you log2(N) * M compiles
        for N files and M places you get bitten. 8-(.

o       If it doesn't, then it is a generic problem in the code generator
        or a semantic change in an asm statement somewhere. You will
        need to mix and match compilers, with the same effects. I would
        suggest using two compilation directories and sapping the time
        dependencies to let you copy objects back and forth and link.
        Then it's cc -S time, and diff the -S files. 8-(.

                                        Terry Lambert
                                        terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

------------------------------

From: Joe Greco
Date: Fri, 15 Nov 1996 17:00:53 -0600 (CST)
Subject: Re: Sockets question...

> > If I want to do a
> >
> > write(fd, buf, 1048576)
> >
> > on a socket connected via a 9600 baud SLIP link, I might expect the system
> > call to take around 1092 seconds. If I have a server process dealing with
> > two such sockets, response time will be butt slow if the server is
> > currently writing to the other socket... it has to wait for the write to
> > complete because write(2) has to finish sending the entire 1048576 bytes.
>
> Actually, write will return when the data has been copied into the
> local transmit buffers, not when it has actually been sent. It's
> only when you run out of local transmit buffers that the write blocks.

Yes, that should be clear, I made it clear that this is precisely what
allows non-blocking sockets to be useful in this scenario.

> And well it should: something needs to tell the server process to
> quit making calls which the kernel is unable to satisfy. Halting
> the server process based on resource unavailability does this.

So does returning EWOULDBLOCK to the server process, allowing the
server to react to this by going on to service someone else.

> > So a clever software author does not do this. He has 1048576 bytes of
> > (different, even) data that he wants to write "simultaneously" to two
> > sockets.
> > He wants to do the equivalent of Sun's
> >
> > aiowrite(fd1, buf1, 1048576, SEEK_CUR, 0, NULL);
> > aiowrite(fd2, buf2, 1048576, SEEK_CUR, 0, NULL);
>
> Yes. This is *exactly* what he wants to do.
>
> > Well how the hell do you do THAT if you are busy blocked in a write call?
>
> He uses a native aiowrite().

Which doesn't exist in a portable fashion. ANYWHERE.

> Or he wants to call a write from a thread dedicated to that client,
> which may block the thread, but not the process, and therefore not
> other writes.

Which is fine IF you have a threads implementation. Which is, again,
not a given, and therefore, not portable.

> The underlying implementation may use non-blocking I/O, or it may use
> an OS implementation of aiowrite (like Sun's SunOS 4.3 LWP user space
> threads library provided). It doesn't matter. That's the point of
> using threads.

Yes, well, the point of using threads is currently that you're not really
assured of being portable. I do not disagree that in an ideal world,
threads are a good way to deal with this.

> > Well, you use non-blocking I/O... and you take advantage of the fact that
> > the OS is capable of buffering some data on your behalf.
> >
> > Let's say you have "buf1" and "buf2" to write to "fd1" and "fd2", and "len1"
> > and "len2" for the size of the corresponding buf's.
> >
> > You write code to do the following:
> >
> > rval = write(fd1, buf1, len1)    # Wrote 2K of data
> > len1 -= rval;                    # 1046528 bytes remain
> > buf1 += rval;                    # Move forward 2K in buffer
> [ ... ]
> > You can trivially do this with a moderately complex select() mechanism,
> > so that the outbound buffers for both sockets are kept filled.
>
>
> This is exactly the finite state automaton I was talking about
> having to move into user space code in order to use the interface.
>
> It makes things more complex for the user space programmer.

So? Making things more complex is a small tradeoff if it makes it
POSSIBLE to do something in the first place.

Tell me, how else do you do this on a system that does NOT support
threads? You can select() on writability and send one byte at a time on
a blocking socket until select() reports no further writability. Poor
solution.

> > A little hard to do without nonblocking sockets. Very useful. I don't
> > think that this is a "stupid idea" at all.
>
> Maybe not compared to being unable to do it at all... but BSD is not
> limited this way. We have threads.

_FREE_BSD is not limited this way. _FREE_BSD has threads. The local
4.3BSD Tahoe system (it _is_ a BSD system, I hope you would agree) offers
nonblocking writes but does not offer threads. Ultrix does not offer
threads. I am sure there are other examples...

You are missing the point as usual. BSD != FreeBSD, and FreeBSD != UNIX
in general. I am continually amazed that someone like you could make that
error... In order to write portable code, one must write portable code.

> > > What is the point of a non-blocking write if this is what happens?
> >
> > I will leave that as your homework for tonite.
>
> Answer: for writes in a multiple client server.

Ahhhh. You got it.

> Extra credit: the failure case that originated this discussion was
> concerned with a client using read.

That is not very relevant. The statement which originated _THIS_
discussion was your assertion that

"Non-blocking sockets for reliable stream protocols like TCP/IP are a
stupid idea."

I do not care about Karl's problem... he may well have a legitimate
problem, and I agreed that it was probably beyond the scope of a usage
discussion given his description. I do not care about Marc's problem...
that is a separate issue. I am simply correcting a misconception that
you are spreading that non-blocking sockets are a "stupid idea".

> > Please tell that to FreeBSD's FTP server, which uses a single (blocking)
> > write to perform delivery of data.
> >
> > Why should an application developer have to know or care what the available
> > buffer space is?
> > Please tell me where in write(2) and read(2) it says I
> > must worry about this.
> >
> > It doesn't.
>
> Exactly my point on a socket read not returning until it completes.

Yes, that's fine. I agree that there are merits on both sides. The
read() returning what is available is probably more generally useful,
and that seems to be what is implemented. I am not going to argue with
the design and implementation of the Berkeley networking code, since it
is widely considered to be the standard model for networking. Most
other folks have not found this to be a critical design flaw, and
neither do I. I can see several cases where a blocking read() call
would be a substantial nuisance, and so I think that the behaviour as it
exists makes a fair amount of sense.

> > > Indeterminate sockets are evil. They are on the order of not knowing
> > > your lock state when entering into a function that's going to need
> > > the lock held.
> >
> > I suppose you have never written a library function.
> >
> > I suppose you do not subscribe to the philosophy that you should be
> > liberal in what you accept (in this case, assume that you may need to
> > deal with either type of socket).
>
> If I wrote a library function which operated on a non-user-opaque
> object like a socket set up by the user, then it would function for
> all potential valid states in which that object could be at the time
> of the call. For potential invalid states, I would trap the ones
> which I could identify from subfunction returns, and state that the
> behaviour for other invalid states was "undefined" in the documentation
> which I published with the library (ie: optimise for the success case).

What do you define "potential valid states" to be? I do not claim to
cover all the bases all the time, but I do at least catch exceptional
conditions I was not expecting.

In my case, I would try to write a socket-handling library function to
handle both blocking and non-blocking sockets if it was reasonably
practical to do so. If not, I would cause it to bomb if it detected
something odd. I think you are saying the same thing: that is good.

> More likely, I would encapsulate the object using an opaque data
> type, and I would expect the users who wish to consume my interface
> to obtain an object of that type, operate on the object with my
> functions, and release the object when done. In other words, I
> would employ standard data encapsulation techniques.

Nifty. That's even possible in many cases if you are designing from
scratch. Otherwise, it is a real pain in the butt.

> > I wonder if anyone has ever rewritten one of your programs, and made
> > a fundamental change that silently broke one of your programs because
> > an underlying concept was changed.
>
> Unlikely. I document my assumptions.

So what? If I, as the engineer who replaces you five years down the
road, decide that your program needs to use non-blocking writes, and I
change the program to do them, and I miss one place where you failed to
check a return value, your "documented assumptions" are worth diddly
squat. Code your assumptions when they are this trivial to check.

> > Any software author who writes code and does not perform reasonable
> > sanity checks on the return value, particularly for something as important
> > as the read and write system calls, is hanging a big sign around their
> > neck saying "Kick Me I Code Worth Shit".
>
> On the other hand, "do not test for an error condition which you can
> not handle".

One can handle ANY error condition by bringing it to the attention of a
higher authority. My UNIX kernel panics when it hits a condition that
it does not know how to handle. It does not foolishly take your advice
and "do not test for an error condition which you can not handle". To
do so would risk great havoc.
You ALWAYS test for error conditions, PARTICULARLY the ones which you
can not handle - because they are the really scary ones.

> If as part of my rundown in a program, I go to close a file, and the
> close fails, what should I do about it? Not exit? Give me a break...

No, but if a close() fails, and you had a reasonable expectation for it
to succeed, printing a warning is not unreasonable. According to SunOS,
there are two reasons this could happen: EBADF and EINTR. If you are
closing an inactive descriptor, it is clearly an error in the code, and
I WOULD CERTAINLY WANT TO KNOW. If it is due to a signal, it is unclear
what to do, but it is certainly not a "bad" idea to at least be aware
that such a thing can (and has) happened!

> > > It bothers me too... I am used to formatting my IPC data streams. I
> > > either use fixed length data units so that the receiver can post a
> > > fixed size read, or I use a fixed length data unit, and guarantee write
> > > ordering by maintaining state. I do this in order to send a fixed
> > > length header to indicate that I'm writing a variable length packet,
> > > so the receiver can then issue a blocking read for the right size.
> >
> > I have never seen that work as expected with a large data size.
>
> I have never seen *any* IPC transport work (reliably) with large data
> sizes... depending on your definition of large. To deal with this,
> you can only encapsulate the transport and handle them, or don't use
> large data sizes in the first place.

Okay, here we are in complete agreement. One _always_ needs to be aware
of this, then.

... JG

------------------------------

From: Joe Greco
Date: Fri, 15 Nov 1996 17:06:28 -0600 (CST)
Subject: Re: Sockets question...

> > Ummm... and the problem is...?
> >
> > As far as I am aware, byte oriented data can be written to unaligned
> > addresses on any UNIX architecture that I have seen.
>
> It's not efficient.
> It may take multiple bus cycles each time you
> break for more data making it not-as-fast-as-it-could-be. It is ugly
> and inelegant. It offends our aesthetics.

Tough doodles. It is a fact of life. However, as it is the unusual
case, it is probably not a real big performance deal.

> > xread is explicitly called with what is clearly a byte oriented buffer.
>
> You could make the same arguments about bcopy, but we've optimized
> it for alignment boundaries anyway.

Yeah... which has what to do with what?

> > If you are possibly worried about something such as the atomicity of
> > reads (potentially valid in a threaded environment, or one using shared
> > memory), I agree that there may be some concern. Since it is not clear
> > to _me_ that such atomicity of access would be valid under the same
> > circumstances even with read(), I would probably code around the
> > situation anyways.
> >
> > Is there some other problem that I am missing? I've done this sort of
> > things for several years now...
>
> Non-shared memory, but using a mmap'ed region as the destination buffer.
> In this case, I want to validate the target address range once and
> copy direct out of the frag buffers into the user buffer.

And you can't, because you may have to perform multiple read() calls, is
that your objection? I guess I don't really see how that is a major
earth shaking crisis.

Well, fine, go fix it. While you are at it, break FreeBSD's BSD
compatibility.

> I can make similar arguments about writes of a mmap'ed region.
>
> I believe that an FTP server sending files to a client would qualify,
> after the initial descriptor header is sent, and the only thing left
> to send is the file data.

Yes. So use a blocking socket, and arrange not to be disturbed by
signals, and I think you are probably in pretty good shape.

> Further, if the kernel detected this happening, it could asynchronously
> complete, and delay the unmapping until the transmit was complete,
> turning almost the entire transaction around in kernel space.
>
> The same thing goes for whole file downloads (ie: .EXE, .DLL, etc.)
> for DOS clients of a UNIX server.

Point being?

... JG

------------------------------

End of hackers-digest V1 #1646
******************************