From owner-freebsd-questions  Fri Nov 15 15:25:25 1996
Return-Path: owner-questions
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.5/8.7.3) id PAA26107
          for questions-outgoing; Fri, 15 Nov 1996 15:25:25 -0800 (PST)
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.5/8.7.3) id PAA25872
          for freebsd-hackers-digest-outgoing; Fri, 15 Nov 1996 15:23:08 -0800 (PST)
Date: Fri, 15 Nov 1996 15:23:08 -0800 (PST)
Message-Id: <199611152323.PAA25872@freefall.freebsd.org>
From: owner-hackers-digest
To: freebsd-hackers-digest@FreeBSD.ORG
Subject:   hackers-digest V1 #1646
Reply-To: hackers
Sender: owner-questions@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk


hackers-digest            Friday, 15 November 1996      Volume 01 : Number 1646

In this issue:
Re: Sockets question... 
Re: Sockets question...
Re: Sockets question...
Re: earlier "holographic shell" in 2.2-ALPHA install procedure 
Re: Sockets question...
Re: Sockets question...
Re: Sockets question...
Re: Sockets question...
Re: Sockets question...
Re: Sockets question...
Re: Sockets question...
Re: Sockets question...

----------------------------------------------------------------------

From: Bill Fenner <fenner@parc.xerox.com>
Date: Fri, 15 Nov 1996 14:02:30 PST
Subject: Re: Sockets question... 

In message <199611151748.KAA26388@phaeton.artisoft.com> Terry wrote:
>[long quote from socket man page]
>
>If you want to get technical, according to this description, if you are
>using a SOCK_STREAM, then a read on a blocking socket will act like a
>recv(2) or recvfrom(2) with flags MSG_WAITALL by default.

That's a wild thing to get from that description.  What I can get from that 
description is:
- data is not lost or duplicated.
- the connection is broken if data cannot be transmitted.
- the connection can optionally send keepalives in the absence of data.
- an error is indicated if the keepalive fails.
- SIGPIPE means you wrote on a closed socket.

What part talks about how a blocking read works?

>Maybe you should be using SOCK_SEQPACKET instead of SOCK_STREAM?

There is no mapping from SOCK_SEQPACKET to an IP protocol.  Maybe there will
be if the IETF standardizes SFRP <draft-odell-srfp-00.txt>, but there is not 
today.

  Bill


------------------------------

From: Terry Lambert <terry@lambert.org>
Date: Fri, 15 Nov 1996 14:55:30 -0700 (MST)
Subject: Re: Sockets question...

> If I want to do a 
> 
> write(fd, buf, 1048576)
> 
> on a socket connected via a 9600 baud SLIP link, I might expect the system
> call to take around 1092 seconds.  If I have a server process dealing with
> two such sockets, response time will be butt slow if the server is
> currently writing to the other socket...  it has to wait for the write to
> complete because write(2) has to finish sending the entire 1048576 bytes.

Actually, write will return when the data has been copied into the
local transmit buffers, not when it has actually been sent.  It's
only when you run out of local transmit buffers that the write blocks.

And well it should: something needs to tell the server process to
quit making calls which the kernel is unable to satisfy.  Halting
the server process based on resource unavailability does this.


> So a clever software author does not do this.  He has 1048576 bytes of
> (different, even) data that he wants to write "simultaneously" to two
> sockets.  He wants to do the equivalent of Sun's
> 
> aiowrite(fd1, buf1, 1048576, SEEK_CUR, 0, NULL);
> aiowrite(fd2, buf2, 1048576, SEEK_CUR, 0, NULL);

Yes.  This is *exactly* what he wants to do.

> Well how the hell do you do THAT if you are busy blocked in a write call?

He uses a native aiowrite().

Or he wants to call a write from a thread dedicated to that client,
which may block the thread, but not the process, and therefore not
other writes.

The underlying implementation may use non-blocking I/O, or it may use
an OS implementation of aiowrite (like Sun's SunOS 4.3 LWP user space
threads library provided).  It doesn't matter.  That's the point of
using threads.
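
A minimal sketch of the thread-per-client idea, assuming POSIX threads
are available (not a given on every system of the day).  The names
struct job, writer_thread and write_two are hypothetical and exist only
for illustration.  Each write(2) may block its own thread while the
other transfer keeps going:

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical description of one outbound transfer. */
struct job {
    int         fd;      /* blocking descriptor to write to */
    const char *buf;     /* data to send */
    size_t      len;     /* number of bytes in buf */
    int         status;  /* set by the thread: 0 ok, -1 failed */
};

/* Thread body: a plain blocking write loop for one client. */
static void *writer_thread(void *arg)
{
    struct job *j = arg;
    const char *p = j->buf;
    size_t left = j->len;

    j->status = 0;
    while (left > 0) {
        ssize_t n = write(j->fd, p, left);  /* blocks this thread only */
        if (n == -1 && errno == EINTR)
            continue;
        if (n <= 0) {
            j->status = -1;
            break;
        }
        p += n;
        left -= (size_t)n;
    }
    return NULL;
}

/* Run both transfers concurrently; 0 if both completed, -1 otherwise. */
int write_two(struct job *a, struct job *b)
{
    pthread_t ta, tb;

    if (pthread_create(&ta, NULL, writer_thread, a) != 0)
        return -1;
    if (pthread_create(&tb, NULL, writer_thread, b) != 0) {
        pthread_join(ta, NULL);
        return -1;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return (a->status == 0 && b->status == 0) ? 0 : -1;
}
```

The caller fills in two struct job values and calls write_two(); no
per-socket state machine appears in the application code.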


> Well, you use non-blocking I/O...  and you take advantage of the fact that
> the OS is capable of buffering some data on your behalf.
> 
> Let's say you have "buf1" and "buf2" to write to "fd1" and "fd2", and "len1"
> and "len2" for the size of the corresponding buf's.
> 
> You write code to do the following:
> 
> rval = write(fd1, buf1, len1)		# Wrote 2K of data
> len1 -= rval;				# 1046528 bytes remain
> buf1 += rval;				# Move forward 2K in buffer
[ ... ]
> You can trivially do this with a moderately complex select() mechanism,
> so that the outbound buffers for both sockets are kept filled.


This is exactly the finite state automaton I was talking about
having to move into user space code in order to use the interface.

It makes things more complex for the user space programmer.


> A little hard to do without nonblocking sockets.  Very useful.  I don't
> think that this is a "stupid idea" at all.

Maybe not compared to being unable to do it at all... but BSD is not
limited this way.  We have threads.


> > What is the point of a non-blocking write if this is what happens?
> 
> I will leave that as your homework for tonite.

Answer:		for writes in a multiple client server.
Extra credit:	the failure case that originated this discussion was
		concerned with a client using read.

> Please tell that to FreeBSD's FTP server, which uses a single (blocking)
> write to perform delivery of data.
> 
> Why should an application developer have to know or care what the available
> buffer space is?  Please tell me where in write(2) and read(2) it says I
> must worry about this.
> 
> It doesn't.

Exactly my point on a socket read not returning until it completes.

> > Indeterminate sockets are evil.  They are on the order of not knowing
> > your lock state when entering into a function that's going to need
> > the lock held.
> 
> I suppose you have never written a library function.
> 
> I suppose you do not subscribe to the philosophy that you should be
> liberal in what you accept (in this case, assume that you may need to
> deal with either type of socket).

If I wrote a library function which operated on a non-user-opaque
object like a socket set up by the user, then it would function for
all potential valid states in which that object could be at the time
of the call.  For potential invalid states, I would trap the ones
which I could identify from subfunction returns, and state that the
behaviour for other invalid states was "undefined" in the documentation
which I published with the library (ie: optimise for the success case).


More likely, I would encapsulate the object using an opaque data
type, and I would expect the users who wish to consume my interface
to obtain an object of that type, operate on the object with my
functions, and release the object when done.  In other words, I
would employ standard data encapsulation techniques.
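
As an illustration of that encapsulation approach (not code from any
actual library), here is a small sketch in which the caller only ever
holds a pointer to an incomplete type.  The names conn, conn_adopt,
conn_send and conn_release are hypothetical.  The sketch assumes the
library forces the descriptor into blocking mode when it takes
ownership, so its functions never face a socket in an indeterminate
state:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Public view: an incomplete type, so callers cannot poke at fields. */
typedef struct conn conn;

/* Private definition; would normally live in the library's .c file. */
struct conn {
    int fd;  /* descriptor, known to be in blocking mode */
};

/* Take ownership of fd and force it into blocking mode. */
conn *conn_adopt(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return NULL;
    if (fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) == -1)
        return NULL;

    conn *c = malloc(sizeof *c);
    if (c == NULL)
        return NULL;
    c->fd = fd;
    return c;
}

/* Send the whole buffer; 0 on success, -1 on error. */
int conn_send(conn *c, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(c->fd, p, len);
        if (n == -1 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Close the descriptor and free the object; returns close()'s result. */
int conn_release(conn *c)
{
    int rc = close(c->fd);
    free(c);
    return rc;
}
```

A caller obtains a conn with conn_adopt(), operates on it with
conn_send(), and hands it back with conn_release().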


> I wonder if anyone has ever rewritten one of your programs, and made
> a fundamental change that silently broke one of your programs because
> an underlying concept was changed.

Unlikely.  I document my assumptions.


> Any software author who writes code and does not perform reasonable
> sanity checks on the return value, particularly for something as important
> as the read and write system calls, is hanging a big sign around their
> neck saying "Kick Me I Code Worth Shit".

On the other hand, "do not test for an error condition which you can
not handle".

If as part of my rundown in a program, I go to close a file, and the
close fails, what should I do about it?  Not exit?  Give me a break...

> > It bothers me too... I am used to formatting my IPC data streams.  I
> > either use fixed length data units so that the receiver can post a
> > fixed size read, or I use a fixed length data unit, and guarantee write
> > ordering by maintaining state. I do this in order to send a fixed
> > length header to indicate that I'm writing a variable length packet,
> > so the receiver can then issue a blocking read for the right size.
> 
> I have never seen that work as expected with a large data size.

I have never seen *any* IPC transport work (reliably) with large data
sizes... depending on your definition of large.  To deal with this,
you can only encapsulate the transport and handle them, or don't use
large data sizes in the first place.
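
A sketch of what encapsulating the transport can look like for the
fixed-length-header scheme quoted above.  Everything here is an
assumption made for illustration: the 4-byte network-order length
header, the MAX_RECORD cap, and the names read_full, write_full,
send_record and recv_record.  The point is only that short reads and
short writes are absorbed in one place:

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_RECORD (1u << 20)  /* hypothetical sanity cap on payload size */

/* Read exactly len bytes; 0 on success, -1 on error or early EOF. */
static int read_full(int fd, void *buf, size_t len)
{
    char *p = buf;

    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n == -1 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Write exactly len bytes; 0 on success, -1 on error. */
static int write_full(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Send a fixed-length header followed by the variable-length payload. */
int send_record(int fd, const void *data, uint32_t len)
{
    uint32_t hdr = htonl(len);

    if (write_full(fd, &hdr, sizeof hdr) == -1)
        return -1;
    return write_full(fd, data, len);
}

/* Receive one record; returns a malloc'd payload or NULL on failure. */
void *recv_record(int fd, uint32_t *len)
{
    uint32_t hdr;

    if (read_full(fd, &hdr, sizeof hdr) == -1)
        return NULL;
    *len = ntohl(hdr);
    if (*len > MAX_RECORD)
        return NULL;

    void *data = malloc(*len ? *len : 1);
    if (data == NULL)
        return NULL;
    if (read_full(fd, data, *len) == -1) {
        free(data);
        return NULL;
    }
    return data;
}
```

A zero-length record could serve as an end-of-stream marker, though
that convention is a choice of this sketch rather than anything
specified in the thread.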


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

------------------------------

From: Terry Lambert <terry@lambert.org>
Date: Fri, 15 Nov 1996 15:04:32 -0700 (MST)
Subject: Re: Sockets question...

> Ummm... and the problem is...?
> 
> As far as I am aware, byte oriented data can be written to unaligned
> addresses on any UNIX architecture that I have seen.

It's not efficient.  It may take multiple bus cycles each time you
break for more data making it not-as-fast-as-it-could-be.  It is ugly
and inelegant.  It offends our aesthetics.

> xread is explicitly called with what is clearly a byte oriented buffer.

You could make the same arguments about bcopy, but we've optimized
it for alignment boundaries anyway.

> If you are possibly worried about something such as the atomicity of
> reads (potentially valid in a threaded environment, or one using shared
> memory), I agree that there may be some concern.  Since it is not clear
> to _me_ that such atomicity of access would be valid under the same
> circumstances even with read(), I would probably code around the
> situation anyways.
> 
> Is there some other problem that I am missing?  I've done this sort of
> things for several years now...

Non-shared memory, but using a mmap'ed region as the destination buffer.
In this case, I want to validate the target address range once and
copy direct out of the frag buffers into the user buffer.

I can make similar arguments about write's of a mmap'ed region.

I believe that an FTP server sending files to a client would qualify,
after the initial descriptor header is sent, and the only thing left
to send is the file data.

Further, if the kernel detected this happening, it could asynchronously
complete, and delay the unmapping until the transmit was complete,
turning almost the entire transaction around in kernel space.

The same thing goes for whole file downloads (ie: .EXE, .DLL, etc.)
for DOS clients of a UNIX server.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

------------------------------

From: "Jordan K. Hubbard" <jkh@time.cdrom.com>
Date: Fri, 15 Nov 1996 14:19:37 -0800
Subject: Re: earlier "holographic shell" in 2.2-ALPHA install procedure 

> I would like the ability to launch the "emergency
> holographic shell" earlier in the install process,

So would I. :-)

Unfortunately, if you do that before the chroot is done then it's
impossible to unmount the floppy and use the drive again for fixit or
floppy installation.  The EHS is started just as soon as it's possible
for me to start it, I'm afraid.

					Jordan

------------------------------

From: Terry Lambert <terry@lambert.org>
Date: Fri, 15 Nov 1996 15:10:53 -0700 (MST)
Subject: Re: Sockets question...

> >But on a blocking socket, it doesn't make sense to have to issue multiple
> >system calls to read chunks of a whole message when you aren't going to
> >do anything with it until all the reads have been satisfied?
> 
> The CSRG apparently felt otherwise.

They didn't have threads or aioread.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

------------------------------

From: Terry Lambert <terry@lambert.org>
Date: Fri, 15 Nov 1996 15:09:53 -0700 (MST)
Subject: Re: Sockets question...

> No, Karl is doing this:
> 
> 1)	The *writer* is writing records of variable size with a prefix to
> 	indicate how many byte(s) follow.
> 
> 2)	The writer does this ASSUMING that all of the records will get
> 	delivered to the reader.
> 
> 3)	When the writer is done, he writes a "no more records follow"
> 	flag record.
> 
> 4)	All of those writes return with no errors.
> 
> 5)	The READER gets about 2700 of the records (out of 8500!) and NEVER
> 	SEES ANY MORE DATA.  It hangs in read()!
> 
> This does NOT happen with the 2.6.3 development kit and libraries.  It
> RELIABLY happens with -current.

Is the data in #1 getting to the wire?

Who is losing the data, the writer or the reader?

If the reader, is it because of a buffer overflow?

If so, is the reader acking for packets it does not aggregate into the
process's read buffer, or is the writer pretending he got ack's?

What if the reader is 2.6.3 and the writer is -current?

What if the situation is reversed?


We need to localize the problem to the client or the server (if possible),
and then localize the problem further to the kernel interface at which
it is occurring.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

------------------------------

From: Karl Denninger  <karl@Mcs.Net>
Date: Fri, 15 Nov 1996 16:24:14 -0600 (CST)
Subject: Re: Sockets question...

> > No, Karl is doing this:
> > 
> > 1)	The *writer* is writing records of variable size with a prefix to
> > 	indicate how many byte(s) follow.
> > 
> > 2)	The writer does this ASSUMING that all of the records will get
> > 	delivered to the reader.
> > 
> > 3)	When the writer is done, he writes a "no more records follow"
> > 	flag record.
> > 
> > 4)	All of those writes return with no errors.
> > 
> > 5)	The READER gets about 2700 of the records (out of 8500!) and NEVER
> > 	SEES ANY MORE DATA.  It hangs in read()!
> > 
> > This does NOT happen with the 2.6.3 development kit and libraries.  It
> > RELIABLY happens with -current.
> 
> Is the data in #1 getting to the wire?

It happens with the local host on both sides (ie: connect back to the local
hostname, in which case the wire isn't involved).

> Who is losing the data, the writer or the reader?

The writer; the reader never gets the data.

> If the reader, is it because of a buffer overflow?

The reader never sees it, and it's NOT in the mbuf clusters (netstat -an
shows nothing outstanding and the socket in a connected state for both sides).

> If so, is the reader acking for packets it does not aggregate into the
> process's read buffer, or is the writer pretending he got ack's?

See above; the writer never gets ACKs back (he only expects one at the end
of the stream, and since the reader never sees the end record he never sends
the ACK).

> What if the reader is 2.6.3 and the writer is -current?

You're dead.  The writer is the one which is important; the reader is not.

> What if the situation is reversed?

See above.

> We need to localize the problem to the client or the server (if possible),
> and then localize the problem further to the kernel interface at which
> it is occurring.
> 
> 					Terry Lambert
> 					terry@lambert.org

It's on the writing end.  Leaving all else alone and recompiling the writer
with 2.7.x breaks, 2.6.3 works.

--
--
Karl Denninger (karl@MCS.Net)| MCSNet - The Finest Internet Connectivity
http://www.mcs.net/~karl     | T1's from $600 monthly to FULL DS-3 Service
			     | 33 Analog Prefixes, 13 ISDN, Web servers $75/mo
Voice: [+1 312 803-MCS1 x219]| Email to "info@mcs.net" WWW: http://www.mcs.net/
Fax:   [+1 312 248-9865]     | 2 FULL DS-3 Internet links; 400Mbps B/W Internal

------------------------------

From: "John S. Dyson" <toor@dyson.iquest.net>
Date: Fri, 15 Nov 1996 17:46:12 -0500 (EST)
Subject: Re: Sockets question...

> 
> No, Karl is doing this:
> 
> 1)	The *writer* is writing records of variable size with a prefix to
> 	indicate how many byte(s) follow.
> 
> 2)	The writer does this ASSUMING that all of the records will get
> 	delivered to the reader.
> 
> 3)	When the writer is done, he writes a "no more records follow"
> 	flag record.
> 
> 4)	All of those writes return with no errors.
> 
> 5)	The READER gets about 2700 of the records (out of 8500!) and NEVER
> 	SEES ANY MORE DATA.  It hangs in read()!
> 
> This does NOT happen with the 2.6.3 development kit and libraries.  It
> RELIABLY happens with -current.
> 
I guess that Karl is the authority on what Karl is doing :-).  This
helps put some guessing to rest...  Levity aside, Karl, what is the
process state for the process that is hanging in read()?  This might
give the networking people a hint as to where the missing/broken spl
or lock is occurring...

John


------------------------------

From: "John S. Dyson" <toor@dyson.iquest.net>
Date: Fri, 15 Nov 1996 17:46:58 -0500 (EST)
Subject: Re: Sockets question...

> 
> It's on the writing end.  Leaving all else alone and recompiling the writer
> with 2.7.x breaks, 2.6.3 works.
> 
Ohhh.... cancel my last request for info about read()!!!!

John

------------------------------

From: Terry Lambert <terry@lambert.org>
Date: Fri, 15 Nov 1996 15:48:20 -0700 (MST)
Subject: Re: Sockets question...

> > > This does NOT happen with the 2.6.3 development kit and libraries.  It
> > > RELIABLY happens with -current.

Ugh.  This line was singularly unclear; "2.6.3 vs. -current".


> It's on the writing end.  Leaving all else alone and recompiling the writer
> with 2.7.x breaks, 2.6.3 works.

So it's the compiler change that bit you.

What optimization flags, etc., are you using?

I would suggest turning off all optimization, and see if that fixes it.

o	If it does, isolate the offending code, file-by-file, by turning
	on optimization one file at a time.  You may get bitten more than
	once, so you should iterate this process once you find one file,
	and back it out.  This process will, if you go for half the
	remaining code at a time, take you log2(N) * M compiles for N
	files and M places you get bitten.  8-(.

o	If it doesn't, then it is a generic problem in the code generator
	or a semantic change in an asm statement somewhere.  You will
	need to mix and match compilers, with the same effects.  I would
	suggest using two compilation directories and sapping the time
	dependencies to let you copy objects back and forth and link.

Then it's cc -S time, and diff the -S files.  8-(.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

------------------------------

From: Joe Greco <jgreco@brasil.moneng.mei.com>
Date: Fri, 15 Nov 1996 17:00:53 -0600 (CST)
Subject: Re: Sockets question...

> > If I want to do a 
> > 
> > write(fd, buf, 1048576)
> > 
> > on a socket connected via a 9600 baud SLIP link, I might expect the system
> > call to take around 1092 seconds.  If I have a server process dealing with
> > two such sockets, response time will be butt slow if the server is
> > currently writing to the other socket...  it has to wait for the write to
> > complete because write(2) has to finish sending the entire 1048576 bytes.
> 
> Actually, write will return when the data has been copied into the
> local transmit buffers, not when it has actually been sent.  It's
> only when you run out of local transmit buffers that the write blocks.

Yes, that should be clear, I made it clear that this is precisely what
allows non-blocking sockets to be useful in this scenario.

> And well it should: something needs to tell the server process to
> quit making calls which the kernel is unable to satisfy.  Halting
> the server process based on resource unavailability does this.

So does returning EWOULDBLOCK to the server process, allowing the server
to react to this by going on to service someone else.

> > So a clever software author does not do this.  He has 1048576 bytes of
> > (different, even) data that he wants to write "simultaneously" to two
> > sockets.  He wants to do the equivalent of Sun's
> > 
> > aiowrite(fd1, buf1, 1048576, SEEK_CUR, 0, NULL);
> > aiowrite(fd2, buf2, 1048576, SEEK_CUR, 0, NULL);
> 
> Yes.  This is *exactly* what he wants to do.
> 
> > Well how the hell do you do THAT if you are busy blocked in a write call?
> 
> He uses a native aiowrite().

Which doesn't exist in a portable fashion.  ANYWHERE.

> Or he wants to call a write from a thread dedicated to that client,
> which may block the thread, but not the process, and therefore not
> other writes.

Which is fine IF you have a threads implementation.  Which is, again, not
a given, and therefore, not portable.

> The underlying implementation may use non-blocking I/O, or it may use
> an OS implementation of aiowrite (like Sun's SunOS 4.3 LWP user space
> threads library provided).  It doesn't matter.  That's the point of
> using threads.

Yes, well, the point of using threads is currently that you're not really 
assured of being portable.

I do not disagree that in an ideal world, threads are a good way to deal
with this.

> > Well, you use non-blocking I/O...  and you take advantage of the fact that
> > the OS is capable of buffering some data on your behalf.
> > 
> > Let's say you have "buf1" and "buf2" to write to "fd1" and "fd2", and "len1"
> > and "len2" for the size of the corresponding buf's.
> > 
> > You write code to do the following:
> > 
> > rval = write(fd1, buf1, len1)		# Wrote 2K of data
> > len1 -= rval;				# 1046528 bytes remain
> > buf1 += rval;				# Move forward 2K in buffer
> [ ... ]
> > You can trivially do this with a moderately complex select() mechanism,
> > so that the outbound buffers for both sockets are kept filled.
> 
> 
> This is exactly the finite state automaton I was talking about
> having to move into user space code in order to use the interface.
> 
> It makes things more complex for the user space programmer.

So?  Making things more complex is a small tradeoff if it makes it POSSIBLE
to do something in the first place.

Tell me, how else do you do this on a system that does NOT support threads?

You can select() on writability and send one byte at a time on a blocking
socket until select() reports no further writability.  Poor solution.

> > A little hard to do without nonblocking sockets.  Very useful.  I don't
> > think that this is a "stupid idea" at all.
> 
> Maybe not compared to being unable to do it at all... but BSD is not
> limited this way.  We have threads.

_FREE_BSD is not limited this way.  _FREE_BSD has threads.  The local
4.3BSD Tahoe system (it _is_ a BSD system, I hope you would agree) offers
nonblocking writes but does not offer threads.  Ultrix does not offer
threads.  I am sure there are other examples...

You are missing the point as usual.  BSD != FreeBSD, and FreeBSD != UNIX in
general.  I am continually amazed that someone like you could make that
error...

In order to write portable code, one must write portable code.

> > > What is the point of a non-blocking write if this is what happens?
> > 
> > I will leave that as your homework for tonite.
> 
> Answer:		for writes in a multiple client server.

Ahhhh.  You got it.

> Extra credit:	the failure case that originated this discussion was
> 		concerned with a client using read.

That is not very relevant.  The statement which originated _THIS_
discussion was your assertion that "Non-blocking sockets for reliable 
stream protocols like TCP/IP are a stupid idea."

I do not care about Karl's problem...  he may well have a legitimate
problem, and I agreed that it was probably beyond the scope of a usage
discussion given his description.

I do not care about Marc's problem...  that is a separate issue.

I am simply correcting a misconception that you are spreading that
non-blocking sockets are a "stupid idea".

> > Please tell that to FreeBSD's FTP server, which uses a single (blocking)
> > write to perform delivery of data.
> > 
> > Why should an application developer have to know or care what the available
> > buffer space is?  Please tell me where in write(2) and read(2) it says I
> > must worry about this.
> > 
> > It doesn't.
> 
> Exactly my point on a socket read not returning until it completes.

Yes, that's fine.  I agree that there are merits on both sides.  The read()
returning what is available is probably more generally useful, and that
seems to be what is implemented.

I am not going to argue with the design and implementation of the Berkeley
networking code, since it is widely considered to be the standard model
for networking.  Most other folks have not found this to be a critical
design flaw, and neither do I.  I can see several cases where a blocking
read() call would be a substantial nuisance, and so I think that the
behaviour as it exists makes a fair amount of sense.
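
For a caller that does want the wait-for-everything behaviour on a
blocking socket, recv(2) with MSG_WAITALL is the explicit way to ask
for it.  A small sketch follows; recv_waitall is a hypothetical wrapper
name, and the call can still return fewer bytes than requested on EOF,
on an error, or if a signal arrives after some data has been copied, so
the count must still be checked:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Ask the kernel to hold the call until len bytes have arrived.
 * Returns the number of bytes actually received (possibly short),
 * or -1 on error with errno set.
 */
ssize_t recv_waitall(int sock, void *buf, size_t len)
{
    ssize_t n;

    do {
        n = recv(sock, buf, len, MSG_WAITALL);
    } while (n == -1 && errno == EINTR);  /* nothing consumed yet; retry */

    return n;
}
```

A plain read() on the same socket is free to return as soon as any
data is available, which is the behaviour described above.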

> > > Indeterminate sockets are evil.  They are on the order of not knowing
> > > your lock state when entering into a function that's going to need
> > > the lock held.
> > 
> > I suppose you have never written a library function.
> > 
> > I suppose you do not subscribe to the philosophy that you should be
> > liberal in what you accept (in this case, assume that you may need to
> > deal with either type of socket).
> 
> If I wrote a library function which operated on a non-user-opaque
> object like a socket set up by the user, then it would function for
> all potential valid states in which that object could be at the time
> of the call.  For potential invalid states, I would trap the ones
> which I could identify from subfunction returns, and state that the
> behaviour for other invalid states was "undefined" in the documentation
> which I published with the library (ie: optimise for the success case).

What do you define "potential valid states" to be?

I do not claim to cover all the bases all the time, but I do at least
catch exceptional conditions I was not expecting.  In my case, I would
try to write a socket-handling library function to handle both blocking 
and non-blocking sockets if it was reasonably practical to do so.  If
not, I would cause it to bomb if it detected something odd.

I think you are saying the same thing: that is good.

> More likely, I would encapsulate the object using an opaque data
> type, and I would expect the users who wish to consume my interface
> to obtain an object of that type, operate on the object with my
> functions, and release the object when done.  In other words, I
> would employ standard data encapsulation techniques.

Nifty.  That's even possible in many cases if you are designing from 
scratch.  Otherwise, it is a real pain in the butt.

> > I wonder if anyone has ever rewritten one of your programs, and made
> > a fundamental change that silently broke one of your programs because
> > an underlying concept was changed.
> 
> Unlikely.  I document my assumptions.

So what?  If I, as the engineer who replaces you five years down the road,
decide that your program needs to use non-blocking writes, and I change
the program to do them, and I miss one place where you failed to check
a return value, your "documented assumptions" are worth diddly squat.
Code your assumptions when they are this trivial to check.

> > Any software author who writes code and does not perform reasonable
> > sanity checks on the return value, particularly for something as important
> > as the read and write system calls, is hanging a big sign around their
> > neck saying "Kick Me I Code Worth Shit".
> 
> On the other hand, "do not test for an error condition which you can
> not handle".

One can handle ANY error condition by bringing it to the attention of
a higher authority.

My UNIX kernel panics when it hits a condition that it does not know how
to handle.  It does not foolishly take your advice and "do not test for
an error condition which you can not handle".  To do so would risk great
havoc.  You ALWAYS test for error conditions, PARTICULARLY the ones which
you can not handle - because they are the really scary ones.

> If as part of my rundown in a program, I go to close a file, and the
> close fails, what should I do about it?  Not exit?  Give me a break...

No, but if a close() fails, and you had a reasonable expectation for it
to succeed, printing a warning is not unreasonable.  According to SunOS,
there are two reasons this could happen:  EBADF and EINTR.  If you are
closing an inactive descriptor, it is clearly an error in the code, and
I WOULD CERTAINLY WANT TO KNOW.  If it is due to a signal, it is unclear
what to do, but it is certainly not a "bad" idea to at least be aware
that such a thing can (and has) happened!
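
A sketch of that kind of check, with close_checked as a hypothetical
helper name.  It only reports the failure and preserves errno; it
deliberately does not retry after EINTR, because the state of the
descriptor after an interrupted close() is not something portable code
can rely on:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Close fd and complain on stderr if that fails.
 * 'what' is a label for the message.  Returns 0 on success,
 * -1 on failure with errno left as close() set it.
 */
int close_checked(int fd, const char *what)
{
    if (close(fd) == 0)
        return 0;

    int saved = errno;  /* fprintf may clobber errno */
    fprintf(stderr, "warning: close(%s) failed: %s\n",
            what, strerror(saved));
    errno = saved;
    return -1;
}
```

An EBADF here usually points at a bookkeeping bug such as a double
close, which is the kind of thing one would certainly want to know
about.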

> > > It bothers me too... I am used to formatting my IPC data streams.  I
> > > either use fixed length data units so that the receiver can post a
> > > fixed size read, or I use a fixed length data unit, and guarantee write
> > > ordering by maintaining state. I do this in order to send a fixed
> > > length header to indicate that I'm writing a variable length packet,
> > > so the receiver can then issue a blocking read for the right size.
> > 
> > I have never seen that work as expected with a large data size.
> 
> I have never seen *any* IPC transport work (reliably) with large data
> sizes... depending on your definition of large.  To deal with this,
> you can only encapsulate the transport and handle them, or don't use
> large data sizes in the first place.

Okay, here we are in complete agreement.  One _always_ needs to be aware
of this, then.

... JG

------------------------------

From: Joe Greco <jgreco@brasil.moneng.mei.com>
Date: Fri, 15 Nov 1996 17:06:28 -0600 (CST)
Subject: Re: Sockets question...

> > Ummm... and the problem is...?
> > 
> > As far as I am aware, byte oriented data can be written to unaligned
> > addresses on any UNIX architecture that I have seen.
> 
> It's not efficient.  It may take multiple bus cycles each time you
> break for more data making it not-as-fast-as-it-could-be.  It is ugly
> and inelegant.  It offends our aesthetics.

Tough doodles.  It is a fact of life.  However, as it is the unusual case,
it is probably not a real big performance deal.

> > xread is explicitly called with what is clearly a byte oriented buffer.
> 
> You could make the same arguments about bcopy, but we've optimized
> it for alignment boundaries anyway.

Yeah... which has what to do with what?

> > If you are possibly worried about something such as the atomicity of
> > reads (potentially valid in a threaded environment, or one using shared
> > memory), I agree that there may be some concern.  Since it is not clear
> > to _me_ that such atomicity of access would be valid under the same
> > circumstances even with read(), I would probably code around the
> > situation anyways.
> > 
> > Is there some other problem that I am missing?  I've done this sort of
> > things for several years now...
> 
> Non-shared memory, but using a mmap'ed region as the destination buffer.
> In this case, I want to validate the target address range once and
> copy direct out of the frag buffers into the user buffer.

And you can't, because you may have to perform multiple read() calls,
is that your objection?  I guess I don't really see how that is a
major earth shaking crisis.

Well, fine, go fix it.  While you are at it, break FreeBSD's BSD
compatibility.

> I can make similar arguments about write's of a mmap'ed region.
> 
> I believe that an FTP server sending files to a client would qualify,
> after the initial descriptor header is sent, and the only thing left
> to send is the file data.

Yes.  So use a blocking socket, and arrange not to be disturbed by signals,
and I think you are probably in pretty good shape.

> Further, if the kernel detected this happening, it could asynchronously
> complete, and delay the unmapping until the transmit was complete,
> turning almost the entire transaction around in kernel space.
> 
> The same thing goes for whole file downloads (ie: .EXE, .DLL, etc.)
> for DOS clients of a UNIX server.

Point being?

... JG

------------------------------

End of hackers-digest V1 #1646
******************************