Date:       Fri, 08 Mar 2002 10:52:09 -0800
From:       Terry Lambert <tlambert2@mindspring.com>
To:         Nate Williams <nate@yogotech.com>
Cc:         Julian Elischer <julian@elischer.org>, Poul-Henning Kamp <phk@critter.freebsd.dk>, arch@FreeBSD.ORG
Subject:    Re: Contemplating THIS change to signals. (fwd)
Message-ID: <3C890859.4FB4F9D@mindspring.com>
References: <15496.23508.148366.980354@caddis.yogotech.com> <Pine.BSF.4.21.0203080017330.46841-100000@InterJet.elischer.org> <15496.58430.16748.970354@caddis.yogotech.com>
Nate Williams wrote:
> > You'd be surprised then because once the send() is done, the network IO
> > will happen independently of the process.
>
> I'm more thinking of send.  Once the send() system call has queued the
> data for sending, it's been 'sent' (ie; the stack has it, and will
> 'DTRT' with it).

The amount it queues onto the sockbuf is limited to the available
space.  Thus a send of a very large amount of data means that only
part of it gets queued for sending.

This is not a restartable situation, unless the restart can pick up
where it left off, since sending a partial load of data can modify
peer state as well, and that state is not under the control of the
user process on your end.

So just returning EINTR for a send (or sendfile, etc.) means that the
peer state is unknown.  At that point, you must abandon the
connection.  While HTTP is tolerant of connection abandonment (and, if
both the client and the server support ranges, can even recover
automatically), things like FTP servers are not (an abandoned
connection will not result in an automatic "reget" or "reput" for
almost any FTP client).

> > this is no different.
>
> Except for read() or recvfrom() system calls, and potentially things
> like 'sendfile()'.  Also, write() may behave differently (since write
> involves disk writing, not network writing).

Yes.  Sendfile is very sensitive, since it is a loop that fills the
socket buffer up to its limit (as well as the send window), and
interrupting this loop without saving the current state damages it.

Actually, this sort of begs that the sendfile interface be modified to
take a context structure, which is updated, so that it can be resumed
when interrupted.  The context at the time of interrupt would need to
reflect the reality of the data that has been sent.

For read/recvfrom, this also kind of begs for a "recvfile", since
there's no way you can modify them without futzing with the POSIX-ly
correctness of the interface.
> > from the time you do the ^Z to the time the syscall thinks of
> > returning is how long?  If you say 3 seconds, then all that is
> > different is that in my case the data has been taken off the queue,
> > but previously it would have still been on the queue; but since the
> > process is stopped, who can tell?
>
> A lot can happen in 3 seconds. :)

Or not happen.  The dichotomy between a gigabit link on a server and a
28k link on a dialup client damages a lot of the end-to-end assumptions
that interrupting with EINTR tries to make, since it ignores the idea
of pool retention that is out of the control of the sender, once the
send is initiated.

For local disk, the problem is less (or at least can be made less, if
you want to hack up uiomove and the write path), because you can
guarantee the relative atomicity of the operations.  If they are
initiated in block size increments on block boundaries, you can
actually make a 100% guarantee (some mods to the current code are
required, but they are pretty trivial).  You won't avoid the
page-in-before-write-out in all cases, but you can avoid the
partial-write-complete-and-interrupted-leaving-indeterminate-state
case.

> > In fact if the data was already present then sleep() would have
> > never been called, so the blocking would (even now) happen at the
> > user boundary.  All I'm doing is making it consistent.
>
> Agreed.

Actually, this isn't true.  The wait for the window drain on a socket
write *does* result in a sleep.  So does the wait for a subsequent
page-in for a write spanning two pages without a cluster adjacency on
disk.

> Sorry, I meant 'kernel context' above.  My bad.  I'll repeat.
>
> I'm still not getting a warm fuzzy that allowing the kernel context to
> complete and then block at the userland boundary is a good idea.  I'm
> not saying it's a bad idea, but I'm almost positive there are gremlins
> hiding in the details here. :)

What are very large gremlins called... "goblins"?
If they're there, then they're big.

Of course, the only way to find out is to stop hypothesizing, and go
look...

-- Terry