Date: Thu, 2 Nov 2000 12:40:55 -0800
From: Alfred Perlstein <bright@wintelcom.net>
To: Andreas Schweitzer <andy@physast.uga.edu>
Cc: freebsd-net@FreeBSD.ORG
Subject: Re: recv/recvfrom and select are inconsistent on sockets - it hangs
Message-ID: <20001102124054.V20567@fw.wintelcom.net>
In-Reply-To: <20001102144039.B27160@bender.physast.uga.edu>; from andy@physast.uga.edu on Thu, Nov 02, 2000 at 02:40:39PM -0500
References: <20001102141053.A27160@bender.physast.uga.edu> <20001102112121.U20567@fw.wintelcom.net> <20001102144039.B27160@bender.physast.uga.edu>

* Andreas Schweitzer <andy@physast.uga.edu> [001102 11:40] wrote:
> > > It will sit forever in recv/recvfrom, although a previous select
> > > indicated the presence of data ! Here is the source code from the
> > > MPICH library code (the sock_msg_avail_on_fd routine):
> > >
> > > while ((rc = recv(fd, tempbuf, 1, MSG_PEEK)) == -1)
> > > ;
> >
> > This is terrible!
>
> Agreed - it's from the MPICH library, not our code
> (/usr/ports/net/mpich/work/mpich-1.2.1/mpid/ch_p4/p4/lib/p4_sock_sr.c)
Someone needs to "lay some smack" on these guys.
> > First off, what was the hack you used to fix this?
>
> /* begin Halloween Hack */
>
> if((rc = recv(fd, tempbuf, 1, MSG_PEEK)) == -1) return(0);
>
> /* end Halloween Hack */
OK, it's quite possible that what I said is true: something may have
become corrupted, and the loop is spinning on the resulting error.
>
> if (nfds) /* true even for eof */
> {
> /* see if data is on the socket or merely an eof condition */
> /* this should not loop long because the select succeeded */
> while ((rc = recv(fd, tempbuf, 1, MSG_PEEK)) == -1) ;
>
> It may be just as bad. And it works, because this routine is looped over
> as well.
>
> > What other problems?
>
> Problems that a program hangs when reading from sockets.
OK, I thought you meant there were unrelated problems...
> > What is the actual errno you see come back from recv?
>
> We did not check this yet, I'll try that next.
Please do, it would help a lot.
> > It's possible that you've corrupted some internal pointers such that
> > recv is returning EBADF/ENOTSOCK/EFAULT, which would cause an infinite
> > loop.
>
> Possible. But it's all rather deep in the guts of MPICH and how it talks
> to the OS.
It's possibly an MPICH bug (with the code I've seen so far I don't
doubt it); however, corrupting a library's state is pretty easy, so
that's also very possible.
> > Are you blocking in recv? or looping on that call?
>
> As far as I understand the MPI routine, it does not do much more than
> that snippet, and another routine loops around it.
>
> A general comment : it may very well be that the MPI code is
> "not very clean". But we were thinking that some internals in the
> OS may not be the way they should be. And that it may even be
> fixable by simply tuning some parameters - I just have no idea where to
> look.
I really can't say without errno, but it looks like a bug in
your code or MPICH, not FreeBSD. Get me the errno that's set
and we'll have a definite answer.
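
For illustration, here is a minimal sketch of how the peek could report
errno instead of spinning on every -1. It is only a sketch under stated
assumptions: peek_socket is a hypothetical name, not an MPICH routine,
and it assumes the only error worth retrying is EINTR. Anything else
(EBADF, ENOTSOCK, EFAULT, ...) is printed and handed back to the caller.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical stand-in for the busy loop quoted above.
 * Returns 1 if at least one byte is pending, 0 on EOF, and -1 on a
 * real error, with errno left set to the value recv() reported.
 */
static int peek_socket(int fd)
{
    char tempbuf[1];
    ssize_t rc;
    int saved;

    for (;;) {
        rc = recv(fd, tempbuf, 1, MSG_PEEK);
        if (rc >= 0)
            return rc > 0 ? 1 : 0;

        /* A signal interrupted the call; retrying is safe. */
        if (errno == EINTR)
            continue;

        /* Any other error will not fix itself: report it and stop. */
        saved = errno;
        fprintf(stderr, "recv(fd=%d, MSG_PEEK): %s (errno %d)\n",
                fd, strerror(saved), saved);
        errno = saved;
        return -1;
    }
}
```

On a non-blocking socket this returns -1 with errno set to EAGAIN when
nothing is queued, so the caller has to decide whether to go back to
select or give up.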
--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."
