Date:      Thu, 2 Nov 2000 12:40:55 -0800
From:      Alfred Perlstein <bright@wintelcom.net>
To:        Andreas Schweitzer <andy@physast.uga.edu>
Cc:        freebsd-net@FreeBSD.ORG
Subject:   Re: recv/recvfrom and select are inconsistent on sockets - it hangs
Message-ID:  <20001102124054.V20567@fw.wintelcom.net>
In-Reply-To: <20001102144039.B27160@bender.physast.uga.edu>; from andy@physast.uga.edu on Thu, Nov 02, 2000 at 02:40:39PM -0500
References:  <20001102141053.A27160@bender.physast.uga.edu> <20001102112121.U20567@fw.wintelcom.net> <20001102144039.B27160@bender.physast.uga.edu>

* Andreas Schweitzer <andy@physast.uga.edu> [001102 11:40] wrote:
> > > It will sit forever in recv/recvfrom, although a previous select
> > > indicated the presence of data ! Here is the source code from the
> > > MPICH library code (the sock_msg_avail_on_fd routine):
> > > 
> > >         while ((rc = recv(fd, tempbuf, 1, MSG_PEEK)) == -1)
> > >             ;
> > 
> > This is terrible!
> 
> Agreed - it's from the MPICH library, not our code
> (/usr/ports/net/mpich/work/mpich-1.2.1/mpid/ch_p4/p4/lib/p4_sock_sr.c)

Someone needs to "lay some smack" on these guys.

> > First off, what was the hack you used to fix this?
> 
> /* begin Halloween Hack */
> 
>     if((rc = recv(fd, tempbuf, 1, MSG_PEEK)) == -1) return(0);
>                 
> /* end Halloween Hack */

OK, it's quite possible that what I said is true: something may have
become corrupt, so recv() fails with the same error on every call and
the loop never exits.
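A minimal sketch of that failure mode, assuming the descriptor number has gone bad (closed, or garbage such as -1). recv() then fails identically on every call, so `while (recv(...) == -1) ;` can never terminate. `count_failed_peeks` is a hypothetical helper, not MPICH code. It runs the same peek a bounded number of times and records each errno.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper for illustration: repeat the MPICH-style
 * MSG_PEEK recv() at most max_tries times instead of forever.
 * Returns how many calls failed in a row; errs[] receives the errno
 * of each failure so the caller can see it never changes.
 */
int count_failed_peeks(int fd, int max_tries, int errs[])
{
    char tempbuf[1];
    int failures = 0;

    for (int i = 0; i < max_tries; i++) {
        if (recv(fd, tempbuf, 1, MSG_PEEK) != -1)
            break;                  /* data or EOF: the real loop would exit */
        errs[failures++] = errno;   /* same errno every time on a bad fd */
    }
    return failures;
}
```

Called as `count_failed_peeks(-1, 5, errs)`, every attempt should fail with EBADF. That is exactly the case where the original loop spins.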

>     
>     if (nfds)                   /* true even for eof */
>     {
>         /* see if data is on the socket or merely an eof condition */
>         /* this should not loop long because the select succeeded */
>         while ((rc = recv(fd, tempbuf, 1, MSG_PEEK)) == -1) ;
> 
> It may be just as bad, but it works, because this routine is looped
> over as well.
> 
> > What other problems?
> 
> Problems where a program hangs when reading from sockets.

OK, I thought you meant there were unrelated problems...

> > What is the actual errno you see come back from recv?
> 
> We did not check this yet, I'll try that next.

Please do, it would help a lot.
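A minimal sketch of how that errno could be captured, assuming the peek is done on the descriptor MPICH already holds. `peek_one_byte` is a hypothetical name, not an MPICH function. The key detail is saving errno immediately after the failing recv(), before any other call (including fprintf) can overwrite it.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical diagnostic stand-in for the bare
 *     while ((rc = recv(fd, tempbuf, 1, MSG_PEEK)) == -1) ;
 * Peeks once; on failure it saves errno right away, logs it to
 * stderr, and hands it back through *err (0 on success).
 */
ssize_t peek_one_byte(int fd, int *err)
{
    char tempbuf[1];
    ssize_t rc = recv(fd, tempbuf, 1, MSG_PEEK);

    *err = 0;
    if (rc == -1) {
        *err = errno;   /* save before anything else can clobber it */
        fprintf(stderr, "recv(fd=%d, MSG_PEEK) failed: errno=%d (%s)\n",
                fd, *err, strerror(*err));
    }
    return rc;
}
```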

> > It's possible that you've corrupted some internal pointers such that
> > recv is returning EBADF/ENOTSOCK/EFAULT, which would cause an infinite
> > loop.
> 
> Possible. But it's all rather deep in the guts of MPICH and how it talks
> to the OS.

It's possibly an MPICH bug (with the code I've seen so far I don't
doubt it); however, corrupting a library's state is pretty easy and
also very possible.
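Once the errno is known, it helps to separate errors that a retry can cure from those it cannot. A sketch under an assumption: the grouping below follows the usual POSIX meanings of these errno values and is not something MPICH defines. EINTR (a signal interrupted the call) and EAGAIN/EWOULDBLOCK (a non-blocking socket had nothing yet) are transient. EBADF, ENOTSOCK and EFAULT, the ones mentioned above, will repeat on every call. `peek_error_is_transient` is a hypothetical name.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: is it worth calling recv() again after this errno?
 * EINTR: interrupted by a signal, retry immediately.
 * EAGAIN/EWOULDBLOCK: non-blocking socket is empty; the caller should go
 * back to select() rather than spin on recv().
 * Anything else (e.g. EBADF, ENOTSOCK, EFAULT) will not go away by looping.
 */
bool peek_error_is_transient(int err)
{
    switch (err) {
    case EINTR:
    case EAGAIN:
#if defined(EWOULDBLOCK) && EWOULDBLOCK != EAGAIN
    case EWOULDBLOCK:
#endif
        return true;
    default:
        return false;
    }
}
```

A caller could stop looping as soon as this returns false and report the errno instead.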

> > Are you blocking in recv, or looping on that call?
> 
> As far as I understand the MPI routine, it does not do much more than
> that snippet, and another routine loops around it.
> 
> A general comment: it may very well be that the MPI code is
> "not very clean". But we were thinking that some internals in the
> OS may not be the way they should be. And that it may even be
> fixable by simply tuning some parameters - I just have no idea where to
> look.

I really can't say without the errno, but it looks like a bug in
your code or in MPICH, not in FreeBSD. Get me the errno that's set
and we'll have a definite answer.
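A sketch of what a message-available check could look like, assuming a connected stream socket. `msg_avail` and its result codes are hypothetical, and this is not the MPICH routine. As the MPICH comment says, select() reports a descriptor readable both when data is queued and at end-of-file, so a one-byte peek distinguishes the two: 1 means data, 0 means EOF. A -1 is retried only on EINTR and otherwise returned together with its errno.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical result codes for this sketch. */
enum avail { AVAIL_ERROR = -1, AVAIL_NONE = 0, AVAIL_DATA = 1, AVAIL_EOF = 2 };

/*
 * Non-blocking check (zero select timeout): does fd have data, has it
 * reached EOF, or is it in error? On AVAIL_ERROR, *err holds the errno
 * from select() or recv() so it can be reported.
 */
enum avail msg_avail(int fd, int *err)
{
    fd_set rfds;
    struct timeval tv;
    char tempbuf[1];
    int nfds;
    ssize_t rc;

    *err = 0;
    if (fd < 0 || fd >= FD_SETSIZE) {   /* FD_SET would be undefined here */
        *err = (fd < 0) ? EBADF : EINVAL;
        return AVAIL_ERROR;
    }

    do {
        /* select() may modify the set (and on some systems the timeout),
         * so rebuild both before every attempt. */
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = 0;
        tv.tv_usec = 0;
        nfds = select(fd + 1, &rfds, NULL, NULL, &tv);
    } while (nfds == -1 && errno == EINTR);

    if (nfds == -1) {
        *err = errno;
        return AVAIL_ERROR;
    }
    if (nfds == 0)
        return AVAIL_NONE;              /* nothing readable right now */

    /* Readable means data is queued OR the peer has closed (EOF). */
    do {
        rc = recv(fd, tempbuf, 1, MSG_PEEK);
    } while (rc == -1 && errno == EINTR);   /* retry only on a signal */

    if (rc == -1) {
        *err = errno;
        return AVAIL_ERROR;
    }
    return (rc == 0) ? AVAIL_EOF : AVAIL_DATA;
}
```

The zero timeout keeps the check itself from blocking; the caller decides whether and how long to wait, and on AVAIL_ERROR can print `strerror(err)` for the report.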

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-net" in the body of the message



