Date:      Thu, 2 Nov 2000 14:40:39 -0500
From:      Andreas Schweitzer <andy@physast.uga.edu>
To:        Alfred Perlstein <bright@wintelcom.net>
Cc:        freebsd-net@FreeBSD.ORG
Subject:   Re: recv/recvfrom and select are inconsistent on sockets - it hangs
Message-ID:  <20001102144039.B27160@bender.physast.uga.edu>
In-Reply-To: <20001102112121.U20567@fw.wintelcom.net>; from Alfred Perlstein on Thu, Nov 02, 2000 at 11:21:22AM -0800
References:  <20001102141053.A27160@bender.physast.uga.edu> <20001102112121.U20567@fw.wintelcom.net>

> > It will sit forever in recv/recvfrom, although a previous select
> > indicated the presence of data ! Here is the source code from the
> > MPICH library code (the sock_msg_avail_on_fd routine):
> > 
> >     SYSCALL_P4(nfds, select(p4_global->max_connections, &read_fds, 0, 0, &tv));
> >      
> >     if (nfds == -1)
> >     {        
> >         p4_dprintfl(20,"sock_msg_avail_on_fd selected on %d\n", fd);
> >         p4_error("sock_msg_avail_on_fd select", nfds);
> >     }
> >     if (nfds)                   /* true even for eof */
> >     {
> >         /* see if data is on the socket or merely an eof condition */
> >         /* this should not loop long because the select succeeded */
> >         while ((rc = recv(fd, tempbuf, 1, MSG_PEEK)) == -1)
> >             ;
> 
> This is terrible!

Agreed - it's from the MPICH library, not our code
(/usr/ports/net/mpich/work/mpich-1.2.1/mpid/ch_p4/p4/lib/p4_sock_sr.c)

> > 
> >         if (rc == 0)    /* if eof */
> >         {
> > .
> First off, what was the hack you used to fix this?

Now that part looks like this:
    SYSCALL_P4(nfds, select(p4_global->max_connections, &read_fds, (fd_set *) 0, (fd_set *) 0, &tv));
    
    if (nfds == -1)
    {
        p4_dprintfl(20,"sock_msg_avail_on_fd selected on %d\n", fd);
        p4_error("sock_msg_avail_on_fd select", nfds);
    }

/* begin Halloween Hack */

    if((rc = recv(fd, tempbuf, 1, MSG_PEEK)) == -1) return(0);
                
/* end Halloween Hack */
    
    if (nfds)                   /* true even for eof */
    {
        /* see if data is on the socket or merely an eof condition */
        /* this should not loop long because the select succeeded */
        while ((rc = recv(fd, tempbuf, 1, MSG_PEEK)) == -1) ;

It may be just as bad, but it works, because this routine is itself
called in a loop.

> What other problems?

Programs hanging when reading from sockets.

> What is the actual errno you see come back from recv?

We have not checked this yet; I'll try that next.
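
A minimal sketch of how that check could be done, in place of the bare
"while (... == -1) ;" spin. The names peek_socket and peek_result are made
up for illustration and are not part of MPICH; MSG_DONTWAIT is an
assumption that the platform provides it, so the peek can never block.
Only EINTR is retried; anything else is reported with its errno.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical result codes, not taken from MPICH. */
enum peek_result { PEEK_DATA, PEEK_EOF, PEEK_NOT_READY, PEEK_ERROR };

/* Peek one byte without consuming it and without blocking. */
static enum peek_result peek_socket(int fd)
{
    char tempbuf[1];
    ssize_t rc;

    for (;;) {
        rc = recv(fd, tempbuf, 1, MSG_PEEK | MSG_DONTWAIT);
        if (rc > 0)
            return PEEK_DATA;           /* at least one byte is queued */
        if (rc == 0)
            return PEEK_EOF;            /* peer closed the connection */
        if (errno == EINTR)
            continue;                   /* interrupted by a signal: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return PEEK_NOT_READY;      /* nothing to read right now */

        /* EBADF, ENOTSOCK, EFAULT, ...: retrying cannot help, so report it */
        fprintf(stderr, "recv(fd=%d) failed: errno=%d (%s)\n",
                fd, errno, strerror(errno));
        return PEEK_ERROR;
    }
}
```

With something like this in place, an EBADF/ENOTSOCK/EFAULT would show up
on stderr once instead of turning into a silent busy loop.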

> It's possible that you've corrupted some internal pointers such that
> recv is returning EBADF/ENOTSOCK/EFAULT which would cause an infinite
> loop.

Possible. But it's all rather deep in the guts of MPICH and how it talks
to the OS.

> Are you blocking in recv? or looping on that call?

As far as I understand the MPI routine, it does not do much more than
that snippet, and another routine loops around it.

A general comment: it may very well be that the MPI code is
"not very clean". But we were thinking that some internals in the
OS may not be the way they should be, and that it may even be
fixable by simply tuning some parameters. I just have no idea where to
look.

Thanks !

Andreas

-- 
Department of Physics & Astronomy  and  Center for Simulational Physics
University of Georgia                          Phone ++1 (706) 542 5043
Athens, GA 30602-2451                            Fax ++1 (706) 542 2492
USA                               http://dilbert.physast.uga.edu/~andy/

NEW ! WWW page for phoenix :

                   http://phoenix.physast.uga.edu


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-net" in the body of the message
