Date: Thu, 2 Nov 2000 14:40:39 -0500
From: Andreas Schweitzer <andy@physast.uga.edu>
To: Alfred Perlstein <bright@wintelcom.net>
Cc: freebsd-net@FreeBSD.ORG
Subject: Re: recv/recvfrom and select are inconsistent on sockets - it hangs
Message-ID: <20001102144039.B27160@bender.physast.uga.edu>
In-Reply-To: <20001102112121.U20567@fw.wintelcom.net>; from Alfred Perlstein on Thu, Nov 02, 2000 at 11:21:22AM -0800
References: <20001102141053.A27160@bender.physast.uga.edu> <20001102112121.U20567@fw.wintelcom.net>

> > It will sit forever in recv/recvfrom, although a previous select
> > indicated the presence of data! Here is the source code from the
> > MPICH library code (the sock_msg_avail_on_fd routine):
> >
> >     SYSCALL_P4(nfds, select(p4_global->max_connections, &read_fds, 0, 0, &tv));
> >
> >     if (nfds == -1)
> >     {
> >         p4_dprintfl(20,"sock_msg_avail_on_fd selected on %d\n", fd);
> >         p4_error("sock_msg_avail_on_fd select", nfds);
> >     }
> >     if (nfds)    /* true even for eof */
> >     {
> >         /* see if data is on the socket or merely an eof condition */
> >         /* this should not loop long because the select succeeded */
> >         while ((rc = recv(fd, tempbuf, 1, MSG_PEEK)) == -1)
> >             ;
>
> This is terrible!

Agreed - it's from the MPICH library, not our code
(/usr/ports/net/mpich/work/mpich-1.2.1/mpid/ch_p4/p4/lib/p4_sock_sr.c)

> >         if (rc == 0)    /* if eof */
> >         {
> >  .
>
> First off, what was the hack you used to fix this?

Now that part looks like this:

    SYSCALL_P4(nfds, select(p4_global->max_connections, &read_fds,
                            (fd_set *) 0, (fd_set *) 0, &tv));

    if (nfds == -1)
    {
        p4_dprintfl(20,"sock_msg_avail_on_fd selected on %d\n", fd);
        p4_error("sock_msg_avail_on_fd select", nfds);
    }
    /* begin Halloween Hack */
    if((rc = recv(fd, tempbuf, 1, MSG_PEEK)) == -1)
        return(0);
    /* end Halloween Hack */
    if (nfds)    /* true even for eof */
    {
        /* see if data is on the socket or merely an eof condition */
        /* this should not loop long because the select succeeded */
        while ((rc = recv(fd, tempbuf, 1, MSG_PEEK)) == -1)
            ;

It may be just as bad. But it works, because this routine is itself
called in a loop.

> What other problems?

Problems where a program hangs when reading from sockets.

> What is the actual errno you see come back from recv?

We did not check this yet, I'll try that next.

> It's possible that you've corrupted some internal pointers such that
> recv is returning EBADF/ENOTSOCK/EFAULT which would cause an infinite
> loop.

Possible. But it's all rather deep in the guts of MPICH and how
it talks to the OS.

> Are you blocking in recv? or looping on that call?

As far as I understand the MPI routine, it does not do much more than
that snippet, and another routine loops around it.

A general comment: it may very well be that the MPI code is
"not very clean". But we were thinking that some internals in the OS
may not be the way they should be, and that it may even be fixable
by simply tuning some parameters - I just have no idea where to look.

Thanks!
Andreas

-- 
Department of Physics & Astronomy and Center for Simulational Physics
University of Georgia                 Phone ++1 (706) 542 5043
Athens, GA 30602-2451                 Fax   ++1 (706) 542 2492
USA                                   http://dilbert.physast.uga.edu/~andy/
NEW ! WWW page for phoenix : http://phoenix.physast.uga.edu


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-net" in the body of the message