From: "Ronald F. Guilmette"
To: freebsd-questions@freebsd.org
Subject: Annoying/non-intutive/undocumented poll(2) behavior: Bug or feature?
Date: Wed, 27 Feb 2002 13:08:36 -0800
Message-ID: <58877.1014844116@monkeys.com>

Riddle: When is a socket error not a socket error?

Answer: When you are using the poll(2) syscall in FreeBSD (4.3) to check the completion status of an outbound connect(2).

I'm just about to file a formal problem report on this, but I thought I would post it here first, in case anyone wants to talk me out of it or take issue with my analysis.

Consider the simple example program below. It allocates a socket, sets the O_NONBLOCK flag on it, and then uses it to connect asynchronously to port 32767 on one of my servers. (The server in question is _not_ running any sort of listener on that particular port.) Eventually, after a suitable waiting period and a suitable number of retries, the connection attempt fails and the call to poll(2) returns. At that point, the program checks whether the POLLERR bit is set in the `revents' field of the returned pollfd structure. An error has indeed occurred... the connect DID NOT complete successfully... so the POLLERR bit should be set, right? Well, on FreeBSD 4.3 at least, and much to my consternation, that error flag _DOESN'T_ get set.
This caused me quite some grief and confusion. In my opinion, it is clearly incorrect kernel behavior.

Just to make sure that I wasn't doing something wrong in my own code, I condensed my actual application down to the simple test program attached below. I compiled and ran it, and the darn thing still prints:

    Error in getsockopt: ...

which means that the connect error was NOT detected by checking the POLLERR bit in the returned pollfd structure, but that it could be detected via a subsequent call to getsockopt(2).

Just to be sure that I hadn't made a programming error of my own, I carried this same test program over to a Linux system. I compiled and ran it there to see whether the failure to set the POLLERR bit was in some sense ``standard'' (but undocumented) behavior in these circumstances, and I found that the same program, when executed on Red Hat 7.2, does in fact print the expected message:

    poll(2) indicates connect error

Now, does anybody want to talk me out of filing a formal FreeBSD problem report on this apparent misbehavior of poll(2)? It seems to me that poll(2) on FreeBSD is clearly behaving incorrectly. I mean, hey! The documentation says that the POLLERR bit will be set in case of an error. That clearly isn't happening. So at the very least, the documentation is wrong. Right?
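Since the real question is which bits a given kernel puts in `revents' when a connect fails, it may help to print all of them rather than test POLLERR alone. The following is a minimal diagnostic sketch; `describe_revents' is a hypothetical helper (not part of any system API), and it only decodes the six POSIX flags POLLIN, POLLPRI, POLLOUT, POLLERR, POLLHUP and POLLNVAL.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical diagnostic helper: writes the names of the POSIX poll(2)
   flags set in revents into buf as "NAME|NAME|...", or "0" if none are
   set.  Output that does not fit in size bytes is silently truncated. */
static const char *
describe_revents (short revents, char *buf, size_t size)
{
  static const struct { short bit; const char *name; } flags[] = {
    { POLLIN, "POLLIN" },   { POLLPRI, "POLLPRI" }, { POLLOUT, "POLLOUT" },
    { POLLERR, "POLLERR" }, { POLLHUP, "POLLHUP" }, { POLLNVAL, "POLLNVAL" },
  };
  size_t i;

  assert (buf != NULL && size > 0);
  buf[0] = '\0';
  for (i = 0; i < sizeof flags / sizeof flags[0]; i++)
    {
      if (!(revents & flags[i].bit))
        continue;
      if (buf[0] != '\0')
        strncat (buf, "|", size - strlen (buf) - 1);
      strncat (buf, flags[i].name, size - strlen (buf) - 1);
    }
  if (buf[0] == '\0')
    strncat (buf, "0", size - 1);
  return buf;
}
```

Passing `pfd.revents' to it right after poll(2) returns in the test program would show, on each system, whether the failed connect is reported as POLLOUT, POLLHUP, POLLERR or some combination.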
==========================================================================
/* poll(2) error test #1 */

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <poll.h>
#include <fcntl.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static struct protoent *tcp_proto;

/* Print a formatted message to stderr and exit with status 1. */
static void
fatal (register char const *const fmt, register char const *const arg)
{
  fprintf (stderr, fmt, arg);
  putc ('\n', stderr);
  exit (1);
}

/* Wait for the non-blocking connect on fd to finish, then report how
   it ended: first via POLLERR, then via the SO_ERROR socket option. */
static void
poll_for_completion (register int const fd)
{
  auto struct pollfd pfd;
  auto int err;
  auto socklen_t err_size = sizeof err;  /* getsockopt() needs the buffer size on input */

  pfd.fd = fd;
  pfd.events = POLLOUT;
  pfd.revents = 0;

  if (poll (&pfd, 1, -1) == -1)
    fatal ("Error in poll: %s", strerror (errno));
  if (pfd.revents & POLLERR)
    fatal ("poll(2) indicates connect error", NULL);
  if (getsockopt (fd, SOL_SOCKET, SO_ERROR, &err, &err_size) == -1)
    fatal ("Error in getsockopt: %s", strerror (errno));
  if (err != 0)
    fatal ("getsockopt(2) indicates connect error: %s", strerror (err));
  fatal ("Connect successful", NULL);
}

/* Create a non-blocking TCP socket and start connecting it to addr:port. */
static void
start_connecting (struct in_addr addr, unsigned short port)
{
  auto struct sockaddr_in sin;
  register int fd;

  if ((fd = socket (PF_INET, SOCK_STREAM, tcp_proto->p_proto)) == -1)
    fatal ("Error creating socket: %s", strerror (errno));
  if (fcntl (fd, F_SETFL, O_NONBLOCK) == -1)
    fatal ("Error setting O_NONBLOCK for socket: %s", strerror (errno));

  memset (&sin, 0, sizeof sin);
  sin.sin_family = AF_INET;
  sin.sin_addr = addr;
  sin.sin_port = htons (port);

  if (connect (fd, (struct sockaddr *) &sin, sizeof sin) == -1)
    {
      if (errno != EINPROGRESS)
        {
          printf ("Connection failed immediately\n");
          close (fd);
        }
      else
        poll_for_completion (fd);
    }
  else
    {
      printf ("Connection completed immediately\n");
      close (fd);
    }
}

int
main (void)
{
  static char const protocol_name[] = "tcp";
  auto struct in_addr addr;

  if ((tcp_proto = getprotobyname (protocol_name)) == NULL)
    fatal ("Cannot find number for protocol: %s", protocol_name);
  inet_aton ("66.60.157.246", &addr);
  start_connecting (addr, 32767);
  return 0;
}
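Whatever the verdict on POLLERR, a completion check that does not depend on it can treat any wakeup from poll(2) as the end of the connect attempt and then read SO_ERROR, which is the conventional place to fetch the result of a non-blocking connect(2). The following is a minimal sketch assuming a POSIX sockets API; the helper names `wait_for_connect' and `connect_and_wait' and the `timeout_ms' parameter are hypothetical additions, not part of the program above.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: wait for a non-blocking connect(2) on fd to finish.
   Returns 0 on success or an errno value describing the failure. */
static int
wait_for_connect (int fd, int timeout_ms)
{
  struct pollfd pfd;
  int err = 0;
  socklen_t err_size = sizeof err;  /* must hold the buffer size on input */
  int n;

  pfd.fd = fd;
  pfd.events = POLLOUT;
  pfd.revents = 0;
  do                                /* on EINTR, retry (restarts the full timeout) */
    n = poll (&pfd, 1, timeout_ms);
  while (n == -1 && errno == EINTR);
  if (n == -1)
    return errno;
  if (n == 0)
    return ETIMEDOUT;
  /* Whatever mix of POLLOUT, POLLERR or POLLHUP came back, the attempt is
     over; SO_ERROR reports (and clears) the actual outcome. */
  if (getsockopt (fd, SOL_SOCKET, SO_ERROR, &err, &err_size) == -1)
    return errno;
  return err;
}

/* Hypothetical helper: start a non-blocking TCP connect to *sin and wait.
   On success stores the connected descriptor in *fd_out and returns 0;
   on failure closes the socket and returns an errno value. */
static int
connect_and_wait (const struct sockaddr_in *sin, int timeout_ms, int *fd_out)
{
  int fd, flags, err;

  assert (sin != NULL && fd_out != NULL);
  if ((fd = socket (PF_INET, SOCK_STREAM, 0)) == -1)
    return errno;
  if ((flags = fcntl (fd, F_GETFL)) == -1
      || fcntl (fd, F_SETFL, flags | O_NONBLOCK) == -1)
    {
      err = errno;
      close (fd);
      return err;
    }
  if (connect (fd, (const struct sockaddr *) sin, sizeof *sin) == 0)
    err = 0;                        /* completed immediately */
  else if (errno != EINPROGRESS)
    err = errno;                    /* failed immediately */
  else
    err = wait_for_connect (fd, timeout_ms);
  if (err != 0)
    {
      close (fd);
      return err;
    }
  *fd_out = fd;
  return 0;
}
```

With the `sin' built in start_connecting, a call such as `connect_and_wait (&sin, 30000, &fd)' would return 0 on success or an errno value such as ECONNREFUSED or ETIMEDOUT that can be handed to strerror(3), without ever consulting the POLLERR bit.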