From owner-freebsd-questions  Wed Feb 27 13: 8:54 2002
Delivered-To: freebsd-questions@freebsd.org
Received: from segfault.monkeys.com (246.dsl6660157.rstatic.surewest.net [66.60.157.246])
	by hub.freebsd.org (Postfix) with ESMTP id 1F38037B420
	for <freebsd-questions@freebsd.org>; Wed, 27 Feb 2002 13:08:42 -0800 (PST)
Received: from monkeys.com (localhost [127.0.0.1])
	by segfault.monkeys.com (Postfix) with ESMTP id 28338660B
	for <freebsd-questions@freebsd.org>; Wed, 27 Feb 2002 13:08:36 -0800 (PST)
To: freebsd-questions@freebsd.org
Subject: Annoying/non-intuitive/undocumented poll(2) behavior: Bug or feature?
From: "Ronald F. Guilmette" <rfg@monkeys.com>
Date: Wed, 27 Feb 2002 13:08:36 -0800
Message-ID: <58877.1014844116@monkeys.com>
Sender: owner-freebsd-questions@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-questions.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-questions>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-questions>
X-Loop: FreeBSD.ORG


Riddle:  When is a socket error not a socket error?

Answer:  When you are using the poll(2) syscall in FreeBSD (4.3) to
         check for the completion status of an outbound connect(2).

I'm just about to file a formal problem on this, but I thought that I
would post it here first, in case anyone wants to talk me out of it,
or in case anyone wants to take issue with my analysis.

Consider the simple example program below.  It allocates a socket, sets
the O_NONBLOCK flag on the socket, and then tries to use the socket to
asynchronously connect to port 32767 on one of my servers.  (The server
in question is _not_ running any sort of listener on that particular port.)

Eventually, after a suitable waiting period and a suitable number of
retries, the attempt to connect will fail, and the call to poll(2)
will then return.  At that point, the program checks to see if the
POLLERR bit is set in the returned `revents' field of the pollfd
structure.

An error has indeed occurred... the connect DID NOT complete successfully...
so the POLLERR bit should be set, right?

Well, on FreeBSD 4.3 at least, and much to my consternation, that error
flag _DOESN'T_ get set.

This caused me quite some grief and confusion.  In my opinion, it is
clearly incorrect kernel behavior.

Just to make sure that I wasn't doing something wrong in my coding, I
condensed my actual application down to the simple test program you see
attached below.  Then I compiled and ran it and the darn thing still
prints:

	Error in getsockopt: ...

which means that the connect error was NOT detected by checking the
POLLERR bit in the returned pollfd structure, but that it could be
detected via a subsequent call to getsockopt(2).

Just to make sure that I hadn't made any programming error of my own,
I lifted up this same test program and carried it over to a Linux system.
I compiled and ran it there, just to try to see if the failure to set
the POLLERR bit was in some sense ``standard'' (but undocumented) behavior
in these circumstances, and I found that this same program, when executed
on Red Hat 7.2, does in fact print the expected message:

	poll(2) indicates connect error

Now, does anybody want to talk me out of filing a formal FreeBSD problem
report on this apparent misbehavior of poll(2)?

It seems to me like poll(2) on FreeBSD is providing clearly incorrect
behavior.  I mean, hey!  The documentation says that the POLLERR bit
will be set in case of an error.  That clearly isn't happening.  So
at the very least, the documentation is wrong.  Right?


==========================================================================
/* poll(2) error test #1 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

static struct protoent *tcp_proto;

static void
fatal (register char const *const fmt, register char const *const arg)
{
  fprintf (stderr, fmt, arg);
  putc ('\n', stderr);
  exit (1);
}

static void
poll_for_completion (register int const fd)
{
  auto struct pollfd pfd;
  auto int err;
  /* getsockopt(2) reads this as the size of `err', so it must be set first.  */
  auto socklen_t err_size = sizeof err;

  pfd.fd = fd;
  pfd.events = POLLOUT;
  pfd.revents = 0;

  if (poll (&pfd, 1, -1) == -1)
    fatal ("Error in poll: %s", strerror (errno));

  if (pfd.revents & POLLERR)
    fatal ("poll(2) indicates connect error", NULL);

  if (getsockopt (fd, SOL_SOCKET, SO_ERROR, &err, &err_size) == -1)
    fatal ("Error in getsockopt: %s", strerror (errno));
  
  if (err != 0)
    fatal ("getsockopt(2) indicates connect error: %s", strerror (err));

  /* Success is not fatal: report it and release the socket.  */
  printf ("Connect successful\n");
  close (fd);
}

static void
start_connecting (struct in_addr addr, unsigned short port)
{
  auto struct sockaddr_in sin;
  register int fd;

  if ((fd = socket (PF_INET, SOCK_STREAM, tcp_proto->p_proto)) == -1)
    fatal ("Error creating socket: %s", strerror (errno));

  if (fcntl (fd, F_SETFL, O_NONBLOCK) == -1)
    fatal ("Error setting O_NONBLOCK for socket: %s", strerror (errno));

  memset (&sin, 0, sizeof sin);
  sin.sin_family = AF_INET;
  sin.sin_addr = addr;
  sin.sin_port = htons (port);

  if (connect (fd, (struct sockaddr *) &sin, sizeof sin) == -1)
    {
      if (errno != EINPROGRESS)
        {
	  printf ("Connection failed immediately\n");
          close (fd);
        }
      else
	poll_for_completion (fd);
    }
  else
    {
      printf ("Connection completed immediately\n");
      close (fd);
    }
}

int
main (void)
{
  static char const protocol_name[] = "tcp";
  auto struct in_addr addr;

  if ((tcp_proto = getprotobyname (protocol_name)) == NULL)
    fatal ("Cannot find number for protocol: %s", protocol_name);

  inet_aton ("66.60.157.246", &addr);
  start_connecting (addr, 32767);

  return 0;
}
