Date: Thu, 18 Feb 2010 11:59:40 +0000 (GMT) From: Robert Watson <rwatson@FreeBSD.org> To: Mikolaj Golub <to.my.trociny@gmail.com> Cc: freebsd-hackers@FreeBSD.ORG Subject: Re: unix socket: race on close? Message-ID: <alpine.BSF.2.00.1002181157450.59664@fledge.watson.org> In-Reply-To: <86ocjn3rue.fsf@zhuzha.ua1> References: <86ocjn3rue.fsf@zhuzha.ua1>
next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, 18 Feb 2010, Mikolaj Golub wrote:
> Below is a simple test code with unix sockets: the client does 
> connect()/close() in loop and the server -- accept()/close().
>
> Sometimes close() fails with 'Socket is not connected' error:
Hi Mikolaj:
Thanks for this report, and sorry about not spotting your earlier post to 
freebsd-net.  I've been fairly preoccupied the last month and not keeping up 
with the mailing lists.  Could I ask you to file a PR on this, and forward me 
the PR number so I can claim ownership?  This should prevent it from getting 
lost while I catch up.
In short, your evaluation seems reasonable to me -- have you tried tweaking 
soclose() to ignore ENOTCONN from sodisconnect() to confirm this diagnosis 
fixes all the instances you've been seeing?
Robert N M Watson
Computer Laboratory
University of Cambridge
>
> a.out: parent: close error: 57
>
> or
>
> a.out: child: close error: 57
>
> It looks for me like some race in close(). Looking at uipc_socket.c:soclose():
>
> int
> soclose(struct socket *so)
> {
>        int error = 0;
>
>        KASSERT(!(so->so_state & SS_NOFDREF), ("soclose: SS_NOFDREF on enter"));
>
>        CURVNET_SET(so->so_vnet);
>        funsetown(&so->so_sigio);
>        if (so->so_state & SS_ISCONNECTED) {
>                if ((so->so_state & SS_ISDISCONNECTING) == 0) {
>                        error = sodisconnect(so);
>                        if (error)
>                                goto drop;
>                }
>
> Isn't the problem here? so_state is checked for SS_ISCONNECTED and
> SS_ISDISCONNECTING without locking and then sodisconnect() is called, which
> closes both sockets of the connection. So it looks for me that if the close()
> is called for both ends simultaneously it is possible that sodisconnect() will
> be called for both ends and for one ENOTCONN will be returned. Or may I have
> missed something?
>
> We have been observing periodically ENOTCONN errors on unix socket close in
> our applications, so it is not just curiosity :-) (I posted about our problem
> to freebsd-net@ some time ago but then did not attract any attention
> http://lists.freebsd.org/pipermail/freebsd-net/2009-December/024047.html).
>
> #include <sys/types.h>
> #include <sys/socket.h>
> #include <sys/un.h>
> #include <netinet/in.h>
> #include <arpa/inet.h>
> #include <errno.h>
> #include <fcntl.h>
> #include <stdio.h>
> #include <strings.h>
> #include <string.h>
> #include <unistd.h>
> #include <sys/select.h>
> #include <err.h>
>
> #define UNIXSTR_PATH "/tmp/mytest.socket"
> #define USLEEP  100
>
> int main(int argc, char **argv)
> {
> 	int			listenfd, connfd, pid;
> 	struct sockaddr_un	servaddr;
>
> 	pid = fork();
> 	if (-1 == pid)
> 		errx(1, "fork(): %d", errno);
>
> 	if (0 != pid) { /* parent */
>
> 		if ((listenfd = socket(AF_LOCAL, SOCK_STREAM, 0)) < 0)
> 			errx(1, "parent: socket error: %d", errno);
>
> 		unlink(UNIXSTR_PATH);
> 		bzero(&servaddr, sizeof(servaddr));
> 		servaddr.sun_family = AF_LOCAL;
> 		strcpy(servaddr.sun_path, UNIXSTR_PATH);
>
> 		if (bind(listenfd, (struct sockaddr *) &servaddr, sizeof(servaddr)) < 0)
> 			errx(1, "parent: bind error: %d", errno);
>
> 		if (listen(listenfd, 1024) < 0)
> 			errx(1, "parent: listen error: %d", errno);
>
> 		for ( ; ; ) {
> 			if ((connfd = accept(listenfd, (struct sockaddr *) NULL, NULL)) < 0)
> 				errx(1, "parent: accept error: %d", errno);
>
> 			//usleep(USLEEP / 2); // (I) uncomment this or (II) below to avoid the race
>
> 	        	if (close(connfd) < 0)
> 				errx(1, "parent: close error: %d", errno);
> 		}
>
> 	} else { /* child */
>
> 		sleep(1); /* give the parent some time to create the socket */
>
> 		for ( ; ; ) {
>
> 			if ((connfd = socket(AF_LOCAL, SOCK_STREAM, 0)) < 0)
> 				errx(1, "child: socket error: %d", errno);
>
> 			bzero(&servaddr, sizeof(servaddr));
> 			servaddr.sun_family = AF_LOCAL;
> 			strcpy(servaddr.sun_path, UNIXSTR_PATH);
>
> 			if (connect(connfd, (struct sockaddr *) &servaddr, sizeof(servaddr)) < 0)
> 				errx(1, "child: connect error %d", errno);
>
> 			// usleep(USLEEP); // (II) uncomment this or (I) above to avoid the race
>
> 			if (close(connfd) != 0)
> 				errx(1, "child: close error: %d", errno);
>
> 			usleep(USLEEP);
> 		}
> 	}
>
> 	return 0;
> }
>
> -- 
> Mikolaj Golub
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
>
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?alpine.BSF.2.00.1002181157450.59664>
