Date: Sun, 25 Apr 2010 14:17:15 +0300
From: Mikolaj Golub <to.my.trociny@gmail.com>
To: Mikolaj Golub <to.my.trociny@gmail.com>
Cc: freebsd-fs <freebsd-fs@freebsd.org>, Pawel Jakub Dawidek <pjd@FreeBSD.org>
Subject: Re: HAST: primary might get stuck when there are connectivity problems with secondary
Message-ID: <86tyqzeq84.fsf@kopusha.onet>
In-Reply-To: <868w8dgk4e.fsf@kopusha.onet> (Mikolaj Golub's message of "Sat, 24 Apr 2010 14:33:53 +0300")
References: <86r5m9dvqf.fsf@zhuzha.ua1> <20100423062950.GD1670@garage.freebsd.pl>
 <86k4rye33e.fsf@zhuzha.ua1> <20100424073031.GD3067@garage.freebsd.pl>
 <868w8dgk4e.fsf@kopusha.onet>
--=-=-=

On Sat, 24 Apr 2010 14:33:53 +0300 Mikolaj Golub wrote:

> From the code I don't see how hast_proto_recv_hdr() may timeout if the
> connection is alive, have I missed something?

I did some experiments, adding code that sets the SO_RCVTIMEO socket option
(see the attached patch). It fixes this issue. After the timeout, the worker
on the secondary is restarted with the error:

Apr 25 13:06:45 hastb hastd: [storage] (secondary) Unable to receive request header: Resource temporarily unavailable.
Apr 25 13:06:45 hastb hastd: [storage] (secondary) Worker process (pid=1243) exited ungracefully: status=19200.

On the other hand, when the FS is idle (there is no I/O at all) the worker is
restarted too, and the primary stays disconnected from the secondary until
some I/O appears. So it might not look very nice :-)

Also note that I had to modify proto_common_recv() to get the timeout
working. After a timeout recv() sets errno to EWOULDBLOCK, which has the same
number as EAGAIN in FreeBSD. The current proto_common_recv() restarts recv()
if EAGAIN is returned, so a timed-out recv() would just be retried and the
timeout would never reach the caller.

-- 
Mikolaj Golub

--=-=-=
Content-Type: text/x-diff
Content-Disposition: inline; filename=hastd.proto_tcp4.c.SO_RCVTIMEO.patch

Index: sbin/hastd/proto_common.c
===================================================================
--- sbin/hastd/proto_common.c	(revision 207185)
+++ sbin/hastd/proto_common.c	(working copy)
@@ -76,7 +76,7 @@ proto_common_recv(int fd, unsigned char *data, siz
 
 	do {
 		done = recv(fd, data, size, MSG_WAITALL);
-	} while (done == -1 && errno == EAGAIN);
+	} while (done == -1 && errno == EINTR);
 	if (done == 0)
 		return (ENOTCONN);
 	else if (done < 0)
Index: sbin/hastd/proto_tcp4.c
===================================================================
--- sbin/hastd/proto_tcp4.c	(revision 207185)
+++ sbin/hastd/proto_tcp4.c	(working copy)
@@ -31,6 +31,7 @@ __FBSDID("$FreeBSD$");
 
 #include <sys/param.h>	/* MAXHOSTNAMELEN */
+#include <sys/time.h>
 
 #include <netinet/in.h>
 #include <netinet/tcp.h>
 
@@ -203,7 +204,7 @@ tcp4_common_setup(const char *addr, void **ctxp, i
 	    sizeof(val)) == -1) {
 		pjdlog_warning("Unable to set receive buffer size on %s", addr);
 	}
-	
+
 	tctx->tc_side = side;
 	tctx->tc_magic = TCP4_CTX_MAGIC;
 	*ctxp = tctx;
@@ -214,8 +215,23 @@ tcp4_common_setup(const char *addr, void **ctxp, i
 static int
 tcp4_client(const char *addr, void **ctxp)
 {
+	struct tcp4_ctx *tctx;
+	struct timeval tv;
+	int ret;
 
-	return (tcp4_common_setup(addr, ctxp, TCP4_SIDE_CLIENT));
+	if ((ret = tcp4_common_setup(addr, ctxp, TCP4_SIDE_CLIENT)) != 0)
+		return (ret);
+
+	tctx = *ctxp;
+
+	tv.tv_sec = 300;
+	tv.tv_usec = 0;
+	if (setsockopt(tctx->tc_fd, SOL_SOCKET, SO_RCVTIMEO, &tv,
+	    sizeof(tv)) == -1) {
+		pjdlog_warning("Unable to set receive timeout %s", addr);
+	}
+
+	return (0);
 }
 
 static int
@@ -273,6 +289,7 @@ tcp4_accept(void *ctx, void **newctxp)
 {
 	struct tcp4_ctx *tctx = ctx;
 	struct tcp4_ctx *newtctx;
+	struct timeval tv;
 	socklen_t fromlen;
 	int ret;
 
@@ -294,6 +311,13 @@ tcp4_accept(void *ctx, void **newctxp)
 		return (ret);
 	}
 
+	tv.tv_sec = 300;
+	tv.tv_usec = 0;
+	if (setsockopt(newtctx->tc_fd, SOL_SOCKET, SO_RCVTIMEO, &tv,
+	    sizeof(tv)) == -1) {
+		pjdlog_debug(2, "Unable to set receive timeout");
+	}
+
 	newtctx->tc_side = TCP4_SIDE_SERVER_WORK;
 	newtctx->tc_magic = TCP4_CTX_MAGIC;
 	*newctxp = newtctx;
--=-=-=--
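
The interaction between SO_RCVTIMEO and the retry loop can be reproduced
outside hastd. What follows is only a minimal standalone sketch, not hastd
code: set_recv_timeout(), recv_all() and demo_recv() are hypothetical names
made up for illustration, it uses an AF_UNIX socketpair() instead of a TCP
connection, and the timeout is 1 second rather than 300. recv_all() is
modelled on the patched loop and retries only on EINTR, so a receive timeout
comes back to the caller as EAGAIN/EWOULDBLOCK.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>

#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: arm a receive timeout on fd. Returns 0 or errno. */
static int
set_recv_timeout(int fd, time_t seconds)
{
	struct timeval tv;

	tv.tv_sec = seconds;
	tv.tv_usec = 0;
	if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == -1)
		return (errno);
	return (0);
}

/*
 * Hypothetical helper modelled on the patched loop: retry only when a
 * signal interrupted recv(), so EAGAIN/EWOULDBLOCK from an expired
 * SO_RCVTIMEO is reported to the caller instead of being retried.
 * A short read after a timeout is not handled in this sketch.
 */
static int
recv_all(int fd, unsigned char *data, size_t size)
{
	ssize_t done;

	assert(size > 0);
	do {
		done = recv(fd, data, size, MSG_WAITALL);
	} while (done == -1 && errno == EINTR);
	if (done == 0)
		return (ENOTCONN);
	else if (done < 0)
		return (errno);
	return (0);
}

/*
 * Demo: create a connected socket pair, arm a 1 second receive timeout
 * on one end, optionally let the peer send a full buffer, then receive.
 * Returns 0 on success or an errno value.
 */
static int
demo_recv(int peer_sends)
{
	unsigned char buf[16] = { 0 };
	int sv[2], error;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return (errno);
	error = set_recv_timeout(sv[0], 1);
	if (error == 0 && peer_sends &&
	    send(sv[1], buf, sizeof(buf), 0) != (ssize_t)sizeof(buf))
		error = EIO;
	if (error == 0)
		error = recv_all(sv[0], buf, sizeof(buf));
	close(sv[0]);
	close(sv[1]);
	return (error);
}
```

Calling demo_recv(1) should return 0, while demo_recv(0) leaves the peer idle
and should return EAGAIN after about one second. With the old
"errno == EAGAIN" loop condition the idle case would spin in recv() instead
of returning.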