Date:      Fri, 22 Oct 2010 11:45:51 -0700
From:      "Kevin Oberman" <oberman@es.net>
To:        Mikolaj Golub <to.my.trociny@gmail.com>
Cc:        freebsd-stable@freebsd.org, Pawel Jakub Dawidek <pjd@freebsd.org>, Pete French <petefrench@ticketswitch.com>
Subject:   Re: hast vs ggate+gmirror synchronisation speed
Message-ID:  <20101022184551.B587D1CC3E@ptavv.es.net>
In-Reply-To: Your message of "Fri, 22 Oct 2010 17:51:03 +0300." <861v7ii8mg.fsf@kopusha.home.net>

> From: Mikolaj Golub <to.my.trociny@gmail.com>
> Date: Fri, 22 Oct 2010 17:51:03 +0300
> Sender: owner-freebsd-stable@freebsd.org
> 
> 
> On Thu, 21 Oct 2010 13:25:34 +0100 Pete French wrote:
> 
>  PF> Well, I bit the bullet and moved to using hast - all went beautifully,
>  PF> and I migrated the pool with no downtime. The one thing I do notice,
>  PF> however, is that the synchronisation with hast is much slower
>  PF> than the older ggate+gmirror combination. It's about half the
>  PF> speed in fact.
> 
>  PF> When I originally set up my ggate configuration I did a lot of tweaks to
>  PF> get the speed good - these consisted of expanding the send and
>  PF> receive space for the sockets using sysctl.conf, and then providing
>  PF> large buffers to ggate. Is there a way to control this with hast ?
>  PF> I still have the sysctls set (as the machines have not rebooted)
>  PF> but I can't see any options in hast.conf which are equivalent to the
>  PF> "-S 262144 -R 262144" which I use with ggate
> 
>  PF> Any advice, or am I barking up the wrong tree here ?
> 
> Currently there are no options in hast.conf to change send and receive buffer
> size. They are hardcoded in sbin/hastd/proto_tcp4.c:
> 
>         val = 131072;
>         if (setsockopt(tctx->tc_fd, SOL_SOCKET, SO_SNDBUF, &val,
>             sizeof(val)) == -1) {
>                 pjdlog_warning("Unable to set send buffer size on %s", addr);
>         }
>         val = 131072;
>         if (setsockopt(tctx->tc_fd, SOL_SOCKET, SO_RCVBUF, &val,
>             sizeof(val)) == -1) {
>                 pjdlog_warning("Unable to set receive buffer size on %s", addr);
>         }
> 
> You could change the values and recompile hastd :-). It would be interesting
> to hear the results if you try it.

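For anyone who wants to try Mikolaj's suggestion, here is a minimal sketch of
that local change to sbin/hastd/proto_tcp4.c. It is not a committed patch; it
simply reuses the 262144 value Pete already passes to ggate:

        /*
         * Hypothetical local change: raise both socket buffers to 256 kB,
         * mirroring the ggate "-S 262144 -R 262144" settings.
         */
        val = 262144;
        if (setsockopt(tctx->tc_fd, SOL_SOCKET, SO_SNDBUF, &val,
            sizeof(val)) == -1) {
                pjdlog_warning("Unable to set send buffer size on %s", addr);
        }
        val = 262144;
        if (setsockopt(tctx->tc_fd, SOL_SOCKET, SO_RCVBUF, &val,
            sizeof(val)) == -1) {
                pjdlog_warning("Unable to set receive buffer size on %s", addr);
        }

Requests beyond what the kern.ipc.maxsockbuf sysctl allows will fail, so the
sysctl tweaks Pete mentions still matter.
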
The stock 131072-byte value may be a bit high for a local connection and too
small for a distant, wide-area connection.

If you are 50 ms RTT from the remote system, the default buffer size will
limit you to about 21 Mbps. The formula is window size (in bits) divided by
RTT (in seconds); the result is the absolute maximum possible throughput in
bits/sec. Use the window size in bytes instead and the result comes out in
bytes/sec.
131072 / 0.05 = 2621440 bytes/sec
131072 * 8 / 0.05 = 20971520 bits/sec
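
As a quick sanity check (a standalone sketch, not hastd code), the same
calculation in C for the hardcoded 131072-byte buffer and a 50 ms RTT:

/*
 * Standalone sketch: bandwidth-delay-product limit for a given socket
 * buffer size and round-trip time.
 */
#include <stdio.h>

int
main(void)
{
        double window_bytes = 131072.0; /* hastd's hardcoded SO_SNDBUF/SO_RCVBUF */
        double rtt_sec = 0.05;          /* 50 ms round trip */
        double max_bytes_per_sec = window_bytes / rtt_sec;

        printf("max throughput: %.0f bytes/sec (%.1f Mbps)\n",
            max_bytes_per_sec, max_bytes_per_sec * 8.0 / 1e6);
        return (0);
}

It prints 2621440 bytes/sec (21.0 Mbps), the numbers above.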

Note that too large a buffer will slow down the transfer in most cases,
although the impact is usually not as obvious as with too small a buffer.
That's why most modern TCP stacks auto-tune buffer sizes. The RTT from coast
to coast in the US would be about 50 ms if the fiber ran straight, but it
never does; our RTT between San Francisco and New York City is about 87 ms.
Ping is an easy way to check RTT.

> Also note there is another hardcoded value in sbin/hastd/proto_common.c
> 
>  /* Maximum size of packet we want to use when sending data. */
> #define MAX_SEND_SIZE   32768
> 
> that looks like it might affect synchronization speed too. Previously we had
> 128kB here, but this was changed to 32kB after slow synchronization was
> reported with MAX_SEND_SIZE=128kB.
> 
> http://svn.freebsd.org/viewvc/base?view=revision&revision=211452
> 
> I wonder whether the slow synchronization with MAX_SEND_SIZE=131072 was due
> to SO_SNDBUF/SO_RCVBUF being equal to this size? Maybe by increasing
> SO_SNDBUF/SO_RCVBUF we could get better performance with
> MAX_SEND_SIZE=128kB?

Large data sizes (not really packets) can definitely slow down transfers.
The actual data per IP packet is usually 1460 bytes on Ethernet, whose
default 1500-byte MTU leaves 1460 bytes of data after 40 bytes of IP and TCP
headers. The system segments the send-size data into multiple packets and,
should one be lost, data must be retransmitted: without SACK, everything in
flight is sent again; most modern systems implement SACK, so only the lost
segments are re-sent. Even so, a 128K send size means a LOT of data can end
up back on the wire after a single loss.
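
To put numbers on that (again just a sketch, not hastd code), here is the
segment count for the current and old MAX_SEND_SIZE values with a 1460-byte
MSS:

/*
 * Standalone sketch: how many full-size Ethernet segments one write
 * becomes. 1460 = 1500-byte MTU minus 40 bytes of IP and TCP headers
 * (no options).
 */
#include <stdio.h>

int
main(void)
{
        const int mss = 1460;
        const int sizes[] = { 32768, 131072 }; /* current and old MAX_SEND_SIZE */

        for (unsigned int i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
                printf("%6d bytes -> %d segments\n", sizes[i],
                    (sizes[i] + mss - 1) / mss);
        return (0);
}

That is roughly 23 segments per 32K write versus 90 per 128K write, so a loss
with the larger size puts far more data back on the wire.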

Most modern systems have page sizes of at least 4KB and many use larger
pages. You want MAX_SEND_SIZE to be a multiple of the page size, so 32K is
probably not a bad choice. You could try dropping it to 16KB, but I doubt
that will help and it might hurt transfers.
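
If anyone wants to experiment, here is a small sketch of deriving a send size
from the system page size instead of hardcoding it; choose_send_size() is a
made-up helper, not a hastd function:

/*
 * Sketch: round a target send size down to a multiple of the page size,
 * never going below one page.
 */
#include <unistd.h>

static size_t
choose_send_size(size_t target)
{
        long pagesize = sysconf(_SC_PAGESIZE);  /* typically 4096 */

        if (pagesize <= 0)
                pagesize = 4096;                /* conservative fallback */
        if (target < (size_t)pagesize)
                return ((size_t)pagesize);
        return (target - target % (size_t)pagesize);
}

On a 4KB-page system choose_send_size(32768) simply returns 32768, i.e. the
current MAX_SEND_SIZE.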
-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: oberman@es.net			Phone: +1 510 486-8634
Key fingerprint: 059B 2DDF 031C 9BA3 14A4  EADA 927D EBB3 987B 3751


