Date: Fri, 20 Jan 2012 16:07:32 +0200
From: Nikolay Denev <ndenev@gmail.com>
To: Alexander Motin <mav@FreeBSD.org>
Cc: Gary Palmer <gpalmer@freebsd.org>, FreeBSD-Current <freebsd-current@freebsd.org>, Dennis Kögel <dk@neveragain.de>, "freebsd-geom@freebsd.org" <freebsd-geom@freebsd.org>
Subject: Re: RFC: GEOM MULTIPATH rewrite
Message-ID: <19A6D02C-C42C-4C9A-A8D0-C076E901F98F@gmail.com>
In-Reply-To: <4F196E72.40903@FreeBSD.org>
References: <4EAF00A6.5060903@FreeBSD.org> <05E0E64F-5EC4-425A-81E4-B6C35320608B@neveragain.de> <4EB05566.3060700@FreeBSD.org> <20111114210957.GA68559@in-addr.com> <059C17DB-3A7B-41AA-BF91-2F8EBAF17D01@gmail.com> <4F19474A.9020600@FreeBSD.org> <-2439788735531654851@unknownmsgid> <4F19503B.2090200@FreeBSD.org> <25C45DA0-4B52-42E4-A1A3-DD5168451423@gmail.com> <4F195E85.4010708@FreeBSD.org> <E6C0EEED-7EA6-4BC8-9ACD-3C33B2F8557B@gmail.com> <4F196E72.40903@FreeBSD.org>
On Jan 20, 2012, at 3:38 PM, Alexander Motin wrote:

> On 01/20/12 15:27, Nikolay Denev wrote:
>>
>> On Jan 20, 2012, at 2:31 PM, Alexander Motin wrote:
>>
>>> On 01/20/12 14:13, Nikolay Denev wrote:
>>>> On Jan 20, 2012, at 1:30 PM, Alexander Motin wrote:
>>>>> On 01/20/12 13:08, Nikolay Denev wrote:
>>>>>> On 20.01.2012, at 12:51, Alexander Motin <mav@freebsd.org> wrote:
>>>>>>
>>>>>>> On 01/20/12 10:09, Nikolay Denev wrote:
>>>>>>>> Another thing I've observed is that active/active probably only makes sense if you are accessing a single LUN.
>>>>>>>> In my tests, where I have 24 LUNs that form 4 vdevs in a single zpool, the highest performance was achieved
>>>>>>>> when I split the active paths among the controllers installed in the server importing the pool (basically "gmultipath rotate $LUN" in rc.local for half of the paths).
>>>>>>>> Using active/active in this situation resulted in fluctuating performance.
>>>>>>>
>>>>>>> How big was the fluctuation? Between the speed of one path and all paths?
>>>>>>>
>>>>>>> Several active/active devices without knowledge about each other will, with some probability, send part of their requests via the same links, while ZFS itself already does some balancing between vdevs.
>>>>>>
>>>>>> I will test in a bit and post results.
>>>>>>
>>>>>> P.S.: Is there a way to enable/disable active-active on the fly? I'm
>>>>>> currently re-labeling to achieve that.
>>>>>
>>>>> No, there is not now. But for experiments you may achieve the same results by manually marking as failed all paths except one. It is not dangerous: if that link fails, all the others will resurrect automatically.
>>>>
>>>> I had to destroy and relabel anyway, since I was not using active-active currently. Here's what I did (maybe a little too verbose):
>>>>
>>>> And now a very naive benchmark:
>>>>
>>>> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
>>>> 512+0 records in
>>>> 512+0 records out
>>>> 536870912 bytes transferred in 7.282780 secs (73717855 bytes/sec)
>>>> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
>>>> 512+0 records in
>>>> 512+0 records out
>>>> 536870912 bytes transferred in 38.422724 secs (13972745 bytes/sec)
>>>> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
>>>> 512+0 records in
>>>> 512+0 records out
>>>> 536870912 bytes transferred in 10.810989 secs (49659740 bytes/sec)
>>>>
>>>> Now deactivate the alternative paths:
>>>> And the benchmark again:
>>>>
>>>> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
>>>> 512+0 records in
>>>> 512+0 records out
>>>> 536870912 bytes transferred in 1.083226 secs (495622270 bytes/sec)
>>>> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
>>>> 512+0 records in
>>>> 512+0 records out
>>>> 536870912 bytes transferred in 1.409975 secs (380766249 bytes/sec)
>>>> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
>>>> 512+0 records in
>>>> 512+0 records out
>>>> 536870912 bytes transferred in 1.136110 secs (472551848 bytes/sec)
>>>>
>>>> P.S.: The server is running 8.2-STABLE with a dual-port isp(4) card, and is directly connected to a 4Gbps Xyratex dual-controller (active-active) storage array.
>>>> All 24 SAS drives are set up as single-disk RAID0 LUNs.
>>>
>>> This difference is too huge to explain with ineffective path utilization. Can't this storage have some per-LUN port/controller affinity that penalizes concurrent access to the same LUN from different paths?
>>> Can't it be active/active at the port level, but active/passive for each specific LUN? If there really are two controllers inside, they may need to synchronize their caches or bounce requests, and that may be expensive.
>>>
>>> --
>>> Alexander Motin
>>
>> Yes, I think that's what's happening. There are two controllers, each with its own CPU and cache, and cache synchronization is enabled.
>> I will try to test multipath with both paths connected to the same controller (there are two ports on each controller). But that will require remote hands and take some time.
>>
>> In the meantime I've disabled the writeback cache on the array (which also disables the cache synchronization) and here are the results:
>>
>> ACTIVE-ACTIVE:
>>
>> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
>> 512+0 records in
>> 512+0 records out
>> 536870912 bytes transferred in 2.497415 secs (214970639 bytes/sec)
>> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
>> 512+0 records in
>> 512+0 records out
>> 536870912 bytes transferred in 1.076070 secs (498918172 bytes/sec)
>> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
>> 512+0 records in
>> 512+0 records out
>> 536870912 bytes transferred in 1.908101 secs (281363979 bytes/sec)
>>
>> ACTIVE-PASSIVE (half of the paths failed the same way as in the previous email):
>>
>> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
>> 512+0 records in
>> 512+0 records out
>> 536870912 bytes transferred in 0.324483 secs (1654542913 bytes/sec)
>> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
>> 512+0 records in
>> 512+0 records out
>> 536870912 bytes transferred in 0.795685 secs (674727909 bytes/sec)
>> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
>> 512+0 records in
>> 512+0 records out
>> 536870912 bytes transferred in 0.233859 secs (2295702835 bytes/sec)
>>
>> This increased the performance in both cases, probably because writeback caching does nothing for large sequential writes.
>> Anyway, ACTIVE-ACTIVE is still slower here, but not by that much.
>
> Thank you for the numbers, but I have some doubts about them. 2295702835 bytes/sec is about 18Gbps. If you have 4Gbps links, that would need more than 4 of them, I think.
>
> --
> Alexander Motin

Hmm, that's silly of me. 512M is just too small, and I've probably benchmarked the ZFS cache. (I have only two 4Gbps links to the array.)

Here's a run with an 8G file:

ACTIVE-ACTIVE:

# dd if=/dev/zero of=/tank/TEST bs=1M count=8096
8096+0 records in
8096+0 records out
8489271296 bytes transferred in 62.120919 secs (136657207 bytes/sec)
# dd if=/dev/zero of=/tank/TEST bs=1M count=8096
8096+0 records in
8096+0 records out
8489271296 bytes transferred in 65.066861 secs (130469969 bytes/sec)
# dd if=/dev/zero of=/tank/TEST bs=1M count=8096
8096+0 records in
8096+0 records out
8489271296 bytes transferred in 64.011907 secs (132620190 bytes/sec)

ACTIVE-PASSIVE:

# dd if=/dev/zero of=/tank/TEST bs=1M count=8096
8096+0 records in
8096+0 records out
8489271296 bytes transferred in 34.297121 secs (247521398 bytes/sec)
# dd if=/dev/zero of=/tank/TEST bs=1M count=8096
8096+0 records in
8096+0 records out
8489271296 bytes transferred in 31.709855 secs (267717127 bytes/sec)
# dd if=/dev/zero of=/tank/TEST bs=1M count=8096
8096+0 records in
8096+0 records out
8489271296 bytes transferred in 34.111564 secs (248867840 bytes/sec)
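For reference, a minimal sketch (not part of the original mail) of how the sequential-write test could be sized to stay well clear of in-memory ZFS caching, which is what the 512M runs above appear to have measured. The /tank/TEST scratch file matches the thread; the 2x margin over vfs.zfs.arc_max is an assumption for illustration, not advice from the list:

#!/bin/sh
# Size the dd run well above the ARC limit so the numbers reflect the paths
# to the array rather than ZFS caching in RAM.  The 2x margin and the
# /tank/TEST path are illustrative assumptions.
ARC_MAX=$(sysctl -n vfs.zfs.arc_max)      # ARC limit in bytes
COUNT=$(( ARC_MAX / 1048576 * 2 ))        # number of 1M blocks, ~2x the ARC
[ "$COUNT" -lt 8192 ] && COUNT=8192       # and never below 8G
dd if=/dev/zero of=/tank/TEST bs=1M count="$COUNT"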
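Similarly, a hedged sketch of the manual failover Alexander suggests earlier in the thread for the rewritten gmultipath (mark all paths of a LUN as failed except one, then restore them to return to active/active). The geom name LUN01 and the provider da25 are hypothetical; the real names come from "gmultipath status":

# Hypothetical names -- substitute whatever "gmultipath status" reports.
# Pin the LUN to a single path (effectively active/passive):
gmultipath fail LUN01 da25

# ...run the benchmark...

# Bring the alternate path back; a failed path would also resurrect on its
# own if the remaining link went down:
gmultipath restore LUN01 da25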