From: Nikolay Denev
Date: Fri, 20 Jan 2012 15:27:03 +0200
To: Alexander Motin
Cc: Gary Palmer, FreeBSD-Current, Dennis Kögel, freebsd-geom@freebsd.org
Subject: Re: RFC: GEOM MULTIPATH rewrite

On Jan 20, 2012, at 2:31 PM, Alexander Motin wrote:

> On 01/20/12 14:13, Nikolay Denev wrote:
>> On Jan 20, 2012, at 1:30 PM, Alexander Motin wrote:
>>> On 01/20/12 13:08, Nikolay Denev wrote:
>>>> On 20.01.2012, at 12:51, Alexander Motin wrote:
>>>>
>>>>> On 01/20/12 10:09, Nikolay Denev wrote:
>>>>>> Another thing I've observed is that active/active probably only makes sense if you are accessing a single LUN.
>>>>>> In my tests, where I have 24 LUNs that form 4 vdevs in a single zpool, the highest performance was achieved
>>>>>> when I split the active paths among the controllers installed in the server importing the pool (basically "gmultipath rotate $LUN" in rc.local for half of the paths).
>>>>>> Using active/active in this situation resulted in fluctuating performance.
>>>>>
>>>>> How big was the fluctuation? Between the speed of one path and of all paths?
>>>>>
>>>>> Several active/active devices without knowledge of each other will, with some probability, send part of the requests via the same links, while ZFS itself already does some balancing between vdevs.
>>>>
>>>> I will test in a bit and post results.
>>>>
>>>> P.S.: Is there a way to enable/disable active-active on the fly? I'm
>>>> currently re-labeling to achieve that.
>>>
>>> No, there is not right now. But for experiments you may achieve the same result by manually marking all paths except one as failed. It is not dangerous: if that link fails, all the others will resurrect automatically.
>>
>> I had to destroy and relabel anyway, since I was not using active-active before. Here's what I did (maybe a little too verbose):
>>
>> And now a very naive benchmark:
>>
>> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
>> 512+0 records in
>> 512+0 records out
>> 536870912 bytes transferred in 7.282780 secs (73717855 bytes/sec)
>> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
>> 512+0 records in
>> 512+0 records out
>> 536870912 bytes transferred in 38.422724 secs (13972745 bytes/sec)
>> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
>> 512+0 records in
>> 512+0 records out
>> 536870912 bytes transferred in 10.810989 secs (49659740 bytes/sec)
>>
>> Now deactivate the alternative paths:
>> And the benchmark again:
>>
>> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
>> 512+0 records in
>> 512+0 records out
>> 536870912 bytes transferred in 1.083226 secs (495622270 bytes/sec)
>> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
>> 512+0 records in
>> 512+0 records out
>> 536870912 bytes transferred in 1.409975 secs (380766249 bytes/sec)
>> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
>> 512+0 records in
>> 512+0 records out
>> 536870912 bytes transferred in 1.136110 secs (472551848 bytes/sec)
>>
>> P.S.: The server is running 8.2-STABLE, has a dual-port isp(4) card, and is directly connected to a 4Gbps Xyratex dual-controller (active-active) storage array.
>> All 24 SAS drives are set up as single-disk RAID0 LUNs.
>
> This difference is too huge to explain with ineffective path utilization. Could this storage have some per-LUN port/controller affinity that penalizes concurrent access to the same LUN from different paths? Could it be active/active at the port level, but active/passive for each specific LUN? If there really are two controllers inside, they may need to synchronize their caches or bounce requests between each other, which may be expensive.
>
> --
> Alexander Motin

Yes, I think that's what's happening. There are two controllers, each with its own CPU and cache, and cache synchronization is enabled.
I will try to test multipath with both paths connected to the same controller (there are two ports on each controller), but that will require remote hands and take some time.
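For reference, the destroy/relabel/fail dance is roughly the sketch below. The device and provider names are made up for illustration (the real setup has 24 LUNs, each visible over two paths), and -A is the active/active labeling flag from the rewritten gmultipath; check gmultipath(8) on your release for the exact syntax:

:~# gmultipath destroy LUN_0
:~# gmultipath label -A LUN_0 da0 da24   # relabel the two providers as one active/active device
:~# gmultipath fail LUN_0 da24           # mark one path FAIL to emulate active/passive
:~# gmultipath restore LUN_0 da24        # bring that path back afterwards
:~# gmultipath rotate LUN_0              # or, in active/passive mode, just switch the active path
:~# gmultipath status LUN_0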
In the meantime I've disabled the writeback cache on the array (this also disables the cache synchronization) and here are the results:

ACTIVE-ACTIVE:

:~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
512+0 records in
512+0 records out
536870912 bytes transferred in 2.497415 secs (214970639 bytes/sec)
:~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
512+0 records in
512+0 records out
536870912 bytes transferred in 1.076070 secs (498918172 bytes/sec)
:~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
512+0 records in
512+0 records out
536870912 bytes transferred in 1.908101 secs (281363979 bytes/sec)

ACTIVE-PASSIVE (half of the paths failed, the same way as in the previous email):

:~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
512+0 records in
512+0 records out
536870912 bytes transferred in 0.324483 secs (1654542913 bytes/sec)
:~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
512+0 records in
512+0 records out
536870912 bytes transferred in 0.795685 secs (674727909 bytes/sec)
:~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
512+0 records in
512+0 records out
536870912 bytes transferred in 0.233859 secs (2295702835 bytes/sec)

This increased the performance in both cases, probably because writeback caching does little for large sequential writes.
Anyway, ACTIVE-ACTIVE is still slower here, but not by as much.
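Note that these dd numbers are probably dominated by ZFS buffering the writes in RAM rather than by the array itself (several of them are well above what a 4Gbps link can carry), so treat them as a relative comparison only. A somewhat fairer test, just as a sketch with made-up sizes, would be to write more data than the server has RAM and watch the pool while it happens:

:~# dd if=/dev/zero of=/tank/TEST bs=1M count=65536 &
:~# zpool iostat -v tank 5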