From: Alexander Motin
Date: Fri, 20 Jan 2012 15:38:58 +0200
To: Nikolay Denev
Cc: Gary Palmer, FreeBSD-Current, Dennis Kögel, "freebsd-geom@freebsd.org"
Subject: Re: RFC: GEOM MULTIPATH rewrite
Message-ID: <4F196E72.40903@FreeBSD.org>

On 01/20/12 15:27, Nikolay Denev wrote:
>
> On Jan 20, 2012, at 2:31 PM, Alexander Motin wrote:
>
>> On 01/20/12 14:13, Nikolay Denev wrote:
>>> On Jan 20, 2012, at 1:30 PM, Alexander Motin wrote:
>>>> On 01/20/12 13:08, Nikolay Denev wrote:
>>>>> On 20.01.2012, at 12:51, Alexander Motin wrote:
>>>>>
>>>>>> On 01/20/12 10:09, Nikolay Denev wrote:
>>>>>>> Another thing I've observed is that active/active probably only makes sense if you are accessing a single LUN.
>>>>>>> In my tests where I have 24 LUNs that form 4 vdevs in a single zpool, the highest performance was achieved
>>>>>>> when I split the active paths among the controllers installed in the server importing the pool.
>>>>>>> (basically "gmultipath rotate $LUN" in rc.local for half of the paths)
>>>>>>> Using active/active in this situation resulted in fluctuating performance.
>>>>>>
>>>>>> How big was the fluctuation? Between the speed of one path and of all paths?
>>>>>>
>>>>>> Several active/active devices without knowledge about each other will, with some probability, send part of their requests via the same links, while ZFS itself already does some balancing between vdevs.
>>>>>
>>>>> I will test in a bit and post results.
>>>>>
>>>>> P.S.: Is there a way to enable/disable active-active on the fly? I'm
>>>>> currently re-labeling to achieve that.
>>>>
>>>> No, there is not now. But for experiments you may achieve the same results by manually marking as failed all paths except one. It is not dangerous, since if that link fails, all the others will resurrect automatically.
>>>
>>> I had to destroy and relabel anyways, since I was not using active-active currently. Here's what I did (maybe a little too verbose):
>>>
>>> And now a very naive benchmark :
>>>
>>> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
>>> 512+0 records in
>>> 512+0 records out
>>> 536870912 bytes transferred in 7.282780 secs (73717855 bytes/sec)
>>> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
>>> 512+0 records in
>>> 512+0 records out
>>> 536870912 bytes transferred in 38.422724 secs (13972745 bytes/sec)
>>> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
>>> 512+0 records in
>>> 512+0 records out
>>> 536870912 bytes transferred in 10.810989 secs (49659740 bytes/sec)
>>>
>>> Now deactivate the alternative paths :
>>> And the benchmark again:
>>>
>>> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
>>> 512+0 records in
>>> 512+0 records out
>>> 536870912 bytes transferred in 1.083226 secs (495622270 bytes/sec)
>>> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
>>> 512+0 records in
>>> 512+0 records out
>>> 536870912 bytes transferred in 1.409975 secs (380766249 bytes/sec)
>>> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
>>> 512+0 records in
>>> 512+0 records out
>>> 536870912 bytes transferred in 1.136110 secs (472551848 bytes/sec)
>>>
>>> P.S.: The server is running 8.2-STABLE, with a dual port isp(4) card, and is directly connected to a 4Gbps Xyratex dual-controller (active-active) storage array.
>>> All the 24 SAS drives are set up as single disk RAID0 LUNs.
>>
>> This difference is too large to be explained by inefficient path utilization. Couldn't this storage have some per-LUN port/controller affinity that may penalize concurrent access to the same LUN from different paths? Couldn't it be active/active at the port level, but active/passive for each specific LUN? If there are really two controllers inside, they may need to synchronize their caches or bounce requests, which may be expensive.
>>
>> --
>> Alexander Motin
>
> Yes, I think that's what's happening. There are two controllers, each with its own CPU and cache, and they have cache synchronization enabled.
> I will try to test multipath with both paths connected to the same controller (there are two ports on each controller). But that will require remote hands and take some time.
>
> In the meantime I've now disabled the writeback cache on the array (this also disables the cache synchronization) and here are the results :
>
> ACTIVE-ACTIVE:
>
> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
> 512+0 records in
> 512+0 records out
> 536870912 bytes transferred in 2.497415 secs (214970639 bytes/sec)
> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
> 512+0 records in
> 512+0 records out
> 536870912 bytes transferred in 1.076070 secs (498918172 bytes/sec)
> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
> 512+0 records in
> 512+0 records out
> 536870912 bytes transferred in 1.908101 secs (281363979 bytes/sec)
>
> ACTIVE-PASSIVE (half of the paths failed the same way as in the previous email):
>
> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
> 512+0 records in
> 512+0 records out
> 536870912 bytes transferred in 0.324483 secs (1654542913 bytes/sec)
> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
> 512+0 records in
> 512+0 records out
> 536870912 bytes transferred in 0.795685 secs (674727909 bytes/sec)
> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
> 512+0 records in
> 512+0 records out
> 536870912 bytes transferred in 0.233859 secs (2295702835 bytes/sec)
>
> This increased the performance for both cases, probably because writeback caching does nothing for large sequential writes.
> Anyways, here ACTIVE-ACTIVE is still slower, but not by that much.

Thank you for the numbers, but I have some doubts about them. 2295702835 bytes/sec is about 18Gbps. If you have 4Gbps links, that would need more than 4 of them, I think.

-- 
Alexander Motin
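
The sanity check on the dd figures is plain unit conversion, so it may help to spell it out. Below is a minimal Python sketch of that arithmetic, not something posted in the thread: the helper names `to_gbps` and `links_needed` are made up for illustration, and the 4 Gbps figure is taken as the nominal link speed, ignoring any encoding or protocol overhead (which would only make the real per-link payload rate lower).

```python
import math

# Nominal per-link speed quoted in the thread; treating it as usable
# payload bandwidth is a simplifying assumption.
LINK_GBPS = 4.0


def to_gbps(bytes_per_sec: int) -> float:
    """Convert a dd-style bytes/sec figure to gigabits per second."""
    return bytes_per_sec * 8 / 1e9


def links_needed(bytes_per_sec: int, link_gbps: float = LINK_GBPS) -> int:
    """Smallest whole number of links able to carry the given rate."""
    return math.ceil(to_gbps(bytes_per_sec) / link_gbps)


# Rates reported by dd for the ACTIVE-PASSIVE runs with writeback cache off.
active_passive_rates = [1654542913, 674727909, 2295702835]

for rate in active_passive_rates:
    print(f"{rate} bytes/sec = {to_gbps(rate):.2f} Gbps, "
          f"needs at least {links_needed(rate)} link(s) at {LINK_GBPS:g} Gbps")
```

Under these assumptions the fastest run works out to a bit over 18 Gbps, which is the mismatch with the available 4Gbps links pointed out above.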