From: Outback Dingo <outbackdingo@gmail.com>
To: Devin Teske
Cc: FreeBSD Questions <freebsd-questions@freebsd.org>
Date: Wed, 17 Apr 2013 20:28:22 -0400
Subject: Re: gmultipath, ses and shared disks / cant seem to share between local nodes

On Wed, Apr 17, 2013 at 8:05 PM, Teske, Devin wrote:

> On Apr 17, 2013, at 4:56 PM, Outback Dingo wrote:
>
> On Wed, Apr 17, 2013 at 7:29 PM, Teske, Devin wrote:
>
>> On Apr 17, 2013, at 4:10 PM, Outback Dingo wrote:
>>
>> On Wed, Apr 17, 2013 at 6:39 PM, Teske, Devin wrote:
>>
>>> On Apr 17, 2013, at 3:26 PM, Outback Dingo wrote:
>>>
>>> > Ok, maybe I'm at a loss here in the way my brain is viewing this.
>>> >
>>> > We have a box; it's got 2 nodes in the chassis and 32 SATA drives
>>> > attached to a SATA/SAS backplane via 4 (2 per node) LSI MPT SAS2 cards.
>>> > Should I not logically be seeing 4 controllers x drive count?
>>> >
>>> > camcontrol devlist shows 32 devices: daX,passX and sesX,passX
>>> >
>>> > at scbus0 target 9 lun 0 (da0,pass0)
>>> > at scbus0 target 10 lun 0 (ses0,pass1)
>>> > at scbus0 target 11 lun 0 (da1,pass2)
>>> > at scbus0 target 12 lun 0 (ses1,pass3)
>>> > at scbus0 target 13 lun 0 (da2,pass4)
>>> > at scbus0 target 14 lun 0 (ses2,pass5)
>>> > at scbus0 target 15 lun 0 (da3,pass6)
>>> > at scbus0 target 16 lun 0 (ses3,pass7)
>>> > at scbus0 target 17 lun 0 (da4,pass8)
>>> > at scbus0 target 18 lun 0 (ses4,pass9)
>>> > at scbus0 target 19 lun 0 (da5,pass10)
>>> > at scbus0 target 20 lun 0 (ses5,pass11)
>>> > at scbus0 target 21 lun 0 (da6,pass12)
>>> > at scbus0 target 22 lun 0 (ses6,pass13)
>>> > at scbus0 target 23 lun 0 (da7,pass14)
>>> > at scbus0 target 24 lun 0 (ses7,pass15)
>>> > at scbus1 target 0 lun 0 (da8,pass16)
>>> > at scbus1 target 1 lun 0 (da9,pass17)
>>> > at scbus8 target 10 lun 0 (ses8,pass19)
>>> > at scbus8 target 11 lun 0 (da11,pass20)
>>> > at scbus8 target 12 lun 0 (ses9,pass21)
>>> > at scbus8 target 13 lun 0 (da12,pass22)
>>> > at scbus8 target 14 lun 0 (ses10,pass23)
>>> > at scbus8 target 15 lun 0 (da13,pass24)
>>> > at scbus8 target 16 lun 0 (ses11,pass25)
>>> > at scbus8 target 17 lun 0 (da14,pass26)
>>> > at scbus8 target 18 lun 0 (ses12,pass27)
>>> > at scbus8 target 19 lun 0 (da15,pass28)
>>> > at scbus8 target 20 lun 0 (ses13,pass29)
>>> > at scbus8 target 21 lun 0 (da16,pass30)
>>> > at scbus8 target 22 lun 0 (ses14,pass31)
>>> > at scbus8 target 23 lun 0 (da17,pass32)
>>> > at scbus8 target 24 lun 0 (ses15,pass33)
>>> > at scbus9 target 0 lun 0 (da18,pass34)
>>> >
>>> > We would like to create a zpool from all the devices so that, in theory,
>>> > if nodeA failed, then nodeB could force-import the pool.
>>>
>>> gmultipath (which you mention in the subject) is the appropriate tool
>>> for this, but there's no need for an import of the pool if you build the
>>> pool out of multipath devices. In our experience, we can pull a cable and
>>> zfs continues working just fine.
>>>
>>> In other words, don't build the pool out of the devices; put a
>>> gmultipath label on each device and then use /dev/multipath/LABEL for the
>>> zpool devices.
>>>
>>> > nodeA and nodeB are attached through dual LSI controllers to the
>>> > SATA/SAS backplane, but I can't seem to create a zpool from sesX or
>>> > passX devices. I can, however, create a 16-drive zpool on either node
>>> > from any daX device. What did I miss? I've looked at gmirror and also
>>> > the ses documents. Any insight is appreciated; thanks in advance.
>>>
>>> gmirror is the wrong tool; gmultipath is what you want. The basic task
>>> is to use "gmultipath label FOO da#" to write a cookie on the disk (used to
>>> identify new/existing paths during GEOM "taste" events, for example).
>>>
>>> After you've labeled the da# devices with gmultipath, you say "gmultipath
>>> status" to see the components of each label, and you use "multipath/LABEL"
>>> as your disk name when creating the zpool (these correspond directly to
>>> /dev/multipath/LABEL, but "zpool create ..." or "zpool add ..." allow you to
>>> omit the leading "/dev").
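
(For illustration, the workflow Devin describes above boils down to something
like the following; the label names DISK01/DISK02 and the pool name "tank" are
placeholders, not names from this thread:)

    gmultipath label DISK01 da0      # write gmultipath metadata to the disk
    gmultipath label DISK02 da1
    gmultipath status                # each label lists the path(s) it was tasted on
    ls /dev/multipath                # the labeled providers show up here
    zpool create tank multipath/DISK01 multipath/DISK02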
>>
>> Sanity-check me: on node A I did
>>
>> zpool destroy master
>>
>> gmultipath label FOO da0
>>
>> gmultipath status
>> Name                      Status    Components
>> multipath/FOO             DEGRADED  da0 (ACTIVE)
>> multipath/FOO-619648737   DEGRADED  da1 (ACTIVE)
>> multipath/FOO-191725652   DEGRADED  da2 (ACTIVE)
>> multipath/FOO-1539342315  DEGRADED  da3 (ACTIVE)
>> multipath/FOO-1276041606  DEGRADED  da4 (ACTIVE)
>> multipath/FOO-2000832198  DEGRADED  da5 (ACTIVE)
>> multipath/FOO-1285640577  DEGRADED  da6 (ACTIVE)
>> multipath/FOO-1816092574  DEGRADED  da7 (ACTIVE)
>> multipath/FOO-1102254444  DEGRADED  da8 (ACTIVE)
>> multipath/FOO-330300690   DEGRADED  da9 (ACTIVE)
>> multipath/FOO-92140635    DEGRADED  da10 (ACTIVE)
>> multipath/FOO-855257672   DEGRADED  da11 (ACTIVE)
>> multipath/FOO-1003634134  DEGRADED  da12 (ACTIVE)
>> multipath/FOO-2449862     DEGRADED  da13 (ACTIVE)
>> multipath/FOO-1137080233  DEGRADED  da14 (ACTIVE)
>> multipath/FOO-1696804371  DEGRADED  da15 (ACTIVE)
>> multipath/FOO-1304457562  DEGRADED  da16 (ACTIVE)
>> multipath/FOO-912159854   DEGRADED  da17 (ACTIVE)
>>
>> Now on node B I should do the same? Reboot both nodes and I should be
>> able to "see" 32 multipath/FOO devices to create a pool from?
>>
>> It appears from the above output that you labeled all of the block
>> devices (da0 through da17) with the same label.
>>
>> This is not what you want.
>>
>> Use "gmultipath clear FOO" on each of the block devices and have
>> another go using unique values.
>>
>> For example:
>>
>> gmultipath label SATA_LUN01 da0
>> gmultipath label SATA_LUN02 da1
>> gmultipath label SATA_LUN03 da2
>> gmultipath label SATA_LUN04 da3
>> gmultipath label SATA_LUN05 da4
>> gmultipath label SATA_LUN06 da5
>> gmultipath label SATA_LUN07 da6
>> gmultipath label SATA_LUN08 da7
>> gmultipath label SATA_LUN09 da8
>> gmultipath label SATA_LUN10 da9
>> gmultipath label SATA_LUN11 da10
>> gmultipath label SATA_LUN12 da11
>> gmultipath label SATA_LUN13 da12
>> gmultipath label SATA_LUN14 da13
>> gmultipath label SATA_LUN15 da14
>> gmultipath label SATA_LUN16 da15
>> gmultipath label SATA_LUN17 da16
>> gmultipath label SATA_LUN18 da17
>> ...
>>
>> Then "gmultipath status" should show your unique labels, each with a
>> single component.
>>
>> Then you would do:
>>
>> zpool create master multipath/SATA_LUN{01,02,03,04,05,06,...}
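
(For reference, the relabeling Devin describes could be scripted roughly like
this; the da0-da17 range and the SATA_LUN names are taken from this thread,
but treat it as a sketch -- if any of the multipath/FOO geoms are still
active, they may need a "gmultipath destroy <name>" before their metadata can
be cleared:)

    #!/bin/sh
    n=1
    for i in $(jot 18 0); do                          # da0 .. da17
        gmultipath clear da$i                         # wipe the old FOO metadata
        gmultipath label $(printf 'SATA_LUN%02d' $n) da$i
        n=$((n + 1))
    done
    gmultipath status                                 # expect one unique label per disk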
>
> ahh ok, got it, and probably on the other node:
>
> gmultipath label SATA_LUN19 da0
> gmultipath label SATA_LUN20 da1
> -------------------snip------------------------------
> gmultipath label SATA_LUN36 da15
>
> No. You do not need to label the other "node".
>
> Since the "gmultipath label ..." command writes data to the disk, you do
> not need to label the disk multiple times (and in fact it would be an error
> to). Rather, as the system is probing and adding disks, it will
> automatically detect multiple paths based on this data stored on the disk.
>
> Read: if da0 and another da# device are indeed two paths to the same
> device, then as those devices are probed by the kernel, "gmultipath status"
> will dynamically show the newly discovered paths.
>
> If, after labeling all the devices on a single path, you find that
> "gmultipath status" still shows only one component for each label, try
> rebooting. If, still after a reboot, "gmultipath status" only shows a single
> component for each label, then clearly you are not configured (hardware-wise)
> for multiple paths to the same components (and this may be where the
> "gmultipath" versus "gmirror" nit that I caught in your original post comes
> into play -- maybe "gmultipath" was the wrong thing to put in the subject
> if you don't have multiple paths to the same components, but instead have a
> mirrored set of components that you want to gmirror all your data to a
> second pool -- if that ends up being the case, then I would actually
> recommend a zfs send/receive cron job based on snapshots, to utilize the
> performance of ZFS copy-on-write rather than perhaps gmirror; but your
> mileage may vary).
> --
>

Well, nodeA sees daX devices and nodeB does also; however, the serials for
da0 are different on both nodes. It seems nodeA sees nodeB's drives as
sesX/(daX,passX) and nodeB sees nodeA's drives as sesX/(daX,passX). Each node
sees pass0 to pass32, so I would think the 4 LSI controllers connected to the
backplane see all 32 SATA drives in the enclosure.

nodeA drive list:

camcontrol devlist

at scbus0 target 9 lun 0 (da0,pass0)
at scbus0 target 10 lun 0 (ses0,pass1)
at scbus0 target 11 lun 0 (da1,pass2)
at scbus0 target 12 lun 0 (ses1,pass3)
at scbus0 target 13 lun 0 (da2,pass4)
at scbus0 target 14 lun 0 (ses2,pass5)
at scbus0 target 15 lun 0 (da3,pass6)
at scbus0 target 16 lun 0 (ses3,pass7)
at scbus0 target 17 lun 0 (da4,pass8)
at scbus0 target 18 lun 0 (ses4,pass9)
at scbus0 target 19 lun 0 (da5,pass10)
at scbus0 target 20 lun 0 (ses5,pass11)
at scbus0 target 21 lun 0 (da6,pass12)
at scbus0 target 22 lun 0 (ses6,pass13)
at scbus0 target 23 lun 0 (da7,pass14)
at scbus0 target 24 lun 0 (ses7,pass15)
at scbus1 target 0 lun 0 (da8,pass16)
at scbus1 target 1 lun 0 (da9,pass17)
at scbus8 target 9 lun 0 (da10,pass18)
at scbus8 target 10 lun 0 (ses8,pass19)
at scbus8 target 11 lun 0 (da11,pass20)
at scbus8 target 12 lun 0 (ses9,pass21)
at scbus8 target 13 lun 0 (da12,pass22)
at scbus8 target 14 lun 0 (ses10,pass23)
at scbus8 target 15 lun 0 (da13,pass24)
at scbus8 target 16 lun 0 (ses11,pass25)
at scbus8 target 17 lun 0 (da14,pass26)
at scbus8 target 18 lun 0 (ses12,pass27)
at scbus8 target 19 lun 0 (da15,pass28)
at scbus8 target 20 lun 0 (ses13,pass29)
at scbus8 target 21 lun 0 (da16,pass30)
at scbus8 target 22 lun 0 (ses14,pass31)
at scbus8 target 23 lun 0 (da17,pass32)
at scbus8 target 24 lun 0 (ses15,pass33)

nodeB drive list:

camcontrol devlist

at scbus0 target 9 lun 0 (pass0,da0)
at scbus0 target 10 lun 0 (pass1,da1)
at scbus0 target 11 lun 0 (ses0,pass2)
at scbus0 target 12 lun 0 (pass3,da2)
at scbus0 target 13 lun 0 (ses1,pass4)
at scbus0 target 14 lun 0 (pass5,da3)
at scbus0 target 15 lun 0 (ses2,pass6)
at scbus0 target 16 lun 0 (pass7,da4)
at scbus0 target 17 lun 0 (ses3,pass8)
at scbus0 target 18 lun 0 (pass9,da5)
at scbus0 target 19 lun 0 (ses4,pass10)
at scbus0 target 20 lun 0 (pass11,da6)
at scbus0 target 21 lun 0 (pass12,da7)
at scbus0 target 22 lun 0 (ses5,pass13)
at scbus0 target 23 lun 0 (pass14,da8)
at scbus0 target 25 lun 0 (ses6,pass15)
at scbus0 target 26 lun 0 (pass16,da9)
at scbus0 target 28 lun 0 (ses7,pass17)
at scbus1 target 0 lun 0 (pass18,da10)
at scbus1 target 1 lun 0 (pass19,da11)
at scbus8 target 9 lun 0 (pass20,da12)
at scbus8 target 10 lun 0 (ses8,pass21)
at scbus8 target 11 lun 0 (pass22,da13)
at scbus8 target 12 lun 0 (ses9,pass23)
at scbus8 target 13 lun 0 (pass24,da14)
at scbus8 target 14 lun 0 (ses10,pass25)
at scbus8 target 15 lun 0 (pass26,da15)
at scbus8 target 16 lun 0 (ses11,pass27)
at scbus8 target 17 lun 0 (pass28,da16)
at scbus8 target 18 lun 0 (ses12,pass29)
at scbus8 target 19 lun 0 (pass30,da17)
at scbus8 target 20 lun 0 (ses13,pass31)
at scbus8 target 21 lun 0 (pass32,da18)
at scbus8 target 22 lun 0 (ses14,pass33)
at scbus8 target 23 lun 0 (pass34,da19)
at scbus8 target 24 lun 0 (ses15,pass35)
at scbus10 target 0 lun 0 (da20,pass36)

The logic looks right, correct?

> Devin
>
> > then create the zpool from the "36" multipath devices?
> >
> > so if I create a 36-drive multipath zpool on nodeA, when it fails do I
> > just import it to nodeB?
> > I was thinking to use carp for failover... so nodeB would continue nfs
> > sessions and import the zpool to nodeB.
>
>> --
>> Devin
>
> _____________
> The information contained in this message is proprietary and/or
> confidential. If you are not the intended recipient, please: (i) delete the
> message and all copies; (ii) do not disclose, distribute or use the message
> in any manner; and (iii) notify the sender immediately. In addition, please
> be aware that any message addressed to our domain is subject to archiving
> and review by persons other than the intended recipient. Thank you.
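
(For completeness, a rough sketch of the manual failover path asked about
above, assuming the pool name "master" used earlier in this thread; the
CARP/devd glue that would trigger it automatically is left out, and a forced
import is only safe once nodeA is really down or has released the disks:)

    # on nodeA, if it is still reachable (clean hand-off):
    zpool export master

    # on nodeB, after nodeA has failed (forced take-over):
    zpool import -f master

(And the snapshot-based replication Devin suggests could look roughly like
this from a cron job; the snapshot names and the target pool "backup" on the
second node are hypothetical:)

    zfs snapshot -r master@2013-04-18
    zfs send -R -i master@2013-04-17 master@2013-04-18 | ssh nodeB zfs receive -d backup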