From owner-freebsd-current@FreeBSD.ORG Mon Sep 24 05:50:32 2012
From: "Jose A. Lombera"
Date: Sun, 23 Sep 2012 22:50:28 -0700
Message-ID: <013101cd9a18$8106ede0$8314c9a0$@lajni.com>
Subject: RE: zpool can't bring online disk2 ----I screwed up
List-Id: Discussions about the use of FreeBSD-current

This is the error I got when I ran the failover script.

Sep 24 06:43:39 san1 hastd[3404]: [disk3] (primary) Provider /dev/mfid3 is not part of resource disk3.
Sep 24 06:43:39 san1 hastd[3343]: [disk3] (primary) Worker process exited ungracefully (pid=3404, exitcode=66).
Sep 24 06:43:39 san1 hastd[3413]: [disk6] (primary) Provider /dev/mfid6 is not part of resource disk6.
Sep 24 06:43:39 san1 hastd[3343]: [disk6] (primary) Worker process exited ungracefully (pid=3413, exitcode=66).
Sep 24 06:43:39 san1 hastd[3425]: [disk10] (primary) Unable to open /dev/mfid10: No such file or directory.
Sep 24 06:43:39 san1 hastd[3407]: [disk4] (primary) Provider /dev/mfid4 is not part of resource disk4.
Sep 24 06:43:39 san1 hastd[3343]: [disk10] (primary) Worker process exited ungracefully (pid=3425, exitcode=66).
Sep 24 06:43:39 san1 hastd[3410]: [disk5] (primary) Provider /dev/mfid5 is not part of resource disk5.
Sep 24 06:43:39 san1 hastd[3343]: [disk4] (primary) Worker process exited ungracefully (pid=3407, exitcode=66).
Sep 24 06:43:39 san1 hastd[3416]: [disk7] (primary) Provider /dev/mfid7 is not part of resource disk7.
Sep 24 06:43:39 san1 hastd[3422]: [disk9] (primary) Provider /dev/mfid9 is not part of resource disk9.
Sep 24 06:43:39 san1 hastd[3419]: [disk8] (primary) Provider /dev/mfid8 is not part of resource disk8.
Sep 24 06:43:39 san1 hastd[3343]: [disk5] (primary) Worker process exited ungracefully (pid=3410, exitcode=66).
Sep 24 06:43:40 san1 hastd[3343]: [disk9] (primary) Worker process exited ungracefully (pid=3422, exitcode=66).
Sep 24 06:43:40 san1 hastd[3343]: [disk8] (primary) Worker process exited ungracefully (pid=3419, exitcode=66).
Sep 24 06:43:40 san1 hastd[3343]: [disk7] (primary) Worker process exited ungracefully (pid=3416, exitcode=66).
Sep 24 06:43:40 san1 hastd[3351]: [disk2] (primary) Resource unique ID mismatch (primary=2635341666474957411, secondary=5944493181984227803).
Sep 24 06:43:45 san1 hastd[3348]: [disk1] (primary) Split-brain condition!
Sep 24 06:43:50 san1 hastd[3351]: [disk2] (primary) Resource unique ID mismatch (primary=2635341666474957411, secondary=5944493181984227803).
Sep 24 06:43:55 san1 hastd[3348]: [disk1] (primary) Split-brain condition!
Sep 24 06:44:00 san1 hastd[3351]: [disk2] (primary) Resource unique ID mismatch (primary=2635341666474957411, secondary=5944493181984227803).
Sep 24 06:44:05 san1 hastd[3348]: [disk1] (primary) Split-brain condition!
Sep 24 06:44:10 san1 hastd[3351]: [disk2] (primary) Resource unique ID mismatch (primary=2635341666474957411, secondary=5944493181984227803)

Is there any patch I need to run to fix this issue?


From: Jose A. Lombera [mailto:jose@lajni.com]
Sent: Sunday, September 23, 2012 10:00 PM
To: freebsd-current@freebsd.org
Cc: freebsd-current@freebsd.org
Subject: RE: zpool can't bring online disk2 ----I screwed up

Every time I run this for any of disks 3, 4, 5, 6, 7, 8, 9, 10
(disks 1 and 2 do show up in /dev/hast):

[root@san2 /usr/home/jose]# hastctl role primary disk3
[root@san2 /usr/home/jose]#

I get this in the logs.

Sep 23 21:58:13 san2 hastd[2793]: [disk3] (primary) Provider /dev/mfid3 is not part of resource disk3.

Please help.

Thanks.


From: Jose A. Lombera [mailto:jose@lajni.com]
Sent: Sunday, September 23, 2012 9:46 PM
To: 'Freddie Cash'
Cc: freebsd-current@freebsd.org
Subject: RE: zpool can't bring online disk2 ----I screwed up

Please, someone help me!

I screwed up big time.

I was running

hastctl create disk2

but since I got some input/output errors I decided to stop hastd:

/etc/rc.d/hastd stop

Since it couldn't stop disk1 and disk9, I killed it and restarted both servers.

Now /dev/hast shows nothing, and the pool is lost.

I was able to create disk2. I have restarted both servers, but the pool is not coming up.

Any suggestions? Please help. I know that the data is there, since I only ran "hastctl create disk2"; I haven't done it for the other disks.


From: Jose A. Lombera [mailto:jose@lajni.com]
Sent: Sunday, September 23, 2012 8:10 PM
To: 'Freddie Cash'
Cc: freebsd-current@freebsd.org
Subject: RE: zpool can't bring online disk2

Freddie,

Thanks for your great help; now it makes so much sense.
I still have a small problem, and I'm not sure if it is because hastd is running.
I can't initialize disk2 (hastctl create disk2).

This is what I did.

1. zpool offline tank /dev/dsk/hast/disk2

2. zpool status -x

[root@san /usr/home/jose]# zpool status -x
  pool: tank
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub repaired 0 in 12h4m with 0 errors on Sun Sep 23 19:14:19 2012
config:

        NAME                      STATE     READ WRITE CKSUM
        tank                      DEGRADED     0     0     0
          raidz1-0                DEGRADED     0     0     0
            hast/disk1            ONLINE       0     0     0
            11919832608590631234  OFFLINE      0     0     0  was /dev/dsk/hast/disk2
            hast/disk3            ONLINE       0     0     0
            hast/disk4            ONLINE       0     0     0
            hast/disk5            ONLINE       0     0     0
            hast/disk6            ONLINE       0     0     0
            hast/disk7            ONLINE       0     0     0
            hast/disk8            ONLINE       0     0     0
            hast/disk9            ONLINE       0     0     0
            hast/disk10           ONLINE       0     0     0

errors: No known data errors

3. Removed the disk and inserted a new one.

4. Initialized it:

hastctl role init disk2

[root@san /usr/home/jose]# hastctl status disk2
disk2:
  role: init
  provname: disk2
  localpath: /dev/mfid2
  extentsize: 0 (0B)
  keepdirty: 0
  remoteaddr: san1
  replication: fullsync
  dirty: 0 (0B)
  statistics:
    reads: 0
    writes: 0
    deletes: 0
    flushes: 0
    activemap updates: 0
[root@san /usr/home/jose]# hastctl create disk2
[ERROR] [disk2] Unable to write metadata: Input/output error.

I don't want to stop hastd, since that will shut down the connection to my SAN.

Do you have any suggestions?

Thanks

--jose


-----Original Message-----
From: owner-freebsd-current@freebsd.org [mailto:owner-freebsd-current@freebsd.org] On Behalf Of Freddie Cash
Sent: Sunday, September 23, 2012 6:30 PM
To: compufutura -the computer of the future
Cc: yanegomi@gmail.com; freebsd-current@freebsd.org
Subject: RE: zpool can't bring online disk2

Since it's a HAST device, you have to initialise the disk via hastctl.
Once that is done, the /dev/hast/disk2 GEOM device node will be created.

Then you can 'zpool replace' it.
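
[Editor's sketch] The advice above (initialise the HAST resource, then 'zpool replace') might look roughly like the following on the primary node. This is only an illustration under assumptions: the hastctl sequence (role init, create, role primary) is pieced together from commands that appear elsewhere in this thread rather than spelled out by Freddie, the function name hast_replace_plan is made up, and the pool name, resource name and vdev GUID are the ones shown in the zpool output in this thread. The function only prints the commands; it does not run them. It also does not address the "Unable to write metadata: Input/output error" reported for 'hastctl create', nor anything that may be needed on the secondary node.

```shell
#!/bin/sh
# Hypothetical helper: print (not execute) the command sequence for
# swapping a failed disk that backs a HAST resource in a ZFS pool.
#   $1 = pool name, $2 = HAST resource name, $3 = old vdev name or GUID
hast_replace_plan() {
    pool=$1
    res=$2
    old=$3
    cat <<EOF
zpool offline $pool hast/$res
# (physically swap the disk here)
hastctl role init $res
hastctl create $res
hastctl role primary $res
zpool replace $pool $old hast/$res
zpool status $pool
EOF
}

# Example with the values from this thread.
hast_replace_plan tank disk2 11919832608590631234
```

Printing the plan first, instead of executing it, leaves room to check each step by hand (for example, that /dev/hast/disk2 actually exists after 'hastctl role primary') before touching a production pool.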

One step at a time. :) And you've skipped a few.

1. 'zpool offline' the defective disk
2. Physically remove the defective disk
3. Physically insert the new disk
4. Initialise it as a HAST resource via 'hastctl'
5. 'zpool replace' it using the /dev/hast node
6. Wait for the pool (and HAST) to resilver it
7. Carry on as per normal

On Sep 23, 2012 2:28 PM, "compufutura -the computer of the future" <jose@compufutura.com> wrote:

> Yanegomi,
>
> I tried that, as you can see below. FreeBSD doesn't have the cfgadm
> utility to unconfigure the device, as described in
> http://docs.oracle.com/cd/E19253-01/819-5461/gbcet/index.html. I
> looked in ports, but there is no utility like that.
>
> Pardon me, my knowledge is little.
>
> Can you please type the command I will need? Or, if I need cfgadm, do I
> have to look for it and install it on my FreeBSD box?
>
> Thanks.
>
> [root@san1 /usr/home/jose]# zpool offline tank hast/disk2
> [root@san1 /usr/home/jose]# zpool status -x
>   pool: tank
>  state: DEGRADED
> status: One or more devices has been taken offline by the administrator.
>         Sufficient replicas exist for the pool to continue functioning in a
>         degraded state.
> action: Online the device using 'zpool online' or replace the device with
>         'zpool replace'.
>   scan: scrub repaired 0 in 12h4m with 0 errors on Sun Sep 23 19:14:19 2012
> config:
>
>         NAME                      STATE     READ WRITE CKSUM
>         tank                      DEGRADED     0     0     0
>           raidz1-0                DEGRADED     0     0     0
>             hast/disk1            ONLINE       0     0     0
>             11919832608590631234  OFFLINE      0     0     0  was /dev/hast/disk2
>             hast/disk3            ONLINE       0     0     0
>             hast/disk4            ONLINE       0     0     0
>             hast/disk5            ONLINE       0     0     0
>             hast/disk6            ONLINE       0     0     0
>             hast/disk7            ONLINE       0     0     0
>             hast/disk8            ONLINE       0     0     0
>             hast/disk9            ONLINE       0     0     0
>             hast/disk10           ONLINE       0     0     0
>
> errors: No known data errors
> [root@san1 /usr/home/jose]# zpool replace tank hast/disk2
> cannot open 'hast/disk2': no such GEOM provider
> must be a full path or shorthand device name
> [root@san1 /usr/home/jose]# cfgadm
> bash: cfgadm: command not found
> [root@san1 /usr/home/jose]# zpool offline tank hast/disk2
> [root@san1 /usr/home/jose]# zpool status -x
>   pool: tank
>  state: DEGRADED
> status: One or more devices has been taken offline by the administrator.
>         Sufficient replicas exist for the pool to continue functioning in a
>         degraded state.
> action: Online the device using 'zpool online' or replace the device with
>         'zpool replace'.
>   scan: scrub repaired 0 in 12h4m with 0 errors on Sun Sep 23 19:14:19 2012
> config:
>
>         NAME                      STATE     READ WRITE CKSUM
>         tank                      DEGRADED     0     0     0
>           raidz1-0                DEGRADED     0     0     0
>             hast/disk1            ONLINE       0     0     0
>             11919832608590631234  OFFLINE      0     0     0  was /dev/hast/disk2
>             hast/disk3            ONLINE       0     0     0
>             hast/disk4            ONLINE       0     0     0
>             hast/disk5            ONLINE       0     0     0
>             hast/disk6            ONLINE       0     0     0
>             hast/disk7            ONLINE       0     0     0
>             hast/disk8            ONLINE       0     0     0
>             hast/disk9            ONLINE       0     0     0
>             hast/disk10           ONLINE       0     0     0
>
> errors: No known data errors
> [root@san1 /usr/home/jose]# zpool online tank hast/disk2
> warning: device 'hast/disk2' onlined, but remains in faulted state
> use 'zpool replace' to replace devices that are no longer present
> [root@san1 /usr/home/jose]# zpool replace tank hast/disk2
> cannot open 'hast/disk2': no such GEOM provider
> must be a full path or shorthand device name
> [root@san1 /usr/home/jose]#
>
> From: Garrett Cooper <yanegomi@gmail.com>
> Date: September 23, 2012 12:25:52 PM PDT
> To: "Jose A. Lombera" <jose@lajni.com>
> Cc: freebsd-current@freebsd.org
> Subject: Re: zpool can't bring online disk2
>
> On Sun, Sep 23, 2012 at 11:23 AM, Jose A. Lombera <jose@lajni.com> wrote:
>
> Hello, all!
>
> I hope someone can help me out with this.
>
> Recently disk2 went bad. I used
>
> zpool offline tank hast/disk2
>
> to bring the disk offline. Then I replaced it
>
> and used the command
>
> zpool online tank hast/disk2
>
> but the disk shows REMOVED.
>
> [root@san1 /usr/home/jose]# zpool status -v
>   pool: tank
>  state: DEGRADED
> status: One or more devices has been removed by the administrator.
>         Sufficient replicas exist for the pool to continue functioning in a
>         degraded state.
> action: Online the device using 'zpool online' or replace the device with
>         'zpool replace'.
>   scan: resilvered 2.49M in 0h2m with 0 errors on Sat Sep 22 01:03:13 2012
> config:
>
>         NAME                      STATE     READ WRITE CKSUM
>         tank                      DEGRADED     0     0     0
>           raidz1-0                DEGRADED     0     0     0
>             hast/disk1            ONLINE       0     0     0
>             11919832608590631234  REMOVED      0     0     0  was /dev/hast/disk2
>             hast/disk3            ONLINE       0     0     0
>             hast/disk4            ONLINE       0     0     0
>             hast/disk5            ONLINE       0     0     0
>             hast/disk6            ONLINE       0     0     0
>             hast/disk7            ONLINE       0     0     0
>             hast/disk8            ONLINE       0     0     0
>             hast/disk9            ONLINE       0     0     0
>             hast/disk10           ONLINE       0     0     0
>
> [root@san1 /usr/home/jose]# zpool online tank hast/disk2
> warning: device 'hast/disk2' onlined, but remains in faulted state
> use 'zpool replace' to replace devices that are no longer present
> [root@san1 /usr/home/jose]#
>
> I can't bring it back online.
>
> Can you guys help me figure out what to do?
>
> This is a production server and I can't afford to bring the server down.
>
> I have already swapped 3 disks and I got the same result.
>
> Thank you guys in advance.
>
> You forgot to call zpool replace as the last step in the process of
> replacing your faulted disk:
> http://docs.oracle.com/cd/E19253-01/819-5461/gbcet/index.html .
> Cheers,
> -Garrett
>
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"