Date:      Sun, 23 Sep 2012 22:50:28 -0700
From:      "Jose A. Lombera" <jose@lajni.com>
To:        <freebsd-current@freebsd.org>
Subject:   RE: zpool can't bring online disk2 ----I screwed up
Message-ID:  <013101cd9a18$8106ede0$8314c9a0$@lajni.com>

This is the error I get when I run the failover script.

Sep 24 06:43:39 san1 hastd[3404]: [disk3] (primary) Provider /dev/mfid3 is not part of resource disk3.
Sep 24 06:43:39 san1 hastd[3343]: [disk3] (primary) Worker process exited ungracefully (pid=3404, exitcode=66).
Sep 24 06:43:39 san1 hastd[3413]: [disk6] (primary) Provider /dev/mfid6 is not part of resource disk6.
Sep 24 06:43:39 san1 hastd[3343]: [disk6] (primary) Worker process exited ungracefully (pid=3413, exitcode=66).
Sep 24 06:43:39 san1 hastd[3425]: [disk10] (primary) Unable to open /dev/mfid10: No such file or directory.
Sep 24 06:43:39 san1 hastd[3407]: [disk4] (primary) Provider /dev/mfid4 is not part of resource disk4.
Sep 24 06:43:39 san1 hastd[3343]: [disk10] (primary) Worker process exited ungracefully (pid=3425, exitcode=66).
Sep 24 06:43:39 san1 hastd[3410]: [disk5] (primary) Provider /dev/mfid5 is not part of resource disk5.
Sep 24 06:43:39 san1 hastd[3343]: [disk4] (primary) Worker process exited ungracefully (pid=3407, exitcode=66).
Sep 24 06:43:39 san1 hastd[3416]: [disk7] (primary) Provider /dev/mfid7 is not part of resource disk7.
Sep 24 06:43:39 san1 hastd[3422]: [disk9] (primary) Provider /dev/mfid9 is not part of resource disk9.
Sep 24 06:43:39 san1 hastd[3419]: [disk8] (primary) Provider /dev/mfid8 is not part of resource disk8.
Sep 24 06:43:39 san1 hastd[3343]: [disk5] (primary) Worker process exited ungracefully (pid=3410, exitcode=66).
Sep 24 06:43:40 san1 hastd[3343]: [disk9] (primary) Worker process exited ungracefully (pid=3422, exitcode=66).
Sep 24 06:43:40 san1 hastd[3343]: [disk8] (primary) Worker process exited ungracefully (pid=3419, exitcode=66).
Sep 24 06:43:40 san1 hastd[3343]: [disk7] (primary) Worker process exited ungracefully (pid=3416, exitcode=66).
Sep 24 06:43:40 san1 hastd[3351]: [disk2] (primary) Resource unique ID mismatch (primary=2635341666474957411, secondary=5944493181984227803).
Sep 24 06:43:45 san1 hastd[3348]: [disk1] (primary) Split-brain condition!
Sep 24 06:43:50 san1 hastd[3351]: [disk2] (primary) Resource unique ID mismatch (primary=2635341666474957411, secondary=5944493181984227803).
Sep 24 06:43:55 san1 hastd[3348]: [disk1] (primary) Split-brain condition!
Sep 24 06:44:00 san1 hastd[3351]: [disk2] (primary) Resource unique ID mismatch (primary=2635341666474957411, secondary=5944493181984227803).
Sep 24 06:44:05 san1 hastd[3348]: [disk1] (primary) Split-brain condition!
Sep 24 06:44:10 san1 hastd[3351]: [disk2] (primary) Resource unique ID mismatch (primary=2635341666474957411, secondary=5944493181984227803)
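
A log like this is easier to read when condensed to one line per resource and kind of error. The following is only a minimal sketch, not part of hastd or hastctl: the function name summarise_hastd and the category labels are made up for illustration, and it assumes each log entry sits on a single line with the standard syslog prefix (any mail line-wrapping already undone).

```shell
#!/bin/sh
# Sketch: count hastd messages per resource and per category.
# The category labels below are illustrative names, not hastd terminology.
summarise_hastd() {
    awk '$5 ~ /^hastd/ {
        res = substr($6, 2, length($6) - 2)    # "[disk3]" -> "disk3"
        msg = "other"
        if ($0 ~ /not part of resource/)      msg = "not-part-of-resource"
        else if ($0 ~ /Unable to open/)       msg = "open-failed"
        else if ($0 ~ /unique ID mismatch/)   msg = "id-mismatch"
        else if ($0 ~ /Split-brain/)          msg = "split-brain"
        else if ($0 ~ /exited ungracefully/)  msg = "worker-exited"
        count[res " " msg]++                   # tally per resource + category
    }
    END { for (k in count) print k, count[k] }' | sort
}

# Example with two entries taken from the hastd log in this message.
summarise_hastd <<'EOF'
Sep 24 06:43:39 san1 hastd[3404]: [disk3] (primary) Provider /dev/mfid3 is not part of resource disk3.
Sep 24 06:43:45 san1 hastd[3348]: [disk1] (primary) Split-brain condition!
EOF
```

On a live system the same function could be fed from the system log, for example `grep hastd /var/log/messages | summarise_hastd`, assuming hastd logs to that file.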

Is there any patch I need to run to fix this issue?

From: Jose A. Lombera [mailto:jose@lajni.com]
Sent: Sunday, September 23, 2012 10:00 PM
To: freebsd-current@freebsd.org
Cc: freebsd-current@freebsd.org
Subject: RE: zpool can't bring online disk2 ----I screwed up

This happens every time I run the command below for any of disks 3, 4, 5, 6, 7, 8, 9 and 10.
Only disks 1 and 2 show up in /dev/hast.

[root@san2 /usr/home/jose]# hastctl role primary disk3
[root@san2 /usr/home/jose]#

I get this in the logs:

Sep 23 21:58:13 san2 hastd[2793]: [disk3] (primary) Provider /dev/mfid3 is not part of resource disk3.
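
Which resources currently lack a device node can be checked in one pass. This is a rough sketch under assumptions: the helper name missing_hast_nodes is hypothetical, the resource names disk1 to disk10 are the ones used in this thread, and it relies on hastd creating /dev/hast/<resource> only on the node where that resource is primary.

```shell
#!/bin/sh
# Sketch: print every named resource that has no node in the given directory.
# missing_hast_nodes is an illustrative helper, not a HAST tool.
missing_hast_nodes() {
    hast_dir=$1
    shift
    for res in "$@"; do
        # A node exists only while the resource is primary on this host.
        [ -e "$hast_dir/$res" ] || echo "$res"
    done
}

# Check the ten resources named in this thread.
missing_hast_nodes /dev/hast disk1 disk2 disk3 disk4 disk5 \
    disk6 disk7 disk8 disk9 disk10
```

With only disk1 and disk2 present under /dev/hast, this would list disk3 through disk10.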

Please help.

Thanks.

From: Jose A. Lombera [mailto:jose@lajni.com]
Sent: Sunday, September 23, 2012 9:46 PM
To: 'Freddie Cash'
Cc: freebsd-current@freebsd.org
Subject: RE: zpool can't bring online disk2 ----I screwed up

Please, someone help me…!!!

I screwed up big time.

I was doing the

hastctl create disk2

But since I got some input/output errors I decided to stop hastd (/etc/rc.d/hastd stop).
Since it couldn’t stop disk1 and disk9, I killed it.
I restarted both servers.

And now /dev/hast shows nothing.
And the pool is lost.

I was able to create disk2.
I have restarted both servers but the pool is not coming up.

Any suggestions? Please help. I know that the info is there, since I only did “hastctl create disk2”; I haven’t done it for the other disks.

From: Jose A. Lombera [mailto:jose@lajni.com]
Sent: Sunday, September 23, 2012 8:10 PM
To: 'Freddie Cash'
Cc: freebsd-current@freebsd.org
Subject: RE: zpool can't bring online disk2

Freddie,

Thanks for your great help; now it makes so much sense.
I still have a small problem, and I'm not sure if it is because hastd is running.
I can't initialize disk2 (hastctl create disk2).

This is what I did.

1. zpool offline tank /dev/dsk/hast/disk2

2. zpool status -x

[root@san /usr/home/jose]# zpool status -x
  pool: tank
state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
scan: scrub repaired 0 in 12h4m with 0 errors on Sun Sep 23 19:14:19 2012
config:

        NAME                      STATE     READ WRITE CKSUM
        tank                      DEGRADED     0     0     0
          raidz1-0                DEGRADED     0     0     0
            hast/disk1            ONLINE       0     0     0
            11919832608590631234  OFFLINE      0     0     0  was /dev/dsk/hast/disk2
            hast/disk3            ONLINE       0     0     0
            hast/disk4            ONLINE       0     0     0
            hast/disk5            ONLINE       0     0     0
            hast/disk6            ONLINE       0     0     0
            hast/disk7            ONLINE       0     0     0
            hast/disk8            ONLINE       0     0     0
            hast/disk9            ONLINE       0     0     0
            hast/disk10           ONLINE       0     0     0

errors: No known data errors

3. Removed the disk and inserted a new one.

4. Initialized it:

     hastctl role init disk2

[root@san /usr/home/jose]# hastctl status disk2
disk2:
  role: init
  provname: disk2
  localpath: /dev/mfid2
  extentsize: 0 (0B)
  keepdirty: 0
  remoteaddr: san1
  replication: fullsync
  dirty: 0 (0B)
  statistics:
    reads: 0
    writes: 0
    deletes: 0
    flushes: 0
    activemap updates: 0
[root@san /usr/home/jose]# hastctl create disk2
[ERROR] [disk2] Unable to write metadata: Input/output error.

I don't want to stop hastd since that will shut down the connection to my SAN.

Do you have any suggestions?

Thanks

--jose

-----Original Message-----
From: owner-freebsd-current@freebsd.org [mailto:owner-freebsd-current@freebsd.org] On Behalf Of Freddie Cash
Sent: Sunday, September 23, 2012 6:30 PM
To: compufutura -the computer of the future
Cc: yanegomi@gmail.com; freebsd-current@freebsd.org
Subject: RE: zpool can't bring online disk2

Since it's a HAST device, you have to initialise the disk via hastctl. Once that is done, the /dev/hast/disk2 GEOM device node will be created.

Then you can 'zpool replace' it.
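
As a rough illustration of that order of operations, here is a dry-run sketch. It only prints the commands unless RUN=1 is set. The pool name tank and the resource disk2 come from this thread; the helper names run and replace_hast_disk are made up, and the exact sequence (role init, create, role primary, then zpool replace) is an assumption about a typical setup rather than a verified procedure for this system. It does not cover whatever the other HAST node needs.

```shell
#!/bin/sh
# Dry-run sketch: print (or, with RUN=1, execute) the commands for bringing a
# replacement disk back as a HAST resource and then into the pool.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "+ $*"; fi
}

replace_hast_disk() {
    pool=$1
    res=$2
    run hastctl role init "$res"           # turn the resource off on this node
    run hastctl create "$res"              # write fresh HAST metadata to the new disk
    run hastctl role primary "$res"        # /dev/hast/$res appears once primary
    run zpool replace "$pool" "hast/$res"  # resilver onto the new HAST device
    run zpool status "$pool"               # watch resilver progress
}

# Prints the five commands for the pool and resource discussed here.
replace_hast_disk tank disk2
```

Keeping the default as a dry run means the sequence can be reviewed before anything touches a production pool.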

One step at a time. :)  And you've skipped a few.

1. 'zpool offline' the defective disk
2. Physically remove the defective disk
3. Physically insert the new disk
4. Initialise it as a HAST resource via 'hastctl'
5. 'zpool replace' it using the /dev/hast node
6. Wait for the pool (and HAST) to resilver it
7. Carry on as per normal

On Sep 23, 2012 2:28 PM, "compufutura -the computer of the future" <jose@compufutura.com> wrote:

> Yanegomi,
>
> I tried that, as you can see below. FreeBSD doesn’t have the cfgadm
> utility to unconfigure the device, as described in
> http://docs.oracle.com/cd/E19253-01/819-5461/gbcet/index.html; I
> looked in ports but there is no utility like that.
>
> Pardon me, my knowledge is limited.
>
> Can you please type the command I will need? Or, if I need cfgadm, do I
> have to look for it and install it on my FreeBSD box?
>
> Thanks.
>
> [root@san1 /usr/home/jose]# zpool offline tank hast/disk2
> [root@san1 /usr/home/jose]# zpool status -x
>   pool: tank
> state: DEGRADED
> status: One or more devices has been taken offline by the administrator.
>         Sufficient replicas exist for the pool to continue functioning in a
>         degraded state.
> action: Online the device using 'zpool online' or replace the device with
>         'zpool replace'.
> scan: scrub repaired 0 in 12h4m with 0 errors on Sun Sep 23 19:14:19 2012
> config:
>
>         NAME                      STATE     READ WRITE CKSUM
>         tank                      DEGRADED     0     0     0
>           raidz1-0                DEGRADED     0     0     0
>             hast/disk1            ONLINE       0     0     0
>             11919832608590631234  OFFLINE      0     0     0  was /dev/hast/disk2
>             hast/disk3            ONLINE       0     0     0
>             hast/disk4            ONLINE       0     0     0
>             hast/disk5            ONLINE       0     0     0
>             hast/disk6            ONLINE       0     0     0
>             hast/disk7            ONLINE       0     0     0
>             hast/disk8            ONLINE       0     0     0
>             hast/disk9            ONLINE       0     0     0
>             hast/disk10           ONLINE       0     0     0
>
> errors: No known data errors
> [root@san1 /usr/home/jose]# zpool replace tank hast/disk2
> cannot open 'hast/disk2': no such GEOM provider
> must be a full path or shorthand device name
> [root@san1 /usr/home/jose]# cfgadm
> bash: cfgadm: command not found
>
> [root@san1 /usr/home/jose]# zpool offline tank hast/disk2
> [root@san1 /usr/home/jose]# zpool status -x
>   pool: tank
> state: DEGRADED
> status: One or more devices has been taken offline by the administrator.
>         Sufficient replicas exist for the pool to continue functioning in a
>         degraded state.
> action: Online the device using 'zpool online' or replace the device with
>         'zpool replace'.
> scan: scrub repaired 0 in 12h4m with 0 errors on Sun Sep 23 19:14:19 2012
> config:
>
>         NAME                      STATE     READ WRITE CKSUM
>         tank                      DEGRADED     0     0     0
>           raidz1-0                DEGRADED     0     0     0
>             hast/disk1            ONLINE       0     0     0
>             11919832608590631234  OFFLINE      0     0     0  was /dev/hast/disk2
>             hast/disk3            ONLINE       0     0     0
>             hast/disk4            ONLINE       0     0     0
>             hast/disk5            ONLINE       0     0     0
>             hast/disk6            ONLINE       0     0     0
>             hast/disk7            ONLINE       0     0     0
>             hast/disk8            ONLINE       0     0     0
>             hast/disk9            ONLINE       0     0     0
>             hast/disk10           ONLINE       0     0     0
>
> errors: No known data errors
> [root@san1 /usr/home/jose]# zpool online tank hast/disk2
> warning: device 'hast/disk2' onlined, but remains in faulted state
> use 'zpool replace' to replace devices that are no longer present
> [root@san1 /usr/home/jose]# zpool replace tank hast/disk2
> cannot open 'hast/disk2': no such GEOM provider
> must be a full path or shorthand device name
>
> From: Garrett Cooper <yanegomi@gmail.com>
> Date: September 23, 2012 12:25:52 PM PDT
> To: "Jose A. Lombera" <jose@lajni.com>
> Cc: freebsd-current@freebsd.org
> Subject: Re: zpool can't bring online disk2
>
> On Sun, Sep 23, 2012 at 11:23 AM, Jose A. Lombera <jose@lajni.com> wrote:
>
> Hello all,
>
> I hope someone can help me out with this.
>
> Recently disk2 went bad. I used
>
> zpool offline tank hast/disk2
>
> to bring the disk offline.
> Then I replaced it.
>
> And I used the command
>
> zpool online tank hast/disk2
>
> But the disk shows REMOVED.
>
> [root@san1 /usr/home/jose]# zpool status -v
>  pool: tank
> state: DEGRADED
> status: One or more devices has been removed by the administrator.
>        Sufficient replicas exist for the pool to continue functioning in a
>        degraded state.
> action: Online the device using 'zpool online' or replace the device with
>        'zpool replace'.
> scan: resilvered 2.49M in 0h2m with 0 errors on Sat Sep 22 01:03:13 2012
> config:
>
>        NAME                      STATE     READ WRITE CKSUM
>        tank                      DEGRADED     0     0     0
>          raidz1-0                DEGRADED     0     0     0
>            hast/disk1            ONLINE       0     0     0
>            11919832608590631234  REMOVED      0     0     0  was /dev/hast/disk2
>            hast/disk3            ONLINE       0     0     0
>            hast/disk4            ONLINE       0     0     0
>            hast/disk5            ONLINE       0     0     0
>            hast/disk6            ONLINE       0     0     0
>            hast/disk7            ONLINE       0     0     0
>            hast/disk8            ONLINE       0     0     0
>            hast/disk9            ONLINE       0     0     0
>            hast/disk10           ONLINE       0     0     0
>
> [root@san1 /usr/home/jose]# zpool online tank hast/disk2
> warning: device 'hast/disk2' onlined, but remains in faulted state
> use 'zpool replace' to replace devices that are no longer present
> [root@san1 /usr/home/jose]#
>
> I can't bring it back online.
>
> Can you guys help me figure out what to do?
>
> This is a production server and I can't afford to bring the server down.
>
> I have already swapped 3 disks and I got the same result.
>
> Thank you guys in advance.
>
>    You forgot to call zpool replace as the last step in the process of
> replacing your faulted disk:
> http://docs.oracle.com/cd/E19253-01/819-5461/gbcet/index.html .
> Cheers,
> -Garrett
>
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"



