Date: Sun, 23 Sep 2012 22:50:28 -0700
From: "Jose A. Lombera" <jose@lajni.com>
To: <freebsd-current@freebsd.org>
Subject: RE: zpool can't bring online disk2 ----I screwed up
Message-ID: <013101cd9a18$8106ede0$8314c9a0$@lajni.com>
This is the error I got when I ran the failover script.

Sep 24 06:43:39 san1 hastd[3404]: [disk3] (primary) Provider /dev/mfid3 is not part of resource disk3.
Sep 24 06:43:39 san1 hastd[3343]: [disk3] (primary) Worker process exited ungracefully (pid=3404, exitcode=66).
Sep 24 06:43:39 san1 hastd[3413]: [disk6] (primary) Provider /dev/mfid6 is not part of resource disk6.
Sep 24 06:43:39 san1 hastd[3343]: [disk6] (primary) Worker process exited ungracefully (pid=3413, exitcode=66).
Sep 24 06:43:39 san1 hastd[3425]: [disk10] (primary) Unable to open /dev/mfid10: No such file or directory.
Sep 24 06:43:39 san1 hastd[3407]: [disk4] (primary) Provider /dev/mfid4 is not part of resource disk4.
Sep 24 06:43:39 san1 hastd[3343]: [disk10] (primary) Worker process exited ungracefully (pid=3425, exitcode=66).
Sep 24 06:43:39 san1 hastd[3410]: [disk5] (primary) Provider /dev/mfid5 is not part of resource disk5.
Sep 24 06:43:39 san1 hastd[3343]: [disk4] (primary) Worker process exited ungracefully (pid=3407, exitcode=66).
Sep 24 06:43:39 san1 hastd[3416]: [disk7] (primary) Provider /dev/mfid7 is not part of resource disk7.
Sep 24 06:43:39 san1 hastd[3422]: [disk9] (primary) Provider /dev/mfid9 is not part of resource disk9.
Sep 24 06:43:39 san1 hastd[3419]: [disk8] (primary) Provider /dev/mfid8 is not part of resource disk8.
Sep 24 06:43:39 san1 hastd[3343]: [disk5] (primary) Worker process exited ungracefully (pid=3410, exitcode=66).
Sep 24 06:43:40 san1 hastd[3343]: [disk9] (primary) Worker process exited ungracefully (pid=3422, exitcode=66).
Sep 24 06:43:40 san1 hastd[3343]: [disk8] (primary) Worker process exited ungracefully (pid=3419, exitcode=66).
Sep 24 06:43:40 san1 hastd[3343]: [disk7] (primary) Worker process exited ungracefully (pid=3416, exitcode=66).
Sep 24 06:43:40 san1 hastd[3351]: [disk2] (primary) Resource unique ID mismatch (primary=2635341666474957411, secondary=5944493181984227803).
Sep 24 06:43:45 san1 hastd[3348]: [disk1] (primary) Split-brain condition!
Sep 24 06:43:50 san1 hastd[3351]: [disk2] (primary) Resource unique ID mismatch (primary=2635341666474957411, secondary=5944493181984227803).
Sep 24 06:43:55 san1 hastd[3348]: [disk1] (primary) Split-brain condition!
Sep 24 06:44:00 san1 hastd[3351]: [disk2] (primary) Resource unique ID mismatch (primary=2635341666474957411, secondary=5944493181984227803).
Sep 24 06:44:05 san1 hastd[3348]: [disk1] (primary) Split-brain condition!
Sep 24 06:44:10 san1 hastd[3351]: [disk2] (primary) Resource unique ID mismatch (primary=2635341666474957411, secondary=5944493181984227803)

Is there any patch I need to apply to fix this issue?


From: Jose A. Lombera [mailto:jose@lajni.com]
Sent: Sunday, September 23, 2012 10:00 PM
To: freebsd-current@freebsd.org
Cc: freebsd-current@freebsd.org
Subject: RE: zpool can't bring online disk2 ----I screwed up

Every time I run this for any of disks 3, 4, 5, 6, 7, 8, 9, 10 (disks 1 and 2 show up in /dev/hast):

[root@san2 /usr/home/jose]# hastctl role primary disk3
[root@san2 /usr/home/jose]#

I get this in the logs.

Sep 23 21:58:13 san2 hastd[2793]: [disk3] (primary) Provider /dev/mfid3 is not part of resource disk3.

Please help.

Thanks.


From: Jose A. Lombera [mailto:jose@lajni.com]
Sent: Sunday, September 23, 2012 9:46 PM
To: 'Freddie Cash'
Cc: freebsd-current@freebsd.org
Subject: RE: zpool can't bring online disk2 ----I screwed up

Please, someone help me...!!!

I screwed up big time.

I was running

hastctl create disk2

but since I got some input/output errors I decided to stop hastd:

/etc/rc.d/hastd stop

But since it couldn’t stop disk1 and disk9, I killed it.
I restarted both servers.

And now /dev/hast shows nothing, and the pool is lost.

I was able to create disk2.
I have restarted both servers but the pool is not coming up.

Any suggestions? Please help. I know that the info is there, since I only did “hastctl create disk2”; I haven’t done it for the other disks.


From: Jose A. Lombera [mailto:jose@lajni.com]
Sent: Sunday, September 23, 2012 8:10 PM
To: 'Freddie Cash'
Cc: freebsd-current@freebsd.org
Subject: RE: zpool can't bring online disk2

Freddie,

Thanks for your great help, now it makes so much sense.
I still have a small problem, and I'm not sure if it is because hastd is running.
I can't initialize disk2 (hastctl create disk2).

This is what I did.

1. zpool offline tank /dev/dsk/hast/disk2
2. zpool status -x

[root@san /usr/home/jose]# zpool status -x
  pool: tank
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub repaired 0 in 12h4m with 0 errors on Sun Sep 23 19:14:19 2012
config:

        NAME                      STATE     READ WRITE CKSUM
        tank                      DEGRADED     0     0     0
          raidz1-0                DEGRADED     0     0     0
            hast/disk1            ONLINE       0     0     0
            11919832608590631234  OFFLINE      0     0     0  was /dev/dsk/hast/disk2
            hast/disk3            ONLINE       0     0     0
            hast/disk4            ONLINE       0     0     0
            hast/disk5            ONLINE       0     0     0
            hast/disk6            ONLINE       0     0     0
            hast/disk7            ONLINE       0     0     0
            hast/disk8            ONLINE       0     0     0
            hast/disk9            ONLINE       0     0     0
            hast/disk10           ONLINE       0     0     0

errors: No known data errors

3. Removed the disk and inserted a new one.
4. Initialized it:

hastctl role init disk2

[root@san /usr/home/jose]# hastctl status disk2
disk2:
  role: init
  provname: disk2
  localpath: /dev/mfid2
  extentsize: 0 (0B)
  keepdirty: 0
  remoteaddr: san1
  replication: fullsync
  dirty: 0 (0B)
  statistics:
    reads: 0
    writes: 0
    deletes: 0
    flushes: 0
    activemap updates: 0
[root@san /usr/home/jose]# hastctl create disk2
[ERROR] [disk2] Unable to write metadata: Input/output error.

I don't want to stop hastd since that will shut down the connection to my SAN.

Do you have any suggestions?

Thanks

--jose


-----Original Message-----
From: owner-freebsd-current@freebsd.org [mailto:owner-freebsd-current@freebsd.org] On Behalf Of Freddie Cash
Sent: Sunday, September 23, 2012 6:30 PM
To: compufutura -the computer of the future
Cc: yanegomi@gmail.com; freebsd-current@freebsd.org
Subject: RE: zpool can't bring online disk2

Since it's a HAST device, you have to initialise the disk via hastctl. Once that is done, the /dev/hast/disk2 GEOM device node will be created.

Then you can 'zpool replace' it.

One step at a time. :) And you've skipped a few.

1. 'zpool offline' the defective disk
2. Physically remove the defective disk
3. Physically insert the new disk
4. Initialise it as a HAST resource via 'hastctl'
5. 'zpool replace' it using the /dev/hast node
6. Wait for the pool (and HAST) to resilver it
7. Carry on as per normal

On Sep 23, 2012 2:28 PM, "compufutura -the computer of the future" <jose@compufutura.com> wrote:

> Yanegomi,
>
> I tried that. As you can see below, FreeBSD doesn’t have the cfgadm
> utility to unconfigure the device, as described in
> http://docs.oracle.com/cd/E19253-01/819-5461/gbcet/index.html. I
> looked in ports, but there is no utility like that.
>
> Pardon me, my knowledge is limited.
>
> Can you please type the command I will need? Or, if I need cfgadm, do I
> have to look for it and install it on my FreeBSD box?
>
> Thanks.
>
> [root@san1 /usr/home/jose]# zpool offline tank hast/disk2
> [root@san1 /usr/home/jose]# zpool status -x
>   pool: tank
>  state: DEGRADED
> status: One or more devices has been taken offline by the administrator.
>         Sufficient replicas exist for the pool to continue functioning in a
>         degraded state.
> action: Online the device using 'zpool online' or replace the device with
>         'zpool replace'.
>   scan: scrub repaired 0 in 12h4m with 0 errors on Sun Sep 23 19:14:19 2012
> config:
>
>         NAME                      STATE     READ WRITE CKSUM
>         tank                      DEGRADED     0     0     0
>           raidz1-0                DEGRADED     0     0     0
>             hast/disk1            ONLINE       0     0     0
>             11919832608590631234  OFFLINE      0     0     0  was /dev/hast/disk2
>             hast/disk3            ONLINE       0     0     0
>             hast/disk4            ONLINE       0     0     0
>             hast/disk5            ONLINE       0     0     0
>             hast/disk6            ONLINE       0     0     0
>             hast/disk7            ONLINE       0     0     0
>             hast/disk8            ONLINE       0     0     0
>             hast/disk9            ONLINE       0     0     0
>             hast/disk10           ONLINE       0     0     0
>
> errors: No known data errors
> [root@san1 /usr/home/jose]# zpool replace tank hast/disk2
> cannot open 'hast/disk2': no such GEOM provider
> must be a full path or shorthand device name
> [root@san1 /usr/home/jose]# cfgadm
> bash: cfgadm: command not found
>
> [root@san1 /usr/home/jose]# zpool offline tank hast/disk2
> [root@san1 /usr/home/jose]# zpool status -x
>   pool: tank
>  state: DEGRADED
> status: One or more devices has been taken offline by the administrator.
>         Sufficient replicas exist for the pool to continue functioning in a
>         degraded state.
> action: Online the device using 'zpool online' or replace the device with
>         'zpool replace'.
>   scan: scrub repaired 0 in 12h4m with 0 errors on Sun Sep 23 19:14:19 2012
> config:
>
>         NAME                      STATE     READ WRITE CKSUM
>         tank                      DEGRADED     0     0     0
>           raidz1-0                DEGRADED     0     0     0
>             hast/disk1            ONLINE       0     0     0
>             11919832608590631234  OFFLINE      0     0     0  was /dev/hast/disk2
>             hast/disk3            ONLINE       0     0     0
>             hast/disk4            ONLINE       0     0     0
>             hast/disk5            ONLINE       0     0     0
>             hast/disk6            ONLINE       0     0     0
>             hast/disk7            ONLINE       0     0     0
>             hast/disk8            ONLINE       0     0     0
>             hast/disk9            ONLINE       0     0     0
>             hast/disk10           ONLINE       0     0     0
>
> errors: No known data errors
> [root@san1 /usr/home/jose]# zpool online tank hast/disk2
> warning: device 'hast/disk2' onlined, but remains in faulted state
> use 'zpool replace' to replace devices that are no longer present
> [root@san1 /usr/home/jose]# zpool replace tank hast/disk2
> cannot open 'hast/disk2': no such GEOM provider
> must be a full path or shorthand device name
>
> From: Garrett Cooper <yanegomi@gmail.com>
> Date: September 23, 2012 12:25:52 PM PDT
> To: "Jose A. Lombera" <jose@lajni.com>
> Cc: freebsd-current@freebsd.org
> Subject: Re: zpool can't bring online disk2
>
> On Sun, Sep 23, 2012 at 11:23 AM, Jose A. Lombera <jose@lajni.com> wrote:
>
> Hello all,
>
> I hope someone can help me out with this.
>
> Recently disk2 went bad. I used
>
> zpool offline tank hast/disk2
>
> to bring the disk offline.
> Then I replaced it
> and used the command
>
> zpool online tank hast/disk2
>
> but the disk shows REMOVED.
>
> [root@san1 /usr/home/jose]# zpool status -v
>   pool: tank
>  state: DEGRADED
> status: One or more devices has been removed by the administrator.
>         Sufficient replicas exist for the pool to continue functioning in a
>         degraded state.
> action: Online the device using 'zpool online' or replace the device with
>         'zpool replace'.
>   scan: resilvered 2.49M in 0h2m with 0 errors on Sat Sep 22 01:03:13 2012
> config:
>
>         NAME                      STATE     READ WRITE CKSUM
>         tank                      DEGRADED     0     0     0
>           raidz1-0                DEGRADED     0     0     0
>             hast/disk1            ONLINE       0     0     0
>             11919832608590631234  REMOVED      0     0     0  was /dev/hast/disk2
>             hast/disk3            ONLINE       0     0     0
>             hast/disk4            ONLINE       0     0     0
>             hast/disk5            ONLINE       0     0     0
>             hast/disk6            ONLINE       0     0     0
>             hast/disk7            ONLINE       0     0     0
>             hast/disk8            ONLINE       0     0     0
>             hast/disk9            ONLINE       0     0     0
>             hast/disk10           ONLINE       0     0     0
>
> [root@san1 /usr/home/jose]# zpool online tank hast/disk2
> warning: device 'hast/disk2' onlined, but remains in faulted state
> use 'zpool replace' to replace devices that are no longer present
> [root@san1 /usr/home/jose]#
>
> I can't bring it back online.
>
> Can you guys help me figure out what to do?
>
> This is a production server and I can't afford to bring the server down.
>
> I have already swapped 3 disks and got the same result.
>
> Thank you guys in advance.
>
> You forgot to call zpool replace as the last step in the process of
> replacing your faulted disk:
> http://docs.oracle.com/cd/E19253-01/819-5461/gbcet/index.html .
> Cheers,
> -Garrett
>
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
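
For reference, the seven-step replacement procedure Freddie lists in the thread can be sketched as a small shell script. This is only a sketch under stated assumptions: the helper names run, hast_disk_offline and hast_disk_rejoin are made up for illustration, the pool name tank and resource name disk2 are taken from the thread, and DRY_RUN defaults to 1 so the commands are printed rather than executed. It does not address the "Unable to write metadata: Input/output error" or the resource unique ID mismatch reported above.

```shell
#!/bin/sh
# Sketch of the HAST/ZFS disk replacement order described in the thread.
# Hypothetical helpers; with DRY_RUN=1 (the default) nothing is executed.

DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1: take the defective vdev offline before pulling the disk.
# Usage: hast_disk_offline <pool> <hast-resource>
hast_disk_offline() {
    run zpool offline "$1" "hast/$2"
}

# Steps 2-3 (physical removal and insertion) happen between the two calls.

# Steps 4-6: initialise the new disk as a HAST resource, hand it back to
# the pool, and show the pool status so the resilver can be watched.
# Usage: hast_disk_rejoin <pool> <hast-resource>
hast_disk_rejoin() {
    run hastctl role init "$2" &&       # resource must not be active
    run hastctl create "$2" &&          # write fresh HAST metadata
    run hastctl role primary "$2" &&    # /dev/hast/<resource> appears
    run zpool replace "$1" "hast/$2" && # step 5
    run zpool status "$1"               # step 6: check resilver progress
}

hast_disk_offline tank disk2
hast_disk_rejoin tank disk2
```

Running it as-is only prints the command sequence; setting DRY_RUN=0 would execute the commands, which should be done one function at a time with the physical disk swap in between.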