Date: Tue, 26 Oct 2010 13:04:53 -0700
From: Rumen Telbizov <telbizov@gmail.com>
To: freebsd-stable@freebsd.org
Subject: Degraded zpool cannot detach old/bad drive
Message-ID: <AANLkTi=EWfVyZjKEYe=c0x6QvsdUcHGo2-iqGr4OaVG7@mail.gmail.com>

Hello everyone,

After a few days of struggle with my degraded zpool on a backup server,
I decided to ask for help here, or at least get some clues as to what
might be wrong with it. Here's the current state of the zpool:

# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME                          STATE     READ WRITE CKSUM
        tank                          DEGRADED     0     0     0
          raidz1                      DEGRADED     0     0     0
            spare                     DEGRADED     0     0     0
              replacing               DEGRADED     0     0     0
                17307041822177798519  UNAVAIL      0   299     0  was /dev/gpt/disk-e1:s2
                gpt/newdisk-e1:s2     ONLINE       0     0     0
              gpt/disk-e2:s10         ONLINE       0     0     0
            gpt/disk-e1:s3            ONLINE      30     0     0
            gpt/disk-e1:s4            ONLINE       0     0     0
            gpt/disk-e1:s5            ONLINE       0     0     0
          raidz1                      ONLINE       0     0     0
            gpt/disk-e1:s6            ONLINE       0     0     0
            gpt/disk-e1:s7            ONLINE       0     0     0
            gpt/disk-e1:s8            ONLINE       0     0     0
            gpt/disk-e1:s9            ONLINE       0     0     0
          raidz1                      ONLINE       0     0     0
            gpt/disk-e1:s10           ONLINE       0     0     0
            gpt/disk-e1:s11           ONLINE       0     0     0
            gpt/disk-e1:s12           ONLINE       0     0     0
            gpt/disk-e1:s13           ONLINE       0     0     0
          raidz1                      DEGRADED     0     0     0
            gpt/disk-e1:s14           ONLINE       0     0     0
            gpt/disk-e1:s15           ONLINE       0     0     0
            gpt/disk-e1:s16           ONLINE       0     0     0
            spare                     DEGRADED     0     0     0
              replacing               DEGRADED     0     0     0
                15258738282880603331  UNAVAIL      0    48     0  was /dev/gpt/disk-e1:s17
                gpt/newdisk-e1:s17    ONLINE       0     0     0
              gpt/disk-e2:s11         ONLINE       0     0     0
          raidz1                      ONLINE       0     0     0
            gpt/disk-e1:s18           ONLINE       0     0     0
            gpt/disk-e1:s19           ONLINE       0     0     0
            gpt/disk-e1:s20           ONLINE       0     0     0
            gpt/disk-e1:s21           ONLINE       0     0     0
          raidz1                      ONLINE       0     0     0
            gpt/disk-e1:s22           ONLINE       0     0     0
            gpt/disk-e1:s23           ONLINE       0     0     0
            gpt/disk-e2:s0            ONLINE       0     0     0
            gpt/disk-e2:s1            ONLINE       0     0     0
          raidz1                      ONLINE       0     0     0
            gpt/disk-e2:s2            ONLINE       0     0     0
            gpt/disk-e2:s3            ONLINE       0     0     0
            gpt/disk-e2:s4            ONLINE       0     0     0
            gpt/disk-e2:s5            ONLINE       0     0     0
          raidz1                      ONLINE       0     0     0
            gpt/disk-e2:s6            ONLINE       0     0     0
            gpt/disk-e2:s7            ONLINE       0     0     0
            gpt/disk-e2:s8            ONLINE       0     0     0
            gpt/disk-e2:s9            ONLINE       0     0     0
        spares
          gpt/disk-e2:s10             INUSE     currently in use
          gpt/disk-e2:s11             INUSE     currently in use
          gpt/disk-e1:s2              UNAVAIL   cannot open
          gpt/newdisk-e1:s17          INUSE     currently in use

errors: 4 data errors, use '-v' for a list

The problem is: after replacing the bad drives and resilvering, the
old/bad drives cannot be detached. The replace command didn't remove
them automatically, and a manual detach fails. Here are some examples:

# zpool detach tank 15258738282880603331
cannot detach 15258738282880603331: no valid replicas
# zpool detach tank gpt/disk-e2:s11
cannot detach gpt/disk-e2:s11: no valid replicas
# zpool detach tank gpt/newdisk-e1:s17
cannot detach gpt/newdisk-e1:s17: no valid replicas
# zpool detach tank gpt/disk-e1:s17
cannot detach gpt/disk-e1:s17: no valid replicas

Here's more information and the history of events.

This is a 36-disk SuperMicro 847 machine with 2T WD RE4 disks organized
in raidz1 groups as depicted above. zpool deals only with partitions
like these:

=>        34  3904294845  mfid30  GPT  (1.8T)
          34  3903897600       1  disk-e2:s9  (1.8T)
  3903897634      397245          - free -  (194M)

The mfidXX devices are disks connected to a SuperMicro/LSI controller
and presented as JBODs. JBODs on this adapter are actually constructed
as a raid0 array of 1 disk each, but this should be irrelevant in this
case.

This machine had been working fine since September 6th, but two of the
disks (in different raidz1 vdevs) were going pretty bad and accumulated
quite a few errors until eventually they died. This is what they looked
like:

          raidz1            DEGRADED     0     0     0
            gpt/disk-e1:s2  UNAVAIL     44 59.5K     0  experienced I/O failures
            gpt/disk-e1:s3  ONLINE       0     0     0
            gpt/disk-e1:s4  ONLINE       0     0     0
            gpt/disk-e1:s5  ONLINE       0     0     0

          raidz1            DEGRADED     0     0     0
            gpt/disk-e1:s14 ONLINE       0     0     0
            gpt/disk-e1:s15 ONLINE       0     0     0
            gpt/disk-e1:s16 ONLINE       0     0     0
            gpt/disk-e1:s17 UNAVAIL  1.56K 49.0K     0  experienced I/O failures

I did have two spare disks ready to replace them. So after they died,
here's what I executed:

# zpool replace tank gpt/disk-e1:s2 gpt/disk-e2:s10
# zpool replace tank gpt/disk-e1:s17 gpt/disk-e2:s11

Resilvering started.
While in the middle of it, though, the kernel panicked and I had to
reboot the machine. After the reboot I waited until the resilvering was
complete. Once it was complete I expected to see the old/bad device
removed from the vdev, but it was still there. Trying to detach it
failed with "no valid replicas".

I sent a colo technician to replace both of those defective drives with
brand new ones. Once they were inserted I recreated them exactly the
same way as the ones I had before: jbod and gpart-labeled partition
with the same name! Then I added them as spares:

# zpool add tank spare gpt/disk-e1:s2
# zpool add tank spare gpt/disk-e1:s17

That actually made it worse, I think, since now I had the same device
name both as a 'previous' failed device inside the raidz1 group and as
a hot spare device. I couldn't do anything with it. What I did was to
export the pool, fail the disk on the controller, import the pool, and
check that zfs could not open it anymore (as part of the hot spares).
Then I recreated that disk/partition with a new label 'newdisk-XXX' and
tried to replace the device that originally failed (and was by then
presented only as a number). So I did this:

# zpool replace tank gpt/disk-e1:s17 gpt/newdisk-e1:s17
# zpool replace tank gpt/disk-e1:s2 gpt/newdisk-e1:s2

Resilvering completed after 17 hours or so, and I expected the
'replacing' operation to disappear and the replaced device to go away.
But it didn't! Instead, the pool is in the state shown at the beginning
of this email.

As for the 'errors: 4 data errors, use '-v' for a list', I suspect that
it's due to another failing device (gpt/disk-e1:s3) inside the first
(currently degraded) raidz1 vdev. Those 4 corrupted files can actually
be read sometimes, which tells me that the disk *sometimes* has trouble
reading those bad blocks.

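For anyone who wants to pick the stuck members out of a long status listing like the one at the top of this message, here is a minimal sketch. It assumes the STATE value is always the second whitespace-separated column, as it is in the output above. The function name `list_unavail` and the `sample` variable are made up for illustration and are not part of ZFS.

```shell
#!/bin/sh
# Hypothetical helper (not a ZFS tool): read `zpool status` text on stdin
# and print the first column of every row whose second column is UNAVAIL.
list_unavail() {
    awk '$2 == "UNAVAIL" { print $1 }'
}

# Sample rows copied from the status output in this message.
sample='replacing              DEGRADED  0   0 0
  17307041822177798519 UNAVAIL   0 299 0  was /dev/gpt/disk-e1:s2
  gpt/newdisk-e1:s2    ONLINE    0   0 0
gpt/disk-e2:s10        ONLINE    0   0 0'

printf '%s\n' "$sample" | list_unavail
```

On a live system the same filter could be fed with `zpool status tank | list_unavail`. It only reports names; it does not explain why detach refuses them.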
Here's the output of zdb -l tank

    version=14
    name='tank'
    state=0
    txg=200225
    pool_guid=13504509992978610301
    hostid=409325918
    hostname='XXXX'
    vdev_tree
        type='root' id=0 guid=13504509992978610301
        children[0] type='raidz' id=0 guid=3740854890192825394 nparity=1 metaslab_array=33 metaslab_shift=36 ashift=9 asize=7995163410432 is_log=0
            children[0] type='spare' id=0 guid=16171901098004278313 whole_disk=0
                children[0] type='replacing' id=0 guid=2754550310390861576 whole_disk=0
                    children[0] type='disk' id=0 guid=17307041822177798519 path='/dev/gpt/disk-e1:s2' whole_disk=0 not_present=1 DTL=246
                    children[1] type='disk' id=1 guid=1641394056824955485 path='/dev/gpt/newdisk-e1:s2' whole_disk=0 DTL=55
                children[1] type='disk' id=1 guid=13150356781300468512 path='/dev/gpt/disk-e2:s10' whole_disk=0 is_spare=1 DTL=1289
            children[1] type='disk' id=1 guid=6047192237176807561 path='/dev/gpt/disk-e1:s3' whole_disk=0 DTL=250
            children[2] type='disk' id=2 guid=9178318500891071208 path='/dev/gpt/disk-e1:s4' whole_disk=0 DTL=249
            children[3] type='disk' id=3 guid=2567999855746767831 path='/dev/gpt/disk-e1:s5' whole_disk=0 DTL=248
        children[1] type='raidz' id=1 guid=17097047310177793733 nparity=1 metaslab_array=31 metaslab_shift=36 ashift=9 asize=7995163410432 is_log=0
            children[0] type='disk' id=0 guid=14513380297393196654 path='/dev/gpt/disk-e1:s6' whole_disk=0 DTL=266
            children[1] type='disk' id=1 guid=7673391645329839273 path='/dev/gpt/disk-e1:s7' whole_disk=0 DTL=265
            children[2] type='disk' id=2 guid=15189132305590412134 path='/dev/gpt/disk-e1:s8' whole_disk=0 DTL=264
            children[3] type='disk' id=3 guid=17171875527714022076 path='/dev/gpt/disk-e1:s9' whole_disk=0 DTL=263
        children[2] type='raidz' id=2 guid=4551002265962803186 nparity=1 metaslab_array=30 metaslab_shift=36 ashift=9 asize=7995163410432 is_log=0
            children[0] type='disk' id=0 guid=12104241519484712161 path='/dev/gpt/disk-e1:s10' whole_disk=0 DTL=262
            children[1] type='disk' id=1 guid=3950210349623142325 path='/dev/gpt/disk-e1:s11' whole_disk=0 DTL=261
            children[2] type='disk' id=2 guid=14559903955698640085 path='/dev/gpt/disk-e1:s12' whole_disk=0 DTL=260
            children[3] type='disk' id=3 guid=12364155114844220066 path='/dev/gpt/disk-e1:s13' whole_disk=0 DTL=259
        children[3] type='raidz' id=3 guid=12517231224568010294 nparity=1 metaslab_array=29 metaslab_shift=36 ashift=9 asize=7995163410432 is_log=0
            children[0] type='disk' id=0 guid=7655789038925330983 path='/dev/gpt/disk-e1:s14' whole_disk=0 DTL=258
            children[1] type='disk' id=1 guid=17815755378968233141 path='/dev/gpt/disk-e1:s15' whole_disk=0 DTL=257
            children[2] type='disk' id=2 guid=9590421681925673767 path='/dev/gpt/disk-e1:s16' whole_disk=0 DTL=256
            children[3] type='spare' id=3 guid=4015417100051235398 whole_disk=0
                children[0] type='replacing' id=0 guid=11653429697330193176 whole_disk=0
                    children[0] type='disk' id=0 guid=15258738282880603331 path='/dev/gpt/disk-e1:s17' whole_disk=0 not_present=1 DTL=255
                    children[1] type='disk' id=1 guid=908651380690954833 path='/dev/gpt/newdisk-e1:s17' whole_disk=0 is_spare=1 DTL=52
                children[1] type='disk' id=1 guid=7250934196571906160 path='/dev/gpt/disk-e2:s11' whole_disk=0 is_spare=1 DTL=1292
        children[4] type='raidz' id=4 guid=7622366288306613136 nparity=1 metaslab_array=28 metaslab_shift=36 ashift=9 asize=7995163410432 is_log=0
            children[0] type='disk' id=0 guid=11283483106921343963 path='/dev/gpt/disk-e1:s18' whole_disk=0 DTL=254
            children[1] type='disk' id=1 guid=14900597968455968576 path='/dev/gpt/disk-e1:s19' whole_disk=0 DTL=253
            children[2] type='disk' id=2 guid=4140592611852504513 path='/dev/gpt/disk-e1:s20' whole_disk=0 DTL=252
            children[3] type='disk' id=3 guid=2794215380207576975 path='/dev/gpt/disk-e1:s21' whole_disk=0 DTL=251
        children[5] type='raidz' id=5 guid=17655293908271300889 nparity=1 metaslab_array=27 metaslab_shift=36 ashift=9 asize=7995163410432 is_log=0
            children[0] type='disk' id=0 guid=5274146379037055039 path='/dev/gpt/disk-e1:s22' whole_disk=0 DTL=278
            children[1] type='disk' id=1 guid=8651755019404873686 path='/dev/gpt/disk-e1:s23' whole_disk=0 DTL=277
            children[2] type='disk' id=2 guid=16827379661759988976 path='/dev/gpt/disk-e2:s0' whole_disk=0 DTL=276
            children[3] type='disk' id=3 guid=2524967151333933972 path='/dev/gpt/disk-e2:s1' whole_disk=0 DTL=275
        children[6] type='raidz' id=6 guid=2413519694016115220 nparity=1 metaslab_array=26 metaslab_shift=36 ashift=9 asize=7995163410432 is_log=0
            children[0] type='disk' id=0 guid=16361968944335143412 path='/dev/gpt/disk-e2:s2' whole_disk=0 DTL=274
            children[1] type='disk' id=1 guid=10054650477559530937 path='/dev/gpt/disk-e2:s3' whole_disk=0 DTL=273
            children[2] type='disk' id=2 guid=17105959045159531558 path='/dev/gpt/disk-e2:s4' whole_disk=0 DTL=272
            children[3] type='disk' id=3 guid=17370453969371497663 path='/dev/gpt/disk-e2:s5' whole_disk=0 DTL=271
        children[7] type='raidz' id=7 guid=4614010953103453823 nparity=1 metaslab_array=24 metaslab_shift=36 ashift=9 asize=7995163410432 is_log=0
            children[0] type='disk' id=0 guid=10090128057592036175 path='/dev/gpt/disk-e2:s6' whole_disk=0 DTL=270
            children[1] type='disk' id=1 guid=16676544025008223925 path='/dev/gpt/disk-e2:s7' whole_disk=0 DTL=269
            children[2] type='disk' id=2 guid=11777789246954957292 path='/dev/gpt/disk-e2:s8' whole_disk=0 DTL=268
            children[3] type='disk' id=3 guid=3406600121427522915 path='/dev/gpt/disk-e2:s9' whole_disk=0 DTL=267

OS:
    8.1-STABLE FreeBSD 8.1-STABLE #0: Sun Sep 5 00:22:45 PDT 2010 amd64

Hardware:
    Chassis: SuperMicro 847E1 (two backplanes 24 disks front and 12 disks in the back)
    Motherboard: X8SIL
    CPU: 1 x X3430 @ 2.40GHz
    RAM: 16G
    HDD Controller: SuperMicro / LSI 9260 (pciconf -lv SAS1078 PCI-X Fusion-MPT SAS) : 2 ports
    Disks: 36 x 2T Western Digital RE4

Any help would be appreciated. Let me know what additional information
I should provide.

Thank you in advance,
--
Rumen Telbizov