Date:        Wed, 27 Oct 2010 16:22:35 -0700
From:        Rumen Telbizov <telbizov@gmail.com>
To:          freebsd-stable@freebsd.org
Subject:     Re: Degraded zpool cannot detach old/bad drive
Message-ID:  <AANLkTinjfpnHGMvzJ5Ly8_WFXGvQmQ4D0-_TgbVBi=cf@mail.gmail.com>
In-Reply-To: <AANLkTi=EWfVyZjKEYe=c0x6QvsdUcHGo2-iqGr4OaVG7@mail.gmail.com>
References:  <AANLkTi=EWfVyZjKEYe=c0x6QvsdUcHGo2-iqGr4OaVG7@mail.gmail.com>
No ideas whatsoever?

On Tue, Oct 26, 2010 at 1:04 PM, Rumen Telbizov <telbizov@gmail.com> wrote:
> Hello everyone,
>
> After a few days of struggle with my degraded zpool on a backup server, I
> decided to ask for help here, or at least get some clues as to what might
> be wrong with it. Here's the current state of the zpool:
>
> # zpool status
>
>   pool: tank
>  state: DEGRADED
> status: One or more devices has experienced an error resulting in data
>         corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: none requested
> config:
>
>         NAME                          STATE     READ WRITE CKSUM
>         tank                          DEGRADED     0     0     0
>           raidz1                      DEGRADED     0     0     0
>             spare                     DEGRADED     0     0     0
>               replacing               DEGRADED     0     0     0
>                 17307041822177798519  UNAVAIL      0   299     0  was /dev/gpt/disk-e1:s2
>                 gpt/newdisk-e1:s2     ONLINE       0     0     0
>               gpt/disk-e2:s10         ONLINE       0     0     0
>             gpt/disk-e1:s3            ONLINE      30     0     0
>             gpt/disk-e1:s4            ONLINE       0     0     0
>             gpt/disk-e1:s5            ONLINE       0     0     0
>           raidz1                      ONLINE       0     0     0
>             gpt/disk-e1:s6            ONLINE       0     0     0
>             gpt/disk-e1:s7            ONLINE       0     0     0
>             gpt/disk-e1:s8            ONLINE       0     0     0
>             gpt/disk-e1:s9            ONLINE       0     0     0
>           raidz1                      ONLINE       0     0     0
>             gpt/disk-e1:s10           ONLINE       0     0     0
>             gpt/disk-e1:s11           ONLINE       0     0     0
>             gpt/disk-e1:s12           ONLINE       0     0     0
>             gpt/disk-e1:s13           ONLINE       0     0     0
>           raidz1                      DEGRADED     0     0     0
>             gpt/disk-e1:s14           ONLINE       0     0     0
>             gpt/disk-e1:s15           ONLINE       0     0     0
>             gpt/disk-e1:s16           ONLINE       0     0     0
>             spare                     DEGRADED     0     0     0
>               replacing               DEGRADED     0     0     0
>                 15258738282880603331  UNAVAIL      0    48     0  was /dev/gpt/disk-e1:s17
>                 gpt/newdisk-e1:s17    ONLINE       0     0     0
>               gpt/disk-e2:s11         ONLINE       0     0     0
>           raidz1                      ONLINE       0     0     0
>             gpt/disk-e1:s18           ONLINE       0     0     0
>             gpt/disk-e1:s19           ONLINE       0     0     0
>             gpt/disk-e1:s20           ONLINE       0     0     0
>             gpt/disk-e1:s21           ONLINE       0     0     0
>           raidz1                      ONLINE       0     0     0
>             gpt/disk-e1:s22           ONLINE       0     0     0
>             gpt/disk-e1:s23           ONLINE       0     0     0
>             gpt/disk-e2:s0            ONLINE       0     0     0
>             gpt/disk-e2:s1            ONLINE       0     0     0
>           raidz1                      ONLINE       0     0     0
>             gpt/disk-e2:s2            ONLINE       0     0     0
>             gpt/disk-e2:s3            ONLINE       0     0     0
>             gpt/disk-e2:s4            ONLINE       0     0     0
>             gpt/disk-e2:s5            ONLINE       0     0     0
>           raidz1                      ONLINE       0     0     0
>             gpt/disk-e2:s6            ONLINE       0     0     0
>             gpt/disk-e2:s7            ONLINE       0     0     0
>             gpt/disk-e2:s8            ONLINE       0     0     0
>             gpt/disk-e2:s9            ONLINE       0     0     0
>         spares
>           gpt/disk-e2:s10             INUSE     currently in use
>           gpt/disk-e2:s11             INUSE     currently in use
>           gpt/disk-e1:s2              UNAVAIL   cannot open
>           gpt/newdisk-e1:s17          INUSE     currently in use
>
> errors: 4 data errors, use '-v' for a list
>
>
> The problem is: after replacing the bad drives and resilvering, the
> old/bad drives cannot be detached. The replace command didn't remove them
> automatically, and a manual detach fails. Here are some examples:
>
> # zpool detach tank 15258738282880603331
> cannot detach 15258738282880603331: no valid replicas
> # zpool detach tank gpt/disk-e2:s11
> cannot detach gpt/disk-e2:s11: no valid replicas
> # zpool detach tank gpt/newdisk-e1:s17
> cannot detach gpt/newdisk-e1:s17: no valid replicas
> # zpool detach tank gpt/disk-e1:s17
> cannot detach gpt/disk-e1:s17: no valid replicas
>
>
> Here's more information and the history of events.
> This is a 36-disk SuperMicro 847 machine with 2T WD RE4 disks, organized
> in raidz1 groups as depicted above. The zpool is built only on GPT
> partitions like this one:
>
> =>          34  3904294845  mfid30  GPT  (1.8T)
>             34  3903897600       1  disk-e2:s9  (1.8T)
>     3903897634      397245          - free -  (194M)
>
> The mfidXX devices are disks connected to a SuperMicro/LSI controller and
> presented as JBODs. JBODs on this adapter are actually constructed as
> single-disk raid0 arrays, but that should be irrelevant in this case.
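[For reference, a labeled partition like the one in that gpart output would
typically be created along the following lines. This is an editor's sketch,
not taken from the thread; the explicit -s size (leaving the ~194M free tail
as headroom for slightly smaller replacement disks) is an assumption about
how the partitions were made.]

    # gpart create -s gpt mfid30
    # gpart add -t freebsd-zfs -l disk-e2:s9 -s 3903897600 mfid30

The -l label is what makes the partition appear as /dev/gpt/disk-e2:s9,
which is the name the pool refers to.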
>
> This machine had been working fine since September 6th, but two of the
> disks (in different raidz1 vdevs) were going bad, accumulating quite a
> few errors until they eventually died. This is how they looked:
>
>           raidz1                      DEGRADED     0     0     0
>             gpt/disk-e1:s2            UNAVAIL     44 59.5K     0  experienced I/O failures
>             gpt/disk-e1:s3            ONLINE       0     0     0
>             gpt/disk-e1:s4            ONLINE       0     0     0
>             gpt/disk-e1:s5            ONLINE       0     0     0
>
>           raidz1                      DEGRADED     0     0     0
>             gpt/disk-e1:s14           ONLINE       0     0     0
>             gpt/disk-e1:s15           ONLINE       0     0     0
>             gpt/disk-e1:s16           ONLINE       0     0     0
>             gpt/disk-e1:s17           UNAVAIL  1.56K 49.0K     0  experienced I/O failures
>
>
> I did have two spare disks ready to replace them. So after they died,
> here's what I executed:
>
> # zpool replace tank gpt/disk-e1:s2 gpt/disk-e2:s10
> # zpool replace tank gpt/disk-e1:s17 gpt/disk-e2:s11
>
> Resilvering started. In the middle of it, though, the kernel panicked and
> I had to reboot the machine. After the reboot I waited until the
> resilvering was complete. Once it was, I expected to see the old/bad
> devices removed from their vdevs, but they were still there, and trying
> to detach them only produced 'no valid replicas'.
> I sent a colo technician to replace both defective drives with brand new
> ones. Once I had them inserted, I recreated them exactly the same way as
> the ones I had before - JBOD and a gpart-labeled partition with the same
> name! Then I added them as spares:
>
> # zpool add tank spare gpt/disk-e1:s2
> # zpool add tank spare gpt/disk-e1:s17
>
> That actually made it worse, I think, since now I had the same device
> name both as a 'previous' failed device inside the raidz1 group and as a
> hot spare device. I couldn't do anything with it.
> What I did was export the pool, fail the disk on the controller, import
> the pool, and check that zfs could no longer open it (as part of the hot
> spares). Then I recreated that disk/partition with a new label,
> 'newdisk-XXX', and tried to replace the device that originally failed
> (and is now shown only by a number). So I did this:
>
> # zpool replace tank gpt/disk-e1:s17 gpt/newdisk-e1:s17
> # zpool replace tank gpt/disk-e1:s2 gpt/newdisk-e1:s2
>
> Resilvering completed after 17 hours or so, and I expected the
> 'replacing' operation to disappear and the replaced devices to go away.
> But they didn't! Instead, the pool is in the state shown at the beginning
> of this email.
> As for 'errors: 4 data errors, use '-v' for a list': I suspect it's due
> to another failing device (gpt/disk-e1:s3) inside the first (currently
> degraded) raidz1 vdev. Those 4 corrupted files can actually be read
> sometimes, which tells me the disk only has trouble reading those bad
> blocks *sometimes*.
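[Editor's sketch, not part of the original thread: 'zpool detach' reports
'no valid replicas' when ZFS believes the surviving side of a replacing
vdev cannot cover all the outgoing disk's data on its own, and outstanding
data errors can keep a pool stuck in that state. A sequence often suggested
in such cases, written here with the GUIDs from the status output above, is
to deal with the damaged files, clear and scrub, then detach by GUID.
Whether it resolves this particular pool is an assumption.]

    # zpool status -v tank
    (restore or delete the 4 files it lists)
    # zpool clear tank
    # zpool scrub tank
    (after the scrub completes without new errors)
    # zpool detach tank 17307041822177798519
    # zpool detach tank 15258738282880603331

Referring to a vdev by its numeric GUID is the standard way to name a
device whose /dev node is gone.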
>
> Here's the output of zdb -l tank:
>
>     version=14
>     name='tank'
>     state=0
>     txg=200225
>     pool_guid=13504509992978610301
>     hostid=409325918
>     hostname='XXXX'
>     vdev_tree
>         type='root'
>         id=0
>         guid=13504509992978610301
>         children[0]
>             type='raidz'
>             id=0
>             guid=3740854890192825394
>             nparity=1
>             metaslab_array=33
>             metaslab_shift=36
>             ashift=9
>             asize=7995163410432
>             is_log=0
>             children[0]
>                 type='spare'
>                 id=0
>                 guid=16171901098004278313
>                 whole_disk=0
>                 children[0]
>                     type='replacing'
>                     id=0
>                     guid=2754550310390861576
>                     whole_disk=0
>                     children[0]
>                         type='disk'
>                         id=0
>                         guid=17307041822177798519
>                         path='/dev/gpt/disk-e1:s2'
>                         whole_disk=0
>                         not_present=1
>                         DTL=246
>                     children[1]
>                         type='disk'
>                         id=1
>                         guid=1641394056824955485
>                         path='/dev/gpt/newdisk-e1:s2'
>                         whole_disk=0
>                         DTL=55
>                 children[1]
>                     type='disk'
>                     id=1
>                     guid=13150356781300468512
>                     path='/dev/gpt/disk-e2:s10'
>                     whole_disk=0
>                     is_spare=1
>                     DTL=1289
>             children[1]
>                 type='disk'
>                 id=1
>                 guid=6047192237176807561
>                 path='/dev/gpt/disk-e1:s3'
>                 whole_disk=0
>                 DTL=250
>             children[2]
>                 type='disk'
>                 id=2
>                 guid=9178318500891071208
>                 path='/dev/gpt/disk-e1:s4'
>                 whole_disk=0
>                 DTL=249
>             children[3]
>                 type='disk'
>                 id=3
>                 guid=2567999855746767831
>                 path='/dev/gpt/disk-e1:s5'
>                 whole_disk=0
>                 DTL=248
>         children[1]
>             type='raidz'
>             id=1
>             guid=17097047310177793733
>             nparity=1
>             metaslab_array=31
>             metaslab_shift=36
>             ashift=9
>             asize=7995163410432
>             is_log=0
>             children[0]
>                 type='disk'
>                 id=0
>                 guid=14513380297393196654
>                 path='/dev/gpt/disk-e1:s6'
>                 whole_disk=0
>                 DTL=266
>             children[1]
>                 type='disk'
>                 id=1
>                 guid=7673391645329839273
>                 path='/dev/gpt/disk-e1:s7'
>                 whole_disk=0
>                 DTL=265
>             children[2]
>                 type='disk'
>                 id=2
>                 guid=15189132305590412134
>                 path='/dev/gpt/disk-e1:s8'
>                 whole_disk=0
>                 DTL=264
>             children[3]
>                 type='disk'
>                 id=3
>                 guid=17171875527714022076
>                 path='/dev/gpt/disk-e1:s9'
>                 whole_disk=0
>                 DTL=263
>         children[2]
>             type='raidz'
>             id=2
>             guid=4551002265962803186
>             nparity=1
>             metaslab_array=30
>             metaslab_shift=36
>             ashift=9
>             asize=7995163410432
>             is_log=0
>             children[0]
>                 type='disk'
>                 id=0
>                 guid=12104241519484712161
>                 path='/dev/gpt/disk-e1:s10'
>                 whole_disk=0
>                 DTL=262
>             children[1]
>                 type='disk'
>                 id=1
>                 guid=3950210349623142325
>                 path='/dev/gpt/disk-e1:s11'
>                 whole_disk=0
>                 DTL=261
>             children[2]
>                 type='disk'
>                 id=2
>                 guid=14559903955698640085
>                 path='/dev/gpt/disk-e1:s12'
>                 whole_disk=0
>                 DTL=260
>             children[3]
>                 type='disk'
>                 id=3
>                 guid=12364155114844220066
>                 path='/dev/gpt/disk-e1:s13'
>                 whole_disk=0
>                 DTL=259
>         children[3]
>             type='raidz'
>             id=3
>             guid=12517231224568010294
>             nparity=1
>             metaslab_array=29
>             metaslab_shift=36
>             ashift=9
>             asize=7995163410432
>             is_log=0
>             children[0]
>                 type='disk'
>                 id=0
>                 guid=7655789038925330983
>                 path='/dev/gpt/disk-e1:s14'
>                 whole_disk=0
>                 DTL=258
>             children[1]
>                 type='disk'
>                 id=1
>                 guid=17815755378968233141
>                 path='/dev/gpt/disk-e1:s15'
>                 whole_disk=0
>                 DTL=257
>             children[2]
>                 type='disk'
>                 id=2
>                 guid=9590421681925673767
>                 path='/dev/gpt/disk-e1:s16'
>                 whole_disk=0
>                 DTL=256
>             children[3]
>                 type='spare'
>                 id=3
>                 guid=4015417100051235398
>                 whole_disk=0
>                 children[0]
>                     type='replacing'
>                     id=0
>                     guid=11653429697330193176
>                     whole_disk=0
>                     children[0]
>                         type='disk'
>                         id=0
>                         guid=15258738282880603331
>                         path='/dev/gpt/disk-e1:s17'
>                         whole_disk=0
>                         not_present=1
>                         DTL=255
>                     children[1]
>                         type='disk'
>                         id=1
>                         guid=908651380690954833
>                         path='/dev/gpt/newdisk-e1:s17'
>                         whole_disk=0
>                         is_spare=1
>                         DTL=52
>                 children[1]
>                     type='disk'
>                     id=1
>                     guid=7250934196571906160
>                     path='/dev/gpt/disk-e2:s11'
>                     whole_disk=0
>                     is_spare=1
>                     DTL=1292
>         children[4]
>             type='raidz'
>             id=4
>             guid=7622366288306613136
>             nparity=1
>             metaslab_array=28
>             metaslab_shift=36
>             ashift=9
>             asize=7995163410432
>             is_log=0
>             children[0]
>                 type='disk'
>                 id=0
>                 guid=11283483106921343963
>                 path='/dev/gpt/disk-e1:s18'
>                 whole_disk=0
>                 DTL=254
>             children[1]
>                 type='disk'
>                 id=1
>                 guid=14900597968455968576
>                 path='/dev/gpt/disk-e1:s19'
>                 whole_disk=0
>                 DTL=253
>             children[2]
>                 type='disk'
>                 id=2
>                 guid=4140592611852504513
>                 path='/dev/gpt/disk-e1:s20'
>                 whole_disk=0
>                 DTL=252
>             children[3]
>                 type='disk'
>                 id=3
>                 guid=2794215380207576975
>                 path='/dev/gpt/disk-e1:s21'
>                 whole_disk=0
>                 DTL=251
>         children[5]
>             type='raidz'
>             id=5
>             guid=17655293908271300889
>             nparity=1
>             metaslab_array=27
>             metaslab_shift=36
>             ashift=9
>             asize=7995163410432
>             is_log=0
>             children[0]
>                 type='disk'
>                 id=0
>                 guid=5274146379037055039
>                 path='/dev/gpt/disk-e1:s22'
>                 whole_disk=0
>                 DTL=278
>             children[1]
>                 type='disk'
>                 id=1
>                 guid=8651755019404873686
>                 path='/dev/gpt/disk-e1:s23'
>                 whole_disk=0
>                 DTL=277
>             children[2]
>                 type='disk'
>                 id=2
>                 guid=16827379661759988976
>                 path='/dev/gpt/disk-e2:s0'
>                 whole_disk=0
>                 DTL=276
>             children[3]
>                 type='disk'
>                 id=3
>                 guid=2524967151333933972
>                 path='/dev/gpt/disk-e2:s1'
>                 whole_disk=0
>                 DTL=275
>         children[6]
>             type='raidz'
>             id=6
>             guid=2413519694016115220
>             nparity=1
>             metaslab_array=26
>             metaslab_shift=36
>             ashift=9
>             asize=7995163410432
>             is_log=0
>             children[0]
>                 type='disk'
>                 id=0
>                 guid=16361968944335143412
>                 path='/dev/gpt/disk-e2:s2'
>                 whole_disk=0
>                 DTL=274
>             children[1]
>                 type='disk'
>                 id=1
>                 guid=10054650477559530937
>                 path='/dev/gpt/disk-e2:s3'
>                 whole_disk=0
>                 DTL=273
>             children[2]
>                 type='disk'
>                 id=2
>                 guid=17105959045159531558
>                 path='/dev/gpt/disk-e2:s4'
>                 whole_disk=0
>                 DTL=272
>             children[3]
>                 type='disk'
>                 id=3
>                 guid=17370453969371497663
>                 path='/dev/gpt/disk-e2:s5'
>                 whole_disk=0
>                 DTL=271
>         children[7]
>             type='raidz'
>             id=7
>             guid=4614010953103453823
>             nparity=1
>             metaslab_array=24
>             metaslab_shift=36
>             ashift=9
>             asize=7995163410432
>             is_log=0
>             children[0]
>                 type='disk'
>                 id=0
>                 guid=10090128057592036175
>                 path='/dev/gpt/disk-e2:s6'
>                 whole_disk=0
>                 DTL=270
>             children[1]
>                 type='disk'
>                 id=1
>                 guid=16676544025008223925
>                 path='/dev/gpt/disk-e2:s7'
>                 whole_disk=0
>                 DTL=269
>             children[2]
>                 type='disk'
>                 id=2
>                 guid=11777789246954957292
>                 path='/dev/gpt/disk-e2:s8'
>                 whole_disk=0
>                 DTL=268
>             children[3]
>                 type='disk'
>                 id=3
>                 guid=3406600121427522915
>                 path='/dev/gpt/disk-e2:s9'
>                 whole_disk=0
>                 DTL=267
>
> OS:
>     8.1-STABLE FreeBSD 8.1-STABLE #0: Sun Sep 5 00:22:45 PDT 2010 amd64
>
> Hardware:
>     Chassis:        SuperMicro 847E1 (two backplanes, 24 disks in front and 12 in the back)
>     Motherboard:    X8SIL
>     CPU:            1 x X3430 @ 2.40GHz
>     RAM:            16G
>     HDD Controller: SuperMicro / LSI 9260 (pciconf -lv: SAS1078 PCI-X Fusion-MPT SAS): 2 ports
>     Disks:          36 x 2T Western Digital RE4
>
>
> Any help would be appreciated. Let me know what additional information I
> should provide.
> Thank you in advance,
>
> --
> Rumen Telbizov
>

--
Rumen Telbizov
http://telbizov.com
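[A final editor's note, separate from the detach problem: the stale
hot-spare entry gpt/disk-e1:s2 shows UNAVAIL rather than INUSE, and an
unused hot spare can normally be dropped from the pool with 'zpool remove',
which, unlike detach, does operate on spares. A sketch, untested against
this exact pool state:]

    # zpool remove tank gpt/disk-e1:s2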