Date: Fri, 5 Jun 2009 04:28:38 -0700
From: Kip Macy <kmacy@freebsd.org>
To: Kip Macy <kmacy@freebsd.org>, freebsd-stable@freebsd.org
Subject: Re: ZFS weird device tasting loop since MFC
Message-ID: <3c1674c90906050428mafb5760gc706e879193345e0@mail.gmail.com>
In-Reply-To: <20090605084423.GA1609@acme.spoerlein.net>
References: <20090602091610.GE93344@acme.spoerlein.net> <20090602092408.GF93344@acme.spoerlein.net> <20090605084423.GA1609@acme.spoerlein.net>
Must be a weird geom interaction. I don't see this with raw disk. I'll look
at it eventually but UMA and performance are further up in the queue.

-Kip

On Fri, Jun 5, 2009 at 1:44 AM, Ulrich Spörlein <uqs@spoerlein.net> wrote:
> On Tue, 02.06.2009 at 11:24:08 +0200, Ulrich Spörlein wrote:
>> On Tue, 02.06.2009 at 11:16:10 +0200, Ulrich Spörlein wrote:
>> > Hi all,
>> >
>> > so I went ahead and updated my ~7.2 file server to the new ZFS goodness,
>> > and before running any further tests, I already discovered something
>> > weird and annoying.
>> >
>> > I'm using a mirror on GELI, where one disk is usually *not* attached as
>> > a means of poor man's backup. (I had to go that route, as send/recv of
>> > snapshots frequently deadlocked the system, whereas a mirror scrubbing
>> > did not)
>> >
>> > root@coyote:~# zpool status
>> >   pool: tank
>> >  state: DEGRADED
>> > status: The pool is formatted using an older on-disk format.  The pool can
>> >         still be used, but some features are unavailable.
>> > action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
>> >         pool will no longer be accessible on older software versions.
>> >  scrub: none requested
>> > config:
>> >
>> >         NAME                      STATE     READ WRITE CKSUM
>> >         tank                      DEGRADED     0     0     0
>> >           mirror                  DEGRADED     0     0     0
>> >             ad4.eli               ONLINE       0     0     0
>> >             12333765091756463941  REMOVED      0     0     0  was /dev/da0.eli
>> >
>> > errors: No known data errors
>> >
>> > When imported, there is a constant "tasting" of all devices in the system,
>> > which also makes the floppy drive go spinning constantly, which is really
>> > annoying. It did not do this with the old ZFS, are there any remedies?
>> >
>> > gstat(8) is displaying the following every other second, together with a
>> > spinning fd0 drive.
>> >
>> > dT: 1.010s  w: 1.000s  filter: ^...$
>> >  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>> >     0      0      0      0    0.0      0      0    0.0    0.0| fd0
>> >     0      8      8   1014    0.1      0      0    0.0    0.1| md0
>> >     0     32     32   4055    9.2      0      0    0.0   29.2| ad0
>> >     0     77     10   1267    7.1     63   1125    2.3   31.8| ad4
>> >
>> > There is no activity going on, especially md0 is for /tmp, yet it
>> > constantly tries to read stuff from everywhere. I will now insert the
>> > second drive and see if ZFS shuts up then ...
>>
>> It does, but it also did not start resilvering the second disk:
>>
>> root@coyote:~# zpool status
>>   pool: tank
>>  state: ONLINE
>> status: One or more devices has experienced an unrecoverable error.  An
>>         attempt was made to correct the error.  Applications are unaffected.
>> action: Determine if the device needs to be replaced, and clear the errors
>>         using 'zpool clear' or replace the device with 'zpool replace'.
>>    see: http://www.sun.com/msg/ZFS-8000-9P
>>  scrub: none requested
>> config:
>>
>>         NAME         STATE     READ WRITE CKSUM
>>         tank         ONLINE       0     0     0
>>           mirror     ONLINE       0     0     0
>>             ad4.eli  ONLINE       0     0     0
>>             da0.eli  ONLINE       0     0    16
>>
>> errors: No known data errors
>>
>> Will now run the scrub and report back in 6-9h.
>
> Another datapoint: While the floppy-tasting has stopped, since the mirror sees
> all devices again, there is some other problem here:
>
> root@coyote:/# zpool online tank da0.eli
> root@coyote:/# zpool status
>   pool: tank
>  state: ONLINE
>  scrub: resilver completed after 0h0m with 0 errors on Fri Jun  5 10:21:36 2009
> config:
>
>         NAME         STATE     READ WRITE CKSUM
>         tank         ONLINE       0     0     0
>           mirror     ONLINE       0     0     0
>             ad4.eli  ONLINE       0     0     0  684K resilvered
>             da0.eli  ONLINE       0     0     0  2.20M resilvered
>
> errors: No known data errors
> root@coyote:/# zpool offline tank da0.eli
> root@coyote:/# zpool status
>   pool: tank
>  state: DEGRADED
> status: One or more devices has been taken offline by the administrator.
>         Sufficient replicas exist for the pool to continue functioning in a
>         degraded state.
> action: Online the device using 'zpool online' or replace the device with
>         'zpool replace'.
>  scrub: resilver completed after 0h0m with 0 errors on Fri Jun  5 10:21:36 2009
> config:
>
>         NAME         STATE     READ WRITE CKSUM
>         tank         DEGRADED     0     0     0
>           mirror     DEGRADED     0     0     0
>             ad4.eli  ONLINE       0     0     0  684K resilvered
>             da0.eli  OFFLINE      0     0     0  2.20M resilvered
>
> errors: No known data errors
> root@coyote:/# zpool status
>   pool: tank
>  state: DEGRADED
> status: One or more devices has experienced an unrecoverable error.  An
>         attempt was made to correct the error.  Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
>         using 'zpool clear' or replace the device with 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-9P
>  scrub: resilver completed after 0h0m with 0 errors on Fri Jun  5 10:21:36 2009
> config:
>
>         NAME         STATE     READ WRITE CKSUM
>         tank         DEGRADED     0     0     0
>           mirror     DEGRADED     0     0     0
>             ad4.eli  ONLINE       0     0     0  684K resilvered
>             da0.eli  OFFLINE      0   339     0  2.20M resilvered
>
> errors: No known data errors
> root@coyote:/# zpool status
>   pool: tank
>  state: DEGRADED
> status: One or more devices has been taken offline by the administrator.
>         Sufficient replicas exist for the pool to continue functioning in a
>         degraded state.
> action: Online the device using 'zpool online' or replace the device with
>         'zpool replace'.
>  scrub: resilver completed after 0h0m with 0 errors on Fri Jun  5 10:21:36 2009
> config:
>
>         NAME         STATE     READ WRITE CKSUM
>         tank         DEGRADED     0     0     0
>           mirror     DEGRADED     0     0     0
>             ad4.eli  ONLINE       0     0     0  684K resilvered
>             da0.eli  OFFLINE      0     0     0  2.20M resilvered
>
> errors: No known data errors
>
>
> So I ran 'zpool status' thrice after the offline, and the second one reports
> write errors on the OFFLINE device (WTF?). Running zpool status in a loop, this
> will constantly show up and then vanish again.
>
> I also get constant write requests to the remaining device, even though no
> applications are accessing it. What the hell is ZFS trying to do here?
>
> root@coyote:/# zpool iostat 1
>                capacity     operations    bandwidth
> pool         used  avail   read  write   read  write
> ----------  -----  -----  -----  -----  -----  -----
> tank         883G  48.4G      8    246  56.8K  1.53M
> tank         883G  48.4G      8    249  55.9K  1.55M
> tank         883G  48.4G      8    250  55.0K  1.54M
> tank         883G  48.4G      8    252  54.1K  1.56M
> tank         883G  48.4G      8    254  53.3K  1.57M
> tank         883G  48.4G      8    253  52.5K  1.56M
> tank         883G  48.4G      7    255  51.7K  1.57M
> ^C
>
> Again, WTF? Can someone please enlighten me here?
>
> Cheers,
> Ulrich Spörlein
> --
> http://www.dubistterrorist.de/
>

-- 
When bad men combine, the good must associate; else they will fall one by
one, an unpitied sacrifice in a contemptible struggle.
                                                        Edmund Burke