Date: Fri, 5 Jun 2009 10:44:23 +0200
From: Ulrich Spörlein <uqs@spoerlein.net>
To: Kip Macy <kmacy@FreeBSD.org>
Cc: freebsd-stable@FreeBSD.org
Subject: Re: ZFS weird device tasting loop since MFC
Message-ID: <20090605084423.GA1609@acme.spoerlein.net>
In-Reply-To: <20090602092408.GF93344@acme.spoerlein.net>
References: <20090602091610.GE93344@acme.spoerlein.net> <20090602092408.GF93344@acme.spoerlein.net>

On Tue, 02.06.2009 at 11:24:08 +0200, Ulrich Spörlein wrote:
> On Tue, 02.06.2009 at 11:16:10 +0200, Ulrich Spörlein wrote:
> > Hi all,
> >
> > so I went ahead and updated my ~7.2 file server to the new ZFS goodness,
> > and before running any further tests, I already discovered something
> > weird and annoying.
> >
> > I'm using a mirror on GELI, where one disk is usually *not* attached as
> > a means of poor man's backup. (I had to go that route, as send/recv of
> > snapshots frequently deadlocked the system, whereas a mirror scrubbing
> > did not)
> >
> > root@coyote:~# zpool status
> >   pool: tank
> >  state: DEGRADED
> > status: The pool is formatted using an older on-disk format.  The pool can
> >         still be used, but some features are unavailable.
> > action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
> >         pool will no longer be accessible on older software versions.
> >  scrub: none requested
> > config:
> >
> >         NAME                      STATE     READ WRITE CKSUM
> >         tank                      DEGRADED     0     0     0
> >           mirror                  DEGRADED     0     0     0
> >             ad4.eli               ONLINE       0     0     0
> >             12333765091756463941  REMOVED      0     0     0  was /dev/da0.eli
> >
> > errors: No known data errors
> >
> > When imported, there is a constant "tasting" of all devices in the system,
> > which also makes the floppy drive go spinning constantly, which is really
> > annoying. It did not do this with the old ZFS, are there any remedies?
> >
> > gstat(8) is displaying the following every other second, together with a
> > spinning fd0 drive.
> >
> > dT: 1.010s  w: 1.000s  filter: ^...$
> >  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
> >     0      0      0      0    0.0      0      0    0.0    0.0| fd0
> >     0      8      8   1014    0.1      0      0    0.0    0.1| md0
> >     0     32     32   4055    9.2      0      0    0.0   29.2| ad0
> >     0     77     10   1267    7.1     63   1125    2.3   31.8| ad4
> >
> > There is no activity going on, especially md0 is for /tmp, yet it
> > constantly tries to read stuff from everywhere. I will now insert the
> > second drive and see if ZFS shuts up then ...
>
> It does, but it also did not start resilvering the second disk:
>
> root@coyote:~# zpool status
>   pool: tank
>  state: ONLINE
> status: One or more devices has experienced an unrecoverable error.  An
>         attempt was made to correct the error.  Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
>         using 'zpool clear' or replace the device with 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-9P
>  scrub: none requested
> config:
>
>         NAME         STATE     READ WRITE CKSUM
>         tank         ONLINE       0     0     0
>           mirror     ONLINE       0     0     0
>             ad4.eli  ONLINE       0     0     0
>             da0.eli  ONLINE       0     0    16
>
> errors: No known data errors
>
> Will now run the scrub and report back in 6-9h.

Another datapoint: While the floppy-tasting has stopped, since the mirror
sees all devices again, there is some other problem here:

root@coyote:/# zpool online tank da0.eli
root@coyote:/# zpool status
  pool: tank
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Fri Jun  5 10:21:36 2009
config:

        NAME         STATE     READ WRITE CKSUM
        tank         ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            ad4.eli  ONLINE       0     0     0  684K resilvered
            da0.eli  ONLINE       0     0     0  2.20M resilvered

errors: No known data errors
root@coyote:/# zpool offline tank da0.eli
root@coyote:/# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
 scrub: resilver completed after 0h0m with 0 errors on Fri Jun  5 10:21:36 2009
config:

        NAME         STATE     READ WRITE CKSUM
        tank         DEGRADED     0     0     0
          mirror     DEGRADED     0     0     0
            ad4.eli  ONLINE       0     0     0  684K resilvered
            da0.eli  OFFLINE      0     0     0  2.20M resilvered

errors: No known data errors
root@coyote:/# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed after 0h0m with 0 errors on Fri Jun  5 10:21:36 2009
config:

        NAME         STATE     READ WRITE CKSUM
        tank         DEGRADED     0     0     0
          mirror     DEGRADED     0     0     0
            ad4.eli  ONLINE       0     0     0  684K resilvered
            da0.eli  OFFLINE      0   339     0  2.20M resilvered

errors: No known data errors
root@coyote:/# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
 scrub: resilver completed after 0h0m with 0 errors on Fri Jun  5 10:21:36 2009
config:

        NAME         STATE     READ WRITE CKSUM
        tank         DEGRADED     0     0     0
          mirror     DEGRADED     0     0     0
            ad4.eli  ONLINE       0     0     0  684K resilvered
            da0.eli  OFFLINE      0     0     0  2.20M resilvered

errors: No known data errors

So I ran 'zpool status' thrice after the offline, and the second one
reports write errors on the OFFLINE device (WTF?). Running zpool status
in a loop, this will constantly show up and then vanish again.

I also get constant write requests to the remaining device, even though
no applications are accessing it. What the hell is ZFS trying to do
here?

root@coyote:/# zpool iostat 1
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tank         883G  48.4G      8    246  56.8K  1.53M
tank         883G  48.4G      8    249  55.9K  1.55M
tank         883G  48.4G      8    250  55.0K  1.54M
tank         883G  48.4G      8    252  54.1K  1.56M
tank         883G  48.4G      8    254  53.3K  1.57M
tank         883G  48.4G      8    253  52.5K  1.56M
tank         883G  48.4G      7    255  51.7K  1.57M
^C

Again, WTF? Can someone please enlighten me here?

Cheers,
Ulrich Spörlein
-- 
http://www.dubistterrorist.de/