Date:      Fri, 5 Jun 2009 10:44:23 +0200
From:      Ulrich Spörlein <uqs@spoerlein.net>
To:        Kip Macy <kmacy@FreeBSD.org>
Cc:        freebsd-stable@FreeBSD.org
Subject:   Re: ZFS weird device tasting loop since MFC
Message-ID:  <20090605084423.GA1609@acme.spoerlein.net>
In-Reply-To: <20090602092408.GF93344@acme.spoerlein.net>
References:  <20090602091610.GE93344@acme.spoerlein.net> <20090602092408.GF93344@acme.spoerlein.net>

On Tue, 02.06.2009 at 11:24:08 +0200, Ulrich Spörlein wrote:
> On Tue, 02.06.2009 at 11:16:10 +0200, Ulrich Spörlein wrote:
> > Hi all,
> > 
> > so I went ahead and updated my ~7.2 file server to the new ZFS goodness,
> > and before running any further tests, I already discovered something
> > weird and annoying.
> > 
> > I'm using a mirror on GELI, where one disk is usually *not* attached as
> > a means of poor man's backup. (I had to go that route, as send/recv of
> > snapshots frequently deadlocked the system, whereas a mirror scrubbing
> > did not)
> > 
> > root@coyote:~# zpool status
> >   pool: tank
> >  state: DEGRADED
> > status: The pool is formatted using an older on-disk format.  The pool can
> >         still be used, but some features are unavailable.
> > action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
> >         pool will no longer be accessible on older software versions.
> >  scrub: none requested
> > config:
> > 
> >         NAME                      STATE     READ WRITE CKSUM
> >         tank                      DEGRADED     0     0     0
> >           mirror                  DEGRADED     0     0     0
> >             ad4.eli               ONLINE       0     0     0
> >             12333765091756463941  REMOVED      0     0     0  was /dev/da0.eli
> > 
> > errors: No known data errors
> > 
> > When the pool is imported in this degraded state, there is a constant
> > "tasting" of every device in the system, which keeps the floppy drive
> > spinning and is really annoying. The old ZFS code did not do this; are
> > there any remedies?
> > 
> > gstat(8) is displaying the following every other second, together with a
> > spinning fd0 drive.
> > 
> > dT: 1.010s  w: 1.000s  filter: ^...$
> >  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
> >     0      0      0      0    0.0      0      0    0.0    0.0| fd0
> >     0      8      8   1014    0.1      0      0    0.0    0.1| md0
> >     0     32     32   4055    9.2      0      0    0.0   29.2| ad0
> >     0     77     10   1267    7.1     63   1125    2.3   31.8| ad4
> > 
> > There is no real activity going on (md0, for instance, only backs /tmp),
> > yet ZFS constantly tries to read from every device. I will now insert the
> > second drive and see if ZFS shuts up then ...
> 
> It does, but it did not start resilvering the second disk either:
> 
> root@coyote:~# zpool status
>   pool: tank
>  state: ONLINE
> status: One or more devices has experienced an unrecoverable error.  An
>         attempt was made to correct the error.  Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
>         using 'zpool clear' or replace the device with 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-9P
>  scrub: none requested
> config:
> 
>         NAME         STATE     READ WRITE CKSUM
>         tank         ONLINE       0     0     0
>           mirror     ONLINE       0     0     0
>             ad4.eli  ONLINE       0     0     0
>             da0.eli  ONLINE       0     0    16
> 
> errors: No known data errors
> 
> Will now run the scrub and report back in 6-9h.
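
For reference, the poor man's backup cycle described at the top of the thread (keeping one mirror half detached and periodically re-attaching it for a scrub) could be sketched roughly as below. The pool and device names are the ones from this thread; the `run` wrapper only echoes each command so the sequence reads as a dry run rather than something to paste in as root.

```shell
# Dry-run sketch of the detach/scrub/attach backup cycle. Replace the
# body of run() with "$@" to actually execute (as root, on FreeBSD).
run() { echo "# $*"; }

run geli attach /dev/da0           # bring up the GELI layer on the backup disk
run zpool online tank da0.eli      # re-attach the detached mirror half
run zpool scrub tank               # resilver/verify both halves of the mirror
run zpool offline tank da0.eli     # detach again once the scrub is clean
run geli detach da0.eli            # tear down the GELI layer
```

On a real system one would wait for `zpool status` to report the scrub as completed before taking the device offline again.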

Another data point: while the floppy tasting has stopped now that the mirror
sees all its devices again, there is another problem here:

root@coyote:/# zpool online tank da0.eli
root@coyote:/# zpool status
  pool: tank
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Fri Jun  5 10:21:36 2009
config:

        NAME         STATE     READ WRITE CKSUM
        tank         ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            ad4.eli  ONLINE       0     0     0  684K resilvered
            da0.eli  ONLINE       0     0     0  2.20M resilvered

errors: No known data errors
root@coyote:/# zpool offline tank da0.eli
root@coyote:/# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
 scrub: resilver completed after 0h0m with 0 errors on Fri Jun  5 10:21:36 2009
config:

        NAME         STATE     READ WRITE CKSUM
        tank         DEGRADED     0     0     0
          mirror     DEGRADED     0     0     0
            ad4.eli  ONLINE       0     0     0  684K resilvered
            da0.eli  OFFLINE      0     0     0  2.20M resilvered

errors: No known data errors
root@coyote:/# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed after 0h0m with 0 errors on Fri Jun  5 10:21:36 2009
config:

        NAME         STATE     READ WRITE CKSUM
        tank         DEGRADED     0     0     0
          mirror     DEGRADED     0     0     0
            ad4.eli  ONLINE       0     0     0  684K resilvered
            da0.eli  OFFLINE      0   339     0  2.20M resilvered

errors: No known data errors
root@coyote:/# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
 scrub: resilver completed after 0h0m with 0 errors on Fri Jun  5 10:21:36 2009
config:

        NAME         STATE     READ WRITE CKSUM
        tank         DEGRADED     0     0     0
          mirror     DEGRADED     0     0     0
            ad4.eli  ONLINE       0     0     0  684K resilvered
            da0.eli  OFFLINE      0     0     0  2.20M resilvered

errors: No known data errors


So I ran 'zpool status' three times after the offline, and the second run
reports write errors on the OFFLINE device (WTF?). When running 'zpool status'
in a loop, these errors constantly show up and then vanish again.
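
To make the flapping WRITE counter easier to spot, the per-device error columns can be pulled out of a status line by plain word-splitting (illustrative only; the sample line below is copied from the second status run above):

```shell
# Split one device line of 'zpool status' into its columns:
# NAME STATE READ WRITE CKSUM [extra notes]
line='            da0.eli  OFFLINE      0   339     0  2.20M resilvered'
set -- $line
echo "dev=$1 state=$2 read=$3 write=$4 cksum=$5"
```

On a live system the same splitting inside a `while read` loop over repeated `zpool status` runs would log exactly when the WRITE column goes nonzero and when it resets.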

I also get constant write requests to the remaining device, even though no
applications are accessing it. What the hell is ZFS trying to do here?

root@coyote:/# zpool iostat 1
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tank         883G  48.4G      8    246  56.8K  1.53M
tank         883G  48.4G      8    249  55.9K  1.55M
tank         883G  48.4G      8    250  55.0K  1.54M
tank         883G  48.4G      8    252  54.1K  1.56M
tank         883G  48.4G      8    254  53.3K  1.57M
tank         883G  48.4G      8    253  52.5K  1.56M
tank         883G  48.4G      7    255  51.7K  1.57M
^C
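
As a sanity check on the sample above (pure post-processing, not a ZFS command), averaging the write-operations column confirms the pool really is seeing roughly 250 write ops/s with no applications running:

```shell
# Average the 'write' operations column (field 5) of the captured
# 'zpool iostat 1' output; the first three lines are headers.
avg=$(awk 'NR > 3 { sum += $5; n++ } END { printf "%.0f", sum / n }' <<'EOF'
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tank         883G  48.4G      8    246  56.8K  1.53M
tank         883G  48.4G      8    249  55.9K  1.55M
tank         883G  48.4G      8    250  55.0K  1.54M
tank         883G  48.4G      8    252  54.1K  1.56M
tank         883G  48.4G      8    254  53.3K  1.57M
tank         883G  48.4G      8    253  52.5K  1.56M
tank         883G  48.4G      7    255  51.7K  1.57M
EOF
)
echo "average write ops/s: $avg"
```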

Again, WTF? Can someone please enlighten me here?

Cheers,
Ulrich Spörlein
-- 
http://www.dubistterrorist.de/
