From owner-freebsd-stable@FreeBSD.ORG  Fri Jun  5 08:44:26 2009
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 085321065673;
	Fri, 5 Jun 2009 08:44:26 +0000 (UTC)
	(envelope-from uqs@spoerlein.net)
Received: from acme.spoerlein.net (cl-43.dus-01.de.sixxs.net [IPv6:2a01:198:200:2a::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 786368FC19;
	Fri, 5 Jun 2009 08:44:25 +0000 (UTC)
	(envelope-from uqs@spoerlein.net)
Received: from acme.spoerlein.net (localhost.spoerlein.net [127.0.0.1])
	by acme.spoerlein.net (8.14.3/8.14.3) with ESMTP id n558iNSm015007
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Fri, 5 Jun 2009 10:44:23 +0200 (CEST)
	(envelope-from uqs@spoerlein.net)
Received: (from uqs@localhost)
	by acme.spoerlein.net (8.14.3/8.14.3/Submit) id n558iNK0015006;
	Fri, 5 Jun 2009 10:44:23 +0200 (CEST)
	(envelope-from uqs@spoerlein.net)
Date: Fri, 5 Jun 2009 10:44:23 +0200
From: Ulrich Spörlein <uqs@spoerlein.net>
To: Kip Macy
Message-ID: <20090605084423.GA1609@acme.spoerlein.net>
Mail-Followup-To: Kip Macy, freebsd-stable@freebsd.org
References: <20090602091610.GE93344@acme.spoerlein.net>
	<20090602092408.GF93344@acme.spoerlein.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20090602092408.GF93344@acme.spoerlein.net>
User-Agent: Mutt/1.5.19 (2009-01-05)
Cc: freebsd-stable@FreeBSD.org
Subject: Re: ZFS weird device tasting loop since MFC
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
X-List-Received-Date: Fri, 05 Jun 2009 08:44:26 -0000

On Tue, 02.06.2009 at 11:24:08 +0200, Ulrich Spörlein wrote:
> On Tue, 02.06.2009 at 11:16:10 +0200, Ulrich Spörlein wrote:
> > Hi all,
> >
> > so I went ahead and updated my ~7.2 file server to the new ZFS
> > goodness, and before running any further tests, I already discovered
> > something weird and annoying.
> >
> > I'm using a mirror on GELI, where one disk is usually *not* attached
> > as a means of poor man's backup. (I had to go that route, as
> > send/recv of snapshots frequently deadlocked the system, whereas a
> > mirror scrubbing did not)
> >
> > root@coyote:~# zpool status
> >   pool: tank
> >  state: DEGRADED
> > status: The pool is formatted using an older on-disk format.  The
> >         pool can still be used, but some features are unavailable.
> > action: Upgrade the pool using 'zpool upgrade'.  Once this is done,
> >         the pool will no longer be accessible on older software
> >         versions.
> >  scrub: none requested
> > config:
> >
> >         NAME                      STATE     READ WRITE CKSUM
> >         tank                      DEGRADED     0     0     0
> >           mirror                  DEGRADED     0     0     0
> >             ad4.eli               ONLINE       0     0     0
> >             12333765091756463941  REMOVED      0     0     0  was /dev/da0.eli
> >
> > errors: No known data errors
> >
> > When imported, there is a constant "tasting" of all devices in the
> > system, which also makes the floppy drive go spinning constantly,
> > which is really annoying. It did not do this with the old ZFS, are
> > there any remedies?
> >
> > gstat(8) is displaying the following every other second, together
> > with a spinning fd0 drive.
> >
> > dT: 1.010s  w: 1.000s  filter: ^...$
> >  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
> >     0      0      0      0    0.0      0      0    0.0    0.0| fd0
> >     0      8      8   1014    0.1      0      0    0.0    0.1| md0
> >     0     32     32   4055    9.2      0      0    0.0   29.2| ad0
> >     0     77     10   1267    7.1     63   1125    2.3   31.8| ad4
> >
> > There is no activity going on, especially md0 is for /tmp, yet it
> > constantly tries to read stuff from everywhere. I will now insert
> > the second drive and see if ZFS shuts up then ...
>
> It does, but it also did not start resilvering the second disk:
>
> root@coyote:~# zpool status
>   pool: tank
>  state: ONLINE
> status: One or more devices has experienced an unrecoverable error.
>         An attempt was made to correct the error. Applications are
>         unaffected.
> action: Determine if the device needs to be replaced, and clear the
>         errors using 'zpool clear' or replace the device with 'zpool
>         replace'.
>    see: http://www.sun.com/msg/ZFS-8000-9P
>  scrub: none requested
> config:
>
>         NAME         STATE     READ WRITE CKSUM
>         tank         ONLINE       0     0     0
>           mirror     ONLINE       0     0     0
>             ad4.eli  ONLINE       0     0     0
>             da0.eli  ONLINE       0     0    16
>
> errors: No known data errors
>
> Will now run the scrub and report back in 6-9h.

Another datapoint: While the floppy-tasting has stopped, since the
mirror sees all devices again, there is some other problem here:

root@coyote:/# zpool online tank da0.eli
root@coyote:/# zpool status
  pool: tank
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Fri Jun  5 10:21:36 2009
config:

        NAME         STATE     READ WRITE CKSUM
        tank         ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            ad4.eli  ONLINE       0     0     0  684K resilvered
            da0.eli  ONLINE       0     0     0  2.20M resilvered

errors: No known data errors

root@coyote:/# zpool offline tank da0.eli
root@coyote:/# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning
        in a degraded state.
action: Online the device using 'zpool online' or replace the device
        with 'zpool replace'.
 scrub: resilver completed after 0h0m with 0 errors on Fri Jun  5 10:21:36 2009
config:

        NAME         STATE     READ WRITE CKSUM
        tank         DEGRADED     0     0     0
          mirror     DEGRADED     0     0     0
            ad4.eli  ONLINE       0     0     0  684K resilvered
            da0.eli  OFFLINE      0     0     0  2.20M resilvered

errors: No known data errors

root@coyote:/# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are
        unaffected.
action: Determine if the device needs to be replaced, and clear the
        errors using 'zpool clear' or replace the device with 'zpool
        replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed after 0h0m with 0 errors on Fri Jun  5 10:21:36 2009
config:

        NAME         STATE     READ WRITE CKSUM
        tank         DEGRADED     0     0     0
          mirror     DEGRADED     0     0     0
            ad4.eli  ONLINE       0     0     0  684K resilvered
            da0.eli  OFFLINE      0   339     0  2.20M resilvered

errors: No known data errors

root@coyote:/# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning
        in a degraded state.
action: Online the device using 'zpool online' or replace the device
        with 'zpool replace'.
 scrub: resilver completed after 0h0m with 0 errors on Fri Jun  5 10:21:36 2009
config:

        NAME         STATE     READ WRITE CKSUM
        tank         DEGRADED     0     0     0
          mirror     DEGRADED     0     0     0
            ad4.eli  ONLINE       0     0     0  684K resilvered
            da0.eli  OFFLINE      0     0     0  2.20M resilvered

errors: No known data errors

So I ran 'zpool status' thrice after the offline, and the second one
reports write errors on the OFFLINE device (WTF?). Running zpool status
in a loop, this will constantly show up and then vanish again.

I also get constant write requests to the remaining device, even though
no applications are accessing it. What the hell is ZFS trying to do
here?

root@coyote:/# zpool iostat 1
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tank         883G  48.4G      8    246  56.8K  1.53M
tank         883G  48.4G      8    249  55.9K  1.55M
tank         883G  48.4G      8    250  55.0K  1.54M
tank         883G  48.4G      8    252  54.1K  1.56M
tank         883G  48.4G      8    254  53.3K  1.57M
tank         883G  48.4G      8    253  52.5K  1.56M
tank         883G  48.4G      7    255  51.7K  1.57M
^C

Again, WTF? Can someone please enlighten me here?

Cheers,
Ulrich Spörlein

-- 
http://www.dubistterrorist.de/
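[Editor's note: the "poor man's backup" cycle described in the quoted
message (attach the normally-absent GELI-encrypted mirror half, let ZFS
resilver it, then take it offline again) could be sketched roughly as
below. This is a hypothetical sketch, not the poster's actual script:
pool and device names (tank, da0, da0.eli) are taken from the thread,
GELI keyfile/passphrase handling is omitted, and the grep string used to
poll for resilver completion is an assumption about the zpool status
wording of that era.]

```shell
#!/bin/sh
# Sketch of the detach/attach mirror-backup cycle from the thread.
# DRYRUN=1 (the default here) only prints the commands that would run,
# so the sequence can be inspected without touching any pool.
DRYRUN=${DRYRUN:-1}

run() {
    if [ "$DRYRUN" -eq 1 ]; then
        echo "$@"                 # show what would be executed
    else
        "$@"
    fi
}

POOL=tank
DEV=da0                           # the usually-detached mirror half

run geli attach /dev/$DEV         # creates /dev/da0.eli; normally prompts for a passphrase
run zpool online $POOL $DEV.eli   # re-add it to the mirror; resilvering starts
# Poll until zpool status no longer reports a resilver in progress
run sh -c "while zpool status $POOL | grep -q 'resilver in progress'; do sleep 60; done"
run zpool offline $POOL $DEV.eli  # take the backup half out of the pool again
run geli detach $DEV.eli          # and tear down the GELI device
```

With DRYRUN=0 the commands would actually execute; as the thread shows,
though, on that system the pool kept issuing writes and tasting devices
even after the offline, so the quiet steady-state this sketch assumes
did not hold there.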