From: Kip Macy
Date: Fri, 5 Jun 2009 04:28:38 -0700
To: Kip Macy, freebsd-stable@freebsd.org
Subject: Re: ZFS weird device tasting loop since MFC

Must be a weird geom interaction. I don't see this with raw disk. I'll
look at it eventually but UMA and performance are further up in the
queue.

-Kip

On Fri, Jun 5, 2009 at 1:44 AM, Ulrich Spörlein wrote:
> On Tue, 02.06.2009 at 11:24:08 +0200, Ulrich Spörlein wrote:
>> On Tue, 02.06.2009 at 11:16:10 +0200, Ulrich Spörlein wrote:
>> > Hi all,
>> >
>> > so I went ahead and updated my ~7.2 file server to the new ZFS goodness,
>> > and before running any further tests, I already discovered something
>> > weird and annoying.
>> >
>> > I'm using a mirror on GELI, where one disk is usually *not* attached as
>> > a means of poor man's backup. (I had to go that route, as send/recv of
>> > snapshots frequently deadlocked the system, whereas a mirror scrubbing
>> > did not)
>> >
>> > root@coyote:~# zpool status
>> >   pool: tank
>> >  state: DEGRADED
>> > status: The pool is formatted using an older on-disk format.  The pool can
>> >         still be used, but some features are unavailable.
>> > action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
>> >         pool will no longer be accessible on older software versions.
>> >  scrub: none requested
>> > config:
>> >
>> >         NAME                      STATE     READ WRITE CKSUM
>> >         tank                      DEGRADED     0     0     0
>> >           mirror                  DEGRADED     0     0     0
>> >             ad4.eli               ONLINE       0     0     0
>> >             12333765091756463941  REMOVED      0     0     0  was /dev/da0.eli
>> >
>> > errors: No known data errors
>> >
>> > When imported, there is a constant "tasting" of all devices in the system,
>> > which also makes the floppy drive go spinning constantly, which is really
>> > annoying. It did not do this with the old ZFS, are there any remedies?
>> >
>> > gstat(8) is displaying the following every other second, together with a
>> > spinning fd0 drive.
>> >
>> > dT: 1.010s  w: 1.000s  filter: ^...$
>> >  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>> >     0      0      0      0    0.0      0      0    0.0    0.0| fd0
>> >     0      8      8   1014    0.1      0      0    0.0    0.1| md0
>> >     0     32     32   4055    9.2      0      0    0.0   29.2| ad0
>> >     0     77     10   1267    7.1     63   1125    2.3   31.8| ad4
>> >
>> > There is no activity going on, especially md0 is for /tmp, yet it
>> > constantly tries to read stuff from everywhere. I will now insert the
>> > second drive and see if ZFS shuts up then ...
>>
>> It does, but it also did not start resilvering the second disk:
>>
>> root@coyote:~# zpool status
>>   pool: tank
>>  state: ONLINE
>> status: One or more devices has experienced an unrecoverable error.  An
>>         attempt was made to correct the error.  Applications are unaffected.
>> action: Determine if the device needs to be replaced, and clear the errors
>>         using 'zpool clear' or replace the device with 'zpool replace'.
>>    see: http://www.sun.com/msg/ZFS-8000-9P
>>  scrub: none requested
>> config:
>>
>>         NAME         STATE     READ WRITE CKSUM
>>         tank         ONLINE       0     0     0
>>           mirror     ONLINE       0     0     0
>>             ad4.eli  ONLINE       0     0     0
>>             da0.eli  ONLINE       0     0    16
>>
>> errors: No known data errors
>>
>> Will now run the scrub and report back in 6-9h.
>
> Another datapoint: While the floppy-tasting has stopped, since the mirror sees
> all devices again, there is some other problem here:
>
> root@coyote:/# zpool online tank da0.eli
> root@coyote:/# zpool status
>   pool: tank
>  state: ONLINE
>  scrub: resilver completed after 0h0m with 0 errors on Fri Jun  5 10:21:36 2009
> config:
>
>         NAME         STATE     READ WRITE CKSUM
>         tank         ONLINE       0     0     0
>           mirror     ONLINE       0     0     0
>             ad4.eli  ONLINE       0     0     0  684K resilvered
>             da0.eli  ONLINE       0     0     0  2.20M resilvered
>
> errors: No known data errors
> root@coyote:/# zpool offline tank da0.eli
> root@coyote:/# zpool status
>   pool: tank
>  state: DEGRADED
> status: One or more devices has been taken offline by the administrator.
>         Sufficient replicas exist for the pool to continue functioning in a
>         degraded state.
> action: Online the device using 'zpool online' or replace the device with
>         'zpool replace'.
>  scrub: resilver completed after 0h0m with 0 errors on Fri Jun  5 10:21:36 2009
> config:
>
>         NAME         STATE     READ WRITE CKSUM
>         tank         DEGRADED     0     0     0
>           mirror     DEGRADED     0     0     0
>             ad4.eli  ONLINE       0     0     0  684K resilvered
>             da0.eli  OFFLINE      0     0     0  2.20M resilvered
>
> errors: No known data errors
> root@coyote:/# zpool status
>   pool: tank
>  state: DEGRADED
> status: One or more devices has experienced an unrecoverable error.  An
>         attempt was made to correct the error.  Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
>         using 'zpool clear' or replace the device with 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-9P
>  scrub: resilver completed after 0h0m with 0 errors on Fri Jun  5 10:21:36 2009
> config:
>
>         NAME         STATE     READ WRITE CKSUM
>         tank         DEGRADED     0     0     0
>           mirror     DEGRADED     0     0     0
>             ad4.eli  ONLINE       0     0     0  684K resilvered
>             da0.eli  OFFLINE      0   339     0  2.20M resilvered
>
> errors: No known data errors
> root@coyote:/# zpool status
>   pool: tank
>  state: DEGRADED
> status: One or more devices has been taken offline by the administrator.
>         Sufficient replicas exist for the pool to continue functioning in a
>         degraded state.
> action: Online the device using 'zpool online' or replace the device with
>         'zpool replace'.
>  scrub: resilver completed after 0h0m with 0 errors on Fri Jun  5 10:21:36 2009
> config:
>
>         NAME         STATE     READ WRITE CKSUM
>         tank         DEGRADED     0     0     0
>           mirror     DEGRADED     0     0     0
>             ad4.eli  ONLINE       0     0     0  684K resilvered
>             da0.eli  OFFLINE      0     0     0  2.20M resilvered
>
> errors: No known data errors
>
>
> So I ran 'zpool status' thrice after the offline, and the second one reports
> write errors on the OFFLINE device (WTF?). Running zpool status in a loop, this
> will constantly show up and then vanish again.
>
> I also get constant write requests to the remaining device, even though no
> applications are accessing it. What the hell is ZFS trying to do here?
>
> root@coyote:/# zpool iostat 1
>                capacity     operations    bandwidth
> pool         used  avail   read  write   read  write
> ----------  -----  -----  -----  -----  -----  -----
> tank         883G  48.4G      8    246  56.8K  1.53M
> tank         883G  48.4G      8    249  55.9K  1.55M
> tank         883G  48.4G      8    250  55.0K  1.54M
> tank         883G  48.4G      8    252  54.1K  1.56M
> tank         883G  48.4G      8    254  53.3K  1.57M
> tank         883G  48.4G      8    253  52.5K  1.56M
> tank         883G  48.4G      7    255  51.7K  1.57M
> ^C
>
> Again, WTF? Can someone please enlighten me here?
>
> Cheers,
> Ulrich Spörlein
> --
> http://www.dubistterrorist.de/
>

-- 
When bad men combine, the good must associate; else they will fall one
by one, an unpitied sacrifice in a contemptible struggle.

    Edmund Burke
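
The quoted report mentions running 'zpool status' in a loop and seeing write errors on the OFFLINE device appear and then vanish. A rough sketch of how that could be captured with timestamps follows. It is not part of the original thread: the function names `offline_devices_with_errors` and `watch` are made up for illustration, and the parser assumes device rows have the plain `NAME STATE READ WRITE CKSUM` layout shown in the unquoted output above.

```python
import subprocess
import time


def offline_devices_with_errors(status_text):
    """Return (name, read, write, cksum) for OFFLINE devices with non-zero counters.

    Assumes unquoted `zpool status` output whose device rows look like
    "NAME STATE READ WRITE CKSUM [notes...]".
    """
    hits = []
    for line in status_text.splitlines():
        fields = line.split()
        # Skip anything that is not a device row in the OFFLINE state.
        if len(fields) < 5 or fields[1] != "OFFLINE":
            continue
        try:
            read, write, cksum = (int(f) for f in fields[2:5])
        except ValueError:
            # Abbreviated counters (e.g. "1.2K") are not handled in this sketch.
            continue
        if read or write or cksum:
            hits.append((fields[0], read, write, cksum))
    return hits


def watch(pool="tank", interval=1.0):
    """Poll `zpool status <pool>` and print a timestamped line for each hit."""
    while True:
        out = subprocess.run(
            ["zpool", "status", pool],
            capture_output=True, text=True, check=True,
        ).stdout
        for hit in offline_devices_with_errors(out):
            print(time.strftime("%H:%M:%S"), *hit)
        time.sleep(interval)
```

Calling `watch("tank")` as root would log each moment the counters on the offlined device are non-zero, which might help correlate the flapping with the constant writes seen in 'zpool iostat 1'.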