From owner-freebsd-fs@FreeBSD.ORG Mon Jun 7 03:57:01 2010
From: Jurgen Weber <jurgen@ish.com.au>
Date: Mon, 07 Jun 2010 13:45:24 +1000
Subject: zfs filesystem problem

Hello

I have a FreeBSD 8.0-p2 system which runs two pools: one with six disks, all
mirrored, for our data, and another mirrored pool for the OS. The system has
16GB of RAM.

I have a nightly cron script which takes a snapshot of a particular file
system within the storage pool. This had been running for just over a month
without any issues, until this weekend.

Now we cannot access that file system. If we try to `ls` or `cd` into it,
the shell locks up (not even kill -9 can stop the `ls` processes, etc.) and
top shows the process state as `zfs`.

This file system is the root of a jail. While the jailed system works fine
right now, I cannot help but feel its time is limited.

Any suggestions on how to get this file system functioning normally again?

Thanks

Jurgen
-------------------------->
ish
http://www.ish.com.au
Level 1, 30 Wilson Street Newtown 2042 Australia
phone +61 2 9550 5001 fax +61 2 9550 4001
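The kind of nightly snapshot job described above usually amounts to a
one-line cron entry; a sketch, with the pool/dataset names and schedule
invented for illustration (not Jurgen's actual script):

    # Hypothetical /etc/crontab entry: snapshot the jail's dataset at 02:00
    # nightly. Note that % must be escaped as \% inside a crontab.
    0 2 * * * root /sbin/zfs snapshot storage/jail@$(date +\%Y\%m\%d)

When a dataset starts wedging processes in state `zfs`, two early things
worth checking are how many snapshots have accumulated (`zfs list -t
snapshot`) and whether the pool is close to full.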
From owner-freebsd-fs@FreeBSD.ORG Mon Jun 7 06:10:39 2010
From: Sergiy Suprun <sergiy.suprun@gmail.com>
Date: Mon, 7 Jun 2010 08:46:10 +0300
Subject: Re: zfs filesystem problem

On Mon, Jun 7, 2010 at 06:45, Jurgen Weber wrote:
> Now we cannot access that file system. If we try to `ls` or `cd` into it,
> the shell locks up (not even kill -9 can stop the `ls` processes, etc.)
> and top shows the process state as `zfs`.

Hello.
How about a scrub? Also, what size are your pools, and how much space is
used by data plus snapshots?
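Both suggestions map to standard commands; a sketch, with the pool name
assumed for illustration:

    # zpool scrub storage                             # start a scrub; watch it with zpool status
    # zpool list                                      # total size and capacity per pool
    # zfs list -r -o name,used,avail,refer storage    # per-dataset usage
    # zfs list -t snapshot                            # space held by individual snapshots

On the ZFS v13 code in 8.0-RELEASE, a snapshot's USED column only counts
space unique to that snapshot, so the total held by snapshots can be larger
than any single line suggests.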
From owner-freebsd-fs@FreeBSD.ORG Mon Jun 7 08:15:57 2010
From: Andriy Gapon <avg@icyb.net.ua>
Date: Mon, 07 Jun 2010 11:15:54 +0300
Subject: zfs i/o error, no driver error

During a recent zpool scrub one read error was detected and "128K repaired".

In the system log I see the following message:

ZFS: vdev I/O failure, zpool=tank
path=/dev/gptid/536c6f78-e4f3-11de-b9f8-001cc08221ff offset=284456910848
size=131072 error=5

On the other hand, there are no other errors -- nothing from geom, ahci, etc.
Why would that happen? What kind of error could this be?

Thanks!
-- 
Andriy Gapon
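Mapping a gptid label back to its physical provider is a useful first step
here; a sketch using glabel(8) (the ada1p4 result shown is inferred from
later messages in this thread, not stated in the original post):

    # glabel status | grep 536c6f78
    gptid/536c6f78-e4f3-11de-b9f8-001cc08221ff     N/A  ada1p4

With the device name known, driver logs and SMART data can be checked
against the same disk that ZFS is complaining about.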
From owner-freebsd-fs@FreeBSD.ORG Mon Jun 7 08:34:30 2010
From: Jeremy Chadwick
Date: Mon, 7 Jun 2010 01:34:28 -0700
Subject: Re: zfs i/o error, no driver error

On Mon, Jun 07, 2010 at 11:15:54AM +0300, Andriy Gapon wrote:
> During a recent zpool scrub one read error was detected and "128K repaired".
> [...]
> On the other hand, there are no other errors -- nothing from geom, ahci, etc.
> Why would that happen? What kind of error could this be?

I believe this indicates silent data corruption[1], which ZFS can
auto-correct if the pool is a mirror or raidz (otherwise it can detect the
problem but not fix it). This can happen for a lot of reasons, but tracking
down the source is often difficult. Usually it indicates the disk itself has
some kind of problem (cache going bad, sector remaps which didn't happen or
failed, etc.).

What I'd need to determine the cause:

- Full "zpool status tank" output before the scrub
- Full "zpool status tank" output after the scrub
- Full "smartctl -a /dev/XXX" output for all disk members of zpool "tank"

Furthermore, what made you decide to scrub the pool on a whim?

[1]: http://blogs.sun.com/elowe/entry/zfs_saves_the_day_ta
     http://blogs.sun.com/bonwick/entry/zfs_end_to_end_data
     http://blogs.sun.com/bonwick/entry/raid_z

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
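Collecting the requested output is mechanical; a sketch, with device names
assumed for a two-disk mirror:

    # zpool status -v tank > /tmp/zpool-status-after-scrub.txt
    # smartctl -a /dev/ada0 > /tmp/smart-ada0.txt
    # smartctl -a /dev/ada1 > /tmp/smart-ada1.txt

The "before" output only exists if it was captured in advance, which is an
argument for routinely dumping `zpool status` to a log from cron.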
From owner-freebsd-fs@FreeBSD.ORG Mon Jun 7 08:55:30 2010
From: Andriy Gapon <avg@icyb.net.ua>
Date: Mon, 07 Jun 2010 11:55:24 +0300
Subject: Re: zfs i/o error, no driver error

on 07/06/2010 11:34 Jeremy Chadwick said the following:
> I believe this indicates silent data corruption[1], which ZFS can
> auto-correct if the pool is a mirror or raidz (otherwise it can detect
> the problem but not fix it).

This pool is a mirror.

> This can happen for a lot of reasons, but tracking down the source is
> often difficult. Usually it indicates the disk itself has some kind of
> problem (cache going bad, sector remaps which didn't happen or failed,
> etc.).

Please note that this is not a CKSUM error, but a READ error.

> - Full "zpool status tank" output before the scrub

This was "all clear".

> - Full "zpool status tank" output after the scrub

zpool status -v
  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 5h0m with 0 errors on Sat Jun  5 05:05:43 2010
config:

        NAME                                          STATE  READ WRITE CKSUM
        tank                                          ONLINE    0     0     0
          mirror                                      ONLINE    0     0     0
            ada0p4                                    ONLINE    0     0     0
            gptid/536c6f78-e4f3-11de-b9f8-001cc08221ff  ONLINE  1     0     0  128K repaired

> - Full "smartctl -a /dev/XXX" output for all disk members of zpool "tank"

That output is "perfect" for both disks. I monitor them regularly; smartd is
also running and there are no complaints from it.

> Furthermore, what made you decide to scrub the pool on a whim?

Why on a whim? It was a regularly scheduled scrub (bi-weekly).

-- 
Andriy Gapon
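For reference, the "action" line in the status output above corresponds to a
single command once the error has been investigated; a sketch:

    # zpool clear tank gptid/536c6f78-e4f3-11de-b9f8-001cc08221ff

Run without a device argument, `zpool clear tank` resets the error counters
on every vdev in the pool.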
From owner-freebsd-fs@FreeBSD.ORG Mon Jun 7 09:08:52 2010
From: Jeremy Chadwick
Date: Mon, 7 Jun 2010 02:08:50 -0700
Subject: Re: zfs i/o error, no driver error

On Mon, Jun 07, 2010 at 11:55:24AM +0300, Andriy Gapon wrote:
> Please note that this is not a CKSUM error, but a READ error.

Okay, then it indicates that reading some data off the disk failed. ZFS
auto-corrected it by reading the data from the other member in the pool
(ada0p4). That's confirmed here:

> status: One or more devices has experienced an unrecoverable error.  An
>         attempt was made to correct the error.  Applications are unaffected.
> [...]
>             gptid/536c6f78-e4f3-11de-b9f8-001cc08221ff  ONLINE  1     0     0  128K repaired

> That output is "perfect" for both disks. I monitor them regularly;
> smartd is also running and there are no complaints from it.

Most people I know of do not know how to interpret SMART statistics, and
that's not their fault -- that's why I requested them. :-) In this case, I'd
like to see "smartctl -a" output for the disk that's associated with the
above GPT ID. There may be some attributes or data in the SMART error log
which could indicate what's going on. smartd does not know how to interpret
data; it just logs what it sees.

> Why on a whim? It was a regularly scheduled scrub (bi-weekly).

I'm still trying to figure out why people do this. ZFS will automatically
detect and correct errors of this sort when it encounters them during normal
operation. It's good that you caught an error ahead of time, but ZFS would
have dealt with this on its own.

It's important to remember that scrubs are *highly* intensive on both the
system itself and on all pool members. Disk I/O activity is very heavy
during a scrub; it's not considered "normal use".

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
From owner-freebsd-fs@FreeBSD.ORG Mon Jun 7 09:28:49 2010
From: Andriy Gapon <avg@icyb.net.ua>
Date: Mon, 07 Jun 2010 12:28:42 +0300
Subject: Re: zfs i/o error, no driver error

on 07/06/2010 12:08 Jeremy Chadwick said the following:
> Okay, then it indicates that reading some data off the disk failed. ZFS
> auto-corrected it by reading the data from the other member in the pool
> (ada0p4). That's confirmed here:

Yes, right, of course.
If you read my original post you'll see that my question was: why did ZFS
see an I/O error when the disk/controller/geom/etc. drivers didn't see one?
I do not see us moving towards an answer to that.

> Most people I know of do not know how to interpret SMART statistics, and
> that's not their fault -- that's why I requested them. :-)

I'll leave this without a comment.

> In this case, I'd like to see "smartctl -a" output for the disk that's
> associated with the above GPT ID. There may be some attributes or data
> in the SMART error log which could indicate what's going on.
repaired". >>>> >>>> In system log I see the following message: >>>> ZFS: vdev I/O failure, zpool=tank >>>> path=/dev/gptid/536c6f78-e4f3-11de-b9f8-001cc08221ff offset=284456910848 >>>> size=131072 error=5 >>>> >>>> On the other hand, there are no other errors, nothing from geom, ahci, etc. >>>> Why would that happen? What kind of error could this be? >>> I believe this indicates silent data corruption[1], which ZFS can >>> auto-correct if the pool is a mirror or raidz (otherwise it can detect >>> the problem but not fix it). >> This pool is a mirror. >> >>> This can happen for a lot of reasons, but >>> tracking down the source is often difficult. Usually it indicates the >>> disk itself has some kind of problem (cache going bad, some sector >>> remaps which didn't happen or failed, etc.). >> Please note that this is not a CKSUM error, but READ error. > > Okay, then it indicates reading some data off the disk failed. ZFS > auto-corrected it by reading the data from the other member in the pool > (ada0p4). That's confirmed here: Yes, right, of course. If you read my original post you'll see that my question was: why ZFS saw I/O error, but disk/controller/geom/etc driver didn't see it. I do not see us moving towards an answer to that. >> status: One or more devices has experienced an unrecoverable error. An >> attempt was made to correct the error. Applications are unaffected. >> >> NAME STATE READ WRITE CKSUM >> tank ONLINE 0 0 0 >> mirror ONLINE 0 0 0 >> ada0p4 ONLINE 0 0 0 >> gptid/536c6f78-e4f3-11de-b9f8-001cc08221ff ONLINE 1 0 0 128K repaired > >>> - Full "smartctl -a /dev/XXX" for all disk members of zpool "tank" >> Those output for both disks are "perfect". >> I monitor them regularly, also smartd is running and complaints from it. > > Most people I know if do not know how to interpret SMART statistics, and > that's not their fault -- and that's why I requested them. :-) I'll leave this without a comment. > In this > case, I'd like to see "smartctl -a" output for the disk that's > associated with the above GPT ID. There may be some attributes or data > in the SMART error log which could indicate what's going on. smartd > does not know how to interpret data; it just logs what it sees. $ smartctl -a /dev/ada1 smartctl 5.39.1 2010-01-28 r3054 [FreeBSD 8.1-PRERELEASE amd64] (local build) Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net === START OF INFORMATION SECTION === Model Family: Western Digital Caviar Blue Serial ATA family Device Model: WDC WD5000AAKS-00A7B2 Serial Number: WD-WMASY6905909 Firmware Version: 01.03B01 User Capacity: 500,107,862,016 bytes Device is: In smartctl database [for details use: -P show] ATA Version is: 8 ATA Standard is: Exact ATA specification draft version not indicated Local Time is: Mon Jun 7 11:53:50 2010 EEST SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x84) Offline data collection activity was suspended by an interrupting command from host. Auto Offline Data Collection: Enabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: (11160) seconds. Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. 
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 131) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   169   160   021    Pre-fail  Always       -       4516
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       53
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   086   086   000    Old_age   Always       -       10385
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       30
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       25
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       52
194 Temperature_Celsius     0x0022   102   088   000    Old_age   Always       -       45
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%         10331    -
# 2  Extended offline    Completed without error       00%         10237    -
# 3  Short offline       Completed without error       00%         10165    -
# 4  Short offline       Completed without error       00%          9999    -
# 5  Short offline       Completed without error       00%          9830    -
# 6  Short offline       Completed without error       00%          9662    -
# 7  Extended offline    Completed without error       00%          9496    -
# 8  Short offline       Completed without error       00%          9327    -
# 9  Short offline       Completed without error       00%          9159    -
#10  Short offline       Completed without error       00%          8992    -
#11  Short offline       Completed without error       00%          8824    -
#12  Extended offline    Completed without error       00%          8778    -
#13  Short offline       Completed without error       00%          8657    -
#14  Short offline       Completed without error       00%          8489    -
#15  Short offline       Completed without error       00%          8154    -
#16  Extended offline    Completed without error       00%          8036    -
#17  Short offline       Completed without error       00%          7986    -
#18  Short offline       Completed without error       00%          7819    -
#19  Short offline       Completed without error       00%          7651    -
#20  Extended offline    Completed without error       00%          7366    -
#21  Short offline       Completed without error       00%          7316    -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

-- 
Andriy Gapon
From owner-freebsd-fs@FreeBSD.ORG Mon Jun 7 10:38:32 2010
From: Jeremy Chadwick
Date: Mon, 7 Jun 2010 03:38:29 -0700
Subject: Re: zfs i/o error, no driver error

On Mon, Jun 07, 2010 at 12:28:42PM +0300, Andriy Gapon wrote:
> If you read my original post you'll see that my question was: why did
> ZFS see an I/O error when the disk/controller/geom/etc. drivers didn't
> see one? I do not see us moving towards an answer to that.

My understanding is that a "vdev I/O error" indicates some sort of
communication failure with a member in the pool, or some other layer within
FreeBSD (GEOM I think, like you said). I don't think there has to be a 1:1
ratio between vdev I/O errors and controller/disk errors.

For AHCI and storage controllers, I/O errors are messages that are returned
from the controller to the OS, or from the disk through the controller to
the OS. I suppose it's possible ZFS could be throwing an error for something
that isn't actually block/disk-level.

I'm interested to see what this turns out to be!

I agree that your SMART statistics look fine -- the only test that isn't
working is a manual or automatic offline data collection test, but this one
fails (gets aborted) pretty often when the system is in use. You can see
that here:

> Offline data collection status:  (0x84) Offline data collection activity
>                                         was suspended by an interrupting
>                                         command from host.
>                                         Auto Offline Data Collection: Enabled.

This is the test that "-t offline" induces (not -t short/long). It takes a
very long time to run, which is why it often gets aborted:

> Total time to complete Offline
> data collection:                 (11160) seconds.

That's the only thing that looks even remotely of concern with ada1, and
it's not even worth focusing on.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
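The offline data collection test Jeremy refers to can be started and watched
by hand; a sketch:

    # smartctl -t offline /dev/ada1    # begin offline data collection (~11160 s on this drive)
    # smartctl -c /dev/ada1            # re-check the "Offline data collection status" field

Unlike "-t short" or "-t long", this test updates the Offline-updated
attributes (e.g. Offline_Uncorrectable) and is easily preempted by ordinary
host I/O, which is why it so often shows up as suspended.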
From owner-freebsd-fs@FreeBSD.ORG Mon Jun 7 11:06:54 2010
From: FreeBSD bugmaster <owner-bugmaster@FreeBSD.org>
Date: Mon, 7 Jun 2010 11:06:53 GMT
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org

Note: to view an individual PR, use:
http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including experimental
development code and obsolete releases.

S Tracker      Resp. Description
--------------------------------------------------------------------------------
o kern/147420 fs [nfs] [panic] kldload nfs modules causes nfs-aware ker
o kern/147292 fs [nfs] [patch] readahead missing in nfs client options
o kern/146708 fs [ufs] [panic] Kernel panic in softdep_disk_write_compl
o kern/146528 fs [zfs] Severe memory leak in ZFS on i386
o kern/146502 fs [nfs] FreeBSD 8 NFS Client Connection to Server
o kern/146375 fs [nfs] [patch] Typos in macro variables names in sys/fs
o kern/145778 fs [zfs] [panic] panic in zfs_fuid_map_id (known issue fi
s kern/145712 fs [zfs] cannot offline two drives in a raidz2 configurat
s kern/145424 fs [zfs] [patch] move source closer to v15
o kern/145411 fs [xfs] [panic] Kernel panics shortly after mounting an
o kern/145309 fs [disklabel]: Editing disk label invalidates the whole
o kern/145272 fs [zfs] [panic] Panic during boot when accessing zfs on
o kern/145246 fs [ufs] dirhash in 7.3 gratuitously frees hashes when it
o kern/145238 fs [zfs] [panic] kernel panic on zpool clear tank
o kern/145229 fs [zfs] Vast differences in ZFS ARC behavior between 8.0
o kern/145189 fs [nfs] nfsd performs abysmally under load
o kern/144929 fs [ufs] [lor] vfs_bio.c + ufs_dirhash.c
o kern/144458 fs [nfs] [patch] nfsd fails as a kld
p kern/144447 fs [zfs] sharenfs fsunshare() & fsshare_main() non functi
o kern/144416 fs [panic] Kernel panic on online filesystem optimization
s kern/144415 fs [zfs] [panic] kernel panics on boot after zfs crash
o kern/144234 fs [zfs] Cannot boot machine with recent gptzfsboot code
o kern/143825 fs [nfs] [panic] Kernel panic on NFS client
o kern/143345 fs [ext2fs] [patch] extfs minor header cleanups to better
o kern/143212 fs [nfs] NFSv4 client strange work ...
o kern/143184 fs [zfs] [lor] zfs/bufwait LOR
o kern/142924 fs [ext2fs] [patch] Small cleanup for the inode struct in
o kern/142914 fs [zfs] ZFS performance degradation over time
o kern/142878 fs [zfs] [vfs] lock order reversal
o kern/142597 fs [ext2fs] ext2fs does not work on filesystems with real
o kern/142489 fs [zfs] [lor] allproc/zfs LOR
o kern/142466 fs Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re
o kern/142401 fs [ntfs] [patch] Minor updates to NTFS from NetBSD
o kern/142306 fs [zfs] [panic] ZFS drive (from OSX Leopard) causes two
o kern/142068 fs [ufs] BSD labels are got deleted spontaneously
o kern/141897 fs [msdosfs] [panic] Kernel panic. msdofs: file name leng
o kern/141463 fs [nfs] [panic] Frequent kernel panics after upgrade fro
o kern/141305 fs [zfs] FreeBSD ZFS+sendfile severe performance issues (
o kern/141091 fs [patch] [nullfs] fix panics with DIAGNOSTIC enabled
o kern/141086 fs [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS
o kern/141010 fs [zfs] "zfs scrub" fails when backed by files in UFS2
o kern/140888 fs [zfs] boot fail from zfs root while the pool resilveri
o kern/140661 fs [zfs] [patch] /boot/loader fails to work on a GPT/ZFS-
o kern/140640 fs [zfs] snapshot crash
o kern/140134 fs [msdosfs] write and fsck destroy filesystem integrity
o kern/140068 fs [smbfs] [patch] smbfs does not allow semicolon in file
o kern/139725 fs [zfs] zdb(1) dumps core on i386 when examining zpool c
o kern/139715 fs [zfs] vfs.numvnodes leak on busy zfs
o bin/139651  fs [nfs] mount(8): read-only remount of NFS volume does n
o kern/139597 fs [patch] [tmpfs] tmpfs initializes va_gen but doesn't u
o kern/139564 fs [zfs] [panic] 8.0-RC1 - Fatal trap 12 at end of shutdo
o kern/139407 fs [smbfs] [panic] smb mount causes system crash if remot
o kern/139363 fs [nfs] diskless root nfs mount from non FreeBSD server
o kern/138790 fs [zfs] ZFS ceases caching when mem demand is high
o kern/138421 fs [ufs] [patch] remove UFS label limitations
o kern/138202 fs mount_msdosfs(1) see only 2Gb
f kern/137037 fs [zfs] [hang] zfs rollback on root causes FreeBSD to fr
o kern/136968 fs [ufs] [lor] ufs/bufwait/ufs (open)
o kern/136945 fs [ufs] [lor] filedesc structure/ufs (poll)
o kern/136944 fs [ffs] [lor] bufwait/snaplk (fsync)
o kern/136873 fs [ntfs] Missing directories/files on NTFS volume
o kern/136865 fs [nfs] [patch] NFS exports atomic and on-the-fly atomic
o kern/136470 fs [nfs] Cannot mount / in read-only, over NFS
o kern/135546 fs [zfs] zfs.ko module doesn't ignore zpool.cache filenam
o kern/135469 fs [ufs] [panic] kernel crash on md operation in ufs_dirb
o kern/135050 fs [zfs] ZFS clears/hides disk errors on reboot
o kern/134491 fs [zfs] Hot spares are rather cold...
o kern/133676 fs [smbfs] [panic] umount -f'ing a vnode-based memory dis
o kern/133614 fs [panic] panic: ffs_truncate: read-only filesystem
o kern/133174 fs [msdosfs] [patch] msdosfs must support utf-encoded int
f kern/133150 fs [zfs] Page fault with ZFS on 7.1-RELEASE/amd64 while w
o kern/132960 fs [ufs] [panic] panic:ffs_blkfree: freeing free frag
o kern/132397 fs reboot causes filesystem corruption (failure to sync b
o kern/132331 fs [ufs] [lor] LOR ufs and syncer
o kern/132237 fs [msdosfs] msdosfs has problems to read MSDOS Floppy
o kern/132145 fs [panic] File System Hard Crashes
o kern/131441 fs [unionfs] [nullfs] unionfs and/or nullfs not combineab
o kern/131360 fs [nfs] poor scaling behavior of the NFS server under lo
o kern/131342 fs [nfs] mounting/unmounting of disks causes NFS to fail
o bin/131341  fs makefs: error "Bad file descriptor" on the mount poin
o kern/130979 fs [smbfs] [panic] boot/kernel/smbfs.ko
o kern/130920 fs [msdosfs] cp(1) takes 100% CPU time while copying file
o kern/130229 fs [iconv] usermount fails on fs that need iconv
o kern/130210 fs [nullfs] Error by check nullfs
o kern/129760 fs [nfs] after 'umount -f' of a stale NFS share FreeBSD l
o kern/129488 fs [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c:
o kern/129231 fs [ufs] [patch] New UFS mount (norandom) option - mostly
o kern/129152 fs [panic] non-userfriendly panic when trying to mount(8)
o kern/129059 fs [zfs] [patch] ZFS bootloader whitelistable via WITHOUT
f kern/128829 fs smbd(8) causes periodic panic on 7-RELEASE
o kern/127420 fs [gjournal] [panic] Journal overflow on gmirrored gjour
o kern/127029 fs [panic] mount(8): trying to mount a write protected zi
o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file
o kern/125895 fs [ffs] [panic] kernel: panic: ffs_blkfree: freeing free
s kern/125738 fs [zfs] [request] SHA256 acceleration in ZFS
p kern/124621 fs [ext3] [patch] Cannot mount ext2fs partition
f bin/124424  fs [zfs] zfs(8): zfs list -r shows strange snapshots' siz
o kern/123939 fs [msdosfs] corrupts new files
o kern/122380 fs [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash
o bin/122172  fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o bin/121898  fs [nullfs] pwd(1)/getcwd(2) fails with Permission denied
o bin/121779  fs [ufs] snapinfo(8) (and related tools?) only work for t
o bin/121366  fs [zfs] [patch] Automatic disk scrubbing from periodic(8
o bin/121072  fs [smbfs] mount_smbfs(8) cannot normally convert the cha
f kern/120991 fs [panic] [fs] [snapshot] System crashes when manipulati
o kern/120483 fs [ntfs] [patch] NTFS filesystem locking changes
o kern/120482 fs [ntfs] [patch] Sync style changes between NetBSD and F
f kern/119735 fs [zfs] geli + ZFS + samba starting on boot panics 7.0-B
o kern/118912 fs [2tb] disk sizing/geometry problem with large array
o kern/118713 fs [minidump] [patch] Display media size required for a k
o bin/118249  fs mv(1): moving a directory changes its mtime
o kern/118107 fs [ntfs] [panic] Kernel panic when accessing a file at N
o bin/117315  fs [smbfs] mount_smbfs(8) and related options can't mount
o kern/117314 fs [ntfs] Long-filename only NTFS fs'es cause kernel pani
o kern/117158 fs [zfs] zpool scrub causes panic if geli vdevs detach on
o bin/116980  fs [msdosfs] [patch] mount_msdosfs(8) resets some flags f
o conf/116931 fs lack of fsck_cd9660 prevents mounting iso images with
o kern/116913 fs [ffs] [panic] ffs_blkfree: freeing free block
p kern/116608 fs [msdosfs] [patch] msdosfs fails to check mount options
o kern/116583 fs [ffs] [hang] System freezes for short time when using
o kern/116170 fs [panic] Kernel panic when mounting /tmp
o kern/115645 fs [snapshots] [panic] lockmgr: thread 0xc4c00d80, not ex
o bin/115361  fs [zfs] mount(8) gets into a state where it won't set/un
o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui
o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala
o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o bin/114468  fs [patch] [request] add -d option to umount(8) to detach
o kern/113852 fs [smbfs] smbfs does not properly implement DFS referral
o bin/113838  fs [patch] [request] mount(8): add support for relative p
o bin/113049  fs [patch] [request] make quot(8) use getopt(3) and show
o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b
o kern/111843 fs [msdosfs] Long Names of files are incorrectly created
o kern/111782 fs [ufs] dump(8) fails horribly for large filesystems
s bin/111146  fs [2tb] fsck(8) fails on 6T filesystem
o kern/109024 fs [msdosfs] mount_msdosfs: msdosfs_iconv: Operation not
o kern/109010 fs [msdosfs] can't mv directory within fat32 file system
o bin/107829  fs [2TB] fdisk(8): invalid boundary checking in fdisk / w
o kern/106107 fs [ufs] left-over fsck_snapshot after unfinished backgro
o kern/106030 fs [ufs] [panic] panic in ufs from geom when a dead disk
o kern/104406 fs [ufs] Processes get stuck in "ufs" state under persist
o kern/104133 fs [ext2fs] EXT2FS module corrupts EXT2/3 filesystems
o kern/103035 fs [ntfs] Directories in NTFS mounted disc images appear
o kern/101324 fs [smbfs] smbfs sometimes not case sensitive when it's s
o kern/99290  fs [ntfs] mount_ntfs ignorant of cluster sizes
o kern/97377  fs [ntfs] [patch] syntax cleanup for ntfs_ihash.c
o kern/95222  fs [iso9660] File sections on ISO9660 level 3 CDs ignored
o kern/94849  fs [ufs] rename on UFS filesystem is not atomic
o kern/94769  fs [ufs] Multiple file deletions on multi-snapshotted fil
o kern/94733  fs [smbfs] smbfs may cause double unlock
o kern/93942  fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D
o kern/92272  fs [ffs] [hang] Filling a filesystem while creating a sna
f kern/91568  fs [ufs] [panic] writing to UFS/softupdates DVD media in
o kern/91134  fs [smbfs] [patch] Preserve access and modification time
a kern/90815  fs [smbfs] [patch] SMBFS with character conversions somet
o kern/88657  fs [smbfs] windows client hang when browsing a samba shar
o kern/88266  fs [smbfs] smbfs does not implement UIO_NOCOPY and sendfi
o kern/87859  fs [smbfs] System reboot while umount smbfs.
o kern/86587  fs [msdosfs] rm -r /PATH fails with lots of small files
o kern/85326  fs [smbfs] [panic] saving a file via samba to an overquot
o kern/84589  fs [2TB] 5.4-STABLE unresponsive during background fsck 2
o kern/80088  fs [smbfs] Incorrect file time setting on NTFS mounted vi
o kern/73484  fs [ntfs] Kernel panic when doing `ls` from the client si
o bin/73019   fs [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino
o kern/71774  fs [ntfs] NTFS cannot "see" files on a WinXP filesystem
o kern/68978  fs [panic] [ufs] crashes with failing hard disk, loose po
o kern/65920  fs [nwfs] Mounted Netware filesystem behaves strange
o kern/65901  fs [smbfs] [patch] smbfs fails fsx write/truncate-down/tr
o kern/61503  fs [smbfs] mount_smbfs does not work as non-root
o kern/55617  fs [smbfs] Accessing an nsmb-mounted drive via a smb expo
o kern/53137  fs [ffs] [panic] background fscking causing ffs_valloc pa
o kern/51685  fs [hang] Unbounded inode allocation causes kernel to loc
o kern/51583  fs [nullfs] [patch] allow to work with devices and socket
o kern/36566  fs [smbfs] System reboot with dead smb mount and umount
o kern/33464  fs [ufs] soft update inconsistencies after system crash
o kern/18874  fs [2TB] 32bit NFS servers export wrong negative values t

175 problems total.

From owner-freebsd-fs@FreeBSD.ORG Mon Jun 7 11:12:19 2010
From: Martin Simmons <martin@lispworks.com>
Date: Mon, 7 Jun 2010 12:12:16 +0100
Subject: Re: zfs i/o error, no driver error

>>>>> On Mon, 7 Jun 2010 02:08:50 -0700, Jeremy Chadwick said:
> I'm still trying to figure out why people do this.

Maybe because the ZFS Best Practices Guide suggests it? ("Run zpool scrub on
a regular basis to identify data integrity problems...")

It makes sense to detect errors while there is still a healthy mirror,
rather than waiting until two drives are failing :-)

> It's important to remember that scrubs are *highly* intensive on both
> the system itself and on all pool members. Disk I/O activity is very
> heavy during a scrub; it's not considered "normal use".

Is it worse than a full backup? I guess scrub does read all drives, but OTOH
backup will typically read all data non-linearly, which adds a different
kind of stress.

__Martin
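For anyone wanting to follow the Best Practices recommendation, a plain cron
entry is enough on FreeBSD 8.x; there is no periodic(8) knob for this yet
(see bin/121366 in the PR list above). Pool name and interval here are
illustrative:

    # /etc/crontab -- weekly scrub, Sunday 03:00
    0 3 * * 0 root /sbin/zpool scrub tank

`zpool scrub` returns immediately; progress and results appear in the
`zpool status` output.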
("Run zpool scrub on a regular basis to identify data integrity problems...") It makes sense to detect errors when there is still a healthy mirror, rather than waiting until two drives are failing :-) > It's important to remember that scrubs are *highly* intensive on both > the system itself as well as on all pool members. Disk I/O activity is > very heavy during a scrub; it's not considered "normal use". Is it worse that a full backup? I guess scrub does read all drives, but OTOH backup will typically read all data non-linearly, which adds a different kind of stress. __Martin From owner-freebsd-fs@FreeBSD.ORG Mon Jun 7 11:43:37 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C0C4E106566B for ; Mon, 7 Jun 2010 11:43:37 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 10DDF8FC0C for ; Mon, 7 Jun 2010 11:43:36 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id OAA23524; Mon, 07 Jun 2010 14:43:33 +0300 (EEST) (envelope-from avg@icyb.net.ua) Message-ID: <4C0CDB64.6090304@icyb.net.ua> Date: Mon, 07 Jun 2010 14:43:32 +0300 From: Andriy Gapon User-Agent: Thunderbird 2.0.0.24 (X11/20100517) MIME-Version: 1.0 To: Jeremy Chadwick References: <4C0CAABA.2010506@icyb.net.ua> <20100607083428.GA48419@icarus.home.lan> <4C0CB3FC.8070001@icyb.net.ua> <20100607090850.GA49166@icarus.home.lan> <4C0CBBCA.3050304@icyb.net.ua> <20100607103829.GA50106@icarus.home.lan> In-Reply-To: <20100607103829.GA50106@icarus.home.lan> X-Enigmail-Version: 0.95.7 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: zfs i/o error, no driver error X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jun 2010 11:43:37 -0000 on 07/06/2010 13:38 Jeremy Chadwick said the following: > My understanding is that a "vdev I/O error" indicates some sort of > communication failure with a member in the pool, or some other layer > within FreeBSD (GEOM I think, like you said). I don't think there has > to be a 1:1 ratio between vdev I/O errors and controller/disk errors. > > For AHCI and storage controllers, I/O errors are messages that are > returned from the controller to the OS, or from the disk through the > controller to the OS. I suppose it's possible ZFS could be throwing > an error for something that isn't actually block/disk-level. > > I'm interested to see what this turns out to be! Yes, me too :) I skimmed through the sources and so far I see at least two possibilities: 1) Decompression error for a filesystem with compression. Again, I don't know why that could happen if there are no checksum errors or hardware errors. 2) Successful but short read from disk. Same thing - I don't know why that could happen. And I am sure that there are other possibilities too. 
From owner-freebsd-fs@FreeBSD.ORG Mon Jun 7 12:19:56 2010
From: Jeremy Chadwick
Date: Mon, 7 Jun 2010 05:19:54 -0700
Subject: Re: zfs i/o error, no driver error

On Mon, Jun 07, 2010 at 12:12:16PM +0100, Martin Simmons wrote:
> Maybe because the ZFS Best Practices Guide suggests it? ("Run zpool
> scrub on a regular basis to identify data integrity problems...")
>
> It makes sense to detect errors while there is still a healthy mirror,
> rather than waiting until two drives are failing :-)

The official quote from the ZFS Best Practices Guide[1] is:

"Run zpool scrub on a regular basis to identify data integrity problems. If
you have consumer-quality drives, consider a weekly scrubbing schedule. If
you have datacenter-quality drives, consider a monthly scrubbing schedule."

The first line of the paragraph seems reasonable; the concept being, do this
process often so that you catch potential data-threatening errors before
your entire pool explodes. Cool, I can accept that, but it gets us into a
discussion about how often this is necessary (keep reading for more on
that).

However, the second part of the paragraph -- total rubbish.
"Datacenter-quality drives?" Oh, I think they mean "enterprise-grade
drives", which really don't offer much more than high-end consumer-grade
drives at this point in time[2]. One of the key points of ZFS's creation was
to provide a reliable filesystem using cheap disks[3][4].

The only thing I can find in the ZFS Administration Guide[5] is this:
"The simplest way to check your data integrity is to initiate an explicit
scrubbing of all data within the pool. This operation traverses all the data
in the pool once and verifies that all blocks can be read. Scrubbing
proceeds as fast as the devices allow, though the priority of any I/O
remains below that of normal operations. This operation might negatively
impact performance, though the file system should remain usable and nearly
as responsive while the scrubbing occurs."

"Performing routine scrubbing also guarantees continuous I/O to all disks on
the system. Routine scrubbing has the side effect of preventing power
management from placing idle disks in low-power mode. If the system is
generally performing I/O all the time, or if power consumption is not a
concern, then this issue can safely be ignored."

What's confusing about this is the phrase that pool verification is done by
"verifying all the blocks can be read". Doesn't that happen when a standard
read operation comes down the pipe for a file? What I'm getting at is that
there's no explanation (that I can find) which states why scrubbing
regularly "ensures" anything, other than allowing a person to see an error
sooner rather than later. Which brings us to the topic of scrub interval...

This exact question was asked on the ZFS OpenSolaris list[6] in late 2008,
and nobody there provided any concrete evidence either. The closest thing to
evidence is this: "...in normal operation, ZFS only checks data as it's read
back from the disks. If you don't periodically scrub, errors that happen
over time won't be caught until I next read that actual data, which might be
inconvenient if it's a long since the initial data was written".

The topic of scrub intervals was also brought up a month later[7]. Someone
said: "We did a study on re-write scrubs which showed that once per year was
a good interval for modern, enterprise-class disks. However, ZFS does a
read-only scrub, so you might want to scrub more often". The first part
conflicts with what the guide recommends (I'd also like to see the results
of the study!), while the last half of the paragraph makes no sense
("because it reads, do it more often!"). So if you take the first sentence
and apply it to what the ZFS Best Practices Guide says, you come out with...
"scrub consumer-grade disks every 6 months".

In the same thread, we have this quote from a different person: "Even that
is probably more frequent than necessary. I'm sure somebody has done the
MTTDL math. IIRC, the big win is doing any scrubbing at all. The difference
between scrubbing every 2 weeks and every 2 months may be negligible.
(IANAMathematician tho)"

So the justification seems, well, unjustified. It's almost as if, because
the filesystem is new, there's an underlying sense of paranoia, so everyone
scrubs often. I understand the "pre-emptive" argument, just not the
technical argument.

So how often do *I* scrub our pools? Rarely. I tend to look at SMART stats
much more aggressively; "uh oh, uncorrected sector, better scrub..." Or if
while using the system it feels sluggish on I/O, or cron jobs take way
longer than they need to.

> > It's important to remember that scrubs are *highly* intensive on both
> > the system itself and on all pool members. Disk I/O activity is very
> > heavy during a scrub; it's not considered "normal use".
>
> Is it worse than a full backup? I guess scrub does read all drives, but
> OTOH backup will typically read all data non-linearly, which adds a
> different kind of stress.

I'd guess it'd depend greatly on the type of backup. I'd imagine that a ZFS
snapshot (non-incremental) + zfs send would be less intensive than a scrub,
and the same (but even more so) with an incremental snapshot.
I'd imagine rsync/tar/cp/etc. would be somewhere in-between. I don't use ZFS snapshots because I don't know if they've stabilised on FreeBSD. [1]: http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Storage_Pools [2]: http://lists.freebsd.org/pipermail/freebsd-fs/2010-May/008508.html [3]: http://blogs.sun.com/bonwick/entry/zfs_end_to_end_data [4]: http://www.sun.com/software/solaris/zfs_lc_preso.pdf [5]: http://docs.sun.com/app/docs/doc/819-5461/gbbwa?l=en&a=view [6]: http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg20995.html [7]: http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg21728.html [8]: http://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSPeriodicScrubbing -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Mon Jun 7 13:22:09 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 83C101065676 for ; Mon, 7 Jun 2010 13:22:09 +0000 (UTC) (envelope-from jhellenthal@gmail.com) Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com [209.85.214.182]) by mx1.freebsd.org (Postfix) with ESMTP id D39B48FC1F for ; Mon, 7 Jun 2010 13:22:08 +0000 (UTC) Received: by iwn5 with SMTP id 5so4102027iwn.13 for ; Mon, 07 Jun 2010 06:22:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:sender:message-id:date:from :user-agent:mime-version:to:cc:subject:references:in-reply-to :x-enigmail-version:openpgp:content-type:content-transfer-encoding; bh=xYeM61mLdIcJ1eV7H9kWRIrPWMEskPThLMcodtAhMrw=; b=It4gEZIAx+Qxrt4ai9tIdQAQAjVP+TgMuVRanBJYSnGC5QACeLkcuuzDflzGZN4F1V g6k1boIihKlOBcrJqGvWaJgiReIruWKl2OoAqTWjgjOj11gJC4cDMwglwQO9Ool00kds m+XePBR2z2U+kRK3oiI97w7If2UQB+wf/V9IY= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:x-enigmail-version:openpgp:content-type :content-transfer-encoding; b=cp2Fb3nq7ZCQRPq6/GwmNRwch/aVR7vD9Zg1NaxE/vp+QtilgAuSA4tdImk7q2/f/E t0q8QUPwYnYP7srjuiim5sgNyoXJMhSz0+WvHyFxFvzbG56LcgS/ji13k5w/th8zCdUH ZiahzcND0G1SwZz21h/Ow3w0yY5Wi4OI7fyrA= Received: by 10.231.125.87 with SMTP id x23mr17307915ibr.88.1275916927560; Mon, 07 Jun 2010 06:22:07 -0700 (PDT) Received: from centel.dataix.local (adsl-99-181-128-180.dsl.klmzmi.sbcglobal.net [99.181.128.180]) by mx.google.com with ESMTPS id f1sm20702856ibg.21.2010.06.07.06.22.04 (version=SSLv3 cipher=RC4-MD5); Mon, 07 Jun 2010 06:22:05 -0700 (PDT) Sender: "J.
Hellenthal" Message-ID: <4C0CF27B.1050402@dataix.net> Date: Mon, 07 Jun 2010 09:22:03 -0400 From: jhell User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.9.1.9) Gecko/20100515 Thunderbird MIME-Version: 1.0 To: Sergiy Suprun References: <4C0C6B54.8020005@ish.com.au> In-Reply-To: X-Enigmail-Version: 1.0.1 OpenPGP: id=89D8547E Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: zfs filesystem problem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jun 2010 13:22:09 -0000 On 06/07/2010 01:46, Sergiy Suprun wrote: > On Mon, Jun 7, 2010 at 06:45, Jurgen Weber wrote: > >> Hello >> >> I have a FreeBSD 8.0-p2 system, which runs two pools. One with 6 disks all >> mirrored for our data and another mirrored pool for the OS. The system has >> 16GB of RAM. >> >> I have a nightly cron script running which takes a snapshot of a particular >> file system within the storage pool. This has been running for just over a >> month now without any issues until this weekend. >> >> Now we can not access the mentioned file system. If we try to `ls` to it or >> `cd` into it the shell locks up (not even kill -9 can stop the `ls` >> processes, etc) and top shows that the process state is `zfs`. This is most likely caused by some bugs that were found and fixed in stable/8. One of the commits that mm@ made has touched that zio->iowait that you should see your processes are stuck in. There still seems "at least in my case" some zio->iowait problems going on but I have not pinned that down to the cause yet, but they have not caused any of my system proccesses to freeze in that state. Grab a kernel from one of the snapshots that were made sometime last month to test this out just to be sure so your not upgrading for no reason. When I say kernel I mean kernel & modules that go with it as ZFS is a module and you will obviously need that. Please report back on your findings if the kernel from stable fixed your problem. URL to retrieve snapshots: http://bit.ly/aLoXXV Good Luck!, > Hello. > How about scrub ? > And which size of your pools and how many place used by data+snapshots? . 
-- jhell From owner-freebsd-fs@FreeBSD.ORG Mon Jun 7 13:31:25 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 91C68106567B for ; Mon, 7 Jun 2010 13:31:25 +0000 (UTC) (envelope-from tevans.uk@googlemail.com) Received: from mail-fx0-f54.google.com (mail-fx0-f54.google.com [209.85.161.54]) by mx1.freebsd.org (Postfix) with ESMTP id 2255D8FC1F for ; Mon, 7 Jun 2010 13:31:24 +0000 (UTC) Received: by fxm20 with SMTP id 20so2493393fxm.13 for ; Mon, 07 Jun 2010 06:31:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=ElPHcB02FihrllgSxW0It4b4Wp1czmGJo0ZJDDmPOCo=; b=BwktnhfGURh89CjKAoSKAOPuNOuVIUuJGXLxwv+ufolJrPGV3qZHEYBj3pvVNsa3j7 Vo03yBYBd5OH7DSRDyhKbll+KqcW4GjO49lVRyTqtNZRELk8iSlZ/XG2kXBjtbsjidW3 i+PEuvMgM4gEfh1LAW097Rs+dJWYpgw5I/Dbk= DomainKey-Signature: a=rsa-sha1; c=nofws; d=googlemail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=sWt6hnyzxlH9J/E2q4SXhgPqizkYfirbDxuW8qIyvzRNQehxHfcegEFbH7BwV26HT5 j5Ty4d28wkDM4soZBR00FuV7U4cOJWJ+qhMCSXFco9rNDoLXI0pw2XhauA5LFPKjgwa7 bo7QwBcNe3jRVVcagLPoGUq5FD0b7o3bmIxKs= MIME-Version: 1.0 Received: by 10.239.185.72 with SMTP id b8mr984872hbh.99.1275917483862; Mon, 07 Jun 2010 06:31:23 -0700 (PDT) Received: by 10.239.185.1 with HTTP; Mon, 7 Jun 2010 06:31:23 -0700 (PDT) In-Reply-To: <20100607121954.GA52932@icarus.home.lan> References: <4C0CAABA.2010506@icyb.net.ua> <20100607083428.GA48419@icarus.home.lan> <4C0CB3FC.8070001@icyb.net.ua> <20100607090850.GA49166@icarus.home.lan> <201006071112.o57BCGMf027496@higson.cam.lispworks.com> <20100607121954.GA52932@icarus.home.lan> Date: Mon, 7 Jun 2010 14:31:23 +0100 Message-ID: From: Tom Evans To: Jeremy Chadwick Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org Subject: Re: zfs i/o error, no driver error X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jun 2010 13:31:25 -0000 On Mon, Jun 7, 2010 at 1:19 PM, Jeremy Chadwick wrote: > What's confusing about this is the phrase that pool verification is done > by "verifying all the blocks can be read". Doesn't that happen when a > standard read operation comes down the pipe for a file? What I'm > getting at is that there's no explanation (that I can find) which states > why scrubbing regularly "ensures" anything, other than allowing a person > to see an error sooner than later. > The purpose is to avoid unrecoverable double failures. Assume you have a raidz, and you do not periodically scrub the pool. One of the disks develops a silent problem with reading a file. Later, a second disk completely fails. You replace the disk, and then during the resilver discover that your raidz is FAULTED, because it cannot reconstruct files from the silently dodgy first disk. With periodic scrubs, you are ensuring that at that point you can recover from a single disk failure. Regularly running a scrub increases your confidence that you will be able to recover.
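If you want that on a schedule, a minimal sketch (assuming a pool named tank; adjust the interval to taste) is a cron-driven script like:

#!/bin/sh
# start a scrub; "zpool status -x" only prints detail if a pool is unhealthy
zpool scrub tank
zpool status -x | mail -s "scrub started on tank" root

# e.g. in root's crontab, monthly:
# 0 3 1 * * /usr/local/sbin/scrub-tank.sh

Note that zpool scrub returns as soon as the scrub has been kicked off, so check zpool status again later for the actual result.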
The ZFS best practices guide suggests a shorter interval for consumer grade hard drives because there is less confidence in their remaining error-free. As I understand it, the scrub is just an attempt to ensure that everything on the pool is readable, attempting to reconstruct it if there are any issues. I guess it is slightly more clever than 'find /tank -type f -print0 | xargs -0 cat > /dev/null'. Cheers Tom From owner-freebsd-fs@FreeBSD.ORG Mon Jun 7 16:55:38 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D512B106567B for ; Mon, 7 Jun 2010 16:55:38 +0000 (UTC) (envelope-from martin@lispworks.com) Received: from lwfs1-cam.cam.lispworks.com (mail.lispworks.com [193.34.186.230]) by mx1.freebsd.org (Postfix) with ESMTP id 6719D8FC23 for ; Mon, 7 Jun 2010 16:55:34 +0000 (UTC) Received: from higson.cam.lispworks.com (IDENT:U2FsdGVkX1+YcVTcmnofzCojPtvAIFHitg0ZVF99qqI@higson [192.168.1.7]) by lwfs1-cam.cam.lispworks.com (8.14.3/8.14.3) with ESMTP id o57GtTne086545; Mon, 7 Jun 2010 17:55:29 +0100 (BST) (envelope-from martin@lispworks.com) Received: from higson.cam.lispworks.com by higson.cam.lispworks.com (8.13.1) id o57GtSxK029970; Mon, 7 Jun 2010 17:55:28 +0100 Received: (from martin@localhost) by higson.cam.lispworks.com (8.13.1/8.13.1/Submit) id o57GtSBg029967; Mon, 7 Jun 2010 17:55:28 +0100 Date: Mon, 7 Jun 2010 17:55:28 +0100 Message-Id: <201006071655.o57GtSBg029967@higson.cam.lispworks.com> From: Martin Simmons To: freebsd-fs@freebsd.org In-reply-to: <20100607121954.GA52932@icarus.home.lan> (message from Jeremy Chadwick on Mon, 7 Jun 2010 05:19:54 -0700) References: <4C0CAABA.2010506@icyb.net.ua> <20100607083428.GA48419@icarus.home.lan> <4C0CB3FC.8070001@icyb.net.ua> <20100607090850.GA49166@icarus.home.lan> <201006071112.o57BCGMf027496@higson.cam.lispworks.com> <20100607121954.GA52932@icarus.home.lan> Subject: Re: zfs i/o error, no driver error X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jun 2010 16:55:38 -0000 >>>>> On Mon, 7 Jun 2010 05:19:54 -0700, Jeremy Chadwick said: > > Which brings us to the topic of scrub interval... > > This exact question was asked on the ZFS OpenSolaris list[6] in late > 2008, and nobody there provided any concrete evidence either. The > closest thing to evidence is this: > > "...in normal operation, ZFS only checks data as it's read back from the > disks. If you don't periodically scrub, errors that happen over time > won't be caught until I next read that actual data, which might be > inconvenient if it's a long time since the initial data was written". The question can't be answered with absolute numbers, because it depends on other factors such as environmental effects. > The topic of scrub intervals was also brought up a month later[7]. > Someone said: > > "We did a study on re-write scrubs which showed that once per year was a > good interval for modern, enterprise-class disks. However, ZFS does a > read-only scrub, so you might want to scrub more often". > > The first part conflicts with what the guide recommends (I'd also like > to see the results of the study!), while the last half of the paragraph > makes no sense ("because it reads, do it more often!"). So if you take > the first sentence and apply it to what the ZFS Best Practices Guide > says, you come out with...
"scrub consumer-grade disks every 6 months". It doesn't conflict if you agree that freshly written data is more likely to be readable that data written long ago (with some curve in between). The re-write scrub they are talking about will write all of the data back to the disks during the scrubbing operation, which makes it fresher. ZFS OTOH performs read-only scrubs, i.e. it just checks that the data can be read. It only writes if there was a problem reading from one of the disks. I don't know if there is any science behind that theory... __Martin From owner-freebsd-fs@FreeBSD.ORG Mon Jun 7 17:11:47 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7B0DD1065674 for ; Mon, 7 Jun 2010 17:11:47 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) by mx1.freebsd.org (Postfix) with ESMTP id 1F1AF8FC0A for ; Mon, 7 Jun 2010 17:11:46 +0000 (UTC) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.13.8+Sun/8.13.8) with ESMTP id o57HBj2P023334; Mon, 7 Jun 2010 12:11:46 -0500 (CDT) Date: Mon, 7 Jun 2010 12:11:45 -0500 (CDT) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: Jeremy Chadwick In-Reply-To: <20100607121954.GA52932@icarus.home.lan> Message-ID: References: <4C0CAABA.2010506@icyb.net.ua> <20100607083428.GA48419@icarus.home.lan> <4C0CB3FC.8070001@icyb.net.ua> <20100607090850.GA49166@icarus.home.lan> <201006071112.o57BCGMf027496@higson.cam.lispworks.com> <20100607121954.GA52932@icarus.home.lan> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Mon, 07 Jun 2010 12:11:46 -0500 (CDT) Cc: freebsd-fs@freebsd.org Subject: Re: zfs i/o error, no driver error X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jun 2010 17:11:47 -0000 On Mon, 7 Jun 2010, Jeremy Chadwick wrote: > rubbish. "Datacenter-quality drives?" Oh, I think they mean > "enterprise-grade drives", which really don't offer much more than > high-end consumer-grade drives at this point in time[2]. One of the key > points of ZFS's creation was to provide a reliable filesystem using > cheap disks[3][4]. There are differences between disks. High-grade enterprise disks offer uncorrected error rates at least an order of magnitude better than typical tier-2 "SATA" disks and sometimes two orders of magnitude better than a cheap maximum-density drive. Yes, there are tier-2 drives that come with SAS interfaces, and you can immediately distinguish what they are since they offer high storage capacities and more reasonable prices. > What's confusing about this is the phrase that pool verification is done > by "verifying all the blocks can be read". Doesn't that happen when a > standard read operation comes down the pipe for a file? What I'm No. A standard read does not verify that all data and metadata can be read. Only one copy of the data and metadata is read and there may be several such copies. Metadata is always stored multiple times, even if the vdev does not offer additional redundancy. 
> The topic of scrub intervals was also brought up a month later[7]. > Someone said: > > "We did a study on re-write scrubs which showed that once per year was a > good interval for modern, enterprise-class disks. However, ZFS does a > read-only scrub, so you might want to scrub more often". The concept of "bit rot" on modern disk drives is largely unproven. The magnetism will surely last 1000+ years so the issue is mostly with stability of the media material and the heads. The idea that scrub should re-write the data assumes that magnetic hysteresis is lost over time. This is all very silly for a device with an expected service life of 5 years. It is much more likely for the drive heads to lose their function or for a mechanical defect to appear. Given the above, it makes sense to scrub more often on pools which see a lot of writes (to verify the recently written data), and less often on pools which are rarely updated. More levels of redundancy diminish the value of the scrub. Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Mon Jun 7 17:16:19 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A34B61065676 for ; Mon, 7 Jun 2010 17:16:19 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) by mx1.freebsd.org (Postfix) with ESMTP id 668AA8FC17 for ; Mon, 7 Jun 2010 17:16:19 +0000 (UTC) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.13.8+Sun/8.13.8) with ESMTP id o57HGISp023377; Mon, 7 Jun 2010 12:16:18 -0500 (CDT) Date: Mon, 7 Jun 2010 12:16:18 -0500 (CDT) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: Martin Simmons In-Reply-To: <201006071655.o57GtSBg029967@higson.cam.lispworks.com> Message-ID: References: <4C0CAABA.2010506@icyb.net.ua> <20100607083428.GA48419@icarus.home.lan> <4C0CB3FC.8070001@icyb.net.ua> <20100607090850.GA49166@icarus.home.lan> <201006071112.o57BCGMf027496@higson.cam.lispworks.com> <20100607121954.GA52932@icarus.home.lan> <201006071655.o57GtSBg029967@higson.cam.lispworks.com> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Mon, 07 Jun 2010 12:16:18 -0500 (CDT) Cc: freebsd-fs@freebsd.org Subject: Re: zfs i/o error, no driver error X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jun 2010 17:16:19 -0000 On Mon, 7 Jun 2010, Martin Simmons wrote: > > It doesn't conflict if you agree that freshly written data is more likely to > be readable than data written long ago (with some curve in between). Depending on the actual failure mechanism, the inverse may actually be true. Freshly written data may be trash while old data still reads fine. > I don't know if there is any science behind that theory... The science is continually changing. A study done even 5 or 7 years ago may no longer be relevant. Regardless, actual results seen in the field count more than any theory.
Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Mon Jun 7 22:59:16 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D4636106564A for ; Mon, 7 Jun 2010 22:59:16 +0000 (UTC) (envelope-from brad@duttonbros.com) Received: from uno.mnl.com (uno.mnl.com [64.221.209.136]) by mx1.freebsd.org (Postfix) with ESMTP id 8E2BF8FC08 for ; Mon, 7 Jun 2010 22:59:16 +0000 (UTC) Received: from uno.mnl.com (localhost [127.0.0.1]) by uno.mnl.com (Postfix) with ESMTP id 6C5C619E8 for ; Mon, 7 Jun 2010 15:42:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=duttonbros.com; h= message-id:date:from:to:subject:mime-version:content-type :content-transfer-encoding; s=mail; bh=B49dhHgjMmD3QCQJxbuCP7QVk zg=; b=hTnMeva4DwL6QyyWXn7/Q0YYc0dGQhxUTseavB6QFsKjS+CgmkPahOckA hoRjLJ2U0KifDvXDuw3Q7Fon/7di58iwLWQK0doKYcw0A/Juu+7dKWl9mcKAYrcK FnQaSjQ6FhPRNgGDdMpJEQpAQYlCP0QBkIY+hWQQQykdyHXxGk= DomainKey-Signature: a=rsa-sha1; c=nofws; d=duttonbros.com; h=message-id :date:from:to:subject:mime-version:content-type :content-transfer-encoding; q=dns; s=mail; b=E7+vgDYUshOdj/bNGGB OPEcadGJGK/8q7LAutTk4TBgIcthbSMAw+kFhyfhkrG2Oy45T6101QT5xNERQIwz ShvaM2VPd86fpmcBIP4tFYFDGcw0HPo0nPDUxKV/FOjQHXTeS1yblQ7ESX98ura4 Zzbeu8ysWA8hTQ7aEr3UCLis= Received: from localhost (localhost [127.0.0.1]) by uno.mnl.com (Postfix) with ESMTP id 291E019E7 for ; Mon, 7 Jun 2010 15:42:56 -0700 (PDT) Received: from noah.mnl.com (noah.mnl.com [192.168.0.31]) by duttonbros.com (Horde Framework) with HTTP; Mon, 07 Jun 2010 15:42:56 -0700 Message-ID: <20100607154256.941428ovaq2hha0g@duttonbros.com> Date: Mon, 07 Jun 2010 15:42:56 -0700 From: "Bradley W. Dutton" To: freebsd-fs@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; DelSp="Yes"; format="flowed" Content-Disposition: inline Content-Transfer-Encoding: 7bit User-Agent: Internet Messaging Program (IMP) H3 (4.3.7) / FreeBSD-8.1 Subject: ZFS performance of various vdevs (long post) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jun 2010 22:59:16 -0000 Hi, I just upgraded a 5x500GB raidz (no NCQ) array to an 8x2TB raidz2 (NCQ) array. In the process I was expecting my new setup to absolutely tear through data due to having faster and additional drives. While the new setup is considerably faster than the old, some of the throughput rates weren't as high as I was expecting. I was hoping I could get some help to understand how ZFS is working or possibly identify some bottlenecks. My goal is to have ZFS on FreeBSD be the best it can. Below are benchmarks of the old 5 drive array (normal/raidz1/raidz2) and raidz2 of the new 8 drive array. As I'm using the new array I can't reformat it to test the other vdev types. Sorry in advance if this format is hard to read. Let me know if I omitted any key information. I did several runs of each of these commands and the results were close enough to each other that I didn't think any numbers were out of line due to caching.
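(For what it's worth, repeating a run was nothing fancier than a loop along these lines -- a sketch rather than the exact script:)

#!/bin/sh
# repeat the sequential write test; dd prints its transfer summary on stderr
for run in 1 2 3
do
    dd if=/dev/zero of=/bench/test.file bs=1m count=12000 2>> ddwrite.log
    rm /bench/test.file
done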
The PC I'm using to test: FreeBSD backup 8.1-PRERELEASE FreeBSD 8.1-PRERELEASE #0: Mon May 24 18:45:38 PDT 2010 root@backup:/usr/obj/usr/src/sys/BACKUP amd64 AMD Athlon X2 5600 4 gigs of RAM 5 SATA drives are Western Digital RE2 (7200rpm) using on board controller (Nvidia nForce 570 SLI MCP, no NCQ): WD5001ABYS (3 of these) WD5000YS (2 of these) Supermicro AOC-USAS-L8i PCI Express x8 controller (with NCQ): 8 Hitachi 2TB 7200rpm drives Relevant /boot/loader.conf settings: vm.kmem_size="3G" vfs.zfs.arc_max="2100M" vfs.zfs.arc_meta_limit="700M" vfs.zfs.prefetch_disable="0" My CPU metrics aren't anything official, just me monitoring top while these commands are running. I mostly kept track of CPU to see if any processes were CPU bound. These are a percentage of total CPU time on the box, so 50% would be 1 core maxed out. Changing the dd blocksize didn't seem to affect anything so I left it at 1M. Also, if the machine was running for a while and had various items cached in the ARC the speeds could be much slower, as much as half. The first ZFS benchmark was half as fast as the below numbers on a warm box (running for several days), so I rebooted to get max speed. The faster numbers weren't due to the data being cached; I observed higher throughput numbers using gstat. Instead of 30Mbytes/sec I would see 60 or 70. The RE2 drives do between 70-80Mbytes/sec sequential reading/writing:

#!/bin/sh
for disk in "ad4" "ad6" "ad10" "ad12" "ad14"
do
    dd if=/dev/${disk} of=/dev/null bs=1m count=4000 &
done

4194304000 bytes transferred in 49.603534 secs (84556556 bytes/sec) 4194304000 bytes transferred in 51.679365 secs (81160130 bytes/sec) 4194304000 bytes transferred in 52.642995 secs (79674494 bytes/sec) 4194304000 bytes transferred in 57.742892 secs (72637581 bytes/sec) 4194304000 bytes transferred in 58.189738 secs (72079789 bytes/sec) CPU usage is low when doing these 5 reads, <10% The Hitachi drives do 120-130Mbytes/sec sequential read/write:

#!/bin/sh
for disk in "da0" "da1" "da2" "da3" "da4" "da5" "da6" "da7"
do
    dd if=/dev/${disk} of=/dev/null bs=1m count=4000 &
done

4194304000 bytes transferred in 31.980469 secs (131152048 bytes/sec) 4194304000 bytes transferred in 32.349440 secs (129656155 bytes/sec) 4194304000 bytes transferred in 32.776024 secs (127968664 bytes/sec) 4194304000 bytes transferred in 32.951440 secs (127287427 bytes/sec) 4194304000 bytes transferred in 33.048651 secs (126913017 bytes/sec) 4194304000 bytes transferred in 33.057686 secs (126878331 bytes/sec) 4194304000 bytes transferred in 33.374149 secs (125675234 bytes/sec) 4194304000 bytes transferred in 35.226584 secs (119066441 bytes/sec) CPU usage is around 25-30% Now on to the ZFS benchmarks:

#
# a regular ZFS pool for the 5 drive array
#
zpool create bench /dev/ad4 /dev/ad6 /dev/ad10 /dev/ad12 /dev/ad14
dd if=/dev/zero of=/bench/test.file bs=1m count=12000

12582912000 bytes transferred in 39.687730 secs (317047913 bytes/sec) 30-35% CPU All 5 drives are written to so we have: 317/5 = ~63Mbytes/sec This is close to 70Mbytes/sec so I'm ok with these numbers. I'm not sure how much overhead the checksumming adds, so perhaps that accounts for the throughput gap here?

dd if=/bench/test.file of=/dev/null bs=1m

12582912000 bytes transferred in 34.668165 secs (362952928 bytes/sec) around 30% CPU All 5 drives are read from so we have: 362/5 = ~72Mbytes/sec This seems to be max speed considering the slowest drives in the pool run at this speed.
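(To capture the per-disk rates I keep quoting from gstat without sitting at the console, something like this also works -- a sketch using the 5 drive array:)

#!/bin/sh
# sample extended per-device statistics every 5 seconds while a test runs
iostat -x ad4 ad6 ad10 ad12 ad14 5 > iostat.log &
IOSTAT=$!
dd if=/bench/test.file of=/dev/null bs=1m
kill $IOSTAT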
#
# a ZFS raidz pool for the 5 drive array
#
zpool destroy bench
zpool create bench raidz /dev/ad4 /dev/ad6 /dev/ad10 /dev/ad12 /dev/ad14
dd if=/dev/zero of=/bench/test.file bs=1m count=12000

12582912000 bytes transferred in 54.357053 secs (231486281 bytes/sec) CPU varied widely, between 30 and 70%, kernel process using most, then dd Only 4 of 5 are writing actual data, correct? So we have: 231/4 = ~58Mbytes/sec (this seems to be similar to gstat) We are getting a bit slower here than our reference 70Mbytes/sec and compared to 63 in the regular vdev.

dd if=/bench/test.file of=/dev/null bs=1m

12582912000 bytes transferred in 45.825533 secs (274582993 bytes/sec) around 40% CPU, kernel then dd using the most CPU Again only 4 of 5 have data, so is the throughput this? 274/4 = ~68Mbytes/sec (looks to be similar to gstat) This is good and close to max speed.

#
# a ZFS raidz2 pool for the 5 drive array
#
zpool destroy bench
zpool create bench raidz2 /dev/ad4 /dev/ad6 /dev/ad10 /dev/ad12 /dev/ad14
dd if=/dev/zero of=/bench/test.file bs=1m count=12000

12582912000 bytes transferred in 97.491160 secs (129067210 bytes/sec) CPU varied a lot 15-50%, a burst or two to 75% Only 3 of 5 are writing actual data, correct? So we have: 129/3 = ~43Mbytes/sec (gstat was varying quite a bit here, as low as 5, as high as 60) These speeds are now quite a bit lower than I would expect. Is calculation overhead causing the discrepancy here? Is the CPU too slow?

dd if=/bench/test.file of=/dev/null bs=1m

12582912000 bytes transferred in 58.947959 secs (213457976 bytes/sec) around 30% CPU Only 3 of 5 have data and I'm not sure how to calculate throughput. I'm guessing the round robin reads help boost these numbers (read 3 data disks + 1 parity so only 4 of 5 drives are in use for any given read?). gstat shows rates around 40Mbytes/sec even though I would expect closer to 60-70. 213/3 = ~71Mbytes/sec (although I don't think we can do this calculation this way)

#
# ZFS raidz2 pool on the 8 drive array
# this pool is about 15% used so the read/write tests aren't necessarily
# on the fastest part of the disks.
#
zpool create tank raidz2 /dev/da0 /dev/da1 /dev/da2 /dev/da3 /dev/da4 /dev/da5 /dev/da6 /dev/da7
dd if=/dev/zero of=/tank/test.file bs=1m count=12000

12582912000 bytes transferred in 40.878876 secs (307809638 bytes/sec) varying 40-70% CPU (a few bursts into the 90s), kernel then dd using most of it 307/6 = ~51Mbytes/sec (gstat varied quite a bit, 20-80; it seems to average in the 50s as dd reported) Per disk this isn't much faster than the old array, 51 compared to 43. With a few bursts to 95% CPU it seems as though some of this could be CPU bound.

dd if=/tank/test.file of=/dev/null bs=1m

12582912000 bytes transferred in 32.911291 secs (382328118 bytes/sec) around 55% CPU, mostly kernel then dd Similar to the raidz2 test above, I don't think we can calculate throughput this way. In any case, this is actually slower per disk than the old array. 382/6 = ~64Mbytes/sec (gstat seemed to be around 50 so I'm guessing the round robin reading is creating more throughput)

#
# wrap up
#
So the normal vdev performs closest to raw drive speeds. Raidz1 is slower and raidz2 even more so. This is observable in the dd tests and in gstat. Any ideas why the raid numbers are slower? I've tried to account for the fact that the raid vdevs have fewer data disks. Would a faster CPU help here? Unfortunately I migrated all of my data to the new array so I can't run all of my tests on there.
It would have been nice to see if a normal pool (non raid) on these disks would have come close to max speeds of 120-130Mbytes/sec (giving a total pool throughput close to 1Gbyte/sec) as the smaller array did with respect to its max speed. I noticed scrubbing the big array is CPU bound as the kernel process is at 99% when running (total CPU is 50% as the scrub doesn't multithread/process). The disks are running around 45-50Mbytes/sec in gstat. Scrubbing the smaller/slower array isn't CPU bound and the disks run at close to max speed. Thanks for your time, Brad From owner-freebsd-fs@FreeBSD.ORG Mon Jun 7 23:19:18 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0A3831065678 for ; Mon, 7 Jun 2010 23:19:18 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) by mx1.freebsd.org (Postfix) with ESMTP id C4FC18FC14 for ; Mon, 7 Jun 2010 23:19:17 +0000 (UTC) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.13.8+Sun/8.13.8) with ESMTP id o57NJFIq004113; Mon, 7 Jun 2010 18:19:16 -0500 (CDT) Date: Mon, 7 Jun 2010 18:19:15 -0500 (CDT) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: "Bradley W. Dutton" In-Reply-To: <20100607154256.941428ovaq2hha0g@duttonbros.com> Message-ID: References: <20100607154256.941428ovaq2hha0g@duttonbros.com> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Mon, 07 Jun 2010 18:19:16 -0500 (CDT) Cc: freebsd-fs@freebsd.org Subject: Re: ZFS performance of various vdevs (long post) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jun 2010 23:19:18 -0000 On Mon, 7 Jun 2010, Bradley W. Dutton wrote: > So the normal vdev performs closest to raw drive speeds. Raidz1 is slower and > raidz2 even more so. This is observable in the dd tests and in gstat. > Any ideas why the raid numbers are slower? I've tried to account for the fact > that the raid vdevs have fewer data disks. Would a faster CPU help here? The sequential throughput on your new drives is faster than the old drives, but it is likely that the seek and rotational latencies are longer. ZFS is transaction-oriented and must tell all the drives to sync their write cache before proceeding to the next transaction group. Drives with more latency will slow down this step. Likewise, ZFS always reads and writes full filesystem blocks (default 128K) and this may cause more overhead when using raidz. Using 'dd' from /dev/zero is not a very good benchmark test since zfs could potentially compress zero-filled blocks down to just a few bytes (I think recent versions of zfs do this) and of course Unix supports files with holes. The higher CPU usage might be due to the device driver or the interface card being used. If you could afford to do so, you will likely see considerably better performance by using mirrors instead of raidz since then 128K blocks will be sent to each disk, with fewer seeks.
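For example (only a sketch, reusing your device names), the eight disks as four striped 2-way mirrors:

zpool create tank \
  mirror da0 da1 \
  mirror da2 da3 \
  mirror da4 da5 \
  mirror da6 da7

You give up half the raw capacity relative to raidz2, but each 128K block lands on a single mirror with no parity computation, and reads can be spread across both sides of every mirror.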
Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Mon Jun 7 23:29:14 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C3FC5106566B for ; Mon, 7 Jun 2010 23:29:14 +0000 (UTC) (envelope-from peterjeremy@acm.org) Received: from mail14.syd.optusnet.com.au (mail14.syd.optusnet.com.au [211.29.132.195]) by mx1.freebsd.org (Postfix) with ESMTP id 54F788FC08 for ; Mon, 7 Jun 2010 23:29:13 +0000 (UTC) Received: from server.vk2pj.dyndns.org (c211-30-160-13.mirnd2.nsw.optusnet.com.au [211.30.160.13] (may be forged)) by mail14.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id o57NTAj3000531 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Tue, 8 Jun 2010 09:29:12 +1000 X-Bogosity: Ham, spamicity=0.000000 Received: from server.vk2pj.dyndns.org (localhost.vk2pj.dyndns.org [127.0.0.1]) by server.vk2pj.dyndns.org (8.14.4/8.14.4) with ESMTP id o57NTApC058778 for ; Tue, 8 Jun 2010 09:29:10 +1000 (EST) (envelope-from peter@server.vk2pj.dyndns.org) Received: (from peter@localhost) by server.vk2pj.dyndns.org (8.14.4/8.14.4/Submit) id o57NTACe058777 for freebsd-fs@freebsd.org; Tue, 8 Jun 2010 09:29:10 +1000 (EST) (envelope-from peter) Date: Tue, 8 Jun 2010 09:29:10 +1000 From: Peter Jeremy To: freebsd-fs@freebsd.org Message-ID: <20100607232909.GA57423@server.vk2pj.dyndns.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="jI8keyz6grp/JLjh" Content-Disposition: inline X-PGP-Key: http://members.optusnet.com.au/peterjeremy/pubkey.asc User-Agent: Mutt/1.5.20 (2009-06-14) Subject: ZFS memory usage X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jun 2010 23:29:14 -0000 --jI8keyz6grp/JLjh Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Currently, ZFS does not appear to be able to steal memory from the "inactive" list, whereas NFS and UFS both return "freed" pages to the "inactive" list. Over time, unless you have a pure ZFS box (with no NFS), this tends to result in ZFS reporting a memory shortage (kstat.zfs.misc.arcstats.memory_throttle_count increasing), whilst there is plenty of "inactive" space. What is involved in correcting this? At least part of the problem is that cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:arc_memory_throttle() only looks at cnt.v_free_count (number of free pages) when deciding whether to throttle or not. Is the fix as simple as changing the test to check (cnt.v_free_count + cnt.v_inactive_count)? Assuming that the fix is non-trivial, is there an easy way to transfer "inactive" memory to the "free" list? The perl hack: perl -e '$x = "x" x 1000000;' sort-of works - by forcing the VM system into real memory shortage. Is there a better work-around?
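For reference, the counters I am watching are roughly these (a sketch; names as they appear on 8.x):

#!/bin/sh
# free and inactive page-queue sizes, plus the ARC throttle counter
sysctl vm.stats.vm.v_free_count vm.stats.vm.v_inactive_count \
    kstat.zfs.misc.arcstats.memory_throttle_count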
-- Peter Jeremy --jI8keyz6grp/JLjh Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (FreeBSD) iEUEARECAAYFAkwNgMUACgkQ/opHv/APuIdQ3gCgvipKI+Dalgu9JATA2CHohjy1 8U8AljXf+S28MzAjT0It336mNQGC0wQ= =jsoE -----END PGP SIGNATURE----- --jI8keyz6grp/JLjh-- From owner-freebsd-fs@FreeBSD.ORG Tue Jun 8 00:14:29 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1026B106564A for ; Tue, 8 Jun 2010 00:14:29 +0000 (UTC) (envelope-from artemb@gmail.com) Received: from mail-pw0-f54.google.com (mail-pw0-f54.google.com [209.85.160.54]) by mx1.freebsd.org (Postfix) with ESMTP id D7CCD8FC08 for ; Tue, 8 Jun 2010 00:14:28 +0000 (UTC) Received: by pwj1 with SMTP id 1so2168144pwj.13 for ; Mon, 07 Jun 2010 17:14:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:sender:received :in-reply-to:references:date:x-google-sender-auth:message-id:subject :from:to:cc:content-type:content-transfer-encoding; bh=docQGOGeooXh2/Wb2yeWf/H4pTSreSPXF478hBzazG4=; b=ntqQp2WTRrhODBUbKvpH4DnJz5cXXSJoGpFRQ43mSIR/oPAZMTGJVbPxD4azYWjc3c AhBVNJw0OuMDWes083ds9oLElyMX9oJT3TDHGUY/BeIXPPwi51mNa0yGTDR3XVn0Crs4 +0yl+q2c4lZCzgcHN5CjgvcIYG5YcXlW1/Qxc= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; b=kBfAKGJkfOJYrCPa4/h4CLbMloVy0hVUIEGFV0S4JPUac2NKAIUoRFtXk2nI69g+5X /b3212oaT3vyTOYu0F/X+xPDkdkD9J/sJxeX1O77NRBmmjbrryL1UMYCqzOV+8BJrGgF A4u4UzkOGO1wHmymSiPIsbDGdRcWe39SrIEzA= MIME-Version: 1.0 Received: by 10.140.248.7 with SMTP id v7mr12523527rvh.252.1275956068293; Mon, 07 Jun 2010 17:14:28 -0700 (PDT) Sender: artemb@gmail.com Received: by 10.141.40.4 with HTTP; Mon, 7 Jun 2010 17:14:28 -0700 (PDT) In-Reply-To: <20100607232909.GA57423@server.vk2pj.dyndns.org> References: <20100607232909.GA57423@server.vk2pj.dyndns.org> Date: Mon, 7 Jun 2010 17:14:28 -0700 X-Google-Sender-Auth: RzqYfPu2vObfOqlhkkxvNiPUqXI Message-ID: From: Artem Belevich To: Peter Jeremy Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org Subject: Re: ZFS memory usage X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Jun 2010 00:14:29 -0000 I believe it's pagedaemon's job to push pages from active list to inactive and from inactive down to cache and free. I have a really ugly hack to arc.c which forces pagedaemon wakeup if ARC sees too much memory on inactive list. How much is too much is defined by a sysctl value. http://pastebin.com/ZCkzkWcs Be warned: it's ugly, it may not work, it assumes too much, it's plain broken, it may ... I'm serious -- I have seen my box locking up when I did manage to exhaust memory. The only reason I'm posting this ugliness at all is because of the hope that someone more familiar with memory allocation in FreeBSD may be able to suggest a better approach. --Artem On Mon, Jun 7, 2010 at 4:29 PM, Peter Jeremy wrote: > Currently, ZFS does not appear to be able to steal memory from the > "inactive" list, whereas NFS and UFS both return "freed" pages to the > "inactive" list.
Over time, unless you have a pure ZFS box (with no > NFS), this tends to result in ZFS reporting a memory shortage > (kstat.zfs.misc.arcstats.memory_throttle_count increasing), whilst > there is plenty of "inactive" space. > > What is involved in correcting this? > > At least part of the problem is that > cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:arc_memory_throttle() > only looks at cnt.v_free_count (number of free pages) when deciding > whether to throttle or not. Is the fix as simple as changing the > test to check (cnt.v_free_count + cnt.v_inactive_count)? > > Assuming that the fix is non-trivial, is there an easy way to transfer > "inactive" memory to the "free" list? The perl hack: > perl -e '$x = "x" x 1000000;' > sort-of works - by forcing the VM system into real memory shortage. > Is there a better work-around? > > -- > Peter Jeremy > From owner-freebsd-fs@FreeBSD.ORG Tue Jun 8 00:30:13 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 35EA6106566C for ; Tue, 8 Jun 2010 00:30:13 +0000 (UTC) (envelope-from andrew@modulus.org) Received: from email.octopus.com.au (email.octopus.com.au [122.100.2.232]) by mx1.freebsd.org (Postfix) with ESMTP id EA96A8FC1B for ; Tue, 8 Jun 2010 00:30:12 +0000 (UTC) Received: by email.octopus.com.au (Postfix, from userid 1002) id 995795CB94D; Tue, 8 Jun 2010 10:23:10 +1000 (EST) X-Spam-Checker-Version: SpamAssassin 3.2.3 (2007-08-08) on email.octopus.com.au X-Spam-Level: **** X-Spam-Status: No, score=4.4 required=10.0 tests=ALL_TRUSTED, DNS_FROM_OPENWHOIS,FH_DATE_PAST_20XX autolearn=no version=3.2.3 Received: from [10.1.50.144] (142.19.96.58.static.exetel.com.au [58.96.19.142]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) (Authenticated sender: admin@email.octopus.com.au) by email.octopus.com.au (Postfix) with ESMTP id 86DA95CB938 for ; Tue, 8 Jun 2010 10:23:06 +1000 (EST) Message-ID: <4C0D8F09.4090009@modulus.org> Date: Tue, 08 Jun 2010 10:30:01 +1000 From: Andrew Snow User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.9) Gecko/20100423 Thunderbird/3.0.4 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <20100607232909.GA57423@server.vk2pj.dyndns.org> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: ZFS memory usage X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Jun 2010 00:30:13 -0000 I think that for most ZFS users, any UFS partition will only be a root/boot partition anyway. So in a mixed ZFS/UFS system, ideally the Inactive pages should be slowly returned to ARC. It is desirable for ZFS to get a majority of the available (inactive) memory, to improve its performance. Currently we have the opposite situation in effect.
- Andrew From owner-freebsd-fs@FreeBSD.ORG Tue Jun 8 00:32:20 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1366A106567B for ; Tue, 8 Jun 2010 00:32:20 +0000 (UTC) (envelope-from brad@duttonbros.com) Received: from uno.mnl.com (uno.mnl.com [64.221.209.136]) by mx1.freebsd.org (Postfix) with ESMTP id C80F28FC0C for ; Tue, 8 Jun 2010 00:32:19 +0000 (UTC) Received: from uno.mnl.com (localhost [127.0.0.1]) by uno.mnl.com (Postfix) with ESMTP id 027DE1A2B; Mon, 7 Jun 2010 17:32:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=duttonbros.com; h= message-id:date:from:to:cc:subject:references:in-reply-to :mime-version:content-type:content-transfer-encoding; s=mail; bh=l7lEN0CgegfYT9yurIpXOq9Yu+4=; b=tZyAKeXj1DuHQJCEdl6Rx3+l6ACd KbtuAw2KN9SXGd1ZXhy2uFndD2MLrDKO5CqNZDGJwETDRgiPSIz0jfKPf+uRVj4x aOsrSXWUjorHSWIAa79pTTLTDHGBatHhf0l+BMb/uw/6aYyexh8C43PVt0f+KOTo ScfxdmzTbZC+o2Y= DomainKey-Signature: a=rsa-sha1; c=nofws; d=duttonbros.com; h=message-id :date:from:to:cc:subject:references:in-reply-to:mime-version :content-type:content-transfer-encoding; q=dns; s=mail; b=LONbBj Ys6vyu0eQ7KcvkUUylJnXVXhMBtWqlzAz9aBRZT/aMso/3IaijZwZgXHwBh4bm6T GCRc8qi+2wcQLsLGYH4/2/yDoLBiO2VwZwEvzGUAVC408u1j3iH+Iaip4ODSWZfr cuNmZ6JNRZO+ZEvUZRP7By2z5b2VRXrP+wFkE= Received: from localhost (localhost [127.0.0.1]) by uno.mnl.com (Postfix) with ESMTP id E1E821A29; Mon, 7 Jun 2010 17:32:18 -0700 (PDT) Received: from noah.mnl.com (noah.mnl.com [192.168.0.31]) by duttonbros.com (Horde Framework) with HTTP; Mon, 07 Jun 2010 17:32:18 -0700 Message-ID: <20100607173218.11716iopp083dbpu@duttonbros.com> Date: Mon, 07 Jun 2010 17:32:18 -0700 From: "Bradley W. Dutton" To: Bob Friesenhahn References: <20100607154256.941428ovaq2hha0g@duttonbros.com> In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; DelSp="Yes"; format="flowed" Content-Disposition: inline Content-Transfer-Encoding: 7bit User-Agent: Internet Messaging Program (IMP) H3 (4.3.7) / FreeBSD-8.1 Cc: freebsd-fs@freebsd.org Subject: Re: ZFS performance of various vdevs (long post) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Jun 2010 00:32:20 -0000 Quoting Bob Friesenhahn : > On Mon, 7 Jun 2010, Bradley W. Dutton wrote: >> So the normal vdev performs closest to raw drive speeds. Raidz1 is >> slower and raidz2 even more so. This is observable in the dd tests >> and in gstat. Any ideas why the raid numbers are slower? >> I've tried to account for the fact that the raid vdevs have fewer >> data disks. Would a faster CPU help here? > > The sequential throughput on your new drives is faster than the old > drives, but it is likely that the seek and rotational latencies are > longer. ZFS is transaction-oriented and must tell all the drives to > sync their write cache before proceeding to the next transaction > group. Drives with more latency will slow down this step. > Likewise, ZFS always reads and writes full filesystem blocks > (default 128K) and this may cause more overhead when using raidz. The details are a little lacking on the Hitachi site, but the HDS722020ALA330 says 8.2 ms seek time.
http://www.hitachigst.com/tech/techlib.nsf/techdocs/5F2DC3B35EA0311386257634000284AD/$file/USA7K2000_DS7K2000_OEMSpec_r1.2.pdf The WDC drives say 8.9 ms, so we should be in the same ballpark on seek times. http://www.wdc.com/en/products/products.asp?driveid=399 I thought the NCQ vs no NCQ might tip the scales in favor of the Hitachi array as well. Are there any tools to check the latencies of the disks? > Using 'dd' from /dev/zero is not a very good benchmark test since > zfs could potentially compress zero-filled blocks down to just a few > bytes (I think recent versions of zfs do this) and of course Unix > supports files with holes. I know it's pretty simple but for checking throughput I thought it would be ok. I don't have compression on and based on the drive lights and gstat, the drives definitely aren't idle. > The higher CPU usage might be due to the device driver or the > interface card being used. Definitely a plausible explanation. If this was the case would the 8 parallel dd processes exhibit the same behavior? or is the type of IO affecting how much CPU the driver is using? > If you could afford to do so, you will likely see considerably > better performance by using mirrors instead of raidz since then 128K > blocks will be sent to each disk, with fewer seeks. I agree with you, but at this point I value the extra space more, as I don't have a lot of random IO. I read the following and decided to stick with raidz2 when ditching my old raidz1 setup: http://blogs.sun.com/roch/entry/when_to_and_not_to Thanks for the feedback, Brad From owner-freebsd-fs@FreeBSD.ORG Tue Jun 8 01:37:53 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0922B1065672 for ; Tue, 8 Jun 2010 01:37:53 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) by mx1.freebsd.org (Postfix) with ESMTP id C46E78FC1B for ; Tue, 8 Jun 2010 01:37:52 +0000 (UTC) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.13.8+Sun/8.13.8) with ESMTP id o581bopE004728; Mon, 7 Jun 2010 20:37:51 -0500 (CDT) Date: Mon, 7 Jun 2010 20:37:50 -0500 (CDT) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: "Bradley W. Dutton" In-Reply-To: <20100607173218.11716iopp083dbpu@duttonbros.com> Message-ID: References: <20100607154256.941428ovaq2hha0g@duttonbros.com> <20100607173218.11716iopp083dbpu@duttonbros.com> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Mon, 07 Jun 2010 20:37:51 -0500 (CDT) Cc: freebsd-fs@freebsd.org Subject: Re: ZFS performance of various vdevs (long post) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Jun 2010 01:37:53 -0000 On Mon, 7 Jun 2010, Bradley W. Dutton wrote: > > Are there any tools to check the latencies of the disks? There might be something better, but 'iostat -x' is definitely your friend when it comes to looking at latencies under load. Use a sample time of 30 seconds ('iostat -x 30'). Check if a few disks are much slower than the others.
If they are all about the same, then the disks are likely operating ok. Sometimes it is found that one or two disks are abnormally slow, and this slows down the whole raidz. Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Tue Jun 8 04:47:09 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 479C3106566C for ; Tue, 8 Jun 2010 04:47:09 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta05.emeryville.ca.mail.comcast.net (qmta05.emeryville.ca.mail.comcast.net [76.96.30.48]) by mx1.freebsd.org (Postfix) with ESMTP id 2C66A8FC13 for ; Tue, 8 Jun 2010 04:47:08 +0000 (UTC) Received: from omta17.emeryville.ca.mail.comcast.net ([76.96.30.73]) by qmta05.emeryville.ca.mail.comcast.net with comcast id TG4f1e0051afHeLA5Gn8eh; Tue, 08 Jun 2010 04:47:08 +0000 Received: from koitsu.dyndns.org ([98.248.46.159]) by omta17.emeryville.ca.mail.comcast.net with comcast id TGn71e00C3S48mS8dGn7A8; Tue, 08 Jun 2010 04:47:08 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 526C29B418; Mon, 7 Jun 2010 21:47:07 -0700 (PDT) Date: Mon, 7 Jun 2010 21:47:07 -0700 From: Jeremy Chadwick To: "Bradley W. Dutton" Message-ID: <20100608044707.GA78147@icarus.home.lan> References: <20100607154256.941428ovaq2hha0g@duttonbros.com> <20100607173218.11716iopp083dbpu@duttonbros.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20100607173218.11716iopp083dbpu@duttonbros.com> User-Agent: Mutt/1.5.20 (2009-06-14) Cc: freebsd-fs@freebsd.org Subject: Re: ZFS performance of various vdevs (long post) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Jun 2010 04:47:09 -0000 On Mon, Jun 07, 2010 at 05:32:18PM -0700, Bradley W. Dutton wrote: > Quoting Bob Friesenhahn : > > >On Mon, 7 Jun 2010, Bradley W. Dutton wrote: > >>So the normal vdev performs closest to raw drive speeds. Raidz1 > >>is slower and raidz2 even more so. This is observable in the dd > >>tests and in gstat. Any ideas why the raid numbers are > >>slower? I've tried to account for the fact that the raid vdevs > >>have fewer data disks. Would a faster CPU help here? > > > >The sequential throughput on your new drives is faster than the > >old drives, but it is likely that the seek and rotational > >latencies are longer. ZFS is transaction-oriented and must tell > >all the drives to sync their write cache before proceeding to the > >next transaction group. Drives with more latency will slow down > >this step. Likewise, ZFS always reads and writes full filesystem > >blocks (default 128K) and this may cause more overhead when using > >raidz. > > The details are a little lacking on the Hitachi site, but the > HDS722020ALA330 says 8.2 ms seek time. > http://www.hitachigst.com/tech/techlib.nsf/techdocs/5F2DC3B35EA0311386257634000284AD/$file/USA7K2000_DS7K2000_OEMSpec_r1.2.pdf > > The WDC drives say 8.9 ms, so we should be in the same ballpark on seek times. > http://www.wdc.com/en/products/products.asp?driveid=399 > > I thought the NCQ vs no NCQ might tip the scales in favor of the > Hitachi array as well. I'm not sure you understand NCQ.
What you're doing in your dd test is individual dd's on each disk. NCQ is a per-disk thing. What you need to test is multiple concurrent transactions *per disk*. What I'm trying to say is that NCQ vs. no-NCQ isn't the culprit here, because your testbench model isn't making use of it. > I know it's pretty simple but for checking throughput I thought it > would be ok. I don't have compression on and based on the drive > lights and gstat, the drives definitely aren't idle. Try disabling prefetch (you have it enabled) and try setting vfs.zfs.txg.timeout="5". Some people have reported a "sweet spot" with regards to the last parameter (needing to be adjusted if your disks are extremely fast, etc.), as otherwise ZFS would be extremely "bursty" in its I/O (stalling/deadlocking the system at set intervals). By decreasing the value you essentially do disk writes more regularly (with less data), and depending upon the load and controller, this may even out performance. > >The higher CPU usage might be due to the device driver or the > >interface card being used. > > Definitely a plausible explanation. If this was the case would the 8 > parallel dd processes exhibit the same behavior? or is the type of > IO affecting how much CPU the driver is using? It would be the latter. Also, I believe this Supermicro controller has been discussed in the past. I can't remember if people had outright failures/issues with it or if people were complaining about sub-par performance. I could also be remembering a different Supermicro controller. If I had to make a recommendation, it would be to reproduce the same setup on a system using an Intel ICH9/ICH9R or ICH10/ICH10R controller in AHCI mode (with ahci.ko loaded, not ataahci.ko) and see if things improve. But start with the loader.conf tunables I mentioned above -- segregate each test. I would also recommend you re-run your tests with a different blocksize for dd. I don't know why people keep using 1m (Linux websites?). Test the following increments: 4k, 8k, 16k, 32k, 64k, 128k, 256k. That's about where you should stop. Otherwise, consider installing ports/benchmarks/bonnie++ and try that. That will also get you concurrent I/O tests, I believe. -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. 
PGP: 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Tue Jun 8 04:47:36 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 53E591065670 for ; Tue, 8 Jun 2010 04:47:36 +0000 (UTC) (envelope-from jurgen@ish.com.au) Received: from fish.ish.com.au (eth5921.nsw.adsl.internode.on.net [59.167.240.32]) by mx1.freebsd.org (Postfix) with ESMTP id 8B1B78FC1E for ; Tue, 8 Jun 2010 04:47:35 +0000 (UTC) Received: from ip-211.ish.com.au ([203.29.62.211]:29587 helo=ish.com.au) by fish.ish.com.au with esmtp (Exim 4.69) (envelope-from ) id 1OLqiq-0000C5-0l; Tue, 08 Jun 2010 14:47:32 +1000 Received: from [203.29.62.154] (HELO ip-154.ish.com.au) by ish.com.au (CommuniGate Pro SMTP 5.3.7) with ESMTP id 5951910; Tue, 08 Jun 2010 14:47:32 +1000 Message-ID: <4C0DCB64.5090002@ish.com.au> Date: Tue, 08 Jun 2010 14:47:32 +1000 From: Jurgen Weber User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.8) Gecko/20100310 Shredder/3.0.4pre MIME-Version: 1.0 To: jhell References: <4C0C6B54.8020005@ish.com.au> <4C0CF27B.1050402@dataix.net> In-Reply-To: <4C0CF27B.1050402@dataix.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: zfs filesystem problem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Jun 2010 04:47:36 -0000 Thanks. I was able to reboot this machine last night, which solved the immediate problem. I'll let you know how I go. On 7/06/10 11:22 PM, jhell wrote: > On 06/07/2010 01:46, Sergiy Suprun wrote: >> On Mon, Jun 7, 2010 at 06:45, Jurgen Weber wrote: >> >>> Hello >>> >>> I have a FreeBSD 8.0-p2 system, which runs two pools. One with 6 disks all >>> mirrored for our data and another mirrored pool for the OS. The system has >>> 16GB of RAM. >>> >>> I have a nightly cron script running which takes a snapshot of a particular >>> file system within the storage pool. This has been running for just over a >>> month now without any issues until this weekend. >>> >>> Now we can not access the mentioned file system. If we try to `ls` to it or >>> `cd` into it the shell locks up (not even kill -9 can stop the `ls` >>> processes, etc) and top shows that the process state is `zfs`. > > This is most likely caused by some bugs that were found and fixed in > stable/8. One of the commits that mm@ made touched the zio->iowait > path that you should see your processes stuck in. > > There still seem to be (at least in my case) some zio->iowait problems going > on that I have not pinned down to a cause yet, but they have not > caused any of my system processes to freeze in that state. > > Grab a kernel from one of the snapshots that were made sometime last > month to test this out, just to be sure you're not upgrading for no > reason. When I say kernel I mean the kernel& the modules that go with it, as ZFS > is a module and you will obviously need that. > > Please report back on your findings and whether the kernel from stable fixes your > problem. > > URL to retrieve snapshots: http://bit.ly/aLoXXV > > Good Luck!, > > > >> Hello. >> How about scrub ? >> And what size are your pools, and how much space is used by data+snapshots?
> -- --------------------------> ish http://www.ish.com.au Level 1, 30 Wilson Street Newtown 2042 Australia phone +61 2 9550 5001 fax +61 2 9550 4001 From owner-freebsd-fs@FreeBSD.ORG Tue Jun 8 07:26:04 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AED6A1065672 for ; Tue, 8 Jun 2010 07:26:04 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id F2B808FC1F for ; Tue, 8 Jun 2010 07:26:03 +0000 (UTC) Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id KAA13343; Tue, 08 Jun 2010 10:25:57 +0300 (EEST) (envelope-from avg@icyb.net.ua) Received: from localhost.topspin.kiev.ua ([127.0.0.1]) by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1OLtC9-000KOw-4E; Tue, 08 Jun 2010 10:25:57 +0300 Message-ID: <4C0DF084.6090106@icyb.net.ua> Date: Tue, 08 Jun 2010 10:25:56 +0300 From: Andriy Gapon User-Agent: Thunderbird 2.0.0.24 (X11/20100603) MIME-Version: 1.0 To: Artem Belevich References: <20100607232909.GA57423@server.vk2pj.dyndns.org> In-Reply-To: X-Enigmail-Version: 0.96.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: ZFS memory usage X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Jun 2010 07:26:04 -0000 on 08/06/2010 03:14 Artem Belevich said the following: > I believe it's pagedaemon's job to push pages from the active list to > inactive, and from inactive down to cache and free. > I have a really ugly hack to arc.c which forces a pagedaemon wakeup if > ARC sees too much memory on the inactive list. > How much is too much is defined by a sysctl value. > > http://pastebin.com/ZCkzkWcs > > Be warned: it's ugly, it may not work, it assumes too much, it's plain > broken, it may ... > I'm serious -- I have seen my box locking up when I did manage to > exhaust memory. The only reason I'm posting this ugliness at all is > because of the hope that someone more familiar with memory allocation in > FreeBSD may be able to suggest a better approach. I think it's a good start. I did a much more primitive thing locally and even that improved things for me a lot - I simply dropped the "vm_paging_target() > -2048" check. Kip Macy is aware of this situation, perhaps he'll look into resolving it.
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Tue Jun 8 08:56:15 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7A05B1065673 for ; Tue, 8 Jun 2010 08:56:15 +0000 (UTC) (envelope-from anders@FreeBSD.org) Received: from fupp.net (totem.fix.no [80.91.36.20]) by mx1.freebsd.org (Postfix) with ESMTP id D79D98FC14 for ; Tue, 8 Jun 2010 08:56:14 +0000 (UTC) Received: from localhost (totem.fix.no [80.91.36.20]) by fupp.net (Postfix) with ESMTP id 19B0E47114 for ; Tue, 8 Jun 2010 10:36:50 +0200 (CEST) Received: from fupp.net ([80.91.36.20]) by localhost (totem.fix.no [80.91.36.20]) (amavisd-new, port 10024) with LMTP id uE60sxk492mZ for ; Tue, 8 Jun 2010 10:36:49 +0200 (CEST) Received: by fupp.net (Postfix, from userid 1000) id E670747113; Tue, 8 Jun 2010 10:36:49 +0200 (CEST) Date: Tue, 8 Jun 2010 10:36:49 +0200 From: Anders Nordby To: freebsd-fs@freebsd.org Message-ID: <20100608083649.GA77452@fupp.net> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline User-Agent: Mutt/1.4.2.3i X-PGP-Key: http://anders.fix.no/pgp/ X-PGP-Key-FingerPrint: 1E0F C53C D8DF 6A8F EAAD 19C5 D12A BC9F 0083 5956 Subject: Odd network issues on ZFS based NFS server X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Jun 2010 08:56:15 -0000 Hi! I have a file server running 8.1-PRERELEASE amd64, where I share some filesystems using NFS and Samba. After running for a day or two, the server starts to get around 25% packet loss, browsing directories across NFS gets really slow etc. Rebooting solves it until it happens again. Has anyone experienced anything similar? I had this issue in FreeBSD 7 as well, upgrading did not help. PS: I used mountd and /etc/exports to share the filesystems. I also regularly run zpool status from monitoring systems. I also replaced the server physically, changed switch ports, cables etc. So it does not seem to be a problem with hardware. Bye, -- Anders. 
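For illustration, a minimal sketch of the kind of mountd/exports setup described in the PS above; the filesystem path and client network here are hypothetical, not taken from this server:

---snip---
# /etc/exports (sketch, hypothetical path and network)
/tank/export -maproot=root -network 192.168.120.0 -mask 255.255.255.0
---snip---

After editing the file, mountd re-reads it on SIGHUP: kill -HUP $(cat /var/run/mountd.pid).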
From owner-freebsd-fs@FreeBSD.ORG Tue Jun 8 10:01:21 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E0E611065672 for ; Tue, 8 Jun 2010 10:01:20 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta14.westchester.pa.mail.comcast.net (qmta14.westchester.pa.mail.comcast.net [76.96.59.212]) by mx1.freebsd.org (Postfix) with ESMTP id A381E8FC1E for ; Tue, 8 Jun 2010 10:01:20 +0000 (UTC) Received: from omta07.westchester.pa.mail.comcast.net ([76.96.62.59]) by qmta14.westchester.pa.mail.comcast.net with comcast id TMir1e0021GhbT85EMl6ha; Tue, 08 Jun 2010 09:45:06 +0000 Received: from koitsu.dyndns.org ([98.248.46.159]) by omta07.westchester.pa.mail.comcast.net with comcast id TMl51e00A3S48mS3TMl6qS; Tue, 08 Jun 2010 09:45:06 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 15FBD9B418; Tue, 8 Jun 2010 02:45:04 -0700 (PDT) Date: Tue, 8 Jun 2010 02:45:04 -0700 From: Jeremy Chadwick To: Anders Nordby Message-ID: <20100608094504.GA86086@icarus.home.lan> References: <20100608083649.GA77452@fupp.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20100608083649.GA77452@fupp.net> User-Agent: Mutt/1.5.20 (2009-06-14) Cc: freebsd-fs@freebsd.org Subject: Re: Odd network issues on ZFS based NFS server X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Jun 2010 10:01:21 -0000 On Tue, Jun 08, 2010 at 10:36:49AM +0200, Anders Nordby wrote: > I have a file server running 8.1-PRERELEASE amd64, where I share some > filesystems using NFS and Samba. After running for a day or two, the > server starts to get around 25% packet loss, browsing directories across > NFS gets really slow etc. Rebooting solves it until it happens again. > Has anyone experienced anything similar? I had this issue in FreeBSD 7 > as well, upgrading did not help. > > PS: I used mountd and /etc/exports to share the filesystems. I also > regularly run zpool status from monitoring systems. I also replaced the > server physically, changed switch ports, cables etc. So it does not seem > to be a problem with hardware. For what it's worth, we have a similar setup, but without Samba. The machine happens to be running 8.0-STABLE (world/kernel Mon Apr 26 02:26:36). No packet loss seen, and no overall issues aside from some input errors on our em1 NIC (which do not correlate with errors on the switch it's connected to). NFS is not used heavily, aside from daily backup jobs across a gigE network. The machine has been up 42 days.
$ netstat -ibn Name Mtu Network Address Ipkts Ierrs Idrop Ibytes Opkts Oerrs Obytes Coll em0 1500 XX:XX:XX:XX:XX:XX 1541235 0 0 344966704 359378 0 255337127 0 em0 1500 XX.XX.XX.XX/X XX.XX.XX.XX 424637 - - 271854885 359033 - 250287803 - em1 1500 XX:XX:XX:XX:XX:XX 62851814 59 0 81941673228 43418668 0 3520821042 0 em1 1500 XX.XX.XX.XX/X XX.XX.XX.XX 62813816 - - 81059999660 43464778 - 2912408432 - lo0 16384 3288 0 0 435566 3288 0 435566 0 lo0 16384 127.0.0.0/8 127.0.0.1 3288 - - 435566 3288 - 435566 - $ netstat -m 3085/2165/5250 mbufs in use (current/cache/total) 2549/2433/4982/25600 mbuf clusters in use (current/cache/total/max) 2048/896 mbuf+clusters out of packet secondary zone in use (current/cache) 0/104/104/12800 4k (page size) jumbo clusters in use (current/cache/total/max) 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max) 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max) 5869K/5823K/11692K bytes allocated to network (current/cache/total) 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters) 0/0/0 requests for jumbo clusters denied (4k/9k/16k) 0/0/0 sfbufs in use (current/peak/max) 0 requests for sfbufs denied 0 requests for sfbufs delayed 0 requests for I/O initiated by sendfile 0 calls to protocol drain routines switch# show interfaces 2 Status and Counters - Port Counters for port 2 Name : XXXXXXXXXX Link Status : Up Totals (Since boot or last clear) : Bytes Rx : 2,512,873,738 Bytes Tx : 1,117,402,842 Unicast Rx : 169,702,228 Unicast Tx : 237,760,196 Bcast/Mcast Rx : 12,667 Bcast/Mcast Tx : 75,726 Errors (Since boot or last clear) : FCS Rx : 0 Drops Rx : 0 Alignment Rx : 0 Collisions Tx : 0 Runts Rx : 0 Late Colln Tx : 0 Giants Rx : 0 Excessive Colln : 0 Total Rx Errors : 0 Deferred Tx : 0 Rates (5 minute weighted average) : Total Rx (bps) : 1449296 Total Tx (bps) : 1504528 Unicast Rx (Pkts/sec) : 0 Unicast Tx (Pkts/sec) : 0 B/Mcast Rx (Pkts/sec) : 0 B/Mcast Tx (Pkts/sec) : 0 Utilization Rx : 00.04 % Utilization Tx : 00.04 % -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. 
PGP: 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Tue Jun 8 15:20:07 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CF1BD1065674 for ; Tue, 8 Jun 2010 15:20:07 +0000 (UTC) (envelope-from brad@duttonbros.com) Received: from uno.mnl.com (uno.mnl.com [64.221.209.136]) by mx1.freebsd.org (Postfix) with ESMTP id A05CA8FC15 for ; Tue, 8 Jun 2010 15:20:07 +0000 (UTC) Received: from uno.mnl.com (localhost [127.0.0.1]) by uno.mnl.com (Postfix) with ESMTP id 1DD501F04; Tue, 8 Jun 2010 08:20:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=duttonbros.com; h= message-id:date:from:to:cc:subject:references:in-reply-to :mime-version:content-type:content-transfer-encoding; s=mail; bh=UK/HH0x79TxQQwm/y3vO9uNunDs=; b=YR1uMMY3wzuG9aItq9LtESzI7f5G aQJ8oENrDeY1flS9yb2t/XKVJEqanll4GGX3D/L2sLmHEZKdfxe1Rhfac1x12Eo/ P7NVimFDL/pf90i1e/MFaQP+lHdR6IDU7xBnr/xqlgYjHqsolqtBr1c57qkBkIPV NNU0VIPSJyLiM3I= DomainKey-Signature: a=rsa-sha1; c=nofws; d=duttonbros.com; h=message-id :date:from:to:cc:subject:references:in-reply-to:mime-version :content-type:content-transfer-encoding; q=dns; s=mail; b=V6d4Ac LlJtYifoPgABSStCMcijEGy40Pe5QJiol24gfej3IVSSUUH1otIsWjik520jTlcB sk1smdsYvBXcFqlwdpJvAwZjD4BPnjLIvd71S1zNCpoJvY0hBs5AKzOGDDvPkknd zUgtN1+pWvv/1dVtSxN3GVoZhtMW5ruff46Ro= Received: from localhost (localhost [127.0.0.1]) by uno.mnl.com (Postfix) with ESMTP id 0739E1F03; Tue, 8 Jun 2010 08:20:07 -0700 (PDT) Received: from c-98-210-178-102.hsd1.ca.comcast.net (c-98-210-178-102.hsd1.ca.comcast.net [98.210.178.102]) by duttonbros.com (Horde Framework) with HTTP; Tue, 08 Jun 2010 08:20:06 -0700 Message-ID: <20100608082006.5006764hokcpvzqe@duttonbros.com> Date: Tue, 08 Jun 2010 08:20:06 -0700 From: "Bradley W. Dutton" To: Jeremy Chadwick References: <20100607154256.941428ovaq2hha0g@duttonbros.com> <20100607173218.11716iopp083dbpu@duttonbros.com> <20100608044707.GA78147@icarus.home.lan> In-Reply-To: <20100608044707.GA78147@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; DelSp="Yes"; format="flowed" Content-Disposition: inline Content-Transfer-Encoding: 7bit User-Agent: Internet Messaging Program (IMP) H3 (4.3.7) / FreeBSD-8.1 Cc: freebsd-fs@freebsd.org Subject: Re: ZFS performance of various vdevs (long post) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Jun 2010 15:20:07 -0000 Quoting Jeremy Chadwick : >> On Mon, Jun 07, 2010 at 05:32:18PM -0700, Bradley W. Dutton wrote: >> I know it's pretty simple but for checking throughput I thought it >> would be ok. I don't have compression on and based on the drive >> lights and gstat, the drives definitely aren't idle. > > Try disabling prefetch (you have it enabled) and try setting > vfs.zfs.txg.timeout="5". Some people have reported a "sweet spot" with > regards to the last parameter (needing to be adjusted if your disks are > extremely fast, etc.), as otherwise ZFS would be extremely "bursty" in > its I/O (stalling/deadlocking the system at set intervals). By > decreasing the value you essentially do disk writes more regularly (with > less data), and depending upon the load and controller, this may even > out performance. I tested some of these settings. With the timeout set to 5 not much changed write wise. 
(keep in mind these results are the Nvidia/WDRE2 combo): With txg=5 and prefetch disabled I saw read speeds go down considerably: # normal/jbod txg=5 no prefetch zpool create bench /dev/ad4 /dev/ad6 /dev/ad10 /dev/ad12 /dev/ad14 dd if=/bench/test.file of=/dev/null bs=1m 12582912000 bytes transferred in 59.330330 secs (212082286 bytes/sec) compared to 12582912000 bytes transferred in 34.668165 secs (362952928 bytes/sec) zpool create bench raidz /dev/ad4 /dev/ad6 /dev/ad10 /dev/ad12 /dev/ad14 dd if=/bench/test.file of=/dev/null bs=1m 12582912000 bytes transferred in 71.135696 secs (176886046 bytes/sec) compared to 12582912000 bytes transferred in 45.825533 secs (274582993 bytes/sec) Running the same tests on the raidz2 Supermicro/Hitachi setup didn't yield any difference in writes, the reads were slower: zpool create tank raidz2 /dev/da0 /dev/da1 /dev/da2 /dev/da3 /dev/da4 /dev/da5 /dev/da6 /dev/da7 dd if=/tank/test.file of=/dev/null bs=1m 12582912000 bytes transferred in 44.118409 secs (285207745 bytes/sec) compared to 12582912000 bytes transferred in 32.911291 secs (382328118 bytes/sec) I rebooted and reran these numbers just to make sure they were consistent. >> >The higher CPU usage might be due to the device driver or the >> >interface card being used. >> >> Definitely a plausible explanation. If this was the case would the 8 >> parallel dd processes exhibit the same behavior? or is the type of >> IO affecting how much CPU the driver is using? > > It would be the latter. > > Also, I believe this Supermicro controller has been discussed in the > past. I can't remember if people had outright failures/issues with it > or if people were complaining about sub-par performance. I could also > be remembering a different Supermicro controller. > > If I had to make a recommendation, it would be to reproduce the same > setup on a system using an Intel ICH9/ICH9R or ICH10/ICH10R controller > in AHCI mode (with ahci.ko loaded, not ataahci.ko) and see if things > improve. But start with the loader.conf tunables I mentioned above -- > segregate each test. > > I would also recommend you re-run your tests with a different blocksize > for dd. I don't know why people keep using 1m (Linux websites?). Test > the following increments: 4k, 8k, 16k, 32k, 64k, 128k, 256k. That's > about where you should stop. I tested with 8, 16, 32, 64, 128, 1m and the results all looked similar. As such I stuck with bs=1m because it's easier to change count. > Otherwise, consider installing ports/benchmarks/bonnie++ and try that. > That will also get you concurrent I/O tests, I believe. I may give this a shot but I'm most interested in less concurrency as I have larger files with only a couple of readers/writers. As Bob noted a bunch of mirrors in the pool would definitely be faster for concurrent IO. 
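For reference, the block-size sweep described above is easy to script; a minimal sketch, assuming the same /bench/test.file scratch file used in the tests in this thread:

---snip---
#!/bin/sh
# Re-read the test file at each block size and let dd report throughput.
for bs in 8k 16k 32k 64k 128k 1m; do
        echo "bs=${bs}:"
        dd if=/bench/test.file of=/dev/null bs=${bs}
done
---snip---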
Thanks for the help, Brad From owner-freebsd-fs@FreeBSD.ORG Tue Jun 8 23:39:28 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A1A801065670 for ; Tue, 8 Jun 2010 23:39:28 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 5749A8FC1F for ; Tue, 8 Jun 2010 23:39:27 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AvsEADdxDkyDaFvK/2dsb2JhbACeRnG/ZoUWBA X-IronPort-AV: E=Sophos;i="4.53,387,1272859200"; d="scan'208";a="80004278" Received: from fraser.cs.uoguelph.ca ([131.104.91.202]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 08 Jun 2010 19:39:25 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by fraser.cs.uoguelph.ca (Postfix) with ESMTP id 526E8109C2C3; Tue, 8 Jun 2010 19:39:27 -0400 (EDT) X-Virus-Scanned: amavisd-new at fraser.cs.uoguelph.ca Received: from fraser.cs.uoguelph.ca ([127.0.0.1]) by localhost (fraser.cs.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id s1hizAzPQcVt; Tue, 8 Jun 2010 19:39:26 -0400 (EDT) Received: from muncher.cs.uoguelph.ca (muncher.cs.uoguelph.ca [131.104.91.102]) by fraser.cs.uoguelph.ca (Postfix) with ESMTP id D485B109C24A; Tue, 8 Jun 2010 19:39:26 -0400 (EDT) Received: from localhost (rmacklem@localhost) by muncher.cs.uoguelph.ca (8.11.7p3+Sun/8.11.6) with ESMTP id o58NtWP09898; Tue, 8 Jun 2010 19:55:32 -0400 (EDT) X-Authentication-Warning: muncher.cs.uoguelph.ca: rmacklem owned process doing -bs Date: Tue, 8 Jun 2010 19:55:32 -0400 (EDT) From: Rick Macklem X-X-Sender: rmacklem@muncher.cs.uoguelph.ca To: Anders Nordby In-Reply-To: <20100608083649.GA77452@fupp.net> Message-ID: References: <20100608083649.GA77452@fupp.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@FreeBSD.org Subject: Re: Odd network issues on ZFS based NFS server X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Jun 2010 23:39:28 -0000 On Tue, 8 Jun 2010, Anders Nordby wrote: > Hi! > > I have a file server running 8.1-PRERELEASE amd64, where I share some > filesystems using NFS and Samba. After running for a day or two, the > server starts to get around 25% packet loss, browsing directories across > NFS gets really slow etc. Rebooting solves it until it happens again. > Has anyone experienced anything similar? I had this issue in FreeBSD 7 > as well, upgrading did not help. > Well, here's a few things you might try. (I know nothing about ZFS, except what I see discussed on the mailing lists.) - "netstat -m" will show you mbuf allocations. Might give you a hint w.r.t. mbuf/mbuf cluster exhaustion. - I'd try setting zio_use_uma = 0, since there have been reports of issues related to ZFS using the uma allocator and mbuf allocation uses the uma allocator now, too. (I think this is fairly recent, so might not be relevant to FreeBSD7.) - You can try the experimental NFS server to see if that affects the behaviour. ("-e" option on both mountd and nfsd) - If you have some different network hardware, you could try a different net interface. This would isolate the problem, if it happens to be related to the network device driver for the hardware you have. 
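A sketch of how the second and third suggestions might be wired up; the vfs.zfs.zio.use_uma spelling of the zio_use_uma tunable is an assumption, and the rc.conf flags shown simply add -e to typical defaults:

---snip---
# /boot/loader.conf (sketch; assumed tunable name, takes effect on reboot)
vfs.zfs.zio.use_uma="0"

# /etc/rc.conf (sketch; switch mountd and nfsd to the experimental server)
mountd_flags="-e -r"
nfs_server_flags="-e -u -t -n 4"
---snip---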
There are lots of email messages in the archive related to tuning the arc for zfs. I know nothing about it, but I'd look for a message that describes what the current recommendations are for amd64 w.r.t. this. Hopefully others can suggest other things to check. It smells like some sort of resource exhaustion problem, but who knows??? Good luck with it, rick From owner-freebsd-fs@FreeBSD.ORG Wed Jun 9 02:44:52 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6DD3C1065675 for ; Wed, 9 Jun 2010 02:44:52 +0000 (UTC) (envelope-from andrew@modulus.org) Received: from email.octopus.com.au (email.octopus.com.au [122.100.2.232]) by mx1.freebsd.org (Postfix) with ESMTP id 2D49F8FC18 for ; Wed, 9 Jun 2010 02:44:51 +0000 (UTC) Received: by email.octopus.com.au (Postfix, from userid 1002) id E24795CB93A; Wed, 9 Jun 2010 12:37:47 +1000 (EST) X-Spam-Checker-Version: SpamAssassin 3.2.3 (2007-08-08) on email.octopus.com.au X-Spam-Level: **** X-Spam-Status: No, score=4.4 required=10.0 tests=ALL_TRUSTED, DNS_FROM_OPENWHOIS,FH_DATE_PAST_20XX autolearn=no version=3.2.3 Received: from [10.1.50.144] (142.19.96.58.static.exetel.com.au [58.96.19.142]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) (Authenticated sender: admin@email.octopus.com.au) by email.octopus.com.au (Postfix) with ESMTP id E64B25CB94B; Wed, 9 Jun 2010 12:37:43 +1000 (EST) Message-ID: <4C0F0017.5000002@modulus.org> Date: Wed, 09 Jun 2010 12:44:39 +1000 From: Andrew Snow User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.9) Gecko/20100423 Thunderbird/3.0.4 MIME-Version: 1.0 To: "Bradley W. Dutton" , freebsd-fs@freebsd.org References: <20100607154256.941428ovaq2hha0g@duttonbros.com> <20100607173218.11716iopp083dbpu@duttonbros.com> <20100608044707.GA78147@icarus.home.lan> <20100608082006.5006764hokcpvzqe@duttonbros.com> In-Reply-To: <20100608082006.5006764hokcpvzqe@duttonbros.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Subject: Re: ZFS performance of various vdevs (long post) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 09 Jun 2010 02:44:52 -0000 Under OpenSolaris, the LSI-based controllers seem to use interrupt coalescing, and you can even tweak the settings via lsiutil. When you disable it, things go a lot slower. I suspect this is the reason for the difference in sequential transfer rates versus FreeBSD.
- Andrew From owner-freebsd-fs@FreeBSD.ORG Wed Jun 9 08:26:18 2010 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 87D4A106567D; Wed, 9 Jun 2010 08:26:18 +0000 (UTC) (envelope-from az@FreeBSD.org) Received: from freefall.freebsd.org (unknown [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 6037A8FC23; Wed, 9 Jun 2010 08:26:18 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o598QI5G026982; Wed, 9 Jun 2010 08:26:18 GMT (envelope-from az@freefall.freebsd.org) Received: (from az@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o598QFYq026978; Wed, 9 Jun 2010 08:26:15 GMT (envelope-from az) Date: Wed, 9 Jun 2010 08:26:15 GMT Message-Id: <201006090826.o598QFYq026978@freefall.freebsd.org> To: andrey.zverev@electro-com.ru, az@FreeBSD.org, freebsd-fs@FreeBSD.org From: az@FreeBSD.org Cc: Subject: Re: kern/130979: [smbfs] [panic] boot/kernel/smbfs.ko X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 09 Jun 2010 08:26:18 -0000 Synopsis: [smbfs] [panic] boot/kernel/smbfs.ko State-Changed-From-To: open->closed State-Changed-By: az State-Changed-When: Wed Jun 9 08:26:15 UTC 2010 State-Changed-Why: not occurs anymore http://www.freebsd.org/cgi/query-pr.cgi?pr=130979 From owner-freebsd-fs@FreeBSD.ORG Wed Jun 9 12:25:19 2010 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E20341065670 for ; Wed, 9 Jun 2010 12:25:19 +0000 (UTC) (envelope-from anders@FreeBSD.org) Received: from fupp.net (totem.fix.no [80.91.36.20]) by mx1.freebsd.org (Postfix) with ESMTP id 2CAC18FC13 for ; Wed, 9 Jun 2010 12:25:18 +0000 (UTC) Received: from localhost (totem.fix.no [80.91.36.20]) by fupp.net (Postfix) with ESMTP id A691C47321; Wed, 9 Jun 2010 14:25:17 +0200 (CEST) Received: from fupp.net ([80.91.36.20]) by localhost (totem.fix.no [80.91.36.20]) (amavisd-new, port 10024) with LMTP id 7URuBGWSvEX8; Wed, 9 Jun 2010 14:25:17 +0200 (CEST) Received: by fupp.net (Postfix, from userid 1000) id 295B447320; Wed, 9 Jun 2010 14:25:17 +0200 (CEST) Date: Wed, 9 Jun 2010 14:25:17 +0200 From: Anders Nordby To: Rick Macklem Message-ID: <20100609122517.GA16231@fupp.net> References: <20100608083649.GA77452@fupp.net> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i X-PGP-Key: http://anders.fix.no/pgp/ X-PGP-Key-FingerPrint: 1E0F C53C D8DF 6A8F EAAD 19C5 D12A BC9F 0083 5956 Cc: freebsd-fs@FreeBSD.org Subject: Re: Odd network issues on ZFS based NFS server X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 09 Jun 2010 12:25:20 -0000 Hi, On Tue, Jun 08, 2010 at 07:55:32PM -0400, Rick Macklem wrote: > Well, here's a few things you might try. (I know nothing about ZFS, > except what I see discussed on the mailing lists.) > > - "netstat -m" will show you mbuf allocations. Might give you a hint > w.r.t. mbuf/mbuf cluster exhaustion. 
> - I'd try setting zio_use_uma = 0, since there have been reports of > issues related to ZFS using the uma allocator and mbuf allocation > uses the uma allocator now, too. (I think this is fairly recent, so > might not be relevant to FreeBSD7.) > - You can try the experimental NFS server to see if that affects the > behaviour. ("-e" option on both mountd and nfsd) > - If you have some different network hardware, you could try a different > net interface. This would isolate the problem, if it happens to be > related to the network device driver for the hardware you have. > > There are lots of email messages in the archive related to tuning the > arc for zfs. I know nothing about it, but I'd look for a message that > describes what the current recommendations are for amd64 w.r.t. this. > > Hopefully others can suggest other things to check. It smells like some > sort of resource exhaustion problem, but who knows??? Thanks. The only thing that (temporarily) solves this issue so far is rebooting, which helps only for a day or so. I have tried different NICs, replacing the physical server, replacing cables, changing and resetting switch ports. But it did not help, so I think this is a software problem. I will try zio_use_uma = 0 I think, and then try to limit vfs.zfs.arc_max to 100 MB or so. On the ZFS+NFS server while having these issues: root@unixfile:~# netstat -m 1293/4602/5895 mbufs in use (current/cache/total) 1109/3619/4728/65536 mbuf clusters in use (current/cache/total/max) 257/1023 mbuf+clusters out of packet secondary zone in use (current/cache) 0/104/104/12800 4k (page size) jumbo clusters in use (current/cache/total/max) 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max) 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max) 2541K/8804K/11345K bytes allocated to network (current/cache/total) 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters) 0/0/0 requests for jumbo clusters denied (4k/9k/16k) 0/0/0 sfbufs in use (current/peak/max) 0 requests for sfbufs denied 0 requests for sfbufs delayed 0 requests for I/O initiated by sendfile 0 calls to protocol drain routines Packet loss seen from my workstation: anders@noname:~$ ping unixfile PING unixfile.aftenposten.no (192.168.120.33) 56(84) bytes of data. 
64 bytes from unixfile.aftenposten.no (192.168.120.33): icmp_seq=1 ttl=63 time=0 .230 ms 64 bytes from unixfile.aftenposten.no (192.168.120.33): icmp_seq=3 ttl=63 time=0 .262 ms 64 bytes from unixfile.aftenposten.no (192.168.120.33): icmp_seq=5 ttl=63 time=0 .272 ms 64 bytes from unixfile.aftenposten.no (192.168.120.33): icmp_seq=6 ttl=63 time=0 .203 ms 64 bytes from unixfile.aftenposten.no (192.168.120.33): icmp_seq=7 ttl=63 time=0 .306 ms 64 bytes from unixfile.aftenposten.no (192.168.120.33): icmp_seq=9 ttl=63 time=0 .309 ms ^C --- unixfile.aftenposten.no ping statistics --- 10 packets transmitted, 6 received, 40% packet loss, time 9017ms rtt min/avg/max/mdev = 0.203/0.263/0.309/0.042 ms Here is also vmstat -z from the server: ITEM SIZE LIMIT USED FREE REQUESTS FAILURES UMA Kegs: 208, 0, 175, 12, 175, 0 UMA Zones: 320, 0, 175, 5, 175, 0 UMA Slabs: 568, 0, 20339, 7535, 162600, 0 UMA RCntSlabs: 568, 0, 2468, 3, 2468, 0 UMA Hash: 256, 0, 5, 85, 81, 0 16 Bucket: 152, 0, 558, 292, 1115, 0 32 Bucket: 280, 0, 269, 25, 491, 0 64 Bucket: 536, 0, 254, 5, 391, 17 128 Bucket: 1048, 0, 3598, 47, 6823, 914 VM OBJECT: 216, 0, 47009, 9529, 2668554, 0 MAP: 232, 0, 7, 25, 7, 0 KMAP ENTRY: 120, 119815, 4881, 606, 376403, 0 MAP ENTRY: 120, 0, 1797, 683, 4683855, 0 DP fakepg: 120, 0, 0, 0, 0, 0 SG fakepg: 120, 0, 0, 0, 0, 0 mt_zone: 2056, 0, 196, 3, 196, 0 16: 16, 0, 14932, 7916, 3237030, 0 32: 32, 0, 2438, 1703, 2411143, 0 64: 64, 0, 32128, 18216, 93399160, 0 128: 128, 0, 28706, 55075, 12071701, 0 256: 256, 0, 3831, 7104, 58010086, 0 512: 512, 0, 1753, 578, 32140172, 0 1024: 1024, 0, 93, 123, 201330, 0 2048: 2048, 0, 529, 375, 36122797, 0 4096: 4096, 0, 253, 184, 185892, 0 Files: 80, 0, 424, 386, 1078416, 0 TURNSTILE: 136, 0, 297, 63, 297, 0 umtx pi: 96, 0, 0, 0, 0, 0 MAC labels: 40, 0, 0, 0, 0, 0 PROC: 1120, 0, 66, 114, 107003, 0 THREAD: 984, 0, 267, 29, 294, 0 SLEEPQUEUE: 80, 0, 297, 80, 297, 0 VMSPACE: 392, 0, 45, 155, 107030, 0 cpuset: 72, 0, 2, 98, 2, 0 audit_record: 952, 0, 0, 0, 0, 0 mbuf_packet: 256, 0, 259, 1021, 34278617, 0 mbuf: 256, 0, 1025, 3590, 131614064, 0 mbuf_cluster: 2048, 65536, 2278, 2450, 16615870, 0 mbuf_jumbo_page: 4096, 12800, 0, 104, 153927, 0 mbuf_jumbo_9k: 9216, 6400, 0, 0, 0, 0 mbuf_jumbo_16k: 16384, 3200, 0, 0, 0, 0 mbuf_ext_refcnt: 4, 0, 0, 0, 0, 0 g_bio: 232, 0, 0, 8544, 1690094, 0 ttyinq: 160, 0, 135, 81, 300, 0 ttyoutq: 256, 0, 72, 48, 160, 0 ata_request: 320, 0, 0, 24, 1, 0 ata_composite: 336, 0, 0, 0, 0, 0 VNODE: 472, 0, 69327, 4057, 12604560, 0 VNODEPOLL: 112, 0, 0, 0, 0, 0 S VFS Cache: 108, 0, 70366, 6821, 12297146, 0 L VFS Cache: 328, 0, 179, 25369, 544759, 0 NAMEI: 1024, 0, 0, 96, 18824297, 0 NFSMOUNT: 616, 0, 0, 0, 0, 0 NFSNODE: 656, 0, 0, 0, 0, 0 DIRHASH: 1024, 0, 1147, 37, 1147, 0 pipe: 728, 0, 19, 86, 85332, 0 ksiginfo: 112, 0, 166, 890, 4901, 0 itimer: 344, 0, 0, 22, 1, 0 KNOTE: 128, 0, 0, 145, 622, 0 socket: 680, 131076, 53, 79, 20777, 0 unpcb: 240, 131072, 10, 182, 6269, 0 ipq: 56, 2079, 0, 189, 159, 0 udp_inpcb: 336, 131076, 11, 66, 5487, 0 udpcb: 16, 131208, 11, 661, 5487, 0 tcp_inpcb: 336, 131076, 32, 111, 9019, 0 tcpcb: 880, 131072, 32, 96, 9019, 0 tcptw: 72, 26250, 0, 200, 51, 0 syncache: 144, 15366, 0, 130, 8229, 0 hostcache: 136, 15372, 8, 132, 61, 0 tcpreass: 40, 4116, 3, 501, 662733, 0 sackhole: 32, 0, 0, 202, 11, 0 ripcb: 336, 131076, 0, 22, 1, 0 rtentry: 200, 0, 4, 34, 4, 0 selfd: 56, 0, 262, 683, 704729, 0 SWAPMETA: 288, 116519, 0, 0, 0, 0 ip4flow: 56, 99351, 16, 551, 11254, 0 ip6flow: 80, 99360, 0, 0, 0, 0 Mountpoints: 752, 0, 5, 20, 5, 0 FFS 
inode: 168, 0, 43495, 25739, 526228, 0 FFS1 dinode: 128, 0, 0, 0, 0, 0 FFS2 dinode: 256, 0, 43495, 25610, 526228, 0 taskq_zone: 56, 0, 0, 819, 299535, 0 zio_cache: 776, 0, 0, 2830, 7902766, 0 zio_buf_512: 512, 0, 73281, 39083, 2179139, 0 zio_data_buf_512: 512, 0, 41, 260, 86233, 0 zio_buf_1024: 1024, 0, 64, 624, 31885, 0 zio_data_buf_1024: 1024, 0, 33, 815, 14631, 0 zio_buf_1536: 1536, 0, 15, 161, 6621, 0 zio_data_buf_1536: 1536, 0, 9, 179, 666, 0 zio_buf_2048: 2048, 0, 10, 352, 13371, 0 zio_data_buf_2048: 2048, 0, 4, 82, 518, 0 zio_buf_2560: 2560, 0, 6, 76, 4631, 0 zio_data_buf_2560: 2560, 0, 8, 79, 751, 0 zio_buf_3072: 3072, 0, 3, 146, 8829, 0 zio_data_buf_3072: 3072, 0, 4, 107, 1160, 0 zio_buf_3584: 3584, 0, 5, 273, 22944, 0 zio_data_buf_3584: 3584, 0, 5, 82, 418, 0 zio_buf_4096: 4096, 0, 10, 192, 21812, 0 zio_data_buf_4096: 4096, 0, 7, 141, 1628, 0 zio_buf_5120: 5120, 0, 2, 236, 49783, 0 zio_data_buf_5120: 5120, 0, 14, 366, 2686, 0 zio_buf_6144: 6144, 0, 3, 127, 26343, 0 zio_data_buf_6144: 6144, 0, 20, 629, 1944, 0 zio_buf_7168: 7168, 0, 3, 85, 7341, 0 zio_data_buf_7168: 7168, 0, 31, 690, 2953, 0 zio_buf_8192: 8192, 0, 5, 98, 6653, 0 zio_data_buf_8192: 8192, 0, 47, 712, 3562, 0 zio_buf_10240: 10240, 0, 10, 109, 5628, 0 zio_data_buf_10240: 10240, 0, 80, 846, 5494, 0 zio_buf_12288: 12288, 0, 9, 81, 2704, 0 zio_data_buf_12288: 12288, 0, 59, 972, 4714, 0 zio_buf_14336: 14336, 0, 0, 293, 79024, 0 zio_data_buf_14336: 14336, 0, 64, 770, 5474, 0 zio_buf_16384: 16384, 0, 3409, 613, 42927, 0 zio_data_buf_16384: 16384, 0, 53, 615, 36196, 0 zio_buf_20480: 20480, 0, 0, 72, 1000, 0 zio_data_buf_20480: 20480, 0, 50, 761, 5383, 0 zio_buf_24576: 24576, 0, 3, 42, 702, 0 zio_data_buf_24576: 24576, 0, 24, 312, 3207, 0 zio_buf_28672: 28672, 0, 1, 54, 784, 0 zio_data_buf_28672: 28672, 0, 10, 157, 1538, 0 zio_buf_32768: 32768, 0, 0, 61, 1079, 0 zio_data_buf_32768: 32768, 0, 8, 129, 22324, 0 zio_buf_36864: 36864, 0, 3, 71, 486, 0 zio_data_buf_36864: 36864, 0, 11, 92, 1506, 0 zio_buf_40960: 40960, 0, 1, 53, 324, 0 zio_data_buf_40960: 40960, 0, 7, 58, 728, 0 zio_buf_45056: 45056, 0, 1, 43, 319, 0 zio_data_buf_45056: 45056, 0, 3, 55, 530, 0 zio_buf_49152: 49152, 0, 0, 65, 1224, 0 zio_data_buf_49152: 49152, 0, 1, 140, 17837, 0 zio_buf_53248: 53248, 0, 0, 53, 364, 0 zio_data_buf_53248: 53248, 0, 0, 54, 349, 0 zio_buf_57344: 57344, 0, 2, 52, 381, 0 zio_data_buf_57344: 57344, 0, 6, 97, 2164, 0 zio_buf_61440: 61440, 0, 0, 44, 267, 0 zio_data_buf_61440: 61440, 0, 1, 50, 594, 0 zio_buf_65536: 65536, 0, 172, 92, 41829, 0 zio_data_buf_65536: 65536, 0, 0, 119, 14319, 0 zio_buf_69632: 69632, 0, 0, 35, 194, 0 zio_data_buf_69632: 69632, 0, 0, 38, 195, 0 zio_buf_73728: 73728, 0, 0, 44, 525, 0 zio_data_buf_73728: 73728, 0, 3, 75, 718, 0 zio_buf_77824: 77824, 0, 0, 58, 462, 0 zio_data_buf_77824: 77824, 0, 6, 74, 557, 0 zio_buf_81920: 81920, 0, 1, 53, 422, 0 zio_data_buf_81920: 81920, 0, 0, 118, 12825, 0 zio_buf_86016: 86016, 0, 1, 34, 308, 0 zio_data_buf_86016: 86016, 0, 5, 50, 957, 0 zio_buf_90112: 90112, 0, 1, 48, 481, 0 zio_data_buf_90112: 90112, 0, 1, 29, 44, 0 zio_buf_94208: 94208, 0, 0, 49, 1036, 0 zio_data_buf_94208: 94208, 0, 0, 57, 177, 0 zio_buf_98304: 98304, 0, 0, 44, 348, 0 zio_data_buf_98304: 98304, 0, 0, 112, 12362, 0 zio_buf_102400: 102400, 0, 0, 58, 388, 0 zio_data_buf_102400: 102400, 0, 0, 20, 45, 0 zio_buf_106496: 106496, 0, 1, 35, 477, 0 zio_data_buf_106496: 106496, 0, 1, 57, 482, 0 zio_buf_110592: 110592, 0, 1, 72, 884, 0 zio_data_buf_110592: 110592, 0, 0, 71, 930, 0 zio_buf_114688: 114688, 0, 0, 61, 656, 0 
zio_data_buf_114688: 114688, 0, 1, 146, 10626, 0 zio_buf_118784: 118784, 0, 0, 67, 532, 0 zio_data_buf_118784: 118784, 0, 0, 10, 29, 0 zio_buf_122880: 122880, 0, 1, 86, 1444, 0 zio_data_buf_122880: 122880, 0, 0, 50, 176, 0 zio_buf_126976: 126976, 0, 1, 59, 1029, 0 zio_data_buf_126976: 126976, 0, 0, 42, 325, 0 zio_buf_131072: 131072, 0, 0, 717, 119915, 0 zio_data_buf_131072: 131072, 0, 474, 981, 214146, 0 dmu_buf_impl_t: 224, 0, 77939, 46739, 2664713, 0 dnode_t: 776, 0, 73767, 45043, 2094869, 0 arc_buf_hdr_t: 208, 0, 27195, 24519, 605620, 0 arc_buf_t: 72, 0, 4901, 14949, 677129, 0 zil_lwb_cache: 200, 0, 2, 1233, 118944, 0 zfs_znode_cache: 376, 0, 25805, 4005, 12077350, 0 Regards, -- Anders. From owner-freebsd-fs@FreeBSD.ORG Wed Jun 9 13:35:23 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 58A531065670; Wed, 9 Jun 2010 13:35:23 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from mail-yw0-f182.google.com (mail-yw0-f182.google.com [209.85.211.182]) by mx1.freebsd.org (Postfix) with ESMTP id DD23F8FC26; Wed, 9 Jun 2010 13:35:22 +0000 (UTC) Received: by ywh12 with SMTP id 12so4550306ywh.14 for ; Wed, 09 Jun 2010 06:35:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:sender:received :in-reply-to:references:date:x-google-sender-auth:message-id:subject :from:to:cc:content-type:content-transfer-encoding; bh=lCNZqEZFgdHyY52qMl5pRpfjTgPbsAnOLT3D80M4zC0=; b=KeLRRAfd1QlKzP3QjpXT3dYARH9yyNZV9on6/noUAeLm1+v1C/1Yg5H1EnmPK8dYFr 6Iea5lB27lEC47QBR92LejqAkZRvtpMP8nqvnUis9aqXEC9qahNQjk4ZZchjLle9jSax y6ieD7KP0RhVnaAidaz/IZ+1kDla4Lhweu8ZE= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; b=axTqZuVIhEJawz4DpYhFl5aS9p/kn9oWObFs7KeXllbfNXsMVAWDFrPhsEcG1uZ5ni 78/hDd9zkh4WOadE+YC661Vsc6ocIUrQb9G/xAWIXoCPTk6uPXolN+KTV86Ltl5yxe6m U4JO12TmHDnZntpdPFG4NzJs8I/1Tb1MIOxgo= MIME-Version: 1.0 Received: by 10.229.223.201 with SMTP id il9mr5906267qcb.89.1276090521180; Wed, 09 Jun 2010 06:35:21 -0700 (PDT) Sender: asmrookie@gmail.com Received: by 10.229.183.213 with HTTP; Wed, 9 Jun 2010 06:35:21 -0700 (PDT) In-Reply-To: <20100605190659.GA3369@a91-153-117-195.elisa-laajakaista.fi> References: <20100603143501.GA3176@a91-153-117-195.elisa-laajakaista.fi> <20100605190659.GA3369@a91-153-117-195.elisa-laajakaista.fi> Date: Wed, 9 Jun 2010 15:35:21 +0200 X-Google-Sender-Auth: FtESSOhMuEGNWUGUcLzN1Fx-kpY Message-ID: From: Attilio Rao To: Jaakko Heinonen Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org, kib@freebsd.org Subject: Re: syncer vnode leak because of nmount() race X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 09 Jun 2010 13:35:23 -0000 2010/6/5 Jaakko Heinonen : > > Thank you for the reply. > > On 2010-06-04, Attilio Rao wrote: >> I think that, luckilly, it is not a very common condition to have the >> mount still in flight and get updates... :) > > Agreed, but mountd(8) increases chances because it does an update mount > for all local file systems when it receives SIGHUP. 
> >> However, I think that the real breakage here is that the check on >> mnt->mnt_syncer is done lockless and it is unsafe. > >> I also found this bug when rewriting the syncer, and I resolved it by >> using a separate flag (in my case it was simpler and more >> beneficial actually for some other reasons, but you may do the same >> thing with a mnt_kern_flag entry). > > OK, I will take a look at this approach. > >> Additionally, note that vfs_busy() here is not intended to protect >> against such situations but against unmount. >> >> > PS. The vfs_unbusy(9) manual page is out of date after r184554 and IMO >> > the vfs_busy(9) manual page is misleading because it talks about >> > synchronizing access to a mount point. >> >> Could you be more precise about what is misleading, please? > > As you wrote above, it protects only against unmount. At least I got the > feeling that it does more than that when I read this: "The purpose of > this function is to synchronize access to a mount point. It also delays > unmounting by sleeping on mp if the MNTK_UNMOUNT flag is set in > mp->mnt_kern_flag and the LK_NOWAIT flag is not set.". > > I did some updates for the manual pages: > > http://people.freebsd.org/~jh/patches/vfs_busy-vfs_unbusy.diff That patch is fine. I'd just avoid mentioning the mnt_lockref field name; just use the generic word 'refcount'. Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein From owner-freebsd-fs@FreeBSD.ORG Wed Jun 9 14:26:40 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C46F8106567A for ; Wed, 9 Jun 2010 14:26:40 +0000 (UTC) (envelope-from alexander@leidinger.net) Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de [217.11.53.44]) by mx1.freebsd.org (Postfix) with ESMTP id 70F308FC1A for ; Wed, 9 Jun 2010 14:26:40 +0000 (UTC) Received: from outgoing.leidinger.net (pD954FA9A.dip.t-dialin.net [217.84.250.154]) by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id 976DB84400A for ; Wed, 9 Jun 2010 16:26:36 +0200 (CEST) Received: from webmail.leidinger.net (webmail.leidinger.net [192.168.1.102]) by outgoing.leidinger.net (Postfix) with ESMTP id 8BC035143 for ; Wed, 9 Jun 2010 16:26:30 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=Leidinger.net; s=outgoing-alex; t=1276093590; bh=3xXXXXyAjxubPmzOSt0BIzdRKaxNyf0qQLuGo44Mels=; h=Message-ID:Date:From:To:Subject:MIME-Version:Content-Type: Content-Transfer-Encoding; b=MLVA6KFMWKK+AF1hISUKUP70DzPjl6y24Syk0SwcxHtVhtETSOzJ1tmqiUf69Iln/ JNIGkGiOSv2l5mUPtO8wP7E5MFTelW8ct5HYOxnAfO/h61rCwWhr7w1+I6GC1mjdqj gAzg9E8JNvLJSoOUsHlvnV5JUYAwj2wqnTeTNn/LW24ZCZyMBHzgt4jTjFulZaJ8oh LfKI7L6nckBFea/MyvvWudaTp4j6tbNgyVZBoi5YkIHGgTEqN+bSdPfI5MGTj9pyS1 gestUg5R+A+flF/SFT/NFbsLREmL0Dbh0kNeQkjbNe4eTPhD2J/FaS2Ed0bCmRvbzL /8PwmzwPLrcxQ== Received: (from www@localhost) by webmail.leidinger.net (8.14.4/8.13.8/Submit) id o59EQSD0013312 for fs@freebsd.org; Wed, 9 Jun 2010 16:26:28 +0200 (CEST) (envelope-from Alexander@Leidinger.net) Received: from pslux.ec.europa.eu (pslux.ec.europa.eu [158.169.9.14]) by webmail.leidinger.net (Horde Framework) with HTTP; Wed, 09 Jun 2010 16:26:27 +0200 Message-ID: <20100609162627.11355zjzwnf7nj8k@webmail.leidinger.net> Date: Wed, 09 Jun 2010 16:26:27 +0200 From: Alexander Leidinger To: fs@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; DelSp="Yes"; format="flowed"
Content-Disposition: inline Content-Transfer-Encoding: 7bit User-Agent: Dynamic Internet Messaging Program (DIMP) H3 (1.1.4) X-EBL-MailScanner-Information: Please contact the ISP for more information X-EBL-MailScanner-ID: 976DB84400A.A77FA X-EBL-MailScanner: Found to be clean X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN, SpamAssassin (not cached, score=-1.023, required 6, autolearn=disabled, ALL_TRUSTED -1.00, DKIM_SIGNED 0.10, DKIM_VALID -0.10, DKIM_VALID_AU -0.10, TW_ZF 0.08) X-EBL-MailScanner-From: alexander@leidinger.net X-EBL-MailScanner-Watermark: 1276698397.30206@6t8LK8M5dwh/ciFgKfrgmQ X-EBL-Spam-Status: No Cc: Subject: Do we want a periodic script for a zfs scrub? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 09 Jun 2010 14:26:40 -0000 Hi, I noticed that we do not have an automatism to scrub a ZFS pool periodically. Is there interest in something like this, or shall I keep it local? Here's the main part of the monthly periodic script I quickly created: ---snip--- case "$monthly_scrub_zfs_enable" in [Yy][Ee][Ss]) echo echo 'Scrubbing of zfs pools:' if [ -z "${monthly_scrub_zfs_pools}" ]; then monthly_scrub_zfs_pools="$(zpool list -H -o name)" fi for pool in ${monthly_scrub_zfs_pools}; do # successful only if there is at least one pool to scrub rc=0 echo " starting scrubbing of pool '${pool}'" zpool scrub ${pool} echo " consult 'zpool status ${pool}' for the result" echo " or wait for the daily_status_zfs mail, if enabled" done ;; ---snip--- Bye, Alexander. -- Fuch's Warning: If you actually look like your passport photo, you aren't well enough to travel. http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7 http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137 From owner-freebsd-fs@FreeBSD.ORG Wed Jun 9 15:10:30 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 347D3106564A for ; Wed, 9 Jun 2010 15:10:30 +0000 (UTC) (envelope-from ticso@cicely7.cicely.de) Received: from raven.bwct.de (raven.bwct.de [85.159.14.73]) by mx1.freebsd.org (Postfix) with ESMTP id 8EC9F8FC2B for ; Wed, 9 Jun 2010 15:10:29 +0000 (UTC) Received: from mail.cicely.de ([10.1.1.37]) by raven.bwct.de (8.13.4/8.13.4) with ESMTP id o59EhwES040479 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Wed, 9 Jun 2010 16:43:58 +0200 (CEST) (envelope-from ticso@cicely7.cicely.de) Received: from cicely7.cicely.de (cicely7.cicely.de [10.1.1.9]) by mail.cicely.de (8.14.3/8.14.3) with ESMTP id o59EhtwG005367 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 9 Jun 2010 16:43:55 +0200 (CEST) (envelope-from ticso@cicely7.cicely.de) Received: from cicely7.cicely.de (localhost [127.0.0.1]) by cicely7.cicely.de (8.14.2/8.14.2) with ESMTP id o59EhtYe074772; Wed, 9 Jun 2010 16:43:55 +0200 (CEST) (envelope-from ticso@cicely7.cicely.de) Received: (from ticso@localhost) by cicely7.cicely.de (8.14.2/8.14.2/Submit) id o59EhtIr074771; Wed, 9 Jun 2010 16:43:55 +0200 (CEST) (envelope-from ticso) Date: Wed, 9 Jun 2010 16:43:55 +0200 From: Bernd Walter To: Alexander Leidinger Message-ID: <20100609144355.GL72453@cicely7.cicely.de> References: <20100609162627.11355zjzwnf7nj8k@webmail.leidinger.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: 
<20100609162627.11355zjzwnf7nj8k@webmail.leidinger.net> X-Operating-System: FreeBSD cicely7.cicely.de 7.0-STABLE i386 User-Agent: Mutt/1.5.11 X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED=-1, BAYES_00=-1.9, T_RP_MATCHES_RCVD=-0.01 autolearn=ham version=3.3.0 X-Spam-Checker-Version: SpamAssassin 3.3.0 (2010-01-18) on spamd.cicely.de Cc: fs@freebsd.org Subject: Re: Do we want a periodic script for a zfs scrub? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: ticso@cicely.de List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 09 Jun 2010 15:10:30 -0000 On Wed, Jun 09, 2010 at 04:26:27PM +0200, Alexander Leidinger wrote: > Hi, > > I noticed that we do not have an automatism to scrub a ZFS pool > periodically. Is there interest in something like this, or shall I > keep it local? For me, scrubbing takes several days even though my pool is not especially big, and starting another scrub restarts everything. You should at least check whether another one is still running. I think resilvering is also a collision case to check for. > Here's the main part of the monthly periodic script I quickly created: > ---snip--- > case "$monthly_scrub_zfs_enable" in > [Yy][Ee][Ss]) > echo > echo 'Scrubbing of zfs pools:' > > if [ -z "${monthly_scrub_zfs_pools}" ]; then > monthly_scrub_zfs_pools="$(zpool list -H -o name)" > fi > > for pool in ${monthly_scrub_zfs_pools}; do > # successful only if there is at least one pool to scrub > rc=0 > > echo " starting scrubbing of pool '${pool}'" > zpool scrub ${pool} > echo " consult 'zpool status ${pool}' for the result" > echo " or wait for the daily_status_zfs mail, if > enabled" > done > ;; > ---snip--- -- B.Walter http://www.bwct.de Modbus/TCP Ethernet I/O modules, ARM-based FreeBSD machines, and more.
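A minimal sketch of the guard Bernd asks for, assuming the "scrub in progress" / "resilver in progress" strings that zpool status prints on this vintage of ZFS; it skips the pool rather than restarting a scrub that is already underway:

---snip---
#!/bin/sh
# scrub_guard.sh <pool> -- start a scrub only if the pool is idle
pool="$1"
if zpool status "${pool}" | grep -qE '(scrub|resilver) in progress'; then
        echo "pool '${pool}' is already scrubbing or resilvering, skipping"
else
        zpool scrub "${pool}"
fi
---snip---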
From owner-freebsd-fs@FreeBSD.ORG Wed Jun 9 15:12:48 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 10A21106566B; Wed, 9 Jun 2010 15:12:48 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 9C48B8FC12; Wed, 9 Jun 2010 15:12:47 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AvsEAPJLD0yDaFvK/2dsb2JhbACeS3G+HIUYBA X-IronPort-AV: E=Sophos;i="4.53,391,1272859200"; d="scan'208";a="79387669" Received: from fraser.cs.uoguelph.ca ([131.104.91.202]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 09 Jun 2010 11:12:44 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by fraser.cs.uoguelph.ca (Postfix) with ESMTP id C0744109C2C9; Wed, 9 Jun 2010 11:12:45 -0400 (EDT) X-Virus-Scanned: amavisd-new at fraser.cs.uoguelph.ca Received: from fraser.cs.uoguelph.ca ([127.0.0.1]) by localhost (fraser.cs.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id n5r7430M-ZgK; Wed, 9 Jun 2010 11:12:45 -0400 (EDT) Received: from muncher.cs.uoguelph.ca (muncher.cs.uoguelph.ca [131.104.91.102]) by fraser.cs.uoguelph.ca (Postfix) with ESMTP id 04749109C327; Wed, 9 Jun 2010 11:12:45 -0400 (EDT) Received: from localhost (rmacklem@localhost) by muncher.cs.uoguelph.ca (8.11.7p3+Sun/8.11.6) with ESMTP id o59FSqn27257; Wed, 9 Jun 2010 11:28:52 -0400 (EDT) X-Authentication-Warning: muncher.cs.uoguelph.ca: rmacklem owned process doing -bs Date: Wed, 9 Jun 2010 11:28:52 -0400 (EDT) From: Rick Macklem X-X-Sender: rmacklem@muncher.cs.uoguelph.ca To: Anders Nordby In-Reply-To: <20100609122517.GA16231@fupp.net> Message-ID: References: <20100608083649.GA77452@fupp.net> <20100609122517.GA16231@fupp.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@FreeBSD.org Subject: Re: Odd network issues on ZFS based NFS server X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 09 Jun 2010 15:12:48 -0000 On Wed, 9 Jun 2010, Anders Nordby wrote: > > Thanks. The only thing that (temporarily) solves this issue so far is > rebooting, which helps only for a day or so. I have tried different > NICs, replacing the physical server, replacing cables, changing and > resetting switch ports. But it did not help, so I think this is a > software problem. I will try zio_use_uma = 0 I think, and then try to > limit vfs.zfs.arc_max to 100 MB or so. > When you tried a different NIC, was it a different type (i.e., a different chipset that uses a different device driver)? I suggested that not because I thought the hardware was broken, but because I thought it might be related to the network interface's device driver, and switching to a different device driver would isolate that possibility.
> On the ZFS+NFS server while having these issues: > > root@unixfile:~# netstat -m > 1293/4602/5895 mbufs in use (current/cache/total) > 1109/3619/4728/65536 mbuf clusters in use (current/cache/total/max) > 257/1023 mbuf+clusters out of packet secondary zone in use > (current/cache) > 0/104/104/12800 4k (page size) jumbo clusters in use > (current/cache/total/max) > 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max) > 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max) > 2541K/8804K/11345K bytes allocated to network (current/cache/total) > 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters) > 0/0/0 requests for jumbo clusters denied (4k/9k/16k) > 0/0/0 sfbufs in use (current/peak/max) > 0 requests for sfbufs denied > 0 requests for sfbufs delayed > 0 requests for I/O initiated by sendfile > 0 calls to protocol drain routines > > Packet loss seen from my workstation: > > anders@noname:~$ ping unixfile > PING unixfile.aftenposten.no (192.168.120.33) 56(84) bytes of data. > 64 bytes from unixfile.aftenposten.no (192.168.120.33): icmp_seq=1 > ttl=63 time=0 > .230 ms > 64 bytes from unixfile.aftenposten.no (192.168.120.33): icmp_seq=3 > ttl=63 time=0 > .262 ms > 64 bytes from unixfile.aftenposten.no (192.168.120.33): icmp_seq=5 > ttl=63 time=0 > .272 ms > 64 bytes from unixfile.aftenposten.no (192.168.120.33): icmp_seq=6 > ttl=63 time=0 > .203 ms > 64 bytes from unixfile.aftenposten.no (192.168.120.33): icmp_seq=7 > ttl=63 time=0 > .306 ms > 64 bytes from unixfile.aftenposten.no (192.168.120.33): icmp_seq=9 > ttl=63 time=0 > .309 ms Well, it doesn't seem to be mbuf exhaustion (I don't know what "out of packet secondary zone" means, I'll have to look at that) and if it doesn't handle pings it seems really hosed. Have you done a "vmstat 5" + "ps axlH" (or similar) to try and see what it's doing? ("top" and "netstat" might also help?) If you can figure out where it's spinning its wheels, that might at least give us a hint w.r.t. the problem. 
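A small sketch of capturing those snapshots while the packet loss is happening (the directory and file names are arbitrary):

---snip---
#!/bin/sh
# Dump a few diagnostics into a timestamped directory for later comparison.
d="/var/tmp/diag.$(date +%Y%m%d-%H%M%S)"
mkdir -p "${d}"
vmstat 5 5 > "${d}/vmstat.txt" 2>&1
ps axlH > "${d}/ps.txt" 2>&1
netstat -m > "${d}/netstat-m.txt" 2>&1
top -b -d 1 > "${d}/top.txt" 2>&1
---snip---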
Good luck with it, rick From owner-freebsd-fs@FreeBSD.ORG Wed Jun 9 15:23:15 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1FEF31065672 for ; Wed, 9 Jun 2010 15:23:15 +0000 (UTC) (envelope-from jhellenthal@gmail.com) Received: from mail-vw0-f54.google.com (mail-vw0-f54.google.com [209.85.212.54]) by mx1.freebsd.org (Postfix) with ESMTP id C25378FC19 for ; Wed, 9 Jun 2010 15:23:14 +0000 (UTC) Received: by vws1 with SMTP id 1so1303981vws.13 for ; Wed, 09 Jun 2010 08:23:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:sender:message-id:date:from :user-agent:mime-version:to:cc:subject:references:in-reply-to :x-enigmail-version:openpgp:content-type:content-transfer-encoding; bh=svHz9HjkzfVMAY2N38DZeqY4A5nlEmdWy//f2aHo5jE=; b=bXGj9gInM60USS1xy/0Ltf2IyjwuKpv/fGaCqXlarkR11Ab23whtpX35bBYpto5b6O lG0TgaFJy5CrKfK2capWutsUW790nqes/8BE015Hm4sC0XWTJAhZ31n/Mnkz5xwi5yA4 pR7+R21bu49mtrUZPyaXwoQLKlAkTS6HMVil0= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:x-enigmail-version:openpgp:content-type :content-transfer-encoding; b=F4KeFOIqllciuu28djKHrqIY+CyBgmz2rQA1Lzj/U0zJd2e4NF0nM625mTj4p1yDdT hahRdEbUpjejDhqHD8Zst0CdkGSfVCKN9vlpaJA/T35GmDYKrrF3zHt3wyGaD9MYSU/Y pjNaPc6ELXnqL9XY3T1Lrdyedu2mhAOkEMWzQ= Received: by 10.224.64.76 with SMTP id d12mr2572727qai.208.1276096992919; Wed, 09 Jun 2010 08:23:12 -0700 (PDT) Received: from centel.dataix.local (adsl-99-181-128-180.dsl.klmzmi.sbcglobal.net [99.181.128.180]) by mx.google.com with ESMTPS id m29sm9513928qck.16.2010.06.09.08.23.11 (version=SSLv3 cipher=RC4-MD5); Wed, 09 Jun 2010 08:23:12 -0700 (PDT) Sender: "J. Hellenthal" Message-ID: <4C0FB1DE.9080508@dataix.net> Date: Wed, 09 Jun 2010 11:23:10 -0400 From: jhell User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.9.1.9) Gecko/20100515 Thunderbird MIME-Version: 1.0 To: Alexander Leidinger References: <20100609162627.11355zjzwnf7nj8k@webmail.leidinger.net> <4C0FAE2A.7050103@dataix.net> In-Reply-To: <4C0FAE2A.7050103@dataix.net> X-Enigmail-Version: 1.0.1 OpenPGP: id=89D8547E Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: fs@freebsd.org Subject: Re: Do we want a periodic script for a zfs scrub? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 09 Jun 2010 15:23:15 -0000 On 06/09/2010 11:07, jhell wrote: > On 06/09/2010 10:26, Alexander Leidinger wrote: >> Hi, >> >> I noticed that we do not have an automatism to scrub a ZFS pool >> periodically. Is there interest in something like this, or shall I keep >> it local? 
>> Here's the main part of the monthly periodic script I quickly created: >> ---snip--- >> case "$monthly_scrub_zfs_enable" in >> [Yy][Ee][Ss]) >> echo >> echo 'Scrubbing of zfs pools:' >> >> if [ -z "${monthly_scrub_zfs_pools}" ]; then >> monthly_scrub_zfs_pools="$(zpool list -H -o name)" >> fi >> >> for pool in ${monthly_scrub_zfs_pools}; do >> # successful only if there is at least one pool to scrub >> rc=0 >> >> echo " starting scrubbing of pool '${pool}'" >> zpool scrub ${pool} >> echo " consult 'zpool status ${pool}' for the result" >> echo " or wait for the daily_status_zfs mail, if >> enabled" >> done >> ;; >> ---snip--- >> >> Bye, >> Alexander. >> > > Please add a check to see if any resilvering is being done on the pool > that the scrub is being executed on (just in case); I would hope that > the scrub would fail silently in this case. > > Please also check whether a scrub is already running on one of the pools, > and if so, and another pool exists, start a background loop to wait for the > first scrub to finish, or die silently. > > I had a scrub fully restart from calling scrub a second time after being > more than 50% complete; it's frustrating. > > > Thanks!, > I should probably suggest one check that comes to mind. zpool history ${pool} | grep scrub | tail -1 | cut -f1 -d. Then compare the output with today's date to make sure today is >= 30 days from the date of the last scrub. With the above this could be turned into a daily_zfs_scrub_enable with a default daily_zfs_scrub_threshold="30", ensuring that if one check is missed it will not take another 30 days to run the check again. Food for thought. Thanks!, -- jhell From owner-freebsd-fs@FreeBSD.ORG Wed Jun 9 15:31:19 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5E23B1065673 for ; Wed, 9 Jun 2010 15:31:19 +0000 (UTC) (envelope-from jhellenthal@gmail.com) Received: from mail-vw0-f54.google.com (mail-vw0-f54.google.com [209.85.212.54]) by mx1.freebsd.org (Postfix) with ESMTP id 09AD18FC12 for ; Wed, 9 Jun 2010 15:31:18 +0000 (UTC) Received: by vws1 with SMTP id 1so1314192vws.13 for ; Wed, 09 Jun 2010 08:31:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:sender:message-id:date:from :user-agent:mime-version:to:cc:subject:references:in-reply-to :x-enigmail-version:openpgp:content-type:content-transfer-encoding; bh=rw4sqFwf1+DasB+/2YgKSYCW/vvQGWmyymVOqNtTR6s=; b=GQ9IXn1xtQeJk0Hn5vYXN6+jdWrGh+dDX2AoQmrR+homPOp6Xg/TzM9DWpyFQZd3fq VhBo4san6crvbzpgsksiCGUniUedbbTqmDvaNC/c/H4j7YurK5K7F7VNYYqJcEyZDlsV PNzjLixQfJmFgbi6TEK8cl8ywwJ122fjhQ3Y8= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:x-enigmail-version:openpgp:content-type :content-transfer-encoding; b=b0UsKjTTc18Bd7hqgrpOjxyXNR2N76ANRNrFWaVIpboM+FdXcRnmorRpEIIKccX2aw oTfDYizMzSt0jGvGDUEgLFXLQbq16hSda5RbwhJwO2sqZ6zCdhat5LwZZeEmuBlmKkiD A2BMeoXMI+iLxc/L3nqKQG4hhjvDF4M7DOJ0Q= Received: by 10.224.26.154 with SMTP id e26mr2616905qac.247.1276096046379; Wed, 09 Jun 2010 08:07:26 -0700 (PDT) Received: from centel.dataix.local (adsl-99-181-128-180.dsl.klmzmi.sbcglobal.net [99.181.128.180]) by mx.google.com with ESMTPS id i10sm9462951qcb.23.2010.06.09.08.07.23 (version=SSLv3 cipher=RC4-MD5); Wed, 09 Jun 2010 08:07:24 -0700 (PDT) Sender: "J.
Hellenthal" Message-ID: <4C0FAE2A.7050103@dataix.net> Date: Wed, 09 Jun 2010 11:07:22 -0400 From: jhell User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.9.1.9) Gecko/20100515 Thunderbird MIME-Version: 1.0 To: Alexander Leidinger References: <20100609162627.11355zjzwnf7nj8k@webmail.leidinger.net> In-Reply-To: <20100609162627.11355zjzwnf7nj8k@webmail.leidinger.net> X-Enigmail-Version: 1.0.1 OpenPGP: id=89D8547E Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: fs@freebsd.org Subject: Re: Do we want a periodic script for a zfs scrub? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 09 Jun 2010 15:31:19 -0000 On 06/09/2010 10:26, Alexander Leidinger wrote: > Hi, > > I noticed that we do not have an automatism to scrub a ZFS pool > periodically. Is there interest in something like this, or shall I keep > it local? > > Here's the main part of the monthly periodic script I quickly created: > ---snip--- > case "$monthly_scrub_zfs_enable" in > [Yy][Ee][Ss]) > echo > echo 'Scrubbing of zfs pools:' > > if [ -z "${monthly_scrub_zfs_pools}" ]; then > monthly_scrub_zfs_pools="$(zpool list -H -o name)" > fi > > for pool in ${monthly_scrub_zfs_pools}; do > # successful only if there is at least one pool to scrub > rc=0 > > echo " starting scrubbing of pool '${pool}'" > zpool scrub ${pool} > echo " consult 'zpool status ${pool}' for the result" > echo " or wait for the daily_status_zfs mail, if > enabled" > done > ;; > ---snip--- > > Bye, > Alexander. > Please add a check to see if any resilerving is being done on the pool that the scub is being executed on. (Just in case), I would hope that the scrub would fail silently in this case. Please also check whether a scrub is already running on one of the pools and if so & another pool exists start a background loop to wait for the first scrub to finish or die silently. I had a scrub fully restart from calling scrub a second time after being more than 50% complete, its frustrating. 
Thanks!, -- jhell From owner-freebsd-fs@FreeBSD.ORG Wed Jun 9 23:22:17 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 67D12106566C for ; Wed, 9 Jun 2010 23:22:17 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 1E6748FC17 for ; Wed, 9 Jun 2010 23:22:16 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AlcHADO/D0yDaFvI/2dsb2JhbACSSAEBjBJxv1KFGAQ X-IronPort-AV: E=Sophos;i="4.53,395,1272859200"; d="scan'208";a="80144053" Received: from darling.cs.uoguelph.ca ([131.104.91.200]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 09 Jun 2010 19:22:14 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by darling.cs.uoguelph.ca (Postfix) with ESMTP id 0FC0E940138 for ; Wed, 9 Jun 2010 19:22:16 -0400 (EDT) X-Virus-Scanned: amavisd-new at darling.cs.uoguelph.ca Received: from darling.cs.uoguelph.ca ([127.0.0.1]) by localhost (darling.cs.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id zMehGE+SBbU3 for ; Wed, 9 Jun 2010 19:22:15 -0400 (EDT) Received: from muncher.cs.uoguelph.ca (muncher.cs.uoguelph.ca [131.104.91.102]) by darling.cs.uoguelph.ca (Postfix) with ESMTP id 1B5D29400E6 for ; Wed, 9 Jun 2010 19:22:15 -0400 (EDT) Received: from localhost (rmacklem@localhost) by muncher.cs.uoguelph.ca (8.11.7p3+Sun/8.11.6) with ESMTP id o59NcOC23052 for ; Wed, 9 Jun 2010 19:38:24 -0400 (EDT) X-Authentication-Warning: muncher.cs.uoguelph.ca: rmacklem owned process doing -bs Date: Wed, 9 Jun 2010 19:38:24 -0400 (EDT) From: Rick Macklem X-X-Sender: rmacklem@muncher.cs.uoguelph.ca To: freebsd-fs@freebsd.org Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Subject: Testers: NFSv3 support for pxeboot for nfs diskless root X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 09 Jun 2010 23:22:17 -0000 I put 3 patches (you need to apply them all) here: http://people.freebsd.org/~rmacklem/nfsdiskless-patches/ They convert lib/libstand/nfs.c and pxeboot to use NFSv3 instead of NFSv2 (unless built with OLD_NFSV2 defined). Initial test reports have been good. 
(one has it working ok and the other has a problem in an area not related to the patches, it appears) So, if others are interested in testing these, it would be appreciated, rick From owner-freebsd-fs@FreeBSD.ORG Thu Jun 10 08:17:16 2010 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8C2421065672 for ; Thu, 10 Jun 2010 08:17:16 +0000 (UTC) (envelope-from peterjeremy@acm.org) Received: from mail10.syd.optusnet.com.au (mail10.syd.optusnet.com.au [211.29.132.191]) by mx1.freebsd.org (Postfix) with ESMTP id A86A08FC16 for ; Thu, 10 Jun 2010 08:17:14 +0000 (UTC) Received: from server.vk2pj.dyndns.org (c211-30-160-13.mirnd2.nsw.optusnet.com.au [211.30.160.13] (may be forged)) by mail10.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id o5A8HBce013897 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 10 Jun 2010 18:17:12 +1000 X-Bogosity: Ham, spamicity=0.000000 Received: from server.vk2pj.dyndns.org (localhost.vk2pj.dyndns.org [127.0.0.1]) by server.vk2pj.dyndns.org (8.14.4/8.14.4) with ESMTP id o5A8HAgm064684; Thu, 10 Jun 2010 18:17:10 +1000 (EST) (envelope-from peter@server.vk2pj.dyndns.org) Received: (from peter@localhost) by server.vk2pj.dyndns.org (8.14.4/8.14.4/Submit) id o5A8HAFU064683; Thu, 10 Jun 2010 18:17:10 +1000 (EST) (envelope-from peter) Date: Thu, 10 Jun 2010 18:17:10 +1000 From: Peter Jeremy To: Anders Nordby Message-ID: <20100610081710.GA64350@server.vk2pj.dyndns.org> References: <20100608083649.GA77452@fupp.net> <20100609122517.GA16231@fupp.net> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="82I3+IH0IqGh5yIs" Content-Disposition: inline In-Reply-To: <20100609122517.GA16231@fupp.net> X-PGP-Key: http://members.optusnet.com.au/peterjeremy/pubkey.asc User-Agent: Mutt/1.5.20 (2009-06-14) Cc: freebsd-fs@FreeBSD.org Subject: Re: Odd network issues on ZFS based NFS server X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Jun 2010 08:17:16 -0000

On 2010-Jun-09 14:25:17 +0200, Anders Nordby wrote:
>Thanks. The only thing that (temporarily) solves this issue so far is
>rebooting, which helps only for a day or so. I have tried different
>NICs, replacing the physical server, replacing cables, changing and
>resetting switch ports. But it did not help, so I think this is a
>software problem. I will try zio_use_uma = 0 I think, and then try to
>limit vfs.zfs.arc_max to 100 MB or so.

I wonder if your system is running out of free RAM. How would you like to monitor "inactive", "cache" and "free" from either "systat -v" or "vmstat -s" whilst the problem is occurring.

Does something like

perl -e '$x = "x" x 10000000;'

temporarily correct the problem?
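If watching the curses display is awkward, the same three counters can also be polled directly; a minimal sketch, assuming the stock vm.stats sysctls (note they report page counts, not bytes):

---snip---
# print inactive/cache/free page counts every 5 seconds
while :; do
    sysctl -n vm.stats.vm.v_inactive_count \
        vm.stats.vm.v_cache_count \
        vm.stats.vm.v_free_count
    sleep 5
done
---snip---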
-- 
Peter Jeremy

From owner-freebsd-fs@FreeBSD.ORG Thu Jun 10 09:23:53 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1FEE2106564A for ; Thu, 10 Jun 2010 09:23:53 +0000 (UTC) (envelope-from alexander@leidinger.net) Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de [217.11.53.44]) by mx1.freebsd.org (Postfix) with ESMTP id BE34B8FC17 for ; Thu, 10 Jun 2010 09:23:52 +0000 (UTC) Received: from outgoing.leidinger.net (pD954FE15.dip.t-dialin.net [217.84.254.21]) by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id D938484400A; Thu, 10 Jun 2010 11:23:48 +0200 (CEST) Received: from webmail.leidinger.net (webmail.leidinger.net [192.168.1.102]) by outgoing.leidinger.net (Postfix) with ESMTP id 0C919510B; Thu, 10 Jun 2010 11:23:46 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=Leidinger.net; s=outgoing-alex; t=1276161826; bh=Dcxo6zVx6ibwiiWk7SE1m5A5gJR3uG18MmLUn7x7DGM=; h=Message-ID:Date:From:To:Cc:Subject:References:In-Reply-To: MIME-Version:Content-Type:Content-Transfer-Encoding; b=HG/1+XO8bHWMMmYgcTpiCBQO1GfHtreM/wGKsZ9whWluJQqvSKxbHENo26HjKMpQ5 SN7j7nZ8mlCfaNE3Uc6koloroMrfHw8XaCYsqfJffq60gGpS6rltyXEeH9CmdUeQue UIsRlVOlbnxbDIO1RkqDU8aK+MalM5JhDoUJeIK0IKNKi9y0e8vJTZPH9Se2pI2SaS Pm1MwHUK+ILG9uMzxCRHoKku34i/uAz9K8S5RhwrDKr//9ztJ/05LrITgFgnHrVojU Wd2IN7gjkTGnqM+ITCWHg2XOg19OU5vgNzTW0lb7aXGAZEPHp5slsJ+7KTmXNaJmUr onAraupNhwVeQ== Received: (from www@localhost) by webmail.leidinger.net (8.14.4/8.13.8/Submit) id o5A9Njiw090741; Thu, 10 Jun 2010 11:23:45 +0200 (CEST) (envelope-from Alexander@Leidinger.net) Received: from pslux.ec.europa.eu (pslux.ec.europa.eu [158.169.9.14]) by webmail.leidinger.net (Horde Framework) with HTTP; Thu, 10 Jun 2010 11:23:45 +0200 Message-ID: <20100610112345.644960lrau3mxfk0@webmail.leidinger.net> Date: Thu, 10 Jun 2010 11:23:45 +0200 From: Alexander Leidinger To: ticso@cicely.de References: <20100609162627.11355zjzwnf7nj8k@webmail.leidinger.net> <20100609144355.GL72453@cicely7.cicely.de> In-Reply-To: <20100609144355.GL72453@cicely7.cicely.de> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; DelSp="Yes"; format="flowed" Content-Disposition: inline Content-Transfer-Encoding: 7bit User-Agent: Dynamic Internet Messaging Program (DIMP) H3 (1.1.4) X-EBL-MailScanner-Information: Please contact the ISP for more information X-EBL-MailScanner-ID: D938484400A.A6C5E X-EBL-MailScanner: Found to be clean X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN, SpamAssassin (not cached, score=-0.423, required 6, autolearn=disabled, ALL_TRUSTED -1.00, DKIM_SIGNED 0.10, DKIM_VALID -0.10, DKIM_VALID_AU -0.10, J_CHICKENPOX_53 0.60, TW_ZF 0.08) X-EBL-MailScanner-From: alexander@leidinger.net X-EBL-MailScanner-Watermark: 1276766630.67689@y3jyMiMwQ2yXop2mseZ7gg X-EBL-Spam-Status: No Cc: fs@FreeBSD.org Subject: Re: Do we want a periodic script for a zfs scrub?
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Jun 2010 09:23:53 -0000 Quoting Bernd Walter (from Wed, 9 Jun 2010 16:43:55 +0200): > On Wed, Jun 09, 2010 at 04:26:27PM +0200, Alexander Leidinger wrote: >> Hi, >> >> I noticed that we do not have an automatism to scrub a ZFS pool >> periodically. Is there interest in something like this, or shall I >> keep it local? > > For me scrub'ing takes several days without having a special big > pool size and starting another scrub restarts everything. > You should at least check if another one is still running. Good point, I will have a look at this... But I'm a little bit surprised, when I scrub a pool of 3 times 250 GB disks in RAIDZ configuration, it is finished fast (a fraction of a day... maybe an hour or two). Initially it displays a very long time (>400 hours), but this is reducing after a while drastically. The pool is filled up to 3/4 of the entire capacity. > I think resilvering is also a collision case to check for. No. Resilvering has higher priority than a scrub. From the man-page: ---snip--- If a resilver is in progress, ZFS does not allow a scrub to be started until the resilver completes. ---snip--- Bye, Alexander. -- Fairy Tale, n.: A horror story to prepare children for the newspapers. http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7 http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137 From owner-freebsd-fs@FreeBSD.ORG Thu Jun 10 09:27:17 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AB3C3106567C for ; Thu, 10 Jun 2010 09:27:17 +0000 (UTC) (envelope-from alexander@leidinger.net) Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de [217.11.53.44]) by mx1.freebsd.org (Postfix) with ESMTP id 55A378FC1B for ; Thu, 10 Jun 2010 09:27:17 +0000 (UTC) Received: from outgoing.leidinger.net (pD954FE15.dip.t-dialin.net [217.84.254.21]) by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id 3FE2584400A; Thu, 10 Jun 2010 11:27:14 +0200 (CEST) Received: from webmail.leidinger.net (webmail.leidinger.net [192.168.1.102]) by outgoing.leidinger.net (Postfix) with ESMTP id 684E1510C; Thu, 10 Jun 2010 11:27:11 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=Leidinger.net; s=outgoing-alex; t=1276162031; bh=kFfA/Kv+EvB0dZDuJIUymozwCEMUx49yqdzcGWPdi2I=; h=Message-ID:Date:From:To:Cc:Subject:References:In-Reply-To: MIME-Version:Content-Type:Content-Transfer-Encoding; b=NzjKK/B1LT7NGS/rjUc/gIQg5ce6hLwb/Gvf8JqXMNVZVhJuLQ8N+/JeQITNw56IJ JipT2RVHiA2al+m3jsZmVl57S9HY4HZWDiEJTrJrW7KoE+ipFKQtWjmlH/W2hBRbcn WGUwdOmJkmeYVF76rn7bKVk+edMY9AENfVq28XaOefnM7MAdPVq0nWIKyUiiia2lhl 5EHN2gT6DhFOX4vvr61JrcOL8Qj52Y9d4cltwb0hU+MTp5HNKvd7bY3WvBIBAcv6iO JIfGr27csMKq+kha4iPODfmdQpauJyh9XqM+iDZOUXWj1fY9yEfo099HCKnvMmy7pI Q/cYRn1ln15wQ== Received: (from www@localhost) by webmail.leidinger.net (8.14.4/8.13.8/Submit) id o5A9RADE091502; Thu, 10 Jun 2010 11:27:10 +0200 (CEST) (envelope-from Alexander@Leidinger.net) Received: from pslux.ec.europa.eu (pslux.ec.europa.eu [158.169.9.14]) by webmail.leidinger.net (Horde Framework) with HTTP; Thu, 10 Jun 2010 11:27:10 +0200 Message-ID: <20100610112710.20215zznvaqdai88@webmail.leidinger.net> Date: Thu, 10 Jun 2010 11:27:10 +0200 From: Alexander Leidinger To: jhell References: 
<20100609162627.11355zjzwnf7nj8k@webmail.leidinger.net> <4C0FAE2A.7050103@dataix.net> In-Reply-To: <4C0FAE2A.7050103@dataix.net> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; DelSp="Yes"; format="flowed" Content-Disposition: inline Content-Transfer-Encoding: 7bit User-Agent: Dynamic Internet Messaging Program (DIMP) H3 (1.1.4) X-EBL-MailScanner-Information: Please contact the ISP for more information X-EBL-MailScanner-ID: 3FE2584400A.A5FD7 X-EBL-MailScanner: Found to be clean X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN, SpamAssassin (not cached, score=-1.023, required 6, autolearn=disabled, ALL_TRUSTED -1.00, DKIM_SIGNED 0.10, DKIM_VALID -0.10, DKIM_VALID_AU -0.10, TW_ZF 0.08) X-EBL-MailScanner-From: alexander@leidinger.net X-EBL-MailScanner-Watermark: 1276766835.33222@eRKmZy5hXQLo6fdhk/1uog X-EBL-Spam-Status: No Cc: fs@freebsd.org Subject: Re: Do we want a periodic script for a zfs scrub? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Jun 2010 09:27:17 -0000

Quoting jhell (from Wed, 09 Jun 2010 11:07:22 -0400):

> Please add a check to see if any resilvering is being done on the pool
> that the scrub is being executed on. (Just in case), I would hope that
> the scrub would fail silently in this case.

It does. No need to check for the resilvering.

> Please also check whether a scrub is already running on one of the pools,
> and if so & another pool exists, start a background loop to wait for the
> first scrub to finish or die silently.

I do not want a background job running forever in the periodic script. If a scrub is in progress, it should print the fact and do nothing (a second scrub right after the running one finishes is superfluous). I will have a look at this.

Bye,
Alexander.

-- 
You look tired.
http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7 http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137 From owner-freebsd-fs@FreeBSD.ORG Thu Jun 10 09:32:49 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A591B1065673 for ; Thu, 10 Jun 2010 09:32:49 +0000 (UTC) (envelope-from admin@kkip.pl) Received: from mainframe.kkip.pl (kkip.pl [87.105.164.78]) by mx1.freebsd.org (Postfix) with ESMTP id 15E898FC16 for ; Thu, 10 Jun 2010 09:32:48 +0000 (UTC) Received: from static-78-8-144-74.ssp.dialog.net.pl ([78.8.144.74] helo=[192.168.0.2]) by mainframe.kkip.pl with esmtpsa (TLSv1:CAMELLIA256-SHA:256) (Exim 4.71 (FreeBSD)) (envelope-from ) id 1OMe7s-000Kho-9n for freebsd-fs@freebsd.org; Thu, 10 Jun 2010 11:32:47 +0200 Message-ID: <4C10B136.3030404@kkip.pl> Date: Thu, 10 Jun 2010 11:32:38 +0200 From: Bartosz Stec User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1.9) Gecko/20100406 Shredder/3.0.4 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <20100609162627.11355zjzwnf7nj8k@webmail.leidinger.net> In-Reply-To: <20100609162627.11355zjzwnf7nj8k@webmail.leidinger.net> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Authenticated-User: admin@kkip.pl X-Authenticator: plain X-Sender-Verify: SUCCEEDED (sender exists & accepts mail) X-Spam-Score: -8.1 X-Spam-Score-Int: -80 X-Exim-Version: 4.71 (build at 02-Feb-2010 20:10:28) X-Date: 2010-06-10 11:32:47 X-Connected-IP: 78.8.144.74:63299 X-Message-Linecount: 58 X-Body-Linecount: 46 X-Message-Size: 1943 X-Body-Size: 1400 X-Received-Count: 1 X-Recipient-Count: 1 X-Local-Recipient-Count: 1 X-Local-Recipient-Defer-Count: 0 X-Local-Recipient-Fail-Count: 0 Subject: Re: Do we want a periodic script for a zfs scrub? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Jun 2010 09:32:49 -0000

On 2010-06-09 16:26, Alexander Leidinger wrote:
> Hi,
>
> I noticed that we do not have an automatism to scrub a ZFS pool
> periodically. Is there interest in something like this, or shall I
> keep it local?
>
> Here's the main part of the monthly periodic script I quickly created:
> ---snip---
> case "$monthly_scrub_zfs_enable" in
> [Yy][Ee][Ss])
>     echo
>     echo 'Scrubbing of zfs pools:'
>
>     if [ -z "${monthly_scrub_zfs_pools}" ]; then
>         monthly_scrub_zfs_pools="$(zpool list -H -o name)"
>     fi
>
>     for pool in ${monthly_scrub_zfs_pools}; do
>         # successful only if there is at least one pool to scrub
>         rc=0
>
>         echo "   starting scrubbing of pool '${pool}'"
>         zpool scrub ${pool}
>         echo "   consult 'zpool status ${pool}' for the result"
>         echo "   or wait for the daily_status_zfs mail, if enabled"
>     done
>     ;;
> ---snip---
>
> Bye,
> Alexander.

Ross-at-neces-dot-com already did what you're searching for. I've been using his periodic scripts for some months now; check here: http://www.neces.com/blog/technology/integrating-freebsd-zfs-and-periodic-snapshots-and-scrubs. They do all the necessary stuff, like checking for a scrub in progress, too. Hope you'll find them helpful.
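Whichever variant is used, enabling it comes down to a couple of periodic.conf knobs; for the script quoted above, the (hypothetical) settings would be:

---snip---
# /etc/periodic.conf
monthly_scrub_zfs_enable="YES"
# optional: limit scrubbing to specific pools; empty means all pools.
# "tank" and "backup" are example pool names, not defaults.
monthly_scrub_zfs_pools="tank backup"
---snip---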
Cheers :) -- Bartosz Stec From owner-freebsd-fs@FreeBSD.ORG Thu Jun 10 09:41:17 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 353B9106566B for ; Thu, 10 Jun 2010 09:41:17 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta02.westchester.pa.mail.comcast.net (qmta02.westchester.pa.mail.comcast.net [76.96.62.24]) by mx1.freebsd.org (Postfix) with ESMTP id EBCFA8FC13 for ; Thu, 10 Jun 2010 09:41:16 +0000 (UTC) Received: from omta05.westchester.pa.mail.comcast.net ([76.96.62.43]) by qmta02.westchester.pa.mail.comcast.net with comcast id U9Na1e0050vyq2s529U1M7; Thu, 10 Jun 2010 09:28:01 +0000 Received: from koitsu.dyndns.org ([98.248.46.159]) by omta05.westchester.pa.mail.comcast.net with comcast id U9Tz1e0053S48mS3R9U0h2; Thu, 10 Jun 2010 09:28:01 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 6C8BA9B418; Thu, 10 Jun 2010 02:27:58 -0700 (PDT) Date: Thu, 10 Jun 2010 02:27:58 -0700 From: Jeremy Chadwick To: Alexander Leidinger Message-ID: <20100610092758.GA67752@icarus.home.lan> References: <20100609162627.11355zjzwnf7nj8k@webmail.leidinger.net> <20100609144355.GL72453@cicely7.cicely.de> <20100610112345.644960lrau3mxfk0@webmail.leidinger.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20100610112345.644960lrau3mxfk0@webmail.leidinger.net> User-Agent: Mutt/1.5.20 (2009-06-14) Cc: ticso@cicely.de, fs@FreeBSD.org Subject: Re: Do we want a periodic script for a zfs scrub? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Jun 2010 09:41:17 -0000 On Thu, Jun 10, 2010 at 11:23:45AM +0200, Alexander Leidinger wrote: > But I'm a little bit surprised, when I scrub a pool of 3 times 250 > GB disks in RAIDZ configuration, it is finished fast (a fraction of > a day... maybe an hour or two). Initially it displays a very long > time (>400 hours), but this is reducing after a while drastically. For what it's worth, Solaris does the exact same thing (initially shows a very long duration, which keeps getting longer, but then reduces after some time and begins catching up quickly). It didn't originally behave this way (on FreeBSD or Solaris) so there's probably a justified reason for it. -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. 
PGP: 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Thu Jun 10 09:53:32 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DF6E01065677 for ; Thu, 10 Jun 2010 09:53:32 +0000 (UTC) (envelope-from alexander@leidinger.net) Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de [217.11.53.44]) by mx1.freebsd.org (Postfix) with ESMTP id 8756A8FC1C for ; Thu, 10 Jun 2010 09:53:32 +0000 (UTC) Received: from outgoing.leidinger.net (pD954FE15.dip.t-dialin.net [217.84.254.21]) by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id 2B0D284400A; Thu, 10 Jun 2010 11:53:27 +0200 (CEST) Received: from webmail.leidinger.net (webmail.leidinger.net [192.168.1.102]) by outgoing.leidinger.net (Postfix) with ESMTP id 4ED045110; Thu, 10 Jun 2010 11:53:24 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=Leidinger.net; s=outgoing-alex; t=1276163604; bh=NtdeI0CCaxYciry4aAL6cPV4BSDW65K80po0wn2HWok=; h=Message-ID:Date:From:To:Cc:Subject:References:In-Reply-To: MIME-Version:Content-Type:Content-Transfer-Encoding; b=XnN6P8OFET/smuwvY9i/+sYNptWyC2gll0vdxVWubLVHDvMFM6lAARw8Ekpx3ReDd SinQCyNxtHknjL4+NRHz5fysrqkd+OIZwE87ZDCbVQhMQL689uPFpOEwSwG1h8OAPi +wCYZVbT677YU/LbGJx2OgAqSN/3/ZFMYacbDjEgXblSO7y/TogCv2qGkFBpzLjUGx xpYioa0vk26yZhzCwFOP8JK4k7RkgJynmo4Vnui4c0rW9t8BpFplgBuX2kXGlzLhvP SbNxKFwit3p9AMg//rbdr8dKDnxe9Jwk8mVd9Q1lGH2vnnZiFKHlq8+vGS0dqZwK25 HenngjEHBMd0A== Received: (from www@localhost) by webmail.leidinger.net (8.14.4/8.13.8/Submit) id o5A9rO9F097574; Thu, 10 Jun 2010 11:53:24 +0200 (CEST) (envelope-from Alexander@Leidinger.net) Received: from pslux.ec.europa.eu (pslux.ec.europa.eu [158.169.9.14]) by webmail.leidinger.net (Horde Framework) with HTTP; Thu, 10 Jun 2010 11:53:24 +0200 Message-ID: <20100610115324.10161biomkjndvy8@webmail.leidinger.net> Date: Thu, 10 Jun 2010 11:53:24 +0200 From: Alexander Leidinger To: jhell References: <20100609162627.11355zjzwnf7nj8k@webmail.leidinger.net> <4C0FAE2A.7050103@dataix.net> <4C0FB1DE.9080508@dataix.net> In-Reply-To: <4C0FB1DE.9080508@dataix.net> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; DelSp="Yes"; format="flowed" Content-Disposition: inline Content-Transfer-Encoding: 7bit User-Agent: Dynamic Internet Messaging Program (DIMP) H3 (1.1.4) X-EBL-MailScanner-Information: Please contact the ISP for more information X-EBL-MailScanner-ID: 2B0D284400A.A5DAF X-EBL-MailScanner: Found to be clean X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN, SpamAssassin (not cached, score=-1.023, required 6, autolearn=disabled, ALL_TRUSTED -1.00, DKIM_SIGNED 0.10, DKIM_VALID -0.10, DKIM_VALID_AU -0.10, TW_ZF 0.08) X-EBL-MailScanner-From: alexander@leidinger.net X-EBL-MailScanner-Watermark: 1276768410.15697@8r0q2ZXhaXHjW+sOx5mwbQ X-EBL-Spam-Status: No Cc: fs@freebsd.org Subject: Re: Do we want a periodic script for a zfs scrub? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Jun 2010 09:53:33 -0000 Quoting jhell (from Wed, 09 Jun 2010 11:23:10 -0400): > On 06/09/2010 11:07, jhell wrote: >> On 06/09/2010 10:26, Alexander Leidinger wrote: >>> Hi, >>> >>> I noticed that we do not have an automatism to scrub a ZFS pool >>> periodically. Is there interest in something like this, or shall I keep >>> it local? 
>>>
>>> Here's the main part of the monthly periodic script I quickly created:
>>> ---snip---
>>> case "$monthly_scrub_zfs_enable" in
>>> [Yy][Ee][Ss])
>>>     echo
>>>     echo 'Scrubbing of zfs pools:'
>>>
>>>     if [ -z "${monthly_scrub_zfs_pools}" ]; then
>>>         monthly_scrub_zfs_pools="$(zpool list -H -o name)"
>>>     fi
>>>
>>>     for pool in ${monthly_scrub_zfs_pools}; do
>>>         # successful only if there is at least one pool to scrub
>>>         rc=0
>>>
>>>         echo "   starting scrubbing of pool '${pool}'"
>>>         zpool scrub ${pool}
>>>         echo "   consult 'zpool status ${pool}' for the result"
>>>         echo "   or wait for the daily_status_zfs mail, if enabled"
>>>     done
>>>     ;;
>>> ---snip---
>>>
>>> Bye,
>>> Alexander.
>>
>> Please add a check to see if any resilvering is being done on the pool
>> that the scrub is being executed on. (Just in case), I would hope that
>> the scrub would fail silently in this case.
>>
>> Please also check whether a scrub is already running on one of the pools,
>> and if so & another pool exists, start a background loop to wait for the
>> first scrub to finish or die silently.
>>
>> I had a scrub fully restart from calling scrub a second time after being
>> more than 50% complete; it's frustrating.
>>
>> Thanks!,
>
> I should probably suggest one check that comes to mind:
>
> zpool history ${pool} | grep scrub | tail -1 | cut -f1 -d.
>
> Then compare the output with today's date to make sure today is >= 30
> days past the date of the last scrub.
>
> With the above, this could be turned into a daily_zfs_scrub_enable with a
> default daily_zfs_scrub_threshold="30", ensuring that if one check is
> missed it will not take another 30 days to run the check again.

Good idea! I even found a command line which does the calculation for the number of days between "now" and the last run (not taking a leap year into account, but an off-by-one day error here does not matter).

Bye,
Alexander.

-- 
"He's a businessman. I'll make him an offer he can't refuse."
-- Vito Corleone, "Chapter 1", page 39 http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7 http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137 From owner-freebsd-fs@FreeBSD.ORG Thu Jun 10 10:24:13 2010 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5FD3D1065676 for ; Thu, 10 Jun 2010 10:24:13 +0000 (UTC) (envelope-from ticso@cicely7.cicely.de) Received: from raven.bwct.de (raven.bwct.de [85.159.14.73]) by mx1.freebsd.org (Postfix) with ESMTP id 05F768FC0A for ; Thu, 10 Jun 2010 10:24:12 +0000 (UTC) Received: from mail.cicely.de ([10.1.1.37]) by raven.bwct.de (8.13.4/8.13.4) with ESMTP id o5AAOAMJ004753 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Thu, 10 Jun 2010 12:24:10 +0200 (CEST) (envelope-from ticso@cicely7.cicely.de) Received: from cicely7.cicely.de (cicely7.cicely.de [10.1.1.9]) by mail.cicely.de (8.14.3/8.14.3) with ESMTP id o5AAO1AZ058643 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 10 Jun 2010 12:24:01 +0200 (CEST) (envelope-from ticso@cicely7.cicely.de) Received: from cicely7.cicely.de (localhost [127.0.0.1]) by cicely7.cicely.de (8.14.2/8.14.2) with ESMTP id o5AAO0tx080480; Thu, 10 Jun 2010 12:24:00 +0200 (CEST) (envelope-from ticso@cicely7.cicely.de) Received: (from ticso@localhost) by cicely7.cicely.de (8.14.2/8.14.2/Submit) id o5AAO01W080479; Thu, 10 Jun 2010 12:24:00 +0200 (CEST) (envelope-from ticso) Date: Thu, 10 Jun 2010 12:24:00 +0200 From: Bernd Walter To: Alexander Leidinger Message-ID: <20100610102350.GP72453@cicely7.cicely.de> References: <20100609162627.11355zjzwnf7nj8k@webmail.leidinger.net> <20100609144355.GL72453@cicely7.cicely.de> <20100610112345.644960lrau3mxfk0@webmail.leidinger.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20100610112345.644960lrau3mxfk0@webmail.leidinger.net> X-Operating-System: FreeBSD cicely7.cicely.de 7.0-STABLE i386 User-Agent: Mutt/1.5.11 X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED=-1, BAYES_00=-1.9, T_RP_MATCHES_RCVD=-0.01 autolearn=ham version=3.3.0 X-Spam-Checker-Version: SpamAssassin 3.3.0 (2010-01-18) on spamd.cicely.de Cc: ticso@cicely.de, fs@FreeBSD.org Subject: Re: Do we want a periodic script for a zfs scrub? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: ticso@cicely.de List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Jun 2010 10:24:13 -0000 On Thu, Jun 10, 2010 at 11:23:45AM +0200, Alexander Leidinger wrote: > > Quoting Bernd Walter (from Wed, 9 Jun 2010 > 16:43:55 +0200): > > >On Wed, Jun 09, 2010 at 04:26:27PM +0200, Alexander Leidinger wrote: > >>Hi, > >> > >>I noticed that we do not have an automatism to scrub a ZFS pool > >>periodically. Is there interest in something like this, or shall I > >>keep it local? > > > >For me scrub'ing takes several days without having a special big > >pool size and starting another scrub restarts everything. > >You should at least check if another one is still running. > > Good point, I will have a look at this... > > But I'm a little bit surprised, when I scrub a pool of 3 times 250 GB > disks in RAIDZ configuration, it is finished fast (a fraction of a > day... maybe an hour or two). Initially it displays a very long time > (>400 hours), but this is reducing after a while drastically. 
> The pool is filled up to 3/4 of the entire capacity.

Well - my system is not idle during scrub and I don't have very fast disks either. My system runs with 2x 4x500G RAIDZ. Disks are consumer-grade SATA. Controllers are onboard Intel AHCI and SiI 3132. OS is 8.0RC1 (r198183), therefore I'm still using the ata driver.

That's at scrub start:

[115]cicely14# zpool status
  pool: data
 state: ONLINE
 scrub: scrub in progress for 0h0m, 0.00% done, 2275h55m to go
config:

        NAME             STATE   READ WRITE CKSUM
        data             ONLINE     0     0     0
          raidz1         ONLINE     0     0     0
            ad34         ONLINE     0     0     0
            ad12         ONLINE     0     0     0
            ad28         ONLINE     0     0     0
            ad26         ONLINE     0     0     0
          raidz1         ONLINE     0     0     0
            ad4          ONLINE     0     0     0
            ad6          ONLINE     0     0     0
            ad36         ONLINE     0     0     0
            ad10         ONLINE     0     0     0
        cache
          label/cache6   ONLINE     0     0     0
          label/cache7   ONLINE     0     0     0
          label/cache8   ONLINE     0     0     0
          label/cache9   ONLINE     0     0     0
          label/cache10  ONLINE     0     0     0

errors: No known data errors

ETA first increases:

[116]cicely14# zpool status
  pool: data
 state: ONLINE
 scrub: scrub in progress for 0h0m, 0.00% done, 2539h19m to go

Then gets smaller:

[117]cicely14# zpool status
  pool: data
 state: ONLINE
 scrub: scrub in progress for 0h1m, 0.00% done, 1551h38m to go

[120]cicely14# zpool status
  pool: data
 state: ONLINE
 scrub: scrub in progress for 0h2m, 0.00% done, 1182h20m to go

But it may get higher again:

[121]cicely14# zpool status
  pool: data
 state: ONLINE
 scrub: scrub in progress for 0h6m, 0.01% done, 1346h41m to go

I don't remember the time it took for the last scrub, but IIRC it took about 2-3 days, so the initial ETA is much higher than reality too.

-- 
B.Walter                http://www.bwct.de
Modbus/TCP Ethernet I/O modules, ARM-based FreeBSD machines, and more.

From owner-freebsd-fs@FreeBSD.ORG Thu Jun 10 10:29:22 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4FFAD106566C for ; Thu, 10 Jun 2010 10:29:22 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta01.westchester.pa.mail.comcast.net (qmta01.westchester.pa.mail.comcast.net [76.96.62.16]) by mx1.freebsd.org (Postfix) with ESMTP id F15298FC0A for ; Thu, 10 Jun 2010 10:29:21 +0000 (UTC) Received: from omta08.westchester.pa.mail.comcast.net ([76.96.62.12]) by qmta01.westchester.pa.mail.comcast.net with comcast id U9wc1e0060Fqzac51AVMfS; Thu, 10 Jun 2010 10:29:21 +0000 Received: from koitsu.dyndns.org ([98.248.46.159]) by omta08.westchester.pa.mail.comcast.net with comcast id UAVL1e0023S48mS3UAVLo5; Thu, 10 Jun 2010 10:29:21 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id CFDC99B418; Thu, 10 Jun 2010 03:29:18 -0700 (PDT) Date: Thu, 10 Jun 2010 03:29:18 -0700 From: Jeremy Chadwick To: ticso@cicely.de Message-ID: <20100610102918.GA69770@icarus.home.lan> References: <20100609162627.11355zjzwnf7nj8k@webmail.leidinger.net> <20100609144355.GL72453@cicely7.cicely.de> <20100610112345.644960lrau3mxfk0@webmail.leidinger.net> <20100610102350.GP72453@cicely7.cicely.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20100610102350.GP72453@cicely7.cicely.de> User-Agent: Mutt/1.5.20 (2009-06-14) Cc: Alexander Leidinger , fs@FreeBSD.org Subject: Re: Do we want a periodic script for a zfs scrub?
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Jun 2010 10:29:22 -0000

On Thu, Jun 10, 2010 at 12:24:00PM +0200, Bernd Walter wrote:
> On Thu, Jun 10, 2010 at 11:23:45AM +0200, Alexander Leidinger wrote:
> >
> > Quoting Bernd Walter (from Wed, 9 Jun 2010
> > 16:43:55 +0200):
> >
> > >On Wed, Jun 09, 2010 at 04:26:27PM +0200, Alexander Leidinger wrote:
> > >>Hi,
> > >>
> > >>I noticed that we do not have an automatism to scrub a ZFS pool
> > >>periodically. Is there interest in something like this, or shall I
> > >>keep it local?
> > >
> > >For me scrub'ing takes several days without having a special big
> > >pool size and starting another scrub restarts everything.
> > >You should at least check if another one is still running.
> >
> > Good point, I will have a look at this...
> >
> > But I'm a little bit surprised, when I scrub a pool of 3 times 250 GB
> > disks in RAIDZ configuration, it is finished fast (a fraction of a
> > day... maybe an hour or two). Initially it displays a very long time
> > (>400 hours), but this is reducing after a while drastically. The pool
> > is filled up to 3/4 of the entire capacity.
>
> Well - my system is not idle during scrub and I don't have very
> fast disks either.
> My system runs with 2x 4x500G RAIDZ.
> Disks are consumer-grade SATA.
> Controllers are onboard Intel AHCI and SiI 3132.
> OS is 8.0RC1 (r198183), therefore I'm still using the ata driver.
>
> That's at scrub start:
> [115]cicely14# zpool status
>   pool: data
>  state: ONLINE
>  scrub: scrub in progress for 0h0m, 0.00% done, 2275h55m to go
> config:
>
>         NAME             STATE   READ WRITE CKSUM
>         data             ONLINE     0     0     0
>           raidz1         ONLINE     0     0     0
>             ad34         ONLINE     0     0     0
>             ad12         ONLINE     0     0     0
>             ad28         ONLINE     0     0     0
>             ad26         ONLINE     0     0     0
>           raidz1         ONLINE     0     0     0
>             ad4          ONLINE     0     0     0
>             ad6          ONLINE     0     0     0
>             ad36         ONLINE     0     0     0
>             ad10         ONLINE     0     0     0
>         cache
>           label/cache6   ONLINE     0     0     0
>           label/cache7   ONLINE     0     0     0
>           label/cache8   ONLINE     0     0     0
>           label/cache9   ONLINE     0     0     0
>           label/cache10  ONLINE     0     0     0
>
> errors: No known data errors
>
> ETA first increases:
> [116]cicely14# zpool status
>   pool: data
>  state: ONLINE
>  scrub: scrub in progress for 0h0m, 0.00% done, 2539h19m to go
>
> Then gets smaller:
> [117]cicely14# zpool status
>   pool: data
>  state: ONLINE
>  scrub: scrub in progress for 0h1m, 0.00% done, 1551h38m to go
>
> [120]cicely14# zpool status
>   pool: data
>  state: ONLINE
>  scrub: scrub in progress for 0h2m, 0.00% done, 1182h20m to go
>
> But it may get higher again:
> [121]cicely14# zpool status
>   pool: data
>  state: ONLINE
>  scrub: scrub in progress for 0h6m, 0.01% done, 1346h41m to go
>
> I don't remember the time it took for the last scrub, but IIRC
> it took about 2-3 days, so the initial ETA is much higher
> than reality too.

You're running an 8.0 release candidate. There have been some changes to scrubbing and other whatnots with ZFS between then and now. I'd recommend trying RELENG_8 and seeing if the behaviour remains. You don't have to use ahci.ko (you can stick with ataahci.ko).

By "behaviour" I'm referring to how long the scrub is taking. The variance you see in ETA is normal. You can blindly verify that things aren't stalled by using "zpool iostat" (there should be fairly intensive I/O).

-- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977.
PGP: 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Thu Jun 10 11:06:13 2010 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7343F1065672 for ; Thu, 10 Jun 2010 11:06:13 +0000 (UTC) (envelope-from anders@FreeBSD.org) Received: from fupp.net (totem.fix.no [80.91.36.20]) by mx1.freebsd.org (Postfix) with ESMTP id 2D6898FC16 for ; Thu, 10 Jun 2010 11:06:12 +0000 (UTC) Received: from localhost (totem.fix.no [80.91.36.20]) by fupp.net (Postfix) with ESMTP id D0188471D3; Thu, 10 Jun 2010 13:06:11 +0200 (CEST) Received: from fupp.net ([80.91.36.20]) by localhost (totem.fix.no [80.91.36.20]) (amavisd-new, port 10024) with LMTP id qtOxo--HCVD5; Thu, 10 Jun 2010 13:06:09 +0200 (CEST) Received: by fupp.net (Postfix, from userid 1000) id B89E2471D2; Thu, 10 Jun 2010 13:06:09 +0200 (CEST) Date: Thu, 10 Jun 2010 13:06:09 +0200 From: Anders Nordby To: Peter Jeremy Message-ID: <20100610110609.GA87243@fupp.net> References: <20100608083649.GA77452@fupp.net> <20100609122517.GA16231@fupp.net> <20100610081710.GA64350@server.vk2pj.dyndns.org> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline In-Reply-To: <20100610081710.GA64350@server.vk2pj.dyndns.org> User-Agent: Mutt/1.4.2.3i X-PGP-Key: http://anders.fix.no/pgp/ X-PGP-Key-FingerPrint: 1E0F C53C D8DF 6A8F EAAD 19C5 D12A BC9F 0083 5956 Cc: freebsd-fs@FreeBSD.org Subject: Re: Odd network issues on ZFS based NFS server X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Jun 2010 11:06:13 -0000 Hi, On Thu, Jun 10, 2010 at 06:17:10PM +1000, Peter Jeremy wrote: > I wonder if your system is running out of free RAM. How would you > like to monitor "inactive", "cache" and "free" from either "systat -v" > or "vmstat -s" whilst the problem is occurring. > > Does something like > perl -e '$x = "x" x 10000000;' > temporarily correct the problem? 
While the problem is happening:

root@unixfile:~# vmstat -s
511745441 cpu context switches
151635080 device interrupts
 14028218 software interrupts
 11549957 traps
974939023 system calls
       22 kernel threads created
    77512 fork() calls
     6097 vfork() calls
        0 rfork() calls
        0 swap pager pageins
        0 swap pager pages paged in
        0 swap pager pageouts
        0 swap pager pages paged out
      699 vnode pager pageins
     4777 vnode pager pages paged in
     2024 vnode pager pageouts
     2471 vnode pager pages paged out
        0 page daemon wakeups
        0 pages examined by the page daemon
      318 pages reactivated
  4738808 copy-on-write faults
     4957 copy-on-write optimized faults
  3843376 zero fill pages zeroed
        0 zero fill pages prezeroed
     2273 intransit blocking page faults
 11236873 total VM faults taken
        0 pages affected by kernel thread creation
 20699066 pages affected by fork()
  1707164 pages affected by vfork()
        0 pages affected by rfork()
      363 pages cached
 27229532 pages freed
        0 pages freed by daemon
  6618712 pages freed by exiting processes
     6054 pages active
    37307 pages inactive
       28 pages in VM cache
   261148 pages wired down
   456560 pages free
     4096 bytes per page
 43744208 total name lookups
          cache hits (19% pos + 1% neg) system 0% per-directory
          deletions 2%, falsehits 0%, toolong 0%

And from systat -v:

Disks   da0   da1 pass0 pass1    1045240 wire
KB/t   0.00  0.00  0.00  0.00      25240 act
tps       0     0     0     0     149344 inact
MB/s   0.00  0.00  0.00  0.00        112 cache
%busy     0     0     0     0    1824452 free
                                  323680 buf

> Does something like
> perl -e '$x = "x" x 10000000;'
> temporarily correct the problem?

No.

Regards,

-- 
Anders.

From owner-freebsd-fs@FreeBSD.ORG Thu Jun 10 11:13:17 2010 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 22FC3106564A for ; Thu, 10 Jun 2010 11:13:17 +0000 (UTC) (envelope-from anders@FreeBSD.org) Received: from fupp.net (totem.fix.no [80.91.36.20]) by mx1.freebsd.org (Postfix) with ESMTP id D0C508FC19 for ; Thu, 10 Jun 2010 11:13:16 +0000 (UTC) Received: from localhost (totem.fix.no [80.91.36.20]) by fupp.net (Postfix) with ESMTP id 58B484720E; Thu, 10 Jun 2010 13:13:16 +0200 (CEST) Received: from fupp.net ([80.91.36.20]) by localhost (totem.fix.no [80.91.36.20]) (amavisd-new, port 10024) with LMTP id jRzrMs13SDTp; Thu, 10 Jun 2010 13:13:16 +0200 (CEST) Received: by fupp.net (Postfix, from userid 1000) id 2C2934720D; Thu, 10 Jun 2010 13:13:16 +0200 (CEST) Date: Thu, 10 Jun 2010 13:13:16 +0200 From: Anders Nordby To: Rick Macklem Message-ID: <20100610111316.GB87243@fupp.net> References: <20100608083649.GA77452@fupp.net> <20100609122517.GA16231@fupp.net> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i X-PGP-Key: http://anders.fix.no/pgp/ X-PGP-Key-FingerPrint: 1E0F C53C D8DF 6A8F EAAD 19C5 D12A BC9F 0083 5956 Cc: freebsd-fs@FreeBSD.org Subject: Re: Odd network issues on ZFS based NFS server X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Jun 2010 11:13:17 -0000

Hi,

On Wed, Jun 09, 2010 at 11:28:52AM -0400, Rick Macklem wrote:
> When you tried a different NIC, was a different type (ie. different
> chipset that uses a different device driver)?
> I suggested that not because I thought the hardware was broken but
> because I thought it might be related to the network interface's device
> driver and switching to a different device driver would isolate that
> possibility.

Nope. I switched from NIC 1 to 2, and switched the server to an identical one. They both use bge NICs, a very common interface. I somehow doubt this is related to the NIC or driver; I have many machines with the same bge NIC (HP NC7782) that do not have any problems like this.

> Well, it doesn't seem to be mbuf exhaustion (I don't know what
> "out of packet secondary zone" means, I'll have to look at that) and
> if it doesn't handle pings it seems really hosed. Have you done a
> "vmstat 5" + "ps axlH" (or similar) to try and see what it's doing?
> ("top" and "netstat" might also help?)

root@unixfile:~# vmstat 5
 procs      memory      page                    disks     faults         cpu
 r b w     avm    fre   flt  re  pi  po    fr  sr da0 da1   in    sy   cs us sy id
 0 0 0    410M  1781M   279   0   0   0   481   0   0   0 1918 12338 6476  0  2 98
 0 0 0    410M  1781M     1   0   0   0     0   0   0   0  497    34 2268  0  1 99
 0 0 0    410M  1781M   123   0   0   0   116   0   0   0  455  1787 2071  0  0 99
 0 0 0    410M  1781M     0   0   0   0     4   0   0   0  292    38 1459  0  1 99
^C

root@unixfile:~# top -b 5
last pid: 86306;  load averages: 0.04, 0.13, 0.07  up 0+22:01:28  13:09:31
46 processes: 1 running, 45 sleeping

Mem: 25M Active, 147M Inact, 1021M Wired, 112K Cache, 316M Buf, 1780M Free
Swap: 6144M Total, 6144M Free

  PID USERNAME   THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
  786 root         4  44    0  5804K  1276K rpcsvc  1  51:04  0.00% nfsd
  839 nagios       1  44    0 10880K  3228K select  0   0:04  0.00% nrpe2
  847 root         1  44    0 20852K  8356K select  0   0:04  0.00% perl5.10.1
 1076 root         1  44    0 11968K  4188K select  0   0:01  0.00% sendmail
81645 root         1  44    0 10220K  2920K wait    2   0:01  0.00% bash

The server doesn't have many connections, 16 in ESTABLISHED state. As you can see from top, the server has 1780 MB free memory.

Regards,

-- 
Anders.
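If it wedges again, a few more data points would help show where nfsd is blocked; a sketch using stock 8.x tools (double-check the flags on your system):

---snip---
netstat -m                   # mbuf/cluster usage; zone exhaustion shows up here
ps axlH | grep nfsd          # wait channels of the individual nfsd threads
procstat -kk $(pgrep nfsd)   # in-kernel stack traces of the nfsd threads
---snip---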
From owner-freebsd-fs@FreeBSD.ORG Thu Jun 10 11:46:16 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C31141065676 for ; Thu, 10 Jun 2010 11:46:16 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta07.emeryville.ca.mail.comcast.net (qmta07.emeryville.ca.mail.comcast.net [76.96.30.64]) by mx1.freebsd.org (Postfix) with ESMTP id A830F8FC1A for ; Thu, 10 Jun 2010 11:46:15 +0000 (UTC) Received: from omta01.emeryville.ca.mail.comcast.net ([76.96.30.11]) by qmta07.emeryville.ca.mail.comcast.net with comcast id UBbX1e0040EPchoA7BmFsj; Thu, 10 Jun 2010 11:46:15 +0000 Received: from koitsu.dyndns.org ([98.248.46.159]) by omta01.emeryville.ca.mail.comcast.net with comcast id UBmE1e00A3S48mS8MBmEYg; Thu, 10 Jun 2010 11:46:15 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 692D99B418; Thu, 10 Jun 2010 04:46:14 -0700 (PDT) Date: Thu, 10 Jun 2010 04:46:14 -0700 From: Jeremy Chadwick To: Anders Nordby Message-ID: <20100610114614.GA71432@icarus.home.lan> References: <20100608083649.GA77452@fupp.net> <20100609122517.GA16231@fupp.net> <20100610111316.GB87243@fupp.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20100610111316.GB87243@fupp.net> User-Agent: Mutt/1.5.20 (2009-06-14) Cc: freebsd-fs@FreeBSD.org Subject: Re: Odd network issues on ZFS based NFS server X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Jun 2010 11:46:16 -0000

On Thu, Jun 10, 2010 at 01:13:16PM +0200, Anders Nordby wrote:
> On Wed, Jun 09, 2010 at 11:28:52AM -0400, Rick Macklem wrote:
> > When you tried a different NIC, was a different type (ie. different
> > chipset that uses a different device driver)? I suggested that not
> > because I thought the hardware was broken but because I thought it
> > might be related to the network interface's device driver and switching
> > to a different device driver would isolate that possibility.
>
> Nope. I switched from NIC 1 to 2, and switched the server to an identical
> one. They both use bge NICs, a very common interface. I somehow doubt
> this is related to the NIC or driver; I have many machines with the same
> bge NIC (HP NC7782) that do not have any problems like this.

This may not be the problem of course, but are they the *exact* same model and revision of NIC? pciconf -lvc on both boxes, and looking for the relevant bgeX interfaces, would determine that. I believe Rick was recommending you switch to another model of NIC that doesn't fall under the same driver, e.g. do you see this behaviour when using em(4)?

Also, can you provide uname -a output, or specifically the build date of the kernel? There have been bge(4) changes happening regularly throughout the lifetime of RELENG_8, including into the -PRERELEASE stage.

> Mem: 25M Active, 147M Inact, 1021M Wired, 112K Cache, 316M Buf, 1780M Free
> Swap: 6144M Total, 6144M Free
>
> The server doesn't have many connections, 16 in ESTABLISHED state. As
> you can see from top, the server has 1780 MB free memory.

Clarification: I believe it actually has 1927MB (147M Inact + 1780M Free) available.
I've always understood top's "Free" field to mean "number/amount of pages which have never been touched/used since the kernel was started", while "Inact" to mean "number/amount of pages which have been touched/used but are not actively being used, thus available for use". If someone more familiar with the VM and top could expand on this, that'd be helpful.

-- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB |

From owner-freebsd-fs@FreeBSD.ORG Thu Jun 10 11:48:33 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3762C106566B for ; Thu, 10 Jun 2010 11:48:33 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta07.emeryville.ca.mail.comcast.net (qmta07.emeryville.ca.mail.comcast.net [76.96.30.64]) by mx1.freebsd.org (Postfix) with ESMTP id 1E61E8FC19 for ; Thu, 10 Jun 2010 11:48:33 +0000 (UTC) Received: from omta04.emeryville.ca.mail.comcast.net ([76.96.30.35]) by qmta07.emeryville.ca.mail.comcast.net with comcast id UBXx1e0060lTkoCA7BoY5Q; Thu, 10 Jun 2010 11:48:32 +0000 Received: from koitsu.dyndns.org ([98.248.46.159]) by omta04.emeryville.ca.mail.comcast.net with comcast id UBoY1e0013S48mS8QBoYXq; Thu, 10 Jun 2010 11:48:32 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 09CE09B418; Thu, 10 Jun 2010 04:48:32 -0700 (PDT) Date: Thu, 10 Jun 2010 04:48:32 -0700 From: Jeremy Chadwick To: Anders Nordby Message-ID: <20100610114831.GB71432@icarus.home.lan> References: <20100608083649.GA77452@fupp.net> <20100609122517.GA16231@fupp.net> <20100610081710.GA64350@server.vk2pj.dyndns.org> <20100610110609.GA87243@fupp.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20100610110609.GA87243@fupp.net> User-Agent: Mutt/1.5.20 (2009-06-14) Cc: freebsd-fs@FreeBSD.org, Peter Jeremy Subject: Re: Odd network issues on ZFS based NFS server X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Jun 2010 11:48:33 -0000

On Thu, Jun 10, 2010 at 01:06:09PM +0200, Anders Nordby wrote:
> Hi,
>
> On Thu, Jun 10, 2010 at 06:17:10PM +1000, Peter Jeremy wrote:
> > I wonder if your system is running out of free RAM. How would you
> > like to monitor "inactive", "cache" and "free" from either "systat -v"
> > or "vmstat -s" whilst the problem is occurring.
> >
> > Does something like
> > perl -e '$x = "x" x 10000000;'
> > temporarily correct the problem?
>
> While the problem is happening:
>
> root@unixfile:~# vmstat -s

Can you also provide "vmstat -i" output, both when the issue is happening and after the machine has been rebooted (but been up for 5-10 minutes)? Thanks.

-- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977.
PGP: 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Thu Jun 10 11:54:46 2010 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E07E2106564A for ; Thu, 10 Jun 2010 11:54:46 +0000 (UTC) (envelope-from ticso@cicely7.cicely.de) Received: from raven.bwct.de (raven.bwct.de [85.159.14.73]) by mx1.freebsd.org (Postfix) with ESMTP id 4A6238FC1C for ; Thu, 10 Jun 2010 11:54:45 +0000 (UTC) Received: from mail.cicely.de ([10.1.1.37]) by raven.bwct.de (8.13.4/8.13.4) with ESMTP id o5ABsiTj010250 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Thu, 10 Jun 2010 13:54:44 +0200 (CEST) (envelope-from ticso@cicely7.cicely.de) Received: from cicely7.cicely.de (cicely7.cicely.de [10.1.1.9]) by mail.cicely.de (8.14.3/8.14.3) with ESMTP id o5ABsWt8061742 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 10 Jun 2010 13:54:32 +0200 (CEST) (envelope-from ticso@cicely7.cicely.de) Received: from cicely7.cicely.de (localhost [127.0.0.1]) by cicely7.cicely.de (8.14.2/8.14.2) with ESMTP id o5ABsWZJ080841; Thu, 10 Jun 2010 13:54:32 +0200 (CEST) (envelope-from ticso@cicely7.cicely.de) Received: (from ticso@localhost) by cicely7.cicely.de (8.14.2/8.14.2/Submit) id o5ABsVw4080840; Thu, 10 Jun 2010 13:54:31 +0200 (CEST) (envelope-from ticso) Date: Thu, 10 Jun 2010 13:54:31 +0200 From: Bernd Walter To: Jeremy Chadwick Message-ID: <20100610115429.GQ72453@cicely7.cicely.de> References: <20100609162627.11355zjzwnf7nj8k@webmail.leidinger.net> <20100609144355.GL72453@cicely7.cicely.de> <20100610112345.644960lrau3mxfk0@webmail.leidinger.net> <20100610102350.GP72453@cicely7.cicely.de> <20100610102918.GA69770@icarus.home.lan> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20100610102918.GA69770@icarus.home.lan> X-Operating-System: FreeBSD cicely7.cicely.de 7.0-STABLE i386 User-Agent: Mutt/1.5.11 X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED=-1, BAYES_00=-1.9, T_RP_MATCHES_RCVD=-0.01 autolearn=unavailable version=3.3.0 X-Spam-Checker-Version: SpamAssassin 3.3.0 (2010-01-18) on spamd.cicely.de Cc: Alexander Leidinger , ticso@cicely.de, fs@FreeBSD.org Subject: Re: Do we want a periodic script for a zfs scrub? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: ticso@cicely.de List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Jun 2010 11:54:47 -0000

On Thu, Jun 10, 2010 at 03:29:18AM -0700, Jeremy Chadwick wrote:
> You're running an 8.0 release candidate. There have been some changes
> to scrubbing and other whatnots with ZFS between then and now. I'd
> recommend trying RELENG_8 and seeing if the behaviour remains. You
> don't have to use ahci.ko (you can stick with ataahci.ko).

Good to know. Updating to a more recent 8, or maybe current, is already on my TODO list for ataahci, but since the system runs, it is quite low on that list.

My wishlist also has reboot-persistent cache devices. For me they work pretty well, but they are empty after a reboot and it takes several days to fill them. So far no one could tell me whether that has changed.

> By "behaviour" I'm referring to how long the scrub is taking. The
> variance you see in ETA is normal. You can blindly verify that things
> aren't stalled by using "zpool iostat" (there should be fairly
> intensive I/O).
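For the record, watching a scrub that way is just the following (using the pool name "data" from the status output above):

---snip---
zpool iostat data 5      # pool-wide operations/bandwidth every 5 seconds
zpool iostat -v data 5   # the same, broken down per vdev
---snip---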
-- 
B.Walter                http://www.bwct.de
Modbus/TCP Ethernet I/O modules, ARM-based FreeBSD machines, and more.

From owner-freebsd-fs@FreeBSD.ORG Thu Jun 10 13:03:09 2010 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A2D8B106566B for ; Thu, 10 Jun 2010 13:03:09 +0000 (UTC) (envelope-from anders@FreeBSD.org) Received: from fupp.net (totem.fix.no [80.91.36.20]) by mx1.freebsd.org (Postfix) with ESMTP id 5B4F48FC08 for ; Thu, 10 Jun 2010 13:03:08 +0000 (UTC) Received: from localhost (totem.fix.no [80.91.36.20]) by fupp.net (Postfix) with ESMTP id 02AE34766B; Thu, 10 Jun 2010 15:03:08 +0200 (CEST) Received: from fupp.net ([80.91.36.20]) by localhost (totem.fix.no [80.91.36.20]) (amavisd-new, port 10024) with LMTP id rlna0lJRHmku; Thu, 10 Jun 2010 15:03:07 +0200 (CEST) Received: by fupp.net (Postfix, from userid 1000) id CC6BE4766A; Thu, 10 Jun 2010 15:03:07 +0200 (CEST) Date: Thu, 10 Jun 2010 15:03:07 +0200 From: Anders Nordby To: Jeremy Chadwick Message-ID: <20100610130307.GA33285@fupp.net> References: <20100608083649.GA77452@fupp.net> <20100609122517.GA16231@fupp.net> <20100610081710.GA64350@server.vk2pj.dyndns.org> <20100610110609.GA87243@fupp.net> <20100610114831.GB71432@icarus.home.lan> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline In-Reply-To: <20100610114831.GB71432@icarus.home.lan> User-Agent: Mutt/1.4.2.3i X-PGP-Key: http://anders.fix.no/pgp/ X-PGP-Key-FingerPrint: 1E0F C53C D8DF 6A8F EAAD 19C5 D12A BC9F 0083 5956 Cc: freebsd-fs@FreeBSD.org, Peter Jeremy Subject: Re: Odd network issues on ZFS based NFS server X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Jun 2010 13:03:09 -0000

Hi,

On Thu, Jun 10, 2010 at 04:48:32AM -0700, Jeremy Chadwick wrote:
> Can you also provide "vmstat -i" output, both when the issue is
> happening and after the machine has been rebooted (but been up for 5-10
> minutes)? Thanks.

While having issues:

root@unixfile:~# vmstat -i
interrupt                          total       rate
irq1: atkbd0                           6          0
irq14: ata0                            1          0
irq18: uhci2                    78164874        953
irq19: uhci1                      643047          7
irq26: bge1                     73830825        900
irq51: ciss0                      642774          7
cpu0: timer                    163861455       1998
cpu1: timer                    163853438       1998
cpu3: timer                    163906515       1999
cpu2: timer                    163906515       1999
Total

5 minutes after a reboot:

root@unixfile:~# vmstat -i
interrupt                          total       rate
irq1: atkbd0                           6          0
irq14: ata0                            1          0
irq18: uhci2                        5813         19
irq19: uhci1                        2503          8
irq26: bge1                         1997          6
irq51: ciss0                        2503          8
cpu0: timer                       592619       1995
cpu1: timer                       584601       1968
cpu2: timer                       584605       1968
cpu3: timer                       584606       1968
Total                            2359254       7943

Bye,

-- 
Anders.
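Since the "total" column above is cumulative since boot and the "rate" column is a long-term average, a quick way to see which IRQs are firing right now is to diff two snapshots (sketch only):

---snip---
vmstat -i > /tmp/vmstat-i.0
sleep 10
vmstat -i > /tmp/vmstat-i.1
diff /tmp/vmstat-i.0 /tmp/vmstat-i.1   # or: systat -vmstat for live rates
---snip---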
From owner-freebsd-fs@FreeBSD.ORG Thu Jun 10 13:39:01 2010
Date: Thu, 10 Jun 2010 06:38:59 -0700
From: Jeremy Chadwick
To: Anders Nordby
Cc: freebsd-fs@FreeBSD.org, Peter Jeremy
Message-ID: <20100610133859.GA74094@icarus.home.lan>
In-Reply-To: <20100610130307.GA33285@fupp.net>
Subject: Re: Odd network issues on ZFS based NFS server

On Thu, Jun 10, 2010 at 03:03:07PM +0200, Anders Nordby wrote:
> While having issues:
>
> root@unixfile:~# vmstat -i
> interrupt                          total       rate
> [...]
> irq18: uhci2                    78164874        953
> irq26: bge1                     73830825        900
> [...]
>
> 5 minutes after a reboot:
>
> [...]
> irq18: uhci2                        5813         19
> irq26: bge1                         1997          6
> [...]

The interrupt rate for bge1 (irq26) is very high during the problem, while otherwise it is only ~6/sec. Shot in the dark, but this is probably the cause of the packet loss you see. Oddly, your uhci2 interface (used for USB) is also firing at a very high rate. I don't know if this is the sign of a NIC problem, a driver problem, or an interrupt (think APIC) routing problem.

Debugging this is beyond my capability, but folks like John Baldwin may have some ideas on where to go from here.

Also, have you used "netstat -ibn -I bge1" (to look at byte counters) or "tcpdump -l -n -s 0 -i bge1" to watch network traffic live when this is happening? The reason I ask is to determine if there's any chance this box starts seeing problems due to DoS attacks or excessive LAN traffic which is unexpected. Basically, be sure that all the network I/O going on across bge1 is expected.

--
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |

From owner-freebsd-fs@FreeBSD.ORG Thu Jun 10 14:26:38 2010
Date: Thu, 10 Jun 2010 16:26:29 +0200
From: Alexander Leidinger
To: fs@freebsd.org
Message-ID: <20100610162629.38992mazf0sfdqg0@webmail.leidinger.net>
Subject: CFT: periodic scrubbing of ZFS pools

Hi,

as there seems to be interest in a periodic script to scrub zpools, I modified my monthly-POC into a daily script with parameters for which pools to scrub, how many days between scrubs (even different per pool, if required), and several error checks (non-existing pool specified, scrub in progress). You can find it at

  http://www.Leidinger.net/FreeBSD/current-patches/600.scrub-zfs

Please put it into /etc/periodic/daily and test it. Possible periodic.conf variables are (the per-pool form takes the pool name in the variable name):

daily_scrub_zfs_enable="YES"
daily_scrub_zfs_pools="name1 name2 name3"    # all if unset or empty
daily_scrub_zfs_default_threshold=""         # default: 30
daily_scrub_zfs_<poolname>_threshold=""

If there is no specific threshold for a pool (= days between scrubs), the default threshold is used.

Bye,
Alexander.

--
Hear about... the guru who refused Novocaine while having a tooth pulled because he wanted to transcend dental medication?

http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org      netchild @ FreeBSD.org    : PGP ID = 72077137
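For example, a periodic.conf fragment that scrubs the pool "tank" every two weeks and any other imported pool at the default interval could look like this (a sketch; the per-pool variable is assumed to follow the daily_scrub_zfs_<poolname>_threshold pattern described above):

---snip---
# /etc/periodic.conf
daily_scrub_zfs_enable="YES"
daily_scrub_zfs_pools=""                  # empty: consider all pools
daily_scrub_zfs_default_threshold="30"    # days between scrubs
daily_scrub_zfs_tank_threshold="14"       # scrub "tank" more often
---snip---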
From owner-freebsd-fs@FreeBSD.ORG Thu Jun 10 14:59:18 2010
Date: Thu, 10 Jun 2010 16:59:09 +0200
From: Alexander Leidinger
To: Bartosz Stec
Cc: freebsd-fs@freebsd.org
Message-ID: <20100610165909.19296dpe2uxbeqo0@webmail.leidinger.net>
In-Reply-To: <4C10B136.3030404@kkip.pl>
Subject: Re: Do we want a periodic script for a zfs scrub?

Quoting Bartosz Stec (from Thu, 10 Jun 2010 11:32:38 +0200):

> Ross-at-neces-dot-com already did what you're searching for. I'm
> using his periodic scripts for some months now, check here:
> http://www.neces.com/blog/technology/integrating-freebsd-zfs-and-periodic-snapshots-and-scrubs.
> They're doing all necessary stuff, like checking for scrub in progress
> too. Hope you'll find them helpful.

They cannot be imported as-is into FreeBSD; the way he shares common code between several scripts is a little bit outside of what *I* would agree to do in FreeBSD. My polished-up script also has some more features, and the code in question is not difficult to get right. So... no reuse of what he did. You can find the script referenced in another mail I wrote to fs@.

Bye,
Alexander.

--
http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org      netchild @ FreeBSD.org    : PGP ID = 72077137

From owner-freebsd-fs@FreeBSD.ORG Thu Jun 10 14:59:55 2010
Date: Thu, 10 Jun 2010 07:59:46 -0700
From: Artem Belevich
To: Alexander Leidinger
Cc: fs@freebsd.org
In-Reply-To: <20100610115324.10161biomkjndvy8@webmail.leidinger.net>
Subject: Re: Do we want a periodic script for a zfs scrub?
> Good idea! I even found a command line which does the calculation for the
> number of days between "now" and the last run (not taking a leap year into
> account, but an off-by-one day error here does not matter).

You can get exactly one month difference by using the -v option of the 'date' command to figure out the time/date offset by an arbitrary amount. Combined with the +"%s" format to print the number of seconds since the Epoch, and -r to specify the reference point in time, it makes 'date' pretty useful in scripts.

--Artem

On Thu, Jun 10, 2010 at 2:53 AM, Alexander Leidinger wrote:
> Quoting jhell (from Wed, 09 Jun 2010 11:23:10 -0400):
>
>> On 06/09/2010 11:07, jhell wrote:
>>>
>>> On 06/09/2010 10:26, Alexander Leidinger wrote:
>>>>
>>>> Hi,
>>>>
>>>> I noticed that we do not have an automatism to scrub a ZFS pool
>>>> periodically. Is there interest in something like this, or shall I keep
>>>> it local?
>>>>
>>>> Here's the main part of the monthly periodic script I quickly created:
>>>> ---snip---
>>>> case "$monthly_scrub_zfs_enable" in
>>>>     [Yy][Ee][Ss])
>>>>         echo
>>>>         echo 'Scrubbing of zfs pools:'
>>>>
>>>>         if [ -z "${monthly_scrub_zfs_pools}" ]; then
>>>>                 monthly_scrub_zfs_pools="$(zpool list -H -o name)"
>>>>         fi
>>>>
>>>>         for pool in ${monthly_scrub_zfs_pools}; do
>>>>                 # successful only if there is at least one pool to scrub
>>>>                 rc=0
>>>>
>>>>                 echo "   starting scrubbing of pool '${pool}'"
>>>>                 zpool scrub ${pool}
>>>>                 echo "      consult 'zpool status ${pool}' for the result"
>>>>                 echo "      or wait for the daily_status_zfs mail, if enabled"
>>>>         done
>>>>         ;;
>>>> ---snip---
>>>>
>>>> Bye,
>>>> Alexander.
>>>>
>>>
>>> Please add a check to see if any resilvering is being done on the pool
>>> that the scrub is being executed on. (Just in case), I would hope that
>>> the scrub would fail silently in this case.
>>>
>>> Please also check whether a scrub is already running on one of the pools
>>> and if so & another pool exists start a background loop to wait for the
>>> first scrub to finish or die silently.
>>>
>>> I had a scrub fully restart from calling scrub a second time after being
>>> more than 50% complete, it's frustrating.
>>>
>>> Thanks!,
>>>
>>
>> I should probably suggest one check that comes to mind.
>>
>> zpool history ${pool} | grep scrub | tail -1 | cut -f1 -d.
>>
>> Then compare the output with today's date to make sure today is >= 30
>> days from the date of the last scrub.
>>
>> With the above this could be turned into a daily_zfs_scrub_enable with a
>> default daily_zfs_scrub_threshold="30" and ensuring that if one check is
>> missed it will not take another 30 days to run the check again.
>
> Good idea! I even found a command line which does the calculation for the
> number of days between "now" and the last run (not taking a leap year into
> account, but an off-by-one day error here does not matter).
>
> Bye,
> Alexander.
>
> --
> "He's a businessman. I'll make him an offer he can't refuse."
>                 -- Vito Corleone, "Chapter 1", page 39
>
> http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
> http://www.FreeBSD.org      netchild @ FreeBSD.org    : PGP ID = 72077137
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>
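Concretely, the kind of 'date' one-liner being discussed, using the example timestamp from the thread (a sketch with a 30-day offset rather than Artem's one-month example):

---snip---
# seconds since the Epoch, 30 days after the recorded scrub time
date -j -f "%Y-%m-%d.%H:%M:%S" -v+30d +"%s" "2010-06-08.20:51:12"
# compare against the current time in the same unit
date +"%s"
---snip---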
From owner-freebsd-fs@FreeBSD.ORG Thu Jun 10 15:38:37 2010
Date: Thu, 10 Jun 2010 17:38:25 +0200
From: Alexander Leidinger
To: Artem Belevich
Cc: fs@freebsd.org
Message-ID: <20100610173825.164930ekkryr5tes@webmail.leidinger.net>
Subject: Re: Do we want a periodic script for a zfs scrub?

Quoting Artem Belevich (from Thu, 10 Jun 2010 07:59:46 -0700):

>> Good idea! I even found a command line which does the calculation for the
>> number of days between "now" and the last run (not taking a leap year into
>> account, but an off-by-one day error here does not matter).
>
> You can get exactly one month difference by using the -v option of the 'date'
> command to figure out the time/date offset by an arbitrary amount.
> Combined with the +"%s" format to print the number of seconds since the Epoch
> and -r to specify the reference point in time, it makes 'date' pretty
> useful in scripts.

What we have is the date of the last scrub (e.g. 2010-06-08.20:51:12), and what we want to know is whether between the last scrub and now a specific number of days has passed or not.

What I do is take the year multiplied by 365, plus the day of the year. I compute both of these for the date of the last scrub and for "now". The difference is the number of days between those two dates. This value I can use with -le or -ge for the test command.

This is only off by one once in a leap year, when the leap-day is in-between the two dates (those people who want to scrub every 4 years are off by two when both leap-days are in-between, but a scrub every 4 years or more looks unreasonable to me, so I do not care much about this).

This is done in one line with two calls to date (once for the last scrub, once for "now") and a little bit of shell-builtin arithmetic. If you have a more correct version which is not significantly more complex, feel free to share it here.

Bye,
Alexander.

--
"Who would have thought hell would really exist? And that it would be in New Jersey?" -Leela
"Actually..." - Fry

http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org      netchild @ FreeBSD.org    : PGP ID = 72077137
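A minimal sketch of that day-number comparison (not the actual 600.scrub-zfs code, just the idea as described; the %j output is routed through awk so that a leading zero is not read as octal by the shell):

---snip---
#!/bin/sh
last="2010-06-08.20:51:12"    # e.g. taken from 'zpool history'
last_yd=$(date -j -f "%Y-%m-%d.%H:%M:%S" "+%Y %j" "$last")
now_yd=$(date "+%Y %j")
# day number = year*365 + day-of-year; the difference is ~days between
days=$(echo "$last_yd $now_yd" | awk '{ print ($3*365+$4) - ($1*365+$2) }')
if [ "$days" -ge 30 ]; then
        echo "scrub due ($days days since the last one)"
fi
---snip---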
From owner-freebsd-fs@FreeBSD.ORG Thu Jun 10 16:34:55 2010
Date: Thu, 10 Jun 2010 09:34:49 -0700
From: Artem Belevich
To: Alexander Leidinger
Cc: fs@freebsd.org
In-Reply-To: <20100610173825.164930ekkryr5tes@webmail.leidinger.net>
Subject: Re: Do we want a periodic script for a zfs scrub?

You can do something like this:

#SCRUB_TS="2010-06-08.20:51:12"
SCRUB_TS=$1
# parse timestamp, move it forward by 1 month and print in seconds since Epoch
NEXT_SCRUB_DATE_S=`date -j -f "%Y-%m-%d.%H:%M:%S" -v+1m +"%s" $SCRUB_TS`
# for debugging purposes convert epoch time into something human-readable
NEXT_SCRUB_DATE=`date -r $NEXT_SCRUB_DATE_S`
# current time in secs since Epoch.
NOW_S=`date +"%s"`
# Compare two times to figure out if next scrub time is still in the future
if [ $NOW_S -gt $NEXT_SCRUB_DATE_S ]; then
    echo yup.
else
    echo nope.
fi

--Artem

On Thu, Jun 10, 2010 at 8:38 AM, Alexander Leidinger wrote:
> Quoting Artem Belevich (from Thu, 10 Jun 2010 07:59:46 -0700):
>
>>> Good idea! I even found a command line which does the calculation for the
>>> number of days between "now" and the last run (not taking a leap year
>>> into account, but an off-by-one day error here does not matter).
>>
>> You can get exactly one month difference by using the -v option of 'date'
>> [...]
>
> What we have is the date of the last scrub (e.g. 2010-06-08.20:51:12), and
> what we want to know is if between the last scrub and now we passed a
> specific amount of days or not.
>
> What I do is taking the year multiplied with 365 plus the day of the year.
> Both of these for the last date of the scrub and "now". The difference is the
> number of days between those two dates. This value I can use with -le or -ge
> for the test command.
>
> This is only off by one once in a leap year when the leap-day is in-between
> the two dates [...]
>
> This is done in one line with two calls to date (once for the last scrub,
> once for "now") and a little bit of shell-builtin arithmetic. If you have a
> more correct version which is not significantly more complex, feel free to
> share it here.
>
> Bye,
> Alexander.
>
> [...]
From owner-freebsd-fs@FreeBSD.ORG Thu Jun 10 17:36:32 2010
Date: Thu, 10 Jun 2010 13:36:29 -0400
From: Gary Palmer
To: Jeremy Chadwick
Cc: freebsd-fs@FreeBSD.org, Anders Nordby
Message-ID: <20100610173629.GA70716@in-addr.com>
In-Reply-To: <20100610114614.GA71432@icarus.home.lan>
Subject: Re: Odd network issues on ZFS based NFS server

On Thu, Jun 10, 2010 at 04:46:14AM -0700, Jeremy Chadwick wrote:
> Clarification: I believe it actually has 1927MB (147M Inact + 1780M
> Free) available. I've always understood top's "Free" field to mean
> "number/amount of pages which have never been touched/used since the
> kernel was started", while "Inact" to mean "number/amount of pages which
> have been touched/used but are not actively being used, thus available
> for use".
>
> If someone more familiar with the VM and top could expand on this,
> that'd be helpful.

I'm not a VM guru, however here is my understanding:

- "Free" are pages that have been reclaimed by the page daemon and are
  ready for immediate use without further action. The page daemon always
  tries to keep a few pages in the "Free" state to avoid problems with
  page starvation.

- "Inactive" pages are pages that are candidates for reclamation by the
  page daemon if so needed. I believe some amount of work is needed to
  move an inactive page to the free list, including zeroing it, I think, as
  well as removing any references still pointing to it (e.g. it could be
  a cached copy of data from local storage).

Probably not completely 100% accurate, as I haven't kept up with VM changes in the last few years, but close enough for government work :)

Regards,

Gary

From owner-freebsd-fs@FreeBSD.ORG Thu Jun 10 17:58:36 2010
Date: Thu, 10 Jun 2010 20:58:32 +0300
From: Andriy Gapon
To: Gary Palmer
Cc: freebsd-fs@freebsd.org
Message-ID: <4C1127C8.9040207@icyb.net.ua>
In-Reply-To: <20100610173629.GA70716@in-addr.com>
Subject: Re: Odd network issues on ZFS based NFS server

on 10/06/2010 20:36 Gary Palmer said the following:
> - "Free" are pages that have been reclaimed by the page daemon and are
>   ready for immediate use without further action. [...]
>
> - "Inactive" pages are pages that are candidates for reclamation by the
>   page daemon if so needed. [...]

Something like that, right. My understanding: Active pages are also candidates for reclamation, but Inactive are the primary ones. The only difference is how much time has passed since they were last "referenced". Cached pages are pages that are effectively free, i.e. they can be reclaimed at any moment, but their content is still valid and so they can be re-used.

--
Andriy Gapon
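The page counts behind those top fields can also be read directly from the standard VM counters; a quick look (values are pages, so multiply by the page size for bytes):

---snip---
sysctl vm.stats.vm.v_active_count vm.stats.vm.v_inactive_count \
       vm.stats.vm.v_cache_count vm.stats.vm.v_free_count
---snip---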
From owner-freebsd-fs@FreeBSD.ORG Thu Jun 10 18:48:57 2010
Date: Fri, 11 Jun 2010 04:48:44 +1000
From: Peter Jeremy
To: Anders Nordby
Cc: freebsd-fs@FreeBSD.org
Message-ID: <20100610184844.GA64544@server.vk2pj.dyndns.org>
In-Reply-To: <20100610110609.GA87243@fupp.net>
Subject: Re: Odd network issues on ZFS based NFS server

On 2010-Jun-10 13:06:09 +0200, Anders Nordby wrote:
>On Thu, Jun 10, 2010 at 06:17:10PM +1000, Peter Jeremy wrote:
>> I wonder if your system is running out of free RAM. How would you
>> like to monitor "inactive", "cache" and "free" from either "systat -v"
>> or "vmstat -s" whilst the problem is occurring.
>>
>> Does something like
>> perl -e '$x = "x" x 10000000;'
>> temporarily correct the problem?
>
>While the problem is happening:
>...
>And from systat -v:
>
>Disks   da0   da1 pass0 pass1        1045240 wire
>KB/t   0.00  0.00  0.00  0.00          25240 act
>tps       0     0     0     0         149344 inact
>MB/s   0.00  0.00  0.00  0.00            112 cache
>%busy     0     0     0     0        1824452 free
>                                      323680 buf

>> Does something like
>> perl -e '$x = "x" x 10000000;'
>> temporarily correct the problem?
>
>No.

OK, it's not the issue I was considering. I can't offer any further suggestions at this point.

--
Peter Jeremy

PS: Sorry about the confused 'From' address last time.

From owner-freebsd-fs@FreeBSD.ORG Thu Jun 10 19:11:22 2010
Date: Thu, 10 Jun 2010 15:11:12 -0400
From: jhell
Cc: Alexander Leidinger, fs@freebsd.org
Message-ID: <4C1138D0.7070901@dataix.net>
Subject: Re: Do we want a periodic script for a zfs scrub?
On 06/10/2010 12:34, Artem Belevich wrote:
> You can do something like this:
>
> #SCRUB_TS="2010-06-08.20:51:12"
> SCRUB_TS=$1
> # parse timestamp, move it forward by 1 month and print in seconds since Epoch
> NEXT_SCRUB_DATE_S=`date -j -f "%Y-%m-%d.%H:%M:%S" -v+1m +"%s" $SCRUB_TS`
> # for debugging purposes convert epoch time into something human-readable
> NEXT_SCRUB_DATE=`date -r $NEXT_SCRUB_DATE_S`
> # current time in secs since Epoch.
> NOW_S=`date +"%s"`
> # Compare two times to figure out if next scrub time is still in the future
> if [ $NOW_S -gt $NEXT_SCRUB_DATE_S ]; then
>     echo yup.
> else
>     echo nope.
> fi
>
> --Artem

#!/bin/sh

lastscrub=$(zpool history exports | grep scrub | tail -1 | cut -f1 -d.)
todaypoch=$(date -j -f "%Y-%m-%d" "+%s" $(date "+%Y-%m-%d"))
scrubpoch=$(date -j -f "%Y-%m-%d" "+%s" $lastscrub)

echo $lastscrub Last Scrub From zpool history
echo $todaypoch Today converted to seconds since epoch
echo $scrubpoch Last scrub converted to seconds since epoch

expired=$((((($todaypoch-$scrubpoch)/60)/60)/24))

if [ ${expired:=30} -ge ${daily_scrub_zfs_threshold:=30} ]; then
        echo "Performing Scrub...."
else
        echo "SORRY its only been $expired days since your last scrub."
fi

My reasoning for giving expired a default value of 30 is that a pool may have just been created, in which case a scrub would never have been performed; with this value equal to the default threshold, such a pool gets scrubbed on the first day it exists.

I considered just doing ${expired:=${daily_scrub_zfs_threshold:=30}}, which would also allow it to be set to whatever value a user configured before the pool was created, and adds another layer of redundancy on that variable in a fail-safe sort of way.

Regards, and nice work on this. I just noticed the CFT after writing this, but still, have a look at the above; it may simplify the testing while providing some fallback for what I stated above.

> On Thu, Jun 10, 2010 at 8:38 AM, Alexander Leidinger wrote:
>> [...]
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

--
jhell
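The ${expired:=30} fallback discussed above is the shell's assign-default expansion; a tiny illustration of the behaviour (separate from the script itself):

---snip---
unset expired
echo "${expired:=30}"   # prints 30 and assigns it, so a pool with no
echo "$expired"         # scrub in its history passes the -ge test
---snip---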
From owner-freebsd-fs@FreeBSD.ORG Thu Jun 10 23:32:41 2010
Date: Thu, 10 Jun 2010 19:48:49 -0400 (EDT)
From: Rick Macklem
To: Jeremy Chadwick
Cc: freebsd-fs@FreeBSD.org, Peter Jeremy, Anders Nordby
In-Reply-To: <20100610133859.GA74094@icarus.home.lan>
Subject: Re: Odd network issues on ZFS based NFS server

On Thu, 10 Jun 2010, Jeremy Chadwick wrote:

> The interrupt rate for bge1 (irq26) is very high during the problem,
> while otherwise it is only ~6/sec. [...]
>
> Also, have you used "netstat -ibn -I bge1" (to look at byte counters) or
> "tcpdump -l -n -s 0 -i bge1" to watch network traffic live when this is
> happening? The reason I ask is to determine if there's any chance this
> box starts seeing problems due to DoS attacks or excessive LAN traffic
> which is unexpected. Basically, be sure that all the network I/O going
> on across bge1 is expected.

Yes, I think Jeremy is on the right track. I'd second the recommendation to look at traffic when it is happening. I might choose:

  tcpdump -s 0 -w <file> -i bge1

and then load <file> into wireshark, since wireshark is much better at making sense of NFS traffic. (Since the nfsd is at the top of the process list, it hints that there may be heavy NFS traffic being received by bge1.)

If you do this tcpdump for a short period of time and then email <file> to me as an attachment, I can take a look at it. (If the traffic isn't NFS, then there's not much point in doing this.) We might have a case where a client is retrying the same RPC (or RPC sequence) over and over and over again, my friend (sorry I couldn't resist:-).

Given that you stated FreeBSD8.1-Prerelease I think you should have the patch, but please make sure that your sys/nfsserver/nfs_srvsubs.c is at least r206406.

Let me know how it goes, rick
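For instance, a capture narrowed to NFS traffic keeps the file small enough to mail (a sketch; it assumes NFS on the standard port 2049 and an arbitrary output path of /tmp/nfs.pcap):

---snip---
# full frames (-s 0) on bge1, NFS traffic only, written for wireshark
tcpdump -s 0 -w /tmp/nfs.pcap -i bge1 port 2049
---snip---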
From owner-freebsd-fs@FreeBSD.ORG Fri Jun 11 03:18:12 2010
Date: Thu, 10 Jun 2010 20:18:09 -0700
From: Jeremy Chadwick
To: Rick Macklem
Cc: freebsd-fs@FreeBSD.org, Peter Jeremy, Anders Nordby, PYUN Yong-Hyeon
Message-ID: <20100611031809.GA93666@icarus.home.lan>
Subject: Re: Odd network issues on ZFS based NFS server

On Thu, Jun 10, 2010 at 07:48:49PM -0400, Rick Macklem wrote:
> Yes, I think Jeremy is on the right track. I'd second the recommendation
> to look at traffic when it is happening. [...]
>
> Given that you stated FreeBSD8.1-Prerelease I think you should have the
> patch, but please make sure that your sys/nfsserver/nfs_srvsubs.c is
> at least r206406.
>
> Let me know how it goes, rick

Also for Anders -- with regards to possible bge(4) issues, Yong-Hyeon works on this driver fairly often. If it turns out to be a driver issue of some sort, he can probably help. Relevant commits are here (to give you some idea of activity):

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/bge/if_bge.c

One commit caught my eye (rev 1.226.2.15), but that seems to be more focused on mbuf issues (your system doesn't appear to be having any, given your netstat -m output). CC'ing Yong-Hyeon, as he might know of some edge case where bge(4) could go crazy with interrupts. :-)

Yong-Hyeon, the entire thread is here:
http://lists.freebsd.org/pipermail/freebsd-fs/2010-June/008654.html

--
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |

From owner-freebsd-fs@FreeBSD.ORG Fri Jun 11 06:03:50 2010
Date: Fri, 11 Jun 2010 09:03:44 +0300
From: Mikolaj Golub
To: freebsd-fs@freebsd.org
Message-ID: <86mxv22ji7.fsf@zhuzha.ua1>
Subject: '#ifndef DIAGNOSTIC' in nfsclient code looks like a typo
Hi:

'#ifndef DIAGNOSTIC' in sys/nfsclient/nfs_vnops.c and sys/fs/nfsclient/nfs_clvnops.c looks like a typo, and '#ifdef' should be used instead (see the attached patch).

--
Mikolaj Golub

Content-Disposition: inline; filename=nfsclient.ifdef_DIAGNOSTIC.patch

Index: sys/nfsclient/nfs_vnops.c
===================================================================
--- sys/nfsclient/nfs_vnops.c	(revision 209021)
+++ sys/nfsclient/nfs_vnops.c	(working copy)
@@ -1348,7 +1348,7 @@ nfs_writerpc(struct vnode *vp, struct uio *uiop, s
 	int v3 = NFS_ISV3(vp), committed = NFSV3WRITE_FILESYNC;
 	int wsize;
 
-#ifndef DIAGNOSTIC
+#ifdef DIAGNOSTIC
 	if (uiop->uio_iovcnt != 1)
 		panic("nfs: writerpc iovcnt > 1");
 #endif
@@ -1708,7 +1708,7 @@ nfs_remove(struct vop_remove_args *ap)
 	int error = 0;
 	struct vattr vattr;
 
-#ifndef DIAGNOSTIC
+#ifdef DIAGNOSTIC
 	if ((cnp->cn_flags & HASBUF) == 0)
 		panic("nfs_remove: no name");
 	if (vrefcnt(vp) < 1)
@@ -1814,7 +1814,7 @@ nfs_rename(struct vop_rename_args *ap)
 	struct componentname *fcnp = ap->a_fcnp;
 	int error;
 
-#ifndef DIAGNOSTIC
+#ifdef DIAGNOSTIC
 	if ((tcnp->cn_flags & HASBUF) == 0 ||
 	    (fcnp->cn_flags & HASBUF) == 0)
 		panic("nfs_rename: no name");
@@ -2277,7 +2277,7 @@ nfs_readdirrpc(struct vnode *vp, struct uio *uiop,
 	int attrflag;
 	int v3 = NFS_ISV3(vp);
 
-#ifndef DIAGNOSTIC
+#ifdef DIAGNOSTIC
 	if (uiop->uio_iovcnt != 1 || (uiop->uio_offset & (DIRBLKSIZ - 1)) ||
 	    (uiop->uio_resid & (DIRBLKSIZ - 1)))
 		panic("nfs readdirrpc bad uio");
@@ -2482,7 +2482,7 @@ nfs_readdirplusrpc(struct vnode *vp, struct uio *u
 #ifndef nolint
 	dp = NULL;
 #endif
-#ifndef DIAGNOSTIC
+#ifdef DIAGNOSTIC
 	if (uiop->uio_iovcnt != 1 || (uiop->uio_offset & (DIRBLKSIZ - 1)) ||
 	    (uiop->uio_resid & (DIRBLKSIZ - 1)))
 		panic("nfs readdirplusrpc bad uio");
@@ -2752,7 +2752,7 @@ nfs_sillyrename(struct vnode *dvp, struct vnode *v
 	cache_purge(dvp);
 	np = VTONFS(vp);
 
-#ifndef DIAGNOSTIC
+#ifdef DIAGNOSTIC
 	if (vp->v_type == VDIR)
 		panic("nfs: sillyrename dir");
 #endif
Index: sys/fs/nfsclient/nfs_clvnops.c
===================================================================
--- sys/fs/nfsclient/nfs_clvnops.c	(revision 209021)
+++ sys/fs/nfsclient/nfs_clvnops.c	(working copy)
@@ -1564,7 +1564,7 @@ nfs_remove(struct vop_remove_args *ap)
 	int error = 0;
 	struct vattr vattr;
 
-#ifndef DIAGNOSTIC
+#ifdef DIAGNOSTIC
 	if ((cnp->cn_flags & HASBUF) == 0)
 		panic("nfs_remove: no name");
 	if (vrefcnt(vp) < 1)
@@ -1676,7 +1676,7 @@ nfs_rename(struct vop_rename_args *ap)
 	struct nfsv4node *newv4 = NULL;
 	int error;
 
-#ifndef DIAGNOSTIC
+#ifdef DIAGNOSTIC
 	if ((tcnp->cn_flags & HASBUF) == 0 ||
 	    (fcnp->cn_flags & HASBUF) == 0)
 		panic("nfs_rename: no name");
@@ -2137,7 +2137,7 @@ ncl_readdirrpc(struct vnode *vp, struct uio *uiop,
 	struct nfsmount *nmp = VFSTONFS(vp->v_mount);
 	int error = 0, eof, attrflag;
 
-#ifndef DIAGNOSTIC
+#ifdef DIAGNOSTIC
 	if (uiop->uio_iovcnt != 1 || (uiop->uio_offset & (DIRBLKSIZ - 1)) ||
 	    (uiop->uio_resid & (DIRBLKSIZ - 1)))
 		panic("nfs readdirrpc bad uio");
@@ -2198,7 +2198,7 @@ ncl_readdirplusrpc(struct vnode *vp, struct uio *u
 	struct nfsmount *nmp = VFSTONFS(vp->v_mount);
 	int error = 0, attrflag, eof;
 
-#ifndef DIAGNOSTIC
+#ifdef DIAGNOSTIC
 	if (uiop->uio_iovcnt != 1 || (uiop->uio_offset & (DIRBLKSIZ - 1)) ||
 	    (uiop->uio_resid & (DIRBLKSIZ - 1)))
 		panic("nfs readdirplusrpc bad uio");
@@ -2264,7 +2264,7 @@ nfs_sillyrename(struct vnode *dvp, struct vnode *v
 	cache_purge(dvp);
 	np = VTONFS(vp);
 
-#ifndef DIAGNOSTIC
+#ifdef DIAGNOSTIC
 	if (vp->v_type == VDIR)
 		panic("nfs: sillyrename dir");
 #endif

From owner-freebsd-fs@FreeBSD.ORG Fri Jun 11 08:42:29 2010
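For context on why the inverted sense matters: these sanity panics are meant to fire only on kernels built with diagnostic checks enabled, i.e. with

---snip---
# in the kernel configuration file
options 	DIAGNOSTIC
---snip---

so with '#ifndef', stock kernels (built without DIAGNOSTIC) were executing the extra checks while DIAGNOSTIC kernels skipped them.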
Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8E8E2106567D for ; Fri, 11 Jun 2010 08:42:29 +0000 (UTC) (envelope-from alexander@leidinger.net) Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de [217.11.53.44]) by mx1.freebsd.org (Postfix) with ESMTP id 0B2098FC0A for ; Fri, 11 Jun 2010 08:42:28 +0000 (UTC) Received: from outgoing.leidinger.net (pD954FC95.dip.t-dialin.net [217.84.252.149]) by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id 7BC1F84405C; Fri, 11 Jun 2010 10:42:23 +0200 (CEST) Received: from webmail.leidinger.net (webmail.leidinger.net [192.168.1.102]) by outgoing.leidinger.net (Postfix) with ESMTP id 44A2251EB; Fri, 11 Jun 2010 10:42:20 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=Leidinger.net; s=outgoing-alex; t=1276245740; bh=g5+qKNsN4COx6/MHIHxAUCSoCZrvGqp5tGOMhyaZwxo=; h=Message-ID:Date:From:To:Cc:Subject:References:In-Reply-To: MIME-Version:Content-Type:Content-Transfer-Encoding; b=zbTfE+frFODz5EsHOQYFhkfNuBVfMFP3fASLFkeHPLOGn+ukdLyndPKlbwqD0/cLG 9sjMFUGCO7N0t2TA6cwIsUgbPfgmnbfPXqpwxXw6P08w/Z5v8kAfZo+1laG0w+Fnvx k9ayTyGcEDFhkUEPhp5HAV/Ta60SSbW98XoxgRBRXtDg7rCRgBkF4Wm5itc4HPOMap ZHmXPgSkMziPAlewnFfGLwiWppnpM77kyAIXAiHcCn90X3TjVCdEag1PKUBoFVNO3q jO9BfKoiobSbSm+8uqSIIR6Caj9H1IilSr+L/I1Yjd4zXvoayPus5neKkz51A//lCL aahlJVF+ROw2w== Received: (from www@localhost) by webmail.leidinger.net (8.14.4/8.13.8/Submit) id o5B8gJ58059155; Fri, 11 Jun 2010 10:42:19 +0200 (CEST) (envelope-from Alexander@Leidinger.net) Received: from pslux.ec.europa.eu (pslux.ec.europa.eu [158.169.9.14]) by webmail.leidinger.net (Horde Framework) with HTTP; Fri, 11 Jun 2010 10:42:19 +0200 Message-ID: <20100611104219.51344ag1ah7br4kk@webmail.leidinger.net> Date: Fri, 11 Jun 2010 10:42:19 +0200 From: Alexander Leidinger To: jhell References: <20100609162627.11355zjzwnf7nj8k@webmail.leidinger.net> <4C0FAE2A.7050103@dataix.net> <4C0FB1DE.9080508@dataix.net> <20100610115324.10161biomkjndvy8@webmail.leidinger.net> <20100610173825.164930ekkryr5tes@webmail.leidinger.net> <4C1138D0.7070901@dataix.net> In-Reply-To: <4C1138D0.7070901@dataix.net> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; DelSp="Yes"; format="flowed" Content-Disposition: inline Content-Transfer-Encoding: 7bit User-Agent: Dynamic Internet Messaging Program (DIMP) H3 (1.1.4) X-EBL-MailScanner-Information: Please contact the ISP for more information X-EBL-MailScanner-ID: 7BC1F84405C.A6A65 X-EBL-MailScanner: Found to be clean X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN, SpamAssassin (not cached, score=-1.023, required 6, autolearn=disabled, ALL_TRUSTED -1.00, DKIM_SIGNED 0.10, DKIM_VALID -0.10, DKIM_VALID_AU -0.10, TW_ZF 0.08) X-EBL-MailScanner-From: alexander@leidinger.net X-EBL-MailScanner-Watermark: 1276850545.62292@GjhW0Arrhz5WypsI9PqVPw X-EBL-Spam-Status: No Cc: fs@freebsd.org Subject: Re: Do we want a periodic script for a zfs scrub? 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 Jun 2010 08:42:29 -0000

Quoting jhell (from Thu, 10 Jun 2010 15:11:12 -0400):

> On 06/10/2010 12:34, Artem Belevich wrote:
>> You can do something like this:
>>
>> #SCRUB_TS="2010-06-08.20:51:12"
>> SCRUB_TS=$1
>> # parse timestamp, move it forward by 1 month and print in seconds since Epoch
>> NEXT_SCRUB_DATE_S=`date -j -f "%Y-%m-%d.%H:%M:%S" -v+1m +"%s" $SCRUB_TS`
>> # for debugging purposes convert epoch time into something human-readable
>> NEXT_SCRUB_DATE=`date -r $NEXT_SCRUB_DATE_S`
>> # current time in secs since Epoch.
>> NOW_S=`date +"%s"`
>> # Compare two times to figure out if next scrub time is still in the future
>> if [ $NOW_S -gt $NEXT_SCRUB_DATE_S ]; then
>>     echo yup.
>> else
>>     echo nope.
>> fi
>>
>> --Artem
>
> #!/bin/sh
>
> lastscrub=$(zpool history exports |grep scrub |tail -1 |cut -f1 -d.)
> todaypoch=$(date -j -f "%Y-%m-%d" "+%s" $(date "+%Y-%m-%d"))
> scrubpoch=$(date -j -f "%Y-%m-%d" "+%s" $lastscrub)
>
> echo $lastscrub Last Scrub From zpool history
> echo $todaypoch Today converted to seconds since epoch
> echo $scrubpoch Last scrub converted to seconds since epoch
>
> expired=$((((($todaypoch-$scrubpoch)/60)/60)/24))

Apart from the fact that we can do this with one $(( ))... what happens if/when time_t is extended to 64 bits on 32-bit platforms? Can we get into trouble with the shell arithmetic or not? It depends upon the bit-size of the shell's integers, and their signedness. Jilles (our shell maintainer) also suggested using seconds since the Epoch, and I asked him the same question; I'm waiting for an answer from him. The same concerns apply to test(1) (or the corresponding builtin) in Artem's solution.

By calculating with days everywhere (as in my solution), I'm sure that it takes longer to hit a wall than by calculating with seconds since the Epoch (which can cause a problem in 2038, or during a transition when this problem is tackled in time_t but not here; and that is not that far away). The off-by-one day once every 4 years shouldn't be a problem.

If someone can assure me, with some solid facts, that using seconds since the Epoch will not cause problems in the described cases, I have no problem switching to them.

Bye,
Alexander.

> if [ ${expired:=30} -ge ${daily_scrub_zfs_threshold:=30} ]; then
>     echo "Performing Scrub...."
> else
>     echo "SORRY, it's only been $expired days since your last scrub."
> fi
>
> My reasoning for setting expired to have a default value of 30 depended
> on whether a pool may have just been created, in which case a scrub would
> never have been performed; with this value equal to that of the default
> threshold, that pool would be scrubbed on the first day it was created.
>
> I considered just doing ${expired:=${daily_scrub_zfs_threshold:=30}},
> which would also allow it to be set to whatever a user set their value to
> before the pool was created, and adds another layer of redundancy on that
> variable in a fail-safe sort of way.
>
> Regards, and nice work on this. I just noticed the CFT just after writing
> this, but still, have a look at the above; it may simplify the testing
> while providing some fallback for what I stated above.
>
>>
>> On Thu, Jun 10, 2010 at 8:38 AM, Alexander Leidinger
>> wrote:
>>> Quoting Artem Belevich (from Thu, 10 Jun 2010 07:59:46 -0700):
>>>
>>>>> Good idea! I even found a command line which does the calculation for
>>>>> the number of days between "now" and the last run (not taking a leap
>>>>> year into account, but an off-by-one day error here does not matter).
>>>>
>>>> You can get exactly one month of difference by using the -v option of
>>>> the 'date' command to figure out the time/date offset by an arbitrary
>>>> amount. Combined with the +"%s" format to print the number of seconds
>>>> since the Epoch, and -r to specify the reference point in time, it
>>>> makes 'date' pretty useful in scripts.
>>>
>>> What we have is the date of the last scrub (e.g. 2010-06-08.20:51:12),
>>> and what we want to know is whether between the last scrub and now we
>>> have passed a specific number of days or not.
>>>
>>> What I do is take the year multiplied by 365, plus the day of the year;
>>> both of these for the date of the last scrub and for "now". The
>>> difference is the number of days between those two dates. This value I
>>> can use with -le or -ge for the test command.
>>>
>>> This is only off by one once in a leap year, when the leap-day is
>>> in-between the two dates (those people who want to scrub every 4 years
>>> are off by two when both leap-days are in-between, but a scrub every 4
>>> years or more looks unreasonable to me, so I do not care much about
>>> this).
>>>
>>> This is done in one line with two calls to date (once for the last
>>> scrub, once for "now") and a little bit of shell builtin arithmetic. If
>>> you have a more correct version which is not significantly more
>>> complex, feel free to share it here.
>>>
>>> Bye,
>>> Alexander.
>>>
>>> --
>>> "Who would have thought hell would really exist? And that it would be
>>> in New Jersey?" -Leela
>>> "Actually..." - Fry
>>>
>>> http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7
>>> http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137
>>>
>> _______________________________________________
>> freebsd-fs@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>
> --
>
> jhell
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>
>

--
Before marriage the three little words are "I love you," after marriage they are "Let's eat out."
http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7 http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137 From owner-freebsd-fs@FreeBSD.ORG Fri Jun 11 10:38:07 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 03008106567D for ; Fri, 11 Jun 2010 10:38:07 +0000 (UTC) (envelope-from simon@comsys.ntu-kpi.kiev.ua) Received: from comsys.kpi.ua (comsys.kpi.ua [77.47.192.42]) by mx1.freebsd.org (Postfix) with ESMTP id AC61D8FC19 for ; Fri, 11 Jun 2010 10:38:06 +0000 (UTC) Received: from pm513-1.comsys.kpi.ua ([10.18.52.101] helo=pm513-1.comsys.ntu-kpi.kiev.ua) by comsys.kpi.ua with esmtpsa (TLSv1:AES256-SHA:256) (Exim 4.63) (envelope-from ) id 1ON1ci-00085B-7r; Fri, 11 Jun 2010 13:38:04 +0300 Received: by pm513-1.comsys.ntu-kpi.kiev.ua (Postfix, from userid 1001) id E37B11CC0B; Fri, 11 Jun 2010 13:38:03 +0300 (EEST) Date: Fri, 11 Jun 2010 13:38:03 +0300 From: Andrey Simonenko To: Rick Macklem Message-ID: <20100611103803.GA1855@pm513-1.comsys.ntu-kpi.kiev.ua> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.20 (2009-06-14) X-Authenticated-User: simon@comsys.ntu-kpi.kiev.ua X-Authenticator: plain X-Sender-Verify: SUCCEEDED (sender exists & accepts mail) X-Exim-Version: 4.63 (build at 06-Jan-2007 23:14:37) X-Date: 2010-06-11 13:38:04 X-Connected-IP: 10.18.52.101:11436 X-Message-Linecount: 30 X-Body-Linecount: 15 X-Message-Size: 1422 X-Body-Size: 729 Cc: freebsd-fs@freebsd.org Subject: Re: Testers: NFSv3 support for pxeboot for nfs diskless root X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 Jun 2010 10:38:07 -0000 On Wed, Jun 09, 2010 at 07:38:24PM -0400, Rick Macklem wrote: > I put 3 patches (you need to apply them all) here: > http://people.freebsd.org/~rmacklem/nfsdiskless-patches/ > > They convert lib/libstand/nfs.c and pxeboot to use NFSv3 instead > of NFSv2 (unless built with OLD_NFSV2 defined). Initial test > reports have been good. (one has it working ok and the other has > a problem in an area not related to the patches, it appears) > > So, if others are interested in testing these, it would be > appreciated, rick Shouldn't return values from malloc() calls be checked? Also additional checks for NULL values before free() calls can be removed, at least this will reduce size of code. There is PR/83424 related to this. 
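To illustrate the pattern Andrey describes, a minimal sketch only: the names below are invented for the example and are not taken from lib/libstand/nfs.c, and (as the replies below confirm) libstand's free(), like the kernel's and userland's, accepts a NULL pointer, so a guard before free() is redundant.

    #include <stand.h>

    /* Hypothetical reply buffer, invented for this sketch. */
    struct reply_buf {
            void    *data;
            size_t   len;
    };

    static int
    reply_buf_init(struct reply_buf *rb, size_t len)
    {

            rb->data = malloc(len);
            if (rb->data == NULL) {
                    /* Check malloc()'s return; a loader can do little
                       more than report the failure and bail out. */
                    printf("reply_buf_init: out of memory\n");
                    return (ENOMEM);
            }
            rb->len = len;
            return (0);
    }

    static void
    reply_buf_fini(struct reply_buf *rb)
    {

            /* free(NULL) is a no-op, so no explicit NULL guard is needed. */
            free(rb->data);
            rb->data = NULL;
    }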
From owner-freebsd-fs@FreeBSD.ORG Fri Jun 11 12:04:40 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2D5F41065676 for ; Fri, 11 Jun 2010 12:04:40 +0000 (UTC) (envelope-from freebsd-fs@m.gmane.org) Received: from lo.gmane.org (lo.gmane.org [80.91.229.12]) by mx1.freebsd.org (Postfix) with ESMTP id D8F0D8FC14 for ; Fri, 11 Jun 2010 12:04:39 +0000 (UTC) Received: from list by lo.gmane.org with local (Exim 4.69) (envelope-from ) id 1ON2yU-0001Bx-Fi for freebsd-fs@freebsd.org; Fri, 11 Jun 2010 14:04:38 +0200 Received: from lara.cc.fer.hr ([161.53.72.113]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Fri, 11 Jun 2010 14:04:38 +0200 Received: from ivoras by lara.cc.fer.hr with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Fri, 11 Jun 2010 14:04:38 +0200 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-fs@freebsd.org connect(): No such file or directory From: Ivan Voras Date: Fri, 11 Jun 2010 14:04:24 +0200 Lines: 28 Message-ID: References: <20100610162629.38992mazf0sfdqg0@webmail.leidinger.net> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Complaints-To: usenet@dough.gmane.org X-Gmane-NNTP-Posting-Host: lara.cc.fer.hr User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.1.9) Gecko/20100518 Thunderbird/3.0.4 In-Reply-To: <20100610162629.38992mazf0sfdqg0@webmail.leidinger.net> X-Enigmail-Version: 1.0.1 Subject: Re: CFT: periodic scrubbing of ZFS pools X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 Jun 2010 12:04:40 -0000

On 06/10/10 16:26, Alexander Leidinger wrote:
> Hi,
>
> as there seems to be interest in a periodic script to scrub zpools, I
> modified my monthly-POC into a daily script with parameters for which
> pools to scrub, how many days between scrubs (even different per pool,
> if required), and several error checks (non-existing pool specified,
> scrub in progress).
>
> You can find it at
> http://www.Leidinger.net/FreeBSD/current-patches/600.scrub-zfs
>
> Please put it into /etc/periodic/daily and test it. Possible
> periodic.conf variables are:
> daily_scrub_zfs_enable="YES"
> daily_scrub_zfs_pools="name1 name2 name3" # all if unset or empty
> daily_scrub_zfs_default_threshold="" # default: 30
> daily_scrub_zfs__threshold=""
>
> If there is no specific threshold for a pool (= days between scrubs),
> the default threshold is used.

Fairly good and useful, but could you add a small check of "zpool status" information before scrubbing that would a) complain LOUDLY AND VISIBLY if a previous scrub failed and b) skip issuing a new scrub command if there is such an error, to avoid stressing possibly broken hardware?
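One possible shape for such a check, as a sketch only: the health lookup and the strings matched against the scrub line are assumptions here, since zpool's status wording varies between ZFS versions, so the patterns would need verifying against the version in use.

    #!/bin/sh
    # Sketch: skip the scrub, loudly, if the pool is unhealthy or the
    # previous scrub did not complete cleanly.
    pool=$1

    health=$(zpool list -H -o health "${pool}")
    if [ "${health}" != "ONLINE" ]; then
            echo "WARNING: pool ${pool} is ${health}; not starting a scrub." >&2
            exit 2
    fi

    scrub_line=$(zpool status "${pool}" | grep 'scrub:')
    case "${scrub_line}" in
    ""|*"none requested"*|*"with 0 errors"*)
            zpool scrub "${pool}"
            ;;
    *)
            echo "WARNING: previous scrub of ${pool} did not complete cleanly:" >&2
            echo "    ${scrub_line}" >&2
            exit 2
            ;;
    esac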
From owner-freebsd-fs@FreeBSD.ORG Fri Jun 11 14:51:46 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B96091065678 for ; Fri, 11 Jun 2010 14:51:46 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 6A4F28FC13 for ; Fri, 11 Jun 2010 14:51:45 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AvsEAKvqEUyDaFvK/2dsb2JhbACee3G/EYUYBA X-IronPort-AV: E=Sophos;i="4.53,403,1272859200"; d="scan'208";a="80337382" Received: from fraser.cs.uoguelph.ca ([131.104.91.202]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 11 Jun 2010 10:51:43 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by fraser.cs.uoguelph.ca (Postfix) with ESMTP id 74E1A109C358; Fri, 11 Jun 2010 10:51:45 -0400 (EDT) X-Virus-Scanned: amavisd-new at fraser.cs.uoguelph.ca Received: from fraser.cs.uoguelph.ca ([127.0.0.1]) by localhost (fraser.cs.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id i6wLuQqUrRwm; Fri, 11 Jun 2010 10:51:45 -0400 (EDT) Received: from muncher.cs.uoguelph.ca (muncher.cs.uoguelph.ca [131.104.91.102]) by fraser.cs.uoguelph.ca (Postfix) with ESMTP id 101DE109C34B; Fri, 11 Jun 2010 10:51:45 -0400 (EDT) Received: from localhost (rmacklem@localhost) by muncher.cs.uoguelph.ca (8.11.7p3+Sun/8.11.6) with ESMTP id o5BF7v320443; Fri, 11 Jun 2010 11:07:57 -0400 (EDT) X-Authentication-Warning: muncher.cs.uoguelph.ca: rmacklem owned process doing -bs Date: Fri, 11 Jun 2010 11:07:57 -0400 (EDT) From: Rick Macklem X-X-Sender: rmacklem@muncher.cs.uoguelph.ca To: Andrey Simonenko In-Reply-To: <20100611103803.GA1855@pm513-1.comsys.ntu-kpi.kiev.ua> Message-ID: References: <20100611103803.GA1855@pm513-1.comsys.ntu-kpi.kiev.ua> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@freebsd.org Subject: Re: Testers: NFSv3 support for pxeboot for nfs diskless root X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 Jun 2010 14:51:46 -0000 On Fri, 11 Jun 2010, Andrey Simonenko wrote: > > Shouldn't return values from malloc() calls be checked? Yea, I suppose that's a good idea, although I think all that can be done is print a failure message, since it's "dead in the water" at that point. > Also additional checks for NULL values before free() calls can be removed, > at least this will reduce size of code. There is PR/83424 related to this. > My only concern here would be if someone were to change Free() so it doesn't check for a null pointer, but since it does now, I suppose it's a feature and shouldn't be changed. Anyone else have an opinion on this? (ie. Whether I should just assume that Free() checks for the NULL ptr.) 
rick From owner-freebsd-fs@FreeBSD.ORG Fri Jun 11 15:20:42 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 48F301065677; Fri, 11 Jun 2010 15:20:42 +0000 (UTC) (envelope-from alexander@leidinger.net) Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de [217.11.53.44]) by mx1.freebsd.org (Postfix) with ESMTP id C89408FC17; Fri, 11 Jun 2010 15:20:41 +0000 (UTC) Received: from outgoing.leidinger.net (pD954FC95.dip.t-dialin.net [217.84.252.149]) by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id 8EE4F84400A; Fri, 11 Jun 2010 17:20:37 +0200 (CEST) Received: from webmail.leidinger.net (webmail.leidinger.net [192.168.1.102]) by outgoing.leidinger.net (Postfix) with ESMTP id 3FE81521B; Fri, 11 Jun 2010 17:20:34 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=Leidinger.net; s=outgoing-alex; t=1276269634; bh=fqWjhwATLlDUTnUUymky33S3bCX3IngdN484M52/M1A=; h=Message-ID:Date:From:To:Cc:Subject:References:In-Reply-To: MIME-Version:Content-Type:Content-Transfer-Encoding; b=MwAKfWquhD0EcPouOe5m3/y0UQ2sOPpv9U6nXXa3Cq67IBDMm5l/Zg0z3q6rXvxlr aKxZFUBWl9o3fOsDrz4VJJB+gShqvyZKl+yZh2KJdxpU7DIKz6ZgKAyQoWsVV+lzpD LJPaMPPnld6ZZTyXq74Zj1CJqWJDH1dy/upExwSoV5Jl67GiAnyMLu1D0dX8iP1JIN A+nyx4CxAgRODuUEiRNOdjJpO/jvHbsQRS5EbR5OtW74mKtRu1paRrOIOA+sH3d5Rb idHMim9w/tz/d+G5FXQVtqwpD/zakgONiE5hAjPlDkKuVFXzU1aWMNiyzeE1KSku6s tepOC0ClbGuwA== Received: (from www@localhost) by webmail.leidinger.net (8.14.4/8.13.8/Submit) id o5BFKXeI022633; Fri, 11 Jun 2010 17:20:33 +0200 (CEST) (envelope-from Alexander@Leidinger.net) Received: from pslux.ec.europa.eu (pslux.ec.europa.eu [158.169.9.14]) by webmail.leidinger.net (Horde Framework) with HTTP; Fri, 11 Jun 2010 17:20:33 +0200 Message-ID: <20100611172033.42001s90ahe57oe8@webmail.leidinger.net> Date: Fri, 11 Jun 2010 17:20:33 +0200 From: Alexander Leidinger To: Ivan Voras References: <20100610162629.38992mazf0sfdqg0@webmail.leidinger.net> In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; DelSp="Yes"; format="flowed" Content-Disposition: inline Content-Transfer-Encoding: 7bit User-Agent: Dynamic Internet Messaging Program (DIMP) H3 (1.1.4) X-EBL-MailScanner-Information: Please contact the ISP for more information X-EBL-MailScanner-ID: 8EE4F84400A.A78D4 X-EBL-MailScanner: Found to be clean X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN, SpamAssassin (not cached, score=-1.023, required 6, autolearn=disabled, ALL_TRUSTED -1.00, DKIM_SIGNED 0.10, DKIM_VALID -0.10, DKIM_VALID_AU -0.10, TW_ZF 0.08) X-EBL-MailScanner-From: alexander@leidinger.net X-EBL-MailScanner-Watermark: 1276874438.51835@75x/WBkoEbA9pDIrtOCYzA X-EBL-Spam-Status: No Cc: freebsd-fs@freebsd.org Subject: Re: CFT: periodic scrubbing of ZFS pools X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 Jun 2010 15:20:42 -0000 Quoting Ivan Voras (from Fri, 11 Jun 2010 14:04:24 +0200): > On 06/10/10 16:26, Alexander Leidinger wrote: >> Hi, >> >> as there seems to be interest in a periodic script to scrub zpools, I >> modified my monthly-POC into a daily script with parameters for which >> pools to scrub, how many days between scrubs (even different per pool, >> if required), and several error checks (non-existing pool specified, >> scrub in progress). 
>>
>> You can find it at
>> http://www.Leidinger.net/FreeBSD/current-patches/600.scrub-zfs
>>
>> Please put it into /etc/periodic/daily and test it. Possible
>> periodic.conf variables are:
>> daily_scrub_zfs_enable="YES"
>> daily_scrub_zfs_pools="name1 name2 name3" # all if unset or empty
>> daily_scrub_zfs_default_threshold="" # default: 30
>> daily_scrub_zfs__threshold=""
>>
>> If there is no specific threshold for a pool (= days between scrubs),
>> the default threshold is used.
>
> Fairly good and useful, but could you add a small check of "zpool
> status" information before scrubbing that would a) complain LOUDLY AND
> VISIBLY if a previous scrub failed and b) skip issuing a new scrub
> command if there is such an error, to avoid stressing possibly broken
> hardware?

Can you please provide an example of such a failed scrub?

Things I fixed so far:
- use the creation time of the pool if no scrub was done before
- rename the script via s/600/800/ (this is an I/O-intensive task and we
  want to have this done late in the periodic run, so that other stuff is
  not slowed down too much)

Bye,
Alexander.

--
Winning isn't everything, but losing isn't anything.

http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7 http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137

From owner-freebsd-fs@FreeBSD.ORG Fri Jun 11 15:53:11 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7A3E51065670 for ; Fri, 11 Jun 2010 15:53:11 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 499D48FC12 for ; Fri, 11 Jun 2010 15:53:11 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id ECFC146C13; Fri, 11 Jun 2010 11:53:10 -0400 (EDT) Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id EF5B78A03C; Fri, 11 Jun 2010 11:53:09 -0400 (EDT) From: John Baldwin To: freebsd-fs@freebsd.org Date: Fri, 11 Jun 2010 11:51:21 -0400 User-Agent: KMail/1.12.1 (FreeBSD/7.3-CBSD-20100217; KDE/4.3.1; amd64; ; ) References: <20100611103803.GA1855@pm513-1.comsys.ntu-kpi.kiev.ua> In-Reply-To: MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201006111151.21925.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (bigwig.baldwin.cx); Fri, 11 Jun 2010 11:53:10 -0400 (EDT) X-Virus-Scanned: clamav-milter 0.95.1 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-2.5 required=4.2 tests=AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx Cc: Subject: Re: Testers: NFSv3 support for pxeboot for nfs diskless root X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 Jun 2010 15:53:11 -0000

On Friday 11 June 2010 11:07:57 am Rick Macklem wrote:
> > On Fri, 11 Jun 2010, Andrey Simonenko wrote:
> > >
> > > Shouldn't return values from malloc() calls be checked?
> >
> > Yea, I suppose that's a good idea, although I think all that can be
> > done is print a failure message, since it's "dead in the water" at
> > that point.
> > > Also additional checks for NULL values before free() calls can be removed, > > at least this will reduce size of code. There is PR/83424 related to this. > > > My only concern here would be if someone were to change Free() so it > doesn't check for a null pointer, but since it does now, I suppose > it's a feature and shouldn't be changed. > > Anyone else have an opinion on this? (ie. Whether I should just assume > that Free() checks for the NULL ptr.) free() in the kernel and userland also check for NULL, so I think it's ok to assume the same behavior for libstand. -- John Baldwin From owner-freebsd-fs@FreeBSD.ORG Fri Jun 11 16:12:08 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 30CCA106564A for ; Fri, 11 Jun 2010 16:12:08 +0000 (UTC) (envelope-from ivoras@gmail.com) Received: from fg-out-1718.google.com (fg-out-1718.google.com [72.14.220.155]) by mx1.freebsd.org (Postfix) with ESMTP id 7F1B78FC0C for ; Fri, 11 Jun 2010 16:12:06 +0000 (UTC) Received: by fg-out-1718.google.com with SMTP id d23so246201fga.13 for ; Fri, 11 Jun 2010 09:12:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:mime-version:sender:received :in-reply-to:references:from:date:x-google-sender-auth:message-id :subject:to:cc:content-type; bh=f1RokaG63qog1hhpCLdXgKCdteQnUmKa38pqTmn1z28=; b=Zf2r3UiQ+ntNOh+NFD3keVzb44FSoMMAP8bcnryyCpMUxla8kQtBnm/AVYl5P5sQ5W RIKCwqbqyrwWLf7VOOfe6ILWcGWnhPtZktGakTNG2nD7e7FXBpLXVYiQHdsu3ASTp6g0 0AL40UIbHVyzqwo6Ejd6vkX7dZ66b5KPCeNnc= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:from:date :x-google-sender-auth:message-id:subject:to:cc:content-type; b=Z3Y+ScGOsOVTzOp7Wmt1a+h4iRCaRPAZilUjrvK7Fb1o9ukpWf0h0PUiuCnMGGQlsV Qys4CqR4O/9BM+rMWS8IeG2h0tHuTrmk6S6uxWXd3Cg1ZdnK6b/Xs44veAUG1c0bOe1i TUG7xAJf4XTabADz15ZgKsw5F8iPh4KalzLWw= Received: by 10.216.179.138 with SMTP id h10mr1179376wem.49.1276272725571; Fri, 11 Jun 2010 09:12:05 -0700 (PDT) MIME-Version: 1.0 Sender: ivoras@gmail.com Received: by 10.216.89.197 with HTTP; Fri, 11 Jun 2010 09:11:45 -0700 (PDT) In-Reply-To: <20100611172033.42001s90ahe57oe8@webmail.leidinger.net> References: <20100610162629.38992mazf0sfdqg0@webmail.leidinger.net> <20100611172033.42001s90ahe57oe8@webmail.leidinger.net> From: Ivan Voras Date: Fri, 11 Jun 2010 18:11:45 +0200 X-Google-Sender-Auth: cBtPpfA_UZZbY1XRQP_1Mch67qI Message-ID: To: Alexander Leidinger Content-Type: text/plain; charset=UTF-8 Cc: freebsd-fs@freebsd.org Subject: Re: CFT: periodic scrubbing of ZFS pools X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 Jun 2010 16:12:08 -0000 On 11 June 2010 17:20, Alexander Leidinger wrote: > Quoting Ivan Voras (from Fri, 11 Jun 2010 14:04:24 > +0200): >> Fairly good and useful, but could you add a small check of "zpool >> status" information before scrubbing that would a) complain LOUDLY AND >> VISIBLY if a previous scrub failed and b) skip issuing a new scrub >> command if there is such an error, to avoid stressing possibly broken >> hardware? > > Can you please provide an example of such a failed scrub? You should probably treat any status message that doesn't have "none requested" or "scrub completed with 0 errors..." as failed. 
I could set up a gnop device with errors to prove it if you'd like :)

From owner-freebsd-fs@FreeBSD.ORG Fri Jun 11 16:33:17 2010 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B39461065673 for ; Fri, 11 Jun 2010 16:33:17 +0000 (UTC) (envelope-from anders@FreeBSD.org) Received: from fupp.net (totem.fix.no [80.91.36.20]) by mx1.freebsd.org (Postfix) with ESMTP id 510298FC14 for ; Fri, 11 Jun 2010 16:33:16 +0000 (UTC) Received: from localhost (totem.fix.no [80.91.36.20]) by fupp.net (Postfix) with ESMTP id 5036547C36; Fri, 11 Jun 2010 18:33:15 +0200 (CEST) Received: from fupp.net ([80.91.36.20]) by localhost (totem.fix.no [80.91.36.20]) (amavisd-new, port 10024) with LMTP id apHlpcw7u46C; Fri, 11 Jun 2010 18:33:14 +0200 (CEST) Received: by fupp.net (Postfix, from userid 1000) id C22D947C35; Fri, 11 Jun 2010 18:33:14 +0200 (CEST) Date: Fri, 11 Jun 2010 18:33:14 +0200 From: Anders Nordby To: Jeremy Chadwick Message-ID: <20100611163314.GA84574@fupp.net> References: <20100608083649.GA77452@fupp.net> <20100609122517.GA16231@fupp.net> <20100610081710.GA64350@server.vk2pj.dyndns.org> <20100610110609.GA87243@fupp.net> <20100610114831.GB71432@icarus.home.lan> <20100610130307.GA33285@fupp.net> <20100610133859.GA74094@icarus.home.lan> <20100611031809.GA93666@icarus.home.lan> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline In-Reply-To: <20100611031809.GA93666@icarus.home.lan> User-Agent: Mutt/1.4.2.3i X-PGP-Key: http://anders.fix.no/pgp/ X-PGP-Key-FingerPrint: 1E0F C53C D8DF 6A8F EAAD 19C5 D12A BC9F 0083 5956 Cc: freebsd-fs@FreeBSD.org, Peter Jeremy Subject: Re: Odd network issues on ZFS based NFS server X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 Jun 2010 16:33:17 -0000

Hi,

On Thu, Jun 10, 2010 at 08:18:09PM -0700, Jeremy Chadwick wrote:
>> Given that you stated FreeBSD8.1-Prerelease I think you should have the
>> patch, but please make sure that your sys/nfsserver/nfs_srvsubs.c is
>> at least r206406.

I didn't have any time to dump and look at the network traffic much yet (life is busy). But, the issue in this thread also happens/happened in FreeBSD 7.3-RELEASE, so I don't see how it's a recent change that makes this happen. Last night I made some progress: by switching to an old 100 Mbps USB NIC of mine (nerds sure do have lots of handy things at home, eh?) I got rid of the packet loss:

Jun 11 01:25:14 unixfile kernel: rue0: on usbus3
Jun 11 01:25:14 unixfile kernel: miibus2: on rue0
Jun 11 01:25:14 unixfile kernel: ruephy0: PHY 0 on miibus2

Performance is quite lousy however. Just in case, I am trying to get hold of a PCI-X Intel NIC to see how that goes, as this is a production server after all (or is supposed to be).

> With regards to possible bge(4) issues, Yong-Hyeon works on this driver
> fairly often. If it turns out to be a driver issue of some sort, he can
> probably help. Relevant commits are here (to give you some idea of
> activity):
>
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/bge/if_bge.c
>
> One commit caught my eye (rev 1.226.2.15), but that seems to be more
> focused on mbuf issues (your system doesn't appear to be having any,
> given your netstat -m output).
>
> CC'ing Yong-Hyeon, as he might know of some edge case where bge(4)
> could go crazy with interrupts.
:-) Yong-Hyeon, the entire thread is > here: > > http://lists.freebsd.org/pipermail/freebsd-fs/2010-June/008654.html Let me know if there's anything bge related I can try/test. It might take a day or two or more. Customer is sort of getting annoyed by these problems, so the room for testing is getting smaller. But of course I want to help get a fix for this. Regards, -- Anders. From owner-freebsd-fs@FreeBSD.ORG Fri Jun 11 18:30:43 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 03B101065678; Fri, 11 Jun 2010 18:30:43 +0000 (UTC) (envelope-from pyunyh@gmail.com) Received: from mail-pw0-f54.google.com (mail-pw0-f54.google.com [209.85.160.54]) by mx1.freebsd.org (Postfix) with ESMTP id 6F21F8FC16; Fri, 11 Jun 2010 18:30:42 +0000 (UTC) Received: by pwj1 with SMTP id 1so976169pwj.13 for ; Fri, 11 Jun 2010 11:30:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:received:from:date:to:cc :subject:message-id:reply-to:references:mime-version:content-type :content-disposition:in-reply-to:user-agent; bh=cb6N/hi4tPsh8RXsxGECMvteyup7Qjnfqn9C6dF610c=; b=mzBcOiKtVrB9Me0LSvEwQo1GWU7qzlYWqrjwwMTlx4unTzbjVR98URePda4Izr3ACC zcdqsfnRflDbq4RENz0kxCqSL7PCXoxe5+FTty8FIHmbyaoGBU9eJmiz7fLwlNWPhiSZ wLQ+tevt04A8N85P1BigIOjg+VZ2KhuTu02HI= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=from:date:to:cc:subject:message-id:reply-to:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; b=nXb1jkn2T7RQ+yq3o4iuziTYxoDQnpy9nsYlQbESm4Glou7RLEsssfY9UoI8vZD+il UWlBIeR9mmeC4sKEaCC53i+BJSVyaUdwRDfIHMHo2DPuKoLuKLW96wkmcAmh/5ECY87A tKPv9NIw/LJ+86CJ9LvBpbV9G8WlfL9+de2aI= Received: by 10.140.55.13 with SMTP id d13mr1674983rva.119.1276279167128; Fri, 11 Jun 2010 10:59:27 -0700 (PDT) Received: from pyunyh@gmail.com ([174.35.1.224]) by mx.google.com with ESMTPS id b12sm1435975rvn.22.2010.06.11.10.59.25 (version=TLSv1/SSLv3 cipher=RC4-MD5); Fri, 11 Jun 2010 10:59:26 -0700 (PDT) Received: by pyunyh@gmail.com (sSMTP sendmail emulation); Fri, 11 Jun 2010 10:58:05 -0700 From: Pyun YongHyeon Date: Fri, 11 Jun 2010 10:58:05 -0700 To: Anders Nordby Message-ID: <20100611175805.GE13776@michelle.cdnetworks.com> References: <20100609122517.GA16231@fupp.net> <20100610081710.GA64350@server.vk2pj.dyndns.org> <20100610110609.GA87243@fupp.net> <20100610114831.GB71432@icarus.home.lan> <20100610130307.GA33285@fupp.net> <20100610133859.GA74094@icarus.home.lan> <20100611031809.GA93666@icarus.home.lan> <20100611163314.GA84574@fupp.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20100611163314.GA84574@fupp.net> User-Agent: Mutt/1.4.2.3i Cc: freebsd-fs@freebsd.org, Peter Jeremy Subject: Re: Odd network issues on ZFS based NFS server X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: pyunyh@gmail.com List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 Jun 2010 18:30:43 -0000 On Fri, Jun 11, 2010 at 06:33:14PM +0200, Anders Nordby wrote: > Hi, > > On Thu, Jun 10, 2010 at 08:18:09PM -0700, Jeremy Chadwick wrote: > >> Given that you stated FreeBSD8.1-Prerelease I think you should have the > >> patch, but please make sure that your sys/nfsserver/nfs_srvsubs.c is > >> at least r206406. 
>
> I didn't have any time to dump and look at the network traffic much yet
> (life is busy). But, the issue in this thread also happens/happened in
> FreeBSD 7.3-RELEASE, so I don't see how it's a recent change that makes
> this happen. Last night I made some progress: by switching to an old 100
> Mbps USB NIC of mine (nerds sure do have lots of handy things at home,
> eh?) I got rid of the packet loss:
>
> Jun 11 01:25:14 unixfile kernel: rue0: 0/0, rev 1.10/1.00, addr 2> on usbus3
> Jun 11 01:25:14 unixfile kernel: miibus2: on rue0
> Jun 11 01:25:14 unixfile kernel: ruephy0: media interface> PHY 0 on miibus2
>
> Performance is quite lousy however. Just in case, I am trying to get hold
> of a PCI-X Intel NIC to see how that goes, as this is a production
> server after all (or is supposed to be).
>
> > With regards to possible bge(4) issues, Yong-Hyeon works on this driver
> > fairly often. If it turns out to be a driver issue of some sort, he can
> > probably help. Relevant commits are here (to give you some idea of
> > activity):
> >
> > http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/bge/if_bge.c
> >
> > One commit caught my eye (rev 1.226.2.15), but that seems to be more
> > focused on mbuf issues (your system doesn't appear to be having any,
> > given your netstat -m output).
> >
> > CC'ing Yong-Hyeon, as he might know of some edge case where bge(4)
> > could go crazy with interrupts. :-) Yong-Hyeon, the entire thread is
> > here:
> >
> > http://lists.freebsd.org/pipermail/freebsd-fs/2010-June/008654.html
>
> Let me know if there's anything bge related I can try/test. It might
> take a day or two or more. The customer is sort of getting annoyed by
> these problems, so the room for testing is getting smaller. But of
> course I want to help get a fix for this.
>

Show me the dmesg output so I know which bge(4) controller you have. Also show me the output of "netstat -ndI bge0". Some bge(4) controllers support detailed MAC counters, which are exported via sysctl. If your controller is one of these, you can check its statistics with "sysctl dev.bge.0.stats" and post them if you can see them.

> Regards,
>
> --
> Anders.
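For convenience, those three requests collected into one snippet. This is a sketch that assumes the interface is bge1, as in Anders' earlier mails, and that the stats node exists only on controllers that export the MAC counters:

    #!/bin/sh
    # Gather the bge(4) diagnostics requested above.
    ifc=bge1                        # adjust to the bge(4) unit in use
    unit=${ifc#bge}
    grep "^${ifc}" /var/run/dmesg.boot
    netstat -ndI "${ifc}"
    # Only some bge(4) controllers export the per-MAC counters:
    sysctl "dev.bge.${unit}.stats" 2>/dev/null ||
        echo "no dev.bge.${unit}.stats node on this controller"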
From owner-freebsd-fs@FreeBSD.ORG Fri Jun 11 19:11:04 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EACAB1065674 for ; Fri, 11 Jun 2010 19:11:04 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 686888FC15 for ; Fri, 11 Jun 2010 19:11:03 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id o5BJB050036792 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 11 Jun 2010 22:11:00 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id o5BJB0N5060004; Fri, 11 Jun 2010 22:11:00 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id o5BJAxAf060003; Fri, 11 Jun 2010 22:10:59 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Fri, 11 Jun 2010 22:10:59 +0300 From: Kostik Belousov To: Mikolaj Golub Message-ID: <20100611191059.GF13238@deviant.kiev.zoral.com.ua> References: <86mxv22ji7.fsf@zhuzha.ua1> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="wtjvnLv0o8UUzur2" Content-Disposition: inline In-Reply-To: <86mxv22ji7.fsf@zhuzha.ua1> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-3.6 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00, DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: freebsd-fs@freebsd.org Subject: Re: '#ifndef DIAGNOSTIC' in nfsclient code looks like a typo X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 Jun 2010 19:11:05 -0000

--wtjvnLv0o8UUzur2
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, Jun 11, 2010 at 09:03:44AM +0300, Mikolaj Golub wrote:
> Hi:
>
> '#ifndef DIAGNOSTIC' in sys/nfsclient/nfs_vnops.c and
> sys/fs/nfsclient/nfs_clvnops.c looks like a typo and '#ifdef' should be
> used instead (see the attached patch).

All the changes should be converted to KASSERTs. There is no point in doing "if (something) panic();" for diagnostics; use "KASSERT(something, ("panic message"));" instead.

--wtjvnLv0o8UUzur2
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (FreeBSD)

iEYEARECAAYFAkwSikMACgkQC3+MBN1Mb4ht8wCg4Lo/kk++XQFke4I56+CCH46v
O1cAnRHlBUAYSiDN3fKNYfxaT989cDOo
=qNG7
-----END PGP SIGNATURE-----

--wtjvnLv0o8UUzur2--
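As a concrete sketch of that conversion, using the first hunk of Mikolaj's patch as the example (an illustration, not a committed change): KASSERT states the condition that must hold, so the sense of the old panic test is inverted, and KASSERT is compiled in under INVARIANTS rather than DIAGNOSTIC.

    /* The DIAGNOSTIC-style check from the patch: */
    #ifdef DIAGNOSTIC
            if (uiop->uio_iovcnt != 1)
                    panic("nfs: writerpc iovcnt > 1");
    #endif

    /* The suggested KASSERT equivalent: */
            KASSERT(uiop->uio_iovcnt == 1, ("nfs: writerpc iovcnt > 1"));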
From owner-freebsd-fs@FreeBSD.ORG Fri Jun 11 23:01:21 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CBF5A106566B for ; Fri, 11 Jun 2010 23:01:21 +0000 (UTC) (envelope-from anders@FreeBSD.org) Received: from fupp.net (totem.fix.no [80.91.36.20]) by mx1.freebsd.org (Postfix) with ESMTP id 52AF38FC12 for ; Fri, 11 Jun 2010 23:01:20 +0000 (UTC) Received: from localhost (totem.fix.no [80.91.36.20]) by fupp.net (Postfix) with ESMTP id C16B347194; Sat, 12 Jun 2010 01:01:20 +0200 (CEST) Received: from fupp.net ([80.91.36.20]) by localhost (totem.fix.no [80.91.36.20]) (amavisd-new, port 10024) with LMTP id Lpuwp0wZHxGg; Sat, 12 Jun 2010 01:01:20 +0200 (CEST) Received: by fupp.net (Postfix, from userid 1000) id 545BD47193; Sat, 12 Jun 2010 01:01:20 +0200 (CEST) Date: Sat, 12 Jun 2010 01:01:20 +0200 From: Anders Nordby To: Pyun YongHyeon Message-ID: <20100611230120.GA89356@fupp.net> References: <20100609122517.GA16231@fupp.net> <20100610081710.GA64350@server.vk2pj.dyndns.org> <20100610110609.GA87243@fupp.net> <20100610114831.GB71432@icarus.home.lan> <20100610130307.GA33285@fupp.net> <20100610133859.GA74094@icarus.home.lan> <20100611031809.GA93666@icarus.home.lan> <20100611163314.GA84574@fupp.net> <20100611175805.GE13776@michelle.cdnetworks.com> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline In-Reply-To: <20100611175805.GE13776@michelle.cdnetworks.com> User-Agent: Mutt/1.4.2.3i X-PGP-Key: http://anders.fix.no/pgp/ X-PGP-Key-FingerPrint: 1E0F C53C D8DF 6A8F EAAD 19C5 D12A BC9F 0083 5956 Cc: freebsd-fs@freebsd.org, Peter Jeremy Subject: Re: Odd network issues on ZFS based NFS server X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 Jun 2010 23:01:21 -0000

Hi,

On Fri, Jun 11, 2010 at 10:58:05AM -0700, Pyun YongHyeon wrote:
>> Let me know if there's anything bge related I can try/test. It might
>> take a day or two or more. The customer is sort of getting annoyed by
>> these problems, so the room for testing is getting smaller. But of
>> course I want to help get a fix for this.
> Show me the dmesg output so I know which bge(4) controller you have.
> Also show me the output of "netstat -ndI bge0". Some bge(4) controllers
> support detailed MAC counters, which are exported via sysctl. If your
> controller is one of these, you can check its statistics with
> "sysctl dev.bge.0.stats" and post them if you can see them.

Since switching to the rue NIC I haven't retried bge. I also did not reboot after the problems last time; I just changed the NIC from bge1 to ue0. So I'm not sure if these numbers are interesting or if I should retry using a bge NIC, but here goes:

anders@unixfile:~$ grep ^bge1 /var/run/dmesg.boot
bge1: mem 0xfdce0000-0xfdceffff irq 26 at device 1.1 on pci3
bge1: Ethernet address: 00:16:35:03:e6:3e
bge1: [ITHREAD]

anders@unixfile:~$ netstat -ndI bge1
Name    Mtu Network Address              Ipkts Ierrs Idrop    Opkts Oerrs  Coll  Drop
bge1*  1500         00:16:35:03:e6:3e 21417404     0     0 20313076     0     0     0

anders@unixfile:~$ sysctl dev.bge.1.stats
dev.bge.1.stats.FramesDroppedDueToFilters: 0
dev.bge.1.stats.DmaWriteQueueFull: 34
dev.bge.1.stats.DmaWriteHighPriQueueFull: 0
dev.bge.1.stats.NoMoreRxBDs: 0
dev.bge.1.stats.InputDiscards: 0
dev.bge.1.stats.InputErrors: 0
dev.bge.1.stats.RecvThresholdHit: 12086131
dev.bge.1.stats.DmaReadQueueFull: 957280
dev.bge.1.stats.DmaReadHighPriQueueFull: 4835
dev.bge.1.stats.SendDataCompQueueFull: 0
dev.bge.1.stats.RingSetSendProdIndex: 20515417
dev.bge.1.stats.RingStatusUpdate: 20492506
dev.bge.1.stats.Interrupts: 20492506
dev.bge.1.stats.AvoidedInterrupts: 0
dev.bge.1.stats.SendThresholdHit: 0
dev.bge.1.stats.rx.Octets: 0
dev.bge.1.stats.rx.Fragments: 0
dev.bge.1.stats.rx.UcastPkts: 0
dev.bge.1.stats.rx.MulticastPkts: 0
dev.bge.1.stats.rx.FCSErrors: 0
dev.bge.1.stats.rx.AlignmentErrors: 0
dev.bge.1.stats.rx.xonPauseFramesReceived: 0
dev.bge.1.stats.rx.xoffPauseFramesReceived: 0
dev.bge.1.stats.rx.ControlFramesReceived: 0
dev.bge.1.stats.rx.xoffStateEntered: 0
dev.bge.1.stats.rx.FramesTooLong: 0
dev.bge.1.stats.rx.Jabbers: 0
dev.bge.1.stats.rx.UndersizePkts: 0
dev.bge.1.stats.rx.inRangeLengthError: 0
dev.bge.1.stats.rx.outRangeLengthError: 0
dev.bge.1.stats.tx.Octets: 0
dev.bge.1.stats.tx.Collisions: 0
dev.bge.1.stats.tx.XonSent: 0
dev.bge.1.stats.tx.XoffSent: 0
dev.bge.1.stats.tx.flowControlDone: 0
dev.bge.1.stats.tx.InternalMacTransmitErrors: 0
dev.bge.1.stats.tx.SingleCollisionFrames: 0
dev.bge.1.stats.tx.MultipleCollisionFrames: 0
dev.bge.1.stats.tx.DeferredTransmissions: 0
dev.bge.1.stats.tx.ExcessiveCollisions: 0
dev.bge.1.stats.tx.LateCollisions: 0
dev.bge.1.stats.tx.UcastPkts: 0
dev.bge.1.stats.tx.MulticastPkts: 0
dev.bge.1.stats.tx.BroadcastPkts: 0
dev.bge.1.stats.tx.CarrierSenseErrors: 0
dev.bge.1.stats.tx.Discards: 0
dev.bge.1.stats.tx.Errors: 0

Regards,

--
Anders.
From owner-freebsd-fs@FreeBSD.ORG Fri Jun 11 23:19:21 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id ADA39106567D for ; Fri, 11 Jun 2010 23:19:21 +0000 (UTC) (envelope-from hiroshi@soupacific.com) Received: from mail.soupacific.com (mail.soupacific.com [211.19.53.201]) by mx1.freebsd.org (Postfix) with ESMTP id 46D638FC20 for ; Fri, 11 Jun 2010 23:19:21 +0000 (UTC) Received: from [127.0.0.1] (unknown [192.168.1.239]) by mail.soupacific.com (Postfix) with ESMTP id 6EE966ABBB; Thu, 10 Jun 2010 09:40:54 +0000 (UTC) Message-ID: <4C10B526.4040908@soupacific.com> Date: Thu, 10 Jun 2010 18:49:26 +0900 From: "hiroshi@soupacific.com" User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.9) Gecko/20100317 Thunderbird/3.0.4 MIME-Version: 1.0 To: Pawel Jakub Dawidek References: <20100416065126.GG1705@garage.freebsd.pl> <4BCD3979.8050107@soupacific.com> <4BCD5AD7.8070502@soupacific.com> <4BCFA4C2.6000109@soupacific.com> <4BCFB1C5.5000908@soupacific.com> <4BD01800.9040901@soupacific.com> <4BD0438B.5080308@soupacific.com> <4BD0E432.1000108@soupacific.com> <20100423061521.GC1670@garage.freebsd.pl> <4BD17B0D.5080601@soupacific.com> In-Reply-To: <4BD17B0D.5080601@soupacific.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: FreeBSD 8.1 and HAST X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 Jun 2010 23:19:21 -0000

Thanks for adding timeout support; it works great on 9.0. One of two servers shut down; then, rebooting only one server, it works as primary.

And now I try to run HAST on FreeBSD 8.0. Exact same configuration, but something is wrong.

On the primary server:

sv01A# hastctl create zfshast
sv01A# hastd
sv01A# hastctl role primary zfshast

On the secondary:

sv01B# hastctl create zfshast
sv01B# hastd
sv01B# hastctl role secondary zfshast

Then the secondary shows the following:

Jun ..... [zfshast] (secondary) Unable to receive request header: socket is not connected.
Jun ..... [zfshast] (secondary) worker process exited

I checked and found that the proto_recv() function always returns "socket is not connected". sv01A and sv01B both look like they are working; from before "hastctl role secondary zfshast", hastd shows:

Jun.... sv01B hastd: [zfshast] (init) we act as init for the resource and not as secondary as requested by tcp4://192.168.0.240:56279

The above message is shown twice. hast.conf is:

#global section
control /var/run/hastctl
listen tcp:/0.0.0.0.:8547
## timeout 50

resource zfshast {
    on sv01A {
        local /dev/ad8
        remote 192.168.0.241
    }
    on sv01B {
        local /dev/ad8
        remote 192.168.0.240
    }
}

I changed the timeout value, but there is no difference. 8.1 gives the same result. What shall I do?

Thanks, Hiroshi

P.S. I have to change the postfix IP soon.
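For reference, a cleaned-up version of that configuration. This sketch rests on two assumptions: that the single slash and the stray dot in the listen address are transcription typos (hastd's own log lines above use the tcp4:// form), and that the non-default port 8547 is intentional; the addresses and devices are kept exactly as posted.

    # global section
    control /var/run/hastctl
    listen tcp4://0.0.0.0:8547

    resource zfshast {
        on sv01A {
            local /dev/ad8
            remote 192.168.0.241
        }
        on sv01B {
            local /dev/ad8
            remote 192.168.0.240
        }
    }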
From owner-freebsd-fs@FreeBSD.ORG Fri Jun 11 23:44:21 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B03B5106567E for ; Fri, 11 Jun 2010 23:44:21 +0000 (UTC) (envelope-from hiroshi@soupacific.com) Received: from mail.soupacific.com (mail.soupacific.com [211.19.53.201]) by mx1.freebsd.org (Postfix) with ESMTP id 48A5D8FC16 for ; Fri, 11 Jun 2010 23:44:20 +0000 (UTC) Received: from [127.0.0.1] (unknown [192.168.1.239]) by mail.soupacific.com (Postfix) with ESMTP id ED45A6B30E; Fri, 11 Jun 2010 11:46:57 +0000 (UTC) Message-ID: <4C122435.4020409@soupacific.com> Date: Fri, 11 Jun 2010 20:55:33 +0900 From: "hiroshi@soupacific.com" User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.9) Gecko/20100317 Thunderbird/3.0.4 MIME-Version: 1.0 To: Pawel Jakub Dawidek References: <20100416065126.GG1705@garage.freebsd.pl> <4BCD3979.8050107@soupacific.com> <4BCD5AD7.8070502@soupacific.com> <4BCFA4C2.6000109@soupacific.com> <4BCFB1C5.5000908@soupacific.com> <4BD01800.9040901@soupacific.com> <4BD0438B.5080308@soupacific.com> <4BD0E432.1000108@soupacific.com> <20100423061521.GC1670@garage.freebsd.pl> <4BD17B0D.5080601@soupacific.com> <4C10B526.4040908@soupacific.com> In-Reply-To: <4C10B526.4040908@soupacific.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Cc: freebsd-fs@freebsd.org Subject: FreeBSD 8.1 and HAST X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 Jun 2010 23:44:21 -0000

Thanks for adding timeout support; it works great on 9.0. One of two servers shut down; then, rebooting only one server, it works as primary.

And now I try to run HAST on FreeBSD 8.0. Exact same configuration, but something is wrong.

On the primary server:

sv01A# hastctl create zfshast
sv01A# hastd
sv01A# hastctl role primary zfshast

On the secondary:

sv01B# hastctl create zfshast
sv01B# hastd
sv01B# hastctl role secondary zfshast

Then the secondary shows the following:

Jun ..... [zfshast] (secondary) Unable to receive request header: socket is not connected.
Jun ..... [zfshast] (secondary) worker process exited

I checked and found that the proto_recv() function always returns "socket is not connected", except on the first loop of recv_thread(). sv01A and sv01B both look like they are working; from before "hastctl role secondary zfshast", hastd shows:

Jun.... sv01B hastd: [zfshast] (init) we act as init for the resource and not as secondary as requested by tcp4://192.168.0.240:56279

The above message is shown twice. hast.conf is:

#global section
control /var/run/hastctl
listen tcp:/0.0.0.0.:8547
## timeout 50

resource zfshast {
    on sv01A {
        local /dev/ad8
        remote 192.168.0.241
    }
    on sv01B {
        local /dev/ad8
        remote 192.168.0.240
    }
}

I changed the timeout value, but there is no difference. 8.1 gives the same result. What shall I do?

Thanks, Hiroshi

P.S. This is a second mail, caused by a postfix failure.
From owner-freebsd-fs@FreeBSD.ORG Sat Jun 12 06:02:28 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9992E106566B; Sat, 12 Jun 2010 06:02:28 +0000 (UTC) (envelope-from hiroshi@soupacific.com) Received: from mail.soupacific.com (mail.soupacific.com [211.19.53.201]) by mx1.freebsd.org (Postfix) with ESMTP id 2A73C8FC0A; Sat, 12 Jun 2010 06:02:27 +0000 (UTC) Received: from [127.0.0.1] (unknown [192.168.1.239]) by mail.soupacific.com (Postfix) with ESMTP id 0A98C6B638; Sat, 12 Jun 2010 05:53:50 +0000 (UTC) Message-ID: <4C1322EE.8070704@soupacific.com> Date: Sat, 12 Jun 2010 15:02:22 +0900 From: "hiroshi@soupacific.com" User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.9) Gecko/20100317 Thunderbird/3.0.4 MIME-Version: 1.0 To: Pawel Jakub Dawidek , freebsd-fs@freebsd.org References: <20100416065126.GG1705@garage.freebsd.pl><4BCD3979.8050107@soupacific.com> <4BCD5AD7.8070502@soupacific.com><4BCFA4C2.6000109@soupacific.com> <4BCFB1C5.5000908@soupacific.com><4BD01800.9040901@soupacific.com> <4BD0438B.5080308@soupacific.com><4BD0E432.1000108@soupacific.com><20100423061521.GC1670@garage.freebsd.pl><4BD17B0D.5080601@soupacific.com> <4C10B526.4040908@soupacific.com> <4C122435.4020409@soupacific.com> In-Reply-To: <4C122435.4020409@soupacific.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: Subject: Re: FreeBSD 8.1 and HAST X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 12 Jun 2010 06:02:28 -0000

I added some log messages to trace the trouble of HAST on 8.1. The modified code is:

/*
 * Thread receives requests from the primary node.
 */
static void *
recv_thread(void *arg)
{
	struct hast_resource *res = arg;
	struct hio *hio;
	bool wakeup;

	pjdlog_warning("recv_thread");
	for (;;) {
		pjdlog_debug(2, "recv: Taking free request.");
		mtx_lock(&hio_free_list_lock);
		while ((hio = TAILQ_FIRST(&hio_free_list)) == NULL) {
			pjdlog_debug(2, "recv: No free requests, waiting.");
			cv_wait(&hio_free_list_cond, &hio_free_list_lock);
		}
		TAILQ_REMOVE(&hio_free_list, hio, hio_next);
		mtx_unlock(&hio_free_list_lock);
		pjdlog_debug(2, "recv: (%p) Got request.", hio);
		pjdlog_warning("wooooo");
		if (hast_proto_recv_hdr(res->hr_remotein, &hio->hio_nv) < 0) {
			pjdlog_exit(EX_TEMPFAIL,
			    "Unable to receive request header.");
		}
		if (requnpack(res, hio) != 0) {
			pjdlog_warning("requnpack");
			goto send_queue;
		}
		reqlog(LOG_DEBUG, 2, -1, hio,
		    "recv: (%p) Got request header: ", hio);
		if (hio->hio_cmd == HIO_WRITE) {
			if (hast_proto_recv_data(res, res->hr_remotein,
			    hio->hio_nv, hio->hio_data, MAXPHYS) < 0) {
				pjdlog_exit(EX_TEMPFAIL,
				    "Unable to receive reply data");
			}
			pjdlog_warning("HIO_WRITE");
		}
		pjdlog_debug(2, "recv: (%p) Moving request to the disk queue.",
		    hio);
		mtx_lock(&hio_disk_list_lock);
		wakeup = TAILQ_EMPTY(&hio_disk_list);
		TAILQ_INSERT_TAIL(&hio_disk_list, hio, hio_next);
		mtx_unlock(&hio_disk_list_lock);
		if (wakeup) {
			pjdlog_warning("wakeup");
			cv_signal(&hio_disk_list_cond);
		}
		continue;
send_queue:
		pjdlog_debug(2, "recv: (%p) Moving request to the send queue.",
		    hio);
		mtx_lock(&hio_send_list_lock);
		wakeup = TAILQ_EMPTY(&hio_send_list);
		TAILQ_INSERT_TAIL(&hio_send_list, hio, hio_next);
		mtx_unlock(&hio_send_list_lock);
		if (wakeup)
			cv_signal(&hio_send_list_cond);
	}
	/* NOTREACHED */
	return (NULL);
}

/*
 * Thread sends requests back to primary node.
 */
static void *
send_thread(void *arg)
{
	struct hast_resource *res = arg;
	struct nv *nvout;
	struct hio *hio;
	void *data;
	size_t length;
	bool wakeup;

	for (;;) {
		pjdlog_warning("send_thread for loop");
		pjdlog_debug(2, "send: Taking request.");
		mtx_lock(&hio_send_list_lock);
		while ((hio = TAILQ_FIRST(&hio_send_list)) == NULL) {
			pjdlog_debug(2, "send: No requests, waiting.");
			cv_wait(&hio_send_list_cond, &hio_send_list_lock);
		}
		TAILQ_REMOVE(&hio_send_list, hio, hio_next);
		mtx_unlock(&hio_send_list_lock);

The 9.0 logs show:

Jun 12 12:49:33 fw01B hastd: [zfshast] (secondary) send_thread for loop
Jun 12 12:49:33 fw01B hastd: [zfshast] (secondary) HIO_WRITE
Jun 12 12:49:33 fw01B hastd: [zfshast] (secondary) wakup
Jun 12 12:49:33 fw01B hastd: [zfshast] (secondary) woooo
Jun 12 12:49:33 fw01B hastd: [zfshast] (secondary) send_thread for loop
Jun 12 12:49:33 fw01B hastd: [zfshast] (secondary) HIO_WRITE
Jun 12 12:49:33 fw01B hastd: [zfshast] (secondary) wakup
Jun 12 12:49:33 fw01B hastd: [zfshast] (secondary) woooo
Jun 12 12:49:33 fw01B hastd: [zfshast] (secondary) send_thread for loop
Jun 12 12:49:33 fw01B hastd: [zfshast] (secondary) HIO_WRITE
Jun 12 12:49:33 fw01B hastd: [zfshast] (secondary) wakup

repeated forever.

8.1:

Jun 12 14:07:18 sv01B hastd: [zfshast] (init) We act as init for the resource and not as secondary as requested by tcp4://192.168.0.240:59254.
Jun 12 14:07:23 sv01B hastd: [zfshast] (init) We act as init for the resource and not as secondary as requested by tcp4://192.168.0.240:56349.
Jun 12 14:07:28 sv01B hastd: [zfshast] (secondary) recv_thread
Jun 12 14:07:28 sv01B hastd: [zfshast] (secondary) send_thread for loop
Jun 12 14:07:28 sv01B hastd: [zfshast] (secondary) wooooo
Jun 12 14:07:28 sv01B hastd: [zfshast] (secondary) HIO_WRITE
Jun 12 14:07:28 sv01B hastd: [zfshast] (secondary) wakeup
Jun 12 14:07:28 sv01B hastd: [zfshast] (secondary) wooooo
Jun 12 14:07:28 sv01B hastd: [zfshast] (secondary) send_thread for loop
Jun 12 14:07:33 sv01B hastd: [zfshast] (secondary) Unable to receive request header. : Socket is not connected.
Jun 12 14:07:33 sv01B hastd: [zfshast] (secondary) Worker process exited ungracefully (pid=757, exitcode=75).
Jun 12 14:07:33 sv01B hastd: [zfshast] (secondary) recv_thread Jun 12 14:07:33 sv01B hastd: [zfshast] (secondary) send_thread for loop Jun 12 14:07:33 sv01B hastd: [zfshast] (secondary) wooooo Jun 12 14:07:33 sv01B hastd: [zfshast] (secondary) HIO_WRITE Jun 12 14:07:33 sv01B hastd: [zfshast] (secondary) wakeup Jun 12 14:07:33 sv01B hastd: [zfshast] (secondary) wooooo Jun 12 14:07:33 sv01B hastd: [zfshast] (secondary) send_thread for loop Jun 12 14:07:38 sv01B hastd: [zfshast] (secondary) Unable to receive request header. : Socket is not connected. I hope this simple trace gives you some idea. Thanks Hiroshi From owner-freebsd-fs@FreeBSD.ORG Sat Jun 12 08:23:01 2010 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4B3E8106566B for ; Sat, 12 Jun 2010 08:23:01 +0000 (UTC) (envelope-from mm@FreeBSD.org) Received: from mail.vx.sk (core.vx.sk [188.40.32.143]) by mx1.freebsd.org (Postfix) with ESMTP id 8008A8FC1A for ; Sat, 12 Jun 2010 08:23:00 +0000 (UTC) Received: from core.vx.sk (localhost [127.0.0.1]) by mail.vx.sk (Postfix) with ESMTP id 70947162D7 for ; Sat, 12 Jun 2010 10:22:59 +0200 (CEST) X-Virus-Scanned: amavisd-new at mail.vx.sk Received: from mail.vx.sk ([127.0.0.1]) by core.vx.sk (mail.vx.sk [127.0.0.1]) (amavisd-new, port 10024) with LMTP id hkLYKna0c+rp for ; Sat, 12 Jun 2010 10:22:57 +0200 (CEST) Received: from [10.9.8.1] (chello089173000055.chello.sk [89.173.0.55]) by mail.vx.sk (Postfix) with ESMTPSA id D5519162C8 for ; Sat, 12 Jun 2010 10:22:56 +0200 (CEST) Message-ID: <4C1343E2.1000102@FreeBSD.org> Date: Sat, 12 Jun 2010 10:22:58 +0200 From: Martin Matuska User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; sk; rv:1.8.1.23) Gecko/20090812 Lightning/0.9 Thunderbird/2.0.0.23 Mnenhy/0.7.5.0 MIME-Version: 1.0 To: freebsd-fs@FreeBSD.org X-Enigmail-Version: 1.0.1 Content-Type: text/plain; charset=windows-1250 Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: Subject: ZFS vendor bugfix patches on my TODO-list X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 12 Jun 2010 08:23:01 -0000 Here is a list of not yet committed vendor ZFS patches on my todo list. All patches are from OpenSolaris and fix known bugs. delphij@ has already reviewed these; I am currently waiting for pjd's final words. If you run into any of these issues please try the corresponding patch. All 9 patches bundled: http://people.freebsd.org/~mm/patches/zfs/8.1/head-aggregated.patch Individual patches and problem descriptions: 1. http://people.freebsd.org/~mm/patches/zfs/8.1/head-8890.patch Synopsis: Unable to remove a file over NFS after hitting refquota limit Bug-ID: 6798878 Onnv revision: 8890:8c2bd5f17bf2 2. http://people.freebsd.org/~mm/patches/zfs/8.1/head-9409.patch Synopsis: zfs destroy fails to free object in open context, stops up txg train Bug-ID: 6809683 Onnv revision: 9409:9dc3f17354ed 3. http://people.freebsd.org/~mm/patches/zfs/8.1/head-9434.patch Synopsis: incomplete resilvering after disk replacement (raidz) Bug-ID: 6794570 Onnv revision: 9434:3bebded7c76a 4. http://people.freebsd.org/~mm/patches/8.1/head-9722.patch Synopsis: vdev_probe() starvation brings txg train to a screeching halt Bug-ID: 6844069 Onnv revision: 9722:e3866bad4e96 5.
http://people.freebsd.org/~mm/patches/zfs/8.1/head-9774.patch Synopsis: ZFS panic deadlock: cycle in blocking chain via zfs_zget Bug-ID: 6788152 Onnv revision: 9774:0bb234ab2287 6. http://people.freebsd.org/~mm/patches/zfs/8.1/head-9997.patch Synopsis: zpool resilver stalls with spa_scrub_thread in a 3 way deadlock Bug-ID: 6843235 Onnv revision: 9997:174d75a29a1c 7. http://people.freebsd.org/~mm/patches/zfs/8.1/head-10040.patch Synopsis: zfs panics on zpool import Bug-ID: 6857012 Onnv revision: 10040:38b25aeeaf7a 8. http://people.freebsd.org/~mm/patches/zfs/8.1/head-10295.patch Synopsis: panic in zfs_getsecattr Bug-ID: 6870564 Onnv revision: 10295:f7a18a1e9610 9. http://people.freebsd.org/~mm/patches/zfs/8.1/head-10839.patch Synopsis: arc_read_done may try to byteswap undefined data (sparc-related) Bug-ID: 6836714 Onnv revision: 10839:cf83b553a2ab Cheers, mm From owner-freebsd-fs@FreeBSD.ORG Sat Jun 12 10:47:50 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C7AAE1065679 for ; Sat, 12 Jun 2010 10:47:50 +0000 (UTC) (envelope-from pjd@garage.freebsd.pl) Received: from mail.garage.freebsd.pl (chello089077043238.chello.pl [89.77.43.238]) by mx1.freebsd.org (Postfix) with ESMTP id 0DACB8FC1D for ; Sat, 12 Jun 2010 10:47:48 +0000 (UTC) Received: by mail.garage.freebsd.pl (Postfix, from userid 65534) id 9C03845CBA; Sat, 12 Jun 2010 12:47:46 +0200 (CEST) Received: from localhost (gate.wheel.pl [10.0.0.1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.garage.freebsd.pl (Postfix) with ESMTP id 8F588456B1; Sat, 12 Jun 2010 12:47:40 +0200 (CEST) Date: Sat, 12 Jun 2010 12:47:32 +0200 From: Pawel Jakub Dawidek To: "hiroshi@soupacific.com" Message-ID: <20100612104336.GA2253@garage.freebsd.pl> References: <4BCD3979.8050107@soupacific.com> <4BCD5AD7.8070502@soupacific.com> <4BCFA4C2.6000109@soupacific.com> <4BCFB1C5.5000908@soupacific.com> <4BD01800.9040901@soupacific.com> <4BD0438B.5080308@soupacific.com> <4BD0E432.1000108@soupacific.com> <20100423061521.GC1670@garage.freebsd.pl> <4BD17B0D.5080601@soupacific.com> <4C10B526.4040908@soupacific.com> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="UHN/qo2QbUvPLonB" Content-Disposition: inline In-Reply-To: <4C10B526.4040908@soupacific.com> User-Agent: Mutt/1.4.2.3i X-PGP-Key-URL: http://people.freebsd.org/~pjd/pjd.asc X-OS: FreeBSD 9.0-CURRENT amd64 X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on mail.garage.freebsd.pl X-Spam-Level: X-Spam-Status: No, score=-5.9 required=4.5 tests=ALL_TRUSTED,BAYES_00, TO_ADDRESS_EQ_REAL autolearn=ham version=3.0.4 Cc: freebsd-fs@freebsd.org Subject: Re: FreeBSD 8.1 and HAST X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 12 Jun 2010 10:47:50 -0000

On Thu, Jun 10, 2010 at 06:49:26PM +0900, hiroshi@soupacific.com wrote:
> Thanks for supporting the timeout; it works great on 9.0.
> One of two servers shut down; then, rebooting only one server, it works as
> primary.
>
>
> And now I try to run HAST on FreeBSD 8.0.

Is this 8.0 or 8-STABLE?
> Exact same configuration but something wrong.
>
> On Primary server
> sv01A#hastctl crate zfshast
> sv01A#hastd
> sv01A#hastctl role primary zfshast
>
> On secondary
>
> sv01B#hastctl create zfshast
> sv01B#hastd
> sv01B#hastctl role secondary zfshast
>
> Then
> Secondary shows following
>
> Jun ..... [zfshast] (secondary) Unable to receive request header: socket
> is not connected.
> Jun...... [zfshast] (secondary) worker process exited
>
> I checked and found the proto_recv() function always returns socket is not
> connected.
>
> sv01A and sv01B look to be working, since before hastctl role secondary
> zfshast.
>
> hastd shows
> Jun.... sv01B hastd: [zfshast] (init) we act as init for the resource
> and not as secondary as requested by tcp4://192.168.0.240:56279

So is the resource configured as secondary or not? Could you stop hastd and start it manually with debug turned on? Don't forget to mark it as secondary. -- Pawel Jakub Dawidek http://www.wheelsystems.com pjd@FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! From owner-freebsd-fs@FreeBSD.ORG Sat Jun 12 11:43:34 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 990961065670; Sat, 12 Jun 2010 11:43:34 +0000 (UTC) (envelope-from hiroshi@soupacific.com) Received: from mail.soupacific.com (mail.soupacific.com [211.19.53.201]) by mx1.freebsd.org (Postfix) with ESMTP id 222F68FC19; Sat, 12 Jun 2010 11:43:33 +0000 (UTC) Received: from [127.0.0.1] (unknown [192.168.1.239]) by mail.soupacific.com (Postfix) with ESMTP id 98E316B702; Sat, 12 Jun 2010 11:34:56 +0000 (UTC) Message-ID: <4C1372E0.1000903@soupacific.com> Date: Sat, 12 Jun 2010 20:43:28 +0900 From: "hiroshi@soupacific.com" User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.9) Gecko/20100317 Thunderbird/3.0.4 MIME-Version: 1.0 To: Pawel Jakub Dawidek References: <4BCD3979.8050107@soupacific.com> <4BCD5AD7.8070502@soupacific.com> <4BCFA4C2.6000109@soupacific.com> <4BCFB1C5.5000908@soupacific.com> <4BD01800.9040901@soupacific.com> <4BD0438B.5080308@soupacific.com> <4BD0E432.1000108@soupacific.com> <20100423061521.GC1670@garage.freebsd.pl> <4BD17B0D.5080601@soupacific.com> <4C10B526.4040908@soupacific.com> <20100612104336.GA2253@garage.freebsd.pl> In-Reply-To: <20100612104336.GA2253@garage.freebsd.pl> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: FreeBSD 8.1 and HAST X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 12 Jun 2010 11:43:34 -0000 Thanks for your quick response! > Is this 8.0 or 8-STABLE? First I did it on 8.0-Release and have now csup'ed to 8.1-Prerelease. Both behave the same. hastd -dd log info: The following is the debug.log. Sorry it's a bit long! 9.0 current: Jun 12 18:53:52 fw01B hastd: Accepting connection to tcp4://0.0.0.0:8457.
Jun 12 18:53:52 fw01B hastd: tcp4://192.168.0.240:32772: resource=zfshast Jun 12 18:53:57 fw01B hastd: Accepting connection to tcp4://0.0.0.0:8457. Jun 12 18:53:57 fw01B hastd: tcp4://192.168.0.240:35907: resource=zfshast Jun 12 18:54:02 fw01B hastd: Accepting connection to tcp4://0.0.0.0:8457. Jun 12 18:54:02 fw01B hastd: tcp4://192.168.0.240:41046: resource=zfshast Jun 12 18:54:07 fw01B hastd: Accepting connection to tcp4://0.0.0.0:8457. Jun 12 18:54:07 fw01B hastd: tcp4://192.168.0.240:24170: resource=zfshast Jun 12 18:54:12 fw01B hastd: Accepting connection to tcp4://0.0.0.0:8457. Jun 12 18:54:12 fw01B hastd: tcp4://192.168.0.240:58260: resource=zfshast Jun 12 18:54:17 fw01B hastd: Accepting connection to tcp4://0.0.0.0:8457. Jun 12 18:54:17 fw01B hastd: tcp4://192.168.0.240:62353: resource=zfshast Jun 12 18:54:22 fw01B hastd: Accepting connection to tcp4://0.0.0.0:8457. Jun 12 18:54:22 fw01B hastd: tcp4://192.168.0.240:45572: resource=zfshast Jun 12 18:54:27 fw01B hastd: Accepting connection to tcp4://0.0.0.0:8457. Jun 12 18:54:27 fw01B hastd: tcp4://192.168.0.240:40139: resource=zfshast Jun 12 18:54:27 fw01B hastd: [zfshast] (secondary) Initial connection from tcp4://192.168.0.240:40139. Jun 12 18:54:27 fw01B hastd: [zfshast] (secondary) Incoming connection from tcp4://192.168.0.240:40139 configured. Jun 12 18:54:27 fw01B hastd: Accepting connection to tcp4://0.0.0.0:8457. Jun 12 18:54:27 fw01B hastd: tcp4://192.168.0.240:16787: resource=zfshast Jun 12 18:54:27 fw01B hastd: [zfshast] (secondary) Outgoing connection to tcp4://192.168.0.240:16787 configured. Jun 12 18:54:27 fw01B hastd: [zfshast] (secondary) Obtained info about /dev/ad4p4. Jun 12 18:54:27 fw01B hastd: [zfshast] (secondary) Locked /dev/ad4p4. Jun 12 18:54:27 fw01B hastd: [zfshast] (secondary) disk: Taking request. Jun 12 18:54:27 fw01B hastd: [zfshast] (secondary) disk: No requests, waiting. Jun 12 18:54:27 fw01B hastd: [zfshast] (secondary) recv: Taking free request. Jun 12 18:54:27 fw01B hastd: [zfshast] (secondary) recv: (0x8014132e0) Got request. Jun 12 18:54:27 fw01B hastd: [zfshast] (secondary) send: Taking request. Jun 12 18:54:27 fw01B hastd: [zfshast] (secondary) send: No requests, waiting. Jun 12 18:54:27 fw01B hastd: [zfshast] (secondary) recv: (0x8014132e0) Got request header: WRITE(0, 131072). Jun 12 18:54:27 fw01B hastd: [zfshast] (secondary) recv: (0x8014132e0) Moving request to the disk queue. Jun 12 18:54:27 fw01B hastd: [zfshast] (secondary) recv: Taking free request. Jun 12 18:54:27 fw01B hastd: [zfshast] (secondary) recv: (0x801413290) Got request. Jun 12 18:54:27 fw01B hastd: [zfshast] (secondary) Local activemap cleared. Jun 12 18:54:27 fw01B hastd: [zfshast] (secondary) disk: (0x8014132e0) Got request: WRITE(0, 131072). Jun 12 18:54:27 fw01B hastd: [zfshast] (secondary) disk: (0x8014132e0) Moving request to the send queue. Jun 12 18:54:27 fw01B hastd: [zfshast] (secondary) disk: Taking request. Jun 12 18:54:27 fw01B hastd: [zfshast] (secondary) disk: No requests, waiting. Jun 12 18:54:27 fw01B hastd: [zfshast] (secondary) send: (0x8014132e0) Got request: WRITE(0, 131072). Jun 12 18:54:27 fw01B hastd: [zfshast] (secondary) disk: (0x8014132e0) Moving request to the free queue. Jun 12 18:54:27 fw01B hastd: [zfshast] (secondary) send: Taking request. Jun 12 18:54:27 fw01B hastd: [zfshast] (secondary) send: No requests, waiting. Jun 12 18:54:27 fw01B hastd: [zfshast] (secondary) recv: (0x801413290) Got request header: WRITE(131072, 131072). 
Jun 12 18:54:27 fw01B hastd: [zfshast] (secondary) recv: (0x801413290) Moving request to the disk queue. Jun 12 18:54:27 fw01B hastd: [zfshast] (secondary) recv: Taking free request. Jun 12 18:54:27 fw01B hastd: [zfshast] (secondary) recv: (0x801413240) Got request. Jun 12 18:54:27 fw01B hastd: [zfshast] (secondary) disk: (0x801413290) Got request: WRITE(131072, 131072). Jun 12 18:54:27 fw01B hastd: [zfshast] (secondary) disk: (0x801413290) Moving request to the send queue. Jun 12 18:54:27 fw01B hastd: [zfshast] (secondary) disk: Taking request. Jun 12 18:54:27 fw01B hastd: [zfshast] (secondary) disk: No requests, waiting. Jun 12 18:54:27 fw01B hastd: [zfshast] (secondary) send: (0x801413290) Got request: WRITE(131072, 131072). Jun 12 18:54:27 fw01B hastd: [zfshast] (secondary) disk: (0x801413290) Moving request to the free queue. Jun 12 18:54:27 fw01B hastd: [zfshast] (secondary) send: Taking request. Jun 12 18:54:27 fw01B hastd: [zfshast] (secondary) send: No requests, waiting. Jun 12 18:54:27 fw01B hastd: [zfshast] (secondary) recv: (0x801413240) Got request header: WRITE(262144, 131072). 8.1-Prerelease Jun 12 20:00:07 sv01B hastd: Accepting connection to tcp4://0.0.0.0:8457. Jun 12 20:00:07 sv01B hastd: tcp4://192.168.0.240:63762: resource=zfshast Jun 12 20:00:12 sv01B hastd: Accepting connection to tcp4://0.0.0.0:8457. Jun 12 20:00:12 sv01B hastd: tcp4://192.168.0.240:22890: resource=zfshast Jun 12 20:00:17 sv01B hastd: Accepting connection to tcp4://0.0.0.0:8457. Jun 12 20:00:17 sv01B hastd: tcp4://192.168.0.240:36449: resource=zfshast Jun 12 20:00:17 sv01B hastd: [zfshast] (secondary) Initial connection from tcp4://192.168.0.240:36449. Jun 12 20:00:17 sv01B hastd: [zfshast] (secondary) Incoming connection from tcp4://192.168.0.240:36449 configured. Jun 12 20:00:17 sv01B hastd: Accepting connection to tcp4://0.0.0.0:8457. Jun 12 20:00:17 sv01B hastd: tcp4://192.168.0.240:39312: resource=zfshast Jun 12 20:00:17 sv01B hastd: [zfshast] (secondary) Outgoing connection to tcp4://192.168.0.240:39312 configured. Jun 12 20:00:17 sv01B hastd: [zfshast] (secondary) Obtained info about /dev/ad4p4. Jun 12 20:00:17 sv01B hastd: [zfshast] (secondary) Locked /dev/ad4p4. Jun 12 20:00:17 sv01B hastd: [zfshast] (secondary) recv: Taking free request. Jun 12 20:00:17 sv01B hastd: [zfshast] (secondary) disk: Taking request. Jun 12 20:00:17 sv01B hastd: [zfshast] (secondary) disk: No requests, waiting. Jun 12 20:00:17 sv01B hastd: [zfshast] (secondary) recv: (0x8011f52e0) Got request. Jun 12 20:00:17 sv01B hastd: [zfshast] (secondary) send: Taking request. Jun 12 20:00:17 sv01B hastd: [zfshast] (secondary) send: No requests, waiting. Jun 12 20:00:17 sv01B hastd: [zfshast] (secondary) recv: (0x8011f52e0) Got request header: WRITE(0, 131072). Jun 12 20:00:17 sv01B hastd: [zfshast] (secondary) recv: (0x8011f52e0) Moving request to the disk queue. Jun 12 20:00:17 sv01B hastd: [zfshast] (secondary) recv: Taking free request. Jun 12 20:00:17 sv01B hastd: [zfshast] (secondary) recv: (0x8011f5290) Got request. Jun 12 20:00:17 sv01B hastd: [zfshast] (secondary) Local activemap cleared. Jun 12 20:00:17 sv01B hastd: [zfshast] (secondary) disk: (0x8011f52e0) Got request: WRITE(0, 131072). Jun 12 20:00:17 sv01B hastd: [zfshast] (secondary) disk: (0x8011f52e0) Moving request to the send queue. Jun 12 20:00:17 sv01B hastd: [zfshast] (secondary) disk: Taking request. Jun 12 20:00:17 sv01B hastd: [zfshast] (secondary) disk: No requests, waiting. 
Jun 12 20:00:17 sv01B hastd: [zfshast] (secondary) send: (0x8011f52e0) Got request: WRITE(0, 131072). Jun 12 20:00:17 sv01B hastd: [zfshast] (secondary) disk: (0x8011f52e0) Moving request to the free queue. Jun 12 20:00:17 sv01B hastd: [zfshast] (secondary) send: Taking request. Jun 12 20:00:17 sv01B hastd: [zfshast] (secondary) send: No requests, waiting. Jun 12 20:00:22 sv01B hastd: Accepting connection to tcp4://0.0.0.0:8457. Jun 12 20:00:22 sv01B hastd: tcp4://192.168.0.240:55777: resource=zfshast Jun 12 20:00:22 sv01B hastd: [zfshast] (secondary) Initial connection from tcp4://192.168.0.240:55777. Jun 12 20:00:22 sv01B hastd: [zfshast] (secondary) Worker process exists (pid=768), stopping it. Jun 12 20:00:22 sv01B hastd: [zfshast] (secondary) Incoming connection from tcp4://192.168.0.240:55777 configured. Jun 12 20:00:22 sv01B hastd: Accepting connection to tcp4://0.0.0.0:8457. Jun 12 20:00:22 sv01B hastd: tcp4://192.168.0.240:38559: resource=zfshast

> So is the resource configured as secondary or not?
> Could you stop hastd and start it manually with debug turned on?
> Don't forget to mark it as secondary.

Yes, I did it as secondary manually. Thanks Hiroshi From owner-freebsd-fs@FreeBSD.ORG Sat Jun 12 12:03:52 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8EDF61065674; Sat, 12 Jun 2010 12:03:52 +0000 (UTC) (envelope-from jhellenthal@gmail.com) Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com [209.85.214.182]) by mx1.freebsd.org (Postfix) with ESMTP id EBC0F8FC12; Sat, 12 Jun 2010 12:03:51 +0000 (UTC) Received: by iwn7 with SMTP id 7so2757516iwn.13 for ; Sat, 12 Jun 2010 05:03:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:sender:message-id:date:from :user-agent:mime-version:to:cc:subject:references:in-reply-to :x-enigmail-version:openpgp:content-type:content-transfer-encoding; bh=9ZatU3tWlk1pC8R6lFZNJj8e/M82Zd+KnVRDq22mb5o=; b=tvPziFEhqbhjsVaTP7FGqXQVlR3JxArtPeT4FJzjY8VMKBFzdBB/qsoqvh5G9rf87D kqw8p/5mtq6P1Sfn0AYcYCbpy40FlePF0JbCDq5BCvxRKCPLW+sW2K9K4Gv0oXqJK/br X/8ra6nJovZLYTDuP6aapmXcaMOOWO8+A0HMg= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:x-enigmail-version:openpgp:content-type :content-transfer-encoding; b=L5iZBofDC7KuP48FPFxwDTsuiqUTEzAjKXyj2uoqOZIK1iySp2EYePF9bWglnjbA7h knX876JGi61jG+RjrjAByoTzweAXCgIl4/gAvXBf+z+Wy8h3b7wB7bnb+aigTGf9Lbl8 HwZdyC//LPQOr8ETOFyAMD1BwC14b8xoTwuMo= Received: by 10.231.139.21 with SMTP id c21mr3176762ibu.160.1276344230877; Sat, 12 Jun 2010 05:03:50 -0700 (PDT) Received: from centel.dataix.local (adsl-99-181-128-180.dsl.klmzmi.sbcglobal.net [99.181.128.180]) by mx.google.com with ESMTPS id b3sm10149581ibf.13.2010.06.12.05.03.48 (version=SSLv3 cipher=RC4-MD5); Sat, 12 Jun 2010 05:03:49 -0700 (PDT) Sender: "J.
Hellenthal" Message-ID: <4C1377A3.2040408@dataix.net> Date: Sat, 12 Jun 2010 08:03:47 -0400 From: jhell User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.9.1.9) Gecko/20100515 Thunderbird MIME-Version: 1.0 To: Ivan Voras References: <20100610162629.38992mazf0sfdqg0@webmail.leidinger.net> <20100611172033.42001s90ahe57oe8@webmail.leidinger.net> In-Reply-To: X-Enigmail-Version: 1.0.1 OpenPGP: id=89D8547E Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org, Alexander Leidinger Subject: Re: CFT: periodic scrubbing of ZFS pools X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 12 Jun 2010 12:03:52 -0000

On 06/11/2010 12:11, Ivan Voras wrote:
> On 11 June 2010 17:20, Alexander Leidinger wrote:
>> Quoting Ivan Voras (from Fri, 11 Jun 2010 14:04:24
>> +0200):
>
>>> Fairly good and useful, but could you add a small check of "zpool
>>> status" information before scrubbing that would a) complain LOUDLY AND
>>> VISIBLY if a previous scrub failed and b) skip issuing a new scrub
>>> command if there is such an error, to avoid stressing possibly broken
>>> hardware?
>>
>> Can you please provide an example of such a failed scrub?
>
> You should probably treat any status message that doesn't have "none
> requested" or "scrub completed with 0 errors..." as failed.

I disagree with this, as it conflicts with your previous request:

none requested = no error, and the next scrub should be allowed
scrub completed with 0 errors = no errors

Why shouldn't the next scrub that is being determined in the script take place if there are no errors? I would only see doing this if you wanted the scrub to be performed once and never again thereafter.

On the other hand, I do agree that if the scrub status has any form of [fF][aA][iI][lL] in it then the scrub should not be performed, and likewise if one of the devices is [fF][aA][uU][lL][tT][eE][dD] or in some other such state.
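A minimal sketch of such a guard, assuming the status wording discussed above (the exact zpool status strings vary between ZFS versions, so the patterns are illustrative only, and the pool name "exports" is simply the one from the earlier example):

#!/bin/sh
# Sketch: complain loudly and skip the scrub when the pool status looks bad.
pool=${1:-exports}
status=$(zpool status "$pool" 2>&1)
case "$status" in
*[fF][aA][iI][lL]*|*FAULTED*|*DEGRADED*)
	# a) complain LOUDLY AND VISIBLY ...
	echo "WARNING: skipping scrub of $pool, status reports a problem:" >&2
	echo "$status" >&2
	# b) ... and skip issuing a new scrub command
	exit 1
	;;
esac
zpool scrub "$pool"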
Regards, -- jhell From owner-freebsd-fs@FreeBSD.ORG Sat Jun 12 12:17:22 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 988371065676 for ; Sat, 12 Jun 2010 12:17:22 +0000 (UTC) (envelope-from jhellenthal@gmail.com) Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com [209.85.214.182]) by mx1.freebsd.org (Postfix) with ESMTP id 52CFF8FC1C for ; Sat, 12 Jun 2010 12:17:22 +0000 (UTC) Received: by iwn7 with SMTP id 7so2768066iwn.13 for ; Sat, 12 Jun 2010 05:17:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:sender:message-id:date:from :user-agent:mime-version:to:cc:subject:references:in-reply-to :x-enigmail-version:openpgp:content-type:content-transfer-encoding; bh=xhE12Tler9mfTf7hv2jzvQaIo7S/bN9OXWeTKiGwpZc=; b=cAaZ1TWSIn6Oj48QufzHobzbmKiX08msW40WQc6nr4HfAgKisP7wnbI7TW49rkAXqT GsF3kebYuq1krZlBZ1gLSP5ovTcC6XrgCkhwGiXNDuKNCWnF1Q+IE5gsMNrvMacR/uCW 2RQcYrcIC4tteRFVnaRLldizsUfKBC6B7kzRk= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:x-enigmail-version:openpgp:content-type :content-transfer-encoding; b=bYd7Xa2NbvM8QtW7qwA8cyKYiI36xtFzRQ1Z6RBNDRohpp2GlsrvQ8COyhbkB+wCHK a9lFe8Ew8f7Kfhwf3/sU1H0HZ9CuSh13noccWVxsgUkvPh5oZB7zIORp8AxODGhyrUqa mG0qiUG+j5jQWMfS9bZ0TReEW9RooiLW7Lhd4= Received: by 10.231.168.129 with SMTP id u1mr3302111iby.49.1276345040957; Sat, 12 Jun 2010 05:17:20 -0700 (PDT) Received: from centel.dataix.local (adsl-99-181-128-180.dsl.klmzmi.sbcglobal.net [99.181.128.180]) by mx.google.com with ESMTPS id a8sm10195975ibi.11.2010.06.12.05.17.18 (version=SSLv3 cipher=RC4-MD5); Sat, 12 Jun 2010 05:17:19 -0700 (PDT) Sender: "J. Hellenthal" Message-ID: <4C137ACE.9080900@dataix.net> Date: Sat, 12 Jun 2010 08:17:18 -0400 From: jhell User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.9.1.9) Gecko/20100515 Thunderbird MIME-Version: 1.0 To: Alexander Leidinger References: <20100609162627.11355zjzwnf7nj8k@webmail.leidinger.net> <4C0FAE2A.7050103@dataix.net> <4C0FB1DE.9080508@dataix.net> <20100610115324.10161biomkjndvy8@webmail.leidinger.net> <20100610173825.164930ekkryr5tes@webmail.leidinger.net> <4C1138D0.7070901@dataix.net> <20100611104219.51344ag1ah7br4kk@webmail.leidinger.net> In-Reply-To: <20100611104219.51344ag1ah7br4kk@webmail.leidinger.net> X-Enigmail-Version: 1.0.1 OpenPGP: id=89D8547E Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: fs@freebsd.org Subject: Re: Do we want a periodic script for a zfs scrub? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 12 Jun 2010 12:17:22 -0000

On 06/11/2010 04:42, Alexander Leidinger wrote:
: #!/bin/sh
:
: lastscrub=$(zpool history exports |grep scrub |tail -1 |cut -f1 -d.)
: todayjul=$(date -j -f "%Y-%m-%d" "+%j" $(date "+%Y-%m-%d"))
: scrubjul=$(date -j -f "%Y-%m-%d" "+%j" $lastscrub)
:
: echo $lastscrub Last Scrub From zpool history
: echo $todayjul Today converted to julian
: echo $scrubjul Last scrub converted to julian
:
: expired=$(($todayjul-$scrubjul))
>
> Apart from the fact that we can do this with one $(( ))... what happens
> if/when time_t is extended to 64 bits on 32 bit platforms? Can we get
> into trouble with the shell-arithmetic or not?
> It depends upon the bit-size of the shell integers, and the signedness
> of them. Jilles (our shell maintainer) suggested also to use the
> seconds since epoch and I asked him the same question. I'm waiting for
> an answer from him.

I do not think this would be a problem for the script, as the script relies on date for the conversion, except for the subtraction that is taking place. If there were a problem then I believe it would have to be corrected in date(1) & possibly sh(1); I could be wrong though.

> The same concerns apply to test(1) (or the corresponding builtin) in the
> solution of Artem.

I agree.

> By calculating with days everywhere (like in my solution), I'm sure that
> it takes longer to hit a wall than by calculating with seconds since
> epoch (which can cause a problem in 2038 or during a transition when
> this problem is tackled in time_t but not here, which is not that far
> away). The off-by-one day once every 4 years shouldn't be a problem. If
> someone can assure with some nice facts, that using the seconds since
> epoch will not cause problems in the described cases, I have no problem
> to switch to use them.

I agree with this; please see the corrected example above. Another situation that came to mind, and is certainly possible, is that the system time could drop back to before the last scrub took place, causing (using the above script for example) todayjul to be less than scrubjul; that would yield a negative integer and skew the results until the system time was restored to its correct & current date & time.

if [ $todayjul -gt $scrubjul ]; then
	expired=$(($todayjul-$scrubjul))
else
	expired=$(($scrubjul-$todayjul))
fi

-- jhell From owner-freebsd-fs@FreeBSD.ORG Sat Jun 12 13:28:19 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 084E11065673; Sat, 12 Jun 2010 13:28:19 +0000 (UTC) (envelope-from jhellenthal@gmail.com) Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com [209.85.214.182]) by mx1.freebsd.org (Postfix) with ESMTP id 7433F8FC15; Sat, 12 Jun 2010 13:28:18 +0000 (UTC) Received: by iwn7 with SMTP id 7so2826055iwn.13 for ; Sat, 12 Jun 2010 06:28:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:sender:message-id:date:from :user-agent:mime-version:to:cc:subject:references:in-reply-to :x-enigmail-version:openpgp:content-type:content-transfer-encoding; bh=BxKOviUiOtBJ1fd1sfklvc+5zpmrN2xLQkgXacW2qEQ=; b=xsyMK3DcAbEZ6dfyLRn58aX8aZ18TD7ciJzoEml+yQmrin5DVre6l5tCItPAGGeWGR SQkqYZ4iVWzBC5HC+WJwL1Xmo2uNhTd8luqfCN5m741coVvW5WQkmP7SN9k0DWmkLYty m06V+EdeKoxCadd/kthpp2lIljlhctYwT5Dc0= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:x-enigmail-version:openpgp:content-type :content-transfer-encoding; b=M0ZTOQShAB9LRiLk3btAJjGGONJEPoB8pujjW6GUzJ0niFGEznCKXGE3MyoioiCLFk I7AexSz5sQ4YykfbvOUZaaBZjqP1BcTnzdHiEXj+MSjX1sw9TZZSa+53z0cS3jMN9gc8 WQkFM6baQLzVJ9vgiJV7VG/b1WeGyPYoH4xa8= Received: by 10.231.190.132 with SMTP id di4mr3385990ibb.41.1276349297993; Sat, 12 Jun 2010 06:28:17 -0700 (PDT) Received: from centel.dataix.local (adsl-99-181-128-180.dsl.klmzmi.sbcglobal.net [99.181.128.180]) by mx.google.com with ESMTPS id t28sm10442362ibg.18.2010.06.12.06.28.16 (version=SSLv3 cipher=RC4-MD5); Sat, 12 Jun 2010 06:28:16 -0700
(PDT) Sender: "J. Hellenthal" Message-ID: <4C138B6F.3060107@dataix.net> Date: Sat, 12 Jun 2010 09:28:15 -0400 From: jhell User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.9.1.9) Gecko/20100515 Thunderbird MIME-Version: 1.0 To: "hiroshi@soupacific.com" References: <20100416065126.GG1705@garage.freebsd.pl> <4BCD3979.8050107@soupacific.com> <4BCD5AD7.8070502@soupacific.com> <4BCFA4C2.6000109@soupacific.com> <4BCFB1C5.5000908@soupacific.com> <4BD01800.9040901@soupacific.com> <4BD0438B.5080308@soupacific.com> <4BD0E432.1000108@soupacific.com> <20100423061521.GC1670@garage.freebsd.pl> <4BD17B0D.5080601@soupacific.com> <4C10B526.4040908@soupacific.com> <4C122435.4020409@soupacific.com> In-Reply-To: <4C122435.4020409@soupacific.com> X-Enigmail-Version: 1.0.1 OpenPGP: id=89D8547E Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek Subject: Re: FreeBSD 8.1 and HAST X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 12 Jun 2010 13:28:19 -0000

On 06/11/2010 07:55, hiroshi@soupacific.com wrote:
> On Primary server
> sv01A#hastctl crate zfshast

Just to be sure: are you sure the above command is intended to be "crate" and not "create"? I have seen you mention it twice now. -- jhell From owner-freebsd-fs@FreeBSD.ORG Sat Jun 12 13:35:50 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3A5C91065672; Sat, 12 Jun 2010 13:35:50 +0000 (UTC) (envelope-from hiroshi@soupacific.com) Received: from mail.soupacific.com (mail.soupacific.com [211.19.53.201]) by mx1.freebsd.org (Postfix) with ESMTP id 040438FC18; Sat, 12 Jun 2010 13:35:49 +0000 (UTC) Received: from [127.0.0.1] (unknown [192.168.1.239]) by mail.soupacific.com (Postfix) with ESMTP id A0A776B779; Sat, 12 Jun 2010 13:27:12 +0000 (UTC) Message-ID: <4C138D30.30302@soupacific.com> Date: Sat, 12 Jun 2010 22:35:44 +0900 From: "hiroshi@soupacific.com" User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.9) Gecko/20100317 Thunderbird/3.0.4 MIME-Version: 1.0 To: jhell References: <20100416065126.GG1705@garage.freebsd.pl> <4BCD3979.8050107@soupacific.com> <4BCD5AD7.8070502@soupacific.com> <4BCFA4C2.6000109@soupacific.com> <4BCFB1C5.5000908@soupacific.com> <4BD01800.9040901@soupacific.com> <4BD0438B.5080308@soupacific.com> <4BD0E432.1000108@soupacific.com> <20100423061521.GC1670@garage.freebsd.pl> <4BD17B0D.5080601@soupacific.com> <4C10B526.4040908@soupacific.com> <4C122435.4020409@soupacific.com> <4C138B6F.3060107@dataix.net> In-Reply-To: <4C138B6F.3060107@dataix.net> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek Subject: Re: FreeBSD 8.1 and HAST X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 12 Jun 2010 13:35:50 -0000

On 6/12/2010 10:28 PM, jhell wrote:
> On 06/11/2010 07:55, hiroshi@soupacific.com wrote:
>> On Primary server
>> sv01A#hastctl crate zfshast
>
> Just to be sure: are you sure the above command is intended to be
> "crate" and not "create"?
>
> I have seen you mention it twice now.

Sorry, those are mistypes! Sure, create!
Thanks Hiroshi From owner-freebsd-fs@FreeBSD.ORG Sat Jun 12 14:23:28 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4AF7B1065678 for ; Sat, 12 Jun 2010 14:23:28 +0000 (UTC) (envelope-from pjd@garage.freebsd.pl) Received: from mail.garage.freebsd.pl (chello089077043238.chello.pl [89.77.43.238]) by mx1.freebsd.org (Postfix) with ESMTP id 851E88FC18 for ; Sat, 12 Jun 2010 14:23:27 +0000 (UTC) Received: by mail.garage.freebsd.pl (Postfix, from userid 65534) id 8872345E48; Sat, 12 Jun 2010 16:23:25 +0200 (CEST) Received: from localhost (gate.wheel.pl [10.0.0.1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.garage.freebsd.pl (Postfix) with ESMTP id B010745CDC; Sat, 12 Jun 2010 16:23:20 +0200 (CEST) Date: Sat, 12 Jun 2010 16:23:11 +0200 From: Pawel Jakub Dawidek To: "hiroshi@soupacific.com" Message-ID: <20100612142311.GF2253@garage.freebsd.pl> References: <4BCFA4C2.6000109@soupacific.com> <4BCFB1C5.5000908@soupacific.com> <4BD01800.9040901@soupacific.com> <4BD0438B.5080308@soupacific.com> <4BD0E432.1000108@soupacific.com> <20100423061521.GC1670@garage.freebsd.pl> <4BD17B0D.5080601@soupacific.com> <4C10B526.4040908@soupacific.com> <20100612104336.GA2253@garage.freebsd.pl> <4C1372E0.1000903@soupacific.com> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="gTtJ75FAzB1T2CN6" Content-Disposition: inline In-Reply-To: <4C1372E0.1000903@soupacific.com> User-Agent: Mutt/1.4.2.3i X-PGP-Key-URL: http://people.freebsd.org/~pjd/pjd.asc X-OS: FreeBSD 9.0-CURRENT amd64 X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on mail.garage.freebsd.pl X-Spam-Level: X-Spam-Status: No, score=-5.9 required=4.5 tests=ALL_TRUSTED,BAYES_00, TO_ADDRESS_EQ_REAL autolearn=ham version=3.0.4 Cc: freebsd-fs@freebsd.org Subject: Re: FreeBSD 8.1 and HAST X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 12 Jun 2010 14:23:28 -0000

On Sat, Jun 12, 2010 at 08:43:28PM +0900, hiroshi@soupacific.com wrote:
> > Is this 8.0 or 8-STABLE?
> First I did it on 8.0-Release and have now csup'ed to 8.1-Prerelease.
> Both behave the same.
>
> hastd -dd log info:
> The following is the debug.log. Sorry it's a bit long!
[...]

Could you send the debug output from the whole session, including when the problem appears? I don't see those "socket is not connected" errors in the output you sent. If possible, could you turn off line wrapping, or maybe send the debug output as an attachment? -- Pawel Jakub Dawidek http://www.wheelsystems.com pjd@FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
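One way to capture such a full, unwrapped debug session on the secondary is sketched below; it assumes the -F (foreground) and -d (debug, may be given twice) flags of hastd(8), and the log path is an arbitrary choice:

# Stop the supervised daemon, then rerun hastd in the foreground with
# extra debugging so the output is neither wrapped nor filtered by syslog.
/etc/rc.d/hastd stop                 # or kill the running hastd
hastd -F -dd > /tmp/hastd-debug.log 2>&1 &
hastctl role secondary zfshast       # don't forget to mark it as secondary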
From owner-freebsd-fs@FreeBSD.ORG Sat Jun 12 14:54:27 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BB9171065674; Sat, 12 Jun 2010 14:54:27 +0000 (UTC) (envelope-from hiroshi@soupacific.com) Received: from mail.soupacific.com (mail.soupacific.com [211.19.53.201]) by mx1.freebsd.org (Postfix) with ESMTP id 05A788FC20; Sat, 12 Jun 2010 14:54:26 +0000 (UTC) Received: from [127.0.0.1] (unknown [192.168.1.239]) by mail.soupacific.com (Postfix) with ESMTP id 5450C6B7B6; Sat, 12 Jun 2010 14:45:49 +0000 (UTC) Message-ID: <4C139F9C.2090305@soupacific.com> Date: Sat, 12 Jun 2010 23:54:20 +0900 From: "hiroshi@soupacific.com" User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.9) Gecko/20100317 Thunderbird/3.0.4 MIME-Version: 1.0 To: Pawel Jakub Dawidek References: <4BCFA4C2.6000109@soupacific.com> <4BCFB1C5.5000908@soupacific.com> <4BD01800.9040901@soupacific.com> <4BD0438B.5080308@soupacific.com> <4BD0E432.1000108@soupacific.com> <20100423061521.GC1670@garage.freebsd.pl> <4BD17B0D.5080601@soupacific.com> <4C10B526.4040908@soupacific.com> <20100612104336.GA2253@garage.freebsd.pl> <4C1372E0.1000903@soupacific.com> <20100612142311.GF2253@garage.freebsd.pl> In-Reply-To: <20100612142311.GF2253@garage.freebsd.pl> Content-Type: multipart/mixed; boundary="------------040800000505090801010307" Cc: freebsd-fs@freebsd.org Subject: Re: FreeBSD 8.1 and HAST X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 12 Jun 2010 14:54:27 -0000

> Could you send the debug output from the whole session, including when
> the problem appears? I don't see those "socket is not connected"
> errors in the output you sent.

The "socket is not connected" errors are in the messages log.

> If possible, could you turn off line wrapping, or maybe send the
> debug output as an attachment?

I attach the debug.log and messages files here. Thanks Hiroshi

Content-Disposition: attachment; filename="messages"

Jun 12 23:37:33 sv01B newsyslog[433]: logfile first created Jun 12 23:37:33 sv01B syslogd: kernel boot file is /boot/kernel/kernel Jun 12 23:37:33 sv01B kernel: Copyright (c) 1992-2010 The FreeBSD Project. Jun 12 23:37:33 sv01B kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 Jun 12 23:37:33 sv01B kernel: The Regents of the University of California. All rights reserved. Jun 12 23:37:33 sv01B kernel: FreeBSD is a registered trademark of The FreeBSD Foundation.
Jun 12 23:37:33 sv01B kernel: FreeBSD 8.1-PRERELEASE #3: Mon Jun 7 21:35:44 UTC 2010 Jun 12 23:37:33 sv01B kernel: root@sv01B:/usr/obj/usr/src/sys/GENERIC amd64 Jun 12 23:37:33 sv01B kernel: Timecounter "i8254" frequency 1193182 Hz quality 0 Jun 12 23:37:33 sv01B kernel: CPU: AMD Athlon(tm) II X3 440 Processor (3000.14-MHz K8-class CPU) Jun 12 23:37:33 sv01B kernel: Origin = "AuthenticAMD" Id = 0x100f52 Family = 10 Model = 5 Stepping = 2 Jun 12 23:37:33 sv01B kernel: Features=0x178bfbff Jun 12 23:37:33 sv01B kernel: Features2=0x802009 Jun 12 23:37:33 sv01B kernel: AMD Features=0xee500800 Jun 12 23:37:33 sv01B kernel: AMD Features2=0x37ff Jun 12 23:37:33 sv01B kernel: TSC: P-state invariant Jun 12 23:37:33 sv01B kernel: real memory = 4294967296 (4096 MB) Jun 12 23:37:33 sv01B kernel: avail memory = 3845677056 (3667 MB) Jun 12 23:37:33 sv01B kernel: ACPI APIC Table: <7623MS A7623200> Jun 12 23:37:33 sv01B kernel: FreeBSD/SMP: Multiprocessor System Detected: 3 CPUs Jun 12 23:37:33 sv01B kernel: FreeBSD/SMP: 1 package(s) x 3 core(s) Jun 12 23:37:33 sv01B kernel: cpu0 (BSP): APIC ID: 0 Jun 12 23:37:33 sv01B kernel: cpu1 (AP): APIC ID: 1 Jun 12 23:37:33 sv01B kernel: cpu2 (AP): APIC ID: 2 Jun 12 23:37:33 sv01B kernel: ACPI Warning: Optional field Pm2ControlBlock has zero address or length: 0x 0 0/0x1 (20100331/tbfadt-655) Jun 12 23:37:33 sv01B kernel: ioapic0 irqs 0-23 on motherboard Jun 12 23:37:33 sv01B kernel: kbd1 at kbdmux0 Jun 12 23:37:33 sv01B kernel: acpi0: <7623MS A7623200> on motherboard Jun 12 23:37:33 sv01B kernel: acpi0: [ITHREAD] Jun 12 23:37:33 sv01B kernel: acpi0: Power Button (fixed) Jun 12 23:37:33 sv01B kernel: acpi0: reservation of fee00000, 1000 (3) failed Jun 12 23:37:33 sv01B kernel: acpi0: reservation of ffb80000, 80000 (3) failed Jun 12 23:37:33 sv01B kernel: acpi0: reservation of fec10000, 20 (3) failed Jun 12 23:37:33 sv01B kernel: acpi0: reservation of 0, a0000 (3) failed Jun 12 23:37:33 sv01B kernel: acpi0: reservation of 100000, cff00000 (3) failed Jun 12 23:37:33 sv01B kernel: ACPI HPET table warning: Sequence is non-zero (2) Jun 12 23:37:33 sv01B kernel: Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000 Jun 12 23:37:33 sv01B kernel: acpi_timer0: <32-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0 Jun 12 23:37:33 sv01B kernel: cpu0: on acpi0 Jun 12 23:37:33 sv01B kernel: cpu1: on acpi0 Jun 12 23:37:33 sv01B kernel: cpu2: on acpi0 Jun 12 23:37:33 sv01B kernel: acpi_hpet0: iomem 0xfed00000-0xfed003ff on acpi0 Jun 12 23:37:33 sv01B kernel: Timecounter "HPET" frequency 14318180 Hz quality 900 Jun 12 23:37:33 sv01B kernel: pcib0: port 0xcf8-0xcff on acpi0 Jun 12 23:37:33 sv01B kernel: pci0: on pcib0 Jun 12 23:37:33 sv01B kernel: pcib1: at device 1.0 on pci0 Jun 12 23:37:33 sv01B kernel: pci1: on pcib1 Jun 12 23:37:33 sv01B kernel: vgapci0: port 0xd000-0xd0ff mem 0xd0000000-0xdfffffff,0xfeaf0000-0xfeafffff,0xfe900000-0xfe9fffff irq 18 at device 5.0 on pci1 Jun 12 23:37:33 sv01B kernel: pci1: at device 5.1 (no driver attached) Jun 12 23:37:33 sv01B kernel: pcib2: irq 17 at device 5.0 on pci0 Jun 12 23:37:33 sv01B kernel: pci2: on pcib2 Jun 12 23:37:33 sv01B kernel: alc0: port 0xe800-0xe87f mem 0xfebc0000-0xfebfffff irq 17 at device 0.0 on pci2 Jun 12 23:37:33 sv01B kernel: alc0: 15872 Tx FIFO, 15360 Rx FIFO Jun 12 23:37:33 sv01B kernel: alc0: Using 1 MSI message(s). 
Jun 12 23:37:33 sv01B kernel: miibus0: on alc0 Jun 12 23:37:33 sv01B kernel: atphy0: PHY 0 on miibus0 Jun 12 23:37:33 sv01B kernel: atphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT-FDX, auto Jun 12 23:37:33 sv01B kernel: alc0: Ethernet address: 40:61:86:cc:e9:34 Jun 12 23:37:33 sv01B kernel: alc0: [FILTER] Jun 12 23:37:33 sv01B kernel: atapci0: port 0xc000-0xc007,0xb000-0xb003,0xa000-0xa007,0x9000-0x9003,0x8000-0x800f mem 0xfe8ffc00-0xfe8fffff irq 22 at device 17.0 on pci0 Jun 12 23:37:33 sv01B kernel: atapci0: [ITHREAD] Jun 12 23:37:33 sv01B kernel: atapci0: AHCI v1.10 controller with 4 3Gbps ports, PM supported Jun 12 23:37:33 sv01B kernel: ata2: on atapci0 Jun 12 23:37:33 sv01B kernel: ata2: [ITHREAD] Jun 12 23:37:33 sv01B kernel: ata3: on atapci0 Jun 12 23:37:33 sv01B kernel: ata3: [ITHREAD] Jun 12 23:37:33 sv01B kernel: ata4: on atapci0 Jun 12 23:37:33 sv01B kernel: ata4: [ITHREAD] Jun 12 23:37:33 sv01B kernel: ata5: on atapci0 Jun 12 23:37:33 sv01B kernel: ata5: [ITHREAD] Jun 12 23:37:33 sv01B kernel: ohci0: mem 0xfe8fe000-0xfe8fefff irq 16 at device 18.0 on pci0 Jun 12 23:37:33 sv01B kernel: ohci0: [ITHREAD] Jun 12 23:37:33 sv01B kernel: usbus0: on ohci0 Jun 12 23:37:33 sv01B kernel: ohci1: mem 0xfe8fd000-0xfe8fdfff irq 16 at device 18.1 on pci0 Jun 12 23:37:33 sv01B kernel: ohci1: [ITHREAD] Jun 12 23:37:33 sv01B kernel: usbus1: on ohci1 Jun 12 23:37:33 sv01B kernel: ehci0: mem 0xfe8ff800-0xfe8ff8ff irq 17 at device 18.2 on pci0 Jun 12 23:37:33 sv01B kernel: ehci0: [ITHREAD] Jun 12 23:37:33 sv01B kernel: usbus2: EHCI version 1.0 Jun 12 23:37:33 sv01B kernel: usbus2: on ehci0 Jun 12 23:37:33 sv01B kernel: ohci2: mem 0xfe8fc000-0xfe8fcfff irq 18 at device 19.0 on pci0 Jun 12 23:37:33 sv01B kernel: ohci2: [ITHREAD] Jun 12 23:37:33 sv01B kernel: usbus3: on ohci2 Jun 12 23:37:33 sv01B kernel: ohci3: mem 0xfe8fb000-0xfe8fbfff irq 18 at device 19.1 on pci0 Jun 12 23:37:33 sv01B kernel: ohci3: [ITHREAD] Jun 12 23:37:33 sv01B kernel: usbus4: on ohci3 Jun 12 23:37:33 sv01B kernel: ehci1: mem 0xfe8ff400-0xfe8ff4ff irq 19 at device 19.2 on pci0 Jun 12 23:37:33 sv01B kernel: ehci1: [ITHREAD] Jun 12 23:37:33 sv01B kernel: usbus5: EHCI version 1.0 Jun 12 23:37:33 sv01B kernel: usbus5: on ehci1 Jun 12 23:37:33 sv01B kernel: pci0: at device 20.0 (no driver attached) Jun 12 23:37:33 sv01B kernel: atapci1: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xff00-0xff0f at device 20.1 on pci0 Jun 12 23:37:33 sv01B kernel: ata0: on atapci1 Jun 12 23:37:33 sv01B kernel: ata0: [ITHREAD] Jun 12 23:37:33 sv01B kernel: ata1: on atapci1 Jun 12 23:37:33 sv01B kernel: ata1: [ITHREAD] Jun 12 23:37:33 sv01B kernel: isab0: at device 20.3 on pci0 Jun 12 23:37:33 sv01B kernel: isa0: on isab0 Jun 12 23:37:33 sv01B kernel: pcib3: at device 20.4 on pci0 Jun 12 23:37:33 sv01B kernel: pci3: on pcib3 Jun 12 23:37:33 sv01B kernel: ohci4: mem 0xfe8fa000-0xfe8fafff irq 18 at device 20.5 on pci0 Jun 12 23:37:33 sv01B kernel: ohci4: [ITHREAD] Jun 12 23:37:33 sv01B kernel: usbus6: on ohci4 Jun 12 23:37:33 sv01B kernel: acpi_button0: on acpi0 Jun 12 23:37:33 sv01B kernel: atrtc0: port 0x70-0x71 irq 8 on acpi0 Jun 12 23:37:33 sv01B kernel: sc0: at flags 0x100 on isa0 Jun 12 23:37:33 sv01B kernel: sc0: VGA <16 virtual consoles, flags=0x300> Jun 12 23:37:33 sv01B kernel: vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 Jun 12 23:37:33 sv01B kernel: atkbdc0: at port 0x60,0x64 on isa0 Jun 12 23:37:33 sv01B kernel: atkbd0: irq 1 on atkbdc0 Jun 12 23:37:33 sv01B kernel: kbd0 at atkbd0 Jun 12 23:37:33 sv01B kernel: 
atkbd0: [GIANT-LOCKED] Jun 12 23:37:33 sv01B kernel: atkbd0: [ITHREAD] Jun 12 23:37:33 sv01B kernel: ppc0: cannot reserve I/O port range Jun 12 23:37:33 sv01B kernel: acpi_throttle0: on cpu0 Jun 12 23:37:33 sv01B kernel: hwpstate0: on cpu0 Jun 12 23:37:33 sv01B kernel: ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM is present; Jun 12 23:37:33 sv01B kernel: to enable, add "vfs.zfs.prefetch_disable=0" to /boot/loader.conf. Jun 12 23:37:33 sv01B kernel: ZFS filesystem version 3 Jun 12 23:37:33 sv01B kernel: ZFS storage pool version 14 Jun 12 23:37:33 sv01B kernel: Timecounters tick every 1.000 msec Jun 12 23:37:33 sv01B kernel: usbus0: 12Mbps Full Speed USB v1.0 Jun 12 23:37:33 sv01B kernel: usbus1: 12Mbps Full Speed USB v1.0 Jun 12 23:37:33 sv01B kernel: usbus2: 480Mbps High Speed USB v2.0 Jun 12 23:37:33 sv01B kernel: usbus3: 12Mbps Full Speed USB v1.0 Jun 12 23:37:33 sv01B kernel: usbus4: 12Mbps Full Speed USB v1.0 Jun 12 23:37:33 sv01B kernel: usbus5: 480Mbps High Speed USB v2.0 Jun 12 23:37:33 sv01B kernel: usbus6: 12Mbps Full Speed USB v1.0 Jun 12 23:37:33 sv01B kernel: ad4: 953869MB at ata2-master UDMA100 SATA 3Gb/s Jun 12 23:37:33 sv01B kernel: ugen0.1: at usbus0 Jun 12 23:37:33 sv01B kernel: uhub0: on usbus0 Jun 12 23:37:33 sv01B kernel: ugen1.1: at usbus1 Jun 12 23:37:33 sv01B kernel: uhub1: on usbus1 Jun 12 23:37:33 sv01B kernel: ugen2.1: at usbus2 Jun 12 23:37:33 sv01B kernel: uhub2: on usbus2 Jun 12 23:37:33 sv01B kernel: ugen3.1: at usbus3 Jun 12 23:37:33 sv01B kernel: uhub3: on usbus3 Jun 12 23:37:33 sv01B kernel: ugen4.1: at usbus4 Jun 12 23:37:33 sv01B kernel: uhub4: on usbus4 Jun 12 23:37:33 sv01B kernel: ugen5.1: at usbus5 Jun 12 23:37:33 sv01B kernel: uhub5: on usbus5 Jun 12 23:37:33 sv01B kernel: ugen6.1: at usbus6 Jun 12 23:37:33 sv01B kernel: uhub6: on usbus6 Jun 12 23:37:33 sv01B kernel: SMP: AP CPU #1 Launched! Jun 12 23:37:33 sv01B kernel: SMP: AP CPU #2 Launched! Jun 12 23:37:33 sv01B kernel: Root mount waiting for: usbus6 usbus5 usbus4 usbus3 usbus2 usbus1 usbus0 Jun 12 23:37:33 sv01B kernel: uhub6: 2 ports with 2 removable, self powered Jun 12 23:37:33 sv01B kernel: uhub1: 3 ports with 3 removable, self powered Jun 12 23:37:33 sv01B kernel: uhub0: 3 ports with 3 removable, self powered Jun 12 23:37:33 sv01B kernel: uhub3: 3 ports with 3 removable, self powered Jun 12 23:37:33 sv01B kernel: uhub4: 3 ports with 3 removable, self powered Jun 12 23:37:33 sv01B kernel: Root mount waiting for: usbus5 usbus2 Jun 12 23:37:33 sv01B kernel: Root mount waiting for: usbus5 usbus2 Jun 12 23:37:33 sv01B kernel: uhub2: 6 ports with 6 removable, self powered Jun 12 23:37:33 sv01B kernel: uhub5: 6 ports with 6 removable, self powered Jun 12 23:37:33 sv01B kernel: Trying to mount root from zfs:zsv01B/ROOT/zsv01B Jun 12 23:37:33 sv01B kernel: ugen1.2: at usbus1 Jun 12 23:37:33 sv01B kernel: ukbd0: on usbus1 Jun 12 23:37:33 sv01B kernel: kbd2 at ukbd0 Jun 12 23:37:35 sv01B kernel: alc0: link state changed to UP Jun 12 23:38:45 sv01B login: ROOT LOGIN (root) ON ttyv0 Jun 12 23:38:53 sv01B login: ROOT LOGIN (root) ON ttyv1 Jun 12 23:39:00 sv01B login: ROOT LOGIN (root) ON ttyv2 Jun 12 23:39:59 sv01B hastd: [zfshast] (init) We act as init for the resource and not as secondary as requested by tcp4://192.168.0.240:41687. Jun 12 23:40:04 sv01B hastd: [zfshast] (init) We act as init for the resource and not as secondary as requested by tcp4://192.168.0.240:49067. Jun 12 23:40:14 sv01B hastd: [zfshast] (secondary) Unable to receive request header. 
: Socket is not connected. Jun 12 23:40:14 sv01B hastd: [zfshast] (secondary) Worker process exited ungracefully (pid=763, exitcode=75). Jun 12 23:40:19 sv01B hastd: [zfshast] (secondary) Unable to receive request header. : Socket is not connected. Jun 12 23:40:19 sv01B hastd: [zfshast] (secondary) Worker process exited ungracefully (pid=765, exitcode=75). Jun 12 23:40:24 sv01B hastd: [zfshast] (secondary) Unable to receive request header. : Socket is not connected. Jun 12 23:40:24 sv01B hastd: [zfshast] (secondary) Worker process exited ungracefully (pid=767, exitcode=75). Jun 12 23:40:29 sv01B hastd: [zfshast] (secondary) Unable to receive request header. : Socket is not connected. Jun 12 23:40:29 sv01B hastd: [zfshast] (secondary) Worker process exited ungracefully (pid=769, exitcode=75). Jun 12 23:40:34 sv01B hastd: [zfshast] (secondary) Unable to receive request header. : Socket is not connected. Jun 12 23:40:34 sv01B hastd: [zfshast] (secondary) Worker process exited ungracefully (pid=771, exitcode=75). Jun 12 23:40:40 sv01B hastd: [zfshast] (secondary) Unable to receive request header. : Socket is not connected. Jun 12 23:40:40 sv01B hastd: [zfshast] (secondary) Worker process exited ungracefully (pid=772, exitcode=75). Jun 12 23:40:45 sv01B hastd: [zfshast] (secondary) Unable to receive request header. : Socket is not connected. Jun 12 23:40:45 sv01B hastd: [zfshast] (secondary) Worker process exited ungracefully (pid=773, exitcode=75). Jun 12 23:40:50 sv01B hastd: [zfshast] (secondary) Unable to receive request header. : Socket is not connected. Jun 12 23:40:50 sv01B hastd: [zfshast] (secondary) Worker process exited ungracefully (pid=774, exitcode=75). Jun 12 23:40:55 sv01B hastd: [zfshast] (secondary) Unable to receive request header. : Socket is not connected. Jun 12 23:40:55 sv01B hastd: [zfshast] (secondary) Worker process exited ungracefully (pid=775, exitcode=75).

Content-Disposition: attachment; filename="debug.log"

Jun 12 23:37:33 sv01B newsyslog[433]: logfile first created Jun 12 23:39:59 sv01B hastd: Accepting connection to tcp4://0.0.0.0:8457. Jun 12 23:39:59 sv01B hastd: tcp4://192.168.0.240:41687: resource=zfshast Jun 12 23:40:04 sv01B hastd: Accepting connection to tcp4://0.0.0.0:8457. Jun 12 23:40:04 sv01B hastd: tcp4://192.168.0.240:49067: resource=zfshast Jun 12 23:40:09 sv01B hastd: Accepting connection to tcp4://0.0.0.0:8457. Jun 12 23:40:09 sv01B hastd: tcp4://192.168.0.240:25069: resource=zfshast Jun 12 23:40:09 sv01B hastd: [zfshast] (secondary) Initial connection from tcp4://192.168.0.240:25069. Jun 12 23:40:09 sv01B hastd: [zfshast] (secondary) Incoming connection from tcp4://192.168.0.240:25069 configured. Jun 12 23:40:09 sv01B hastd: Accepting connection to tcp4://0.0.0.0:8457. Jun 12 23:40:09 sv01B hastd: tcp4://192.168.0.240:65280: resource=zfshast Jun 12 23:40:09 sv01B hastd: [zfshast] (secondary) Outgoing connection to tcp4://192.168.0.240:65280 configured. Jun 12 23:40:09 sv01B hastd: [zfshast] (secondary) Obtained info about /dev/ad4p4. Jun 12 23:40:09 sv01B hastd: [zfshast] (secondary) Locked /dev/ad4p4. Jun 12 23:40:09 sv01B hastd: [zfshast] (secondary) recv: Taking free request. Jun 12 23:40:09 sv01B hastd: [zfshast] (secondary) disk: Taking request. Jun 12 23:40:09 sv01B hastd: [zfshast] (secondary) disk: No requests, waiting. Jun 12 23:40:09 sv01B hastd: [zfshast] (secondary) recv: (0x8011f52e0) Got request.
Jun 12 23:40:09 sv01B hastd: [zfshast] (secondary) send: Taking request. Jun 12 23:40:09 sv01B hastd: [zfshast] (secondary) send: No requests, waiting. Jun 12 23:40:09 sv01B hastd: [zfshast] (secondary) recv: (0x8011f52e0) Got request header: WRITE(0, 131072). Jun 12 23:40:09 sv01B hastd: [zfshast] (secondary) recv: (0x8011f52e0) Moving request to the disk queue. Jun 12 23:40:09 sv01B hastd: [zfshast] (secondary) recv: Taking free request. Jun 12 23:40:09 sv01B hastd: [zfshast] (secondary) recv: (0x8011f5290) Got request. Jun 12 23:40:09 sv01B hastd: [zfshast] (secondary) Local activemap cleared. Jun 12 23:40:09 sv01B hastd: [zfshast] (secondary) disk: (0x8011f52e0) Got request: WRITE(0, 131072). Jun 12 23:40:09 sv01B hastd: [zfshast] (secondary) disk: (0x8011f52e0) Moving request to the send queue. Jun 12 23:40:09 sv01B hastd: [zfshast] (secondary) disk: Taking request. Jun 12 23:40:09 sv01B hastd: [zfshast] (secondary) disk: No requests, waiting. Jun 12 23:40:09 sv01B hastd: [zfshast] (secondary) send: (0x8011f52e0) Got request: WRITE(0, 131072). Jun 12 23:40:09 sv01B hastd: [zfshast] (secondary) disk: (0x8011f52e0) Moving request to the free queue. Jun 12 23:40:09 sv01B hastd: [zfshast] (secondary) send: Taking request. Jun 12 23:40:09 sv01B hastd: [zfshast] (secondary) send: No requests, waiting. Jun 12 23:40:14 sv01B hastd: Accepting connection to tcp4://0.0.0.0:8457. Jun 12 23:40:14 sv01B hastd: tcp4://192.168.0.240:32992: resource=zfshast Jun 12 23:40:14 sv01B hastd: [zfshast] (secondary) Initial connection from tcp4://192.168.0.240:32992. Jun 12 23:40:14 sv01B hastd: [zfshast] (secondary) Worker process exists (pid=763), stopping it. Jun 12 23:40:14 sv01B hastd: [zfshast] (secondary) Incoming connection from tcp4://192.168.0.240:32992 configured. Jun 12 23:40:14 sv01B hastd: Accepting connection to tcp4://0.0.0.0:8457. Jun 12 23:40:14 sv01B hastd: tcp4://192.168.0.240:26238: resource=zfshast Jun 12 23:40:14 sv01B hastd: [zfshast] (secondary) Outgoing connection to tcp4://192.168.0.240:26238 configured. Jun 12 23:40:14 sv01B hastd: [zfshast] (secondary) Obtained info about /dev/ad4p4. Jun 12 23:40:14 sv01B hastd: [zfshast] (secondary) Locked /dev/ad4p4. Jun 12 23:40:14 sv01B hastd: [zfshast] (secondary) recv: Taking free request. Jun 12 23:40:14 sv01B hastd: [zfshast] (secondary) disk: Taking request. Jun 12 23:40:14 sv01B hastd: [zfshast] (secondary) disk: No requests, waiting. Jun 12 23:40:14 sv01B hastd: [zfshast] (secondary) recv: (0x8011f32e0) Got request. Jun 12 23:40:14 sv01B hastd: [zfshast] (secondary) send: Taking request. Jun 12 23:40:14 sv01B hastd: [zfshast] (secondary) send: No requests, waiting. Jun 12 23:40:14 sv01B hastd: [zfshast] (secondary) recv: (0x8011f32e0) Got request header: WRITE(0, 131072). Jun 12 23:40:14 sv01B hastd: [zfshast] (secondary) recv: (0x8011f32e0) Moving request to the disk queue. Jun 12 23:40:14 sv01B hastd: [zfshast] (secondary) recv: Taking free request. Jun 12 23:40:14 sv01B hastd: [zfshast] (secondary) recv: (0x8011f3290) Got request. Jun 12 23:40:14 sv01B hastd: [zfshast] (secondary) Local activemap cleared. Jun 12 23:40:14 sv01B hastd: [zfshast] (secondary) disk: (0x8011f32e0) Got request: WRITE(0, 131072). Jun 12 23:40:14 sv01B hastd: [zfshast] (secondary) disk: (0x8011f32e0) Moving request to the send queue. Jun 12 23:40:14 sv01B hastd: [zfshast] (secondary) disk: Taking request. Jun 12 23:40:14 sv01B hastd: [zfshast] (secondary) disk: No requests, waiting. 
Jun 12 23:40:14 sv01B hastd: [zfshast] (secondary) send: (0x8011f32e0) Got request: WRITE(0, 131072). Jun 12 23:40:14 sv01B hastd: [zfshast] (secondary) disk: (0x8011f32e0) Moving request to the free queue. Jun 12 23:40:14 sv01B hastd: [zfshast] (secondary) send: Taking request. Jun 12 23:40:14 sv01B hastd: [zfshast] (secondary) send: No requests, waiting.
[... the same five-second cycle repeats from Jun 12 23:40:19 through Jun 12 23:41:35: hastd accepts a new connection from 192.168.0.240, stops the existing worker process (pids 765 through 786), configures the incoming and outgoing connections, obtains info about and locks /dev/ad4p4, and services exactly one WRITE(0, 131072) request; only the timestamps, remote port numbers, and worker pids differ between cycles ...]
--------------040800000505090801010307-- From owner-freebsd-fs@FreeBSD.ORG Sat Jun 12 19:15:57 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D0BC6106564A for ; Sat, 12 Jun 2010 19:15:57 +0000 (UTC) (envelope-from to.my.trociny@gmail.com) Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com [209.85.214.54]) by mx1.freebsd.org (Postfix) with ESMTP id 49AB08FC08 for ; Sat, 12 Jun 2010 19:15:56 +0000 (UTC) Received: by bwz2 with SMTP id 2so1500790bwz.13 for ; Sat, 12 Jun 2010 12:15:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:from:to:cc:subject:references :x-comment-to:date:in-reply-to:message-id:user-agent:mime-version :content-type; bh=cEljRfu8oPr58A2o+cBkmZlt4VTiCNhD+nzTUHonS+k=; b=ikySyJKP1B8GtY70iFJ57zU76GeVgWNrCMEpx93pd2SjEoU4GRJBwP+JEG3seDT9P+ DbtlJyCvjoD/bPFbwe7zTfBYblW1JSI0rDuBPUt7xpD/uwkaWWfndzre/84gNOlDpjG6 l0cM9Ldsjgm9kaL3lTT3P924pMK3bhmlrxdXU= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=from:to:cc:subject:references:x-comment-to:date:in-reply-to :message-id:user-agent:mime-version:content-type; b=siBhcADzwLnlSC1u4W/48qFm4CjM/XEgDBSAjhglxl610WnkHxr4ix+OOZdBxV6+7T 4AG/99UGC2CnI5fPfaBvVfvqwk6EnrVriDpYTz33v4jMezAKBHhjJ4fIz6vApwbDfvE0 524VckJiFlCcUUwxeJD4RvxM6QWpH1v7g6dk8= Received: by 10.204.74.2 with SMTP id s2mr2618085bkj.28.1276370154970; Sat, 12 Jun 2010 12:15:54 -0700 (PDT) Received: from localhost ([95.69.160.52]) by mx.google.com with ESMTPS id z20sm11017279bkx.9.2010.06.12.12.15.53 (version=TLSv1/SSLv3 cipher=RC4-MD5); Sat, 12 Jun 2010 12:15:54 -0700 (PDT) From: Mikolaj Golub To: Kostik Belousov References: <86mxv22ji7.fsf@zhuzha.ua1> <20100611191059.GF13238@deviant.kiev.zoral.com.ua> X-Comment-To: Kostik Belousov Date: Sat, 12 Jun 2010 22:15:52 +0300 In-Reply-To: <20100611191059.GF13238@deviant.kiev.zoral.com.ua> (Kostik Belousov's message of "Fri, 11 Jun 2010 22:10:59 +0300") Message-ID: <86mxv0cb9z.fsf@kopusha.home.net> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (berkeley-unix) MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Cc: freebsd-fs@freebsd.org Subject: Re: '#ifndef DIAGNOSTIC' in nfsclient code looks like a typo X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 12 Jun 2010 19:15:58 -0000 --=-=-= On Fri, 11 Jun 2010 22:10:59 +0300 Kostik Belousov wrote: KB> All the changes should be converted to the KASSERTs. There is no point KB> in doing KB> if (something) KB> panic(); KB> for diagnostic; use KB> KASSERT(something, (panic message)); Please look at the attached patch. 
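For readers skimming past the diff, every hunk makes the same transformation. Reconstructed from the first hunk of the attached patch (shown here only as an illustrative sketch, not as a new change):

    /* Before: the check exists only in some kernel builds, and with
     * the "#ifndef DIAGNOSTIC" typo this thread is about, the panic
     * was compiled into exactly the kernels that did not ask for
     * diagnostics. */
    #ifndef DIAGNOSTIC
            if (uiop->uio_iovcnt != 1)
                    panic("nfs: writerpc iovcnt > 1");
    #endif

    /* After: one line, no preprocessor wrapper needed. */
    KASSERT(uiop->uio_iovcnt == 1, ("nfs: writerpc iovcnt > 1"));

The KASSERT checks are controlled by "options INVARIANTS" rather than "options DIAGNOSTIC". Roughly (an approximation of the sys/systm.h definition of the era; the exact macro text varies between FreeBSD versions), the macro expands as:

    #ifdef INVARIANTS
    #define KASSERT(exp, msg) do {                                  \
            if (__predict_false(!(exp)))                            \
                    panic msg;                                      \
    } while (0)
    #else
    #define KASSERT(exp, msg) do {} while (0)
    #endif

so in a kernel built without INVARIANTS the asserted expression is not evaluated at all, which is why expressions moved into KASSERT must be free of side effects.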
-- Mikolaj Golub --=-=-= Content-Type: text/x-patch Content-Disposition: attachment; filename=nfs.KASSERT.patch Index: sys/nfsclient/nfs_vnops.c =================================================================== --- sys/nfsclient/nfs_vnops.c (revision 208960) +++ sys/nfsclient/nfs_vnops.c (working copy) @@ -1348,10 +1348,7 @@ int v3 = NFS_ISV3(vp), committed = NFSV3WRITE_FILESYNC; int wsize; -#ifndef DIAGNOSTIC - if (uiop->uio_iovcnt != 1) - panic("nfs: writerpc iovcnt > 1"); -#endif + KASSERT(uiop->uio_iovcnt == 1, ("nfs: writerpc iovcnt > 1")); *must_commit = 0; tsiz = uiop->uio_resid; mtx_lock(&nmp->nm_mtx); @@ -1708,12 +1705,8 @@ int error = 0; struct vattr vattr; -#ifndef DIAGNOSTIC - if ((cnp->cn_flags & HASBUF) == 0) - panic("nfs_remove: no name"); - if (vrefcnt(vp) < 1) - panic("nfs_remove: bad v_usecount"); -#endif + KASSERT(cnp->cn_flags & HASBUF, ("nfs_remove: no name")); + KASSERT(vrefcnt(vp) > 0, ("nfs_remove: bad v_usecount")); if (vp->v_type == VDIR) error = EPERM; else if (vrefcnt(vp) == 1 || (np->n_sillyrename && @@ -1814,11 +1807,8 @@ struct componentname *fcnp = ap->a_fcnp; int error; -#ifndef DIAGNOSTIC - if ((tcnp->cn_flags & HASBUF) == 0 || - (fcnp->cn_flags & HASBUF) == 0) - panic("nfs_rename: no name"); -#endif + KASSERT((tcnp->cn_flags & HASBUF) && (fcnp->cn_flags & HASBUF), + ("nfs_rename: no name")); /* Check for cross-device rename */ if ((fvp->v_mount != tdvp->v_mount) || (tvp && (fvp->v_mount != tvp->v_mount))) { @@ -2277,11 +2267,10 @@ int attrflag; int v3 = NFS_ISV3(vp); -#ifndef DIAGNOSTIC - if (uiop->uio_iovcnt != 1 || (uiop->uio_offset & (DIRBLKSIZ - 1)) || - (uiop->uio_resid & (DIRBLKSIZ - 1))) - panic("nfs readdirrpc bad uio"); -#endif + KASSERT(uiop->uio_iovcnt == 1 && + !(uiop->uio_offset & (DIRBLKSIZ - 1)) && + !(uiop->uio_resid & (DIRBLKSIZ - 1)), + ("nfs readdirrpc bad uio")); /* * If there is no cookie, assume directory was stale. 
@@ -2482,11 +2471,10 @@ #ifndef nolint dp = NULL; #endif -#ifndef DIAGNOSTIC - if (uiop->uio_iovcnt != 1 || (uiop->uio_offset & (DIRBLKSIZ - 1)) || - (uiop->uio_resid & (DIRBLKSIZ - 1))) - panic("nfs readdirplusrpc bad uio"); -#endif + KASSERT(uiop->uio_iovcnt == 1 && + !(uiop->uio_offset & (DIRBLKSIZ - 1)) && + !(uiop->uio_resid & (DIRBLKSIZ - 1)), + ("nfs readdirplusrpc bad uio")); ndp->ni_dvp = vp; newvp = NULLVP; @@ -2752,10 +2740,7 @@ cache_purge(dvp); np = VTONFS(vp); -#ifndef DIAGNOSTIC - if (vp->v_type == VDIR) - panic("nfs: sillyrename dir"); -#endif + KASSERT(vp->v_type != VDIR, ("nfs: sillyrename dir")); sp = malloc(sizeof (struct sillyrename), M_NFSREQ, M_WAITOK); sp->s_cred = crhold(cnp->cn_cred); Index: sys/nfsclient/nfs_bio.c =================================================================== --- sys/nfsclient/nfs_bio.c (revision 208960) +++ sys/nfsclient/nfs_bio.c (working copy) @@ -453,10 +453,7 @@ int seqcount; int nra, error = 0, n = 0, on = 0; -#ifdef DIAGNOSTIC - if (uio->uio_rw != UIO_READ) - panic("nfs_read mode"); -#endif + KASSERT(uio->uio_rw == UIO_READ, ("nfs_read mode")); if (uio->uio_resid == 0) return (0); if (uio->uio_offset < 0) /* XXX VDIR cookies can be negative */ @@ -875,12 +872,9 @@ int bcount; int n, on, error = 0; -#ifdef DIAGNOSTIC - if (uio->uio_rw != UIO_WRITE) - panic("nfs_write mode"); - if (uio->uio_segflg == UIO_USERSPACE && uio->uio_td != curthread) - panic("nfs_write proc"); -#endif + KASSERT(uio->uio_rw == UIO_WRITE, ("nfs_write mode")); + KASSERT(uio->uio_segflg != UIO_USERSPACE || uio->uio_td == curthread, + ("nfs_write proc")); if (vp->v_type != VREG) return (EIO); mtx_lock(&np->n_mtx); Index: sys/nfsclient/nfs_subs.c =================================================================== --- sys/nfsclient/nfs_subs.c (revision 208960) +++ sys/nfsclient/nfs_subs.c (working copy) @@ -199,10 +199,7 @@ int uiosiz, clflg, rem; char *cp; -#ifdef DIAGNOSTIC - if (uiop->uio_iovcnt != 1) - panic("nfsm_uiotombuf: iovcnt != 1"); -#endif + KASSERT(uiop->uio_iovcnt == 1, ("nfsm_uiotombuf: iovcnt != 1")); if (siz > MLEN) /* or should it >= MCLBYTES ?? 
*/ clflg = 1; @@ -789,10 +786,7 @@ pos = (uoff_t)off / NFS_DIRBLKSIZ; if (pos == 0 || off < 0) { -#ifdef DIAGNOSTIC - if (add) - panic("nfs getcookie add at <= 0"); -#endif + KASSERT(!add, ("nfs getcookie add at <= 0")); return (&nfs_nullcookie); } pos--; @@ -843,10 +837,7 @@ { struct nfsnode *np = VTONFS(vp); -#ifdef DIAGNOSTIC - if (vp->v_type != VDIR) - panic("nfs: invaldir not dir"); -#endif + KASSERT(vp->v_type == VDIR, ("nfs: invaldir not dir")); nfs_dircookie_lock(np); np->n_direofoffset = 0; np->n_cookieverf.nfsuquad[0] = 0; Index: sys/fs/nfsclient/nfs_clbio.c =================================================================== --- sys/fs/nfsclient/nfs_clbio.c (revision 208960) +++ sys/fs/nfsclient/nfs_clbio.c (working copy) @@ -453,10 +453,7 @@ int seqcount; int nra, error = 0, n = 0, on = 0; -#ifdef DIAGNOSTIC - if (uio->uio_rw != UIO_READ) - panic("ncl_read mode"); -#endif + KASSERT(uio->uio_rw == UIO_READ, ("ncl_read mode")); if (uio->uio_resid == 0) return (0); if (uio->uio_offset < 0) /* XXX VDIR cookies can be negative */ @@ -881,12 +878,9 @@ int bcount; int n, on, error = 0; -#ifdef DIAGNOSTIC - if (uio->uio_rw != UIO_WRITE) - panic("ncl_write mode"); - if (uio->uio_segflg == UIO_USERSPACE && uio->uio_td != curthread) - panic("ncl_write proc"); -#endif + KASSERT(uio->uio_rw == UIO_WRITE, ("ncl_write mode")); + KASSERT(uio->uio_segflg != UIO_USERSPACE || uio->uio_td == curthread, + ("ncl_write proc")); if (vp->v_type != VREG) return (EIO); mtx_lock(&np->n_mtx); Index: sys/fs/nfsclient/nfs_clcomsubs.c =================================================================== --- sys/fs/nfsclient/nfs_clcomsubs.c (revision 208960) +++ sys/fs/nfsclient/nfs_clcomsubs.c (working copy) @@ -194,10 +194,7 @@ int uiosiz, clflg, rem; char *cp, *tcp; -#ifdef DIAGNOSTIC - if (uiop->uio_iovcnt != 1) - panic("nfsm_uiotombuf: iovcnt != 1"); -#endif + KASSERT(uiop->uio_iovcnt == 1, ("nfsm_uiotombuf: iovcnt != 1")); if (siz > ncl_mbuf_mlen) /* or should it >= MCLBYTES ?? 
*/ clflg = 1; @@ -346,10 +343,7 @@ pos = off / NFS_DIRBLKSIZ; if (pos == 0) { -#ifdef DIAGNOSTIC - if (add) - panic("nfs getcookie add at 0"); -#endif + KASSERT(!add, ("nfs getcookie add at 0")); return (&nfs_nullcookie); } pos--; Index: sys/fs/nfsclient/nfs_clsubs.c =================================================================== --- sys/fs/nfsclient/nfs_clsubs.c (revision 208960) +++ sys/fs/nfsclient/nfs_clsubs.c (working copy) @@ -282,10 +282,7 @@ pos = (uoff_t)off / NFS_DIRBLKSIZ; if (pos == 0 || off < 0) { -#ifdef DIAGNOSTIC - if (add) - panic("nfs getcookie add at <= 0"); -#endif + KASSERT(!add, ("nfs getcookie add at <= 0")); return (&nfs_nullcookie); } pos--; @@ -336,10 +333,7 @@ { struct nfsnode *np = VTONFS(vp); -#ifdef DIAGNOSTIC - if (vp->v_type != VDIR) - panic("nfs: invaldir not dir"); -#endif + KASSERT(vp->v_type == VDIR, ("nfs: invaldir not dir")); ncl_dircookie_lock(np); np->n_direofoffset = 0; np->n_cookieverf.nfsuquad[0] = 0; Index: sys/fs/nfsclient/nfs_clvnops.c =================================================================== --- sys/fs/nfsclient/nfs_clvnops.c (revision 208960) +++ sys/fs/nfsclient/nfs_clvnops.c (working copy) @@ -1564,12 +1564,8 @@ int error = 0; struct vattr vattr; -#ifndef DIAGNOSTIC - if ((cnp->cn_flags & HASBUF) == 0) - panic("nfs_remove: no name"); - if (vrefcnt(vp) < 1) - panic("nfs_remove: bad v_usecount"); -#endif + KASSERT(cnp->cn_flags & HASBUF, ("nfs_remove: no name")); + KASSERT(vrefcnt(vp) > 0, ("nfs_remove: bad v_usecount")); if (vp->v_type == VDIR) error = EPERM; else if (vrefcnt(vp) == 1 || (np->n_sillyrename && @@ -1676,11 +1672,8 @@ struct nfsv4node *newv4 = NULL; int error; -#ifndef DIAGNOSTIC - if ((tcnp->cn_flags & HASBUF) == 0 || - (fcnp->cn_flags & HASBUF) == 0) - panic("nfs_rename: no name"); -#endif + KASSERT((tcnp->cn_flags & HASBUF) && (fcnp->cn_flags & HASBUF), + ("nfs_rename: no name")); /* Check for cross-device rename */ if ((fvp->v_mount != tdvp->v_mount) || (tvp && (fvp->v_mount != tvp->v_mount))) { @@ -2137,11 +2130,10 @@ struct nfsmount *nmp = VFSTONFS(vp->v_mount); int error = 0, eof, attrflag; -#ifndef DIAGNOSTIC - if (uiop->uio_iovcnt != 1 || (uiop->uio_offset & (DIRBLKSIZ - 1)) || - (uiop->uio_resid & (DIRBLKSIZ - 1))) - panic("nfs readdirrpc bad uio"); -#endif + KASSERT(uiop->uio_iovcnt == 1 && + !(uiop->uio_offset & (DIRBLKSIZ - 1)) && + !(uiop->uio_resid & (DIRBLKSIZ - 1)), + ("nfs readdirrpc bad uio")); /* * If there is no cookie, assume directory was stale. @@ -2198,11 +2190,10 @@ struct nfsmount *nmp = VFSTONFS(vp->v_mount); int error = 0, attrflag, eof; -#ifndef DIAGNOSTIC - if (uiop->uio_iovcnt != 1 || (uiop->uio_offset & (DIRBLKSIZ - 1)) || - (uiop->uio_resid & (DIRBLKSIZ - 1))) - panic("nfs readdirplusrpc bad uio"); -#endif + KASSERT(uiop->uio_iovcnt == 1 && + !(uiop->uio_offset & (DIRBLKSIZ - 1)) && + !(uiop->uio_resid & (DIRBLKSIZ - 1)), + ("nfs readdirplusrpc bad uio")); /* * If there is no cookie, assume directory was stale. 
@@ -2264,10 +2255,7 @@ cache_purge(dvp); np = VTONFS(vp); -#ifndef DIAGNOSTIC - if (vp->v_type == VDIR) - panic("nfs: sillyrename dir"); -#endif + KASSERT(vp->v_type != VDIR, ("nfs: sillyrename dir")); MALLOC(sp, struct sillyrename *, sizeof (struct sillyrename), M_NEWNFSREQ, M_WAITOK); sp->s_cred = crhold(cnp->cn_cred); Index: sys/fs/nfsclient/nfs_clrpcops.c =================================================================== --- sys/fs/nfsclient/nfs_clrpcops.c (revision 208960) +++ sys/fs/nfsclient/nfs_clrpcops.c (working copy) @@ -1445,10 +1445,7 @@ struct nfsrv_descript *nd = &nfsd; nfsattrbit_t attrbits; -#ifdef DIAGNOSTIC - if (uiop->uio_iovcnt != 1) - panic("nfs: writerpc iovcnt > 1"); -#endif + KASSERT(uiop->uio_iovcnt == 1, ("nfs: writerpc iovcnt > 1")); *attrflagp = 0; tsiz = uio_uio_resid(uiop); NFSLOCKMNT(nmp); @@ -2501,10 +2498,9 @@ u_int32_t *tl2 = NULL; size_t tresid; -#ifdef DIAGNOSTIC - if (uiop->uio_iovcnt != 1 || (uio_uio_resid(uiop) & (DIRBLKSIZ - 1))) - panic("nfs readdirrpc bad uio"); -#endif + KASSERT(uiop->uio_iovcnt == 1 && + !(uio_uio_resid(uiop) & (DIRBLKSIZ - 1)), + ("nfs readdirrpc bad uio")); /* * There is no point in reading a lot more than uio_resid, however @@ -2939,10 +2935,9 @@ size_t tresid; u_int32_t *tl2 = NULL, fakefileno = 0xffffffff, rderr; -#ifdef DIAGNOSTIC - if (uiop->uio_iovcnt != 1 || (uio_uio_resid(uiop) & (DIRBLKSIZ - 1))) - panic("nfs readdirplusrpc bad uio"); -#endif + KASSERT(uiop->uio_iovcnt == 1 && + !(uio_uio_resid(uiop) & (DIRBLKSIZ - 1)), + ("nfs readdirplusrpc bad uio")); *attrflagp = 0; if (eofp != NULL) *eofp = 0; Index: sys/fs/nfsserver/nfs_nfsdsocket.c =================================================================== --- sys/fs/nfsserver/nfs_nfsdsocket.c (revision 208960) +++ sys/fs/nfsserver/nfs_nfsdsocket.c (working copy) @@ -364,10 +364,7 @@ * Get a locked vnode for the first file handle */ if (!(nd->nd_flag & ND_NFSV4)) { -#ifdef DIAGNOSTIC - if (nd->nd_repstat) - panic("nfsrvd_dorpc"); -#endif + KASSERT(nd->nd_repstat == 0, ("nfsrvd_dorpc")); /* * For NFSv3, if the malloc/mget allocation is near limits, * return NFSERR_DELAY. 
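/*
 * [Illustrative note, not part of the patch.] The second argument to
 * KASSERT() is written inside its own set of parentheses, e.g.
 * ("nfs: sillyrename dir"), because the macro pastes it verbatim as
 * the argument list of panic(), so it can also carry printf-style
 * arguments. A hypothetical example (the variable names here are
 * made up, not taken from the patch):
 *
 *     KASSERT(len <= wsize, ("len %d exceeds wsize %d", len, wsize));
 *
 * expands, in an INVARIANTS kernel, to roughly:
 *
 *     if (__predict_false(!(len <= wsize)))
 *             panic("len %d exceeds wsize %d", len, wsize);
 */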
--=-=-=-- From owner-freebsd-fs@FreeBSD.ORG Sat Jun 12 19:40:27 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 42AE31065674 for ; Sat, 12 Jun 2010 19:40:27 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id B26C28FC17 for ; Sat, 12 Jun 2010 19:40:26 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id o5CJeMpG049663 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 12 Jun 2010 22:40:22 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id o5CJeMHA078076; Sat, 12 Jun 2010 22:40:22 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id o5CJeMne078075; Sat, 12 Jun 2010 22:40:22 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 12 Jun 2010 22:40:22 +0300 From: Kostik Belousov To: Mikolaj Golub Message-ID: <20100612194022.GP13238@deviant.kiev.zoral.com.ua> References: <86mxv22ji7.fsf@zhuzha.ua1> <20100611191059.GF13238@deviant.kiev.zoral.com.ua> <86mxv0cb9z.fsf@kopusha.home.net> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="hHiQ9nAwW5IGN2dL" Content-Disposition: inline In-Reply-To: <86mxv0cb9z.fsf@kopusha.home.net> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-3.6 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00, DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: freebsd-fs@freebsd.org Subject: Re: '#ifndef DIAGNOSTIC' in nfsclient code looks like a typo X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 12 Jun 2010 19:40:27 -0000 --hHiQ9nAwW5IGN2dL Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sat, Jun 12, 2010 at 10:15:52PM +0300, Mikolaj Golub wrote: > > On Fri, 11 Jun 2010 22:10:59 +0300 Kostik Belousov wrote: > > KB> All the changes should be converted to the KASSERTs. There is no point > KB> in doing > KB> if (something) > KB> panic(); > KB> for diagnostic; use > KB> KASSERT(something, (panic message)); > > Please look at the attached patch. Almost there. According to style(9), the values should be explicitly compared with 0, unless the value is of the boolean type. I suggest you change, e.g.,
+ KASSERT(uiop->uio_iovcnt == 1 && + !(uio_uio_resid(uiop) & (DIRBLKSIZ - 1)), to + KASSERT(uiop->uio_iovcnt == 1 && + (uio_uio_resid(uiop) & (DIRBLKSIZ - 1)) == 0, and change + KASSERT((tcnp->cn_flags & HASBUF) && (fcnp->cn_flags & HASBUF), + ("nfs_rename: no name")); to + KASSERT((tcnp->cn_flags & HASBUF) != 0 && (fcnp->cn_flags & HASBUF) != 0, + ("nfs_rename: no name")); --hHiQ9nAwW5IGN2dL Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (FreeBSD) iEYEARECAAYFAkwT4qUACgkQC3+MBN1Mb4gAiwCfW3Cm3vzXSk2wnlnbg5pjlpv4 rNEAoIoNtjZAiTBUTch547aZn+DZruqP =EUq2 -----END PGP SIGNATURE----- --hHiQ9nAwW5IGN2dL-- From owner-freebsd-fs@FreeBSD.ORG Sat Jun 12 20:25:00 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7C0ED106566C; Sat, 12 Jun 2010 20:25:00 +0000 (UTC) (envelope-from to.my.trociny@gmail.com) Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com [209.85.214.54]) by mx1.freebsd.org (Postfix) with ESMTP id 8F08C8FC0C; Sat, 12 Jun 2010 20:24:59 +0000 (UTC) Received: by bwz2 with SMTP id 2so1535531bwz.13 for ; Sat, 12 Jun 2010 13:24:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:from:to:cc:subject:references :x-comment-to:date:in-reply-to:message-id:user-agent:mime-version :content-type; bh=jgLuAZLEqB+ZDfyE0vubpbDmMxxbmhZTrKJQIDRl2rg=; b=jAauPJJHRkTGYNneTy6S+foFKx/bEIWGq6cKwGiGM94BzkPQBm5J692naYTbL9PK6Y YJTXb8zSrE/SIeQq2bF3NzKdNGGj3N68o7TemHhcgI/FTCHYXSW8anl/zPn/ADLiGlqx NRdoXGUTI527tokf3mBGo5zIHTwrwiM+Bo5Jk= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=from:to:cc:subject:references:x-comment-to:date:in-reply-to :message-id:user-agent:mime-version:content-type; b=xIJc91kHr3XVI+FeOWHUPQceU0ByO9CqavQL7nrovO3IMXhEVkA6Jf4Ww62dNIvKOS kA3iuLYp9xrYZzm07FhxNa2tYmFVE6lIv7bTaGf51vpEtqnHcC/LHCU/uLPdXhFcVIS3 G1nU5ENbtG0QlB88ytebwjcSCByk/4ZqOX1ag= Received: by 10.204.74.35 with SMTP id s35mr2669221bkj.33.1276374298416; Sat, 12 Jun 2010 13:24:58 -0700 (PDT) Received: from localhost ([95.69.160.52]) by mx.google.com with ESMTPS id z17sm11246814bkx.12.2010.06.12.13.24.56 (version=TLSv1/SSLv3 cipher=RC4-MD5); Sat, 12 Jun 2010 13:24:57 -0700 (PDT) From: Mikolaj Golub To: "hiroshi\@soupacific.com" References: <4BCFA4C2.6000109@soupacific.com> <4BCFB1C5.5000908@soupacific.com> <4BD01800.9040901@soupacific.com> <4BD0438B.5080308@soupacific.com> <4BD0E432.1000108@soupacific.com> <20100423061521.GC1670@garage.freebsd.pl> <4BD17B0D.5080601@soupacific.com> <4C10B526.4040908@soupacific.com> <20100612104336.GA2253@garage.freebsd.pl> <4C1372E0.1000903@soupacific.com> <20100612142311.GF2253@garage.freebsd.pl> <4C139F9C.2090305@soupacific.com> X-Comment-To: hiroshi@soupacific.com Date: Sat, 12 Jun 2010 23:24:53 +0300 In-Reply-To: <4C139F9C.2090305@soupacific.com> (hiroshi@soupacific.com's message of "Sat, 12 Jun 2010 23:54:20 +0900") Message-ID: <86iq5oc82y.fsf@kopusha.home.net> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (berkeley-unix) MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek Subject: Re: FreeBSD 8.1 and HAST X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 12 Jun 2010 20:25:00
-0000 --=-=-= On Sat, 12 Jun 2010 23:54:20 +0900 hiroshi@soupacific.com wrote: >> Could you send the debug from the whole session, but when the problem >> appears? I don't see those "socket is not connected" errors in the >> output you sent. h> "socket is not connected" errors is in message log >> If it is possible could you turn off lines wrapping or maybe send the >> debug output as an attachment? >> h> I here attache debug.log and message files. It would be good to have all.log enabled in newsyslog.conf and provide the output from there so all lines are in one log and it is clear which message appeared earlier. Also the logs from the primary could be useful too. h> Jun 12 23:40:09 sv01B hastd: [zfshast] (secondary) send: (0x8011f52e0) Got request: WRITE(0, 131072). h> Jun 12 23:40:09 sv01B hastd: [zfshast] (secondary) disk: (0x8011f52e0) Moving request to the free queue. BTW, this message lies that it is from the disk thread. It is from the send thread too. See the attached patch. h> Jun 12 23:40:09 sv01B hastd: [zfshast] (secondary) send: Taking request. h> Jun 12 23:40:09 sv01B hastd: [zfshast] (secondary) send: No requests, waiting. -- Mikolaj Golub --=-=-= Content-Type: text/x-patch Content-Disposition: inline; filename=secondary.c.send.patch Index: sbin/hastd/secondary.c =================================================================== --- sbin/hastd/secondary.c (revision 208960) +++ sbin/hastd/secondary.c (working copy) @@ -687,7 +687,7 @@ send_thread(void *arg) pjdlog_exit(EX_TEMPFAIL, "Unable to send reply."); } nv_free(nvout); - pjdlog_debug(2, "disk: (%p) Moving request to the free queue.", + pjdlog_debug(2, "send: (%p) Moving request to the free queue.", hio); nv_free(hio->hio_nv); hio->hio_error = 0; --=-=-=-- From owner-freebsd-fs@FreeBSD.ORG Sat Jun 12 21:33:48 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8F522106564A for ; Sat, 12 Jun 2010 21:33:48 +0000 (UTC) (envelope-from to.my.trociny@gmail.com) Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com [209.85.214.54]) by mx1.freebsd.org (Postfix) with ESMTP id E21C48FC1A for ; Sat, 12 Jun 2010 21:33:47 +0000 (UTC) Received: by bwz2 with SMTP id 2so1569795bwz.13 for ; Sat, 12 Jun 2010 14:33:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:from:to:cc:subject:references :x-comment-to:date:in-reply-to:message-id:user-agent:mime-version :content-type; bh=1n/t+5IAfsVr1RwhpqW8Ksnip3bkG9Z90+eqDI7aeOk=; b=NTYRCZRvIYvV98wYzW1Cw9pNIxChcketOpNJ2v2oEeTEt/6nNQh7StVI4E2jWke5HB eh6wNK+Etn1UEnDqNwt5t4ISA8S5vjIuEBd03UlFaUvtN/eYuh4pIStA4CMRMeue6+4l alfYAoukOiRbTraeQbPjwQPEKvw+7O1xKq3Uw= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=from:to:cc:subject:references:x-comment-to:date:in-reply-to :message-id:user-agent:mime-version:content-type; b=gr4v7PeLm1NIweSPHgtPNsir2xNbu1yfmuwpyzi6LJ7MONaXVlez9iMiTU2kzKTbIN a2C9R3VfVJqxGRdQZ4C39tb4or1bRx3BLk1v8wIi/7S1P4K4HtTiUUwV70IXXIAG08pM LkmgEoEBrFrv3f+goSwv14MyBQCCDvD61azGY= Received: by 10.204.132.211 with SMTP id c19mr2640088bkt.184.1276378426515; Sat, 12 Jun 2010 14:33:46 -0700 (PDT) Received: from localhost ([95.69.160.52]) by mx.google.com with ESMTPS id z17sm11478440bkx.12.2010.06.12.14.33.45 (version=TLSv1/SSLv3 cipher=RC4-MD5); Sat, 12 Jun 2010 14:33:45 -0700 (PDT) From: Mikolaj Golub To: Kostik Belousov References: <86mxv22ji7.fsf@zhuzha.ua1> 
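A note on the all.log suggestion above: FreeBSD ships the catch-all log
disabled. Enabling it amounts to uncommenting the stock entries; the
sketch below is from memory of the 8.x configuration files, so the exact
fields may differ on your release (check /etc/syslog.conf and
/etc/newsyslog.conf, and create the file first, e.g. mode 600):

# /etc/syslog.conf: duplicate every facility/level into one
# time-ordered file, so interleaved hastd messages keep their order
*.*						/var/log/all.log

# /etc/newsyslog.conf: matching rotation entry for the catch-all log
/var/log/all.log		600  7	   *	@T00  J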
From owner-freebsd-fs@FreeBSD.ORG Sat Jun 12 21:33:48 2010
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8F522106564A for ;
	Sat, 12 Jun 2010 21:33:48 +0000 (UTC)
	(envelope-from to.my.trociny@gmail.com)
Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com [209.85.214.54])
	by mx1.freebsd.org (Postfix) with ESMTP id E21C48FC1A for ;
	Sat, 12 Jun 2010 21:33:47 +0000 (UTC)
Received: by bwz2 with SMTP id 2so1569795bwz.13 for ;
	Sat, 12 Jun 2010 14:33:46 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:received:received:from:to:cc:subject:references
	:x-comment-to:date:in-reply-to:message-id:user-agent:mime-version
	:content-type;
	bh=1n/t+5IAfsVr1RwhpqW8Ksnip3bkG9Z90+eqDI7aeOk=;
	b=NTYRCZRvIYvV98wYzW1Cw9pNIxChcketOpNJ2v2oEeTEt/6nNQh7StVI4E2jWke5HB
	eh6wNK+Etn1UEnDqNwt5t4ISA8S5vjIuEBd03UlFaUvtN/eYuh4pIStA4CMRMeue6+4l
	alfYAoukOiRbTraeQbPjwQPEKvw+7O1xKq3Uw=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=from:to:cc:subject:references:x-comment-to:date:in-reply-to
	:message-id:user-agent:mime-version:content-type;
	b=gr4v7PeLm1NIweSPHgtPNsir2xNbu1yfmuwpyzi6LJ7MONaXVlez9iMiTU2kzKTbIN
	a2C9R3VfVJqxGRdQZ4C39tb4or1bRx3BLk1v8wIi/7S1P4K4HtTiUUwV70IXXIAG08pM
	LkmgEoEBrFrv3f+goSwv14MyBQCCDvD61azGY=
Received: by 10.204.132.211 with SMTP id c19mr2640088bkt.184.1276378426515;
	Sat, 12 Jun 2010 14:33:46 -0700 (PDT)
Received: from localhost ([95.69.160.52]) by mx.google.com with ESMTPS id
	z17sm11478440bkx.12.2010.06.12.14.33.45 (version=TLSv1/SSLv3 cipher=RC4-MD5);
	Sat, 12 Jun 2010 14:33:45 -0700 (PDT)
From: Mikolaj Golub
To: Kostik Belousov
References: <86mxv22ji7.fsf@zhuzha.ua1>
	<20100611191059.GF13238@deviant.kiev.zoral.com.ua>
	<86mxv0cb9z.fsf@kopusha.home.net>
	<20100612194022.GP13238@deviant.kiev.zoral.com.ua>
X-Comment-To: Kostik Belousov
Date: Sun, 13 Jun 2010 00:33:43 +0300
In-Reply-To: <20100612194022.GP13238@deviant.kiev.zoral.com.ua> (Kostik
	Belousov's message of "Sat, 12 Jun 2010 22:40:22 +0300")
Message-ID: <86eigcc4w8.fsf@kopusha.home.net>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (berkeley-unix)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
Cc: freebsd-fs@freebsd.org
Subject: Re: '#ifndef DIAGNOSTIC' in nfsclient code looks like a typo
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 12 Jun 2010 21:33:48 -0000

--=-=-=

On Sat, 12 Jun 2010 22:40:22 +0300 Kostik Belousov wrote:

KB> Almost there. According to style(9), the values should be explicitly
KB> compared with 0, unless the value is of the boolean type. I suggest
KB> you change, e.g.,

KB> +	KASSERT(uiop->uio_iovcnt == 1 &&
KB> +	    !(uio_uio_resid(uiop) & (DIRBLKSIZ - 1)),

KB> to

KB> +	KASSERT(uiop->uio_iovcnt == 1 &&
KB> +	    (uio_uio_resid(uiop) & (DIRBLKSIZ - 1)) == 0,

KB> and change

KB> +	KASSERT((tcnp->cn_flags & HASBUF) && (fcnp->cn_flags & HASBUF),
KB> +	    ("nfs_rename: no name"));

KB> to

KB> +	KASSERT((tcnp->cn_flags & HASBUF) != 0 && (fcnp->cn_flags & HASBUF) != 0,
KB> +	    ("nfs_rename: no name"));

Updated.

-- 
Mikolaj Golub

--=-=-=
Content-Type: text/x-patch
Content-Disposition: attachment; filename=nfs.KASSERT.patch

Index: sys/nfsclient/nfs_vnops.c
===================================================================
--- sys/nfsclient/nfs_vnops.c	(revision 208960)
+++ sys/nfsclient/nfs_vnops.c	(working copy)
@@ -1348,10 +1348,7 @@
 	int v3 = NFS_ISV3(vp), committed = NFSV3WRITE_FILESYNC;
 	int wsize;
 
-#ifndef DIAGNOSTIC
-	if (uiop->uio_iovcnt != 1)
-		panic("nfs: writerpc iovcnt > 1");
-#endif
+	KASSERT(uiop->uio_iovcnt == 1, ("nfs: writerpc iovcnt > 1"));
 	*must_commit = 0;
 	tsiz = uiop->uio_resid;
 	mtx_lock(&nmp->nm_mtx);
@@ -1708,12 +1705,8 @@
 	int error = 0;
 	struct vattr vattr;
 
-#ifndef DIAGNOSTIC
-	if ((cnp->cn_flags & HASBUF) == 0)
-		panic("nfs_remove: no name");
-	if (vrefcnt(vp) < 1)
-		panic("nfs_remove: bad v_usecount");
-#endif
+	KASSERT((cnp->cn_flags & HASBUF) != 0, ("nfs_remove: no name"));
+	KASSERT(vrefcnt(vp) > 0, ("nfs_remove: bad v_usecount"));
 	if (vp->v_type == VDIR)
 		error = EPERM;
 	else if (vrefcnt(vp) == 1 || (np->n_sillyrename &&
@@ -1814,11 +1807,9 @@
 	struct componentname *fcnp = ap->a_fcnp;
 	int error;
 
-#ifndef DIAGNOSTIC
-	if ((tcnp->cn_flags & HASBUF) == 0 ||
-	    (fcnp->cn_flags & HASBUF) == 0)
-		panic("nfs_rename: no name");
-#endif
+	KASSERT((tcnp->cn_flags & HASBUF) != 0 &&
+	    (fcnp->cn_flags & HASBUF) != 0,
+	    ("nfs_rename: no name"));
 	/* Check for cross-device rename */
 	if ((fvp->v_mount != tdvp->v_mount) ||
 	    (tvp && (fvp->v_mount != tvp->v_mount))) {
@@ -2277,11 +2268,10 @@
 	int attrflag;
 	int v3 = NFS_ISV3(vp);
 
-#ifndef DIAGNOSTIC
-	if (uiop->uio_iovcnt != 1 || (uiop->uio_offset & (DIRBLKSIZ - 1)) ||
-	    (uiop->uio_resid & (DIRBLKSIZ - 1)))
-		panic("nfs readdirrpc bad uio");
-#endif
+	KASSERT(uiop->uio_iovcnt == 1 &&
+	    (uiop->uio_offset & (DIRBLKSIZ - 1)) == 0 &&
+	    (uiop->uio_resid & (DIRBLKSIZ - 1)) == 0,
+	    ("nfs readdirrpc bad uio"));
 
 	/*
 	 * If there is no cookie, assume directory was stale.
@@ -2482,11 +2472,10 @@
 #ifndef nolint
 	dp = NULL;
 #endif
-#ifndef DIAGNOSTIC
-	if (uiop->uio_iovcnt != 1 || (uiop->uio_offset & (DIRBLKSIZ - 1)) ||
-	    (uiop->uio_resid & (DIRBLKSIZ - 1)))
-		panic("nfs readdirplusrpc bad uio");
-#endif
+	KASSERT(uiop->uio_iovcnt == 1 &&
+	    (uiop->uio_offset & (DIRBLKSIZ - 1)) == 0 &&
+	    (uiop->uio_resid & (DIRBLKSIZ - 1)) == 0,
+	    ("nfs readdirplusrpc bad uio"));
 	ndp->ni_dvp = vp;
 	newvp = NULLVP;
 
@@ -2752,10 +2741,7 @@
 
 	cache_purge(dvp);
 	np = VTONFS(vp);
-#ifndef DIAGNOSTIC
-	if (vp->v_type == VDIR)
-		panic("nfs: sillyrename dir");
-#endif
+	KASSERT(vp->v_type != VDIR, ("nfs: sillyrename dir"));
 	sp = malloc(sizeof (struct sillyrename),
 	    M_NFSREQ, M_WAITOK);
 	sp->s_cred = crhold(cnp->cn_cred);
Index: sys/nfsclient/nfs_bio.c
===================================================================
--- sys/nfsclient/nfs_bio.c	(revision 208960)
+++ sys/nfsclient/nfs_bio.c	(working copy)
@@ -453,10 +453,7 @@
 	int seqcount;
 	int nra, error = 0, n = 0, on = 0;
 
-#ifdef DIAGNOSTIC
-	if (uio->uio_rw != UIO_READ)
-		panic("nfs_read mode");
-#endif
+	KASSERT(uio->uio_rw == UIO_READ, ("nfs_read mode"));
 	if (uio->uio_resid == 0)
 		return (0);
 	if (uio->uio_offset < 0)	/* XXX VDIR cookies can be negative */
@@ -875,12 +872,9 @@
 	int bcount;
 	int n, on, error = 0;
 
-#ifdef DIAGNOSTIC
-	if (uio->uio_rw != UIO_WRITE)
-		panic("nfs_write mode");
-	if (uio->uio_segflg == UIO_USERSPACE && uio->uio_td != curthread)
-		panic("nfs_write proc");
-#endif
+	KASSERT(uio->uio_rw == UIO_WRITE, ("nfs_write mode"));
+	KASSERT(uio->uio_segflg != UIO_USERSPACE || uio->uio_td == curthread,
+	    ("nfs_write proc"));
 	if (vp->v_type != VREG)
 		return (EIO);
 	mtx_lock(&np->n_mtx);
Index: sys/nfsclient/nfs_subs.c
===================================================================
--- sys/nfsclient/nfs_subs.c	(revision 208960)
+++ sys/nfsclient/nfs_subs.c	(working copy)
@@ -199,10 +199,7 @@
 	int uiosiz, clflg, rem;
 	char *cp;
 
-#ifdef DIAGNOSTIC
-	if (uiop->uio_iovcnt != 1)
-		panic("nfsm_uiotombuf: iovcnt != 1");
-#endif
+	KASSERT(uiop->uio_iovcnt == 1, ("nfsm_uiotombuf: iovcnt != 1"));
 	if (siz > MLEN)		/* or should it >= MCLBYTES ?? */
 		clflg = 1;
@@ -789,10 +786,7 @@
 
 	pos = (uoff_t)off / NFS_DIRBLKSIZ;
 	if (pos == 0 || off < 0) {
-#ifdef DIAGNOSTIC
-		if (add)
-			panic("nfs getcookie add at <= 0");
-#endif
+		KASSERT(!add, ("nfs getcookie add at <= 0"));
 		return (&nfs_nullcookie);
 	}
 	pos--;
@@ -843,10 +837,7 @@
 {
 	struct nfsnode *np = VTONFS(vp);
 
-#ifdef DIAGNOSTIC
-	if (vp->v_type != VDIR)
-		panic("nfs: invaldir not dir");
-#endif
+	KASSERT(vp->v_type == VDIR, ("nfs: invaldir not dir"));
 	nfs_dircookie_lock(np);
 	np->n_direofoffset = 0;
 	np->n_cookieverf.nfsuquad[0] = 0;
Index: sys/fs/nfsclient/nfs_clbio.c
===================================================================
--- sys/fs/nfsclient/nfs_clbio.c	(revision 208960)
+++ sys/fs/nfsclient/nfs_clbio.c	(working copy)
@@ -453,10 +453,7 @@
 	int seqcount;
 	int nra, error = 0, n = 0, on = 0;
 
-#ifdef DIAGNOSTIC
-	if (uio->uio_rw != UIO_READ)
-		panic("ncl_read mode");
-#endif
+	KASSERT(uio->uio_rw == UIO_READ, ("ncl_read mode"));
 	if (uio->uio_resid == 0)
 		return (0);
 	if (uio->uio_offset < 0)	/* XXX VDIR cookies can be negative */
@@ -881,12 +878,9 @@
 	int bcount;
 	int n, on, error = 0;
 
-#ifdef DIAGNOSTIC
-	if (uio->uio_rw != UIO_WRITE)
-		panic("ncl_write mode");
-	if (uio->uio_segflg == UIO_USERSPACE && uio->uio_td != curthread)
-		panic("ncl_write proc");
-#endif
+	KASSERT(uio->uio_rw == UIO_WRITE, ("ncl_write mode"));
+	KASSERT(uio->uio_segflg != UIO_USERSPACE || uio->uio_td == curthread,
+	    ("ncl_write proc"));
 	if (vp->v_type != VREG)
 		return (EIO);
 	mtx_lock(&np->n_mtx);
Index: sys/fs/nfsclient/nfs_clcomsubs.c
===================================================================
--- sys/fs/nfsclient/nfs_clcomsubs.c	(revision 208960)
+++ sys/fs/nfsclient/nfs_clcomsubs.c	(working copy)
@@ -194,10 +194,7 @@
 	int uiosiz, clflg, rem;
 	char *cp, *tcp;
 
-#ifdef DIAGNOSTIC
-	if (uiop->uio_iovcnt != 1)
-		panic("nfsm_uiotombuf: iovcnt != 1");
-#endif
+	KASSERT(uiop->uio_iovcnt == 1, ("nfsm_uiotombuf: iovcnt != 1"));
 	if (siz > ncl_mbuf_mlen)	/* or should it >= MCLBYTES ?? */
 		clflg = 1;
@@ -346,10 +343,7 @@
 
 	pos = off / NFS_DIRBLKSIZ;
 	if (pos == 0) {
-#ifdef DIAGNOSTIC
-		if (add)
-			panic("nfs getcookie add at 0");
-#endif
+		KASSERT(!add, ("nfs getcookie add at 0"));
 		return (&nfs_nullcookie);
 	}
 	pos--;
Index: sys/fs/nfsclient/nfs_clsubs.c
===================================================================
--- sys/fs/nfsclient/nfs_clsubs.c	(revision 208960)
+++ sys/fs/nfsclient/nfs_clsubs.c	(working copy)
@@ -282,10 +282,7 @@
 
 	pos = (uoff_t)off / NFS_DIRBLKSIZ;
 	if (pos == 0 || off < 0) {
-#ifdef DIAGNOSTIC
-		if (add)
-			panic("nfs getcookie add at <= 0");
-#endif
+		KASSERT(!add, ("nfs getcookie add at <= 0"));
 		return (&nfs_nullcookie);
 	}
 	pos--;
@@ -336,10 +333,7 @@
 {
 	struct nfsnode *np = VTONFS(vp);
 
-#ifdef DIAGNOSTIC
-	if (vp->v_type != VDIR)
-		panic("nfs: invaldir not dir");
-#endif
+	KASSERT(vp->v_type == VDIR, ("nfs: invaldir not dir"));
 	ncl_dircookie_lock(np);
 	np->n_direofoffset = 0;
 	np->n_cookieverf.nfsuquad[0] = 0;
Index: sys/fs/nfsclient/nfs_clvnops.c
===================================================================
--- sys/fs/nfsclient/nfs_clvnops.c	(revision 208960)
+++ sys/fs/nfsclient/nfs_clvnops.c	(working copy)
@@ -1564,12 +1564,8 @@
 	int error = 0;
 	struct vattr vattr;
 
-#ifndef DIAGNOSTIC
-	if ((cnp->cn_flags & HASBUF) == 0)
-		panic("nfs_remove: no name");
-	if (vrefcnt(vp) < 1)
-		panic("nfs_remove: bad v_usecount");
-#endif
+	KASSERT((cnp->cn_flags & HASBUF) != 0, ("nfs_remove: no name"));
+	KASSERT(vrefcnt(vp) > 0, ("nfs_remove: bad v_usecount"));
 	if (vp->v_type == VDIR)
 		error = EPERM;
 	else if (vrefcnt(vp) == 1 || (np->n_sillyrename &&
@@ -1676,11 +1672,9 @@
 	struct nfsv4node *newv4 = NULL;
 	int error;
 
-#ifndef DIAGNOSTIC
-	if ((tcnp->cn_flags & HASBUF) == 0 ||
-	    (fcnp->cn_flags & HASBUF) == 0)
-		panic("nfs_rename: no name");
-#endif
+	KASSERT((tcnp->cn_flags & HASBUF) != 0 &&
+	    (fcnp->cn_flags & HASBUF) != 0,
+	    ("nfs_rename: no name"));
 	/* Check for cross-device rename */
 	if ((fvp->v_mount != tdvp->v_mount) ||
 	    (tvp && (fvp->v_mount != tvp->v_mount))) {
@@ -2137,11 +2131,10 @@
 	struct nfsmount *nmp = VFSTONFS(vp->v_mount);
 	int error = 0, eof, attrflag;
 
-#ifndef DIAGNOSTIC
-	if (uiop->uio_iovcnt != 1 || (uiop->uio_offset & (DIRBLKSIZ - 1)) ||
-	    (uiop->uio_resid & (DIRBLKSIZ - 1)))
-		panic("nfs readdirrpc bad uio");
-#endif
+	KASSERT(uiop->uio_iovcnt == 1 &&
+	    (uiop->uio_offset & (DIRBLKSIZ - 1)) == 0 &&
+	    (uiop->uio_resid & (DIRBLKSIZ - 1)) == 0,
+	    ("nfs readdirrpc bad uio"));
 
 	/*
 	 * If there is no cookie, assume directory was stale.
@@ -2198,11 +2191,10 @@
 	struct nfsmount *nmp = VFSTONFS(vp->v_mount);
 	int error = 0, attrflag, eof;
 
-#ifndef DIAGNOSTIC
-	if (uiop->uio_iovcnt != 1 || (uiop->uio_offset & (DIRBLKSIZ - 1)) ||
-	    (uiop->uio_resid & (DIRBLKSIZ - 1)))
-		panic("nfs readdirplusrpc bad uio");
-#endif
+	KASSERT(uiop->uio_iovcnt == 1 &&
+	    (uiop->uio_offset & (DIRBLKSIZ - 1)) == 0 &&
+	    (uiop->uio_resid & (DIRBLKSIZ - 1)) == 0,
+	    ("nfs readdirplusrpc bad uio"));
 
 	/*
 	 * If there is no cookie, assume directory was stale.
@@ -2264,10 +2256,7 @@
 
 	cache_purge(dvp);
 	np = VTONFS(vp);
-#ifndef DIAGNOSTIC
-	if (vp->v_type == VDIR)
-		panic("nfs: sillyrename dir");
-#endif
+	KASSERT(vp->v_type != VDIR, ("nfs: sillyrename dir"));
 	MALLOC(sp, struct sillyrename *, sizeof (struct sillyrename),
 	    M_NEWNFSREQ, M_WAITOK);
 	sp->s_cred = crhold(cnp->cn_cred);
Index: sys/fs/nfsclient/nfs_clrpcops.c
===================================================================
--- sys/fs/nfsclient/nfs_clrpcops.c	(revision 208960)
+++ sys/fs/nfsclient/nfs_clrpcops.c	(working copy)
@@ -1445,10 +1445,7 @@
 	struct nfsrv_descript *nd = &nfsd;
 	nfsattrbit_t attrbits;
 
-#ifdef DIAGNOSTIC
-	if (uiop->uio_iovcnt != 1)
-		panic("nfs: writerpc iovcnt > 1");
-#endif
+	KASSERT(uiop->uio_iovcnt == 1, ("nfs: writerpc iovcnt > 1"));
 	*attrflagp = 0;
 	tsiz = uio_uio_resid(uiop);
 	NFSLOCKMNT(nmp);
@@ -2501,10 +2498,9 @@
 	u_int32_t *tl2 = NULL;
 	size_t tresid;
 
-#ifdef DIAGNOSTIC
-	if (uiop->uio_iovcnt != 1 || (uio_uio_resid(uiop) & (DIRBLKSIZ - 1)))
-		panic("nfs readdirrpc bad uio");
-#endif
+	KASSERT(uiop->uio_iovcnt == 1 &&
+	    (uio_uio_resid(uiop) & (DIRBLKSIZ - 1)) == 0,
+	    ("nfs readdirrpc bad uio"));
 
 	/*
 	 * There is no point in reading a lot more than uio_resid, however
@@ -2939,10 +2935,9 @@
 	size_t tresid;
 	u_int32_t *tl2 = NULL, fakefileno = 0xffffffff, rderr;
 
-#ifdef DIAGNOSTIC
-	if (uiop->uio_iovcnt != 1 || (uio_uio_resid(uiop) & (DIRBLKSIZ - 1)))
-		panic("nfs readdirplusrpc bad uio");
-#endif
+	KASSERT(uiop->uio_iovcnt == 1 &&
+	    (uio_uio_resid(uiop) & (DIRBLKSIZ - 1)) == 0,
+	    ("nfs readdirplusrpc bad uio"));
 	*attrflagp = 0;
 	if (eofp != NULL)
 		*eofp = 0;
Index: sys/fs/nfsserver/nfs_nfsdsocket.c
===================================================================
--- sys/fs/nfsserver/nfs_nfsdsocket.c	(revision 208960)
+++ sys/fs/nfsserver/nfs_nfsdsocket.c	(working copy)
@@ -364,10 +364,7 @@
 	 * Get a locked vnode for the first file handle
 	 */
 	if (!(nd->nd_flag & ND_NFSV4)) {
-#ifdef DIAGNOSTIC
-		if (nd->nd_repstat)
-			panic("nfsrvd_dorpc");
-#endif
+		KASSERT(nd->nd_repstat == 0, ("nfsrvd_dorpc"));
 		/*
 		 * For NFSv3, if the malloc/mget allocation is near limits,
 		 * return NFSERR_DELAY.

--=-=-=--
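A footnote to the thread above: the conversion pattern is easy to try in
isolation. Below is a minimal user-space sketch, not part of the
committed patch; the KASSERT stand-in macro and all names except the
borrowed HASBUF/cn_flags are invented for illustration. It shows the
idiom the patch settles on, with the non-boolean flag test compared
explicitly against 0 as style(9) asks:

#include <stdio.h>
#include <stdlib.h>

/*
 * User-space stand-in for the kernel's KASSERT(exp, msg): the second
 * argument is a parenthesized printf()-style argument list, so a failed
 * assertion prints the message and aborts, roughly what panic() did in
 * the old #ifdef DIAGNOSTIC blocks.
 */
#define	KASSERT(exp, msg) do {						\
	if (!(exp)) {							\
		printf msg;						\
		printf("\n");						\
		abort();						\
	}								\
} while (0)

#define	HASBUF	0x0400		/* stand-in for the namei(9) flag */

int
main(void)
{
	int cn_flags = HASBUF;	/* invented example value */

	/*
	 * style(9) form: cn_flags & HASBUF is not a boolean, so it is
	 * compared explicitly with 0 rather than tested with '!'.
	 */
	KASSERT((cn_flags & HASBUF) != 0, ("example: no name"));
	printf("assertion holds\n");
	return (0);
}

With a false condition the program prints the message and aborts; in the
kernel the real macro instead compiles away entirely unless the kernel
is built with options INVARIANTS, which is why the patch can drop the
#ifdef scaffolding around each check.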