From owner-freebsd-fs@FreeBSD.ORG Sun Nov 11 00:29:26 2012
From: Rick Macklem
To: Xavier Beaudouin
Cc: freebsd-fs@freebsd.org
Date: Sat, 10 Nov 2012 19:29:19 -0500 (EST)
Subject: Re: 9.0-RELEASE-p4 + NFS + ZFS = issues... :/ (probably a memory leak)

Xavier Beaudouin wrote:
> Hi Rick,
>
> I did several NFS mounts using the UDP patch you gave me and also TCP
> mounts instead.
>
> It seems with vmstat -m that:
>
> NFS fh 1 1K - 342087 64
>
> keeps growing and growing.
>
It is the first numeric field, called "InUse", that indicates how many are
currently allocated. The above line indicates "1". Are you saying that field
keeps ratcheting up? (342087 is just how many times a malloc of this type
has happened. It will normally keep increasing.)

For "vmstat -z", it is the 3rd numeric field, called "INUSE", that would keep
increasing if there was a leak.

If no NFS related items are ratcheting up for the fields mentioned above,
then there isn't a malloc() or uma_zalloc() leak.
--> Probably a ZFS issue, and I know nothing about ZFS. (I wouldn't know a
    zpool if it jumped up and bit me on the nose;-)

Hopefully some ZFS guys may jump in with suggestions on what to check, rick

> Unmounting the NFS mount frees some memory and I get free memory again.
>
> It seems that:
>
> rsync -r -v -p -o -g --times --specials --delete --exclude=.snapshot /mnt/ /vol/hosting/
>
> is raising this issue on its own (case B, see previous mails).
>
> Any other hints?
>
> /Xavier
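A rough way to watch the counters Rick describes while the rsync workload
runs - only a sketch, with an arbitrary one-minute interval and grep patterns
that simply match anything NFS-related:

    #!/bin/sh
    # Print NFS-related malloc-type and UMA-zone counters once a minute.
    # A real leak shows the first numeric column of "vmstat -m" (InUse)
    # or the third numeric column of "vmstat -z" (INUSE) climbing steadily
    # and never coming back down.
    while :; do
            date
            vmstat -m | grep -i nfs
            vmstat -z | grep -i nfs
            sleep 60
    done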
From owner-freebsd-fs@FreeBSD.ORG Sun Nov 11 05:12:04 2012
From: Eitan Adler
To: Chris Rees
Cc: "freebsd-fs@freebsd.org", Bryan Drewery
Date: Sun, 11 Nov 2012 00:11:30 -0500
Subject: Re: ZFS can't delete files when over quota

On 10 November 2012 15:36, Chris Rees wrote:
> These are the reasons I added the -T option, which I realise would
> have been more correct as -t; it's undesirable to have as default
> behaviour. In my patch, errno is tested.

This only fixes the issue with rm. Other tools which delete files
(find(1) for instance) would be left with the same problem. Ideally
the root cause could be fixed, though I imagine this is non-trivial.

--
Eitan Adler
From owner-freebsd-fs@FreeBSD.ORG Sun Nov 11 05:39:41 2012
From: "Matthew D. Fuller"
To: Kevin Day
Cc: "freebsd-fs@freebsd.org", Bryan Drewery
Date: Sat, 10 Nov 2012 23:39:33 -0600
Subject: Re: ZFS can't delete files when over quota

On Sat, Nov 10, 2012 at 02:29:46PM -0600 I heard the voice of
Kevin Day, and lo! it spake thus:
>
> This also may cause unintended or weird behavior with regard to
> open/running binaries or processes that want to keep a file open.

Don't forget hardlinks. I'd be Very Surprised(tm) if I rm'd one
directory entry for a file, and suddenly another one turned up empty
next time I tried using it...

--
Matthew Fuller (MF4839)       | fullermd@over-yonder.net
Systems/Network Administrator | http://www.over-yonder.net/~fullermd/
          On the Internet, nobody can hear you scream.
From owner-freebsd-fs@FreeBSD.ORG Sun Nov 11 07:27:51 2012
From: Peter Jeremy
To: freebsd-fs@freebsd.org
Date: Sun, 11 Nov 2012 18:27:39 +1100
Subject: Re: zfs diff deadlock

On 2012-Nov-11 09:32:49 +1100, Peter Jeremy wrote:
>I recently decided to do a "zfs diff" between two snapshots to try
>and identify why there was so much "USED" space in the snapshot.
>The diff ran for a while (though with very little IO) but has now
>wedged unkillably. There's nothing on the console or in any logs,
>the pool reports no problems and there are no other visible FS
>issues. Any ideas on tracking this down?
...
>The system is running a 4-month old 8-stable (r237444)

I've tried a second system running the same world with the same
result, so this looks like a real bug in ZFS rather than a system
glitch.

--
Peter Jeremy
From owner-freebsd-fs@FreeBSD.ORG Sun Nov 11 08:13:20 2012
From: Andriy Gapon
To: Peter Jeremy
Cc: freebsd-fs@FreeBSD.org
Date: Sun, 11 Nov 2012 10:12:58 +0200
Subject: Re: zfs diff deadlock

on 11/11/2012 09:27 Peter Jeremy said the following:
> On 2012-Nov-11 09:32:49 +1100, Peter Jeremy wrote:
>> I recently decided to do a "zfs diff" between two snapshots to try and
>> identify why there was so much "USED" space in the snapshot. The diff ran
>> for a while (though with very little IO) but has now wedged unkillably.
>> There's nothing on the console or in any logs, the pool reports no
>> problems and there are no other visible FS issues. Any ideas on tracking
>> this down?
> ...
>> The system is running a 4-month old 8-stable (r237444)
>
> I've tried a second system running the same world with the same result, so
> this looks like a real bug in ZFS rather than a system glitch.
>

Are you able to catch the state of all threads in the system?
E.g. via procstat -k -a. Or a crash dump.

--
Andriy Gapon
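For reference, the information Andriy asks for can be gathered roughly as
follows. This is only a sketch: the pgrep pattern is illustrative, and the
deliberate panic assumes a kernel with KDB and a configured dump device.

    # kernel stacks of every thread; look for where the zfs process is stuck
    procstat -k -a > /var/tmp/procstat-k-a.txt
    procstat -k $(pgrep -f 'zfs diff')

    # for a full crash dump instead: set dumpdev="AUTO" in /etc/rc.conf (or
    # point dumpon(8) at a swap partition), then panic the machine on purpose
    sysctl debug.kdb.panic=1
    # after reboot, savecore(8) leaves the dump under /var/crash for kgdb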
From owner-freebsd-fs@FreeBSD.ORG Sun Nov 11 16:16:38 2012
From: Bryan Drewery
Cc: "freebsd-fs@freebsd.org"
Date: Sun, 11 Nov 2012 10:16:24 -0600
Subject: Re: ZFS can't delete files when over quota

On 11/10/2012 11:11 PM, Eitan Adler wrote:
> On 10 November 2012 15:36, Chris Rees wrote:
>
>> These are the reasons I added the -T option, which I realise would
>> have been more correct as -t; it's undesirable to have as default
>> behaviour. In my patch, errno is tested.
>
> This only fixes the issue with rm. Other tools which delete files
> (find(1) for instance) would be left with the same problem. Ideally
> the root cause could be fixed, though I imagine this is non-trivial.
>

Yeah, I agree that just fixing rm(1) to work around a ZFS issue with
unlink(2) is not a good plan. There are several easy workarounds, so
the rm(1) patch is not really needed.

Bryan
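One such workaround, discussed earlier in the thread, is to release the
file's blocks before unlinking it, since truncating in place generally
succeeds even when the dataset is at its quota (which is exactly what the
earlier discussion observed). A sketch, with /tank/data/bigfile standing in
for the file that cannot be removed:

    : > /tank/data/bigfile    # truncate in place; allocates no new blocks
    rm /tank/data/bigfile     # the unlink now has room to proceed
    # If snapshots still reference the old blocks, truncating frees nothing
    # and this will not help (see the next message).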
From owner-freebsd-fs@FreeBSD.ORG Sun Nov 11 16:18:26 2012
From: Bryan Drewery
Cc: freebsd-fs@freebsd.org
Date: Sun, 11 Nov 2012 10:18:20 -0600
Subject: Re: ZFS can't delete files when over quota

On 11/10/2012 5:06 PM, Steven Hartland wrote:
> ----- Original Message ----- From: "Garrett Cooper"
>>> Except this is not about being unable to rm because of EDQUOT (whether
>>> ZFS can do something about that or not I have no idea). This is about
>>> being able to remove just after truncation, which clearly shows that
>>> zfs can in principle remove this file on its own.
>>>
>>
>> You're probably right. My guess is that the fix would be to ignore
>> EDQUOT in the unlink VOP handler.
>
> The CoW nature of ZFS causes this issue, which is why UFS doesn't have
> this problem. Unfortunately given ZFS's snapshots there's no guarantee
> that truncating the file will result in enough free space to perform
> the unlink.
>
> Regards
> Steve
>

This is what I was initially thinking. I assumed ZFS unlink behaviour was
smart enough to see that I had no snapshots and that it would really trim
the data immediately. Having no idea how the internals of ZFS work, but
only a general understanding, I could see this being non-trivial to fix
without hurting performance.

Bryan
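Since snapshots are exactly what makes the truncate-then-unlink workaround
unreliable, it is worth confirming whether any exist, and how much space they
pin, before relying on it. A quick check, with tank/data as a placeholder
dataset name:

    zfs list -H -t snapshot -r tank/data            # any snapshots at all?
    zfs get -H usedbysnapshots,quota,available tank/data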
From owner-freebsd-fs@FreeBSD.ORG Sun Nov 11 21:35:31 2012
From: Peter Jeremy
To: Steven Hartland
Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org
Date: Mon, 12 Nov 2012 08:35:15 +1100
Subject: Re: ZFS corruption due to lack of space?

On 2012-Nov-02 09:30:04 -0000, Steven Hartland wrote:
>From: "Peter Jeremy"
>> Many years ago, I wrote a simple utility that fills a raw disk with
>> a pseudo-random sequence and then verifies it. This sort of tool
>
>Sounds useful, got a link?

Sorry, no. I never released it. But writing something like it is quite
easy.
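A minimal stand-in for such a tool can be put together from stock utilities.
This is only a sketch (not Peter's utility, which was never released); it
assumes da1 is a scratch disk whose contents may be destroyed, and it uses a
deterministic openssl(1) cipher stream as the repeatable pseudo-random
sequence:

    #!/bin/sh
    # Fill a raw disk with a deterministic pseudo-random stream, then verify it.
    # WARNING: overwrites every sector of the target disk.
    disk=da1                      # hypothetical scratch device
    seed=fill-verify-2012
    size=$(diskinfo $disk | awk '{print $3}')   # media size in bytes

    stream() {
        # encrypting a stream of zeroes with a fixed passphrase yields a
        # fast, repeatable pseudo-random byte sequence
        openssl enc -aes-128-cbc -pass pass:$seed -nosalt </dev/zero 2>/dev/null
    }

    stream | head -c $size | dd of=/dev/$disk bs=1m      # write pass
    stream | head -c $size | cmp - /dev/$disk \
        && echo "$disk verified OK" || echo "$disk MISMATCH"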
--=20 Peter Jeremy --NzB8fVQJ5HfG6fxh Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iEYEARECAAYFAlCgGhMACgkQ/opHv/APuIcEWQCgv/fIBA95G/Xy6D0NRfqJTgqx vhAAoKYra7UebNp7fz2YZxxVnNFcA+pv =0pjN -----END PGP SIGNATURE----- --NzB8fVQJ5HfG6fxh-- From owner-freebsd-fs@FreeBSD.ORG Mon Nov 12 04:40:09 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id E1C66AB5 for ; Mon, 12 Nov 2012 04:40:09 +0000 (UTC) (envelope-from "") Received: from mrelay.siverge.com (mrelay.siverge.com [82.80.94.82]) by mx1.freebsd.org (Postfix) with ESMTP id 4EA498FC0C for ; Mon, 12 Nov 2012 04:40:08 +0000 (UTC) Received: from MailMarshal ([127.0.0.1]) by mrelay.siverge.com with MailMarshal (v6, 8, 4, 9558) id ; Mon, 12 Nov 2012 06:13:31 +0200 Message-ID: From: mailmarshal@siverge.com To: freebsd-fs@freebsd.org CC: Date: Mon, 12 Nov 2012 06:13:31 +0200 Subject: Policy Breaches, Malformed, Spam Type - Zero Day, Routing, Encryption, Undetermined, Spam, Malformed Mime, Spam Type - Phish, Spam Type - Pornographic, Suspect Summary Digest: 1 Messages MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="--=edf56325-6c7e-48e2-844b-610c3900c4d3" X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Nov 2012 04:40:10 -0000 ----=edf56325-6c7e-48e2-844b-610c3900c4d3 Content-Type: text/plain; charset="utf-7" Content-Transfer-Encoding: quoted-printable Policy Breaches, Malformed, Spam Type - Zero Day, Routing, Encryption, Un= determined, Spam, Malformed Mime, Spam Type - Phish, Spam Type - Pornogra= phic, Suspect Folder Summary Digest for freebsd-fs@freebsd.org sent on Mo= nday, November 12, 2012 The emails listed below have been placed by MailMarshal in your Policy Br= eaches, Malformed, Spam Type - Zero Day, Routing, Encryption, Undetermine= d, Spam, Malformed Mime, Spam Type - Phish, Spam Type - Pornographic, Sus= pect Folder. They will be deleted after 7 days. To view your quarantine mails go to http://mrelay/Spam/ From: infos@paypal.at Subject: Important Date: 12 Nov 03:05 =20 Von: PayPal Datum: 12/11/2012 Wir m=F6chten Sie dar=FCber informier= en, dass ... 
For more information contact yossin@siverge.com ----=edf56325-6c7e-48e2-844b-610c3900c4d3-- From owner-freebsd-fs@FreeBSD.ORG Mon Nov 12 04:42:22 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 2C06EB32 for ; Mon, 12 Nov 2012 04:42:22 +0000 (UTC) (envelope-from jhellenthal@dataix.net) Received: from mail-ia0-f182.google.com (mail-ia0-f182.google.com [209.85.210.182]) by mx1.freebsd.org (Postfix) with ESMTP id D317B8FC08 for ; Mon, 12 Nov 2012 04:42:21 +0000 (UTC) Received: by mail-ia0-f182.google.com with SMTP id k10so5390567iag.13 for ; Sun, 11 Nov 2012 20:42:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=dataix.net; s=rsa; h=subject:references:from:content-type:x-mailer:in-reply-to :message-id:date:cc:content-transfer-encoding:mime-version; bh=HnnhEmurFhWS8FNrLrXg2RUOUS88I4QGd9ybrK4cO1c=; b=SJlgGHryuXJuHphx7Fg02gMbYbOyKaQXJHFBgp2j1aiWS39L+55lHUdkBpceJjazIr ejD088LCpxWfNUX01dH7DU0EvAnWzqE0M44+aWYeN/q4ph+A430A8acNu4JSUjukqo7b p4z28lhYyfqbUhtxdQwz8MND6b1O5qrMD8sCQ= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=subject:references:from:content-type:x-mailer:in-reply-to :message-id:date:cc:content-transfer-encoding:mime-version :x-gm-message-state; bh=HnnhEmurFhWS8FNrLrXg2RUOUS88I4QGd9ybrK4cO1c=; b=GEuLRS8wgC1YqdCJDfckiq5O45Sp5tVAnl2xuxg13kW9VLvH6R8pTjqgrfo++di7Iu oFWxVGMS+DfHlpZC+2tm3ZanN82myFlLobwFbyDh/XutDcPw5F8bwmW7MqvZhcr4XeF7 zkfdRsytxwqq9390+xpgiYqClAsJ12w9ld4BW9kK7LYYHapchZn4ACNL9tPnNwUHMWBt Kn8THNzUCu5SlWrLZRPWNIaDCho1BD5XOr6qZB0n6C/7dBdYZx5WKXH3MOQzmaLQRGWN ETNDeldEfgbB0LeoTua+s+AyiWAlDBhBJV1PTj5KIdwLVZdX7kMTItuIkBYgV1+RJMar gabw== Received: by 10.50.12.138 with SMTP id y10mr6750269igb.58.1352695340989; Sun, 11 Nov 2012 20:42:20 -0800 (PST) Received: from [192.168.32.64] (adsl-99-181-130-210.dsl.klmzmi.sbcglobal.net. [99.181.130.210]) by mx.google.com with ESMTPS id u4sm5488107igw.6.2012.11.11.20.42.20 (version=TLSv1/SSLv3 cipher=OTHER); Sun, 11 Nov 2012 20:42:20 -0800 (PST) Subject: Re: Policy Breaches, Malformed, Spam Type - Zero Day, Routing, Encryption, Undetermined, Spam, Malformed Mime, Spam Type - Phish, Spam Type - Pornographic, Suspect Summary Digest: 1 Messages References: From: Jason Hellenthal Content-Type: text/plain; charset=utf-8 X-Mailer: iPhone Mail (8C148) In-Reply-To: Message-Id: <82C23EC8-6AB7-4402-9713-4BA5DE2AA143@DataIX.net> Date: Sun, 11 Nov 2012 23:42:16 -0500 Cc: "freebsd-fs@freebsd.org" Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (iPhone Mail 8C148) X-Gm-Message-State: ALoCoQmD5xmZtzqgGKZHP0ZfXx+0dwfvGqujOGIBmCENW8NkqazvoxkEu2KTg7BUGRAxQYgaBD8y X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Nov 2012 04:42:22 -0000 These turds need to learn a lesson. 
--=20 Jason Hellenthal JJH48-ARIN - (2^(N-1)) On Nov 11, 2012, at 23:13, mailmarshal@siverge.com wrote: > Policy Breaches, Malformed, Spam Type - Zero Day, Routing, Encryption, Und= etermined, Spam, Malformed Mime, Spam Type - Phish, Spam Type - Pornographic= , Suspect Folder Summary Digest for freebsd-fs@freebsd.org sent on Monday, N= ovember 12, 2012 > The emails listed below have been placed by MailMarshal in your Policy Bre= aches, Malformed, Spam Type - Zero Day, Routing, Encryption, Undetermined, S= pam, Malformed Mime, Spam Type - Phish, Spam Type - Pornographic, Suspect Fo= lder. They will be deleted after 7 days. > To view your quarantine mails go to http://mrelay/Spam/ >=20 > From: infos@paypal.at > Subject: Important > Date: 12 Nov 03:05 > Von: PayPal Datum: 12/11/2012 Wir m=C3=B6chten Sie dar=C3=BCber informi= eren, dass ... >=20 >=20 >=20 > For more information contact yossin@siverge.com >=20 >=20 >=20 >=20 >=20 >=20 >=20 >=20 >=20 > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Mon Nov 12 04:48:56 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 33DC8BC7; Mon, 12 Nov 2012 04:48:56 +0000 (UTC) (envelope-from gjb@FreeBSD.org) Received: from onyx.glenbarber.us (onyx.glenbarber.us [IPv6:2607:fc50:1000:c200::face]) by mx1.freebsd.org (Postfix) with ESMTP id E4FDB8FC14; Mon, 12 Nov 2012 04:48:55 +0000 (UTC) Received: from glenbarber.us (unknown [IPv6:2001:470:8:1205:2:2:0:100]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) (Authenticated sender: gjb) by onyx.glenbarber.us (Postfix) with ESMTPSA id 395A123F654; Sun, 11 Nov 2012 23:48:54 -0500 (EST) Date: Sun, 11 Nov 2012 23:48:51 -0500 From: Glen Barber To: Jason Hellenthal Subject: Re: Policy Breaches, Malformed, Spam Type - Zero Day, Routing, Encryption, Undetermined, Spam, Malformed Mime, Spam Type - Phish, Spam Type - Pornographic, Suspect Summary Digest: 1 Messages Message-ID: <20121112044851.GC1372@glenbarber.us> References: <82C23EC8-6AB7-4402-9713-4BA5DE2AA143@DataIX.net> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="vEao7xgI/oilGqZ+" Content-Disposition: inline In-Reply-To: <82C23EC8-6AB7-4402-9713-4BA5DE2AA143@DataIX.net> X-Operating-System: FreeBSD 10.0-CURRENT amd64 User-Agent: Mutt/1.5.21 (2010-09-15) Cc: "freebsd-fs@freebsd.org" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Nov 2012 04:48:56 -0000 --vEao7xgI/oilGqZ+ Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sun, Nov 11, 2012 at 11:42:16PM -0500, Jason Hellenthal wrote: > These turds need to learn a lesson. >=20 Same for the people responding to/acknowledging their garbage email. 
Glen --vEao7xgI/oilGqZ+ Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iQEcBAEBCAAGBQJQoH+zAAoJEFJPDDeguUajq8YH/jI+XvxcyX5a5dGKDWnW413a VwM/3gx1UsChxyoJqsHP6FzMqUnAl7U8QdgChMIGhyahAqkQYNT/vjzvn0M/wqe7 zEO64csOaK9KMRhFx90zRNp+c9c1uIDzVa7IFgbY4VFMkZ/MX/R+q01t7/tn32WS Y7IXuWkfmG6X4zcQvOO83WsEA1f3ydH/fLjn25062YrRMaPBpxe5zGBef5ZzrbhZ KK4q9IBBYIT2RHHtrZxhHtj5cHGZMUFpK8Ts3YZ5wJ4iKQ5rZtbAZWuHwKiCxx7D c+0O/kqPmzo+suMlmRQ9Ox7+d0GokzD37x08P2ID4iQW9pzAABQ3GXZm15eA9mY= =53SL -----END PGP SIGNATURE----- --vEao7xgI/oilGqZ+-- From owner-freebsd-fs@FreeBSD.ORG Mon Nov 12 04:57:58 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 65D3FC8C for ; Mon, 12 Nov 2012 04:57:58 +0000 (UTC) (envelope-from jhellenthal@dataix.net) Received: from mail-ie0-f182.google.com (mail-ie0-f182.google.com [209.85.223.182]) by mx1.freebsd.org (Postfix) with ESMTP id 144DA8FC14 for ; Mon, 12 Nov 2012 04:57:57 +0000 (UTC) Received: by mail-ie0-f182.google.com with SMTP id k10so10950107iea.13 for ; Sun, 11 Nov 2012 20:57:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=dataix.net; s=rsa; h=references:in-reply-to:mime-version:content-transfer-encoding :content-type:message-id:cc:x-mailer:from:subject:date:to; bh=/q+boCZ0ajz+ht4Q3v2+4mEO9jSam/3ZxWIyjiXhiLU=; b=A8ILErrhwf7M4uXh6tfm91kdkyUcVGW8sVRxzo6IqOTP1nwg5qJK1uW/Zn5gR1EeR5 veUvi77vfi7PhPj1PH+iVNPZAZiZKb3pVKCgL2cD173VG/gmgNbVznDa31id0wdkPs9C +pH6Y6gCDY1TuBDJWYEjFPbrtuRgDZKM9o9j0= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=references:in-reply-to:mime-version:content-transfer-encoding :content-type:message-id:cc:x-mailer:from:subject:date:to :x-gm-message-state; bh=/q+boCZ0ajz+ht4Q3v2+4mEO9jSam/3ZxWIyjiXhiLU=; b=ScfVdbxPUG0jaolZddOhlpsg0wyo44gCoeACWMeshsSIpr2MkSI0pRQrnkRI95xYtu 8eFFMhQkZKRK9SsTIifX3TGqi7BE5OlM9vuj16T7Vkg8Mitke3WDw9+G9j4sDCQT1+Ro pXkT7idR/lagv/BFJCXspN1Fx+/BcCMOh9+lJ9+fhF3pIfp3RmIyeT0ApbcJ2TpQzMBb s1OVugZUWNFT10JNNm7QVv5MyXlYRMN1k05ea29NrIP13oLo1aluTwG4MK158oY1dJ/d SxCNpaz4jM3c637ehN4Gbe6wIcjB30kU4aKFS6MdHreFVEE6YptU8OlobmxhOGOYy2g5 SUxg== Received: by 10.50.151.238 with SMTP id ut14mr6799173igb.58.1352696277242; Sun, 11 Nov 2012 20:57:57 -0800 (PST) Received: from DataIX.net (adsl-99-181-130-210.dsl.klmzmi.sbcglobal.net. 
[99.181.130.210]) by mx.google.com with ESMTPS id hg2sm7468408igc.3.2012.11.11.20.57.55 (version=TLSv1/SSLv3 cipher=OTHER); Sun, 11 Nov 2012 20:57:55 -0800 (PST) Received: from [192.168.32.64] ([192.168.32.64]) (authenticated bits=0) by DataIX.net (8.14.5/8.14.5) with ESMTP id qAC4vqKP027375 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO); Sun, 11 Nov 2012 23:57:53 -0500 (EST) (envelope-from jhellenthal@DataIX.net) References: <82C23EC8-6AB7-4402-9713-4BA5DE2AA143@DataIX.net> <20121112044851.GC1372@glenbarber.us> In-Reply-To: <20121112044851.GC1372@glenbarber.us> Mime-Version: 1.0 (iPhone Mail 8C148) Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=us-ascii Message-Id: X-Mailer: iPhone Mail (8C148) From: Jason Hellenthal Subject: Re: Policy Breaches, Malformed, Spam Type - Zero Day, Routing, Encryption, Undetermined, Spam, Malformed Mime, Spam Type - Phish, Spam Type - Pornographic, Suspect Summary Digest: 1 Messages Date: Sun, 11 Nov 2012 23:57:50 -0500 To: Glen Barber X-Gm-Message-State: ALoCoQlrTAaBr66aIA6KBSFR9pKoK5HFUVH83sept8tx85zCbcMdZBt9NB3o62gkvmtY9xNd+Qrw Cc: "freebsd-fs@freebsd.org" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Nov 2012 04:57:58 -0000 Yeah but I cut their name out. -- Jason Hellenthal JJH48-ARIN - (2^(N-1)) On Nov 11, 2012, at 23:48, Glen Barber wrote: > On Sun, Nov 11, 2012 at 11:42:16PM -0500, Jason Hellenthal wrote: >> These turds need to learn a lesson. >> > > Same for the people responding to/acknowledging their garbage email. > > Glen > From owner-freebsd-fs@FreeBSD.ORG Mon Nov 12 08:06:00 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 9363D68C for ; Mon, 12 Nov 2012 08:06:00 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.3.230]) by mx1.freebsd.org (Postfix) with ESMTP id BF3638FC08 for ; Mon, 12 Nov 2012 08:05:58 +0000 (UTC) Received: from dcave.digsys.bg (dcave.digsys.bg [192.92.129.5]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.5/8.14.5) with ESMTP id qAC85kK9088455 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Mon, 12 Nov 2012 10:05:47 +0200 (EET) (envelope-from daniel@digsys.bg) Message-ID: <50A0ADDA.9040205@digsys.bg> Date: Mon, 12 Nov 2012 10:05:46 +0200 From: Daniel Kalchev User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:10.0.10) Gecko/20121029 Thunderbird/10.0.10 MIME-Version: 1.0 To: Freddie Cash Subject: Re: ZFS HBAs + LSI chip sets (Was: ZFS hang (system #2)) References: <1350698905.86715.33.camel@btw.pki2.com> <1350711509.86715.59.camel@btw.pki2.com> <50825598.3070505@FreeBSD.org> <1350744349.88577.10.camel@btw.pki2.com> <1350765093.86715.69.camel@btw.pki2.com> <508322EC.4080700@FreeBSD.org> <1350778257.86715.106.camel@btw.pki2.com> <5084F6D5.5080400@digsys.bg> In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Nov 2012 08:06:00 -0000 On 22.10.12 16:47, Freddie Cash wrote: > > I'll double-check when I get to work, but I'm pretty 
sure it's 10.something.
>
> On Oct 22, 2012 12:34 AM, "Daniel Kalchev" wrote:
>
>     On 21.10.12 09:52, Freddie Cash wrote:
>     [...]
>     All three run without any serious issues. The only issues
>     we've had are 3, maybe 4, situations where I've tried to
>     destroy multi-TB filesystems without enough RAM in the
>     machine. We're now running a minimum of 32 GB of RAM with 64
>     GB in one box.
>
>     What is the firmware on your LSI2008 controllers?
>
>     I am having weird situation with one server that has LSI2008, on
>     9-stable and all SSD configuration. One or two of the drives would
>     drop off the bus for no reason sometimes few times a day and
>     because the current driver ignores bus reset, someone has to
>     physically remove and re-insert the drives for them to come back.
>     Real pain.
>     My firmware version is 12.00.00.00 -- perhaps it is buggy?
>

As weird as it sounds, I discovered that my SSD-only zpool (raidz1) was
using ashift=9. So, changed to ashift=12 and not seen disconnects
anymore for a week now. Is 4k good for these SSDs? Or 8k is better, or
larger?

It seems it's really an SSD firmware problem, as the SSDs are likely
doing more work when used with 512b sectors and from time to time fail
to communicate properly with the bus. The SSDs are OCZ-VERTEX4
(firmware 1.5).

Sometimes it seems to be drive related problem and perhaps the mps
driver/hardware is too sensitive to drive issues.

Daniel
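For anyone wanting to reproduce Daniel's fix on a new pool: at the time of
this thread the usual way to force ashift=12 was the gnop(8) trick sketched
below. The device names (da0-da3) and pool name are placeholders, and an
existing pool cannot be converted in place - the vdevs have to be recreated
for the new ashift to take effect.

    # create a temporary 4k-sector provider on one member disk
    gnop create -S 4096 da0
    # building the vdev with the .nop device forces ashift=12 for that vdev
    zpool create tank raidz1 da0.nop da1 da2 da3
    # the gnop provider is only needed at creation time
    zpool export tank
    gnop destroy da0.nop
    zpool import tank
    # confirm the result
    zdb | grep ashift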
Kalchev" > As weird as it sounds, I discovered that my SSD-only zpool (raidz1) was > using ashift=9. So, changed to ashift=12 and not seen disconnects > anymore for a week now. Is 4k good for these SSDs? Or 8k is better, or > larger? > > It seems it's really an SSD firmware problem, as the SSDs are likely > doing more work when used with 512b sectors and from time to time fail > to communicate properly with the bus. The SSDs are OCZ-VERTEX4 (firmware > 1.5). > > Sometimes it seems to be drive related problem and perhaps the mps > driver/hardware is too sensitive to drive issues. I don't know any SSD or drives for that matter using larger than 4k sectors. Could you post the output from:- camcontrol identify I've got a list of other 4k drives to add quirks for so might as well include this one while I'm at it :) Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-fs@FreeBSD.ORG Mon Nov 12 09:56:36 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 0222DBD0 for ; Mon, 12 Nov 2012 09:56:35 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.3.230]) by mx1.freebsd.org (Postfix) with ESMTP id 31C118FC15 for ; Mon, 12 Nov 2012 09:56:34 +0000 (UTC) Received: from dcave.digsys.bg (dcave.digsys.bg [192.92.129.5]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.5/8.14.5) with ESMTP id qAC9uSN5090011 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Mon, 12 Nov 2012 11:56:29 +0200 (EET) (envelope-from daniel@digsys.bg) Message-ID: <50A0C7CC.5020108@digsys.bg> Date: Mon, 12 Nov 2012 11:56:28 +0200 From: Daniel Kalchev User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:10.0.10) Gecko/20121029 Thunderbird/10.0.10 MIME-Version: 1.0 To: Steven Hartland Subject: Re: ZFS HBAs + LSI chip sets (Was: ZFS hang (system #2)) References: <1350698905.86715.33.camel@btw.pki2.com> <1350711509.86715.59.camel@btw.pki2.com> <50825598.3070505@FreeBSD.org> <1350744349.88577.10.camel@btw.pki2.com> <1350765093.86715.69.camel@btw.pki2.com> <508322EC.4080700@FreeBSD.org> <1350778257.86715.106.camel@btw.pki2.com> <5084F6D5.5080400@digsys.bg> <50A0ADDA.9040205@digsys.bg> <6E97CF2618534750AE82D23332C6CADC@multiplay.co.uk> In-Reply-To: <6E97CF2618534750AE82D23332C6CADC@multiplay.co.uk> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Nov 2012 09:56:36 -0000 On 12.11.12 11:18, Steven Hartland wrote: > ----- Original Message ----- From: "Daniel Kalchev" > >> As weird as it sounds, I discovered that my SSD-only zpool (raidz1) >> was using ashift=9. So, changed to ashift=12 and not seen disconnects >> anymore for a week now. Is 4k good for these SSDs? Or 8k is better, >> or larger? 
>> >> It seems it's really an SSD firmware problem, as the SSDs are likely >> doing more work when used with 512b sectors and from time to time >> fail to communicate properly with the bus. The SSDs are OCZ-VERTEX4 >> (firmware 1.5). >> >> Sometimes it seems to be drive related problem and perhaps the mps >> driver/hardware is too sensitive to drive issues. > > I don't know any SSD or drives for that matter using larger than 4k > sectors. > > Could you post the output from:- > camcontrol identify > > I've got a list of other 4k drives to add quirks for so might as well > include this one while I'm at it :) Quirks, yes. camcontrol identify da0 returns nothing :) camcontrol inquiry da0 returns pass0: Fixed Direct Access SCSI-6 device pass0: Serial Number OCZ-9DS07S644P10JV16 pass0: 600.000MB/s transfers, Command Queueing Enabled smartctl -a /dev/da0 returns more useful info smartctl 6.0 2012-10-10 r3643 [FreeBSD 9.1-PRERELEASE amd64] (local build) Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org === START OF INFORMATION SECTION === Model Family: Indilinx Everest/Martini based SSDs Device Model: OCZ-VERTEX4 Serial Number: OCZ-9DS07S644P10JV16 LU WWN Device Id: 5 e83a97 2e1c46899 Firmware Version: 1.5 User Capacity: 128,035,676,160 bytes [128 GB] Sector Size: 512 bytes logical/physical Rotation Rate: Solid State Device Device is: In smartctl database [for details use: -P show] ATA Version is: ACS-2 (minor revision not indicated) SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s) Local Time is: Mon Nov 12 11:52:32 2012 EET SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x00) Offline data collection activity was never started. Auto Offline Data Collection: Disabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: ( 0) seconds. Offline data collection capabilities: (0x1d) SMART execute Offline immediate. No Auto Offline data collection support. Abort Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. No Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x00) Error logging NOT supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 0) minutes. Extended self-test routine recommended polling time: ( 0) minutes. SMART Attributes Data Structure revision number: 18 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x0000 006 000 000 Old_age Offline - 6 3 Spin_Up_Time 0x0000 100 100 000 Old_age Offline - 0 4 Start_Stop_Count 0x0000 100 100 000 Old_age Offline - 0 5 Reallocated_Sector_Ct 0x0000 100 100 000 Old_age Offline - 0 9 Power_On_Hours 0x0000 100 100 000 Old_age Offline - 2323 12 Power_Cycle_Count 0x0000 100 100 000 Old_age Offline - 23 232 Lifetime_Writes 0x0000 100 100 000 Old_age Offline - 101076764290 233 Media_Wearout_Indicator 0x0000 090 000 000 Old_age Offline - 90 SMART Error Log not supported Warning! SMART Self-Test Log Structure error: invalid SMART checksum. 
SMART Self-test log structure revision number 1 No self-tests have been logged. [To run self-tests, use: smartctl -t] Selective Self-tests/Logging not supported Daniel From owner-freebsd-fs@FreeBSD.ORG Mon Nov 12 10:11:33 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 7CFEAEBE for ; Mon, 12 Nov 2012 10:11:33 +0000 (UTC) (envelope-from prvs=1663cb00d2=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 0480E8FC13 for ; Mon, 12 Nov 2012 10:11:32 +0000 (UTC) Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50001030009.msg for ; Mon, 12 Nov 2012 10:11:29 +0000 X-Spam-Processed: mail1.multiplay.co.uk, Mon, 12 Nov 2012 10:11:29 +0000 (not processed: message from valid local sender) X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=1663cb00d2=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: <4812812DAF724B0B9051BEF0C6360149@multiplay.co.uk> From: "Steven Hartland" To: "Daniel Kalchev" References: <1350698905.86715.33.camel@btw.pki2.com> <1350711509.86715.59.camel@btw.pki2.com> <50825598.3070505@FreeBSD.org> <1350744349.88577.10.camel@btw.pki2.com> <1350765093.86715.69.camel@btw.pki2.com> <508322EC.4080700@FreeBSD.org> <1350778257.86715.106.camel@btw.pki2.com> <5084F6D5.5080400@digsys.bg> <50A0ADDA.9040205@digsys.bg> <6E97CF2618534750AE82D23332C6CADC@multiplay.co.uk> <50A0C7CC.5020108@digsys.bg> Subject: Re: ZFS HBAs + LSI chip sets (Was: ZFS hang (system #2)) Date: Mon, 12 Nov 2012 10:11:40 -0000 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=response Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Nov 2012 10:11:33 -0000 ----- Original Message ----- From: "Daniel Kalchev" .. >>> Sometimes it seems to be drive related problem and perhaps the mps >>> driver/hardware is too sensitive to drive issues. >> >> I don't know any SSD or drives for that matter using larger than 4k >> sectors. >> >> Could you post the output from:- >> camcontrol identify >> >> I've got a list of other 4k drives to add quirks for so might as well >> include this one while I'm at it :) > > Quirks, yes. > > camcontrol identify da0 > > returns nothing :) Ah yes sorry forgot thats only working here, something that will hopefully make it into the tree relatively soon :) > camcontrol inquiry da0 > > returns > > pass0: Fixed Direct Access SCSI-6 device > pass0: Serial Number OCZ-9DS07S644P10JV16 > pass0: 600.000MB/s transfers, Command Queueing Enabled Cool that's what I need (the serial number) thanks. Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 
In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-fs@FreeBSD.ORG Mon Nov 12 11:06:43 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 5BD6AA06 for ; Mon, 12 Nov 2012 11:06:43 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 412A58FC23 for ; Mon, 12 Nov 2012 11:06:43 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id qACB6hOj000335 for ; Mon, 12 Nov 2012 11:06:43 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id qACB6gZB000333 for freebsd-fs@FreeBSD.org; Mon, 12 Nov 2012 11:06:42 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 12 Nov 2012 11:06:42 GMT Message-Id: <201211121106.qACB6gZB000333@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-fs@FreeBSD.org Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Nov 2012 11:06:43 -0000 Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. 
Description -------------------------------------------------------------------------------- o kern/173363 fs [zfs] [panic] Panic on 'zpool replace' on readonly poo o kern/173254 fs [zfs] [patch] Upgrade requests used in ZFS trim map ba o kern/173234 fs [zfs] [patch] Allow filtering of properties on zfs rec o kern/173136 fs [unionfs] mounting above the NFS read-only share panic o kern/172348 fs [unionfs] umount -f of filesystem in use with readonly o kern/172334 fs [unionfs] unionfs permits recursive union mounts; caus o kern/172259 fs [zfs] [patch] ZFS fails to receive valid snapshots (pa o kern/171626 fs [tmpfs] tmpfs should be noisier when the requested siz o kern/171415 fs [zfs] zfs recv fails with "cannot receive incremental o kern/170945 fs [gpt] disk layout not portable between direct connect o kern/170914 fs [zfs] [patch] Import patchs related with issues 3090 a o kern/170912 fs [zfs] [patch] unnecessarily setting DS_FLAG_INCONSISTE o bin/170778 fs [zfs] [panic] FreeBSD panics randomly o kern/170680 fs [nfs] Multiple NFS Client bug in the FreeBSD 7.4-RELEA o kern/170497 fs [xfs][panic] kernel will panic whenever I ls a mounted o kern/170238 fs [zfs] [panic] Panic when deleting data o kern/169945 fs [zfs] [panic] Kernel panic while importing zpool (afte o kern/169480 fs [zfs] ZFS stalls on heavy I/O o kern/169398 fs [zfs] Can't remove file with permanent error o kern/169339 fs panic while " : > /etc/123" o kern/169319 fs [zfs] zfs resilver can't complete o kern/168947 fs [nfs] [zfs] .zfs/snapshot directory is messed up when o kern/168942 fs [nfs] [hang] nfsd hangs after being restarted (not -HU o kern/168158 fs [zfs] incorrect parsing of sharenfs options in zfs (fs o kern/167979 fs [ufs] DIOCGDINFO ioctl does not work on 8.2 file syste o kern/167977 fs [smbfs] mount_smbfs results are differ when utf-8 or U o kern/167688 fs [fusefs] Incorrect signal handling with direct_io o kern/167685 fs [zfs] ZFS on USB drive prevents shutdown / reboot o kern/167612 fs [portalfs] The portal file system gets stuck inside po o kern/167272 fs [zfs] ZFS Disks reordering causes ZFS to pick the wron o kern/167260 fs [msdosfs] msdosfs disk was mounted the second time whe o kern/167109 fs [zfs] [panic] zfs diff kernel panic Fatal trap 9: gene o kern/167105 fs [nfs] mount_nfs can not handle source exports wiht mor o kern/167067 fs [zfs] [panic] ZFS panics the server o kern/167066 fs [zfs] ZVOLs not appearing in /dev/zvol o kern/167065 fs [zfs] boot fails when a spare is the boot disk o kern/167048 fs [nfs] [patch] RELEASE-9 crash when using ZFS+NULLFS+NF o kern/166912 fs [ufs] [panic] Panic after converting Softupdates to jo o kern/166851 fs [zfs] [hang] Copying directory from the mounted UFS di o kern/166477 fs [nfs] NFS data corruption. 
o kern/165950 fs [ffs] SU+J and fsck problem o kern/165923 fs [nfs] Writing to NFS-backed mmapped files fails if flu o kern/165521 fs [zfs] [hang] livelock on 1 Gig of RAM with zfs when 31 o kern/165392 fs Multiple mkdir/rmdir fails with errno 31 o kern/165087 fs [unionfs] lock violation in unionfs o kern/164472 fs [ufs] fsck -B panics on particular data inconsistency o kern/164370 fs [zfs] zfs destroy for snapshot fails on i386 and sparc o kern/164261 fs [nullfs] [patch] fix panic with NFS served from NULLFS o kern/164256 fs [zfs] device entry for volume is not created after zfs o kern/164184 fs [ufs] [panic] Kernel panic with ufs_makeinode o kern/163801 fs [md] [request] allow mfsBSD legacy installed in 'swap' o kern/163770 fs [zfs] [hang] LOR between zfs&syncer + vnlru leading to o kern/163501 fs [nfs] NFS exporting a dir and a subdir in that dir to o kern/162944 fs [coda] Coda file system module looks broken in 9.0 o kern/162860 fs [zfs] Cannot share ZFS filesystem to hosts with a hyph o kern/162751 fs [zfs] [panic] kernel panics during file operations o kern/162591 fs [nullfs] cross-filesystem nullfs does not work as expe o kern/162519 fs [zfs] "zpool import" relies on buggy realpath() behavi o kern/162362 fs [snapshots] [panic] ufs with snapshot(s) panics when g o kern/161968 fs [zfs] [hang] renaming snapshot with -r including a zvo p kern/161897 fs [zfs] [patch] zfs partition probing causing long delay o kern/161864 fs [ufs] removing journaling from UFS partition fails on o bin/161807 fs [patch] add option for explicitly specifying metadata o kern/161579 fs [smbfs] FreeBSD sometimes panics when an smb share is o kern/161533 fs [zfs] [panic] zfs receive panic: system ioctl returnin o kern/161438 fs [zfs] [panic] recursed on non-recursive spa_namespace_ o kern/161424 fs [nullfs] __getcwd() calls fail when used on nullfs mou o kern/161280 fs [zfs] Stack overflow in gptzfsboot o kern/161205 fs [nfs] [pfsync] [regression] [build] Bug report freebsd o kern/161169 fs [zfs] [panic] ZFS causes kernel panic in dbuf_dirty o kern/161112 fs [ufs] [lor] filesystem LOR in FreeBSD 9.0-BETA3 o kern/160893 fs [zfs] [panic] 9.0-BETA2 kernel panic o kern/160860 fs [ufs] Random UFS root filesystem corruption with SU+J o kern/160801 fs [zfs] zfsboot on 8.2-RELEASE fails to boot from root-o o kern/160790 fs [fusefs] [panic] VPUTX: negative ref count with FUSE o kern/160777 fs [zfs] [hang] RAID-Z3 causes fatal hang upon scrub/impo o kern/160706 fs [zfs] zfs bootloader fails when a non-root vdev exists o kern/160591 fs [zfs] Fail to boot on zfs root with degraded raidz2 [r o kern/160410 fs [smbfs] [hang] smbfs hangs when transferring large fil o kern/160283 fs [zfs] [patch] 'zfs list' does abort in make_dataset_ha o kern/159930 fs [ufs] [panic] kernel core o kern/159402 fs [zfs][loader] symlinks cause I/O errors o kern/159357 fs [zfs] ZFS MAXNAMELEN macro has confusing name (off-by- o kern/159356 fs [zfs] [patch] ZFS NAME_ERR_DISKLIKE check is Solaris-s o kern/159351 fs [nfs] [patch] - divide by zero in mountnfs() o kern/159251 fs [zfs] [request]: add FLETCHER4 as DEDUP hash option o kern/159077 fs [zfs] Can't cd .. with latest zfs version o kern/159048 fs [smbfs] smb mount corrupts large files o kern/159045 fs [zfs] [hang] ZFS scrub freezes system o kern/158839 fs [zfs] ZFS Bootloader Fails if there is a Dead Disk o kern/158802 fs amd(8) ICMP storm and unkillable process. 
o kern/158231 fs [nullfs] panic on unmounting nullfs mounted over ufs o f kern/157929 fs [nfs] NFS slow read o kern/157399 fs [zfs] trouble with: mdconfig force delete && zfs strip o kern/157179 fs [zfs] zfs/dbuf.c: panic: solaris assert: arc_buf_remov o kern/156797 fs [zfs] [panic] Double panic with FreeBSD 9-CURRENT and o kern/156781 fs [zfs] zfs is losing the snapshot directory, p kern/156545 fs [ufs] mv could break UFS on SMP systems o kern/156193 fs [ufs] [hang] UFS snapshot hangs && deadlocks processes o kern/156039 fs [nullfs] [unionfs] nullfs + unionfs do not compose, re o kern/155615 fs [zfs] zfs v28 broken on sparc64 -current o kern/155587 fs [zfs] [panic] kernel panic with zfs p kern/155411 fs [regression] [8.2-release] [tmpfs]: mount: tmpfs : No o kern/155199 fs [ext2fs] ext3fs mounted as ext2fs gives I/O errors o bin/155104 fs [zfs][patch] use /dev prefix by default when importing o kern/154930 fs [zfs] cannot delete/unlink file from full volume -> EN o kern/154828 fs [msdosfs] Unable to create directories on external USB o kern/154491 fs [smbfs] smb_co_lock: recursive lock for object 1 p kern/154228 fs [md] md getting stuck in wdrain state o kern/153996 fs [zfs] zfs root mount error while kernel is not located o kern/153753 fs [zfs] ZFS v15 - grammatical error when attempting to u o kern/153716 fs [zfs] zpool scrub time remaining is incorrect o kern/153695 fs [patch] [zfs] Booting from zpool created on 4k-sector o kern/153680 fs [xfs] 8.1 failing to mount XFS partitions o kern/153520 fs [zfs] Boot from GPT ZFS root on HP BL460c G1 unstable o kern/153418 fs [zfs] [panic] Kernel Panic occurred writing to zfs vol o kern/153351 fs [zfs] locking directories/files in ZFS o bin/153258 fs [patch][zfs] creating ZVOLs requires `refreservation' s kern/153173 fs [zfs] booting from a gzip-compressed dataset doesn't w o bin/153142 fs [zfs] ls -l outputs `ls: ./.zfs: Operation not support o kern/153126 fs [zfs] vdev failure, zpool=peegel type=vdev.too_small o kern/152022 fs [nfs] nfs service hangs with linux client [regression] o kern/151942 fs [zfs] panic during ls(1) zfs snapshot directory o kern/151905 fs [zfs] page fault under load in /sbin/zfs o bin/151713 fs [patch] Bug in growfs(8) with respect to 32-bit overfl o kern/151648 fs [zfs] disk wait bug o kern/151629 fs [fs] [patch] Skip empty directory entries during name o kern/151330 fs [zfs] will unshare all zfs filesystem after execute a o kern/151326 fs [nfs] nfs exports fail if netgroups contain duplicate o kern/151251 fs [ufs] Can not create files on filesystem with heavy us o kern/151226 fs [zfs] can't delete zfs snapshot o kern/151111 fs [zfs] vnodes leakage during zfs unmount o kern/150503 fs [zfs] ZFS disks are UNAVAIL and corrupted after reboot o kern/150501 fs [zfs] ZFS vdev failure vdev.bad_label on amd64 o kern/150390 fs [zfs] zfs deadlock when arcmsr reports drive faulted o kern/150336 fs [nfs] mountd/nfsd became confused; refused to reload n o kern/149208 fs mksnap_ffs(8) hang/deadlock o kern/149173 fs [patch] [zfs] make OpenSolaris installa o kern/149015 fs [zfs] [patch] misc fixes for ZFS code to build on Glib o kern/149014 fs [zfs] [patch] declarations in ZFS libraries/utilities o kern/149013 fs [zfs] [patch] make ZFS makefiles use the libraries fro o kern/148504 fs [zfs] ZFS' zpool does not allow replacing drives to be o kern/148490 fs [zfs]: zpool attach - resilver bidirectionally, and re o kern/148368 fs [zfs] ZFS hanging forever on 8.1-PRERELEASE o kern/148138 fs [zfs] zfs raidz pool commands freeze o kern/147903 
fs [zfs] [panic] Kernel panics on faulty zfs device o kern/147881 fs [zfs] [patch] ZFS "sharenfs" doesn't allow different " p kern/147560 fs [zfs] [boot] Booting 8.1-PRERELEASE raidz system take o kern/147420 fs [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt o kern/146941 fs [zfs] [panic] Kernel Double Fault - Happens constantly o kern/146786 fs [zfs] zpool import hangs with checksum errors o kern/146708 fs [ufs] [panic] Kernel panic in softdep_disk_write_compl o kern/146528 fs [zfs] Severe memory leak in ZFS on i386 o kern/146502 fs [nfs] FreeBSD 8 NFS Client Connection to Server s kern/145712 fs [zfs] cannot offline two drives in a raidz2 configurat o kern/145411 fs [xfs] [panic] Kernel panics shortly after mounting an f bin/145309 fs bsdlabel: Editing disk label invalidates the whole dev o kern/145272 fs [zfs] [panic] Panic during boot when accessing zfs on o kern/145246 fs [ufs] dirhash in 7.3 gratuitously frees hashes when it o kern/145238 fs [zfs] [panic] kernel panic on zpool clear tank o kern/145229 fs [zfs] Vast differences in ZFS ARC behavior between 8.0 o kern/145189 fs [nfs] nfsd performs abysmally under load o kern/144929 fs [ufs] [lor] vfs_bio.c + ufs_dirhash.c p kern/144447 fs [zfs] sharenfs fsunshare() & fsshare_main() non functi o kern/144416 fs [panic] Kernel panic on online filesystem optimization s kern/144415 fs [zfs] [panic] kernel panics on boot after zfs crash o kern/144234 fs [zfs] Cannot boot machine with recent gptzfsboot code o conf/144213 fs [rc.d] [patch] Disappearing zvols on reboot o kern/143825 fs [nfs] [panic] Kernel panic on NFS client o bin/143572 fs [zfs] zpool(1): [patch] The verbose output from iostat o kern/143212 fs [nfs] NFSv4 client strange work ... o kern/143184 fs [zfs] [lor] zfs/bufwait LOR o kern/142878 fs [zfs] [vfs] lock order reversal o kern/142597 fs [ext2fs] ext2fs does not work on filesystems with real o kern/142489 fs [zfs] [lor] allproc/zfs LOR o kern/142466 fs Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re o kern/142306 fs [zfs] [panic] ZFS drive (from OSX Leopard) causes two o kern/142068 fs [ufs] BSD labels are got deleted spontaneously o kern/141897 fs [msdosfs] [panic] Kernel panic. 
msdofs: file name leng o kern/141463 fs [nfs] [panic] Frequent kernel panics after upgrade fro o kern/141305 fs [zfs] FreeBSD ZFS+sendfile severe performance issues ( o kern/141091 fs [patch] [nullfs] fix panics with DIAGNOSTIC enabled o kern/141086 fs [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS o kern/141010 fs [zfs] "zfs scrub" fails when backed by files in UFS2 o kern/140888 fs [zfs] boot fail from zfs root while the pool resilveri o kern/140661 fs [zfs] [patch] /boot/loader fails to work on a GPT/ZFS- o kern/140640 fs [zfs] snapshot crash o kern/140068 fs [smbfs] [patch] smbfs does not allow semicolon in file o kern/139725 fs [zfs] zdb(1) dumps core on i386 when examining zpool c o kern/139715 fs [zfs] vfs.numvnodes leak on busy zfs p bin/139651 fs [nfs] mount(8): read-only remount of NFS volume does n o kern/139407 fs [smbfs] [panic] smb mount causes system crash if remot o kern/138662 fs [panic] ffs_blkfree: freeing free block o kern/138421 fs [ufs] [patch] remove UFS label limitations o kern/138202 fs mount_msdosfs(1) see only 2Gb o kern/136968 fs [ufs] [lor] ufs/bufwait/ufs (open) o kern/136945 fs [ufs] [lor] filedesc structure/ufs (poll) o kern/136944 fs [ffs] [lor] bufwait/snaplk (fsync) o kern/136873 fs [ntfs] Missing directories/files on NTFS volume o kern/136865 fs [nfs] [patch] NFS exports atomic and on-the-fly atomic p kern/136470 fs [nfs] Cannot mount / in read-only, over NFS o kern/135546 fs [zfs] zfs.ko module doesn't ignore zpool.cache filenam o kern/135469 fs [ufs] [panic] kernel crash on md operation in ufs_dirb o kern/135050 fs [zfs] ZFS clears/hides disk errors on reboot o kern/134491 fs [zfs] Hot spares are rather cold... o kern/133676 fs [smbfs] [panic] umount -f'ing a vnode-based memory dis o kern/132960 fs [ufs] [panic] panic:ffs_blkfree: freeing free frag o kern/132397 fs reboot causes filesystem corruption (failure to sync b o kern/132331 fs [ufs] [lor] LOR ufs and syncer o kern/132237 fs [msdosfs] msdosfs has problems to read MSDOS Floppy o kern/132145 fs [panic] File System Hard Crashes o kern/131441 fs [unionfs] [nullfs] unionfs and/or nullfs not combineab o kern/131360 fs [nfs] poor scaling behavior of the NFS server under lo o kern/131342 fs [nfs] mounting/unmounting of disks causes NFS to fail o bin/131341 fs makefs: error "Bad file descriptor" on the mount poin o kern/130920 fs [msdosfs] cp(1) takes 100% CPU time while copying file o kern/130210 fs [nullfs] Error by check nullfs o kern/129760 fs [nfs] after 'umount -f' of a stale NFS share FreeBSD l o kern/129488 fs [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c: o kern/129231 fs [ufs] [patch] New UFS mount (norandom) option - mostly o kern/129152 fs [panic] non-userfriendly panic when trying to mount(8) o kern/127787 fs [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs o bin/127270 fs fsck_msdosfs(8) may crash if BytesPerSec is zero o kern/127029 fs [panic] mount(8): trying to mount a write protected zi o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file o kern/125895 fs [ffs] [panic] kernel: panic: ffs_blkfree: freeing free s kern/125738 fs [zfs] [request] SHA256 acceleration in ZFS o kern/123939 fs [msdosfs] corrupts new files o kern/122380 fs [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash o bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, o bin/121898 fs [nullfs] pwd(1)/getcwd(2) fails with Permission denied o bin/121072 fs [smbfs] mount_smbfs(8) cannot normally convert the cha o kern/120483 fs [ntfs] [patch] NTFS filesystem 
locking changes o kern/120482 fs [ntfs] [patch] Sync style changes between NetBSD and F o kern/118912 fs [2tb] disk sizing/geometry problem with large array o kern/118713 fs [minidump] [patch] Display media size required for a k o kern/118318 fs [nfs] NFS server hangs under special circumstances o bin/118249 fs [ufs] mv(1): moving a directory changes its mtime o kern/118126 fs [nfs] [patch] Poor NFS server write performance o kern/118107 fs [ntfs] [panic] Kernel panic when accessing a file at N o kern/117954 fs [ufs] dirhash on very large directories blocks the mac o bin/117315 fs [smbfs] mount_smbfs(8) and related options can't mount o kern/117158 fs [zfs] zpool scrub causes panic if geli vdevs detach on o bin/116980 fs [msdosfs] [patch] mount_msdosfs(8) resets some flags f o conf/116931 fs lack of fsck_cd9660 prevents mounting iso images with o kern/116583 fs [ffs] [hang] System freezes for short time when using o bin/115361 fs [zfs] mount(8) gets into a state where it won't set/un o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o kern/113852 fs [smbfs] smbfs does not properly implement DFS referral o bin/113838 fs [patch] [request] mount(8): add support for relative p o bin/113049 fs [patch] [request] make quot(8) use getopt(3) and show o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b o kern/111843 fs [msdosfs] Long Names of files are incorrectly created o kern/111782 fs [ufs] dump(8) fails horribly for large filesystems s bin/111146 fs [2tb] fsck(8) fails on 6T filesystem o bin/107829 fs [2TB] fdisk(8): invalid boundary checking in fdisk / w o kern/106107 fs [ufs] left-over fsck_snapshot after unfinished backgro o kern/104406 fs [ufs] Processes get stuck in "ufs" state under persist o kern/104133 fs [ext2fs] EXT2FS module corrupts EXT2/3 filesystems o kern/103035 fs [ntfs] Directories in NTFS mounted disc images appear o kern/101324 fs [smbfs] smbfs sometimes not case sensitive when it's s o kern/99290 fs [ntfs] mount_ntfs ignorant of cluster sizes s bin/97498 fs [request] newfs(8) has no option to clear the first 12 o kern/97377 fs [ntfs] [patch] syntax cleanup for ntfs_ihash.c o kern/95222 fs [cd9660] File sections on ISO9660 level 3 CDs ignored o kern/94849 fs [ufs] rename on UFS filesystem is not atomic o bin/94810 fs fsck(8) incorrectly reports 'file system marked clean' o kern/94769 fs [ufs] Multiple file deletions on multi-snapshotted fil o kern/94733 fs [smbfs] smbfs may cause double unlock o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D o kern/92272 fs [ffs] [hang] Filling a filesystem while creating a sna o kern/91134 fs [smbfs] [patch] Preserve access and modification time a kern/90815 fs [smbfs] [patch] SMBFS with character conversions somet o kern/88657 fs [smbfs] windows client hang when browsing a samba shar o kern/88555 fs [panic] ffs_blkfree: freeing free frag on AMD 64 o kern/88266 fs [smbfs] smbfs does not implement UIO_NOCOPY and sendfi o bin/87966 fs [patch] newfs(8): introduce -A flag for newfs to enabl o kern/87859 fs [smbfs] System reboot while umount smbfs. o kern/86587 fs [msdosfs] rm -r /PATH fails with lots of small files o bin/85494 fs fsck_ffs: unchecked use of cg_inosused macro etc. 
o kern/80088 fs [smbfs] Incorrect file time setting on NTFS mounted vi o bin/74779 fs Background-fsck checks one filesystem twice and omits o kern/73484 fs [ntfs] Kernel panic when doing `ls` from the client si o bin/73019 fs [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino o kern/71774 fs [ntfs] NTFS cannot "see" files on a WinXP filesystem o bin/70600 fs fsck(8) throws files away when it can't grow lost+foun o kern/68978 fs [panic] [ufs] crashes with failing hard disk, loose po o kern/65920 fs [nwfs] Mounted Netware filesystem behaves strange o kern/65901 fs [smbfs] [patch] smbfs fails fsx write/truncate-down/tr o kern/61503 fs [smbfs] mount_smbfs does not work as non-root o kern/55617 fs [smbfs] Accessing an nsmb-mounted drive via a smb expo o kern/51685 fs [hang] Unbounded inode allocation causes kernel to loc o kern/36566 fs [smbfs] System reboot with dead smb mount and umount o bin/27687 fs fsck(8) wrapper is not properly passing options to fsc o kern/18874 fs [2TB] 32bit NFS servers export wrong negative values t 298 problems total. From owner-freebsd-fs@FreeBSD.ORG Mon Nov 12 11:57:09 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 8E3788EF; Mon, 12 Nov 2012 11:57:09 +0000 (UTC) (envelope-from jhs@berklix.com) Received: from tower.berklix.org (tower.berklix.org [83.236.223.114]) by mx1.freebsd.org (Postfix) with ESMTP id EEC568FC12; Mon, 12 Nov 2012 11:57:08 +0000 (UTC) Received: from mart.js.berklix.net (p5DCBD58D.dip.t-dialin.net [93.203.213.141]) (authenticated bits=0) by tower.berklix.org (8.14.2/8.14.2) with ESMTP id qACBuv2u094533; Mon, 12 Nov 2012 11:56:59 GMT (envelope-from jhs@berklix.com) Received: from fire.js.berklix.net (fire.js.berklix.net [192.168.91.41]) by mart.js.berklix.net (8.14.3/8.14.3) with ESMTP id qACBujR5026400; Mon, 12 Nov 2012 12:56:45 +0100 (CET) (envelope-from jhs@berklix.com) Received: from fire.js.berklix.net (localhost [127.0.0.1]) by fire.js.berklix.net (8.14.4/8.14.4) with ESMTP id qACBuCNO053019; Mon, 12 Nov 2012 12:56:19 +0100 (CET) (envelope-from jhs@fire.js.berklix.net) Message-Id: <201211121156.qACBuCNO053019@fire.js.berklix.net> To: Peter Jeremy Subject: Re: ZFS corruption due to lack of space? From: "Julian H. Stacey" Organization: http://berklix.com BSD Unix Linux Consultancy, Munich Germany User-agent: EXMH on FreeBSD http://berklix.com/free/ X-URL: http://www.berklix.com In-reply-to: Your message "Mon, 12 Nov 2012 08:35:15 +1100." <20121111213515.GC5594@server.rulingia.com> Date: Mon, 12 Nov 2012 12:56:11 +0100 Sender: jhs@berklix.com Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Nov 2012 11:57:09 -0000 Peter Jeremy wrote: > On 2012-Nov-02 09:30:04 -0000, Steven Hartland wr= > ote: > >From: "Peter Jeremy" > >> Many years ago, I wrote a simple utility that fills a raw disk with > >> a pseudo-random sequence and then verifies it. This sort of tool > > > >Sounds useful, got a link? > > Sorry, no. I never released it. But writing something like it is > quite easy. I wrote http://berklix.com/~jhs/src/bsd/jhs/bin/public/testblock/ There's no pseudo random, but sufficient for my media test needs, There's a bunch of similar tools in ports/ (but I wrote mine before FreeBSD existed so I've never looked). 
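For anyone who wants the pseudo-random variant, a rough sketch of such a fill-and-verify test is below. It is only an illustration (not Peter's utility and not testblock); the device name, size and seed are placeholders, and it destroys whatever is on the target device.

#!/bin/sh
# Fill a raw device with a reproducible pseudo-random stream, then verify it.
# WARNING: destroys all data on $DEV.
DEV=/dev/da9                # placeholder device
MB=1024                     # placeholder test size in megabytes
BYTES=$((MB * 1048576))
SEED=fill-verify-seed       # same seed always produces the same stream

genstream() {
        # Deterministic pseudo-random data: AES-CBC over /dev/zero with a
        # fixed passphrase and no salt, so the stream can be regenerated.
        openssl enc -aes-128-cbc -nosalt -pass pass:$SEED -in /dev/zero 2>/dev/null
}

# Fill phase: reblock into 1 MB writes so the raw device never sees a
# short, unaligned write.
genstream | head -c $BYTES | dd of=$DEV ibs=64k obs=1m

# Verify phase: regenerate the identical stream and compare SHA-256 digests.
want=$(genstream | head -c $BYTES | sha256 -q)
got=$(dd if=$DEV bs=1m count=$MB 2>/dev/null | sha256 -q)
[ "$want" = "$got" ] && echo "verify OK" || echo "verify FAILED"

A real media tester would also want direct I/O, per-block offsets in the pattern and error counting, but the skeleton stays about this small.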
Cheers, Julian -- Julian Stacey, BSD Unix Linux C Sys Eng Consultant, Munich http://berklix.com Reply below not above, like a play script. Indent old text with "> ". Send plain text. Not: HTML, multipart/alternative, base64, quoted-printable. From owner-freebsd-fs@FreeBSD.ORG Mon Nov 12 17:24:10 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id AA4E314C for ; Mon, 12 Nov 2012 17:24:10 +0000 (UTC) (envelope-from jas@cse.yorku.ca) Received: from bronze.cs.yorku.ca (bronze.cs.yorku.ca [130.63.95.34]) by mx1.freebsd.org (Postfix) with ESMTP id 7324F8FC08 for ; Mon, 12 Nov 2012 17:24:10 +0000 (UTC) Received: from [130.63.97.125] (ident=jas) by bronze.cs.yorku.ca with esmtp (Exim 4.76) (envelope-from ) id 1TXxk0-0003KM-34 for freebsd-fs@freebsd.org; Mon, 12 Nov 2012 12:24:08 -0500 Message-ID: <50A130B7.4080604@cse.yorku.ca> Date: Mon, 12 Nov 2012 12:24:07 -0500 From: Jason Keltz User-Agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:10.0.9) Gecko/20121011 Thunderbird/10.0.9 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: RHEL to FreeBSD file server Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -1.0 X-Spam-Level: - X-Spam-Report: Content preview: For the last few months, I've been working on and off learning about FreeBSD. The goal of my work is to swap out our current dual Red Hat Enterprise Linux file servers with FreeBSD. I'm ultimately hoping for the most reliable, high performance NFS file server that I can get. The fact that, in addition, I get to take advantage of ZFS is what I see as a a major bonus. [...] Content analysis details: (-1.0 points, 5.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 SHORTCIRCUIT Not all rules were run, due to a shortcircuited rule -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Nov 2012 17:24:10 -0000 For the last few months, I've been working on and off learning about FreeBSD. The goal of my work is to swap out our current dual Red Hat Enterprise Linux file servers with FreeBSD. I'm ultimately hoping for the most reliable, high performance NFS file server that I can get. The fact that, in addition, I get to take advantage of ZFS is what I see as a a major bonus. I only recently (thanks Rick!) became aware of this mailing list, and after reading a few months worth of postings, I'm a little "nervous" about stability of ZFS in FreeBSD, though I can see that many issues are tied to specific combinations of FreeBSD versions, driver versions, specific HBAs, etc. In addition, I'm a little bit concerned reading about perceived performance issues with the NFS implementation, yet, only recently, there was a new NFS implementation in FreeBSD. That being said, I've learned that in general, people don't often go posting about experiences that work well, so I'm trying to stay positive, and hoping that my plan is still for the best. I'm hoping to share some information about what I've done with the old file servers, and what I intend to do with the new one, and get some feedback from list members to see if I'm heading in the right direction. 
Even if you can't comment on anything I'm about to write about, but you can tell me about a positive experience you have running FreeBSD as an NFS file server with ZFS, that would be great!! My present (2008) file servers both contain LSI/3ware RAID controller cards, and several RAID units with disks arranged in RAID10 configuration. There are a total of about 1600 mounts across both servers. Home directories are "split" between the servers, but only on two ext3 filesystems. We are using NFSv3 at the moment, and because I don't use Kerberos, I run NFS over OpenVPN, mostly to protect the connection (though we use cipher none for performance). For cost effectiveness, we have a "manual failover" solution. That is, either file server has enough disk slots to "take over" for the other server. If a server is taken down, I can take the disks out of either server, place them in the other server, turn it on, and through scripting, either server can take over the IP/name/disks from the other server, and all the NFS clients resume as if both servers are running. It's not ideal, but I'll tell you - it's cost effective! Fast forward a few years... I'm looking to replace the above hardware completely. In terms of hardware, I've recently been able to acquire a new 12th generation Dell PowerEdge R720 server with 64 GB of memory and dual E5-2660 processors (2.20 Ghz). It has an integrated Dell H310 controller (FreeBSD mfi driver) - which is presently only used for a mirrored root configuration (2 x 500 GB NL SAS drives). I added 2 x LSI 9205-8e cards (LSISAS2308) to the server. The LSI cards were flashed to the latest LSI firmware. I also have 1 Dell MD1220 array with 24 x 900 GB 10K SAS drives for data. The server has 4 x 1 GB Intel NICs. I'm working with FreeBSD 9.1RC3 because I understand that the 9.1 series includes many important improvements, and a totally new driver for the LSI SAS HBA cards. I suspect that by the time the file server is ready to go live, 9.1 will be officially released. In terms of ZFS, in my testing, I have been using a single ZFS pool comprised of 11 mirrored vdevs - a total of 22 disks, with 2 spares (24 disks total). As I understand it, I should be able to get the optimal performance this way. I considered using multiple pools, but with multiple pools comes multiple ZIL, L2ARC, etc and reduction in the performance numbers. I've been told that people have far bigger ZFS pools than my 22 disk zpool. As I understand it, as storage requirements increase, I could easily add another MD1220 with an additional 11 x mirrored vdev pairs and "append" this to the original pool, giving me lots more space with little hassle. At the moment, I have each LSI 9205-8e serving half of the disks in the single MD1220 chassis in a split configuration - that is, 12 disks on each LSI HBA card. It's a little overkill, I think, but the primary reason for buying the second LSI HBA card was to ensure that I had a spare card in the event that the first card ever failed. I figured that I might as well use it to improve performance rather than sitting it on the shelf collecting dust. Should I get funds to purchase an additional MD1220 (another 24 disks), I was thinking of configuring 1 x 9205-8e per MD1220, which I'm sure is also overkill. However, in theory, if both sides of the mirrored vdevs were placed in separate MD1220s, I would expect this to give me the ultimate in performance. 
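To make that layout concrete, pool creation for 11 mirrored pairs plus 2 spares looks roughly like this. It is a sketch only: the pool name and the da* names are placeholders for however the MD1220 disks enumerate, and each pair would be split across the two HBAs.

# 11 two-way mirrors striped together, plus 2 hot spares (24 disks total).
zpool create tank \
    mirror da2  da3   mirror da4  da5   mirror da6  da7 \
    mirror da8  da9   mirror da10 da11  mirror da12 da13 \
    mirror da14 da15  mirror da16 da17  mirror da18 da19 \
    mirror da20 da21  mirror da22 da23 \
    spare  da24 da25

# Growing later is an append, e.g. another shelf's worth of mirrors:
# zpool add tank mirror da26 da27 mirror da28 da29 ...

ZFS stripes writes across all eleven mirror vdevs, which is where the expected performance comes from.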
In addition, should I lose one 9205-8e or one MD1220, I would expect that I would be able to "temporarily" continue in operation (while biting my nails without redundancy!!!). In addition, in my testing, I'm hoping to use NFSv4, which so far seems good. I have many, oh so many questions... 1) The new file server is relatively powerful. However, is one file server enough to handle a load of approximately 2000 connections? should I be looking at getting another server, or getting another server and another MD1220? How is 64 GB of memory when I'm talking about up to 2500-3000 ZFS filesystems on the box? I'm not using dedup and using minimal compression. 2) It is my intention to have 1 ZFS filesystem per user (so approx. 1800 right now)... Is this the way to go? It sure makes quotas easier! 3) I understand that I should be adding a SSD based ZIL. I don't have one right now. I've seen a lot of mixed information about what is the most cost effective solution that actually makes a difference. I'm wondering if someone could recommend a cost effective ZIL that works. It has to be 2.5" because all the disk slots in my configuration are 2.5". I believe someone recently recommended one of the newer Intel SSDs? As well, what size? (I understand that what complicates performance in any one brand of SSD is that the difference sizes perform differently)... Is there a problem if I put the ZIL in the file server head that is being managed by the mfi driver, even though it is ZIL for the disks managed by mps in the MD1220? 4) Under Linux, to be able to have a second server take over the disks from the first server with my "manual failover", I had to hard-code fsids on exports. Should I choose to do the same thing under FreeBSD, I'm told that the fsids on FreeBSD are generated based on a unique number for the file system type plus number generated by the file system -- but will this number remain the same for the filesystem if its exported from one system and imported into another ? 5) What would be the best recommended way of testing performance of the setup? I've done some really really basic testing using filebench.. local filebench fs: 42115: 77.169: IO Summary: 3139018 ops, 52261.125 ops/s, (4751/9502 r/w), 1265.8mb/s, 0us cpu/op, 3.3ms latency over NFS on a 100 mbps client: 27939: 182.854: IO Summary: 53254 ops, 887.492 ops/s, (81/162 r/w), 20.6mb/s, 876us cpu/op, 202.8ms latency over NFS on a 1 gigabit client: 4588: 84.732: IO Summary: 442488 ops, 7374.279 ops/s, (670/1341 r/w), 175.3mb/s, 491us cpu/op, 23.5ms latency ... I don't have the resources to write my own test suite, custom to our day to day operations, so I have to stick with one of the existing solutions. What would the best way to do this? Would simply connecting to the NFS server from several hundred clients, and running filebench be an "optimal" solution? Anyway, my apologies for the length of this e-mail. I've tried to "shorten" this as much as I could. I have so many questions! :) I'm hoping for any feedback that you might be able to provide, even if it's just one comment or two. Thanks for taking the time to read! 
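P.S. On question 2, the one-filesystem-per-user idea in concrete form, with placeholder pool and user names:

# One dataset per user, each with its own quota:
zfs create tank/home
zfs create -o quota=10G tank/home/alice
zfs create -o quota=10G tank/home/bob

# Changing one user's quota later is a single command:
zfs set quota=25G tank/home/alice
zfs get -r quota tank/home

The main cost is dataset count: operations that walk every dataset ('zfs list', mounting and sharing at boot, recursive snapshots) tend to slow down once there are a few thousand of them, which is part of what question 1 is really asking.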
Jason Keltz From owner-freebsd-fs@FreeBSD.ORG Mon Nov 12 18:20:42 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 8EBFBDD1 for ; Mon, 12 Nov 2012 18:20:42 +0000 (UTC) (envelope-from break19@gmail.com) Received: from mail-ie0-f182.google.com (mail-ie0-f182.google.com [209.85.223.182]) by mx1.freebsd.org (Postfix) with ESMTP id 4DD958FC0C for ; Mon, 12 Nov 2012 18:20:42 +0000 (UTC) Received: by mail-ie0-f182.google.com with SMTP id k10so12286006iea.13 for ; Mon, 12 Nov 2012 10:20:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:subject:references :in-reply-to:content-type:content-transfer-encoding; bh=FEmEuMdxqZ6/NSPWgIgmn7ziubX/IeYCliYb6pnA3tM=; b=fGp4AZnoQt2bV9Rp6Qz7T8Fvrb0hvLnqyPwtC0/9j+YV18LrD8VRd192ukPSodr3aw BOOhiHcGvvgzUxHPfeX+kg6iUMr3TnWYc9PbZIDOYZTIMZnkVSDH95+SXt4h9m2/p0GC nAiMV45pXlSd5DBG0b1JWw9qo6upCWX7bGiK/KpWtpprNnIx9htS34ui/XSFEwPVpJ4b 7KheYsdr66dN7wHEb4UUGYvQplMTOD0SlMUm5jsAVk33Kgz+rR4EDci/ug/N6u9kaKoi SQCXk9GYss0KbcQGdDOTYr3E2Xm0GfW9M0czB3R5dzLwluMBYeVEPg0E/qaGqnkq7y24 ZEvQ== Received: by 10.50.40.229 with SMTP id a5mr8840269igl.59.1352744441801; Mon, 12 Nov 2012 10:20:41 -0800 (PST) Received: from [192.168.0.199] (173-119-70-193.pools.spcsdns.net. [173.119.70.193]) by mx.google.com with ESMTPS id dq9sm6823088igc.5.2012.11.12.10.20.39 (version=SSLv3 cipher=OTHER); Mon, 12 Nov 2012 10:20:41 -0800 (PST) Message-ID: <50A13DC1.2040301@gmail.com> Date: Mon, 12 Nov 2012 12:19:45 -0600 From: Chuck Burns User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:15.0) Gecko/20120907 Thunderbird/15.0.1 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: RHEL to FreeBSD file server References: <50A130B7.4080604@cse.yorku.ca> In-Reply-To: <50A130B7.4080604@cse.yorku.ca> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Nov 2012 18:20:42 -0000 TBH, I've never gotten my hands on anything close to that much power, but I can tell you, from my experience with FreeBSD's NFS implementation on much lower-powered hardware, it uses less CPU on a system that ran debian stable for a bit. Said system is not powerful enough for ZFS, mind you.. However, I do have another FreeBSD system that is more of a desktop system. I have only 3x750G wd blue sata drives in it, in a raidz, and I've not had any io-related issues on it. Admittedly, my experiences are more "hobbyist" than enterprise, and although I do try to keep abreast of all this info, I'm so not sure how much that info helps.. I do, however, have the understanding that most of the performance issues have had to do with certain controllers simply not being configured correctly, or have the "right" firmware On a slightly unrelated note, "pics?" I would love to drool over this setup. 
:) -- Chuck Burns From owner-freebsd-fs@FreeBSD.ORG Mon Nov 12 21:20:37 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 7347D92F for ; Mon, 12 Nov 2012 21:20:37 +0000 (UTC) (envelope-from me@nikitosi.us) Received: from mail-qa0-f54.google.com (mail-qa0-f54.google.com [209.85.216.54]) by mx1.freebsd.org (Postfix) with ESMTP id 235C18FC14 for ; Mon, 12 Nov 2012 21:20:36 +0000 (UTC) Received: by mail-qa0-f54.google.com with SMTP id g24so307321qab.13 for ; Mon, 12 Nov 2012 13:20:35 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:from:date:message-id:subject:to:content-type :x-gm-message-state; bh=lkKzBqKN20sXwguPVBW+M/rweoOtPMYR7tTlWK1Mmr4=; b=Fd6J/aXeWWZWp1L3HBoHbFvkYCaP0m9le/58FAdUh7FghJOR8Ftr+4bYjjLafwCaNv pn4uvmPBYongJz0sQUcE86E/ZSWYnbw65P3bAm1NVkw6jxf/GJHNN+lqmOcUZEKTLiDJ FFUId201D721FwWH5na47oRoWwe+H42obwhKf03p3dF+z7C+cHv4eMtGaND0qKxLw5Qw nS7xM981X6UhJNNLsMebm5iWGC653JKauOfQwMUQi63ay3LxByPazwiC5ug07HgXhHPh jAQsfdwBV3gTwc79opmyJffJaQO7yccasFeL090QQlREj1S1JCez/FcceDf7hqmbtH95 pALA== Received: by 10.224.27.140 with SMTP id i12mr21976918qac.15.1352755235023; Mon, 12 Nov 2012 13:20:35 -0800 (PST) MIME-Version: 1.0 Received: by 10.49.29.230 with HTTP; Mon, 12 Nov 2012 13:20:13 -0800 (PST) From: nikitosiusis Date: Tue, 13 Nov 2012 00:20:13 +0300 Message-ID: Subject: "zpool add" safety checks are skipped if a pool was created with "-f" To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=UTF-8 X-Gm-Message-State: ALoCoQkKT6SQNt9I+OBgAwbT1d3/WrVxaOG7bv0AyH4bqBxuWT61TK6PprZ75BR3dh2micXVDtVV X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Nov 2012 21:20:37 -0000 Good day everyone. I managed to add a single device to a raidz2 pool by a typo(actually I wanted to add a cache device) without "-f" flag. This can be reproduced with md devices. # for a in {1..8}; do dd if=/dev/zero of=$a bs=1M count=96;done # dd if=/dev/zero of=9 bs=1M count=128 # ls -la -rw-r--r-- 1 root wheel 100663296 Nov 12 21:04 1 -rw-r--r-- 1 root wheel 100663296 Nov 12 21:04 2 -rw-r--r-- 1 root wheel 100663296 Nov 12 21:04 3 -rw-r--r-- 1 root wheel 100663296 Nov 12 21:04 4 -rw-r--r-- 1 root wheel 100663296 Nov 12 21:04 5 -rw-r--r-- 1 root wheel 100663296 Nov 12 21:04 6 -rw-r--r-- 1 root wheel 100663296 Nov 12 21:04 7 -rw-r--r-- 1 root wheel 100663296 Nov 12 21:04 8 -rw-r--r-- 1 root wheel 134217728 Nov 12 21:04 9 # for a in {1..9}; do mdconfig -f ~/tmp/$a; done Now we create a raidz pool with 8 identical drives. # zpool create testpool raidz2 md{1..8} # zpool add testpool md9 invalid vdev specification use '-f' to override the following errors: mismatched replication level: pool uses raidz and new vdev is disk It's ok. It doesn't allow to add a device. Now we create a new pool, but with devices of different size(I don't know what is the difference in size allowed, I used 32mb). # zpool create testpool raidz2 md{1..7} md9 invalid vdev specification use '-f' to override the following errors: raidz contains devices of different sizes # zpool create -f testpool raidz2 md{1..7} md9 # zpool add testpool md8 And we have no error here - single drive is added to the pool. Since it is an undoable action - it can ruin your pool and should be considered as a bug imho. Here is my result in production. 
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT root 3.63T 1.06T 2.57T 29% 1.00x ONLINE - raidz2 3.62T 1.06T 2.57T - ada0 - - - - ada1 - - - - ada3 - - - - ada4 - - - - ada5 - - - - ada6 - - - - ada7 - - - - ada8 - - - - da0p1 3.75G 76.5K 3.75G - # uname -rv 9.1-PRERELEASE FreeBSD 9.1-PRERELEASE #2: Thu Nov 8 13:50:55 UTC 2012 root@ex.a.nikitos.name:/usr/obj/usr/src/sys/GENERIC # zpool get version testpool NAME PROPERTY VALUE SOURCE testpool version 28 default # zfs get version testpool NAME PROPERTY VALUE SOURCE testpool version 5 - btw is there a chance to remove this device now? Regards. From owner-freebsd-fs@FreeBSD.ORG Tue Nov 13 02:44:31 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 24ECE3E5 for ; Tue, 13 Nov 2012 02:44:31 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) by mx1.freebsd.org (Postfix) with ESMTP id D03DC8FC08 for ; Tue, 13 Nov 2012 02:44:30 +0000 (UTC) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id qAD2Vn5O016177; Mon, 12 Nov 2012 20:31:49 -0600 (CST) Date: Mon, 12 Nov 2012 20:31:48 -0600 (CST) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: Jason Keltz Subject: Re: RHEL to FreeBSD file server In-Reply-To: <50A130B7.4080604@cse.yorku.ca> Message-ID: References: <50A130B7.4080604@cse.yorku.ca> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Mon, 12 Nov 2012 20:31:49 -0600 (CST) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 13 Nov 2012 02:44:31 -0000 I am not able to attest to the type of configuration you are building. It all sounds good to me and it would surely work fine with Solaris, and so likely with FreeBSD as well. The only concern might be that 64 GB of memory is not so much according to today's hardware standards. New hardware is easily able to fit 128 GB, 256 GB, or even 512 GB. If the working set from all those clients is bigger than will fit in 64 GB of RAM, then your disks will be working much harder than they should be. With zfs, disks don't do any reading (reads will stall) while zfs is writing a transaction group. A proper server should be doing mostly writing because key data is already in cache. With so many clients, make sure that your intent log FLASH devices are mirrored and be prepared to replace them periodically. 
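For reference, once suitable SSDs are in place the mirrored slog (and optional L2ARC) is one command each; pool and device names here are placeholders, not a hardware recommendation:

# Mirrored ZFS intent log (slog) plus un-mirrored cache devices:
zpool add tank log mirror da26 da27
zpool add tank cache da28 da29

# The new vdevs then appear under "logs" and "cache" in:
zpool status tank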
Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Tue Nov 13 03:47:20 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id ED4B43EC for ; Tue, 13 Nov 2012 03:47:20 +0000 (UTC) (envelope-from gary.buhrmaster@gmail.com) Received: from mail-ie0-f182.google.com (mail-ie0-f182.google.com [209.85.223.182]) by mx1.freebsd.org (Postfix) with ESMTP id AD5108FC08 for ; Tue, 13 Nov 2012 03:47:20 +0000 (UTC) Received: by mail-ie0-f182.google.com with SMTP id k10so13123701iea.13 for ; Mon, 12 Nov 2012 19:47:20 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=st8KdZL1jhvS7lVbud19EZUM3Bg/dXwWwRKBGL80z58=; b=qaSh/GSfqOfQWkLG63Sh1krmUpBxKqcluVFGbS1R+VnaocXEJyAYNzNHLiUXVhKLdD a5hFgIUnIfSHcOV/fa6YvRTnS0fEtzGTb0meTevOVuCUqv3QN38lN4oORx2KWP5Hrxz9 IqKQCT4oU14r/xwJZjj/OhvPpECwEkuWN/vb59WkztZuwJCd+++XxvYkkdiic/SQO+8f EJGvExyosRETdQ/5Ws52NqDWuIx/5m7GmMw16bEMnahAxJPp33FWu/t5mox0+sX01jJi HvJZjY5RC4jTSh8/PNl/6EFygWlZQnm09e4fDOtLisXYj1FO0g87rh/xxxO9gISy6XGu tlsA== MIME-Version: 1.0 Received: by 10.50.6.129 with SMTP id b1mr10034979iga.23.1352778439935; Mon, 12 Nov 2012 19:47:19 -0800 (PST) Received: by 10.42.239.3 with HTTP; Mon, 12 Nov 2012 19:47:19 -0800 (PST) In-Reply-To: References: Date: Mon, 12 Nov 2012 19:47:19 -0800 Message-ID: Subject: Re: "zpool add" safety checks are skipped if a pool was created with "-f" From: Gary Buhrmaster To: nikitosiusis Content-Type: text/plain; charset=ISO-8859-1 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 13 Nov 2012 03:47:21 -0000 On Mon, Nov 12, 2012 at 1:20 PM, nikitosiusis wrote: ... > btw is there a chance to remove this device now? I think this requires the block pointer rewrite code. The mythical zfs code that would provide numerous asked for capabilities. Gary From owner-freebsd-fs@FreeBSD.ORG Tue Nov 13 15:00:05 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 50059FE6 for ; Tue, 13 Nov 2012 15:00:05 +0000 (UTC) (envelope-from jas@cse.yorku.ca) Received: from bronze.cs.yorku.ca (bronze.cs.yorku.ca [130.63.95.34]) by mx1.freebsd.org (Postfix) with ESMTP id 18A458FC16 for ; Tue, 13 Nov 2012 15:00:04 +0000 (UTC) Received: from [130.63.97.125] (ident=jas) by bronze.cs.yorku.ca with esmtp (Exim 4.76) (envelope-from ) id 1TYHy1-0002f6-LY; Tue, 13 Nov 2012 09:59:57 -0500 Message-ID: <50A2606D.3040306@cse.yorku.ca> Date: Tue, 13 Nov 2012 09:59:57 -0500 From: Jason Keltz User-Agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:10.0.9) Gecko/20121011 Thunderbird/10.0.9 MIME-Version: 1.0 To: Bob Friesenhahn Subject: Re: RHEL to FreeBSD file server References: <50A130B7.4080604@cse.yorku.ca> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -1.0 X-Spam-Level: - X-Spam-Report: Content preview: Thanks for your reply Bob! I've thought about adding more memory to the R720 which can go up to 768 GB. 
I'm just not quite sure how much we need, but it can never hurt to add more, I guess. I know you can never have too much memory!! :) [...] Content analysis details: (-1.0 points, 5.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 SHORTCIRCUIT Not all rules were run, due to a shortcircuited rule -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 13 Nov 2012 15:00:05 -0000 Thanks for your reply Bob! I've thought about adding more memory to the R720 which can go up to 768 GB. I'm just not quite sure how much we need, but it can never hurt to add more, I guess. I know you can never have too much memory!! :) Any specific suggested devices for the ZIL? Jason. On 11/12/2012 09:31 PM, Bob Friesenhahn wrote: > I am not able to attest to the type of configuration you are building. > It all sounds good to me and it would surely work fine with Solaris, > and so likely with FreeBSD as well. The only concern might be that 64 > GB of memory is not so much according to today's hardware standards. > New hardware is easily able to fit 128 GB, 256 GB, or even 512 GB. > > If the working set from all those clients is bigger than will fit in > 64 GB of RAM, then your disks will be working much harder than they > should be. With zfs, disks don't do any reading (reads will stall) > while zfs is writing a transaction group. A proper server should be > doing mostly writing because key data is already in cache. > > With so many clients, make sure that your intent log FLASH devices are > mirrored and be prepared to replace them periodically. > > Bob -- Jason Keltz Manager of Development Department of Computer Science and Engineering York University, Toronto, Canada Tel: 416-736-2100 x. 33570 Fax: 416-736-5872 From owner-freebsd-fs@FreeBSD.ORG Tue Nov 13 15:27:24 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id EED11B42 for ; Tue, 13 Nov 2012 15:27:24 +0000 (UTC) (envelope-from jas@cse.yorku.ca) Received: from bronze.cs.yorku.ca (bronze.cs.yorku.ca [130.63.95.34]) by mx1.freebsd.org (Postfix) with ESMTP id A700A8FC0C for ; Tue, 13 Nov 2012 15:27:24 +0000 (UTC) Received: from [130.63.97.125] (ident=jas) by bronze.cs.yorku.ca with esmtp (Exim 4.76) (envelope-from ) id 1TYIOY-0003iq-Ry; Tue, 13 Nov 2012 10:27:22 -0500 Message-ID: <50A266DA.2090605@cse.yorku.ca> Date: Tue, 13 Nov 2012 10:27:22 -0500 From: Jason Keltz User-Agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:10.0.9) Gecko/20121011 Thunderbird/10.0.9 MIME-Version: 1.0 To: kpneal@pobox.com Subject: Re: RHEL to FreeBSD file server References: <50A130B7.4080604@cse.yorku.ca> <20121113043409.GA70601@neutralgood.org> In-Reply-To: <20121113043409.GA70601@neutralgood.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -1.0 X-Spam-Level: - X-Spam-Report: Content preview: Thanks for your reply Kevin.. On 11/12/2012 11:34 PM, kpneal@pobox.com wrote: > I'll see your long post and raise you one .... Nevermind. :) :) :) > On Mon, Nov 12, 2012 at 12:24:07PM -0500, Jason Keltz wrote: >> For the last few months, I've been working on and off learning about >> FreeBSD. 
The goal of my work is to swap out our current dual Red Hat >> Enterprise Linux file servers with FreeBSD. I'm ultimately hoping for >> the most reliable, high performance NFS file server that I can get. The >> fact that, in addition, I get to take advantage of ZFS is what I see as >> a a major bonus. > So, which is it? Do you want the most reliable server? OR, do you want > the highest performance server? Be clear on what your requirements > actually are. Honestly, I don't really think that wanting a reliable server that isn't say, prone to crashing, losing filesystems out of the blue for unexplained reasons, etc. yet performing well is out of the question. Of course, who *doesn't* want a reliable server? :) On the other hand, as you've suggested below, consideration on filesystem setup that might reduce the level of performance, yet provide *additional* reliability is something that should be considered nonetheless. [...] Content analysis details: (-1.0 points, 5.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 SHORTCIRCUIT Not all rules were run, due to a shortcircuited rule -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 13 Nov 2012 15:27:25 -0000 Thanks for your reply Kevin.. On 11/12/2012 11:34 PM, kpneal@pobox.com wrote: > I'll see your long post and raise you one .... Nevermind. :) :) :) > On Mon, Nov 12, 2012 at 12:24:07PM -0500, Jason Keltz wrote: >> For the last few months, I've been working on and off learning about >> FreeBSD. The goal of my work is to swap out our current dual Red Hat >> Enterprise Linux file servers with FreeBSD. I'm ultimately hoping for >> the most reliable, high performance NFS file server that I can get. The >> fact that, in addition, I get to take advantage of ZFS is what I see as >> a a major bonus. > So, which is it? Do you want the most reliable server? OR, do you want > the highest performance server? Be clear on what your requirements > actually are. Honestly, I don't really think that wanting a reliable server that isn't say, prone to crashing, losing filesystems out of the blue for unexplained reasons, etc. yet performing well is out of the question. Of course, who *doesn't* want a reliable server? :) On the other hand, as you've suggested below, consideration on filesystem setup that might reduce the level of performance, yet provide *additional* reliability is something that should be considered nonetheless. >> I'm looking to replace the above hardware completely. In terms of >> hardware, I've recently been able to acquire a new 12th generation Dell >> PowerEdge R720 server with 64 GB of memory and dual E5-2660 processors >> (2.20 Ghz). It has an integrated Dell H310 controller (FreeBSD mfi >> driver) - which is presently only used for a mirrored root configuration >> (2 x 500 GB NL SAS drives). I added 2 x LSI 9205-8e cards (LSISAS2308) >> to the server. The LSI cards were flashed to the latest LSI firmware. >> I also have 1 Dell MD1220 array with 24 x 900 GB 10K SAS drives for >> data. The server has 4 x 1 GB Intel NICs. > >> I'm working with FreeBSD 9.1RC3 because I understand that the 9.1 series >> includes many important improvements, and a totally new driver for the >> LSI SAS HBA cards. 
I suspect that by the time the file server is ready >> to go live, 9.1 will be officially released. > Yep, 9.1 will be the first release with the driver you'll need for the > R720's H310, the H810, and all the other 12G cards. > Actually, I had the H310 working with 9.0 (which is what I started working with) by backporting the code. ;) >> In terms of ZFS, in my testing, I have been using a single ZFS pool >> comprised of 11 mirrored vdevs - a total of 22 disks, with 2 spares (24 >> disks total). As I understand it, I should be able to get the optimal >> performance this way. I considered using multiple pools, but with > Optimal? Hard to say. > > On one of the OpenSolaris ZFS lists a guy did performance testing of various > configurations. He found that for writes the best performance came from > having vdevs consisting of a single disk. Performance scaled pretty well > with the number of disks, but it had zero redundancy. For reads the best > performance came from a single vdev of an N-way mirror. Writes were a > little worse than a single disk, but the up side was that it had near-linear > scaling of read performance plus _excellent_ redundancy. > > Neither of those configurations is a very good idea in most cases. > > With your setup of 11 mirrors you have a good mixture of read and write > performance, but you've compromised on the safety. The reason that RAID 6 > (and thus raidz2) and up were invented was because drives that get used > together tend to fail together. If you lose a drive in a mirror there is > an elevated probability that the replacement drive will not be in place > before the remaining leg of the mirror fails. If that happens then you've > lost the pool. (Drive failures are _not_ independent.) > > Consider instead having raidz2 vdevs of four drives each. This will give > you the same amount of overhead (so the same amount of usable space), but > the pool will be able to survive two failures in each group of four instead > of only one in each group of two. Both read and write performance will be > less than with your mirror pairs, but you'll still have striping across > all of your vdevs. If performance isn't up to snuff you can then try adding > more disks, controllers, etc. I did experiment with a similar option at one point ... 4, 6 disk raidz2 vdevs (as opposed to 6 x 4 disk vdev).. Presently, with the mirrors, I'm using 22 disks with leaving 2 hot spares. If I used 6 x 4 disk vdevs, I use all the disks and don't have hot spares (though I have better redundancy) -- I could put spares into the R720 head which actually has 14 disk slots empty, but I'm hesitant to put a spare for the md1220 using a different driver in the head of the R720... If I use 5 x 4 disk vdevs, I lose a little space... probably a bit of performance as well ... but have better reliability, as you say... Now, as I said, I didn't test the 6 x 4 disk vdev (yet), but I had tried 4 x 6 disk vdevs which resulted in significantly less performance... here's a basic filebench number from fs for 4 x 6 disk raidz2: 41941: 81.258: IO Summary: 1067034 ops, 17669.393 ops/s, (1606/3213 r/w), 426.7mb/s, 0us cpu/op, 10.0ms latency ... compared to the mirrored vdevs: 42115: 77.169: IO Summary: 3139018 ops, 52261.125 ops/s, (4751/9502 r/w), 1265.8mb/s, 0us cpu/op, 3.3ms latency ... and that's already splitting the disks across two controllers.... but I can understand how raidz2 would improve on the redundancy and provide a more reliable configuration. 
> With groups of four you could if you had the dough have four shelves and > four controllers. You could lose a controller or a shelf and still have > redundancy left over in case of a drive failure on one of the remaining > working shelves. One guy on the OpenSolaris list had an array of 6 > controllers/shelves with 6-drive raidz2 vdevs and one drive per vdev per > shelf. > > Of course, I have no idea what your budget is for hardware or for useless > employees due to a fileserver being down. You'll have to balance those > risks and costs yourself. Absolutely .... understood... >> multiple pools comes multiple ZIL, L2ARC, etc and reduction in the >> performance numbers. I've been told that people have far bigger ZFS >> pools than my 22 disk zpool. As I understand it, as storage >> requirements increase, I could easily add another MD1220 with an >> additional 11 x mirrored vdev pairs and "append" this to the original >> pool, giving me lots more space with little hassle. > Yes, with ZFS you can add more drives and the new data will be striped > across all the drives with a distribution that I'm unclear on. The old > data will not be rebalanced. So if you wait until your pool is almost full, > and then you add only a handful of drives, then your new drives may not > give you the performance you require. Yes. I've read this. I'm not sure why an "enterprise" filesystem like ZFS doesn't support the "rebalancing" of data across the entire volume automatically (without recopying all the data back on to itself).. I understand it's a costly operation, but if someone wants to do it, the option should be there... (zpool rebalance).. >> 1) The new file server is relatively powerful. However, is one file >> server enough to handle a load of approximately 2000 connections? >> should I be looking at getting another server, or getting another >> server and another MD1220? How is 64 GB of memory when I'm talking >> about up to 2500-3000 ZFS filesystems on the box? I'm not using dedup >> and using minimal compression. > Sizing machines is _hard_. It's a fair bet that your new machine is faster > than one of your old machines. But whether or not it can handle the > consolidated load is a question I can't answer. It depends on how much > load a typical (for you) client puts on the server. > Understood.. Thanks for your feedback! It's much appreciated. 
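P.S. In the absence of a "zpool rebalance", the usual workaround for the striping point above is to rewrite the data after growing the pool, for example with a local send/receive. A sketch only, with placeholder dataset names, and it needs enough free space for a second copy while both exist:

# Rewrite a dataset so its blocks restripe across the enlarged pool:
zfs snapshot -r tank/home@rebalance
zfs send -R tank/home@rebalance | zfs receive tank/home.new
# ...check tank/home.new, repoint mounts/exports, then:
zfs destroy -r tank/home
zfs rename tank/home.new tank/home

Clumsy, but rewriting the blocks is currently the only way to spread old data over newly added vdevs.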
Jason Keltz From owner-freebsd-fs@FreeBSD.ORG Tue Nov 13 15:43:47 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 06A9D135 for ; Tue, 13 Nov 2012 15:43:47 +0000 (UTC) (envelope-from 000.fbsd@quip.cz) Received: from elsa.codelab.cz (elsa.codelab.cz [94.124.105.4]) by mx1.freebsd.org (Postfix) with ESMTP id AC4CA8FC12 for ; Tue, 13 Nov 2012 15:43:46 +0000 (UTC) Received: from elsa.codelab.cz (localhost [127.0.0.1]) by elsa.codelab.cz (Postfix) with ESMTP id 5CEA228427; Tue, 13 Nov 2012 16:43:39 +0100 (CET) Received: from [192.168.1.2] (unknown [89.177.49.69]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by elsa.codelab.cz (Postfix) with ESMTPSA id E586C28422; Tue, 13 Nov 2012 16:43:33 +0100 (CET) Message-ID: <50A26AA5.70806@quip.cz> Date: Tue, 13 Nov 2012 16:43:33 +0100 From: Miroslav Lachman <000.fbsd@quip.cz> User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.1.19) Gecko/20110420 Lightning/1.0b1 SeaMonkey/2.0.14 MIME-Version: 1.0 To: Jason Keltz Subject: Re: RHEL to FreeBSD file server References: <50A130B7.4080604@cse.yorku.ca> <50A2606D.3040306@cse.yorku.ca> In-Reply-To: <50A2606D.3040306@cse.yorku.ca> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 13 Nov 2012 15:43:47 -0000 Jason Keltz wrote: > Thanks for your reply Bob! I've thought about adding more memory to the > R720 which can go up to 768 GB. I'm just not quite sure how much we > need, but it can never hurt to add more, I guess. I know you can never > have too much memory!! :) Yes you can: http://www.zfsbuild.com/2012/03/02/when-is-enough-memory-too-much/ http://www.zfsbuild.com/2012/03/05/when-is-enough-memory-too-much-part-2/ I don't know if the same problem exists on FreeBSD or not. I recommend you to read a whole blog "zfsbuild". There are many useful informations. 
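If the same pressure does show up on FreeBSD, the knob people usually reach for is an explicit ARC ceiling in /boot/loader.conf; the value below is only an example, not a recommendation for this machine:

# /boot/loader.conf -- cap the ZFS ARC (example value only):
vfs.zfs.arc_max="48G"

That keeps the ARC (and, indirectly, the memory consumed by L2ARC headers, which is the sort of problem those articles describe) from crowding out everything else.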
Miroslav Lachman From owner-freebsd-fs@FreeBSD.ORG Tue Nov 13 15:56:57 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id C6075636 for ; Tue, 13 Nov 2012 15:56:57 +0000 (UTC) (envelope-from 000.fbsd@quip.cz) Received: from elsa.codelab.cz (elsa.codelab.cz [94.124.105.4]) by mx1.freebsd.org (Postfix) with ESMTP id 80CA48FC14 for ; Tue, 13 Nov 2012 15:56:57 +0000 (UTC) Received: from elsa.codelab.cz (localhost [127.0.0.1]) by elsa.codelab.cz (Postfix) with ESMTP id 2CAC528427; Tue, 13 Nov 2012 16:56:56 +0100 (CET) Received: from [192.168.1.2] (unknown [89.177.49.69]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by elsa.codelab.cz (Postfix) with ESMTPSA id 0D4B228422; Tue, 13 Nov 2012 16:56:55 +0100 (CET) Message-ID: <50A26DC6.1050205@quip.cz> Date: Tue, 13 Nov 2012 16:56:54 +0100 From: Miroslav Lachman <000.fbsd@quip.cz> User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.1.19) Gecko/20110420 Lightning/1.0b1 SeaMonkey/2.0.14 MIME-Version: 1.0 To: Jason Keltz Subject: Re: RHEL to FreeBSD file server References: <50A130B7.4080604@cse.yorku.ca> <20121113043409.GA70601@neutralgood.org> <50A266DA.2090605@cse.yorku.ca> In-Reply-To: <50A266DA.2090605@cse.yorku.ca> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 13 Nov 2012 15:56:57 -0000 Jason Keltz wrote: [...] > Presently, with the mirrors, I'm using 22 disks with leaving 2 hot spares. > If I used 6 x 4 disk vdevs, I use all the disks and don't have hot > spares (though I have better redundancy) -- I could put spares into the > R720 head which actually has 14 disk slots empty, but I'm hesitant to > put a spare for the md1220 using a different driver in the head of the > R720... Beware of hot spares - they are not "hot". kern/134491: [zfs] Hot spares are rather cold... http://www.freebsd.org/cgi/query-pr.cgi?pr=134491 This is long standing unsolved issue. You can add drives as "hot spares" to the zpool, but FreeBSD lacks daemon to recieve notifications about disk failure and failed drive will not be replaced by spare. 
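Until such a daemon exists, swinging the spare in is a manual step when a disk faults; the commands are short (pool and device names below are placeholders):

# Manual spare activation after (say) da5 faults:
zpool status -x              # identify the FAULTED/UNAVAIL disk
zpool replace tank da5 da24  # resilver onto the designated spare
zpool status tank            # watch resilver progress

Once the failed slot has been dealt with, a "zpool detach" either returns the spare to standby or makes the replacement permanent.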
Miroslav Lachman From owner-freebsd-fs@FreeBSD.ORG Tue Nov 13 17:05:13 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 358D5F3 for ; Tue, 13 Nov 2012 17:05:13 +0000 (UTC) (envelope-from yanegomi@gmail.com) Received: from mail-da0-f54.google.com (mail-da0-f54.google.com [209.85.210.54]) by mx1.freebsd.org (Postfix) with ESMTP id F3BCE8FC14 for ; Tue, 13 Nov 2012 17:05:12 +0000 (UTC) Received: by mail-da0-f54.google.com with SMTP id z9so3432868dad.13 for ; Tue, 13 Nov 2012 09:05:11 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=references:mime-version:in-reply-to:content-type :content-transfer-encoding:message-id:cc:x-mailer:from:subject:date :to; bh=b06UWXUwJyNt/IPyMebL6OnaTro7nQCvb67y2UJ1vWc=; b=ggfIx80d7opd/iK4e1AQrLqtMpLuedSlFkxWpt6f3IdRr1dQvjTEV+awj5XBEE2WFk L+KM2EE/3zQhk3OwPKZ2cAEj1O9sxHp0M7YUdivSXXjuYdZe6phpmW/ARGXOQTNu/1te xjHcQk61IYexdcBdWBKg8XluA/IhMKeKaH+2KMY6N+AorIc4wGkr7utUf2fTJ8kUWKti 0cXWyB/jYT1oO9GZYPWB0PIKAdehoiQx1DVuoaUxZe8kl49/48biQwQGx4hmKhdvb1oo Vwvu/WoaLX5RZzu0frSc8Y1tN9EcrxCavzfZfjh9o+Y1vlk3ttKWgkHFPe62EqGQldQB dFHQ== Received: by 10.68.236.131 with SMTP id uu3mr69169412pbc.104.1352826311604; Tue, 13 Nov 2012 09:05:11 -0800 (PST) Received: from [10.143.141.46] (mobile-166-147-093-219.mycingular.net. [166.147.93.219]) by mx.google.com with ESMTPS id qp6sm6224355pbc.25.2012.11.13.09.04.58 (version=SSLv3 cipher=OTHER); Tue, 13 Nov 2012 09:05:09 -0800 (PST) References: <50A130B7.4080604@cse.yorku.ca> <20121113043409.GA70601@neutralgood.org> <50A266DA.2090605@cse.yorku.ca> <50A26DC6.1050205@quip.cz> Mime-Version: 1.0 (1.0) In-Reply-To: <50A26DC6.1050205@quip.cz> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Message-Id: X-Mailer: iPhone Mail (10A523) From: Garrett Cooper Subject: Re: RHEL to FreeBSD file server Date: Tue, 13 Nov 2012 09:04:54 -0800 To: Miroslav Lachman <000.fbsd@quip.cz> Cc: "freebsd-fs@freebsd.org" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 13 Nov 2012 17:05:13 -0000 On Nov 13, 2012, at 7:56 AM, Miroslav Lachman <000.fbsd@quip.cz> wrote: > Jason Keltz wrote: > [...] >=20 >> Presently, with the mirrors, I'm using 22 disks with leaving 2 hot spares= . >> If I used 6 x 4 disk vdevs, I use all the disks and don't have hot >> spares (though I have better redundancy) -- I could put spares into the >> R720 head which actually has 14 disk slots empty, but I'm hesitant to >> put a spare for the md1220 using a different driver in the head of the >> R720... >=20 > Beware of hot spares - they are not "hot". >=20 > kern/134491: [zfs] Hot spares are rather cold... > http://www.freebsd.org/cgi/query-pr.cgi?pr=3D134491 >=20 > This is long standing unsolved issue. You can add drives as "hot spares" t= o the zpool, but FreeBSD lacks daemon to recieve notifications about disk fa= ilure and failed drive will not be replaced by spare. You should be able to work around this on 9.1 with a zfs scrub once the driv= e has been pulled from the chassis, but I recommend testing this out first, j= ust to be safe. 
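If you plan to rely on that, it is easy to rehearse on a scratch pool before a real failure (the pool name tank is just an example); pull a disk and then:

  # confirm the pool noticed the missing disk
  zpool status -x

  # scrub, as suggested above, and watch whether the spare resilvers in
  zpool scrub tank
  zpool status -v tank

Given the PR Miroslav mentions, it is worth verifying that the spare really does take over rather than assuming it will.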
Cheers, -Garrett= From owner-freebsd-fs@FreeBSD.ORG Tue Nov 13 17:28:40 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id E027BA13 for ; Tue, 13 Nov 2012 17:28:40 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) by mx1.freebsd.org (Postfix) with ESMTP id 9AD218FC16 for ; Tue, 13 Nov 2012 17:28:40 +0000 (UTC) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id qADHSWDw019628; Tue, 13 Nov 2012 11:28:32 -0600 (CST) Date: Tue, 13 Nov 2012 11:28:32 -0600 (CST) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: Miroslav Lachman <000.fbsd@quip.cz> Subject: Re: RHEL to FreeBSD file server In-Reply-To: <50A26AA5.70806@quip.cz> Message-ID: References: <50A130B7.4080604@cse.yorku.ca> <50A2606D.3040306@cse.yorku.ca> <50A26AA5.70806@quip.cz> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Tue, 13 Nov 2012 11:28:32 -0600 (CST) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 13 Nov 2012 17:28:41 -0000 On Tue, 13 Nov 2012, Miroslav Lachman wrote: > Jason Keltz wrote: >> Thanks for your reply Bob! I've thought about adding more memory to the >> R720 which can go up to 768 GB. I'm just not quite sure how much we >> need, but it can never hurt to add more, I guess. I know you can never >> have too much memory!! :) > > Yes you can: > > http://www.zfsbuild.com/2012/03/02/when-is-enough-memory-too-much/ > http://www.zfsbuild.com/2012/03/05/when-is-enough-memory-too-much-part-2/ > > I don't know if the same problem exists on FreeBSD or not. The Solaris kernel and ZFS have a fairly intimate relationship regarding memory and ZFS is able to give up memory to the kernel on demand. FreeBSD does not have this intimate relationship. Regardless, code specific to ZFS likely behaves similarly. The behavior of semaphoring depends on CPU architecture and how many CPU cores are in the system. 
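A related practical point: since FreeBSD's ZFS cannot count on that kind of cooperation from the kernel, people commonly put an explicit ceiling on the ARC instead of letting it contend with everything else for memory. A rough sketch (the values are arbitrary examples; size them for the workload and the amount of RAM being discussed):

  # /boot/loader.conf
  vfs.zfs.arc_max="64G"   # upper bound on the ARC
  vfs.zfs.arc_min="8G"    # optional lower bound

  # at run time, the current ARC size can be checked with:
  sysctl kstat.zfs.misc.arcstats.size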
Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Tue Nov 13 17:41:17 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 5DFA8C43 for ; Tue, 13 Nov 2012 17:41:17 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) by mx1.freebsd.org (Postfix) with ESMTP id 145818FC13 for ; Tue, 13 Nov 2012 17:41:16 +0000 (UTC) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id qADHf8GE019718; Tue, 13 Nov 2012 11:41:09 -0600 (CST) Date: Tue, 13 Nov 2012 11:41:08 -0600 (CST) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: kpneal@pobox.com Subject: Re: RHEL to FreeBSD file server In-Reply-To: <20121113043409.GA70601@neutralgood.org> Message-ID: References: <50A130B7.4080604@cse.yorku.ca> <20121113043409.GA70601@neutralgood.org> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Tue, 13 Nov 2012 11:41:09 -0600 (CST) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 13 Nov 2012 17:41:17 -0000 On Mon, 12 Nov 2012, kpneal@pobox.com wrote: > > With your setup of 11 mirrors you have a good mixture of read and write > performance, but you've compromised on the safety. The reason that RAID 6 > (and thus raidz2) and up were invented was because drives that get used > together tend to fail together. If you lose a drive in a mirror there is > an elevated probability that the replacement drive will not be in place > before the remaining leg of the mirror fails. If that happens then you've > lost the pool. (Drive failures are _not_ independent.) Do you have a reference to independent data which supports this claim that drive failures are not independent? The whole function of RAID assumes that drive failures are independent. If drives share a chassis, care should be taken to make sure that redundant drives are not in physical proximity to each other and that they are supported via a different controller, I/O path, and power supply. If the drives are in a different chassis then their failures should be completely independent outside of a shared event like power surge, fire, EMP, flood, or sun-spot activity. The idea of raidz2 vdevs of four drives each sounds nice but will suffer from decreased performance and increased time to replace a failed disk. There are always tradeoffs. 
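For concreteness, the two layouts being weighed look roughly like this at creation time (da0 through da23 are invented device names standing in for the 24 bays; this is only a sketch of the syntax, not a recommendation):

  # eleven 2-way mirrors plus two hot spares
  zpool create tank \
      mirror da0 da1   mirror da2 da3   mirror da4 da5   mirror da6 da7 \
      mirror da8 da9   mirror da10 da11 mirror da12 da13 mirror da14 da15 \
      mirror da16 da17 mirror da18 da19 mirror da20 da21
  zpool add tank spare da22 da23

  # six 4-disk raidz2 vdevs using all 24 disks, no spares
  zpool create tank \
      raidz2 da0 da1 da2 da3     raidz2 da4 da5 da6 da7     raidz2 da8 da9 da10 da11 \
      raidz2 da12 da13 da14 da15 raidz2 da16 da17 da18 da19 raidz2 da20 da21 da22 da23

The raidz2 layout survives any two failures per vdev at the cost of the slower rebuilds Bob describes; each mirror vdev survives only one.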
Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Tue Nov 13 20:27:47 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 6D18D31E; Tue, 13 Nov 2012 20:27:47 +0000 (UTC) (envelope-from peter@rulingia.com) Received: from vps.rulingia.com (host-122-100-2-194.octopus.com.au [122.100.2.194]) by mx1.freebsd.org (Postfix) with ESMTP id E325D8FC14; Tue, 13 Nov 2012 20:27:46 +0000 (UTC) Received: from server.rulingia.com (c220-239-241-202.belrs5.nsw.optusnet.com.au [220.239.241.202]) by vps.rulingia.com (8.14.5/8.14.5) with ESMTP id qADKRcqn072314 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Wed, 14 Nov 2012 07:27:39 +1100 (EST) (envelope-from peter@rulingia.com) X-Bogosity: Ham, spamicity=0.000000 Received: from server.rulingia.com (localhost.rulingia.com [127.0.0.1]) by server.rulingia.com (8.14.5/8.14.5) with ESMTP id qADKRUJb044107 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 14 Nov 2012 07:27:30 +1100 (EST) (envelope-from peter@server.rulingia.com) Received: (from peter@localhost) by server.rulingia.com (8.14.5/8.14.5/Submit) id qADKRUwc044106; Wed, 14 Nov 2012 07:27:30 +1100 (EST) (envelope-from peter) Date: Wed, 14 Nov 2012 07:27:30 +1100 From: Peter Jeremy To: Andriy Gapon Subject: Re: zfs diff deadlock Message-ID: <20121113202730.GA42238@server.rulingia.com> References: <20121110223249.GB506@server.rulingia.com> <20121111072739.GA4814@server.rulingia.com> <509F5E0A.1020501@FreeBSD.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="y0ulUmNC+osPPQO6" Content-Disposition: inline In-Reply-To: <509F5E0A.1020501@FreeBSD.org> X-PGP-Key: http://www.rulingia.com/keys/peter.pgp User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 13 Nov 2012 20:27:47 -0000 --y0ulUmNC+osPPQO6 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On 2012-Nov-11 10:12:58 +0200, Andriy Gapon wrote: >on 11/11/2012 09:27 Peter Jeremy said the following: >> On 2012-Nov-11 09:32:49 +1100, Peter Jeremy >> wrote: >>> I recently decided to do a "zfs diff" between two snapshots to try and >>> identify why there was so much "USED" space in the snapshot. The diff r= an >>> for a while (though with very little IO) but has now wedged unkillably. >>> There's nothing on the console or in any logs, the pool reports no >>> problems and there are no other visible FS issues. Any ideas on tracki= ng >>> this down? >> ... >>> The systems is running a 4-month old 8-stable (r237444) >>=20 >> I've tried a second system running the same world with the same result, = so=20 >> this looks like a real bug in ZFS rather than a system glitch. >>=20 > >Are you able to catch the state of all threads in the system? >E.g. via procstat -k -a. >Or a crash dump. Unfortunately, neither of those systems are really suitable for debugging. I have setup a VBox and sent most of the offending FS to it. 
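(For anyone wanting to set up the same kind of reproduction, moving a dataset and its snapshots to a scratch machine is a recursive send/receive along these lines; the host name testbox is made up, and this is not necessarily how Peter did the transfer:

  zfs send -R tank/beckett/home@20120518 | ssh testbox zfs recv -d tank

The -R flag carries the earlier snapshots along, which matters here because zfs diff needs both of the snapshots being compared.)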
That gives somewhat different results: On a recent 8-stable (r242865M), I get a panic whilst on a recent head, I get a "Unable to determine path or stats" error.

On 8-stable, I have a crashdump and the panic is:

suspending ithread with the following locks held:
shared spin mutex ({6") r = 0 (0xffffff005c395a80) locked @ /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c:522
panic: witness_warn
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
panic() at panic+0x1ce
witness_warn() at witness_warn+0x2b2
ithread_loop() at ithread_loop+0x112
fork_exit() at fork_exit+0x11d
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff800008ccf0, rbp = 0 ---

Note that zap.c:522 is the rw_enter() in zap_get_leaf_byblk() - which is the offending function in the backtrace on r237444.

On head, I get some normal differences terminated by:

Unable to determine path or stats for object 2128453 in tank/beckett/home@20120518: Invalid argument

A scrub reports no issues but the problem remains:

root@FB10-64:~ # zpool status
  pool: tank
 state: ONLINE
status: The pool is formatted using a legacy on-disk format. The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
        pool will no longer be accessible on software that does not support feature flags.
  scan: scrub repaired 0 in 3h24m with 0 errors on Wed Nov 14 01:58:36 2012
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          ada2      ONLINE       0     0     0

errors: No known data errors

I've done some searching and found 2 hits on the message - one in an OI IRC log and the other in a ZFS-on-Linux list. Neither offered any insights.
I've tried ktracing the zfs diff and that ends: 1856 zfs CALL read(0x7,0x7fffffbfc160,0x18) 1856 zfs GIO fd 7 read 24 bytes 0x0000 0400 0000 0000 0000 e079 2000 0000 0000 397a 2000 0000 0000 = = |.........y .....9z .....| 1856 zfs RET read 24/0x18 1856 zfs CALL ioctl(0x3,0xd5985a36,0x7fffffbfc178) 1856 zfs RET ioctl 0 1856 zfs CALL read(0x7,0x7fffffbfc160,0x18) 1856 zfs GIO fd 7 read 24 bytes 0x0000 0200 0000 0000 0000 3a7a 2000 0000 0000 4d7a 2000 0000 0000 = = |........:z .....Mz .....| 1856 zfs RET read 24/0x18 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl -1 errno 2 No such file or directory 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl -1 errno 2 No such file or directory 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl 0 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl 0 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl -1 errno 2 No such file or directory 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl -1 errno 2 No such file or directory 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl 0 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl 0 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl -1 errno 2 No such file or directory 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl -1 errno 2 No such file or directory 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl -1 errno 2 No such file or directory 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl -1 errno 2 No such file or directory 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl 0 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl 0 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl -1 errno 2 No such file or directory 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl -1 errno 2 No such file or directory 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl 0 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl 0 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl -1 errno 2 No such file or directory 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl -1 errno 2 No such file or directory 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl 0 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl 0 1856 zfs CALL ioctl(0x3,0xd5985a39,0x7fffffbfab18) 1856 zfs RET ioctl -1 errno 22 Invalid argument 1856 zfs CALL close(0x1) 1856 zfs RET close 0 1856 zfs CALL close(0x7) 1856 zfs RET ioctl -1 errno 32 Broken pipe 1856 zfs CALL close(0x8) 1856 zfs RET close 0 1856 zfs CALL thr_kill(0x18adf,SIG 32) 1856 zfs RET thr_kill 0 1856 zfs CALL _umtx_op(0x802c06c00,0x2,0x18adf,0,0) 1856 zfs RET close 0 1856 zfs PSIG SIG 32 caught handler=3D0x8020537f0 mask=3D0x0 code= =3DSI_LWP 1856 zfs CALL sigreturn(0x7fffffbfbca0) 1856 zfs RET sigreturn JUSTRETURN 1856 zfs CALL thr_exit(0x802c06c00) 1856 zfs RET _umtx_op 0 1856 zfs CALL close(0x6) 1856 zfs RET close 0 1856 zfs CALL stat(0x7fffffffa900,0x7fffffffa888) 1856 zfs NAMI "/usr/share/nls/C/libc.cat" 1856 zfs RET stat -1 errno 2 No such file or directory 1856 zfs CALL stat(0x7fffffffa900,0x7fffffffa888) 1856 zfs NAMI "/usr/share/nls/libc/C" 1856 zfs RET stat -1 errno 2 No such file or directory 1856 zfs CALL stat(0x7fffffffa900,0x7fffffffa888) 1856 zfs NAMI 
"/usr/local/share/nls/C/libc.cat" 1856 zfs RET stat -1 errno 2 No such file or directory 1856 zfs CALL stat(0x7fffffffa900,0x7fffffffa888) 1856 zfs NAMI "/usr/local/share/nls/libc/C" 1856 zfs RET stat -1 errno 2 No such file or directory 1856 zfs CALL write(0x2,0x7fffffffa740,0x65) 1856 zfs GIO fd 2 wrote 101 bytes "Unable to determine path or stats for object 2128453 in tank/becket= t/home@20120518: Invalid argument " 1856 zfs RET write 101/0x65 1856 zfs CALL close(0x5) 1856 zfs RET close 0 1856 zfs CALL close(0x3) 1856 zfs RET close 0 1856 zfs CALL close(0x4) 1856 zfs RET close 0 1856 zfs CALL exit(0x1) --=20 Peter Jeremy --y0ulUmNC+osPPQO6 Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iEYEARECAAYFAlCirTIACgkQ/opHv/APuIccAACdGfzyqFTjb5UUcu7pqRgz3DiH pB4An0fjLFS7wQwDVAJCxEiALo4kcJZB =E0LV -----END PGP SIGNATURE----- --y0ulUmNC+osPPQO6-- From owner-freebsd-fs@FreeBSD.ORG Tue Nov 13 21:19:27 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 59433D01 for ; Tue, 13 Nov 2012 21:19:27 +0000 (UTC) (envelope-from jas@cse.yorku.ca) Received: from bronze.cs.yorku.ca (bronze.cs.yorku.ca [130.63.95.34]) by mx1.freebsd.org (Postfix) with ESMTP id 0BFB58FC0C for ; Tue, 13 Nov 2012 21:19:26 +0000 (UTC) Received: from [130.63.97.125] (ident=jas) by bronze.cs.yorku.ca with esmtp (Exim 4.76) (envelope-from ) id 1TYNtG-0000Wd-2D; Tue, 13 Nov 2012 16:19:26 -0500 Message-ID: <50A2B95D.4000400@cse.yorku.ca> Date: Tue, 13 Nov 2012 16:19:25 -0500 From: Jason Keltz User-Agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:10.0.9) Gecko/20121011 Thunderbird/10.0.9 MIME-Version: 1.0 To: Bob Friesenhahn Subject: Re: RHEL to FreeBSD file server References: <50A130B7.4080604@cse.yorku.ca> <20121113043409.GA70601@neutralgood.org> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -1.0 X-Spam-Level: - X-Spam-Report: Content preview: On 11/13/2012 12:41 PM, Bob Friesenhahn wrote: > On Mon, 12 Nov 2012, kpneal@pobox.com wrote: >> >> With your setup of 11 mirrors you have a good mixture of read and write >> performance, but you've compromised on the safety. The reason that >> RAID 6 >> (and thus raidz2) and up were invented was because drives that get used >> together tend to fail together. If you lose a drive in a mirror there is >> an elevated probability that the replacement drive will not be in place >> before the remaining leg of the mirror fails. If that happens then >> you've >> lost the pool. (Drive failures are _not_ independent.) > > Do you have a reference to independent data which supports this claim > that drive failures are not independent? The whole function of RAID > assumes that drive failures are independent. > > If drives share a chassis, care should be taken to make sure that > redundant drives are not in physical proximity to each other and that > they are supported via a different controller, I/O path, and power > supply. If the drives are in a different chassis then their failures > should be completely independent outside of a shared event like power > surge, fire, EMP, flood, or sun-spot activity. > > The idea of raidz2 vdevs of four drives each sounds nice but will > suffer from decreased performance and increased time to replace a > failed disk. There are always tradeoffs. [...] 
Content analysis details: (-1.0 points, 5.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 SHORTCIRCUIT Not all rules were run, due to a shortcircuited rule -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 13 Nov 2012 21:19:27 -0000 On 11/13/2012 12:41 PM, Bob Friesenhahn wrote: > On Mon, 12 Nov 2012, kpneal@pobox.com wrote: >> >> With your setup of 11 mirrors you have a good mixture of read and write >> performance, but you've compromised on the safety. The reason that >> RAID 6 >> (and thus raidz2) and up were invented was because drives that get used >> together tend to fail together. If you lose a drive in a mirror there is >> an elevated probability that the replacement drive will not be in place >> before the remaining leg of the mirror fails. If that happens then >> you've >> lost the pool. (Drive failures are _not_ independent.) > > Do you have a reference to independent data which supports this claim > that drive failures are not independent? The whole function of RAID > assumes that drive failures are independent. > > If drives share a chassis, care should be taken to make sure that > redundant drives are not in physical proximity to each other and that > they are supported via a different controller, I/O path, and power > supply. If the drives are in a different chassis then their failures > should be completely independent outside of a shared event like power > surge, fire, EMP, flood, or sun-spot activity. > > The idea of raidz2 vdevs of four drives each sounds nice but will > suffer from decreased performance and increased time to replace a > failed disk. There are always tradeoffs. Hi Bob. Initially, I had one storage chassis, split between 2 LSI 9205-8e controllers with a 22 disk pool comprised of 11 mirrored vdevs. I think that I'm still slightly uncomfortable with the fact that 2 disks, which were all purchased at the same time, could essentially die at the same time, killing my whole pool. Yet, while moving to raidz2 would allow better redundancy, I'm not sure if the raidz2 rebuild time and decrease in performance would be worth it.. After all, this would be a primary file server, without which, I'd be in big trouble.. As a result, I'm considering this approach.. I'll buy another md1220, a few more disks, add another 9205-8e card... and use triple mirrored vdevs instead of dual.... I only really need about 8 x 900 GB storage, so if I can multiply this by 3, add a few spares... in addition, each set of disks would be on its own controller. I should be able to lose a controller, and maintain full redundancy.... I should be able to lose an entire disk enclosure and still be up ... I believe read performance would probably go up, but I suspect that write performance would suffer a little -- not sure exactly by how much. When I first speced out the server, the LSI 9205-8e was the best choice for a card since the PCI Express 3 HBAs (which the R720 supports) weren't out yet ... now, there's the LSI 9207-8e which is PCIE3, but I guess it doesn't make much sense to buy one of those now that I have another 2 x LSI 9205-8e cards already ... (a shame though since there is less than $50 difference between the cards). 
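Spelled out, that plan looks roughly like the following, with invented device names where da0-da7 sit behind the first controller/enclosure, da8-da15 behind the second and da16-da23 behind the third, so that each leg of every 3-way mirror lives on separate hardware:

  zpool create tank \
      mirror da0 da8  da16   mirror da1 da9  da17 \
      mirror da2 da10 da18   mirror da3 da11 da19 \
      mirror da4 da12 da20   mirror da5 da13 da21 \
      mirror da6 da14 da22   mirror da7 da15 da23

Losing any single controller or enclosure then removes one leg from each vdev but leaves every mirror with two working copies.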
By the way - on another note - what do you or other list members think of the new Intel SSD DC S3700 as ZIL? Sounds very promising when it's finally available. I spent a lot of time researching ZILs today, and one thing I can say is that I have a major headache now because of it!! Jason. From owner-freebsd-fs@FreeBSD.ORG Wed Nov 14 01:47:06 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 9A5C5EC6 for ; Wed, 14 Nov 2012 01:47:06 +0000 (UTC) (envelope-from julian@freebsd.org) Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16]) by mx1.freebsd.org (Postfix) with ESMTP id 6FEEE8FC14 for ; Wed, 14 Nov 2012 01:47:06 +0000 (UTC) Received: from JRE-MBP-2.local (c-50-143-149-146.hsd1.ca.comcast.net [50.143.149.146]) (authenticated bits=0) by vps1.elischer.org (8.14.5/8.14.5) with ESMTP id qAE1kn9Y003221 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Tue, 13 Nov 2012 17:46:50 -0800 (PST) (envelope-from julian@freebsd.org) Message-ID: <50A2F804.3010009@freebsd.org> Date: Tue, 13 Nov 2012 17:46:44 -0800 From: Julian Elischer User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:16.0) Gecko/20121026 Thunderbird/16.0.2 MIME-Version: 1.0 To: Jason Keltz Subject: Re: RHEL to FreeBSD file server References: <50A130B7.4080604@cse.yorku.ca> <20121113043409.GA70601@neutralgood.org> <50A2B95D.4000400@cse.yorku.ca> In-Reply-To: <50A2B95D.4000400@cse.yorku.ca> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 14 Nov 2012 01:47:06 -0000 On 11/13/12 1:19 PM, Jason Keltz wrote: > On 11/13/2012 12:41 PM, Bob Friesenhahn wrote: >> On Mon, 12 Nov 2012, kpneal@pobox.com wrote: >>> >>> With your setup of 11 mirrors you have a good mixture of read and >>> write >>> performance, but you've compromised on the safety. The reason that >>> RAID 6 >>> (and thus raidz2) and up were invented was because drives that get >>> used >>> together tend to fail together. If you lose a drive in a mirror >>> there is >>> an elevated probability that the replacement drive will not be in >>> place >>> before the remaining leg of the mirror fails. If that happens then >>> you've >>> lost the pool. (Drive failures are _not_ independent.) >> >> Do you have a reference to independent data which supports this >> claim that drive failures are not independent? The whole function >> of RAID assumes that drive failures are independent. >> >> If drives share a chassis, care should be taken to make sure that >> redundant drives are not in physical proximity to each other and >> that they are supported via a different controller, I/O path, and >> power supply. If the drives are in a different chassis then their >> failures should be completely independent outside of a shared event >> like power surge, fire, EMP, flood, or sun-spot activity. >> >> The idea of raidz2 vdevs of four drives each sounds nice but will >> suffer from decreased performance and increased time to replace a >> failed disk. There are always tradeoffs. > > Hi Bob. > > Initially, I had one storage chassis, split between 2 LSI 9205-8e > controllers with a 22 disk pool comprised of 11 mirrored vdevs. 
> I think that I'm still slightly uncomfortable with the fact that 2 > disks, which were all purchased at the same time, could essentially > die at the same time, killing my whole pool. Yet, while moving to > raidz2 would allow better redundancy, I'm not sure if the raidz2 > rebuild time and decrease in performance would be worth it.. > After all, this would be a primary file server, without which, I'd > be in big trouble.. > As a result, I'm considering this approach.. > I'll buy another md1220, a few more disks, add another 9205-8e > card... and use triple mirrored vdevs instead of dual.... I only > really need about 8 x 900 GB storage, so if I can multiply this by > 3, add a few spares... in addition, each set of disks would be on > its own controller. I should be able to lose a controller, and > maintain full redundancy.... I should be able to lose an entire > disk enclosure and still be up ... I believe read performance would > probably go up, but I suspect that write performance would suffer a > little -- not sure exactly by how much. > > When I first speced out the server, the LSI 9205-8e was the best > choice for a card since the PCI Express 3 HBAs (which the R720 > supports) weren't out yet ... now, there's the LSI 9207-8e which is > PCIE3, but I guess it doesn't make much sense to buy one of those > now that I have another 2 x LSI 9205-8e cards already ... (a shame > though since there is less than $50 difference between the cards). > > By the way - on another note - what do you or other list members > think of the new Intel SSD DC S3700 as ZIL? Sounds very promising > when it's finally available. I spent a lot of time researching ZILs > today, and one thing I can say is that I have a major headache now > because of it!! ZIL is best served by battery backed up ram or something.. it's tiny and not a really good fit an SSD (maybe just a partition) L2ARC on the other hand is a really good use for SSD. > > Jason. 
> > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > > From owner-freebsd-fs@FreeBSD.ORG Wed Nov 14 02:30:22 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id EE894433; Wed, 14 Nov 2012 02:30:22 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) by mx1.freebsd.org (Postfix) with ESMTP id A902D8FC08; Wed, 14 Nov 2012 02:30:22 +0000 (UTC) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id qAE2UCGM021771; Tue, 13 Nov 2012 20:30:12 -0600 (CST) Date: Tue, 13 Nov 2012 20:30:12 -0600 (CST) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: Julian Elischer Subject: Re: RHEL to FreeBSD file server In-Reply-To: <50A2F804.3010009@freebsd.org> Message-ID: References: <50A130B7.4080604@cse.yorku.ca> <20121113043409.GA70601@neutralgood.org> <50A2B95D.4000400@cse.yorku.ca> <50A2F804.3010009@freebsd.org> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Tue, 13 Nov 2012 20:30:12 -0600 (CST) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 14 Nov 2012 02:30:23 -0000 On Tue, 13 Nov 2012, Julian Elischer wrote: > On 11/13/12 1:19 PM, Jason Keltz wrote: >> On 11/13/2012 12:41 PM, Bob Friesenhahn wrote: >>> On Mon, 12 Nov 2012, kpneal@pobox.com wrote: >> >> Initially, I had one storage chassis, split between 2 LSI 9205-8e >> controllers with a 22 disk pool comprised of 11 mirrored vdevs. >> I think that I'm still slightly uncomfortable with the fact that 2 disks, >> which were all purchased at the same time, could essentially die at the >> same time, killing my whole pool. Yet, while moving to raidz2 would allow >> better redundancy, I'm not sure if the raidz2 rebuild time and decrease in >> performance would be worth it.. The concern about a bad batch of disks is a valid concern. Of course a bad batch of disks could even bring down raidz2. If this is a concern, then purchase the disks from two different vendors. >> I'll buy another md1220, a few more disks, add another 9205-8e card... and >> use triple mirrored vdevs instead of dual.... I only really need about 8 x Triple mirror is considered to be exceedingly reliable. Not very cost effective though. Reads will be faster than with simple mirror since any one of the three disks can satisfy a read request. 
Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Wed Nov 14 03:51:31 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 61C283BE for ; Wed, 14 Nov 2012 03:51:31 +0000 (UTC) (envelope-from smckay@internode.on.net) Received: from ipmail05.adl6.internode.on.net (ipmail05.adl6.internode.on.net [IPv6:2001:44b8:8060:ff02:300:1:6:5]) by mx1.freebsd.org (Postfix) with ESMTP id A611F8FC13 for ; Wed, 14 Nov 2012 03:51:30 +0000 (UTC) Message-Id: <57ac1f$gf3rkl@ipmail05.adl6.internode.on.net> X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AhlWAF0Uo1ABigXDPGdsb2JhbABEhTSFI7hmGAEBAQE4NIIfAQV5EAgDDQE4QxQGiBzHTxUGDIR/gQYDnAkDjTaBSAEHFw Received: from unknown (HELO localhost) ([1.138.5.195]) by ipmail05.adl6.internode.on.net with ESMTP; 14 Nov 2012 14:21:29 +1030 From: Stephen McKay To: Tom Evans Subject: Re: SSD recommendations for ZFS cache/log References: In-Reply-To: from Tom Evans at "Thu, 08 Nov 2012 21:07:24 -0000" Date: Wed, 14 Nov 2012 14:51:22 +1100 Cc: FreeBSD FS , Stephen McKay X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 14 Nov 2012 03:51:31 -0000 On Thursday, 8th November 2012, Tom Evans wrote: >I'm upgrading my home ZFS setup, and want to speed things up a bit by >adding some SSDs for cache/log. I was hoping some more experienced >heads could offer some advice on what I've gleaned so far. Before you get excited about SSD for ZIL, measure your synchronous write rate. If you have a mostly async load, you may get little or zero improvement. To measure ZIL activity, install dtrace and run Richard Elling's zilstat script. Everyone with more than a passing interest in ZFS should do this. Measurement always beats speculation. On my workstation, I have sync writes only during email delivery, and for that I'm willing to spend the extra few milliseconds a hard disk takes so that I don't have to risk my data on a consumer grade SSD. I have no way to determine in advance the behaviour of an SSD on power failure so I assume all the ones I can afford have bad behaviour. :-) I know that expensive ones contain capacitors so that power failures do not corrupt their contents. By the nature of advertising (from which we know that any feature not excessively hyped must therefore not be supported), we must conclude that other SSDs by normal operation corrupt blocks on power failure. So, that puts SSDs (that I can afford) behind standard disks for reliability, plus I wouldn't benefit much from the speed, so I don't use an SSD for ZIL. Even if you have a sync heavy load (NFS server, say, or perhaps a time machine server via netatalk), the right answer might be to subvert those protocols so they become async. (Maybe nothing you do with those protocols actually depends on their sync guarantees, or perhaps you can recover easily from failure by restarting.) You'll only know if you have to make decisions like this (expensive reliable SSD for ZIL vs cheating at protocols) if you measure. So, measure! As for L2ARC, do you need it? It's harder to tell in advance that a cache device would be useful, but if you have sufficient RAM for your purposes, you may not need it. 
Sufficient could be approximately 1GB per 1TB of disk (other rules of thumb exist). If you enable dedup, you are unlikely to have sufficient RAM! So in this case L2ARC may be advisable. Even then, performance when using dedup may be less than you would hope for, so I recommend against enabling dedup. Remember that L2ARC is not persistent. It takes time to warm up. If you reboot often, you will get little to no use from it. If you leave your machine on all the time, eventually everything frequently used will end up in there. But, if you don't use all your RAM for ARC before you reboot anyway, your L2ARC will be (essentially) unused. Again, you have to measure at least a little bit (perhaps using the zfs-stats port) before you know. On the plus side, a corrupt L2ARC shouldn't do any more than require a reboot, so it's safe to experiment with cheap SSDs. >The drives I am thinking of getting are either Intel 330, Intel 520, >Crucial M4 RealSSD or Samsung 830, all in their 120/128GB variants. Do any of these contain capacitors for use when power fails? If not then I'd assume they are unsafe for use as ZIL and would limit them to L2ARC. If you can show that any of these somehow avoid corruption on power failure without a capacitor system, I'd love to know how that works! Cheers, Stephen. From owner-freebsd-fs@FreeBSD.ORG Wed Nov 14 04:24:29 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 8C3DBACF for ; Wed, 14 Nov 2012 04:24:29 +0000 (UTC) (envelope-from chris@behanna.org) Received: from alayta.pair.com (alayta.pair.com [209.68.4.24]) by mx1.freebsd.org (Postfix) with ESMTP id 6398D8FC0C for ; Wed, 14 Nov 2012 04:24:28 +0000 (UTC) Received: from [172.16.0.6] (99-120-175-239.lightspeed.austtx.sbcglobal.net [99.120.175.239]) by alayta.pair.com (Postfix) with ESMTPSA id 924B4D9837 for ; Tue, 13 Nov 2012 23:18:55 -0500 (EST) Subject: Re: SSD recommendations for ZFS cache/log References: <57ac1f$gf3rkl@ipmail05.adl6.internode.on.net> From: Chris BeHanna Content-Type: text/plain; charset=us-ascii X-Mailer: iPad Mail (10A523) In-Reply-To: <57ac1f$gf3rkl@ipmail05.adl6.internode.on.net> Message-Id: <943159E4-8824-4767-96E1-89E8EC69DCDF@behanna.org> Date: Tue, 13 Nov 2012 22:18:54 -0600 To: FreeBSD FS Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (1.0) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 14 Nov 2012 04:24:29 -0000 On Nov 13, 2012, at 21:51, Stephen McKay wrote: > [...lots of good advice about measuring, and lots of good advice about L2A= RC...] >=20 > I have no way to determine in advance the behaviour of an SSD on > power failure so I assume all the ones I can afford have bad > behaviour. :-) I know that expensive ones contain capacitors so > that power failures do not corrupt their contents. By the nature > of advertising (from which we know that any feature not excessively > hyped must therefore not be supported), we must conclude that other > SSDs by normal operation corrupt blocks on power failure. If you'll pardon what may be an ignorant question, does this matter if you h= ave your machine on a UPS, especially if you run upsmon or nut to do a grace= ful shutdown when there are n minutes of battery remaining? 
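(For reference, the nut side of that is only a couple of small config files. The sketch below assumes the sysutils/nut port with a USB-attached UPS; every name and password in it is a placeholder:

  # /usr/local/etc/nut/ups.conf
  [myups]
      driver = usbhid-ups
      port = auto

  # /usr/local/etc/nut/upsmon.conf
  MONITOR myups@localhost 1 upsmon somepassword master
  SHUTDOWNCMD "/sbin/shutdown -p now"

Note that by default upsmon shuts the machine down when the UPS raises its low-battery flag, rather than at a chosen number of minutes of runtime remaining.)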
Thanks, --=20 Chris BeHanna chris@behanna.org= From owner-freebsd-fs@FreeBSD.ORG Wed Nov 14 04:26:00 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 3F7DBB59 for ; Wed, 14 Nov 2012 04:26:00 +0000 (UTC) (envelope-from bryan@shatow.net) Received: from secure.xzibition.com (secure.xzibition.com [173.160.118.92]) by mx1.freebsd.org (Postfix) with ESMTP id E65C08FC14 for ; Wed, 14 Nov 2012 04:25:59 +0000 (UTC) DomainKey-Signature: a=rsa-sha1; c=nofws; d=shatow.net; h=message-id :date:from:mime-version:to:cc:subject:references:in-reply-to :content-type:content-transfer-encoding; q=dns; s=sweb; b=GlY1a8 upBE0n1YZpMU/uxfJ+lN0scza7d+AWl4sVRE9fPuLcbJdHiWXIw/iX38TjphwH11 h0KIYF46s6yLjAAWVlTenI8xn04nr/WtRdtnEBe/efZ+eIacpezR9myr6OYq9Ccr 7N5I1OzUgaNa6gxDQSmKugbIEQwwfs/qk26VI= DKIM-Signature: v=1; a=rsa-sha256; c=simple; d=shatow.net; h=message-id :date:from:mime-version:to:cc:subject:references:in-reply-to :content-type:content-transfer-encoding; s=sweb; bh=18jScapueb5t X1Lla5y/06p/6zscRRS8w3Q9uBqP3x8=; b=TDiV60WPLwoUjmFUCIhvJzLNfW5Q azRhtKEdHll0RLVKi/Bxo/X8tVud+rWbBhC+u04VAGY+hr0Mpxwal4wDu6xpIv4Y 5AJ78o8oA3L4NuW6jhzD+o68xlUqzwIrOMG8NaNPUxygZ4OGV0G8oOBE0N6BvaQ6 /dmIFjlJwEkIT98= Received: (qmail 17092 invoked from network); 13 Nov 2012 22:25:51 -0600 Received: from unknown (HELO ?10.10.0.115?) (bryan@shatow.net@10.10.0.115) by sweb.xzibition.com with ESMTPA; 13 Nov 2012 22:25:51 -0600 Message-ID: <50A31D48.3000700@shatow.net> Date: Tue, 13 Nov 2012 22:25:44 -0600 From: Bryan Drewery User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:16.0) Gecko/20121026 Thunderbird/16.0.2 MIME-Version: 1.0 To: Stephen McKay Subject: Re: SSD recommendations for ZFS cache/log References: <57ac1f$gf3rkl@ipmail05.adl6.internode.on.net> In-Reply-To: <57ac1f$gf3rkl@ipmail05.adl6.internode.on.net> X-Enigmail-Version: 1.4.5 OpenPGP: id=3C9B0CF9; url=http://www.shatow.net/bryan/bryan.asc Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: FreeBSD FS , Eitan Adler X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 14 Nov 2012 04:26:00 -0000 On 11/13/2012 9:51 PM, Stephen McKay wrote: > On Thursday, 8th November 2012, Tom Evans wrote: > >> I'm upgrading my home ZFS setup, and want to speed things up a bit by >> adding some SSDs for cache/log. I was hoping some more experienced >> heads could offer some advice on what I've gleaned so far. > > Before you get excited about SSD for ZIL, measure your synchronous > write rate. If you have a mostly async load, you may get little > or zero improvement. > > To measure ZIL activity, install dtrace and run Richard Elling's > zilstat script. Everyone with more than a passing interest in ZFS > should do this. Measurement always beats speculation. > > On my workstation, I have sync writes only during email delivery, > and for that I'm willing to spend the extra few milliseconds a > hard disk takes so that I don't have to risk my data on a consumer > grade SSD. > > I have no way to determine in advance the behaviour of an SSD on > power failure so I assume all the ones I can afford have bad > behaviour. :-) I know that expensive ones contain capacitors so > that power failures do not corrupt their contents. 
By the nature > of advertising (from which we know that any feature not excessively > hyped must therefore not be supported), we must conclude that other > SSDs by normal operation corrupt blocks on power failure. > > So, that puts SSDs (that I can afford) behind standard disks for > reliability, plus I wouldn't benefit much from the speed, so I don't > use an SSD for ZIL. > > Even if you have a sync heavy load (NFS server, say, or perhaps a > time machine server via netatalk), the right answer might be to > subvert those protocols so they become async. (Maybe nothing you > do with those protocols actually depends on their sync guarantees, > or perhaps you can recover easily from failure by restarting.) > You'll only know if you have to make decisions like this (expensive > reliable SSD for ZIL vs cheating at protocols) if you measure. So, > measure! > > As for L2ARC, do you need it? It's harder to tell in advance that > a cache device would be useful, but if you have sufficient RAM for > your purposes, you may not need it. Sufficient could be approximately > 1GB per 1TB of disk (other rules of thumb exist). > > If you enable dedup, you are unlikely to have sufficient RAM! So > in this case L2ARC may be advisable. Even then, performance when > using dedup may be less than you would hope for, so I recommend > against enabling dedup. > > Remember that L2ARC is not persistent. It takes time to warm up. > If you reboot often, you will get little to no use from it. If > you leave your machine on all the time, eventually everything > frequently used will end up in there. But, if you don't use all > your RAM for ARC before you reboot anyway, your L2ARC will be > (essentially) unused. Again, you have to measure at least a little > bit (perhaps using the zfs-stats port) before you know. > > On the plus side, a corrupt L2ARC shouldn't do any more than require > a reboot, so it's safe to experiment with cheap SSDs. > >> The drives I am thinking of getting are either Intel 330, Intel 520, >> Crucial M4 RealSSD or Samsung 830, all in their 120/128GB variants. > > Do any of these contain capacitors for use when power fails? If not > then I'd assume they are unsafe for use as ZIL and would limit them > to L2ARC. If you can show that any of these somehow avoid corruption > on power failure without a capacitor system, I'd love to know how that > works! > > Cheers, > IMHO this whole post should be enshrined into an FAQ or manpage or wiki. It's very informative and compelling. > Stephen. 
> _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > From owner-freebsd-fs@FreeBSD.ORG Wed Nov 14 04:27:14 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 8A64EBCF for ; Wed, 14 Nov 2012 04:27:14 +0000 (UTC) (envelope-from bryan@shatow.net) Received: from secure.xzibition.com (secure.xzibition.com [173.160.118.92]) by mx1.freebsd.org (Postfix) with ESMTP id 2B4C48FC12 for ; Wed, 14 Nov 2012 04:27:13 +0000 (UTC) DomainKey-Signature: a=rsa-sha1; c=nofws; d=shatow.net; h=message-id :date:from:mime-version:to:cc:subject:references:in-reply-to :content-type:content-transfer-encoding; q=dns; s=sweb; b=z5jvbZ nYmwJwX/0Cdi+b7Gik412oi7SX1FXsAgYBuBA07559DObt7bsouqLpUnj4o2LZqZ xfbKzFZPuJemhqh04t5n2SyxyIKmiQltIjwVo5VfdpRe1JpIYX9J+dRtKLVgihXH PAurqyOZbnWwh+EegcROubViGM2ZJLYNd8JfI= DKIM-Signature: v=1; a=rsa-sha256; c=simple; d=shatow.net; h=message-id :date:from:mime-version:to:cc:subject:references:in-reply-to :content-type:content-transfer-encoding; s=sweb; bh=iTF9KVd6zwV4 ucx9SSyI3buAOJR/eB/XeFO91YkX/PQ=; b=Av6IdI3WKzGRLC/dDGtf+aMsJfZq q4HLyem96z0h5i+fVT5AkVu5vhQdRb4JWuu72scKgAQdXEYgxaqnfWuoXRtkSb0r g27TWlndc/35a8K4ikcasUBl6B0G8kP0qp3ujuODM8OdyGy2+8i044egNrlcMb3/ 1MDicMCpILXJOJ4= Received: (qmail 44339 invoked from network); 13 Nov 2012 22:27:12 -0600 Received: from unknown (HELO ?10.10.0.115?) (bryan@shatow.net@10.10.0.115) by sweb.xzibition.com with ESMTPA; 13 Nov 2012 22:27:12 -0600 Message-ID: <50A31D9A.7020200@shatow.net> Date: Tue, 13 Nov 2012 22:27:06 -0600 From: Bryan Drewery User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:16.0) Gecko/20121026 Thunderbird/16.0.2 MIME-Version: 1.0 To: Chris BeHanna Subject: Re: SSD recommendations for ZFS cache/log References: <57ac1f$gf3rkl@ipmail05.adl6.internode.on.net> <943159E4-8824-4767-96E1-89E8EC69DCDF@behanna.org> In-Reply-To: <943159E4-8824-4767-96E1-89E8EC69DCDF@behanna.org> X-Enigmail-Version: 1.4.5 OpenPGP: id=3C9B0CF9; url=http://www.shatow.net/bryan/bryan.asc Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 14 Nov 2012 04:27:14 -0000 On 11/13/2012 10:18 PM, Chris BeHanna wrote: > On Nov 13, 2012, at 21:51, Stephen McKay wrote: > >> [...lots of good advice about measuring, and lots of good advice about L2ARC...] >> >> I have no way to determine in advance the behaviour of an SSD on >> power failure so I assume all the ones I can afford have bad >> behaviour. :-) I know that expensive ones contain capacitors so >> that power failures do not corrupt their contents. By the nature >> of advertising (from which we know that any feature not excessively >> hyped must therefore not be supported), we must conclude that other >> SSDs by normal operation corrupt blocks on power failure. > > If you'll pardon what may be an ignorant question, does this matter if you have your machine on a UPS, especially if you run upsmon or nut to do a graceful shutdown when there are n minutes of battery remaining? > I've had more than 1 UPS battery die on me, resulting in instant shutoff. 
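That failure mode is an argument for polling the UPS rather than trusting it. If nut is already running for the shutdown handling, a periodic check of the battery variables gives some warning before the battery quietly dies (the UPS name myups is whatever was defined in ups.conf, and not every UPS reports all of these):

  upsc myups battery.charge
  upsc myups battery.runtime
  upsc myups ups.status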
> Thanks, > From owner-freebsd-fs@FreeBSD.ORG Wed Nov 14 05:10:40 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 00E639E0 for ; Wed, 14 Nov 2012 05:10:39 +0000 (UTC) (envelope-from chris@behanna.org) Received: from relay01.pair.com (relay01.pair.com [209.68.5.15]) by mx1.freebsd.org (Postfix) with SMTP id A87F78FC08 for ; Wed, 14 Nov 2012 05:10:39 +0000 (UTC) Received: (qmail 83990 invoked by uid 0); 14 Nov 2012 05:03:57 -0000 Received: from 99.120.175.239 (HELO scythe.behanna.org) (99.120.175.239) by relay01.pair.com with SMTP; 14 Nov 2012 05:03:57 -0000 X-pair-Authenticated: 99.120.175.239 Content-Type: text/plain; charset=iso-8859-1 Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\)) Subject: Re: SSD recommendations for ZFS cache/log From: Chris BeHanna In-Reply-To: <50A31D9A.7020200@shatow.net> Date: Tue, 13 Nov 2012 23:03:58 -0600 Content-Transfer-Encoding: quoted-printable Message-Id: References: <57ac1f$gf3rkl@ipmail05.adl6.internode.on.net> <943159E4-8824-4767-96E1-89E8EC69DCDF@behanna.org> <50A31D9A.7020200@shatow.net> To: FreeBSD FS X-Mailer: Apple Mail (2.1499) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 14 Nov 2012 05:10:40 -0000 On Nov 13, 2012, at 22:27 , Bryan Drewery wrote: > On 11/13/2012 10:18 PM, Chris BeHanna wrote: >> On Nov 13, 2012, at 21:51, Stephen McKay wrote: >> >>> [...lots of good advice about measuring, and lots of good advice about L2ARC...] >>> >>> I have no way to determine in advance the behaviour of an SSD on >>> power failure so I assume all the ones I can afford have bad >>> behaviour. :-) I know that expensive ones contain capacitors so >>> that power failures do not corrupt their contents. By the nature >>> of advertising (from which we know that any feature not excessively >>> hyped must therefore not be supported), we must conclude that other >>> SSDs by normal operation corrupt blocks on power failure. >> >> If you'll pardon what may be an ignorant question, does this matter if you have your machine on a UPS, especially if you run upsmon or nut to do a graceful shutdown when there are n minutes of battery remaining? > > I've had more than 1 UPS battery die on me, resulting in instant shutoff. Mine always die at 0300 or thereabouts, and none of my UPSen has an "I know, now STFU" button. I would gather that the extra expense of a capacitor-backed SSD if you already have a UPS (with relatively new batteries) depends upon the particular use case. Banking data? Hell yeah. Home office? Meh. I might lose a few pieces of spam and the last few minutes of work from my text editor.
--=20 Chris BeHanna chris@behanna.org From owner-freebsd-fs@FreeBSD.ORG Wed Nov 14 05:21:13 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 9FE02B1D; Wed, 14 Nov 2012 05:21:13 +0000 (UTC) (envelope-from ryao@gentoo.org) Received: from smtp.gentoo.org (smtp.gentoo.org [140.211.166.183]) by mx1.freebsd.org (Postfix) with ESMTP id 0D5838FC08; Wed, 14 Nov 2012 05:21:13 +0000 (UTC) Received: from [192.168.1.2] (pool-173-77-245-118.nycmny.fios.verizon.net [173.77.245.118]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) (Authenticated sender: ryao) by smtp.gentoo.org (Postfix) with ESMTPSA id EC3D033DA81; Wed, 14 Nov 2012 05:21:11 +0000 (UTC) Message-ID: <50A329A4.9090304@gentoo.org> Date: Wed, 14 Nov 2012 00:18:28 -0500 From: Richard Yao User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.10) Gecko/20121107 Thunderbird/10.0.10 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Port of ZFSOnLinux solution for illumos-gate issue #2663 to FreeBSD X-Enigmail-Version: 1.3.5 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="------------enig11BA0E7B797DBD67F4D1FE6F" Cc: Eitan Adler X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 14 Nov 2012 05:21:13 -0000 This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enig11BA0E7B797DBD67F4D1FE6F Content-Type: multipart/mixed; boundary="------------000801070506070506070302" This is a multi-part message in MIME format. --------------000801070506070506070302 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Dear Everyone, I am the Gentoo Linux ZFS maintainer as well as part of the Gentoo BSD team. I have attached a patch that ports the ZFSOnLinux solution of illumos-gate issue #2663 to FreeBSD-HEAD. It should also apply against stable, with fuzz. This permits users to avoid fiddling with gnop when making pools on drives that lie about their sector sizes. There are a few things to note about this patch: 1. This does not apply to `zpool add`, `zpool attach` and `zpool replace`. A separate patch for that is being reviewed in ZFSOnLinux at this time. I will port it separately after it is committed. 2. This has not been sent to Illumos upstream. As a Gentoo BSD team developer, I am in a much better position to send code to FreeBSD than to send code to Illumos. I expect that Martin Matuska will port this change to Illumos after it is accepted into FreeBSD, so I assume that this is okay. 3. ZFSOnLinux enforces the CDDL's attribution requirement by relying on commit messages and metadata. FreeBSD and Illumos satisfy it by adding copyright notices to files. I have tried to translate the ZFS attribution policy by adding appropriate copyright notices for non-trivial changes. I would expect this to pass review by the Gentoo Foundation members that review licensing for Gentoo, so I assume that this is okay. I have discussed committing this patch to FreeBSD with Eitan Adler. He requires one of the FreeBSD Filesystem developers to acknowledge it as being appropriate for the tree before he will commit it. 
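For context, the gnop fiddling this patch makes unnecessary goes roughly like this today (ada0 is an example device):

  # create a transparent provider that advertises 4 KiB sectors
  gnop create -S 4096 /dev/ada0
  zpool create tank ada0.nop

  # the .nop device does not survive a reboot, so detach it cleanly
  zpool export tank
  gnop destroy /dev/ada0.nop
  zpool import tank

With the patch applied, the same effect becomes a property at creation time, as in the example from the patch description:

  zpool create -o ashift=12 tank ada0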
Yours truly, Richard Yao --------------000801070506070506070302 Content-Type: text/x-patch; name="0001-Add-ashift-property-to-zpool-create.patch" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="0001-Add-ashift-property-to-zpool-create.patch" =46rom 6a07263af0d51eb5031ba43e299e5d6cdc7ea1f0 Mon Sep 17 00:00:00 2001 From: =3D?UTF-8?q?Christian=3D20Kohlsch=3DC3=3DBCtter?=3D Date: Sat, 10 Nov 2012 12:09:35 -0500 Subject: [PATCH] Add "ashift" property to zpool create Some disks with internal sectors larger than 512 bytes (e.g., 4k) can suffer from bad write performance when ashift is not configured correctly. This is caused by the disk not reporting its actual sector size, but a sector size of 512 bytes. The drive may behave this way for compatibility reasons. For example, the WDC WD20EARS disks are known to exhibit this behavior. When creating a zpool, ZFS takes that wrong sector size and sets the "ashift" property accordingly (to 9: 1<<9=3D512), whereas it should be set to 12 for 4k sectors (1<<12=3D4096). This patch allows an adminstrator to manual specify the known correct ashift size at 'zpool create' time. This can significantly improve performance in certain cases. However, it will have an impact on your total pool capacity. See the updated ashift property description in the zpool.8 man page for additional details. Valid values for the ashift property range from 9 to 13 (512B-8KB). Additionally, you may set the ashift to 0 if you wish to auto-detect the sector size based on what the disk reports, this is the default behavior. The most common ashift values are 9 and 12. Example: zpool create -o ashift=3D12 tank raidz2 sda sdb sdc sdd Closed zfsonlinux/zfs#280 This patch was modified during its port by by Richard Yao to include a la= ter change he wrote to reduce the effective maximum ashift value to 13 due to= pool corruption issues discovered with higher values. Original-patch-by: Richard Laager Signed-off-by: Brian Behlendorf Ported-by: Richard Yao Man-page-ported-by: Eitan Adler --- cddl/contrib/opensolaris/cmd/zpool/zpool.8 | 20 ++++++++++++++- cddl/contrib/opensolaris/cmd/zpool/zpool_main.c | 6 ++--- cddl/contrib/opensolaris/cmd/zpool/zpool_util.h | 4 +-- cddl/contrib/opensolaris/cmd/zpool/zpool_vdev.c | 29 ++++++++++++++++= ------ .../opensolaris/lib/libzfs/common/libzfs_pool.c | 21 ++++++++++++++++= .../contrib/opensolaris/common/zfs/zpool_prop.c | 4 +++ .../contrib/opensolaris/uts/common/sys/fs/zfs.h | 1 + 7 files changed, 72 insertions(+), 13 deletions(-) diff --git a/cddl/contrib/opensolaris/cmd/zpool/zpool.8 b/cddl/contrib/op= ensolaris/cmd/zpool/zpool.8 index 88fc79b..8ed120a 100644 --- a/cddl/contrib/opensolaris/cmd/zpool/zpool.8 +++ b/cddl/contrib/opensolaris/cmd/zpool/zpool.8 @@ -22,10 +22,11 @@ .\" Copyright (c) 2011, Justin T. Gibbs .\" Copyright (c) 2012 by Delphix. All Rights Reserved. .\" Copyright (c) 2012, Glen Barber +.\" Copyright (c) 2012, Richard Laager .\" .\" $FreeBSD$ .\" -.Dd November 28, 2011 +.Dd November 11, 2012 .Dt ZPOOL 8 .Os .Sh NAME @@ -589,6 +590,23 @@ command does not. For non-full pools of a reasonable= size, these effects should be invisible. For small pools, or pools that are close to being complete= ly full, these discrepancies may become more noticeable. .Pp +The following property can be set at creation time: +.Bl -tag -width 2n +.It Sy afigt Ns =3D Ns Ar number +Pool sector size exponent, to the power of 2 (internally referred +to as "ashift"). 
+I/O operations will be aligned to the specified size boundaries. +Additionally, the minimum (disk) write size will be set to the +specified size, +so this represents a space vs. performance trade-off. +The typical case for setting this property is when performance is +important and the underlying disks use 4KiB sectors but report 512B +sectors to the OS for compatibility reasons; +in that case, set +.Cm ashift=3D12 +(which is 1<<12 =3D 4096). +.El +.Pp The following property can be set at creation time and import time: .Bl -tag -width 2n .It Sy altroot diff --git a/cddl/contrib/opensolaris/cmd/zpool/zpool_main.c b/cddl/contr= ib/opensolaris/cmd/zpool/zpool_main.c index b57c816..9840dc1 100644 --- a/cddl/contrib/opensolaris/cmd/zpool/zpool_main.c +++ b/cddl/contrib/opensolaris/cmd/zpool/zpool_main.c @@ -544,7 +544,7 @@ zpool_do_add(int argc, char **argv) } =20 /* pass off to get_vdev_spec for processing */ - nvroot =3D make_root_vdev(zhp, force, !force, B_FALSE, dryrun, + nvroot =3D make_root_vdev(zhp, NULL, force, !force, B_FALSE, dryrun, argc, argv); if (nvroot =3D=3D NULL) { zpool_close(zhp); @@ -884,7 +884,7 @@ zpool_do_create(int argc, char **argv) } =20 /* pass off to get_vdev_spec for bulk processing */ - nvroot =3D make_root_vdev(NULL, force, !force, B_FALSE, dryrun, + nvroot =3D make_root_vdev(NULL, props, force, !force, B_FALSE, dryrun, argc - 1, argv + 1); if (nvroot =3D=3D NULL) goto errout; @@ -3162,7 +3162,7 @@ zpool_do_attach_or_replace(int argc, char **argv, i= nt replacing) return (1); } =20 - nvroot =3D make_root_vdev(zhp, force, B_FALSE, replacing, B_FALSE, + nvroot =3D make_root_vdev(zhp, NULL, force, B_FALSE, replacing, B_FALSE= , argc, argv); if (nvroot =3D=3D NULL) { zpool_close(zhp); diff --git a/cddl/contrib/opensolaris/cmd/zpool/zpool_util.h b/cddl/contr= ib/opensolaris/cmd/zpool/zpool_util.h index 134c730..b67ff8b 100644 --- a/cddl/contrib/opensolaris/cmd/zpool/zpool_util.h +++ b/cddl/contrib/opensolaris/cmd/zpool/zpool_util.h @@ -43,8 +43,8 @@ uint_t num_logs(nvlist_t *nv); * Virtual device functions */ =20 -nvlist_t *make_root_vdev(zpool_handle_t *zhp, int force, int check_rep, - boolean_t replacing, boolean_t dryrun, int argc, char **argv); +nvlist_t *make_root_vdev(zpool_handle_t *zhp, nvlist_t *props, int force= , + int check_rep, boolean_t replacing, boolean_t dryrun, int argc, char= **argv); nvlist_t *split_mirror_vdev(zpool_handle_t *zhp, char *newname, nvlist_t *props, splitflags_t flags, int argc, char **argv); =20 diff --git a/cddl/contrib/opensolaris/cmd/zpool/zpool_vdev.c b/cddl/contr= ib/opensolaris/cmd/zpool/zpool_vdev.c index 5ffd39a..75dbb94 100644 --- a/cddl/contrib/opensolaris/cmd/zpool/zpool_vdev.c +++ b/cddl/contrib/opensolaris/cmd/zpool/zpool_vdev.c @@ -21,6 +21,8 @@ =20 /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights re= served. + * Copyright (c) 2011 by Christian Kohlsch=C3=BCtter . + * All rights reserved. 
*/ =20 /* @@ -414,7 +416,7 @@ is_whole_disk(const char *arg) * xxx Shorthand for /dev/dsk/xxx */ static nvlist_t * -make_leaf_vdev(const char *arg, uint64_t is_log) +make_leaf_vdev(nvlist_t *props, const char *arg, uint64_t is_log) { char path[MAXPATHLEN]; struct stat64 statbuf; @@ -512,6 +514,19 @@ make_leaf_vdev(const char *arg, uint64_t is_log) verify(nvlist_add_uint64(vdev, ZPOOL_CONFIG_WHOLE_DISK, (uint64_t)wholedisk) =3D=3D 0); =20 + if (props !=3D NULL) { + uint64_t ashift =3D 0; + char *value =3D NULL; + + if (nvlist_lookup_string(props, + zpool_prop_to_name(ZPOOL_PROP_ASHIFT), &value) =3D=3D 0) + zfs_nicestrtonum(NULL, value, &ashift); + + if (ashift > 0) + verify(nvlist_add_uint64(vdev, ZPOOL_CONFIG_ASHIFT, + ashift) =3D=3D 0); + } + /* * For a whole disk, defer getting its devid until after labeling it. */ @@ -1198,7 +1213,7 @@ is_grouping(const char *type, int *mindev, int *max= dev) * because the program is just going to exit anyway. */ nvlist_t * -construct_spec(int argc, char **argv) +construct_spec(nvlist_t *props, int argc, char **argv) { nvlist_t *nvroot, *nv, **top, **spares, **l2cache; int t, toplevels, mindev, maxdev, nspares, nlogs, nl2cache; @@ -1287,7 +1302,7 @@ construct_spec(int argc, char **argv) children * sizeof (nvlist_t *)); if (child =3D=3D NULL) zpool_no_memory(); - if ((nv =3D make_leaf_vdev(argv[c], B_FALSE)) + if ((nv =3D make_leaf_vdev(props, argv[c], B_FALSE)) =3D=3D NULL) return (NULL); child[children - 1] =3D nv; @@ -1343,7 +1358,7 @@ construct_spec(int argc, char **argv) * We have a device. Pass off to make_leaf_vdev() to * construct the appropriate nvlist describing the vdev. */ - if ((nv =3D make_leaf_vdev(argv[0], is_log)) =3D=3D NULL) + if ((nv =3D make_leaf_vdev(props, argv[0], is_log)) =3D=3D NULL) return (NULL); if (is_log) nlogs++; @@ -1409,7 +1424,7 @@ split_mirror_vdev(zpool_handle_t *zhp, char *newnam= e, nvlist_t *props, uint_t c, children; =20 if (argc > 0) { - if ((newroot =3D construct_spec(argc, argv)) =3D=3D NULL) { + if ((newroot =3D construct_spec(props, argc, argv)) =3D=3D NULL) { (void) fprintf(stderr, gettext("Unable to build a " "pool from the specified devices\n")); return (NULL); @@ -1461,7 +1476,7 @@ split_mirror_vdev(zpool_handle_t *zhp, char *newnam= e, nvlist_t *props, * added, even if they appear in use. */ nvlist_t * -make_root_vdev(zpool_handle_t *zhp, int force, int check_rep, +make_root_vdev(zpool_handle_t *zhp, nvlist_t *props, int force, int chec= k_rep, boolean_t replacing, boolean_t dryrun, int argc, char **argv) { nvlist_t *newroot; @@ -1473,7 +1488,7 @@ make_root_vdev(zpool_handle_t *zhp, int force, int = check_rep, * that we have a valid specification, and that all devices can be * opened. */ - if ((newroot =3D construct_spec(argc, argv)) =3D=3D NULL) + if ((newroot =3D construct_spec(props, argc, argv)) =3D=3D NULL) return (NULL); =20 if (zhp && ((poolconfig =3D zpool_get_config(zhp, NULL)) =3D=3D NULL)) diff --git a/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_pool.c b/c= ddl/contrib/opensolaris/lib/libzfs/common/libzfs_pool.c index 03bc3e6..a904f4f 100644 --- a/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_pool.c +++ b/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_pool.c @@ -22,6 +22,8 @@ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights re= served. * Copyright 2011 Nexenta Systems, Inc. All rights reserved. + * Copyright (c) 2011 by Christian Kohlsch=C3=BCtter . + * All rights reserved. * Copyright (c) 2012 by Delphix. All rights reserved. 
*/ =20 @@ -304,6 +306,7 @@ zpool_get_prop(zpool_handle_t *zhp, zpool_prop_t prop= , char *buf, size_t len, case ZPOOL_PROP_FREE: case ZPOOL_PROP_FREEING: case ZPOOL_PROP_EXPANDSZ: + case ZPOOL_PROP_ASHIFT: (void) zfs_nicenum(intval, buf, len); break; =20 @@ -512,6 +515,24 @@ zpool_valid_proplist(libzfs_handle_t *hdl, const cha= r *poolname, } break; =20 + case ZPOOL_PROP_ASHIFT: + if (!flags.create) { + zfs_error_aux(hdl, dgettext(TEXT_DOMAIN, + "property '%s' can only be set at " + "creation time"), propname); + (void) zfs_error(hdl, EZFS_BADPROP, errbuf); + goto error; + } + + if (intval !=3D 0 && (intval < 9 || intval > 13)) { + zfs_error_aux(hdl, dgettext(TEXT_DOMAIN, + "property '%s' number %d is invalid."), + propname, intval); + (void) zfs_error(hdl, EZFS_BADPROP, errbuf); + goto error; + } + break; + case ZPOOL_PROP_BOOTFS: if (flags.create || flags.import) { zfs_error_aux(hdl, dgettext(TEXT_DOMAIN, diff --git a/sys/cddl/contrib/opensolaris/common/zfs/zpool_prop.c b/sys/c= ddl/contrib/opensolaris/common/zfs/zpool_prop.c index 72db879..06e1cff 100644 --- a/sys/cddl/contrib/opensolaris/common/zfs/zpool_prop.c +++ b/sys/cddl/contrib/opensolaris/common/zfs/zpool_prop.c @@ -95,6 +95,10 @@ zpool_prop_init(void) PROP_READONLY, ZFS_TYPE_POOL, "<1.00x or higher if deduped>", "DEDUP"); =20 + /* readonly onetime number properties */ + zprop_register_number(ZPOOL_PROP_ASHIFT, "ashift", 0, PROP_ONETIME, + ZFS_TYPE_POOL, "", "ASHIFT"); + /* default number properties */ zprop_register_number(ZPOOL_PROP_VERSION, "version", SPA_VERSION, PROP_DEFAULT, ZFS_TYPE_POOL, "", "VERSION"); diff --git a/sys/cddl/contrib/opensolaris/uts/common/sys/fs/zfs.h b/sys/c= ddl/contrib/opensolaris/uts/common/sys/fs/zfs.h index 64fd2e6..e113064 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/sys/fs/zfs.h +++ b/sys/cddl/contrib/opensolaris/uts/common/sys/fs/zfs.h @@ -170,6 +170,7 @@ typedef enum { ZPOOL_PROP_FREE, ZPOOL_PROP_ALLOCATED, ZPOOL_PROP_READONLY, + ZPOOL_PROP_ASHIFT, ZPOOL_PROP_COMMENT, ZPOOL_PROP_EXPANDSZ, ZPOOL_PROP_FREEING, --=20 1.8.0 --------------000801070506070506070302-- --------------enig11BA0E7B797DBD67F4D1FE6F Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQIcBAEBAgAGBQJQoymrAAoJECDuEZm+6ExkULAQAIY8/oC81TxnbtL+AOHBr2hL A8Z0goK6HB/eI/4syOquP42CUcBBWSMNw9/trkqEMqI+VUuRVYk+G4A17+iZO6wo L6EnGU6EvZmh5Uj44yGSez33ffe4nrtiLzIOmrKm2SLHkyXTZ+C360ACLkAOIZr0 tT0n/cr9p0wR3EUv809Hi/ofzzXS+Pb8MmT1rmQgTYQNAuNYddPxB3vOdDpz4khA v0PhFKKB6wsje2L4KRCbxK8zdvWaNOE0ITF9gY+n52+eHfh9ZSlINhzGcfFV8UT/ YAbvhvejDqo4KsDYovp/hkDZCk7amRUn2mPBP2IDGZTyCaXsnxlgesjeJG6Ptd/O DtKaN8afsiipAtzWCynv5YJzwwDkLt1uCaFOvxwO9/rNgyBdxfWA0FtDezKnKOxn rjStdCujtWkmDF9ensAH/Vk7LElL6fSIq4PX9IK/Y+43OpSd9wOhX/iUPgOPE4Ar cV0okw68aLrJcuhIjMXMsaj6z50TOurHUmcqKnzZpFEKPgH77As5iQMovF8aZFOG pRzcSszxuSUNZo9M5j7kcn/8cB7v1jul7tJ7PjF5Qw5rUdqNx197VPDY5ZOUfIu4 HEovTxLk5kJaYrgqi8miY+3R4TKbRIDZ1wEhEL5yqUp5qz2DNnSFTZkIC+PbKjtJ 3hkGG0jC1hudRk/gjbxW =kWL+ -----END PGP SIGNATURE----- --------------enig11BA0E7B797DBD67F4D1FE6F-- From owner-freebsd-fs@FreeBSD.ORG Wed Nov 14 05:34:14 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 0CB19DC6; Wed, 14 Nov 2012 05:34:14 +0000 (UTC) (envelope-from 
prvs=166515a72b=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 381818FC0C; Wed, 14 Nov 2012 05:34:12 +0000 (UTC) Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50001052098.msg; Wed, 14 Nov 2012 05:34:03 +0000 X-Spam-Processed: mail1.multiplay.co.uk, Wed, 14 Nov 2012 05:34:03 +0000 (not processed: message from valid local sender) X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=166515a72b=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <3AD8FAE474C24860B57C75DEBE431126@multiplay.co.uk> From: "Steven Hartland" To: "Richard Yao" , References: <50A329A4.9090304@gentoo.org> Subject: Re: Port of ZFSOnLinux solution for illumos-gate issue #2663 to FreeBSD Date: Wed, 14 Nov 2012 05:34:02 -0000 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: Eitan Adler X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 14 Nov 2012 05:34:14 -0000 Useful stuff, you might be interested in the follow:- Teach ZFS about geom stripe size so zpools are created with optimum ashift http://www.freebsd.org/cgi/query-pr.cgi?pr=173115 I've got an updated patch which allows the min ashift to be configured. It should be noted that the max working ashift appears to be 17 due the way labels use zio_read_phys with offset. Regards Steve ----- Original Message ----- From: "Richard Yao" To: Cc: "Eitan Adler" Sent: Wednesday, November 14, 2012 5:18 AM Subject: Port of ZFSOnLinux solution for illumos-gate issue #2663 to FreeBSD Dear Everyone, I am the Gentoo Linux ZFS maintainer as well as part of the Gentoo BSD team. I have attached a patch that ports the ZFSOnLinux solution of illumos-gate issue #2663 to FreeBSD-HEAD. It should also apply against stable, with fuzz. This permits users to avoid fiddling with gnop when making pools on drives that lie about their sector sizes. There are a few things to note about this patch: 1. This does not apply to `zpool add`, `zpool attach` and `zpool replace`. A separate patch for that is being reviewed in ZFSOnLinux at this time. I will port it separately after it is committed. 2. This has not been sent to Illumos upstream. As a Gentoo BSD team developer, I am in a much better position to send code to FreeBSD than to send code to Illumos. I expect that Martin Matuska will port this change to Illumos after it is accepted into FreeBSD, so I assume that this is okay. 3. ZFSOnLinux enforces the CDDL's attribution requirement by relying on commit messages and metadata. FreeBSD and Illumos satisfy it by adding copyright notices to files. I have tried to translate the ZFS attribution policy by adding appropriate copyright notices for non-trivial changes. I would expect this to pass review by the Gentoo Foundation members that review licensing for Gentoo, so I assume that this is okay. I have discussed committing this patch to FreeBSD with Eitan Adler. He requires one of the FreeBSD Filesystem developers to acknowledge it as being appropriate for the tree before he will commit it. 
Yours truly, Richard Yao ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-fs@FreeBSD.ORG Wed Nov 14 06:34:02 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id A928F8C0 for ; Wed, 14 Nov 2012 06:34:02 +0000 (UTC) (envelope-from spork@bway.net) Received: from smtp3.bway.net (smtp3.bway.net [216.220.96.27]) by mx1.freebsd.org (Postfix) with ESMTP id 5C0DC8FC0C for ; Wed, 14 Nov 2012 06:34:01 +0000 (UTC) Received: from toasty.sporklab.com (foon.sporktines.com [96.57.144.66]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) (Authenticated sender: spork@bway.net) by smtp3.bway.net (Postfix) with ESMTPSA id 08B2A95868; Wed, 14 Nov 2012 01:25:22 -0500 (EST) References: <57ac1f$gf3rkl@ipmail05.adl6.internode.on.net> In-Reply-To: <57ac1f$gf3rkl@ipmail05.adl6.internode.on.net> Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii Message-Id: Content-Transfer-Encoding: quoted-printable From: Charles Sprickman Subject: Re: SSD recommendations for ZFS cache/log Date: Wed, 14 Nov 2012 01:25:22 -0500 To: Stephen McKay X-Mailer: Apple Mail (2.1084) Cc: Tom Evans , FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 14 Nov 2012 06:34:03 -0000 On Nov 13, 2012, at 10:51 PM, Stephen McKay wrote: > On Thursday, 8th November 2012, Tom Evans wrote: >=20 >> I'm upgrading my home ZFS setup, and want to speed things up a bit by >> adding some SSDs for cache/log. I was hoping some more experienced >> heads could offer some advice on what I've gleaned so far. >=20 > Before you get excited about SSD for ZIL, measure your synchronous > write rate. If you have a mostly async load, you may get little > or zero improvement. >=20 > To measure ZIL activity, install dtrace and run Richard Elling's > zilstat script. Everyone with more than a passing interest in ZFS > should do this. Measurement always beats speculation. >=20 > On my workstation, I have sync writes only during email delivery, > and for that I'm willing to spend the extra few milliseconds a > hard disk takes so that I don't have to risk my data on a consumer > grade SSD. >=20 > I have no way to determine in advance the behaviour of an SSD on > power failure so I assume all the ones I can afford have bad > behaviour. :-) I know that expensive ones contain capacitors so > that power failures do not corrupt their contents. By the nature > of advertising (from which we know that any feature not excessively > hyped must therefore not be supported), we must conclude that other > SSDs by normal operation corrupt blocks on power failure. >=20 > So, that puts SSDs (that I can afford) behind standard disks for > reliability, plus I wouldn't benefit much from the speed, so I don't > use an SSD for ZIL. 
>=20 > Even if you have a sync heavy load (NFS server, say, or perhaps a > time machine server via netatalk), the right answer might be to > subvert those protocols so they become async. (Maybe nothing you > do with those protocols actually depends on their sync guarantees, > or perhaps you can recover easily from failure by restarting.) > You'll only know if you have to make decisions like this (expensive > reliable SSD for ZIL vs cheating at protocols) if you measure. So, > measure! >=20 > As for L2ARC, do you need it? It's harder to tell in advance that > a cache device would be useful, but if you have sufficient RAM for > your purposes, you may not need it. Sufficient could be approximately > 1GB per 1TB of disk (other rules of thumb exist). >=20 > If you enable dedup, you are unlikely to have sufficient RAM! So > in this case L2ARC may be advisable. Even then, performance when > using dedup may be less than you would hope for, so I recommend > against enabling dedup. >=20 > Remember that L2ARC is not persistent. It takes time to warm up. > If you reboot often, you will get little to no use from it. If > you leave your machine on all the time, eventually everything > frequently used will end up in there. But, if you don't use all > your RAM for ARC before you reboot anyway, your L2ARC will be > (essentially) unused. Again, you have to measure at least a little > bit (perhaps using the zfs-stats port) before you know. >=20 > On the plus side, a corrupt L2ARC shouldn't do any more than require > a reboot, so it's safe to experiment with cheap SSDs. >=20 >> The drives I am thinking of getting are either Intel 330, Intel 520, >> Crucial M4 RealSSD or Samsung 830, all in their 120/128GB variants. >=20 > Do any of these contain capacitors for use when power fails? =20 I may be out of date on this, but when I last looked, the Intel 320s were the only "consumer" drives that were safe: = http://newsroom.intel.com/servlet/JiveServlet/download/38-4324/Intel_SSD_3= 20_Series_Enhance_Power_Loss_Technology_Brief.pdf (apologies for the = pdf, but there's Intel actually saying the drives are safe) This expands on it a bit: http://blog.2ndquadrant.com/intel_ssd_now_off_the_sherr_sh/ And this post contains results of some "pull the plug" tests: = http://archives.postgresql.org/message-id/4D9D1FC3.4020207@2ndQuadrant.com= This post also contains some interesting thoughts on the lifetime of = these drives: http://blog.2ndquadrant.com/intel_ssds_lifetime_and_the_32/ Charles > If not > then I'd assume they are unsafe for use as ZIL and would limit them > to L2ARC. If you can show that any of these somehow avoid corruption > on power failure without a capacitor system, I'd love to know how that > works! >=20 > Cheers, >=20 > Stephen. 
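On the measurement point above: a minimal way to run Richard Elling's zilstat
on FreeBSD looks something like the following (assuming the script has been
fetched separately and a ksh is installed; argument order and output columns
vary a little between versions of the script):

    kldload dtraceall       # make sure the DTrace providers are loaded
    ksh ./zilstat.ksh 10 6  # sample ZIL bytes/ops every 10 seconds, 6 samples

If the counters stay at or near zero under a representative workload, a
dedicated log device is unlikely to buy much.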
> _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Wed Nov 14 06:37:18 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id E693299F for ; Wed, 14 Nov 2012 06:37:18 +0000 (UTC) (envelope-from smckay@internode.on.net) Received: from ipmail05.adl6.internode.on.net (ipmail05.adl6.internode.on.net [IPv6:2001:44b8:8060:ff02:300:1:6:5]) by mx1.freebsd.org (Postfix) with ESMTP id 355558FC14 for ; Wed, 14 Nov 2012 06:37:18 +0000 (UTC) Message-Id: <57ac1f$gf6p7c@ipmail05.adl6.internode.on.net> X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AitWAJI6o1ABigXDPGdsb2JhbAAuFoU0hSO4ZxgBAQEBODSCHgEBBAF5EAgDDQsTG0MUBhMZh2sFx04CAQIWhQuBBgOcCQONNg Received: from unknown (HELO localhost) ([1.138.5.195]) by ipmail05.adl6.internode.on.net with ESMTP; 14 Nov 2012 17:07:16 +1030 From: Stephen McKay To: Chris BeHanna Subject: Re: SSD recommendations for ZFS cache/log References: <57ac1f$gf3rkl@ipmail05.adl6.internode.on.net><943159E4-8824-4767-96E1-89E8EC69DCDF@behanna.org> In-Reply-To: <943159E4-8824-4767-96E1-89E8EC69DCDF@behanna.org> from Chris BeHanna at "Tue, 13 Nov 2012 22:18:54 -0600" Date: Wed, 14 Nov 2012 17:37:08 +1100 Cc: FreeBSD FS , Stephen McKay X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 14 Nov 2012 06:37:19 -0000 On Tuesday, 13th November 2012, Chris BeHanna wrote: >On Nov 13, 2012, at 21:51, Stephen McKay wrote: > >> [...lots of good advice about measuring, and lots of good advice about L2ARC...] I'm glad people found what I wrote useful. I'll have to rant more often. :-) >> I have no way to determine in advance the behaviour of an SSD on >> power failure so I assume all the ones I can afford have bad >> behaviour. > >If you'll pardon what may be an ignorant question, does this >matter if you have your machine on a UPS, especially if you run >upsmon or nut to do a graceful shutdown when there are n minutes >of battery remaining? I would still care. But then, I assume my hardware is out to get me. :-) In the end, it's a matter of risk assessment. How valuable your data is vs how difficult it is to recover vs how likely it is to be lost. High value data that cannot be recovered should be stored (backed up) in many places on highly reliable media because even a low risk of loss is bad(1). Low value data can be treated more roughly, and maybe occasional (detectable) corruption or loss is OK. The difficulty is in how we can calculate the chance of loss when we have no published statistics on power failure corruption rates in SSDs. Multiply that by the chance your UPS may fail(2) (which I'm guessing you don't have a number for either), and we have what we hope is a very small number, but which might in reality be large enough to cause grief. If I ran a bank, my ZIL would likely be a redundant array of battery backed RAM disks (that's the most expensive and fastest sort, you can reduce your life expectancy simply by reading the price list). And my power supply would be redundant. And my UPS would be redundant. And I'd do generator tests frequently. And the armed guards would keep cleaners out of the computer room. And ... 
But at home, you have to make your best guess and go from there. As I've said before, the end result of my calculations was to have no SSD ZIL at all. I think for most people this is an entirely reasonable situation. Cheers, Stephen. (1) There's a long discussion of disk redundancy (mirror vs raidz, etc) and backup strategies (periodic vs continuous, off/on-line, on/off-site, automated/ad hoc) to mitigate hardware failures, software errors, system administrator fumbles, hacker attacks and the plain disregard the universe has for you that I've left out here but which matters at least as much as broken SSDs do. (2) UPS failure can include the owner tripping over the power cord or accidentally switching it off. Watching someone accidentally switch off a room full of computers this way caused much merriment. No, wait! It caused us all several days of pain. :-( From owner-freebsd-fs@FreeBSD.ORG Wed Nov 14 14:06:28 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id E800D26F for ; Wed, 14 Nov 2012 14:06:28 +0000 (UTC) (envelope-from gary.buhrmaster@gmail.com) Received: from mail-ia0-f182.google.com (mail-ia0-f182.google.com [209.85.210.182]) by mx1.freebsd.org (Postfix) with ESMTP id A9D588FC17 for ; Wed, 14 Nov 2012 14:06:28 +0000 (UTC) Received: by mail-ia0-f182.google.com with SMTP id x2so385425iad.13 for ; Wed, 14 Nov 2012 06:06:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=upuXioRUj4TzFdnszkdw+jFIyfpB0b/7RwQHmpvsVP8=; b=nq8eA3s6kbhGfDPuLvK52SAA61UiSTJarKoP3HtrSDYqmE4tugO4AT9v6tQ1OqVoqR vsyNHsmtHsddn5BI4v0XBWvAzT1mf+5eSkHjLc7J2vJqyT4UJRJ0a8WtJz040Gld0YWd QpITjzC3/dXCjcVEXFIFXhCFQXNjlcJUxR3YrBm5l1RqhWhzfeu5Rf/7iB51oOU8jz1F PiD29t3PG1ccXYrYhOC3c4MqoV0K0f0h0inlpYE/kvhO/u1Dd1SY74H6XXLPmCfdfNzH LspG0rDQ3Cj9UYfsdluxfOymI0AjENwEjsyQfHIr4VqWpel+zQZfl1jHACYGevunpOi5 naqg== MIME-Version: 1.0 Received: by 10.50.202.97 with SMTP id kh1mr1699468igc.15.1352901982140; Wed, 14 Nov 2012 06:06:22 -0800 (PST) Received: by 10.42.239.3 with HTTP; Wed, 14 Nov 2012 06:06:22 -0800 (PST) In-Reply-To: <943159E4-8824-4767-96E1-89E8EC69DCDF@behanna.org> References: <57ac1f$gf3rkl@ipmail05.adl6.internode.on.net> <943159E4-8824-4767-96E1-89E8EC69DCDF@behanna.org> Date: Wed, 14 Nov 2012 06:06:22 -0800 Message-ID: Subject: Re: SSD recommendations for ZFS cache/log From: Gary Buhrmaster To: Chris BeHanna Content-Type: text/plain; charset=ISO-8859-1 Cc: FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 14 Nov 2012 14:06:29 -0000 On Tue, Nov 13, 2012 at 8:18 PM, Chris BeHanna wrote: .... > If you'll pardon what may be an ignorant question, does this matter if you have your machine on a UPS, especially if you run upsmon or nut to do a graceful shutdown when there are n minutes of battery remaining? In the real world, UPS's aren't (uninterruptable), people pull power cords (even redundant ones), power supplies fail, the power supply redundant backplane fails, and the motherboard fries and shuts down the power supply, and disks/SSDs sometimes corrupt themselves for other random reasons. 
And, of course, the reason any of this is so important with SSDs is that (almost) all SSDs lie about having written the data to the sectors (they indicate immediate success) since writing to FLASH is so slow (you have to read a flash 4KB/8KB sector, update it with your (usually/often) smaller block, erase the flash sector, and then write the new data). They may also be doing internal scrubs and defragmentation at the time of the request. And so they buffer written data to onboard RAM and report immediate success. Since ZFS is so dependent on the ZIL being correct for recovery (smart people have added codes to no longer result in complete loss when it encounters a corrupted ZIL, but the result can still be some data loss), the ZFS codes to update the ZIL expect that when the device indicates "written to disk complete", it has been written. Since the flash has buffered the ZIL data, a power failure could result in violating this presumption of ZFS and the ZIL integrity. A common solution on SSDs is sometimes called a "super capacitor" so that in the event of a power failure the SSD still has enough power (time) to finish in-flight writes. Marketing in various companies call the solution different things. Gary From owner-freebsd-fs@FreeBSD.ORG Wed Nov 14 23:11:47 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 161CBEE; Wed, 14 Nov 2012 23:11:47 +0000 (UTC) (envelope-from ryao@gentoo.org) Received: from smtp.gentoo.org (smtp.gentoo.org [140.211.166.183]) by mx1.freebsd.org (Postfix) with ESMTP id 98C178FC12; Wed, 14 Nov 2012 23:11:46 +0000 (UTC) Received: from [192.168.1.2] (pool-173-77-245-118.nycmny.fios.verizon.net [173.77.245.118]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) (Authenticated sender: ryao) by smtp.gentoo.org (Postfix) with ESMTPSA id EA16F33D966; Wed, 14 Nov 2012 23:11:39 +0000 (UTC) Message-ID: <50A42490.90103@gentoo.org> Date: Wed, 14 Nov 2012 18:09:04 -0500 From: Richard Yao User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.10) Gecko/20121107 Thunderbird/10.0.10 MIME-Version: 1.0 To: Steven Hartland Subject: Re: Port of ZFSOnLinux solution for illumos-gate issue #2663 to FreeBSD References: <50A329A4.9090304@gentoo.org> <3AD8FAE474C24860B57C75DEBE431126@multiplay.co.uk> In-Reply-To: <3AD8FAE474C24860B57C75DEBE431126@multiplay.co.uk> X-Enigmail-Version: 1.3.5 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="------------enigC5E529FD90024BFF04A2D9BD" Cc: freebsd-fs@freebsd.org, Eitan Adler X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 14 Nov 2012 23:11:47 -0000 This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enigC5E529FD90024BFF04A2D9BD Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable On 11/14/2012 12:34 AM, Steven Hartland wrote: > Useful stuff, you might be interested in the follow:- > > Teach ZFS about geom stripe size so zpools are created with optimum ash= ift > http://www.freebsd.org/cgi/query-pr.cgi?pr=3D173115 > > I've got an updated patch which allows the min ashift to be configured.= > > It should be noted that the max working ashift appears to be 17 > due the way labels use zio_read_phys with offset. 
> > Regards > Steve Your patch modifies autodetection while this patch provides a manual override. They should be able to co-exist. With that said, the orignal patch permitted ashift=3D17, but that was shown to permit pool corruption: https://github.com/zfsonlinux/zfs/issues/425 As far as I know, ashift=3D13 is the highest value permitted on both Linu= x and FreeBSD. The code can operate with higher values, but I do not recommend them. --------------enigC5E529FD90024BFF04A2D9BD Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQIcBAEBAgAGBQJQpCSQAAoJECDuEZm+6ExkuqUP/2jhJw8l3VoBDCtaZ39B445Q sDlOKJrPzYqOpFzYvnwwfEiP8QoAmFDCvZfmhZqYF+LTsPj5+7Ro/6BqP6UiZsUr A6AdhmEcQvaMgvK+bcA/uUL6JztwcmEMICUFFiEw/Dbca7886aAGgzv7HJxiSfpP afTeSBQVrHeY6UWKO5SBkt6M3fCQlQlrLq8AHANYumHkJT9G3Em32t7VYXJAKvJW jBWSeUprwDM7asMszdRVPkNHeaOXFsASHWOHBhTYsPVhL336jKEzKBJqAEGdqJbE GPIM3lglQQI59QHkq/B3rzlpNoCxCfqpo2wwWchT1x9RDSIv+5z3c8sbGwSYiepy soCATj6HaXP5VbZIxMdUmjWT5EEm9HFzbA3/L9fYdCDMKxVAEKrgCH9eeS5eoBgJ NjXZtjyNkNGxnHZVSObZ3U8FbzIjQ43fd5cCOpOUj0Zyu4XX0oyx7U0yOCCF0HEq L+6wvYHOALIwZHljlItGeMerxQbleLrJ03XC8dnx3Jsv6I6INFg6Ww9jB0IblrR5 b4P/5qUOmqWgv6n8esrQTRdLhywnOAMSFNeBFwTdvY82KU9Xtz4PjbvOf2ILJeiH ENGgZVIxbjVjdSpGSw5cSuoH9+/7mjbF3CjMr2vseK36vq87WzOJa3byH0WKr3/q sPffRnF39CIZ2eyw5it5 =0n6T -----END PGP SIGNATURE----- --------------enigC5E529FD90024BFF04A2D9BD-- From owner-freebsd-fs@FreeBSD.ORG Wed Nov 14 23:47:43 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 3A8F4984; Wed, 14 Nov 2012 23:47:43 +0000 (UTC) (envelope-from prvs=166515a72b=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 33DEE8FC12; Wed, 14 Nov 2012 23:47:42 +0000 (UTC) Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50001061225.msg; Wed, 14 Nov 2012 23:47:38 +0000 X-Spam-Processed: mail1.multiplay.co.uk, Wed, 14 Nov 2012 23:47:38 +0000 (not processed: message from valid local sender) X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=166515a72b=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <098A2B6F89484CF1829EED7B9822DDD0@multiplay.co.uk> From: "Steven Hartland" To: "Richard Yao" References: <50A329A4.9090304@gentoo.org> <3AD8FAE474C24860B57C75DEBE431126@multiplay.co.uk> <50A42490.90103@gentoo.org> Subject: Re: Port of ZFSOnLinux solution for illumos-gate issue #2663 to FreeBSD Date: Wed, 14 Nov 2012 23:47:41 -0000 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: freebsd-fs@freebsd.org, Eitan Adler X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 14 Nov 2012 23:47:43 -0000 ----- Original Message ----- From: "Richard Yao" >> Teach ZFS about geom stripe size so zpools are created with optimum ashift >> 
http://www.freebsd.org/cgi/query-pr.cgi?pr=173115 >> >> I've got an updated patch which allows the min ashift to be configured. >> >> It should be noted that the max working ashift appears to be 17 >> due the way labels use zio_read_phys with offset. > > Your patch modifies autodetection while this patch provides a manual > override. They should be able to co-exist. Yes indeed that was my intention :) > With that said, the orignal patch permitted ashift=17, but that was > shown to permit pool corruption: > > https://github.com/zfsonlinux/zfs/issues/425 > > As far as I know, ashift=13 is the highest value permitted on both Linux > and FreeBSD. The code can operate with higher values, but I do not > recommend them. Interesting I'll have a play with that there may be other edge cases I'm not aware of thanks for the heads up. Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-fs@FreeBSD.ORG Thu Nov 15 00:18:40 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: by hub.freebsd.org (Postfix, from userid 821) id 725ABE3A; Thu, 15 Nov 2012 00:18:40 +0000 (UTC) Date: Thu, 15 Nov 2012 00:18:40 +0000 From: John To: Julian Elischer Subject: Re: RHEL to FreeBSD file server Message-ID: <20121115001840.GA27399@FreeBSD.org> References: <50A130B7.4080604@cse.yorku.ca> <20121113043409.GA70601@neutralgood.org> <50A2B95D.4000400@cse.yorku.ca> <50A2F804.3010009@freebsd.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <50A2F804.3010009@freebsd.org> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 15 Nov 2012 00:18:40 -0000 ----- Julian Elischer's Original Message ----- > On 11/13/12 1:19 PM, Jason Keltz wrote: > >On 11/13/2012 12:41 PM, Bob Friesenhahn wrote: > >>On Mon, 12 Nov 2012, kpneal@pobox.com wrote: > >>> > >>>With your setup of 11 mirrors you have a good mixture of read > >>>and write > >>>performance, but you've compromised on the safety. The reason > >>>that RAID 6 ... > >By the way - on another note - what do you or other list members > >think of the new Intel SSD DC S3700 as ZIL? Sounds very promising > >when it's finally available. I spent a lot of time researching > >ZILs today, and one thing I can say is that I have a major > >headache now because of it!! > > ZIL is best served by battery backed up ram or something.. it's tiny > and not a really good fit an SSD (maybe just a partition) L2ARC on > the other hand is a really good use for SSD. Well, since you brought the subject up :-) Do you have any recommendations for an NVRAM unit usable with Freebsd? 
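Whatever the device ends up being, attaching it to an existing pool is the
easy part; a sketch with placeholder provider names:

    zpool add tank log mirror gpt/slog0 gpt/slog1  # ZIL/SLOG; mirror it, since losing an unmirrored log device can cost recent sync writes
    zpool add tank cache gpt/l2arc0                # L2ARC; holds only cached copies, so no redundancy needed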
-John From owner-freebsd-fs@FreeBSD.ORG Thu Nov 15 00:25:00 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 1D2A1EFD for ; Thu, 15 Nov 2012 00:25:00 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 8D7BE8FC08 for ; Thu, 15 Nov 2012 00:24:59 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AgAJAGU1pFCDaFvO/2dsb2JhbABEhh25Y4RGB4IeAQYjBIECFBkCBFUGiB6aDY5Tkl+MLYUZgRMDiFqGIYcBkEODDYF7 X-IronPort-AV: E=Sophos;i="4.83,254,1352091600"; d="scan'208";a="247816" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 14 Nov 2012 19:24:58 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 85408B4040 for ; Wed, 14 Nov 2012 19:24:58 -0500 (EST) Date: Wed, 14 Nov 2012 19:24:58 -0500 (EST) From: Rick Macklem To: "freebsd-fs@freebsd.org" Message-ID: <585500992.392200.1352939098529.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <1960242840.392195.1352939094946.JavaMail.root@erie.cs.uoguelph.ca> Subject: testing/review of a patch that adds "nfsstat -m" to dump NFS mount options MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_Part_392199_539570562.1352939098525" X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 15 Nov 2012 00:25:00 -0000 ------=_Part_392199_539570562.1352939098525 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Hi, I've attached a pair of patches: nfsstat-m.patch - applies to nfsstat to add a "-m" options nfsstat-dumpmnt.patch - applies to the kernel to support the above "nfsstat -m" dumps out the options actually being used by new NFS client mounts. Feel free to test and/or review them. 
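For anyone trying it, the invocation is simply:

    nfsstat -m

which, per the patch, walks the mounted NFS file systems and prints each mount
(server path and mount point) followed by the option string the new client is
actually using (protocol version, tcp/udp, hard/soft, rsize/wsize, timeouts
and so on).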
Thanks, rick ------=_Part_392199_539570562.1352939098525 Content-Type: text/x-patch; name=nfsstat-m.patch Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename=nfsstat-m.patch LS0tIHVzci5iaW4vbmZzc3RhdC9uZnNzdGF0LmMuc2F2CTIwMTItMTEtMTQgMDg6MzQ6MDIuMDAw MDAwMDAwIC0wNTAwCisrKyB1c3IuYmluL25mc3N0YXQvbmZzc3RhdC5jCTIwMTItMTEtMTQgMTg6 MDk6MDEuMDAwMDAwMDAwIC0wNTAwCkBAIC0xMDcsMTQgKzEwNywzNiBAQCBtYWluKGludCBhcmdj LCBjaGFyICoqYXJndikKIAlpbnQgY2g7CiAJY2hhciAqbWVtZiwgKm5saXN0ZjsKIAljaGFyIGVy cmJ1ZltfUE9TSVgyX0xJTkVfTUFYXTsKKwlpbnQgbW50bGVuLCBpOworCWNoYXIgYnVmWzEwMjRd OworCXN0cnVjdCBzdGF0ZnMgKm1udGJ1ZjsKKwlzdHJ1Y3QgbmZzY2xfZHVtcG1udG9wdHMgZHVt cG1udG9wdHM7CiAKIAlpbnRlcnZhbCA9IDA7CiAJbWVtZiA9IG5saXN0ZiA9IE5VTEw7Ci0Jd2hp bGUgKChjaCA9IGdldG9wdChhcmdjLCBhcmd2LCAiY2VzV006Tjpvdzp6IikpICE9IC0xKQorCXdo aWxlICgoY2ggPSBnZXRvcHQoYXJnYywgYXJndiwgImNlc1dNOm1OOm93OnoiKSkgIT0gLTEpCiAJ CXN3aXRjaChjaCkgewogCQljYXNlICdNJzoKIAkJCW1lbWYgPSBvcHRhcmc7CiAJCQlicmVhazsK KwkJY2FzZSAnbSc6CisJCQkvKiBEaXNwbGF5IG1vdW50IG9wdGlvbnMgZm9yIE5GUyBtb3VudCBw b2ludHMuICovCisJCQltbnRsZW4gPSBnZXRtbnRpbmZvKCZtbnRidWYsIE1OVF9OT1dBSVQpOwor CQkJZm9yIChpID0gMDsgaSA8IG1udGxlbjsgaSsrKSB7CisJCQkJaWYgKHN0cmNtcChtbnRidWYt PmZfZnN0eXBlbmFtZSwgIm5mcyIpID09IDApIHsKKwkJCQkJZHVtcG1udG9wdHMubmRtbnRfZm5h bWUgPQorCQkJCQkgICAgbW50YnVmLT5mX21udG9ubmFtZTsKKwkJCQkJZHVtcG1udG9wdHMubmRt bnRfYnVmID0gYnVmOworCQkJCQlkdW1wbW50b3B0cy5uZG1udF9ibGVuID0gc2l6ZW9mKGJ1Zik7 CisJCQkJCWlmIChuZnNzdmMoTkZTU1ZDX0RVTVBNTlRPUFRTLAorCQkJCQkgICAgJmR1bXBtbnRv cHRzKSA+PSAwKQorCQkJCQkJcHJpbnRmKCIlcyBvbiAlc1xuJXNcbiIsCisJCQkJCQkgICAgbW50 YnVmLT5mX21udGZyb21uYW1lLAorCQkJCQkJICAgIG1udGJ1Zi0+Zl9tbnRvbm5hbWUsIGJ1Zik7 CisJCQkJfQorCQkJCW1udGJ1ZisrOworCQkJfQorCQkJZXhpdCgwKTsKIAkJY2FzZSAnTic6CiAJ CQlubGlzdGYgPSBvcHRhcmc7CiAJCQlicmVhazsKQEAgLTY0Niw3ICs2NjgsNyBAQCB2b2lkCiB1 c2FnZSh2b2lkKQogewogCSh2b2lkKWZwcmludGYoc3RkZXJyLAotCSAgICAidXNhZ2U6IG5mc3N0 YXQgWy1jZW9zelddIFstTSBjb3JlXSBbLU4gc3lzdGVtXSBbLXcgd2FpdF1cbiIpOworCSAgICAi dXNhZ2U6IG5mc3N0YXQgWy1jZW9tc3pXXSBbLU0gY29yZV0gWy1OIHN5c3RlbV0gWy13IHdhaXRd XG4iKTsKIAlleGl0KDEpOwogfQogCi0tLSB1c3IuYmluL25mc3N0YXQvbmZzc3RhdC4xLnNhdgky MDEyLTExLTE0IDE4OjA0OjMyLjAwMDAwMDAwMCAtMDUwMAorKysgdXNyLmJpbi9uZnNzdGF0L25m c3N0YXQuMQkyMDEyLTExLTE0IDE4OjA3OjUzLjAwMDAwMDAwMCAtMDUwMApAQCAtMjgsNyArMjgs NyBAQAogLlwiICAgICBGcm9tOiBAKCMpbmZzc3RhdC4xCTguMSAoQmVya2VsZXkpIDYvNi85Mwog LlwiICRGcmVlQlNEOiBzdGFibGUvOS91c3IuYmluL25mc3N0YXQvbmZzc3RhdC4xIDIyMTQ5MSAy MDExLTA1LTA1IDEwOjE3OjA4WiBydSAkCiAuXCIKLS5EZCBNYXkgNCwgMjAxMQorLkRkIE5vdmVt YmVyIDE0LCAyMDEyCiAuRHQgTkZTU1RBVCAxCiAuT3MKIC5TaCBOQU1FCkBAIC0zOCw3ICszOCw3 IEBACiBzdGF0aXN0aWNzCiAuU2ggU1lOT1BTSVMKIC5ObQotLk9wIEZsIGNlb3N6VworLk9wIEZs IGNlb21zelcKIC5PcCBGbCBNIEFyIGNvcmUKIC5PcCBGbCBOIEFyIHN5c3RlbQogLk9wIEZsIHcg QXIgd2FpdApAQCAtNjksNiArNjksMTEgQEAgRXh0cmFjdCB0aGUgbmFtZSBsaXN0IGZyb20gdGhl IHNwZWNpZmllZAogUmVwb3J0IHN0YXRpc3RpY3MgZm9yIHRoZSBvbGQgTkZTIGNsaWVudCBhbmQv b3Igc2VydmVyLgogV2l0aG91dCB0aGlzCiBvcHRpb24gc3RhdGlzdGljcyBmb3IgdGhlIG5ldyBO RlMgY2xpZW50IGFuZC9vciBzZXJ2ZXIgd2lsbCBiZSByZXBvcnRlZC4KKy5JdCBGbCBtCitSZXBv cnQgdGhlIG1vdW50IG9wdGlvbnMgZm9yIGFsbCBuZXcgTkZTIGNsaWVudCBtb3VudHMuCitUaGlz IG9wdGlvbiBvdmVycmlkZXMgYWxsIG90aGVycyBhbmQKKy5ObQord2lsbCBleGl0IGFmdGVyIGNv bXBsZXRpbmcgdGhlIHJlcG9ydC4KIC5JdCBGbCBzCiBPbmx5IGRpc3BsYXkgc2VydmVyIHNpZGUg c3RhdGlzdGljcy4KIC5JdCBGbCBXCg== ------=_Part_392199_539570562.1352939098525 Content-Type: text/x-patch; name=nfsstat-dumpmnt.patch Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename=nfsstat-dumpmnt.patch 
LS0tIG5mcy9uZnNzdmMuaC5zYXYyCTIwMTItMTEtMTMgMTA6MDQ6MDYuMDAwMDAwMDAwIC0wNTAw CisrKyBuZnMvbmZzc3ZjLmgJMjAxMi0xMS0xNCAxNzo0ODoyNS4wMDAwMDAwMDAgLTA1MDAKQEAg LTY4LDUgKzY4LDEzIEBACiAjZGVmaW5lCU5GU1NWQ19aRVJPU1JWU1RBVFMJMHgwMjAwMDAwMAkv KiBtb2RpZmllciBmb3IgR0VUU1RBVFMgKi8KICNkZWZpbmUJTkZTU1ZDX1NVU1BFTkRORlNECTB4 MDQwMDAwMDAKICNkZWZpbmUJTkZTU1ZDX1JFU1VNRU5GU0QJMHgwODAwMDAwMAorI2RlZmluZQlO RlNTVkNfRFVNUE1OVE9QVFMJMHgxMDAwMDAwMAorCisvKiBBcmd1bWVudCBzdHJ1Y3R1cmUgZm9y IE5GU1NWQ19EVU1QTU5UT1BUUy4gKi8KK3N0cnVjdCBuZnNjbF9kdW1wbW50b3B0cyB7CisJY2hh cgkqbmRtbnRfZm5hbWU7CQkvKiBGaWxlIE5hbWUgKi8KKwlzaXplX3QJbmRtbnRfYmxlbjsJCS8q IFNpemUgb2YgYnVmZmVyICovCisJdm9pZAkqbmRtbnRfYnVmOwkJLyogYW5kIHRoZSBidWZmZXIg Ki8KK307CiAKICNlbmRpZiAvKiBfTkZTX05GU1NWQ19IICovCi0tLSBuZnMvbmZzX25mc3N2Yy5j LnNhdjIJMjAxMi0xMS0xMyAxMDowMjozMC4wMDAwMDAwMDAgLTA1MDAKKysrIG5mcy9uZnNfbmZz c3ZjLmMJMjAxMi0xMS0xMyAxMDowMzo1Ny4wMDAwMDAwMDAgLTA1MDAKQEAgLTkxLDggKzkxLDgg QEAgc3lzX25mc3N2YyhzdHJ1Y3QgdGhyZWFkICp0ZCwgc3RydWN0IG5mcwogCWlmICgodWFwLT5m bGFnICYgKE5GU1NWQ19BRERTT0NLIHwgTkZTU1ZDX09MRE5GU0QgfCBORlNTVkNfTkZTRCkpICYm CiAJICAgIG5mc2RfY2FsbF9uZnNzZXJ2ZXIgIT0gTlVMTCkKIAkJZXJyb3IgPSAoKm5mc2RfY2Fs bF9uZnNzZXJ2ZXIpKHRkLCB1YXApOwotCWVsc2UgaWYgKCh1YXAtPmZsYWcgJiAoTkZTU1ZDX0NC QUREU09DSyB8IE5GU1NWQ19ORlNDQkQpKSAmJgotCSAgICBuZnNkX2NhbGxfbmZzY2wgIT0gTlVM TCkKKwllbHNlIGlmICgodWFwLT5mbGFnICYgKE5GU1NWQ19DQkFERFNPQ0sgfCBORlNTVkNfTkZT Q0JEIHwKKwkgICAgTkZTU1ZDX0RVTVBNTlRPUFRTKSkgJiYgbmZzZF9jYWxsX25mc2NsICE9IE5V TEwpCiAJCWVycm9yID0gKCpuZnNkX2NhbGxfbmZzY2wpKHRkLCB1YXApOwogCWVsc2UgaWYgKCh1 YXAtPmZsYWcgJiAoTkZTU1ZDX0lETkFNRSB8IE5GU1NWQ19HRVRTVEFUUyB8CiAJICAgIE5GU1NW Q19HU1NEQUREUE9SVCB8IE5GU1NWQ19HU1NEQURERklSU1QgfCBORlNTVkNfR1NTRERFTEVURUFM TCB8Ci0tLSBmcy9uZnMvbmZzX3Zhci5oLnNhdjIJMjAxMi0xMS0xMyAxMDoxMToyMC4wMDAwMDAw MDAgLTA1MDAKKysrIGZzL25mcy9uZnNfdmFyLmgJMjAxMi0xMS0xMyAxMDoxMTo1NS4wMDAwMDAw MDAgLTA1MDAKQEAgLTMxMyw2ICszMTMsNyBAQCB2b2lkIG5mc2RfaW5pdCh2b2lkKTsKIGludCBu ZnNkX2NoZWNrcm9vdGV4cChzdHJ1Y3QgbmZzcnZfZGVzY3JpcHQgKik7CiAKIC8qIG5mc19jbHZm c29wcy5jICovCit2b2lkIG5mc2NsX3JldG9wdHMoc3RydWN0IG5mc21vdW50ICosIGNoYXIgKiwg c2l6ZV90KTsKIAogLyogbmZzX2NvbW1vbnBvcnQuYyAqLwogaW50IG5mc3J2X2NoZWNrc29ja3Nl cW51bShzdHJ1Y3Qgc29ja2V0ICosIHRjcF9zZXEpOwotLS0gZnMvbmZzY2xpZW50L25mc19jbHBv cnQuYy5zYXYyCTIwMTItMTEtMTMgMTA6MDU6MDAuMDAwMDAwMDAwIC0wNTAwCisrKyBmcy9uZnNj bGllbnQvbmZzX2NscG9ydC5jCTIwMTItMTEtMTQgMDk6MDU6MjguMDAwMDAwMDAwIC0wNTAwCkBA IC0xMjMyLDYgKzEyMzIsOSBAQCBuZnNzdmNfbmZzY2woc3RydWN0IHRocmVhZCAqdGQsIHN0cnVj dCBuCiAJc3RydWN0IG5mc2NiZF9hcmdzIG5mc2NiZGFyZzsKIAlzdHJ1Y3QgbmZzZF9uZnNjYmRf YXJncyBuZnNjYmRhcmcyOwogCWludCBlcnJvcjsKKwlzdHJ1Y3QgbmFtZWlkYXRhIG5kOworCXN0 cnVjdCBuZnNjbF9kdW1wbW50b3B0cyBkdW1wbW50b3B0czsKKwljaGFyICpidWY7CiAKIAlpZiAo dWFwLT5mbGFnICYgTkZTU1ZDX0NCQUREU09DSykgewogCQllcnJvciA9IGNvcHlpbih1YXAtPmFy Z3AsIChjYWRkcl90KSZuZnNjYmRhcmcsIHNpemVvZihuZnNjYmRhcmcpKTsKQEAgLTEyNjQsNiAr MTI2NywyOCBAQCBuZnNzdmNfbmZzY2woc3RydWN0IHRocmVhZCAqdGQsIHN0cnVjdCBuCiAJCWlm IChlcnJvcikKIAkJCXJldHVybiAoZXJyb3IpOwogCQllcnJvciA9IG5mc2NiZF9uZnNkKHRkLCAm bmZzY2JkYXJnMik7CisJfSBlbHNlIGlmICh1YXAtPmZsYWcgJiBORlNTVkNfRFVNUE1OVE9QVFMp IHsKKwkJZXJyb3IgPSBjb3B5aW4odWFwLT5hcmdwLCAmZHVtcG1udG9wdHMsIHNpemVvZihkdW1w bW50b3B0cykpOworCQlpZiAoZXJyb3IgPT0gMCAmJiAoZHVtcG1udG9wdHMubmRtbnRfYmxlbiA8 IDI1NiB8fAorCQkgICAgZHVtcG1udG9wdHMubmRtbnRfYmxlbiA+IDEwMjQpKQorCQkJZXJyb3Ig PSBFUEVSTTsKKwkJaWYgKGVycm9yID09IDApCisJCQllcnJvciA9IG5mc3J2X2xvb2t1cGZpbGVu YW1lKCZuZCwKKwkJCSAgICBkdW1wbW50b3B0cy5uZG1udF9mbmFtZSwgdGQpOworCQlpZiAoZXJy b3IgPT0gMCAmJiBzdHJjbXAobmQubmlfdnAtPnZfbW91bnQtPm1udF92ZmMtPnZmY19uYW1lLAor 
CQkgICAgIm5mcyIpICE9IDApIHsKKwkJCXZwdXQobmQubmlfdnApOworCQkJZXJyb3IgPSBFUEVS TTsKKwkJfQorCQlpZiAoZXJyb3IgPT0gMCkgeworCQkJYnVmID0gbWFsbG9jKGR1bXBtbnRvcHRz Lm5kbW50X2JsZW4sIE1fVEVNUCwgTV9XQUlUT0spOworCQkJbmZzY2xfcmV0b3B0cyhWRlNUT05G UyhuZC5uaV92cC0+dl9tb3VudCksIGJ1ZiwKKwkJCSAgICBkdW1wbW50b3B0cy5uZG1udF9ibGVu KTsKKwkJCXZwdXQobmQubmlfdnApOworCQkJZXJyb3IgPSBjb3B5b3V0KGJ1ZiwgZHVtcG1udG9w dHMubmRtbnRfYnVmLAorCQkJICAgIGR1bXBtbnRvcHRzLm5kbW50X2JsZW4pOworCQkJZnJlZShi dWYsIE1fVEVNUCk7CisJCX0KIAl9IGVsc2UgewogCQllcnJvciA9IEVJTlZBTDsKIAl9Ci0tLSBm cy9uZnNjbGllbnQvbmZzX2NsdmZzb3BzLmMuc2F2MgkyMDEyLTA5LTI4IDE5OjAyOjMyLjAwMDAw MDAwMCAtMDQwMAorKysgZnMvbmZzY2xpZW50L25mc19jbHZmc29wcy5jCTIwMTItMTEtMTQgMTg6 MDI6MjYuMDAwMDAwMDAwIC0wNTAwCkBAIC0xNjI4LDMgKzE2MjgsMTA1IEBAIG5mc19nZXRubG1p bmZvKHN0cnVjdCB2bm9kZSAqdnAsIHVpbnQ4X3QKIAl9CiB9CiAKKy8qCisgKiBUaGlzIGZ1bmN0 aW9uIHByaW50cyBvdXQgYW4gb3B0aW9uIG5hbWUsIGJhc2VkIG9uIHRoZSBjb25kaXRpb25hbAor ICogYXJndW1lbnQuCisgKi8KK3N0YXRpYyBfX2lubGluZSB2b2lkIG5mc2NsX3ByaW50b3B0KHN0 cnVjdCBuZnNtb3VudCAqbm1wLCBpbnQgdGVzdHZhbCwKKyAgICBjaGFyICpvcHQsIGNoYXIgKipi dWYsIHNpemVfdCAqYmxlbikKK3sKKwlpbnQgbGVuOworCisJaWYgKHRlc3R2YWwgIT0gMCAmJiAq YmxlbiA+IHN0cmxlbihvcHQpKSB7CisJCWxlbiA9IHNucHJpbnRmKCpidWYsICpibGVuLCAiJXMi LCBvcHQpOworCQlpZiAobGVuICE9IHN0cmxlbihvcHQpKQorCQkJcHJpbnRmKCJFRUshIVxuIik7 CisJCSpidWYgKz0gbGVuOworCQkqYmxlbiAtPSBsZW47CisJfQorfQorCisvKgorICogVGhpcyBm dW5jdGlvbiBwcmludGYgb3V0IGFuIG9wdGlvbnMgaW50ZWdlciB2YWx1ZS4KKyAqLworc3RhdGlj IF9faW5saW5lIHZvaWQgbmZzY2xfcHJpbnRvcHR2YWwoc3RydWN0IG5mc21vdW50ICpubXAsIGlu dCBvcHR2YWwsCisgICAgY2hhciAqb3B0LCBjaGFyICoqYnVmLCBzaXplX3QgKmJsZW4pCit7CisJ aW50IGxlbjsKKworCWlmICgqYmxlbiA+IHN0cmxlbihvcHQpICsgMSkgeworCQkvKiBDb3VsZCBy ZXN1bHQgaW4gdHJ1bmNhdGVkIG91dHB1dCBzdHJpbmcuICovCisJCWxlbiA9IHNucHJpbnRmKCpi dWYsICpibGVuLCAiJXM9JWQiLCBvcHQsIG9wdHZhbCk7CisJCWlmIChsZW4gPCAqYmxlbikgewor CQkJKmJ1ZiArPSBsZW47CisJCQkqYmxlbiAtPSBsZW47CisJCX0KKwl9Cit9CisKKy8qCisgKiBM b2FkIHRoZSBvcHRpb24gZmxhZ3MgYW5kIHZhbHVlcyBpbnRvIHRoZSBidWZmZXIuCisgKi8KK3Zv aWQgbmZzY2xfcmV0b3B0cyhzdHJ1Y3QgbmZzbW91bnQgKm5tcCwgY2hhciAqYnVmZmVyLCBzaXpl X3QgYnVmbGVuKQoreworCWNoYXIgKmJ1ZjsKKwlzaXplX3QgYmxlbjsKKworCWJ1ZiA9IGJ1ZmZl cjsKKwlibGVuID0gYnVmbGVuOworCW5mc2NsX3ByaW50b3B0KG5tcCwgKG5tcC0+bm1fZmxhZyAm IE5GU01OVF9ORlNWNCkgIT0gMCwgIm5mc3Y0IiwgJmJ1ZiwKKwkgICAgJmJsZW4pOworCW5mc2Ns X3ByaW50b3B0KG5tcCwgKG5tcC0+bm1fZmxhZyAmIE5GU01OVF9ORlNWMykgIT0gMCwgIm5mc3Yz IiwgJmJ1ZiwKKwkgICAgJmJsZW4pOworCW5mc2NsX3ByaW50b3B0KG5tcCwgKG5tcC0+bm1fZmxh ZyAmIChORlNNTlRfTkZTVjMgfCBORlNNTlRfTkZTVjQpKSA9PSAwLAorCSAgICAibmZzdjIiLCAm YnVmLCAmYmxlbik7CisJbmZzY2xfcHJpbnRvcHQobm1wLCBubXAtPm5tX3NvdHlwZSA9PSBTT0NL X1NUUkVBTSwgIix0Y3AiLCAmYnVmLCAmYmxlbik7CisJbmZzY2xfcHJpbnRvcHQobm1wLCBubXAt Pm5tX3NvdHlwZSAhPSBTT0NLX1NUUkVBTSwgIix1ZHAiLCAmYnVmLCAmYmxlbik7CisJbmZzY2xf cHJpbnRvcHQobm1wLCAobm1wLT5ubV9mbGFnICYgTkZTTU5UX1JFU1ZQT1JUKSAhPSAwLCAiLHJl c3Zwb3J0IiwKKwkgICAgJmJ1ZiwgJmJsZW4pOworCW5mc2NsX3ByaW50b3B0KG5tcCwgKG5tcC0+ bm1fZmxhZyAmIE5GU01OVF9OT0NPTk4pICE9IDAsICIsbm9jb25uIiwKKwkgICAgJmJ1ZiwgJmJs ZW4pOworCW5mc2NsX3ByaW50b3B0KG5tcCwgKG5tcC0+bm1fZmxhZyAmIE5GU01OVF9TT0ZUKSA9 PSAwLCAiLGhhcmQiLCAmYnVmLAorCSAgICAmYmxlbik7CisJbmZzY2xfcHJpbnRvcHQobm1wLCAo bm1wLT5ubV9mbGFnICYgTkZTTU5UX1NPRlQpICE9IDAsICIsc29mdCIsICZidWYsCisJICAgICZi bGVuKTsKKwluZnNjbF9wcmludG9wdChubXAsIChubXAtPm5tX2ZsYWcgJiBORlNNTlRfSU5UKSAh PSAwLCAiLGludHIiLCAmYnVmLAorCSAgICAmYmxlbik7CisJbmZzY2xfcHJpbnRvcHQobm1wLCAo bm1wLT5ubV9mbGFnICYgTkZTTU5UX05PQ1RPKSA9PSAwLCAiLGN0byIsICZidWYsCisJICAgICZi bGVuKTsKKwluZnNjbF9wcmludG9wdChubXAsIChubXAtPm5tX2ZsYWcgJiBORlNNTlRfTk9DVE8p 
ICE9IDAsICIsbm9jdG8iLCAmYnVmLAorCSAgICAmYmxlbik7CisJbmZzY2xfcHJpbnRvcHQobm1w LCAobm1wLT5ubV9mbGFnICYgTkZTTU5UX05PTE9DS0QpID09IDAsICIsbG9ja2QiLAorCSAgICAm YnVmLCAmYmxlbik7CisJbmZzY2xfcHJpbnRvcHQobm1wLCAobm1wLT5ubV9mbGFnICYgTkZTTU5U X05PTE9DS0QpICE9IDAsICIsbm9sb2NrZCIsCisJICAgICZidWYsICZibGVuKTsKKwluZnNjbF9w cmludG9wdChubXAsIChubXAtPm5tX2ZsYWcgJiBORlNNTlRfUkRJUlBMVVMpICE9IDAsICIscmRp cnBsdXMiLAorCSAgICAmYnVmLCAmYmxlbik7CisJbmZzY2xfcHJpbnRvcHQobm1wLCAobm1wLT5u bV9mbGFnICYgTkZTTU5UX0tFUkIpID09IDAsICIsc2VjPXN5cyIsCisJICAgICZidWYsICZibGVu KTsKKwluZnNjbF9wcmludG9wdChubXAsIChubXAtPm5tX2ZsYWcgJiAoTkZTTU5UX0tFUkIgfCBO RlNNTlRfSU5URUdSSVRZIHwKKwkgICAgTkZTTU5UX1BSSVZBQ1kpKSA9PSBORlNNTlRfS0VSQiwg IixzZWM9a3JiNSIsICZidWYsICZibGVuKTsKKwluZnNjbF9wcmludG9wdChubXAsIChubXAtPm5t X2ZsYWcgJiAoTkZTTU5UX0tFUkIgfCBORlNNTlRfSU5URUdSSVRZIHwKKwkgICAgTkZTTU5UX1BS SVZBQ1kpKSA9PSAoTkZTTU5UX0tFUkIgfCBORlNNTlRfSU5URUdSSVRZKSwgIixzZWM9a3JiNWki LAorCSAgICAmYnVmLCAmYmxlbik7CisJbmZzY2xfcHJpbnRvcHQobm1wLCAobm1wLT5ubV9mbGFn ICYgKE5GU01OVF9LRVJCIHwgTkZTTU5UX0lOVEVHUklUWSB8CisJICAgIE5GU01OVF9QUklWQUNZ KSkgPT0gKE5GU01OVF9LRVJCIHwgTkZTTU5UX1BSSVZBQ1kpLCAiLHNlYz1rcmI1cCIsCisJICAg ICZidWYsICZibGVuKTsKKwluZnNjbF9wcmludG9wdHZhbChubXAsIG5tcC0+bm1fYWNkaXJtaW4s ICIsYWNkaXJtaW4iLCAmYnVmLCAmYmxlbik7CisJbmZzY2xfcHJpbnRvcHR2YWwobm1wLCBubXAt Pm5tX2FjZGlybWF4LCAiLGFjZGlybWF4IiwgJmJ1ZiwgJmJsZW4pOworCW5mc2NsX3ByaW50b3B0 dmFsKG5tcCwgbm1wLT5ubV9hY3JlZ21pbiwgIixhY3JlZ21pbiIsICZidWYsICZibGVuKTsKKwlu ZnNjbF9wcmludG9wdHZhbChubXAsIG5tcC0+bm1fYWNyZWdtYXgsICIsYWNyZWdtYXgiLCAmYnVm LCAmYmxlbik7CisJbmZzY2xfcHJpbnRvcHR2YWwobm1wLCBubXAtPm5tX25hbWV0aW1lbywgIixu YW1ldGltZW8iLCAmYnVmLCAmYmxlbik7CisJbmZzY2xfcHJpbnRvcHR2YWwobm1wLCBubXAtPm5t X25lZ25hbWV0aW1lbywgIixuZWduYW1ldGltZW8iLCAmYnVmLAorCSAgICAmYmxlbik7CisJbmZz Y2xfcHJpbnRvcHR2YWwobm1wLCBubXAtPm5tX3JzaXplLCAiLHJzaXplIiwgJmJ1ZiwgJmJsZW4p OworCW5mc2NsX3ByaW50b3B0dmFsKG5tcCwgbm1wLT5ubV93c2l6ZSwgIix3c2l6ZSIsICZidWYs ICZibGVuKTsKKwluZnNjbF9wcmludG9wdHZhbChubXAsIG5tcC0+bm1fcmVhZGRpcnNpemUsICIs cmVhZGRpcnNpemUiLCAmYnVmLAorCSAgICAmYmxlbik7CisJbmZzY2xfcHJpbnRvcHR2YWwobm1w LCBubXAtPm5tX3JlYWRhaGVhZCwgIixyZWFkYWhlYWQiLCAmYnVmLCAmYmxlbik7CisJbmZzY2xf cHJpbnRvcHR2YWwobm1wLCBubXAtPm5tX3djb21taXRzaXplLCAiLHdjb21taXRzaXplIiwgJmJ1 ZiwKKwkgICAgJmJsZW4pOworCW5mc2NsX3ByaW50b3B0dmFsKG5tcCwgbm1wLT5ubV90aW1lbywg Iix0aW1lb3V0IiwgJmJ1ZiwgJmJsZW4pOworCW5mc2NsX3ByaW50b3B0dmFsKG5tcCwgbm1wLT5u bV9yZXRyeSwgIixyZXRyYW5zIiwgJmJ1ZiwgJmJsZW4pOworfQorCg== ------=_Part_392199_539570562.1352939098525-- From owner-freebsd-fs@FreeBSD.ORG Thu Nov 15 00:38:17 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id F11A9A7; Thu, 15 Nov 2012 00:38:16 +0000 (UTC) (envelope-from ryao@gentoo.org) Received: from smtp.gentoo.org (smtp.gentoo.org [140.211.166.183]) by mx1.freebsd.org (Postfix) with ESMTP id 636138FC0C; Thu, 15 Nov 2012 00:38:16 +0000 (UTC) Received: from [192.168.1.2] (pool-173-77-245-118.nycmny.fios.verizon.net [173.77.245.118]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) (Authenticated sender: ryao) by smtp.gentoo.org (Postfix) with ESMTPSA id 0AF3A33DB5A; Thu, 15 Nov 2012 00:38:13 +0000 (UTC) Message-ID: <50A438D7.2030607@gentoo.org> Date: Wed, 14 Nov 2012 19:35:35 -0500 From: Richard Yao User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.10) Gecko/20121107 Thunderbird/10.0.10 MIME-Version: 1.0 To: Steven Hartland Subject: Re: Port of ZFSOnLinux solution for illumos-gate issue #2663 
to FreeBSD References: <50A329A4.9090304@gentoo.org> <3AD8FAE474C24860B57C75DEBE431126@multiplay.co.uk> <50A42490.90103@gentoo.org> <098A2B6F89484CF1829EED7B9822DDD0@multiplay.co.uk> In-Reply-To: <098A2B6F89484CF1829EED7B9822DDD0@multiplay.co.uk> X-Enigmail-Version: 1.3.5 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="------------enig10E57287115B01ACEEE4A2FB" Cc: freebsd-fs@freebsd.org, Eitan Adler X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 15 Nov 2012 00:38:17 -0000 This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enig10E57287115B01ACEEE4A2FB Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable On 11/14/2012 06:47 PM, Steven Hartland wrote: >=20 > ----- Original Message ----- From: "Richard Yao" >> With that said, the orignal patch permitted ashift=3D17, but that was >> shown to permit pool corruption: >> >> https://github.com/zfsonlinux/zfs/issues/425 >> >> As far as I know, ashift=3D13 is the highest value permitted on both L= inux >> and FreeBSD. The code can operate with higher values, but I do not >> recommend them. >=20 > Interesting I'll have a play with that there may be other edge cases I'= m > not aware of thanks for the heads up. >=20 My understanding of it is that badly designed hardware does not obey barriers correctly. The uberblock history was intended to workaround this by keeping readily accessible records of older transactions that can be used in the event that newer transactions failed to complete due to bad hardware. Increasing ashift will reduce the uberblock history size because the total space in each label for the uberblock history is 128 KB and the entries are padded to 2^ashift. If ashift is increased to the point where completed transactions are no longer in the history, pool corruption will occur in the event of sudden power loss. Hardware that properly respects barriers should be fine with ashift=3D16 because the previous entry will always be okay, but substandard hardware is not. So far, ashift=3D13 is the highest value that has been observed t= o be safe. There were some rather useful things about this written on the Open Solaris mailing list, but unfortunately, I did not keep a list of links for use as references. The following provides a partial description of what I just described: http://www.c0t0d0s0.org/archives/6071-No,-ZFS-really-doesnt-need-a-fsck.h= tml Note that I consider my understanding of this issue to be incomplete, so please do not let my description of what I understand the issue to be prevent you from doing your own research. 
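To put rough numbers on that, using just the two figures above (128KB of
uberblock space per label, entries padded to 2^ashift; the on-disk format also
imposes a minimum slot size, so treat this as a sketch):

    128KiB / 2^12 bytes = 32 uberblock slots  (ashift=12)
    128KiB / 2^13 bytes = 16 slots            (ashift=13)
    128KiB / 2^16 bytes =  2 slots            (ashift=16)
    128KiB / 2^17 bytes =  1 slot             (ashift=17)

so at ashift=17 each label retains only the single most recent uberblock, and
a transaction that a barrier-ignoring drive never actually committed leaves
nothing older to roll back to.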
Yours truly, Richard Yao --------------enig10E57287115B01ACEEE4A2FB Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQIcBAEBAgAGBQJQpDjZAAoJECDuEZm+6ExkBWkP/jrREtDxbYoAwf3jH0o5UCvJ mTRxDze2Qcu+aQaAaveytRtQSCloFrUHtN5Y3GvYznQ31TNRfc9zE8QZl6tofATM Qy8/Lla0kUvpOI6sve7VbptMXjKNoBTrIrWOqQuaX7or04pWl34e0qcWPdGEuxj3 BG1zBlNKtXlBHgc5xWey3Vjm/Z6L+5ihtu6UXvTIrvYz2tQu5E9E/AkrMnqy/OQ8 PNAIILOwiu9jGaEDzic+0FS6HG1FOMlSfEI1/jT8ctFXFkBse7Wi2bGGNzzYwkAy AUH9I8QGs7nuZlBCwVc65wiKS+/CJ2kQWczKzTH26BzZljQMPIraneSYtJhUfmyF nB1Cj1fMYjKJAybjVXgSPxxXqnv8QhQelmE9n+xW3K6C8ZIq126UQHaU2JJGqZwx 2eqFyGykbuhvUH8f/eAzEnZUQ75BNWMfJ3yJ5sOoTlqXt31cGy5f84yAZBGPZEAU WTLlAlVp3/6/r76By8PE1QJUL5dgyZg92ARHqn6iuAGC09NpeCH+pKiTrFv/J9nF ncus5BCkbxAyoKMsfR2r3zy+z4Rn7lI3Tr5Y9oPSUKHufeJNnm1oadHYF+htLUQ2 MqBkeOKspnAcwXMpCUu6S0bciUva9+FKW9NgtMTKUAYdqUMXa731qq97UMQgaj9r jC94YtWFlOKzMgOIJ4mQ =g35k -----END PGP SIGNATURE----- --------------enig10E57287115B01ACEEE4A2FB-- From owner-freebsd-fs@FreeBSD.ORG Thu Nov 15 01:58:00 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 3BBEFC42 for ; Thu, 15 Nov 2012 01:58:00 +0000 (UTC) (envelope-from frimik@gmail.com) Received: from mail-pa0-f54.google.com (mail-pa0-f54.google.com [209.85.220.54]) by mx1.freebsd.org (Postfix) with ESMTP id 09A138FC12 for ; Thu, 15 Nov 2012 01:57:59 +0000 (UTC) Received: by mail-pa0-f54.google.com with SMTP id kp6so785214pab.13 for ; Wed, 14 Nov 2012 17:57:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=H5IEkIkv3phq7O+zi0ba0XwZjlc9s+1qWV/xwTu9kXc=; b=oBh96SwpUHyWltvyTFX5CrKWy/D1E9zvq5NIpEnrYZt09n7eZ67RPuIkFoCWrLjsJh kaToH17lOvuF3l6jKdbMxBMXkBA81Vz2EO2wiIh0CipRrk4Q9q7KAT10JnAuqk8Y/n/M zveaxMB4l9I3s2AYleOKtxRV/A2vT0ohmAxUkyaJ9YtqJGpH1QHJ+3WcWgSyT/f7THtq XqJHy01ci8jtPan7csnOqqLQYN83lYrLLCxL1CBOIMO5p2XH9wkSQQhdxh4PxQTR0EeS iGhdINSzqQ7Rn4pBX6DUQgp1aqQ70s3RDU01CePxbuZrrTv8QC/ngvjf45pAh01sJ/AY U4PA== MIME-Version: 1.0 Received: by 10.68.202.7 with SMTP id ke7mr3401872pbc.114.1352944679542; Wed, 14 Nov 2012 17:57:59 -0800 (PST) Received: by 10.66.121.233 with HTTP; Wed, 14 Nov 2012 17:57:59 -0800 (PST) In-Reply-To: <20121115001840.GA27399@FreeBSD.org> References: <50A130B7.4080604@cse.yorku.ca> <20121113043409.GA70601@neutralgood.org> <50A2B95D.4000400@cse.yorku.ca> <50A2F804.3010009@freebsd.org> <20121115001840.GA27399@FreeBSD.org> Date: Thu, 15 Nov 2012 02:57:59 +0100 Message-ID: Subject: Re: RHEL to FreeBSD file server From: Mikael Fridh To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 15 Nov 2012 01:58:00 -0000 On Thu, Nov 15, 2012 at 1:18 AM, John wrote: > ----- Julian Elischer's Original Message ----- > > On 11/13/12 1:19 PM, Jason Keltz wrote: > > >On 11/13/2012 12:41 PM, Bob Friesenhahn wrote: > > >>On Mon, 12 Nov 2012, kpneal@pobox.com wrote: > > >>> > > >>>With your setup of 11 mirrors you have a good mixture of read > > 
>>>and write > > >>>performance, but you've compromised on the safety. The reason > > >>>that RAID 6 > > ... > > > >By the way - on another note - what do you or other list members > > >think of the new Intel SSD DC S3700 as ZIL? Sounds very promising > > >when it's finally available. I spent a lot of time researching > > >ZILs today, and one thing I can say is that I have a major > > >headache now because of it!! > > > > ZIL is best served by battery backed up ram or something.. it's tiny > > and not a really good fit an SSD (maybe just a partition) L2ARC on > > the other hand is a really good use for SSD. > > Well, since you brought the subject up :-) > > Do you have any recommendations for an NVRAM unit usable with Freebsd? > I've always had my eyes on something like this for ZIL but never had the need to explore it yet: http://www.ddrdrive.com/ Most recommendations I've seen have also been around mirrored 15krpm disks of some sort or even a cheaper battery-backed raid controller in front of decent disks. for zil it would just need a tiny bit of RAM anyway. From owner-freebsd-fs@FreeBSD.ORG Thu Nov 15 09:27:08 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id B0F97981 for ; Thu, 15 Nov 2012 09:27:08 +0000 (UTC) (envelope-from rainer@ultra-secure.de) Received: from mail.ultra-secure.de (mail.ultra-secure.de [78.47.114.122]) by mx1.freebsd.org (Postfix) with ESMTP id E90C18FC15 for ; Thu, 15 Nov 2012 09:27:07 +0000 (UTC) Received: (qmail 94258 invoked by uid 89); 15 Nov 2012 09:27:05 -0000 Received: by simscan 1.4.0 ppid: 94253, pid: 94255, t: 0.1364s scanners: attach: 1.4.0 clamav: 0.97.3/m:54/d:15577 Received: from unknown (HELO suse3) (rainer@ultra-secure.de@212.71.117.1) by mail.ultra-secure.de with ESMTPA; 15 Nov 2012 09:27:05 -0000 Date: Thu, 15 Nov 2012 10:27:04 +0100 From: Rainer Duffner To: John Subject: Re: RHEL to FreeBSD file server Message-ID: <20121115102704.6657ee52@suse3> In-Reply-To: <20121115001840.GA27399@FreeBSD.org> References: <50A130B7.4080604@cse.yorku.ca> <20121113043409.GA70601@neutralgood.org> <50A2B95D.4000400@cse.yorku.ca> <50A2F804.3010009@freebsd.org> <20121115001840.GA27399@FreeBSD.org> X-Mailer: Claws Mail 3.8.1 (GTK+ 2.24.10; x86_64-suse-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 15 Nov 2012 09:27:08 -0000 > Do you have any recommendations for an NVRAM unit usable with Freebsd? I haven't tried it, but my bet would be on this one: http://www.stec-inc.com/product/zeusram.php It seems that various Nexenta-integrators are using this. To the OS, it should appears as just another SAS hardrive. 
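For reference, whatever the slog hardware ends up being, wiring it into an existing pool is a one-line operation once the unit probes as a disk. A minimal sketch, assuming device names da2/da3/da4 and a pool called tank (all placeholders, not taken from this thread):

   # dedicated intent-log device; mirror it, since an slog holding
   # not-yet-committed sync writes is a single point of data loss
   zpool add tank log mirror da2 da3

   # an SSD used as L2ARC (read cache) is added the same way
   zpool add tank cache da4

Both log and cache vdevs can be taken out again with "zpool remove", so trying a candidate device on a test pool is fairly low-risk.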
From owner-freebsd-fs@FreeBSD.ORG Thu Nov 15 23:10:49 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 299D77B6; Thu, 15 Nov 2012 23:10:49 +0000 (UTC) (envelope-from wollman@hergotha.csail.mit.edu) Received: from hergotha.csail.mit.edu (wollman-1-pt.tunnel.tserv4.nyc4.ipv6.he.net [IPv6:2001:470:1f06:ccb::2]) by mx1.freebsd.org (Postfix) with ESMTP id C63C48FC08; Thu, 15 Nov 2012 23:10:48 +0000 (UTC) Received: from hergotha.csail.mit.edu (localhost [127.0.0.1]) by hergotha.csail.mit.edu (8.14.5/8.14.5) with ESMTP id qAFNAlle013379; Thu, 15 Nov 2012 18:10:47 -0500 (EST) (envelope-from wollman@hergotha.csail.mit.edu) Received: (from wollman@localhost) by hergotha.csail.mit.edu (8.14.5/8.14.4/Submit) id qAFNAlhY013376; Thu, 15 Nov 2012 18:10:47 -0500 (EST) (envelope-from wollman) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <20645.30327.506353.158003@hergotha.csail.mit.edu> Date: Thu, 15 Nov 2012 18:10:47 -0500 From: Garrett Wollman To: freebsd-net@freebsd.org, freebsd-fs@freebsd.org Subject: NFS over SCTP -- is anyone likely to implement this? X-Mailer: VM 7.17 under 21.4 (patch 22) "Instant Classic" XEmacs Lucid X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.7 (hergotha.csail.mit.edu [127.0.0.1]); Thu, 15 Nov 2012 18:10:47 -0500 (EST) X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED autolearn=disabled version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on hergotha.csail.mit.edu X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 15 Nov 2012 23:10:49 -0000 I'm working on (of all things) a Puppet module to configure NFS servers, and I'm wondering if anyone expects to implement NFS over SCTP on FreeBSD. 
-GAWollman From owner-freebsd-fs@FreeBSD.ORG Fri Nov 16 03:55:20 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 50A17984 for ; Fri, 16 Nov 2012 03:55:20 +0000 (UTC) (envelope-from lists@eitanadler.com) Received: from mail-lb0-f182.google.com (mail-lb0-f182.google.com [209.85.217.182]) by mx1.freebsd.org (Postfix) with ESMTP id B82E68FC08 for ; Fri, 16 Nov 2012 03:55:19 +0000 (UTC) Received: by mail-lb0-f182.google.com with SMTP id gg13so2311747lbb.13 for ; Thu, 15 Nov 2012 19:55:18 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=eitanadler.com; s=0xdeadbeef; h=mime-version:sender:in-reply-to:references:from:date :x-google-sender-auth:message-id:subject:to:cc:content-type; bh=VZ7URePKaagqoqxuxljiu5Ng1Ev7Pc+j9A2JA3ybST8=; b=FzoohHvDjS4+dEECF2e040E6nOn6mlPCZ5SQqE/Mri04W08MT85gy0Hr9E+R+2TmcK AIIRasP/1sqfl11wUvAp0+CPFHto3v0elOepkFQmWrf8yVHngBDqN0N6K0+ZLdmh9Gai YFN8Bst1DbzA55ex/ORONX9TXFpU2Cjnwd6Xw= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date :x-google-sender-auth:message-id:subject:to:cc:content-type :x-gm-message-state; bh=VZ7URePKaagqoqxuxljiu5Ng1Ev7Pc+j9A2JA3ybST8=; b=MvNmbMrEvboQKxti3PseviZAP7v6vL4MMRSNlksYNy65y4djgfB9DV1Vo3Rqn2wtUQ /aXGjXO5ywlHEeYZ/vwJJxj6PP4pka0sVvvKmaN4v6IAQZ6biM9Zw29TbMS5oP2ulRGj fc4YqIyUMRkNNvylleIVNrQkemdikxUvEoIkoRwt27feF8GevTVkXy/ldSgNHSf2inNg 0wkE4ReXDyPSXQe3HTfTGIvbNbMyKIr3anukJ2LrNwTYIUuE80K5YZm1wTPzV05LnMa/ kksEwEoGdHwNUZu1NxeOXeQDqlu1mUq20OF/CcGwGi9AJ7eMoITHUAH2bAzGeXsNHiG+ vKTw== Received: by 10.112.54.40 with SMTP id g8mr1479635lbp.49.1353038118300; Thu, 15 Nov 2012 19:55:18 -0800 (PST) MIME-Version: 1.0 Sender: lists@eitanadler.com Received: by 10.112.25.166 with HTTP; Thu, 15 Nov 2012 19:54:46 -0800 (PST) In-Reply-To: <50A31D48.3000700@shatow.net> References: <57ac1f$gf3rkl@ipmail05.adl6.internode.on.net> <50A31D48.3000700@shatow.net> From: Eitan Adler Date: Thu, 15 Nov 2012 22:54:46 -0500 X-Google-Sender-Auth: ftb5aLdlPWd8z8-nB2LfQHb15dM Message-ID: Subject: Re: SSD recommendations for ZFS cache/log To: Bryan Drewery Content-Type: text/plain; charset=UTF-8 X-Gm-Message-State: ALoCoQlc7hCJOoG8aBIWlgHqb9V4Xor9lcpLCD0/KbJQWoqQcebooPcuxfha40IYmor4UFTgb9Kn Cc: FreeBSD FS , Stephen McKay X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 16 Nov 2012 03:55:20 -0000 On 13 November 2012 23:25, Bryan Drewery wrote: > IMHO this whole post should be enshrined into an FAQ or manpage or wiki. > It's very informative and compelling. Sounds good. Can people here please tell me what is wrong in the following content? Is there additional data or questions to add? Please note that I've never used ZFS so being whacked by a cluebat would be helpful. commit 1675274e40464fb37f822a176b9eed28ea729947 Author: Eitan Adler Date: Thu Nov 15 22:52:33 2012 -0500 Add a section in the FAQ about ZFS Reviewed by: ??? Approved by: ??? diff --git a/en_US.ISO8859-1/books/faq/book.xml b/en_US.ISO8859-1/books/faq/book.xml index 7ad4974..caffad1 100644 --- a/en_US.ISO8859-1/books/faq/book.xml +++ b/en_US.ISO8859-1/books/faq/book.xml @@ -5367,6 +5367,62 @@ C:\="DOS" + + + ZFS + + + + + What is the ZIL and when does it get used? + + + + The ZIL (ZFS + intent log) is a write cache for ZFS. 
All writes get + recorded in the ZIL. Eventually ZFS will perform a + Transaction Group Commit in which it + flushes out the data in the ZIL to disk. + + + + + + What is the L2ARC? + + + + The L2ARC is a read cache stored + on a fast device such as an SSD. It is + used to speed up operations such as deduplication or + encryption. This cache is not persisent across + reboots. Note that RAM is used as the first layer + of cache and the L2ARC is only needed if there is + insufficient RAM. + + + + + + Is enabling deduplication advisable? + + + + The answer very much depends on the expected workload. + Deduplication takes up a signifigent amount of RAM and CPU + time and may slow down read and write disk access times. + Unless one is storing data that is very heavily + duplicated (such as virtual machine images, or user + backups) it is likely that deduplication will do more harm + than good. Another consideration is the inability to + revert deduplication status. If deduplication is enabled, + data written, and then dedup is disabled, those blocks + which were deduplicated will not be duplicated until + they are next modified. + + + + -- Eitan Adler Source, Ports, Doc committer Bugmeister, Ports Security teams From owner-freebsd-fs@FreeBSD.ORG Fri Nov 16 04:58:39 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 0B800273; Fri, 16 Nov 2012 04:58:39 +0000 (UTC) (envelope-from smckay@internode.on.net) Received: from ipmail05.adl6.internode.on.net (ipmail05.adl6.internode.on.net [IPv6:2001:44b8:8060:ff02:300:1:6:5]) by mx1.freebsd.org (Postfix) with ESMTP id 18BA28FC08; Fri, 16 Nov 2012 04:58:37 +0000 (UTC) Message-Id: <57ac1f$gg70bn@ipmail05.adl6.internode.on.net> X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AohgABzHpVB4mMGlPGdsb2JhbAAqGrQQjjoYAQEBATg0gh8BBQ5XFBAIEAE4QxQGiB8MLckohAeBBgOXGIRxA4UliBE Received: from unknown (HELO localhost) ([120.152.193.165]) by ipmail05.adl6.internode.on.net with ESMTP; 16 Nov 2012 15:28:36 +1030 From: Stephen McKay To: Eitan Adler Subject: ZFS FAQ (Was: SSD recommendations for ZFS cache/log) References: <57ac1f$gf3rkl@ipmail05.adl6.internode.on.net> <50A31D48.3000700@shatow.net> In-Reply-To: from Eitan Adler at "Thu, 15 Nov 2012 22:54:46 -0500" Date: Fri, 16 Nov 2012 15:58:27 +1100 Cc: FreeBSD FS , Stephen McKay X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 16 Nov 2012 04:58:39 -0000 On Thursday, 15th November 2012, Eitan Adler wrote: >Can people here please tell me what is wrong in the following content? A few things. I'll intersperse them. >Is there additional data or questions to add? The whole ZFS world desperately needs good documentation. There are misconceptions everywhere. There are good tuning hints and bad (or out of date) ones. Further, it depends on your target application whether the defaults are fairly good or plain suck. http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide is one of the places to go, but it's quite Solaris specific. It's also starting to get a bit dated. The existing http://wiki.freebsd.org/ZFS and ZFSTuningGuide are a mixture of old and recent information and are not a useful intro to the subject for a system administrator. 
As a project, we should have some short pithy "How to do ZFS right" document that targets workstations, file servers, database servers and web servers (as a start). There are so many things to balance it's not obvious what to do. Simply whacking in a cheap SSD and a bunch of slow disks is rarely the right answer. Included in this hypothetical guide would also be the basics of partitioning using gpart (GPT) with labels and boot partitions to allow complex setups. For example, I've set up small servers where a small slice of each disk becomes a massively mirrored root pool while the majority of each disk becomes a raidz2 pool. This makes sense where you need maximum flexibility and can afford only a few spindles. >Please note that I've never used ZFS so being whacked by a cluebat >would be helpful. My cluebat is small. :-) I've used ZFS for real for a few different target applications but by no means have I covered all uses. I think we need input from many people to make a useful ZFS FAQ. >+ >+ What is the ZIL and when does it get used? >+ >+ >+ >+ The ZIL (ZFS >+ intent log) is a write cache for ZFS. All writes get >+ recorded in the ZIL. Eventually ZFS will perform a >+ Transaction Group Commit in which it >+ flushes out the data in the ZIL to disk. >+ The ZIL is not a cache. It is only used for synchronous writes, not for all writes. It is only read during crash recovery. Its purpose is data integrity. Async writes (most writes) are kept in RAM and bundled into transactions. Transactions are written to disk in an atomic fashion. The ZIL is needed for writes that have been acknowledged as written but which are not yet on disk as part of a transaction. Sync writes will result from fsync() calls, being a NFS server, most netatalk stuff I've seen, and probably a lot of other stuff. But crucially, you get none from editing, compiling, playing nethack, web browsing and many other things. So you may not need a separate fast ZIL, which was basically the question that started all this off. :-) So, I guess you need a "Do I need an SSD for ZIL?" question here somewhere, and a similar "Do I need an SSD for L2ARC?" to go with it. >+ >+ What is the L2ARC? >+ >+ >+ >+ The L2ARC is a read cache stored >+ on a fast device such as an SSD. It is >+ used to speed up operations such as deduplication or >+ encryption. This cache is not persisent across >+ reboots. Note that RAM is used as the first layer >+ of cache and the L2ARC is only needed if there is >+ insufficient RAM. >+ >+ The L2ARC is a general read cache which happens to speed up dedup because the dedup table typically gets very large and will not fit into ARC. It's not primarily there to help dedup. I don't think it's anything to do with encryption. Or compression, if that's what you meant to write. I wish I could expand on the "you don't need L2ARC if you have enough RAM" idea, but that's basically true. L2ARC isn't a free lunch either as it needs space in the ARC to index it. So, perversely, a working set that fits perfectly in the ARC will not fit perfectly any more if an L2ARC is used because part of the ARC is holding the index, pushing part of the working set into the L2ARC which is presumably slower than RAM. I think people could still write research papers on this aspect of ZFS. It's also any area where the defaults seem poorly tuned, at least if we believe our own ZFSTuningGuide wiki page. >+ >+ Is enabling deduplication advisable? >+ >+ >+ >+ The answer very much depends on the expected workload. 
>+ Deduplication takes up a signifigent amount of RAM and CPU >+ time and may slow down read and write disk access times. >+ Unless one is storing data that is very heavily >+ duplicated (such as virtual machine images, or user >+ backups) it is likely that deduplication will do more harm >+ than good. Another consideration is the inability to >+ revert deduplication status. If deduplication is enabled, >+ data written, and then dedup is disabled, those blocks >+ which were deduplicated will not be duplicated until >+ they are next modified. >+ s/signifigent/significant/ I've got a really short answer to whether or not you should enable dedup which I give people who ask: No. I have a longer answer too, but I think the short answer is better than typing all day. :-) I like your version, but would be tempted to make it more scary so people don't discover too late the long term pain dedup causes. People rarely expect, for example, that deleting stuff can be slow, but with dedup it can be glacial, especially if your dedup table doesn't fit in RAM. Perhaps you should start with the words "Generally speaking, no." Cheers, Stephen. From owner-freebsd-fs@FreeBSD.ORG Fri Nov 16 05:41:36 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 0F82D582; Fri, 16 Nov 2012 05:41:36 +0000 (UTC) (envelope-from zbeeble@gmail.com) Received: from mail-lb0-f182.google.com (mail-lb0-f182.google.com [209.85.217.182]) by mx1.freebsd.org (Postfix) with ESMTP id EF2A88FC16; Fri, 16 Nov 2012 05:41:34 +0000 (UTC) Received: by mail-lb0-f182.google.com with SMTP id gg13so2362330lbb.13 for ; Thu, 15 Nov 2012 21:41:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=87Zgcd60kYFuO7Frna6sZCSQApLan3epo9zCQfUNdI0=; b=ZwV7RxzZyxqjfz7JvXKeJnekR0HdNnaAzXjk7O9Hd/qtSfzwtJKwkezo4CXNPZgqjL iCRNxRfToWZwjNNI7eSFG3fawA4o8Fd3pZK2inDTwOaXOdvWlTomcMQlg2dYZwBavm9i dtbb0d6CAkI/ff1HA9PvWfKs0DzAXklZCUFM5430whBRWhnBsxWQlTgTm9iKl+stDzXu scu0Iuw3eZDpHJrRN71bXXmoUIEyZ5kNtezh2rvPWrXg2h1BgCQIOpuXLAFVlvDx/GZQ gfqJwY7kn7bMm1tccSZe0RnV49YNfey0a2nDOc7+US66hRnKiIF97nkNWKNClky6IHr5 0zsg== MIME-Version: 1.0 Received: by 10.152.106.110 with SMTP id gt14mr3261105lab.1.1353044493951; Thu, 15 Nov 2012 21:41:33 -0800 (PST) Received: by 10.112.49.138 with HTTP; Thu, 15 Nov 2012 21:41:33 -0800 (PST) In-Reply-To: <20121116044055.GA47859@neutralgood.org> References: <57ac1f$gf3rkl@ipmail05.adl6.internode.on.net> <50A31D48.3000700@shatow.net> <20121116044055.GA47859@neutralgood.org> Date: Fri, 16 Nov 2012 00:41:33 -0500 Message-ID: Subject: Re: SSD recommendations for ZFS cache/log From: Zaphod Beeblebrox To: kpneal@pobox.com Content-Type: text/plain; charset=ISO-8859-1 Cc: FreeBSD FS , Bryan Drewery , Eitan Adler , Stephen McKay X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 16 Nov 2012 05:41:36 -0000 On Thu, Nov 15, 2012 at 11:40 PM, wrote: >> + >> + The answer very much depends on the expected workload. >> + Deduplication takes up a signifigent amount of RAM and CPU >> + time and may slow down read and write disk access times. 
>> + Unless one is storing data that is very heavily >> + duplicated (such as virtual machine images, or user >> + backups) it is likely that deduplication will do more harm >> + than good. Another consideration is the inability to > > I advise against advice that is this firm. The statement that it will "do > more harm than good" really should be omitted. And I'm not sure it is > fair to say it takes a bunch of CPU. Lots of memory, yes, but lots of > CPU isn't so clear. I experimented by enabling DEDUP on a RAID-Z1 pool containing 4x 2T green drives. The system had 8G of RAM and was otherwise quiet. I copied a dataset of about 1T of random stuff onto the array and then copied the same set of data onto the array a second time. The end result is a dedup ration of almost 2.0 and only around 1T of disk used. As I recall (and it's been 6-ish months since I did this), the 2nd write became largely CPU bound with little disk activity. As far as I could tell, the dedup table never thrashed on the disk ... and that most of the disk activity seemed to be creating the directory tree or reading the disk to do the verify step of dedup. The CPU is modest... a 2.6 Ghz Core-2-duo --- and I don't recall if it busied both cores or just one. From owner-freebsd-fs@FreeBSD.ORG Fri Nov 16 06:03:23 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 9775B9F5; Fri, 16 Nov 2012 06:03:23 +0000 (UTC) (envelope-from zbeeble@gmail.com) Received: from mail-lb0-f182.google.com (mail-lb0-f182.google.com [209.85.217.182]) by mx1.freebsd.org (Postfix) with ESMTP id D99D78FC0C; Fri, 16 Nov 2012 06:03:20 +0000 (UTC) Received: by mail-lb0-f182.google.com with SMTP id gg13so2373692lbb.13 for ; Thu, 15 Nov 2012 22:03:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=YFuuywfayqqoLYMGDLB9DIFtekc+3eS8V/3DHePUvv0=; b=Njf7xAjtTSl1XIdLOFfM12yOG5uxb2DO3TYfWLfZUmCtA/P8Kqm/pglS6FrL8j9Dkt rC+d1eXHog0iGZBH5wgqMKeCEpVUZlFMhswWrkWtMg0rEXksNbjCPptij9YhiuU8tVGO Jk7vSQvl8zguDQ5aFHKaF3YopOfCGtgvNTcRXQsVYVHrl+wlCGgLQOVd8Q7NLdR84CKZ FOhpbhljHvLMW3wK/lsCf12+JR/l+0efdUAC8FqmrbGT0CTAfyzwANn4CtEJqq5RUD4k JCmHfNjevDRgs5gZGn2eDcgZ4Fp+jobTbeX6hFiOrPFhdsawz8Ov0m2dPH8FVTlh4ov1 HTZw== MIME-Version: 1.0 Received: by 10.112.9.199 with SMTP id c7mr1540295lbb.70.1353045799666; Thu, 15 Nov 2012 22:03:19 -0800 (PST) Received: by 10.112.49.138 with HTTP; Thu, 15 Nov 2012 22:03:19 -0800 (PST) In-Reply-To: <20121115102704.6657ee52@suse3> References: <50A130B7.4080604@cse.yorku.ca> <20121113043409.GA70601@neutralgood.org> <50A2B95D.4000400@cse.yorku.ca> <50A2F804.3010009@freebsd.org> <20121115001840.GA27399@FreeBSD.org> <20121115102704.6657ee52@suse3> Date: Fri, 16 Nov 2012 01:03:19 -0500 Message-ID: Subject: Re: RHEL to FreeBSD file server From: Zaphod Beeblebrox To: Rainer Duffner Content-Type: text/plain; charset=ISO-8859-1 Cc: freebsd-fs@freebsd.org, John X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 16 Nov 2012 06:03:23 -0000 On Thu, Nov 15, 2012 at 4:27 AM, Rainer Duffner wrote: > http://www.stec-inc.com/product/zeusram.php > It seems that various Nexenta-integrators are using this. The only price I see is $2449 on Dell's website. Ouch. For 8G. 
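For anyone repeating the dedup experiment described a few messages up, the relevant numbers are easy to watch while the second copy runs. A rough sketch, assuming the pool is called tank (placeholder name):

   zpool list tank     # the DEDUP column is the current dedup ratio
   zdb -D tank         # DDT summary, including entry counts and in-core size
   zdb -DD tank        # adds a histogram of block reference counts

The in-core DDT size reported by zdb is the number to weigh against available RAM before enabling dedup on a larger pool; once the table stops fitting in ARC (or L2ARC), every write and especially every delete has to fetch DDT blocks from the pool disks.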
From owner-freebsd-fs@FreeBSD.ORG Fri Nov 16 06:31:32 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 2A9C11A6 for ; Fri, 16 Nov 2012 06:31:32 +0000 (UTC) (envelope-from smckay@internode.on.net) Received: from ipmail04.adl6.internode.on.net (ipmail04.adl6.internode.on.net [IPv6:2001:44b8:8060:ff02:300:1:6:4]) by mx1.freebsd.org (Postfix) with ESMTP id 6D4938FC0C for ; Fri, 16 Nov 2012 06:31:31 +0000 (UTC) Message-Id: <75d11b$h1cl9g@ipmail04.adl6.internode.on.net> X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AjlKAFjcpVB4mDTJPGdsb2JhbAAqGsJKGAEBAQE4NIIeAQEEAXkQCBA5QxSIIAUMLck8AgOBYYIhgQYDnAkDhR2IGQ Received: from unknown (HELO localhost) ([120.152.52.201]) by ipmail04.adl6.internode.on.net with ESMTP; 16 Nov 2012 17:01:30 +1030 From: Stephen McKay To: freebsd-fs@freebsd.org Subject: RAM disk for ZFS ZIL? (Was: RHEL to FreeBSD file server) References: <50A130B7.4080604@cse.yorku.ca> <20121113043409.GA70601@neutralgood.org> <50A2B95D.4000400@cse.yorku.ca> <50A2F804.3010009@freebsd.org> <20121115001840.GA27399@FreeBSD.org><20121115102704.6657ee52@suse3> In-Reply-To: <20121115102704.6657ee52@suse3> from Rainer Duffner at "Thu, 15 Nov 2012 10:27:04 +0100" Date: Fri, 16 Nov 2012 17:31:26 +1100 Cc: Stephen McKay X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 16 Nov 2012 06:31:32 -0000 On Thursday, 15th November 2012, Rainer Duffner wrote: >> Do you have any recommendations for an NVRAM unit usable with Freebsd? > >I haven't tried it, but my bet would be on this one: > >http://www.stec-inc.com/product/zeusram.php > >It seems that various Nexenta-integrators are using this. > >To the OS, it should appears as just another SAS hardrive. And at the low end of the market, has anyone tried this: http://www.acard.com/english/fb01-product.jsp?idno_no=382&prod_no=ANS-9010BA&type1_idno=5&ino=28 The Acard ANS-9010BA seems cheap and, if not plentifully available, at least available on ebay and amazon. It's battery backed and will auto backup to a CF card, which seems pretty cool for something that is cheaper than the big boys' toys. Sometimes cheap is good. Sometimes cheap is a disaster. :-) Has anyone discovered which camp this one is in, when used as a ZFS ZIL? Cheers, Stephen. 
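Before buying any of these it is worth checking whether the workload actually issues enough synchronous writes to justify a separate log device. A crude way to bound the possible gain, assuming a scratch dataset tank/ziltest (names are placeholders): run the workload once as-is, then once with sync temporarily disabled, and compare.

   zfs create tank/ziltest
   zfs set sync=disabled tank/ziltest   # test only; acknowledged writes can be lost on a crash
   (run the NFS, database or netatalk workload against it)
   zfs set sync=standard tank/ziltest

If the two runs are close, a fast slog will not buy much; if sync=disabled is dramatically faster, that gap is roughly the ceiling on what a RAM or SSD log device could recover safely.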
From owner-freebsd-fs@FreeBSD.ORG Fri Nov 16 08:17:56 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 892E1D7B for ; Fri, 16 Nov 2012 08:17:56 +0000 (UTC) (envelope-from rainer@ultra-secure.de) Received: from mail.ultra-secure.de (mail.ultra-secure.de [78.47.114.122]) by mx1.freebsd.org (Postfix) with ESMTP id BE3DD8FC14 for ; Fri, 16 Nov 2012 08:17:55 +0000 (UTC) Received: (qmail 25444 invoked by uid 89); 16 Nov 2012 08:17:47 -0000 Received: by simscan 1.4.0 ppid: 25439, pid: 25441, t: 0.0460s scanners: attach: 1.4.0 clamav: 0.97.3/m:54/d:15583 Received: from unknown (HELO suse3) (rainer@ultra-secure.de@212.71.117.1) by mail.ultra-secure.de with ESMTPA; 16 Nov 2012 08:17:47 -0000 Date: Fri, 16 Nov 2012 09:17:47 +0100 From: Rainer Duffner To: Zaphod Beeblebrox Subject: Re: RHEL to FreeBSD file server Message-ID: <20121116091747.2c1bfc55@suse3> In-Reply-To: References: <50A130B7.4080604@cse.yorku.ca> <20121113043409.GA70601@neutralgood.org> <50A2B95D.4000400@cse.yorku.ca> <50A2F804.3010009@freebsd.org> <20121115001840.GA27399@FreeBSD.org> <20121115102704.6657ee52@suse3> X-Mailer: Claws Mail 3.8.1 (GTK+ 2.24.10; x86_64-suse-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org, John X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 16 Nov 2012 08:17:56 -0000 Am Fri, 16 Nov 2012 01:03:19 -0500 schrieb Zaphod Beeblebrox : > On Thu, Nov 15, 2012 at 4:27 AM, Rainer Duffner > wrote: >=20 > > http://www.stec-inc.com/product/zeusram.php >=20 > > It seems that various Nexenta-integrators are using this. >=20 > The only price I see is $2449 on Dell's website. Ouch. For 8G. I saw it for 1800=E2=82=AC somewhere. FusionIO is 10k for 320G.=20 But I've never seen FusionIO used in a Fileserver - probably doesn't make sense, price-wise. IIRC, the OP wanted to build a 16T fileserver. So he will have to spend some cash anyway. I think it's a safe assumption that Nexenta and its partners have spent quite some time evaluating the various options. 
From owner-freebsd-fs@FreeBSD.ORG Fri Nov 16 14:04:59 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 89464DE4 for ; Fri, 16 Nov 2012 14:04:59 +0000 (UTC) (envelope-from mcdouga9@egr.msu.edu) Received: from mail.egr.msu.edu (hill.egr.msu.edu [35.9.37.162]) by mx1.freebsd.org (Postfix) with ESMTP id 563D78FC0C for ; Fri, 16 Nov 2012 14:04:58 +0000 (UTC) Received: from hill (localhost [127.0.0.1]) by mail.egr.msu.edu (Postfix) with ESMTP id 0D4802F1E7 for ; Fri, 16 Nov 2012 08:58:45 -0500 (EST) X-Virus-Scanned: amavisd-new at egr.msu.edu Received: from mail.egr.msu.edu ([127.0.0.1]) by hill (hill.egr.msu.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id zU1ZlQZAaNSx for ; Fri, 16 Nov 2012 08:58:44 -0500 (EST) Received: from EGR authenticated sender Message-ID: <50A64694.5030001@egr.msu.edu> Date: Fri, 16 Nov 2012 08:58:44 -0500 From: Adam McDougall User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:16.0) Gecko/20121115 Thunderbird/16.0.2 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: SSD recommendations for ZFS cache/log References: <57ac1f$gf3rkl@ipmail05.adl6.internode.on.net> <50A31D48.3000700@shatow.net> <20121116044055.GA47859@neutralgood.org> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 16 Nov 2012 14:04:59 -0000 On 11/16/12 00:41, Zaphod Beeblebrox wrote: > On Thu, Nov 15, 2012 at 11:40 PM, wrote: >>> + >>> + The answer very much depends on the expected workload. >>> + Deduplication takes up a signifigent amount of RAM and CPU >>> + time and may slow down read and write disk access times. >>> + Unless one is storing data that is very heavily >>> + duplicated (such as virtual machine images, or user >>> + backups) it is likely that deduplication will do more harm >>> + than good. Another consideration is the inability to >> >> I advise against advice that is this firm. The statement that it will "do >> more harm than good" really should be omitted. And I'm not sure it is >> fair to say it takes a bunch of CPU. Lots of memory, yes, but lots of >> CPU isn't so clear. > > I experimented by enabling DEDUP on a RAID-Z1 pool containing 4x 2T > green drives. The system had 8G of RAM and was otherwise quiet. I > copied a dataset of about 1T of random stuff onto the array and then > copied the same set of data onto the array a second time. The end > result is a dedup ration of almost 2.0 and only around 1T of disk > used. > > As I recall (and it's been 6-ish months since I did this), the 2nd > write became largely CPU bound with little disk activity. As far as I > could tell, the dedup table never thrashed on the disk ... and that > most of the disk activity seemed to be creating the directory tree or > reading the disk to do the verify step of dedup. > > The CPU is modest... a 2.6 Ghz Core-2-duo --- and I don't recall if it > busied both cores or just one. 
> _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > Now try deleting some data and the fun begins :) From owner-freebsd-fs@FreeBSD.ORG Fri Nov 16 15:45:28 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 2FC9EA0B; Fri, 16 Nov 2012 15:45:28 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 2CDE98FC1A; Fri, 16 Nov 2012 15:45:26 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id RAA17441; Fri, 16 Nov 2012 17:45:08 +0200 (EET) (envelope-from avg@FreeBSD.org) Message-ID: <50A65F83.5000604@FreeBSD.org> Date: Fri, 16 Nov 2012 17:45:07 +0200 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:16.0) Gecko/20121029 Thunderbird/16.0.2 MIME-Version: 1.0 To: Guido Falsi , Bartosz Stec Subject: problem booting to multi-vdev root pool [Was: kern/150503: [zfs] ZFS disks are UNAVAIL and corrupted after reboot] References: <509D1DEC.6040505@FreeBSD.org> <50A27243.408@madpilot.net> In-Reply-To: <50A27243.408@madpilot.net> X-Enigmail-Version: 1.4.5 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org, freebsd-current@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 16 Nov 2012 15:45:28 -0000 on 13/11/2012 18:16 Guido Falsi said the following: > My idea, but is just a speculation, i could be very wrong, is that the geom > tasting code has some problem with multiple vdev root pools. Guido, you are absolutely correct. The code for reconstructing/tasting a root pool configuration is a modified upstream code, so it inherited a limitation from it: the support for only a single top-level vdev in a root pool. I have an idea how to add the missing support, but it turned out not to be something that I can hack together in couple of hours. So, instead I wrote the following patch that should fall back to using a root pool configuration from zpool.cache (if it's present there) for a multi-vdev root pool: http://people.freebsd.org/~avg/zfs-spa-multi_vdev_root_fallback.diff The patch also fixes a minor (single-time) memory leak. Guido, Bartosz, could you please test the patch? Apologies for the breakage. 
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Fri Nov 16 16:13:19 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 3C93459A; Fri, 16 Nov 2012 16:13:19 +0000 (UTC) (envelope-from zeising@daemonic.se) Received: from mail.lysator.liu.se (mail.lysator.liu.se [IPv6:2001:6b0:17:f0a0::3]) by mx1.freebsd.org (Postfix) with ESMTP id 683588FC08; Fri, 16 Nov 2012 16:13:16 +0000 (UTC) Received: from mail.lysator.liu.se (localhost [127.0.0.1]) by mail.lysator.liu.se (Postfix) with ESMTP id 7377340014; Fri, 16 Nov 2012 17:13:14 +0100 (CET) Received: by mail.lysator.liu.se (Postfix, from userid 1004) id 686744000C; Fri, 16 Nov 2012 17:13:14 +0100 (CET) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on bernadotte.lysator.liu.se X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=AWL autolearn=disabled version=3.3.1 X-Spam-Score: 0.0 Received: from mx.daemonic.se (mx.daemonic.se [IPv6:2001:470:dca9:0:1::3]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.lysator.liu.se (Postfix) with ESMTPSA id E265F40005; Fri, 16 Nov 2012 17:13:13 +0100 (CET) Received: from mailscanner.daemonic.se (mailscanner.daemonic.se [IPv6:2001:470:dca9:0:1::6]) by mx.daemonic.se (Postfix) with ESMTPS id 3Y34HD4fsdz8hVn; Fri, 16 Nov 2012 17:13:12 +0100 (CET) X-Virus-Scanned: amavisd-new at daemonic.se Received: from mx.daemonic.se ([10.1.0.3]) (using TLS with cipher CAMELLIA256-SHA) by mailscanner.daemonic.se (mailscanner.daemonic.se [10.1.0.6]) (amavisd-new, port 10025) with ESMTPS id LWgNZDAa99L3; Fri, 16 Nov 2012 17:13:10 +0100 (CET) Received: from mail.daemonic.se (mail.daemonic.se [IPv6:2001:470:dca9:0:1::4]) by mx.daemonic.se (Postfix) with ESMTPS id 3Y34HB1ThRz8hVm; Fri, 16 Nov 2012 17:13:10 +0100 (CET) Received: from tifa.daemonic.se (tifa.daemonic.se [IPv6:2001:470:dca9:1::6]) by mail.daemonic.se (Postfix) with ESMTPSA id 3Y34H954Pnz9Ctj; Fri, 16 Nov 2012 17:13:09 +0100 (CET) Received: from tifa.daemonic.se (localhost [IPv6:::1]) by tifa.daemonic.se (Postfix) with ESMTP id 448F5228F2; Fri, 16 Nov 2012 17:13:09 +0100 (CET) Message-ID: <50A66615.9060906@daemonic.se> Date: Fri, 16 Nov 2012 17:13:09 +0100 From: Niclas Zeising User-Agent: Mutt/1.5.21 MIME-Version: 1.0 To: Andriy Gapon Subject: Re: problem booting to multi-vdev root pool [Was: kern/150503: [zfs] ZFS disks are UNAVAIL and corrupted after reboot] References: <509D1DEC.6040505@FreeBSD.org> <50A27243.408@madpilot.net> <50A65F83.5000604@FreeBSD.org> In-Reply-To: <50A65F83.5000604@FreeBSD.org> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV using ClamSMTP Cc: freebsd-fs@FreeBSD.org, freebsd-current@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 16 Nov 2012 16:13:19 -0000 On 11/16/12 16:45, Andriy Gapon wrote: > on 13/11/2012 18:16 Guido Falsi said the following: >> My idea, but is just a speculation, i could be very wrong, is that the geom >> tasting code has some problem with multiple vdev root pools. > > Guido, > > you are absolutely correct. The code for reconstructing/tasting a root pool > configuration is a modified upstream code, so it inherited a limitation from it: > the support for only a single top-level vdev in a root pool. 
> I have an idea how to add the missing support, but it turned out not to be > something that I can hack together in couple of hours. > > So, instead I wrote the following patch that should fall back to using a root pool > configuration from zpool.cache (if it's present there) for a multi-vdev root pool: > http://people.freebsd.org/~avg/zfs-spa-multi_vdev_root_fallback.diff > > The patch also fixes a minor (single-time) memory leak. > > Guido, Bartosz, > could you please test the patch? > > Apologies for the breakage. > Just to confirm, since I am holding back an update pending on this. If I have a raidz root pool, with three disks, like this: NAME STATE READ WRITE CKSUM zroot ONLINE 0 0 0 raidz1-0 ONLINE 0 0 0 gpt/disk0 ONLINE 0 0 0 gpt/disk1 ONLINE 0 0 0 gpt/disk2 ONLINE 0 0 0 Then I'm fine to update without issues. the problem is only if, as an example, you have a mirror with striped disks, or a stripe with mirrored disks, which it seems to me the original poster had. Am I correct, and therefore ok to update? Regards! -- Niclas Zeising From owner-freebsd-fs@FreeBSD.ORG Fri Nov 16 16:17:15 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 5079470C; Fri, 16 Nov 2012 16:17:15 +0000 (UTC) (envelope-from mad@madpilot.net) Received: from winston.madpilot.net (winston.madpilot.net [78.47.75.155]) by mx1.freebsd.org (Postfix) with ESMTP id 010D38FC08; Fri, 16 Nov 2012 16:17:14 +0000 (UTC) Received: from winston.madpilot.net (localhost [127.0.0.1]) by winston.madpilot.net (Postfix) with ESMTP id 3Y34Mq3CB9zFTDB; Fri, 16 Nov 2012 17:17:11 +0100 (CET) X-Virus-Scanned: amavisd-new at madpilot.net Received: from winston.madpilot.net ([127.0.0.1]) by winston.madpilot.net (winston.madpilot.net [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 8Z1FnAPAi99A; Fri, 16 Nov 2012 17:17:09 +0100 (CET) Received: from vwg82.hq.ignesti.it (unknown [80.74.176.55]) by winston.madpilot.net (Postfix) with ESMTPSA; Fri, 16 Nov 2012 17:17:09 +0100 (CET) Message-ID: <50A66701.701@madpilot.net> Date: Fri, 16 Nov 2012 17:17:05 +0100 From: Guido Falsi User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:16.0) Gecko/20121114 Thunderbird/16.0.2 MIME-Version: 1.0 To: Andriy Gapon Subject: Re: problem booting to multi-vdev root pool [Was: kern/150503: [zfs] ZFS disks are UNAVAIL and corrupted after reboot] References: <509D1DEC.6040505@FreeBSD.org> <50A27243.408@madpilot.net> <50A65F83.5000604@FreeBSD.org> In-Reply-To: <50A65F83.5000604@FreeBSD.org> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org, freebsd-current@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 16 Nov 2012 16:17:15 -0000 On 11/16/12 16:45, Andriy Gapon wrote: > on 13/11/2012 18:16 Guido Falsi said the following: >> My idea, but is just a speculation, i could be very wrong, is that the geom >> tasting code has some problem with multiple vdev root pools. > > Guido, > > you are absolutely correct. The code for reconstructing/tasting a root pool > configuration is a modified upstream code, so it inherited a limitation from it: > the support for only a single top-level vdev in a root pool. > I have an idea how to add the missing support, but it turned out not to be > something that I can hack together in couple of hours. 
I can imagine, it does not look simple in any way! > > So, instead I wrote the following patch that should fall back to using a root pool > configuration from zpool.cache (if it's present there) for a multi-vdev root pool: > http://people.freebsd.org/~avg/zfs-spa-multi_vdev_root_fallback.diff > > The patch also fixes a minor (single-time) memory leak. > > Guido, Bartosz, > could you please test the patch? I have just compiler an r242910 kernel with this patch (and just this one) applied. System booted so it seems to work fine! :) > > Apologies for the breakage. > No worries, and thanks for this fix. Also thanks for all the work on ZFS! -- Guido Falsi From owner-freebsd-fs@FreeBSD.ORG Fri Nov 16 16:33:08 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 904302CE; Fri, 16 Nov 2012 16:33:08 +0000 (UTC) (envelope-from mad@madpilot.net) Received: from winston.madpilot.net (winston.madpilot.net [78.47.75.155]) by mx1.freebsd.org (Postfix) with ESMTP id 403D18FC16; Fri, 16 Nov 2012 16:33:08 +0000 (UTC) Received: from winston.madpilot.net (localhost [127.0.0.1]) by winston.madpilot.net (Postfix) with ESMTP id 3Y34k96mjBzFTDB; Fri, 16 Nov 2012 17:33:05 +0100 (CET) X-Virus-Scanned: amavisd-new at madpilot.net Received: from winston.madpilot.net ([127.0.0.1]) by winston.madpilot.net (winston.madpilot.net [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id kkZxEZEpMvWe; Fri, 16 Nov 2012 17:33:01 +0100 (CET) Received: from vwg82.hq.ignesti.it (unknown [80.74.176.55]) by winston.madpilot.net (Postfix) with ESMTPSA; Fri, 16 Nov 2012 17:33:01 +0100 (CET) Message-ID: <50A66ABE.5030108@madpilot.net> Date: Fri, 16 Nov 2012 17:33:02 +0100 From: Guido Falsi User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:16.0) Gecko/20121114 Thunderbird/16.0.2 MIME-Version: 1.0 To: Niclas Zeising Subject: Re: problem booting to multi-vdev root pool [Was: kern/150503: [zfs] ZFS disks are UNAVAIL and corrupted after reboot] References: <509D1DEC.6040505@FreeBSD.org> <50A27243.408@madpilot.net> <50A65F83.5000604@FreeBSD.org> <50A66615.9060906@daemonic.se> In-Reply-To: <50A66615.9060906@daemonic.se> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org, freebsd-current@FreeBSD.org, Andriy Gapon X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 16 Nov 2012 16:33:08 -0000 On 11/16/12 17:13, Niclas Zeising wrote: > > Just to confirm, since I am holding back an update pending on this. > If I have a raidz root pool, with three disks, like this: > NAME STATE READ WRITE CKSUM > zroot ONLINE 0 0 0 > raidz1-0 ONLINE 0 0 0 > gpt/disk0 ONLINE 0 0 0 > gpt/disk1 ONLINE 0 0 0 > gpt/disk2 ONLINE 0 0 0 > > Then I'm fine to update without issues. the problem is only if, as an > example, you have a mirror with striped disks, or a stripe with mirrored > disks, which it seems to me the original poster had. > Am I correct, and therefore ok to update? Yes, looks like that. The affected system pool looks like this: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 gpt/disk0 ONLINE 0 0 0 gpt/disk1 ONLINE 0 0 0 mirror-1 ONLINE 0 0 0 ada2p2 ONLINE 0 0 0 gpt/disk3 ONLINE 0 0 0 other systems I have with simple mirror pools or single disks have shown no problems. 
BTW I don't know why the system insists on identifying the third disk as ada2p2, it has a gpt label defined just like the others. -- Guido Falsi From owner-freebsd-fs@FreeBSD.ORG Fri Nov 16 16:43:27 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 9532D4FA; Fri, 16 Nov 2012 16:43:27 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 9B3A58FC0C; Fri, 16 Nov 2012 16:43:26 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id SAA17725; Fri, 16 Nov 2012 18:42:50 +0200 (EET) (envelope-from avg@FreeBSD.org) Message-ID: <50A66D0A.7040208@FreeBSD.org> Date: Fri, 16 Nov 2012 18:42:50 +0200 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:16.0) Gecko/20121029 Thunderbird/16.0.2 MIME-Version: 1.0 To: Niclas Zeising Subject: Re: problem booting to multi-vdev root pool [Was: kern/150503: [zfs] ZFS disks are UNAVAIL and corrupted after reboot] References: <509D1DEC.6040505@FreeBSD.org> <50A27243.408@madpilot.net> <50A65F83.5000604@FreeBSD.org> <50A66615.9060906@daemonic.se> In-Reply-To: <50A66615.9060906@daemonic.se> X-Enigmail-Version: 1.4.5 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org, freebsd-current@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 16 Nov 2012 16:43:27 -0000 on 16/11/2012 18:13 Niclas Zeising said the following: > Then I'm fine to update without issues. the problem is only if, as an example, you > have a mirror with striped disks, or a stripe with mirrored disks, which it seems > to me the original poster had. > Am I correct, and therefore ok to update? Yes. The problem occurs only if your pool has multiple vdevs _immediately_ under root. -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Fri Nov 16 17:02:29 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id C862EE9E for ; Fri, 16 Nov 2012 17:02:29 +0000 (UTC) (envelope-from bgold@simons-rock.edu) Received: from hedwig.simons-rock.edu (hedwig.simons-rock.edu [208.81.88.14]) by mx1.freebsd.org (Postfix) with ESMTP id 803F18FC08 for ; Fri, 16 Nov 2012 17:02:29 +0000 (UTC) Received: from behemoth (behemoth.simons-rock.edu [10.30.2.44]) by hedwig.simons-rock.edu (Postfix) with ESMTP id DE39714B for ; Fri, 16 Nov 2012 12:02:28 -0500 (EST) From: "Brian Gold" To: Subject: odd phantom directory (cross posted from freebsd-general) Date: Fri, 16 Nov 2012 12:02:29 -0500 Message-ID: <072001cdc41c$291283a0$7b378ae0$@simons-rock.edu> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Mailer: Microsoft Outlook 14.0 Thread-Index: Ac3EHCkI7Iry60q+S6efPxFPyEP22A== Content-Language: en-us X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 16 Nov 2012 17:02:29 -0000 Hi all, My problem appears to be resolved for now, but I would definitely like to understand what the root cause of the issue was. 
I'm crossposting this to freebsd-fs based on some a suggestion from a freebsd-general user. -------------------------------------------- First post -------------------------------------------- I ran into a rather odd issue this morning with my FreeBSD 9.0-Release system running ZFS v28. This system serves as an RSYNC host which all of our other systems back up to each night. Last night, I started getting the following error: file has vanished: "/backup/ldap1/etc/pki" Now, usually when I get a file has vanished error during an RSYNC run, it indicates that the source file/directory on the system that is sending the rsync backup has been deleted or moved before rsync got a chance to actually send it. That doesn't appear to be the case here. "/backup/ldap1/etc/pki" is the destination directory on my Freebsd/ZFS server. I take a look in "/backup/ldap1/etc" on my Freebsd server and the "pki" subdirectory is no longer listed. Ok, so I run "mkdir /backup/ldap1/etc/pki" and get the following error: "mkdir: /backup/ldap1/etc/pki: File exists". Odd Just to double check, I run "ls -la /backup/ldap1/etc/pki" and get the following: "ls: /backup/ldap1/etc/pki: No such file or directory" Alright, how about a simple touch? "touch: /backup/ldap1/etc/pki: No such file or directory" Fine. Maybe there is something funky about the "/backup/ldap1/etc" directory that is preventing me from doing any of this. "mkdir /backup/ldap1/etc/pki2". That works just fine. What the heck? Looking at the output of my daily security run, I see the following: Checking setuid files and devices: find: /backup/ldap1/etc/fonts/conf.avail: No such file or directory find: /backup/ldap1/etc/fonts/conf.d/30-metric-aliases.conf: No such file or directory find: /backup/ldap1/etc/pki: No such file or directory So, it looks like there are a few files/directories in /backup/ldap1/etc that were affected. Looking through dmesg and /var/log/messages, I don't see anything out of the ordinary. I'm running a zpool scrub now just to be on the safe side, but I haven't seen any checksum or other errors so far. Any thoughts as to what might be causing this? -------------------------------------------- second post -------------------------------------------- It looks like this may be the same issue as reported here: http://lists.freebsd.org/pipermail/freebsd-current/2011-October/027902.html but that thread seems to have just died off about a year ago. Zfs scrub is still running, but not reported errors so far. I'm going to run a "zdb -ccv backup" once that is done. >From looking over this other thread, I tried just a simple "ls /backup/ldap1/etc" and "/backup/ldap1/etc/pki" does show up if I do "ls" without any arguments. If I do an "ls -l" then it doesn't show up. -------------------------------------------- third post -------------------------------------------- Ok, really confused now. I just ran an "rm -rf /backup/ldap1", which errored out when trying to rm "/backup/ldap1/etc/pki", "/backup/ldap1/etc/fonts/conf.d/30-metric-aliases.conf", and "/backup/ldap1/etc/fonts/conf.avail". Everything else got purged correctly, except for those phantom files. I then reran my rsync script, which DIDN'T error this time, shipped all the files over, and I can now read those phantom files/folders just fine. 
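For completeness, the verification sequence already started in the posts above is the standard one for ruling out on-disk damage; the main point is to let each step finish and read its output:

   zpool scrub backup
   zpool status -v backup   # wait for the scrub to complete and check for CKSUM errors
   zdb -ccv backup          # walks the pool and verifies block checksums; slow, best run while the pool is quiet

A clean scrub plus a clean zdb pass would suggest the phantom entries were a transient problem higher up the stack rather than corruption on the disks.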
From owner-freebsd-fs@FreeBSD.ORG Fri Nov 16 18:42:04 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id BFE84CC4 for ; Fri, 16 Nov 2012 18:42:04 +0000 (UTC) (envelope-from lists@eitanadler.com) Received: from mail-lb0-f182.google.com (mail-lb0-f182.google.com [209.85.217.182]) by mx1.freebsd.org (Postfix) with ESMTP id 2CE568FC0C for ; Fri, 16 Nov 2012 18:42:03 +0000 (UTC) Received: by mail-lb0-f182.google.com with SMTP id go10so322228lbb.13 for ; Fri, 16 Nov 2012 10:42:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=eitanadler.com; s=0xdeadbeef; h=mime-version:sender:in-reply-to:references:from:date :x-google-sender-auth:message-id:subject:to:cc:content-type; bh=H5piqpFJXYkYp7+DsnDJhERUlUfEcpR5BtFAIzr3kJE=; b=rjxAElzaqsBWTz6x+KibcLvHmGclKcDUi2DHKTOOLq0w7dBPqTKBsSHDHxq3Loenmi jxUdnsx5o9epvRYyHe+J26b6BWtOYSXoAMweL/T/li/SSnGPGgDQBLEA0QTbA/SP4sIq uiLMwl+Bszuh4klgRgYVMRKmk8EZV5LRFojYc= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date :x-google-sender-auth:message-id:subject:to:cc:content-type :x-gm-message-state; bh=H5piqpFJXYkYp7+DsnDJhERUlUfEcpR5BtFAIzr3kJE=; b=XfkBWaFmAhSSRDe0ZMZTGHX9Eg12UUMWHXCqYBcI9Vnldi00DGLK/4Zg10qd97eThx kVEJJXUPl1yqd4qDooytXtmAaUhNZXlrUHNGOeIl2Wv42DXOAaqQ9qHHovVRZhEJO3Tc emAJVktpf4cbZF5DsbnXxw2bDPRhVTdNVaaldM2358mSKZ8dcKIO5DAB8v8l94TttZtX O5Pg7H8RW1ysMmDUdmD64REZiRZtHaDQcGoG0Dozg9Xh+YJyBaXeU0nhq/4ZP32X4zPW Ywy70g7AdOVFjalacTBr7uy1iCqrV2+hRub/MdgSH2Cc9yv5euz5enY4wImQ/AYQsgGp jpUQ== Received: by 10.152.104.115 with SMTP id gd19mr5147508lab.13.1353091322607; Fri, 16 Nov 2012 10:42:02 -0800 (PST) MIME-Version: 1.0 Sender: lists@eitanadler.com Received: by 10.112.25.166 with HTTP; Fri, 16 Nov 2012 10:41:32 -0800 (PST) In-Reply-To: <57ac1f$gg70bn@ipmail05.adl6.internode.on.net> References: <57ac1f$gf3rkl@ipmail05.adl6.internode.on.net> <50A31D48.3000700@shatow.net> <57ac1f$gg70bn@ipmail05.adl6.internode.on.net> From: Eitan Adler Date: Fri, 16 Nov 2012 13:41:32 -0500 X-Google-Sender-Auth: Rui3Lb71RJzXHqczMBntU-DDytE Message-ID: Subject: Re: ZFS FAQ (Was: SSD recommendations for ZFS cache/log) To: Stephen McKay Content-Type: text/plain; charset=UTF-8 X-Gm-Message-State: ALoCoQk5rerZzAOk3VfeVAG25vvdZ8mPGYkHiCeVTzck48uL8hFqdWYXKdnUyVgYmOQqh3aMC9ff Cc: FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 16 Nov 2012 18:42:04 -0000 On 15 November 2012 23:58, Stephen McKay wrote: > On Thursday, 15th November 2012, Eitan Adler wrote: > >>Can people here please tell me what is wrong in the following content? > > A few things. I'll intersperse them. > >>Is there additional data or questions to add? > > The whole ZFS world desperately needs good documentation. There > are misconceptions everywhere. There are good tuning hints and > bad (or out of date) ones. Further, it depends on your target > application whether the defaults are fairly good or plain suck. 
New version of the patch taking into account the comments so far: http://people.freebsd.org/~eadler/files/add-zfs-faq-section.diff -- Eitan Adler Source, Ports, Doc committer Bugmeister, Ports Security teams From owner-freebsd-fs@FreeBSD.ORG Fri Nov 16 19:24:53 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 4DC53552; Fri, 16 Nov 2012 19:24:53 +0000 (UTC) (envelope-from zbeeble@gmail.com) Received: from mail-lb0-f182.google.com (mail-lb0-f182.google.com [209.85.217.182]) by mx1.freebsd.org (Postfix) with ESMTP id 84F8A8FC14; Fri, 16 Nov 2012 19:24:52 +0000 (UTC) Received: by mail-lb0-f182.google.com with SMTP id go10so357342lbb.13 for ; Fri, 16 Nov 2012 11:24:51 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=RhMg7OEVp+qIJZtGnKcA77Ch8EgP6o10JmvSvJmE3aI=; b=eab37bxGgnRVwKQkxvWAFlNRAjgzn3R+QY8CDNC1fc2a4tTNTmLjjmWxYr8s6uJ9xZ Klo51y4GpqxuWSFPbI+gA+Cozek8yIGY4SQ15qaD7u/jbK/UJoxShNBUcMD2QwGD8f/f V3BbiiJGknXgiAO/E3uBsc7ylTkozIFVus5xnG3wLsLe7YlDDzB38fik6KE0x93IXi/P FW5w7DMuYvXBVSLkHRjBmKRkmAWQosxcClOaZiDOdRVPTOzWmTc4xsuIOQivxuyoRSN0 zaoB9hMzGSHE89EUZKiaxG2DrIleZP/sWidzGxU2EAwIoiBmxLJ+lHDuOKDl9a5lOU19 s+Kg== MIME-Version: 1.0 Received: by 10.152.104.115 with SMTP id gd19mr5250179lab.13.1353093891217; Fri, 16 Nov 2012 11:24:51 -0800 (PST) Received: by 10.112.49.138 with HTTP; Fri, 16 Nov 2012 11:24:51 -0800 (PST) In-Reply-To: <20121116091747.2c1bfc55@suse3> References: <50A130B7.4080604@cse.yorku.ca> <20121113043409.GA70601@neutralgood.org> <50A2B95D.4000400@cse.yorku.ca> <50A2F804.3010009@freebsd.org> <20121115001840.GA27399@FreeBSD.org> <20121115102704.6657ee52@suse3> <20121116091747.2c1bfc55@suse3> Date: Fri, 16 Nov 2012 14:24:51 -0500 Message-ID: Subject: Re: RHEL to FreeBSD file server From: Zaphod Beeblebrox To: Rainer Duffner Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org, John X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 16 Nov 2012 19:24:53 -0000 On Fri, Nov 16, 2012 at 3:17 AM, Rainer Duffner wr= ote: > Am Fri, 16 Nov 2012 01:03:19 -0500 > schrieb Zaphod Beeblebrox : >> The only price I see is $2449 on Dell's website. Ouch. For 8G. > I saw it for 1800=80 somewhere. > FusionIO is 10k for 320G. > > But I've never seen FusionIO used in a Fileserver - probably doesn't > make sense, price-wise. > > IIRC, the OP wanted to build a 16T fileserver. So he will have to spend > some cash anyway. I think it's a safe assumption that Nexenta and its > partners have spent quite some time evaluating the various options. That seems about the same price. I've stamped out a few of the following builds which are surprisingly cheap and fast. You start with an ICH10 motherboard where the 6 ICH10 SATA ports support port multipliers (this is random and you may have to test --- some do and some don't). Bonus marks if your motherboard also has a few other SATA ports. Check out "gamer" boards from the ASUS "RoG" series. Now... port multipliers are built into quite a few things ... 
4 drive external boxes are fairly easy to find, but if you look, the port multiplier itself (1-to-4 or 1-to-5) is available separately for about $40. attach port multipliers to each of the 6 ICH10 ports and go on to connect 24 (or 30) drives. If you're using "green" 2T's, the speed of 4 disks on one channel is about half of the speed of 1-disk-per-channel. If your motherboard has 2 extra SATA (sometimes 6G, for instance), they likely _don't_ support multipliers (this seems deliberate). But you can have a 2 disk raid-1 boot of either FFS or ZFS. Another choice is USB for either flash or HD boot. Put as much memory as it supports. Desktop board will often take 8G or 16G now... or more. You'll want a big power supply and an nice i5 :). You'll get close to wire speed GigE out of this setup. 3x8 vdevs in RAID-Z1 or Z2 work nicely. The reason I say all this... is that this config runs about $3500-ish here in Canada (where green 2T's are ~$109). The ZeusRAM drive up there is 2/3 of that. From owner-freebsd-fs@FreeBSD.ORG Fri Nov 16 19:36:53 2012 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 90936720; Fri, 16 Nov 2012 19:36:53 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 5BEF48FC0C; Fri, 16 Nov 2012 19:36:53 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id qAGJarTI011022; Fri, 16 Nov 2012 19:36:53 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id qAGJar81011018; Fri, 16 Nov 2012 19:36:53 GMT (envelope-from linimon) Date: Fri, 16 Nov 2012 19:36:53 GMT Message-Id: <201211161936.qAGJar81011018@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-amd64@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Subject: Re: kern/173657: [nfs] strange UID map with nfsuserd X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 16 Nov 2012 19:36:53 -0000 Old Synopsis: strange UID map with nfsuserd New Synopsis: [nfs] strange UID map with nfsuserd Responsible-Changed-From-To: freebsd-amd64->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Fri Nov 16 19:36:41 UTC 2012 Responsible-Changed-Why: reclassify. 
http://www.freebsd.org/cgi/query-pr.cgi?pr=173657 From owner-freebsd-fs@FreeBSD.ORG Fri Nov 16 19:38:13 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 18C6B7E8; Fri, 16 Nov 2012 19:38:13 +0000 (UTC) (envelope-from stevenschlansker@gmail.com) Received: from mail-pa0-f54.google.com (mail-pa0-f54.google.com [209.85.220.54]) by mx1.freebsd.org (Postfix) with ESMTP id C91368FC0C; Fri, 16 Nov 2012 19:38:12 +0000 (UTC) Received: by mail-pa0-f54.google.com with SMTP id kp6so2224435pab.13 for ; Fri, 16 Nov 2012 11:38:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=content-type:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer; bh=3rBTqRn9bWc/Xwv2dqlrUkGgAuK4p4H0xdZ/+yysZNU=; b=hvUYqw9iLjnzC4uWsxTz3K3bis4HhMcjJoLzXNQnpaNSeXsFNraUlqVBE1B7l1TnjJ hrX9sXKxv+FewMCT/NfOkvDQDYSCNqlmnYdUzB0mIB1EvxMF5BnGqdJQsIICeh2D+ndg /JoFDL2iEyIomsKDYdu75AniYCdH0u6bTy0SeLgL7m8nejE2Tc8kPW8RejgK96JRPMVh v5o1tk2xCGAPPv+v6dyb3ZREY5x3/6fON9dIv6oxAb/BAyqE/P5BusRr/CZ0PpfsHsDg tdB0yv/3Sb2UTywrls1jHcX3qbkfw516ZtXZOu4jD35pPOryWrvm78XIofnumrhJF32P WWvA== Received: by 10.66.88.136 with SMTP id bg8mr15572431pab.54.1353094692357; Fri, 16 Nov 2012 11:38:12 -0800 (PST) Received: from anesthetize.dyn.corp.trumpet.io ([207.86.77.58]) by mx.google.com with ESMTPS id v9sm1528857paz.6.2012.11.16.11.38.11 (version=TLSv1/SSLv3 cipher=OTHER); Fri, 16 Nov 2012 11:38:11 -0800 (PST) Content-Type: text/plain; charset=windows-1252 Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\)) Subject: Re: RHEL to FreeBSD file server From: Steven Schlansker In-Reply-To: Date: Fri, 16 Nov 2012 11:38:13 -0800 Content-Transfer-Encoding: quoted-printable Message-Id: <16B803FB-0964-4237-8F25-291470E7EFB5@gmail.com> References: <50A130B7.4080604@cse.yorku.ca> <20121113043409.GA70601@neutralgood.org> <50A2B95D.4000400@cse.yorku.ca> <50A2F804.3010009@freebsd.org> <20121115001840.GA27399@FreeBSD.org> <20121115102704.6657ee52@suse3> <20121116091747.2c1bfc55@suse3> To: Zaphod Beeblebrox X-Mailer: Apple Mail (2.1499) Cc: freebsd-fs@freebsd.org, John X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 16 Nov 2012 19:38:13 -0000 On Nov 16, 2012, at 11:24 AM, Zaphod Beeblebrox = wrote: > On Fri, Nov 16, 2012 at 3:17 AM, Rainer Duffner = wrote: >> Am Fri, 16 Nov 2012 01:03:19 -0500 >> schrieb Zaphod Beeblebrox : >=20 >>> The only price I see is $2449 on Dell's website. Ouch. For 8G. >=20 >> I saw it for 1800=80 somewhere. >> FusionIO is 10k for 320G. >>=20 >> But I've never seen FusionIO used in a Fileserver - probably doesn't >> make sense, price-wise. >>=20 >> IIRC, the OP wanted to build a 16T fileserver. So he will have to = spend >> some cash anyway. I think it's a safe assumption that Nexenta and = its >> partners have spent quite some time evaluating the various options. > > go on > to connect 24 (or 30) drives. If you're using "green" 2T's, the speed > of 4 disks on one channel is about half of the speed of > 1-disk-per-channel. > >=20 > The reason I say all this... is that this config runs about $3500-ish > here in Canada (where green 2T's are ~$109). The ZeusRAM drive up > there is 2/3 of that. Curious -- have you been running this setup for any length of time? 
= There's a fair number of horror stories about the "green" drives in particular. The power management is very aggressive about spin down, causing many = unneeded power on/off cycles, dramatically reducing lifespan in a RAID = configuration. Additionally, supposedly the error recovery is inappropriate leading to = drive failure events. (I believe the feature is known as TLER, time-limited = error recovery) Have you run into this? From owner-freebsd-fs@FreeBSD.ORG Fri Nov 16 20:15:59 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 6904B813 for ; Fri, 16 Nov 2012 20:15:59 +0000 (UTC) (envelope-from obrith@gmail.com) Received: from mail-we0-f182.google.com (mail-we0-f182.google.com [74.125.82.182]) by mx1.freebsd.org (Postfix) with ESMTP id E7C468FC14 for ; Fri, 16 Nov 2012 20:15:58 +0000 (UTC) Received: by mail-we0-f182.google.com with SMTP id x43so1395296wey.13 for ; Fri, 16 Nov 2012 12:15:51 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:from:date:message-id:subject:to:content-type; bh=3SG3x+zFPzsI/7l4oq0itgCDC91TvbSt+sywxgzh+8o=; b=NESbrSC668OLY+3LyCQEYFtvsNIZ+Gdccx+hMxiU8n55yd0mK8Vhfzf6RuoXCpf/uO /NGQkBlzeYgf30kVosRPIB405gn9F0vlVYI9UWaVBo7cS0YyLO6Qw8/VweaM5HzejS8C NP9njwS04yhHlJQqRZOB0YcjXxDNuzAPaKu3f5pedm6epuXWkpuLXvSKAeRFcFg6QCq/ sXCmsZKeJTZZs9YZlAvoTnPZmIhtUY3FxM0k2x8wZq0zGXYIK9WHkwzTxPSHZgbLCTn/ mkMt132UoXllH/fEQiinYaVszcSZWbQmayCyKL9CUS3QHgN0xaQSzTOrUQoxWT63AhBP KEZw== Received: by 10.180.99.194 with SMTP id es2mr6792074wib.15.1353096951784; Fri, 16 Nov 2012 12:15:51 -0800 (PST) MIME-Version: 1.0 Received: by 10.180.18.198 with HTTP; Fri, 16 Nov 2012 12:15:30 -0800 (PST) From: Mike McLaughlin Date: Fri, 16 Nov 2012 12:15:30 -0800 Message-ID: Subject: Re: SSD recommendations for ZFS cache/log To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 16 Nov 2012 20:15:59 -0000 > > On Thu, Nov 15, 2012 at 1:18 AM, John wrote: > > > ----- Julian Elischer's Original Message ----- > > > On 11/13/12 1:19 PM, Jason Keltz wrote: > > > >On 11/13/2012 12:41 PM, Bob Friesenhahn wrote: > > > >>On Mon, 12 Nov 2012, kpneal@pobox.com wrote: > > > >>> > > > >>>With your setup of 11 mirrors you have a good mixture of read > > > >>>and write > > > >>>performance, but you've compromised on the safety. The reason > > > >>>that RAID 6 > > > > ... > > > > > >By the way - on another note - what do you or other list members > > > >think of the new Intel SSD DC S3700 as ZIL? Sounds very promising > > > >when it's finally available. I spent a lot of time researching > > > >ZILs today, and one thing I can say is that I have a major > > > >headache now because of it!! > > > > > > ZIL is best served by battery backed up ram or something.. it's tiny > > > and not a really good fit an SSD (maybe just a partition) L2ARC on > > > the other hand is a really good use for SSD. > > > > Well, since you brought the subject up :-) > > > > Do you have any recommendations for an NVRAM unit usable with Freebsd? 
> > > > I've always had my eyes on something like this for ZIL but never had the > need to explore it yet: http://www.ddrdrive.com/ > Most recommendations I've seen have also been around mirrored 15krpm disks > of some sort or even a cheaper battery-backed raid controller in front of > decent disks. for zil it would just need a tiny bit of RAM anyway. > > First, I wholeheartedly agree with some of the other posts calling for more documentation and FAQs on ZFS; It's sorely lacking and there is a whole lot of FUD and outdated information. I've tested several SSDs, I have a few DDRdrives, and I have a ZuesRAM (in a TrueNAS appliance - and another on order that I can test with Solaris). The DDRdrive is OK at best. The latency is quite good, but it's not very high throughput mostly because it's PCIe 1x I believe. It can do lots of very small writes but peaks out at about 130MB/sec no matter the blocksize. If you're using GbE, you're set. If you're using LAGG or 10GbE, it's not great for the price. I also just had a wicked evening a few days ago when my building lost power for a few hours at night and UPSs failed. The UPS that the DDRdrive was attached to died at the same time the one backing the server and it broke my zpool quite severely - none of the typical recovery commands worked at all (this was an OpenIndiana box) and the DDRdrive lost 100% of it's configuration - the system thought it was a brand new drive that didn't belong in the pool (it lost it's partition table, label, etc) . It was a disappointing display by the DDRdrive. I know it's my own fault for the power, but the thing is not a good idea if you aren't 100% certain it's battery will outlast the system UPS/shutdown. The SSD that I've had by far and away the best luck with that has a supercap is the Intel 320. I've got a couple systems with 300gb Intel 320's, partitioned to use 15gb for ZIL (and the rest empty). I've been using them for about a year now and have been monitoring the wear. They will not exceed their expected write lifetime until they've written about 1.2PB or more - several years at a fairly heavy workload for me. It can also do 100-175MB/sec and ~10-20k IOPS depending on the workload, often outpacing the DDRdrives. I'm going to get my hands on the new Intel drives with supercaps coming out as soon as they're available - they look quite promising. As for the ZuesRAM, it's exceedingly fast at the system level. I haven't been able to test it thoroughly in my setup though - It seems FreeBSD has a pretty severe performance issue with sync writes over NFS written to the ZIL, at least in backing VMware. I have a very high end system from IX that just can't do more than ~125MB/sec writes (just above 1GbE). It just flat-lines. The ZuesRAM is certainly not bottle necking and doing o_sync dd writes over NFS from other *nix sources I can write nearly 500MB/sec (at 4k bs). My Solaris based systems do not hit the 125MB barrier that FreeBSD seems to have with VMware. I'm using a 10GbE network for my VMware storage. 
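[For reference, a minimal sketch of how an SSD such as the Intel 320 can be
set aside as a small separate log (slog) device on FreeBSD, leaving the rest
of the flash unprovisioned for wear levelling. The device name (ada1), pool
name (tank) and the 15G size are illustrative assumptions, not taken from the
setup described above:

  gpart create -s gpt ada1                       # label the SSD with GPT
  gpart add -t freebsd-zfs -s 15G -l slog0 ada1  # small slice for the ZIL
  # leave the remainder of the SSD unpartitioned (helps wear levelling)
  zpool add tank log gpt/slog0                   # attach it as a separate log device
  zpool status tank                              # the SSD now appears under "logs"

Removing it again later with "zpool remove tank gpt/slog0" is possible because
log devices, unlike normal data vdevs, can be detached from a pool.]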
From owner-freebsd-fs@FreeBSD.ORG Fri Nov 16 20:28:34 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 1353E8F5 for ; Fri, 16 Nov 2012 20:28:34 +0000 (UTC) (envelope-from julian@freebsd.org) Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16]) by mx1.freebsd.org (Postfix) with ESMTP id C12788FC08 for ; Fri, 16 Nov 2012 20:28:33 +0000 (UTC) Received: from JRE-MBP-2.local (c-50-143-149-146.hsd1.ca.comcast.net [50.143.149.146]) (authenticated bits=0) by vps1.elischer.org (8.14.5/8.14.5) with ESMTP id qAGKSQF7016429 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Fri, 16 Nov 2012 12:28:27 -0800 (PST) (envelope-from julian@freebsd.org) Message-ID: <50A6A1E5.4070000@freebsd.org> Date: Fri, 16 Nov 2012 12:28:21 -0800 From: Julian Elischer User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:16.0) Gecko/20121026 Thunderbird/16.0.2 MIME-Version: 1.0 To: Mike McLaughlin Subject: Re: SSD recommendations for ZFS cache/log References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org, jpaetzel@ixsystems.com X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 16 Nov 2012 20:28:34 -0000 On 11/16/12 12:15 PM, Mike McLaughlin wrote: >> On Thu, Nov 15, 2012 at 1:18 AM, John wrote: >> >>> ----- Julian Elischer's Original Message ----- >>>> On 11/13/12 1:19 PM, Jason Keltz wrote: >>>>> On 11/13/2012 12:41 PM, Bob Friesenhahn wrote: >>>>>> On Mon, 12 Nov 2012, kpneal@pobox.com wrote: >>>>>>> With your setup of 11 mirrors you have a good mixture of read >>>>>>> and write >>>>>>> performance, but you've compromised on the safety. The reason >>>>>>> that RAID 6 >>> ... >>> >>>>> By the way - on another note - what do you or other list members >>>>> think of the new Intel SSD DC S3700 as ZIL? Sounds very promising >>>>> when it's finally available. I spent a lot of time researching >>>>> ZILs today, and one thing I can say is that I have a major >>>>> headache now because of it!! >>>> ZIL is best served by battery backed up ram or something.. it's tiny >>>> and not a really good fit an SSD (maybe just a partition) L2ARC on >>>> the other hand is a really good use for SSD. >>> Well, since you brought the subject up :-) >>> >>> Do you have any recommendations for an NVRAM unit usable with Freebsd? >>> >> I've always had my eyes on something like this for ZIL but never had the >> need to explore it yet: http://www.ddrdrive.com/ >> Most recommendations I've seen have also been around mirrored 15krpm disks >> of some sort or even a cheaper battery-backed raid controller in front of >> decent disks. for zil it would just need a tiny bit of RAM anyway. >> >> > First, I wholeheartedly agree with some of the other posts calling for more > documentation and FAQs on ZFS; It's sorely lacking and there is a whole lot > of FUD and outdated information. > > I've tested several SSDs, I have a few DDRdrives, and I have a ZuesRAM (in > a TrueNAS appliance - and another on order that I can test with Solaris). > The DDRdrive is OK at best. The latency is quite good, but it's not very > high throughput mostly because it's PCIe 1x I believe. It can do lots of > very small writes but peaks out at about 130MB/sec no matter the blocksize. 
> If you're using GbE, you're set. If you're using LAGG or 10GbE, it's not > great for the price. I also just had a wicked evening a few days ago when > my building lost power for a few hours at night and UPSs failed. The UPS > that the DDRdrive was attached to died at the same time the one backing the > server and it broke my zpool quite severely - none of the typical recovery > commands worked at all (this was an OpenIndiana box) and the DDRdrive lost > 100% of it's configuration - the system thought it was a brand new drive > that didn't belong in the pool (it lost it's partition table, label, etc) . > It was a disappointing display by the DDRdrive. I know it's my own fault > for the power, but the thing is not a good idea if you aren't 100% certain > it's battery will outlast the system UPS/shutdown. The SSD that I've had by > far and away the best luck with that has a supercap is the Intel 320. I've > got a couple systems with 300gb Intel 320's, partitioned to use 15gb for > ZIL (and the rest empty). I've been using them for about a year now and > have been monitoring the wear. They will not exceed their expected write > lifetime until they've written about 1.2PB or more - several years at a > fairly heavy workload for me. It can also do 100-175MB/sec and ~10-20k IOPS > depending on the workload, often outpacing the DDRdrives. I'm going to get > my hands on the new Intel drives with supercaps coming out as soon as > they're available - they look quite promising. > > As for the ZuesRAM, it's exceedingly fast at the system level. I haven't > been able to test it thoroughly in my setup though - It seems FreeBSD has a > pretty severe performance issue with sync writes over NFS written to the > ZIL, at least in backing VMware. I have a very high end system from IX that > just can't do more than ~125MB/sec writes (just above 1GbE). It just > flat-lines. The ZuesRAM is certainly not bottle necking and doing o_sync dd > writes over NFS from other *nix sources I can write nearly 500MB/sec (at 4k > bs). My Solaris based systems do not hit the 125MB barrier that FreeBSD > seems to have with VMware. I'm using a 10GbE network for my VMware storage. I know someone mentionned the Fusion-IO drives as expensive but it would be good to get ix-systems to let us know how much the new cards are.. especially the small 'consumer' one. (io-FX).. I work there (Fusion-IO) but I have NO IDEA what the price is. I know ix have tested them in their TrueNAS boxes as L2ARC but I don't really know the numbers. We do have one (old) card in panther2 (I think) in the FreeBSD cluster so if anyne wants to try it out as a ZIL or L2ARC (an know someone with access) then there is that possibility. 
> _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > > From owner-freebsd-fs@FreeBSD.ORG Fri Nov 16 22:35:54 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 869271D1; Fri, 16 Nov 2012 22:35:54 +0000 (UTC) (envelope-from peter@rulingia.com) Received: from vps.rulingia.com (host-122-100-2-194.octopus.com.au [122.100.2.194]) by mx1.freebsd.org (Postfix) with ESMTP id E22468FC08; Fri, 16 Nov 2012 22:35:52 +0000 (UTC) Received: from server.rulingia.com (c220-239-241-202.belrs5.nsw.optusnet.com.au [220.239.241.202]) by vps.rulingia.com (8.14.5/8.14.5) with ESMTP id qAGMZo1o088965 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Sat, 17 Nov 2012 09:35:50 +1100 (EST) (envelope-from peter@rulingia.com) X-Bogosity: Ham, spamicity=0.000000 Received: from server.rulingia.com (localhost.rulingia.com [127.0.0.1]) by server.rulingia.com (8.14.5/8.14.5) with ESMTP id qAGMZiQM099830 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 17 Nov 2012 09:35:44 +1100 (EST) (envelope-from peter@server.rulingia.com) Received: (from peter@localhost) by server.rulingia.com (8.14.5/8.14.5/Submit) id qAGMZi4T099829; Sat, 17 Nov 2012 09:35:44 +1100 (EST) (envelope-from peter) Date: Sat, 17 Nov 2012 09:35:44 +1100 From: Peter Jeremy To: Andriy Gapon Subject: Re: zfs diff deadlock Message-ID: <20121116223544.GA99247@server.rulingia.com> References: <20121110223249.GB506@server.rulingia.com> <20121111072739.GA4814@server.rulingia.com> <509F5E0A.1020501@FreeBSD.org> <20121113202730.GA42238@server.rulingia.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="LQksG6bCIzRHxTLp" Content-Disposition: inline In-Reply-To: <20121113202730.GA42238@server.rulingia.com> X-PGP-Key: http://www.rulingia.com/keys/peter.pgp User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 16 Nov 2012 22:35:54 -0000 --LQksG6bCIzRHxTLp Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On 2012-Nov-14 07:27:30 +1100, Peter Jeremy wro= te: >On head, I get some normal differences terminated by: >Unable to determine path or stats for object 2128453 in tank/beckett/home@= 20120518: Invalid argument I've done some more digging and got the following. Does it make sense for an object that is a "ZFS plain file" to have another "ZFS plain file" as a parent? It doesn't sound right to me. root@FB10-64:~ # zdb -vvv tank/beckett/home@20120518 2128453 Dataset tank/beckett/home@20120518 [ZPL], ID 605, cr_txg 8379, 143G, 202641= 9 objects, rootbp DVA[0]=3D<0:266a0efa00:200> DVA[1]=3D<0:31b07fbc00:200> [= L0 DMU objset] fletcher4 lzjb LE contiguous unique double size=3D800L/200P = birth=3D8375L/8375P fill=3D2026419 cksum=3D1acdb1fbd9:93bf9c61e94:1b35c72eb= 8adb:389743898e4f79 Object lvl iblk dblk dsize lsize %full type 2128453 1 16K 1.50K 1.50K 1.50K 100.00 ZFS plain file 264 bonus ZFS znode dnode flags: USED_BYTES USERUSED_ACCOUNTED=20 dnode maxblkid: 0 path ??? 
uid 1000 gid 1000 atime Fri Mar 23 16:34:52 2012 mtime Sat Oct 22 16:13:42 2011 ctime Sun Oct 23 21:09:02 2011 crtime Sat Oct 22 16:13:42 2011 gen 2237174 mode 100444 size 1089 parent 2242171 links 1 pflags 40800000004 xattr 0 rdev 0x0000000000000000 root@FB10-64:~ # zdb -vvv tank/beckett/home@20120518 2242171 Dataset tank/beckett/home@20120518 [ZPL], ID 605, cr_txg 8379, 143G, 202641= 9 objects, rootbp DVA[0]=3D<0:266a0efa00:200> DVA[1]=3D<0:31b07fbc00:200> [= L0 DMU objset] fletcher4 lzjb LE contiguous unique double size=3D800L/200P = birth=3D8375L/8375P fill=3D2026419 cksum=3D1acdb1fbd9:93bf9c61e94:1b35c72eb= 8adb:389743898e4f79 Object lvl iblk dblk dsize lsize %full type 2242171 3 16K 128K 25.4M 25.5M 100.00 ZFS plain file 264 bonus ZFS znode dnode flags: USED_BYTES USERUSED_ACCOUNTED=20 dnode maxblkid: 203 path /jashank/Pictures/sch/pdm-a4-11/stereo-pair-2.png uid 1000 gid 1000 atime Fri Mar 23 16:41:53 2012 mtime Mon Oct 24 21:15:56 2011 ctime Mon Oct 24 21:15:56 2011 crtime Mon Oct 24 21:15:37 2011 gen 2286679 mode 100644 size 26625731 parent 7001490 links 1 pflags 40800000004 xattr 0 rdev 0x0000000000000000 root@FB10-64:~ # zdb -vvv tank/beckett/home@20120518 7001490 Dataset tank/beckett/home@20120518 [ZPL], ID 605, cr_txg 8379, 143G, 202641= 9 objects, rootbp DVA[0]=3D<0:266a0efa00:200> DVA[1]=3D<0:31b07fbc00:200> [= L0 DMU objset] fletcher4 lzjb LE contiguous unique double size=3D800L/200P = birth=3D8375L/8375P fill=3D2026419 cksum=3D1acdb1fbd9:93bf9c61e94:1b35c72eb= 8adb:389743898e4f79 Object lvl iblk dblk dsize lsize %full type 7001490 1 16K 512 1K 512 100.00 ZFS directory 264 bonus ZFS znode dnode flags: USED_BYTES USERUSED_ACCOUNTED=20 dnode maxblkid: 0 path /jashank/Pictures/sch/pdm-a4-11 uid 1000 gid 1000 atime Thu May 17 03:38:32 2012 mtime Mon Oct 24 21:15:37 2011 ctime Mon Oct 24 21:15:37 2011 crtime Fri Oct 14 22:17:44 2011 gen 2088407 mode 40755 size 6 parent 6370559 links 2 pflags 40800000144 xattr 0 rdev 0x0000000000000000 microzap: 512 bytes, 4 entries stereo-pair-2.png =3D 2242171 (type: Regular File) stereo-pair-2.xcf =3D 7002074 (type: Regular File) stereo-pair-1.xcf =3D 7001512 (type: Regular File) stereo-pair-1.png =3D 2241802 (type: Regular File) root@FB10-64:~ # --=20 Peter Jeremy --LQksG6bCIzRHxTLp Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iEYEARECAAYFAlCmv8AACgkQ/opHv/APuIcBTgCfXBEBllonPGIG+SOTIBy0sO1h v2oAnAgkmWS0D4frLTeFbIDD17TWfNYF =1nJW -----END PGP SIGNATURE----- --LQksG6bCIzRHxTLp-- From owner-freebsd-fs@FreeBSD.ORG Fri Nov 16 22:44:38 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id B71C0375 for ; Fri, 16 Nov 2012 22:44:38 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id CE8FD8FC08 for ; Fri, 16 Nov 2012 22:44:37 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id AAA19501; Sat, 17 Nov 2012 00:44:28 +0200 (EET) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1TZUeB-0001jB-MA; Sat, 17 Nov 2012 00:44:27 +0200 Message-ID: <50A6C1CA.50508@FreeBSD.org> Date: Sat, 17 Nov 2012 00:44:26 +0200 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:16.0) Gecko/20121030 
Thunderbird/16.0.2 MIME-Version: 1.0 To: Peter Jeremy Subject: Re: zfs diff deadlock References: <20121110223249.GB506@server.rulingia.com> <20121111072739.GA4814@server.rulingia.com> <509F5E0A.1020501@FreeBSD.org> <20121113202730.GA42238@server.rulingia.com> <20121116223544.GA99247@server.rulingia.com> In-Reply-To: <20121116223544.GA99247@server.rulingia.com> X-Enigmail-Version: 1.4.5 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 16 Nov 2012 22:44:39 -0000 on 17/11/2012 00:35 Peter Jeremy said the following: > On 2012-Nov-14 07:27:30 +1100, Peter Jeremy > wrote: >> On head, I get some normal differences terminated by: Unable to >> determine path or stats for object 2128453 in tank/beckett/home@20120518: >> Invalid argument > > I've done some more digging and got the following. Does it make sense for > an object that is a "ZFS plain file" to have another "ZFS plain file" as a > parent? It doesn't sound right to me. There was a bug that could result in a problem like this (because of corrupted znode attributes) after certain operations. What version of FreeBSD do you run there (branch+rev) ? > root@FB10-64:~ # zdb -vvv tank/beckett/home@20120518 2128453 Dataset > tank/beckett/home@20120518 [ZPL], ID 605, cr_txg 8379, 143G, 2026419 > objects, rootbp DVA[0]=<0:266a0efa00:200> DVA[1]=<0:31b07fbc00:200> [L0 > DMU objset] fletcher4 lzjb LE contiguous unique double size=800L/200P > birth=8375L/8375P fill=2026419 > cksum=1acdb1fbd9:93bf9c61e94:1b35c72eb8adb:389743898e4f79 > > Object lvl iblk dblk dsize lsize %full type 2128453 1 16K > 1.50K 1.50K 1.50K 100.00 ZFS plain file 264 bonus ZFS znode dnode > flags: USED_BYTES USERUSED_ACCOUNTED dnode maxblkid: 0 path > ??? 
uid 1000 gid 1000 atime Fri Mar 23 16:34:52 > 2012 mtime Sat Oct 22 16:13:42 2011 ctime Sun Oct 23 21:09:02 2011 > crtime Sat Oct 22 16:13:42 2011 gen 2237174 mode 100444 size 1089 > parent 2242171 links 1 pflags 40800000004 xattr 0 rdev > 0x0000000000000000 > > root@FB10-64:~ # zdb -vvv tank/beckett/home@20120518 2242171 Dataset > tank/beckett/home@20120518 [ZPL], ID 605, cr_txg 8379, 143G, 2026419 > objects, rootbp DVA[0]=<0:266a0efa00:200> DVA[1]=<0:31b07fbc00:200> [L0 > DMU objset] fletcher4 lzjb LE contiguous unique double size=800L/200P > birth=8375L/8375P fill=2026419 > cksum=1acdb1fbd9:93bf9c61e94:1b35c72eb8adb:389743898e4f79 > > Object lvl iblk dblk dsize lsize %full type 2242171 3 16K > 128K 25.4M 25.5M 100.00 ZFS plain file 264 bonus ZFS znode dnode > flags: USED_BYTES USERUSED_ACCOUNTED dnode maxblkid: 203 path > /jashank/Pictures/sch/pdm-a4-11/stereo-pair-2.png uid 1000 gid 1000 > atime Fri Mar 23 16:41:53 2012 mtime Mon Oct 24 21:15:56 2011 ctime > Mon Oct 24 21:15:56 2011 crtime Mon Oct 24 21:15:37 2011 gen 2286679 mode > 100644 size 26625731 parent 7001490 links 1 pflags 40800000004 xattr > 0 rdev 0x0000000000000000 > > root@FB10-64:~ # zdb -vvv tank/beckett/home@20120518 7001490 Dataset > tank/beckett/home@20120518 [ZPL], ID 605, cr_txg 8379, 143G, 2026419 > objects, rootbp DVA[0]=<0:266a0efa00:200> DVA[1]=<0:31b07fbc00:200> [L0 > DMU objset] fletcher4 lzjb LE contiguous unique double size=800L/200P > birth=8375L/8375P fill=2026419 > cksum=1acdb1fbd9:93bf9c61e94:1b35c72eb8adb:389743898e4f79 > > Object lvl iblk dblk dsize lsize %full type 7001490 1 16K > 512 1K 512 100.00 ZFS directory 264 bonus ZFS znode dnode > flags: USED_BYTES USERUSED_ACCOUNTED dnode maxblkid: 0 path > /jashank/Pictures/sch/pdm-a4-11 uid 1000 gid 1000 atime Thu May > 17 03:38:32 2012 mtime Mon Oct 24 21:15:37 2011 ctime Mon Oct 24 > 21:15:37 2011 crtime Fri Oct 14 22:17:44 2011 gen 2088407 mode 40755 > size 6 parent 6370559 links 2 pflags 40800000144 xattr 0 rdev > 0x0000000000000000 microzap: 512 bytes, 4 entries > > stereo-pair-2.png = 2242171 (type: Regular File) stereo-pair-2.xcf = > 7002074 (type: Regular File) stereo-pair-1.xcf = 7001512 (type: Regular > File) stereo-pair-1.png = 2241802 (type: Regular File) > > root@FB10-64:~ # > -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Fri Nov 16 22:50:17 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id B74E74EC; Fri, 16 Nov 2012 22:50:17 +0000 (UTC) (envelope-from zbeeble@gmail.com) Received: from mail-la0-f54.google.com (mail-la0-f54.google.com [209.85.215.54]) by mx1.freebsd.org (Postfix) with ESMTP id F02498FC14; Fri, 16 Nov 2012 22:50:16 +0000 (UTC) Received: by mail-la0-f54.google.com with SMTP id j13so3105310lah.13 for ; Fri, 16 Nov 2012 14:50:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=T0/dvfTMr01kye7SE/Ivn0AdAbmc3KkuLsZ/9zqHzdo=; b=ns9RxgWLYyPT9LBG7m+HNe6sdqHPFg6LJrX/q3lVIZGYyyblMqURRkyXLAFeXcy4Fy TLiZZhOSGnBQlU38i099dhuO4PmBHRmJYzXP7auy1iYgP5x14hIGDw27W7pPiQ+IYMfV uhsE0iFesJUsxh+fx6VVgxfULqd3vq0LQWrrkHVxqfNEG2jhe47ZKuehvcMG0W9dStdI sdjIm5naThqnBPO2c4ligsRYDMBma7210OJJBTC0nHJbKeNq93Nydz9ijkMu3YLglVl6 sVKvQ3DtjiDhYi77shBwzs8ce2DtIK34wpWkQLwqKpp+vB86C6io5ZewhwGVHy5h1afa PkNw== MIME-Version: 1.0 Received: by 10.152.106.110 with SMTP id gt14mr5573158lab.1.1353106214812; Fri, 16 Nov 2012 14:50:14 
-0800 (PST) Received: by 10.112.49.138 with HTTP; Fri, 16 Nov 2012 14:50:14 -0800 (PST) In-Reply-To: <16B803FB-0964-4237-8F25-291470E7EFB5@gmail.com> References: <50A130B7.4080604@cse.yorku.ca> <20121113043409.GA70601@neutralgood.org> <50A2B95D.4000400@cse.yorku.ca> <50A2F804.3010009@freebsd.org> <20121115001840.GA27399@FreeBSD.org> <20121115102704.6657ee52@suse3> <20121116091747.2c1bfc55@suse3> <16B803FB-0964-4237-8F25-291470E7EFB5@gmail.com> Date: Fri, 16 Nov 2012 17:50:14 -0500 Message-ID: Subject: Re: RHEL to FreeBSD file server From: Zaphod Beeblebrox To: Steven Schlansker Content-Type: text/plain; charset=ISO-8859-1 Cc: freebsd-fs@freebsd.org, John X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 16 Nov 2012 22:50:17 -0000 On Fri, Nov 16, 2012 at 2:38 PM, Steven Schlansker wrote: > > On Nov 16, 2012, at 11:24 AM, Zaphod Beeblebrox wrote: > >> to connect 24 (or 30) drives. If you're using "green" 2T's, the speed >> of 4 disks on one channel is about half of the speed of >> 1-disk-per-channel. >> >> >> The reason I say all this... is that this config runs about $3500-ish >> here in Canada (where green 2T's are ~$109). The ZeusRAM drive up >> there is 2/3 of that. > > Curious -- have you been running this setup for any length of time? There's > a fair number of horror stories about the "green" drives in particular. > > The power management is very aggressive about spin down, causing many unneeded > power on/off cycles, dramatically reducing lifespan in a RAID configuration. > > Additionally, supposedly the error recovery is inappropriate leading to drive > failure events. (I believe the feature is known as TLER, time-limited error recovery) > > Have you run into this? I do see the power down events. If they array is quiet for some time, it takes 10-ish seconds to respond... but then it's supposedly saving power. I've been running about a dozen of these setups for myself and clients... the longest has been running since the early days of ZFS on FreeBSD (my home array). I find each array looses about 1 drive a year. If occasional errors pop up, they almost always indicate that a drive _will_ fail. Smart seems rather universally dumb on this issue... at least for the green drives. Due to the limitation of the port multipliers, increasing the speed of this setup is often quite expensive (requiring RAID cards and 8x slots and whatnot). Without changing the drives, 24 ports of SATA cost roughly $1200, last I looked at it. While the majority of these are "backup" or "archive" file servers, some do run production loads. 
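[As a rough sketch of how a failing member of one of these arrays typically
gets spotted and swapped out; the pool and device names are assumptions for
illustration, and smartctl comes from the sysutils/smartmontools port:

  zpool status -x               # lists only pools that currently have problems
  zpool status -v tank          # per-disk read/write/checksum error counters
  smartctl -A /dev/ada5         # reallocated/pending sector counts on the suspect drive
  zpool replace tank ada5 ada9  # resilver onto the replacement disk
  zpool status tank             # shows resilver progress

A drive that keeps accumulating checksum errors or pending sectors is usually
on its way out, which matches the pattern described above.]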
From owner-freebsd-fs@FreeBSD.ORG Sat Nov 17 00:29:35 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id C7E312E6 for ; Sat, 17 Nov 2012 00:29:35 +0000 (UTC) (envelope-from peter@rulingia.com) Received: from vps.rulingia.com (host-122-100-2-194.octopus.com.au [122.100.2.194]) by mx1.freebsd.org (Postfix) with ESMTP id 358E68FC08 for ; Sat, 17 Nov 2012 00:29:34 +0000 (UTC) Received: from server.rulingia.com (c220-239-241-202.belrs5.nsw.optusnet.com.au [220.239.241.202]) by vps.rulingia.com (8.14.5/8.14.5) with ESMTP id qAH0TVFg089230 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK) for ; Sat, 17 Nov 2012 11:29:31 +1100 (EST) (envelope-from peter@rulingia.com) X-Bogosity: Ham, spamicity=0.000000 Received: from server.rulingia.com (localhost.rulingia.com [127.0.0.1]) by server.rulingia.com (8.14.5/8.14.5) with ESMTP id qAH0TPJJ007881 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Sat, 17 Nov 2012 11:29:26 +1100 (EST) (envelope-from peter@server.rulingia.com) Received: (from peter@localhost) by server.rulingia.com (8.14.5/8.14.5/Submit) id qAH0TP8b007880 for freebsd-fs@FreeBSD.org; Sat, 17 Nov 2012 11:29:25 +1100 (EST) (envelope-from peter) Date: Sat, 17 Nov 2012 11:29:25 +1100 From: Peter Jeremy To: freebsd-fs@FreeBSD.org Subject: Re: zfs diff deadlock Message-ID: <20121117002925.GE38823@server.rulingia.com> References: <20121110223249.GB506@server.rulingia.com> <20121111072739.GA4814@server.rulingia.com> <509F5E0A.1020501@FreeBSD.org> <20121113202730.GA42238@server.rulingia.com> <20121116223544.GA99247@server.rulingia.com> <50A6C1CA.50508@FreeBSD.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="VywGB/WGlW4DM4P8" Content-Disposition: inline In-Reply-To: <50A6C1CA.50508@FreeBSD.org> X-PGP-Key: http://www.rulingia.com/keys/peter.pgp User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 17 Nov 2012 00:29:36 -0000 --VywGB/WGlW4DM4P8 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable [for the archives - I've written a followup to zfs-discuss] On 2012-Nov-17 00:44:26 +0200, Andriy Gapon wrote: >on 17/11/2012 00:35 Peter Jeremy said the following: >> On 2012-Nov-14 07:27:30 +1100, Peter Jeremy = =20 >> wrote: >>> On head, I get some normal differences terminated by: Unable to >>> determine path or stats for object 2128453 in tank/beckett/home@2012051= 8: >>> Invalid argument >>=20 >> I've done some more digging and got the following. Does it make sense f= or=20 >> an object that is a "ZFS plain file" to have another "ZFS plain file" as= a=20 >> parent? It doesn't sound right to me. > >There was a bug that could result in a problem like this (because of corru= pted >znode attributes) after certain operations. >What version of FreeBSD do you run there (branch+rev) ? This is head r242707 but the original pool is quite old and has been moved around a fair bit using send|recv. I've never intentionally used extended attributes but the pool has been exported to Windows XP via Samba and OS-X via NFSv3. If anyone has any ideas for repairing the pool, I'd be interested. 
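[Not a repair as such, but a sketch of how to confirm whether the problem is
confined to znode attributes rather than on-disk block corruption, using the
pool name from the zdb output above. A scrub repairs blocks with bad checksums
from redundancy, but it will not fix a logically wrong parent pointer whose
checksum is fine:

  zpool scrub tank          # re-read everything, repair from redundancy where possible
  zpool status -v tank      # permanent errors, if any, are listed after the scrub
  zdb -cc tank              # offline traversal verifying metadata and data checksums (slow)]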
--=20 Peter Jeremy --VywGB/WGlW4DM4P8 Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iEYEARECAAYFAlCm2mUACgkQ/opHv/APuIfWlACfYVPRz3kxtwFzlNSZe+FxXnjh bSYAoKIi0QmdrcqHsIOdjV1KQANEerHH =CitQ -----END PGP SIGNATURE----- --VywGB/WGlW4DM4P8-- From owner-freebsd-fs@FreeBSD.ORG Sat Nov 17 00:36:28 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id A39563A3 for ; Sat, 17 Nov 2012 00:36:28 +0000 (UTC) (envelope-from frimik@gmail.com) Received: from mail-da0-f54.google.com (mail-da0-f54.google.com [209.85.210.54]) by mx1.freebsd.org (Postfix) with ESMTP id 64E2F8FC15 for ; Sat, 17 Nov 2012 00:36:28 +0000 (UTC) Received: by mail-da0-f54.google.com with SMTP id z9so1453594dad.13 for ; Fri, 16 Nov 2012 16:36:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=5hbME4oUbo6iiv16lzJ1ic8x7D9WuqWBQzyahdw7xwo=; b=MFZnSV5OYRkcBjLIAMhcnbjgfasdf0oveosUGnEUOdhWUZh0N9Duew4B+0SgLJOOrL 2c/IXVZaej4a855+vph3sJgNH7is5ehKYlSsq4WYoIzEqtsQGViAVxdE05iDgSWameYl iyQVK/YCrhgBLuBnJSnbRhqJISR69iWTIFgnplXjNigM4JP5K0ocSi9Q3LFExwb0KyH/ dA//8K2EHHAOQLqeQ9nO2I509150vhudWGnh52QA4oaxTFHt5vO19qMdbxz2oFalDBrp tzjE/SS4eXHpBN9dkUSUWYl7PZztCs0JLfsVsObsUuog5BDDSinAdBHopCvDS1vj+Iho F5bw== MIME-Version: 1.0 Received: by 10.68.239.9 with SMTP id vo9mr20240020pbc.83.1353112588124; Fri, 16 Nov 2012 16:36:28 -0800 (PST) Received: by 10.66.121.233 with HTTP; Fri, 16 Nov 2012 16:36:27 -0800 (PST) Received: by 10.66.121.233 with HTTP; Fri, 16 Nov 2012 16:36:27 -0800 (PST) In-Reply-To: <16B803FB-0964-4237-8F25-291470E7EFB5@gmail.com> References: <50A130B7.4080604@cse.yorku.ca> <20121113043409.GA70601@neutralgood.org> <50A2B95D.4000400@cse.yorku.ca> <50A2F804.3010009@freebsd.org> <20121115001840.GA27399@FreeBSD.org> <20121115102704.6657ee52@suse3> <20121116091747.2c1bfc55@suse3> <16B803FB-0964-4237-8F25-291470E7EFB5@gmail.com> Date: Sat, 17 Nov 2012 01:36:27 +0100 Message-ID: Subject: Re: RHEL to FreeBSD file server From: Mikael Fridh To: Steven Schlansker , freebsd-fs@freebsd.org Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 17 Nov 2012 00:36:28 -0000 On Nov 16, 2012 8:38 PM, "Steven Schlansker" wrote: > > > On Nov 16, 2012, at 11:24 AM, Zaphod Beeblebrox wrote= : > > > On Fri, Nov 16, 2012 at 3:17 AM, Rainer Duffner wrote: > >> Am Fri, 16 Nov 2012 01:03:19 -0500 > >> schrieb Zaphod Beeblebrox : > > > >>> The only price I see is $2449 on Dell's website. Ouch. For 8G. > > > >> I saw it for 1800=80 somewhere. > >> FusionIO is 10k for 320G. > >> > >> But I've never seen FusionIO used in a Fileserver - probably doesn't > >> make sense, price-wise. > >> > >> IIRC, the OP wanted to build a 16T fileserver. So he will have to spen= d > >> some cash anyway. I think it's a safe assumption that Nexenta and its > >> partners have spent quite some time evaluating the various options. > > > > go on > > to connect 24 (or 30) drives. If you're using "green" 2T's, the speed > > of 4 disks on one channel is about half of the speed of > > 1-disk-per-channel. 
> > > > > > The reason I say all this... is that this config runs about $3500-ish > > here in Canada (where green 2T's are ~$109). The ZeusRAM drive up > > there is 2/3 of that. > > Curious -- have you been running this setup for any length of time? There's > a fair number of horror stories about the "green" drives in particular. > > The power management is very aggressive about spin down, causing many unneeded > power on/off cycles, dramatically reducing lifespan in a RAID configuration. > > Additionally, supposedly the error recovery is inappropriate leading to drive > failure events. (I believe the feature is known as TLER, time-limited error recovery) For some drives you can fix the load cycle count issues by "flashing" (reconfiguring) the drives. My solution was Pxe booting FreeDOS and autoexecing the Idle3 utility on all drives... But I was lucky at the time to have a supported non-AHCI controller. And they were WD 2TB drives... They worked great ever since. A more recent post detailing some of this very well: http://koitsu.wordpress.com/2012/05/30/wd30ezrx-and-aggressive-head-parking= /_______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Sat Nov 17 02:15:56 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 4E50D972 for ; Sat, 17 Nov 2012 02:15:56 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 0B2CC8FC0C for ; Sat, 17 Nov 2012 02:15:55 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AigIANHyplCDaFvO/2dsb2JhbABEhiC+DgeCIyWBCwINGQJfiCALnUGOU5J8gSKPC4ETA4hajSKQQ4MNgXs X-IronPort-AV: E=Sophos;i="4.83,268,1352091600"; d="scan'208";a="575603" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 16 Nov 2012 21:15:49 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 2178CB3F44 for ; Fri, 16 Nov 2012 21:15:49 -0500 (EST) Date: Fri, 16 Nov 2012 21:15:49 -0500 (EST) From: Rick Macklem To: "freebsd-fs@freebsd.org" Message-ID: <1597743449.486373.1353118549122.JavaMail.root@erie.cs.uoguelph.ca> Subject: RFC: moving NFSv4.1 client from projects to head MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 17 Nov 2012 02:15:56 -0000 Hi, I've been working on NFSv4.1 client support for FreeBSD for some time now and known issues from testing at a Bakeathon last June have been resolved. The patch is rather big, but I believe it should not affect the client unless the new mount options: minorversion=1,pnfs are used for an nfsv4 mount. Since I don't believe that the new NFS client will be affected unless these new mount options are used, I think it could go into head now. 
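For concreteness, such a mount would look roughly like the following once the
patch is merged, assuming the options behave as described (the server name and
export path are placeholders):

  mount -t nfs -o nfsv4,minorversion=1,pnfs nfs41-server:/export /mnt

or the equivalent /etc/fstab line:

  nfs41-server:/export  /mnt  nfs  rw,nfsv4,minorversion=1,pnfs  0  0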
On the other hand, there are few NFSv4.1 servers currently available, so it might not yet be widely useful. (See below for slides w.r.t. server availability.) How do folks feel about doing this in early December? Since it doesn't change any KBIs, it could also be MFC'd to stable/9. Would MFC'ing it to stable/9 make sense? For those interested in testing and/or reviewing it, the code is currently in: base/projects/nfsv4.1-client (It is purely a kernel patch.) Also, the current state of NFSv4.1 servers is roughly: http://www.pnfs.com/docs/LISA-11-pNFS-BoF-final.pdf Thanks in advance for any comments, rick From owner-freebsd-fs@FreeBSD.ORG Sat Nov 17 14:04:53 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 4F99359E for ; Sat, 17 Nov 2012 14:04:53 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) by mx1.freebsd.org (Postfix) with ESMTP id ABF5F8FC08 for ; Sat, 17 Nov 2012 14:04:52 +0000 (UTC) Received: from tom.home (localhost [127.0.0.1]) by kib.kiev.ua (8.14.5/8.14.5) with ESMTP id qAHE4mPs027435; Sat, 17 Nov 2012 16:04:48 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.7.1 kib.kiev.ua qAHE4mPs027435 Received: (from kostik@localhost) by tom.home (8.14.5/8.14.5/Submit) id qAHE4m2V027434; Sat, 17 Nov 2012 16:04:48 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 17 Nov 2012 16:04:48 +0200 From: Konstantin Belousov To: Rick Macklem Subject: Re: RFC: moving NFSv4.1 client from projects to head Message-ID: <20121117140448.GP73505@kib.kiev.ua> References: <1597743449.486373.1353118549122.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="5ZmWxaBoDEwVVFom" Content-Disposition: inline In-Reply-To: <1597743449.486373.1353118549122.JavaMail.root@erie.cs.uoguelph.ca> User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Status: No, score=0.2 required=5.0 tests=ALL_TRUSTED, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home Cc: "freebsd-fs@freebsd.org" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 17 Nov 2012 14:04:53 -0000 --5ZmWxaBoDEwVVFom Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Nov 16, 2012 at 09:15:49PM -0500, Rick Macklem wrote: > Hi, >=20 > I've been working on NFSv4.1 client support for FreeBSD > for some time now and known issues from testing at a > Bakeathon last June have been resolved. The patch is > rather big, but I believe it should not affect the > client unless the new mount options: > minorversion=3D1,pnfs > are used for an nfsv4 mount. >=20 > Since I don't believe that the new NFS client will be > affected unless these new mount options are used, I think > it could go into head now. On the other hand, there are few > NFSv4.1 servers currently available, so it might not yet > be widely useful. (See below for slides w.r.t. server availability.) >=20 > How do folks feel about doing this in early December? 
>=20 > Since it doesn't change any KBIs, it could also be MFC'd > to stable/9. Would MFC'ing it to stable/9 make sense? >=20 > For those interested in testing and/or reviewing it, > the code is currently in: > base/projects/nfsv4.1-client > (It is purely a kernel patch.) > Also, the current state of NFSv4.1 servers is roughly: > http://www.pnfs.com/docs/LISA-11-pNFS-BoF-final.pdf >=20 > Thanks in advance for any comments, rick IMO, the earlier the change that you feel mature enough, hits the HEAD in the HEAD x.0 cycle, the better. That said, would you mind to put a diff somewhere to ease the review and testing ? --5ZmWxaBoDEwVVFom Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iEYEARECAAYFAlCnmX8ACgkQC3+MBN1Mb4hI3QCfTfvaEbuUCJHSTnIng4k/cQQF 4zwAoKk4eM9QhrbL+KW4rGT+lfC78Ihq =c9XC -----END PGP SIGNATURE----- --5ZmWxaBoDEwVVFom-- From owner-freebsd-fs@FreeBSD.ORG Sat Nov 17 18:48:22 2012 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 5E28F7C3; Sat, 17 Nov 2012 18:48:22 +0000 (UTC) (envelope-from smh@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 2689C8FC0C; Sat, 17 Nov 2012 18:48:22 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id qAHImMav015873; Sat, 17 Nov 2012 18:48:22 GMT (envelope-from smh@freefall.freebsd.org) Received: (from smh@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id qAHImLJU015869; Sat, 17 Nov 2012 18:48:21 GMT (envelope-from smh) Date: Sat, 17 Nov 2012 18:48:21 GMT Message-Id: <201211171848.qAHImLJU015869@freefall.freebsd.org> To: smh@FreeBSD.org, freebsd-fs@FreeBSD.org, smh@FreeBSD.org From: smh@FreeBSD.org Subject: Re: kern/173254: [zfs] [patch] Upgrade requests used in ZFS trim map based on ashift X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 17 Nov 2012 18:48:22 -0000 Synopsis: [zfs] [patch] Upgrade requests used in ZFS trim map based on ashift Responsible-Changed-From-To: freebsd-fs->smh Responsible-Changed-By: smh Responsible-Changed-When: Sat Nov 17 18:47:50 UTC 2012 Responsible-Changed-Why: I'll take it. 
http://www.freebsd.org/cgi/query-pr.cgi?pr=173254 From owner-freebsd-fs@FreeBSD.ORG Sat Nov 17 22:32:49 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 465797B9; Sat, 17 Nov 2012 22:32:49 +0000 (UTC) (envelope-from zbeeble@gmail.com) Received: from mail-lb0-f182.google.com (mail-lb0-f182.google.com [209.85.217.182]) by mx1.freebsd.org (Postfix) with ESMTP id 534648FC08; Sat, 17 Nov 2012 22:32:48 +0000 (UTC) Received: by mail-lb0-f182.google.com with SMTP id go10so1015089lbb.13 for ; Sat, 17 Nov 2012 14:32:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=1U7bqcNKpRacRZNLh2nUOIrmeOQ4DEB+vByKc0f41Do=; b=DKFPjdAhc1Ft9KJIe/LyXjAciPcSLu2VAu84KJAaytQyEmQPPajagbr3LyU1/xE2kd TReXE5FZhPfDJDGjdX1ujaNLagYPjdBNOmVb9UGLh6F5QL36Vdhhmn5lCaOUII5/5PDv qdNg2tDXSbu+/+H4Yv6/C7A6eLGSBZynPCirOIe0cDniBDVQyqoRCo5tO/TCU8WdZsDR 2nxHW7/pKXjxHgl45Bq7Cp8V5km/eH6n5RfNANcWNHvIFmNgIz5C/eiC3SEhsQJHjD83 OiO5aBch75wVGH3BLM2SWfk2WaefF6C6zuJUwWHINk5PmIsdhZS3cHRaK/eDcnRhBj4r JmmA== MIME-Version: 1.0 Received: by 10.112.30.195 with SMTP id u3mr3328589lbh.37.1353191567036; Sat, 17 Nov 2012 14:32:47 -0800 (PST) Received: by 10.112.49.138 with HTTP; Sat, 17 Nov 2012 14:32:46 -0800 (PST) Date: Sat, 17 Nov 2012 17:32:46 -0500 Message-ID: Subject: Jumbo Packet fail. From: Zaphod Beeblebrox To: freebsd-hackers@freebsd.org, freebsd-fs , FreeBSD Mailing Lists Content-Type: text/plain; charset=ISO-8859-1 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 17 Nov 2012 22:32:49 -0000 I recently started using an iSCSI disk on my ZFS array seriously from a windows 7 host on the network. The performance is acceptable, but I was led to believe that using Jumbo packets is a win here. My win7 motherboard adapter did not support jumbo frames, so I got one that did... configured it, etc. Just in case anyone cares, the motherboard had an 82567V-2 (does not support jumbo frames) and I added in an intel 82574L based card. Similarly, I configured em0 on my FreeBSD host to have an MTU of 9014 bytes (I also tried 9000). The hardware on the FreeBSD 9.1RC2 side is: em0: port 0xdc00-0xdc1f mem 0xfcfe0000-0xfcffffff,0xfcfc0000-0xfcfdffff irq 16 at device 0.0 on pci3 pciconf -lv identifies the chipset as 82572EI Now... my problem is that the windows machine correctly advertises an MSS of 8960 bytes in it's SYN packet while FreeBSD advertises 1460 in the syn-ack. [1:42:342]root@vr:/usr/local/etc/istgt> ifconfig em0 em0: flags=8843 metric 0 mtu 9014 options=4019b ether 00:15:17:0d:04:a8 inet 66.96.20.52 netmask 0xffffffe0 broadcast 66.96.20.63 inet6 fe80::215:17ff:fe0d:4a8%em0 prefixlen 64 scopeid 0x5 inet6 2001:1928:1::52 prefixlen 64 inet 192.168.221.2 netmask 0xffffff00 broadcast 192.168.221.255 nd6 options=21 media: Ethernet autoselect (1000baseT ) status: active I have tested this with both ipv4 and ipv6 connections between the win7 host and the FreeBSD server. win7 always requests the larger mss, and FreeBSD the smaller. 
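[For what it's worth, a minimal sketch of making the jumbo MTU persistent and
verifying that large frames actually pass end to end. The rc.conf line reuses
the static address shown above but is an assumption about this particular
setup, and the peer address 192.168.221.10 is just a placeholder:

  # /etc/rc.conf
  ifconfig_em0="inet 192.168.221.2 netmask 255.255.255.0 mtu 9000"

  # 8972 = 9000 minus the 20-byte IP and 8-byte ICMP headers; -D sets don't-fragment
  ping -D -s 8972 192.168.221.10

If the ping with the don't-fragment bit fails while a default-sized ping works,
something in the path (often the switch) is still limited to 1500-byte frames.]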
From owner-freebsd-fs@FreeBSD.ORG Sat Nov 17 22:59:00 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 04DE03B5 for ; Sat, 17 Nov 2012 22:59:00 +0000 (UTC) (envelope-from mcdouga9@egr.msu.edu) Received: from mail.egr.msu.edu (hill.egr.msu.edu [35.9.37.162]) by mx1.freebsd.org (Postfix) with ESMTP id BF10B8FC0C for ; Sat, 17 Nov 2012 22:58:59 +0000 (UTC) Received: from hill (localhost [127.0.0.1]) by mail.egr.msu.edu (Postfix) with ESMTP id C80072FB12; Sat, 17 Nov 2012 17:58:52 -0500 (EST) X-Virus-Scanned: amavisd-new at egr.msu.edu Received: from mail.egr.msu.edu ([127.0.0.1]) by hill (hill.egr.msu.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id z_igZraJvWMh; Sat, 17 Nov 2012 17:58:52 -0500 (EST) Received: from daemon.localdomain (daemon.egr.msu.edu [35.9.44.65]) by mail.egr.msu.edu (Postfix) with ESMTP id A7A882FB0B; Sat, 17 Nov 2012 17:58:51 -0500 (EST) Received: by daemon.localdomain (Postfix, from userid 21281) id 9A1FB1815F; Sat, 17 Nov 2012 17:58:51 -0500 (EST) Date: Sat, 17 Nov 2012 17:58:51 -0500 From: Adam McDougall To: kpneal@pobox.com Subject: Re: SSD recommendations for ZFS cache/log Message-ID: <20121117225851.GJ1462@egr.msu.edu> References: <57ac1f$gf3rkl@ipmail05.adl6.internode.on.net> <50A31D48.3000700@shatow.net> <20121116044055.GA47859@neutralgood.org> <50A64694.5030001@egr.msu.edu> <20121117181803.GA26421@neutralgood.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20121117181803.GA26421@neutralgood.org> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 17 Nov 2012 22:59:00 -0000 On Sat, Nov 17, 2012 at 01:18:03PM -0500, kpneal@pobox.com wrote: On Fri, Nov 16, 2012 at 08:58:44AM -0500, Adam McDougall wrote: > On 11/16/12 00:41, Zaphod Beeblebrox wrote: > > On Thu, Nov 15, 2012 at 11:40 PM, wrote: > >>> + > >>> + The answer very much depends on the expected workload. > >>> + Deduplication takes up a signifigent amount of RAM and CPU > >>> + time and may slow down read and write disk access times. > >>> + Unless one is storing data that is very heavily > >>> + duplicated (such as virtual machine images, or user > >>> + backups) it is likely that deduplication will do more harm > >>> + than good. Another consideration is the inability to > >> > >> I advise against advice that is this firm. The statement that it will "do > >> more harm than good" really should be omitted. And I'm not sure it is > >> fair to say it takes a bunch of CPU. Lots of memory, yes, but lots of > >> CPU isn't so clear. > > > > I experimented by enabling DEDUP on a RAID-Z1 pool containing 4x 2T > > green drives. The system had 8G of RAM and was otherwise quiet. I > > copied a dataset of about 1T of random stuff onto the array and then > > copied the same set of data onto the array a second time. The end > > result is a dedup ration of almost 2.0 and only around 1T of disk > > used. > > > > As I recall (and it's been 6-ish months since I did this), the 2nd > > write became largely CPU bound with little disk activity. As far as I > > could tell, the dedup table never thrashed on the disk ... 
and that > > most of the disk activity seemed to be creating the directory tree or > > reading the disk to do the verify step of dedup. Well, yes, it was CPU bound because it wasn't disk bound. All filesystem activity is going to be either disk bound, CPU bound, or waiting for more filesystem requests (eg, network bound or similar). Also note that the original text above said that dedup only made sense with heavily duplicated data. That's exactly the case you tested. So your test says nothing about the case where there isn't much duplicated data. The phrase I advised against was referring to the case you didn't test. > Now try deleting some data and the fun begins :) You've had a bad experience? I'd love to hear about it. -- Kevin P. Neal http://www.pobox.com/~kpn/ "Oh, I've heard that paradox a couple of times, but there's something about a cat dying and I hate to think of such things." - Dr. Donald Knuth speaking of Schrodinger's cat, December 8, 1999, MIT Deleting data takes significantly longer than usual because it has to un-dedupe the data, which takes longer than most people expect, and ties up the removal process until it is done. During that time, the CPU is pegged pretty hard and the disks are active but not doing much. I haven't had the opportunity to try this with a large memory system or one with snappy l2arc to see if it is better. This can spiral in at least two ways. For one, the average system admin will not expect it to take so long to delete files and think something is wrong. If this happens in small amounts, they may decide to disable dedupe if they realize that is the cause. But, since the data is already deduped, they are stuck with that behavior until the data is copied fresh or deleted. Doing THAT can take an enormous amount of time, progressing at a slow pace, and has a chance of leading to a deadlock (not making this up). If a deadlock occurs while they are trying to solve this issue, tempers flare even further, especially since the next reboot will continue thrashing the disks where it left off but perhaps before the admin has a chance to log in and figure out what is happening, which isn't obvious. Worse yet, if a lot of data has been deleted, another deadlock may occur. Rinse, Repeat, swear at ZFS, perhaps vow that dedupe is "not ready" and a quiet threat. There have been several people on the FreeBSD mailing lists that have had these symptoms. Some of them added ram to get past it. Some found a way to measure progress and kept letting it churn/deadlock/reboot until things came back to normal. I think in -current there is a new zfs feature allowing for background deletion that may ease this issue, and someone reported success.
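[Since the question of when dedup pays off keeps coming up in this thread, a
short sketch of how to estimate it before enabling anything; the pool and
dataset names are assumptions. zdb can simulate deduplication against the data
already in a pool and report the table it would need and the ratio it would
achieve:

  zdb -S tank                     # simulated DDT histogram and overall dedup ratio
  zpool status -D tank            # on a pool already deduping: real DDT entry counts and sizes
  zfs set dedup=on tank/vmimages  # enable only on datasets that really contain duplicates

Each DDT entry costs on the order of a few hundred bytes of RAM (or L2ARC), so
multiplying the entry count reported by zdb -S by that figure gives a rough idea
of whether the table will fit in memory, which is exactly what determines
whether deletes behave as described above.]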