From: Edward Tomasz Napierała
Subject: Re: GSoC proposition: multiplatform UFS2 driver
Date: Fri, 14 Mar 2014 20:18:50 +0100
To: Richard Yao
Cc: freebsd-hackers@FreeBSD.org, RW, Ian Lepore
Message-Id: <9DA009CD-0629-4402-A2A0-0A6BDE1E86FD@FreeBSD.org>
In-Reply-To: <53235014.1040003@gentoo.org>

On 14 Mar 2014, at 19:53, Richard Yao wrote:

> On 03/14/2014 02:36 PM, Edward Tomasz Napierała wrote:
>> On 14 Mar 2014, at 16:39, Ian Lepore wrote:
>>> On Fri, 2014-03-14 at 15:27 +0000, RW wrote:
>>>> On Thu, 13 Mar 2014 18:22:10 -0800
>>>> Dieter BSD wrote:
>>>>
>>>>> Julio writes,
>>>>>> That being said, I do not like the idea of using NetBSD's UFS2
>>>>>> code. It lacks Soft-Updates, which I consider to make FreeBSD UFS2
>>>>>> second only to ZFS in desirability.
>>>>>
>>>>> FFS has been in production use for decades. ZFS is still wet behind
>>>>> the ears. Older versions of NetBSD have soft updates, and they work
>>>>> fine for me. I believe that NetBSD 6.0 is the first release without
>>>>> soft updates.
>>>>> They claimed that soft updates was "too difficult" to
>>>>> maintain. I find that soft updates are *essential* for data
>>>>> integrity (I don't know *why*, I'm not an FFS guru).
>>>>
>>>> NetBSD didn't simply drop soft-updates, they replaced it with
>>>> journalling, which is the approach used by practically all modern
>>>> filesystems.
>>>>
>>>> A number of people on the questions list have said that they find
>>>> UFS+SU to be considerably less robust than the journalled filesystems
>>>> of other OSes.
>>
>> Let me remind you that some other OSes had problems such as truncation
>> of files which were _not_ written (XFS), silently corrupting metadata when
>> there were too many files in a single directory (ext3), and panicking instead
>> of returning ENOSPC (btrfs). ;->
>
> Let's be clear that such problems live between the VFS and block layer
> and therefore are isolated to specific filesystems. Such problems
> disappear when using ZFS.

Such problems disappear after fixing the bugs that caused them. Just like
with ZFS - some people _have_ lost zpools in the past.

>>> What I've seen claimed is that UFS+SUJ is less robust. That's a very
>>> different thing than UFS+SU. Journaling was nailed onto the side of UFS
>>> +SU as an afterthought, and it shows.
>>
>> Not really - it was developed rather recently, and with filesystems it usually
>> shows, but it's not "nailed onto the side": it complements SU operation
>> by journalling the few things which SU doesn't really handle and which
>> used to require background fsck.
>>
>> One problem with SU is that it depends on hardware not lying about
>> write completion. Journalling filesystems usually just issue flushes
>> instead.
>
> This point about write completion being done on unflushed data and no
> flushes being done could explain the disconnect between RW's statements
> and what Soft Updates should accomplish.
> However, it does not change my
> assertion that placing UFS SU on a ZFS zvol will avoid such failure
> modes.

Assuming everything between UFS and ZFS below behaves correctly.

> In ZFS, we have a two-stage transaction commit that issues a
> flush at each stage to ensure that data goes to disk, no matter what the
> drive reported. Unless the hardware disobeys flushes, the second stage
> cannot happen if the first stage does not complete and if the second
> stage does not complete, all changes are ignored.
>
> What keeps soft updates from issuing a flush following write completion?
> If there are no pending writes, it is a no-op. If the hardware lies, then
> this will force the write. The internal dependency tracking mechanisms
> in Soft Updates should make figuring out when a flush needs to be issued
> should hardware have lied about completion rather simple. At a high
> level, what needs to be done is to batch the things that can be done
> simultaneously and separate those that cannot by flushes. If such
> behavior is implemented, it should have a mount option for toggling it.
> It simply is not needed on well-behaved devices, such as ZFS zvols.

As you say, it's not needed on well-behaved devices. While it could help
with crappy hardware, I think it would be either very complicated (batching,
as described), or would perform very poorly.

To be honest, I wonder how many problems could be avoided by disabling
the write cache by default. With NCQ it shouldn't cause performance problems,
right?
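
To make the batching idea quoted above a bit more concrete, here is a rough
sketch in Python of grouping writes so that everything in one batch can be
issued together, with a flush separating batches that depend on each other.
This is only an illustration, not how Soft Updates is implemented: the names
plan_batches and run, the write/flush callbacks, and the example dependency
graph are all made up for this message, and it simply rejects cyclic
dependencies instead of handling them.

```python
def plan_batches(deps):
    """Group writes into batches that may be issued simultaneously.

    deps maps each write to the set of writes that must be on stable
    storage before it may be issued (hypothetical representation).
    """
    pending = {w: set(p) for w, p in deps.items()}
    # Writes that appear only as prerequisites still have to be issued.
    for prereqs in deps.values():
        for p in prereqs:
            pending.setdefault(p, set())

    batches = []
    while pending:
        # Everything whose prerequisites are already durable is ready now.
        ready = sorted(w for w, p in pending.items() if not p)
        if not ready:
            raise ValueError("cyclic dependency; this sketch cannot order it")
        batches.append(ready)
        for w in ready:
            del pending[w]
        for p in pending.values():
            p.difference_update(ready)
    return batches


def run(batches, write, flush):
    """Issue each batch, then flush before starting the next one."""
    for batch in batches:
        for w in batch:
            write(w)
        flush()  # barrier: later batches depend on this one being durable


if __name__ == "__main__":
    # Illustrative ordering for creating a file; not taken from real FFS code.
    deps = {"dirent": {"inode"}, "inode": {"inode_bitmap"}, "data_block": set()}
    run(plan_batches(deps),
        write=lambda w: print("write", w),
        flush=lambda: print("flush"))
```

Even in this toy form you get one flush per dependency level, which is
where I would expect the performance cost to show up.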