From owner-freebsd-fs@FreeBSD.ORG Sun Nov 27 18:41:26 2011
Date: Sun, 27 Nov 2011 20:41:20 +0200
From: Kostik Belousov
To: Lev Serebryakov
Cc: Kirk McKusick, freebsd-fs@freebsd.org
Subject: Re: Does UFS2 send BIO_FLUSH to GEOM when update metadata (with softupdates)?
Message-ID: <20111127184120.GT50300@deviant.kiev.zoral.com.ua>
In-Reply-To: <1381381670.20111127152414@serebryakov.spb.ru>

On Sun, Nov 27, 2011 at 03:24:14PM +0400, Lev Serebryakov wrote:
> Hello, Kostik.
> You wrote on 26 November 2011, 12:41:51:
>
> > on the operation end. In fact, there is inherited ugliness due to async
> > nature, namely, the kernel-owned buffer locks. Getting rid of them would
> > be much more useful than breaking UFS.
> Why do you name it breaking? How additional piece of meta-information
> could break UFS?
Because disabling reordering of the writes issued by UFS slows it down
by a factor of 3 to 10.

>
> > The non-broken driver must not return the 'completed' bio into the up
> > queue until write is sent to hardware and hardware reported the
> > completion.
> So, hold bio without completion for, say, 5 minutes, will be Ok?
It is up to the users of your driver to decide whether that is OK or
not. For UFS/SU, the only consequence will be the accumulation in memory
of the workitems that track the dependencies of other metadata buffers
on the delayed one. For UFS/SU+J, if some buffer is delayed
indefinitely, the journal might overflow.

>
> > Raid controllers which aggressively cache the writes use nvram or
> > battery backups, and do not allow to turn on write cache if battery is
> > non-functional. I had not seen SU inconsistencies on RAID 6 on mfi(4),
> It is not always true. And it could be not true for network
> attached storage, as there are too many variables in the equation in
> such a case. Yes, a good controller should do this, I could not agree
> more. But it is not always possible, unfortunately.
Your claims are not backed by any facts. Please tell us the models and
firmware revisions of the devices that you declare are broken in the
described ways. Also, please reference the documentation which states
that the devices behave in such a way. At least I would know what to
avoid.

>
> > despite one of our machines having the unfortunate habit of dropping
> > its boot disk over the SATA channel each week, for 2 years.
> Great! But even battery-backed (read: UPS) software realization is
> not protected from OS crashes. So, it is impossible to implement
> software RAID5, which plays nicely with UFS (in case of crash -- as
> long as there is no crash, everything is perfect), now. Ok, you could
> say ``we don't need it at all,'' but I could not agree with this
> statement. Yes, I'm biased here. But, really, I see some interest to
> software RAID5 on FreeBSD now.
Software RAID5 might lose the checksum block due to a kernel or power
failure. This is no different from RAID1 being declared inconsistent
after an unclean stop. Your claim is not backed by facts, again.

>
> > You again missed the point - if metadata is not reorderable, but user
> > data is, you get security issues. They are similar (but inverse) to what
> > I described in the previous paragraph.
> In case of crash -- yes. But, IMHO, in case of crash there could be a
> scenario when some information is leaked in any case. If there is no
> crash, you haven't security issues. Because every read will return
> actual information, either from write cache, or from plates.
> Inconsistent cache implementation is bad thing, for sure, but it is
> orthogonal question to what we discuss here.
I cannot understand how your answer is related to my statement.
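
To make the RAID5 remark above concrete, here is a minimal sketch,
assuming a toy stripe of two data blocks plus one XOR parity block. It
is not the code of any FreeBSD GEOM class; the names xor_blocks and
stripe_is_consistent and the block contents are hypothetical. It shows
the parity block going stale when a crash lands between the data write
and the parity write, and a resync that recomputes parity from the data,
much as a RAID1 mirror is resynchronized after an unclean stop.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks column by column (RAID5-style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_is_consistent(data_blocks, parity):
    """A stripe is consistent when parity equals the XOR of its data blocks."""
    return xor_blocks(data_blocks) == parity


# Hypothetical stripe: two data blocks and their parity block.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40"]
original_block1 = data[1]
parity = xor_blocks(data)
assert stripe_is_consistent(data, parity)

# Block 0 is rewritten, then the system "crashes" before the matching
# parity write reaches the disk: the parity block is now stale.
data[0] = b"\xff\xff\xff\xff"
crashed_consistent = stripe_is_consistent(data, parity)

# If the disk holding block 1 failed now, rebuilding it from block 0 and
# the stale parity would not give back what was stored there.
rebuilt_block1 = xor_blocks([data[0], parity])
rebuild_is_correct = rebuilt_block1 == original_block1

# Resync after the unclean stop: recompute parity from the surviving data.
parity = xor_blocks(data)
resynced_consistent = stripe_is_consistent(data, parity)
```

The sketch only models the parity arithmetic. It says nothing about
write ordering in the filesystem above the array, which is the subject
of the rest of this thread.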