Date: Sat, 26 Nov 2011 12:13:54 +0400
From: Lev Serebryakov <lev@freebsd.org>
X-Priority: 3 (Normal)
Message-ID: <1961318852.20111126121354@serebryakov.spb.ru>
To: Kostik Belousov <kostikbel@gmail.com>
In-Reply-To: <20111126080351.GD50300@deviant.kiev.zoral.com.ua>
References: <20111123194444.GE50300@deviant.kiev.zoral.com.ua>
	<201111260725.pAQ7PDow056289@chez.mckusick.com>
	<20111126080351.GD50300@deviant.kiev.zoral.com.ua>
MIME-Version: 1.0
Content-Type: text/plain; charset=windows-1251
Content-Transfer-Encoding: quoted-printable
Cc: Kirk McKusick <mckusick@mckusick.com>, freebsd-fs@freebsd.org
Subject: Re: Does UFS2 send BIO_FLUSH to GEOM when update metadata (with
	softupdates)?

Hello, Kostik.
You wrote on 26 November 2011, 12:03:51:

>> You are entirely correct when you say that the requirement for
>> SU and SU+J is that it requires that notification of a disk-write
>> complete mean that the data is on the disk (stable). The problem
>> that arises is that (apparently) some tag-queue implementations
>> report back that tags have been written when in fact they have
>> not been written.
> Right, and my belief that real hardware is not much affected,
   Sorry, but you have the wrong idea about modern hardware.

   Again: don't forget the multi-megabyte caches, and the absence of
 any guarantee about the order in which these caches will be flushed.
 Many controllers, and the drives themselves, group writes. If the
 companion for a data block shows up in the cache earlier than the
 companion for a metadata block (the drive doesn't distinguish them),
 or the waiting timeout expires, the data block will be written first.
 The same applies to two metadata blocks, of course. And this is not a
 question of BROKEN QUEUEING.

    Again, I'm not talking about cheap ATA drives here, but about
 expensive high-performance RAID controllers and server drives with
 huge caches.

> except probably some ultra-cheap and old ATA disks. Another issue
> is broken-by-design 'drivers' which authors do not understand the
> environment they programming for.
  And, again: either you have a storage stack that is synchronous from
 top to bottom, with performance that will be miserable compared to
 other OSes, or you need to give driver authors some freedom and
 provide them with hints about the semantics of individual operations.
 Every drive and controller that does write caching and reordering
 (except old, cheap, broken ATA ones) HAS flags and knobs to send a
 given individual block to the platters as soon as possible. But right
 now drivers have no idea when they should use these flags, so they
 don't use them.

> I do not see how this proposal change much, except limiting potential
> havoc to the last 100ms of system operation. In fact, reordering,
> besides causing fs consistency problems, may cause the security issues
> as well [*]. If user data is written into the reused blocks, but
> metadata update was ordered after data write, we can end with the
> arbitrary override of the sensitive authorization or accounting
> information.
   That is why metadata requests should be marked as non-reorderable
  and non-queueable. Individual requests, not some global barrier every
  100ms.
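
   A minimal sketch of what such a per-request hint could look like,
  again as a toy Python model. The Write class, its `ordered` field
  and the submit function are hypothetical names invented here; they
  are not an existing GEOM or driver interface. Unflagged writes may
  be reordered freely; a flagged write drains everything cached before
  it and is then written through.

```python
from dataclasses import dataclass


@dataclass
class Write:
    label: str
    lba: int
    ordered: bool = False   # hypothetical per-request hint from the filesystem


def submit(writes):
    """Return labels in the order they reach the media.

    Unflagged writes sit in the cache and are flushed highest-LBA-first
    (a stand-in for whatever reordering the device prefers). A write with
    ordered=True first drains everything cached before it and is then
    written through, so nothing passes it in either direction.
    """
    cache, on_disk = [], []

    def drain():
        for w in sorted(cache, key=lambda w: w.lba, reverse=True):
            on_disk.append(w.label)
        cache.clear()

    for w in writes:
        if w.ordered:
            drain()                   # everything issued earlier goes first
            on_disk.append(w.label)   # then this request, uncached
        else:
            cache.append(w)
    drain()
    return on_disk


plain = submit([Write("inode", 10), Write("data", 900)])
hinted = submit([Write("inode", 10, ordered=True), Write("data", 900)])
print(plain)    # ['data', 'inode']
print(hinted)   # ['inode', 'data']
```

   Only the flagged request pays the cost of bypassing the cache; all
  other writes remain free to be grouped and reordered.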

-- 
// Black Lion AKA Lev Serebryakov <lev@serebryakov.spb.ru>