From: Bruce Evans
To: Ivan Voras
Cc: svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org, Konstantin Belousov, Bruce Evans
Date: Mon, 14 Feb 2011 19:39:00 +1100 (EST)
Subject: Re: svn commit: r218603 - head/sbin/tunefs
Message-ID: <20110214190055.V1273@besplex.bde.org>
References: <201102121312.p1CDCjhD002584@svn.freebsd.org> <20110213213251.B1474@besplex.bde.org>

On Sun, 13 Feb 2011, Ivan Voras wrote:

> On 13 February 2011 11:51, Bruce Evans wrote:
>> On Sat, 12 Feb 2011, Konstantin Belousov wrote:
>>
>>> Log:
>>>  When creating a directory entry for the journal, always read at least
>>>  the fragment, and write the full block. Reading less might not work
>>>  due to device sector size bigger then size of direntries in the
>>>  last directory fragment.
>>
>> I think it should always write full fragments too (and the kernel should
>> always read/write in units of fragments, not sectors of any size).
>
> Or at least One Single Variable, preferably recorded in the
> superblock, so when the need arises there's only one thing to change
> (so it might as well be fragment size in case of UFS).

kib pointed out that the writes in fsck_ffs need to be atomic, and geom/device drivers only guarantee atomicity for single-sector writes (which I think is a bug -- up to the driver's max_iosize should be guaranteed, and userland needs to be able to see this max if it needs to do atomic writes). I don't know if tunefs needs this too (maybe not, since AFAIK tunefs doesn't even work on ro-mounted file systems except in my version).

Now I think the i/o size shouldn't be given by any fs parameter. The device used by utilities may support different i/o sizes than the device used by the kernel. For example, it might be a block device or a regular file. I've actually made use of this. When block devices were broken on FreeBSD, Linux e2fsck stopped working. I used the workaround of copying a small (~1GB) partition to a regular file for fsck'ing, and back to the disk for use in the kernel.
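A minimal sketch of taking the i/o size from the device actually in use rather than from a superblock field. It assumes the utility has already tried a sector-size query (reported here as 0 when it fails, as it will for a regular file) and may have a size from the command line; `choose_io_size()` and `io_size_ok()` are hypothetical names, not functions in fsck_ffs or tunefs. The validity check applies the two constraints given later in this message: the size must divide the fragment size and must not exceed the 8K superblock probe.

```c
#include <assert.h>

/* The superblock probe only reads 8K, so no usable i/o size may exceed it. */
#define SBPROBE_SIZE    8192L
/* Fallback when neither the device nor the user supplies a size. */
#define DEFAULT_IO_SIZE 512L

/*
 * Hypothetical helper: pick the i/o size for the device being used now.
 *   probed:   sector size reported by the device, or <= 0 if the query
 *             failed (e.g. the "device" is a regular file).
 *   override: size given on the command line, or <= 0 if none.
 * Assumption: an explicit command-line size wins over the probed one.
 */
static long choose_io_size(long probed, long override)
{
    if (override > 0)
        return override;
    if (probed > 0)
        return probed;
    return DEFAULT_IO_SIZE;
}

/*
 * Hypothetical helper: a size is usable only if it divides the fragment
 * size (else the kernel stops working) and is at most 8K (else the
 * superblock probe fails).  Returns nonzero if usable.
 */
static int io_size_ok(long io, long fsize)
{
    assert(fsize > 0);
    return io > 0 && io <= SBPROBE_SIZE && fsize % io == 0;
}
```

For example, an image made on a regular file with 2K fragments gets `choose_io_size(0, 0) == 512`, which passes `io_size_ok(512, 2048)`; once the image is copied to a disk with 4K sectors, the query returns 4096 and `io_size_ok(4096, 2048)` rejects it.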
It would be more unusual for the device used by utilities to require a larger i/o size than the kernel is using, and if that size exceeded the fragment size, utilities would need more reblocking support than they currently have in order to work, but this is possible too. Say the file system records the hardware sector size of the device on which it was created. This size will become unusable if the file system is copied to another device that has a larger hardware sector size. But everything will keep working if you use a size that works on the current device, provided this size is a divisor of the fragment size (else at least the kernel will stop working) and is not larger than 8K (else the superblock probe will fail).

Examples:
- start with a "device" consisting of a regular file. The ioctl to determine the sector size will fail, so you must not depend on it working or use its value. You can do no better than requiring the size to be specified on the command line. You can also default to 512. Copy the resulting file system image to a new disk with 4K sectors (and no block size fakery of its own :-).
- start with a normal device with a normal sector size of 512. Use this a bit, then copy it to a new disk with 4K sectors.
- test all this by copying file systems to md devices with various larger and smaller sector sizes.

> There is currently nothing technically wrong with what this commit
> does, but it's pretty much a certainty that future will be more
> strange than today and future developers may forget there are two
> places they need to change.

Another technical error is the lack of support for different i/o sizes for reads and writes. It's not a large error, since this is broken in the kernel too. But probing for an i/o size that works would handle all combinations.

I have a DVD drive with this problem. For DVD-RW, it has a minimum read size of 2K but a minimum write size of 32K. It advertises a "firmware" sector size of 2K.
So the firmware sector size doesn't work for writing, and the recent changes in fsck_ffs wouldn't work either. The fragment size needs to be 32K for writes to work. IIRC, the kernel did work with this fragment size, but 32K is too inefficient to actually use for long. This depends on reads of less than 32K working, since the superblock probe only tries a size of 8K. A pure minimum i/o size of 32K (for reads too) would break only the buggy superblock probe and the buggy utilities.

Bruce
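A minimal sketch of probing the read and write sizes separately, as suggested above for devices like the DVD-RW drive. The device is simulated by a hypothetical `struct sim_dev`; a real utility would instead attempt `pread(2)`/`pwrite(2)` at each size (a write probe would have to rewrite data it had just read) and treat an error as failure. The doubling-from-512 search order is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a device with separate minimum read/write sizes. */
struct sim_dev {
    long min_read;   /* smallest read size the device accepts */
    long min_write;  /* smallest write size the device accepts */
};

/* Simulated read: succeeds only for whole multiples of min_read. */
static bool sim_read(const struct sim_dev *d, long size)
{
    return size >= d->min_read && size % d->min_read == 0;
}

/* Simulated write: succeeds only for whole multiples of min_write. */
static bool sim_write(const struct sim_dev *d, long size)
{
    return size >= d->min_write && size % d->min_write == 0;
}

/*
 * Try sizes 512, 1K, 2K, ... up to max and return the first one that
 * op accepts, or -1 if none does.
 */
static long probe_size(const struct sim_dev *d,
                       bool (*op)(const struct sim_dev *, long), long max)
{
    assert(d != NULL && op != NULL);
    for (long size = 512; size <= max; size *= 2)
        if (op(d, size))
            return size;
    return -1;
}
```

With the drive's figures (2K reads, 32K writes), the read probe finds 2K, matching the advertised "firmware" sector size, but the write probe finds 32K; capping the write probe at 8K finds nothing.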