Date:      Mon, 14 Feb 2011 19:39:00 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Ivan Voras <ivoras@FreeBSD.org>
Cc:        svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org, Konstantin Belousov <kib@FreeBSD.org>, Bruce Evans <brde@optusnet.com.au>
Subject:   Re: svn commit: r218603 - head/sbin/tunefs
Message-ID:  <20110214190055.V1273@besplex.bde.org>
In-Reply-To: <AANLkTi=RED0hpX8aaw-icvYPPtc6pPhnuw2bT8-TarLx@mail.gmail.com>
References:  <201102121312.p1CDCjhD002584@svn.freebsd.org> <20110213213251.B1474@besplex.bde.org> <AANLkTi=RED0hpX8aaw-icvYPPtc6pPhnuw2bT8-TarLx@mail.gmail.com>

On Sun, 13 Feb 2011, Ivan Voras wrote:

> On 13 February 2011 11:51, Bruce Evans <brde@optusnet.com.au> wrote:
>> On Sat, 12 Feb 2011, Konstantin Belousov wrote:
>>
>>> Log:
>>>  When creating a directory entry for the journal, always read at least
>>>  the fragment, and write the full block. Reading less might not work
>>>  due to device sector size bigger then size of direntries in the
>>>  last directory fragment.
>>
>> I think it should always write full fragments too (and the kernel should
>> always read/write in units of fragments, not sectors of any size).
>
> Or at least One Single Variable, preferably recorded in the
> superblock, so when the need arises there's only one thing to change
> (so it might as well be fragment size in case of UFS).

kib pointed out that the writes in fsck_ffs need to be atomic, and
geom/device drivers only guarantee atomicity for single-sector writes
(which I think is a bug -- up to the driver's max_iosize should be
guaranteed, and userland needs to be able to see this max if it needs
to do atomic writes).  I don't know if tunefs needs this too (maybe
not, since AFAIK tunefs doesn't even work on ro-mounted file systems
except in my version).

Now I think the size shouldn't be given by any fs parameter.  The device
used by utilities may support different i/o sizes than the device
used by the kernel.  For example, it might be a block device or a
regular file.  I've actually made use of this.  When block devices
were broken on FreeBSD, Linux e2fsck stopped working.  I used the
workaround of copying a small (~1GB) partition to a regular file for
fsck'ing and back to the disk for use in the kernel.  It would be more
unusual for the device used by utilities to require a larger i/o size
than the kernel is using, and utilities would need more reblocking
than they currently do in order to work if this size exceeds the fragment size, but
this is possible too.  Say the file system records the hardware sector
size of the device on which it was created.  This size will become
unusable if the file system is copied to another device that has a
larger hard sector size.  But everything will keep working if you use
a size that works on the current device, and this size is a divisor
of the fragment size (else at least the kernel will stop working)
and is not larger than 8K (else the superblock probe will fail).
Examples:
- start with a "device" consisting of a regular file.  The ioctl to
   determine the sector size will fail, so you must not depend on it
   working or use its value.  You can do no better than requiring the
   size to be specified on the command line.  You can also default to
   512.  Copy the resulting file system image to a new disk with 4K
   sectors (and no block size fakery of its own :-).
- start with a normal device with a normal sector size of 512.  Use this
   a bit, then copy it to a new disk with 4K sectors.
- test all this by copying file systems to md devices with various larger
   and smaller sector sizes.
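
The size selection and the constraints above can be sketched roughly as
follows.  This is only an illustration, not code from tunefs or fsck_ffs:
pick_io_size(), io_size_ok() and the two macros are hypothetical names,
and the 8K limit and the divisor rule are taken from the text above.
DIOCGSECTORSIZE is the FreeBSD ioctl for the sector size; elsewhere the
sketch just falls back to 512.  Whether the chosen size actually works on
the current device can't be checked this way.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <sys/ioctl.h>
#ifdef __FreeBSD__
#include <sys/disk.h>		/* DIOCGSECTORSIZE */
#endif

#define	DEFAULT_IO_SIZE	512	/* assumed fallback when nothing is known */
#define	SB_PROBE_SIZE	8192	/* size tried by the superblock probe */

/*
 * Hypothetical helper: choose an i/o size for a utility.  A nonzero
 * user_size (e.g. from the command line) wins.  Otherwise ask the
 * device, and fall back to 512 if the ioctl fails or is unavailable
 * (as it will be for a regular file).
 */
static unsigned int
pick_io_size(int fd, unsigned int user_size)
{
	if (user_size != 0)
		return (user_size);
#ifdef DIOCGSECTORSIZE
	{
		unsigned int secsize;

		if (ioctl(fd, DIOCGSECTORSIZE, &secsize) == 0 && secsize != 0)
			return (secsize);
	}
#else
	(void)fd;
#endif
	return (DEFAULT_IO_SIZE);
}

/*
 * Hypothetical check: the size must be nonzero, must not exceed the
 * superblock probe size, and must divide the fragment size.
 */
static bool
io_size_ok(unsigned int io_size, unsigned int frag_size)
{
	if (io_size == 0 || io_size > SB_PROBE_SIZE)
		return (false);
	return (frag_size % io_size == 0);
}
```

E.g., io_size_ok(4096, 2048) fails because 4096 doesn't divide the
fragment size, and io_size_ok(32768, 32768) fails because of the 8K
superblock probe.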

> There is currently nothing technically wrong with what this commit
> does, but it's pretty much a certainty that future will be more
> strange than today and future developers may forget there are two
> places they need to change.

Another technical error is lack of support for different i/o sizes for
read and write.  Not a large error since this is broken in the kernel
too.  But probing for an i/o size that works would handle all combinations.
I have a DVD drive with this problem.  For DVD-RW, it has a minimum
read size of 2K but a minimum write size of 32K.  It advertises a
"firmware" sector size of 2K.  So using the firmware sector size doesn't
work for writing, and recent changes in fsck_ffs wouldn't work.  The
fragment size needs to be 32K to work.  IIRC, the kernel did work with
this fragment size, but 32K is too inefficient to actually use for long.
This depends on reads of < 32K working, since the probe for the superblock
only tries size 8K.  A pure minimum i/o size of 32K would fail for just
the buggy superblock probe and buggy utilities.
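
A minimal sketch of probing for sizes that work, with separate results
for read and write.  The function names and the 512..64K range are
assumptions for illustration only; nothing like this exists in the
utilities discussed here.  The write probe rewrites the data it just read,
so it shouldn't change the contents, but it is not atomic, and raw devices
may also need buffer alignment that plain malloc() doesn't promise.

```c
#include <sys/types.h>
#include <stdlib.h>
#include <unistd.h>

#define	PROBE_MIN	512	/* assumed smallest size worth trying */
#define	PROBE_MAX	65536	/* assumed largest size worth trying */

/*
 * Return the smallest power-of-2 size in [PROBE_MIN, PROBE_MAX] for
 * which a read at offset 0 transfers the full size, or 0 if none does.
 */
static size_t
probe_read_size(int fd)
{
	char *buf;
	size_t size, found = 0;

	if ((buf = malloc(PROBE_MAX)) == NULL)
		return (0);
	for (size = PROBE_MIN; size <= PROBE_MAX; size *= 2) {
		if (pread(fd, buf, size, 0) == (ssize_t)size) {
			found = size;
			break;
		}
	}
	free(buf);
	return (found);
}

/*
 * Same for writes.  Each candidate size is first read, then written
 * back unchanged, so sizes that can't be read are skipped.
 */
static size_t
probe_write_size(int fd)
{
	char *buf;
	size_t size, found = 0;

	if ((buf = malloc(PROBE_MAX)) == NULL)
		return (0);
	for (size = PROBE_MIN; size <= PROBE_MAX; size *= 2) {
		if (pread(fd, buf, size, 0) != (ssize_t)size)
			continue;
		if (pwrite(fd, buf, size, 0) == (ssize_t)size) {
			found = size;
			break;
		}
	}
	free(buf);
	return (found);
}
```

On the DVD-RW drive described above, one would expect probe_read_size()
to stop at 2K and probe_write_size() at 32K, but I haven't tried this on
that drive.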

Bruce