Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 13 Feb 2011 23:30:56 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Konstantin Belousov <kib@FreeBSD.org>
Cc:        svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org
Subject:   Re: svn commit: r218604 - head/sbin/fsck_ffs
Message-ID:  <20110213215436.L1544@besplex.bde.org>
In-Reply-To: <201102121317.p1CDHE9v002742@svn.freebsd.org>
References:  <201102121317.p1CDHE9v002742@svn.freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sat, 12 Feb 2011, Konstantin Belousov wrote:

> Log:
>  In checker, read journal by sectors.
>
>  Due to UFS insistence to pretend that device sector size is 512 bytes,

Do you mean that ffs_fsck does this?  UFS doesn't exist, and ffs in the
kernel didn't use device sectors at all until recently (it used DEV_BSIZE
units, but those are not sectors but are the units for bread() and friends).
newfs uses a hack to do this.  When handling kernel sizes in units of
DEV_BSIZE (not a sector size!), ffs_fsck should use DEV_BSIZE (or maybe
just fsbtodb()) instead of `dev_bsize' or `sectorsize' and not
pretend that DEV_BSIZE (or the corresponding superblock value for the
conversion macro) is the sector size.

>  sector size is obtained from ioctl(DIOCGSECTORSIZE) for real devices,
>  and from the label otherwise. The file images without label have to
>  be made with 512 sector size.

> Modified: head/sbin/fsck_ffs/fsck.h
> ==============================================================================
> --- head/sbin/fsck_ffs/fsck.h	Sat Feb 12 13:12:45 2011	(r218603)
> +++ head/sbin/fsck_ffs/fsck.h	Sat Feb 12 13:17:14 2011	(r218604)
> @@ -268,6 +268,7 @@ char	snapname[BUFSIZ];	/* when doing sna
> char	*cdevname;		/* name of device being checked */
> long	dev_bsize;		/* computed value of DEV_BSIZE */
> long	secsize;		/* actual disk sector size */
> +long	real_dev_bsize;

If `secsize' is the actual disk sector size, then we don't need another
variable giving the real sector size.

The new variable has no comment.  What is the difference between a sector
size and a dev_bsize?  There is enough confusion between DEV_BSIZE and
sector sizes already.  DEV_BSIZE has come to means nothing to do with
sectors and little to do with devices or block sizes.  It is just a minimal
i/o sizes for use in old interfaces (ones that didn't want to use more than
32 bits for disk addresses, but now use 64 bits anyway, so that they can
address 72 bits after multiplication by DEV_BSIZE).

These variables are named and documented slightly better in mkfs:

% extern int	sectorsize;	/* bytes/sector */
% extern int	realsectorsize;	/* bytes/sector in hardware*/

At lease the fake sector size variable doesn't claim to be actual, and
the real sector size variable doesn't have extra underscores and claims
to be a sector size variable.

> Modified: head/sbin/fsck_ffs/setup.c
> ==============================================================================
> --- head/sbin/fsck_ffs/setup.c	Sat Feb 12 13:12:45 2011	(r218603)
> +++ head/sbin/fsck_ffs/setup.c	Sat Feb 12 13:17:14 2011	(r218604)
> @@ -446,7 +446,7 @@ sblock_init(void)
> 	if (sblk.b_un.b_buf == NULL || asblk.b_un.b_buf == NULL)
> 		errx(EEXIT, "cannot allocate space for superblock");
> 	if ((lp = getdisklabel(NULL, fsreadfd)))
> -		dev_bsize = secsize = lp->d_secsize;
> +		real_dev_bsize = dev_bsize = secsize = lp->d_secsize;
> 	else
> 		dev_bsize = secsize = DEV_BSIZE;
> }

Both the variables are real enough here, provided getdisklabel() works.
Otherwise, you are in trouble and should fail instead of defaulting the
size unless the size is set by another means.  newfs seems to fail, and
I'm sure newfs_msdos does fail if the sector size is unknown.

> Modified: head/sbin/fsck_ffs/suj.c
> ==============================================================================
> --- head/sbin/fsck_ffs/suj.c	Sat Feb 12 13:12:45 2011	(r218603)
> +++ head/sbin/fsck_ffs/suj.c	Sat Feb 12 13:17:14 2011	(r218604)
> @@ -28,6 +28,7 @@
> __FBSDID("$FreeBSD$");
>
> #include <sys/param.h>
> +#include <sys/disk.h>
> #include <sys/disklabel.h>
> #include <sys/mount.h>
> #include <sys/stat.h>
> @@ -201,6 +202,11 @@ opendisk(const char *devnam)
> 		    disk->d_error);
> 	}
> 	fs = &disk->d_fs;
> +	if (real_dev_bsize == 0 && ioctl(disk->d_fd, DIOCGSECTORSIZE,
> +	    &real_dev_bsize) == -1)
> +		real_dev_bsize = secsize;
> +	if (debug)
> +		printf("dev_bsize %ld\n", real_dev_bsize);
> }
>
> /*

If this gives a different "real_dev_bsize", and this is different from the
sector size in the label or defaulted-to, then the latter is presumably
wrong, and we're left with a `secsize' that is not the "actual sector size".

It is a bug to replace the sector size in the label with the one obtained
here.  The former should be correct, and it is the only one under user
control.  It might need to be changed if one given by the ioctl is wrong
or just if it is fake.  (Labels only have 32-bit disk addresses, so fake
sector sizes are needed to address more than 41 bits if the physical sector
size has is 512.)

newfs doesn't have the  precedence that I want:
- sectorsize set on the command line (-S option) has precedence.  [Bug;
   -S 0 is needed to cancel any previous setting of sectorsize on the
   command line by -S or maybe by -T, but it is treated as an error.]
   fsck_ffs unfortunately doesn't seem to have any -S or -T option.  These
   used to be unneeded since sector size weren't used, and labels worked
   and contain enough info to locate the superblock where there is more
   info.
- then try setting sectorsize using the ioctl if sectorsize is not already
   set.  [Bug: no error checking for this ioctl.  If it fails, then it
   may clobber sectorsize, which breaks the next step.]
- then set sectorsize from the label if there is a label and sectorsize
   is not already set
- fail if sectorsize is not set

Then the hacks involving realsectorsize:
- set it to sectorsize
- if sectorsize != DEV_BSIZE, change it to DEV_BSIZE and fix up related
   quantities (I hope the change to the partition table is never written).
   Now `sectorsize' is no longer "actual", though it was actual before the
   hacks.  We continue with the actual sector size recorded in realsectorsize.

> @@ -2262,7 +2268,7 @@ suj_build(void)
> 		rec = (union jrec *)seg->ss_blk;
> 		for (i = 0; i < seg->ss_rec.jsr_cnt; off += JREC_SIZE, rec++) {
> 			/* skip the segrec. */
> -			if ((off % DEV_BSIZE) == 0)
> +			if ((off % real_dev_bsize) == 0)
> 				continue;
> 			switch (rec->rec_jrefrec.jr_op) {
> 			case JOP_ADDREF:
> ...

This is all sort of backwards.  DEV_BSIZE is the real DEV_BSIZE.  Old code
that uses units of DEV_BSIZE spells this as dev_bsize although it is
constant.  New code that uses units of sectors spelled this as DEV_BSIZE
although the correct size is variable; now the new code spelled this as
real_dev_bsize, which is variable but unrelated to dev_bsize = DEV_BSIZE :-).

I think using fragments instead of sectors would work better.  For a start,
you can get the fragment size from the superblock and not have to worry
about labels of ioctls.

There is a problem locating the superblock.  The kernel and various
utilities only try reading the superblock with size SBSIZE (8K).  This
fails if the sector size is not a divisor of 8K.  They also only try
reading at various offsets which might not be a multiple of the sector
size, but which are larger than 8K so they are more likely to be a
multiple.  I fixed this problem in fsck_msdosfs where it is larger
(fsck_msdosfs assumes that the sector size is a divisor of 512, but
sector sizes like 2K and 4K are now common relative to sizes of 16K).

Hmm, the initialization of dev_bsize is even more obfuscated than I
remembered:

% 	/*
% 	 * Compute block size that the file system is based on,
% 	 * according to fsbtodb, and adjust superblock block number
% 	 * so we can tell if this is an alternate later.
% 	 */
% 	super *= dev_bsize;

We may have needed a fake sector size to read the superblock.  I forget if
we got a non-fake one from the label or ioctl before here.  We should have
just tried all reasonable power of 2 sizes between 1 and 1M (I use
64K down to DOSBOOTBLOCKSIZE = 512 in fsck_msdosfs).

% 	dev_bsize = sblock.fs_fsize / fsbtodb(&sblock, 1);

This is an obfuscated way of spelling DEV_BSIZE.  Although fsbtodb()
converts to blocks with fixed size DEV_BSIZE, it needs a variable shift
count to handle the variable fragment size.  This shift count is
fs_fsbtodb, which was initialized in newfs to
ilog2(fs_fsize / sectorsize).  But sectorsize was fake there
(always DEV_BSIZE).  Suppose for example than fs_fsize is the default
(2K).  Then fs_fsbtodb = ilog2(4) = 2, so fsbtodb(&sbblock, 1) is 4
and dev_bsize = 2K / 4 = DEV_BSIZE again.  It is always DEV_BSIZE again,
since the fake sectorsize in newfs is always DEV_BSIZE.  The fakery is
extended to fsck_ffs by the above assignment.  The above assignment is
necessary if the kernel is changed to not fake things so much, or to
fake them differently (newfs can't can change it's i/o methods but can't
change the generated fs_fsbtodb etc. unless the kernel changes.  All it
could do much better is to make the conversions between different block
sizes (real and virtual either way) more explicit).  However, things
are already confusing enough with a fixed DEV_BSIZE.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20110213215436.L1544>