From owner-svn-src-all@FreeBSD.ORG Sun Feb 13 13:14:17 2011 Return-Path: Delivered-To: svn-src-all@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 948B7106564A; Sun, 13 Feb 2011 13:14:17 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id A0A2A8FC18; Sun, 13 Feb 2011 13:14:16 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id p1DDEBnQ073045 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 13 Feb 2011 15:14:11 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id p1DDEBmV030252; Sun, 13 Feb 2011 15:14:11 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id p1DDEBww030251; Sun, 13 Feb 2011 15:14:11 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Sun, 13 Feb 2011 15:14:11 +0200 From: Kostik Belousov To: Bruce Evans Message-ID: <20110213131411.GI78089@deviant.kiev.zoral.com.ua> References: <201102121317.p1CDHE9v002742@svn.freebsd.org> <20110213215436.L1544@besplex.bde.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="Ve0F8fDhlCYJZdTN" Content-Disposition: inline In-Reply-To: <20110213215436.L1544@besplex.bde.org> User-Agent: Mutt/1.4.2.3i X-Spam-Status: No, score=-2.1 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_50, DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org Subject: Re: svn commit: r218604 - head/sbin/fsck_ffs X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 13 Feb 2011 13:14:17 -0000 --Ve0F8fDhlCYJZdTN Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sun, Feb 13, 2011 at 11:30:56PM +1100, Bruce Evans wrote: > On Sat, 12 Feb 2011, Konstantin Belousov wrote: >=20 > >Log: > > In checker, read journal by sectors. > > > > Due to UFS insistence to pretend that device sector size is 512 bytes, >=20 > Do you mean that ffs_fsck does this? UFS doesn't exist, and ffs in the > kernel didn't use device sectors at all until recently (it used DEV_BSIZE > units, but those are not sectors but are the units for bread() and friend= s). Yes, the journal writer started using DEV_BSIZE as _sector_ size to write the journal blocks, and fixing this was the point of the series of commits. Journal was unoperable on the providers with non-512 byte sectors. > newfs uses a hack to do this. When handling kernel sizes in units of > DEV_BSIZE (not a sector size!), ffs_fsck should use DEV_BSIZE (or maybe > just fsbtodb()) instead of `dev_bsize' or `sectorsize' and not > pretend that DEV_BSIZE (or the corresponding superblock value for the > conversion macro) is the sector size. >=20 > > sector size is obtained from ioctl(DIOCGSECTORSIZE) for real devices, > > and from the label otherwise. The file images without label have to > > be made with 512 sector size. >=20 > >Modified: head/sbin/fsck_ffs/fsck.h > >=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D > >--- head/sbin/fsck_ffs/fsck.h Sat Feb 12 13:12:45 2011 (r218603) > >+++ head/sbin/fsck_ffs/fsck.h Sat Feb 12 13:17:14 2011 (r218604) > >@@ -268,6 +268,7 @@ char snapname[BUFSIZ]; /* when doing sna > >char *cdevname; /* name of device being checked */ > >long dev_bsize; /* computed value of DEV_BSIZE */ > >long secsize; /* actual disk sector size */ > >+long real_dev_bsize; >=20 > If `secsize' is the actual disk sector size, then we don't need another > variable giving the real sector size. >=20 > The new variable has no comment. What is the difference between a sector > size and a dev_bsize? There is enough confusion between DEV_BSIZE and > sector sizes already. DEV_BSIZE has come to means nothing to do with > sectors and little to do with devices or block sizes. It is just a minim= al > i/o sizes for use in old interfaces (ones that didn't want to use more th= an > 32 bits for disk addresses, but now use 64 bits anyway, so that they can > address 72 bits after multiplication by DEV_BSIZE). There is no "actual" difference (I tired of the word actual when debugging the patches), they are all DEV_BSIZE. >=20 > These variables are named and documented slightly better in mkfs: >=20 > % extern int sectorsize; /* bytes/sector */ > % extern int realsectorsize; /* bytes/sector in hardware*/ >=20 > At lease the fake sector size variable doesn't claim to be actual, and > the real sector size variable doesn't have extra underscores and claims > to be a sector size variable. >=20 > >Modified: head/sbin/fsck_ffs/setup.c > >=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D > >--- head/sbin/fsck_ffs/setup.c Sat Feb 12 13:12:45 2011 (r218603) > >+++ head/sbin/fsck_ffs/setup.c Sat Feb 12 13:17:14 2011 (r218604) > >@@ -446,7 +446,7 @@ sblock_init(void) > > if (sblk.b_un.b_buf =3D=3D NULL || asblk.b_un.b_buf =3D=3D NULL) > > errx(EEXIT, "cannot allocate space for superblock"); > > if ((lp =3D getdisklabel(NULL, fsreadfd))) > >- dev_bsize =3D secsize =3D lp->d_secsize; > >+ real_dev_bsize =3D dev_bsize =3D secsize =3D lp->d_secsize; > > else > > dev_bsize =3D secsize =3D DEV_BSIZE; > >} >=20 > Both the variables are real enough here, provided getdisklabel() works. > Otherwise, you are in trouble and should fail instead of defaulting the > size unless the size is set by another means. newfs seems to fail, and > I'm sure newfs_msdos does fail if the sector size is unknown. >=20 > >Modified: head/sbin/fsck_ffs/suj.c > >=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D > >--- head/sbin/fsck_ffs/suj.c Sat Feb 12 13:12:45 2011 (r218603) > >+++ head/sbin/fsck_ffs/suj.c Sat Feb 12 13:17:14 2011 (r218604) > >@@ -28,6 +28,7 @@ > >__FBSDID("$FreeBSD$"); > > > >#include > >+#include > >#include > >#include > >#include > >@@ -201,6 +202,11 @@ opendisk(const char *devnam) > > disk->d_error); > > } > > fs =3D &disk->d_fs; > >+ if (real_dev_bsize =3D=3D 0 && ioctl(disk->d_fd, DIOCGSECTORSIZE, > >+ &real_dev_bsize) =3D=3D -1) > >+ real_dev_bsize =3D secsize; > >+ if (debug) > >+ printf("dev_bsize %ld\n", real_dev_bsize); > >} > > > >/* >=20 > If this gives a different "real_dev_bsize", and this is different from the > sector size in the label or defaulted-to, then the latter is presumably > wrong, and we're left with a `secsize' that is not the "actual sector siz= e". Yes, secsize is not the "actual sector size", it is equal to DEV_BSIZE, see setup.c:sblock_init(). >=20 > It is a bug to replace the sector size in the label with the one obtained > here. The former should be correct, and it is the only one under user > control. It might need to be changed if one given by the ioctl is wrong > or just if it is fake. (Labels only have 32-bit disk addresses, so fake > sector sizes are needed to address more than 41 bits if the physical sect= or > size has is 512.) >=20 > newfs doesn't have the precedence that I want: > - sectorsize set on the command line (-S option) has precedence. [Bug; > -S 0 is needed to cancel any previous setting of sectorsize on the > command line by -S or maybe by -T, but it is treated as an error.] > fsck_ffs unfortunately doesn't seem to have any -S or -T option. These > used to be unneeded since sector size weren't used, and labels worked > and contain enough info to locate the superblock where there is more > info. > - then try setting sectorsize using the ioctl if sectorsize is not already > set. [Bug: no error checking for this ioctl. If it fails, then it > may clobber sectorsize, which breaks the next step.] > - then set sectorsize from the label if there is a label and sectorsize > is not already set > - fail if sectorsize is not set newfs.c does not do what you describe, in fact. After going to all troubles finding the sector size, it performs the following: if (sectorsize !=3D DEV_BSIZE) { /* XXX */ int secperblk =3D sectorsize / DEV_BSIZE; sectorsize =3D DEV_BSIZE; fssize *=3D secperblk; if (pp !=3D NULL) pp->p_size *=3D secperblk; } And now layout calculations happen as if sector size is 512 bytes. I was puzzled for long time why libufs givess me 512 in d_bsize for a volume over 4096-bytes per sector md: disk->d_bsize =3D fs->fs_fsize / fsbtodb(fs, 1); >=20 > Then the hacks involving realsectorsize: > - set it to sectorsize > - if sectorsize !=3D DEV_BSIZE, change it to DEV_BSIZE and fix up related > quantities (I hope the change to the partition table is never written). > Now `sectorsize' is no longer "actual", though it was actual before the > hacks. We continue with the actual sector size recorded in=20 > realsectorsize. realsectorsize is not used. You are citing the piece of code I pasted above. >=20 > >@@ -2262,7 +2268,7 @@ suj_build(void) > > rec =3D (union jrec *)seg->ss_blk; > > for (i =3D 0; i < seg->ss_rec.jsr_cnt; off +=3D JREC_SIZE,=20 > > rec++) { > > /* skip the segrec. */ > >- if ((off % DEV_BSIZE) =3D=3D 0) > >+ if ((off % real_dev_bsize) =3D=3D 0) > > continue; > > switch (rec->rec_jrefrec.jr_op) { > > case JOP_ADDREF: > >... >=20 > This is all sort of backwards. DEV_BSIZE is the real DEV_BSIZE. Old code > that uses units of DEV_BSIZE spells this as dev_bsize although it is > constant. New code that uses units of sectors spelled this as DEV_BSIZE > although the correct size is variable; now the new code spelled this as > real_dev_bsize, which is variable but unrelated to dev_bsize =3D DEV_BSIZ= E=20 > :-). We need to write the journal block atomically, and device/geom level only provides that: - we must write at least sector; - any write bigger then sector is not atomic. >=20 > I think using fragments instead of sectors would work better. For a star= t, > you can get the fragment size from the superblock and not have to worry > about labels of ioctls. No, it would not, since fragment write can be non-atomic. >=20 > There is a problem locating the superblock. The kernel and various > utilities only try reading the superblock with size SBSIZE (8K). This > fails if the sector size is not a divisor of 8K. They also only try > reading at various offsets which might not be a multiple of the sector > size, but which are larger than 8K so they are more likely to be a > multiple. I fixed this problem in fsck_msdosfs where it is larger > (fsck_msdosfs assumes that the sector size is a divisor of 512, but > sector sizes like 2K and 4K are now common relative to sizes of 16K). >=20 > Hmm, the initialization of dev_bsize is even more obfuscated than I > remembered: >=20 > % /* > % * Compute block size that the file system is based on, > % * according to fsbtodb, and adjust superblock block number > % * so we can tell if this is an alternate later. > % */ > % super *=3D dev_bsize; >=20 > We may have needed a fake sector size to read the superblock. I forget if > we got a non-fake one from the label or ioctl before here. We should have > just tried all reasonable power of 2 sizes between 1 and 1M (I use > 64K down to DOSBOOTBLOCKSIZE =3D 512 in fsck_msdosfs). >=20 > % dev_bsize =3D sblock.fs_fsize / fsbtodb(&sblock, 1); >=20 > This is an obfuscated way of spelling DEV_BSIZE. Although fsbtodb() > converts to blocks with fixed size DEV_BSIZE, it needs a variable shift > count to handle the variable fragment size. This shift count is > fs_fsbtodb, which was initialized in newfs to > ilog2(fs_fsize / sectorsize). But sectorsize was fake there > (always DEV_BSIZE). Suppose for example than fs_fsize is the default > (2K). Then fs_fsbtodb =3D ilog2(4) =3D 2, so fsbtodb(&sbblock, 1) is 4 > and dev_bsize =3D 2K / 4 =3D DEV_BSIZE again. It is always DEV_BSIZE aga= in, > since the fake sectorsize in newfs is always DEV_BSIZE. The fakery is > extended to fsck_ffs by the above assignment. The above assignment is > necessary if the kernel is changed to not fake things so much, or to > fake them differently (newfs can't can change it's i/o methods but can't > change the generated fs_fsbtodb etc. unless the kernel changes. All it > could do much better is to make the conversions between different block > sizes (real and virtual either way) more explicit). However, things > are already confusing enough with a fixed DEV_BSIZE. Again, you are describing what I pasted above, and what I referred to as "UFS (ok, FFS) pretending that sector size is 512 bytes". I agree that it would be cleaner if FFS does not make such obscuring arithmetic, but it did not matter while i/o was done in units of whole fragments. Journal requirements are different. --Ve0F8fDhlCYJZdTN Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iEYEARECAAYFAk1X2SIACgkQC3+MBN1Mb4hGJACfUvdEVje1OLZamE9cVtDKFqcY 7dgAoLnMYTVl21xIyzO8lQ9tUnJKTDpV =yLzZ -----END PGP SIGNATURE----- --Ve0F8fDhlCYJZdTN--