Date:      Sun, 17 Feb 2013 09:48:32 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        fs@freebsd.org
Subject:   Re: cleaning files beyond EOF
Message-ID:  <20130217074832.GA2598@kib.kiev.ua>
In-Reply-To: <20130217172928.C1900@besplex.bde.org>
References:  <20130217113031.N9271@besplex.bde.org> <20130217055528.GB2522@kib.kiev.ua> <20130217172928.C1900@besplex.bde.org>



On Sun, Feb 17, 2013 at 06:01:50PM +1100, Bruce Evans wrote:
> On Sun, 17 Feb 2013, Konstantin Belousov wrote:
>
> > On Sun, Feb 17, 2013 at 11:33:58AM +1100, Bruce Evans wrote:
> >> I have a (possibly damaged) ffs data block with nonzero data beyond
> >> EOF.  Is anything responsible for clearing this data when the file
> >> is mmapped()?
> >>
> >> At least old versions of gcc mmap() the file and have a bug checking
> >> for EOF.  They read the garbage beyond the end and get confused.
> >
> > Does the 'damaged' status of the data block mean that it contains the
> > garbage after EOF on disk?
>
> Yes, it's at most software damage.  I used a broken version of
> vfs_bio_clrbuf() for a long time and it probably left some unusual
> blocks.  This matters surprisingly rarely.
I recently had to modify vfs_bio_clrbuf().  For me, a bug in the
function mattered a lot, because the function is used, in particular,
to clear indirect blocks.  The bug caused quite random filesystem
failures until I figured it out.  My version of vfs_bio_clrbuf() is
at the end of this message; it avoids accessing b_data.

>
> I forgot to mention that this is with an old version of FreeBSD,
> where I changed vfs_bio.c a lot but barely touched vm.
>
> > UFS uses a small wrapper around vnode_generic_getpages() as the
> > VOP_GETPAGES(), the wrapping code can be ignored for the current
> > purpose.
> >
> > vnode_generic_getpages() iterates over the pages after the bstrategy()
> > and marks the part of the page after EOF valid and zeroes it, using
> > vm_page_set_valid_range().
>
> The old version has a large non-wrapper in ffs, and vnode_generic_getpages()
> uses vm_page_set_validclean().  Maybe the bug is just in the old
> ffs_getpages().  It seems to do only DEV_BSIZE'ed zeroing stuff.  It
> begins with the same "We have to zero that data" code that forms most
> of the wrapper in the current version.  It normally only returns
> vnode_pager_generic_getpages() after that if bsize < PAGE_SIZE.
> However, my version has a variable which I had forgotten about to
> control this, and the forgotten setting of this variable results in
> always using vnode_pager_generic_getpages(), as in -current.  I probably
> copied some fixes in -current for this.  So the bug can't be just in
> ffs_getpages().
>
> The "damaged" block is at the end of vfs_default.c.  The file size is
> 25 * PAGE_SIZE + 16.  It is in 7 16K blocks, 2 full 2K frags, and 1 frag
> with 16 bytes valid in it.
But ffs_getpages() might indeed be the culprit.  It calls
vm_page_zero_invalid(), which only has DEV_BSIZE granularity.  I think
that ffs_getpages() should also zero the after-EOF part of the last page
of the file to fix your damage, since a device read cannot read less
than DEV_BSIZE.

diff --git a/sys/ufs/ffs/ffs_vnops.c b/sys/ufs/ffs/ffs_vnops.c
index ef6194c..4240b78 100644
--- a/sys/ufs/ffs/ffs_vnops.c
+++ b/sys/ufs/ffs/ffs_vnops.c
@@ -844,9 +844,9 @@ static int
 ffs_getpages(ap)
 	struct vop_getpages_args *ap;
 {
-	int i;
 	vm_page_t mreq;
-	int pcount;
+	uint64_t size;
+	int i, pcount;
 
 	pcount = round_page(ap->a_count) / PAGE_SIZE;
 	mreq = ap->a_m[ap->a_reqpage];
@@ -861,6 +861,9 @@ ffs_getpages(ap)
 	if (mreq->valid) {
 		if (mreq->valid != VM_PAGE_BITS_ALL)
 			vm_page_zero_invalid(mreq, TRUE);
+		size = VTOI(ap->a_vp)->i_size;
+		if (mreq->pindex == OFF_TO_IDX(size))
+			pmap_zero_page_area(mreq, size & PAGE_MASK, PAGE_SIZE - (size & PAGE_MASK));
 		for (i = 0; i < pcount; i++) {
 			if (i != ap->a_reqpage) {
 				vm_page_lock(ap->a_m[i]);

On the other hand, it is not clear whether we should indeed protect
against such a case, or just declare the on-disk data broken.

>
> I have another problem that is apparently with
> vnode_pager_generic_getpages() and now affects -current from about a
> year ago in an identical way with the old version: mmap() is very slow
> in msdosfs.  cmp uses mmap() too much, and reading files sequentially
> using mmap() is 3.4 times slower than reading them using read() on my
> DVD media/drive.  The i/o seems to be correctly clustered for both,
> with average transaction sizes over 50K, but tps is much lower for mmap().
> Similarly on a (faster) hard disk, except the slowness is not as noticeable
> (drive buffering might hide it completely).  However, for ffs files on
> the hard disk, mmap() is as fast as read().

diff --git a/sys/kern/vfs_bio.c b/sys/kern/vfs_bio.c
index 6393399..83d3609 100644
--- a/sys/kern/vfs_bio.c
+++ b/sys/kern/vfs_bio.c
@@ -3704,8 +4070,7 @@ vfs_bio_set_valid(struct buf *bp, int base, int size)
 void
 vfs_bio_clrbuf(struct buf *bp)
 {
-	int i, j, mask;
-	caddr_t sa, ea;
+	int i, j, mask, sa, ea, slide;
 
 	if ((bp->b_flags & (B_VMIO | B_MALLOC)) != B_VMIO) {
 		clrbuf(bp);
@@ -3723,39 +4088,69 @@ vfs_bio_clrbuf(struct buf *bp)
 		if ((bp->b_pages[0]->valid & mask) == mask)
 			goto unlock;
 		if ((bp->b_pages[0]->valid & mask) == 0) {
-			bzero(bp->b_data, bp->b_bufsize);
+			pmap_zero_page_area(bp->b_pages[0], 0, bp->b_bufsize);
 			bp->b_pages[0]->valid |= mask;
 			goto unlock;
 		}
 	}
-	ea = sa = bp->b_data;
-	for(i = 0; i < bp->b_npages; i++, sa = ea) {
-		ea = (caddr_t)trunc_page((vm_offset_t)sa + PAGE_SIZE);
-		ea = (caddr_t)(vm_offset_t)ulmin(
-		    (u_long)(vm_offset_t)ea,
-		    (u_long)(vm_offset_t)bp->b_data + bp->b_bufsize);
+	sa = bp->b_offset & PAGE_MASK;
+	slide = 0;
+	for (i = 0; i < bp->b_npages; i++) {
+		slide = imin(slide + PAGE_SIZE, bp->b_bufsize + sa);
+		ea = slide & PAGE_MASK;
+		if (ea == 0)
+			ea = PAGE_SIZE;
 		if (bp->b_pages[i] == bogus_page)
 			continue;
-		j = ((vm_offset_t)sa & PAGE_MASK) / DEV_BSIZE;
+		j = sa / DEV_BSIZE;
 		mask = ((1 << ((ea - sa) / DEV_BSIZE)) - 1) << j;
 		VM_OBJECT_LOCK_ASSERT(bp->b_pages[i]->object, MA_OWNED);
 		if ((bp->b_pages[i]->valid & mask) == mask)
 			continue;
 		if ((bp->b_pages[i]->valid & mask) == 0)
-			bzero(sa, ea - sa);
+			pmap_zero_page_area(bp->b_pages[i], sa, ea - sa);
 		else {
 			for (; sa < ea; sa += DEV_BSIZE, j++) {
-				if ((bp->b_pages[i]->valid & (1 << j)) == 0)
-					bzero(sa, DEV_BSIZE);
+				if ((bp->b_pages[i]->valid & (1 << j)) == 0) {
+					pmap_zero_page_area(bp->b_pages[i],
+					    sa, DEV_BSIZE);
+				}
 			}
 		}
 		bp->b_pages[i]->valid |= mask;
+		sa = 0;
 	}
 unlock:
 	VM_OBJECT_UNLOCK(bp->b_bufobj->bo_object);
 	bp->b_resid = 0;
 }
 
+void
+vfs_bio_bzero_buf(struct buf *bp, int base, int size)
+{
+	vm_page_t m;
+	int i, n;
+
+	if ((bp->b_flags & B_UNMAPPED) == 0) {
+		BUF_CHECK_MAPPED(bp);
+		bzero(bp->b_data + base, size);
+	} else {
+		BUF_CHECK_UNMAPPED(bp);
+		n = PAGE_SIZE - (base & PAGE_MASK);
+		VM_OBJECT_LOCK(bp->b_bufobj->bo_object);
+		for (i = base / PAGE_SIZE; size > 0 && i < bp->b_npages; ++i) {
+			m = bp->b_pages[i];
+			if (n > size)
+				n = size;
+			pmap_zero_page_area(m, base & PAGE_MASK, n);
+			base += n;
+			size -= n;
+			n = PAGE_SIZE;
+		}
+		VM_OBJECT_UNLOCK(bp->b_bufobj->bo_object);
+	}
+}
+
 /*
  * vm_hold_load_pages and vm_hold_free_pages get pages into
  * a buffers address space.  The pages are anonymous and are

