Date: Sun, 17 Feb 2013 09:48:32 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Bruce Evans <brde@optusnet.com.au>
Cc: fs@freebsd.org
Subject: Re: cleaning files beyond EOF
Message-ID: <20130217074832.GA2598@kib.kiev.ua>
In-Reply-To: <20130217172928.C1900@besplex.bde.org>
References: <20130217113031.N9271@besplex.bde.org> <20130217055528.GB2522@kib.kiev.ua> <20130217172928.C1900@besplex.bde.org>
On Sun, Feb 17, 2013 at 06:01:50PM +1100, Bruce Evans wrote:
> On Sun, 17 Feb 2013, Konstantin Belousov wrote:
>
> > On Sun, Feb 17, 2013 at 11:33:58AM +1100, Bruce Evans wrote:
> >> I have a (possibly damaged) ffs data block with nonzero data beyond
> >> EOF. Is anything responsible for clearing this data when the file
> >> is mmapped()?
> >>
> >> At least old versions of gcc mmap() the file and have a bug checking
> >> for EOF. They read the garbage beyond the end and get confused.
> >
> > Does the 'damaged' status of the data block mean that it contains
> > garbage after EOF on disk?
>
> Yes, it's at most software damage. I used a broken version of
> vfs_bio_clrbuf() for a long time and it probably left some unusual
> blocks. This matters surprisingly rarely.
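The invariant gcc relies on can be checked from userland with a minimal
sketch like the one below (mine, for illustration; the file name is a
placeholder): everything between EOF and the end of the last mapped page
should read as zero.

#include <sys/mman.h>
#include <sys/stat.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	struct stat st;
	size_t i, maplen, pgsz;
	char *p;
	int fd;

	if ((fd = open("somefile", O_RDONLY)) == -1)	/* placeholder */
		err(1, "open");
	if (fstat(fd, &st) == -1)
		err(1, "fstat");
	pgsz = getpagesize();
	maplen = ((size_t)st.st_size + pgsz - 1) & ~(pgsz - 1);
	p = mmap(NULL, maplen, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		err(1, "mmap");
	/* Bytes between EOF and the end of the last page must read as 0. */
	for (i = st.st_size; i < maplen; i++)
		if (p[i] != 0)
			printf("garbage at offset %zu: 0x%02x\n", i,
			    (unsigned char)p[i]);
	return (0);
}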
I recently had to modify vfs_bio_clrbuf(). For me, a bug in that
function mattered a lot, because it is used, in particular, to clear
indirect blocks. The bug caused seemingly random filesystem failures
until I tracked it down. My version of vfs_bio_clrbuf() is at the end
of this message; it avoids accessing b_data.
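For reference, the DEV_BSIZE valid-bits arithmetic that both patches
below depend on, pulled out into a standalone sketch (the sa/ea values
are made up): each page carries one valid bit per DEV_BSIZE chunk, and
a byte range [sa, ea) within a page maps to a contiguous run of bits.

#include <stdio.h>

#define	DEV_BSIZE	512	/* granularity of the page valid bits */

int
main(void)
{
	int j, mask;
	int sa = 1024, ea = 2048;	/* hypothetical range in a page */

	j = sa / DEV_BSIZE;		/* index of the first valid bit */
	mask = ((1 << ((ea - sa) / DEV_BSIZE)) - 1) << j;
	/* Prints "mask 0xc": bits 2 and 3, i.e. bytes 1024-2047. */
	printf("mask %#x\n", mask);
	return (0);
}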
>
> I forgot to mention that this is with an old version of FreeBSD,
> where I changed vfs_bio.c a lot but barely touched vm.
>
> > UFS uses a small wrapper around vnode_generic_getpages() as the
> > VOP_GETPAGES(); the wrapping code can be ignored for the current
> > purpose.
> >
> > vnode_generic_getpages() iterates over the pages after the bstrategy()
> > and marks the part of the page after EOF valid and zeroes it, using
> > vm_page_set_valid_range().
>
> The old version has a large non-wrapper in ffs, and vnode_generic_getpages()
> uses vm_page_set_validclean(). Maybe the bug is just in the old
> ffs_getpages(). It seems to do only DEV_BSIZE'ed zeroing stuff. It
> begins with the same "We have to zero that data" code that forms most
> of the wrapper in the current version. It normally returns
> vnode_pager_generic_getpages() after that only if bsize < PAGE_SIZE.
> However, my version has a variable (which I had forgotten about) to
> control this, and the forgotten setting of that variable results in
> always using vnode_pager_generic_getpages(), as in -current. I probably
> copied some fixes in -current for this. So the bug can't be just in
> ffs_getpages().
>
> The "damaged" block is at the end of vfs_default.c. The file size is
> 25 * PAGE_SIZE + 16. It is in 7 16K blocks, 2 full 2K frags, and 1 frag
> with 16 bytes valid in it.
But ffs_getpages() might indeed be the culprit. It calls
vm_page_zero_invalid(), which only has DEV_BSIZE granularity. I think
ffs_getpages() should also zero the after-EOF part of the last page of
the file to fix your damage, since a device read cannot read less than
DEV_BSIZE.
diff --git a/sys/ufs/ffs/ffs_vnops.c b/sys/ufs/ffs/ffs_vnops.c
index ef6194c..4240b78 100644
--- a/sys/ufs/ffs/ffs_vnops.c
+++ b/sys/ufs/ffs/ffs_vnops.c
@@ -844,9 +844,9 @@ static int
 ffs_getpages(ap)
 	struct vop_getpages_args *ap;
 {
-	int i;
 	vm_page_t mreq;
-	int pcount;
+	uint64_t size;
+	int i, pcount;
 
 	pcount = round_page(ap->a_count) / PAGE_SIZE;
 	mreq = ap->a_m[ap->a_reqpage];
@@ -861,6 +861,9 @@ ffs_getpages(ap)
 	if (mreq->valid) {
 		if (mreq->valid != VM_PAGE_BITS_ALL)
 			vm_page_zero_invalid(mreq, TRUE);
+		size = VTOI(ap->a_vp)->i_size;
+		if (mreq->pindex == OFF_TO_IDX(size))
+			pmap_zero_page_area(mreq, size & PAGE_MASK, PAGE_SIZE - (size & PAGE_MASK));
 		for (i = 0; i < pcount; i++) {
 			if (i != ap->a_reqpage) {
 				vm_page_lock(ap->a_m[i]);
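Applied to your file (my arithmetic, assuming PAGE_SIZE = 4096, i.e. a
12-bit page shift):

	i_size = 25 * 4096 + 16 = 102416
	OFF_TO_IDX(102416) = 102416 >> 12 = 25	(index of the last, partial page)
	102416 & PAGE_MASK = 16			(zeroing starts at byte 16)

so the check fires exactly for the page whose tail carries the stale
bytes.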
On the other hand, it is not clear whether we should indeed protect
against such a case, or just declare the on-disk data broken.
>
> I have another problem that is apparently with
> vnode_pager_generic_getpages() and now affects -current from about a
> year ago identically to the old version: mmap() is very slow
> in msdosfs. cmp uses mmap() too much, and reading files sequentially
> using mmap() is 3.4 times slower than reading them using read() on my
> DVD media/drive. The i/o seems to be correctly clustered for both,
> with average transaction sizes over 50K, but tps is much lower for mmap().
> Similarly on a (faster) hard disk, except the slowness is not as noticeable
> (drive buffering might hide it completely). However, for ffs files on
> the hard disk, mmap() is as fast as read().
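For anyone wanting to reproduce the comparison, a crude timing harness
along these lines should do (my sketch, not Bruce's tool; run each mode
against a cold cache, e.g. after a remount, for a fair comparison):

#include <sys/mman.h>
#include <sys/stat.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static volatile unsigned char sink;	/* defeat dead-code elimination */

static double
elapsed(struct timespec t0, struct timespec t1)
{
	return (t1.tv_sec - t0.tv_sec + (t1.tv_nsec - t0.tv_nsec) / 1e9);
}

int
main(int argc, char **argv)
{
	struct timespec t0, t1;
	struct stat st;
	unsigned char buf[65536], *p;
	ssize_t n;
	off_t i;
	long pgsz;
	int fd;

	if (argc != 2)
		errx(1, "usage: %s file", argv[0]);
	if ((fd = open(argv[1], O_RDONLY)) == -1)
		err(1, "open");
	if (fstat(fd, &st) == -1)
		err(1, "fstat");
	pgsz = getpagesize();

	clock_gettime(CLOCK_MONOTONIC, &t0);
	while ((n = read(fd, buf, sizeof(buf))) > 0)
		sink += buf[0];
	clock_gettime(CLOCK_MONOTONIC, &t1);
	printf("read(2): %.3f s\n", elapsed(t0, t1));

	p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		err(1, "mmap");
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < st.st_size; i += pgsz)	/* touch every page */
		sink += p[i];
	clock_gettime(CLOCK_MONOTONIC, &t1);
	printf("mmap(2): %.3f s\n", elapsed(t0, t1));
	return (0);
}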
diff --git a/sys/kern/vfs_bio.c b/sys/kern/vfs_bio.c
index 6393399..83d3609 100644
--- a/sys/kern/vfs_bio.c
+++ b/sys/kern/vfs_bio.c
@@ -3704,8 +4070,7 @@ vfs_bio_set_valid(struct buf *bp, int base, int size)
 void
 vfs_bio_clrbuf(struct buf *bp)
 {
-	int i, j, mask;
-	caddr_t sa, ea;
+	int i, j, mask, sa, ea, slide;
 
 	if ((bp->b_flags & (B_VMIO | B_MALLOC)) != B_VMIO) {
 		clrbuf(bp);
@@ -3723,39 +4088,69 @@ vfs_bio_clrbuf(struct buf *bp)
 		if ((bp->b_pages[0]->valid & mask) == mask)
 			goto unlock;
 		if ((bp->b_pages[0]->valid & mask) == 0) {
-			bzero(bp->b_data, bp->b_bufsize);
+			pmap_zero_page_area(bp->b_pages[0], 0, bp->b_bufsize);
 			bp->b_pages[0]->valid |= mask;
 			goto unlock;
 		}
 	}
-	ea = sa = bp->b_data;
-	for(i = 0; i < bp->b_npages; i++, sa = ea) {
-		ea = (caddr_t)trunc_page((vm_offset_t)sa + PAGE_SIZE);
-		ea = (caddr_t)(vm_offset_t)ulmin(
-		    (u_long)(vm_offset_t)ea,
-		    (u_long)(vm_offset_t)bp->b_data + bp->b_bufsize);
+	sa = bp->b_offset & PAGE_MASK;
+	slide = 0;
+	for (i = 0; i < bp->b_npages; i++) {
+		slide = imin(slide + PAGE_SIZE, bp->b_bufsize + sa);
+		ea = slide & PAGE_MASK;
+		if (ea == 0)
+			ea = PAGE_SIZE;
 		if (bp->b_pages[i] == bogus_page)
 			continue;
-		j = ((vm_offset_t)sa & PAGE_MASK) / DEV_BSIZE;
+		j = sa / DEV_BSIZE;
 		mask = ((1 << ((ea - sa) / DEV_BSIZE)) - 1) << j;
 		VM_OBJECT_LOCK_ASSERT(bp->b_pages[i]->object, MA_OWNED);
 		if ((bp->b_pages[i]->valid & mask) == mask)
 			continue;
 		if ((bp->b_pages[i]->valid & mask) == 0)
-			bzero(sa, ea - sa);
+			pmap_zero_page_area(bp->b_pages[i], sa, ea - sa);
 		else {
 			for (; sa < ea; sa += DEV_BSIZE, j++) {
-				if ((bp->b_pages[i]->valid & (1 << j)) == 0)
-					bzero(sa, DEV_BSIZE);
+				if ((bp->b_pages[i]->valid & (1 << j)) == 0) {
+					pmap_zero_page_area(bp->b_pages[i],
+					    sa, DEV_BSIZE);
+				}
 			}
 		}
 		bp->b_pages[i]->valid |= mask;
+		sa = 0;
 	}
 unlock:
 	VM_OBJECT_UNLOCK(bp->b_bufobj->bo_object);
 	bp->b_resid = 0;
 }
 
+void
+vfs_bio_bzero_buf(struct buf *bp, int base, int size)
+{
+	vm_page_t m;
+	int i, n;
+
+	if ((bp->b_flags & B_UNMAPPED) == 0) {
+		BUF_CHECK_MAPPED(bp);
+		bzero(bp->b_data + base, size);
+	} else {
+		BUF_CHECK_UNMAPPED(bp);
+		n = PAGE_SIZE - (base & PAGE_MASK);
+		VM_OBJECT_LOCK(bp->b_bufobj->bo_object);
+		for (i = base / PAGE_SIZE; size > 0 && i < bp->b_npages; ++i) {
+			m = bp->b_pages[i];
+			if (n > size)
+				n = size;
+			pmap_zero_page_area(m, base & PAGE_MASK, n);
+			base += n;
+			size -= n;
+			n = PAGE_SIZE;
+		}
+		VM_OBJECT_UNLOCK(bp->b_bufobj->bo_object);
+	}
+}
+
 /*
  * vm_hold_load_pages and vm_hold_free_pages get pages into
  * a buffers address space.  The pages are anonymous and are
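The new helper is meant to replace direct b_data access at call sites,
along these lines (hypothetical fragment; 'validend' is a made-up name
for whatever valid length the caller has at hand):

	/*
	 * Zero the block's tail past 'validend' without touching b_data,
	 * which is not mapped for B_UNMAPPED buffers.
	 */
	vfs_bio_bzero_buf(bp, validend, bp->b_bufsize - validend);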
