Date: Thu, 07 Feb 2008 01:33:13 +0200 From: Andriy Gapon <avg@icyb.net.ua> To: pav@FreeBSD.org, scottl@FreeBSD.org Cc: freebsd-fs@FreeBSD.org, freebsd-hackers@FreeBSD.org Subject: Re: fs/udf: vm pages "overlap" while reading large dir [patch] Message-ID: <47AA43B9.1040608@icyb.net.ua> In-Reply-To: <47A9E191.8000607@icyb.net.ua> References: <200612221824.kBMIOhfM049471@freefall.freebsd.org> <47A2EDB0.8000801@icyb.net.ua> <47A2F404.7010208@icyb.net.ua> <47A735A4.3060506@icyb.net.ua> <47A75B47.2040604@elischer.org> <1202155663.62432.0.camel@ikaros.oook.cz> <47A8754C.5010607@icyb.net.ua> <1202235368.68281.12.camel@ikaros.oook.cz> <47A9E191.8000607@icyb.net.ua>
next in thread | previous in thread | raw e-mail | index | archive | help
[-- Attachment #1 --]
on 06/02/2008 18:34 Andriy Gapon said the following:
>
> Actually the patch is not entirely correct. max_size returned from
> udf_bmap_internal should be used to calculate number of continuous
> sectors for read-ahead (as opposed to file size in the patch).
>
Attached is an updated patch.
The most prominent changes from the previous version:
1. udf_read can handle files with data embedded into fentry
(sysutils/udfclient can produce such things for small files).
2. above mentioned files can now be mmap-ed, so that you can not only
cat them, but also cp them; btw, I believe that cp(1) should have logic
to try to fallback to read if mmap() fails - remember kern/92040 ? :-)
3. udf_bmap uses max_size[*] returned by udf_bmap_internal to calculate
a number of contiguous blocks for read-ahead; [*] - this is a number of
contiguous bytes from a given offset within a file.
Things that stay the same:
1. I still use bread() via directory vnode; I think it's better even if
reading via device vnode would work correctly.
2. I still try to read as much data as possible in directory bread(), I
think this is a sort of an ad-hoc read-ahead without all the heuristics
and logic that cluster reading does; I think that this should be ok
because we do not have random reading of directory, it's always
sequential - either to read-in the whole directory or linearly search
through it until wanted entry is found.
Detailed description of the patch.
udf_vnops.c:
Hunk1 - this is "just in case" patch, reject files with unsupported
strategy right away instead of deferring failure to udf_bmap_internal.
Hunk2 - fix typo in existing macro, add new macro to check if a file has
data embedded into fentry ("unusual file").
Hunk3 - for an unusual file just uiomove data from fentry.
Hunk4 - cosmetic changes plus a fix for udf_bmap_internal errors not
actually being honored; also added an additional printf, because
udf_strategy should never really be called for unusual files.
Hunk5 - return EOPNOTSUPP for unusual files, this is correct because we
can not correctly bmap them and also this enables correct handling of
this files in vm code (vnode_pager); also code for read-ahead
calculation borrowed from cd9660.
Hunk6- explain function of udf_bmap_internal call in this place.
Hunk7 - some cosmetics; prevent size passed to bread from being greater
than MAXBSIZE; do bread via directory vnode rather than device vnode
(udf_readlblks was a macro-wrapper around bread).
udf_vfsops.c:
Hunk1 - this is borrowed from cd9660, apparently this data is needed for
correct cluster reading.
Couple of words about testing.
udf in 7.0-RC1 plus this patch correctly handled everything I could
throw at it - huge directories, unusual files, reading, mmaping.
Using udfclientfs I wrote a directory on DVD-RAM UDF disk that contained
2G of files from ports distfiles. The files varied in size from about
100 of bytes to hundreds of megabytes. I watched systat -vmstat while
copying this directory. With unpatched code some files would not be
copied (small "unusual files"), KB/t value stayed at 2K (meaning reads
always were sector sized), MB/s was about 1. With the patch all files
were copied successfully and correctly (md5 verified), KB/t varied from
2K to 64K (apparently depending on size of currently copied file), MB/s
varied from 1 to 4 which is not bad for these DVD-RAM disk and drive.
If somebody asks I can produce a UDF image that would contain various
test cases: large directory, unusual files, etc. Or you can give a try
to udfclient/udfclientfs from sysutils/udfclient port.
I hope this patch will be useful.
--
Andriy Gapon
[-- Attachment #2 --]
--- udf_vnops.c.orig 2008-01-29 23:50:49.000000000 +0200
+++ udf_vnops.c 2008-02-07 00:12:34.000000000 +0200
@@ -168,6 +168,16 @@ udf_open(struct vop_open_args *ap) {
struct udf_node *np = VTON(ap->a_vp);
off_t fsize;
+ /*
+ * We currently do not support any other strategy type, so
+ * udf_bmap_internal, udf_bmap, udf_strategy would fail.
+ * I.e. we can not access content of this file anyway.
+ */
+ if (le16toh(np->fentry->icbtag.strat_type) != 4) {
+ printf("Unsupported strategy type %d\n", le16toh(np->fentry->icbtag.strat_type));
+ return ENODEV;
+ }
+
fsize = le64toh(np->fentry->inf_len);
vnode_create_vobject(ap->a_vp, fsize, ap->a_td);
return 0;
@@ -340,7 +350,9 @@ udf_pathconf(struct vop_pathconf_args *a
#define lblkno(udfmp, loc) ((loc) >> (udfmp)->bshift)
#define blkoff(udfmp, loc) ((loc) & (udfmp)->bmask)
-#define lblktosize(imp, blk) ((blk) << (udfmp)->bshift)
+#define lblktosize(udfmp, blk) ((blk) << (udfmp)->bshift)
+
+#define HAS_EMBEDDED_DATA(node) ((le16toh((node)->fentry->icbtag.flags) & 0x7) == 3)
static int
udf_read(struct vop_read_args *ap)
@@ -359,6 +371,26 @@ udf_read(struct vop_read_args *ap)
return (0);
if (uio->uio_offset < 0)
return (EINVAL);
+
+ if (HAS_EMBEDDED_DATA(node)) {
+ struct file_entry *fentry;
+ uint8_t *data;
+
+ fentry = node->fentry;
+ data = &fentry->data[le32toh(fentry->l_ea)];
+ fsize = le32toh(fentry->l_ad);
+
+ n = uio->uio_resid;
+ diff = fsize - uio->uio_offset;
+ if (diff <= 0)
+ return (0);
+ if (diff < n)
+ n = diff;
+
+ error = uiomove(data + uio->uio_offset, (int)n, uio);
+ return (error);
+ }
+
fsize = le64toh(node->fentry->inf_len);
udfmp = node->udfmp;
do {
@@ -799,10 +831,10 @@ udf_strategy(struct vop_strategy_args *a
struct buf *bp;
struct vnode *vp;
struct udf_node *node;
+ struct bufobj *bo;
int maxsize;
daddr_t sector;
- struct bufobj *bo;
- int multiplier;
+ int error;
bp = a->a_bp;
vp = a->a_vp;
@@ -813,24 +845,23 @@ udf_strategy(struct vop_strategy_args *a
* Files that are embedded in the fentry don't translate well
* to a block number. Reject.
*/
- if (udf_bmap_internal(node, bp->b_lblkno * node->udfmp->bsize,
- §or, &maxsize)) {
+ error = udf_bmap_internal(node, lblktosize(node->udfmp, bp->b_lblkno), §or, &maxsize);
+ if (error) {
+ if (error == UDF_INVALID_BMAP)
+ printf("udf_strategy called for file with data embedded in fentry\n");
clrbuf(bp);
bp->b_blkno = -1;
+ bufdone(bp);
+ return (0);
}
-
- /* bmap gives sector numbers, bio works with device blocks */
- multiplier = node->udfmp->bsize / DEV_BSIZE;
- bp->b_blkno = sector * multiplier;
-
- }
- if ((long)bp->b_blkno == -1) {
- bufdone(bp);
- return (0);
+ /* udf_bmap_internal gives sector numbers, bio works with device blocks */
+ bp->b_blkno = sector << (node->udfmp->bshift - DEV_BSHIFT);
}
+
bo = node->udfmp->im_bo;
bp->b_iooffset = dbtob(bp->b_blkno);
BO_STRATEGY(bo, bp);
+
return (0);
}
@@ -851,17 +882,43 @@ udf_bmap(struct vop_bmap_args *a)
if (a->a_runb)
*a->a_runb = 0;
- error = udf_bmap_internal(node, a->a_bn * node->udfmp->bsize, &lsector,
- &max_size);
+ /*
+ * UDF_INVALID_BMAP means data embedded into fentry, this is an internal
+ * error that should not be propagated to calling code.
+ * Most obvious mapping for this error is EOPNOTSUPP as we can not truly
+ * translate block numbers in this case.
+ * Incidentally, this return code will make vnode pager to use VOP_READ
+ * to get data for mmap-ed pages and udf_read knows how to do the right
+ * thing for this kind of files.
+ */
+ error = udf_bmap_internal(node, a->a_bn << node->udfmp->bshift, &lsector, &max_size);
+ if (error == UDF_INVALID_BMAP)
+ return EOPNOTSUPP;
if (error)
return (error);
/* Translate logical to physical sector number */
*a->a_bnp = lsector << (node->udfmp->bshift - DEV_BSHIFT);
- /* Punt on read-ahead for now */
- if (a->a_runp)
- *a->a_runp = 0;
+ /*
+ * Determine maximum number of readahead blocks following the
+ * requested block.
+ */
+ if (a->a_runp) {
+ int nblk;
+
+ nblk = (max_size >> node->udfmp->bshift) - 1;
+ if (nblk <= 0)
+ *a->a_runp = 0;
+ else if (nblk >= (MAXBSIZE >> node->udfmp->bshift))
+ *a->a_runp = (MAXBSIZE >> node->udfmp->bshift) - 1;
+ else
+ *a->a_runp = nblk;
+ }
+
+ if (a->a_runb) {
+ *a->a_runb = 0;
+ }
return (0);
}
@@ -1060,7 +1117,10 @@ udf_readatoffset(struct udf_node *node,
udfmp = node->udfmp;
- *bp = NULL;
+ /*
+ * This call is made not only to detect UDF_INVALID_BMAP case,
+ * max_size is used as an ad-hoc read-ahead hint for "normal" case.
+ */
error = udf_bmap_internal(node, offset, §or, &max_size);
if (error == UDF_INVALID_BMAP) {
/*
@@ -1078,9 +1138,10 @@ udf_readatoffset(struct udf_node *node,
/* Adjust the size so that it is within range */
if (*size == 0 || *size > max_size)
*size = max_size;
- *size = min(*size, MAXBSIZE);
+ *size = min(*size, MAXBSIZE - blkoff(udfmp, offset));
- if ((error = udf_readlblks(udfmp, sector, *size + (offset & udfmp->bmask), bp))) {
+ *bp = NULL;
+ if ((error = bread(node->i_vnode, lblkno(udfmp, offset), (*size + blkoff(udfmp, offset) + udfmp->bmask) & ~udfmp->bmask, NOCRED, bp))) {
printf("warning: udf_readlblks returned error %d\n", error);
/* note: *bp may be non-NULL */
return (error);
--- udf_vfsops.c.orig 2007-03-13 03:50:24.000000000 +0200
+++ udf_vfsops.c 2008-02-05 01:29:10.000000000 +0200
@@ -330,6 +330,11 @@ udf_mountfs(struct vnode *devvp, struct
bo = &devvp->v_bufobj;
+ if (devvp->v_rdev->si_iosize_max != 0)
+ mp->mnt_iosize_max = devvp->v_rdev->si_iosize_max;
+ if (mp->mnt_iosize_max > MAXPHYS)
+ mp->mnt_iosize_max = MAXPHYS;
+
/* XXX: should be M_WAITOK */
MALLOC(udfmp, struct udf_mnt *, sizeof(struct udf_mnt), M_UDFMOUNT,
M_NOWAIT | M_ZERO);
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?47AA43B9.1040608>
