From owner-freebsd-fs@FreeBSD.ORG Wed Feb 6 23:33:28 2008 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1550B16A476; Wed, 6 Feb 2008 23:33:28 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from falcon.cybervisiontech.com (falcon.cybervisiontech.com [217.20.163.9]) by mx1.freebsd.org (Postfix) with ESMTP id 5EA6C13C459; Wed, 6 Feb 2008 23:33:27 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from localhost (localhost [127.0.0.1]) by falcon.cybervisiontech.com (Postfix) with ESMTP id 66E62744013; Thu, 7 Feb 2008 01:33:25 +0200 (EET) X-Virus-Scanned: Debian amavisd-new at falcon.cybervisiontech.com Received: from falcon.cybervisiontech.com ([127.0.0.1]) by localhost (falcon.cybervisiontech.com [127.0.0.1]) (amavisd-new, port 10027) with ESMTP id gHzvoAmT8B76; Thu, 7 Feb 2008 01:33:25 +0200 (EET) Received: from [10.74.70.239] (unknown [193.138.145.53]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by falcon.cybervisiontech.com (Postfix) with ESMTP id 801A6744012; Thu, 7 Feb 2008 01:33:23 +0200 (EET) Message-ID: <47AA43B9.1040608@icyb.net.ua> Date: Thu, 07 Feb 2008 01:33:13 +0200 From: Andriy Gapon User-Agent: Thunderbird 2.0.0.9 (X11/20071208) MIME-Version: 1.0 To: pav@FreeBSD.org, scottl@FreeBSD.org References: <200612221824.kBMIOhfM049471@freefall.freebsd.org> <47A2EDB0.8000801@icyb.net.ua> <47A2F404.7010208@icyb.net.ua> <47A735A4.3060506@icyb.net.ua> <47A75B47.2040604@elischer.org> <1202155663.62432.0.camel@ikaros.oook.cz> <47A8754C.5010607@icyb.net.ua> <1202235368.68281.12.camel@ikaros.oook.cz> <47A9E191.8000607@icyb.net.ua> In-Reply-To: <47A9E191.8000607@icyb.net.ua> Content-Type: multipart/mixed; boundary="------------090706050706020808090509" Cc: freebsd-fs@FreeBSD.org, freebsd-hackers@FreeBSD.org Subject: Re: fs/udf: vm pages "overlap" while reading large dir [patch] X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Feb 2008 23:33:28 -0000 This is a multi-part message in MIME format. --------------090706050706020808090509 Content-Type: text/plain; charset=ISO-8859-2 Content-Transfer-Encoding: 7bit on 06/02/2008 18:34 Andriy Gapon said the following: > > Actually the patch is not entirely correct. max_size returned from > udf_bmap_internal should be used to calculate number of continuous > sectors for read-ahead (as opposed to file size in the patch). > Attached is an updated patch. The most prominent changes from the previous version: 1. udf_read can handle files with data embedded into fentry (sysutils/udfclient can produce such things for small files). 2. above mentioned files can now be mmap-ed, so that you can not only cat them, but also cp them; btw, I believe that cp(1) should have logic to try to fallback to read if mmap() fails - remember kern/92040 ? :-) 3. udf_bmap uses max_size[*] returned by udf_bmap_internal to calculate a number of contiguous blocks for read-ahead; [*] - this is a number of contiguous bytes from a given offset within a file. Things that stay the same: 1. I still use bread() via directory vnode; I think it's better even if reading via device vnode would work correctly. 2. I still try to read as much data as possible in directory bread(), I think this is a sort of an ad-hoc read-ahead without all the heuristics and logic that cluster reading does; I think that this should be ok because we do not have random reading of directory, it's always sequential - either to read-in the whole directory or linearly search through it until wanted entry is found. Detailed description of the patch. udf_vnops.c: Hunk1 - this is "just in case" patch, reject files with unsupported strategy right away instead of deferring failure to udf_bmap_internal. Hunk2 - fix typo in existing macro, add new macro to check if a file has data embedded into fentry ("unusual file"). Hunk3 - for an unusual file just uiomove data from fentry. Hunk4 - cosmetic changes plus a fix for udf_bmap_internal errors not actually being honored; also added an additional printf, because udf_strategy should never really be called for unusual files. Hunk5 - return EOPNOTSUPP for unusual files, this is correct because we can not correctly bmap them and also this enables correct handling of this files in vm code (vnode_pager); also code for read-ahead calculation borrowed from cd9660. Hunk6- explain function of udf_bmap_internal call in this place. Hunk7 - some cosmetics; prevent size passed to bread from being greater than MAXBSIZE; do bread via directory vnode rather than device vnode (udf_readlblks was a macro-wrapper around bread). udf_vfsops.c: Hunk1 - this is borrowed from cd9660, apparently this data is needed for correct cluster reading. Couple of words about testing. udf in 7.0-RC1 plus this patch correctly handled everything I could throw at it - huge directories, unusual files, reading, mmaping. Using udfclientfs I wrote a directory on DVD-RAM UDF disk that contained 2G of files from ports distfiles. The files varied in size from about 100 of bytes to hundreds of megabytes. I watched systat -vmstat while copying this directory. With unpatched code some files would not be copied (small "unusual files"), KB/t value stayed at 2K (meaning reads always were sector sized), MB/s was about 1. With the patch all files were copied successfully and correctly (md5 verified), KB/t varied from 2K to 64K (apparently depending on size of currently copied file), MB/s varied from 1 to 4 which is not bad for these DVD-RAM disk and drive. If somebody asks I can produce a UDF image that would contain various test cases: large directory, unusual files, etc. Or you can give a try to udfclient/udfclientfs from sysutils/udfclient port. I hope this patch will be useful. -- Andriy Gapon --------------090706050706020808090509 Content-Type: text/x-patch; name="udf_latest2.diff" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="udf_latest2.diff" --- udf_vnops.c.orig 2008-01-29 23:50:49.000000000 +0200 +++ udf_vnops.c 2008-02-07 00:12:34.000000000 +0200 @@ -168,6 +168,16 @@ udf_open(struct vop_open_args *ap) { struct udf_node *np = VTON(ap->a_vp); off_t fsize; + /* + * We currently do not support any other strategy type, so + * udf_bmap_internal, udf_bmap, udf_strategy would fail. + * I.e. we can not access content of this file anyway. + */ + if (le16toh(np->fentry->icbtag.strat_type) != 4) { + printf("Unsupported strategy type %d\n", le16toh(np->fentry->icbtag.strat_type)); + return ENODEV; + } + fsize = le64toh(np->fentry->inf_len); vnode_create_vobject(ap->a_vp, fsize, ap->a_td); return 0; @@ -340,7 +350,9 @@ udf_pathconf(struct vop_pathconf_args *a #define lblkno(udfmp, loc) ((loc) >> (udfmp)->bshift) #define blkoff(udfmp, loc) ((loc) & (udfmp)->bmask) -#define lblktosize(imp, blk) ((blk) << (udfmp)->bshift) +#define lblktosize(udfmp, blk) ((blk) << (udfmp)->bshift) + +#define HAS_EMBEDDED_DATA(node) ((le16toh((node)->fentry->icbtag.flags) & 0x7) == 3) static int udf_read(struct vop_read_args *ap) @@ -359,6 +371,26 @@ udf_read(struct vop_read_args *ap) return (0); if (uio->uio_offset < 0) return (EINVAL); + + if (HAS_EMBEDDED_DATA(node)) { + struct file_entry *fentry; + uint8_t *data; + + fentry = node->fentry; + data = &fentry->data[le32toh(fentry->l_ea)]; + fsize = le32toh(fentry->l_ad); + + n = uio->uio_resid; + diff = fsize - uio->uio_offset; + if (diff <= 0) + return (0); + if (diff < n) + n = diff; + + error = uiomove(data + uio->uio_offset, (int)n, uio); + return (error); + } + fsize = le64toh(node->fentry->inf_len); udfmp = node->udfmp; do { @@ -799,10 +831,10 @@ udf_strategy(struct vop_strategy_args *a struct buf *bp; struct vnode *vp; struct udf_node *node; + struct bufobj *bo; int maxsize; daddr_t sector; - struct bufobj *bo; - int multiplier; + int error; bp = a->a_bp; vp = a->a_vp; @@ -813,24 +845,23 @@ udf_strategy(struct vop_strategy_args *a * Files that are embedded in the fentry don't translate well * to a block number. Reject. */ - if (udf_bmap_internal(node, bp->b_lblkno * node->udfmp->bsize, - §or, &maxsize)) { + error = udf_bmap_internal(node, lblktosize(node->udfmp, bp->b_lblkno), §or, &maxsize); + if (error) { + if (error == UDF_INVALID_BMAP) + printf("udf_strategy called for file with data embedded in fentry\n"); clrbuf(bp); bp->b_blkno = -1; + bufdone(bp); + return (0); } - - /* bmap gives sector numbers, bio works with device blocks */ - multiplier = node->udfmp->bsize / DEV_BSIZE; - bp->b_blkno = sector * multiplier; - - } - if ((long)bp->b_blkno == -1) { - bufdone(bp); - return (0); + /* udf_bmap_internal gives sector numbers, bio works with device blocks */ + bp->b_blkno = sector << (node->udfmp->bshift - DEV_BSHIFT); } + bo = node->udfmp->im_bo; bp->b_iooffset = dbtob(bp->b_blkno); BO_STRATEGY(bo, bp); + return (0); } @@ -851,17 +882,43 @@ udf_bmap(struct vop_bmap_args *a) if (a->a_runb) *a->a_runb = 0; - error = udf_bmap_internal(node, a->a_bn * node->udfmp->bsize, &lsector, - &max_size); + /* + * UDF_INVALID_BMAP means data embedded into fentry, this is an internal + * error that should not be propagated to calling code. + * Most obvious mapping for this error is EOPNOTSUPP as we can not truly + * translate block numbers in this case. + * Incidentally, this return code will make vnode pager to use VOP_READ + * to get data for mmap-ed pages and udf_read knows how to do the right + * thing for this kind of files. + */ + error = udf_bmap_internal(node, a->a_bn << node->udfmp->bshift, &lsector, &max_size); + if (error == UDF_INVALID_BMAP) + return EOPNOTSUPP; if (error) return (error); /* Translate logical to physical sector number */ *a->a_bnp = lsector << (node->udfmp->bshift - DEV_BSHIFT); - /* Punt on read-ahead for now */ - if (a->a_runp) - *a->a_runp = 0; + /* + * Determine maximum number of readahead blocks following the + * requested block. + */ + if (a->a_runp) { + int nblk; + + nblk = (max_size >> node->udfmp->bshift) - 1; + if (nblk <= 0) + *a->a_runp = 0; + else if (nblk >= (MAXBSIZE >> node->udfmp->bshift)) + *a->a_runp = (MAXBSIZE >> node->udfmp->bshift) - 1; + else + *a->a_runp = nblk; + } + + if (a->a_runb) { + *a->a_runb = 0; + } return (0); } @@ -1060,7 +1117,10 @@ udf_readatoffset(struct udf_node *node, udfmp = node->udfmp; - *bp = NULL; + /* + * This call is made not only to detect UDF_INVALID_BMAP case, + * max_size is used as an ad-hoc read-ahead hint for "normal" case. + */ error = udf_bmap_internal(node, offset, §or, &max_size); if (error == UDF_INVALID_BMAP) { /* @@ -1078,9 +1138,10 @@ udf_readatoffset(struct udf_node *node, /* Adjust the size so that it is within range */ if (*size == 0 || *size > max_size) *size = max_size; - *size = min(*size, MAXBSIZE); + *size = min(*size, MAXBSIZE - blkoff(udfmp, offset)); - if ((error = udf_readlblks(udfmp, sector, *size + (offset & udfmp->bmask), bp))) { + *bp = NULL; + if ((error = bread(node->i_vnode, lblkno(udfmp, offset), (*size + blkoff(udfmp, offset) + udfmp->bmask) & ~udfmp->bmask, NOCRED, bp))) { printf("warning: udf_readlblks returned error %d\n", error); /* note: *bp may be non-NULL */ return (error); --- udf_vfsops.c.orig 2007-03-13 03:50:24.000000000 +0200 +++ udf_vfsops.c 2008-02-05 01:29:10.000000000 +0200 @@ -330,6 +330,11 @@ udf_mountfs(struct vnode *devvp, struct bo = &devvp->v_bufobj; + if (devvp->v_rdev->si_iosize_max != 0) + mp->mnt_iosize_max = devvp->v_rdev->si_iosize_max; + if (mp->mnt_iosize_max > MAXPHYS) + mp->mnt_iosize_max = MAXPHYS; + /* XXX: should be M_WAITOK */ MALLOC(udfmp, struct udf_mnt *, sizeof(struct udf_mnt), M_UDFMOUNT, M_NOWAIT | M_ZERO); --------------090706050706020808090509--