From owner-freebsd-fs@FreeBSD.ORG Mon Nov 8 17:41:56 2010
Date: Mon, 8 Nov 2010 23:13:32 +0530
From: Aditya Sarawgi
To: Doug Barton
Message-ID: <20101108174327.GC2066@earth>
References: <20100929031825.L683@besplex.bde.org>
	<20100929084801.M948@besplex.bde.org> <20100929041650.GA1553@aditya>
	<201009290917.05269.jhb@freebsd.org> <20100929202526.GA1564@aditya>
	<4CD0A3E8.4080304@FreeBSD.org> <4CD201AE.3040409@FreeBSD.org>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BXVAT5kNtrzKuDFl"
Content-Disposition: inline
In-Reply-To: <4CD201AE.3040409@FreeBSD.org>
User-Agent: Mutt/1.5.20 (2009-06-14)
Cc: fs@freebsd.org
Subject: Re: ext2fs now extremely slow

--BXVAT5kNtrzKuDFl
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Wed, Nov 03, 2010 at 05:43:26PM -0700, Doug Barton wrote:
> On 11/03/10 16:38, Aditya Sarawgi wrote:
> > On Wed, Nov 3, 2010 at 5:21 AM, Doug Barton wrote:
>
> >> Is anything happening with this? I recently built a new system that is
> >> multi-booting windows, freebsd, and ubuntu. I chose ext[23]fs for my /home
> >> partition so that I could share unix'y stuff between freebsd and linux, but
> >> I'm having both performance and stability problems, and today (fortunately
> >> for the first time, and fortunately recoverable) I had actual data loss. I'm
> >> happy to be a guinea pig for new code if people are reasonably sure that it
> >> will help, but if the situation doesn't improve I will have to reformat.
> >>
> >
> > Are you suffering from these problems on CURRENT ?
>
> Yes.
>
> > Can you please elaborate
> > on the performance and stability issue you are facing ? Any specific scenario ?
>
> What I did was create a fairly large (37G) /home and put all the stuff
> I'd like to have access to from all 3 systems, like svn, my ports tree,
> etc. I also ended up putting my obj directory there because I created my
> /usr/local a little smaller than I should have and after installing
> gnome I ran out of room. :)
>
> I should also point out that this is on a brand new desktop system that
> was donated by a FreeBSD user.
> It's a C2D running at 3.17G, 4G RAM, and
> a fast 250G disk. I'm running amd64 -current. Everything disk intensive
> (updating ports with csup, updating my svn trees, etc.) is slower on
> this system than it was on my laptop where all the same stuff was on
> UFS2. Bruce's message that started this thread alluded to the problems,
> my experience has been similar.
>
> Regarding stability, sometimes (but not always) when I'm doing the above
> listed disk-intensive things on an otherwise idle system I've had the
> system lock up. Not panic, not reboot, just wedge. I'm running X when
> this happens, so I'm not 100% sure that the disk activity is the
> culprit, but it seems very suspicious. Yesterday was a very bad day, I
> had to do 3 tries to get all the way through a buildworld/kernel, mostly
> because the last 2 crashes resulted in my /usr/src (which is actually
> /home/svn/head) and /usr/obj (/home/obj-9) directories getting corrupted
> respectively. Today (running r214694) has actually been quite good,
> although I haven't tried a buildworld yet.
>

I am not sure ext2fs is the right fit for this use case.

> > You can test Zheng's preallocation patch for ext2fs, there is a
> > serious lack of testers for that.
>
> I would be happy to do that, but my reading of this thread last month
> didn't produce a clear "try this version of the patch" neon sign.
> Various people referred to suggestions, updates, etc. If someone could
> provide a URL for the right patch to try, as well as a suggestion for
> benchmarking methodology, I'll be glad to do so.
>

I have attached the patch. Some basic testing, such as copying files and
untarring archives and comparing the results with the existing ext2fs, will
do. If you want to do full-fledged benchmarking, I would suggest iozone,
blogbench, dbench, etc.

> >> On a related note, is there any way to use the journaling features of ext3fs
> >> in FreeBSD? When I boot the linux partition it's treating the fs as ext3fs,
> >> but AFAICS we only have ext2fs capabilities.
> >>
> >
> > Journaling is difficult to bring in, especially if one is planning to
> > have a BSDL version.
>
> Ok. I can live with accessing the stuff as ext2 from FreeBSD, and I can
> even live with a minor performance penalty. What I can't live with is
> instability and/or data corruption; and it should go without saying that
> our users should not have to live with that either.
>

We were planning to use gjournal, but it is too tightly tied to UFS and
would not be compatible with ext3 journaling. Haiku seems to have
journaling for ext2fs, but it depends heavily on BFS journaling. Bringing
in that code is not an option here, since Haiku has its own separate
journaling layer.

>
> Thanks for the response,
>
> Doug
>
> --
>
> 	Nothin' ever doesn't change, but nothin' changes much.
> 			-- OK Go
>
> 	Breadth of IT experience, and depth of knowledge in the DNS.
> 	Yours for the right price.
> 	:) http://SupersetSolutions.com/
>

--BXVAT5kNtrzKuDFl
Content-Type: text/x-diff; charset=us-ascii
Content-Disposition: attachment; filename="ext2fs_prealloc.diff"

diff -urN /usr/src/sys/fs/ext2fs/ext2_alloc.c new/ext2_alloc.c
--- /usr/src/sys/fs/ext2fs/ext2_alloc.c	2010-01-14 22:30:54.000000000 +0800
+++ new/ext2_alloc.c	2010-08-19 02:47:29.000000000 +0800
@@ -50,6 +50,9 @@
 #include
 #include
 #include
+#include
+
+#define phy_blk(cg, fs) (((cg) * (fs->e2fs->e2fs_fpg)) + fs->e2fs->e2fs_first_dblock)
 
 static daddr_t ext2_alloccg(struct inode *, int, daddr_t, int);
 static u_long ext2_dirpref(struct inode *);
@@ -59,37 +62,524 @@
 						int));
 static daddr_t ext2_nodealloccg(struct inode *, int, daddr_t, int);
 static daddr_t ext2_mapsearch(struct m_ext2fs *, char *, daddr_t);
+
+/* For reservation window */
+static u_long ext2_alloc_blk(struct inode *, int, struct buf *, int32_t, struct ext2_rsv_win *);
+static int ext2_alloc_new_rsv(struct inode *, int, struct buf *, int32_t);
+static int ext2_bpref_in_rsv(struct ext2_rsv_win *, int32_t);
+static int ext2_find_rsv(struct ext2_rsv_win *, struct ext2_rsv_win *,
+			struct m_ext2fs *, int32_t, int);
+static void ext2_remove_rsv_win(struct m_ext2fs *, struct ext2_rsv_win *);
+static u_long ext2_rsvalloc(struct m_ext2fs *, struct inode *,
+			int, struct buf *, int32_t, int);
+static daddr_t ext2_search_next_block(struct m_ext2fs *, char *, int, int);
+static struct ext2_rsv_win *ext2_search_rsv(struct ext2_rsv_win_tree *, int32_t);
+
+RB_GENERATE(ext2_rsv_win_tree, ext2_rsv_win, rsv_link, ext2_rsv_win_cmp);
+
 /*
  * Allocate a block in the file system.
  *
- * A preference may be optionally specified. If a preference is given
- * the following hierarchy is used to allocate a block:
- * 1) allocate the requested block.
- * 2) allocate a rotationally optimal block in the same cylinder.
- * 3) allocate a block in the same cylinder group.
- * 4) quadradically rehash into other cylinder groups, until an
- * If no block preference is given the following hierarchy is used - * to allocate a block: - * 1) allocate a block in the cylinder group that contains the - * inode for the file. - * 2) quadradically rehash into other cylinder groups, until an - * available block is located. - * - * A preference may be optionally specified. If a preference is given - * the following hierarchy is used to allocate a block: - * 1) allocate the requested block. - * 2) allocate a rotationally optimal block in the same cylinder. - * 3) allocate a block in the same cylinder group. - * 4) quadradically rehash into other cylinder groups, until an - * available block is located. - * If no block preference is given the following hierarchy is used - * to allocate a block: - * 1) allocate a block in the cylinder group that contains the - * inode for the file. - * 2) quadradically rehash into other cylinder groups, until an - * available block is located. + * By given preference: + * Check whether inode has a reservation window and preference + * is within it and try to allocate a free block from + * this reservation window. + * If not, traverse RB tree to find a place, which is not in + * any window and insert it to RB tree to try to allocate a + * free block again. + * If it fails, try to allocate a free block in other cylinder + * groups without preference. + */ + +/* + * Allocate a free block. + * + * First check whether reservation window is used. + * If reservation window is used, try to allocate a free + * block from the reservation window. If it fails, traverse + * the bitmap to find a free block. + * If reservation window is not used, try to allocate + * a free block by bpref. If it fails, traverse the bitmap + * to find a free block. 
 */
+static u_long
+ext2_alloc_blk(struct inode *ip, int cg, struct buf *bp,
+    int32_t bpref, struct ext2_rsv_win *rp)
+{
+	struct m_ext2fs *fs;
+	struct ext2mount *ump;
+	int bno, start, end;
+	char *bbp;
+
+	fs = ip->i_e2fs;
+	ump = ip->i_ump;
+	bbp = (char *)bp->b_data;
+
+	if (fs->e2fs_gd[cg].ext2bgd_nbfree == 0)
+		return (0);
+
+	if (bpref < 0)
+		bpref = 0;
+
+	/* Check whether it use reservation window */
+	if (rp != NULL) {
+		/*
+		 * If window's start is not in this cylinder group,
+		 * try to allocate from the beginning, otherwise
+		 * try to allocate from the beginning of the
+		 * window.
+		 */
+		if (dtog(fs, rp->rsv_start) < cg)
+			start = 0;
+		else
+			start = rp->rsv_start;
+
+		/*
+		 * If window's end crosses the end of this group,
+		 * set end variable to the end of this group.
+		 * Otherwise, set it to the window's end.
+		 */
+		if (dtog(fs, rp->rsv_end) > cg)
+			end = phy_blk(cg + 1, fs) - 1;
+		else
+			end = rp->rsv_end;
+
+		/* If preference block is within the window, try to allocate it. */
+		if (start <= bpref && bpref <= end) {
+			bpref = dtogd(fs, bpref);
+			if (isclr(bbp, bpref)) {
+				rp->rsv_alloc_hit++;
+				bno = bpref;
+				goto gotit;
+			}
+		} else
+			if (dtog(fs, rp->rsv_start) == cg)
+				bpref = dtogd(fs, rp->rsv_start);
+			else
+				bpref = 0;
+	} else {
+		if (dtog(fs, bpref) != cg)
+			bpref = 0;
+		if (bpref != 0) {
+			bpref = dtogd(fs, bpref);
+			if (isclr(bbp, bpref)) {
+				bno = bpref;
+				goto gotit;
+			}
+		}
+	}
+
+	bno = ext2_mapsearch(fs, bbp, bpref);
+	if (bno < 0)
+		return (0);
+
+gotit:
+	setbit(bbp, (daddr_t)bno);
+	EXT2_LOCK(ump);
+	fs->e2fs->e2fs_fbcount--;
+	fs->e2fs_gd[cg].ext2bgd_nbfree--;
+	fs->e2fs_fmod = 1;
+	EXT2_UNLOCK(ump);
+	bdwrite(bp);
+	bno = phy_blk(cg, fs) + bno;
+	return (bno);
+}
+
+/*
+ * Initialize reservation window per inode.
+ */
+void
+ext2_init_rsv(struct inode *ip)
+{
+	struct ext2_rsv_win *rp;
+
+	rp = malloc(sizeof(struct ext2_rsv_win),
+	    M_EXT2NODE, M_WAITOK | M_ZERO);
+
+	/*
+	 * If malloc failed, we just do not use the
+	 * reservation window mechanism.
+	 */
+	if (rp == NULL)
+		return;
+
+	rp->rsv_start = EXT2_RSV_NOT_ALLOCATED;
+	rp->rsv_end = EXT2_RSV_NOT_ALLOCATED;
+
+	rp->rsv_goal_size = EXT2_RSV_DEFAULT_RESERVE_BLKS;
+	rp->rsv_alloc_hit = 0;
+
+	ip->i_rsv = rp;
+}
+
+/*
+ * Discard reservation window.
+ *
+ * It is called during the following situations:
+ * 1. free an inode
+ * 2. sync inode
+ * 3. truncate a file
+ */
+void
+ext2_discard_rsv(struct inode *ip)
+{
+	struct ext2_rsv_win *rp;
+
+	if (ip->i_rsv == NULL)
+		return;
+
+	rp = ip->i_rsv;
+
+	/* If reservation window is empty, nothing to do */
+	if (rp->rsv_end == EXT2_RSV_NOT_ALLOCATED)
+		return;
+
+	EXT2_TREE_LOCK(ip->i_e2fs);
+	ext2_remove_rsv_win(ip->i_e2fs, rp);
+	EXT2_TREE_UNLOCK(ip->i_e2fs);
+	rp->rsv_goal_size = EXT2_RSV_DEFAULT_RESERVE_BLKS;
+}
+
+/*
+ * Remove a ext2_rsv_win structure from RB tree.
+ */
+static void
+ext2_remove_rsv_win(struct m_ext2fs *fs, struct ext2_rsv_win *rp)
+{
+	RB_REMOVE(ext2_rsv_win_tree, fs->e2fs_rsv_tree, rp);
+	rp->rsv_start = EXT2_RSV_NOT_ALLOCATED;
+	rp->rsv_end = EXT2_RSV_NOT_ALLOCATED;
+	rp->rsv_alloc_hit = 0;
+}
+
+/*
+ * Check bpref is in the reservation window.
+ */
+static int
+ext2_bpref_in_rsv(struct ext2_rsv_win *rp, int32_t bpref)
+{
+	if (bpref >= 0 && (bpref < rp->rsv_start || bpref > rp->rsv_end))
+		return (0);
+
+	return (1);
+}
+
+/*
+ * Search a tree node from RB tree. It includes the bpref or
+ * the previous one if bpref is not in any window.
+ */
+static struct ext2_rsv_win *
+ext2_search_rsv(struct ext2_rsv_win_tree *root, int32_t start)
+{
+	struct ext2_rsv_win *prev, *next;
+
+	if (RB_EMPTY(root))
+		return (NULL);
+
+	next = RB_ROOT(root);
+	do {
+		prev = next;
+		if (start < next->rsv_start)
+			next = RB_LEFT(next, rsv_link);
+		else if (start > next->rsv_end)
+			next = RB_RIGHT(next, rsv_link);
+		else
+			return (next);
+	} while (next != NULL);
+
+	if (prev->rsv_start > start) {
+		next = RB_PREV(ext2_rsv_win_tree, root, prev);
+		if (next != NULL)
+			prev = next;
+	}
+
+	return (prev);
+}
+
+/*
+ * Find a reservation window by given range from start to
+ * the end of this cylinder group.
+ */
+static int
+ext2_find_rsv(struct ext2_rsv_win *search, struct ext2_rsv_win *rp,
+    struct m_ext2fs *fs, int32_t start, int cg)
+{
+	struct ext2_rsv_win *rsv, *prev;
+	int32_t cur;
+	int size = rp->rsv_goal_size;
+
+	if (search == NULL) {
+		rp->rsv_start = start & ~7;
+		rp->rsv_end = start + size - 1;
+		rp->rsv_alloc_hit = 0;
+
+		RB_INSERT(ext2_rsv_win_tree, fs->e2fs_rsv_tree, rp);
+
+		return (0);
+	}
+
+	/*
+	 * Make the start of reservation window byte-aligned
+	 * in order to can find a free block with bit operations
+	 * in the ext2_search_next_block() function.
+	 */
+	cur = start & ~7;
+	rsv = search;
+	prev = NULL;
+
+	while (1) {
+		if (cur <= rsv->rsv_end)
+			cur = rsv->rsv_end + 1;
+
+		if (dtog(fs, cur) != cg)
+			return (-1);
+
+		prev = rsv;
+		rsv = RB_NEXT(ext2_rsv_win_tree, fs->e2fs_rsv_tree, rsv);
+
+		if (rsv == NULL)
+			break;
+
+		if (cur + size <= rsv->rsv_start)
+			break;
+	}
+
+	if (prev != rp && rp->rsv_end != EXT2_RSV_NOT_ALLOCATED)
+		ext2_remove_rsv_win(fs, rp);
+
+	rp->rsv_start = cur;
+	rp->rsv_end = cur + size - 1;
+	rp->rsv_alloc_hit = 0;
+
+	if (prev != rp)
+		RB_INSERT(ext2_rsv_win_tree, fs->e2fs_rsv_tree, rp);
+
+	return (0);
+}
+
+/*
+ * Find a free block by given range from bpref to
+ * the end of this cylinder group.
+ */
+static daddr_t
+ext2_search_next_block(struct m_ext2fs *fs, char *bbp, int bpref, int cg)
+{
+	daddr_t bno;
+	int start, loc, len, map, i;
+
+	start = bpref / NBBY;
+	len = howmany(fs->e2fs->e2fs_fpg, NBBY) - start;
+	loc = skpc(0xff, len, &bbp[start]);
+	if (loc == 0)
+		return (-1);
+
+	i = start + len - loc;
+	map = bbp[i];
+	bno = i * NBBY;
+	for (i = 1; i < (1 << NBBY); i <<= 1, bno++) {
+		if ((map & i) == 0)
+			return (bno);
+	}
+
+	return (-1);
+}
+
+/*
+ * Allocate a new reservation window.
+ */
+static int
+ext2_alloc_new_rsv(struct inode *ip, int cg, struct buf *bp, int32_t bpref)
+{
+	struct m_ext2fs *fs;
+	struct ext2_rsv_win *rp, *search;
+	char *bbp;
+	int start, size, ret;
+
+	fs = ip->i_e2fs;
+	rp = ip->i_rsv;
+	bbp = bp->b_data;
+	size = rp->rsv_goal_size;
+
+	if (bpref <= 0)
+		start = phy_blk(cg, fs);
+	else
+		start = bpref;
+
+	/* Dynamically increase the size of window */
+	if (rp->rsv_end != EXT2_RSV_NOT_ALLOCATED) {
+		if (rp->rsv_alloc_hit >
+		    ((rp->rsv_end - rp->rsv_start + 1) / 2)) {
+			size = size * 2;
+			if (size > EXT2_RSV_MAX_RESERVE_BLKS)
+				size = EXT2_RSV_MAX_RESERVE_BLKS;
+			rp->rsv_goal_size = size;
+		}
+	}
+
+	EXT2_TREE_LOCK(fs);
+
+	search = ext2_search_rsv(fs->e2fs_rsv_tree, start);
+
+repeat:
+	ret = ext2_find_rsv(search, rp, fs, start, cg);
+	if (ret < 0) {
+		if (rp->rsv_end != EXT2_RSV_NOT_ALLOCATED)
+			ext2_remove_rsv_win(fs, rp);
+		EXT2_TREE_UNLOCK(fs);
+		return (-1);
+	}
+	EXT2_TREE_UNLOCK(fs);
+
+	start = dtogd(fs, rp->rsv_start);
+	start = ext2_search_next_block(fs, bbp, start, cg);
+	if (start < 0) {
+		EXT2_TREE_LOCK(fs);
+		if (rp->rsv_end != EXT2_RSV_NOT_ALLOCATED)
+			ext2_remove_rsv_win(fs, rp);
+		EXT2_TREE_UNLOCK(fs);
+		return (-1);
+	}
+
+	start = phy_blk(cg, fs) + start;
+	if (start >= rp->rsv_start && start <= rp->rsv_end)
+		return (0);
+
+	search = rp;
+	EXT2_TREE_LOCK(fs);
+	goto repeat;
+}
+
+/*
+ * Allocate a free block from reservation window.
+ */
+static u_long
+ext2_rsvalloc(struct m_ext2fs *fs, struct inode *ip, int cg,
+    struct buf *bp, int32_t bpref, int size)
+{
+	struct ext2_rsv_win *rp;
+	int ret;
+
+	rp = ip->i_rsv;
+	if (rp == NULL)
+		return (ext2_alloc_blk(ip, cg, bp, bpref, NULL));
+
+	if (rp->rsv_end == EXT2_RSV_NOT_ALLOCATED ||
+	    !ext2_bpref_in_rsv(rp, bpref)) {
+		ret = ext2_alloc_new_rsv(ip, cg, bp, bpref);
+		if (ret < 0)
+			return (0);
+	}
+
+	return (ext2_alloc_blk(ip, cg, bp, bpref, rp));
+}
+
+/*
+ * Allocate a block using reservation window in ext2 file system.
+ *
+ * NOTE: This function will replace the ext2_alloc() function.
+ */
+int
+ext2_alloc_rsv(struct inode *ip, int32_t lbn, int32_t bpref,
+    int size, struct ucred *cred, int32_t *bnp)
+{
+	struct m_ext2fs *fs;
+	struct ext2mount *ump;
+	struct buf *bp;
+	int32_t bno = 0;
+	int i, cg, error;
+
+	*bnp = 0;
+	fs = ip->i_e2fs;
+	ump = ip->i_ump;
+	mtx_assert(EXT2_MTX(ump), MA_OWNED);
+
+	if (size == fs->e2fs_bsize && fs->e2fs->e2fs_fbcount == 0)
+		goto nospace;
+	if (cred->cr_uid != 0 &&
+	    fs->e2fs->e2fs_fbcount < fs->e2fs->e2fs_rbcount)
+		goto nospace;
+
+	if (bpref >= fs->e2fs->e2fs_bcount)
+		bpref = 0;
+	if (bpref == 0)
+		cg = ino_to_cg(fs, ip->i_number);
+	else
+		cg = dtog(fs, bpref);
+
+	/* If cg has some free blocks, then try to allocate a free block from this cg */
+	if (fs->e2fs_gd[cg].ext2bgd_nbfree > 0) {
+		/* Read block bitmap from buffer */
+		EXT2_UNLOCK(ump);
+		error = bread(ip->i_devvp,
+		    fsbtodb(fs, fs->e2fs_gd[cg].ext2bgd_b_bitmap),
+		    (int)fs->e2fs_bsize, NOCRED, &bp);
+		if (error) {
+			brelse(bp);
+			goto ioerror;
+		}
+
+		EXT2_RSV_LOCK(ip);
+		/* Try to allocate from reservation window */
+		bno = ext2_rsvalloc(fs, ip, cg, bp, bpref, size);
+		EXT2_RSV_UNLOCK(ip);
+		if (bno > 0)
+			goto allocated;
+
+		brelse(bp);
+		EXT2_LOCK(ump);
+	}
+
+	/* Just need to try to allocate a free block from rest groups. */
+	cg = (cg + 1) % fs->e2fs_gcount;
+	for (i = 1; i < fs->e2fs_gcount; i++) {
+		if (fs->e2fs_gd[cg].ext2bgd_nbfree > 0) {
+			/* Read block bitmap from buffer */
+			EXT2_UNLOCK(ump);
+			error = bread(ip->i_devvp,
+			    fsbtodb(fs, fs->e2fs_gd[cg].ext2bgd_b_bitmap),
+			    (int)fs->e2fs_bsize, NOCRED, &bp);
+			if (error) {
+				brelse(bp);
+				goto ioerror;
+			}
+
+			EXT2_RSV_LOCK(ip);
+			bno = ext2_rsvalloc(fs, ip, cg, bp, -1, size);
+			EXT2_RSV_UNLOCK(ip);
+			if (bno > 0)
+				goto allocated;
+
+			brelse(bp);
+			EXT2_LOCK(ump);
+		}
+
+		cg++;
+		if (cg == fs->e2fs_gcount)
+			cg = 0;
+	}
+
+allocated:
+	if (bno > 0) {
+		ip->i_next_alloc_block = lbn;
+		ip->i_next_alloc_goal = bno;
+
+		ip->i_blocks += btodb(fs->e2fs_bsize);
+		ip->i_flag |= IN_CHANGE | IN_UPDATE;
+		*bnp = bno;
+		return (0);
+	}
+
+nospace:
+	EXT2_UNLOCK(ump);
+	ext2_fserr(fs, cred->cr_uid, "file system full");
+	uprintf("\n%s: write failed, file system is full\n", fs->e2fs_fsmnt);
+	return (ENOSPC);
+
+ioerror:
+	ext2_fserr(fs, cred->cr_uid, "file system IO error");
+	uprintf("\n%s: write failed, file system IO error\n", fs->e2fs_fsmnt);
+	return (EIO);
+}
 
 int
 ext2_alloc(ip, lbn, bpref, size, cred, bnp)
@@ -923,9 +1413,11 @@
 		start = 0;
 		loc = skpc(0xff, len, &bbp[start]);
 		if (loc == 0) {
-			printf("start = %d, len = %d, fs = %s\n",
-			    start, len, fs->e2fs_fsmnt);
-			panic("ext2fs_alloccg: map corrupted");
+			/* XXX: just for reservation window */
+			return -1;
+			/*printf("start = %d, len = %d, fs = %s\n",*/
+			    /*start, len, fs->e2fs_fsmnt);*/
+			/*panic("ext2fs_alloccg: map corrupted");*/
 			/* NOTREACHED */
 		}
 	}
diff -urN /usr/src/sys/fs/ext2fs/ext2_balloc.c new/ext2_balloc.c
--- /usr/src/sys/fs/ext2fs/ext2_balloc.c	2010-01-14 22:30:54.000000000 +0800
+++ new/ext2_balloc.c	2010-08-19 02:47:29.000000000 +0800
@@ -49,6 +49,7 @@
 #include
 #include
 #include
+#include
 /*
  * Balloc defines the structure of file system storage
  * by allocating the physical blocks on a device given
@@ -78,6 +79,9 @@
 	fs = ip->i_e2fs;
 	ump = ip->i_ump;
 
+	if (ip->i_rsv == NULL)
+		ext2_init_rsv(ip);
+
 	/*
 	 * check if this is a sequential block allocation.
 	 * If so, increment next_alloc fields to allow ext2_blkpref
@@ -136,9 +140,9 @@
 		else
 			nsize = fs->e2fs_bsize;
 		EXT2_LOCK(ump);
-		error = ext2_alloc(ip, lbn,
-		    ext2_blkpref(ip, lbn, (int)lbn, &ip->i_db[0], 0),
-		    nsize, cred, &newb);
+		error = ext2_alloc_rsv(ip, lbn,
+		    ext2_blkpref(ip, lbn, (int)lbn, &ip->i_db[0], 0),
+		    nsize, cred, &newb);
 		if (error)
 			return (error);
 		bp = getblk(vp, lbn, nsize, 0, 0, 0);
@@ -170,9 +174,9 @@
 	EXT2_LOCK(ump);
 	pref = ext2_blkpref(ip, lbn, indirs[0].in_off +
 	    EXT2_NDIR_BLOCKS, &ip->i_db[0], 0);
-	if ((error = ext2_alloc(ip, lbn, pref,
-	    (int)fs->e2fs_bsize, cred, &newb)))
-		return (error);
+	if ((error = ext2_alloc_rsv(ip, lbn, pref,
+	    (int)fs->e2fs_bsize, cred, &newb)))
+		return (error);
 	nb = newb;
 	bp = getblk(vp, indirs[1].in_lbn, fs->e2fs_bsize, 0, 0, 0);
 	bp->b_blkno = fsbtodb(fs, newb);
@@ -211,7 +215,7 @@
 		if (pref == 0)
 			pref = ext2_blkpref(ip, lbn, indirs[i].in_off, bap,
 			    bp->b_lblkno);
-		error = ext2_alloc(ip, lbn, pref, (int)fs->e2fs_bsize, cred, &newb);
+		error = ext2_alloc_rsv(ip, lbn, pref, (int)fs->e2fs_bsize, cred, &newb);
 		if (error) {
 			brelse(bp);
 			return (error);
@@ -250,8 +254,8 @@
 		EXT2_LOCK(ump);
 		pref = ext2_blkpref(ip, lbn, indirs[i].in_off, &bap[0],
 		    bp->b_lblkno);
-		if ((error = ext2_alloc(ip,
-		    lbn, pref, (int)fs->e2fs_bsize, cred, &newb)) != 0) {
+		if ((error = ext2_alloc_rsv(ip, lbn, pref,
+		    (int)fs->e2fs_bsize, cred, &newb)) != 0) {
 			brelse(bp);
 			return (error);
 		}
diff -urN /usr/src/sys/fs/ext2fs/ext2_inode.c new/ext2_inode.c
--- /usr/src/sys/fs/ext2fs/ext2_inode.c	2010-01-14 22:30:54.000000000 +0800
+++ new/ext2_inode.c	2010-08-19 02:47:29.000000000 +0800
@@ -52,6 +52,7 @@
 #include
 #include
 #include
+#include
 
 static int ext2_indirtrunc(struct inode *, int32_t, int32_t, int32_t, int,
 	    long *);
@@ -153,6 +154,11 @@
 	}
 	fs = oip->i_e2fs;
 	osize = oip->i_size;
+
+	EXT2_RSV_LOCK(oip);
+	ext2_discard_rsv(oip);
+	EXT2_RSV_UNLOCK(oip);
+
 	/*
 	 * Lengthen the size of the file. We must ensure that the
 	 * last byte of the file is allocated. Since the smallest
@@ -484,6 +490,10 @@
 	if (prtactive && vrefcnt(vp) != 0)
 		vprint("ext2_inactive: pushing active", vp);
 
+	EXT2_RSV_LOCK(ip);
+	ext2_discard_rsv(ip);
+	EXT2_RSV_UNLOCK(ip);
+
 	/*
 	 * Ignore inodes related to stale file handles.
 	 */
@@ -525,11 +535,21 @@
 	if (prtactive && vrefcnt(vp) != 0)
 		vprint("ufs_reclaim: pushing active", vp);
 	ip = VTOI(vp);
+
 	if (ip->i_flag & IN_LAZYMOD) {
 		ip->i_flag |= IN_MODIFIED;
 		ext2_update(vp, 0);
 	}
 	vfs_hash_remove(vp);
+
+	EXT2_RSV_LOCK(ip);
+	if (ip->i_rsv != NULL) {
+		free(ip->i_rsv, M_EXT2NODE);
+		ip->i_rsv = NULL;
+	}
+	EXT2_RSV_UNLOCK(ip);
+	mtx_destroy(&ip->i_rsv_lock);
+
 	free(vp->v_data, M_EXT2NODE);
 	vp->v_data = 0;
 	vnode_destroy_vobject(vp);
diff -urN /usr/src/sys/fs/ext2fs/ext2_rsv_win.h new/ext2_rsv_win.h
--- /usr/src/sys/fs/ext2fs/ext2_rsv_win.h	1970-01-01 08:00:00.000000000 +0800
+++ new/ext2_rsv_win.h	2010-08-19 02:47:29.000000000 +0800
@@ -0,0 +1,78 @@
+/*-
+ * Copyright (c) 2010, 2010 Zheng Liu
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * $FreeBSD: src/sys/fs/ext2fs/ext2_rsv_win.h,v 0.1 2010/05/08 12:41:51 lz Exp $
+ */
+#ifndef _FS_EXT2FS_EXT2_RSV_WIN_H_
+#define _FS_EXT2FS_EXT2_RSV_WIN_H_
+
+#include
+
+#define EXT2_RSV_DEFAULT_RESERVE_BLKS 8
+#define EXT2_RSV_MAX_RESERVE_BLKS 1024
+#define EXT2_RSV_NOT_ALLOCATED 0
+
+#define EXT2_RSV_LOCK(ip) mtx_lock(&ip->i_rsv_lock)
+#define EXT2_RSV_UNLOCK(ip) mtx_unlock(&ip->i_rsv_lock)
+
+#define EXT2_TREE_LOCK(fs) mtx_lock(&fs->e2fs_rsv_lock);
+#define EXT2_TREE_UNLOCK(fs) mtx_unlock(&fs->e2fs_rsv_lock);
+
+/*
+ * Reservation window entry
+ */
+struct ext2_rsv_win {
+	RB_ENTRY(ext2_rsv_win) rsv_link; /* RB tree links */
+
+	int32_t rsv_goal_size; /* Default reservation window size */
+	int32_t rsv_alloc_hit; /* Number of allocated windows */
+
+	int32_t rsv_start; /* First bytes of window */
+	int32_t rsv_end; /* End bytes of window */
+};
+
+RB_HEAD(ext2_rsv_win_tree, ext2_rsv_win);
+
+static __inline int
+ext2_rsv_win_cmp(const struct ext2_rsv_win *a,
+    const struct ext2_rsv_win *b)
+{
+	if (a->rsv_start < b->rsv_start)
+		return (-1);
+	if (a->rsv_start == b->rsv_start)
+		return (0);
+
+	return (1);
+}
+RB_PROTOTYPE(ext2_rsv_win_tree, ext2_rsv_win, rsv_link, ext2_rsv_win_cmp);
+
+/* predefine */
+struct inode;
+/* ext2_alloc.c */
+void ext2_init_rsv(struct inode *ip);
+void ext2_discard_rsv(struct inode *ip);
+int ext2_alloc_rsv(struct inode *, int32_t, int32_t, int, struct ucred *, int32_t *);
+
+#endif /* !_FS_EXT2FS_EXT2_RSV_WIN_H_ */
diff -urN /usr/src/sys/fs/ext2fs/ext2_vfsops.c new/ext2_vfsops.c
--- /usr/src/sys/fs/ext2fs/ext2_vfsops.c	2010-01-14 22:30:54.000000000 +0800
+++ new/ext2_vfsops.c	2010-08-19 02:47:29.000000000 +0800
@@ -1,4 +1,4 @@
-/*-
+/*
  * modified for EXT2FS support in Lites 1.1
  *
  * Aug 1995, Godmar Back (gback@cs.utah.edu)
@@ -61,6 +61,7 @@
 #include
 #include
 #include
+#include
 
 static int ext2_flushfiles(struct mount *mp, int flags, struct thread *td);
 static int ext2_mountfs(struct vnode *, struct mount *);
@@ -95,9 +96,9 @@
 static int compute_sb_data(struct vnode * devvp,
 		    struct ext2fs * es, struct m_ext2fs * fs);
 
-static const char *ext2_opts[] = { "from", "export", "acls", "noexec",
-    "noatime", "union", "suiddir", "multilabel", "nosymfollow",
-    "noclusterr", "noclusterw", "force", NULL };
+static const char *ext2_opts[] = { "acls", "async", "export", "force",
+    "from", "multilabel", "noatime", "noclusterr", "noclusterw",
+    "noexec", "nosymfollow", "suiddir", "union", NULL };
 
 /*
  * VFS Operations.
@@ -581,6 +582,14 @@
 	if ((error = compute_sb_data(devvp, ump->um_e2fs->e2fs,
 	    ump->um_e2fs)))
 		goto out;
+	/* Initial reservation window index and lock */
+	bzero(&ump->um_e2fs->e2fs_rsv_lock, sizeof(struct mtx));
+	mtx_init(&ump->um_e2fs->e2fs_rsv_lock,
+	    "rsv tree lock", NULL, MTX_DEF);
+	ump->um_e2fs->e2fs_rsv_tree = malloc(sizeof(struct ext2_rsv_win_tree),
+	    M_EXT2MNT, M_WAITOK | M_ZERO);
+	RB_INIT(ump->um_e2fs->e2fs_rsv_tree);
+
 	brelse(bp);
 	bp = NULL;
 	fs = ump->um_e2fs;
@@ -680,6 +689,8 @@
 	g_topology_unlock();
 	PICKUP_GIANT();
 	vrele(ump->um_devvp);
+	free(fs->e2fs_rsv_tree, M_EXT2MNT);
+	mtx_destroy(&fs->e2fs_rsv_lock);
 	free(fs->e2fs_gd, M_EXT2MNT);
 	free(fs->e2fs_contigdirs, M_EXT2MNT);
 	free(fs->e2fs, M_EXT2MNT);
@@ -919,6 +930,10 @@
 	ip->i_prealloc_count = 0;
 	ip->i_prealloc_block = 0;
 
+	bzero(&ip->i_rsv_lock, sizeof(struct mtx));
+	mtx_init(&ip->i_rsv_lock, "inode rsv lock", NULL, MTX_DEF);
+	ip->i_rsv = NULL;
+
 	/*
 	 * Now we want to make sure that block pointers for unused
 	 * blocks are zeroed out - ext2_balloc depends on this
diff -urN /usr/src/sys/fs/ext2fs/ext2fs.h new/ext2fs.h
--- /usr/src/sys/fs/ext2fs/ext2fs.h	2010-01-14 22:30:54.000000000 +0800
+++ new/ext2fs.h	2010-08-19 02:47:29.000000000 +0800
@@ -38,6 +38,7 @@
 #define _FS_EXT2FS_EXT2_FS_H
 
 #include
+#include
 
 /*
  * Special inode numbers
@@ -174,6 +175,9 @@
 	char e2fs_wasvalid; /* valid at mount time */
 	off_t e2fs_maxfilesize;
 	struct ext2_gd *e2fs_gd; /* Group Descriptors */
+
+	struct mtx e2fs_rsv_lock; /* Protect reservation window RB tree */
+	struct ext2_rsv_win_tree *e2fs_rsv_tree; /* Reservation window index */
 };
 
 /*
diff -urN /usr/src/sys/fs/ext2fs/inode.h new/inode.h
--- /usr/src/sys/fs/ext2fs/inode.h	2010-01-14 22:30:54.000000000 +0800
+++ new/inode.h	2010-08-19 02:47:29.000000000 +0800
@@ -100,6 +100,10 @@
 	int32_t i_gen; /* Generation number. */
 	u_int32_t i_uid; /* File owner. */
 	u_int32_t i_gid; /* File group. */
+
+	/* Fields for reservation window */
+	struct mtx i_rsv_lock; /* Protects i_rsv */
+	struct ext2_rsv_win *i_rsv; /* Reservation window */
 };
 
 /*
--BXVAT5kNtrzKuDFl--
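The attached patch's phy_blk() macro turns a block group number into the absolute number of that group's first block, and ext2_alloc_blk() adds a bitmap index back onto it (`bno = phy_blk(cg, fs) + bno`). The userland sketch below is not part of the patch. It mirrors phy_blk() together with the dtog()/dtogd() helpers the patch calls. It assumes that those helpers are the quotient and remainder of (block - e2fs_first_dblock) divided by the per-group count (e2fs_fpg in the patch). The names `struct geom`, `group_first_block`, `block_to_group` and `block_in_group` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the superblock fields phy_blk() reads. */
struct geom {
	uint32_t first_dblock;     /* e2fs_first_dblock: 1 for 1 KiB blocks, else 0 */
	uint32_t blocks_per_group; /* per-group count (e2fs_fpg in the patch) */
};

/* Absolute number of the first block in group cg, like phy_blk(cg, fs). */
static int64_t
group_first_block(const struct geom *g, int cg)
{
	return (int64_t)cg * g->blocks_per_group + g->first_dblock;
}

/* Group that holds absolute block d (assumed equivalent of dtog()). */
static int
block_to_group(const struct geom *g, int64_t d)
{
	assert(d >= g->first_dblock);
	return (int)((d - g->first_dblock) / g->blocks_per_group);
}

/* Bit index of block d inside its group's bitmap (assumed dtogd()). */
static int
block_in_group(const struct geom *g, int64_t d)
{
	assert(d >= g->first_dblock);
	return (int)((d - g->first_dblock) % g->blocks_per_group);
}
```

With 1 KiB blocks the first data block is 1. So on a file system with 8192 blocks per group, group 2 starts at block 16385, and block 16385 maps back to group 2, bit 0. group_first_block() plus block_in_group() always gives back the original block number, which is the round trip that ext2_alloc_blk() relies on.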
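ext2_search_next_block() in the patch uses the kernel's skpc() to skip bitmap bytes equal to 0xff (all eight blocks in use), then tests the bits of the first byte that is not full. Below is a userland sketch of the same scan. A plain loop stands in for skpc(), and the name `next_free_bit` is hypothetical. As in the patch, the scan begins at the first bit of the byte that holds the goal. The comment in ext2_find_rsv() gives this as the reason for byte-aligning window starts. Bits are numbered least significant first, matching the setbit()/isclr() macros the patch uses.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Return the index of the first clear (free) bit at or after the start of
 * the byte containing bit `start`, or -1 if all remaining bits are set.
 * `nbits` is the number of valid bits in the bitmap; padding bits in the
 * last byte are never returned.
 */
static long
next_free_bit(const unsigned char *bitmap, size_t nbits, size_t start)
{
	size_t nbytes = (nbits + CHAR_BIT - 1) / CHAR_BIT;
	size_t i;

	assert(start < nbits);
	/* Skip fully allocated bytes, as skpc(0xff, ...) does in the kernel. */
	for (i = start / CHAR_BIT; i < nbytes && bitmap[i] == 0xff; i++)
		;
	if (i == nbytes)
		return -1;
	/* Test the bits of the first byte that still has room. */
	for (unsigned bit = 0; bit < CHAR_BIT; bit++) {
		size_t idx = i * CHAR_BIT + bit;

		if (idx >= nbits)
			break;
		if ((bitmap[i] & (1u << bit)) == 0)
			return (long)idx;
	}
	return -1;
}
```

For a bitmap of `{ 0xff, 0xff, 0xf7, 0x00 }`, a search from bit 20 returns 19. The clear bit sits earlier in the same byte, which shows why an unaligned goal can be "overshot" backwards.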
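In the patch, ext2_search_rsv() and ext2_find_rsv() work together to find room for a new reservation window. The first locates the window that contains (or precedes) the goal block in the red-black tree. The second walks forward through the neighbouring windows until a gap of rsv_goal_size blocks appears. The sketch below is a simplified, hypothetical userland version that uses a sorted array instead of the sys/tree.h red-black tree. It also differs from the patch in one respect: the patch only checks that the new start stays inside the group, while this sketch rejects any window whose last block would cross `group_end`. `struct win` and `find_window_slot` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One reservation window covering blocks [start, end] inclusive. */
struct win {
	int64_t start;
	int64_t end;
};

/*
 * First-fit search for `size` free blocks at or after `goal`. `wins` holds
 * `n` non-overlapping windows sorted by start. The candidate start is
 * aligned down to a multiple of 8, as ext2_find_rsv() does, and then pushed
 * past every window it would overlap. Returns the new window's start, or
 * -1 if the window would not end at or before `group_end`.
 */
static int64_t
find_window_slot(const struct win *wins, size_t n, int64_t goal,
    int64_t size, int64_t group_end)
{
	int64_t cur = goal & ~(int64_t)7;
	size_t i;

	assert(size > 0 && goal >= 0);
	for (i = 0; i < n; i++) {
		if (wins[i].end < cur)
			continue;              /* window lies entirely before cur */
		if (cur + size <= wins[i].start)
			break;                 /* the gap before this window fits */
		cur = wins[i].end + 1;     /* overlap: retry just past it */
	}
	if (cur + size - 1 > group_end)
		return -1;
	return cur;
}
```

Suppose the existing windows are [8,15], [24,31] and [40,63]. An 8-block request aimed at block 9 is aligned down to 8 and lands in the gap at 16. A 16-block request finds no gap that large and is pushed past 63. The patch gets the same walk in O(log n) plus the number of neighbours visited, because it starts from the tree node that ext2_search_rsv() returns rather than from the front of the list.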
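The patch sizes windows adaptively. In ext2_alloc_new_rsv(), if more than half of the current window's blocks were actually handed out (rsv_alloc_hit), rsv_goal_size is doubled, capped at EXT2_RSV_MAX_RESERVE_BLKS (1024). ext2_discard_rsv() resets the size to EXT2_RSV_DEFAULT_RESERVE_BLKS (8). A standalone sketch of that rule follows. `next_goal_size` and the two macro names are hypothetical copies of the patch's constants, not part of it.

```c
#include <assert.h>
#include <stdint.h>

#define RSV_DEFAULT_BLKS 8    /* mirrors EXT2_RSV_DEFAULT_RESERVE_BLKS */
#define RSV_MAX_BLKS     1024 /* mirrors EXT2_RSV_MAX_RESERVE_BLKS */

/*
 * Goal size for the next window. `cur_len` is rsv_end - rsv_start + 1 of
 * the current window, or 0 when no window is allocated yet; `hits` is the
 * number of blocks allocated from it. The size only grows, never shrinks.
 */
static int32_t
next_goal_size(int32_t goal, int32_t cur_len, int32_t hits)
{
	assert(goal > 0);
	if (cur_len > 0 && hits > cur_len / 2) {
		goal *= 2;
		if (goal > RSV_MAX_BLKS)
			goal = RSV_MAX_BLKS;
	}
	return goal;
}
```

A file that is written sequentially and keeps using its window therefore moves through windows of 8, 16, 32, ... blocks and reaches the 1024-block cap after seven doublings. A file whose window is mostly unused keeps its current size until the window is discarded.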