Date: Wed, 29 Sep 2010 13:50:16 -0500
From: Brandon Gooch <jamesbrandongooch@gmail.com>
To: John Baldwin <jhb@freebsd.org>
Cc: freebsd-fs@freebsd.org
Subject: Re: ext2fs now extremely slow
Message-ID: <AANLkTim_0mPVZeP1b2z38apvskaLeKnjCvVwH0BK9dAN@mail.gmail.com>
In-Reply-To: <201009290917.05269.jhb@freebsd.org>
References: <20100929031825.L683@besplex.bde.org> <20100929084801.M948@besplex.bde.org> <20100929041650.GA1553@aditya> <201009290917.05269.jhb@freebsd.org>

On Wed, Sep 29, 2010 at 8:17 AM, John Baldwin <jhb@freebsd.org> wrote:
> On Wednesday, September 29, 2010 12:16:55 am Aditya Sarawgi wrote:
>> On Wed, Sep 29, 2010 at 09:14:57AM +1000, Bruce Evans wrote:
>> > On Wed, 29 Sep 2010, Bruce Evans wrote:
>> >
>> > > On Wed, 29 Sep 2010, Bruce Evans wrote:
>> > >
>> > >> For benchmarks on ext2fs:
>> > >>
>> > >> Under FreeBSD-~5.2 rerun today:
>> > >> untar:     59.17 real
>> > >> tar:       19.52 real
>> > >>
>> > >> Under -current run today:
>> > >> untar:    101.16 real
>> > >> tar:      172.03 real
>> > >>
>> > >> So, -current is 8.8 times slower for tar, but only 1.7 times slower for
>> > >> untar.
>> > >> ...
>> > >> So it seems that only 1 block in every 8 is used, and there is a seek
>> > >> after every block.  This asks for an 8-fold reduction in throughput,
>> > >> and it seems to have got that and a bit more for reading although not
>> > >> for writing.  Even (or especially) with perfect hardware, it must give
>> > >> an 8-fold reduction.  And it is likely to give more, since it defeats
>> > >> vfs clustering by making all runs of contiguous blocks have length 1.
>> > >>
>> > >> Simple sequential allocation should be used unless the allocation policy
>> > >> and implementation are very good.
>> > >
>> > > This work a bit better after zapping the 8-fold way:
>> >    Things
>> > > ...
>> > > This gives an improvement of:
>> > >
>> > > untar:    101.16 real -> 63.46
>> > > tar:      172.03 real -> 50.70
>> > >
>> > > Now -current is only 1.1 times slower for untar and 2.6 times slower for
>> > > tar.
>> > >
>> > > There must be a problem with bpref for things to have been so bad.  There
>> > > is some point to leaving a gap of 7 blocks for expansion, but the gap was
>> > > left even between blocks in a single file.
>> > > ...
>> > > I haven't tried the bde_blkpref hack in the above.
>> > > It should kill bpref
>> > > completely so that there is no jump between lbn0 and lbn1, and break
>> > > cylinder group based allocation even better.  Setting bde_blkpref to 1
>> > > restores the bug that was present in ext2fs in FreeBSD between 1995 and
>> > > 2010.  This bug gave sequential allocation starting at the beginning of
>> > > the disk in almost all cases, so map searches were slow and early groups
>> > > filled up before later groups were used at all.
>> >
>> > Tried this (patch repeated below), and it gave essentially the same
>> > speed as old versions.
>> >
>> > The main problem seems to be that the `goal' variables aren't initialized.
>> > After restoring bits verbatim from an old version, things seem to work as
>> > expected:
>> >
>> > % Index: ext2_alloc.c
>> > % ===================================================================
>> > % RCS file: /home/ncvs/src/sys/fs/ext2fs/ext2_alloc.c,v
>> > % retrieving revision 1.2
>> > % diff -u -2 -r1.2 ext2_alloc.c
>> > % --- ext2_alloc.c  1 Sep 2010 05:34:17 -0000       1.2
>> > % +++ ext2_alloc.c  28 Sep 2010 21:08:42 -0000
>> > % @@ -1,2 +1,5 @@
>> > % +int bde_blkpref = 0;
>> > % +int bde_alloc8 = 0;
>> > % +
>> > %  /*-
>> > %   *  modified for Lites 1.1
>> > % @@ -117,4 +120,8 @@
>> > %                                                   ext2_alloccg);
>> > %          if (bno > 0) {
>> > % +         /* set next_alloc fields as done in block_getblk */
>> > % +         ip->i_next_alloc_block = lbn;
>> > % +         ip->i_next_alloc_goal = bno;
>> > % +
>> > %                  ip->i_blocks += btodb(fs->e2fs_bsize);
>> > %                  ip->i_flag |= IN_CHANGE | IN_UPDATE;
>> >
>> > The only
>> > things that changed recently in this block were the 4 deleted
>> > lines and 4 lines with tabs corrupted to spaces.  Perhaps an editing
>> > error.
>> >
>> > % @@ -542,6 +549,12 @@
>> > %      then set the goal to what we thought it should be
>> > %   */
>> > % +if (bde_blkpref == 0) {
>> > %   if(ip->i_next_alloc_block == lbn && ip->i_next_alloc_goal != 0)
>> > %           return ip->i_next_alloc_goal;
>> > % +} else if (bde_blkpref == 1) {
>> > % + if(ip->i_next_alloc_block == lbn)
>> > % +         return ip->i_next_alloc_goal;
>> > % +} else
>> > % + return 0;
>> > %
>> > %   /* now check whether we were provided with an array that basically
>> >
>> > Not needed now.
>> >
>> > % @@ -662,4 +675,5 @@
>> > %    * block.
>> > %    */
>> > % +if (bde_alloc8 == 0) {
>> > %   if (bpref)
>> > %           start = dtogd(fs, bpref) / NBBY;
>> > % @@ -679,4 +693,5 @@
>> > %           }
>> > %   }
>> > % +}
>> > %
>> > %   bno = ext2_mapsearch(fs, bbp, bpref);
>> >
>> > The code to skip to the next 8-block boundary should be removed permanently.
>> > After fixing the initialization, it doesn't generate holes inside files but
>> > it still generates holes between files.  The holes are quite large with
>> > 4K-blocks.
>> >
>> > Benchmark results with just the initialization of `goal' variables restored:
>> >
>> > %%%
>> > ext2fs-1024-1024:
>> > tarcp /f srcs:                 78.79 real         0.31 user         4.94 sys
>> > tar cf /dev/zero srcs:         24.62 real         0.19 user         1.82 sys
>> > ext2fs-1024-1024-as:
>> > tarcp /f srcs:                 52.07 real         0.26 user         4.95 sys
>> > tar cf /dev/zero srcs:         24.80 real         0.10 user         1.93 sys
>> > ext2fs-4096-4096:
>> > tarcp /f srcs:                 74.14 real         0.34 user         3.96 sys
>> > tar cf /dev/zero srcs:         33.82 real         0.10 user         1.19 sys
>> > ext2fs-4096-4096-as:
>> > tarcp /f srcs:                 53.54 real         0.36 user         3.87 sys
>> > tar cf /dev/zero srcs:         33.91 real         0.14 user         1.15 sys
>> > %%%
>> >
>> > The much larger holes between the files are apparently responsible for the
>> > decreased speed with 4K-blocks.  1K-blocks are really too small, so 4K-blocks
>> > should be faster.
>> >
>> > Benchmark results with the fix and bde_alloc8 = 1.
>> >
>> > ext2fs-1024-1024:
>> > tarcp /f srcs:                 71.60 real         0.15 user         2.04 sys
>> > tar cf /dev/zero srcs:         22.34 real         0.05 user         0.79 sys
>> > ext2fs-1024-1024-as:
>> > tarcp /f srcs:                 46.03 real         0.14 user         2.02 sys
>> > tar cf /dev/zero srcs:         21.97 real         0.05 user         0.80 sys
>> > ext2fs-4096-4096:
>> > tarcp /f srcs:                 59.66 real         0.13 user         1.63 sys
>> > tar cf /dev/zero srcs:         19.88 real         0.07 user         0.46 sys
>> > ext2fs-4096-4096-as:
>> > tarcp /f srcs:                 37.30 real         0.12 user         1.60 sys
>> > tar cf /dev/zero srcs:         19.93 real         0.05 user         0.49 sys
>> >
>> > Bruce
>>
>> Hi,
>>
>> I see what you are saying. The gap of 8 blocks between the files
>> is due to the old preallocation which used to allocate additional
>> 8 blocks in advance for a particular inode when allocating a block
>> for it. The gap between blocks of the same file shouldn't be there
>> either. Both of these cases should be removed. I will look into this
>> during this week. The slowness is also due to lack of preallocation
>> in the new code.
>
> One of the GSoC students worked on a patch to add preallocation back to
> ext2fs this summer.  Would you be interested in reviewing and/or testing
> that patch?  (I've attached it).  Here is his original e-mail:
>
> <quote>
> Hi all,
>
> Attached is a patch which implements a preallocation
> algorithm in ext2fs. I implemented this algorithm during FreeBSD SoC 2010.
>
> This patch implements the in-memory ext2/3 block preallocation algorithm
> based on reservation windows.
> It uses an RB-tree to index block allocation
> requests and reserves a number of blocks for each file which has requested
> to allocate a block. When a file requests a block, the algorithm
> finds a block to allocate to this file. It tries to pick a block
> that is in the same cylinder group as the inode, is not inside another
> reservation window in the RB-tree, and is followed by some contiguous
> free blocks. It uses a data structure to store this block's position
> and the length of the run of contiguous free blocks, then inserts this
> data structure into the RB-tree. When the file requests another block,
> the algorithm looks up the corresponding data structure in the RB-tree.
> If it is found, the next free block is allocated to the file directly.
> Otherwise, a new block is searched for again.
>
> I have run some benchmarks to test this algorithm. Please review them on
> the wiki page ('http://wiki.freebsd.org/SOC2010ZhengLiu'). The performance
> is better when the number of threads is smaller than 4. When the number
> of threads is greater than 4, the performance increases only a little.
>
> Please test it.
>
>
> Thanks and best regards,
>
> lz
> </quote>

Wow, this is really awesome!

What are the chances of this code being committed before a 9.0 release
(assuming we have enough user testing)?

-Brandon
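
As a rough illustration of the reservation-window idea in the quoted GSoC
message, here is a minimal C sketch.  It is not taken from the attached
patch: every name in it (rsv_table, rsv_window, rsv_find, rsv_range_ok,
rsv_alloc and the RSV_* constants) is hypothetical, a small fixed array
with linear search stands in for the RB-tree, and the preference for the
inode's cylinder group is left out.  It only shows each file getting its
own window of contiguous free blocks and drawing later blocks from it.

```c
#include <assert.h>
#include <stddef.h>

#define RSV_NBLOCKS 64   /* blocks in the toy block group (hypothetical) */
#define RSV_WINDOW  8    /* blocks reserved per window (hypothetical) */
#define RSV_SLOTS   8    /* max. simultaneous windows (hypothetical) */

struct rsv_window {
	int owner;  /* inode number, 0 = slot unused */
	int start;  /* first block of the window */
	int len;    /* number of blocks in the window */
	int next;   /* next block to hand out */
};

struct rsv_table {
	struct rsv_window win[RSV_SLOTS];   /* stand-in for the RB-tree */
	unsigned char used[RSV_NBLOCKS];    /* 1 = block is allocated */
};

/* Look up the window owned by ino; ino == 0 finds an unused slot. */
static struct rsv_window *
rsv_find(struct rsv_table *t, int ino)
{
	for (int i = 0; i < RSV_SLOTS; i++)
		if (t->win[i].owner == ino)
			return (&t->win[i]);
	return (NULL);
}

/* True if [start, start + len) is free and overlaps no existing window. */
static int
rsv_range_ok(const struct rsv_table *t, int start, int len)
{
	for (int b = start; b < start + len; b++)
		if (t->used[b])
			return (0);
	for (int i = 0; i < RSV_SLOTS; i++) {
		const struct rsv_window *w = &t->win[i];
		if (w->owner != 0 && start < w->start + w->len &&
		    w->start < start + len)
			return (0);
	}
	return (1);
}

/* Allocate one block for inode ino; returns the block number or -1. */
static int
rsv_alloc(struct rsv_table *t, int ino)
{
	struct rsv_window *w;
	int start;

	assert(ino > 0);
	w = rsv_find(t, ino);
	if (w != NULL && w->next == w->start + w->len)
		w->owner = 0;           /* window used up: drop it */
	if (w == NULL || w->owner == 0) {
		w = rsv_find(t, 0);
		if (w == NULL)
			return (-1);    /* no slot left for a new window */
		for (start = 0; start + RSV_WINDOW <= RSV_NBLOCKS; start++)
			if (rsv_range_ok(t, start, RSV_WINDOW))
				break;
		if (start + RSV_WINDOW > RSV_NBLOCKS)
			return (-1);    /* no free run long enough */
		w->owner = ino;
		w->start = w->next = start;
		w->len = RSV_WINDOW;
	}
	t->used[w->next] = 1;
	return (w->next++);
}
```

With a zeroed table, alternating calls for inodes 1 and 2 return blocks
0, 8, 1, 9: each writer stays contiguous inside its own window rather than
interleaving with the other.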