Date:      Wed, 29 Sep 2010 13:50:16 -0500
From:      Brandon Gooch <jamesbrandongooch@gmail.com>
To:        John Baldwin <jhb@freebsd.org>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: ext2fs now extremely slow
Message-ID:  <AANLkTim_0mPVZeP1b2z38apvskaLeKnjCvVwH0BK9dAN@mail.gmail.com>
In-Reply-To: <201009290917.05269.jhb@freebsd.org>
References:  <20100929031825.L683@besplex.bde.org> <20100929084801.M948@besplex.bde.org> <20100929041650.GA1553@aditya> <201009290917.05269.jhb@freebsd.org>

On Wed, Sep 29, 2010 at 8:17 AM, John Baldwin <jhb@freebsd.org> wrote:
> On Wednesday, September 29, 2010 12:16:55 am Aditya Sarawgi wrote:
>> On Wed, Sep 29, 2010 at 09:14:57AM +1000, Bruce Evans wrote:
>> > On Wed, 29 Sep 2010, Bruce Evans wrote:
>> >
>> > > On Wed, 29 Sep 2010, Bruce Evans wrote:
>> > >
>> > >> For benchmarks on ext2fs:
>> > >>
>> > >> Under FreeBSD-~5.2 rerun today:
>> > >> untar:     59.17 real
>> > >> tar:       19.52 real
>> > >>
>> > >> Under -current run today:
>> > >> untar:    101.16 real
>> > >> tar:      172.03 real
>> > >>
>> > >> So, -current is 8.8 times slower for tar, but only 1.7 times slower for
>> > >> untar.
>> > >> ...
>> > >> So it seems that only 1 block in every 8 is used, and there is a seek
>> > >> after every block.  This asks for an 8-fold reduction in throughput,
>> > >> and it seems to have got that and a bit more for reading although not
>> > >> for writing.  Even (or especially) with perfect hardware, it must give
>> > >> an 8-fold reduction.  And it is likely to give more, since it defeats
>> > >> vfs clustering by making all runs of contiguous blocks have length 1.
>> > >>
>> > >> Simple sequential allocation should be used unless the allocation policy
>> > >> and implementation are very good.
>> > >
>> > > This works a bit better after zapping the 8-fold way:
>> > > ...
>> > > This gives an improvement of:
>> > >
>> > > untar:    101.16 real -> 63.46
>> > > tar:      172.03 real -> 50.70
>> > >
>> > > Now -current is only 1.1 times slower for untar and 2.6 times slower for
>> > > tar.
>> > >
>> > > There must be a problem with bpref for things to have been so bad.  There
>> > > is some point to leaving a gap of 7 blocks for expansion, but the gap was
>> > > left even between blocks in a single file.
>> > > ...
>> > > I haven't tried the bde_blkpref hack in the above.  It should kill bpref
>> > > completely so that there is no jump between lbn0 and lbn1, and break
>> > > cylinder group based allocation even better.  Setting bde_blkpref to 1
>> > > restores the bug that was present in ext2fs in FreeBSD between 1995 and
>> > > 2010.  This bug gave sequential allocation starting at the beginning of
>> > > the disk in almost all cases, so map searches were slow and early groups
>> > > filled up before later groups were used at all.
>> >
>> > Tried this (patch repeated below), and it gave essentially the same
>> > speed as old versions.
>> >
>> > The main problem seems to be that the `goal' variables aren't initialized.
>> > After restoring bits verbatim from an old version, things seem to work as
>> > expected:
>> >
>> > % Index: ext2_alloc.c
>> > % ===================================================================
>> > % RCS file: /home/ncvs/src/sys/fs/ext2fs/ext2_alloc.c,v
>> > % retrieving revision 1.2
>> > % diff -u -2 -r1.2 ext2_alloc.c
>> > % --- ext2_alloc.c	1 Sep 2010 05:34:17 -0000	1.2
>> > % +++ ext2_alloc.c	28 Sep 2010 21:08:42 -0000
>> > % @@ -1,2 +1,5 @@
>> > % +int bde_blkpref = 0;
>> > % +int bde_alloc8 = 0;
>> > % +
>> > %  /*-
>> > %   *  modified for Lites 1.1
>> > % @@ -117,4 +120,8 @@
>> > %                                                   ext2_alloccg);
>> > %          if (bno > 0) {
>> > % +         /* set next_alloc fields as done in block_getblk */
>> > % +         ip->i_next_alloc_block = lbn;
>> > % +         ip->i_next_alloc_goal = bno;
>> > % +
>> > %                  ip->i_blocks += btodb(fs->e2fs_bsize);
>> > %                  ip->i_flag |= IN_CHANGE | IN_UPDATE;
>> >
>> > The only things that changed recently in this block were the 4 deleted
>> > lines and 4 lines with tabs corrupted to spaces.  Perhaps an editing
>> > error.
>> >
>> > % @@ -542,6 +549,12 @@
>> > %       then set the goal to what we thought it should be
>> > %   */
>> > % +if (bde_blkpref == 0) {
>> > %   if(ip->i_next_alloc_block == lbn && ip->i_next_alloc_goal != 0)
>> > %           return ip->i_next_alloc_goal;
>> > % +} else if (bde_blkpref == 1) {
>> > % + if(ip->i_next_alloc_block == lbn)
>> > % +         return ip->i_next_alloc_goal;
>> > % +} else
>> > % + return 0;
>> > %
>> > %   /* now check whether we were provided with an array that basically
>> >
>> > Not needed now.
>> >
>> > % @@ -662,4 +675,5 @@
>> > %    * block.
>> > %    */
>> > % +if (bde_alloc8 == 0) {
>> > %   if (bpref)
>> > %           start = dtogd(fs, bpref) / NBBY;
>> > % @@ -679,4 +693,5 @@
>> > %           }
>> > %   }
>> > % +}
>> > %
>> > %   bno = ext2_mapsearch(fs, bbp, bpref);
>> >
>> > The code to skip to the next 8-block boundary should be removed permanently.
>> > After fixing the initialization, it doesn't generate holes inside files but
>> > it still generates holes between files.  The holes are quite large with
>> > 4K-blocks.
>> >
>> > Benchmark results with just the initialization of `goal' variables restored:
>> >
>> > %%%
>> > ext2fs-1024-1024:
>> > tarcp /f srcs:                 78.79 real         0.31 user         4.94 sys
>> > tar cf /dev/zero srcs:         24.62 real         0.19 user         1.82 sys
>> > ext2fs-1024-1024-as:
>> > tarcp /f srcs:                 52.07 real         0.26 user         4.95 sys
>> > tar cf /dev/zero srcs:         24.80 real         0.10 user         1.93 sys
>> > ext2fs-4096-4096:
>> > tarcp /f srcs:                 74.14 real         0.34 user         3.96 sys
>> > tar cf /dev/zero srcs:         33.82 real         0.10 user         1.19 sys
>> > ext2fs-4096-4096-as:
>> > tarcp /f srcs:                 53.54 real         0.36 user         3.87 sys
>> > tar cf /dev/zero srcs:         33.91 real         0.14 user         1.15 sys
>> > %%%
>> >
>> > The much larger holes between the files are apparently responsible for the
>> > decreased speed with 4K-blocks.  1K-blocks are really too small, so 4K-blocks
>> > should be faster.
>> >
>> > Benchmark results with the fix and bde_alloc8 = 1.
>> >
>> > ext2fs-1024-1024:
>> > tarcp /f srcs:                 71.60 real         0.15 user         2.04 sys
>> > tar cf /dev/zero srcs:         22.34 real         0.05 user         0.79 sys
>> > ext2fs-1024-1024-as:
>> > tarcp /f srcs:                 46.03 real         0.14 user         2.02 sys
>> > tar cf /dev/zero srcs:         21.97 real         0.05 user         0.80 sys
>> > ext2fs-4096-4096:
>> > tarcp /f srcs:                 59.66 real         0.13 user         1.63 sys
>> > tar cf /dev/zero srcs:         19.88 real         0.07 user         0.46 sys
>> > ext2fs-4096-4096-as:
>> > tarcp /f srcs:                 37.30 real         0.12 user         1.60 sys
>> > tar cf /dev/zero srcs:         19.93 real         0.05 user         0.49 sys
>> >
>> > Bruce
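
For anyone trying to follow the patch above: as I read it, the
i_next_alloc_block/i_next_alloc_goal pair is just a per-inode hint that
remembers where the last block went. A minimal sketch of the idea in plain
C -- the names and the cylinder-group fallback are mine, not the actual
ext2_blkpref() logic:

/* Sketch only -- illustrative names, not the ext2fs code. */
struct alloc_hint {
	long	last_lbn;	/* logical block most recently allocated */
	long	last_pblk;	/* physical block it was given */
};

/* Preferred physical block for logical block lbn. */
long
blkpref_hint(const struct alloc_hint *h, long lbn, long cg_start)
{
	/*
	 * If this request continues the previous allocation, ask for the
	 * physically adjacent block so the file stays contiguous and vfs
	 * clustering can build long runs.
	 */
	if (h->last_pblk != 0 && lbn == h->last_lbn + 1)
		return (h->last_pblk + 1);
	/* Otherwise fall back to the start of the inode's cylinder group. */
	return (cg_start);
}

/* Record what was just handed out, for the next request. */
void
blkpref_update(struct alloc_hint *h, long lbn, long pblk)
{
	h->last_lbn = lbn;
	h->last_pblk = pblk;
}

If the update step never runs (the uninitialized-goal problem described
above), the hint never matches and every allocation falls back to the
search path; the patch simply puts that update step back.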
>>
>> Hi,
>>
>> I see what you are saying. The gap of 8 blocks between the files
>> is due to the old preallocation, which used to allocate an additional
>> 8 blocks in advance for a particular inode when allocating a block
>> for it. The gap between blocks of the same file shouldn't be there
>> either. Both of these cases should be removed. I will look into this
>> during the week. The slowness is also due to the lack of preallocation
>> in the new code.
>
> One of the GSoC students worked on a patch to add preallocation back to
> ext2fs this summer.  Would you be interested in reviewing and/or testing
> that patch?  (I've attached it).  Here is his original e-mail:
>
> <quote>
> Hi all,
>
> There is a patch attached which implements a preallocation algorithm
> for ext2fs. I implemented this algorithm during FreeBSD SoC 2010.
>
> This patch implements the in-memory ext2/3 block preallocation algorithm
> based on reservation windows. It uses an RB-tree to index block
> allocation requests and reserves a number of blocks for each file that
> has asked to allocate a block. When a file requests a block, the code
> looks for a block to allocate that is in the same cylinder group as the
> inode, is not inside any other reservation window in the RB-tree, and is
> followed by some contiguous free blocks. It stores the block's position
> and the length of the contiguous free run in a data structure and
> inserts that structure into the RB-tree. When the same file requests
> another block, the corresponding data structure is looked up in the
> RB-tree; if it is found, the next free block from the reservation is
> allocated to the file directly. Otherwise, a new block is searched for
> again.
>
> I have run some benchmarks to test this algorithm; please see the
> results on the wiki page (http://wiki.freebsd.org/SOC2010ZhengLiu).
> The performance is better when the number of threads is smaller than 4.
> When the number of threads is greater than 4, the performance still
> increases a little.
>
> Please test it.
>
>
> Thanks and best regards,
>
> lz
> </quote>

Wow, this is really awesome! What are the chances of this code being
committed before a 9.0 release (assuming we have enough user testing)?
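
If it helps with review, here is how I read the reservation-window
description in the quoted mail, as a rough userland sketch built on
<sys/tree.h>. Everything here -- the names, the fixed 8-block window size,
and the trivial "place the new window after the last one" policy -- is my
own illustration, not what the attached patch actually does:

#include <sys/tree.h>
#include <stdlib.h>

#define RSV_WIN_BLOCKS	8	/* blocks reserved per window (made up) */

struct rsv_win {
	RB_ENTRY(rsv_win) rw_link;	/* RB-tree linkage */
	long	rw_start;		/* first block of the reservation */
	long	rw_end;			/* last block of the reservation */
	long	rw_next;		/* next block to hand out */
};

static int
rsv_cmp(struct rsv_win *a, struct rsv_win *b)
{
	return (a->rw_start < b->rw_start ? -1 : a->rw_start > b->rw_start);
}

RB_HEAD(rsv_tree, rsv_win);
RB_GENERATE_STATIC(rsv_tree, rsv_win, rw_link, rsv_cmp);

/*
 * Allocate one block for a file.  *winp is the file's current window
 * (in the real patch it would hang off the in-memory inode).  If the
 * window still has blocks left, hand out the next one; otherwise carve
 * a new window.  A real implementation would search the block bitmap
 * near the inode's cylinder group and skip ranges owned by other
 * windows in the tree; this sketch just places the new window after
 * the highest existing reservation.
 */
long
rsv_alloc_block(struct rsv_tree *tree, struct rsv_win **winp, long fs_hint)
{
	struct rsv_win *win = *winp, *last;

	if (win != NULL && win->rw_next <= win->rw_end)
		return (win->rw_next++);	/* contiguous fast path */

	if (win == NULL) {
		win = calloc(1, sizeof(*win));
		if (win == NULL)
			return (-1);
		*winp = win;
	} else
		RB_REMOVE(rsv_tree, tree, win);	/* window exhausted */

	last = RB_MAX(rsv_tree, tree);
	win->rw_start = (last != NULL && last->rw_end >= fs_hint) ?
	    last->rw_end + 1 : fs_hint;
	win->rw_end = win->rw_start + RSV_WIN_BLOCKS - 1;
	win->rw_next = win->rw_start;
	RB_INSERT(rsv_tree, tree, win);
	return (win->rw_next++);
}

/*
 * Usage sketch:
 *	struct rsv_tree tree = RB_INITIALIZER(&tree);
 *	struct rsv_win *win = NULL;
 *	long blk = rsv_alloc_block(&tree, &win, 1000);
 */

Keying the RB-tree on each window's starting block should make checking a
candidate against other files' reservations a logarithmic lookup rather
than a bitmap scan, which is presumably why the patch uses an RB-tree.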

-Brandon


