Date:      Fri, 13 Nov 1998 15:46:51 +1100
From:      Peter Jeremy <peter.jeremy@auss2.alcatel.com.au>
To:        hackers@FreeBSD.ORG
Subject:   dump(8) very slow
Message-ID:  <98Nov13.154625est.40323@border.alcanet.com.au>

As has been mentioned recently (in the thread on the performance of
striped filesystems), the throughput of a disk is strongly correlated
with I/O request size - large requests are necessary to get close to
the theoretical throughputs (which are the ones claimed by the vendors).
The FFS design took this into account (with support for groups of
adjacent blocks being read/written in one operation).

Unfortunately, dump(8) is distinctly sub-optimal as far as reading the
disk is concerned.  Dump reads the disk in blocksize blocks (with
partial blocks read as a multiple of fragsize).  For a typical
filesystem this means that dump never reads more than 8K at a time
(and if there are lots of small files, the average read size will
drop).  This translates (at least for me) to dump throughputs (to
/dev/null) of 500-1000 KB/s - slower than typical tape drives.

I believe that this could be substantially improved (at the expense
of increased working set size), without changing the dump tape format
(i.e. restore(8) is not affected).

The approach I'm thinking of would be to allocate a large(*) buffer
(or buffers) and then sort and merge the outstanding queue of read
requests to fill the buffer, whilst maximising the individual read
request sizes.  (This takes into account the fact that, in general,
it is faster to read an unneeded block or two off disk than to issue
two smaller reads for nearly adjacent sections.)  The tape block
order can be restored by using either writev(2) or readv(2) to
re-order the buffer.  Two buffers (and associated processes) could
be used to overlap disk reads and tape writes.

Is this approach worth the effort?  I suspect this depends on how well
associated sequential blocks of inodes correlate to associated groups
of data blocks on disk - I don't know the answer to this.

Since this amounts to a buffer cache (albeit with a special layout and
replacement policy), would dump be better off going through the buffer
cache (maybe with some extra system calls to help dump tell the buffer
cache management software what it's doing - a la madvise(2))?

(*) where `large' is as big as possible whilst keeping dump(8) and
    the buffer(s) resident.

Peter
--
Peter Jeremy (VK2PJ)                    peter.jeremy@alcatel.com.au
Alcatel Australia Limited
41 Mandible St                          Phone: +61 2 9690 5019
ALEXANDRIA  NSW  2015                   Fax:   +61 2 9690 5247

