Date:      Tue, 10 Aug 1999 15:32:13 -0400
From:      Michael Vernick <vernick@bell-labs.com>
To:        freebsd-fs@freebsd.org
Subject:   Help with understanding file system performance
Message-ID:  <37B07E3D.16F2B334@bell-labs.com>

Greetings,

It's been a few years since I've hacked on FreeBSD, but I'm back and I
need some help deciphering the file system performance numbers I'm
currently getting.  This has probably been discussed before, but I
haven't found any good related material.

The machine is a P166 with 32MB RAM and two 1GB SCSI disks (one for the
OS and one for data) running FreeBSD 3.2-RELEASE.  The kernel
configuration uses all defaults.

My experiment consists of the following two steps:
1. Create a directory structure of files (the shape depends on
parameters such as the height and width of the tree), where each file's
size is chosen at random (uniform distribution) between 10KB and 20KB.
There are about 6400 files in total, for a total size of about 100MB.

2. Run a reader program that randomly reads a subset (3200) of the
files.  The reader program can run from 1 to 8 processes (fork() is
used to create each one).  Each process uses 'rand()' to pick a random
file, opens it with 'open()', reads it in its entirety with a single
'read(sizeOfFile)' call, then closes it.  A sketch of the reader loop
follows this list.
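
Roughly, the reader looks like this (simplified; error handling is
minimal, and loading the file list from a text file named on the
command line is just for illustration, not how the real harness builds
its list):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAXFILES 8192
#define NREADS   3200                       /* files read per process */

static char *files[MAXFILES];
static int   nfiles;

static void
reader(void)
{
    static char buf[64 * 1024];             /* plenty for a 10KB-20KB file */
    struct stat sb;
    int i, fd;

    for (i = 0; i < NREADS; i++) {
        fd = open(files[rand() % nfiles], O_RDONLY);
        if (fd < 0)
            continue;
        if (fstat(fd, &sb) == 0)
            (void)read(fd, buf, (size_t)sb.st_size);  /* one read() per file */
        close(fd);
    }
}

int
main(int argc, char **argv)
{
    char line[1024];
    FILE *fp;
    int i, nproc;

    if (argc < 3) {
        fprintf(stderr, "usage: reader nproc filelist\n");
        return 1;
    }
    nproc = atoi(argv[1]);
    if ((fp = fopen(argv[2], "r")) == NULL)
        return 1;
    while (fgets(line, sizeof(line), fp) != NULL && nfiles < MAXFILES) {
        line[strcspn(line, "\n")] = '\0';
        files[nfiles++] = strdup(line);
    }
    fclose(fp);
    if (nfiles == 0 || nproc < 1)
        return 1;

    for (i = 0; i < nproc; i++) {
        if (fork() == 0) {                  /* one child per reader */
            srand(getpid());                /* different random sequence per child */
            reader();
            _exit(0);
        }
    }
    while (wait(NULL) > 0)                  /* wait for all readers to finish */
        ;
    return 0;
}

Invoked as 'reader <nproc> <filelist>', it forks the requested number
of readers and waits for them all to finish.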

Each experiment is run 8 times (varying the number of processes from 1
to 8) on each directory structure.  The structures, in a nutshell, can
be deep (lots of subdirectories with few files per directory) or wide
(few subdirectories with lots of files per directory).  I also compare
a single file system against two file systems on the same physical
disk.
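
For concreteness, the tree builder from step 1 is parameterized roughly
like this (names and the example parameters are illustrative, not the
exact code):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

static void
mktree(const char *dir, int depth, int width, int files_per_dir)
{
    static char buf[20 * 1024];         /* file contents don't matter */
    char path[1024];
    int i, fd, size;

    mkdir(dir, 0755);
    for (i = 0; i < files_per_dir; i++) {
        size = 10 * 1024 + rand() % (10 * 1024 + 1);  /* 10KB-20KB, uniform */
        snprintf(path, sizeof(path), "%s/f%d", dir, i);
        if ((fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644)) >= 0) {
            (void)write(fd, buf, (size_t)size);
            close(fd);
        }
    }
    if (depth > 1)
        for (i = 0; i < width; i++) {
            snprintf(path, sizeof(path), "%s/d%d", dir, i);
            mktree(path, depth - 1, width, files_per_dir);
        }
}

int
main(void)
{
    /* e.g. 4 levels, 4 subdirs per level, 75 files per directory:
     * 85 directories and 6375 files, roughly 95MB of data */
    mktree("testtree", 4, 4, 75);
    return 0;
}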

The performance metric is simply bytes/sec read.

My results show that:

1. Performance degrades significantly (15-20%) when going from 1 to 2
processes, then slowly increases as more processes are added.  Running
8 readers ends up with roughly the same throughput as running a single
reader.  This happens for every type of directory structure.

Is this because of the overhead of directory operations and context
switches?  I had hoped to get more parallelism with more processes
(i.e. keep the disk more fully saturated thanks to tagged command
queuing), but the results don't show that.

2. Performance degrades about 15% in the 1-process experiment when the
files are split across 2 file systems vs. a single file system.  This
one has me somewhat perplexed.  Is it because more directory
information is thrashing between disk and memory?

3. On a per-process basis, performance increases as the number of
files per directory increases (and the number of subdirectories
decreases).  Is this because there is a better chance that the
directory information for a file is already in memory?

In general, my conjecture is that the more directory information can
be kept in memory the better, leaving the disk free to retrieve the
actual file data.  Are there kernel parameters which configure how much
memory is allocated to directory information (metadata) vs. actual file
data?
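
One thing I plan to try next is timing the open() and the read()
separately, to see whether it's the metadata lookup or the data
transfer that suffers when I add processes or split the files across
file systems.  A rough standalone sketch (not the benchmark itself):

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <unistd.h>

static double
secs(struct timeval a, struct timeval b)
{
    return (b.tv_sec - a.tv_sec) + (b.tv_usec - a.tv_usec) / 1e6;
}

int
main(int argc, char **argv)
{
    static char buf[64 * 1024];
    struct timeval t0, t1, t2;
    struct stat sb;
    double topen = 0.0, tread = 0.0;
    int i, fd;

    for (i = 1; i < argc; i++) {
        gettimeofday(&t0, NULL);
        fd = open(argv[i], O_RDONLY);       /* pathname lookup + inode fetch */
        gettimeofday(&t1, NULL);
        if (fd < 0)
            continue;
        if (fstat(fd, &sb) == 0)
            (void)read(fd, buf, (size_t)sb.st_size);   /* data blocks */
        gettimeofday(&t2, NULL);
        close(fd);
        topen += secs(t0, t1);
        tread += secs(t1, t2);
    }
    printf("open: %.3fs  read: %.3fs\n", topen, tread);
    return 0;
}

Running this over the full set of test files for each layout should
show where the time is going.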

Our goal, of course, is to maximize performance, so any help in tuning
our system for this workload (i.e. reading lots of ~15KB files) would
be appreciated.  I've started to look through the kernel source code to
figure out what is going on, but it isn't easy; there is a lot of
indirection via function pointers.  I've also just started looking
through the 4.4BSD OS design book.  Is there any FreeBSD documentation
about the file system code?  I really didn't see anything in the
Handbook.

Thanks for any help.   It's good to be back.

Michael Vernick, Ph.D.
Multimedia Applications Research
Lucent Bell Labs


