Date: Tue, 10 Aug 1999 15:32:13 -0400
From: Michael Vernick <vernick@bell-labs.com>
To: freebsd-fs@freebsd.org
Subject: Help with understanding file system performance
Message-ID: <37B07E3D.16F2B334@bell-labs.com>
Greetings,

It's been a few years since I've hacked on FreeBSD, but I'm back, and I need some help deciphering the file system performance numbers I'm currently getting. I'm sure this has been discussed before, but I haven't found any good related material.

The machine is a P166 with 32MB RAM and two 1GB SCSI disks (one for the OS, one for data) running FreeBSD 3.2-RELEASE. The kernel configuration uses all defaults.

My experiment consists of the following two steps (simplified sketches of both programs are in the P.S. below):

1. Create a directory structure of files, depending on parameters
   such as the height and width of the structure, where each file's
   size is chosen randomly (uniform distribution) between 10KB and
   20KB. There are about 6400 files in total, for a total size of
   about 100MB.

2. Run a reader program that randomly reads a subset (3200) of the
   files. The reader program can have from 1 to 8 processes (fork()
   is used to create each process). Each process uses rand() to pick
   a random file, opens it with open(), reads it in its entirety
   with a single read(sizeOfFile) call, then closes it.

Each experiment is run 8 times (varying the number of processes from 1 to 8) on each directory structure. The structures, in a nutshell, can be deep (many subdirectories with few files per directory) or wide (few subdirectories with many files per directory). I also compare a single file system against two file systems on the same physical disk. The performance metric is simply bytes/sec read.

My results show that:

1. Performance degrades significantly (15-20%) when going from 1 to 2
   processes, then slowly increases as more processes are added; 8
   readers end up achieving about the same throughput as a single
   reader. This happens for every directory structure. Is this
   because of the overhead of directory operations and context
   switches? I had hoped for more parallelism from more processes
   (i.e., keeping the disk closer to saturation thanks to tagged
   command queuing), but the results don't show that.

2. Performance degrades about 15% in the 1-process experiment when
   the files are split across 2 file systems versus a single file
   system. This one has me somewhat perplexed. Is it because more
   directory information thrashes between disk and memory?

3. On a per-process basis, performance increases as the number of
   files per directory increases and the number of subdirectories
   decreases. Is this because there is a better chance that the
   directory information for a file is already in memory?

In general, my conjecture is that the more directory information that can be kept in memory, the better, leaving all disk activity for retrieving the actual file data. Are there kernel parameters that configure how much memory is allocated to directory information (metadata) versus actual file data? Our goal, of course, is to maximize performance, so any help in tuning the system for this workload (reading lots of ~15KB files) would be appreciated.

I've started to look through the kernel source code to figure out what is going on, but it isn't easy: there is a lot of indirection via function pointers. I've also just started reading the 4.4BSD design book. Is there any FreeBSD documentation about the file system code? I didn't see anything in the Handbook.

Thanks for any help. It's good to be back.

Michael Vernick, Ph.D.
Multimedia Applications Research
Lucent Bell Labs
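P.S. In case it helps anyone reproduce this, here is roughly what the tree builder does. This is a simplified flat variant: the directory/file counts and the data/dNNN naming here are placeholders, and the real program varies the height and width of the tree per experiment.

/* mktree.c -- simplified sketch of the tree builder (flat variant).
 * ndirs/nfiles and the data/dNNN/fNNNN naming are placeholders. */
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define MINSZ (10 * 1024)
#define MAXSZ (20 * 1024)

static char junk[MAXSZ];            /* file contents don't matter */

static void
make_file(const char *path)
{
    /* size drawn uniformly from [10KB, 20KB] */
    size_t size = MINSZ + rand() % (MAXSZ - MINSZ + 1);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0) {
        perror(path);
        exit(1);
    }
    write(fd, junk, size);
    close(fd);
}

int
main(void)
{
    int d, f;
    int ndirs = 64, nfiles = 100;   /* ~6400 files, ~100MB total */
    char path[1024];

    mkdir("data", 0755);
    for (d = 0; d < ndirs; d++) {
        snprintf(path, sizeof(path), "data/d%03d", d);
        mkdir(path, 0755);
        for (f = 0; f < nfiles; f++) {
            snprintf(path, sizeof(path), "data/d%03d/f%04d", d, f);
            make_file(path);
        }
    }
    return 0;
}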
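And a similarly stripped-down reader, matching the naming above. The real program reads a precomputed subset of 3200 files and passes each file's exact size to read(); here a single 20KB read() covers any file, since none is larger. I invoke it as "./reader N" for N = 1 to 8 and compute bytes/sec over the whole run.

/* reader.c -- simplified sketch of the reader harness.  It picks
 * files at random from the data/dNNN/fNNNN layout built above. */
#include <sys/types.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define MAXSZ  (20 * 1024)
#define NREADS 3200

int
main(int argc, char **argv)
{
    int nprocs = (argc > 1) ? atoi(argv[1]) : 1;
    int ndirs = 64, nfiles = 100;   /* must match the builder */
    int p;

    for (p = 0; p < nprocs; p++) {
        if (fork() == 0) {
            char path[1024], buf[MAXSZ];
            int i, fd;

            srand(getpid());        /* per-child random stream */
            for (i = 0; i < NREADS / nprocs; i++) {
                snprintf(path, sizeof(path), "data/d%03d/f%04d",
                    rand() % ndirs, rand() % nfiles);
                fd = open(path, O_RDONLY);
                if (fd < 0) {
                    perror(path);
                    continue;
                }
                read(fd, buf, sizeof(buf)); /* whole file, one read() */
                close(fd);
            }
            _exit(0);
        }
    }
    while (wait(NULL) > 0)          /* wait for all readers */
        ;
    return 0;
}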