Date: Tue, 12 May 2009 12:26:45 +0200 From: "Paul B. Mahol" <onemda@gmail.com> To: Attila Nagy <bra@fsn.hu> Cc: freebsd-stable@freebsd.org Subject: Re: stat() takes 54 msec in a directory with 94k files (even with a big dirhash) Message-ID: <3a142e750905120326m165f4fdeld8d01eb305a1771c@mail.gmail.com> In-Reply-To: <4A09382F.5010109@fsn.hu> References: <4A09382F.5010109@fsn.hu>
next in thread | previous in thread | raw e-mail | index | archive | help
On 5/12/09, Attila Nagy <bra@fsn.hu> wrote: > Hello, > > I have a strange error on FreeBSD 7-STABLE (compiled on 7th May, just > few commits after the release, but an earlier kernel did the same). > > I'm doing several parallel rsyncs from a machine to another (let's call > them source and destination). The source contains maildirs, so there are > some directories with a (relatively) lot of files. > The source runs an earlier (around 6.2) FreeBSD and plain softupdates > mounted UFS2 file systems. > The destination has a bigger (UFS2) filesystem, on top of gjournal, > mounted as async. > > I've noticed that rsync sometimes stops moving data and the destination > machine gets sluggish. After some testing, I could catch the effect in > action (was not that hard, because it persists even for hours sometimes). > > top shows around 20% system activity (there are two quad core CPUs) and > 0% user. The WCPU field at rsync shows 100%. > > ktrace-ing the rsync process I can see this: > 31639 rsync 0.000004 CALL lstat(0x7fffffffab70,0x7fffffffaf70) > 31639 rsync 0.000004 NAMI > "hm33/00/16/uid/Maildir/new/1212536121.54673,S=3128" > 31639 rsync 0.054226 STRU struct stat {dev=100, ino=136943662, > mode=-rw------- , nlink=1, uid=999, gid=999, rdev=546942760, > atime=1241807071, stime=1212536121, ctime=1241807071, > birthtime=1212536121, size=3128, blksize=4096, blocks=8, flags=0x0 } > 31639 rsync 0.000013 RET lstat 0 > 31639 rsync 0.000018 CALL lstat(0x7fffffffab70,0x7fffffffaf70) > 31639 rsync 0.000004 NAMI > "hm33/00/16/uid/Maildir/new/1212537276.69702,S=4634" > 31639 rsync 0.054409 STRU struct stat {dev=100, ino=136943663, > mode=-rw------- , nlink=1, uid=999, gid=999, rdev=546942762, > atime=1241807071, stime=1212537276, ctime=1241807071, > birthtime=1212537276, size=4634, blksize=4096, blocks=12, flags=0x0 } > 31639 rsync 0.000013 RET lstat 0 > 31639 rsync 0.000020 CALL lstat(0x7fffffffab70,0x7fffffffaf70) > 31639 rsync 0.000005 NAMI > "hm33/00/16/uid/Maildir/new/1212537689.74390,S=3172" > 31639 rsync 0.054230 STRU struct stat {dev=100, ino=136943664, > mode=-rw------- , nlink=1, uid=999, gid=999, rdev=546942765, > atime=1241807071, stime=1212537689, ctime=1241807071, > birthtime=1212537689, size=3172, blksize=4096, blocks=8, flags=0x0 } > 31639 rsync 0.000013 RET lstat 0 > > So according to ktrace, the stat call takes 54 milliseconds to return > for each of the files. > I have tried with the default and a pretty much raised dirhash maxmem > value, but I can still get these. > Currently I have: > vfs.ufs.dirhash_docheck: 0 > vfs.ufs.dirhash_mem: 18589428 > vfs.ufs.dirhash_maxmem: 209715200 > vfs.ufs.dirhash_minsize: 2560 > So dirhash has space to expand. > > The directory in question contains 94493 files. > > The source machine doesn't show this behaviour. > > top's output on the destination machine: > CPU: 0.0% user, 0.0% nice, 22.7% system, 0.0% interrupt, 77.3% idle > Mem: 159M Active, 3032M Inact, 599M Wired, 47M Cache, 399M Buf, 102M Free > Swap: 4096M Total, 4096M Free > > PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND > 31639 root 1 118 0 50648K 10512K CPU0 0 2:01 100.00% rsync > 634 root 1 -4 0 2536K 628K vlruwk 1 0:20 0.00% supervise > 26760 root 1 44 0 25940K 3316K select 1 0:10 0.00% sshd > 31640 root 1 75 0 87512K 8324K suspfs 4 0:10 0.00% rsync > 31641 root 1 75 0 18904K 7124K suspfs 6 0:10 0.00% rsync > 31637 root 1 75 0 40408K 7744K suspfs 4 0:09 0.00% rsync > 31636 root 1 44 0 20952K 6288K select 2 0:09 0.00% rsync > 31638 root 1 44 0 104M 8912K select 3 0:09 0.00% rsync > 31635 root 1 75 0 80344K 7812K suspfs 4 0:09 0.00% rsync > 31642 root 1 44 0 17940K 7624K select 1 0:04 0.00% ssh > 31646 root 1 45 0 17940K 7656K select 1 0:03 0.00% ssh > > All of the rsyncs use the same file system, but with different top level > directories. During this, neither of the other rsyncs can run. > > Any ideas about what could be done to work around this? Big guess, maybe it updates atime? Try with noatime mount option. -- Paul
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?3a142e750905120326m165f4fdeld8d01eb305a1771c>