From owner-freebsd-stable@FreeBSD.ORG Tue May 12 09:06:22 2009 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B96901065675 for ; Tue, 12 May 2009 09:06:22 +0000 (UTC) (envelope-from bra@fsn.hu) Received: from people.fsn.hu (people.fsn.hu [195.228.252.137]) by mx1.freebsd.org (Postfix) with ESMTP id 16C828FC29 for ; Tue, 12 May 2009 09:06:21 +0000 (UTC) (envelope-from bra@fsn.hu) Message-ID: <4A09382F.5010109@fsn.hu> Date: Tue, 12 May 2009 10:49:51 +0200 From: Attila Nagy User-Agent: Thunderbird 2.0.0.21 (X11/20090318) MIME-Version: 1.0 To: freebsd-stable@freebsd.org X-Stationery: 0.4.8.14 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (people.fsn.hu [0.0.0.0]); Tue, 12 May 2009 10:49:55 +0200 (CEST) Subject: stat() takes 54 msec in a directory with 94k files (even with a big dirhash) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 May 2009 09:06:23 -0000 Hello, I have a strange error on FreeBSD 7-STABLE (compiled on 7th May, just few commits after the release, but an earlier kernel did the same). I'm doing several parallel rsyncs from a machine to another (let's call them source and destination). The source contains maildirs, so there are some directories with a (relatively) lot of files. The source runs an earlier (around 6.2) FreeBSD and plain softupdates mounted UFS2 file systems. The destination has a bigger (UFS2) filesystem, on top of gjournal, mounted as async. I've noticed that rsync sometimes stops moving data and the destination machine gets sluggish. After some testing, I could catch the effect in action (was not that hard, because it persists even for hours sometimes). top shows around 20% system activity (there are two quad core CPUs) and 0% user. The WCPU field at rsync shows 100%. ktrace-ing the rsync process I can see this: 31639 rsync 0.000004 CALL lstat(0x7fffffffab70,0x7fffffffaf70) 31639 rsync 0.000004 NAMI "hm33/00/16/uid/Maildir/new/1212536121.54673,S=3128" 31639 rsync 0.054226 STRU struct stat {dev=100, ino=136943662, mode=-rw------- , nlink=1, uid=999, gid=999, rdev=546942760, atime=1241807071, stime=1212536121, ctime=1241807071, birthtime=1212536121, size=3128, blksize=4096, blocks=8, flags=0x0 } 31639 rsync 0.000013 RET lstat 0 31639 rsync 0.000018 CALL lstat(0x7fffffffab70,0x7fffffffaf70) 31639 rsync 0.000004 NAMI "hm33/00/16/uid/Maildir/new/1212537276.69702,S=4634" 31639 rsync 0.054409 STRU struct stat {dev=100, ino=136943663, mode=-rw------- , nlink=1, uid=999, gid=999, rdev=546942762, atime=1241807071, stime=1212537276, ctime=1241807071, birthtime=1212537276, size=4634, blksize=4096, blocks=12, flags=0x0 } 31639 rsync 0.000013 RET lstat 0 31639 rsync 0.000020 CALL lstat(0x7fffffffab70,0x7fffffffaf70) 31639 rsync 0.000005 NAMI "hm33/00/16/uid/Maildir/new/1212537689.74390,S=3172" 31639 rsync 0.054230 STRU struct stat {dev=100, ino=136943664, mode=-rw------- , nlink=1, uid=999, gid=999, rdev=546942765, atime=1241807071, stime=1212537689, ctime=1241807071, birthtime=1212537689, size=3172, blksize=4096, blocks=8, flags=0x0 } 31639 rsync 0.000013 RET lstat 0 So according to ktrace, the stat call takes 54 milliseconds to return for each of the files. I have tried with the default and a pretty much raised dirhash maxmem value, but I can still get these. Currently I have: vfs.ufs.dirhash_docheck: 0 vfs.ufs.dirhash_mem: 18589428 vfs.ufs.dirhash_maxmem: 209715200 vfs.ufs.dirhash_minsize: 2560 So dirhash has space to expand. The directory in question contains 94493 files. The source machine doesn't show this behaviour. top's output on the destination machine: CPU: 0.0% user, 0.0% nice, 22.7% system, 0.0% interrupt, 77.3% idle Mem: 159M Active, 3032M Inact, 599M Wired, 47M Cache, 399M Buf, 102M Free Swap: 4096M Total, 4096M Free PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND 31639 root 1 118 0 50648K 10512K CPU0 0 2:01 100.00% rsync 634 root 1 -4 0 2536K 628K vlruwk 1 0:20 0.00% supervise 26760 root 1 44 0 25940K 3316K select 1 0:10 0.00% sshd 31640 root 1 75 0 87512K 8324K suspfs 4 0:10 0.00% rsync 31641 root 1 75 0 18904K 7124K suspfs 6 0:10 0.00% rsync 31637 root 1 75 0 40408K 7744K suspfs 4 0:09 0.00% rsync 31636 root 1 44 0 20952K 6288K select 2 0:09 0.00% rsync 31638 root 1 44 0 104M 8912K select 3 0:09 0.00% rsync 31635 root 1 75 0 80344K 7812K suspfs 4 0:09 0.00% rsync 31642 root 1 44 0 17940K 7624K select 1 0:04 0.00% ssh 31646 root 1 45 0 17940K 7656K select 1 0:03 0.00% ssh All of the rsyncs use the same file system, but with different top level directories. During this, neither of the other rsyncs can run. Any ideas about what could be done to work around this? Thanks,