From owner-freebsd-stable@FreeBSD.ORG Tue May 12 10:26:50 2009 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E29C5106566B for ; Tue, 12 May 2009 10:26:49 +0000 (UTC) (envelope-from onemda@gmail.com) Received: from fg-out-1718.google.com (fg-out-1718.google.com [72.14.220.156]) by mx1.freebsd.org (Postfix) with ESMTP id 796C48FC08 for ; Tue, 12 May 2009 10:26:46 +0000 (UTC) (envelope-from onemda@gmail.com) Received: by fg-out-1718.google.com with SMTP id e12so701142fga.12 for ; Tue, 12 May 2009 03:26:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:in-reply-to:references :date:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=REYZtF1dgvwTfZ+WWnmg2J58b/yRwfxCqsYDR2nL7cI=; b=dGGkSITVoG6t6dPI12YALGg++PU8/whKAMrWD/uSwgr00WA4GfG4o1YqBaSiqq4yDu gfLfyrad79FPLQ1fm/4o5VPE+Ux8NIpuwuvhV1ol3yDbmfZDkqswLxFEAY6yc4fFtV4R TMmYozAPzyVCgYGudmpBAaq0kkSbC0hhhuTHo= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=MKbC7BNqXYqz/+rVEMX7UpZogsMQhLMqhCqoL2QHeOKBuK59yW7jmzh2CQWHm1iY6B x5CsU9DyNSV7YIYRWWfDTe+bd9R4PVbqAmlAhAlwhTEC8B43g8iyXgyt3JSuKzhvDnOk /gR78TUE/HIjwlFD+3VObrkumAZF9R9eZaY5g= MIME-Version: 1.0 Received: by 10.239.154.83 with SMTP id d19mr645719hbc.33.1242124005288; Tue, 12 May 2009 03:26:45 -0700 (PDT) In-Reply-To: <4A09382F.5010109@fsn.hu> References: <4A09382F.5010109@fsn.hu> Date: Tue, 12 May 2009 12:26:45 +0200 Message-ID: <3a142e750905120326m165f4fdeld8d01eb305a1771c@mail.gmail.com> From: "Paul B. Mahol" To: Attila Nagy Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-stable@freebsd.org Subject: Re: stat() takes 54 msec in a directory with 94k files (even with a big dirhash) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 May 2009 10:26:50 -0000 On 5/12/09, Attila Nagy wrote: > Hello, > > I have a strange error on FreeBSD 7-STABLE (compiled on 7th May, just > few commits after the release, but an earlier kernel did the same). > > I'm doing several parallel rsyncs from a machine to another (let's call > them source and destination). The source contains maildirs, so there are > some directories with a (relatively) lot of files. > The source runs an earlier (around 6.2) FreeBSD and plain softupdates > mounted UFS2 file systems. > The destination has a bigger (UFS2) filesystem, on top of gjournal, > mounted as async. > > I've noticed that rsync sometimes stops moving data and the destination > machine gets sluggish. After some testing, I could catch the effect in > action (was not that hard, because it persists even for hours sometimes). > > top shows around 20% system activity (there are two quad core CPUs) and > 0% user. The WCPU field at rsync shows 100%. > > ktrace-ing the rsync process I can see this: > 31639 rsync 0.000004 CALL lstat(0x7fffffffab70,0x7fffffffaf70) > 31639 rsync 0.000004 NAMI > "hm33/00/16/uid/Maildir/new/1212536121.54673,S=3128" > 31639 rsync 0.054226 STRU struct stat {dev=100, ino=136943662, > mode=-rw------- , nlink=1, uid=999, gid=999, rdev=546942760, > atime=1241807071, stime=1212536121, ctime=1241807071, > birthtime=1212536121, size=3128, blksize=4096, blocks=8, flags=0x0 } > 31639 rsync 0.000013 RET lstat 0 > 31639 rsync 0.000018 CALL lstat(0x7fffffffab70,0x7fffffffaf70) > 31639 rsync 0.000004 NAMI > "hm33/00/16/uid/Maildir/new/1212537276.69702,S=4634" > 31639 rsync 0.054409 STRU struct stat {dev=100, ino=136943663, > mode=-rw------- , nlink=1, uid=999, gid=999, rdev=546942762, > atime=1241807071, stime=1212537276, ctime=1241807071, > birthtime=1212537276, size=4634, blksize=4096, blocks=12, flags=0x0 } > 31639 rsync 0.000013 RET lstat 0 > 31639 rsync 0.000020 CALL lstat(0x7fffffffab70,0x7fffffffaf70) > 31639 rsync 0.000005 NAMI > "hm33/00/16/uid/Maildir/new/1212537689.74390,S=3172" > 31639 rsync 0.054230 STRU struct stat {dev=100, ino=136943664, > mode=-rw------- , nlink=1, uid=999, gid=999, rdev=546942765, > atime=1241807071, stime=1212537689, ctime=1241807071, > birthtime=1212537689, size=3172, blksize=4096, blocks=8, flags=0x0 } > 31639 rsync 0.000013 RET lstat 0 > > So according to ktrace, the stat call takes 54 milliseconds to return > for each of the files. > I have tried with the default and a pretty much raised dirhash maxmem > value, but I can still get these. > Currently I have: > vfs.ufs.dirhash_docheck: 0 > vfs.ufs.dirhash_mem: 18589428 > vfs.ufs.dirhash_maxmem: 209715200 > vfs.ufs.dirhash_minsize: 2560 > So dirhash has space to expand. > > The directory in question contains 94493 files. > > The source machine doesn't show this behaviour. > > top's output on the destination machine: > CPU: 0.0% user, 0.0% nice, 22.7% system, 0.0% interrupt, 77.3% idle > Mem: 159M Active, 3032M Inact, 599M Wired, 47M Cache, 399M Buf, 102M Free > Swap: 4096M Total, 4096M Free > > PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND > 31639 root 1 118 0 50648K 10512K CPU0 0 2:01 100.00% rsync > 634 root 1 -4 0 2536K 628K vlruwk 1 0:20 0.00% supervise > 26760 root 1 44 0 25940K 3316K select 1 0:10 0.00% sshd > 31640 root 1 75 0 87512K 8324K suspfs 4 0:10 0.00% rsync > 31641 root 1 75 0 18904K 7124K suspfs 6 0:10 0.00% rsync > 31637 root 1 75 0 40408K 7744K suspfs 4 0:09 0.00% rsync > 31636 root 1 44 0 20952K 6288K select 2 0:09 0.00% rsync > 31638 root 1 44 0 104M 8912K select 3 0:09 0.00% rsync > 31635 root 1 75 0 80344K 7812K suspfs 4 0:09 0.00% rsync > 31642 root 1 44 0 17940K 7624K select 1 0:04 0.00% ssh > 31646 root 1 45 0 17940K 7656K select 1 0:03 0.00% ssh > > All of the rsyncs use the same file system, but with different top level > directories. During this, neither of the other rsyncs can run. > > Any ideas about what could be done to work around this? Big guess, maybe it updates atime? Try with noatime mount option. -- Paul