From owner-freebsd-current@FreeBSD.ORG  Sat Jan  1 16:16:57 2011
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: current@freebsd.org
Received: by hub.freebsd.org (Postfix, from userid 1233)
	id 0D6BB1065672; Sat,  1 Jan 2011 16:16:57 +0000 (UTC)
Date: Sat, 1 Jan 2011 16:16:57 +0000
From: Alexander Best <arundel@freebsd.org>
To: Beat =?iso-8859-15?Q?G=E4tzi?= <beat@chruetertee.ch>
Message-ID: <20110101161657.GA16319@freebsd.org>
References: <4D1F1AE8.5040704@chruetertee.ch>
	<20110101151008.GA7762@freebsd.org>
	<4D1F4A48.6080604@chruetertee.ch>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <4D1F4A48.6080604@chruetertee.ch>
Cc: current@freebsd.org
Subject: Re: Suddenly slow lstat syscalls on CURRENT from Juli
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 01 Jan 2011 16:16:57 -0000

On Sat Jan  1 11, Beat Gätzi wrote:
> On 01.01.2011 16:10, Alexander Best wrote:
> > On Sat Jan  1 11, Beat Gätzi wrote:
> >> Hi,
> >>
> >> Since a couple of days commands like ls(1) are very slow on one of my
> >> tinderboxes. Checking with ktrace/kdump I see that the lstat syscall
> >> takes about one second:
> >>
> >>  70559 ls       0.004644 CALL  lstat(0x284472f8,0x28447298)
> >>  70559 ls       0.004651 NAMI  "Mk"
> >>  70559 ls       0.004664 STRU  struct stat {... }
> >>  70559 ls       0.004672 RET   lstat 0
> >>  70559 ls       0.004688 CALL  lstat(0x2844a1b8,0x2844a158)
> >>  70559 ls       0.004696 NAMI  "README,v"
> >>  70559 ls       1.004245 STRU  struct stat {...}
> >>  70559 ls       1.004263 RET   lstat 0
> >>  70559 ls       1.004277 CALL  lstat(0x2844a2b8,0x2844a258)
> >>  70559 ls       1.004286 NAMI  ".cvsignore,v"
> >>  70559 ls       2.004282 STRU  struct stat {...}
> >>  70559 ls       2.004302 RET   lstat 0
> >>  70559 ls       2.004316 CALL  lstat(0x2844a3b8,0x2844a358)
> >>  70559 ls       2.004325 NAMI  "CHANGES,v"
> >>  70559 ls       3.004275 STRU  struct stat {...}
> >>  70559 ls       3.004296 RET   lstat 0
> >>  70559 ls       3.004310 CALL  lstat(0x2844a4b8,0x2844a458)
> >>  70559 ls       3.004318 NAMI  "COPYRIGHT,v"
> >>  70559 ls       4.004300 STRU  struct stat {...}
> >>  70559 ls       4.004318 RET   lstat 0
> >>
> >> When i immediately re-execute the command in the same directory it looks
> >> normal:
> >>
> >>  78007 ls       0.004665 CALL  lstat(0x284472f8,0x28447298)
> >>  78007 ls       0.004673 NAMI  "Mk"
> >>  78007 ls       0.004686 STRU  struct stat {...}
> >>  78007 ls       0.004693 RET   lstat 0
> >>  78007 ls       0.004708 CALL  lstat(0x2844a1b8,0x2844a158)
> >>  78007 ls       0.004715 NAMI  "README,v"
> >>  78007 ls       0.004728 STRU  struct stat {...}
> >>  78007 ls       0.004735 RET   lstat 0
> >>  78007 ls       0.004742 CALL  lstat(0x2844a2b8,0x2844a258)
> >>  78007 ls       0.004749 NAMI  ".cvsignore,v"
> >>  78007 ls       0.004761 STRU  struct stat {...}
> >>  78007 ls       0.004768 RET   lstat 0
> >>  78007 ls       0.004775 CALL  lstat(0x2844a3b8,0x2844a358)
> >>  78007 ls       0.004782 NAMI  "CHANGES,v"
> >>  78007 ls       0.004795 STRU  struct stat {...}
> >>  78007 ls       0.004802 RET   lstat 0
> >>  78007 ls       0.004809 CALL  lstat(0x2844a4b8,0x2844a458)
> >>  78007 ls       0.004817 NAMI  "COPYRIGHT,v"
> >>  78007 ls       0.004828 STRU  struct stat {...}
> >>  78007 ls       0.004835 RET   lstat 0
> >>
> >> System is a HP DL360 G3 with a Compaq Smart Array 5i controller. There
> >> is no evidence of a RAID controller or hard disk problem in the system
> >> logs. Also writing to a file and reading from a file looks normal.
> >> The running kernel is a GENERIC kernel from Juli without debugging options:
> >>
> >> # uname -a
> >> FreeBSD tinderbox.chruetertee.ch 9.0-CURRENT FreeBSD 9.0-CURRENT #5
> >> r209980: Tue Jul 13 11:25:50 CEST 2010
> >> root@tinderbox.chruetertee.ch:/usr/obj/usr/home/beat/dev/src/head/sys/BEASTIE
> >>  i386
> > 
> > no problems here.
> 
> I haven't had this problem the first ~166 days since I rebooted the box
> the last time. Then the problem suddenly occur in all directories on all
> partitions. The box was pretty much under load all the time as it builds
> packages almost 7x24. I forgot to say the used filesystem is UFS.
> 
> # diskinfo -ct da0
> da0
>         512             # sectorsize
>         299992412160    # mediasize in bytes (279G)
>         585922680       # mediasize in sectors
>         0               # stripesize
>         0               # stripeoffset
>         36472           # Cylinders according to firmware.
>         255             # Heads according to firmware.
>         63              # Sectors according to firmware.
>                         # Disk ident.
> 
> I/O command overhead:
>         time to read 10MB block     0.333161 sec  =    0.016 msec/sector
>         time to read 20480 sectors  3.392359 sec  =    0.166 msec/sector
>         calculated command overhead               =    0.149 msec/sector
> 
> Seek times:
>         Full stroke:      250 iter in   1.202862 sec =    4.811 msec
>         Half stroke:      250 iter in   1.120656 sec =    4.483 msec
>         Quarter stroke:   500 iter in   2.109077 sec =    4.218 msec
>         Short forward:    400 iter in   1.892342 sec =    4.731 msec
>         Short backward:   400 iter in   1.399378 sec =    3.498 msec
>         Seq outer:       2048 iter in   0.399352 sec =    0.195 msec
>         Seq inner:       2048 iter in   0.385460 sec =    0.188 msec
> Transfer rates:
>     outside:       102400 kbytes in   2.325967 sec =    44025 kbytes/sec
>     middle:        102400 kbytes in   2.157681 sec =    47458 kbytes/sec
>     inside:        102400 kbytes in   2.717089 sec =    37687 kbytes/sec
> 
> I also observed that the cpu output of vmstat is a little bit confusing
> on this system:
> 
> # vmstat
>  procs  memory     page                   disk   faults         cpu
>  r b w  avm  fre   flt  re  pi  po    fr  sr da0   in   sy   cs us sy id
>  0 0 0  1663M 1120M 117 0   1   0   120  21   0  172  27 245 -29 -10 139

yeah that bug really needs to be fixed. it occurs on almost all systems with
high load and high uptime (> 100 days).

see http://www.freebsd.org/cgi/query-pr.cgi?pr=bin/30360

cheers.
alex

> 
> Thanks,
> Beat

-- 
a13x