From: Francisco Reyes
To: Kostik Belousov
Cc: freebsd-stable@freebsd.org
Date: Thu, 29 Jun 2006 13:38:54 -0400
Subject: Re: 6.1-R ? 6-Stable ? 5.5-R ?

Kostik Belousov writes:

>> > Approved by: pjd (mentor)
>> >
>> >   Revision   Changes  Path
>> >   1.156.2.3  +16 -0   src/sys/nfsserver/nfs_serv.c
>> >   1.136.2.3  +4 -0    src/sys/nfsserver/nfs_srvsubs.c
>>
>> The above files are what I have.
Yes, from a 6.1-STABLE around 6-25-06.

> What does this mean? That you have _these_ revisions of the files,
> and your LA skyrocketed?

LA = load average?

Our problem is the 'b' column in vmstat growing, and NFS causing locks on
the server side. When the machine locked up, it was running a background
fsck. I saw "Giant" a lot in the state of the nfsd processes.

I really wonder whether 6.1 is ready for production under heavy load. And
the NFS client in the whole 6.x line certainly seems problematic (see my
post on the stable list under the subject "NFS clients freeze and can not
disconnect").

As for vmstat, about the only thing that even remotely appears to be doing
any work is NFS. For instance, I found this command in another thread:

ps ax -O ppid,flags,mwchan | awk '($6 ~ /^D/ || $6 == "STAT") && $3 !~ /^20.$/'

On the machine in question it shows:

  PID  PPID    F MWCHAN TT  STAT     TIME COMMAND
16124 16123    0 biowr  ??  D    46:24.76 nfsd: server (nfsd)
16125 16123    0 biowr  ??  D    16:05.58 nfsd: server (nfsd)
16126 16123    0 biowr  ??  D    11:05.53 nfsd: server (nfsd)
16127 16123    0 biowr  ??  D     8:01.21 nfsd: server (nfsd)
16128 16123    0 biowr  ??  D     6:19.15 nfsd: server (nfsd)
16129 16123    0 biowr  ??  D     5:01.27 nfsd: server (nfsd)
16130 16123    0 biowr  ??  D     3:55.56 nfsd: server (nfsd)
16131 16123    0 biowr  ??  D     3:13.11 nfsd: server (nfsd)
16132 16123    0 biowr  ??  D     2:43.26 nfsd: server (nfsd)
16133 16123    0 biowr  ??  D     2:16.40 nfsd: server (nfsd)
16134 16123    0 biowr  ??  D     1:57.00 nfsd: server (nfsd)
16135 16123    0 biowr  ??  D     1:41.02 nfsd: server (nfsd)
16136 16123    0 biowr  ??  D     1:27.07 nfsd: server (nfsd)
16137 16123    0 biowr  ??  D     1:15.25 nfsd: server (nfsd)
16138 16123    0 biowr  ??  D     1:06.54 nfsd: server (nfsd)
16139 16123    0 biowr  ??  D     0:57.57 nfsd: server (nfsd)
16140 16123    0 biowr  ??  D     0:50.65 nfsd: server (nfsd)
16141 16123    0 biowr  ??  D     0:44.60 nfsd: server (nfsd)
16142 16123    0 biowr  ??  D     0:38.29 nfsd: server (nfsd)
16143 16123    0 biowr  ??  D     0:34.21 nfsd: server (nfsd)
16144 16123    0 biowr  ??  D     0:29.34 nfsd: server (nfsd)
16145 16123    0 biowr  ??  D     0:26.35 nfsd: server (nfsd)
16146 16123    0 biowr  ??  D     0:22.25 nfsd: server (nfsd)
16147 16123    0 biowr  ??  D     0:18.17 nfsd: server (nfsd)
16148 16123    0 biowr  ??  D     0:15.95 nfsd: server (nfsd)
16149 16123    0 biowr  ??  D     0:13.66 nfsd: server (nfsd)
16150 16123    0 biowr  ??  D     0:10.81 nfsd: server (nfsd)
16151 16123    0 biowr  ??  D     0:08.92 nfsd: server (nfsd)
16152 16123    0 biowr  ??  D     0:06.82 nfsd: server (nfsd)
16153 16123    0 biowr  ??  D     0:05.16 nfsd: server (nfsd)
84338 10043 4100 ufs    ??  D     0:02.00 qmgr -l -t fifo -u
91632 10043 4100 biowr  ??  D     0:00.02 cleanup -z -t unix -u
91650 10043 4100 ufs    ??  D     0:00.04 [smtpd]
91912 86635 4100 biowr  ??  Ds    0:00.01 /usr/local/bin/maildrop -d cathy@sitescape.com
91916 90579 4100 biowr  ??  Ds    0:00.01 /usr/local/bin/maildrop -d jobs@sitescape.com
71677 71672 4002 ppwait p1  D     0:00.15 -su (csh)

The iostat for that machine shows:

iostat 5
   tty         da0             pass0            cpu
 tin tout  KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
   0  130 15.35 109  1.63   0.00   0  0.00   6  0  6  1 87
   0   36 10.43 230  2.34   0.00   0  0.00   3  0  2  1 93
   0   12 10.81 280  2.96   0.00   0  0.00   6  0  2  0 92
   0   12 13.03 259  3.30   0.00   0  0.00   0  0  1  1 98
   0   12 12.87 259  3.26   0.00   0  0.00   5  0  2  1 91
   0   12 17.17 228  3.82   0.00   0  0.00   8  0  3  1 87
   0   12 18.38 306  5.49   0.00   0  0.00   3  0  2  1 94
   0   12 14.53 284  4.04   0.00   0  0.00   6  0  3  1 89
   0   12 26.03 213  5.41   0.00   0  0.00   5  0  3  2 91

Before this machine went into production, I saw it do 700+ tps and
substantially more MB/s during the stress test. We also have another
machine with identical hardware; although its tps is 50 to 100 lower than
this one's, it always stays very low in the 'b' column.

I am now trying to read up on vmstat
to see if I can spot anything wrong in the output of vmstat -s:

1660720108 cpu context switches
 736683712 device interrupts
  46973243 software interrupts
  99310719 traps
3405487756 system calls
        46 kernel threads created
    385149 fork() calls
      7785 vfork() calls
         0 rfork() calls
      2809 swap pager pageins
      4449 swap pager pages paged in
      2027 swap pager pageouts
      4609 swap pager pages paged out
      5068 vnode pager pageins
     20399 vnode pager pages paged in
         0 vnode pager pageouts
         0 vnode pager pages paged out
      2156 page daemon wakeups
  58310018 pages examined by the page daemon
     12161 pages reactivated
  21541481 copy-on-write faults
      3659 copy-on-write optimized faults
  38628563 zero fill pages zeroed
  30430314 zero fill pages prezeroed
      5780 intransit blocking page faults
  79476476 total VM faults taken
         0 pages affected by kernel thread creation
  30747781 pages affected by fork()
   3054182 pages affected by vfork()
         0 pages affected by rfork()
 152627514 pages freed
         6 pages freed by daemon
  35726176 pages freed by exiting processes
     51914 pages active
    810514 pages inactive
     47456 pages in VM cache
     56444 pages wired down
     24779 pages free
      4096 bytes per page
 184453449 total name lookups
           cache hits (67% pos + 6% neg) system 2% per-directory
           deletions 6%, falsehits 0%, toolong 0%

root@mailstore12.simplicato.com:~/bin#uptime
Uptime: 1:35PM up 3 days, 14:48, 3 users, load averages: 0.26, 0.36, 0.29
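To put the memory page counters in the vmstat -s output into more familiar units, a minimal Python sketch like the one below could help. It is only an illustration, not part of FreeBSD's tooling: it assumes the "count description" line format shown above, takes the page size from the "bytes per page" line, and the helper names (parse_vmstat_s, page_usage_mib, MEMORY_LINES) are hypothetical.

```python
import re

# Hypothetical sample: the memory-state lines from the vmstat -s output
# quoted above, plus the page-size line used to convert them.
SAMPLE = """\
     51914 pages active
    810514 pages inactive
     47456 pages in VM cache
     56444 pages wired down
     24779 pages free
      4096 bytes per page
"""

# Only these lines describe where pages currently are; other "pages ..."
# lines in vmstat -s (e.g. "pages freed") are cumulative event counters.
MEMORY_LINES = (
    "pages active",
    "pages inactive",
    "pages in VM cache",
    "pages wired down",
    "pages free",
)

# Each counter line is a number followed by its description.
LINE_RE = re.compile(r"^\s*(\d+)\s+(.+?)\s*$")


def parse_vmstat_s(text):
    """Return a {description: count} dict from vmstat -s style text."""
    counters = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            counters[match.group(2)] = int(match.group(1))
    return counters


def page_usage_mib(counters):
    """Convert the memory-state page counts to MiB using the reported page size."""
    page_size = counters["bytes per page"]
    return {name: counters[name] * page_size / 2**20 for name in MEMORY_LINES}


if __name__ == "__main__":
    usage = page_usage_mib(parse_vmstat_s(SAMPLE))
    for name, mib in usage.items():
        print(f"{name:20s} {mib:10.1f} MiB")
    print(f"{'total':20s} {sum(usage.values()):10.1f} MiB")
```

With the numbers above, the inactive pages alone come to about 3166 MiB (810514 x 4096 bytes) out of roughly 3872 MiB across the five categories. The explicit MEMORY_LINES list is used instead of matching every line that starts with "pages", because the full vmstat -s output mixes resident-page counts with cumulative counters.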