From owner-freebsd-performance@FreeBSD.ORG Wed Apr 28 14:27:07 2004
From: Michael Conlen
To: freebsd-performance@freebsd.org
Date: Wed, 28 Apr 2004 17:27:04 -0400
Subject: NFS Server

I've got an NFS server that's under some heavy load. It holds the web pages, images and videos for a cluster of servers doing about 40 Mbit/sec of traffic (and 160 requests/second). The NFS server has been doing about 40 Mbit/sec in and about 10 Mbit/sec out as daily averages for over 45 days, and everything runs well.

Today I noticed that at midnight *exactly* the interrupt time went through the roof on the system (from 5% to 20%). I checked out the system and noticed that it's actually going to the disks a lot: 2-7 MB/sec of disk usage in systat -vmstat.

My first thought was that something had the inactive pages hosed, so I made a 2 GB file (dd if=/dev/zero of=foo bs=1024k count=2048), removed it, and ran sync; sync; sync. Just like magic, the inactive page count vaporized as expected.
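To put the figures above on one scale, here is a small sketch in Python. It is not part of the original setup, and the function name is made up for illustration. It converts the quoted network averages to MB/sec so they can be set next to the 2-7 MB/sec of disk activity, assuming plain decimal units with 8 bits per byte.

```python
def mbit_to_mbyte_per_sec(mbit_per_sec):
    """Convert a rate in Mbit/sec to MB/sec (8 bits per byte)."""
    return mbit_per_sec / 8.0

# Daily averages quoted above for the NFS server.
nfs_in = mbit_to_mbyte_per_sec(40)
nfs_out = mbit_to_mbyte_per_sec(10)

print(f"NFS in:  {nfs_in:.2f} MB/sec")
print(f"NFS out: {nfs_out:.2f} MB/sec")
print("Disk:    2-7 MB/sec (as read from systat -vmstat)")
```

The network numbers are daily averages and the disk number is a momentary reading, so this is only a rough comparison.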
The disk usage is the same as it was when there was 1.6 GB of inactive pages. After running for about half an hour, the system still doesn't have much inactive page use. I've included systat -vmstat output below, though it's difficult to read. The main thing is that there's about 3500 KB of inactive pages on a system doing 2-7 MB/sec of disk activity, mostly read operations (despite the network traffic, which I think is due to caching).

Now, the whole system performs like magic right now, so I'm not too worried about it until I dump another 80 Mbit/sec of web traffic and 100 GB more files onto the system. At that time I plan to jump to 4 GB of memory, with the idea that the extra memory used for inactive pages means less disk I/O than there would otherwise be, but today's activity has me puzzled.

The only thing in the whole system that might cause this is the backup process, which kicks off at... ...midnight! The catch is that it's been kicking off every midnight for weeks and it's never affected the CPU. The current backup process is (don't shoot me, please) that I mount the filesystems on another server and rsync them on that server to local filesystems. The process ran and finished as normal. The backup server has since been rebooted (to address other needs) and is fine.

Any thoughts as to why I've lost my inactive pages and have gone straight to disk for all operations?
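For anyone who wants the memory summary quoted just below as numbers, here is a minimal Python sketch. It is an illustration only: the function name is hypothetical, and it assumes the K/M/G suffixes are binary multiples. It splits the Mem: line into labelled byte counts.

```python
import re

# Assumption: suffixes are binary multiples (K = 1024 bytes, and so on).
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

def parse_mem_line(line):
    """Return {label: bytes} for a line like 'Mem: 16M Active, 3496K Inact'."""
    fields = {}
    for number, unit, label in re.findall(r"(\d+)([KMG])\s+(\w+)", line):
        fields[label] = int(number) * UNITS[unit]
    return fields

mem_line = "Mem: 16M Active, 3496K Inact, 270M Wired, 92K Cache, 199M Buf, 1719M Free"
mem = parse_mem_line(mem_line)

# Print each category in MB for easier comparison.
for label, size in mem.items():
    print(f"{label:<7}{size / 1024 ** 2:10.1f} MB")
```

Seen this way, the inactive pages come to a few MB while well over 1.6 GB sits free.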
Having written all this, the page count is still

Mem: 16M Active, 3496K Inact, 270M Wired, 92K Cache, 199M Buf, 1719M Free
Swap: 4079M Total, 48K Used, 4079M Free

What follows is systat -vmstat output:

4 users  Load 1.46 1.28 1.16  Apr 28 17:14
Mem:KB  REAL  VIRTUAL  VN PAGER  SWAP PAGER
Tot  Share  Tot  Share  Free  in out  in out
Act  6096  2652  16900  3956  1790332  count
All  266596  3884  2431192  8064  pages
Interrupts
Proc:r p d s w  Csw Trp Sys Int Sof Flt  cow  8617 total
15 11  2640 6 101 8617 42 7  246924 wire  8389 mux irq11
16236 act  ata1 irq15
22.5%Sys 23.2%Intr 0.0%User 0.0%Nice 54.2%Idl  3344 inact  fdc0 irq6
| | | | | | | | | |  92 cache  atkbd0 irq
===========++++++++++++  1790240 free  ppc0 irq7
daefr  100 clk irq0
Namei  Name-cache  Dir-cache  prcfr  128 rtc irq8
Calls  hits %  hits %  react
1050  1050 100  pdwake
zfod  pdpgs
Disks  aacd0  acd0  md0  ofod  intrn
KB/t  16.26  0.00  0.00  %slo-z  204096 buf
tps  467  0  0  1734 tfree  219 dirtybuf
MB/s  7.41  0.00  0.00  134716 desiredvnodes
% busy  15  0  0  121459 numvnodes
118582 freevnodes

From owner-freebsd-performance@FreeBSD.ORG Sat May 1 05:08:25 2004
From: James William Pye
To: Tim Traver
Cc: freebsd-performance@freebsd.org
Date: Sat, 1 May 2004 05:08:24 -0700
Subject: Re: shmem release
Message-ID: <20040501120823.GB510@void.ph.cox.net>
In-Reply-To: <6.0.1.1.0.20040330113631.01ef7ec0@mail1.simplenet.com>

A little late with the reply, but I've had (iirc) a similar issue before, and again not more than a few days ago. The problem seems to be with world being somehow out of sync with the kernel; I frequently install a newer kernel without touching world, and this issue has reared its ugly little head to me twice now. For me, just running make buildworld installworld squashes it.

On 03/30/04:13/2, Tim Traver wrote:
> shmget() failed: No space left on device

--
Regards, James William Pye