From: Jeff Kramer
To: freebsd-questions@freebsd.org
Date: Fri, 6 Aug 2004 02:16:50 -0500
Subject: identifying and fixing server I/O slowdowns

Oh great and wise FreeBSD gurus,

I've been running FreeBSD boxes for about five years with great results (up to six machines at the moment), but recently one of them has started to seriously act up. Every time a heavy disk operation occurs (say, tar'ing a 1 gig directory), the system slows to a crawl, and requests to the Apache/PHP/MySQL sites hosted on it just hang.

The system is a dual P3 1.13 GHz box with a gig of RAM and mirrored 80 gig WD800BB drives on a Promise TX2 controller. The RAID isn't degraded. There's a dedicated 1.5 gig swap partition and a swap file on the /usr partition. We had some Apache processes go nuts one time, which is why I added the swap file. We run about 15 jails on the machine, with MySQL in the server proper and Apache/PHP running inside the jails.
I initially thought it was a rogue process taking down the machine, but it seems that any heavy disk activity lasting more than a few minutes brings about the slowdown. It doesn't happen instantly, but after a minute or two things slow to a crawl. I've recompiled the kernel a few times, upgraded to the latest 4-STABLE rev, and even turned on device polling, but nothing seems to help. It doesn't seem to happen on another machine we have with identical hardware.

My sysctl.conf:

kern.ipc.somaxconn=4096
net.inet.tcp.sendspace=32768
net.inet.tcp.recvspace=32768
net.inet.icmp.drop_redirect=1
net.inet.icmp.log_redirect=1
net.inet.ip.redirect=0
net.inet6.ip6.redirect=0
net.link.ether.inet.max_age=1200
net.inet.icmp.bmcastecho=0
net.inet.icmp.maskrepl=0
kern.maxfiles=65536
kern.ipc.shm_use_phys=1
kern.polling.enable=1

And a netstat -m:

301/928/131072 mbufs in use (current/peak/max):
        301 mbufs allocated to data
287/874/32768 mbuf clusters in use (current/peak/max)
1980 Kbytes allocated to network (2% of mb_map in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines

And here's a typical systat -v snapshot while the machine's 'ok':

3 users Load 0.32 0.38 0.31 Aug 6 00:03

Mem:KB REAL VIRTUAL VN PAGER SWAP PAGER
Tot Share Tot Share Free in out in out
Act 221588 38656 747652 117796 39404 count 4 3
All 1024156 41620 1546136 144132 pages 18 5
                                                                        Interrupts
Proc:r p d s w Csw Trp Sys Int Sof Flt          21 cow                  1156 total
2 2 70 343 63322119 1156 57 397                 186992 wire             fxp0 irq2
                                                623848 act              13 ohci0 irq9
4.4%Sys 1.0%Intr 2.5%User 0.0%Nice 92.1%Idl     176096 inact            11 mux irq10
| | | | | | | | | |                             37220 cache             fdc0 irq6
==+>                                            2184 free               1004 clk irq0
                                                daefr                   128 rtc irq8
Namei Name-cache Dir-cache                      15 prcfr
Calls hits % hits %                             5 react
126 125 99                                      pdwake
                                                340 zfod pdpgs
Disks ad4 ad6 fd0 md0                           119 ofod 1 intrn
KB/t 0.00 16.72 0.00 0.00                       34 %slo-z 114304 buf
tps 0 11 0 0                                    401 tfree 173 dirtybuf
MB/s 0.00 0.17 0.00 0.00                        70310 desiredvnodes
% busy 0 9 0 0                                  64089 numvnodes
                                                54829 freevnodes

And here's a systat -v snapshot while the machine's choking:

4 users Load 0.39 0.35 0.31 Aug 6 00:08

Mem:KB REAL VIRTUAL VN PAGER SWAP PAGER
Tot Share Tot Share Free in out in out
Act 191344 34248 728736 117268 51916 count 1 6
All 1024676 37500 2075520 144188 pages 2 67
                                                                        Interrupts
Proc:r p d s w Csw Trp Sys Int Sof Flt          29 cow                  1698 total
5 2 70 573 74423171 1699 225 367                180904 wire             fxp0 irq2
                                                640404 act              335 ohci0 irq9
5.7%Sys 1.9%Intr 7.5%User 0.0%Nice 84.9%Idl     153116 inact            236 mux irq10
| | | | | | | | | |                             50252 cache             fdc0 irq6
===+>>>>                                        1664 free               999 clk irq0
                                                daefr                   128 rtc irq8
Namei Name-cache Dir-cache                      93 prcfr
Calls hits % hits %                             1 react
8693 8196 94 12 0                               pdwake
                                                308 zfod 2693 pdpgs
Disks ad4 ad6 fd0 md0                           135 ofod intrn
KB/t 98.81 16.61 0.00 0.00                      43 %slo-z 114304 buf
tps 13 225 0 0                                  1277 tfree 278 dirtybuf
MB/s 1.23 3.64 0.00 0.00                        70310 desiredvnodes
% busy 2 99 0 0                                 64089 numvnodes
                                                52125 freevnodes

Thoughts? Is there any way to keep a single process from monopolizing the disk controller?

-- 
Jeff Kramer
jeffk@well.com
http://www.keika.org/
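
A quick cross-check on the disk figures in the two snapshots, offered as a sketch and not as part of the original message: the MB/s column should roughly equal KB/t times tps. The function name `throughput_mb_s` and the dictionary layout are made up for illustration; the numbers are copied from the systat -v output above, and 1 MB is assumed to be 1024 KB.

```python
# Figures copied from the two systat -v snapshots in the message.
# Each tuple is (KB per transfer, transfers per second, reported MB/s, % busy).
snapshots = {
    "ok": {
        "ad4": (0.00, 0, 0.00, 0),
        "ad6": (16.72, 11, 0.17, 9),
    },
    "choking": {
        "ad4": (98.81, 13, 1.23, 2),
        "ad6": (16.61, 225, 3.64, 99),
    },
}


def throughput_mb_s(kb_per_transfer, tps):
    """Throughput implied by transfer size and rate, in MB/s (1 MB = 1024 KB assumed)."""
    return kb_per_transfer * tps / 1024.0


if __name__ == "__main__":
    for state, disks in snapshots.items():
        for disk, (kbt, tps, reported, busy) in disks.items():
            implied = throughput_mb_s(kbt, tps)
            print(f"{state:8s} {disk}: implied {implied:.2f} MB/s, "
                  f"reported {reported:.2f} MB/s, {busy}% busy")
```

The implied and reported rates agree to within rounding, so the columns are at least self-consistent. On these figures ad6 is 99% busy while moving under 4 MB/s in transfers of roughly 16 KB, and ad4 is 2% busy during the same interval.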