From owner-freebsd-stable@FreeBSD.ORG Thu Dec 18 06:52:57 2008
Message-ID: <4949673B.2070701@elehost.com>
Date: Wed, 17 Dec 2008 15:55:23 -0500
From: Paul MacKenzie <paul@elehost.com>
To: freebsd-stable@freebsd.org
List-Id: Production branch of FreeBSD source code

> > [ns8]# vmstat -i
> > interrupt                          total       rate
> > irq4: sio0                         57065          0
> > irq17: em1                    3989494045        554
> > irq18: arcmsr0                 558098657         77
> > cpu0: timer                  14381393929       2000
> > irq256: em0                     22763077          3
> > cpu1: timer                  14381384902       1999
> > Total                        33333191675       4635
> > [ns8]#
> >
> > arcmsr0: mem 0xe8600000-0xe8600fff,0xe8000000-0xe83fffff irq 18 at device 14.0 on pci2
> > ARECA RAID ADAPTER0: Driver Version 1.20.00.15 2007-10-07
> > ARECA RAID ADAPTER0: FIRMWARE VERSION V1.43 2007-4-17
> > arcmsr0: [ITHREAD]
> > .....
> > Waiting 5 seconds for SCSI devices to settle
> > (probe16:arcmsr0:0:16:0): inquiry data fails comparison at DV1 step
> > da0 at arcmsr0 bus 0 target 0 lun 0
> > da0: Fixed Direct Access SCSI-5 device
> > da0: 166.666MB/s transfers (83.333MHz DT, offset 32, 16bit)
> > da0: 305175MB (624999424 512 byte sectors: 255H 63S/T 38904C)
> > SMP: AP CPU #1 Launched!

Hi, and thanks for your reply. I do not believe the interrupts are the problem at the moment, judging by the stats. Here is a vmstat -i taken most recently while system usage was spiking, just before httpd had to be killed to get things going again:

vmstat -i
interrupt                          total       rate
irq1: atkbd0                           2          0
irq4: sio0                         22880          0
irq14: ata0                           58          0
irq22: uhci1 uhci3                 18068          0
irq23: uhci0 uhci+                     1          0
irq26: arcmsr0                    496094         14
cpu0: timer                     61769334       1791
irq256: em0                            1          0
irq258: em2                        48505          1
irq259: em3                            1          0
cpu1: timer                     61762043       1791
cpu3: timer                     61299367       1777
cpu2: timer                     61299179       1777
cpu4: timer                     61326132       1778
cpu7: timer                     60845245       1764
cpu5: timer                     61326513       1778
cpu6: timer                     60845018       1764
Total                          491058441      14243

There are no errors in the event console from the areca-cli.

ARC-1130-VOL#00 Main Raid Array Raid1+0 1000.0GB 00/00/00 Normal
Main Raid Array 4 2000.0GB 0.0GB 1234 Normal
Main Processor   : 500MHz
CPU ICache Size  : 32KB
CPU DCache Size  : 32KB
CPU SCache Size  : 0KB
System Memory    : 1024MB/333MHz/ECC
Firmware Version : V1.44 2008-2-1
BOOT ROM Version : V1.44 2008-1-28

The buildworld taking a really long time was just an easy-to-quantify example of the problem I am seeing. If I run boxbackup, dump, clamscan, or a few other I/O-intensive jobs, everything gets VERY slow, even just reading files from the server. When httpd locks up (another symptom that, in my view, is connected to the same issue), this is what it looks like. It is almost as if httpd gets backed up and I need a plunger to clean out the blockage :) I have to kill it and then restart it to get things back to normal for a bit.
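As an aside, the quickest check I have found for this state is to filter ps(1) output for the wait channels that keep showing up in top (ufs, lockf, biord). This is only a rough sketch; the column keywords and the list of wait channels are my assumptions:

```shell
# List processes blocked on the filesystem: keep the ps header row plus
# any process whose wait channel is one of the UFS/lock states seen in top.
blocked_procs() {
    ps -ax -o pid,state,wchan,comm 2>/dev/null |
        awk 'NR == 1 || $3 ~ /^(ufs|lockf|biord|wdrain)$/'
}
blocked_procs
```

On a healthy box this prints little beyond the header; during a spike I would expect it to fill up with httpd entries, matching the STATE column in the top output.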
last pid: 46013;  load averages: 105.30, 67.67, 34.45    up 4+23:59:42  19:08:40
629 processes: 89 running, 540 sleeping
CPU: 21.9% user, 0.0% nice, 74.5% system, 3.1% interrupt, 0.4% idle
Mem: 1538M Active, 11G Inact, 898M Wired, 303M Cache, 214M Buf, 1346M Free
Swap: 8192M Total, 1036K Used, 8191M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
46000 www         1  65    0 86728K 15008K RUN    1   0:01 12.06% httpd
45994 www         1  56    0 86728K 15008K CPU1   3   0:01 10.16% httpd
46002 www         1  -4    0   150M 20648K RUN    3   0:00  6.98% httpd
45195 www         1  68    0   121M 19748K RUN    1   0:29  6.88% httpd
45991 www         1  53    0   150M 21060K select 3   0:01  6.59% httpd
45997 www         1  -4    0   150M 20992K ufs    5   0:01  6.59% httpd
45950 www         1  57    0   153M 23388K RUN    2   0:01  6.49% httpd
45999 www         1  -4    0   150M 20640K ufs    6   0:00  5.96% httpd
45189 www         1  66    0   161M 29660K RUN    6   0:26  5.76% httpd
45974 www         1  -4    0   151M 21564K ufs    3   0:01  5.76% httpd
45984 www         1  -4    0   151M 21376K ufs    5   0:01  5.66% httpd
45998 www         1  -4    0   150M 20652K ufs    3   0:00  5.57% httpd
45780 www         1  -4    0   153M 23516K ufs    6   0:06  5.37% httpd
45972 www         1  -4    0   151M 21332K RUN    4   0:01  5.37% httpd
46001 www         1  20    0   150M 20568K lockf  4   0:00  5.37% httpd
45425 www         1  60    0   164M 31516K RUN    7   0:15  5.18% httpd
45995 www         1  63    0   150M 20820K RUN    2   0:00  5.18% httpd
45845 www         1  -4    0   151M 21692K ufs    6   0:02  4.98% httpd
45854 www         1  52    0   151M 22080K CPU6   0   0:02  4.88% httpd
45977 root        1  47    0 10160K  3260K CPU2   6   0:01  4.88% top
45509 www         1  56    0   155M 25028K RUN    0   0:14  4.79% httpd
45735 www         1  -4    0   158M 27096K RUN    3   0:07  4.79% httpd
45730 www         1  20    0   151M 21728K lockf  2   0:04  4.79% httpd
45847 www         1  -4    0   150M 21092K RUN    5   0:02  4.69% httpd
85338 root        1  46    0   150M 20560K select 7   5:03  4.59% httpd
45835 www         1  -4    0   150M 21100K ufs    0   0:02  4.59% httpd
45443 www         1  -4    0   151M 22220K ufs    6   0:12  4.49% httpd
45699 www         1  -4    0   157M 26528K RUN    0   0:07  4.39% httpd
45722 www         1  -4    0   152M 22908K RUN    0   0:05  4.39% httpd
45701 www         1  -4    0   152M 22268K RUN    2   0:07  4.30% httpd
45849 www         1  -4    0   151M 21748K ufs    6   0:02  4.30% httpd
46010 nagios      1  -4    0  7684K  1136K ufs    5   0:00  4.30% check_ping
45515 www         1  -4    0   160M 29048K ufs    5   0:13  4.20% httpd
45606 www         1  -4    0   152M 22420K ufs    0   0:09  4.20% httpd

vfs.numvnodes: 355382
kern.maxvnodes: 400000
vfs.ufs.dirhash_docheck: 0
vfs.ufs.dirhash_mem: 3239015
vfs.ufs.dirhash_maxmem: 10485760
vfs.ufs.dirhash_minsize: 2560
kern.ipc.nsfbufs: 0
kern.openfiles: 3395
kern.maxfiles: 12328

Results from netstat -m
------------------------
1185/3360/4545 mbufs in use (current/cache/total)
1062/2856/3918/25600 mbuf clusters in use (current/cache/total/max)
1062/1556 mbuf+clusters out of packet secondary zone in use (current/cache)
10/1550/1560/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
2460K/12752K/15212K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
46262 requests for I/O initiated by sendfile
0 calls to protocol drain routines

Results from vmstat -m
------------------------
Type          InUse  MemUse  HighUse  Requests  Size(s)
cdev 22 6K - 22 256
acd_driver 1 2K - 1 2048
sigio 1 1K - 1626 64
filedesc 684 941K - 1199696 16,32,64,128,256,512,1024,2048,4096
kenv 68 11K - 70 16,32,64
kqueue 368 414K - 1740632 256,2048,4096
proc-args 52 4K - 5389885 16,32,64,128,256
ithread 99 19K - 99 32,128,256
acpisem 13 2K - 13 128
CAM queue 12 1K - 302 16,32,64,128,256
KTRACE 100 13K - 100 128
linker 45 4K - 71 16,32,64,128,512
lockf 314 38K - 16413112 64,128,256,512,1024,2048,4096
scsi_da 0 0K - 65 16
ip6ndp 7 1K - 7 64,128
ip6opt 1 1K - 50503 256
temp 66 222K - 6704801 16,32,64,128,256,512,1024,2048,4096
devbuf 16781 35476K - 108258 16,32,64,128,256,512,1024,2048,4096
CAM SIM 2 1K - 2 256
module 204 26K - 204 128
acpidev 78 5K - 78 64
mtx_pool 1 8K - 1
subproc 1111 1606K - 1045562 512,4096
proc 2 16K - 2
session 34 5K - 20772 128
pgrp 39 5K - 158890 128
cred 24950 6238K - 11839905 256
uidinfo 13 3K - 7337 64,2048
plimit 24 6K - 226179 256
CAM periph 7 2K - 45 16,32,64,128,256
sysctltmp 0 0K - 215050 16,32,64,128,256
sysctloid 4373 216K - 4373 16,32,64,128
sysctl 0 0K - 828292 16,32,64
umtx 1692 212K - 1692 128
p1003.1b 1 1K - 1 16
SWAP 2 1097K - 2 64
CAM XPT 51 24K - 19790153 32,64,128,256,1024
bus-sc 111 101K - 1879 16,32,64,128,256,512,1024,2048,4096
bus 804 77K - 5926 16,32,64,128,256,1024
devstat 10 21K - 10 32,4096
eventhandler 57 5K - 57 64,128
kobj 125 500K - 160 4096
kbdmux 6 9K - 6 16,256,512,2048,4096
rman 168 21K - 576 16,64,128
sbuf 0 0K - 840 16,32,64,128,256,512,1024,2048,4096
pci_link 16 2K - 16 16,128
stack 0 0K - 14 256
taskqueue 19 2K - 19 16,32,128
Unitno 16 1K - 22074 32,64
iov 0 0K - 12126863 16,64,128,256,512
ioctlops 0 0K - 388714 16,32,64,128,256,512,1024,2048
msg 4 30K - 4 2048,4096
sem 4 8K - 4 512,1024,2048,4096
shm 1 16K - 1
ttys 1170 169K - 80824 128,1024
ptys 5 2K - 5 256
accf 3 1K - 301 32,64
mbuf_tag 0 0K - 520852 32,128
pcb 47 158K - 1332310 16,32,128,1024,2048,4096
soname 187 23K - 10680643 16,32,128
biobuf 1 2K - 143707 2048
vfscache 1 1024K - 1
cl_savebuf 0 0K - 154293 64,128
vfs_hash 1 512K - 1
vnodes 2 1K - 3 32,256
vnodemarker 1 1K - 124142 512
mount 111 6K - 495 16,32,64,128,256,2048
acpi_perf 8 1K - 8 64
BPF 6 1K - 6 128
ether_multi 29 2K - 32 16,32,64
ifaddr 136 48K - 136 32,64,128,256,512,4096
ifnet 7 13K - 7 256,2048
clone 6 24K - 6 4096
arpcom 5 1K - 5 16
lo 1 1K - 1 32
acpica 3057 292K - 68659 16,32,64,128,256,512,1024
routetbl 303 86K - 1027 32,64,128,256,512
in_multi 4 1K - 4 64
IpFw/IpAcct 60 9K - 60 64,128,2048
sctp_iter 0 0K - 65 256
sctp_ifn 4 1K - 4 128
sctp_ifa 66 9K - 66 128
sctp_vrf 1 1K - 1 64
sctp_a_it 0 0K - 65 16
hostcache 1 36K - 1
entropy 1024 64K - 1024 64
syncache 1 100K - 1
in6_multi 16 1K - 16 32,64,128
audit_evclass 150 5K - 187 32
savedino 0 0K - 406078 256
newdirblk 0 0K - 5047 64
dirrem 18 2K - 2259617 64
mkdir 1 1K - 283528 64
diradd 183 12K - 3426340 64
freefile 55 4K - 1081462 64
freeblks 26 7K - 792864 256
freefrag 2 1K - 781740 64
allocindir 5 1K - 2842332 128
indirdep 4 1K - 116594 64
allocdirect 62 16K - 4832896 256
bmsafemap 12 2K - 271759 128
newblk 1 1K - 7675229 64,512
inodedep 270 580K - 2593883 256
pagedep 12 130K - 318828 128
ufs_dirhash 2848 1230K - 42435 16,32,64,128,256,512,1024,2048,4096
ufs_quota 1 512K - 1
ufs_mount 15 241K - 51 128,256,512,2048,4096
UMAHash 5 572K - 33 512,1024,2048,4096
USBHC 0 0K - 660 16
USBdev 22 10K - 682 16,128,512
USB 761 683K - 4079 16,32,64,128,256,1024
vm_pgdata 2 129K - 2 128
DEVFS1 115 58K - 115 512
DEVFS3 250 63K - 251 256
DEVFS2 115 2K - 115 16
DEVFS_RULE 36 17K - 36 64,512
DEVFS 30 1K - 31 16,128
io_apic 2 4K - 2 2048
pfs_nodes 20 5K - 20 256
memdesc 1 4K - 2 4096
msi 4 1K - 4 128
nexusdev 4 1K - 4 16
acpitask 0 0K - 9 64
GEOM 104 20K - 882 16,32,64,128,256,512,1024,2048
atkbddev 2 1K - 2 64
isadev 7 1K - 7 128
CAM dev queue 2 1K - 2 128
ata_generic 1 1K - 1 1024
ata_dma 1 1K - 1 256

Results from systat -v
-----------------------
1 users Load 143 90.86 47.13 Nov 21 19:10
Mem:KB REAL VIRTUAL VN PAGER SWAP PAGER Tot Share Tot Share Free in out in out
Act 1754100 25964 4719924 55728 1551492 count
All 1916252 113912 9413004 269144 pages
Proc: Interrupts r p d s w Csw Trp Sys Int Sof Flt 179 cow 16002 total
73 133 454 2 32k 816 2520 3 29k 726 504 zfod atkbd0 1 ozfod sio1 irq3
86.8%Sys 3.5%Intr 9.2%User 0.0%Nice 0.6%Idle %ozfod sio0 irq4
| | | | | | | | | | | daefr ata0 irq14
===========================================++>>>>> 16 prcfr uhci1 uhci
314 dtbuf 90 totfr uhci0 uhci
Namei Name-cache Dir-cache 400000 desvn react 2 arcmsr0 26
Calls hits % hits % 355344 numvn pdwak 2004 cpu0: time
76763 76730 100 24902 frevn pdpgs em0 irq256
intrn 1 em2 irq258
Disks da0 da1 pass0 pass1 pass2 934624 wire em3 irq259
KB/t 9.00 0.00 0.00 0.00 0.00 1697060
act 2000 cpu1: time
tps 1 0 0 0 0 12038912 inact 1996 cpu2: time
MB/s 0.01 0.00 0.00 0.00 0.00 308732 cache 2000 cpu3: time
%busy 0 0 0 0 0 1244784 free 2001 cpu7: time
219632 buf 1999 cpu4: time
1999 cpu6: time
2000 cpu5: time

Here is a "normal" systat -v for comparison, taken when there are no "visible" problems:

3 users Load 1.67 1.03 1.02 Nov 25 22:12
Mem:KB REAL VIRTUAL VN PAGER SWAP PAGER Tot Share Tot Share Free in out in out
Act 797576 31388 2318500 57324 4051340 count
All 952256 114916 6781828 226696 pages
Proc: Interrupts r p d s w Csw Trp Sys Int Sof Flt 556 cow 16001 total
1 6 474 5463 1602 3387 1 31k 1567 853 zfod atkbd0 1 ozfod sio1 irq3
8.4%Sys 4.0%Intr 2.8%User 0.0%Nice 84.8%Idle %ozfod sio0 irq4
| | | | | | | | | | | daefr ata0 irq14
====++>> 602 prcfr uhci1 uhci
125 dtbuf 1443 totfr uhci0 uhci
Namei Name-cache Dir-cache 400000 desvn react arcmsr0 26
Calls hits % hits % 328748 numvn pdwak 2026 cpu0: time
52734 52660 100 24705 frevn pdpgs em0 irq256
intrn 1 em2 irq258
Disks da0 da1 pass0 pass1 pass2 857028 wire em3 irq259
KB/t 0.00 0.00 0.00 0.00 0.00 750716 act 2026 cpu1: time
tps 0 0 0 0 0 10564316 inact 1975 cpu2: time
MB/s 0.00 0.00 0.00 0.00 0.00 303468 cache 1977 cpu3: time
%busy 0 0 0 0 0 3748056 free 1999 cpu7: time
219632 buf 2000 cpu4: time
1997 cpu6: time
2000 cpu5: time

----------------------------------
Here are the results of vmstat -w 1 during the problem:

procs      memory      page                      disks     faults         cpu
  r   b  w   avm    fre   flt  re  pi  po     fr  sr da0 da1   in     sy     cs us sy id
157 110 13 5544M 1111M  1141   0   0   0   1100  44   0   0   47    115   8744  7 14 79
146  34 98 5546M 1099M  4191   0   0   0    729   0   2   0   17  18583 102586  9 91  0
224  33 15 5548M 1091M  3825   0   0   0    664   0   0   0    7  14115 141707 10 90  0
165 103 11 5633M 1064M 12222   0   0   0   6745   0   2   0   42  41519 403437 14 86  0
214  73  4 5653M 1044M  4539   0   0   0    959   0   0   0    7  15698  94269 11 88  1
260  30  1 5664M 1034M  8457   0   0   0   2171   0   0   0   14  36978 248202 12 87  0
 57 182 45 5667M 1029M  4761   0   0   0   2535   0   0   0    6  21004 133617 10 90  0
 55  24 16 2152M 2454M  7993   0   0   0   3135   0   0   0   13  20263 173347 13 81  7
 20  15  2 1972M 2537M 93820   0   0   0 465955   0  10   0  588  99274 716238 23 76  1
 13  11  0 1877M 2581M  7820   0   0   0  31044   0   6   0   38   7859  76120 16 83  1
  9  12  1 1816M 2599M  6889   0   0   0  14550   0  20   0   79 359198  21333 14 79  7
 11  13  0 1797M 2613M  6542   0   0   0   8416   0   3   0   17 606119  15341 18 61 21
  1   9  1 1740M 2636M  1744   0   0   0   6267   0   2   0   14  11617  15322  8 63 29
  2   3  0 1694M 2657M  3417   0   0   0   8669   0  15   0   52  50341  12045  6 29 65

Here is another view of top, taken at a later date with the same problem happening, using top's I/O display mode:
--------------------------------------------------------------
last pid: 17984;  load averages: 39.26, 37.68, 24.75    up 8+09:25:55  04:34:53
539 processes: 59 running, 480 sleeping
CPU: 9.8% user, 0.5% nice, 87.0% system, 2.3% interrupt, 0.4% idle
Mem: 1146M Active, 9663M Inact, 875M Wired, 582M Cache, 214M Buf, 3577M Free
Swap: 8192M Total, 1036K Used, 8191M Free

  PID USERNAME  VCSW IVCSW  READ WRITE FAULT TOTAL PERCENT COMMAND
17587 www        446    62     0     0     0     0   0.00% httpd
17763 www        515    37     0     0     0     0   0.00% httpd
17860 www        538    47     0     0     0     0   0.00% httpd
17703 www        457    43     0     0     0     0   0.00% httpd
17701 www        485    34     0     0     0     0   0.00% httpd
17550 www        423    29     0     0     0     0   0.00% httpd
17579 www          0     0     0     0     0     0   0.00% httpd
17864 www        495    39     0     0     0     0   0.00% httpd
17836 www        520    36     0     0     0     0   0.00% httpd
17847 www        451    28     0     0     0     0   0.00% httpd
17756 www        462    29     0     0     0     0   0.00% httpd
17982 www        445    63     0     0     0     0   0.00% httpd
17581 www        451    60     0     0     0     0   0.00% httpd
17761 www        449    37     0     0     0     0   0.00% httpd
17582 www        509    30     0     0     0     0   0.00% httpd
17709 www        447    28     0     0     0     0   0.00% httpd
17705 www        515    30     0     0     0     0   0.00% httpd
17704 www        469    38     0     0     0     0   0.00% httpd
17706 www        508    53     0     0     0     0   0.00% httpd
17833 www        483    34     0     0     0     0   0.00% httpd
17834 www        499    43     0     0     0     0   0.00% httpd
17974 www        489    38     0     0     0     0   0.00% httpd
17978 www        467    45     0     0     0     0   0.00% httpd
17576 www        447    32     0     0     0     0   0.00% httpd
17570 www        443    37     0     0     0     0   0.00% httpd
17762 www        476    31     0     0     0     0   0.00% httpd
17837 www        508    44     0     0     0     0   0.00% httpd
17548 www        443    32     0     0     0     0   0.00% httpd
17783 www        390    22     0     0     0     0   0.00% httpd
17961 www        534    57     0     0     0     0   0.00% httpd
17590 www        498    50     0     0     0     0   0.00% httpd
17700 www        471    35     0     0     0     0   0.00% httpd
17580 www        438    41     0     0     0     0   0.00% httpd

This site used to run on a 4.11 system with one CPU and only 1 GB of RAM, and it ran flawlessly for a long time with far fewer resources and the same web-site code. I do not have this problem on the other 7.0 machine. I originally thought it was just a CPU issue, but it is very closely tied to moments when something is using the RAID array, and that seems to be the way to reproduce it. I am having a hard time determining why the system load is so high. Can you recommend the best way to identify the culprit?

Thanks,
Paul
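P.S. In case it is useful to anyone trying to reproduce this, here is the crude sampler I plan to leave running so I can line load spikes up against the disk activity logs afterwards. It is only a sketch; the interval and what is worth logging are my own guesses:

```shell
# Print a timestamped load-average line every INTERVAL seconds, COUNT times.
# Run it in the background redirected to a log, then compare the timestamps
# with vmstat/iostat output when the next spike hits.
sample_load() {
    count=$1
    interval=$2
    while [ "$count" -gt 0 ]; do
        printf '%s  ' "$(date '+%Y-%m-%d %H:%M:%S')"
        uptime | sed 's/.*load average[s]*: //'   # keep just the three load figures
        count=$((count - 1))
        if [ "$count" -gt 0 ]; then sleep "$interval"; fi
    done
}
sample_load 3 1
```

Redirecting it with something like `sample_load 720 5 >> load.log &` gives an hour of samples to correlate against.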