From owner-freebsd-performance@FreeBSD.ORG Mon Jun 16 12:44:55 2008
Message-ID: <928b5da90806160544y26c069ddl5bb6a6b946fefe5c@mail.gmail.com>
Date: Mon, 16 Jun 2008 09:44:53 -0300
From: "Felipe Neuwald"
To: "Andrew MacIntyre"
Cc: freebsd-performance@freebsd.org, freebsd-python@freebsd.org
Subject: Re: Performance with python and FreeBSD 7.0 amd64
List-Id: Performance/tuning

Hi Andrew and all,

Just to inform: after the weekend of testing, I still have no more
errors. I think the problem is solved.

Thank you very much,

Felipe Neuwald.

2008/6/13 Felipe Neuwald :
> Andrew and all,
>
> After recompiling Python 2.4 with the HUGE_STACK_SIZE option, I have
> no more problems. I'll still wait out the weekend before saying so
> for certain, and wait for the customer's feedback about system
> performance / errors. If I have news, I'll send it to you.
>
> Thank you very much,
>
> Felipe Neuwald.
>
> 2008/6/12 Felipe Neuwald :
>> Andrew, I'll try to recompile Python with the "HUGE_STACK_SIZE"
>> option. Let's see.
>>
>> Felipe Neuwald.
From owner-freebsd-performance@FreeBSD.ORG Mon Jun 16 14:40:16 2008
Message-ID: <48566D5E.3090208@bullseye.andymac.org>
Date: Tue, 17 Jun 2008 00:40:46 +1100
From: Andrew MacIntyre
To: Felipe Neuwald
Cc: freebsd-performance@freebsd.org, freebsd-python@freebsd.org
Subject: Re: Performance with python and FreeBSD 7.0 amd64

Felipe Neuwald wrote:
> Just to inform: after the weekend of testing, I still have no more
> errors. I think the problem is solved.
>
> Thank you very much,
>
> Felipe Neuwald.
>
> 2008/6/13 Felipe Neuwald :
>> Andrew and all,
>>
>> After recompiling Python 2.4 with the HUGE_STACK_SIZE option, I have
>> no more problems. I'll still wait out the weekend before saying so
>> for certain, and wait for the customer's feedback about system
>> performance / errors. If I have news, I'll send it to you.

Hope it stays that way for you.

FWIW, Python 2.5 and later don't need to be recompiled to change the
thread stack size, provided you can change the main script - there's a
function in the threading module that will do it.

Regards,
Andrew.

--
-------------------------------------------------------------------------
Andrew I MacIntyre               "These thoughts are mine alone..."
E-mail: andymac@bullseye.apana.org.au (pref) | Snail: PO Box 370
        andymac@pcug.org.au       (alt)      |        Belconnen ACT 2616
Web:    http://www.andymac.org/              |        Australia

From owner-freebsd-performance@FreeBSD.ORG Wed Jun 18 21:55:36 2008
Date: Wed, 18 Jun 2008 14:27:37 -0700
From: Hazlewood
To: freebsd-performance@freebsd.org
Subject: 6.1 busy server periodically hangs, waits, then recovers a couple minutes later - analysis?

Hello List,

I've done some searching, but this particular random and temporary
lockup condition doesn't seem to come up much. Anyway, here are my
symptoms; I'm hoping someone can guide me towards additional testing or
stats I can gather to help pinpoint the root cause.

The symptoms during this indefinite frozen condition are as follows:

- Existing shells continue to be responsive.
- Existing sessions, such as an http download, continue to work.
- Programs already running within shells (vmstat/systat/iostat, etc.)
  continue to spit out data.
- *New* incoming socket requests, or commands executed in a shell, sit
  there indefinitely, and only get an established connection or execute
  the command several minutes later, when the system comes back to life.

Here's some data to start with:

kern.ostype: FreeBSD
kern.osrelease: 6.1-STABLE
kern.osrevision: 199506
kern.version: FreeBSD 6.1-STABLE #1: Sat Jul 15 03:08:58 MST 2006

net.isr.swi_count: -993194527
net.isr.drop: 0
net.isr.queued: 57342094
net.isr.deferred: 1573529131
net.isr.directed: -1846058772
net.isr.count: -272529641
net.isr.direct: 1
net.route.netisr_maxqlen: 256

I changed to net.isr.direct=1 and saw a dramatic drop in the number of
interrupts and a small drop in context switches, but no real observed
change in the number of lockups.

net.inet.ip.intr_queue_maxlen: 512
net.inet.ip.intr_queue_drops: 10855650

I changed maxlen to 512 and the queue_drops don't increment anymore.
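To make those two tweaks survive a reboot, they can go in
/etc/sysctl.conf; a sketch of the settings described above (FreeBSD 6.x):

```conf
# /etc/sysctl.conf -- applied at boot; values as reported above
net.isr.direct=1                   # dispatch netisr work directly; fewer swi handoffs
net.inet.ip.intr_queue_maxlen=512  # after raising this, intr_queue_drops stopped incrementing
```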
hw.machine: amd64
hw.model: Intel(R) Xeon(R) CPU 5140 @ 2.33GHz
hw.ncpu: 4
hw.byteorder: 1234
hw.physmem: 8578887680
hw.usermem: 8305831936
hw.pagesize: 4096
hw.floatingpoint: 1
hw.machine_arch: amd64

dev.em.0.%desc: Intel(R) PRO/1000 Network Connection Version - 5.1.5
dev.em.0.%driver: em
dev.em.0.%location: slot=0 function=0
dev.em.0.%pnpinfo: vendor=0x8086 device=0x1096 subvendor=0x15d9 subdevice=0x0000 class=0x020000
dev.em.0.%parent: pci4
dev.em.0.debug_info: -1
dev.em.0.stats: -1
dev.em.0.rx_int_delay: 66
dev.em.0.tx_int_delay: 66
dev.em.0.rx_abs_int_delay: 666
dev.em.0.tx_abs_int_delay: 666
dev.em.0.rx_processing_limit: -1

I tried messing with the default interrupt moderation settings for the
em driver, thinking the large number of interrupts coming from the NIC
might have something to do with it, but so far no change.

dev.mpt.0.%desc: LSILogic SAS Adapter
dev.mpt.0.%driver: mpt
dev.mpt.0.%location: slot=1 function=0
dev.mpt.0.%pnpinfo: vendor=0x1000 device=0x0054 subvendor=0x1000 subdevice=0x3050 class=0x010000
dev.mpt.0.%parent: pci5
dev.mpt.0.debug: 3

This is scary: does GIANT-LOCKED mean that this storage subsystem
driver holds the Giant kernel lock when it does I/O calls? (Sorry, I'm
a little sketched out reading about the random bits of FreeBSD 6 that
don't yet use finer-grained locking...)

/var/run/dmesg.boot:mpt0: port 0x3000-0x30ff mem 0xc8310000-0xc8313fff,0xc8300000-0xc830ffff irq 24 at device 1.0 on pci5
/var/run/dmesg.boot:mpt0: [GIANT-LOCKED]
/var/run/dmesg.boot:mpt0: MPI Version=1.5.12.0

Here are the various stats and logs.

vmstat -i
interrupt                          total       rate
irq4: sio0                      45153724         35
irq6: fdc0                             3          0
irq14: ata0                           47          0
irq18: em0                    6188085724       4798
irq24: mpt0                    340758261        264
cpu0: timer                   2572161860       1994
cpu1: timer                   2541470854       1970
cpu3: timer                   2575006045       1996
cpu2: timer                   2575266013       1997
Total                        16837902531      13057

Here I caught about a minute's worth of the frozen state, as evidenced
by the solid 15 or 17 processes in the wait queue and nothing really
going on.
vmstat 5
 procs      memory      page                    disks     faults      cpu
 r b w     avm    fre  flt  re  pi  po  fr  sr da0 da1   in   sy   cs us sy id
 2 4 0 1374060 281876 41 0 0 0 2694 2323 0 0 192 3062 2137 3 10 88
 2 11 0 1374060 348312 218 0 0 0 5263 8216 75 107 8129 25567 26444 6 15 79
 2 6 0 1374060 396152 0 2 0 0 6063 8359 107 115 8190 25814 26399 6 19 75
 0 7 1 1374060 309376 0 1 0 0 5623 0 63 130 8071 23806 25064 6 18 76
 1 5 0 1374060 389544 0 0 0 0 5632 8438 83 92 8371 24884 26412 6 15 79
 1 7 1 1374060 284640 0 1 0 0 5296 0 91 85 8380 23548 25405 6 18 76
 0 5 3 1374060 359252 0 0 0 0 5072 8226 71 79 8294 24216 26170 5 18 77
 1 8 0 1374060 411248 0 0 0 0 6034 8231 88 126 8366 27454 27470 6 18 76
 1 6 1 1374060 303560 0 1 0 0 5475 0 94 121 8239 25001 25857 6 17 77
 1 5 1 1374188 402788 14 1 0 0 5392 10179 52 139 8193 24831 25380 5 17 78
 0 5 1 1374192 301200 20 0 0 0 5178 0 86 92 8336 24750 26702 5 15 79
 1 7 0 1374192 369736 0 1 0 0 4941 8176 63 74 7998 23055 24414 6 17 78
 1 8 1 1374192 275292 13 1 0 0 5192 0 119 102 8230 24862 26339 5 15 79
 1 5 0 1374192 364596 459 0 1 1 5755 8438 112 124 8332 26315 26651 5 16 78
 0 8 0 1374192 267856 0 1 0 0 4897 0 105 88 7997 22995 24989 6 16 78
 1 12 0 1374192 339624 0 0 0 0 4920 8294 75 135 8227 24276 25400 5 16 80
 2 5 0 1374200 397088 13 1 0 0 5718 8412 89 91 8116 26028 26313 6 18 77
 2 6 0 1374228 292596 10 1 0 0 5374 0 94 96 8075 24513 25413 6 18 76
 0 6 1 1374236 341492 7 1 0 0 6057 8342 134 116 8268 27172 27364 6 19 76
 3 8 0 1374240 390940 7 1 0 0 5916 8243 108 121 8304 27672 27495 6 18 76
 2 7 0 1374252 275052 7 2 0 0 5810 0 129 91 8198 24374 25501 5 18 77
 4 8 0 1374252 321808 3 1 0 0 5921 8236 100 113 8273 26616 27128 6 19 75
 1 8 0 1374268 272796 11 0 0 0 4735 0 83 114 7848 21924 24042 5 15 80
 2 7 0 1374284 324676 7 0 0 0 6110 8201 114 93 8284 24293 25378 6 18 76
 2 8 0 1374296 360076 10 1 0 0 6956 8328 136 125 8348 26724 27040 6 19 75
 1 5 1 1374312 407356 230 0 0 0 6934 8229 126 131 8319 28257 27213 6 18 76
 1 11 1 1374344 300228 13 0 0 0 6493 0 127 97 8354 25438 25542 6 19 75
 0 4 1 1374344 348096 13 1 0 0 6719 8345 96 112 6806 28724 24457 7 19 74
 1 10 1 1374952 404552 31 0 0 0 5796 8165 107 123 6340 24678 21992 6 17 78
 0 13 1 1374952 374140 0 0 0 0 7774 0 97 117 6452 23443 22131 4 14 81
 3 6 0 1374956 441772 3 0 0 0 3232 8181 147 110 6429 24878 22451 5 17 78
 1 7 0 1374968 341796 10 0 0 0 4838 0 89 115 6542 24689 22643 5 17 78
 0 13 0 1375012 414996 11 1 0 0 4828 8282 97 103 6162 21969 20812 4 14 81
 0 15 0 1375192 414196 14 0 0 0 39 0 0 0 1695 380 3494 0 0 100
 0 15 0 1375192 414196 0 0 0 0 0 0 0 0 1255 23 2565 0 0 100
 0 15 0 1375192 414196 0 0 0 0 0 0 0 0 1248 25 2549 0 0 100
 0 15 0 1375192 414196 0 0 0 0 0 0 0 0 1012 23 2079 0 0 100
 0 15 0 1375192 414196 218 0 0 0 252 0 0 0 828 476 1779 0 0 100
 0 15 0 1375192 414196 0 0 0 0 0 0 0 0 783 23 1618 0 0 100
 0 15 0 1375192 414196 0 0 0 0 0 0 0 0 809 24 1670 0 0 100
 0 15 0 1375192 414196 0 0 0 0 0 0 0 0 725 23 1503 0 0 100
 0 15 0 1375192 414196 0 0 0 0 0 0 0 0 545 25 1140 0 0 100
 0 15 0 1375192 414196 0 0 0 0 0 0 0 0 489 23 1027 0 0 100
 0 15 0 1375192 414196 0 0 0 0 0 0 0 0 509 72 1075 0 0 100
 2 14 0 1375376 412284 39 0 0 0 46 0 6 2 5397 359 1170 0 79 21
 0 15 0 1375380 410128 99 0 0 0 211 0 10 46 1921 839 1182 0 69 31
 1 15 0 1375380 421280 266 0 0 0 907 0 15 0 1582 578 589 0 73 26
 0 15 0 1375380 421504 13 0 0 0 45 0 26 1 1581 94 497 0 59 41
 0 17 1 1377496 430164 211 0 0 0 1505 0 16 19 2869 2736 2900 2 68 30
 2 17 0 1377060 386444 264 1 0 0 970 0 34 30 3532 4017 3917 3 87 10
 2 7 0 1377180 414764 1166 5 2 0 8861 8362 124 174 6236 31661 20389 10 26 64
 0 7 1 1377180 346060 67 0 0 0 8122 0 77 220 6442 24740 22522 5 21 74
 2 4 0 1377180 391492 7 0 0 0 5783 8273 87 72 6218 23687 22724 5 17 78
 0 6 0 1377180 324412 10 0 0 0 5293 0 63 111 6949 20025 20603 4 17 79
 2 6 0 1377180 426368 15 0 0 0 6419 8211 101 104 6487 18747 19175 5 23 72
 0 4 1 1377180 334612 26 1 0 0 4488 0 133 115 6490 23064 22344 5 15 79
 2 8 0 1377180 286908 104 0 0 0 4959 0 132 75 6222 18812 18577 5 27 68
 1 10 0 1377180 389248 226 1 0 0 6362 9045 125 111 6263 22443 21853 5 17 78
 2 11 1 1377180 291636 51 0 0 0 5923 0 98 116 6474 21827 20642 6 26 68
 0 4 2 1377180 306052 37 1 0 0 7971 0 68 91 6006 21735 20893 5 18 78
 1 6 0 1377180 428352 30 0 0 0 3050 8609 120 81 5949 19114 18973 4 25 71
 0 6 1 1377180 374184 24 0 0 0 4525 0 97 92 6483 22294 21853 5 18 77
 2 5 1 1377180 319532 4 0 0 0 4965 0 105 88 6348 21421 21703 5 18 77

Here's a normal top -S output; I haven't been able to grab one from
when the problem is occurring:

last pid: 67537;  load averages: 0.72, 0.93, 0.94   up 15+16:05:21  14:12:38
83 processes:  7 running, 63 sleeping, 13 waiting
CPU states:  4.7% user,  0.0% nice, 12.6% system,  0.2% interrupt, 82.5% idle
Mem: 1232M Active, 5912M Inact, 261M Wired, 285M Cache, 214M Buf, 11M Free
Swap: 9216M Total, 32K Used, 9216M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
   12 root        1 171   52     0K    16K CPU2   2 344.0H 88.72% idle: cpu2
   14 root        1 171   52     0K    16K RUN    0 338.1H 88.33% idle: cpu0
   13 root        1 171   52     0K    16K RUN    1 338.3H 85.64% idle: cpu1
   11 root        1 171   52     0K    16K CPU3   3 282.4H 62.89% idle: cpu3
   24 root        1 -68    0     0K    16K CPU3   3  29.5H 20.17% em0 taskq
66405 squid       1   4    0   832M   819M kqread 3  13:47 13.28% squid
82587 squid       1   4    0   294M   280M kqread 3 885:07 12.55% squid

At first I thought the single NIC, and the single CPU that takes its
interrupts, was being drained to 0% idle, but I don't think that's
really related, and it wouldn't explain why new processes can't run in
new sessions, etc.

As for dmesg errors, all we see are these, which pop up every minute
or so:

mpt0: Unhandled Event Notify Frame. Event 0xe (ACK not required).
mpt0: mpt_cam_event: 0xe

But once the problem starts, we also see a bunch of timeouts from mpt0:

mpt0: request 0xffffffff8bd10010:19992 timed out for ccb 0xffffff022d40b400 (req->ccb 0xffffff022d40b400)
mpt0: request 0xffffffff8bd0f5c0:19993 timed out for ccb 0xffffff01949b4800 (req->ccb 0xffffff01949b4800)
mpt0: attempting to abort req 0xffffffff8bd10010:19992 function 0
mpt0: request 0xffffffff8bd0fb98:19998 timed out for ccb 0xffffff020d3f3c00 (req->ccb 0xffffff020d3f3c00)
mpt0: completing timedout/aborted req 0xffffffff8bd10010:19992
mpt0: request 0xffffffff8bd0a9c8:20002 timed out for ccb 0xffffff022e147800 (req->ccb 0xffffff022e147800)
mpt0: request 0xffffffff8bd098f0:20003 timed out for ccb 0xffffff0000f01800 (req->ccb 0xffffff0000f01800)
(da2:mpt0:0:17:0): WRITE(10). CDB: 2a 0 c 5e f cf 0 0 80 0
(da2:mpt0:0:17:0): CAM Status: SCSI Status Error
(da2:mpt0:0:17:0): SCSI Status: Check Condition
(da2:mpt0:0:17:0): UNIT ATTENTION asc:29,7
(da2:mpt0:0:17:0): Reserved ASC/ASCQ pair
(da2:mpt0:0:17:0): Retrying Command (per Sense Data)
mpt0: abort of req 0xffffffff8bd10010:0 completed
mpt0: attempting to abort req 0xffffffff8bd0f5c0:19993 function 0
mpt0: request 0xffffffff8bd13978:20004 timed out for ccb 0xffffff0000d60000 (req->ccb 0xffffff0000d60000)
mpt0: completing timedout/aborted req 0xffffffff8bd0f5c0:19993
mpt0: request 0xffffffff8bd0e648:20005 timed out for ccb 0xffffff0000cb2000 (req->ccb 0xffffff0000cb2000)
mpt0: abort of req 0xffffffff8bd12a58:0 completed
mpt0: attempting to abort req 0xffffffff8bd130e0:20027 function 0
mpt0: completing timedout/aborted req 0xffffffff8bd130e0:20027
mpt0: abort of req 0xffffffff8bd130e0:0 completed
mpt0: attempting to abort req 0xffffffff8bd11f58:20028 function 0
mpt0: completing timedout/aborted req 0xffffffff8bd11f58:20028
mpt0: abort of req 0xffffffff8bd11f58:0 completed
mpt0: attempting to abort req 0xffffffff8bd0d4c0:20029 function 0
mpt0: completing timedout/aborted req 0xffffffff8bd0d4c0:20029

Etc...
Here's what some typical disk activity looks like:

iostat -d 5
             da0              da1              da2              da3
   KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s
  47.87  45  2.11  51.09  54  2.69  50.72  56  2.77  50.09  61  3.00
  47.47  93  4.30  54.25  92  4.87  49.92  77  3.77  43.49  81  3.45
  70.62  49  3.41  62.31  80  4.89  54.47  86  4.58  44.36  70  3.03
  56.75  50  2.79  52.30 103  5.26  64.82  56  3.53  50.95  55  2.73
  59.76  74  4.34  63.63  64  3.98  43.97 141  6.04  45.34 104  4.60
  41.91 105  4.30  50.78  96  4.75  61.19  66  3.96  50.23  82  4.00
  54.88  60  3.20  64.17  70  4.37  71.42  74  5.16  40.63 104  4.11
  52.17 110  5.60  54.50  86  4.55  50.85  95  4.74  36.75  91  3.28

I have not used WITNESS before, but would this be a good time to start
looking? Is the server simply too busy? What else could I look for, or
try tweaking, to get around this problem that doesn't happen at lower
off-peak load levels?

Thanks,
Hazlewood
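On the WITNESS question: it is a kernel compile-time option, not a
runtime sysctl, so trying it means building a debug kernel; it also
adds significant locking overhead, so it is better suited to a test box
than to a busy production server at peak load. A sketch of the relevant
kernel config lines (FreeBSD 6.x):

```conf
# Debug kernel config fragment for lock diagnostics
options INVARIANTS         # extra run-time consistency checks
options INVARIANT_SUPPORT  # support code required by INVARIANTS
options WITNESS            # lock-order verification, reported at console
options WITNESS_SKIPSPIN   # skip spin mutexes to reduce the overhead
```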