From owner-freebsd-questions@FreeBSD.ORG Thu Jun 3 21:26:57 2010
Return-Path:
Delivered-To: freebsd-questions@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A5ABD106566C
	for ; Thu, 3 Jun 2010 21:26:57 +0000 (UTC)
	(envelope-from wmoran@potentialtech.com)
Received: from mail.potentialtech.com (internet.potentialtech.com [66.167.251.6])
	by mx1.freebsd.org (Postfix) with ESMTP id 735E88FC12
	for ; Thu, 3 Jun 2010 21:26:56 +0000 (UTC)
Received: from localhost (pr40.pitbpa0.pub.collaborativefusion.com [206.210.89.202])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mail.potentialtech.com (Postfix) with ESMTPSA id 4BA00F7427;
	Thu, 3 Jun 2010 17:26:55 -0400 (EDT)
Date: Thu, 3 Jun 2010 17:26:54 -0400
From: Bill Moran
To: Barry Steyn
Message-Id: <20100603172654.ec6f5958.wmoran@potentialtech.com>
In-Reply-To: <4C080F42.7080006@redbutton.co.za>
References: <4C080F42.7080006@redbutton.co.za>
Organization: Bill Moran
X-Mailer: Sylpheed 3.0.2 (GTK+ 2.18.7; i386-portbld-freebsd7.2)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: freebsd-questions@freebsd.org
Subject: Re: Sluggish Apache Server
X-BeenThere: freebsd-questions@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: User questions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 03 Jun 2010 21:26:57 -0000

In response to Barry Steyn:

> Hi guys,
>
> We're having a serious problem here with our live server, it's very
> sluggish all of a sudden. The problem is that Apache is *really* slow
> responding to https requests but still fairly quick on http.
> We've checked and ruled out all of the following:
>
>  * CPU usage is normal and so is memory usage
>  * All other system daemons seem to be running just fine
>  * smartctl -a on both our disks (gmirror RAID 1) says health test
>    PASSED, gmirror status fine, smartd running for a week now with
>    nothing in the logs
>  * nothing strange in the apache access or error logs
>  * restarted apache, stopped jboss, upgraded apache to latest patch
>    level, even soft rebooted the box but to no avail
>  * nobody has done any upgrades, code changes, physical changes or
>    anything else to the box before the problem first manifested itself
>  * hosting problems - this problem even occurs when you do a wget on
>    the same box with https://localhost/... , in fact then it doesn't
>    even get to the SSL handshake as it doesn't get to complain about
>    the certificate mismatch
>
> The weird thing is that the first time this happened a week ago, there
> was only jboss/seam (which runs behind apache via mod_proxy_ajp) that
> had an issue with sluggishness, all other https pages worked just fine.
> Our tech team was desperate when nobody senior was available and decided
> to hard reboot (power off and on again) the box after which it acted
> really strangely (with disk errors in the logs, other system daemons
> dying randomly) but eventually came right. A week later, sometime this
> afternoon, the problems reoccurred but this time they are chronic,
> nothing we do seems to help.
>
> So, I keep thinking it must be a hardware problem. Not disk (or maybe it
> is?), then perhaps faulty RAM? I always thought faulty RAM results in
> nasty kernel panics, segfaults and other obvious symptoms but not a
> sluggishness in one particular daemon...
>
> Any ideas?

Given your description of the symptoms, I suspect hardware.

Sure, SMART is nice, but it's not failproof.  It also doesn't monitor the
disk controller, which can have problems.
The fact that it gave you all sorts of disk issues after a reboot tells me
that there is something wrong that SMART and other hardware diagnostics
aren't detecting.

I've seen systems fail in ways that defy all attempts to predict and detect.
Saw a RAID system die in such a way that the system locked up tight, in spite
of the fact that there was a backup RAID card installed that should have
taken over.

If your budget allows, I'd make it top priority to migrate off that system
and onto a new one, then get that system into a dedicated testing setup to
see if you can isolate any problems in the hardware.

Whatever else you do, make sure you have good backups of any data on that
system right away.

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/