From: Ragnar Lonn
Date: Sun, 02 Sep 2012 10:51:16 +0200
To: freebsd-hardware@freebsd.org
Subject: Re: Load testing knocks out network

Hi Andy,

I work for an online load testing service (loadimpact.com), and what we see is that the most common reason a server crashes during a load test is that it runs out of some vital system resource. Usually that is system memory, but running out of network connections (sockets/file descriptors) is also a likely cause.

You should have gotten some kind of error message in the system log, but if the problem is easily repeatable, I would set up monitoring of at least memory and file descriptors, and see whether you are near the limits when the machine freezes.
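A minimal sketch of that kind of monitoring, assuming a FreeBSD host with sysctl(8): the script polls a few counters every few seconds and appends one line per sample to a local log file, flagging any resource at 90% or more of its limit. The log is written locally and fsync'd on every sample because the network is what goes away, so remote monitoring would likely miss the interesting last samples before the freeze. The specific OIDs (kern.openfiles/kern.maxfiles, kern.ipc.numopensockets/kern.ipc.maxsockets, vm.stats.vm.v_free_count/vm.stats.vm.v_page_count), the 90% threshold, the 5-second interval and the default log file name are assumptions for illustration, not something from this thread; check the OIDs with `sysctl -d` on your release.

```python
#!/usr/bin/env python3
"""Log how close a few FreeBSD resources are to their limits (sketch only).

Assumes FreeBSD's sysctl(8) and the OIDs below exist on this release.
"""
import os
import subprocess
import sys
import time

# Assumed (in-use OID, limit OID) pairs for file descriptors and sockets.
RATIO_OIDS = {
    "files": ("kern.openfiles", "kern.maxfiles"),
    "sockets": ("kern.ipc.numopensockets", "kern.ipc.maxsockets"),
}
# Assumed (free pages, total pages) OIDs for a crude memory estimate.
MEM_OIDS = ("vm.stats.vm.v_free_count", "vm.stats.vm.v_page_count")


def parse_sysctl(text):
    """Parse 'name: value' lines, as printed by `sysctl name ...`, into ints."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            values[name.strip()] = int(value.strip())
    return values


def usage(values):
    """Return the fraction (0.0-1.0) of each limit currently in use."""
    report = {name: values[used] / values[limit]
              for name, (used, limit) in RATIO_OIDS.items()}
    free, total = (values[oid] for oid in MEM_OIDS)
    # Crude: counts inactive/cached (reclaimable) pages as used, so it
    # overstates real memory pressure.
    report["memory"] = 1.0 - free / total
    return report


def near_limits(report, threshold=0.9):
    """Names of resources at or above `threshold`, sorted for stable output."""
    return sorted(name for name, frac in report.items() if frac >= threshold)


def main(logfile, interval=5.0):
    oids = [oid for pair in RATIO_OIDS.values() for oid in pair] + list(MEM_OIDS)
    with open(logfile, "a") as log:
        while True:
            # check=True raises if sysctl fails, e.g. on an unknown OID.
            out = subprocess.run(["sysctl", *oids], capture_output=True,
                                 text=True, check=True).stdout
            report = usage(parse_sysctl(out))
            line = time.strftime("%Y-%m-%d %H:%M:%S ") + " ".join(
                f"{name}={frac:.0%}" for name, frac in sorted(report.items()))
            warn = near_limits(report)
            if warn:
                line += "  NEAR LIMIT: " + ",".join(warn)
            log.write(line + "\n")
            # Force each sample to disk so it survives a hang and a reboot.
            log.flush()
            os.fsync(log.fileno())
            time.sleep(interval)


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "resource-usage.log")
```

Run it on the server before starting the load test (for example `python3 resmon.py /var/log/resmon.log`, where the script and log names are hypothetical) and read the last lines of the log after the reboot.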
Regards,
/Ragnar

On 09/01/2012 10:14 PM, Andy Young wrote:
> Last night one of our servers went offline while I was load testing it.
> When I got to the datacenter to check on it, the server seemed perfectly
> fine. Everything was running on it, and there were no panics or any other
> sign of a hard crash. The only problem was that the network was
> unreachable. I couldn't connect to the box even from a laptop directly
> attached to the Ethernet port, and I couldn't connect to anything from the
> box either. It was as if the network controller had seized up. I restarted
> netif and it didn't make a difference. Rebooting the machine, however,
> solved the issue and everything went back to working great. I restarted
> the load testing and reproduced the problem twice more this morning, so at
> least it's repeatable. It feels like a network controller / driver issue to
> me for a couple of reasons. First, the problem affects the entire system.
> We're running FreeBSD 9 with about a half dozen jails. Most of the jails
> are running Apache, but the one I was load testing was running Jetty. If it
> were my application code crashing, I would expect the problem to at least
> be isolated to the jail that hosts it. Instead, the entire machine and all
> jails in it lose access to the network.
>
> Apart from not being able to access the network, I don't see any other
> signs of problems. This is the first major problem I've had to debug in
> FreeBSD, so I'm not a debugging expert by any means. There are no error
> messages in /var/log/messages or dmesg apart from syslogd not being able
> to reach the network. If anyone has ideas on where I can look for more
> evidence of what is going wrong, I would really appreciate it.
>
> We're running FreeBSD 9.0-RELEASE-p3. The network controller is an
> Intel(R) PRO/1000 Network Connection version - 2.2.5, configured with six
> IP addresses using aliases, five of which are used for jails.
>
> Thank you for the help!!
>
> Andy