From: Andy Young
To: freebsd-hardware@freebsd.org
Date: Sat, 1 Sep 2012 16:14:28 -0400
Subject: Load testing knocks out network

Last night one of our servers went offline while I was load testing it. When I got to the datacenter to check on it, the server seemed perfectly fine. Everything was running on it, and there were no panics or any other sign of a hard crash. The only problem was that the network was unreachable. I couldn't connect to the box even from a laptop directly attached to the Ethernet port, and I couldn't connect to anything from the box either. It was as if the network controller had seized up. Restarting netif didn't make a difference. Rebooting the machine, however, solved the issue and everything went back to working great. I restarted the load testing and reproduced the problem twice more this morning, so at least it's repeatable.

It feels like a network controller / driver issue to me for a couple of reasons. First, the problem affects the entire system. We're running FreeBSD 9 with about half a dozen jails. Most of the jails are running Apache, but the one I was load testing was running Jetty. If it were my application code crashing, I would expect the problem to at least be isolated to the jail that hosts it. Instead, the entire machine and all the jails in it lose access to the network. Second, apart from the loss of network access, I don't see any other signs of problems.

This is the first major problem I've had to debug in FreeBSD, so I'm not a debugging expert by any means. There are no error messages in /var/log/messages or dmesg apart from syslogd not being able to reach the network. If anyone has ideas on where I can look for more evidence of what is going wrong, I would really appreciate it.

We're running FreeBSD 9.0-RELEASE-p3. The network controller is an Intel(R) PRO/1000 Network Connection version - 2.2.5, configured with 6 IPs using aliases, five of which are used for jails.

Thank you for the help!!

Andy