From: Daniel Braniss
Date: Tue, 28 May 2013 09:41:58 +0300
To: Devin Teske
Cc: FreeBSD-STABLE Mailing List
Subject: Re: SunFire X2200 ilo's bge1 DOWN/UP
In-reply-to: <13CA24D6AB415D428143D44749F57D7201F62C26@ltcfiswmsgmb21>

...
> There are ways you can speed up the replication time. I tend to flood
> a server with TCP while I've heard of it happening under UDP flood too.
>
> Here's a nice way to flood a server with TCP (assuming you have SSH
> access to the system via keys):
>
> sh -c 'while :;do dd if=/dev/urandom of=/dev/stdout bs=1m count=1024 | ssh HOST2KILL /sbin/md5; done'
>
> Run that about 16 times in separate screen sessions from various other
> hosts on your network, taking care to replace "HOST2KILL" with the
> hostname or IP of the box with the SunFire X2200.
>
> Let that run for a while, and then when you think you've had a reset
> (if you weren't standing there watching for one)...
>
> grep 'bge.*DOWN' /var/log/messages
>
> On a system that has booted and stayed up-and-running, there shouldn't
> be any messages like this:
>
> bge0: link state changed to DOWN
>
> When you actually get this message (if your experience is like ours),
> you'll be down for 90 seconds while the NIC resets.
>
> However, since you say you have some older 9.1 releases... I'd start
> by first trying to bring the replication time of the problem down by
> using TCP and/or UDP floods. That way you'll be able to test for
> resolution of the problem as you progress up to stable/9 (where the
> problem should be fixed by the aforementioned SVN revisions --
> specific to your hardware).
...
> any ideas?
>
>
> Well, you say the connection is OK... so it doesn't sound like a full
> reset as it was in our case (we have a different chipset).
>
> But I agree that a log full of those would be annoying.
>
> Try getting up to stable/9 in its current state (note: stable/8 also
> has all the aforementioned revisions too).
> --
> Devin

Hi Devin,
the kernel is pretty new, actually last Friday's, and the svn says
it's r250960.
the bge1 port is not UP, it's shared with the onboard BMC/ILO/IPMI thingy.
connecting to it via ssh gets me into its ILO manager:
...
Sun(TM) Embedded Lights Out Manager
Copyright 2004-2006 Sun Microsystems, Inc. All rights reserved.
Version 3.23
...
and so typing
	start AgentInfo/console
I can get to the 'serial' console.

cheers, and thanks,
	danny