From owner-freebsd-current@FreeBSD.ORG  Wed Nov  2 15:51:39 2005
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
X-Original-To: current@freebsd.org
Delivered-To: freebsd-current@FreeBSD.ORG
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id CF1F416A41F
	for <current@freebsd.org>; Wed,  2 Nov 2005 15:51:39 +0000 (GMT)
	(envelope-from anderson@centtech.com)
Received: from mh1.centtech.com (moat3.centtech.com [207.200.51.50])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 008DF43D73
	for <current@freebsd.org>; Wed,  2 Nov 2005 15:51:33 +0000 (GMT)
	(envelope-from anderson@centtech.com)
Received: from [10.177.171.220] (neutrino.centtech.com [10.177.171.220])
	by mh1.centtech.com (8.13.1/8.13.1) with ESMTP id jA2FpQau047849;
	Wed, 2 Nov 2005 09:51:27 -0600 (CST)
	(envelope-from anderson@centtech.com)
Message-ID: <4368E07C.6050003@centtech.com>
Date: Wed, 02 Nov 2005 09:51:24 -0600
From: Eric Anderson <anderson@centtech.com>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.12) Gecko/20051021
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Sven Willenberger <sven@dmv.com>
References: <1130945793.7893.27.camel@lanshark.dmv.com>
In-Reply-To: <1130945793.7893.27.camel@lanshark.dmv.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: ClamAV 0.82/1158/Wed Nov 2 07:29:56 2005 on mh1.centtech.com
X-Virus-Status: Clean
Cc: current@freebsd.org
Subject: Re: Tracking down em problem
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 02 Nov 2005 15:51:39 -0000

Sven Willenberger wrote:
> FreeBSD6.0-RC1 (Wed Oct 26 13:31:21 EDT 2005)
> 
> I seem to have an issue with losing connections to an em interface
> during process of heavy IO load. There are several variables here so I
> am hoping for some guidelines to help troubleshoot this.
> 
> I have a postgresql server (8.0.4) set up on an i386 system. The data
> directory is on its own partition (which is actually a gstripe/gmirror
> setup -- see the footnote after my problem description).
> 
> I have enabled a replication system from another server. When I started
> replication there was a large amount of data that had to be fed to this
> server via the em0 interface. During this process, while ssh'ed to the
> box, my connection would just hang for a few moments, then it would
> recover. However, if I cd to the data directory (stripe/mirror) and
> start ls -alrt several times, the connection actually gets broken; not
> only my ssh connection but the replication connection from the master
> server is broken.
> 
> I have tried to set debug.mpsafenet=0 in /boot/loader.conf to no avail
> -- the same issue happens. Preemption is enabled in the kernel, as is
> sched_4bsd. I don't really know how to proceed at this point to try and
> troubleshoot this issue: as it stands now, it is most definitely a show
> stopper for the purposes of this server.


I've seen something similar on recent 5.4-STABLE, also using emX
devices.  I have 3 Dell 1850s showing exactly the same issue, and a few
1850s that are not.  The ones that do not show it are running
5.4-RELEASE, and the ones that do are running 5.4-STABLE.  In dmesg, I
see a warning like this:

Nov  1 19:56:06 hal kernel: em1: Link is up 1000 Mbps Full Duplex

I never see a 'Link is down' message, only 'Link is up'.  One machine
I've seen this on repeatedly is from about August 16th.
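
For anyone who wants to check their own logs for the same pattern, here
is a rough sketch in Python.  It counts, per interface, how often a
'Link is up' line appears when the previous link message for that
interface was also 'up', i.e. with no 'Link is down' logged in between.
The function name, the regular expression, and the assumption that a
'Link is down' line has the same layout as the 'Link is up' line above
are mine, and /var/log/messages is only the usual default location, so
adjust as needed.

```python
import re
import sys
from collections import Counter

# Matches link-state lines shaped like the one quoted above, e.g.
#   Nov  1 19:56:06 hal kernel: em1: Link is up 1000 Mbps Full Duplex
# ASSUMPTION: a "Link is down" line uses the same "kernel: emN: ..." layout.
LINK_RE = re.compile(r"kernel: (em\d+): Link is (up|down)", re.IGNORECASE)


def unmatched_link_ups(lines):
    """Count 'up' events per interface that follow another 'up'.

    The first 'up' seen for an interface (e.g. at boot) is not counted,
    because there is no earlier state to compare it with.
    """
    last_state = {}        # interface name -> last logged state
    unmatched = Counter()  # interface name -> number of repeated 'up' events
    for line in lines:
        match = LINK_RE.search(line)
        if match is None:
            continue
        iface = match.group(1)
        state = match.group(2).lower()
        if state == "up" and last_state.get(iface) == "up":
            unmatched[iface] += 1
        last_state[iface] = state
    return dict(unmatched)


if __name__ == "__main__":
    # ASSUMPTION: syslog output lives in /var/log/messages unless a
    # different path is given on the command line.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    with open(path, errors="replace") as log:
        for iface, count in sorted(unmatched_link_ups(log).items()):
            print(f"{iface}: {count} 'Link is up' without a preceding 'Link is down'")
```

A non-zero count for an interface would match what I am describing
here; it does not by itself say why the link was renegotiated.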

I'm using SCHED_4BSD, SMP, and most of the other GENERIC settings.

If anyone wants more details, let me know.  I have a spare Dell 1850 I 
can play with.


Eric


-- 
------------------------------------------------------------------------
Eric Anderson        Sr. Systems Administrator        Centaur Technology
Anything that works is better than anything that doesn't.
------------------------------------------------------------------------