From: Sven Willenberger <sven@dmv.com>
To: current@freebsd.org
Date: Wed, 02 Nov 2005 10:36:33 -0500
Subject: Tracking down em problem

FreeBSD 6.0-RC1 (built Wed Oct 26 13:31:21 EDT 2005)

I seem to have an issue with losing connections on an em interface under heavy I/O load. There are several variables here, so I am hoping for some guidelines to help troubleshoot this.

I have a PostgreSQL server (8.0.4) set up on an i386 system. The data directory is on its own partition (which is actually a gstripe/gmirror setup -- see the footnote after the problem description). I have enabled a replication system fed from another server. When I started replication, a large amount of data had to be fed to this server via the em0 interface. During this process, while ssh'ed into the box, my connection would hang for a few moments and then recover. However, if I cd to the data directory (the stripe/mirror) and run ls -alrt several times, the connection actually breaks: not only my ssh session but also the replication connection from the master server is dropped.

I have tried setting debug.mpsafenet=0 in /boot/loader.conf, to no avail; the same issue occurs. Preemption is enabled in the kernel, and the scheduler is SCHED_4BSD. I do not really know how to proceed in troubleshooting this; as it stands, it is most definitely a show stopper for this server.
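Here is roughly what I can gather while the bulk transfer runs. This is only a sketch; in particular, the dev.em.0 sysctl names are my assumption of what this version of the em(4) driver exposes:

    # one-shot error/drop counters for em0 (watch Ierrs/Oerrs)
    netstat -i -I em0

    # per-second traffic on em0 while the load runs
    netstat -I em0 -w 1

    # interrupt rates (a shared or storming interrupt line would show up here)
    vmstat -i

    # if the driver provides them, these dump internal counters/state to the console
    sysctl dev.em.0.stats=1
    sysctl dev.em.0.debug_info=1

If the error/drop counters climb just before the connections hang, that would point at the interface itself rather than the geom layer underneath.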
Thanks,

Sven

*footnote: here is the gstripe/gmirror config:

a) the mirrors:

Geom name: pg1
State: COMPLETE
Components: 2
Balance: split
Slice: 8192
Flags: NONE
GenID: 0
SyncID: 1
ID: 1606567834
Providers:
1. Name: mirror/pg1
   Mediasize: 36703949312 (34G)
   Sectorsize: 512
   Mode: r1w1e2
Consumers:
1. Name: da1
   Mediasize: 36703949824 (34G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 0
   SyncID: 1
   ID: 2976581887
2. Name: da2
   Mediasize: 36703949824 (34G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 1
   Flags: DIRTY
   GenID: 0
   SyncID: 1
   ID: 3738898587

Geom name: pg2
State: COMPLETE
Components: 2
Balance: split
Slice: 8192
Flags: NONE
GenID: 0
SyncID: 1
ID: 2419201320
Providers:
1. Name: mirror/pg2
   Mediasize: 36703949312 (34G)
   Sectorsize: 512
   Mode: r1w1e2
Consumers:
1. Name: da3
   Mediasize: 36703949824 (34G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 0
   SyncID: 1
   ID: 4053765902
2. Name: da4
   Mediasize: 36703949824 (34G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 1
   Flags: DIRTY
   GenID: 0
   SyncID: 1
   ID: 2784554060

b) the stripes (using the mirrors):

Geom name: pgdata
State: UP
Status: Total=2, Online=2
Type: AUTOMATIC
Stripesize: 65536
ID: 2329725949
Providers:
1. Name: stripe/pgdata
   Mediasize: 73407791104 (68G)
   Sectorsize: 512
   Mode: r1w1e1
Consumers:
1. Name: mirror/pg1
   Mediasize: 36703949312 (34G)
   Sectorsize: 512
   Mode: r1w1e2
   Number: 0
2. Name: mirror/pg2
   Mediasize: 36703949312 (34G)
   Sectorsize: 512
   Mode: r1w1e2
   Number: 1

This is then mounted via /etc/fstab as:

/dev/stripe/pgdata /usr/local/pgsql ufs rw,noatime 2 2
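For completeness, a sketch of the commands that would produce the layout above (the balance mode, slice size, and stripe size are taken from the list output; treat this as a reconstruction rather than the exact history):

    # two split-balance mirrors (slice size 8192, per the output above)
    gmirror label -v -b split -s 8192 pg1 /dev/da1 /dev/da2
    gmirror label -v -b split -s 8192 pg2 /dev/da3 /dev/da4

    # stripe the two mirrors together with a 64 KB stripe size
    gstripe label -v -s 65536 pgdata /dev/mirror/pg1 /dev/mirror/pg2

    # file system and mount point, matching the fstab entry above
    newfs /dev/stripe/pgdata
    mount /dev/stripe/pgdata /usr/local/pgsql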