From: Sven Willenberger <sven@dmv.com>
To: current@freebsd.org
Date: Wed, 02 Nov 2005 10:36:33 -0500
Subject: Tracking down em problem

FreeBSD 6.0-RC1 (built Wed Oct 26 13:31:21 EDT 2005)

I seem to have an issue with losing connections on an em interface under heavy I/O load. There are several variables here, so I am hoping for some guidelines to help troubleshoot this.

I have a PostgreSQL server (8.0.4) set up on an i386 system. The data directory is on its own partition (which is actually a gstripe/gmirror setup -- see the footnote after the problem description). I have enabled a replication system fed from another server. When I started replication, a large amount of data had to be fed to this server via the em0 interface. During this process, while ssh'ed into the box, my connection would hang for a few moments and then recover. However, if I cd to the data directory (the stripe/mirror) and run ls -alrt several times, the connection actually breaks: not only my ssh session but also the replication connection from the master server is dropped.

I have tried setting debug.mpsafenet=0 in /boot/loader.conf, to no avail; the same issue occurs. Preemption is enabled in the kernel, and the scheduler is SCHED_4BSD. I do not really know how to proceed in troubleshooting this; as it stands, it is most definitely a show stopper for this server.
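Here is roughly what I can gather while the bulk transfer runs. This is only a sketch; in particular, the dev.em.0 sysctl names are my assumption of what this version of the em(4) driver exposes:

    # one-shot error/drop counters for em0 (watch Ierrs/Oerrs)
    netstat -i -I em0

    # per-second traffic on em0 while the load runs
    netstat -I em0 -w 1

    # interrupt rates (a shared or storming interrupt line would show up here)
    vmstat -i

    # if the driver provides them, these dump internal counters/state to the console
    sysctl dev.em.0.stats=1
    sysctl dev.em.0.debug_info=1

If the error/drop counters climb just before the connections hang, that would point at the interface itself rather than the geom layer underneath.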
Thanks,

Sven

*footnote: here is the gstripe/gmirror config:

a) the mirrors:

Geom name: pg1
State: COMPLETE
Components: 2
Balance: split
Slice: 8192
Flags: NONE
GenID: 0
SyncID: 1
ID: 1606567834
Providers:
1. Name: mirror/pg1
   Mediasize: 36703949312 (34G)
   Sectorsize: 512
   Mode: r1w1e2
Consumers:
1. Name: da1
   Mediasize: 36703949824 (34G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 0
   SyncID: 1
   ID: 2976581887
2. Name: da2
   Mediasize: 36703949824 (34G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 1
   Flags: DIRTY
   GenID: 0
   SyncID: 1
   ID: 3738898587

Geom name: pg2
State: COMPLETE
Components: 2
Balance: split
Slice: 8192
Flags: NONE
GenID: 0
SyncID: 1
ID: 2419201320
Providers:
1. Name: mirror/pg2
   Mediasize: 36703949312 (34G)
   Sectorsize: 512
   Mode: r1w1e2
Consumers:
1. Name: da3
   Mediasize: 36703949824 (34G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 0
   SyncID: 1
   ID: 4053765902
2. Name: da4
   Mediasize: 36703949824 (34G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 1
   Flags: DIRTY
   GenID: 0
   SyncID: 1
   ID: 2784554060

b) the stripes (using the mirrors):

Geom name: pgdata
State: UP
Status: Total=2, Online=2
Type: AUTOMATIC
Stripesize: 65536
ID: 2329725949
Providers:
1. Name: stripe/pgdata
   Mediasize: 73407791104 (68G)
   Sectorsize: 512
   Mode: r1w1e1
Consumers:
1. Name: mirror/pg1
   Mediasize: 36703949312 (34G)
   Sectorsize: 512
   Mode: r1w1e2
   Number: 0
2. Name: mirror/pg2
   Mediasize: 36703949312 (34G)
   Sectorsize: 512
   Mode: r1w1e2
   Number: 1

This is then mounted via /etc/fstab as:

/dev/stripe/pgdata /usr/local/pgsql ufs rw,noatime 2 2
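For completeness, a sketch of the commands that would produce the layout above (the balance mode, slice size, and stripe size are taken from the list output; treat this as a reconstruction rather than the exact history):

    # two split-balance mirrors (slice size 8192, per the output above)
    gmirror label -v -b split -s 8192 pg1 /dev/da1 /dev/da2
    gmirror label -v -b split -s 8192 pg2 /dev/da3 /dev/da4

    # stripe the two mirrors together with a 64 KB stripe size
    gstripe label -v -s 65536 pgdata /dev/mirror/pg1 /dev/mirror/pg2

    # file system and mount point, matching the fstab entry above
    newfs /dev/stripe/pgdata
    mount /dev/stripe/pgdata /usr/local/pgsql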