Date: Wed, 02 Nov 2005 09:51:24 -0600
From: Eric Anderson <anderson@centtech.com>
To: Sven Willenberger <sven@dmv.com>
Cc: current@freebsd.org
Subject: Re: Tracking down em problem
Message-ID: <4368E07C.6050003@centtech.com>
In-Reply-To: <1130945793.7893.27.camel@lanshark.dmv.com>
References: <1130945793.7893.27.camel@lanshark.dmv.com>
Sven Willenberger wrote:
> FreeBSD6.0-RC1 (Wed Oct 26 13:31:21 EDT 2005)
>
> I seem to have an issue with losing connections to an em interface
> during process of heavy IO load. There are several variables here so I
> am hoping for some guidelines to help troubleshoot this.
>
> I have a postgresql server (8.0.4) set up on an i386 system. The data
> directory is on its own partition (which is actually a gstripe/gmirror
> setup -- see the footnote after my problem description).
>
> I have enabled a replication system from another server. When I started
> replication there was a large amount of data that had to be fed to this
> server via the em0 interface. During this process, while ssh'ed to the
> box, my connection would just hang for a few moments, then it would
> recover. However, if I cd to the data directory (stripe/mirror) and
> start ls -alrt several times, the connection actually gets broken; not
> only my ssh connection but the replication connection from the master
> server is broken.
>
> I have tried to set debug.mpsafenet=0 in /boot/loader.conf to no avail
> -- the same issue happens. Preemption is enabled in the kernel, as is
> sched_4bsd. I don't really know how to proceed at this point to try and
> troubleshoot this issue: as it stands now, it is most definitely a show
> stopper for the purposes of this server.

I've seen something similar on recent 5.4-STABLE, also using emX
devices. I have 3 Dell 1850's showing the exact same issue, and a few
1850's that are not. The unaffected ones are running 5.4-RELEASE, and
the affected ones are running 5.4-STABLE.

In dmesg, I see a warning like this:

Nov 1 19:56:06 hal kernel: em1: Link is up 1000 Mbps Full Duplex

I don't see a 'link is down', just 'Link is up'. One machine I've seen
this on repeatedly is from about August 16th. I'm using SCHED_4BSD,
SMP, and most of the other GENERIC settings.

If anyone wants more details, let me know. I have a spare Dell 1850 I
can play with.
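For anyone who wants to check their own logs for the same pattern, here is a rough Python sketch. It only assumes that the kernel messages look like the line quoted above. The "Link is down" wording and the function name unpaired_link_ups are assumptions for illustration, not taken from the em driver source. It flags a "Link is up" message when the previous link event for the same interface was also "up", and skips the first event seen per interface, since a link-up at boot is expected.

```python
import re

# Matches kernel link-state messages such as
# "Nov 1 19:56:06 hal kernel: em1: Link is up 1000 Mbps Full Duplex".
# The "down" form is an assumed counterpart of the "up" message.
LINK_RE = re.compile(r"kernel: (em\d+): link is (up|down)", re.IGNORECASE)


def unpaired_link_ups(lines):
    """Return 'Link is up' lines whose previous event on the same
    interface was also 'up' (no 'Link is down' in between)."""
    last_state = {}   # interface name -> "up" or "down"
    suspicious = []
    for line in lines:
        match = LINK_RE.search(line)
        if match is None:
            continue
        iface = match.group(1)
        state = match.group(2).lower()
        # Flag only a repeated "up"; the first event per interface is skipped.
        if state == "up" and last_state.get(iface) == "up":
            suspicious.append(line.rstrip("\n"))
        last_state[iface] = state
    return suspicious
```

It could be run over the system log, for example with open("/var/log/messages") passed in as the lines argument; the log location may differ depending on the syslog configuration.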
Eric

-- 
------------------------------------------------------------------------
Eric Anderson        Sr. Systems Administrator        Centaur Technology
Anything that works is better than anything that doesn't.
------------------------------------------------------------------------
