Date:      Fri, 18 Jan 2008 13:13:27 -0800
From:      Steve Kargl <sgk@troutmask.apl.washington.edu>
To:        Andre Oppermann <andre@freebsd.org>
Cc:        Tom Evans <tevans.uk@googlemail.com>, freebsd-current@freebsd.org
Subject:   Re: Regular bge watchdog timeouts on 7.0-PRERELEASE
Message-ID:  <20080118211327.GA50720@troutmask.apl.washington.edu>
In-Reply-To: <4790F680.1090204@freebsd.org>
References:  <1199966437.1545.27.camel@localhost> <20080110175347.GA68673@troutmask.apl.washington.edu> <4790F680.1090204@freebsd.org>

On Fri, Jan 18, 2008 at 07:57:04PM +0100, Andre Oppermann wrote:
> Steve Kargl wrote:
> >On Thu, Jan 10, 2008 at 12:00:37PM +0000, Tom Evans wrote:
> >
> >>I am encountering regular watchdog timeouts on bge:
> >>
> >>Jan  9 08:36:11 zoot kernel: bge0: watchdog timeout -- resetting
> >>Jan  9 08:36:11 zoot kernel: bge0: link state changed to DOWN
> >>Jan  9 08:36:13 zoot kernel: bge0: link state changed to UP
> >
> >Add the following to /etc/sysctl.conf
> >
> >net.inet.tcp.sendspace=131072
> >net.inet.tcp.recvspace=131072
> 
> In 7.0 these are automatically tuning and can be left at the default
> settings.

I started using the above before automatic tuning was available,
and I haven't revisited whether these are still needed.  My motto:
"If it works, why fix it?"

> >net.inet.tcp.path_mtu_discovery=0
> 
> You should not disable path MTU discovery.  It'll most likely break the
> internet for you when you encounter for example PPPoE links.

This is on an intranet.  A small cluster used for MPI computations.
I won't run into PPPoE issues, but it's good to know that problems
can occur.

> >net.inet.udp.recvspace=65536
> >net.inet.raw.recvspace=16384
> >kern.ipc.nmbclusters=50000
> >kern.ipc.shm_use_phys=1
> >net.inet.tcp.rexmit_min=30
> 
> These changes do not really have much influence on the bge problem
> (at least theoretically).

The first 3 are needed to make NFS happy on my cluster.  The shm
change is needed for MPICH2's nemesis device.  I don't remember
why I set rexmit_min.  See motto above.

-- 
Steve
