Date:      Thu, 11 Oct 2001 08:51:55 -0500
From:      Ryan Dooley <dooleyr@missouri.edu>
To:        stable@freebsd.org
Subject:   Uh... server crashes every day :-/ any thoughts?
Message-ID:  <3BC5A3FB.22F51BB@missouri.edu>

Hey All,

I've got a 4.4-RELEASE server running on a Dell 6450 that seems to be
having issues: it has crashed once a day for the past week, at the worst
possible time (business hours).

Here's the deal... 

The system is a central NFS server serving up NFS, SAMBA, and printing
to a large number of clients.  It has two interfaces.  One goes to a
dedicated 100Mbit network for 6 linux machines that act as web and ftp
servers as well as some general access machines (they mount a file
system from this server via NFS, version 3 over udp).

The second interface goes to our public network to serve out NFS/SAMBA. 
We have a mix of unix clients (AIX, IRIX, and more linux).  The AIX and
IRIX clients mount via version 3 and tcp, while linux continues to mount
version 3, udp.

We have 170ish active NFS clients off this one interface and 800+ samba
clients.

The file server itself is a Dell 6450 (dual processor with 1 GB RAM and
fibrechannel disk).  The fibrechannel is connected via a Qlogic 2200 HBA
to an IBM fibrechannel array.  We have an 891GB disk that houses user
data (yes, this is a 45 minute PITA to fsck, I'm really looking forward
to fsck -B...).

The crash yesterday appeared to involve an SMP error, so I rebooted the
system with a uniprocessor kernel.

The past two crashes have left the system in a panic state, but the
system never gets past the "syncing disks" message and has to be
power cycled (I didn't wait that long, but it hadn't rebooted, so I
power cycled it).

/me not happy.

Now, we just recently switched out the hardware from an IBM Netfinity
4500R box we had sitting in the same cluster (I'm thinking of going back
to it... it is currently running 4.3-20010809-STABLE).  It was up for 35
days before our first crash.

The tweaks to the system are these:

/etc/sysctl.conf

kern.maxfiles=150000
vfs.vmiodirenable=1
net.inet.ip.intr_queue_maxlen=4096

/boot/loader.conf

userconfig_script_load="YES"
kern.ipc.nmbclusters="10240"            # number of mbuf clusters
kern.ipc.nmbufs="40960"                 # number of mbufs
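
For a sense of scale, here is a minimal sketch of the memory ceiling the
two mbuf tunables above would imply.  It assumes the stock FreeBSD 4.x
sizes (MSIZE of 256 bytes per mbuf, MCLBYTES of 2048 bytes per cluster);
a kernel built with different values would change the result.  The
function name mbuf_pool_bytes is made up for illustration.

```python
# Rough memory ceiling implied by the loader.conf mbuf tunables above.
# ASSUMPTION: stock FreeBSD 4.x sizes (MSIZE = 256 bytes per mbuf,
# MCLBYTES = 2048 bytes per cluster); a custom kernel may differ.
MSIZE = 256
MCLBYTES = 2048


def mbuf_pool_bytes(nmbufs: int, nmbclusters: int) -> int:
    """Return the maximum bytes the mbuf and cluster pools could occupy."""
    return nmbufs * MSIZE + nmbclusters * MCLBYTES


if __name__ == "__main__":
    # Values taken from the /boot/loader.conf listing above.
    total = mbuf_pool_bytes(nmbufs=40960, nmbclusters=10240)
    print(f"{total / (1024 * 1024):.0f} MiB")  # prints "30 MiB"
```

This is only an upper bound on what the pools can grow to; the peak
figures reported by netstat -m show how much is actually in use.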

I've also configured nfsd to start 512 servers (only 256 are actually
running, though).

That's it.

Running on the box: nfsd, rpc.lockd, rpc.statd, portmap, NIS (client to
an SGI IRIX box), and vinum.

The vinum partition is just a concatenated disk from the fibre channel
array (overkill I know) to create that 891GB partition.

Anybody have anything like this happen under similar circumstances?

Cheers,
	"losing sleep fast" Ryan
