From owner-freebsd-stable Thu Oct 11 6:52: 1 2001
Delivered-To: freebsd-stable@freebsd.org
Received: from umc-mail01.missouri.edu (umc-mail01.missouri.edu [128.206.10.216])
	by hub.freebsd.org (Postfix) with ESMTP id C836037B403
	for ; Thu, 11 Oct 2001 06:51:56 -0700 (PDT)
Received: from missouri.edu (karma.iats.missouri.edu [128.206.94.220])
	by umc-mail01.missouri.edu with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13)
	id 443S2FKL; Thu, 11 Oct 2001 08:51:55 -0500
Message-ID: <3BC5A3FB.22F51BB@missouri.edu>
Date: Thu, 11 Oct 2001 08:51:55 -0500
From: Ryan Dooley
X-Mailer: Mozilla 4.78 [en] (X11; U; Linux 2.4.8-19mdk i686)
X-Accept-Language: en
MIME-Version: 1.0
To: stable@freebsd.org
Subject: Uh... server crashes every day :-/ any thoughts?
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-stable@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

Hey All,

I've got this 4.4-RELEASE server running on a Dell 6450 that seems to be having issues: it has crashed once a day for the past week, at the worst possible time (business hours).

Here's the deal... The system is a central NFS server serving up NFS, SAMBA, and printing to a large number of clients. It has two interfaces. One goes to a dedicated 100MB network for 6 linux machines that act as web and ftp servers as well as some general access machines (they mount a file system from this server via NFS, version 3, udp). The second interface goes to our public network to serve out NFS/SAMBA. We have a mix of unix clients (AIX, IRIX, and more linux). The AIX and IRIX clients mount via version 3 and tcp, while linux continues to mount version 3, udp. We have 170ish active NFS clients off this one interface and 800+ samba clients.

The file server itself is a Dell 6450 (dual processor with 1 GB RAM and fibrechannel disk).
The fibrechannel is connected via a Qlogic 2200 HBA to an IBM fibrechannel array. We have an 891GB disk that houses user data (yes, this is a 45 minute PITA to fsck, I'm really looking forward to fsck -B...).

The crash yesterday looked to involve an SMP error, so I rebooted the system with a uniprocessor kernel. The past two crashes have left the system in a panic state, but it never recovers from the "syncing disks" message and has to be power cycled (I didn't wait that long ... but it hadn't rebooted so I power cycled it). /me not happy.

Now, we just recently switched out the hardware from an IBM Netfinity 4500R box we had sitting in the same cluster (I'm thinking of going back to it... it is currently running 4.3-20010809-STABLE). It was up for 35 days before our first crash.

The tweaks to the system are these:

/etc/sysctl.conf
    kern.maxfiles=150000
    vfs.vmiodirenable=1
    net.inet.ip.intr_queue_maxlen=4096

/boot/loader.conf
    userconfig_script_load="YES"
    kern.ipc.nmbclusters="10240"    # number of mbuf clusters
    kern.ipc.nmbufs="40960"         # number of mbufs

I've also changed nfsd to start up 512 servers (only have 256 running though).

That's it. nfsd, rpc.lockd, rpc.statd, portmap, nis (client to an SGI IRIX box), and vinum. The vinum partition is just a concatenated disk from the fibre channel array (overkill I know) to create that 891GB partition.

Anybody have anything like this happen under similar circumstances?

Cheers,
"losing sleep fast" Ryan

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message