Date: Wed, 13 Oct 2004 18:37:51 -0400
From: Mikhail Teterin <Mikhail.Teterin@murex.com>
To: Matthew Dillon <dillon@apollo.backplane.com>
Cc: bde@zeta.org.au
Subject: Re: panic in ffs (Re: hangs in nbufkv)
Message-ID: <200410131837.51832@misha-mx.virtual-estates.net>
In-Reply-To: <200410130431.i9D4VjPJ094849@apollo.backplane.com>
References: <416AE7D7.3030502@murex.com> <416C2502.5040505@murex.com> <200410130431.i9D4VjPJ094849@apollo.backplane.com>

=:I don't know, how, but the bug seems triggered by upping the
=:net.inet.udp.maxdgram from 9216 (default) to 16384 (to match the NFS
=:client's wsize). Once I do that, the machine will either panic or just
=:hang a few minutes into the heavy NFS writing (Sybase database dumps
=:from a Solaris server). Happened twice already...
=
= Interesting. That's getting a bit outside the realm I can help
= with. NFS and the network stack have been issues in FreeBSD
= recently so its probably something related.

Actually, that's not it. Even if I don't touch any sysctls and simply
keep loading the machine with our backup scripts, it will eventually
either hang (after many complaints about WRITE_DMA problems with the
disk that the NFS clients write to) or panic with:

	initiate_write_inodeblock_ufs2: already started

(in /sys/ufs/ffs/ffs_softdep.c).

As for the WRITE_DMA problems, after going through two disks, two
cables, and two different on-board SATA connectors, we concluded that
the problem is with the ata driver (hence
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/72451).

As for the panics, I set BKVASIZE back down to 16Kb, rebuilt the
kernel, and recreated the filesystem that used to have the 64K bsize.
The machine still either panics or hangs under load.

Maybe I should give a bit more detail about the load. It is produced by
a script which tells the Sybase server to dump one database at a time
over NFS to the "staging" disk (a single SATA150 drive) and, as each
database is dumped, compresses it onto the RAID5 array for storage.

When the thing is working properly, the Sybase server writes at or close
to wire speed (9-11Mb/second). Unfortunately, the staging disk soon
starts throwing the above-mentioned WRITE_DMA errors. Fortunately, those
are usually recoverable. Unfortunately, the machine eventually hangs
anyway...
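
For anyone trying to reproduce a similar load pattern, the dump-then-compress loop described above might look roughly like the sketch below. This is not the actual backup script, which is not shown in this message. The names `run_backup`, `compress_to_archive`, and `dump_one`, the `.dmp` suffix, and the use of gzip are all assumptions made for illustration, and the sketch handles the databases strictly one after another, which may differ from how the real script overlaps dumping and compressing.

```python
import gzip
import os
import shutil
from typing import Callable, Iterable, List


def compress_to_archive(staged_path: str, archive_dir: str) -> str:
    """Gzip one staged dump into archive_dir, then delete the staged copy."""
    os.makedirs(archive_dir, exist_ok=True)
    target = os.path.join(archive_dir, os.path.basename(staged_path) + ".gz")
    with open(staged_path, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(staged_path)
    return target


def run_backup(
    databases: Iterable[str],
    staging_dir: str,
    archive_dir: str,
    dump_one: Callable[[str, str], None],
) -> List[str]:
    """Dump each database to the staging area, then compress it for storage.

    dump_one(name, path) is a hypothetical hook standing in for whatever
    asks the database server to write a dump of `name` to `path`.
    """
    os.makedirs(staging_dir, exist_ok=True)
    archived = []
    for name in databases:
        staged = os.path.join(staging_dir, name + ".dmp")
        dump_one(name, staged)  # heavy sequential write to the staging disk
        archived.append(compress_to_archive(staged, archive_dir))
    return archived
```

Passing the dump step in as a callable keeps the sketch independent of any particular database client; pointing `staging_dir` and `archive_dir` at the same filesystem corresponds to the workaround of staging on the RAID5 partition.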

I changed the script to use the RAID5 partition as the staging area as
well (this is the filesystem that used to have 64Kb bsize and 8Kb
fsize; it is over 1Tb large), and it seems to work for now, but the
throughput is much lower than it used to be (limited by the RAID
controller's I/O).

Another observation I can make is that 'bufdaemon' often takes up 50-80%
of the CPU time (on a 2.2 Opteron!) while this script is running. Not
sure if that's normal or not.

	-mi