From: Mikhail Teterin <Mikhail.Teterin@murex.com>
Organization: Murex N.A.
To: Matthew Dillon
cc: freebsd-current@freebsd.org
cc: bde@zeta.org.au
Date: Wed, 13 Oct 2004 18:37:51 -0400
Subject: Re: panic in ffs (Re: hangs in nbufkv)

=: I don't know how, but the bug seems to be triggered by upping
=: net.inet.udp.maxdgram from 9216 (the default) to 16384 (to match the
=: NFS client's wsize). Once I do that, the machine will either panic
=: or just hang a few minutes into the heavy NFS writing (Sybase
=: database dumps from a Solaris server). Happened twice already...

= Interesting. That's getting a bit outside the realm I can help
= with. NFS and the network stack have been issues in FreeBSD
= recently, so it's probably something related.

Actually, that's not it. Even if I don't touch any sysctls and simply
proceed to load the machine with our backup scripts, it will
eventually either hang (after many complaints about WRITE_DMA problems
on the disk the NFS clients write to) or panic with:

	initiate_write_inodeblock_ufs2: already started

(in /sys/ufs/ffs/ffs_softdep.c).

As for the WRITE_DMA problems: after going through two disks, two
cables, and two different on-board SATA connectors, we concluded that
the problem is in the ata driver (hence
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/72451).

As for the panics: I set BKVASIZE back down to 16KB, rebuilt the
kernel, and recreated the filesystem that used to have the 64KB bsize.
The machine still either panics or hangs under load.

Maybe I should give a bit more details about the load. It is produced
by a script, which tells the Sybase server to dump one database at a
time over NFS to the "staging" disk (a single SATA150 drive) and, as
each database is dumped, compresses the dump onto the RAID5 array for
storage.
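To make that concrete, the script is roughly along the lines of the
sketch below. The database names, paths, and isql credentials are made
up for illustration; only the general shape matches the real thing:

	#!/bin/sh
	# Illustrative sketch only: database names, paths, and the
	# isql credentials are hypothetical.
	STAGING=/staging        # the single SATA150 "staging" disk (NFS-exported)
	ARCHIVE=/raid5/dumps    # filesystem on the RAID5 array

	for db in accounts trades positions; do
		# Ask the Sybase server on the Solaris host to dump the
		# database into the NFS-mounted staging area.
		printf 'dump database %s to "/nfs/staging/%s.dmp"\ngo\n' \
		    "$db" "$db" | isql -S SYBASE -U sa -P "$SA_PASSWORD"

		# As each dump completes, compress it onto the RAID5
		# array and drop the staging copy.
		gzip -c "$STAGING/$db.dmp" > "$ARCHIVE/$db.dmp.gz" &&
		    rm -f "$STAGING/$db.dmp"
	done

Nothing fancier than that: one dump at a time, then one gzip.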
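And in case it matters for the panics: the 64KB-bsize filesystem
mentioned above had been created with roughly the following newfs
parameters (the device name is made up), with BKVASIZE bumped to match
in the kernel config -- the setting I have since reverted to the 16KB
default:

	# kernel config, before reverting (value assumed to match bsize):
	#   options BKVASIZE=65536
	newfs -U -b 65536 -f 8192 /dev/da0s1e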
When the thing is working properly, the Sybase server writes at or
close to the wire speed (9-11MB/second). Unfortunately, the staging
disk soon starts throwing the above-mentioned WRITE_DMA errors.
Fortunately, those are usually recoverable. Unfortunately, the machine
eventually hangs anyway...

I changed the script to use the RAID5 partition as the staging area as
well (this is the filesystem that used to have the 64KB bsize and 8KB
fsize -- it is over 1TB large), and it seems to work for now, but the
throughput is much lower than it used to be (limited by the RAID
controller's I/O).

Another observation I can make is that bufdaemon often takes up 50-80%
of the CPU time (on a 2.2GHz Opteron!) while this script is running.
Not sure if that's normal or not.

	-mi