Date:      Tue, 19 Dec 2000 16:43:15 -0800
From:      Kirk McKusick <mckusick@mckusick.com>
To:        dbhague@allstor-sw.co.uk
Cc:        gibbs@scsiguy.com, mjacob@feral.com, Andre Albsmeier <andre.albsmeier@mchp.siemens.de>, Matt Dillon <dillon@earth.backplane.com>, freebsd-scsi@FreeBSD.org, freebsd-fs@FreeBSD.org
Subject:   Re: Filesystem Crashes 
Message-ID:  <200012200043.QAA30900@beastie.mckusick.com>
In-Reply-To: Your message of "Wed, 15 Nov 2000 09:48:54 GMT." <80256998.0035EB4C.00@mail.plasmon.co.uk> 

	From: dbhague@allstor-sw.co.uk
	To: Kirk McKusick <mckusick@mckusick.com>
	Date: Wed, 15 Nov 2000 09:48:54 +0000
	Subject: Re: Filesystem Crashes

	Kirk,

	I enclose the test program that we used to trigger the
	filesystem bug.  I have made numerous postings to the FS
	and SCSI bug lists with details of our problem.  The program
	basically generates a large number of small files and then
	deletes them. This is designed to stress the meta-data of
	the filesystem.

	You should be aware that we now believe this to be a problem
	with the RAID controller, in particular the lengths of the
	EIDE cables. However, several people have used this program
	and had filesystem problems.

	Let me know if I can help further.

	Regards Dave
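
The attached tmcp.c is not reproduced in this archive. As a rough
illustration of the kind of test described above (generate a large
number of small files, then delete them, to stress filesystem
meta-data), here is a minimal sketch in POSIX sh. It is not the
original program: the function name stress_small_files, its
arguments, and the f.N file naming are all hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the attached tmcp.c: create many small
# files in a directory, then delete them, for a number of passes.

# usage: stress_small_files DIR COUNT [PASSES]
stress_small_files() {
    dir=$1
    count=$2
    passes=${3:-1}

    mkdir -p "$dir" || return 1

    pass=0
    while [ "$pass" -lt "$passes" ]; do
        # Creation phase: each file holds one short line, so most of
        # the work is inode and block-map (meta-data) updates.
        i=0
        while [ "$i" -lt "$count" ]; do
            echo "pass $pass file $i" > "$dir/f.$i" || return 1
            i=$((i + 1))
        done

        # Deletion phase: release everything allocated above.
        i=0
        while [ "$i" -lt "$count" ]; do
            rm -f "$dir/f.$i" || return 1
            i=$((i + 1))
        done

        pass=$((pass + 1))
    done
}
```

It could be run against the filesystem under test with something
like "stress_small_files /RAID/stress 10000 5", where the path and
counts are example values only. Several copies run in parallel would
increase the load, but how the original scripts did this is not
shown in the thread.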

Just a quick note to bring you up to date with our current
thinking. The dup allocation and other block allocation panics
seem to be derived from a bug in the buffer caching code which
fails to properly write partial-page sized blocks (cylinder group
maps, the data structure that is being corrupted, are typically
2K or 3K in size). Matt Dillon and I have been trying
to track it down. The jury is still out on whether we have found
the bug, but we have found and fixed several suspect pieces of code.
If you are still getting these types of panics in -current code,
please let Matt and me know.

	Kirk McKusick

=-=-=-=-=-=-=-=

	Sent by:  David Barrett-Hague

	To:   freebsd-scsi@FreeBSD.org, freebsd-fs@FreeBSD.org
	cc:   gibbs@scsiguy.com, mjacob@feral.com, Andre Albsmeier
	      <andre.albsmeier@mchp.siemens.de>, Steven
	      McIntyre/ALLSTOR/UK/Plasmon@PlasNotes

	Subject:  Re: Stressed SCSI subsystem locks up the system

	This is increasingly looking like a filesystem issue.

	We have done some more testing on the 4.1 build.  We believe
	the issues with 4.1 are different to the ones with 3.0.
	Due to the age of 3.0 we have decided to forget about it
	and concentrate on the 4.1 issues.

	We have two 4.1 test systems; one failed with the following
	on the console
	> dev #da/3 block=0 fs=/RAID blocks
	> panic ffs_blkfree freeing free frags
	> syncing disk
	The system was locked in this state and was ping-able but we
	could not telnet in.

	The other system is still running after two days.  This is
	the longest a test has run.  The only difference we can
	see between the systems is the way the filesystem was built.

	This was built by
	> dd if=/dev/zero of=/dev/rda0 count=2048
	> disklabel -Brw da0 auto
	> disklabel -e da0
	> newfs /dev/rda0d

	normally this is done in a script which effectively does
	> disklabel -rw da0 auto
	> dd if=/dev/zero of=/dev/rda0 count=2048
	> disklabel -rw da0 auto
	> "disklabel" | disklabel -rR da0
	> newfs /dev/rda0d

	The only real difference is the -B option in disklabel.
	We do not attempt to boot from this partition so this should
	not matter.

	Let me know what you think.

	Regards Dave

	For the sake of Freebsd-fs subscribers:

	For your information we are a small software company, just
	south of Cambridge UK, and have a thin server that is
	running FreeBSD.  We are developing a storage system which
	controls a RAID system and automatically archives the data
	to optical jukeboxes or tape libraries. Basically HSM in a box.

	The system is as follows:
	   Single board computer, AMD K6-II,
	   128MB RAM,
	   Adaptec AHA 3940AU
	   Intel Ethernet Pro 100

	This is attached to a RAID system although we have had the
	same failure with a single SCSI drive.  I enclose the test
	script and the source for the main test program, tmcp.c.
	The test is started with "golongmulti 7"

	If you require any further information please ask, or give me a call
	+44-1763-264-474.

	Thanks Dave
	(See attached file: tmcp.c)(See attached file: golong)
	(See attached file: golongmulti)(See attached file: go)

