Date: Tue, 19 Dec 2000 16:43:15 -0800
From: Kirk McKusick <mckusick@mckusick.com>
To: dbhague@allstor-sw.co.uk
Cc: gibbs@scsiguy.com, mjacob@feral.com, Andre Albsmeier <andre.albsmeier@mchp.siemens.de>, Matt Dillon <dillon@earth.backplane.com>, freebsd-scsi@FreeBSD.org, freebsd-fs@FreeBSD.org
Subject: Re: Filesystem Crashes
Message-ID: <200012200043.QAA30900@beastie.mckusick.com>
In-Reply-To: Your message of "Wed, 15 Nov 2000 09:48:54 GMT." <80256998.0035EB4C.00@mail.plasmon.co.uk>

From: dbhague@allstor-sw.co.uk
To: Kirk McKusick <mckusick@mckusick.com>
Date: Wed, 15 Nov 2000 09:48:54 +0000
Subject: Re: Filesystem Crashes

Kirk,

I enclose the test program that we used to trigger the filesystem bug. I have made numerous postings to the FS and SCSI bug lists with details of our problem.

The program basically generates a large number of small files and then deletes them. This is designed to stress the meta-data of the filesystem.

You should be aware that we now believe this to be a problem with the RAID controller, in particular the lengths of the EIDE cables. However, several people have used this program and had filesystem problems.

Let me know if I can help further.

Regards
Dave

Just a quick note to bring you up to date with our current thinking. The dup allocation and other block allocation panics seem to be derived from a bug in the buffer caching code, which fails to properly write partial-page sized blocks (cylinder group maps, which are the data structure that is being corrupted, are typically 2K or 3K in size). Matt Dillon and I have been trying to track it down. The jury is still out on whether we have found the bug, but we have found and fixed several suspect pieces of code. If you are still getting these types of panics in -current code, please let Matt and me know.

Kirk McKusick

=-=-=-=-=-=-=-=

Sent by: David Barrett-Hague
To: freebsd-scsi@FreeBSD.org, freebsd-fs@FreeBSD.org
cc: gibbs@scsiguy.com, mjacob@feral.com, Andre Albsmeier <andre.albsmeier@mchp.siemens.de>, Steven McIntyre/ALLSTOR/UK/Plasmon@PlasNotes
Subject: Re: Stressed SCSI subsystem locks up the system (Document link not converted)

This is increasingly looking like a filesystem issue.

We have done some more testing on the 4.1 build. We believe the issues with 4.1 are different to the ones with 3.0. Due to the age of 3.0 we have decided to forget about it and concentrate on the 4.1 issues.

We have two 4.1 test systems; one failed with the following on the console:

> dev #da/3 block=0 fs=/RAID blocks
>panic ffs_blkfree freeing free frags
>syncing disk

The system was locked in this state and was ping-able, but we could not telnet in.

The other system is still running after two days. This is the longest a test has run. The only difference we can see between the systems is the way the filesystem was built. This was built by:

> dd if=/dev/zero of=/dev/rda0 count=2048
> disklabel -Brw da0 auto
> disklabel -e da0
> newfs /dev/rda0d

Normally this is done in a script which effectively does:

> disklabel -rw da0 auto
> dd if=/dev/zero of=/dev/rda0 count=2048
> disklabel -rw da0 auto
> "disklabel" | disklabel -rR da0
> newfs /dev/rda0d

The only real difference is the -B option in disklabel. We do not attempt to boot from this partition, so this should not matter.

Let me know what you think?

Regards
Dave

For the sake of freebsd-fs subscribers:

For your information, we are a small software company, just south of Cambridge UK, and have a thin server that is running FreeBSD. We are developing a storage system which controls a RAID system and automatically archives the data to optical jukeboxes or tape libraries. Basically HSM in a box.

The system is as follows:

Single board computer, AMD K6-II, 128MB RAM, Adaptec AHA 3940AU
Intel Ethernet Pro 100

This is attached to a RAID system, although we have had the same failure with a single SCSI drive.

I enclose the test script and the source for the main test program, tmcp.c. The test is started with "golongmulti 7".

If you require any further information please ask, or give me a call: +44-1763-264-474.

Thanks
Dave

(See attached file: tmcp.c)(See attached file: golong)
(See attached file: golongmulti)(See attached file: go)

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message
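
The attachments (tmcp.c, golong, golongmulti, go) are not reproduced in this archive. As a rough illustration of the kind of workload Dave describes (generate a large number of small files, then delete them, to stress filesystem meta-data), a minimal sketch might look like the following. It is not the original tmcp.c: the function names, file count, file size, and the use of a temporary scratch directory are all assumptions made for illustration.

```python
import os
import tempfile


def churn_small_files(directory, count, size=512):
    """Create `count` small files in `directory`, then delete them all.

    Hypothetical stand-in for the workload described in the thread;
    returns the number of files that were created and removed.
    """
    payload = b"x" * size
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"f{i:06d}")
        with open(path, "wb") as fh:
            fh.write(payload)
        paths.append(path)
    # Deleting every file again exercises inode and block freeing.
    for path in paths:
        os.remove(path)
    return len(paths)


def run(passes=3, count=1000):
    """Repeat the create/delete cycle in a scratch directory.

    Returns the total number of files churned and whatever entries
    are left behind in the directory (expected to be none).
    """
    with tempfile.TemporaryDirectory() as scratch:
        total = 0
        for _ in range(passes):
            total += churn_small_files(scratch, count)
        leftover = os.listdir(scratch)
    return total, leftover


if __name__ == "__main__":
    total, leftover = run()
    print(f"created and deleted {total} files; leftover entries: {len(leftover)}")
```

To exercise a particular filesystem rather than the default temporary location, one would point the scratch directory at a mount of that filesystem (for example via the `dir` argument of `tempfile.TemporaryDirectory`), and run several copies in parallel to raise the load.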
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200012200043.QAA30900>