From owner-freebsd-stable@FreeBSD.ORG Mon Jan 15 15:06:40 2007
Date: Mon, 15 Jan 2007 17:06:09 +0200
From: Nikolay Pavlov
To: Scott Long
Cc: Jan Mikkelsen, freebsd-stable@FreeBSD.org
Subject: Re: kernel panic on 6.2-RC2 with GENERIC.
Message-ID: <20070115150609.GB2510@zone3000.net>
In-Reply-To: <45A1C15F.8060802@samsco.org>
References: <001a01c732b0$c22f4860$0204a8c0@transactionware.com> <45A1C15F.8060802@samsco.org>
User-Agent: Mutt/1.4.2.1i
X-Operating-System: FreeBSD 6.1-RELEASE-p10
List-Id: Production branch of FreeBSD source code

On Sunday, 7 January 2007 at 19:58:23 -0800, Scott Long wrote:
> Jan Mikkelsen wrote:
> >(Scott: I should have emailed you this earlier, but Christmas and various
> >other things got in the way.)
> >
> >Ian West wrote:
> >>On Sun, Jan 07, 2007 at 02:25:02PM -0500, Mike Tancsa wrote:
> >>>At 11:43 AM 1/7/2007, Craig Rodrigues wrote:
> >>>>On Fri, Jan 05, 2007 at 06:59:10PM +0200, Nikolay Pavlov wrote:
> >>>>[ Areca kernel panic, IO failures ... ]
> >>I have seen this identical fault with the new areca driver, my machine
> >>is opteron hardware, but running a regular i386/SMP kernel/world. With
> >>everything at 6.2RC2 (as of 29th of December) except the areca driver
> >>the machine is rock solid, with the 29th of December version of the
> >>areca driver the box will crash on extract of a large tar file, removal
> >>of a large directory structure, or pretty much anything that does a lot
> >>of disk io to different files/locations. There is no error log prior to
> >>seeing the following messages..
> >>
> >>Dec 29 14:26:44 aleph kernel: g_vfs_done():da0s1g[WRITE(offset=433078272, length=8192)]error = 5
> >>Dec 29 14:26:44 aleph kernel: g_vfs_done():da0s1g[WRITE(offset=433111040, length=16384)]error = 5
> >>Dec 29 14:26:44 aleph kernel: g_vfs_done():da0s1g[WRITE(offset=433209344, length=16384)]error = 5
> >>Dec 29 14:26:44 aleph kernel: g_vfs_done():da0s1g[WRITE(offset=433242112, length=32768)]error = 5
> >>Dec 29 14:26:44 aleph kernel: g_vfs_done():da0s1g[WRITE(offset=437612544, length=4096)]error = 5
> >>Dec 29 14:26:44 aleph kernel: g_vfs_done():da0s1g[WRITE(offset=437616640, length=12288)]error = 5
> >>Dec 29 14:26:44 aleph kernel: g_vfs_done():da0s1g[WRITE(offset=437633024, length=6144)]error = 5
> >>Dec 29 14:26:44 aleph kernel: g_vfs_done():da0s1g[WRITE(offset=437639168, length=2048)]error = 5
> >>Dec 29 14:26:44 aleph kernel: g_vfs_done():da0s1g[WRITE(offset=437641216, length=6144)]error = 5
> >>
> >>There are a string of these, followed by a crash and reboot. The file
> >>system state can be left very dirty to the point where background fsck
> >>seems unable to recover it.
> >>
> >>The areca card in question is running the latest firmware/boot and
> >>has shown no problems either before, or since backing out the areca
> >>driver.
> >>
> >>The volume I ran the tests on was a 250G on a raid6 raid set.
> >
> >I have seen various problems with various Areca drivers. All on
> >6.2-RC1/amd64 with an Areca RAID-6 volume.
> >
> >Areca 1.20.00.02 seems to work fine.
> >
> >Areca 1.20.00.12 (from the Areca website) seems to have data corruption
> >problems. My tests involve doing a "diff -r" on a filesystem with 2GB of
> >data. It will occasionally find differences in files. On examination, the
> >last 640 bytes of the first block of the affected file contain data from
> >another file "nearby" in the filesystem. Unmounting and remounting the
> >filesystems and rerunning the test shows no problem, or a difference in
> >another file entirely. I think this is the cause of the g_vfs_done
> >failures with this version of the driver; the offsets are wrong because
> >the data is corrupted.
> >
> >Areca 1.20.00.13 (as currently in the tree) does not seem to have data
> >corruption problems, but I can trigger g_vfs_done failures under heavy I/O.
> >
> >I have raised this with Areca support, and I'm waiting to hear back from
> >Erich Chen.
> >
> >Regards,
> >
> >Jan Mikkelsen
> >
>
> I discussed this issue at length with the release engineering team
> today, and we're going to go ahead with keeping the .013 version in
> 6.2 since it has been working very reliably for a number of other
> testers, and reverting it at this late stage of the release represents
> more risk. A note about this issue will likely be put into the 6.2
> errata document as well.

This problem isn't mentioned in the errata notes for the 6.2 release,
so someone may still run into it.
http://www.freebsd.org/releases/6.2R/errata.html

>
> I plan to dig into this problem next week unless Areca fixes it first.
> Please let me know if you hear anything from them.
>
> Scott

-- 
======================================================================
- Best regards, Nikolay Pavlov. <<<-----------------------------------
======================================================================
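
For anyone checking whether a machine is hitting the same failure, here
is a minimal sketch, assuming the kernel messages have the same shape as
the g_vfs_done lines quoted in this thread. It pulls the device,
operation, offset, length and error number out of log text and counts
the failures. The function names and the sample text are hypothetical
and only for illustration; error = 5 is EIO (input/output error).

```python
import re
from collections import Counter

# Matches kernel messages of the form quoted in this thread, e.g.
#   g_vfs_done():da0s1g[WRITE(offset=433078272, length=8192)]error = 5
# The pattern is an assumption based on those quoted lines only.
G_VFS_DONE = re.compile(
    r"g_vfs_done\(\):(?P<dev>[^\[\s]+)\[(?P<op>[A-Z]+)"
    r"\(offset=(?P<offset>\d+), length=(?P<length>\d+)\)\]"
    r"error = (?P<errno>\d+)"
)


def parse_failures(text):
    """Return (device, op, offset, length, errno) for each failure in text."""
    return [
        (m["dev"], m["op"], int(m["offset"]), int(m["length"]), int(m["errno"]))
        for m in G_VFS_DONE.finditer(text)
    ]


def summarize(failures):
    """Count failures per (device, op, errno) combination."""
    return Counter((dev, op, errno) for dev, op, _, _, errno in failures)


if __name__ == "__main__":
    # Two of the lines quoted earlier in the thread, used as sample input.
    sample = (
        "Dec 29 14:26:44 aleph kernel: "
        "g_vfs_done():da0s1g[WRITE(offset=433078272, length=8192)]error = 5\n"
        "Dec 29 14:26:44 aleph kernel: "
        "g_vfs_done():da0s1g[WRITE(offset=433111040, length=16384)]error = 5\n"
    )
    for (dev, op, errno), count in summarize(parse_failures(sample)).items():
        print(f"{dev} {op} errno={errno}: {count} failure(s)")
```

To run it against a real log, read the file contents and pass them to
parse_failures() in place of the sample string.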