From: "Jan Mikkelsen"
To: "'Ian West'"
Date: Mon, 8 Jan 2007 10:08:33 +1100
Subject: RE: kernel panic on 6.2-RC2 with GENERIC.
Message-ID: <001a01c732b0$c22f4860$0204a8c0@transactionware.com>
In-Reply-To: <20070107213350.GA61293@aleph.niw.com.au>

(Scott: I should have emailed you this earlier, but Christmas and various
other things got in the way.)

Ian West wrote:
> On Sun, Jan 07, 2007 at 02:25:02PM -0500, Mike Tancsa wrote:
> > At 11:43 AM 1/7/2007, Craig Rodrigues wrote:
> > >On Fri, Jan 05, 2007 at 06:59:10PM +0200, Nikolay Pavlov wrote:
>>> [ Areca kernel panic, IO failures ... ]
> I have seen this identical fault with the new areca driver, my machine
> is opteron hardware, but running a regular i386/SMP kernel/world. With
> everything at 6.2RC2 (as of 29th of December) except the areca driver
> the machine is rock solid, with the 29th of december version of the
> areca driver the box will crash on extract of a large tar file, removal
> of a large directory structure, or pretty much anything that does a lot
> of disk io to different files/locations. There is no error log prior to
> seeing the following messages..
>
> Dec 29 14:26:44 aleph kernel: g_vfs_done():da0s1g[WRITE(offset=433078272, length=8192)]error = 5
> Dec 29 14:26:44 aleph kernel: g_vfs_done():da0s1g[WRITE(offset=433111040, length=16384)]error = 5
> Dec 29 14:26:44 aleph kernel: g_vfs_done():da0s1g[WRITE(offset=433209344, length=16384)]error = 5
> Dec 29 14:26:44 aleph kernel: g_vfs_done():da0s1g[WRITE(offset=433242112, length=32768)]error = 5
> Dec 29 14:26:44 aleph kernel: g_vfs_done():da0s1g[WRITE(offset=437612544, length=4096)]error = 5
> Dec 29 14:26:44 aleph kernel: g_vfs_done():da0s1g[WRITE(offset=437616640, length=12288)]error = 5
> Dec 29 14:26:44 aleph kernel: g_vfs_done():da0s1g[WRITE(offset=437633024, length=6144)]error = 5
> Dec 29 14:26:44 aleph kernel: g_vfs_done():da0s1g[WRITE(offset=437639168, length=2048)]error = 5
> Dec 29 14:26:44 aleph kernel: g_vfs_done():da0s1g[WRITE(offset=437641216, length=6144)]error = 5
>
> There are a string of these, followed by a crash and reboot. The file system
> state can be left very dirty to the point where background fsck seems unable
> to recover it.
>
> The areca card in question is running the latest firmware/boot and
> has shown no problems either before, or since backing out the areca
> driver.
>
> The volume is ran the tests on was a 250G on a raid6 raid set.

I have seen various problems with various Areca drivers, all on
6.2-RC1/amd64 with an Areca RAID-6 volume.

Areca 1.20.00.02 seems to work fine.

Areca 1.20.00.12 (from the Areca website) seems to have data corruption
problems. My tests involve doing a "diff -r" on a filesystem with 2GB of
data. It will occasionally find differences in files. On examination, the
last 640 bytes of the first block of the affected file contain data from
another file "nearby" in the filesystem. Unmounting and remounting the
filesystems and rerunning the test shows either no problem or a difference
in another file entirely. I think this is the cause of the g_vfs_done
failures with this version of the driver; the offsets are wrong because the
data is corrupted.

Areca 1.20.00.13 (as currently in the tree) does not seem to have data
corruption problems, but I can trigger g_vfs_done failures under heavy I/O.

I have raised this with Areca support, and I'm waiting to hear back from
Erich Chen.

Regards,

Jan Mikkelsen
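
For anyone wanting to reproduce the comparison test described above, a plain "diff -r" says which files differ but not where. The following is a minimal Python sketch, not the procedure actually used in the tests above, that walks two copies of a tree and reports the byte ranges that differ in each file, which makes a pattern such as "the last 640 bytes of the first block" easy to spot. The function names, the read size, and the command-line form are all assumptions made for illustration.

```python
import os
import sys

CHUNK = 64 * 1024  # read size; an arbitrary choice for this sketch


def diff_ranges(path_a, path_b):
    """Return (start, end) byte ranges, end exclusive, where two files differ.

    If one file is longer, the extra tail is reported as a differing range.
    """
    ranges = []
    start = None  # start offset of the differing run currently open, if any
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(CHUNK)
            b = fb.read(CHUNK)
            if not a and not b:
                break
            n = max(len(a), len(b))
            if a == b:
                # Whole chunk matches: close any run left open by the last chunk.
                if start is not None:
                    ranges.append((start, offset))
                    start = None
            else:
                for i in range(n):
                    same = i < len(a) and i < len(b) and a[i] == b[i]
                    if not same and start is None:
                        start = offset + i
                    elif same and start is not None:
                        ranges.append((start, offset + i))
                        start = None
            offset += n
    if start is not None:
        ranges.append((start, offset))
    return ranges


def compare_trees(root_a, root_b):
    """Yield (relative_path, ranges) for files in root_a that differ in root_b.

    ranges is None when the file is missing from root_b.
    """
    for dirpath, _dirs, files in os.walk(root_a):
        for name in sorted(files):
            path_a = os.path.join(dirpath, name)
            rel = os.path.relpath(path_a, root_a)
            path_b = os.path.join(root_b, rel)
            if not os.path.isfile(path_b):
                yield rel, None
                continue
            ranges = diff_ranges(path_a, path_b)
            if ranges:
                yield rel, ranges


if __name__ == "__main__" and len(sys.argv) == 3:
    for rel, ranges in compare_trees(sys.argv[1], sys.argv[2]):
        print(rel, "missing" if ranges is None else ranges)
```

Run with the reference copy and the copy on the suspect volume as the two arguments. Since the corruption described above changed or disappeared after an unmount and remount, comparing before and after a remount may help separate on-disk damage from bad reads.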