From: Johan Ström
Date: Thu, 13 Jul 2006 12:10:30 +0200
To: freebsd-stable@freebsd.org
Subject: Re: GEOM problems again...

On 10 Jul 2006, at 13.59, Johan Ström wrote:

>
> On 10 Jul 2006, at 11.09, Johan Ström wrote:
>
>>
>> On 21 May 2006, at 11.16, Johan Ström wrote:
>>
>>> Hi
>>>
>>> I've had problems before with GEOM mirror and my SATA drives, and
>>> I've posted about it here before too. The solution seemed to be a
>>> change of motherboard, which greatly reduced the crashes (and the
>>> speeds achieved also improved a lot, from 10-15MB/s to
>>> 40-50MB/s).
>>> After the change I still had one or two crashes, but then it ran
>>> for well over 50-60 days without any problems.
>>> Then, 11 days ago, I upgraded to 6.1... And now I get these
>>> "crashes" again (the mirror crashes, that is; the system still
>>> runs fine):
>>>
>>> May 21 02:04:58 elfi kernel: ad6: FAILURE - device detached
>>> May 21 02:04:58 elfi kernel: subdisk6: detached
>>> May 21 02:04:58 elfi kernel: ad6: detached
>>> May 21 02:04:58 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad6s1 disconnected.
>>> May 21 02:04:58 elfi kernel: g_vfs_done():mirror/gm0s1f[READ(offset=11006308352, length=2048)]error = 6
>>> May 21 02:04:58 elfi kernel: g_vfs_done():mirror/gm0s1f[READ(offset=164847927296, length=131072)]error = 6
>>> May 21 02:04:58 elfi kernel: g_vfs_done():mirror/gm0s1f[READ(offset=256680296448, length=32768)]error = 6
>>>
>>>
>>> Some info about the controller and disks:
>>>
>>> May 9 22:46:52 elfi kernel: ata1: on atapci0
>>> May 9 22:46:52 elfi kernel: atapci1:
>> controller> port 0xec00-0xec07,0xe880-0xe883,0xe800-0xe807,0xe480-0xe483,0x7f00-0x7f0f,0x7c00-0x7c7f irq 22 at device 11.0 on pci0
>>>
>>> May 9 22:46:52 elfi kernel: ad4: 286188MB
>> BANC1G10> at ata2-master SATA150
>>> May 9 22:46:52 elfi kernel: ad6: 286188MB
>> BANC1G10> at ata3-master SATA150
>>> May 9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1 created (id=4118114647).
>>> May 9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad4s1 detected.
>>> May 9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad6s1 detected.
>>> May 9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad6s1 activated.
>>> May 9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad4s1 activated.
>>> May 9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider mirror/gm0s1 launched.
>>> May 9 22:46:52 elfi kernel: Trying to mount root from ufs:/dev/mirror/gm0s1a
>>>
>>> Anyone got any new clues? AFAIK the disks should be working fine
>>> (they are 6 months old and this same problem has occurred multiple
>>> times...)
>>>
>>> Hope to solve this ;)
>>>
>>> Thanks
>>> Johan
>>>
>>
>> Here we go again
>>
>> Jul 7 16:20:09 elfi kernel: ad4: FAILURE - device detached
>> Jul 7 16:20:09 elfi kernel: subdisk4: detached
>> Jul 7 16:20:09 elfi kernel: ad4: detached
>> Jul 7 16:20:09 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad4s1 disconnected.
>> Jul 7 16:20:09 elfi kernel: g_vfs_done():mirror/gm0s1f[READ(offset=88896847872, length=32768)]error = 6
>>
>> However, no read timeouts etc. as before, just this. 18 days
>> uptime this time (I've rebooted for other reasons since my last
>> mail). It always seems to be ad4 that is disconnecting. I'm going
>> to run some disk tests on it, but I doubt they will show anything,
>> since I've had similar problems from day one (I ran tests at that
>> time without problems) with this gmirror setup (new disks).
>>
>> Johan
>
> Follow-up: I ran over the disk with Maxtor's own test program, full
> length test. Not a single problem.
> After reboot the RAID is rebuilding fine:
>
> GEOM_MIRROR: Device gm0s1: rebuilding provider ad4s1.
>
> As usual, it seems I cannot get the controller/driver to redetect
> the disk using atacontrol etc.
>
> Johan

And now again... the RAID went degraded only 2 days after the reboot!

Jul 12 22:22:50 elfi kernel: ad4: FAILURE - device detached
Jul 12 22:22:50 elfi kernel: subdisk4: detached
Jul 12 22:22:50 elfi kernel: ad4: detached
Jul 12 22:22:50 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad4s1 disconnected.
Jul 12 22:22:50 elfi kernel: g_vfs_done():mirror/gm0s1f[READ(offset=120776474624, length=32768)]error = 6

$ uname -a
FreeBSD elfi.stromnet.org 6.1-RELEASE FreeBSD 6.1-RELEASE #3: Tue May 9 20:40:23 CEST 2006 johan@elfi.stromnet.org:/usr/obj/usr/src/sys/GENERIC i386

Still no luck with atacontrol...

Is there any way to debug this further? I've tested the disk, and the
SATA cables are new. I've had similar problems with another
motherboard. I don't think this is related to hardware problems, but
rather a software problem that needs to be solved; this is not
something one can call stable ;)

So, any pointers on how to enable more debugging, or anything else that
could give some clues?

Johan
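
To keep track of which disk drops out of the mirror, the "FAILURE - device detached" lines quoted in this thread can be tallied per device. The following is only a sketch: the function name `count_detaches` and the regular expression are illustrative assumptions, not part of any FreeBSD tool, and the sample lines are the three detach events copied from the log excerpts above.

```python
import re
from collections import Counter

# Matches kernel lines such as "... kernel: ad4: FAILURE - device detached".
# The pattern is an assumption based only on the log excerpts in this thread.
DETACH_RE = re.compile(r"kernel: (ad\d+): FAILURE - device detached")


def count_detaches(lines):
    """Return a Counter mapping device name to number of detach events."""
    counts = Counter()
    for line in lines:
        match = DETACH_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Detach events copied from the messages quoted in this thread.
SAMPLE = [
    "May 21 02:04:58 elfi kernel: ad6: FAILURE - device detached",
    "Jul 7 16:20:09 elfi kernel: ad4: FAILURE - device detached",
    "Jul 12 22:22:50 elfi kernel: ad4: FAILURE - device detached",
]

if __name__ == "__main__":
    # Print devices ordered by how often they detached.
    for device, events in count_detaches(SAMPLE).most_common():
        print(f"{device}: {events}")
```

Feeding it the lines of a full /var/log/messages file instead of `SAMPLE` would work the same way, assuming the kernel messages there have the same shape as the excerpts.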