Date: Thu, 13 Jul 2006 12:10:30 +0200
From: Johan Ström <johan@stromnet.org>
To: freebsd-stable@freebsd.org
Subject: Re: GEOM problems again...
Message-ID: <36895211-2796-4213-B336-6279AB3AC3CB@stromnet.org>
In-Reply-To: <884C01BC-3E97-46EC-AA8B-E70C3931F3A4@stromnet.org>
References: <DAFCD4DC-D2D4-4574-ACBF-367D642D9729@stromnet.org> <8D08DDB6-6AC1-45B6-B2CE-08782F54968A@stromnet.org> <884C01BC-3E97-46EC-AA8B-E70C3931F3A4@stromnet.org>

On 10 Jul 2006, at 13.59, Johan Ström wrote:

> On 10 Jul 2006, at 11.09, Johan Ström wrote:
>
>> On 21 May 2006, at 11.16, Johan Ström wrote:
>>
>>> Hi
>>>
>>> I've had problems before with GEOM mirror and my SATA drives, and
>>> I've posted about it here before too. The solution seemed to be a
>>> change of motherboard; this reduced the crashes considerably (and
>>> the speeds achieved also improved greatly, from 10-15MB/s to
>>> 40-50MB/s).
>>> However, after the change I had one or two crashes, but now it has
>>> been running for well over 50-60 days without any problems.
>>> Then, 11 days ago, I upgraded to 6.1... And now I get these
>>> "crashes" again (that is, the mirror crashes; the system still
>>> runs fine):
>>>
>>> May 21 02:04:58 elfi kernel: ad6: FAILURE - device detached
>>> May 21 02:04:58 elfi kernel: subdisk6: detached
>>> May 21 02:04:58 elfi kernel: ad6: detached
>>> May 21 02:04:58 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad6s1 disconnected.
>>> May 21 02:04:58 elfi kernel: g_vfs_done():mirror/gm0s1f[READ(offset=11006308352, length=2048)]error = 6
>>> May 21 02:04:58 elfi kernel: g_vfs_done():mirror/gm0s1f[READ(offset=164847927296, length=131072)]error = 6
>>> May 21 02:04:58 elfi kernel: g_vfs_done():mirror/gm0s1f[READ(offset=256680296448, length=32768)]error = 6
>>>
>>> Some info about the controller and disks:
>>>
>>> May 9 22:46:52 elfi kernel: ata1: <ATA channel 1> on atapci0
>>> May 9 22:46:52 elfi kernel: atapci1: <nVidia nForce2 Pro SATA150 controller> port 0xec00-0xec07,0xe880-0xe883,0xe800-0xe807,0xe480-0xe483,0x7f00-0x7f0f,0x7c00-0x7c7f irq 22 at device 11.0 on pci0
>>>
>>> May 9 22:46:52 elfi kernel: ad4: 286188MB <Maxtor 7L300S0 BANC1G10> at ata2-master SATA150
>>> May 9 22:46:52 elfi kernel: ad6: 286188MB <Maxtor 7L300S0 BANC1G10> at ata3-master SATA150
>>> May 9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1 created (id=4118114647).
>>> May 9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad4s1 detected.
>>> May 9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad6s1 detected.
>>> May 9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad6s1 activated.
>>> May 9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad4s1 activated.
>>> May 9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider mirror/gm0s1 launched.
>>> May 9 22:46:52 elfi kernel: Trying to mount root from ufs:/dev/mirror/gm0s1a
>>>
>>> Anyone got any new clues? AFAIK the disks should be working fine
>>> (they are 6 months old and this same problem has occurred multiple
>>> times...)
>>>
>>> Hope to solve this ;)
>>>
>>> Thanks
>>> Johan
>>
>> Here we go again
>>
>> Jul 7 16:20:09 elfi kernel: ad4: FAILURE - device detached
>> Jul 7 16:20:09 elfi kernel: subdisk4: detached
>> Jul 7 16:20:09 elfi kernel: ad4: detached
>> Jul 7 16:20:09 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad4s1 disconnected.
>> Jul 7 16:20:09 elfi kernel: g_vfs_done():mirror/gm0s1f[READ(offset=88896847872, length=32768)]error = 6
>>
>> However, no read timeouts etc. as before, just this. 18 days
>> uptime this time (I've rebooted for other reasons since the last
>> mail). It always seems to be ad4 that is disconnecting. I'm going
>> to do some disk tests on it, but I doubt they will show anything,
>> since I've had similar problems from day one (I ran tests at that
>> time without problems) with this gmirror setup (new disks).
>>
>> Johan
>
> Followup: I ran over the disk with Maxtor's own test program, full
> length test. Not a single problem.
> After reboot the raid is rebuilding fine:
>
> GEOM_MIRROR: Device gm0s1: rebuilding provider ad4s1.
>
> As usual, it seems I cannot get the controller/driver to redetect
> the disk using atacontrol etc.
>
> Johan

And now again... raid gone degraded only 2 days after reboot!

Jul 12 22:22:50 elfi kernel: ad4: FAILURE - device detached
Jul 12 22:22:50 elfi kernel: subdisk4: detached
Jul 12 22:22:50 elfi kernel: ad4: detached
Jul 12 22:22:50 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad4s1 disconnected.
Jul 12 22:22:50 elfi kernel: g_vfs_done():mirror/gm0s1f[READ(offset=120776474624, length=32768)]error = 6

$ uname -a
FreeBSD elfi.stromnet.org 6.1-RELEASE FreeBSD 6.1-RELEASE #3: Tue May 9 20:40:23 CEST 2006 johan@elfi.stromnet.org:/usr/obj/usr/src/sys/GENERIC i386

Still no luck with atacontrol...

Is there any way to debug this further? I've tested the disk, the
SATA cables are new... I've had similar problems with another
motherboard...
I don't think this is related to hardware problems, but rather a
software problem that needs to be solved; this is not something one
can call stable ;)

So, any pointers on how to enable more debugging, or anything else
that could give some clues?

Johan
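
For anyone following the thread, the detach events reported above can be tallied per device with a small script. This is only a sketch: the regular expression is an assumption modelled on the log lines quoted in this message, and `DETACH_RE`, `detach_events` and `SAMPLE` are hypothetical names, not part of any FreeBSD tool.

```python
import re
from collections import Counter

# Assumed line shape, taken from the messages quoted in this thread:
#   "Jul 12 22:22:50 elfi kernel: ad4: FAILURE - device detached"
DETACH_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+kernel:\s+"
    r"(?P<dev>ad\d+): FAILURE - device detached"
)


def detach_events(lines):
    """Return (timestamp, device) for every 'device detached' failure line."""
    events = []
    for line in lines:
        match = DETACH_RE.match(line.strip())
        if match:
            events.append((match.group("stamp"), match.group("dev")))
    return events


# Sample input copied from the log excerpts in this thread.
SAMPLE = [
    "May 21 02:04:58 elfi kernel: ad6: FAILURE - device detached",
    "May 21 02:04:58 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad6s1 disconnected.",
    "Jul 7 16:20:09 elfi kernel: ad4: FAILURE - device detached",
    "Jul 12 22:22:50 elfi kernel: ad4: FAILURE - device detached",
]

events = detach_events(SAMPLE)
counts = Counter(dev for _, dev in events)

for stamp, dev in events:
    print(f"{stamp}  {dev}")
for dev, n in sorted(counts.items()):
    print(f"{dev}: {n} detach event(s)")
```

On a live system the same function could be fed the lines of the syslog file (commonly `/var/log/messages` on FreeBSD, though that path is an assumption here) to see whether the failures cluster on one disk or channel.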