From: "Cory Marsh"
To: freebsd-geom@freebsd.org
Date: Wed, 6 Jun 2007 08:58:24 -0600
Subject: Gmirror, broken ggate locks system
List-Id: GEOM-specific discussions and implementations

I am experiencing a gmirror issue on my mirrored partitions. These partitions work great replicating data over a gmirror interface to another machine. Everything goes just fine until the ggate interface in the gmirror goes down (backup machine reboot, network problem, etc.). At that point, the machine with the gmirror locks up. Any process that is currently running will continue to run, so long as it does not access the disk in any way. As soon as a disk request happens, that process locks hard. This forces me to shut down the machine ungracefully.

Is this the expected behavior? Shouldn't gmirror detect the stale (unresponsive) component and deactivate it? Is it a problem because my primary consumer is the ggate device? Is there a better configuration to achieve the same result?

Any ideas/suggestions would be appreciated. Thanks!

-Cory

%uname -a
FreeBSD cwanfs1.arbfund.com 6.2-RELEASE FreeBSD 6.2-RELEASE #0: Fri Jan 12 23:34:43 MST 2007 root@:/usr/obj/usr/src/sys/GENERIC amd64

%gmirror list data
Geom name: data
State: COMPLETE
Components: 2
Balance: prefer
Slice: 4096
Flags: NOAUTOSYNC
GenID: 2
SyncID: 21
ID: 1381569007
Providers:
1. Name: mirror/data
   Mediasize: 10737417728 (10G)
   Sectorsize: 512
   Mode: r1w1e1
Consumers:
1. Name: ggate0
   Mediasize: 10737418240 (10G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 1
   Flags: NONE
   GenID: 2
   SyncID: 21
   ID: 1578386556
2. Name: ar0s1g
   Mediasize: 10737418240 (10G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 100
   Flags: NONE
   GenID: 2
   SyncID: 21
   ID: 1982490913

Info about the problem that locked the machine: I got these messages for 20 minutes, about 100 of them, before I noticed the machine was down. It could have been locked since the first message; I only noticed after 20 minutes. It looks like a network card issue disconnected the ggate devices and then the machine locked.

/var/log/messages:
...
Jun 5 17:10:15 cwanfs1 kernel: nfe0: watchdog timeout (missed Tx interrupts) -- recovering
Jun 5 17:10:28 cwanfs1 ggatec: Lost connection 1.
Jun 5 17:10:28 cwanfs1 ggatec: Disconnected [10.10.10.2 /dev/ar0s1g]. Connecting...
Jun 5 17:10:59 cwanfs1 kernel: nfe0: watchdog timeout (missed Tx interrupts) -- recovering
...

%dmesg
Copyright (c) 1992-2007 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
  The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 6.2-RELEASE #0: Fri Jan 12 23:34:43 MST 2007
  root@:/usr/obj/usr/src/sys/GENERIC
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Athlon(tm) 64 X2 Dual Core Processor 5600+ (2800.13-MHz K8-class CPU)
  Origin = "AuthenticAMD" Id = 0x40f33 Stepping = 3
  Features=0x178bfbff
  Features2=0x2001
  AMD Features=0xea500800
  AMD Features2=0x1f,,CR8>
  Cores per package: 2
real memory = 2147287040 (2047 MB)
avail memory = 2065846272 (1970 MB)
ACPI APIC Table:
ioapic0 irqs 0-23 on motherboard
kbd1 at kbdmux0
acpi0: on motherboard
acpi0: Power Button (fixed)
acpi0: reservation of fec00000, 1000 (3) failed
acpi0: reservation of fee00000, 1000 (3) failed
Timecounter "ACPI-safe" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x2008-0x200b on acpi0
cpu0: on acpi0
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
pci0: at device 0.0 (no driver attached)
isab0: at device 1.0 on pci0
isa0: on isab0
pci0: at device 1.1 (no driver attached)
ohci0: mem 0xfeaf7000-0xfeaf7fff irq 21 at device 2.0 on pci0
ohci0: [GIANT-LOCKED]
usb0: OHCI version 1.0, legacy support
usb0: SMM does not respond, resetting
usb0: on ohci0
usb0: USB revision 1.0
uhub0: nVidia OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 10 ports with 10 removable, self powered
ehci0: mem 0xfeaf6c00-0xfeaf6cff irq 22 at device 2.1 on pci0
ehci0: [GIANT-LOCKED]
usb1: EHCI version 1.0
usb1: companion controller, 10 ports each: usb0
usb1: on ehci0
usb1: USB revision 2.0
uhub1: nVidia EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub1: 10 ports with 10 removable, self powered
atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 4.0 on pci0
ata0: on atapci0
ata1: on atapci0
atapci1: port 0xd480-0xd487,0xd400-0xd403,0xd080-0xd087,0xd000-0xd003,0xcc00-0xcc0f mem 0xfeaf5000-0xfeaf5fff irq 23 at device 5.0 on pci0
ata2: on atapci1
ata3: on atapci1
atapci2: port 0xc880-0xc887,0xc800-0xc803,0xc480-0xc487,0xc400-0xc403,0xc080-0xc08f mem 0xfeaf4000-0xfeaf4fff irq 20 at device 5.1 on pci0
ata4: on atapci2
ata5: on atapci2
atapci3: port 0xc000-0xc007,0xbc00-0xbc03,0xb880-0xb887,0xb800-0xb803,0xb480-0xb48f mem 0xfeaf3000-0xfeaf3fff irq 21 at device 5.2 on pci0
ata6: on atapci3
ata7: on atapci3
pcib1: at device 6.0 on pci0
pci1: on pcib1
pci1: at device 10.0 (no driver attached)
nfe0: port 0xb400-0xb407 mem 0xfeaf2000-0xfeaf2fff,0xfeaf6800-0xfeaf68ff,0xfeaf6400-0xfeaf640f irq 22 at device 8.0 on pci0
miibus0: on nfe0
ukphy0: on miibus0
ukphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
nfe0: Ethernet address: 00:e0:81:75:4d:fc
nfe0: [FAST]
nfe1: port 0xb080-0xb087 mem 0xfeaf1000-0xfeaf1fff,0xfeaf6000-0xfeaf60ff,0xfeaf0c00-0xfeaf0c0f irq 23 at device 9.0 on pci0
miibus1: on nfe1
ukphy1: on miibus1
ukphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
nfe1: Ethernet address: 00:e0:81:75:4d:fd
nfe1: [FAST]
pcib2: at device 10.0 on pci0
pci2: on pcib2
pcib3: at device 11.0 on pci0
pci3: on pcib3
pcib4: at device 12.0 on pci0
pci4: on pcib4
pcib5: at device 13.0 on pci0
pci5: on pcib5
pcib6: at device 14.0 on pci0
pci6: on pcib6
pcib7: at device 15.0 on pci0
pci7: on pcib7
acpi_button0: on acpi0
atkbdc0: port 0x60,0x64 irq 1 on acpi0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model IntelliMouse Explorer, device ID 4
orm0: at iomem 0xc0000-0xc7fff,0xc8000-0xc97ff on isa0
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounter "TSC" frequency 2800129550 Hz quality 800
Timecounters tick every 1.000 msec
acd0: CDROM at ata0-slave UDMA33
ad4: 305245MB at ata2-master SATA300
ad6: 305245MB at ata3-master SATA300
ar0: 305245MB status: READY
ar0: disk0 READY (master) using ad4 at ata2-master
ar0: disk1 READY (mirror) using ad6 at ata3-master