From owner-freebsd-geom@FreeBSD.ORG Sun Jul 30 13:18:45 2006
Return-Path:
X-Original-To: freebsd-geom@freebsd.org
Delivered-To: freebsd-geom@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id DB84B16A4DA for ; Sun, 30 Jul 2006 13:18:45 +0000 (UTC) (envelope-from brenthostetler@gmail.com)
Received: from py-out-1112.google.com (py-out-1112.google.com [64.233.166.178]) by mx1.FreeBSD.org (Postfix) with ESMTP id E0A9443D46 for ; Sun, 30 Jul 2006 13:18:44 +0000 (GMT) (envelope-from brenthostetler@gmail.com)
Received: by py-out-1112.google.com with SMTP id b36so264877pyb for ; Sun, 30 Jul 2006 06:18:44 -0700 (PDT)
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition; b=HL8aIRwU1Voew4XadaUAR/f1OXjQ6rhQSQWONL/31ILA75Ts6AVTU1SLPAjSPYkRlK5HGCK8yYVmP5acS3s25H/6C7p8fzQgqQq8jOxMtBI7Y6PUnoXaxPPBaOeCjNgIuPRDwfom9lp0L52x6Z64q5W0iYyJ7LG9h5Fx46QC8PA=
Received: by 10.35.11.15 with SMTP id o15mr2332474pyi; Sun, 30 Jul 2006 06:18:44 -0700 (PDT)
Received: by 10.35.128.2 with HTTP; Sun, 30 Jul 2006 06:18:44 -0700 (PDT)
Message-ID:
Date: Sun, 30 Jul 2006 06:18:44 -0700
From: "Brent Hostetler"
To: freebsd-geom@freebsd.org, freebsd-questions@freebsd.org
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Cc:
Subject: gmirror/gconcat: mkdir causes system reboot
X-BeenThere: freebsd-geom@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: GEOM-specific discussions and implementations
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 30 Jul 2006 13:18:46 -0000

I am having a strange issue. I have a samba server (FreeBSD) that has been
running fine for quite some time, with no errors to report.
I replaced the system drives with a fresh install of FreeBSD 6.1 and
updated to the current security branch. This is the same version of
FreeBSD that was previously on the server. All of the samba shares are on
a gmirror/gconcat hybrid mount point: '/dev/concat/DATA' mounted on
/usr/local/smbshares. Now, for some unknown reason, creating a directory
on this file system will immediately cause a reboot!!

At the shell prompt I can SOMETIMES do the following; other times it
reboots:

$ mkdir /usr/local/smbshares/testdir
$ mkdir /usr/local/smbshares/test2

However, creating a directory beneath a directory in 'smbshares' ALWAYS
reboots:

$ mkdir /usr/local/smbshares/media/pictures/testdir
$ mkdir /usr/local/smbshares/media/dvds/testdir
$ mkdir /usr/local/smbshares/media/dvds/all/testdir

Reads seem to work fine. I can even create files, so far with no problem.
Files can be deleted without error. It is only when I try to make a
directory that everything comes to a halt.

The console error displayed before the reboot scrolls by too quickly to
copy down completely, but it is something like:

mode 04277 inum=12258433 fs=/usr/local/smbshares
panic: ffs_valloc: dup alloc
.... snip ...
All the providers are destroyed...
Cannot dump: No dump device

No apparent errors in logs.

-----------------------------
Further system info.

$ uname -a
FreeBSD quiet.silent 6.1-RELEASE-p3 FreeBSD 6.1-RELEASE-p3 #0: Sun Jul 30 05:02:15 PDT 2006 root@quiet.silent:/usr/obj/usr/src/sys/GENERIC i386

$ gmirror status
       Name    Status  Components
mirror/ROOT  COMPLETE  ad0s1
                       ad2s1
  mirror/D2  COMPLETE  ad4s1
                       ad16s1
  mirror/D4  COMPLETE  ad6s1
                       ad8s1
  mirror/D1  COMPLETE  ad10s1
                       ad12s1
  mirror/D3  COMPLETE  ad14s1
                       ad18s1

$ gconcat status
       Name  Status  Components
concat/DATA      UP  mirror/D4
                     mirror/D1
                     mirror/D2
                     mirror/D3

$ cat /etc/fstab
# Device             Mountpoint            FStype  Options    Dump  Pass#
/dev/mirror/ROOTb    none                  swap    sw         0     0
/dev/mirror/ROOTa    /                     ufs     rw         1     1
/dev/mirror/ROOTe    /tmp                  ufs     rw         2     2
/dev/mirror/ROOTf    /usr                  ufs     rw         2     2
/dev/mirror/ROOTd    /var                  ufs     rw         2     2
/dev/concat/DATA     /usr/local/smbshares  ufs     rw         2     2
/dev/acd0            /cdrom                cd9660  ro,noauto  0     0

$ df -h
Filesystem           Size    Used   Avail  Capacity  Mounted on
/dev/mirror/ROOTa    959M     58M    824M      7%    /
devfs                1.0K    1.0K      0B    100%    /dev
/dev/mirror/ROOTe    4.9G     24K    4.5G      0%    /tmp
/dev/mirror/ROOTf     98G    8.4G     82G      9%    /usr
/dev/mirror/ROOTd    4.9G    123M    4.4G      3%    /var
/dev/concat/DATA     1.2T    794G    338G     70%    /usr/local/smbshares

$ dmesg
Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 6.1-RELEASE-p3 #0: Sun Jul 30 05:02:15 PDT 2006
root@quiet.silent:/usr/obj/usr/src/sys/GENERIC
mptable_probe: MP Config Table has bad signature: 4 C
ACPI APIC Table:
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Unknown CPU Type (1603.65-MHz 686-class CPU)
Origin = "AuthenticAMD" Id = 0x681 Stepping = 1
Features=0x383fbff
AMD Features=0xc0400800
real memory = 268369920 (255 MB)
avail memory = 253112320 (241 MB)
ioapic0 irqs 0-23 on motherboard
kbd1 at kbdmux0
acpi0: on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x4008-0x400b on acpi0
cpu0: on acpi0
acpi_button0: on acpi0
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
agp0: mem 0xe8000000-0xe9ffffff at device 0.0 on pci0
pci0: at device 0.1 (no driver attached)
pci0: at device 0.2 (no driver attached)
pci0: at device 0.3 (no driver attached)
pci0: at device 0.4 (no driver attached)
pci0: at device 0.5 (no driver attached)
isab0: at device 1.0 on pci0
isa0: on isab0
pci0: at device 1.1 (no driver attached)
ohci0: mem 0xee080000-0xee080fff irq 20 at device 2.0 on pci0
ohci0: [GIANT-LOCKED]
usb0: OHCI version 1.0, legacy support
usb0: SMM does not respond, resetting
usb0: on ohci0
usb0: USB revision 1.0
uhub0: nVidia OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 3 ports with 3 removable, self powered
ohci1: mem 0xee083000-0xee083fff at device 2.1 on pci0
ohci1: [GIANT-LOCKED]
usb1: OHCI version 1.0, legacy support
usb1: SMM does not respond, resetting
usb1: on ohci1
usb1: USB revision 1.0
uhub1: nVidia OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 3 ports with 3 removable, self powered
ehci0: mem 0xee084000-0xee0840ff irq 21 at device 2.2 on pci0
ehci0: [GIANT-LOCKED]
usb2: EHCI version 1.0
usb2: companion controllers, 4 ports each: usb0 usb1
usb2: on ehci0
usb2: USB revision 2.0
uhub2: nVidia EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub2: 6 ports with 6 removable, self powered
pci0: at device 5.0 (no driver attached)
pci0: at device 6.0 (no driver attached)
pcib1: at device 8.0 on pci0
pci1: on pcib1
atapci0: port 0x9000-0x9007,0x9400-0x9403,0x9800-0x9807,0x9c00-0x9c03,0xa000-0xa00f mem 0xeb0a0000-0xeb0a3fff irq 16 at device 8.0 on pci1
ata2: on atapci0
ata3: on atapci0
pci1: at device 9.0 (no driver attached)
atapci1: port 0xa400-0xa407,0xa800-0xa803,0xac00-0xac07,0xb000-0xb003,0xb400-0xb40f irq 18 at device 10.0 on pci1
ata4: on atapci1
ata5: on atapci1
atapci2: port 0xb800-0xb87f,0xbc00-0xbcff mem 0xeb0a4000-0xeb0a4fff,0xeb080000-0xeb09ffff irq 19 at device 11.0 on pci1
ata6: on atapci2
ata7: on atapci2
ata8: on atapci2
ata9: on atapci2
atapci3: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f at device 9.0 on pci0
ata0: on atapci3
ata1: on atapci3
pcib2: at device 12.0 on pci0
pci2: on pcib2
xl0: <3Com 3c920B-EMB Integrated Fast Etherlink XL> port 0xc000-0xc07f mem 0xed000000-0xed00007f irq 20 at device 1.0 on pci2
miibus0: on xl0
acphy0: on miibus0
acphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
xl0: Ethernet address: 00:26:54:0b:50:df
fwohci0: <1394 Open Host Controller Interface> mem 0xee086000-0xee0867ff,0xee087000-0xee08703f irq 21 at device 13.0 on pci0
fwohci0: OHCI version 1.10 (ROM=0)
fwohci0: No. of Isochronous channels is 4.
fwohci0: EUI64 00:40:ca:07:01:03:77:bd
fwohci0: Phy 1394a available S400, 3 ports.
fwohci0: Link S400, max_rec 2048 bytes.
firewire0: on fwohci0
fwe0: on firewire0
if_fwe0: Fake Ethernet address: 02:40:ca:03:77:bd
fwe0: Ethernet address: 02:40:ca:03:77:bd
fwe0: if_start running deferred for Giant
sbp0: on firewire0
fwohci0: Initiate bus reset
fwohci0: node_id=0xc800ffc0, gen=1, CYCLEMASTER mode
firewire0: 1 nodes, maxhop <= 0, cable IRM = 0 (me)
firewire0: bus manager 0 (me)
pcib3: at device 30.0 on pci0
pci3: on pcib3
acpi_tz0: on acpi0
fdc0: port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FAST]
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
ppc0: port 0x378-0x37f,0x778-0x77b irq 7 drq 3 on acpi0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/9 bytes threshold
ppbus0: on ppc0
plip0: on ppbus0
lpt0: on ppbus0
lpt0: Interrupt-driven port
ppi0: on ppbus0
atkbdc0: port 0x60,0x64 irq 1 on acpi0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
pmtimer0 on isa0
orm0: at iomem 0xc0000-0xcefff,0xd0000-0xd3fff,0xd4000-0xd67ff,0xd7000-0xd97ff,0xda000-0xdefff on isa0
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounter "TSC" frequency 1603648052 Hz quality 800
Timecounters tick every 1.000 msec
ad0: 114473MB at ata0-master UDMA100
GEOM_MIRROR: Device ROOT created (id=2656088342).
GEOM_MIRROR: Device ROOT: provider ad0s1 detected.
ad2: 114473MB at ata1-master UDMA100
GEOM_MIRROR: Device ROOT: provider ad2s1 detected.
GEOM_MIRROR: Device ROOT: provider ad2s1 activated.
GEOM_MIRROR: Device ROOT: provider ad0s1 activated.
GEOM_MIRROR: Device ROOT: provider mirror/ROOT launched.
ad4: 476940MB at ata2-master UDMA100
ad6: 152627MB at ata3-master UDMA100
ad8: 152627MB at ata4-master UDMA100
ad10: 476940MB at ata5-master UDMA100
ad12: 476940MB at ata6-master SATA300
ad14: 194481MB at ata7-master SATA150
ad16: 476940MB at ata8-master SATA300
ad18: 194481MB at ata9-master SATA150
GEOM_MIRROR: Device D2 created (id=2018613835).
GEOM_MIRROR: Device D2: provider ad4s1 detected.
GEOM_MIRROR: Device D4 created (id=1140042297).
GEOM_MIRROR: Device D4: provider ad6s1 detected.
GEOM_MIRROR: Device D4: provider ad8s1 detected.
GEOM_MIRROR: Device D4: provider ad8s1 activated.
GEOM_MIRROR: Device D4: provider ad6s1 activated.
GEOM_MIRROR: Device D4: provider mirror/D4 launched.
GEOM_MIRROR: Device D1 created (id=2442871321).
GEOM_MIRROR: Device D1: provider ad10s1 detected.
GEOM_CONCAT: Device DATA created (id=2233628062).
GEOM_CONCAT: Disk mirror/D4 attached to DATA.
GEOM_MIRROR: Device D1: provider ad12s1 detected.
GEOM_MIRROR: Device D1: provider ad12s1 activated.
GEOM_MIRROR: Device D1: provider ad10s1 activated.
GEOM_MIRROR: Device D1: provider mirror/D1 launched.
GEOM_MIRROR: Device D3 created (id=914260241).
GEOM_MIRROR: Device D3: provider ad14s1 detected.
GEOM_MIRROR: Device D2: provider ad16s1 detected.
GEOM_MIRROR: Device D2: provider ad16s1 activated.
GEOM_MIRROR: Device D2: provider ad4s1 activated.
GEOM_MIRROR: Device D2: provider mirror/D2 launched.
GEOM_MIRROR: Device D3: provider ad18s1 detected.
GEOM_MIRROR: Device D3: provider ad18s1 activated.
GEOM_MIRROR: Device D3: provider ad14s1 activated.
GEOM_MIRROR: Device D3: provider mirror/D3 launched.
GEOM_CONCAT: Disk mirror/D1 attached to DATA.
GEOM_CONCAT: Disk mirror/D2 attached to DATA.
GEOM_CONCAT: Disk mirror/D3 attached to DATA.
GEOM_CONCAT: Device DATA activated.
Trying to mount root from ufs:/dev/mirror/ROOTa
WARNING: / was not properly dismounted
WARNING: /tmp was not properly dismounted
WARNING: /usr was not properly dismounted
WARNING: /var was not properly dismounted
WARNING: /usr/local/smbshares was not properly dismounted
$

From owner-freebsd-geom@FreeBSD.ORG Sun Jul 30 13:33:05 2006
Return-Path:
X-Original-To: freebsd-geom@freebsd.org
Delivered-To: freebsd-geom@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id AA1C416A4E1 for ; Sun, 30 Jul 2006 13:33:05 +0000 (UTC) (envelope-from arne_woerner@yahoo.com)
Received: from web30312.mail.mud.yahoo.com (web30312.mail.mud.yahoo.com [68.142.201.230]) by mx1.FreeBSD.org (Postfix) with SMTP id E7A0443D46 for ; Sun, 30 Jul 2006 13:33:04 +0000 (GMT) (envelope-from arne_woerner@yahoo.com)
Received: (qmail 93612 invoked by uid 60001); 30 Jul 2006 13:33:04 -0000
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=Message-ID:Received:Date:From:Subject:To:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding; b=xnJWUJs4DaepfYbTa8E4UVu0a+mhT7bpLL5fcq/+nbs5cn1KCGhSHum+nFizs3o6Lnxc+11Zr4Y62hsg9btb/SfD4GesyN/TzmCcTr1S5/c5IHoFBskx83EBOF7+K7IRP8qTXrDWkrV60RMIJbGgfJtIQdpqEhd+Pto5QCmNbLU= ;
Message-ID: <20060730133304.93610.qmail@web30312.mail.mud.yahoo.com>
Received: from [213.54.79.55] by web30312.mail.mud.yahoo.com via HTTP; Sun, 30 Jul 2006 06:33:04 PDT
Date: Sun, 30 Jul 2006 06:33:04 -0700 (PDT)
From: "R. B. 
Riddick"
To: Brent Hostetler , freebsd-geom@freebsd.org, freebsd-questions@freebsd.org
In-Reply-To:
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Cc:
Subject: Re: gmirror/gconcat: mkdir causes system reboot
X-List-Received-Date: Sun, 30 Jul 2006 13:33:05 -0000

--- Brent Hostetler wrote:
> mode 04277 inum=12258433 fs=/usr/local/smbshares
> panic: ffs_valloc: dup alloc
>
I say, did u try a fsck on that file system?
It looks more like a file-system-related problem.

I would try an
fsck -n ...
first (just in case there is a configuration error; e.g. I had a gstripe and
had to re-label it, but I forgot the original stripe size, so that the fsck run
destroyed almost the whole file system).

-Arne

__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! 
Mail has the best spam protection around
http://mail.yahoo.com

From owner-freebsd-geom@FreeBSD.ORG Mon Jul 31 08:19:51 2006
Return-Path:
X-Original-To: freebsd-geom@freebsd.org
Delivered-To: freebsd-geom@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5487316A4E1 for ; Mon, 31 Jul 2006 08:19:51 +0000 (UTC) (envelope-from brenthostetler@gmail.com)
Received: from py-out-1112.google.com (py-out-1112.google.com [64.233.166.176]) by mx1.FreeBSD.org (Postfix) with ESMTP id BC83943D58 for ; Mon, 31 Jul 2006 08:19:50 +0000 (GMT) (envelope-from brenthostetler@gmail.com)
Received: by py-out-1112.google.com with SMTP id b36so450474pyb for ; Mon, 31 Jul 2006 01:19:49 -0700 (PDT)
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=dSahIAURqRo71Cn1Y+5uwWwawOMvLxj51w9N8ZvLIR1cxOysBQcljHQ85S3ExPZMIqKb5iH8lSZtwW1Uq+c6fai5Ry3PVCzbZam3ttg332+ULNsW1vSNe3hM3nD7SU7hq4Hi5EKFFdgebnM7cnBZ9mu2eMd+qrVsy2+O4Cj+mgQ=
Received: by 10.35.99.5 with SMTP id b5mr3629684pym; Mon, 31 Jul 2006 01:19:48 -0700 (PDT)
Received: by 10.35.128.2 with HTTP; Mon, 31 Jul 2006 01:19:48 -0700 (PDT)
Message-ID:
Date: Mon, 31 Jul 2006 01:19:48 -0700
From: "Brent Hostetler"
To: "R. B. 
Riddick"
In-Reply-To: <20060730133304.93610.qmail@web30312.mail.mud.yahoo.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <20060730133304.93610.qmail@web30312.mail.mud.yahoo.com>
Cc: freebsd-geom@freebsd.org
Subject: Re: gmirror/gconcat: mkdir causes system reboot
X-List-Received-Date: Mon, 31 Jul 2006 08:19:51 -0000

On 7/30/06, R. B. Riddick wrote:
> --- Brent Hostetler wrote:
> > mode 04277 inum=12258433 fs=/usr/local/smbshares
> > panic: ffs_valloc: dup alloc
> >
> I say, did u try a fsck on that file system?
> It looks more like a file-system-related problem.
>
> I would try an
> fsck -n ...
> first (just in case there is a configuration error; e.g. I had a gstripe and
> had to re-label it, but I forgot the original stripe size, so that the fsck run
> destroyed almost the whole file system).
>
> -Arne
>

fsck -n ran for quite a few hours and indeed showed lots of errors.
I am confused about where these errors came from, unless they came about
when I added a new disk a few weeks ago and did a growfs.

Well, I detached the spares from the mirrors that make up the gconcat
data mount. Then I ran fsck -y /dev/concat/DATA. I had to run this
command four or five times, over about 4 hours, before the file system
was marked clean. The runs produced output such as:

311282919243014689 BAD I=122259928
UNEXPECTED SOFT UPDATE INCONSISTENCY

-36191552698910622 BAD I=122259928
UNEXPECTED SOFT UPDATE INCONSISTENCY

EXCESSIVE BAD BLKS I=122374685
CONTINUE? yes

EXCESSIVE BAD BLKS I=122437538
COUNKNOWN FILE TYPE I=146374145
UNKNOWN FILE TYPE I=147458625
UNEXPECTED SOFT UPDATE INCONSISTENCY

UNKNOWN FILE TYPE I=149175296
UNEXPECTED SOFT UPDATE INCONSISTENCY
CLEAR? yes

PARTIALLY ALLOCATED INODE I=150172536
UNEXPECTED SOFT UPDATE INCONSISTENCY

YOU MUST RERUN FSCK AFTERWARDS

DIRECTORY ?: CONTAINS EMPTY BLOCKS
UNEXPECTED SOFT UPDATE INCONSISTENCY
ADJUST LENGTH? yes

These errors repeated until the log file of errors reached about 3 MB.

After this the file system mounted clean, but 220 Gigs of data were
lost!!! So I created a new concat from the spares and mounted it, and all
the data is there and is readable! So I'm in the process of copying from
the spare drives to the main ones...

What would have caused this madness, and why are no errors showing up in
the logs, or when the system boots and fsck is run in the background??

How can I prevent this in the future?

This is a home server, and because of the large size of the data
(700 Gigs+ and growing) I cannot back up a large percentage of the data.
My only failsafe is the mirroring, which has saved me a few times... (I
know it's not a backup, but it's better than nothing!)

Thanks!

From owner-freebsd-geom@FreeBSD.ORG Mon Jul 31 08:51:27 2006
Return-Path:
X-Original-To: freebsd-geom@freebsd.org
Delivered-To: freebsd-geom@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id F398716A4DD for ; Mon, 31 Jul 2006 08:51:26 +0000 (UTC) (envelope-from arne_woerner@yahoo.com)
Received: from web30312.mail.mud.yahoo.com (web30312.mail.mud.yahoo.com [68.142.201.230]) by mx1.FreeBSD.org (Postfix) with SMTP id 8D47043D49 for ; Mon, 31 Jul 2006 08:51:26 +0000 (GMT) (envelope-from arne_woerner@yahoo.com)
Received: (qmail 49556 invoked by uid 60001); 31 Jul 2006 08:51:25 -0000
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=Message-ID:Received:Date:From:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding; b=oY2m3nWop+ot6ENYMUzburYJbs1evutJv9mGbqMvQ2jOK6d4QyRrfH71sMcof98+r94W03fjL6BkYRcHH7/WorzBMH63AzCzJo3AJrthwzvf9uFC74ZKb9CPQWK8X2S58YJVl6gOUyP5KdXodvdsBtd1xTQ/QRZ5Uh9E0rv7ou0= ;
Message-ID: <20060731085125.49554.qmail@web30312.mail.mud.yahoo.com>
Received: from [213.54.65.162] by web30312.mail.mud.yahoo.com via HTTP; Mon, 31 Jul 2006 01:51:25 PDT
Date: Mon, 31 Jul 2006 01:51:25 -0700 (PDT)
From: "R. B. Riddick"
To: Brent Hostetler
In-Reply-To:
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Cc: freebsd-geom@freebsd.org
Subject: Re: gmirror/gconcat: mkdir causes system reboot
X-List-Received-Date: Mon, 31 Jul 2006 08:51:27 -0000

--- Brent Hostetler wrote:
> On 7/30/06, R. B. Riddick wrote:
> > I say, did u try a fsck on that file system?
> > It looks more like a file-system-related problem.
> >
>
> Well, I detached the spares from the mirrors that made up the gconcat
>
good idea...
> data mount. Then I ran fsck -y /dev/concat/DATA.. I had to continually
> run this command about four-five times over about 4 hours before
> marked clean.
>
interesting... :-)
> -36191552698910622 BAD I=122259928
> UNEXPECTED SOFT UPDATE INCONSISTENCY
>
uhuhuhuh :-) such big numbers...
> These errors repeated for about 3meg log file of errors.
>
Hmm... Maybe u mounted ur file system unclean at some point (e.g. by pressing
CTRL+C during the boot process) and then the errors escalated cataclysmically?
:-)

Or maybe ur gmirrors were out of sync due to some strange administrative
trick or due to a system crash during synchronization (so that just one half of
the mirror had the new data, while it went unnoticed that the other half
still had the old data (this is something fsck could miss when it just reads
the new/good half of the mirror (e.g. because those discs r faster or because
of a coincidence))).

> After this, mount clean but lost 220 Gigs of data!!! So I created new
> concat from the spares and mounted and all the data is there and is
> readable! So Im in the process of copying from the spare drives to the
> main...
>
Hmm... And the copy on the other parts of the mirror had no errors? I mean: Did
u run fsck on the concat of the spares, too?

> What would have caused this madness and why no errors showing up in
> logs, or when the system boots and fsck is run in background??
>
I just have the cataclysm theory (see above)... I do not think that the ufs
code can produce such errors during normal operation, because I have never seen
that (and I run fsck every few weeks, whenever I have crashed my box with a
self-written kernel module or so).

> How can I prevent this in the future?
>
I do not know. Maybe regular fscks? Once a month? Boot to single user and run
fsck on every fs (especially the clean ones).

> This is a home server, and because of the large size of the data
> (700GIGs + and growing) I cannot backup a large percent of the data.
> My only failsafe is the mirroring which has saved me a few times... (I
> know its not a backup, but its better than nothing!)
>
If u like, u may want to _test_ my brand-new graid5
(http://home.tiscali.de/cmdr_faako/geom_raid5.tbz), which could save some hard
discs (but it is less secure, since it will in no case tolerate the loss of two
discs at the same time, and since it is almost untested).

U r right: Mirroring does not help when u accidentally delete something, but
it helps against data loss caused by disc damage...

I have another idea: U could try to rsync (ports/net/rsync) "old" changes
(maybe changes that happened at least one hour before) from one concat'ed disc
set to another (instead of the mirror), which could help against accidental
deletion (but then u could lose at least 1 hour of fresh data (or u use my
graid5 or the older and slower(?) graid3 to protect against damage to a
single disc)).

-Arne

From owner-freebsd-geom@FreeBSD.ORG Mon Jul 31 10:11:09 2006
Return-Path:
X-Original-To: freebsd-geom@freebsd.org
Delivered-To: freebsd-geom@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id F0C4016A4DA for ; Mon, 31 Jul 2006 10:11:09 +0000 (UTC) (envelope-from pjd@garage.freebsd.pl)
Received: from mail.garage.freebsd.pl (arm132.internetdsl.tpnet.pl [83.17.198.132]) by mx1.FreeBSD.org (Postfix) with ESMTP id 4F5B143D4C for ; Mon, 31 Jul 2006 10:11:08 +0000 (GMT) (envelope-from pjd@garage.freebsd.pl)
Received: by mail.garage.freebsd.pl (Postfix, from userid 65534) id 18BEA5139A; Mon, 31 Jul 2006 12:11:07 +0200 (CEST)
Received: from localhost (pjd.wheel.pl [10.0.1.1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.garage.freebsd.pl (Postfix) with ESMTP id EACB451339; Mon, 31 Jul 2006 12:11:00 +0200 (CEST)
Date: Mon, 31 Jul 2006 12:10:33 +0200
From: Pawel Jakub Dawidek
To: "R. B. 
Riddick"
Message-ID: <20060731101033.GA842@garage.freebsd.pl>
References: <20060731085125.49554.qmail@web30312.mail.mud.yahoo.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="k+w/mQv8wyuph6w0"
Content-Disposition: inline
In-Reply-To: <20060731085125.49554.qmail@web30312.mail.mud.yahoo.com>
X-PGP-Key-URL: http://people.freebsd.org/~pjd/pjd.asc
X-OS: FreeBSD 7.0-CURRENT i386
User-Agent: mutt-ng/devel-r804 (FreeBSD)
X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on mail.garage.freebsd.pl
X-Spam-Level:
X-Spam-Status: No, score=-5.9 required=3.0 tests=ALL_TRUSTED,BAYES_00 autolearn=ham version=3.0.4
Cc: Brent Hostetler , freebsd-geom@freebsd.org
Subject: Re: gmirror/gconcat: mkdir causes system reboot
X-List-Received-Date: Mon, 31 Jul 2006 10:11:10 -0000

--k+w/mQv8wyuph6w0
Content-Type: text/plain; charset=iso-8859-2
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, Jul 31, 2006 at 01:51:25AM -0700, R. B. Riddick wrote:
> --- Brent Hostetler wrote:
> > These errors repeated for about 3meg log file of errors.
> >
> Hmm... Maybe u mounted ur file system unclean somewhen (e. g. by pressing
> CTRL+C during the boot process) and then the errors escalated cataclysmically?
> :-)
>
> Or maybe ur gmirror's where out of sync due to some strange administrative
> trick or due to a system crash during synchronization (so that just one half of
> the mirror had the new data, while it stayed unmentioned, that the other half
> had still the old data (this is something fsck could miss, when it just reads
> the new/good half of the mirror (e. g. because those discs r faster or because
> of a coincidence))).

Gmirror protects against such situations - on power loss or system crash
it will resynchronize mirrors.

I've heard of problems with growfs(8). Another option is incorrect
addition of the new disk to gconcat device.

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!

--k+w/mQv8wyuph6w0
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.4 (FreeBSD)

iD8DBQFEzdcZForvXbEpPzQRAq6qAJ0WVq6fLVFbYxse5pUcVMn/P4XO0ACgiEa8
01x8TC4vl3QQs7nYFMqziGQ=
=OJy/
-----END PGP SIGNATURE-----

--k+w/mQv8wyuph6w0--

From owner-freebsd-geom@FreeBSD.ORG Mon Jul 31 10:22:43 2006
Return-Path:
X-Original-To: freebsd-geom@freebsd.org
Delivered-To: freebsd-geom@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id BBB1516A4DD for ; Mon, 31 Jul 2006 10:22:43 +0000 (UTC) (envelope-from arne_woerner@yahoo.com)
Received: from web30305.mail.mud.yahoo.com (web30305.mail.mud.yahoo.com [68.142.200.98]) by mx1.FreeBSD.org (Postfix) with SMTP id 4074843D45 for ; Mon, 31 Jul 2006 10:22:43 +0000 (GMT) (envelope-from arne_woerner@yahoo.com)
Received: (qmail 41082 invoked by uid 60001); 31 Jul 2006 10:22:42 -0000
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=Message-ID:Received:Date:From:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding; b=30gcNfcCcWkhhIQhWMHEzhG7NLXNtzyF15ZLLElLXhdF8oqA7d3cMAcMwhY+AZtrmx75o+Lv04lEzLQK4ojd+AOED6YgcMn3fvVspKMUcqmYCzwYsLxcLXzEm1BxQhGw04+mRTCzuyoZdlwSdYx5duyXhu9Grh2h3SFL5pSpJAE= ;
Message-ID: <20060731102242.41080.qmail@web30305.mail.mud.yahoo.com>
Received: from [213.54.65.162] by web30305.mail.mud.yahoo.com via HTTP; Mon, 31 Jul 2006 03:22:42 PDT
Date: Mon, 31 Jul 2006 03:22:42 -0700 (PDT)
From: "R. B. 
Riddick"
To: Pawel Jakub Dawidek
In-Reply-To: <20060731101033.GA842@garage.freebsd.pl>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Cc: Brent Hostetler , freebsd-geom@freebsd.org
Subject: Re: gmirror/gconcat: mkdir causes system reboot
X-List-Received-Date: Mon, 31 Jul 2006 10:22:43 -0000

--- Pawel Jakub Dawidek wrote:
> Gmirror protects against such situations - on power loss or system crash
> it will resynchronize mirrors.
>
Hmm... The evil is a squirrel...

E. g. the write cache could delay the commit regarding the last sector of the
disc until the data has been written... Then the sudden power loss (or so)
could happen, while one disc updates the data but before the other disc and the
meta data are written. Or not? Or is gmirror like gjournal?

In graid5 I mark the geom_device as clean as soon as the shutdown event occurs.
Is that a good idea? That makes versioning of the various disks obsolete,
doesn't it?

-Arne

From owner-freebsd-geom@FreeBSD.ORG Mon Jul 31 11:03:45 2006
Return-Path:
X-Original-To: freebsd-geom@freebsd.org
Delivered-To: freebsd-geom@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9902616A50C for ; Mon, 31 Jul 2006 11:03:45 +0000 (UTC) (envelope-from owner-bugmaster@freebsd.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21]) by mx1.FreeBSD.org (Postfix) with ESMTP id DF17243D94 for ; Mon, 31 Jul 2006 11:03:01 +0000 (GMT) (envelope-from owner-bugmaster@freebsd.org)
Received: from freefall.freebsd.org (peter@localhost [127.0.0.1]) by freefall.freebsd.org (8.13.4/8.13.4) with ESMTP id k6VB2uAs051799 for ; Mon, 31 Jul 2006 11:02:56 GMT (envelope-from owner-bugmaster@freebsd.org)
Received: (from peter@localhost) by freefall.freebsd.org (8.13.4/8.13.4/Submit) id k6VB2tIA051795 for freebsd-geom@freebsd.org; Mon, 31 Jul 2006 11:02:55 GMT (envelope-from owner-bugmaster@freebsd.org)
Date: Mon, 31 Jul 2006 11:02:55 GMT
Message-Id: <200607311102.k6VB2tIA051795@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: peter set sender to owner-bugmaster@freebsd.org using -f
From: FreeBSD bugmaster
To: freebsd-geom@FreeBSD.org
Cc:
Subject: Current problem reports assigned to you
X-List-Received-Date: Mon, 31 Jul 2006 11:03:45 -0000

Current FreeBSD problem reports
Critical problems
Serious problems

S  Submitted   Tracker     Resp.  Description
-------------------------------------------------------------------------------
o [2005/01/21] kern/76538  geom   [gbde] nfs-write on gbde partition stalls
o [2005/08/04] kern/84556  geom   [geom] GBDE-encrypted swap causes panic a
o [2005/10/16] kern/87544  geom   [gbde] mmaping large files on a gbde file
o [2005/11/16] kern/89102  geom   [geom_vfs] [panic] panic when forced unmo
o [2005/12/08] bin/90093   geom   fdisk(8) incapable of altering in-core ge
o [2005/12/18] kern/90582  geom   [geom_mirror] [panic] Restore cause panic
o [2006/04/15] kern/95771  geom   [geom] geom mirror provider destroyed (ma
o [2006/05/27] kern/98034  geom   [geom] dereference of NULL pointer in acd
o [2006/06/09] kern/98742  geom   [geli] IO errors while using geli
o [2006/06/21] kern/99256  geom   [geli] kernel panic/freeze with geli and

10 problems total.

Non-critical problems

S  Submitted   Tracker     Resp.  Description
-------------------------------------------------------------------------------
o [2005/02/26] bin/78131   geom   gbde "destroy" not working.
o [2005/03/26] kern/79251  geom   [2TB] newfs fails on 2.6TB gbde device
o [2006/03/18] kern/94632  geom   [geom] Kernel output resets input while G
o [2006/06/05] kern/98538  geom   [geom] Kernel panic on ggate destroy

4 problems total.

From owner-freebsd-geom@FreeBSD.ORG Tue Aug 1 04:26:53 2006 Return-Path: X-Original-To: freebsd-geom@freebsd.org Delivered-To: freebsd-geom@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id EFCB316A4E5 for ; Tue, 1 Aug 2006 04:26:53 +0000 (UTC) (envelope-from anderson@centtech.com) Received: from mh2.centtech.com (moat3.centtech.com [207.200.51.50]) by mx1.FreeBSD.org (Postfix) with ESMTP id A04ED43D49 for ; Tue, 1 Aug 2006 04:26:53 +0000 (GMT) (envelope-from anderson@centtech.com) Received: from [192.168.42.24] (andersonbox4.centtech.com [192.168.42.24]) by mh2.centtech.com (8.13.1/8.13.1) with ESMTP id k714QqJM012634 for ; Mon, 31 Jul 2006 23:26:52 -0500 (CDT) (envelope-from anderson@centtech.com) Message-ID: <44CED817.1080905@centtech.com> Date: Mon, 31 Jul 2006 23:27:03 -0500 From: Eric Anderson User-Agent: Thunderbird 1.5.0.4 (X11/20060612) MIME-Version: 1.0 To: freebsd-geom@freebsd.org Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.87.1/1628/Mon Jul 31 16:56:57 2006 on mh2.centtech.com X-Virus-Status: Clean Subject: locking questions (regarding file systems) X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Aug 2006 04:26:54 -0000 Hi GEOMers, I'm writing a file system (read-only), and I need to do some GEOM related locking. I can mount/unmount the filesystem on a vnode backed md disk, but I can't re-mount on that device nor can I get rid (mdconfig -d) of it. It appears to be wedged in some kind of locking. 
Here's basically what I do: in the mount function for the FS, I do something like this:

    DROP_GIANT();
    g_topology_lock();
    error = g_vfs_open(devvp, &cp, "fsname", 0);
    g_topology_unlock();
    PICKUP_GIANT();

What is needed in my unmount function to release those locks? I've tried some combinations of things, like:

    DROP_GIANT();
    g_topology_lock();      /* wedges here */
    g_vfs_close(cp, td);
    g_topology_unlock();
    PICKUP_GIANT();
    vrele(devvp);

Any help would be greatly appreciated!

Eric

--
------------------------------------------------------------------------
Eric Anderson Sr. Systems Administrator Centaur Technology
Anything that works is better than anything that doesn't.
------------------------------------------------------------------------

From owner-freebsd-geom@FreeBSD.ORG Tue Aug 1 13:36:37 2006 Return-Path: X-Original-To: freebsd-geom@freebsd.org Delivered-To: freebsd-geom@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D87A116A4DF; Tue, 1 Aug 2006 13:36:37 +0000 (UTC) (envelope-from felipe@neuwald.biz) Received: from s4.hmnoc.net (s4.hmnoc.net [72.232.108.202]) by mx1.FreeBSD.org (Postfix) with ESMTP id 69C6043D60; Tue, 1 Aug 2006 13:36:34 +0000 (GMT) (envelope-from felipe@neuwald.biz) Received: from [200.101.23.190] (port=51023 helo=[192.168.1.22]) by s4.hmnoc.net with esmtpsa (TLSv1:AES256-SHA:256) (Exim 4.52) id 1G7uQA-0007G9-RE; Tue, 01 Aug 2006 10:36:31 -0300 Message-ID: <44CF58DD.8000805@neuwald.biz> Date: Tue, 01 Aug 2006 10:36:29 -0300 From: Felipe Neuwald User-Agent: Thunderbird 1.5.0.5 (X11/20060728) MIME-Version: 1.0 To: freebsd-current@freebsd.org, freebsd-stable@freebsd.org, freebsd-geom@freebsd.org Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-PopBeforeSMTPSenders: felipe@neuwald.biz X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - s4.hmnoc.net X-AntiAbuse:
Original Domain - freebsd.org X-AntiAbuse: Originator/Caller UID/GID - [0 0] / [47 12] X-AntiAbuse: Sender Address Domain - neuwald.biz X-Source: X-Source-Args: X-Source-Dir: Cc: Subject: GEOM_BDE: where is my partition? X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Aug 2006 13:36:38 -0000

Hi Folks,

I have 4 GEOM_BDE encrypted partitions on one FreeBSD 6.1-PRERELEASE server. Since the server shut down abnormally (power cut), one of the partitions is no longer available. Here are the partitions:

[root@xingu /dev]# ls -laF /etc/gbde/
total 12
drwxr-xr-x   2 root  wheel   512 Mar 23 15:33 ./
drwxr-xr-x  19 root  wheel  2048 Jul 13 11:34 ../
-rw-------   1 root  wheel    16 Sep 26  2005 ad0s1g
-rw-------   1 root  wheel    16 Sep 26  2005 ad1s1d
-rw-------   1 root  wheel    16 Sep 27  2005 ad2s1d
-rw-------   1 root  wheel    16 Mar 23 15:33 ad5s1c

ad0s1g, ad1s1d, and ad2s1d are correctly mounted:

[root@xingu /dev]# mount | grep bde
/dev/ad0s1g.bde on /data (ufs, NFS exported, local, soft-updates)
/dev/ad1s1d.bde on /data1 (ufs, NFS exported, local, soft-updates)
/dev/ad2s1d.bde on /data2 (ufs, NFS exported, local, soft-updates)

But /dev/ad5s1c doesn't exist anymore!!!

[root@xingu /dev]# /sbin/gbde attach /dev/ad5s1c -l /etc/gbde/ad5s1c
Enter passphrase:
gbde: Attach to ad5s1c failed: Provider not found
[root@xingu /dev]# ls /dev/ad5s1c
ls: /dev/ad5s1c: No such file or directory
[root@xingu /dev]#

Any idea how to recover the partition? What could have happened? =/

Here is some info about my kernel:

[root@xingu /dev]# cat /usr/src/sys/i386/conf/KERNEL4 | grep GEOM
options GEOM_GPT # GUID Partition Tables.
options GEOM_BDE And about my system: [root@xingu /dev]# uname -a FreeBSD xingu.xxx 6.1-PRERELEASE FreeBSD 6.1-PRERELEASE #0: Mon Feb 20 16:47:57 BRT 2006 root@xingu.xxx:/usr/obj/usr/src/sys/KERNEL4 i386 Thank you, Felipe Neuwald. From owner-freebsd-geom@FreeBSD.ORG Tue Aug 1 20:08:17 2006 Return-Path: X-Original-To: freebsd-geom@freebsd.org Delivered-To: freebsd-geom@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 89F8216A4DE for ; Tue, 1 Aug 2006 20:08:17 +0000 (UTC) (envelope-from nomad@castle.org) Received: from castle.org (castle.org [207.178.4.54]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2608A43D46 for ; Tue, 1 Aug 2006 20:08:17 +0000 (GMT) (envelope-from nomad@castle.org) Received: from [128.208.232.198] (vanyel.ee.washington.edu [128.208.232.198]) (authenticated bits=0) by castle.org (8.13.6/8.13.6) with ESMTP id k71K8EGU031748 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Tue, 1 Aug 2006 13:08:15 -0700 (PDT) (envelope-from nomad@castle.org) Message-ID: <44CFB4AB.4080803@castle.org> Date: Tue, 01 Aug 2006 13:08:11 -0700 From: Lee Damon User-Agent: Thunderbird 1.5.0.5 (Macintosh/20060719) MIME-Version: 1.0 To: freebsd-geom@freebsd.org Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=failed version=3.1.3 X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on castle.org Subject: growfs, old disks, gvinum - filesystem expansion and corruption X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Aug 2006 20:08:17 -0000 I have several FreeBSD-6.1 based file servers. Each one has a hardware RAID-based system that presents a single large (800G to 2TB) "disk" to the OS. 
In looking around for a FBSD-supported method of slicing this disk up into usable (and later growable) file systems, the only thing I found was gvinum. The other geom commands don't seem to deal with the idea of slicing off parts of a large drive.

In a process very similar to the LVM commands I've used on other systems, I set up the boxes, calved off some disk as gvinum "drives", formatted with newfs -U /dev/gvinum/..., added data and served forth. Then as file systems filled up I expanded those file systems. When the data was no longer needed (testing was done, it was time to go production) I removed those file systems by unmounting them and using rm in the gvinum command. I then created new file systems, and the time has recently come to expand them.

Now I discover that gvinum + growfs = Bad Things. It seems that when a file system is grown over "old file systems" (that supposedly no longer exist) and then fsck is run, it finds those old file systems and shoves random stuff (including device nodes in some cases) into the lost+found directory on the current file system. fsck(8) gives many complaints about "unexpected soft update inconsistency", "unref dir", "bad/dup file", "allocated frags marked free" and "allocated files marked free". The rub is that when I try to remove the old cruft from lost+found, there is a good probability of triggering a kernel panic.

My questions are:

1. Is there any way to clean up the old file system cruft without crashing/trashing/deleting the system?
2. What tool should I be using to calve off slices of a 'huge' drive and later expand it?
Any help would be much appreciated, nomad From owner-freebsd-geom@FreeBSD.ORG Tue Aug 1 20:26:44 2006 Return-Path: X-Original-To: freebsd-geom@freebsd.org Delivered-To: freebsd-geom@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1C8DC16A4DF for ; Tue, 1 Aug 2006 20:26:44 +0000 (UTC) (envelope-from anderson@centtech.com) Received: from mh2.centtech.com (moat3.centtech.com [207.200.51.50]) by mx1.FreeBSD.org (Postfix) with ESMTP id C2F7743D70 for ; Tue, 1 Aug 2006 20:26:38 +0000 (GMT) (envelope-from anderson@centtech.com) Received: from [10.177.171.220] (neutrino.centtech.com [10.177.171.220]) by mh2.centtech.com (8.13.1/8.13.1) with ESMTP id k71KQXAX065577; Tue, 1 Aug 2006 15:26:33 -0500 (CDT) (envelope-from anderson@centtech.com) Message-ID: <44CFB908.40904@centtech.com> Date: Tue, 01 Aug 2006 15:26:48 -0500 From: Eric Anderson User-Agent: Thunderbird 1.5.0.4 (X11/20060612) MIME-Version: 1.0 To: Lee Damon References: <44CFB4AB.4080803@castle.org> In-Reply-To: <44CFB4AB.4080803@castle.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.87.1/1630/Tue Aug 1 10:38:56 2006 on mh2.centtech.com X-Virus-Status: Clean Cc: freebsd-geom@freebsd.org Subject: Re: growfs, old disks, gvinum - filesystem expansion and corruption X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Aug 2006 20:26:44 -0000 On 08/01/06 15:08, Lee Damon wrote: > I have several FreeBSD-6.1 based file servers. Each one has a hardware > RAID-based system that presents a single large (800G to 2TB) "disk" to > the OS. > > In looking around for a FBSD-supported method of slicing this disk up > into usable (and later growable) file systems the only thing I found was > gvinum. 
> The other geom commands don't seem to deal with the idea of
> slicing off parts of a large drive.
>
> In a process very similar to the LVM commands I've used on other systems
> I set up the boxes, calved off some disk as gvinum "drives", formatted
> with newfs -U /dev/gvinum/..., added data and served forth. Then as
> file systems filled up I expanded those file systems. When the data was
> no longer needed (testing was done, it was time to go production) I
> removed those file systems by unmounting them and using rm in the gvinum
> command. I then created new file systems and the time has recently come
> to expand them.
>
> Now I discover that gvinum + growfs = Bad Things. It seems that
> when a file system is grown over "old file systems" (that supposedly no
> longer exist) and then fsck is run it finds those old file systems and
> shoves random stuff (including device nodes in some cases) into the
> lost+found directory on the current file system. fsck(8) gives many
> complaints about "unexpected soft update inconsistency", "unref dir",
> "bad/dup file", and "allocated frags marked free" & "allocated files
> marked free". The rub being when I try to remove the old cruft from
> lost+found I run a good probability of triggering a kernel panic.
>
> My questions are:
> 1. Is there any way to clean up the old file system cruft without
> crashing/trashing/deleting the system?

You could use dd to write zeros over the old areas to wipe out the old filesystem data. Careful!

> 2. What tool should I be using to calve off slices of a 'huge' drive and
> later expand it?

Why not use fdisk/bsdlabel?

Eric

--
------------------------------------------------------------------------
Eric Anderson Sr. Systems Administrator Centaur Technology
Anything that works is better than anything that doesn't.
------------------------------------------------------------------------ From owner-freebsd-geom@FreeBSD.ORG Tue Aug 1 20:38:18 2006 Return-Path: X-Original-To: freebsd-geom@freebsd.org Delivered-To: freebsd-geom@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8247716A4DD for ; Tue, 1 Aug 2006 20:38:18 +0000 (UTC) (envelope-from 000.fbsd@quip.cz) Received: from home.quip.cz (grimm.quip.cz [213.220.192.218]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2611143D45 for ; Tue, 1 Aug 2006 20:38:17 +0000 (GMT) (envelope-from 000.fbsd@quip.cz) Received: from [192.168.1.2] (qwork.quip.test [192.168.1.2]) by home.quip.cz (Postfix) with ESMTP id 9946C54F for ; Tue, 1 Aug 2006 22:38:15 +0200 (CEST) Message-ID: <44CFBBB6.5010906@quip.cz> Date: Tue, 01 Aug 2006 22:38:14 +0200 From: Miroslav Lachman <000.fbsd@quip.cz> User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915 X-Accept-Language: cs, cz, en, en-us MIME-Version: 1.0 To: freebsd-geom@freebsd.org Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Subject: gmirror rebuild does not accept full path to provider X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Aug 2006 20:38:18 -0000 I am not so experienced user of gmirror, but as I read man pages and some online HowTo, I think this is a bug. # gmirror rebuild -v gm0 /dev/ad5 No such provider: /dev/ad5 # gmirror rebuild -v gm0 ad5 GEOM_MIRROR: Device gm0: provider ad5 disconnected. GEOM_MIRROR: Device gm0: provider ad5 detected. GEOM_MIRROR: Device gm0: rebuilding provider ad5. Done. Is it expected? Other commands (gmirror label, gmirror insert, etc.) accept full path. 
Miroslav Lachman From owner-freebsd-geom@FreeBSD.ORG Tue Aug 1 20:43:18 2006 Return-Path: X-Original-To: freebsd-geom@freebsd.org Delivered-To: freebsd-geom@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 802FA16A4DD for ; Tue, 1 Aug 2006 20:43:18 +0000 (UTC) (envelope-from nomad@castle.org) Received: from castle.org (castle.org [207.178.4.54]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0215643D46 for ; Tue, 1 Aug 2006 20:43:17 +0000 (GMT) (envelope-from nomad@castle.org) Received: from [128.208.232.198] (vanyel.ee.washington.edu [128.208.232.198]) (authenticated bits=0) by castle.org (8.13.6/8.13.6) with ESMTP id k71KhE55031897 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 1 Aug 2006 13:43:15 -0700 (PDT) (envelope-from nomad@castle.org) Message-ID: <44CFBCDE.7040003@castle.org> Date: Tue, 01 Aug 2006 13:43:10 -0700 From: Lee Damon User-Agent: Thunderbird 1.5.0.5 (Macintosh/20060719) MIME-Version: 1.0 To: Eric Anderson References: <44CFB4AB.4080803@castle.org> <44CFB908.40904@centtech.com> In-Reply-To: <44CFB908.40904@centtech.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=failed version=3.1.3 X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on castle.org Cc: freebsd-geom@freebsd.org Subject: Re: growfs, old disks, gvinum - filesystem expansion and corruption X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Aug 2006 20:43:18 -0000 > = Eric Anderson >> = me ... >> My questions are: >> 1. Is there any way to clean up the old file system cruft without >> crashing/trashing/deleting the system? 
> > You could use dd to write zeros over the old areas to wipe out the old > filesystem data. Careful! Is there a way to find out which parts of the "disk" are currently not in use by an active gvinum area? >> 2. What tool should I be using to calve off slices of a 'huge' drive >> and later expand it? > > Why not use fdisk/bsdlabel? I didn't think that could be done with a live/active/mounted drive or with more than a handful of slices. [The system currently has 16 mounted gvinum-based filesystems, some of which will end up being made of 3 (or more) individual slices based on growth]. nomad From owner-freebsd-geom@FreeBSD.ORG Tue Aug 1 20:43:48 2006 Return-Path: X-Original-To: freebsd-geom@freebsd.org Delivered-To: freebsd-geom@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B341816A4DA for ; Tue, 1 Aug 2006 20:43:48 +0000 (UTC) (envelope-from anderson@centtech.com) Received: from mh2.centtech.com (moat3.centtech.com [207.200.51.50]) by mx1.FreeBSD.org (Postfix) with ESMTP id 4D72443D46 for ; Tue, 1 Aug 2006 20:43:48 +0000 (GMT) (envelope-from anderson@centtech.com) Received: from [10.177.171.220] (neutrino.centtech.com [10.177.171.220]) by mh2.centtech.com (8.13.1/8.13.1) with ESMTP id k71Khl2U068379; Tue, 1 Aug 2006 15:43:47 -0500 (CDT) (envelope-from anderson@centtech.com) Message-ID: <44CFBD14.1010108@centtech.com> Date: Tue, 01 Aug 2006 15:44:04 -0500 From: Eric Anderson User-Agent: Thunderbird 1.5.0.4 (X11/20060612) MIME-Version: 1.0 To: Miroslav Lachman <000.fbsd@quip.cz> References: <44CFBBB6.5010906@quip.cz> In-Reply-To: <44CFBBB6.5010906@quip.cz> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.87.1/1630/Tue Aug 1 10:38:56 2006 on mh2.centtech.com X-Virus-Status: Clean Cc: freebsd-geom@freebsd.org Subject: Re: gmirror rebuild does not accept full path to provider X-BeenThere: freebsd-geom@freebsd.org 
X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Aug 2006 20:43:48 -0000

On 08/01/06 15:38, Miroslav Lachman wrote:
> I am not so experienced user of gmirror, but as I read man pages and
> some online HowTo, I think this is a bug.
>
> # gmirror rebuild -v gm0 /dev/ad5
> No such provider: /dev/ad5
> # gmirror rebuild -v gm0 ad5
> GEOM_MIRROR: Device gm0: provider ad5 disconnected.
> GEOM_MIRROR: Device gm0: provider ad5 detected.
> GEOM_MIRROR: Device gm0: rebuilding provider ad5.
> Done.
>
> Is it expected?
>
> Other commands (gmirror label, gmirror insert, etc.) accept full path.

I think support was committed to the 6.x tree a couple weeks ago:

pjd         2006-07-16 15:43:52 UTC

  FreeBSD src repository

  Modified files:        (Branch: RELENG_6)
    sys/geom/mirror      g_mirror_ctl.c
  Log:
  MFC:    sys/geom/mirror/g_mirror_ctl.c 1.17

  Always allow to specify components with /dev/ prefix.

  Revision  Changes    Path
  1.11.2.3  +4 -2      src/sys/geom/mirror/g_mirror_ctl.c

--
------------------------------------------------------------------------
Eric Anderson Sr. Systems Administrator Centaur Technology
Anything that works is better than anything that doesn't.
------------------------------------------------------------------------ From owner-freebsd-geom@FreeBSD.ORG Wed Aug 2 08:46:20 2006 Return-Path: X-Original-To: freebsd-geom@freebsd.org Delivered-To: freebsd-geom@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3F4CD16A4DF for ; Wed, 2 Aug 2006 08:46:20 +0000 (UTC) (envelope-from 000.fbsd@quip.cz) Received: from home.quip.cz (grimm.quip.cz [213.220.192.218]) by mx1.FreeBSD.org (Postfix) with ESMTP id 296DD43D77 for ; Wed, 2 Aug 2006 08:46:09 +0000 (GMT) (envelope-from 000.fbsd@quip.cz) Received: from [192.168.1.2] (qwork.quip.test [192.168.1.2]) by home.quip.cz (Postfix) with ESMTP id 63D3C5934 for ; Wed, 2 Aug 2006 10:46:08 +0200 (CEST) Message-ID: <44D06650.1030803@quip.cz> Date: Wed, 02 Aug 2006 10:46:08 +0200 From: Miroslav Lachman <000.fbsd@quip.cz> User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915 X-Accept-Language: cs, cz, en, en-us MIME-Version: 1.0 To: freebsd-geom@freebsd.org Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Subject: gmirror Cannot add disk ad5 to gm0 (error=22) X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 02 Aug 2006 08:46:20 -0000 Hi, I have strange problem with gmirror or ATA. 
Gmirror gm0 is built from two providers, ad4 and ad5 (250GB Seagate on SATA ICH7), based on this article: http://www.onlamp.com/pub/a/bsd/2005/11/10/FreeBSD_Basics.html

Yesterday I got these errors in /var/log/messages:

Aug 1 00:03:42 track kernel: ad5: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=290279525
Aug 1 00:03:48 track kernel: ad5: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=290279525
Aug 1 00:03:48 track kernel: ad5: FAILURE - WRITE_DMA48 status=51 error=10 LBA=290279525
Aug 1 00:03:48 track kernel: GEOM_MIRROR: Request failed (error=5). ad5[WRITE(offset=148623116800, length=2048)]
Aug 1 00:03:48 track kernel: GEOM_MIRROR: Device gm0: provider ad5 disconnected.

Followed by similar errors on ad4:

Aug 1 06:30:16 track kernel: ad4: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=284911237
Aug 1 06:30:16 track kernel: ad4: FAILURE - WRITE_DMA48 status=51 error=10 LBA=284911237
Aug 1 06:30:16 track kernel: GEOM_MIRROR: Request failed (error=5). ad4[WRITE(offset=145874553344, length=32768)]
Aug 1 06:30:16 track kernel: g_vfs_done():mirror/gm0s2d[WRITE(offset=76083052544, length=32768)]error = 5

After a few minutes, the system rebooted itself and logged this error message:

Aug 1 09:14:50 track kernel: GEOM_MIRROR: Device gm0: provider ad5 detected.
Aug 1 09:14:50 track kernel: GEOM_MIRROR: Component ad5 (device gm0) broken, skipping.

I tried smartctl -a /dev/ad4 and smartctl -a /dev/ad5, but I do not see any errors.

If I use

gmirror activate -v gm0 ad5

I get:

Aug 2 10:24:03 track kernel: GEOM_MIRROR: Component ad5 (device gm0) broken, skipping.
Aug 2 10:24:03 track kernel: GEOM_MIRROR: Cannot add disk ad5 to gm0 (error=22).

I can successfully mount partitions from drive ad5 like this:

mount /dev/ad5s2d /mnt

(Aug 2 10:35:21 track kernel: WARNING: /vol0 was not properly dismounted)

and read any files from this drive.

Can anybody tell me where the problem is, or how I can find out what is wrong?
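
A quick cross-check of the numbers in the log lines above, as a minimal sketch rather than anything from the kernel source: assuming 512-byte sectors (not stated in the thread), the LBA that the ata driver reports multiplied by 512 equals the offset that GEOM_MIRROR reports for the same failed write. The sketch also assumes that the error= values in the GEOM_MIRROR lines are ordinary errno codes, in which case 5 is EIO and 22 is EINVAL. The names SECTOR_SIZE, lba_to_offset, errno_name and check_log_values are hypothetical helpers introduced only for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Assumption: the disks use 512-byte sectors (not stated in the thread). */
#define SECTOR_SIZE 512ULL

/* Convert a logical block address into a byte offset on the raw disk. */
uint64_t lba_to_offset(uint64_t lba)
{
    return lba * SECTOR_SIZE;
}

/* Map the two errno values seen in the log to their symbolic names. */
const char *errno_name(int code)
{
    switch (code) {
    case EIO:
        return "EIO";       /* input/output error */
    case EINVAL:
        return "EINVAL";    /* invalid argument */
    default:
        return "unknown";
    }
}

/* Check the helpers against the values quoted in the log lines. */
void check_log_values(void)
{
    /* ad5: LBA=290279525 vs. WRITE(offset=148623116800, ...) */
    assert(lba_to_offset(290279525ULL) == 148623116800ULL);
    /* ad4: LBA=284911237 vs. WRITE(offset=145874553344, ...) */
    assert(lba_to_offset(284911237ULL) == 145874553344ULL);
    /* "Request failed (error=5)" and "Cannot add disk ... (error=22)" */
    assert(strcmp(errno_name(5), "EIO") == 0);
    assert(strcmp(errno_name(22), "EINVAL") == 0);
}
```

Calling check_log_values() from a small main() aborts if any value disagrees. If the assumptions hold, the ata and GEOM_MIRROR lines describe the same failed request at the same position on the raw disk; the sketch says nothing about why the write timed out.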
System is FreeBSD 6.1-RELEASE #0: Sun May 7 04:32:43 UTC 2006 root@opus.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC i386 ASUS RS120-E3 There are full logs: http://www.quip.cz/1/freebsd/asus_rs120-e3/track_SMART_ad4.txt http://www.quip.cz/1/freebsd/asus_rs120-e3/track_SMART_ad5.txt http://www.quip.cz/1/freebsd/asus_rs120-e3/track_gmirror_list.txt http://www.quip.cz/1/freebsd/asus_rs120-e3/track_messages.txt Miroslav Lachman From owner-freebsd-geom@FreeBSD.ORG Wed Aug 2 18:30:03 2006 Return-Path: X-Original-To: freebsd-geom@freebsd.org Delivered-To: freebsd-geom@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8E83516A4E0 for ; Wed, 2 Aug 2006 18:30:03 +0000 (UTC) (envelope-from rick@kiwi-computer.com) Received: from kiwi-computer.com (megan.kiwi-computer.com [63.224.10.3]) by mx1.FreeBSD.org (Postfix) with SMTP id CA28A43D45 for ; Wed, 2 Aug 2006 18:30:02 +0000 (GMT) (envelope-from rick@kiwi-computer.com) Received: (qmail 14335 invoked by uid 2001); 2 Aug 2006 18:30:01 -0000 Date: Wed, 2 Aug 2006 13:30:01 -0500 From: "Rick C. 
Petty" To: Miroslav Lachman <000.fbsd@quip.cz> Message-ID: <20060802183001.GA14279@megan.kiwi-computer.com> References: <44D06650.1030803@quip.cz> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <44D06650.1030803@quip.cz> User-Agent: Mutt/1.4.2.1i Cc: freebsd-geom@freebsd.org Subject: Re: gmirror Cannot add disk ad5 to gm0 (error=22) X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: rick-freebsd@kiwi-computer.com List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 02 Aug 2006 18:30:03 -0000

On Wed, Aug 02, 2006 at 10:46:08AM +0200, Miroslav Lachman wrote:
>
> Aug 1 00:03:42 track kernel: ad5: TIMEOUT - WRITE_DMA48 retrying (1
> retry left) LBA=290279525

Out of curiosity-- what's the dmesg output of your ATA controllers?

> I tried smartctl -a /dev/ad4 and smartctl -a /dev/ad5, but does not see
> any errors.

Did you have SMART enabled in the BIOS?

> If I use gmirror activate -v gm0 ad5 I got
> Aug 2 10:24:03 track kernel: GEOM_MIRROR: Component ad5 (device gm0)
> broken, skipping.
> Aug 2 10:24:03 track kernel: GEOM_MIRROR: Cannot add disk ad5 to gm0
> (error=22).

It's already activated, so you can't add it again (as the message states).

> I can successfuly mount partitions from drive ad5 like this
> mount /dev/ad5s2d /mnt
>
> (Aug 2 10:35:21 track kernel: WARNING: /vol0 was not properly dismounted)
>
> And read any files from this drive.

That shouldn't be a surprise-- the disks themselves didn't fail, only writing to them (possibly under heavy load?) failed-- and gmirror dropped the disks. The first disk drop was ok-- the mirror should still work in DEGRADED state. The second drop was critical which is why your system broke. Mounting the disks individually will work of course.

> Can anybody tell me, where is the problem / how can I found what is wrong?
What's the output of "gmirror status" ?? I suspect on a reboot, gmirror will try to synchronize ad4 to ad5 (since ad5 was the first to drop). Once that is complete, gmirror won't be DEGRADED anymore. -- Rick C. Petty From owner-freebsd-geom@FreeBSD.ORG Wed Aug 2 20:38:01 2006 Return-Path: X-Original-To: freebsd-geom@freebsd.org Delivered-To: freebsd-geom@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3FEF116A4DE for ; Wed, 2 Aug 2006 20:38:01 +0000 (UTC) (envelope-from 000.fbsd@quip.cz) Received: from home.quip.cz (grimm.quip.cz [213.220.192.218]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2104B43D6E for ; Wed, 2 Aug 2006 20:37:51 +0000 (GMT) (envelope-from 000.fbsd@quip.cz) Received: from [192.168.1.2] (qwork.quip.test [192.168.1.2]) by home.quip.cz (Postfix) with ESMTP id 3CB2D5280; Wed, 2 Aug 2006 22:37:50 +0200 (CEST) Message-ID: <44D10D1D.9040700@quip.cz> Date: Wed, 02 Aug 2006 22:37:49 +0200 From: Miroslav Lachman <000.fbsd@quip.cz> User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915 X-Accept-Language: cs, cz, en, en-us MIME-Version: 1.0 To: rick-freebsd@kiwi-computer.com References: <44D06650.1030803@quip.cz> <20060802183001.GA14279@megan.kiwi-computer.com> In-Reply-To: <20060802183001.GA14279@megan.kiwi-computer.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-geom@freebsd.org Subject: Re: gmirror Cannot add disk ad5 to gm0 (error=22) X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 02 Aug 2006 20:38:01 -0000 Rick C. 
Petty wrote:

> On Wed, Aug 02, 2006 at 10:46:08AM +0200, Miroslav Lachman wrote:
>
>>Aug 1 00:03:42 track kernel: ad5: TIMEOUT - WRITE_DMA48 retrying (1
>>retry left) LBA=290279525
>
>
> Out of curiosity-- what's the dmesg output of your ATA controllers?

atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 31.1 on pci0
ata0: on atapci0
ata1: on atapci0
atapci1: port 0xe800-0xe807,0xe480-0xe483,0xe400-0xe407,0xe080-0xe083,0xe000-0xe00f mem 0xfebff800-0xfebffbff irq 19 at device 31.2 on pci0
ata2: on atapci1
ata3: on atapci1

The full dmesg output is included in http://www.quip.cz/1/freebsd/asus_rs120-e3/track_messages.txt

>>I tried smartctl -a /dev/ad4 and smartctl -a /dev/ad5, but does not see
>>any errors.
>
>
> Did you have SMART enabled in the BIOS?

Yes (as I remember; I have only remote access now), and I have smartd_enable="YES" in /etc/rc.conf. smartd.conf has these lines:

/dev/ad4 -a -o on -S on -m root -M test -s (S/../.././04|L/../../6/05) -t -I 194
/dev/ad5 -a -o on -S on -m root -M test -s (S/../.././04|L/../../6/05) -t -I 194

Full output of smartctl -a /dev/adX:
http://www.quip.cz/1/freebsd/asus_rs120-e3/track_SMART_ad4.txt
http://www.quip.cz/1/freebsd/asus_rs120-e3/track_SMART_ad5.txt

>>If I use gmirror activate -v gm0 ad5 I got
>>Aug 2 10:24:03 track kernel: GEOM_MIRROR: Component ad5 (device gm0)
>>broken, skipping.
>>Aug 2 10:24:03 track kernel: GEOM_MIRROR: Cannot add disk ad5 to gm0
>>(error=22).
>
>
> It's already activated, so you can't add it again (as the message states).

But how can I force gmirror to re-use this disk? I don't know, what "broken, skipping" or "error=22" really means.

>>I can successfuly mount partitions from drive ad5 like this
>>mount /dev/ad5s2d /mnt
>>
>>(Aug 2 10:35:21 track kernel: WARNING: /vol0 was not properly dismounted)
>>
>>And read any files from this drive.
>
>
> That shouldn't be a surprise-- the disks themselves didn't fail, only
> writing to them (possibly under heavy load?) failed-- and gmirror dropped
> the disks. The first disk drop was ok-- the mirror should still work in
> DEGRADED state. The second drop was critical which is why your system
> broke. Mounting the disks individually will work of course.

This error occurred after 5 days of periodically copying /usr/ports to another partition. (I used this to test the disk/filesystem before deploying to production.) Before this test, the server had other problems with disks and the whole server was replaced with a new one; only the first drive (ad4) is from the original machine. (Originally discussed on freebsd-stable@: the disk disappeared from the ATA channel and was not listed by the atacontrol list command.)

>>Can anybody tell me, where is the problem / how can I found what is wrong?
>
>
> What's the output of "gmirror status" ?? I suspect on a reboot, gmirror
> will try to synchronize ad4 to ad5 (since ad5 was the first to drop). Once
> that is complete, gmirror won't be DEGRADED anymore.

# gmirror status
      Name    Status  Components
mirror/gm0  DEGRADED  ad4

The gmirror list and atacontrol list output can be found at http://www.quip.cz/1/freebsd/asus_rs120-e3/track_gmirror_list.txt

Gmirror is not synchronized after reboot:

Aug 1 09:14:50 track kernel: acd0: DVDROM at ata0-slave UDMA100
Aug 1 09:14:50 track kernel: ad4: 238475MB at ata2-master SATA150
Aug 1 09:14:50 track kernel: GEOM_MIRROR: Device gm0 created (id=565164480).
Aug 1 09:14:50 track kernel: GEOM_MIRROR: Device gm0: provider ad4 detected.
Aug 1 09:14:50 track kernel: ad5: 238475MB at ata2-slave SATA150
Aug 1 09:14:50 track kernel: GEOM_MIRROR: Device gm0: provider ad5 detected.
Aug 1 09:14:50 track kernel: GEOM_MIRROR: Component ad5 (device gm0) broken, skipping.
Aug 1 09:14:50 track kernel: GEOM_MIRROR: Device gm0: provider ad4 activated.
Aug 1 09:14:50 track kernel: GEOM_MIRROR: Device gm0: provider mirror/gm0 launched.
Aug 1 09:14:50 track kernel: Trying to mount root from ufs:/dev/mirror/gm0s1a

(also included in
http://www.quip.cz/1/freebsd/asus_rs120-e3/track_messages.txt)

So disk is OK, but gmirror refused to use it?

If disks are OK, what is wrong? What caused READ / WRITE timeouts?
Broken SATA controller? FreeBSD ATA driver?

Miroslav Lachman

From owner-freebsd-geom@FreeBSD.ORG Wed Aug 2 21:05:17 2006
Message-ID: <20060802210516.88311.qmail@web30301.mail.mud.yahoo.com>
Date: Wed, 2 Aug 2006 14:05:16 -0700 (PDT)
From: "R. B.
Riddick"
To: Miroslav Lachman <000.fbsd@quip.cz>
In-Reply-To: <44D10D1D.9040700@quip.cz>
Cc: freebsd-geom@freebsd.org
Subject: Re: gmirror Cannot add disk ad5 to gm0 (error=22)

--- Miroslav Lachman <000.fbsd@quip.cz> wrote:
> But how can I force gmirror to re-use this disk? I don't know, what
> "broken, skipping" or "error=22" really means.
>
error=22 might be EINVAL, which means "invalid argument" (taken from
/usr/include/errno.h)...

I would try the following:
1.
gmirror forget gm0
gmirror insert gm0 ad5
and/or
2.
gmirror remove gm0 ad5
gmirror insert gm0 ad5

Maybe I would repeat 1. and 2. 2-3 times... ;-))

I personally never had such problems with gmirror, so that I would guess, that
u still have ill hardware... Maybe the temperature is too high? Do u have
enough fans? Did u measure the temperature after some hours of heavy use (I
think, smartctl -a show the temperature somewhere...)?

Synchronization should start then automatically...

-Arne

__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo!
Mail has the best spam protection around
http://mail.yahoo.com

From owner-freebsd-geom@FreeBSD.ORG Wed Aug 2 21:07:11 2006
Date: Wed, 2 Aug 2006 16:07:09 -0500
From: "Rick C. Petty"
To: Miroslav Lachman <000.fbsd@quip.cz>
Message-ID: <20060802210709.GA15310@megan.kiwi-computer.com>
References: <44D06650.1030803@quip.cz> <20060802183001.GA14279@megan.kiwi-computer.com> <44D10D1D.9040700@quip.cz>
In-Reply-To: <44D10D1D.9040700@quip.cz>
Cc: freebsd-geom@freebsd.org
Subject: Re: gmirror Cannot add disk ad5 to gm0 (error=22)
Reply-To: rick-freebsd@kiwi-computer.com

On Wed, Aug 02, 2006 at 10:37:49PM +0200, Miroslav Lachman wrote:
>
> >Did you have SMART enabled in the BIOS?
>
> Yes, (as I remember - I have only remote access now) and have

Then I doubt the disk itself had any errors.. Likely a bad cable or
controller, which I've typically seen manifested under heavier disk activity.

> >It's already activated, so you can't add it again (as the message states).
>
> But how can I force gmirror to re-use this disk?
I don't know, what
> "broken, skipping" or "error=22" really means.

There's no forcing, unless you specifically deactivated a provider. The
mirror should auto-sync at startup.

> >That shouldn't be a surprise-- the disks themselves didn't fail, only
> >writing to them (possibly under heavy load?) failed-- and gmirror dropped
> >the disks. The first disk drop was ok-- the mirror should still work in
> >DEGRADED state. The second drop was critical which is why your system
> >broke. Mounting the disks individually will work of course.
>
> This error occurred after 5 days of periodical copying /usr/ports to
> another partition. (I used this to test disk/filesystem before deploying
> to production) Before this test, the server had other problems with
> disks and whole server was replaced with a new one, only first drive (ad4)
> is from original machine. (originally discussed on freebsd-stable@ - disk
> disappeared from ATA channel - not listed by atacontrol list command)

Yup, disks disappear when they stop responding to "bus reset" commands.
This seems to happen on various controllers after an unpredictable number
of READ_DMA or WRITE_DMA timeout errors. Theoretically, you could reinit
the channel and see if the disk pops back up. One thing to note: I
recommend putting the disks on separate channels so if a reinit fails, you
don't lose both disks. I hate it when manufacturers put two SATA ports on
the same ATA channel.. Cheap for them, problematic for you.

> >>Can anybody tell me, where is the problem / how can I find what is wrong?
> >
> >
> >What's the output of "gmirror status" ?? I suspect on a reboot, gmirror
> >will try to synchronize ad4 to ad5 (since ad5 was the first to drop). Once
> >that is complete, gmirror won't be DEGRADED anymore.
>
> # gmirror status
>       Name    Status  Components
> mirror/gm0  DEGRADED  ad4

Hmm, and is ad5 detected?
(rhetorical question, because I see that it was)

> Gmirror is not synchronized after reboot:
>
> Aug 1 09:14:50 track kernel: GEOM_MIRROR: Device gm0: provider ad5
> detected.
> Aug 1 09:14:50 track kernel: GEOM_MIRROR: Component ad5 (device gm0)
> broken, skipping.

Looks like the disk was marked with bad metadata.

> So disk is OK, but gmirror refused to use it?

Yes. I would first suggest trying "gmirror deactivate -v gm0 ad5" then
trying to reactivate it. Maybe that will flush out the wrong metadata.
If that doesn't work, try booting in verbose mode and attaching the dmesg
(in particular, when the mirror is being attached).
Last resort (although not a horrible option), you can try removing ad5 from
the mirror and relabelling (gmirror label, not bsdlabel) it. If the remove
fails, use a combination of forget and clear.

> If disks are OK, what is wrong? What caused READ / WRITE timeouts?
> Broken SATA controller? FreeBSD ATA driver?

Try replacing the cables, trying a different SATA controller. I've seen
these timeouts *a lot* and usually my gmirror/gvinum partitions all
survive (after reboot at least). There are a lot of threads on this and
other mailing lists describing the timeout problems.

-- Rick C.
Petty

From owner-freebsd-geom@FreeBSD.ORG Wed Aug 2 22:02:53 2006
Message-ID: <44D12109.3010600@quip.cz>
Date: Thu, 03 Aug 2006 00:02:49 +0200
From: Miroslav Lachman <000.fbsd@quip.cz>
To: freebsd-geom@freebsd.org
References: <20060802210516.88311.qmail@web30301.mail.mud.yahoo.com>
In-Reply-To: <20060802210516.88311.qmail@web30301.mail.mud.yahoo.com>
Subject: Re: gmirror Cannot add disk ad5 to gm0 (error=22)

R. B. Riddick wrote:

[...]

> I personally never had such problems with gmirror, so that I would guess, that
> u still have ill hardware... Maybe the temperature is too high? Do u have
> enough fans? Did u measure the temperature after some hours of heavy use (I
> think, smartctl -a show the temperature somewhere...)?

ASUS RS120-E3 is barebone 1U rackmount with 8 (or 10?)
fans, temperature
is monitored every 5 minutes by smartctl invoked from MRTG and displayed
in graphs. Disk drive temperature is under 40°C on a heavily loaded system.

Miroslav Lachman

From owner-freebsd-geom@FreeBSD.ORG Wed Aug 2 22:11:47 2006
Date: Thu, 3 Aug 2006 00:10:54 +0200
From: Pawel Jakub Dawidek
To: "Rick C.
Petty"
Message-ID: <20060802221054.GB36506@garage.freebsd.pl>
References: <44D06650.1030803@quip.cz> <20060802183001.GA14279@megan.kiwi-computer.com> <44D10D1D.9040700@quip.cz> <20060802210709.GA15310@megan.kiwi-computer.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="xXmbgvnjoT4axfJE"
Content-Disposition: inline
In-Reply-To: <20060802210709.GA15310@megan.kiwi-computer.com>
X-PGP-Key-URL: http://people.freebsd.org/~pjd/pjd.asc
Cc: Miroslav Lachman <000.fbsd@quip.cz>, freebsd-geom@freebsd.org
Subject: Re: gmirror Cannot add disk ad5 to gm0 (error=22)

--xXmbgvnjoT4axfJE
Content-Type: text/plain; charset=iso-8859-2
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Wed, Aug 02, 2006 at 04:07:09PM -0500, Rick C. Petty wrote:
> On Wed, Aug 02, 2006 at 10:37:49PM +0200, Miroslav Lachman wrote:
> > # gmirror status
> >       Name    Status  Components
> > mirror/gm0  DEGRADED  ad4
>
> Hmm, and is ad5 detected? (rhetorical question, because I see that it was)
>
> > Gmirror is not synchronized after reboot:
> >
> > Aug 1 09:14:50 track kernel: GEOM_MIRROR: Device gm0: provider ad5
> > detected.
> > Aug 1 09:14:50 track kernel: GEOM_MIRROR: Component ad5 (device gm0)
> > broken, skipping.
>
> Looks like the disk was marked with bad metadata.
In case of an EIO error from a component, gmirror will mark it as broken
(actually, it will bump the genid on the rest of the components). Such a
component won't be able to connect again, because gmirror believes it is
broken and keeping it in its configuration is a bad idea.
If you don't want components to be disconnected in case of an error, you
need to set sysctl kern.geom.mirror.disconnect_on_failure to 0.

If you are sure the disk is ok and it was a bad cable or something like
this, you may connect it again:

	# gmirror forget gm0
	# gmirror insert gm0 ad5

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!

--xXmbgvnjoT4axfJE
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.4 (FreeBSD)

iD8DBQFE0SLuForvXbEpPzQRAvw2AKDlSzf43P4ajXiUdM+qZufP0ywlNwCgo9Wm
nkgM9AOrv8RFH6NGxmLchmo=
=pEyM
-----END PGP SIGNATURE-----

--xXmbgvnjoT4axfJE--

From owner-freebsd-geom@FreeBSD.ORG Wed Aug 2 22:28:11 2006
Message-ID: <44D126EF.9070503@quip.cz>
Date: Thu, 03 Aug 2006 00:27:59 +0200
From: Miroslav Lachman <000.fbsd@quip.cz>
To: rick-freebsd@kiwi-computer.com
References: <44D06650.1030803@quip.cz>
<20060802183001.GA14279@megan.kiwi-computer.com> <44D10D1D.9040700@quip.cz> <20060802210709.GA15310@megan.kiwi-computer.com>
In-Reply-To: <20060802210709.GA15310@megan.kiwi-computer.com>
Cc: freebsd-geom@freebsd.org
Subject: Re: gmirror Cannot add disk ad5 to gm0 (error=22)

Rick C. Petty wrote:

> On Wed, Aug 02, 2006 at 10:37:49PM +0200, Miroslav Lachman wrote:
>
>>>Did you have SMART enabled in the BIOS?
>>
>>Yes, (as I remember - I have only remote access now) and have
>
>
> Then I doubt the disk itself had any errors.. Likely a bad cable or
> controller, which I've typically seen manifested under heavier disk
> activity.

[...]

> Yup, disks disappear when they stop responding to "bus reset" commands.
> This seems to happen on various controllers after an unpredictable number
> of READ_DMA or WRITE_DMA timeout errors. Theoretically, you could reinit
> the channel and see if the disk pops back up.

Reinit did not help, only reboot.

> One thing to note: I
> recommend putting the disks on separate channels so if a reinit fails, you
> don't lose both disks. I hate it when manufacturers put two SATA ports on
> the same ATA channel.. Cheap for them, problematic for you.

I don't understand hardware much, but SATA controller is set to IDE mode
in BIOS and disks are on ATA channel 2 as ad4 Master and ad5 Slave. If
BIOS setting is changed to AHCI, dmesg shows two more ATA channels, ad4
as ata2-master and second disk will be ad8 on ata4-master (without
changing cables / connections).
As I see same problem with disk
disappearing with AHCI and IDE, I have decided to use IDE mode, which
seems to me little bit faster in gmirror synchronization.

Is there big difference between AHCI and IDE mode of SATA controller?

As I see in dmesg, controller is Intel ICH7 *SATA300* but disks are
SATA150, can this cause some troubles?

>>>>Can anybody tell me, where is the problem / how can I find what is wrong?
>>>
>>>
>>>What's the output of "gmirror status" ?? I suspect on a reboot, gmirror
>>>will try to synchronize ad4 to ad5 (since ad5 was the first to drop). Once
>>>that is complete, gmirror won't be DEGRADED anymore.
>>
>># gmirror status
>>      Name    Status  Components
>>mirror/gm0  DEGRADED  ad4
>
>
> Hmm, and is ad5 detected? (rhetorical question, because I see that it was)
>
>
>>Gmirror is not synchronized after reboot:
>>
>>Aug 1 09:14:50 track kernel: GEOM_MIRROR: Device gm0: provider ad5
>>detected.
>>Aug 1 09:14:50 track kernel: GEOM_MIRROR: Component ad5 (device gm0)
>>broken, skipping.
>
>
> Looks like the disk was marked with bad metadata.
>
>
>>So disk is OK, but gmirror refused to use it?
>
>
> Yes. I would first suggest trying "gmirror deactivate -v gm0 ad5" then
> trying to reactivate it. Maybe that will flush out the wrong metadata.
> If that doesn't work, try booting in verbose mode and attaching the dmesg
> (in particular, when the mirror is being attached).
> Last resort (although not a horrible option), you can try removing ad5 from
> the mirror and relabelling (gmirror label, not bsdlabel) it. If the remove
> fails, use a combination of forget and clear.

gmirror forget and insert helped:

root@track ~/# gmirror deactivate -v gm0 ad5
No such provider: ad5.
root@track ~/# gmirror forget -v gm0
Done.
root@track ~/# gmirror insert -v gm0 ad5
Done.

root@track ~/# gmirror status
      Name    Status  Components
mirror/gm0  DEGRADED  ad4
                      ad5 (0%)

>>If disks are OK, what is wrong? What caused READ / WRITE timeouts?
>>Broken SATA controller? FreeBSD ATA driver?
>
>
> Try replacing the cables, trying a different SATA controller. I've seen
> these timeouts *a lot* and usually my gmirror/gvinum partitions all
> survive (after reboot at least). There are a lot of threads on this and
> other mailing lists describing the timeout problems.

Yes, I read many posts about similar problems. I have similar problem on
4 machines, so I think this is not cable problem. Maybe bad controller
in whole series of ASUS RS120, or something like this. (4 of 4 same
machines have similar problems with disk subsystem)

Thank you.

Miroslav Lachman

From owner-freebsd-geom@FreeBSD.ORG Wed Aug 2 22:43:14 2006
Message-ID: <44D12A80.9040802@quip.cz>
Date: Thu, 03 Aug 2006 00:43:12 +0200
From: Miroslav Lachman <000.fbsd@quip.cz>
To: freebsd-geom@freebsd.org
References: <44D06650.1030803@quip.cz> <20060802183001.GA14279@megan.kiwi-computer.com> <44D10D1D.9040700@quip.cz> <20060802210709.GA15310@megan.kiwi-computer.com> <44D126EF.9070503@quip.cz>
In-Reply-To: <44D126EF.9070503@quip.cz>
Subject: Re: gmirror Cannot add disk ad5 to gm0 (error=22)

Miroslav Lachman wrote:

> gmirror forget and insert helped:
>
> root@track ~/# gmirror deactivate -v gm0 ad5
> No such provider: ad5.
> root@track ~/# gmirror forget -v gm0
> Done.
> root@track ~/# gmirror insert -v gm0 ad5
> Done.
>
> root@track ~/# gmirror status
>       Name    Status  Components
> mirror/gm0  DEGRADED  ad4
>                       ad5 (0%)
>

Something is definitely wrong. Gmirror status still shows 0% after
couple of minutes (normally synchronization progress is about 1% per minute)

systat -vmstat shows less than 1MB/s instead of usual 40MB/s, but 100% busy.

Disks   ad4   ad5
KB/t    121   128
tps       4     4
MB/s   0.45  0.45
% busy   83   103

Is there any chance to find the source of problems without step by step
replacement of each component? I can't believe that I have bad cables in
4 new machines or bad hard drives in each machine... :o(

totally tired Miroslav Lachman

From owner-freebsd-geom@FreeBSD.ORG Wed Aug 2 23:21:55 2006
Date: Wed, 2 Aug 2006 18:20:22 -0500
From: "Rick C.
Petty"
To: Miroslav Lachman <000.fbsd@quip.cz>
Message-ID: <20060802232022.GA16385@megan.kiwi-computer.com>
References: <44D06650.1030803@quip.cz> <20060802183001.GA14279@megan.kiwi-computer.com> <44D10D1D.9040700@quip.cz> <20060802210709.GA15310@megan.kiwi-computer.com> <44D126EF.9070503@quip.cz>
In-Reply-To: <44D126EF.9070503@quip.cz>
Cc: freebsd-geom@freebsd.org
Subject: Re: gmirror Cannot add disk ad5 to gm0 (error=22)
Reply-To: rick-freebsd@kiwi-computer.com

On Thu, Aug 03, 2006 at 12:27:59AM +0200, Miroslav Lachman wrote:
> Rick C. Petty wrote:
>
> >One thing to note: I
> >recommend putting the disks on separate channels so if a reinit fails, you
> >don't lose both disks. I hate it when manufacturers put two SATA ports on
> >the same ATA channel.. Cheap for them, problematic for you.
>
> I don't understand hardware much, but SATA controller is set to IDE mode
> in BIOS and disks are on ATA channel 2 as ad4 Master and ad5 Slave. If
> BIOS setting is changed to AHCI, dmesg shows two more ATA channels, ad4
> as ata2-master and second disk will be ad8 on ata4-master (without
> changing cables / connections). As I see same problem with disk
> disappearing with AHCI and IDE, I have decided to use IDE mode, which
> seems to me little bit faster in gmirror synchronization.

That's surprising (that it's faster). In fact I would expect the reverse.

> Is there big difference between AHCI and IDE mode of SATA controller?

Not quite sure, but apparently your AHCI puts the two SATA channels on
separate ATA channels. I've had better luck with that, in general.
> As I see in dmesg, controller is Intel ICH7 *SATA300* but disks are
> SATA150, can this cause some troubles?

It shouldn't. The disks should perform at SATA150. Except that you use
IDE mode, so I suspect they really run at UDMA100 (as your dmesg showed
before).

>
> root@track ~/# gmirror deactivate -v gm0 ad5
> No such provider: ad5.
> root@track ~/# gmirror forget -v gm0
> Done.
> root@track ~/# gmirror insert -v gm0 ad5
> Done.
>
> root@track ~/# gmirror status
>       Name    Status  Components
> mirror/gm0  DEGRADED  ad4
>                       ad5 (0%)

As I expected. Maybe this isn't "proper" but if you're stuck resyncing the
entire mirror anyway...

> Yes, I read many posts about similar problems. I have similar problem on
> 4 machines, so I think this is not cable problem. Maybe bad controller
> in whole series of ASUS RS120, or something like this. (4 of 4 same
> machines have similar problems with disk subsystem)

Sometimes replacing the cables seems to help me, other times it doesn't.
I suspect cosmic radiation or some phase of the moon.

-- Rick C. Petty

From owner-freebsd-geom@FreeBSD.ORG Wed Aug 2 23:33:00 2006
Date: Wed, 2 Aug 2006 18:32:55 -0500
From: "Rick C.
Petty"
To: Miroslav Lachman <000.fbsd@quip.cz>
Message-ID: <20060802233255.GB16385@megan.kiwi-computer.com>
References: <44D06650.1030803@quip.cz> <20060802183001.GA14279@megan.kiwi-computer.com> <44D10D1D.9040700@quip.cz> <20060802210709.GA15310@megan.kiwi-computer.com> <44D126EF.9070503@quip.cz> <44D12A80.9040802@quip.cz>
In-Reply-To: <44D12A80.9040802@quip.cz>
Cc: freebsd-geom@freebsd.org
Subject: Re: gmirror Cannot add disk ad5 to gm0 (error=22)
Reply-To: rick-freebsd@kiwi-computer.com

On Thu, Aug 03, 2006 at 12:43:12AM +0200, Miroslav Lachman wrote:
>
> Something is definitely wrong. Gmirror status still shows 0% after
> couple of minutes (normally synchronization progress is about 1% per minute)

Under what conditions do you define "normally"? I think you can tweak the
numbers to make it go faster or slower, and I think it's dependent upon
(disk) idle time.

> systat -vmstat shows less than 1MB/s instead of usual 40MB/s, but 100% busy.
>
> Disks   ad4   ad5
> KB/t    121   128
> tps       4     4
> MB/s   0.45  0.45
> % busy   83   103

What other activity is happening on the box? Are you in the middle of a
background fsck? What does the output of "atacontrol mode ad4" (and ad5)
show? Are you sure your "normal" synchronization happened when you were
in IDE mode instead of AHCI?

> Is there any chance to find the source of problems without step by step
> replacement of each component?

That depends upon the problems. To diagnose anything, you need to be able
to reliably bring down the mirror-- e.g. heavy disk activity.

> I can't believe that I have bad cables in
> 4 new machines or bad hard drives in each machine...
:o(

I bought identical machines (cpus, boards, disks, cables, etc.) and had
different results on each. Especially when you buy identical stuff, there
is a small probability that they'll all have the same problems-- for
example, a bad batch of disks. In your case, I'd investigate which steps
you have to perform to repeatably cause the failures. On my systems, the
heavier the disk load, the higher the probability of failure. Upgrading to
the latest 6.1-STABLE might help in some cases.

-- Rick C. Petty

From owner-freebsd-geom@FreeBSD.ORG Thu Aug 3 06:19:49 2006
Message-ID: <20060803061948.19660.qmail@web30313.mail.mud.yahoo.com>
Date: Wed, 2 Aug 2006 23:19:48 -0700 (PDT)
From: "R. B.
Riddick"
To: Miroslav Lachman <000.fbsd@quip.cz>, freebsd-geom@freebsd.org
In-Reply-To: <44D12109.3010600@quip.cz>
Subject: Re: gmirror Cannot add disk ad5 to gm0 (error=22)

--- Miroslav Lachman <000.fbsd@quip.cz> wrote:
> ASUS RS120-E3 is barebone 1U rackmount with 8 (or 10?) fans, temperature
> is monitored every 5 minutes by smartctl invoked from MRTG and displayed
> in graphs. Disk drive temperature is under 40°C on a heavily loaded system.
>
Hmm... I do not know modern disks, but my old disks (2003) have about 27 to 34
deg Celsius. I would still guess ur disks r too hot... I had similar symptoms,
when it became summer: First a lot of read/write errors, then the box could do
so much - after a reboot it did not become better, so that I moved one disk to
a cooler place (away from the other disk)...

-Arne

__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo!
Mail has the best spam protection around
http://mail.yahoo.com

From owner-freebsd-geom@FreeBSD.ORG Thu Aug 3 07:59:30 2006
Message-ID: <44D1ACDE.8090104@quip.cz>
Date: Thu, 03 Aug 2006 09:59:26 +0200
From: Miroslav Lachman <000.fbsd@quip.cz>
To: "R. B. Riddick"
References: <20060803061948.19660.qmail@web30313.mail.mud.yahoo.com>
In-Reply-To: <20060803061948.19660.qmail@web30313.mail.mud.yahoo.com>
Cc: freebsd-geom@freebsd.org
Subject: Re: gmirror Cannot add disk ad5 to gm0 (error=22)

R. B. Riddick wrote:

> --- Miroslav Lachman <000.fbsd@quip.cz> wrote:
>
>>ASUS RS120-E3 is barebone 1U rackmount with 8 (or 10?) fans, temperature
>>is monitored every 5 minutes by smartctl invoked from MRTG and displayed
>>in graphs. Disk drive temperature is under 40°C on a heavily loaded system.
>>
>
> Hmm...
I do not know modern disks, but my old disks (2003) have about 27 to 34
> deg Celsius. I would still guess ur disks r too hot... I had similar symptoms,
> when it became summer: First a lot of read/write errors, then the box could do
> so much - after a reboot it did not become better, so that I moved one disk to
> a cooler place (away from the other disk)...

Modern disk specifications allow temperatures up to 55°C. For example, my home HDD runs at 50°C, and on other machines I have disks at 51°C without trouble. In a 1U rackmount there is no cooler place - there is no space :o). I would be happier if the disks were cooler, but there is no way to do it.

Miroslav Lachman

From owner-freebsd-geom@FreeBSD.ORG Thu Aug 3 09:12:46 2006 Return-Path: X-Original-To: freebsd-geom@freebsd.org Delivered-To: freebsd-geom@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 91D4F16A4E1 for ; Thu, 3 Aug 2006 09:12:46 +0000 (UTC) (envelope-from 000.fbsd@quip.cz) Received: from home.quip.cz (grimm.quip.cz [213.220.192.218]) by mx1.FreeBSD.org (Postfix) with ESMTP id BD6DE43D53 for ; Thu, 3 Aug 2006 09:12:45 +0000 (GMT) (envelope-from 000.fbsd@quip.cz) Received: from [192.168.1.2] (qwork.quip.test [192.168.1.2]) by home.quip.cz (Postfix) with ESMTP id B7774527E; Thu, 3 Aug 2006 11:12:43 +0200 (CEST) Message-ID: <44D1BE0B.9090709@quip.cz> Date: Thu, 03 Aug 2006 11:12:43 +0200 From: Miroslav Lachman <000.fbsd@quip.cz> User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915 X-Accept-Language: cs, cz, en, en-us MIME-Version: 1.0 To: rick-freebsd@kiwi-computer.com References: <44D06650.1030803@quip.cz> <20060802183001.GA14279@megan.kiwi-computer.com> <44D10D1D.9040700@quip.cz> <20060802210709.GA15310@megan.kiwi-computer.com> <44D126EF.9070503@quip.cz> <44D12A80.9040802@quip.cz> <20060802233255.GB16385@megan.kiwi-computer.com> In-Reply-To: <20060802233255.GB16385@megan.kiwi-computer.com> Content-Type:
text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit Cc: freebsd-geom@freebsd.org Subject: Re: gmirror Cannot add disk ad5 to gm0 (error=22) X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 03 Aug 2006 09:12:46 -0000

Rick C. Petty wrote:
> On Thu, Aug 03, 2006 at 12:43:12AM +0200, Miroslav Lachman wrote:
>
>>Something is definitely wrong. Gmirror status still shows 0% after
>>couple of minutes (normaly synchronization progress is about 1% per minute)
>
>
> Under what conditions do you define "normally"? I think you can tweak
> the numbers to make it go faster or slower, and I think it's dependent
> upon (disk) idle time.

normally = a few days ago, same HW, same BIOS settings etc. A whole synchronization of the 250GB disks finished in about 90 minutes.

>>systat -vmstat shows less then 1MB/s instead of usual 40MB/s, but 100% busy.
>>
>>Disks    ad4   ad5
>>KB/t     121   128
>>tps        4     4
>>MB/s    0.45  0.45
>>% busy    83   103
>
>
> What other activity is happening on the box? Are you in the middle of a
> background fsck?

Almost no other activity. The system has apache, mysql, postfix etc. installed, but is not serving any requests. Fsck was not running.

> What does the output of "atacontrol mode ad4" (and ad5) show? Are you
> sure your "normal" synchronization happened when you were in IDE mode
> instead of AHCI?

Yes, the "normal" synchronization was in IDE mode. IDE mode was set more than a week ago, and as I play with gmirror I have run the synchronization many times.

# atacontrol mode ad4
current mode = SATA150
# atacontrol mode ad5
current mode = SATA150

>>Is there any chance to found source of problems without step by step
>>replacement of each component?
>
>
> That depends upon the problems. To diagnose anything, you need to be
> able to reliably bring down the mirror-- e.g. heavy disk activity.
>
>
>>I can't believe that I have bad cables in
>>4 new machines or bad hard drives in each machine... :o(
>
>
> I bought identical machines (cpus, boards, disks, cables, etc.) and had
> different results on each. Especially when you buy identical stuff,
> there is a small probability that they'll all have the same problems--
> for example, a bad batch of disks. In your case, I'd investigate which
> steps you have to preform to repeatably cause the failures. On my
> systems, the heavier the disk load, the higher the probability of failure.
> Upgrading to the latest 6.1-STABLE might help in some cases.

Same here - the heavier the disk load, the more often failures occur. After a few crashes, disks disappeared in the middle of gmirror synchronization (heavy disk load). The disk was replaced with a new one without success, then the whole server was replaced and ran fine for about 1 week under heavy test load (concurrent copying of the ports tree in an infinite loop). Now the problem mentioned above has occurred.

This time it seems to be a disk problem. Synchronization ran the whole night with tens or hundreds of messages like this:

ad5: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=9719424
ad5: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
ad5: error issuing SETFEATURES SET TRANSFER MODE command

After six hours I got a message from smartd:

Device: /dev/ad5, FAILED SMART self-check. BACK UP DATA NOW!
Device: /dev/ad5, 52 Currently unreadable (pending) sectors
Device: /dev/ad5, 52 Offline uncorrectable sectors

90 minutes later, the system rebooted itself and tried to rebuild provider ad5, and /var/log/messages is full of:

ad5: FAILURE - SETFEATURES SET TRANSFER MODE status=71 error=4
ad5: FAILURE - SETFEATURES ENABLE RCACHE status=71 error=4
ad5: FAILURE - SETFEATURES ENABLE WCACHE status=71 error=4
ad5: FAILURE - SET_MULTI status=71 error=4
ad5: TIMEOUT - READ_DMA retrying (1 retry left) LBA=1

1 hour later:

ad5: FAILURE - ATA_IDENTIFY status=71 error=4 LBA=0
ad5: FAILURE - ATAPI_IDENTIFY status=71 error=4 LBA=0
smartd[506]: Device: /dev/ad5, failed to read SMART Attribute Data

In the MRTG graphs I have the disk temperature (38°C) and the Reallocated Sector Count, which has been increasing since the synchronization started; after 5 hours the number of reallocated sectors went above 2000! (out of range of the graph)

After a manual reboot, there is no ad5 device. I hope a new drive helps, but I am still nervous, because I have had similar troubles with 2 machines (both replaced with new ones - so I have played with 4 machines)...

Thank you for your help.

Miroslav Lachman

From owner-freebsd-geom@FreeBSD.ORG Thu Aug 3 17:10:27 2006 Return-Path: X-Original-To: freebsd-geom@freebsd.org Delivered-To: freebsd-geom@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 891DC16A5B3 for ; Thu, 3 Aug 2006 17:10:27 +0000 (UTC) (envelope-from rick@kiwi-computer.com) Received: from kiwi-computer.com (megan.kiwi-computer.com [63.224.10.3]) by mx1.FreeBSD.org (Postfix) with SMTP id D96AE43D46 for ; Thu, 3 Aug 2006 17:10:26 +0000 (GMT) (envelope-from rick@kiwi-computer.com) Received: (qmail 23489 invoked by uid 2001); 3 Aug 2006 17:10:25 -0000 Date: Thu, 3 Aug 2006 12:10:25 -0500 From: "Rick C.
Petty" To: Miroslav Lachman <000.fbsd@quip.cz> Message-ID: <20060803171025.GA23405@megan.kiwi-computer.com> References: <44D06650.1030803@quip.cz> <20060802183001.GA14279@megan.kiwi-computer.com> <44D10D1D.9040700@quip.cz> <20060802210709.GA15310@megan.kiwi-computer.com> <44D126EF.9070503@quip.cz> <44D12A80.9040802@quip.cz> <20060802233255.GB16385@megan.kiwi-computer.com> <44D1BE0B.9090709@quip.cz> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <44D1BE0B.9090709@quip.cz> User-Agent: Mutt/1.4.2.1i Cc: freebsd-geom@freebsd.org Subject: Re: gmirror Cannot add disk ad5 to gm0 (error=22) X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: rick-freebsd@kiwi-computer.com List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 03 Aug 2006 17:10:27 -0000

On Thu, Aug 03, 2006 at 11:12:43AM +0200, Miroslav Lachman wrote:
> Rick C. Petty wrote:
>
> >What other activity is happening on the box? Are you in the middle of a
> >background fsck?
>
> Almost no other activities, system has installed apache, mysql, postfix
> etc., but not serving any requests. Fsck was not running.

But was any other process hitting the disk? You could try doing the synchronization in single-user mode and see if the throughput jumps up.

> Now it seems that it is disk problem this time.
[snip]
> After six hours I got message from smartd
> Device: /dev/ad5, FAILED SMART self-check. BACK UP DATA NOW!
> Device: /dev/ad5, 52 Currently unreadable (pending) sectors
> Device: /dev/ad5, 52 Offline uncorrectable sectors
[snip]
> smartd[506]: Device: /dev/ad5, failed to read SMART Attribute Data
>
> In MRTG graphs I got disk temperature (38°C) and Reallocated Sector
> Count which is increasing from time of synchronization start and after 5
> hours the number of reallocated sectors goes above 2000!
(out of range
> of the graph)

This certainly sounds like a disk-related problem. Likely your previous failures were due to the same problems. Time to send the disks back to the manufacturer for replacement.. :-/

> After manual reboot, there is no ad5 device. I hope new drive helps, but
> I am still nervous, because I have similar troubles with 2 machines
> (both replaced with new one - so I played with 4 machines)...

Chance of one "new" disk being bad-- pretty low.
Chance of two new disks being bad-- even lower.
Chance of three or more disks going bad around the same time-- much higher.

I've noticed this type of behavior before. Someone correct me if I'm wrong but it appears that you probably got a bad batch of disks. Try throwing a different set of disks on the boxes (preferably a different manufacturer). I would also try swapping with brand new high-quality cables (just because they're cheaper than new disks).

-- Rick C. Petty

From owner-freebsd-geom@FreeBSD.ORG Fri Aug 4 13:33:48 2006 Return-Path: X-Original-To: freebsd-geom@freebsd.org Delivered-To: freebsd-geom@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 074AC16A4DE for ; Fri, 4 Aug 2006 13:33:48 +0000 (UTC) (envelope-from anderson@centtech.com) Received: from mh2.centtech.com (moat3.centtech.com [207.200.51.50]) by mx1.FreeBSD.org (Postfix) with ESMTP id AAAF543D45 for ; Fri, 4 Aug 2006 13:33:47 +0000 (GMT) (envelope-from anderson@centtech.com) Received: from [10.177.171.220] (neutrino.centtech.com [10.177.171.220]) by mh2.centtech.com (8.13.1/8.13.1) with ESMTP id k74DXkka090077 for ; Fri, 4 Aug 2006 08:33:46 -0500 (CDT) (envelope-from anderson@centtech.com) Message-ID: <44D34CCC.30100@centtech.com> Date: Fri, 04 Aug 2006 08:34:04 -0500 From: Eric Anderson User-Agent: Thunderbird 1.5.0.5 (X11/20060802) MIME-Version: 1.0 To: freebsd-geom@freebsd.org Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.87.1/1634/Wed Aug 2 17:32:49 2006 on mh2.centtech.com X-Virus-Status: Clean Subject: gjournal - Cannot find journal geom X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 04 Aug 2006 13:33:48 -0000

Hi,
I'm setting up two large (10TB) GJOURNAL devices. All seems to go well, but after mounting the newly newfs'ed filesystems, I see messages like this:

Aug 4 08:23:27 snapshot1 kernel: GEOM_JOURNAL: Cannot find journal geom for /dev/label/vol11.
Aug 4 08:23:37 snapshot1 kernel: GEOM_JOURNAL: Cannot find journal geom for /dev/label/vol10.

Here's some info:

# gjournal list
Geom name: gjournal 2324645206
ID: 2324645206
Providers:
1. Name: da10.journal
   Mediasize: 10494183210496 (9.5T)
   Sectorsize: 512
   Mode: r1w1e2
Consumers:
1. Name: da9
   Mediasize: 7996964864 (7.4G)
   Sectorsize: 512
   Mode: r1w1e1
   Jend: 7996964352
   Jstart: 0
   Role: Journal
2. Name: da10
   Mediasize: 10494183211008 (9.5T)
   Sectorsize: 512
   Mode: r1w1e1
   Role: Data

Geom name: gjournal 3465440030
ID: 3465440030
Providers:
1. Name: da12.journal
   Mediasize: 10494183210496 (9.5T)
   Sectorsize: 512
   Mode: r1w1e2
Consumers:
1. Name: da11
   Mediasize: 7996964864 (7.4G)
   Sectorsize: 512
   Mode: r1w1e1
   Jend: 7996964352
   Jstart: 0
   Role: Journal
2. Name: da12
   Mediasize: 10494183211008 (9.5T)
   Sectorsize: 512
   Mode: r1w1e1
   Role: Data

# glabel list
Geom name: da12.journal
Providers:
1. Name: label/vol10
   Mediasize: 10494183209984 (9.5T)
   Sectorsize: 512
   Mode: r1w1e1
   secoffset: 0
   offset: 0
   seclength: 20496451582
   length: 10494183209984
   index: 0
Consumers:
1. Name: da12.journal
   Mediasize: 10494183210496 (9.5T)
   Sectorsize: 512
   Mode: r1w1e2

Geom name: da10.journal
Providers:
1. Name: label/vol11
   Mediasize: 10494183209984 (9.5T)
   Sectorsize: 512
   Mode: r1w1e1
   secoffset: 0
   offset: 0
   seclength: 20496451582
   length: 10494183209984
   index: 0
Consumers:
1. Name: da10.journal
   Mediasize: 10494183210496 (9.5T)
   Sectorsize: 512
   Mode: r1w1e2

glabel label -v vol10 /dev/da12.journal
glabel label -v vol11 /dev/da10.journal
newfs -O2 -U /dev/label/vol10
mount -oasync,noatime,gjournal /dev/label/vol10 /vol10
newfs -O2 -U /dev/label/vol11
mount -oasync,noatime,gjournal /dev/label/vol11 /vol11

# gjournal status -s
da10.journal  N/A  da9
da10.journal  N/A  da10
da12.journal  N/A  da11
da12.journal  N/A  da12

Aug 4 08:07:08 snapshot1 kernel: GEOM_LABEL: Label for provider da12.journal is label/vol10.
Aug 4 08:07:15 snapshot1 kernel: GEOM_LABEL: Label for provider da10.journal is label/vol11.

Aug 4 07:55:06 snapshot1 kernel: GEOM_JOURNAL: Journal 2324645206: da9 contains journal.
Aug 4 07:55:06 snapshot1 kernel: GEOM_JOURNAL: Journal 2324645206: da10 contains data.
Aug 4 07:55:06 snapshot1 kernel: GEOM_JOURNAL: Journal da10 clean.
Aug 4 07:55:06 snapshot1 kernel: GEOM_JOURNAL: BIO_FLUSH not supported by da9.
Aug 4 07:55:06 snapshot1 kernel: GEOM_JOURNAL: BIO_FLUSH not supported by da10.
Aug 4 07:55:06 snapshot1 kernel: GEOM_JOURNAL: Journal 3465440030: da11 contains journal.
Aug 4 07:55:06 snapshot1 kernel: GEOM_JOURNAL: Journal 3465440030: da12 contains data.
Aug 4 07:55:06 snapshot1 kernel: GEOM_JOURNAL: Journal da12 clean.
Aug 4 07:55:06 snapshot1 kernel: GEOM_JOURNAL: BIO_FLUSH not supported by da11.
Aug 4 07:55:06 snapshot1 kernel: GEOM_JOURNAL: BIO_FLUSH not supported by da12.

--
------------------------------------------------------------------------
Eric Anderson Sr. Systems Administrator Centaur Technology
Anything that works is better than anything that doesn't.
------------------------------------------------------------------------ From owner-freebsd-geom@FreeBSD.ORG Fri Aug 4 18:16:17 2006 Return-Path: X-Original-To: freebsd-geom@freebsd.org Delivered-To: freebsd-geom@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2158916A4DA for ; Fri, 4 Aug 2006 18:16:17 +0000 (UTC) (envelope-from pjd@garage.freebsd.pl) Received: from mail.garage.freebsd.pl (arm132.internetdsl.tpnet.pl [83.17.198.132]) by mx1.FreeBSD.org (Postfix) with ESMTP id 79FF943D6A for ; Fri, 4 Aug 2006 18:16:15 +0000 (GMT) (envelope-from pjd@garage.freebsd.pl) Received: by mail.garage.freebsd.pl (Postfix, from userid 65534) id F0B2B51396; Fri, 4 Aug 2006 20:16:13 +0200 (CEST) Received: from localhost (dkr217.neoplus.adsl.tpnet.pl [83.24.21.217]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.garage.freebsd.pl (Postfix) with ESMTP id 5478E50EA7; Fri, 4 Aug 2006 20:16:08 +0200 (CEST) Date: Fri, 4 Aug 2006 20:15:30 +0200 From: Pawel Jakub Dawidek To: Eric Anderson Message-ID: <20060804181530.GB3225@garage.freebsd.pl> References: <44D34CCC.30100@centtech.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="U+BazGySraz5kW0T" Content-Disposition: inline In-Reply-To: <44D34CCC.30100@centtech.com> X-PGP-Key-URL: http://people.freebsd.org/~pjd/pjd.asc X-OS: FreeBSD 7.0-CURRENT i386 User-Agent: mutt-ng/devel-r804 (FreeBSD) X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on mail.garage.freebsd.pl X-Spam-Level: X-Spam-Status: No, score=-0.5 required=3.0 tests=BAYES_00,RCVD_IN_NJABL_DUL, RCVD_IN_SORBS_DUL autolearn=no version=3.0.4 Cc: freebsd-geom@freebsd.org Subject: Re: gjournal - Cannot find journal geom X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , 
List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 04 Aug 2006 18:16:17 -0000

--U+BazGySraz5kW0T
Content-Type: text/plain; charset=iso-8859-2
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, Aug 04, 2006 at 08:34:04AM -0500, Eric Anderson wrote:
> Hi,
> I'm setting up two large (10TB) GJOURNAL devices. All seems to go well, but after mounting the newly newfs'ed filesystems, I see messages like this:
>
> Aug 4 08:23:27 snapshot1 kernel: GEOM_JOURNAL: Cannot find journal geom for /dev/label/vol11.
> Aug 4 08:23:37 snapshot1 kernel: GEOM_JOURNAL: Cannot find journal geom for /dev/label/vol10.

I assume you are using the older version, which used a hack to find out whether the file system is on top of gjournal: it looked at the name of the provider on which the file system is mounted. In your configuration you are using glabel on top of gjournal, which is why it doesn't work.

In the new version this is done properly: the file system uses BIO_GETATTR for this, which always works (as long as the layers in the middle pass BIO_GETATTR requests).

So basically your problem will be gone with the new version, which I am planning to publish next week. In the meantime you may want to mount the file systems directly from the /dev/.journal providers.

--
Pawel Jakub Dawidek http://www.wheel.pl
pjd@FreeBSD.org http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!
--U+BazGySraz5kW0T Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.4 (FreeBSD) iD8DBQFE047CForvXbEpPzQRAngJAKCQIYpyBQi3++Q/4jsFlUam8FPnsACg4IOe +5tp3QWu7D1NFSjP1zU8cOM= =CRCv -----END PGP SIGNATURE----- --U+BazGySraz5kW0T-- From owner-freebsd-geom@FreeBSD.ORG Sat Aug 5 18:33:18 2006 Return-Path: X-Original-To: freebsd-geom@freebsd.org Delivered-To: freebsd-geom@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2E50816A4EE for ; Sat, 5 Aug 2006 18:33:18 +0000 (UTC) (envelope-from 000.fbsd@quip.cz) Received: from home.quip.cz (grimm.quip.cz [213.220.192.218]) by mx1.FreeBSD.org (Postfix) with ESMTP id 37C4143DAC for ; Sat, 5 Aug 2006 18:31:24 +0000 (GMT) (envelope-from 000.fbsd@quip.cz) Received: from [192.168.1.2] (qwork.quip.test [192.168.1.2]) by home.quip.cz (Postfix) with ESMTP id 54BE153AB; Sat, 5 Aug 2006 20:31:15 +0200 (CEST) Message-ID: <44D4E3F2.5060709@quip.cz> Date: Sat, 05 Aug 2006 20:31:14 +0200 From: Miroslav Lachman <000.fbsd@quip.cz> User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915 X-Accept-Language: cs, cz, en, en-us MIME-Version: 1.0 To: rick-freebsd@kiwi-computer.com References: <44D06650.1030803@quip.cz> <20060802183001.GA14279@megan.kiwi-computer.com> <44D10D1D.9040700@quip.cz> <20060802210709.GA15310@megan.kiwi-computer.com> <44D126EF.9070503@quip.cz> <44D12A80.9040802@quip.cz> <20060802233255.GB16385@megan.kiwi-computer.com> <44D1BE0B.9090709@quip.cz> <20060803171025.GA23405@megan.kiwi-computer.com> In-Reply-To: <20060803171025.GA23405@megan.kiwi-computer.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-geom@freebsd.org Subject: Re: gmirror Cannot add disk ad5 to gm0 (error=22) X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 05 Aug 2006 18:33:18 -0000

Rick C. Petty wrote:
> On Thu, Aug 03, 2006 at 11:12:43AM +0200, Miroslav Lachman wrote:

[...]

> This certainly sounds like a disk-related problem. Likely your previous
> failures were due to the same problems. Time to send the disks back to the
> manufacturer for replacement.. :-/

The disk was replaced with a new one, and synchronization was much slower than usual (more than 5 hours instead of 1:30).

When I run gstat, the disks are idle (0% busy). Then I start a dd command to write a test file to the disk, and I see that dd writes about 10 times slower than on another "single disk" system.

root@track = 2x 250GB Seagate SATA disks in gmirror + P4 3GHz + 1GB RAM
root@grimm2 = 1x 300GB Samsung ATA disk + AMD Burton 2500 + 256MB RAM

Both machines have 6.1-RELEASE and similar disk partitions.

root@track /vol0/# dd if=/dev/zero of=/vol0/test bs=1m count=10
10+0 records in
10+0 records out
10485760 bytes transferred in 1.639139 secs (6397114 bytes/sec)

root@track /vol0/# dd if=/dev/zero of=/vol0/test bs=1m count=128
128+0 records in
128+0 records out
134217728 bytes transferred in 24.923024 secs (5385291 bytes/sec)

root@track /vol0/# dd if=/dev/zero of=/vol0/test bs=1m count=1024
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 190.751349 secs (5629013 bytes/sec)

root@grimm2 ~/# dd if=/dev/zero of=/vol0/test bs=1m count=10
10+0 records in
10+0 records out
10485760 bytes transferred in 0.173180 secs (60548295 bytes/sec)

root@grimm2 ~/# dd if=/dev/zero of=/vol0/test bs=1m count=128
128+0 records in
128+0 records out
134217728 bytes transferred in 2.479431 secs (54132468 bytes/sec)

root@grimm2 ~/# dd if=/dev/zero of=/vol0/test bs=1m count=1024
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 21.136426 secs (50800539 bytes/sec)

grimm2 is almost 10 times faster! Is it expected that gmirror is 10 times slower? I don't think so.
Does anybody know how I can "debug" this slowness?

>>After manual reboot, there is no ad5 device. I hope new drive helps, but
>>I am still nervous, because I have similar troubles with 2 machines
>>(both replaced with new one - so I played with 4 machines)...
>
>
> Chance of one "new" disk being bad-- pretty low.
> Chance of two new disks being bad-- even lower.
> Chance of three or more disks going bad around the same time-- much higher.
>
> I've noticed this type of behavior before. Someone correct me if I'm
> wrong but it appears that you probably got a bad batch of disks. Try
> throwing a different set of disks on the boxes (preferrably a different
> manufacturer). I would also try swapping with brand new high-quality
> cables (just because they're cheaper than new disks).
>
> -- Rick C. Petty

Thanks for your time and help.

If anything can go bad, then everything will go bad :(

(I am sorry if my English is bad, I think you know what I mean)
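
As a quick check of the "almost 10 times" figure, the elapsed times that dd reported in this message can be compared directly, since both machines wrote the same number of bytes in each run. This is only an illustrative sketch: the names `runs` and `slowdown` are made up for the example, and the numbers are copied from the dd output quoted in this message.

```python
# Elapsed seconds reported by dd for the same write sizes on both hosts.
# Values are copied from the dd output in the message; the dictionary
# layout and the names used here are illustrative assumptions.
runs = {
    "10 MiB": {"track": 1.639139, "grimm2": 0.173180},
    "128 MiB": {"track": 24.923024, "grimm2": 2.479431},
    "1024 MiB": {"track": 190.751349, "grimm2": 21.136426},
}


def slowdown(track_secs, grimm2_secs):
    """Return how many times longer the gmirror host (track) took."""
    return track_secs / grimm2_secs


# Same byte count per run on both hosts, so the time ratio equals the
# throughput ratio.
ratios = {size: slowdown(t["track"], t["grimm2"]) for size, t in runs.items()}

for size, ratio in ratios.items():
    print(f"{size}: track took {ratio:.1f}x as long as grimm2")
```

The ratio stays between roughly 9 and 10 for all three sizes, so the slowdown does not seem to depend on how much data is written.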