From owner-freebsd-fs@FreeBSD.ORG Sun May 18 07:11:47 2008
Message-Id: <683A6ED2-0E54-42D7-8212-898221C05150@thefrog.net>
From: Andrew Hill
To: freebsd-fs@freebsd.org
Date: Sun, 18 May 2008 17:11:37 +1000
Subject: ZFS lockup in "zfs" state

> The following patch, published some time ago by pjd helped me:
> http://mbsd.msk.ru/dist/zfs_lockup.diff
>
> 100+ days of uptime of heavily loaded machines and no problems so far.
>
> Hope it would help.

I applied this patch with some modifications to fix up the file names as
they seem to have moved from

- src/sys/contrib/opensolaris/uts/common/fs/zfs/sys/zil.h
- src/sys/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c
- src/sys/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c

to

- src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zil.h
- src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c
- src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c

(and pointed the kernel configuration file, MASSHOSTING_7_64, to my own
kernel config)

buildworld and buildkernel succeeded without error, but when i installed
the new kernel and rebooted i got the following output (the important
point being the failure to load zfs on the 8th line)

May 17 17:02:06 <0.2> gutter kernel: Copyright (c) 1992-2008 The FreeBSD Project.
May 17 17:02:06 <0.2> gutter kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
May 17 17:02:06 <0.2> gutter kernel: The Regents of the University of California. All rights reserved.
May 17 17:02:06 <0.2> gutter kernel: FreeBSD is a registered trademark of The FreeBSD Foundation.
May 17 17:02:06 <0.2> gutter kernel: FreeBSD 7.0-STABLE #6: Sat May 17 16:39:32 EST 2008
May 17 17:02:06 <0.2> gutter kernel: root@gutter.thefrog.net:/usr/obj/usr/src/sys/GUTTER
May 17 17:02:06 <0.2> gutter kernel: link_elf_obj: symbol kproc_exit undefined
May 17 17:02:06 <0.2> gutter kernel: KLD file zfs.ko - could not finalize loading
May 17 17:02:06 <0.2> gutter kernel: Timecounter "i8254" frequency 1193182 Hz quality 0
May 17 17:02:06 <0.2> gutter kernel: CPU: AMD Athlon(tm) 64 Processor 3200+ (2010.31-MHz K8-class CPU)
May 17 17:02:06 <0.2> gutter kernel: Origin = "AuthenticAMD" Id = 0x10ff0 Stepping = 0
May 17 17:02:06 <0.2> gutter kernel: Features =0x78bfbff
May 17 17:02:06 <0.2> gutter kernel: AMD Features=0xe2500800
May 17 17:02:06 <0.2> gutter kernel: AMD Features2=0x1
May 17 17:02:06 <0.2> gutter kernel: usable memory = 2137882624 (2038 MB)
May 17 17:02:06 <0.2> gutter kernel: avail memory = 2060988416 (1965 MB)
May 17 17:02:06 <0.2> gutter kernel: ACPI APIC Table:
May 17 17:02:06 <0.2> gutter kernel: ioapic0 irqs 0-23 on motherboard
May 17 17:02:06 <0.2> gutter kernel: ad0: 238475MB at ata0-master UDMA100
May 17 17:02:06 <0.2> gutter kernel: ad2: 238475MB at ata1-master UDMA100
May 17 17:02:06 <0.2> gutter kernel: ad3: 152627MB at ata1-slave UDMA100
May 17 17:02:06 <0.2> gutter kernel: ad4: 476940MB at ata2-master SATA300
May 17 17:02:06 <0.2> gutter kernel: ad6: 715404MB at ata3-master SATA300
May 17 17:02:06 <0.2> gutter kernel: ad8: 305245MB at ata4-master SATA300
May 17 17:02:06 <0.2> gutter kernel: ad10: 305245MB at ata5-master SATA300
May 17 17:02:06 <0.2> gutter kernel: ad12: 305245MB at ata6-master SATA150
May 17 17:02:06 <0.2> gutter kernel: Trying to mount root from zfs:tank/root
May 17 17:02:06 <0.2> gutter kernel:
May 17 17:02:06 <0.2> gutter kernel: Manual root filesystem specification:
May 17 17:02:06 <0.2> gutter kernel: : Mount using filesystem
May 17 17:02:06 <0.2> gutter kernel: eg. ufs:da0s1a
May 17 17:02:06 <0.2> gutter kernel: ? List valid disk boot devices
May 17 17:02:06 <0.2> gutter kernel: Abort manual input
May 17 17:02:06 <0.2> gutter kernel:
May 17 17:02:06 <0.2> gutter kernel: mountroot>

at this point, since zfs has not been loaded, obviously i could not get it
to mount root from zfs:tank/root, and resorted to a backup ufs root to put
my old kernel back in place

i'm not sure if there is more output available than just the "could not
finalize loading", if so please let me know where to look and i'd love to
re-test this patch if it'll provide more information

right now, i'm getting uptimes in the order of days before everything
locks up, i assume it's related to this bug, though i'm also getting the
following output when it locks up

ad2: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=350494631
ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=234920650
ad2: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=443427007
ad0: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=350174938
ad2: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=350494631
ad0: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=234920650
ad2: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=443427007
ad0: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=350174938
ad2: FAILURE - WRITE_DMA48 timed out LBA=350494631
ad0: FAILURE - WRITE_DMA timed out LBA=234920650
ad2: FAILURE - WRITE_DMA48 timed out LBA=443427007
ad0: FAILURE - WRITE_DMA48 timed out LBA=350174938

typically repeated for a number of different LBA values before the system
panics. I don't know if this is more likely to be related to the cause of
the lockups (e.g. faulty hardware/driver) or if it's an effect of the
lockup (e.g. waiting on a deadlocked thread)... from what i've found
searching mailing lists, this kind of error seems to turn up with faulty
hardware/drivers so i guess it could just be that zfs exposes the faults
because it's using the hardware differently to my previous ufs setup...

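One way to see whether the timeouts cluster on particular disks is to
tally the messages per device. The following is only a rough Python
sketch, assuming the console output has been saved as text lines in the
format quoted above. The function name tally_ata_events, the regular
expression, and the SAMPLE excerpt are illustrative assumptions and not
part of any FreeBSD tool.

```python
import re
from collections import Counter

# Assumed line format, taken from the messages quoted above, e.g.
#   ad2: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=350494631
#   ad0: FAILURE - WRITE_DMA timed out LBA=234920650
ATA_EVENT = re.compile(
    r"^(?P<dev>ad\d+): (?P<kind>TIMEOUT|FAILURE) - (?P<cmd>\w+) .*LBA=(?P<lba>\d+)"
)


def tally_ata_events(lines):
    """Count (device, kind) pairs; lines that do not match are ignored."""
    counts = Counter()
    for line in lines:
        match = ATA_EVENT.match(line.strip())
        if match:
            counts[(match.group("dev"), match.group("kind"))] += 1
    return counts


# Four lines copied from the log above, used only as sample input.
SAMPLE = """\
ad2: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=350494631
ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=234920650
ad2: FAILURE - WRITE_DMA48 timed out LBA=350494631
ad0: FAILURE - WRITE_DMA timed out LBA=234920650
"""

if __name__ == "__main__":
    for (dev, kind), n in sorted(tally_ata_events(SAMPLE.splitlines()).items()):
        print(dev, kind, n)
```

Feeding it the saved output from several lockups would show at a glance
which devices report timeouts together.
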
in terms of my specific setup, i have 2gb ram, i'm running from up-to-date
-STABLE source (apart from my attempt to apply the aforementioned patch),
i'm running an amd64 kernel, and my /boot/loader.conf looks like this:

vm.kmem_size_max="1610612736"
vm.kmem_size="1610612736"
zfs_load="YES"
vfs.root.mountfrom="zfs:tank/root"
vfs.zfs.prefetch_disable="1"
vfs.zfs.arc_max="838860800"

the last line was an attempt to reduce the amount of arc cache in the
kernel in case it was having trouble locating memory blocks for other
things (as the default value had it at 1.2gb) but adding that parameter
doesn't seem to have had any effect

anyway, any info toward resolving this would be greatly appreciated, and
otherwise let me know what further info i can provide to help track down
the problem

Andrew

From owner-freebsd-fs@FreeBSD.ORG Sun May 18 12:33:59 2008
Date: Mon, 19 May 2008 07:38:36 +1930
From: "Ighighi Ighighi"
To: freebsd-fs@freebsd.org.
Subject: Incorrect handling of UF_IMMUTABLE & UF_APPEND flags on EXT2FS

Almost 2 months have passed since I submitted this PR through GNATS
(which cc'd it to freebsd-bugs), so I thought that maybe I should
forward it to this list so it gets the attention it deserves:

http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/122047
http://lists.freebsd.org/pipermail/freebsd-bugs/2008-March/029740.html

Some errata notes:
The bug may be present in REISERFS, but there's no write support anyway.

Salutes,
Igh

From owner-freebsd-fs@FreeBSD.ORG Sun May 18 12:42:17 2008
Date: Sun, 18 May 2008 05:42:17 -0700
From: Jeremy Chadwick
To: Andrew Hill
Message-ID: <20080518124217.GA16222@eos.sc1.parodius.com>
References: <683A6ED2-0E54-42D7-8212-898221C05150@thefrog.net>
In-Reply-To: <683A6ED2-0E54-42D7-8212-898221C05150@thefrog.net>
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS lockup in "zfs" state

On Sun, May 18, 2008 at 05:11:37PM +1000, Andrew Hill wrote:
> right now, i'm getting uptimes in the order of days before everything locks
> up, i assume it's related to this bug, though i'm also getting the following
> output when it locks up
>
> ad2: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=350494631
> ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=234920650
> ad2: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=443427007
> ad0: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=350174938
> ad2: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=350494631
> ad0: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=234920650
> ad2: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=443427007
> ad0: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=350174938
> ad2: FAILURE - WRITE_DMA48 timed out LBA=350494631
> ad0: FAILURE - WRITE_DMA timed out LBA=234920650
> ad2: FAILURE - WRITE_DMA48 timed out LBA=443427007
> ad0: FAILURE - WRITE_DMA48 timed out LBA=350174938

I've documented this fairly well, although I suppose I could write up a
diagnosis method as an addendum. Anyway:

http://wiki.freebsd.org/JeremyChadwick/Commonly_reported_issues

One thing: are the timeouts always on ad0 and ad2?

> typically repeated for a number of different LBA values before the system
> panics. I don't know if this is more likely to be related to the cause of
> the lockups (e.g. faulty hardware/driver) or if it's an effect of the lockup
> (e.g. waiting on a deadlocked thread)... from what i've found searching
> mailing lists, this kind of error seems to turn up with faulty
> hardware/drivers so i guess it could just be that zfs exposes the faults
> because it's using the hardware differently to my previous ufs setup...

It is possible you have some bad hardware, but there are many of us who
have seen the above (with or without ZFS) on perfectly good hardware.
For some, changing cables fixed the problem, while for others absolutely
nothing fixed it (changed cables, changed controller brands, changed to
new disks).

If the DMA timeouts are easily reproducible, please get in touch with
Scott Long, who is in the process of researching why these happen.
Serial console access might be required.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |

From owner-freebsd-fs@FreeBSD.ORG Sun May 18 15:12:02 2008
Message-Id: <93F07874-8D5F-44AE-945F-803FFC3B9279@thefrog.net>
From: Andrew Hill
To: Jeremy Chadwick
In-Reply-To: <20080518124217.GA16222@eos.sc1.parodius.com>
Date: Mon, 19 May 2008 01:11:54 +1000
References: <683A6ED2-0E54-42D7-8212-898221C05150@thefrog.net>
 <20080518124217.GA16222@eos.sc1.parodius.com>
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS lockup in "zfs" state

On 18/05/2008, at 10:42 PM, Jeremy Chadwick wrote:
> One thing: are the timeouts always on ad0 and ad2?

firstly, some relevant output from my dmesg

atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f at device 6.0 on pci0
atapci1: port 0x9f0-0x9f7,0xbf0-0xbf3,0x970-0x977,0xb70-0xb73,0xcc00-0xcc0f mem 0xf4005000-0xf4005fff irq 21 at device 7.0 on pci0
atapci2: port 0x9e0-0x9e7,0xbe0-0xbe3,0x960-0x967,0xb60-0xb63,0xe000-0xe00f mem 0xf4000000-0xf4000fff irq 22 at device 8.0 on pci0
atapci3: port 0x8400-0x8407,0x8800-0x8803,0x8c00-0x8c07,0x9000-0x9003,0x9400-0x940f mem 0xf1004000-0xf10043ff irq 17 at device 9.0 on pci1
ad0: 238475MB at ata0-master UDMA100
ad2: 238475MB at ata1-master UDMA100
ad3: 152627MB at ata1-slave UDMA100
ad4: 476940MB at ata2-master SATA300
ad6: 715404MB at ata3-master SATA300
ad8: 305245MB at ata4-master SATA300
ad10: 305245MB at ata5-master SATA300
ad12: 305245MB at ata6-master SATA150

and to answer the question, no, i get timeouts on ad0, 2, 4, 6, 8, 10 and
12, but when they occur it's always 1 or 2 disks...

for various reasons (primarily focusing on space and low-cost, not
performance) i have a 7 disk raidz covering a 250GB slice on each of the
above 7 disks, and i've made two more zpools from the remaining space on
the drives - and yes, i realise this is a bit of a mess and anyone who's
set up any kind of production raid would be appalled, but the aim was to
make use of some old disks more so than to have a fast/clean setup

ad0,2,3 are on the nvidia (southbridge) ata controller
ad4,6,8,10 are on the nvidia (southbridge) sata controller
ad12 is on the SiI 3114 controller

so perhaps i can contribute something useful here because of my (odd)
setup? my timeouts aren't limited to any one
drive/controller/connector-type - i've had timeouts on all 7 of the drives
in the raidz (i've yet to see a timeout on ad3 but that disk is rarely
accessed so i'm not entirely surprised)

i tend to find that the timeouts occur on one or two disks at once - e.g.
ad0 and 2 will complain of timeouts, and the system locks up shortly
thereafter...
the pairs seem to be grouped by the ata controller... which is to say, i
often get ad0 and 2 timeouts together, or two of ad4,6,8,10, or 12 on its
own... i'm not 100% sure as i've not recorded the pairs each time, but it
seems like there's a strong correlation between the drives giving timeouts
and the controller they're running on. this might imply it's a bug in the
controller driver? or it might simply be an effect of the timing of the
writes at some level... this correlation seems interesting though, and
i've only just noticed it so i'll be keeping track of future timeouts to
see if they consistently pair up within a controller

there is the obvious power question (8 drives in a standard PC case... my
initial guess was power) but i've hooked up a (Fluke 111) multimeter to
log the 5 and 12V rails going to the drives, and it's been a steady 5.4
and 12.3 V (including during a timeout and lockup) - these both varied by
less than 0.1V over fairly long test periods - so i don't think it's
power, but i'm willing to keep testing anything...

i've also run memtest86 on the ram fearing that might have been the
cause...

> It is possible you have some bad hardware, but there are many of us who
> have seen the above (with or without ZFS) on perfectly good hardware.
> For some, changing cables fixed the problem, while for others absolutely
> nothing fixed it (changed cables, changed controller brands, changed to
> new disks).

i'm inclined to think that the disks/cables themselves are good (given the
timeouts aren't specific to one disk) and given the ram is okay (from the
memtest at least), and the timeouts are occurring on multiple controllers,
i think this suggests the controllers are probably okay... (i guess it
could be in the northbridge or bus still...)

> If the DMA timeouts are easily reproducible, please get in touch with
> Scott Long, who is in the process of researching why these happen.
> Serial console access might be required.

will do, thanks for the contacts/wiki page (:

Andrew

From owner-freebsd-fs@FreeBSD.ORG Mon May 19 11:06:52 2008
Date: Mon, 19 May 2008 11:06:51 GMT
Message-Id: <200805191106.m4JB6pPA011555@freefall.freebsd.org>
From: FreeBSD bugmaster
To: freebsd-fs@FreeBSD.org
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org

Current FreeBSD problem reports
Critical problems
Serious problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/112658  fs         [smbfs] [patch] smbfs and caching problems (resolves b
o kern/114676  fs         [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o kern/116170  fs         [panic] Kernel panic when mounting /tmp
o bin/121072   fs         [smbfs] mount_smbfs(8) cannot normally convert the cha
o bin/122172   fs         [amd] [fs]: amd(8) automount daemon dies on 6.3-STABLE
o kern/122888  fs         [zfs] zfs hang w/ prefetch on, zil off while running t

6 problems total.

Non-critical problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o bin/113049   fs         [patch] [request] make quot(8) use getopt(3) and show
o bin/113838   fs         [patch] [request] mount(8): add support for relative p
o bin/114468   fs         [patch] [request] add -d option to umount(8) to detach
o kern/114847  fs         [ntfs] [patch] [request] dirmask support for NTFS ala
o kern/114955  fs         [cd9660] [patch] [request] support for mask,dirmask,ui
o bin/118249   fs         mv(1): moving a directory changes its mtime

6 problems total.

From owner-freebsd-fs@FreeBSD.ORG Mon May 19 20:30:52 2008
From: Peter Schuller
To: freebsd-fs@freebsd.org
Date: Mon, 19 May 2008 22:31:38 +0200
References: <48252C89.8@FreeBSD.org>
In-Reply-To: <48252C89.8@FreeBSD.org>
Message-Id: <200805192231.46561.peter.schuller@infidyne.com>
Cc: Martin Matuska
Subject: Re: ZFS lockup in "zfs" state

> I just experienced the same lockup in zfs state as other people did
> (Ivan Voras, Peter Schuller) - UFS filesystems still intact.
> There was heavy backup tar/gzip activity on the filesystem (read-only,
> write was to NFS) and lots of reads via NFS, the server was doing this
> job without problems for 11 days.

FWIW, I've seen it a few more times on two different machines. Both running
semi-new FreeBSD (I still don't think I ever saw this on earlier CURRENT:s).

In the case of both machines, the machine is only selectively hung. Possibly
limited to the zfs file system - definitely not global to the pool. A remote
reboot -q -n has been useful to recover without console access.

In both of these cases, more or less all activity of any amount is on ZFS
file systems. One of them has only ZFS except for swap on a UFS file system.
The other has root on UFS, but no bulk operations whatsoever happening
(beyond the usual periodics) except on ZFS. No NFS on either machine.

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller'
Key retrieval: Send an E-Mail to getpgpkey@scode.org
E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org

From owner-freebsd-fs@FreeBSD.ORG Sat May 24 01:09:02 2008
Message-ID: <48376AA3.6090205@delphij.net>
Date: Fri, 23 May 2008 18:08:51 -0700
From: Xin LI
Organization: The FreeBSD Project
To: freebsd-fs@freebsd.org, Jeff Roberson
Subject: vfs.lookup_shared
Reply-To: d@delphij.net

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

Is there any reason behind we don't have vfs.lookup_shared enabled by
default?

Cheers,
- -- 
** Help China's quake relief at http://www.redcross.org.cn/
|>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Xin LI http://www.delphij.net/
FreeBSD - The Power to Serve!
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.9 (FreeBSD)

iEYEARECAAYFAkg3aqIACgkQi+vbBBjt66B+kgCeN+tTUiiLkPGLsAVNkrJtgSe2
SKwAoKS15lX6IvL+9ej+ys5H2XKz3GpB
=gqHL
-----END PGP SIGNATURE-----

From owner-freebsd-fs@FreeBSD.ORG Sat May 24 03:38:33 2008
Date: Fri, 23 May 2008 17:16:16 -1000 (HST)
From: Jeff Roberson
To: d@delphij.net
In-Reply-To: <48376AA3.6090205@delphij.net>
Message-ID: <20080523171509.K954@desktop>
References: <48376AA3.6090205@delphij.net>
Cc: freebsd-fs@freebsd.org, Jeff Roberson
Subject: Re: vfs.lookup_shared

On Fri, 23 May 2008, Xin LI wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi,
>
> Is there any reason behind we don't have vfs.lookup_shared enabled by
> default?

We have discussed enabling it by default once the ffs shared lookup
support is complete. Unfortunately ffs is still not 100% reliable. I
want to verify that it's an ffs problem and not a problem with the vfs
generic code which would affect all filesystems before we enable it by
default.

Jeff

>
> Cheers,
> - -- 
> ** Help China's quake relief at http://www.redcross.org.cn/
> |>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> Xin LI http://www.delphij.net/
> FreeBSD - The Power to Serve!
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.9 (FreeBSD)
>
> iEYEARECAAYFAkg3aqIACgkQi+vbBBjt66B+kgCeN+tTUiiLkPGLsAVNkrJtgSe2
> SKwAoKS15lX6IvL+9ej+ys5H2XKz3GpB
> =gqHL
> -----END PGP SIGNATURE-----
>

From owner-freebsd-fs@FreeBSD.ORG Sat May 24 08:52:24 2008
Date: Sat, 24 May 2008 08:52:24 +0000
From: Kris Kennaway
To: Jeff Roberson
Message-ID: <20080524085224.GL20868@hub.freebsd.org>
References: <48376AA3.6090205@delphij.net> <20080523171509.K954@desktop>
In-Reply-To: <20080523171509.K954@desktop>
Cc: freebsd-fs@freebsd.org, Jeff Roberson, d@delphij.net
Subject: Re: vfs.lookup_shared

On Fri, May 23, 2008 at 05:16:16PM -1000, Jeff Roberson wrote:
> On Fri, 23 May 2008, Xin LI wrote:
>
> >-----BEGIN PGP SIGNED MESSAGE-----
> >Hash: SHA1
> >
> >Hi,
> >
> >Is there any reason behind we don't have vfs.lookup_shared enabled by
> >default?
>
> We have discussed enabling it by default once the ffs shared lookup
> support is complete. Unfortunately ffs is still not 100% reliable. I
> want to verify that it's an ffs problem and not a problem with the vfs
> generic code which would affect all filesystems before we enable it by
> default.

Also, until Attilio's recent lockmgr work, shared lockmgr locks were
starving exclusive lockmgr lock requests, leading to performance
problems on some workloads with the only filesystem that supported
shared locking (NFS). This is now fixed though.

Kris

-- 
In God we Trust -- all others must submit an X.509 certificate.
    -- Charles Forsythe