Date: Sun, 18 May 2008 17:11:37 +1000
From: Andrew Hill <lists@thefrog.net>
To: freebsd-fs@freebsd.org
Subject: ZFS lockup in "zfs" state
Message-ID: <683A6ED2-0E54-42D7-8212-898221C05150@thefrog.net>

> The following patch, published some time ago by pjd helped me:
> http://mbsd.msk.ru/dist/zfs_lockup.diff
>
> 100+ days of uptime of heavily loaded machines and no problems so far.
>
> Hope it would help.

I applied this patch with some modifications to fix up the file names,
as they seem to have moved from

 - src/sys/contrib/opensolaris/uts/common/fs/zfs/sys/zil.h
 - src/sys/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c
 - src/sys/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c

to

 - src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zil.h
 - src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c
 - src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c

(and pointed the kernel configuration file, MASSHOSTING_7_64, to my own
kernel config).

buildworld and buildkernel succeeded without error, but when I installed
the new kernel and rebooted I got the following output (the important
point being the failure to load zfs on the 8th line):

May 17 17:02:06 <0.2> gutter kernel: Copyright (c) 1992-2008 The FreeBSD Project.
May 17 17:02:06 <0.2> gutter kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
May 17 17:02:06 <0.2> gutter kernel: The Regents of the University of California. All rights reserved.
May 17 17:02:06 <0.2> gutter kernel: FreeBSD is a registered trademark of The FreeBSD Foundation.
May 17 17:02:06 <0.2> gutter kernel: FreeBSD 7.0-STABLE #6: Sat May 17 16:39:32 EST 2008
May 17 17:02:06 <0.2> gutter kernel: root@gutter.thefrog.net:/usr/obj/usr/src/sys/GUTTER
May 17 17:02:06 <0.2> gutter kernel: link_elf_obj: symbol kproc_exit undefined
May 17 17:02:06 <0.2> gutter kernel: KLD file zfs.ko - could not finalize loading
May 17 17:02:06 <0.2> gutter kernel: Timecounter "i8254" frequency 1193182 Hz quality 0
May 17 17:02:06 <0.2> gutter kernel: CPU: AMD Athlon(tm) 64 Processor 3200+ (2010.31-MHz K8-class CPU)
May 17 17:02:06 <0.2> gutter kernel: Origin = "AuthenticAMD" Id = 0x10ff0 Stepping = 0
May 17 17:02:06 <0.2> gutter kernel: Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
May 17 17:02:06 <0.2> gutter kernel: AMD Features=0xe2500800<SYSCALL,NX,MMX+,FFXSR,LM,3DNow!+,3DNow!>
May 17 17:02:06 <0.2> gutter kernel: AMD Features2=0x1<LAHF>
May 17 17:02:06 <0.2> gutter kernel: usable memory = 2137882624 (2038 MB)
May 17 17:02:06 <0.2> gutter kernel: avail memory = 2060988416 (1965 MB)
May 17 17:02:06 <0.2> gutter kernel: ACPI APIC Table: <Nvidia AWRDACPI>
May 17 17:02:06 <0.2> gutter kernel: ioapic0 <Version 1.1> irqs 0-23 on motherboard
<snip>
May 17 17:02:06 <0.2> gutter kernel: ad0: 238475MB <Hitachi HDS722525VLAT80 V36OA60A> at ata0-master UDMA100
May 17 17:02:06 <0.2> gutter kernel: ad2: 238475MB <WDC WD2500PB-98FBA0 15.05R15> at ata1-master UDMA100
May 17 17:02:06 <0.2> gutter kernel: ad3: 152627MB <Seagate ST3160812A 3.AAE> at ata1-slave UDMA100
May 17 17:02:06 <0.2> gutter kernel: ad4: 476940MB <Seagate ST3500320AS SD15> at ata2-master SATA300
May 17 17:02:06 <0.2> gutter kernel: ad6: 715404MB <Seagate ST3750330AS SD15> at ata3-master SATA300
May 17 17:02:06 <0.2> gutter kernel: ad8: 305245MB <Seagate ST3320620AS 3.AAK> at ata4-master SATA300
May 17 17:02:06 <0.2> gutter kernel: ad10: 305245MB <Seagate ST3320620AS 3.AAE> at ata5-master SATA300
May 17 17:02:06 <0.2> gutter kernel: ad12: 305245MB <Seagate ST3320620AS 3.AAE> at ata6-master SATA150
May 17 17:02:06 <0.2> gutter kernel: Trying to mount root from zfs:tank/root
May 17 17:02:06 <0.2> gutter kernel:
May 17 17:02:06 <0.2> gutter kernel: Manual root filesystem specification:
May 17 17:02:06 <0.2> gutter kernel: <fstype>:<device> Mount <device> using filesystem <fstype>
May 17 17:02:06 <0.2> gutter kernel: eg. ufs:da0s1a
May 17 17:02:06 <0.2> gutter kernel: ? List valid disk boot devices
May 17 17:02:06 <0.2> gutter kernel: <empty line> Abort manual input
May 17 17:02:06 <0.2> gutter kernel:
May 17 17:02:06 <0.2> gutter kernel: mountroot>

At this point, since zfs had not been loaded, I could not get it to
mount root from zfs:tank/root, and resorted to a backup ufs root to put
my old kernel back in place.

I'm not sure if there is more output available than just the "could not
finalize loading" message. If so, please let me know where to look; I'd
love to re-test this patch if it'll provide more information.

Right now I'm getting uptimes on the order of days before everything
locks up. I assume it's related to this bug, though I'm also getting the
following output when it locks up:

ad2: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=350494631
ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=234920650
ad2: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=443427007
ad0: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=350174938
ad2: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=350494631
ad0: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=234920650
ad2: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=443427007
ad0: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=350174938
ad2: FAILURE - WRITE_DMA48 timed out LBA=350494631
ad0: FAILURE - WRITE_DMA timed out LBA=234920650
ad2: FAILURE - WRITE_DMA48 timed out LBA=443427007
ad0: FAILURE - WRITE_DMA48 timed out LBA=350174938

This is typically repeated for a number of different LBA values before
the system panics.

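As a rough illustration only, the timeout and failure lines quoted above could be tallied per disk to see whether the errors cluster on one device or one controller. The sketch below is Python; the regular expression, the `tally` helper, and the variable names are assumptions made for this example, and the sample text is copied from the console output above rather than read from a live log.

```python
import re
from collections import Counter, defaultdict

# Sample lines copied from the console output quoted above.
LOG = """\
ad2: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=350494631
ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=234920650
ad2: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=443427007
ad0: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=350174938
ad2: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=350494631
ad0: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=234920650
ad2: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=443427007
ad0: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=350174938
ad2: FAILURE - WRITE_DMA48 timed out LBA=350494631
ad0: FAILURE - WRITE_DMA timed out LBA=234920650
ad2: FAILURE - WRITE_DMA48 timed out LBA=443427007
ad0: FAILURE - WRITE_DMA48 timed out LBA=350174938
"""

# Hypothetical pattern: device, event kind, ATA command, LBA.
PATTERN = re.compile(r"^(ad\d+): (TIMEOUT|FAILURE) - (\w+) .*LBA=(\d+)$")


def tally(text):
    """Return (event counts per (device, kind), set of LBAs per device)."""
    counts = Counter()
    lbas = defaultdict(set)
    for line in text.splitlines():
        match = PATTERN.match(line.strip())
        if match is None:
            continue  # skip anything that is not an ATA timeout/failure line
        device, kind, _command, lba = match.groups()
        counts[(device, kind)] += 1
        lbas[device].add(int(lba))
    return counts, lbas


counts, lbas = tally(LOG)
for (device, kind), n in sorted(counts.items()):
    print(f"{device} {kind}: {n}")
for device in sorted(lbas):
    print(f"{device} distinct LBAs: {len(lbas[device])}")
```

A tally like this only summarises what was logged; it does not by itself distinguish a hardware or driver fault from a side effect of the lockup.
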
I don't know if this is more likely to be related to the cause of the
lockups (e.g. faulty hardware/driver) or if it's an effect of the lockup
(e.g. waiting on a deadlocked thread). From what I've found searching
mailing lists, this kind of error seems to turn up with faulty
hardware/drivers, so I guess it could just be that zfs exposes the
faults because it's using the hardware differently from my previous ufs
setup.

In terms of my specific setup: I have 2 GB of RAM, I'm running from
up-to-date -STABLE source (apart from my attempt to apply the
aforementioned patch), I'm running an amd64 kernel, and my
/boot/loader.conf looks like this:

vm.kmem_size_max="1610612736"
vm.kmem_size="1610612736"
zfs_load="YES"
vfs.root.mountfrom="zfs:tank/root"
vfs.zfs.prefetch_disable="1"
vfs.zfs.arc_max="838860800"

The last line was an attempt to reduce the size of the ARC in the
kernel, in case it was having trouble locating memory blocks for other
things (the default value had it at 1.2 GB), but adding that parameter
doesn't seem to have had any effect.

Anyway, any info toward resolving this would be greatly appreciated;
otherwise let me know what further info I can provide to help track down
the problem.

Andrew
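
For reference, the byte values in the loader.conf above are easier to compare once converted to MiB. The following Python sketch does only that arithmetic; the `to_mib` helper and the `tunables` dictionary are names invented for this example, and the numbers are copied from the settings quoted in the message.

```python
MIB = 1024 * 1024

# Values copied from the /boot/loader.conf quoted in the message.
tunables = {
    "vm.kmem_size_max": 1610612736,
    "vm.kmem_size": 1610612736,
    "vfs.zfs.arc_max": 838860800,
}


def to_mib(num_bytes):
    """Convert a byte count to mebibytes (hypothetical helper)."""
    return num_bytes / MIB


for name, value in tunables.items():
    print(f"{name} = {value} bytes = {to_mib(value):.0f} MiB")

# Fraction of the configured kernel memory that the ARC limit represents.
arc_share = tunables["vfs.zfs.arc_max"] / tunables["vm.kmem_size"]
print(f"arc_max / kmem_size = {arc_share:.2f}")
```

With these settings the ARC limit is 800 MiB out of a 1536 MiB kmem size, on a machine with 2 GB of RAM.
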