Date:      Sun, 18 May 2008 17:11:37 +1000
From:      Andrew Hill <lists@thefrog.net>
To:        freebsd-fs@freebsd.org
Subject:   ZFS lockup in "zfs" state
Message-ID:  <683A6ED2-0E54-42D7-8212-898221C05150@thefrog.net>

> The following patch, published some time ago by pjd helped me:
> http://mbsd.msk.ru/dist/zfs_lockup.diff
>
> 100+ days of uptime of heavily loaded machines and no problems so far.
>
> Hope it would help.

I applied this patch with some modifications to fix up the file names  
as they seem to have moved
from
- src/sys/contrib/opensolaris/uts/common/fs/zfs/sys/zil.h
- src/sys/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c
- src/sys/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
to
- src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zil.h
- src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c
- src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
(and changed the patch's reference to the kernel configuration file,
MASSHOSTING_7_64, to point at my own kernel config)
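
The path fix-up described above can be sketched as a small script. This is only a rough illustration, assuming the diff names its files on the usual `---`, `+++`, `Index:` or `diff` header lines and that the only change needed is the prefix move listed above; the function name `retarget_patch_paths` is made up for this example and the real zfs_lockup.diff may need other adjustments.

```python
# Hypothetical helper: rewrite old-tree ZFS paths in a diff to the new location.
OLD_PREFIX = "src/sys/contrib/opensolaris/"
NEW_PREFIX = "src/sys/cddl/contrib/opensolaris/"

# Only lines that name files are touched; hunk bodies are left alone.
# Caveat: a removed line whose own text starts with "-- " would also
# begin with "--- " and be treated as a header by this simple check.
HEADER_TAGS = ("--- ", "+++ ", "Index: ", "diff ")


def retarget_patch_paths(diff_text: str) -> str:
    """Return diff_text with the old path prefix replaced on header lines."""
    out = []
    for line in diff_text.splitlines(keepends=True):
        if line.startswith(HEADER_TAGS):
            line = line.replace(OLD_PREFIX, NEW_PREFIX)
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    sample = (
        "--- src/sys/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c\n"
        "+++ src/sys/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c\n"
        "@@ -1,1 +1,1 @@\n"
        "-old line\n"
        "+new line\n"
    )
    print(retarget_patch_paths(sample), end="")
```

Because the new prefix does not contain the old one, running the rewrite twice leaves the diff unchanged.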

buildworld and buildkernel succeeded without error, but when i  
installed the new kernel and rebooted i got the following output
(the important point being the failure to load zfs on the 8th line)

May 17 17:02:06 <0.2> gutter kernel: Copyright (c) 1992-2008 The FreeBSD Project.
May 17 17:02:06 <0.2> gutter kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
May 17 17:02:06 <0.2> gutter kernel: The Regents of the University of California. All rights reserved.
May 17 17:02:06 <0.2> gutter kernel: FreeBSD is a registered trademark of The FreeBSD Foundation.
May 17 17:02:06 <0.2> gutter kernel: FreeBSD 7.0-STABLE #6: Sat May 17 16:39:32 EST 2008
May 17 17:02:06 <0.2> gutter kernel: root@gutter.thefrog.net:/usr/obj/usr/src/sys/GUTTER
May 17 17:02:06 <0.2> gutter kernel: link_elf_obj: symbol kproc_exit undefined
May 17 17:02:06 <0.2> gutter kernel: KLD file zfs.ko - could not finalize loading
May 17 17:02:06 <0.2> gutter kernel: Timecounter "i8254" frequency 1193182 Hz quality 0
May 17 17:02:06 <0.2> gutter kernel: CPU: AMD Athlon(tm) 64 Processor 3200+ (2010.31-MHz K8-class CPU)
May 17 17:02:06 <0.2> gutter kernel: Origin = "AuthenticAMD"  Id = 0x10ff0  Stepping = 0
May 17 17:02:06 <0.2> gutter kernel: Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
May 17 17:02:06 <0.2> gutter kernel: AMD Features=0xe2500800<SYSCALL,NX,MMX+,FFXSR,LM,3DNow!+,3DNow!>
May 17 17:02:06 <0.2> gutter kernel: AMD Features2=0x1<LAHF>
May 17 17:02:06 <0.2> gutter kernel: usable memory = 2137882624 (2038 MB)
May 17 17:02:06 <0.2> gutter kernel: avail memory  = 2060988416 (1965 MB)
May 17 17:02:06 <0.2> gutter kernel: ACPI APIC Table: <Nvidia AWRDACPI>
May 17 17:02:06 <0.2> gutter kernel: ioapic0 <Version 1.1> irqs 0-23 on motherboard

<snip>

May 17 17:02:06 <0.2> gutter kernel: ad0: 238475MB <Hitachi HDS722525VLAT80 V36OA60A> at ata0-master UDMA100
May 17 17:02:06 <0.2> gutter kernel: ad2: 238475MB <WDC WD2500PB-98FBA0 15.05R15> at ata1-master UDMA100
May 17 17:02:06 <0.2> gutter kernel: ad3: 152627MB <Seagate ST3160812A 3.AAE> at ata1-slave UDMA100
May 17 17:02:06 <0.2> gutter kernel: ad4: 476940MB <Seagate ST3500320AS SD15> at ata2-master SATA300
May 17 17:02:06 <0.2> gutter kernel: ad6: 715404MB <Seagate ST3750330AS SD15> at ata3-master SATA300
May 17 17:02:06 <0.2> gutter kernel: ad8: 305245MB <Seagate ST3320620AS 3.AAK> at ata4-master SATA300
May 17 17:02:06 <0.2> gutter kernel: ad10: 305245MB <Seagate ST3320620AS 3.AAE> at ata5-master SATA300
May 17 17:02:06 <0.2> gutter kernel: ad12: 305245MB <Seagate ST3320620AS 3.AAE> at ata6-master SATA150
May 17 17:02:06 <0.2> gutter kernel: Trying to mount root from zfs:tank/root
May 17 17:02:06 <0.2> gutter kernel:
May 17 17:02:06 <0.2> gutter kernel: Manual root filesystem specification:
May 17 17:02:06 <0.2> gutter kernel: <fstype>:<device>  Mount <device> using filesystem <fstype>
May 17 17:02:06 <0.2> gutter kernel: eg. ufs:da0s1a
May 17 17:02:06 <0.2> gutter kernel: ?                  List valid disk boot devices
May 17 17:02:06 <0.2> gutter kernel: <empty line>       Abort manual input
May 17 17:02:06 <0.2> gutter kernel:
May 17 17:02:06 <0.2> gutter kernel: mountroot>

at this point, since zfs has not been loaded, obviously i could not  
get it to mount root from zfs:tank/root, and resorted to a backup ufs  
root to put my old kernel back in place

i'm not sure if there is more output available than just the "could
not finalize loading" message; if so, please let me know where to look
and i'd love to re-test this patch if it'll provide more information
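
For picking the module load failure out of a long boot log, a short script along these lines may help. It is only a sketch: the two regular expressions are modelled on the two lines seen in the log above (`link_elf_obj: symbol ... undefined` and `KLD file ... - ...`), and the function name `find_kld_failures` is invented for this example. It does not surface any information the kernel did not already print.

```python
import re

# Patterns modelled on the two kernel messages quoted in the boot log above.
UNDEFINED_SYMBOL = re.compile(r"link_elf(?:_obj)?: symbol (\S+) undefined")
KLD_FAILURE = re.compile(r"KLD file (\S+) - (.+)$")


def find_kld_failures(log_lines):
    """Return a list of (module, reason, undefined_symbols) tuples.

    Undefined-symbol messages are attributed to the next "KLD file"
    failure line that follows them in the log.
    """
    failures = []
    pending_symbols = []
    for line in log_lines:
        line = line.rstrip()
        match = UNDEFINED_SYMBOL.search(line)
        if match:
            pending_symbols.append(match.group(1))
            continue
        match = KLD_FAILURE.search(line)
        if match:
            failures.append((match.group(1), match.group(2), pending_symbols))
            pending_symbols = []
    return failures


if __name__ == "__main__":
    sample = [
        "May 17 17:02:06 <0.2> gutter kernel: link_elf_obj: symbol kproc_exit undefined",
        "May 17 17:02:06 <0.2> gutter kernel: KLD file zfs.ko - could not finalize loading",
    ]
    for module, reason, symbols in find_kld_failures(sample):
        print(f"{module}: {reason}; undefined symbols: {', '.join(symbols)}")
```

For the log above this reports zfs.ko with kproc_exit as the one unresolved symbol, which is the part worth quoting when asking for help.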

right now, i'm getting uptimes in the order of days before everything
locks up, i assume it's related to this bug, though i'm also getting
the following output when it locks up

ad2: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=350494631
ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=234920650
ad2: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=443427007
ad0: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=350174938
ad2: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=350494631
ad0: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=234920650
ad2: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=443427007
ad0: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=350174938
ad2: FAILURE - WRITE_DMA48 timed out LBA=350494631
ad0: FAILURE - WRITE_DMA timed out LBA=234920650
ad2: FAILURE - WRITE_DMA48 timed out LBA=443427007
ad0: FAILURE - WRITE_DMA48 timed out LBA=350174938

typically repeated for a number of different LBA values before the
system panics. I don't know if this is more likely to be related to
the cause of the lockups (e.g. faulty hardware/driver) or if it's an
effect of the lockup (e.g. waiting on a deadlocked thread)... from
what i've found searching mailing lists, this kind of error seems to
turn up with faulty hardware/drivers so i guess it could just be that
zfs exposes the faults because it's using the hardware differently to
my previous ufs setup...
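
To see whether the timeouts cluster on particular disks or LBAs, the console output can be tallied per device. The following is a minimal sketch, assuming the messages keep the exact shape shown above (`adN: TIMEOUT|FAILURE - COMMAND ... LBA=n`); the name `summarize_ata_errors` is made up here, and a tally like this cannot by itself tell a hardware fault from a side effect of the deadlock.

```python
import re
from collections import defaultdict

# Pattern modelled on the "adN: TIMEOUT/FAILURE - CMD ... LBA=n" lines above.
ATA_EVENT = re.compile(
    r"^(?P<dev>ad\d+): (?P<kind>TIMEOUT|FAILURE) - (?P<cmd>\S+) .*LBA=(?P<lba>\d+)"
)


def summarize_ata_errors(lines):
    """Count TIMEOUT and FAILURE events and collect distinct LBAs per device."""
    summary = defaultdict(lambda: {"TIMEOUT": 0, "FAILURE": 0, "lbas": set()})
    for line in lines:
        match = ATA_EVENT.match(line.strip())
        if not match:
            continue  # ignore anything that is not an ATA error line
        entry = summary[match["dev"]]
        entry[match["kind"]] += 1
        entry["lbas"].add(int(match["lba"]))
    return dict(summary)


if __name__ == "__main__":
    sample = [
        "ad2: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=350494631",
        "ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=234920650",
        "ad2: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=350494631",
        "ad0: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=234920650",
        "ad2: FAILURE - WRITE_DMA48 timed out LBA=350494631",
        "ad0: FAILURE - WRITE_DMA timed out LBA=234920650",
    ]
    for dev, info in sorted(summarize_ata_errors(sample).items()):
        print(dev, info["TIMEOUT"], "timeouts,", info["FAILURE"], "failures,",
              len(info["lbas"]), "distinct LBAs")
```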

in terms of my specific setup, i have 2gb ram, i'm running from
up-to-date -STABLE source (apart from my attempt to apply the
aforementioned patch), i'm running an amd64 kernel, and my
/boot/loader.conf looks like this:

vm.kmem_size_max="1610612736"
vm.kmem_size="1610612736"
zfs_load="YES"
vfs.root.mountfrom="zfs:tank/root"
vfs.zfs.prefetch_disable="1"
vfs.zfs.arc_max="838860800"

the last line was an attempt to reduce the amount of arc cache in the  
kernel in case it was having trouble locating memory blocks for other  
things (as the default value had it at 1.2gb) but adding that  
parameter doesn't seem to have had any effect
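
For reference, the byte values in the loader.conf above are easier to compare once converted to MiB. The snippet below is only a convenience sketch; the helper names are invented for this example, and the list of size-valued keys is limited to the three that appear in the file above.

```python
MIB = 1024 * 1024

# The loader.conf contents quoted above.
LOADER_CONF = '''\
vm.kmem_size_max="1610612736"
vm.kmem_size="1610612736"
zfs_load="YES"
vfs.root.mountfrom="zfs:tank/root"
vfs.zfs.prefetch_disable="1"
vfs.zfs.arc_max="838860800"
'''

# Keys from the file above whose values are byte counts.
SIZE_KEYS = ("vm.kmem_size_max", "vm.kmem_size", "vfs.zfs.arc_max")


def parse_loader_conf(text):
    """Parse simple key="value" lines into a dict, skipping blanks and comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"')
    return settings


def sizes_in_mib(settings):
    """Return the size-valued settings converted from bytes to whole MiB."""
    return {key: int(settings[key]) // MIB for key in SIZE_KEYS if key in settings}


if __name__ == "__main__":
    for key, mib in sizes_in_mib(parse_loader_conf(LOADER_CONF)).items():
        print(f"{key} = {mib} MiB")
```

With the values above this gives 1536 MiB (1.5 GiB) for both kmem settings and 800 MiB for vfs.zfs.arc_max, against the 2038 MB of usable memory reported at boot.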

anyway, any info toward resolving this would be greatly appreciated,  
and otherwise let me know what further info i can provide to help  
track down the problem

Andrew
