Date: Mon, 26 Jan 2004 01:26:15 -0500 (EST)
From: Andre Guibert de Bruet <andy@siliconlandmark.com>
To: freebsd-current@freebsd.org
Subject: Re: Processes blocked on ufs or getblk
Message-ID: <20040126005108.Q42487@alpha.siliconlandmark.com>
In-Reply-To: <20040115015136.B47506@alpha.siliconlandmark.com>
References: <D90B8DC2-471E-11D8-A84C-000A95DBB47C@ca.com> <20040115015136.B47506@alpha.siliconlandmark.com>
[-- Attachment #1 --]

On Thu, 15 Jan 2004, Andre Guibert de Bruet wrote:

> On Thu, 15 Jan 2004, Lachlan O'Dea wrote:
>
> > -----BEGIN PGP SIGNED MESSAGE-----
> >
> > I found some discussion about this in December, but I don't think
> > anyone has been able to get to the bottom of it yet. The symptom is
> > that processes become permanently blocked in a state of ufs or getblk.
> > I can reproduce it with find at will:
> >
> > % ps axl | grep ufs
> > 0 13225 13215 1 -4 0 1300 804 ufs D ?? 0:00.96 find
> > /var -xdev -type f ( -perm -u+x -or -perm -g+x -or -perm -o+
> > 0 28778 28765 0 -4 0 1300 804 ufs D ?? 0:00.97 find
> > /var -xdev -type f ( -perm -u+x -or -perm -g+x -or -perm -o+
> > 0 33017 32933 2 -4 0 1304 788 ufs D p2- 0:10.69 find
> > / -name samba
> >
> > It has also happened several times in single user mode to makewhatis
> > running at the end of installworld.
> >
> > System details: 5.2-RC FreeBSD 5.2-RC #1: Fri Jan 9 04:45:51 EST 2004.
> > Dell PowerEdge 2500. All filesystems are on a single raid 5 volume
> > using the aac driver. The box has two CPUs, but I'm currently running
> > with kern.smp.disabled=1.
> >
> > % mount
> > /dev/aacd0s1a on / (ufs, local)
> > devfs on /dev (devfs, local)
> > /dev/aacd0s1e on /usr (ufs, local, with quotas, soft-updates)
> > /dev/aacd0s1d on /var (ufs, local, soft-updates)
> > procfs on /proc (procfs, local)
> > linprocfs on /usr/compat/linux/proc (linprocfs, local)
> >
> > I also have ACLs enabled on /usr, if that's at all relevant.
> >
> > The kernel has DDB and DEBUG_LOCKS. Please let me know if there's
> > anything I can do to help debug this.
> >
> > I don't know if this is related, but another problem is that when
> > shutting down, it always gives up on a bunch of buffers. I think I've
> > seen over 100, but usually it's 4-10 buffers.
>
> I'm seeing the same thing on my desktop machine. It usually occurs while
> scanning large directories and/or dealing with large collections of files
> rather quickly. I came across this bug while using gqview to go through my
> image collection and a second time while re-checking out my ports tree
> from local cvs. The programs appear to grab an exclusive lock and anything
> that tries to read or write to the directory (or get a directory listing)
> gets stuck in ufs state.
>
> My kernel config is rather simple, GENERIC without a lot of cruft except
> amr, ata, scsi, usb and pcm. I'll try to get the output of a ddb ps and a
> show lockedvnods.

I'm reviving this thread as I have more information that might help track
this problem down. The offending process in this case is gqview, but it
could have been 'find /' or any other process running when there's high
system load (such as during the daily periodic runs).

From the emails that I've gotten, it appears that this bug affects users
that are using either ccd or hardware raid (amr driver in my case). I've
attached the output of a ddb ps and a 'show lockedvnods'.

Every time the getblk hang rears its ugly head, I've seen "amr0: bad slot
x completed" (where x is an integer between 0 and 4) printed on the serial
console. This makes me think that there's a failure mode or special state
that isn't being checked in the amr driver. Perusing the code shows that
the bad slot message is the result of a NULL busy command.

I'm no storage driver expert and my VFS knowledge is somewhat limited.
Anyone out there want to have a look at this? I'm willing to try out any
patches on this system.

I'm currently running:
FreeBSD bling.home 5.2-CURRENT FreeBSD 5.2-CURRENT #1: Thu Jan 22 11:38:46 EST 2004 andy@bling.home:/usr/src/sys/i386/compile/BLING i386

Full Kernel config file is up at: http://bling.properkernel.com/BLING
I'll have a boot -v up shortly at: http://bling.properkernel.com/boot-v.txt

Regards,

> Andre Guibert de Bruet | Enterprise Software Consultant >
> Silicon Landmark, LLC. | http://siliconlandmark.com/    >

[-- Attachment #2 --]

db> ps
pid proc uarea uid ppid pgrp flag stat wmesg wchan cmd
1140 77082a50 b4d0d000 0 1 1140 0004002 [SLP]nanslp 0x60799bbc] reboot
1043 6aa08dc0 b4c3c000 501 1 1042 0004000 [SLP]getblk 0x992ba724] gqview
58 68ea9528 b08be000 0 0 0 0000204 [SLP]- 0x607c80ac] nfsiod 3
57 68ea96e0 b08bf000 0 0 0 0000204 [SLP]- 0x607c80a8] nfsiod 2
56 68ea9898 b08c0000 0 0 0 0000204 [SLP]- 0x607c80a4] nfsiod 1
55 68ea9a50 b08c1000 0 0 0 0000204 [SLP]- 0x607c80a0] nfsiod 0
54 68ea9c08 b08c2000 0 0 0 0000204 [SLP]vlruwt 0x68ea9c08] vnlru
53 68ea9dc0 b08c3000 0 0 0 0000204 [SLP]syncer 0x60799580] syncer
52 690f0000 b2902000 0 0 0 0000204 [SLP]psleep 0x607c142c] bufdaemon
51 690f01b8 b2903000 0 0 0 000020c [SLP]pgzero 0x607ce828] pagezero
50 690f0370 b2904000 0 0 0 0000204 [SLP]psleep 0x607ce880] vmdaemon
49 690f0528 b2905000 0 0 0 0000204 [SLP]psleep 0x607ce86c] pagedaemon
9 690f06e0 b2906000 0 0 0 0000204 [SLP]- 0xb2930d0c] schedcpu
48 690f0898 b294f000 0 0 0 0000204 [IWAIT] swi0: tty:sio
47 68e55a50 b088b000 0 0 0 0000204 [SLP]usbtsk 0x60791c04] usbtask
46 68e55c08 b088c000 0 0 0 0000204 [SLP]usbevt 0x68fcd210] usb0
8 68e55dc0 b088d000 0 0 0 0000204 [SLP]actask 0x608cb36c] acpi_task2
7 68ea7000 b088e000 0 0 0 0000204 [SLP]actask 0x608cb36c] acpi_task1
6 68ea71b8 b088f000 0 0 0 0000204 [SLP]actask 0x608cb36c] acpi_task0
--More--
45 68ea7370 b0890000 0 0 0 0000204 [IWAIT] swi7: task queue
44 68ea7528 b0891000 0 0 0 0000204 [IWAIT] swi7: acpitaskq
43 68ea76e0 b0892000 0 0 0 0000204 [IWAIT] swi3: cambio
42 68ea7898 b0893000 0 0 0 0000204 new [IWAIT] swi2: camnet
41 68ea7a50 b0894000 0 0 0 0000204 new [IWAIT] swi5:+
5 68ea7c08 b08b9000 0 0 0 0000204 [SLP]tqthr 0x6079afe8] taskqueue
40 68ea7dc0 b08ba000 0 0 0 0000204 [IWAIT] swi6:+
39 68ea9000 b08bb000 0 0 0 0000204 [SLP]- 0x6078e9a0] random
4 68e4c528 b085b000 0 0 0 0000204 [SLP]- 0x60794220] g_down
3 68e4c6e0 b085c000 0 0 0 0000204 [SLP]- 0x6079421c] g_up
2 68e4c898 b085d000 0 0 0 0000204 [SLP]- 0x60794214] g_event
38 68e4ca50 b085e000 0 0 0 0000204 new [IWAIT] swi4: vm
37 68e4cc08 b085f000 0 0 0 000020c [LOCK Giant 69109cc0] swi8: tty:sio clock
36 68e4cdc0 b0860000 0 0 0 0000204 [IWAIT] swi1: net
35 68e55000 b0861000 0 0 0 0000204 new [IWAIT] irq0: clk
34 68e551b8 b0886000 0 0 0 0000204 new [IWAIT] irq23:
33 68e55370 b0887000 0 0 0 0000204 new [IWAIT] irq22:
32 68e55528 b0888000 0 0 0 0000204 [IWAIT] irq21: amr0
31 68e556e0 b0889000 0 0 0 0000204 new [IWAIT] irq20:
30 68e55898 b088a000 0 0 0 0000204 [IWAIT] irq19: fwohci1+
--More--
29 64f661b8 aee2a000 0 0 0 0000204 [IWAIT] irq18: rl0
28 64f66370 aee2b000 0 0 0 0000204 [IWAIT] irq17: atapci1 pcm0
27 64f66528 aee2c000 0 0 0 0000204 [IWAIT] irq16: fwohci0
26 64f666e0 aee2d000 0 0 0 0000204 [IWAIT] irq15: ata1
25 64f66898 aee52000 0 0 0 0000204 [IWAIT] irq14: ata0
24 64f66a50 aee53000 0 0 0 0000204 new [IWAIT] irq13:
23 64f66c08 aee54000 0 0 0 0000204 new [IWAIT] irq12:
22 64f66dc0 aee55000 0 0 0 0000204 new [IWAIT] irq11:
21 68e4c000 b0858000 0 0 0 0000204 new [IWAIT] irq10:
20 68e4c1b8 b0859000 0 0 0 0000204 new [IWAIT] irq9: acpi0
19 68e4c370 b085a000 0 0 0 0000204 new [IWAIT] irq8: rtc
18 64f5d000 aedd8000 0 0 0 0000204 new [IWAIT] irq7: ppc0
17 64f5d1b8 aee21000 0 0 0 0000204 new [IWAIT] irq6:
16 64f5d370 aee22000 0 0 0 0000204 new [IWAIT] irq5:
15 64f5d528 aee23000 0 0 0 0000204 new [IWAIT] irq4: sio0
14 64f5d6e0 aee24000 0 0 0 0000204 new [IWAIT] irq3: sio1
13 64f5d898 aee25000 0 0 0 0000204 [CPU 0] irq1: atkbd0
12 64f5da50 aee26000 0 0 0 000020c [Can run] idle: cpu0
11 64f5dc08 aee27000 0 0 0 000020c [CPU 1] idle: cpu1
1 64f5ddc0 aee28000 0 0 1 0004200 [SLP]wait 0x64f5ddc0] init
--More--
10 64f66000 aee29000 0 0 0 0000204 [CV]ktrace 0x607977a4] ktrace
0 60794320 60c1f000 0 0 0 0000200 [SLP]sched 0x60794320] swapper
db> show lockedvnods
Locked vnodes
0x6d4a1e38: tag ufs, type VREG, usecount 2, writecount 0, refcount 21, flags (VV_OBJBUF), lock type ufs: EXCL (count 1) by thread 0x6aa09e70 (pid 1043)
	ino 22988755, on dev amrd0a (4, 30)
db>
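
The message above attributes the "bad slot x completed" report to a completion arriving for a slot whose busy-command entry is NULL. As a rough illustration of that bookkeeping only, here is a toy Python model. It is not taken from the amr driver source, and the class and method names (SlotTable, submit, complete) are made up for this sketch.

```python
class SlotTable:
    """Toy model of a controller's busy-command table (hypothetical names)."""

    def __init__(self, nslots):
        # One entry per slot; None plays the role of a NULL busy command.
        self.busy = [None] * nslots
        self.log = []

    def submit(self, slot, cmd):
        """Record cmd as outstanding in the given slot."""
        if self.busy[slot] is not None:
            raise RuntimeError(f"slot {slot} already busy")
        self.busy[slot] = cmd

    def complete(self, slot):
        """Handle a completion notice for slot and return the finished command.

        If nothing is outstanding in that slot, log a message modelled on
        the one quoted in the thread and return None.
        """
        cmd = self.busy[slot]
        if cmd is None:
            self.log.append(f"bad slot {slot} completed")
            return None
        self.busy[slot] = None
        return cmd


if __name__ == "__main__":
    table = SlotTable(4)
    table.submit(1, "read request")
    print(table.complete(1))   # the outstanding command is handed back
    print(table.complete(1))   # nothing outstanding any more: None
    print(table.log)
```

In this model the message appears when a slot is reported complete a second time, or when a completion names a slot that was never submitted. Whether the hang described in the thread follows either pattern is not established here.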
