From owner-freebsd-stable@FreeBSD.ORG Thu Jan 5 18:24:24 2006
Return-Path:
X-Original-To: freebsd-stable@freebsd.org
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id 6E20C16A41F;
    Thu, 5 Jan 2006 18:24:24 +0000 (GMT)
    (envelope-from dsh@vlink.ru)
Received: from deliver.smtp.vlink.ru (vlink-1.avtlg.ru [83.239.142.33])
    by mx1.FreeBSD.org (Postfix) with ESMTP id 37D6343D5D;
    Thu, 5 Jan 2006 18:24:21 +0000 (GMT)
    (envelope-from dsh@vlink.ru)
Received: from smtp.smtp.vlink.ru (clamav.smtp.vlink.ru [192.168.4.1])
    by deliver.smtp.vlink.ru (Postfix) with ESMTP id 21DDAFECDE3;
    Thu, 5 Jan 2006 21:24:19 +0300 (MSK)
Received: from neva.vlink.ru (neva.vlink.ru [217.107.252.66])
    by smtp.smtp.vlink.ru (Postfix) with ESMTP id 35B831009AB3;
    Thu, 5 Jan 2006 21:24:18 +0300 (MSK)
Received: from neva.vlink.ru (localhost [127.0.0.1])
    by neva.vlink.ru (8.13.4/8.13.4) with ESMTP id k05HckYt006032;
    Thu, 5 Jan 2006 20:38:47 +0300 (MSK)
    (envelope-from dsh@vlink.ru)
Received: (from dsh@localhost)
    by neva.vlink.ru (8.13.4/8.13.4/Submit) id k05Hciih006029;
    Thu, 5 Jan 2006 20:38:44 +0300 (MSK)
    (envelope-from dsh@vlink.ru)
X-Comment-To: Greg Rivers
To: Greg Rivers
References: <20051121164139.T48994@w10.sac.fedex.com>
    <20051122021224.GA12402@xor.obsecurity.org>
    <20051121205535.W32523@nc8000.tharned.org>
    <20051122043952.GA14168@xor.obsecurity.org>
    <20051122211507.P32523@nc8000.tharned.org>
    <20060103135624.A798@nc8000.tharned.org>
From: Denis Shaposhnikov
Date: Thu, 05 Jan 2006 20:38:44 +0300
In-Reply-To: <20060103135624.A798@nc8000.tharned.org>
    (Greg Rivers's message of "Tue, 3 Jan 2006 14:03:06 -0600 (CST)")
Message-ID: <8764oyinh7.fsf@neva.vlink.ru>
User-Agent: Gnus/5.1007 (Gnus v5.10.7) XEmacs/21.4.18 (berkeley-unix)
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
X-Virus-Scanned: ClamAV using ClamSMTP
Cc: Kirk McKusick, Don Lewis, freebsd-stable@freebsd.org, Kris Kennaway
Subject: Re: Recurring problem: processes block accessing UFS file system
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe:
List-Archive:
List-Post:
List-Help:
List-Subscribe:
X-List-Received-Date: Thu, 05 Jan 2006 18:24:24 -0000

Hi!

>>>>> "Greg" == Greg Rivers writes:

 Greg> It's taken more than a month, but the problem has recurred
 Greg> without snapshots ever having been run. I've got a good trace

I think I have the same problem on a fresh CURRENT. For some
processes I see MWCHAN = ufs and "D" in STAT. I can't kill such
processes even with -9, and the system can't kill them on shutdown
either. So the system can't shut down and waits forever after the
"All buffers synced" message. At that point I entered KDB and ran
"show lockedvnods":

Locked vnodes

0xc687cb58: tag ufs, type VDIR
    usecount 1, writecount 0, refcount 2 mountedhere 0
    flags ()
    v_object 0xcb5b1934 ref 0 pages 0
    lock type ufs: EXCL (count 1) by thread 0xc795d600 (pid 74686) with 1 pending
        ino 2072602, on dev ad4s1g

0xc687ca50: tag ufs, type VDIR
    usecount 31, writecount 0, refcount 32 mountedhere 0
    flags ()
    v_object 0xc85d2744 ref 0 pages 1
    lock type ufs: EXCL (count 1) by thread 0xc7683d80 (pid 74178) with 6 pending
        ino 2072603, on dev ad4s1g

0xc687c948: tag ufs, type VDIR
    usecount 2, writecount 0, refcount 3 mountedhere 0
    flags ()
    v_object 0xc875d000 ref 0 pages 1
    lock type ufs: EXCL (count 1) by thread 0xc91f3300 (pid 65610) with 1 pending
        ino 2072615, on dev ad4s1g

0xc691f420: tag ufs, type VDIR
    usecount 2, writecount 0, refcount 3 mountedhere 0
    flags ()
    v_object 0xc8a773e0 ref 0 pages 1
    lock type ufs: EXCL (count 1) by thread 0xc68e5780 (pid 519) with 1 pending
        ino 2072680, on dev ad4s1g

0xc691f318: tag ufs, type VDIR
    usecount 3, writecount 0, refcount 4 mountedhere 0
    flags ()
    v_object 0xc8a7b2e8 ref 0 pages 1
    lock type ufs: EXCL (count 1) by thread 0xc7019780 (pid 74103) with 2 pending
        ino 2072795, on dev ad4s1g

0xc69bb528: tag ufs, type VDIR
    usecount 2, writecount 0, refcount 3 mountedhere 0
    flags ()
    v_object 0xc7890744 ref 0 pages 1
    lock type ufs: EXCL (count 1) by thread 0xc91f4600 (pid 74129) with 1 pending
        ino 2072767, on dev ad4s1g

After that I did "call doadump" and got a vmcore. ps shows:

(kgdb) ps
During symbol reading, Incomplete CFI data; unspecified registers at 0xc04d97eb.
pid    proc      uid   ppid   pgrp   flag     stat  comm               wchan
74686  c9464000  0     1      1      000000   1     sh                 ufs c687caa8
74195  c970d000  0     3074   74195  4000100  1     sshd               ufs c687caa8
74178  c7682adc  0     3074   74178  004000   1     sshd               ufs c687c9a0
74129  c9b82adc  1008  1      5504   004000   1     parser3.cgi        ufs c691f370
74103  c70b5458  1008  1      5504   000000   1     httpd              ufs c69bb580
65610  c92c0458  1005  1      65610  004000   1     sftp-server        ufs c691f478
5518   c6247458  1008  1      5516   004002   1     perl5.8.7          ufs c687caa8
3081   c7523d08  0     1      3081   000000   1     cron               ufs c687caa8
3074   c7682d08  0     1      3074   000100   1     sshd               ufs c687caa8
3016   c7523adc  0     1      3016   000000   1     syslogd            ufs c687caa8
519    c68e4d08  80    1      518    000100   1     nginx              ufs c691f370
34     c6260000  0     0      0      000204   1     schedcpu           - e88b3cf0
33     c62438b0  0     0      0      000204   1     syncer             ktsusp c6243938
32     c6243adc  0     0      0      000204   1     vnlru              ktsusp c6243b64
31     c6243d08  0     0      0      000204   1     bufdaemon          ktsusp c6243d90
30     c6244000  0     0      0      00020c   1     pagezero           pgzero c06c21a0
29     c624422c  0     0      0      000204   1     vmdaemon           psleep c06c1d08
28     c6244458  0     0      0      000204   1     pagedaemon         psleep c06c1cc8
27     c602e684  0     0      0      000204   1     irq1: atkbd0
26     c602e8b0  0     0      0      000204   1     swi0: sio
25     c602eadc  0     0      0      000204   1     irq18: atapci1
24     c602ed08  0     0      0      000204   1     irq15: ata1
23     c6074000  0     0      0      000204   1     irq14: ata0
22     c607422c  0     0      0      000204   1     irq27: em1
21     c6074458  0     0      0      000204   1     irq26: em0
20     c6074684  0     0      0      000204   1     irq9: acpi0
19     c60748b0  0     0      0      000204   1     swi2: cambio
18     c6074adc  0     0      0      000204   1     swi6: task queue
9      c5fd822c  0     0      0      000204   1     acpi_task2         - c6061e40
8      c5fd8458  0     0      0      000204   1     acpi_task1         - c6061e40
7      c5fd8684  0     0      0      000204   1     acpi_task0         - c6061e40
17     c5fd88b0  0     0      0      000204   1     swi6: Giant taskq
6      c5fd8adc  0     0      0      000204   1     thread taskq       - c6062040
16     c5fd8d08  0     0      0      000204   1     swi5: Fast taskq
5      c602e000  0     0      0      000204   1     kqueue taskq       - c6062100
15     c602e22c  0     0      0      000204   1     yarrow             - c06b1720
4      c602e458  0     0      0      000204   1     g_down             - c06b1fc0
3      c5fd3000  0     0      0      000204   1     g_up               - c06b1fbc
2      c5fd322c  0     0      0      000204   1     g_event            - c06b1fb4
14     c5fd3458  0     0      0      000204   1     swi3: vm
13     c5fd3684  0     0      0      00020c   1     swi4: clock sio
12     c5fd38b0  0     0      0      000204   1     swi1: net
11     c5fd3adc  0     0      0      00020c   1     idle: cpu0
10     c5fd3d08  0     0      0      00020c   1     idle: cpu1
1      c5fd8000  0     0      1      004200   1     init               ufs c687cbb0
0      c06b20c0  0     0      0      000200   1     swapper
There is no member named p_pptr.

Note: I also tried to check the v_object addresses from "show
lockedvnods":

(kgdb) p/x *0xcb5b1934
$1 = 0xc068f280
(kgdb) p/x *0xc85d2744
$2 = 0xc068f280
(kgdb) p/x *0xc875d000
$3 = 0xc068f280
(kgdb) p/x *0xc8a773e0
$4 = 0xc068f280

... and so on. Maybe that's important?

PS. I have the crash dump, but I no longer have the kernel that
matches it.

Thank you for your attention.

-- 
DSS5-RIPE DSS-RIPN 2:550/5068@fidonet 2:550/5069@fidonet
mailto:dsh@vlink.ru http://neva.vlink.ru/~dsh/
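
A note on reading the two listings together: every "ufs" wait channel in the ps output equals one of the locked vnode addresses plus 0x58, and the number of sleepers on each channel matches the "pending" count printed for that vnode. Assuming that the wait channel is the lock embedded in the vnode (this is an inference from the numbers above, not something confirmed in the post), the data can be turned into a wait-for graph. The sketch below is in Python, with both tables transcribed by hand from the output above; the names LOCK_OFFSET, holders, waiting, build_wait_for and find_cycle are made up for illustration.

```python
# ASSUMPTION: wchan == vnode address + 0x58. Inferred from the listings only.
LOCK_OFFSET = 0x58

# vnode address -> pid holding its exclusive lock (from "show lockedvnods")
holders = {
    0xc687cb58: 74686,
    0xc687ca50: 74178,
    0xc687c948: 65610,
    0xc691f420: 519,
    0xc691f318: 74103,
    0xc69bb528: 74129,
}

# pid -> wait channel, for every process sleeping on "ufs" (from kgdb "ps")
waiting = {
    74686: 0xc687caa8,
    74195: 0xc687caa8,
    74178: 0xc687c9a0,
    74129: 0xc691f370,
    74103: 0xc69bb580,
    65610: 0xc691f478,
    5518: 0xc687caa8,
    3081: 0xc687caa8,
    3074: 0xc687caa8,
    3016: 0xc687caa8,
    519: 0xc691f370,
    1: 0xc687cbb0,
}


def build_wait_for(holders, waiting, offset=LOCK_OFFSET):
    """Map each sleeping pid to the pid that holds the vnode it sleeps on."""
    lock_owner = {vnode + offset: pid for vnode, pid in holders.items()}
    return {
        pid: lock_owner[wchan]
        for pid, wchan in waiting.items()
        if wchan in lock_owner
    }


def find_cycle(wait_for, start):
    """Follow the wait chain from start; return the pids in a cycle, or None."""
    seen = []
    pid = start
    while pid in wait_for:
        if pid in seen:
            return seen[seen.index(pid):]
        seen.append(pid)
        pid = wait_for[pid]
    return None


wait_for = build_wait_for(holders, waiting)
cycle = find_cycle(wait_for, 1)  # start from init (pid 1)
print(cycle)  # [74103, 74129]
```

Under that assumption, the chain starting at init runs through sh, sshd, sftp-server and nginx and ends at pids 74103 (httpd) and 74129 (parser3.cgi), each holding the directory vnode the other is sleeping on. Whether that is the actual cause would have to be confirmed from the thread backtraces.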