From owner-freebsd-stable@FreeBSD.ORG Thu Dec 18 00:35:00 2003 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id C360A16A4CE for ; Thu, 18 Dec 2003 00:35:00 -0800 (PST) Received: from alcanet.com.au (mailout2.alcanet.com.au [208.178.117.14]) by mx1.FreeBSD.org (Postfix) with ESMTP id C84E543D1F for ; Thu, 18 Dec 2003 00:34:55 -0800 (PST) (envelope-from peter.jeremy@alcatel.com.au) Received: from sydsmtp02.alcatel.com.au (IDENT:root@localhost.localdomain [127.0.0.1])hBGMPXPF025397; Wed, 17 Dec 2003 09:25:34 +1100 Received: from gsmx07.alcatel.com.au ([139.188.20.247]) by sydsmtp02.alcatel.com.au (Lotus Domino Release 5.0.12) with ESMTP id 2003121709253303:277578 ; Wed, 17 Dec 2003 09:25:33 +1100 Received: from gsmx07.alcatel.com.au (localhost [127.0.0.1]) by gsmx07.alcatel.com.au (8.12.9/8.12.9) with ESMTP id hBGMPW7d042103; Wed, 17 Dec 2003 09:25:32 +1100 (EST) (envelope-from peter.jeremy@alcatel.com.au) Received: (from jeremyp@localhost) by gsmx07.alcatel.com.au (8.12.9/8.12.9/Submit) id hBGMPWoS042102; Wed, 17 Dec 2003 09:25:32 +1100 (EST) (envelope-from peter.jeremy@alcatel.com.au) Date: Wed, 17 Dec 2003 09:25:31 +1100 From: Peter Jeremy To: freebsd-stable@freebsd.org Message-ID: <20031216222531.GN88618@gsmx07.alcatel.com.au> Mail-Followup-To: freebsd-stable@freebsd.org, Andrew.Li@alcatel.com.au Mime-Version: 1.0 User-Agent: Mutt/1.4.1i X-MIMETrack: Itemize by SMTP Server on SYDSMTP02/AlcatelAustralia(Release 5.0.12 |February 13, 2003) at 17/12/2003 09:25:33,|February 13, 2003) at 17/12/2003 09:25:34, Serialize complete at 17/12/2003 09:25:34 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Subject: Partial deadlock in 4.9p1 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Dec 2003 08:35:01 -0000 I've just had a 4.9p1 system partially deadlock for about 30 minutes and then spontaneously(?) recover. The "ps" results point at vinum. I've never seen this before and don't recall anything like this being mentioned here before. At the time, I had two find's running in a large NFS mounted filesystem and was doing a "cvs -d /usr/ncvs -R checkout -P ports" from a normal SCSI disk (/usr/ncvs) into a 12-way RAID-5 filesystem ($PWD == /mnt). After a while (10-15 minutes), I noticed that the messages from cvs had stopped, though a top and systat were still running and the top output indicated that both find's were still running. Apart from the 'cvs', the system seemed to be running normally but trying "sync" in another terminal also wedged, though "df" worked as did "ls" of a different subdirectory in /mnt. ddb/ps showed that "cvs", "sync", "bufdaemon" and "syncer" were all blocked on "vrlock" with the same wchan (c1482800). The system sources are also under /mnt and when I went to src/sys and started a search for "vrlock", the system recovered - the sync returned and cvs started running again. Having looked at the sources, the whcan points to a struct rangelock - but gdb reports that that particular rangelock is now being recycled continuously so I'm not sure of the actual I/O operation that caused the problem. Whilst bufdaemon is often waiting in vrlock, once I stop doing disk I/O, the dirtybuf count will go to zero and bufdaemon will go to psleep instead. No messages were reported on either the console or via syslog. Vinum details are: aalp03# vinum list 12 drives: D disk00 State: up Device /dev/da1a Avail: 3/4094 MB (0%) D disk01 State: up Device /dev/da2a Avail: 3/4094 MB (0%) D disk02 State: up Device /dev/da3a Avail: 3/4094 MB (0%) D disk03 State: up Device /dev/da4a Avail: 3/4094 MB (0%) D disk04 State: up Device /dev/da5a Avail: 3/4094 MB (0%) D disk05 State: up Device /dev/da6a Avail: 3/4094 MB (0%) D disk08 State: up Device /dev/da7a Avail: 3/4094 MB (0%) D disk09 State: up Device /dev/da8a Avail: 3/4094 MB (0%) D disk10 State: up Device /dev/da9a Avail: 3/4094 MB (0%) D disk11 State: up Device /dev/da10a Avail: 3/4094 MB (0%) D disk12 State: up Device /dev/da11a Avail: 3/4094 MB (0%) D disk13 State: up Device /dev/da12a Avail: 3/4094 MB (0%) 1 volumes: V raid5 State: up Plexes: 1 Size: 43 GB 1 plexes: P raid5.p0 R5 State: up Subdisks: 12 Size: 43 GB 12 subdisks: S raid5.p0.s0 State: up PO: 0 B Size: 4091 MB S raid5.p0.s1 State: up PO: 4032 kB Size: 4091 MB S raid5.p0.s2 State: up PO: 8064 kB Size: 4091 MB S raid5.p0.s3 State: up PO: 11 MB Size: 4091 MB S raid5.p0.s4 State: up PO: 15 MB Size: 4091 MB S raid5.p0.s5 State: up PO: 19 MB Size: 4091 MB S raid5.p0.s6 State: up PO: 23 MB Size: 4091 MB S raid5.p0.s7 State: up PO: 27 MB Size: 4091 MB S raid5.p0.s8 State: up PO: 31 MB Size: 4091 MB S raid5.p0.s9 State: up PO: 35 MB Size: 4091 MB S raid5.p0.s10 State: up PO: 39 MB Size: 4091 MB S raid5.p0.s11 State: up PO: 43 MB Size: 4091 MB aalp03# tail /var/log/vinum_history 23 Dec 2002 06:59:45.792709 *** Created devices *** 23 Dec 2002 06:59:48.833111 list 11 Dec 2003 09:10:03.754890 *** vinum started *** 11 Dec 2003 09:10:03.779419 list 17 Dec 2003 08:39:33.834732 *** vinum started *** 17 Dec 2003 08:39:33.862485 list 17 Dec 2003 09:17:18.888825 *** vinum started *** 17 Dec 2003 09:17:18.938531 list 17 Dec 2003 09:17:22.942145 *** vinum started *** 17 Dec 2003 09:17:22.945871 list aalp03# tail /var/log/messages ... Dec 17 08:00:13 aalp03 /kernel: ipfw: Accounting cleared. Dec 17 08:17:51 aalp03 /kernel: Connection attempt to TCP xx.yy.zz.253:113 from xx.yy.zz.9:1658 flags:0x02 Dec 17 08:20:13 aalp03 /kernel: Connection attempt to TCP xx.yy.zz.253:113 from xx.yy.zz.9:1757 flags:0x02 Dec 17 08:28:47 aalp03 /kernel: arplookup 192.168.161.24 failed: host is not on local network Dec 17 08:29:51 aalp03 /kernel: arplookup 192.168.161.24 failed: host is not on local network aalp03# cat /var/run/dmesg.boot Copyright (c) 1992-2003 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD 4.9-RELEASE-p1 #4: Thu Dec 11 11:27:41 EST 2003 root@aalp03.alcatel.com.au:/mnt/obj/mnt/rpc/src/sys/wansim Timecounter "i8254" frequency 1193158 Hz Timecounter "TSC" frequency 598169769 Hz CPU: Pentium III/Pentium III Xeon/Celeron (598.17-MHz 686-class CPU) Origin = "GenuineIntel" Id = 0x673 Stepping = 3 Features=0x383f9ff real memory = 134201344 (131056K bytes) avail memory = 127127552 (124148K bytes) Preloaded elf kernel "kernel" at 0xc035f000. VESA: v2.0, 1024k memory, flags:0x0, mode table:0xc02ff602 (1000022) VESA: ATI MACH64 Pentium Pro MTRR support enabled npx0: on motherboard npx0: INT 16 interface pcib0: on motherboard pci0: on pcib0 sym0: <875> port 0x2000-0x20ff mem 0xc6aff000-0xc6afffff,0xc6afaf00-0xc6afafff irq 10 at device 6.0 on pci0 sym0: No NVRAM, ID 7, Fast-20, SE, parity checking sym1: <875> port 0x2400-0x24ff mem 0xc6afe000-0xc6afefff,0xc6afae00-0xc6afaeff irq 11 at device 6.1 on pci0 sym1: No NVRAM, ID 7, Fast-20, SE, parity checking tl0: port 0x3040-0x304f mem 0xc6afacf0-0xc6afacff irq 15 at device 7.0 on pci0 tl0: Ethernet address: 00:50:8b:dc:8a:75 miibus0: on tl0 lxtphy0: on miibus0 lxtphy0: 100baseFX, 100baseFX-FDX, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto tlphy0: on miibus0 tlphy0: 10base2/BNC, 10base5/AUI pci0: at 8.0 pci0: (vendor=0x0e11, dev=0xa0f0) at 9.0 pci0: (vendor=0x114f, dev=0x0004) at 13.0 irq 5 isp0: port 0x2c00-0x2cff mem 0xc6afc000-0xc6afcfff irq 15 at device 15.0 on pci0 fxp0: port 0x3000-0x303f mem 0xc6b00000-0xc6bfffff,0xc6afb000-0xc6afbfff irq 15 at device 16.0 on pci0 fxp0: Ethernet address 00:d0:b7:20:bd:ab inphy0: on miibus1 inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto isab0: at device 20.0 on pci0 isa0: on isab0 atapci0: port 0xf100-0xf10f at device 20.1 on pci0 ata0: at 0x1f0 irq 14 on atapci0 ata1: at 0x170 irq 15 on atapci0 pci0: at 20.2 irq 0 chip1: at device 20.3 on pci0 orm0: