From: Randy Bush
Date: Tue, 8 Jun 2004 22:12:18 -0700
To: Robert Watson
cc: FreeBSD Current
Subject: Re: snapshot dump hangs
Message-ID: <16582.39986.825007.939008@ran.psg.com>
References: <16582.31631.362797.734568@ran.psg.com>
List-Id: Discussions about the use of FreeBSD-current

> (1) Could you do a "ps awxl" and see what wait channel dump is blocked on?

  UID   PID  PPID CPU PRI NI   VSZ   RSS MWCHAN STAT TT     TIME COMMAND
    0   942   925   0   8  0  1332  1152 wait   Ss   p0  0:00.05 -bash (bash)
    0  1701   942   0   8  0  1248  1052 wait   S+   p0  0:00.01 /usr/local/bin/bash /do-dump
    0  1745  1701   0   8  0  1496  1220 wait   S+   p0  0:00.01 /sbin/rdump 0Luaf raid0.psg.com:/data/backup/psg.2004-06-09/var /dev/twed0s1d (rdump)
    0  1748  1745   0   4  0  1624  1268 sbwait S+   p0  0:02.55 rdump: /dev/twed0s1d: pass 4: 60.19% done, finished in 0:00 (rdump)
    0  1749  1748   0  20  0  1496  1220 pause  S+   p0  0:03.99 /sbin/rdump 0Luaf raid0.psg.com:/data/backup/psg.2004-06-09/var /dev/twed0s1d (rdump)
    0  1750  1748   0  20  0  1496  1220 pause  S+   p0  0:03.95 /sbin/rdump 0Luaf raid0.psg.com:/data/backup/psg.2004-06-09/var /dev/twed0s1d (rdump)
    0  1751  1748   0  20  0  1496  1220 pause  S+   p0  0:03.95 /sbin/rdump 0Luaf raid0.psg.com:/data/backup/psg.2004-06-09/var /dev/twed0s1d (rdump)
    0  1766  1764   0   8  0  1328  1148 wait   Ss   p1  0:00.03 -bash (bash)
    0  1995  1766   0  76  0  1460   968 -      R+   p1  0:00.00 /bin/ps -l
    0   690     1   0  -8  0  2324  1812 piperd S    d0- 0:00.02 /usr/local/bin/perl /usr/local/sbin/exim-pop/popwatch
    0   691     1   0   8  0  2260  1764 nanslp S    d0- 0:00.01 /usr/local/bin/perl /usr/local/sbin/exim-pop/popauth
    0   692     1   0   8  0  2260  1704 nanslp S    d0- 0:00.01 /usr/local/bin/perl /usr/local/sbin/exim-pop/popclean
    0   703   690   0   4  0  1240   652 kqread S    d0- 0:00.00 /usr/bin/tail -f /var/log/poplog
    0   743     1   0   8  0  1676  1220 wait   S    d0- 0:00.02 /bin/sh /usr/local/bin/mysqld_safe --user=mysql --datadir=/var/db/mysql --pid-file=/var/db/mysql/psg.pid
    0   904     1   0   5  0  1288   944 ttyin  Ss+  d0  0:00.01 /usr/libexec/getty std.9600 ttyd0
    0   896     1   0   5  0  1288   944 ttyin  Ss+  v0  0:00.01 /usr/libexec/getty Pc ttyv0
    0   897     1   0   5  0  1288   944 ttyin  Ss+  v1  0:00.01 /usr/libexec/getty Pc ttyv1
    0   898     1   0   5  0  1288   944 ttyin  Ss+  v2  0:00.01 /usr/libexec/getty Pc ttyv2
    0   899     1   0   5  0  1288   944 ttyin  Ss+  v3  0:00.01 /usr/libexec/getty Pc ttyv3
    0   900     1   0   5  0  1288   944 ttyin  Ss+  v4  0:00.01 /usr/libexec/getty Pc ttyv4
    0   901     1   0   5  0  1288   944 ttyin  Ss+  v5  0:00.01 /usr/libexec/getty Pc ttyv5
    0   902     1   0   5  0  1288   944 ttyin  Ss+  v6  0:00.01 /usr/libexec/getty Pc ttyv6
    0   903     1   0   5  0  1288   944 ttyin  Ss+  v7  0:00.01 /usr/libexec/getty Pc ttyv7

> (2) Could you break into DDB and generate a stack trace for dump?

db> trace 1745
sched_switch(c66cb150,8b088ded,52586aad,ffc00014,c66cb150) at sched_switch+0x145
mi_switch(1,c04d9400,c66ce6e0,c04d8e52,0) at mi_switch+0x1ab
sleepq_switch(c66ce6e0,0,e3bdcc40,c04b97e1,c66ce6e0) at sleepq_switch+0x16f
sleepq_wait_sig(c66ce6e0,5c,c66ce74c,c05fc4b5,0) at sleepq_wait_sig+0x14
msleep(c66ce6e0,c66ce74c,15c,c05fc4b5,0) at msleep+0x511
kern_wait(c66cb150,ffffffff,e3bdcc8c,0,e3bdcc90) at kern_wait+0xa19
wait4(c66cb150,e3bdcd14,10,7,4) at wait4+0x32
syscall(2f,2f,2f,6d1,bfbfe0c8) at syscall+0x320
Xint0x80_syscall() at Xint0x80_syscall+0x1d
--- syscall (7, FreeBSD ELF32, wait4), eip = 0x280c970f, esp = 0xbfbfe08c, ebp = 0xbfbfe0a8 ---

db> trace 1748
sched_switch(c61e97e0,ba59592b,b60532c8,ffc06014,c61e97e0) at sched_switch+0x145
mi_switch(1,c04d9400,e3997b74,c04a61ef,0) at mi_switch+0x1ab
sleepq_switch(c64466fc,0,e3997ba8,c04b97e1,c64466fc) at sleepq_switch+0x16f
sleepq_wait_sig(c64466fc,58,0,0,0) at sleepq_wait_sig+0x14
msleep(c64466fc,0,158,c05ff4a1,0) at msleep+0x511
sbwait(c64466dc,c613e900,c64fcd20,e3997bfc,4) at sbwait+0x4b
soreceive(c6446690,0,e3997c80,0,0) at soreceive+0x2a5
soo_read(c61ff990,e3997c80,c5fcde80,0,c61e97e0) at soo_read+0x93
dofileread(c61e97e0,c61ff990,7,bfbddb98,4) at dofileread+0xdc
read(c61e97e0,e3997d14,c,c61e97e0,3) at read+0x6b
syscall(807002f,805002f,bfbd002f,7,bfbddb98) at syscall+0x320
Xint0x80_syscall() at Xint0x80_syscall+0x1d
--- syscall (3, FreeBSD ELF32, read), eip = 0x280c978f, esp = 0xbfbddb5c, ebp = 0xbfbddb78 ---

whoops! how do i tell it which process?

> (3) Could you run "show lockedvnods" in DDB and show the results?

db> show lockedvnods
Locked vnodes

> (4) Could you run "show locks " on the dump process?

db> show locks 1745
No such command