From: Mel Flynn
To: freebsd-questions@freebsd.org
Cc: Len Conrad
Date: Sun, 12 Jul 2009 15:56:15 -0800
Subject: Re: dump hangs on 7.1
Message-Id: <200907121556.16039.mel.flynn+fbsd.questions@mailing.thruhere.net>
In-Reply-To: <200907122302734.SM01728@W500.Go2France.com>

On Sunday 12 July 2009 13:20:49 Len Conrad wrote:
> At 04:04 PM 7/12/2009, you wrote:
> >On Sunday 12 July 2009 11:03:00 Len Conrad wrote:
> >> >On Friday 10 July 2009 08:29:01 Len Conrad wrote:
> >> >> FreeBSD 7.1-RELEASE #0: Thu Jan 1 14:37:25 UTC 2009
> >> >> root@logan.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC i386
> >> >>
> >> >> CPU: Intel(R) Xeon(R) CPU E5420 @ 2.50GHz (2496.26-MHz
> >> >> 686-class CPU) Origin = "GenuineIntel" Id = 0x1067a Stepping = 10
> >> >> AMD Features=0x20100000
> >> >> AMD Features2=0x1
> >> >> Cores per package: 4
> >> >> real memory = 3484745728 (3323 MB)
> >> >> avail memory = 3405537280 (3247 MB)
> >> >> ACPI APIC Table:
> >> >> FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
> >> >> cpu0 (BSP): APIC ID: 0
> >> >> cpu1 (AP): APIC ID: 1
> >> >> cpu2 (AP): APIC ID: 2
> >> >> cpu3 (AP): APIC ID: 3
> >> >>
> >> >>
> >> >> /sbin/dump -0uanL -f - / | ssh dump_images@xxx.net dd
> >> >> of=/var/ftp/dump_images/mx1-root-test
> >> >>
> >> >> dump has completed only once. Several other dumps have all gotten
> >> >> under way, target file is created and increases until the hang.
> >> >>
> >> >> CTRL-C gets back to shell,eg:
> >> >>
> >> >> DUMP: Date of this level 0 dump: Fri Jul 10 10:25:33 2009
> >> >> DUMP: Date of last level 0 dump: the epoch
> >> >> DUMP: Dumping snapshot of /dev/da0s1d (/usr) to standard output
> >> >> DUMP: mapping (Pass I) [regular files]
> >> >> DUMP: mapping (Pass II) [directories]
> >> >> DUMP: estimated 1713942 tape blocks.
> >> >> DUMP: dumping (Pass III) [directories]
> >> >> DUMP: dumping (Pass IV) [regular files]
> >> >> ^C DUMP: Interrupt received.
> >> >> DUMP: Do you want to abort dump?: ("yes" or "no") Killed by signal
> >> >> 2. DUMP: Broken pipe
> >> >> DUMP: The ENTIRE dump is aborted.
> >> >>
> >> >> Hangs always in Pass IV
> >> >
> >> >What's the output ps -auwwx|grep dump at the time of the dump.
> >>
> >> when the dump hangs:
> >>
> >> ps auxww | grep dump
> >>
> >> root 61360 0.0 0.0 3128 1168 p0 I+ 1:47PM 0:00.06
> >> /sbin/dump -0uanL -f - / (dump)
> >>
> >> root 61361 0.0 0.1 5560 2768 p0 I+ 1:47PM 0:03.65 ssh
> >> xxx@xxx.net dd of=/var/ftp/dump_images/mx1-root-test
> >>
> >> root 61364 0.0 0.0 3128 1528 p0 I+ 1:47PM 0:00.36 dump:
> >> /dev/da0s1a: pass 4: 92.66% done, finished in 0:00 at Sun Jul 12
> >> 13:47:52 2009 (dump)
> >
> >procstat -k 61364 please?
>
> I ran it again, diff pid:
>
> procstat -k 67765
> PID TID COMM TDNAME KSTACK
> 67765 100159 dump - mi_switch sleepq_switch
> sleepq_catch_signals sleepq_wait_sig _sleep sbwait soreceive_generic
> soreceive soo_read dofileread kern_readv read syscall Xint0x80_syscall

It looks like it's waiting for ssh/dd to report. Does the same thing happen
when you dump to a local file (on a different partition, obviously)? That
would rule out inter-process communication within dump itself.

FYI, I'm using this daily through periodic on a few 7.1-STABLE machines and
on -current, although I do compress (with gzip, and with bzip2 on faster
CPUs) before transfer. The only other difference is that I don't pass the -n
flag to dump. Dropping it is worth a try, though I doubt the soreceive it's
waiting in is caused by being unable to notify a human in the operator group.

If you're comfortable doing so, you could grab a 7.2-RELEASE livefs CD to see
if the issue persists using the dump tools from there, though I don't know of
any particular fixes in this area.

> >Is the percentage always the same for the same disk?
>
> no, it varies widely.
>
> >If you kill dd on the other side, does dump notice it?
>
> yes, I kill dd on the target, and the dump shows:
>
> DUMP: dumping (Pass IV) [regular files]
> Terminated
> DUMP: Broken pipe
> DUMP: The ENTIRE dump is aborted.

-- 
Mel
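
The two checks suggested in this reply (dump to a local file to take ssh/dd out of the picture, then retry the remote pipeline without -n and with compression before transfer) could be sketched roughly as below. This is only a sketch under assumptions: the helper names local_dump_cmd and remote_dump_cmd, the output paths, and the user@backuphost account are hypothetical and do not come from the thread. The functions only print the command lines so they can be reviewed before being run as root.

```shell
#!/bin/sh
# Sketch only: prints the dump command lines, it does not run dump.
# Flags used: -0 (level 0), -u (update dumpdates), -a (auto-size),
# -L (snapshot a live filesystem). The -n flag is deliberately left out.

# Hypothetical helper: dump filesystem $1 to local file $2, which should
# live on a different partition than the one being dumped.
local_dump_cmd() {
    printf '/sbin/dump -0uaL -f %s %s\n' "$2" "$1"
}

# Hypothetical helper: dump filesystem $1 to stdout, compress with gzip,
# and write it to file $3 on remote account $2 over ssh.
remote_dump_cmd() {
    printf '/sbin/dump -0uaL -f - %s | gzip -c | ssh %s dd of=%s\n' "$1" "$2" "$3"
}
```

For example, `local_dump_cmd / /backup/root.dump` prints a command that writes the root filesystem dump to a placeholder path; if that local dump completes reliably while the ssh pipeline still hangs, the problem is more likely on the transfer side than inside dump.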