From: Terry Kennedy
To: freebsd-stable@freebsd.org
Date: Mon, 29 Dec 2008 20:28:41 -0500 (EST)
Subject: rdump stuck in sbwait state (RELENG_7)
Message-id: <01N3OFGBCXMS000125@tmk.com>

  I upgraded a box (Dell PowerEdge 1550, dual PIII processors) from a
kernel + world of December 8th to one from today (December 29th), and I
am experiencing a new problem with rdump. The symptom is that rdump
stops sending data to the remote system. It is responsive to ^T and can
be aborted with ^C.

  Here's the ^T status on the sending box (the aforementioned Dell
RELENG_7 system):

  DUMP: dumping (Pass IV) [regular files]
  DUMP: 20.49% done, finished in 0:19 at Mon Dec 29 19:58:57 2008
  DUMP: 38.00% done, finished in 0:16 at Mon Dec 29 20:00:52 2008
  DUMP: 55.45% done, finished in 0:12 at Mon Dec 29 20:01:37 2008
load: 0.00  cmd: rdump 1493 [sbwait] 2.32u 11.25s 0% 34616k
load: 0.00  cmd: rdump 1493 [sbwait] 2.32u 11.25s 0% 34616k
load: 0.00  cmd: rdump 1495 [pause] 2.37u 11.25s 0% 34616k
load: 0.00  cmd: rdump 1492 [running] 2.46u 4.89s 0% 34800k
load: 0.00  cmd: rdump 1494 [pause] 2.30u 11.22s 0% 34616k
load: 0.00  cmd: rdump 1492 [running] 2.46u 4.89s 0% 34800k
load: 0.00  cmd: rdump 1492 [running] 2.46u 4.89s 0% 34800k
load: 0.00  cmd: rdump 1495 [pause] 2.37u 11.25s 0% 34616k
load: 0.00  cmd: rdump 1493 [sbwait] 2.32u 11.25s 0% 34616k
load: 0.00  cmd: rdump 1495 [pause] 2.37u 11.25s 0% 34616k
load: 0.00  cmd: rdump 1492 [sbwait] 2.46u 4.89s 0% 34800k
load: 0.02  cmd: rdump 1492 [running] 2.46u 4.89s 0% 34800k
load: 0.02  cmd: rdump 1492 [running] 2.46u 4.89s 0% 34800k
load: 0.02  cmd: rdump 1495 [pause] 2.37u 11.25s 0% 34616k
load: 0.02  cmd: rdump 1492 [running] 2.46u 4.89s 0% 34800k

  A tcpdump on both the sending and receiving systems shows no packets
between them from the rdump processes. However, I can rshell both ways
and get the expected output, so the link isn't down.

  ps shows the same thing as ^T. The sbwait process looks like this:

    0  1492  1489   0   4  0 36024 34808 sbwait I+    p0    0:07.35 rdump: /dev/amrd0s1f: pass 4: 69.66% done, finished in 0:08 at Mon Dec 29 20:01:53 2008 (rdump)

and the status never changes.

  The remote (receiving) system is an HP DS10 running OpenVMS 8.3 with
MultiNet 5.1A as the TCP stack. Despite this being a rather rare
environment, I haven't had any problems until this most recent kernel
build.
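
  For anyone who wants to summarize the ^T lines above rather than read
them by eye, here is a minimal sketch that tallies the wait state
reported for each pid. It is only an illustration: the function name
tally_states, the regular expression, and the abbreviated SAMPLE text
are assumptions of this sketch and are not part of dump or of any
FreeBSD tool.

```python
import re
from collections import Counter, defaultdict

# Matches a ^T status line of the form shown in this message, e.g.:
#   load: 0.00 cmd: rdump 1493 [sbwait] 2.32u 11.25s 0% 34616k
# The pattern is an assumption based only on the lines quoted above.
STATUS_RE = re.compile(r"load:\s+\S+\s+cmd:\s+(\S+)\s+(\d+)\s+\[([^\]]+)\]")


def tally_states(text):
    """Return {pid: {state: count}} for every ^T status line in text."""
    states = defaultdict(Counter)
    for match in STATUS_RE.finditer(text):
        _cmd, pid, state = match.groups()
        states[int(pid)][state] += 1
    return {pid: dict(counts) for pid, counts in states.items()}


# Abbreviated sample: four of the status lines quoted in this message.
SAMPLE = """\
load: 0.00 cmd: rdump 1493 [sbwait] 2.32u 11.25s 0% 34616k
load: 0.00 cmd: rdump 1495 [pause] 2.37u 11.25s 0% 34616k
load: 0.00 cmd: rdump 1492 [running] 2.46u 4.89s 0% 34800k
load: 0.00 cmd: rdump 1492 [sbwait] 2.46u 4.89s 0% 34800k
"""

if __name__ == "__main__":
    for pid, counts in sorted(tally_states(SAMPLE).items()):
        print(pid, counts)
```

  Feeding it the full ^T capture instead of SAMPLE gives a per-process
count of how often each state was seen, which makes it easier to tell
which rdump processes ever leave sbwait or pause.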

  I have a large number (over a dozen) other systems running a variety
of releases (6.4, 7.0, 7.1-PRERELEASE) which can do this same dump
operation without difficulty.

  I have the offending dump process still in this stuck state, so I can
generate whatever sort of debugging information is needed. The box is a
test box, so I can crash it and get a core dump if that's what is
needed.

        Terry Kennedy             http://www.tmk.com
        terry@tmk.com             New York, NY USA