From: Robert Watson
To: Lukas Ertl
cc: freebsd-stable@freebsd.org
Date: Wed, 17 Nov 2004 10:22:14 +0000 (GMT)
Subject: Re: 5.3-STABLE frozen on heavy network load

On Wed, 17 Nov 2004, Lukas Ertl wrote:

> I'm seeing complete freezes on a 5.3-STABLE SMP (with HTT) kernel from
> Fri Nov 12. The machine is acting as a newsserver, thus it has heavy
> network and disk load.

Do you know if the freeze happens with 5.3-RELEASE "as released"? If you
set 'debug.mpsafenet=0', do the freezes keep happening? What happens if
you run with INVARIANTS on? Is the system too slow with WITNESS to run
your workload? If not, it might be quite helpful to see information on
locks held, etc, such as "show locks" for each interesting
network-related thread.
Could you send dmesg output? Do you have an estimate of how long it
takes to go from boot to hang?

> With the help of MP_WATCHDOG I was able to get a backtrace: kernel is
> still available if I should send more info.

If/when this recurs, could I get you to run the following commands in
DDB, and send the output:

- ps
- show lockedvnods
- show pcpu
- show pcpu X, for each valid value of X (0 ... maxcpus-1)
- do trace on each thread active on a CPU
- do trace on any network device driver ithread, on the netisr, and any
  other thread that appears to be involved in network activity

Using the current core, could you go to frame #29, and print *td,
*td->td_proc, *uio, *active_cred, and *fp. Go to frame #28 and print
*so. If possible, please keep this dump around; I may also ask you to
inspect *so_pcb once we know what to cast it to (given that it's a news
server, it could well be TCP, in which case *(struct inpcb *)so->so_pcb,
as well as the tcpcb reached through that).

Unfortunately, "complete freeze" could be the result of a number of
potential problems in many different areas of the system. I'm hoping
that the ps and trace output will hint to us whether it's caused by the
network stack or some other bit of the system (such as the file system
code -- look out for lots of processes in "getblk" + lots of locked
vnodes).

Oh, one more thing that would be useful: if you compile with
BREAK_TO_DEBUGGER, are you able to get into the debugger using a console
break or a serial break? If so, which? I assume that because you're
using MP_WATCHDOG, you can't, but it's worth asking. Right now, syscons
requires Giant, so if you can get into the debugger via the serial link
but not syscons, it will suggest something is spinning with Giant.

Thanks!

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Principal Research Scientist, McAfee Research
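
The cast mentioned above for so->so_pcb can be sketched in plain C. This
is only an illustration of the pointer chain socket -> inpcb -> tcpcb
for a TCP socket. The structures below are heavily simplified stand-ins
rather than the real kernel definitions, the field names (so_pcb,
inp_ppcb, t_state) are assumed to match FreeBSD's headers, and the
helper so_to_tcpcb() is a hypothetical name introduced here.

```c
#include <stddef.h>

/*
 * Simplified stand-ins for illustration only. The real kernel
 * structures carry many more fields; only the pointers that form
 * the chain are modeled here.
 */
struct tcpcb {
	int	t_state;	/* TCP connection state */
};

struct inpcb {
	void	*inp_ppcb;	/* per-protocol block; a tcpcb for TCP */
};

struct socket {
	void	*so_pcb;	/* opaque protocol control block */
};

/*
 * Hypothetical helper: follow socket -> inpcb -> tcpcb.
 * Only meaningful if the socket really is a TCP/IP socket;
 * returns NULL when the socket or its pcb is missing.
 */
static struct tcpcb *
so_to_tcpcb(const struct socket *so)
{
	struct inpcb *inp;

	if (so == NULL || so->so_pcb == NULL)
		return (NULL);
	inp = (struct inpcb *)so->so_pcb;	/* the cast from the mail */
	return ((struct tcpcb *)inp->inp_ppcb);
}
```

In a debugger session on the core, the same chain would be followed by
hand: first print *(struct inpcb *)so->so_pcb, then cast the
per-protocol pointer found in that inpcb to struct tcpcb *. The cast is
only valid once the socket is known to be TCP.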