From: Matthew Rezny
To: freebsd-bugs@freebsd.org
Date: Fri, 22 Feb 2013 23:04:20 +0100
Subject: Dead console on FreeBSD 9.1
Message-ID: <20130222230420.00000279@unknown>

I have now observed this on more than one machine, so it's time to report it rather than ignore it as a fluke. Over a month ago, I saw this several times on FreeBSD/ppc64 9.1-RC.
The ppc64 port is not exactly solid, so with more pressing issues to deal with I ignored it. Now I see the same on a box running FreeBSD/amd64 9.1-STABLE. Seeing the same issue on multiple machines indicates it's a real problem.

What happens when the console goes dead is that there is no further output and no response to the keyboard. There should be output to the console from a running program. The program continues to run, but the screen is not updated. I cannot switch virtual consoles using the keyboard. I can ssh in and continue to use the machine. Everything seems to run fine and I can see further output in dmesg. I have always considered the console the last resort, so to lose it when the box is still running is very troublesome indeed.

The only correlation I can come up with is that both machines have disks on a SAS card using the mpt driver, and there may have been a device disconnecting from the bus at the time of the console hang. It is impossible to tell exactly when the console hung, so I can't be sure that the output from mpt in dmesg correlates to the moment of the hang.

On the ppc64 machine, I was using ddrescue on a troublesome disk which periodically disconnects from the SAS bus when the firmware takes too long attempting to correct errors. I just wanted to get through imaging the disk, so I let it run and used ssh to check the status. The active virtual console was running ddrescue, which continued to run without interruption after the console hang. Any output from the driver would have been on the first virtual console, which was not the active one at the time, and without the ability to switch to it I can only look at the end of dmesg and take a guess.

On the amd64 machine, I was doing a zfs send/receive from one pool to another. Again, a troublesome disk is present which periodically disconnects. The active virtual console was running "zpool iostat -v 1" to monitor the status.
The zfs send/receive pipe was in another virtual console and I had top running in yet another. Again, the console hung and I cannot switch to the first virtual console to see what it might say, but I can see mpt errors at the end of dmesg. I left it be to finish the zfs send/receive operation while monitoring status via ssh.

The important work is done for the moment. I have not rebooted the system, so it has a dead console but I still have ssh access. Any suggestions on what to look at while it is in this state to attempt to determine the cause?