From: Andriy Gapon
Date: Wed, 04 Jun 2008 18:07:47 +0300
To: freebsd-stable@freebsd.org
Subject: mystery: lock up after fs dump
Message-ID: <4846AFC3.3050101@icyb.net.ua>

I wouldn't report this if not for one coincidence (described below). I have too few facts, so this is more of a mystery tale than a real problem report.

There are two systems:
1. old, slow, i386, UP, 7-STABLE
2. new, fast, amd64, MP, 6.3-RELEASE

The systems are at different physical locations.

What they have in common:
1. Both have the same backup strategy, where dumps of certain levels are performed on certain days: monthly dumps of level 2 (on the first day of each month), weekly dumps of level 4 (each Sunday), and daily dumps of levels > 5 (each day except Sunday, but including the first of the month). Dumps are done on live filesystems using -L. Dumps are initially written to the same disk and only later transferred to archive media.
2. Both kernels are compiled with softupdates support, but no filesystems have it enabled.
3. Both systems have the root partition gmirror-ed, and it is dumped.
4. Both systems have gjournal support (on 6.X it is added via a "non-official" patch); there are gjournaled filesystems on both systems, and they are dumped.

On June 1 (Sunday) exactly the same thing happened on both systems. At 4AM the monthly level 2 dump started and completed successfully. At 5AM the weekly level 4 dump started. Somewhere in the middle of it the system locked up. When I physically accessed the systems I found the following: the keyboard didn't respond[*], the screen was frozen, no pings. After a reset I found that the logs had stopped being updated at some time shortly after 5AM.

[*] - although on the amd64 system I was able to switch exactly once between virtual terminals (actually from the X terminal to a console terminal). But that's all: no LED responses, no special key combinations (like break to debugger, which is compiled in and enabled).

This coincidence in details (and that one successful VT switch) led me to believe that this was a lock-up in the kernel rather than a hardware problem. Also, I have used this backup scheme for almost a year and never had such a problem before. I just checked, and this was the first time that the 1st of a month fell on a Sunday, so this was the first time a level 2 dump was followed by a level 4 dump.
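
The schedule and the calendar coincidence described above can be sketched in Python. This is only an illustration, not the actual backup scripts: the function names are hypothetical, the daily level is a placeholder (the post only says "> 5"), and the start date used in the usage note is an assumption standing in for "almost a year".

```python
from datetime import date, timedelta

# Levels taken from the post. DAILY_LEVEL is an assumed placeholder,
# since the post only says the daily dumps use levels > 5.
MONTHLY_LEVEL = 2
WEEKLY_LEVEL = 4
DAILY_LEVEL = 6


def dump_levels(day):
    """Return the dump levels scheduled for `day`, in the order they run."""
    levels = []
    if day.day == 1:
        levels.append(MONTHLY_LEVEL)   # monthly dump on the first of the month
    if day.weekday() == 6:             # Sunday (Monday is 0)
        levels.append(WEEKLY_LEVEL)    # weekly dump, no daily dump on Sunday
    else:
        levels.append(DAILY_LEVEL)     # daily dump on every other day
    return levels


def first_of_month_sundays(start, end):
    """List the dates in [start, end] that are both the 1st and a Sunday."""
    hits = []
    day = start
    while day <= end:
        if day.day == 1 and day.weekday() == 6:
            hits.append(day)
        day += timedelta(days=1)
    return hits
```

For example, `dump_levels(date(2008, 6, 1))` gives `[2, 4]`, and `first_of_month_sundays(date(2007, 8, 1), date(2008, 6, 30))` returns only June 1, 2008, which matches the observation that a level 2 dump had not been followed directly by a level 4 dump before.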
In previous months the level 2 dump was followed by level > 6 dumps.

All in all, quite strange.

-- 
Andriy Gapon