From: Devin Teske
To: freebsd-questions@freebsd.org
Cc: dteske@FreeBSD.org
Date: Thu, 20 Feb 2014 10:51:42 -0800
Subject: RE: System freezes up during long-running ZFS disk activity

> -----Original Message-----
> From: Christian Campbell [mailto:dcamp@alumni.ufl.edu]
> Sent: Wednesday, February 19, 2014 12:07 PM
> To: freebsd-questions@freebsd.org
> Subject: System freezes up during long-running ZFS disk activity
>
> I recently installed 9.2-RELEASE-p3 on a Dell Precision T5400. I'm using ZFS
> filesystem version: 5, ZFS storage pool version: features support (5000). The
> pool was imported from a previous 9.2 box on which it worked without issue.
>
> I don't know if my problem is ZFS-related, but my ZFS use is why I noticed it,
> and I seem to be able to reproduce it reliably. Every so often, anywhere from
> minutes to hours, my computer will freeze up while ZFS has been busy. This
> happens during a resilver, a scrub, and a long-running process reading millions
> of files from the pool. When it freezes, all output and input freezes: tasks
> like zpool iostat -v 1 or top stop updating their output, whether on the
> console or an ssh terminal over Ethernet. Pressing keys does not garner a
> response.* Sometimes a freeze lasts minutes and then proceeds on its own.
> Sometimes it goes on for hours. An action that typically, but not always, jogs
> it is unplugging the USB keyboard -- the disk activity resumes immediately, and
> any queued keyboard input immediately plays out, whether on the console or
> over ssh. Lastly, my ssh terminal (PuTTY) will stay connected for hours during
> a freeze-up, *i.e.* the TCP circuit is not closed or timed out, as opposed to
> closing pretty quickly after the server is powered off.
>
> In all cases, the system clock lags by the sum of the durations of the freezes.
>
> * During an initial resilver, I noticed that pressing a key such as Ctrl on
> the USB keyboard would jog it, but pressing Ctrl or other keys doesn't jog my
> process of long-running IO activity. But in all cases, even when unplugging
> and replugging the USB keyboard doesn't jog it, Ctrl-Alt-Del prompts an
> orderly shutdown.
>
> Debugging advice is very welcome!
>

[Devin Teske]
I had this exact same problem on a Dell 1U F1DH server.
I didn't send any e-mail to the mailing lists because I feared I was going crazy.

Of course, it's been 30 days since I had that problem, so I'm going from memory:
it was either a bad SATA port (which had loose soldering), or it was the drive
on that SATA port that had fubar'd (putting that drive into another system made
the same thing happen in the new system).

So what I did was rsync all the data off that drive to another one. Because I
had to "jog" the system to keep it responsive (in exactly the situation you
describe above), it took a very _very_ long time. But once I got off of that
drive, everything looked much, much better.

I also found that other ways to jog it were Alt+FN and even the occasional
ping. It appeared to be interrupt-driven in some way.

Might I suggest that you have a drive acting up in your pool?

-- 
Devin