Date: Tue, 15 Jul 2008 10:11:42 -0700 (PDT)
From: Matthew Dillon
Message-Id: <200807151711.m6FHBgVO007481@apollo.backplane.com>
To: Steve Bertrand
References: <487CCD46.8080506@ibctech.ca>
Cc: freebsd-stable@freebsd.org
Subject: Re: taskqueue timeout
List-Id: Production branch of FreeBSD source code

:Hi everyone,
:
:I'm wondering if the problems described in the following link have been
:resolved:
:
:http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2008-02/msg00211.html
:
:I've got four 500GB SATA disks in a ZFS raidz pool, and all four of them
:are experiencing the behavior.
:
:The problem only happens with extreme disk activity. The box becomes
:unresponsive (can not SSH etc). Keyboard input is displayed on the
:console, but the commands are not accepted.
:
:Is there anything I can do to either figure this out, or work around it?
:
:Steve

    If you are getting DMA timeouts, go to this URL:

    http://wiki.freebsd.org/JeremyChadwick/ATA_issues_and_troubleshooting

    Then I would suggest going into /usr/src/sys/dev/ata (I think, on
    FreeBSD), locating all instances where request->timeout is set to 5,
    and changing them all to 10.

	cd /usr/src/sys/dev/ata
	fgrep 'request->timeout' *.c
	... change all assignments of 5 to 10 ...

    Try that first.  If it helps then it is a known issue.  Basically, a
    combination of the on-disk write cache and possible ECC corrections,
    remappings, or excessive remapped sectors can cause the drive to take
    much longer than normal to complete a request.  The default 5-second
    timeout is insufficient.

    If it does help, post confirmation to prod the FreeBSD developers to
    change the timeouts.

    --

    If you are NOT getting DMA timeouts then the ZFS lockups may be due
    to buffer/memory deadlocks.  ZFS has knobs for adjusting its memory
    footprint size.  Lowering the footprint ought to solve (most of)
    those issues.  It's actually somewhat of a hard issue to solve.
    Filesystems like UFS aren't complex enough to require the sort of
    dynamic memory allocations deep in the filesystem that ZFS and
    HAMMER need to do.

					-Matt
					Matthew Dillon
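
    The "change all assignments of 5 to 10" step above can be sketched
    as a small script.  This is only an illustration, not a tested patch:
    it assumes the assignments appear literally as "request->timeout = 5;"
    (spacing may vary), and the helper names bump_timeouts and report_dir
    are made up for this example.  report_dir only prints what would
    change; it does not write to any file.

```python
import re
from pathlib import Path

# Assumption: the assignments look like "request->timeout = 5;".
# Other values (1, 10, 50, ...) are deliberately left untouched.
TIMEOUT_RE = re.compile(r"(request->timeout\s*=\s*)5(\s*;)")


def bump_timeouts(source, new_value=10):
    """Return (patched_source, number_of_replacements)."""
    return TIMEOUT_RE.subn(
        lambda m: f"{m.group(1)}{new_value}{m.group(2)}", source
    )


def report_dir(directory):
    """Dry run: print how many assignments would change in each *.c file."""
    for path in sorted(Path(directory).glob("*.c")):
        _, count = bump_timeouts(path.read_text(errors="replace"))
        if count:
            print(f"{path}: {count} assignment(s) of 5 would become 10")


if __name__ == "__main__":
    sample = (
        "request->timeout = 5;\n"
        "request->timeout = 1;\n"
        "request->timeout=5 ;\n"
    )
    patched, count = bump_timeouts(sample)
    print(patched, end="")
    print(count, "replacement(s)")
```

    Running report_dir("/usr/src/sys/dev/ata") first gives a list to
    compare against the fgrep output before editing anything by hand.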