Message-ID: <16a6ef710806012304m48b63161oee1bc6d11e54436a@mail.gmail.com>
Date: Mon, 2 Jun 2008 16:04:12 +1000
From: "Andrew Hill"
Sender: andrew@thefrog.net
To: freebsd-fs@freebsd.org
In-Reply-To: <93F07874-8D5F-44AE-945F-803FFC3B9279@thefrog.net>
Subject: Re: ZFS lockup in "zfs" state

some more info...

On Mon, May 19, 2008 at 1:11 AM, Andrew Hill wrote:
> i tend to find that the timeouts occur on one or two disks at once - e.g.
> ad0 and 2 will complain of timeouts, and the system locks up shortly
> thereafter...

after spitting out the usual errors from ad0 and ad2 (in this case) with
TIMEOUTs and subsequent FAILUREs on READ_DMA[48] and WRITE_DMA[48]... i got
the following panic:

vm_fault: pager read error, pid 1552 (tlsmgr)
ad0: FAILURE - READ_DMA48 timed out LBA=352903900
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 437, size: 4096
ad2: FAILURE - WRITE_DMA timed out LBA=239717693
panic: ZFS: I/O failure (write on off 0: zio 0xffffff001d47c810 [L0 ZIL intent log] b000L/b000P DVA[0]=<0:c807795000:d000> zilog uncompressed LE contiguous birth=750230 fill=0 cksum=69f76525a84e1816:f6d86fe1d94cd68c:39:8af): error 5
KDB: enter: panic
[thread pid 72 tid 100071 ]
Stopped at kdb_enter_why+0x3d: movq $0,0x39b248(%rip)
db>

generally the lockups don't result in a panic (at least not in the short
term of 5-10 minutes), so i can't be sure that this panic is necessarily
caused by the same problem, but thought it might be worth posting in case it
gives an indication of the location/cause of the deadlock

unfortunately i couldn't get a backtrace or core dump for 'political'
reasons (the system was required for use by others) but i'll see if i can
get a panic happening after-hours to get some more info...